Maximilian Paß
08c3a3d53d
Decouple InfluxDB writes from request handling.
...
With #451, we found that writing an InfluxDB data point might block and lead to high latencies.
2024-01-28 10:57:01 +01:00
Maximilian Paß
e3a8d202ac
Adjust InfluxDB buffering
...
as we have experienced silent package drops. This issue is not fixed; it is only made less probable.
2023-12-03 01:27:49 +01:00
Maximilian Paß
ab12c9046d
Decrease Log Severity
...
of errors trying to read the request body.
2023-11-22 19:14:42 +01:00
Maximilian Paß
e7df777db4
Always log Runner and Environment ID.
...
Systematically log the runner id and the environment id by adding the information in the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
Maximilian Paß
0bfef5e105
Degrade InfluxDB Retry Write log.
2023-07-14 18:54:57 +02:00
Maximilian Paß
f377b1376c
Add Client Status to Nomad Allocation monitoring
...
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
Maximilian Paß
42efebc194
Monitor the Nomad events
...
and send all Nomad events to InfluxDB.
2023-05-09 00:13:58 +01:00
Maximilian Paß
d8d9abbddd
Add Job ID to Nomad Allocation monitoring.
2023-04-23 12:54:57 +01:00
Maximilian Paß
0c8fa9ccfa
Add context to log statements.
2023-04-11 20:45:30 +01:00
Maximilian Paß
038d71ff51
Nomad: Handle Container re-allocation
2023-03-31 14:42:55 +02:00
Maximilian Paß
e0db1bafe8
Fix multiple user Runner use
...
A previously unknown Nomad reload adds already known runners to the idle runners again, even if they are already in use.
2023-03-31 14:42:55 +02:00
Maximilian Paß
a4599f2cf9
Fix panic on influx shutdown.
...
Influx was shut down before Poseidon was terminated. In the meantime, the profiling data was written. Also in the meantime, a periodic Influx event triggered a panic since Influx was already shut down.
We implemented two changes, each fixing this scenario.
2023-03-13 15:21:24 +01:00
Sebastian Serth
aa9d4d30e2
Actually retry sending InfluxDB data
...
Previously, we always logged the error on first failure and (nevertheless) tried to send the data within 3 minutes (default configuration).
Fixes POSEIDON-1H
Closes #262
2023-02-28 23:47:35 +01:00
Maximilian Paß
5e5e13806e
Monitor file download.
2022-10-26 01:33:26 +02:00
Maximilian Paß
89fc7b2637
Fix Nomad event stream ignoring errors
...
when an event stream could be established once.
2022-09-07 21:16:20 +02:00
Maximilian Paß
9677253b35
Change Influx field name for the startup duration
...
due to a currently unresolvable type mismatch.
2022-08-10 20:46:17 +02:00
Maximilian Paß
c6e65c14bb
Monitor Nomad allocation startup duration.
2022-07-31 19:42:35 +02:00
Maximilian Paß
49c7a2d405
Save the runner and environment id for executions monitoring.
2022-07-31 19:42:35 +02:00
Maximilian Paß
d9b7989a6c
Enable logging for failed monitoring.
2022-07-01 15:29:31 +02:00
Maximilian Paß
498e8f5ff5
#110 Refactor influxdb monitoring
...
to use it as a singleton.
This makes it possible to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
Maximilian Paß
a4d13fb8cb
#148 Add stage to influx monitoring.
2022-06-21 15:31:29 +02:00
Maximilian Paß
25f92e5f94
Add environment-specific data to the InfluxDB data.
2022-04-18 13:17:49 +02:00