21 Commits

Author SHA1 Message Date
Maximilian Paß
e3a8d202ac Adjust Influxdb buffering
as we have experienced silent packet drops. This does not fix the issue; it only makes it less likely.
2023-12-03 01:27:49 +01:00
Maximilian Paß
ab12c9046d Decrease Log Severity
of errors that occur while reading the request body.
2023-11-22 19:14:42 +01:00
Maximilian Paß
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner ID and the environment ID by adding the information in the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
Maximilian Paß
0bfef5e105 Degrade InfluxDB Retry Write log. 2023-07-14 18:54:57 +02:00
Maximilian Paß
f377b1376c Add Client Status to Nomad Allocation monitoring
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
Maximilian Paß
42efebc194 Monitor the Nomad events
and send all Nomad events to Influxdb.
2023-05-09 00:13:58 +01:00
Maximilian Paß
d8d9abbddd Add Job ID to Nomad Allocation monitoring. 2023-04-23 12:54:57 +01:00
Maximilian Paß
0c8fa9ccfa Add context to log statements. 2023-04-11 20:45:30 +01:00
Maximilian Paß
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
Maximilian Paß
e0db1bafe8 Fix multiple user Runner use
A previously unknown Nomad reload adds already known runners to the idle runners again, even if they are already in use.
2023-03-31 14:42:55 +02:00
Maximilian Paß
a4599f2cf9 Fix panic on influx shutdown.
Influx was shut down before Poseidon was terminated. In the meantime, the profiling data was written. Also in the meantime, a periodic Influx event triggered a panic because Influx was already shut down.

We implemented two changes, each fixing this scenario.
2023-03-13 15:21:24 +01:00
Sebastian Serth
aa9d4d30e2 Actually retry sending InfluxDB data
Previously, we always logged the error on first failure and (nevertheless) tried to send the data within 3 minutes (default configuration).

Fixes POSEIDON-1H
Closes #262
2023-02-28 23:47:35 +01:00
Maximilian Paß
5e5e13806e Monitor file download. 2022-10-26 01:33:26 +02:00
Maximilian Paß
89fc7b2637 Fix Nomad event stream ignoring errors
after an event stream had been established once.
2022-09-07 21:16:20 +02:00
Maximilian Paß
9677253b35 Change Influx field name for the startup duration
due to a type mismatch that cannot currently be resolved.
2022-08-10 20:46:17 +02:00
Maximilian Paß
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
Maximilian Paß
49c7a2d405 Save the runner and environment id for executions monitoring. 2022-07-31 19:42:35 +02:00
Maximilian Paß
d9b7989a6c Enable logging for failed monitoring. 2022-07-01 15:29:31 +02:00
Maximilian Paß
498e8f5ff5 #110 Refactor influxdb monitoring
to use it as a singleton.
This makes it possible to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
Maximilian Paß
a4d13fb8cb #148 Add stage to influx monitoring. 2022-06-21 15:31:29 +02:00
Maximilian Paß
25f92e5f94 Add environment specific data to the influxdb data. 2022-04-18 13:17:49 +02:00