Commit Graph

23 Commits

Author SHA1 Message Date
eb818f92f7 Refactor Runner Destroy Reason Masking
and ignore expected reasons such when the runner got destroyed by an API request.
2023-07-24 11:48:14 +01:00
6a1677dea0 Introduce reason for destroying runner
in order to return a specific error for OOM Killed Executions.
2023-07-21 15:30:21 +02:00
bfb5977d24 Destroy runner on allocation stopped
Destroying the runner when Nomad informs us about its allocation being stopped, fixes the error of executions running into their timeout even if the allocation was stopped long ago.
2023-07-21 15:30:21 +02:00
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
527aaf713f Fix decreased prewarming pool due to inactivity timer.
When allocations fail and restart they are added again to the idle runners. The bug fixed with this commit is that the inactivity timer was not stopped at the restart. This led to the idle runner being removed when the timer expired.
2023-06-16 17:27:45 +01:00
f031219cb8 Fix Nomad event race condition
that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node.
It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level.
To fix it we added communication from the upper level that the stop of the job was expected.
2023-06-13 14:20:20 +02:00
b620d0fad7 Introduce Allocation State Tracking
in order to break down the current state and evaluate if it is invalid.
2023-06-13 14:20:20 +02:00
8f89c14ea1 Cleanup logs for Allocation recovery
on startup. The changes do not have functional consequences as adding the allocation just overwrites the old one.
2023-05-10 18:56:51 +01:00
0c8fa9ccfa Add context to log statements. 2023-04-11 20:45:30 +01:00
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
e0db1bafe8 Fix multiple user Runner use
A before unknown Nomad reload adds already known runner again to the idle runner - even if they are already in use.
2023-03-31 14:42:55 +02:00
a78ee22e67 Reduce time racetrack of delete and listFileSystem route. 2023-01-02 11:23:02 +01:00
160df3d9e6 Add retry-mechanism for sample, mark-as-used and return
of Nomad runners.
2022-10-24 22:12:09 +01:00
9677253b35 Change Influx field name for the startup duration
due to a currently not resolvable type mismatch.
2022-08-10 20:46:17 +02:00
89e15c5c2f Fix startup time format
Before it was a string. To use it efficiently we want it to be a number - in this case in nanoseconds.
2022-08-05 21:16:58 +02:00
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
34040162c2 #89 Generalise the three Storage interfaces and structs into one generic storage manager. 2022-06-29 16:21:19 +02:00
b7a20e3114 Introduce method "Environment" to the Runners interface.
This way we can relate to which environment a runner belongs.
2022-04-18 13:17:49 +02:00
136f596dc2 Add aws environments to the statistics
but only with the field usedRunners.
2022-04-09 16:35:53 +02:00
6123d20525 Implement core functionality of AWS integration 2022-02-28 14:54:40 +01:00
dd41e0d5c4 Generate structures for an AWS environment and runner 2022-02-28 14:54:40 +01:00
0ef5a4e39f Make Execution Environment interface Nomad independent 2022-02-28 14:54:40 +01:00
ba43f667c2 Add architecture for multiple managers
using the chain of responsibility pattern.
2022-02-28 14:54:40 +01:00