Maximilian Paß
a7d27e8f65
Add missing error log statements.
...
When "markRunnerAsUsed" failed, we silently ignored it. Only when returning the runner failed as well did we throw the error.
When a Runner is destroyed, we are only notified that Nomad removed the allocation, but cannot tell the reason.
For "the execution did not stop after SIGQUIT" we did not log the corresponding runner id.
2023-08-21 22:40:37 +02:00
Maximilian Paß
13cd19ed58
Refactor Nomad Event Stream log message.
2023-08-18 09:28:23 +02:00
Maximilian Paß
73759f8a3c
Retry Environment Recovery
2023-08-18 09:28:23 +02:00
Maximilian Paß
eb818f92f7
Refactor Runner Destroy Reason Masking
...
and ignore expected reasons, such as when the runner was destroyed by an API request.
2023-07-24 11:48:14 +01:00
Maximilian Paß
6a1677dea0
Introduce reason for destroying runner
...
in order to return a specific error for OOM Killed Executions.
2023-07-21 15:30:21 +02:00
Maximilian Paß
bfb5977d24
Destroy runner on allocation stopped
...
Destroying the runner when Nomad informs us that its allocation was stopped fixes the error of executions running into their timeout even though the allocation was stopped long ago.
2023-07-21 15:30:21 +02:00
Maximilian Paß
e7df777db4
Always log Runner and Environment ID.
...
Systematically log the runner id and the environment id by adding the information in the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
Maximilian Paß
527aaf713f
Fix decreased prewarming pool due to inactivity timer.
...
When allocations fail and restart, they are added to the idle runners again. The bug fixed with this commit is that the inactivity timer was not stopped at the restart. This led to the idle runner being removed when the timer expired.
2023-06-16 17:27:45 +01:00
Maximilian Paß
f031219cb8
Fix Nomad event race condition
...
that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node.
It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level.
To fix it, we added communication from the upper level that the stop of the job was expected.
2023-06-13 14:20:20 +02:00
Maximilian Paß
b620d0fad7
Introduce Allocation State Tracking
...
in order to break down the current state and evaluate if it is invalid.
2023-06-13 14:20:20 +02:00
Maximilian Paß
8f89c14ea1
Cleanup logs for Allocation recovery
...
on startup. The changes do not have functional consequences as adding the allocation just overwrites the old one.
2023-05-10 18:56:51 +01:00
Maximilian Paß
0c8fa9ccfa
Add context to log statements.
2023-04-11 20:45:30 +01:00
Maximilian Paß
038d71ff51
Nomad: Handle Container re-allocation
2023-03-31 14:42:55 +02:00
Maximilian Paß
e0db1bafe8
Fix multiple user Runner use
...
A previously unknown Nomad reload adds already known runners to the idle runners again, even if they are already in use.
2023-03-31 14:42:55 +02:00
Maximilian Paß
a78ee22e67
Reduce the race window between the delete and listFileSystem routes.
2023-01-02 11:23:02 +01:00
Maximilian Paß
160df3d9e6
Add retry-mechanism for sample, mark-as-used and return
...
of Nomad runners.
2022-10-24 22:12:09 +01:00
Maximilian Paß
9677253b35
Change Influx field name for the startup duration
...
due to a type mismatch that currently cannot be resolved.
2022-08-10 20:46:17 +02:00
Maximilian Paß
89e15c5c2f
Fix startup time format
...
Before, it was a string. To use it efficiently, we want it to be a number, in this case in nanoseconds.
2022-08-05 21:16:58 +02:00
Maximilian Paß
c6e65c14bb
Monitor Nomad allocation startup duration.
2022-07-31 19:42:35 +02:00
Maximilian Paß
34040162c2
#89 Generalise the three Storage interfaces and structs into one generic storage manager.
2022-06-29 16:21:19 +02:00
Maximilian Paß
b7a20e3114
Introduce method "Environment" to the Runners interface.
...
This way we can determine which environment a runner belongs to.
2022-04-18 13:17:49 +02:00
Maximilian Paß
136f596dc2
Add AWS environments to the statistics
...
but only with the field usedRunners.
2022-04-09 16:35:53 +02:00
Maximilian Paß
6123d20525
Implement core functionality of AWS integration
2022-02-28 14:54:40 +01:00
Maximilian Paß
dd41e0d5c4
Generate structures for an AWS environment and runner
2022-02-28 14:54:40 +01:00
Maximilian Paß
0ef5a4e39f
Make Execution Environment interface Nomad independent
2022-02-28 14:54:40 +01:00
Maximilian Paß
ba43f667c2
Add architecture for multiple managers
...
using the chain of responsibility pattern.
2022-02-28 14:54:40 +01:00