bfb5977d24
Destroy runner on allocation stopped
...
Destroying the runner when Nomad informs us that its allocation was stopped fixes the error of executions running into their timeout even though the allocation was stopped long ago.
2023-07-21 15:30:21 +02:00
40a5f2eca6
Insert debug message into execution tracing
...
to verify that the date command sometimes returns an empty string with exit code 5.
2023-07-21 15:05:53 +02:00
d64d8995bd
Refactor monitoring of runner and environment id.
2023-07-15 21:46:56 +02:00
e7df777db4
Always log Runner and Environment ID.
...
Systematically log the runner id and the environment id by adding the information in the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
527aaf713f
Fix decreased prewarming pool due to inactivity timer.
...
When allocations fail and restart, they are added to the idle runners again. The bug fixed by this commit is that the inactivity timer was not stopped on restart. This led to the idle runner being removed when the timer expired.
2023-06-16 17:27:45 +01:00
f031219cb8
Fix Nomad event race condition
...
that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node.
It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level.
To fix it, we added communication from the upper level that the stop of the job was expected.
2023-06-13 14:20:20 +02:00
b620d0fad7
Introduce Allocation State Tracking
...
in order to break down the current state and evaluate if it is invalid.
2023-06-13 14:20:20 +02:00
bcab46d746
Allow unlimited Nomad reschedules
...
With this measure, we want to avoid template jobs being removed on the second rescheduling.
2023-06-13 14:20:20 +02:00
f7339570ae
Fix increased prewarming pool size
...
by checking the number of required runners before creating an additional runner.
2023-05-28 23:47:07 +01:00
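A sketch of the check described in f7339570ae, assuming the environment knows its prewarming pool size and its current number of idle runners. The struct and its fields are hypothetical; creating the runner is reduced to incrementing a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// environment is a hypothetical stand-in for an execution environment.
type environment struct {
	mu                 sync.Mutex
	prewarmingPoolSize int
	idleRunners        int
}

// createRunnerIfRequired adds a runner only while the pool is below its
// required size and reports whether a runner was created.
func (e *environment) createRunnerIfRequired() bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.idleRunners >= e.prewarmingPoolSize {
		return false // pool is already full, do not grow it
	}
	e.idleRunners++ // placeholder for actually requesting a new runner
	return true
}

func main() {
	e := &environment{prewarmingPoolSize: 2, idleRunners: 1}
	fmt.Println(e.createRunnerIfRequired(), e.createRunnerIfRequired())
}
```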
1061b15c3e
Fix Influx monitoring by renaming the time tag.
2023-05-12 18:36:34 +01:00
bbc15d9b71
Monitor Job events
...
and add time to Nomad event monitoring.
2023-05-12 16:35:30 +01:00
9300a82535
Fix missing idle runners.
...
In the context of #358 we identified that the event with the type `AllocationUpdated` and the client status `pending` is common but not always sent by Nomad.
With this commit we remove the condition that limits the evaluated Nomad events to the event with the type `AllocationUpdated`. Without the condition, the event of the type `PlanResult` and the status `pending` is evaluated equally. So far, this event seems to be sent every time.
This restriction led to started allocations not being registered when the `AllocationUpdated` event with client status `pending` was missing.
2023-05-12 16:25:43 +01:00
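The change in 9300a82535 can be illustrated with two predicates. The `event` struct and the function names are hypothetical; only the event types `AllocationUpdated` and `PlanResult` and the client status `pending` come from the commit message.

```go
package main

import "fmt"

// event is a hypothetical, reduced view of a Nomad allocation event.
type event struct {
	Type         string
	ClientStatus string
}

// isPendingBefore mirrors the old condition: only AllocationUpdated events count.
func isPendingBefore(e event) bool {
	return e.Type == "AllocationUpdated" && e.ClientStatus == "pending"
}

// isPendingAfter drops the type restriction, so PlanResult events count as well.
func isPendingAfter(e event) bool {
	return e.ClientStatus == "pending"
}

func main() {
	e := event{Type: "PlanResult", ClientStatus: "pending"}
	fmt.Println(isPendingBefore(e), isPendingAfter(e))
}
```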
f377b1376c
Add Client Status to Nomad Allocation monitoring
...
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
8f89c14ea1
Cleanup logs for Allocation recovery
...
on startup. The changes do not have functional consequences as adding the allocation just overwrites the old one.
2023-05-10 18:56:51 +01:00
5a147c4985
Add debug statements for allocation event handling
2023-05-10 18:56:51 +01:00
42efebc194
Monitor the Nomad events
...
and send all Nomad events to Influxdb.
2023-05-09 00:13:58 +01:00
d8d9abbddd
Add Job ID to Nomad Allocation monitoring.
2023-04-23 12:54:57 +01:00
801e4f489e
Synchronize Sentry debug message handling.
2023-04-11 20:58:57 +01:00
2aa10a130f
Introduce context for the codeOceanOutputWriter
...
that represents its lifespan.
2023-04-11 20:45:30 +01:00
0c8fa9ccfa
Add context to log statements.
2023-04-11 20:45:30 +01:00
a720553dd1
Fix missing Runner-Delete events.
2023-04-01 19:27:09 +02:00
8950ce29d8
Recover Runner Allocations on startup.
2023-04-01 19:27:09 +02:00
038d71ff51
Nomad: Handle Container re-allocation
2023-03-31 14:42:55 +02:00
e0db1bafe8
Fix multiple user Runner use
...
A previously unknown Nomad reload adds already known runners to the idle runners again, even if they are already in use.
2023-03-31 14:42:55 +02:00
c3e5afaad0
Fix Concurrent Map Write
...
when handling the Sentry Debug Messages asynchronously.
2023-03-22 10:36:38 +00:00
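c3e5afaad0 does not say how the concurrent map write was fixed. One common approach, shown here as a sketch with hypothetical names, is to guard the map with a mutex so that asynchronous handlers can write to it safely.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// debugMessages is a hypothetical store that is safe for concurrent writers.
type debugMessages struct {
	mu   sync.Mutex
	data map[string]string
}

func newDebugMessages() *debugMessages {
	return &debugMessages{data: make(map[string]string)}
}

// store writes one entry while holding the lock.
func (d *debugMessages) store(key, value string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.data[key] = value
}

// count returns the number of stored entries.
func (d *debugMessages) count() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return len(d.data)
}

// storeConcurrently writes n distinct keys from n goroutines and returns
// the number of stored entries.
func storeConcurrently(n int) int {
	d := newDebugMessages()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			d.store(strconv.Itoa(i), "handled")
		}(i)
	}
	wg.Wait()
	return d.count()
}

func main() {
	fmt.Println(storeConcurrently(100))
}
```

Without the mutex, the Go runtime may abort the program with "fatal error: concurrent map writes".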
e877cd1e52
Rename Sentry Span Descriptions.
2023-03-14 23:42:19 +01:00
e0419c2e58
Fix Sentry Debug Regex
...
that was ignoring composed messages including a newline.
Also, add regression test.
2023-03-14 23:42:19 +01:00
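The actual expression changed in e0419c2e58 is not shown here. As an illustration of this class of bug, Go's `regexp` package does not let `.` match a newline unless the `s` flag is set. The `START` and `END` markers below are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// withoutFlag stops matching at the first newline because "." excludes "\n".
var withoutFlag = regexp.MustCompile(`^START(.*)END$`)

// withFlag sets the "s" flag so that "." also matches "\n".
var withFlag = regexp.MustCompile(`(?s)^START(.*)END$`)

func main() {
	message := "START first line\nsecond line END"
	fmt.Println(withoutFlag.MatchString(message), withFlag.MatchString(message))
}
```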
6e069f5d8a
Fix Nomad Exit Code
...
Due to the wrapping of the command, the exit code could no longer be retrieved correctly.
2023-03-14 23:42:19 +01:00
7dadc5dfe9
Refactor Nomad Command Generation.
...
- Abstracting from the exec form while generating.
- Removal of single quotes (usage of only double-quotes).
- Bash-nesting using escaping of special characters.
2023-03-14 23:42:19 +01:00
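A sketch of nesting a command in Bash with double quotes only, as the list in 7dadc5dfe9 describes. The function names are hypothetical, and the set of escaped characters (backslash, double quote, dollar sign, backtick) is an assumption based on what Bash treats specially inside double quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// doubleQuoteEscaper escapes the characters that stay special inside
// a double-quoted Bash string. It works in a single pass, so the
// backslashes it inserts are not escaped again.
var doubleQuoteEscaper = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	`$`, `\$`,
	"`", "\\`",
)

// wrapInBash nests command into a bash -c call using double quotes.
func wrapInBash(command string) string {
	return `bash -c "` + doubleQuoteEscaper.Replace(command) + `"`
}

func main() {
	fmt.Println(wrapInBash(`echo "$HOME"`))
}
```

Because the escaping is applied per nesting level, the result can be wrapped again for a second level of nesting.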
f309d0f70e
Ensure sending of the Sentry End debug message.
2023-03-14 23:42:19 +01:00
4fb6ab980b
Implement merge request comments.
2023-03-14 23:42:19 +01:00
cc0c425197
Add Sentry Spans for Bash execution.
2023-03-14 23:42:19 +01:00
1a378ce640
Enable profiler and profile-guided builds
...
I used the chance to simplify the Makefile, as this is required for the file check to work correctly. Variables should not contain quotes, as these will be included in the value otherwise.
2023-02-28 01:14:05 +01:00
4550a4589e
Dangerous Context Enrichment
...
by passing the Sentry Context down our abstraction stack.
This included changes to the complex context management of a Command Execution.
2023-02-03 10:29:18 +00:00
2650efbb38
Sentry Tracing Identifier
2023-02-03 10:29:18 +00:00
a9581ac1d9
Performance for ListFileSystem
2023-02-03 10:29:18 +00:00
f2c205a8ed
Add additional performance spans
2023-02-03 10:29:18 +00:00
0d3c474acc
Enrich error message.
2023-01-02 11:23:02 +01:00
a78ee22e67
Reduce time racetrack of delete and listFileSystem route.
2023-01-02 11:23:02 +01:00
0c6c48c3cf
#190 Add unit tests for runner recovery.
2022-11-26 13:33:44 +00:00
81d777c9cb
Increase minimal memory usage
...
as we collected new insights about the actual memory usage.
2022-11-09 23:19:25 +01:00
8950ab3776
Add single quotes for inner command.
...
Change to bash as interpreter.
Forbid single quotes for user commands.
2022-11-04 15:15:43 +01:00
4c25473c9e
Hide Nomad specific environment variables
...
from the user environment.
2022-11-04 15:15:43 +01:00
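4c25473c9e does not state how the variables are hidden. A sketch of one possible approach, assuming the environment is available as a list of `KEY=value` strings and relying on the `NOMAD_` prefix that Nomad uses for its task variables. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// hideNomadVariables returns env without the entries whose key starts with NOMAD_.
func hideNomadVariables(env []string) []string {
	filtered := make([]string, 0, len(env))
	for _, entry := range env {
		if strings.HasPrefix(entry, "NOMAD_") {
			continue
		}
		filtered = append(filtered, entry)
	}
	return filtered
}

func main() {
	env := []string{"PATH=/bin", "NOMAD_ALLOC_ID=abc", "HOME=/root"}
	fmt.Println(hideNomadVariables(env))
}
```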
b3eee17846
Support protected directories
...
by setting the sticky bit on all explicitly requested directories.
2022-10-29 19:11:05 +02:00
acb4d24c45
Change loglevel for context cancellation to DEBUG
2022-10-26 16:18:35 +02:00
5e5e13806e
Monitor file download.
2022-10-26 01:33:26 +02:00
28fb0ca61c
Catch context canceled error
2022-10-25 09:36:52 +02:00
160df3d9e6
Add retry-mechanism for sample, mark-as-used and return
...
of Nomad runners.
2022-10-24 22:12:09 +01:00
b9c923da8a
Remove unused and deprecated Storer interface.
2022-10-24 22:12:09 +01:00
7119f3e012
Fix not canceling monitoring events for removed environments
...
and runners.
2022-10-24 13:15:14 +02:00