Commit Graph

51 Commits

Author SHA1 Message Date
Maximilian Paß
1061b15c3e Fix Influx monitoring by renaming the time tag. 2023-05-12 18:36:34 +01:00
Maximilian Paß
bbc15d9b71 Monitor Job events
and add time to Nomad event monitoring.
2023-05-12 16:35:30 +01:00
Maximilian Paß
9300a82535 Fix missing idle runners.
In the context of #358, we identified that the event with the type `AllocationUpdated` and the client status `pending` is common but not always sent by Nomad.

With this commit, we remove the condition that limits the evaluated Nomad events to those of type `AllocationUpdated`. Without the condition, events of type `PlanResult` with the status `pending` are evaluated equally. So far, this event seems to be sent every time.

This restriction led to started allocations not being registered when the `AllocationUpdated` event with client status `pending` was missing.
2023-05-12 16:25:43 +01:00
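The condition change described in 9300a82535 can be illustrated with a minimal Go sketch. The `allocationEvent` type and both predicate functions are hypothetical simplifications for illustration and are not taken from the project's code.

```go
package main

import "fmt"

// allocationEvent is a hypothetical, simplified stand-in for a Nomad event.
type allocationEvent struct {
	Type         string // e.g. "AllocationUpdated" or "PlanResult"
	ClientStatus string // e.g. "pending" or "running"
}

// isPendingBefore mirrors the old, stricter condition: only events of the
// type AllocationUpdated were evaluated.
func isPendingBefore(e allocationEvent) bool {
	return e.Type == "AllocationUpdated" && e.ClientStatus == "pending"
}

// isPendingAfter drops the type restriction and only checks the client status.
func isPendingAfter(e allocationEvent) bool {
	return e.ClientStatus == "pending"
}

func main() {
	planResult := allocationEvent{Type: "PlanResult", ClientStatus: "pending"}
	fmt.Println(isPendingBefore(planResult)) // the PlanResult event was skipped
	fmt.Println(isPendingAfter(planResult))  // the PlanResult event is evaluated
}
```

With the relaxed predicate, a missing `AllocationUpdated` event no longer hides a pending allocation as long as the `PlanResult` event arrives.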
Maximilian Paß
f377b1376c Add Client Status to Nomad Allocation monitoring
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
Maximilian Paß
8f89c14ea1 Clean up logs for Allocation recovery
on startup. The changes do not have functional consequences as adding the allocation just overwrites the old one.
2023-05-10 18:56:51 +01:00
Maximilian Paß
5a147c4985 Add debug statements for allocation event handling 2023-05-10 18:56:51 +01:00
Maximilian Paß
42efebc194 Monitor the Nomad events
and send all Nomad events to Influxdb.
2023-05-09 00:13:58 +01:00
Maximilian Paß
d8d9abbddd Add Job ID to Nomad Allocation monitoring. 2023-04-23 12:54:57 +01:00
Maximilian Paß
801e4f489e Synchronize Sentry debug message handling. 2023-04-11 20:58:57 +01:00
Maximilian Paß
0c8fa9ccfa Add context to log statements. 2023-04-11 20:45:30 +01:00
Maximilian Paß
a720553dd1 Fix missing Runner-Delete events. 2023-04-01 19:27:09 +02:00
Maximilian Paß
8950ce29d8 Recover Runner Allocations on startup. 2023-04-01 19:27:09 +02:00
Maximilian Paß
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
Maximilian Paß
c3e5afaad0 Fix Concurrent Map Write
when handling the Sentry Debug Messages asynchronously.
2023-03-22 10:36:38 +00:00
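A concurrent map write like the one fixed in c3e5afaad0 is commonly avoided by guarding the map with a mutex. The following is a minimal sketch of that general technique, not the project's actual fix; the `debugMessageStore` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// debugMessageStore is a hypothetical container for debug messages that may
// be written from several goroutines at once.
type debugMessageStore struct {
	mu       sync.Mutex
	messages map[string]string
}

func newDebugMessageStore() *debugMessageStore {
	return &debugMessageStore{messages: make(map[string]string)}
}

// Set stores a message. The mutex serializes writers, because unsynchronized
// concurrent writes to a Go map cause a fatal runtime error.
func (s *debugMessageStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.messages[key] = value
}

// Len returns the number of stored messages.
func (s *debugMessageStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.messages)
}

func main() {
	store := newDebugMessageStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.Set(fmt.Sprintf("span-%d", i), "done")
		}(i)
	}
	wg.Wait()
	fmt.Println(store.Len())
}
```

Running such code with `go run -race` is a common way to confirm that no data race remains.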
Maximilian Paß
e877cd1e52 Rename Sentry Span Descriptions. 2023-03-14 23:42:19 +01:00
Maximilian Paß
e0419c2e58 Fix Sentry Debug Regex
that was ignoring composed messages including a newline.
Also, add regression test.
2023-03-14 23:42:19 +01:00
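The kind of bug fixed in e0419c2e58 typically arises because `.` does not match a newline by default in Go's `regexp` package. The sketch below shows the effect of the `(?s)` flag. The `<debug>` markers and both patterns are made up for illustration and do not reflect the project's actual message format.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns use hypothetical <debug> markers. Only the second one sets
// the (?s) flag, which lets "." match newline characters as well.
var (
	withoutDotAll = regexp.MustCompile(`^<debug>(.*)</debug>$`)
	withDotAll    = regexp.MustCompile(`(?s)^<debug>(.*)</debug>$`)
)

func main() {
	composed := "<debug>first line\nsecond line</debug>"
	fmt.Println(withoutDotAll.MatchString(composed)) // the newline blocks the match
	fmt.Println(withDotAll.MatchString(composed))    // the whole message matches
}
```

A regression test for such a fix would feed a message containing a newline to the pattern and assert that it matches.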
Maximilian Paß
6e069f5d8a Fix Nomad Exit Code
Due to the wrapping of the command, the exit code could no longer be retrieved correctly.
2023-03-14 23:42:19 +01:00
Maximilian Paß
7dadc5dfe9 Refactor Nomad Command Generation.
- Abstracting from the exec form while generating.
- Removal of single quotes (usage of only double-quotes).
- Bash-nesting using escaping of special characters.
2023-03-14 23:42:19 +01:00
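The approach listed in 7dadc5dfe9 (double quotes only, nesting by escaping special characters) can be sketched as follows. This is a minimal illustration under assumptions: the `wrapInBash` helper and the chosen set of escaped characters are hypothetical and not taken from the project's code.

```go
package main

import (
	"fmt"
	"strings"
)

// doubleQuoteEscaper escapes the characters that keep a special meaning
// inside a double-quoted Bash string: backslash, double quote, dollar sign,
// and backtick. strings.Replacer scans the input once, so backslashes that
// it inserts are not escaped a second time.
var doubleQuoteEscaper = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	`$`, `\$`,
	"`", "\\`",
)

// wrapInBash is a hypothetical helper that nests a command inside
// `bash -c "..."` without using single quotes.
func wrapInBash(command string) string {
	return `bash -c "` + doubleQuoteEscaper.Replace(command) + `"`
}

func main() {
	inner := `echo "$HOME"`
	once := wrapInBash(inner)
	twice := wrapInBash(once)
	fmt.Println(once)
	fmt.Println(twice)
}
```

Because every nesting level applies the same escaping, the command can be wrapped repeatedly and each Bash layer removes exactly one level of escaping again.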
Maximilian Paß
f309d0f70e Ensure sending of the Sentry End debug message. 2023-03-14 23:42:19 +01:00
Maximilian Paß
4fb6ab980b Implement merge request comments. 2023-03-14 23:42:19 +01:00
Maximilian Paß
cc0c425197 Add Sentry Spans for Bash execution. 2023-03-14 23:42:19 +01:00
Maximilian Paß
4550a4589e Dangerous Context Enrichment
by passing the Sentry Context down our abstraction stack.
This included changes to the complex context management of a Command Execution.
2023-02-03 10:29:18 +00:00
Maximilian Paß
0d3c474acc Enrich error message. 2023-01-02 11:23:02 +01:00
Maximilian Paß
8950ab3776 Add single quotes for inner command.
Change to bash as interpreter.
Forbid single quotes for user commands.
2022-11-04 15:15:43 +01:00
Maximilian Paß
4c25473c9e Hide Nomad specific environment variables
from the user environment.
2022-11-04 15:15:43 +01:00
Sebastian Serth
acb4d24c45 Change loglevel for context cancellation to DEBUG 2022-10-26 16:18:35 +02:00
Maximilian Paß
28fb0ca61c Catch context canceled error 2022-10-25 09:36:52 +02:00
Sebastian Serth
1a5a49d7c8 Explicitly switch user for code execution.
Co-authored-by: Maximilian Pass <maximilian.pass@student.hpi.uni-potsdam.de>
2022-09-24 23:09:23 +02:00
Sebastian Serth
7454e577e4 Allow using a local Docker image, e.g., for tests 2022-09-24 23:09:23 +02:00
Maximilian Paß
89fc7b2637 Fix Nomad event stream ignoring errors
after an event stream had been established once.
2022-09-07 21:16:20 +02:00
Maximilian Paß
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
Maximilian Paß
1239699e74 Add a warning when allocations fail (#83)
* Log a warning when an allocation fails

* Restructure allocation event handling
2021-12-23 13:10:55 +01:00
Maximilian Paß
c22b76720c Add documentation for guarding the Nomad tasks 2021-12-22 17:30:16 +01:00
Maximilian Paß
251129aa74 Modify filter for runners that should be deleted
Only "dead" jobs are now excluded from deletion requests. Previously, pending and starting runners were also ignored.
2021-12-22 17:30:16 +01:00
Maximilian Paß
d57a0c07b8 Implement review suggestions 2021-12-22 17:30:16 +01:00
Maximilian Paß
9f0b04660f Fix goroutine leak in the nullio reader 2021-12-14 13:24:53 +01:00
Maximilian Paß
9cd81930e9 Add API Querier test 2021-12-10 11:30:56 +01:00
Sebastian Serth
ebbbfdb9be Unwrap Nomad error for allocation exec
* This will allow us to inspect whether the websocket connection was closed normally
2021-12-10 10:01:31 +01:00
Maximilian Paß
dce895faff Move the error handler to the api querier
to catch the ws normal close error for all Execute requests
2021-12-09 19:12:20 +01:00
Maximilian Paß
825ebdd3e6 Add forcePull option
* Add forcePull option
for pulling the image when the execution environment gets updated

* Apply suggestions from code review

Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>

* Add unit tests

* Clean up and implement option two

Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>
2021-12-09 14:54:14 +01:00
Maximilian Paß
af939b7810 Catch the "Close normal" error 2021-12-09 13:05:18 +01:00
Maximilian Paß
ac6ce56c38 Remove flaky test case 2021-11-10 13:11:38 +01:00
Maximilian Paß
fff67246d6 Infinite busy waiting for lost event (#31)
* Close evaluation stream for Nomad Job creation
 when the set event handlers have finished

* Remove evaluation event stream requests
by handling the events via the main Nomad event handler.
2021-11-10 09:57:40 +01:00
Maximilian Paß
4db1ceb41e Fix Bug with the runner recovery
where the runners of environment 10 were also recovered for environment 1.
2021-10-22 16:24:55 +02:00
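One plausible cause for a bug like the one fixed in 4db1ceb41e is matching job IDs by string prefix, since "10" starts with "1". The commit message does not state the actual cause, so the sketch below is only an assumption. The job ID format `<environmentID>-<suffix>` and the `belongsToEnvironment` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// belongsToEnvironment compares the complete environment part of a job ID
// instead of a raw string prefix. It assumes the hypothetical ID format
// "<environmentID>-<suffix>".
func belongsToEnvironment(jobID, environmentID string) bool {
	id, _, found := strings.Cut(jobID, "-")
	return found && id == environmentID
}

func main() {
	jobID := "10-runner-a" // a runner of environment 10

	// A naive prefix check wrongly assigns the runner to environment 1.
	fmt.Println(strings.HasPrefix(jobID, "1"))

	// Comparing the whole environment ID keeps the environments apart.
	fmt.Println(belongsToEnvironment(jobID, "1"))
	fmt.Println(belongsToEnvironment(jobID, "10"))
}
```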
Maximilian Paß
34d4bb7ea0 Implement routes to list, get and delete execution environments
* #9 Implement routes to list, get and delete execution environments.
A refactoring was required to introduce the ExecutionEnvironment interface.

* Fix MR comments, linting issues and bug that led to e2e test failure

* Add e2e tests

* Add unit tests
2021-10-21 10:33:52 +02:00
sirkrypt0
9b106f4cd8 Fix linting issues
An update of golangci-lint yielded new linting issues. This commit
fixes them.
2021-08-05 13:40:48 +02:00
Maximilian Paß
c8c5357b8c Rename module for GitHub 2021-07-30 16:43:05 +02:00
Jan-Eric Hellenberg
6a60b6cd89 Add config option to enable (m)TLS between Poseidon and Nomad 2021-07-29 09:43:21 +00:00
Konrad Hanff
8d24bda61a Send SIGQUIT when cancelling an execution
When the context passed to Nomad Allocation Exec is cancelled, the
process is not terminated. Instead, just the WebSocket connection is
closed. In order to terminate long-running processes, a special
character is injected into the standard input stream. This character is
parsed by the tty line discipline (tty has to be true). The line
discipline sends a SIGQUIT signal to the process, terminating it and
producing a core dump (in a file called 'core'). The SIGQUIT signal can
be caught but isn't by default, which is why the runner is destroyed if
the program does not terminate during a grace period after the signal
was sent.
2021-07-29 10:28:47 +02:00
Jan-Eric Hellenberg
3aa1227db6 Use authentication token from config for communication with Nomad 2021-07-27 11:35:55 +00:00