
29 Commits

SHA1 Message Date
19e0ae1583 Fix concurrent map write
in the Nomad `evaluations` map by replacing the simple map with our concurrency-ready storage object.
2024-04-17 13:19:49 +02:00
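The commit does not show the storage object it switches to. The following is a minimal sketch, assuming a generic map guarded by `sync.RWMutex`. The `Storage` type, its methods, and the integer values stored under evaluation IDs are hypothetical illustrations, not Poseidon's actual API. It shows why the change removes the crash: every read and write goes through the lock, so concurrent goroutines no longer write to the underlying map at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// Storage is a hypothetical, mutex-guarded generic map used only to
// illustrate a "concurrency-ready storage object"; it is not Poseidon's type.
type Storage[T any] struct {
	mu      sync.RWMutex
	entries map[string]T
}

// NewStorage returns an empty Storage.
func NewStorage[T any]() *Storage[T] {
	return &Storage[T]{entries: make(map[string]T)}
}

// Add stores value under id, replacing any existing entry.
func (s *Storage[T]) Add(id string, value T) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[id] = value
}

// Get returns the value stored under id and whether it was present.
func (s *Storage[T]) Get(id string) (T, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.entries[id]
	return v, ok
}

// Delete removes the entry for id, if any.
func (s *Storage[T]) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.entries, id)
}

// Length returns the number of stored entries.
func (s *Storage[T]) Length() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.entries)
}

func main() {
	// With a plain map, these concurrent writes can abort the program with
	// "fatal error: concurrent map writes"; the mutex serializes them.
	evaluations := NewStorage[int]()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			evaluations.Add(fmt.Sprintf("eval-%d", i), i)
		}(i)
	}
	wg.Wait()
	fmt.Println(evaluations.Length()) // prints 100
}
```

A read-write mutex lets concurrent lookups proceed in parallel while writes stay exclusive; `sync.Map` would be an alternative when keys are mostly written once and read many times.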
c820ff99e6 Fix flaky TestWithSeparateStderr. 2023-11-16 12:10:57 +01:00
90d591d4ec Change default behavior in Nomad Event Handling
to not propagate that pending runners are being stopped.
2023-09-18 00:54:26 +02:00
2eb15c8d93 Fix losing of rescheduled runners
that are rescheduled while the previous allocation is still pending.
We fix this by removing the race condition handling that was meant to prevent Poseidon from throwing warnings about allocations stopping unexpectedly.
2023-09-18 00:54:26 +02:00
788cb0f660 Add regression test for the recent lost runners. 2023-09-18 00:54:26 +02:00
68cd8f43b4 Defuse data race in TestWithSeparateStderrReturnsCommandError. 2023-09-11 13:44:29 +02:00
3abd4d9a3d Refactor all tests to use the MemoryLeakTestSuite. 2023-09-11 13:44:29 +02:00
6a1677dea0 Introduce reason for destroying runner
in order to return a specific error for OOM Killed Executions.
2023-07-21 15:30:21 +02:00
f031219cb8 Fix Nomad event race condition
that was triggered by the runner being deleted due to inactivity while, at the same time, its allocation was rescheduled due to a lost node.
It led to the allocation first being rescheduled and then being stopped. This caused an unexpected stop of a pending runner on a lower level.
To fix it, we added communication from the upper level that the stop of the job was expected.
2023-06-13 14:20:20 +02:00
b620d0fad7 Introduce Allocation State Tracking
in order to break down the current state and evaluate if it is invalid.
2023-06-13 14:20:20 +02:00
9300a82535 Fix missing idle runners.
In the context of #358, we identified that the event with the type `AllocationUpdated` and the client status `pending` is common but not always sent by Nomad.

With this commit, we remove the condition that limits the evaluated Nomad events to those of the type `AllocationUpdated`. Without the condition, the event of the type `PlanResult` with the status `pending` is evaluated in the same way. So far, this event seems to be sent every time.

This restriction led to started allocations not being registered when the `AllocationUpdated` event with client status `pending` was missing.
2023-05-12 16:25:43 +01:00
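To make the change concrete, here is a minimal sketch contrasting the removed condition with the relaxed one. The `allocationEvent` struct and the `oldFilter`/`newFilter` functions are hypothetical simplifications; Poseidon's real handler works on Nomad API event objects, which are not reproduced here. Only the event type names and the `pending` status come from the commit message.

```go
package main

import "fmt"

// allocationEvent is a hypothetical, trimmed-down view of a Nomad allocation
// event: the event type and the allocation's client status.
type allocationEvent struct {
	Type         string
	ClientStatus string
}

// oldFilter mirrors the removed condition: only an AllocationUpdated event
// with client status "pending" registered a starting allocation.
func oldFilter(e allocationEvent) bool {
	return e.Type == "AllocationUpdated" && e.ClientStatus == "pending"
}

// newFilter drops the event-type condition, so a PlanResult event with
// client status "pending" is handled the same way.
func newFilter(e allocationEvent) bool {
	return e.ClientStatus == "pending"
}

func main() {
	// A sequence in which the AllocationUpdated/pending event is missing.
	events := []allocationEvent{
		{Type: "PlanResult", ClientStatus: "pending"},
		{Type: "AllocationUpdated", ClientStatus: "running"},
	}
	for _, e := range events {
		// The old filter never sees a pending event here, so the starting
		// allocation would not be registered; the new filter catches it.
		fmt.Println(e.Type, e.ClientStatus, oldFilter(e), newFilter(e))
	}
}
```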
8950ce29d8 Recover Runner Allocations on startup. 2023-04-01 19:27:09 +02:00
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
7dadc5dfe9 Refactor Nomad Command Generation.
- Abstracting from the exec form while generating.
- Removal of single quotes (only double quotes are used).
- Bash nesting by escaping special characters.
2023-03-14 23:42:19 +01:00
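The idea of nesting bash invocations using only double quotes can be sketched as follows. This is an illustration under assumptions, not Poseidon's generator: the `wrapInBash` helper is hypothetical, and it escapes exactly the characters that keep a special meaning inside a bash double-quoted string (backslash, double quote, dollar sign, backtick). Each additional nesting level simply escapes the already-escaped command once more.

```go
package main

import (
	"fmt"
	"strings"
)

// doubleQuoteEscaper escapes the characters that stay special inside a bash
// double-quoted string. strings.Replacer works in a single pass, so the
// backslashes it inserts are not escaped a second time.
var doubleQuoteEscaper = strings.NewReplacer(
	`\`, `\\`,
	`"`, `\"`,
	`$`, `\$`,
	"`", "\\`",
)

// wrapInBash (hypothetical) nests command in one more bash layer using only
// double quotes.
func wrapInBash(command string) string {
	return `bash -c "` + doubleQuoteEscaper.Replace(command) + `"`
}

func main() {
	inner := `echo "$HOME"`
	once := wrapInBash(inner)
	fmt.Println(once) // prints: bash -c "echo \"\$HOME\""
	// A second layer escapes the backslashes and quotes of the first one.
	fmt.Println(wrapInBash(once))
}
```

Escaping `$` and backticks defers variable expansion and command substitution to the innermost shell, which is where the user command is meant to run.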
cc0c425197 Add Sentry Spans for Bash execution. 2023-03-14 23:42:19 +01:00
4550a4589e Dangerous Context Enrichment
by passing the Sentry Context down our abstraction stack.
This included changes to the complex context management of a Command Execution.
2023-02-03 10:29:18 +00:00
8950ab3776 Add single quotes for inner command.
Change to bash as interpreter.
Forbid single quotes for user commands.
2022-11-04 15:15:43 +01:00
1a5a49d7c8 Explicitly switch user for code execution.
Co-authored-by: Maximilian Pass <maximilian.pass@student.hpi.uni-potsdam.de>
2022-09-24 23:09:23 +02:00
89fc7b2637 Fix Nomad event stream ignoring errors
once an event stream had been established.
2022-09-07 21:16:20 +02:00
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
251129aa74 Modify filter for runners that should be deleted
Now only "dead" jobs are not requested to be deleted. Previously, pending and starting runners were ignored as well.
2021-12-22 17:30:16 +01:00
ac6ce56c38 Remove flaky test case 2021-11-10 13:11:38 +01:00
fff67246d6 Infinite busy waiting for lost event (#31)
* Close evaluation stream for Nomad Job creation
 when the set event handler has finished

* Remove evaluation event stream requests
by handling the events via the main Nomad event handler.
2021-11-10 09:57:40 +01:00
34d4bb7ea0 Implement routes to list, get and delete execution environments
* #9 Implement routes to list, get and delete execution environments.
A refactoring was required to introduce the ExecutionEnvironment interface.

* Fix MR comments, linting issues and a bug that led to an e2e test failure

* Add e2e tests

* Add unit tests
2021-10-21 10:33:52 +02:00
c8c5357b8c Rename module for GitHub 2021-07-30 16:43:05 +02:00
6a60b6cd89 Add config option to enable (m)TLS between Poseidon and Nomad 2021-07-29 09:43:21 +00:00
8d24bda61a Send SIGQUIT when cancelling an execution
When the context passed to Nomad Allocation Exec is cancelled, the
process is not terminated. Instead, just the WebSocket connection is
closed. In order to terminate long-running processes, a special
character is injected into the standard input stream. This character is
parsed by the tty line discipline (tty has to be true). The line
discipline sends a SIGQUIT signal to the process, terminating it and
producing a core dump (in a file called 'core'). The SIGQUIT signal can
be caught but isn't by default, which is why the runner is destroyed if
the program does not terminate during a grace period after the signal
was sent.
2021-07-29 10:28:47 +02:00
3aa1227db6 Use authentication token from config for communication with Nomad 2021-07-27 11:35:55 +00:00
8b26ecbe5f Restructure project
We previously didn't really have any structure in our project apart
from creating a new folder for each package in our project root.
Now that we have accumulated some packages, we use the well-known
Golang project layout in order to clearly communicate our intent
with packages. See https://github.com/golang-standards/project-layout
2021-07-21 12:55:35 +02:00