poseidon

Author	SHA1	Message	Date
Maximilian Paß	9646542499	Inject Execution Debug Message for measuring the performance of the until loop of the stderr connection.	2023-11-23 16:17:30 +01:00
Maximilian Paß	64412e1c4b	Revert "Inject Execution Debug Message" This reverts commit `04a2d0ff3b`.	2023-11-23 15:27:31 +01:00
Maximilian Paß	04a2d0ff3b	Inject Execution Debug Message for measuring the performance of the until loop of the stderr connection.	2023-11-23 15:25:23 +01:00
Maximilian Paß	cb08787c7d	Rephrase Evaluation channel log statement.	2023-11-23 13:22:13 +01:00
Maximilian Paß	c820ff99e6	Fix flaky TestWithSeparateStderr.	2023-11-16 12:10:57 +01:00
Maximilian Paß	543939e5cb	Add independent environment reload in the case that the prewarming pool is depleting (see PrewarmingPoolThreshold) and is still depleting after a timeout (PrewarmingPoolReloadTimeout).	2023-11-09 13:11:39 +01:00
Maximilian Paß	6b69a2d732	Refactor Nomad Recovery from an approach that loaded the runners only once at the startup to a method that will be repeated i.e. if the Nomad Event Stream connection interrupts.	2023-10-31 15:49:56 +01:00
Maximilian Paß	90d591d4ec	Change default behavior in Nomad Event Handling to not propagate that pending runners are being stopped.	2023-09-18 00:54:26 +02:00
Maximilian Paß	2eb15c8d93	Fix losing runners that are rescheduled while the previous allocation was still pending. We fix this by removing the race condition handling that should prevent Poseidon from throwing warnings of unexpected allocation stopping.	2023-09-18 00:54:26 +02:00
Maximilian Paß	788cb0f660	Add regression test for the recent lost runners.	2023-09-18 00:54:26 +02:00
Maximilian Paß	68cd8f43b4	Defuse data race condition of TestWithSeparateStderrReturnsCommandError.	2023-09-11 13:44:29 +02:00
Maximilian Paß	3abd4d9a3d	Refactor all tests to use the MemoryLeakTestSuite.	2023-09-11 13:44:29 +02:00
Maximilian Paß	354c16cc37	Fix missing rescheduled idle runners. In today's unattended upgrade, we have seen how the prewarming pool size dropped to (near) zero. This was based on lost Nomad allocations. The allocations got rescheduled, but not added again to Poseidon. The reason for this is a miscommunication between the Event Handling and the Nomad Manager. `removedByPoseidon` was true even if the runner was not removed by the manager, but an idle runner.	2023-09-05 15:15:39 +02:00
Maximilian Paß	8820938624	Increase severity of two log statements.	2023-09-05 15:15:39 +02:00
Maximilian Paß	90092c48c1	Fix incomplete debug message that is created by sending SIGQUIT to the bash process by not processing output after the client disconnected / we have sent the SIGQUIT.	2023-08-14 11:37:51 +02:00
Maximilian Paß	4d661138e9	Revert "Insert debug message into execution tracing" This reverts commit 72d926ef6c5e9f8ddd0da39dbd1492dad3621c15.	2023-08-14 11:37:51 +02:00
Maximilian Paß	6a1677dea0	Introduce reason for destroying runner in order to return a specific error for OOM Killed Executions.	2023-07-21 15:30:21 +02:00
Maximilian Paß	40a5f2eca6	Insert debug message into execution tracing to verify that the date command is sometimes returning an empty string with exit code 5.	2023-07-21 15:05:53 +02:00
Maximilian Paß	e7df777db4	Always log Runner and Environment ID. Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.	2023-07-15 21:46:56 +02:00
Maximilian Paß	f031219cb8	Fix Nomad event race condition that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node. It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level. To fix it we added communication from the upper level that the stop of the job was expected.	2023-06-13 14:20:20 +02:00
Maximilian Paß	b620d0fad7	Introduce Allocation State Tracking in order to break down the current state and evaluate if it is invalid.	2023-06-13 14:20:20 +02:00
Maximilian Paß	1061b15c3e	Fix Influx monitoring by renaming the time tag.	2023-05-12 18:36:34 +01:00
Maximilian Paß	bbc15d9b71	Monitor Job events and add time to Nomad event monitoring.	2023-05-12 16:35:30 +01:00
Maximilian Paß	9300a82535	Fix missing idle runners. In the context of #358 we identified that the event with the type `AllocationUpdated` and the client status `pending` is common but not always sent by Nomad. With this commit we remove the condition that limits the evaluated Nomad events to the event with the type `AllocationUpdated`. Without the condition, the event of the type `PlanResult` and the status `pending` will be evaluated equally. So far, this event seems to be sent every time. The restriction led to started allocations not being registered when the `AllocationUpdated` event with client status `pending` was missing.	2023-05-12 16:25:43 +01:00
Maximilian Paß	f377b1376c	Add Client Status to Nomad Allocation monitoring. Also add the Nomad Node name as additional debug information.	2023-05-10 19:09:31 +01:00
Maximilian Paß	8f89c14ea1	Cleanup logs for Allocation recovery on startup. The changes do not have functional consequences as adding the allocation just overwrites the old one.	2023-05-10 18:56:51 +01:00
Maximilian Paß	5a147c4985	Add debug statements for allocation event handling	2023-05-10 18:56:51 +01:00
Maximilian Paß	42efebc194	Monitor the Nomad events and send all Nomad events to Influxdb.	2023-05-09 00:13:58 +01:00
Maximilian Paß	d8d9abbddd	Add Job ID to Nomad Allocation monitoring.	2023-04-23 12:54:57 +01:00
Maximilian Paß	801e4f489e	Synchronize Sentry debug message handling.	2023-04-11 20:58:57 +01:00
Maximilian Paß	0c8fa9ccfa	Add context to log statements.	2023-04-11 20:45:30 +01:00
Maximilian Paß	a720553dd1	Fix missing Runner-Delete events.	2023-04-01 19:27:09 +02:00
Maximilian Paß	8950ce29d8	Recover Runner Allocations on startup.	2023-04-01 19:27:09 +02:00
Maximilian Paß	038d71ff51	Nomad: Handle Container re-allocation	2023-03-31 14:42:55 +02:00
Maximilian Paß	c3e5afaad0	Fix Concurrent Map Write when handling the Sentry Debug Messages asynchronously.	2023-03-22 10:36:38 +00:00
Maximilian Paß	e877cd1e52	Rename Sentry Span Descriptions.	2023-03-14 23:42:19 +01:00
Maximilian Paß	e0419c2e58	Fix Sentry Debug Regex that was ignoring composed messages including a newline. Also, add regression test.	2023-03-14 23:42:19 +01:00
Maximilian Paß	6e069f5d8a	Fix Nomad Exit Code. Due to the wrapping of the command, the exit code could no longer be retrieved correctly.	2023-03-14 23:42:19 +01:00
Maximilian Paß	7dadc5dfe9	Refactor Nomad Command Generation. - Abstracting from the exec form while generating. - Removal of single quotes (usage of only double-quotes). - Bash-nesting using escaping of special characters.	2023-03-14 23:42:19 +01:00
Maximilian Paß	f309d0f70e	Ensure sending of the Sentry End debug message.	2023-03-14 23:42:19 +01:00
Maximilian Paß	4fb6ab980b	Implement merge request comments.	2023-03-14 23:42:19 +01:00
Maximilian Paß	cc0c425197	Add Sentry Spans for Bash execution.	2023-03-14 23:42:19 +01:00
Maximilian Paß	4550a4589e	Dangerous Context Enrichment by passing the Sentry Context down our abstraction stack. This included changes in the complex context management of managing a Command Execution.	2023-02-03 10:29:18 +00:00
Maximilian Paß	0d3c474acc	Enrich error message.	2023-01-02 11:23:02 +01:00
Maximilian Paß	8950ab3776	Add single quotes for inner command. Change to bash as interpreter. Forbid single quotes for user commands.	2022-11-04 15:15:43 +01:00
Maximilian Paß	4c25473c9e	Hide Nomad specific environment variables from the user environment.	2022-11-04 15:15:43 +01:00
Sebastian Serth	acb4d24c45	Change loglevel for context cancellation to DEBUG	2022-10-26 16:18:35 +02:00
Maximilian Paß	28fb0ca61c	Catch context canceled error	2022-10-25 09:36:52 +02:00
Sebastian Serth	1a5a49d7c8	Explicitly switch user for code execution. Co-authored-by: Maximilian Pass <maximilian.pass@student.hpi.uni-potsdam.de>	2022-09-24 23:09:23 +02:00
Sebastian Serth	7454e577e4	Allow using a local Docker image, e.g., for tests	2022-09-24 23:09:23 +02:00


72 Commits