poseidon

Author	SHA1	Message	Date
Elmar Kresse	12ff205bd2	added k8s stub adapter for execution environment	2024-09-18 10:43:38 +02:00
Maximilian Paß	895dd8879f	Revert "Debug HTTPLoggingMiddleware latency." This reverts commit `ae86b1c261`.	2024-02-06 19:34:45 +00:00
Maximilian Paß	08c3a3d53d	Decouple InfluxDB writings from request handling. With #451, we found that writing an InfluxDB data point might block and lead to high latencies.	2024-01-28 10:57:01 +01:00
Maximilian Paß	ae86b1c261	Debug HTTPLoggingMiddleware latency.	2024-01-26 22:51:55 +01:00
Maximilian Paß	57590457a8	Add logging filter token. The token is used to filter out request logs when the user agent matches a randomly generated string.	2024-01-24 17:21:00 +01:00
Maximilian Paß	e3a8d202ac	Adjust InfluxDB buffering as we have experienced silent package drops. This issue is not fixed, it is just made less probable.	2023-12-03 01:27:49 +01:00
Maximilian Paß	c9922e2539	Decrease Log severity of failing requests because it's likely that another error with more information has already been reported.	2023-11-30 16:44:22 +01:00
Maximilian Paß	ab12c9046d	Decrease Log Severity of errors trying to read the request body.	2023-11-22 19:14:42 +01:00
Maximilian Paß	70c108aebf	Unify the representation of the three dots.	2023-11-09 13:11:39 +01:00
Maximilian Paß	c46a09eeae	Add Prewarming Pool Alert that checks for every environment if the filled share of the prewarming pool is at least the specified threshold.	2023-11-09 13:11:39 +01:00
Maximilian Paß	d0dd5c08cb	Remove usage of context.DeadlineExceeded for internal decisions as this error is widely used by other packages. By checking such wrapped errors the internal decision can be influenced accidentally. In this case the retry mechanism checked if the error is context.DeadlineExceeded and assumed it would be created by the internal context. This assumption was wrong.	2023-10-31 15:49:56 +01:00
Maximilian Paß	6b69a2d732	Refactor Nomad Recovery from an approach that loaded the runners only once at the startup to a method that will be repeated i.e. if the Nomad Event Stream connection interrupts.	2023-10-31 15:49:56 +01:00
Maximilian Paß	3abd4d9a3d	Refactor all tests to use the MemoryLeakTestSuite.	2023-09-11 13:44:29 +02:00
Maximilian Paß	e3161637a9	Extract the WatchEventStream retry mechanism into the utils including all other retry mechanisms. With this change we fix that the WatchEventStream goroutine does not stop directly when the context is done (but previously only one second after).	2023-09-11 13:44:29 +02:00
Maximilian Paß	b28b87d56f	Refactor periodicallySendMonitoringData in order to return directly when the context is done and not just at the next iteration.	2023-09-11 13:44:29 +02:00
Maximilian Paß	188d012bc4	Fix Memory Leak caused by the merge_context. The now removed statement of sending an empty struct into the channel blocked the goroutine until the channel of Done got listened for. This led to a goroutine leak as one does not necessarily have to call the Done function of a context. We fix this issue by removing this value. It was unnecessary anyway as a closed channel always returns the zero value of the returned type.	2023-08-26 22:51:22 +02:00
Maximilian Paß	09604997a7	Implement MergeContext that has multiple contexts as parent and chooses the earliest deadline.	2023-08-21 22:49:09 +02:00
Maximilian Paß	306512bf9c	Fix Context Values are not logged. Only the Sentry hook uses the values of the passed context. Therefore, we removed the values from our log statements when we shifted them from an extra `WithField` call to the context. We fix this behavior by introducing a Logrus Hook that copies a fixed set of context values to the logging data.	2023-08-21 22:40:37 +02:00
Maximilian Paß	13a9da95e5	Introduce a context for RetryExponential as second criterion (next to the maximum number of attempts) for canceling the retrying. This is required as we started with the previous commit to retry the Nomad environment recovery. This always fails for unit tests (as they are not connected to a Nomad cluster). Before, we ignored the one error but the retrying leads to unit test timeouts. Additionally, we now stop retrying to create a runner when the environment got deleted.	2023-08-18 09:28:23 +02:00
Maximilian Paß	73759f8a3c	Retry Environment Recovery	2023-08-18 09:28:23 +02:00
Maximilian Paß	0fd6e42487	Add regression e2e test for incomplete debug message. See #325.	2023-08-14 11:37:51 +02:00
Maximilian Paß	731b60acd6	Remove Sentry Exceptions as workaround for having a usable title for the issue groups (not the error type).	2023-07-25 21:07:02 +01:00
Maximilian Paß	75f2f9b290	Add Sentry Stack Traces and exceptions for logs containing errors.	2023-07-25 21:07:02 +01:00
Maximilian Paß	ee26cf13e5	Sentry: Make runner and environment searchable by converting it into a Sentry Tag. Also, replace the unstructured Extra attribute by using a Sentry Context.	2023-07-15 21:46:56 +02:00
Maximilian Paß	e7df777db4	Always log Runner and Environment ID. Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.	2023-07-15 21:46:56 +02:00
Maximilian Paß	0bfef5e105	Degrade InfluxDB Retry Write log.	2023-07-14 18:54:57 +02:00
Maximilian Paß	5b64725faa	Fix golangci-lint errors that appeared due to the new version v1.53.1.	2023-06-04 11:54:42 +01:00
Maximilian Paß	f377b1376c	Add Client Status to Nomad Allocation monitoring. Also add the Nomad Node name as additional debug information.	2023-05-10 19:09:31 +01:00
Maximilian Paß	42efebc194	Monitor the Nomad events and send all Nomad events to InfluxDB.	2023-05-09 00:13:58 +01:00
Maximilian Paß	d8d9abbddd	Add Job ID to Nomad Allocation monitoring.	2023-04-23 12:54:57 +01:00
Maximilian Paß	0c8fa9ccfa	Add context to log statements.	2023-04-11 20:45:30 +01:00
Maximilian Paß	43221c717e	Add context to Sentry Hook. With this context, tracing information stored in the context can be associated with sentry events/issues.	2023-04-11 20:45:30 +01:00
Maximilian Paß	038d71ff51	Nomad: Handle Container re-allocation	2023-03-31 14:42:55 +02:00
Maximilian Paß	e0db1bafe8	Fix multiple user Runner use. A previously unknown Nomad reload added already known runners to the idle runners again, even if they were already in use.	2023-03-31 14:42:55 +02:00
Maximilian Paß	e877cd1e52	Rename Sentry Span Descriptions.	2023-03-14 23:42:19 +01:00
Maximilian Paß	7dadc5dfe9	Refactor Nomad Command Generation. - Abstracting from the exec form while generating. - Removal of single quotes (usage of only double-quotes). - Bash-nesting using escaping of special characters.	2023-03-14 23:42:19 +01:00
Maximilian Paß	a4599f2cf9	Fix panic on Influx shutdown. Influx was shut down before Poseidon was terminated. In the meantime, the profiling data was written. Also in the meantime, a periodic Influx event triggered a panic since Influx was already shut down. We implemented two changes, each fixing this scenario.	2023-03-13 15:21:24 +01:00
Sebastian Serth	aa9d4d30e2	Actual retry sending InfluxDB data. Previously, we always logged the error on first failure and (nevertheless) tried to send the data within 3 minutes (default configuration). Fixes POSEIDON-1H. Closes #262.	2023-02-28 23:47:35 +01:00
Maximilian Paß	2650efbb38	Sentry Tracing Identifier	2023-02-03 10:29:18 +00:00
Maximilian Paß	a9581ac1d9	Performance for ListFileSystem	2023-02-03 10:29:18 +00:00
Maximilian Paß	8950ab3776	Add single quotes for inner command. Change to bash as interpreter. Forbid single quotes for user commands.	2022-11-04 15:15:43 +01:00
Maximilian Paß	5e5e13806e	Monitor file download.	2022-10-26 01:33:26 +02:00
Maximilian Paß	160df3d9e6	Add retry-mechanism for sample, mark-as-used and return of Nomad runners.	2022-10-24 22:12:09 +01:00
Maximilian Paß	b9c923da8a	Remove unused and deprecated Storer interface.	2022-10-24 22:12:09 +01:00
Maximilian Paß	7119f3e012	Fix not canceling monitoring events for removed environments and runners.	2022-10-24 13:15:14 +02:00
Maximilian Paß	3509109b6f	Fix Ls2JsonWriter by allowing more spaces in the ls response and by sending the error response of the list file system route only when no content has been written.	2022-10-05 12:11:47 +01:00
Maximilian Paß	195f88177e	Add Content-Length and Content-Disposition Header for GetFileContent route.	2022-10-05 12:11:47 +01:00
Maximilian Paß	847e5cda65	Extend ls2json reader by also parsing the link target, permissions, group and owner.	2022-10-05 12:11:47 +01:00
Maximilian Paß	fc77f11d4d	Enquote file path for shell execution. Also, fix json of 500 response.	2022-10-05 12:11:47 +01:00
Maximilian Paß	152b77afe5	Add listing of runners file system.	2022-10-05 12:11:47 +01:00


89 Commits