39 Commits

Author SHA1 Message Date
ec3b2a93db Fix Golangci-lint configuration 2024-05-07 14:48:17 +02:00
39d25d2223 Replace IPv6 unspecified address for watchdog health checks 2024-04-02 17:44:10 +02:00
ab938bfc22 Refactor MemoryLeakTestSuite
as we identified two issues where the goroutine count before the test differs from the count after it.

1) In rare cases, a goroutine specific to the Go runtime seemed to appear before the test. To avoid this, we introduced a short timeout before looking up the goroutines.
Another solution might be to do the lookup twice and check whether the counts match.

2) In rare cases, a goroutine that periodically monitors some storage was closed unexpectedly. As we could not identify the cause, we removed the leaking goroutines by properly cleaning up.
2024-02-28 11:52:51 +01:00
57590457a8 Add logging filter token
The token is used to filter out request logs when the user agent matches a randomly generated string.
2024-01-24 17:21:00 +01:00
221a6ff1b2 Watchdog: Verify Server TLS Certificate 2024-01-24 17:21:00 +01:00
b48c7fe8b6 Configure Systemd Watchdog
that monitors the reachability of Poseidon and automatically restarts Poseidon if required.
2024-01-24 17:21:00 +01:00
511b873e16 Configure Systemd Socket Activation
as a new way for Poseidon to accept connections. This should reduce our issues caused by deployments.
2024-01-15 16:05:35 +00:00
eaddc65989 Configure Systemd Socket Activation
as a new way for Poseidon to accept connections. This should reduce our issues caused by deployments.
2024-01-15 16:05:35 +00:00
eaa022282c Block Webserver during first Nomad recovery.
No requests are accepted while Poseidon is recovering Nomad environments and runners.
2024-01-15 16:05:35 +00:00
8292b073e3 Remove deprecated syscall dependency. 2023-11-23 16:06:26 +01:00
70c108aebf Unify the representation of the three dots. 2023-11-09 13:11:39 +01:00
6b69a2d732 Refactor Nomad Recovery
from an approach that loaded the runners only once at the startup
to a method that is repeated, i.e., whenever the Nomad Event Stream connection is interrupted.
2023-10-31 15:49:56 +01:00
14b012486d Formalize Memory Monitoring
by extracting the interval and threshold into configuration options.

Related to f670b07e.
2023-10-12 16:16:46 +02:00
f670b07ea7 Introduce debug memory monitoring
that alerts and prints additional debug information if Poseidon exceeds 1 GB of memory usage.
2023-09-19 22:31:06 +02:00
3abd4d9a3d Refactor all tests to use the MemoryLeakTestSuite. 2023-09-11 13:44:29 +02:00
0d6b4f660c Refactor NewAbstractManager
to require a context used for the monitoring.
2023-09-11 13:44:29 +02:00
13a9da95e5 Introduce a context for RetryExponential
as a second criterion (next to the maximum number of attempts) for canceling the retrying. This is required because, with the previous commit, we started to retry the Nomad environment recovery. This always fails for unit tests (as they are not connected to a Nomad cluster). Before, we ignored the one error, but the retrying leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment got deleted.
2023-08-18 09:28:23 +02:00
0fd6e42487 Add regression e2e test for incomplete debug message.
See #325.
2023-08-14 11:37:51 +02:00
93db065923 Write performance profile on SIGUSR1 2023-04-11 20:31:50 +01:00
0d829c9308 Fix Panic Recovery
by moving the recovery functionality into the main goroutine.
2023-03-31 12:14:42 +02:00
a4599f2cf9 Fix panic on influx shutdown.
Influx was shut down before Poseidon was terminated. In the meantime, the profiling data was written. Also in the meantime, a periodic Influx event triggered a panic because Influx was already shut down.

We implemented two changes, each fixing this scenario.
2023-03-13 15:21:24 +01:00
efa746c940 Add automatic detection for release version 2023-03-04 23:35:31 +01:00
1a378ce640 Enable profiler and profile-guided builds
I used the chance to simplify the Makefile, as this is required for the file check to work correctly. Variables should not contain quotes, as these will be included in the value otherwise.
2023-02-28 01:14:05 +01:00
689344bd79 Enable Sentry Performance Tracing 2023-02-03 10:29:18 +00:00
7edd40b4f0 Add Read Header Timeout
to prevent a potential Slowloris attack.
2022-07-31 19:42:35 +02:00
498e8f5ff5 #110 Refactor influxdb monitoring
to use it as a singleton.
This makes it possible to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
795c83f7b2 Fix deleting nonexistent environments
The error was caused by a panic thrown when an environment is not found and the nonexistent runner manager at the end of the chain is asked for it.
2022-06-07 15:54:48 +02:00
358769eb6b Fix golangci lint. 2022-05-24 22:12:48 +02:00
8feffdae3a Add initial structure of influxdb monitoring. 2022-04-18 13:17:49 +02:00
f6d9a6ddbb Add unit tests 2022-02-28 14:54:40 +01:00
6123d20525 Implement core functionality of AWS integration 2022-02-28 14:54:40 +01:00
ba43f667c2 Add architecture for multiple managers
using the chain of responsibility pattern.
2022-02-28 14:54:40 +01:00
901aa3c8db Remove TCP Write Response Timeout 2021-12-12 10:27:03 +01:00
3ae83217d7 Add Sentry integration 2021-11-25 19:29:33 +01:00
c8c5357b8c Rename module for GitHub 2021-07-30 16:43:05 +02:00
67ebdbd650 Add option to configure template job HCL file
Previously, the template job HCL file was hardcoded using go:embed
in the binary. However, this did not allow users running Poseidon
to change its content. Now, users can change the content of the
template job HCL file using the configuration option.
2021-07-29 11:54:36 +00:00
6a60b6cd89 Add config option to enable (m)TLS between Poseidon and Nomad 2021-07-29 09:43:21 +00:00
3aa1227db6 Use authentication token from config for communication with Nomad 2021-07-27 11:35:55 +00:00
8b26ecbe5f Restructure project
We previously didn't really have any structure in our project apart
from creating a new folder for each package in our project root.
Now that we have accumulated some packages, we use the well-known
Golang project layout in order to clearly communicate our intent
with packages. See https://github.com/golang-standards/project-layout
2021-07-21 12:55:35 +02:00