Commit Graph

23 Commits

SHA1 Message Date
9deee186a7 Fix Runner DNS resolution
by adding public nameservers to the CNI secure bridge configuration.
2024-04-03 10:14:24 +02:00
1d93f3895f Reduce severity of "Too many idle runners" 2023-11-01 18:42:45 +01:00
6b69a2d732 Refactor Nomad Recovery
from an approach that loaded the runners only once at startup
to one that is repeated, for instance when the Nomad Event Stream connection is interrupted.
2023-10-31 15:49:56 +01:00
59da36303c Fix Goroutine Leak in Environment Get
that was caused by creating an intermediate environment `fetchedEnvironment` when fetching the environments, but not removing it when we only copy its configuration to the existing environment.
2023-09-11 13:44:29 +02:00
e3161637a9 Extract the WatchEventStream retry mechanism
into the utils, alongside all other retry mechanisms.

With this change, the WatchEventStream goroutine stops immediately when the context is done (previously, it stopped only one second later).
2023-09-11 13:44:29 +02:00
354c16cc37 Fix missing rescheduled idle runners.
During today's unattended upgrade, we saw the prewarming pool size drop to (nearly) zero. This was caused by lost Nomad allocations: the allocations were rescheduled but not re-added to Poseidon.

The reason was a miscommunication between the Event Handling and the Nomad Manager: `removedByPoseidon` was true even when the runner had not been removed by the manager but was an idle runner.
2023-09-05 15:15:39 +02:00
13a9da95e5 Introduce a context for RetryExponential
as a second criterion (besides the maximum number of attempts) for canceling the retrying. This is required because the previous commit started retrying the Nomad environment recovery, which always fails in unit tests (as they are not connected to a Nomad cluster). Before, we ignored the single error, but retrying now leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment has been deleted.
2023-08-18 09:28:23 +02:00
73759f8a3c Retry Environment Recovery 2023-08-18 09:28:23 +02:00
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner id and the environment id by adding this information in the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
f7339570ae Fix increased prewarming pool size
by checking the number of required runners before creating an additional runner.
2023-05-28 23:47:07 +01:00
160df3d9e6 Add retry mechanism for sample, mark-as-used and return
of Nomad runners.
2022-10-24 22:12:09 +01:00
7119f3e012 Fix monitoring events not being canceled for removed environments
and runners.
2022-10-24 13:15:14 +02:00
5d54b0f786 Fix wrong environment id in monitoring
data for created or updated environments.
2022-10-24 13:15:14 +02:00
d372e37d1a Add cni/secure-bridge to isolate host network 2022-09-18 19:02:04 +02:00
1eef26cc83 Add environment id to periodical monitoring events. 2022-08-20 09:17:43 +02:00
5590c50e14 #110 Add periodical monitoring events. 2022-08-19 20:48:46 +02:00
18daa1152c Save the environment id for runner monitoring. 2022-07-31 19:42:35 +02:00
498e8f5ff5 #110 Refactor influxdb monitoring
to use it as a singleton.
This makes it possible to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
34040162c2 #89 Generalise the three Storage interfaces and structs into one generic storage manager. 2022-06-29 16:21:19 +02:00
a41659eed4 Enable memory oversubscription (#102)
* Enable memory oversubscription

* Fix and add e2e test
2022-03-18 08:31:27 +01:00
2cf890ab91 Implement review comments 2022-02-28 14:54:40 +01:00
6123d20525 Implement core functionality of AWS integration 2022-02-28 14:54:40 +01:00
dd41e0d5c4 Generate structures for an AWS environment and runner 2022-02-28 14:54:40 +01:00