Before the List function dropped all idleRunners of all environments when fetch was set.
Additionally, the replaced environment was not destroyed properly so that a goroutine for it and for all its idle runners remained running.
that was caused by creating an intermediate environment `fetchedEnvironment` when fetching the environments but not removing it in case that we just copy its configuration to the existing environment.
into the utils including all other retry mechanisms.
With this change we fix that the WatchEventStream goroutine does not stop directly when the context is done (but previously only one second after).
In today's unattended upgrade, we have seen how the prewarming pool size dropped to (near) zero. This was based on lost Nomad allocations. The allocations got rescheduled, but not added again to Poseidon.
The reason for this is a miscommunication between the Event Handling and the Nomad Manager. `removedByPoseidon` was true even if the runner was not removed by the manager, but an idle runner.
as second criteria (next to the maximum number of attempts) for canceling the retrying. This is required as we started with the previous commit to retry the nomad environment recovery. This always fails for unit tests (as they are not connected to an Nomad cluster). Before, we ignored the one error but the retrying leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment got deleted.
* The image previously used is not available publicly and not maintained any longer
* The new base image is not bound to any specific programming environment
* Add forcePull option
for pulling the image when the execution environment gets updated
* Apply suggestions from code review
Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>
* Add unit tests
* Clean up and implement option two
Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>
* Add statistics route for execution environments
* Add maximum to port api definition
Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>
* Close evaluation stream for Nomad Job creation
when set event handler have been finished
* Remove evaluation event stream requests
by handling the events via the main Nomad event handler.
* #9 Implement routes to list, get and delete execution environments.
A refactoring was required to introduce the ExecutionEnvironment interface.
* Fix MR comments, linting issues and bug that lead to e2e test failure
* Add e2e tests
* Add unit tests