To be able to restore the runner timeouts even after a Poseidon restart,
the timeout is stored in the Nomad metadata. The timeout will restart,
but at least the runner will be returned at all.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this, already existing execution environments can easily be restored,
once Poseidon is restarted.
For unit tests, this mocks the runners Execute method with a
customizable function that operates on the request, streams and exit
channel to simulate a real execution.
End-to-end tests are moved to the tests/e2e_tests folder. The tests
folder allows us to have shared helper functions for all tests in a
separate package (tests) that is not included in the non-test build.
This also adds one second of delay before each end-to-end test case by
using the TestSetup method of suite. By slowing down test execution,
this gives Nomad time to create new allocations when a test requested a
runner. Another solution could be to increase the scale of the job to
have enough allocations for all end-to-end tests.
Co-authored-by: Maximilian Paß <maximilian.pass@student.hpi.uni-potsdam.de>