By removing runners after a specified timeout they no longer stay
around indefinitely and block Nomads capacities. The timeout can be set
individually per runner when requesting the provide route. If it is set
to 0, the runner is never removed automatically.
The timeout is reset when activity is detected. Currently that is when
something gets executed or the filesystem gets modified.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this, already existing execution environments can easily be restored,
once Poseidon is restarted.
For unit tests, this mocks the runners Execute method with a
customizable function that operates on the request, streams and exit
channel to simulate a real execution.
End-to-end tests are moved to the tests/e2e_tests folder. The tests
folder allows us to have shared helper functions for all tests in a
separate package (tests) that is not included in the non-test build.
This also adds one second of delay before each end-to-end test case by
using the TestSetup method of suite. By slowing down test execution,
this gives Nomad time to create new allocations when a test requested a
runner. Another solution could be to increase the scale of the job to
have enough allocations for all end-to-end tests.
Co-authored-by: Maximilian Paß <maximilian.pass@student.hpi.uni-potsdam.de>