Removing runners after a specified timeout ensures that they no longer
stay around indefinitely and block Nomad's capacity. The timeout can be
set individually per runner when requesting one via the provide route.
If it is set to 0, the runner is never removed automatically.
The timeout is reset whenever activity is detected. Currently, that is
whenever something is executed or the filesystem is modified.
When running an execution, Nomad continuously reads from the stdin
reader. Because the readers we implemented (codeOceanToRawReader and
nullReader) return zero bytes if no input is available, this leads to
busy waiting and a high CPU load on Poseidon. We prevent this by
blocking before returning: indefinitely in the case of the nullReader,
and until at least one byte is available in the case of the
codeOceanToRawReader.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus, for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this template, already existing execution environments can easily
be restored once Poseidon is restarted.
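
The template approach can be pictured with a small, self-contained sketch. It deliberately avoids the Nomad API: jobSpec, runnerJobFromTemplate, the "<environment>-<runner>" ID scheme, and the "environment" metadata key are all hypothetical and only illustrate deriving one job per runner from a shared template.

```go
package main

import "fmt"

// jobSpec is a simplified, hypothetical stand-in for a Nomad job definition.
type jobSpec struct {
	ID    string
	Image string
	Meta  map[string]string
}

// runnerJobFromTemplate derives the job for a single runner from the
// template job of its environment. The template itself is not modified.
func runnerJobFromTemplate(template jobSpec, environmentID, runnerID string) jobSpec {
	job := template // copies the scalar fields
	job.ID = fmt.Sprintf("%s-%s", environmentID, runnerID)

	// Maps are reference types, so copy the metadata explicitly.
	job.Meta = make(map[string]string, len(template.Meta)+1)
	for k, v := range template.Meta {
		job.Meta[k] = v
	}
	job.Meta["environment"] = environmentID
	return job
}
```

Since each runner is its own job under this scheme, removing a specific runner amounts to stopping exactly that job, and the template job carries the environment configuration across Poseidon restarts.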
This enables executing commands in runners and forwarding input and
output between the runner and the websocket to the client.
Co-authored-by: Maximilian Paß <maximilian.pass@student.hpi.uni-potsdam.de>