By removing runners after a specified timeout they no longer stay
around indefinitely and block Nomads capacities. The timeout can be set
individually per runner when requesting the provide route. If it is set
to 0, the runner is never removed automatically.
The timeout is reset when activity is detected. Currently that is when
something gets executed or the filesystem gets modified.
When running an execution, Nomad continuously reads from the stdin
reader. Because the readers we implemented (codeOceanToRawReader and
nullReader) return zero if there is no input available, this leads to
busy waiting and a high CPU load on Poseidon. By waiting indefinitely in
case of the nullReader and for at least one byte on case of the
codeOceanToRawReader before returning, we prevent this issue.
Previously, low level Nomad job creation was done in the environment manager.
It used many functions of the nomad package so we felt like this logic
better belongs to the nomad package.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this, already existing execution environments can easily be restored,
once Poseidon is restarted.
Previously the stderr fifo would not be removed, leaving unwanted
artifacts from the execution behind. We now remove the stderr fifo
after the command finished.
When running a command interactively, we previously would get stdout
and stderr both served on stdout by Nomad. To circumvent this issue,
we now start a separate execution inside the allocation to split
both streams.
As we pass the context to the Nomad API event stream, they close
the event stream once the passed context is cancelled. We use this
to exit our receive loop on the event stream once the stream is closed,
instead of having to check the context manually.