When running an execution, Nomad continuously reads from the stdin
reader. Because the readers we implemented (codeOceanToRawReader and
nullReader) returned immediately with zero bytes when no input was
available, this led to busy waiting and high CPU load on Poseidon. By
waiting indefinitely in the case of the nullReader and for at least one
byte in the case of the codeOceanToRawReader before returning, we
prevent this issue.
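A minimal sketch of the blocking read, assuming the incoming bytes
arrive over a channel; the channel-based buffering and all names are
illustrative, not Poseidon's actual implementation:

    package runner

    import "io"

    // blockingReader blocks until at least one byte is available instead
    // of returning (0, nil), which would make the consumer busy-wait.
    type blockingReader struct {
        input chan byte // bytes received, e.g. from the client WebSocket
    }

    func (r *blockingReader) Read(p []byte) (int, error) {
        if len(p) == 0 {
            return 0, nil
        }
        b, ok := <-r.input // block until at least one byte arrives
        if !ok {
            return 0, io.EOF
        }
        p[0] = b
        n := 1
        // Drain bytes that are already buffered, without blocking again.
        for n < len(p) {
            select {
            case b, ok := <-r.input:
                if !ok {
                    return n, io.EOF
                }
                p[n] = b
                n++
            default:
                return n, nil
            }
        }
        return n, nil
    }

A nullReader can block forever instead, e.g. by receiving from a
channel that is never written to.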
As the environment is no longer stored in the job's meta information,
Poseidon was not able to recover environments: it expected to find the
environment id in the metadata. We now recover the environment id from
the job id.
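A hypothetical sketch of the recovery, assuming runner job ids embed
the environment id as their first dash-separated segment (e.g.
"42-<uuid>"); the exact id format is an assumption here:

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // environmentIDFromJobID extracts the environment id encoded in a
    // runner's job id.
    func environmentIDFromJobID(jobID string) (int, error) {
        parts := strings.SplitN(jobID, "-", 2)
        if len(parts) < 2 {
            return 0, fmt.Errorf("job id %q contains no environment id", jobID)
        }
        return strconv.Atoi(parts[0])
    }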
Previously, low-level Nomad job creation was done in the environment
manager. Since it used many functions of the nomad package, we felt
this logic belongs in the nomad package.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this template, already existing execution
environments can easily be restored once Poseidon is restarted.
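A sketch of creating a runner job from the template via the official
Nomad API client (github.com/hashicorp/nomad/api); the template job id
handling and the shallow copy are simplifications for illustration:

    // createRunnerJob registers a new runner job based on the template.
    func createRunnerJob(client *api.Client, templateJobID, runnerID string) error {
        template, _, err := client.Jobs().Info(templateJobID, nil)
        if err != nil {
            return err
        }
        job := *template // shallow copy; a real implementation should deep-copy
        job.ID = &runnerID
        job.Name = &runnerID
        _, _, err = client.Jobs().Register(&job, nil)
        return err
    }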
Previously, the stderr fifo was not removed, leaving unwanted
artifacts from the execution behind. We now remove the stderr fifo
after the command has finished.
When running a command interactively, Nomad previously served both
stdout and stderr on the stdout stream. To circumvent this issue, we
now start a separate execution inside the allocation to split both
streams.
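A sketch of the wrapping commands, assuming a fifo path such as
/tmp/stderr_fifo; the path and quoting are illustrative:

    // Run as the separate execution: create the fifo and stream
    // everything written to it as the stderr channel.
    const stderrCommand = "mkfifo /tmp/stderr_fifo && cat /tmp/stderr_fifo"

    // Wrap the actual command so its stderr goes into the fifo; the fifo
    // is removed once the command has finished, leaving no artifacts.
    const commandTemplate = "%s 2> /tmp/stderr_fifo; rm -f /tmp/stderr_fifo"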
Earlier, we used a channel to store the runners. To make the
environment refresh block, we scheduled an additional runner so that
the buffered channel filled up. As we no longer use the channel, we no
longer need the additional runner. Furthermore, the additional runner
led to race conditions in tests when comparing the runner count to the
desired one.
Previously, the minimum was not set, thus defaulting to the value of
count. This did not allow creating execution environments with a
prewarmingPoolSize of 0, as the task group count must not be less than
the minimum count in the scaling policy.
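A sketch with the Nomad API structs, setting the scaling policy's
minimum explicitly so a prewarmingPoolSize of 0 becomes valid; the
helper and its parameters are illustrative:

    func configureTaskGroupScaling(group *api.TaskGroup, prewarmingPoolSize, maxRunners int) {
        enabled := true
        minimum := int64(0) // without this, Nomad defaults Min to Count
        maximum := int64(maxRunners)
        group.Count = &prewarmingPoolSize
        group.Scaling = &api.ScalingPolicy{
            Enabled: &enabled,
            Min:     &minimum,
            Max:     &maximum,
        }
    }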
Previously, the network_mode was only set when creating a job with
network_access = false. This resulted in Nomad leaving the setting as
is when updating the job to use the network. Thus a job would have had
the mapped ports in the Nomad UI, but the Docker network_mode would
still be 'none'.
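A sketch of setting the network_mode unconditionally in the Docker
driver config; using "bridge" for network access is an assumption for
illustration:

    func setNetworkMode(task *api.Task, networkAccess bool) {
        if networkAccess {
            task.Config["network_mode"] = "bridge" // illustrative mode
        } else {
            // Also set the mode explicitly when network access is
            // disabled, so a later job update can overwrite it.
            task.Config["network_mode"] = "none"
        }
    }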
For unit tests, this mocks the runner's Execute method with a
customizable function that operates on the request, streams, and exit
channel to simulate a real execution.
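A hypothetical sketch using testify's mock package; the Execute
signature and the ExitInfo type are assumptions, Poseidon's real
interface may differ:

    import (
        "io"
        "testing"

        "github.com/stretchr/testify/mock"
    )

    type ExitInfo struct{ Code uint8 }

    type RunnerMock struct{ mock.Mock }

    func (m *RunnerMock) Execute(stdin io.Reader, stdout, stderr io.Writer) <-chan ExitInfo {
        args := m.Called(stdin, stdout, stderr)
        return args.Get(0).(<-chan ExitInfo)
    }

    func TestExecuteWritesStdout(t *testing.T) {
        m := &RunnerMock{}
        exit := make(chan ExitInfo, 1)
        m.On("Execute", mock.Anything, mock.Anything, mock.Anything).
            Run(func(args mock.Arguments) {
                // Operate on the streams like a real execution would.
                stdout := args.Get(1).(io.Writer)
                _, _ = stdout.Write([]byte("hello\n"))
                exit <- ExitInfo{Code: 0}
            }).
            Return((<-chan ExitInfo)(exit))

        result := <-m.Execute(nil, io.Discard, io.Discard)
        if result.Code != 0 {
            t.Fatalf("expected exit code 0, got %d", result.Code)
        }
    }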
End-to-end tests are moved to the tests/e2e_tests folder. The tests
folder allows us to have shared helper functions for all tests in a
separate package (tests) that is not included in the non-test build.
This also adds one second of delay before each end-to-end test case by
using the SetupTest method of the suite. By slowing down test
execution, this gives Nomad time to create new allocations when a test
requests a runner. Another solution could be to increase the scale of
the job so there are enough allocations for all end-to-end tests.
Co-authored-by: Maximilian Paß <maximilian.pass@student.hpi.uni-potsdam.de>
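A sketch of the per-test delay via testify's suite lifecycle hook; the
suite name is illustrative:

    import (
        "testing"
        "time"

        "github.com/stretchr/testify/suite"
    )

    type E2ETestSuite struct{ suite.Suite }

    // SetupTest runs before each test case and gives Nomad time to
    // create new allocations for requested runners.
    func (s *E2ETestSuite) SetupTest() {
        time.Sleep(time.Second)
    }

    func TestE2ETestSuite(t *testing.T) {
        suite.Run(t, new(E2ETestSuite))
    }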
This enables executing commands in runners and forwarding input and
output between the runner and the client's WebSocket connection.
Co-authored-by: Maximilian Paß <maximilian.pass@student.hpi.uni-potsdam.de>
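A minimal sketch of the forwarding, assuming gorilla/websocket for the
client connection; names and message framing are illustrative:

    // forwardIO copies client messages to the runner's stdin and streams
    // the runner's stdout back over the WebSocket.
    func forwardIO(conn *websocket.Conn, stdin io.Writer, stdout io.Reader) {
        go func() { // client -> runner stdin
            for {
                _, message, err := conn.ReadMessage()
                if err != nil {
                    return
                }
                if _, err := stdin.Write(message); err != nil {
                    return
                }
            }
        }()
        buf := make([]byte, 1024) // runner stdout -> client
        for {
            n, err := stdout.Read(buf)
            if n > 0 {
                if writeErr := conn.WriteMessage(websocket.TextMessage, buf[:n]); writeErr != nil {
                    return
                }
            }
            if err != nil {
                return
            }
        }
    }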
As we pass the context to the Nomad API event stream, Nomad closes the
event stream once the passed context is cancelled. We use this to exit
our receive loop once the stream is closed, instead of having to check
the context manually.
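A sketch of the receive loop with the official Nomad API client: once
the context is cancelled, the stream channel is closed and the range
loop simply ends; the subscribed topics are illustrative:

    func receiveEvents(ctx context.Context, client *api.Client) error {
        stream, err := client.EventStream().Stream(ctx,
            map[api.Topic][]string{api.TopicAllocation: {"*"}}, 0, nil)
        if err != nil {
            return err
        }
        // No manual ctx.Done() check: the channel is closed for us.
        for events := range stream {
            if events.Err != nil {
                return events.Err
            }
            for _, event := range events.Events {
                _ = event // handle the event
            }
        }
        return nil
    }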
When running the `test` and `e2e-test` target with make, this prevents
`go test` from using cached test results. Rerunning the tests every time
allows for easy detection of flaky tests.
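A sketch of the make targets, assuming -count=1 is the mechanism used
to bypass the Go test cache; the package paths are illustrative:

    test:
        go test -count=1 ./...

    e2e-test:
        go test -count=1 ./tests/e2e_tests/...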
We used a git command to find the location of the user's git
directory. This resulted in warnings in the CI, where git was not
installed. For now, we hardcode the git directory as .git to avoid
this, since the directory is usually located there.
Trivy is used in the CI after building our Docker image. It scans
the Docker image and our dependencies for known vulnerabilities.
The docker-make image is simply docker:latest with make installed.