Previously we used this file to deploy a job on Nomad that our API
used for e2e tests. Now that we create the environments in the e2e
tests, we don't need the demo job anymore.
Previously we had a dependency to the tests package. As the
nullreader package is in the pkg directory it should be publicly
available. However, having the tests dependency could lead to a
transitive dependency to an internal package, if the tests package
would import one. Thus, we removed it.
We previously didn't really had any structure in our project apart
from creating a new folder for each package in our project root.
Now that we have accumulated some packages, we use the well-known
Golang project layout in order to clearly communicate our intent
with packages. See https://github.com/golang-standards/project-layout
We now store the mapped ports returned by Nomad locally in our runner
struct and return them when requesting the runner. The returned ip
address is in most Nomad setups not reachable from external users.
Previously we would create as much runners as needed based on the
local idleRunnersCount and the desiredIdleRunnersCount. This is
problematic if two runners are claimed shortly after one another.
As we only add a runner to the idleRunners list once we get the
event from Nomad, the second runner claim in a short timeframe
would create two new runners. This has been fixed now.
To be able to restore the runner timeouts even after a Poseidon restart,
the timeout is stored in the Nomad metadata. The timeout will restart,
but at least the runner will be returned at all.
By removing runners after a specified timeout they no longer stay
around indefinitely and block Nomads capacities. The timeout can be set
individually per runner when requesting the provide route. If it is set
to 0, the runner is never removed automatically.
The timeout is reset when activity is detected. Currently that is when
something gets executed or the filesystem gets modified.
ChannelReceivesSomething (formerly WaitForChannel) originally was
located in the helpers package.
This move was done to remove a cyclic dependency with the nomand package.
When running an execution, Nomad continuously reads from the stdin
reader. Because the readers we implemented (codeOceanToRawReader and
nullReader) return zero if there is no input available, this leads to
busy waiting and a high CPU load on Poseidon. By waiting indefinitely in
case of the nullReader and for at least one byte on case of the
codeOceanToRawReader before returning, we prevent this issue.
As the environment is no longer stored in the meta information,
Poseidon wasn't able to recover environments. It expected the
environment id to be found in the meta data. We now recover
the environment id from the job id.
Previously, low level Nomad job creation was done in the environment manager.
It used many functions of the nomad package so we felt like this logic
better belongs to the nomad package.
As we can't control which allocations are destroyed when downscaling a job, we decided
to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment,
a corresponding job is created in Nomad. We create a default job that serves as a template
for the runners. Using this, already existing execution environments can easily be restored,
once Poseidon is restarted.
Previously the stderr fifo would not be removed, leaving unwanted
artifacts from the execution behind. We now remove the stderr fifo
after the command finished.
When running a command interactively, we previously would get stdout
and stderr both served on stdout by Nomad. To circumvent this issue,
we now start a separate execution inside the allocation to split
both streams.
Earlier we used a channel to store the runners. To make the environment
refresh block, we scheduled an additional runner as the buffered channel
was then filled up. As we don't use the channel anymore, we don't need
the additional runner anymore. Furthermore this leads to weird race
conditions in tests when comparing the runner count to the desired one.
Previously the minimum was not set, thus defaulting to the value of count.
This did not allow creating execution environments with a prewarmingPoolSize
of 0 as the task group count must not be less than the minimum coun in the
scaling policy.
Previously, the network_mode was only set when creating a job with
network_access = false. This results in Nomad leaving this setting
as is when updating the job to use network. Thus a job would have
had the mapped ports in the Nomad UI, but the Docker network_mode
would still be 'none'.