poseidon

Author	SHA1	Message	Date
sirkrypt0	fe240c82b4	Remove demo job HCL file. Previously we used this file to deploy a job on Nomad that our API used for e2e tests. Now that we create the environments in the e2e tests, we don't need the demo job anymore.	2021-07-27 16:31:03 +02:00
Jan-Eric Hellenberg	f323bdf169	Add documentation on authenticating against Nomad	2021-07-27 11:35:55 +00:00
Jan-Eric Hellenberg	3aa1227db6	Use authentication token from config for communication with Nomad	2021-07-27 11:35:55 +00:00
Konrad Hanff	23b726cef9	Correct behavior when WebSocket closes.	2021-07-26 06:47:55 +00:00
sirkrypt0	909f347d2f	Remove tests dependency from nullreader test. Previously we had a dependency on the tests package. As the nullreader package is in the pkg directory, it should be publicly available. However, having the tests dependency could lead to a transitive dependency on an internal package if the tests package were to import one. Thus, we removed it.	2021-07-21 12:55:35 +02:00
sirkrypt0	8b26ecbe5f	Restructure project. We previously didn't really have any structure in our project apart from creating a new folder for each package in our project root. Now that we have accumulated some packages, we use the well-known Golang project layout in order to clearly communicate our intent with packages. See https://github.com/golang-standards/project-layout	2021-07-21 12:55:35 +02:00
sirkrypt0	2f1383b743	Add tests for returning mapped ports of runners	2021-07-21 08:22:10 +02:00
sirkrypt0	64764a9809	Return mapped ports when requesting runners. We now store the mapped ports returned by Nomad locally in our runner struct and return them when requesting the runner. The returned IP address is, in most Nomad setups, not reachable by external users.	2021-07-20 23:22:58 +02:00
sirkrypt0	d7c1787b57	Disable allow-failure for linting pipeline. Now that all linting issues are fixed, we disable allow-failure for the linting step to ensure that later commits adhere to the linter.	2021-07-13 08:59:25 +02:00
sirkrypt0	c7606f3d5f	Fix a lot of linting issues. After we introduced the linter, we hadn't really touched the old code. This commit fixes all linting issues that currently exist.	2021-07-13 08:59:25 +02:00
Maximilian Paß	bd7fb53385	Fix bug that the count of the default task group is set to the prewarming pool size	2021-07-07 09:21:57 +02:00
Maximilian Paß	68eacae7fe	Fix bug that config task group is not added to the template job (and the faulty tests)	2021-07-06 10:09:36 +02:00
Maximilian Paß	bbc1ce12ca	Delete idle runners when the environment is scaled down	2021-07-02 13:00:13 +02:00
Maximilian Paß	66d04fde2a	Remove unused function ScaleAllEnvironments	2021-07-01 09:21:09 +00:00
sirkrypt0	50a2a22b74	Only create exactly one new runner when one runner is claimed. Previously we would create as many runners as needed based on the local idleRunnersCount and the desiredIdleRunnersCount. This is problematic if two runners are claimed shortly after one another. As we only add a runner to the idleRunners list once we get the event from Nomad, the second runner claim in a short timeframe would create two new runners. This has been fixed now.	2021-06-29 09:11:21 +02:00
Konrad Hanff	e0e254a6af	Persist runner timeout in metadata. To be able to restore the runner timeouts even after a Poseidon restart, the timeout is stored in the Nomad metadata. After a restart the timeout starts over, but at least the runner will still be returned.	2021-06-23 11:07:17 +02:00
Konrad Hanff	ae08e37106	Add end to end test for inactivity timeout	2021-06-23 11:04:19 +02:00
Konrad Hanff	6c887de6f1	Move NullReader from nomad to util package.	2021-06-23 11:04:19 +02:00
Konrad Hanff	14f8a096eb	Add unit and integration tests for runner inactivity timeout.	2021-06-23 11:04:19 +02:00
Konrad Hanff	4b2cae0bd1	Add inactivity timeout for runners. By removing runners after a specified timeout, they no longer stay around indefinitely and block Nomad's capacity. The timeout can be set individually per runner when requesting the provide route. If it is set to 0, the runner is never removed automatically. The timeout is reset when activity is detected. Currently that is when something gets executed or the filesystem gets modified.	2021-06-23 11:04:18 +02:00
Konrad Hanff	c7ed54942d	Move ChannelReceivesSomething to tests package. ChannelReceivesSomething (formerly WaitForChannel) originally was located in the helpers package. This move was done to remove a cyclic dependency with the nomad package.	2021-06-21 10:54:07 +02:00
Konrad Hanff	92f1af83ae	Add tests for codeOceanToRaw and null readers. The tests ensure the readers do not return when there is no data available.	2021-06-21 08:20:04 +00:00
Konrad Hanff	17c1e379c2	Fix busy waiting on stdin. When running an execution, Nomad continuously reads from the stdin reader. Because the readers we implemented (codeOceanToRawReader and nullReader) return zero if there is no input available, this leads to busy waiting and a high CPU load on Poseidon. By waiting indefinitely in case of the nullReader and for at least one byte in case of the codeOceanToRawReader before returning, we prevent this issue.	2021-06-21 08:20:04 +00:00
Tobias Kantusch	0b9e5a5ba5	Update README * Update port to 7200 * Update linter instructions * Update Docker instructions	2021-06-18 07:31:24 +00:00
sirkrypt0	f5f7521a18	Fix environment recovery. As the environment is no longer stored in the meta information, Poseidon wasn't able to recover environments. It expected the environment id to be found in the metadata. We now recover the environment id from the job id.	2021-06-18 08:39:54 +02:00
Maximilian Paß	2e4a975588	Implement even more merge request comments	2021-06-15 12:05:51 +02:00
sirkrypt0	ff582805b4	Move Nomad job creation to Nomad package. Previously, low-level Nomad job creation was done in the environment manager. It used many functions of the nomad package, so we felt this logic belongs in the nomad package.	2021-06-15 11:38:02 +02:00
Maximilian Paß	87f823756b	Implement merge request comments	2021-06-15 11:37:47 +02:00
Maximilian Paß	25d78df557	Restore existing jobs and fix rebase (7c99eff3) issues	2021-06-15 11:37:35 +02:00
sirkrypt0	0020590c96	Update all runners when updating environment. Previously only the default job would be updated to the newest specs. Now all Nomad jobs that belong to the given environment are updated accordingly.	2021-06-15 11:35:59 +02:00
sirkrypt0	c7d59810e5	Use Nomad jobs as runners instead of allocations. As we can't control which allocations are destroyed when downscaling a job, we decided to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment, a corresponding job is created in Nomad. We create a default job that serves as a template for the runners. Using this, already existing execution environments can easily be restored, once Poseidon is restarted.	2021-06-15 11:35:54 +02:00
sirkrypt0	8de489929e	Remove stderr fifo after interactive execution with stderr finished. Previously the stderr fifo would not be removed, leaving unwanted artifacts from the execution behind. We now remove the stderr fifo after the command finished.	2021-06-14 15:04:09 +02:00
sirkrypt0	d3300e839e	Add unit tests for separate stdout and stderr on execution	2021-06-11 08:47:25 +00:00
sirkrypt0	f122dd9376	Split stdout and stderr on interactive execution. When running a command interactively, we previously would get stdout and stderr both served on stdout by Nomad. To circumvent this issue, we now start a separate execution inside the allocation to split both streams.	2021-06-11 08:47:25 +00:00
sirkrypt0	19cd4b840e	Update Nomad to 1.1.1 and other project dependencies	2021-06-10 18:53:48 +02:00
Jan-Eric Hellenberg	61bc7d0143	Add unit tests for provide runner route	2021-06-10 06:11:31 +00:00
sirkrypt0	7bbd7b7bae	Fix task group name. Previously when creating a job, Poseidon would still use the old task group name format instead of default-group as expected.	2021-06-09 18:22:28 +02:00
Maximilian Paß	32fe47d669	Implement linting issues and merge request comments	2021-06-09 08:35:20 +00:00
Maximilian Paß	4b5f0a3eb6	Add tests for runner manager updating runners	2021-06-09 08:35:20 +00:00
Maximilian Paß	d0a2a1d96c	Add tests for receiving allocation updates from Nomad	2021-06-09 08:35:20 +00:00
sirkrypt0	3f572261c2	Add updating cached allocations	2021-06-09 08:35:20 +00:00
sirkrypt0	66821dbfc8	Add query options to Nomad API queries to make sure we query the correct namespace	2021-06-09 08:35:20 +00:00
Jan-Eric Hellenberg	ce2b82d43d	Copy files with relative path to active workspace directory of container	2021-06-09 10:24:29 +02:00
sirkrypt0	b32e9c2a67	Remove off by one with needed runners. Earlier we used a channel to store the runners. To make the environment refresh block, we scheduled an additional runner as the buffered channel was then filled up. As we don't use the channel anymore, we don't need the additional runner anymore. Furthermore this leads to weird race conditions in tests when comparing the runner count to the desired one.	2021-06-03 13:21:49 +00:00
sirkrypt0	3d7b7e1761	Set default minimum count in scaling policy to 0. Previously the minimum was not set, thus defaulting to the value of count. This did not allow creating execution environments with a prewarmingPoolSize of 0, as the task group count must not be less than the minimum count in the scaling policy.	2021-06-03 13:21:49 +00:00
sirkrypt0	630a006258	Use more uints. Previously we accepted int values although only uint values made sense. We adjusted this to accept uints where appropriate.	2021-06-03 13:21:49 +00:00
sirkrypt0	1c4daa99a9	Add e2e tests for exec env createOrUpdate. This also adds a Nomad client to the e2e_tests that can be used to query Nomad and validate that certain actions happened in Nomad correctly.	2021-06-03 13:21:49 +00:00
sirkrypt0	1be744f2d4	Explicitly set task group's network when networkAccess is false. Previously, updating an environment from with to without network access would leave the network resources in the task group as they were before.	2021-06-03 13:21:49 +00:00
sirkrypt0	b990df7b9d	Add route to create or update execution environments	2021-06-03 13:21:49 +00:00
sirkrypt0	3d395f0a38	Set network_mode to bridge to overwrite old setting. Previously, the network_mode was only set when creating a job with network_access = false. This results in Nomad leaving this setting as is when updating the job to use network. Thus a job would have had the mapped ports in the Nomad UI, but the Docker network_mode would still be 'none'.	2021-06-03 13:21:49 +00:00

165 Commits