poseidon

Author	SHA1	Message	Date
Jan-Eric Hellenberg	5c9f975285	Update api.tpl.nomad to allow configuring the Nomad ACL token for Poseidon	2021-07-29 12:49:17 +00:00
sirkrypt0	67ebdbd650	Add option to configure template job HCL file Previously, the template job HCL file was hardcoded using go:embed in the binary. However, this did not allow users running Poseidon to change its content. Now, users can change the content of the template job HCL file using the configuration option.	2021-07-29 11:54:36 +00:00
Maximilian Paß	12da813081	Describe how Poseidon abstracts from Nomad	2021-07-29 11:32:52 +00:00
sirkrypt0	81eccbdf9c	Remove custom deployment watcher script As of version 1.1.2 of Nomad, the CLI monitors job deployments by default until they are finished. Thus our custom job deployment watcher script is not required anymore.	2021-07-29 09:57:04 +00:00
sirkrypt0	3564cf767e	Update Nomad dependencies to 1.1.2	2021-07-29 09:57:04 +00:00
Jan-Eric Hellenberg	210a048b5e	Update api.tpl.nomad to allow configuring TLS to Nomad through GitLab	2021-07-29 09:43:21 +00:00
Jan-Eric Hellenberg	01d16600b0	Document activating TLS between Poseidon and Nomad	2021-07-29 09:43:21 +00:00
Jan-Eric Hellenberg	6a60b6cd89	Add config option to enable (m)TLS between Poseidon and Nomad	2021-07-29 09:43:21 +00:00
sirkrypt0	e2d71a11ad	Avoid concurrent writes to the websocket connection Previously, the server sometimes crashed due to concurrent writes to the websocket connection. Now, we ensure that only one concurrent function writes to the websocket at a time by enclosing the WriteMessage function with a mutex.	2021-07-29 09:21:15 +00:00
Konrad Hanff	6929169cb5	Add test for nullio.ReadWriter	2021-07-29 10:28:47 +02:00
Konrad Hanff	8d24bda61a	Send SIGQUIT when cancelling an execution When the context passed to Nomad Allocation Exec is cancelled, the process is not terminated. Instead, just the WebSocket connection is closed. In order to terminate long-running processes, a special character is injected into the standard input stream. This character is parsed by the tty line discipline (tty has to be true). The line discipline sends a SIGQUIT signal to the process, terminating it and producing a core dump (in a file called 'core'). The SIGQUIT signal can be caught but isn't by default, which is why the runner is destroyed if the program does not terminate during a grace period after the signal was sent.	2021-07-29 10:28:47 +02:00
sirkrypt0	91537a7364	Use test docker image in e2e tests The TestCreateOrUpdateEnvironment function would previously use the python:latest Docker image in its execution environment request. However, this led to pull rate limiting by Docker Hub in our CI.	2021-07-27 15:26:53 +00:00
sirkrypt0	fe240c82b4	Remove demo job HCL file Previously we used this file to deploy a job on Nomad that our API used for e2e tests. Now that we create the environments in the e2e tests, we don't need the demo job anymore.	2021-07-27 16:31:03 +02:00
Jan-Eric Hellenberg	f323bdf169	Add documentation on authenticating against Nomad	2021-07-27 11:35:55 +00:00
Jan-Eric Hellenberg	3aa1227db6	Use authentication token from config for communication with Nomad	2021-07-27 11:35:55 +00:00
Konrad Hanff	23b726cef9	Correct behavior when WebSocket closes.	2021-07-26 06:47:55 +00:00
sirkrypt0	909f347d2f	Remove tests dependency from nullreader test Previously we had a dependency on the tests package. As the nullreader package is in the pkg directory, it should be publicly available. However, having the tests dependency could lead to a transitive dependency on an internal package if the tests package were to import one. Thus, we removed it.	2021-07-21 12:55:35 +02:00
sirkrypt0	8b26ecbe5f	Restructure project We previously didn't really have any structure in our project apart from creating a new folder for each package in our project root. Now that we have accumulated some packages, we use the well-known Golang project layout in order to clearly communicate our intent with packages. See https://github.com/golang-standards/project-layout	2021-07-21 12:55:35 +02:00
sirkrypt0	2f1383b743	Add tests for returning mapped ports of runners	2021-07-21 08:22:10 +02:00
sirkrypt0	64764a9809	Return mapped ports when requesting runners We now store the mapped ports returned by Nomad locally in our runner struct and return them when requesting the runner. In most Nomad setups, the returned IP address is not reachable by external users.	2021-07-20 23:22:58 +02:00
sirkrypt0	d7c1787b57	Disable allow-failure for linting pipeline Now that all linting issues are fixed, we disable allow-failure for the linting step to ensure that later commits adhere to the linter.	2021-07-13 08:59:25 +02:00
sirkrypt0	c7606f3d5f	Fix a lot of linting issues After we introduced the linter we haven't really touched the old code. This commit now fixes all linting issues that exist right now.	2021-07-13 08:59:25 +02:00
Maximilian Paß	bd7fb53385	Fix bug that the count of the default task group is set to the prewarming pool size	2021-07-07 09:21:57 +02:00
Maximilian Paß	68eacae7fe	Fix bug that config task group is not added to the template job (and the faulty tests)	2021-07-06 10:09:36 +02:00
Maximilian Paß	bbc1ce12ca	Delete idle runners when the environment is scaled down	2021-07-02 13:00:13 +02:00
Maximilian Paß	66d04fde2a	Remove unused function ScaleAllEnvironments	2021-07-01 09:21:09 +00:00
sirkrypt0	50a2a22b74	Only create exactly one new runner when one runner is claimed Previously we would create as many runners as needed based on the local idleRunnersCount and the desiredIdleRunnersCount. This is problematic if two runners are claimed shortly after one another. As we only add a runner to the idleRunners list once we get the event from Nomad, the second runner claim in a short timeframe would create two new runners. This has been fixed now.	2021-06-29 09:11:21 +02:00
Konrad Hanff	e0e254a6af	Persist runner timeout in metadata To be able to restore the runner timeouts even after a Poseidon restart, the timeout is stored in the Nomad metadata. The timeout restarts, but at least the runner will still be returned eventually.	2021-06-23 11:07:17 +02:00
Konrad Hanff	ae08e37106	Add end to end test for inactivity timeout	2021-06-23 11:04:19 +02:00
Konrad Hanff	6c887de6f1	Move NullReader from nomad to util package.	2021-06-23 11:04:19 +02:00
Konrad Hanff	14f8a096eb	Add unit and integration tests for runner inactivity timeout.	2021-06-23 11:04:19 +02:00
Konrad Hanff	4b2cae0bd1	Add inactivity timeout for runners. By removing runners after a specified timeout, they no longer stay around indefinitely and block Nomad's capacity. The timeout can be set individually per runner when requesting the provide route. If it is set to 0, the runner is never removed automatically. The timeout is reset when activity is detected. Currently that is when something gets executed or the filesystem gets modified.	2021-06-23 11:04:18 +02:00
Konrad Hanff	c7ed54942d	Move ChannelReceivesSomething to tests package. ChannelReceivesSomething (formerly WaitForChannel) was originally located in the helpers package. This move was done to remove a cyclic dependency with the nomad package.	2021-06-21 10:54:07 +02:00
Konrad Hanff	92f1af83ae	Add tests for codeOceanToRaw and null readers The tests ensure the readers do not return when there is no data available.	2021-06-21 08:20:04 +00:00
Konrad Hanff	17c1e379c2	Fix busy waiting on stdin When running an execution, Nomad continuously reads from the stdin reader. Because the readers we implemented (codeOceanToRawReader and nullReader) return zero if there is no input available, this leads to busy waiting and a high CPU load on Poseidon. By waiting indefinitely in the case of the nullReader, and for at least one byte in the case of the codeOceanToRawReader, before returning, we prevent this issue.	2021-06-21 08:20:04 +00:00
Tobias Kantusch	0b9e5a5ba5	Update README * Update port to 7200 * Update linter instructions * Update Docker instructions	2021-06-18 07:31:24 +00:00
sirkrypt0	f5f7521a18	Fix environment recovery As the environment is no longer stored in the meta information, Poseidon wasn't able to recover environments. It expected the environment id to be found in the meta data. We now recover the environment id from the job id.	2021-06-18 08:39:54 +02:00
Maximilian Paß	2e4a975588	Implement even more merge request comments	2021-06-15 12:05:51 +02:00
sirkrypt0	ff582805b4	Move Nomad job creation to Nomad package Previously, low level Nomad job creation was done in the environment manager. It used many functions of the nomad package so we felt like this logic better belongs to the nomad package.	2021-06-15 11:38:02 +02:00
Maximilian Paß	87f823756b	Implement merge request comments	2021-06-15 11:37:47 +02:00
Maximilian Paß	25d78df557	Restore existing jobs and fix rebase (7c99eff3) issues	2021-06-15 11:37:35 +02:00
sirkrypt0	0020590c96	Update all runners when updating environment Previously only the default job would be updated to the newest specs. Now all Nomad jobs that belong to the given environment are updated accordingly.	2021-06-15 11:35:59 +02:00
sirkrypt0	c7d59810e5	Use Nomad jobs as runners instead of allocations As we can't control which allocations are destroyed when downscaling a job, we decided to use Nomad jobs as our runners. Thus for each runner we prewarm for an environment, a corresponding job is created in Nomad. We create a default job that serves as a template for the runners. Using this, already existing execution environments can easily be restored, once Poseidon is restarted.	2021-06-15 11:35:54 +02:00
sirkrypt0	8de489929e	Remove stderr fifo after interactive execution with stderr finished Previously the stderr fifo would not be removed, leaving unwanted artifacts from the execution behind. We now remove the stderr fifo after the command finished.	2021-06-14 15:04:09 +02:00
sirkrypt0	d3300e839e	Add unit tests for separate stdout and stderr on execution	2021-06-11 08:47:25 +00:00
sirkrypt0	f122dd9376	Split stdout and stderr on interactive execution When running a command interactively, we previously would get stdout and stderr both served on stdout by Nomad. To circumvent this issue, we now start a separate execution inside the allocation to split both streams.	2021-06-11 08:47:25 +00:00
sirkrypt0	19cd4b840e	Update Nomad to 1.1.1 and other project dependencies	2021-06-10 18:53:48 +02:00
Jan-Eric Hellenberg	61bc7d0143	Add unit tests for provide runner route	2021-06-10 06:11:31 +00:00
sirkrypt0	7bbd7b7bae	Fix task group name Previously when creating a job, Poseidon would still use the old task group name format instead of default-group as expected.	2021-06-09 18:22:28 +02:00
Maximilian Paß	32fe47d669	Implement linting issues and merge request comments	2021-06-09 08:35:20 +00:00


177 Commits