poseidon/docs/nomad_usage.md
Maximilian Paß 840bfed1b9 Update docs/nomad_usage.md
Co-authored-by: Sebastian Serth <MrSerth@users.noreply.github.com>
2023-01-07 01:00:29 +01:00

Nomad Usage

Poseidon is an abstraction of the functionality provided by Nomad. In the following we will look at how Poseidon uses Nomad's functionality.

Nomad is structured in different levels of abstraction. Jobs are collected in namespaces. Each Job can contain several Task Groups. Each Task Group can contain several Tasks. Finally, Allocations map Task Groups to Nomad Clients. For more insights take a look at the official description. In our case, a Task is executed in a Docker container.

[Figure: Overview of the Poseidon-Nomad mapping]

Execution environments as template Jobs

Execution Environments are mapped to Nomad Jobs. In the following, we will call these Jobs Template Jobs. The naming schema for Template Jobs is "template-<execution-environment-id>".
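The naming schema can be captured in two small helpers. This is a minimal sketch, not Poseidon's actual code: the function names are hypothetical, and it assumes that execution environment IDs are integers.

```python
TEMPLATE_JOB_PREFIX = "template-"  # prefix taken from the naming schema above


def template_job_id(environment_id: int) -> str:
    """Build the Template Job name for an execution environment."""
    return f"{TEMPLATE_JOB_PREFIX}{environment_id}"


def environment_id_from_template_job(job_id: str) -> int:
    """Recover the environment ID; raises ValueError for non-template Jobs."""
    if not job_id.startswith(TEMPLATE_JOB_PREFIX):
        raise ValueError(f"not a template Job: {job_id!r}")
    # Assumption: the part after the prefix is an integer environment ID.
    return int(job_id[len(TEMPLATE_JOB_PREFIX):])
```

Keeping both directions in one place means the prefix is defined only once.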

The figure above shows the structure in Nomad. Each template Job contains a "config" Task Group including a "config" Task. This Task does not perform any computations but is used to store environment-specific attributes, such as the prewarming pool size. In addition, the template Job contains a "default-group" Task Group with a "default-task" Task. This Task executes the command sleep infinity so that it remains active and the container is ready for dynamic executions.

As shown in the figure, the "config" Task Group has no Allocation, while the "default-group" Task Group has one. This is because the "config" Task Group only stores information but does not execute anything. The "default-group" Task Group is where the user's code submissions are executed. Therefore, Nomad creates an Allocation that assigns this Task Group to a Nomad Client for execution.
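The structure described above can be sketched as a plain dictionary that loosely follows Nomad's JSON Job format (the keys "ID", "TaskGroups", "Tasks", "Count", "Meta", "Driver" and "Config"). Several details are assumptions made for illustration only: the function name, the "prewarmingPoolSize" metadata key, the use of a Count of 0 to keep the "config" Task Group without an Allocation, and the Docker image parameter.

```python
def build_template_job(environment_id: int, prewarming_pool_size: int, image: str) -> dict:
    """Return a dictionary resembling a template Job for one environment."""
    return {
        "ID": f"template-{environment_id}",
        "TaskGroups": [
            {
                "Name": "config",
                # Assumption: zero instances, so Nomad schedules no Allocation.
                "Count": 0,
                "Tasks": [
                    {
                        "Name": "config",
                        # Assumption: attributes are stored as string metadata.
                        "Meta": {"prewarmingPoolSize": str(prewarming_pool_size)},
                    }
                ],
            },
            {
                "Name": "default-group",
                "Count": 1,  # one instance, which receives an Allocation
                "Tasks": [
                    {
                        "Name": "default-task",
                        "Driver": "docker",
                        # Keeps the container alive for later executions.
                        "Config": {
                            "image": image,
                            "command": "sleep",
                            "args": ["infinity"],
                        },
                    }
                ],
            },
        ],
    }
```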

Runners as Nomad Jobs

As an abstraction of the execution engine, we use the term Runner to describe Docker containers (currently used) or microVMs. If a user requests a new runner, Poseidon duplicates the template Job of the corresponding environment.

When a user then executes their code, Poseidon copies the code into the container and executes it.
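Duplicating a template Job can be illustrated with a deep copy that receives a new Job ID. This is only a sketch: the function name and the "<environment-id>-<uuid>" naming of runner Jobs are hypothetical and not taken from this document.

```python
import copy
import uuid


def create_runner_job(template_job: dict) -> dict:
    """Duplicate a template Job dictionary into a new runner Job."""
    # Deep copy so that later changes to the runner never touch the template.
    runner_job = copy.deepcopy(template_job)
    environment_id = template_job["ID"][len("template-"):]
    # Hypothetical naming: environment ID plus a random suffix.
    runner_job["ID"] = f"{environment_id}-{uuid.uuid4()}"
    return runner_job
```

The deep copy matters because the Task Groups are nested lists and dictionaries; a shallow copy would share them between the template and every runner.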

Prewarming

To reduce the response time when claiming a runner, Poseidon creates a pool of runners that have been started in advance. When a user requests a runner, a runner from this pool is handed out. In the background, a new runner is created to replenish the pool. Because this happens in the background, the user does not have to wait for a runner to start. The implementation of this concept can be seen in the Runner Manager.
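The claim-and-replenish idea can be sketched with a small thread-based pool. This is an illustration of the concept only and not the Runner Manager's implementation; the class, its methods and the start_runner callback are hypothetical names.

```python
import threading
from collections import deque


class PrewarmingPool:
    """Keeps idle runners ready and refills itself after each claim."""

    def __init__(self, size, start_runner):
        self._start_runner = start_runner  # callable that starts one runner
        self._lock = threading.Lock()
        self._idle = deque(start_runner() for _ in range(size))
        self._pending = []  # background threads that are still replenishing

    def claim(self):
        """Hand out an idle runner immediately and refill in the background."""
        with self._lock:
            if not self._idle:
                raise LookupError("no idle runner available")
            runner = self._idle.popleft()
            worker = threading.Thread(target=self._replenish)
            self._pending.append(worker)
        worker.start()
        return runner

    def _replenish(self):
        runner = self._start_runner()  # the slow part, off the request path
        with self._lock:
            self._idle.append(runner)

    def wait_until_replenished(self):
        """Block until all background refills have finished."""
        with self._lock:
            pending, self._pending = self._pending, []
        for worker in pending:
            worker.join()

    def idle_count(self):
        with self._lock:
            return len(self._idle)
```

The caller of claim() only pays for removing an element from the queue, while the cost of starting a runner is moved to the background thread.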

Lifecycle

The prewarming pool is initialized when a new environment is created. It is filled with as many runners as the requested prewarming pool size.

Every change to the environment (resource constraints, prewarming pool size, network access) leads to the destruction of the environment, including all used and idle runners (the prewarming pool). After that, the environment and its prewarming pool are re-created.
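The destroy-and-recreate behavior on updates might look roughly as follows. This is a simplified sketch under stated assumptions: the Environment class, both functions and the start_runner and destroy_runner callbacks are hypothetical, and only the prewarming pool size is modeled as a changeable attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    environment_id: int
    prewarming_pool_size: int
    idle_runners: list = field(default_factory=list)  # the prewarming pool
    used_runners: list = field(default_factory=list)  # claimed by users


def create_environment(environment_id, prewarming_pool_size, start_runner):
    """Create an environment and fill its prewarming pool."""
    environment = Environment(environment_id, prewarming_pool_size)
    environment.idle_runners = [
        start_runner(environment_id) for _ in range(prewarming_pool_size)
    ]
    return environment


def update_environment(old, prewarming_pool_size, start_runner, destroy_runner):
    """Apply a change by destroying all runners and re-creating the pool."""
    for runner in old.idle_runners + old.used_runners:
        destroy_runner(runner)
    return create_environment(old.environment_id, prewarming_pool_size, start_runner)
```

Note that used runners are destroyed as well, so an update also ends executions that are currently in progress.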

Two other causes lead to the destruction of the prewarming pool. The first is the explicit deletion of the environment using the API route. The second occurs when the corresponding template Job for a given environment is no longer available on Nomad but a force update is requested using the GET /execution-environments/{id}?fetch=true route. The latter case should not occur in normal operation, but could arise from manually deleting the template Job, scheduling issues on Nomad, or other unforeseeable edge cases.
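A force update request for the route named above could be prepared with Python's standard library as sketched here. The function name is hypothetical, and base_url is an assumption that must include the host and any path prefix of the Poseidon deployment; the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_force_update_request(base_url: str, environment_id: int) -> Request:
    """Prepare GET /execution-environments/{id}?fetch=true without sending it."""
    query = urlencode({"fetch": "true"})
    url = f"{base_url}/execution-environments/{environment_id}?{query}"
    return Request(url, method="GET")
```

Passing the returned object to urllib.request.urlopen would send the request.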