Commit Graph

741 Commits

Author SHA1 Message Date
f259d65aa4 Add unit tests for runner recovery. 2023-10-31 15:49:56 +01:00
160a097d07 Fix flaky test TestDestroysRunnerAfterTimeoutAndSignal. 2023-10-31 15:49:56 +01:00
d0dd5c08cb Remove usage of context.DeadlineExceeded
for internal decisions as this error is strongly used by other packages. By checking such wrapped errors the internal decision can be influenced accidentally.
In this case the retry mechanism checked if the error is context.DeadlineExceeded and assumed it would be created by the internal context. This assumption was wrong.
2023-10-31 15:49:56 +01:00
6b69a2d732 Refactor Nomad Recovery
from an approach that loaded the runners only once at the startup
to a method that will be repeated i.e. if the Nomad Event Stream connection interrupts.
2023-10-31 15:49:56 +01:00
b2898f9183 Fix List of the Environments with fetch.
Before the List function dropped all idleRunners of all environments when fetch was set.

Additionally, the replaced environment was not destroyed properly so that a goroutine for it and for all its idle runners remained running.
2023-10-31 15:49:56 +01:00
809ca0321d Bump github.com/hashicorp/nomad from 1.6.2 to 1.6.3
Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.6.2 to 1.6.3.
- [Release notes](https://github.com/hashicorp/nomad/releases)
- [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/nomad/compare/v1.6.2...v1.6.3)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/nomad
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-31 10:33:28 +01:00
2713e8672c Add error for empty list file system execution.
Normally, the result of executing the `lsCommand` should never be empty. However, we have observed that CodeOcean sometimes receives an empty JSON result if the runner is being deleted while the list file system request is processed. Therefore, we add a check if something has been written to CodeOcean and otherwise report an error.
2023-10-29 15:23:40 +01:00
e4769ee1b3 Bump github.com/google/uuid from 1.3.1 to 1.4.0
Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.1 to 1.4.0.
- [Release notes](https://github.com/google/uuid/releases)
- [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/uuid/compare/v1.3.1...v1.4.0)

---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-27 09:45:21 +02:00
a024183802 Bump google.golang.org/grpc from 1.58.2 to 1.58.3
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.58.2 to 1.58.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.58.2...v1.58.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-26 15:15:23 +02:00
14b012486d Formalize Memory Monitoring
by extracting the interval and threshold into configuration options.

Related to f670b07e.
2023-10-12 16:16:46 +02:00
ca42369057 Bump golang.org/x/net from 0.15.0 to 0.17.0
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.15.0 to 0.17.0.
- [Commits](https://github.com/golang/net/compare/v0.15.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-12 08:26:38 +02:00
3d87cfceeb Bump github.com/getsentry/sentry-go from 0.24.1 to 0.25.0
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.24.1 to 0.25.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.24.1...v0.25.0)

---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-06 13:26:43 +02:00
bd6ca6f88f Bump org.apache.maven.plugins:maven-shade-plugin
Bumps [org.apache.maven.plugins:maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.5.0 to 3.5.1.
- [Release notes](https://github.com/apache/maven-shade-plugin/releases)
- [Commits](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.5.0...maven-shade-plugin-3.5.1)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-shade-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-01 13:37:24 +00:00
da63a8d119 Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.555 to 1.12.560.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.555...1.12.560)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-10-01 13:27:54 +00:00
89a7ca8d13 Update Dependencies 2023-09-23 16:26:50 +02:00
83bc0115e4 Bump docker/metadata-action from 3 to 5
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 3 to 5.
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Upgrade guide](https://github.com/docker/metadata-action/blob/master/UPGRADE.md)
- [Commits](https://github.com/docker/metadata-action/compare/v3...v5)

---
updated-dependencies:
- dependency-name: docker/metadata-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 10:16:29 +02:00
110e2318e1 Bump tj-actions/branch-names from 6 to 7
Bumps [tj-actions/branch-names](https://github.com/tj-actions/branch-names) from 6 to 7.
- [Release notes](https://github.com/tj-actions/branch-names/releases)
- [Changelog](https://github.com/tj-actions/branch-names/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/branch-names/compare/v6...v7)

---
updated-dependencies:
- dependency-name: tj-actions/branch-names
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 10:16:19 +02:00
0058bb7dbf Bump docker/build-push-action from 3 to 5
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 3 to 5.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v3...v5)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 10:16:09 +02:00
cadbfcf5ca Bump docker/login-action from 2 to 3
Bumps [docker/login-action](https://github.com/docker/login-action) from 2 to 3.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 10:14:01 +02:00
4a93238a15 Bump actions/checkout from 3 to 4
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 10:13:46 +02:00
dfad36dfde Bump com.amazonaws:aws-lambda-java-events in /deploy/aws/java11Exec
Bumps [com.amazonaws:aws-lambda-java-events](https://github.com/aws/aws-lambda-java-libs) from 3.11.2 to 3.11.3.
- [Commits](https://github.com/aws/aws-lambda-java-libs/commits)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-lambda-java-events
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 08:01:20 +00:00
2aaa4ab800 Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.542 to 1.12.555.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.542...1.12.555)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-21 09:53:10 +02:00
1cfcd09454 Add Dependabot for GitHub actions 2023-09-21 09:44:46 +02:00
f670b07ea7 Introduce debug memory monitoring
that alerts and prints additional debug information in case that Poseidon exceeds 1 GB of memory usage.
2023-09-19 22:31:06 +02:00
c83bdbf083 Fix WebSocket JSON schema
by changing the required attribute to be an array as described in the official documentation.
2023-09-19 11:15:58 +02:00
3d252492fe Fix rescheduled used runners being removed.
As they are already rescheduled and therefore recreated they do not need to be removed, but can be handled as a new runner.
2023-09-18 01:06:35 +02:00
6dc83ca7b5 Add regression test for rescheduled used runners being removed.
As they are already rescheduled and therefore recreated they do not need to be removed, but can be handled as a new runner.
2023-09-18 01:06:35 +02:00
90d591d4ec Change default behavior in Nomad Event Handling
to not propagate that pending runners are being stopped.
2023-09-18 00:54:26 +02:00
2eb15c8d93 Fix loosing of rescheduled runners
that are rescheduled while the previous allocation was still pending.
We fix this by removing the race condition handling that should prevent Poseidon from throwing warnings of unexpected allocation stopping.
2023-09-18 00:54:26 +02:00
788cb0f660 Add regression test for the recent lost runners. 2023-09-18 00:54:26 +02:00
39fc0f9d9d Update Dependencies 2023-09-16 19:52:52 +02:00
68cd8f43b4 Defuse data race condition of TestWithSeparateStderrReturnsCommandError. 2023-09-11 13:44:29 +02:00
6159f2a045 Fix Goroutine Leak of Nomad execute command
that was triggered when [the execution timeout got exceeded, the runner got destroyed, or the WebSocket connection to CodeOcean closed] and the Allocation did not react to the SIGQUIT within the grace period.
2023-09-11 13:44:29 +02:00
59da36303c Fix Goroutine Leak of Environment Get
that was caused by creating an intermediate environment `fetchedEnvironment` when fetching the environments but not removing it in case that we just copy its configuration to the existing environment.
2023-09-11 13:44:29 +02:00
460b8b2065 Refactor TestReturnReturnsErrorWhenApiCallFailed
to handle the retry mechanism.
2023-09-11 13:44:29 +02:00
3abd4d9a3d Refactor all tests to use the MemoryLeakTestSuite. 2023-09-11 13:44:29 +02:00
e3161637a9 Extract the WatchEventStream retry mechanism
into the utils including all other retry mechanisms.

With this change we fix that the WatchEventStream goroutine does not stop directly when the context is done (but previously only one second after).
2023-09-11 13:44:29 +02:00
0d6b4f660c Refactor NewAbstractManager
to require a context used for the monitoring.
2023-09-11 13:44:29 +02:00
b28b87d56f Refactor periodicallySendMonitoringData
in order to return directly when the context is done and not just at the next iteration.
2023-09-11 13:44:29 +02:00
a01bd0fa7e Provide Memory Leak Test Suite
by adding an assertion about the number of Goroutines to the unit tests.
2023-09-11 13:44:29 +02:00
b708dddd23 Add Nomad Manager test case
that ensures that `onAllocationStopped` returns true when the runner was deleted before by the inactivity timer.
This feature is required for handling a race condition with the event handling of a rescheduled allocation.
2023-09-05 15:15:39 +02:00
354c16cc37 Fix missing rescheduled idle runners.
In today's unattended upgrade, we have seen how the prewarming pool size dropped to (near) zero. This was based on lost Nomad allocations. The allocations got rescheduled, but not added again to Poseidon.

The reason for this is a miscommunication between the Event Handling and the Nomad Manager. `removedByPoseidon` was true even if the runner was not removed by the manager, but an idle runner.
2023-09-05 15:15:39 +02:00
67297ec5a2 Add regression test for rescheduled idle runner. 2023-09-05 15:15:39 +02:00
8820938624 Increase severity of two log statements. 2023-09-05 15:15:39 +02:00
390d02055b Bump com.amazonaws:aws-lambda-java-core in /deploy/aws/java11Exec
Bumps [com.amazonaws:aws-lambda-java-core](https://github.com/aws/aws-lambda-java-libs) from 1.2.2 to 1.2.3.
- [Commits](https://github.com/aws/aws-lambda-java-libs/commits)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-lambda-java-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:22:19 +00:00
847e11387a Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.519 to 1.12.542.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.519...1.12.542)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:13:52 +00:00
188d012bc4 Fix Memory Leak caused by the merge_context.
The now removed statement of sending an empty struct into the channel blocked the goroutine until the channel of Done got listened for. This led to a goroutine leak as one does not necessarily has to call the Done function of a context.

We fix this issue by removing this value. It was unnecessary anyway as a closed channel always returns the null-value of the returned type.
2023-08-26 22:51:22 +02:00
b06ff4088f Bump github.com/google/uuid from 1.3.0 to 1.3.1
Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/google/uuid/releases)
- [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/uuid/compare/v1.3.0...v1.3.1)

---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 06:35:59 +00:00
c0a3fb12c3 Fix UpdateFileSystem Context
to be done when either the runner is destroyed (case ignored before) or the request is interrupted.
2023-08-21 22:49:09 +02:00
09604997a7 Implement MergeContext
that has multiple contexts as parent and chooses the earliest deadline.
2023-08-21 22:49:09 +02:00