848 Commits

Author SHA1 Message Date
8820938624 Increase severity of two log statements. 2023-09-05 15:15:39 +02:00
390d02055b Bump com.amazonaws:aws-lambda-java-core in /deploy/aws/java11Exec
Bumps [com.amazonaws:aws-lambda-java-core](https://github.com/aws/aws-lambda-java-libs) from 1.2.2 to 1.2.3.
- [Commits](https://github.com/aws/aws-lambda-java-libs/commits)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-lambda-java-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:22:19 +00:00
847e11387a Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.519 to 1.12.542.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.519...1.12.542)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:13:52 +00:00
188d012bc4 Fix Memory Leak caused by the merge_context.
The now-removed statement that sent an empty struct into the channel blocked the goroutine until someone received from the channel returned by Done. This led to a goroutine leak, as one does not necessarily have to call the Done function of a context.

We fix this issue by removing the send. It was unnecessary anyway, as receiving from a closed channel always returns the zero value of the channel's element type.
2023-08-26 22:51:22 +02:00
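
The closed-channel behavior this fix relies on can be illustrated with a minimal sketch. The `doneSignal` type and its methods are hypothetical names for illustration and are not taken from the repository; the sketch only shows that `close` never blocks and that a receive on a closed channel returns the zero value right away.

```go
package main

import "fmt"

// doneSignal is a hypothetical stand-in for the Done channel of a merged
// context; it is not taken from the repository.
type doneSignal struct {
	done chan struct{}
}

func newDoneSignal() *doneSignal {
	return &doneSignal{done: make(chan struct{})}
}

// cancel only closes the channel. Unlike a send on an unbuffered channel,
// close never blocks, even if nobody ever receives from Done.
func (d *doneSignal) cancel() {
	close(d.done)
}

// Done returns the channel that is closed on cancel.
func (d *doneSignal) Done() <-chan struct{} {
	return d.done
}

// receiveAfterCancel cancels first and receives afterwards. The receive
// returns immediately with the zero value and ok == false.
func receiveAfterCancel() (value struct{}, ok bool) {
	d := newDoneSignal()
	d.cancel()
	value, ok = <-d.Done()
	return value, ok
}

func main() {
	value, ok := receiveAfterCancel()
	fmt.Printf("value=%v ok=%v\n", value, ok)
}
```

Because no value is ever sent, the canceling side cannot get stuck waiting for a receiver, which is the property the commit message describes.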
b06ff4088f Bump github.com/google/uuid from 1.3.0 to 1.3.1
Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/google/uuid/releases)
- [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/uuid/compare/v1.3.0...v1.3.1)

---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 06:35:59 +00:00
c0a3fb12c3 Fix UpdateFileSystem Context
to be done when either the runner is destroyed (case ignored before) or the request is interrupted.
2023-08-21 22:49:09 +02:00
09604997a7 Implement MergeContext
that has multiple contexts as parent and chooses the earliest deadline.
2023-08-21 22:49:09 +02:00
306512bf9c Fix Context Values are not logged.
Only the Sentry hook uses the values of the passed context. Therefore, the values disappeared from our log statements when we moved them from an extra `WithField` call into the context.
We fix this behavior by introducing a Logrus Hook that copies a fixed set of context values to the logging data.
2023-08-21 22:40:37 +02:00
a7d27e8f65 Add missing error log statements.
When "markRunnerAsUsed" failed, we silently ignored it. Only when returning the runner failed as well did we report the error.

When a Runner is destroyed, we are only notified that Nomad removed the allocation, but cannot tell about the reason.

For "the execution did not stop after SIGQUIT" we did not log the corresponding runner id.
2023-08-21 22:40:37 +02:00
13cd19ed58 Refactor Nomad Event Stream log message. 2023-08-18 09:28:23 +02:00
13a9da95e5 Introduce a context for RetryExponential
as a second criterion (next to the maximum number of attempts) for canceling the retrying. This is required because, with the previous commit, we started to retry the Nomad environment recovery. This always fails for unit tests (as they are not connected to a Nomad cluster). Before, we ignored that single error, but the retrying leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment got deleted.
2023-08-18 09:28:23 +02:00
73759f8a3c Retry Environment Recovery 2023-08-18 09:28:23 +02:00
89c18ad45c Refactor to WithoutCancel context.
Go 1.21 introduced the WithoutCancel context. This way, a new context keeps the values of its parent without being canceled together with it. This behavior suits two occurrences where we previously had to copy one required value explicitly instead of implicitly keeping all values.
2023-08-16 15:13:05 +02:00
2f43bced08 Update Go to 1.21 2023-08-16 15:13:05 +02:00
90092c48c1 Fix incomplete debug message
that is created by sending SIGQUIT to the bash process
by not processing output after the client disconnected or after we have sent the SIGQUIT.
2023-08-14 11:37:51 +02:00
0fd6e42487 Add regression e2e test for incomplete debug message.
See #325.
2023-08-14 11:37:51 +02:00
4d661138e9 Revert "Insert debug message into execution tracing"
This reverts commit 72d926ef6c5e9f8ddd0da39dbd1492dad3621c15.
2023-08-14 11:37:51 +02:00
ed1b83d13c Bump github.com/getsentry/sentry-go from 0.22.0 to 0.23.0
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.22.0 to 0.23.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-02 15:12:39 +02:00
0078b4cfd8 Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.508 to 1.12.519.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.508...1.12.519)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-01 04:04:22 +00:00
23a7f06bee Add more implicit dependencies to go.sum
This will allow a clean build that is not marked as modified.
2023-07-25 22:54:17 +02:00
4cc8ab422c Update Nomad version for GitHub actions 2023-07-25 22:08:08 +02:00
6bfe3d7517 Update Dependencies 2023-07-25 22:08:00 +02:00
731b60acd6 Remove Sentry Exceptions
as a workaround for having a usable title for the issue groups (not the error type).
2023-07-25 21:07:02 +01:00
75f2f9b290 Add Sentry Stack Traces
and exceptions for logs containing errors.
2023-07-25 21:07:02 +01:00
eb818f92f7 Refactor Runner Destroy Reason Masking
and ignore expected reasons such as when the runner got destroyed by an API request.
2023-07-24 11:48:14 +01:00
102b3f0701 Bump github.com/hashicorp/nomad from 1.6.0 to 1.6.1
Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/hashicorp/nomad/releases)
- [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/nomad/compare/v1.6.0...v1.6.1)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/nomad
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-24 04:01:11 +00:00
8ef5f4e7c5 Fix OOM Kill race condition
due to the Nomad request exiting before the allocation is stopped. We catch this behavior by introducing a time period for the allocation being stopped iff the exit code is 128.
2023-07-21 15:30:21 +02:00
6a1677dea0 Introduce reason for destroying runner
in order to return a specific error for OOM Killed Executions.
2023-07-21 15:30:21 +02:00
b3fedf274c Handle Runner Timeout
Before, Nomad executions often got stopped because the runner was deleted.
With the previous commit, we cover the exception to this behaviour by stopping the execution Poseidon-side.
These different approaches lead to different context error messages.
In this commit, we move the check of the passed timeout to respond with the corresponding client message again.
2023-07-21 15:30:21 +02:00
bfb5977d24 Destroy runner on allocation stopped
Destroying the runner when Nomad informs us about its allocation being stopped fixes the error of executions running into their timeout even if the allocation was stopped long ago.
2023-07-21 15:30:21 +02:00
40a5f2eca6 Insert debug message into execution tracing
to verify that the date command is sometimes returning an empty string with exit code 5.
2023-07-21 15:05:53 +02:00
1663008eb6 Update Nomad and CNI version for GitHub actions 2023-07-19 11:59:57 +00:00
5fe6ad29af Bump github.com/hashicorp/nomad from 1.5.6 to 1.6.0
Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.5.6 to 1.6.0.
- [Release notes](https://github.com/hashicorp/nomad/releases)
- [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/nomad/compare/v1.5.6...v1.6.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/nomad
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-19 11:59:57 +00:00
d75073e9de Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec
Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.500 to 1.12.508.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.500...1.12.508)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-19 11:11:29 +00:00
d64d8995bd Refactor monitoring of runner and environment id. 2023-07-15 21:46:56 +02:00
ee26cf13e5 Sentry: Make runner and environment searchable
by converting them into Sentry Tags.

Also, replace the unstructured Extra attribute by using a Sentry Context.
2023-07-15 21:46:56 +02:00
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
0bfef5e105 Degrade InfluxDB Retry Write log. 2023-07-14 18:54:57 +02:00
9d13613f37 Bump maven-shade-plugin from 3.4.1 to 3.5.0 in /deploy/aws/java11Exec
Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.4.1 to 3.5.0.
- [Release notes](https://github.com/apache/maven-shade-plugin/releases)
- [Commits](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.4.1...maven-shade-plugin-3.5.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-shade-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-02 18:53:45 +02:00
94dfb1fa62 Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec
Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.479 to 1.12.500.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.479...1.12.500)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-07-01 03:19:34 +00:00
21d7388c31 Remove Consul Dependency
It seems like Consul is not required any longer.
2023-06-23 22:38:24 +02:00
01dae3150e Update Dependencies 2023-06-23 22:33:15 +02:00
322d06540f Remove CodeClimate 2023-06-23 22:28:31 +02:00
3a5ab3aaea Bump github.com/getsentry/sentry-go from 0.21.0 to 0.22.0
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.21.0 to 0.22.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.21.0...v0.22.0)

---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-19 09:15:25 +02:00
527aaf713f Fix decreased prewarming pool due to inactivity timer.
When allocations fail and restart, they are added again to the idle runners. The bug fixed with this commit is that the inactivity timer was not stopped at the restart. This led to the idle runner being removed when the timer expired.
2023-06-16 17:27:45 +01:00
f031219cb8 Fix Nomad event race condition
that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node.
It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level.
To fix it we added communication from the upper level that the stop of the job was expected.
2023-06-13 14:20:20 +02:00
b620d0fad7 Introduce Allocation State Tracking
in order to break down the current state and evaluate if it is invalid.
2023-06-13 14:20:20 +02:00
bcab46d746 Allow unlimited Nomad reschedules
With this measure, we want to avoid template jobs being removed on the second rescheduling.
2023-06-13 14:20:20 +02:00
1b3f505075 Bump github.com/sirupsen/logrus from 1.9.2 to 1.9.3
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.2 to 1.9.3.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.9.2...v1.9.3)

---
updated-dependencies:
- dependency-name: github.com/sirupsen/logrus
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-05 04:08:22 +00:00
5b64725faa Fix golangci-lint errors
that appeared due to the new version v1.53.1.
2023-06-04 11:54:42 +01:00