0d6b4f660c
Refactor NewAbstractManager
...
to require a context used for the monitoring.
2023-09-11 13:44:29 +02:00
b28b87d56f
Refactor periodicallySendMonitoringData
...
in order to return directly when the context is done and not just at the next iteration.
2023-09-11 13:44:29 +02:00
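The behavior described in this commit can be sketched as follows. This is a minimal illustration, not the actual Poseidon code: `sendPeriodically` and `stopsPromptly` are hypothetical names, and the real `periodicallySendMonitoringData` may be structured differently. The point is that a `select` over the context and the ticker returns as soon as the context is done, instead of checking the context only once per iteration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sendPeriodically (hypothetical helper) calls send once per interval until ctx is done.
// The select returns as soon as the context is canceled
// instead of waiting for the current interval to elapse.
func sendPeriodically(ctx context.Context, interval time.Duration, send func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			send()
		}
	}
}

// stopsPromptly reports whether the loop returns shortly after cancellation,
// even though the next tick is an hour away.
func stopsPromptly() bool {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		sendPeriodically(ctx, time.Hour, func() {})
		close(done)
	}()
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("returned promptly:", stopsPromptly())
}
```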
a01bd0fa7e
Provide Memory Leak Test Suite
...
by adding an assertion about the number of Goroutines to the unit tests.
2023-09-11 13:44:29 +02:00
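One way to assert on the number of Goroutines is to compare `runtime.NumGoroutine()` before and after the code under test. The sketch below assumes this approach; `leakedGoroutines`, `clean`, and `leaky` are hypothetical names, and the actual test suite may check the count differently.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakedGoroutines (hypothetical helper) runs f and returns how many more
// goroutines exist afterwards than before. It polls for up to one second
// so that goroutines that are still shutting down are not counted.
func leakedGoroutines(f func()) int {
	before := runtime.NumGoroutine()
	f()
	deadline := time.Now().Add(time.Second)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	return runtime.NumGoroutine() - before
}

// clean starts a goroutine and waits until it has signaled completion.
func clean() {
	done := make(chan struct{})
	go func() { close(done) }()
	<-done
}

// leaky starts a goroutine that blocks forever.
func leaky() {
	go func() { select {} }()
}

func main() {
	fmt.Println("clean leaked:", leakedGoroutines(clean))
	fmt.Println("leaky leaked:", leakedGoroutines(leaky))
}
```

In a unit test, the same comparison could run in a teardown step and fail the test when the difference is greater than zero.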
b708dddd23
Add Nomad Manager test case
...
that ensures that `onAllocationStopped` returns true when the runner was previously deleted by the inactivity timer.
This feature is required for handling a race condition with the event handling of a rescheduled allocation.
2023-09-05 15:15:39 +02:00
354c16cc37
Fix missing rescheduled idle runners.
...
During today's unattended upgrade, the prewarming pool size dropped to (near) zero. This was caused by lost Nomad allocations: the allocations got rescheduled, but were not added to Poseidon again.
The reason for this is a miscommunication between the Event Handling and the Nomad Manager. `removedByPoseidon` was true even if the runner had not been removed by the manager but was an idle runner.
2023-09-05 15:15:39 +02:00
67297ec5a2
Add regression test for rescheduled idle runner.
2023-09-05 15:15:39 +02:00
8820938624
Increase severity of two log statements.
2023-09-05 15:15:39 +02:00
390d02055b
Bump com.amazonaws:aws-lambda-java-core in /deploy/aws/java11Exec
...
Bumps [com.amazonaws:aws-lambda-java-core](https://github.com/aws/aws-lambda-java-libs) from 1.2.2 to 1.2.3.
- [Commits](https://github.com/aws/aws-lambda-java-libs/commits)
---
updated-dependencies:
- dependency-name: com.amazonaws:aws-lambda-java-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:22:19 +00:00
847e11387a
Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
...
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.519 to 1.12.542.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.519...1.12.542)
---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-09-01 03:13:52 +00:00
188d012bc4
Fix Memory Leak caused by the merge_context.
...
The now-removed statement that sent an empty struct into the channel blocked the goroutine until someone listened on the channel returned by Done. This led to a goroutine leak, as one does not necessarily have to call the Done function of a context.
We fix this issue by removing this statement. It was unnecessary anyway, as receiving from a closed channel always returns the zero value of the channel's element type.
2023-08-26 22:51:22 +02:00
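The channel behavior this fix relies on can be shown in isolation. The sketch below is not the merge_context code; `doneChannel` and `closedChannelNeverBlocks` are hypothetical names. It shows that closing a channel is enough to signal completion: every receive from a closed channel returns immediately, so no goroutine has to wait for a receiver.

```go
package main

import "fmt"

// doneChannel (hypothetical helper) signals completion by closing the channel.
// No value is sent. A send on an unbuffered channel would block until
// someone receives, which is the leak pattern described in the commit.
func doneChannel() <-chan struct{} {
	ch := make(chan struct{})
	close(ch)
	return ch
}

// closedChannelNeverBlocks receives several times from the closed channel
// and reports whether every receive returned immediately.
func closedChannelNeverBlocks() bool {
	ch := doneChannel()
	for i := 0; i < 3; i++ {
		select {
		case _, ok := <-ch:
			if ok {
				return false // ok is false for a receive from a closed channel
			}
		default:
			return false // reaching this case would mean the receive blocked
		}
	}
	return true
}

func main() {
	fmt.Println("closed channel never blocks:", closedChannelNeverBlocks())
}
```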
b06ff4088f
Bump github.com/google/uuid from 1.3.0 to 1.3.1
...
Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.0 to 1.3.1.
- [Release notes](https://github.com/google/uuid/releases)
- [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md)
- [Commits](https://github.com/google/uuid/compare/v1.3.0...v1.3.1)
---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 06:35:59 +00:00
c0a3fb12c3
Fix UpdateFileSystem Context
...
to be done when either the runner is destroyed (case ignored before) or the request is interrupted.
2023-08-21 22:49:09 +02:00
09604997a7
Implement MergeContext
...
that has multiple contexts as parents and chooses the earliest deadline.
2023-08-21 22:49:09 +02:00
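A context with several parents and the earliest deadline could look roughly like the sketch below. It is not the MergeContext implementation from this commit: `earliestDeadline`, `mergeContexts`, and `mergedFollowsParents` are hypothetical names, and the sketch relies on `context.AfterFunc`, which requires Go 1.21 or later.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// earliestDeadline returns the soonest deadline among ctxs, if any has one.
func earliestDeadline(ctxs ...context.Context) (time.Time, bool) {
	var earliest time.Time
	found := false
	for _, ctx := range ctxs {
		if d, ok := ctx.Deadline(); ok && (!found || d.Before(earliest)) {
			earliest, found = d, true
		}
	}
	return earliest, found
}

// mergeContexts (hypothetical helper) returns a context that is done as soon
// as any parent is done and that reports the earliest parent deadline.
func mergeContexts(parents ...context.Context) (context.Context, context.CancelFunc) {
	base, cancelBase := context.WithCancel(context.Background())
	stops := make([]func() bool, 0, len(parents))
	for _, p := range parents {
		// AfterFunc cancels base once the parent is done.
		stops = append(stops, context.AfterFunc(p, cancelBase))
	}
	merged := base
	cancelDeadline := context.CancelFunc(func() {})
	if d, ok := earliestDeadline(parents...); ok {
		merged, cancelDeadline = context.WithDeadline(base, d)
	}
	return merged, func() {
		for _, stop := range stops {
			stop()
		}
		cancelDeadline()
		cancelBase()
	}
}

// mergedFollowsParents checks the deadline and the cancellation propagation.
func mergedFollowsParents() bool {
	a, cancelA := context.WithCancel(context.Background())
	defer cancelA()
	b, cancelB := context.WithTimeout(context.Background(), time.Hour)
	defer cancelB()
	merged, cancel := mergeContexts(a, b)
	defer cancel()

	want, _ := b.Deadline()
	if got, ok := merged.Deadline(); !ok || !got.Equal(want) {
		return false
	}
	cancelA() // canceling one parent must end the merged context
	select {
	case <-merged.Done():
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("merged context follows its parents:", mergedFollowsParents())
}
```

Note that this sketch does not merge the parents' values. A custom type implementing `context.Context` would be needed for that.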
306512bf9c
Fix Context Values are not logged.
...
Only the Sentry hook uses the values of the passed context. Therefore, we removed the values from our log statements when we shifted them from an extra `WithField` call to the context.
We fix this behavior by introducing a Logrus Hook that copies a fixed set of context values to the logging data.
2023-08-21 22:40:37 +02:00
a7d27e8f65
Add missing error log statements.
...
When "markRunnerAsUsed" failed, we silently ignored it. Only when returning the runner failed as well did we report the error.
When a Runner is destroyed, we are only notified that Nomad removed the allocation, but cannot tell about the reason.
For "the execution did not stop after SIGQUIT", we did not log the corresponding runner id.
2023-08-21 22:40:37 +02:00
13cd19ed58
Refactor Nomad Event Stream log message.
2023-08-18 09:28:23 +02:00
13a9da95e5
Introduce a context for RetryExponential
...
as a second criterion (next to the maximum number of attempts) for canceling the retrying. This is required because, with the previous commit, we started to retry the Nomad environment recovery. This always fails for unit tests (as they are not connected to a Nomad cluster). Before, we ignored the single error, but the retrying leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment got deleted.
2023-08-18 09:28:23 +02:00
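Retrying with two stop criteria, a maximum number of attempts and a context, can be sketched as follows. `retryWithBackoff` is a hypothetical helper and not the signature of Poseidon's `RetryExponential`. The delay values are chosen only for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff (hypothetical helper) calls f until it succeeds,
// maxAttempts is reached, or ctx is done. The delay doubles after each failure.
func retryWithBackoff(ctx context.Context, maxAttempts int, delay time.Duration, f func() error) error {
	for attempt := 1; ; attempt++ {
		err := f()
		if err == nil || attempt >= maxAttempts {
			return err
		}
		select {
		case <-ctx.Done():
			// Report both the cancellation and the last error of f.
			return errors.Join(ctx.Err(), err)
		case <-time.After(delay):
			delay *= 2
		}
	}
}

// attemptsWithCanceledContext counts the calls made with an already canceled context.
func attemptsWithCanceledContext() int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	calls := 0
	_ = retryWithBackoff(ctx, 10, time.Hour, func() error {
		calls++
		return errors.New("always failing")
	})
	return calls
}

// attemptsUntilSuccess counts the calls when f succeeds on its third attempt.
func attemptsUntilSuccess() int {
	calls := 0
	_ = retryWithBackoff(context.Background(), 10, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("not yet")
		}
		return nil
	})
	return calls
}

func main() {
	fmt.Println(attemptsWithCanceledContext(), attemptsUntilSuccess())
}
```

With a canceled context, the function is tried once and then the retrying stops. This is the kind of behavior that would keep unit tests without a Nomad cluster from running into timeouts.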
73759f8a3c
Retry Environment Recovery
2023-08-18 09:28:23 +02:00
89c18ad45c
Refactor to WithoutCancel context.
...
With Go 1.21, the WithoutCancel context was introduced. This way, we can keep the values passed in a new context without the new context being canceled together with its parent. This behavior suits two occurrences well where we explicitly had to copy one required value instead of implicitly keeping all values.
2023-08-16 15:13:05 +02:00
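The effect of `context.WithoutCancel` (Go 1.21+) can be seen in a small example. The key name `runnerIDKey` and the value `runner-42` are made up for illustration and do not come from Poseidon.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a private key type to avoid collisions with other packages.
type ctxKey string

// runnerIDKey is a hypothetical context key used only in this example.
const runnerIDKey ctxKey = "runner_id"

// valueSurvivesCancel reports whether the detached context keeps the value
// of its parent while ignoring the parent's cancellation.
func valueSurvivesCancel() bool {
	withValue := context.WithValue(context.Background(), runnerIDKey, "runner-42")
	parent, cancel := context.WithCancel(withValue)
	detached := context.WithoutCancel(parent)
	cancel()
	return parent.Err() != nil &&
		detached.Err() == nil &&
		detached.Value(runnerIDKey) == "runner-42"
}

func main() {
	fmt.Println("value survives cancel:", valueSurvivesCancel())
}
```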
2f43bced08
Update Go to 1.21
2023-08-16 15:13:05 +02:00
90092c48c1
Fix incomplete debug message
...
that is created by sending SIGQUIT to the bash process
by not processing output after the client disconnected / we have sent the SIGQUIT.
2023-08-14 11:37:51 +02:00
0fd6e42487
Add regression e2e test for incomplete debug message.
...
See #325.
2023-08-14 11:37:51 +02:00
4d661138e9
Revert "Insert debug message into execution tracing"
...
This reverts commit 72d926ef6c5e9f8ddd0da39dbd1492dad3621c15.
2023-08-14 11:37:51 +02:00
ed1b83d13c
Bump github.com/getsentry/sentry-go from 0.22.0 to 0.23.0
...
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.22.0 to 0.23.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.22.0...v0.23.0)
---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-08-02 15:12:39 +02:00
0078b4cfd8
Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi
...
Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.508 to 1.12.519.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.508...1.12.519)
---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-08-01 04:04:22 +00:00
23a7f06bee
Add more implicit dependencies to go.sum
...
This will allow a clean build that is not marked as modified.
2023-07-25 22:54:17 +02:00
4cc8ab422c
Update Nomad version for GitHub actions
2023-07-25 22:08:08 +02:00
6bfe3d7517
Update Dependencies
2023-07-25 22:08:00 +02:00
731b60acd6
Remove Sentry Exceptions
...
as a workaround for having a usable title for the issue groups (not the error type).
2023-07-25 21:07:02 +01:00
75f2f9b290
Add Sentry Stack Traces
...
and exceptions for logs containing errors.
2023-07-25 21:07:02 +01:00
eb818f92f7
Refactor Runner Destroy Reason Masking
...
and ignore expected reasons, such as when the runner got destroyed by an API request.
2023-07-24 11:48:14 +01:00
102b3f0701
Bump github.com/hashicorp/nomad from 1.6.0 to 1.6.1
...
Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.6.0 to 1.6.1.
- [Release notes](https://github.com/hashicorp/nomad/releases)
- [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/nomad/compare/v1.6.0...v1.6.1)
---
updated-dependencies:
- dependency-name: github.com/hashicorp/nomad
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-07-24 04:01:11 +00:00
8ef5f4e7c5
Fix OOM Kill race condition
...
due to the Nomad request exiting before the allocation is stopped. We catch this behavior by introducing a time period for the allocation being stopped iff the exit code is 128.
2023-07-21 15:30:21 +02:00
6a1677dea0
Introduce reason for destroying runner
...
in order to return a specific error for OOM Killed Executions.
2023-07-21 15:30:21 +02:00
b3fedf274c
Handle Runner Timeout
...
Before, Nomad executions often got stopped because the runner was deleted.
With the previous commit, we cover the exception to this behaviour by stopping the execution Poseidon-side.
These different approaches lead to different context error messages.
In this commit, we move the check of the passed timeout to respond with the corresponding client message again.
2023-07-21 15:30:21 +02:00
bfb5977d24
Destroy runner on allocation stopped
...
Destroying the runner when Nomad informs us about its allocation being stopped fixes the error of executions running into their timeout even if the allocation was stopped long ago.
2023-07-21 15:30:21 +02:00
40a5f2eca6
Insert debug message into execution tracing
...
to verify that the date command is sometimes returning an empty string with exit code 5.
2023-07-21 15:05:53 +02:00
1663008eb6
Update Nomad and CNI version for GitHub actions
2023-07-19 11:59:57 +00:00
5fe6ad29af
Bump github.com/hashicorp/nomad from 1.5.6 to 1.6.0
...
Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.5.6 to 1.6.0.
- [Release notes](https://github.com/hashicorp/nomad/releases)
- [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/nomad/compare/v1.5.6...v1.6.0)
---
updated-dependencies:
- dependency-name: github.com/hashicorp/nomad
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-07-19 11:59:57 +00:00
d75073e9de
Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec
...
Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.500 to 1.12.508.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.500...1.12.508)
---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-07-19 11:11:29 +00:00
d64d8995bd
Refactor monitoring of runner and environment id.
2023-07-15 21:46:56 +02:00
ee26cf13e5
Sentry: Make runner and environment searchable
...
by converting them into Sentry Tags.
Also, replace the unstructured Extra attribute with a Sentry Context.
2023-07-15 21:46:56 +02:00
e7df777db4
Always log Runner and Environment ID.
...
Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
0bfef5e105
Degrade InfluxDB Retry Write log.
2023-07-14 18:54:57 +02:00
9d13613f37
Bump maven-shade-plugin from 3.4.1 to 3.5.0 in /deploy/aws/java11Exec
...
Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.4.1 to 3.5.0.
- [Release notes](https://github.com/apache/maven-shade-plugin/releases)
- [Commits](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.4.1...maven-shade-plugin-3.5.0)
---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-shade-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-07-02 18:53:45 +02:00
94dfb1fa62
Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec
...
Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.479 to 1.12.500.
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.479...1.12.500)
---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-07-01 03:19:34 +00:00
21d7388c31
Remove Consul Dependency
...
It seems that Consul is no longer required.
2023-06-23 22:38:24 +02:00
01dae3150e
Update Dependencies
2023-06-23 22:33:15 +02:00
322d06540f
Remove CodeClimate
2023-06-23 22:28:31 +02:00
3a5ab3aaea
Bump github.com/getsentry/sentry-go from 0.21.0 to 0.22.0
...
Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.21.0 to 0.22.0.
- [Release notes](https://github.com/getsentry/sentry-go/releases)
- [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-go/compare/v0.21.0...v0.22.0)
---
updated-dependencies:
- dependency-name: github.com/getsentry/sentry-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
2023-06-19 09:15:25 +02:00