poseidon

Author	SHA1	Message	Date
Maximilian Paß	b708dddd23	Add Nomad Manager test case that ensures that `onAllocationStopped` returns true when the runner was deleted before by the inactivity timer. This feature is required for handling a race condition with the event handling of a rescheduled allocation.	2023-09-05 15:15:39 +02:00
Maximilian Paß	354c16cc37	Fix missing rescheduled idle runners. In today's unattended upgrade, we have seen how the prewarming pool size dropped to (near) zero. This was based on lost Nomad allocations. The allocations got rescheduled, but not added again to Poseidon. The reason for this is a miscommunication between the Event Handling and the Nomad Manager. `removedByPoseidon` was true even if the runner was not removed by the manager, but an idle runner.	2023-09-05 15:15:39 +02:00
Maximilian Paß	67297ec5a2	Add regression test for rescheduled idle runner.	2023-09-05 15:15:39 +02:00
Maximilian Paß	8820938624	Increase severity of two log statements.	2023-09-05 15:15:39 +02:00
dependabot[bot]	390d02055b	Bump com.amazonaws:aws-lambda-java-core in /deploy/aws/java11Exec Bumps [com.amazonaws:aws-lambda-java-core](https://github.com/aws/aws-lambda-java-libs) from 1.2.2 to 1.2.3. - [Commits](https://github.com/aws/aws-lambda-java-libs/commits) --- updated-dependencies: - dependency-name: com.amazonaws:aws-lambda-java-core dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-01 03:22:19 +00:00
dependabot[bot]	847e11387a	Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.519 to 1.12.542. - [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.519...1.12.542) --- updated-dependencies: - dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-01 03:13:52 +00:00
Maximilian Paß	188d012bc4	Fix Memory Leak caused by the merge_context. The now removed statement of sending an empty struct into the channel blocked the goroutine until someone received from the Done channel. This led to a goroutine leak, as one does not necessarily have to call the Done function of a context. We fix this issue by removing the send. It was unnecessary anyway, as a receive from a closed channel always returns the zero value of the channel's element type.	2023-08-26 22:51:22 +02:00
dependabot[bot]	b06ff4088f	Bump github.com/google/uuid from 1.3.0 to 1.3.1 Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.0 to 1.3.1. - [Release notes](https://github.com/google/uuid/releases) - [Changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md) - [Commits](https://github.com/google/uuid/compare/v1.3.0...v1.3.1) --- updated-dependencies: - dependency-name: github.com/google/uuid dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-08-22 06:35:59 +00:00
Maximilian Paß	c0a3fb12c3	Fix UpdateFileSystem Context to be done when either the runner is destroyed (case ignored before) or the request is interrupted.	2023-08-21 22:49:09 +02:00
Maximilian Paß	09604997a7	Implement MergeContext that has multiple contexts as parent and chooses the earliest deadline.	2023-08-21 22:49:09 +02:00
Maximilian Paß	306512bf9c	Fix Context Values are not logged. Only the Sentry hook uses the values of the passed context. Therefore, we removed the values from our log statements when we shifted them from an extra `WithField` call to the context. We fix this behavior by introducing a Logrus Hook that copies a fixed set of context values to the logging data.	2023-08-21 22:40:37 +02:00
Maximilian Paß	a7d27e8f65	Add missing error log statements. When "markRunnerAsUsed" failed, we silently ignored it. Only when the return of the runner additionally failed did we report the error. When a Runner is destroyed, we are only notified that Nomad removed the allocation, but cannot tell the reason. For "the execution did not stop after SIGQUIT" we did not log the corresponding runner id.	2023-08-21 22:40:37 +02:00
Maximilian Paß	13cd19ed58	Refactor Nomad Event Stream log message.	2023-08-18 09:28:23 +02:00
Maximilian Paß	13a9da95e5	Introduce a context for RetryExponential as a second criterion (next to the maximum number of attempts) for canceling the retrying. This is required as we started with the previous commit to retry the Nomad environment recovery. This always fails for unit tests (as they are not connected to a Nomad cluster). Before, we ignored the one error, but the retrying leads to unit test timeouts. Additionally, we now stop retrying to create a runner when the environment got deleted.	2023-08-18 09:28:23 +02:00
Maximilian Paß	73759f8a3c	Retry Environment Recovery	2023-08-18 09:28:23 +02:00
Maximilian Paß	89c18ad45c	Refactor to WithoutCancel context. With Go 1.21 the WithoutCancel context was introduced. This way we can keep the values passed in a new context without having the new context being canceled together with its parent. This behavior suits well for two occurrences where we explicitly had to copy one required value instead of implicitly keeping all values.	2023-08-16 15:13:05 +02:00
Sebastian Serth	2f43bced08	Update Go to 1.21	2023-08-16 15:13:05 +02:00
Maximilian Paß	90092c48c1	Fix incomplete debug message that is created by sending SIGQUIT to the bash process by not processing output after the client disconnected / we have sent the SIGQUIT.	2023-08-14 11:37:51 +02:00
Maximilian Paß	0fd6e42487	Add regression e2e test for incomplete debug message. See #325.	2023-08-14 11:37:51 +02:00
Maximilian Paß	4d661138e9	Revert "Insert debug message into execution tracing" This reverts commit 72d926ef6c5e9f8ddd0da39dbd1492dad3621c15.	2023-08-14 11:37:51 +02:00
dependabot[bot]	ed1b83d13c	Bump github.com/getsentry/sentry-go from 0.22.0 to 0.23.0 Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.22.0 to 0.23.0. - [Release notes](https://github.com/getsentry/sentry-go/releases) - [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md) - [Commits](https://github.com/getsentry/sentry-go/compare/v0.22.0...v0.23.0) --- updated-dependencies: - dependency-name: github.com/getsentry/sentry-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-08-02 15:12:39 +02:00
dependabot[bot]	0078b4cfd8	Bump com.amazonaws:aws-java-sdk-apigatewaymanagementapi Bumps [com.amazonaws:aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.508 to 1.12.519. - [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.508...1.12.519) --- updated-dependencies: - dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-08-01 04:04:22 +00:00
Sebastian Serth	23a7f06bee	Add more implicit dependencies to go.sum This will allow a clean build that is not marked as modified.	2023-07-25 22:54:17 +02:00
Sebastian Serth	4cc8ab422c	Update Nomad version for GitHub actions	2023-07-25 22:08:08 +02:00
Sebastian Serth	6bfe3d7517	Update Dependencies	2023-07-25 22:08:00 +02:00
Maximilian Paß	731b60acd6	Remove Sentry Exceptions as workaround for having a usable title for the issue groups (not the error type).	2023-07-25 21:07:02 +01:00
Maximilian Paß	75f2f9b290	Add Sentry Stack Traces and exceptions for logs containing errors.	2023-07-25 21:07:02 +01:00
Maximilian Paß	eb818f92f7	Refactor Runner Destroy Reason Masking and ignore expected reasons, such as when the runner got destroyed by an API request.	2023-07-24 11:48:14 +01:00
dependabot[bot]	102b3f0701	Bump github.com/hashicorp/nomad from 1.6.0 to 1.6.1 Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.6.0 to 1.6.1. - [Release notes](https://github.com/hashicorp/nomad/releases) - [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/nomad/compare/v1.6.0...v1.6.1) --- updated-dependencies: - dependency-name: github.com/hashicorp/nomad dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-07-24 04:01:11 +00:00
Maximilian Paß	8ef5f4e7c5	Fix OOM Kill race condition due to the Nomad request exiting before the allocation is stopped. We catch this behavior by introducing a time period for the allocation being stopped iff the exit code is 128.	2023-07-21 15:30:21 +02:00
Maximilian Paß	6a1677dea0	Introduce reason for destroying runner in order to return a specific error for OOM Killed Executions.	2023-07-21 15:30:21 +02:00
Maximilian Paß	b3fedf274c	Handle Runner Timeout Before, Nomad executions often got stopped because the runner was deleted. With the previous commit, we cover the exception to this behaviour by stopping the execution Poseidon-side. These different approaches lead to different context error messages. In this commit, we move the check of the passed timeout, to respond with the corresponding client message again.	2023-07-21 15:30:21 +02:00
Maximilian Paß	bfb5977d24	Destroy runner on allocation stopped Destroying the runner when Nomad informs us about its allocation being stopped, fixes the error of executions running into their timeout even if the allocation was stopped long ago.	2023-07-21 15:30:21 +02:00
Maximilian Paß	40a5f2eca6	Insert debug message into execution tracing to verify that the date command is sometimes returning an empty string with exit code 5.	2023-07-21 15:05:53 +02:00
Sebastian Serth	1663008eb6	Update Nomad and CNI version for GitHub actions	2023-07-19 11:59:57 +00:00
dependabot[bot]	5fe6ad29af	Bump github.com/hashicorp/nomad from 1.5.6 to 1.6.0 Bumps [github.com/hashicorp/nomad](https://github.com/hashicorp/nomad) from 1.5.6 to 1.6.0. - [Release notes](https://github.com/hashicorp/nomad/releases) - [Changelog](https://github.com/hashicorp/nomad/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/nomad/compare/v1.5.6...v1.6.0) --- updated-dependencies: - dependency-name: github.com/hashicorp/nomad dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-07-19 11:59:57 +00:00
dependabot[bot]	d75073e9de	Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.500 to 1.12.508. - [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.500...1.12.508) --- updated-dependencies: - dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-07-19 11:11:29 +00:00
Maximilian Paß	d64d8995bd	Refactor monitoring of runner and environment id.	2023-07-15 21:46:56 +02:00
Maximilian Paß	ee26cf13e5	Sentry: Make runner and environment searchable by converting it into a Sentry Tag. Also, replace the unstructured Extra attribute by using a Sentry Context.	2023-07-15 21:46:56 +02:00
Maximilian Paß	e7df777db4	Always log Runner and Environment ID. Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.	2023-07-15 21:46:56 +02:00
Maximilian Paß	0bfef5e105	Degrade InfluxDB Retry Write log.	2023-07-14 18:54:57 +02:00
dependabot[bot]	9d13613f37	Bump maven-shade-plugin from 3.4.1 to 3.5.0 in /deploy/aws/java11Exec Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.4.1 to 3.5.0. - [Release notes](https://github.com/apache/maven-shade-plugin/releases) - [Commits](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.4.1...maven-shade-plugin-3.5.0) --- updated-dependencies: - dependency-name: org.apache.maven.plugins:maven-shade-plugin dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-07-02 18:53:45 +02:00
dependabot[bot]	94dfb1fa62	Bump aws-java-sdk-apigatewaymanagementapi in /deploy/aws/java11Exec Bumps [aws-java-sdk-apigatewaymanagementapi](https://github.com/aws/aws-sdk-java) from 1.12.479 to 1.12.500. - [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-java/compare/1.12.479...1.12.500) --- updated-dependencies: - dependency-name: com.amazonaws:aws-java-sdk-apigatewaymanagementapi dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-07-01 03:19:34 +00:00
Sebastian Serth	21d7388c31	Remove Consul Dependency It seems like consul is not required any longer	2023-06-23 22:38:24 +02:00
Sebastian Serth	01dae3150e	Update Dependencies	2023-06-23 22:33:15 +02:00
Sebastian Serth	322d06540f	Remove CodeClimate	2023-06-23 22:28:31 +02:00
dependabot[bot]	3a5ab3aaea	Bump github.com/getsentry/sentry-go from 0.21.0 to 0.22.0 Bumps [github.com/getsentry/sentry-go](https://github.com/getsentry/sentry-go) from 0.21.0 to 0.22.0. - [Release notes](https://github.com/getsentry/sentry-go/releases) - [Changelog](https://github.com/getsentry/sentry-go/blob/master/CHANGELOG.md) - [Commits](https://github.com/getsentry/sentry-go/compare/v0.21.0...v0.22.0) --- updated-dependencies: - dependency-name: github.com/getsentry/sentry-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-06-19 09:15:25 +02:00
Maximilian Paß	527aaf713f	Fix decreased prewarming pool due to inactivity timer. When allocations fail and restart they are added again to the idle runners. The bug fixed with this commit is that the inactivity timer was not stopped at the restart. This led to the idle runner being removed when the timer expired.	2023-06-16 17:27:45 +01:00
Maximilian Paß	f031219cb8	Fix Nomad event race condition that was triggered by simultaneous deletion of the runner due to inactivity, and the allocation being rescheduled due to a lost node. It led to the allocation first being rescheduled, and then being stopped. This caused an unexpected stopping of a pending runner on a lower level. To fix it we added communication from the upper level that the stop of the job was expected.	2023-06-13 14:20:20 +02:00
Maximilian Paß	b620d0fad7	Introduce Allocation State Tracking in order to break down the current state and evaluate if it is invalid.	2023-06-13 14:20:20 +02:00
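
Note on commit 188d012bc4: the fix relies on the Go rule that a receive from a closed channel never blocks and yields the zero value of the element type, so closing the channel is enough to signal listeners and no extra send is needed. The following is a minimal sketch of that rule only, not Poseidon's merge context; the function name `receiveAfterClose` is made up for illustration.

```go
package main

import "fmt"

// receiveAfterClose is a hypothetical helper for illustration.
// It closes a channel without ever sending on it and then receives twice.
// Both receives return immediately; ok is false because the channel is
// closed and no value was ever sent.
func receiveAfterClose() (first, second bool) {
	done := make(chan struct{})
	close(done)

	_, first = <-done
	_, second = <-done
	return first, second
}

func main() {
	first, second := receiveAfterClose()
	fmt.Println(first, second)
}
```

Because any number of receivers can observe the close, a goroutine that additionally sends on an unbuffered channel would block until someone receives, which is the leak the commit message describes.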


851 Commits