for internal decisions as this error is strongly used by other packages. By checking such wrapped errors the internal decision can be influenced accidentally.
In this case the retry mechanism checked if the error is context.DeadlineExceeded and assumed it would be created by the internal context. This assumption was wrong.
from an approach that loaded the runners only once at the startup
to a method that will be repeated i.e. if the Nomad Event Stream connection interrupts.
into the utils including all other retry mechanisms.
With this change we fix that the WatchEventStream goroutine does not stop directly when the context is done (but previously only one second after).
The now removed statement of sending an empty struct into the channel blocked the goroutine until the channel of Done got listened for. This led to a goroutine leak as one does not necessarily has to call the Done function of a context.
We fix this issue by removing this value. It was unnecessary anyway as a closed channel always returns the null-value of the returned type.
Only the Sentry hook uses the values of the passed context. Therefore, we removed the values from our log statements when we shifted them from an extra `WithField` call to the context.
We fix this behavior by introducing a Logrus Hook that copies a fixed set of context values to the logging data.
as second criteria (next to the maximum number of attempts) for canceling the retrying. This is required as we started with the previous commit to retry the nomad environment recovery. This always fails for unit tests (as they are not connected to an Nomad cluster). Before, we ignored the one error but the retrying leads to unit test timeouts.
Additionally, we now stop retrying to create a runner when the environment got deleted.
- Abstracting from the exec form while generating.
- Removal of single quotes (usage of only double-quotes).
- Bash-nesting using escaping of special characters.
Influx was shutdown before Poseidon was terminated. In that mean time the Profiling data has been written. Also in that mean time, a periodical influx event triggers a panic since influx is already shutdown.
We implemented two changes, each fixing this scenario.
Previously, we always logged the error on first failure and (nevertheless) tried to send the data within 3 minutes (default configuration).
Fixes POSEIDON-1H
Closes#262