Commit Graph

69 Commits

Author SHA1 Message Date
0fd6e42487 Add regression e2e test for incomplete debug message.
See #325.
2023-08-14 11:37:51 +02:00
731b60acd6 Remove Sentry Exceptions
as workaround for having a usable title for the issue groups (not the error type).
2023-07-25 21:07:02 +01:00
75f2f9b290 Add Sentry Stack Traces
and exceptions for logs containing errors.
2023-07-25 21:07:02 +01:00
ee26cf13e5 Sentry: Make runner and environment searchable
by converting it into a Sentry Tag.

Also, replace the unstructured Extra attribute by using a Sentry Context.
2023-07-15 21:46:56 +02:00
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
0bfef5e105 Degrade InfluxDB Retry Write log. 2023-07-14 18:54:57 +02:00
5b64725faa Fix golangci-lint errors
that appeared due to the new version v1.53.1.
2023-06-04 11:54:42 +01:00
f377b1376c Add Client Status to Nomad Allocation monitoring
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
42efebc194 Monitor the Nomad events
and send all Nomad events to Influxdb.
2023-05-09 00:13:58 +01:00
d8d9abbddd Add Job ID to Nomad Allocation monitoring. 2023-04-23 12:54:57 +01:00
0c8fa9ccfa Add context to log statements. 2023-04-11 20:45:30 +01:00
43221c717e Add context to Sentry Hook.
With this context, tracing information stored in the context can be associated with sentry events/issues.
2023-04-11 20:45:30 +01:00
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
e0db1bafe8 Fix multiple user Runner use
A before unknown Nomad reload adds already known runner again to the idle runner - even if they are already in use.
2023-03-31 14:42:55 +02:00
e877cd1e52 Rename Sentry Span Descriptions. 2023-03-14 23:42:19 +01:00
7dadc5dfe9 Refactor Nomad Command Generation.
- Abstracting from the exec form while generating.
- Removal of single quotes (usage of only double-quotes).
- Bash-nesting using escaping of special characters.
2023-03-14 23:42:19 +01:00
a4599f2cf9 Fix panic on influx shutdown.
Influx was shutdown before Poseidon was terminated. In that mean time the Profiling data has been written. Also in that mean time, a periodical influx event triggers a panic since influx is already shutdown.

We implemented two changes, each fixing this scenario.
2023-03-13 15:21:24 +01:00
aa9d4d30e2 Actual retry sending InfluxDB data
Previously, we always logged the error on first failure and (nevertheless) tried to send the data within 3 minutes (default configuration).

Fixes POSEIDON-1H
Closes #262
2023-02-28 23:47:35 +01:00
2650efbb38 Sentry Tracing Identifier 2023-02-03 10:29:18 +00:00
a9581ac1d9 Performance for ListFileSystem 2023-02-03 10:29:18 +00:00
8950ab3776 Add single quotes for inner command.
Change to bash as interpreter.
Forbid single quotes for user commands.
2022-11-04 15:15:43 +01:00
5e5e13806e Monitor file download. 2022-10-26 01:33:26 +02:00
160df3d9e6 Add retry-mechanism for sample, mark-as-used and return
of Nomad runners.
2022-10-24 22:12:09 +01:00
b9c923da8a Remove unused and deprecated Storer interface. 2022-10-24 22:12:09 +01:00
7119f3e012 Fix not canceling monitoring events for removed environments
and runners.
2022-10-24 13:15:14 +02:00
3509109b6f Fix Ls2JsonWriter
by allowing more spaces in the ls response.
by sending the error response of the list file system route only when no content has been written.
2022-10-05 12:11:47 +01:00
195f88177e Add Content-Length and Content-Disposition Header
for GetFileContent route.
2022-10-05 12:11:47 +01:00
847e5cda65 Extend ls2json reader
by also parsing the link target, permissions, group and owner.
2022-10-05 12:11:47 +01:00
fc77f11d4d Enquote file path for shell execution.
Also, fix json of 500 response.
2022-10-05 12:11:47 +01:00
152b77afe5 Add listing of runners file system. 2022-10-05 12:11:47 +01:00
f2b25566dd #136 Copy files back from Nomad runner. 2022-10-05 12:11:47 +01:00
1a5a49d7c8 Explicitly switch user for code execution.
Co-authored-by: Maximilian Pass <maximilian.pass@student.hpi.uni-potsdam.de>
2022-09-24 23:09:23 +02:00
89fc7b2637 Fix Nomad event stream is ignoring errors
when an event stream could be established once.
2022-09-07 21:16:20 +02:00
e8457ca035 Remove monitoring debug statement. 2022-08-31 09:19:07 +02:00
5590c50e14 #110 Add periodical monitoring events. 2022-08-19 20:48:46 +02:00
9677253b35 Change Influx field name for the startup duration
due to a currently not resolvable type mismatch.
2022-08-10 20:46:17 +02:00
770327cf64 Add storage count debug statement. 2022-08-08 09:17:27 +02:00
6e52b8660d Avoid elements being removed multiple times
as this leads to multiple deletion events in the monitoring.
2022-08-01 11:36:18 +02:00
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
49c7a2d405 Save the runner and environment id for executions monitoring. 2022-07-31 19:42:35 +02:00
d9b7989a6c Enable logging for failed monitoring. 2022-07-01 15:29:31 +02:00
3f0c781997 Monitor storage object count. 2022-07-01 15:29:31 +02:00
051fe29d59 Add unit test for monitored storage. 2022-07-01 15:29:31 +02:00
498e8f5ff5 #110 Refactor influxdb monitoring
to use it as singleton.
This enables the possibility to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
275b6aa642 #89 Adjust golangci-lint configuration
as it does not support generics at this moment.

See https://github.com/golangci/golangci-lint/issues/2649
2022-06-29 16:21:19 +02:00
34040162c2 #89 Generalise the three Storage interfaces and structs into one generic storage manager. 2022-06-29 16:21:19 +02:00
a4d13fb8cb #148 Add stage to influx monitoring. 2022-06-21 15:31:29 +02:00
59ca63268b Add CODEOCEAN environment variable. 2022-06-10 18:10:28 +02:00
669ec039ce Update dependencies 2022-06-07 17:21:05 +02:00
1e59c1146e Fix CodeQL log injection warning
by removing newlines from logged user input.
2022-06-07 17:21:05 +02:00