Commit Graph

65 Commits

SHA1 Message Date
e7df777db4 Always log Runner and Environment ID.
Systematically log the runner id and the environment id by adding the information at the findRunnerMiddleware.
2023-07-15 21:46:56 +02:00
0bfef5e105 Degrade InfluxDB Retry Write log. 2023-07-14 18:54:57 +02:00
5b64725faa Fix golangci-lint errors
that appeared due to the new version v1.53.1.
2023-06-04 11:54:42 +01:00
f377b1376c Add Client Status to Nomad Allocation monitoring
Also add the Nomad Node name as additional debug information.
2023-05-10 19:09:31 +01:00
42efebc194 Monitor the Nomad events
and send all Nomad events to Influxdb.
2023-05-09 00:13:58 +01:00
d8d9abbddd Add Job ID to Nomad Allocation monitoring. 2023-04-23 12:54:57 +01:00
0c8fa9ccfa Add context to log statements. 2023-04-11 20:45:30 +01:00
43221c717e Add context to Sentry Hook.
With this context, tracing information stored in the context can be associated with sentry events/issues.
2023-04-11 20:45:30 +01:00
038d71ff51 Nomad: Handle Container re-allocation 2023-03-31 14:42:55 +02:00
e0db1bafe8 Fix multiple user Runner use
A previously unknown Nomad reload adds already known runners to the idle runners again, even if they are already in use.
2023-03-31 14:42:55 +02:00
e877cd1e52 Rename Sentry Span Descriptions. 2023-03-14 23:42:19 +01:00
7dadc5dfe9 Refactor Nomad Command Generation.
- Abstracting from the exec form while generating.
- Removal of single quotes (usage of only double-quotes).
- Bash-nesting using escaping of special characters.
2023-03-14 23:42:19 +01:00
a4599f2cf9 Fix panic on influx shutdown.
Influx was shut down before Poseidon was terminated. In the meantime, the profiling data was written. Also in the meantime, a periodic Influx event triggered a panic because Influx was already shut down.

We implemented two changes, each fixing this scenario.
2023-03-13 15:21:24 +01:00
aa9d4d30e2 Actually retry sending InfluxDB data
Previously, we always logged the error on the first failure and (nevertheless) tried to send the data within 3 minutes (default configuration).

Fixes POSEIDON-1H
Closes #262
2023-02-28 23:47:35 +01:00
2650efbb38 Sentry Tracing Identifier 2023-02-03 10:29:18 +00:00
a9581ac1d9 Performance for ListFileSystem 2023-02-03 10:29:18 +00:00
8950ab3776 Add single quotes for inner command.
Change to bash as interpreter.
Forbid single quotes for user commands.
2022-11-04 15:15:43 +01:00
5e5e13806e Monitor file download. 2022-10-26 01:33:26 +02:00
160df3d9e6 Add retry-mechanism for sample, mark-as-used and return
of Nomad runners.
2022-10-24 22:12:09 +01:00
b9c923da8a Remove unused and deprecated Storer interface. 2022-10-24 22:12:09 +01:00
7119f3e012 Fix not canceling monitoring events for removed environments
and runners.
2022-10-24 13:15:14 +02:00
3509109b6f Fix Ls2JsonWriter
by allowing more spaces in the ls response and
by sending the error response of the list file system route only when no content has been written.
2022-10-05 12:11:47 +01:00
195f88177e Add Content-Length and Content-Disposition Header
for GetFileContent route.
2022-10-05 12:11:47 +01:00
847e5cda65 Extend ls2json reader
by also parsing the link target, permissions, group and owner.
2022-10-05 12:11:47 +01:00
fc77f11d4d Enquote file path for shell execution.
Also, fix the JSON of the 500 response.
2022-10-05 12:11:47 +01:00
152b77afe5 Add listing of the runner's file system. 2022-10-05 12:11:47 +01:00
f2b25566dd #136 Copy files back from Nomad runner. 2022-10-05 12:11:47 +01:00
1a5a49d7c8 Explicitly switch user for code execution.
Co-authored-by: Maximilian Pass <maximilian.pass@student.hpi.uni-potsdam.de>
2022-09-24 23:09:23 +02:00
89fc7b2637 Fix Nomad event stream ignoring errors
after an event stream has been established once.
2022-09-07 21:16:20 +02:00
e8457ca035 Remove monitoring debug statement. 2022-08-31 09:19:07 +02:00
5590c50e14 #110 Add periodical monitoring events. 2022-08-19 20:48:46 +02:00
9677253b35 Change Influx field name for the startup duration
due to a currently not resolvable type mismatch.
2022-08-10 20:46:17 +02:00
770327cf64 Add storage count debug statement. 2022-08-08 09:17:27 +02:00
6e52b8660d Avoid elements being removed multiple times
as this leads to multiple deletion events in the monitoring.
2022-08-01 11:36:18 +02:00
c6e65c14bb Monitor Nomad allocation startup duration. 2022-07-31 19:42:35 +02:00
49c7a2d405 Save the runner and environment id for executions monitoring. 2022-07-31 19:42:35 +02:00
d9b7989a6c Enable logging for failed monitoring. 2022-07-01 15:29:31 +02:00
3f0c781997 Monitor storage object count. 2022-07-01 15:29:31 +02:00
051fe29d59 Add unit test for monitored storage. 2022-07-01 15:29:31 +02:00
498e8f5ff5 #110 Refactor influxdb monitoring
to use it as singleton.
This makes it possible to monitor processes that are independent of an incoming request.
2022-07-01 15:29:31 +02:00
275b6aa642 #89 Adjust golangci-lint configuration
as it does not support generics at this moment.

See https://github.com/golangci/golangci-lint/issues/2649
2022-06-29 16:21:19 +02:00
34040162c2 #89 Generalise the three Storage interfaces and structs into one generic storage manager. 2022-06-29 16:21:19 +02:00
a4d13fb8cb #148 Add stage to influx monitoring. 2022-06-21 15:31:29 +02:00
59ca63268b Add CODEOCEAN environment variable. 2022-06-10 18:10:28 +02:00
669ec039ce Update dependencies 2022-06-07 17:21:05 +02:00
1e59c1146e Fix CodeQL log injection warning
by removing newlines from logged user input.
2022-06-07 17:21:05 +02:00
25f92e5f94 Add environment specific data to the influxdb data. 2022-04-18 13:17:49 +02:00
eabe3a1b27 Add the Environment ID to the influxdb data.
Also move the interface of an execution environment into its own file, execution_environment.go.
2022-04-18 13:17:49 +02:00
8feffdae3a Add initial structure of influxdb monitoring. 2022-04-18 13:17:49 +02:00
9f0b04660f Fix goroutine leak in the nullio reader 2021-12-14 13:24:53 +01:00