# io-storm-catcher-ekaii

Attribution layer over plugbench-ekaii telemetry.
plugbench-ekaii runs Minecraft server benchmarks and captures multi-layer telemetry
(L0 spark health metrics, L1 async-profiler stacks, L2 JFR, L3 `/proc/<pid>/io`, L4
bpftrace syscalls + biosnoop). It writes a per-repeat `events.parquet` and (optionally)
mirrors everything into a postgres sidecar.
io-storm-catcher-ekaii reads that telemetry, detects MSPT spikes, and walks back
through the cross-layer event tree to attribute each spike to:

- the on-CPU thread(s) at the spike timestamp (L1 + L3),
- the syscalls those threads issued during the spike window (L4),
- the file paths or block devices touched (L4 biosnoop, L3 proc),
- a probable plugin owner, derived from L1 stack frames using a longest-prefix
  package match (or the explicit `plugin` column when plugbench was given a
  `plugin_packages` registry).
It produces both a markdown report (for humans, PR comments, the wiki) and a JSON dump (for downstream tools).
## Install

```shell
python3 -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"           # bare: parquet only
pip install -e ".[dev,postgres]"  # adds psycopg for --dsn mode
```
## Usage

```shell
# Detect on a single run; parquet auto-discovered under /fast/plugbench/runs/<run_id>/
iostorm detect <run_id>

# Explicit parquet path
iostorm detect <run_id> --parquet ./runs/abc/0/events.parquet --out-md spikes.md

# Postgres mode
IOSTORM_PG_DSN=postgresql://plugbench:plugbench@127.0.0.1:55432/plugbench \
  iostorm detect <run_id>

# Run-vs-run diff (signature = top_plugin + first_syscall)
iostorm diff <run_a> <run_b> --out-md diff.md --out-json diff.json

# Live mode: poll postgres every N seconds, print spikes + heartbeats as they arrive
iostorm tail --run-id <run_id> --interval-s 5
iostorm tail --all --interval-s 2 --metric mspt_p50
```
## Threshold

Spikes are anything at or above `max(50ms, p95 + 3*sigma)` on the highest-resolution
MSPT metric available (`tick_ms` if present, else `mspt_p99` / `mspt_p95` /
`mspt_p50` / `mspt`). The 50 ms floor is non-negotiable: below it, no Minecraft tick
is "late". Tune with `--threshold-ms` (raises the floor) and `--sigma-mult` (widens
the statistical band).
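A minimal sketch of that rule, assuming a plain list as the metric window and a nearest-rank p95. The function name and window handling here are illustrative, not the project's actual implementation.

```python
import statistics

HARD_FLOOR_MS = 50.0  # below 50 ms, no Minecraft tick counts as late

def spike_threshold(window_ms: list[float], sigma_mult: float = 3.0,
                    floor_ms: float = HARD_FLOOR_MS) -> float:
    """Return max(floor, p95 + sigma_mult * sigma) over a rolling window."""
    if len(window_ms) < 2:
        # Too little data for stats: fall back to the hard floor.
        return floor_ms
    ordered = sorted(window_ms)
    # Nearest-rank 95th percentile.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    sigma = statistics.pstdev(window_ms)
    return max(floor_ms, p95 + sigma_mult * sigma)
```

On a quiet window (low variance, low p95) the floor dominates, so steady sub-50 ms ticks never fire; a noisy window widens the band so only genuinely anomalous ticks are flagged.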
## How it sits next to plugbench

plugbench-ekaii writes; io-storm-catcher-ekaii reads. We never modify plugbench. The
expected schema (column names + types) is documented in
`plugbench-ekaii/orchestrator/correlate.py` and `plugbench-ekaii/schemas/postgres.sql`.
If a column is missing — e.g. `frames` because the run only enabled L0 —
io-storm-catcher fills it with nulls and degrades gracefully (the spike is still
reported, attribution columns are empty, with a note explaining why).
## Tests

```shell
pytest -q
```

Tests use synthetic parquet files generated in `tests/conftest.py`; no real bench
output is required.
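For intuition, the kind of series such fixtures encode is a quiet MSPT baseline with one injected spike well above the 50 ms floor. The function and all numbers below are illustrative, not taken from `tests/conftest.py`.

```python
import random

def synthetic_mspt(n: int = 200, base: float = 12.0, jitter: float = 1.5,
                   spike_at: int = 150, spike_ms: float = 120.0,
                   seed: int = 7) -> list[float]:
    """Quiet MSPT series (base +/- jitter) with one injected spike."""
    rng = random.Random(seed)  # seeded so test runs are deterministic
    series = [base + rng.uniform(-jitter, jitter) for _ in range(n)]
    series[spike_at] = spike_ms
    return series
```

Keeping the baseline far below the floor and the spike far above it makes detector assertions unambiguous: exactly one tick should be flagged.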
## License

MIT.