Attribution layer over plugbench-ekaii: detect MSPT spikes and attribute them to threads, syscalls, paths, plugins.
exo 4240c50555
tail: implement incremental live spike detection from postgres
Wires `iostorm tail` to a real polling loop in the new live module.
Maintains an in-memory rolling window of the chosen metric, computes
threshold = max(hard_floor, p95+3σ) on the window each tick, fires a
one-line spike event for fresh rows above threshold, attempts L1/L4
attribution via attribute_spike on a small ts-window slice, and emits
a heartbeat per tick. Adds --run-id / --all / --metric / --interval-s
/ --window-s / --sigma-mult / --heartbeat-every options. Pluggable
MetricsSource protocol lets us mock postgres in tests; PostgresMetricsSource
issues bounded queries against the metrics + samples + io_events + syscalls
tables. Ctrl+C exits 0 cleanly. 33 tests pass, ruff clean.
2026-05-04 08:46:11 +02:00
.forgejo/workflows init io-storm-catcher: spike attribution over plugbench events 2026-05-04 02:49:25 +02:00
io_storm_catcher tail: implement incremental live spike detection from postgres 2026-05-04 08:46:11 +02:00
tests tail: implement incremental live spike detection from postgres 2026-05-04 08:46:11 +02:00
.gitignore init io-storm-catcher: spike attribution over plugbench events 2026-05-04 02:49:25 +02:00
LICENSE init io-storm-catcher: spike attribution over plugbench events 2026-05-04 02:49:25 +02:00
pyproject.toml init io-storm-catcher: spike attribution over plugbench events 2026-05-04 02:49:25 +02:00
README.md tail: implement incremental live spike detection from postgres 2026-05-04 08:46:11 +02:00

io-storm-catcher-ekaii

Attribution layer over plugbench-ekaii telemetry.

plugbench-ekaii runs Minecraft server benchmarks and captures multi-layer telemetry (L0 spark health metrics, L1 async-profiler stacks, L2 JFR, L3 /proc/<pid>/io, L4 bpftrace syscalls + biosnoop). It writes a per-repeat events.parquet and (optionally) mirrors everything into a postgres sidecar.

io-storm-catcher-ekaii reads that telemetry, detects MSPT spikes, and walks back through the cross-layer event tree to attribute each spike to:

  • the on-CPU thread(s) at the spike timestamp (L1 + L3),
  • the syscalls those threads issued during the spike window (L4),
  • the file paths or block devices touched (L4 biosnoop, L3 proc),
  • a probable plugin owner, derived from L1 stack frames using a longest-prefix package match (or the explicit plugin column when plugbench was given a plugin_packages registry).
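The longest-prefix package match in the last bullet can be sketched as follows. This is a minimal illustration, not the project's actual code: the `PLUGIN_PACKAGES` registry contents and the function names `owner_for_frame` and `owner_for_stack` are hypothetical, and frames are assumed to be fully qualified Java method names listed innermost first.

```python
from typing import Dict, Iterable, Optional

# Hypothetical registry: Java package prefix -> plugin name.
PLUGIN_PACKAGES: Dict[str, str] = {
    "com.example.shop": "ShopPlugin",
    "com.example.shop.db": "ShopDB",
    "net.example.worldgen": "WorldGen",
}


def owner_for_frame(frame: str, registry: Dict[str, str]) -> Optional[str]:
    """Return the plugin whose package prefix is the longest match for frame."""
    best_prefix = ""
    for prefix in registry:
        # Match on package boundaries only, so "com.example.shop" does not
        # claim "com.example.shopkeeper".
        on_boundary = frame == prefix or frame.startswith(prefix + ".")
        if on_boundary and len(prefix) > len(best_prefix):
            best_prefix = prefix
    return registry[best_prefix] if best_prefix else None


def owner_for_stack(frames: Iterable[str], registry: Dict[str, str]) -> Optional[str]:
    """Walk the stack from the innermost frame and return the first plugin owner."""
    for frame in frames:
        owner = owner_for_frame(frame, registry)
        if owner is not None:
            return owner
    return None
```

Matching on a trailing dot is a design choice in this sketch: it keeps a short prefix from swallowing an unrelated package that merely shares leading characters.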

It produces both a markdown report (for humans, PR comments, the wiki) and a JSON dump (for downstream tools).

Install

python3 -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"           # bare: parquet only
pip install -e ".[dev,postgres]"  # adds psycopg for --dsn mode

Usage

# Detect on a single run, parquet auto-discovered under /fast/plugbench/runs/<run_id>/
iostorm detect <run_id>

# Explicit parquet path
iostorm detect <run_id> --parquet ./runs/abc/0/events.parquet --out-md spikes.md

# Postgres mode
IOSTORM_PG_DSN=postgresql://plugbench:plugbench@127.0.0.1:55432/plugbench \
  iostorm detect <run_id>

# Run-vs-run diff (signature = top_plugin + first_syscall)
iostorm diff <run_a> <run_b> --out-md diff.md --out-json diff.json

# Live mode: poll postgres every N seconds, print spikes + heartbeats as they arrive
iostorm tail --run-id <run_id> --interval-s 5
iostorm tail --all --interval-s 2 --metric mspt_p50
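
The run-vs-run diff keys each spike by a signature of top_plugin + first_syscall. A rough sketch of that comparison is shown below; it assumes spikes are plain dicts carrying `top_plugin` and `first_syscall` keys, and the function names and output layout are hypothetical rather than the tool's actual JSON format.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Signature = Tuple[Optional[str], Optional[str]]


def signature(spike: Dict) -> Signature:
    """Signature = (top_plugin, first_syscall); either part may be None."""
    return (spike.get("top_plugin"), spike.get("first_syscall"))


def diff_runs(spikes_a: Iterable[Dict], spikes_b: Iterable[Dict]) -> Dict:
    """Group spikes by signature and report what is unique to each run."""
    a = Counter(signature(s) for s in spikes_a)
    b = Counter(signature(s) for s in spikes_b)
    # key=str keeps sorting stable even when a signature contains None.
    return {
        "only_in_a": sorted(set(a) - set(b), key=str),
        "only_in_b": sorted(set(b) - set(a), key=str),
        "common": {sig: (a[sig], b[sig]) for sig in sorted(set(a) & set(b), key=str)},
    }
```

Signatures that appear only in the second run are the interesting ones when checking whether a change introduced a new class of spike.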

Threshold

A spike is any sample at or above max(50ms, p95 + 3*sigma), computed on the highest-resolution MSPT metric available (tick_ms if present, otherwise the first available of mspt_p99, mspt_p95, mspt_p50, mspt). The 50ms floor is a hard minimum: at 20 ticks per second each tick has a 50ms budget, so below that no MC tick is "late".

Tune with --threshold-ms (raises the floor) and --sigma-mult (widens the statistical band).
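
As a worked illustration of the rule above, here is a minimal sketch using only the standard library. The function names are hypothetical, and two details are assumptions because the README does not specify them: the percentile uses the "inclusive" interpolation method and sigma is the population standard deviation.

```python
import statistics
from typing import List, Sequence, Tuple


def spike_threshold(values: Sequence[float], floor_ms: float = 50.0,
                    sigma_mult: float = 3.0) -> float:
    """threshold = max(floor, p95 + sigma_mult * sigma) over the given samples."""
    if len(values) < 2:
        # Too few samples for a percentile; fall back to the hard floor.
        return floor_ms
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(values, n=100, method="inclusive")[94]
    sigma = statistics.pstdev(values)
    return max(floor_ms, p95 + sigma_mult * sigma)


def find_spikes(values: Sequence[float], floor_ms: float = 50.0,
                sigma_mult: float = 3.0) -> List[Tuple[int, float]]:
    """Return (index, value) for every sample at or above the threshold."""
    threshold = spike_threshold(values, floor_ms, sigma_mult)
    return [(i, v) for i, v in enumerate(values) if v >= threshold]
```

For a steady run at 20ms per tick, p95 + 3*sigma stays well under 50ms, so the floor wins and nothing is flagged. A single 200ms tick among 99 ticks of 20ms pushes the statistical band to roughly 74ms, and only that tick is reported.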

How it sits next to plugbench

plugbench-ekaii writes; io-storm-catcher-ekaii reads. We never modify plugbench. The expected schema (column names + types) is documented in plugbench-ekaii/orchestrator/correlate.py and plugbench-ekaii/schemas/postgres.sql. If a column is missing (for example, frames because the run only enabled L0), io-storm-catcher fills it with nulls and degrades gracefully: the spike is still reported, the attribution columns are empty, and a note explains why.
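
The null-fill behaviour can be sketched roughly as below. This is an illustration under stated assumptions, not the project's loader: rows are modelled as plain dicts instead of parquet batches, `normalize_rows` is a hypothetical name, and the column list is invented for the example apart from frames and plugin, which the README mentions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical column set; the real schema lives in plugbench-ekaii.
EXPECTED_COLUMNS = ("ts", "thread", "frames", "syscall", "path", "plugin")


def normalize_rows(rows: Sequence[Dict],
                   expected: Sequence[str] = EXPECTED_COLUMNS
                   ) -> Tuple[List[Dict], List[str]]:
    """Fill absent columns with None and return notes describing what was missing."""
    present = set()
    for row in rows:
        present.update(row.keys())
    missing = [col for col in expected if col not in present]
    # Every output row carries every expected column, so downstream code
    # never needs to check for key presence.
    filled = [{col: row.get(col) for col in expected} for row in rows]
    notes = [f"column '{col}' missing from input; left empty" for col in missing]
    return filled, notes
```

Returning the notes alongside the data lets the report explain an empty attribution column instead of leaving the reader to guess.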

Tests

pytest -q

Tests use synthetic parquet files generated in tests/conftest.py; no real bench output required.

License

MIT.