fleet-metrics-cron
v1.0.0
Cloudflare Worker cron that aggregates fleet observability metrics from D1 into KV
Cloudflare Worker that aggregates observability metrics for the SuperInstance fleet of 15 Workers. Runs every 5 minutes via Cron Trigger.
What It Does
Every 5 minutes, fleet-metrics-cron:
- Queries D1: Reads the spans and events tables for the last 5-minute window for all 15 fleet workers
- Computes metrics: Latency percentiles (p50/p75/p90/p95/p99), error rates, throughput, budget consumption
- Detects anomalies: Runs 8 anomaly rules (high error rate, latency spikes, budget drain, cascading failures, etc.)
- Writes to KV: Stores results under metrics:* keys with 10-minute TTL
- Updates baselines: Recomputes 7-day rolling baselines daily at midnight UTC
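The "Computes metrics" step can be sketched as follows. The README does not say which percentile method the worker uses, so this sketch assumes the nearest-rank method; the names `percentile`, `summarizeLatency`, and `errorRate` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch: nearest-rank percentiles over one window of latency samples (ms).
// Assumption: the real worker may use a different method (e.g. interpolation or SQL).
interface LatencySummary {
  p50: number;
  p75: number;
  p90: number;
  p95: number;
  p99: number;
}

function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0; // assumed convention for an empty window
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: smallest 1-based rank covering at least p% of the samples.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function summarizeLatency(samples: number[]): LatencySummary {
  return {
    p50: percentile(samples, 50),
    p75: percentile(samples, 75),
    p90: percentile(samples, 90),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Error rate as a fraction in [0, 1]; a window with no requests counts as 0.
function errorRate(errors: number, total: number): number {
  return total === 0 ? 0 : errors / total;
}
```

With only a handful of samples in a window, the high percentiles collapse onto the maximum, which is worth keeping in mind when reading p99 for low-traffic workers.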
KV Key Patterns
| Key | TTL | Content |
|---|---|---|
| metrics:latency:{worker}:{window} | 10 min | Latency percentiles per worker |
| metrics:errors:{worker}:{window} | 10 min | Error counts and rates per worker |
| metrics:throughput:{worker}:{window} | 10 min | Request counts per worker |
| metrics:budget:{window} | 10 min | Budget consumption rates |
| metrics:overview:{window} | 10 min | Fleet-wide dashboard summary |
| metrics:anomalies:{window} | 10 min | Active anomaly flags |
| metrics:baselines | 24 h | 7-day rolling baselines per worker |
Where {window} is an ISO 8601 timestamp truncated to the 5-minute boundary (e.g., 2026-06-10T03:30:00Z).
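One way to derive {window} and build a key is shown below. This is a minimal sketch, not the worker's actual code: the helper names `windowStart` and `latencyKey` and the worker name `api-gateway` are assumptions made for the example.

```typescript
// Hypothetical helpers for the key patterns above.
const WINDOW_MS = 5 * 60 * 1000; // 5-minute aggregation window
const METRICS_TTL_SECONDS = 10 * 60; // 10-minute TTL, e.g. for KV put's expirationTtl option

// Truncate a timestamp down to its 5-minute boundary and format it without milliseconds.
function windowStart(now: Date): string {
  const floored = new Date(Math.floor(now.getTime() / WINDOW_MS) * WINDOW_MS);
  return floored.toISOString().replace(".000Z", "Z");
}

// Build the per-worker latency key: metrics:latency:{worker}:{window}
function latencyKey(worker: string, window: string): string {
  return `metrics:latency:${worker}:${window}`;
}

const exampleWindow = windowStart(new Date("2026-06-10T03:33:17Z"));
const exampleKey = latencyKey("api-gateway", exampleWindow);
console.log(exampleKey); // metrics:latency:api-gateway:2026-06-10T03:30:00Z
```

Flooring the epoch milliseconds works because 5 minutes divides every hour evenly, so the boundaries line up with UTC clock time.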
Anomaly Rules
| Rule | Condition | Severity |
|---|---|---|
| A1 | error_rate > 5% | critical |
| A2 | p95 > 2× baseline | warning |
| A3 | budget drain > 10%/min | critical |
| A4 | error rate > 3× previous window | warning |
| A5 | Worker silent (no spans, was active before) | warning |
| A6 | ≥3 workers degraded simultaneously | critical |
| A7 | p99 > 5× baseline | critical |
| A8 | >10 timeouts in 5-min window | warning |
Each rule has a cooldown to prevent alert storms.
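To illustrate how a threshold rule and its cooldown might interact, here is a sketch covering only A1 and A2. The cooldown length is not given in this README, so the 15-minute value, the in-memory `lastFired` map, and all type and function names are assumptions; the real worker may track cooldown state differently (for example in KV).

```typescript
// Hypothetical sketch of rules A1 and A2 with a per-rule, per-worker cooldown.
type Severity = "warning" | "critical";

interface WorkerWindow {
  worker: string;
  errorRate: number; // fraction in [0, 1]
  p95: number; // ms
  baselineP95: number; // ms, from the 7-day rolling baseline
}

interface Anomaly {
  rule: string;
  worker: string;
  severity: Severity;
}

const COOLDOWN_MS = 15 * 60 * 1000; // assumed value, not taken from the source

function evaluate(w: WorkerWindow, nowMs: number, lastFired: Map<string, number>): Anomaly[] {
  const candidates: Anomaly[] = [];
  // A1: error rate above 5%
  if (w.errorRate > 0.05) {
    candidates.push({ rule: "A1", worker: w.worker, severity: "critical" });
  }
  // A2: p95 above twice the baseline (skipped when no baseline exists yet)
  if (w.baselineP95 > 0 && w.p95 > 2 * w.baselineP95) {
    candidates.push({ rule: "A2", worker: w.worker, severity: "warning" });
  }
  // Suppress any rule that already fired for this worker within the cooldown.
  return candidates.filter((a) => {
    const key = `${a.rule}:${a.worker}`;
    const last = lastFired.get(key);
    if (last !== undefined && nowMs - last < COOLDOWN_MS) return false;
    lastFired.set(key, nowMs);
    return true;
  });
}
```

Keying the cooldown on both rule and worker means a noisy worker cannot silence the same rule for the rest of the fleet.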
Prerequisites
- D1 database named fleet-events with spans and events tables (migrations V006/V007 applied)
- KV namespace named fleet-orchestrator-kv
Setup
# Install dependencies
npm install
# Create D1 database (if not already created)
npx wrangler d1 create fleet-events
# → copy database_id into wrangler.toml
# Create KV namespace (if not already created)
npx wrangler kv namespace create fleet-orchestrator-kv
# → copy id into wrangler.toml
# Apply migrations (from the main fleet repo)
npx wrangler d1 execute fleet-events --file=migrations/V006__add_trace_columns.sql
npx wrangler d1 execute fleet-events --file=migrations/V007__create_spans_table.sql
Development
# Local dev with cron trigger simulation
npx wrangler dev --test-scheduled
# Then: curl "http://localhost:8787/__scheduled?cron=*/5+*+*+*+*"
# Type check
npm run typecheck
Deploy
npm run deploy
Monitoring
# Live tail logs
npm run tail
Architecture
15 Workers → D1 (spans + events) → fleet-metrics-cron (every 5 min) → KV
                                                                      ↓
                                                                Dashboard API
Workers write spans to D1 as fire-and-forget (.run() without await). This cron reads them back in batches of 5 (D1 connection limit), aggregates, and writes to KV for dashboard consumption.
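The batching described above could look roughly like this. It is a sketch under assumptions: `chunk` and `inBatches` are hypothetical helper names, and the per-worker task is left as a generic async function rather than an actual D1 query, since the query shape is not shown in this README.

```typescript
// Hypothetical sketch: process workers in groups of 5 so that at most 5 queries are in flight.
const BATCH_SIZE = 5;

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Run `task` for every item, one batch at a time; results keep the input order.
async function inBatches<T, R>(
  items: T[],
  size: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    // Each batch runs concurrently; the next batch starts only after it settles.
    results.push(...(await Promise.all(batch.map(task))));
  }
  return results;
}

// Example: 15 placeholder worker names yield 3 batches of 5.
const workerNames = Array.from({ length: 15 }, (_, i) => `worker-${i + 1}`);
const batches = chunk(workerNames, BATCH_SIZE);
console.log(batches.length); // 3
```

In the cron itself, `task` would be the function that reads one worker's spans and events for the current window.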
