# Spectral
Causal observability for AI agents. Drop one line into your app and get a full trace of every LLM call, tool use, cost, latency, and behavioral invariant — with a CLI to explore, replay, and evaluate everything.
```bash
npm install spectral-obs
spectral traces
```

## Getting Started

### 1. Install

```bash
npm install spectral-obs
```

### 2. Wrap your Anthropic client
```typescript
import Anthropic from '@anthropic-ai/sdk';
import { spectral } from 'spectral-obs';

const client = spectral.wrap(new Anthropic(), {
  taskType: 'code-review',  // label for grouping traces + evals
  captureInputs: true,      // store prompts (disable for sensitive data)
});

// Use client exactly as before — nothing else changes
const response = await client.messages.create({
  model: 'claude-opus-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Review this PR...' }],
});
```

That's it. Every call is now traced, hashed, and stored in `~/.spectral/spectral.db`.
## CLI

```bash
spectral traces                       # recent runs
spectral inspect <trace-id>           # tree view of every span
spectral waterfall <trace-id>         # latency waterfall chart
spectral cost --last 7d               # cost breakdown by model
spectral replay <trace-id> \
  --swap-step 2 --with-input "..."    # re-run one step, see the diff
spectral scan                         # silent failure detection
spectral eval learn <task-type>       # mine behavioral invariants
spectral eval run <trace-id> <type>   # run invariants against a trace
spectral eval show <task-type>        # list learned invariants
spectral eval pin <invariant-id>      # lock an invariant across updates
spectral eval export <task-type>      # dump suite as JSON
```

## Core features
### Trace explorer

`spectral traces` lists your most recent runs with cost, latency, and status.
`spectral inspect <id>` renders the full trace DAG as a tree with token counts
and timing for every span.
### Waterfall

`spectral waterfall <trace-id>` renders a terminal bar chart of every span's
contribution to total latency — useful for finding which tool or LLM call is
the bottleneck.
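The rendering idea is simple enough to sketch. The helper below is an illustration of how span durations can be turned into a terminal bar chart; the `Span` shape, bar width, and names here are assumptions for the example, not Spectral's internals.

```typescript
// Illustrative only: scale each span's duration to a bar relative to the
// slowest span, so the bottleneck gets the longest bar.
interface Span {
  name: string;
  ms: number;
}

function waterfall(spans: Span[], width = 40): string[] {
  const max = Math.max(...spans.map((s) => s.ms));
  return spans.map((s) => {
    // At least one block so fast spans remain visible.
    const bar = "█".repeat(Math.max(1, Math.round((s.ms / max) * width)));
    return `${s.name.padEnd(16)} ${bar} ${s.ms} ms`;
  });
}
```

Scaling against the slowest span (rather than total latency) makes the bottleneck visually obvious at a glance.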
### Cost tracking

`spectral cost --last 7d` breaks down spend by model across the last N days.
Pricing is built in for all current Claude models.
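The underlying arithmetic is per-token pricing applied to usage counts. The sketch below shows the shape of that calculation; the model name and per-million-token prices are placeholders, not the pricing table Spectral ships with.

```typescript
// Placeholder prices (USD per million tokens) -- assumptions for the example.
const PRICE_PER_MTOK = {
  "example-model": { input: 3.0, output: 15.0 },
} as const;

// Cost = input tokens at the input rate plus output tokens at the output rate.
function costUsd(
  model: keyof typeof PRICE_PER_MTOK,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICE_PER_MTOK[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```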
### Replay engine

```bash
spectral replay abc123 --swap-step 2 --with-input "Be more concise"
```

Loads the cached trace, replaces step 2's input with your new prompt, calls the API live, and shows:

- A line-by-line diff of the old vs new output
- Cost delta and latency delta

No need to re-run your whole agent to test a single prompt change.
### Silent failure detection

```bash
spectral scan
```

Runs z-score anomaly detection on the output hash distribution for each task type. Flags runs where the output unexpectedly changed while the input didn't — a common sign of silent regressions after a model upgrade or prompt edit.
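To make the z-score idea concrete, here is a minimal sketch of flagging outliers in a series of per-run change rates. This illustrates the statistical technique only; the function names and the 3σ default are assumptions, not Spectral's implementation.

```typescript
// Standard z-score: how many standard deviations each value sits from the mean.
function zScores(values: number[]): number[] {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance =
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  // With zero variance, nothing is anomalous.
  return values.map((v) => (std === 0 ? 0 : (v - mean) / std));
}

// Return indices of runs whose output-change rate is more than `threshold`
// standard deviations from the mean for that task type.
function flagAnomalies(changeRates: number[], threshold = 3): number[] {
  return zScores(changeRates)
    .map((z, i) => ({ z, i }))
    .filter(({ z }) => Math.abs(z) > threshold)
    .map(({ i }) => i);
}
```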
### Behavioral evals
Spectral can learn what "normal" looks like from your production traces and then check new traces against those expectations automatically.
#### Learn invariants from traces

```bash
spectral eval learn code-review --limit 50
```

Analyzes your last 50 code-review runs and extracts invariants across three dimensions:
| Dimension | What it mines | Cost |
|-----------|---------------|------|
| Structural | Tool ordering, call counts, step count, repetition loops, never-final tools | Free |
| Content | Output line-count bounds, LLM-extracted presence/absence/format patterns | Free + optional Haiku |
| Causal | Which tool outputs flow into downstream inputs (Jaccard similarity) | Free |
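The causal dimension relies on Jaccard similarity, which can be sketched in a few lines. The tokenization below is an assumption for illustration; how Spectral actually splits tool inputs and outputs may differ.

```typescript
// Naive word-level tokenizer -- an assumption for this example.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
// A low score between a tool's output and a downstream step's input
// suggests the output was never actually used.
function jaccard(a: string, b: string): number {
  const sa = tokenSet(a);
  const sb = tokenSet(b);
  const intersection = [...sa].filter((t) => sb.has(t)).length;
  const union = new Set([...sa, ...sb]).size;
  return union === 0 ? 0 : intersection / union;
}
```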
Each invariant gets a score:

```
score = 0.4·consistency + 0.25·specificity + 0.25·actionability − 0.1·cost
```

Only invariants above the threshold (default 0.5) are saved.
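The formula and threshold translate directly to code. The `InvariantMetrics` shape below is a placeholder for illustration; Spectral derives the component values from the mined invariants themselves.

```typescript
// Hypothetical metrics shape -- each component normalized to [0, 1].
interface InvariantMetrics {
  consistency: number;   // how often the pattern held across traces
  specificity: number;   // how narrowly it constrains behavior
  actionability: number; // how useful a violation report would be
  cost: number;          // relative cost to evaluate the invariant
}

// The documented weighted score: cost is the only penalty term.
function invariantScore(m: InvariantMetrics): number {
  return (
    0.4 * m.consistency +
    0.25 * m.specificity +
    0.25 * m.actionability -
    0.1 * m.cost
  );
}

// Only invariants at or above the threshold (default 0.5) are kept.
const keep = (m: InvariantMetrics, threshold = 0.5): boolean =>
  invariantScore(m) >= threshold;
```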
#### Run evals on a new trace

```bash
spectral eval run <trace-id> code-review
```

Checks the trace against all learned invariants in priority order:
- Structural — pure graph analysis, instant
- Deterministic content — regex / line-count, instant
- Heuristic causal — Jaccard similarity, instant
- LLM judge — Claude Haiku, skipped if a critical violation is already found
```
✓ 11/12 checks passed (91%)

Violations:
✗ [critical] search_files output flows into write_file input
  write_file shows 2% overlap with search_files output (min 8%)
  Fix: write_file may be ignoring output from search_files — blind operation detected
```

#### Pin invariants
```bash
spectral eval pin inv_01abc123
```

Pinned invariants survive future `eval learn` refreshes — useful for
invariants you've manually reviewed and want to treat as ground truth.
## How it works

### Zero-overhead hot path
```
messages.create() called
        │
        ▼
generateId()        ← ~0.001 ms
Date.now() ×2       ← ~0.001 ms
pipeline.push(ref)  ← ring buffer write, ~0.001 ms
        │
        ▼ (background, off the call stack)
drain()             ← serialize + hash
batch flush         ← single SQLite transaction
```

The intercepted call adds ~0.003 ms to TTFT. The rest happens asynchronously.
### Storage

All data lives in `~/.spectral/spectral.db` — a single WAL-mode SQLite file.
No server, no account, no data leaves your machine.
### Performance internals

| Component | Technique | Benefit |
|-----------|-----------|---------|
| `RingBuffer<T>` | Pre-allocated power-of-2 array, bitwise modulo | O(1) push/drain, no GC pressure |
| `fastHash` | Murmur3 × 2 seeds | ~35× faster than SHA-256 |
| `BatchWriter` | Prepared statement + `db.transaction()` | One fsync per batch, not per trace |
| `TracePipeline` | Three-lane: hot → drain → flush | Hot path never touches SQLite |
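The power-of-2 trick in the first row is worth spelling out: when capacity is a power of two, `index & (capacity - 1)` replaces the slower `index % capacity`. The minimal ring buffer below is a sketch of that technique under an assumed overwrite-oldest policy, not Spectral's actual `RingBuffer<T>`.

```typescript
// Minimal power-of-two ring buffer using a bitwise mask instead of `%`.
class Ring<T> {
  private readonly buf: (T | undefined)[];
  private readonly mask: number;
  private head = 0; // next write position (monotonic counter)
  private tail = 0; // next read position (monotonic counter)

  constructor(capacityPow2: number) {
    if (capacityPow2 <= 0 || (capacityPow2 & (capacityPow2 - 1)) !== 0) {
      throw new Error("capacity must be a power of two");
    }
    this.buf = new Array(capacityPow2); // pre-allocated, never resized
    this.mask = capacityPow2 - 1;
  }

  push(item: T): void {
    this.buf[this.head & this.mask] = item;
    this.head++;
    // Overwrite-oldest when full: push stays O(1) and allocation-free.
    if (this.head - this.tail > this.mask + 1) this.tail++;
  }

  drain(): T[] {
    const out: T[] = [];
    while (this.tail < this.head) {
      out.push(this.buf[this.tail & this.mask] as T);
      this.tail++;
    }
    return out;
  }
}
```

Because `head` and `tail` only ever increase, the masked index wraps around the fixed array while the counters keep push/drain bookkeeping trivial.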
## SDK reference

```typescript
import { spectral } from 'spectral-obs';

// Wrap a client
const client = spectral.wrap(anthropicClient, {
  taskType?: string,       // groups runs for evals + cost tracking
  captureInputs?: boolean, // default true
  dbPath?: string,         // default ~/.spectral/spectral.db
});

// Access the underlying stores directly if needed
const store = spectral.getStore();
const pipeline = spectral.getPipeline();

// Clean shutdown (flushes pending traces)
spectral.closeAll();
```

## Development
```bash
npm test        # 249 tests, all green
npm run build   # compile to dist/
npm run dev     # watch mode
```

Tests use Vitest with `pool: 'forks'` for native module compatibility.
All tests are self-contained and create temporary SQLite databases.
## License
MIT
