
agent-trace-debugger v0.1.0

agent-trace-debugger

Debug, diff, and replay AI agent traces


Record every LLM call, tool invocation, and reasoning step your agent makes. Then diff two runs side-by-side to find exactly where behavior diverged.


Why This Exists

Building AI agents is hard. Debugging them is harder.

  • "It worked yesterday." — LLM non-determinism means the same prompt can produce different tool calls, different reasoning chains, different outcomes. You need a way to see exactly what changed between two runs.
  • "Which call is eating my budget?" — A single agent run can make dozens of LLM calls across multiple models. Without per-span cost and token tracking, you are flying blind.
  • "I can't reproduce the bug." — Agent executions are ephemeral. Once the run is over, the intermediate steps are gone. You need a recording you can replay step by step.

agent-trace-debugger gives you a persistent, queryable, diffable record of every agent execution — so you can stop guessing and start debugging.

Features

| | Feature | Description |
|---|---|---|
| :pencil2: | Trace Collector | Auto-instrument OpenAI and Anthropic SDKs, or build spans manually with the fluent SpanBuilder API |
| :package: | SQLite + FTS5 Store | Persistent storage with full-text search across all span inputs and outputs |
| :arrows_counterclockwise: | Step-by-Step Replay | Walk through any trace span-by-span with breadcrumb navigation |
| :mag: | Trace Diff | LCS-based structural alignment, content diffs, metrics diffs, and automatic divergence detection |
| :bar_chart: | Multi-Trace Compare | Compare N traces to find the cheapest, fastest, and best run |
| :mag_right: | Full-Text Search | FTS5-powered search across span names, inputs, and outputs |
| :chart_with_upwards_trend: | Aggregated Stats | Cost, tokens, latency, and per-model breakdown across all recorded traces |
| :outbox_tray: | Export (HTML/MD/JSON) | Export traces as self-contained HTML, Markdown for GitHub, or raw JSON |
| :label: | Tags | Attach arbitrary string tags to traces for grouping and filtering |
| :speech_balloon: | Annotations | Attach free-text notes to any span for post-hoc commentary |
| :moneybag: | Cost Alerts | Budget monitoring with warning/exceeded callbacks, checked automatically after each trace |
| :wrench: | Tool Result Capture | Auto-close pending tool spans when results appear, or record results manually |
| :fast_forward: | Parallel Spans | Trace concurrent tool calls with isolated async contexts via startParallelSpans() |
| :electric_plug: | Generic Adapter | wrapFunction() instruments any async function without an SDK-specific adapter |
| :wastebasket: | Maintenance | Prune old or errored traces with cleanup, reclaim space with vacuum |
| :computer: | CLI (15 commands) | list, show, record, replay, diff, compare, search, stats, export, tag, untag, tags, annotate, cleanup, vacuum |

Quick Start

npm install agent-trace-debugger

import { Tracer, FileExporter, SpanKind } from 'agent-trace-debugger';

// 1. Create a tracer with a file exporter
const tracer = new Tracer({
  name: 'my-agent',
  exporter: new FileExporter('./traces/run.json'),
});

// 2. Start a trace
const root = tracer.startTrace('agent-run');

// 3. Record spans
const span = tracer.startSpan('think', SpanKind.REASONING);
span.setOutput('I should search the database first.');
tracer.endSpan(span);

// 4. End and export
const trace = await tracer.endTrace();

Then inspect the trace from the command line:

npx trace-debugger show <trace-id>
npx trace-debugger replay <trace-id>

Collector SDK

Tracer

The Tracer manages the lifecycle of a single trace. It creates spans, tracks parent-child relationships via AsyncLocalStorage, and exports the finished trace.

import { Tracer, StoreExporter, SQLiteTraceStore } from 'agent-trace-debugger';

const store = new SQLiteTraceStore('./traces.db');
const tracer = new Tracer({
  name: 'my-agent',
  exporter: new StoreExporter(store),
});

const root = tracer.startTrace('run-42');

// Spans are automatically parented to the current context
const llmSpan = tracer.startSpan('gpt-4o-call', SpanKind.LLM_CALL);
llmSpan.setOutput('The answer is 42.');
llmSpan.setMetadata({
  model: 'gpt-4o',
  promptTokens: 150,
  completionTokens: 12,
  cost: 0.00093,
});
tracer.endSpan(llmSpan);

const trace = await tracer.endTrace();
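The parent-child tracking described above maps naturally onto Node's AsyncLocalStorage. The sketch below is purely illustrative -- MiniSpan and the helper functions are hypothetical stand-ins, not the library's internals:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical minimal span shape, for illustration only.
interface MiniSpan { id: string; parentSpanId?: string; name: string; }

const context = new AsyncLocalStorage<MiniSpan>();
let nextId = 0;

// New spans read their parent from whatever span sits in the current async context.
function startSpan(name: string): MiniSpan {
  const parent = context.getStore();
  return { id: String(++nextId), parentSpanId: parent?.id, name };
}

// Run fn with `span` as the ambient parent for any nested startSpan() calls.
function withSpan<T>(span: MiniSpan, fn: () => T): T {
  return context.run(span, fn);
}

const root = startSpan('agent-run');
const child = withSpan(root, () => startSpan('gpt-4o-call'));
console.log(child.parentSpanId === root.id); // true
```

Because the context is async-local, spans started inside callbacks and awaited functions still find the right parent without any explicit wiring.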

SpanBuilder (Fluent API)

For lower-level control, use SpanBuilder directly:

import { SpanBuilder, SpanKind } from 'agent-trace-debugger';

const activeSpan = new SpanBuilder(traceId)
  .named('search-documents')
  .ofKind(SpanKind.TOOL_CALL)
  .withInput(JSON.stringify({ query: 'quarterly revenue' }))
  .childOf(parentSpanId)
  .start();

// ... do work ...

activeSpan.setOutput(JSON.stringify(results));
activeSpan.setMetadata({ latency: 230 });
const span = activeSpan.end();

Parallel Spans

Trace concurrent work (e.g. multiple tool calls) with isolated async contexts so nested spans parent correctly:

const branches = tracer.startParallelSpans(
  ['search-docs', 'query-db', 'call-api'],
  SpanKind.TOOL_CALL,
);

await Promise.all(
  branches.map(({ span, run }) =>
    run(async () => {
      // Any startSpan() calls here will parent under this branch's span
      const result = await doWork();
      span.setOutput(result);
      tracer.endSpan(span);
    }),
  ),
);

Exporters

| Exporter | Description |
|---|---|
| FileExporter(path) | Writes the trace as a JSON file |
| StoreExporter(store) | Saves the trace into a SQLiteTraceStore |
| HtmlExporter(path) | Writes a self-contained HTML report with collapsible span tree |
| MarkdownExporter(path) | Writes a structured Markdown document |

SDK Integration

OpenAI Adapter

Automatically instruments client.chat.completions.create to capture every LLM call and tool invocation as spans. Supports both sync and streaming responses.

import { Tracer, StoreExporter, OpenAIAdapter } from 'agent-trace-debugger';
import OpenAI from 'openai';

const client = new OpenAI();
const tracer = new Tracer({ name: 'openai-agent', exporter: new StoreExporter(store) });

const root = tracer.startTrace('chat-session');

// Instrument the client
const adapter = new OpenAIAdapter(root.traceId);
adapter.instrument(client);

// All calls are now traced automatically
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is 2+2?' }],
});
// ^ Creates an llm_call span with model, tokens, and cost metadata.
//   If the response includes tool_calls, each one becomes a child tool_call span.

// Restore original method when done
adapter.restore(client);
await tracer.endTrace();

Anthropic Adapter

Automatically instruments client.messages.create to capture LLM calls and tool_use content blocks.

import { Tracer, StoreExporter, AnthropicAdapter } from 'agent-trace-debugger';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();
const tracer = new Tracer({ name: 'claude-agent', exporter: new StoreExporter(store) });

const root = tracer.startTrace('claude-session');

// Instrument the client
const adapter = new AnthropicAdapter(root.traceId);
adapter.instrument(client);

// All calls are now traced automatically
const response = await client.messages.create({
  model: 'claude-4-sonnet',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Summarize this document.' }],
});
// ^ Creates an llm_call span with input/output tokens.
//   tool_use content blocks become child tool_call spans.

adapter.restore(client);
await tracer.endTrace();

Generic Adapter

wrapFunction() instruments any async (or sync) function without requiring an SDK-specific adapter. It creates a span around each invocation, records input/output, and attaches it to the current trace context.

import { wrapFunction, SpanKind } from 'agent-trace-debugger';
import type { SpanTracker } from 'agent-trace-debugger';

const tracker: SpanTracker = {
  traceId: root.traceId,
  onSpanEnd: (span) => finishedSpans.push(span),
};

const tracedFetch = wrapFunction(fetchDocuments, tracker, {
  name: 'fetch-documents',
  kind: SpanKind.TOOL_CALL,
  extractInput: (query) => JSON.stringify(query),
  extractOutput: (result) => JSON.stringify(result),
  extractUsage: (result) => ({ promptTokens: result.tokens }),
});

// Use it as a drop-in replacement -- spans are created automatically
const docs = await tracedFetch({ query: 'quarterly revenue' });

Streaming Support

Both adapters automatically handle streaming responses. When you pass stream: true to either SDK, the adapter accumulates chunks, reconstructs the full response, and records it as a single span.

  • OpenAI: Accumulates streamed content deltas and extracts token usage from the final chunk
  • Anthropic: Processes message_start, content_block_delta, and message_delta events to reconstruct the complete message
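Conceptually, the accumulation is a fold over the chunk stream. This sketch uses a simplified, hypothetical chunk shape (not the SDK's full streaming type):

```typescript
// Simplified OpenAI-style chunk shape, for illustration only.
interface Chunk {
  choices: { delta: { content?: string } }[];
  usage?: { prompt_tokens: number; completion_tokens: number };
}

// Fold streamed deltas into one response; usage typically arrives on the final chunk.
function accumulate(chunks: Chunk[]) {
  let content = '';
  let usage: Chunk['usage'];
  for (const chunk of chunks) {
    content += chunk.choices[0]?.delta.content ?? '';
    if (chunk.usage) usage = chunk.usage;
  }
  return { content, usage };
}

const out = accumulate([
  { choices: [{ delta: { content: 'The answer ' } }] },
  { choices: [{ delta: { content: 'is 42.' } }] },
  { choices: [], usage: { prompt_tokens: 10, completion_tokens: 5 } },
]);
console.log(out.content); // "The answer is 42."
```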

When streaming responses do not include exact token counts, the built-in estimateTokens fallback provides approximate usage metrics so cost tracking remains functional.

Tool Result Capture

Both adapters automatically close pending tool-call spans when tool results appear in a subsequent API call. You can also record a result explicitly:

// Manually record the output of a tool call span
adapter.recordToolResult(toolCallId, resultContent);

Auto-detection: When the next LLM call contains a tool_result block (Anthropic) or a tool role message (OpenAI) referencing a previously opened tool span, the adapter matches it by ID and closes the span with the result automatically -- no manual wiring required.
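The matching step can be sketched as a map of pending tool spans keyed by tool-call ID. The shapes and helper names here are hypothetical, not the adapter's actual internals:

```typescript
// Hypothetical shapes, for illustration only.
interface PendingToolSpan { toolCallId: string; output?: string; open: boolean; }
interface ToolResultMessage { role: string; tool_call_id: string; content: string; }

const pending = new Map<string, PendingToolSpan>();

// Opening a tool span registers it as awaiting a result.
function openToolSpan(toolCallId: string): void {
  pending.set(toolCallId, { toolCallId, open: true });
}

// On the next LLM call, close any pending span that a tool-result message references.
function absorbToolResults(messages: ToolResultMessage[]): void {
  for (const msg of messages) {
    if (msg.role !== 'tool') continue;
    const span = pending.get(msg.tool_call_id);
    if (span?.open) {
      span.output = msg.content;
      span.open = false;
      pending.delete(msg.tool_call_id);
    }
  }
}

openToolSpan('call_1');
absorbToolResults([{ role: 'tool', tool_call_id: 'call_1', content: '42' }]);
console.log(pending.size); // 0
```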

Trace Replay

Step through a recorded trace span-by-span using the ReplayEngine or the CLI.

Programmatic API

import { ReplayEngine } from 'agent-trace-debugger';

const engine = new ReplayEngine(trace);

let frame = engine.current();
console.log(frame.span.name);    // "agent-run"
console.log(frame.position);     // "1/5"
console.log(frame.breadcrumb);   // [{ spanId: "...", name: "agent-run" }]

frame = engine.next();           // advance to next span
frame = engine.prev();           // go back one span
frame = engine.jumpTo(spanId);   // jump to specific span
engine.toStart();                // return to first span
engine.toEnd();                  // jump to last span

CLI Replay

# Replay all steps sequentially
trace-debugger replay <trace-id>

# Jump to a specific step
trace-debugger replay <trace-id> --step 3

# Compact output (span name + position only, no input/output bodies)
trace-debugger replay <trace-id> --compact

Example output:

[1/5] agent-run  (reasoning)
  input:  "What is the capital of France?"
  output: "I should look this up."

[2/5] search-documents  (tool_call)
  input:  {"query":"capital of France"}
  output: {"result":"Paris"}

[3/5] gpt-4o-call  (llm_call)
  input:  "Given the search result, answer the question."
  output: "The capital of France is Paris."
...

The --budget flag on show and record prints a cost warning when the trace exceeds a threshold:

trace-debugger show <trace-id> --budget 0.10
trace-debugger record --file run.json --budget 0.05

Diff Analysis

This is the core feature. Compare two traces to understand exactly what changed and why.

The diff engine performs four analyses in a single analyze() call:

1. Structural Diff

LCS-based alignment detects added, removed, and reordered spans between two runs.

import { analyze } from 'agent-trace-debugger';

const result = analyze(traceA, traceB);

console.log(result.structuralDiff.added);     // Span[] - spans only in B
console.log(result.structuralDiff.removed);   // Span[] - spans only in A
console.log(result.structuralDiff.reordered); // [Span, Span][] - position changes
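To illustrate the idea (not the engine's actual code), here is a minimal LCS alignment over span-name sequences: names in both runs are matched, names only in A are "removed", and names only in B are "added":

```typescript
// Classic suffix-based LCS table, then a traceback that buckets each name.
function alignNames(a: string[], b: string[]) {
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const matched: string[] = [];
  const removed: string[] = []; // only in A
  const added: string[] = [];   // only in B
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { matched.push(a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) removed.push(a[i++]);
    else added.push(b[j++]);
  }
  removed.push(...a.slice(i));
  added.push(...b.slice(j));
  return { matched, removed, added };
}

const diff = alignNames(['plan', 'search', 'answer'], ['plan', 'fetch', 'answer']);
console.log(diff); // { matched: ['plan', 'answer'], removed: ['search'], added: ['fetch'] }
```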

2. Content Diff

For each matched span pair, computes line-level diffs of input and output, classifying changes as minor (whitespace only) or major (substantive).

for (const diff of result.contentDiffs) {
  console.log(diff.spanName);      // "chat.completions.create"
  console.log(diff.significance);  // "major" | "minor"
  console.log(diff.inputDiff);     // unified diff string
  console.log(diff.outputDiff);    // unified diff string
}
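One plausible sketch of the minor/major rule, assuming "whitespace only" means the two strings are identical after collapsing whitespace (the real classifier may be more nuanced):

```typescript
// Whitespace-only changes are "minor"; anything else is a "major" substantive change.
function significance(before: string, after: string): 'minor' | 'major' {
  const normalize = (s: string) => s.replace(/\s+/g, ' ').trim();
  return normalize(before) === normalize(after) ? 'minor' : 'major';
}

console.log(significance('a  b', 'a b'));            // "minor"
console.log(significance('answer: 4', 'answer: 5')); // "major"
```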

3. Metrics Diff

Per-span and trace-level deltas for cost, latency, and token usage.

const metrics = result.metricsSummary;

console.log(metrics.costChangePercent);     // -12.5 (percent)
console.log(metrics.latencyChangePercent);  // +8.3
console.log(metrics.totalCostA);            // 0.0234
console.log(metrics.totalCostB);            // 0.0205

for (const s of metrics.perSpan) {
  console.log(s.spanName, s.costDelta, s.latencyDelta, s.tokenDelta);
}

4. Divergence Finder

Walks matched pairs chronologically to find the first point of divergence, classifying it as:

  • llm_nondeterminism -- same input, different output (the LLM chose differently)
  • input_change -- different input (caused by an upstream change)

if (result.divergencePoint) {
  console.log(result.divergencePoint.type);             // "llm_nondeterminism"
  console.log(result.divergencePoint.cause);            // human-readable explanation
  console.log(result.divergencePoint.downstreamImpact); // 3 subsequent spans affected
}
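The classification rule can be sketched as a walk over matched span pairs; the pair shape and function below are hypothetical, not the library's findDivergencePoint:

```typescript
// Hypothetical matched-pair shape, for illustration only.
interface MatchedPair {
  name: string;
  inputA: string; outputA: string;
  inputB: string; outputB: string;
}

// The first pair whose outputs differ is the divergence point. Equal inputs
// mean the model itself diverged; differing inputs mean an upstream change.
function findDivergenceSketch(pairs: MatchedPair[]) {
  for (let i = 0; i < pairs.length; i++) {
    const p = pairs[i];
    if (p.outputA === p.outputB) continue; // runs still in lockstep
    return {
      span: p.name,
      type: p.inputA === p.inputB ? 'llm_nondeterminism' : 'input_change',
      downstreamImpact: pairs.length - i - 1, // subsequent matched spans
    };
  }
  return undefined; // runs never diverged
}

const d = findDivergenceSketch([
  { name: 'plan', inputA: 'q', outputA: 'search first', inputB: 'q', outputB: 'answer directly' },
  { name: 'answer', inputA: 'search first', outputA: 'Paris', inputB: 'answer directly', outputB: 'Paris?' },
]);
console.log(d?.type); // "llm_nondeterminism"
```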

Summary

Every diff result includes a human-readable summary string:

console.log(result.summary);
// "Structural changes: 1 added, 1 removed spans. 2 span(s) have major content
//  differences. Divergence at "plan": llm_nondeterminism, affecting 3 downstream
//  span(s). Cost decreased by 12.5%."

CLI

# Full diff between two traces
trace-debugger diff <trace-id-1> <trace-id-2>

# Short summary only
trace-debugger diff <trace-id-1> <trace-id-2> --format short

Multi-Trace Compare

Compare two or more traces at once to identify the cheapest, fastest, and overall best run.

import { analyze, generateCompareSummary } from 'agent-trace-debugger';

// Generate pairwise diffs between adjacent traces
const diffs = [
  analyze(traceA, traceB),
  analyze(traceB, traceC),
];

const summary = generateCompareSummary(diffs);

console.log(summary.runs[summary.cheapest].totalCost);     // lowest cost
console.log(summary.runs[summary.fastest].totalLatency);    // lowest latency
console.log(summary.runs[summary.bestRun].score);           // best composite score
console.log(summary.text);                                  // formatted comparison table

# CLI: compare 2 or more traces
trace-debugger compare <id1> <id2> [id3 ...]

Example output:

=== Trace Comparison Summary ===

Run | Cost       | Latency (ms) | Score
--- | ---------- | ------------ | -----
  0 | $0.0234    |         2340 | 0.026740
  1 | $0.0205    |         1890 | 0.022390
  2 | $0.0310    |         3100 | 0.034100

Cheapest: Run 1 ($0.0205)
Fastest:  Run 1 (1890ms)
Most different: Run 2
Best run: Run 1 (lowest combined score)

Search & Stats

Full-Text Search

SQLite FTS5-powered search across span names, inputs, and outputs.

const store = new SQLiteTraceStore('./traces.db');

// Search all spans
const spans = store.searchSpans('quarterly revenue');

// Search within a specific trace
const spansInTrace = store.searchSpans('error', traceId);

# CLI search
trace-debugger search "quarterly revenue"
trace-debugger search "error" --kind llm_call --limit 20

Aggregated Statistics

import { aggregateStats } from 'agent-trace-debugger';

const stats = aggregateStats(store, { since: '2026-01-01', model: 'gpt-4o' });

console.log(stats.count);        // 47 traces
console.log(stats.totalCost);    // 1.23
console.log(stats.avgCost);      // 0.026
console.log(stats.totalTokens);  // 142000
console.log(stats.avgLatency);   // 2340 ms

for (const [model, data] of stats.modelBreakdown) {
  console.log(model, data.count, data.cost, data.tokens);
}

# CLI stats
trace-debugger stats
trace-debugger stats --since 2026-01-01 --model gpt-4o

Export Formats

Export any recorded trace to HTML, Markdown, or JSON for sharing and reporting.

# Export to self-contained HTML
trace-debugger export <trace-id> --format html --output report.html

# Export to Markdown (for GitHub issues/PRs)
trace-debugger export <trace-id> --format markdown --output trace.md

# Export to JSON (default)
trace-debugger export <trace-id> --format json --output trace.json

# Print to stdout (omit --output)
trace-debugger export <trace-id> --format json

| Format | Description |
|---|---|
| html | Self-contained HTML file with embedded CSS, collapsible span tree, and color-coded span kinds |
| markdown | Structured Markdown document suited for pasting into GitHub issues or pull requests |
| json | Raw JSON trace data |

Tags & Annotations

Tags

Attach arbitrary string tags to traces to group, filter, and organize runs.

// Attach tags at trace creation
const root = tracer.startTrace('agent-run', { tags: ['production', 'gpt-4o'] });

// Manage tags after the fact
store.addTag(traceId, 'reviewed');
store.removeTag(traceId, 'production');

// List all known tags with counts
const allTags = store.listTagsWithCounts();
// [{ tag: "production", count: 12 }, { tag: "reviewed", count: 5 }]

// Filter traces by tag
const traces = store.listTraces({ tags: ['reviewed'] });

# CLI
trace-debugger tag <trace-id> production
trace-debugger untag <trace-id> production
trace-debugger tags

Annotations

Attach free-text notes to any span for post-hoc commentary, review, or debugging context.

// Add an annotation
store.addAnnotation(traceId, spanId, 'This retried due to rate limit', 'alice', new Date().toISOString());

// Retrieve annotations for a span
const annotations = store.getAnnotations(spanId);
// [{ text: "This retried due to rate limit", author: "alice", timestamp: "..." }]

# CLI
trace-debugger annotate <trace-id> <span-id> "This span retried twice due to a rate limit."
trace-debugger annotate <trace-id> <span-id> "Looks correct" --author bob

Annotations are stored alongside the span and displayed in trace show output.

Cost Alerts

Monitor cumulative spend across traces and trigger callbacks when thresholds are crossed.

import { CostAlert, Tracer, StoreExporter, SQLiteTraceStore } from 'agent-trace-debugger';

const store = new SQLiteTraceStore('./traces.db');

const alert = new CostAlert({
  budget: 5.00,               // total budget in dollars
  warningThreshold: 0.8,      // fire onWarning at 80% of budget (default)
  onWarning: (cumulative, budget) =>
    console.warn(`Cost warning: $${cumulative.toFixed(4)} of $${budget.toFixed(2)} budget`),
  onExceeded: (cumulative, budget) =>
    console.error(`Budget exceeded: $${cumulative.toFixed(4)} > $${budget.toFixed(2)}`),
});

// Pass the alert to Tracer -- it auto-checks after every endTrace()
const tracer = new Tracer({
  name: 'my-agent',
  exporter: new StoreExporter(store),
  costAlert: alert,
});

const root = tracer.startTrace('agent-run');
// ... record spans ...
await tracer.endTrace(); // CostAlert.check() is called automatically

// Query remaining budget programmatically
console.log(alert.getCumulativeCost());   // 3.42
console.log(alert.getRemainingBudget());  // 1.58

Maintenance

Keep the trace store lean by removing traces you no longer need and optimizing the database.

# Delete traces older than 30 days
trace-debugger cleanup --older-than 30d

# Delete all traces with error status
trace-debugger cleanup --status error

# Combine filters
trace-debugger cleanup --older-than 7d --status error

# Reclaim disk space after deletions (SQLite VACUUM)
trace-debugger vacuum
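A duration like 30d reduces to a cutoff timestamp. The parser below is a sketch under assumed rules (only d and h suffixes shown); the CLI's actual accepted formats may differ:

```typescript
// Parse "30d" / "12h" into the Date before which traces should be deleted.
function cutoffFor(duration: string, now: Date = new Date()): Date {
  const m = /^(\d+)([dh])$/.exec(duration);
  if (!m) throw new Error(`unsupported duration: ${duration}`);
  const n = Number(m[1]);
  const ms = m[2] === 'd' ? n * 86_400_000 : n * 3_600_000;
  return new Date(now.getTime() - ms);
}

const cutoff = cutoffFor('30d', new Date('2026-02-01T00:00:00Z'));
console.log(cutoff.toISOString()); // "2026-01-02T00:00:00.000Z"
```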

Monitor database size programmatically:

const store = new SQLiteTraceStore('./traces.db');
const bytes = store.getStorageSize();
console.log(`Store is ${(bytes / 1024 / 1024).toFixed(2)} MB`);

Examples

The examples/ directory contains three ready-to-run scripts:

| Script | Description |
|---|---|
| basic-usage.ts | Creates a tracer, records a few spans manually, and prints the finished trace |
| openai-agent.ts | Instruments an OpenAI client with OpenAIAdapter, runs a tool-use loop, and saves to SQLite |
| diff-two-runs.ts | Loads two traces from the store, calls analyze(), and prints the divergence report |

Run any example:

npx tsx examples/basic-usage.ts
npx tsx examples/openai-agent.ts
npx tsx examples/diff-two-runs.ts

Supported Models

Built-in cost calculation for the following models (per 1K tokens):

| Model | Prompt | Completion |
|---|---|---|
| claude-4-opus | $0.0150 | $0.0750 |
| claude-4-sonnet | $0.0030 | $0.0150 |
| claude-3.5-sonnet | $0.0030 | $0.0150 |
| claude-3.5-haiku | $0.0010 | $0.0050 |
| claude-3-opus | $0.0150 | $0.0750 |
| claude-3-sonnet | $0.0030 | $0.0150 |
| claude-3-haiku | $0.00025 | $0.00125 |
| gpt-4o | $0.0050 | $0.0150 |
| gpt-4o-mini | $0.00015 | $0.00060 |
| gpt-4-turbo | $0.0100 | $0.0300 |
| gemini-1.5-pro | $0.0035 | $0.0105 |
| gemini-1.5-flash | $0.000075 | $0.00030 |
| gemini-2.0-flash | $0.00010 | $0.00040 |
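Per-1K pricing reduces to a simple formula. Using the gpt-4o rates from the table above (the helper itself is a sketch, not the library's calculateCost):

```typescript
// cost = promptTokens/1000 * promptRate + completionTokens/1000 * completionRate
function cost(
  promptTokens: number,
  completionTokens: number,
  promptRate: number,
  completionRate: number,
): number {
  return (promptTokens / 1000) * promptRate + (completionTokens / 1000) * completionRate;
}

// gpt-4o: $0.0050 prompt / $0.0150 completion per 1K tokens
console.log(cost(150, 12, 0.005, 0.015).toFixed(5)); // "0.00093"
```

This matches the cost recorded in the Tracer example earlier (150 prompt tokens, 12 completion tokens on gpt-4o).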

Custom models work with all features except automatic cost calculation. Token estimation uses model-aware character ratios (GPT ~4 chars/token, Claude ~3.5 chars/token) with a word-based fallback, returning the higher of the two estimates.
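A sketch of that estimation strategy, using the ratios stated above; the function is illustrative, not the library's estimateTokensForModel:

```typescript
// Model-aware char ratio (GPT ~4 chars/token, Claude ~3.5 chars/token) with a
// word-based fallback (~1.3 tokens/word); return the higher of the two estimates.
function estimateTokensSketch(text: string, model: string): number {
  const charsPerToken = model.startsWith('claude') ? 3.5 : 4;
  const byChars = Math.ceil(text.length / charsPerToken);
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const byWords = Math.ceil(words * 1.3);
  return Math.max(byChars, byWords);
}

console.log(estimateTokensSketch('The capital of France is Paris.', 'gpt-4o')); // 8
```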

API Reference

Schema

| Export | Description |
|---|---|
| Trace | Trace type: id, name, startTime, endTime, rootSpanId, metadata, spans, tags |
| Span | Span type: id, traceId, parentSpanId, name, kind, status, startTime, endTime, input, output, metadata, children, annotations |
| SpanKind | Enum: llm_call, tool_call, reasoning, decision, custom |
| SpanStatus | Enum: ok, error |
| SpanMetadata | model, promptTokens, completionTokens, cost, latency, plus arbitrary keys |
| SpanAnnotation | text, author, timestamp |
| TraceFilter | Filter for listTraces: since, until, model, status, name, tags, limit, offset |

Collector

| Export | Description |
|---|---|
| Tracer | Main tracer class. startTrace(name?, { tags? }), startSpan(name, kind), endSpan(span), endTrace(), startParallelSpans(names, kind), getContext() |
| SpanBuilder | Fluent builder. named(), ofKind(), withInput(), childOf(), start() |
| ActiveSpan | Live span handle. setOutput(), setMetadata(), end(), getSpan(), plus id and traceId getters |
| TraceContext | Async context manager. currentSpan(), withSpan(span, fn) |
| CostAlert | Budget monitor. Constructor: { budget, warningThreshold?, onWarning?, onExceeded? }. Methods: check(trace), getCumulativeCost(), getRemainingBudget() |

Adapters

| Export | Description |
|---|---|
| OpenAIAdapter | Instruments client.chat.completions.create. Constructor: new OpenAIAdapter(traceId). Methods: instrument(client), restore(client), recordToolResult(id, result) |
| AnthropicAdapter | Instruments client.messages.create. Constructor: new AnthropicAdapter(traceId). Methods: instrument(client), restore(client), recordToolResult(id, result) |
| wrapFunction(fn, tracker, options?) | Generic adapter. Wraps any function as a traced span. Options: name, kind, extractInput, extractOutput, extractUsage |
| SpanTracker | Interface for wrapFunction: { traceId, onSpanEnd(span) } |

Exporters

| Export | Description |
|---|---|
| Exporter | Interface: export(trace, spans): Promise<void> |
| FileExporter | Writes trace as JSON to a file path |
| StoreExporter | Saves trace into a TraceStore |
| HtmlExporter | Writes self-contained HTML with collapsible span tree |
| MarkdownExporter | Writes structured Markdown document |

Store

| Export | Description |
|---|---|
| TraceStore | Interface: saveTrace(), getTrace(), listTraces(), searchSpans(), deleteTrace(), getStats(), close() |
| SQLiteTraceStore | Full implementation with additional methods: addTag(), removeTag(), listTags(), listTagsWithCounts(), addAnnotation(), getAnnotations(), cleanup(options), vacuum(), getStorageSize() |
| CleanupOptions | { olderThanDays?, status? } |

Diff

| Export | Description |
|---|---|
| analyze(traceA, traceB) | Returns DiffResult with structural, content, metrics diffs and divergence point |
| DiffResult | { structuralDiff, contentDiffs, metricsSummary, divergencePoint?, summary } |
| StructuralDiff | { matched, added, removed, reordered } |
| ContentDiff | Per-span input/output diffs with significance (major or minor) |
| MetricsSummary | { totalCostA, totalCostB, costChangePercent, totalLatencyA, totalLatencyB, latencyChangePercent, perSpan } |
| DivergencePoint | { type, spanA, spanB, cause, downstreamImpact } |
| generateCompareSummary(diffs) | Returns CompareSummary with runs, cheapest, fastest, mostDifferent, bestRun, text |
| alignTraces(spansA, spansB) | LCS-based span alignment returning matched, unmatchedA, unmatchedB |
| findDivergencePoint(matched) | Locates the first divergence in matched span pairs |

Replay

| Export | Description |
|---|---|
| ReplayEngine | Constructor: new ReplayEngine(trace). Methods: current(), next(), prev(), jumpTo(spanId), toStart(), toEnd(). Property: total |
| ReplayFrame | { span, position, depth, breadcrumb } |
| ReplayCursor | Internal cursor tracking position state |

Stats

| Export | Description |
|---|---|
| aggregateStats(store, filter?) | Returns AggregatedStats: { count, totalCost, avgCost, totalTokens, avgLatency, modelBreakdown } |
| StatsFilter | { since?, model? } |
| ModelStats | { count, cost, tokens } |

Utilities

| Export | Description |
|---|---|
| calculateCost(model, promptTokens, completionTokens) | Returns cost in dollars or null for unsupported models |
| getSupportedModels() | Returns array of model names with built-in pricing |
| estimateTokens(text) | Character-based token estimate (~4 chars/token) |
| estimateTokensByWords(text) | Word-based token estimate (~1.3 tokens/word) |
| estimateTokensForModel(text, model) | Model-aware estimate returning the higher of char-based and word-based |
| generateTraceId() | Generate a unique trace ID |
| generateSpanId() | Generate a unique span ID |

CLI Commands

trace-debugger list       [--since <date>] [--model <model>] [--status <status>] [--limit <n>]
trace-debugger show       <id> [--budget <dollars>]
trace-debugger record     --file <path> [--budget <dollars>]
trace-debugger replay     <id> [--step <n>] [--compact]
trace-debugger diff       <id1> <id2> [--format short|full]
trace-debugger compare    <id1> <id2> [ids...] [--format short|full]
trace-debugger search     <query> [--kind <kind>] [--limit <n>]
trace-debugger stats      [--since <date>] [--model <model>]
trace-debugger export     <id> [--format json|html|markdown] [--output <path>]
trace-debugger tag        <id> <tag>
trace-debugger untag      <id> <tag>
trace-debugger tags
trace-debugger annotate   <trace-id> <span-id> "<text>" [--author <name>]
trace-debugger cleanup    [--older-than <duration>] [--status <status>]
trace-debugger vacuum

Global option: --db <path> to specify a custom database file path.

License

MIT