@codepawl/tracepawl
v0.2.1
Failure diagnosis and replay for coding agents.
TracePawl
Failure diagnosis for failed coding-agent runs.
TracePawl is a postmortem engine for autonomous coding agents. It takes a normalized JSON trace of a failed run, identifies the likely failure category, pinpoints where execution started to drift, surfaces the supporting evidence, and suggests a recovery action. It is the first focused product in the CodePawl stack.
What TracePawl is not
- Not a generic observability dashboard
- Not a LangSmith / Langfuse clone
- Not a multi-agent runtime
- Not a hosted or adapter-based trace collector
- Not an LLM-as-judge service
v0 flow
coding-agent trace JSON → parser → rule-based analyzer → FailureReport → terminal report
The analyzer is deterministic: no LLM, no network. Rules live in src/analyzer/rules/; the schema is defined in src/schema/.
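One way to picture a deterministic, rule-based analyzer is as a list of pure functions run over the parsed events, with the most confident finding kept. The sketch below is illustrative only: `TraceEventLike`, `Finding`, `Rule`, `staleEditRule`, and `runRules` are hypothetical names, and none of it is taken from src/analyzer/rules/ or src/schema/.

```typescript
// Hypothetical, simplified shapes; the real schema lives in src/schema/.
interface TraceEventLike {
  id: string;
  type: string; // e.g. "file_read" or "file_edit"
  path?: string;
  ok: boolean;
}

interface Finding {
  category: string;
  onset: string; // id of the event where the failure starts
  confidence: number; // between 0 and 1
}

// A rule is a pure function of the trace: same input, same output.
type Rule = (events: readonly TraceEventLike[]) => Finding | null;

// Illustrative rule: the first failed file_edit that follows a file_read of the same path.
const staleEditRule: Rule = (events) => {
  const hit = events.find(
    (e, i) =>
      e.type === "file_edit" &&
      !e.ok &&
      events.slice(0, i).some((p) => p.type === "file_read" && p.path === e.path),
  );
  // 0.85 is an arbitrary illustrative value, not a documented constant.
  return hit ? { category: "stale_context_edit", onset: hit.id, confidence: 0.85 } : null;
};

// Run every rule and keep the highest-confidence finding (on ties, the earlier rule wins).
function runRules(events: readonly TraceEventLike[], rules: readonly Rule[]): Finding | null {
  let best: Finding | null = null;
  for (const rule of rules) {
    const finding = rule(events);
    if (finding !== null && (best === null || finding.confidence > best.confidence)) {
      best = finding;
    }
  }
  return best;
}

const demoEvents: TraceEventLike[] = [
  { id: "evt_001", type: "file_read", path: "src/paginate.ts", ok: true },
  { id: "evt_002", type: "file_edit", path: "src/paginate.ts", ok: false },
];
const demoFinding = runRules(demoEvents, [staleEditRule]);
console.log(demoFinding);
```

Because each rule only reads its input and returns a value, running the same trace twice yields the same finding, which is the property the deterministic design depends on.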
Quickstart
pnpm install
pnpm build
node dist/cli.js analyze examples/stale-context-edit.json
Sample output
TracePawl Failure Report
========================
Failure: stale_context_edit
Summary:
Edit to `src/paginate.ts` failed because the agent's snippet did not match current file content (2 attempts).
Root cause:
The agent read `src/paginate.ts` earlier in the run, then attempted to edit it using that cached snippet. Something changed the file (or the agent's snippet was inaccurate to begin with), so the `old_string` anchor no longer appears verbatim. Retrying with the same stale snippet cannot succeed.
Failure onset: evt_006
Evidence:
- [evt_006, evt_008] Failed `file_edit` event(s) whose `old_string` did not match current file content — a stale-context signal.
- [evt_002] Prior `file_read` event(s) for the same path — the agent's edit context likely went stale between the read and the failed edit.
Contradicting evidence:
None
Suggested recovery:
Action: re_read_file
Re-read the file from disk, locate the intended target by current line content, and retry the edit with a narrower, freshly-anchored patch.
Parameters: {"path":"src/paginate.ts"}
Confidence: 0.85
Related events: evt_006, evt_008, evt_002
Trace ID: run_stale_context_edit_001
v0 failure categories
| Category | What it catches |
|---|---|
| stale_context_edit | Agent edited a file using outdated context; old_string doesn't match current content. |
| tool_misuse | Tool called with invalid arguments, missing required fields, or violated preconditions. |
| loop_or_stall | Same tool call (or failing command) repeated ≥3 times with identical arguments. |
| test_failure_misdiagnosis | Failed test points at one file; agent edits unrelated files or silences the assertion. |
| unsafe_or_broad_edit | Narrow user request produced edits spanning many files, directories, or lines. |
See docs/FAILURE_CATEGORIES.md for the long form.
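To make the loop_or_stall row concrete, here is a minimal sketch of detecting three or more identical tool calls. It assumes the repeats are consecutive and that arguments are compared by their top-level keys; `ToolCallLike`, `callKey`, and `findRepeatedCalls` are hypothetical names, and the actual rule may use different criteria.

```typescript
// Hypothetical tool-call shape; the real TraceEvent union is described in docs/TRACE_SCHEMA.md.
interface ToolCallLike {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

// Stable key for a call: tool name plus arguments with top-level keys sorted,
// so { a: 1, b: 2 } and { b: 2, a: 1 } compare as identical.
function callKey(call: ToolCallLike): string {
  const sortedArgs = Object.keys(call.args)
    .sort()
    .map((k) => [k, call.args[k]]);
  return call.tool + ":" + JSON.stringify(sortedArgs);
}

// Returns the ids of the first `threshold` consecutive identical calls, or null if none.
function findRepeatedCalls(calls: readonly ToolCallLike[], threshold = 3): string[] | null {
  let run: string[] = [];
  let lastKey: string | null = null;
  for (const call of calls) {
    const key = callKey(call);
    run = key === lastKey ? [...run, call.id] : [call.id];
    lastKey = key;
    if (run.length >= threshold) return run;
  }
  return null;
}

const loopDemo: ToolCallLike[] = [
  { id: "evt_001", tool: "shell", args: { cmd: "pnpm test" } },
  { id: "evt_002", tool: "shell", args: { cmd: "pnpm test" } },
  { id: "evt_003", tool: "shell", args: { cmd: "pnpm test" } },
];
console.log(findRepeatedCalls(loopDemo));
```

The returned ids play the same role as the event ids listed under Evidence in the sample report: they point at the exact calls that triggered the diagnosis.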
Try the other fixtures
node dist/cli.js analyze examples/tool-misuse.json
node dist/cli.js analyze examples/loop-or-stall.json
node dist/cli.js analyze examples/test-failure-misdiagnosis.json
node dist/cli.js analyze examples/unsafe-broad-edit.json
Recording a real run
TracePawl can wrap a local command and write a trace that the analyzer can read:
tracepawl record --output .tracepawl/runs/latest.json -- pnpm test
tracepawl analyze .tracepawl/runs/latest.json
When running from this repository before publishing or installing the package, use the built CLI directly:
pnpm build
node dist/cli.js record --output .tracepawl/runs/latest.json -- pnpm test
node dist/cli.js analyze .tracepawl/runs/latest.json
See docs/RECORDER.md for output-file behavior, latest.json, best-effort git_diff capture, exit codes, signal handling, and current recorder limits.
Current limitations
- No dashboard or hosted service.
- No external integrations (LangSmith, Langfuse, OpenTelemetry, Claude Code adapter, OpenCode adapter).
- No runtime adapters yet (Claude Code, OpenCode, OpenTelemetry, LangSmith, Langfuse). Build traces via tracepawl record, the TraceWriter SDK (see docs/SDK.md), or externally produced JSON conforming to docs/TRACE_SCHEMA.md.
- No LLM-based diagnosis. Rules are deterministic (see .tracepawl/active/DECISIONS.md D003).
- No replay engine. Replay-lite and full replay are post-v0.
Recording a trace with the SDK
The TraceWriter SDK lets you record events from an agent runtime — no hand-authored JSON required. It owns event IDs and ISO timestamps, validates constructor inputs, and writes JSON that round-trips cleanly through the parser.
import { TraceWriter, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";
const writer = new TraceWriter({ agent: "my-agent", userGoal: "Fix paginate()" });
writer.recordFileRead({ path: "src/paginate.ts" });
writer.recordFileEdit({
  path: "src/paginate.ts",
  oldString: "items.slice(start, end - 1)",
  newString: "items.slice(start, end)",
  applied: false,
  error: "old_string not found in file",
});
writer.finalize();
console.log(formatTerminalReport(analyzeTrace(writer.toJSON())));
See docs/SDK.md for the full API reference, ID/timestamp contracts, and common patterns. A runnable demo lives at examples/sdk/record-failed-run.ts:
pnpm tsx examples/sdk/record-failed-run.ts
Library usage
import { parseTraceFile, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";
const trace = await parseTraceFile("examples/stale-context-edit.json");
const report = analyzeTrace(trace);
console.log(formatTerminalReport(report));
analyzeTrace(trace) returns a FailureReport; see src/schema/failure.ts for the full shape.
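A caller might act on a report programmatically instead of printing it. The sketch below assumes a simplified `ReportLike` shape whose fields mirror what the terminal report prints (category, confidence, recovery action, and parameters); these field names, the `nextStep` helper, and the `escalate_to_human` fallback are all hypothetical, so check src/schema/failure.ts before relying on any of them.

```typescript
// Hypothetical subset of a report, modeled on the fields shown in the terminal
// output; the authoritative shape is in src/schema/failure.ts.
interface ReportLike {
  category: string;
  confidence: number;
  recovery: { action: string; parameters: Record<string, unknown> };
}

// Decide what an orchestrator might do next: act only on confident diagnoses.
function nextStep(report: ReportLike, minConfidence = 0.8): string {
  if (report.confidence < minConfidence) {
    return "escalate_to_human"; // hypothetical fallback, not a TracePawl action
  }
  if (report.recovery.action === "re_read_file") {
    return `re-read ${String(report.recovery.parameters["path"])}`;
  }
  return `apply ${report.recovery.action}`;
}

// Values copied from the sample output earlier in this README.
const sampleReport: ReportLike = {
  category: "stale_context_edit",
  confidence: 0.85,
  recovery: { action: "re_read_file", parameters: { path: "src/paginate.ts" } },
};
console.log(nextStep(sampleReport));
```

Gating on confidence keeps automated recovery conservative: a low-confidence diagnosis is handed off instead of being acted on.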
Development
pnpm typecheck
pnpm test
pnpm lint
pnpm build
pnpm check runs the same typecheck + lint + format-check gate that CI enforces.
Project status
v0 CLI analyzer and local recorder are functional. Five example traces are included — one per category — and all five resolve to their real failure category at confidence ≥ 0.80. Diagnosis is rule-based and deterministic.
Docs
- docs/RECORDER.md: local command recorder guide.
- docs/SDK.md: TraceWriter producer-side reference.
- docs/FAILURE_CATEGORIES.md: the five v0 categories in depth.
- docs/TRACE_SCHEMA.md: TraceEvent union and FailureReport shape, for adapter authors.
License
MIT © An Nguyen
