@codepawl/tracepawl

v0.5.0

Published

a month ago

Failure diagnosis and replay for coding agents.

nxank4

typescript cli codepawl coding-agent failure-diagnosis trace postmortem

TracePawl

Failure diagnosis and visual postmortems for failed coding-agent runs.

TracePawl is a postmortem engine for autonomous coding agents. It takes a normalized JSON trace of a failed run, identifies the likely failure category, pinpoints where execution started to drift, surfaces the supporting evidence, and suggests a recovery action. It is the first focused product in the CodePawl stack.

What TracePawl is not

Not a generic observability dashboard
Not a LangSmith / Langfuse clone
Not a multi-agent runtime
Not a hosted or adapter-based trace collector
Not an LLM-as-judge service

Developer Preview flow

generic JSONL → strict validation → normalized TraceRun → Markdown postmortem → visual JSON → local demo HTML

The analyzer is deterministic: no LLM, no network. Rules live in src/analyzer/rules/. The event protocol is documented in docs/protocol/event-protocol-v0.md, with schema and profile registry in schemas/ and profiles/.
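As a rough illustration of the "strict validation" step in the flow above, here is a minimal sketch of line-by-line JSONL parsing that rejects the whole input on the first malformed line. The `JsonlEvent` shape (`id`, `type`, `timestamp`) and the `parseJsonlStrict` helper are assumptions made for this example; the actual required fields are defined in docs/protocol/event-protocol-v0.md and may differ.

```typescript
// Hypothetical minimal event shape, for illustration only. The real fields
// are defined by docs/protocol/event-protocol-v0.md and may differ.
interface JsonlEvent {
  id: string;
  type: string;
  timestamp: string;
}

type ParseResult =
  | { ok: true; events: JsonlEvent[] }
  | { ok: false; line: number; error: string };

// Strict parsing: the first malformed line rejects the whole input instead of
// being skipped, so a partially imported trace never reaches the analyzer.
function parseJsonlStrict(input: string): ParseResult {
  const events: JsonlEvent[] = [];
  const lines = input.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const raw = (lines[i] ?? "").trim();
    if (raw === "") continue; // blank lines are tolerated here (an assumption)
    let value: unknown;
    try {
      value = JSON.parse(raw);
    } catch {
      return { ok: false, line: i + 1, error: "invalid JSON" };
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return { ok: false, line: i + 1, error: "event must be a JSON object" };
    }
    const record = value as Record<string, unknown>;
    for (const key of ["id", "type", "timestamp"]) {
      if (typeof record[key] !== "string" || record[key] === "") {
        return { ok: false, line: i + 1, error: `missing or non-string "${key}"` };
      }
    }
    events.push({
      id: record["id"] as string,
      type: record["type"] as string,
      timestamp: record["timestamp"] as string,
    });
  }
  return { ok: true, events };
}
```

Reporting the 1-based line number with each error keeps hand-written JSONL easy to fix, which is the main reason to fail fast rather than skip bad lines.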

Quickstart

After installing the package, run:

tracepawl demo --open

When working from this repository, build first and use the local CLI:

bun install
bun run build
node dist/cli.js demo --open

The demo writes:

.tracepawl/runs/failing-run.trace.json
.tracepawl/reports/failing-run.md
.tracepawl/visual/failing-run.visual.json
.tracepawl/visual/failing-run.html

To run the Developer Preview path manually:

tracepawl import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
tracepawl analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html
tracepawl demo --open

From this repository without installing the package globally, use the built CLI:

bun run build
node dist/cli.js import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
node dist/cli.js analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html

Developer Preview scope

The public Developer Preview includes:

strict generic JSONL import for TracePawl event protocol v0
normalized TraceRun JSON output
deterministic Markdown postmortems
visual postmortem JSON
static local HTML viewer
local command recorder and existing analyzer fixtures

Screenshots are not committed yet. Use .tracepawl/visual/failing-run.html from tracepawl demo --open as the screenshot/video capture target; future assets should live under docs/assets/.

OpenCode-compatible preview

TracePawl includes a local, sanitized OpenCode-style demo export. It is not an official OpenCode partnership or hosted integration; it is a deterministic import path for compatible JSON session/export data.

tracepawl import --source opencode examples/importers/opencode-demo-session.json --out .tracepawl/runs/opencode-demo.trace.json
tracepawl analyze .tracepawl/runs/opencode-demo.trace.json --out .tracepawl/reports/opencode-demo.md
tracepawl visualize .tracepawl/runs/opencode-demo.trace.json --format html --out .tracepawl/visual/opencode-demo.html

The demo shows a coding-agent session with a user goal, file reads, a formatter edit, shell test commands, repeated validation failure, and a handoff. The resulting postmortem classifies the run as test_failure_misdiagnosis, links evidence to command/test/file events, marks the failure onset, and suggests a local recovery action.

From a built checkout, generate all three artifacts at once:

node dist/cli.js demo --open

OpenCode-compatible real demo

For screenshots or video, TracePawl also includes a local OpenCode-compatible sandbox harness. It prepares a disposable JavaScript repo with a deterministic failing test, captures git/test/session artifacts, builds an OpenCode-style JSON session artifact, and renders TracePawl outputs locally.

This is separate from the fixture demo above. It is still local-first and is not an official OpenCode integration or partnership. The real scenario requires your own local OpenCode-compatible CLI setup:

bun run build
scripts/demo/opencode-real-scenario.sh prepare
# run the printed OpenCode command, then:
scripts/demo/opencode-real-scenario.sh resume

When the final captured npm test run passes, TracePawl reports initial_test_failure_resolved, covering the initial failing test, the focused src/eventValidator.js edit, the passing verification, and a review-and-commit next step. If the final test still fails, the report shows the captured failure diagnosis and recovery action instead.

For a deterministic smoke test that does not require OpenCode:

scripts/demo/opencode-real-scenario.sh dry-run

See docs/demo/opencode-real-scenario.md for prerequisites, resume flow, artifact paths, and screenshot/video checklist.

Sample output

TracePawl Failure Report
========================

Failure: stale_context_edit

Summary:
  Edit to `src/paginate.ts` failed because the agent's snippet did not match current file content (2 attempts).

Root cause:
  The agent read `src/paginate.ts` earlier in the run, then attempted to edit it using that cached snippet. Something changed the file (or the agent's snippet was inaccurate to begin with), so the `old_string` anchor no longer appears verbatim. Retrying with the same stale snippet cannot succeed.

Failure onset: evt_006

Evidence:
  - [evt_006, evt_008] Failed `file_edit` event(s) whose `old_string` did not match current file content — a stale-context signal.
  - [evt_002] Prior `file_read` event(s) for the same path — the agent's edit context likely went stale between the read and the failed edit.

Contradicting evidence:
  None

Suggested recovery:
  Action: re_read_file
  Re-read the file from disk, locate the intended target by current line content, and retry the edit with a narrower, freshly-anchored patch.
  Parameters: {"path":"src/paginate.ts"}

Confidence: 0.85

Related events: evt_006, evt_008, evt_002

Trace ID: run_stale_context_edit_001
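
The evidence in the report above (failed edits on a path that was read earlier in the run) can be sketched as a small deterministic check. This is only an illustration of the idea, not the rule in src/analyzer/rules/: the `FileEvent` shape and the `findStaleEdit` helper are hypothetical, and the real TraceEvent union in docs/TRACE_SCHEMA.md may use different field names.

```typescript
// Hypothetical event shapes for illustration; the real TraceEvent union is
// described in docs/TRACE_SCHEMA.md and may use different field names.
type FileEvent =
  | { id: string; type: "file_read"; path: string }
  | { id: string; type: "file_edit"; path: string; applied: boolean };

interface StaleEditFinding {
  path: string;
  onset: string; // ID of the first failed edit
  failedEdits: string[]; // IDs of every failed edit on the same path
  priorReads: string[]; // IDs of reads of that path before the onset
}

// Walk the events in order. The first failed edit on a path that was read
// earlier becomes the onset; later failed edits on that path are added to it.
function findStaleEdit(events: FileEvent[]): StaleEditFinding | null {
  const readsSoFar: FileEvent[] = [];
  let finding: StaleEditFinding | null = null;
  for (const event of events) {
    if (event.type === "file_read") {
      if (finding === null) readsSoFar.push(event);
      continue;
    }
    if (event.applied) continue; // successful edits are not evidence
    if (finding !== null) {
      if (event.path === finding.path) finding.failedEdits.push(event.id);
      continue;
    }
    const priorReads = readsSoFar
      .filter((read) => read.path === event.path)
      .map((read) => read.id);
    // In this sketch, a failed edit with no earlier read is not a stale-context signal.
    if (priorReads.length === 0) continue;
    finding = { path: event.path, onset: event.id, failedEdits: [event.id], priorReads };
  }
  return finding;
}
```

Fed a read at evt_002 followed by failed edits at evt_006 and evt_008 on the same path, the sketch yields the same onset and evidence grouping as the sample report.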

v0 failure categories

| Category | What it catches |
|---|---|
| stale_context_edit | Agent edited a file using outdated context; old_string doesn't match current content. |
| tool_misuse | Tool called with invalid arguments, missing required fields, or violated preconditions. |
| loop_or_stall | Same tool call (or failing command) repeated ≥3 times with identical arguments. |
| test_failure_misdiagnosis | Failed test points at one file; agent edits unrelated files or silences the assertion. |
| initial_test_failure_resolved | Initial test failure is followed by a focused edit and passing verification rerun. |
| unsafe_or_broad_edit | Narrow user request produced edits spanning many files, directories, or lines. |

See docs/FAILURE_CATEGORIES.md for the long form.
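
To make the loop_or_stall definition concrete, here is a minimal sketch of detecting a tool call repeated at least three times with identical arguments. It assumes the repeats must be consecutive and that argument key order does not matter; neither detail is stated above. The `ToolCall` shape and both helper names are hypothetical.

```typescript
// Hypothetical tool-call shape, for illustration only.
interface ToolCall {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

// Serialise a value with sorted object keys so that {a: 1, b: 2} and
// {b: 2, a: 1} produce the same string.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableKey(item)).join(",") + "]";
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const parts = Object.keys(record)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableKey(record[key]));
    return "{" + parts.join(",") + "}";
  }
  return String(JSON.stringify(value));
}

// Return the event IDs of the first run of `threshold` consecutive identical
// calls, or null when no such run exists.
function findRepeatedCalls(calls: ToolCall[], threshold = 3): string[] | null {
  let run: string[] = [];
  let runKey = "";
  for (const call of calls) {
    const key = call.tool + ":" + stableKey(call.args);
    if (key === runKey) {
      run.push(call.id);
    } else {
      runKey = key;
      run = [call.id];
    }
    if (run.length >= threshold) return run;
  }
  return null;
}
```

Comparing a canonical serialisation rather than object identity is what makes "identical arguments" a deterministic test, with no model judgment involved.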

Try the other fixtures

node dist/cli.js analyze examples/tool-misuse.json
node dist/cli.js analyze examples/loop-or-stall.json
node dist/cli.js analyze examples/test-failure-misdiagnosis.json
node dist/cli.js analyze examples/unsafe-broad-edit.json

Realistic demo traces

examples/realistic/ contains small coding-agent failure demos with richer event timelines. They are useful for seeing both terminal reports and Markdown postmortems with timeline context around the failure onset:

node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json
node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json --format markdown --output /tmp/stale-edit-postmortem.md

Recording a real run

TracePawl can wrap a local command and write a trace that the analyzer can read:

tracepawl record --output tracepawl-runs/latest.json -- bun run test
tracepawl analyze tracepawl-runs/latest.json

Failed commands still produce valid traces. For example, with the installed package:

tracepawl record --output trace.json -- node -e "console.error('boom'); process.exit(1)"
tracepawl analyze trace.json

The recorder returns the child command's exit code after writing the trace, so the first command above exits non-zero. The analyzer can still inspect the trace. When no deterministic rule matches, it reports the failure as unknown, with the command, cwd, exit code, any stderr/stdout snippets, and recovery guidance:

Failure: unknown

Summary:
  Recorded command `node -e console.error('boom'); process.exit(1)` failed with exit code 1.

Evidence:
  - [evt_001] Failed command
      Command: node -e console.error('boom'); process.exit(1)
      Cwd: /path/to/project
      Exit: exit code 1
      Stderr: "boom"

Suggested recovery:
  Action: request_human_input
  Inspect the failed command's stderr/stdout and any changed files.
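
The "write the trace, then return the child's exit code" behavior described above can be sketched with Node's standard library. This is not the TracePawl recorder: the `RecordedCommand` shape and the `recordCommand` helper are hypothetical, and the fallback to exit code 1 when the child has no numeric status is an assumption. See docs/RECORDER.md for the actual behavior.

```typescript
import { spawnSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical trace shape for illustration; the real recorder output is
// described in docs/RECORDER.md and docs/TRACE_SCHEMA.md.
interface RecordedCommand {
  command: string[];
  cwd: string;
  exitCode: number | null;
  signal: string | null;
  stdout: string;
  stderr: string;
}

// Run a command, write what happened to `outputPath`, and return the exit
// code the caller should propagate.
function recordCommand(command: string[], outputPath: string): number {
  const [bin, ...args] = command;
  if (bin === undefined) throw new Error("no command given");
  const result = spawnSync(bin, args, { encoding: "utf8" });
  const trace: RecordedCommand = {
    command,
    cwd: process.cwd(),
    exitCode: result.status,
    signal: result.signal,
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
  };
  // Write first, so a failing child still leaves a readable trace behind.
  writeFileSync(outputPath, JSON.stringify(trace, null, 2));
  // No numeric status (killed by a signal, or failed to start) maps to 1
  // in this sketch; that mapping is an assumption.
  return result.status ?? 1;
}
```

A wrapper script would end with something like `process.exitCode = recordCommand(["bun", "run", "test"], "trace.json")`, so that CI still sees the original failure.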

When running from this repository before publishing or installing the package, use the built CLI directly:

bun run build
node dist/cli.js record --output tracepawl-runs/latest.json -- bun run test
node dist/cli.js analyze tracepawl-runs/latest.json

See docs/RECORDER.md for output-file behavior, latest.json, best-effort git_diff capture, exit codes, signal handling, and current recorder limits.

Current limitations

No dashboard or hosted service.
No external integrations or runtime adapters yet (Claude Code, OpenCode-compatible, OpenTelemetry, LangSmith, Langfuse). Build traces via tracepawl record, the TraceWriter SDK (see docs/SDK.md), or externally produced JSON conforming to docs/TRACE_SCHEMA.md.
Importers are local converters for sanitized external logs; they are not hosted collectors.
Deterministic rules are the source of truth. Optional LLM review is opt-in, evidence-bounded, and separate from the rule diagnosis.
No replay engine. Replay-lite and full replay are post-v0.
No CloudPawl backend, authentication, billing, tenant UI, BYOK, advanced dashboards, or private deployment in this MVP. See docs/product/cloudpawl-roadmap.md for the hosted roadmap.

Optional LLM review

TracePawl does not call an LLM by default. tracepawl analyze and tracepawl demo --open remain deterministic and offline unless --llm-review is provided.

When enabled, LLM review is advisory only. The deterministic diagnosis remains the source of truth, and reports keep deterministic facts, rule diagnosis, and LLM review in separate sections.

Supported providers are mock, noop, openai-compatible, and ollama. The OpenAI-compatible provider works with cloud or local runtimes such as llama.cpp server, LM Studio, LocalAI, vLLM, and DeepInfra/OpenRouter-style endpoints, provided they expose compatible chat completions. Local OpenAI-compatible base URLs do not require an API key; non-local base URLs use TRACEPAWL_LLM_API_KEY.

Providers receive a bounded prompt: redacted run metadata, extracted facts, an evidence graph summary, the baseline deterministic diagnosis, and selected snippets keyed by event ID. TracePawl does not send the full raw trace by default, redacts .env-style secrets/API keys/bearer tokens from prompt fields, clips command output snippets, and bounds returned LLM event IDs to events present in the trace. If provider config is missing or unavailable, analysis still succeeds and the LLM review is reported as unavailable.
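
The three safeguards in the paragraph above (redaction, clipping, and bounding returned event IDs) are each simple to express. The sketch below is illustrative only: the regular expressions, the clip marker, and all function names are assumptions, and TracePawl's actual redaction rules are not documented here.

```typescript
// Illustrative patterns only; these are not TracePawl's actual rules.
// Redacts .env-style assignments (keeping the variable name) and bearer tokens.
function redact(text: string): string {
  return text
    .replace(/\b([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=\S+/g, "$1=[REDACTED]")
    .replace(/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]");
}

// Keep at most `maxChars` characters and say how much was dropped.
function clip(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + `... [${text.length - maxChars} chars clipped]`;
}

// Redact before clipping, so clipping can never cut a secret in half and
// leave its prefix in the prompt.
function prepareSnippet(text: string, maxChars: number): string {
  return clip(redact(text), maxChars);
}

// Drop any event ID the model returned that does not exist in the trace.
function boundEventIds(returned: string[], traceIds: string[]): string[] {
  return returned.filter((id) => traceIds.indexOf(id) !== -1);
}
```

Filtering returned IDs against the trace means a review can only cite events that actually exist, which keeps it evidence-bounded.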

OpenAI-compatible local llama.cpp:

tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
  --llm-review \
  --llm-provider openai-compatible \
  --llm-base-url http://localhost:8080/v1 \
  --llm-model Qwen2.5-Coder-7B-Instruct

Ollama:

tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
  --llm-review \
  --llm-provider ollama \
  --llm-base-url http://localhost:11434 \
  --llm-model qwen2.5-coder:7b

Environment configuration:

TRACEPAWL_LLM_PROVIDER=ollama
TRACEPAWL_LLM_MODEL=qwen2.5-coder:7b
TRACEPAWL_LLM_BASE_URL=http://localhost:11434
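
For readers wiring this into their own tooling, here is a minimal sketch of resolving provider settings from the environment variables above and applying the API-key rule for OpenAI-compatible base URLs. The `resolveLlmConfig` and `isLocalBaseUrl` helpers are hypothetical, and treating only localhost, 127.0.0.1, and ::1 as "local" is an assumption; TracePawl's own definition may be broader.

```typescript
interface LlmConfig {
  provider: string;
  model: string;
  baseUrl: string;
  apiKey?: string;
}

type Resolution =
  | { ok: true; config: LlmConfig }
  | { ok: false; reason: string };

// Assumption: "local" means a loopback host. Unparseable URLs count as non-local.
function isLocalBaseUrl(baseUrl: string): boolean {
  try {
    const host = new URL(baseUrl).hostname;
    return host === "localhost" || host === "127.0.0.1" || host === "[::1]";
  } catch {
    return false;
  }
}

// Returns a reason instead of throwing, mirroring the documented behavior that
// missing provider config leaves analysis working and marks review unavailable.
function resolveLlmConfig(env: Record<string, string | undefined>): Resolution {
  const provider = env["TRACEPAWL_LLM_PROVIDER"];
  const model = env["TRACEPAWL_LLM_MODEL"];
  const baseUrl = env["TRACEPAWL_LLM_BASE_URL"];
  if (!provider || !model || !baseUrl) {
    return { ok: false, reason: "provider, model, or base URL is not set" };
  }
  if (provider === "openai-compatible" && !isLocalBaseUrl(baseUrl)) {
    const apiKey = env["TRACEPAWL_LLM_API_KEY"];
    if (!apiKey) {
      return { ok: false, reason: "TRACEPAWL_LLM_API_KEY is required for a non-local base URL" };
    }
    return { ok: true, config: { provider, model, baseUrl, apiKey } };
  }
  return { ok: true, config: { provider, model, baseUrl } };
}
```

Passing the environment in as a plain object, for example `resolveLlmConfig(process.env)`, keeps the function easy to test without touching global state.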

CloudPawl roadmap

TracePawl is the local-first open-core product. CloudPawl is the future hosted layer for run history, visual postmortem sharing, search, review workflows, retention, team analytics, and governance. Developer Preview users who want hosted sharing or team workflows should follow the CloudPawl roadmap in docs/product/cloudpawl-roadmap.md; a public waitlist link is not committed yet.

Recording a trace with the SDK

The TraceWriter SDK lets you record events from an agent runtime — no hand-authored JSON required. It owns event IDs and ISO timestamps, validates constructor inputs, and writes JSON that round-trips cleanly through the parser.

import { TraceWriter, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";

const writer = new TraceWriter({ agent: "my-agent", userGoal: "Fix paginate()" });
writer.recordFileRead({ path: "src/paginate.ts" });
writer.recordFileEdit({
  path: "src/paginate.ts",
  oldString: "items.slice(start, end - 1)",
  newString: "items.slice(start, end)",
  applied: false,
  error: "old_string not found in file",
});
writer.finalize();

console.log(formatTerminalReport(analyzeTrace(writer.toJSON())));

See docs/SDK.md for the full API reference, ID/timestamp contracts, and common patterns. A runnable demo lives at examples/sdk/record-failed-run.ts:

bun run tsx examples/sdk/record-failed-run.ts
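
To illustrate what "owning event IDs and ISO timestamps" means in practice, here is a minimal stand-in writer. It is not the TraceWriter implementation: the `SketchWriter` class, its `record` method, and the zero-padded `evt_NNN` scheme (inferred from the IDs in the sample report) are assumptions for this example. See docs/SDK.md for the actual contracts.

```typescript
interface SketchEvent {
  id: string;
  timestamp: string;
  type: string;
  payload: Record<string, unknown>;
}

// A stand-in writer: callers supply only the event type and payload, while
// the writer assigns sequential IDs and ISO 8601 timestamps itself.
class SketchWriter {
  private readonly events: SketchEvent[] = [];
  private readonly now: () => Date;

  // The clock is injectable so tests can produce stable timestamps.
  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  record(type: string, payload: Record<string, unknown>): string {
    const n = String(this.events.length + 1);
    // Zero-pad to at least three digits: evt_001, evt_002, ..., evt_1000.
    const id = "evt_" + ("000" + n).slice(-Math.max(3, n.length));
    this.events.push({ id, timestamp: this.now().toISOString(), type, payload });
    return id;
  }

  // Returning plain data lets JSON.stringify(writer) round-trip cleanly.
  toJSON(): { events: SketchEvent[] } {
    return { events: this.events.slice() };
  }
}
```

Because callers never pass an ID or timestamp, duplicate IDs and out-of-order clocks cannot be introduced by the producing code.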

Library usage

import { parseTraceFile, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";

const trace = await parseTraceFile("examples/stale-context-edit.json");
const report = analyzeTrace(trace);
console.log(formatTerminalReport(report));

analyzeTrace(trace) returns a FailureReport — see src/schema/failure.ts for the full shape.

Development

bun run typecheck
bun run test
bun run lint
bun run build

bun run check runs the same typecheck + lint + format-check gate that CI enforces.

Project status

The v0 CLI analyzer and local recorder are functional. Five example traces are included, one per failure category, and all five resolve to their expected category at confidence ≥ 0.80. Diagnosis is rule-based and deterministic.

Docs

docs/RELEASE.md — release prep, execution, and recovery checklist.
docs/RECORDER.md — local command recorder guide.
docs/SDK.md — TraceWriter producer-side reference.
docs/FAILURE_CATEGORIES.md — the five v0 categories in depth.
docs/TRACE_SCHEMA.md — TraceEvent union and FailureReport shape, for adapter authors.
docs/importers/generic-jsonl.md — v0 contract for a future generic JSONL importer.
docs/product/developer-preview.md — Developer Preview scope and demo requirements.
docs/protocol/event-protocol-v0.md — event protocol for hand-written and custom-agent JSONL.