@shan8851/blackbox

v0.1.0 · published 12 days ago

OpenClaw-native inspection reports for failed, stalled, expensive, or weird agent runs.


Author: shan8851 · Keywords: openclaw, agents, cli, debugging, observability, agent-tools

OpenClaw Blackbox

Debug OpenClaw runs from your terminal. Built for agent operators, still readable by humans.

```bash
blackbox doctor                         # Can Blackbox see OpenClaw state?
blackbox list failures --limit 10       # Recent failed or suspicious cron runs
blackbox list sessions --agent all      # Local session artifacts across agents
blackbox find request --query "..."     # Find the session behind a user request
blackbox inspect --session-id <id>      # Generate a report for one run
```

Blackbox answers the useful question first:

What happened in this agent run, and why did it fail or get weird?

It reads local OpenClaw evidence (transcripts, trajectory files, cron history, checkpoints, locks, delivery metadata, and tool traces), then renders deterministic reports as terminal output, Markdown, JSON, or static HTML.

Small warning: this is a local debugging tool. Reports can include prompts, tool arguments/results, URLs, and local paths. Review before sharing.

Install

```bash
npm install -g @shan8851/blackbox
```

Or from source:

```bash
git clone https://github.com/shan8851/openclaw-blackbox.git
cd openclaw-blackbox
pnpm install
pnpm run build
pnpm link --global
```

Blackbox reads ~/.openclaw by default. To inspect another OpenClaw home:

```bash
OPENCLAW_HOME=/path/to/.openclaw blackbox doctor
```
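If you keep more than one OpenClaw home around (say, a live one and a copy pulled from another machine), a small loop can run the same check against each. This is only a sketch: `doctor_all` and the example paths are hypothetical, and the helper relies solely on `OPENCLAW_HOME` and `blackbox doctor --json` as documented in this README.

```bash
# Hypothetical helper (not part of Blackbox): run `blackbox doctor --json`
# once per OpenClaw home. Each argument is a path to an OpenClaw state directory.
doctor_all() (
  overall=0
  for home in "$@"; do
    echo "== $home"
    # Scope OPENCLAW_HOME to this single invocation only.
    if ! OPENCLAW_HOME="$home" blackbox doctor --json; then
      overall=1
    fi
  done
  # Non-zero if any home failed its check, so the helper can gate a script.
  exit "$overall"
)
```

Usage: `doctor_all ~/.openclaw /path/to/other/.openclaw`.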

Commands

| Command | What it does |
| --- | --- |
| `blackbox doctor` | Check resolved paths and visible OpenClaw evidence |
| `blackbox list failures` | Show failed or suspicious cron runs |
| `blackbox list sessions` | Show local session transcript/trajectory artifacts |
| `blackbox find request` | Find the session behind a user message/query |
| `blackbox inspect` | Build a focused report for one run/session/request |

Examples

Inspect the latest failed run for a cron job:

```bash
blackbox inspect --cron-job "Example nightly research job" --latest-error
```
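To triage several cron jobs in one go, a loop can run that command per job and keep one Markdown report each. This is a sketch under assumptions: `inspect_latest_errors` is a hypothetical helper, the job names are placeholders, and it assumes `--latest-error` can be combined with `--out` (both flags appear in this README, but not together).

```bash
# Hypothetical helper (not part of Blackbox): inspect the latest failed run of
# each named cron job, writing one Markdown report per job.
inspect_latest_errors() (
  dir=${BLACKBOX_REPORT_DIR:-reports}
  mkdir -p "$dir" || exit 1
  status=0
  for job in "$@"; do
    # Turn the job name into a safe file name (spaces and slashes become dashes).
    file=$(printf '%s' "$job" | sed 's|[ /]|-|g')
    # Assumption: --latest-error and --out work in the same invocation.
    blackbox inspect --cron-job "$job" --latest-error --out "$dir/$file.md" || status=1
  done
  exit "$status"
)
```

Usage: `inspect_latest_errors "Example nightly research job" "Another job"`. The helper returns 1 if any job could not be inspected.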

Generate a full Markdown report:

```bash
blackbox inspect \
  --cron-job example-nightly-research \
  --cron-run example-session-001 \
  --view full \
  --out reports/example-context-overflow.md
```

Open a local HTML report:

```bash
blackbox inspect --query "summarise the failed run" --agent all --open
```

Write JSON for downstream tooling:

```bash
blackbox inspect --session-id example-session-001 --json-out reports/example.json
```

Find the run behind a Discord, Telegram, or other chat request:

```bash
blackbox find request --query "summarise the failed run" --agent all
blackbox find request --message-id example-message-001 --agent main
blackbox find request --query "summarise the failed run" --agent all --commands
```

`inspect --query` is conservative: it proceeds only when every match resolves to the same session. If the text matches several sessions, run `find request` first, then inspect the chosen session with `--session-id`.
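A script can lean on that conservative behaviour: try `inspect --query` first and, if the match is ambiguous, fall back to listing the candidates. A minimal sketch, assuming `inspect --query` signals ambiguity with exit code 4 (the "ambiguous request match" code in the exit-code list); the `inspect_request` name is hypothetical.

```bash
# Hypothetical helper (not part of Blackbox): inspect the run behind a request,
# or list candidate sessions when the query matches more than one.
inspect_request() (
  query=$1
  blackbox inspect --query "$query" --agent all
  status=$?
  if [ "$status" -eq 4 ]; then
    # Ambiguous match: print copy-pasteable Inspect: commands for each candidate.
    echo "Query matched several sessions; rerun inspect with --session-id:" >&2
    blackbox find request --query "$query" --agent all --commands
  fi
  # Preserve the original status so callers can still detect the ambiguity.
  exit "$status"
)
```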

Output modes

inspect always prints a terminal snapshot and writes a Markdown report unless you choose another path.

```bash
blackbox inspect --session-id example-session-001
blackbox inspect --session-id example-session-001 --view full
blackbox inspect --session-id example-session-001 --out report.md
blackbox inspect --session-id example-session-001 --json-out report.json
blackbox inspect --session-id example-session-001 --html-out report.html --open
```
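To keep Markdown, JSON, and HTML copies of one report side by side, a wrapper can call `inspect` once per format using the output flags above. This is a sketch: `save_reports` and the `<dir>/<session-id>.*` layout are illustrative choices, and it makes separate calls rather than assuming the output flags combine in a single invocation.

```bash
# Hypothetical helper (not part of Blackbox): save Markdown, JSON, and HTML
# reports for one session into a directory (default: reports).
save_reports() (
  id=$1
  dir=${2:-reports}
  mkdir -p "$dir" || exit 1
  # One call per format; stop at the first failure.
  blackbox inspect --session-id "$id" --view full --out "$dir/$id.md" &&
    blackbox inspect --session-id "$id" --json-out "$dir/$id.json" &&
    blackbox inspect --session-id "$id" --html-out "$dir/$id.html"
)
```

Usage: `save_reports example-session-001 reports/example`.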

Markdown has two views:

- `--view simple` (default): triage report with outcome, likely cause, next checks, and evidence summary.
- `--view full`: fuller postmortem with identity, evidence files, timeline, context pressure, tool footprint, raw snippets, compactions, and recommendations.

JSON output uses the full deterministic BlackboxReport object. Simple/full are render modes over the same report model.

Agent-friendly bits

Human-readable `list` and `find` output includes copy-pasteable `Inspect:` commands.

```bash
blackbox list failures --commands
blackbox list sessions --agent all --commands
blackbox find request --query "failed to send" --commands
```

Discovery commands support --json:

```bash
blackbox doctor --json
blackbox list failures --json
blackbox list sessions --agent all --json
blackbox find request --message-id 123 --json
```

Exit codes are explicit enough for scripts:

| Exit code | Meaning |
| --- | --- |
| 0 | success |
| 1 | unexpected failure |
| 2 | usage / invalid argument |
| 3 | not found / missing OpenClaw state |
| 4 | ambiguous request match |
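In a wrapper script, a `case` on the exit status keeps those meanings in one place. A minimal sketch: the `blackbox_exit_label` name and its wording are illustrative; only the numeric codes come from the list above.

```bash
# Hypothetical helper (not part of Blackbox): translate a blackbox exit status
# into a short human-readable label.
blackbox_exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "unexpected failure" ;;
    2) echo "usage / invalid argument" ;;
    3) echo "not found / missing OpenClaw state" ;;
    4) echo "ambiguous request match" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Example use after any discovery command:
#   blackbox list failures --json > failures.json
#   echo "blackbox: $(blackbox_exit_label $?)"
```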

Configuration

| Environment variable | Purpose |
| --- | --- |
| `OPENCLAW_HOME` | OpenClaw state directory. Defaults to `~/.openclaw`. |
| `BLACKBOX_REPORT_DIR` | Default report output directory. Defaults to `reports/`. |

Explicit output flags always win over BLACKBOX_REPORT_DIR.

Failure labels

The classifier is deterministic and based on local OpenClaw failure shapes. Current cause labels include:

- `context_overflow`
- `message_delivery_failed`
- `timeout`
- `session_lock_timeout`
- `gateway_restart_interrupted`
- model/auth labels
- tool read/write/edit/apply-patch labels
- `unknown_error`

Evidence conditions are separate from causes: missing_transcript, transcript_deleted, transcript_reset, missing_trajectory, checkpoint_present, and session_write_in_progress.

Reports also include a confidence level (`high`, `medium`, or `low`). It describes the quality of the evidence, not the quality of the agent.

Development

```bash
pnpm install
pnpm run validate
npm pack --dry-run
```

During development you can run the CLI directly:

```bash
pnpm dev doctor
pnpm dev inspect --session-id example-session-001
```

Screenshots

[Screenshot: Blackbox HTML report]

[Screenshot: Blackbox inspect command]

[Screenshot: Blackbox failure list]

[Screenshot: Blackbox doctor command]

License

MIT