@ironbee-ai/cli

v0.8.3

The CLI for IronBee — Verification and Intelligence Layer for Agentic Development

IronBee ensures that AI agents verify their code changes before completing a task. When an agent edits code, it cannot finish until it exercises the affected paths through real tools — in the browser for frontend changes, via the Node.js V8 inspector for backend changes — and submits a passing verdict.

No more "it should work" — every change is tested.

IronBee also tracks every verification cycle (coding time, fix time, pass/fail rates, problematic files) and provides session-level and project-level analytics that feed LLM-powered semantic insights.

Browser mode (bdt_* tools): the agent navigates pages, clicks buttons, fills forms, takes screenshots, checks console errors.
Node mode (ndt_* tools, opt-in): the agent connects to a running Node process, sets V8 probes (tracepoint / logpoint / exceptionpoint) at the changed code paths, exercises them, and reads back snapshots or runtime logs.

A single Stop hook can drive multiple cycles in parallel — touching frontend and backend in the same change requires evidence for both before the task can complete.

Demo

https://github.com/user-attachments/assets/9d4e602b-6c05-4b48-89a8-3df429d10e00

Supported Clients

| Client | Status |
|--------|--------|
| Claude Code | Supported |
| Cursor | Supported |
| Codex | Planned |
| OpenCode | Planned |

Quick Start

Install IronBee globally

npm install -g @ironbee-ai/cli

Set up a project

cd your-project
ironbee install

This auto-detects your AI client and writes:

Hook configuration (so the client calls IronBee automatically)
Verification skill/rules (so the agent knows the workflow — covers both browser and node cycles)
Two MCP server entries from the same browser-devtools-mcp package:
- browser-devtools (PLATFORM=browser, bdt_ prefix) — always active for frontend verification
- node-devtools (PLATFORM=node, ndt_ prefix) — installed but inert until you opt in
Permissions (mcp__browser-devtools__*, mcp__node-devtools__*)

Optional: backfill historical sessions

Already have weeks of Claude Code sessions on disk? ironbee import walks them and ships every session / activity / tool_call / file_change / analytics event to the IronBee Collector — so your dashboard fills with historical context the moment you finish installing. Already-tracked sessions (live or previously imported) are skipped automatically; pass --force to re-import.

Typical three-step flow:

# 1. Preview — zero POSTs, shows exact cost and event counts
ironbee import --since 30d --dry-run

# 2. Confirm and ship (interactive y/N prompt by default)
ironbee import --since 30d

# 3. Optional: cast a wider net later
ironbee import --all-projects --since 6m --concurrency 2

--dry-run always shows the exact cost_usd that will surface in your dashboard before you confirm — $342.18 is much less surprising when you know it's coming.

Common scenarios

| Scenario | Command |
|---|---|
| Onboarding: current project, last 30 days | ironbee import --since 30d |
| Current project, full history | ironbee import |
| One specific project from anywhere | ironbee import --projects /path/to/repo |
| Multiple projects | ironbee import --projects /repos/auth,/repos/payments |
| Every project on this machine | ironbee import --all-projects --since 6m |
| Explicit date range (e.g. Q1 retrospective) | ironbee import --all-projects --from 2025-01-01 --to 2025-03-31 |
| Single transcript file (debug / cherry-pick) | ironbee import --transcript ~/.claude/projects/-Users-me-foo/abc.jsonl |
| CI / scripted onboarding (no prompt) | ironbee import --since 60d --yes |
| Tune backend load | ironbee import --since 6m --concurrency 2 (or 16 for fast pipes) |
| Force re-import a single session | ironbee import --transcript path.jsonl --force --yes |

Flag groups

Scope (mutually exclusive — pick at most one; default is the current directory):

--transcript <path> — single .jsonl file
--projects <p1,p2,...> — comma-separated absolute project paths
--all-projects — every directory under ~/.claude/projects/

Time range (mutually exclusive; default is no filter):

--since <duration> — 30d, 2w, 6m, 12h (relative to now)
--from <iso-date> [--to <iso-date>] — explicit window; --to defaults to now

Behavior:

--dry-run — print summary, make zero POSTs, exit 0
--yes — skip the confirm prompt
--force — bypass the "already tracked" skip rule
--concurrency <N> — parallel sessions (default 4, clamped to [1, 32]); also configurable via import.concurrency in ~/.ironbee/config.json or <project>/.ironbee/config.json
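
The default and clamp for --concurrency can be sketched as follows. This is an illustration only: resolveConcurrency is a hypothetical helper, and the precedence shown (flag first, then import.concurrency from config, then the default of 4) is an assumption rather than documented behavior.

```typescript
// Hypothetical helper; not part of the IronBee CLI.
const DEFAULT_CONCURRENCY = 4;
const MIN_CONCURRENCY = 1;
const MAX_CONCURRENCY = 32;

function resolveConcurrency(flagValue?: number, configValue?: number): number {
  // Assumed precedence: --concurrency flag, then import.concurrency, then the default.
  const requested = flagValue ?? configValue ?? DEFAULT_CONCURRENCY;
  // Fall back to the default for NaN or infinite input.
  if (!isFinite(requested)) return DEFAULT_CONCURRENCY;
  // Clamp into [1, 32], the range stated for the flag.
  return Math.min(MAX_CONCURRENCY, Math.max(MIN_CONCURRENCY, Math.floor(requested)));
}

console.log(resolveConcurrency());     // no flag, no config: default
console.log(resolveConcurrency(64));   // above the range: clamped to the maximum
console.log(resolveConcurrency(0));    // below the range: clamped to the minimum
```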

Optional: enable Node backend verification

ironbee enable-backend node

Run this once per project that has Node backend code you want IronBee to gate. It writes opinionated default verifyPatterns (e.g. server/**, pages/api/**, **/server.{ts,js,mjs,cjs}) under backend.node.verifyPatterns in the project config. From then on, edits to matching paths require Node-cycle verification (connect + probes/logs) alongside any browser-cycle verification.

To revert: ironbee disable-backend node. This empties verifyPatterns (no enforcement) but preserves any customizations you made to alwaysRequired / evidencePaths, so re-enabling later restores your tuned setup.

Optional: monitoring-only mode (no enforcement)

ironbee disable-verification

Turns off enforcement but keeps the telemetry path intact. Session lifecycle and tool-call events still flow to the IronBee Collector, but the agent never sees a verify-gate, skill, rule, or /ironbee-verify command — useful when you want observability without slowing the agent down. To re-enable: ironbee enable-verification.

The toggle re-renders all client artifacts (hooks, skill, rule, MCP servers, permissions) atomically. The change takes effect on the next agent session — restart your editor / agent after toggling.

Cursor: additional setup

Cursor requires manual activation of MCP servers after install:

Restart Cursor to load the new hooks and MCP config
Go to Settings → Tools & MCP and verify both browser-devtools and node-devtools are enabled
If a server shows as enabled but tools are unavailable, toggle it off and on

Note: This is a known Cursor limitation — MCP servers added via mcp.json may need manual activation.

That's it

The next time your AI agent edits code, IronBee will require verification before the task can complete — browser-cycle for frontend changes, Node-cycle for backend changes (if enabled), or both in parallel.

Commands

ironbee install [project-dir] [--client <name>] [--all]   Set up hooks and config; --all → batch across every registered project
ironbee uninstall [project-dir] [--client <name>] [--all] [-y]   Remove hooks and config; --all → batch wipe across every registered project (destructive, prompts unless --yes)
ironbee status [project-dir]                      Show verdict status for active sessions
ironbee verify [session-id]                       Dry-run verdict validation
ironbee analyze [session-id]                      Analyze session metrics (or all sessions)
ironbee enable-backend <runtime>                  Opt into backend verification (today: node)
ironbee disable-backend <runtime>                 Opt out (resets verifyPatterns to []; preserves customizations)
ironbee enable-verification                       Turn enforcement on (default state)
ironbee disable-verification                      Monitoring-only mode (no enforcement; sessions still ship to collector)
ironbee config get <key>                          Read a config value (default: merged effective value)
ironbee config set <key> <value>                  Write a config value; auto re-renders client artifacts when needed
ironbee config unset <key>                        Remove a config value; auto re-renders when needed
ironbee config list                               Print the entire config (merged / global / project)
ironbee config path                               Print the on-disk path of the config file
ironbee register                                  Add this project to the user-home inventory (no artifact writes)
ironbee unregister                                Remove this project from the user-home inventory (no artifact writes)

Projects inventory

ironbee install records each project it touches in ~/.ironbee/projects.json; ironbee uninstall removes it. The inventory powers two cross-project workflows:

ironbee install --all — explicit batch op that re-runs install on every registered project. Use after a global config change to propagate it everywhere; uses each project's currently detected clients (or pass --client <name> to override).
ironbee uninstall --all — destructive batch op that wipes ironbee from every registered project. Prompts with default-No before acting; pass --yes / -y to skip the prompt. Refuses without --yes in non-interactive contexts.
Prompt on global config writes — ironbee config set <key> <val> -g (and unset) on an artifact-affecting key (collector, verification, browser, backend, browserDevTools, nodeDevTools) lists up to 10 other registered project paths still on the prior state and asks Apply this change to these N projects now? [Y/n] (default Yes). Pass --apply-all / --no-apply-all to skip the prompt; non-TTY contexts skip it and print a hint pointing at install --all.

For pure inventory bookkeeping (no artifact writes):

ironbee register — adds the current project to the inventory. Useful for projects set up before this feature existed.
ironbee unregister — removes the current project from the inventory. Works on already-deleted project dirs.

Agent Commands (slash commands)

IronBee installs slash commands that the agent can use inside Claude Code or Cursor:

| Command | Description |
|---------|-------------|
| /ironbee-verify | Verify changes, focused on affected areas (default) |
| /ironbee-verify full | Full verification: complete visual + functional + accessibility checklists |
| /ironbee-verify visual | Visual-only: contrast, layout, spacing, fonts, images, theming |
| /ironbee-verify functional | Functional-only: clicks, forms, navigation, data flow, error handling |
| /ironbee-analyze | Run session analytics and provide LLM-powered semantic insights |

/ironbee-verify guides the agent through a systematic verification process. The default mode focuses on what changed, while full runs every checklist item. Use visual or functional to narrow the scope when you know what type of testing is needed.

Configuration

IronBee loads config from two locations (project deep-merges over global):

Global: ~/.ironbee/config.json
Project: <project>/.ironbee/config.json

{
  "ignoredVerifyPatterns": ["*.test.ts", "*.spec.ts"],
  "maxRetries": 5,

  "browser": {
    "verifyPatterns": ["*.ts", "*.tsx", "*.css"],
    "additionalVerifyPatterns": ["*.mdx"]
  },

  "backend": {
    "node": {
      "verifyPatterns": ["server/**/*.ts", "pages/api/**/*.ts"]
    }
  },

  "verification": {
    "enable": false
  },

  "fileChange": {
    "captureChangeset": true
  }
}
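
The project-over-global merge can be pictured with a small sketch. deepMerge is a hypothetical helper, not IronBee's loader. It assumes that nested objects merge key by key, while arrays and scalars in the project file replace the global value wholesale.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a new object in which project values win over global values.
// Assumption: arrays and scalars are replaced, not concatenated.
function deepMerge(globalCfg: JsonObject, projectCfg: JsonObject): JsonObject {
  const merged: JsonObject = { ...globalCfg };
  for (const key of Object.keys(projectCfg)) {
    const base = merged[key];
    const override = projectCfg[key];
    if (override === undefined) continue;
    merged[key] =
      isPlainObject(base) && isPlainObject(override)
        ? deepMerge(base, override) // recurse into nested sections such as "browser"
        : override;
  }
  return merged;
}

const globalConfig: JsonObject = {
  maxRetries: 3,
  browser: { additionalVerifyPatterns: ["*.mdx"] },
};
const projectConfig: JsonObject = {
  maxRetries: 5,
  browser: { verifyPatterns: ["*.ts", "*.tsx", "*.css"] },
};

// maxRetries comes from the project file; both browser keys survive the merge.
console.log(JSON.stringify(deepMerge(globalConfig, projectConfig), null, 2));
```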

| Key | Description | Default |
|-----|-------------|---------|
| browser.verifyPatterns | Glob patterns for files requiring browser verification (replaces defaults) | 40+ code extensions |
| browser.additionalVerifyPatterns | Extra browser patterns appended to defaults | [] |
| backend.<runtime>.verifyPatterns | Glob patterns for files requiring backend verification, per runtime (today: node). Empty by default; opt in via ironbee enable-backend <runtime>. | [] |
| backend.<runtime>.additionalVerifyPatterns | Extra backend patterns | [] |
| ignoredVerifyPatterns | Patterns to exclude from verification (checked first, applies to all cycles) | [] |
| maxRetries | Max retry attempts before allowing completion (single global counter regardless of how many cycles run) | 3 |
| verification.enable | Master switch for enforcement. Inverse semantics from recording/jobQueue/collector: verification is the core feature, so you opt out via enable: false. When disabled, ironbee runs in monitoring-only mode (no enforcement hooks, skill, rule, or MCP servers; only session/activity/tool_call telemetry flows to the collector). | true |
| fileChange.captureChangeset | When true, every file_change event carries a hunks-only unified-diff changeset string (@@ headers + space/-/+ lines, no filename header, since file_path already lives on the parent event). Off by default: the default tool_input whitelist deliberately strips file content from the wire; turning this on routes content through file_change instead. PreToolUse pre-reads the file when enabled so PostToolUse can produce a real before/after diff (Write/Edit on Claude; Write/StrReplace/Delete on Cursor). Skipped on binary content (NUL byte in first 4 KB). | false |
| fileChange.maxChangesetBytes | Hard cap on the changeset string size. Diffs over the cap are sliced on a UTF-8 byte boundary and end with a \n... (truncated, N bytes omitted)\n footer so the collector POST stays within typical reverse-proxy body limits. | 65536 (64 KB) |
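
The binary check and the byte-capped truncation described for fileChange.* can be sketched like this. The function names are hypothetical. The sketch also assumes the cap applies to the kept diff body and that the footer is appended afterwards; the table does not say whether the footer counts toward the cap.

```typescript
// NUL byte within the first 4 KB is treated as binary content.
function looksBinary(content: Uint8Array): boolean {
  const limit = Math.min(content.length, 4096);
  for (let i = 0; i < limit; i++) {
    if (content[i] === 0) return true;
  }
  return false;
}

// UTF-8 size of the character starting at index i, and the UTF-16 units it spans.
function utf8Step(text: string, i: number): { bytes: number; units: number } {
  const code = text.charCodeAt(i);
  if (code < 0x80) return { bytes: 1, units: 1 };
  if (code < 0x800) return { bytes: 2, units: 1 };
  const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
  const isPair = code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
  return isPair ? { bytes: 4, units: 2 } : { bytes: 3, units: 1 };
}

// Keeps at most maxBytes of UTF-8 and never splits a multi-byte character.
function truncateChangeset(changeset: string, maxBytes: number): string {
  let keptBytes = 0;
  let totalBytes = 0;
  let cut = -1; // string index where the kept part ends; -1 while under the cap
  for (let i = 0; i < changeset.length; ) {
    const step = utf8Step(changeset, i);
    if (cut < 0 && keptBytes + step.bytes > maxBytes) cut = i;
    if (cut < 0) keptBytes += step.bytes;
    totalBytes += step.bytes;
    i += step.units;
  }
  if (cut < 0) return changeset; // under the cap: returned unchanged
  const omitted = totalBytes - keptBytes;
  return changeset.slice(0, cut) + "\n... (truncated, " + omitted + " bytes omitted)\n";
}

console.log(JSON.stringify(truncateChangeset("@@ -1 +1 @@\n-old\n+new\n", 12)));
```

Counting bytes per character, rather than slicing an encoded buffer, is one way to guarantee the cut never lands inside a multi-byte sequence.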

Migrating from older versions: the previous flat verifyPatterns / additionalVerifyPatterns keys at the top level are no longer supported. The config loader fails loudly on the legacy shape, so move them under browser.* (or the appropriate runtime under backend.<runtime>.*).

Editing config from the CLI (`ironbee config`)

You can edit either config file via the CLI instead of hand-rolling JSON:

# Read the effective (merged) value
ironbee config get collector.url

# Write to project config (default)
ironbee config set collector.url https://collector.example.com
ironbee config set maxRetries 5
ironbee config set verification.enable false
ironbee config set browser.verifyPatterns '["*.ts", "*.tsx", "*.css"]'

# Write to global config (~/.ironbee/config.json)
ironbee config set collector.apiKey sk-... --global

# Remove a value (idempotent — no-op if absent)
ironbee config unset collector.url

# Inspect
ironbee config list                # merged effective config
ironbee config list --project      # project file only
ironbee config list --global       # global file only
ironbee config path                # print the project config file path

Type coercion — set parses the value as JSON when it can (true/42/[…]/{…}) and falls back to a raw string when JSON parse fails. URLs and paths pass through unquoted; pass --json to force strict JSON parsing (e.g. when you want the literal string "42" instead of the number 42).
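
The coercion rule can be sketched in a few lines. coerceValue is a hypothetical name, and the strictJson parameter stands in for the --json flag; the real implementation may differ in detail.

```typescript
// Parse as JSON when possible; otherwise keep the raw string.
// In strict mode (standing in for --json) a parse failure is an error.
function coerceValue(raw: string, strictJson = false): unknown {
  try {
    return JSON.parse(raw);
  } catch (err) {
    if (strictJson) throw err;
    return raw;
  }
}

console.log(coerceValue("true"));                           // boolean
console.log(coerceValue("42"));                             // number
console.log(coerceValue('["*.ts", "*.tsx"]'));              // array
console.log(coerceValue("https://collector.example.com"));  // not valid JSON, stays a string
console.log(coerceValue('"42"', true));                     // the string "42", not the number
```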

Smart artifact re-render — when a top-level key affects installed client artifacts (verification, collector, browser, backend, browserDevTools, nodeDevTools), set and unset re-render the client files (hooks, MCP entries, skill, rule, permissions) automatically — same code path enable-verification / enable-backend use. Other keys (maxRetries, recording, jobQueue, analytics, import, ignoredVerifyPatterns) are pure config flips that the next agent session picks up — no rerender needed.

Pass --no-rerender to skip the rerender on artifact-affecting keys (handy for scripted bulk edits — follow up with ironbee install to resync). If a rerender fails midway, the config file is rolled back to its prior bytes so disk state never diverges from installed artifacts.

Restart your editor / agent session after changing artifact-affecting keys — the host caches hook config at session start, so the new state takes effect on the next run.

Default verify patterns

By default, the browser cycle matches common code file extensions: .ts, .tsx, .js, .jsx, .css, .scss, .html, .py, .go, .rs, .java, .vue, .svelte, and many more. Backend file edits trigger browser verification by default since they often affect frontend behavior.

The node cycle has no default patterns — running ironbee enable-backend node writes opinionated defaults (server/**, pages/api/**, etc.) which you can customize after.

Non-code files like README.md, package.json, or .gitignore do not trigger any cycle.

Devtools MCP server config

IronBee installs two MCP server entries from the same browser-devtools-mcp package — one in browser mode (browser-devtools, bdt_ prefix) and one in Node mode (node-devtools, ndt_ prefix). Each can be customized independently.

For the browser server, use browserDevTools:

{
  "browserDevTools": {
    "mcp": {
      "url": "http://localhost:4000/mcp"
    }
  }
}

For the node server, use nodeDevTools:

{
  "nodeDevTools": {
    "env": { "NODE_INSPECTOR_HOST": "127.0.0.1" }
  }
}

You can mix-and-match: full config replacement via mcp, or just env-var additions via env.

{
  "nodeDevTools": {
    "mcp": {
      "command": "node",
      "args": ["./my-server.js"],
      "env": { "MY_VAR": "value" }
    }
  },
  "browserDevTools": {
    "env": { "BROWSER_HEADLESS_ENABLE": "true", "OTEL_ENABLE": "true" }
  }
}

| Key | Description |
|-----|-------------|
| browserDevTools.mcp / nodeDevTools.mcp | Full MCP server config, used as-is when provided. Supports command+args (stdio) or url (HTTP) |
| browserDevTools.env / nodeDevTools.env | Extra env vars merged into the default config. Only used when mcp is not provided |

Note: IronBee always sets TOOL_NAME_PREFIX (bdt_ / ndt_), TOOL_INPUT_METADATA_ENABLE=true, and PLATFORM (browser / node) — these cannot be overridden.

Verification Flow (multi-cycle)

When the agent tries to complete a task, IronBee runs these checks:

Were code files edited? — If no matching files were changed, the agent completes normally.
Which cycles are active? — IronBee matches each edited file against browser.verifyPatterns and (if you opted in) backend.<runtime>.verifyPatterns. A single file may activate both cycles; both then run in parallel.
Were the cycle's required tools used?
- Browser cycle: navigate, screenshot, accessibility snapshot, console check (all-of)
- Node cycle: connect; then either probe path ((put-tracepoint | put-logpoint | put-exceptionpoint) AND get-probe-snapshots) OR log path (get-logs)
Does a verdict exist? — The agent must submit a single verdict carrying evidence for every active cycle via ironbee hook submit-verdict.
Is the verdict valid? — Per active cycle: browser fields (pages_tested, console_errors, network_failures) and/or node fields (backend_node_processes_connected, backend_node_probes_set / backend_node_log_errors).
Pass or fail? — status: "pass" is honored only if every active cycle's evidence backs the claim. The gate overrides to fail if it doesn't.
Retry limit — After maxRetries failed attempts (default 3, single global counter), the agent is allowed to complete but must report unresolved issues.
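
The required-tools step can be read as a small boolean rule per cycle. The sketch below uses the short tool names from the list above. The browser identifiers are placeholders for the four listed actions, and the real MCP tool names (with their bdt_ / ndt_ prefixes) may differ.

```typescript
// Placeholder identifiers for the four browser actions (all required).
const BROWSER_REQUIRED = ["navigate", "screenshot", "accessibility-snapshot", "console-check"];
// Any one of these counts as "a probe was set" on the node probe path.
const PROBE_SETTERS = ["put-tracepoint", "put-logpoint", "put-exceptionpoint"];

function wasUsed(usedTools: ReadonlyArray<string>, name: string): boolean {
  return usedTools.indexOf(name) !== -1;
}

// Browser cycle: all-of.
function browserCycleSatisfied(usedTools: ReadonlyArray<string>): boolean {
  return BROWSER_REQUIRED.every((tool) => wasUsed(usedTools, tool));
}

// Node cycle: connect, then either the probe path or the log path.
function nodeCycleSatisfied(usedTools: ReadonlyArray<string>): boolean {
  if (!wasUsed(usedTools, "connect")) return false;
  const probePath =
    PROBE_SETTERS.some((tool) => wasUsed(usedTools, tool)) &&
    wasUsed(usedTools, "get-probe-snapshots");
  const logPath = wasUsed(usedTools, "get-logs");
  return probePath || logPath;
}

console.log(nodeCycleSatisfied(["connect", "put-logpoint"]));                        // probe set but never read back
console.log(nodeCycleSatisfied(["connect", "put-logpoint", "get-probe-snapshots"])); // probe path complete
console.log(nodeCycleSatisfied(["connect", "get-logs"]));                            // log path complete
```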

Verdict format

Verdicts are submitted via echo '<json>' | ironbee hook submit-verdict:

{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0
}

On failure, include an issues array describing what went wrong:

{
  "session_id": "<your-session-id>",
  "status": "fail",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form renders", "submit button unresponsive"],
  "console_errors": 2,
  "network_failures": 0,
  "issues": ["button click handler not firing", "TypeError in console"]
}

On pass after a previous fail, include a fixes array describing what was fixed:

{
  "session_id": "<your-session-id>",
  "status": "pass",
  "pages_tested": ["http://localhost:3000/dashboard"],
  "checks": ["form submits successfully", "new item appears in list"],
  "console_errors": 0,
  "network_failures": 0,
  "fixes": ["reattached click handler to submit button", "fixed TypeError in event handler"]
}

For a node-cycle verdict (probe path), use the backend_node_* fields instead of (or alongside) the browser fields:

{
  "session_id": "<your-session-id>",
  "status": "pass",
  "checks": ["POST /api/orders returned 201", "tracepoint at handler.ts:42 fired once"],
  "backend_node_processes_connected": ["pid:12345 (next-server)"],
  "backend_node_probes_set": [
    { "type": "tracepoint", "location": "src/api/orders.ts:42", "triggered": true }
  ],
  "backend_node_probe_snapshots_collected": 1,
  "backend_node_log_errors": []
}

If both cycles are active, populate browser fields and backend_node_* fields in the same verdict — both cycles' pass criteria must hold for the gate to honor status: "pass".

The agent must submit a verdict after every verification attempt, whether it passes or fails. Once the agent has used devtools tools, further file edits are blocked until it submits a verdict.
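
The per-cycle field requirements can be expressed as a type plus a presence check. The Verdict interface below is inferred from the example verdicts in this section and is not an official schema. missingFields is a hypothetical helper, and treating backend_node_probes_set and backend_node_log_errors as alternatives is an assumption based on the probe-path / log-path split.

```typescript
type Cycle = "browser" | "node";

// Shape inferred from the examples above; field types are assumptions.
interface Verdict {
  session_id: string;
  status: "pass" | "fail";
  checks?: string[];
  pages_tested?: string[];
  console_errors?: number;
  network_failures?: number;
  backend_node_processes_connected?: string[];
  backend_node_probes_set?: { type: string; location: string; triggered: boolean }[];
  backend_node_probe_snapshots_collected?: number;
  backend_node_log_errors?: unknown[];
  issues?: string[];
  fixes?: string[];
}

// Lists the evidence fields that an active cycle expects but the verdict lacks.
function missingFields(verdict: Verdict, active: Cycle[]): string[] {
  const missing: string[] = [];
  if (active.indexOf("browser") !== -1) {
    if (verdict.pages_tested === undefined) missing.push("pages_tested");
    if (verdict.console_errors === undefined) missing.push("console_errors");
    if (verdict.network_failures === undefined) missing.push("network_failures");
  }
  if (active.indexOf("node") !== -1) {
    if (verdict.backend_node_processes_connected === undefined) {
      missing.push("backend_node_processes_connected");
    }
    // Assumed: either probe evidence or log evidence is enough.
    if (verdict.backend_node_probes_set === undefined && verdict.backend_node_log_errors === undefined) {
      missing.push("backend_node_probes_set | backend_node_log_errors");
    }
  }
  return missing;
}

const browserOnly: Verdict = {
  session_id: "<your-session-id>",
  status: "pass",
  pages_tested: ["http://localhost:3000/dashboard"],
  console_errors: 0,
  network_failures: 0,
};

// With both cycles active, the browser-only verdict lacks the node evidence.
console.log(missingFields(browserOnly, ["browser", "node"]));
```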

Session Isolation

Each AI session gets its own directory under .ironbee/sessions/<session-id>/:

.ironbee/sessions/<session-id>/
  actions.jsonl    # Event log (file edits, tool calls, verification markers)
  verdict.json     # Current verdict (cleared on code edit)
  state.json       # Session state (retries, active verification, trace ID, active fix, active activity, phase)
  session.log      # Debug log

This means parallel sessions (e.g., multiple Claude Code instances) don't interfere with each other.

Analytics

ironbee analyze provides metrics about verification sessions — how time is spent, how effective verifications are, and how confident we can be in the agent's code.

Usage

ironbee analyze <session-id>                    # single session analysis
ironbee analyze                                 # all sessions (project-level)
ironbee analyze --json                          # JSON output
ironbee analyze --detailed                      # include verdict details (checks, issues, fixes)
ironbee analyze --json --detailed               # JSON with verdict text for LLM semantic analysis
ironbee analyze <session-id> --json --detailed  # single session JSON with verdict details

The --detailed flag includes raw verdict text (checks, issues, fixes) in the output. This is designed for LLM-powered semantic analysis — use /ironbee-analyze in Claude Code or Cursor to have the agent interpret these details automatically.

Session Analysis

Phase Distribution

Each session is divided into three phases:

| Phase | What it measures |
|-------|-----------------|
| Coding | Time from session start to first verification, and between fix end and next verification start |
| Verification | Time between verification_start and verification_end (browser testing) |
| Fixing | Time between fix_start and fix_end (fixing failed verifications) |

Cycles

| Metric | Meaning |
|--------|---------|
| Verifications | Number of verification cycles in the session |
| Fixes | Number of fix cycles (each fail verdict starts a fix) |
| Avg verify | Average duration of a verification cycle |
| Avg fix | Average duration of a fix cycle |
| First verify | Time from session start to first verification |

Verification Quality

| Metric | Meaning |
|--------|---------|
| First-pass rate | Percentage of verification chains where the first verdict was pass |
| Verdicts | Total verdict count (pass + fail) |
| Avg retries | Average number of fail verdicts before pass per chain |
| Avg console errs | Average console_errors across all verdicts |
| Avg network fails | Average network_failures across all verdicts |
| Avg pages tested | Average number of pages tested per verdict |
| Avg checks | Average number of checks performed per verdict |

Code Changes

| Metric | Meaning |
|--------|---------|
| Total edits | Total file edit operations in the session |
| Unique files | Number of distinct files edited |
| Avg per verify | Average file edits before each verification |
| Avg per fix | Average file edits during each fix cycle |
| Hot Files | Top 5 most frequently edited files |
| Problematic Files | Top 5 files with most edits during fix cycles |
| Edit Churn | Files edited in 2+ separate fix cycles (root cause may not be resolved) |
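
As a rough illustration of the file-level metrics, the sketch below derives Edit Churn and a top-N ranking from lists of edited paths. Both function names and the input shape (one array of file paths per fix cycle) are assumptions; the alphabetical tie-break is likewise only a choice made for this example.

```typescript
// Files that were edited in two or more separate fix cycles.
function editChurn(fixCycles: string[][]): string[] {
  const cyclesPerFile: { [file: string]: number } = Object.create(null);
  for (const files of fixCycles) {
    const seen: { [file: string]: boolean } = Object.create(null);
    for (const file of files) {
      if (seen[file]) continue; // count each file once per cycle
      seen[file] = true;
      cyclesPerFile[file] = (cyclesPerFile[file] || 0) + 1;
    }
  }
  return Object.keys(cyclesPerFile)
    .filter((file) => (cyclesPerFile[file] || 0) >= 2)
    .sort();
}

// Most frequently edited files, highest count first (ties broken by name).
function mostEdited(edits: string[], limit = 5): string[] {
  const counts: { [file: string]: number } = Object.create(null);
  for (const file of edits) counts[file] = (counts[file] || 0) + 1;
  return Object.keys(counts)
    .sort((a, b) => (counts[b] || 0) - (counts[a] || 0) || (a < b ? -1 : a > b ? 1 : 0))
    .slice(0, limit);
}

const cycles = [
  ["src/form.tsx", "src/api.ts", "src/form.tsx"],
  ["src/form.tsx"],
  ["src/list.tsx"],
];
console.log(editChurn(cycles));                                       // only src/form.tsx spans two cycles
console.log(mostEdited(cycles.reduce((all, c) => all.concat(c), []))); // ranked by edit count
```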

Fix Effectiveness

| Metric | Meaning |
|--------|---------|
| Success rate | Percentage of fixes followed by a pass verdict |
| Re-fail rate | Percentage of fixes followed by another fail verdict |
| Fix/verify | Ratio of fix cycles to verification cycles (0 = no fixes needed) |

Scoring

Three scores summarize the session:

| Score | Formula | What it measures |
|-------|---------|-----------------|
| Efficiency | coding_time / (coding_time + fix_time) × 100 | How much productive time vs fix overhead. High = minimal wasted time on fixes |
| Quality | (pass_pct + pages_pct + checks_pct + clean_pct) / 4 | How thorough and clean the verification was. Components: pass rate, page coverage (3+ = 100%), check depth (5+ = 100%), error cleanliness (0 errors = 100%) |
| Confidence | pass_count / total_verdicts × 100 | How likely the agent's code works. Based on verdict pass rate |
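
The three formulas translate directly into code. In the sketch below the function names are hypothetical. Two behaviors are assumptions not stated in the table: scores return 0 when the denominator is 0, and cappedPct scales linearly below the "3+ pages" and "5+ checks" thresholds.

```typescript
// Efficiency = coding_time / (coding_time + fix_time) × 100
function efficiency(codingMs: number, fixMs: number): number {
  const total = codingMs + fixMs;
  return total === 0 ? 0 : (codingMs / total) * 100; // assumed: 0 when no time was recorded
}

// Confidence = pass_count / total_verdicts × 100
function confidence(passCount: number, totalVerdicts: number): number {
  return totalVerdicts === 0 ? 0 : (passCount / totalVerdicts) * 100; // assumed: 0 with no verdicts
}

// Assumed linear scaling up to the threshold, e.g. 3 pages or 5 checks = 100%.
function cappedPct(value: number, fullAt: number): number {
  return (Math.min(value, fullAt) * 100) / fullAt;
}

// Quality = (pass_pct + pages_pct + checks_pct + clean_pct) / 4
function quality(passPct: number, pagesPct: number, checksPct: number, cleanPct: number): number {
  return (passPct + pagesPct + checksPct + cleanPct) / 4;
}

// Example: 30 min coding, 10 min fixing, 3 of 4 verdicts passed,
// 3 pages and 3 checks per verdict on average, no console or network errors.
console.log(efficiency(30 * 60_000, 10 * 60_000));
console.log(confidence(3, 4));
console.log(quality(75, cappedPct(3, 3), cappedPct(3, 5), 100));
```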

Project Analysis

When run without a session ID, ironbee analyze aggregates metrics across all sessions:

| Metric | Meaning |
|--------|---------|
| Session History | Each session's summary: duration, cycles, outcome, score |
| Avg duration | Average session duration across all sessions |
| Avg verifies | Average verification cycles per session |
| Avg fixes | Average fix cycles per session |
| First-pass rate | Percentage of sessions where the first verdict was pass |
| Fix success rate | Percentage of all fixes (across sessions) that succeeded |
| Abandon rate | Percentage of sessions with interrupted verification/fix cycles |
| Avg efficiency | Average efficiency score across all sessions |
| Avg confidence | Average confidence score across all sessions |
| Problematic Files | Top 5 files with most fix edits across all sessions |

Telemetry

IronBee collects anonymous usage data to help improve the product. No source code, file contents, or personally identifiable information is ever sent.

Events collected: install/uninstall, session start, verdict submissions (pass/fail status only), and verification gate decisions.

To opt out, set the environment variable:

export IRONBEE_TELEMETRY=false

Or set telemetryEnabled: false in ~/.ironbee/telemetry.json.

Development

Requires Node.js ≥ 22 (Node 20 hit EOL on 2026-04-30).

npm install
npm run build       # tsc + scripts/copy-assets.js (mirrors .md/.mdc + assets/ to dist/)
npm run lint        # ESLint
npm run test        # Jest (unit + integration + client tests)
npm run dev         # Run via ts-node

CI runs the full test suite across linux × x64/arm64, darwin (Apple Silicon), and windows × x64/arm64 with Node 22 and 24. The build script is pure Node (no bash) so npm run build produces identical output on every OS.

License

MIT