altimate-receipts

v0.22.1

Published

2 days ago

Agent-work verification, not code review — a deterministic, cross-agent Report Card of what your coding agent actually did.

0High
0Medium
0Low

anandgupta1

ai agent-verification agent-work-verification coding-agent claude-code cursor copilot codex deterministic audit provenance ci receipts cli

🧾 receipts

See what your AI coding agent actually did — not just what it says it did.

Deterministic · local · reads the agent's own transcript · never grades your code.

npx altimate-receipts          # no install, no account — reads your last agent session

The problem

Your agent writes the code now. You review the diff — but the diff is only the result. It can't show you the work behind it:

  the agent's work    ████████████████████████████   1 hour · 90 commands · 40 files
  what you review     ███                             the final diff

The test that failed and got retried until green, the file quietly deleted, the check weakened to pass, the "all done!" that wasn't — it's all in the part you didn't read.

What receipts does

It reads the transcript your agent already saved and shows you what it actually did — cited to the line, deterministic, on your machine.

Your tests say whether the code is good. Receipts says what the agent did.

On a pull request

The comment receipts leaves on the PR — the events that deserve a human's eyes, then the full append-only record. No grade, no verdict:

Agent work record — claude-code · 16 files · spent ≈ $13.72 on this PR
push 2 · 7289522 — +0 new · 0 cleared · 1 open · transcript covers 16/16 changed files
Sanity-check
src/cli.ts:699 — Silently swallowed an error in cli.ts · error hidden · evidence tool-1983-0
Other 15 files: nothing detected.
tests: 564 passed (parsed from runner output in transcript)
| push | events | change cost | | :-- | :-- | :-- | | 1 · 2026-06-12 | +0 new · 0 cleared · 1 open | — | | 2 · 7289522 · 2026-06-12 | +0 new · 0 cleared · 1 open | $13.72 |
Checks ("not detected" means this check found nothing — not that nothing exists)
| check | reading | | :-- | :-- | | swallowed errors | 1 detected | | destructive ops | not detected | | CI/CD tampering | not detected | | … 14 more checks | not detected |
a record of the agent's process, not a code review · entries are never edited or removed · re-derivable (L1) · 0 model calls

Real output — the PR that built this very feature, wearing its own record: the masthead carries the PR's attributed agent spend, the ledger shows per-commit cost, and the one flagged event is the author's own swallowed error.

In your terminal

npx altimate-receipts prints the same record as a card:


  ╔══════════════════════════════════════════════════════════════════════════╗
  ║ 🧾  RECEIPTS — Agent Work Record                        proof, not vibes ║
  ╚══════════════════════════════════════════════════════════════════════════╝

  Session  Pull latest main branch
  Agent    claude-code · claude-fable-5
  Scope    44h 18m · 1k msgs · 407 tools · 342M tok · $713.90

  ┌─ RECORD ─────────────────────────────────────────────────────────────────┐
  │ 3 critical · 4 high · 2 medium                                           │
  └──────────────────────────────────────────────────────────────────────────┘

  CRITICAL
   ⛔ Destructive op: git checkout -- .github/meta/commit.txt
      data-loss risk
   ⛔ Destructive op: git reset --hard origin/main
      data-loss risk
   ⛔ Destructive op: rm -rf /tmp/codex-gaps
      data-loss risk
  HIGH
   ⚠️  Edited a file it never read: SPEC-0074-m75-store-ref-default.md
      clobber risk · SPEC-0074-m75-store-ref-default.md
   ⚠️  Force-pushed over remote history
      history overwrite
   ⚠️  Rewrote git history made earlier this session
      history rewritten
   ⚠️  Stuck loop wasted $8.75 / 3m 12s
      $8.75 · 3m 12s
  MEDIUM
   🔍 Bash failed 10× before succeeding
      10 retries
   🔍 Edit failed 4× before succeeding
      4 retries
     ▸ 1 minor (collapsed)

  EVIDENCE
   21 files changed · 55 edits · 308 commands · tests ran ✓ · 13 destructive ops · cache 100%

  ──────────────────────────────────────────────────────────────────────────
  ✅ Verified by Receipts  ·  deterministic  ·  0 model calls  ·  evidence, not judgement
  what it did — not whether it's correct. your tests are the oracle for success.

Also real, reproduced verbatim: a 44-hour receipts development session, graded by the tool it was building — the destructive ops, the force-push, and the $8.75 stuck loop are all the author's own. A clean session reads nothing detected (not a ✅ pass), because "not detected" is a fact about what was checked, never a verdict on your code.

What it catches that the diff and green CI don't

"Tests pass" — did they? The run that printed FAILED right before the agent declared success.
Claimed vs. actually done. "Committed and pushed," "added the validation" — checked against the trace.
Destructive ops. rm -rf, force-pushes, history rewrites.
Gamed checks. A weakened linter or tsconfig, an edited grader or test assertion.
Quiet churn. Files edited then reverted; loops that burned spend with nothing to show.
Half-finished work. Agent TODOs left open, deferred-for-later markers, and net-new stubs introduced in this change (agent-todo-incomplete, deferred-marker-introduced, net-new-stub).
Unverified completion. Acceptance criteria with no covering evidence, and "green" runs with a weak or self-authored oracle (weak-oracle, self-tested — behind RECEIPTS_EXPERIMENTAL_DETECTORS).
Claims vs. the world. External-state claims ("opened the PR", "cut the release") reconciled against the GitHub API (receipts reconcile, opt-in).

Every finding is cited to a line in the transcript, or it doesn't ship.

Beyond the record, receipts can also produce a work-verdict (receipts verdict — VERIFIED / INCOMPLETE / WRONG / ESCALATE, from the facts + your .receipts/asserts.json policy), fold external scanner results (SARIF/semgrep) into the receipt, and — when the session ran through altimate-router — carry an independent witnessed attestation of the model's decisions (receipts verify --witness).

Why you can trust it

Deterministic — no LLM in the path. Findings come from rules over the transcript. Same session in, same account out. There is nothing to hallucinate.
It never grades your code. It reports what happened, never whether it's good — no letter grade, no "do not merge." Your tests stay the judge of correctness.
Near-zero false positives — measured, not claimed. 100% precision on a 70-session labeled corpus; a 1% flag rate across 1,200 real local sessions, every flag adjudicated as a true positive, zero confirmed false positives (docs/eval.md).
Local-first. Runs entirely on your machine. No account, no upload, no telemetry.
Works with your agent. Claude Code, Codex, Cursor, OpenClaw — one tool, one format.

Try it (10 seconds)

npx altimate-receipts            # your most recent agent session
npx altimate-receipts --list     # choose from recent sessions, any agent

No install and no signup — it reads transcripts that are already on your disk.

Add it to your repo's PRs

npx altimate-receipts init

Commit the files it writes and merge the PR. From then on, every agent session attaches its Work Record to the PR automatically, before the push — contributors install nothing. The hook rides along with git clone, the CLI is fetched on demand, and publishing a new release updates everyone. Every path is best-effort: a missing session or an unreachable registry never blocks a push.

Humans pushing from a terminal, other agents, or repos that won't commit .claude/ config → the onboarding guide has a one-command fallback for each.

Live guard (optional — prevention, not just record)

The Work Record is post-hoc — it documents what the agent did. The live guard is the pre-execution arm: a PreToolUse hook that classifies a tool call before it runs and either blocks it, asks you to confirm, or just records it.

npx altimate-receipts init --guard

It uses the same FP-aware policy as the Work Record (policy/guardrails.policy.json), in three tiers:

deny — never-legitimate catastrophes only (rm -rf / / ~ / .git / .receipts, force-push to a protected branch, mkfs, DROP DATABASE). Hard-blocked.
ask — ambiguous-but-dangerous (rm -rf of an unknown dir, git reset --hard, --no-verify, pipe-a-script-from-the-internet, force-push to a feature branch). Routed to your normal confirmation prompt — never silently blocked.
warn — surfaced + recorded, never blocked (editing a CI workflow / a grader / an .env).

Blocked calls never reach the transcript, so the guard records each deny/ask/warn into the Receipt — the highest-signal line a record can carry: what the agent was stopped from doing.

Active gating is your call — one env var, three modes:

RECEIPTS_GUARD=watch   # shadow: narrate what it WOULD block ("👁️ WOULD BLOCK …"), never block
RECEIPTS_GUARD=on      # enforce (the default once installed) — deny/ask as above
RECEIPTS_GUARD=0       # off without un-installing (fail-open, records nothing)

watch is the on-ramp: see the policy fire on your real traffic, build trust, then flip to on. Either way the guard is fail-open by contract — any error, or the opt-out, allows the call. The post-hoc Record is the backstop, so live enforcement favors letting work through.

Going deeper (optional)

receipts --json — a portable, vendor-neutral in-toto Receipt carrying the same evidence, for feeding other tooling (schema).
Sign it. A Receipt can be Sigstore-signed and posted as a "Verified by Receipts" PR check (docs), then re-derived from its transcript to prove it wasn't hand-edited (trust model). Opt-in — for when compliance or settlement needs make it concrete.
More surfaces. receipts guardrails (prevention rules for AGENTS.md), receipts trends (what your agent gets wrong over time), receipts mcp (for IDEs/agents, docs), SARIF for code-scanning, badges. Run receipts --help.

How it works

the agent's own local transcript (JSONL / SQLite)
  → adapter        normalize each agent's format to one session model (tool calls raw)
  → spans          edits · commands · reads · cost · destructive ops
  → detectors      deterministic, file:line-cited findings
  → Work Record    the human-readable account (default), or a signed --json Receipt

One pipeline, every agent, nothing leaving your machine. The full set of binding constraints (deterministic, evidence-not-judgement, near-zero-FP, local-first) lives in SPEC-0000.

Status

🚧 Early, and working today across Claude Code, Codex, Cursor, and OpenClaw. The detection engine is measured for fidelity (see trust, above); the roadmap lives in specs/. Receipts audits its own development — every PR in this repo carries its own Work Record.

How we build

Spec-driven: every change starts from a spec in specs/ (template, contributing). See docs/ARCHITECTURE.md for the pipeline and the five extension surfaces (adapters, detectors, test parsers, guard rules, renderers) — each has a one-seam checklist and a Claude skill (/add-detector, /add-adapter, /add-test-parser, /add-guard-rule).

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme