altimate-receipts
v0.22.1
Agent-work verification, not code review — a deterministic, cross-agent Report Card of what your coding agent actually did.
🧾 receipts
See what your AI coding agent actually did — not just what it says it did.
Deterministic · local · reads the agent's own transcript · never grades your code.
npx altimate-receipts # no install, no account, reads your last agent session
The problem
Your agent writes the code now. You review the diff — but the diff is only the result. It can't show you the work behind it:
the agent's work ████████████████████████████ 1 hour · 90 commands · 40 files
what you review ███ the final diff
The test that failed and got retried until green, the file quietly deleted, the check weakened to pass, the "all done!" that wasn't: it's all in the part you didn't read.
What receipts does
It reads the transcript your agent already saved and shows you what it actually did — cited to the line, deterministic, on your machine.
Your tests say whether the code is good. Receipts says what the agent did.
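Citing "to the line" only works if line numbers survive parsing. A minimal TypeScript sketch, assuming a JSONL transcript (one JSON record per line) and a hypothetical `TranscriptEntry` shape rather than receipts' actual internal types:

```typescript
// Hypothetical shape: receipts' real internal types are not shown in this README.
interface TranscriptEntry {
  line: number;    // 1-based line number in the transcript file, used for citations
  record: unknown; // the parsed JSON value on that line
}

// Parse JSONL text into entries, keeping each record's original line number.
function readTranscript(text: string): TranscriptEntry[] {
  const entries: TranscriptEntry[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const raw = lines[i].trim(); // trim also drops a trailing "\r" from CRLF files
    if (raw === "") continue;    // blank lines (e.g. a trailing newline) carry no record
    try {
      entries.push({ line: i + 1, record: JSON.parse(raw) });
    } catch {
      // A malformed or truncated line is skipped, never guessed at,
      // so the same input always yields the same entries.
    }
  }
  return entries;
}

const sample = '{"type":"tool_use","name":"Bash"}\n\n{"type":"text"}\nnot json\n';
console.log(readTranscript(sample).map((e) => e.line)); // [ 1, 3 ]
```

Skipping an unparseable line instead of aborting is one possible design choice; either way, the line numbers that do come out point at the file as written.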
On a pull request
The comment receipts leaves on the PR — the events that deserve a human's eyes, then the full append-only record. No grade, no verdict:
Agent work record — claude-code · 16 files · spent ≈ $13.72 on this PR
push 2 · 7289522 — +0 new · 0 cleared · 1 open · transcript covers 16/16 changed files
Sanity-check
src/cli.ts:699 — Silently swallowed an error in cli.ts · error hidden · evidence tool-1983-0
Other 15 files: nothing detected.
tests: 564 passed (parsed from runner output in transcript)

| push | events | change cost |
| :-- | :-- | :-- |
| 1 · 2026-06-12 | +0 new · 0 cleared · 1 open | — |
| 2 · 7289522 · 2026-06-12 | +0 new · 0 cleared · 1 open | $13.72 |

Checks ("not detected" means this check found nothing — not that nothing exists)

| check | reading |
| :-- | :-- |
| swallowed errors | 1 detected |
| destructive ops | not detected |
| CI/CD tampering | not detected |
| … 14 more checks | not detected |
a record of the agent's process, not a code review · entries are never edited or removed · re-derivable (L1) · 0 model calls
Real output from the PR that built this very feature, carrying its own record: the masthead shows the PR's attributed agent spend, the ledger shows per-commit cost, and the one flagged event is the author's own swallowed error.
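The ledger's "+N new · N cleared · N open" line can be read as a set difference between the findings open at consecutive pushes. A sketch, assuming each finding has a stable string key; the key format and the `LedgerRow` shape are hypothetical:

```typescript
// Hypothetical row shape; one row is appended per push and never edited.
interface LedgerRow {
  push: number;
  added: number;   // findings open now that were not open at the previous push
  cleared: number; // findings open at the previous push that are gone now
  open: number;    // findings open now
}

function ledgerRow(push: number, previous: Set<string>, current: Set<string>): LedgerRow {
  let added = 0;
  let cleared = 0;
  current.forEach((key) => { if (!previous.has(key)) added++; });
  previous.forEach((key) => { if (!current.has(key)) cleared++; });
  return { push, added, cleared, open: current.size };
}

function formatRow(row: LedgerRow): string {
  return `+${row.added} new · ${row.cleared} cleared · ${row.open} open`;
}

// Append-only ledger: rows are pushed, never rewritten or removed.
const rows: LedgerRow[] = [];
rows.push(ledgerRow(1, new Set<string>(), new Set(["a", "b"])));
rows.push(ledgerRow(2, new Set(["a", "b"]), new Set(["b", "c", "d"])));
rows.forEach((r) => console.log(formatRow(r)));
// prints "+2 new · 0 cleared · 2 open", then "+2 new · 1 cleared · 3 open"
```

This sketch starts the first push from an empty set; how receipts chooses the baseline for push 1 is not documented here.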
In your terminal
npx altimate-receipts prints the same record as a card:
╔══════════════════════════════════════════════════════════════════════════╗
║ 🧾 RECEIPTS — Agent Work Record proof, not vibes ║
╚══════════════════════════════════════════════════════════════════════════╝
Session Pull latest main branch
Agent claude-code · claude-fable-5
Scope 44h 18m · 1k msgs · 407 tools · 342M tok · $713.90
┌─ RECORD ─────────────────────────────────────────────────────────────────┐
│ 3 critical · 4 high · 2 medium │
└──────────────────────────────────────────────────────────────────────────┘
CRITICAL
⛔ Destructive op: git checkout -- .github/meta/commit.txt
data-loss risk
⛔ Destructive op: git reset --hard origin/main
data-loss risk
⛔ Destructive op: rm -rf /tmp/codex-gaps
data-loss risk
HIGH
⚠️ Edited a file it never read: SPEC-0074-m75-store-ref-default.md
clobber risk · SPEC-0074-m75-store-ref-default.md
⚠️ Force-pushed over remote history
history overwrite
⚠️ Rewrote git history made earlier this session
history rewritten
⚠️ Stuck loop wasted $8.75 / 3m 12s
$8.75 · 3m 12s
MEDIUM
🔍 Bash failed 10× before succeeding
10 retries
🔍 Edit failed 4× before succeeding
4 retries
▸ 1 minor (collapsed)
EVIDENCE
21 files changed · 55 edits · 308 commands · tests ran ✓ · 13 destructive ops · cache 100%
──────────────────────────────────────────────────────────────────────────
✅ Verified by Receipts · deterministic · 0 model calls · evidence, not judgement
what it did — not whether it's correct. your tests are the oracle for success.
Also real, reproduced verbatim: a 44-hour receipts development session, recorded by the tool it was building. The destructive ops, the force-push, and the $8.75 stuck loop are all the author's own. A clean session reads "nothing detected" (not a ✅ pass), because "not detected" is a fact about what was checked, never a verdict on your code.
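Lines such as "Bash failed 10× before succeeding" describe a run of failures of one tool that a success eventually ends. One way to compute that, assuming tool calls are already normalized to `{ tool, ok, line }` in transcript order (`ToolCall`, `RetryFinding`, and the threshold of 3 are hypothetical):

```typescript
interface ToolCall { tool: string; ok: boolean; line: number }
interface RetryFinding { tool: string; failures: number; line: number } // line = the eventual success

// Count consecutive failures per tool; report when a success ends a long enough streak.
function retryStreaks(calls: ToolCall[], minFailures = 3): RetryFinding[] {
  const streak = new Map<string, number>();
  const findings: RetryFinding[] = [];
  for (const call of calls) {
    const failures = streak.get(call.tool) ?? 0;
    if (!call.ok) {
      streak.set(call.tool, failures + 1);
    } else {
      if (failures >= minFailures) findings.push({ tool: call.tool, failures, line: call.line });
      streak.set(call.tool, 0); // a success resets only that tool's streak
    }
  }
  return findings;
}

const calls: ToolCall[] = [
  { tool: "Bash", ok: false, line: 10 },
  { tool: "Read", ok: true, line: 11 }, // other tools do not reset Bash's streak
  { tool: "Bash", ok: false, line: 12 },
  { tool: "Bash", ok: false, line: 13 },
  { tool: "Bash", ok: false, line: 14 },
  { tool: "Bash", ok: true, line: 15 },
  { tool: "Edit", ok: false, line: 16 },
  { tool: "Edit", ok: true, line: 17 },
];
for (const f of retryStreaks(calls)) {
  console.log(`${f.tool} failed ${f.failures}× before succeeding (line ${f.line})`);
}
// prints: Bash failed 4× before succeeding (line 15)
```

A streak that never ends in success is not reported by this sketch; that case would belong to a separate check.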
What it catches that the diff and green CI don't
- "Tests pass." Did they? The run that printed FAILED right before the agent declared success.
- Claimed vs. actually done. "Committed and pushed," "added the validation": checked against the trace.
- Destructive ops. rm -rf, force-pushes, history rewrites.
- Gamed checks. A weakened linter or tsconfig, an edited grader or test assertion.
- Quiet churn. Files edited then reverted; loops that burned spend with nothing to show.
- Half-finished work. Agent TODOs left open, deferred-for-later markers, and net-new stubs introduced in this change (agent-todo-incomplete, deferred-marker-introduced, net-new-stub).
- Unverified completion. Acceptance criteria with no covering evidence, and "green" runs with a weak or self-authored oracle (weak-oracle, self-tested; behind RECEIPTS_EXPERIMENTAL_DETECTORS).
- Claims vs. the world. External-state claims ("opened the PR", "cut the release") reconciled against the GitHub API (receipts reconcile, opt-in).
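The "Tests pass" check can be sketched as: flag a success claim when the most recent test run before it printed FAILED. This assumes the transcript is already split into test runs and assistant messages; the `SessionEvent` shape, the claim pattern, and the FAILED marker are illustrative, not receipts' actual rules:

```typescript
// Hypothetical normalized events; deciding which command counts as a test run is assumed done.
type SessionEvent =
  | { kind: "test-run"; output: string; line: number }
  | { kind: "message"; text: string; line: number };

const SUCCESS_CLAIM = /\b(all tests pass(ed|ing)?|tests (are )?passing|all done)\b/i;

interface UnbackedClaim { claimLine: number; failedRunLine: number }

function claimAfterFailedRun(events: SessionEvent[]): UnbackedClaim[] {
  const out: UnbackedClaim[] = [];
  let lastFailedRun: number | null = null; // line of the latest run, if that run failed
  for (const e of events) {
    if (e.kind === "test-run") {
      lastFailedRun = /\bFAILED\b/.test(e.output) ? e.line : null;
    } else if (lastFailedRun !== null && SUCCESS_CLAIM.test(e.text)) {
      out.push({ claimLine: e.line, failedRunLine: lastFailedRun });
    }
  }
  return out;
}

const session: SessionEvent[] = [
  { kind: "test-run", output: "3 passed, 1 FAILED", line: 40 },
  { kind: "message", text: "All tests pass!", line: 41 },
  { kind: "test-run", output: "4 passed", line: 50 },
  { kind: "message", text: "All tests pass.", line: 51 },
];
console.log(claimAfterFailedRun(session)); // [ { claimLine: 41, failedRunLine: 40 } ]
```

Only the claim at line 41 is flagged; the later claim is backed by a clean run at line 50.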
Every finding is cited to a line in the transcript, or it doesn't ship.
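That rule can be enforced as a last gate before rendering. A sketch with hypothetical `DraftFinding` and `ShippedFinding` shapes: a draft without an in-range transcript line and some quoted evidence is dropped, not shown.

```typescript
// Hypothetical shapes; only the gate matters here, not the exact fields.
interface DraftFinding { rule: string; message: string; transcriptLine?: number; evidence?: string }
interface ShippedFinding { rule: string; message: string; transcriptLine: number; evidence: string }

function shipOnlyCited(drafts: DraftFinding[], transcriptLineCount: number): ShippedFinding[] {
  const shipped: ShippedFinding[] = [];
  for (const d of drafts) {
    const line = d.transcriptLine;
    // No line, a line outside the transcript, or no evidence: the finding does not ship.
    if (line === undefined || line < 1 || line > transcriptLineCount || !d.evidence) continue;
    shipped.push({ rule: d.rule, message: d.message, transcriptLine: line, evidence: d.evidence });
  }
  return shipped;
}

const drafts: DraftFinding[] = [
  { rule: "destructive-op", message: "rm -rf in session", transcriptLine: 12, evidence: "rm -rf /tmp/x" },
  { rule: "destructive-op", message: "uncited guess" },
  { rule: "destructive-op", message: "out of range", transcriptLine: 999, evidence: "..." },
];
console.log(shipOnlyCited(drafts, 100).length); // 1
```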
Beyond the record, receipts can also:
- produce a work-verdict (receipts verdict: VERIFIED / INCOMPLETE / WRONG / ESCALATE), derived from the recorded facts plus your .receipts/asserts.json policy;
- fold external scanner results (SARIF/semgrep) into the receipt;
- carry an independent witnessed attestation of the model's decisions (receipts verify --witness), when the session ran through altimate-router.
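The README does not spell out how `receipts verdict` maps facts to a label. Purely as an illustration of a deterministic reduction, here is one possibility; the `SessionFacts` fields, the `AssertPolicy` shape standing in for .receipts/asserts.json, and the precedence order are all assumptions:

```typescript
type WorkVerdict = "VERIFIED" | "INCOMPLETE" | "WRONG" | "ESCALATE";

// Hypothetical facts and policy; the real asserts.json schema may differ entirely.
interface SessionFacts { testsRan: boolean; testsPassed: boolean; openTodos: number; destructiveOps: number }
interface AssertPolicy { requireTestsRan: boolean; maxOpenTodos: number; escalateOnDestructive: boolean }

// Fixed precedence keeps the result deterministic: same facts + same policy, same label.
function workVerdict(facts: SessionFacts, policy: AssertPolicy): WorkVerdict {
  if (policy.escalateOnDestructive && facts.destructiveOps > 0) return "ESCALATE"; // needs a human
  if (facts.testsRan && !facts.testsPassed) return "WRONG";
  if (policy.requireTestsRan && !facts.testsRan) return "INCOMPLETE";
  if (facts.openTodos > policy.maxOpenTodos) return "INCOMPLETE";
  return "VERIFIED";
}

const policy: AssertPolicy = { requireTestsRan: true, maxOpenTodos: 0, escalateOnDestructive: true };
const clean: SessionFacts = { testsRan: true, testsPassed: true, openTodos: 0, destructiveOps: 0 };
console.log(workVerdict(clean, policy));                            // VERIFIED
console.log(workVerdict({ ...clean, destructiveOps: 1 }, policy));  // ESCALATE
console.log(workVerdict({ ...clean, testsPassed: false }, policy)); // WRONG
```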
Why you can trust it
- Deterministic — no LLM in the path. Findings come from rules over the transcript. Same session in, same account out. There is nothing to hallucinate.
- It never grades your code. It reports what happened, never whether it's good — no letter grade, no "do not merge." Your tests stay the judge of correctness.
- Near-zero false positives: measured, not claimed. 100% precision on a 70-session labeled corpus; a 1% flag rate across 1,200 real local sessions, every flag adjudicated as a true positive, zero confirmed false positives (docs/eval.md).
- Local-first. Runs entirely on your machine. No account, no upload, no telemetry.
- Works with your agent. Claude Code, Codex, Cursor, OpenClaw — one tool, one format.
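For reading those numbers: precision is the share of raised flags that are real, TP / (TP + FP), and flag rate is the share of scanned sessions that raised any flag. A small sketch of both definitions (function names are illustrative):

```typescript
// Precision: of everything flagged, the fraction that was a true positive.
// Undefined (NaN here) when nothing was flagged at all.
function precision(truePositives: number, falsePositives: number): number {
  const flagged = truePositives + falsePositives;
  return flagged === 0 ? NaN : truePositives / flagged;
}

// Flag rate: the fraction of scanned sessions that produced at least one flag.
function flagRate(flaggedSessions: number, totalSessions: number): number {
  return flaggedSessions / totalSessions;
}

console.log(flagRate(12, 1200)); // 0.01, i.e. a 1% flag rate is about 12 of 1,200 sessions
console.log(precision(7, 0));    // 1, any number of true positives with zero false positives
console.log(precision(3, 1));    // 0.75
```

Precision says nothing about events a check missed (that would be recall), which is consistent with "not detected" never being presented as a pass.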
Try it (10 seconds)
npx altimate-receipts # your most recent agent session
npx altimate-receipts --list # choose from recent sessions, any agent
No install and no signup: it reads transcripts that are already on your disk.
Add it to your repo's PRs
npx altimate-receipts init
Commit the files it writes and merge the PR. From then on, every agent session attaches
its Work Record to the PR automatically, before the push — contributors install
nothing. The hook rides along with git clone, the CLI is fetched on demand, and
publishing a new release updates everyone. Every path is best-effort: a missing session
or an unreachable registry never blocks a push.
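The "best-effort, never blocks a push" contract comes down to one rule: the hook's exit status does not depend on whether attaching the record worked. A minimal sketch; `attachRecord` is a hypothetical stand-in for whatever the installed hook actually runs, and the warning text is illustrative:

```typescript
// Returns the exit status for a git pre-push hook: 0 lets the push proceed.
function prePushHook(attachRecord: () => void, warn: (msg: string) => void): number {
  try {
    attachRecord(); // hypothetical: find the session, fetch the CLI, attach the Work Record
  } catch (err) {
    // Missing session, unreachable registry, anything else: warn and carry on.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`Work Record not attached: ${reason}`); // illustrative wording, not receipts' output
  }
  return 0; // best-effort: this hook never blocks the push
}

const warnings: string[] = [];
const hookExit = prePushHook(() => { throw new Error("no agent session found"); }, (m) => warnings.push(m));
console.log(hookExit, warnings);
```

In a real Node hook the returned value would become the process exit code, for example via `process.exitCode`.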
Pushing from a terminal as a human, using another agent, or working in a repo that won't commit the .claude/ config? The onboarding guide has a one-command fallback for each case.
Live guard (optional — prevention, not just record)
The Work Record is post-hoc — it documents what the agent did. The live guard is the pre-execution arm: a PreToolUse hook that classifies a tool call before it runs and either blocks it, asks you to confirm, or just records it.
npx altimate-receipts init --guard
It uses the same false-positive-aware policy as the Work Record (policy/guardrails.policy.json), in three tiers:
- deny: never-legitimate catastrophes only (rm -rf of /, ~, .git, or .receipts; force-push to a protected branch; mkfs; DROP DATABASE). Hard-blocked.
- ask: ambiguous-but-dangerous (rm -rf of an unknown dir, git reset --hard, --no-verify, pipe-a-script-from-the-internet, force-push to a feature branch). Routed to your normal confirmation prompt, never silently blocked.
- warn: surfaced and recorded, never blocked (editing a CI workflow, a grader, or an .env).
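To make the tiers concrete, here is a toy classifier over a handful of the examples listed above. It is not receipts' rule engine: the real rules live in policy/guardrails.policy.json, whose format is not shown here, and the regexes, the `GuardCall` shape, and the protected-branch list are assumptions.

```typescript
type GuardTier = "deny" | "ask" | "warn" | "allow";
type GuardCall = { tool: "Bash"; command: string } | { tool: "Edit"; path: string };

const PROTECTED_BRANCHES = new Set(["main", "master"]); // assumption

// If the command is a force-push, return the target branch ("" if none is named); else null.
function forcePushTarget(cmd: string): string | null {
  if (!cmd.startsWith("git push ")) return null;
  const args = cmd.split(" ").slice(2);
  const forced = args.some((a) => a === "-f" || a === "--force" || a.startsWith("--force-with-lease"));
  if (!forced) return null;
  const ref = args.filter((a) => !a.startsWith("-"))[1] ?? ""; // positional args: [remote, refspec]
  return ref.slice(ref.lastIndexOf(":") + 1); // "HEAD:main" -> "main"
}

function classifyCall(call: GuardCall): GuardTier {
  if (call.tool === "Edit") {
    const p = call.path;
    const sensitive = p.startsWith(".github/workflows/") || /(^|\/)\.env(\.|$)/.test(p) || /grader/i.test(p);
    return sensitive ? "warn" : "allow"; // warn: surfaced and recorded, never blocked
  }
  const c = call.command.trim().replace(/\s+/g, " ");
  const pushTarget = forcePushTarget(c);
  // deny: never-legitimate catastrophes
  if (/^rm -rf (\/|~|\.git|\.receipts)\/?$/.test(c)) return "deny";
  if (pushTarget !== null && PROTECTED_BRANCHES.has(pushTarget)) return "deny";
  if (/\bmkfs\b/.test(c) || /\bDROP DATABASE\b/i.test(c)) return "deny";
  // ask: ambiguous but dangerous, routed to the normal confirmation prompt
  if (/^rm -rf /.test(c) || /\bgit reset --hard\b/.test(c) || /--no-verify\b/.test(c)) return "ask";
  if (/\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b/.test(c)) return "ask";
  if (pushTarget !== null) return "ask"; // force-push to a feature (or unnamed) branch
  return "allow";
}

const examples: GuardCall[] = [
  { tool: "Bash", command: "rm -rf /" },
  { tool: "Bash", command: "git push --force origin main" },
  { tool: "Bash", command: "git push -f origin feature/login" },
  { tool: "Bash", command: "rm -rf /tmp/codex-gaps" },
  { tool: "Edit", path: ".github/workflows/ci.yml" },
  { tool: "Bash", command: "npm test" },
];
for (const call of examples) console.log(classifyCall(call));
// deny, deny, ask, ask, warn, allow (one per line)
```

Ordering the deny checks first means an ambiguous rule can never downgrade a catastrophe.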
Blocked calls never reach the transcript, so the guard records each deny/ask/warn into the Receipt — the highest-signal line a record can carry: what the agent was stopped from doing.
Active gating is your call — one env var, three modes:
RECEIPTS_GUARD=watch # shadow: narrate what it WOULD block ("👁️ WOULD BLOCK …"), never block
RECEIPTS_GUARD=on # enforce (the default once installed) — deny/ask as above
RECEIPTS_GUARD=0 # off without un-installing (fail-open, records nothing)
watch is the on-ramp: see the policy fire on your real traffic, build trust, then flip to on.
Either way the guard is fail-open by contract — any error, or the opt-out, allows the call. The
post-hoc Record is the backstop, so live enforcement favors letting work through.
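The three modes and the fail-open contract can be sketched as one decision function. This assumes a tier classifier like the policy above; `GuardOutcome`, `wouldBlock`, and treating unrecognized RECEIPTS_GUARD values as "on" are assumptions, not documented behavior:

```typescript
type GuardTier = "deny" | "ask" | "warn" | "allow";
type GuardMode = "watch" | "on" | "off";
interface GuardOutcome { action: "allow" | "block" | "ask"; wouldBlock?: boolean }

// "0" turns the guard off and "watch" shadows; anything else enforces
// (assumption, following "on" being the default once installed).
function guardMode(env: { [key: string]: string | undefined }): GuardMode {
  const v = env["RECEIPTS_GUARD"];
  if (v === "0") return "off";
  if (v === "watch") return "watch";
  return "on";
}

// Any error while classifying comes back as "error" so the caller can fail open.
function safeTier(classify: () => GuardTier): GuardTier | "error" {
  try {
    return classify();
  } catch {
    return "error";
  }
}

function decide(mode: GuardMode, classify: () => GuardTier): GuardOutcome {
  if (mode === "off") return { action: "allow" }; // opt-out allows the call
  const tier = safeTier(classify);
  if (tier === "error" || tier === "allow") return { action: "allow" }; // fail-open
  if (mode === "watch") return { action: "allow", wouldBlock: tier !== "warn" }; // narrate deny/ask, never block
  if (tier === "deny") return { action: "block" };
  if (tier === "ask") return { action: "ask" };
  return { action: "allow" }; // warn: surfaced and recorded, never blocked
}

console.log(decide(guardMode({ RECEIPTS_GUARD: "watch" }), () => "deny")); // { action: 'allow', wouldBlock: true }
```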
Going deeper (optional)
- receipts --json: a portable, vendor-neutral in-toto Receipt carrying the same evidence, for feeding other tooling (schema).
- Sign it. A Receipt can be Sigstore-signed and posted as a "Verified by Receipts" PR check (docs), then re-derived from its transcript to prove it wasn't hand-edited (trust model). Opt-in, for when compliance or settlement needs make it concrete.
- More surfaces. receipts guardrails (prevention rules for AGENTS.md), receipts trends (what your agent gets wrong over time), receipts mcp (for IDEs/agents, docs), SARIF for code-scanning, badges. Run receipts --help.
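For orientation, an in-toto Statement (v1) wraps a typed predicate around a list of subjects identified by digest. A sketch of that envelope in TypeScript; the predicate fields, the predicate-type URI, and the choice of the transcript as the subject are placeholders, not receipts' published schema (the schema linked above is authoritative):

```typescript
import { createHash } from "node:crypto";

// Envelope fields follow the in-toto Statement v1 layout.
interface InTotoStatement<P> {
  _type: "https://in-toto.io/Statement/v1";
  subject: { name: string; digest: { sha256: string } }[];
  predicateType: string;
  predicate: P;
}

// Placeholder predicate: NOT receipts' real field names.
interface WorkRecordPredicate {
  agent: string;
  modelCalls: number;
  findings: { rule: string; transcriptLine: number }[];
}

const PREDICATE_TYPE = "https://example.com/receipts/work-record/v0"; // placeholder URI

function toStatement(name: string, transcript: string, predicate: WorkRecordPredicate): InTotoStatement<WorkRecordPredicate> {
  // Hashing the exact transcript bytes lets anyone recompute the digest and compare.
  const sha256 = createHash("sha256").update(transcript, "utf8").digest("hex");
  return {
    _type: "https://in-toto.io/Statement/v1",
    subject: [{ name, digest: { sha256 } }],
    predicateType: PREDICATE_TYPE,
    predicate,
  };
}

const stmt = toStatement("session.jsonl", "", {
  agent: "claude-code",
  modelCalls: 0,
  findings: [{ rule: "example-rule", transcriptLine: 42 }],
});
console.log(JSON.stringify(stmt, null, 2));
```

Signing, for example with Sigstore, would then operate on this JSON; that step is not sketched.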
How it works
the agent's own local transcript (JSONL / SQLite)
→ adapter normalize each agent's format to one session model (tool calls raw)
→ spans edits · commands · reads · cost · destructive ops
→ detectors deterministic, file:line-cited findings
→ Work Record the human-readable account (default), or a signed --json Receipt
One pipeline, every agent, nothing leaving your machine. The full set of binding constraints (deterministic, evidence-not-judgement, near-zero-FP, local-first) lives in SPEC-0000.
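The stages above can be sketched as plain functions with hypothetical types: an adapter per agent format, a span builder, and a list of detectors whose cited findings are sorted into a stable order. This is an illustration of the shape of the pipeline, not receipts' code; only edit, command, and read spans are modeled, and the toy adapter and the `destructive-op` rule name are assumptions.

```typescript
interface RawLine { line: number; record: unknown }                                   // adapter input
interface NormalizedCall { tool: string; input: string; ok: boolean; line: number } // one session model
interface Span { kind: "edit" | "command" | "read"; target: string; ok: boolean; line: number }
interface CitedFinding { rule: string; message: string; line: number }

type Adapter = (raw: RawLine[]) => NormalizedCall[];
type Detector = (spans: Span[]) => CitedFinding[];

const SPAN_KINDS = new Map<string, Span["kind"]>([
  ["Edit", "edit"], ["Write", "edit"], ["Bash", "command"], ["Read", "read"],
]);

function toSpans(calls: NormalizedCall[]): Span[] {
  const spans: Span[] = [];
  for (const c of calls) {
    const kind = SPAN_KINDS.get(c.tool);
    if (kind !== undefined) spans.push({ kind, target: c.input, ok: c.ok, line: c.line }); // unmodeled tools are dropped
  }
  return spans;
}

function runPipeline(raw: RawLine[], adapter: Adapter, detectors: Detector[]): CitedFinding[] {
  const spans = toSpans(adapter(raw));
  const findings: CitedFinding[] = [];
  for (const detect of detectors) findings.push(...detect(spans));
  // Sort by transcript line, then rule name, so the same session always renders identically.
  return findings.sort((a, b) => a.line - b.line || (a.rule < b.rule ? -1 : a.rule > b.rule ? 1 : 0));
}

// Hypothetical adapter: assumes each raw record looks like { name, input, ok }.
const toyAdapter: Adapter = (raw) =>
  raw.map(({ line, record }) => {
    const r = record as { name: string; input: string; ok: boolean };
    return { tool: r.name, input: r.input, ok: r.ok, line };
  });

const destructiveCommand: Detector = (spans) =>
  spans
    .filter((s) => s.kind === "command" && /^rm -rf\b/.test(s.target))
    .map((s) => ({ rule: "destructive-op", message: `Destructive op: ${s.target}`, line: s.line }));

const rawSession: RawLine[] = [
  { line: 1, record: { name: "Read", input: "src/a.ts", ok: true } },
  { line: 2, record: { name: "Bash", input: "rm -rf /tmp/x", ok: true } },
  { line: 3, record: { name: "Glob", input: "*.ts", ok: true } },
  { line: 4, record: { name: "Bash", input: "npm test", ok: false } },
];
console.log(runPipeline(rawSession, toyAdapter, [destructiveCommand]));
```

Adding an agent means writing another adapter; adding a check means appending another detector; neither touches the other stages.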
Status
🚧 Early, and working today across Claude Code, Codex, Cursor, and OpenClaw. The
detection engine is measured for fidelity (see trust, above); the roadmap lives in
specs/. Receipts audits its own development — every PR in this repo carries
its own Work Record.
How we build
Spec-driven: every change starts from a spec in specs/
(template, contributing).
See docs/ARCHITECTURE.md for the pipeline and the five extension
surfaces (adapters, detectors, test parsers, guard rules, renderers) — each has a one-seam
checklist and a Claude skill (/add-detector, /add-adapter, /add-test-parser, /add-guard-rule).
License
Apache-2.0 © altimate.ai
