aclaim

v0.1.0

Published

a month ago

CI-style truth oracle for AI coding agents — mechanically re-runs an agent's stated claims (tests pass, build green, only these files changed, lint clean) in the actual repo and emits a deterministic pass/fail receipt. Verifies claims, not code quality; U


allenwu06

ai-agents coding-agent claude-code cursor ci verification receipt truth-oracle cli github-action

aclaim — did-it-actually

A lie-detector for what an AI coding agent says it did.

AI coding agents (Claude Code, Cursor, and the like) hand you a change with confident claims attached — "tests pass", "build is green", "only these files changed", "no lint errors". Those are claims, not proof. aclaim doesn't take their word for it: it actually re-runs each of those exact checks in your real codebase and prints a plain pass/fail receipt — every claim marked VERIFIED, REFUTED, or UNVERIFIABLE, with the command it ran, the output it captured, and a fingerprint of the exact change it checked.

npx <OWNER>/aclaim --spec .aclaim.yml

It does not judge whether the code is good or well-designed. It answers one narrow, purely mechanical question: did the agent actually do what it said? That narrowness is the point — there's no human re-review and no AI involved, so it can sit in your automated checks (CI — "continuous integration", the checks that run on every push) as a hard gate.

Why

When an AI agent opens a change, the description is a claim, not evidence. The expensive failure isn't the obviously-broken change — it's the confident-sounding one that says "all tests pass, only src/auth.js changed" while it also quietly rewrote src/db.js and left a test failing. People skim; the automated checks trust the code change, not the prose around it. Several independent builders are working on "verify the agent" (HN 47600204, HN 47322273) — the missing piece is a fully mechanical receipt with no human in the loop. That is this tool.

aclaim says nothing about how often agents are wrong. It ships with no benchmark, no detection rate, no accuracy number — there is no labeled dataset behind it, and inventing one would be dishonest. It just re-runs the checks the same way every time: the evidence in each receipt is the only thing it claims.

The claims spec

A small JSON or YAML-subset document. The agent (or you) declares what it did; aclaim checks each line. Full example: examples/aclaim.example.yml.

version: 1
# base: "origin/main"   # optional; auto-resolved if omitted
claims:
  - id: unit-tests
    type: tests-pass
    cmd: ["npm", "test"]          # ARG ARRAY — never a shell string
  - id: build
    type: build-green
    cmd: ["npm", "run", "build"]
  - id: scope
    type: files-changed
    files: ["src/feature.js"]
    mode: exact                    # exact = nothing undisclosed may change
  - id: lint
    type: lint-clean
    cmd: ["npm", "run", "lint"]
  - id: types
    type: typecheck-clean
    cmd: ["npx", "tsc", "--noEmit"]
  - { id: no-todo,   type: no-todo-introduced }
  - { id: no-secret, type: no-secret-introduced }
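The shape of that spec can be checked before anything runs. The following is a minimal sketch of such a validation step, not the code in src/schema.js: the function name `validateClaim` and the error messages are hypothetical, and the rule that `mode` may be omitted is an assumption.

```javascript
// Hypothetical validator sketch. Claim types are taken from the spec example above;
// everything else (names, messages, optional `mode`) is an illustrative assumption.
const COMMAND_TYPES = new Set(["tests-pass", "build-green", "lint-clean", "typecheck-clean"]);
const DIFF_TYPES = new Set(["no-todo-introduced", "no-secret-introduced"]);

function validateClaim(claim) {
  if (claim === null || typeof claim !== "object") return ["claim must be an object"];
  const errors = [];
  if (COMMAND_TYPES.has(claim.type)) {
    // The command must already be tokenized: an array of strings, never a shell string.
    const ok =
      Array.isArray(claim.cmd) &&
      claim.cmd.length > 0 &&
      claim.cmd.every((arg) => typeof arg === "string");
    if (!ok) errors.push("cmd must be a non-empty array of strings, never a shell string");
  } else if (claim.type === "files-changed") {
    if (!Array.isArray(claim.files) || claim.files.length === 0) {
      errors.push("files must be a non-empty array");
    }
    if (claim.mode !== undefined && claim.mode !== "exact" && claim.mode !== "subset") {
      errors.push('mode must be "exact" or "subset"');
    }
  } else if (!DIFF_TYPES.has(claim.type)) {
    errors.push("unknown claim type: " + String(claim.type));
  }
  return errors;
}
```

An empty array means the claim is well-formed; any entry would map to the "invalid spec" outcome rather than to a verdict.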

Claim types

| type | VERIFIED when | REFUTED when |
|------|---------------|--------------|
| tests-pass | cmd exits 0 | cmd exits non-zero |
| build-green | cmd exits 0 | cmd exits non-zero |
| lint-clean | cmd exits 0 | cmd exits non-zero |
| typecheck-clean | cmd exits 0 | cmd exits non-zero |
| files-changed (mode: exact) | exactly the claimed files changed vs base | a claimed file did not change, or a file changed that was not claimed (an undisclosed change, the highest-value check) |
| files-changed (mode: subset) | every claimed file changed | a claimed file did not change (extra changes allowed) |
| no-todo-introduced | no TODO/FIXME/XXX/HACK in added lines | such a marker is in an added line |
| no-secret-introduced | no known secret pattern in added lines | a known secret pattern is in an added line |

Any of the above is UNVERIFIABLE (never VERIFIED, never REFUTED) if the check could not actually be performed: the command could not spawn, it timed out, the directory is not a git repo, the base ref does not resolve. "Changed vs base" includes staged, unstaged, and untracked files — an agent that left a file uncommitted is still caught.
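The files-changed rules reduce to two set differences. Here is a minimal sketch under stated assumptions: `checkFilesChanged` is a hypothetical name, and `null` is used here as a stand-in for "the diff could not be computed" (no git repo, unresolved base ref). It is not the logic in src/verifiers.js.

```javascript
// Hypothetical sketch of the files-changed verdict rules from the table above.
function checkFilesChanged(claimedFiles, actualChangedFiles, mode = "exact") {
  // Assumption: null means the changed-file list could not be obtained at all.
  if (actualChangedFiles === null) {
    return { verdict: "UNVERIFIABLE", reason: "changed files could not be determined" };
  }
  const claimed = new Set(claimedFiles);
  const actual = new Set(actualChangedFiles);
  // Claimed but untouched files refute the claim in both modes.
  const missing = [...claimed].filter((f) => !actual.has(f)).sort();
  // Changed but unclaimed files refute the claim only in exact mode.
  const undisclosed = [...actual].filter((f) => !claimed.has(f)).sort();

  if (missing.length > 0) {
    return { verdict: "REFUTED", reason: "claimed file(s) did not change", missing, undisclosed };
  }
  if (mode === "exact" && undisclosed.length > 0) {
    return { verdict: "REFUTED", reason: "undisclosed change", missing, undisclosed };
  }
  return { verdict: "VERIFIED", reason: "claim holds", missing, undisclosed };
}
```

In subset mode the `undisclosed` list is still reported as evidence; it just does not change the verdict.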

Embedding claims in a PR

If you'd rather the agent put claims in the PR body, wrap them in a marker block (invisible in rendered Markdown) or a fenced ```aclaim block, then run aclaim --pr pr-body.md:

<!-- aclaim:begin -->
version: 1
claims:
  - { type: tests-pass, cmd: ["npm","test"] }
<!-- aclaim:end -->

aclaim never infers claims from prose ("I ran the tests, they pass"). Inferring intent from English is exactly the unreliable thing it replaces — no block, no verification (it says so explicitly and exits non-zero).
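Extraction of an explicit block can be sketched in a few lines. This is an illustration only: `extractClaimsBlock` is a hypothetical name, the tolerance for extra whitespace inside the markers is an assumption, and preferring the marker block over the fenced block when both exist is also an assumption.

```javascript
// Hypothetical extractor sketch: returns the raw spec text, or null when no block exists.
const FENCE = "`".repeat(3); // built at runtime so this example contains no literal fence
const MARKER_BLOCK = /<!--\s*aclaim:begin\s*-->([\s\S]*?)<!--\s*aclaim:end\s*-->/;
const FENCED_BLOCK = new RegExp(FENCE + "aclaim[^\\n]*\\n([\\s\\S]*?)" + FENCE);

function extractClaimsBlock(prBody) {
  const match = MARKER_BLOCK.exec(prBody) || FENCED_BLOCK.exec(prBody);
  // No block means no claims: prose is never interpreted.
  return match ? match[1].trim() : null;
}
```

A `null` result corresponds to the "no block, no verification" case; the caller would report that and exit non-zero rather than guess.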

The receipt

--format json emits a stable, sorted-key receipt:

{
  "schemaVersion": 1,
  "summary": { "total": 7, "verified": 6, "refuted": 1, "unverifiable": 0, "gate": "fail" },
  "git": { "base": "origin/main", "baseHow": "merge-base(HEAD, origin/main)", "diffHash": "sha256:…" },
  "claims": [
    {
      "id": "scope", "type": "files-changed", "verdict": "REFUTED",
      "reason": "UNDISCLOSED change: file(s) changed that were not in the claim (mode=exact): src/db.js",
      "command": null,
      "evidence": { "claimedFiles": ["src/feature.js"], "actualChangedFiles": ["src/db.js","src/feature.js"], "undisclosed": ["src/db.js"], "diffHash": "sha256:…" }
    }
  ],
  "receiptHash": "sha256:…",
  "meta": { "tool": "aclaim", "toolVersion": "0.1.0", "generatedAt": "…" }
}

Deterministic. Two runs that reach the same verdicts on the same diff produce the same receiptHash. Wall-clock fields (durationMs, generatedAt) are present for humans but excluded from the hash so they don't perturb it.
Content-bound. git.diffHash is a SHA-256 of the exact diff verified (including untracked content). A VERIFIED is only meaningful for that diff.
Tamper-evident ("signed-ish"). receiptHash lets a consumer detect a receipt edited after the fact (verifyReceiptHash()). It is not a cryptographic signature — there is no private key (shipping one would be theatre). Real signing is a CI-key concern, out of scope here.
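One way to get a hash with these properties is to serialize with sorted keys after dropping the volatile fields. The sketch below assumes a particular canonical form and assumes the `receiptHash` field itself is excluded from its own input; neither detail is confirmed by this README. `checkReceiptHash` is a hypothetical stand-in, not the tool's `verifyReceiptHash()`.

```javascript
const { createHash } = require("node:crypto");

// Serialize JSON-safe values with object keys sorted at every level,
// so key insertion order cannot change the bytes that get hashed.
function canonicalJson(value) {
  if (Array.isArray(value)) return "[" + value.map(canonicalJson).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Assumption: these fields are dropped before hashing (wall-clock values and the hash itself).
const VOLATILE = new Set(["durationMs", "generatedAt", "receiptHash"]);

function stripVolatile(value) {
  if (Array.isArray(value)) return value.map(stripVolatile);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [k, v] of Object.entries(value)) {
      if (!VOLATILE.has(k)) out[k] = stripVolatile(v);
    }
    return out;
  }
  return value;
}

function computeReceiptHash(receipt) {
  const digest = createHash("sha256").update(canonicalJson(stripVolatile(receipt))).digest("hex");
  return "sha256:" + digest;
}

// Tamper check: recompute and compare. This detects edits; it is not a signature.
function checkReceiptHash(receipt) {
  return receipt.receiptHash === computeReceiptHash(receipt);
}
```

Because there is no key involved, anyone who edits a receipt can also recompute the hash; the check only catches accidental or careless modification, which matches the "signed-ish" framing above.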

Exit codes (usable as a CI gate)

| code | meaning |
|------|---------|
| 0 | receipt produced, gate passed (no REFUTED; UNVERIFIABLE does not fail unless --strict). |
| 1 | receipt produced, gate failed: a claim was REFUTED (or --strict and something was UNVERIFIABLE). |
| 2 | usage error / unreadable or invalid spec: no receipt could be produced (distinct from "verified, something is wrong"). |

REFUTED is only ever emitted from positive evidence that a claim is false. A check that could not run is UNVERIFIABLE, never a silent VERIFIED and never a REFUTED. It fails safe, in both directions.
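The gate rule for codes 0 and 1 fits in one function. This is a sketch with a hypothetical name (`gateExitCode`); exit code 2 is not modeled because it applies before any verdicts exist.

```javascript
// Hypothetical sketch: map a list of verdict strings to exit code 0 or 1.
function gateExitCode(verdicts, { strict = false } = {}) {
  const refuted = verdicts.includes("REFUTED");
  const unverifiable = verdicts.includes("UNVERIFIABLE");
  // REFUTED always fails the gate; UNVERIFIABLE fails it only under --strict.
  return refuted || (strict && unverifiable) ? 1 : 0;
}
```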

Security model — read this (it RUNS your commands by design)

You cannot mechanically verify "tests pass" without running the tests. So aclaim executes the commands in the claims spec. That is the whole point, and the danger is contained explicitly, not hidden:

No shell. Ever. Commands run via spawn(file, argsArray) with shell:false. The spec must already be a tokenized array — a string cmd is rejected. There is no string interpolation into a shell and therefore no shell-injection surface in how aclaim invokes commands.
Only what's in the spec. There is no implicit "install deps" or "guess the build command". aclaim runs the exact cmd arrays the agent asserted, nothing else.
cwd-scoped. Commands run in the --repo directory, never an attacker-chosen path.
Per-command timeout. Every invocation is hard-bounded; on timeout the process group is killed (SIGTERM→SIGKILL) and the claim becomes UNVERIFIABLE.
Default-deny network (best-effort, opt-in per claim). Unless a claim sets allowNetwork: true, children inherit offline/no-proxy env hints (npm_config_offline, PIP_NO_INDEX, GIT_TERMINAL_PROMPT=0, HTTP(S)_PROXY → an unroutable sink, …). This is a strong hint, not a kernel sandbox — Node's stdlib cannot portably revoke a child's raw sockets. For hard isolation, run aclaim inside a network-restricted CI job/container (the normal deployment). This limitation is stated loudly on purpose.
Bounded capture. stdout/stderr are truncated to a cap (truncation is recorded) so a runaway command can't exhaust memory.

The command, once running, is still arbitrary native code — aclaim makes the invocation safe and the result honest; it is not a malware sandbox. Only point aclaim at claims specs your own agents/PRs produce, and run it least-privilege.
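Several of these properties (no shell, scoped cwd, timeout, bounded capture) can be illustrated with Node's `child_process.spawn`. The sketch below is not src/exec.js: `runCommand`, its option names, and its defaults are hypothetical, it kills only the direct child rather than the whole process group, and it omits the offline environment hints. Treating a signal-terminated command as UNVERIFIABLE is also an assumption.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical runner sketch; names and defaults are illustrative assumptions.
function runCommand(cmd, { cwd = process.cwd(), timeoutMs = 60000, maxBytes = 65536 } = {}) {
  // A shell string is rejected outright: only a tokenized argument array is accepted.
  if (!Array.isArray(cmd) || cmd.length === 0 || !cmd.every((a) => typeof a === "string")) {
    throw new TypeError("cmd must be a non-empty array of strings, not a shell string");
  }
  const [file, ...args] = cmd;

  return new Promise((resolve) => {
    const chunks = [];
    let size = 0;
    let truncated = false;
    let timedOut = false;

    // shell:false means arguments are passed as-is, with no shell parsing or expansion.
    const child = spawn(file, args, { cwd, shell: false, stdio: ["ignore", "pipe", "pipe"] });

    // Keep at most maxBytes of combined stdout/stderr and record that truncation happened.
    const capture = (chunk) => {
      const room = maxBytes - size;
      if (chunk.length > room) truncated = true;
      if (room > 0) {
        const kept = chunk.subarray(0, room);
        chunks.push(kept);
        size += kept.length;
      }
    };
    child.stdout.on("data", capture);
    child.stderr.on("data", capture);

    // Simplification: kills the direct child only, with no SIGTERM grace period.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    // A promise settles once, so if both "error" and "close" fire, the first call wins.
    const finish = (verdict, reason, exitCode) => {
      clearTimeout(timer);
      resolve({ verdict, reason, exitCode, truncated, output: Buffer.concat(chunks).toString("utf8") });
    };

    child.on("error", (err) => finish("UNVERIFIABLE", "could not spawn: " + err.message, null));
    child.on("close", (code) => {
      if (timedOut) finish("UNVERIFIABLE", "timed out after " + timeoutMs + " ms", null);
      else if (code === null) finish("UNVERIFIABLE", "terminated by a signal", null);
      else finish(code === 0 ? "VERIFIED" : "REFUTED", "exit code " + code, code);
    });
  });
}
```

Note the asymmetry: a non-zero exit is positive evidence and yields REFUTED, while every path where the command did not run to completion yields UNVERIFIABLE.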

GitHub Action

A thin wrapper around the same verify core. Copy examples/aclaim.yml into .github/workflows/:

- uses: <OWNER>/aclaim@v0
  with:
    spec: ".aclaim.yml"   # or:  pr: "pr-body.md"
    # strict: "true"

It posts a GitHub annotation per REFUTED (and UNVERIFIABLE) claim, writes the receipt to the job summary and an optional file, and sets outputs (gate, verified, refuted, unverifiable, receipt-hash). No GITHUB_TOKEN or secret is needed — aclaim does no API/network I/O of its own. Unlike a security linter, the Action fails the step if it cannot produce a receipt: an unverifiable run is not a verified one.

Limitations (be clear-eyed)

It verifies CLAIMS, not correctness. A passing test command means the command exited 0 — not that the code is correct, complete, or well-designed. aclaim cannot tell you the agent solved the right problem; only that its stated checks hold.
UNVERIFIABLE is common and is not failure. No git, a missing test binary, a timeout, a shallow clone with no base — all yield UNVERIFIABLE. Use --strict (or a required check) if you want those to block.
The secret/TODO scan is a conservative diff scan. It matches a small set of high-confidence secret shapes in added lines only. It will miss obfuscated/encoded secrets and is not a substitute for a real secret scanner. Pre-existing TODOs/secrets are intentionally not flagged — the claim is "did this change introduce one".
Network deny is best-effort (see Security model). Run in a network-restricted job for a hard guarantee.
No accuracy/benchmark numbers are claimed. There is no labeled corpus; the receipt's evidence is the only assertion.
It is one layer. Use it alongside human review and your existing CI — not instead of them.
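The "added lines only" behavior of the TODO scan can be sketched against a unified diff. This is a simplified illustration, not the scanner in the repo: `findIntroducedMarkers` is a hypothetical name, the word-boundary regex is an assumption, and the header detection is naive (an added line whose content starts with `++ ` would be mistaken for a file header).

```javascript
// Hypothetical sketch: report TODO-style markers that appear only in added diff lines.
const MARKER = /\b(TODO|FIXME|XXX|HACK)\b/;

function findIntroducedMarkers(unifiedDiff) {
  const hits = [];
  let file = null;
  for (const line of unifiedDiff.split("\n")) {
    if (line.startsWith("+++ ")) {
      // "+++ b/path" names the new side of a file; it is a header, not an added line.
      file = line.slice(4).replace(/^b\//, "");
      continue;
    }
    // Context lines (leading space) and removed lines (leading "-") are skipped,
    // so markers that already existed or were deleted are never flagged.
    if (line.startsWith("+")) {
      const match = MARKER.exec(line.slice(1));
      if (match) hits.push({ file, marker: match[1], text: line.slice(1).trim() });
    }
  }
  return hits;
}
```

An empty result would correspond to VERIFIED for no-todo-introduced, and any hit to REFUTED with the offending line as evidence.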

How it works

src/schema.js — pure parse + validate of the claims spec (no I/O).
src/exec.js — the sandboxed, no-shell command runner (Runner interface + RealRunner + FakeRunner for tests).
src/git.js — read-only git diff/hash via the same no-shell runner; failure is first-class.
src/verifiers.js — pure dispatch: claim + repo state → verdict + evidence; fail-safe to UNVERIFIABLE.
src/receipt.js — deterministic receipt assembly + human/JSON format.
bin/aclaim.js / src/action.js — thin CLI / Action glue.

No LLM is used anywhere — this is mechanical by design. (If a future heuristic ever needs one it must be an injected interface with a Fake for CI; a real adapter would be Anthropic-only via env, never hardcoded, never run in CI.)

Development

npm ci
npm test          # vitest — NO network, NO API key required
# try it on a throwaway repo:
node bin/aclaim.js --help

The suite builds throwaway git repos in a temp dir and exercises three fixtures: one where every claim is true (all VERIFIED), one where claims are false (failing tests → REFUTED; an undisclosed file change → REFUTED; a TODO/fake-secret added → REFUTED), and one where the harness cannot decide (missing binary, timeout → UNVERIFIABLE, never VERIFIED). Every command in every test is a trivial local command (node -e, true/false, a missing binary) — never an external service.

Feedback

A wrong verdict — a false VERIFIED, a false REFUTED, a stuck UNVERIFIABLE — is the most valuable input. See FEEDBACK.md: add the aclaim-feedback label to an issue, or use the issue template. Reports are captured verbatim — read exactly as written, never paraphrased.

License

MIT — see LICENSE.