@lannguyensi/review-claim-gate
v0.1.1
Published
Claim-gate consumer for PR-review subagents, enforces a merge_approval policy backed by evidence-ledger evidence
Downloads
258
Readme
review-claim-gate
A claim-gate-shaped evaluator specialised for PR review / merge decisions.
Instead of asking "has the agent verified enough to assert a diagnosis?"
(that's claim-gate), this package asks "has the reviewer done enough
to approve this merge?", and encodes that question as a policy the
review subagent must actually pass, not a checklist it can quietly skip.
Why
Today's PR-review subagent workflow relies on the orchestrating Claude
session handing the reviewer a checklist in-prompt. Nothing enforces
that the reviewer actually covered every dimension, a shallow review
can return "APPROVE" after catching one obvious bug and missing four
subtle ones (see feedback_review_briefing.md for the 2026-04-14
Connect-Agent-Modal incident).
review-claim-gate closes that gap by turning the checklist into a
typed merge_approval policy with five prerequisites. The reviewer is
expected to collect evidence as it works (via evidence-ledger),
confirm the other four dimensions by passing flags, and only then the
gate returns allowed: true. A --json output makes the verdict
machine-parseable, so the parent session can refuse to proceed to
task_finish(approve) when the gate fails.
The merge_approval policy
| Prerequisite | Meaning |
| -------------------------------- | ----------------------------------------------------------------------------------------- |
| tests_pass | CI green or local npm test exit 0 |
| review_checklist_complete | Every rubric item (correctness, security/scope, tests, docs) ticked off |
| no_unresolved_review_comments | Every review comment resolved or replied to |
| scope_matches_task | PR diff stays inside the task scope, no drive-by refactors |
| evidence_logged | ≥1 evidence-ledger entry tagged with this PR's task id (session = task id) |
All five must be true for allowed: true. Missing prereqs surface in
next_steps so the reviewer sees exactly what's left.
Install
npm install @lannguyensi/review-claim-gateCLI
review-claim-gate check --task-id <id> [--pr <url>]
[--tests-pass]
[--review-checklist-complete]
[--comments-resolved] # → no_unresolved_review_comments
[--scope-matches-task]
[--evidence-logged] # forces true (highest precedence)
[--evidence-file <path>] # committed JSONL file; overrides auto-detect
[--ledger-db <path>] # default $EVIDENCE_LEDGER_DB or ~/.evidence-ledger/ledger.db
[--json] # machine-readable output
[--claim <text>] # custom claim string
review-claim-gate export --task-id <id>
[--from-session <session-id>] # export from grounding session into a task/branch-named file
[--ledger-db <path>]
[--out <path>] # default: write to stdout; auto-mkdirs parent
review-claim-gate describe # print prereq keys + descriptionsExit code for check: 0 if allowed:true, 1 otherwise.
Evidence source precedence (used by check)
--evidence-loggedforcesevidence_logged=trueregardless of any other signal.- Committed evidence file. If
--evidence-file <path>is given (and exists), count non-empty JSON lines there. Otherwise auto-detect./.agent-grounding/evidence/<task-id>.jsonlrelative toprocess.cwd(). - Local evidence-ledger DB (
session = <task-id>) as fallback.
When the reviewer used a grounding session id instead of the branch/task id,
use review-claim-gate export --task-id <branch-name> --from-session <gs-id>
to write the evidence file under the branch/task naming convention without
having to re-log the findings.
Explicit --evidence-file <path> with a non-existent path throws, silent fallback would be misleading. The auto-detect path is also the convention export writes to, so the reviewer round-trip is export → commit → check.
JSON shape
{
"taskId": "t-abc",
"pr": "https://github.com/org/repo/pull/42",
"evidenceEntries": 3,
"evidenceSource": "file",
"evidenceFilePath": ".agent-grounding/evidence/t-abc.jsonl",
"result": {
"claim": "PR for task t-abc is safe to merge",
"type": "merge_approval",
"allowed": false,
"score": 80,
"reasons": ["prerequisite not met: …"],
"next_steps": ["…"],
"prerequisites": {
"tests_pass": true,
"review_checklist_complete": true,
"no_unresolved_review_comments": true,
"scope_matches_task": true,
"evidence_logged": false
}
}
}evidenceSource is one of "file" | "ledger" | "forced" | "none". evidenceFilePath is set only when evidenceSource === "file".
Programmatic
import {
evaluateMergeApproval,
isMergeAllowed,
type ReviewContext,
} from "@lannguyensi/review-claim-gate";
const ctx: ReviewContext = {
tests_pass: true,
review_checklist_complete: true,
no_unresolved_review_comments: true,
scope_matches_task: true,
evidence_logged: true,
};
const result = evaluateMergeApproval(
"PR #42 is safe to merge",
ctx,
);
// result.allowed === true, result.score === 100Reviewer subagent template
Drop this into the review-subagent prompt you spawn from a parent Claude session. After the reviewer completes its human-readable verdict, it calls the CLI and returns the structured output so the parent can gate on it.
You are a rigorous PR reviewer. Review PR #<N> on branch `<BRANCH>` in
`<REPO>`. Task id: `<TASK-ID>`.
Walk the standard review checklist:
1. Correctness: effect/lifecycle bugs, race conditions, error paths.
2. Security / scope / least-privilege: scope creep, injection, secrets.
3. Permission gating: backend invariants for UI-surfaced controls.
4. Scope creep: anything unrelated to the task?
5. Open questions from the task description: each answered or deferred?
6. Backend invariants: does the called endpoint accept this shape?
7. Docs coherence: README/getting-started consistent?
8. Test coverage of the risky bits.
9. Integration touchpoints.
For every finding you surface, call `ledger add --session <TASK-ID>
--type <fact|rejected|unknown> --content "<one line>"` so the parent
session can audit the review. If you already started a dedicated grounding
session (for example `gs-agent-grounding-...`), either keep `session = <TASK-ID>`
as the default convention or export later via `review-claim-gate export --task-id <TASK-ID> --from-session <GROUNDING-SESSION-ID>`.
When you finish the checklist, call:
review-claim-gate check --task-id <TASK-ID> --pr <PR-URL> --json \
--tests-pass # if CI is green
--review-checklist-complete # only if you actually walked every dim above
--comments-resolved # only if no open comments remain
--scope-matches-task # only if the diff stays inside the task
Print the JSON verdict as the LAST line of your response. If `allowed:
false`, your human-readable verdict MUST be REQUEST CHANGES regardless
of how minor the missing prereq looks. The parent session will parse
the JSON and refuse to proceed to `task_finish(approve)` when the gate
fails.GitHub Action
A ready-to-use composite action that runs this CLI on a PR and posts
the verdict as a Check-Run lives in action/. Required
branch-protection on the merge-approval check turns the gate into a
hard merge blocker, a reviewer cannot click "Merge" when the JSON
verdict says allowed: false. See action/README.md
for the consumer workflow snippet and local act instructions.
Out of scope
- Wiring into
mcp__agent-tasks__task_finish(approve)server-side: separately tracked. - Auto-deriving
scope_matches_taskfrom a real diff: first pass is a manual flag. - Multi-reviewer aggregation: one reviewer per claim today.
Why a separate package and not a 10th claim-gate type?
The core claim-gate policies address agent diagnostic claims (root
cause, architecture, security, …) and share a single ClaimContext
shape (readme_read, process_checked, has_evidence, …). merge_approval
is a different ontology (tests_pass, review_checklist_complete, …).
Keeping it out of claim-gate's core lets the CLI, evidence-ledger
integration, and policy evolve without churning the stable diagnostic
policies. The MergeApprovalResult shape intentionally mirrors
ClaimResult, so consumers that already parse claim-gate output do not
need a second parser.
