@humanbased/verifyflow
v0.1.0
Published
Evidence-backed delivery verification agent: does a PR deliver its linked Linear issue?
Downloads
138
Maintainers
Readme
VerifyFlow
Evidence-backed delivery verification for Linear-driven pull requests.
VerifyFlow answers one question:
Did this PR actually deliver what the linked ticket asked for?
It is not code review. Crosscheck reviews code and merge readiness; VerifyFlow runs after that, executes the ticket acceptance criteria, captures evidence, and reports the delivery verdict.
Linear issue + GitHub PR -> execution evidence -> delivery verdictThe CLI is verifyflow, with the short alias vf.
Quickstart
The fastest way to see a full run — no credentials, no checkout, fully offline:
npx github:humanbased-ai/verifyflow demovf demo runs bundled fixtures through the whole pipeline and writes a report you can read.
Then check your environment and (optionally) get a guided setup:
vf doctor # are gh / claude / LINEAR_API_KEY / Playwright / a sandbox runtime ready?
vf onboard # guided first-run setup; prints the exact fix command for anything missingvf onboard can save your Linear key to ~/.verifyflow/credentials.json (mode 0600). At runtime
LINEAR_API_KEY is resolved from the environment first, then that credentials file — so you never
have to export it again.
Install
After the first npm release:
npm install -g @humanbased/verifyflow
vf doctorBefore npm publish, run straight from GitHub:
npx github:humanbased-ai/verifyflow doctor
npx github:humanbased-ai/verifyflow run \
--fixtures fixtures/example-cli \
--linear EX-1 \
--pr example/greet#7 \
--level functionalVerify a PR — vf run
vf run \
--linear IN-123 \
--pr humanbased-ai/monorepo#456 \
--level auto \
--checkout \
--comment--linearis optional: if omitted, VerifyFlow derives the issue from the PR body's Linear link or the branch name.--checkoutclones the repo and checks out the PR head for real execution. Or point at an existing checkout with--workdir <dir>, or run offline with--fixtures <dir>.--commentposts the markdown report as a PR comment (idempotent — updates in place).--linear-writebackalso posts the delivery verdict back to the linked Linear issue.
Preview what VerifyFlow would do without checking out or executing anything (exits 0):
vf run --linear IN-123 --pr humanbased-ai/monorepo#456 --level auto --dry-runVerify a PR that has no resolvable ticket, against its own description (verdict capped at
manual_review_required):
vf run --pr humanbased-ai/monorepo#456 --allow-no-ticketLevels — --level
| Level | What it does | Needs |
| --- | --- | --- |
| functional | Command/test probes against a checkout | checkout/workdir |
| ui | AI-driven browser checks via Playwright | Playwright + a running app |
| journey | Multi-step end-to-end (backend + browser) | checkout + Playwright |
| auto | Picks the level from the ticket; downgrades to functional if no browser is available, and says why | — |
For ui / journey, point at the app with --base-url <url> (otherwise VerifyFlow tries to find a
deployment preview from the PR's checks), and supply an authenticated session with
--ui-auth <storageState.json> (create it with playwright codegen --save-storage=auth.json).
Merge policy — --policy
| Policy | Behavior |
| --- | --- |
| advisory | Default. Reports only, never blocks. |
| merge_gate | Exits non-zero on needs_fix, so a failing verdict blocks merge. |
| strict | Also blocks on manual_review_required / accept_with_risks. |
Other commands
| Command | What it does |
| --- | --- |
| vf run | Verify one PR against its Linear ticket (see above). |
| vf step | Orchestrator-facing step (Symphony/Jazzband): advisory-only, auto-resolves the issue, checks out + executes + comments, and prints one machine-readable JSON line to stdout. Never blocks. |
| vf watch | Independent daemon: watch a repo's Crosscheck-approved PRs, verify delivery, and (with --auto-merge) squash-merge on a clean accept. |
| vf report | Aggregate accumulated runs into quality metrics; --trend, --since, --repo, --level, --json filters. |
| vf replay <runId> | Re-run the verdict engine against a past run's stored evidence — no probes/tests re-execute. |
| vf show <runId> | Re-render a past run's report.md (or report.json). |
| vf signal <runId> | Pretty-print a past run's improvement-signal. |
| vf memory | Inspect reusable test-point memory: vf memory ls, vf memory show <key>, vf memory clear [--repo <o/r>] [--yes]. |
| vf init | Scaffold a verifyflow.config.json in the target repo (auto-detects npm / uv / go / cargo / make). |
| vf doctor | Check that the tools/env VerifyFlow relies on are ready. |
| vf onboard | Guided first-run setup; --non-interactive to skip prompts. |
| vf demo | Offline demo with bundled fixtures; --open to open the report. |
Full flag-by-flag reference: docs/commands.html. Run vf --help for the
same usage in the terminal.
Watch a repo's Crosscheck-approved PRs:
vf watch --repo humanbased-ai/monorepo --interval 120 # monitor + verify + comment
vf watch --repo humanbased-ai/monorepo --auto-merge --interval 120 # also squash-merge on acceptWhat works today
- One Linear issue against one GitHub PR.
- Acceptance-criteria extraction from the Linear issue (or the PR description with
--allow-no-ticket). functional,ui,journey, andautolevels.- Real command/test execution against a checkout or workdir.
- Browser-backed UI and journey checks through Playwright.
- Markdown and JSON reports, screenshots, traces, command logs, and reusable test memory.
- Idempotent PR comments and optional Linear writeback.
advisory/merge_gate/strictmerge policies.- Standalone CLI, orchestration step, and Crosscheck-approved watcher modes.
- Colorized terminal output in an interactive shell (auto-disabled when piped or under
NO_COLOR).
VerifyFlow is conservative: uncertainty becomes blocked or not_evaluable, not a fake product
failure.
Requirements
VerifyFlow stores no secrets. It reuses local tools and environment variables.
| Tool | Needed for | Required? |
| --- | --- | --- |
| Node.js >= 20 | CLI runtime | required |
| gh authenticated | GitHub PR context and comments | required |
| LINEAR_API_KEY (env or ~/.verifyflow/credentials.json) | Linear issue reads | required (or use --fixtures) |
| claude authenticated | LLM planning and judging; otherwise rules-only fallback | optional |
| docker or podman | Sandbox isolation for executing PR code (IN-555); without it, probes run on the host | optional |
| Playwright | ui and browser-backed journey runs | optional |
| npm, uv, etc. | Target repo setup and tests | as the target needs |
Roadmap
Next major layer: project-level verification.
Linear project -> ticket/PR matrix -> leveled runs -> evidence bundle -> project reportPlanned work:
- Read a Linear project and map tickets to PRs.
- Generate a coverage matrix across tickets, criteria, PRs, SHAs, and evidence.
- Run per-ticket functional/UI/journey checks.
- Produce one project-level implementation gap report.
- Keep stronger sandbox isolation for untrusted PR execution.
- Add opt-in Linear status transitions and follow-up issue filing.
Boundaries
VerifyFlow does not:
- review code or decide merge readiness; that is Crosscheck
- decompose tickets or dispatch coding agents; that is Symphony today and Jazzband next
- move money, send production email, or perform irreversible side effects
- replace CI
- provide native mobile evaluation yet
Toolchain fit
Symphony / Jazzband -> Crosscheck -> VerifyFlow
orchestration code review delivery verificationSymphony is the current Python orchestration layer. Jazzband is the planned open TypeScript/npm successor. VerifyFlow works with both, and also runs alone, by using public artifacts: GitHub PR metadata, Linear issue links, SHA-bound comments, CLI JSON, and evidence files.
More detail:
Development
npm install
npm run typecheck
npm test
npm run build
npm pack --dry-run