@mizchi/vrt
v0.5.0
Visual Regression Testing harness: multi-viewport snapshot + diff + workflow CLI driven by VLM/LLM analysis.
vrt
Visual Regression Testing toolkit — pixel diff, computed-style diff, a11y tree diff, agent-readable Markdown reports, and AI-powered CSS fix generation.
Requires Node 24+.
npm install -g @mizchi/vrt
# or, from this repo:
pnpm install && pnpm build
The CLI is organized into verb groups. Run vrt <group> --help for
options.
| Group | Subcommands |
|---|---|
| vrt diff | html, png, elements, browsers, agent, runs |
| vrt check | a11y {contrast,touch,focus}, tokens, theme, perf, drift {component,pages} |
| vrt inspect | interact, explore, smoke |
| vrt stress | i18n, media |
| vrt scan | component, breakpoints |
| vrt build | component |
| vrt snapshot | [<url>...], approve, fix-prompt, stability, flipbook, report |
| vrt migration | compare, blind, subagent |
| vrt workflow | init, capture, verify, approve, graph, affected, introspect, spec-verify, expect |
| Standalone | vrt watch, vrt manifest, vrt diff-pr, vrt baseline, vrt api, vrt bench, vrt report, vrt skill |
The single-token commands from 0.4.x (vrt compare, vrt png-diff,
vrt theme-parity, …) remain as deprecation shims that forward to
the new names and print a one-line hint. See the CHANGELOG
for the full old → new mapping.
Features
- Pixel diff — pixelmatch v7 + heatmap generation
- Computed style diff — getComputedStyle capture including hover/focus states
- A11y tree diff — accessibility snapshot comparison
- CSS challenge bench — automated CSS deletion/recovery with detection rate tracking (96.7%)
- 2-stage AI pipeline — VLM (image → structured diff) + LLM (diff → CSS fix)
- Migration VRT — compare HTML before/after across responsive viewports
- Snapshot — URL-based multi-viewport capture with baseline diff
- Mask — selector-based masking for dynamic content (animations, counters)
- Crater integration — lightweight prescanner via BiDi (1.66x speedup, 0% false positive)
- Markup-assistance toolkit (10+ commands): build from screenshot, theme-parity, i18n stress, a11y contrast / touch / focus-order, media-variant adaptations, cross-browser parity, design-token conformance, interaction sequences.
Quick Start
The examples below assume the vrt command is already installed and available on your PATH.
pnpm install
# Run tests
pnpm test
# Compare two HTML files
vrt diff html before.html after.html --output reports/
# Render the diff into an agent-friendly Markdown report
vrt diff agent reports/diff-report.json > reports/diff.md
# Compare two existing PNG screenshots without Playwright
vrt diff png baselines/home.png snapshots/home.png
# Compare two URLs
vrt diff html --url http://localhost:3000/ --current-url http://localhost:8080/ \
--output reports/
# Snapshot URLs (creates baseline on first run, diffs on subsequent runs)
vrt snapshot http://localhost:3000/ http://localhost:3000/about/ --output snapshots/
# Use explicit labels when URL-derived names are not ideal
vrt snapshot "http://localhost:3000/issues?severity=critical" --label critical-issues
# Fail CI when diffs or new baselines are detected
vrt snapshot http://localhost:3000/ --fail-on-diff --fail-on-new-baseline --max-diff-ratio 0.01
# Promote accepted snapshot diffs to the new baseline
vrt snapshot approve --output snapshots/
# Load snapshot targets from vrt.config.json
vrt snapshot
# Dev inner loop with rich signal output (token-aware + cross-round)
vrt diff html baseline.html variant.html --tokens DESIGN.md --output reports/
vrt watch baseline.html variant.html --tokens DESIGN.md
# Author approval rules (sub-pixel deviations, intentional design exceptions, etc.)
vrt manifest add --selector .hero__body --max-px 2 --reason "AA artifact" --expires 2026-08-15
vrt manifest add --a11y-contrast --selector "button" --reason "decorative" --expires 2026-08-15
vrt manifest add --from-run .vrt/runs/diff-pr/ # auto-acknowledge sub-pixel deltas
vrt manifest list
# CI gate — declare routes in vrt.config.json, pin baselines, gate per PR
vrt baseline pin # on main
vrt baseline verify # in PR
vrt baseline post --pr owner/repo#123 # send summary.md as PR comment
# Legacy internal-dogfood verification loop (vrt's own e2e suite)
vrt workflow init
vrt workflow capture
vrt workflow verify
# Workflow loop with external-project routes/config
vrt workflow init --config ./vrt.config.json
vrt workflow capture --config ./vrt.config.json
# Prepare a migration diff packet for an external fixer
pkf run migration-subagent-prepare -- --report test-results/migration/migration-report.json --output test-results/migration/subagent-task.md
# Measure success rate from before/after migration reports
pkf run migration-subagent-evaluate -- --before-report test-results/migration/migration-report.json --after-report test-results/migration/migration-report.after.json
# Inspect blind migration scenarios
pkf run migration-blind-list
pkf run migration-blind-show --scenario shadcn-to-luna
pkf run migration-blind-prepare --scenario shadcn-to-luna -- --packet test-results/migration/blind/shadcn-to-luna/task.md
pkf run migration-blind-solo --scenario shadcn-to-luna -- --output test-results/migration/blind/shadcn-to-luna/solo/after-blind.html --report-output-dir test-results/migration/blind/shadcn-to-luna/solo-report
pkf run migration-blind-evaluate --scenario shadcn-to-luna -- --before-report test-results/migration/blind/shadcn-to-luna/migration-report.json --after-report test-results/migration/blind/shadcn-to-luna/solo-report/migration-report.json --rounds 1
# Mask dynamic content
vrt snapshot http://localhost:3000/ --mask ".marquee-container,.hero-badge"
# Detect broken baseline renders (e.g. CDN failed to load) — on by default
vrt diff html --dir fixtures/migration/tailwind-to-vanilla \
--baseline before.html --variants after.html
# Add --strict-baseline-sanity to exit non-zero when warnings fire,
# or --no-baseline-sanity to skip the check entirely.
# CSS challenge benchmark
pkf run css-bench --fixture page --trials 30
# Fix loop (break CSS → VLM analyze → LLM fix → verify)
pkf run fix-loop --fixture page --seed 42
Task runner & spec gates (pkfire / pkspec)
Tasks live in Taskfile.pkl (typed; replaces the previous bash justfile).
Specs live in Spec.pkl (Goals + Scenarios) and Test.pkl (smoke implementations).
# List every available task
pkf list
# Run a task (mirrors any old `just <name>` invocation)
pkf run smoke-all
pkf run vrt-test
# Spec gates
pkf run spec-check # pkspec check Spec.pkl Test.pkl
pkf run spec-render # render Spec.pkl → docs/SPEC.md
pkf run spec-run # execute every Test.pkl test
Install pkf / pkspec via nix flake:
nix run git+https://github.com/mizchi/pkfire -- list
nix run git+https://github.com/mizchi/pkspec -- check Spec.pkl Test.pkl
CLI Surface
Diff (compare two things)
vrt diff html <baseline> <variant> # HTML/URL pair → multi-viewport diff + report.json
vrt diff agent <report.json> # Render report.json as agent-friendly Markdown
vrt diff png <baseline.png> <current.png> # Direct PNG pixel diff + heatmap
vrt diff elements [options] # Element-level diff with shift isolation
vrt diff browsers <html|url> # chromium / firefox / webkit parity
vrt diff runs <dir...> # Aggregate multiple VRT runs into one table
Snapshot (URL → baseline + diff)
vrt snapshot <url1> [url2] ... # First run: baseline. Subsequent: baseline + diff
vrt snapshot approve # Promote *-current.png → *-baseline.png
vrt snapshot fix-prompt # Emit a subagent-ready prompt from snapshot-report.json
vrt snapshot stability <url...> # Run N iterations and report false-positive rate
vrt snapshot flipbook # Diff three-frame (baseline ↔ current ↔ heatmap) HTML flipbooks
vrt snapshot report # Render snapshot-report.json as Markdown
Check (gates: a11y / tokens / theme / perf / drift)
vrt check a11y contrast <html> # WCAG AA contrast scan
vrt check a11y touch <html|url> # Touch target size (WCAG 2.5.5 / 2.5.8)
vrt check a11y focus <html|url> # Tab order vs visual order
vrt check tokens <html> # radius/spacing/z-index/shadow scale conformance
vrt check theme <html> # prefers-color-scheme dark / unthemed components
vrt check perf <html|url> # Web Vitals (CLS / LCP / FCP)
vrt check drift component <html> --selector .card
vrt check drift pages --selector .footer --files A.html B.html C.html
Build / Scan / Inspect / Stress (markup-assistance)
# Build component from a target screenshot, iterate until close.
vrt build component <target.png> <current.html>
# signals: bbox + heatmap regions + dominant fill + typography hints
# + spacing-fix table + palette diff + multi-state suspect flags.
# Detect components in a screenshot.
vrt scan component <screenshot.png> # Crop to standalone PNGs
vrt scan breakpoints <html-file> # Discover responsive breakpoints
# Scripted / exploratory interaction.
vrt inspect interact <html|url> --sequence <path.json>
vrt inspect explore <html|url> # Auto-discover declared actions and diff each
vrt inspect smoke <html|url> # A11y-driven exploratory smoke test
# Stress tests.
vrt stress i18n <html> # Text-node inflation overflow detection
vrt stress media <html> # forced-colors, reduced-motion, print, RTL, 200% zoom
All emit a self-contained Markdown report under --output-dir. Each
finding includes pasteable hex / px values + a heuristic remediation
hint. See docs/reports/2026-05-13-capability-survey.md for the full
scenario × coverage matrix.
Snapshot labels are query-aware by default, so /issues and /issues?severity=critical no longer share the same baseline name.
Use repeated --label flags to override labels explicitly when needed.
The same --label flag can be used with vrt snapshot approve to approve only selected labels.
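To make the query-aware naming concrete, here is a minimal sketch of how a label could be derived from a URL. The function name labelFromUrl, the dash-based slug rule, and the "index" fallback are assumptions for illustration; the README does not document vrt's actual sanitization rule.

```typescript
// Hypothetical helper: vrt's real label sanitization is not documented in this README.
function labelFromUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Keep the query string so /issues and /issues?severity=critical produce different labels.
  const raw = `${url.pathname}${url.search}`;
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  // Assumed fallback name for the root path.
  return slug === "" ? "index" : slug;
}
```

Under these assumptions, /issues maps to "issues" and /issues?severity=critical maps to "issues-severity-critical", so the two URLs no longer collide on one baseline name.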
Minimal vrt.config.json:
{
"baseUrl": "http://localhost:3000",
"routes": [
"/",
{ "path": "/issues?severity=critical", "label": "critical-issues" }
],
"outputDir": "test-results/snapshots/sample-webapp-2026",
"threshold": 0.1,
"failOnDiff": true,
"maxDiffRatio": 0.01,
"workflow": {
"captureSpec": "./e2e/vrt-capture.spec.ts"
}
}
When vrt.config.json exists in the current directory, vrt snapshot loads it automatically. Use --config <path> to point at another file, and pass URLs or flags directly when you want CLI values to override config defaults.
vrt workflow init and vrt workflow capture also auto-load the same file, reuse baseUrl/routes, and accept workflow.captureSpec or --capture-spec <path> when you want a custom Playwright entrypoint.
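The "CLI values override config defaults" rule can be sketched as a simple merge. The field names below mirror the sample vrt.config.json; the SnapshotOptions type and the mergeOptions function are hypothetical and only illustrate the precedence, not vrt's internal loader.

```typescript
// Field names follow the sample vrt.config.json; the merge itself is an assumption.
interface SnapshotOptions {
  baseUrl?: string;
  outputDir?: string;
  threshold?: number;
  failOnDiff?: boolean;
  maxDiffRatio?: number;
}

// Hypothetical merge: a CLI value wins when it is set, otherwise the config value is used.
// `??` only treats undefined/null as "not set", so an explicit `false` or `0`
// from the CLI still overrides the config.
function mergeOptions(config: SnapshotOptions, cli: SnapshotOptions): SnapshotOptions {
  return {
    baseUrl: cli.baseUrl ?? config.baseUrl,
    outputDir: cli.outputDir ?? config.outputDir,
    threshold: cli.threshold ?? config.threshold,
    failOnDiff: cli.failOnDiff ?? config.failOnDiff,
    maxDiffRatio: cli.maxDiffRatio ?? config.maxDiffRatio,
  };
}
```

Using `??` rather than `||` matters here: a falsy CLI value such as `failOnDiff: false` should still take precedence over a `true` in the config file.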
Subagent-ready fix prompt
vrt snapshot fix-prompt reads the last snapshot-report.json and emits a structured task list that a coding agent can act on:
# Markdown prompt to stdout (default)
vrt snapshot fix-prompt --output test-results/snapshots
# Limit to the 5 worst diffs, write to a file
vrt snapshot fix-prompt --output test-results/snapshots --limit 5 --out fix-prompt.md
# Filter by label, minimum diff ratio, and emit JSON for programmatic use
vrt snapshot fix-prompt --label home --min-diff 0.01 --format json
The prompt includes per-task URL, viewport, diff ratio (with shift compensation), and relative paths to the baseline / current / heatmap PNGs plus the captured HTML, so a subagent can map the visual regression back to source code.
Measuring false-positive rate
vrt snapshot stability captures the same URLs across N iterations against a
baseline locked in on iteration 0, then reports how often comparisons showed a
non-zero diff. Useful for tracking renderer noise, animation leakage, or mask
gaps before turning on --fail-on-diff in CI:
# 3 iterations (default), any non-zero diff counts as a positive
vrt snapshot stability http://localhost:3000/ http://localhost:3000/about/
# Fail CI if the overall FP rate exceeds 5%
vrt snapshot stability http://localhost:3000/ \
--iterations 5 \
--fail-above-rate 0.05 \
--output test-results/stability
# Only count diffs above 1% as positives (filters out subpixel noise)
vrt snapshot stability http://localhost:3000/ --fp-threshold 0.01
The run writes stability-report.json to the output directory with per-URL +
per-viewport FP rate, mean / max diff ratios, and shift-compensated max — well
suited to artifact upload + over-time tracking.
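As a rough illustration of the arithmetic behind these numbers, the sketch below computes an FP rate, mean, and max from a list of diff ratios. The StabilitySummary shape and both function names are hypothetical; the real stability-report.json schema is not shown in this README.

```typescript
// Hypothetical shapes; not the actual stability-report.json schema.
interface StabilitySummary {
  fpRate: number;
  meanDiffRatio: number;
  maxDiffRatio: number;
}

// diffRatios: one entry per comparison against the iteration-0 baseline.
// fpThreshold: ratios strictly above this count as positives (0 means any non-zero diff).
function summarizeStability(diffRatios: number[], fpThreshold = 0): StabilitySummary {
  if (diffRatios.length === 0) {
    return { fpRate: 0, meanDiffRatio: 0, maxDiffRatio: 0 };
  }
  const positives = diffRatios.filter((ratio) => ratio > fpThreshold).length;
  const total = diffRatios.reduce((sum, ratio) => sum + ratio, 0);
  return {
    fpRate: positives / diffRatios.length,
    meanDiffRatio: total / diffRatios.length,
    maxDiffRatio: Math.max(...diffRatios),
  };
}

// Mirrors the idea of --fail-above-rate: fail only when the rate exceeds the limit.
function exceedsRate(summary: StabilitySummary, failAboveRate: number): boolean {
  return summary.fpRate > failAboveRate;
}
```

For the ratios [0, 0.002, 0.02, 0], the FP rate is 0.5 with the default threshold and 0.25 with a threshold of 0.01, which shows how --fp-threshold filters out subpixel noise.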
Capture backend (--backend)
By default vrt snapshot launches a local Chromium via Playwright. To offload
capture to Cloudflare Browser Run
without installing Playwright browsers in CI, switch the backend:
# Connect via CDP WebSocket; credentials come from env vars
CLOUDFLARE_ACCOUNT_ID=... CLOUDFLARE_API_TOKEN=... \
vrt snapshot --backend cloudflare http://localhost:3000/
Resolution order for the backend selector:
- --backend <local|cloudflare> CLI flag
- VRT_CAPTURE_BACKEND env var
- Default local
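The precedence above can be expressed as a small resolver. This is a sketch only: resolveBackend and isBackend are hypothetical names, and silently ignoring an unrecognized value is an assumption (the real CLI may report an error instead).

```typescript
type Backend = "local" | "cloudflare";

function isBackend(value: string | undefined): value is Backend {
  return value === "local" || value === "cloudflare";
}

// Hypothetical resolver: CLI flag first, then VRT_CAPTURE_BACKEND, then "local".
// Assumption: unrecognized values are skipped rather than rejected.
function resolveBackend(
  cliFlag: string | undefined,
  env: Record<string, string | undefined>,
): Backend {
  if (isBackend(cliFlag)) return cliFlag;
  const fromEnv = env.VRT_CAPTURE_BACKEND;
  if (isBackend(fromEnv)) return fromEnv;
  return "local";
}
```

Passing the environment in as a plain object (for example process.env) keeps the function easy to test without mutating global state.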
For the Cloudflare backend, additional env vars are required:
| Variable | Required | Purpose |
|---|---|---|
| CLOUDFLARE_ACCOUNT_ID | yes | Account id for the CDP URL |
| CLOUDFLARE_API_TOKEN | yes | Token with Browser Rendering permissions |
| CLOUDFLARE_BROWSER_RUN_ENDPOINT | no | Override the default WS endpoint |
See examples/vrt-snapshot-cloudflare.workflow.yml for a complete GitHub
Actions template that skips the local Playwright install step.
Visualizing the VRT process — flipbooks + video
The VRT process can be saved as a self-contained HTML "flipbook" (PNGs embedded as base64, vanilla-JS play/pause/scrub controls). One file per scenario, no external assets, opens in any browser, attachable to PRs:
# 1. Fix-loop convergence (or any ordered PNG sequence)
vrt snapshot flipbook round-0.png round-1.png round-2.png \
--label "round 0" --label "round 1" --label "round 2" \
--title "Fix-loop convergence" --out fix-loop.html
# 2. Diff three-frame (baseline ↔ current ↔ heatmap) for every regressed entry
vrt snapshot flipbook --output test-results/snapshots
# → test-results/snapshots/flipbooks/<label>-<viewport>.html
# 3. Stability iterations as flipbook per (URL, viewport)
vrt snapshot stability http://localhost:3000/ \
--iterations 5 --flipbook --output test-results/stability
# → test-results/stability/flipbooks/<label>-<viewport>-stability.html
# 4. WebM recording of a smoke-test session (Playwright recordVideo)
vrt inspect smoke --url http://localhost:3000/ --max-actions 20 --record-video videos/
# → videos/<hash>.webm
Common flags: --delay <ms> controls per-frame duration (default 700),
--no-loop stops at the last frame, --no-autoplay opens paused.
Agent-friendly diff summary
When a coding agent is iterating with vrt diff html, the natural workflow
(see docs/reports/2026-05-12-dogfood-shadcn-luna.md)
is: read the worst-viewport PNGs side-by-side, then write a CSS patch.
vrt diff agent collapses the inputs the agent needs into a single
Markdown blob:
vrt diff html --dir fixtures/migration/shadcn-to-luna \
--baseline before.html --variants working.html \
--output test-results/iter1
vrt diff agent test-results/iter1/diff-report.json --max-viewports 2
The output contains: a worst-first diff table, category totals across
viewports, fix candidates aggregated by (selector, property) with the
number of viewports each is flagged on, and absolute paths to the
baseline / current / heatmap PNGs for the worst N viewports — all in
one context window.
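The "aggregated by (selector, property)" step can be pictured as a group-and-count. The sketch below is an assumption about the idea, not vrt's code: the StyleChange and FixCandidate shapes and the function name are hypothetical, and the real diff-report.json schema is not shown in this README.

```typescript
// Hypothetical input and output shapes for illustration only.
interface StyleChange {
  viewport: string;
  selector: string;
  property: string;
}
interface FixCandidate {
  selector: string;
  property: string;
  viewportCount: number;
}

function aggregateFixCandidates(changes: StyleChange[]): FixCandidate[] {
  const viewportsByKey = new Map<string, Set<string>>();
  for (const change of changes) {
    // JSON.stringify keeps the (selector, property) pair unambiguous as a map key.
    const key = JSON.stringify([change.selector, change.property]);
    const viewports = viewportsByKey.get(key) ?? new Set<string>();
    viewports.add(change.viewport); // a Set counts each viewport once
    viewportsByKey.set(key, viewports);
  }
  return Array.from(viewportsByKey.entries())
    .map(([key, viewports]) => {
      const [selector, property] = JSON.parse(key) as [string, string];
      return { selector, property, viewportCount: viewports.size };
    })
    // Candidates flagged on the most viewports come first.
    .sort((a, b) => b.viewportCount - a.viewportCount);
}
```

Sorting by viewport count puts changes that break every breakpoint ahead of single-viewport outliers, which matches the worst-first spirit of the report.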
Workflow Commands
These commands manage state under the current project root: baselines/, snapshots/, output/, vrt-report.json, expectation.json, and spec.json.
Before running them, start the target app and point VRT_BASE_URL at it when needed.
The built-in capture workflow defaults to http://127.0.0.1:4174.
vrt workflow verify itself only compares the PNG and .a11y.json artifacts already present under baselines/ and snapshots/; it does not launch Playwright.
vrt workflow init
vrt workflow capture
vrt workflow verify
vrt workflow approve
vrt workflow report
vrt workflow graph
vrt workflow affected
vrt workflow introspect
vrt workflow spec-verify
vrt workflow expect
If vrt.config.json defines routes, the built-in capture spec uses those routes instead of the repo-local defaults.
The PR workflow also runs a deterministic snapshot false-positive check against fixtures/css-challenge using .github/vrt-snapshot-ci.config.json.
It creates baselines once, re-runs the same URLs, and summarizes test-results/snapshots/ci/snapshot-report.json with vrt snapshot report.
For migration workflows, vrt migration subagent packages the highest-impact diff per variant into a prompt for an external fixer, then compares before/after migration-report.json files to measure resolved/improved success rates.
Blind migration scenarios are declared in fixtures/migration/blind-scenarios.json, including the existing reset-css blind target and a scaffolded shadcn-to-luna/after-blind.html target for reproducible E3 runs. vrt migration blind supports list, show, prepare, solo, and evaluate so the blind run can emit a fresh compare report, generate a fixer packet, run a deterministic reference-CSS repair, and check the diff < 1% within 3 rounds contract without hand-assembling paths.
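One way to read the "diff < 1% within 3 rounds" contract is sketched below. The function name and the interpretation (any of the first three rounds ending below 1% passes) are assumptions; vrt migration blind evaluate may apply a stricter or different rule.

```typescript
// Hypothetical check: passes if any of the first `maxRounds` rounds ends below `maxDiff`.
// diffRatiosByRound[i] is the diff ratio measured after round i + 1.
function meetsBlindContract(
  diffRatiosByRound: number[],
  maxDiff = 0.01,
  maxRounds = 3,
): boolean {
  return diffRatiosByRound.slice(0, maxRounds).some((ratio) => ratio < maxDiff);
}
```

With this reading, [0.2, 0.05, 0.004] satisfies the contract, while [0.2, 0.05, 0.02, 0.001] does not because the fix only lands in round 4.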
Workflow aliases are kept for ergonomics where they do not collide:
- vrt init, vrt capture, vrt verify, vrt approve
- vrt graph, vrt affected, vrt introspect, vrt spec-verify, vrt expect
vrt report remains the detection pattern report, so verification output lives under vrt workflow report.
Capture routes for external projects
vrt workflow init|capture runs e2e/vrt-capture.spec.ts, which now resolves
its route list from your project rather than hard-coding vrt's own pages.
Drop a vrt.config.json next to your app with a capture block:
{
"baseUrl": "http://localhost:3000",
"capture": {
"routes": [
{ "name": "home", "path": "/", "waitFor": "main" },
{ "name": "about", "path": "/about" },
"/contact"
]
}
}
Each route accepts name (defaults to a sanitized form of path), path, and
an optional waitFor CSS selector. Resolution order:
- VRT_CAPTURE_ROUTES env var (JSON-encoded array)
- --config <path> flag or VRT_CONFIG_PATH env var
- vrt.config.json auto-discovered in the working directory
- Built-in defaults (vrt's own UI — useful only when developing vrt itself)
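The route shape and resolution order described above can be sketched as follows. All names here (RouteInput, normalizeRoute, resolveRoutes, defaultName) are hypothetical, the sanitizing rule and the "root" fallback are assumptions, and reading the config file is left to the caller, so this is not the code in e2e/vrt-capture.spec.ts.

```typescript
interface CaptureRoute {
  name: string;
  path: string;
  waitFor?: string;
}
interface RouteObject {
  name?: string;
  path: string;
  waitFor?: string;
}
type RouteInput = string | RouteObject;

// Hypothetical sanitizer; the exact rule vrt uses for default names is not documented here.
function defaultName(path: string): string {
  const slug = path.replace(/[^a-zA-Z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return slug === "" ? "root" : slug;
}

// Accepts both the string form ("/contact") and the object form.
function normalizeRoute(input: RouteInput): CaptureRoute {
  const route: RouteObject = typeof input === "string" ? { path: input } : input;
  return {
    name: route.name ?? defaultName(route.path),
    path: route.path,
    ...(route.waitFor ? { waitFor: route.waitFor } : {}),
  };
}

// Order: VRT_CAPTURE_ROUTES env var, then routes parsed from the config file
// (found via --config, VRT_CONFIG_PATH, or auto-discovery), then built-in defaults.
function resolveRoutes(
  env: Record<string, string | undefined>,
  configRoutes: RouteInput[] | undefined,
  builtinDefaults: RouteInput[],
): CaptureRoute[] {
  const fromEnv = env.VRT_CAPTURE_ROUTES;
  const inputs: RouteInput[] = fromEnv
    ? (JSON.parse(fromEnv) as RouteInput[])
    : configRoutes ?? builtinDefaults;
  return inputs.map(normalizeRoute);
}
```

Normalizing every input to the same CaptureRoute shape means the capture loop only has to handle one form, regardless of which source supplied the routes.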
# External project usage
vrt workflow init --config ./vrt.config.json --base-url http://localhost:5173
vrt workflow capture --config ./vrt.config.json
vrt workflow verify
API Commands
vrt api serve [--port 3456] # Start HTTP API server
vrt api status [--url http://localhost:3456]
Compatibility aliases:
- vrt serve -> vrt api serve
- vrt status -> vrt api status
HTTP API
Start the server:
vrt api serve --port 3456
The shared Hono app also exposes a Cloudflare Workers entry point at worker/index.ts.
Available endpoints:
- GET /api/openapi.json — OpenAPI 3.1 spec for the current HTTP surface
- GET /api/status — server version, backends, and capabilities
- POST /api/compare — compare baseline/current HTML or URLs across viewports
- POST /api/compare-renderers — compare Chromium vs Crater rendering
- POST /api/reason — VLM/LLM reasoning pipeline for diff analysis and fixes
- POST /api/smoke-test — random or reasoning-guided a11y smoke test
When running on Workers, /api/status also reports detected R2 / KV / D1 storage bindings.
TypeScript client:
import { VrtClient } from "@mizchi/vrt/client";
const client = new VrtClient("http://localhost:3456");
const status = await client.status();
const result = await client.compareHtml(
"<main><button>Before</button></main>",
"<main><button class='primary'>After</button></main>",
);
Install: pnpm add @mizchi/vrt
compareUrls(...) is intended for public HTTP(S) targets. The API server rejects localhost and private-network URLs.
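Because the server rejects localhost and private-network targets, a caller may want to check a URL before sending it. The guard below is a hypothetical client-side sketch: isLikelyPublicUrl is not part of @mizchi/vrt, it does not resolve DNS, and the server's actual rejection rules are not documented here.

```typescript
// Hypothetical pre-flight check; not part of the vrt client.
function isLikelyPublicUrl(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost") || host === "[::1]") return false;

  const octets = host.split(".").map(Number);
  const isIpv4 =
    octets.length === 4 && octets.every((n) => Number.isInteger(n) && n >= 0 && n <= 255);
  // Assumption: other hostnames are treated as public because no DNS lookup is done.
  if (!isIpv4) return true;

  const [a, b] = octets;
  const isPrivateOrLocal =
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254);
  return !isPrivateOrLocal;
}
```

A check like this only saves a round trip; the server remains the authority on which targets it accepts.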
Architecture
HTML (file or URL)
│
├── Pixel diff (pixelmatch v7 → heatmap → diff ratio)
├── Computed style diff (getComputedStyle → property-level changes)
├── A11y tree diff (accessibility snapshot → structural changes)
└── Paint tree diff (Crater BiDi → layout tree comparison)
│
▼
Detection & Classification
│
▼
AI Fix Pipeline (optional)
Stage 1: VLM (cheap) → structured CHANGE report
Stage 2: LLM (accurate) → CSS fix suggestions
│
▼
Dry-run verification → rollback if worse
Environment Variables
| Variable | Purpose | Default |
|----------|---------|---------|
| VRT_LLM_PROVIDER | LLM provider | gemini |
| VRT_LLM_MODEL | LLM model | provider default |
| VRT_VLM_MODEL | VLM model (OpenRouter) | qwen/qwen3-vl-8b-instruct |
| OPENROUTER_API_KEY | OpenRouter API key | — |
| GEMINI_API_KEY | Google AI API key | — |
| ANTHROPIC_API_KEY | Anthropic API key | — |
Project Structure
src/
vrt.ts # Unified public CLI entry point
vrt-command-router.ts # Root command routing + usage text
vrt-cli.ts # Stateful workflow CLI
vrt-client.ts # TypeScript client for the HTTP API
snapshot.ts # URL snapshot + baseline diff
migration-compare.ts # HTML/URL comparison across viewports
css-challenge-bench.ts # CSS deletion/recovery benchmark
fix-loop.ts # AI-powered CSS fix loop
vrt-reasoning-pipeline.ts # 2-stage VLM + LLM pipeline
heatmap.ts # Pixel diff + heatmap generation
mask.ts # Selector-based visibility masking
vlm-client.ts # OpenRouter / Gemini VLM client
llm-client.ts # Multi-provider LLM client
crater-client.ts # Crater BiDi WebSocket client
api-server.ts # Hono API server
fixtures/
css-challenge/ # 9 HTML fixtures for CSS bench
migration/ # Migration comparison fixtures
docs/
knowledge.md # Accumulated experimental findings
reports/ # Dated experiment reports
Agent Skills (APM)
vrt ships five coding-agent skills under .claude/skills/. They wrap
the most common workflows as standalone, agent-readable playbooks.
Other repos can install them via APM:
# Install a single skill into the current repo's .claude/skills/
apm install mizchi/vrt/.claude/skills/vrt-visual-diff
# Install all five
apm install mizchi/vrt/.claude/skills/vrt-visual-diff \
mizchi/vrt/.claude/skills/vrt-migration-eval \
mizchi/vrt/.claude/skills/vrt-css-fix-loop \
mizchi/vrt/.claude/skills/vrt-markup-synth \
mizchi/vrt/.claude/skills/vrt-regression-watch
| Skill | Entry workflow | Use when |
|---|---|---|
| vrt-visual-diff | vrt diff html → vrt diff agent | One-shot "did this CSS edit visibly change something?" |
| vrt-migration-eval | vrt migration compare\|blind\|subagent | Framework / CSS-lib / build-system swap audit |
| vrt-css-fix-loop | fix-loop.ts (VLM-driven) | Closed-loop CSS auto-repair benchmark |
| vrt-markup-synth | vrt build\|scan\|check\|stress * | Screenshot → HTML/CSS, token / theme / i18n audits |
| vrt-regression-watch | vrt diff agent --previous --fail-on-regression | Per-PR or scheduled regression gate |
Each skill assumes the vrt CLI is on $PATH (installed from the published
Node package or built from source) and Node 24+. Skills that use a VLM
(fix-loop, markup-synth, migration subagent) additionally need
one of OPENROUTER_API_KEY / GEMINI_API_KEY / ANTHROPIC_API_KEY,
depending on the model selected via VRT_VLM_MODEL.
License
MIT
