@rolepod/uiproof

v0.10.0

Published

13 days ago

Multi-platform UI/mobile automation for AI agents — MCP server + shipped skills.

0High
0Medium
0Low

nuttaruj

mcp model-context-protocol playwright ui-automation claude-code rolepod

rolepod-uiproof

rolepod-uiproof gives Claude Code, Cursor, Codex CLI, Gemini CLI, and Antigravity (CLI + IDE) a real browser/mobile driver — so the AI can actually click through your UI, audit accessibility, measure Core Web Vitals, check console errors, inspect network requests, diff screenshots, audit on-page SEO, and scaffold e2e tests instead of guessing.

One MCP server, one tool surface, eight skills you invoke from chat. Web is production-ready via Playwright; iOS and Android use Appium (same client as alumnium — needs a local Appium daemon + simulator/emulator, or a real device). No internal LLM — your Lead agent drives every action.

v0.8 sharpens visual_diff for animated UIs — settle (default on) scrolls + freezes the page so scroll-reveal / lazy content is captured instead of baselined blank; selector scopes a diff to one element; dimension mismatches degrade gracefully; plus a new extract_computed_style tool that reads an element's computed CSS for pixel-faithful redesign. 30 tools total (22 atomic + 8 composite). v0.7 added the measurement surface — Core Web Vitals, page-weight budgets, and on-page SEO. New in v0.7: /measure-cwv (LCP/INP/CLS via PerformanceObserver), /audit-page-budget (HAR-classified byte budget with third-party tagging), /audit-seo (DOM-based on-page SEO: title, meta, h1, lang, viewport, canonical, robots, OG/Twitter Cards, JSON-LD, hreflang, favicon). v0.5 had completed the UI verification surface (interaction + capture).

What it helps with

Verify a UI change in seconds. /verify-ui opens a real browser, runs your steps, checks your assertions, saves a screenshot + replay bundle (optionally HAR + video + trace + console logs).
Gate merges on "no regressions during this flow". /check-errors runs a flow with strict no_console_errors + no_failed_requests assertions baked in. PR-gate or post-merge smoke check.
Catch a11y regressions before merge. /audit-a11y runs axe-core against WCAG-A / AA / AAA and returns issues grouped by severity, with WCAG references and fix links.
Lock down the visual contract. /visual-diff captures a screenshot and compares against a named baseline under ./.rolepod-uiproof/baselines/. First call seeds; subsequent calls diff. By default it settles the page first (scrolls to trigger reveal/lazy content + freezes animations) so reveal-heavy pages are captured fully-rendered; pass selector to diff a single element.
Turn an interactive verify run into a real test file. /scaffold-e2e transcribes a replay bundle into Playwright Test, Vitest+Playwright, or pytest+selenium — with first-class codegen for every step + expect kind.
Reproduce + minimize a bug deterministically. /verify-ui with mode: "reproduce" runs ddmin step-elimination to find the shortest still-reproducing sequence.

The eight skills

| Skill | Wraps | What it does | |---|---|---| | /verify-ui | verify_ui_flow | Drive a session through steps, evaluate assertions (incl. console errors / failed requests / specific request made / response status), save evidence (screenshot / console / HAR / video / trace / a11y_tree) + replay bundle. mode: assert or reproduce with optional ddmin minimization. | | /check-errors | verify_ui_flow | Thin wrapper with strict no_console_errors + no_failed_requests baked in. Use as PR-gate or post-merge smoke. | | /audit-a11y | audit_a11y | axe-core audit at WCAG-A / AA / AAA. scope: "page" or scope: { ref }. Markdown or JSON report. | | /visual-diff | visual_diff | Pixel diff against a named baseline. Auto-seeds on first call. Configurable threshold + pixelmatch sensitivity. settle (default on) scrolls + freezes the page so scroll-reveal/lazy content is captured; selector scopes the diff to one element; size mismatches degrade gracefully (overlap diff + deltas, not a hard error). | | /scaffold-e2e | scaffold_e2e | Generate a runnable test file from a scenario + optional replay bundle. Three target frameworks. v0.5 codegen handles every step + expect kind. | | /measure-cwv | measure_cwv | Measure Core Web Vitals (LCP / INP / CLS) on a live page via PerformanceObserver. Verdict per web.dev good / needs-improvement / poor bands. Chromium-only. | | /audit-page-budget | audit_page_budget | HAR-classified byte budget per asset category (js/css/image/font) with third-party tagging. Compares against declared budget, returns graduated pass/warn/fail. | | /audit-seo | audit_seo | On-page SEO check via DOM inspection: title, meta description, h1, lang, viewport, canonical, robots, OG + Twitter Cards, JSON-LD validity, hreflang, favicon. |

Every skill is single-backend (D-024) — it calls the rolepod-uiproof server and only the rolepod-uiproof server. If the server is unavailable, the skill fails with a clear diagnostic. Multi-backend routing belongs in the parent rolepod plugin's phase skills, not here.

Standalone vs Combined

rolepod-uiproof works either as a standalone browser MCP for any project, or combined with the rolepod parent plugin (v2.7+) where it becomes the Verify phase provider for UI artifacts.

Standalone (default): use the 8 skills directly as atomic browser tools. Evidence saved under ./.rolepod-uiproof/artifacts/<run>/ with a manifest.json per Extension Protocol v1.

Combined with rolepod parent: when the parent's SessionStart hook drops the marker file <git-root>/.rolepod/parent-active (single line of content = the protocol version, e.g. v1), uiproof writes evidence to <git-root>/.rolepod/evidence/<ts>-rolepod-uiproof-<skill>/ instead, where parent's check-work skill auto-aggregates manifests into the verify report. The marker is read fresh per ArtifactStore construction; no env-var, no daemon. To force combined mode without a parent session: mkdir -p .rolepod && echo v1 > .rolepod/parent-active. No skill changes — same 30 tools, same 8 skills, smarter routing.

| Install | Unlocks | |---|---| | uiproof alone | Browser test, a11y audit, visual diff, e2e scaffold, error gate | | uiproof + rolepod parent | + verify-phase aggregation, evidence handoff to check-work |

The manifest.json is written in BOTH modes, so installing the parent later still lets historic artifacts get picked up. Baselines for /visual-diff always live in ./.rolepod-uiproof/baselines/ regardless of mode — they are user-curated configuration, not per-run evidence. The evidence/artifacts root self-ignores in git: on its first write uiproof drops a .gitignore (*) there, so a git add -A in your repo never sweeps transient screenshots + manifests into a commit. Baselines are left out of that ignore — commit them if you want a shared golden set.

Install

Pick your CLI. All install paths share the same MCP server (@rolepod/uiproof on npm) and the same skill set.

Claude Code (recommended)

# Install
claude plugin marketplace add nuttaruj/rolepod-uiproof
claude plugin install rolepod-uiproof@rolepod-uiproof

# Update
claude plugin marketplace update rolepod-uiproof
claude plugin install rolepod-uiproof@rolepod-uiproof

# Uninstall
claude plugin uninstall rolepod-uiproof@rolepod-uiproof
claude plugin marketplace remove rolepod-uiproof

The plugin auto-registers all eight skills (/verify-ui, /check-errors, /audit-a11y, /visual-diff, /scaffold-e2e, /measure-cwv, /audit-page-budget, /audit-seo) AND spawns the MCP server (npx -y @rolepod/uiproof) on session start.

Cursor IDE

Cursor's plugin marketplace is enterprise-only (Free / Pro plans cannot install marketplace plugins). For everyone else, drop the workspace MCP config:

# Per project — copy from this repo, or run:
mkdir -p .cursor
curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o .cursor/mcp.json

# Or global (across every project)
mkdir -p ~/.cursor
curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o ~/.cursor/mcp.json

Then fully restart Cursor — MCP servers load only at startup. Verify under Settings → MCP.

Skills are not auto-registered under Cursor (no unified plugin format for skills + MCP in one). The MCP tools are still available; invoke them by name in chat (Use verify_ui_flow to …).

Teams / Enterprise: add https://github.com/nuttaruj/rolepod-uiproof as a team marketplace under Settings → Plugins for one-click install with skills auto-registered.

Codex CLI

# Install
codex plugin marketplace add nuttaruj/rolepod-uiproof
codex plugin add rolepod-uiproof@rolepod-uiproof

# Update
codex plugin marketplace upgrade rolepod-uiproof
codex plugin add rolepod-uiproof@rolepod-uiproof

Codex reads the plugin from .agents/plugins/marketplace.json + .codex-plugin/plugin.json in this repo. Skills install to ~/.codex/skills/ (Codex's plugin loader handles registration).

Gemini CLI

Install directly from the GitHub repo:

# Install
gemini extensions install https://github.com/nuttaruj/rolepod-uiproof

# Update
gemini extensions update rolepod-uiproof

# Disable / re-enable
gemini extensions disable rolepod-uiproof
gemini extensions enable rolepod-uiproof

# Uninstall
gemini extensions uninstall rolepod-uiproof

Gemini CLI clones the repo into ~/.gemini/extensions/rolepod-uiproof/, reads gemini-extension.json at the root, spawns the MCP server (npx -y @rolepod/uiproof), and auto-discovers all 8 skills from skills/<name>/SKILL.md. After install, restart the CLI session — Gemini loads extensions on startup, and gemini extensions install is not supported in interactive mode.

Verify with /extensions list inside the CLI.

Antigravity (CLI + IDE)

Antigravity reads from ~/.gemini/ but at different sub-paths than Gemini CLI — MCP config and skills must be wired manually.

Step 1 — Skills:

# Copy uiproof skills into Antigravity's shared skills dir
mkdir -p ~/.gemini/skills
git clone --depth 1 https://github.com/nuttaruj/rolepod-uiproof /tmp/rolepod-uiproof
cp -r /tmp/rolepod-uiproof/skills/* ~/.gemini/skills/
rm -rf /tmp/rolepod-uiproof

If you already installed via Gemini CLI (gemini extensions install), symlink instead:

ln -s ~/.gemini/extensions/rolepod-uiproof/skills/measure-cwv ~/.gemini/skills/measure-cwv
# repeat for each of the 8 skills, or:
for d in ~/.gemini/extensions/rolepod-uiproof/skills/*/; do
  ln -s "$d" ~/.gemini/skills/$(basename "$d")
done

Step 2 — MCP server:

Open Antigravity Settings → Customizations → Open MCP Config (or edit ~/.gemini/config/mcp_config.json directly). Add the ui entry to the mcpServers map:

{
  "mcpServers": {
    "ui": {
      "command": "npx",
      "args": ["-y", "@rolepod/uiproof"]
    }
  }
}

Restart Antigravity. Verify the MCP server is connected via Settings → Customizations → MCP Servers panel.

Notes:

Antigravity's mcp_config.json is shared across all Agy tools (CLI + IDE) — one config, both surfaces.
Skills are auto-discovered from ~/.gemini/skills/ — no manifest needed.
The 30 MCP tools surface in chat the same way as in Claude Code / Cursor / Codex.

Direct npm (any MCP-aware tool)

Use this when your tool reads a standard mcpServers config (most non-CLI MCP clients):

{
  "mcpServers": {
    "ui": {
      "command": "npx",
      "args": ["-y", "@rolepod/uiproof"]
    }
  }
}

30 MCP tools (21 browser_* atomics + extract_computed_style + 8 composites including verify_ui_flow, audit_a11y, visual_diff, scaffold_e2e, extract_ui_state, measure_cwv, audit_page_budget, audit_seo) will appear in your client. Skills are not surfaced via this path — call the tools by name.

Quick start

After install, in your Claude Code / Cursor / Codex session:

/verify-ui https://example.com
  steps: []
  expect: text_visible "Example Domain", text_visible "Learn more"

Returns a run_id, passed: true, and a path under ./.rolepod-uiproof/artifacts/verify_<run_id>/:

.rolepod-uiproof/artifacts/verify_20260524T101512_a1b2c3d4/
├── final.png            screenshot at end of run
└── replay.json          replay bundle — re-runnable via `npx rolepod-uiproof replay …`

Convert that to a Playwright Test file:

/scaffold-e2e from .rolepod-uiproof/artifacts/verify_…/replay.json using playwright-test

Verify your setup

npx rolepod-uiproof doctor

✓ Node ≥20                       24.14.0
✓ Playwright Chromium installed  ~/Library/Caches/ms-playwright
✓ webdriverio (mobile client, v0.3)
• Appium server (roadmap v0.3)   Not reachable at http://127.0.0.1:4723/status
✓ Xcode (iOS, roadmap v0.3)      /Applications/Xcode.app
• Android SDK (roadmap v0.3)     Set ANDROID_HOME — needed only for Android
• SeleniumEngine (roadmap v0.4)  Not implemented — deferred to v0.4
✓ Artifact root writable

✓ = ready · • = optional / deferred · ✗ = blocker.

What's inside

30 MCP tools — 22 atomic browser/mobile primitives (browser_open, _close, _snapshot, _click, _type, _key, _scroll, _wait_for, _screenshot, _navigate, plus v0.5 additions _hover, _drag, _fill_form, _upload_file, _handle_dialog, _console, _network, _set_env, _evaluate, _pages, _switch_page, and v0.8 _extract_computed_style) + 8 composites (verify_ui_flow, audit_a11y, visual_diff, scaffold_e2e, extract_ui_state, and v0.7: measure_cwv, audit_page_budget, audit_seo). All prefixed * to namespace away from other MCP servers.
2 engines behind one interface — PlaywrightEngine for web (Chromium / Firefox / WebKit), AppiumEngine for iOS XCUITest + Android UIAutomator2. The Lead sees one unified A11yNode shape regardless of platform.
Stable refs with explicit invalidation (D-010) — every state-changing call invalidates prior refs; the engine returns a structured stale_ref error if you try to reuse one. No silent locator drift.
Replay bundles — every /verify-ui run writes a JSON replay you can re-run later with npx rolepod-uiproof replay <bundle.json>, agent-free.
No internal LLM (D-004) — your Lead agent makes every decision. We don't double-bill you for inference.

Use with parent rolepod

If you also use rolepod (the markdown plugin), its check-work, debug-issue, and review-code skills auto-route to /verify-ui, /audit-a11y, and /visual-diff when the rolepod-uiproof server is present. Nothing breaks if it isn't — parent falls back to Playwright MCP / Chrome DevTools MCP / manual verification.

The two are independent: install rolepod-uiproof standalone and get a complete experience via slash commands, or install both together and let parent's phase router pick the right backend automatically.

Docs

docs/sessions.md — session lifecycle, stale-ref semantics, multi-session
docs/artifacts.md — .rolepod-uiproof/ layout, run_id convention, replay bundle format
docs/recipes/ — verify-a-checkout-flow, audit-a11y-during-review, visual-baseline-workflow
CHANGELOG.md — release history with per-version "Not yet verified" notes mapped to milestones
CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md

MIT licensed — see LICENSE and THIRD_PARTY.md. Mobile AT normalizers are alumnium-inspired (UPSTREAM_TRACKING.md). Feedback + runtime reports for Cursor / Codex / Gemini install paths especially welcome via issues.