crabby-mcp
v0.1.0
Automated UI usability lab — audit, walk, review, and fix your running app, every finding evidence-fenced.
Crabby
An automated UI usability lab. Audit, walk, and fix your running app — in plain English.
Crabby looks at your running web or desktop app the way a careful reviewer would and tells you what's confusing, hard to use, or inaccessible — every issue tied to a named usability or accessibility principle and backed by real evidence (a screenshot, a captured interaction), never a guess. Where it can, it proposes a fix and verifies the fix actually worked. Not a vibe check.
You drive it in plain English through an MCP-aware assistant (Claude Code, Cursor, …), or from the command line.
New here? Start with QUICKSTART.md — zero to a usability report in two commands.
See it in 30 seconds
npm i -g crabby-mcp
crabby try # scans a bundled demo page and opens a report — no app, account, or internet needed
crabby try --public # run live against a real public site instead
If a report opens in your browser, Crabby is working. That's the whole test.
What you can ask it to do
Wire Crabby into your assistant (see Install), then ask in your own words. Crabby ships four journey skills and picks between them automatically:
- review: the one-shot path. "find problems with my app at http://localhost:3000 and fix the ones I approve." Connects, finds issues, proposes fixes, applies only what you accept, and re-checks.
- audit: "audit http://localhost:3000 for usability and accessibility problems." Reviews a single screen.
- walk: "walk through the signup flow and tell me where a user would get stuck." Follows a whole task, step by step.
- fix: "fix the issues you found and check they're actually resolved." Applies fixes and verifies them.
You stay in plain English the whole time; the assistant drives the tools.
Install
Crabby ships as an MCP server (plus a CLI). Requires Node 20+. Two one-command paths:
# npm — for any MCP-aware assistant, or the CLI
claude mcp add -s user crabby -- npx -y crabby-mcp
# Claude Code plugin marketplace
/plugin marketplace add 0xHayd3n/crabby
# …then install the "crabby" plugin
Works with Claude Code, Cursor, Continue, and any MCP-aware client via the crabby-mcp command. The review / audit / walk / fix skills then appear automatically.
Command line
Prefer the terminal? Get a readable report on your own app with no API key — Crabby uses a local model:
# 1) Open your running app with a debug port turned on (pick any number, e.g. 9222):
chrome --remote-debugging-port=9222 http://localhost:3000
# Desktop (Electron) app instead? electron . --remote-debugging-port=9222
# 2) Point Crabby at it:
crabby judge --port 9222 # readable report; first run downloads a ~4 GB local model, once, then works offline
crabby run --port 9222 emits JSON instead and uses no AI at all. Full flag set, caveats, and troubleshooting: QUICKSTART.md.
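If Crabby cannot connect, it can help to confirm that the app is really listening on the debug port before step 2. The sketch below is not part of Crabby: `devtoolsVersionUrl` and `checkDebugPort` are hypothetical helper names. It relies only on the standard `/json/version` HTTP endpoint that Chromium-based browsers expose when started with `--remote-debugging-port`, and it assumes the port is bound on 127.0.0.1.

```typescript
// Hypothetical helpers (not Crabby APIs): probe a Chromium remote-debugging port.

// Minimal shape of a fetch-like function, so the check can be exercised with a stub.
type JsonFetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface PortCheck {
  reachable: boolean;
  browser?: string; // value of the "Browser" field, when the endpoint reports one
}

// Build the URL of the DevTools version endpoint for a given port.
function devtoolsVersionUrl(port: number, host: string = "127.0.0.1"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return `http://${host}:${port}/json/version`;
}

// Resolve to { reachable: false } on any network or HTTP failure instead of throwing.
async function checkDebugPort(port: number, fetcher: JsonFetcher): Promise<PortCheck> {
  const url = devtoolsVersionUrl(port); // an invalid port still throws, by design
  try {
    const res = await fetcher(url);
    if (!res.ok) return { reachable: false };
    const body = await res.json();
    const result: PortCheck = { reachable: true };
    if (typeof body === "object" && body !== null && "Browser" in body) {
      const browser = (body as { Browser: unknown }).Browser;
      if (typeof browser === "string") result.browser = browser;
    }
    return result;
  } catch {
    return { reachable: false };
  }
}
```

On Node 20 the built-in `fetch` can be passed directly, for example `checkDebugPort(9222, fetch)`. A result with `reachable: false` usually means the app was started without the debug flag or on a different port.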
How it works
Three ideas make Crabby's findings trustworthy:
- The evidence fence. No judged finding is recorded unless it cites a real rubric principle and every piece of evidence resolves to something Crabby actually captured for that run. Enforced in code, not in a prompt — a model that invents a citation gets rejected, not believed. See the fence-as-moat whitepaper.
- A versioned rubric. Every finding is grounded in a curated usability vocabulary synthesized from Nielsen, Shneiderman, Gestalt, WCAG 2.2, and the interaction laws (Fitts, Hick, Tesler). "This is a problem" is a citation, not an opinion.
- A slim primitive bus. The assistant works through ten focused tools (connect, observe, scan, inspect, rubric, record, findings, fix, verify, overlay), and the journey skills compose them. Deterministic engines (axe-core + IBM Equal Access + Crabby's interaction-aware engine) do the mechanical detection; the model only adjudicates what the engines can't decide, and only through the fence.
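To make the evidence fence concrete, here is a minimal sketch of the kind of check it describes: a finding is accepted only if its cited principle exists in the rubric and every evidence reference resolves to something captured in the current run. This is an illustration under stated assumptions, not Crabby's actual implementation. The `Finding`, `RunContext`, and `checkFence` names, the field layout, and the example ids are all hypothetical.

```typescript
// Hypothetical shapes for illustration only; Crabby's real types may differ.
interface Finding {
  principleId: string;   // rubric principle the finding cites
  evidenceIds: string[]; // ids of screenshots or captured interactions it relies on
  summary: string;
}

interface RunContext {
  rubric: ReadonlySet<string>;   // principle ids defined by the versioned rubric
  captured: ReadonlySet<string>; // ids of artifacts actually captured in this run
}

type FenceResult = { accepted: true } | { accepted: false; reasons: string[] };

// Reject any finding whose citation or evidence cannot be resolved.
// The check is plain code, so a model cannot talk its way past it.
function checkFence(finding: Finding, run: RunContext): FenceResult {
  const reasons: string[] = [];
  if (!run.rubric.has(finding.principleId)) {
    reasons.push(`unknown principle: ${finding.principleId}`);
  }
  if (finding.evidenceIds.length === 0) {
    reasons.push("no evidence cited");
  }
  for (const id of finding.evidenceIds) {
    if (!run.captured.has(id)) reasons.push(`unresolved evidence: ${id}`);
  }
  return reasons.length === 0 ? { accepted: true } : { accepted: false, reasons };
}

// Example with made-up ids: the first finding passes, the second is rejected.
const demoRun: RunContext = {
  rubric: new Set(["nielsen/visibility-of-system-status"]),
  captured: new Set(["screenshot-001"]),
};

const grounded = checkFence(
  {
    principleId: "nielsen/visibility-of-system-status",
    evidenceIds: ["screenshot-001"],
    summary: "Save button gives no feedback after click",
  },
  demoRun,
);

const invented = checkFence(
  { principleId: "made-up-principle", evidenceIds: ["screenshot-999"], summary: "Looks off" },
  demoRun,
);
```

Here `grounded` is accepted, while `invented` is rejected with two reasons: one for the unknown principle and one for the evidence id that was never captured. Returning the reasons rather than a bare boolean lets the caller report why a model's output was discarded.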
Documentation
- Quickstart — plain-language guide, zero to a report
- Positioning — the strategic frame (being refreshed toward the usability-lab framing)
- Whitepaper: the fence-as-moat pattern — technical reference for adopting the evidence fence in your own agentic-eval tool
- Android — mobile-web (Chrome on Android) + the native Android deterministic engine
- Agentic prompts — portable, harness-agnostic judge + scout prompts
License
MIT — composes cleanly with vendored axe-core (MPL-2.0) and IBM Equal Access (EPL-2.0).
