crabby-mcp

v0.1.0

Published

8 days ago

Automated UI usability lab — audit, walk, review, and fix your running app, every finding evidence-fenced.

0xhayd3n

Crabby

An automated UI usability lab. Audit, walk, and fix your running app — in plain English.

Crabby looks at your running web or desktop app the way a careful reviewer would and tells you what's confusing, hard to use, or inaccessible — every issue tied to a named usability or accessibility principle and backed by real evidence (a screenshot, a captured interaction), never a guess. Where it can, it proposes a fix and verifies the fix actually worked. Not a vibe check.

You drive it in plain English through an MCP-aware assistant (Claude Code, Cursor, …), or from the command line.

New here? Start with QUICKSTART.md — zero to a usability report in two commands.

See it in 30 seconds

npm i -g crabby-mcp
crabby try            # scans a bundled demo page and opens a report — no app, account, or internet needed
crabby try --public   # run live against a real public site instead

If a report opens in your browser, Crabby is working. That's the whole test.

What you can ask it to do

Wire Crabby into your assistant (see Install), then ask in your own words. Crabby ships four journey skills and picks among them automatically:

review — the one-shot path: "find problems with my app at http://localhost:3000 and fix the ones I approve." Connects, finds issues, proposes fixes, applies only what you accept, and re-checks.
audit — "audit http://localhost:3000 for usability and accessibility problems." Reviews a single screen.
walk — "walk through the signup flow and tell me where a user would get stuck." Follows a whole task, step by step.
fix — "fix the issues you found and check they're actually resolved." Applies fixes and verifies them.

You stay in plain English the whole time; the assistant drives the tools.

Install

Crabby ships as an MCP server (plus a CLI). Requires Node 20+. Two one-command paths:

# npm (via npx), registered here with Claude Code; other MCP-aware clients launch the same server with: npx -y crabby-mcp
claude mcp add -s user crabby -- npx -y crabby-mcp

# Claude Code plugin marketplace
/plugin marketplace add 0xHayd3n/crabby
# …then install the "crabby" plugin

Works with Claude Code, Cursor, Continue, and any MCP-aware client via the crabby-mcp command. The review / audit / walk / fix skills then appear automatically.

Command line

Prefer the terminal? Get a readable report on your own app with no API key — Crabby uses a local model:

# 1) Open your running app with a debug port turned on (pick any number, e.g. 9222):
chrome --remote-debugging-port=9222 http://localhost:3000
#    Desktop (Electron) app instead?  electron . --remote-debugging-port=9222

# 2) Point Crabby at it:
crabby judge --port 9222   # readable report; first run downloads a ~4 GB local model, once, then works offline

To get JSON instead of a readable report, use crabby run --port 9222, which uses no AI at all. For the full flag set, caveats, and troubleshooting, see QUICKSTART.md.
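If step 2 cannot connect, it can help to confirm that the debug port from step 1 is actually listening. Chrome and Electron serve a small JSON document at /json/version on the remote-debugging port, so a quick probe is enough. The sketch below is only an illustration and not part of Crabby: the function names debugVersionUrl and isDebugPortOpen are hypothetical, and it assumes a Node version with a global fetch (Node 20+ qualifies).

```typescript
// Hypothetical helper, not part of Crabby.
// Builds the URL of the DevTools "version" endpoint for a local debug port.
function debugVersionUrl(port: number): string {
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new RangeError("port must be an integer between 1 and 65535");
  }
  // 127.0.0.1 rather than "localhost": Chrome binds the debug port to IPv4
  // loopback by default, and "localhost" may resolve to IPv6 first.
  return "http://127.0.0.1:" + port + "/json/version";
}

// Resolves to true when something answers on the debug port with a 2xx status,
// and to false when the request fails (for example, nothing is listening).
function isDebugPortOpen(port: number): Promise<boolean> {
  return fetch(debugVersionUrl(port))
    .then((res) => res.ok)
    .catch(() => false);
}
```

For example, `isDebugPortOpen(9222).then((open) => console.log(open))` should log true once the app from step 1 is running with its debug port enabled.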

How it works

Three ideas make Crabby's findings trustworthy:

The evidence fence. No judged finding is recorded unless it cites a real rubric principle and every piece of evidence resolves to something Crabby actually captured for that run. Enforced in code, not in a prompt — a model that invents a citation gets rejected, not believed. See the fence-as-moat whitepaper.
A versioned rubric. Every finding is grounded in a curated usability vocabulary synthesized from Nielsen, Shneiderman, Gestalt, WCAG 2.2, and the interaction laws (Fitts, Hick, Tesler). "This is a problem" is a citation, not an opinion.
A slim primitive bus. The assistant works through ten focused tools — connect, observe, scan, inspect, rubric, record, findings, fix, verify, overlay — and the journey skills compose them. Deterministic engines (axe-core + IBM Equal Access + Crabby's interaction-aware engine) do the mechanical detection; the model only adjudicates what the engines can't decide, and only through the fence.
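To make the first idea concrete, here is a minimal TypeScript sketch of the kind of check an evidence fence performs. It is not Crabby's code: the Finding shape, the checkFence function, and the principle and evidence IDs are all hypothetical. Rejecting a finding that cites no evidence at all is also an assumption, inferred from the requirement that every issue be backed by real evidence.

```typescript
// Hypothetical shapes for illustration; none of these names come from Crabby.
interface Finding {
  principleId: string;   // rubric principle the finding cites
  evidenceIds: string[]; // artifacts the finding claims as proof
  summary: string;
}

interface FenceResult {
  accepted: boolean;
  reasons: string[]; // why the finding was rejected; empty when accepted
}

// Accept a finding only if its principle exists in the rubric and every
// cited piece of evidence was actually captured during this run.
function checkFence(
  finding: Finding,
  rubricPrincipleIds: string[],
  capturedEvidenceIds: string[]
): FenceResult {
  const reasons: string[] = [];

  if (rubricPrincipleIds.indexOf(finding.principleId) === -1) {
    reasons.push(`unknown principle: ${finding.principleId}`);
  }
  if (finding.evidenceIds.length === 0) {
    reasons.push("no evidence cited"); // assumption: evidence is mandatory
  }
  for (const id of finding.evidenceIds) {
    if (capturedEvidenceIds.indexOf(id) === -1) {
      reasons.push(`evidence not captured in this run: ${id}`);
    }
  }
  return { accepted: reasons.length === 0, reasons };
}

// Usage with made-up IDs.
const rubric = ["visibility-of-system-status", "contrast-minimum"];
const captured = ["screenshot-001", "interaction-004"];

const grounded = checkFence(
  {
    principleId: "contrast-minimum",
    evidenceIds: ["screenshot-001"],
    summary: "Body text is hard to read against the background.",
  },
  rubric,
  captured
);

const invented = checkFence(
  {
    principleId: "made-up-law",
    evidenceIds: ["screenshot-999"],
    summary: "A finding a model might hallucinate.",
  },
  rubric,
  captured
);

console.log(grounded.accepted); // true
console.log(invented.accepted); // false
```

The point of the design is that the check is ordinary code over captured data. A model cannot argue its way past it, because a citation either resolves to something in the run or the finding is dropped.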

Documentation

Quickstart — plain-language guide, zero to a report
Positioning — the strategic frame (being refreshed toward the usability-lab framing)
Whitepaper: the fence-as-moat pattern — technical reference for adopting the evidence fence in your own agentic-eval tool
Android — mobile-web (Chrome on Android) + the native Android deterministic engine
Agentic prompts — portable, harness-agnostic judge + scout prompts

License

MIT — composes cleanly with vendored axe-core (MPL-2.0) and IBM Equal Access (EPL-2.0).
