@uditgoenka/better-testing

v0.3.25

AGPL-3.0 browser-testing MCP server with lifecycle-safe process management.

Better Testing

Turn Claude Code, OpenAI Codex, Cursor, Gemini CLI, OpenCode, Kiro, Autohand, and other MCP clients into reliable browser-testing workflows.

"Install → Register MCP → Run tests. Zero required dependencies."

Playwright-powered browser automation. 41 MCP tools. 7 agents supported. Visual regression. Self-healing selectors. Memory profiling. Network mocking. XSS scanning. Session persistence. Turbo mode. Cookie management. npm audit. Annotated screenshots. Dev server mode with HMR detection.

How It Works · Commands · Quick Start · MCP Clients · Troubleshooting

Why This Exists

Browser-testing agents often fail for boring reasons: stale helper processes, bad MCP registration, slow startup, terminal shutdown races, or cleanup that misses child processes.

Better Testing focuses on the operational layer that keeps those sessions stable:

Native stdio MCP server with zero production dependencies
Playwright-powered browser automation with visual overlay
Multi-browser support: Chromium, Firefox, WebKit
Provider setup for Claude Code, Codex, Cursor, Gemini CLI, OpenCode, Kiro, Autohand
Advanced performance: Core Web Vitals + INP + LoAF script attribution + resource breakdown
Compact/interactive snapshot modes (60-80% token reduction for AI agents)
Snapshot diffing with Myers algorithm (track page state changes)
Session recording with Playwright traces
Security audit (headers, mixed content, insecure forms, open redirects)
Sensitive data filtering in console logs (API keys, tokens, passwords auto-masked)
Stale process detection and cleanup
Machine-readable diagnostics for automation
Automatic browser cleanup on session disconnect
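
The console-log filtering in the list above can be pictured as a regex redaction pass over each message. This is a minimal sketch: the `redactLogLine` helper and its patterns are hypothetical and are not Better Testing's actual rules.

```javascript
// Hypothetical redaction patterns; the real filter's rules are not shown in this README.
const SENSITIVE_PATTERNS = [
  // key=value or key: value pairs whose key looks like a secret (optionally quoted value)
  /\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)("?)[^\s"&,]+\3/gi,
  // Bearer tokens; the empty third group keeps the replacement callback uniform
  /\b(Bearer)(\s+)()[A-Za-z0-9._~+\/-]+=*/g,
];

// Replace the secret part of every match, keeping the key and separator readable.
function redactLogLine(line) {
  return SENSITIVE_PATTERNS.reduce(
    (text, pattern) =>
      text.replace(pattern, (_match, key, sep, quote) => `${key}${sep}${quote}[REDACTED]${quote}`),
    line,
  );
}

console.log(redactLogLine("login password=hunter2 ok"));
```

Keeping the key name visible while hiding the value leaves logs useful for debugging without exposing the secret itself.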

Better Testing vs Playwright

Better Testing is built on top of Playwright — it adds an MCP server layer, AI-agent integrations, and 41 specialized tools that raw Playwright doesn't provide out of the box.

| Capability | Better Testing | Playwright (raw) |
| --- | --- | --- |
| MCP server (stdio JSON-RPC) | Built-in, zero-config | Not available |
| AI agent support | Claude Code, Codex, Cursor, Gemini CLI, OpenCode, Kiro, Autohand | Manual integration required |
| Tools | 41 MCP tools callable by AI agents | Script-only API |
| Setup | npm install -g → init --agent all → done | Write test scripts, configure runner |
| Visual overlay | Cursor glow, click animations, action labels via Shadow DOM | None |
| Annotated screenshots | Arrows, circles, rects, text callouts, numbered markers | page.screenshot() only |
| Snapshot modes | Full, compact (-60% tokens), interactive (-80% tokens) | Full ARIA tree only |
| Snapshot diffing | Built-in Myers algorithm diff between snapshots | Manual comparison |
| Self-healing selectors | Fallback chain with ref IDs from ARIA tree | Fixed selectors |
| Visual regression | Pixel-diff with pixelmatch, threshold control | Requires toHaveScreenshot() config |
| Dev server mode (HMR) | Auto-detects Vite, Next.js, Webpack, Turbopack HMR events | Not available |
| Watch mode | Re-run BT tools on HMR events, self-terminating | Requires external file watcher |
| Persistent sessions | persistent: true (no idle timeout, survives MCP reconnects) | Session per test file |
| Security audit | Headers, mixed content, insecure forms, open redirects, CSP analysis | Not included |
| XSS scanner | Non-destructive payload injection + detection | Not included |
| Form fuzzer | Boundary, XSS, SQLi, format fuzzing | Not included |
| npm audit | Built-in vulnerability scanning via MCP tool | Not included |
| Memory profiling | CDP HeapProfiler leak detection | Manual CDP scripting |
| Performance metrics | Core Web Vitals + INP + LoAF + Server Timing + resource breakdown | Manual scripting via page.evaluate() and the Performance API |
| Network mocking | MCP tool for route interception + response override | page.route() API |
| HAR export | Filtered HAR with sensitive data masking, 5K entry cap | routeFromHAR() only |
| WebSocket monitoring | CDP frame capture with pattern matching | Not built-in |
| Cookie management | Import/export with domain validation, masking, sameSite normalization | context.addCookies() only |
| Session recording | Playwright traces via MCP tools (start/stop/save) | context.tracing API |
| CDP connection | Auto-discover running Chrome, retry with backoff, safe disconnect | connectOverCDP() only |
| System Chrome | channel: "chrome" for Keychain, profiles, extensions | channel option available |
| Turbo mode | Performance flags + analytics/tracker blocking | Manual arg configuration |
| Device emulation | Full Playwright device registry via MCP param | devices descriptor |
| Parallel sessions | Multi-context with session IDs, close_all, state isolation | Multi-context via API |
| Test codegen | Generate test scripts from recorded interactions | codegen CLI tool |
| Lighthouse | Built-in CLI integration via MCP tool | Separate tool |
| Stale process cleanup | Auto-detect and clean orphaned browser workers | Manual process management |
| Plugin conflict detection | Auto-scans for conflicting MCP plugins at startup | Not applicable |
| Sensitive data filtering | Auto-masks API keys, tokens, passwords in console logs | Not included |
| Diagnostics | doctor tool: registration, Node version, Playwright status, conflicts | Not included |
| CI integration | add github-action generates workflow YAML | Manual workflow setup |
| Multi-browser | Chromium, Firefox, WebKit via single MCP param | Chromium, Firefox, WebKit via API |
| Dependencies | Zero production deps (Playwright is optional peer) | Playwright is the dependency |
| License | AGPL-3.0 | Apache-2.0 |

TL;DR: Playwright is the engine. Better Testing is the cockpit — it wraps Playwright with an MCP server, 41 AI-callable tools, automatic lifecycle management, security scanning, visual regression, dev server integration, and zero-config agent setup.

How It Works

INSTALL             REGISTER              RUN                 RECOVER
┌──────────┐        ┌──────────┐        ┌──────────┐        ┌──────────┐
│   npm    │───────▶│ Claude   │───────▶│  Native  │───────▶│  Doctor  │
│ package  │        │ Codex    │        │   MCP    │        │ Cleanup  │
│ 0 deps   │        │ Cursor   │        │  Server  │        │ Reports  │
│          │        │ Gemini   │        │ 41 tools │        │          │
│          │        │ OpenCode │        │          │        │          │
└──────────┘        └──────────┘        └──────────┘        └──────────┘

The MCP server exposes these tools:

Core Tools

| Tool | What it does |
| --- | --- |
| version | Returns the installed Better Testing version |
| doctor | Reports stale helpers and provider registration state |
| cleanup | Cleans stale helpers with Better Testing rules |

Browser Tools

These tools require Playwright, which is an optional peer dependency (npm install playwright && npx playwright install chromium).

| Tool | What it does |
| --- | --- |
| open | Launch a browser (Chromium/Firefox/WebKit), connect via CDP to running Chrome, or use system Chrome via channel. Supports cloud dashboards (AWS, Azure, GCP, Vercel, etc.) |
| playwright | Execute Playwright JS code against the current page |
| screenshot | Capture a PNG screenshot with optional rich annotations (arrows, circles, rects, text callouts, markers) |
| snapshot | Get ARIA accessibility tree with ref IDs. Modes: full, compact (-60% tokens), interactive (-80% tokens) |
| console_logs | Get collected browser console messages (sensitive data auto-filtered) |
| network_requests | Get collected network requests with status codes |
| performance | Collect Core Web Vitals (FCP, LCP, CLS, INP), LoAF with script attribution, Server Timing, resource breakdown |
| accessibility | Run WCAG accessibility audit using axe-core |
| start_recording | Start recording a Playwright trace (screenshots + snapshots) |
| save_recording | Stop recording and save the trace as a .zip file |
| security_audit | Run security checks: headers, mixed content, insecure forms, open redirects |
| import_cookies | Import cookies from JSON array or file path into the browser context |
| export_cookies | Export cookies from the browser context, with optional domain filter and masking |
| test_suggestions | Crawl the current page and return categorized interactive elements for test planning |
| npm_audit | Run npm audit on a project directory and return structured vulnerability results |
| diff_snapshot | Take a new snapshot and diff against the previous one (Myers algorithm) |
| close | Close the browser session and clean up resources |
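
To illustrate what a snapshot diff produces, here is a minimal line-diff sketch. It uses a longest-common-subsequence table, which is a simpler technique than the Myers algorithm that diff_snapshot is described as using, and the snapshot line format in the example is an assumption.

```javascript
// Line-based diff via a longest-common-subsequence (LCS) table.
// Note: this is O(n*m) LCS, swapped in for illustration; it is not the Myers algorithm.
function diffLines(before, after) {
  const a = before.split("\n");
  const b = after.split("\n");
  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the top-left corner to emit keep/remove/add operations.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ op: "keep", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ op: "remove", line: a[i++] });
    } else {
      ops.push({ op: "add", line: b[j++] });
    }
  }
  while (i < a.length) ops.push({ op: "remove", line: a[i++] });
  while (j < b.length) ops.push({ op: "add", line: b[j++] });
  return ops;
}

// Hypothetical snapshot text before and after a navigation link changed.
const snapshotDiff = diffLines(
  'button "Save" [ref=1]\nlink "Home" [ref=2]',
  'button "Save" [ref=1]\nlink "Docs" [ref=2]',
);
```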

Visual Overlay

When running in headed mode, Better Testing injects a visual overlay into the page:

Blue cursor with glow effect that moves smoothly to each target element
Click animation with scale-down press and expanding ripple effect
Pulsing highlights on targeted elements (bright-dull-bright breathing animation)
Action labels showing what the agent is doing (Clicking, Typing, Hovering, etc.)

The overlay is injected via Shadow DOM and does not interfere with the page being tested.

Multi-Browser Support

The open tool accepts a browser parameter to choose the engine:

{ "url": "https://example.com", "browser": "firefox" }

Supported engines: chromium (default), firefox, webkit.
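
For clients that talk to the server directly, the arguments above travel inside an MCP `tools/call` request. The following is a sketch of how such a message could be built; `buildOpenRequest` is a hypothetical helper, and only the `open` tool name and the `url`/`browser` arguments come from this README.

```javascript
// Engines listed in this README.
const SUPPORTED_BROWSERS = ["chromium", "firefox", "webkit"];

// Build a JSON-RPC 2.0 "tools/call" request, the MCP method used to invoke a tool.
function buildOpenRequest(id, url, browser = "chromium") {
  if (!SUPPORTED_BROWSERS.includes(browser)) {
    throw new Error(`Unsupported browser: ${browser}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "open", arguments: { url, browser } },
  };
}

// Stdio MCP messages are newline-delimited JSON.
const wireMessage = JSON.stringify(buildOpenRequest(2, "https://example.com", "firefox")) + "\n";
```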

Install additional browsers:

npx playwright install firefox
npx playwright install webkit

Custom Timeouts

The open tool accepts a timeout parameter (seconds) for Playwright code execution:

{ "url": "https://example.com", "timeout": 60 }

Default execution timeout: 30 seconds. Idle session timeout: 120 seconds (configurable).

Session Recording

Record Playwright traces for debugging and replay:

Open a browser session
Call start_recording to begin capturing screenshots and DOM snapshots
Interact with the page using playwright, snapshot, etc.
Call save_recording to stop and save the trace

Traces save to .better-testing/traces/trace-{timestamp}.zip. View with:

npx playwright show-trace .better-testing/traces/trace-*.zip

Security Audit

The security_audit tool checks the current page for:

Security headers: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security, Referrer-Policy, Permissions-Policy
Mixed content: HTTP resources loaded on HTTPS pages
Insecure forms: Forms posting to HTTP endpoints
Open redirects: Links with redirect parameters (?redirect=, ?url=, ?next=)

Returns a structured report with pass/fail/warning status per check.
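
As a rough picture of two of these checks, the sketch below tests for the listed headers and flags links that carry redirect parameters. The function names and the report shape are assumptions, not the tool's actual output format.

```javascript
// Header names and redirect parameters come from the list above.
const REQUIRED_HEADERS = [
  "content-security-policy",
  "x-frame-options",
  "x-content-type-options",
  "strict-transport-security",
  "referrer-policy",
  "permissions-policy",
];
const REDIRECT_PARAMS = ["redirect", "url", "next"];

// Report pass/fail per header; header names are case-insensitive, so normalize first.
function auditHeaders(headers) {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return REQUIRED_HEADERS.map((name) => ({
    check: name,
    status: present.has(name) ? "pass" : "fail",
  }));
}

// Return the links whose query string contains a redirect-style parameter.
function findRedirectLinks(hrefs, baseUrl) {
  return hrefs.filter((href) => {
    const params = new URL(href, baseUrl).searchParams;
    return REDIRECT_PARAMS.some((param) => params.has(param));
  });
}
```

A link flagged this way is only a candidate: whether it is exploitable depends on how the server validates the parameter.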

Turbo Mode

Launch a speed-optimized browser session by passing turbo: true to the open tool:

{ "url": "https://example.com", "turbo": true }

Turbo mode applies Chromium performance flags, blocks analytics/tracking scripts (Google Analytics, Mixpanel, Segment, HubSpot, Hotjar, Facebook Pixel, etc.), and skips overlay animations. Blocked requests appear in network_requests with status: "blocked".

Additional resource blocking with blockResources:

{ "url": "https://example.com", "turbo": true, "blockResources": ["image", "media", "font"] }
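
The blocking decision can be sketched as a single predicate over each request. The host list below is illustrative only (the list Better Testing ships is not shown in this README), and `shouldBlock` takes a plain `{ url, resourceType }` object rather than a real Playwright request.

```javascript
// Illustrative tracker hosts; an assumption, not the shipped blocklist.
const TRACKER_HOSTS = [
  "google-analytics.com",
  "googletagmanager.com",
  "mixpanel.com",
  "segment.com",
  "hotjar.com",
  "connect.facebook.net",
];

// Block a request if its host is a known tracker (or a subdomain of one),
// or if its resource type was listed in blockResources.
function shouldBlock(request, blockResources = []) {
  const host = new URL(request.url).hostname;
  const isTracker = TRACKER_HOSTS.some(
    (tracker) => host === tracker || host.endsWith(`.${tracker}`),
  );
  return isTracker || blockResources.includes(request.resourceType);
}
```

Matching on the hostname suffix with a leading dot avoids blocking unrelated domains that merely end with the same characters.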

CDP Connection (Cloud Dashboards)

Connect to an already-running Chrome instance via Chrome DevTools Protocol. Perfect for testing authenticated cloud dashboards (AWS, Azure, GCP, Vercel, Railway, Render, etc.):

{ "cdp": "http://127.0.0.1:9222", "url": "https://console.aws.amazon.com" }

Start Chrome with remote debugging first: chrome --remote-debugging-port=9222

Auto-discover a running Chrome instance:

{ "cdp": true }

Safe close: Better Testing (BT) does not kill the external Chrome when the session closes; it only disconnects. Your Chrome keeps running with all tabs intact.

Retry: CDP connections retry with exponential backoff (5 attempts) to handle Chrome startup races.
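
A retry loop of this kind might look like the sketch below. The 5 attempts come from the text above; the 200 ms base delay, the 5 second cap, and both function names are assumptions.

```javascript
// Delays between attempts: base * 2^n, capped at maxMs. Five attempts need four waits.
function backoffDelays(attempts = 5, baseMs = 200, maxMs = 5000) {
  return Array.from({ length: attempts - 1 }, (_unused, n) => Math.min(baseMs * 2 ** n, maxMs));
}

// Call connect() until it succeeds or the attempts run out, sleeping between failures.
async function connectWithRetry(
  connect,
  delays = backoffDelays(),
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  let lastError;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      if (attempt < delays.length) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.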

System Chrome (channel)

Launch system-installed Chrome or Edge instead of Playwright's bundled Chromium. Enables macOS Keychain access, user profiles, and extensions:

{ "url": "https://portal.azure.com", "channel": "chrome" }

Supported channels: chrome, chrome-canary, msedge, msedge-dev.

Cookie Tools

Import cookies from JSON or a file:

{ "cookies": [{ "name": "session", "value": "abc123", "domain": ".example.com", "path": "/" }] }

Export cookies with optional domain filter:

{ "domain": ".example.com", "raw": true }

By default, exported cookie values are masked using length-aware masking. Pass raw: true to see full values.
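
The exact masking scheme is not specified here, so the following is only one plausible reading of "length-aware": short values are hidden completely, longer ones keep two characters at each end. `maskCookieValue` and `exportCookies` are hypothetical names, and the exact-match domain filter is also an assumption.

```javascript
// Assumed scheme: fully mask values of 8 characters or fewer, otherwise keep 2 + 2 characters.
function maskCookieValue(value) {
  if (value.length <= 8) return "*".repeat(value.length);
  return `${value.slice(0, 2)}${"*".repeat(value.length - 4)}${value.slice(-2)}`;
}

// Filter by domain (exact match in this sketch) and mask values unless raw is set.
function exportCookies(cookies, { domain, raw = false } = {}) {
  return cookies
    .filter((cookie) => !domain || cookie.domain === domain)
    .map((cookie) => (raw ? cookie : { ...cookie, value: maskCookieValue(cookie.value) }));
}
```

The masked value keeps its original length, which makes it possible to spot truncated or empty cookies without revealing their contents.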

Annotated Screenshots

Take screenshots with rich visual annotations pointing out issues for developers. Supports arrows, circles, rectangles, text callouts, and numbered markers:

{
  "annotations": [
    { "type": "arrow", "target": { "selector": ".submit-btn" }, "label": "8px gap expected, 0px found" },
    { "type": "circle", "target": { "ref": 3 }, "label": "missing required attribute" },
    { "type": "rect", "target": { "selector": ".card" }, "label": "overflow hidden clips content" },
    { "type": "text", "target": { "x": 200, "y": 100 }, "label": "expected: blue, actual: gray" },
    { "type": "marker", "target": { "selector": "#logo" }, "label": "contrast ratio 2.1:1" }
  ]
}

Target elements by CSS selector, ref ID (from snapshot), or x/y coordinates. Annotations use a separate Shadow DOM host for CSS isolation. Invalid or hidden targets are skipped with detailed reasons (partial success). Screenshots are saved to .better-testing/annotations/.

Annotations are red by default (distinct from blue ref labels). Override per annotation with color. Max 50 annotations per screenshot. Use annotate: true for simple auto-labels, or annotations: [...] for rich visual callouts.
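
A client can check an annotations array against these rules before calling the tool. This sketch is hypothetical: the types, target forms, and 50-item limit come from the text above, while rejecting an oversized array with an error is an assumption about behavior.

```javascript
const ANNOTATION_TYPES = new Set(["arrow", "circle", "rect", "text", "marker"]);
const MAX_ANNOTATIONS = 50;

// Split annotations into accepted and skipped, recording a reason for each skip.
function validateAnnotations(annotations) {
  if (annotations.length > MAX_ANNOTATIONS) {
    throw new Error(`At most ${MAX_ANNOTATIONS} annotations per screenshot`);
  }
  const accepted = [];
  const skipped = [];
  annotations.forEach((annotation, index) => {
    const target = annotation.target ?? {};
    // A target is a CSS selector, a snapshot ref ID, or x/y coordinates.
    const hasTarget =
      typeof target.selector === "string" ||
      Number.isInteger(target.ref) ||
      (Number.isFinite(target.x) && Number.isFinite(target.y));
    if (!ANNOTATION_TYPES.has(annotation.type)) {
      skipped.push({ index, reason: `unknown type: ${annotation.type}` });
    } else if (!hasTarget) {
      skipped.push({ index, reason: "target needs selector, ref, or x/y" });
    } else {
      accepted.push(annotation);
    }
  });
  return { accepted, skipped };
}
```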

Test Suggestions

Crawl the current page and get categorized interactive elements for test planning:

{ "scope": "form", "categories": ["inputs", "buttons"] }

Returns a structured element inventory (forms, links, buttons, inputs, images, headings, landmarks, custom elements), capped at 200 elements.

npm Audit

Run vulnerability scanning on any project directory:

{ "directory": "/path/to/project", "minLevel": "high" }

Returns structured results with vulnerability counts by severity level.
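
The effect of minLevel can be sketched against the `metadata.vulnerabilities` object that `npm audit --json` reports. The `countsAtOrAbove` helper is hypothetical, and the tool's actual result shape may differ.

```javascript
// npm's severity levels, from least to most severe.
const SEVERITY_ORDER = ["info", "low", "moderate", "high", "critical"];

// Keep only the counts at or above minLevel.
function countsAtOrAbove(auditJson, minLevel) {
  const start = SEVERITY_ORDER.indexOf(minLevel);
  if (start === -1) throw new Error(`Unknown severity: ${minLevel}`);
  const counts = auditJson.metadata.vulnerabilities;
  const result = {};
  for (const level of SEVERITY_ORDER.slice(start)) {
    result[level] = counts[level] ?? 0;
  }
  return result;
}

// Sample data shaped like the metadata block of `npm audit --json`.
const sampleAudit = {
  metadata: { vulnerabilities: { info: 0, low: 2, moderate: 1, high: 3, critical: 0, total: 6 } },
};
const highAndUp = countsAtOrAbove(sampleAudit, "high");
```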

Browser Lifecycle

Browser auto-closes when the MCP session disconnects (stdin pipe closes)
Idle timeout: browser closes after 120 seconds of inactivity (configurable via open timeout)
Clean signal handling: SIGTERM/SIGINT properly terminate the browser process
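
The idle rule can be modeled as a small tracker that records the last activity time. This is a sketch under assumptions: `IdleTracker` is a hypothetical class, and the real server presumably uses timers rather than explicit timestamps.

```javascript
// Tracks inactivity against a timeout; timestamps are injectable so the logic is testable.
class IdleTracker {
  constructor(idleTimeoutSec = 120, now = Date.now()) {
    this.idleTimeoutMs = idleTimeoutSec * 1000;
    this.lastActivity = now;
  }

  // Call on every tool invocation to reset the idle window.
  touch(now = Date.now()) {
    this.lastActivity = now;
  }

  // True once the session has been inactive for the full timeout.
  isExpired(now = Date.now()) {
    return now - this.lastActivity >= this.idleTimeoutMs;
  }
}
```

A server built this way would call `touch()` from each tool handler and close the browser when a periodic check finds `isExpired()` is true.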

Snapshot + Ref Workflow

1. Call `snapshot` to get the ARIA tree with [ref=N] IDs
2. Use `ref(N)` in `playwright` tool code to interact with elements
3. Example: await ref(1).click()  // clicks element with ref=1
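
When post-processing snapshot output in a script, the ref IDs can be pulled out with a regular expression. The line shape used here (`- button "Save" [ref=1]`) is an assumption about the snapshot format, and `extractRefs` is a hypothetical helper.

```javascript
// Collect every [ref=N] marker together with the text that precedes it on the same line.
function extractRefs(snapshotText) {
  const refs = [];
  const pattern = /^[ \t]*-?[ \t]*(.*?)[ \t]*\[ref=(\d+)\]/gm;
  for (const match of snapshotText.matchAll(pattern)) {
    refs.push({ ref: Number(match[2]), label: match[1] });
  }
  return refs;
}

// Assumed snapshot excerpt; lines without a ref marker are ignored.
const refs = extractRefs('- button "Save" [ref=1]\n- link "Home" [ref=2]\n- heading "Settings"');
// A ref found this way would then be used in playwright tool code, e.g. await ref(2).click()
```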

Commands

| Command | What it does |
| --- | --- |
| better-testing init --agent all | Register the skill and MCP server for local agents |
| better-testing init | Interactive setup wizard (choose agents, browser mode) |
| better-testing mcp | Start the native Better Testing stdio MCP server |
| better-testing doctor --mcp | Check provider registration and stale helpers |
| better-testing cleanup --dry-run | Preview stale helper cleanup |
| better-testing cleanup | Clean stale helper processes |
| better-testing agents --detect | Detect supported local agents |
| better-testing add github-action | Generate a GitHub Actions workflow for CI browser testing |
| better-testing classify-diff | Classify browser-visible diffs from stdin |
| better-testing skill-copy <src> <dest> | Copy the Better Testing skill into an agent skill folder |
| better-testing pr --title <title> | Create a pull request with the GitHub CLI |

The add github-action command accepts these flags:

better-testing add github-action --force        # Overwrite existing workflow
better-testing add github-action --dry-run      # Preview without writing
better-testing add github-action --node-version 22  # Set Node.js version

Quick Start

1. Install

npm install -g @uditgoenka/better-testing

Install Playwright for browser tools (Chromium by default):

npm install playwright && npx playwright install chromium

For multi-browser testing, install additional engines:

npx playwright install firefox webkit

Verify the global binary:

better-testing --version

Output should be:

0.3.25

2. Register Local Agents

better-testing init --agent all

This registers:

Claude Code MCP
Codex MCP
Cursor MCP
Gemini CLI MCP
OpenCode MCP
Kiro MCP
Autohand MCP
/better-testing skill files where supported

3. Check Setup

better-testing doctor --mcp

For JSON automation:

better-testing doctor --mcp --json

4. Use It

In Claude Code:

/better-testing

In Codex:

$better-testing

In any MCP client, configure the server command:

better-testing mcp

MCP Clients

Better Testing supports any MCP client that implements the stdio transport.

Claude Code

claude mcp add --scope user better-testing -- better-testing mcp

Codex

Add this to ~/.codex/config.toml:

[mcp_servers.better-testing]
command = "better-testing"
args = ["mcp"]
startup_timeout_sec = 20

Cursor

Add this to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "better-testing": {
      "command": "better-testing",
      "args": ["mcp"]
    }
  }
}

Gemini CLI

Add this to ~/.gemini/settings.json:

{
  "mcpServers": {
    "better-testing": {
      "command": "better-testing",
      "args": ["mcp"]
    }
  }
}

OpenCode

Add this to your OpenCode MCP config:

{
  "mcpServers": {
    "better-testing": {
      "command": "better-testing",
      "args": ["mcp"]
    }
  }
}

Any stdio MCP Client

{
  "command": "better-testing",
  "args": ["mcp"]
}
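
When a client already has an `mcpServers` config, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, with `addBetterTestingServer` as a hypothetical helper:

```javascript
// Return a copy of the config with the better-testing server added, keeping existing entries.
function addBetterTestingServer(config = {}) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "better-testing": { command: "better-testing", args: ["mcp"] },
    },
  };
}

// Example: an existing config with another server and an unrelated setting.
const mergedConfig = addBetterTestingServer({
  theme: "dark",
  mcpServers: { other: { command: "other-server", args: [] } },
});
```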

Troubleshooting

MCP client times out

Run:

better-testing doctor --mcp --json

Then re-register:

better-testing init --agent all

Restart the client after registration.

MCP reconnect fails

Verify the server by sending an initialize request:

printf '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{}}}\n' | better-testing mcp

The response should include:

{
  "serverInfo": {
    "name": "better-testing"
  }
}
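
In a script, the same check can be automated by parsing the response line. In a JSON-RPC reply the `serverInfo` object sits under `result`; the helper names below are hypothetical, and the request mirrors the one shown above.

```javascript
// Build the initialize request used in the printf example.
function buildInitializeRequest(id = 1) {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: { protocolVersion: "2024-11-05", capabilities: {} },
  };
}

// Return true only if the line is valid JSON whose result names the better-testing server.
function isBetterTestingServer(responseLine) {
  let message;
  try {
    message = JSON.parse(responseLine);
  } catch {
    return false;
  }
  return message?.result?.serverInfo?.name === "better-testing";
}
```

An error reply or unparseable output returns false, which is a reasonable trigger for running doctor or re-registering the server.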

Stale helpers remain

Preview cleanup:

better-testing cleanup --dry-run

Run cleanup:

better-testing cleanup

Browser stays open after session

This should not happen with v0.2.0+. The browser auto-closes when the MCP pipe disconnects or after 120 seconds of inactivity. If it persists:

better-testing cleanup --includeBrowserWorkers

Development

npm test
npm run check
npm run pack:dry

Package rules:

No required dependencies (Playwright, pixelmatch, pngjs are optional peer dependencies)
Native MCP server by default
AGPL-3.0 license
Public package: @uditgoenka/better-testing

Roadmap

~~Add browser action tools inside the native MCP server~~ (shipped in v0.1.34)
~~Add visual overlay with cursor, click animations, and highlights~~ (shipped in v0.1.35)
~~Add multi-browser support: Chromium, Firefox, WebKit~~ (shipped in v0.1.39)
~~Add custom execution and idle timeouts~~ (shipped in v0.1.39)
~~Add session recording with Playwright traces~~ (shipped in v0.1.39)
~~Add security audit tool (headers, mixed content, forms, redirects)~~ (shipped in v0.1.39)
~~Add GitHub Actions workflow generator (bt add github-action)~~ (shipped in v0.1.39)
~~Add advanced performance metrics: INP, LoAF, Server Timing, resource breakdown~~ (shipped in v0.2.0)
~~Add snapshot diffing with Myers algorithm~~ (shipped in v0.2.0)
~~Add compact/interactive snapshot modes (60-80% token reduction)~~ (shipped in v0.2.0)
~~Add sensitive data filtering in console logs~~ (shipped in v0.2.0)
~~Add input validation (viewport, timeouts) and recording safety guards~~ (shipped in v0.2.0)
~~Add visual regression with pixel-diff (pixelmatch)~~ (shipped in v0.3.0)
~~Add network mocking and interception~~ (shipped in v0.3.0)
~~Add self-healing selectors with fallback chain~~ (shipped in v0.3.0)
~~Add auth session persistence (save/load cookies + storage)~~ (shipped in v0.3.0)
~~Add memory leak detection (CDP HeapProfiler, Chromium)~~ (shipped in v0.3.0)
~~Add test codegen from session interactions~~ (shipped in v0.3.0)
~~Add parallel browser sessions~~ (shipped in v0.3.0)
~~Add responsive breakpoint sweep~~ (shipped in v0.3.0)
~~Add XSS payload scanner (non-destructive)~~ (shipped in v0.3.0)
~~Add cloud browser connection (generic wsEndpoint)~~ (shipped in v0.3.0)
~~Add HAR export with sensitive data filtering~~ (shipped in v0.3.0)
~~Add device emulation (full Playwright registry)~~ (shipped in v0.3.0)
~~Add form fuzzer (boundary, XSS, SQLi, format)~~ (shipped in v0.3.0)
~~Add WebSocket monitoring (CDP frame capture)~~ (shipped in v0.3.0)
~~Add Lighthouse integration via CLI~~ (shipped in v0.3.0)
~~Add CDP connection (attach to running Chrome DevTools)~~ (shipped in v0.3.0)
~~Add annotated screenshots (numbered labels on interactive elements)~~ (shipped in v0.3.0)
~~Add browser border glow effect (pulsing blue while active)~~ (shipped in v0.3.0)
~~Change license to AGPL-3.0~~ (shipped in v0.3.0)
~~Add cookie import/export tools~~ (shipped in v0.3.12)
~~Add test suggestions tool (DOM element inventory)~~ (shipped in v0.3.12)
~~Add npm audit tool (vulnerability scanning)~~ (shipped in v0.3.12)
~~Add turbo mode (performance flags + analytics blocking)~~ (shipped in v0.3.12)
~~Add enhanced security audit (cookie attributes + CSP analysis)~~ (shipped in v0.3.12)
~~Add /bt skill shortcut~~ (shipped in v0.3.12)
~~Add rich annotated screenshots (arrows, circles, rects, text, markers)~~ (shipped in v0.3.13)
~~Add stale ref enriched error messages~~ (shipped in v0.3.13)
~~Fix CDP close killing external Chrome (graceful disconnect)~~ (shipped in v0.3.14)
~~Add CDP connection retry with exponential backoff~~ (shipped in v0.3.14)
~~Add system Chrome channel support (chrome, msedge, chrome-canary)~~ (shipped in v0.3.14)
~~Make url optional for CDP/WS connections~~ (shipped in v0.3.14)
~~Add auto-restart Chrome for CDP when running without --remote-debugging-port~~ (shipped in v0.3.15)
~~Fix cookie import compatibility (sameSite normalization, Chrome field stripping)~~ (shipped in v0.3.16)
~~Fix critical: session-store unsanitized cookies, lighthouse category injection~~ (shipped in v0.3.16)
~~Fix high: XSS detection, memory profiler, codegen escaping, viewport null crash, WS race, enum mismatch~~ (shipped in v0.3.16)
~~Fix MCP transport crash (process guards, async dispatch safety, idle timer race)~~ (shipped in v0.3.17)
~~Default to headless browser mode for MCP environments~~ (shipped in v0.3.18)
~~Add persistent browser sessions (no idle timeout)~~ (shipped in v0.3.23)
~~Add dev server mode with HMR event detection (Vite, Next.js, Webpack, Turbopack)~~ (shipped in v0.3.23)
~~Add dev_status tool (HMR event queue + long-poll)~~ (shipped in v0.3.23)
~~Add watch tool (run BT tools on HMR events, self-terminating)~~ (shipped in v0.3.23)
~~Add watch_cancel tool~~ (shipped in v0.3.23)
~~Add "Better Testing vs Playwright" comparison table to README~~ (shipped in v0.3.24)
Add dual-engine accessibility (IBM Equal Access + axe-core)
Add Windows process-tree cleanup parity
Add HTTP MCP server transport mode
Add TypeScript SDK for programmatic use

License

Better Testing is open source software released under the GNU Affero General Public License v3.0 (AGPL-3.0).