mcp-browser-agent
v0.2.1
MCP server giving AI agents structured perception of web pages — semantic UI tree, actions, and linting via Playwright + CDP
Installation
```bash
npm install mcp-browser-agent
```

Chromium is downloaded automatically by Playwright on first run. If you need to install it manually:
```bash
npx playwright install chromium
```

Setup
Claude Code
```bash
claude mcp add mcp-browser-agent -- npx mcp-browser-agent serve
```

Cursor
Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "mcp-browser-agent": {
      "command": "npx",
      "args": ["mcp-browser-agent", "serve"]
    }
  }
}
```

Windsurf
Add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "mcp-browser-agent": {
      "command": "npx",
      "args": ["mcp-browser-agent", "serve"]
    }
  }
}
```

VS Code (Copilot)
Add to `.vscode/mcp.json` in your project:

```json
{
  "servers": {
    "mcp-browser-agent": {
      "command": "npx",
      "args": ["mcp-browser-agent", "serve"]
    }
  }
}
```

Other MCP Clients
The server uses the stdio transport. Run `npx mcp-browser-agent serve` and connect to it over stdin/stdout.
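For clients without built-in MCP support, the stdio transport frames each JSON-RPC 2.0 message as one line of JSON terminated by a newline, and the client must complete the `initialize` handshake before calling tools. The sketch below only builds those frames. It assumes protocol version `2025-03-26` (use whatever version your client and the server negotiate), and the client name and version are placeholders.

```typescript
// Minimal sketch of MCP stdio framing: one JSON-RPC 2.0 message per line.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number; // omitted for notifications
  method: string;
  params?: Record<string, unknown>;
}

// JSON.stringify escapes newlines inside strings, so its output never
// contains a raw "\n"; appending one produces a valid frame.
function frame(message: JsonRpcMessage): string {
  return JSON.stringify(message) + "\n";
}

// Frames a client sends after spawning `npx mcp-browser-agent serve`.
// The protocol version and client identity are assumptions/placeholders.
function handshakeFrames(protocolVersion: string = "2025-03-26"): string[] {
  const messages: JsonRpcMessage[] = [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion,
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.0.1" },
      },
    },
    // Notification (no id): sent once the initialize response has arrived.
    { jsonrpc: "2.0", method: "notifications/initialized" },
    // Ask the server to list its tools (ui_connect, ui_observe, ...).
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
  return messages.map(frame);
}
```

Write the frames to the child process's stdin and read newline-delimited responses from its stdout. In a real client, wait for the response to `initialize` (id 1) before sending the remaining frames.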
CLI (standalone)
No MCP client needed — use directly from the terminal:
```bash
npx mcp-browser-agent observe https://example.com
npx mcp-browser-agent lint https://example.com
npx mcp-browser-agent screenshot https://example.com
npx mcp-browser-agent act click 'role=button name~="Submit"' -u https://example.com
```

MCP Tools
| Tool | Description |
|------|-------------|
| ui_connect | Connect to a URL and start a browser session |
| ui_observe | Get the full semantic UI tree, ranked actions, and lint issues |
| ui_locate | Find elements by semantic query (e.g., role=button name~="Submit") |
| ui_act | Execute actions: click, type, select, scroll, focus, upload |
| ui_screenshot | Capture a screenshot of the current page |
| ui_lint | Run UI detectors/linters (overflow, contrast, target-size, etc.) |
| ui_navigate | Navigate to a different URL |
| ui_state | Get page state info and transition history |
| ui_watch | Start/stop continuous hot observation for DOM changes |
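Each tool in the table is invoked with a JSON-RPC `tools/call` request that carries the tool `name` and an `arguments` object. The argument names used below are hypothetical: `url` and `query` are guesses, and `action`/`target` are borrowed from the options of the library's `act` function. Check the actual input schemas in the server's `tools/list` response before relying on them. A sketch of the requests for a typical connect, observe, locate, act, lint session:

```typescript
// Hypothetical sequence of MCP tools/call requests for one browser session.
// Argument names are assumptions: read each tool's inputSchema from tools/list.
interface ToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCall {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

function exampleSession(url: string): ToolCall[] {
  const query = 'role=button name~="Submit"';
  let id = 10; // request ids only need to be unique among pending requests
  return [
    toolCall(id++, "ui_connect", { url }), // `url` is a guess
    toolCall(id++, "ui_observe", {}),
    toolCall(id++, "ui_locate", { query }), // `query` is a guess
    toolCall(id++, "ui_act", { action: "click", target: query }), // mirrors the library's act() options
    toolCall(id++, "ui_lint", {}),
  ];
}

// On the wire, each request is one newline-terminated line of JSON.
const wire: string = exampleSession("https://example.com")
  .map((call) => JSON.stringify(call) + "\n")
  .join("");
```

In practice an agent would inspect the `ui_observe` result before choosing what to locate and act on, rather than sending a fixed script like this one.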
What Agents See
When an agent calls ui_observe, it receives:
- Semantic UI tree — a pruned accessibility tree with layout, styles, and stable IDs
- Ranked actions — interactive elements scored by visibility, size, and role
- Issues — UI lint findings (overflow, contrast, target-size, focus-visible, heading-scale, text-truncation, misalignment, spacing, line-length, layout-shift)
- State — page state fingerprint and transition history
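The ranking formula is not documented beyond "visibility, size, and role". Purely as an illustration of that idea, the sketch below multiplies a visible fraction, a size factor capped at a 44×44 px reference target, and a per-role weight. The field names, the weights, and the 44 px reference (borrowed from the WCAG AAA target-size guideline) are all assumptions, not the package's actual scoring.

```typescript
// Illustrative action ranking; fields, weights, and thresholds are hypothetical.
interface Interactive {
  id: string;
  role: string;
  visibleFraction: number; // 0..1, share of the element that is on screen and unobscured
  width: number; // CSS px
  height: number; // CSS px
}

const ROLE_WEIGHT = new Map<string, number>([
  ["button", 1.0],
  ["textbox", 0.9],
  ["combobox", 0.9],
  ["link", 0.8],
  ["checkbox", 0.7],
]);
const DEFAULT_ROLE_WEIGHT = 0.5; // roles not listed above
const REFERENCE_SIZE = 44; // px; targets at least this large get full size credit

function score(el: Interactive): number {
  const area = el.width * el.height;
  const sizeFactor = Math.min(1, area / (REFERENCE_SIZE * REFERENCE_SIZE));
  const roleWeight = ROLE_WEIGHT.get(el.role) ?? DEFAULT_ROLE_WEIGHT;
  return el.visibleFraction * sizeFactor * roleWeight;
}

function rankActions(elements: Interactive[]): Interactive[] {
  // Drop elements with no visible area, then sort highest score first.
  return elements
    .filter((el) => el.visibleFraction > 0)
    .sort((a, b) => score(b) - score(a));
}
```

With these weights, a fully visible 120×40 button outranks a 20×10 link, and elements with no visible area are dropped entirely.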
Locate Query Language
Find elements using a semantic query syntax:
```text
role=button name~="Submit"    # button with name containing "Submit"
role=link name="Sign in"      # exact name match
role=textbox in=form          # textbox inside a form
role=heading level=2          # h2 heading
```

Use as a Library
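The library's `locate` takes structured criteria (`role`, `namePattern`), while `act` accepts the string syntax from the Locate Query Language section. A hypothetical sketch of bridging the two, assuming each clause is `key=value` (exact) or `key~=value` (substring) and that values containing spaces are double-quoted. It is not the package's own parser, and matching here is case-sensitive, which may differ from the real behavior.

```typescript
// Hypothetical parser for queries like: role=button name~="Submit" in=form level=2
type Op = "=" | "~=";

interface Clause {
  key: string; // e.g. "role", "name", "in", "level"
  op: Op; // "=" exact match, "~=" substring match
  value: string;
}

// key, operator, then either a "quoted value" or a bare token.
const CLAUSE = /(\w+)(~?=)(?:"([^"]*)"|(\S+))/g;

function parseQuery(query: string): Clause[] {
  const clauses: Clause[] = [];
  CLAUSE.lastIndex = 0; // the regex is global and shared, so reset before scanning
  let m: RegExpExecArray | null;
  while ((m = CLAUSE.exec(query)) !== null) {
    clauses.push({
      key: m[1],
      op: m[2] as Op,
      value: m[3] !== undefined ? m[3] : m[4],
    });
  }
  return clauses;
}

// Turn a `name` clause into a RegExp in the spirit of the `namePattern` option,
// escaping regex metacharacters so the text is matched literally.
function namePattern(clause: Clause): RegExp {
  const escaped = clause.value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return clause.op === "=" ? new RegExp(`^${escaped}$`) : new RegExp(escaped);
}
```

For example, `parseQuery('role=link name="Sign in"')` yields a `role` clause and an exact `name` clause, and `namePattern` applied to the latter matches "Sign in" but not "Sign in now".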
```typescript
// Top-level await requires an ES module (e.g. "type": "module" in package.json, or an .mjs file).
import { BrowserManager, observe, locate, act } from 'mcp-browser-agent';

const manager = new BrowserManager();
const session = await manager.connect('https://example.com');

// Observe the page
const result = await observe(session);
console.log(result.ui_tree);

// Find elements
const matches = locate(result.ui_tree, { role: 'button', namePattern: /Submit/ });

// Take actions
await act(session, { action: 'click', target: 'role=button name~="Submit"' });
```

Detectors
10 built-in UI detectors:
| Detector | What it catches |
|----------|----------------|
| overflow | Content overflowing its container |
| contrast | Text with insufficient color contrast (WCAG AA) |
| target-size | Interactive elements below minimum tap/click size |
| focus-visible | Missing focus indicators on interactive elements |
| heading-scale | Skipped heading levels (e.g., h1 → h3) |
| text-truncation | Text being clipped or truncated |
| misalignment | Elements misaligned with their siblings |
| spacing | Padding/margins not following a consistent scale |
| line-length | Text lines exceeding recommended character count |
| layout-shift | Elements that may cause layout shift |
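The contrast detector checks against WCAG AA. For reference, WCAG 2.x defines the contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors, and AA requires at least 4.5:1 for normal text and 3:1 for large text. A self-contained sketch of that calculation for opaque `#rrggbb` colors; it ignores transparency, which a real detector would first have to composite against the background, and it is not the package's own implementation.

```typescript
// WCAG 2.x contrast ratio for opaque sRGB colors given as "#rrggbb".
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  // Inverse sRGB transfer function, as used in WCAG's relative luminance.
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (m === null) throw new Error(`expected #rrggbb, got ${hex}`);
  const r = channelToLinear(parseInt(m[1], 16));
  const g = channelToLinear(parseInt(m[2], 16));
  const b = channelToLinear(parseInt(m[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA thresholds: 4.5:1 for normal text, 3:1 for large text
// (large = at least 18pt, or at least 14pt bold).
function passesAA(fg: string, bg: string, largeText: boolean = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

For example, `#767676` on white comes out at about 4.54:1 and passes AA for body text, while `#777777` on white is about 4.48:1 and fails for body text but passes for large text.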
Requirements
- Node.js >= 20
License
MIT
