browsepilot
v0.1.0
MCP server for AI browser automation. Semantic page snapshots, annotated screenshots, and full browser control — built for Claude, GPT, and any MCP client.
🧭 BrowsePilot
BrowsePilot gives AI models a deep understanding of web pages through three-tree merge snapshots — combining accessibility trees, DOM structure, and runtime state into a single, AI-readable page model. No other browser MCP server does this.
Quick Start
npx browsepilot
That's it. Add it to Claude Desktop:
{
  "mcpServers": {
    "browsepilot": {
      "command": "npx",
      "args": ["browsepilot"]
    }
  }
}
Connect to Your Existing Chrome
Keep your logins, cookies, and extensions. Launch Chrome with remote debugging:
# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# Linux
google-chrome --remote-debugging-port=9222
# Windows
chrome.exe --remote-debugging-port=9222
Then configure:
{
  "mcpServers": {
    "browsepilot": {
      "command": "npx",
      "args": ["browsepilot", "--connect-existing", "--cdp-port", "9222"]
    }
  }
}
What Makes This Different
Three-Tree Merge Snapshots
Every other browser MCP server gives you either a raw accessibility tree or a screenshot. BrowsePilot merges three data sources via CDP into a unified page model:
- Accessibility tree — semantic roles, names, values
- DOM snapshot — visibility, bounding boxes, paint order, computed styles
- Runtime evaluation — scroll position, shadow DOM, scrollable containers, page type
The result is a structured, categorized view of the page:
Page: Google Search [search] | https://www.google.com/
Scroll: 0.0 pages above | 3.2 pages below
Stats: 47 links | 12 interactive | 3 inputs | 340 total
[Start of page]
── INPUTS ──
e1 textbox "Search" = ""
* e5 option "Suggested: weather"
── ACTIONS ──
e2 button "Google Search"
e3 button "I'm Feeling Lucky"
── LINKS ──
e10 link "Gmail"
e11 link "Images"
── SCROLL CONTAINERS ──
sc1 div#results — 4200px tall, at top, 3400px remaining
[End of page]
* = element appeared since last snapshot (change detection)
- Elements organized by category (INPUTS, ACTIONS, LINKS, etc.)
- Scroll position awareness (pages above/below)
- Page type detection (auth, form, search, data-table, article, dashboard)
- Shadow DOM and scrollable container awareness
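Because the snapshot is line-oriented, a client can turn it back into structured data. The sketch below is a minimal parser that assumes the format is exactly what the sample above shows: section headers of the form `── NAME ──`, and element lines of the form `ref role "name"` with an optional `= "value"` and an optional leading `*`. The names `SnapshotElement` and `parseSnapshot` are hypothetical and not part of BrowsePilot.

```typescript
// Hypothetical types and parser. The line format is inferred from the sample
// snapshot in this README and is not a documented BrowsePilot API.
interface SnapshotElement {
  ref: string;       // element reference, e.g. "e1"
  role: string;      // accessibility role, e.g. "textbox"
  name: string;      // accessible name, e.g. "Search"
  value?: string;    // present only when the line carries = "..."
  category: string;  // section the element was listed under, e.g. "INPUTS"
  isNew: boolean;    // true when the line is prefixed with "*"
}

function parseSnapshot(text: string): SnapshotElement[] {
  const sectionRe = /^──\s*(.+?)\s*──$/;
  const elementRe = /^(\*\s*)?(e\d+)\s+(\S+)\s+"(.*?)"(?:\s*=\s*"(.*)")?$/;
  const elements: SnapshotElement[] = [];
  let category = "";

  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();

    const section = sectionRe.exec(line);
    if (section) {
      category = section[1] ?? "";
      continue;
    }

    const m = elementRe.exec(line);
    if (!m) continue; // page header, stats, scroll containers, page markers

    elements.push({
      ref: m[2] ?? "",
      role: m[3] ?? "",
      name: m[4] ?? "",
      value: m[5],
      category,
      isNew: m[1] !== undefined,
    });
  }
  return elements;
}
```

With the parsed list, a client could, for example, filter on `isNew` to see what changed since the previous snapshot, or on `category === "INPUTS"` to find fields to fill. Lines for scroll containers (`sc1 ...`) are skipped on purpose here.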
Annotated Screenshots
Visual context with numbered bounding boxes on interactive elements, color-coded by type:
- 🔵 Blue = inputs
- 🟢 Green = actions/buttons
- 🟠 Orange = links
Resilient Actions
Click, type, and other actions use multi-strategy fallback chains:
- Playwright click → scroll into view → partial name match → CDP click
- If one strategy fails, the next one is tried automatically
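The fallback idea described above can be sketched generically: try each strategy in order, return the first success, and report every failure if none works. This is an illustration of the pattern only, not BrowsePilot's internal code; `Strategy`, `runWithFallback`, and the strategy names and bodies are assumptions made for the example.

```typescript
// Hypothetical sketch of a fallback chain. The strategy names mirror the list
// above, but the bodies are stand-ins, not BrowsePilot internals.
type Strategy<T> = { name: string; run: () => Promise<T> };

async function runWithFallback<T>(
  strategies: Strategy<T>[]
): Promise<{ result: T; usedStrategy: string }> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      const result = await strategy.run();
      return { result, usedStrategy: strategy.name };
    } catch (err) {
      // Record why this strategy failed, then fall through to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${strategy.name}: ${reason}`);
    }
  }
  throw new Error(`All strategies failed:\n${failures.join("\n")}`);
}

// Example wiring with stand-in strategies (the first two fail on purpose).
const clickChain: Strategy<string>[] = [
  { name: "playwright-click", run: async () => { throw new Error("element covered"); } },
  { name: "scroll-into-view", run: async () => { throw new Error("still covered"); } },
  { name: "cdp-click", run: async () => "clicked" },
];
```

Calling `runWithFallback(clickChain)` resolves with the result of the first strategy that does not throw. Collecting the failure reasons makes it easier to see why earlier strategies were skipped when the whole chain fails.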
Tools (22)
| Tool | Description |
|------|-------------|
| browser_snapshot | Semantic page snapshot with element refs |
| browser_screenshot | Annotated screenshot with bounding boxes |
| browser_navigate | Navigate to URL |
| browser_click | Click element by ref |
| browser_type | Type text into element |
| browser_press | Press keyboard key/shortcut |
| browser_scroll | Scroll page or element into view |
| browser_select | Select dropdown option |
| browser_hover | Hover over element |
| browser_fill_form | Fill multiple form fields |
| browser_evaluate | Run JavaScript on page |
| browser_extract | Extract structured data (tables, lists, articles, links) |
| browser_read_text | Read text content |
| browser_force_click | JavaScript click (for covered/shadow DOM elements) |
| browser_wait | Wait for conditions |
| browser_tab_new | Open new tab |
| browser_tab_switch | Switch tabs |
| browser_tab_list | List open tabs |
| browser_tab_close | Close tab |
| browser_go_back | Navigate back |
| browser_go_forward | Navigate forward |
| browser_scroll_container | Scroll embedded containers |
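MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such requests for two of the tools in the table. The tool names come from the table; the argument names (`url`, `ref`) and the helper `buildToolCall` are assumptions for illustration, so check the input schema each tool advertises via `tools/list` before relying on them.

```typescript
// JSON-RPC 2.0 envelope for an MCP "tools/call" request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are assumed, not taken from BrowsePilot's schemas.
const navigateRequest = buildToolCall("browser_navigate", {
  url: "https://www.google.com/",
});
const clickRequest = buildToolCall("browser_click", { ref: "e2" });

// Over the stdio transport, each message is written as a single line of JSON.
const wireMessage = JSON.stringify(clickRequest);
```

In practice an MCP client such as Claude Desktop builds and sends these messages for you; the sketch is only meant to show what a tool invocation looks like on the wire.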
Options
npx browsepilot [options]
  --headless              Run Chrome in headless mode
  --connect-existing      Connect to already-running Chrome
  --cdp-port <port>       CDP port (default: random)
  --user-data-dir <dir>   Chrome profile directory
  -v, --version           Show version
  -h, --help              Show help
Requirements
- Node.js 18+
- Chrome, Brave, Edge, or Chromium installed (auto-detected)
- Optional: canvas npm package for annotated screenshots
How It Works
MCP Client (Claude Desktop, Cursor, etc.)
  → stdio transport
    → BrowsePilot MCP Server
      → Chrome via Playwright + raw CDP
        → Three-tree merge snapshot
        → Annotated screenshots
        → All browser actions
BrowsePilot doesn't run its own AI; it's a pure tool layer. Your MCP client's model makes the decisions, and BrowsePilot executes them.
License
MIT
