# Browser MCP
Web browsing tools for AI agents. Navigate and interact with web pages using semantic accessibility patterns—no screenshots needed.
Browser MCP is an MCP server that lets AI agents navigate and interact with web pages using the same accessibility semantics that screen readers use. Instead of parsing raw HTML or analyzing screenshots, agents query landmarks, headings, forms, and other semantic elements.
## Installation

### Claude Desktop

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"]
    }
  }
}
```

Restart Claude Desktop. The browsing tools will appear automatically.
### Claude Code

Add to your project's `.claude/config.json` or run:

```shell
claude mcp add browser "npx @sanity-labs/browser-mcp"
```

### Visible Browser (Debug Mode)
To see what the agent is doing, run with a visible browser window:
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp", "--no-headless"]
    }
  }
}
```

The browser will open visibly so you can watch the agent navigate.
### Optional: Vision Support

No API key is required for basic browser automation. All tools work without configuration. The API key only enables the `describe` tool for AI-powered page descriptions:

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Supports `OPENAI_API_KEY` (gpt-4o) or `ANTHROPIC_API_KEY` (claude-sonnet-4). If both are set, OpenAI is preferred.
## Available Tools
| Tool | Description |
|------|-------------|
| open_session | Opens a browser tab and navigates to a URL |
| close_session | Closes a browser session |
| overview | Page summary: title, URL, landmarks, element counts |
| query | Query elements by CSS selector, extract structure or text |
| section | Extract content under a heading |
| elements | List elements by type (headings, links, buttons, forms, tables, images) |
| action | Interact: navigate, click, fill, select, check, press, scroll, back, forward, highlight |
| screenshot | Capture page or element screenshots (saves to disk) |
| diagnostics | Get console logs and network requests for debugging |
| run_sequence | Execute a batch of browser operations and assertions in a single call |
| describe | Use vision AI to describe what's visible on the page (requires API key) |
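A minimal session lifecycle, in the same illustrative style as the examples below. The `url`, `heading`, and `extract` parameter names and the `{ session }` return shape are assumptions here, not confirmed API; check each tool's schema for the exact fields:

```js
// Open a tab and navigate (the `url` parameter name is an assumption)
const { session } = await mcp.call('open_session', { url: 'https://example.com' });

// Extract the content under a specific heading with `section`
// (the `heading` parameter name is an assumption)
await mcp.call('section', { session, heading: 'Getting Started' });

// Query elements by CSS selector with `query`
// (the `extract` parameter name is an assumption)
await mcp.call('query', { session, selector: 'nav a', extract: 'text' });

// Release the browser tab when done
await mcp.call('close_session', { session });
```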
## Example Workflow

The 3-call pattern covers most browsing tasks:

1. **Overview** — Understand the page structure
2. **Elements/Query** — Find what you need
3. **Action** — Interact with it
```js
// 1. What's on this page?
const overview = await mcp.call('overview', { session: 's1' });
// → 1 form, 15 links, 6 headings

// 2. What does the form look like?
const forms = await mcp.call('elements', { session: 's1', type: 'forms' });
// → fields: [{ name: 'q', label: 'Search', type: 'text' }, ...]

// 3. Fill and submit
await mcp.call('action', { session: 's1', type: 'fill', selector: '[name="q"]', value: 'accessibility' });
await mcp.call('action', { session: 's1', type: 'press', selector: '[name="q"]', value: 'Enter' });
```

## Screenshot Tool
Capture full page, viewport, or specific element screenshots. Screenshots save to disk and return the file path (no base64 in context window).
```js
// Full viewport
await mcp.call('screenshot', { session: 'main' });
// → { success: true, path: '/tmp/browser-screenshots/screenshot-123.png', size: 150000 }

// Full scrollable page
await mcp.call('screenshot', { session: 'main', fullPage: true });

// Specific element only
await mcp.call('screenshot', { session: 'main', selector: '[data-testid="tweet"]' });

// Custom save path
await mcp.call('screenshot', { session: 'main', savePath: '/tmp/my-screenshot.png' });
```

## Diagnostics Tool
Access browser console logs and network requests for debugging.
```js
// Get console logs
await mcp.call('diagnostics', { session: 'main', type: 'console' });
// → { console: [{ level: 'error', text: '...', url: '...', timestamp: '...' }] }

// Get network requests
await mcp.call('diagnostics', { session: 'main', type: 'network' });
// → { network: [{ url: '...', method: 'GET', status: 200, timing: 150 }] }

// Get both
await mcp.call('diagnostics', { session: 'main', type: 'all' });

// Filter by level, limit results, clear buffer
await mcp.call('diagnostics', {
  session: 'main',
  type: 'console',
  level: 'error',
  limit: 10,
  clear: true
});
```

## Highlight Action
Scroll to an element and flash it with a colored border—useful for showing users what you're looking at.
```js
// Highlight an element (scrolls into view + flashes orange border 3x)
await mcp.call('action', { session: 'main', type: 'highlight', selector: '.article-title' });
```

## Run Sequence Tool
Execute a batch of browser operations and assertions in a single call. Useful for testing flows.
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'fill', selector: '#search', value: 'test' },
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'assert', condition: { element_exists: '#results' } },
    { type: 'assert', condition: { element_text_contains: { selector: '#results', text: 'test' } } }
  ]
});
// → { success: true, completed: 4, total: 4, events: [...], final_state: {...} }
```

## Describe Tool (Vision AI)
Use vision AI to describe what's visible on the page. Takes a screenshot and sends it to OpenAI or Anthropic for analysis, returning a text description.
Requires the `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable.
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

```js
// Describe the current viewport
await mcp.call('describe', { session: 'main' });
// → { description: "The page shows a login form with email and password fields...", provider: "openai" }

// Describe a specific element
await mcp.call('describe', { session: 'main', selector: '.error-message' });
// → { description: "A red error banner displaying 'Invalid credentials'", provider: "openai" }

// Ask a specific question
await mcp.call('describe', {
  session: 'main',
  prompt: 'What navigation options are visible?'
});
// → { description: "The navigation bar shows: Home, Products, About, Contact...", provider: "openai" }
```

Also available as a query type in `run_sequence`:
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'query', query: 'describe', params: {
      selector: '.result-panel',
      prompt: 'Was the form submitted successfully?'
    }}
  ]
});
```

## CLI Options
```shell
npx @sanity-labs/browser-mcp [options]
```

Options:

```
--headless=true    Run browser in headless mode (default)
--headless=false   Run browser with visible window (for debugging)
--help             Show help
```

## Development
```shell
# Clone and install
git clone https://github.com/sanity-labs/browser-mcp.git
cd browser-mcp
npm install

# Build
npm run build

# Run tests
npm test

# Watch mode
npm run dev
```

## Project Structure
```
src/
├── index.ts              # MCP server entry point
├── cli.ts                # CLI
├── session.ts            # Playwright session management + diagnostics buffers
├── browser/
│   ├── accessibility.ts  # DOM queries, element extraction
│   ├── actions.ts        # Browser actions (including highlight)
│   └── assertions.ts     # Assertion conditions for run_sequence
├── vision/
│   ├── index.ts          # Vision provider selection
│   ├── openai.ts         # OpenAI vision wrapper
│   └── anthropic.ts      # Anthropic vision wrapper
└── tools/
    ├── open-session.ts   # open_session tool
    ├── close-session.ts  # close_session tool
    ├── overview.ts       # overview tool
    ├── query.ts          # query tool
    ├── section.ts        # section tool
    ├── elements.ts       # elements tool
    ├── action.ts         # action tool
    ├── screenshot.ts     # screenshot tool
    ├── diagnostics.ts    # diagnostics tool
    ├── run-sequence.ts   # run_sequence tool
    └── describe.ts       # describe tool (vision AI)
test/
├── fixtures/             # Test HTML pages
├── test-server.ts        # Local test server
└── integration.test.ts   # Integration tests
```

## Why Accessibility Semantics?
Traditional web scraping parses raw HTML—brittle and verbose. Screenshot-based approaches require vision models and can't interact precisely.
Accessibility semantics give us:

- **Structure** — Landmarks (nav, main, aside) reveal page organization
- **Labels** — Buttons, links, and inputs have accessible names
- **Hierarchy** — Headings create navigable outlines
- **Interactivity** — Forms, buttons, and controls are explicitly marked
This is how screen reader users browse—and it works for agents too.
## License
MIT
