# Browser MCP
Web browsing tools for AI agents. Navigate and interact with web pages using semantic accessibility patterns—no screenshots needed.
Browser MCP is an MCP server that lets AI agents navigate and interact with web pages using the same accessibility semantics that screen readers use. Instead of parsing raw HTML or analyzing screenshots, agents query landmarks, headings, forms, and other semantic elements.
## Installation

### Claude Desktop

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"]
    }
  }
}
```

Restart Claude Desktop. The browsing tools will appear automatically.
### Claude Code

Add to your project's `.claude/config.json` or run:

```shell
claude mcp add browser "npx @sanity-labs/browser-mcp"
```

### Visible Browser (Debug Mode)
To see what the agent is doing, run with a visible browser window:
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp", "--no-headless"]
    }
  }
}
```

The browser will open visibly so you can watch the agent navigate.
### Optional: Vision Support

No API key is required for basic browser automation. All tools work without configuration. The API key only enables the `describe` tool for AI-powered page descriptions:

```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Supports `OPENAI_API_KEY` (gpt-4o) or `ANTHROPIC_API_KEY` (claude-sonnet-4). If both are set, OpenAI is preferred.
## Available Tools
| Tool | Description |
|------|-------------|
| open_session | Opens a browser tab and navigates to a URL |
| close_session | Closes a browser session |
| overview | Page summary: title, URL, landmarks, element counts |
| query | Query elements by CSS selector, extract structure or text |
| section | Extract content under a heading |
| elements | List elements by type (headings, links, buttons, forms, tables, images) |
| action | Interact: navigate, click, fill, select, check, press, scroll, back, forward, highlight |
| screenshot | Capture page or element screenshots (saves to disk) |
| diagnostics | Get console logs and network requests for debugging |
| run_sequence | Execute a batch of browser operations and assertions in a single call |
| describe | Use vision AI to describe what's visible on the page (requires API key) |
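A minimal session lifecycle, in the same illustrative style as the examples below. The `url`, `heading`, and `extract` parameter names and the `{ session }` return shape are assumptions here, not confirmed API; check each tool's schema for the exact fields:

```js
// Open a tab and navigate (the `url` parameter name is an assumption)
const { session } = await mcp.call('open_session', { url: 'https://example.com' });

// Extract the content under a specific heading with `section`
// (the `heading` parameter name is an assumption)
await mcp.call('section', { session, heading: 'Getting Started' });

// Query elements by CSS selector with `query`
// (the `extract` parameter name is an assumption)
await mcp.call('query', { session, selector: 'nav a', extract: 'text' });

// Release the browser tab when done
await mcp.call('close_session', { session });
```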
## Example Workflow

The 3-call pattern covers most browsing tasks:

1. **Overview** — Understand the page structure
2. **Elements/Query** — Find what you need
3. **Action** — Interact with it
```js
// 1. What's on this page?
const overview = await mcp.call('overview', { session: 's1' });
// → 1 form, 15 links, 6 headings

// 2. What does the form look like?
const forms = await mcp.call('elements', { session: 's1', type: 'forms' });
// → fields: [{ name: 'q', label: 'Search', type: 'text' }, ...]

// 3. Fill and submit
await mcp.call('action', { session: 's1', type: 'fill', selector: '[name="q"]', value: 'accessibility' });
await mcp.call('action', { session: 's1', type: 'press', selector: '[name="q"]', value: 'Enter' });
```

## Screenshot Tool
Capture full page, viewport, or specific element screenshots. Screenshots save to disk and return the file path (no base64 in context window).
```js
// Full viewport
await mcp.call('screenshot', { session: 'main' });
// → { success: true, path: '/tmp/browser-screenshots/screenshot-123.png', size: 150000 }

// Full scrollable page
await mcp.call('screenshot', { session: 'main', fullPage: true });

// Specific element only
await mcp.call('screenshot', { session: 'main', selector: '[data-testid="tweet"]' });

// Custom save path
await mcp.call('screenshot', { session: 'main', savePath: '/tmp/my-screenshot.png' });
```

## Diagnostics Tool
Access browser console logs and network requests for debugging.
```js
// Get console logs
await mcp.call('diagnostics', { session: 'main', type: 'console' });
// → { console: [{ level: 'error', text: '...', url: '...', timestamp: '...' }] }

// Get network requests
await mcp.call('diagnostics', { session: 'main', type: 'network' });
// → { network: [{ url: '...', method: 'GET', status: 200, timing: 150 }] }

// Get both
await mcp.call('diagnostics', { session: 'main', type: 'all' });

// Filter by level, limit results, clear buffer
await mcp.call('diagnostics', {
  session: 'main',
  type: 'console',
  level: 'error',
  limit: 10,
  clear: true
});
```

## Highlight Action
Scroll to an element and flash it with a colored border—useful for showing users what you're looking at.
```js
// Highlight an element (scrolls into view + flashes orange border 3x)
await mcp.call('action', { session: 'main', type: 'highlight', selector: '.article-title' });
```

## Run Sequence Tool
Execute a batch of browser operations and assertions in a single call. Useful for testing flows.
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'fill', selector: '#search', value: 'test' },
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'assert', condition: { element_exists: '#results' } },
    { type: 'assert', condition: { element_text_contains: { selector: '#results', text: 'test' } } }
  ]
});
// → { success: true, completed: 4, total: 4, events: [...], final_state: {...} }
```

## Describe Tool (Vision AI)
Use vision AI to describe what's visible on the page. Takes a screenshot and sends it to OpenAI or Anthropic for analysis, returning a text description.
Requires the `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variable.
```json
{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

```js
// Describe the current viewport
await mcp.call('describe', { session: 'main' });
// → { description: "The page shows a login form with email and password fields...", provider: "openai" }

// Describe a specific element
await mcp.call('describe', { session: 'main', selector: '.error-message' });
// → { description: "A red error banner displaying 'Invalid credentials'", provider: "openai" }

// Ask a specific question
await mcp.call('describe', {
  session: 'main',
  prompt: 'What navigation options are visible?'
});
// → { description: "The navigation bar shows: Home, Products, About, Contact...", provider: "openai" }
```

Also available as a query type in `run_sequence`:
```js
await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'query', query: 'describe', params: {
      selector: '.result-panel',
      prompt: 'Was the form submitted successfully?'
    }}
  ]
});
```

## CLI Options
```shell
npx @sanity-labs/browser-mcp [options]
```

Options:

```
--headless=true    Run browser in headless mode (default)
--headless=false   Run browser with visible window (for debugging)
--help             Show help
```

## Development
```shell
# Clone and install
git clone https://github.com/sanity-labs/browser-mcp.git
cd browser-mcp
npm install

# Build
npm run build

# Run tests
npm test

# Watch mode
npm run dev
```

## Project Structure
```
src/
├── index.ts              # MCP server entry point
├── cli.ts                # CLI
├── session.ts            # Playwright session management + diagnostics buffers
├── browser/
│   ├── accessibility.ts  # DOM queries, element extraction
│   ├── actions.ts        # Browser actions (including highlight)
│   └── assertions.ts     # Assertion conditions for run_sequence
├── vision/
│   ├── index.ts          # Vision provider selection
│   ├── openai.ts         # OpenAI vision wrapper
│   └── anthropic.ts      # Anthropic vision wrapper
└── tools/
    ├── open-session.ts   # open_session tool
    ├── close-session.ts  # close_session tool
    ├── overview.ts       # overview tool
    ├── query.ts          # query tool
    ├── section.ts        # section tool
    ├── elements.ts       # elements tool
    ├── action.ts         # action tool
    ├── screenshot.ts     # screenshot tool
    ├── diagnostics.ts    # diagnostics tool
    ├── run-sequence.ts   # run_sequence tool
    └── describe.ts       # describe tool (vision AI)
test/
├── fixtures/             # Test HTML pages
├── test-server.ts        # Local test server
└── integration.test.ts   # Integration tests
```

## Why Accessibility Semantics?
Traditional web scraping parses raw HTML—brittle and verbose. Screenshot-based approaches require vision models and can't interact precisely.
Accessibility semantics give us:

- **Structure** — Landmarks (nav, main, aside) reveal page organization
- **Labels** — Buttons, links, and inputs have accessible names
- **Hierarchy** — Headings create navigable outlines
- **Interactivity** — Forms, buttons, and controls are explicitly marked
This is how screen reader users browse—and it works for agents too.
## License
MIT
