@sanity-labs/browser-mcp

v0.1.0

Browser MCP - Web browsing tools for AI agents

Browser MCP

Web browsing tools for AI agents. Navigate and interact with web pages using semantic accessibility patterns—no screenshots needed.

Browser MCP is an MCP server that lets AI agents navigate and interact with web pages using the same accessibility semantics that screen readers use. Instead of parsing raw HTML or analyzing screenshots, agents query landmarks, headings, forms, and other semantic elements.

Installation

Claude Desktop

Add the following to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"]
    }
  }
}

Restart Claude Desktop. The browsing tools will appear automatically.
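
If the config file already lists other servers, the entry above should be merged in rather than pasted over them. A minimal sketch of that merge, assuming the config has already been parsed into an object; `addBrowserServer` and the `files` server are hypothetical names used only for illustration:

```javascript
// Hypothetical helper: returns a copy of an existing config with the "browser"
// entry added, leaving any other configured servers untouched.
function addBrowserServer(config) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers || {}),
      browser: { command: 'npx', args: ['@sanity-labs/browser-mcp'] },
    },
  };
}

// "files" stands in for a server that was already configured (made up for this example).
const existing = { mcpServers: { files: { command: 'node', args: ['files.js'] } } };
const updated = addBrowserServer(existing);
// updated.mcpServers now has both "files" and "browser"; `existing` is unchanged.
```

Returning a copy keeps the original object intact, which makes it easy to diff before writing the file back.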

Claude Code

Add the server to your project's .claude/config.json, or run:

claude mcp add browser "npx @sanity-labs/browser-mcp"

Visible Browser (Debug Mode)

To see what the agent is doing, run with a visible browser window:

{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp", "--no-headless"]
    }
  }
}

The browser will open visibly so you can watch the agent navigate.

Optional: Vision Support

No API key is required for basic browser automation: every tool except describe works without configuration. An OpenAI or Anthropic API key only enables the describe tool for AI-powered page descriptions:

{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Supports OPENAI_API_KEY (gpt-4o) or ANTHROPIC_API_KEY (claude-sonnet-4). If both are set, OpenAI is preferred.
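
The precedence rule above can be sketched as a small function. This is an illustration, not the package's actual code: `pickVisionProvider`, the shape of the returned object, and the `'anthropic'` provider string are assumptions; the model names are the ones listed above.

```javascript
// Hypothetical helper: mirrors the documented precedence, not the package's source.
function pickVisionProvider(env) {
  if (env.OPENAI_API_KEY) {
    // OpenAI wins whenever its key is present, even if both keys are set.
    return { provider: 'openai', model: 'gpt-4o' };
  }
  if (env.ANTHROPIC_API_KEY) {
    return { provider: 'anthropic', model: 'claude-sonnet-4' };
  }
  // No key: the describe tool is unavailable; the other tools are unaffected.
  return null;
}
```

In a Node process this would typically be called as `pickVisionProvider(process.env)`.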

Available Tools

| Tool | Description |
|------|-------------|
| open_session | Opens a browser tab and navigates to a URL |
| close_session | Closes a browser session |
| overview | Page summary: title, URL, landmarks, element counts |
| query | Query elements by CSS selector, extract structure or text |
| section | Extract content under a heading |
| elements | List elements by type (headings, links, buttons, forms, tables, images) |
| action | Interact: navigate, click, fill, select, check, press, scroll, back, forward, highlight |
| screenshot | Capture page or element screenshots (saves to disk) |
| diagnostics | Get console logs and network requests for debugging |
| run_sequence | Execute a batch of browser operations and assertions in a single call |
| describe | Use vision AI to describe what's visible on the page (requires API key) |

Example Workflow

The 3-call pattern covers most browsing tasks:

  1. Overview — Understand the page structure
  2. Elements/Query — Find what you need
  3. Action — Interact with it

// 1. What's on this page?
const overview = await mcp.call('overview', { session: 's1' });
// → 1 form, 15 links, 6 headings

// 2. What does the form look like?
const forms = await mcp.call('elements', { session: 's1', type: 'forms' });
// → fields: [{ name: 'q', label: 'Search', type: 'text' }, ...]

// 3. Fill and submit
await mcp.call('action', { session: 's1', type: 'fill', selector: '[name="q"]', value: 'accessibility' });
await mcp.call('action', { session: 's1', type: 'press', selector: '[name="q"]', value: 'Enter' });
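
The output of step 2 feeds the selector used in step 3. A minimal sketch of that hand-off, assuming the field shape shown in the comment above (`name`, `label`, `type`); `findFieldByLabel` and `selectorForField` are hypothetical helpers, not part of the package:

```javascript
// Hypothetical helpers; the field shape ({ name, label, type }) is taken from
// the example output above and may differ from the real response.
function findFieldByLabel(fields, label) {
  const wanted = label.trim().toLowerCase();
  return fields.find((f) => (f.label || '').trim().toLowerCase() === wanted) || null;
}

function selectorForField(field) {
  // Escape backslashes and double quotes so the name is safe inside [name="..."].
  const escaped = String(field.name).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[name="${escaped}"]`;
}

const fields = [{ name: 'q', label: 'Search', type: 'text' }];
const field = findFieldByLabel(fields, 'search');
const selector = field ? selectorForField(field) : null;
// `selector` can then be passed as the `selector` argument of a fill action.
```

Looking fields up by their accessible label keeps the agent's choice tied to what a user would see, rather than to a hard-coded selector.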

Screenshot Tool

Capture full page, viewport, or specific element screenshots. Screenshots are saved to disk and the tool returns the file path, so no base64 image data enters the context window.

// Full viewport
await mcp.call('screenshot', { session: 'main' });
// → { success: true, path: '/tmp/browser-screenshots/screenshot-123.png', size: 150000 }

// Full scrollable page
await mcp.call('screenshot', { session: 'main', fullPage: true });

// Specific element only
await mcp.call('screenshot', { session: 'main', selector: '[data-testid="tweet"]' });

// Custom save path
await mcp.call('screenshot', { session: 'main', savePath: '/tmp/my-screenshot.png' });

Diagnostics Tool

Access browser console logs and network requests for debugging.

// Get console logs
await mcp.call('diagnostics', { session: 'main', type: 'console' });
// → { console: [{ level: 'error', text: '...', url: '...', timestamp: '...' }] }

// Get network requests
await mcp.call('diagnostics', { session: 'main', type: 'network' });
// → { network: [{ url: '...', method: 'GET', status: 200, timing: 150 }] }

// Get both
await mcp.call('diagnostics', { session: 'main', type: 'all' });

// Filter by level, limit results, clear buffer
await mcp.call('diagnostics', {
  session: 'main',
  type: 'console',
  level: 'error',
  limit: 10,
  clear: true
});
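
Once the network entries are in hand, a small amount of post-processing makes them easier to act on. A sketch under stated assumptions: the entry fields (`url`, `method`, `status`, `timing`) follow the example output above, treating `timing` as milliseconds is a guess, and `summarizeNetwork` is a hypothetical helper:

```javascript
// Hypothetical helper; not part of the package. Splits entries into failed
// responses (status >= 400) and slow ones (timing above the threshold).
function summarizeNetwork(entries, slowMs = 1000) {
  const failed = entries.filter((e) => e.status >= 400);
  const slow = entries.filter((e) => e.timing > slowMs);
  return { total: entries.length, failed, slow };
}

// Made-up sample data in the shape of the example output.
const sample = [
  { url: '/api/a', method: 'GET', status: 200, timing: 150 },
  { url: '/api/b', method: 'POST', status: 500, timing: 90 },
  { url: '/api/c', method: 'GET', status: 200, timing: 2400 },
];
const summary = summarizeNetwork(sample);
// summary.failed holds the 500 response; summary.slow holds the 2400 ms request.
```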

Highlight Action

Scroll to an element and flash it with a colored border—useful for showing users what you're looking at.

// Highlight an element (scrolls into view + flashes orange border 3x)
await mcp.call('action', { session: 'main', type: 'highlight', selector: '.article-title' });

Run Sequence Tool

Execute a batch of browser operations and assertions in a single call. Useful for testing flows.

await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'fill', selector: '#search', value: 'test' },
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'assert', condition: { element_exists: '#results' } },
    { type: 'assert', condition: { element_text_contains: { selector: '#results', text: 'test' } } }
  ]
});
// → { success: true, completed: 4, total: 4, events: [...], final_state: {...} }
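
Longer sequences are easier to read when the step objects are built by small helpers. A minimal sketch that produces the same step shapes as the example above; the helper names (`fill`, `click`, `expectExists`, `expectText`, `sequencePassed`) are hypothetical, and the result fields follow the example output:

```javascript
// Hypothetical builders that produce the step objects shown above.
const fill = (selector, value) => ({ type: 'action', action: 'fill', selector, value });
const click = (selector) => ({ type: 'action', action: 'click', selector });
const expectExists = (selector) => ({ type: 'assert', condition: { element_exists: selector } });
const expectText = (selector, text) => ({
  type: 'assert',
  condition: { element_text_contains: { selector, text } },
});

// Treats a run as passed only if it reports success and completed every step.
// The result fields (success, completed, total) follow the example output above.
function sequencePassed(result) {
  return result.success === true && result.completed === result.total;
}

const steps = [
  fill('#search', 'test'),
  click('#submit'),
  expectExists('#results'),
  expectText('#results', 'test'),
];
```

The `steps` array can be passed as the `steps` argument of run_sequence, and `sequencePassed` applied to whatever the call returns.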

Describe Tool (Vision AI)

Use vision AI to describe what's visible on the page. The tool takes a screenshot and sends it to OpenAI or Anthropic for analysis, then returns a text description.

Requires OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable.

{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["@sanity-labs/browser-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

// Describe the current viewport
await mcp.call('describe', { session: 'main' });
// → { description: "The page shows a login form with email and password fields...", provider: "openai" }

// Describe a specific element
await mcp.call('describe', { session: 'main', selector: '.error-message' });
// → { description: "A red error banner displaying 'Invalid credentials'", provider: "openai" }

// Ask a specific question
await mcp.call('describe', {
  session: 'main',
  prompt: 'What navigation options are visible?'
});
// → { description: "The navigation bar shows: Home, Products, About, Contact...", provider: "openai" }

The describe tool is also available as a query step in run_sequence:

await mcp.call('run_sequence', {
  session: 'main',
  steps: [
    { type: 'action', action: 'click', selector: '#submit' },
    { type: 'query', query: 'describe', params: {
      selector: '.result-panel',
      prompt: 'Was the form submitted successfully?'
    }}
  ]
});

CLI Options

npx @sanity-labs/browser-mcp [options]

Options:
  --headless=true   Run browser in headless mode (default)
  --headless=false  Run browser with visible window (for debugging)
  --help            Show help
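
This README uses two spellings for a visible browser: `--no-headless` in the debug config and `--headless=false` in the options list. The sketch below shows how a parser could accept both forms. It is purely illustrative and is not the package's cli.ts; `parseHeadless` is a hypothetical name, and whether the real CLI accepts both spellings is not confirmed here.

```javascript
// Illustrative only: not the package's CLI code. Accepts the flag forms that
// appear in this README: --headless=true, --headless=false, and --no-headless.
function parseHeadless(argv) {
  let headless = true; // documented default
  for (const arg of argv) {
    if (arg === '--no-headless' || arg === '--headless=false') {
      headless = false;
    } else if (arg === '--headless=true') {
      headless = true;
    }
  }
  return headless;
}
```

In a Node script this would be called as `parseHeadless(process.argv.slice(2))`; when a flag is repeated, the last occurrence wins.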

Development

# Clone and install
git clone https://github.com/sanity-labs/browser-mcp.git
cd browser-mcp
npm install

# Build
npm run build

# Run tests
npm test

# Watch mode
npm run dev

Project Structure

src/
├── index.ts              # MCP server entry point
├── cli.ts                # CLI
├── session.ts            # Playwright session management + diagnostics buffers
├── browser/
│   ├── accessibility.ts  # DOM queries, element extraction
│   ├── actions.ts        # Browser actions (including highlight)
│   └── assertions.ts     # Assertion conditions for run_sequence
├── vision/
│   ├── index.ts          # Vision provider selection
│   ├── openai.ts         # OpenAI vision wrapper
│   └── anthropic.ts      # Anthropic vision wrapper
└── tools/
    ├── open-session.ts   # open_session tool
    ├── close-session.ts  # close_session tool
    ├── overview.ts       # overview tool
    ├── query.ts          # query tool
    ├── section.ts        # section tool
    ├── elements.ts       # elements tool
    ├── action.ts         # action tool
    ├── screenshot.ts     # screenshot tool
    ├── diagnostics.ts    # diagnostics tool
    ├── run-sequence.ts   # run_sequence tool
    └── describe.ts       # describe tool (vision AI)

test/
├── fixtures/             # Test HTML pages
├── test-server.ts        # Local test server
└── integration.test.ts   # Integration tests

Why Accessibility Semantics?

Traditional web scraping parses raw HTML—brittle and verbose. Screenshot-based approaches require vision models and can't interact precisely.

Accessibility semantics give us:

  • Structure — Landmarks (nav, main, aside) reveal page organization
  • Labels — Buttons, links, and inputs have accessible names
  • Hierarchy — Headings create navigable outlines
  • Interactivity — Forms, buttons, and controls are explicitly marked

This is how screen reader users browse—and it works for agents too.
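
To make the hierarchy point concrete, here is a sketch that turns a flat list of headings into an indented outline, the kind of compact summary an agent can scan instead of raw HTML. The `{ level, text }` shape and the `outline` function are assumptions for illustration, not the documented output of the elements tool:

```javascript
// Illustrative sketch: indents each heading by two spaces per level below h1.
// The { level, text } shape is an assumption, not the tool's documented output.
function outline(headings) {
  return headings
    .map((h) => `${'  '.repeat(Math.max(0, h.level - 1))}${h.text}`)
    .join('\n');
}

// Made-up headings for a small documentation page.
const text = outline([
  { level: 1, text: 'Docs' },
  { level: 2, text: 'Install' },
  { level: 2, text: 'Usage' },
  { level: 3, text: 'CLI' },
]);
```

Four headings collapse into four short lines, which is far cheaper to reason about than the markup that produced them.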

License

MIT