@mindstone-engineering/mcp-server-browser-automation

v0.1.6

Published

2 months ago

Browser automation MCP server — visible-by-default browser control via accessibility snapshots, navigation, form filling, screenshots, and tab management. Set AGENT_BROWSER_SHOW_WINDOW=false to run quietly.

mindstone-engineering

Browser Automation MCP Server

Headless browser control via accessibility snapshots — navigate pages, fill forms, click elements, take screenshots, and manage tabs using the agent-browser CLI.

Installation

npx -y @mindstone-engineering/mcp-server-browser-automation

Or install globally:

npm install -g @mindstone-engineering/mcp-server-browser-automation
mcp-server-browser-automation

Requirements

This server requires the agent-browser CLI binary to control the browser.

Binary Resolution

PATH lookup (preferred): If agent-browser is on your PATH, it is used directly.
npx fallback: If the binary is not found on PATH, the server automatically falls back to npx -y agent-browser@<version>.
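
The resolution order above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the server's actual code: the name resolveAgentBrowser, the injected exists predicate, and the fallbackSpec parameter are all hypothetical, and the package spec handed to npx is left to the caller.

```typescript
// Hypothetical sketch of "PATH first, npx second" resolution.
// Nothing here is taken from the server's source.
type Resolved = { command: string; args: string[] };

function resolveAgentBrowser(
  pathEnv: string,                          // contents of the PATH variable
  exists: (candidate: string) => boolean,   // injected file-existence check
  fallbackSpec: string,                     // package spec to hand to npx
  delimiter: string = ":",                  // ";" on Windows
): Resolved {
  const dirs = pathEnv.split(delimiter).filter((d) => d.length > 0);
  for (const dir of dirs) {
    // Strip trailing slashes so the joined path has exactly one separator.
    const candidate = `${dir.replace(/\/+$/, "")}/agent-browser`;
    if (exists(candidate)) {
      return { command: candidate, args: [] };
    }
  }
  // Binary not found anywhere on PATH: fall back to npx.
  return { command: "npx", args: ["-y", fallbackSpec] };
}
```

In a real process the predicate would typically wrap a filesystem check and pathEnv would come from the environment; on Windows the executable suffix would also need handling, which this sketch ignores.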

Installing agent-browser

npm install -g agent-browser

Or let the npx fallback handle it automatically (slower on first use due to download).

Configuration

No API keys or credentials are required. The server communicates with the browser via the agent-browser CLI.

| Variable | Required | Description |
|---|---|---|
| AGENT_BROWSER_SESSION_NAME | No | Session name for browser persistence (default: mcp) |
| BROWSER_AUTOMATION_ALLOW_EVAL | No | Set to 1 to register the browser_evaluate tool. Off by default. See Security considerations. |

MCP Host Configuration

{
  "mcpServers": {
    "browser-automation": {
      "command": "npx",
      "args": ["-y", "@mindstone-engineering/mcp-server-browser-automation"]
    }
  }
}

Available Tools (17 by default; +1 when `BROWSER_AUTOMATION_ALLOW_EVAL=1`)

Navigation

browser_navigate — Navigate to a URL
browser_back — Navigate back in browser history
browser_forward — Navigate forward in browser history
browser_wait — Wait for an element to appear or a specified time

Observation

browser_snapshot — Get the page accessibility tree with interactive element references
browser_screenshot — Take a screenshot of the current page
browser_get_page_info — Get the current page URL and title

Interaction

browser_click — Click an element using @ref or CSS selector
browser_fill — Clear a field and fill it with text
browser_type — Type text character by character (real keystrokes)
browser_press_key — Press a keyboard key
browser_scroll — Scroll the page in a direction
browser_select — Select an option from a dropdown
browser_hover — Hover over an element
browser_evaluate — Execute JavaScript in the page context (gated; see Security considerations)

Session Management

browser_tabs — List open tabs or switch to a tab
browser_close — Close the browser session
browser_authenticate — Open a visible browser for manual login

Workflow

The typical workflow uses accessibility snapshots for reliable element targeting:

browser_navigate → open a page
browser_snapshot → see interactive elements with @ref IDs
browser_click / browser_fill → interact using @ref references
browser_screenshot → visual verification
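
As a rough sketch, the four steps above could be driven from client code like this. The callTool function stands in for whatever tool-invocation method your MCP client exposes, and the argument names (url, ref, text) and the pickRef helper are assumptions made for illustration; check the schemas the server reports for the real parameter names.

```typescript
// Hypothetical client-side driver for the snapshot-based workflow.
// CallTool abstracts the MCP client; argument names are assumed, not documented.
type CallTool = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

async function fillFieldWorkflow(
  callTool: CallTool,
  url: string,
  pickRef: (snapshot: unknown) => string, // chooses an @ref from the snapshot
  text: string,
): Promise<unknown> {
  // 1. Open the page.
  await callTool("browser_navigate", { url });
  // 2. Read the accessibility tree to discover interactive elements.
  const snapshot = await callTool("browser_snapshot", {});
  // 3. Interact using a reference taken from that snapshot.
  const ref = pickRef(snapshot);
  await callTool("browser_fill", { ref, text });
  // 4. Capture a screenshot for visual verification.
  return callTool("browser_screenshot", {});
}
```

The point of the ordering is that the reference used in step 3 always comes from a snapshot taken after the most recent navigation, rather than being guessed or reused from an earlier page.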

Security considerations

Browser automation has a large attack surface: the agent-browser CLI controls a real headless browser that loads URLs you pass it, runs page-side JavaScript, and persists cookies and session state across runs. Read this section before deploying.

`browser_evaluate` is gated behind `BROWSER_AUTOMATION_ALLOW_EVAL`

browser_evaluate lets the model execute arbitrary JavaScript inside the page context — the security equivalent of giving the model a shell on whatever site it has just navigated to. To prevent prompt-injected content from doing this silently, the tool is only registered when the host explicitly opts in:

BROWSER_AUTOMATION_ALLOW_EVAL=1 mcp-server-browser-automation

Without this env var, browser_evaluate is not in the tools list at all — the LLM cannot even see it. When enabled, the tool is annotated destructiveHint: true so MCP hosts can (and should) require explicit user confirmation before each invocation.
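
The gating described above amounts to conditional registration. The sketch below illustrates the idea only: registeredTools is a hypothetical name, and treating exactly the string "1" as the opt-in value is an assumption based on the configuration table, since the README does not say whether other values are accepted.

```typescript
// Hypothetical illustration of env-gated tool registration.
const BASE_TOOLS: readonly string[] = [
  "browser_navigate", "browser_back", "browser_forward", "browser_wait",
  "browser_snapshot", "browser_screenshot", "browser_get_page_info",
  "browser_click", "browser_fill", "browser_type", "browser_press_key",
  "browser_scroll", "browser_select", "browser_hover",
  "browser_tabs", "browser_close", "browser_authenticate",
];

function registeredTools(env: Record<string, string | undefined>): string[] {
  const tools = [...BASE_TOOLS];
  // Assumption: only the exact value "1" enables the tool.
  if (env["BROWSER_AUTOMATION_ALLOW_EVAL"] === "1") {
    tools.push("browser_evaluate");
  }
  return tools;
}
```

Because the tool is never added to the list when the variable is unset, a client that lists tools has no entry to call, which is stronger than registering the tool and rejecting calls at runtime.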

URL scheme deny-list

browser_navigate and browser_authenticate accept only http: and https: URLs (plus the special about:blank). Every other URL scheme is refused before the underlying agent-browser CLI is invoked, including:

file: — would let pages read local filesystem paths
chrome: and chrome-extension: — internal browser pages and installed extensions
javascript: — equivalent to eval() against the current document
data: — inlined attacker-controlled HTML/JS payloads
view-source: — defeats the same-origin policy on rendered content
about: — privileged internal pages (about:config, about:cache, about:debugging, …); only about:blank is permitted
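
A check with the behavior described above could look roughly like the following. This is an illustrative sketch built on the standard URL parser, not the server's implementation, and isAllowedUrl is a hypothetical name.

```typescript
// Hypothetical scheme check: allow http:, https:, and exactly about:blank.
function isAllowedUrl(raw: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    // Unparseable input is refused rather than passed through.
    return false;
  }
  if (parsed.protocol === "http:" || parsed.protocol === "https:") {
    return true;
  }
  // The only non-HTTP URL permitted is the literal about:blank.
  return parsed.href === "about:blank";
}
```

Note that this sketch allows by scheme rather than blocking by scheme, so a scheme missing from the list above (for example ftp:) is also refused.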

Cookie and session persistence

The server tells agent-browser to use a named, persistent session via AGENT_BROWSER_SESSION_NAME (default value: mcp). All cookies, localStorage data, and any logins performed via browser_authenticate are stored on disk under that session name and reused across runs. Anyone who can read the session storage (the local user, other tools running as the same user, or anything with access to backups) can also use those logged-in sessions.

To override the session name (for example, to keep separate profiles per project) set AGENT_BROWSER_SESSION_NAME explicitly in the host's MCP server config. To wipe state, close the browser via browser_close and remove the session directory managed by agent-browser.
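
For example, a per-project host entry might be generated as below. The env key is an assumption: many MCP hosts accept an env object alongside command and args, but confirm the exact shape against your host's documentation. hostConfig and the session name project-a are hypothetical.

```typescript
// Hypothetical helper that builds an MCP host entry with a custom session name.
// Assumes the host supports an "env" object next to "command" and "args".
function hostConfig(sessionName: string) {
  return {
    mcpServers: {
      "browser-automation": {
        command: "npx",
        args: ["-y", "@mindstone-engineering/mcp-server-browser-automation"],
        env: { AGENT_BROWSER_SESSION_NAME: sessionName },
      },
    },
  };
}

// Print the JSON to paste into the host's configuration file.
console.log(JSON.stringify(hostConfig("project-a"), null, 2));
```

Using a distinct session name per project keeps cookies and logins from one project out of another's browser profile.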

License

FSL-1.1-MIT