foxbrowser
v1.0.0
Live browser plugin for AI coding agents. Connect to your running Firefox session via WebDriver BiDi.
foxbrowser
Your browser. Your sessions. Your agent.
An MCP server + CLI that connects AI coding agents to Firefox via WebDriver BiDi. Use as an MCP server for LLM-driven automation, or as a standalone CLI for direct browser control from the terminal.
Why foxbrowser?
- Standard protocol -- Uses WebDriver BiDi, the W3C standard for browser automation. No proprietary protocols, no vendor lock-in.
- Credentials never reach the LLM -- Cookie values are managed at the browser level via BiDi storage commands. They never enter the MCP message stream, never reach the model context, and never leave your machine.
- No extra browser to install -- Uses your existing Firefox installation. No separate binary downloads.
- 20x cheaper than screenshot-default tools -- Server-side snapshot redirection returns ~500 tokens instead of ~10K per interaction. At 50 interactions/day, that is 25K tokens vs 500K.
- Always up to date -- Auto-upgrade checks the npm registry on server start (at most once per hour). The next session launches with the latest version. Zero manual intervention.
Quick Start
npx foxbrowser install
Auto-detects your AI platform and configures the MCP server. No global install needed.
// .mcp.json
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// .cursor/mcp.json
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// .vscode/mcp.json
{
  "servers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// ~/.gemini/settings.json
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// ~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// Cline MCP settings (Settings > MCP Servers)
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

// ~/.config/zed/settings.json
{
  "context_servers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

# ~/.continue/config.yaml
mcpServers:
  foxbrowser:
    command: npx
    args: ["-y", "foxbrowser"]

// opencode.json
{
  "mcpServers": {
    "foxbrowser": {
      "command": "npx",
      "args": ["-y", "foxbrowser"]
    }
  }
}

CLI Mode
foxbrowser also works as a standalone CLI -- no LLM required. Same commands, same Firefox connection.
foxbrowser open example.com
foxbrowser snapshot -i
foxbrowser click @e5
foxbrowser fill @e2 "hello world"
foxbrowser press Enter
foxbrowser eval "document.title"

Commands (30)
| Category | Commands |
| -------------- | --------------------------------------------------------------------------------------- |
| Navigation | open (goto, navigate), back, scroll, wait, tab (tabs), close, resize |
| Observation | snapshot, screenshot, html, eval, find, source, console, network |
| Actions | click, fill, type, press (key), hover, drag, select, upload, dialog |
| Network | route, abort, unroute, save, load, diff |
Short Flags
foxbrowser snapshot -i # interactive elements only
foxbrowser snapshot -c # compact output
foxbrowser snapshot -d 3 # depth limit
foxbrowser snapshot -s "main" # scope to selector
foxbrowser screenshot -o ss.png # save to file

Positional Arguments
foxbrowser click @e5 # ref (not --ref=@e5)
foxbrowser click "#submit" # CSS selector
foxbrowser fill @e2 "text" # ref + value
foxbrowser drag @e1 @e2 # source + target
foxbrowser select @e3 "option1" # ref + value(s)
foxbrowser scroll down # direction
foxbrowser resize 1280 720 # width height

Workflow Example
foxbrowser open github.com/login
foxbrowser snapshot -i
# @e12 textbox "Username"
# @e15 textbox "Password"
# @e18 button "Sign in"
foxbrowser fill @e12 "[email protected]"
foxbrowser fill @e15 "password"
foxbrowser click @e18
foxbrowser wait --url="github.com/dashboard"
foxbrowser snapshot -i

Features
| Feature | Description |
| ----------------------- | ---------------------------------------------------------------------------------------------------------- |
| WebDriver BiDi | W3C standard protocol. Cross-browser compatible, future-proof. |
| Daemon Architecture | MCP server survives browser crashes. Auto-reconnects on next browser_connect. |
| Skill Injection | On every connect, injects workflow hints, cost hierarchy, and identity resolution rules into agent context. |
| EventBuffer Capture | Server-side BiDi event listeners. Network requests and console messages survive page navigations. |
| Source Inspection | Maps DOM elements to source code: React (Fiber tree + jsxDEV), Vue (__file), Svelte (__svelte_meta). |
| Network Intercept | Route, abort, and mock HTTP requests with glob pattern matching via BiDi network module. |
| Element Refs | Accessibility tree nodes get @eN refs. Click, fill, hover, drag -- all by ref. |
| Pixel Diff | Compare two screenshots pixel-by-pixel. Returns diff percentage and visual overlay. |
| Session Persistence | Save/load cookies, localStorage, sessionStorage across agent sessions. |
| Auto-Upgrade | Checks npm registry on server start. Background upgrade applies on next restart. |
| Cost Optimization | browser_screenshot auto-returns text snapshot (~500 tokens) unless visual: true (~10K tokens). |
Tools (33)
Connection and Lifecycle
| Tool | What it does | ~Tokens |
| ------------------ | --------------------------------------------------------------------------------------------- | ------: |
| browser_connect | Connect to Firefox via WebDriver BiDi. Auto-launches if needed. Injects agent skill hints. | - |
| browser_tabs | List open tabs, filter by title/URL glob. | ~10 |
| browser_list | List available browser instances on default ports. | ~10 |
| browser_close | Close tab(s) or detach. force: true to actually close. | - |
| browser_resize | Set viewport dimensions or preset (mobile, tablet, desktop, reset). | ~10 |
Navigation
| Tool | What it does | ~Tokens |
| ----------------------- | ----------------------------------------------------------------------- | ------: |
| browser_navigate | Navigate to URL. waitUntil: load, domcontentloaded, networkidle. | ~500 |
| browser_navigate_back | Go back or forward in history. | ~500 |
| browser_scroll | Scroll page/element by direction and pixels, or scroll element into view. | ~10 |
| browser_wait_for | Wait for text, selector, URL glob, JS condition, or timeout. | ~10 |
Observation
| Tool | What it does | ~Tokens |
| ------------------------------ | ----------------------------------------------------------------------------------- | ------: |
| browser_snapshot | Accessibility tree with @eN refs. compact, interactive, cursor, depth modes. | ~500 |
| browser_screenshot | Returns text snapshot by default. visual: true for base64 image. | ~500/~10K |
| browser_annotated_screenshot | Screenshot with numbered labels on interactive elements. | ~12K |
| browser_html | Raw HTML of page or element by selector. | ~500 |
| browser_find | Find elements by ARIA role, name, or text. Returns @eN ref. | ~100 |
| browser_inspect_source | Source file, line, component name. React/Vue/Svelte. | ~100 |
| browser_evaluate | Run JavaScript in page context. Async supported. | ~10 |
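The snapshot modes listed above (interactive-only, depth limit) can be pictured as filters over an accessibility tree. The following is a minimal sketch under stated assumptions: the `AxNode` shape, the `INTERACTIVE_ROLES` list, and `renderSnapshot` are hypothetical names for illustration, and foxbrowser's real numbering and output format may differ. It assumes every node is numbered in document order, so refs stay stable regardless of which filters are active.

```typescript
// Hypothetical accessibility-tree node; foxbrowser's real shape may differ.
interface AxNode {
  role: string;
  name: string;
  children: AxNode[];
}

// Assumed set of roles that count as "interactive" for the -i style filter.
const INTERACTIVE_ROLES = [
  "button", "link", "textbox", "checkbox", "radio", "combobox", "slider",
];

interface SnapshotOptions {
  interactiveOnly?: boolean;
  maxDepth?: number;
}

function renderSnapshot(root: AxNode, options: SnapshotOptions = {}): string[] {
  const lines: string[] = [];
  let nextRef = 1;

  function visit(node: AxNode, depth: number): void {
    // Every node consumes a ref number, even if it is filtered from the output.
    const ref = `@e${nextRef++}`;
    const withinDepth =
      options.maxDepth === undefined || depth <= options.maxDepth;
    const wanted =
      !options.interactiveOnly || INTERACTIVE_ROLES.indexOf(node.role) !== -1;
    if (withinDepth && wanted) {
      lines.push(`${ref} ${node.role} "${node.name}"`);
    }
    for (const child of node.children) {
      visit(child, depth + 1);
    }
  }

  visit(root, 0);
  return lines;
}
```

Numbering before filtering is one possible design: a ref printed by an interactive-only snapshot would then point at the same node as in a full snapshot.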
Interaction
| Tool | What it does | ~Tokens |
| ----------------------- | ------------------------------------------------------------------------------------------ | ------: |
| browser_click | Click by @eN ref, CSS selector, or x/y coordinates. newTab support. | ~10 |
| browser_fill_form | Clear + type into a field. Handles textbox, checkbox, radio, combobox, slider. | ~10 |
| browser_type | Type text (appends, doesn't clear). slowly mode for key-event listeners. | ~10 |
| browser_press_key | Press key or combination (Control+c, Meta+a, Enter, Escape). | ~10 |
| browser_hover | Hover over element by ref. | ~10 |
| browser_drag | Drag from one ref to another with synthesized pointer events. | ~10 |
| browser_select_option | Select dropdown options by value or label text. | ~10 |
| browser_file_upload | Upload files to a file input by ref. | ~10 |
| browser_handle_dialog | Accept/dismiss alert, confirm, prompt. With optional prompt text. | ~10 |
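Several of the interaction tools above accept a target given as an @eN ref, a CSS selector, or x/y coordinates. A minimal sketch of how such a target might be classified follows; `parseTarget`, the `Target` type, and the `"x,y"` text form for coordinates are assumptions made for illustration, not foxbrowser's actual API.

```typescript
// Hypothetical illustration: classify a target string as an element ref,
// a coordinate pair, or a CSS selector. Not foxbrowser's real parser.
type Target =
  | { kind: "ref"; index: number }
  | { kind: "point"; x: number; y: number }
  | { kind: "selector"; css: string };

function parseTarget(input: string): Target {
  const text = input.trim();

  // "@e5" -> ref number 5 (refs come from the most recent snapshot).
  const ref = /^@e(\d+)$/.exec(text);
  if (ref) {
    return { kind: "ref", index: Number(ref[1]) };
  }

  // "120,340" -> viewport coordinates (an assumed text form for x/y).
  const point = /^(\d+)\s*,\s*(\d+)$/.exec(text);
  if (point) {
    return { kind: "point", x: Number(point[1]), y: Number(point[2]) };
  }

  // Anything else is treated as a CSS selector.
  return { kind: "selector", css: text };
}
```

Checking the ref pattern first keeps `@e5` from ever being handed to a selector engine, where it would be an invalid selector.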
Network and Debugging
| Tool | What it does | ~Tokens |
| -------------------------- | --------------------------------------------------------------------------------------- | ------: |
| browser_network_requests | List captured requests. Filter by URL glob, exclude static resources, include headers. | ~100 |
| browser_console_messages | Retrieve console log/warn/error/info messages. Filter by level. | ~100 |
| browser_route | Intercept requests matching URL glob. Respond with custom body/status/headers. | ~10 |
| browser_abort | Block requests matching URL glob. | ~10 |
| browser_unroute | Remove intercept rules. all: true to clear everything. | ~10 |
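The route, abort, and unroute tools above match requests by URL glob. As a rough sketch of what glob matching could look like, the helper below converts a glob into a regular expression. The names `globToRegExp` and `matchesGlob` are hypothetical, and the semantics (`*` stays within one path segment, `**` crosses segments) are an assumption; foxbrowser's actual matching rules may differ.

```typescript
// Hypothetical sketch of URL glob matching; foxbrowser's real rules may differ.
// Assumed semantics: "*" matches within one path segment, "**" matches anything.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**" crosses "/" boundaries
        i++; // skip the second "*"
      } else {
        pattern += "[^/]*"; // "*" stays inside one segment
      }
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

function matchesGlob(url: string, glob: string): boolean {
  return globToRegExp(glob).test(url);
}
```

For example, `matchesGlob("https://cdn.example.com/a.png", "**/*.png")` is true, while the same glob rejects `https://cdn.example.com/a.pngx` because the pattern is anchored at both ends.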
State and Persistence
| Tool | What it does | ~Tokens |
| -------------------- | -------------------------------------------------------------------------------------- | ------: |
| browser_save_state | Save cookies, localStorage, sessionStorage to named file. | ~10 |
| browser_load_state | Restore saved state. Optionally navigate to URL after loading. | ~10 |
| browser_diff | Pixel-by-pixel comparison. Returns diff %, pixel counts, visual overlay. | ~11K |
~Tokens = approximate tokens returned to the LLM per call.
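To illustrate the kind of computation behind browser_diff, here is a minimal sketch of a pixel-by-pixel comparison over two same-sized RGBA buffers. `diffPixels`, `DiffResult`, and the `threshold` parameter are hypothetical; the sketch returns counts and a percentage only and does not produce the visual overlay that the tool is described as returning.

```typescript
// Hypothetical sketch of a pixel diff over two same-sized RGBA buffers.
interface DiffResult {
  totalPixels: number;
  changedPixels: number;
  diffPercent: number;
}

function diffPixels(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  threshold: number = 0,
): DiffResult {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA and have identical dimensions");
  }
  const totalPixels = a.length / 4;
  let changedPixels = 0;
  for (let i = 0; i < a.length; i += 4) {
    // Largest per-channel difference for this pixel (R, G, B, A).
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      const delta = Math.abs((a[i + c] ?? 0) - (b[i + c] ?? 0));
      if (delta > maxDelta) maxDelta = delta;
    }
    // A pixel counts as changed only if some channel exceeds the threshold.
    if (maxDelta > threshold) changedPixels++;
  }
  const diffPercent =
    totalPixels === 0 ? 0 : (changedPixels / totalPixels) * 100;
  return { totalPixels, changedPixels, diffPercent };
}
```

A non-zero threshold is one way to tolerate anti-aliasing noise between two otherwise identical renders.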
Architecture
Protocol
foxbrowser uses WebDriver BiDi -- the W3C standard bidirectional protocol for browser automation. Unlike CDP (Chrome DevTools Protocol), BiDi is designed as an open standard with cross-browser support.
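As a rough illustration of the protocol, BiDi commands are JSON messages carrying an `id`, a `method`, and `params`, and the browser's response echoes the same `id`. The sketch below builds a `browsingContext.navigate` command; `makeCommand` and `BidiCommand` are hypothetical helper names, and `"context-1"` is a placeholder for a real browsing context id. It shows message framing only, not how foxbrowser itself issues commands.

```typescript
// Sketch of WebDriver BiDi command framing. Helper names are hypothetical;
// the message shape (id / method / params) follows the BiDi specification.
interface BidiCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

let nextCommandId = 0;

function makeCommand(
  method: string,
  params: Record<string, unknown>,
): BidiCommand {
  nextCommandId += 1; // ids let responses be paired with their commands
  return { id: nextCommandId, method, params };
}

// "context-1" is a placeholder; real ids come from browsingContext.getTree.
const navigateCommand = makeCommand("browsingContext.navigate", {
  context: "context-1",
  url: "https://example.com",
  wait: "complete", // resolve once the document has finished loading
});

// This string is what would be sent over the WebSocket connection.
const wireMessage = JSON.stringify(navigateCommand);
```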
┌──────────────────┐      WebDriver BiDi      ┌──────────────────┐
│   foxbrowser     │ ◄──────────────────────► │     Firefox      │
│   MCP Server     │        WebSocket         │  (BiDi endpoint) │
│                  │                          │                  │
│  - Tool handlers │                          │  - DOM access    │
│  - Event buffer  │                          │  - Input actions │
│  - Skill inject  │                          │  - Network       │
└────────┬─────────┘                          └──────────────────┘
         │
         │ MCP (stdio)
         ▼
┌────────────────┐
│   AI Agent     │
│ (Claude, etc)  │
└────────────────┘

Cost Optimization
browser_evaluate     ~10 tokens     JS expression
browser_snapshot     ~500 tokens    Accessibility tree
browser_screenshot   ~10K tokens    Visual (opt-in)
20x cost reduction vs screenshot-default tools

browser_screenshot without visual: true auto-returns a text snapshot. The LLM gets the page's text and structure at roughly 1/20th the token cost of an image.
| Scenario                       | Screenshot-default tool | foxbrowser       |
| ------------------------------ | ----------------------: | ---------------: |
| 50 interactions/day            | 500K tokens/day         | 25K tokens/day   |
| 20 devs x 22 working days      | 220M tokens/month       | 11M tokens/month |
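The figures in the table follow directly from the approximate per-call token counts. The short sketch below reproduces the arithmetic; the constant and function names are made up for illustration, and the inputs are the README's own rough estimates rather than measurements.

```typescript
// Reproduces the table's arithmetic from the README's approximate figures.
const TOKENS_PER_SNAPSHOT = 500; // ~tokens for a text snapshot
const TOKENS_PER_SCREENSHOT = 10000; // ~tokens for a visual screenshot

function tokensPerDay(interactions: number, tokensPerCall: number): number {
  return interactions * tokensPerCall;
}

function tokensPerMonth(
  daily: number,
  devs: number,
  workingDays: number,
): number {
  return daily * devs * workingDays;
}

const snapshotDaily = tokensPerDay(50, TOKENS_PER_SNAPSHOT); // 25,000
const screenshotDaily = tokensPerDay(50, TOKENS_PER_SCREENSHOT); // 500,000
const snapshotMonthly = tokensPerMonth(snapshotDaily, 20, 22); // 11,000,000
const screenshotMonthly = tokensPerMonth(screenshotDaily, 20, 22); // 220,000,000
const costRatio = screenshotDaily / snapshotDaily; // 20
```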
EventBuffer
Network requests and console messages are captured via server-side BiDi event listeners -- not browser-side JavaScript injection. This means:
- Captures survive page navigations (no re-injection needed)
- Bounded ring buffer (500 events) prevents memory leaks
- URL secrets are automatically redacted (JWT, Bearer tokens, auth headers)
- Static resources (images, fonts, stylesheets) can be filtered out
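The bounded ring buffer mentioned in the list above can be sketched as a fixed-size array in which new events overwrite the oldest once capacity is reached. `EventRing` is a hypothetical class written for illustration, not foxbrowser's actual EventBuffer; the capacity of 500 comes from the list, everything else is an assumption.

```typescript
// Hypothetical sketch of a bounded ring buffer; not foxbrowser's actual class.
class EventRing<T> {
  private readonly slots: (T | undefined)[] = [];
  private start = 0; // index of the oldest stored event
  private count = 0; // number of stored events

  constructor(capacity: number) {
    if (capacity <= 0 || Math.floor(capacity) !== capacity) {
      throw new Error("capacity must be a positive integer");
    }
    for (let i = 0; i < capacity; i++) {
      this.slots.push(undefined);
    }
  }

  push(item: T): void {
    const capacity = this.slots.length;
    // When full, this index equals `start`, so the oldest event is overwritten.
    const writeAt = (this.start + this.count) % capacity;
    this.slots[writeAt] = item;
    if (this.count < capacity) {
      this.count++;
    } else {
      this.start = (this.start + 1) % capacity;
    }
  }

  // Returns stored events from oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(this.start + i) % this.slots.length] as T);
    }
    return out;
  }

  get size(): number {
    return this.count;
  }
}

// Capacity taken from the README: at most 500 events are retained.
const networkEvents = new EventRing<string>(500);
networkEvents.push("GET https://example.com/");
```

Memory use stays constant no matter how long the session runs, at the price of losing the oldest events on a busy page.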
Auto-Upgrade
Session 1: server starts -> checks npm registry -> background upgrade
Session 2: starts with latest version

- 1-hour rate limit between checks
- npx: clears npm cache (next invocation fetches latest)
- global: npm install -g foxbrowser@latest in background
- dev mode: skipped
- Upgrade notice shown on browser_connect if newer version available
- All errors silently caught -- never crashes the server
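The gating logic behind the upgrade check can be sketched as two small pure functions: one enforcing the 1-hour rate limit, one comparing version strings. `shouldCheck` and `isNewer` are hypothetical names, and the comparison assumes plain `major.minor.patch` versions; how foxbrowser actually stores its last-check time or compares versions is not stated here.

```typescript
// Hypothetical sketch of the upgrade check's gating logic.
const CHECK_INTERVAL_MS = 60 * 60 * 1000; // 1-hour rate limit between checks

// `lastCheckMs` is null when no check has been recorded yet.
function shouldCheck(lastCheckMs: number | null, nowMs: number): boolean {
  return lastCheckMs === null || nowMs - lastCheckMs >= CHECK_INTERVAL_MS;
}

// Compares plain "major.minor.patch" versions; pre-release tags are ignored.
function isNewer(latest: string, current: string): boolean {
  const a = latest.split(".").map((part) => parseInt(part, 10) || 0);
  const b = current.split(".").map((part) => parseInt(part, 10) || 0);
  for (let i = 0; i < 3; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return false; // versions are equal
}
```

Comparing numerically per component matters: a string comparison would wrongly rank `1.9.9` above `1.10.0`.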
Skill Injection
On every browser_connect, foxbrowser injects a structured skill document into the agent context:
- Cost hierarchy -- guides the agent to prefer evaluate > snapshot > screenshot
- Workflow patterns -- snapshot-ref interaction model, when to re-snapshot
- Identity resolution -- use browser session cookies, never guess usernames
- Per-tool hints -- appended to each tool response (ref staling warnings, cross-origin limitations)
Diagnostics
foxbrowser doctor
Checks Firefox installation, Node.js version, BiDi connectivity, and platform configuration.
Security
What foxbrowser does
- Launches a Firefox instance with WebDriver BiDi enabled
- Returns only page content to the agent (DOM text, evaluate results, snapshots)
- Redacts secrets in network output (Authorization, Cookie, Set-Cookie, Bearer tokens, JWTs)
- Resets state gracefully when Firefox closes (MCP server stays alive)
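The redaction of secrets in network output could be approximated as follows. This is an illustrative sketch only: `redactText`, `redactHeaders`, the regular expressions, and the `[REDACTED]` placeholders are assumptions, and foxbrowser's actual patterns are not documented here.

```typescript
// Hypothetical sketch of secret redaction; the patterns are illustrative.
const SENSITIVE_HEADERS = ["authorization", "cookie", "set-cookie"];

// "Bearer <token>" in header values or free text.
const BEARER = /Bearer\s+[A-Za-z0-9._~+\/-]+=*/gi;
// Three base64url segments separated by dots; JWT headers start with "eyJ".
const JWT = /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*/g;

function redactText(text: string): string {
  return text
    .replace(BEARER, "Bearer [REDACTED]")
    .replace(JWT, "[REDACTED_JWT]");
}

function redactHeaders(
  headers: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name] ?? "";
    // Sensitive headers are blanked entirely; others are scanned for tokens.
    out[name] =
      SENSITIVE_HEADERS.indexOf(name.toLowerCase()) !== -1
        ? "[REDACTED]"
        : redactText(value);
  }
  return out;
}
```

Header names are compared case-insensitively because HTTP header names are not case-sensitive.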
What foxbrowser does NOT do
- Send cookie values to the LLM provider
- Store credentials in any config file
- Use a cloud relay or proxy
- Require you to enter passwords into the agent
- Modify your Firefox profile or existing sessions
Supported Platforms
| Platform        | Status |
| --------------- | ------ |
| Claude Code     | Y      |
| Cursor          | Y      |
| Gemini CLI      | Y      |
| VS Code Copilot | Y      |
| Windsurf        | Y      |
| Cline           | Y      |
| Zed             | Y      |
| Continue        | Y      |
| OpenCode        | Y      |
FAQ
Does the LLM see my cookies?
No. Cookie values are managed at the browser level via BiDi storage commands. The LLM only sees page content -- text, DOM elements, JavaScript evaluation results.

Why WebDriver BiDi?
WebDriver BiDi is the W3C standard for browser automation. It provides a standardized, cross-browser compatible protocol. Firefox has the most mature BiDi implementation among browsers.

What happens if Firefox closes?
The MCP server stays alive. On the next browser_connect, it launches a fresh Firefox instance.

Can I run Firefox headless?
Yes. Call browser_connect with headless: true. Note: some services may detect headless browsers.

Can the LLM see the content of pages I am logged into?
Yes -- the LLM sees the same content you would see in the browser. This is inherent to any browser automation tool. The key difference is that authentication credentials (cookies, tokens, session IDs) are never in the LLM context.
License
AGPL-3.0 -- free to use, modify, and distribute. If you modify and deploy as a network service, you must open-source your changes.
