@ulpi/browse

v2.4.0

Published

a month ago

Fast headless browser and native app automation CLI for AI coding agents. Android, iOS, and macOS.

ciprianspiridon

browser automation playwright headless cli ai-agent claude android ios macos mobile-testing accessibility

@ulpi/browse

Fast headless browser and native app automation CLI for AI coding agents. Persistent Chromium daemon via Playwright, ~100ms per command. Automate Android, iOS, and macOS apps through the same interface.

Installation

Global Installation (recommended)

npm install -g @ulpi/browse

Requires Node.js 18+. Chromium is installed automatically via Playwright on first npm install. If Bun is installed, browse automatically uses it for ~2x faster command execution.

Project Installation (local dependency)

npm install @ulpi/browse

Then use via package.json scripts or by invoking browse directly.

From Source

git clone https://github.com/ulpi-io/browse
cd browse
npm install
npx tsx src/cli.ts goto https://example.com   # Dev mode
npm run build                                  # Build bundle

Quick Start

browse goto https://example.com
browse snapshot -i                     # Get interactive elements with refs
browse click @e2                       # Click by ref from snapshot
browse fill @e3 "user@example.com"     # Fill input by ref
browse text                            # Get visible page text
browse screenshot page.png
browse stop

The Ref Workflow

Every snapshot assigns refs (@e1, @e2, ...) to elements. Use refs as selectors in any command — no CSS selector construction needed:

$ browse snapshot -i
@e1 [button] "Submit"
@e2 [link] "Home"
@e3 [textbox] "Email"

$ browse click @e1                     # Click the Submit button
Clicked @e1

$ browse fill @e3 "user@example.com"   # Fill the Email field
Filled @e3
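
An agent wrapper often needs to pick a ref by role and label rather than by position. The sketch below parses the terse `snapshot -i` lines into records and looks one up. The line format is inferred from the sample output above and is not a documented contract; `parseSnapshot` and `findRef` are hypothetical helper names, not part of browse.

```typescript
// Hypothetical helpers: the `@eN [role] "name"` line shape is an assumption
// taken from the sample output in this README.
interface SnapshotRef {
  ref: string;  // e.g. "@e1"
  role: string; // e.g. "button"
  name: string; // e.g. "Submit"
}

function parseSnapshot(output: string): SnapshotRef[] {
  const pattern = /^(@e\d+)\s+\[([^\]]+)\]\s+"(.*)"$/;
  const parsed: SnapshotRef[] = [];
  for (const line of output.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match) {
      const [, ref = "", role = "", name = ""] = match;
      parsed.push({ ref, role, name });
    }
  }
  return parsed;
}

// Returns the first ref whose role and accessible name both match.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  return refs.find((r) => r.role === role && r.name === name)?.ref;
}

const snapshotText = [
  '@e1 [button] "Submit"',
  '@e2 [link] "Home"',
  '@e3 [textbox] "Email"',
].join("\n");

const parsedRefs = parseSnapshot(snapshotText);
console.log(findRef(parsedRefs, "textbox", "Email")); // prints "@e3"
```

Refs are only meaningful for the snapshot that produced them, so a lookup like this should be repeated after each re-snapshot.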

Traditional Selectors (also supported)

browse click "#submit"
browse fill ".email-input" "user@example.com"
browse click "text=Submit"

Commands

Navigation

browse goto <url>              # Navigate to URL
browse back                    # Go back
browse forward                 # Go forward
browse reload                  # Reload page
browse url                     # Get current URL

Content Extraction

browse text                    # Visible text (clean, no DOM mutation)
browse html [sel]              # Full HTML or element innerHTML
browse links                   # All links as "text -> href"
browse forms                   # Form structure as JSON
browse accessibility           # Raw ARIA snapshot tree
browse schema                  # JSON-LD, Microdata, RDFa structured data
browse meta                    # Page meta tags (title, description, OG, canonical, hreflang)
browse headings                # H1-H6 heading hierarchy with counts

Interaction

browse click <sel>             # Click element
browse rightclick <sel>        # Right-click element (context menu)
browse dblclick <sel>          # Double-click element
browse fill <sel> <val>        # Clear and fill input
browse select <sel> <val>      # Select dropdown option
browse hover <sel>             # Hover element
browse focus <sel>             # Focus element
browse tap <sel>               # Tap element (requires touch context via emulate)
browse check <sel>             # Check checkbox
browse uncheck <sel>           # Uncheck checkbox
browse type <text>             # Type text via keyboard (current focus)
browse press <key>             # Press key (Enter, Tab, etc.)
browse keydown <key>           # Key down event
browse keyup <key>             # Key up event
browse keyboard inserttext <text> # Insert text without key events
browse scroll [sel|up|down]    # Scroll element into view or direction
browse scrollinto <sel>        # Scroll element into view (explicit)
browse swipe <dir> [px]        # Swipe up/down/left/right (touch events)
browse drag <src> <tgt>        # Drag and drop
browse highlight <sel>         # Highlight element with visual overlay
browse download <sel> [path]   # Download file triggered by click
browse upload <sel> <files...> # Upload files to input

Mouse Control

browse mouse move <x> <y>     # Move mouse to coordinates
browse mouse down [button]     # Press mouse button (left/right/middle)
browse mouse up [button]       # Release mouse button
browse mouse wheel <dy> [dx]   # Scroll wheel

Settings

browse set geo <lat> <lng>     # Set geolocation
browse set media <scheme>      # Set color scheme (dark/light/no-preference)

Wait

browse wait <selector>         # Wait for element
browse wait <selector> --state hidden  # Wait for element to disappear
browse wait <ms>               # Wait for milliseconds
browse wait --url <pattern>    # Wait for URL
browse wait --text "Welcome"   # Wait for text to appear in page
browse wait --fn "js expr"     # Wait for JavaScript condition
browse wait --load <state>     # Wait for load state (load/domcontentloaded/networkidle)
browse wait --network-idle     # Wait for network idle
browse wait --download [path]  # Wait for download to complete

Snapshot

browse snapshot                # Full accessibility tree
browse snapshot -i             # Interactive elements only (terse flat list)
browse snapshot -i -f          # Interactive elements, full indented tree
browse snapshot -i -C          # Include cursor-interactive elements (onclick, cursor:pointer)
browse snapshot -V             # Viewport only — elements visible on screen
browse snapshot -c             # Compact — remove empty structural elements
browse snapshot -d 3           # Limit depth to 3 levels
browse snapshot -s "#main"     # Scope to CSS selector
browse snapshot -i -c -d 5    # Combine options

| Flag | Description |
|------|-------------|
| -i | Interactive elements only (buttons, links, inputs); terse flat list |
| -f | Full: indented tree with props and children (use with -i) |
| -V | Viewport: only elements visible in current viewport |
| -c | Compact: remove empty structural elements |
| -C | Cursor-interactive: detect divs with cursor:pointer, onclick, tabindex |
| -d N | Limit tree depth |
| -s <sel> | Scope to CSS selector |

The -C flag catches modern SPA patterns that ARIA trees miss — <div onclick>, cursor: pointer, tabindex, and data-action elements.

Find Elements

browse find role <role> [name]                # By ARIA role
browse find text <text>                       # By text content
browse find label <label>                     # By label
browse find placeholder <placeholder>         # By placeholder
browse find testid <id>                       # By data-testid
browse find alt <text>                        # By alt text
browse find title <text>                      # By title attribute
browse find first <sel>                       # First matching element
browse find last <sel>                        # Last matching element
browse find nth <n> <sel>                     # Nth matching element (0-indexed)

Inspection

browse js <expr>               # Evaluate JavaScript expression
browse eval <file>             # Evaluate JavaScript file
browse css <sel> <prop>        # Get computed CSS property
browse attrs <sel>             # Get element attributes as JSON
browse element-state <sel>     # Element state (visible, enabled, checked, etc.)
browse value <sel>             # Get input/select value
browse count <sel>             # Count elements matching selector
browse box <sel>               # Get bounding box as JSON {x, y, width, height}
browse clipboard [write <text>] # Read or write clipboard
browse console [--clear]       # Console log buffer
browse errors [--clear]        # Page errors only (filtered from console)
browse network [--clear]       # Network request buffer
browse cookies                 # Browser cookies as JSON
browse storage [set <k> <v>]   # localStorage/sessionStorage
browse perf                    # Navigation timing (dns, ttfb, load)
browse devices [filter]        # List available device names
browse images [sel] [--inline] # List page images with src, alt, dimensions

Visual

browse screenshot [path]              # Take screenshot (viewport)
browse screenshot --full [path]       # Full-page screenshot
browse screenshot <sel|@ref> [path]   # Screenshot specific element
browse screenshot --clip x,y,w,h [path] # Screenshot clipped region
browse screenshot --annotate [path]   # Annotated screenshot with numbered labels
browse pdf [path]                     # Save page as PDF
browse responsive [prefix]            # Mobile/tablet/desktop screenshots

Compare

browse diff <url1> <url2>                  # Text diff between two pages
browse snapshot-diff                        # Diff current vs last snapshot
browse screenshot-diff <baseline> [current] # Pixel-level visual diff

Tabs

browse tabs                    # List all tabs
browse tab <id>                # Switch to tab
browse newtab [url]            # Open new tab
browse closetab [id]           # Close tab

Frames

browse frame <sel>             # Switch to iframe
browse frame main              # Back to main frame

Device Emulation

browse emulate "iPhone 14"     # Emulate device
browse emulate reset           # Reset to desktop (1920x1080)
browse devices                 # List all available devices
browse devices iphone          # Filter device list
browse viewport 1280x720       # Set viewport size

100+ devices: iPhone 12–17, Pixel 5–7, iPad, Galaxy, and all Playwright built-ins.

Cookies

browse cookie <name>=<value>                        # Set cookie (simple)
browse cookie set <n> <v> [--domain --secure ...]   # Set cookie with options
browse cookie clear                                 # Clear all cookies
browse cookie export <file>                         # Export cookies to JSON
browse cookie import <file>                         # Import cookies from JSON
browse cookies                                      # Read all cookies

Network

browse route <pattern> block                # Block matching requests
browse route <pattern> fulfill <status> [body] # Mock response
browse route clear                          # Remove all routes
browse offline [on|off]                     # Toggle offline mode
browse header <name>:<value>                # Set extra HTTP header
browse useragent <string>                   # Set user agent

Dialogs

browse dialog                  # Last dialog info
browse dialog-accept [text]    # Accept next dialog (optional prompt text)
browse dialog-dismiss          # Dismiss next dialog

Recording

browse har start               # Start HAR recording
browse har stop [path]         # Stop and save HAR file

browse video start [dir]       # Start video recording (WebM)
browse video stop              # Stop recording
browse video status            # Check recording status

browse record start            # Record browsing commands as you go
browse record stop             # Stop recording
browse record status           # Check recording status
browse record export browse [path]      # Export as chain-compatible JSON (replay with browse chain)
browse record export flow [path]       # Export as YAML flow (replay with browse flow)
browse record export replay [path]     # Export as Chrome DevTools Recorder (browser only)
browse record export playwright [path] # Export as Playwright Test (browser only)
browse record export replay --selectors css,aria [path]  # Filter selector types in export

React DevTools

browse react-devtools enable           # Enable (downloads hook on first use)
browse react-devtools tree             # Component tree
browse react-devtools props <sel>      # Props/state of component
browse react-devtools suspense         # Suspense boundary status
browse react-devtools errors           # Error boundaries
browse react-devtools profiler         # Render timing
browse react-devtools hydration        # Hydration timing
browse react-devtools renders          # What re-rendered
browse react-devtools owners <sel>     # Parent component chain
browse react-devtools context <sel>    # Context values
browse react-devtools disable          # Disable

Performance Audit

browse perf-audit [url]                  # Full performance audit with actionable report
browse perf-audit [url] --no-coverage    # Skip JS/CSS coverage (faster)
browse perf-audit [url] --no-detect      # Skip stack detection
browse perf-audit [url] --json           # Structured JSON output
browse perf-audit save [name]            # Save audit report for later comparison
browse perf-audit compare <base> [curr]  # Compare saved baseline vs current or saved audit
browse perf-audit list                   # List saved audit reports
browse perf-audit delete <name>          # Delete a saved audit
browse detect                            # Tech stack fingerprint (frameworks, SaaS, CDN, infra)
browse coverage start                    # Start JS/CSS code coverage collection
browse coverage stop                     # Stop and report per-file used/unused bytes
browse initscript set <code>             # Inject JS before every page load
browse initscript show                   # Show current init script
browse initscript clear                  # Remove init script

perf-audit runs a complete performance analysis in one command:

Core Web Vitals — LCP, CLS, TBT, FCP, TTFB, INP with Google's good/needs-improvement/poor thresholds
LCP Analysis — identifies the LCP element, its network entry, render-blocking chain, and critical path
Layout Shift Attribution — each shift traced to font swap, missing image dimensions, or dynamic content
Long Task Attribution — maps blocking JS to source scripts and domains with per-domain TBT
Resource Breakdown — JS/CSS/images/fonts/API categorized with sizes and largest files
Render-Blocking Detection — sync scripts and blocking stylesheets in <head>
Image Audit — format (JPEG vs WebP), missing dimensions, missing lazy-load, missing fetchpriority, oversized images, srcset usage
Font Audit — per-font font-display value, preload status, FOIT/FOUT risk
DOM Complexity — node count, max depth, largest subtree (flags >1,500 and >3,000 thresholds)
Stack Detection — 108 frameworks (React, Vue, Angular, Next.js, Nuxt, Laravel, WordPress, Magento, etc.), 55 SaaS platforms (Shopify, Wix, Squarespace, etc.), CDN, protocol, compression, caching
Third-Party Impact — per-domain inventory with size, request count, and category (analytics/ads/social/chat/monitoring)
Coverage — per-file JS/CSS used vs unused bytes
Correlation Engine — connects LCP to blocking CSS, Long Tasks to scripts, CLS to font swaps, fonts to FCP blocking
Recommendations — prioritized, data-driven action items (platform-specific when SaaS detected)

$ browse perf-audit https://example.com --no-coverage

Core Web Vitals:
  TTFB         580ms    good
  FCP          696ms    good
  LCP          696ms    good
  CLS          0.015    good
  TBT          599ms    needs improvement

LCP Analysis:
  Element:        <img src='hero.webp'>
  Critical path:  TTFB(580ms) -> CSS(styles.css) -> JS(vendor.js) -> Image(hero.webp) -> LCP(696ms)

DOM Complexity:
  Total nodes:    4,476
  WARNING: exceeds 3,000 threshold (poor)

Top Recommendations:
  1. Add fetchpriority="high" to LCP image
  2. Add font-display:swap to fallback fonts (FOIT risk)
  3. Lazy-load YouTube embeds (click-to-play facade)

Audit completed in 13.2s (reload: 10.0s, settle: 3.0s, collect: 41ms, detection: 75ms)
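
The good/needs-improvement/poor labels in the sample report can be reproduced with a small lookup. This is a sketch that assumes the publicly documented Google thresholds (and Lighthouse's TBT bands); the exact cutoffs and boundary handling inside perf-audit are not stated in this README, so treat the table as an assumption.

```typescript
type Rating = "good" | "needs improvement" | "poor";

// [upper bound for "good", upper bound for "needs improvement"].
// Milliseconds for every metric except CLS, which is unitless.
// Assumed values: public Core Web Vitals guidance plus Lighthouse TBT bands.
const THRESHOLDS: Record<string, [number, number]> = {
  ttfb: [800, 1800],
  fcp: [1800, 3000],
  lcp: [2500, 4000],
  inp: [200, 500],
  cls: [0.1, 0.25],
  tbt: [200, 600],
};

function rate(metric: string, value: number): Rating {
  const bounds = THRESHOLDS[metric];
  if (!bounds) throw new Error(`unknown metric: ${metric}`);
  const [goodMax, needsImprovementMax] = bounds;
  if (value <= goodMax) return "good";
  return value <= needsImprovementMax ? "needs improvement" : "poor";
}

// Values taken from the sample report above.
const reportValues: Array<[string, number]> = [
  ["ttfb", 580],
  ["fcp", 696],
  ["lcp", 696],
  ["cls", 0.015],
  ["tbt", 599],
];
for (const [metric, value] of reportValues) {
  console.log(metric, rate(metric, value));
}
// prints: ttfb good, fcp good, lcp good, cls good, tbt needs improvement
```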

detect gives a quick stack fingerprint without the full audit:

$ browse detect

Stack:
  meta-framework     Next.js (production), router: app, rsc: true
  css-framework      Tailwind CSS
  build-tool         Turbopack

Infrastructure:
  CDN:          Amazon CloudFront
  Protocol:     h2 (64%)
  Cache rate:   74% (134/180)
  DNS origins:  24 unique (15 missing preconnect)
  DOM:          4,476 nodes, depth 23

Third-Party (4.4MB total):
  www.youtube.com                3.0MB   45 reqs   video
  www.googletagmanager.com       331KB    3 reqs   analytics
  connect.facebook.net           214KB    2 reqs   ads

Handoff (Human Takeover)

browse handoff [reason]        # Swap to Chrome for CAPTCHA/MFA/OAuth (falls back to Chromium)
browse handoff --chromium      # Force Playwright Chromium instead of Chrome
browse resume                  # Swap back to headless, returns fresh snapshot

Handoff defaults to your system Chrome, which bypasses Turnstile and bot detection, and falls back to Playwright Chromium if Chrome is not installed. The agent asks permission first via AskUserQuestion, then hands off. The server auto-suggests a handoff after 3 consecutive failures.

Cloud Providers

browse provider save browserbase <api-key>     # Save API key (encrypted)
browse provider save browserless <token>       # Save token (encrypted)
browse --provider browserbase goto https://...  # Use cloud browser
browse provider list                           # List saved providers
browse provider delete <name>                  # Remove saved key

API keys are encrypted at rest in .browse/providers/ — never visible to agents.

State & Auth

browse state save [name]       # Save cookies + localStorage
browse state load [name]       # Restore saved state
browse state list              # List saved states
browse state show [name]       # Show state details

browse auth save <name> <url> <user> <pass>  # Save encrypted credential
browse auth save <name> <url> <user> --password-stdin  # Password from stdin
browse auth login <name>       # Auto-login with saved credential
browse auth list               # List saved credentials
browse auth delete <name>      # Delete credential

browse cookie-import --list                            # List browsers with cookies
browse cookie-import chrome [--domain .example.com]    # Import cookies from Chrome
browse cookie-import chrome --profile "Profile 1"      # Specific browser profile

Multi-Step (Chaining)

Execute a sequence of commands in one call:

echo '[["goto","https://example.com"],["snapshot","-i"],["text"]]' | browse chain
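
When the step list is generated by a program rather than typed by hand, building the payload with `JSON.stringify` avoids quoting mistakes. The sketch below only produces the JSON string shown above; `ChainStep`, `buildChain`, and `shellQuote` are hypothetical names, and the payload shape (an array of `[command, ...args]` arrays) is taken from the one example in this README.

```typescript
// One chain step: a command name followed by its string arguments.
type ChainStep = [command: string, ...args: string[]];

function buildChain(steps: ChainStep[]): string {
  return JSON.stringify(steps);
}

// POSIX single-quote escaping, so the payload survives `echo '...' | browse chain`.
function shellQuote(text: string): string {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

const chainPayload = buildChain([
  ["goto", "https://example.com"],
  ["snapshot", "-i"],
  ["text"],
]);

console.log(`echo ${shellQuote(chainPayload)} | browse chain`);
// prints: echo '[["goto","https://example.com"],["snapshot","-i"],["text"]]' | browse chain
```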

Server Control

browse status                  # Server health report
browse instances               # List all running browse servers
browse version                 # Print CLI version
browse doctor                  # System check (Node, Playwright, Chromium)
browse upgrade                 # Self-update via npm
browse stop                    # Stop server
browse restart                 # Restart server
browse inspect                 # Open DevTools (requires BROWSE_DEBUG_PORT)

Setup

browse install-skill [path]    # Install Claude Code skill

Sessions

Run multiple AI agents in parallel, each with isolated browser state, sharing one Chromium process:

# Agent A
browse --session agent-a goto https://site-a.com
browse --session agent-a snapshot -i
browse --session agent-a click @e3

# Agent B (simultaneously)
browse --session agent-b goto https://site-b.com
browse --session agent-b snapshot -i
browse --session agent-b fill @e2 "query"

# Or set once via env var
export BROWSE_SESSION=agent-a
browse text

Each session has its own:

Browser context (cookies, storage, cache)
Tabs and navigation history
Refs from snapshots
Console and network buffers

browse sessions                # List active sessions
browse session-close agent-a   # Close a session
browse status                  # Shows total session count

Sessions auto-close after the idle timeout (default 30 min). Without --session, everything runs in a "default" session.

For full process isolation (separate Chromium instances), use BROWSE_PORT to run independent servers.

Profiles vs Sessions

| | --session | --profile |
|---|---|---|
| Chromium | Shared (one process) | Own (one per profile) |
| Memory | ~5MB per session | ~200MB per profile |
| State | Ephemeral (auto-persisted cookies) | Full persistence (cookies, cache, IndexedDB) |
| Multiplexing | Yes (parallel agents) | No (one agent per profile) |
| Use case | Parallel browsing, lightweight | Real login state, heavy |

Native App Automation

Automate Android, iOS, and macOS apps through the same CLI and ref workflow:

Enable Platforms

browse enable android    # Installs adb, JDK, Android SDK, emulator, system image, driver APK
browse enable ios        # Builds iOS runner (requires Xcode)
browse enable macos      # Builds macOS AX bridge (requires Xcode CLI tools)
browse enable all        # Enable all platforms

Each enable command installs all dependencies and builds the native driver for that platform. Run once — everything is cached for future use.

Simulator/Emulator Lifecycle

browse sim start --platform ios --app com.apple.Preferences --visible
browse sim start --platform android --app com.android.settings --visible
browse sim stop --platform ios
browse sim stop --platform android
browse sim status --platform ios
browse sim status --platform android

sim start boots the simulator/emulator, launches the target app, and starts the automation driver
--visible opens the simulator/emulator window (default: headless)
Switching --app on a running simulator reconfigures the target without rebooting
Auto-install: Android sim start automatically installs adb, Java, Android SDK, system image, and emulator via Homebrew if missing

Android

browse sim start --platform android --app com.android.settings --visible
browse --platform android --app com.android.settings snapshot -i
browse --platform android --app com.android.settings tap @e3
browse --platform android --app com.android.settings swipe up
browse --platform android --app com.android.settings press back
browse --platform android --app com.android.settings text
browse --platform android --app com.android.settings screenshot app.png
browse sim stop --platform android

Auto-installs the full toolchain on first use (adb, JDK 21, Android SDK, emulator, system image, AVD). No manual setup required.

browse doctor --platform android    # Check setup

iOS

browse sim start --platform ios --app com.apple.Preferences --visible
browse --platform ios --app com.apple.Preferences snapshot -i
browse --platform ios --app com.apple.Preferences tap @e2
browse --platform ios --app com.apple.Preferences swipe up
browse --platform ios --app com.apple.Preferences press home
browse --platform ios --app com.apple.mobilesafari snapshot -i  # Switch app
browse sim stop --platform ios

Requires: Xcode. Simulator boots automatically.

macOS

browse --app "System Settings" snapshot -i
browse --app "System Settings" tap @e5
browse --app "System Settings" swipe up
browse --app TextEdit type "Hello"
browse --app TextEdit press "cmd+n"   # Modifier combos supported

Requires: macOS, Accessibility permission granted to the terminal.

Platform Flags

| Flag | Description |
|------|-------------|
| --platform android\|ios\|macos | Target platform (default: browser) |
| --app <name> | App package name (Android), bundle ID (iOS), or process name (macOS) |
| --device <serial> | Device serial (Android), simulator UDID (iOS) |
| --visible | Show simulator/emulator window (default: headless) |

Unified Command Surface

All platforms support the same commands: snapshot, text, tap, fill, type, press, swipe, screenshot. The @ref workflow is identical — snapshot -i assigns refs, then tap @e1, fill @e2 "text", etc. Commands requiring browser capabilities (navigation, tabs, JavaScript) are blocked with clear errors on app targets.
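
The gating described above can be pictured as a check that runs before a command is dispatched. This is only an illustrative sketch: the `BROWSER_ONLY` set is a non-exhaustive guess drawn from the navigation, tabs, and JavaScript command groups in this README, and the error wording and the `gate` function are hypothetical, not browse's actual implementation.

```typescript
type Target = "browser" | "android" | "ios" | "macos";

// Assumed, non-exhaustive list of commands that need browser capabilities.
const BROWSER_ONLY = new Set<string>([
  "goto", "back", "forward", "reload", // navigation
  "tabs", "tab", "newtab", "closetab", // tabs
  "js", "eval",                        // JavaScript
]);

// Returns an error message when the command is blocked, or null when it may proceed.
function gate(target: Target, command: string): string | null {
  if (target !== "browser" && BROWSER_ONLY.has(command)) {
    return `'${command}' requires a browser target (current target: ${target})`;
  }
  return null;
}

console.log(gate("ios", "goto"));         // prints the error message
console.log(gate("android", "snapshot")); // prints null
```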

Workflow Commands

Flows

browse flow run.yaml                  # Execute YAML automation script
browse flow save login-flow           # Save current recording as named flow
browse flow run login-flow            # Execute saved flow
browse flow list                      # List saved flows
browse retry                          # Retry command with backoff (browser only)
browse watch                          # Watch DOM element for changes (browser only)

Flows work on all platforms (browser, Android, iOS, macOS). Each flow step goes through the executeCommand() pipeline — capability-gated per target. Recording captures individual flow steps, not the flow wrapper.

Browser-only workflow commands: retry, watch, har start/stop, video start/stop
Browser-only export formats: record export replay, record export playwright

Assertions

browse expect "text 'Welcome'"        # Assert text exists on page
browse expect "count .item > 3"       # Assert element count
browse expect "url contains /dashboard" # Assert URL
browse expect "title 'My App'"        # Assert page title

SDK Mode

import { createBrowser } from '@ulpi/browse/sdk';

const browser = await createBrowser();
await browser.goto('https://example.com');
const text = await browser.text();
await browser.close();

Custom Extensibility

Extend browse with project-local JSON/YAML configuration in .browse/:

Custom Audit Rules

.browse/rules/my-rules.json:

[
  { "kind": "metric-threshold", "metric": "lcp", "max": 2000, "severity": "critical" },
  { "kind": "selector-count", "selector": "img:not([alt])", "max": 0, "severity": "warning" }
]
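
To make the two rule kinds concrete, here is a sketch of how such rules could be evaluated against collected page data. It assumes `max` is an inclusive upper bound and that only the two `kind` values and two `severity` values shown above exist; the `PageFacts` shape and `evaluateRules` function are hypothetical and are not browse's internal API.

```typescript
type Severity = "critical" | "warning";

type AuditRule =
  | { kind: "metric-threshold"; metric: string; max: number; severity: Severity }
  | { kind: "selector-count"; selector: string; max: number; severity: Severity };

// Hypothetical container for values an audit run might have collected.
interface PageFacts {
  metrics: Record<string, number>;        // e.g. { lcp: 2400 } in milliseconds
  selectorCounts: Record<string, number>; // e.g. { "img:not([alt])": 2 }
}

function evaluateRules(rules: AuditRule[], facts: PageFacts): string[] {
  const failures: string[] = [];
  for (const rule of rules) {
    const subject = rule.kind === "metric-threshold" ? rule.metric : rule.selector;
    const actual =
      rule.kind === "metric-threshold"
        ? facts.metrics[rule.metric]
        : facts.selectorCounts[rule.selector];
    // A value that was never collected is skipped rather than treated as a failure.
    if (actual !== undefined && actual > rule.max) {
      failures.push(`${rule.severity}: ${subject} = ${actual} (max ${rule.max})`);
    }
  }
  return failures;
}

const sampleRules: AuditRule[] = [
  { kind: "metric-threshold", metric: "lcp", max: 2000, severity: "critical" },
  { kind: "selector-count", selector: "img:not([alt])", max: 0, severity: "warning" },
];

console.log(
  evaluateRules(sampleRules, { metrics: { lcp: 2400 }, selectorCounts: { "img:not([alt])": 0 } }),
);
// prints [ 'critical: lcp = 2400 (max 2000)' ]
```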

Custom Detection Signatures

.browse/detections/my-framework.json:

[
  { "name": "MyFramework", "detect": "!!window.__MY_FRAMEWORK__", "versionExpr": "window.__MY_FRAMEWORK__.version", "category": "custom" }
]

Project Config

browse.json:

{
  "detectionPaths": [".browse/detections"],
  "rulePaths": [".browse/rules"],
  "flowPaths": [".browse/flows"],
  "startupFlows": ["setup.yaml"]
}

Security

All security features are opt-in — existing workflows are unaffected until you explicitly enable a feature.

Domain Allowlist

Restrict navigation and sub-resource requests to trusted domains:

browse --allowed-domains "example.com,*.example.com" goto https://example.com
# Or via env var
BROWSE_ALLOWED_DOMAINS="example.com,*.api.io" browse goto https://example.com

Blocks HTTP requests, WebSocket, EventSource, and sendBeacon to non-allowed domains. Wildcards like *.example.com match the bare domain and all subdomains.
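
The wildcard rule is easy to get subtly wrong (for example, letting `notexample.com` match `*.example.com`), so a sketch of the matching logic may help. It follows the behavior described above for wildcards; treating a non-wildcard entry as an exact hostname match is an assumption, and `matchesPattern` and `isAllowed` are hypothetical names.

```typescript
// "*.example.com" matches "example.com" and any subdomain of it.
// A plain entry is assumed to match that exact hostname only.
function matchesPattern(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.trim().toLowerCase();
  if (pat.startsWith("*.")) {
    const base = pat.slice(2);
    return host === base || host.endsWith("." + base);
  }
  return host === pat;
}

// `allowedDomains` uses the same comma-separated form as --allowed-domains.
function isAllowed(url: string, allowedDomains: string): boolean {
  const hostname = new URL(url).hostname;
  return allowedDomains.split(",").some((pattern) => matchesPattern(hostname, pattern));
}

console.log(isAllowed("https://example.com/login", "*.example.com"));  // prints true
console.log(isAllowed("https://cdn.example.com/a.js", "*.example.com")); // prints true
console.log(isAllowed("https://notexample.com/", "*.example.com"));    // prints false
```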

Action Policy

Gate commands with a browse-policy.json file:

{ "default": "allow", "deny": ["js", "eval"], "confirm": ["goto"] }

Precedence: deny > confirm > allow > default. Hot-reloads on file change — no server restart needed.
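
The precedence order can be read as a chain of checks, most restrictive first. The sketch below mirrors that order for the policy shown above; the optional `allow` list is inferred from the precedence line rather than from a documented schema, and `decide` is a hypothetical function, not browse's implementation.

```typescript
type Decision = "allow" | "confirm" | "deny";

interface ActionPolicy {
  default: Decision;
  allow?: string[]; // assumed key, implied by "deny > confirm > allow > default"
  confirm?: string[];
  deny?: string[];
}

// Checks lists in precedence order: deny, then confirm, then allow, then default.
function decide(policy: ActionPolicy, command: string): Decision {
  if (policy.deny?.includes(command)) return "deny";
  if (policy.confirm?.includes(command)) return "confirm";
  if (policy.allow?.includes(command)) return "allow";
  return policy.default;
}

const examplePolicy: ActionPolicy = { default: "allow", deny: ["js", "eval"], confirm: ["goto"] };

console.log(decide(examplePolicy, "js"));    // prints "deny"
console.log(decide(examplePolicy, "goto"));  // prints "confirm"
console.log(decide(examplePolicy, "click")); // prints "allow"
```

A command listed in both `deny` and `confirm` resolves to deny under this ordering.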

Credential Vault

Encrypted credential storage (AES-256-GCM). The LLM never sees passwords:

echo "mypassword" | browse auth save github https://github.com/login myuser --password-stdin
browse auth login github          # Auto-navigates, detects form, fills + submits
browse auth list                  # List saved credentials (no passwords shown)

Key is auto-generated at .browse/.encryption-key or set via BROWSE_ENCRYPTION_KEY.

Content Boundaries

Wrap page output in CSPRNG nonce-delimited markers so LLMs can distinguish tool output from untrusted page content:

browse --content-boundaries text

JSON Output

Machine-readable output for agent frameworks:

browse --json snapshot -i
# Returns: {"success": true, "data": "...", "command": "snapshot"}
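
A caller consuming this output will usually want to validate the envelope before trusting it. The sketch below checks only the three fields shown above; the type of `data` for other commands and the shape of failure responses are not documented here, so `data` is kept as `unknown`. `BrowseResult` and `parseResult` are hypothetical names.

```typescript
interface BrowseResult {
  success: boolean;
  data: unknown; // a string in the example above; other shapes are not documented here
  command: string;
}

function parseResult(raw: string): BrowseResult {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  const success = record["success"];
  const command = record["command"];
  if (typeof success !== "boolean" || typeof command !== "string") {
    throw new Error("missing success or command field");
  }
  return { success, data: record["data"], command };
}

const envelope = parseResult('{"success": true, "data": "...", "command": "snapshot"}');
console.log(envelope.command, envelope.success); // prints: snapshot true
```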

Configuration

Create a browse.json file at your project root to set persistent defaults:

{
  "session": "my-agent",
  "json": true,
  "contentBoundaries": true,
  "allowedDomains": ["example.com", "*.api.io"],
  "idleTimeout": 3600000,
  "viewport": "1280x720",
  "device": "iPhone 14",
  "runtime": "playwright",
  "detectionPaths": [".browse/detections"],
  "rulePaths": [".browse/rules"],
  "flowPaths": [".browse/flows"],
  "startupFlows": ["setup.yaml"]
}

CLI flags and environment variables override config file values.
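
The override behavior amounts to layering settings from several sources. The sketch below shows one way to picture it for three of the settings, using the environment variable names from the table in this README. The relative order of CLI flags versus environment variables is not stated above; this sketch assumes CLI flags win, and `Settings`, `fromEnv`, and `layerSettings` are hypothetical names.

```typescript
interface Settings {
  session?: string;
  json?: boolean;
  idleTimeout?: number; // milliseconds
}

// Maps documented environment variables onto settings; unset variables are ignored.
function fromEnv(env: Record<string, string | undefined>): Settings {
  const settings: Settings = {};
  const session = env["BROWSE_SESSION"];
  if (session) settings.session = session;
  if (env["BROWSE_JSON"] === "1") settings.json = true;
  const rawTimeout = env["BROWSE_IDLE_TIMEOUT"];
  if (rawTimeout !== undefined && rawTimeout !== "" && Number.isFinite(Number(rawTimeout))) {
    settings.idleTimeout = Number(rawTimeout);
  }
  return settings;
}

// Later layers override earlier ones, but only for values they actually define.
function layerSettings(...layers: Settings[]): Settings {
  const result: Settings = {};
  for (const layer of layers) {
    if (layer.session !== undefined) result.session = layer.session;
    if (layer.json !== undefined) result.json = layer.json;
    if (layer.idleTimeout !== undefined) result.idleTimeout = layer.idleTimeout;
  }
  return result;
}

const fileConfig: Settings = { session: "my-agent", json: true, idleTimeout: 3600000 };
const effective = layerSettings(
  fileConfig,                             // browse.json
  fromEnv({ BROWSE_SESSION: "agent-a" }), // environment
  { json: false },                        // CLI flags (assumed highest priority)
);
console.log(effective);
// prints { session: 'agent-a', json: false, idleTimeout: 3600000 }
```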

Usage with AI Agents

Claude Code (recommended)

Install as a Claude Code skill via skills.sh:

npx skills add https://github.com/ulpi-io/skills --skill browse

Or install directly:

browse install-skill

Both copy the skill definition to .claude/skills/browse/SKILL.md and add all browse commands to permissions — no more approval prompts.

CLAUDE.md / AGENTS.md

Add to your project instructions:

## Browser Automation

Use `browse` for web automation. Run `browse --help` for all commands.

Core workflow:
1. `browse goto <url>` — Navigate to page
2. `browse snapshot -i` — Get interactive elements with refs (@e1, @e2)
3. `browse click @e1` / `fill @e2 "text"` — Interact using refs
4. Re-snapshot after page changes

Just ask the agent

Use browse to test the login flow. Run browse --help to see available commands.

MCP Server Mode

Run browse as an MCP server for editors that support the Model Context Protocol. All CLI commands are available as MCP tools — browser automation, app automation, perf-audit, detect, coverage, flows, and more.

browse --mcp

Use --json alongside --mcp for structured responses ({success, data, command}).

Note: Requires npm install @modelcontextprotocol/sdk alongside browse.

Cursor

.cursor/mcp.json:

{
  "mcpServers": {
    "browse": {
      "command": "browse",
      "args": ["--mcp"]
    }
  }
}

Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "browse": {
      "command": "browse",
      "args": ["--mcp"]
    }
  }
}

Windsurf

{
  "mcpServers": {
    "browse": {
      "command": "browse",
      "args": ["--mcp"]
    }
  }
}

Options

| Flag | Description |
|------|-------------|
| --session <id> | Named session (isolates tabs, refs, cookies) |
| --profile <name> | Persistent browser profile (own Chromium, full state) |
| --context [state\|delta\|full] | Action context: state = page changes, delta = ARIA diff with refs, full = complete snapshot with refs |
| --json | Wrap output as {success, data, command} |
| --content-boundaries | Wrap page content in nonce-delimited markers |
| --allowed-domains <d,d> | Block navigation/resources outside allowlist |
| --max-output <n> | Truncate output to N characters |
| --headed | Show browser window (not headless) |
| --chrome | Shortcut for --runtime chrome --headed |
| --cdp <port> | Connect to Chrome on a specific debugging port |
| --connect | Auto-discover and connect to a running Chrome instance |
| --provider <name> | Cloud browser provider (browserless, browserbase) |
| --runtime <name> | Browser runtime: playwright (default), rebrowser (stealth), lightpanda, camoufox (anti-detection Firefox), chrome |
| --camoufox-profile <name> | Named camoufox profile from .browse/camoufox-profiles/<name>.json (server-spawn-only) |

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| BROWSE_PORT | auto (9400–10400) | Fixed server port |
| BROWSE_PORT_START | 9400 | Start of port scan range |
| BROWSE_SESSION | (none) | Default session ID for all commands |
| BROWSE_INSTANCE | auto (PPID) | Instance ID for multi-agent isolation |
| BROWSE_IDLE_TIMEOUT | 1800000 (30m) | Idle auto-shutdown in ms |
| BROWSE_TIMEOUT | (none) | Override all command timeouts (ms) |
| BROWSE_LOCAL_DIR | .browse/ or /tmp | State/log/screenshot directory |
| BROWSE_JSON | (none) | Set to 1 for JSON output mode |
| BROWSE_CONTEXT | (none) | Set to 1/state/delta/full for action context levels |
| BROWSE_CONTENT_BOUNDARIES | (none) | Set to 1 for nonce-delimited output |
| BROWSE_ALLOWED_DOMAINS | (none) | Comma-separated domain allowlist |
| BROWSE_MAX_OUTPUT | (none) | Truncate output to N characters |
| BROWSE_HEADED | (none) | Set to 1 for headed browser mode |
| BROWSE_CDP_URL | (none) | Connect to remote Chrome via CDP |
| BROWSE_PROXY | (none) | Proxy server URL |
| BROWSE_PROXY_BYPASS | (none) | Proxy bypass list |
| BROWSE_SERVER_SCRIPT | auto-detected | Override path to server.ts |
| BROWSE_DEBUG_PORT | (none) | Port for DevTools debugging |
| BROWSE_POLICY | browse-policy.json | Path to action policy file |
| BROWSE_CONFIRM_ACTIONS | (none) | Commands requiring confirmation |
| BROWSE_ENCRYPTION_KEY | auto-generated | 64-char hex AES key for credential vault |
| BROWSE_AUTH_PASSWORD | (none) | Password for auth save (alt to --password-stdin) |
| BROWSE_RUNTIME | playwright | Browser runtime (playwright, rebrowser, lightpanda, camoufox, chrome) |
| BROWSE_CAMOUFOX_PROFILE | (none) | Named camoufox profile from .browse/camoufox-profiles/ |
| BROWSE_CHROME | (none) | Set to 1 to use system Chrome |
| BROWSE_CHROME_PATH | auto-detected | Override Chrome executable path |

Architecture

browse [--session <id>] [--platform <p>] [--app <name>] <command>
          |
    CLI (thin HTTP client)
          |
    Persistent server (localhost, auto-started)
          |
    SessionManager + CommandRegistry + executeCommand()
    ├── Browser sessions:
    │   ├── "default"  → BrowserManager → Chromium (Playwright)
    │   ├── "agent-a"  → BrowserManager → Chromium (shared)
    │   └── "agent-b"  → BrowserManager → Chromium (shared)
    ├── App sessions:
    │   ├── "app:com.example"     → AndroidAppManager → adb → device driver
    │   ├── "app:com.example.ios" → IOSAppManager     → simctl → Simulator
    │   └── "app:Safari"          → AppManager         → browse-ax → macOS AX
    └── All targets implement AutomationTarget interface

First command: ~2s (server + Chromium startup, once)
Every command after: ~100–200ms (HTTP to localhost)
Server auto-starts on first command, auto-shuts down after 30 min idle
Crash recovery: CLI detects dead server and restarts transparently
State file: .browse/browse-server.json (pid, port, token)

Benchmarks

vs Agent Browser & Browser-Use (Token Cost)

Tested on 3 sites across multi-step browsing flows — navigate, snapshot, scroll, search, extract text:

| Tool | Total Tokens | Total Time | Context Used (200K) |
|------|-------------:|-----------:|--------------------:|
| browse | 14,134 | 28.5s | 7.1% |
| agent-browser | 39,414 | 36.2s | 19.7% |
| browser-use | 34,281 | 72.7s | 17.1% |

browse uses 2.4x fewer tokens than browser-use and 2.8x fewer than agent-browser, and completes about 2.6x faster than browser-use (28.5s vs 72.7s).
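
The ratios and context percentages follow directly from the table. As a quick check, the sketch below recomputes them from the published totals, using only the numbers above and the stated 200K context window.

```typescript
const CONTEXT_WINDOW = 200000; // tokens, as stated in the table header

const browseRun = { tokens: 14134, seconds: 28.5 };
const agentBrowserRun = { tokens: 39414, seconds: 36.2 };
const browserUseRun = { tokens: 34281, seconds: 72.7 };

// How many times larger `a` is than `b`, to two decimals.
const times = (a: number, b: number): string => (a / b).toFixed(2);

// Share of the context window consumed, as a percentage with one decimal.
const contextPct = (tokens: number): string => ((tokens / CONTEXT_WINDOW) * 100).toFixed(1);

console.log(times(browserUseRun.tokens, browseRun.tokens));    // prints "2.43"
console.log(times(agentBrowserRun.tokens, browseRun.tokens));  // prints "2.79"
console.log(times(browserUseRun.seconds, browseRun.seconds));  // prints "2.55"
console.log(contextPct(browseRun.tokens));                     // prints "7.1"
console.log(contextPct(agentBrowserRun.tokens));               // prints "19.7"
console.log(contextPct(browserUseRun.tokens));                 // prints "17.1"
```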

vs @playwright/mcp (Architecture)

@playwright/mcp dumps the full accessibility snapshot on every action. browse returns ~15 tokens per action — the agent requests a snapshot only when needed:

| | @playwright/mcp | browse |
|---|---:|---:|
| Tokens on navigate | ~14,578 (auto-dumped) | ~11 |
| Tokens on click | ~14,578 (auto-dumped) | ~15 |
| 10-action session | ~145,780 | ~11,388 |
| Context consumed (200K) | 73% | 6% |
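
The session row is simple arithmetic on the per-action figures: ten actions at roughly 14,578 tokens each. The sketch below reproduces the totals and percentages from the table. The browse total of ~11,388 is taken as published; the table does not break it down, though it presumably includes snapshots the agent requested on demand.

```typescript
const WINDOW_TOKENS = 200000;       // context window used in the table
const SNAPSHOT_TOKENS = 14578;      // tokens @playwright/mcp returns per action
const ACTIONS = 10;
const BROWSE_SESSION_TOKENS = 11388; // published total, not derived here

const mcpSessionTokens = SNAPSHOT_TOKENS * ACTIONS;

// Share of the context window, rounded to a whole percent.
const pctOfWindow = (tokens: number): number => Math.round((tokens / WINDOW_TOKENS) * 100);

console.log(mcpSessionTokens);                                          // prints 145780
console.log(pctOfWindow(mcpSessionTokens));                             // prints 73
console.log(pctOfWindow(BROWSE_SESSION_TOKENS));                        // prints 6
console.log((mcpSessionTokens / BROWSE_SESSION_TOKENS).toFixed(1));     // prints "12.8"
```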

Rerun: npm run benchmark

Changelog

See CHANGELOG.md for full release history.

Acknowledgments

Inspired by and originally derived from the /browse skill in gstack by Garry Tan.

License

Apache-2.0
