z-agent-browser
v0.5.14
Published
Enhanced browser automation CLI for AI agents - stealth mode, auto-persistence, profile mode
Downloads
1,459
Maintainers
Readme
z-agent-browser
Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
This is the enhanced fork of vercel-labs/agent-browser with stealth mode, auto-persistence, profile mode, and more. Published to npm as z-agent-browser.
Installation
npm (recommended)
npm install -g z-agent-browser
z-agent-browser install # Download ChromiumFrom Source
git clone https://github.com/zm2231/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native # Requires Rust (https://rustup.rs)
pnpm link --global # Makes z-agent-browser available globally
z-agent-browser installUpstream (Basic Features Only)
For the original vercel-labs version without enhanced features:
npm install -g agent-browserLinux Dependencies
On Linux, install system dependencies:
z-agent-browser install --with-deps
# or manually: npx playwright install-deps chromiumQuick Start
z-agent-browser open example.com
z-agent-browser snapshot # Get accessibility tree with refs
z-agent-browser click @e2 # Click by ref from snapshot
z-agent-browser fill @e3 "[email protected]" # Fill by ref
z-agent-browser get text @e1 # Get text by ref
z-agent-browser screenshot page.png
z-agent-browser closeTraditional Selectors (also supported)
z-agent-browser click "#submit"
z-agent-browser fill "#email" "[email protected]"
z-agent-browser find role button click --name "Submit"Commands
Core Commands
z-agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
z-agent-browser click <sel> # Click element
z-agent-browser dblclick <sel> # Double-click element
z-agent-browser focus <sel> # Focus element
z-agent-browser type <sel> <text> # Type into element
z-agent-browser fill <sel> <text> # Clear and fill
z-agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
z-agent-browser keydown <key> # Hold key down
z-agent-browser keyup <key> # Release key
z-agent-browser hover <sel> # Hover element
z-agent-browser select <sel> <val> # Select dropdown option
z-agent-browser check <sel> # Check checkbox
z-agent-browser uncheck <sel> # Uncheck checkbox
z-agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
z-agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
z-agent-browser drag <src> <tgt> # Drag and drop
z-agent-browser upload <sel> <files> # Upload files
z-agent-browser screenshot [path] # Take screenshot (--full for full page)
z-agent-browser pdf <path> # Save as PDF
z-agent-browser snapshot # Accessibility tree with refs (best for AI)
z-agent-browser eval <js> # Run JavaScript
z-agent-browser connect <port> # Connect to browser via CDP
z-agent-browser close # Close browser (aliases: quit, exit)Get Info
z-agent-browser get text <sel> # Get text content
z-agent-browser get html <sel> # Get innerHTML
z-agent-browser get value <sel> # Get input value
z-agent-browser get attr <sel> <attr> # Get attribute
z-agent-browser get title # Get page title
z-agent-browser get url # Get current URL
z-agent-browser get count <sel> # Count matching elements
z-agent-browser get box <sel> # Get bounding boxCheck State
z-agent-browser is visible <sel> # Check if visible
z-agent-browser is enabled <sel> # Check if enabled
z-agent-browser is checked <sel> # Check if checkedFind Elements (Semantic Locators)
z-agent-browser find role <role> <action> [value] # By ARIA role
z-agent-browser find text <text> <action> # By text content
z-agent-browser find label <label> <action> [value] # By label
z-agent-browser find placeholder <ph> <action> [value] # By placeholder
z-agent-browser find alt <text> <action> # By alt text
z-agent-browser find title <text> <action> # By title attr
z-agent-browser find testid <id> <action> [value] # By data-testid
z-agent-browser find first <sel> <action> [value] # First match
z-agent-browser find last <sel> <action> [value] # Last match
z-agent-browser find nth <n> <sel> <action> [value] # Nth matchActions: click, fill, check, hover, text
Examples:
z-agent-browser find role button click --name "Submit"
z-agent-browser find text "Sign In" click
z-agent-browser find label "Email" fill "[email protected]"
z-agent-browser find first ".item" click
z-agent-browser find nth 2 "a" textWait
z-agent-browser wait <selector> # Wait for element to be visible
z-agent-browser wait <ms> # Wait for time (milliseconds)
z-agent-browser wait --text "Welcome" # Wait for text to appear
z-agent-browser wait --url "**/dash" # Wait for URL pattern
z-agent-browser wait --load networkidle # Wait for load state
z-agent-browser wait --fn "window.ready === true" # Wait for JS conditionLoad states: load, domcontentloaded, networkidle
Mouse Control
z-agent-browser mouse move <x> <y> # Move mouse
z-agent-browser mouse down [button] # Press button (left/right/middle)
z-agent-browser mouse up [button] # Release button
z-agent-browser mouse wheel <dy> [dx] # Scroll wheelBrowser Settings
z-agent-browser set viewport <w> <h> # Set viewport size
z-agent-browser set device <name> # Emulate device ("iPhone 14")
z-agent-browser set geo <lat> <lng> # Set geolocation
z-agent-browser set offline [on|off] # Toggle offline mode
z-agent-browser set headers <json> # Extra HTTP headers
z-agent-browser set credentials <u> <p> # HTTP basic auth
z-agent-browser set media [dark|light] # Emulate color schemeCookies & Storage
z-agent-browser cookies # Get all cookies
z-agent-browser cookies set <name> <val> # Set cookie
z-agent-browser cookies clear # Clear cookies
z-agent-browser storage local # Get all localStorage
z-agent-browser storage local <key> # Get specific key
z-agent-browser storage local set <k> <v> # Set value
z-agent-browser storage local clear # Clear all
z-agent-browser storage session # Same for sessionStorageNetwork
z-agent-browser network route <url> # Intercept requests
z-agent-browser network route <url> --abort # Block requests
z-agent-browser network route <url> --body <json> # Mock response
z-agent-browser network unroute [url] # Remove routes
z-agent-browser network requests # View tracked requests
z-agent-browser network requests --filter api # Filter requestsTabs & Windows
z-agent-browser tab # List tabs
z-agent-browser tab new [url] # New tab (optionally with URL)
z-agent-browser tab <n> # Switch to tab n
z-agent-browser tab close [n] # Close tab
z-agent-browser window new # New windowFrames
z-agent-browser frame <sel> # Switch to iframe
z-agent-browser frame main # Back to main frameDialogs
z-agent-browser dialog accept [text] # Accept (with optional prompt text)
z-agent-browser dialog dismiss # DismissDebug
z-agent-browser trace start [path] # Start recording trace
z-agent-browser trace stop [path] # Stop and save trace
z-agent-browser console # View console messages
z-agent-browser console --clear # Clear console
z-agent-browser errors # View page errors
z-agent-browser errors --clear # Clear errors
z-agent-browser highlight <sel> # Highlight element
z-agent-browser state save <path> # Save auth state
z-agent-browser state load <path> # Load auth stateNavigation
z-agent-browser back # Go back
z-agent-browser forward # Go forward
z-agent-browser reload # Reload pageSetup
z-agent-browser install # Download Chromium browser
z-agent-browser install --with-deps # Also install system deps (Linux)Sessions
Run multiple isolated browser instances:
# Different sessions
z-agent-browser --session agent1 open site-a.com
z-agent-browser --session agent2 open site-b.com
# Or via environment variable
AGENT_BROWSER_SESSION=agent1 z-agent-browser click "#btn"
# List active sessions
z-agent-browser session list
# Output:
# Active sessions:
# -> default
# agent1
# Show current session
z-agent-browser sessionEach session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Snapshot Options
The snapshot command supports filtering to reduce output size:
z-agent-browser snapshot # Full accessibility tree
z-agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
z-agent-browser snapshot -c # Compact (remove empty structural elements)
z-agent-browser snapshot -d 3 # Limit depth to 3 levels
z-agent-browser snapshot -s "#main" # Scope to CSS selector
z-agent-browser snapshot -i -c -d 5 # Combine options| Option | Description |
|--------|-------------|
| -i, --interactive | Only show interactive elements (buttons, links, inputs) |
| -c, --compact | Remove empty structural elements |
| -d, --depth <n> | Limit tree depth |
| -s, --selector <sel> | Scope to CSS selector |
Token Efficiency: eval vs snapshot
For AI agents, token efficiency is critical. Use the right tool for the job:
Use snapshot -i for navigation (finding what to click)
z-agent-browser snapshot -i # Returns interactive elements with refs
# Output: ~200-500 tokens for buttons, links, inputsUse eval for data extraction (getting information)
# Instead of parsing a 5000-token snapshot, run JS to get exactly what you need:
z-agent-browser eval "document.querySelectorAll('.item').length"
z-agent-browser eval "[...document.querySelectorAll('a')].map(a => ({text: a.textContent, href: a.href}))"
z-agent-browser eval "document.querySelector('h1').textContent"When to use which
| Task | Best Tool | Token Cost |
|------|-----------|------------|
| Find button to click | snapshot -i | ~200-500 |
| Count items on page | eval | ~10 |
| Extract all links | eval | ~50-200 |
| Fill a form | snapshot -i + refs | ~200-500 |
| Check if logged in | eval | ~10 |
| Get table data | eval | ~100-500 |
| Navigate complex UI | snapshot -i | ~200-500 |
Example: Extract data efficiently
Bad (snapshot approach - ~5000 tokens):
z-agent-browser snapshot # Returns full page, AI parses itGood (eval approach - ~100 tokens):
z-agent-browser eval "
const rows = [...document.querySelectorAll('tr')];
rows.slice(1, 11).map(r => ({
title: r.cells[0]?.textContent?.trim(),
link: r.querySelector('a')?.href
}));
"
# Returns: [{title: "...", link: "..."}, ...]Rule of thumb:
- Need to CLICK/FILL something? →
snapshot -i+ refs - Need to READ/COUNT/EXTRACT data? →
eval
Options
| Option | Description |
|--------|-------------|
| --session <name> | Use isolated session (or AGENT_BROWSER_SESSION env) |
| --headers <json> | Set HTTP headers scoped to the URL's origin |
| --executable-path <path> | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) |
| --json | JSON output (for agents) |
| --full, -f | Full page screenshot |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --headed | Show browser window (not headless) |
| --cdp <port> | Connect via Chrome DevTools Protocol |
| --debug | Debug output |
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
# 1. Get snapshot with refs
z-agent-browser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]
# 2. Use refs to interact
z-agent-browser click @e2 # Click the button
z-agent-browser fill @e3 "[email protected]" # Fill the textbox
z-agent-browser get text @e1 # Get heading text
z-agent-browser hover @e4 # Hover the linkWhy use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
CSS Selectors
z-agent-browser click "#id"
z-agent-browser click ".class"
z-agent-browser click "div > button"Text & XPath
z-agent-browser click "text=Submit"
z-agent-browser click "xpath=//button"Semantic Locators
z-agent-browser find role button click --name "Submit"
z-agent-browser find label "Email" fill "[email protected]"Agent Mode
Use --json for machine-readable output:
z-agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
z-agent-browser get text @e1 --json
z-agent-browser is visible @e2 --jsonOptimal AI Workflow
# 1. Navigate and get snapshot
z-agent-browser open example.com
z-agent-browser snapshot -i --json # AI parses tree and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
z-agent-browser click @e2
z-agent-browser fill @e3 "input text"
# 4. Get new snapshot if page changed
z-agent-browser snapshot -i --jsonHeaded Mode
Show the browser window for debugging:
z-agent-browser open example.com --headedThis opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
# Headers are scoped to api.example.com only
z-agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
# Requests to api.example.com include the auth header
z-agent-browser snapshot -i --json
z-agent-browser click @e2
# Navigate to another domain - headers are NOT sent (safe!)
z-agent-browser open other-site.comThis is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
z-agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
z-agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'For global headers (all domains), use set headers:
z-agent-browser set headers '{"X-Custom-Header": "value"}'Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
- Serverless deployment: Use lightweight Chromium builds like
@sparticuz/chromium(~50MB vs ~684MB) - System browsers: Use an existing Chrome/Chromium installation
- Custom builds: Use modified browser builds
CLI Usage
# Via flag
z-agent-browser --executable-path /path/to/chromium open example.com
# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium z-agent-browser open example.comServerless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';
export async function handler() {
const browser = new BrowserManager();
await browser.launch({
executablePath: await chromium.executablePath(),
headless: true,
});
// ... use browser
}CDP Mode
Connect to an existing browser via Chrome DevTools Protocol:
# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222
# Connect once, then run commands without --cdp
z-agent-browser connect 9222
z-agent-browser snapshot
z-agent-browser tab
z-agent-browser close
# Or pass --cdp on each command
z-agent-browser --cdp 9222 snapshotThis enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Important: In CDP mode, headless/headed is determined by how Chrome was launched, not by z-agent-browser. The --headed flag has no effect when connecting via CDP.
# Headed CDP (visible browser)
google-chrome --remote-debugging-port=9222 &
# Headless CDP (no window) - use --headless=new flag when launching Chrome
google-chrome --headless=new --remote-debugging-port=9222 &For headless automation with your logins, use Profile Mode instead (see above).
Playwright MCP Mode (Experimental)
Control your existing browser session via the Playwright MCP bridge extension. This allows AI agents to automate your actual browser instead of a separate headless instance.
Setup
Install the Chrome extension
- Install "Playwright MCP Bridge" from the Chrome Web Store
- Or load unpacked from the playwright-mcp repo's
extension/directory
Set your extension token and run
# Set the token from the Chrome extension export PLAYWRIGHT_MCP_EXTENSION_TOKEN=your-token-here export AGENT_BROWSER_BACKEND=playwright-mcp # Commands work the same as native mode z-agent-browser open "https://example.com" z-agent-browser snapshot -i z-agent-browser click @e1 z-agent-browser back z-agent-browser close
The daemon spawns npx @playwright/mcp@latest --extension as a subprocess and communicates via stdio. No separate server needed.
Environment Variables
| Variable | Description |
|----------|-------------|
| AGENT_BROWSER_BACKEND | Set to playwright-mcp to use MCP mode (default: native) |
| PLAYWRIGHT_MCP_EXTENSION_TOKEN | Token from the Chrome extension (required for extension mode) |
| PLAYWRIGHT_MCP_COMMAND | Custom command to spawn MCP server (default: npx) |
| PLAYWRIGHT_MCP_ARGS | Space-separated args (default: @playwright/mcp@latest --extension) |
Limitations
- Feature parity: Not all commands are supported. Streaming, profile mode, and stealth mode are not available in MCP mode.
- Extension required: The Chrome extension must be installed and connected for the MCP server to control your browser.
Use Cases
- AI-assisted browsing: Let AI agents help you navigate complex web apps in your actual browser
- Testing with extensions: Test sites that require specific browser extensions
- Debugging: Watch AI actions in real-time in your browser
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
Enable Streaming
Set the AGENT_BROWSER_STREAM_PORT environment variable:
AGENT_BROWSER_STREAM_PORT=9223 z-agent-browser open example.comThis starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
WebSocket Protocol
Connect to ws://localhost:9223 to receive frames and send input:
Receive frames:
{
"type": "frame",
"data": "<base64-encoded-jpeg>",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}Send mouse events:
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}Send keyboard events:
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}Send touch events:
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}Programmatic API
For advanced use, control streaming directly via the protocol:
import { BrowserManager } from 'agent-browser';
const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast
await browser.startScreencast((frame) => {
// frame.data is base64-encoded image
// frame.metadata contains viewport info
console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
});
// Inject mouse events
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
});
// Inject keyboard events
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter',
});
// Stop when done
await browser.stopScreencast();Architecture
z-agent-browser uses a client-daemon architecture:
- Rust CLI (fast native binary) - Parses commands, communicates with daemon
- Node.js Daemon - Manages Playwright browser instance
- Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
Platforms
| Platform | Binary | Fallback | |----------|--------|----------| | macOS ARM64 | Native Rust | Node.js | | macOS x64 | Native Rust | Node.js | | Linux ARM64 | Native Rust | Node.js | | Linux x64 | Native Rust | Node.js | | Windows x64 | Native Rust | Node.js |
Usage with AI Agents
Just ask the agent
The simplest approach - just tell your agent to use it:
Use z-agent-browser to test the login flow. Run z-agent-browser --help to see available commands.The --help output is comprehensive and most agents can figure it out from there.
AGENTS.md / CLAUDE.md
For more consistent results, add to your project or global instructions file:
## Browser Automation
Use `z-agent-browser` for web automation. Run `z-agent-browser --help` for all commands.
Core workflow:
1. `z-agent-browser open <url>` - Navigate to page
2. `z-agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `z-agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changesClaude Code Skill
For Claude Code, install the browser-skill:
/plugin marketplace add zm2231/browser-skill
/plugin install browser-skill@browser-skill-marketplaceOr manually:
mkdir -p ~/.claude/skills/browser-automation
curl -o ~/.claude/skills/browser-automation/skill.md \
https://raw.githubusercontent.com/zm2231/browser-skill/main/skills/browser-automation/skill.mdEnhanced Fork Features
This fork (zm2231/agent-browser) adds features for bot detection bypass, persistent auth, custom profiles, and more.
Installation (Enhanced Fork)
git clone https://github.com/zm2231/agent-browser.git
cd agent-browser
pnpm install
pnpm build
pnpm build:native # requires Rust: https://rustup.rs
npm link
z-agent-browser installStealth Mode
Bypass bot detection using playwright-extra with stealth plugin:
z-agent-browser --stealth open https://bot.sannysoft.com
z-agent-browser snapshot -i
# Most bot detection tests pass
# Via environment variable
AGENT_BROWSER_STEALTH=1 z-agent-browser open https://example.comStealth mode applies evasions for: WebDriver detection, Chrome automation flags, permissions, plugins, languages, WebGL, and more.
Auto-Persistence
Save and restore auth state automatically between sessions:
# First session: log in with --persist
z-agent-browser --persist open "https://github.com/login" --headed
# User logs in manually
z-agent-browser close # State saved to ~/.z-agent-browser/sessions/default.json
# Later sessions: auth restored automatically
z-agent-browser --persist open "https://github.com" # Already logged inUse explicit state file:
z-agent-browser --state ~/github-auth.json open "https://github.com"Profile Mode (Headless with Your Logins)
Use a persistent Chrome profile directory to run headless with all your existing logins, extensions, and passwords:
# 1. Copy your Chrome profile (one-time setup)
cp -R "$HOME/Library/Application Support/Google/Chrome" ~/.z-agent-browser/chrome-profile
# 2. Run headless with your logins (DEFAULT - no browser window)
z-agent-browser --profile ~/.z-agent-browser/chrome-profile open "https://github.com"
# You're logged in! No visible browser.
# 3. Or run headed if you need to see the browser
z-agent-browser --profile ~/.z-agent-browser/chrome-profile --headed open "https://github.com"Key points:
- Headless by default - no
--headedflag needed for background automation - Uses a COPY of your profile - your real Chrome data is safe
- Keeps extensions, bookmarks, passwords, cookies, localStorage
- Profile location:
~/.z-agent-browser/chrome-profile(recommended) - Cannot combine with CDP mode
Headless Limitation: Google, Gmail, and other strict sites detect headless Chromium and invalidate sessions. For these sites, use --headed or CDP Mode with real Chrome.app.
Profile Mode vs CDP Mode:
| Feature | Profile Mode | CDP Mode |
|---------|--------------|----------|
| Command | --profile <path> | --cdp <port> or connect <port> |
| Headless support | Yes (default) | Depends on how Chrome was launched |
| Profile data | Uses COPY (safe) | Uses running Chrome's profile |
| Browser process | Playwright launches Chromium | Connects to existing Chrome |
| Best for | Background automation with logins | Interactive debugging, user's actual browser |
Custom User-Agent
z-agent-browser --user-agent "MyBot/1.0 (compatible)" open https://httpbin.org/user-agentBrowser Launch Arguments
Pass custom Chromium flags:
z-agent-browser --args "--disable-gpu,--no-sandbox" open https://example.comCommon args:
--disable-gpu: disable GPU acceleration--no-sandbox: required in some Docker containers--disable-dev-shm-usage: overcome limited /dev/shm in Docker--window-size=1920,1080: set initial window size
HTTPS Certificate Errors
Skip SSL validation for local dev servers with self-signed certs:
z-agent-browser --ignore-https-errors open "https://localhost:8443"Note: When changing launch options (like --ignore-https-errors), kill any existing daemon first:
pkill -f "node.*daemon"; sleep 1
z-agent-browser --ignore-https-errors open "https://localhost:8443"Video Recording
Record browser sessions to WebM:
z-agent-browser open "https://example.com" --headed
z-agent-browser record start ./demo.webm
z-agent-browser fill @e1 "demo input"
z-agent-browser click @e2
z-agent-browser record stop
# Video saved to ./demo.webm
# Restart recording with new file
z-agent-browser record restart ./take2.webmRecording creates a fresh context but preserves cookies and storage.
Tab New with URL
Open new tab directly at a URL:
z-agent-browser tab new https://example.comScreenshot to Base64
Omit path to get base64-encoded PNG:
z-agent-browser screenshot --json
# Returns: {"success":true,"data":{"base64":"iVBORw0KGgo..."}}Runtime State Load
Load auth state into a running browser (not just at launch):
z-agent-browser open "https://github.com"
z-agent-browser state load ~/.browser/github-auth.json # Loads into current sessionConnect Command
Establish persistent CDP connection; subsequent commands omit --cdp:
# Start Chrome with remote debugging
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &
# Connect once
z-agent-browser connect 9222
# All subsequent commands use CDP automatically
z-agent-browser open "https://example.com"
z-agent-browser snapshot -i
z-agent-browser closeSpecial URL Schemes
Support for about:, data:, and file: URLs:
z-agent-browser open "about:blank"
z-agent-browser open "data:text/html,<h1>Hello</h1>"
z-agent-browser open "file:///path/to/local.html"Environment Variables
| Variable | Description |
|----------|-------------|
| AGENT_BROWSER_SESSION | Session name for isolation |
| AGENT_BROWSER_HEADED | Set to "1" for visible browser |
| AGENT_BROWSER_STEALTH | Set to "1" for stealth mode |
| AGENT_BROWSER_PERSIST | Set to "1" for auto-persistence |
| AGENT_BROWSER_STATE | Path to state file |
| AGENT_BROWSER_PROFILE | Path to Chrome profile directory |
| AGENT_BROWSER_USER_AGENT | Custom User-Agent string |
| AGENT_BROWSER_ARGS | Comma-separated browser launch args |
| AGENT_BROWSER_IGNORE_HTTPS_ERRORS | Set to "1" to skip SSL validation |
| AGENT_BROWSER_EXECUTABLE_PATH | Custom browser binary path |
| AGENT_BROWSER_EXTENSIONS | Path to browser extensions |
| AGENT_BROWSER_STREAM_PORT | WebSocket port for streaming |
| AGENT_BROWSER_BACKEND | Backend type: native (default) or playwright-mcp |
| PLAYWRIGHT_MCP_COMMAND | Command to spawn MCP server (default: npx) |
| PLAYWRIGHT_MCP_ARGS | Space-separated args for MCP server (default: @playwright/mcp@latest) |
| NO_COLOR | Disable colored output |
Known Issues
--ignore-https-errors with existing daemon: If daemon already has a browser context, new launch options may not apply. Kill daemon before changing options:
pkill -f "node.*daemon"Acknowledgments
- Playwright MCP by Microsoft - Powers the experimental MCP backend mode
- Playwright by Microsoft - Core browser automation engine
- vercel-labs/agent-browser - Original project this fork is based on
License
Apache-2.0
