chrome-agent
v0.4.2
Browser automation for AI agents. Single Rust binary, zero runtime dependencies, talks CDP directly to Chrome.
Why
Existing tools (Playwright, Puppeteer, Selenium) carry heavy runtimes and weren't designed for agents. Agents need:
- Minimum tokens: a11y tree snapshots instead of raw HTML (~50 tokens vs ~2000)
- Minimum round-trips: --inspect returns updated page state with every action
- Zero setup: single binary, headless by default, no npm/Node required
- Persistent sessions: log in once, stay logged in across invocations
- Stable UIDs: element identifiers based on backendNodeId, survive between inspects
- 3 targeting modes: uid from accessibility tree, CSS selectors, or coordinates
Install
For AI agents (recommended)
# Install the skill — your agent learns chrome-agent automatically
npx skills add sderosiaux/chrome-agent
This installs a SKILL.md that teaches your agent (Claude Code, Cursor, Copilot, etc.) how to use chrome-agent, including the workflow, commands, and best practices.
CLI binary
# npm (downloads prebuilt binary)
npm install -g chrome-agent
# or with npx (no install needed)
npx chrome-agent --help
# or with Cargo (builds from source)
cargo install chrome-agent
Quick Start
# Navigate and inspect the page in one call
chrome-agent goto https://example.com --inspect
# → https://example.com — Example Domain
# → uid=n1 RootWebArea "Example Domain"
# → uid=n9 heading "Example Domain" level=1
# → uid=n10 paragraph "This domain is for..."
# → uid=n12 link "Learn more"
# Click by uid, get updated page state
chrome-agent click n12 --inspect
# Fill a form field
chrome-agent fill --uid n20 "[email protected]"
# Or target by CSS selector (when uids aren't practical)
chrome-agent click --selector "button.submit"
chrome-agent fill --selector "input[name=email]" "[email protected]"
# Extract article content (Mozilla Readability — reader mode)
chrome-agent read
# Extract full visible text (use --selector to scope, --truncate to cap)
chrome-agent text --selector "main" --truncate 500
# Evaluate JavaScript
chrome-agent eval "document.title"
# Screenshot (returns file path, not binary data)
chrome-agent screenshot
How It Works
chrome-agent v0.2.0 (Rust, ~5.3K lines, 2.9 MB binary)
│
│ WebSocket (Chrome DevTools Protocol)
▼
Chrome / Chromium (headless by default)
No Node.js. No Playwright. No daemon required. Headless by default; pass --headed for debugging.
UIDs are stable across inspects (based on Chrome's backendNodeId), so an agent can inspect, pick a uid, and act on it even minutes later. When the a11y tree isn't practical, CSS selectors and coordinates work as fallbacks. Click automatically falls back to JS .click() when the element has no box model.
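The targeting order described above (uid first, then CSS selector, then coordinates) can be sketched as a small helper that builds a click invocation. This is a minimal sketch and not part of chrome-agent: `click_argv` and its parameters are hypothetical names, and it assumes only the three click forms documented in this README (`click <uid>`, `click --selector`, `click --xy`).

```python
from typing import List, Optional, Tuple


def click_argv(
    uid: Optional[str] = None,
    selector: Optional[str] = None,
    xy: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Build a chrome-agent click command, preferring uid, then CSS, then coordinates."""
    argv = ["chrome-agent", "click"]
    if uid is not None:
        argv.append(uid)  # click <uid>
    elif selector is not None:
        argv += ["--selector", selector]  # click --selector "css"
    elif xy is not None:
        argv += ["--xy", f"{xy[0]},{xy[1]}"]  # click --xy 100,200
    else:
        raise ValueError("need a uid, a selector, or coordinates")
    return argv


print(click_argv(uid="n12"))
print(click_argv(selector="button.submit"))
print(click_argv(xy=(100, 200)))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with selectors that contain brackets or quotes.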
Commands
| Command | Description |
|---------|------------|
| goto <url> [--inspect] [--max-depth N] | Navigate to URL |
| inspect [--verbose] [--max-depth N] [--uid nN] [--filter "role,role"] | Accessibility tree with stable uids |
| click <uid> [--inspect] [--max-depth N] | Click by uid (JS fallback if no box model) |
| click --selector "css" [--inspect] | Click by CSS selector |
| click --xy 100,200 | Click by coordinates |
| fill --uid <uid> <value> [--inspect] | Fill input by uid |
| fill --selector "css" <value> | Fill by CSS selector |
| fill-form <uid=val>... | Batch fill multiple fields |
| read [--html] [--truncate N] | Extract main content (Mozilla Readability) |
| text [uid] [--selector "css"] [--truncate N] | Extract visible text (page or element) |
| eval <expression> [--selector "css"] | Run JS in page context (el = matched element) |
| network [--filter "pattern"] [--body] [--live N] | Capture network requests / API responses |
| console [--level error] [--clear] | Show captured console.log/warn/error + JS exceptions |
| pipe | Persistent connection: JSON stdin → JSON stdout |
| wait <text\|url\|selector> <pattern> | Wait for condition |
| type <text> [--selector "css"] | Type into focused/selected element |
| press <key> | Press Enter, Tab, Escape, etc. |
| scroll <down\|up\|uid> | Scroll page or element into view |
| hover <uid> | Hover over element |
| back | Navigate back in history |
| screenshot [--filename name] | Capture screenshot → file path |
| tabs | List open browser tabs |
| close [--purge] | Close browser (--purge deletes profile/cookies) |
| status | Show session info |
| stop | Stop background daemon |
Global Flags
--browser <name> Named browser profile (default: "default")
--page <name> Named page/tab (default: "default")
--connect [url] Connect to running Chrome (auto or explicit)
--headed Show browser window (default is headless)
--stealth Bypass bot detection (Cloudflare, Turnstile)
--timeout <seconds> Command timeout (default: 30)
--max-depth <N> Limit inspect tree depth (works with --inspect on any command)
--ignore-https-errors Accept self-signed certificates
--json Structured JSON output for all commands
The Inspect → Act → Inspect Loop
# 1. Navigate and inspect
chrome-agent goto https://app.com/login --inspect
# → uid=n47 heading "Login" level=1
# uid=n52 textbox "Email" focusable
# uid=n58 textbox "Password" focusable
# uid=n63 button "Sign In" focusable
# 2. Act
chrome-agent fill --uid n52 "[email protected]"
chrome-agent fill --uid n58 "password123"
# 3. Click with --inspect to get result + new state in one call
chrome-agent click n63 --inspect
# → Clicked uid=n63
# → uid=n101 heading "Dashboard" level=1
# → uid=n105 navigation "Main menu"
UIDs (n47, n52, etc.) are stable: they won't change between inspects as long as the DOM node exists.
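For an agent harness, the "pick a uid" step of this loop can be sketched as a parser over the text snapshot. This is a rough sketch under one assumption: each snapshot line has the shape `uid=<id> <role> "<name>" <attributes>`, which is inferred from the examples above and is not a documented format. `parse_snapshot`, `find_uid`, and `Node` are hypothetical names, and quoted names containing escaped quotes are not handled.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed line shape, inferred from the README examples:
#   uid=n52 textbox "Email" focusable
LINE_RE = re.compile(
    r'uid=(?P<uid>\S+)\s+(?P<role>\S+)(?:\s+"(?P<name>[^"]*)")?(?P<rest>.*)'
)


class Node(NamedTuple):
    uid: str
    role: str
    name: str
    attrs: Tuple[str, ...]


def parse_snapshot(text: str) -> List[Node]:
    """Turn snapshot text into Node records, skipping lines without a uid."""
    nodes = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        nodes.append(
            Node(
                uid=match["uid"],
                role=match["role"],
                name=match["name"] or "",
                attrs=tuple(match["rest"].split()),
            )
        )
    return nodes


def find_uid(nodes: List[Node], role: str, name: str) -> Optional[str]:
    """Return the uid of the first node with this role and accessible name."""
    for node in nodes:
        if node.role == role and node.name == name:
            return node.uid
    return None


# Snapshot lines copied from the login example above.
SNAPSHOT = """\
# → uid=n47 heading "Login" level=1
#   uid=n52 textbox "Email" focusable
#   uid=n58 textbox "Password" focusable
#   uid=n63 button "Sign In" focusable
"""

nodes = parse_snapshot(SNAPSHOT)
print(find_uid(nodes, "button", "Sign In"))  # → n63
```

Matching on role plus name rather than on position keeps the lookup independent of how many nodes precede the target in the tree.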
Network Capture
Extract API data directly instead of DOM scraping:
# Show resources loaded by the page (stealth-safe, uses Performance API)
chrome-agent network --filter "api"
# Capture live traffic with response bodies (5 seconds)
chrome-agent network --live 5 --body --filter "graphql"
# JSON output for structured extraction
chrome-agent --json network --body --filter "api" --limit 10
Console Capture
See what the page logs — useful for debugging and error detection:
chrome-agent console # all messages
chrome-agent console --level error # errors + exceptions only
chrome-agent console --clear # read and clear buffer
Stealth-safe: uses an injected interceptor, not Runtime.enable.
Pipe Mode
Persistent connection for high-performance agent workflows:
# Start pipe (one connection, reads JSON from stdin)
echo '{"cmd":"goto","url":"https://example.com","inspect":true}
{"cmd":"click","uid":"n12","inspect":true}
{"cmd":"read"}' | chrome-agent pipe
Each command returns one JSON line: {"ok":true,...} or {"ok":false,"error":"..."}. 10x faster than spawning chrome-agent per command.
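Driving pipe mode from a harness comes down to writing one JSON object per line and reading one JSON reply per line. The sketch below shows that framing. It assumes only the fields shown above (`cmd`, `url`, `uid`, `inspect`, `ok`, `error`) and that the pipe exits when stdin closes, as the `echo | chrome-agent pipe` example suggests. All function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, List, Optional


def encode_commands(commands: Iterable[Dict[str, Any]]) -> str:
    """Serialize commands as one compact JSON object per line."""
    return "".join(json.dumps(c, separators=(",", ":")) + "\n" for c in commands)


def decode_replies(stdout: str) -> List[Dict[str, Any]]:
    """Parse one JSON reply per non-empty output line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]


def first_failure(replies: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first reply whose "ok" field is not true, if any."""
    return next((r for r in replies if not r.get("ok")), None)


def run_pipe(commands: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Send all commands through a single `chrome-agent pipe` process."""
    proc = subprocess.run(
        ["chrome-agent", "pipe"],
        input=encode_commands(commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return decode_replies(proc.stdout)


commands = [
    {"cmd": "goto", "url": "https://example.com", "inspect": True},
    {"cmd": "click", "uid": "n12", "inspect": True},
    {"cmd": "read"},
]
# Prints the same three lines that the shell example feeds to the pipe.
print(encode_commands(commands), end="")
```

`run_pipe` is defined but not called here, since it needs chrome-agent on the PATH. An interactive agent would more likely keep the process open with `subprocess.Popen` and exchange one line at a time.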
Content Extraction
# Article content (Readability — like Firefox Reader Mode)
chrome-agent read
# → # Article Title
# → Clean article text without nav, footer, sidebar...
# Full page text (scoped by selector)
chrome-agent text --selector "[role=main]" --truncate 1000
# Structured data via JS
chrome-agent eval "JSON.stringify([...document.querySelectorAll('h2')].map(e => e.textContent))"
Stealth Mode
Many sites behind bot-detection services (Cloudflare, Turnstile) block headless Chrome. --stealth patches 7 automation fingerprints via CDP:
chrome-agent --stealth goto https://protected-site.com --inspect
What it patches:
- navigator.webdriver → undefined
- chrome.runtime → mocked (headless doesn't have it)
- Permissions API → consistent with real browser
- WebGL renderer → masks ANGLE/headless fingerprint
- User-Agent → removes "HeadlessChrome"
- Input screenX/pageX leak → random offset added
- Runtime.enable → skipped (the #1 CDP detection vector)
All patches are CDP-level (Page.addScriptToEvaluateOnNewDocument). No fake Chrome flags.
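To make "CDP-level" concrete, here is roughly what one such patch looks like on the wire: a JSON message that asks Chrome to run a script before any page script on every new document. This is an illustrative sketch only. The JavaScript snippet is a commonly used way to hide `navigator.webdriver` and is an assumption, not chrome-agent's actual patch source, and `add_script_message` is a hypothetical helper. The `sessionId` field needed when talking through a browser-level connection is omitted.

```python
import json

# Illustrative page script; chrome-agent's real patch source is not shown in this README.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(Navigator.prototype, 'webdriver', "
    "{ get: () => undefined, configurable: true });"
)


def add_script_message(message_id: int, source: str) -> str:
    """Encode a Page.addScriptToEvaluateOnNewDocument call as a CDP JSON message."""
    return json.dumps(
        {
            "id": message_id,
            "method": "Page.addScriptToEvaluateOnNewDocument",
            "params": {"source": source},
        }
    )


message = add_script_message(1, HIDE_WEBDRIVER_JS)
print(message)
```

Because the script is registered through the protocol rather than through launch flags, it applies to every subsequent navigation in that page target without changing how Chrome is started.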
Heavy bot protection (DataDome, Kasada)
Some sites (Leboncoin, etc.) use advanced fingerprinting that detects bundled Chromium regardless of CDP patches. For these, connect to your real installed Chrome instead:
# Launch your real Chrome with debugging enabled
google-chrome --remote-debugging-port=9222 &
# Connect chrome-agent to it
chrome-agent --connect http://127.0.0.1:9222 goto https://www.leboncoin.fr --inspect
Real Chrome has genuine canvas/audio/codec fingerprints that Chromium lacks.
| Protection Level | Solution |
|---|---|
| None | chrome-agent goto ... |
| Cloudflare/Turnstile | chrome-agent --stealth goto ... |
| DataDome/Kasada | chrome-agent --connect to real Chrome |
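The table can be encoded as a lookup so a harness can escalate from one level to the next when a site blocks it. This is a minimal sketch: the level names (`none`, `cloudflare`, `datadome`) and the `goto_argv` helper are hypothetical, and the debugging URL is the one used in the example above.

```python
from typing import Dict, List

# Hypothetical level names; the flags come from the table above.
FLAGS_BY_PROTECTION: Dict[str, List[str]] = {
    "none": [],
    "cloudflare": ["--stealth"],
    "datadome": ["--connect", "http://127.0.0.1:9222"],
}


def goto_argv(url: str, protection: str = "none") -> List[str]:
    """Build a `goto --inspect` command with the flags for a protection level."""
    flags = FLAGS_BY_PROTECTION[protection]  # KeyError on an unknown level
    return ["chrome-agent", *flags, "goto", url, "--inspect"]


for level in ("none", "cloudflare", "datadome"):
    print(" ".join(goto_argv("https://example.com", level)))
```

The `datadome` entry assumes a real Chrome instance is already listening on port 9222, as in the launch command shown earlier in this section.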
JSON Mode
chrome-agent --json goto https://example.com --inspect
# → {"ok":true,"url":"...","title":"...","snapshot":"uid=n1 heading..."}
chrome-agent --json eval "1+1"
# → {"ok":true,"result":2}
chrome-agent --json read
# → {"ok":true,"title":"...","text":"...","excerpt":"...","byline":"..."}
# Errors are also structured (exit code 0, so agents can parse the output):
chrome-agent --json click n99
# → {"ok":false,"error":"Element uid=n99 not found.","hint":"Run 'chrome-agent inspect'"}
Multi-Tab
chrome-agent --page main goto https://app.com
chrome-agent --page docs goto https://docs.app.com
chrome-agent --page main eval "document.title" # → "App"
chrome-agent --page docs eval "document.title" # → "Docs"
Parallel Agents
Multiple agents sharing the same browser corrupt each other's sessions. Isolate with --browser:
# Agent 1
chrome-agent --browser agent1 goto https://example.com
# Agent 2 (separate Chrome instance)
chrome-agent --browser agent2 goto https://other.com
Using with AI Agents
Skill (recommended)
npx skills add sderosiaux/chrome-agent
This installs a SKILL.md that teaches your agent the full chrome-agent workflow, commands, and tips. Works with Claude Code, Cursor, Copilot, and any agent that reads skill files.
Manual
Tell your agent to run chrome-agent --help — the help output includes a complete LLM usage guide.
Claude Code permissions
{
"permissions": {
"allow": ["Bash(chrome-agent *)"]
}
}
Connect to Your Browser
chrome-agent --connect inspect # auto-discover Chrome with debugging
google-chrome --remote-debugging-port=9222 # or launch manually
Comparison
| | chrome-agent | dev-browser | chrome-devtools-mcp | Playwright MCP |
|---|---|---|---|---|
| Language | Rust | Rust + Node.js | TypeScript | TypeScript |
| Runtime deps | none | Node.js + npm + Playwright + QuickJS | Node.js + Puppeteer | Node.js + Playwright |
| Binary size | ~3 MB | ~3 MB (CLI) + ~200 MB (daemon + deps) | npm package | npm package |
| CLI startup (reuse session) | ~10ms | ~500ms (daemon check) | N/A (MCP server) | N/A (MCP server) |
| Element targeting | uid + CSS selector + coordinates | CSS selectors + snapshotForAI | uid (sequential) | CSS selectors |
| UID stability | backendNodeId (stable across inspects) | N/A | sequential (reassigned each snapshot) | N/A |
| Action + observe | --inspect flag (1 call) | 1 script (batched) | 1 MCP call per action | 1 MCP call per action |
| Script batching | No (atomic commands + eval) | Full JS scripts in QuickJS sandbox | No | No |
| Stealth mode | 7 CDP patches + Runtime.enable skip | No | No | No |
| Reader mode | read (Mozilla Readability) | No | No | No |
| Sandbox | Chrome sandbox | QuickJS WASM sandbox | Chrome sandbox | No |
| Network capture | Retroactive + live | No | No | Metadata only (no bodies) |
| Console capture | Stealth-safe interceptor | No | Console messages | No |
| Pipe mode | JSON stdin/stdout | No | No | No |
| Code | ~5.3K lines | ~76K lines (69K Playwright fork) | ~12K lines | Playwright |
License
MIT
