chrome-agent

v0.5.0

Published

9 days ago

Browser automation for AI agents. Single binary, zero dependencies, CDP direct.

0High
0Medium
0Low

chtefi

browser automation ai agent cdp chrome headless cli

chrome-agent

Browser automation for AI agents. Single Rust binary, zero runtime dependencies, talks CDP directly to Chrome.

Why

Existing tools (Playwright, Puppeteer, Selenium) carry heavy runtimes and weren't designed for agents. Agents need:

Minimum tokens — a11y tree snapshots instead of raw HTML (~50 tokens vs ~2000)
Minimum round-trips — --inspect returns updated page state with every action
Zero setup — single binary, headless by default, no npm/Node required
Persistent sessions — login once, stay logged in across invocations
Stable UIDs — element identifiers based on backendNodeId, survive between inspects
3 targeting modes — uid from accessibility tree, CSS selectors, or coordinates

Install

For AI agents (recommended)

# Install the skill — your agent learns chrome-agent automatically
npx skills add sderosiaux/chrome-agent

This installs a SKILL.md that teaches your agent (Claude Code, Cursor, Copilot, etc.) how to use chrome-agent, including the workflow, commands, and best practices.

CLI binary

# npm (downloads prebuilt binary)
npm install -g chrome-agent

# or with npx (no install needed)
npx chrome-agent --help

# or with Cargo (builds from source)
cargo install chrome-agent

Quick Start

# Navigate and inspect the page in one call
chrome-agent goto https://example.com --inspect
# → https://example.com — Example Domain
# → uid=n1 RootWebArea "Example Domain"
# →   uid=n9 heading "Example Domain" level=1
# →   uid=n10 paragraph "This domain is for..."
# →   uid=n12 link "Learn more"

# Click by uid, get updated page state
chrome-agent click n12 --inspect

# Fill a form field
chrome-agent fill --uid n20 "[email protected]"

# Or target by CSS selector (when uids aren't practical)
chrome-agent click --selector "button.submit"
chrome-agent fill --selector "input[name=email]" "[email protected]"

# Extract article content (Mozilla Readability — reader mode)
chrome-agent read

# Extract full visible text (use --selector to scope, --truncate to cap)
chrome-agent text --selector "main" --truncate 500

# Evaluate JavaScript
chrome-agent eval "document.title"

# Screenshot (returns file path, not binary data)
chrome-agent screenshot

How It Works

chrome-agent v0.2.0 (Rust, ~5.3K lines, 2.9 MB binary)
    │
    │ WebSocket (Chrome DevTools Protocol)
    ▼
Chrome / Chromium (headless by default)

No Node.js. No Playwright. No daemon required. Headless by default — --headed for debugging.

UIDs are stable across inspects (based on Chrome's backendNodeId). The agent inspects, picks a uid, acts — even minutes later. When a11y tree isn't practical, CSS selectors and coordinates work as fallbacks. Click auto-falls back to JS .click() when the element has no box model.

Commands

| Command | Description | |---------|------------| | goto <url> [--inspect] [--max-depth N] [--header "K: V"] | Navigate to URL. --header sends extra HTTP headers (repeatable) | | inspect [--verbose] [--max-depth N] [--uid nN] [--filter "role,role"] [--max-chars N] [--offset K] | Accessibility tree with stable uids. --max-chars/--offset cap and page the output | | click <uid> [--inspect] [--max-depth N] | Click by uid (JS fallback if no box model) | | click --selector "css" [--inspect] | Click by CSS selector | | click --xy 100,200 | Click by coordinates | | fill --uid <uid> <value> [--inspect] | Fill input by uid | | fill --selector "css" <value> | Fill by CSS selector | | fill-form <uid=val>... | Batch fill multiple fields | | read [--html] [--truncate N] | Extract main content (Mozilla Readability) | | text [uid] [--selector "css"] [--truncate N] | Extract visible text (page or element) | | eval <expression> [--selector "css"] | Run JS in page context (el = matched element) | | network [--filter "pattern"] [--body] [--live N] | Capture network requests / API responses | | console [--level error] [--clear] | Show captured console.log/warn/error + JS exceptions | | pipe | Persistent connection: JSON stdin → JSON stdout | | wait <text\|url\|selector> <pattern> | Wait for condition | | wait network-idle [--idle-ms N] [--timeout N] | Wait until the network is quiet (SPA/XHR settle) | | type <text> [--selector "css"] | Type into focused/selected element | | press <key> | Press Enter, Tab, Escape, etc. | | scroll <down\|up\|uid> | Scroll page or element into view | | hover <uid> | Hover over element | | back | Navigate back in history | | screenshot [--filename name] [--format jpeg\|png] [--quality N] [--max-width N] [--uid nN\|--selector "css"] | Screenshot → file path. JPEG/max-width shrink it; --uid/--selector clip to one element | | pdf [--out name] [--landscape] [--background] | Print the current page to a PDF file | | download <url> [--out path] [--timeout N] | Download a URL fetched in-page (cookies/auth preserved) → {path,bytes,mime} | | tabs | List open browser tabs | | close [--purge] | Close browser (--purge deletes profile/cookies) | | status | Show session info | | stop | Stop background daemon |

Global Flags

--browser <name>         Named browser profile (default: "default")
--page <name>            Named page/tab (default: "default")
--connect [url]          Connect to running Chrome (auto or explicit)
--headed                 Show browser window (default is headless)
--stealth                Bypass bot detection (Cloudflare, Turnstile)
--timeout <seconds>      Command timeout (default: 30)
--max-depth <N>          Limit inspect tree depth (works with --inspect on any command)
--copy-cookies           Use cookies from your real Chrome profile
--dialog <mode>          JS dialog policy: accept (default), dismiss, or manual
--dialog-text <text>     Text submitted for prompt() dialogs under --dialog accept
--ignore-https-errors    Accept self-signed certificates
--json                   Structured JSON output for all commands

JS dialogs (alert/confirm/prompt/beforeunload) are auto-answered by default (--dialog accept) so the page never hangs on a blocking dialog.

The Inspect → Act → Inspect Loop

# 1. Navigate and inspect
chrome-agent goto https://app.com/login --inspect
# → uid=n47 heading "Login" level=1
#   uid=n52 textbox "Email" focusable
#   uid=n58 textbox "Password" focusable
#   uid=n63 button "Sign In" focusable

# 2. Act
chrome-agent fill --uid n52 "[email protected]"
chrome-agent fill --uid n58 "password123"

# 3. Click with --inspect to get result + new state in one call
chrome-agent click n63 --inspect
# → Clicked uid=n63
# → uid=n101 heading "Dashboard" level=1
# → uid=n105 navigation "Main menu"

UIDs (n47, n52, etc.) are stable — they won't change between inspects as long as the DOM node exists.

Network Capture

Extract API data directly instead of DOM scraping:

# Show resources loaded by the page (stealth-safe, uses Performance API)
chrome-agent network --filter "api"

# Capture live traffic with response bodies (5 seconds)
chrome-agent network --live 5 --body --filter "graphql"

# JSON output for structured extraction
chrome-agent --json network --body --filter "api" --limit 10

Console Capture

See what the page logs — useful for debugging and error detection:

chrome-agent console                    # all messages
chrome-agent console --level error      # errors + exceptions only
chrome-agent console --clear            # read and clear buffer

Stealth-safe: uses injected interceptor, not Runtime.enable.

Pipe Mode

Persistent connection for high-performance agent workflows:

# Start pipe (one connection, reads JSON from stdin)
echo '{"cmd":"goto","url":"https://example.com","inspect":true}
{"cmd":"click","uid":"n12","inspect":true}
{"cmd":"read"}' | chrome-agent pipe

Each command returns one JSON line: {"ok":true,...} or {"ok":false,"error":"..."}. 10x faster than spawning chrome-agent per command.

Content Extraction

# Article content (Readability — like Firefox Reader Mode)
chrome-agent read
# → # Article Title
# → Clean article text without nav, footer, sidebar...

# Full page text (scoped by selector)
chrome-agent text --selector "[role=main]" --truncate 1000

# Structured data via JS
chrome-agent eval "JSON.stringify([...document.querySelectorAll('h2')].map(e => e.textContent))"

Stealth Mode

Many sites (Cloudflare, Turnstile) block headless Chrome. --stealth patches 7 automation fingerprints via CDP:

chrome-agent --stealth goto https://protected-site.com --inspect

What it patches:

navigator.webdriver → undefined
chrome.runtime → mocked (headless doesn't have it)
Permissions API → consistent with real browser
WebGL renderer → masks ANGLE/headless fingerprint
User-Agent → removes "HeadlessChrome"
Input screenX/pageX leak → random offset added
Runtime.enable → skipped (the #1 CDP detection vector)

All patches are CDP-level (Page.addScriptToEvaluateOnNewDocument). No fake Chrome flags.

Heavy bot protection (DataDome, Kasada)

Some sites (Leboncoin, etc.) use advanced fingerprinting that detects bundled Chromium regardless of CDP patches. For these, connect to your real installed Chrome instead:

# Launch your real Chrome with debugging enabled
google-chrome --remote-debugging-port=9222 &

# Connect chrome-agent to it
chrome-agent --connect http://127.0.0.1:9222 goto https://www.leboncoin.fr --inspect

Real Chrome has genuine canvas/audio/codec fingerprints that Chromium lacks.

| Protection Level | Solution | |---|---| | None | chrome-agent goto ... | | Cloudflare/Turnstile | chrome-agent --stealth goto ... | | DataDome/Kasada | chrome-agent --connect to real Chrome |

JSON Mode

chrome-agent --json goto https://example.com --inspect
# → {"ok":true,"url":"...","title":"...","snapshot":"uid=n1 heading..."}

chrome-agent --json eval "1+1"
# → {"ok":true,"result":2}

chrome-agent --json read
# → {"ok":true,"title":"...","text":"...","excerpt":"...","byline":"..."}

# Errors also structured (exit 0 for agent parsing):
chrome-agent --json click n99
# → {"ok":false,"error":"Element uid=n99 not found.","hint":"Run 'chrome-agent inspect'"}

Multi-Tab

chrome-agent --page main goto https://app.com
chrome-agent --page docs goto https://docs.app.com
chrome-agent --page main eval "document.title"   # → "App"
chrome-agent --page docs eval "document.title"   # → "Docs"

Parallel Agents

Multiple agents sharing the same browser corrupt each other's sessions. Isolate with --browser:

# Agent 1
chrome-agent --browser agent1 goto https://example.com

# Agent 2 (separate Chrome instance)
chrome-agent --browser agent2 goto https://other.com

Using with AI Agents

Skill (recommended)

npx skills add sderosiaux/chrome-agent

This installs a SKILL.md that teaches your agent the full chrome-agent workflow, commands, and tips. Works with Claude Code, Cursor, Copilot, and any agent that reads skill files.

Manual

Tell your agent to run chrome-agent --help — the help output includes a complete LLM usage guide.

Claude Code permissions

{
  "permissions": {
    "allow": ["Bash(chrome-agent *)"]
  }
}

Connect to Your Browser

chrome-agent --connect inspect    # auto-discover Chrome with debugging
google-chrome --remote-debugging-port=9222  # or launch manually

Comparison

| | chrome-agent | dev-browser | chrome-devtools-mcp | Playwright MCP | |---|---|---|---|---| | Language | Rust | Rust + Node.js | TypeScript | TypeScript | | Runtime deps | none | Node.js + npm + Playwright + QuickJS | Node.js + Puppeteer | Node.js + Playwright | | Binary size | ~3 MB | ~3 MB (CLI) + ~200 MB (daemon + deps) | npm package | npm package | | CLI startup (reuse session) | ~10ms | ~500ms (daemon check) | N/A (MCP server) | N/A (MCP server) | | Element targeting | uid + CSS selector + coordinates | CSS selectors + snapshotForAI | uid (sequential) | CSS selectors | | UID stability | backendNodeId (stable across inspects) | N/A | sequential (reassigned each snapshot) | N/A | | Action + observe | --inspect flag (1 call) | 1 script (batched) | 1 MCP call per action | 1 MCP call per action | | Script batching | No (atomic commands + eval) | Full JS scripts in QuickJS sandbox | No | No | | Stealth mode | 7 CDP patches + Runtime.enable skip | No | No | No | | Reader mode | read (Mozilla Readability) | No | No | No | | Sandbox | Chrome sandbox | QuickJS WASM sandbox | Chrome sandbox | No | | Network capture | Retroactive + live | No | No | Metadata only (no bodies) | | Console capture | Stealth-safe interceptor | No | Console messages | No | | Pipe mode | JSON stdin/stdout | No | No | No | | Code | ~5.3K lines | ~76K lines (69K Playwright fork) | ~12K lines | Playwright |

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

chrome-agent

Why

Install

For AI agents (recommended)

CLI binary

Quick Start

How It Works

Commands

Global Flags

The Inspect → Act → Inspect Loop

Network Capture

Console Capture

Pipe Mode

Content Extraction

Stealth Mode

Heavy bot protection (DataDome, Kasada)

JSON Mode

Multi-Tab

Parallel Agents

Using with AI Agents

Skill (recommended)

Manual

Claude Code permissions

Connect to Your Browser

Comparison

License