browser4-cli
v0.1.16
Published
Browser automation CLI for AI agents
Downloads
2,329
Maintainers
Readme
Browser4
Make websites accessible for AI agents. Automate tasks online with ease.
Installation
Global Installation (recommended)
Installs the native Rust binary:
npm install -g browser4-cliAfter installation, use browser4-cli. The shorter browser4 command remains
available as a compatibility alias.
Project Installation (local dependency)
For projects that want to pin the version in package.json:
npm install browser4-cliThen use via package.json scripts or by invoking browser4-cli directly.
Homebrew (macOS)
brew install browser4-cliCargo (Rust)
cargo install browser4-cliStandalone Installer Scripts (no npm / Rust / Homebrew needed)
Bootstrap the native binary directly with a single command:
Windows (PowerShell):
Invoke-WebRequest -Uri "https://browser4.oss-cn-beijing.aliyuncs.com/scripts/install-browser4-cli.ps1" -OutFile "$env:TEMP\install-browser4-cli.ps1"
powershell -ExecutionPolicy Bypass -File "$env:TEMP\install-browser4-cli.ps1"Linux / macOS (bash):
curl -fsSL https://browser4.oss-cn-beijing.aliyuncs.com/scripts/install-browser4-cli.sh | bashOr run the scripts locally from a cloned repo:
cli/scripts/install-browser4-cli.ps1(Windows)cli/scripts/install-browser4-cli.sh(Linux / macOS / Git Bash)
See Standalone CLI Installer Scripts for the full option reference, supported platforms, and examples.
From Source
Build prerequisites: Rust (stable, edition 2021), Node.js 24+, pnpm 10+, git.
git clone https://github.com/platonai/Browser4.git
cd Browser4/cli/browser4-cli
pnpm install
pnpm build:native # Compiles the Rust binary (requires https://rustup.rs)
pnpm link --global # Makes browser4-cli available globallyCross-compilation from Linux (for release builds) additionally requires:
cargo-zigbuild, Zig 0.13.0, gcc-aarch64-linux-gnu, and mingw-w64.
See cli/docker/Dockerfile.build for a Dockerized build environment.
Requirements
- Chrome — Latest Chrome installed on your system. The CLI can auto-install Chrome on most platforms when missing.
- Java 17+ — Required to run the Browser4 backend (
Browser4.jar). Eclipse Temurin recommended. JDK 21+ enables best jlink compression when the CLI auto-builds a runtime bundle from source. - Rust — Only needed when building the CLI from source (see From Source above). The
stabletoolchain (edition 2021) is sufficient.
Additional requirements for auto-building a runtime bundle from source
When the CLI detects a Browser4 repository checkout, it attempts to build a self-contained runtime bundle (bundled JRE + dependency JARs) from source instead of downloading a pre-built release. This requires:
| Tool | Version | Linux | macOS | Windows |
|------|---------|-------|-------|---------|
| Maven | 3.9+ | via mvnw wrapper | via mvnw wrapper | via mvnw.cmd wrapper |
| JDK tools (jdeps, jlink) | bundled with JDK 16+ | included in JDK | included in JDK | included in JDK |
| PowerShell 7 (pwsh) | 7.0+ | required | required | built-in (powershell.exe) |
| tar | any | required | required | built-in |
Set BROWSER4_CLI_FORCE_REMOTE_BUNDLE=1 to skip the local build and always
download a pre-built bundle — useful in CI / corporate environments where
Maven or jlink are unavailable.
Usage
browser4-cli <command> [args] [options]
browser4-cli -s=<session> <command> [args] [options]Global options
| Flag | Description |
|--------------------|------------------------------------------------|
| --help [command] | Print help (optionally for a specific command) |
| --version | Print version |
| -s=<name> | Named session label |
| --server=<url> | Override Browser4 server URL |
| --json | Emit machine-parseable JSON to stdout |
| -q, --quiet | Suppress normal output, only show errors |
--json switches every command's stdout from human-readable text to a
single-line JSON envelope ({"status":"ok","command":"<name>","output":{...}}).
Omit --json for the default human-readable output.
-q / --quiet suppresses all normal stdout output. Errors and
progress messages still go to stderr. Combine with --json for
silent-on-success scripting: browser4-cli --json -q open.
Sessions are persisted independently per name. Omitting -s uses the
default session (~/.browser4/cli-state.json). With -s=<name>, a
separate state file is stored under ~/.browser4/sessions/<name>.json.
open without -s reuses the default session if one exists; with
-s=<name> it switches to or creates the named session.
Commands
The tables below mirror the commands surfaced by the global browser4-cli help overview.
Core
| Command | Description |
|---|---|
| open [url] | Open or switch to a browser session. Supports --headed (force visible window) and --headless (force headless). |
| close | Close the active session |
| goto <url> | Navigate to a URL, auto-opening or refreshing the session if needed |
| click <ref> [button] | Click an element |
| dblclick <ref> [button] | Double-click an element |
| type <text> [ref] | Type text into the focused element or an optional target element |
| fill <ref> <text> | Fill text into an editable element |
| hover <ref> | Hover over an element |
| select <ref> <val> | Select an option in a dropdown |
| upload <ref> <file> | Upload a file |
| check <ref> | Check a checkbox or radio button |
| uncheck <ref> | Uncheck a checkbox or radio button |
| drag <startRef> <endRef> | Drag and drop between two elements |
| snapshot | Capture accessibility snapshot |
| eval <expression> [ref] | Evaluate JavaScript on the page or a target element |
| dialog-accept [prompt] | Accept a dialog |
| dialog-dismiss | Dismiss a dialog |
| resize <w> <h> | Resize the browser window |
| delete-data | Delete session data |
Navigation
| Command | Description |
|---|---|
| go-back | Go back to the previous page |
| go-forward | Go forward to the next page |
| reload | Reload the current page |
Keyboard
| Command | Description |
|---|---|
| press <key> [ref] | Press a key on the focused element or an optional target element |
| keydown <key> | Press and hold a key |
| keyup <key> | Release a key |
Mouse
| Command | Description |
|---|---|
| mousemove <x> <y> | Move mouse to coordinates |
| mousedown [button] | Press mouse button |
| mouseup [button] | Release mouse button |
| mousewheel <dx> <dy> | Scroll the mouse wheel |
Screenshots
| Command | Description |
|---|---|
| screenshot [ref] | Take a screenshot (optionally of a specific element) |
Tabs
| Command | Description |
|---|---|
| tab-list | List all tabs |
| tab-new [url] | Create a new tab |
| tab-close [index] | Close a browser tab by zero-based index |
| tab-select <index> | Select a browser tab by zero-based index |
Use tab-list first to find the zero-based tab index you want to select or close.
Browser storage
| Command | Description |
|---|---|
| state-save <path> | Save cookies and localStorage to a JSON file |
| state-load <path> | Restore cookies and localStorage from a saved state file |
| cookie-list | List all cookies (optionally filtered by --domain / --path) |
| cookie-get <name> | Get a cookie by name |
| cookie-set <name> <value> | Set a cookie (optional --path, --domain) |
| cookie-delete <name> | Delete a cookie by name |
| cookie-clear | Clear all cookies for the current page |
| localstorage-list | List all localStorage entries |
| localstorage-get <key> | Get a localStorage value by key |
| localstorage-set <key> <value> | Set a localStorage key-value pair |
| localstorage-delete <key> | Delete a localStorage key |
| localstorage-clear | Clear all localStorage entries |
| sessionstorage-list | List all sessionStorage entries |
| sessionstorage-get <key> | Get a sessionStorage value by key |
| sessionstorage-set <key> <value> | Set a sessionStorage key-value pair |
| sessionstorage-delete <key> | Delete a sessionStorage key |
| sessionstorage-clear | Clear all sessionStorage entries |
Browser sessions
| Command | Description |
|---|---|
| list | List browser sessions |
| close-all | Close all browser sessions without stopping Browser4.jar / the Browser4 backend |
| kill-all | Forcefully stop Browser4.jar / the Browser4 backend and kill Browser4 browser processes |
Use close-all for session cleanup when you want to keep the current Browser4 service running. Use kill-all only when you explicitly want to stop the backend and clean up tracked Browser4 processes.
Server management
| Command | Description |
|---|---|
| install | Download the Browser4 runtime bundle. Supports --tag=<version> to pin a release and --force to reinstall even when already present. |
| upgrade | Upgrade the Browser4 runtime bundle to the latest version or a specified --tag |
| stop | Kill the Browser4 backend after closing all sessions |
| status | Check whether the Browser4 backend is reachable and healthy |
install and upgrade both manage the Browser4 runtime bundle — a self-contained
distribution that includes all dependency jars, a minimal jlink-built JRE, and
platform launcher scripts. Neither requires cargo or a Rust toolchain; the runtime
is a Java application downloaded from GitHub Releases.
Use --tag=<version> to pin a specific release (e.g. --tag=v4.9.3). Use --force
to reinstall even when the same version is already present.
When a local Browser4 checkout is detected with the browser4-bundle module present,
install and upgrade auto-build the runtime bundle from source (via Maven) instead
of downloading.
Advanced commands
These commands are intentionally omitted from the global browser4-cli help overview.
Query browser4-cli help <command> for the exact syntax when you need them.
| Command | Description |
|---|---|
| batch [command...] | Execute multiple commands in one invocation. Only DOM operations are supported (Core, Navigation, Keyboard, Mouse, Export, Tabs categories). Commands like open, close, list, agent run, etc. are not allowed in batch mode. |
| console [min-level] | List console messages |
| extract <instruction> | Extract structured data from the current page |
| summarize [instruction] | Summarize page content using AI |
| agent run <task> | Run an autonomous agent task |
| agent status <id> | Check the status of a running agent task |
| agent result <id> | Get the result of a completed agent task |
| swarm create | Create a swarm scrape session with parallel browser contexts |
| swarm submit [url] | Submit URL(s) or raw X-SQL payloads as scrape jobs |
| swarm query <url> | Run an X-SQL query against a loaded webpage |
| swarm status <id> | Check the status of a scrape or query job |
| swarm result <id> | Get the result of a completed job |
Agent task workflow (agent <subcommand>)
The agent-* commands wrap the backend command agent's asynchronous task API.
They are useful when you want Browser4 to plan and execute a natural-language
task in the background instead of issuing one low-level browser action at a
time.
Like other advanced commands, they are intentionally omitted from the global
browser4-cli help overview. Query browser4-cli help agent run (or
agent status / agent result) when you need the exact syntax.
Use the spaced agent <subcommand> form:
browser4-cli agent run "Open browser4.io and summarize the hero section"
browser4-cli agent status agent-task-1
browser4-cli agent result agent-task-1Command lifecycle
| Step | Command | What it does |
|---|---|---|
| 1 | agent run <task> | Submits an asynchronous natural-language task through command_run and prints the returned task ID |
| 2 | agent status <id> | Fetches the latest task status payload through command_status |
| 3 | agent result <id> | Fetches the completed task result payload through command_result |
Notes
agent runis asynchronous: it returns immediately after the backend accepts the task and prints a follow-upagent statuscommand with the generated task ID.agent statusprints the backend status payload as-is. In practice this is a JSON object that commonly includes fields such asid,status,statusCode,processState,message,agentState,agentHistory, andcommandResult.agent resultprints the backend result payload as-is. Depending on the task, it may be plain text or structured JSON.- These commands are task-ID based and do not require an active CLI browser
session slot. The global
-s=<name>option is therefore usually not relevant foragent-*follow-up calls. agentsubcommands are not supported insidebatchmode.agent runperforms a short post-submit status probe so obvious missing-LLM configuration failures can be surfaced immediately instead of leaving you with a task ID that will never succeed.
Use cases
1. Submit an autonomous agent task
browser4-cli agent run "Open browser4.io and summarize the hero section"Typical output:
Task submitted: agent-task-1
Use 'browser4-cli agent status agent-task-1' to check progress.2. Poll task progress
browser4-cli agent status agent-task-1Example status payload:
{"id":"agent-task-1","status":"RUNNING"}On a real Browser4 backend the payload can be richer and may include lifecycle
details such as processState, agent history snapshots, or an embedded partial
commandResult.
3. Read the final result
browser4-cli agent result agent-task-1If the backend returns a structured CommandResult, expect fields such as
summary, pageSummary, fields, links, or xsqlResultSet.
Swarm scrape workflow (swarm <subcommand>)
The swarm subcommands support a swarm scrape workflow where one CLI session
coordinates multiple browser contexts in the Browser4 backend.
Command overview
| Command | Purpose | Backend endpoint |
|---|---|---|
| swarm create | Create a swarm scrape session | POST /api/swarm |
| swarm submit <url> | Scrape URLs or submit raw X-SQL | POST /api/swarm/submit |
| swarm query <url> | Run X-SQL queries against loaded pages | POST /api/swarm/query |
| swarm status <id> | Poll job status | GET /api/swarm/{id}/status |
| swarm result <id> | Fetch completed job result | GET /api/swarm/{id}/result |
URL scraping with swarm submit
# create a session
browser4-cli swarm create \
--profile-mode=TEMPORARY \
--max-open-tabs=12 \
--max-browser-contexts=3 \
--display-mode=HEADLESS
# submit URLs as scrape jobs
browser4-cli swarm submit https://example.com/direct \
--seed-file=./swarm-seeds.txt \
--deadline=2026-03-30T00:00:00Z \
--expires=1d \
--refresh --store-content
# poll and fetch the result
browser4-cli swarm status scrape-task-4
browser4-cli swarm result scrape-task-4X-SQL queries with swarm query
Run structured X-SQL queries against loaded webpages to extract data.
# Inline query:
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql "
SELECT
dom_base_uri(dom) AS url,
dom_first_text(dom, '#productTitle') AS title,
dom_first_slim_html(dom, 'img:expr(width > 400)') AS img
FROM load_and_select(@url, 'body');
"
# From a file:
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql @query.sql
# With seed file and load options:
browser4-cli swarm query --sql @query.sql --seed-file=./urls.txt --refreshNotes
swarm createaccepts backend capability hints:--profile-mode,--max-open-tabs,--max-browser-contexts,--display-mode.swarm submitandswarm queryboth accept a positional URL,--seed-file, or both. Seed files use one URL per line;#comments and blank lines are ignored.- Both commands support load-option flags:
--deadline,--expires,--refresh,--parse,--store-content. swarm query --sqlis required;swarm submit --sqlalso works as a convenience. Use@urlin the X-SQL template; it is replaced with the target URL server-side.- Prefix the
--sqlvalue with@to read from a file (e.g.--sql @query.sql). - All commands return a task ID; use
swarm status/swarm resultto track progress.
Element References
The snapshot command returns an accessibility tree where every interactive
node is labeled with a short identifier such as e15. Pass this identifier
directly to commands like click, type, or press. You can also use plain
CSS selectors (e.g. .my-button, #search-input).
State Persistence
browser4-cli persists CLI state between invocations under ~/.browser4 by
default. Override the root directory with the BROWSER4_CLI_STATE_DIR
environment variable.
- Default session state:
~/.browser4/cli-state.json - Named session state (
-s=<name>):~/.browser4/sessions/<name>.json
Each state file stores the current Browser4 server URL plus session-scoped fields such as:
sessionId— active Browser4 session IDbaseUrl— Browser4 backend URL used by the CLIactiveSelector— last selector tracked for keyboard restore flowslastMousePosition— last pointer coordinates tracked for mouse restore flows
Session state transitions
| Command | Behavior |
|---|---|
| open | Creates a new session, or reuses an existing active one. Stale sessions are automatically refreshed. |
| open -s=<name> | Same as open but scoped to a named session slot. |
| goto <url> | Reuses the current session if active; otherwise opens a fresh one before navigating. |
| close | Closes the current session (no-op if none active). |
| close-all / kill-all / stop | Clears all persisted session state. |
The list command shows each session's status: Active (backend confirms),
Stale (backend has stopped it), or Unknown (backend unreachable).
Runtime Temp Files
browser4-cli keeps ephemeral runtime artifacts under the system temp directory:
- Windows:
%TEMP%\browser4\browser4-cli - Linux/macOS:
${TMPDIR:-/tmp}/browser4/browser4-cli
This temp subtree contains items such as:
- startup logs for auto-started Browser4 servers
- staged Maven wrapper launchers
- Rust test scratch directories used by
browser4-clitests
Persistent CLI state remains under ~/.browser4 by default. The Browser4 runtime
bundle (JRE, JARs, launchers) is stored separately in a platform-conventional
data directory so that clearing CLI session state does not require re-downloading
the ~200 MB runtime:
- Linux:
~/.local/share/browser4/runtime/<version>/ - macOS:
~/Library/Application Support/browser4/runtime/<version>/ - Windows:
%APPDATA%/browser4/runtime/<version>/
The current.tag file in the runtime/ directory records the active version.
Override the runtime data root with the BROWSER4_RUNTIME_DIR environment variable.
Downloaded archives are cached under the platform cache directory
(~/.cache/browser4/downloads/ on Linux, ~/Library/Caches/browser4/downloads/
on macOS, %LOCALAPPDATA%/browser4/downloads/ on Windows).
Snapshots
After each command that modifies browser state, the CLI automatically:
- Retrieves the current page URL and title
- Captures an accessibility snapshot
- Saves the snapshot to
.browser4-cli/snapshot/page-<timestamp>.yml - Prints the snapshot path in Markdown link format
Examples
# Open a new browser window (defaults to headed)
browser4-cli open
# Open in headed or headless mode
browser4-cli open --headed https://browser4.io
browser4-cli open --headless https://browser4.io
# Navigate to a page — auto-opens a session if none is active
browser4-cli goto https://playwright.dev
# Inspect the page — note the eN labels on interactive nodes
browser4-cli snapshot
# Interact using refs from the snapshot
browser4-cli click e15
browser4-cli type "Hello World" e15
browser4-cli press Enter e15
browser4-cli eval "document.title"
browser4-cli eval "element => element.textContent" e15
browser4-cli keydown Shift
browser4-cli mousemove 150 300
browser4-cli mousewheel 0 100
browser4-cli keyup Shift
# Take a screenshot and save it to disk
browser4-cli screenshot
# Inspect tab indices before switching tabs
browser4-cli tab-list
browser4-cli tab-select 1
browser4-cli tab-close 1
# Use a custom server URL
browser4-cli open --server http://localhost:9090
# Advanced: execute multiple commands in one process (batch mode)
# Batch mode only supports DOM operations. You must run `open` separately first.
browser4-cli open
browser4-cli batch "goto https://playwright.dev" "snapshot"
# Advanced: stop on the first batch failure
browser4-cli batch --bail "goto https://playwright.dev" "click e1" "screenshot"
# Advanced: batch mode for form filling (recommended use case)
browser4-cli batch "fill e1 'John Doe'" "fill e2 '[email protected]'" "click e3"
# Advanced: pipe batch commands as JSON via stdin
echo '[
["goto", "https://example.com/form-filling"],
["click", "#reset-btn"],
["fill", "#first-name", "Bob"],
["fill", "#last-name", "Smith"],
["fill", "#email", "[email protected]"],
["select", "#country", "uk"],
["check", "#agree-terms"],
["click", "#submit-btn"]
]' | browser4-cli batch --json
# Close the session when done
browser4-cli close
# Close all sessions but keep the current Browser4 backend running
browser4-cli close-all
# Explicitly stop the Browser4 backend and clean up tracked Browser4 processes
browser4-cli kill-allArchitecture
The Rust CLI is structured as follows:
| Module | Purpose |
|---|---|
| main.rs | Entry point, command dispatch, session management |
| args.rs | CLI argument parsing (global flags, positional args, options) |
| commands.rs | Command definitions mapping to MCP tool names and parameters |
| http.rs | HTTP client for calling /mcp/call-tool |
| state.rs | Persistent state management for the default state file and named session files under ~/.browser4/ |
| daemon.rs | Local server auto-start (prefer Maven from repo root, fall back to jar) and health checking |
| managed_processes.rs | Registry for browser4 server processes |
| snapshot.rs | Snapshot and screenshot file helpers |
| help.rs | Help text generation |
Testing
## Run all tests (unit + end-to-end):
cargo test
## Run only the end-to-end tests and print their output:
cargo test --test e2e -- --nocapture
## Run a specific end-to-end test scenario:
cargo test --test e2e -- --nocapture --scenario=test_e2e_batch_form_submissionPublishing the CLI package
For maintainers, the CLI package now uses an npm version guard before publish.
The GitHub release workflow publishes the npm package via npm trusted publishing
(GitHub Actions OIDC) instead of NODE_AUTH_TOKEN. This avoids CI failures caused
by npm one-time-password challenges (EOTP).
- Local release entrypoint:
npm run release - Direct guarded publish entrypoint:
npm run publish:if-needed - GitHub release workflow: re-checks npm immediately before the publish step
If the local version in cli/package.json already matches the version currently
published on npm, the publish step is skipped automatically.
Examples:
# Check whether npm publish is needed
node scripts/check-npm-publish-needed.js --json
# Publish only when the local version differs from npm
npm run publish:if-needed
# Standard maintainer release command (also guarded)
npm run releaseFor local testing, you can override the detected remote version:
BROWSER4_CLI_NPM_REMOTE_VERSION=0.1.7 node scripts/check-npm-publish-needed.js --json
BROWSER4_CLI_NPM_REMOTE_VERSION=0.1.7 node scripts/publish-if-needed.js --dry-runLicense
Apache-2.0
