browser-driver-cli
v0.6.0
Published
TypeScript-native browser automation tool for AI agents
Maintainers
Readme
browser-driver-cli
TypeScript-native browser automation CLI — optimized for AI agents
40+ commands organized into subcommand groups for browser control via CLI. Every command returns structured JSON. Designed for AI agent workflows with zero parsing ambiguity.
Install
npm install -g browser-driver-cliPlaywright chromium is installed automatically via postinstall.
Quick Start
# Launch browser (auto-sets as current session)
browser-driver open https://app.com --name my-session
# See what's on the page
browser-driver state
# Interact
browser-driver click "#login-btn"
browser-driver fill "#email" "[email protected]"
# Take screenshot (base64 to stdout)
browser-driver screenshot
# Done
browser-driver closeCommand Reference
Session
| Command | Description |
|---------|-------------|
| open <url> [--name <name>] [--from <browser>] | Launch browser, create session |
| close | Close session and browser |
| sessions | List all active sessions |
| use <name> | Set current working session |
| current | Show current session |
Navigation — browser-driver navigation <cmd>
| Command | Description |
|---------|-------------|
| navigation goto <url> | Navigate to URL |
| navigation back | Navigate back |
| navigation forward | Navigate forward |
| navigation reload | Reload page |
Aliases:
goto,back,forward,reloadalso work as top-level commands.
Inspection — browser-driver inspection <cmd>
| Command | Description |
|---------|-------------|
| inspection state | Page URL, title, and interactive elements |
| inspection dom <selector> | Get DOM tree |
| inspection inspect <selector> | Element details (tag, bbox, styles) |
| inspection source | Full page HTML source |
| inspection save-page <output> | Save page as self-contained HTML |
Aliases:
state,dom,inspect,source,save-pagealso work as top-level commands.
Interaction — browser-driver interaction <cmd>
| Command | Description |
|---------|-------------|
| interaction click <selector> | Click an element |
| interaction fill <selector> <text> | Fill input field |
| interaction select <selector> <value> | Select dropdown option |
| interaction hover <selector> | Hover over element |
| interaction scroll-to <selector> | Scroll element into viewport |
| interaction scroll-by --y <px> | Scroll by pixel offset |
| interaction screenshot [--out <file>] | Screenshot (base64 stdout or file) |
| interaction upload <selector> <path> | Upload file to input |
Aliases:
click,fill,select,hover,scroll-to,scroll-by,screenshot,uploadalso work as top-level commands.
Wait — browser-driver wait <cmd>
| Command | Description |
|---------|-------------|
| wait selector <selector> [--timeout <ms>] | Wait for element |
| wait text <text> | Wait for text to appear |
| wait url <pattern> | Wait for URL match |
Get — browser-driver get <cmd>
| Command | Description |
|---------|-------------|
| get title / get url | Page title or URL |
| get text <selector> | Element text content |
| get value <selector> | Input/select value |
| get html <selector> | Element HTML |
| get attributes <selector> | All attributes |
Cookies — browser-driver cookies <cmd>
| Command | Description |
|---------|-------------|
| cookies get / cookies clear | Get or clear all cookies |
| cookies set <name> <value> | Set cookie |
| cookies export <file> / cookies import <file> | Export/import JSON |
Tab — browser-driver tab <cmd>
| Command | Description |
|---------|-------------|
| tab new <url> [--name <name>] | Open new tab |
| tab list | List all tabs |
| tab switch <name> | Switch to tab |
| tab close <name> | Close a tab |
Snapshot — browser-driver snapshot <cmd>
| Command | Description |
|---------|-------------|
| snapshot take <name> [--scope <sel>] | Save DOM + styles snapshot |
| snapshot diff <a> <b> [--mode <mode>] | Compare snapshots (dom/elements/style/attrs) |
Script — browser-driver script <cmd>
| Command | Description |
|---------|-------------|
| script new --name <n> [--scope user\|project] | Generate script scaffold (script + test + fixture) |
| script list | List all scripts grouped by source |
| script doctor [name] | Diagnose script health |
| script where <name> | Show script file path |
| script edit <name> | Open in $EDITOR (core scripts → use fork) |
| script fork <name> [--scope user\|project] | Copy core script for customization |
| script reload | Rescan user script directories |
Utility
| Command | Description |
|---------|-------------|
| run <script> [--args <json>] | Execute a script by name or path |
| describe <name> | Show script documentation |
| list-scripts | List available scripts (alias for script list) |
| config | Show resolved configuration |
| network | Query network request/response log |
| errors | Query JavaScript errors |
| console | Query console messages |
User Script Development (SDD/TDD)
Create custom scripts with built-in scaffolding:
# Generate scaffold with Agent Guide checklist
browser-driver script new --name gmail-spam --scope user
# Generated files:
# ~/.browser-driver/scripts/gmail-spam.ts ← script + cli metadata
# ~/.browser-driver/scripts/gmail-spam.test.ts ← bun:test with Tier 1/2
# ~/.browser-driver/scripts/gmail-spam.fixture.html ← placeholder for save-page
# Save real page as fixture
browser-driver save-page /path/to/gmail-spam.fixture.html
# Run tests
bun test gmail-spam.test.ts
# Diagnose script health
browser-driver script doctor gmail-spamTesting utilities available from the package:
import { serveFixture, runScript } from 'browser-driver/testing';Agent Workflow Example
# Setup
browser-driver open https://app.com/login --name task1
# Observe → Act → Wait → Verify
browser-driver state
browser-driver fill "#email" "[email protected]"
browser-driver fill "#password" "secret123"
browser-driver click "#login-btn"
browser-driver wait url "**/dashboard" --timeout 10000
browser-driver get title
# → { "title": "Dashboard - App" }Key Features
- JSON everywhere — stdout for success, stderr for errors, no parsing ambiguity
- Session persistence — daemon keeps browser alive between commands (~50ms per command)
statecommand — returns all interactive elements indexed, the agent's primary viewsnapshot+diff— token-efficient change detection (125x fewer tokens than raw HTML)- Subcommand groups — related commands organized by category, old flat names still work as aliases
- Script scaffolding —
script newgenerates SDD/TDD-ready templates with Agent Guide checklists - Self-discovery —
list-scripts+describefor runtime command discovery
Architecture
CLI (yargs) → HTTP → Daemon (Node.js) → Browser (Playwright)
↓
Session MapLicense
MIT
