open-browse
v0.1.0
Published
TypeScript-native browser automation tool for AI agents
Maintainers
Readme
open-browse
TypeScript-native browser automation CLI — optimized for AI agents
32 commands for browser control via CLI. Every command returns structured JSON. Designed for AI agent workflows with zero parsing ambiguity.
Install
npm install -g open-browsePlaywright chromium is installed automatically via postinstall.
Quick Start
# Launch browser (auto-sets as current session)
browser-driver open --url https://app.com --name my-session
# See what's on the page
browser-driver state
# Interact
browser-driver click "#login-btn"
browser-driver fill "#email" "[email protected]"
# Take screenshot (base64 to stdout)
browser-driver screenshot
# Done
browser-driver closeCommand Reference
Session (5)
| Command | Description |
|---------|-------------|
| open --url <url> --name <name> [--use] | Launch browser, create session, auto-set as current |
| close | Close session and browser |
| sessions | List all active sessions |
| use <name> | Set current working session |
| current | Show current session |
Navigation (4)
| Command | Description |
|---------|-------------|
| goto <url> | Navigate to URL |
| back / forward / reload | History navigation |
Inspection (4)
| Command | Description |
|---------|-------------|
| state | Page URL, title, and interactive elements |
| dom <selector> | Get outer HTML |
| inspect <selector> | Element details (tag, bbox, styles) |
| source | Full page HTML source |
Interaction (8)
| Command | Description |
|---------|-------------|
| click <selector> | Click an element |
| fill <selector> <text> | Fill input field |
| select <selector> <value> | Select dropdown option |
| hover <selector> | Hover over element |
| scroll-to <selector> | Scroll element into viewport |
| scroll-by --y <px> | Scroll by pixel offset |
| screenshot [--out <file>] | Screenshot (base64 stdout or file) |
| upload <selector> <path> | Upload file to input |
Wait (3)
| Command | Description |
|---------|-------------|
| wait <selector> [--timeout <ms>] | Wait for element |
| wait-text <text> | Wait for text to appear |
| wait-url <pattern> | Wait for URL match |
Get (6)
| Command | Description |
|---------|-------------|
| get-title / get-url | Page title or URL |
| get-text <selector> | Element text content |
| get-value <selector> | Input/select value |
| get-html <selector> | Element HTML |
| get-attributes <selector> | All attributes |
Cookies (5)
| Command | Description |
|---------|-------------|
| cookies-get / cookies-clear | Get or clear all cookies |
| cookies-set <name> <value> | Set cookie |
| cookies-export / cookies-import <file> | Export/import JSON |
Snapshot & Diff (5)
| Command | Description |
|---------|-------------|
| snapshot <name> [--scope <sel>] | Save DOM + styles snapshot |
| diff dom/elements/style/attrs <a> <b> | Compare snapshots |
Script System (3)
| Command | Description |
|---------|-------------|
| run <script> [--args <json>] | Execute a script |
| list-scripts | List available scripts |
| describe <script> [--json] | Show script docs |
Agent Workflow Example
# Setup
browser-driver open --url https://app.com/login --name task1
# Observe → Act → Wait → Verify
browser-driver state
browser-driver fill "#email" "[email protected]"
browser-driver fill "#password" "secret123"
browser-driver click "#login-btn"
browser-driver wait-url "**/dashboard" --timeout 10000
browser-driver get-title
# → { "title": "Dashboard - App" }Key Features
- JSON everywhere — stdout for success, stderr for errors, no parsing ambiguity
- Session persistence — daemon keeps browser alive between commands (~50ms per command)
statecommand — returns all interactive elements indexed, the agent's primary viewsnapshot+diff— token-efficient change detection (125x fewer tokens than raw HTML)- Self-discovery —
list-scripts+describe --jsonfor runtime command discovery
Architecture
CLI (yargs) → HTTP → Daemon (Node.js) → Browser (Playwright)
↓
Session MapLicense
MIT
