agent-tbrowser
v0.5.0
Published
Headless browser automation CLI for AI agents
Maintainers
Readme
agent-tbrowser
Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
Installation
npm (recommended)
npm install -g agent-tbrowser
agent-tbrowser install # Download ChromiumFrom Source
git clone https://github.com/kokouaserge/custom-agent-browser
cd agent-tbrowser
pnpm install
pnpm build
pnpm build:native # Requires Rust (https://rustup.rs)
pnpm link --global # Makes agent-tbrowser available globally
agent-tbrowser installLinux Dependencies
On Linux, install system dependencies:
agent-tbrowser install --with-deps
# or manually: npx playwright install-deps chromiumQuick Start
agent-tbrowser open example.com
agent-tbrowser snapshot # Get accessibility tree with refs
agent-tbrowser click @e2 # Click by ref from snapshot
agent-tbrowser fill @e3 "[email protected]" # Fill by ref
agent-tbrowser get text @e1 # Get text by ref
agent-tbrowser screenshot page.png
agent-tbrowser closeTraditional Selectors (also supported)
agent-tbrowser click "#submit"
agent-tbrowser fill "#email" "[email protected]"
agent-tbrowser find role button click --name "Submit"Commands
Core Commands
agent-tbrowser open <url> # Navigate to URL (aliases: goto, navigate)
agent-tbrowser click <sel> # Click element
agent-tbrowser dblclick <sel> # Double-click element
agent-tbrowser focus <sel> # Focus element
agent-tbrowser type <sel> <text> # Type into element
agent-tbrowser fill <sel> <text> # Clear and fill
agent-tbrowser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
agent-tbrowser keydown <key> # Hold key down
agent-tbrowser keyup <key> # Release key
agent-tbrowser hover <sel> # Hover element
agent-tbrowser select <sel> <val> # Select dropdown option
agent-tbrowser check <sel> # Check checkbox
agent-tbrowser uncheck <sel> # Uncheck checkbox
agent-tbrowser scroll <dir> [px] # Scroll (up/down/left/right)
agent-tbrowser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
agent-tbrowser drag <src> <tgt> # Drag and drop
agent-tbrowser upload <sel> <files> # Upload files
agent-tbrowser screenshot [path] # Take screenshot (--full for full page)
agent-tbrowser pdf <path> # Save as PDF
agent-tbrowser snapshot # Accessibility tree with refs (best for AI)
agent-tbrowser eval <js> # Run JavaScript
agent-tbrowser close # Close browser (aliases: quit, exit)Get Info
agent-tbrowser get text <sel> # Get text content
agent-tbrowser get html <sel> # Get innerHTML
agent-tbrowser get value <sel> # Get input value
agent-tbrowser get attr <sel> <attr> # Get attribute
agent-tbrowser get title # Get page title
agent-tbrowser get url # Get current URL
agent-tbrowser get count <sel> # Count matching elements
agent-tbrowser get box <sel> # Get bounding boxCheck State
agent-tbrowser is visible <sel> # Check if visible
agent-tbrowser is enabled <sel> # Check if enabled
agent-tbrowser is checked <sel> # Check if checkedFind Elements (Semantic Locators)
agent-tbrowser find role <role> <action> [value] # By ARIA role
agent-tbrowser find text <text> <action> # By text content
agent-tbrowser find label <label> <action> [value] # By label
agent-tbrowser find placeholder <ph> <action> [value] # By placeholder
agent-tbrowser find alt <text> <action> # By alt text
agent-tbrowser find title <text> <action> # By title attr
agent-tbrowser find testid <id> <action> [value] # By data-testid
agent-tbrowser find first <sel> <action> [value] # First match
agent-tbrowser find last <sel> <action> [value] # Last match
agent-tbrowser find nth <n> <sel> <action> [value] # Nth matchActions: click, fill, check, hover, text
Examples:
agent-tbrowser find role button click --name "Submit"
agent-tbrowser find text "Sign In" click
agent-tbrowser find label "Email" fill "[email protected]"
agent-tbrowser find first ".item" click
agent-tbrowser find nth 2 "a" textWait
agent-tbrowser wait <selector> # Wait for element to be visible
agent-tbrowser wait <ms> # Wait for time (milliseconds)
agent-tbrowser wait --text "Welcome" # Wait for text to appear
agent-tbrowser wait --url "**/dash" # Wait for URL pattern
agent-tbrowser wait --load networkidle # Wait for load state
agent-tbrowser wait --fn "window.ready === true" # Wait for JS conditionLoad states: load, domcontentloaded, networkidle
Mouse Control
agent-tbrowser mouse move <x> <y> # Move mouse
agent-tbrowser mouse down [button] # Press button (left/right/middle)
agent-tbrowser mouse up [button] # Release button
agent-tbrowser mouse wheel <dy> [dx] # Scroll wheelBrowser Settings
agent-tbrowser set viewport <w> <h> # Set viewport size
agent-tbrowser set device <name> # Emulate device ("iPhone 14")
agent-tbrowser set geo <lat> <lng> # Set geolocation
agent-tbrowser set offline [on|off] # Toggle offline mode
agent-tbrowser set headers <json> # Extra HTTP headers
agent-tbrowser set credentials <u> <p> # HTTP basic auth
agent-tbrowser set media [dark|light] # Emulate color schemeCookies & Storage
agent-tbrowser cookies # Get all cookies
agent-tbrowser cookies set <name> <val> # Set cookie
agent-tbrowser cookies clear # Clear cookies
agent-tbrowser storage local # Get all localStorage
agent-tbrowser storage local <key> # Get specific key
agent-tbrowser storage local set <k> <v> # Set value
agent-tbrowser storage local clear # Clear all
agent-tbrowser storage session # Same for sessionStorageNetwork
agent-tbrowser network route <url> # Intercept requests
agent-tbrowser network route <url> --abort # Block requests
agent-tbrowser network route <url> --body <json> # Mock response
agent-tbrowser network unroute [url] # Remove routes
agent-tbrowser network requests # View tracked requests
agent-tbrowser network requests --filter api # Filter requestsTabs & Windows
agent-tbrowser tab # List tabs
agent-tbrowser tab new [url] # New tab (optionally with URL)
agent-tbrowser tab <n> # Switch to tab n
agent-tbrowser tab close [n] # Close tab
agent-tbrowser window new # New windowFrames
agent-tbrowser frame <sel> # Switch to iframe
agent-tbrowser frame main # Back to main frameDialogs
agent-tbrowser dialog accept [text] # Accept (with optional prompt text)
agent-tbrowser dialog dismiss # DismissDebug
agent-tbrowser trace start [path] # Start recording trace
agent-tbrowser trace stop [path] # Stop and save trace
agent-tbrowser console # View console messages
agent-tbrowser console --clear # Clear console
agent-tbrowser errors # View page errors
agent-tbrowser errors --clear # Clear errors
agent-tbrowser highlight <sel> # Highlight element
agent-tbrowser state save <path> # Save auth state
agent-tbrowser state load <path> # Load auth stateNavigation
agent-tbrowser back # Go back
agent-tbrowser forward # Go forward
agent-tbrowser reload # Reload pageSetup
agent-tbrowser install # Download Chromium browser
agent-tbrowser install --with-deps # Also install system deps (Linux)Sessions
Run multiple isolated browser instances:
# Different sessions
agent-tbrowser --session agent1 open site-a.com
agent-tbrowser --session agent2 open site-b.com
# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-tbrowser click "#btn"
# List active sessions
agent-tbrowser session list
# Output:
# Active sessions:
# -> default
# agent1
# Show current session
agent-tbrowser sessionEach session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Snapshot Options
The snapshot command supports filtering to reduce output size:
agent-tbrowser snapshot # Full accessibility tree
agent-tbrowser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-tbrowser snapshot -c # Compact (remove empty structural elements)
agent-tbrowser snapshot -d 3 # Limit depth to 3 levels
agent-tbrowser snapshot -s "#main" # Scope to CSS selector
agent-tbrowser snapshot -i -c -d 5 # Combine options| Option | Description |
|--------|-------------|
| -i, --interactive | Only show interactive elements (buttons, links, inputs) |
| -c, --compact | Remove empty structural elements |
| -d, --depth <n> | Limit tree depth |
| -s, --selector <sel> | Scope to CSS selector |
Options
| Option | Description |
|--------|-------------|
| --session <name> | Use isolated session (or AGENT_BROWSER_SESSION env) |
| --headers <json> | Set HTTP headers scoped to the URL's origin |
| --executable-path <path> | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) |
| --json | JSON output (for agents) |
| --full, -f | Full page screenshot |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --headed | Show browser window (not headless) |
| --cdp <port> | Connect via Chrome DevTools Protocol |
| --debug | Debug output |
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
# 1. Get snapshot with refs
agent-tbrowser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]
# 2. Use refs to interact
agent-tbrowser click @e2 # Click the button
agent-tbrowser fill @e3 "[email protected]" # Fill the textbox
agent-tbrowser get text @e1 # Get heading text
agent-tbrowser hover @e4 # Hover the linkWhy use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
CSS Selectors
agent-tbrowser click "#id"
agent-tbrowser click ".class"
agent-tbrowser click "div > button"Text & XPath
agent-tbrowser click "text=Submit"
agent-tbrowser click "xpath=//button"Semantic Locators
agent-tbrowser find role button click --name "Submit"
agent-tbrowser find label "Email" fill "[email protected]"Agent Mode
Use --json for machine-readable output:
agent-tbrowser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-tbrowser get text @e1 --json
agent-tbrowser is visible @e2 --jsonOptimal AI Workflow
# 1. Navigate and get snapshot
agent-tbrowser open example.com
agent-tbrowser snapshot -i --json # AI parses tree and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-tbrowser click @e2
agent-tbrowser fill @e3 "input text"
# 4. Get new snapshot if page changed
agent-tbrowser snapshot -i --jsonHeaded Mode
Show the browser window for debugging:
agent-tbrowser open example.com --headedThis opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
# Headers are scoped to api.example.com only
agent-tbrowser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
# Requests to api.example.com include the auth header
agent-tbrowser snapshot -i --json
agent-tbrowser click @e2
# Navigate to another domain - headers are NOT sent (safe!)
agent-tbrowser open other-site.comThis is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
agent-tbrowser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-tbrowser open api.acme.com --headers '{"Authorization": "Bearer token2"}'For global headers (all domains), use set headers:
agent-tbrowser set headers '{"X-Custom-Header": "value"}'Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
- Serverless deployment: Use lightweight Chromium builds like
@sparticuz/chromium(~50MB vs ~684MB) - System browsers: Use an existing Chrome/Chromium installation
- Custom builds: Use modified browser builds
CLI Usage
# Via flag
agent-tbrowser --executable-path /path/to/chromium open example.com
# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-tbrowser open example.comServerless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-tbrowser';
export async function handler() {
const browser = new BrowserManager();
await browser.launch({
executablePath: await chromium.executablePath(),
headless: true,
});
// ... use browser
}SDK (Programmatic API)
The SDK provides natural language browser automation. Describe elements in plain language, and the SDK finds them using LLM + cache.
Quick Start
import { createRunner } from 'agent-tbrowser/sdk';
// Create LLM client (e.g., Anthropic)
const llm = {
complete: async (prompt: string) => {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 1024,
messages: [{ role: 'user', content: prompt }],
});
return {
text: response.content[0].type === 'text' ? response.content[0].text : '',
usage: {
input: response.usage.input_tokens,
output: response.usage.output_tokens,
},
};
},
};
// Local browser
const runner = await createRunner({
llm,
local: { headless: false },
});
// Natural language automation
await runner.open('https://example.com');
await runner.click('the login button');
await runner.fill('email input', '[email protected]');
await runner.fill('password field', 'secret123');
await runner.click('submit button');
await runner.close();Cloud Browsers
// TBrowser
const runner = await createRunner({
mode: 'cloud',
cloud: {
provider: 'tbrowser',
apiKey: process.env.TBROWSER_API_KEY,
apiUrl: process.env.TBROWSER_API_URL,
},
llm,
});
// Browserbase
const runner = await createRunner({
mode: 'cloud',
cloud: {
provider: 'browserbase',
apiKey: process.env.BROWSERBASE_API_KEY,
projectId: process.env.BROWSERBASE_PROJECT_ID,
},
llm,
});
// Hyperbrowser
const runner = await createRunner({
mode: 'cloud',
cloud: {
provider: 'hyperbrowser',
apiKey: process.env.HYPERBROWSER_API_KEY,
},
llm,
});
// Get session info
console.log('Session:', runner.getSessionId());
console.log('Provider:', runner.getCloudProvider());SDK Actions
The SDK provides 70+ actions organized by category:
| Category | Actions |
|----------|---------|
| Navigation | open, back, forward, reload |
| Interactions | click, dblclick, fill, type, select, check, uncheck, hover, focus, drag, dropFiles |
| Keyboard | press, keyDown, keyUp |
| Mouse | mouseMove, mouseDown, mouseUp, mouseWheel, scroll, scrollToElement |
| Getters | getText, getHtml, getValue, getAttribute, getTitle, getUrl, getCount, getBoundingBox |
| State | isVisible, isEnabled, isChecked, isHidden, isEditable |
| Wait | waitForElement, waitForTimeout, waitForUrl, waitForPageLoad |
| Tabs/Frames | newTab, switchTab, closeTab, listTabs, switchToFrame, switchToMainFrame |
| Storage | getCookies, setCookie, clearCookies, getLocalStorage, setLocalStorageItem, clearLocalStorage |
| Network | startRequestTracking, getRequests, mockRoute, blockRoute, setExtraHeaders, setOffline |
| Debug | startTracing, stopTracing, highlight, getSnapshot, getDebugInfo, pdf |
| Config | setViewport, setGeolocation, setColorScheme, setTimezone, setLocale |
See SDK Documentation for complete API reference.
CDP Mode
Connect to an existing browser via Chrome DevTools Protocol:
# Connect to Electron app
agent-tbrowser --cdp 9222 snapshot
# Connect to Chrome with remote debugging
# (Start Chrome with: google-chrome --remote-debugging-port=9222)
agent-tbrowser --cdp 9222 open about:blankThis enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
Enable Streaming
Set the AGENT_BROWSER_STREAM_PORT environment variable:
AGENT_BROWSER_STREAM_PORT=9223 agent-tbrowser open example.comThis starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
WebSocket Protocol
Connect to ws://localhost:9223 to receive frames and send input:
Receive frames:
{
"type": "frame",
"data": "<base64-encoded-jpeg>",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}Send mouse events:
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}Send keyboard events:
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}Send touch events:
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}Programmatic API
For advanced use, control streaming directly via the protocol:
import { BrowserManager } from 'agent-tbrowser';
const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast
await browser.startScreencast((frame) => {
// frame.data is base64-encoded image
// frame.metadata contains viewport info
console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
format: 'jpeg',
quality: 80,
maxWidth: 1280,
maxHeight: 720,
});
// Inject mouse events
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
});
// Inject keyboard events
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter',
});
// Stop when done
await browser.stopScreencast();Architecture
agent-tbrowser uses a client-daemon architecture:
- Rust CLI (fast native binary) - Parses commands, communicates with daemon
- Node.js Daemon - Manages Playwright browser instance
- Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
Platforms
| Platform | Binary | Fallback | |----------|--------|----------| | macOS ARM64 | Native Rust | Node.js | | macOS x64 | Native Rust | Node.js | | Linux ARM64 | Native Rust | Node.js | | Linux x64 | Native Rust | Node.js | | Windows x64 | Native Rust | Node.js |
Usage with AI Agents
Just ask the agent
The simplest approach - just tell your agent to use it:
Use agent-tbrowser to test the login flow. Run agent-tbrowser --help to see available commands.The --help output is comprehensive and most agents can figure it out from there.
AGENTS.md / CLAUDE.md
For more consistent results, add to your project or global instructions file:
## Browser Automation
Use `agent-tbrowser` for web automation. Run `agent-tbrowser --help` for all commands.
Core workflow:
1. `agent-tbrowser open <url>` - Navigate to page
2. `agent-tbrowser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-tbrowser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changesClaude Code Skill
For Claude Code, a skill provides richer context:
cp -r node_modules/agent-tbrowser/skills/agent-tbrowser .claude/skills/Or download:
mkdir -p .claude/skills/agent-tbrowser
curl -o .claude/skills/agent-tbrowser/SKILL.md \
https://raw.githubusercontent.com/kokouaserge/custom-agent-browser/main/skills/agent-tbrowser/SKILL.mdLicense
Apache-2.0
