npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

agent-tbrowser

v0.5.0

Published

Headless browser automation CLI for AI agents

Readme

agent-tbrowser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

Installation

npm (recommended)

npm install -g agent-tbrowser
agent-tbrowser install  # Download Chromium

From Source

git clone https://github.com/kokouaserge/custom-agent-browser
cd agent-tbrowser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-tbrowser available globally
agent-tbrowser install

Linux Dependencies

On Linux, install system dependencies:

agent-tbrowser install --with-deps
# or manually: npx playwright install-deps chromium

Quick Start

agent-tbrowser open example.com
agent-tbrowser snapshot                    # Get accessibility tree with refs
agent-tbrowser click @e2                   # Click by ref from snapshot
agent-tbrowser fill @e3 "[email protected]" # Fill by ref
agent-tbrowser get text @e1                # Get text by ref
agent-tbrowser screenshot page.png
agent-tbrowser close

Traditional Selectors (also supported)

agent-tbrowser click "#submit"
agent-tbrowser fill "#email" "[email protected]"
agent-tbrowser find role button click --name "Submit"

Commands

Core Commands

agent-tbrowser open <url>              # Navigate to URL (aliases: goto, navigate)
agent-tbrowser click <sel>             # Click element
agent-tbrowser dblclick <sel>          # Double-click element
agent-tbrowser focus <sel>             # Focus element
agent-tbrowser type <sel> <text>       # Type into element
agent-tbrowser fill <sel> <text>       # Clear and fill
agent-tbrowser press <key>             # Press key (Enter, Tab, Control+a) (alias: key)
agent-tbrowser keydown <key>           # Hold key down
agent-tbrowser keyup <key>             # Release key
agent-tbrowser hover <sel>             # Hover element
agent-tbrowser select <sel> <val>      # Select dropdown option
agent-tbrowser check <sel>             # Check checkbox
agent-tbrowser uncheck <sel>           # Uncheck checkbox
agent-tbrowser scroll <dir> [px]       # Scroll (up/down/left/right)
agent-tbrowser scrollintoview <sel>    # Scroll element into view (alias: scrollinto)
agent-tbrowser drag <src> <tgt>        # Drag and drop
agent-tbrowser upload <sel> <files>    # Upload files
agent-tbrowser screenshot [path]       # Take screenshot (--full for full page)
agent-tbrowser pdf <path>              # Save as PDF
agent-tbrowser snapshot                # Accessibility tree with refs (best for AI)
agent-tbrowser eval <js>               # Run JavaScript
agent-tbrowser close                   # Close browser (aliases: quit, exit)

Get Info

agent-tbrowser get text <sel>          # Get text content
agent-tbrowser get html <sel>          # Get innerHTML
agent-tbrowser get value <sel>         # Get input value
agent-tbrowser get attr <sel> <attr>   # Get attribute
agent-tbrowser get title               # Get page title
agent-tbrowser get url                 # Get current URL
agent-tbrowser get count <sel>         # Count matching elements
agent-tbrowser get box <sel>           # Get bounding box

Check State

agent-tbrowser is visible <sel>        # Check if visible
agent-tbrowser is enabled <sel>        # Check if enabled
agent-tbrowser is checked <sel>        # Check if checked

Find Elements (Semantic Locators)

agent-tbrowser find role <role> <action> [value]       # By ARIA role
agent-tbrowser find text <text> <action>               # By text content
agent-tbrowser find label <label> <action> [value]     # By label
agent-tbrowser find placeholder <ph> <action> [value]  # By placeholder
agent-tbrowser find alt <text> <action>                # By alt text
agent-tbrowser find title <text> <action>              # By title attr
agent-tbrowser find testid <id> <action> [value]       # By data-testid
agent-tbrowser find first <sel> <action> [value]       # First match
agent-tbrowser find last <sel> <action> [value]        # Last match
agent-tbrowser find nth <n> <sel> <action> [value]     # Nth match

Actions: click, fill, check, hover, text

Examples:

agent-tbrowser find role button click --name "Submit"
agent-tbrowser find text "Sign In" click
agent-tbrowser find label "Email" fill "[email protected]"
agent-tbrowser find first ".item" click
agent-tbrowser find nth 2 "a" text

Wait

agent-tbrowser wait <selector>         # Wait for element to be visible
agent-tbrowser wait <ms>               # Wait for time (milliseconds)
agent-tbrowser wait --text "Welcome"   # Wait for text to appear
agent-tbrowser wait --url "**/dash"    # Wait for URL pattern
agent-tbrowser wait --load networkidle # Wait for load state
agent-tbrowser wait --fn "window.ready === true"  # Wait for JS condition

Load states: load, domcontentloaded, networkidle

Mouse Control

agent-tbrowser mouse move <x> <y>      # Move mouse
agent-tbrowser mouse down [button]     # Press button (left/right/middle)
agent-tbrowser mouse up [button]       # Release button
agent-tbrowser mouse wheel <dy> [dx]   # Scroll wheel

Browser Settings

agent-tbrowser set viewport <w> <h>    # Set viewport size
agent-tbrowser set device <name>       # Emulate device ("iPhone 14")
agent-tbrowser set geo <lat> <lng>     # Set geolocation
agent-tbrowser set offline [on|off]    # Toggle offline mode
agent-tbrowser set headers <json>      # Extra HTTP headers
agent-tbrowser set credentials <u> <p> # HTTP basic auth
agent-tbrowser set media [dark|light]  # Emulate color scheme

Cookies & Storage

agent-tbrowser cookies                 # Get all cookies
agent-tbrowser cookies set <name> <val> # Set cookie
agent-tbrowser cookies clear           # Clear cookies

agent-tbrowser storage local           # Get all localStorage
agent-tbrowser storage local <key>     # Get specific key
agent-tbrowser storage local set <k> <v>  # Set value
agent-tbrowser storage local clear     # Clear all

agent-tbrowser storage session         # Same for sessionStorage

Network

agent-tbrowser network route <url>              # Intercept requests
agent-tbrowser network route <url> --abort      # Block requests
agent-tbrowser network route <url> --body <json>  # Mock response
agent-tbrowser network unroute [url]            # Remove routes
agent-tbrowser network requests                 # View tracked requests
agent-tbrowser network requests --filter api    # Filter requests

Tabs & Windows

agent-tbrowser tab                     # List tabs
agent-tbrowser tab new [url]           # New tab (optionally with URL)
agent-tbrowser tab <n>                 # Switch to tab n
agent-tbrowser tab close [n]           # Close tab
agent-tbrowser window new              # New window

Frames

agent-tbrowser frame <sel>             # Switch to iframe
agent-tbrowser frame main              # Back to main frame

Dialogs

agent-tbrowser dialog accept [text]    # Accept (with optional prompt text)
agent-tbrowser dialog dismiss          # Dismiss

Debug

agent-tbrowser trace start [path]      # Start recording trace
agent-tbrowser trace stop [path]       # Stop and save trace
agent-tbrowser console                 # View console messages
agent-tbrowser console --clear         # Clear console
agent-tbrowser errors                  # View page errors
agent-tbrowser errors --clear          # Clear errors
agent-tbrowser highlight <sel>         # Highlight element
agent-tbrowser state save <path>       # Save auth state
agent-tbrowser state load <path>       # Load auth state

Navigation

agent-tbrowser back                    # Go back
agent-tbrowser forward                 # Go forward
agent-tbrowser reload                  # Reload page

Setup

agent-tbrowser install                 # Download Chromium browser
agent-tbrowser install --with-deps     # Also install system deps (Linux)

Sessions

Run multiple isolated browser instances:

# Different sessions
agent-tbrowser --session agent1 open site-a.com
agent-tbrowser --session agent2 open site-b.com

# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-tbrowser click "#btn"

# List active sessions
agent-tbrowser session list
# Output:
# Active sessions:
# -> default
#    agent1

# Show current session
agent-tbrowser session

Each session has its own:

  • Browser instance
  • Cookies and storage
  • Navigation history
  • Authentication state

Snapshot Options

The snapshot command supports filtering to reduce output size:

agent-tbrowser snapshot                    # Full accessibility tree
agent-tbrowser snapshot -i                 # Interactive elements only (buttons, inputs, links)
agent-tbrowser snapshot -c                 # Compact (remove empty structural elements)
agent-tbrowser snapshot -d 3               # Limit depth to 3 levels
agent-tbrowser snapshot -s "#main"         # Scope to CSS selector
agent-tbrowser snapshot -i -c -d 5         # Combine options

| Option | Description | |--------|-------------| | -i, --interactive | Only show interactive elements (buttons, links, inputs) | | -c, --compact | Remove empty structural elements | | -d, --depth <n> | Limit tree depth | | -s, --selector <sel> | Scope to CSS selector |

Options

| Option | Description | |--------|-------------| | --session <name> | Use isolated session (or AGENT_BROWSER_SESSION env) | | --headers <json> | Set HTTP headers scoped to the URL's origin | | --executable-path <path> | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) | | --json | JSON output (for agents) | | --full, -f | Full page screenshot | | --name, -n | Locator name filter | | --exact | Exact text match | | --headed | Show browser window (not headless) | | --cdp <port> | Connect via Chrome DevTools Protocol | | --debug | Debug output |

Selectors

Refs (Recommended for AI)

Refs provide deterministic element selection from snapshots:

# 1. Get snapshot with refs
agent-tbrowser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]

# 2. Use refs to interact
agent-tbrowser click @e2                   # Click the button
agent-tbrowser fill @e3 "[email protected]" # Fill the textbox
agent-tbrowser get text @e1                # Get heading text
agent-tbrowser hover @e4                   # Hover the link

Why use refs?

  • Deterministic: Ref points to exact element from snapshot
  • Fast: No DOM re-query needed
  • AI-friendly: Snapshot + ref workflow is optimal for LLMs

CSS Selectors

agent-tbrowser click "#id"
agent-tbrowser click ".class"
agent-tbrowser click "div > button"

Text & XPath

agent-tbrowser click "text=Submit"
agent-tbrowser click "xpath=//button"

Semantic Locators

agent-tbrowser find role button click --name "Submit"
agent-tbrowser find label "Email" fill "[email protected]"

Agent Mode

Use --json for machine-readable output:

agent-tbrowser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}

agent-tbrowser get text @e1 --json
agent-tbrowser is visible @e2 --json

Optimal AI Workflow

# 1. Navigate and get snapshot
agent-tbrowser open example.com
agent-tbrowser snapshot -i --json   # AI parses tree and refs

# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-tbrowser click @e2
agent-tbrowser fill @e3 "input text"

# 4. Get new snapshot if page changed
agent-tbrowser snapshot -i --json

Headed Mode

Show the browser window for debugging:

agent-tbrowser open example.com --headed

This opens a visible browser window instead of running headless.

Authenticated Sessions

Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:

# Headers are scoped to api.example.com only
agent-tbrowser open api.example.com --headers '{"Authorization": "Bearer <token>"}'

# Requests to api.example.com include the auth header
agent-tbrowser snapshot -i --json
agent-tbrowser click @e2

# Navigate to another domain - headers are NOT sent (safe!)
agent-tbrowser open other-site.com

This is useful for:

  • Skipping login flows - Authenticate via headers instead of UI
  • Switching users - Start new sessions with different auth tokens
  • API testing - Access protected endpoints directly
  • Security - Headers are scoped to the origin, not leaked to other domains

To set headers for multiple origins, use --headers with each open command:

agent-tbrowser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-tbrowser open api.acme.com --headers '{"Authorization": "Bearer token2"}'

For global headers (all domains), use set headers:

agent-tbrowser set headers '{"X-Custom-Header": "value"}'

Custom Browser Executable

Use a custom browser executable instead of the bundled Chromium. This is useful for:

  • Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
  • System browsers: Use an existing Chrome/Chromium installation
  • Custom builds: Use modified browser builds

CLI Usage

# Via flag
agent-tbrowser --executable-path /path/to/chromium open example.com

# Via environment variable
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-tbrowser open example.com

Serverless Example (Vercel/AWS Lambda)

import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-tbrowser';

export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // ... use browser
}

SDK (Programmatic API)

The SDK provides natural language browser automation. Describe elements in plain language, and the SDK finds them using LLM + cache.

Quick Start

import { createRunner } from 'agent-tbrowser/sdk';

// Create LLM client (e.g., Anthropic)
const llm = {
  complete: async (prompt: string) => {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    });
    return {
      text: response.content[0].type === 'text' ? response.content[0].text : '',
      usage: {
        input: response.usage.input_tokens,
        output: response.usage.output_tokens,
      },
    };
  },
};

// Local browser
const runner = await createRunner({
  llm,
  local: { headless: false },
});

// Natural language automation
await runner.open('https://example.com');
await runner.click('the login button');
await runner.fill('email input', '[email protected]');
await runner.fill('password field', 'secret123');
await runner.click('submit button');

await runner.close();

Cloud Browsers

// TBrowser
const runner = await createRunner({
  mode: 'cloud',
  cloud: {
    provider: 'tbrowser',
    apiKey: process.env.TBROWSER_API_KEY,
    apiUrl: process.env.TBROWSER_API_URL,
  },
  llm,
});

// Browserbase
const runner = await createRunner({
  mode: 'cloud',
  cloud: {
    provider: 'browserbase',
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
  },
  llm,
});

// Hyperbrowser
const runner = await createRunner({
  mode: 'cloud',
  cloud: {
    provider: 'hyperbrowser',
    apiKey: process.env.HYPERBROWSER_API_KEY,
  },
  llm,
});

// Get session info
console.log('Session:', runner.getSessionId());
console.log('Provider:', runner.getCloudProvider());

SDK Actions

The SDK provides 70+ actions organized by category:

| Category | Actions | |----------|---------| | Navigation | open, back, forward, reload | | Interactions | click, dblclick, fill, type, select, check, uncheck, hover, focus, drag, dropFiles | | Keyboard | press, keyDown, keyUp | | Mouse | mouseMove, mouseDown, mouseUp, mouseWheel, scroll, scrollToElement | | Getters | getText, getHtml, getValue, getAttribute, getTitle, getUrl, getCount, getBoundingBox | | State | isVisible, isEnabled, isChecked, isHidden, isEditable | | Wait | waitForElement, waitForTimeout, waitForUrl, waitForPageLoad | | Tabs/Frames | newTab, switchTab, closeTab, listTabs, switchToFrame, switchToMainFrame | | Storage | getCookies, setCookie, clearCookies, getLocalStorage, setLocalStorageItem, clearLocalStorage | | Network | startRequestTracking, getRequests, mockRoute, blockRoute, setExtraHeaders, setOffline | | Debug | startTracing, stopTracing, highlight, getSnapshot, getDebugInfo, pdf | | Config | setViewport, setGeolocation, setColorScheme, setTimezone, setLocale |

See SDK Documentation for complete API reference.

CDP Mode

Connect to an existing browser via Chrome DevTools Protocol:

# Connect to Electron app
agent-tbrowser --cdp 9222 snapshot

# Connect to Chrome with remote debugging
# (Start Chrome with: google-chrome --remote-debugging-port=9222)
agent-tbrowser --cdp 9222 open about:blank

This enables control of:

  • Electron apps
  • Chrome/Chromium instances with remote debugging
  • WebView2 applications
  • Any browser exposing a CDP endpoint

Streaming (Browser Preview)

Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.

Enable Streaming

Set the AGENT_BROWSER_STREAM_PORT environment variable:

AGENT_BROWSER_STREAM_PORT=9223 agent-tbrowser open example.com

This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.

WebSocket Protocol

Connect to ws://localhost:9223 to receive frames and send input:

Receive frames:

{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}

Send mouse events:

{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}

Send keyboard events:

{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}

Send touch events:

{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}

Programmatic API

For advanced use, control streaming directly via the protocol:

import { BrowserManager } from 'agent-tbrowser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

// Start screencast
await browser.startScreencast((frame) => {
  // frame.data is base64-encoded image
  // frame.metadata contains viewport info
  console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
  format: 'jpeg',
  quality: 80,
  maxWidth: 1280,
  maxHeight: 720,
});

// Inject mouse events
await browser.injectMouseEvent({
  type: 'mousePressed',
  x: 100,
  y: 200,
  button: 'left',
});

// Inject keyboard events
await browser.injectKeyboardEvent({
  type: 'keyDown',
  key: 'Enter',
  code: 'Enter',
});

// Stop when done
await browser.stopScreencast();

Architecture

agent-tbrowser uses a client-daemon architecture:

  1. Rust CLI (fast native binary) - Parses commands, communicates with daemon
  2. Node.js Daemon - Manages Playwright browser instance
  3. Fallback - If native binary unavailable, uses Node.js directly

The daemon starts automatically on first command and persists between commands for fast subsequent operations.

Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.

Platforms

| Platform | Binary | Fallback | |----------|--------|----------| | macOS ARM64 | Native Rust | Node.js | | macOS x64 | Native Rust | Node.js | | Linux ARM64 | Native Rust | Node.js | | Linux x64 | Native Rust | Node.js | | Windows x64 | Native Rust | Node.js |

Usage with AI Agents

Just ask the agent

The simplest approach - just tell your agent to use it:

Use agent-tbrowser to test the login flow. Run agent-tbrowser --help to see available commands.

The --help output is comprehensive and most agents can figure it out from there.

AGENTS.md / CLAUDE.md

For more consistent results, add to your project or global instructions file:

## Browser Automation

Use `agent-tbrowser` for web automation. Run `agent-tbrowser --help` for all commands.

Core workflow:
1. `agent-tbrowser open <url>` - Navigate to page
2. `agent-tbrowser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-tbrowser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes

Claude Code Skill

For Claude Code, a skill provides richer context:

cp -r node_modules/agent-tbrowser/skills/agent-tbrowser .claude/skills/

Or download:

mkdir -p .claude/skills/agent-tbrowser
curl -o .claude/skills/agent-tbrowser/SKILL.md \
  https://raw.githubusercontent.com/kokouaserge/custom-agent-browser/main/skills/agent-tbrowser/SKILL.md

License

Apache-2.0