browserman-cli

v0.3.0

Published

9 days ago

BrowserMan CLI for device authorization and MCP browser control


kinhunt


BrowserMan CLI

Connect AI agents to BrowserMan with a device-authorization CLI and MCP bridge.

BrowserMan CLI lets a user run setup from the terminal, approve access on the web, and then start an MCP server that targets their approved BrowserMan browser scope.

No dashboard code-copy flow required. Run setup in the terminal, approve on the web, then hand the resulting access to your agent.

What this npm package includes

This npm package contains the BrowserMan CLI and MCP bridge.

It is intended to be used through commands such as:

browserman setup
browserman setup start
browserman setup status
browserman setup finish
browserman doctor
browserman revoke
browserman mcp

It does not bundle the BrowserMan web server or Chrome extension. Those live in the main BrowserMan project and service.

Architecture

AI Agent  <-->  MCP Server  <-->  BrowserMan Server  <-->  Chrome Extension
(Claude)       (stdio/SSE)        (Express + WS)           (Manifest V3)

Chrome Extension — Runs in your browser, executes commands via Chrome DevTools Protocol
Server — Relay that bridges API calls to the extension via WebSocket
MCP Server — Exposes browser tools to AI agents using the Model Context Protocol
CLI — Starts the MCP server with your credentials

Install

npm install -g browserman-cli

One-off usage without global install:

npx browserman-cli setup

After install, the main commands are:

browserman setup
browserman setup start --json
browserman setup status --json
browserman setup finish --resume <deviceCode> --json
browserman doctor
browserman browser list --json
# Note: any command that accepts --browser takes a browser id, slug, or exact name.
browserman page open --url https://example.com
browserman page read --json
browserman revoke
browserman mcp

Direct CLI automation

These commands are the primary way for AI agents, shell scripts, and skills to use BrowserMan without MCP.

browserman browser list --json
# Note: any command that accepts --browser takes a browser id, slug, or exact name.
browserman browser current --json
browserman browser ping --json
browserman page open --url https://example.com --json
browserman page read --json
browserman page click --ref 12 --json
browserman page type --text "hello" --json
browserman page press --key Enter --json
browserman page eval --js "document.title" --json
browserman page url --json
browserman page form --ref 12 --value "kinhunt" --json
browserman page scroll --direction down --pixels 500 --json
browserman page screenshot --out ./page.png --json
browserman script list --json
browserman script run --site x.com --action search --text "browserman" --json

Important behavior for agent users:

browserman page ... commands now operate on the selected browser's current active page by default.
You do not need to manually provide a tabId or sessionId for normal CLI usage.
page open will reuse the active tab when possible.
If the selected browser has no usable tab yet, BrowserMan CLI will create a session-backed tab automatically and continue.
For the most stable agent integrations, prefer --json on browser/page/script commands.

Recommended mental model:

browserman browser current --json
browserman page open --url ... --json
browserman page read --json
browserman page click/type/press/... --json

Use browserman mcp when you specifically need an MCP server for an MCP-compatible client.
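
The recommended order above can be driven from a script. The following is a minimal Python sketch, not part of the package: the helper names (`build_argv`, `run_cli`, `call`, `open_and_read`) are hypothetical, and it assumes each browser/page command prints exactly one JSON document on stdout when given --json, which this README does not state explicitly.

```python
import json
import subprocess
from typing import Any, Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout as text.
Runner = Callable[[Sequence[str]], str]


def build_argv(*parts: str) -> List[str]:
    """Build a browserman command line, always requesting JSON output."""
    argv = ["browserman", *parts]
    if "--json" not in argv:
        argv.append("--json")
    return argv


def run_cli(argv: Sequence[str]) -> str:
    """Default runner: execute the CLI and return its stdout."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def call(*parts: str, runner: Runner = run_cli) -> Any:
    """Run one command and decode stdout.

    Assumption: the command prints a single JSON document.
    """
    return json.loads(runner(build_argv(*parts)))


def open_and_read(url: str, runner: Runner = run_cli) -> Any:
    """Follow the recommended order: current browser, open page, read page."""
    call("browser", "current", runner=runner)
    call("page", "open", "--url", url, runner=runner)
    return call("page", "read", runner=runner)
```

Passing the runner as a parameter keeps the sketch testable without a connected browser; in real use, `open_and_read("https://example.com")` would shell out to the installed CLI.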

Quick Start

1. Start the BrowserMan service

git clone <repo-url> && cd browserman
npm install
npm start
# Server running on http://localhost:3100

2. Sign in on the web and connect your browser

Open the BrowserMan website or local dashboard in your browser.
Sign in or create an account.
Install the Chrome extension from the extension/ folder.
Open the BrowserMan side panel and connect this browser to your account.

The normal product path is account-first:

sign in
connect browser
connect AI agent

You do not need to manually mint a long-lived bm_key_... or paste a bm_ext_... secret for the default CLI flow.

3. Approve CLI access from the terminal

Run setup from the terminal:

browserman setup

BrowserMan CLI prints a stable URL and code like:

BrowserMan authorization required.
Open: https://browserman.run/activate
Code: ABCD-EFGH

Then:

Open https://browserman.run/activate
Sign in if needed
Enter the code from the terminal
Review and narrow the final scope on the web
Approve the request
Return to CLI — BrowserMan finishes setup automatically

For agent-friendly automation:

browserman setup --json

For a task-oriented flow that does not depend on one long-lived interactive command:

browserman setup start --json
browserman setup status --json
browserman setup finish --resume <deviceCode> --json

Recommended standard flow for AI agents plus a human approver:

Agent runs browserman setup start --json --no-open.
Agent reads userCode, activateUrl, and finishCommand from the JSON result.
Agent shows the human the activation URL and code.
Human opens the web page, signs in, reviews scope, and approves.
Agent runs the exact finishCommand from the JSON result.
Agent expects a final setup_result with status: "completed" and code: "setup_completed".
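
The six steps above could be scripted roughly as follows. This is a hedged Python sketch: the function names are hypothetical, the runner is injected so nothing here depends on a real BrowserMan installation, and it assumes the result of each call is the last non-empty JSON line on stdout. It also appends --json to finishCommand when the flag is absent so the output stays parseable; that is a choice made for this sketch, not documented behavior.

```python
import json
import shlex
from typing import Callable, Dict, Sequence

Runner = Callable[[Sequence[str]], str]


def last_json_line(stdout: str) -> Dict[str, object]:
    """Decode the last non-empty line of stdout as JSON (an assumption)."""
    lines = [ln for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("command printed no output")
    return json.loads(lines[-1])


def start_setup(runner: Runner) -> Dict[str, object]:
    """Steps 1-2: start device authorization and read the fields to display."""
    out = runner(["browserman", "setup", "start", "--json", "--no-open"])
    result = last_json_line(out)
    for field in ("userCode", "activateUrl", "finishCommand"):
        if field not in result:
            raise KeyError(f"setup start result is missing {field}")
    return result


def approval_prompt(result: Dict[str, object]) -> str:
    """Step 3: the message the agent shows to the human approver."""
    return f"Open {result['activateUrl']} and enter code {result['userCode']}"


def finish_setup(start_result: Dict[str, object], runner: Runner) -> bool:
    """Steps 5-6: run finishCommand and check for the completed result."""
    argv = shlex.split(str(start_result["finishCommand"]))
    if "--json" not in argv:
        argv.append("--json")  # sketch-level choice so the output is JSON
    final = last_json_line(runner(argv))
    return (
        final.get("status") == "completed"
        and final.get("code") == "setup_completed"
    )
```

Step 4, the human approval on the web, happens between `approval_prompt` and `finish_setup`; the agent decides how long to wait.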

A successful setup start --json result has this shape:

status: "waiting_for_user"
code: "device_authorization_required"
userCode
activateUrl
deviceCode
finishCommand
resumeCommand
startCommand

If no local setup task exists yet, setup status --json and setup finish --json return a structured setup_result with:

status: "not_started"
code: "setup_state_missing"
startCommand: "browserman setup start"
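
An agent can turn these status values into a next action. The sketch below is one possible mapping in Python, using only the status, code, and command fields named in this README; the function name and the action labels it returns (`done`, `run`, `show_code`, `run_after_approval`, `unknown`) are hypothetical.

```python
from typing import Dict, Optional, Tuple


def next_step(result: Dict[str, object]) -> Tuple[str, Optional[str]]:
    """Map a setup result to (action, detail) for the calling agent."""
    status = result.get("status")
    if status == "completed":
        return ("done", None)
    if status == "not_started":
        # No local setup task yet: fall back to the documented start command.
        command = result.get("startCommand", "browserman setup start")
        return ("run", str(command))
    if status == "waiting_for_user":
        # The human still has to approve on the web.
        return ("show_code", f"{result.get('activateUrl')} {result.get('userCode')}")
    if status == "interrupted" and result.get("resumable") is True:
        return ("run_after_approval", str(result.get("finishCommand")))
    return ("unknown", None)
```

Returning a plain tuple keeps the decision separate from process handling, so the same function works for setup start, setup status, and setup finish output.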

Important JSON contract for agents:

browserman setup --json may emit multiple JSON lines, not just one.
Progress lines use kind: "setup_event".
The final command outcome uses kind: "setup_result".
Agents should branch on kind, not on line position alone.
setup_result payloads now expose stable top-level fields like status, code, resumable, deviceCode, userCode, activateUrl, pollAfterMs, expiresAt, statusCommand, finishCommand, and resumeCommand.

Typical agent pattern:

Read each JSON line.
If kind === "setup_event", update progress state.
If kind === "setup_result", treat it as the final return value for that CLI call.
If status === "interrupted" and resumable === true, tell the user to approve on the web and then run finishCommand.

Example event line:

{"kind":"setup_event","type":"waiting_for_user","status":"waiting_for_user","code":"approval_pending","resumable":true,"deviceCode":"bm_dev_xxx"}

Example final result line:

{"kind":"setup_result","ok":true,"command":"setup","status":"interrupted","code":"setup_waiting_for_user","resumable":true,"deviceCode":"bm_dev_xxx","finishCommand":"browserman setup finish --resume bm_dev_xxx"}
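
The line-by-line contract above can be consumed with a small reader that branches on kind. This Python sketch is illustrative only; `consume_setup_output` and `user_instruction` are hypothetical names, and lines with an unrecognized kind are ignored here as a defensive assumption.

```python
import json
from typing import Callable, Dict, Iterable, Optional

Payload = Dict[str, object]


def consume_setup_output(
    lines: Iterable[str],
    on_event: Optional[Callable[[Payload], None]] = None,
) -> Optional[Payload]:
    """Read JSON lines and return the setup_result payload, if any."""
    result: Optional[Payload] = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        payload = json.loads(raw)
        kind = payload.get("kind")
        if kind == "setup_event":
            # Progress only: hand it to the caller, keep reading.
            if on_event is not None:
                on_event(payload)
        elif kind == "setup_result":
            # Final outcome of this CLI call, wherever it appears.
            result = payload
    return result


def user_instruction(result: Payload) -> Optional[str]:
    """Build the message for an interrupted but resumable setup."""
    if result.get("status") == "interrupted" and result.get("resumable") is True:
        return f"Approve on the web, then run: {result['finishCommand']}"
    return None
```

Feeding the two example lines above through `consume_setup_output` yields one progress event and a final result whose `finishCommand` can be shown to the user.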

4. Validate the saved delegated setup

browserman doctor
browserman browser list --json
# Note: any command that accepts --browser takes a browser id, slug, or exact name.
browserman page open --url https://example.com --json
browserman page read --json

5. Connect AI agents

Claude Desktop / Claude Code

After browserman setup, BrowserMan stores delegated config locally. The simplest MCP config is:

{
  "mcpServers": {
    "browserman": {
      "command": "browserman",
      "args": ["mcp"]
    }
  }
}

If you need to override the saved config explicitly:

{
  "mcpServers": {
    "browserman": {
      "command": "browserman",
      "args": [
        "mcp",
        "--server", "http://localhost:3100",
        "--token", "bm_dlg_xxx",
        "--browser", "ext_xxx"
      ]
    }
  }
}
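
If you generate client configuration from a script, both variants above can come from one helper. This is a Python sketch with hypothetical function names; where the resulting JSON is written depends on the MCP client and is deliberately left out.

```python
import json
from typing import Dict, List, Optional


def browserman_entry(
    server: Optional[str] = None,
    token: Optional[str] = None,
    browser: Optional[str] = None,
) -> Dict[str, object]:
    """Build the mcpServers entry; overrides are added only when given."""
    args: List[str] = ["mcp"]
    for flag, value in (("--server", server), ("--token", token), ("--browser", browser)):
        if value is not None:
            args += [flag, value]
    return {"command": "browserman", "args": args}


def merge_into_config(config: Dict[str, object], entry: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of a client config with the browserman server set."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["browserman"] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Prints the simplest config, relying on the saved delegated setup.
    print(json.dumps(merge_into_config({}, browserman_entry()), indent=2))
```

Copying the dictionaries before editing leaves any other servers already present in the client's config untouched.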

Cursor / Other MCP Clients

Use the same pattern: prefer saved delegated config from browserman setup, and only fall back to explicit --server / --token / --browser when you need a fully scripted override.

SSE Transport (Remote)

browserman mcp \
  --server http://localhost:3100 \
  --token bm_dlg_xxx \
  --browser ext_xxx \
  --transport sse \
  --port 3001
# MCP server available at http://localhost:3001/sse

MCP Tools

Browser Discovery

| Tool | Description |
|------|-------------|
| browserman_list_browsers | List visible connected browsers and the saved default when available |
| browserman_current_browser | Show which browser MCP will target by default |

Core Browser Operations

| Tool | Description |
|------|-------------|
| browser_status | Check if extension is connected |
| browser_navigate | Go to a URL |
| browser_read_page | Get accessibility tree (element refs) |
| browser_screenshot | Capture page screenshot |
| browser_click | Click element by ref |
| browser_type | Type text at cursor |
| browser_form_input | Set form field value by ref |
| browser_press_key | Press keyboard key |
| browser_scroll | Scroll page or element into view |
| browser_evaluate | Run JavaScript in page |
| browser_get_url | Get current URL |
| browser_new_tab | Open new tab |
| browser_upload_file | Upload file to input |
| browser_task_complete | Signal task done (hides overlay) |

Platform Automation Scripts

| Tool | Description |
|------|-------------|
| browser_list_scripts | List available platforms and their actions |
| browser_run_script | Execute a pre-built multi-step automation script |

BrowserMan ships with pre-built scripts that combine multiple low-level operations into single, reliable actions. For sites without scripts, the AI falls back to core browser tools.

Detailed social media script reference: docs/social-media-platform-scripts.md

Supported Platforms

For the full action inventory implemented in code, see docs/social-media-platform-scripts.md.

X (Twitter) — `x.com`

| Action | Description |
|--------|-------------|
| post | Create a tweet (with optional media) |
| like | Like a tweet by URL |
| reply | Reply to a tweet |
| retweet | Retweet a post |
| quote_retweet | Quote retweet with comment |
| bookmark | Bookmark a tweet |
| search | Search tweets |
| get_timeline | Read the home timeline |
| follow / unfollow | Follow or unfollow a user |
| get_notifications | Read your notifications |

LinkedIn — `linkedin.com`

| Action | Description |
|--------|-------------|
| post | Create a post (with optional images) |
| article | Publish a long-form article (Markdown supported) |
| like | Like/react to a post |
| comment | Comment on a post |
| get_feed | Read the LinkedIn feed |
| follow / unfollow | Follow or unfollow a user/company |
| get_notifications | Read your notifications |
| send_connection | Send a connection request (with optional note) |
| search | Search people, posts, or companies |

Reddit — `reddit.com`

| Action | Description |
|--------|-------------|
| post | Submit a post to a subreddit |
| comment | Comment on a post |
| vote | Upvote or downvote |
| get_feed | Read a subreddit or home feed |
| subscribe / unsubscribe | Join or leave a subreddit |
| get_notifications | Read inbox/notifications |
| search | Search Reddit |

Medium — `medium.com`

| Action | Description |
|--------|-------------|
| article | Publish an article (Markdown supported) |
| get_feed | Read the Medium feed |
| follow | Follow an author |
| clap | Clap for an article (1–50 claps) |

Any website works. For sites without pre-built scripts, the AI uses core browser tools (navigate, read, click, type, etc.) to automate any task.

How It Works

Two Operation Modes

Pre-built Scripts — Single-command automation for supported platforms. The AI calls browser_run_script with a platform and action (e.g., x.com + post), and BrowserMan handles the entire multi-step flow.
Core Browser Tools — Direct low-level control for any website. The AI reads the page structure, identifies elements, and performs actions step by step. This is the fallback for sites without pre-built scripts.
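
The choice between the two modes can be expressed as a simple lookup. The Python sketch below uses the CLI equivalents shown earlier in this README (script run, page open, page read) and a hand-copied subset of the Supported Platforms tables; `KNOWN_SCRIPTS` and `plan` are hypothetical names. A real agent would take the inventory from browserman script list --json, whose output shape is not documented here, and would add action-specific flags such as --text.

```python
from typing import Dict, List, Set

# Subset of the Supported Platforms tables, copied by hand for illustration.
KNOWN_SCRIPTS: Dict[str, Set[str]] = {
    "x.com": {
        "post", "like", "reply", "retweet", "quote_retweet", "bookmark",
        "search", "get_timeline", "follow", "unfollow", "get_notifications",
    },
    "medium.com": {"article", "get_feed", "follow", "clap"},
}


def plan(
    site: str,
    action: str,
    url: str,
    scripts: Dict[str, Set[str]] = KNOWN_SCRIPTS,
) -> List[List[str]]:
    """Return the command lines to run for a task on a site."""
    if action in scripts.get(site, set()):
        # Mode 1: a pre-built script handles the whole multi-step flow.
        return [["browserman", "script", "run", "--site", site, "--action", action, "--json"]]
    # Mode 2: fall back to core operations; the agent continues step by
    # step (click, type, press, ...) after reading the page.
    return [
        ["browserman", "page", "open", "--url", url, "--json"],
        ["browserman", "page", "read", "--json"],
    ]
```

For example, `plan("x.com", "search", "https://x.com")` yields a single script run command, while an unlisted site yields the open-then-read sequence.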

Smart Tab Management (v0.2.0)

BrowserMan manages browser tabs intelligently:

Tab Reuse — Navigating to a platform reuses an existing tab instead of opening a new one
Auto Cleanup — Tabs created by scripts are automatically closed when done; tabs that existed before are left alone
Unified Tab Group — All BrowserMan-managed tabs are organized in a single Chrome Tab Group for a clean workspace
Activity-Driven Lifecycle — Sessions stay alive based on activity, not arbitrary timeouts
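
To make the reuse and cleanup rules concrete, here is a toy Python model. It is not BrowserMan's implementation: the `TabPool` class, its fields, and matching tabs by an exact site string are all assumptions made for illustration, and tab groups and activity tracking are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TabPool:
    """Toy model of tab reuse and auto cleanup (illustrative only)."""

    tabs: Dict[int, str] = field(default_factory=dict)   # tab id -> site
    created: Set[int] = field(default_factory=set)       # ids opened by scripts
    next_id: int = 1

    def _new(self, site: str) -> int:
        tab_id = self.next_id
        self.next_id += 1
        self.tabs[tab_id] = site
        return tab_id

    def add_existing(self, site: str) -> int:
        """Register a tab the user already had open."""
        return self._new(site)

    def acquire(self, site: str) -> int:
        """Tab Reuse: return an existing tab for the site, else open one."""
        for tab_id, tab_site in self.tabs.items():
            if tab_site == site:
                return tab_id
        tab_id = self._new(site)
        self.created.add(tab_id)
        return tab_id

    def cleanup(self) -> List[int]:
        """Auto Cleanup: close only tabs that scripts created."""
        closed = sorted(self.created)
        for tab_id in closed:
            del self.tabs[tab_id]
        self.created.clear()
        return closed
```

The key point the model captures is the ownership split: a tab that existed before a script ran is never in `created`, so cleanup cannot close it.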

REST API

All API endpoints require an Authorization: Bearer <token> header. Both session tokens (bm_sess_) and API keys (bm_key_) are accepted.

Auth

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /auth/register | Create account (email, password) |
| POST | /auth/login | Login, get session token |
| POST | /auth/logout | Invalidate session |
| GET | /auth/me | Get current user |

Commands

# Send a command to the browser
curl -X POST http://localhost:3100/api/command \
  -H "Authorization: Bearer bm_key_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "extension": "bm_ext_xxx",
    "action": "navigate",
    "params": {"url": "https://example.com"}
  }'
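
The same request can be built from Python with only the standard library. This sketch mirrors the curl call above; the function name is hypothetical, and because the response format is not described in this README, the example stops at constructing the request.

```python
import json
import urllib.request
from typing import Dict


def build_command_request(
    base_url: str,
    token: str,
    extension: str,
    action: str,
    params: Dict[str, object],
) -> urllib.request.Request:
    """Build the POST /api/command request without sending it."""
    body = json.dumps(
        {"extension": extension, "action": action, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/command",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen`, for example with `build_command_request("http://localhost:3100", "bm_key_xxx", "bm_ext_xxx", "navigate", {"url": "https://example.com"})`.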

Extensions

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/extensions | List your extensions |
| POST | /api/extensions | Create extension |
| PATCH | /api/extensions/:id | Update name |
| DELETE | /api/extensions/:id | Delete extension |

API Keys

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/keys | List keys (prefix only) |
| POST | /api/keys | Create key (full key shown once) |
| DELETE | /api/keys/:id | Revoke key |

Deployment

Fly.io

# Install flyctl, then:
fly launch          # first time
fly deploy          # subsequent deploys
fly secrets set PORT=8080  # if needed

The included fly.toml and Dockerfile are pre-configured:

Multi-stage build (compiles native SQLite bindings)
Persistent volume at /app/data for the database
Auto-stop when idle, auto-start on request
Force HTTPS

Docker

docker build -t browserman .
docker run -p 3100:8080 -v browserman-data:/app/data browserman

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 3100 | Server port |
| HOSTNAME | 0.0.0.0 | Bind address |

Development

npm run dev          # Start with --watch (auto-reload)
npm run seed         # Create test user/extension/key
bash scripts/test-auth.sh  # Test auth flow

Runtime data note:

data/ is a local runtime data directory used for the SQLite database and other machine-local state. It is not part of the source tree and is ignored by git.

Security Notes

Passwords stored as salted SHA-256 hashes
API keys and session tokens stored as SHA-256 hashes (never plaintext)
Extension keys (bm_ext_) are stored in plaintext for WebSocket auth lookup
Sessions expire after 30 days
All API endpoints validate ownership (users can only access their own resources)
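
As an illustration of the hash-only storage described for API keys and session tokens, the Python sketch below stores a SHA-256 digest and compares it in constant time. It is a generic pattern, not the server's code: the function names are hypothetical and the random part after the bm_key_ prefix uses an assumed format.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_token(token: str) -> str:
    """Return the hex SHA-256 digest that would be stored for a token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_api_key() -> Tuple[str, str]:
    """Create a key and its stored hash.

    The full key is returned once to the caller; only the hash is kept.
    The 32 hex characters after the prefix are an assumed format.
    """
    token = "bm_key_" + secrets.token_hex(16)
    return token, hash_token(token)


def verify(token: str, stored_hash: str) -> bool:
    """Check a presented token against the stored digest in constant time."""
    return hmac.compare_digest(hash_token(token), stored_hash)
```

Because only the digest is persisted, a leaked database row does not reveal a usable key, which matches the "full key shown once" behavior in the API Keys table.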

License

MIT