agentbrowse
v0.2.0
Agent-browser CLI: drive any website from the terminal.
Drive any website from the terminal — built for AI coding agents.
Agents (Claude Code, Codex, …) are great at running CLIs and clumsy at clicking through web UIs. agentbrowse gives them a clean, parseable surface: open a page, read it as token-bounded markdown, follow links, fill and submit forms, and operate behind a login — all from terminal commands, with a persistent browser session that survives across invocations.
There is no separate web interface to wire up. The agent runs agentbrowse and gets structured output back.
Install
npm install -g agentbrowse
# first run downloads the browser:
npx playwright install chromium
Quickstart
agentbrowse open https://example.com # navigate the session
agentbrowse read # current page as clean markdown
agentbrowse links # numbered, followable links
agentbrowse click "Learn more" # click by visible text...
agentbrowse click 2 # ...or by a number from links/find
agentbrowse read --json # structured output for machines
agentbrowse stop # end the session (frees the browser)
read and links accept an optional URL to open first, so agentbrowse read https://x.com is "open then read" in one step.
Make your agent use it by default
Install agentbrowse as a skill so your coding agent reaches for it automatically on web tasks — no prompting required:
npx agentbrowse skill # auto-detects Claude Code, Codex, Cursor, Gemini, Windsurf in this project
npx agentbrowse skill --global # or install once at the user level
It writes each agent's native format (a SKILL.md for Claude Code, a .cursor/rules rule for Cursor, an AGENTS.md block for Codex and others) and is safe to re-run (idempotent). Preview with --print; target one agent with npx agentbrowse skill claude|codex|cursor|gemini|windsurf.
Claude Code plugin — install the skill from this repo's built-in marketplace:
/plugin marketplace add mandarwagh9/agentbrowse-skill
/plugin install agentbrowse@agentbrowse
How it works
A background browser daemon (auto-spawned per session, local socket only) holds a live Playwright page, so state persists between separate commands — open, then later click, then read, all hit the same page. The daemon self-stops after inactivity.
Sessions are isolated by --session <id> (the default id is default), each with its own cookies and saved auth.
Commands
| Command | What it does |
|---|---|
| open <url> | Navigate the session to a URL |
| read [url] | Current page (or open <url> first) as token-bounded markdown (--max-chars, --page) |
| links [url] | Numbered, followable links (--filter) |
| snapshot [url] | Accessibility-tree view: every actionable element with a stable [ref], role, name, state (--filter, --max, --json). The robust way to act |
| find <text> | Locate elements by visible text (falls back to accessible name); numbers reusable by click |
| click <target> | Click by a snapshot ref (robust), visible text, a links/find number, or a CSS selector |
| type <field> <text> | Type into a field (CSS selector or bare name) |
| fill -f name=value … | Fill form fields |
| submit [form] | Submit the current form |
| login <url> | Open a real browser to authenticate once; persists the session for headless reuse |
| session save\|load\|clear | Manage saved auth/session state |
| stop | Stop the session's browser daemon |
Add --json to any command for structured output. Errors go to stderr as { "error": { code, message } } with a non-zero exit code (2 usage, 3 navigation, 4 target-not-found, 5 daemon).
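A calling script may want to branch on these exit codes rather than parse stderr. The sketch below maps a code to the category names listed above. agentbrowse_exit_name is a hypothetical helper, not something the CLI ships, and treating 0 as success is an assumption based on errors being described as non-zero.

```shell
# Hypothetical helper (not part of agentbrowse): translate an exit code into
# the category names documented above. Treating 0 as success is an assumption.
agentbrowse_exit_name() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "usage" ;;
    3) echo "navigation" ;;
    4) echo "target-not-found" ;;
    5) echo "daemon" ;;
    *) echo "unknown" ;;
  esac
}

# Example use right after a command (left commented so the sketch runs
# without the CLI installed):
#   agentbrowse click "Learn more" --json
#   agentbrowse_exit_name "$?"
```

An agent loop could use the returned name to decide, for example, to re-run snapshot or find after a target-not-found result instead of retrying the same click.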
Authentication
agentbrowse login https://site/login opens a real browser window for you to log in, which covers SSO, MFA, and captchas that an agent can't handle. On success it saves the session's cookies locally (~/.webcli/sessions/<id>/, gitignored); the agent then operates headlessly with that session. Credentials are typed into the browser by a human and are never passed as CLI arguments.
Site manifests (optional)
Point agentbrowse at a site.agent.json to expose named, high-level commands for a specific site:
agentbrowse --site ./notion.agent.json search "roadmap"
A manifest declares pages, selectors, and commands as ordered steps with {pages.*} / {selectors.*} / {arg} / ${ENV} interpolation. Schema version: webcli-manifest-v0. See AGENTS.md for the agent-usage guide and the manifest format.
License
Free to use, including commercially — but not copyable. You may install, run,
and use agentbrowse; you may not copy, fork, redistribute, or modify it. See
LICENSE. (Versions 0.0.1–0.1.1 were released under MIT and keep that license.)
