browser-automation-skill
v0.75.0
Published
MCP server + bash CLI that drives a real browser from Claude Code, OpenAI Codex, and other MCP clients by routing across chrome-devtools-mcp, playwright-cli, playwright-lib, and obscura adapters. 45 verbs covering site/session/credential management, snaps
Maintainers
Readme
browser-automation-skill
A Claude Code skill, OpenAI Codex plugin, and MCP server for driving real browsers from an LLM. 45 verbs + Webwright delegation + a per-action audit surface routed across four primitive tools (chrome-devtools-mcp / playwright-cli / playwright-lib / obscura), with a 5-tier cache defense chain (cached selector -> fingerprint rescue -> local-VLM rescue -> cloud LLM -> user fixup) that lets agents skip LLM ref-resolution on repeat actions and per-schema state migration tooling. Credentials and sessions stay strictly local under $HOME/.browser-skill/.
Status: v0.75.0 current. Phases 1-16 plus P0 multi-step orchestration hardening shipped: stateful sessions, persistent CDP, real YAML flows, Webwright delegation, delegate offload telemetry, session TTL auto-relogin, and doctor/readiness checks.
scripts/lib/node/mcp-server.mjspublishes 6 verbs (open/snapshot/click/fill/extract/list-sites) over JSON-RPC NDJSON for external agents; full CLI remains available from the repo checkout. Full bats: 1202/1202 green. Architecture map:docs/ARCHITECTURE.md. New contributors:CONTRIBUTING.md.One-command enable Path 3 cache rescue:
bash scripts/browser-vlm.sh install-env(idempotent — appends env exports to~/.zshrc; lazy auto-start handles the rest).
What it does
- Sites + sessions + credentials. Register sites; capture/restore Playwright
storageState; store credentials in keychain (macOS) / libsecret (Linux) / plaintext-with-typed-confirmation; rotate TOTP shared secrets. - Navigation + interaction.
open·snapshot(eN-indexed accessibility tree) ·click/fill/hover/press/select/drag/uploadby--ref eNor--selector CSS·wait·route(network mock) · multi-tab (tab-list/tab-switch/tab-close). - Capture pipelines.
inspectaggregates console + network (sanitized HAR) + screenshot.auditruns Lighthouse. All captures persist under~/.browser-skill/captures/<NNN>/withmeta.json+ per-aspect files; auto-prune at retention thresholds (default: 500 captures / 14 days; baselines exempt). - Declarative flow runner.
flow run task.flow.yamlexecutes a YAML flow with${var}+${refs.NAME}templating. Top-levelsite:+session:metadata is injected as per-step--site/--asunless a step overrides it, so storageState validation and daemon routing are inherited.flow recordwrapsplaywright codegen(password-canary write-side:/password/ibecomes${secrets.password}placeholder; literal dropped).replay <id>re-executes a capture's steps + emits structured per-step diff.history list/show/diff/clear+baseline save/list/removefor managing the capture corpus. - Per-archetype memory cache (Phase 11).
browser-do --intent "click delete" --pattern '/devices/:id'looks up cached selector for the(site, archetype, intent)triple; on hit, dispatches the existing verb at zero LLM tokens; on miss, emitscache_missevent.browser-do recordfor explicit write-back.browser-do proposeauto-clusters URLs into patterns. Self-heal: 4 consecutive failures disable the cached selector; agent re-resolves + re-records to heal. - Per-action telemetry + balance-triangle audit (Phase 12). Every adapter call (
open/click/fill/snapshot/extract) emits one OTel-shaped JSONL event to~/.browser-skill/memory/stats.jsonl(mode 0600).browser-delegateadds secondary-LLM offload fields (delegate_steps,offloaded_input_tokens,offloaded_output_tokens, cached-input tokens, model/backend), andbrowser-stats reportseparates those from Claude-context token signals.browser-stats report --paretorolls events into a route × verb table: success rate, post-condition hit rate, token-proxy byte counts, p50 duration, $$ cost (whenCLAUDE_USAGE_*env injected), 14-value failure-mode histogram (Phase-14 addedunknown_failurecatch-all), andoblivious_successdetection (adapter said ok but post-condition assertion failed — the dominant invisible-error class for browser agents).browser-stats tunesurfaces worst-performing(verb, route)candidates for/autoresearchhandoff.browser-stats prune(Phase 14) closes the feedback loop: finds (site, selector) tuples with ≥3oblivious_successevents;--applydisables the matching cache interactions so cloud LLM re-derives.browser-stats mark <span> success|fail[:reason]records user overrides. Schema follows OpenInference + OTel GenAI v1.40 naming for forward-compat with Langfuse/Phoenix/Jaeger via OTLP exporter. Seereferences/browser-stats-cheatsheet.md. - Local-VLM cache rescue (Phase 14, Path 3). 5th tier in the cache defense chain — between Phase-13 fingerprint rescue and cloud-LLM fallback. When
BROWSER_SKILL_VISION_FALLBACK=1+BROWSER_SKILL_VISUAL_RESCUE_CMD=<path>set, browser-do invokes an external hook that probes whether the cached element is still semantically present. Bundled canonical probescripts/lib/visual-rescue-default.sh(text-mode v1) reads the accessibility-tree snapshot + asks a local OpenAI-compatible LLM (defaulthttp://127.0.0.1:8080— matchesbash scripts/browser-vlm.sh start) yes/no. Smart-skip whenfail_count ≥ 3(cache likely fundamentally broken; skip the probe). One env var pair viabash scripts/browser-vlm.sh install-envenables everything; lazy-start + 10-min idle-stop watchdog manage the llama-server lifecycle. Seereferences/recipes/visual-rescue-hook.md. - MCP server (Phase 14).
bash scripts/browser-mcp.sh servepublishes 6 verbs (open / snapshot / click / fill / extract / list-sites) over JSON-RPC NDJSON for external agents (Claude Code, OpenAI Codex, midscene, agent-browser, Stagehand, Continue, Cline). TOOLS auto-discovered from each adapter'stool_capabilities()+scripts/lib/node/mcp-tools.jsonallowlist — adding a verb to MCP is a 1-JSON-entry change. Env-var passthrough is whitelisted (AP-7: client'sOPENAI_API_KEYand other foreign secrets are filtered; onlyBROWSER_SKILL_*/MIDSCENE_MODEL_*/CLAUDE_*/PLAYWRIGHT_*/ etc inherit).browser_fillhas nosecretfield andadditionalProperties: false— secrets stay on the bash entry point via--secret-stdin. Seereferences/browser-mcp-cheatsheet.md. - Webwright delegation (Phase 15, opt-in).
browser-delegatecan offload a novel no-auth multi-step task to Webwright driven by a secondary LLM such as GLM. Delegation isoffby default, never router-selected automatically, writes task text through a mode-0600 task file instead of argv, and emits compact summary/offload telemetry back to the main agent. Seereferences/browser-delegate-cheatsheet.mdandreferences/webwright-setup.md.
Security at a glance
- Credentials are on disk only at
$HOME/.browser-skill/(mode 0700 dir, 0600 files). - Credentials never appear on argv, in
ps, in git, or in the agent transcript (AP-7 stdin-only pattern enforced viatests/argv_leak.bats). - Authenticated delegation is disabled. The future bridge is storageState-only and must never pass passwords, TOTP secrets, or credential backend payloads to Webwright. See
references/browser-delegate-auth-bridge.md. - Cache writes refuse
PASSWORD-CANARYsentinel (privacy guard inbrowser-do record). .gitignoreblocks every credential / session / capture / memory pattern from the repo..githooks/pre-commitrejects any staged file or diff that looks like a credential.- See
SECURITY.mdfor the full threat model +references/recipes/{privacy-canary,cache-write-security,path-security}.mdfor codified discipline.
Requirements
Skill itself (always required):
- bash ≥ 5.0 (
brew install bashon macOS — system bash 3.2 is too old; bash 5.0 needed for$EPOCHREALTIMEfast path used by the Phase-12 telemetry emitter) jqsqlite3(Phase 12 — lazy-built SQLite mirror atmemory/stats.db; standard on macOS and most Linux distros)
For real browser flows (install at least one):
- chrome-devtools-mcp (recommended; most-complete adapter):
npx -y chrome-devtools-mcp@latest - playwright-cli:
npm i -g playwright @playwright/test @playwright/cli && playwright install chromium - playwright-lib: requires
node+npm i -g playwright(driver lazy-imports) - obscura (single-binary; scrape + stealth-only): download from https://github.com/h4ckf0r0day/obscura/releases
For tests: bats-core (brew install bats-core)
Optional for delegation: Webwright plus an Anthropic-compatible model key such as GLM. Follow references/webwright-setup.md. browser doctor reports Webwright readiness as advisory only.
browser doctor reports which adapters are present + install hints for missing ones.
Install
You can use this project three ways: as an MCP server (works with MCP-aware clients such as Claude Code, OpenAI Codex, Continue, Cline, midscene, Stagehand, and agent-browser), as a full Codex plugin (bundled skill + MCP server), or as a full Claude Code skill (all 44 bash verbs + the cache + audit surface).
Option A — MCP server only (via npm; any MCP client)
Zero-install via npx:
# One-off smoke test
printf '%s\n' \
'{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
'{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' \
| npx -y browser-automation-skill@latest serveWire into Claude Code (user-scope — every project on the machine):
claude mcp add browser-skill --scope user -- npx -y browser-automation-skill@latest serve
claude mcp list # → browser-skill: ... ✓ ConnectedWire into OpenAI Codex (shared by Codex CLI and the Codex app/IDE):
codex mcp add browser-skill -- npx -y browser-automation-skill@latest serve
codex mcp listEquivalent ~/.codex/config.toml entry:
[mcp_servers.browser-skill]
command = "npx"
args = ["-y", "browser-automation-skill@latest", "serve"]
startup_timeout_sec = 20
tool_timeout_sec = 606 tools become available: browser_open, browser_snapshot, browser_click, browser_fill, browser_extract, browser_list-sites. Pin a version (@0.75.0) for reproducibility, or omit @latest to track the registry tip. Phase 12 (v0.72.0+) auto-flips MCP tool output to TOON format for tabular verbs (40-65% token savings vs JSON, +0.4pp LLM parse accuracy; spec: docs/superpowers/specs/2026-05-22-toon-output-amendment.md).
Other MCP clients (Continue / Cline / midscene / Stagehand): add a stdio entry pointing at npx -y browser-automation-skill@latest serve. Protocol: MCP 2024-11-05, NDJSON over stdio.
Optional global install (skip the npx warmup):
npm i -g browser-automation-skill
browser-automation-skill serve # same as npx formNote: the MCP surface is intentionally a curated subset (6 verbs). For the full 44-verb CLI + cache + flow runner + telemetry, install as a skill (Option B or Option C).
Option B — Full Codex plugin (skill + bundled MCP server)
From GitHub:
codex plugin marketplace add xicv/browser-automation-skill
codex plugin add browser-automation-skill@browser-automation-skillFrom a local checkout:
git clone https://github.com/xicv/browser-automation-skill ~/Projects/browser-automation-skill
cd ~/Projects/browser-automation-skill
codex plugin marketplace add .
codex plugin add browser-automation-skill@browser-automation-skillThis installs the Codex plugin metadata from plugins/browser-automation-skill/.codex-plugin/plugin.json, the bundled skill from plugins/browser-automation-skill/skills/browser-automation-skill/SKILL.md, and the bundled MCP server from plugins/browser-automation-skill/.mcp.json. Codex stores plugin and MCP enablement in ~/.codex/config.toml, so the CLI and app/IDE see the same setup.
To refresh an existing local checkout after a release:
git pull --ff-only
codex plugin marketplace add .
codex plugin add browser-automation-skill@browser-automation-skill
codex mcp get browser-skillOption C — Full Claude Code skill (one machine, all your projects)
git clone https://github.com/xicv/browser-automation-skill ~/Projects/browser-automation-skill
cd ~/Projects/browser-automation-skill
./install.sh --with-hooks # --with-hooks enables the credential-leak pre-commit blockerSymlinks ~/.claude/skills/browser-automation-skill → repo. Creates ~/.browser-skill/ mode 0700. Runs doctor at the end.
Verify (in Claude Code)
/browser doctorExpected: exit 0; final line is a JSON summary with "status":"ok". Doctor also enumerates installed adapters.
Verify (in Codex)
/mcp
/skills
codex plugin listExpected: /mcp shows the browser-skill MCP server and codex plugin list shows browser-automation-skill@browser-automation-skill as installed and enabled. In /skills, Codex may list the bundled skill as browser-automation-skill or with its plugin-qualified name, browser-automation-skill:browser-automation-skill; it will not necessarily appear as a standalone directory under ~/.codex/skills.
Quickstart
# Register your first site
bash scripts/browser-add-site.sh --name myapp --url 'https://app.example.com'
bash scripts/browser-use.sh --set myapp
# Open + snapshot (uses chrome-devtools-mcp by default)
bash scripts/browser-open.sh --url 'https://app.example.com'
bash scripts/browser-snapshot.sh
# → emits aria_yaml + eN refs you can pass to click/fill/hover/etc.
# Click a ref
bash scripts/browser-click.sh --ref e3
# Or click by CSS (cacheable; required for browser-do cache dispatch)
bash scripts/browser-click.sh --selector 'button.delete'
# Phase 11 cache: record a learned selector once, dispatch zero-LLM-token thereafter
bash scripts/browser-do.sh record \
--site myapp --intent "click delete" \
--selector "button.delete" \
--url 'https://app.example.com/devices/123'
bash scripts/browser-do.sh \
--site myapp --verb click \
--intent "click delete" \
--pattern '/devices/:id'
# → cache hit; dispatches click; bumps success_count
# Phase 12 telemetry: every adapter call above emits one stats event automatically.
# Review the audit:
bash scripts/browser-stats.sh rebuild
bash scripts/browser-stats.sh report --days 7 --pareto
# Assert a post-condition so the audit can flag oblivious_success:
BROWSER_STATS_EXPECT_TYPE=url \
BROWSER_STATS_EXPECT_MATCH=include \
BROWSER_STATS_EXPECT_VALUE='/devices/123' \
bash scripts/browser-open.sh --url 'https://app.example.com/devices/123'Output contract
Every verb prints zero or more streaming JSON lines, then ends with a single-line JSON summary. Parse with jq; route on .status (ok, partial, error, empty, aborted).
$ bash scripts/browser-doctor.sh | tail -1 | jq .
{"verb":"doctor","tool":"none","why":"health-check","status":"ok","problems":0,"adapters_ok":4,"duration_ms":42}Layout
install.sh # preflight + state dir + symlink + (opt) hooks
uninstall.sh # remove symlink (state preserved)
.agents/plugins/ # repo marketplace entry for Codex plugin install
.claude-plugin/ # legacy-compatible marketplace entry Codex can import
plugins/ # Codex plugin wrapper (manifest, skill, MCP entry)
SKILL.md # Claude Code skill manifest (verb table; updated at every phase ship)
SECURITY.md # threat model + disclosure
.gitignore # blocks credential / session / capture / memory patterns
.githooks/pre-commit # credential-leak blocker
scripts/ # 42 verbs + browser-stats + 7 lib/ + 4 lib/tool/ adapters + lib/node/ driver helpers + lib/fingerprint-rescue.js + lib/migrators/{memory,recent_urls,stats}
tests/ # 1002 bats (25 new across Phases 12 + 13); runs in <60s
references/ # routing-heuristics + recipes (incl. fingerprint-rescue.md) + browser-stats-cheatsheet + stats-schema.json + stats-prices.json
docs/superpowers/ # design specs + per-phase plan-docs + HANDOFF.mdUninstall
./uninstall.shRemoves the ~/.claude/skills/browser-automation-skill symlink. State at ~/.browser-skill/ is preserved by default.
Roadmap
See docs/superpowers/specs/2026-04-27-browser-automation-skill-design.md for the design and docs/superpowers/plans/ for executable plans. Current "what's next" lives in docs/superpowers/HANDOFF.md (refreshed after every shipped PR).
v1.2 work ✅ COMPLETE. Remaining hardening (all opt-in, none blocking): Phase 11 v2 backlog A2-A6 (slug heuristic / --auto-record / pattern-equivalence canonicalization / self_heal_history[] audit trail / active observation recent_urls.jsonl); daemon e2e for playwright-lib selector path; press cache-scope decision codification; Phase 12 backlog (TOON output mode for tabular verbs, plugin-wrapper distribution shape, wire remaining 25 verbs to stats_run_adapter_emit); Phase 13 backlog (strong-fingerprint mode that captures dimensions at browser-do record time instead of parsing them out of the cached selector string; LLM-judge upgrade for the semantic post-condition matcher).
