browserclaw-agent
v0.7.0
Browser automation agent loop powered by browserclaw
The AI agent for browserclaw.
Layered, not bundled
Three separate layers, not one monolith:
- ⚡ LLM — the electricity. Your choice: Claude, GPT, Gemini, local. No lock-in.
- 😎 The agent — the driver. Reasoning, obstacle recovery, learned skills. That's this project.
- 🏎️ BrowserClaw — the vehicle. Snapshots, element refs, browser control. Standalone npm library.
browser-use welds these into one package. We keep them as separate, swappable layers — drop the library into your own agent, pair this agent with any LLM, or run the whole stack as-is.
vs browser-use
Different lineage, different design. OpenClaw took the Playwright MCP approach — Microsoft's snapshot-and-ref pattern — implemented it locally, and refined it into browserclaw, a standalone npm library. This agent rides on that library. browser-use rolled its own Python stack as one bundled package.
| | browserclaw | browser-use |
| :-------------------------- | :----------------: | :----------------: |
| Standalone browser library | :white_check_mark: | :x: |
| Works with any agent | :white_check_mark: | :heavy_minus_sign: |
| Auto-learns skills per site | :white_check_mark: | :x: |
| Solves anti-bot in OSS | :white_check_mark: | :heavy_minus_sign: |
| TypeScript / Node | :white_check_mark: | :x: |

:white_check_mark: = Yes, :heavy_minus_sign: = Partial, :x: = No
What the agent does
The agent reads an accessibility snapshot of the page, decides what to do next, and executes the action. Up to 100 steps per run. It maintains a memory scratchpad across steps and evaluates whether each action succeeded before deciding the next move.
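As a rough illustration of that cycle, here is a minimal sketch with the snapshot, LLM call, and action execution injected as functions. Every name in it (`runAgent`, `Decision`, `LoopDeps`, `previousActionSucceeded`) is hypothetical and not taken from the browserclaw-agent source; only the 100-step cap, the memory scratchpad, and the evaluate-then-decide order come from the description above.

```typescript
// Hypothetical shapes for illustration; not the agent's real internal types.
type Action =
  | { kind: "click"; ref: string }
  | { kind: "type"; ref: string; text: string }
  | { kind: "done"; answer: string };

interface Decision {
  previousActionSucceeded: boolean; // the model's verdict on the last action
  memory: string;                   // updated scratchpad carried to the next step
  action: Action;
}

interface LoopDeps {
  snapshot: () => Promise<string>;
  decide: (input: { task: string; snapshot: string; memory: string }) => Promise<Decision>;
  execute: (action: Action) => Promise<void>;
}

interface RunResult {
  steps: number;
  answer: string | null; // null when the step cap is reached without "done"
  failedSteps: number;
}

const MAX_STEPS = 100; // step cap stated in the README

async function runAgent(task: string, deps: LoopDeps): Promise<RunResult> {
  let memory = "";
  let failedSteps = 0;
  for (let step = 1; step <= MAX_STEPS; step++) {
    const snapshot = await deps.snapshot();
    const decision = await deps.decide({ task, snapshot, memory });
    // The first step has no previous action to evaluate.
    if (step > 1 && !decision.previousActionSucceeded) failedSteps++;
    memory = decision.memory;
    if (decision.action.kind === "done") {
      return { steps: step, answer: decision.action.answer, failedSteps };
    }
    await deps.execute(decision.action);
  }
  return { steps: MAX_STEPS, answer: null, failedSteps };
}
```

Injecting `snapshot`, `decide`, and `execute` keeps the loop independent of any particular browser library or LLM provider, which matches the layered design described earlier.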
snapshot → LLM → action → repeat
Built-in skills
When the agent hits common obstacles, built-in skills take over automatically — no prompting needed:
- Anti-bot bypass — Detects and solves "hold to verify" overlays and press-and-hold challenges via CDP
- Cloudflare Turnstile — Solves "Verify you are human" checkboxes by locating and clicking the Turnstile iframe via CDP
- Popup dismissal — Closes cookie banners, consent dialogs, and modals using multi-strategy detection
- Loop detection — Detects when the agent is stuck repeating the same action and nudges it toward a different approach
- Tab manager — Detects and switches to new tabs opened during automation
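To make the loop-detection idea concrete, here is a minimal sketch that flags a run as stuck when the last few actions are identical and produces a nudge for the next prompt. The `ActionRecord` shape, the threshold of three, and the function names are assumptions for illustration; the README does not describe how the agent implements this internally.

```typescript
// Hypothetical action record; the agent's real types are not shown in this README.
interface ActionRecord {
  action: string; // e.g. "click", "type"
  ref?: string;   // element ref from the snapshot, e.g. "e12"
  text?: string;
}

// Collapse an action into a comparable string key.
function actionKey(a: ActionRecord): string {
  return `${a.action}|${a.ref ?? ""}|${a.text ?? ""}`;
}

// True when the last `threshold` actions are all identical.
function isStuck(history: ActionRecord[], threshold = 3): boolean {
  if (history.length < threshold) return false;
  const recentKeys = history.slice(-threshold).map(actionKey);
  return new Set(recentKeys).size === 1;
}

// Text to append to the next prompt when a loop is detected, otherwise null.
function nudgeFor(history: ActionRecord[], threshold = 3): string | null {
  const last = history[history.length - 1];
  if (!last || !isStuck(history, threshold)) return null;
  const target = last.ref ?? "the page";
  return `You have repeated "${last.action}" on ${target} ${threshold} times without progress. Try a different approach.`;
}
```

Comparing only the most recent actions keeps the check cheap; a real implementation might also compare page snapshots to confirm that nothing changed between attempts.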
Skill catalog
Every successful run generates a skill file — steps and tips for that domain, stored in MinIO. On the next run against the same domain, the agent loads the skill as a playbook instead of exploring from scratch. If the new run completes in fewer steps, the skill is replaced. One domain, one skill, always improving.
The first user to automate a domain pays the exploration cost. Every subsequent run benefits from the learned playbook — and refines it further.
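The replacement rule can be sketched as follows, with an in-memory `Map` standing in for MinIO. The `Skill` shape and the `SkillCatalog` class are hypothetical; only the "one skill per domain, replaced when a run takes fewer steps" policy comes from the text above.

```typescript
// Hypothetical skill shape: the README only says a skill holds steps and tips for a domain.
interface Skill {
  domain: string;
  steps: string[];
  tips: string[];
}

// In-memory stand-in for the MinIO-backed store.
class SkillCatalog {
  private skills = new Map<string, Skill>();

  // Playbook for a domain, or undefined if the domain has not been automated yet.
  load(domain: string): Skill | undefined {
    return this.skills.get(domain);
  }

  // Store the skill from a successful run.
  // Returns true if it was saved, false if the existing skill is at least as short.
  recordSuccess(candidate: Skill): boolean {
    const existing = this.skills.get(candidate.domain);
    if (existing && candidate.steps.length >= existing.steps.length) {
      return false;
    }
    this.skills.set(candidate.domain, candidate);
    return true;
  }
}
```

For example, a first run on `example.com` that took five steps is stored as-is; a later four-step run replaces it, while a six-step run is discarded.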
Quick start
Local (dev mode)
Chrome opens on your desktop. No containers, no VNC.
Requires: Node.js 22+, Chrome installed
cd src/Services/Browser
cp .env.example .env.local
# Set LLM_PROVIDER and at least one API key (see LLM providers below)
npm install
npm run dev
Start a run:
curl -X POST http://localhost:5040/api/v1/sessions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Find apartments in NYC under $3000"}'
Stream progress:
curl http://localhost:5040/api/v1/sessions/{id}/stream
Docker (full stack)
Runs the frontend, browser service (headless Chrome + VNC), MinIO (skill storage), and Traefik. Same setup as browserclaw.org.
git clone https://github.com/idan-rubin/browserclaw-agent.git
cd browserclaw-agent
cp src/Services/Browser/.env.example src/Services/Browser/.env.local
# Set LLM_PROVIDER and at least one API key
# Set BROWSER_INTERNAL_TOKEN to a random secret (used for frontend → browser service auth):
echo "BROWSER_INTERNAL_TOKEN=$(openssl rand -hex 32)" >> .env
docker compose up
Open localhost.
LLM providers
Add at least one API key to .env.local and set LLM_PROVIDER:
| Provider | Env var | LLM_PROVIDER | Free tier |
| ----------------------------- | -------------------- | -------------- | ----------------- |
| Groq | GROQ_API_KEY | groq | Yes |
| Google Gemini | GEMINI_API_KEY | gemini | Yes |
| OpenAI | OPENAI_API_KEY | openai | No |
| OpenAI (ChatGPT subscription) | OPENAI_OAUTH_TOKEN | openai-oauth | No (subscription) |
| Anthropic | ANTHROPIC_API_KEY | anthropic | No |
Set LLM_MODEL to override the default model for your provider.
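The table can be read as a lookup from `LLM_PROVIDER` to the credential variable it requires. The sketch below validates an environment object against that mapping; `resolveLlmConfig` and `LlmConfig` are hypothetical names, and this is not how the browser service necessarily loads its configuration.

```typescript
// Provider ids and env var names copied from the table above.
const PROVIDER_KEYS = {
  groq: "GROQ_API_KEY",
  gemini: "GEMINI_API_KEY",
  openai: "OPENAI_API_KEY",
  "openai-oauth": "OPENAI_OAUTH_TOKEN",
  anthropic: "ANTHROPIC_API_KEY",
} as const;

type Provider = keyof typeof PROVIDER_KEYS;

interface LlmConfig {
  provider: Provider;
  credential: string;
  model: string | undefined; // undefined means "use the provider's default model"
}

function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(PROVIDER_KEYS, value);
}

// Validate an environment (for example process.env) and pick the matching credential.
function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const provider = env["LLM_PROVIDER"] ?? "";
  if (!isProvider(provider)) {
    const known = Object.keys(PROVIDER_KEYS).join(", ");
    throw new Error(`LLM_PROVIDER must be one of: ${known}`);
  }
  const keyName = PROVIDER_KEYS[provider];
  const credential = env[keyName];
  if (!credential) {
    throw new Error(`${keyName} must be set when LLM_PROVIDER=${provider}`);
  }
  return { provider, credential, model: env["LLM_MODEL"] };
}
```

Failing fast with a message that names the missing variable tends to be easier to debug than a provider error on the first LLM call.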
Other features
- BYOK — Users can pass their own LLM API key per session for multi-tenant deployments
- User interaction — The agent can pause mid-run to ask for information (MFA codes, credentials)
- SSE streaming — Real-time step-by-step progress events
- Content moderation — Rejects harmful prompts before execution
- SSRF protection — Private network access blocked by default
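For clients that consume the SSE stream without a browser `EventSource`, the wire format can be parsed by hand. The sketch below follows the generic Server-Sent Events framing (blank-line-separated events with `event:` and `data:` fields); the event names and payloads the browser service actually emits are not documented here, so `step` in the usage note is only a placeholder.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when the stream omits an event field
  data: string;  // multiple data lines are joined with "\n"
}

// Parse all complete events in `buffer`; return them plus the unparsed tail,
// which the caller should prepend to the next chunk from the network.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last block may be incomplete
  const events: SseEvent[] = [];

  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

Calling `parseSse('event: step\ndata: {"n":1}\n\ndata: par')` yields one event named `step` and leaves `data: par` in `rest` until the remainder of that event arrives.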
Bring your own agent
Don't want this agent? Use the browserclaw library directly with any LLM.
npm install browserclaw
Requires Chrome, Brave, Edge, or Chromium installed on your machine.
import { BrowserClaw } from "browserclaw";
const browser = await BrowserClaw.launch({ headless: false });
const page = await browser.open("https://example.com");
const { snapshot, refs } = await page.snapshot();
// snapshot: text tree of the page
// refs: { "e1": { role: "link", name: "More info" }, ... }
await page.click("e1");
await page.type("e3", "hello");
await browser.stop();
snapshot() returns a text representation of the page with numbered refs. Pass it to any LLM, get back a ref, call the action. Here's a minimal agent loop:
import { BrowserClaw } from "browserclaw";
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const browser = await BrowserClaw.launch({ headless: false });
const page = await browser.open("https://news.ycombinator.com");
const history = [];
for (let step = 0; step < 20; step++) {
  const { snapshot } = await page.snapshot();
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system:
      "You control a browser. Given a page snapshot, return JSON: { action, ref?, text?, url?, reasoning }. Actions: click, type, navigate, done.",
    messages: [
      ...history,
      {
        role: "user",
        content: `Task: Find the top 3 AI posts.\n\nPage:\n${snapshot}`,
      },
    ],
  });
  const action = JSON.parse(response.content[0].text);
  history.push(
    { role: "user", content: `Page:\n${snapshot}` },
    { role: "assistant", content: JSON.stringify(action) },
  );
  if (action.action === "done") break;
  switch (action.action) {
    case "click":
      await page.click(action.ref);
      break;
    case "type":
      await page.type(action.ref, action.text);
      break;
    case "navigate":
      await page.goto(action.url);
      break;
  }
}
await browser.stop();
Swap Anthropic for OpenAI, Groq, Gemini, or a local model. See the full browserclaw API docs for fill(), select(), drag(), screenshot(), pdf(), waitFor(), and more.
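One fragile spot in a loop like this is parsing the model's reply: a bare `JSON.parse` throws if the model wraps its JSON in a code fence or adds surrounding prose. The sketch below is one way to harden that step. `parseAction` and `AgentAction` are hypothetical helpers, not part of browserclaw; the field names mirror the JSON shape requested in the system prompt above.

```typescript
type AgentAction =
  | { action: "click"; ref: string; reasoning: string }
  | { action: "type"; ref: string; text: string; reasoning: string }
  | { action: "navigate"; url: string; reasoning: string }
  | { action: "done"; reasoning: string };

// Read a required string field or fail with a descriptive error.
function requireString(obj: Record<string, unknown>, key: string): string {
  const value = obj[key];
  if (typeof value !== "string" || value === "") {
    throw new Error(`Model reply is missing string field "${key}"`);
  }
  return value;
}

// Extract the outermost {...} from the reply, parse it, and validate its shape.
function parseAction(raw: string): AgentAction {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model reply is not a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  const rawReasoning = obj["reasoning"];
  const reasoning = typeof rawReasoning === "string" ? rawReasoning : "";

  switch (obj["action"]) {
    case "click":
      return { action: "click", ref: requireString(obj, "ref"), reasoning };
    case "type":
      return {
        action: "type",
        ref: requireString(obj, "ref"),
        text: requireString(obj, "text"),
        reasoning,
      };
    case "navigate":
      return { action: "navigate", url: requireString(obj, "url"), reasoning };
    case "done":
      return { action: "done", reasoning };
    default:
      throw new Error(`Unknown action: ${String(obj["action"])}`);
  }
}
```

In the loop above, `JSON.parse(response.content[0].text)` could be replaced with `parseAction(response.content[0].text)`, with a thrown error treated as a cue to re-prompt the model rather than crash the run.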
Why browserclaw?
- Built for TypeScript — native to the JS ecosystem. First-class Node.js support, not a Python port.
- Accessibility tree, not DOM — snapshots use the browser's accessibility tree — the same structure screen readers use. Semantic roles, names, and states instead of raw tags and attributes. Cleaner, smaller, and more meaningful to an LLM.
- Layered, not bundled — the engine, the agent, and the LLM are separate, swappable pieces. See the comparison above.
- Gets smarter with use — the skill catalog learns from every successful run. Other browser agents start from scratch each time. browserclaw-agent builds a playbook per domain and improves it on every run.
- Handles the real world — Cloudflare Turnstile, press-and-hold anti-bot overlays, cookie banners, tab management — handled automatically via CDP. These are the things that make browser agents fail in production.
Read more
- The Intelligence Gap — why AI browser agents keep failing, and what we're doing about it
Built with
- BrowserClaw — the browser automation library
- OpenClaw — the community behind it
