browxai

v0.9.0

Published

11 days ago

MCP-native, model-agnostic, agentic-first, multi-engine browser-control server.

Downloads

465

0High
0Medium
0Low

kalebtec-studio

mcp model-context-protocol browser-automation playwright cdp ai-agent agentic browser-control

An MCP-native, model-agnostic, agentic-first browser-control server for AI agents.

browxai gives an AI agent a curated, token-efficient browser surface over the Model Context Protocol — a tool surface designed for agents rather than for human developers, and headless/CI-capable. It is engine-agnostic: the same tool surface drives Chromium, Firefox, and WebKit, real Chrome-on-Android, and real Safari, each over the automation protocol that fits it (CDP, WebDriver BiDi, safaridriver) behind one capability-port seam, so a tool never knows which engine it is running on.

It is deliberately not a wrapper over @playwright/mcp: browxai owns its own multi-engine transport so it can own the full session lifecycle — managed profiles, attach-to-an-existing browser (BYOB), authenticated sessions, headed and headless — and shape an agent-first surface around it.

Model-agnostic — any MCP client (Claude, Codex, …), not locked to one model.
Engine-agnostic — Chromium / Firefox / WebKit / Android Chrome / Safari behind one capability-port architecture; pick with --engine / BROWX_ENGINE. The default is Chromium, so nothing changes for existing setups.
Token-efficient — snapshot() is a compact accessibility tree + DOM-walk, not a DOM dump; results are scoped/paginated/budgeted.
Safe by default — capability-gated tools, an origin allow/blocklist, confirmation hooks, a hard anti-wedge deadline on every call. Dangerous surface (arbitrary JS, full response bodies, OS clipboard, network mocking) is off by default.

Install

npm install -g browxai
npx playwright-core install chromium    # one-time, ~150 MB

Wire it into an MCP client (stdio transport) — e.g. in an .mcp.json:

{
  "mcpServers": {
    "browxai": { "command": "browxai" },
  },
}

SDK (programmatic surface)

For consumers that author a single TypeScript script and run it autonomously, browxai also ships a typed SDK. Same tool registry, same capability gates, same egress hygiene — different transport.

import { createBrowxai } from "browxai";

const browxai = await createBrowxai(); // in-process, single-script
await browxai.navigate({ url: "https://example.com" });
const { data } = await browxai.extract({
  schema: {
    /* … */
  },
});
await browxai.close();

Three transports:

In-process (default) — single Node process; the SDK drives the server in-process. close() shuts the embedded server.
Stdio child (transport: "stdio-child") — spawns the browxai bin as a subprocess and speaks MCP-over-stdio. close() ends the child.
Socket-attached (endpoint: "unix:///tmp/foo.sock") — connects to a long-running browxai serve --socket /tmp/foo.sock process. Multiple clients can attach to ONE server (e.g. a parent agent plus a child script sharing one Chromium). close() ends only the local connection.

Capability gates apply identically to the MCP path: posture-broadening tools (eval_js, network_body, register_secret, upload_file, …) are off by default and only appear on the client when their capability is named in createBrowxai({ capabilities }). Calling a non-exposed tool — even via client.callTool("eval_js", …) — fails with a BROWXAI_SDK_NOT_EXPOSED error before any wire dispatch.

Harness setup

Ready-to-use setup for the common agent harnesses — MCP-server registration plus a portable "driving browxai well" Agent Skill — lives in harness/: Claude Code, Codex, Pi.

The surface

snapshot — compact accessibility tree + DOM-walk pass; every node gets a stable [ref=eN].
find — natural-language query → ranked candidate locators with selectorHint, stability, visible-rect bbox, and an actionable verdict.
action tools (click / fill / navigate / select / wait_for / …) — each returns a structured ActionResult: what navigated, what structure changed, console/network slice, a post-action element probe.
read tools — text_search, inspect, console_read, network_read, ws_read, screenshot.
sessions — isolated per-session contexts (own cookie jar / refs); persistent, incognito, or attached (BYOB) modes; MCP-driven config.
capabilities — read,navigation,action,human on by default; eval, network-body, clipboard, file-io, and byob-attach are explicit opt-ins.

Full per-tool reference, the security model, and the stability policy are in the documentation site.

Stability

browxai is v0.7.0 and follows semver. The public tool surface (tool names, documented input/output shapes, the ActionResult shape, the default capability set) is frozen; anything behind an off-by-default capability is explicitly experimental and not covered by the stability guarantee. See the Stability and semver policy.

Develop

corepack enable && pnpm install
pnpm install-browser     # Chromium for playwright-core
pnpm typecheck && pnpm test
pnpm build               # builds dist/ — the `browxai` bin is dist/cli.js
pnpm test:keystone       # headless end-to-end keystone (real Chromium)
pnpm docs:dev            # the documentation site, locally

Project docs

CONTRIBUTING.md — contributor workflow + DCO.
AGENTS.md — operating rules for AI-harness contributors.
SECURITY.md — vulnerability reporting + disclosure policy.
MAINTAINERS.md — maintainer roster + responsibilities.
RELEASING.md — release ritual + OIDC publish flow.
CODE_OF_CONDUCT.md — Contributor Covenant adoption.

License

Code is MIT — see LICENSE.

The browxai name and logo are trademarks of Kalebtec and are not covered by the MIT License. The brand assets under brand/ are all-rights-reserved (see brand/LICENSE). See TRADEMARKS.md for the full brand policy and what nominative use is allowed.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme