@agent-browser-io/browser
v0.3.0
Token efficient agent browser
This package lets AI agents control a real browser (navigate, click, type, and interact via ASCII wireframes) in a token-efficient way. Use it from MCP clients (e.g. Cursor, Claude Desktop) or from code with the Vercel AI SDK.
Ways to use:
- MCP: Add the included MCP server to Cursor or another MCP client so the AI can drive a browser (see How to add MCP).
- Vercel AI SDK: Use createBrowserTools(browser) with generateText({ tools, ... }) in your app (see Vercel AI SDK).
- CLI: Run the interactive CLI for manual testing (npx @agent-browser-io/browser, or agent-browser-cli after install).
Install
npm install @agent-browser-io/browser
How to add MCP
MCP (Model Context Protocol) lets AI assistants in Cursor or Claude Desktop use browser tools over stdio. Your AI will be able to launch a browser, open URLs, get wireframes, click, type, scroll, screenshot, and more.
Run the MCP server (for testing):
npx @agent-browser-io/browser mcp
Add to Cursor
- Open Cursor settings → MCP (or edit your MCP config file, e.g. ~/.cursor/mcp.json or the project-level .cursor/mcp.json).
- Add a server entry:
{
  "mcpServers": {
    "agent-browser": {
      "command": "npx",
      "args": ["-y", "@agent-browser-io/browser", "mcp"]
    }
  }
}
- Restart Cursor or reload MCP so it picks up the new server. The agent-browser tools will appear for the AI to use.
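If the MCP config file already lists other servers, the new entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, assuming the config is a JSON object with an optional mcpServers map as in the snippet above; the helper name withAgentBrowser and the McpConfig types are hypothetical and not part of this package.

```typescript
// Assumed shape of an MCP client config file such as ~/.cursor/mcp.json.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Server entry copied from the README snippet above.
const AGENT_BROWSER_ENTRY: McpServerEntry = {
  command: "npx",
  args: ["-y", "@agent-browser-io/browser", "mcp"],
};

// Hypothetical helper: returns a new config with the agent-browser entry added,
// leaving other servers and unrelated top-level keys untouched.
function withAgentBrowser(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "agent-browser": {
        command: AGENT_BROWSER_ENTRY.command,
        args: [...AGENT_BROWSER_ENTRY.args],
      },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const existingConfig: McpConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
console.log(JSON.stringify(withAgentBrowser(existingConfig), null, 2));
```

The function returns a copy instead of mutating its input, so the original config can still be written back unchanged if something goes wrong.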
Other MCP clients (e.g. Claude Desktop)
Use the same stdio command in your client's config:
- Command: npx (or full path to node)
- Args: ["-y", "@agent-browser-io/browser", "mcp"] (or ["path/to/bin/index.cjs", "mcp"])
The server speaks JSON-RPC over stdin/stdout; no extra env vars are required.
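To make the stdin/stdout exchange concrete, here is a rough sketch of how a client might frame messages, assuming the newline-delimited JSON-RPC 2.0 framing used by the MCP stdio transport. The helper names encodeRequest and decodeMessages are hypothetical, and the url argument passed to navigate is an assumption about the tool's input; a real client also performs the MCP initialize handshake before calling tools, which is omitted here.

```typescript
// JSON-RPC 2.0 request envelope.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serialize one request as a single line terminated by a newline.
function encodeRequest(
  id: number,
  method: string,
  params?: Record<string, unknown>,
): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id, method };
  if (params !== undefined) {
    request.params = params;
  }
  return JSON.stringify(request) + "\n";
}

// Split buffered server output into complete JSON messages and keep any
// trailing partial line so it can be prepended to the next chunk.
function decodeMessages(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { messages, rest };
}

// Ask the server which tools it offers, then call one of them.
const listToolsLine = encodeRequest(1, "tools/list");
const navigateLine = encodeRequest(2, "tools/call", {
  name: "navigate",
  arguments: { url: "https://example.com" }, // assumed argument name
});
console.log(listToolsLine + navigateLine);
```

In practice these lines would be written to the stdin of the spawned npx process, and decodeMessages would be fed whatever arrives on its stdout.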
Vercel AI SDK
You can use the same browser automation as tools with the Vercel AI SDK and generateText. The package exposes createBrowserTools(browser), which returns an object of tools you can pass to generateText({ tools, ... }). The ai package is included as a dependency.
Tools: launch, navigate, getWireframe, click, type, fill, dblclick, hover, press, select, check, uncheck, scroll, screenshot, close. Same toolset as the MCP server, so behavior is consistent.
Important: Have the model call the launch tool before any other action (navigate, getWireframe, click, etc.).
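One way to check that a run respected this ordering is to inspect the sequence of tool names the model called. The sketch below is illustrative only: firstCallBeforeLaunch is a hypothetical helper, not part of this package, and it assumes that a new launch is needed again after close. The list of names could be collected, for instance, from the tool calls reported by generateText.

```typescript
// Hypothetical check: returns the name of the first tool that was called while
// no browser was launched, or null if the call order is valid.
// Assumption: after "close", the model must call "launch" again.
function firstCallBeforeLaunch(calls: string[]): string | null {
  let launched = false;
  for (const name of calls) {
    if (name === "launch") {
      launched = true;
    } else if (!launched) {
      return name;
    } else if (name === "close") {
      launched = false;
    }
  }
  return null;
}

console.log(firstCallBeforeLaunch(["launch", "navigate", "getWireframe"])); // null
console.log(firstCallBeforeLaunch(["navigate", "launch"])); // "navigate"
```

A check like this is cheap to run in tests for a system prompt, so a prompt change that makes the model skip launch is caught early.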
Example:
import { createBrowserTools, AgentBrowser, DefaultBrowserBackend } from '@agent-browser-io/browser';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Create the browser wrapper and expose its actions as AI SDK tools.
const browser = new AgentBrowser(new DefaultBrowserBackend());
const tools = createBrowserTools(browser);

// Note: generateText may stop after a single step by default; check the
// step-limit option of your installed `ai` version if the model needs to
// chain several tool calls.
const { text } = await generateText({
  model: openai('gpt-4o'),
  tools,
  prompt: 'Go to Hacker News, open the top 3 stories, and summarize their content.',
});
// Model will call launch, then navigate, then getWireframe, etc.
Development
Requires Node 18+. Browser automation uses Playwright (included as a dev dependency).
npm install
npm run build
Builds to dist/cjs (CommonJS) and dist/esm (ESM).
