# betterbrowse

v0.7.0

Zero-dependency browser automation via Chrome DevTools Protocol with ARIA accessibility snapshots — 10-100x cheaper than vision-based approaches.
## Why?
Most browser automation agents use screenshots + vision models. That's expensive and slow. betterbrowse uses ARIA accessibility snapshots instead — a text representation of the page that any LLM can understand. This means:
- 10-100x cheaper — text tokens vs image tokens
- Works with any text model — no vision model required
- Faster — no image encoding/decoding overhead
- More reliable — structured data vs pixel interpretation
- Video recording — record browser sessions as MP4 (via CDP screencast + ffmpeg)
## Install

Project (library):

```sh
npm install @mylesiyabor/betterbrowse
```

Global (CLI — easy for agents):

```sh
npm install -g @mylesiyabor/betterbrowse
```

Then use the CLI from any terminal or agent (see below).

Requires Node.js >= 20.10.0 and Chrome/Chromium installed locally.
## CLI (easy for agents)

The simplest way for agents to use betterbrowse is the CLI. Install globally, then:

Snapshot only (no API key) — get the ARIA snapshot of a page on stdout:

```sh
betterbrowse https://example.com
```

Agent mode (uses OpenAI; set `OPENAI_API_KEY`) — complete a task and print the result to stdout:

```sh
betterbrowse https://news.ycombinator.com "What is the top story title?"
betterbrowse https://example.com "Click the first link" --no-headless
```

| Option | Description |
|--------|-------------|
| `betterbrowse <url>` | Print the ARIA snapshot of the page |
| `betterbrowse <url> "<task>"` | Run the browser agent; result to stdout |
| `betterbrowse search "<query>"` | Search the web (multi-provider, free) |
| `--model <name>` | OpenAI model (default: `gpt-4o-mini`) |
| `--no-headless` | Show the browser window |
| `--record` | Record the session as video (MP4 if ffmpeg is installed) |
| `--record-dir <dir>` | Directory for recording output (default: cwd or temp) |
| `--json` | Output search results as JSON |
| `--deep` | Visit top results and extract page content |
| `--max <n>` | Max search results (default: 5) |
| `-v, --version` | Print version |
| `-h, --help` | Show help |
Agents can capture stdout for the snapshot or the task result. No extra dependencies — agent mode calls the OpenAI API with `fetch`.

Video recording: Use `--record` (and optionally `--record-dir ./out`). The browser session is captured via CDP screencast; if ffmpeg is installed, frames are stitched into `recording.mp4`. The output path is printed to stderr so stdout stays clean for the result.
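As a sketch of the stdout-capture pattern, an agent script can shell out to the CLI with Node's `child_process`. The `betterbrowseArgs` helper below is hypothetical glue, not part of the package; it assumes a global install so `betterbrowse` is on the PATH.

```js
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

// Build argv for a snapshot (url only) or agent-mode (url + task) call.
function betterbrowseArgs(url, task, opts = {}) {
  const args = [url];
  if (task) args.push(task);              // agent mode: requires OPENAI_API_KEY
  if (opts.record) args.push('--record');
  if (opts.recordDir) args.push('--record-dir', opts.recordDir);
  return args;
}

// stdout carries the snapshot (or task result); stderr carries logs
// and the recording path, so the two streams can be handled separately.
async function snapshot(url) {
  const { stdout } = await run('betterbrowse', betterbrowseArgs(url));
  return stdout;
}
```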
## Quick Start (library)

### Browser Class (Tool Harness)

```js
import { Browser } from '@mylesiyabor/betterbrowse';

const browser = new Browser({ headless: true });
await browser.launch();
await browser.navigate('https://example.com');

// Get ARIA snapshot — structured text representation of the page
const snapshot = await browser.getSnapshot();
console.log(snapshot);
// - heading "Example Domain" [ref=e1]
// - text "This domain is for use in illustrative examples..."
// - link "More information..." [ref=e2]

// Interact using refs from the snapshot
await browser.clickRef('e2');

// Take a screenshot
const png = await browser.screenshot(); // base64

await browser.close();
```

### Agent (LLM-Driven Loop)
```js
import { browseWeb } from '@mylesiyabor/betterbrowse';

const result = await browseWeb('https://news.ycombinator.com', 'Find the top story title', {
  chat: async (messages, { tools, maxTokens }) => {
    // Wire up your LLM here — OpenAI, Anthropic, Google, etc.
    const response = await yourLLM.chat(messages, { tools, maxTokens });
    return {
      content: response.text,
      toolCalls: response.toolCalls, // [{ name, arguments, id }]
      usage: { input: response.inputTokens, output: response.outputTokens },
    };
  },
});

console.log(result.result); // "The top story is: ..."
console.log(result.usage);  // { inputTokens, outputTokens, modelCalls }
console.log(result.steps);  // [{ step, action, ref, text, result }, ...]
```

## API

### Browser

```ts
new Browser({ headless?: boolean, useProfile?: boolean, port?: number })
```

Extends EventEmitter. Events: `launch`, `navigate`, `action`, `snapshot`, `close`, `error`.
| Method | Description |
|---|---|
| `launch()` | Start Chrome and connect via CDP |
| `navigate(url)` | Navigate to a URL |
| `getSnapshot()` | Get an optimized ARIA snapshot |
| `getRawSnapshot()` | Get the raw snapshot + refMap |
| `clickRef(ref)` | Click an element by ref (e.g. `"e5"`) |
| `fillRef(ref, text)` | Type into an input by ref |
| `hover(ref)` | Mouse hover by ref |
| `selectOption(ref, value)` | Select a dropdown option by ref |
| `waitForSelector(selector, timeout?)` | Wait for a CSS selector |
| `screenshot()` | Capture a PNG (base64) |
| `extractText()` | Get all visible text |
| `evaluate(expr)` | Run JS in the page |
| `close()` | Close the browser |
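Since `Browser` extends EventEmitter, agent harnesses can observe activity with the standard Node event API. The sketch below uses a plain `EventEmitter` as a stand-in so it runs without Chrome; the event payload shapes shown are illustrative assumptions, only the event names come from the list above.

```js
import { EventEmitter } from 'node:events';

// Stand-in for `new Browser(...)` — same subscription pattern, no Chrome.
const browser = new EventEmitter();

const log = [];
browser.on('navigate', (url) => log.push(`navigate ${url}`));
browser.on('action', (a) => log.push(`action ${a}`));
browser.on('error', (err) => log.push(`error ${err.message}`));

// In real use, betterbrowse emits these as the agent drives the page.
browser.emit('navigate', 'https://example.com');
browser.emit('action', 'clickRef e2');
```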
### browseWeb(url, task, opts)

LLM-driven browser agent. Returns `{ result, usage, steps, recording }`.

Required option: `chat` — an async function matching:

```ts
(messages, { tools, maxTokens }) => Promise<{ content, toolCalls?, usage? }>
```

Optional: `record: true` — record the session; `recordDir: string` — output directory. When recording, the returned object includes `recording: { video, frameDir, frameCount, frames }` (MP4 path in `video` if ffmpeg is installed).
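As a sketch, the `chat` contract can be wired to OpenAI's chat completions API with `fetch`. The endpoint payload and the `toChatResult` mapper below are assumptions based on OpenAI's public API, not part of betterbrowse; only the `{ content, toolCalls, usage }` return shape comes from the contract above.

```js
// Map an OpenAI chat completions response onto the shape browseWeb expects:
// { content, toolCalls: [{ name, arguments, id }], usage: { input, output } }.
function toChatResult(completion) {
  const msg = completion.choices[0].message;
  return {
    content: msg.content ?? '',
    toolCalls: (msg.tool_calls ?? []).map((c) => ({
      id: c.id,
      name: c.function.name,
      arguments: JSON.parse(c.function.arguments),
    })),
    usage: {
      input: completion.usage?.prompt_tokens ?? 0,
      output: completion.usage?.completion_tokens ?? 0,
    },
  };
}

async function chat(messages, { tools, maxTokens }) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'gpt-4o-mini', messages, tools, max_tokens: maxTokens }),
  });
  return toChatResult(await res.json());
}
```

Pass `chat` into `browseWeb()` exactly as in the Quick Start.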
### Snapshot Utilities

```js
import { optimizeAll, computeDiff, analyzeWaste } from '@mylesiyabor/betterbrowse';

// Optimize a raw ARIA snapshot
const optimized = optimizeAll(rawSnapshot, { maxItems: 10 });

// Compute diff between two snapshots
const diff = computeDiff(prevSnapshot, currSnapshot, prevUrl, currUrl);

// Analyze snapshot waste
const report = analyzeWaste(rawSnapshot);
```

## How ARIA Snapshots Work

Instead of screenshots, we fetch the browser's accessibility tree via CDP and convert it to a compact text format:
```
- heading "Search Results" [ref=e1]
- textbox "Search query" [ref=e2]
- button "Search" [ref=e3]
- list
  - listitem
    - link "First Result" [ref=e4]
  - listitem
    - link "Second Result" [ref=e5]
```

Interactive elements get `[ref=eXX]` tags. The agent uses these refs to click, fill, hover, and select — no pixel coordinates needed.
The snapshot optimizer pipeline strips chrome (headers/footers), deduplicates links, compresses long names, and truncates lists — reducing token count by 60-90%.
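The kind of reductions involved can be illustrated with a toy pass over snapshot lines. `shrinkSnapshot` below is a hypothetical simplification for illustration, not the package's actual optimizer pipeline.

```js
// Toy pass: drop duplicate links and truncate long runs of listitems.
function shrinkSnapshot(lines, { maxItems = 10 } = {}) {
  const seenLinks = new Set();
  const out = [];
  let listItems = 0;
  for (const line of lines) {
    const t = line.trim();
    if (t.startsWith('- link')) {
      if (seenLinks.has(t)) continue;     // dedupe repeated links
      seenLinks.add(t);
    }
    if (t.startsWith('- listitem')) {
      listItems += 1;
      if (listItems > maxItems) continue; // truncate the list tail
    }
    out.push(line);
  }
  return out;
}
```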
## Video recording

You can record browser sessions as video (CLI or library).

- CLI: Run `betterbrowse <url> "<task>" --record` or `betterbrowse <url> --record`. Use `--record-dir <dir>` to choose where the file is saved. The session is captured via Chrome DevTools screencast; if ffmpeg is on your PATH, frames are stitched into `recording.mp4` in that directory. The path is printed to stderr.
- Library: Pass `record: true` and optionally `recordDir: './recordings'` to `browseWeb()`. The return value includes `recording: { video, frameDir, frameCount, frames }` (or `recording: null` if not recording). Frames are always saved; `video` is set only when ffmpeg is available.
## Use in agents (global install)

Install betterbrowse globally so any agent (Cursor, MCP, scripts) can run it as a CLI:

```sh
npm install -g @mylesiyabor/betterbrowse
```

- CLI (recommended): Run `betterbrowse <url>` or `betterbrowse <url> "<task>"`. The result/snapshot goes to stdout — easy for agents to capture. For task mode, set `OPENAI_API_KEY`.
- As a library: In your agent code, `import { Browser, browseWeb } from '@mylesiyabor/betterbrowse'` and call the API (e.g. with your own `chat` function).
- Project-local: Run `npm install @mylesiyabor/betterbrowse` in your project and use the CLI via `npx betterbrowse` or import the module.
## License

MIT
