browser-agentkit
v0.0.5
A powerful SDK for Chrome extensions to interact with browsers and web pages
Browser AgentKit SDK
A powerful SDK for Chrome extensions to interact with browsers and web pages. Simplifies browser automation, web scraping, and page interaction tasks.
Features
- 🎯 CDP-Powered — Direct Chrome DevTools Protocol integration for fine-grained browser control
- 🔒 Stable & Reliable — CDP provides deterministic, low-level operations without DOM injection
- 🚀 Easy to Use — Simple, intuitive API design
- 🌐 Comprehensive — Browser control, page interactions, content extraction
- 🎨 Flexible — Support for keyboard, mouse, and complex interactions
- 📦 Lightweight — Minimal dependencies, no external browser binaries required
Why Browser AgentKit?
Built for Chrome Extensions
Unlike traditional browser automation tools (Puppeteer, Playwright, Selenium), Browser AgentKit is specifically designed for Chrome extension development:
| Feature | Browser AgentKit | Puppeteer/Playwright |
|---------|------------------|---------------------|
| Runtime | Inside Chrome extension | External Node.js process |
| Browser Instance | Uses user's existing browser | Launches separate browser |
| User Session | Access to logged-in sessions & cookies | Isolated browser context |
| Installation | npm package only | Requires browser binary download |
| Use Case | Extension-based automation | Testing & scraping servers |
CDP: The Foundation of Reliability
Browser AgentKit leverages the Chrome DevTools Protocol (CDP) directly, the same protocol that powers Chrome DevTools, Puppeteer, and Playwright internally:
- Fine-grained Control — Direct access to browser internals: DOM, network, input, runtime
- No Script Injection — Input simulation happens at the browser level, not via JavaScript injection
- Deterministic Operations — CDP commands are executed synchronously by the browser engine
- Anti-Detection Friendly — Native browser events are indistinguishable from real user actions
- Full Debugging Capabilities — Access to the same powerful APIs used by Chrome DevTools
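To make "input simulation at the browser level" concrete, here is a minimal sketch of the CDP commands behind a single left click. `Input.dispatchMouseEvent` and its `type`, `button`, and `clickCount` parameters are part of the Chrome DevTools Protocol; `buildClickCommands` and `CdpCommand` are hypothetical names used only for illustration, and the SDK's actual command sequence may differ.

```typescript
// Shape of one CDP command, as it would be passed to
// chrome.debugger.sendCommand(target, method, params).
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Build the three Input.dispatchMouseEvent commands that make up a left click.
// Hypothetical helper: not part of browser-agentkit.
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const method = 'Input.dispatchMouseEvent';
  return [
    { method, params: { type: 'mouseMoved', x, y, button: 'none' } },
    { method, params: { type: 'mousePressed', x, y, button: 'left', clickCount: 1 } },
    { method, params: { type: 'mouseReleased', x, y, button: 'left', clickCount: 1 } },
  ];
}

const clickCommands = buildClickCommands(100, 200);
console.log(clickCommands.map((c) => c.params.type).join(' -> '));
// prints "mouseMoved -> mousePressed -> mouseReleased"
```

No script runs inside the page here: the browser itself produces the resulting mouse events.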
When to Use Browser AgentKit
✅ Use Browser AgentKit when you need to:
- Build Chrome extensions with automation capabilities
- Access the user's authenticated sessions (no re-login required)
- Operate within the user's existing browser environment
- Ship a lightweight SDK without bundled browser binaries
❌ Use Puppeteer/Playwright instead when you need:
- Server-side web scraping or testing
- Headless browser automation in CI/CD pipelines
- Cross-browser testing (Firefox, Safari, etc.)
- Isolated browser contexts for parallel execution
Architecture
Browser AgentKit is organized around a single unified CDP channel:
```
          ┌─────────────────┐
          │     Browser     │  tab lifecycle (chrome.tabs API)
          └─────────────────┘
                   │
          ┌─────────────────┐
          │      Page       │  page-level orchestrator
          └────────┬────────┘
                   │ owns
     ┌─────────────┼─────────────────┐
     │             │                 │
     ▼             ▼                 ▼
┌──────────┐  ┌──────────┐  ┌─────────────────┐
│ Keyboard │  │  Mouse   │  │ ContentExtractor│
└────┬─────┘  └────┬─────┘  └────────┬────────┘
     │             │                 │
     └─────────────┼─────────────────┘
                   ▼
           ┌───────────────┐
           │  CDPSession   │  the ONLY channel to chrome.debugger
           └───────┬───────┘
                   ▼
           ┌───────────────┐
           │ chrome.* APIs │
           └───────────────┘
```
CDPSession is the single, central abstraction for talking to the browser. Every module (Keyboard, Mouse, Element, ContentExtractor, Navigation, Screenshot) sends commands, subscribes to events, and runs scripts through it. There is exactly one path from this library to chrome.debugger.*, which makes the surface predictable and testable.
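The testability claim can be illustrated with a small sketch. It is not the SDK's internal code: `Channel`, `RecordingChannel`, `ToyKeyboard`, and `ToyMouse` are hypothetical names. The point is that when every module depends on one narrow `send` interface, a recording fake can stand in for the debugger.

```typescript
// Minimal stand-in for a CDP session: the one method every module goes through.
interface Channel {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
}

// Test double that records traffic instead of calling chrome.debugger.
class RecordingChannel implements Channel {
  readonly log: string[] = [];

  async send(method: string, params?: Record<string, unknown>): Promise<unknown> {
    this.log.push(`${method}:${String(params?.type)}`);
    return {};
  }
}

class ToyKeyboard {
  private readonly channel: Channel;
  constructor(channel: Channel) {
    this.channel = channel;
  }

  // A key press is a keyDown followed by a keyUp on the shared channel.
  async press(key: string): Promise<void> {
    await this.channel.send('Input.dispatchKeyEvent', { type: 'keyDown', key });
    await this.channel.send('Input.dispatchKeyEvent', { type: 'keyUp', key });
  }
}

class ToyMouse {
  private readonly channel: Channel;
  constructor(channel: Channel) {
    this.channel = channel;
  }

  async move(x: number, y: number): Promise<void> {
    await this.channel.send('Input.dispatchMouseEvent', { type: 'mouseMoved', x, y });
  }
}

// Both modules share one channel, so one log captures everything they did.
const sharedChannel = new RecordingChannel();
new ToyKeyboard(sharedChannel)
  .press('Enter')
  .then(() => new ToyMouse(sharedChannel).move(10, 20))
  .then(() => console.log(sharedChannel.log));
```

Swapping the fake for a real session changes nothing in the modules themselves, which is the practical benefit of a single path to the browser.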
Installation
```bash
npm install browser-agentkit
```
Quick Start
```typescript
import { browser, Page } from 'browser-agentkit';

// Open a new tab
const tab = await browser.openTab('https://example.com');

// Create a page instance (Page is the main entry point for page-level operations)
const page = new Page(tab.id);
await page.initialize();

// Interact with the page
await page.fill('#search', 'browser automation');
await page.click('#submit');

// Extract content
const extractor = await page.getContentExtractor();
const html = await extractor.getHTML();
const metadata = await extractor.getMetadata();
console.log('Page title:', metadata.title);

// Clean up
await page.close();
await browser.closeTab(tab.id);
```
Core Modules
Browser
Manage browser tabs and windows. Operates purely at the chrome.tabs / chrome.windows / chrome.history level — no debugger attachment.
```typescript
import { browser } from 'browser-agentkit';

// Open and manage tabs
const tab = await browser.openTab('https://example.com');
const currentTab = await browser.getCurrentTab();
const allTabs = await browser.queryTabs({ currentWindow: true });

// Tab lifecycle (fire-and-forget; for navigation with network-idle waiting, use Page.navigate)
await browser.navigate(tab.id, 'https://google.com');
await browser.goBack(tab.id);
await browser.reload(tab.id);

// Hidden tab for background work
const hidden = await browser.createHiddenTab('https://example.com');

// Lightweight screenshot (uses chrome.tabs.captureVisibleTab; no debugger needed)
const screenshot = await browser.captureVisibleTab(tab.id);
```
Page
The main entry point for page-level operations. On first use it lazily attaches the debugger and wires up Keyboard, Mouse, Actions, and access to ContentExtractor.
```typescript
import { Page, ScrollDirection } from 'browser-agentkit';

const page = new Page(tabId);
await page.initialize(); // attaches debugger, enables CDP domains

// Click elements (CSS selector or AX-node selector `node="123"`)
await page.click('#button');
await page.click('node="42"');

// Fill forms
await page.fill('#email', 'user@example.com');
await page.fill('#password', 'secret');
await page.fill('#bio', longText, /* usePaste */ true); // paste via system clipboard

// Navigate with network-idle + FCP wait
await page.navigate('https://example.com');
await page.waitForNavigation({ timeout: 30_000 });

// Wait for elements
await page.waitForSelector('#result');

// Scroll
await page.scroll(ScrollDirection.DOWN);

// Element handles
const el = page.element('#title');
const text = await el.getText();
const box = await el.getBoundingBox();

// Low-level input
const keyboard = await page.getKeyboard();
await keyboard.press('Enter');
const mouse = await page.getMouse();
await mouse.click(100, 200);

// In-page script execution
const title = await page.evaluate(() => document.title);

// Debugger-based screenshot (when you need full pixel control)
const screenshot = await page.captureVisible({ format: 'png' });

// Clean up (detaches the debugger)
await page.close();
```
Element
Represents a single element. Combines selector resolution and per-element operations. Created via page.element(selector).
```typescript
const el = page.element('#submit');
await el.waitForExist({ timeout: 10_000 });
const exists = await el.exists();
const box = await el.getBoundingBox();
const text = await el.getText();
const href = await el.getAttributeValue('href');
await el.scrollIntoView();
```
Both CSS selectors and AX-node selectors (node="123") work. The AX-node form is what ContentExtractor.getHTML() embeds on every interactive element, so you can round-trip from extracted HTML to clickable element with no extra parsing.
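As a rough illustration of that round trip, the sketch below pulls every `node="..."` attribute out of a piece of simplified HTML and returns strings in the selector form shown above. `extractNodeSelectors` is a hypothetical helper, the sample markup is invented, and the pattern assumes ids are numeric as in this README's examples.

```typescript
// Collect each AX-node id embedded as node="..." and rebuild the selector
// string that page.click / page.fill are documented to accept.
// Hypothetical helper: not part of browser-agentkit.
function extractNodeSelectors(html: string): string[] {
  // Require whitespace before "node" so attributes like data-node are skipped.
  const pattern = /\snode="(\d+)"/g;
  const selectors: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    selectors.push(`node="${match[1]}"`);
  }
  return selectors;
}

// Invented sample; real extractor output may look different.
const sampleHtml = '<button node="42">Search</button><a node="57" href="/docs">Docs</a>';
console.log(extractNodeSelectors(sampleHtml));
```

In an agent loop, an LLM would typically pick one of these ids and the caller would pass the matching selector straight to `page.click`.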
Content Extraction
Convert a page into a simplified, LLM-friendly HTML representation built from the accessibility tree plus cursor: pointer detection.
```typescript
const extractor = await page.getContentExtractor();

// Full simplified HTML: every interactive element carries a `node="..."` attribute
const html = await extractor.getHTML();

// Viewport-only mode skips elements outside the current viewport
const visibleHtml = await extractor.getHTML({ viewportOnly: true });

// Plain text (tags stripped)
const text = await extractor.getText();

// Open Graph / description metadata extracted from the snapshot
const meta = await extractor.getMetadata();

// One-off element extraction via selector
const result = await extractor.getElementContent('#article');

// PDF detection
const isPdf = await extractor.isPDF();
```
Input Simulation
Low-level keyboard and mouse. Normally obtained from page.getKeyboard() / page.getMouse(), but can be constructed manually against any CDPSession.
```typescript
import { createKeyboard, createMouse, createCDPSession } from 'browser-agentkit';

const session = await createCDPSession(tabId);

const keyboard = await createKeyboard(session);
await keyboard.type('Hello World');
await keyboard.press('Enter');
await keyboard.press('ControlOrMeta+KeyA'); // Ctrl+A on Win/Linux, Cmd+A on Mac

const mouse = createMouse(session, keyboard);
await mouse.move(100, 200);
await mouse.click(100, 200);
await mouse.dblclick(150, 250);
await mouse.wheel(0, 400); // scroll
```
CDP Session
The single channel through which all CDP traffic flows. You rarely use it directly — Page creates and owns one — but it is the right escape hatch when you need a CDP command this library doesn't wrap.
```typescript
import { createCDPSession } from 'browser-agentkit';

const session = await createCDPSession(tabId);

// Send any CDP command
const { root } = await session.send('DOM.getDocument', { depth: -1 });

// Subscribe to CDP events (auto-filtered by tabId)
const off = session.onEvent('Network.requestWillBeSent', (method, params) => {
  console.log('request:', params);
});
off(); // unsubscribe

// Execute a function in the page's main world (Runtime.evaluate)
const title = await session.evaluate(() => document.title);

// Execute a function in an isolated world (chrome.scripting.executeScript)
const result = await session.executeScript((id: string) => {
  return document.getElementById(id)?.innerHTML;
}, ['main']);

// Attach to a cross-origin iframe and get a separate session for it
const iframeSession = await session.attachIframe(frameId);
await iframeSession.detach();

await session.detach();
```
Event Management
Wraps chrome.tabs.* and chrome.webRequest.* events with a shared cleanup story.
```typescript
import { events } from 'browser-agentkit';

events.onTabCreated((tab) => {
  console.log('New tab created:', tab.url);
});

events.onTabUpdated((tabId, changeInfo, tab) => {
  if (changeInfo.status === 'complete') {
    console.log('Tab loaded:', tab.url);
  }
});

events.onBeforeRequest((details) => {
  console.log('Request:', details.url);
});

events.onCompleted((details) => {
  console.log('Response:', details.url);
});

// Remove all listeners registered through this manager
events.cleanup();
```
Advanced Examples
Form Automation
```typescript
import { browser, Page } from 'browser-agentkit';

async function fillLoginForm() {
  const tab = await browser.openTab('https://example.com/login');
  const page = new Page(tab.id);
  await page.initialize();

  await page.fill('#username', 'myuser');
  await page.fill('#password', 'mypass');
  await page.click('#login-button');
  await page.waitForNavigation();

  console.log('Login successful!');
  await page.close();
}
```
Web Scraping with AX Tree
The node="..." attributes that ContentExtractor embeds on interactive elements are valid selectors for page.click and page.fill. This lets an LLM read the extracted HTML and act on it without ever computing a CSS selector.
```typescript
import { browser, Page } from 'browser-agentkit';

async function scrapeAndClick() {
  const tab = await browser.openTab('https://example.com/data');
  const page = new Page(tab.id);
  await page.initialize();

  const extractor = await page.getContentExtractor();
  const html = await extractor.getHTML({ viewportOnly: true });

  // ... pass `html` to an LLM, get back a node id ...
  await page.click('node="42"');

  await page.close();
  await browser.closeTab(tab.id);
}
```
Drag and Drop
Mouse drag is HTML5-aware: moving the mouse while a button is held down triggers dragstart detection, and the rest of the drag lifecycle is handled through CDP's Input.setInterceptDrags.
```typescript
import { Page } from 'browser-agentkit';

async function dragAndDrop(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();
  const mouse = await page.getMouse();

  await mouse.move(100, 100);
  await mouse.down();
  await mouse.move(300, 300, { steps: 10 }); // smooth, triggers dragstart detection
  await mouse.up(); // drops at (300, 300)

  await page.close();
}
```
Custom CDP Commands
When you need a CDP capability this library doesn't wrap (e.g. Network.setExtraHTTPHeaders), reach for CDPSession directly.
```typescript
import { Page, createCDPSession } from 'browser-agentkit';

async function withCustomHeaders(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();

  // Page owns a CDPSession internally; create your own if you need raw access
  const session = await createCDPSession(tabId);
  await session.send('Network.setExtraHTTPHeaders', {
    headers: { 'X-Custom': 'value' },
  });
}
```
API Reference
Classes
Browser
Tab and window lifecycle. No debugger attachment.
| Method | Description |
|--------|-------------|
| openTab(url, options?) | Opens a new tab |
| getTab(tabId) | Gets tab information |
| getCurrentTab(windowId?) | Gets the active tab |
| queryTabs(queryInfo) | Queries tabs |
| searchHistory(query) | Searches browser history (deduplicated by URL) |
| closeTab(tabId) | Closes a tab |
| navigate(tabId, url) | Navigates the tab (fire-and-forget; use Page.navigate if you need network-idle waiting) |
| goBack(tabId) | Goes back in history |
| goForward(tabId) | Goes forward in history |
| reload(tabId, bypassCache?) | Reloads a tab |
| captureVisibleTab(tabId, options?) | Captures visible area via chrome.tabs.captureVisibleTab |
| createHiddenTab(url) | Creates a minimized hidden tab for background work |
Page
Page-level orchestrator. Owns the CDPSession for a single tab.
| Method | Description |
|--------|-------------|
| initialize() | Attaches the debugger, enables CDP domains, builds sub-components. Called automatically by every other method, but you can call it explicitly to surface attach errors early. |
| close() | Detaches the debugger and clears sub-components |
| click(selector) | Clicks an element. selector is a CSS selector or node="123" AX-node selector |
| fill(selector, text, usePaste?) | Fills an input. With usePaste=true, writes through the system clipboard (useful for long/formatted text) |
| scroll(direction, amount?) | Scrolls the page (defaults to ~one viewport height) |
| navigate(url) | Navigates and waits for network idle + first contentful paint |
| waitForNavigation(options?) | Waits for an in-flight navigation to settle |
| waitForSelector(selector, options?) | Waits for an element to appear |
| element(selector) | Returns an Element handle (requires the page to be initialized) |
| getKeyboard() | Returns the page's Keyboard |
| getMouse() | Returns the page's Mouse |
| getContentExtractor() | Returns a ContentExtractor bound to this page's session |
| evaluate(fn, ...args) | Executes a function in the page's main world |
| captureVisible(options?) | Captures the visible area via CDP Page.captureScreenshot |
Actions
The action implementation that Page delegates to. Exported for advanced cases where you want to drive actions against a CDPSession you manage yourself.
| Method | Description |
|--------|-------------|
| click(selector) | Clicks an element |
| fill(selector, value, usePaste?) | Fills an input |
| search(selector, value) | fill + press Enter |
| scroll(direction, amount?) | Scrolls the page |
| waitForSelector(selector, options?) | Waits for an element to appear |
Constructor: new Actions(session, keyboard, mouse).
Element
A single element handle. Owns selector resolution and per-element operations.
| Method | Description |
|--------|-------------|
| resolve() | Resolves the selector to { nodeId, backendNodeId, object } |
| exists() | Whether the element resolves successfully |
| waitForExist(options?) | Polls until the element exists or times out |
| getBoundingBox() | Returns the element's CSS-pixel bounding box |
| getText() | Returns the element's text content |
| getAttributeValue(name) | Returns an HTML attribute value, or null |
| scrollIntoView() | Scrolls the element into the viewport if needed |
Constructor: new Element(session, selector).
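For readers unfamiliar with the polling pattern behind `waitForExist`, a minimal sketch follows. `pollUntil` and its option names are hypothetical, and it resolves to `false` on timeout as an assumption; this README does not say whether the SDK returns a value or throws in that case.

```typescript
// Call `check` every `intervalMs` until it resolves true or `timeoutMs` elapses.
// Hypothetical helper: not part of browser-agentkit.
async function pollUntil(
  check: () => Promise<boolean>,
  options: { timeoutMs: number; intervalMs: number },
): Promise<boolean> {
  const deadline = Date.now() + options.timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    // Sleep before the next attempt so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}

// Example: a condition that becomes true on the third check.
let attempts = 0;
pollUntil(async () => ++attempts >= 3, { timeoutMs: 1000, intervalMs: 10 })
  .then((found) => console.log('found:', found, 'after', attempts, 'checks'));
```

With an element handle, the `check` callback would be something like `() => el.exists()`.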
ContentExtractor
Converts a page into a simplified HTML representation built from the accessibility tree.
| Method | Description |
|--------|-------------|
| getPageSnapshot(options?) | Full snapshot: HTML, metadata, PDF flag. options.viewportOnly keeps only viewport-visible nodes |
| getHTML(options?) | Simplified HTML. Every interactive element carries node="..." |
| getText() | Plain text (tags stripped, whitespace collapsed) |
| getMetadata() | Page metadata (title, description, Open Graph) |
| getElementContent(selector) | innerHTML + textContent for a single element |
| isPDF() | Whether the current page is a PDF |
Constructor: new ContentExtractor(session).
Keyboard
Keyboard simulation. Tracks pressed modifiers across calls.
| Method | Description |
|--------|-------------|
| down(key) | Presses a key down |
| up(key) | Releases a key |
| press(key, options?) | Presses and releases a key. Supports chords like 'ControlOrMeta+KeyA' |
| type(text, options?) | Types a string character by character |
| insertText(text) | Inserts text directly (no per-char key events) |
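The chord syntax can be pictured as a modifier mask plus a final key. The sketch below is one plausible way to resolve `'ControlOrMeta+KeyA'` per platform. `resolveChord` and `Platform` are hypothetical names, and the SDK's real parser may differ. The bit values (Alt=1, Ctrl=2, Meta=4, Shift=8) are the ones CDP's `Input.dispatchKeyEvent` uses for its `modifiers` field.

```typescript
type Platform = 'mac' | 'win' | 'linux';

// CDP modifier bit field: Alt=1, Ctrl=2, Meta/Command=4, Shift=8.
const MODIFIER_BITS: Record<string, number> = { Alt: 1, Control: 2, Meta: 4, Shift: 8 };

// Split a chord into a modifier mask and the final key.
// Hypothetical helper: not part of browser-agentkit.
function resolveChord(chord: string, platform: Platform): { modifiers: number; key: string } {
  const parts = chord.split('+');
  const key = parts[parts.length - 1] ?? '';
  let modifiers = 0;
  for (const part of parts.slice(0, -1)) {
    // ControlOrMeta maps to Cmd on macOS and Ctrl everywhere else.
    const name = part === 'ControlOrMeta' ? (platform === 'mac' ? 'Meta' : 'Control') : part;
    const bit = MODIFIER_BITS[name];
    if (bit === undefined) throw new Error(`Unknown modifier: ${part}`);
    modifiers |= bit;
  }
  return { modifiers, key };
}

console.log(resolveChord('ControlOrMeta+KeyA', 'mac'));   // { modifiers: 4, key: 'KeyA' }
console.log(resolveChord('ControlOrMeta+KeyA', 'linux')); // { modifiers: 2, key: 'KeyA' }
```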
Mouse
Mouse simulation. Tracks cursor position and button state across calls. Drag-and-drop is detected automatically when the left button is held during a move.
| Method | Description |
|--------|-------------|
| move(x, y, options?) | Moves the cursor. options.steps smooths movement |
| down(options?) | Presses a button (default: 'left') |
| up(options?) | Releases a button |
| click(x, y, options?) | Move + down + up at (x, y) |
| dblclick(x, y, options?) | Two clicks in quick succession |
| wheel(deltaX, deltaY) | Scroll wheel at the current cursor position |
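One way to picture `options.steps` is as evenly spaced intermediate positions between the current cursor location and the target. The sketch assumes linear interpolation, which is how Puppeteer's `mouse.move` treats `steps`. Whether this SDK does exactly the same is not stated here, and `interpolateMove` is a hypothetical helper.

```typescript
interface Point {
  x: number;
  y: number;
}

// Return `steps` points from `from` (exclusive) to `to` (inclusive), evenly spaced.
// Each point would correspond to one mouseMoved event.
function interpolateMove(from: Point, to: Point, steps: number): Point[] {
  const count = Math.max(1, Math.floor(steps));
  const points: Point[] = [];
  for (let i = 1; i <= count; i++) {
    points.push({
      x: from.x + ((to.x - from.x) * i) / count,
      y: from.y + ((to.y - from.y) * i) / count,
    });
  }
  return points;
}

const path = interpolateMove({ x: 100, y: 100 }, { x: 300, y: 300 }, 10);
console.log(path[0], path[path.length - 1]); // { x: 120, y: 120 } { x: 300, y: 300 }
```

Several intermediate moves matter for drag-and-drop, since pages usually start a drag only after the pointer has travelled some distance with the button held.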
CDPSession
The unified CDP channel. Owns the debugger attachment and provides typed access to every form of browser communication this SDK uses.
| Method | Description |
|--------|-------------|
| attach() | Attaches the debugger (idempotent) |
| enableDomains() | Enables DOM, Page, Network, DOMSnapshot, Accessibility |
| send(method, params?) | Sends a CDP command |
| onEvent(method, handler) | Subscribes to a CDP event (auto-filtered by tab). Returns an unsubscribe function |
| offEvent(handler) | Removes a previously-registered event handler |
| evaluate(fn, args?) | Runs a function in the page's main world via Runtime.evaluate |
| executeScript(fn, args?, options?) | Runs a function via chrome.scripting.executeScript. options.world selects 'MAIN' or 'ISOLATED' |
| executeScriptAllFrames(fn, args?, world?) | Same as above, but executes in every frame and returns each frame's result |
| attachIframe(frameId) | Attaches to a cross-origin iframe and returns a new CDPSession bound to that target |
| detach() | Removes all event listeners and detaches the debugger |
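The "auto-filtered by tab" behaviour of `onEvent` can be sketched as a small router. This is an illustration under assumptions, not the SDK's source: `TabEventRouter`, `DebuggeeSource`, and `CdpHandler` are hypothetical names. In a real extension, `dispatch` would be wired to `chrome.debugger.onEvent.addListener`, whose listener receives `(source, method, params)`.

```typescript
interface DebuggeeSource {
  tabId?: number;
}

type CdpHandler = (method: string, params: unknown) => void;

// Fans events out to handlers for one tab and hands back an unsubscribe function.
class TabEventRouter {
  private readonly tabId: number;
  private readonly handlers = new Map<string, Set<CdpHandler>>();

  constructor(tabId: number) {
    this.tabId = tabId;
  }

  onEvent(method: string, handler: CdpHandler): () => void {
    const set = this.handlers.get(method) ?? new Set<CdpHandler>();
    set.add(handler);
    this.handlers.set(method, set);
    return () => {
      set.delete(handler);
    };
  }

  // Events from other tabs are dropped before any handler runs.
  dispatch(source: DebuggeeSource, method: string, params: unknown): void {
    if (source.tabId !== this.tabId) return;
    this.handlers.get(method)?.forEach((handler) => handler(method, params));
  }
}

const demoRouter = new TabEventRouter(7);
const stop = demoRouter.onEvent('Network.requestWillBeSent', (method) => console.log('got', method));
demoRouter.dispatch({ tabId: 8 }, 'Network.requestWillBeSent', {}); // ignored: other tab
demoRouter.dispatch({ tabId: 7 }, 'Network.requestWillBeSent', {}); // logs once
stop();
demoRouter.dispatch({ tabId: 7 }, 'Network.requestWillBeSent', {}); // ignored: unsubscribed
```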
EventManager
chrome.tabs.* and chrome.webRequest.* event wrappers with shared cleanup.
| Method | Description |
|--------|-------------|
| on(eventName, callback) | Registers a custom event listener |
| off(eventName, callback) | Removes a custom event listener |
| emit(eventName, ...args) | Emits a custom event |
| onTabCreated(callback) | Listens for tab created events |
| onTabRemoved(callback) | Listens for tab removed events |
| onTabUpdated(callback) | Listens for tab updated events |
| onTabActivated(callback) | Listens for tab activated events |
| onBeforeRequest(callback, filter?) | Listens for network requests |
| onCompleted(callback, filter?) | Listens for completed requests |
| cleanup() | Removes every listener registered through this manager |
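A shared cleanup of this kind is commonly built by remembering how to undo each registration. The sketch below shows that pattern with hypothetical names (`ListenerTarget`, `ListenerRegistry`, `FakeEvent`). It only assumes the `addListener` / `removeListener` shape that Chrome extension events expose, and it is not taken from the SDK's implementation.

```typescript
// Same shape as a Chrome extension event object.
interface ListenerTarget<T> {
  addListener(callback: T): void;
  removeListener(callback: T): void;
}

// Remembers an "undo" closure for every listener it registers.
class ListenerRegistry {
  private removers: Array<() => void> = [];

  listen<T>(target: ListenerTarget<T>, callback: T): void {
    target.addListener(callback);
    this.removers.push(() => target.removeListener(callback));
  }

  cleanup(): void {
    for (const remove of this.removers) remove();
    this.removers = [];
  }
}

// Tiny in-memory event so the sketch runs outside an extension.
class FakeEvent<T> implements ListenerTarget<T> {
  readonly listeners = new Set<T>();
  addListener(callback: T): void {
    this.listeners.add(callback);
  }
  removeListener(callback: T): void {
    this.listeners.delete(callback);
  }
}

const onCreated = new FakeEvent<(url: string) => void>();
const registry = new ListenerRegistry();
registry.listen(onCreated, (url) => console.log('created', url));
registry.cleanup();
console.log(onCreated.listeners.size); // 0
```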
Enums
ScrollDirection
```typescript
enum ScrollDirection {
  UP = 'UP',
  DOWN = 'DOWN',
}
```
Types
```typescript
// Tab information (re-exported chrome.tabs.Tab)
type TabInfo = chrome.tabs.Tab

// Page content snapshot
interface PageContent {
  html: string
  markdown?: string
  meta?: PageMetadata
  isPdf?: boolean
  resources?: unknown[]
}

// Page metadata
interface PageMetadata {
  title?: string
  description?: string
  url?: string
  image?: string
  [key: string]: string | undefined
}

// Screenshot options
interface ScreenshotOptions {
  format?: 'png' | 'jpeg'
  quality?: number // 0-100, jpeg only
}

// Element bounding box
interface BoundingBox {
  x: number
  y: number
  width: number
  height: number
}

// Viewport information
interface ViewportInfo {
  width: number
  height: number
  x: number
  y: number
}

// Selector resolution result (returned by Element.resolve())
interface ResolvedNode {
  nodeId: number
  backendNodeId: number
  object: RemoteObject // CDP Runtime.RemoteObject
}
```
Factory Functions
| Function | Description |
|----------|-------------|
| createCDPSession(tabId) | Creates a CDPSession, attaches the debugger, and enables CDP domains |
| createKeyboard(session) | Creates a Keyboard with platform detection |
| createMouse(session, keyboard) | Creates a Mouse |
Default Instances
```typescript
import { browser, events } from 'browser-agentkit';

// Pre-initialized Browser instance
browser.openTab('https://example.com');

// Pre-initialized EventManager instance
events.onTabCreated((tab) => console.log(tab));
```
Try the Demo Extension
A full-featured demo Chrome extension lives in examples/crx-demo/. It exercises every SDK module via an interactive side panel, targeting a deterministic playground page bundled with the extension.
```bash
# Build the SDK and demo extension
pnpm run example:build

# Or develop with HMR
pnpm run example:dev
```
Then load examples/crx-demo/extension/ as an unpacked extension in Chrome (chrome://extensions → Developer mode → Load unpacked). Click the extension icon to open the side panel, then click "Open Playground" to get a stable target page.
The demo covers: Browser tab lifecycle, Page navigation/click/fill/scroll/evaluate, Element resolution/bounding-box/attributes, Content extraction with AX-node round-trip, Keyboard & Mouse simulation including drag-and-drop, EventManager network listeners, raw CDPSession escape hatch, and utility helpers (retry, waitForCondition, withTimeout, getPlatform).
Requirements
- Chrome / Chromium browser
- Chrome extension with appropriate permissions:
  - tabs
  - debugger
  - scripting
  - activeTab
  - history
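As a starting point, the permissions above could be declared as follows. The manifest is written as a TypeScript object to be serialized into `manifest.json` at build time. The name, version, and service worker path are placeholders, and Manifest V3 is assumed because `chrome.scripting` is only available there.

```typescript
// Manifest fields as a plain object; write JSON.stringify(manifest) to manifest.json.
const manifest = {
  manifest_version: 3,
  name: 'My Automation Extension', // placeholder
  version: '0.1.0', // placeholder
  // Permission list copied from the Requirements section.
  permissions: ['tabs', 'debugger', 'scripting', 'activeTab', 'history'],
  background: { service_worker: 'background.js' }, // placeholder path
};

console.log(JSON.stringify(manifest, null, 2));
```

The list mirrors this README. Listeners on `chrome.webRequest` generally also require the `webRequest` permission and matching `host_permissions`, so check Chrome's permission documentation if you use the network events.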
License
MIT
