browser-agentkit
v0.0.2
Browser AgentKit SDK
An SDK that lets Chrome extensions control the browser and interact with web pages. It simplifies browser automation, web scraping, and page interaction tasks.
Features
- 🎯 CDP-Powered - Direct Chrome DevTools Protocol integration for fine-grained browser control
- 🔒 Stable & Reliable - CDP provides deterministic, low-level operations without DOM injection
- 🚀 Easy to Use - Simple, intuitive API design
- 🌐 Comprehensive - Browser control, page interactions, content extraction
- 🎨 Flexible - Support for keyboard, mouse, and complex interactions
- 📦 Lightweight - Minimal dependencies, no external browser binaries required
Why Browser AgentKit?
Built for Chrome Extensions
Unlike traditional browser automation tools (Puppeteer, Playwright, Selenium), Browser AgentKit is specifically designed for Chrome extension development:
| Feature | Browser AgentKit | Puppeteer/Playwright |
|---------|------------------|---------------------|
| Runtime | Inside Chrome extension | External Node.js process |
| Browser Instance | Uses user's existing browser | Launches separate browser |
| User Session | Access to logged-in sessions & cookies | Isolated browser context |
| Installation | npm package only | Requires browser binary download |
| Use Case | Extension-based automation | Testing & scraping servers |
CDP: The Foundation of Reliability
Browser AgentKit leverages the Chrome DevTools Protocol (CDP) directly, the same protocol that powers Chrome DevTools, Puppeteer, and Playwright internally:
- Fine-grained Control - Direct access to browser internals: DOM, network, input, runtime
- No Script Injection - Input simulation happens at the browser level, not via JavaScript injection
- Deterministic Operations - Each CDP command is handled by the browser itself and returns an explicit result or error
- Anti-Detection Friendly - Input dispatched through CDP arrives as trusted browser events, as real user input does
- Full Debugging Capabilities - Access to the same powerful APIs used by Chrome DevTools
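To make the browser-level input point concrete, here is a minimal sketch of a left click expressed as raw CDP commands. `Input.dispatchMouseEvent` is a real CDP method, but the helper names `buildClickCommands` and `runCommands` and the injected `send` function are hypothetical and not part of Browser AgentKit, whose internals may differ. In an extension, `send` would typically wrap `chrome.debugger.sendCommand`.

```typescript
// One CDP command: a method name plus its JSON parameters.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical transport; in an extension this could wrap chrome.debugger.sendCommand.
type SendCommand = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Builds the move/press/release sequence for a left click at viewport coordinates (x, y).
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const base = { x, y, button: 'left', clickCount: 1 };
  return [
    { method: 'Input.dispatchMouseEvent', params: { type: 'mouseMoved', x, y } },
    { method: 'Input.dispatchMouseEvent', params: { ...base, type: 'mousePressed' } },
    { method: 'Input.dispatchMouseEvent', params: { ...base, type: 'mouseReleased' } },
  ];
}

// Sends the commands one at a time, waiting for each response before sending the next.
async function runCommands(send: SendCommand, commands: CdpCommand[]): Promise<void> {
  for (const { method, params } of commands) {
    await send(method, params);
  }
}
```

Keeping the command list separate from the transport makes the sequence easy to inspect or log before anything is sent to the browser.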
When to Use Browser AgentKit
✅ Use Browser AgentKit when you want to:
- Build Chrome extensions with automation capabilities
- Access the user's authenticated sessions (no re-login required)
- Operate within the user's existing browser environment
- Ship a lightweight SDK without bundled browser binaries
❌ Use Puppeteer/Playwright instead when you need:
- Server-side web scraping or testing
- Headless browser automation in CI/CD pipelines
- Cross-browser testing (Firefox, Safari, etc.)
- Isolated browser contexts for parallel execution
Installation
npm install browser-agentkit
Quick Start
import { browser, Page } from 'browser-agentkit';
// Open a new tab
const tab = await browser.openTab('https://example.com');
// Create a page instance
const page = new Page(tab.id);
await page.initialize();
// Interact with the page
await page.fill('#search', 'browser automation');
await page.click('#submit');
// Extract content
const extractor = page.getContentExtractor();
const html = await extractor.getHTML();
const metadata = await extractor.getMetadata();
console.log('Page title:', metadata.title);
// Clean up
await page.close();
await browser.closeTab(tab.id);
Core Modules
Browser Module
Manage browser tabs and windows:
import { browser } from 'browser-agentkit';
// Open and manage tabs
const tab = await browser.openTab('https://example.com');
const currentTab = await browser.getCurrentTab();
const allTabs = await browser.queryTabs({ currentWindow: true });
// Navigation
await browser.navigate(tab.id, 'https://google.com');
await browser.goBack(tab.id);
await browser.reload(tab.id);
// Screenshots
const screenshot = await browser.captureVisibleTab(tab.id);
Page Module
Interact with web pages:
import { Page, ScrollDirection } from 'browser-agentkit';
const page = new Page(tabId);
await page.initialize();
// Click elements
await page.click('#button');
// Fill forms
await page.fill('#email', 'user@example.com');
await page.fill('#password', 'secret');
// Scroll
await page.scroll(ScrollDirection.DOWN);
// Wait for elements
await page.waitForSelector('#result');
// Advanced interactions
const keyboard = await page.getKeyboard();
await keyboard.press('Enter');
const mouse = await page.getMouse();
await mouse.click(100, 200);
Content Extraction
Extract data from web pages:
import { ContentExtractor } from 'browser-agentkit';
const extractor = new ContentExtractor(tabId);
// Get page content
const html = await extractor.getHTML();
const text = await extractor.getText();
const metadata = await extractor.getMetadata();
// Get element content
const result = await extractor.getElementContent('#article');
console.log(result.text);
// Check if PDF
const isPdf = await extractor.isPDF();
Input Simulation
Simulate keyboard and mouse input:
import { createKeyboard, createMouse } from 'browser-agentkit';
const keyboard = await createKeyboard({ tabId });
// Type text
await keyboard.type('Hello World');
// Press keys
await keyboard.press('Enter');
await keyboard.press('ControlOrMeta+KeyA'); // Ctrl/Cmd + A
// Mouse operations
const mouse = createMouse({ tabId }, keyboard);
await mouse.move(100, 200);
await mouse.click(100, 200);
await mouse.dblclick(150, 250);
Event Management
Listen to browser events:
import { events } from 'browser-agentkit';
// Tab events
events.onTabCreated((tab) => {
  console.log('New tab created:', tab.url);
});
events.onTabUpdated((tabId, changeInfo, tab) => {
  if (changeInfo.status === 'complete') {
    console.log('Tab loaded:', tab.url);
  }
});
// Network events
events.onBeforeRequest((details) => {
  console.log('Request:', details.url);
});
events.onCompleted((details) => {
  console.log('Response:', details.url);
});
Advanced Examples
Form Automation
import { browser, Page } from 'browser-agentkit';
async function fillLoginForm() {
  const tab = await browser.openTab('https://example.com/login');
  const page = new Page(tab.id);
  await page.initialize();
  await page.fill('#username', 'myuser');
  await page.fill('#password', 'mypass');
  await page.click('#login-button');
  await page.waitForNavigation();
  console.log('Login successful!');
  await page.close();
}
Web Scraping
import { browser, Page } from 'browser-agentkit';
async function scrapeData() {
  const tab = await browser.openTab('https://example.com/data');
  const page = new Page(tab.id);
  await page.initialize();
  const extractor = page.getContentExtractor();
  const metadata = await extractor.getMetadata();
  const content = await extractor.getElementContent('.data-container');
  const data = {
    title: metadata.title,
    description: metadata.description,
    content: content.text
  };
  await page.close();
  await browser.closeTab(tab.id);
  return data;
}
Drag and Drop
import { Page } from 'browser-agentkit';
async function dragAndDrop(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();
  const mouse = await page.getMouse();
  // Drag from (100, 100) to (300, 300)
  await mouse.move(100, 100);
  await mouse.down();
  await mouse.move(300, 300, { steps: 10 }); // Smooth movement
  await mouse.up();
  await page.close();
}
API Reference
Classes
Browser
Main class for browser-level operations.
| Method | Description |
|--------|-------------|
| openTab(url, options?) | Opens a new tab |
| getTab(tabId) | Gets tab information |
| getCurrentTab(windowId?) | Gets the active tab |
| queryTabs(queryInfo) | Queries tabs |
| searchHistory(query) | Searches browser history |
| closeTab(tabId) | Closes a tab |
| navigate(tabId, url) | Navigates to URL |
| goBack(tabId) | Goes back in history |
| goForward(tabId) | Goes forward in history |
| reload(tabId, bypassCache?) | Reloads a tab |
| captureVisibleTab(tabId, options?) | Captures visible area screenshot |
| captureFullPage(tabId) | Captures full page screenshot |
| createHiddenTab(url) | Creates a hidden tab for background processing |
Page
Main class for page-level operations.
| Method | Description |
|--------|-------------|
| initialize() | Initializes the page (must call before other methods) |
| close() | Cleans up resources |
| click(selector) | Clicks an element. The selector can be any CSS selector or a string of the form node=id, e.g. "node=123". |
| fill(selector, text) | Fills an input field. The selector can be any CSS selector or a string of the form node=id, e.g. "node=123". |
| scroll(direction, amount?) | Scrolls the page |
| navigate(url) | Navigates to URL |
| waitForNavigation(options?) | Waits for navigation to complete |
| waitForSelector(selector, options?) | Waits for element to appear |
| element(selector) | Creates PageElement instance |
| getKeyboard() | Gets Keyboard instance |
| getMouse() | Gets Mouse instance |
| getContentExtractor() | Gets ContentExtractor instance |
| evaluate(fn, ...args) | Executes script in page context |
| captureVisible(tabId, options?) | Captures using debugger API |
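Because `initialize()` must be called before other methods and `close()` releases resources, it can help to pair the two so cleanup runs even when a step throws. The sketch below is one way to do that; `withPage` and `PageLike` are hypothetical names that are not part of the SDK, and the interface is declared locally on the assumption that both methods return promises, as the examples in this README suggest.

```typescript
// Structural subset of the Page methods documented above, declared locally
// so the sketch is self-contained (assumes both methods return promises).
interface PageLike {
  initialize(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: initializes the page, runs the task, and always closes the page,
// whether the task resolves or throws.
async function withPage<P extends PageLike, T>(
  page: P,
  task: (page: P) => Promise<T>,
): Promise<T> {
  await page.initialize();
  try {
    return await task(page);
  } finally {
    await page.close();
  }
}
```

With a real `Page` instance this could be used as `await withPage(new Page(tab.id), (page) => page.click('#submit'))`, provided `Page` matches the assumed shape.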
Actions
Standalone class for page actions (used internally by Page).
| Method | Description |
|--------|-------------|
| click(selector) | Clicks an element |
| fill(selector, value) | Fills an input field |
| search(selector, value) | Fills input and presses Enter |
| scroll(direction, amount?) | Scrolls the page |
| waitForSelector(selector, options?) | Waits for element to appear |
Navigation
Handles page navigation.
| Method | Description |
|--------|-------------|
| navigate(url) | Navigates to URL |
| waitForNavigation(options?) | Waits for navigation to complete |
| waitForCondition(fn, options?) | Waits for a condition to be true |
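The README does not describe how `waitForCondition` works or which options it accepts. As a rough illustration of the general technique, the sketch below polls a condition until it holds or a timeout expires. `pollUntil` and the option names `timeoutMs` and `intervalMs` are assumptions made for this example, not the SDK's API.

```typescript
// Hypothetical option names; the SDK's real options are not documented here.
interface PollOptions {
  timeoutMs?: number;  // give up after this long
  intervalMs?: number; // pause between checks
}

// Re-evaluates `condition` until it returns true, or throws once the timeout has passed.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  options: PollOptions = {},
): Promise<void> {
  const { timeoutMs = 5000, intervalMs = 100 } = options;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The condition is always checked at least once, so a condition that is already true resolves without waiting.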
PageElement
Represents a DOM element for interaction.
| Method | Description |
|--------|-------------|
| waitForExist(options?) | Waits for element to exist |
| exists() | Checks if element exists |
| getBoundingBox() | Gets element's bounding box |
| getText() | Gets element's text content |
| getAttributeValue(name) | Gets attribute value |
| scrollIntoView() | Scrolls element into view |
| findElementNodeIds() | Finds element and returns node IDs |
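A bounding box is often used to turn an element into click coordinates. The helper below is a small sketch of that step; `centerOf` is a hypothetical name, and the `BoundingBox` interface is redeclared here only to keep the example self-contained.

```typescript
// Same shape as the BoundingBox type listed in this README.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns the center point of a box, rounded to whole pixels.
function centerOf(box: BoundingBox): { x: number; y: number } {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}
```

The result could then be passed to `mouse.click(x, y)`, assuming `getBoundingBox()` reports coordinates in the same viewport space that the mouse methods expect, which this README does not state.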
ContentExtractor
Extracts content from web pages.
| Method | Description |
|--------|-------------|
| getPageSnapshot(options?) | Gets page snapshot with HTML and metadata |
| getHTML(options?) | Gets structured HTML content (options: { viewportOnly?: boolean }). Every interactive DOM element has a unique "node" attribute, e.g. <button onclick="..." node="1234"></button>. |
| getText() | Gets plain text content |
| getMetadata() | Gets page metadata (title, description, etc.) |
| getElementContent(selector) | Gets element's HTML and text content |
| isPDF() | Checks if current page is a PDF |
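Since `getHTML()` tags interactive elements with a `node` attribute and `click`/`fill` accept `node=id` selectors, the two can be connected with a small helper. The sketch below assumes node IDs are numeric, as in the `node="1234"` example; `nodeSelectors` is a hypothetical name, and a regular expression is used only for brevity (it is not a full HTML parser and could be fooled by attribute values that contain similar text).

```typescript
// Collects every node="<digits>" attribute, in document order, as "node=<id>" selectors.
// The leading \s keeps attributes such as data-node="..." from matching.
function nodeSelectors(html: string): string[] {
  const pattern = /\snode="(\d+)"/g;
  const selectors: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    selectors.push(`node=${match[1]}`);
  }
  return selectors;
}
```

Each returned string has the form that `page.click` and `page.fill` are documented to accept.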
Keyboard
Simulates keyboard input.
| Method | Description |
|--------|-------------|
| down(key) | Presses a key down |
| up(key) | Releases a key |
| press(key, options?) | Presses and releases a key |
| type(text, options?) | Types a sequence of characters |
| insertText(text) | Inserts text directly |
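The Input Simulation example presses `'ControlOrMeta+KeyA'`, described as Ctrl/Cmd + A. How the SDK resolves that alias is not documented here; the sketch below shows one plausible interpretation, where `ControlOrMeta` becomes `Meta` on macOS and `Control` elsewhere. `resolveShortcut` is a hypothetical helper, and splitting on `+` means it cannot express a literal `+` key.

```typescript
// Splits a shortcut such as "ControlOrMeta+KeyA" into its keys and resolves
// the ControlOrMeta alias for the given platform.
function resolveShortcut(shortcut: string, isMac: boolean): string[] {
  return shortcut
    .split('+')
    .map((key) => (key === 'ControlOrMeta' ? (isMac ? 'Meta' : 'Control') : key));
}
```

A caller would press the returned keys down in order and release them in reverse, which is the usual way to emulate a chord.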
Mouse
Simulates mouse input.
| Method | Description |
|--------|-------------|
| move(x, y, options?) | Moves mouse to position |
| down(options?) | Presses mouse button down |
| up(options?) | Releases mouse button |
| click(x, y, options?) | Clicks at position |
| dblclick(x, y, options?) | Double-clicks at position |
| wheel(deltaX, deltaY) | Scrolls using mouse wheel |
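The drag-and-drop example passes `{ steps: 10 }` to `mouse.move` for smoother movement. The README does not say how intermediate positions are chosen; linear interpolation is a common approach, sketched below. `interpolateMove` and `Point` are hypothetical names used only for this illustration.

```typescript
interface Point {
  x: number;
  y: number;
}

// Returns `steps` points evenly spaced along the line from `from` to `to`.
// The last point is exactly `to`; the start point itself is not included.
function interpolateMove(from: Point, to: Point, steps: number): Point[] {
  const count = Math.max(1, Math.floor(steps));
  const points: Point[] = [];
  for (let i = 1; i <= count; i++) {
    points.push({
      x: from.x + ((to.x - from.x) * i) / count,
      y: from.y + ((to.y - from.y) * i) / count,
    });
  }
  return points;
}
```

Dispatching one move event per point produces a trail of intermediate positions between the two endpoints, which drag handlers that track movement generally need.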
EventManager
Manages browser event listeners.
| Method | Description |
|--------|-------------|
| on(eventName, callback) | Registers custom event listener |
| off(eventName, callback) | Removes custom event listener |
| emit(eventName, ...args) | Emits custom event |
| onTabCreated(callback) | Listens for tab created events |
| onTabRemoved(callback) | Listens for tab removed events |
| onTabUpdated(callback) | Listens for tab updated events |
| onTabActivated(callback) | Listens for tab activated events |
| onBeforeRequest(callback, filter?) | Listens for network requests |
| onCompleted(callback, filter?) | Listens for completed requests |
| cleanup() | Removes all listeners |
TabManager
Low-level tab management (used internally by Browser).
| Method | Description |
|--------|-------------|
| create(url, options?) | Creates a new tab |
| get(tabId) | Gets tab by ID |
| getActive(windowId?) | Gets active tab |
| query(queryInfo) | Queries tabs |
| close(tabId) | Closes a tab |
| update(tabId, properties) | Updates tab properties |
| navigate(tabId, url) | Navigates to URL |
| goBack(tabId) | Goes back in history |
| goForward(tabId) | Goes forward in history |
| reload(tabId, bypassCache?) | Reloads tab |
| createHidden(url) | Creates hidden tab |
Enums
ScrollDirection
enum ScrollDirection {
  UP = 'UP',
  DOWN = 'DOWN',
}
Types
// Tab information
type TabInfo = chrome.tabs.Tab
// Tab context for operations
interface TabContext {
  tabId: number
  mainTabId?: number
  url?: string
  title?: string
}
// Page content snapshot
interface PageContent {
  html: string
  markdown?: string
  meta?: PageMetadata
  isPdf?: boolean
  resources?: unknown[]
}
// Page metadata
interface PageMetadata {
  title?: string
  description?: string
  url?: string
  image?: string
  [key: string]: string | undefined
}
// Screenshot options
interface ScreenshotOptions {
  format?: 'png' | 'jpeg'
  quality?: number // 0-100, jpeg only
}
// Element bounding box
interface BoundingBox {
  x: number
  y: number
  width: number
  height: number
}
// Viewport information
interface ViewportInfo {
  width: number
  height: number
  x: number
  y: number
}
Factory Functions
| Function | Description |
|----------|-------------|
| createKeyboard(debuggee) | Creates Keyboard instance with platform detection |
| createMouse(tabContext, keyboard) | Creates Mouse instance |
Default Instances
import { browser, events } from 'browser-agentkit';
// Pre-initialized Browser instance
browser.openTab('https://example.com');
// Pre-initialized EventManager instance
events.onTabCreated((tab) => console.log(tab));
Requirements
- Chrome/Chromium browser
- Chrome extension with appropriate permissions:
  - tabs
  - debugger
  - scripting
  - activeTab
  - history
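A missing permission usually surfaces as a runtime error, so it can be worth checking the manifest up front. The sketch below compares a manifest's `permissions` array against the list above; `missingPermissions`, `REQUIRED_PERMISSIONS`, and `ManifestLike` are hypothetical names and not part of the SDK.

```typescript
// Permissions listed in the Requirements section of this README.
const REQUIRED_PERMISSIONS = ['tabs', 'debugger', 'scripting', 'activeTab', 'history'] as const;

// Minimal view of an extension manifest; only the field this check needs.
interface ManifestLike {
  permissions?: string[];
}

// Returns the required permissions that the manifest does not declare.
function missingPermissions(manifest: ManifestLike): string[] {
  const declared = new Set(manifest.permissions ?? []);
  return REQUIRED_PERMISSIONS.filter((permission) => !declared.has(permission));
}
```

Inside an extension, the manifest object can be obtained with `chrome.runtime.getManifest()` and passed to this function at startup.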
License
MIT
