@hanzili/chrome-browser-agent
v0.1.0
Chrome Browser Agent
Browser automation toolkit for Chrome extensions using CDP (Chrome DevTools Protocol). Powers AI agents that interact with web pages.
Features
- CDP Integration: Full Chrome DevTools Protocol support for reliable browser automation
- Accessibility Tree: Semantic page representation for AI navigation (not CSS selectors)
- Reference-based Targeting: Elements are tracked by refs (ref_1, ref_2) that survive page changes
- Tool Definitions: Ready-to-use tool schemas for LLM tool calling (Claude, GPT, etc.)
- Screenshot Support: High-quality screenshots with DPR scaling
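DPR scaling matters when coordinates read off a screenshot are later used for a click. The sketch below assumes the screenshot is captured at device-pixel resolution and that click coordinates are expected in CSS pixels (as CDP mouse events are). Whether cdpHelper already performs this conversion internally is not stated in this README, and screenshotToCssPoint is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: convert a point measured in screenshot (device) pixels
// into CSS pixels by dividing by the device pixel ratio.
function screenshotToCssPoint(point, devicePixelRatio) {
  if (!(devicePixelRatio > 0)) {
    throw new RangeError('devicePixelRatio must be a positive number');
  }
  return {
    x: point.x / devicePixelRatio,
    y: point.y / devicePixelRatio,
  };
}

// A point at (400, 300) on a 2x screenshot maps to (200, 150) in CSS pixels.
console.log(screenshotToCssPoint({ x: 400, y: 300 }, 2)); // { x: 200, y: 150 }
```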
Installation
npm install @hanzili/chrome-browser-agent
Setup
1. Add to your manifest.json
{
  "permissions": ["debugger", "scripting", "tabs", "activeTab"],
  "host_permissions": ["<all_urls>"],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": [
        "node_modules/@hanzili/chrome-browser-agent/src/content/accessibility-tree.js",
        "node_modules/@hanzili/chrome-browser-agent/src/content/content.js"
      ]
    }
  ]
}
2. Use in your service worker
import {
  cdpHelper,
  executeTool,
  TOOL_DEFINITIONS
} from '@hanzili/chrome-browser-agent';
// Pass TOOL_DEFINITIONS to your LLM.
// callLLM, messages, and currentTabId are placeholders supplied by your own extension code.
const response = await callLLM(messages, { tools: TOOL_DEFINITIONS });
// Execute each tool call returned by the LLM
for (const toolUse of response.tool_calls) {
  const result = await executeTool(toolUse.name, toolUse.input, {
    tabId: currentTabId,
    // Forwards { type, payload } messages to the content scripts in the target tab
    sendToContent: (tabId, type, payload) => chrome.tabs.sendMessage(tabId, { type, payload })
  });
}
Core Concepts
Accessibility Tree
Instead of fragile CSS selectors, this toolkit uses an accessibility tree representation:
button "Submit Application" [ref_1]
textbox "Email" [ref_2] placeholder="Enter email"
combobox "Country" [ref_3]
  option "United States" value="us"
  option "Canada" value="ca" (selected)
The LLM sees semantic roles and can reference elements by ref_1, ref_2, etc.
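Because the LLM answers with refs, it can help to index the tree text before acting on a reply, so that a ref the model invented is caught early. The following is a minimal sketch that assumes the line format shown above (role, quoted name, then [ref_N]); indexRefs and the regular expression are illustrative and not part of the package.

```javascript
// Matches lines such as: button "Submit Application" [ref_1]
// Assumed format: role, quoted accessible name, then the ref in square brackets.
const REF_LINE = /^\s*(\S+)\s+"([^"]*)"\s+\[(ref_\d+)\]/;

// Hypothetical helper: build a Map from ref id to { role, name }.
// Lines without a ref (for example option rows) are skipped.
function indexRefs(treeText) {
  const refs = new Map();
  for (const line of treeText.split('\n')) {
    const match = REF_LINE.exec(line);
    if (match) {
      const [, role, name, ref] = match;
      refs.set(ref, { role, name });
    }
  }
  return refs;
}

const sample = [
  'button "Submit Application" [ref_1]',
  'textbox "Email" [ref_2] placeholder="Enter email"',
  'combobox "Country" [ref_3]',
  '  option "United States" value="us"',
  '  option "Canada" value="ca" (selected)',
].join('\n');

const refs = indexRefs(sample);
console.log(refs.get('ref_2')); // { role: 'textbox', name: 'Email' }
console.log(refs.has('ref_9')); // false
```

A check like refs.has(ref) before executing a tool call gives the agent a chance to re-read the page instead of acting on a stale or made-up reference.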
Available Tools
| Tool | Description |
|------|-------------|
| read_page | Get accessibility tree of current page |
| computer | Click, type, scroll, screenshot |
| form_input | Fill form fields by reference |
| navigate | Go to URL, back, forward |
| find | Natural language element search |
| file_upload | Upload files to inputs |
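An LLM can occasionally return a tool name that was never offered to it. The sketch below shows one way to screen tool calls before executing them. It assumes TOOL_DEFINITIONS is an array of objects that each carry a string name field, which is common for tool-calling schemas but is not confirmed by this README; buildToolIndex and partitionToolCalls are hypothetical helpers, and the definitions used here are stand-ins.

```javascript
// Build a name -> definition lookup.
// Assumption: each definition is an object with a string `name`.
function buildToolIndex(definitions) {
  const index = new Map();
  for (const def of definitions) {
    if (def && typeof def.name === 'string') index.set(def.name, def);
  }
  return index;
}

// Separate calls that name a known tool from calls that should be
// reported back to the LLM as errors instead of being executed.
function partitionToolCalls(toolCalls, toolIndex) {
  const valid = [];
  const rejected = [];
  for (const call of toolCalls) {
    const name = call ? call.name : undefined;
    if (toolIndex.has(name)) valid.push(call);
    else rejected.push({ call, error: `Unknown tool: ${String(name)}` });
  }
  return { valid, rejected };
}

// Stand-in definitions; in an extension this would be the imported TOOL_DEFINITIONS.
const exampleDefinitions = [
  { name: 'read_page', description: 'Get accessibility tree of current page' },
  { name: 'navigate', description: 'Go to URL, back, forward' },
];
const toolIndex = buildToolIndex(exampleDefinitions);
const { valid, rejected } = partitionToolCalls(
  [
    { name: 'read_page', input: {} },
    { name: 'open_new_window', input: {} }, // hypothetical name not in the index
  ],
  toolIndex
);
console.log(valid.length, rejected[0].error); // 1 Unknown tool: open_new_window
```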
API Reference
cdpHelper
// Attach debugger to tab
await cdpHelper.attachDebugger(tabId);
// Take screenshot
const base64 = await cdpHelper.takeScreenshot(tabId);
// Click at coordinates
await cdpHelper.click(tabId, x, y);
// Type text
await cdpHelper.type(tabId, "Hello world");
executeTool
const result = await executeTool('click', { ref: 'ref_1' }, {
  tabId,
  sendToContent,
  cdpHelper
});
Content Scripts
The content scripts must be injected into pages. They provide:
- accessibility-tree.js: Generates the semantic tree, manages element refs
- content.js: Handles messages from service worker (form fill, click, etc.)
- agent-visual-indicator.js: Shows visual feedback during automation
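To illustrate the message handling described above, here is a minimal sketch of routing { type, payload } messages (the shape used by sendToContent in the setup example) to handlers. It is not the package's content.js: createMessageRouter and the FILL_FIELD message type are hypothetical, and the actual message types are not listed in this README.

```javascript
// Hypothetical router: maps a message's `type` to a handler and wraps the
// outcome in { ok, result } or { ok, error } so the sender always gets a reply.
function createMessageRouter(handlers) {
  return function route(message) {
    const type = message ? message.type : undefined;
    const known = Object.prototype.hasOwnProperty.call(handlers, type);
    if (!known || typeof handlers[type] !== 'function') {
      return { ok: false, error: `No handler for message type: ${String(type)}` };
    }
    try {
      return { ok: true, result: handlers[type](message.payload) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// FILL_FIELD is an invented message type used only for this example.
const route = createMessageRouter({
  FILL_FIELD: ({ ref, value }) => `filled ${ref} with ${value}`,
});

console.log(route({ type: 'FILL_FIELD', payload: { ref: 'ref_2', value: 'a@b.c' } }));
// { ok: true, result: 'filled ref_2 with a@b.c' }
console.log(route({ type: 'UNKNOWN' }).ok); // false
```

In a real content script a router like this would typically be called from a chrome.runtime.onMessage listener, with the returned object passed to sendResponse.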
License
MIT
