@riruru/automation-core
v0.1.0
Browser automation library for Chrome Extensions - LLM-powered browser agent
automation-core
Browser automation library for Chrome Extensions - extracted from Nanobrowser.
Standalone Package: This is an extracted automation core that can be used as a dependency in other Chrome extensions.
Overview
This library provides AI-driven browser automation capabilities specifically designed for Chrome Extensions. It uses LLM (Large Language Model) agents to interpret natural language commands and execute browser actions.
Requirements
- Chrome Extension Manifest V3
- Required Permissions:
  - debugger - For CDP (Chrome DevTools Protocol) access
  - tabs - For tab management
  - scripting - For DOM injection
  - activeTab - For current tab access
- Host Permissions for target sites
⚠️ Important: This library will NOT work in:
- Node.js scripts
- Web pages
- Firefox, Safari, or other browsers
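The permission requirements above can be checked early, before any automation is attempted. The following is a minimal sketch, not part of this library: `ManifestLike` and `findManifestProblems` are hypothetical names introduced for illustration, and the check only covers the fields listed in this section.

```typescript
// Minimal shape of the manifest fields this check reads (an assumption for
// illustration; a real manifest has many more keys).
interface ManifestLike {
  manifest_version?: number;
  permissions?: string[];
  host_permissions?: string[];
}

// Permissions listed in the Requirements section.
const REQUIRED_PERMISSIONS = ["debugger", "tabs", "scripting", "activeTab"];

// Hypothetical helper: returns human-readable problems, or an empty array
// when the manifest satisfies the documented requirements.
function findManifestProblems(manifest: ManifestLike): string[] {
  const problems: string[] = [];
  if (manifest.manifest_version !== 3) {
    problems.push("manifest_version must be 3");
  }
  const granted = manifest.permissions ?? [];
  for (const permission of REQUIRED_PERMISSIONS) {
    if (granted.indexOf(permission) === -1) {
      problems.push(`missing permission: ${permission}`);
    }
  }
  if ((manifest.host_permissions ?? []).length === 0) {
    problems.push("no host_permissions declared for target sites");
  }
  return problems;
}
```

Inside an extension, the object returned by `chrome.runtime.getManifest()` could be passed to this helper, for example in the service worker at startup.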
Installation
# From npm
npm install @riruru/automation-core puppeteer-core zod
# Or with pnpm
pnpm add @riruru/automation-core puppeteer-core zod
Quick Start
import { AutomationAgent, BrowserContext } from '@riruru/automation-core';
// Create a browser context attached to the active tab
const context = await BrowserContext.fromActiveTab();
// Create an automation agent
const agent = new AutomationAgent({
  context,
  llm: {
    provider: 'anthropic',
    apiKey: 'sk-ant-...',
    model: 'claude-sonnet-4-20250514'
  }
});
// Subscribe to events (optional)
agent.on('step', (event) => {
  console.log(`Step ${event.step}: ${event.details}`);
});
// Execute a task
const result = await agent.execute("Click the Jobs button in the navigation");
console.log(result);
// {
//   success: true,
//   steps: [...],
//   finalUrl: "https://example.com/jobs",
//   finalAnswer: "Clicked the Jobs button"
// }
API Reference
AutomationAgent
The main entry point for browser automation.
const agent = new AutomationAgent({
  context: BrowserContext, // Optional - creates from active tab if not provided
  llm: {
    provider: 'anthropic' | 'openai' | 'gemini' | 'ollama',
    apiKey: string,
    model: string,
    baseUrl?: string, // For custom endpoints
    temperature?: number, // Default: 0.1
  },
  options?: {
    maxSteps?: number, // Default: 50
    maxActionsPerStep?: number, // Default: 5
    maxFailures?: number, // Default: 3
    useVision?: boolean, // Default: false
  }
});
// Execute a task
const result = await agent.execute("Your task description");
// Subscribe to events
agent.on('step' | 'action' | 'error' | 'complete' | 'all', handler);
// Stop execution
await agent.stop();
// Cleanup resources
await agent.cleanup();
BrowserContext
Manages browser tabs and pages.
// Create from active tab
const context = await BrowserContext.fromActiveTab();
// Or create from a specific tab (separate name to avoid redeclaring `context`)
const tabContext = await BrowserContext.fromTab(tabId);
// Get current page
const page = await context.getCurrentPage();
// Navigate
await context.navigateTo('https://example.com');
// Tab management
await context.openTab('https://example.com');
await context.switchTab(tabId);
await context.closeTab(tabId);
TaskResult
The result of executing a task.
interface TaskResult {
  success: boolean;
  error?: string;
  steps: StepRecord[];
  finalUrl: string;
  finalAnswer?: string;
  data?: unknown; // For extraction tasks
}
Supported LLM Providers
| Provider | Models |
|----------|--------|
| Anthropic | Claude 3 Opus, Sonnet, Haiku, Claude 3.5 Sonnet |
| OpenAI | GPT-4, GPT-4 Turbo, GPT-3.5 Turbo |
| Google | Gemini Pro, Gemini Ultra |
| Ollama | Any local model |
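When the `llm` settings come from untyped input such as stored user preferences, it can help to validate them before constructing an agent. The sketch below is an illustration only: `parseLlmConfig` is a hypothetical helper, not an export of this package. It accepts the four provider identifiers from the API reference and applies the documented default temperature of 0.1.

```typescript
// Provider identifiers as listed in the AutomationAgent options.
const PROVIDERS = ["anthropic", "openai", "gemini", "ollama"] as const;
type Provider = (typeof PROVIDERS)[number];

interface LlmConfig {
  provider: Provider;
  apiKey: string;
  model: string;
  baseUrl?: string;
  temperature: number;
}

// Hypothetical helper: validates untyped input and fills in the default
// temperature. Throws an Error describing the first problem it finds.
function parseLlmConfig(raw: unknown): LlmConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("llm config must be an object");
  }
  const { provider, apiKey, model, baseUrl, temperature } = raw as Record<string, unknown>;

  let matched: Provider | undefined;
  for (const candidate of PROVIDERS) {
    if (candidate === provider) matched = candidate;
  }
  if (matched === undefined) {
    throw new Error(`unsupported provider: ${String(provider)}`);
  }
  if (typeof apiKey !== "string") {
    throw new Error("apiKey must be a string");
  }
  if (typeof model !== "string" || model.length === 0) {
    throw new Error("model must be a non-empty string");
  }

  // 0.1 is the default temperature stated in the API reference.
  const config: LlmConfig = { provider: matched, apiKey, model, temperature: 0.1 };
  if (typeof baseUrl === "string") {
    config.baseUrl = baseUrl;
  } else if (baseUrl !== undefined) {
    throw new Error("baseUrl must be a string when provided");
  }
  if (typeof temperature === "number") {
    config.temperature = temperature;
  } else if (temperature !== undefined) {
    throw new Error("temperature must be a number when provided");
  }
  return config;
}
```

The returned object has the same field names as the `llm` option, so it could be passed through unchanged.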
Available Actions
The agent can perform these browser actions:
- Navigation: go_to_url, go_back, search_google
- Interaction: click_element, input_text, send_keys
- Scrolling: scroll_to_top, scroll_to_bottom, next_page, previous_page
- Tab Management: open_tab, switch_tab, close_tab
- Dropdowns: get_dropdown_options, select_dropdown_option
- Utility: wait, cache_content, done
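The action names above can be captured as a typed lookup, which may be useful for logging or for allowlisting actions in your own code. This is a sketch under assumptions: the action names and groupings come from the list above, while `ACTIONS_BY_CATEGORY`, `categoryOf`, and `isActionName` are hypothetical names that the library is not known to export.

```typescript
// Action names grouped as in the list above.
const ACTIONS_BY_CATEGORY = {
  navigation: ["go_to_url", "go_back", "search_google"],
  interaction: ["click_element", "input_text", "send_keys"],
  scrolling: ["scroll_to_top", "scroll_to_bottom", "next_page", "previous_page"],
  tabManagement: ["open_tab", "switch_tab", "close_tab"],
  dropdowns: ["get_dropdown_options", "select_dropdown_option"],
  utility: ["wait", "cache_content", "done"],
} as const;

type Category = keyof typeof ACTIONS_BY_CATEGORY;
type ActionName = (typeof ACTIONS_BY_CATEGORY)[Category][number];

// Hypothetical helper: returns the category of an action name, or undefined
// when the name is not in the documented list.
function categoryOf(name: string): Category | undefined {
  for (const category of Object.keys(ACTIONS_BY_CATEGORY) as Category[]) {
    const names: readonly string[] = ACTIONS_BY_CATEGORY[category];
    if (names.indexOf(name) !== -1) {
      return category;
    }
  }
  return undefined;
}

// Type guard that narrows an arbitrary string to a documented action name.
function isActionName(name: string): name is ActionName {
  return categoryOf(name) !== undefined;
}
```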
Manifest Configuration
Add these permissions to your Chrome Extension manifest:
{
  "manifest_version": 3,
  "permissions": [
    "debugger",
    "tabs",
    "scripting",
    "activeTab"
  ],
  "host_permissions": [
    "<all_urls>"
  ]
}
How It Works
- User provides a natural language task (e.g., "Click the Jobs button")
- Navigator Agent analyzes the page - extracts interactive elements
- LLM decides what action to take - based on the task and page state
- Action is executed via Puppeteer/CDP
- Loop continues until task is complete or max steps reached
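The steps above amount to an observe, decide, act loop with step and failure limits. The sketch below illustrates that control flow only and is not the library's implementation: `PageState`, `Decision`, `LoopDeps`, and `runLoop` are hypothetical names, the three callbacks stand in for page analysis, the LLM call, and action execution, and treating `maxFailures` as a total count across the run (rather than consecutive failures) is an assumption.

```typescript
// Hypothetical stand-ins for the real page state and LLM decision.
interface PageState { url: string; elements: string[]; }
interface Decision { action: string; }

interface LoopDeps {
  observe: () => Promise<PageState>;                             // analyze the page
  decide: (task: string, state: PageState) => Promise<Decision>; // ask the LLM
  act: (decision: Decision) => Promise<void>;                    // execute the action
}

interface LoopResult { success: boolean; steps: string[]; error?: string; }

// Runs the loop until the model chooses "done", the step budget is exhausted,
// or too many steps have failed. Defaults mirror the documented option defaults.
async function runLoop(
  task: string,
  deps: LoopDeps,
  maxSteps = 50,
  maxFailures = 3,
): Promise<LoopResult> {
  const steps: string[] = [];
  let failures = 0;
  for (let step = 0; step < maxSteps; step++) {
    try {
      const state = await deps.observe();
      const decision = await deps.decide(task, state);
      if (decision.action === "done") {
        return { success: true, steps };
      }
      await deps.act(decision);
      steps.push(decision.action);
    } catch (err) {
      failures++;
      if (failures >= maxFailures) {
        const message = err instanceof Error ? err.message : String(err);
        return { success: false, steps, error: `too many failures: ${message}` };
      }
    }
  }
  return { success: false, steps, error: "max steps reached" };
}
```

Passing the dependencies in as callbacks keeps the loop testable without a browser or an LLM, since each callback can be replaced with a scripted stub.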
Architecture
AutomationAgent
└── Executor
    ├── NavigatorAgent (LLM-driven decision making)
    │   ├── Prompts (system instructions)
    │   ├── Actions (click, input, scroll, etc.)
    │   └── MessageManager (conversation history)
    └── BrowserContext
        ├── Page (Puppeteer wrapper)
        └── DOM Services (element extraction)
Development
Build
pnpm install
pnpm build
Type Check
pnpm type-check
Run Tests
pnpm test # Watch mode
pnpm test:run # Single run
pnpm test:coverage # With coverage report
See TESTING.md for comprehensive testing documentation.
Project Structure
automation-core/
├── agent/                  # AI agent logic
│   ├── actions/            # Browser actions (click, input, etc.)
│   ├── messages/           # LLM conversation management
│   └── prompts/            # System prompts for agents
├── browser/                # Browser control layer
│   └── dom/                # DOM extraction and manipulation
├── llm/                    # LLM factory and config
├── utils/                  # Utilities (logger, JSON repair, etc.)
├── test/                   # Test setup and utilities
├── types.ts                # Shared type definitions
├── index.ts                # Main entry point
└── automation-agent.ts     # High-level agent wrapper
License
Apache-2.0
