browserdd
v0.0.16
Production-grade AI agent framework for browser automation — inspired by browser-use
✨ Highlights
- Smart DOM Extraction — TreeWalker + paint order filtering + bounding box deduplication (no arbitrary limits!)
- Hierarchical Agents — Planner → BrowserNav architecture with automatic task decomposition
- Loop Detection — Detects stuck states and provides recovery nudges
- Vision Support — Screenshots sent to vision-capable models for better context
- Navigation Awareness — Automatically navigates to correct pages before executing tasks
- LLM Agnostic — OpenAI, Anthropic, Google, local models, or any OpenAI-compatible API
📦 Installation
npm install browserdd
# or
pnpm add browserdd
Quick Start
import { WebAgent } from 'browserdd';
import { chromium } from 'playwright';
// Launch browser
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com');
// Create agent
const agent = new WebAgent({
  llm: {
    provider: 'openai',
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
  },
  browser: { page },
});
// Execute task
const result = await agent.execute('Click the login button and fill email with "[email protected]"');
console.log(result.success); // true
console.log(result.summary); // "Completed in 3 steps"
Architecture
┌───────────────────────────────────────────────────────────────────────┐
│                               WebAgent                                │
│                            (Orchestrator)                             │
└───────────────────────────────────┬───────────────────────────────────┘
                                    │
       ┌────────────────────────────┼────────────────────────────┐
       ▼                            ▼                            ▼
┌─────────────┐              ┌─────────────┐              ┌─────────────┐
│   Planner   │◄────────────►│ BrowserNav  │              │    Loop     │
│    Agent    │  continuous  │    Agent    │              │  Detector   │
└──────┬──────┘   feedback   └──────┬──────┘              └─────────────┘
       │                            │
       │              ┌─────────────┤
       ▼              ▼             ▼
┌─────────────┐  ┌─────────┐   ┌──────────┐
│  TaskPlan   │  │   DOM   │   │  Action  │
│ (subtasks)  │  │Distiller│   │ Executor │
└─────────────┘  └─────────┘   └──────────┘
How It Works
- PlannerAgent receives user task → extracts values → generates specific subtasks with values
- BrowserNavigationAgent executes each subtask step-by-step
- After each subtask: Planner reviews progress, can skip completed subtasks or stop early
- On failure: Planner provides recovery strategy (retry, alternative, ask user)
- DOMDistiller extracts interactive elements with smart filtering
- LoopDetector monitors for stuck states and provides recovery hints
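The flow above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not browserdd's actual implementation: the `Planner` and `SubtaskRunner` interfaces, the `runTask` function, and the `Verdict` values are hypothetical names chosen for illustration. Only the `maxSubtasksPerTask` limit mirrors an option from the Configuration section. Recovery strategies on failure (retry, alternative, ask user) are left out to keep it short.

```typescript
// Hypothetical shapes for illustration only; browserdd's internal types may differ.
interface Subtask { description: string }
interface SubtaskResult { success: boolean; steps: number }
type Verdict = 'continue' | 'skip-next' | 'stop';

interface Planner {
  // Turns the user task into concrete subtasks.
  plan(task: string): Promise<Subtask[]>;
  // Reviews progress after each subtask.
  review(done: Subtask, result: SubtaskResult): Promise<Verdict>;
}

interface SubtaskRunner {
  // Executes one subtask step by step in the browser.
  run(subtask: Subtask): Promise<SubtaskResult>;
}

async function runTask(
  task: string,
  planner: Planner,
  runner: SubtaskRunner,
  maxSubtasksPerTask = 20,
): Promise<{ success: boolean; totalSteps: number }> {
  const subtasks = (await planner.plan(task)).slice(0, maxSubtasksPerTask);
  let totalSteps = 0;
  let skipNext = false;

  for (const subtask of subtasks) {
    if (skipNext) {
      // The planner judged this subtask already completed.
      skipNext = false;
      continue;
    }
    const result = await runner.run(subtask);
    totalSteps += result.steps;
    if (!result.success) {
      // A fuller version would ask the planner for a recovery strategy here.
      return { success: false, totalSteps };
    }
    const verdict = await planner.review(subtask, result);
    if (verdict === 'stop') break; // task finished early
    if (verdict === 'skip-next') skipNext = true;
  }
  return { success: true, totalSteps };
}
```

Keeping the planner and the runner behind narrow interfaces like these makes the loop easy to exercise with stubs, without a browser or an LLM.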
Smart DOM Extraction
Unlike simple approaches that cap the number of elements, we use techniques inspired by browser-use:
4-Phase Filtering Pipeline
| Phase | Technique | Purpose |
|-------|-----------|---------|
| 1 | TreeWalker | Prune hidden subtrees early |
| 2 | ClickableElementDetector | 11 signals to identify interactive elements |
| 3 | Paint Order Filtering | Multi-point occlusion via elementFromPoint() |
| 4 | Bounding Box Filtering | Remove children inside clickable parents |
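Phase 4 can be illustrated with plain data. The sketch below assumes each candidate has already been measured (for example via `getBoundingClientRect()`) and flattened into a record; the `Candidate` shape and the `dropContainedChildren` function are hypothetical, and the library's real filter may apply additional rules.

```typescript
// Hypothetical record for one extracted element; not browserdd's actual type.
interface Box { x: number; y: number; width: number; height: number }
interface Candidate {
  id: string;
  parentId: string | null; // id of the nearest extracted ancestor, if any
  clickable: boolean;
  box: Box;
}

// True when `inner` lies entirely within `outer`.
function boxContains(outer: Box, inner: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// Drops every candidate that sits fully inside a clickable ancestor,
// so the model sees one target per control instead of its inner parts.
function dropContainedChildren(candidates: Candidate[]): Candidate[] {
  const byId = new Map<string, Candidate>();
  for (const c of candidates) byId.set(c.id, c);

  return candidates.filter((c) => {
    let parentId = c.parentId;
    while (parentId !== null) {
      const parent = byId.get(parentId);
      if (!parent) break;
      if (parent.clickable && boxContains(parent.box, c.box)) return false;
      parentId = parent.parentId;
    }
    return true;
  });
}
```

For a button that wraps an icon, only the button survives, while an unrelated link elsewhere on the page is kept.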
Clickable Detection Signals
// 11 signals to detect interactive elements
- Native interactive tags (button, a, input, select, textarea, etc.)
- Role attributes (role="button", role="link", etc.)
- Cursor style (pointer, text)
- Event handlers (onclick, @click, v-on:click, ng-click)
- ContentEditable elements
- Tab-focusable elements (tabindex)
- Data attributes (data-action, data-toggle, etc.)
- Search-related patterns (searchbox, magnify, etc.)
- Label wrapper detection (for frameworks like Ant Design)
- Icon-sized elements with interactive hints
Distillation Modes
| Mode | Token Reduction | Use Case |
|------|----------------|----------|
| TEXT_ONLY | ~95% | Reading content, extracting data |
| INPUT_FIELDS | ~90% | Form filling, data entry |
| ALL_FIELDS | ~80% | Complex navigation, clicking |
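To make the reduction figures concrete, the sketch below turns the approximate percentages from the table into a rough token estimate. The `estimateDistilledTokens` helper and the 10,000-token page are illustrative assumptions, not part of the library; actual savings depend on the page.

```typescript
type DistillationMode = 'TEXT_ONLY' | 'INPUT_FIELDS' | 'ALL_FIELDS';

// Approximate reductions taken from the table above.
const APPROX_REDUCTION_PERCENT: Record<DistillationMode, number> = {
  TEXT_ONLY: 95,
  INPUT_FIELDS: 90,
  ALL_FIELDS: 80,
};

// Rough number of tokens left after distillation (hypothetical helper).
function estimateDistilledTokens(rawTokens: number, mode: DistillationMode): number {
  return Math.round((rawTokens * (100 - APPROX_REDUCTION_PERCENT[mode])) / 100);
}

// For a page whose raw DOM would cost about 10,000 tokens:
console.log(estimateDistilledTokens(10_000, 'TEXT_ONLY'));    // 500
console.log(estimateDistilledTokens(10_000, 'INPUT_FIELDS')); // 1000
console.log(estimateDistilledTokens(10_000, 'ALL_FIELDS'));   // 2000
```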
Navigation Awareness
The agent automatically handles page navigation:
// User is on /chat page, but task requires /feed page
const result = await agent.execute('Post a new message saying "Hello World"');
// Agent automatically:
// 1. Detects current page doesn't have post input
// 2. Finds navigation link to correct page
// 3. Navigates first, then performs the task
Loop Detection & Recovery
Built-in detection for common stuck patterns:
- Action Repetition — Same action repeated 3+ times
- Page Oscillation — Toggling between 2 pages
- Stagnant State — No DOM changes after actions
When detected, the agent receives a "nudge" to try alternative approaches.
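The three patterns can be checked against a short action history. The following is a minimal sketch, assuming each step is recorded with a serialized action, the page URL, and some fingerprint of the distilled DOM taken after the action. The `StepRecord` shape, the `detectStuck` function, and the thresholds for oscillation and stagnation are assumptions; the only threshold stated above is three repetitions of the same action.

```typescript
// Hypothetical per-step record; browserdd's internal bookkeeping may differ.
interface StepRecord {
  action: string;  // e.g. 'click#submit'
  url: string;     // page URL after the action
  domHash: string; // any fingerprint of the DOM after the action
}

type StuckPattern = 'action-repetition' | 'page-oscillation' | 'stagnant-state';

function allEqual(values: string[]): boolean {
  return values.every((v) => v === values[0]);
}

function detectStuck(steps: readonly StepRecord[]): StuckPattern | null {
  const lastThree = steps.slice(-3);

  // Same action repeated 3+ times.
  if (lastThree.length === 3 && allEqual(lastThree.map((s) => s.action))) {
    return 'action-repetition';
  }

  // Toggling between two pages: A, B, A, B (assumed window of four steps).
  const urls = steps.slice(-4).map((s) => s.url);
  if (
    urls.length === 4 &&
    urls[0] !== urls[1] &&
    urls[0] === urls[2] &&
    urls[1] === urls[3]
  ) {
    return 'page-oscillation';
  }

  // DOM fingerprint unchanged across the last three steps (assumed threshold).
  if (lastThree.length === 3 && allEqual(lastThree.map((s) => s.domHash))) {
    return 'stagnant-state';
  }

  return null;
}
```

A caller would run `detectStuck` after every step and, on a non-null result, add a hint to the next prompt asking the model to try a different element or approach.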
Configuration
const agent = new WebAgent({
  // LLM Configuration
  llm: {
    provider: 'openai',
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: 'https://api.openai.com/v1', // or custom endpoint
    temperature: 0.7,
    maxTokens: 4096,
  },
  // Browser (Playwright page)
  browser: {
    page: playwrightPage,
  },
  // Execution limits
  maxStepsPerSubtask: 8,
  maxSubtasksPerTask: 20,
  // Features
  screenshotOnAction: true, // Send screenshots to vision models
  debug: false,
  // Custom prompts (optional)
  prompts: {
    planner: 'Your custom planner prompt...',
    browserNav: 'Your custom browser nav prompt...',
    siteContext: 'Site-specific rules and context...', // See Site Context below
  },
});
Site Context
The library is designed to be site-agnostic. All site-specific rules should be provided via siteContext:
const siteContext = `
## Site Rules
### Page Detection
- /feed → Post creation page
- /chat → Messaging page
- /profile → Profile editing page
### Element Hints
- Post button: Look for "Post" or submit button after textarea
- Send button: Paper-plane icon or "Send" text
- Save button: "Save" or "Update" after form fields
### Success Indicators
- Toast/snackbar with "success", "saved", "posted"
- Form closes after submission
- New content appears in feed/chat
`;
const agent = new WebAgent({
  llm: { ... },
  prompts: {
    siteContext, // Appended to Planner and Navigator prompts
  },
});
What to Include in Site Context
| Category | Examples |
|----------|----------|
| Page Structure | URL patterns, page purposes |
| Element Hints | How to identify key buttons/inputs |
| Success Indicators | Toast patterns, success messages |
| Error Recovery | What to do when actions fail |
| Localization | Language-specific patterns |
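When site rules live in structured configuration, the `siteContext` string can be generated instead of hand-written. The sketch below is one possible approach: the `SiteRules` shape and the `buildSiteContext` helper are hypothetical and not part of browserdd, and the output simply follows the layout of the example above.

```typescript
// Hypothetical structured form of the site rules.
interface SiteRules {
  pages: Record<string, string>;        // URL pattern -> page purpose
  elementHints: Record<string, string>; // element name -> how to find it
  successIndicators: string[];
}

// Renders the rules as a markdown-style string suitable for prompts.siteContext.
function buildSiteContext(rules: SiteRules): string {
  const bullets = (entries: [string, string][], separator: string): string[] =>
    entries.map(([key, value]) => `- ${key}${separator}${value}`);

  return [
    '## Site Rules',
    '### Page Detection',
    ...bullets(Object.entries(rules.pages), ' → '),
    '### Element Hints',
    ...bullets(Object.entries(rules.elementHints), ': '),
    '### Success Indicators',
    ...rules.successIndicators.map((s) => `- ${s}`),
  ].join('\n');
}

const siteContextFromRules = buildSiteContext({
  pages: { '/feed': 'Post creation page', '/chat': 'Messaging page' },
  elementHints: { 'Send button': 'Paper-plane icon or "Send" text' },
  successIndicators: ['Form closes after submission'],
});
```

The resulting string can then be passed as `prompts.siteContext` when constructing the agent.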
---
## Events
```typescript
agent.on('task:start', ({ taskId, task }) => {
  console.log(`Starting: ${task}`);
});
agent.on('task:plan', ({ taskId, plan }) => {
  console.log(`Plan: ${plan.subtasks.length} subtasks`);
});
agent.on('subtask:start', ({ subtask }) => {
  console.log(`Subtask: ${subtask.description}`);
});
agent.on('subtask:complete', ({ result }) => {
  console.log(`Result: ${result.success ? '✓' : '✗'}`);
});
agent.on('task:complete', ({ result }) => {
  console.log(`Done! Steps: ${result.totalSteps}`);
});
```
Testing
# Unit tests
npm test
# Integration tests (requires OpenAI-compatible API)
export WEB_AGENT_OPENAI_BASE_URL="https://api.openai.com/v1"
export WEB_AGENT_OPENAI_API_KEY="sk-..."
npm run test:run
Examples
Form Filling
await agent.execute('Fill the signup form with email "[email protected]" and password "Secret123"');
E-commerce
await agent.execute('Search for "wireless headphones", sort by price low to high, add first result to cart');
Social Media
await agent.execute('Go to profile, change my display name to "John Doe", and save');
Data Extraction
const context = await agent.getContext('text_only');
// Use with your own LLM for extraction
License
MIT © Hert4
