browserdd

v0.0.16

Browserdd

Production-grade AI agent framework for browser automation — inspired by browser-use

✨ Highlights

  • Smart DOM Extraction — TreeWalker + paint order filtering + bounding box deduplication (no arbitrary limits!)
  • Hierarchical Agents — Planner → BrowserNav architecture with automatic task decomposition
  • Loop Detection — Detects stuck states and provides recovery nudges
  • Vision Support — Screenshots sent to vision-capable models for better context
  • Navigation Awareness — Automatically navigates to correct pages before executing tasks
  • LLM Agnostic — OpenAI, Anthropic, Google, local models, or any OpenAI-compatible API

📦 Installation

npm install browserdd
# or
pnpm add browserdd

Quick Start

import { WebAgent } from 'browserdd';
import { chromium } from 'playwright';

// Launch browser
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com');

// Create agent
const agent = new WebAgent({
  llm: {
    provider: 'openai',
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
  },
  browser: { page },
});

// Execute task
const result = await agent.execute('Click the login button and fill email with "[email protected]"');

console.log(result.success);  // true
console.log(result.summary);  // "Completed in 3 steps"

Architecture

┌─────────────────────────────────────────────────────────────┐
│                          WebAgent                           │
│                       (Orchestrator)                        │
└──────────────────────────────────────┬──────────────────────┘
                                       │
          ┌────────────────────────────┼───────────────┐
          ▼                            ▼               ▼
   ┌─────────────┐              ┌─────────────┐ ┌─────────────┐
   │   Planner   │◄────────────►│ BrowserNav  │ │    Loop     │
   │    Agent    │  continuous  │    Agent    │ │  Detector   │
   └──────┬──────┘   feedback   └──────┬──────┘ └─────────────┘
          │                            │
          │         ┌──────────────────┤
          ▼         ▼                  ▼
   ┌─────────────┐ ┌─────────┐ ┌──────────┐
   │  TaskPlan   │ │   DOM   │ │  Action  │
   │ (subtasks)  │ │Distiller│ │ Executor │
   └─────────────┘ └─────────┘ └──────────┘

How It Works

  1. PlannerAgent receives user task → extracts values → generates specific subtasks with values
  2. BrowserNavigationAgent executes each subtask step-by-step
  3. After each subtask: Planner reviews progress, can skip completed subtasks or stop early
  4. On failure: Planner provides recovery strategy (retry, alternative, ask user)
  5. DOMDistiller extracts interactive elements with smart filtering
  6. LoopDetector monitors for stuck states and provides recovery hints
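The control flow in steps 1 to 4 can be sketched roughly as follows. This is a minimal illustration, not Browserdd's implementation: `Planner`, `NavAgent`, `runTask`, and the verdict and recovery values are hypothetical names, and the LLM and browser calls (asynchronous in practice) are stubbed as synchronous functions to keep the sketch short.

```typescript
// Hypothetical shapes for illustration; not the browserdd API.
type Subtask = { description: string };
type StepResult = { success: boolean; steps: number };
type Verdict = "continue" | "skip_next" | "stop";
type Recovery = "retry" | "alternative" | "ask_user";

interface Planner {
  plan(task: string): Subtask[];
  review(done: Subtask, remaining: Subtask[]): Verdict;
  recover(failed: Subtask): Recovery;
}

interface NavAgent {
  run(subtask: Subtask): StepResult;
}

function runTask(task: string, planner: Planner, nav: NavAgent) {
  const queue = [...planner.plan(task)];
  const completed: string[] = [];
  let totalSteps = 0;

  while (queue.length > 0) {
    const subtask = queue.shift() as Subtask;
    let result = nav.run(subtask);
    totalSteps += result.steps;

    if (!result.success) {
      // Ask the planner how to recover; give up if it wants user input.
      if (planner.recover(subtask) === "ask_user") {
        return { success: false, totalSteps, completed };
      }
      // "retry" and "alternative" are both modelled as one more attempt.
      result = nav.run(subtask);
      totalSteps += result.steps;
      if (!result.success) return { success: false, totalSteps, completed };
    }

    completed.push(subtask.description);
    // After each subtask the planner may skip the next one or stop early.
    const verdict = planner.review(subtask, queue);
    if (verdict === "stop") break;
    if (verdict === "skip_next") queue.shift();
  }
  return { success: true, totalSteps, completed };
}

// Usage with stubbed agents: the second subtask fails once, then succeeds.
let attempts = 0;
const demo = runTask(
  "log in",
  {
    plan: () => [{ description: "open form" }, { description: "fill email" }],
    review: () => "continue",
    recover: () => "retry",
  },
  {
    run: (s) =>
      s.description === "fill email" && attempts++ === 0
        ? { success: false, steps: 1 }
        : { success: true, steps: 1 },
  },
);
console.log(demo.success, demo.totalSteps); // true 3
```

Keeping the planner and the navigator behind small interfaces like these makes the loop easy to test with stubs before a real model is wired in.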

Smart DOM Extraction

Unlike simple approaches that cap the number of extracted elements, Browserdd uses techniques inspired by browser-use:

4-Phase Filtering Pipeline

| Phase | Technique | Purpose |
|-------|-----------|---------|
| 1 | TreeWalker | Prune hidden subtrees early |
| 2 | ClickableElementDetector | 11 signals to identify interactive elements |
| 3 | Paint Order Filtering | Multi-point occlusion via elementFromPoint() |
| 4 | Bounding Box Filtering | Remove children inside clickable parents |
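To make phase 4 concrete, here is a minimal sketch of bounding box filtering on plain objects. The `Candidate` record and `dedupeByBoundingBox` are hypothetical names used for illustration; a real extractor would presumably build the boxes from `getBoundingClientRect()`, and the library's actual rules may differ.

```typescript
// Hypothetical element record; not a real DOM node and not the browserdd API.
type Box = { x: number; y: number; width: number; height: number };
type Candidate = { id: string; parentId: string | null; clickable: boolean; box: Box };

// True when `inner` lies entirely within `outer`.
function contains(outer: Box, inner: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// Drop every candidate that has a clickable ancestor whose box fully contains it.
function dedupeByBoundingBox(candidates: Candidate[]): Candidate[] {
  const byId = new Map(candidates.map((c) => [c.id, c] as const));
  return candidates.filter((c) => {
    let parentId = c.parentId;
    while (parentId !== null) {
      const ancestor = byId.get(parentId);
      if (!ancestor) break;
      if (ancestor.clickable && contains(ancestor.box, c.box)) return false;
      parentId = ancestor.parentId;
    }
    return true;
  });
}

const kept = dedupeByBoundingBox([
  { id: "button", parentId: null, clickable: true, box: { x: 0, y: 0, width: 100, height: 40 } },
  { id: "icon", parentId: "button", clickable: true, box: { x: 8, y: 8, width: 24, height: 24 } },
  { id: "menu", parentId: "button", clickable: true, box: { x: 0, y: 40, width: 100, height: 80 } },
]);
console.log(kept.map((c) => c.id).join(", ")); // button, menu
```

The icon is dropped because clicking it is equivalent to clicking its parent button, while the dropdown menu extends outside the button and is kept as a separate target.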

Clickable Detection Signals

// 11 signals to detect interactive elements
- Native interactive tags (button, a, input, select, textarea, etc.)
- Role attributes (role="button", role="link", etc.)
- Cursor style (pointer, text)
- Event handlers (onclick, @click, v-on:click, ng-click)
- ContentEditable elements
- Tab-focusable elements (tabindex)
- Data attributes (data-action, data-toggle, etc.)
- Search-related patterns (searchbox, magnify, etc.)
- Label wrapper detection (for frameworks like Ant Design)
- Icon-sized elements with interactive hints
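As a rough illustration of how such signals could be combined, the sketch below checks a subset of them against a simplified element description. `ElementInfo` and `clickableSignals` are hypothetical names, the attribute lists are abbreviated, and treating only non-negative `tabindex` values as tab-focusable is an assumption; the real detector works on live DOM nodes and covers more cases.

```typescript
// Hypothetical, simplified element description (not a real DOM node).
type ElementInfo = {
  tag: string;
  attributes: Record<string, string>;
  cursor?: string; // computed CSS cursor value
  contentEditable?: boolean;
};

const INTERACTIVE_TAGS = new Set(["button", "a", "input", "select", "textarea"]);
const INTERACTIVE_ROLES = new Set(["button", "link"]);
const CLICK_ATTRIBUTES = ["onclick", "@click", "v-on:click", "ng-click"];
const DATA_ATTRIBUTES = ["data-action", "data-toggle"];

// Returns the names of the signals that fired; an empty list means "not interactive".
function clickableSignals(el: ElementInfo): string[] {
  const signals: string[] = [];
  const attrs = el.attributes;
  const role = attrs["role"];

  if (INTERACTIVE_TAGS.has(el.tag.toLowerCase())) signals.push("native-tag");
  if (role !== undefined && INTERACTIVE_ROLES.has(role)) signals.push("role");
  if (el.cursor === "pointer" || el.cursor === "text") signals.push("cursor");
  if (CLICK_ATTRIBUTES.some((attr) => attr in attrs)) signals.push("event-handler");
  if (el.contentEditable) signals.push("contenteditable");
  // Assumption: only tabindex >= 0 counts as tab-focusable.
  if (Number(attrs["tabindex"]) >= 0) signals.push("tabindex");
  if (DATA_ATTRIBUTES.some((attr) => attr in attrs)) signals.push("data-attribute");
  return signals;
}

const fakeButton = { tag: "div", attributes: { role: "button", tabindex: "0" }, cursor: "pointer" };
console.log(clickableSignals(fakeButton).join(", ")); // role, cursor, tabindex
```

Returning the list of matched signals rather than a boolean makes it easier to debug why an element was or was not treated as interactive.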

Distillation Modes

| Mode | Token Reduction | Use Case |
|------|----------------|----------|
| TEXT_ONLY | ~95% | Reading content, extracting data |
| INPUT_FIELDS | ~90% | Form filling, data entry |
| ALL_FIELDS | ~80% | Complex navigation, clicking |
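One way to picture the modes is as filters over the extracted elements, where a stricter filter yields a shorter prompt. The sketch below is an assumption about the general idea only: the mode names come from the table, but `PageElement`, `KEPT_KINDS`, `distill`, and the mapping from mode to element kinds are hypothetical.

```typescript
type DistillationMode = "TEXT_ONLY" | "INPUT_FIELDS" | "ALL_FIELDS";
type ElementKind = "text" | "input" | "button" | "link";
type PageElement = { kind: ElementKind; label: string };

// Which element kinds each mode keeps. This mapping is an assumption.
const KEPT_KINDS: Record<DistillationMode, ElementKind[]> = {
  TEXT_ONLY: ["text"],
  INPUT_FIELDS: ["input"],
  ALL_FIELDS: ["input", "button", "link"],
};

// Serialize the kept elements into a compact, indexed listing for the LLM.
function distill(elements: PageElement[], mode: DistillationMode): string {
  return elements
    .filter((el) => KEPT_KINDS[mode].includes(el.kind))
    .map((el, i) => `[${i}] <${el.kind}> ${el.label}`)
    .join("\n");
}

const samplePage: PageElement[] = [
  { kind: "text", label: "Welcome back" },
  { kind: "input", label: "Email" },
  { kind: "button", label: "Log in" },
  { kind: "link", label: "Forgot password?" },
];
console.log(distill(samplePage, "INPUT_FIELDS")); // [0] <input> Email
```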


Navigation Awareness

The agent automatically handles page navigation:

// User is on /chat page, but task requires /feed page
const result = await agent.execute('Post a new message saying "Hello World"');

// Agent automatically:
// 1. Detects current page doesn't have post input
// 2. Finds navigation link to correct page
// 3. Navigates first, then performs the task

Loop Detection & Recovery

Built-in detection for common stuck patterns:

  • Action Repetition — Same action repeated 3+ times
  • Page Oscillation — Toggling between 2 pages
  • Stagnant State — No DOM changes after actions

When detected, the agent receives a "nudge" to try alternative approaches.
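The three patterns can be checked over a short history of steps. The following is a minimal sketch under stated assumptions: `Step`, `detectLoop`, and `NUDGES` are hypothetical names, the DOM state is represented by an opaque hash string, and the exact thresholds and nudge wording used by the library are not documented here.

```typescript
// Hypothetical record of one agent step; field names are illustrative.
type Step = { action: string; url: string; domHash: string };
type StuckPattern = "action_repetition" | "page_oscillation" | "stagnant_state";

function detectLoop(steps: Step[], threshold = 3): StuckPattern | null {
  const recent = steps.slice(-threshold);
  if (recent.length === threshold) {
    // Action repetition: the same action `threshold` times in a row.
    if (new Set(recent.map((s) => s.action)).size === 1) return "action_repetition";
    // Stagnant state: the DOM hash did not change across those steps.
    if (new Set(recent.map((s) => s.domHash)).size === 1) return "stagnant_state";
  }
  // Page oscillation: the last four URLs alternate A, B, A, B.
  const [a, b, c, d] = steps.slice(-4).map((s) => s.url);
  if (d !== undefined && a === c && b === d && a !== b) return "page_oscillation";
  return null;
}

// Example nudge texts that could be appended to the next prompt.
const NUDGES: Record<StuckPattern, string> = {
  action_repetition: "The same action was repeated. Try a different element or approach.",
  page_oscillation: "You are switching between two pages. Decide which page holds the target.",
  stagnant_state: "Recent actions did not change the page. Try scrolling or another control.",
};

const repeated: Step[] = [
  { action: "click #next", url: "/feed", domHash: "a1" },
  { action: "click #next", url: "/feed", domHash: "a1" },
  { action: "click #next", url: "/feed", domHash: "a1" },
];
const stuck = detectLoop(repeated);
console.log(stuck, stuck ? NUDGES[stuck] : "no loop detected");
```

Checks run in a fixed order, so a history that matches several patterns reports only the first one.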


Configuration

const agent = new WebAgent({
  // LLM Configuration
  llm: {
    provider: 'openai',
    model: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY,
    baseUrl: 'https://api.openai.com/v1',  // or custom endpoint
    temperature: 0.7,
    maxTokens: 4096,
  },

  // Browser (Playwright page)
  browser: {
    page: playwrightPage,
  },

  // Execution limits
  maxStepsPerSubtask: 8,
  maxSubtasksPerTask: 20,

  // Features
  screenshotOnAction: true,  // Send screenshots to vision models
  debug: false,

  // Custom prompts (optional)
  prompts: {
    planner: 'Your custom planner prompt...',
    browserNav: 'Your custom browser nav prompt...',
    siteContext: 'Site-specific rules and context...',  // See Site Context below
  },
});

Site Context

The library is designed to be site-agnostic. All site-specific rules should be provided via siteContext:

const siteContext = `
## Site Rules

### Page Detection
- /feed → Post creation page
- /chat → Messaging page
- /profile → Profile editing page

### Element Hints
- Post button: Look for "Post" or submit button after textarea
- Send button: Paper-plane icon or "Send" text
- Save button: "Save" or "Update" after form fields

### Success Indicators
- Toast/snackbar with "success", "saved", "posted"
- Form closes after submission
- New content appears in feed/chat
`;

const agent = new WebAgent({
  llm: { ... },
  prompts: {
    siteContext,  // Appended to Planner and Navigator prompts
  },
});

What to Include in Site Context

| Category | Examples |
|----------|----------|
| Page Structure | URL patterns, page purposes |
| Element Hints | How to identify key buttons/inputs |
| Success Indicators | Toast patterns, success messages |
| Error Recovery | What to do when actions fail |
| Localization | Language-specific patterns |
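If the rules for a site already live in structured form, the context string can be generated rather than hand-written. This is an optional convenience sketch: `SiteRules`, `section`, and `buildSiteContext` are hypothetical helpers, not part of the library, and the only thing handed to the agent is the resulting string.

```typescript
// Hypothetical structured form of the rules.
type SiteRules = {
  pages: Record<string, string>; // URL path -> page purpose
  elementHints: Record<string, string>; // element name -> how to find it
  successIndicators: string[];
};

// Render one "### Title" section followed by a bullet per item and a blank line.
function section(title: string, items: string[]): string[] {
  return [`### ${title}`, ...items.map((item) => `- ${item}`), ""];
}

function buildSiteContext(rules: SiteRules): string {
  const pageLines = Object.entries(rules.pages).map(([path, purpose]) => `${path} → ${purpose}`);
  const hintLines = Object.entries(rules.elementHints).map(([label, hint]) => `${label}: ${hint}`);
  const lines = [
    "## Site Rules",
    "",
    ...section("Page Detection", pageLines),
    ...section("Element Hints", hintLines),
    ...section("Success Indicators", rules.successIndicators),
  ];
  return lines.join("\n").trim();
}

const generatedContext = buildSiteContext({
  pages: { "/feed": "Post creation page", "/chat": "Messaging page" },
  elementHints: { "Post button": 'Look for "Post" or submit button after textarea' },
  successIndicators: ['Toast/snackbar with "success", "saved", "posted"'],
});
console.log(generatedContext);
```

The generated string can then be passed as `prompts.siteContext` in the same way as the hand-written template above.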


Events

agent.on('task:start', ({ taskId, task }) => {
  console.log(`Starting: ${task}`);
});

agent.on('task:plan', ({ taskId, plan }) => {
  console.log(`Plan: ${plan.subtasks.length} subtasks`);
});

agent.on('subtask:start', ({ subtask }) => {
  console.log(`Subtask: ${subtask.description}`);
});

agent.on('subtask:complete', ({ result }) => {
  console.log(`Result: ${result.success ? '✓' : '✗'}`);
});

agent.on('task:complete', ({ result }) => {
  console.log(`Done! Steps: ${result.totalSteps}`);
});

Testing

# Unit tests
npm test

# Integration tests (requires OpenAI-compatible API)
export WEB_AGENT_OPENAI_BASE_URL="https://api.openai.com/v1"
export WEB_AGENT_OPENAI_API_KEY="sk-..."
npm run test:run

Examples

Form Filling

await agent.execute('Fill the signup form with email "[email protected]" and password "Secret123"');

E-commerce

await agent.execute('Search for "wireless headphones", sort by price low to high, add first result to cart');

Social Media

await agent.execute('Go to profile, change my display name to "John Doe", and save');

Data Extraction

const context = await agent.getContext('text_only');
// Use with your own LLM for extraction

License

MIT © Hert4


Acknowledgments