npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

browser-agentkit

v0.0.5

Published

A powerful SDK for Chrome extensions to interact with browsers and web pages

Readme

Browser AgentKit SDK

A powerful SDK for Chrome extensions to interact with browsers and web pages. Simplifies browser automation, web scraping, and page interaction tasks.

Features

  • 🎯 CDP-Powered — Direct Chrome DevTools Protocol integration for fine-grained browser control
  • 🔒 Stable & Reliable — CDP provides deterministic, low-level operations without DOM injection
  • 🚀 Easy to Use — Simple, intuitive API design
  • 🌐 Comprehensive — Browser control, page interactions, content extraction
  • 🎨 Flexible — Support for keyboard, mouse, and complex interactions
  • 📦 Lightweight — Minimal dependencies, no external browser binaries required

Why Browser AgentKit?

Built for Chrome Extensions

Unlike traditional browser automation tools (Puppeteer, Playwright, Selenium), Browser AgentKit is specifically designed for Chrome extension development:

| Feature | Browser AgentKit | Puppeteer/Playwright | |---------|------------------|---------------------| | Runtime | Inside Chrome extension | External Node.js process | | Browser Instance | Uses user's existing browser | Launches separate browser | | User Session | Access to logged-in sessions & cookies | Isolated browser context | | Installation | npm package only | Requires browser binary download | | Use Case | Extension-based automation | Testing & scraping servers |

CDP: The Foundation of Reliability

Browser AgentKit leverages the Chrome DevTools Protocol (CDP) directly, the same protocol that powers Chrome DevTools, Puppeteer, and Playwright internally:

  • Fine-grained Control — Direct access to browser internals: DOM, network, input, runtime
  • No Script Injection — Input simulation happens at the browser level, not via JavaScript injection
  • Deterministic Operations — CDP commands are executed synchronously by the browser engine
  • Anti-Detection Friendly — Native browser events are indistinguishable from real user actions
  • Full Debugging Capabilities — Access to the same powerful APIs used by Chrome DevTools

When to Use Browser AgentKit

Use Browser AgentKit when you need:

  • Build Chrome extensions with automation capabilities
  • Access user's authenticated sessions (no re-login required)
  • Operate within user's existing browser environment
  • Lightweight SDK without bundled browser binaries

Use Puppeteer/Playwright instead when you need:

  • Server-side web scraping or testing
  • Headless browser automation in CI/CD pipelines
  • Cross-browser testing (Firefox, Safari, etc.)
  • Isolated browser contexts for parallel execution

Architecture

Browser AgentKit is organized around a single unified CDP channel:

                   ┌─────────────────┐
                   │     Browser     │  tab lifecycle (chrome.tabs API)
                   └─────────────────┘
                            │
                   ┌─────────────────┐
                   │      Page       │  page-level orchestrator
                   └────────┬────────┘
                            │ owns
              ┌─────────────┼─────────────┐
              │             │             │
              ▼             ▼             ▼
        ┌──────────┐ ┌──────────┐ ┌─────────────────┐
        │ Keyboard │ │  Mouse   │ │ ContentExtractor│
        └────┬─────┘ └────┬─────┘ └────────┬────────┘
             │            │                │
             └────────────┼────────────────┘
                          ▼
                  ┌───────────────┐
                  │  CDPSession   │  the ONLY channel to chrome.debugger
                  └───────┬───────┘
                          ▼
                  ┌───────────────┐
                  │ chrome.* APIs │
                  └───────────────┘

CDPSession is the single, central abstraction for talking to the browser. Every module — Keyboard, Mouse, Element, ContentExtractor, Navigation, Screenshot — sends commands, subscribes to events, and runs scripts through it. There is exactly one path from this library to chrome.debugger.*, which makes the surface predictable and testable.

Installation

npm install browser-agentkit

Quick Start

import { browser, Page } from 'browser-agentkit';

// Open a new tab
const tab = await browser.openTab('https://example.com');

// Create a page instance (Page is the main entry point for page-level operations)
const page = new Page(tab.id);
await page.initialize();

// Interact with the page
await page.fill('#search', 'browser automation');
await page.click('#submit');

// Extract content
const extractor = await page.getContentExtractor();
const html = await extractor.getHTML();
const metadata = await extractor.getMetadata();

console.log('Page title:', metadata.title);

// Clean up
await page.close();
await browser.closeTab(tab.id);

Core Modules

Browser

Manage browser tabs and windows. Operates purely at the chrome.tabs / chrome.windows / chrome.history level — no debugger attachment.

import { browser } from 'browser-agentkit';

// Open and manage tabs
const tab = await browser.openTab('https://example.com');
const currentTab = await browser.getCurrentTab();
const allTabs = await browser.queryTabs({ currentWindow: true });

// Tab lifecycle (fire-and-forget — for navigation with network-idle waiting, use Page.navigate)
await browser.navigate(tab.id, 'https://google.com');
await browser.goBack(tab.id);
await browser.reload(tab.id);

// Hidden tab for background work
const hidden = await browser.createHiddenTab('https://example.com');

// Lightweight screenshot (uses chrome.tabs.captureVisibleTab — no debugger needed)
const screenshot = await browser.captureVisibleTab(tab.id);

Page

The main entry point for page-level operations. On first use it lazily attaches the debugger and wires up Keyboard, Mouse, Actions, and access to ContentExtractor.

import { Page, ScrollDirection } from 'browser-agentkit';

const page = new Page(tabId);
await page.initialize();   // attaches debugger, enables CDP domains

// Click elements (CSS selector or AX-node selector `node="123"`)
await page.click('#button');
await page.click('node="42"');

// Fill forms
await page.fill('#email', '[email protected]');
await page.fill('#password', 'secret');
await page.fill('#bio', longText, /* usePaste */ true);  // paste via system clipboard

// Navigate with network-idle + FCP wait
await page.navigate('https://example.com');
await page.waitForNavigation({ timeout: 30_000 });

// Wait for elements
await page.waitForSelector('#result');

// Scroll
await page.scroll(ScrollDirection.DOWN);

// Element handles
const el = page.element('#title');
const text = await el.getText();
const box = await el.getBoundingBox();

// Low-level input
const keyboard = await page.getKeyboard();
await keyboard.press('Enter');

const mouse = await page.getMouse();
await mouse.click(100, 200);

// In-page script execution
const title = await page.evaluate(() => document.title);

// Debugger-based screenshot (when you need full pixel control)
const screenshot = await page.captureVisible({ format: 'png' });

// Clean up — detaches the debugger
await page.close();

Element

Represents a single element. Combines selector resolution and per-element operations. Created via page.element(selector).

const el = page.element('#submit');

await el.waitForExist({ timeout: 10_000 });
const exists = await el.exists();
const box = await el.getBoundingBox();
const text = await el.getText();
const href = await el.getAttributeValue('href');
await el.scrollIntoView();

Both CSS selectors and AX-node selectors (node="123") work. The AX-node form is what ContentExtractor.getHTML() embeds on every interactive element, so you can round-trip from extracted HTML to clickable element with no extra parsing.

Content Extraction

Convert a page into a simplified, LLM-friendly HTML representation built from the accessibility tree plus cursor: pointer detection.

const extractor = await page.getContentExtractor();

// Full simplified HTML — every interactive element carries a `node="..."` attribute
const html = await extractor.getHTML();

// Viewport-only mode skips elements outside the current viewport
const visibleHtml = await extractor.getHTML({ viewportOnly: true });

// Plain text (tags stripped)
const text = await extractor.getText();

// Open Graph / description metadata extracted from the snapshot
const meta = await extractor.getMetadata();

// One-off element extraction via selector
const result = await extractor.getElementContent('#article');

// PDF detection
const isPdf = await extractor.isPDF();

Input Simulation

Low-level keyboard and mouse. Normally obtained from page.getKeyboard() / page.getMouse(), but can be constructed manually against any CDPSession.

import { createKeyboard, createMouse, createCDPSession } from 'browser-agentkit';

const session = await createCDPSession(tabId);
const keyboard = await createKeyboard(session);

await keyboard.type('Hello World');
await keyboard.press('Enter');
await keyboard.press('ControlOrMeta+KeyA');  // Ctrl+A on Win/Linux, Cmd+A on Mac

const mouse = createMouse(session, keyboard);
await mouse.move(100, 200);
await mouse.click(100, 200);
await mouse.dblclick(150, 250);
await mouse.wheel(0, 400);  // scroll

CDP Session

The single channel through which all CDP traffic flows. You rarely use it directly — Page creates and owns one — but it is the right escape hatch when you need a CDP command this library doesn't wrap.

import { createCDPSession } from 'browser-agentkit';

const session = await createCDPSession(tabId);

// Send any CDP command
const { root } = await session.send('DOM.getDocument', { depth: -1 });

// Subscribe to CDP events (auto-filtered by tabId)
const off = session.onEvent('Network.requestWillBeSent', (method, params) => {
  console.log('request:', params);
});
off();  // unsubscribe

// Execute a function in the page's main world (Runtime.evaluate)
const title = await session.evaluate(() => document.title);

// Execute a function in an isolated world (chrome.scripting.executeScript)
const result = await session.executeScript((id: string) => {
  return document.getElementById(id)?.innerHTML;
}, ['main']);

// Attach to a cross-origin iframe and get a separate session for it
const iframeSession = await session.attachIframe(frameId);
await iframeSession.detach();

await session.detach();

Event Management

Wraps chrome.tabs.* and chrome.webRequest.* events with a shared cleanup story.

import { events } from 'browser-agentkit';

events.onTabCreated((tab) => {
  console.log('New tab created:', tab.url);
});

events.onTabUpdated((tabId, changeInfo, tab) => {
  if (changeInfo.status === 'complete') {
    console.log('Tab loaded:', tab.url);
  }
});

events.onBeforeRequest((details) => {
  console.log('Request:', details.url);
});

events.onCompleted((details) => {
  console.log('Response:', details.url);
});

// Remove all listeners registered through this manager
events.cleanup();

Advanced Examples

Form Automation

import { browser, Page } from 'browser-agentkit';

async function fillLoginForm() {
  const tab = await browser.openTab('https://example.com/login');
  const page = new Page(tab.id);
  await page.initialize();

  await page.fill('#username', 'myuser');
  await page.fill('#password', 'mypass');
  await page.click('#login-button');

  await page.waitForNavigation();
  console.log('Login successful!');

  await page.close();
}

Web Scraping with AX Tree

The node="..." attributes that ContentExtractor embeds on interactive elements are valid selectors for page.click and page.fill. This lets an LLM read the extracted HTML and act on it without ever computing a CSS selector.

import { browser, Page } from 'browser-agentkit';

async function scrapeAndClick() {
  const tab = await browser.openTab('https://example.com/data');
  const page = new Page(tab.id);
  await page.initialize();

  const extractor = await page.getContentExtractor();
  const html = await extractor.getHTML({ viewportOnly: true });

  // ... pass `html` to an LLM, get back a node id ...
  await page.click('node="42"');

  await page.close();
  await browser.closeTab(tab.id);
}

Drag and Drop

Mouse drag is HTML5-aware: moving the mouse while a button is down triggers dragstart detection and the rest of the drag lifecycle is handled by CDP Input.setInterceptDrags.

import { Page } from 'browser-agentkit';

async function dragAndDrop(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();

  const mouse = await page.getMouse();

  await mouse.move(100, 100);
  await mouse.down();
  await mouse.move(300, 300, { steps: 10 });  // smooth, triggers dragstart detection
  await mouse.up();                            // drops at (300, 300)

  await page.close();
}

Custom CDP Commands

When you need a CDP capability this library doesn't wrap (e.g. Network.setExtraHTTPHeaders), reach for CDPSession directly.

import { Page } from 'browser-agentkit';

async function withCustomHeaders(tabId: number) {
  const page = new Page(tabId);
  await page.initialize();

  // Page owns a CDPSession internally; create your own if you need raw access
  const { createCDPSession } = await import('browser-agentkit');
  const session = await createCDPSession(tabId);

  await session.send('Network.setExtraHTTPHeaders', {
    headers: { 'X-Custom': 'value' },
  });
}

API Reference

Classes

Browser

Tab and window lifecycle. No debugger attachment.

| Method | Description | |--------|-------------| | openTab(url, options?) | Opens a new tab | | getTab(tabId) | Gets tab information | | getCurrentTab(windowId?) | Gets the active tab | | queryTabs(queryInfo) | Queries tabs | | searchHistory(query) | Searches browser history (deduplicated by URL) | | closeTab(tabId) | Closes a tab | | navigate(tabId, url) | Navigates the tab (fire-and-forget; use Page.navigate if you need network-idle waiting) | | goBack(tabId) | Goes back in history | | goForward(tabId) | Goes forward in history | | reload(tabId, bypassCache?) | Reloads a tab | | captureVisibleTab(tabId, options?) | Captures visible area via chrome.tabs.captureVisibleTab | | createHiddenTab(url) | Creates a minimized hidden tab for background work |

Page

Page-level orchestrator. Owns the CDPSession for a single tab.

| Method | Description | |--------|-------------| | initialize() | Attaches the debugger, enables CDP domains, builds sub-components. Called automatically by every other method, but you can call it explicitly to surface attach errors early. | | close() | Detaches the debugger and clears sub-components | | click(selector) | Clicks an element. selector is a CSS selector or node="123" AX-node selector | | fill(selector, text, usePaste?) | Fills an input. With usePaste=true, writes through the system clipboard (useful for long/formatted text) | | scroll(direction, amount?) | Scrolls the page (defaults to ~one viewport height) | | navigate(url) | Navigates and waits for network idle + first contentful paint | | waitForNavigation(options?) | Waits for an in-flight navigation to settle | | waitForSelector(selector, options?) | Waits for an element to appear | | element(selector) | Returns an Element handle (requires the page to be initialized) | | getKeyboard() | Returns the page's Keyboard | | getMouse() | Returns the page's Mouse | | getContentExtractor() | Returns a ContentExtractor bound to this page's session | | evaluate(fn, ...args) | Executes a function in the page's main world | | captureVisible(options?) | Captures the visible area via CDP Page.captureScreenshot |

Actions

The action implementation that Page delegates to. Exported for advanced cases where you want to drive actions against a CDPSession you manage yourself.

| Method | Description | |--------|-------------| | click(selector) | Clicks an element | | fill(selector, value, usePaste?) | Fills an input | | search(selector, value) | fill + press Enter | | scroll(direction, amount?) | Scrolls the page | | waitForSelector(selector, options?) | Waits for an element to appear |

Constructor: new Actions(session, keyboard, mouse).

Element

A single element handle. Owns selector resolution and per-element operations.

| Method | Description | |--------|-------------| | resolve() | Resolves the selector to { nodeId, backendNodeId, object } | | exists() | Whether the element resolves successfully | | waitForExist(options?) | Polls until the element exists or times out | | getBoundingBox() | Returns the element's CSS-pixel bounding box | | getText() | Returns the element's text content | | getAttributeValue(name) | Returns an HTML attribute value, or null | | scrollIntoView() | Scrolls the element into the viewport if needed |

Constructor: new Element(session, selector).

ContentExtractor

Converts a page into a simplified HTML representation built from the accessibility tree.

| Method | Description | |--------|-------------| | getPageSnapshot(options?) | Full snapshot: HTML, metadata, PDF flag. options.viewportOnly keeps only viewport-visible nodes | | getHTML(options?) | Simplified HTML. Every interactive element carries node="..." | | getText() | Plain text (tags stripped, whitespace collapsed) | | getMetadata() | Page metadata (title, description, Open Graph) | | getElementContent(selector) | innerHTML + textContent for a single element | | isPDF() | Whether the current page is a PDF |

Constructor: new ContentExtractor(session).

Keyboard

Keyboard simulation. Tracks pressed modifiers across calls.

| Method | Description | |--------|-------------| | down(key) | Presses a key down | | up(key) | Releases a key | | press(key, options?) | Presses and releases a key. Supports chords like 'ControlOrMeta+KeyA' | | type(text, options?) | Types a string character by character | | insertText(text) | Inserts text directly (no per-char key events) |

Mouse

Mouse simulation. Tracks cursor position and button state across calls. Drag-and-drop is detected automatically when the left button is held during a move.

| Method | Description | |--------|-------------| | move(x, y, options?) | Moves the cursor. options.steps smooths movement | | down(options?) | Presses a button (default: 'left') | | up(options?) | Releases a button | | click(x, y, options?) | Move + down + up at (x, y) | | dblclick(x, y, options?) | Two clicks in quick succession | | wheel(deltaX, deltaY) | Scroll wheel at the current cursor position |

CDPSession

The unified CDP channel. Owns the debugger attachment and provides typed access to every form of browser communication this SDK uses.

| Method | Description | |--------|-------------| | attach() | Attaches the debugger (idempotent) | | enableDomains() | Enables DOM, Page, Network, DOMSnapshot, Accessibility | | send(method, params?) | Sends a CDP command | | onEvent(method, handler) | Subscribes to a CDP event (auto-filtered by tab). Returns an unsubscribe function | | offEvent(handler) | Removes a previously-registered event handler | | evaluate(fn, args?) | Runs a function in the page's main world via Runtime.evaluate | | executeScript(fn, args?, options?) | Runs a function via chrome.scripting.executeScript. options.world selects 'MAIN' or 'ISOLATED' | | executeScriptAllFrames(fn, args?, world?) | Same as above, but executes in every frame and returns each frame's result | | attachIframe(frameId) | Attaches to a cross-origin iframe and returns a new CDPSession bound to that target | | detach() | Removes all event listeners and detaches the debugger |

EventManager

chrome.tabs.* and chrome.webRequest.* event wrappers with shared cleanup.

| Method | Description | |--------|-------------| | on(eventName, callback) | Registers a custom event listener | | off(eventName, callback) | Removes a custom event listener | | emit(eventName, ...args) | Emits a custom event | | onTabCreated(callback) | Listens for tab created events | | onTabRemoved(callback) | Listens for tab removed events | | onTabUpdated(callback) | Listens for tab updated events | | onTabActivated(callback) | Listens for tab activated events | | onBeforeRequest(callback, filter?) | Listens for network requests | | onCompleted(callback, filter?) | Listens for completed requests | | cleanup() | Removes every listener registered through this manager |

Enums

ScrollDirection

enum ScrollDirection {
  UP = 'UP',
  DOWN = 'DOWN',
}

Types

// Tab information (re-exported chrome.tabs.Tab)
type TabInfo = chrome.tabs.Tab

// Page content snapshot
interface PageContent {
  html: string
  markdown?: string
  meta?: PageMetadata
  isPdf?: boolean
  resources?: unknown[]
}

// Page metadata
interface PageMetadata {
  title?: string
  description?: string
  url?: string
  image?: string
  [key: string]: string | undefined
}

// Screenshot options
interface ScreenshotOptions {
  format?: 'png' | 'jpeg'
  quality?: number   // 0-100, jpeg only
}

// Element bounding box
interface BoundingBox {
  x: number
  y: number
  width: number
  height: number
}

// Viewport information
interface ViewportInfo {
  width: number
  height: number
  x: number
  y: number
}

// Selector resolution result (returned by Element.resolve())
interface ResolvedNode {
  nodeId: number
  backendNodeId: number
  object: RemoteObject   // CDP Runtime.RemoteObject
}

Factory Functions

| Function | Description | |----------|-------------| | createCDPSession(tabId) | Creates a CDPSession, attaches the debugger, and enables CDP domains | | createKeyboard(session) | Creates a Keyboard with platform detection | | createMouse(session, keyboard) | Creates a Mouse |

Default Instances

import { browser, events } from 'browser-agentkit';

// Pre-initialized Browser instance
browser.openTab('https://example.com');

// Pre-initialized EventManager instance
events.onTabCreated((tab) => console.log(tab));

Try the Demo Extension

A full-featured demo Chrome extension lives in examples/crx-demo/. It exercises every SDK module via an interactive side panel, targeting a deterministic playground page bundled with the extension.

# Build the SDK and demo extension
pnpm run example:build

# Or develop with HMR
pnpm run example:dev

Then load examples/crx-demo/extension/ as an unpacked extension in Chrome (chrome://extensions → Developer mode → Load unpacked). Click the extension icon to open the side panel, then click "Open Playground" to get a stable target page.

The demo covers: Browser tab lifecycle, Page navigation/click/fill/scroll/evaluate, Element resolution/bounding-box/attributes, Content extraction with AX-node round-trip, Keyboard & Mouse simulation including drag-and-drop, EventManager network listeners, raw CDPSession escape hatch, and utility helpers (retry, waitForCondition, withTimeout, getPlatform).

Requirements

  • Chrome / Chromium browser
  • Chrome extension with appropriate permissions:
    • tabs
    • debugger
    • scripting
    • activeTab
    • history

License

MIT