dom-distill

v0.4.1

Token-efficient DOM-to-tree distiller for LLMs and browser automation.

The Problem

AI agents that browse the web (AutoGPT, Skyvern, browser-use, etc.) hit the same bottleneck: bloated DOMs.

A typical React/Next.js page has 2,000+ nodes. Serialize that and send it to an LLM, and you're burning 100k–180k tokens per step — most of it hidden <div>s, <svg>s, and <style> tags the model can't act on anyway.

dom-distill is a zero-dependency TypeScript engine that runs inside the browser (page.evaluate) and converts the DOM into a minimal, structured JSON array of only the nodes an LLM needs to take action.

Results

Live benchmarks running dom-distill against real sites via Playwright:

| Site | Raw Nodes | Raw Tokens | → Filtered Nodes | Filtered Tokens | Reduction |
|---|---|---|---|---|---|
| GitHub | 1,908 | ~147k | 115 | ~4.6k | 96.8% |
| Stripe | 2,438 | ~180k | 192 | ~9.4k | 94.8% |
| React Docs | 1,846 | ~68k | 139 | ~6.4k | 90.7% |
| Hacker News | 807 | ~8.6k | 226 | ~10.2k | n/a (already minimal) |

What the LLM Sees

Instead of 2,000 nodes of raw HTML, the LLM gets a clean array like this:

[
  {
    "id": "dom-node-mmeytkdf-5",
    "text": "Acme Corp",
    "selector": "[data-testid=\"nav-home\"]",
    "rank": 6,
    "confidence": 1.0,
    "attributes": { "testId": "nav-home", "href": "/" }
  },
  {
    "id": "dom-node-mmeytkdf-12",
    "text": "Start free trial",
    "selector": "button[aria-label=\"Start free trial\"]",
    "rank": 3,
    "confidence": 0.7,
    "attributes": { "type": "button" }
  },
  {
    "id": "dom-node-mmeytkdf-21",
    "selector": "#email",
    "rank": 6,
    "confidence": 0.9,
    "attributes": { "name": "email", "placeholder": "[email protected]", "type": "email" }
  }
]

Each node includes:

  • selector — Deterministic CSS selector the LLM can act on: click("[data-testid=\"nav-home\"]")
  • rank — InteractionRank score (0–10+) based on tag semantics, ARIA roles, and attributes
  • confidence — Selector stability score (0–1): how likely the selector survives DOM changes
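
As a rough illustration of how an agent might consume these fields, the sketch below drops low-confidence selectors, matches on visible text, and prefers the most stable selector. The `DistilledNode` type and the `pickTarget` helper are hypothetical names invented for this example and are not part of the dom-distill API; only the field names mirror the sample output above.

```typescript
// Hypothetical shape mirroring the sample output above (not a type exported by dom-distill).
interface DistilledNode {
  id: string;
  text?: string;
  selector: string;
  rank: number;
  confidence: number;
  attributes: Record<string, string>;
}

// Pick the node whose text contains the query, preferring higher selector
// confidence and then higher rank. Returns undefined when nothing qualifies.
function pickTarget(
  nodes: DistilledNode[],
  query: string,
  minConfidence = 0.5,
): DistilledNode | undefined {
  const needle = query.toLowerCase();
  return nodes
    .filter((n) => n.confidence >= minConfidence)
    .filter((n) => (n.text ?? "").toLowerCase().includes(needle))
    .sort((a, b) => b.confidence - a.confidence || b.rank - a.rank)[0];
}

const sample: DistilledNode[] = [
  {
    id: "dom-node-mmeytkdf-5",
    text: "Acme Corp",
    selector: '[data-testid="nav-home"]',
    rank: 6,
    confidence: 1.0,
    attributes: { testId: "nav-home", href: "/" },
  },
  {
    id: "dom-node-mmeytkdf-12",
    text: "Start free trial",
    selector: 'button[aria-label="Start free trial"]',
    rank: 3,
    confidence: 0.7,
    attributes: { type: "button" },
  },
];

const target = pickTarget(sample, "free trial");
console.log(target?.selector);
```

An agent could pass the returned selector straight to its click or fill call, and fall back to asking the LLM again when `pickTarget` returns `undefined`.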

Install

npm install dom-distill

Quick Start

With Playwright

import { chromium } from 'playwright';
import { readFileSync } from 'fs';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com');

// Inject and run dom-distill in the browser context
const bundle = readFileSync('node_modules/dom-distill/dist/index.js', 'utf8');

const result = await page.evaluate((code) => {
  const exports = {};
  const module = { exports };
  new Function('exports', 'module', code)(exports, module);
  const { distill, filter } = module.exports;

  const tree = distill(document.body, { maxDepth: 15, maxNodes: 500 });
  return filter(tree, { minRank: 2 });
}, bundle);

console.log(result); // Clean, actionable nodes for your LLM

await browser.close(); // Release the browser so the script can exit

In-Browser

import { distill, filter, compress, diff } from 'dom-distill';

// 1. Distill the live DOM into a structured tree
const tree = distill(document.body, { maxDepth: 10, maxNodes: 500 });

// 2. Filter to high-value interactive elements
const nodes = filter(tree, { minRank: 2 });

// 3. Compress for LLM consumption (strips WeakRefs, parent refs)
const payload = compress(tree);

// 4. Incremental updates via diffing
const nextNodes = filter(distill(document.body), { minRank: 2 });
const delta = diff(nodes, nextNodes);
// → Only send delta.changed + delta.appeared to the LLM

API

Distillation

| Function | Description |
|---|---|
| distill(root?, config?) | Synchronous single-pass DOM → tree |
| distillAsync(root?, config?) | Chunked async traversal via requestIdleCallback (non-blocking) |
| metrics(tree) | Aggregate stats: node count, depth, branching factor, forms, nav |
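
To make "chunked async traversal" concrete, here is a minimal sketch of the general technique: a stack-based DFS that yields to the host every N nodes. It is not dom-distill's implementation. `TreeNode`, `yieldToHost`, and `walkChunked` are hypothetical names, and the `setTimeout` fallback is an assumption added so the sketch also runs outside a browser.

```typescript
// Hypothetical minimal tree shape for illustration (not dom-distill's node type).
interface TreeNode {
  tag: string;
  children: TreeNode[];
}

type IdleScheduler = (callback: () => void) => unknown;

// Yield control to the host: requestIdleCallback where it exists (browsers),
// otherwise a zero-delay timer so the sketch also runs in Node.
function yieldToHost(): Promise<void> {
  return new Promise<void>((resolve) => {
    const idle = (globalThis as unknown as { requestIdleCallback?: IdleScheduler })
      .requestIdleCallback;
    if (typeof idle === "function") {
      idle(() => resolve());
    } else {
      setTimeout(() => resolve(), 0);
    }
  });
}

// Iterative depth-first walk that pauses after every `chunkSize` visited nodes.
async function walkChunked(
  root: TreeNode,
  visit: (node: TreeNode) => void,
  chunkSize = 50,
): Promise<number> {
  const stack: TreeNode[] = [root];
  let visited = 0;
  while (stack.length > 0) {
    const node = stack.pop()!;
    visit(node);
    visited++;
    // Push children in reverse so they are popped in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
    if (visited % chunkSize === 0) {
      await yieldToHost();
    }
  }
  return visited;
}

// Usage: a root with 120 children is walked in three chunks.
const root: TreeNode = {
  tag: "body",
  children: Array.from({ length: 120 }, () => ({ tag: "div", children: [] })),
};
walkChunked(root, () => {}).then((count) => console.log(`visited ${count} nodes`));
```

Using an explicit stack rather than recursion is what makes pausing cheap: the traversal state lives in one array, so the loop can await between chunks without unwinding anything.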

Filtering

| Function | Description |
|---|---|
| filter(tree, config?) | Keep only nodes with InteractionRank ≥ threshold |
| filterAsync(tree, config?) | Same, cooperatively scheduled |
| calculateInteractionRank(node) | Score a single node (0–10+) |
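
For intuition about rank-based filtering, the sketch below scores nodes using only the weights this README states (button +3, href +3, role="button" +2, invisible nodes score 0). It is a simplified stand-in, not the library's `calculateInteractionRank`: the real scoring evidently weighs more signals, and `RankInput`, `sketchRank`, and `keepRanked` are hypothetical names.

```typescript
// Hypothetical input shape for the sketch (not dom-distill's node type).
interface RankInput {
  tag: string;
  attributes: Record<string, string>;
  visible: boolean;
}

// Simplified score using only the weights quoted in this README.
function sketchRank(node: RankInput): number {
  if (!node.visible) return 0; // invisible nodes score 0
  let score = 0;
  if (node.tag === "button") score += 3;
  if ("href" in node.attributes) score += 3;
  if (node.attributes["role"] === "button") score += 2;
  return score;
}

// Keep only nodes whose score meets the threshold, like a minRank filter.
function keepRanked<T extends RankInput>(nodes: T[], minRank: number): T[] {
  return nodes.filter((n) => sketchRank(n) >= minRank);
}

const candidates: RankInput[] = [
  { tag: "button", attributes: { type: "button" }, visible: true },
  { tag: "a", attributes: { href: "/" }, visible: true },
  { tag: "div", attributes: { role: "button" }, visible: true },
  { tag: "div", attributes: {}, visible: true },
  { tag: "button", attributes: {}, visible: false },
];

console.log(keepRanked(candidates, 2).map((n) => n.tag));
```

With a threshold of 2, the plain `div` and the hidden button are dropped while the three actionable nodes remain.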

Compression & Diffing

| Function | Description |
|---|---|
| compress(tree) | Strip runtime fields → JSON-serializable |
| decompress(data) | Reconstruct full tree from compressed data |
| fingerprintNode(node) | Stable hash for a single node |
| fingerprintTree(nodes) | Hash an entire node array |
| diff(prev, next) | Three-way delta: changed / appeared / disappeared |
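
The following sketch shows one way a fingerprint-based three-way delta can work in principle. It assumes nodes are identified by their selector and fingerprinted by their text and attributes; both choices are assumptions made for this example, and the library's actual hashing and matching may differ. `FlatNode`, `sketchFingerprint`, and `sketchDiff` are hypothetical names.

```typescript
// Hypothetical flat node shape for the sketch.
interface FlatNode {
  selector: string;
  text?: string;
  attributes: Record<string, string>;
}

interface Delta {
  changed: FlatNode[];
  appeared: FlatNode[];
  disappeared: FlatNode[];
}

// Order-independent content fingerprint (assumption: text plus sorted attributes).
function sketchFingerprint(node: FlatNode): string {
  const attrs = Object.keys(node.attributes)
    .sort()
    .map((key) => `${key}=${node.attributes[key]}`)
    .join("&");
  return `${node.text ?? ""}|${attrs}`;
}

// Compare two snapshots, using the selector as node identity (assumption).
function sketchDiff(prev: FlatNode[], next: FlatNode[]): Delta {
  const prevBySelector = new Map(prev.map((n) => [n.selector, n] as const));
  const nextSelectors = new Set(next.map((n) => n.selector));
  const delta: Delta = { changed: [], appeared: [], disappeared: [] };

  for (const node of next) {
    const before = prevBySelector.get(node.selector);
    if (!before) {
      delta.appeared.push(node);
    } else if (sketchFingerprint(before) !== sketchFingerprint(node)) {
      delta.changed.push(node);
    }
  }
  for (const node of prev) {
    if (!nextSelectors.has(node.selector)) delta.disappeared.push(node);
  }
  return delta;
}

const before: FlatNode[] = [
  { selector: "#email", attributes: { type: "email" } },
  { selector: "#submit", text: "Sign up", attributes: { type: "submit" } },
];
const after: FlatNode[] = [
  { selector: "#submit", text: "Submitting", attributes: { type: "submit" } },
  { selector: "#error", text: "Invalid email", attributes: {} },
];

console.log(sketchDiff(before, after));
```

Unchanged nodes appear in none of the three lists, which is where the token savings of an incremental update come from.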

React Integration (tree-shakeable)

| Function | Description |
|---|---|
| enhanceTreeWithFiber(tree) | Walk React Fiber tree for component names, props, patterns |
| findReactComponents(tree, name?) | Find components by name |
| getComponentHierarchy(node) | Get the component ancestry path |
| analyzeReactPatterns(tree) | Count forms, modals, dropdowns, etc. |

How It Works

  • Single-pass construction — One DFS walk builds the tree with smart selectors, semantic tags, visibility checks, and action type detection.
  • InteractionRank scoring — Each node scores 0–10+ based on tag semantics (button +3, href +3), ARIA roles (role="button" +2), and attributes. Invisible nodes score 0. Only high-ranking nodes survive filtering.
  • Selector confidence — Every selector gets a stability score: data-testid → 1.0, id → 0.9, name → 0.8, aria-label → 0.7, structural nth-of-type → 0.3. Agents can use this to prefer stable selectors.
  • Cooperative scheduling — distillAsync performs genuinely chunked traversal (stack-based DFS, yielding every 50 nodes via requestIdleCallback). No main-thread blocking, even on 10k+ node DOMs.
  • Fingerprint diffing — Stable node hashes enable three-way diffs (changed / appeared / disappeared) for incremental LLM updates instead of full-tree resends.
  • React Fiber integration — Optional enhanceTreeWithFiber() walks __reactFiber$ internals to extract component names, sanitized props (passwords/API keys redacted), and structural patterns. Fully tree-shakeable — excluded from the bundle if not imported.
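
The selector confidence tiers listed above can be pictured as a simple priority chain. The sketch below is an illustration of that idea only. The confidence values come from this README, while the exact selector strings, the `ElementInfo` shape, and the `sketchSelector` helper are assumptions and hypothetical names.

```typescript
// Hypothetical element description for the sketch.
interface ElementInfo {
  tag: string;
  attributes: Record<string, string>;
  nthOfType: number; // 1-based position among same-tag siblings
}

interface ScoredSelector {
  selector: string;
  confidence: number;
}

// Quote an attribute value for use inside a CSS attribute selector.
const quote = (value: string): string => JSON.stringify(value);

// Try the most stable attribute first and fall back to a structural selector.
// Note: a real implementation would need to escape ids (e.g. with CSS.escape).
function sketchSelector(el: ElementInfo): ScoredSelector {
  const attrs = el.attributes;
  if (attrs["data-testid"]) {
    return { selector: `[data-testid=${quote(attrs["data-testid"])}]`, confidence: 1.0 };
  }
  if (attrs["id"]) {
    return { selector: `#${attrs["id"]}`, confidence: 0.9 };
  }
  if (attrs["name"]) {
    return { selector: `${el.tag}[name=${quote(attrs["name"])}]`, confidence: 0.8 };
  }
  if (attrs["aria-label"]) {
    return { selector: `${el.tag}[aria-label=${quote(attrs["aria-label"])}]`, confidence: 0.7 };
  }
  return { selector: `${el.tag}:nth-of-type(${el.nthOfType})`, confidence: 0.3 };
}

console.log(
  sketchSelector({ tag: "input", attributes: { id: "email", name: "email" }, nthOfType: 1 }),
);
```

An agent can then treat anything at or below the structural tier as fragile and re-distill the page before acting on it.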

Try It

Run the live demo against real websites:

git clone https://github.com/skirianov/dom-distill.git && cd dom-distill
npm install && npm run build
npm install --save-dev playwright tsx && npx playwright install chromium
npx tsx examples/demo.ts

Or point it at any URL:

npx tsx examples/demo.ts https://stripe.com

AI Agent Cookbook

Build a complete browser agent that uses an LLM to decide actions:

# Set your API key (works with OpenRouter, OpenAI, or any compatible endpoint)
OPENROUTER_API_KEY=sk-or-... npx tsx examples/agent-loop.ts

# Custom task
OPENROUTER_API_KEY=sk-or-... npx tsx examples/agent-loop.ts "Go to github.com and find trending repositories"

# Use a different model
MODEL=anthropic/claude-sonnet-4 OPENROUTER_API_KEY=sk-or-... npx tsx examples/agent-loop.ts

The agent opens a visible browser, distills the page, asks the LLM what to do, executes the action, and repeats, so you can watch it work in real time. See examples/agent-loop.ts for the full source (~180 lines).
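
The observe, decide, act cycle described above can be sketched independently of any browser or LLM provider. This is not the code in examples/agent-loop.ts. The `Action` type, the `AgentDeps` interface, and `runAgent` are hypothetical names, and the three dependencies are stubbed so the loop structure is visible on its own.

```typescript
// Hypothetical action vocabulary for the sketch.
type Action =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "done" };

// The loop only needs three capabilities, injected by the caller:
// observe (e.g. distill + filter the page), decide (ask the LLM), act (drive the browser).
interface AgentDeps<Obs> {
  observe: () => Promise<Obs>;
  decide: (task: string, observation: Obs) => Promise<Action>;
  act: (action: Action) => Promise<void>;
}

// Repeat observe -> decide -> act until the model says "done" or the step budget runs out.
async function runAgent<Obs>(
  task: string,
  deps: AgentDeps<Obs>,
  maxSteps = 10,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = await deps.observe();
    const action = await deps.decide(task, observation);
    history.push(action);
    if (action.kind === "done") break;
    await deps.act(action);
  }
  return history;
}

// Usage with stubs: the fake "LLM" clicks the first selector once, then stops.
let decisions = 0;
const performed: Action[] = [];
const run = runAgent("demo task", {
  observe: async () => ["#signup"],
  decide: async (_task, selectors): Promise<Action> =>
    decisions++ === 0 ? { kind: "click", selector: selectors[0] } : { kind: "done" },
  act: async (action) => {
    performed.push(action);
  },
});
run.then((history) => console.log(history.map((a) => a.kind)));
```

Keeping the step budget explicit guards against a model that never returns a terminal action, which matters when every step costs tokens.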

Zero Dependencies

No runtime dependencies. Pure TypeScript, browser APIs only. Ships ESM + CJS + full type declarations.

License

MIT