

@monostate/node-scraper

Intelligent web scraping with multi-tier fallback — 11x faster than traditional scrapers


Install

npm install @monostate/node-scraper

LightPanda is downloaded automatically on install. Puppeteer is an optional peer dependency for full browser fallback.

Usage

import { smartScrape, smartScreenshot, quickShot } from '@monostate/node-scraper';

// Scrape with automatic method selection
const result = await smartScrape('https://example.com');
console.log(result.content);
console.log(result.method); // 'direct-fetch' | 'lightpanda' | 'puppeteer'

// Screenshots
const screenshot = await smartScreenshot('https://example.com');
const quick = await quickShot('https://example.com'); // optimized for speed

// PDFs are detected and parsed automatically
const pdf = await smartScrape('https://example.com/doc.pdf');

Force a specific method

const result = await smartScrape('https://example.com', { method: 'direct' });
// Also: 'lightpanda', 'puppeteer', 'auto' (default)

No fallback occurs when a method is forced — useful for testing and debugging.
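Conceptually, forcing a method just narrows the candidate list to a single entry. A minimal sketch of that selection logic (`pickMethods` is a hypothetical helper, not the library's actual internals):

```javascript
// Hypothetical sketch: 'auto' gets the full fallback chain,
// a forced method gets a single-entry list, so there is nothing to fall back to.
const TIERS = ['direct', 'lightpanda', 'puppeteer'];

function pickMethods(method = 'auto') {
  if (method === 'auto') return TIERS;        // full fallback chain
  if (!TIERS.includes(method)) throw new Error(`Unknown method: ${method}`);
  return [method];                            // single entry: no fallback
}

console.log(pickMethods());          // ['direct', 'lightpanda', 'puppeteer']
console.log(pickMethods('direct'));  // ['direct']
```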

Advanced usage

import { BNCASmartScraper } from '@monostate/node-scraper';

const scraper = new BNCASmartScraper({
  timeout: 10000,
  verbose: true,
});

const result = await scraper.scrape('https://complex-spa.com');
const stats = scraper.getStats();
const health = await scraper.healthCheck();

await scraper.cleanup();

Bulk scraping

import { bulkScrape, bulkScrapeStream } from '@monostate/node-scraper';

const results = await bulkScrape(urls, {
  concurrency: 5,
  continueOnError: true,
  progressCallback: (p) => console.log(`${p.percentage.toFixed(1)}%`),
});

// Or stream results as they complete
await bulkScrapeStream(urls, {
  concurrency: 10,
  onResult: async (result) => await saveToDatabase(result),
  onError: async (error) => console.error(error.url, error.error),
});

See BULK_SCRAPING.md for full documentation.
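Under the hood, bulk scraping with `concurrency` and `continueOnError` boils down to a bounded worker pool. A self-contained sketch of that pattern (`runBounded` is a hypothetical name, not the library's implementation):

```javascript
// Hypothetical sketch of a bounded worker pool, the pattern behind bulkScrape.
async function runBounded(items, worker, { concurrency = 5, continueOnError = true } = {}) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting
      try {
        results[i] = { success: true, value: await worker(items[i]) };
      } catch (err) {
        if (!continueOnError) throw err;
        results[i] = { success: false, error: err };
      }
    }
  }

  // Spawn at most `concurrency` lanes that pull from the shared queue.
  await Promise.all(Array.from({ length: Math.min(concurrency, items.length) }, lane));
  return results;
}
```

Each lane claims the next index from a shared counter, so at most `concurrency` requests are in flight at any time.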

Browser sessions

Persistent browser sessions with real-time control. Three modes: auto (headless), visual, and computer-use (covered in the next section):

import { createSession } from '@monostate/node-scraper';

// Headless (default) — LightPanda with Chrome fallback
const session = await createSession({ mode: 'auto' });
await session.goto('https://example.com');
const content = await session.extractContent();
const state = await session.getPageState({ includeScreenshot: true });
await session.close();

// Visual — Chrome with headless:false for dev/debug
const visual = await createSession({ mode: 'visual' });
await visual.goto('https://example.com');
await visual.screenshot(); // real Chrome rendering
await visual.close();

Session methods: goto, click, type, scroll, hover, select, pressKey, goBack, goForward, screenshot, extractContent, getPageState, waitFor, evaluate, getCookies, setCookies.

Computer use (coordinate-based browser control)

For AI agents that navigate by pixel coordinates: useful for anti-bot sites, dynamic UIs, or anything that can't be scraped with selectors.

import { createSession, LocalProvider } from '@monostate/node-scraper';

// LocalProvider runs Xvfb + Chrome + xdotool (Linux only)
const session = await createSession({
  mode: 'computer-use',
  provider: new LocalProvider({ screenWidth: 1280, screenHeight: 800, enableVnc: true }),
});

await session.goto('https://example.com');

// Coordinate-based actions (delegated to provider)
await session.clickAt(640, 400);
await session.typeText('hello world');
await session.mouseMove(100, 200);
await session.drag(10, 20, 300, 400);
await session.scrollAt(640, 400, 'down', 5);
const pos = await session.getCursorPosition();
const size = await session.getScreenSize();

// Selector-based actions still work (via Puppeteer CDP)
await session.click('#submit');
await session.type('#search', 'query');

// VNC streaming URL (if provider supports it)
console.log(session.getVncUrl());

await session.close();

Custom providers

Implement ComputerUseProvider to connect any VM/container backend:

import { ComputerUseProvider } from '@monostate/node-scraper';

class MyProvider extends ComputerUseProvider {
  async start() {
    // Provision VM, return { cdpUrl, vncUrl, screenSize }
  }
  async mouseClick(x, y, button) { /* ... */ }
  async screenshot() { /* ... */ }
  async stop() { /* cleanup */ }
}

AI-powered Q&A

Ask questions about any website using OpenRouter, OpenAI, or local fallback:

import { askWebsiteAI } from '@monostate/node-scraper';

const answer = await askWebsiteAI('https://example.com', 'What is this site about?', {
  openRouterApiKey: process.env.OPENROUTER_API_KEY,
});

API key priority: OpenRouter > OpenAI > BNCA backend > local pattern matching (no key needed).
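That priority order can be pictured as a simple resolver (an illustrative sketch; `resolveProvider` and the `bncaBackendUrl` field are hypothetical names, not part of the exported API):

```javascript
// Hypothetical sketch of the documented key priority:
// OpenRouter > OpenAI > BNCA backend > local pattern matching.
function resolveProvider({ openRouterApiKey, openAIApiKey, bncaBackendUrl } = {}) {
  if (openRouterApiKey) return 'openrouter';
  if (openAIApiKey) return 'openai';
  if (bncaBackendUrl) return 'bnca';
  return 'local'; // pattern matching, no key needed
}

console.log(resolveProvider({ openAIApiKey: 'sk-...' })); // 'openai'
console.log(resolveProvider());                           // 'local'
```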

How it works

The scraper uses a three-tier fallback system:

  1. Direct fetch — Pure HTTP with HTML parsing. Sub-second, handles ~75% of sites.
  2. LightPanda — Lightweight browser engine, 2-3x faster than Chromium. Handles SPAs.
  3. Puppeteer — Full Chromium for maximum compatibility.

Additional specialized handlers:

  • PDF parser — Automatic detection by URL, content-type, or magic bytes. Extracts text, metadata, and page count.
  • Screenshots — Chrome CLI capture with retry logic and smart timeouts.

Browser instances are pooled (max 3, 5s idle timeout) to prevent memory leaks.
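The tiered fallback amounts to trying each method in order until one succeeds. A conceptual sketch of the pattern (not the library's internals; method names and error text are illustrative):

```javascript
// Conceptual sketch of a multi-tier fallback chain: try each method in order,
// record failures, and surface them all only if every tier fails.
async function withFallback(url, tiers) {
  const errors = [];
  for (const [name, attempt] of tiers) {
    try {
      return { method: name, content: await attempt(url) };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // fall through to the next tier
    }
  }
  throw new Error(`All methods failed for ${url}: ${errors.join('; ')}`);
}
```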

Performance

| Site Type | node-scraper | Firecrawl | Speedup |
|-------------|--------------|-----------|---------|
| Wikipedia | 154ms | 4,662ms | 30.3x |
| Hacker News | 1,715ms | 4,644ms | 2.7x |
| GitHub | 9,167ms | 9,790ms | 1.1x |

Average across these benchmarks: 11.35x faster with 100% reliability.

API Reference

Convenience functions

| Function | Description |
|----------|-------------|
| smartScrape(url, opts?) | Scrape with intelligent fallback |
| smartScreenshot(url, opts?) | Full page screenshot |
| quickShot(url, opts?) | Fast screenshot capture |
| bulkScrape(urls, opts?) | Batch scrape multiple URLs |
| bulkScrapeStream(urls, opts?) | Stream results as they complete |
| askWebsiteAI(url, question, opts?) | AI Q&A about a webpage |

BNCASmartScraper options

{
  timeout: 10000,            // Request timeout (ms)
  retries: 2,                // Retries per method
  verbose: false,            // Detailed logging
  lightpandaPath: './bin/lightpanda',
  lightpandaFormat: 'html',  // 'html' or 'markdown'
  userAgent: 'Mozilla/5.0 ...',
  openRouterApiKey: '...',
  openAIApiKey: '...',
  openAIBaseUrl: 'https://api.openai.com',
}

Methods

  • scraper.scrape(url, opts?) — Scrape with fallback
  • scraper.screenshot(url, opts?) — Take screenshot
  • scraper.quickshot(url, opts?) — Fast screenshot
  • scraper.askAI(url, question, opts?) — AI Q&A
  • scraper.getStats() — Performance statistics
  • scraper.healthCheck() — Check method availability
  • scraper.cleanup() — Close browser instances

Response shape

{
  success: true,
  content: '...',          // Extracted content (JSON string)
  method: 'direct-fetch',  // Method used
  url: 'https://...',
  performance: { totalTime: 154 },
  stats: { ... }
}
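Since `content` is a JSON string, consumers typically parse it before use. A short sketch (the sample values and inner fields like `title` are illustrative, shaped like the response above, not a guaranteed schema):

```javascript
// Illustrative result shaped like the documented response; values are made up.
const result = {
  success: true,
  content: JSON.stringify({ title: 'Example Domain', text: '...' }),
  method: 'direct-fetch',
  url: 'https://example.com',
  performance: { totalTime: 154 },
};

// Parse the JSON string only on success.
const data = result.success ? JSON.parse(result.content) : null;
console.log(data.title); // 'Example Domain'
```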

TypeScript

Full type definitions are included (index.d.ts).

import { BNCASmartScraper, ScrapingResult } from '@monostate/node-scraper';

Notes

  • Server-side only — requires filesystem access and browser automation.
  • Node.js 20+ required.
  • LightPanda binary is auto-installed. Puppeteer is optional.
  • No external API calls for scraping — all processing is local.

Changelog

See CHANGELOG.md for the full release history.

License

MIT