
html-data-scraper

v2.0.0 · MIT License

A resilient, stealth-enabled web scraper built on Playwright. Scrape data from multiple URLs concurrently using declarative CSS selectors or custom JavaScript evaluation, with built-in anti-detection and automatic retries.

Features

  • Playwright-powered -- uses Chromium via Playwright for reliable, modern browser automation
  • Concurrent scraping -- distributes URLs across multiple browser tabs automatically
  • Declarative CSS selectors -- extract data using simple selector strings alongside custom functions
  • Stealth by default -- randomized user agents, viewports, and human-like delays
  • Resilient crawling -- automatic retries with exponential backoff, rate limiting, and error collection
  • TypeScript-first -- fully typed API with strict null checks
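To make the "stealth by default" idea concrete, here is a minimal, self-contained sketch of randomizing a user agent and viewport per page. This is an illustration of the technique, not the library's internal code; the pools and function names below are assumptions chosen for the example.

```javascript
// Illustrative stealth sketch (not html-data-scraper internals):
// pick a randomized user agent and viewport for each new page.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
];

const VIEWPORTS = [
    { width: 1280, height: 800 },
    { width: 1440, height: 900 },
    { width: 1920, height: 1080 },
];

function randomFrom(pool) {
    return pool[Math.floor(Math.random() * pool.length)];
}

// Returns a profile you could pass to browser.newContext() in Playwright.
function randomStealthProfile() {
    return {
        userAgent: randomFrom(USER_AGENTS),
        viewport: randomFrom(VIEWPORTS),
    };
}
```

In Playwright, such a profile would typically be applied via `browser.newContext({ userAgent, viewport })`, so every context looks like a slightly different visitor.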

Quick start

```sh
npm install html-data-scraper
```

```js
import htmlDataScraper from 'html-data-scraper';

const { results, browserInstance } = await htmlDataScraper([
    'https://en.wikipedia.org/wiki/Web_scraping',
], {
    onEvaluateForEachUrl: {
        heading: '#firstHeading',                  // CSS selector -> textContent
        title: () => document.title,               // function -> page.evaluate()
    },
});

console.log(results[0].evaluates);
// { heading: 'Web scraping', title: 'Web scraping - Wikipedia' }

await browserInstance.close();
```
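The quick start mixes two kinds of entries in `onEvaluateForEachUrl`: plain CSS selector strings and functions. One way such a mixed map can be handled is to normalize every entry into a function before handing it to `page.evaluate()`. The helper below is a hypothetical sketch of that normalization, not the library's actual internals; `normalizeEvaluates` is an assumed name.

```javascript
// Hypothetical sketch: normalize a mixed map of CSS selector strings and
// functions so every entry is a function suitable for page.evaluate().
// A selector string becomes a function reading that element's textContent.
function normalizeEvaluates(map) {
    const normalized = {};
    for (const [key, value] of Object.entries(map)) {
        normalized[key] = typeof value === 'function'
            ? value
            : new Function(
                `return document.querySelector(${JSON.stringify(value)})?.textContent ?? null;`
              );
    }
    return normalized;
}
```

Under this reading, `heading: '#firstHeading'` and `heading: () => document.querySelector('#firstHeading')?.textContent` would behave the same way.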

Documentation

| Guide | Description |
|---|---|
| Getting Started | Installation, basic usage, and first scraper |
| API Reference | Full API documentation with all types and options |
| Stealth | Anti-detection features and configuration |
| Resilience | Retries, rate limiting, and error handling |
| Migration from v1 | Upgrading from v1 (Puppeteer) to v2 (Playwright) |
| Contributing | Development setup and contribution guidelines |

Examples

Ready-to-run example projects in the examples/ folder:

| Example | Description |
|---|---|
| Wikipedia Scraper | Scrape structured data from 6 Wikipedia articles across 3 concurrent tabs using CSS selectors, functions, route interception, and progress tracking |
| News Monitor | Monitor headlines from 5 international news sites with stealth, rate limiting, retries, screenshots, and graceful error handling |

```sh
cd examples/wikipedia-scraper
npm install && npx playwright install chromium
npm start
```

Example: scrape with error tolerance

```js
const { results } = await htmlDataScraper([
    'https://example.com/page-1',
    'https://example.com/page-2',
    'https://this-will-fail.invalid',
], {
    onEvaluateForEachUrl: {
        title: 'h1',
    },
    resilience: {
        retries: 2,
        continueOnError: true,
    },
});

// Failed URLs have an error field instead of crashing the batch
for (const result of results) {
    if (result.error) {
        console.error(`Failed: ${result.url} - ${result.error}`);
    } else {
        console.log(`${result.url}: ${result.evaluates?.title}`);
    }
}
```
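The retry behavior configured above (automatic retries with exponential backoff) can be sketched generically as follows. This is not html-data-scraper's internal code; `withRetries` and `baseDelayMs` are assumed names, and the doubling delay formula is an illustration of the standard technique.

```javascript
// Generic retry-with-exponential-backoff sketch (assumed names, not
// library internals). Retries a failing async function, doubling the
// wait between attempts: baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...
async function withRetries(fn, { retries = 2, baseDelayMs = 100 } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await fn(attempt);
        } catch (err) {
            lastError = err;
            if (attempt < retries) {
                const delay = baseDelayMs * 2 ** attempt;
                await new Promise((resolve) => setTimeout(resolve, delay));
            }
        }
    }
    // All attempts exhausted: surface the last error to the caller,
    // which is where an option like continueOnError would decide
    // whether to record it per-URL or fail the whole batch.
    throw lastError;
}
```

With `continueOnError: true`, the caller would catch this final error and attach it to the result's `error` field instead of aborting the run.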

License

MIT -- Ravidhu Dissanayake