@p-yiush07/scraper-engine

v0.2.2

High-performance web scraper with built-in anti-bot bypass for Node.js, implemented in Rust.

Handles 401 Unauthorized and 403 Forbidden responses automatically by emulating real browser TLS fingerprints, rotating headers to match Chrome, Firefox, Safari, and Edge, and retrying with a different browser profile on each attempt.

Installation

npm install @p-yiush07/scraper-engine

Prebuilt native binaries are included for macOS, Linux, and Windows.

Quick Start

const { Scraper } = require('@p-yiush07/scraper-engine');

const scraper = new Scraper();

// Auto-extract article content — no selectors needed
const content = await scraper.extractFromUrl('https://www.reuters.com/some-article');
console.log(content.headline);    // "Article Title"
console.log(content.paragraphs);  // ["First paragraph...", ...]
console.log(content.links);       // [{ text: "...", url: "..." }, ...]
console.log(content.images);      // [{ src: "...", alt: "..." }, ...]

API

new Scraper(config?)

Creates a scraper instance with optional configuration.

const scraper = new Scraper({
  browserEmulation: 'chrome',   // "chrome" | "firefox" | "safari" | "edge"
  maxRetries: 5,                // Retries on 401/403/429 with different browser profiles (default: 3)
  retryBaseDelayMs: 1000,       // Base delay for exponential backoff (default: 1000)
  timeoutMs: 30000,             // Request timeout in ms (default: 30000)
  rateLimitPerSecond: 2,        // Max requests/sec (default: unlimited)
  followRedirects: true,        // Follow redirects (default: true)
  maxRedirects: 10,             // Max redirect hops (default: 10)
  cookieStore: true,            // Enable cookie jar (default: true)
  proxy: {                      // Optional proxy
    url: 'socks5://127.0.0.1:1080',
    username: 'user',
    password: 'pass',
  },
  customHeaders: [              // Optional extra headers
    { name: 'X-Custom', value: 'value' },
  ],
  userAgents: [                 // Optional custom UA list (overrides built-in rotation)
    'Mozilla/5.0 ...',
  ],
});

scraper.extractFromUrl(url): Promise<ExtractedContent>

Fetches a URL and auto-extracts structured content. No CSS selectors are needed; the extractor is designed to work across most sites.

const content = await scraper.extractFromUrl('https://example.com/article');

Returns:

{
  title: string;           // Page title
  headline: string;        // Article headline (h1)
  description: string;     // Meta description
  paragraphs: string[];    // Article body paragraphs
  links: LinkInfo[];       // All links: { text, url }
  images: ImageInfo[];     // All images: { src, alt }
}

The extraction uses a multi-strategy cascade that handles React sites, semantic HTML, and common CMS patterns, falling back to content-density analysis when no strategy matches.
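The cascade idea can be illustrated with a simplified sketch. This is not the library's actual code (the real implementation works on a parsed DOM in Rust); the `strategies` list and `extractHeadline` helper here are hypothetical, regex-based stand-ins for one field:

```javascript
// Simplified sketch of a multi-strategy extraction cascade: try each
// strategy in priority order and return the first non-empty result.
const strategies = [
  // 1. Semantic HTML: the first <h1> on the page
  (html) => (html.match(/<h1[^>]*>([^<]+)<\/h1>/i) || [])[1],
  // 2. Open Graph metadata, common on CMS-generated pages
  (html) => (html.match(/property="og:title"\s+content="([^"]+)"/i) || [])[1],
  // 3. Last resort: the <title> tag
  (html) => (html.match(/<title[^>]*>([^<]+)<\/title>/i) || [])[1],
];

function extractHeadline(html) {
  for (const strategy of strategies) {
    const result = strategy(html);
    if (result) return result.trim();
  }
  return null; // no strategy matched
}
```

The same priority-ordered fallback applies to each field of `ExtractedContent`, which is why the extractor degrades gracefully on unusual markup instead of failing outright.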

scraper.fetch(url): Promise<ScrapeResponse>

Fetches a URL with full anti-bot measures and returns raw HTML.

const response = await scraper.fetch('https://example.com');
console.log(response.status); // 200
console.log(response.body);   // Raw HTML string
console.log(response.url);    // Final URL after redirects

scraper.scrape(url, selector): Promise<ParsedElement[]>

Fetches a URL and extracts elements matching a CSS selector.

const elements = await scraper.scrape('https://news.ycombinator.com', '.titleline > a');

for (const el of elements) {
  console.log(el.text);       // "Article title"
  console.log(el.tag);        // "a"
  console.log(el.html);       // Inner HTML
  console.log(el.attributes); // [{ name: "href", value: "https://..." }]
}

scraper.fetchMany(urls, concurrency?): Promise<ScrapeResponse[]>

Fetches multiple URLs concurrently with bounded concurrency.

const urls = ['https://example.com/1', 'https://example.com/2', 'https://example.com/3'];
const responses = await scraper.fetchMany(urls, 5); // 5 concurrent requests
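The bounded-concurrency behavior can be sketched with a generic worker-pool pattern. This is an illustration of the technique, not the package's internals; `mapWithConcurrency` is a hypothetical helper:

```javascript
// Run fn over items with at most `limit` tasks in flight at once,
// preserving input order in the results.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  // Start `limit` workers; each pulls the next unclaimed index until done.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}
```

Under this model, `scraper.fetchMany(urls, 5)` behaves like `mapWithConcurrency(urls, 5, (url) => scraper.fetch(url))`: five requests in flight at any moment, with responses returned in the same order as the input URLs.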

Scraper.parse(html, selector): ParsedElement[] (static)

Parses existing HTML with a CSS selector. No network request.

const elements = Scraper.parse(htmlString, 'div.product > h2');

Scraper.extract(html): ExtractedContent (static)

Auto-extracts structured content from existing HTML. No network request.

const content = Scraper.extract(htmlString);

How Anti-Bot Bypass Works

When a request gets blocked (401/403/429), the scraper automatically:

  1. Switches browser identity — rotates to a completely different browser's TLS fingerprint (e.g., Chrome → Firefox → Safari)
  2. Updates all headers — User-Agent, Sec-Ch-Ua, Accept, and other headers match the new browser so nothing looks suspicious
  3. Handles cookies — automatically stores and resends session cookies to solve challenge-response flows
  4. Backs off — waits with exponential backoff + jitter before retrying
  5. Repeats — tries up to maxRetries times with a different browser profile each attempt
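The retry loop above can be sketched in plain JavaScript. This is a conceptual model only (the library does this natively in Rust, at the TLS level rather than by header swaps); `fetchWithRotation`, `backoffDelayMs`, and the full-jitter strategy are assumptions for illustration:

```javascript
const PROFILES = ['chrome', 'firefox', 'safari', 'edge'];

// Exponential backoff with full jitter: the delay window doubles each
// attempt, and a uniform random point in the window is used.
function backoffDelayMs(attempt, baseDelayMs = 1000, maxDelayMs = 30000) {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.random() * ceiling;
}

async function fetchWithRotation(doFetch, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    const profile = PROFILES[attempt % PROFILES.length]; // step 1: new identity
    const res = await doFetch(profile);                  // steps 2-3 happen per profile
    if (![401, 403, 429].includes(res.status)) return res;
    if (attempt >= maxRetries) return res;               // give up, surface last response
    await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseDelayMs))); // step 4
  }                                                      // step 5: loop with next profile
}
```

Jitter matters here: without it, many blocked clients retry on the same schedule and hit the target in synchronized waves, which anti-bot systems detect easily.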

This works because many anti-bot systems identify scrapers by their TLS fingerprint (JA3/JA4) — the scraper emulates 100+ real browser fingerprints at the TLS level, not just the User-Agent string.

TypeScript Support

Full TypeScript definitions are included. All types are auto-generated:

import { Scraper, ScraperConfig, ExtractedContent, ScrapeResponse, ParsedElement } from '@p-yiush07/scraper-engine';

Supported Platforms

| Platform | Architecture          | Status    |
|----------|-----------------------|-----------|
| macOS    | ARM64 (Apple Silicon) | Available |
| macOS    | x86_64 (Intel)        | Available |
| Linux    | x86_64 (GNU)          | Available |
| Windows  | x86_64 (MSVC)         | Available |

Prebuilt binaries are included for all supported platforms.

Issues

Report bugs and request features at scraper-engine-issues.

License

MIT