@firekid/scraper

v1.0.1

The most advanced web scraping machine ever built

Firekid Scraper

Advanced web scraping framework built on Playwright with intelligent anti-detection, self-healing selectors, and distributed crawling capabilities.

GitHub: Firekid-is-him/firekid-scraper-sdk

Features

Anti-Detection & Stealth:

  • Ghost fingerprinting system spoofs canvas, WebGL, audio, fonts, and navigator properties
  • Cloudflare bypass with automatic fallback to manual solving
  • Behavioral profiles that mimic human interaction patterns
  • Network forensics cleaning removes tracking artifacts

Intelligent Automation:

  • Self-healing selectors with 7 fallback strategies
  • Pattern caching with SQLite storage for learned behaviors
  • Smart fetch with automatic referer chain management
  • Action recorder captures and replays user interactions

Distributed Crawling:

  • Queue-based task distribution across multiple workers
  • Browser worker pool with resource management
  • Rate limiting with configurable windows and thresholds
  • Session persistence and recovery

Developer Experience:

  • Simple command-based scripting language
  • Plugin system for extensibility
  • Multiple scraping modes: auto, manual, SSR, infinite scroll, pagination
  • Built-in scheduler for recurring tasks
  • Webhook notifications and database export

Installation

npm install @firekid/scraper
npx playwright install chromium

Global CLI Installation

npm install -g @firekid/scraper
firekid-scraper --help

Docker Installation

docker pull firekid/scraper:latest
docker run -v $(pwd)/data:/data firekid/scraper

Quick Start

Basic Scraping

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper({
  headless: true,
  bypassCloudflare: true
})

await scraper.init()

const data = await scraper.scrape('https://example.com', {
  selectors: {
    title: 'h1',
    content: '.article-body',
    author: '.author-name'
  }
})

console.log(data)
await scraper.close()

Command-Based Scripting

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper()
await scraper.init()

await scraper.runCommands(`
GOTO https://example.com
WAIT .product-list
EXTRACT .product-title text AS titles
EXTRACT .product-price text AS prices
SCREENSHOT products.png
`)

await scraper.close()

Auto Mode

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper()
await scraper.init()

const data = await scraper.auto('https://example.com/products', {
  depth: 2,
  extractPattern: 'product'
})

await scraper.close()

Core API

FirekidScraper

Main scraper class that orchestrates all operations.

Constructor Options:

new FirekidScraper({
  headless: boolean,           // Run browser in headless mode (default: true)
  bypassCloudflare: boolean,   // Enable Cloudflare bypass (default: false)
  useGhost: boolean,           // Enable fingerprint spoofing (default: true)
  browserArgs: string[],       // Additional Chromium arguments
  timeout: number,             // Default timeout in ms (default: 30000)
  userAgent: string,           // Custom user agent
  viewport: { width, height }, // Browser viewport size
  proxy: string,               // Proxy URL (http://user:pass@host:port)
  rateLimit: {                 // Rate limiting configuration
    enabled: boolean,
    max: number,
    window: number
  }
})

Methods:

await scraper.init(): Initialize browser and context.

await scraper.close(): Close browser and cleanup resources.

await scraper.goto(url): Navigate to URL with anti-detection measures.

await scraper.scrape(url, options): Extract data using CSS selectors.

Options:

  • selectors: Object mapping field names to CSS selectors
  • attribute: Extract attribute instead of text (default: text)
  • multiple: Return array of all matches (default: false)
  • screenshot: Take screenshot after extraction

await scraper.runCommands(script): Execute command-based script.

await scraper.auto(url, options): Automatically detect and extract data.

Options:

  • depth: Maximum crawl depth (default: 1)
  • extractPattern: Pattern hint (product, article, listing, etc.)
  • followLinks: Follow pagination/navigation links

await scraper.paginate(url, selector, options): Scrape paginated content.

Options:

  • maxPages: Maximum pages to scrape
  • waitBetween: Delay between pages in ms
  • nextSelector: Selector for next page button
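The paginate options above imply a loop that collects items page by page until either no next page exists or maxPages is reached. The sketch below models that loop without a browser, under stated assumptions: collectPages, PageResult and the getPage callback are hypothetical names (not @firekid/scraper APIs), and the waitBetween delay appears only as a comment so the example stays synchronous.

```typescript
// Hypothetical model of a paginated crawl; not the @firekid/scraper implementation.
interface PageResult<T> {
  items: T[]
  hasNext: boolean // true when a "next page" control was found
}

// Collects items page by page until there is no next page or maxPages is reached.
function collectPages<T>(
  getPage: (pageNumber: number) => PageResult<T>,
  maxPages: number
): T[] {
  const all: T[] = []
  for (let pageNumber = 1; pageNumber <= maxPages; pageNumber++) {
    const result = getPage(pageNumber)
    all.push(...result.items)
    if (!result.hasNext) break
    // A real scraper would click the next-page selector here and then
    // wait `waitBetween` ms before reading the following page.
  }
  return all
}

// Usage with a fake three-page site.
const fakeSite: PageResult<string>[] = [
  { items: ['a', 'b'], hasNext: true },
  { items: ['c', 'd'], hasNext: true },
  { items: ['e'], hasNext: false }
]
const firstTwo = collectPages((n) => fakeSite[n - 1], 2)
const everything = collectPages((n) => fakeSite[n - 1], 10)
console.log(firstTwo, everything)
```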

await scraper.infiniteScroll(url, options): Scrape infinite scroll pages.

Options:

  • maxScrolls: Maximum scroll iterations
  • itemSelector: Selector for items to extract
  • scrollDelay: Delay between scrolls in ms
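An infinite-scroll loop also needs a stop rule besides maxScrolls, because a feed can run out of items early. One common approach, sketched here as an assumption rather than the library's documented behavior, is to stop once the matched item count has not grown for a few consecutive scrolls. shouldStopScrolling and its patience parameter are hypothetical.

```typescript
// Hypothetical stop rule for an infinite-scroll loop (not the library's code).
// itemCounts[i] is the number of matched items after scroll i.
function shouldStopScrolling(
  itemCounts: number[],
  maxScrolls: number,
  patience = 2 // assumed: stop after this many scrolls with no new items
): boolean {
  if (itemCounts.length >= maxScrolls) return true
  if (itemCounts.length <= patience) return false
  // Compare the latest count with the count `patience` scrolls earlier.
  const latest = itemCounts[itemCounts.length - 1]
  const earlier = itemCounts[itemCounts.length - 1 - patience]
  return latest <= earlier
}

// Simulated feed: each scroll loads 10 more items, but only 30 exist.
const counts: number[] = []
let loaded = 0
while (!shouldStopScrolling(counts, 20)) {
  loaded = Math.min(loaded + 10, 30)
  counts.push(loaded)
}
console.log(counts) // [ 10, 20, 30, 30, 30 ]
```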

Plugin System

Extend functionality through plugins.

Loading Plugins:

const scraper = new FirekidScraper()
await scraper.loadPlugin('./plugins/custom-plugin.js')

Plugin Structure:

export default {
  name: 'custom-extractor',
  type: 'extractor',
  
  async execute(page, options) {
    const data = await page.evaluate(() => {
      return {
        title: document.title,
        meta: Array.from(document.querySelectorAll('meta'))
          .map(m => ({ name: m.name, content: m.content }))
      }
    })
    return data
  }
}

Plugin Types:

  • scraping: Custom scraping logic
  • action: Custom page actions
  • extractor: Data extraction methods
  • filter: Data filtering and validation
  • output: Custom output formats
  • parser: Data parsing and transformation
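In TypeScript, the plugin structure and the six plugin types above could be described roughly as follows. ScraperPlugin, PLUGIN_TYPES and isScraperPlugin are illustrative names, not the package's published typings, and page is typed as unknown where a real plugin would receive a Playwright Page.

```typescript
// Hypothetical typing of the plugin structure shown above; the real
// @firekid/scraper type definitions may differ.
const PLUGIN_TYPES = ['scraping', 'action', 'extractor', 'filter', 'output', 'parser'] as const
type PluginType = (typeof PLUGIN_TYPES)[number]

interface ScraperPlugin {
  name: string
  type: PluginType
  // `page` is unknown here; in practice it would be a Playwright Page.
  execute(page: unknown, options?: Record<string, unknown>): Promise<unknown>
}

// Runtime check a loader could apply before registering a plugin module.
function isScraperPlugin(value: unknown): value is ScraperPlugin {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as { name?: unknown; type?: unknown; execute?: unknown }
  return (
    typeof candidate.name === 'string' &&
    typeof candidate.type === 'string' &&
    (PLUGIN_TYPES as readonly string[]).indexOf(candidate.type) !== -1 &&
    typeof candidate.execute === 'function'
  )
}

const titlePlugin: ScraperPlugin = {
  name: 'title-only',
  type: 'extractor',
  async execute() {
    return { title: 'stub' }
  }
}

console.log(isScraperPlugin(titlePlugin)) // true
console.log(isScraperPlugin({ name: 'x', type: 'magic', execute() {} })) // false
```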

Distributed Scraping

Scale scraping across multiple workers.

import { DistributedEngine } from '@firekid/scraper'

const engine = new DistributedEngine({
  workers: 5,
  queueSize: 100,
  retries: 3
})

await engine.init()

engine.addTask({
  id: 'task-1',
  url: 'https://example.com',
  mode: 'scrape',
  options: {
    selectors: { title: 'h1' }
  },
  priority: 10
})

engine.on('taskComplete', (result) => {
  console.log('Task completed:', result)
})

engine.on('taskFailed', (error) => {
  console.error('Task failed:', error)
})

await engine.start()
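The priority and retries settings suggest a queue that runs higher-priority tasks first and re-queues failures a limited number of times. The in-memory model below is a hypothetical sketch of that behavior (TaskQueue and its methods are not part of @firekid/scraper); it leaves out workers, browsers and events.

```typescript
// Hypothetical in-memory model of a priority task queue with retries.
interface Task {
  id: string
  url: string
  priority: number // higher runs first
  attempts: number // failures so far
}

class TaskQueue {
  private tasks: Task[] = []

  constructor(private readonly maxRetries: number) {}

  add(id: string, url: string, priority: number): void {
    this.tasks.push({ id, url, priority, attempts: 0 })
    // Array sort is stable (ES2019+), so equal priorities keep insertion order.
    this.tasks.sort((a, b) => b.priority - a.priority)
  }

  next(): Task | undefined {
    return this.tasks.shift()
  }

  // Re-queues a failed task until it has been retried maxRetries times.
  // Returns false when the task is dropped for good.
  fail(task: Task): boolean {
    task.attempts++
    if (task.attempts > this.maxRetries) return false
    this.tasks.push(task)
    this.tasks.sort((a, b) => b.priority - a.priority)
    return true
  }

  get size(): number {
    return this.tasks.length
  }
}

const queue = new TaskQueue(3)
queue.add('low', 'https://example.com/a', 1)
queue.add('high', 'https://example.com/b', 10)
const first = queue.next()
console.log(first?.id) // high

// A task that always fails is retried 3 times, then dropped.
let retried = 0
const flaky = queue.next()!
while (queue.fail(flaky)) {
  retried++
  queue.next() // take it off the queue again for the next attempt
}
console.log(retried) // 3
```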

Command Reference

Commands use a simple, line-based syntax for browser automation: one command per line, with the command name first, followed by its arguments.

Navigation Commands

GOTO url: Navigate to URL. Example: GOTO https://example.com

BACK: Go back in history.

FORWARD: Go forward in history.

REFRESH: Reload current page.

Interaction Commands

CLICK selector: Click element. Example: CLICK button.submit

TYPE selector text: Type text into input. Example: TYPE input[name="search"] laptop

PRESS key: Press keyboard key. Example: PRESS Enter

SELECT selector value: Select dropdown option. Example: SELECT select[name="country"] US

CHECK selector: Check checkbox. Example: CHECK input[type="checkbox"]

UPLOAD selector filepath: Upload file. Example: UPLOAD input[type="file"] ./document.pdf

Wait Commands

WAIT selector: Wait for element to appear. Example: WAIT .product-list

WAITLOAD: Wait for page load.

Scroll Commands

SCROLL selector: Scroll element into view. Example: SCROLL .footer

SCROLLDOWN pixels: Scroll down by pixels. Example: SCROLLDOWN 500

Extraction Commands

SCAN: Analyze page structure.

EXTRACT selector type AS variable: Extract data. Types: text, html, attr:name, href, src. Example: EXTRACT h1 text AS title

SCREENSHOT filename: Take screenshot. Example: SCREENSHOT page.png
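The EXTRACT syntax can be parsed by reading its tokens from both ends: the last three tokens are the type, the AS keyword and the variable name, and everything between the command name and the type is the selector, which keeps descendant selectors such as .list .item intact. This is a sketch of one way to parse the documented syntax, not the library's parser; parseExtract and ExtractKind are hypothetical names.

```typescript
// Hypothetical parser for one line: EXTRACT <selector> <type> AS <variable>.
type ExtractKind =
  | { kind: 'text' | 'html' | 'href' | 'src' }
  | { kind: 'attr'; name: string }

interface ExtractCommand {
  selector: string
  type: ExtractKind
  variable: string
}

function parseExtract(line: string): ExtractCommand {
  const parts = line.trim().split(/\s+/)
  if (parts.length < 5 || parts[0] !== 'EXTRACT' || parts[parts.length - 2] !== 'AS') {
    throw new Error(`Malformed EXTRACT command: ${line}`)
  }
  const variable = parts[parts.length - 1]
  const rawType = parts[parts.length - 3]
  // Everything between EXTRACT and the type is the selector (may contain spaces).
  const selector = parts.slice(1, -3).join(' ')

  let type: ExtractKind
  if (rawType.indexOf('attr:') === 0) {
    type = { kind: 'attr', name: rawType.slice('attr:'.length) }
  } else if (rawType === 'text' || rawType === 'html' || rawType === 'href' || rawType === 'src') {
    type = { kind: rawType }
  } else {
    throw new Error(`Unknown extraction type: ${rawType}`)
  }
  return { selector, type, variable }
}

const cmd = parseExtract('EXTRACT a.product-link attr:data-id AS ids')
console.log(cmd.selector, cmd.type, cmd.variable)
```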

Advanced Commands

PAGINATE selector: Auto-paginate through results. Example: PAGINATE .next-page

INFINITESCROLL count: Scroll and load more items. Example: INFINITESCROLL 10

FETCH url: Fetch URL with smart referer. Example: FETCH https://api.example.com/data

DOWNLOAD url: Download file. Example: DOWNLOAD https://example.com/file.pdf

REFERER url: Set custom referer. Example: REFERER https://google.com

BYPASS_CLOUDFLARE: Attempt Cloudflare bypass.

Flow Control

REPEAT selector: Loop over matching elements.

REPEAT .product
  EXTRACT .title text AS titles
  EXTRACT .price text AS prices

IF selector: Conditional execution.

IF .login-button
  CLICK .login-button
  TYPE input[name="username"] admin

LOOP count: Repeat commands N times.

LOOP 5
  SCROLLDOWN 300
  WAIT 1000
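The flow-control examples indent the commands that belong to REPEAT, IF and LOOP. Assuming indentation is what defines a block (the reference above does not state the exact rule), a script can be turned into a tree with a stack of open blocks. parseScript and CommandNode are hypothetical names used only for this sketch.

```typescript
// Hypothetical sketch: turn an indented command script into a tree, assuming
// REPEAT, IF and LOOP own the more-indented lines directly beneath them.
interface CommandNode {
  name: string // e.g. 'LOOP', 'SCROLLDOWN'
  args: string[] // remaining whitespace-separated tokens
  children: CommandNode[]
}

const BLOCK_COMMANDS = ['REPEAT', 'IF', 'LOOP']

function parseScript(script: string): CommandNode[] {
  const root: CommandNode[] = []
  // Stack of open blocks together with the indentation of their header line.
  const stack: { indent: number; node: CommandNode }[] = []

  for (const rawLine of script.split('\n')) {
    if (rawLine.trim() === '') continue
    const indent = rawLine.length - rawLine.replace(/^\s+/, '').length
    const tokens = rawLine.trim().split(/\s+/)
    const node: CommandNode = { name: tokens[0].toUpperCase(), args: tokens.slice(1), children: [] }

    // Close any blocks this line is not indented under.
    while (stack.length > 0 && indent <= stack[stack.length - 1].indent) stack.pop()

    const parent = stack.length > 0 ? stack[stack.length - 1].node.children : root
    parent.push(node)
    if (BLOCK_COMMANDS.indexOf(node.name) !== -1) stack.push({ indent, node })
  }
  return root
}

const tree = parseScript(`
GOTO https://example.com
LOOP 5
  SCROLLDOWN 300
  WAIT 1000
SCREENSHOT page.png
`)
console.log(tree.map((n) => `${n.name}(${n.children.length})`).join(' '))
// GOTO(0) LOOP(2) SCREENSHOT(0)
```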

Configuration

Environment Variables

  • HEADLESS: Run in headless mode (true/false)
  • MAX_QUEUE_WORKERS: Maximum concurrent workers (number)
  • BROWSER_TIMEOUT: Browser timeout in ms (number)
  • CF_BYPASS: Cloudflare bypass mode (auto/manual/skip)
  • TURNSTILE_SOLVER: Turnstile solver (skip/manual/2captcha/capsolver)
  • CAPTCHA_API_KEY: API key for captcha solver
  • API_ENABLED: Enable web API (true/false)
  • API_PORT: API server port (number)
  • API_KEY: API authentication key
  • PROXY_ENABLED: Enable proxy (true/false)
  • PROXY_URL: Proxy URL
  • DATA_DIR: Data storage directory
  • PATTERNS_DB: Pattern cache database path
  • SESSIONS_DB: Session storage database path
  • LOG_LEVEL: Logging level (error/warn/info/debug)
  • RECORD_SCREENSHOTS: Record screenshots (true/false)
  • RATE_LIMIT_ENABLED: Enable rate limiting (true/false)
  • RATE_LIMIT_MAX: Max requests per window (number)
  • RATE_LIMIT_WINDOW: Rate limit window in ms (number)

Configuration File

Create a .env file in the project root:

HEADLESS=true
MAX_QUEUE_WORKERS=5
BROWSER_TIMEOUT=30000
CF_BYPASS=auto
LOG_LEVEL=info
RATE_LIMIT_ENABLED=true
RATE_LIMIT_MAX=100
RATE_LIMIT_WINDOW=3600000
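The variables above arrive as strings, so a loader has to convert them to booleans, numbers and allowed values. The sketch below covers a subset of them; loadConfig and its helpers are hypothetical, and the fallback values are assumptions apart from the documented headless default and 30000 ms timeout. Note that RATE_LIMIT_WINDOW=3600000 is one hour in milliseconds, so the example .env allows at most 100 requests per hour.

```typescript
// Hypothetical loader for some of the environment variables listed above.
type EnvMap = { [key: string]: string | undefined }

interface EnvConfig {
  headless: boolean
  maxQueueWorkers: number
  browserTimeout: number
  cfBypass: 'auto' | 'manual' | 'skip'
  logLevel: 'error' | 'warn' | 'info' | 'debug'
  rateLimit: { enabled: boolean; max: number; windowMs: number }
}

function bool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback
  return value.trim().toLowerCase() === 'true'
}

function int(value: string | undefined, fallback: number): number {
  const parsed = value === undefined ? NaN : parseInt(value, 10)
  return isNaN(parsed) ? fallback : parsed
}

// Accepts only one of the allowed values; anything else falls back.
function oneOf<T extends string>(value: string | undefined, allowed: readonly T[], fallback: T): T {
  return value !== undefined && (allowed as readonly string[]).indexOf(value) !== -1 ? (value as T) : fallback
}

function loadConfig(env: EnvMap): EnvConfig {
  return {
    headless: bool(env.HEADLESS, true),
    maxQueueWorkers: int(env.MAX_QUEUE_WORKERS, 5), // assumed fallback
    browserTimeout: int(env.BROWSER_TIMEOUT, 30000),
    cfBypass: oneOf(env.CF_BYPASS, ['auto', 'manual', 'skip'] as const, 'auto'), // assumed fallback
    logLevel: oneOf(env.LOG_LEVEL, ['error', 'warn', 'info', 'debug'] as const, 'info'), // assumed fallback
    rateLimit: {
      enabled: bool(env.RATE_LIMIT_ENABLED, false), // assumed fallback
      max: int(env.RATE_LIMIT_MAX, 100), // assumed fallback
      windowMs: int(env.RATE_LIMIT_WINDOW, 3600000) // assumed fallback
    }
  }
}

// Values from the .env example above; in Node you would pass process.env.
const config = loadConfig({
  HEADLESS: 'true',
  MAX_QUEUE_WORKERS: '5',
  BROWSER_TIMEOUT: '30000',
  CF_BYPASS: 'auto',
  LOG_LEVEL: 'info',
  RATE_LIMIT_ENABLED: 'true',
  RATE_LIMIT_MAX: '100',
  RATE_LIMIT_WINDOW: '3600000'
})
console.log(config.rateLimit) // { enabled: true, max: 100, windowMs: 3600000 }
```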

Advanced Usage

Custom Behavioral Profiles

const scraper = new FirekidScraper()
await scraper.init()

await scraper.setProfile('human')

await scraper.goto('https://example.com')

Available profiles:

  • fast: 30-60ms delays, minimal randomization
  • normal: 80-120ms delays, moderate randomization
  • careful: 120-180ms delays, high randomization
  • human: 50-150ms delays, natural patterns
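A profile can be thought of as a delay range sampled before each simulated input event. This sketch uses the ranges listed above; the library's actual sampling and what it means by "natural patterns" are not documented, so drawing uniformly for most profiles and using a triangular draw (the mean of two uniform samples) for human are assumptions. nextDelay and PROFILE_RANGES are hypothetical names.

```typescript
// Hypothetical per-profile delay sampling using the documented ranges (ms).
type ProfileName = 'fast' | 'normal' | 'careful' | 'human'

const PROFILE_RANGES: { [P in ProfileName]: [number, number] } = {
  fast: [30, 60],
  normal: [80, 120],
  careful: [120, 180],
  human: [50, 150]
}

// `random` returns a value in [0, 1); injectable so results can be tested.
function nextDelay(profile: ProfileName, random: () => number = Math.random): number {
  const [min, max] = PROFILE_RANGES[profile]
  // Assumed: uniform for most profiles, triangular (peaking mid-range) for
  // `human`, so middle values are more common than the extremes.
  const t = profile === 'human' ? (random() + random()) / 2 : random()
  return Math.round(min + t * (max - min))
}

// Delays a typing routine might wait between keystrokes.
const delays = 'hello'.split('').map(() => nextDelay('human'))
console.log(delays)
```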

Pattern Learning

const scraper = new FirekidScraper()
await scraper.init()

await scraper.goto('https://example.com/products')

const pattern = await scraper.learnPattern('product', {
  containerSelector: '.product-card',
  fields: ['title', 'price', 'image']
})

const products = await scraper.applyPattern('product')

Self-Healing Selectors

const scraper = new FirekidScraper()
await scraper.init()

const healer = scraper.getHealer()

const element = await healer.find('.old-selector', {
  strategies: ['id', 'className', 'text', 'position'],
  savePattern: true
})
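The idea behind selector healing is to remember facts about the element from the last successful match (its id, class, text and position) and, when the selector breaks, look for a candidate that matches one of those facts. The sketch below illustrates this with mocked element data; heal, ElementInfo and the four strategies shown are assumptions, since the library's seven strategies are not described here. Requiring a unique match per strategy avoids silently picking the wrong element when several candidates qualify.

```typescript
// Hypothetical selector-healing sketch over mocked element data.
interface ElementInfo {
  id: string
  className: string
  text: string
  position: number // index among candidate elements
}

type Strategy = 'id' | 'className' | 'text' | 'position'

const matchers: { [S in Strategy]: (saved: ElementInfo, el: ElementInfo) => boolean } = {
  id: (saved, el) => saved.id !== '' && el.id === saved.id,
  className: (saved, el) => saved.className !== '' && el.className === saved.className,
  text: (saved, el) => saved.text.trim() !== '' && el.text.trim() === saved.text.trim(),
  position: (saved, el) => el.position === saved.position
}

// Tries strategies in order and returns the first unique match, or null.
function heal(
  saved: ElementInfo,
  candidates: ElementInfo[],
  strategies: Strategy[]
): { element: ElementInfo; strategy: Strategy } | null {
  for (const strategy of strategies) {
    const hits = candidates.filter((el) => matchers[strategy](saved, el))
    if (hits.length === 1) return { element: hits[0], strategy }
  }
  return null
}

// The button's class changed from old-selector to btn-primary; its text did not.
const saved: ElementInfo = { id: '', className: 'old-selector', text: 'Add to cart', position: 2 }
const page: ElementInfo[] = [
  { id: 'search', className: 'btn', text: 'Search', position: 0 },
  { id: '', className: 'btn', text: 'Wishlist', position: 1 },
  { id: '', className: 'btn-primary', text: 'Add to cart', position: 2 }
]
const healed = heal(saved, page, ['id', 'className', 'text', 'position'])
console.log(healed?.strategy) // text
```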

Webhook Integration

const scraper = new FirekidScraper({
  webhook: {
    url: 'https://your-api.com/webhook',
    events: ['scrapeComplete', 'error']
  }
})

await scraper.init()
await scraper.scrape('https://example.com')

Database Export

const scraper = new FirekidScraper()
await scraper.init()

const data = await scraper.scrape('https://example.com', {
  selectors: { title: 'h1' }
})

await scraper.exportToDatabase(data, {
  type: 'postgresql',
  connection: {
    host: 'localhost',
    database: 'scraping',
    user: 'user',
    password: 'pass'
  },
  table: 'products'
})

Scheduled Tasks

import { FirekidScraper, TaskScheduler } from '@firekid/scraper'

const scheduler = new TaskScheduler()

scheduler.schedule('daily-scrape', '0 0 * * *', async () => {
  const scraper = new FirekidScraper()
  await scraper.init()
  await scraper.scrape('https://example.com')
  await scraper.close()
})
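The schedule string '0 0 * * *' is a five-field cron expression (minute, hour, day of month, month, day of week), meaning every day at 00:00. The minimal matcher below shows how such an expression is evaluated against a date; it is a sketch, not TaskScheduler's implementation, and it supports only *, single numbers and comma-separated lists (no ranges or steps).

```typescript
// Minimal five-field cron matcher: minute hour day-of-month month day-of-week.
// Supports '*', single numbers and comma lists only; a sketch, not TaskScheduler.
function fieldMatches(field: string, value: number): boolean {
  if (field === '*') return true
  return field.split(',').some((part) => parseInt(part, 10) === value)
}

function cronMatches(expression: string, date: Date): boolean {
  const fields = expression.trim().split(/\s+/)
  if (fields.length !== 5) throw new Error(`Expected 5 fields, got ${fields.length}`)
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields
  const timeOk =
    fieldMatches(minute, date.getMinutes()) &&
    fieldMatches(hour, date.getHours()) &&
    fieldMatches(month, date.getMonth() + 1) // cron months are 1-12
  const domOk = fieldMatches(dayOfMonth, date.getDate())
  const dowOk = fieldMatches(dayOfWeek, date.getDay()) // 0 = Sunday
  // Standard cron rule: if both day fields are restricted, either may match.
  const dayOk = dayOfMonth !== '*' && dayOfWeek !== '*' ? domOk || dowOk : domOk && dowOk
  return timeOk && dayOk
}

console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 0, 0))) // true
console.log(cronMatches('0 0 * * *', new Date(2024, 0, 15, 12, 0))) // false
```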

Examples

Product Scraper

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper({ headless: true })
await scraper.init()

const products = await scraper.paginate('https://store.example.com/products', '.next-page', {
  maxPages: 10,
  selectors: {
    title: '.product-title',
    price: '.product-price',
    image: 'img.product-image',
    rating: '.product-rating'
  }
})

await scraper.export(products, 'json', './products.json')
await scraper.close()

Login and Scrape

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper()
await scraper.init()

await scraper.runCommands(`
GOTO https://example.com/login
TYPE input[name="username"] myuser
TYPE input[name="password"] mypass
CLICK button[type="submit"]
WAITLOAD
GOTO https://example.com/dashboard
EXTRACT .data-table text AS tableData
`)

await scraper.close()

Infinite Scroll

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper()
await scraper.init()

const items = await scraper.infiniteScroll('https://example.com/feed', {
  maxScrolls: 20,
  itemSelector: '.feed-item',
  scrollDelay: 1000,
  extractFields: {
    content: '.feed-content',
    author: '.feed-author',
    timestamp: '.feed-time'
  }
})

await scraper.close()

API Hunting

import { APIHunter } from '@firekid/scraper'

const hunter = new APIHunter()
await hunter.init()

const apis = await hunter.hunt('https://example.com', {
  captureXHR: true,
  captureFetch: true,
  captureWebSocket: true
})

console.log('Discovered APIs:', apis)
await hunter.close()

Video Download

import { FirekidScraper } from '@firekid/scraper'

const scraper = new FirekidScraper()
await scraper.init()

await scraper.runCommands(`
GOTO https://video-site.com/video/123
WAIT video
BYPASS_CLOUDFLARE
DOWNLOAD https://cdn.video-site.com/videos/file.mp4
`)

await scraper.close()

TypeScript Support

Full TypeScript definitions included.

import { FirekidScraper, ScraperOptions, ScrapeResult } from '@firekid/scraper'

interface Product {
  title: string
  price: number
  image: string
}

const scraper = new FirekidScraper({
  headless: true,
  bypassCloudflare: true
})

await scraper.init()

const result: ScrapeResult<Product> = await scraper.scrape('https://example.com', {
  selectors: {
    title: 'h1.product-title',
    price: '.price',
    image: 'img.main'
  }
})

await scraper.close()
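The Product interface declares price as a number, while a CSS selector extracts text such as "$1,299.99". Whether ScrapeResult converts types is not documented here, so a post-processing step like the one below may be needed. parsePrice and toProduct are hypothetical helpers, and the parser assumes "." as the decimal separator and "," as a thousands separator.

```typescript
// Hypothetical post-processing: convert scraped strings into the Product shape.
interface Product {
  title: string
  price: number
  image: string
}

// Parses prices such as "$1,299.99" or "USD 45"; returns null if no number is found.
// Assumes '.' is the decimal separator; other locales need different rules.
function parsePrice(raw: string): number | null {
  const cleaned = raw.replace(/[^0-9.\-]/g, '')
  if (cleaned === '') return null
  const value = parseFloat(cleaned)
  return isNaN(value) ? null : value
}

function toProduct(raw: { title: string; price: string; image: string }): Product | null {
  const price = parsePrice(raw.price)
  if (price === null) return null // skip rows whose price could not be read
  return { title: raw.title.trim(), price, image: raw.image }
}

console.log(parsePrice('$1,299.99')) // 1299.99
console.log(toProduct({ title: ' Laptop ', price: 'Sold out', image: 'a.jpg' })) // null
```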

Docker Usage

Using Docker Compose

version: '3.8'
services:
  scraper:
    image: firekid/scraper:latest
    volumes:
      - ./data:/data
      - ./output:/output
    environment:
      - HEADLESS=true
      - LOG_LEVEL=info
    command: firekid-scraper run ./scripts/scrape.cmd

Custom Dockerfile

FROM firekid/scraper:latest

COPY ./scripts /app/scripts
COPY ./plugins /app/plugins

WORKDIR /app
CMD ["firekid-scraper", "run", "./scripts/main.cmd"]

Performance Optimization

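Browser Launch Arguments

Pass extra Chromium flags through browserArgs. --disable-dev-shm-usage makes Chromium write shared-memory files to /tmp instead of /dev/shm, which is often too small in Docker containers, and --disable-gpu turns off GPU hardware acceleration. --no-sandbox and --disable-setuid-sandbox disable Chromium's sandbox, so use them only in trusted, isolated environments such as containers.
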
const scraper = new FirekidScraper({
  browserArgs: [
    '--disable-dev-shm-usage',
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-gpu'
  ]
})

Resource Blocking

await scraper.optimizeRequests({
  blockImages: true,
  blockFonts: true,
  blockMedia: true
})
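Request blocking like this is usually decided per request from its resource type. The sketch below shows that decision as a pure function, assuming optimizeRequests maps its flags onto Playwright's resource types image, font and media; shouldBlock and BlockOptions are hypothetical names. The comment shows how the same rule could be wired up with Playwright's own page.route API.

```typescript
// Hypothetical sketch of the per-request decision behind resource blocking.
interface BlockOptions {
  blockImages?: boolean
  blockFonts?: boolean
  blockMedia?: boolean
}

function shouldBlock(resourceType: string, opts: BlockOptions): boolean {
  switch (resourceType) {
    case 'image':
      return opts.blockImages === true
    case 'font':
      return opts.blockFonts === true
    case 'media': // audio and video
      return opts.blockMedia === true
    default:
      return false // documents, scripts, XHR/fetch and the rest always load
  }
}

// With plain Playwright, the same rule could be applied like this:
//   await page.route('**/*', (route) =>
//     shouldBlock(route.request().resourceType(), opts) ? route.abort() : route.continue())

const opts: BlockOptions = { blockImages: true, blockFonts: true, blockMedia: true }
console.log(['document', 'image', 'font', 'script'].map((t) => shouldBlock(t, opts)))
// [ false, true, true, false ]
```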

Parallel Scraping

import { DistributedEngine } from '@firekid/scraper'

const engine = new DistributedEngine({ workers: 10 })
await engine.init()

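const urls = [
  'https://example.com/page/1',
  'https://example.com/page/2',
  'https://example.com/page/3'
]
urls.forEach((url, i) => {
  engine.addTask({
    id: `task-${i}`,
    url,
    mode: 'scrape',
    options: {
      selectors: { title: 'h1' }
    },
    priority: 10
  })
})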

await engine.start()

Troubleshooting

Cloudflare Challenges

If automatic bypass fails, enable manual solving:

const scraper = new FirekidScraper({
  bypassCloudflare: true,
  cloudflareMode: 'manual'
})

The browser will open in headed mode for manual solving.

Memory Issues

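Reduce memory usage by limiting concurrent workers and using a shorter timeout (for distributed runs, see also the workers option of DistributedEngine and the MAX_QUEUE_WORKERS environment variable):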

const scraper = new FirekidScraper({
  maxWorkers: 3,
  timeout: 15000
})

Rate Limiting

Throttle requests with a rate limit. With the settings below, at most 10 requests (max) are allowed per 60000 ms (one minute) window:

const scraper = new FirekidScraper({
  rateLimit: {
    enabled: true,
    max: 10,
    window: 60000
  }
})
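A { max, window } setting can be enforced with a sliding-window limiter that remembers recent request times. The class below is an illustrative sketch (the library may use a different algorithm, such as fixed windows); SlidingWindowLimiter, tryAcquire and waitTime are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to test.

```typescript
// Sliding-window limiter: at most `max` requests in any `windowMs` span.
// Illustrative only; not the @firekid/scraper implementation.
class SlidingWindowLimiter {
  private timestamps: number[] = []

  constructor(private readonly max: number, private readonly windowMs: number) {}

  // Forget requests that have fallen out of the window ending at `now`.
  private prune(now: number): void {
    while (this.timestamps.length > 0 && this.timestamps[0] <= now - this.windowMs) {
      this.timestamps.shift()
    }
  }

  // Records the request and returns true if it fits in the current window.
  tryAcquire(now: number = Date.now()): boolean {
    this.prune(now)
    if (this.timestamps.length >= this.max) return false
    this.timestamps.push(now)
    return true
  }

  // Milliseconds until the next request would be allowed (0 if allowed now).
  waitTime(now: number = Date.now()): number {
    this.prune(now)
    if (this.timestamps.length < this.max) return 0
    return this.timestamps[0] + this.windowMs - now
  }
}

// Same limits as the config above: 10 requests per 60000 ms.
const limiter = new SlidingWindowLimiter(10, 60000)
let allowed = 0
for (let i = 0; i < 12; i++) {
  if (limiter.tryAcquire(i * 1000)) allowed++ // one attempt per second
}
console.log(allowed) // 10
console.log(limiter.waitTime(12000)) // 48000
```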

Selector Not Found

Enable self-healing selectors:

const element = await scraper.healSelector('.old-selector', {
  savePattern: true,
  strategies: ['id', 'className', 'text']
})

Contributing

Contributions are welcome. Please read the contributing guidelines before submitting pull requests.

License

MIT License. See LICENSE file for details.

Support

For issues and questions:

  • GitHub Repository: https://github.com/Firekid-is-him/firekid-scraper-sdk
  • GitHub Issues: Report bugs and request features
  • Documentation: Complete guides in the docs folder
  • Examples: Sample scripts in the examples folder

Changelog

See CHANGELOG.md for version history and release notes.