@rafikidota/scoutee

v0.19.1

🕵️ @rafikidota/scoutee

"Sometimes, the best way to solve your own problems is to help someone else."

Scoutee is a NestJS library that wraps Crawlee crawlers into injectable, environment-driven modules. It gives you production-ready HttpCrawler, CheerioCrawler, PlaywrightCrawler, and stealth Camoufox crawlers — all wired up with pre/post navigation hooks, structured logging, and full ConfigService integration out of the box.


📦 Installation

pnpm add @rafikidota/scoutee

Peer dependencies

Install the crawlers you actually need:

# HTTP / Cheerio (lightweight)
pnpm add crawlee

# Playwright (full browser)
pnpm add crawlee @crawlee/playwright playwright

# Camoufox (stealth browser — anti-bot fingerprint spoofing)
pnpm add crawlee @crawlee/playwright playwright camoufox-js

Scoutee also requires a NestJS application context:

pnpm add @nestjs/common @nestjs/core @nestjs/config

🗂️ Package exports

Each crawler ships as a separate entry point so you only bundle what you use:

| Import path | What you get |
|---|---|
| @rafikidota/scoutee | All four modules |
| @rafikidota/scoutee/http | HttpModule + HttpService |
| @rafikidota/scoutee/cheerio | CheerioModule + CheerioService |
| @rafikidota/scoutee/playwright | PlaywrightModule + PlaywrightService |
| @rafikidota/scoutee/camoufox | CamoufoxModule + CamoufoxService |


🚀 Quick start

1. Register the module

Import only the modules you need. Each one is self-contained.

// app.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { PlaywrightModule } from '@rafikidota/scoutee/playwright';

@Module({
  imports: [
    ConfigModule.forRoot({ isGlobal: true }),
    PlaywrightModule,
  ],
})
export class AppModule {}

2. Inject the service and create a crawler

import { Injectable } from '@nestjs/common';
import { PlaywrightService } from '@rafikidota/scoutee/playwright';
import { Dataset } from 'crawlee';

@Injectable()
export class ScraperService {
  constructor(private readonly playwright: PlaywrightService) {}

  async run() {
    const crawler = this.playwright.create({
      async requestHandler({ page, request }) {
        const title = await page.title();
        await Dataset.pushData({ url: request.url, title });
      },
    });

    await crawler.run(['https://example.com']);
  }
}

🧩 Modules

🌐 HttpModule

Thin wrapper around Crawlee's HttpCrawler. Best for raw HTTP requests without a browser.

import { HttpModule, HttpService } from '@rafikidota/scoutee/http';
// or from '@rafikidota/scoutee'

Service method:

// Signature: create(options: HttpCrawlerOptions): HttpCrawler
const crawler = httpService.create(options);

Environment variables:

| Variable | Description |
|---|---|
| CRAWLEE_HTTP_MAX_CONCURRENCY | Maximum parallel requests |
| CRAWLEE_HTTP_MIN_CONCURRENCY | Minimum parallel requests |
| CRAWLEE_HTTP_MAX_REQUEST_RETRIES | Retry count per request |
| CRAWLEE_HTTP_TIMEOUT_SECS | Request handler timeout (seconds) |
| CRAWLEE_HTTP_MAX_REQUESTS | Total request cap per run |
| CRAWLEE_HTTP_INITIAL_PAGE | Starting page number |
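
The variables above line up naturally with option names that Crawlee's HttpCrawlerOptions already defines (maxConcurrency, minConcurrency, maxRequestRetries, requestHandlerTimeoutSecs, maxRequestsPerCrawl). The following is a minimal sketch of such a mapping, not Scoutee's actual implementation: the pairing of each variable with an option is an assumption, and readHttpLimits, toInt, Env, and HttpLimits are hypothetical names. CRAWLEE_HTTP_INITIAL_PAGE is left out because the README does not say how it is consumed.

```typescript
// Hypothetical helper, not part of Scoutee's public API.
type Env = Record<string, string | undefined>;

// Subset of Crawlee option names that the HTTP variables plausibly feed (assumed mapping).
interface HttpLimits {
  maxConcurrency?: number;
  minConcurrency?: number;
  maxRequestRetries?: number;
  requestHandlerTimeoutSecs?: number;
  maxRequestsPerCrawl?: number;
}

// Parse a whole number; return undefined when the value is unset or not an integer.
function toInt(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;
  const n = Number(raw);
  return Number.isInteger(n) ? n : undefined;
}

function readHttpLimits(env: Env): HttpLimits {
  const limits: HttpLimits = {};
  // Only set keys that parsed cleanly, so unset variables never shadow defaults.
  const set = (key: keyof HttpLimits, raw: string | undefined): void => {
    const value = toInt(raw);
    if (value !== undefined) limits[key] = value;
  };
  set('maxConcurrency', env['CRAWLEE_HTTP_MAX_CONCURRENCY']);
  set('minConcurrency', env['CRAWLEE_HTTP_MIN_CONCURRENCY']);
  set('maxRequestRetries', env['CRAWLEE_HTTP_MAX_REQUEST_RETRIES']);
  set('requestHandlerTimeoutSecs', env['CRAWLEE_HTTP_TIMEOUT_SECS']);
  set('maxRequestsPerCrawl', env['CRAWLEE_HTTP_MAX_REQUESTS']);
  return limits;
}
```

Calling readHttpLimits(process.env) in a Node.js process would yield an object that could be spread into crawler options. Invalid or missing values are skipped rather than coerced, which keeps a typo in the environment from silently becoming NaN.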


🍋 CheerioModule

Wrapper around Crawlee's CheerioCrawler. Automatically parses HTML with Cheerio — ideal for static or server-rendered pages.

import { CheerioModule, CheerioService } from '@rafikidota/scoutee/cheerio';

Service method:

// Signature: create(options: CheerioCrawlerOptions): CheerioCrawler
const crawler = cheerioService.create(options);

Environment variables:

| Variable | Description |
|---|---|
| CRAWLEE_CHEERIO_MAX_CONCURRENCY | Maximum parallel requests |
| CRAWLEE_CHEERIO_MIN_CONCURRENCY | Minimum parallel requests |
| CRAWLEE_CHEERIO_MAX_REQUEST_RETRIES | Retry count per request |
| CRAWLEE_CHEERIO_TIMEOUT_SECS | Request handler timeout (seconds) |
| CRAWLEE_CHEERIO_MAX_REQUESTS | Total request cap per run |
| CRAWLEE_CHEERIO_INITIAL_PAGE | Starting page number |


🎭 PlaywrightModule

Full browser automation via Crawlee's PlaywrightCrawler. Supports Chromium, Firefox, and WebKit with session pooling, fingerprinting, and built-in Cloudflare challenge handling.

import { PlaywrightModule, PlaywrightService } from '@rafikidota/scoutee/playwright';

Service methods:

// Create a crawler instance
// Signature: create(options: PlaywrightCrawlerOptions): PlaywrightCrawler
const crawler = playwrightService.create(options);

// Get a raw browser instance
const browser = await playwrightService.getBrowser()

Environment variables:

| Variable | Description |
|---|---|
| CRAWLEE_PLAYWRIGHT_BROWSER | Browser engine: chromium, firefox, or webkit |
| CRAWLEE_PLAYWRIGHT_MAX_CONCURRENCY | Maximum parallel browser pages |
| CRAWLEE_PLAYWRIGHT_MIN_CONCURRENCY | Minimum parallel browser pages |
| CRAWLEE_PLAYWRIGHT_MAX_REQUEST_RETRIES | Retry count per request |
| CRAWLEE_PLAYWRIGHT_TIMEOUT_SECS | Navigation and handler timeout (seconds) |
| CRAWLEE_PLAYWRIGHT_MAX_REQUESTS | Total request cap per run |
| CRAWLEE_PLAYWRIGHT_INITIAL_PAGE | Starting page number |
| CRAWLEE_PLAYWRIGHT_HEADLESS | Run browser headless (true or false) |
| CRAWLEE_PLAYWRIGHT_USE_INCOGNITO_PAGES | Use incognito context (true or false) |
| CRAWLEE_PLAYWRIGHT_HANDLE_CLOUDFLARE_CHALLENGE | Auto-solve Cloudflare challenges (true or false) |

Browser types (BrowserType enum):

import { BrowserType } from '@rafikidota/scoutee/playwright';

BrowserType.CHROMIUM  // 'chromium'
BrowserType.FIREFOX   // 'firefox'
BrowserType.WEBKIT    // 'webkit'

🦊 CamoufoxModule

Stealth browser powered by Camoufox — a hardened Firefox fork designed to bypass bot detection. Uses PlaywrightCrawler under the hood with fingerprint spoofing, GeoIP emulation, WebRTC blocking, and human-like behavior simulation.

import { CamoufoxModule, CamoufoxService } from '@rafikidota/scoutee/camoufox';

Service methods:

// Create a stealth crawler instance (async, unlike the other services)
// Signature: create(options: PlaywrightCrawlerOptions): Promise<PlaywrightCrawler>
const crawler = await camoufoxService.create(options);

// Get a raw Camoufox browser instance
const browser = await camoufoxService.getBrowser()

Environment variables:

| Variable | Description |
|---|---|
| CRAWLEE_CAMOUFOX_MAX_CONCURRENCY | Maximum parallel browser pages |
| CRAWLEE_CAMOUFOX_MIN_CONCURRENCY | Minimum parallel browser pages |
| CRAWLEE_CAMOUFOX_MAX_REQUEST_RETRIES | Retry count per request |
| CRAWLEE_CAMOUFOX_TIMEOUT_SECS | Navigation and handler timeout (seconds) |
| CRAWLEE_CAMOUFOX_MAX_REQUESTS | Total request cap per run |
| CRAWLEE_CAMOUFOX_INITIAL_PAGE | Starting page number |
| CRAWLEE_CAMOUFOX_HEADLESS | Run browser headless (true or false) |
| CRAWLEE_CAMOUFOX_EXECUTABLE_PATH | Custom Camoufox binary path (optional) |
| CRAWLEE_CAMOUFOX_HANDLE_CLOUDFLARE_CHALLENGE | Auto-solve Cloudflare challenges (true or false) |
| CRAWLEE_CAMOUFOX_USE_INCOGNITO_PAGES | Use incognito context (true or false) |
| CRAWLEE_CAMOUFOX_GEOIP | Enable GeoIP emulation (true or false) |
| CRAWLEE_CAMOUFOX_OS | Spoof OS fingerprint: windows, macos, or linux |
| CRAWLEE_CAMOUFOX_BLOCK_WEBRTC | Block WebRTC leaks (true or false) |
| CRAWLEE_CAMOUFOX_HUMANIZE | Human-like mouse delay multiplier (number) |
| CRAWLEE_CAMOUFOX_BLOCK_IMAGES | Block image loading for speed (true or false) |
| CRAWLEE_CAMOUFOX_ENABLE_CACHE | Enable browser cache (true or false) |

OS spoof options (CamoufoxOS enum):

import { CamoufoxOS } from '@rafikidota/scoutee/camoufox';

CamoufoxOS.WINDOWS  // 'windows'
CamoufoxOS.MACOS    // 'macos'
CamoufoxOS.LINUX    // 'linux'
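
Environment values always arrive as strings, so a flag set to false is still a truthy string in JavaScript, and a misspelled OS name would otherwise pass through unnoticed. The sketch below shows one way to validate the boolean and OS settings before the application boots. It is an illustration only: how Scoutee itself parses these values is not documented here, parseBool and parseOs are hypothetical helpers, and CAMOUFOX_OS_VALUES is a local stand-in that mirrors the documented CamoufoxOS values rather than the library's enum.

```typescript
// Local stand-in for the documented CamoufoxOS values (assumption: the enum's
// string values are exactly these). In an app, import the enum from Scoutee instead.
const CAMOUFOX_OS_VALUES = ['windows', 'macos', 'linux'] as const;
type CamoufoxOsValue = (typeof CAMOUFOX_OS_VALUES)[number];

// Hypothetical helper: accept only "true" or "false" (case-insensitive).
function parseBool(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined) return fallback;
  const value = raw.trim().toLowerCase();
  if (value === 'true') return true;
  if (value === 'false') return false;
  throw new Error(`Expected "true" or "false", got "${raw}"`);
}

// Hypothetical helper: return a known OS value, or undefined when unset.
function parseOs(raw: string | undefined): CamoufoxOsValue | undefined {
  if (raw === undefined) return undefined;
  const value = raw.trim().toLowerCase();
  const match = CAMOUFOX_OS_VALUES.find((os) => os === value);
  if (match === undefined) {
    throw new Error(`Unknown OS "${raw}", expected one of: ${CAMOUFOX_OS_VALUES.join(', ')}`);
  }
  return match;
}
```

Failing fast on an unrecognized value is a design choice in this sketch; falling back to a default would also be reasonable if a misconfigured variable should not stop the crawl.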

⚙️ Environment file example

# --- HTTP ---
CRAWLEE_HTTP_MAX_CONCURRENCY=5
CRAWLEE_HTTP_MIN_CONCURRENCY=1
CRAWLEE_HTTP_MAX_REQUEST_RETRIES=3
CRAWLEE_HTTP_TIMEOUT_SECS=30
CRAWLEE_HTTP_MAX_REQUESTS=100
CRAWLEE_HTTP_INITIAL_PAGE=1

# --- Cheerio ---
CRAWLEE_CHEERIO_MAX_CONCURRENCY=5
CRAWLEE_CHEERIO_MIN_CONCURRENCY=1
CRAWLEE_CHEERIO_MAX_REQUEST_RETRIES=3
CRAWLEE_CHEERIO_TIMEOUT_SECS=30
CRAWLEE_CHEERIO_MAX_REQUESTS=100
CRAWLEE_CHEERIO_INITIAL_PAGE=1

# --- Playwright ---
CRAWLEE_PLAYWRIGHT_BROWSER=chromium
CRAWLEE_PLAYWRIGHT_MAX_CONCURRENCY=3
CRAWLEE_PLAYWRIGHT_MIN_CONCURRENCY=1
CRAWLEE_PLAYWRIGHT_MAX_REQUEST_RETRIES=2
CRAWLEE_PLAYWRIGHT_TIMEOUT_SECS=60
CRAWLEE_PLAYWRIGHT_MAX_REQUESTS=50
CRAWLEE_PLAYWRIGHT_INITIAL_PAGE=1
CRAWLEE_PLAYWRIGHT_HEADLESS=true
CRAWLEE_PLAYWRIGHT_USE_INCOGNITO_PAGES=false
CRAWLEE_PLAYWRIGHT_HANDLE_CLOUDFLARE_CHALLENGE=false

# --- Camoufox ---
CRAWLEE_CAMOUFOX_MAX_CONCURRENCY=2
CRAWLEE_CAMOUFOX_MIN_CONCURRENCY=1
CRAWLEE_CAMOUFOX_MAX_REQUEST_RETRIES=2
CRAWLEE_CAMOUFOX_TIMEOUT_SECS=60
CRAWLEE_CAMOUFOX_MAX_REQUESTS=50
CRAWLEE_CAMOUFOX_INITIAL_PAGE=1
CRAWLEE_CAMOUFOX_HEADLESS=true
CRAWLEE_CAMOUFOX_HANDLE_CLOUDFLARE_CHALLENGE=true
CRAWLEE_CAMOUFOX_USE_INCOGNITO_PAGES=false
CRAWLEE_CAMOUFOX_GEOIP=true
CRAWLEE_CAMOUFOX_OS=linux
CRAWLEE_CAMOUFOX_BLOCK_WEBRTC=true
CRAWLEE_CAMOUFOX_HUMANIZE=1
CRAWLEE_CAMOUFOX_BLOCK_IMAGES=false
CRAWLEE_CAMOUFOX_ENABLE_CACHE=false

🏗️ Architecture overview

@rafikidota/scoutee
├── HttpModule          → HttpService (HttpCrawler)
├── CheerioModule       → CheerioService (CheerioCrawler)
├── PlaywrightModule    → PlaywrightService (PlaywrightCrawler)
│   ├── BrowserService  → browser launcher selection
│   ├── ConfigService   → env-driven configuration
│   └── HookService     → pre/post navigation hooks + logging
└── CamoufoxModule      → CamoufoxService (PlaywrightCrawler + Camoufox)
    ├── BrowserService  → Camoufox launch options
    ├── ConfigService   → env-driven configuration
    └── HookService     → pre/post navigation hooks + Cloudflare handling

Every module ships with:

  • 📋 ConfigService — reads all settings from @nestjs/config's ConfigService
  • 🪝 HookService — injects default pre/post navigation hooks (URL logging, HTTP status logging, Cloudflare challenge handling)
  • 🏭 Service — exposes a create() factory that merges default options with any overrides you pass in
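
The merge performed by create() can be pictured with a small standalone sketch. It assumes that scalar options from the caller win over defaults and that hook arrays are concatenated so the default hooks still run first; the README does not confirm either detail, so treat both as assumptions. The names mergeOptions, Hook, and CrawlerOptionsSketch are hypothetical, while preNavigationHooks and postNavigationHooks are real Crawlee option names.

```typescript
// Simplified hook shape for illustration; real Crawlee hooks receive a full crawling context.
type Hook = (ctx: { url: string }) => void | Promise<void>;

// Hypothetical subset of crawler options used only in this sketch.
interface CrawlerOptionsSketch {
  maxConcurrency?: number;
  maxRequestRetries?: number;
  preNavigationHooks?: Hook[];
  postNavigationHooks?: Hook[];
}

// Assumed behavior: overrides replace scalar defaults, hooks are appended after defaults.
function mergeOptions(
  defaults: CrawlerOptionsSketch,
  overrides: CrawlerOptionsSketch,
): CrawlerOptionsSketch {
  return {
    ...defaults,
    ...overrides,
    preNavigationHooks: [
      ...(defaults.preNavigationHooks ?? []),
      ...(overrides.preNavigationHooks ?? []),
    ],
    postNavigationHooks: [
      ...(defaults.postNavigationHooks ?? []),
      ...(overrides.postNavigationHooks ?? []),
    ],
  };
}
```

Concatenating rather than replacing the hook arrays keeps the default logging in place when a caller adds hooks of their own. If the library replaces them instead, callers would need to re-add any default behavior they rely on.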

📋 Choosing a crawler

| Scenario | Recommended module |
|---|---|
| Fast data extraction, no JS needed | 🌐 HttpModule |
| Static HTML with CSS selectors | 🍋 CheerioModule |
| JavaScript-heavy SPAs | 🎭 PlaywrightModule |
| Anti-bot / Cloudflare protected sites | 🦊 CamoufoxModule |
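
The same guidance can be written as a small decision function, which may be handy when the target sites are configured per job. This is only a sketch: chooseModule and TargetSite are hypothetical names, and the precedence (anti-bot first, then JavaScript, then HTML parsing) is an assumption layered on top of the table.

```typescript
type ScouteeModuleName =
  | 'HttpModule'
  | 'CheerioModule'
  | 'PlaywrightModule'
  | 'CamoufoxModule';

// Hypothetical description of a crawl target.
interface TargetSite {
  antiBotProtected: boolean; // e.g. sits behind Cloudflare or similar
  needsJavaScript: boolean;  // content is rendered client-side
  needsHtmlParsing: boolean; // data is extracted with CSS selectors
}

// Pick the heaviest module a site requires, falling back to plain HTTP.
function chooseModule(site: TargetSite): ScouteeModuleName {
  if (site.antiBotProtected) return 'CamoufoxModule';
  if (site.needsJavaScript) return 'PlaywrightModule';
  if (site.needsHtmlParsing) return 'CheerioModule';
  return 'HttpModule';
}
```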


🛠️ Development

# Install dependencies
pnpm install

# Build
pnpm run build

# Lint & format
pnpm run lint
pnpm run format

Publishing is automated via GitHub Actions on every v* tag push.


📄 License

MIT © rafiki