@nikx/dory-worker

v1.0.4

Standalone BullMQ worker for Dory – runs on any machine with Docker (including Raspberry Pi)

dory-worker

BullMQ job consumer for the Dory web scraping platform. Runs on any machine with Docker — including a Raspberry Pi. Pulls scraping jobs off a shared Redis queue and executes them by launching dory-core containers locally.

npm: @nikx/dory-worker@1.0.4


Architecture

dory-api (Railway)
  └─ enqueues job → BullMQ (Redis)
       └─ dory-worker (your home machine / Pi)
            │  GET /api/runs/:id/config
            │  POST /api/runs/:id/status  (running / completed / failed)
            │
            ├─ Single-container mode  (containerCount = 1)
            │    └─ docker run dory-core:v2
            │         └─ Crawlee in-memory queue
            │
            └─ Distributed mode  (containerCount > 1)
                 ├─ docker run dory-core:v2 × N
                 │    ├─ REDIS_URL=redis://host.docker.internal:6379
                 │    ├─ QUEUE_NAME=<runId>   ← job-scoped, isolated
                 │    ├─ WORKER_ID=worker-1..N
                 │    └─ IDLE_TIMEOUT_SECS=60
                 │
                 └─ Shared Redis queue  (rq:<queueId>:*)
                      ├─ :meta      queue metadata
                      ├─ :requests  all URLs ever added (Hash)
                      ├─ :ordering  Lua-locked sorted set
                      └─ :handled   completed requestIds (Set)

Distributed queue internals

| Concern | Mechanism |
|---------|-----------|
| Deduplication | SHA-256(uniqueKey).slice(0,15) → requestId; HGET :requests guard before any write |
| Atomic locking | Lua script LUA_LIST_AND_LOCK; ZADD score = ±lockExpiresAt; no two containers claim the same URL |
| Retry | Crawlee increments retryCount, re-enqueues until maxRequestRetries; exhausted → SADD :handled + errorMessages |
| Idle shutdown | IDLE_TIMEOUT_SECS = min(60, actorTimeoutSecs / 2); containers exit cleanly when the queue drains |
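Two of these mechanisms reduce to small pure functions. A minimal sketch in TypeScript — the function names here are illustrative, not the actual dory-worker symbols; only the hashing scheme and the idle-timeout formula come from the table above:

```typescript
import { createHash } from "node:crypto";

// Deduplication: derive the 15-char requestId from a request's uniqueKey
// (typically the normalized URL). A container only enqueues a URL when
// HGET rq:<queueId>:requests <requestId> comes back empty.
function requestId(uniqueKey: string): string {
  return createHash("sha256").update(uniqueKey).digest("hex").slice(0, 15);
}

// Idle shutdown: containers exit after this many seconds of an empty queue.
function idleTimeoutSecs(actorTimeoutSecs: number): number {
  return Math.min(60, actorTimeoutSecs / 2);
}
```

Because the id is a pure function of the uniqueKey, every container computes the same requestId for the same URL, which is what makes the HGET guard race-safe across N containers.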


Prerequisites

  • Node.js ≥ 20
  • Docker (with access to dory-core:v2 image — build locally or pull from registry)
  • Redis (local container or remote — same instance used by dory-api)

Quick Start

1. Install

npm install -g @nikx/dory-worker

Or run from source:

git clone https://github.com/your-org/dory-worker
cd dory-worker
npm install

2. Configure

cp .env.example .env

Edit .env:

# Required — public URL of dory-api (must be reachable from Docker containers)
API_BASE_URL=https://your-api.railway.app

# Redis — Option A: full URL (recommended for Railway)
REDIS_URL=redis://default:<password>@<host>:6379

# Redis — Option B: host + port
REDIS_HOST=localhost
REDIS_PORT=6379

# Optional — single-container mode default (overridden per-actor by dory-api)
CONTAINER_COUNT=1

# Distributed mode — Redis the crawling containers share
# Must be reachable from INSIDE Docker containers on this machine
# e.g. redis://host.docker.internal:6379  for a local Redis
CRAWLER_REDIS_URL=redis://host.docker.internal:6379

# Worker concurrency — keep at 1-2 for Raspberry Pi
MAX_CONCURRENT_RUNS=2

# Fallback image if dory-api doesn't return one
DOCKER_IMAGE=dory-core:v2
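The two Redis options above resolve to a single connection URL. A sketch of how config.ts might do this — resolveRedisUrl and the "default" username are assumptions for illustration; only the variable names and defaults come from the listing above:

```typescript
// Option A (REDIS_URL) wins when present; Option B is assembled from
// REDIS_HOST / REDIS_PORT, with REDIS_PASSWORD folded in if set.
function resolveRedisUrl(env: Record<string, string | undefined>): string {
  if (env.REDIS_URL) return env.REDIS_URL;
  const host = env.REDIS_HOST ?? "localhost";
  const port = env.REDIS_PORT ?? "6379";
  // "default" is Redis 6+'s built-in ACL user — an assumption here.
  const auth = env.REDIS_PASSWORD ? `default:${env.REDIS_PASSWORD}@` : "";
  return `redis://${auth}${host}:${port}`;
}
```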

3. Run

# From npm package
dory-worker

# From source
npm run dev

# Built
npm run build && npm start

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| API_BASE_URL | ✅ | — | dory-api URL (reachable from Docker containers) |
| REDIS_URL | one of | — | Full Redis URL |
| REDIS_HOST | one of | localhost | Redis hostname |
| REDIS_PORT | — | 6379 | Redis port |
| REDIS_PASSWORD | — | — | Redis password |
| CRAWLER_REDIS_URL | distributed | — | Redis for the per-run crawler queue |
| CONTAINER_COUNT | — | 1 | Containers per job (overridden by dory-api per actor) |
| MAX_CONCURRENT_RUNS | — | 2 | Parallel BullMQ jobs |
| WORKER_ID | — | dory-worker-{pid} | Label shown in logs |
| LOG_LEVEL | — | info | debug \| info \| warn \| error |
| DOCKER_IMAGE | — | — | Fallback image if API doesn't return one |
| GCS_BUCKET | — | — | Passed through to containers for result uploads |
| GCP_PROJECT_ID | — | — | Passed through to containers |
| GOOGLE_APPLICATION_CREDENTIALS | — | — | Path to GCP service account JSON file |
| GOOGLE_APPLICATION_CREDENTIALS_JSON | — | — | Full service account JSON string (Railway / CI — written to /tmp/gcp-dory-credentials.json on startup) |
| STORAGE_EMULATOR_HOST | — | — | fake-gcs-server URL (local dev) |
| QUEUE_RUN_EXECUTION | — | run-execution | BullMQ queue name for scraping jobs — must match the API's value |
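The GOOGLE_APPLICATION_CREDENTIALS_JSON startup behavior can be sketched as below. This is an assumed shape, not the actual implementation — in particular, giving the file-path variable precedence over the JSON string is a guess; only the variable names and the /tmp/gcp-dory-credentials.json target come from the table:

```typescript
import { writeFileSync } from "node:fs";

// Returns the credentials file path the GCP SDK should use, materializing
// the JSON string to disk when only the inline variant is provided.
function materializeGcpCredentials(
  env: Record<string, string | undefined>
): string | undefined {
  // Assumption: an explicit file path wins over the inline JSON string.
  if (env.GOOGLE_APPLICATION_CREDENTIALS) return env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!env.GOOGLE_APPLICATION_CREDENTIALS_JSON) return undefined;
  const path = "/tmp/gcp-dory-credentials.json";
  writeFileSync(path, env.GOOGLE_APPLICATION_CREDENTIALS_JSON);
  return path;
}
```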


How a Job Flows

  1. dory-api enqueues a BullMQ job { runId } onto the run-execution queue (configurable via QUEUE_RUN_EXECUTION).
  2. Worker picks up the job — calls GET /api/runs/:id/config to get actorConfig, dockerImage, containerCount, memoryLimitMb, actorTimeoutSecs.
  3. Worker calls POST /api/runs/:id/status { status: "running" }.
  4. Worker calls docker run (once for single-container, N times for distributed). Each container receives:
    • ACTOR_CONFIG — base64-encoded actor/user-input JSON
    • API_BASE_URL — so the container can POST status callbacks
    • CRAWLEE_MEMORY_MBYTES — from memoryLimitMb
    • (distributed only) REDIS_URL, QUEUE_NAME, WORKER_ID, IDLE_TIMEOUT_SECS
  5. Worker extends the BullMQ lock every 2 minutes while containers run.
  6. Worker calls docker wait on all containers (in parallel). Uses the worst exit code.
  7. Worker calls POST /api/runs/:id/status { status: "completed"|"failed", exitCode } — only if no HTTP callback arrived (fallback).
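The per-container environment from step 4 can be sketched as follows. containerEnv is a hypothetical helper name; the variable names and the base64 encoding of ACTOR_CONFIG come from the steps above:

```typescript
// Assemble the environment injected into each dory-core container.
function containerEnv(
  actorConfig: unknown,
  apiBaseUrl: string,
  memoryLimitMb: number
): Record<string, string> {
  return {
    // Base64-encoded actor/user-input JSON, decoded inside the container.
    ACTOR_CONFIG: Buffer.from(JSON.stringify(actorConfig)).toString("base64"),
    // Lets the container POST status callbacks back to dory-api.
    API_BASE_URL: apiBaseUrl,
    // Crawlee's memory budget, from the run config's memoryLimitMb.
    CRAWLEE_MEMORY_MBYTES: String(memoryLimitMb),
  };
}
```

In distributed mode the worker would additionally set REDIS_URL, QUEUE_NAME, WORKER_ID, and IDLE_TIMEOUT_SECS per container.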

containerCount precedence: dory-api /config response > CONTAINER_COUNT env var > default 1.
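That precedence chain is a one-liner; a sketch with an illustrative function name, assuming the API omits the field rather than sending null when it has no opinion:

```typescript
// API /config response > CONTAINER_COUNT env var > default 1.
function effectiveContainerCount(
  apiValue: number | undefined,
  envValue: string | undefined
): number {
  if (apiValue !== undefined) return apiValue;
  if (envValue !== undefined) return parseInt(envValue, 10);
  return 1;
}
```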

localhost rewriting: API_BASE_URL and CRAWLER_REDIS_URL containing localhost are automatically rewritten to host.docker.internal before being injected into containers.
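A minimal sketch of that rewrite — the function name is illustrative, and per the note above only the literal hostname localhost is remapped (whether 127.0.0.1 is also handled is not stated):

```typescript
// Containers can't reach the host's loopback interface via "localhost",
// so it is remapped to Docker's special host alias before injection.
function rewriteForDocker(url: string): string {
  return url.replace(/localhost/g, "host.docker.internal");
}
```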


Source Layout

src/
  cli.ts             Entry point — loads config, starts BullMQ Worker
  config.ts          WorkerConfig interface + loadConfig() from env vars
  worker.ts          BullMQ Worker setup, concurrency, graceful shutdown
  processor.ts       Core job handler — fetch config, spawn containers, wait
  docker.ts          docker run / docker wait wrappers; DistributedOpts
  logger.ts          Structured logger with log levels

test/
  harness.ts         Standalone test harness — mock API + real worker + Redis inspection
  redis-inspector.ts Post-run queue inspector — reads rq:* keys, returns metrics

scripts/
  run-all-tests.ts   13-scenario E2E suite runner → writes E2E-TEST-REPORT.md

test-image/
  Dockerfile         Minimal test image used by the harness in CI

Testing

Run a single scenario

# Minimal (single container, empty handlers — validates worker lifecycle)
npm test

# Real cheerio crawl (quotes.toscrape.com, 10 pages)
SCENARIO=real-crawl npm test

# Distributed mode (2 containers)
npm run test:distributed

# Distributed, 3 containers, 50 pages
DISTRIBUTED=true CONTAINER_COUNT=3 SCENARIO=dist-large npm test

# Deduplication — triplicate seed URLs
DISTRIBUTED=true SCENARIO=dedup npm test

# Retry on failure — handler throws on page 2
DISTRIBUTED=true SCENARIO=retry-failure npm test

# API failure resilience
SCENARIO=api-error EXPECT_FAILURE=true npm test

Valid SCENARIO values: minimal, real-crawl, large-crawl, distributed, dist-large, dedup, retry-failure, api-error, missing-redis.

Run the full 13-scenario suite

npm run test:all

Results are written to E2E-TEST-REPORT.md.

E2E test results (v1.0.3)

| # | Category | Scenario | Result | Duration |
|---|----------|----------|--------|----------|
| T01 | happy-path | Single-container · minimal | ✅ | 6.1s |
| T02 | happy-path | Single-container · 10-page crawl | ✅ | 6.8s |
| T03 | happy-path | Single-container · 50-page crawl | ✅ | 7.3s |
| T04 | distribution | Distributed · 2 containers · 10 pages | ✅ 0% skew | 67.3s |
| T05 | distribution | Distributed · 3 containers · 10 pages | ✅ 10% skew | 68.1s |
| T06 | distribution | Distributed · 2 containers · 50 pages | ✅ 0% skew | 9.1s |
| T07 | distribution | Distributed · 3 containers · 50 pages | ✅ 0% skew | 8.7s |
| T08 | correctness | Deduplication · triplicate seed | ✅ | 66.6s |
| T09 | correctness | Retry on failure · handler throws | ✅ | 67.0s |
| T10 | resilience | API /config returns 500 | ✅ | 3.1s |
| T11 | resilience | Non-existent Docker image | ✅ | 2.5s |
| T12 | resilience | Distributed · missing CRAWLER_REDIS_URL | ✅ | 2.1s |
| T13 | resilience | containerCount precedence | ✅ | 7.1s |

13/13 passed — 321.8s total. See E2E-TEST-REPORT.md for full metrics including per-worker URL counts, deduplication proof, and retry traces.


Building & Publishing

npm run build          # compile src/ → dist/
npm publish --access public

The published package exports dist/cli.js as the dory-worker binary.