docs-scraper

v1.0.3

CLI and SDK for scraping documents from DocSend, Notion, and other sources

docs-scraper

A CLI tool that scrapes documents from various sources (Notion, DocSend, PDFs, etc.) and saves them as local PDF files. Uses Playwright for browser automation.

Installation

bun install

Usage

# Scrape a document (auto-starts daemon if not running)
docs-scraper scrape https://docsend.com/view/xxx

# Use a named profile for session persistence
docs-scraper scrape https://notion.so/xxx -p myprofile

# Pre-fill data for scraper (e.g., email for DocSend)
docs-scraper scrape https://docsend.com/view/xxx -D [email protected]

# Direct mode without daemon (single-shot)
docs-scraper scrape https://example.com/doc.pdf --no-daemon

Handling Authentication

When a scrape is blocked (e.g. it requires a login, email, or passcode):

# Initial scrape returns a job ID
docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
#         Job ID: abc123

# Retry with data
docs-scraper update abc123 -D [email protected] -D password=1234

Daemon Management

The daemon runs in the background to keep browser instances warm:

docs-scraper daemon start    # Start daemon
docs-scraper daemon stop     # Stop daemon
docs-scraper daemon status   # Show PID, uptime, pending jobs

Profiles & Jobs

docs-scraper profiles list   # List saved profiles
docs-scraper profiles clear  # Clear all profiles
docs-scraper jobs list       # List pending blocked jobs

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon auto-cleans files older than 1 hour.

docs-scraper cleanup                  # Delete all PDFs
docs-scraper cleanup --older-than 1h  # Delete PDFs older than 1 hour

Supported Sources

  • [x] Direct PDF links
  • [x] Notion pages
  • [x] DocSend documents
  • [ ] Pitch.com
  • [ ] Google Slides
  • [ ] Google Drive
  • [x] LLM fallback (any webpage via Claude)

How It Works

  1. User runs docs-scraper scrape <url> with optional profile name
  2. CLI connects to daemon via Unix socket (auto-starts if needed)
  3. Daemon launches Playwright browser, applies saved session cookies
  4. Scraper registry routes URL to appropriate handler (Notion, DocSend, PDF, or LLM fallback)
  5. If blocked (auth required), returns job ID for retry with credentials
  6. On success, PDF is saved to ~/.docs-scraper/output/
  7. Session cookies saved to profile for future scrapes
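The blocked-job portion of this flow (steps 5–6) can be modeled in miniature. The `ScrapeOutcome` and `JobStore` names below are illustrative, not the real SDK API; the "succeeds once an email is supplied" rule stands in for whatever the real scraper asked for:

```typescript
// In-memory model of the blocked-job lifecycle: a blocked scrape parks the
// URL under a job ID; `update` merges credentials and retries.
type ScrapeOutcome =
  | { status: "ok"; pdfPath: string }
  | { status: "blocked"; jobId: string; reason: string };

class JobStore {
  private jobs = new Map<string, { url: string; data: Record<string, string> }>();
  private seq = 0;

  // A scrape that needs credentials is parked and a job ID returned.
  block(url: string, reason: string): ScrapeOutcome {
    const jobId = `job-${++this.seq}`;
    this.jobs.set(jobId, { url, data: {} });
    return { status: "blocked", jobId, reason };
  }

  // `docs-scraper update <jobId> -D key=value` merges data and retries.
  update(jobId: string, data: Record<string, string>): ScrapeOutcome {
    const job = this.jobs.get(jobId);
    if (!job) throw new Error(`unknown job ${jobId}`);
    Object.assign(job.data, data);
    if (job.data["email"]) {
      this.jobs.delete(jobId);
      return { status: "ok", pdfPath: `~/.docs-scraper/output/${jobId}.pdf` };
    }
    return { status: "blocked", jobId, reason: "still missing email" };
  }
}
```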

Tech Stack

  • TypeScript + Commander.js (CLI)
  • Playwright (browser automation)
  • Anthropic Claude API (LLM fallback scraper)
  • pdf-lib (PDF manipulation)
  • Pino (logging)
  • tsup (bundling)

Configuration

Copy .env.example to .env and configure:

# Anthropic (optional, for LLM fallback scraper)
ANTHROPIC_API_KEY=

# Browser
BROWSER_HEADLESS=true          # false for debugging
BROWSER_POOL_SIZE=3
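A parser for these settings might look like the sketch below. The variable names come from the `.env` fragment above; the defaults and the `BrowserConfig` shape are assumptions, not the project's real config module:

```typescript
// Illustrative .env parser: BROWSER_HEADLESS defaults to true (set to
// "false" for debugging), BROWSER_POOL_SIZE defaults to 3, and the
// Anthropic key is optional (only the LLM fallback scraper needs it).
export interface BrowserConfig {
  headless: boolean;
  poolSize: number;
  anthropicApiKey?: string;
}

export function loadConfig(
  env: Record<string, string | undefined> = process.env
): BrowserConfig {
  return {
    headless: env.BROWSER_HEADLESS !== "false",
    poolSize: Number(env.BROWSER_POOL_SIZE ?? 3),
    anthropicApiKey: env.ANTHROPIC_API_KEY || undefined,
  };
}
```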

Development

Quick Start

bun install         # Install dependencies
bun run build       # Build to dist/
bun link            # Link globally for testing

Scripts

bun run dev         # Run CLI in watch mode
bun run cli         # Run CLI directly from source
bun run daemon      # Run daemon directly from source
bun run typecheck   # Type-check only
bun run test        # Run tests
bun run build       # Build for distribution

Local Testing

To test the CLI locally as if it were installed globally:

# 1. Build the project
bun run build

# 2. Link globally
bun link

# 3. Now you can use the CLI directly
docs-scraper daemon start
docs-scraper scrape https://example.com/doc.pdf
docs-scraper daemon stop

# 4. Unlink when done (optional)
bun unlink

Project Structure

src/
├── cli.ts                 # CLI entry point
├── sdk.ts                 # Programmatic SDK
├── daemon/
│   ├── server.ts          # Background daemon server
│   └── client.ts          # CLI-to-daemon communication
├── scrapers/
│   ├── index.ts           # Scraper registry
│   ├── base/              # Base scraper class
│   ├── pdf/               # Direct PDF scraper
│   ├── notion/            # Notion scraper
│   ├── docsend/           # DocSend scraper
│   └── llm/               # LLM fallback scraper
├── profiles/              # Session profile management
├── jobs/                  # Blocked job tracking
├── services/
│   └── browser/           # Playwright browser manager
├── config/                # Environment config
├── types/                 # TypeScript types
└── utils/                 # Logging, errors

Adding a New Scraper

  1. Create src/scrapers/newsource/NewScraper.ts extending BaseScraper
  2. Implement static getCapabilities() with URL patterns
  3. Implement static canHandle(url): boolean
  4. Implement scrape(url): Promise<ScraperResult>
  5. Register in src/scrapers/index.ts: scraperRegistry.register(NewScraper, 30)
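The five steps can be sketched end to end. The real `BaseScraper` and registry live in `src/scrapers/` and will differ in detail; the shapes below (result fields, capability object, priority ordering) are assumptions made to keep the sketch self-contained:

```typescript
// Minimal, self-contained sketch of steps 1-5.
interface ScraperResult { ok: boolean; pdfPath?: string; reason?: string }

abstract class BaseScraper {
  static canHandle(_url: string): boolean { return false; }
  abstract scrape(url: string): Promise<ScraperResult>;
}

// Step 1: extend BaseScraper. Steps 2-4: capabilities, canHandle, scrape.
class NewScraper extends BaseScraper {
  static getCapabilities() {
    return { name: "newsource", urlPatterns: [/^https:\/\/newsource\.example\//] };
  }
  static canHandle(url: string): boolean {
    return NewScraper.getCapabilities().urlPatterns.some((p) => p.test(url));
  }
  async scrape(url: string): Promise<ScraperResult> {
    return { ok: true, pdfPath: `/tmp/${encodeURIComponent(url)}.pdf` };
  }
}

// The registry routes a URL to the highest-priority scraper that can handle it.
type ScraperClass = { new (): BaseScraper; canHandle(url: string): boolean };

class ScraperRegistry {
  private entries: { cls: ScraperClass; priority: number }[] = [];
  register(cls: ScraperClass, priority: number): void {
    this.entries.push({ cls, priority });
    this.entries.sort((a, b) => b.priority - a.priority);
  }
  resolve(url: string): BaseScraper | undefined {
    const entry = this.entries.find((e) => e.cls.canHandle(url));
    return entry && new entry.cls();
  }
}

const scraperRegistry = new ScraperRegistry();
scraperRegistry.register(NewScraper, 30); // step 5
```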