
lupin-cli

v0.2.0

Adaptive scraper with HTTP-first routing, Camoufox headless escalation, and Patchright fallback.

npm install -g lupin-cli
lupin setup
lupin fetch https://www.nytimes.com/ --format markdown



Why Lupin?

Most web pages don't need a stealth browser. But when they do, you shouldn't have to figure that out yourself. A scraping pipeline can succeed over plain HTTP ten times in a row and then fail on the eleventh request. Lupin handles this with automatic escalation.

HTTP (fast, ~0.2s) ──→ Blocked ? ──→ Camoufox (stealth Firefox) ──→ Blocked ? ──→ Patchright (stealth Chrome)

Lupin starts with a plain HTTP request. If the response looks blocked (Cloudflare challenge, empty body, bot detection page), it automatically escalates through two heavily patched stealth browsers: first Camoufox, an anti-fingerprint Firefox fork, then Patchright, a patched Chromium that passes every major bot detector. Using two different engines, each chosen for its efficiency and maintained by a separate team, reduces the risk that a request is suddenly blocked on every engine at once.
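As a rough illustration of what "looks blocked" could mean, the sketch below classifies an HTTP response using the three signals named above. The status codes, the marker strings, and the names `HttpResponse` and `looksBlocked` are assumptions for illustration, not Lupin's actual detection rules.

```typescript
// Hypothetical response shape; not Lupin's internal type.
interface HttpResponse {
  status: number;
  body: string;
}

// Illustrative marker strings. Real challenge pages vary and change over time,
// and a broad marker such as "captcha" can produce false positives.
const CHALLENGE_MARKERS = [
  "just a moment...", // title often seen on Cloudflare interstitial pages
  "verify you are human",
  "captcha",
];

type BlockReason = "status" | "empty-body" | "challenge-page" | null;

function looksBlocked(res: HttpResponse): BlockReason {
  // Status codes commonly returned to rejected or rate-limited clients.
  if (res.status === 403 || res.status === 429 || res.status === 503) return "status";
  // A response with no content often means the page is rendered by JavaScript.
  if (res.body.trim().length === 0) return "empty-body";
  const lower = res.body.toLowerCase();
  if (CHALLENGE_MARKERS.some((marker) => lower.includes(marker))) return "challenge-page";
  return null; // nothing suspicious: treat the response as usable
}
```

Returning a reason instead of a boolean makes it easier to log why a request was escalated.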

Domains that needed escalation are remembered with their engine. Next time, Lupin skips straight to the engine that worked (24h sticky memory). Over time, your scraping gets faster automatically.

This matters because:

  • Most requests go through plain HTTP, which takes 10-20x less time and bandwidth than a headless browser
  • A stealth browser is used only when HTTP has been exhausted or the domain is already known to be hard
  • The result is faster, somewhat more reliable scraping and lower egress/proxy costs across your projects
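The escalation ladder and the 24h sticky memory described above can be sketched as follows. This is a minimal model under stated assumptions, not Lupin's implementation: the engines are injected as plain functions so the example runs without network access, and `Escalator`, `EngineFn`, and `Attempt` are hypothetical names.

```typescript
type Engine = "http" | "camoufox" | "patchright";

interface Attempt {
  blocked: boolean;
  text: string;
}

// Each engine is injected, so the sketch needs no browser or network.
type EngineFn = (url: string) => Promise<Attempt>;

const LADDER: Engine[] = ["http", "camoufox", "patchright"];
const STICKY_MS = 24 * 60 * 60 * 1000; // 24h sticky memory

class Escalator {
  // hostname -> engine that worked after escalation, plus its expiry time
  private memory = new Map<string, { engine: Engine; expires: number }>();
  private engines: Record<Engine, EngineFn>;
  private now: () => number;

  constructor(engines: Record<Engine, EngineFn>, now: () => number = Date.now) {
    this.engines = engines;
    this.now = now;
  }

  async fetch(url: string): Promise<{ engine: Engine; text: string }> {
    const host = new URL(url).hostname;
    const remembered = this.memory.get(host);
    // Skip straight to the remembered engine while the entry is still fresh.
    const start =
      remembered && remembered.expires > this.now()
        ? LADDER.indexOf(remembered.engine)
        : 0;

    for (const engine of LADDER.slice(start)) {
      const attempt = await this.engines[engine](url);
      if (attempt.blocked) continue; // escalate to the next engine
      if (engine !== "http") {
        // Only domains that needed escalation are remembered.
        this.memory.set(host, { engine, expires: this.now() + STICKY_MS });
      }
      return { engine, text: attempt.text };
    }
    throw new Error(`all engines blocked for ${host}`);
  }
}
```

The injectable clock (`now`) is a design choice that makes the expiry logic testable without waiting 24 hours.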

Benchmark

Benchmark as of 2026-04-07, run on 25 real-world targets that are considered hard. These results are not definitive: anti-bot protections evolve constantly, and a website that is crawlable one day may be blocked the next.

| Site            | Lupin | Crawlee | Scrapling | Crawl4AI | Exa MCP | Claude Code fetch() |
| --------------- | ----- | ------- | --------- | -------- | ------- | ------------------- |
| Reuters         | ✅    | ✅      | ❌        | ❌       | ✅      | ❌                  |
| Bloomberg       | ✅    | ❌      | ✅        | ❌       | ✅      | ❌                  |
| NY Times        | ✅    | ❌      | ✅        | ❌       | ❌      | ❌                  |
| Booking.com     | ✅    | ✅      | ❌        | ❌       | ❌      | ❌                  |
| Zillow          | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| TikTok          | ✅    | ✅      | ❌        | ❌       | ❌      | ❌                  |
| Indeed          | ✅    | ✅      | ✅        | ✅       | ✅      | ❌                  |
| ScienceDirect   | ✅    | ✅      | ✅        | ✅       | —       | —                   |
| Reddit          | ✅    | ✅      | ✅        | ✅       | ✅      | ❌                  |
| Instagram       | ✅    | ✅      | ✅        | ✅       | ❌      | ❌                  |
| YouTube         | ✅    | ✅      | ✅        | ✅       | ❌      | ❌                  |
| X.com           | ✅    | ✅      | ✅        | ❌       | ❌      | ❌                  |
| Pinterest       | ✅    | ✅      | ✅        | ✅       | ❌      | ❌                  |
| Amazon          | ✅    | ✅      | ✅        | ✅       | ✅      | ❌                  |
| LinkedIn        | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Washington Post | ✅    | ✅      | ✅        | ✅       | ✅      | ❌                  |
| Medium          | ✅    | ❌      | ✅        | ✅       | ✅      | ✅                  |
| Cloudflare      | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Polymarket      | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Airbnb          | ✅    | ✅      | ✅        | ✅       | ✅      | ❌                  |
| eBay            | ✅    | ✅      | ❌        | ✅       | ✅      | ❌                  |
| ArXiv           | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Wikipedia       | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Craigslist      | ✅    | ✅      | ✅        | ✅       | —       | —                   |
| example.com     | ✅    | ✅      | ✅        | ✅       | ✅      | ✅                  |
| Score           | 25/25 | 22/25   | 21/25     | 19/25    | 17/23   | 7/23                |

Benchmark run 2026-04-07. Crawlee uses PlaywrightCrawler, Scrapling uses curl_cffi (HTTP-only), Claude Code uses its native web fetch function, and Crawl4AI uses Playwright via patchright. Exa MCP and Claude Code fetch were tested on 23 of the 25 URLs (— = not tested). Note that in our tests, some heavily protected websites still fail after 4-5 consecutive attempts; these websites need either proxy rotation or more custom fingerprinting.


Built-in web search

Lupin provides built-in web search as a convenience, with DuckDuckGo and Google as supported engines. DuckDuckGo is the default and was the most reliable in our tests.

# Search the web (default engine: DuckDuckGo)
lupin search web "best open source web scraping tools" --limit 10

# Search a specific site with most recent results first and in markdown format
lupin search web "agent memory" --site docs.anthropic.com --sort recent --format markdown

Popular social media platforms

Lupin provides built-in scrapers for the 8 most popular social platforms, using web search as a source for links. No API keys and no cookie exports required.

lupin search x "from:elonmusk AI" --limit 5
lupin search tiktok "productivity hacks" --limit 10
lupin search instagram "street photography" --limit 5
lupin fetch reddit https://reddit.com/r/node/comments/abc --max-comments 20

| Platform     | Search | Fetch | Method             |
| ------------ | ------ | ----- | ------------------ |
| Web / Google | ✅     | ✅    | Browser            |
| X / Twitter  | ✅     | ✅    | Browser            |
| Reddit       | ✅     | ✅    | HTTP only          |
| Hacker News  | ✅     | ✅    | HTTP only          |
| YouTube      | ✅     | ✅    | HTTP only          |
| Instagram    | ✅     | ✅    | Browser for search |
| TikTok       | ✅     | ✅    | Browser for search |
| Polymarket   | ✅     | ✅    | HTTP only          |

Platform scrapers are provided as a convenience. You can install/uninstall them at any time. Please note that scrapers for popular platforms often change and require updates (see below). Need a site that isn't built in? You can build your own installable platform package. See the custom platform guide.

Platform updates and health checks

Social sites change often. Lupin separates platform health from core scraping so you can see what is installed, check whether a provider still works, and update platform packages when fixes ship.

# Show installed platforms, source, status, and version
lupin platform list

# Check whether Lupin core or platform packages have updates
lupin update check
lupin platform update --check

# Run manifest/tool checks for every platform
lupin platform doctor --all

# Run live smoke checks against known public targets
lupin platform doctor --all --smoke

Quick start

npm install -g lupin-cli
lupin setup               # installs browser engines
lupin setup --with-video  # adds yt-dlp + FFmpeg for video download
lupin doctor              # shows what's ready

# Scrape any page
lupin fetch https://example.com

# Output as markdown (for LLMs, RAG pipelines)
lupin fetch https://example.com --format markdown

# Output as JSON (for scripts/crawl)
lupin fetch https://example.com --format json

# Output as HTML (for scripts/crawl)
lupin fetch https://example.com --format html

# Search the web
lupin search web "best web scraping library 2026"

# Crawl an entire site
lupin crawl https://docs.example.com --depth 2 --limit 50 --format markdown -o docs.jsonl

# Extract structured data with an LLM
lupin fetch https://example.com --schema '{"type":"object","properties":{"title":{"type":"string"}}}'

# Download YT/TikTok/Instagram video content
lupin download https://www.youtube.com/watch?v=dQw4w9WgXcQ
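To make the `--depth` and `--limit` flags in the crawl example more concrete, here is a minimal breadth-first sketch over an in-memory link graph. It assumes that depth counts link hops from the start URL and that limit caps the total number of pages visited; Lupin's actual crawl semantics may differ, and `crawlOrder` and `getLinks` are hypothetical names.

```typescript
// getLinks stands in for "fetch the page and extract its links".
function crawlOrder(
  start: string,
  getLinks: (url: string) => string[],
  maxDepth: number,
  limit: number,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([start]); // prevents revisiting pages in cycles
  let frontier = [start];

  // Process one depth level at a time, starting with the seed at depth 0.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.length >= limit) return visited; // page budget used up
      visited.push(url);
      for (const link of getLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return visited;
}
```

Breadth-first order means that when the page limit is hit, the pages closest to the start URL have already been collected.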

Docker

docker build -t lupin .
docker run --rm -i lupin fetch https://example.com
docker run --rm -i lupin --mcp

HTTP-only flows (fetch in auto mode, search reddit, search hn, search youtube) work before browser setup.


Using in AI Agents

We recommend that your agents use Lupin as a CLI or as an MCP server. Both let your agents scrape, search, browse and crawl.

CLI Setup (recommended, less token usage, similar features)

Claude Code / Codex / OpenCode / Hermes / OpenClaw: add instructions to your AGENTS.md:

## Web Scraping

This project uses `lupin-cli` for web scraping. Run `lupin --help` for full usage.

Common commands:
- `lupin fetch <url>` — scrape any page (returns JSON with text, title, status)
- `lupin fetch <url> --format markdown` — get clean LLM-ready markdown
- `lupin search web "query"` — web search
- `lupin search x "query"` — search X/Twitter without API keys
- `lupin search reddit "query"` — search Reddit

This setup uses ~90% fewer tokens than the MCP server and works with any agent that can run shell commands.
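For agents or scripts written in TypeScript, the CLI can be wrapped in a small helper. This sketch only builds the argument list and validates the output; the process runner is injected so the example runs without Lupin installed. The `text`, `title`, and `status` fields come from the command description above, but the exact JSON shape (including a numeric `status`) is an assumption, and `PageResult`, `buildFetchArgs`, `parsePageResult`, and `fetchWithLupin` are hypothetical names.

```typescript
// Assumed output shape; check the real output before relying on it.
interface PageResult {
  text: string;
  title: string;
  status: number;
}

// Runs a command and resolves with its stdout; supplied by the caller.
type Runner = (cmd: string, args: string[]) => Promise<string>;

function buildFetchArgs(url: string, format: "json" | "markdown" | "html" = "json"): string[] {
  // Passing arguments as an array avoids shell quoting problems with URLs.
  return ["fetch", url, "--format", format];
}

function parsePageResult(stdout: string): PageResult {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null) {
    throw new Error("unexpected output: not a JSON object");
  }
  const rec = data as Record<string, unknown>;
  if (
    typeof rec.text !== "string" ||
    typeof rec.title !== "string" ||
    typeof rec.status !== "number"
  ) {
    throw new Error("unexpected output: missing text, title, or status");
  }
  return { text: rec.text, title: rec.title, status: rec.status };
}

async function fetchWithLupin(url: string, run: Runner): Promise<PageResult> {
  return parsePageResult(await run("lupin", buildFetchArgs(url)));
}
```

In Node, `run` could be a thin wrapper around `execFile` from `node:child_process`.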

MCP Setup

Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "lupin": {
      "command": "npx",
      "args": ["lupin-cli", "--mcp"]
    }
  }
}

Cursor / other MCP clients:

{
  "command": "npx",
  "args": ["lupin-cli", "--mcp"]
}
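If a client config file already lists other MCP servers, the Lupin entry should be merged in rather than pasted over them. The sketch below shows one way to do that merge on the parsed JSON; `ClientConfig`, `McpServer`, and `withLupinServer` are hypothetical names, and reading or writing the file itself is left out.

```typescript
interface McpServer {
  command: string;
  args: string[];
}

interface ClientConfig {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown; // unrelated client settings are preserved as-is
}

// Same command and args as in the config snippets above.
const LUPIN_SERVER: McpServer = { command: "npx", args: ["lupin-cli", "--mcp"] };

// Returns a new config with the lupin entry added, leaving other servers
// and unrelated settings untouched. The input object is not mutated.
function withLupinServer(config: ClientConfig): ClientConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), lupin: LUPIN_SERVER },
  };
}
```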

Available tools in MCP

| Category     | Tools |
| ------------ | ----- |
| Search (9)   | search_web, search_google, search_x, search_reddit, search_hn, search_youtube, search_polymarket, search_instagram, search_tiktok |
| Fetch (10)   | fetch_page, fetch_x_post, fetch_reddit_post, fetch_hn_item, fetch_polymarket_market, fetch_youtube_video, fetch_instagram_post, fetch_instagram_profile, fetch_tiktok_post, fetch_tiktok_profile |
| Browser (10) | browser_open_session, browser_navigate, browser_click, browser_type, browser_press, browser_wait_for, browser_snapshot, browser_extract, browser_screenshot, browser_close_session |
| Site (2)     | crawl_site, map_site |
| Video (1)    | download_video (requires lupin setup --with-video) |


Use as a Library

import { Lupin } from "lupin-cli";

const scraper = new Lupin();

try {
  const result = await scraper.scrape("https://example.com");
  console.log(result.engine, result.confidence, result.text.slice(0, 300));
} finally {
  await scraper.close();
}

One-shot convenience:

import { scrapePage } from "lupin-cli";

const result = await scrapePage("https://example.com", { engine: "auto" });
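When scraping many URLs from a script, it is usually worth bounding concurrency so that several browser engines are not launched at once. The helper below is a generic sketch: `scrapeAll`, `Settled`, and the `limit` parameter are hypothetical, and the `scrape` argument stands in for a function such as `scrapePage` from the example above.

```typescript
type Settled<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: unknown };

async function scrapeAll<T>(
  urls: string[],
  scrape: (url: string) => Promise<T>,
  limit = 3,
): Promise<Settled<T>[]> {
  const results: Settled<T>[] = new Array(urls.length);
  let next = 0;

  // Each worker claims the next unprocessed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        results[i] = { url, ok: true, value: await scrape(url) };
      } catch (error) {
        // One failed page should not abort the whole batch.
        results[i] = { url, ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input URLs
}
```

A call might look like `await scrapeAll(urls, (u) => scrapePage(u, { engine: "auto" }), 2)`.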

LLM summarization and structured schemas

Like Firecrawl and similar modern tools, Lupin can wire in an LLM (Ollama or any OpenAI-compatible endpoint) to extract structured data from any page and return the content as summarized markdown or structured JSON.

# Free-form extraction
lupin fetch <url> --extract "what are the prices?"

# Structured extraction with JSON Schema
lupin fetch <url> --schema '{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}'

# Multimodal: analyze images and video from platform posts
lupin fetch instagram <url> --extract "what brands are visible in the image?"
lupin fetch youtube <url> --extract "list the products shown in this video"
lupin fetch tiktok <url> --schema '{"type":"object","properties":{"products_shown":{"type":"array","items":{"type":"string"}}}}'

# Text-only extraction on any platform
lupin fetch reddit <url> --extract "summarize the top comments"

# Per-page extraction during crawls
lupin crawl https://docs.example.com --extract "summarize" --llm ollama

For platform providers (Instagram, TikTok, YouTube, X), the model receives the actual images and video alongside text, not just metadata. You can ask about what's in a photo or video, not just what the caption says.
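Because `--schema` takes a JSON Schema as a single shell argument, it can be easier to build it from an object than to hand-write the string. The sketch below does that for the title/price schema shown above and adds a small runtime check for the extracted value. `Product`, `productSchema`, `schemaArgs`, and `isProduct` are hypothetical names, and how Lupin nests the extracted object in its output is not covered here.

```typescript
interface Product {
  title: string;
  price: number;
}

// Same schema as the structured-extraction example, written as an object.
const productSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    price: { type: "number" },
  },
} as const;

// Argument list equivalent to: lupin fetch <url> --schema '<json>'
function schemaArgs(url: string): string[] {
  return ["fetch", url, "--schema", JSON.stringify(productSchema)];
}

// LLM output is not guaranteed to match the schema, so check it at runtime.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price)
  );
}
```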

Recommended setup: Ollama (free, local LLM, zero API keys; requires 2-4GB of VRAM)

ollama pull qwen3.5:4b
lupin llm add ollama --base-url http://localhost:11434/v1 --model qwen3.5:4b --default

Alternative: OpenAI / OpenRouter-like endpoint

export OPENROUTER_API_KEY=sk-or-...

lupin llm add openrouter \
  --base-url https://openrouter.ai/api/v1 \
  --api-key '${OPENROUTER_API_KEY}' \
  --model qwen/qwen3.5-9b \
  --default

Also supports any OpenAI-compatible endpoint. See LLM extraction docs for all options.


Video, audio & social content download

Lupin can download video or audio from YouTube, TikTok, Instagram, and 1000+ other sites through yt-dlp, which is installed as an optional dependency.

lupin setup --with-video                                    # one-time setup
lupin download https://www.youtube.com/watch?v=dQw4w9WgXcQ  # video as MP4
lupin download <url> --audio-only                            # extract MP3
lupin download <url> --subtitles                             # grab subs too

Content is downloaded temporarily into ~/.lupin/; yt-dlp will auto-update on each run.


Proxy Support

Lupin can route fetch, search, and crawl traffic through a single proxy or a rotating proxy list.

lupin fetch https://example.com --proxy socks5://127.0.0.1:1080
lupin search web "agentic AI" --proxy http://user:pass@host:port
lupin crawl https://example.com --proxy-list proxies.txt --proxy-rotate sticky-domain
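One plausible reading of `sticky-domain` rotation is that every request to the same domain reuses the same proxy from the list, while different domains spread across the list. The sketch below implements that reading with a simple string hash. It is an assumption about the behavior, not Lupin's implementation, and `pickProxy` is a hypothetical name.

```typescript
// Small deterministic string hash (32-bit FNV-1a over UTF-16 code units).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned
  }
  return hash;
}

// Maps a URL's hostname to one proxy, so a given domain always gets the
// same proxy for as long as the list itself does not change.
function pickProxy(url: string, proxies: string[]): string {
  if (proxies.length === 0) throw new Error("proxy list is empty");
  const host = new URL(url).hostname;
  return proxies[fnv1a(host) % proxies.length];
}
```

Hashing the hostname keeps the mapping stateless, at the cost of reshuffling domains whenever the proxy list changes.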

Docs

| Document         | Description                                           |
| ---------------- | ----------------------------------------------------- |
| CLI Reference    | Full flag reference for every command                 |
| Configuration    | Environment variables, result schemas, engine routing |
| Custom Platforms | Build, install, and share your own Lupin platforms    |


Tests

npm test           # local/fixture suite
npm run test:live  # public-site verification
npm run test:all   # both

License

MIT