

sitepull

Reverse-engineer a hosted web app and run it locally. Auto-detects SPA vs MPA, vendors every asset, generates a zero-dependency local server with safe stubs for any backend endpoints found.

npx sitepull https://example.com

v0.2 adds: --diff (re-audit + change report), --beautify (pretty-print bundles), --browser (Playwright for JS-heavy sites), --auth-flow (interactive login), and an MCP server (sitepull-mcp) so AI agents can call it as a tool.

▶ sitepull  v0.2.1
  Target: https://example.com/
  Output: ./audits/example.com/

[1/6] Reconnaissance
  ✓ HTTP 200, server: ECS
  · Scripts: 0, stylesheets: 0, icons: 0, images: 0
[2/6] Detecting site type (SPA vs MPA)
  ✓ Detected: MPA
  → 3/3 unknown paths returned 404
[3/6] Probing endpoints
  ✓ Found 1 real endpoints (out of 38 probed)
[4/6] Crawling site (max-pages=200, max-depth=4)
  ✓ [depth 0] / (1256B)
  ✓ [depth 1] /domains (4081B)
  ...
[5/6] Generating server.js + package.json + README
  ✓ Wrote app/ at audits/example.com/app
[6/6] Smoke test on port 8080
  ✓ GET / -> 200 (1256 bytes)

✓ Done in 1.7s.  cd audits/example.com/app && node server.js

Install

# One-off
npx sitepull <URL>

# Globally
npm install -g sitepull
sitepull <URL>

# From source
git clone <this repo>
cd sitepull && npm link
sitepull <URL>

Requires Node ≥ 18. Zero npm dependencies.

What it does

  1. Recon — fetches /, parses every script/stylesheet/image URL, captures HTTP headers.
  2. Detect — probes 3 random unknown paths to decide SPA vs MPA.
  3. Probe — hits 38 well-known endpoints (/api/*, /health, /.env, /admin, …) plus light POST-fuzzing of any discovered API endpoints.
  4. Vendor — downloads every same-origin asset to audits/<host>/app/public/ byte-for-byte.
    • SPA: walks the asset graph from index.html + recurses into CSS url(...) refs.
    • MPA: BFS-crawls links from the homepage with depth + page caps + robots.txt support.
  5. Generate — writes server.js (zero deps, Node built-in http), package.json, README.md, AUDIT.md. Stubs each discovered backend endpoint with a safe placeholder response.
  6. Smoke test — boots the generated server, fetches /, prints status + bytes.
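The detection heuristic in step 2 can be sketched as a pure function (a simplified illustration, not sitepull's actual code): if random unknown paths come back 404, the server routes pages itself (MPA); if they return the index HTML, routing happens client-side (SPA).

```javascript
// Classify a site as SPA or MPA from probe responses to random unknown paths.
// `indexHtml` is the body of GET /; `probeResponses` is an array of
// { status, body } results for paths that should not exist.
function classifySite(indexHtml, probeResponses) {
  let spaVotes = 0;
  for (const res of probeResponses) {
    // An SPA server typically answers every path with the index shell.
    if (res.status === 200 && res.body === indexHtml) spaVotes++;
  }
  // Majority vote across the probes decides the mode.
  return spaVotes > probeResponses.length / 2 ? 'spa' : 'mpa';
}
```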

Flags

--out <dir>            Output directory (default: ./audits/<host>)
--port <N>             Port the generated server.js will listen on (default: 8080)

--cookie <str>         Cookie header for auth-walled sites
--user-agent <str>     Override the default User-Agent

--max-pages <N>        MPA crawler page cap (default: 200)
--max-depth <N>        MPA crawler depth cap (default: 4)
--include <regex>      Only follow URLs matching this regex
--exclude <regex>      Skip URLs matching this regex
--no-respect-robots    Ignore robots.txt
--rate-ms <N>          Polite delay between requests in ms (default: 50)
--concurrency <N>      Parallel HTTP fetches (default: 6)

--force-mode spa|mpa   Override auto-detection
--no-probe             Skip endpoint probing
--no-fuzz              Skip POST-fuzzing
--no-smoke-test        Skip the final boot test

--beautify             Pretty-print minified .js/.css/.html bundles to *.pretty.* siblings
--browser              Use Playwright Chromium instead of fetch (JS-rendered sites)
--storage-state <f>    Use Playwright storage-state file (cookies + localStorage) for auth
--auth-flow            Open Chromium for interactive login, save state, then exit
                       (use with --storage-state to point at the file to write)

--diff                 Re-audit and write DIFF.md against the previous run at --out

v0.2 features

--beautify — readable bundles

Adds a post-vendor pass that line-breaks and indents minified .js/.css/.html files larger than 30 KB. Saved as *.pretty.* siblings of the originals. Zero dependencies (uses a brace-aware walker that respects strings, template literals, regex, and comments).

sitepull https://example.com --beautify
# → public/assets/index-abc123.js          (original, untouched)
# → public/assets/index-abc123.pretty.js   (~15 K lines, grep-friendly)
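The core idea of a string-aware line-breaker looks roughly like this (a deliberately simplified sketch; sitepull's real walker additionally handles template-literal interpolation, regex literals, comments, and indentation):

```javascript
// Insert a newline after each '{', '}', and ';' that is NOT inside a quoted
// string, so minified one-liners become grep-friendly without corrupting
// string contents.
function breakLines(code) {
  let out = '';
  let quote = null; // current string delimiter, or null when outside strings
  for (let i = 0; i < code.length; i++) {
    const ch = code[i];
    out += ch;
    if (quote) {
      if (ch === '\\') out += code[++i] ?? ''; // copy escaped char verbatim
      else if (ch === quote) quote = null;     // string closed
    } else if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch;                              // string opened
    } else if (ch === '{' || ch === '}' || ch === ';') {
      out += '\n';                             // statement/block boundary
    }
  }
  return out;
}
```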

--diff — detect changes since last clone

Each audit writes a .sitepull-manifest.json with SHA-256 of every vendored file. A subsequent run with --diff re-audits, compares manifests, and writes DIFF.md:

sitepull https://my-site.com                           # initial clone
# ... a week later ...
sitepull https://my-site.com --diff                    # writes DIFF.md
# → 3 added, 1 removed, 7 changed, 42 unchanged

Use this to monitor a site for changes (CSP rotation, asset cache-bust, content edits, removed routes).

--browser — render JS-heavy sites

When a site needs JavaScript to produce content (Cloudflare interstitials, hydrated SSR, infinite-scroll feeds), pass --browser to use Playwright Chromium instead of raw fetch. Playwright is an optional peer dependency — only loaded when this flag is used.

# One-time setup:
npm install -g playwright && npx playwright install chromium

sitepull https://complex-spa.example.com --browser

--auth-flow — clone sites behind login

Opens a real Chromium window. You log in interactively. On close, cookies + localStorage are persisted. Re-use that state on subsequent audits with --storage-state.

sitepull --auth-flow https://app.example.com --storage-state ./auth.json
# (browser opens; you log in; close the window)
sitepull https://app.example.com --browser --storage-state ./auth.json

Or use the simpler --cookie flag if you already have the Cookie header from devtools:

sitepull https://app.example.com --cookie "session=eyJhbGc...; csrf=xyz"

sitepull-mcp — MCP server for AI agents

A second binary, sitepull-mcp, exposes sitepull as MCP tools that any compatible AI agent (Claude Desktop, Cursor, Cline, etc.) can call directly, over stdio transport.

Tools exposed:

  • web_audit(url, ...options) — run a full audit
  • web_audit_diff(url, out) — re-audit and produce DIFF.md
  • web_audit_serve(out, port) — boot the local server

Add to your agent's MCP config, e.g. for Claude Desktop:

{ "mcpServers": { "sitepull": { "command": "sitepull-mcp" } } }

Output layout

audits/<host>/
├── AUDIT.md                  # Findings report (architecture, endpoints, fuzz, hashes)
└── app/
    ├── server.js             # Generated zero-dep Node server with stubs
    ├── package.json          # `"type": "module"`
    ├── README.md             # How to run + integrity hashes + caveats
    └── public/               # Byte-for-byte vendored assets
        ├── index.html
        ├── assets/...
        └── (all other files)
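The generated server.js boils down to a routing decision: stub JSON for discovered backend endpoints, vendored files from public/ for everything else. A simplified, testable sketch of that routing (`files` stands in for the on-disk tree; the endpoint and file names are illustrative):

```javascript
// Route a request path to either an inert stub response or a vendored file.
// `stubs` maps endpoint paths to placeholder payloads; `files` maps public/
// paths to their contents.
function route(pathname, stubs, files) {
  if (pathname in stubs) {
    // Discovered backend endpoint: answer with the safe placeholder.
    return { status: 200, body: JSON.stringify(stubs[pathname]) };
  }
  const file = pathname === '/' ? '/index.html' : pathname;
  if (file in files) {
    return { status: 200, body: files[file] }; // vendored asset, byte-for-byte
  }
  return { status: 404, body: 'Not found' };
}
```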

Auth (cloning sites behind a login)

# Open the site in your browser, log in, copy the Cookie header from devtools,
# then pass it to sitepull:
sitepull https://app.example.com --cookie "session=eyJhbGc...; csrf=xyz"

The cookie is attached to every request the crawler/probe makes. Same-origin only — never sent to external CDNs.
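The same-origin rule can be expressed as a small guard when building request headers (a hypothetical helper to illustrate the behavior, not sitepull's code):

```javascript
// Attach the --cookie value only when the request targets the audited origin,
// so credentials never leak to third-party hosts or CDNs.
function headersFor(targetUrl, auditOrigin, cookie) {
  const headers = { 'user-agent': 'sitepull' };
  if (cookie && new URL(targetUrl).origin === auditOrigin) {
    headers.cookie = cookie;
  }
  return headers;
}
```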

Safety / responsible use

  • Polite by default: 50 ms delay between requests, max 6 concurrent fetches, respects robots.txt. Tune with --rate-ms and --concurrency.
  • Read-only: never sends POST requests except during the endpoint probe's POST-fuzz (skippable with --no-fuzz), and then only with safe payloads to a fixed list of known API paths.
  • No upstream forwarding: generated stubs never forward to the real third-party APIs (Stripe, OpenAI, etc.). They return placeholders so the local frontend can boot.
  • No secret theft: any API keys spotted in bundles are NOT copied into the generated server. Stubs are inert.
  • Don't audit what you don't have permission to: this tool is for your own sites, sites you've been hired to test, OSS frontends, and educational reverse-engineering of public static pages. Don't use it to scrape sites that prohibit it in their ToS.
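The politeness guarantees above (--concurrency and --rate-ms) amount to a small task scheduler: cap in-flight requests and pause between launches. A minimal sketch under those assumptions:

```javascript
// Return a function that runs async tasks with at most `concurrency`
// in flight, waiting `rateMs` after each completion before starting the next.
function makeLimiter(concurrency, rateMs) {
  let active = 0;
  const queue = [];
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task()
      .then(resolve, reject)
      .finally(async () => {
        active--;
        await sleep(rateMs); // polite delay before launching the next task
        next();
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```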

Architecture

bin/sitepull.mjs    ─ CLI entry, argv parser, error handling
lib/
├── audit.js        ─ Phases 1–6 orchestrator
├── recon.js        ─ Phase 1: fetch /, extract asset URLs from HTML
├── detect.js       ─ Phase 2: SPA/MPA classifier
├── probe.js        ─ Phase 3: endpoint sweep + POST fuzzer
├── crawl.js        ─ Phase 4 (MPA): BFS crawler with robots support
├── vendor.js       ─ Phase 4 (SPA + MPA assets): downloads to disk
├── stubs.js        ─ Phase 5a: synthesize safe stub branches
├── server-template.js ─ Phase 5b: builds server.js + package.json
├── report.js       ─ Phase 6: writes AUDIT.md + README.md
└── util.js         ─ shared (logging, fetch, hashing, throttle)

License

MIT