webstract

v1.1.0

Webstract extracts a website's complete HTML DOM along with all related CSS, JavaScript, and image assets, representing everything as a structured tree.

Downloads

2

Webstract

CLI to snapshot a web page for offline use: downloads HTML plus CSS/JS/images on the same site, rewrites references, and saves everything locally.

Install

npm install -g webstract          # after publishing
# or run without installing
npx webstract <url> <outputDir>

Quick start

yarn start <url> <outputDir> [--concurrency <n>] [--timeout <ms>]
yarn start https://example.com ./dump

Build once for distribution:

yarn build
node dist/cli.js <url> <outputDir>

What it does

  • Follows redirects; the final URL is the base for asset rewriting.
  • Saves index.html under <outputDir>/<domain>/ and rewrites references to point to downloaded files.
  • Downloads assets on the same registrable domain or same root label (e.g., daum.net and daumcdn.net), not just strict origin.
  • Collects linked CSS (link[rel=stylesheet]), JS (script[src]), images (img[src], img[srcset], source[src|srcset], icons), inline CSS in <style> elements and style= attributes, and meta images (OG/Twitter).
  • Parses downloaded CSS for @import and url(...) references on the same domain/root label.
  • External origins remain absolute; skipped items are listed in missing-assets.json. Use --download-external to force-download other domains (saved under a hostname-prefixed path).
  • Writes _WST.md summary with request/final URLs and download/skip/fail counts.
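
The CSS pass described in the list above can be sketched roughly as follows. This is a minimal illustration, not webstract's actual parser: extractCssRefs is a hypothetical helper, it relies on regular expressions rather than a real CSS tokenizer, and it leaves out the same-domain filtering step.

```typescript
// Hypothetical helper: collect absolute URLs referenced by a stylesheet
// through url(...) and string-form @import rules.
function extractCssRefs(css: string, stylesheetUrl: string): string[] {
  const found: string[] = [];
  const patterns: RegExp[] = [
    // url("a.png"), url('a.png') or url(a.png); this also covers @import url(...)
    /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/g,
    // @import "a.css" or @import 'a.css'
    /@import\s+(?:"([^"]+)"|'([^']+)')/g,
  ];

  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(css)) !== null) {
      const raw = (match[1] ?? match[2] ?? match[3] ?? "").trim();
      // Skip empty values, inline data URIs and fragment-only references.
      if (raw === "" || raw.startsWith("data:") || raw.startsWith("#")) continue;
      try {
        // Relative references resolve against the stylesheet, not the page.
        const absolute = new URL(raw, stylesheetUrl).href;
        if (found.indexOf(absolute) === -1) found.push(absolute);
      } catch {
        // Unparseable reference: ignore it in this sketch.
      }
    }
  }
  return found;
}
```

Resolving against the stylesheet URL matters because a reference such as ../img/bg.png inside /css/site.css points at /img/bg.png, regardless of which page linked the stylesheet.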

CLI options

| Option | Description | Default |
| --- | --- | --- |
| -c, --concurrency <n> | Concurrent downloads | WEBSTRACT_CONCURRENCY or 5 |
| -t, --timeout <ms> | Request timeout in ms | WEBSTRACT_TIMEOUT_MS or 15000 |
| -r, --retries <n> | Retry attempts per request | WEBSTRACT_MAX_RETRIES or 3 |
| --retry-delay <ms> | Delay between retries (exponential backoff) | WEBSTRACT_RETRY_DELAY_MS or 1000 |
| --user-agent <ua> | Custom User-Agent string | WEBSTRACT_USER_AGENT |
| --no-follow-redirects | Do not follow HTTP redirects | follow redirects |
| --insecure | Allow insecure TLS (self-signed) | off |
| --download-external | Force download of external-domain assets (prefixed by hostname) | off |
| --no-css-parse | Skip CSS @import/url() parsing | on |
| --no-meta | Skip meta (OG/Twitter) image discovery | on |
| --summary-format <md\|json> | _WST summary format | md |
| --output-name <name> | Override output folder name | derived from domain |
| --quiet / --verbose | Control log verbosity | normal |
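
The retry options mention exponential backoff without spelling out the schedule. As a rough sketch, assuming the delay doubles after each failed attempt and that --retries counts attempts made after the first try (neither is confirmed by this README), the waits could be computed like this. backoffDelays and withRetries are hypothetical names, not part of webstract's API.

```typescript
// Assumption: delay doubles each time, starting from the base delay.
function backoffDelays(retries: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(baseDelayMs * 2 ** attempt);
  }
  return delays;
}

// Run a task, retrying on failure with the schedule above.
// `sleep` is injectable so the waiting can be replaced in tests.
async function withRetries<T>(
  task: () => Promise<T>,
  retries: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = backoffDelays(retries, baseDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is still allowed.
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

With the documented defaults (3 retries, 1000 ms), this schedule waits 1000, 2000, and 4000 ms between attempts.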

Environment variables: WEBSTRACT_CONCURRENCY, WEBSTRACT_TIMEOUT_MS, WEBSTRACT_MAX_RETRIES, WEBSTRACT_RETRY_DELAY_MS, WEBSTRACT_USER_AGENT.
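
The Default column in the options table suggests a precedence of CLI flag first, then environment variable, then built-in default. A minimal sketch of that resolution, assuming this ordering and using hypothetical names (Flags, pickNumber, resolveOptions) that are not taken from webstract's source:

```typescript
type Env = Record<string, string | undefined>;

// Raw flag values as strings, the way a CLI parser would hand them over.
interface Flags {
  concurrency?: string;
  timeout?: string;
  retries?: string;
  retryDelay?: string;
}

// Return the first candidate that parses as a finite, non-negative number.
function pickNumber(
  flag: string | undefined,
  envValue: string | undefined,
  fallback: number,
): number {
  for (const candidate of [flag, envValue]) {
    if (candidate === undefined || candidate.trim() === "") continue;
    const n = Number(candidate);
    if (isFinite(n) && n >= 0) return n;
  }
  return fallback;
}

function resolveOptions(flags: Flags, env: Env) {
  return {
    concurrency: pickNumber(flags.concurrency, env.WEBSTRACT_CONCURRENCY, 5),
    timeoutMs: pickNumber(flags.timeout, env.WEBSTRACT_TIMEOUT_MS, 15000),
    retries: pickNumber(flags.retries, env.WEBSTRACT_MAX_RETRIES, 3),
    retryDelayMs: pickNumber(flags.retryDelay, env.WEBSTRACT_RETRY_DELAY_MS, 1000),
  };
}
```

In a Node CLI the second argument would typically be process.env; passing it in explicitly keeps the function easy to test.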

Output layout

<outputDir>/<domain>/
├─ index.html
├─ _WST.md                  # summary
├─ missing-assets.json      # only if something was skipped
├─ css/...
├─ js/...
└─ images/...               # file tree mirrors remote paths

Open index.html in a browser for the offline copy. Check _WST.md for a quick summary and missing-assets.json to see which external assets stayed remote.
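
To make the "file tree mirrors remote paths" idea concrete, here is one way a remote URL could be mapped to a path inside the output folder. This is only an illustration under stated assumptions: localPathFor is a hypothetical function, query strings are ignored, and webstract's real mapping may differ.

```typescript
// Hypothetical mapping from an asset URL to a path relative to <outputDir>/<domain>/.
function localPathFor(assetUrl: string, pageUrl: string): string {
  const page = new URL(pageUrl);
  // Relative asset references resolve against the (final) page URL.
  const asset = new URL(assetUrl, pageUrl);

  // Mirror the remote path, without the leading slash.
  let path = asset.pathname.replace(/^\/+/, "");
  // Directory-style URLs need a file name on disk.
  if (path === "" || path.charAt(path.length - 1) === "/") path += "index.html";

  // Assets from another host go under a hostname-prefixed folder.
  return asset.hostname === page.hostname ? path : asset.hostname + "/" + path;
}
```

For example, /css/site.css on the page's own host maps to css/site.css, while an asset from another host is placed under a folder named after that host.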

Programmatic use

import { webstract } from "webstract";

await webstract("https://example.com", "./dump/example.com");
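
For snapshotting several pages in one script, a small wrapper can collect failures instead of stopping at the first one. The sketch below assumes only what the call above shows: a function that takes (url, outputDir) and can be awaited. snapshotAll is a hypothetical helper, and the snapshot function is passed in so the wrapper does not depend on webstract's return value or error types.

```typescript
// Any function with the same shape as the call above.
type Snapshot = (url: string, outputDir: string) => Promise<unknown>;

async function snapshotAll(
  urls: string[],
  outputRoot: string,
  snapshot: Snapshot,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const url of urls) {
    try {
      // One folder per hostname, mirroring "./dump/example.com" above.
      const dir = outputRoot + "/" + new URL(url).hostname;
      await snapshot(url, dir);
      ok.push(url);
    } catch {
      // Invalid URLs and failed snapshots are both recorded as failures.
      failed.push(url);
    }
  }
  return { ok, failed };
}
```

Usage would look like `await snapshotAll(["https://example.com"], "./dump", webstract)`, with pages processed one at a time.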

Project layout

  • src/webstract.ts: Orchestrates extraction and options.
  • src/lib/: Shared utilities (HTTP client, logger).
  • src/extract/: Core extraction logic (collector, CSS parsing, downloader, rewriter, output).
  • CLI entry: src/cli.ts.

Environment variables are loaded via dotenv (quiet mode); keep your .env out of version control.