wayback-machine-downloader
v0.5.0
Interactive Wayback Machine downloader for archiving websites locally.
Downloads archived snapshots of a website from the Wayback Machine and saves them locally.
Requirements
Node.js 18 or later.
Installation
npm install -g wayback-machine-downloader
Or run directly from a local clone:
npm install
node cli.js [url] [options]
Usage
Interactive mode
Run without arguments to be guided through all options via prompts:
wayback-machine-downloader
Non-interactive mode
Pass a URL (domain or full URL) directly on the command line:
wayback-machine-downloader example.com [options]
wayback-machine-downloader --url example.com [options]
If both a positional URL and --url are given, --url takes precedence.
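The precedence rule between the positional URL and --url can be sketched as follows. This is an illustration only: resolveTargetUrl and the shape of its options argument are hypothetical names chosen for the example, not the package's actual internals.

```javascript
// Hypothetical sketch of the documented precedence rule; the function
// name and option shape are assumptions, not the package's own code.
function resolveTargetUrl(positionalUrl, options = {}) {
  // --url wins whenever it is present and non-empty.
  if (typeof options.url === "string" && options.url.trim() !== "") {
    return options.url.trim();
  }
  // Otherwise fall back to the positional argument, if one was given.
  if (typeof positionalUrl === "string" && positionalUrl.trim() !== "") {
    return positionalUrl.trim();
  }
  // No URL at all: a caller would presumably switch to interactive prompts.
  return null;
}

console.log(resolveTargetUrl("example.com", { url: "example.org" }));
console.log(resolveTargetUrl("example.com", {}));
```

Returning null rather than throwing leaves the decision about interactive mode to the caller.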
Options
Arguments:
  url                    Domain or URL to archive (same as --url)
Options:
  --url <url>            Domain or URL to archive
  --from <timestamp>     Start timestamp YYYYMMDDhhmmss (default: none)
  --to <timestamp>       End timestamp YYYYMMDDhhmmss (default: none)
  --threads <n>          Concurrent download threads (default: 3)
  --directory <path>     Output directory (default: websites/<host>/)
  --rewrite-links        Rewrite page links to relative paths
  --canonical <action>   Canonical tag handling: keep|remove (default: keep)
  --exact-url            Download only the exact URL, no wildcard /*
  --external-assets      Also download off-site (external) assets
  --debug                Enable verbose debug logging
  -h, --help             Show this help and exit
Examples
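The --from and --to options take 14-digit YYYYMMDDhhmmss timestamps. When generating these from a script, a small helper along the following lines may be convenient. toWaybackTimestamp is a hypothetical name, not something the package exports, and the sketch assumes the timestamps are interpreted as UTC, which is the usual Wayback Machine convention.

```javascript
// Hypothetical helper (not part of wayback-machine-downloader):
// format a Date as a 14-digit YYYYMMDDhhmmss string, assuming UTC.
function toWaybackTimestamp(date) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is zero-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

// Build the bounds for the calendar year 2020.
const from = toWaybackTimestamp(new Date(Date.UTC(2020, 0, 1, 0, 0, 0)));
const to = toWaybackTimestamp(new Date(Date.UTC(2020, 11, 31, 23, 59, 59)));
console.log(`wayback-machine-downloader example.com --from ${from} --to ${to}`);
```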
# Archive everything from example.com
wayback-machine-downloader example.com
# Archive snapshots from a specific year
wayback-machine-downloader example.com --from 20200101000000 --to 20201231235959
# Rewrite links for offline browsing; strip canonical tags
wayback-machine-downloader example.com --rewrite-links --canonical remove
# Download only the exact URL (no wildcard crawl) with 8 threads
wayback-machine-downloader https://example.com/blog/ --exact-url --threads 8
# Save to a custom directory
wayback-machine-downloader example.com --directory ./archive/example
Programmatic API
import { WaybackMachineDownloader, setDebugMode } from "wayback-machine-downloader";
import { normalizeBaseUrlInput } from "wayback-machine-downloader/lib/utils.js";
const base = normalizeBaseUrlInput("example.com");
const dl = new WaybackMachineDownloader({
  base_url: base.canonicalUrl,
  normalized_base: base,
  from_timestamp: 0,
  to_timestamp: 0,
  threads_count: 3,
  rewrite_mode: "as-is", // "as-is" | "relative"
  canonical_action: "keep", // "keep" | "remove"
  exact_url: false,
  download_external_assets: false,
  directory: null, // null = default websites/<host>/
});
await dl.download_files();
Output
Files are saved under websites/<host>/ by default. Each snapshot is stored at the path it had on the original site.
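To illustrate the layout described above, the following sketch maps an original URL to a path under websites/<host>/. It is not the package's own code: localPathFor is a hypothetical name, and storing directory-style URLs as index.html is an assumption made for the example.

```javascript
// Hypothetical sketch of the default output layout. The function name and
// the index.html fallback are assumptions, not documented behaviour.
function localPathFor(originalUrl, baseDir = "websites") {
  // URL is a global in Node.js 18 and later.
  const { hostname, pathname } = new URL(originalUrl);
  // Assumption: a URL ending in "/" is saved as index.html in that folder.
  const relative = pathname.endsWith("/") ? `${pathname}index.html` : pathname;
  return `${baseDir}/${hostname}${relative}`;
}

console.log(localPathFor("https://example.com/blog/post.html"));
console.log(localPathFor("https://example.com/"));
```

Passing a different baseDir mirrors what --directory does on the command line, although the exact directory structure the tool produces under a custom path is not specified here.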
