wayback-site-rescue-test
v1.0.3
Download and locally replay Wayback Machine captures with resumable state and typed APIs.
wayback-site-rescue
Download and replay archived websites from the Internet Archive Wayback Machine with a typed API and CLI.
This package helps you:
- fetch snapshots for a URL
- download pages and requisites (assets)
- rewrite links for local replay
- resume interrupted runs with state
- apply cleanup/SEO transforms (robots/sitemap/redirect helpers)
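To make the link-rewriting step more concrete, here is a minimal sketch of how a Wayback replay URL can be split back into its capture timestamp and original URL. It relies only on the public replay URL shape `https://web.archive.org/web/<timestamp>/<original>`, optionally with a two-letter modifier such as `im_` after the timestamp. The names `parseReplayUrl` and `ParsedCapture` are hypothetical and are not part of this package's documented API; the package's own rewriting logic may differ.

```typescript
// Hypothetical helper for illustration; not part of wayback-site-rescue's API.
interface ParsedCapture {
  timestamp: string; // 14-digit capture time, YYYYMMDDhhmmss
  original: string; // the URL that was archived
}

// Matches https://web.archive.org/web/<timestamp><optional modifier>/<original URL>
const REPLAY_URL =
  /^https?:\/\/web\.archive\.org\/web\/(\d{14})(?:[a-z]{2}_)?\/(.+)$/;

function parseReplayUrl(href: string): ParsedCapture | null {
  const match = REPLAY_URL.exec(href);
  if (match === null) {
    return null; // not a Wayback replay URL, leave it untouched
  }
  return { timestamp: match[1], original: match[2] };
}
```

A rewriter could call this on each `href` or `src` it finds and, when the result is not `null`, replace the attribute with a local path derived from `original`.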
Why this package?
Typical use cases:
- Site recovery / migration: Recover old pages and map them to a new domain.
- Archive-based static backup: Create a local copy for documentation or legal/ops needs.
- SEO-safe legacy handling: Generate sitemap.xml, robots.txt, and optional archived-404 redirects.
- Research and audits: Programmatically inspect historical captures.
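For the SEO use case, the following is a rough sketch of what generating a minimal sitemap.xml from a list of recovered page URLs could look like, following the Sitemaps protocol. The functions `escapeXml` and `buildSitemap` are hypothetical names chosen for illustration; the file this package actually writes may contain additional fields.

```typescript
// Hypothetical helpers for illustration; not part of wayback-site-rescue's API.

// Entity-escape the characters the Sitemaps protocol requires to be escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap with one <url><loc> entry per page URL.
function buildSitemap(urls: string[]): string {
  const entries = urls.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
    "",
  ].join("\n");
}
```

Escaping `&` first matters: doing it later would re-escape the ampersands introduced by the other replacements.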
Install
Package usage (consumer)
npm install wayback-site-rescue
or
bun add wayback-site-rescue
CLI usage
npx wayback-site-rescue --url https://example.com --list-only
Quick start (API)
import { runDownloader } from "wayback-site-rescue";
const result = await runDownloader({
url: "https://example.com",
listOnly: true,
});
console.log(result);
Quick start (CLI)
wayback-site-rescue --url https://example.com --directory ./downloads
Interactive prompt mode:
wayback-site-rescue --interactive --list-only
Common options
- --url <url>: target URL
- --directory <path>: output directory (default ./downloads)
- --from <timestamp>: start range (YYYYMMDDhhmmss)
- --to <timestamp>: end range (YYYYMMDDhhmmss)
- --list-only: query/list without downloading
- --exact-url: use exact URL matching in CDX
- --capture-concurrency <n>: concurrent capture workers
- --rate-limit-per-second <n>: global request pacing
- --recovery-domain <domain>: rewrite internal links/canonical/meta to a new domain
- --create-sitemap: write sitemap after run
- --block-scrapers-in-robots: generate restrictive robots
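As an illustration of the timestamp format and the CDX matching mentioned in the options, here is a small sketch that formats a `Date` as a 14-digit `YYYYMMDDhhmmss` value and builds a query URL for the public Wayback CDX endpoint. Two assumptions are made here: timestamps are treated as UTC, and a non-exact query is mapped to the CDX `prefix` match type. The names `toWaybackTimestamp`, `CdxQuery`, and `buildCdxUrl` are hypothetical; the package may construct its requests differently.

```typescript
// Hypothetical helpers for illustration; not part of wayback-site-rescue's API.

// Format a Date as a 14-digit timestamp (YYYYMMDDhhmmss), assuming UTC.
function toWaybackTimestamp(date: Date): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, "0");
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

interface CdxQuery {
  url: string;
  from?: string; // YYYYMMDDhhmmss, or a shorter prefix such as "2020"
  to?: string;
  exactUrl?: boolean; // assumption: false maps to matchType=prefix
}

// Build a CDX search URL; no request is sent here.
function buildCdxUrl(query: CdxQuery): string {
  const params = new URLSearchParams({
    url: query.url,
    output: "json",
    matchType: query.exactUrl ? "exact" : "prefix",
  });
  if (query.from) params.set("from", query.from);
  if (query.to) params.set("to", query.to);
  return `https://web.archive.org/cdx/search/cdx?${params.toString()}`;
}
```

The value returned by `toWaybackTimestamp` can be passed as `--from` or `--to` on the command line, for example to limit a run to captures from a single year.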
For a complete set, run:
wayback-site-rescue --help
Examples
See the examples/ directory for practical variants:
- examples/list-only.ts
- examples/download-with-rewrite.ts
- examples/seo-and-cleanup.ts
Run one with:
bun examples/list-only.ts
Development (repo)
This repository uses Bun for local tooling.
bun install
bun run check
Individual tasks:
bun run lint
bun run typecheck
bun run test:ci
bun run build
Credits, references, docs, and specs
This project builds on great open-source work and public standards. Credit where it’s due:
- Internet Archive / Wayback Machine
  - Wayback Machine: https://web.archive.org/
  - Internet Archive org/repositories: https://github.com/internetarchive
- CDX and archival replay context
  - Wayback CDX Server implementation: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server
- Core libraries used by this package
  - Commander: https://github.com/tj/commander.js
  - Inquirer prompts: https://github.com/SBoudrias/Inquirer.js
  - Axios: https://github.com/axios/axios
  - Cheerio: https://github.com/cheeriojs/cheerio
  - PQueue: https://github.com/sindresorhus/p-queue
- Tooling and registry workflows
  - Bun docs: https://bun.sh/docs
  - Bun package publishing: https://bun.sh/docs/cli/publish
  - npm registry docs: https://docs.npmjs.com/
- Specs referenced by generated outputs/behavior
  - Memento protocol (RFC 7089): https://www.rfc-editor.org/rfc/rfc7089
  - Robots Exclusion Protocol (RFC 9309): https://www.rfc-editor.org/rfc/rfc9309
  - Sitemaps protocol: https://www.sitemaps.org/protocol.html
License
MIT
