archive-url

v1.0.1

Archive URLs using the Internet Archive's Wayback Machine - CLI tool and Node.js library

archive-url

A TypeScript/JavaScript library and CLI tool for archiving URLs using the Internet Archive's Wayback Machine.

Features

  • 📦 Archive URLs using the Wayback Machine
  • 💻 Use as a CLI tool or Node.js library
  • 📄 Process CSV and TXT files with URLs
  • 🔄 Automatic retry logic and rate limiting
  • ⚡ Batch processing with progress tracking
  • 📝 Full TypeScript support
  • 🚀 Simple and intuitive API

Installation

As a Global CLI Tool

npm install -g archive-url

In Your Project

npm install archive-url

CLI Usage

Archive a Single URL

npm run archive https://example.com

Output:

🔍 Archiving URL: https://example.com

✅ Success!

📦 Archive URL:
https://web.archive.org/web/20260501135114/https://example.com

ℹ️  Details:
  • Original URL: https://example.com
  • Timestamp: 20260501135114
  • New snapshot: Yes

Process CSV Files

npm run archive data.csv

Expected CSV format: Any CSV with a column named url, link, uri, href, or website

Output: Creates data_archived.csv with an additional archive_url column

Example:

# Input: data.csv
date,topic,link
2024-01-01,Example,https://example.com

# Output: data_archived.csv
date,topic,link,archive_url
2024-01-01,Example,https://example.com,https://web.archive.org/web/...
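
The column lookup described above can be pictured with a short sketch. `findUrlColumn` is a hypothetical helper, not part of the archive-url API, and the case-insensitive, whitespace-trimmed matching is an assumption; the README only lists the accepted column names.

```javascript
// Column names the README lists as recognized URL columns.
const URL_COLUMN_NAMES = ['url', 'link', 'uri', 'href', 'website'];

// Hypothetical helper: return the index of the first recognized URL column
// in a CSV header line, or -1 when none is present.
// Assumption: names are compared case-insensitively after trimming.
function findUrlColumn(headerLine) {
  const columns = headerLine.split(',').map(name => name.trim().toLowerCase());
  return columns.findIndex(name => URL_COLUMN_NAMES.includes(name));
}

console.log(findUrlColumn('date,topic,link')); // 2
console.log(findUrlColumn('date,topic'));      // -1
```

A result of -1 corresponds to the "CSV not processed" case covered under Troubleshooting.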

Process Text Files

npm run archive urls.txt

Text file format: One URL per line, or comma-separated:

https://example.com
https://github.com
https://stackoverflow.com

Or:

https://example.com, https://github.com, https://stackoverflow.com

Output: Creates urls_archived.csv with url and archive_url columns
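
As a rough illustration of the two accepted text layouts, the sketch below splits on both newlines and commas. `parseUrlList` is a hypothetical name and this is not necessarily how the CLI reads the file; note that a URL containing a literal comma would be split by this approach.

```javascript
// Hypothetical helper: extract URLs from text that holds one URL per line,
// comma-separated URLs, or a mix of both.
function parseUrlList(text) {
  return text
    .split(/[\n,]/)                      // break on newlines and commas
    .map(entry => entry.trim())          // drop surrounding whitespace, including a stray \r
    .filter(entry => entry.length > 0);  // skip blank lines and empty entries
}

console.log(parseUrlList('https://example.com, https://github.com'));
// [ 'https://example.com', 'https://github.com' ]
```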

CLI Options

# When going through npm run, put "--" before the arguments so npm passes
# the flags to the script instead of reading them as its own options

# Force new snapshot (don't reuse existing archives)
npm run archive -- data.csv --force-new

# Custom timeout (milliseconds)
npm run archive -- data.csv --timeout 60000

# Custom retry attempts
npm run archive -- data.csv --retries 5

# Custom output file
npm run archive -- data.csv --output results.csv

# Combine options
npm run archive -- data.csv -f -t 60000 -r 5 -o output.csv

Available options:

  • -f, --force-new - Force new snapshot
  • -t, --timeout <ms> - Request timeout (default: 30000)
  • -r, --retries <n> - Retry attempts (default: 3)
  • -o, --output <path> - Custom output path
  • -h, --help - Show help
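
To show how these flags line up with the library options documented later, here is a minimal sketch built on Node's `util.parseArgs` (available in Node 18.3 and later). `parseCliArgs` is a hypothetical function; the CLI's real argument handling may differ.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: turn CLI arguments into an input path plus the options
// object accepted by archiveUrl/archiveUrls. Defaults follow the list above.
function parseCliArgs(args) {
  const { values, positionals } = parseArgs({
    args,
    allowPositionals: true,
    options: {
      'force-new': { type: 'boolean', short: 'f' },
      timeout: { type: 'string', short: 't' },
      retries: { type: 'string', short: 'r' },
      output: { type: 'string', short: 'o' },
      help: { type: 'boolean', short: 'h' },
    },
  });

  return {
    input: positionals[0],  // URL, CSV file, or text file
    output: values.output,  // undefined means "use the default output path"
    help: values.help === true,
    options: {
      forceNew: values['force-new'] === true,
      timeout: Number(values.timeout ?? 30000),
      retries: Number(values.retries ?? 3),
    },
  };
}

console.log(parseCliArgs(['data.csv', '-f', '-t', '60000', '-r', '5', '-o', 'output.csv']));
```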

Get Help

npm run archive -- --help

Library Usage

Basic Import

// CommonJS
const { archiveUrl, archiveUrls } = require('archive-url');

// ES Modules
import { archiveUrl, archiveUrls } from 'archive-url';

// TypeScript
import { archiveUrl, archiveUrls, ArchiveResult } from 'archive-url';

Archive a Single URL

const { archiveUrl } = require('archive-url');

async function main() {
  const result = await archiveUrl('https://example.com');
  
  if (result.success) {
    console.log('Archive URL:', result.archiveUrl);
    console.log('Is new snapshot:', result.isNewSnapshot);
    console.log('Timestamp:', result.timestamp);
  } else {
    console.error('Error:', result.error);
  }
}

main();

Archive Multiple URLs

const { archiveUrls } = require('archive-url');

async function main() {
  const urls = [
    'https://example.com',
    'https://github.com',
    'https://stackoverflow.com'
  ];
  
  const results = await archiveUrls(urls, {}, (completed, total, result) => {
    console.log(`[${completed}/${total}] ${result.originalUrl}`);
    console.log(`Archive: ${result.archiveUrl}`);
  });
  
  // Summary
  const successful = results.filter(r => r.success).length;
  console.log(`Successful: ${successful}/${results.length}`);
}

main();

With Options

const result = await archiveUrl('https://example.com', {
  forceNew: true,    // Force new snapshot
  timeout: 60000,    // 60 second timeout
  retries: 5         // Retry 5 times on failure
});

Check if Already Archived

const { checkArchived } = require('archive-url');

async function main() {
  // Resolves to an ArchiveResult, or null when no archive exists
  const existing = await checkArchived('https://example.com');

  if (existing) {
    console.log('Already archived:', existing.archiveUrl);
  } else {
    console.log('Not archived yet');
  }
}

main();

API Reference

archiveUrl(url, options?)

Archive a single URL.

Parameters:

  • url (string): URL to archive
  • options (object, optional):
    • forceNew (boolean): Force new snapshot (default: false)
    • timeout (number): Timeout in milliseconds (default: 30000)
    • retries (number): Retry attempts (default: 3)
    • retryDelay (number): Delay between retries in ms (default: 1000)

Returns: Promise<ArchiveResult>

interface ArchiveResult {
  originalUrl: string;      // Original URL
  archiveUrl: string;       // Wayback Machine URL
  timestamp?: string;       // Archive timestamp (e.g., "20240101120000")
  isNewSnapshot: boolean;   // Whether a new snapshot was created
  success: boolean;         // Success status
  error?: string;           // Error message if failed
}
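
Because every call resolves to an object of this shape, a batch can be condensed into a small report. The sketch below is illustrative only: `summarizeResults` is a hypothetical helper and the sample values are made up.

```javascript
// Hypothetical helper: summarize an array of ArchiveResult-shaped objects.
function summarizeResults(results) {
  const summary = { total: results.length, succeeded: 0, failed: 0, newSnapshots: 0, errors: [] };
  for (const result of results) {
    if (result.success) {
      summary.succeeded += 1;
      if (result.isNewSnapshot) summary.newSnapshots += 1;
    } else {
      summary.failed += 1;
      summary.errors.push({ url: result.originalUrl, error: result.error });
    }
  }
  return summary;
}

// Sample data shaped like ArchiveResult; not real output.
const sampleResults = [
  {
    originalUrl: 'https://example.com',
    archiveUrl: 'https://web.archive.org/web/20260501135114/https://example.com',
    timestamp: '20260501135114',
    isNewSnapshot: true,
    success: true,
  },
  {
    originalUrl: 'https://example.org',
    archiveUrl: '',
    isNewSnapshot: false,
    success: false,
    error: 'Request timed out',
  },
];

console.log(summarizeResults(sampleResults));
```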

archiveUrls(urls, options?, onProgress?)

Archive multiple URLs with automatic rate limiting.

Parameters:

  • urls (string[]): Array of URLs
  • options (object, optional): Same as archiveUrl
  • onProgress (function, optional): (completed, total, result) => void

Returns: Promise<ArchiveResult[]>

Note: Automatically applies a 4-second delay between requests (rate limiting)

checkArchived(url)

Check whether a URL has an existing archive.

Parameters:

  • url (string): URL to check

Returns: Promise<ArchiveResult | null>

getLatestArchive(url)

Get the most recent archive URL.

Parameters:

  • url (string): URL to look up

Returns: Promise<string | null>


Examples

Example 1: Simple Archiving

const { archiveUrl } = require('archive-url');

async function main() {
  const result = await archiveUrl('https://example.com');
  console.log(result.archiveUrl);
  // Example output: https://web.archive.org/web/20260501135114/https://example.com
}

main();

Example 2: Force New Snapshot

const result = await archiveUrl('https://example.com', { forceNew: true });
console.log('New archive created:', result.archiveUrl);

Example 3: Process CSV Data

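const { archiveUrls } = require('archive-url');
const fs = require('fs');

async function main() {
  // Read the CSV, dropping blank lines (such as a trailing newline)
  const csvContent = fs.readFileSync('urls.csv', 'utf-8');
  const lines = csvContent.split(/\r?\n/).filter(line => line.trim() !== '');

  // This simple example assumes the URL is the first column and that
  // no field contains a comma
  const urls = lines.slice(1).map(line => line.split(',')[0]);

  // Archive URLs
  const results = await archiveUrls(urls);

  // Create output CSV: header first, then one archive URL per data row
  const output = lines.map((line, i) => {
    if (i === 0) return line + ',archive_url';
    const archiveUrl = results[i - 1]?.archiveUrl || 'ERROR';
    return `${line},"${archiveUrl}"`;
  }).join('\n');

  fs.writeFileSync('output.csv', output);
}

main();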
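
Example 3 appends the archive URL to each existing line as-is, which works as long as no field contains a comma or quote. When rows are built from separate values, a small escaping helper keeps the output valid. This is a generic sketch following common CSV quoting rules (RFC 4180 style); `csvEscape` and `toCsvRow` are hypothetical names, not library functions.

```javascript
// Quote a single CSV field when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function csvEscape(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Join already-separated field values into one CSV line.
function toCsvRow(fields) {
  return fields.map(csvEscape).join(',');
}

console.log(toCsvRow(['2024-01-01', 'Example, with comma', 'https://example.com']));
// 2024-01-01,"Example, with comma",https://example.com
```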

Example 4: Smart Archiving (Check First)

const { checkArchived, archiveUrl } = require('archive-url');

async function smartArchive(url) {
  // Check if already archived
  const existing = await checkArchived(url);
  
  if (existing) {
    console.log('Using existing archive:', existing.archiveUrl);
    return existing;
  }
  
  // Create new archive
  const result = await archiveUrl(url);
  console.log('Created new archive:', result.archiveUrl);
  return result;
}

smartArchive('https://example.com');

Example 5: Batch Processing with Progress

const { archiveUrls } = require('archive-url');

async function main() {
  const urls = ['https://example.com', 'https://github.com'];

  // The callback runs once per URL with the running count and that URL's result
  const results = await archiveUrls(urls, {}, (completed, total, result) => {
    console.log(`\n[${completed}/${total}] Processing: ${result.originalUrl}`);

    if (result.success) {
      console.log(`✅ Archived: ${result.archiveUrl}`);
      console.log(`   New snapshot: ${result.isNewSnapshot ? 'Yes' : 'No'}`);
    } else {
      console.log(`❌ Failed: ${result.error}`);
    }
  });

  return results;
}

main();

Example 6: Express.js Integration

const express = require('express');
const { archiveUrl } = require('archive-url');

const app = express();
app.use(express.json());

app.post('/api/archive', async (req, res) => {
  const { url } = req.body;
  
  if (!url) {
    return res.status(400).json({ error: 'URL required' });
  }
  
  const result = await archiveUrl(url);
  
  if (result.success) {
    res.json({
      archiveUrl: result.archiveUrl,
      timestamp: result.timestamp,
      isNew: result.isNewSnapshot
    });
  } else {
    res.status(500).json({ error: result.error });
  }
});

app.listen(3000);
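
The handler above only checks that a URL was supplied. An endpoint that accepts arbitrary input may also want to reject values that are not absolute http(s) URLs before calling archiveUrl. The library exports its own isValidUrl (see Utility Functions), whose exact rules are not documented here, so the sketch below uses a standalone, hypothetical `isHttpUrl` built on the standard URL class.

```javascript
// Hypothetical helper: accept only absolute http(s) URLs.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // new URL() throws on input it cannot parse
  }
}

console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('ftp://example.com'));   // false
console.log(isHttpUrl('not a url'));           // false
```

In the Express route, a failed check could return the same 400 response used for a missing URL.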

Example 7: Error Handling

const { archiveUrl } = require('archive-url');

async function safeArchive(url) {
  try {
    const result = await archiveUrl(url, {
      timeout: 30000,
      retries: 3
    });
    
    if (result.success) {
      return { success: true, url: result.archiveUrl };
    } else {
      return { success: false, error: result.error };
    }
  } catch (error) {
    console.error('Unexpected error:', error);
    return { success: false, error: error.message };
  }
}

safeArchive('https://example.com').then(result => console.log(result));

How Archiving Works

Understanding the Wayback Machine

The Internet Archive's Wayback Machine saves snapshots of web pages over time. This library interacts with their Save API.

Two Archiving Modes

1. Reuse Existing Archives (Default)

const result = await archiveUrl('https://example.com');
// May return existing archive if available

  • Checks if URL is already archived
  • Returns existing archive if found (faster)
  • Creates new snapshot only if needed
  • result.isNewSnapshot is false when an existing archive is reused, true when a new snapshot had to be created

2. Force New Snapshot

const result = await archiveUrl('https://example.com', { forceNew: true });
// Always creates new snapshot

  • Always requests new snapshot
  • Takes longer (4-10 seconds)
  • result.isNewSnapshot will be true
  • Use when you need current content
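
The difference between the two modes comes down to one branch. The sketch below shows that decision flow only; it is not the library's implementation. `lookup` and `save` are hypothetical injected functions standing in for "find an existing snapshot" and "request a new snapshot", stubbed here so the flow runs without network access.

```javascript
// Decision flow for the two modes. `lookup` and `save` are hypothetical
// stand-ins supplied by the caller; they are not part of the archive-url API.
async function archiveWithMode(url, { forceNew = false } = {}, { lookup, save }) {
  if (!forceNew) {
    const existing = await lookup(url);
    if (existing) {
      // Reuse path: nothing new was created.
      return { ...existing, isNewSnapshot: false };
    }
  }
  // Forced, or nothing to reuse: request a fresh snapshot.
  const created = await save(url);
  return { ...created, isNewSnapshot: true };
}

// Stubbed dependencies returning made-up snapshots.
const stubs = {
  lookup: async url => ({
    originalUrl: url,
    archiveUrl: `https://web.archive.org/web/20240101120000/${url}`,
    success: true,
  }),
  save: async url => ({
    originalUrl: url,
    archiveUrl: `https://web.archive.org/web/20260501135114/${url}`,
    success: true,
  }),
};

async function demo() {
  const reused = await archiveWithMode('https://example.com', {}, stubs);
  const forced = await archiveWithMode('https://example.com', { forceNew: true }, stubs);
  console.log(reused.isNewSnapshot, forced.isNewSnapshot); // false true
}

demo();
```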

Archive URL Format

Archive URLs follow this pattern:

https://web.archive.org/web/[TIMESTAMP]/[ORIGINAL_URL]

Example:
https://web.archive.org/web/20260501135114/https://example.com
                            └─timestamp──┘

Timestamp format: YYYYMMDDHHmmss

  • 20260501135114 = May 1, 2026 at 13:51:14 UTC
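
For code that needs the snapshot time as a Date, or needs to assemble an archive URL from its parts, the format above can be handled in a few lines. Both function names are hypothetical and separate from the library's own formatTimestamp utility.

```javascript
// Parse a 14-digit Wayback Machine timestamp (YYYYMMDDHHmmss, UTC) into a Date.
function parseWaybackTimestamp(timestamp) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(timestamp);
  if (!match) {
    throw new Error(`Expected a 14-digit timestamp, got "${timestamp}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC takes a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Assemble an archive URL following the pattern shown above.
function buildArchiveUrl(timestamp, originalUrl) {
  return `https://web.archive.org/web/${timestamp}/${originalUrl}`;
}

console.log(parseWaybackTimestamp('20260501135114').toISOString());
// 2026-05-01T13:51:14.000Z
console.log(buildArchiveUrl('20260501135114', 'https://example.com'));
// https://web.archive.org/web/20260501135114/https://example.com
```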

Rate Limiting

The library automatically handles rate limiting:

  • Single URL: No delay
  • Multiple URLs: 4-second delay between requests
  • Recommended: ~15 URLs per minute
  • Built-in retries: 3 attempts by default
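
The "~15 URLs per minute" figure follows directly from the 4-second delay (60 / 4 = 15). For planning large batches, the arithmetic can be wrapped in two small helpers. The function names are hypothetical, and the estimate covers only the pauses between requests, assuming no pause after the last one; the time each request itself takes comes on top.

```javascript
// Minimum time spent waiting between requests for a batch of `count` URLs,
// assuming a fixed delay between consecutive requests and none after the last.
function minimumBatchDelaySeconds(count, delaySeconds = 4) {
  return Math.max(0, count - 1) * delaySeconds;
}

// Upper bound on requests per minute implied by the delay alone.
function maxRequestsPerMinute(delaySeconds = 4) {
  return Math.floor(60 / delaySeconds);
}

console.log(maxRequestsPerMinute());        // 15
console.log(minimumBatchDelaySeconds(100)); // 396
```

So a 100-URL batch spends at least 396 seconds, a little over six and a half minutes, in delays alone.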

Output File Locations

CSV files:

  • Input: data.csv
  • Output: data_archived.csv (same directory)

Text files:

  • Input: urls.txt
  • Output: urls_archived.csv (same directory)

Custom output:

npm run archive -- data.csv --output /custom/path/result.csv
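
The default naming rule (input name plus _archived.csv, in the same directory) can be reproduced with Node's path module, for example to predict where a script should look for the results. `defaultOutputPath` is a hypothetical helper that mirrors the rule as described here, not the CLI's actual code.

```javascript
const path = require('node:path');

// Hypothetical helper mirroring the naming rule above:
// <input name>_archived.csv, placed next to the input file.
function defaultOutputPath(inputPath) {
  const { dir, name } = path.parse(inputPath);
  return path.join(dir, `${name}_archived.csv`);
}

console.log(defaultOutputPath('data.csv')); // data_archived.csv
console.log(defaultOutputPath('urls.txt')); // urls_archived.csv
```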

Important Notes

  1. URL Normalization: URLs are normalized (fragments removed, trailing slashes handled)
  2. Error Handling: Failed URLs are marked with ERROR: in CSV output
  3. Progress Tracking: Use progress callbacks for long-running operations
  4. Timeouts: Default 30-second timeout per request
  5. Success Rate: Check result.success before using archive URLs
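
To make the first note more concrete, here is one way fragment removal and trailing-slash handling could look using the standard URL class. This is a hypothetical sketch: the library's own normalizeUrl may apply different rules, and the choice to strip the slash only from a bare root path is an assumption.

```javascript
// Hypothetical sketch of fragment removal and root trailing-slash handling.
function stripFragment(input) {
  const url = new URL(input);
  url.hash = ''; // drop "#section"
  const text = url.toString();
  // Assumption: only a bare root path ("/") with no query loses its slash.
  return url.pathname === '/' && url.search === '' ? text.replace(/\/$/, '') : text;
}

console.log(stripFragment('https://example.com/#section'));    // https://example.com
console.log(stripFragment('https://example.com/docs/#intro')); // https://example.com/docs/
```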

Advanced Topics

TypeScript Usage

Full TypeScript support with type definitions:

import { 
  archiveUrl, 
  archiveUrls,
  checkArchived,
  ArchiveOptions,
  ArchiveResult 
} from 'archive-url';

// Type-safe options
const options: ArchiveOptions = {
  forceNew: true,
  timeout: 60000,
  retries: 5
};

// Type-safe result
const result: ArchiveResult = await archiveUrl('https://example.com', options);

// URLs to archive
const urls: string[] = ['https://example.com', 'https://github.com'];

// Type-safe progress callback
await archiveUrls(
  urls,
  options,
  (completed: number, total: number, result: ArchiveResult) => {
    console.log(`${completed}/${total}`);
  }
);

Utility Functions

import { isValidUrl, normalizeUrl, formatTimestamp } from 'archive-url';

// Validate URL
const valid = isValidUrl('https://example.com');  // true

// Normalize URL
const normalized = normalizeUrl('https://example.com/#section');
// Returns: https://example.com

// Format timestamp
const date = formatTimestamp('20260501135114');
// Returns: "2026-05-01 13:51:14"

Custom Delays for Batch Processing

const { archiveUrl } = require('archive-url');

async function batchWithCustomDelay(urls, delayMs = 5000) {
  const results = [];
  
  for (let i = 0; i < urls.length; i++) {
    const result = await archiveUrl(urls[i]);
    results.push(result);
    
    if (i < urls.length - 1) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  
  return results;
}

Next.js Integration

// pages/api/archive.ts
import type { NextApiRequest, NextApiResponse } from 'next';
import { archiveUrl, ArchiveResult } from 'archive-url';

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }
  
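  const { url } = req.body;

  // Reject requests without a URL, as in the Express example
  if (!url) {
    return res.status(400).json({ error: 'URL required' });
  }

  const result: ArchiveResult = await archiveUrl(url);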
  
  if (result.success) {
    res.status(200).json({ archiveUrl: result.archiveUrl });
  } else {
    res.status(500).json({ error: result.error });
  }
}

Troubleshooting

Issue: Rate limiting errors

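Give each request more time and more attempts by raising timeout and retries (a larger retryDelay also spaces the attempts further apart):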

await archiveUrl(url, {
  timeout: 60000,
  retries: 5
});
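
To see what these two settings control, here is a generic retry loop. It is a sketch, not the library's implementation: `withRetries` is a hypothetical helper, and it assumes `retries` counts total attempts, in line with the "3 attempts by default" wording under Rate Limiting.

```javascript
// Generic retry loop: run `task` up to `retries` times, pausing `retryDelay`
// milliseconds between failed attempts, and rethrow the last error if all fail.
async function withRetries(task, { retries = 3, retryDelay = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await new Promise(resolve => setTimeout(resolve, retryDelay));
      }
    }
  }
  throw lastError;
}

// Usage with a stand-in task that fails twice before succeeding.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error('temporarily unavailable');
  return 'ok';
};

withRetries(flaky, { retries: 5, retryDelay: 10 })
  .then(value => console.log(value, calls)); // ok 3
```

Raising `retries` adds attempts, while raising `retryDelay` gives a throttled service more time to recover between them.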

Issue: CSV not processed

Ensure your CSV has a column named one of: url, link, uri, href, website

Issue: TypeScript errors

Ensure proper TypeScript configuration:

// tsconfig.json
{
  "compilerOptions": {
    "moduleResolution": "node",
    "esModuleInterop": true
  }
}

Common Use Cases

1. Backup Important Links

const urls = [
  'https://important-article.com',
  'https://research-paper.com'
];
const results = await archiveUrls(urls);

2. Research Paper Citations

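async function archiveCitations(citations) {
  // Only citations that carry a URL can be archived
  const withUrl = citations.filter(citation => citation.url);

  // archiveUrls spaces requests out (4-second delay), unlike a bare loop over archiveUrl
  const archived = await archiveUrls(withUrl.map(citation => citation.url));

  // Map each citation id to its archive URL (results are assumed to keep input order)
  const results = {};
  withUrl.forEach((citation, i) => {
    results[citation.id] = archived[i].archiveUrl;
  });
  return results;
}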

3. News Article Preservation

async function preserveArticles(articles) {
  const urls = articles.map(a => a.url);
  const results = await archiveUrls(urls);
  
  return articles.map((article, i) => ({
    ...article,
    archiveUrl: results[i].archiveUrl
  }));
}

License

MIT

Contributing

Contributions welcome! Please submit a Pull Request.

Support

For bugs and feature requests, create an issue on GitHub.