@overbrowsing/wasteback-machine
v2.0.2
JavaScript library for measuring archived web pages.
Wasteback Machine
What is Wasteback Machine?
Wasteback Machine is a JavaScript library for analysing archived web pages, measuring their size and composition to support retrospective, quantitative web research.
Features
- Archive-agnostic access: Works with web archives that use the Memento Protocol and expose the unmodified archived page via the id_ endpoint.
- Page composition analysis: Analyses the full structure of an archived page, including HTML, stylesheets, scripts, images, fonts, and more.
- Resource inventory: Optionally produces a structured list of all discovered resources with their URLs, types, and byte sizes.
- Byte-accurate measurement: Precisely measures the size of each resource, cleans stylesheets and scripts to remove archive-injected content, and excludes any resources that are not part of the original page.
- Completeness scoring: Calculates how completely an archived page and its resources were retrieved.
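As an illustration of the first feature, the sketch below assembles a replay URL in the pattern visible in this README's example output (`<archive>/web/<datetime><modifier>/<original URL>`). The helper name `buildReplayUrl` and its parameters are hypothetical and not part of the library's API. The `/web/` path segment is taken from the Wayback Machine examples, and other archives may lay out their URLs differently.

```javascript
// Hypothetical helper (not part of the library's API): builds a
// Wayback-style replay URL from its parts.
// Assumption: the archive uses the "/web/<datetime><modifier>/<url>" layout
// seen in the Wayback Machine examples in this README.
function buildReplayUrl(archiveBase, datetime, originalUrl, modifier = "id_") {
  if (!/^\d{14}$/.test(datetime)) {
    throw new Error(`Expected a 14-digit datetime, got "${datetime}"`);
  }
  const base = archiveBase.replace(/\/+$/, ""); // drop any trailing slashes
  return `${base}/web/${datetime}${modifier}/${originalUrl}`;
}

const rawUrl = buildReplayUrl(
  "https://web.archive.org",
  "19961112181513",
  "https://nytimes.com"
);
console.log(rawUrl);
// https://web.archive.org/web/19961112181513id_/https://nytimes.com
```

The modifier defaults to `id_` because that is the endpoint the feature list names for unmodified content; pass a different modifier as the fourth argument if an archive requires one.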
Supported Web Archives
| Web Archive | Organisation | Web Archive ID ⭐️ |
|---|---|---|
| Arquivo.pt | 🇵🇹 FCCN/FCT | arq |
| Australia Web Archive (Trove) | 🇦🇺 National Library of Australia | awa |
| Webarchiv | 🇨🇿 National Library of the Czech Republic | cz |
| Government of Canada Web Archive | 🇨🇦 Library and Archives Canada | gcwa |
| Wayback Machine | 🇺🇸 Internet Archive | ia |
| Icelandic Web Archive (Vefsafn.is) | 🇮🇸 National and University Library of Iceland | iwa |
| Library of Congress Web Archive | 🇺🇸 Library of Congress | loc |
| National Library of Ireland Web Archive | 🇮🇪 National Library of Ireland | nliwa |
| New Zealand Web Archive | 🇳🇿 National Library of New Zealand | nzwa |
| PRONI Web Archive | 🇬🇧 The Public Record Office of Northern Ireland | pwa |
| Spletni Arhiv | 🇸🇮 National and University Library of Slovenia | slo |
| UK Government Web Archive (UKGWA) | 🇬🇧 The National Archives | ukgwa |
| ~~UK Web Archive~~ (Offline) | 🇬🇧 British Library | ukwa |
⭐️ This ID is used to select the web archive you want to query.
Adding a New Web Archive
If you maintain a web archive not currently supported, please contact us at [email protected].
Installation
To install Wasteback Machine as a dependency in your project using npm:
npm i @overbrowsing/wasteback-machine
Usage
Wasteback Machine provides two primary functions:
- Fetch available memento-datetimes within a specific web archive for a given URL and time range.
- Analyse a specific memento from a specific web archive to measure its page size and composition.
1. Fetch Available Memento-datetimes
Get all mementos for https://nytimes.com between 1996 and 2025 from the Wayback Machine (ia)
import { getMementos } from "@overbrowsing/wasteback-machine";
const mementos = await getMementos(
"ia", // Web archive ID (ia = Wayback Machine)
"https://nytimes.com", // Target URL
1996, // Start year
2025 // End year
);
console.log(mementos);
Example Output
[
'19961112181513',
'19961112181513',
'19961112181513',
'19961219002950'...
]
2. Analyse a Specific Memento
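getMementoSizes takes a single memento-datetime, so a common step between the two calls is choosing one value from the list that getMementos returns. The sketch below is one way to do that. The helper names `parseMementoDatetime` and `closestMemento` are hypothetical and not part of the library's API. It assumes the 14-digit `YYYYMMDDhhmmss` format seen in the example output above and treats the values as UTC.

```javascript
// Hypothetical helpers (not part of the library's API).

// Convert a 14-digit memento-datetime (YYYYMMDDhhmmss) to a Date.
// Assumption: the timestamp is expressed in UTC.
function parseMementoDatetime(stamp) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(stamp);
  if (!match) {
    throw new Error(`Not a 14-digit memento-datetime: "${stamp}"`);
  }
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Return the memento-datetime closest in time to the target Date,
// or null when the list is empty.
function closestMemento(stamps, target) {
  let best = null;
  let bestGap = Infinity;
  for (const stamp of stamps) {
    const gap = Math.abs(parseMementoDatetime(stamp) - target);
    if (gap < bestGap) {
      best = stamp;
      bestGap = gap;
    }
  }
  return best;
}

const sampleStamps = ["19961112181513", "19961219002950"];
const targetDate = new Date(Date.UTC(1996, 10, 15)); // 15 November 1996

console.log(parseMementoDatetime(sampleStamps[0]).toISOString());
// 1996-11-12T18:15:13.000Z
console.log(closestMemento(sampleStamps, targetDate));
// 19961112181513
```

The selected string can then be passed as the third argument to getMementoSizes.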
Analyse the 12 November 1996 memento of https://nytimes.com from the Wayback Machine (ia)
import { getMementoSizes } from "@overbrowsing/wasteback-machine";
const mementoData = await getMementoSizes(
"ia", // Web Archive ID (ia = Wayback Machine)
"https://nytimes.com", // Target URL
"19961112181513", // Memento datetime
{ includeResources: true } // Resource list (true/false)
);
console.log(mementoData);
Example Output
{
url: 'https://nytimes.com',
requestedMemento: '19961112181513',
memento: '19961112181513',
mementoUrl: 'https://web.archive.org/web/19961112181513if_/https://nytimes.com',
archive: 'Wayback Machine',
archiveOrg: 'Internet Archive',
archiveUrl: 'https://web.archive.org',
sizes: {
html: { bytes: 1653, count: 1 },
stylesheet: { bytes: 0, count: 0 },
script: { bytes: 0, count: 0 },
image: { bytes: 46226, count: 2 },
video: { bytes: 0, count: 0 },
audio: { bytes: 0, count: 0 },
font: { bytes: 0, count: 0 },
flash: { bytes: 0, count: 0 },
plugin: { bytes: 0, count: 0 },
data: { bytes: 0, count: 0 },
document: { bytes: 0, count: 0 },
other: { bytes: 0, count: 0 },
total: { bytes: 47879, count: 3 }
},
completeness: '100%',
resources: [
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
type: 'image',
size: 45259
},
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
type: 'image',
size: 967
}
]
}
Wasteback Machine CLI
The Wasteback Machine CLI lets you easily query web archives, fetch mementos for a given URL and date, and see page size, composition, and estimated emissions using CO2.js.
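The page size and composition figures that the CLI reports can also be derived from the `sizes` object returned by getMementoSizes. The following is a minimal sketch, not the CLI's actual implementation. The helper name `summariseComposition` is hypothetical, and the sketch assumes 1 KB = 1024 bytes and percentages rounded to one decimal place. Emission estimates are not reproduced here, since they depend on CO2.js.

```javascript
// Hypothetical helper (not part of the library or the CLI).
// `sizes` is assumed to have the shape shown in the getMementoSizes
// example output: { <type>: { bytes, count }, ..., total: { bytes, count } }.
function summariseComposition(sizes) {
  const totalBytes = sizes.total.bytes;
  const rows = Object.entries(sizes)
    // Skip the aggregate entry and any resource type that is absent.
    .filter(([type, entry]) => type !== "total" && entry.count > 0)
    .map(([type, entry]) => ({
      type,
      count: entry.count,
      bytes: entry.bytes,
      // Share of the page total, to one decimal place.
      percent:
        totalBytes > 0 ? ((entry.bytes / totalBytes) * 100).toFixed(1) : "0.0",
    }));
  // Assumption: 1 KB = 1024 bytes.
  return { totalKB: (totalBytes / 1024).toFixed(2), rows };
}

// Values copied from the example output in this README (zero entries trimmed).
const exampleSizes = {
  html: { bytes: 1653, count: 1 },
  script: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

const summary = summariseComposition(exampleSizes);
console.log(`Data: ${summary.totalKB} KB`); // Data: 46.76 KB
for (const row of summary.rows) {
  console.log(`${row.type.toUpperCase()}: ${row.bytes} bytes (${row.percent}%)`);
}
// HTML: 1653 bytes (3.5%)
// IMAGE: 46226 bytes (96.5%)
```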
Quick Start
To start the Wasteback Machine CLI using npm:
npm run cli
CLI Prompts
1. Enter web archive ID ('help' to list archives or [Enter ↵] = Wayback Machine):
2. Enter URL to analyse:
3. Enter target year (YYYY):
4. Enter target month (MM or [Enter ↵] = 01):
5. Enter target day (DD or [Enter ↵] = 01):
Example Output
________________________________________________________
MEMENTO INFO
Memento URL: https://web.archive.org/web/19961112181513if_/https://nytimes.com
Web Archive: Wayback Machine
Organisation: Internet Archive
Website: https://web.archive.org
________________________________________________________
PAGE SIZE
Data: 46.76 KB
Emissions: 0.014 g CO₂e
Completeness: 100%
________________________________________________________
PAGE COMPOSITION
HTML
Count: 1
Data: 1653 bytes (3.5%)
Emissions: 0.000 g CO₂e
IMAGE
Count: 2
Data: 46226 bytes (96.5%)
Emissions: 0.013 g CO₂e
________________________________________________________
Methodology
For details of the underlying methodology, assumptions, and limitations, please refer to our paper (DOI: 10.1371/journal.pclm.0000767).
Wasteback Machine was developed as part of doctoral research at The University of Edinburgh’s Institute for Design Informatics.
Disclaimer
[!IMPORTANT] Wasteback Machine is provided for informational and research purposes only. The authors make no guarantees about the accuracy of the results and disclaim any liability for their use. Use of Wasteback Machine is subject to the terms of service of each respective web archive.
Contributing
Contributions are welcome! Please submit an issue or a pull request.
License
Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.
