
gutenbergscraper

v1.0.3

Published

A scraper for Project Gutenberg that lets you scrape data into datasets. Very customizable and friendly.

Maintainers

whitzscott

Keywords

gutenberg scraper, node, typescript, web scraping, gutenberg downloader, book scraper, book downloader, gutenberg api, node.js, http request, parallel scraping, gutenberg books, open source, project gutenberg, scrape books, scraping library, data extraction, html parser, axios, cheerio, csv output, json output, txt output, book metadata, ebook downloader, scraper framework, text extraction, web crawler, async scraping, scraper with retries

Readme

Gutenberg Scraper

The Gutenberg Scraper is a tool designed to scrape content from Project Gutenberg. But how does it work?

The Gutenberg Scraper uses parallel requests and automatic retries to speed up the scraping process for Node.js applications. It is written primarily in TypeScript.

If you'd like to use this scraper, here's how to set it up.

You'll likely notice a file named index.ts; this is where you begin. By default, it contains example code such as:

import { Scraper } from './Scraper';

const scraper = new Scraper({
  useBooknum: [12, 50],  // Scrape books from 12 to 50
  FormatOutput: 'csv',   // Output format will be CSV
  userAgent: 'Mozilla/5.0',
  timeout: 5000          // Set a timeout for requests
}, 10, 3); // Scrape 10 books at once and retry 3 times in case of failure

scraper.scrape();

In this example:

  • useBooknum: [12, 50] specifies the range of books to scrape, from book number 12 to 50.
  • FormatOutput: 'csv' indicates that the output will be in CSV format. You can also choose other formats, such as TXT or JSON.
  • userAgent: 'Mozilla/5.0' sets a custom user-agent to help prevent the scraper from being blocked by the website.
  • timeout: 5000 sets the timeout for each request to 5000 milliseconds (5 seconds).

The second part of the constructor, 10 and 3, represents:

  • 10: The number of parallel requests to make at once. This allows the scraper to scrape multiple books simultaneously, speeding up the process.
  • 3: The number of retry attempts in case a request fails. If a book fails to scrape, the scraper will retry up to 3 times before giving up. (A rough sketch of this batching-and-retry pattern follows this list.)
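To make those two numbers concrete, here is a minimal sketch of the batching-and-retry pattern they describe. This is an illustration only, not the package's actual internals; the fetch-based helper and the Gutenberg plain-text URL format are assumptions:

// Illustrative only: a simplified version of batching with retries.
// Uses Node 18+'s global fetch; the Gutenberg URL below is an assumption.

async function fetchWithRetries(url: string, retries: number): Promise<string> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.text();
    } catch (err) {
      if (attempt === retries) throw err; // out of retries: give up
    }
  }
  throw new Error('unreachable');
}

async function scrapeRange(from: number, to: number, parallel: number, retries: number) {
  const bookNums = Array.from({ length: to - from + 1 }, (_, i) => from + i);
  for (let i = 0; i < bookNums.length; i += parallel) {
    const batch = bookNums.slice(i, i + parallel); // e.g. 10 books at a time
    await Promise.all(batch.map((n) =>
      fetchWithRetries(`https://www.gutenberg.org/cache/epub/${n}/pg${n}.txt`, retries)
    ));
  }
}

scrapeRange(12, 50, 10, 3); // mirrors the example: books 12-50, 10 at once, 3 retries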

Once you've set this up, calling scraper.scrape() will start the scraping process based on the provided configuration. You can set the output format to CSV, JSON, or TXT, whichever you prefer.
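For example, switching the earlier setup from CSV to JSON is a one-line change. This reuses the option names from the example above as-is; treat it as a sketch rather than a verified reference for the package's full API:

import { Scraper } from './Scraper';

const jsonScraper = new Scraper({
  useBooknum: [12, 50],
  FormatOutput: 'json',  // 'csv' and 'txt' work the same way
  userAgent: 'Mozilla/5.0',
  timeout: 5000
}, 10, 3);

jsonScraper.scrape();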

To use it, first install the package by running npm i gutenbergscraper. Once it's installed, open Command Prompt or PowerShell, run npm i to install the dependencies, then npm run start, and you're done~!