
@cherastain/scraper

v2.0.0


A simple scraper using Puppeteer to evaluate content for further page processing.

Basic usage

import { Scraper } from "@cherastain/scraper";

const scraper = new Scraper();
const url = "https://www.npmjs.com/";
(async () => {
  const result = await scraper.process(url);
  console.log(result);
})();

// expected result (default template): { hrefs: [...] }

Examples

Template option

Each template property you define yields a result property with the same name. For a result such as:

{
  title: "Title of page",
  subTitles: [...],
  links: [...]
}

the template should be defined as:

const template: IScraperTemplate = {
  title: { selector: "h1", format: ["unique"] },
  subTitles: "h2",
  links: { selector: "a", format: { attr: "href" } },
};

Remark: the default `format` value is the DOM element's `innerText`.

import { Scraper, IScraperTemplate } from "@cherastain/scraper";

const scraper = new Scraper();
const url = "https://www.npmjs.com/";
const template: IScraperTemplate = {
  title: { selector: "h1", format: ["unique"] },
  subTitles: "h2",
  links: { selector: "a", format: { attr: "href" } },
};
(async () => {
  const result = await scraper.process(url, { template });
  console.log(result);
})();

Preprocess

The following example uses the `preProcess` option to:

  • scroll to the bottom of the page
  • change every href to "foo"

and uses a template to get a unique link href formatted as `-${x}-`.

import { Scraper, IScraperTemplate, IScraperOptions } from "@cherastain/scraper";
import { Page } from "puppeteer";

const s = new Scraper();
const url = "https://www.npmjs.com/";
const template: IScraperTemplate = {
  firstLinkHref: {
    selector: "a",
    format: [{ attr: "href" }, "unique", (x) => `-${x}-`],
  },
};
const options: IScraperOptions = {
  preProcess: [
    "scrollBottom",
    async (page: Page) => {
      await page.evaluate(() => {
        //@ts-ignore
        const links = [...document.getElementsByTagName("a")];
        links.forEach((link) => {
          link.href = "foo";
        });
      });
    },
  ],
  template,
};
(async () => {
  const result = await s.process(url, options);
  console.log(result);
})();

// expected result : { firstLinkHref: "-foo-"}

Html head tags and attributes template

const template: IScraperTemplate = {
  metas: {
    selector: "//head/meta", // meta tag
    format: [{ attr: "content" }, { attr: "property" }], // content & property attributes
  },
  title: {
    selector: "//head/title", // title tag
    format: ["unique"],
  },
};

Documentation

Scraper class

process method

Executes the scraping process.

scraperInstance.process(url, options);

| Parameter | Type            | Description   | Default |
| --------- | --------------- | ------------- | ------- |
| url       | string          | url to scrape |         |
| options   | IScraperOptions | (optional)    | `{ template: { hrefs: { selector: "a", format: { attr: "href" } } } }` |

getCookies method

When the isCookiesPersisted scraper option is set to true, returns the cookies persisted once process has been called.

scraperInstance.getCookies();

isVerboseEnabled static property

Set to true to enable verbose output globally.

Scraper.isVerboseEnabled = true;

Contracts

IScraperOptions interface

| Property           | Type                                                  | Description |
| ------------------ | ----------------------------------------------------- | ----------- |
| cookies            | Cookie[]                                              | (optional) cookies to use during the scraping process |
| isConsoleEnabled   | boolean                                               | (optional) enable console output from page evaluation |
| isCookiesPersisted | boolean                                               | (optional) persist cookies from one process call to another |
| isRobotIgnored     | boolean                                               | (optional) ignore robots.txt on the scraped domain |
| isVerboseEnabled   | boolean                                               | (optional) enable verbose debugging messages |
| preProcess         | (((page: Page) => Promise) \| "scrollBottom")[]       | (optional) functions called before scraping occurs |
| template           | IScraperTemplate                                      | (optional) template to use for the scrape result |
| userAgent          | string                                                | (optional) user-agent as seen by the scraped site |

IScraperTemplate interface

A template property can take any of the following forms:

Example 1

{
  links: "a" // string as html tag
}

Example 2

{
  links: ".link" // string as css class
}

Example 3

{
  links: "/html/body/a" // string as xpath
}

Example 4

{
  links: {
    selector: "a"
  } // IScraperSelectorIdentifier, equivalent to Example 1
}
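The three string forms above can be told apart by their first character. As a rough illustration (this helper is hypothetical and not part of the package, but it mirrors the convention the examples describe):

```typescript
// Hypothetical helper illustrating how the selector string forms above
// could be distinguished: xpath starts with "/", a css class with ".",
// and anything else is treated as an html tag name.
type SelectorKind = "xpath" | "cssClass" | "htmlTag";

function classifySelector(selector: string): SelectorKind {
  if (selector.startsWith("/")) return "xpath";    // e.g. "/html/body/a"
  if (selector.startsWith(".")) return "cssClass"; // e.g. ".link"
  return "htmlTag";                                // e.g. "a"
}

console.log(classifySelector("a"));            // htmlTag
console.log(classifySelector(".link"));        // cssClass
console.log(classifySelector("/html/body/a")); // xpath
```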

IScraperSelectorIdentifier interface

| Property | Type                                       | Description |
| -------- | ------------------------------------------ | ----------- |
| selector | string                                     | can be an html tag name, a css class (prefixed with `.`) or an xpath |
| format   | IScraperTemplate \| ScraperValueFormater[] | (optional)  |

ScraperValueFormater type

By default, result values are the DOM element's innerText, but they can be formatted using:

| Format                   | Description |
| ------------------------ | ----------- |
| { attr: string }         | value will be the given attribute of the DOM container |
| "html"                   | value will be the innerHTML of the DOM container |
| "unique"                 | value will be unique (instead of an array) |
| (value: any) => string   | final value is computed during post-processing by the given function, from the value set by the other formatters |
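The formatters compose: the attribute (or html) formatter picks the raw values, "unique" collapses the array to a single value, and a function formatter post-processes each value. A minimal DOM-free sketch of these semantics, assuming a hypothetical `applyFormatters` helper and a fake element shape (neither is part of the package):

```typescript
// Hypothetical sketch of formatter-chain semantics, matching the
// preProcess example's chain [{ attr: "href" }, "unique", (x) => `-${x}-`].
type Formatter = { attr: string } | "html" | "unique" | ((value: any) => string);

// A minimal DOM-free stand-in for a scraped element.
interface FakeElement {
  innerText: string;
  innerHTML: string;
  attrs: Record<string, string>;
}

function applyFormatters(elements: FakeElement[], formatters: Formatter[]): any {
  // Default: each element's innerText.
  let values: any[] = elements.map((el) => el.innerText);
  let unique = false;
  for (const f of formatters) {
    if (f === "unique") {
      unique = true;                                  // collapse array to one value
    } else if (f === "html") {
      values = elements.map((el) => el.innerHTML);    // take innerHTML instead
    } else if (typeof f === "function") {
      values = values.map(f);                         // post-process each value
    } else {
      values = elements.map((el) => el.attrs[f.attr]); // take the given attribute
    }
  }
  return unique ? values[0] : values;
}

const links: FakeElement[] = [
  { innerText: "npm", innerHTML: "<b>npm</b>", attrs: { href: "foo" } },
];
console.log(applyFormatters(links, [{ attr: "href" }, "unique", (x) => `-${x}-`]));
// -foo-
```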

Versions changelog

2.0.0

  • scraping options removed from the constructor
  • template/selector disambiguation
  • process options now include the template
  • getCookies added
  • verbose can be enabled globally by setting Scraper.isVerboseEnabled to true

1.2.1

  • new isCookiesPersisted and cookies options added

1.1.3

  • unique formatter no longer throws an error when no element is found
  • dependencies updated

1.1.2

  • formatter can manage several attribute formats ({ attr: string })

1.1.1

  • fixed the robots.txt check that invalidated paths starting with a disallow rule

1.1.0

  • robots.txt check (enabled by default)