@letsscrapedata/scraper

v0.0.90

A web scraper that scrapes web pages using LetsScrapeData XML templates.

For help or to discuss how to scrape a website, join the Discord server, where responses are quick. Please submit issues on GitHub for better tracking.

Features

  1. Template-driven web scraping
  • You can quickly design templates for scraping different websites.
  • Templates are intuitive and easy to maintain.
  2. Browser operations supported by the controller package
  • Same interface for playwright, patchright, camoufox, puppeteer, and cheerio: easy to switch between them
  • Web browsing automation: goto (open) / click / input / hover / select / scroll
  • Automatic captcha solver: reCAPTCHA (v2 & v3), Cloudflare Turnstile, GeeTest (v3 & v4), image/text, coordinate
  • State data management: cookies, localStorage, HTTP headers, custom session data
  • Element selection by CSS selectors or XPath, inside or outside frames
  • Automatic file saving: e.g. screenshots, pdf, mhtml, downloaded directly or by clicking
  3. API requests
  • The browser and the API can be used at the same time, sharing cookies and headers.
  • HTTP headers: intercepted, generated automatically or by browser automation, or obtained via the API
  4. Fingerprint management
  • Automatically generates fingerprints of the latest common browsers
  5. Simple rate limits: automatic flow control, such as interval / max concurrency / times per period
  6. Simple proxy management: multiple "static" proxies to increase concurrency (see the configuration sketch after this list)
  7. Subtasks: complex tasks can be split into multiple simple subtasks for better maintenance and increased concurrency
  8. Data export
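
Rate limits and proxies (features 5 and 6 above) are configured through the ScraperConfig fields documented below. A minimal sketch with illustrative values and placeholder proxy URLs:

// typescript
import { scraper, ScraperConfig, TemplateTasks } from "@letsscrapedata/scraper";

const config: ScraperConfig = {
  /* feature 6: two "static" proxies -> two browsers, doubling concurrency */
  browserConfigs: [
    { proxyUrl: "http://proxy1:port" },
    { proxyUrl: "http://proxy2:port" },
  ],
  /* feature 5: at most 4 concurrent tasks across all templates ... */
  totalMaxConcurrency: 4,
  /* ... and at least 2000 ms between two tasks of the same template */
  minMiliseconds: 2000,
};

const tasks: TemplateTasks[] = [{ tid: 10001, parasstrs: ["1", "2"] }];
await scraper(tasks, config);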

Install

npm install @letsscrapedata/scraper

Examples

  1. Example with default ScraperConfig:
// javascript
import { scraper } from "@letsscrapedata/scraper";

/**
 * tid: ID of the template to execute, e.g. template 10001 scrapes one list of examples on the page "https://www.letsscrapedata.com/pages/listexample1.html"
 * parasstrs: input parameters of the tasks, such as "1"
 * This example executes five tasks using template 10001; each task scrapes the data on one page.
 */
const newTasks = [{ tid: 10001, parasstrs: ["1", "2", "3", "4", "5"] }];

/* The following line can do the same thing using subtasks, scraping the data in the first five pages */
// const newTasks = [{ tid: 10002, parasstrs: ["5"] }];

await scraper(newTasks);
  2. Example with a custom ScraperConfig
// typescript
import { scraper, TemplateTasks, ScraperConfig } from "@letsscrapedata/scraper";

const scraperConfig: ScraperConfig = {
  browserConfigs: [
    /* launch a chromium browser using puppeteer, no proxy */
    { browserControllerType: "puppeteer", proxyUrl: "" },
    /* launch a chromium browser using playwright, with a proxy */
    { browserControllerType: "playwright", proxyUrl: "http://proxyIp:port" },
    /* connect to the current browser using patchright */
    { browserUrl: "http://localhost:9222/" },
  ],
  // exitWhenCompleted: true,
  // lsdLaunchOptions: { headless: true },
  // loadUnfinishedTasks: true,
  // loadFailedTasksInterval: 5
  // captcha: { clientKey: "xxx" } // to solve captchas using 2captcha
};

const newTasks: TemplateTasks[] = [{ tid: 10002, parasstrs: ["9"] }];

await scraper(newTasks, scraperConfig);

ScraperConfig

Common configurations:

  • Proxies and browsers: browserConfigs; by default one browser is launched using browserControllerType/browserType, without a proxy
  • Launch options of the browser: lsdLaunchOptions, default { headless: false }
  • Whether to load unfinished tasks: loadUnfinishedTasks, default false
  • Whether to exit when completed: exitWhenCompleted, default false
  • File format of scraped data: dataFileFormat, default "jsonl"
  • API key of the captcha solver: captcha.clientKey
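
Putting the common options together, a sketch of a typical configuration (the 2captcha client key is a placeholder):

// typescript
import { ScraperConfig } from "@letsscrapedata/scraper";

const config: ScraperConfig = {
  /* one playwright-controlled browser, no proxy */
  browserConfigs: [{ browserControllerType: "playwright", proxyUrl: "" }],
  lsdLaunchOptions: { headless: true },        // default is { headless: false }
  loadUnfinishedTasks: true,                   // default false
  exitWhenCompleted: true,                     // default false
  dataFileFormat: "csv",                       // default "jsonl"
  captcha: { clientKey: "YOUR_2CAPTCHA_KEY" }, // placeholder key
};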

Complete configurations:

export interface ScraperConfig {
  /**
   * @default false
   */
  exitWhenCompleted?: boolean;
  /**
   * whether to use the parasstr from the XML template when a task's parasstr is ""
   * @default false
   */
  useParasstrInXmlIfNeeded?: boolean;
  /**
   * whether to load unfinished tasks
   * @default false
   */
  loadUnfinishedTasks?: boolean;
  ////////////////////////////////////////////////////////////////////////////    directory
  /**
   * @default "", which will use current directory of process + "/data/"
   * if not empty, baseDir must be an absolute path, and the directory must exist and have read and write permissions.
   */
  baseDir?: string;
  /**
   * the filename in action_setvar_get/get_file must include inputFileDirPart for security.
   * @default "LetsScrapeData"
   */
  inputFileDirPart?: string;
  ////////////////////////////////////////////////////////////////////////////    browser
  /**
   * whether to use puppeteer-extra-plugin-stealth (prefer patchright instead)
   * @default false
   */
  useStealthPlugin?: boolean;
  /**
   * default browserControllerType of BrowserConfig
   * @default "patchright"
   */
  browserControllerType?: BrowserControllerType;
  /**
   * default browserType of BrowserConfig
   * @default "chromium"
   */
  browserType?: LsdBrowserType;
  /**
   * @default { headless: false, geoip: true }
   */
  lsdLaunchOptions?: LsdLaunchOptions;
  /**
   * @default {browserUrl: ""}
   */
  lsdConnectOptions?: LsdConnectOptions;
  /**
   * Important: browsers to be launched (using proxyUrl) or connected to (using browserUrl)
   * @default [{proxyUrl: ""}], which launches one default browser using the default browser controller type, with no proxy
   */
  browserConfigs?: BrowserConfig[];
  ////////////////////////////////////////////////////////////////////////////    captcha
  captcha?: {
    /**
     * clientKey of 2captcha
     */
    clientKey: string;
    // if you need to solve captcha in camoufox, please contact administrator
  },
  ////////////////////////////////////////////////////////////////////////////    template
  /**
   * the default maximum number of concurrent tasks that can execute the same template in a browserContext
   * @default 1
   */
  maxConcurrency?: number;
  /**
   * @default ""
   */
  readCode?: string;
  /**
   * @default []
   */
  templateParas?: TemplatePara[];
  ////////////////////////////////////////////////////////////////////////////    scheduler
  /**
   * @default 10
   */
  totalMaxConcurrency?: number;
  /**
   * minimum milliseconds between two tasks of the same template
   * @default 2000
   */
  minMiliseconds?: number,
  ////////////////////////////////////////////////////////////////////////////    data
  /**
   * whether to move all dat_* files into a new directory "yyyyMMddHHmmss"
   * @default false
   */
  moveDataWhenStart?: boolean;
  /**
   ** DataFileFormat = "csv" | "jsonl" | "tsv" | "txt";
   * @default "jsonl"
   */
  dataFileFormat?: DataFileFormat;
  /**
   * valid only when dataFileFormat is "txt"
   */
  columnSeperator?: string;
}

/**
 * Only one of browserUrl and proxyUrl will take effect, and browserUrl has higher priority.
 */
export interface BrowserConfig {
  browserControllerType?: BrowserControllerType;
  /**
   * URL used to connect to the current browser
   ** the URL starts with "http://", such as "http://localhost:9222/"
   ** browserUrl can be used when you have logged in manually in advance.
   */
  browserUrl?: string;
  /**
   * proxy
   ** no proxy will be used if proxyUrl is ""
   ** valid only if !browserUrl
   */
  proxyUrl?: string;
  /**
   * type of browser to be launched
   * valid only if !browserUrl
   * @default "chromium"
   */
  browserType?: LsdBrowserType;
}
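
As an illustration of the precedence rule above (placeholder URLs; this sketch assumes BrowserConfig is exported by the package):

// typescript
import { BrowserConfig } from "@letsscrapedata/scraper";

const browserConfigs: BrowserConfig[] = [
  /* browserUrl takes priority: connect to a browser started beforehand
     (e.g. after a manual login) with remote debugging on port 9222;
     the proxyUrl here would be ignored */
  { browserUrl: "http://localhost:9222/", proxyUrl: "http://ignored:port" },
  /* no browserUrl: launch a chromium browser through the proxy */
  { browserType: "chromium", proxyUrl: "http://proxyIp:port" },
];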