n8n-nodes-scraper-web

This is an n8n community node for web scraping using Cheerio and Crawlee.

n8n is a fair-code licensed workflow automation platform.

Features

  • Scrape Single Page: Extract data from a single web page using CSS selectors
  • Crawl Website: Crawl multiple pages following internal links
  • Flexible Extraction: Extract text, HTML, or specific attributes
  • Multiple Selectors: Define multiple CSS selectors to extract different data points
  • Crawl Control: Control max pages, depth, and link patterns
  • Same Domain Filtering: Option to stay within the same domain while crawling

Installation

Follow the installation guide in the n8n community nodes documentation.

npm

npm install n8n-nodes-scraper-web

n8n

In n8n, go to Settings > Community Nodes and install:

n8n-nodes-scraper-web

Operations

Scrape Single Page

Extract data from a single web page.

Parameters:

  • URL: The URL to scrape
  • Extraction Mode: Choose between CSS selectors, full HTML, or text content
  • Selectors: Define CSS selectors to extract specific data

Example:

URL: https://example.com
Selector: .title -> Extract text
Result: { title: "Example Title", url: "https://example.com" }
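
Under the hood, the node parses fetched HTML with Cheerio. The following TypeScript sketch approximates the single-page extraction above using a plain fetch plus Cheerio; it illustrates the mechanism, not the node's actual source:

import * as cheerio from 'cheerio';

// Fetch a page and extract the text of the first element matching `.title`,
// mirroring the example above. The real node adds retries, timeouts, etc.
async function scrapeSinglePage(url: string): Promise<{ title: string; url: string }> {
  const response = await fetch(url);
  const $ = cheerio.load(await response.text());
  return { title: $('.title').first().text().trim(), url };
}

scrapeSinglePage('https://example.com').then(console.log);
// e.g. { title: 'Example Title', url: 'https://example.com' }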

Crawl Website

Crawl multiple pages following internal links.

Parameters:

  • Start URLs: Starting URLs for crawling (one per line)
  • Max Pages: Maximum number of pages to crawl
  • Max Depth: Maximum depth of crawling
  • Link Selector: CSS selector for links to follow (default: a[href])
  • Pagination Selector: CSS selector specifically for pagination links (e.g., .pagination a, a[aria-label*="next"]). Leave empty to use the Link Selector for all links.
  • Same Domain Only: Only crawl pages on the same domain (default: true)

Example:

Start URLs: https://example.com
Max Pages: 50
Max Depth: 2
Link Selector: a[href]
Pagination Selector: .pagination a
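
The crawl operation is built on Crawlee. As a rough orientation, the parameters above map onto Crawlee's CheerioCrawler approximately like this (an assumption about the internals, not the node's actual source):

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
  maxRequestsPerCrawl: 50, // Max Pages
  async requestHandler({ request, $, enqueueLinks }) {
    console.log(`Visited ${request.url}: ${$('title').text()}`);
    // Follow links matched by the Link Selector; the 'same-domain'
    // strategy corresponds to Same Domain Only: true. Max Depth would
    // be tracked separately (e.g. via request.userData, not shown).
    await enqueueLinks({ selector: 'a[href]', strategy: 'same-domain' });
  },
});

await crawler.run(['https://example.com']); // Start URLs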

CSS Selectors

The node supports standard CSS selectors:

  • Element: div, p, a
  • Class: .classname
  • ID: #idname
  • Attribute: [href], [data-id]
  • Combined: div.content > p.text
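
Selector matching is delegated to Cheerio, so these behave exactly as they do there. A small illustrative snippet with hypothetical markup:

import * as cheerio from 'cheerio';

const $ = cheerio.load(
  '<div class="content"><p id="intro" class="text" data-id="42">Hello</p></div>'
);

$('p').text();                    // Element selector   -> 'Hello'
$('.text').text();                // Class selector     -> 'Hello'
$('#intro').text();               // ID selector        -> 'Hello'
$('[data-id]').attr('data-id');   // Attribute selector -> '42'
$('div.content > p.text').text(); // Combined selector  -> 'Hello'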

Extraction Options

For each selector, you can extract:

  • Text: The text content of the element
  • HTML: The HTML content of the element
  • Attribute: A specific attribute value (e.g., href, src)

You can also choose to extract:

  • Single: Only the first matching element
  • Multiple: All matching elements (returns an array)
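
In Cheerio terms these options map roughly onto the following calls (a sketch under that assumption, reusing a loaded `$` as in the snippet above):

$('.item').first().text();        // Text, Single
$('.item').first().html();        // HTML, Single
$('a.item').first().attr('href'); // Attribute (href), Single
$('.item').map((_, el) => $(el).text()).toArray(); // Text, Multiple -> string[]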

Advanced Options

  • User Agent: Custom user agent string
  • Timeout: Request timeout in milliseconds
  • Max Retries: Maximum number of retries for failed requests
  • Wait For: Wait time before scraping (useful for dynamic content)
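
How the node applies these internally is not documented here, but a plain-Node sketch covers the same four ideas (the helper and values are illustrative, not the node's API):

// Hypothetical helper demonstrating User Agent, Timeout, Max Retries, and Wait For.
async function fetchWithOptions(url: string): Promise<string> {
  const maxRetries = 3;                           // Max Retries
  await new Promise((r) => setTimeout(r, 1_000)); // Wait For: 1000 ms before scraping
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const res = await fetch(url, {
        headers: { 'User-Agent': 'my-custom-agent/1.0' }, // User Agent
        signal: AbortSignal.timeout(30_000),              // Timeout: 30000 ms
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.text();
    } catch (err) {
      if (attempt === maxRetries) throw err; // retries exhausted
    }
  }
  throw new Error('unreachable');
}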

Examples

Extract Article Titles and Links

Operation: Scrape Single Page
URL: https://news.example.com
Selectors:
  - Field: titles, Selector: .article-title, Extract: text, Multiple: true
  - Field: links, Selector: .article-link, Extract: attribute (href), Multiple: true
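
Because both selectors use Multiple: true, each field comes back as an array, so the output item would look roughly like this (values invented for illustration):

Result: {
  titles: ["First headline", "Second headline"],
  links: ["/articles/first", "/articles/second"],
  url: "https://news.example.com"
}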

Crawl Blog Posts

Operation: Crawl Website
Start URLs: https://blog.example.com
Max Pages: 20
Max Depth: 2
Link Selector: a.post-link
Selectors:
  - Field: title, Selector: h1.post-title, Extract: text
  - Field: content, Selector: .post-content, Extract: text
  - Field: author, Selector: .author-name, Extract: text

Crawl a Blog

Operation: Crawl Website
Start URLs: https://blog.example.com
Max Pages: 50
Max Depth: 2
Link Selector: a[href]
Same Domain Only: Yes

Selectors:
  - Field: title, Selector: h1, Extract: text
  - Field: content, Selector: .post-content, Extract: text
  - Field: author, Selector: .author-name, Extract: text

Scrape Paginated Results (e.g., Real Estate Listings)

Operation: Crawl Website
Start URLs: https://www.vivareal.com.br/venda/rj/niteroi/bairros/centro/apartamento_residencial/
Max Pages: 100
Max Depth: 1
Pagination Selector: .olx-core-pagination a, a[aria-label*="página"]
Same Domain Only: Yes

Selectors:
  - Field: title, Selector: h2.property-card__title, Extract: text, Multiple: Yes
  - Field: price, Selector: .property-card__price, Extract: text, Multiple: Yes
  - Field: link, Selector: a.property-card__content-link, Extract: attribute (href), Multiple: Yes

Tips for Pagination:

  • Use Pagination Selector to target only pagination links (next page, page numbers)
  • Set Max Depth: 1 to avoid following links inside individual listings
  • Set Max Pages to the number of result pages you want to scrape
  • The crawler will automatically follow pagination links and extract data from each page
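
In Crawlee terms, the pagination-only strategy amounts to passing the Pagination Selector to enqueueLinks so that only paging links are followed (a sketch under the same assumptions as the crawler example above):

// Inside the crawler's requestHandler:
await enqueueLinks({
  selector: '.pagination a, a[aria-label*="next"]', // Pagination Selector
  strategy: 'same-domain',                          // Same Domain Only: Yes
});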

Dependencies

  • Cheerio - Fast, flexible HTML parsing
  • Crawlee - Web scraping and browser automation library

Compatibility

  • Requires n8n version 1.0.0 or later
  • Node.js 18.10 or later

Resources

  • n8n community nodes documentation
  • Cheerio documentation
  • Crawlee documentation

License

MIT