npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

n8n-nodes-crawl4ai-plus

v5.1.0

Published

n8n nodes for Crawl4AI v0.8.0 web crawler and data extraction with enhanced features

Readme

Crawl4AI Plus for n8n

Enhanced fork targeting Crawl4AI v0.8.0 with a progressive-disclosure two-node architecture: a Simple node (4 operations) for general users and an Advanced node (15 operations) for power users.

Project History & Attribution

This is a maintained fork with enhanced features for Crawl4AI 0.8.0.

Fork Chain

All credit for the original implementation goes to Heictor Hsiao and Matias Lopez.

v5.0.0 is a breaking change — the 3-node architecture (BasicCrawler, ContentExtractor, SmartExtract) has been replaced with 2 new nodes. Existing workflows will need rebuilding. See CHANGELOG.md for full details.


Features

Crawl4AI Plus — Simple Node (4 operations)

Designed for general users with smart defaults and minimal configuration:

  • Get Page Content — Crawl a URL and get markdown (single page, follow links, or full site via crawl scope)
  • Ask Question — Ask a question about a page using LLM extraction
  • Extract Data — Extract contact info, financial data, or custom structured data (regex presets + LLM)
  • CSS Extractor — Extract structured data using CSS selectors

Crawl4AI Plus Advanced — Advanced Node (15 operations in 3 groups)

Full API control via 3 standardized collections (Browser & Session, Crawl Settings, Output & Filtering):

Crawling

  • Crawl URL — Single URL with full browser/crawler/output configuration
  • Crawl Multiple URLs — Manual list or recursive discovery (BFS/DFS/BestFirst strategies)
  • Stream Crawl — Streaming via /crawl/stream for large URL sets
  • Process Raw HTML — Process pre-fetched HTML without a network request
  • Discover Links — Extract, filter, and score links (internal/external, include/exclude patterns)

Extraction

  • LLM Extractor — AI-powered structured extraction with schema support
  • CSS Extractor — Structured extraction using JsonCssExtractionStrategy
  • JSON Extractor — Extract JSON from direct URLs, script tags, or JSON-LD
  • Regex Extractor — Pattern-based extraction with built-in, custom, or LLM-generated patterns
  • Cosine Similarity — Semantic clustering (requires unclecode/crawl4ai:all Docker image)
  • SEO Metadata — Meta tags, Open Graph, Twitter Cards, JSON-LD, robots, hreflang

Jobs & Monitoring

  • Submit Crawl Job — Async crawl via /crawl/job with webhook support
  • Submit LLM Job — Async LLM extraction via /llm/job
  • Get Job Status — Poll /job/{task_id} for results
  • Health Check — Server health and endpoint stats

Requirements

  • n8n: 1.79.1 or higher
  • Crawl4AI Docker: 0.8.0
    • Standard operations: unclecode/crawl4ai:latest
    • Cosine Similarity Extractor: unclecode/crawl4ai:all (includes sentence-transformers)

Installation

Via n8n UI (recommended)

  1. Go to Settings → Community Nodes
  2. Click Install a community node
  3. Enter n8n-nodes-crawl4ai-plus
  4. Restart n8n

From source (development)

# Install with pnpm (required — npm/yarn not supported)
pnpm install
pnpm build

Then restart your n8n instance. The nodes are declared in package.json → "n8n" → "nodes" and loaded from dist/.


Setup

Credentials

  1. Settings → Credentials → New → Crawl4AI API
  2. Configure:
    • Docker URL — URL of your Crawl4AI container (default: http://crawl4ai:11235)
    • Authentication — Defaults to No Authentication, which is correct for a standard Docker quickstart deployment. Switch to Token or Basic auth only if your Crawl4AI instance is configured with authentication.
    • LLM Settings — Enable and configure a provider for AI-powered operations:
      • OpenAI, Anthropic, Groq, Ollama, or custom LiteLLM endpoint

Simple Node

  1. Add Crawl4AI Plus to your workflow
  2. Select an operation (Get Page Content, Ask Question, Extract Data, or CSS Extractor)
  3. Configure the URL and required fields
  4. Optional settings are in a single flat Options collection

Advanced Node

  1. Add Crawl4AI Plus Advanced to your workflow
  2. Select an operation from one of the 3 groups (Crawling, Extraction, Jobs & Monitoring)
  3. Configure the URL/required fields
  4. Fine-tune via 3 standardized collections: Browser & Session, Crawl Settings, Output & Filtering
  5. LLM-based operations use the provider configured in credentials

Configuration Reference

Browser Options

| Option | Description | |---|---| | Browser Type | Chromium (default), Firefox, or Webkit | | Headless Mode | Run browser without a visible window | | Enable JavaScript | Enable JS execution (required for dynamic pages) | | Enable Stealth Mode | Hides webdriver properties to bypass bot detection | | Extra Browser Arguments | Command-line flags passed to the browser process | | Init Scripts | JavaScript injected before page load (stealth setup) | | Viewport Width / Height | Browser viewport dimensions | | Timeout (MS) | Maximum page load wait time | | User Agent | Override the browser user agent string |

Session & Authentication

| Option | Description | |---|---| | Storage State (JSON) | Browser state (cookies, localStorage) as JSON — works on n8n Cloud | | Cookies | Structured cookie entries for authentication | | Session ID | Reuse a named browser context across requests | | Use Managed Browser | Connect to an existing managed browser instance | | Use Persistent Context | Persist browser profile to disk (self-hosted only) | | User Data Directory | Path to the browser profile directory |

Crawler Options

| Option | Description | |---|---| | Cache Mode | ENABLED, BYPASS, DISABLED, READ_ONLY, or WRITE_ONLY | | CSS Selector | Pre-filter page content before extraction | | JavaScript Code | Execute custom JS on the page before extracting | | Wait For | CSS selector or JS expression to wait for before extracting | | Check Robots.txt | Respect the site's robots.txt rules | | Word Count Threshold | Minimum word count for a content block to be included | | Exclude External Links | Strip external links from results | | Preserve HTTPS for Internal Links | Normalise internal link protocols |

Deep Crawl Options (Crawl Multiple URLs — Discover mode)

| Option | Description | |---|---| | Seed URL | Starting URL for recursive discovery | | Discovery Query | Keywords that guide which links to follow (required for BestFirst) | | Strategy | BestFirst (recommended), BFS, or DFS | | Max Depth | Maximum link depth to follow | | Max Pages | Maximum number of pages to crawl | | Score Threshold | Minimum relevance score for BestFirst |

Output Options

| Option | Description | |---|---| | Include HTML | Include raw HTML in content.html | | Include Links | Include links.internal and links.external arrays | | Include Media | Include images, videos, and audio metadata | | Screenshot | Capture a screenshot (base64) | | PDF | Generate a PDF (base64) | | SSL Certificate | Extract SSL certificate details |

Output Shape

All operations return a consistent output object:

{
  "domain": "example.com",
  "url": "https://example.com/page",
  "fetchedAt": "2026-02-18T10:00:00.000Z",
  "success": true,
  "statusCode": 200,
  "content": {
    "markdownRaw": "...",
    "markdownFit": "..."
  },
  "extracted": {
    "strategy": "JsonCssExtractionStrategy",
    "json": { ... }
  },
  "links": {
    "internal": [{ "href": "...", "text": "..." }],
    "external": []
  },
  "metrics": {
    "durationMs": 1240
  }
}

Async Job Workflow

For large or long-running crawls, use the async pattern:

  1. Submit Crawl Job → returns taskId
  2. Get Job Status (poll with taskId) → returns status: pending | processing | completed | failed
  3. When completed, result fields are returned directly at top level alongside taskId and status

Webhook callbacks are supported in Submit Crawl Job for push-based notification when the job finishes.


Project Structure

nodes/
  ├── shared/                                 # Shared code used by both nodes
  │   ├── apiClient.ts                        # Crawl4aiClient — all HTTP calls
  │   ├── utils.ts                            # Config builders, LLM helpers, validation
  │   ├── interfaces.ts                       # TypeScript types
  │   ├── formatters.ts                       # formatCrawlResult, formatExtractionResult
  │   └── descriptions/                       # Reusable n8n UI field definitions
  │       ├── index.ts                        # Barrel export
  │       ├── common.fields.ts                # urlField, urlsField, cacheModeField, etc.
  │       ├── browserSession.fields.ts        # getBrowserSessionFields()
  │       ├── crawlSettings.fields.ts         # getCrawlSettingsFields()
  │       └── outputFiltering.fields.ts       # getOutputFilteringFields()
  │
  ├── Crawl4aiPlus/                           # Simple node (4 operations)
  │   ├── Crawl4aiPlus.node.ts
  │   ├── crawl4aiplus.svg
  │   ├── actions/
  │   │   ├── operations.ts
  │   │   ├── router.ts
  │   │   ├── getPageContent.operation.ts
  │   │   ├── askQuestion.operation.ts
  │   │   ├── extractData.operation.ts
  │   │   └── cssExtractor.operation.ts
  │   └── helpers/
  │       ├── utils.ts                        # getSimpleDefaults, executeCrawl, deduplicateResults
  │       └── formatters.ts                   # Simple node formatters
  │
  └── Crawl4aiPlusAdvanced/                   # Advanced node (15 operations, 3 groups)
      ├── Crawl4aiPlusAdvanced.node.ts
      ├── crawl4aiplus.svg
      ├── actions/
      │   ├── operations.ts                   # 15 operations with groupName for UI grouping
      │   ├── router.ts
      │   ├── crawlUrl.operation.ts           # ─┐
      │   ├── crawlMultipleUrls.operation.ts  # │ Crawling group
      │   ├── crawlStream.operation.ts        # │
      │   ├── processRawHtml.operation.ts     # │
      │   ├── discoverLinks.operation.ts      # ─┘
      │   ├── llmExtractor.operation.ts       # ─┐
      │   ├── cssExtractor.operation.ts       # │
      │   ├── jsonExtractor.operation.ts      # │ Extraction group
      │   ├── regexExtractor.operation.ts     # │
      │   ├── cosineExtractor.operation.ts    # │
      │   ├── seoExtractor.operation.ts       # ─┘
      │   ├── submitCrawlJob.operation.ts     # ─┐
      │   ├── submitLlmJob.operation.ts       # │ Jobs & Monitoring group
      │   ├── getJobStatus.operation.ts       # │
      │   └── healthCheck.operation.ts        # ─┘
      └── helpers/
          ├── interfaces.ts                   # Re-exports shared types
          └── formatters.ts                   # Re-exports shared + formatJobSubmission()

credentials/
  └── Crawl4aiApi.credentials.ts             # Docker URL, auth, LLM provider config

Version History

See CHANGELOG.md for detailed version history and breaking changes.

License

MIT