n8n-nodes-plasmate

v0.1.0, published 3 months ago

n8n community node for Plasmate — fetch web pages and get structured Semantic Object Model (SOM) content instead of raw HTML

dbhurley

n8n-community-node-package plasmate web-scraping browser som semantic-object-model

What it does

The Plasmate node fetches any URL using Plasmate — a fast headless browser engine — and returns structured data instead of raw HTML. Plasmate compiles pages into a Semantic Object Model (SOM): organized regions, interactive elements with stable IDs, extracted text, and structured data (JSON-LD, OpenGraph).

Why not just use the HTTP Request node?

The HTTP Request node returns raw HTML — tens of thousands of tokens that downstream AI nodes have to parse. Plasmate returns structured JSON that's 10-800x smaller and immediately usable.

Operations

| Operation | Output |
|---|---|
| Fetch Page | Full SOM: title, regions, elements, metadata |
| Extract Text | Plain text joined from all page regions |
| Extract Links | Array of {text, href, region} objects |
| Extract Structured Data | JSON-LD, OpenGraph, and microdata |

Prerequisites

A self-hosted n8n instance (community nodes require self-hosted n8n)
Plasmate installed on the same machine as n8n:

curl -fsSL https://plasmate.app/install.sh | sh

Installation

In your n8n instance, go to Settings → Community Nodes → Install and enter:

n8n-nodes-plasmate

Or install via npm in your n8n directory:

npm install n8n-nodes-plasmate

Usage

Basic — Fetch a page

Add a Plasmate node to your workflow
Set Operation to "Fetch Page"
Set URL to any web address
Connect downstream nodes to work with the SOM output

Extract links from a page

Set Operation to "Extract Links". The output includes links (an array) and link_count. Use the Split Out node to process each link individually in downstream steps.
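As a rough illustration of working with that output before splitting it, the sketch below filters and de-duplicates links in a downstream step. The field names (links, link_count, text, href, region) come from the descriptions above. The helper name selectLinks and the sample data are hypothetical, and this is not the node's own code.

```typescript
// Shape of the "Extract Links" output as described in this README (an assumption,
// not a published type definition).
interface PageLink {
  text: string;
  href: string;
  region: string;
}

interface ExtractLinksOutput {
  links: PageLink[];
  link_count: number;
}

// Hypothetical helper: keep the first occurrence of each absolute http(s) link,
// optionally limited to a single region.
function selectLinks(output: ExtractLinksOutput, region?: string): PageLink[] {
  const seen = new Set<string>();
  const selected: PageLink[] = [];
  for (const link of output.links) {
    if (!/^https?:\/\//i.test(link.href)) continue; // skip mailto:, #anchors, relative paths
    if (region !== undefined && link.region !== region) continue;
    if (seen.has(link.href)) continue; // drop duplicates
    seen.add(link.href);
    selected.push(link);
  }
  return selected;
}

// Made-up sample data for illustration only.
const sampleLinks: ExtractLinksOutput = {
  links: [
    { text: "Docs", href: "https://example.com/docs", region: "nav" },
    { text: "Docs", href: "https://example.com/docs", region: "main" },
    { text: "Contact", href: "mailto:hi@example.com", region: "footer" },
    { text: "More information...", href: "https://www.iana.org/domains/example", region: "main" },
  ],
  link_count: 4,
};

const mainLinks = selectLinks(sampleLinks, "main");
console.log(mainLinks.map((link) => link.href));
```

Filtering before the Split Out node keeps the number of downstream items small, which matters if each link triggers another page fetch.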

Authenticated browsing

Set Auth Profile in Options to the target domain (e.g. github.com). Cookies for that domain must already be stored via the Plasmate browser extension.

Batch processing

Connect multiple URLs from an upstream node (e.g. a list from a Google Sheet or database). The Plasmate node processes one URL per input item.
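A minimal sketch of preparing such a batch, for example in an upstream Code node: it turns a raw list of strings into one n8n item per URL. n8n passes data between nodes as items with a json key. The field name url and the helper name toUrlItems are assumptions, so adjust them to whatever your URL expression reads.

```typescript
// One n8n item per URL. The "url" field name is an assumption for illustration.
interface UrlItem {
  json: { url: string };
}

// Hypothetical helper: trim, validate, and de-duplicate raw URL strings.
function toUrlItems(rawUrls: string[]): UrlItem[] {
  const seen = new Set<string>();
  const items: UrlItem[] = [];
  for (const raw of rawUrls) {
    const url = raw.trim();
    if (!/^https?:\/\/\S+$/i.test(url)) continue; // skip blanks and non-http(s) values
    if (seen.has(url)) continue; // avoid fetching the same page twice
    seen.add(url);
    items.push({ json: { url } });
  }
  return items;
}

const batchItems = toUrlItems([
  "https://example.com",
  "  https://example.com  ",
  "",
  "not a url",
  "https://www.iana.org/domains/example",
]);
console.log(batchItems.length);
```

Because the node handles one URL per input item, cleaning the list up front avoids spending a fetch (and a timeout) on rows that cannot succeed.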

Options

| Option | Default | Description |
|---|---|---|
| Auth Profile | (none) | Domain for authenticated browsing (e.g. github.com) |
| Plasmate Binary Path | plasmate | Override if plasmate is not in PATH |
| Timeout (Seconds) | 30 | Max seconds to wait for a page fetch |
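If you mirror these options in your own tooling, the defaults can be modelled as below. The property names (authProfile, binaryPath, timeoutSeconds) and the resolveOptions helper are hypothetical and are not the node's internal parameter names. Only the default values are taken from the table, and the positive-timeout check is an added assumption.

```typescript
// Hypothetical property names; defaults come from the Options table.
interface PlasmateOptions {
  authProfile?: string; // e.g. "github.com"; unset means no authenticated browsing
  binaryPath: string; // name or path of the plasmate executable
  timeoutSeconds: number; // max seconds to wait for a page fetch
}

const DEFAULT_OPTIONS: PlasmateOptions = {
  binaryPath: "plasmate",
  timeoutSeconds: 30,
};

// Merge user overrides over the defaults and reject a non-positive timeout.
function resolveOptions(overrides: Partial<PlasmateOptions> = {}): PlasmateOptions {
  const merged: PlasmateOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (!Number.isFinite(merged.timeoutSeconds) || merged.timeoutSeconds <= 0) {
    throw new RangeError("timeoutSeconds must be a positive number");
  }
  return merged;
}

const slowSiteOptions = resolveOptions({ timeoutSeconds: 60, authProfile: "github.com" });
console.log(slowSiteOptions);
```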

Example output — Fetch Page

{
  "url": "https://example.com",
  "title": "Example Domain",
  "lang": "en",
  "element_count": 4,
  "interactive_count": 1,
  "region_count": 1,
  "som": {
    "regions": [
      {
        "id": "main",
        "role": "main",
        "elements": [
          { "id": "e1", "role": "heading", "text": "Example Domain" },
          { "id": "e2", "role": "text", "text": "This domain is for use in illustrative examples." },
          { "id": "e3", "role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example" }
        ]
      }
    ]
  }
}
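For downstream code it can help to give this output a type. The interfaces below are inferred from the example above only, so treat them as a sketch rather than a published schema. Real output may contain additional fields, and text and href are marked optional as an assumption. The somToText helper is hypothetical and simply joins element text in document order.

```typescript
// Types inferred from the example output; not an official schema.
interface SomElement {
  id: string;
  role: string;
  text?: string;
  href?: string;
}

interface SomRegion {
  id: string;
  role: string;
  elements: SomElement[];
}

interface FetchPageOutput {
  url: string;
  title: string;
  lang: string;
  element_count: number;
  interactive_count: number;
  region_count: number;
  som: { regions: SomRegion[] };
}

// Hypothetical helper: collect non-empty element text from every region.
function somToText(page: FetchPageOutput): string {
  const lines: string[] = [];
  for (const region of page.som.regions) {
    for (const element of region.elements) {
      const text = element.text?.trim();
      if (text) lines.push(text);
    }
  }
  return lines.join("\n");
}

// The example output from this README, typed.
const examplePage: FetchPageOutput = {
  url: "https://example.com",
  title: "Example Domain",
  lang: "en",
  element_count: 4,
  interactive_count: 1,
  region_count: 1,
  som: {
    regions: [
      {
        id: "main",
        role: "main",
        elements: [
          { id: "e1", role: "heading", text: "Example Domain" },
          { id: "e2", role: "text", text: "This domain is for use in illustrative examples." },
          { id: "e3", role: "link", text: "More information...", href: "https://www.iana.org/domains/example" },
        ],
      },
    ],
  },
};

console.log(somToText(examplePage));
```

In practice the Extract Text operation already returns joined text. A helper like this is only useful when you need the full SOM and a text view of it in the same step.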

Token savings

Real-world benchmark (SOM vs raw HTML):

| Site | Savings |
|---|---|
| Vercel docs | 99.6% |
| Stripe API | 95.8% |
| Next.js docs | 92.3% |
| Stack Overflow | 85.6% |
| Wikipedia | 82.8% |
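To relate these percentages to a size ratio, the arithmetic can be sketched as follows. It assumes the savings figure means 1 minus (SOM tokens divided by raw HTML tokens), which the benchmark does not state explicitly. The helper name savingsToRatio is hypothetical.

```typescript
// Convert a percentage saving into an "N times smaller" ratio.
// Assumes savings = 1 - (SOM tokens / HTML tokens), expressed as a percentage.
function savingsToRatio(savingsPercent: number): number {
  if (savingsPercent < 0 || savingsPercent >= 100) {
    throw new RangeError("savingsPercent must be in the range [0, 100)");
  }
  return 100 / (100 - savingsPercent);
}

// Figures copied from the benchmark table.
const benchmark: Array<[string, number]> = [
  ["Vercel docs", 99.6],
  ["Stripe API", 95.8],
  ["Next.js docs", 92.3],
  ["Stack Overflow", 85.6],
  ["Wikipedia", 82.8],
];

const ratios = benchmark.map(([site, percent]) => ({
  site,
  ratio: savingsToRatio(percent),
}));

for (const { site, ratio } of ratios) {
  console.log(`${site}: about ${ratio.toFixed(1)}x smaller`);
}
```

Under that assumption, 99.6% savings corresponds to roughly a 250x reduction and 82.8% to roughly 5.8x.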

Related

Plasmate — the browser engine
skill-openclaw — OpenClaw agent skill
plasmate-mcp — MCP server for Claude Code, Cursor, Windsurf

License

MIT