n8n-nodes-plasmate
v0.1.0
n8n community node for Plasmate — fetch web pages and get structured Semantic Object Model (SOM) content instead of raw HTML
What it does
The Plasmate node fetches any URL using Plasmate — a fast headless browser engine — and returns structured data instead of raw HTML. Plasmate compiles pages into a Semantic Object Model (SOM): organized regions, interactive elements with stable IDs, extracted text, and structured data (JSON-LD, OpenGraph).
Why not just use the HTTP Request node?
The HTTP Request node returns raw HTML — tens of thousands of tokens that downstream AI nodes have to parse. Plasmate returns structured JSON that's 10-800x smaller and immediately usable.
Operations
| Operation | Output |
|---|---|
| Fetch Page | Full SOM: title, regions, elements, metadata |
| Extract Text | Plain text joined from all page regions |
| Extract Links | Array of {text, href, region} objects |
| Extract Structured Data | JSON-LD, OpenGraph, and microdata |
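To make the relationship between the operations concrete, here is a minimal sketch of how the Extract Text and Extract Links results could be derived from a Fetch Page result. It is not the node's actual implementation. It assumes the SOM shape shown in this README's example output, and it assumes that the `region` field of a link holds the region's `id`. The function names are hypothetical.

```python
from typing import Any


def extract_text(page: dict[str, Any]) -> str:
    """Join the text of every element in every region, one element per line."""
    lines = []
    for region in page.get("som", {}).get("regions", []):
        for element in region.get("elements", []):
            text = element.get("text")
            if text:
                lines.append(text)
    return "\n".join(lines)


def extract_links(page: dict[str, Any]) -> list[dict[str, str]]:
    """Collect {text, href, region} for every element that carries an href."""
    links = []
    for region in page.get("som", {}).get("regions", []):
        for element in region.get("elements", []):
            if "href" in element:
                links.append({
                    "text": element.get("text", ""),
                    "href": element["href"],
                    # Assumption: "region" refers to the region's id.
                    "region": region.get("id", ""),
                })
    return links


# Sample data modeled on the Fetch Page example output (trimmed).
page = {
    "url": "https://example.com",
    "som": {
        "regions": [
            {
                "id": "main",
                "role": "main",
                "elements": [
                    {"id": "e1", "role": "heading", "text": "Example Domain"},
                    {"id": "e3", "role": "link", "text": "More information...",
                     "href": "https://www.iana.org/domains/example"},
                ],
            }
        ]
    },
}

print(extract_text(page))
print(extract_links(page))
```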
Prerequisites
- A self-hosted n8n instance (community nodes require self-hosted n8n)
- Plasmate installed on the same machine as n8n:
curl -fsSL https://plasmate.app/install.sh | sh
Installation
In your n8n instance, go to Settings → Community Nodes → Install and enter:
n8n-nodes-plasmate
Or install via npm in your n8n directory:
npm install n8n-nodes-plasmate
Usage
Basic — Fetch a page
- Add a Plasmate node to your workflow
- Set Operation to "Fetch Page"
- Set URL to any web address
- Connect downstream nodes to work with the SOM output
Extract links from a page
Set Operation to "Extract Links". The output includes links (an array) and link_count. Use the Split Out node to process each link individually in downstream steps.
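As an illustration of working with this output downstream, the sketch below keeps only links that point to a different host and drops duplicate targets. The sample data is hypothetical, and the field names (`links`, `link_count`, `text`, `href`, `region`) are taken from this README. It could be adapted for an n8n Code node, but here it is plain Python.

```python
from urllib.parse import urlparse


def external_links(output: dict, own_host: str) -> list[dict]:
    """Return links whose host differs from own_host, dropping duplicate hrefs."""
    seen = set()
    result = []
    for link in output.get("links", []):
        href = link.get("href", "")
        host = urlparse(href).hostname
        # Skip relative links (no host), same-host links, and repeated targets.
        if host is None or host == own_host or href in seen:
            continue
        seen.add(href)
        result.append(link)
    return result


# Hypothetical Extract Links output for illustration only.
output = {
    "links": [
        {"text": "Home", "href": "https://example.com/", "region": "nav"},
        {"text": "More information...",
         "href": "https://www.iana.org/domains/example", "region": "main"},
        {"text": "IANA",
         "href": "https://www.iana.org/domains/example", "region": "footer"},
    ],
    "link_count": 3,
}

for link in external_links(output, "example.com"):
    print(link["region"], link["href"])
```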
Authenticated browsing
Set Auth Profile in Options to the domain (e.g. github.com). This requires cookies for that domain to have been stored beforehand via the Plasmate browser extension.
Batch processing
Connect multiple URLs from an upstream node (e.g. a list from a Google Sheet or database). The Plasmate node processes one URL per input item.
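Since the node handles one URL per input item, an upstream step only needs to emit one item per URL. The sketch below shows that shape, using n8n's convention of wrapping each item's data under a `json` key. The field name `url` is an assumption. With it, the node's URL parameter could reference the field through an expression such as `{{ $json.url }}`.

```python
def urls_to_items(urls: list[str]) -> list[dict]:
    """Wrap each non-empty URL in n8n's item shape, skipping duplicates."""
    seen = set()
    items = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        # Assumption: downstream nodes read the address from the "url" field.
        items.append({"json": {"url": url}})
    return items


items = urls_to_items([
    " https://example.com ",
    "",
    "https://example.com",
    "https://n8n.io",
])
print(items)
```

Trimming and deduplicating before the fetch avoids spending a page load on blank rows or repeated addresses from a spreadsheet.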
Options
| Option | Default | Description |
|---|---|---|
| Auth Profile | (none) | Domain for authenticated browsing (e.g. github.com) |
| Plasmate Binary Path | plasmate | Override if plasmate is not in PATH |
| Timeout (Seconds) | 30 | Max seconds to wait for a page fetch |
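To decide whether Plasmate Binary Path needs overriding, it helps to check whether the default name resolves on the PATH of the user that runs n8n. The following is a minimal sketch using Python's standard `shutil.which`. The helper name `resolve_plasmate` is hypothetical, and the check says nothing about how the node itself locates the binary.

```python
import shutil
from typing import Optional


def resolve_plasmate(binary: str = "plasmate") -> Optional[str]:
    """Return the full path to the binary, or None if it cannot be found."""
    return shutil.which(binary)


path = resolve_plasmate()
if path is None:
    print("plasmate is not on PATH; set Plasmate Binary Path to its full location")
else:
    print(f"plasmate found at {path}")
```

Run this as the same user and in the same environment as n8n, since PATH often differs between an interactive shell and a service.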
Example output — Fetch Page
{
  "url": "https://example.com",
  "title": "Example Domain",
  "lang": "en",
  "element_count": 4,
  "interactive_count": 1,
  "region_count": 1,
  "som": {
    "regions": [
      {
        "id": "main",
        "role": "main",
        "elements": [
          { "id": "e1", "role": "heading", "text": "Example Domain" },
          { "id": "e2", "role": "text", "text": "This domain is for use in illustrative examples." },
          { "id": "e3", "role": "link", "text": "More information...", "href": "https://www.iana.org/domains/example" }
        ]
      }
    ]
  }
}
Token savings
Real-world benchmark (SOM vs raw HTML):
| Site | Savings |
|---|---|
| Vercel docs | 99.6% |
| Stripe API | 95.8% |
| Next.js docs | 92.3% |
| Stack Overflow | 85.6% |
| Wikipedia | 82.8% |
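A percentage saving can be converted into an "N times smaller" factor with ratio = 100 / (100 - savings). The sketch below applies that arithmetic to the figures in the table. The function name is hypothetical and the numbers are copied from the table, not measured again.

```python
def savings_to_ratio(savings_percent: float) -> float:
    """Convert a percentage reduction into an 'N times smaller' factor."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings must be in the range [0, 100)")
    return 100.0 / (100.0 - savings_percent)


# Figures from the benchmark table above.
benchmark = {
    "Vercel docs": 99.6,
    "Stripe API": 95.8,
    "Next.js docs": 92.3,
    "Stack Overflow": 85.6,
    "Wikipedia": 82.8,
}

for site, savings in benchmark.items():
    print(f"{site}: {savings}% saved is about {savings_to_ratio(savings):.1f}x smaller")
```

On these figures the reduction ranges from roughly 6x (Wikipedia) to 250x (Vercel docs).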
Related
- Plasmate — the browser engine
- skill-openclaw — OpenClaw agent skill
- plasmate-mcp — MCP server for Claude Code, Cursor, Windsurf
License
MIT
