# @fast-scrape/node
10-50x faster HTML parsing for Node.js. Rust-powered, Cheerio-compatible API.
## Installation
```sh
npm install @fast-scrape/node
yarn add @fast-scrape/node
pnpm add @fast-scrape/node
bun add @fast-scrape/node
```

> [!NOTE]
> Includes TypeScript definitions. No separate `@types` package needed.
## Quick start
```ts
import { Soup } from '@fast-scrape/node';

const soup = new Soup("<html><body><div class='content'>Hello, World!</div></body></html>");
const div = soup.find("div");
console.log(div.text); // Hello, World!
```

## Usage
```ts
import { Soup } from '@fast-scrape/node';

const soup = new Soup(html);

// Find first element by tag
const div = soup.find("div");

// Find all elements
const divs = soup.findAll("div");

// CSS selectors
for (const el of soup.select("div.content > p")) {
  console.log(el.text);
}
```

```ts
const element = soup.find("a");

const text = element.text;                 // Get text content
const html = element.innerHTML;            // Get inner HTML
const href = element.getAttribute("href"); // Get attribute
```

```ts
import { Soup } from '@fast-scrape/node';

// Process multiple documents in parallel
const documents = [html1, html2, html3];
const soups = Soup.parseBatch(documents);

for (const soup of soups) {
  console.log(soup.find("title")?.text);
}
```

> [!TIP]
> Use `parseBatch()` for multiple documents. Uses all CPU cores via native threads.
Full TypeScript support with exported types:
```ts
import { Soup, Tag, type SoupOptions } from '@fast-scrape/node';

function extractLinks(soup: Soup): string[] {
  return soup.select("a[href]").map(a => a.getAttribute("href") ?? "");
}
```

## Requirements
- Node.js >= 18
- Platforms: macOS (arm64, x64), Linux (x64, arm64, musl), Windows (x64)
## Performance
v0.2.0 improvements:
- SIMD-accelerated — Class selector matching 2-10x faster on large documents
- Zero-copy serialization — 50-70% memory reduction in HTML output
- Batch processing — `Soup.parseBatch()` parallelizes across all CPU cores
- Trait abstractions — 45% simpler binding code via `ElementFilter` iterators
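To get a rough sense of these numbers on your own documents, here is a minimal timing sketch. It only uses the API shown above (`new Soup()`, `select()`, `Soup.parseBatch()`) plus Node's built-in `performance` timer; the `./page.html` path, document count, and iteration count are placeholders, and this is a quick check rather than a rigorous benchmark.

```ts
import { readFileSync } from 'node:fs';
import { performance } from 'node:perf_hooks';
import { Soup } from '@fast-scrape/node';

// Placeholder input: point this at any reasonably large HTML file.
const html = readFileSync('./page.html', 'utf8');
const documents = Array.from({ length: 32 }, () => html);

// Class selector matching on a single document.
const soup = new Soup(html);
let matches = 0;
let start = performance.now();
for (let i = 0; i < 100; i++) {
  matches += soup.select('div.content').length;
}
console.log(`select("div.content") x100 (${matches} matches): ${(performance.now() - start).toFixed(1)} ms`);

// Sequential parsing vs. parseBatch() across CPU cores.
start = performance.now();
for (const doc of documents) {
  new Soup(doc).find('title');
}
console.log(`sequential parse x${documents.length}: ${(performance.now() - start).toFixed(1)} ms`);

start = performance.now();
for (const s of Soup.parseBatch(documents)) {
  s.find('title');
}
console.log(`parseBatch x${documents.length}: ${(performance.now() - start).toFixed(1)} ms`);
```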
## Built on Servo
Powered by battle-tested libraries from the Servo browser engine: html5ever (HTML5 parser) and selectors (CSS selector engine).
## Related packages
| Platform | Package |
|----------|---------|
| Rust | scrape-core |
| Python | fast-scrape |
| WASM | @fast-scrape/wasm |
## License
MIT OR Apache-2.0
