@fast-scrape/wasm

v0.2.9

Published

9 days ago

WebAssembly bindings for scrape-rs HTML parsing library


@fast-scrape/wasm

Near-native HTML parsing in the browser via WebAssembly. On large documents it parses 1.5-2x faster than DOMParser.

Installation

npm install @fast-scrape/wasm

yarn add @fast-scrape/wasm
pnpm add @fast-scrape/wasm
bun add @fast-scrape/wasm

Quick start

import init, { Soup } from '@fast-scrape/wasm';

await init();  // Initialize WASM module (once)

const soup = new Soup("<html><body><div class='content'>Hello, World!</div></body></html>");
console.log(soup.find("div").text);  // Hello, World!

[!IMPORTANT] Call init() once before using any other functions.
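In an application with several entry points it can be awkward to guarantee that init() runs exactly once. One way to handle this is a small wrapper that caches the initialization promise. The sketch below is an illustration, not part of the package: `initOnce` and `ensureReady` are hypothetical names, and the initializer is passed in as a parameter so the helper does not depend on how the module is loaded.

```javascript
/**
 * Wrap an async initializer so that it runs at most once.
 * `initFn` is any function that loads the module; with this package the
 * caller would pass the default export `init` (an assumption of this sketch).
 */
function initOnce(initFn) {
    let pending = null; // cached promise shared by every caller

    return function ensureReady() {
        if (pending === null) {
            pending = Promise.resolve()
                .then(() => initFn())
                .catch((err) => {
                    pending = null; // allow a retry after a failed load
                    throw err;
                });
        }
        return pending;
    };
}

// Usage sketch (not executed here):
//   import init, { Soup } from '@fast-scrape/wasm';
//   const ensureReady = initOnce(init);
//   await ensureReady();  // loads the WASM module
//   await ensureReady();  // reuses the cached promise
```

Concurrent callers share one promise, so the initializer is not started twice even if two components request it in the same tick.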

Usage

import init, { Soup } from '@fast-scrape/wasm';

await init();

const soup = new Soup(html);

// Find first element by tag
const div = soup.find("div");

// Find all elements
const divs = soup.findAll("div");

// CSS selectors
for (const el of soup.select("div.content > p")) {
    console.log(el.text);
}
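The calls above compose into small extraction helpers. The following sketch assumes only what the examples show: select() returns something iterable, find() returns an element, and elements expose a `text` property. What find() returns when nothing matches is not stated here, so the helper treats any falsy result as "no match". `collectText` and `firstText` are hypothetical names.

```javascript
/**
 * Collect the trimmed, non-empty text of every element matching `selector`.
 * `soup` is any object with a select() method shaped like the one above.
 */
function collectText(soup, selector) {
    const out = [];
    for (const el of soup.select(selector)) {
        const text = (el.text ?? "").trim();
        if (text.length > 0) out.push(text);
    }
    return out;
}

/**
 * Return the text of the first element found for `tag`, or `fallback`
 * when find() yields a falsy value (assumed to mean "no match").
 */
function firstText(soup, tag, fallback = "") {
    const el = soup.find(tag);
    return el ? el.text : fallback;
}

// Usage sketch (not executed here):
//   const paragraphs = collectText(soup, "div.content > p");
//   const title = firstText(soup, "title", "(untitled)");
```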

Vite:

import init, { Soup } from '@fast-scrape/wasm';
await init();  // Vite handles WASM automatically

Webpack 5:

// webpack.config.js
module.exports = {
    experiments: { asyncWebAssembly: true },
};

CDN (no bundler):

<script type="module">
import init, { Soup } from 'https://esm.sh/@fast-scrape/wasm';

await init();
const soup = new Soup('<div>Hello</div>');
console.log(soup.find('div').text);
</script>

TypeScript:

import init, { Soup, Tag } from '@fast-scrape/wasm';

await init();

function extractLinks(soup: Soup): string[] {
    return soup.select("a[href]").map(a => a.attr("href") ?? "");
}
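Extracted href values are often relative, so a scraper usually resolves them against the page URL. The sketch below extends the extractLinks idea using the standard `URL` constructor. It assumes select() returns an iterable and attr() returns a string or a nullish value, as the example above suggests. `absoluteLinks` is a hypothetical helper, not part of the package.

```javascript
/**
 * Resolve every a[href] against `baseUrl`, skipping empty or unparsable
 * values and removing duplicates. Order of first appearance is preserved.
 */
function absoluteLinks(soup, baseUrl) {
    const seen = new Set();
    for (const a of soup.select("a[href]")) {
        const href = a.attr("href");
        if (!href) continue; // attribute missing or empty
        try {
            seen.add(new URL(href, baseUrl).href);
        } catch {
            // Not a valid URL relative to baseUrl; skip it.
        }
    }
    return [...seen];
}

// Usage sketch (not executed here):
//   const links = absoluteLinks(new Soup(html), "https://example.com/docs/");
```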

Performance

Native-speed parsing in browsers with SIMD acceleration:

| Operation | @fast-scrape/wasm | Native DOMParser | Notes |
|-----------|------------------|------------------|-------|
| Parse 100KB HTML | 2.1 ms | 3.2 ms | 1.5x faster |
| find(".class") | 0.3 µs | N/A | CSS selector optimization |
| find("#id") | 0.2 µs | N/A | ID selector optimization |
| Memory (100KB doc) | 8.4 MB | 12.2 MB | 30% more efficient |
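The ratios in the Notes column follow directly from the measured values, and similar numbers can be gathered for your own documents with a simple timing loop. The sketch below recomputes the ratios from the table above and includes a hypothetical `medianMs` helper built on `performance.now()`. It is not the benchmark harness behind the published figures, which is not described here.

```javascript
/** How many times faster `fast` is than `slow` (same unit for both). */
function speedup(slow, fast) {
    return slow / fast;
}

/** Fractional reduction when going from `before` to `after`. */
function reduction(before, after) {
    return 1 - after / before;
}

/**
 * Run `fn` repeatedly and return the middle sample in milliseconds
 * (the upper median when `runs` is even).
 */
function medianMs(fn, runs = 21) {
    const samples = [];
    for (let i = 0; i < runs; i++) {
        const start = performance.now();
        fn();
        samples.push(performance.now() - start);
    }
    samples.sort((a, b) => a - b);
    return samples[Math.floor(samples.length / 2)];
}

// Figures copied from the table above.
console.log(speedup(3.2, 2.1).toFixed(2));                  // 1.52
console.log((reduction(12.2, 8.4) * 100).toFixed(0) + "%"); // 31%

// Usage sketch (not executed here), after init() has resolved:
//   const wasmMs = medianMs(() => new Soup(html));
//   const domMs = medianMs(() => new DOMParser().parseFromString(html, "text/html"));
```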

Key advantages:

Compiled Rust guarantees memory safety
CSS selector lookups complete in 200-300 ns (0.2-0.3 µs)
Automatic SIMD acceleration on modern browsers
50-70% memory reduction via zero-copy serialization

Bundle size

Optimized package under 500 KB:

| Build | Size |
|-------|------|
| Minified + gzip | 285 KB |
| Minified | ~400 KB |

[!TIP] SIMD is enabled automatically on Chrome 91+, Firefox 89+, and Safari 16.4+. Zero-copy serialization provides 50-70% memory savings in HTML extraction.
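Although SIMD is picked up automatically, it can be useful to log whether the current browser supports it, for example when comparing performance reports. A common technique is to ask `WebAssembly.validate` whether a tiny module containing SIMD instructions is valid. The sketch below hand-assembles such a module. `supportsWasmSimd` is a hypothetical helper and is independent of how the package itself selects its SIMD code path.

```javascript
// Minimal module: one function of type () -> v128 whose body is
// i32.const 0; i8x16.splat; i8x16.popcnt. Engines without WebAssembly
// SIMD reject the v128 type and the 0xFD-prefixed opcodes.
const SIMD_PROBE = new Uint8Array([
    0, 97, 115, 109, 1, 0, 0, 0,   // "\0asm" magic, version 1
    1, 5, 1, 96, 0, 1, 123,        // type section: func, 0 params, 1 result (v128)
    3, 2, 1, 0,                    // function section: 1 function using type 0
    10, 10, 1, 8, 0,               // code section: 1 body, 8 bytes, no locals
    65, 0, 253, 15, 253, 98, 11,   // i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

/** Return true if the runtime accepts WebAssembly SIMD instructions. */
function supportsWasmSimd() {
    return typeof WebAssembly === "object"
        && typeof WebAssembly.validate === "function"
        && WebAssembly.validate(SIMD_PROBE);
}

console.log("WASM SIMD available:", supportsWasmSimd());
```

Validation only checks the module bytes. Nothing is compiled or executed, so the probe is cheap enough to run at startup.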

Browser support

| Browser | Minimum version | SIMD available from |
|---------|-----------------|---------------------|
| Chrome | 80+ | 91+ |
| Firefox | 75+ | 89+ |
| Safari | 13+ | 16.4+ |
| Edge | 80+ | 91+ |

Built on Servo and Cloudflare

Parsing & Selection (Servo browser engine):

html5ever — Spec-compliant HTML5 parser
selectors — CSS selector matching engine

Streaming Parser (Cloudflare):

lol_html — High-performance streaming HTML parser with constant-memory event-driven API

Related packages

| Platform | Package |
|----------|---------|
| Rust | scrape-core |
| Python | fast-scrape |
| Node.js | @fast-scrape/node |

License

MIT OR Apache-2.0
