@fast-scrape/wasm
v0.2.5
WebAssembly bindings for the scrape-rs HTML parsing library
Near-native HTML parsing in the browser via WebAssembly, 1.5-2x faster than DOMParser on large documents.
Installation
```bash
npm install @fast-scrape/wasm
yarn add @fast-scrape/wasm
pnpm add @fast-scrape/wasm
bun add @fast-scrape/wasm
```
Quick start
```javascript
import init, { Soup } from '@fast-scrape/wasm';

await init(); // Initialize the WASM module (once)

const soup = new Soup("<html><body><div class='content'>Hello, World!</div></body></html>");
console.log(soup.find("div").text); // Hello, World!
```

> [!IMPORTANT]
> Call `init()` once before using any other functions.
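If several modules in an app may be the first to need the parser, one way to honor the "call `init()` once" rule is to memoize the initialization promise. The `once` helper below is a hypothetical sketch, not part of `@fast-scrape/wasm`; it only assumes that `init()` returns a promise, as the `await init()` calls in this README imply.

```typescript
// Hypothetical helper (not part of @fast-scrape/wasm): runs an async
// initializer at most once and hands every caller the same promise.
function once<T>(fn: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = fn().catch((err: unknown) => {
        // Forget a failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Usage sketch:
//   import init from '@fast-scrape/wasm';
//   const ensureWasm = once(() => init());
//   await ensureWasm(); // safe to call from any module, any number of times
```

Concurrent callers share the in-flight promise, so the module is fetched and compiled only once, and a rejected attempt is cleared so a transient network failure does not block later calls.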
Usage
```javascript
import init, { Soup } from '@fast-scrape/wasm';

await init();

// Any HTML string, e.g. the body of a fetched page
const html = "<div class='content'><p>First</p><p>Second</p></div>";
const soup = new Soup(html);

// Find first element by tag
const div = soup.find("div");

// Find all elements
const divs = soup.findAll("div");

// CSS selectors
for (const el of soup.select("div.content > p")) {
  console.log(el.text);
}
```
Vite:
```javascript
import init, { Soup } from '@fast-scrape/wasm';

await init(); // Vite handles WASM automatically
```
Webpack 5:
```javascript
// webpack.config.js
module.exports = {
  experiments: { asyncWebAssembly: true },
};
```
Without a bundler (CDN):
```html
<script type="module">
  import init, { Soup } from 'https://esm.sh/@fast-scrape/wasm';

  await init();

  const soup = new Soup('<div>Hello</div>');
  console.log(soup.find('div').text);
</script>
```
TypeScript:
```typescript
import init, { Soup, Tag } from '@fast-scrape/wasm';

await init();

// Collect the href of every <a> element that has one
function extractLinks(soup: Soup): string[] {
  return soup.select("a[href]").map(a => a.getAttribute("href") ?? "");
}
```
Performance
Native-speed parsing in browsers with SIMD acceleration:
| Operation | @fast-scrape/wasm | Native DOMParser | Notes |
|-----------|-------------------|------------------|-------|
| Parse 100 KB HTML | 2.1 ms | 3.2 ms | 1.5x faster |
| `find(".class")` | 0.3 µs | N/A | CSS selector optimization |
| `find("#id")` | 0.2 µs | N/A | ID selector optimization |
| Memory (100 KB document) | 8.4 MB | 12.2 MB | ~30% less memory |
Key advantages:
- Written in Rust, which guarantees memory safety
- Sub-microsecond CSS selector queries (0.2-0.3 µs in the table above)
- Automatic SIMD acceleration in modern browsers
- 50-70% lower memory use during HTML extraction, via zero-copy serialization
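The README does not say which browser, hardware, or documents produced these figures, so results on your own pages may differ. A small timing harness such as the hypothetical `medianMs` helper below is one way to compare parsers yourself: it takes the median of several timed runs after a few warm-up calls, which dampens JIT warm-up and garbage-collection outliers.

```typescript
// Hypothetical micro-benchmark helper (not part of @fast-scrape/wasm).
// Returns the median wall-clock time of `fn` in milliseconds.
function medianMs(
  fn: () => unknown,
  runs = 20,
  warmup = 3,
  now: () => number = () => performance.now(), // injectable clock, e.g. for tests
): number {
  if (runs < 1) throw new RangeError("runs must be at least 1");
  for (let i = 0; i < warmup; i++) fn(); // let the JIT and caches settle
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    samples.push(now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 1
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}
```

In a browser you might compare `medianMs(() => new Soup(html))` against `medianMs(() => new DOMParser().parseFromString(html, "text/html"))`. Browsers coarsen `performance.now()` for security reasons, so for sub-millisecond operations such as `find()`, time a loop of many calls per sample and divide.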
Bundle size
Optimized package under 500 KB:
| Build | Size |
|-------|------|
| Minified + gzip | 285 KB |
| Minified | ~400 KB |
> [!TIP]
> SIMD is enabled automatically on Chrome 91+, Firefox 89+, and Safari 16.4+. Zero-copy serialization cuts memory use by 50-70% during HTML extraction.
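In JavaScript terms, "zero-copy" usually means reading data through a view over memory that already exists instead of copying it into a new buffer. The README does not describe how `@fast-scrape/wasm` implements this, so the sketch below is only a general illustration with standard typed arrays; `decodeView` and `decodeCopy` are hypothetical names.

```typescript
// General illustration of "zero-copy" with standard typed arrays.
// This is NOT the package's implementation, which the README does not describe.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Decodes bytes[start, end) through a view: subarray() shares the
// underlying ArrayBuffer, so no second byte buffer is allocated.
function decodeView(bytes: Uint8Array, start: number, end: number): string {
  return decoder.decode(bytes.subarray(start, end));
}

// For contrast: slice() allocates a new ArrayBuffer and copies the bytes.
function decodeCopy(bytes: Uint8Array, start: number, end: number): string {
  return decoder.decode(bytes.slice(start, end));
}

const htmlBytes = encoder.encode("<p>Hello</p>");
console.log(decodeView(htmlBytes, 3, 8)); // Hello
```

Both functions return the same string; the difference is that `subarray()` allocates no second byte buffer, while `slice()` does.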
Browser support
| Browser | Minimum version | SIMD since |
|---------|-----------------|------------|
| Chrome | 80+ | 91+ |
| Firefox | 75+ | 89+ |
| Safari | 13+ | 16.4+ |
| Edge | 80+ | 91+ |
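The SIMD column lists the first browser versions with WebAssembly SIMD. The README does not say how the package chooses its code path, but if you want to check a given browser yourself, a common approach is to ask the engine to validate a tiny module that uses SIMD instructions. `supportsWasmSimd` and the probe bytes below are a hand-written sketch, not an API of this package.

```typescript
// Hand-written probe module (not part of @fast-scrape/wasm): one function
// of type () -> v128 whose body uses two SIMD instructions.
const SIMD_PROBE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7b, // type section: () -> v128
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x0a, 0x0a, 0x01, 0x08, 0x00, // code section: one body, 8 bytes, no locals
  0x41, 0x00, // i32.const 0
  0xfd, 0x0f, // i8x16.splat
  0xfd, 0x62, // i8x16.popcnt
  0x0b, // end
]);

// True only if the engine accepts the SIMD opcodes above.
function supportsWasmSimd(): boolean {
  if (typeof WebAssembly === "undefined") return false;
  return WebAssembly.validate(SIMD_PROBE);
}
```

`WebAssembly.validate()` only checks that the bytes form a valid module for the current engine; it does not instantiate or run anything, so the check is cheap and synchronous.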
Built on Servo and Cloudflare
Parsing & Selection (Servo browser engine):
Streaming Parser (Cloudflare):
- lol_html — High-performance streaming HTML parser with constant-memory event-driven API
Related packages
| Platform | Package |
|----------|---------|
| Rust | scrape-core |
| Python | fast-scrape |
| Node.js | @fast-scrape/node |
License
MIT OR Apache-2.0
