flash-rerank-wasm
v0.2.1
WASM module for Flash-Rerank — browser and edge inference via tract
The fastest neural reranker — now in your browser. Client-side cross-encoder inference via WebAssembly. Documents never leave the device.
Install
npm install flash-rerank-wasm
Usage
import init, { load_model, rerank } from 'flash-rerank-wasm';
// Initialize the WASM module
await init();
// Load a quantized ONNX model (fetch the model bytes and tokenizer JSON yourself)
const modelBytes = await fetch('/models/minilm-l6-int8.onnx').then(r => r.arrayBuffer());
const tokenizerJson = await fetch('/models/tokenizer.json').then(r => r.text());
load_model(new Uint8Array(modelBytes), tokenizerJson);
// Rerank documents
const results = rerank(
  "What is the capital of France?",
  ["Paris is the capital of France.", "Berlin is in Germany.", "London is in the UK."],
  2 // top_k
);
console.log(results);
// [{ index: 0, score: 0.94 }, { index: 2, score: 0.12 }]
Features
- Client-side inference — No server roundtrip. Documents stay on the user's device.
- ~3 MB gzipped bundle (excluding model weights)
- Calibrated scores — Sigmoid-normalized output in [0.0, 1.0], identical contract to the Rust and Python APIs
- Any ONNX cross-encoder — Bring your own model from HuggingFace Hub
- Web Worker compatible — Run inference off the main thread
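
Because scores are sigmoid-normalized into [0.0, 1.0], a fixed cutoff can be applied to them directly. The sketch below is illustrative only: `sigmoid` shows what the normalization means (the package applies it internally, per the feature list), and `selectRelevant` is a hypothetical helper, not part of the package. It assumes each `index` in the output of `rerank` refers to a position in the input array, as the sample output in the usage example suggests. The `0.5` threshold is an arbitrary placeholder.

```javascript
// Logistic sigmoid: maps any real-valued logit to a value between 0 and 1.
// Shown for illustration; the package already returns normalized scores.
function sigmoid(logit) {
  return 1 / (1 + Math.exp(-logit));
}

// Hypothetical helper: join { index, score } results back to the original
// strings and drop anything below a fixed relevance threshold.
function selectRelevant(results, documents, minScore = 0.5) {
  return results
    .filter((r) => r.score >= minScore)
    .map((r) => ({ text: documents[r.index], score: r.score }));
}

const documents = [
  "Paris is the capital of France.",
  "Berlin is in Germany.",
  "London is in the UK.",
];

// Sample output copied from the usage example.
const results = [{ index: 0, score: 0.94 }, { index: 2, score: 0.12 }];

console.log(selectRelevant(results, documents));
// [{ text: 'Paris is the capital of France.', score: 0.94 }]
```

A suitable threshold depends on the model and the data, so it is worth tuning on real queries rather than relying on the placeholder value.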
Recommended Models
| Model | Size | Use Case |
|-------|------|----------|
| ms-marco-MiniLM-L-6-v2 (INT8) | ~22 MB | Fast, small footprint |
| ms-marco-MiniLM-L-6-v2 (FP32) | ~87 MB | Higher accuracy, larger download |
Use INT8 quantized models for browser deployment. FP32 models work but are large for browser delivery.
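
Since the model is downloaded separately from the bundle, it can help to check the HTTP status before handing the bytes to `load_model`: `fetch` resolves even on a 404, so an error page could otherwise be passed along as if it were model data. The following is a minimal sketch under that assumption. `checkOk` and `fetchAndLoad` are hypothetical names, and `loadModel` is expected to be the package's `load_model` export, passed in by the caller after `init()` has completed.

```javascript
// Throw a descriptive error for non-2xx responses; otherwise pass the
// response through unchanged.
function checkOk(response, url) {
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  return response;
}

// Hypothetical wrapper: download model and tokenizer in parallel, validate
// both responses, then call the supplied loadModel function.
async function fetchAndLoad(loadModel, modelUrl, tokenizerUrl) {
  const [modelRes, tokenizerRes] = await Promise.all([
    fetch(modelUrl),
    fetch(tokenizerUrl),
  ]);
  const modelBytes = new Uint8Array(
    await checkOk(modelRes, modelUrl).arrayBuffer()
  );
  const tokenizerJson = await checkOk(tokenizerRes, tokenizerUrl).text();
  loadModel(modelBytes, tokenizerJson);
  // Return the downloaded size so callers can log or budget it.
  return modelBytes.byteLength;
}
```

With the paths from the usage example, a call might look like `await fetchAndLoad(load_model, '/models/minilm-l6-int8.onnx', '/models/tokenizer.json')`.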
Part of Flash-Rerank
This is the WASM target of Flash-Rerank, the world's fastest neural reranker. Also available as:
- Rust: cargo add flash_rerank
- Python: pip install flash-rerank
- CLI: cargo install flash-rerank-cli
- HTTP Server: flash-rerank serve
Pairs with BM25-Turbo for 80ms end-to-end search + reranking across 8.8M documents.
License
AGPL-3.0-or-later. Commercial license available at alessandrobenigni.com.
