@zlaabsi/turboquant-wasm
v0.1.1
TurboQuant vector quantization for the browser — compress embeddings 8x, search client-side
turboquant-wasm is a Rust/WebAssembly implementation of the TurboQuant MSE variant (Algorithm 1 from the paper). It is built for applications that already have embeddings and want local retrieval without shipping a vector database or a graph index.
Why this repo does not ship the QJL variant
The short version is that QJL works against the main design goal of turboquant-wasm: keep browser-side retrieval small and memory-efficient.
- QJL adds an extra projection matrix, which materially increases runtime memory pressure.
- In browser and WASM settings, that extra matrix becomes expensive quickly, especially once embedding dimensions get large.
- The MSE variant already gives strong recall at the bit-rates this repo actually targets in practice, especially at 3+ bits.
- For this project, the tradeoff was not worth it: more complexity and more memory, without fitting the core promise of a tiny browser-first package.
So the repo deliberately optimizes for the TurboQuant MSE path: smaller package, lower memory footprint, simpler runtime story.
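To put a rough number on the memory concern: a dense float32 projection matrix grows quadratically with embedding dimension. A back-of-envelope sketch, assuming a square d x d matrix as an illustrative worst case rather than QJL's exact projection shape:

```js
// Rough memory cost of carrying an extra dense projection matrix.
// Assumes a square d x d float32 matrix; QJL's actual projection shape
// may differ, so treat this as an order-of-magnitude illustration.
function projectionMatrixBytes(dim) {
  return dim * dim * 4; // 4 bytes per float32 entry
}

console.log(projectionMatrixBytes(384)); // 589,824 B (~0.56 MiB)
console.log(projectionMatrixBytes(768)); // 2,359,296 B (~2.25 MiB)
```

By the README's own ~196 B-per-vector figure, that single 768d matrix costs more memory than a compressed index of several thousand 384d vectors.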
At a glance
- Small web package. The current measured browser npm build is about 30.3 KiB gzip.
- Aggressive compression. With 4-bit quantization, a 384d vector takes about 196 B and a 768d vector about 388 B.
- Direct search on compressed vectors. No full decode step on every query.
- Portable packaging. Runs in browsers, Node.js, and WASM-friendly edge runtimes.
- Persistence built in. Save indexes with `save()` and restore them with `Index.load()`.
- Example-first repo. Includes browser, WebGPU, and Cloudflare demos.
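The per-vector size figures above follow from simple bit math. A sketch, assuming packed codes plus a hypothetical 4-byte per-vector header (the real storage layout may differ):

```js
// Back-of-envelope check of the "Aggressive compression" figures above.
// Assumes a hypothetical 4-byte per-vector header on top of the packed
// codes; the actual turboquant-wasm storage layout may differ.
function packedVectorBytes(dim, bits, headerBytes = 4) {
  return Math.ceil((dim * bits) / 8) + headerBytes;
}

console.log(packedVectorBytes(384, 4)); // 196 B for a 384d vector
console.log(packedVectorBytes(768, 4)); // 388 B for a 768d vector
```

Against 4-byte floats (1,536 B for 384d), that is the roughly 8x compression the package tagline claims.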
Bundle Size Analysis
Current turboquant-wasm bundle numbers below come from the latest measured snapshot in benchmarks/results/2026-04-09-m1-max-node22.json. That snapshot keeps the 2026-04-08 search measurements and refreshes the browser npm package size to the current pkg-bundler/ output. Alternative-library rows are maintained comparison estimates from benchmarks/wasm_analysis.md, not a fresh side-by-side rerun in this repo.
Current measured package
The npm browser entrypoint now ships the wasm-pack --target bundler output rather than the raw web loader. That keeps the published package free of a runtime fetch()-based Wasm bootstrap, which avoids the Socket alert on pkg/turboquant_wasm.js while still keeping the repo-local demos on the plain web target.
Comparison with alternative browser-side vector search libraries
turboquant-wasm is materially smaller than graph-based WASM alternatives. That matters most for edge deployments, mobile web, and embedded search widgets where bundle budget is tight.
Why it stays small
- No HNSW graph or graph-tuning machinery in the binary.
- No external native dependency stack, BLAS, or LAPACK.
- A small core: PRNG, orthogonalization, centroid tables, scalar quantization, packed storage, and compressed brute-force scan.
- Size-oriented WASM build settings, plus a design that matches the algorithm instead of wrapping a larger ANN engine.
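A minimal sketch of the scalar-quantization-plus-packed-storage idea from the list above, in plain JavaScript. This is illustrative only; the actual Rust implementation is more involved and also uses centroid tables:

```js
// Map each float in [lo, hi] to a 4-bit code and pack two codes per byte.
function quantize4bit(values, lo, hi) {
  const codes = new Uint8Array(Math.ceil(values.length / 2));
  const scale = 15 / (hi - lo);
  for (let i = 0; i < values.length; i++) {
    const clamped = Math.min(hi, Math.max(lo, values[i]));
    const code = Math.round((clamped - lo) * scale) & 0x0f; // 0..15
    if (i % 2 === 0) codes[i >> 1] = code;       // low nibble
    else codes[i >> 1] |= code << 4;             // high nibble
  }
  return codes;
}

// Reconstruct approximate floats from the packed 4-bit codes.
function dequantize4bit(codes, n, lo, hi) {
  const out = new Float32Array(n);
  const step = (hi - lo) / 15;
  for (let i = 0; i < n; i++) {
    const byte = codes[i >> 1];
    const code = i % 2 === 0 ? byte & 0x0f : byte >> 4;
    out[i] = lo + code * step;
  }
  return out;
}
```

With 16 levels the worst-case roundtrip error per component is half a quantization step, i.e. (hi - lo) / 30.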
Feature Comparison
This table keeps the product-level comparison from benchmarks/wasm_analysis.md, but refreshes the turboquant-wasm numbers to the current implementation.
Key Advantages Summary
Good fit
- Static-site search for docs, blogs, and catalogs
- Local-first semantic search in PWAs or desktop apps
- Client-side RAG where documents never leave the machine
- Browser extensions indexing tabs or notes locally
- Edge APIs with a prebuilt compressed index
Probably not the right tool
- Very large corpora where you want graph-based ANN over 100k+ vectors
- Workloads that need sub-millisecond latency at large N
- Benchmarks where you need a mature head-to-head comparison suite today
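The large-N caveat follows directly from the design: a compressed brute-force scan touches every stored vector on every query, so latency grows linearly with N. A minimal sketch of such a scan over 4-bit codes, assuming one global quantization range, which likely differs from the real index layout:

```js
// Brute-force top-k search directly over packed 4-bit codes: distances are
// computed from a small reconstruction table, never from full float vectors.
// Assumes one global [lo, hi] range shared by all vectors (a simplification).
function searchPacked(query, codes, nVectors, dim, lo, hi, k) {
  const step = (hi - lo) / 15;
  const recon = new Float32Array(16); // reconstructed value per code
  for (let c = 0; c < 16; c++) recon[c] = lo + c * step;

  const bytesPerVec = Math.ceil(dim / 2);
  const results = [];
  for (let v = 0; v < nVectors; v++) {
    let dist = 0;
    const base = v * bytesPerVec;
    for (let i = 0; i < dim; i++) {
      const byte = codes[base + (i >> 1)];
      const code = i % 2 === 0 ? byte & 0x0f : byte >> 4;
      const d = query[i] - recon[code];
      dist += d * d; // squared L2 distance
    }
    results.push({ id: v, dist });
  }
  results.sort((a, b) => a.dist - b.dist);
  return results.slice(0, k).map((r) => r.id);
}
```

Real implementations amortize the nibble decode and vectorize the inner loop, but the O(N * dim) shape is the same, which is why graph-based ANN wins at very large N.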
Install
```sh
npm install @zlaabsi/turboquant-wasm
```

For npm consumers, the browser entrypoint is packaged with the wasm-pack bundler target. The repo-local examples/ continue to use the raw web target in pkg/.
Quick start
Minimal usage
```js
import { createQuantizer } from "@zlaabsi/turboquant-wasm";

const dim = 384;
const bits = 4;

const quantizer = await createQuantizer({ dim, bits });
const index = quantizer.buildIndex(embeddings, nVectors);
const resultIds = index.search(queryEmbedding, 10);
```

Persist and reload
```js
import { createQuantizer, Index } from "@zlaabsi/turboquant-wasm";

const quantizer = await createQuantizer({ dim: 384, bits: 4 });
const index = quantizer.buildIndex(embeddings, nVectors);

const bytes = index.save();
const restored = Index.load(bytes, quantizer);
const resultIds = restored.search(queryEmbedding, 10);
```

Build from source
```sh
rustup target add wasm32-unknown-unknown
cargo install wasm-pack
git clone https://github.com/zlaabsi/turboquant-wasm.git
cd turboquant-wasm
npm run build
```

Use npm run build:node when you also want the Node.js target in pkg-node/.
Try the examples
```sh
npm run build
python3 -m http.server 8080
```

Then open:

- http://localhost:8080/examples/browser/
- http://localhost:8080/examples/transformers-js/
- http://localhost:8080/examples/onnx-webgpu/
Example matrix:
More detail: examples/README.md
Cookbook
Use these guides when you want an integration pattern instead of a toy demo:
Performance snapshot
Honest version: the implementation looks useful for moderate corpus sizes, but this repo still does not have a full benchmark suite across devices, browsers, public datasets, and competing libraries.
The table below is the current source of truth for measured TurboQuant behavior in this repo. The old March analysis mixed theory, estimates, and older implementation assumptions; benchmarks/wasm_analysis.md now explains explicitly why current measured search latency is higher than those early estimates.
Current evidence is a local snapshot on:
- Apple M1 Max
- Node v22.11.0
- npm 10.9.0
- Darwin 25.3.0 arm64
- synthetic clustered embeddings
That means the numbers below are directional evidence, not a universal SLA.
Current snapshot
Charts
Raw benchmark data
- Packaging note: the 2026-04-09 snapshot refreshes bundle-size fields for the current npm browser package, while the raw search log remains the 2026-04-08 run.
- Snapshot JSON: benchmarks/results/2026-04-09-m1-max-node22.json
- Raw console log: benchmarks/results/2026-04-08-m1-max-node22-realworld.txt
- Chart generator: benchmarks/render_charts.js
Comparative context
The charts above are about turboquant-wasm alone. The charts below add comparative context using the positioning tables in benchmarks/wasm_analysis.md.
Important caveat: these comparative plots are not a fresh controlled benchmark suite run side-by-side in this repo. The TurboQuant bars use the current measured package size and current packed storage model; the alternative-library bars come from the maintained comparison estimates in benchmarks/wasm_analysis.md. They are here for positioning and tradeoff discussion, not to pretend we already have airtight head-to-head numbers.
Reading guide: purple is the current measured turboquant-wasm result, gray bars are the comparison points documented in benchmarks/wasm_analysis.md, and the small labels under the gray bars show the relative overhead versus TurboQuant.
What is still missing
- repeated runs with variance reporting
- lower-variance harnesses for build and search sweeps
- browser benchmarks on low-end and mid-range hardware
- public real-world embedding corpora
- head-to-head comparisons against exact float32 search and graph-based ANN libraries
API and package notes
- Install from npm: @zlaabsi/turboquant-wasm
- Repository: github.com/zlaabsi/turboquant-wasm
- Primary workflow: create quantizer -> build or stream index -> save/load -> search
- Generated artifacts live in pkg/ and pkg-node/
Development
For local workflow, release process, and commit conventions, see CONTRIBUTING.md.
Common commands:
```sh
npm run build
npm run build:node
npm run test
npm run verify
npm run bench:realworld
npm run bench:charts
```

References
License
Apache-2.0
