fast-theta
v0.2.0
WASM SIMD-accelerated cosine similarity for vectors
Fast Theta
Damn fast cosine similarity computation with WebAssembly SIMD. Works in both Node.js and the browser.
Install
npm install fast-theta
Usage
import { normalize, getSimilarity } from "fast-theta";
// Pre-normalize vectors once (accepts Float32Array or number[])
const a = normalize(new Float32Array([1, 2, 3, 4]));
const b = normalize([4, 3, 2, 1]);
// Fast dot-product similarity (assumes pre-normalized vectors)
const sim = getSimilarity(a, b);
// Or compute full cosine similarity in one call (normalizes inside WASM)
const sim2 = getSimilarity([1, 2, 3, 4], [4, 3, 2, 1], { normalize: true });
API
normalize(vec: Float32Array | number[]): Float32Array
L2-normalize a vector. Returns a new Float32Array with unit length.
getSimilarity(vec1: Float32Array | number[], vec2: Float32Array | number[], options?): number
Compute similarity between two vectors.
- Default (no options): computes the dot product. Assumes vectors are already normalized, in which case the dot product equals cosine similarity.
- { normalize: true }: normalizes both vectors and computes the dot product entirely inside WASM, minimizing JS↔WASM round trips.
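The two modes above give the same result because the dot product of two unit-length vectors is their cosine similarity. The plain-TypeScript sketch below illustrates that identity. It does not call fast-theta: `l2Normalize`, `dot`, and `cosine` are illustrative names, and the zero-vector handling is an assumption, not documented library behavior.

```typescript
// Plain-TypeScript reference for the math only; these helpers are
// hypothetical and are not fast-theta exports.
function l2Normalize(vec: ArrayLike<number>): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < vec.length; i++) sumSq += vec[i] * vec[i];
  const norm = Math.sqrt(sumSq);
  const out = new Float32Array(vec.length);
  // Assumption: a zero vector comes back as all zeros instead of NaN.
  if (norm === 0) return out;
  for (let i = 0; i < vec.length; i++) out[i] = vec[i] / norm;
  return out;
}

function dot(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have equal length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Full cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

const x = [1, 2, 3, 4];
const y = [4, 3, 2, 1];

// Normalize once, then use the cheap dot product.
const viaNormalize = dot(l2Normalize(x), l2Normalize(y));
// Or compute the full formula on the raw vectors.
const direct = cosine(x, y);
// Both are approximately 2/3 (20 / 30); they differ only by float32 rounding.
```

Pre-normalizing pays off when the same vectors are compared many times, since the square roots and divisions are done once per vector rather than once per comparison.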
Performance
WASM SIMD with 4× loop unrolling and independent accumulators. Benchmarked against popular JS libraries:
| Library                     | 384 dims   | 768 dims   | 1536 dims  |
| --------------------------- | ---------- | ---------- | ---------- |
| fast-theta (pre-normalized) | 3.3M ops/s | 2.6M ops/s | 1.8M ops/s |
| cos-similarity              | 1.8M ops/s | 1.1M ops/s | 597K ops/s |
| fast-cosine-similarity      | 1.4M ops/s | 847K ops/s | 414K ops/s |
| compute-cosine-similarity   | 503K ops/s | 260K ops/s | 134K ops/s |
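The loop structure credited for these numbers, 4× unrolling with independent accumulators, can be illustrated in scalar TypeScript. This is only a sketch of the access pattern: `dotUnrolled` is a hypothetical name, and in the actual WAT kernel each accumulator would presumably be an f32x4 vector rather than a single number.

```typescript
// Scalar illustration of 4x unrolling with independent accumulators.
// `dotUnrolled` is a hypothetical helper, not part of fast-theta.
function dotUnrolled(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have equal length");
  }
  const n = a.length;
  const main = n - (n % 4); // largest multiple of 4 that fits

  // Four separate sums: no addition depends on the one issued just before
  // it, so the CPU can keep several multiply-adds in flight at once.
  let s0 = 0;
  let s1 = 0;
  let s2 = 0;
  let s3 = 0;
  for (let i = 0; i < main; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }

  // Leftover elements when the length is not a multiple of 4.
  let tail = 0;
  for (let i = main; i < n; i++) tail += a[i] * b[i];

  // Combine the partial sums only once, after the loop.
  return (s0 + s1) + (s2 + s3) + tail;
}

const demo = dotUnrolled(
  new Float32Array([1, 2, 3, 4, 5, 6]),
  new Float32Array([2, 2, 2, 2, 2, 2]),
); // 42
```

With a single accumulator every addition has to wait for the previous one; splitting the sum removes that dependency chain, which is the instruction-level parallelism the description refers to.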
Development
npm install
npm run compile-wat # Compile .wat → WASM binary
npm run build # Full build (compile + vite + types)
npm test # Run unit tests
npm run bench:node # Node.js benchmarks
npm run bench:web # Web benchmark page (open in browser)
How it works
- Hand-written WebAssembly Text Format (WAT) with SIMD instructions
- f32x4 SIMD operations process 4 floats per instruction
- 4× loop unrolling with independent accumulators for ILP
- Zero-copy IPC via shared WebAssembly.Memory and Float32Array views
- WASM binary embedded as base64 (< 1KB)
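The memory-sharing point above can be sketched with the standard WebAssembly JavaScript API alone. The byte offsets and the names `viewA`, `viewB`, and `both` are assumptions made for illustration; the layout fast-theta actually uses is not described here.

```typescript
// One 64 KiB page of linear memory. A WASM module that imports this memory
// reads and writes the same bytes that the JS views below expose.
const memory = new WebAssembly.Memory({ initial: 1 });

const DIMS = 4;
const BYTES = Float32Array.BYTES_PER_ELEMENT; // 4 bytes per float

// Hypothetical layout: vector A at byte offset 0, vector B directly after it.
const viewA = new Float32Array(memory.buffer, 0, DIMS);
const viewB = new Float32Array(memory.buffer, DIMS * BYTES, DIMS);

// Writing through a view stores the floats straight into linear memory;
// there is no separate staging buffer to hand over to WASM afterwards.
viewA.set([1, 2, 3, 4]);
viewB.set([4, 3, 2, 1]);

// A second view over the same region observes the data immediately,
// because every view aliases memory.buffer instead of copying it.
const both = new Float32Array(memory.buffer, 0, DIMS * 2);
console.log(Array.from(both)); // [1, 2, 3, 4, 4, 3, 2, 1]
```

One caveat with this pattern: growing a non-shared memory with `memory.grow()` detaches the old buffer, so views have to be recreated after any growth.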
Related
- Need kNN vector search? Try Eigen DB.
