turboquant-search
v0.1.1
Vector search for JSON datasets. Build quantized indexes and search with WASM SIMD.
Takes any JSON array, embeds text fields into vectors, compresses them with 3-bit quantization, and runs similarity search entirely via WebAssembly SIMD, in the browser or Node.js.
Install
npm install turboquant-search
Quick Start
import { TurboSearch } from 'turboquant-search';
// Build from any JSON array
const ts = await TurboSearch.from(products, {
  fields: ['name', 'description', 'tags'],
});
// Text search
const results = await ts.search('wireless audio bluetooth', { topK: 5 });
// => [{ index: 0, score: 0.94, data: { name: 'Wireless Headphones', ... } }]
// Find similar items
const similar = ts.similar(0, { topK: 5 });
// Save for later
await ts.save('./products.index.json');
// Load a pre-built index
const loaded = await TurboSearch.load('./products.index.json');
// Clean up
ts.destroy();
CLI
# Build an index
npx tqs build --input products.json --fields "name,description,tags" --output search.json
# Inspect an index
npx tqs info search.json
API
TurboSearch.from(data, options)
Build a search index from a JSON array.
| Option | Type | Default | Description |
|---|---|---|---|
| fields | string[] | required | Fields to embed |
| dim | number | 384 | Embedding dimensions |
| bits | number | 3 | Quantization bits |
| seed | number | 42 | Random seed |
| embedder | Embedder | keyword | Custom embedder |
TurboSearch.load(pathOrUrl)
Load a pre-built index from a file or URL.
Instance Methods
ts.search(query, { topK }) // text search
ts.similar(index, { topK }) // find similar items
ts.vectorSearch(vec, { topK }) // search by embedding
ts.save(path) // save index to disk
ts.size // number of indexed items
ts.destroy() // clean up WASM
Custom Embedder
// Works with any embedding source: transformers.js, OpenAI, Gemini, Cohere, etc.
import { pipeline } from '@xenova/transformers';
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const ts = await TurboSearch.from(data, {
  fields: ['text'],
  embedder: {
    async embed(text, dim) {
      const output = await extractor(text, { pooling: 'mean', normalize: true });
      return new Float32Array(output.data);
    },
  },
});
Scalability
| Items | Index Size | Search Time |
|---|---|---|
| 100 | ~14 KB | <1 ms |
| 10,000 | ~1.4 MB | ~5 ms |
| 50,000 | ~7 MB | ~15 ms |
| 100,000 | ~14 MB | ~30 ms |
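The index sizes above are consistent with simple arithmetic: at 3 bits per dimension, a 384-dim vector needs 384 × 3 / 8 = 144 bytes, against 1,536 bytes as float32. The sketch below shows that arithmetic, with a round-trip bit-packing check. It is an illustration only: `packedBytes`, `packCodes`, `unpackCode`, and `estimateIndexBytes` are hypothetical helpers, not part of turboquant-search, and the contiguous LSB-first layout is an assumption rather than the library's actual storage format.

```javascript
// Hypothetical helpers for illustration; not part of turboquant-search.

// Bytes needed to store `dim` codes of `bits` bits each, packed contiguously.
function packedBytes(dim, bits) {
  return Math.ceil((dim * bits) / 8);
}

// Pack small integer codes (each in [0, 2^bits)) into bytes, LSB first.
function packCodes(codes, bits) {
  const out = new Uint8Array(packedBytes(codes.length, bits));
  let bitPos = 0;
  for (const code of codes) {
    for (let b = 0; b < bits; b++) {
      if ((code >> b) & 1) {
        out[bitPos >> 3] |= 1 << (bitPos & 7);
      }
      bitPos++;
    }
  }
  return out;
}

// Read the code at position `i` back out of the packed bytes.
function unpackCode(packed, i, bits) {
  let code = 0;
  for (let b = 0; b < bits; b++) {
    const bitPos = i * bits + b;
    if ((packed[bitPos >> 3] >> (bitPos & 7)) & 1) code |= 1 << b;
  }
  return code;
}

// Rough vector payload size, ignoring headers and any stored item data.
function estimateIndexBytes(items, dim = 384, bits = 3) {
  return items * packedBytes(dim, bits);
}

const codes = Array.from({ length: 384 }, (_, i) => i % 8);
const packed = packCodes(codes, 3);
console.log(packed.length);              // 144
console.log(384 * 4);                    // 1536 (float32 size of the same vector)
console.log(estimateIndexBytes(100000)); // 14400000, roughly the ~14 MB row
```

Because 8 codes of 3 bits fill exactly 3 bytes, a 384-dim vector packs with no padding. Real index files may add per-vector scale factors or metadata on top of this estimate.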
How It Works
- Text extraction - concatenates specified JSON fields per item
- Embedding - TF-IDF keyword hashing into 384-dim vectors (or your custom embedder)
- Quantization - 3-bit TurboQuant compression (1,536 bytes to ~144 bytes per vector)
- Search - WASM SIMD dot products, returns top-K results
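The steps above can be sketched in plain JavaScript to make the data flow concrete. This is a simplified stand-in, not the library's implementation: it uses raw term counts rather than TF-IDF weighting, skips quantization entirely, and computes dot products in a plain loop instead of WASM SIMD. The function names (`extractText`, `hashEmbed`, `topK`) and the choice of FNV-1a as the hash are assumptions made for the example.

```javascript
// Illustrative only: no TF-IDF weighting, no quantization, no WASM.

// Step 1: concatenate the chosen fields of one item into a single string.
function extractText(item, fields) {
  return fields
    .map((f) => item[f])
    .filter((v) => v !== undefined && v !== null)
    .map((v) => (Array.isArray(v) ? v.join(' ') : String(v)))
    .join(' ');
}

// 32-bit FNV-1a string hash, used to map a token to a vector slot.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Step 2: hash each token into one of `dim` slots, then L2-normalize.
function hashEmbed(text, dim = 384) {
  const vec = new Float32Array(dim);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const t of tokens) vec[fnv1a(t) % dim] += 1;
  let norm = 0;
  for (let i = 0; i < dim; i++) norm += vec[i] * vec[i];
  norm = Math.sqrt(norm);
  if (norm > 0) for (let i = 0; i < dim; i++) vec[i] /= norm;
  return vec;
}

// Step 4: score every vector by dot product and keep the top K.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function topK(queryVec, vectors, k) {
  return vectors
    .map((v, index) => ({ index, score: dot(queryVec, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const products = [
  { name: 'Wireless Headphones', tags: ['audio', 'bluetooth'] },
  { name: 'Steel Water Bottle', tags: ['kitchen', 'outdoor'] },
  { name: 'Bluetooth Speaker', tags: ['audio', 'wireless'] },
];
const vectors = products.map((p) => hashEmbed(extractText(p, ['name', 'tags'])));
const hits = topK(hashEmbed('wireless audio bluetooth'), vectors, 2);
console.log(hits.map((h) => products[h.index].name));
```

Barring hash collisions, the two audio products should outrank the water bottle, since they share tokens with the query. Because every vector is unit length, each dot product is a cosine similarity, which is why scores stay at or below 1.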
License
MIT
