turboquant-search

v0.1.1, published 2 months ago

Vector search for JSON datasets. Build quantized indexes and search with WASM SIMD.

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainer: hemanth

Keywords: vector-search, similarity, wasm, simd, quantization, embeddings, search, json

Takes any JSON array, embeds text fields into vectors, compresses them with 3-bit quantization, and runs similarity search entirely via WebAssembly SIMD, in the browser or Node.js.

Install

npm install turboquant-search

Quick Start

import { TurboSearch } from 'turboquant-search';

// Build from any JSON array
const ts = await TurboSearch.from(products, {
  fields: ['name', 'description', 'tags'],
});

// Text search
const results = await ts.search('wireless audio bluetooth', { topK: 5 });
// => [{ index: 0, score: 0.94, data: { name: 'Wireless Headphones', ... } }]

// Find similar items
const similar = ts.similar(0, { topK: 5 });

// Save for later
await ts.save('./products.index.json');

// Load a pre-built index
const loaded = await TurboSearch.load('./products.index.json');

// Clean up
ts.destroy();

CLI

# Build an index
npx tqs build --input products.json --fields "name,description,tags" --output search.json

# Inspect an index
npx tqs info search.json

API

`TurboSearch.from(data, options)`

Build a search index from a JSON array.

| Option | Type | Default | Description |
|---|---|---|---|
| fields | string[] | required | Fields to embed |
| dim | number | 384 | Embedding dimensions |
| bits | number | 3 | Quantization bits |
| seed | number | 42 | Random seed |
| embedder | Embedder | keyword | Custom embedder |

`TurboSearch.load(pathOrUrl)`

Load a pre-built index from a file or URL.

Instance Methods

ts.search(query, { topK })     // text search
ts.similar(index, { topK })    // find similar items
ts.vectorSearch(vec, { topK }) // search by embedding
ts.save(path)                  // save index to disk
ts.size                        // number of indexed items
ts.destroy()                   // clean up WASM

Custom Embedder

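// Works with any embedding source: transformers.js, OpenAI, Gemini, Cohere, etc.
import { TurboSearch } from 'turboquant-search';
import { pipeline } from '@xenova/transformers';

// all-MiniLM-L6-v2 produces 384-dim embeddings, matching the default `dim`
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

const ts = await TurboSearch.from(data, {
  fields: ['text'],
  embedder: {
    // `dim` is unused here because the model fixes the output size
    async embed(text, dim) {
      // Mean-pool the token embeddings and L2-normalize the result
      const output = await extractor(text, { pooling: 'mean', normalize: true });
      return new Float32Array(output.data);
    },
  },
});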
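
For experiments without a model download, a dependency-free embedder with the same `embed(text, dim)` shape can be handy. The sketch below uses plain feature hashing; it is not the library's built-in keyword embedder, and the names `fnv1a`, `hashEmbed`, and `hashingEmbedder` are hypothetical. It also assumes the embedder should return a `Float32Array` of length `dim`, which this README implies but does not state.

```javascript
// FNV-1a 32-bit string hash (a standard non-cryptographic hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Hypothetical helper: hash each token into one of `dim` buckets,
// count occurrences, then L2-normalize the vector.
function hashEmbed(text, dim) {
  const vec = new Float32Array(dim);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const token of tokens) {
    vec[fnv1a(token) % dim] += 1;
  }
  let norm = 0;
  for (let i = 0; i < dim; i++) norm += vec[i] * vec[i];
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let i = 0; i < dim; i++) vec[i] /= norm;
  }
  return vec;
}

// Object with the same shape as the `embedder` option shown above.
const hashingEmbedder = {
  async embed(text, dim) {
    return hashEmbed(text, dim);
  },
};
```

Under those assumptions it would be passed as `embedder: hashingEmbedder` in the options to `TurboSearch.from`.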

Scalability

| Items | Index Size | Search Time |
|---|---|---|
| 100 | ~14 KB | <1ms |
| 10,000 | ~1.4 MB | ~5ms |
| 50,000 | ~7 MB | ~15ms |
| 100,000 | ~14 MB | ~30ms |
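
The index sizes above are consistent with counting only the packed vectors at the default `dim` of 384 and `bits` of 3. The sketch below reproduces that arithmetic; `estimateVectorBytes` is a hypothetical helper, not part of the package, and it ignores any header or stored JSON records, whose layout this README does not describe.

```javascript
// Hypothetical helper: bytes needed for `items` vectors of `dim` values
// at `bits` bits per value, with each vector rounded up to whole bytes.
function estimateVectorBytes(items, dim = 384, bits = 3) {
  const bytesPerVector = Math.ceil((dim * bits) / 8); // 384 * 3 / 8 = 144
  return bytesPerVector * items;
}

for (const n of [100, 10000, 50000, 100000]) {
  console.log(n, estimateVectorBytes(n), 'bytes');
}
```

For 100 items this gives 14,400 bytes, in line with the ~14 KB row.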

How It Works

1. Text extraction - concatenates the specified JSON fields of each item into one string
2. Embedding - TF-IDF keyword hashing into 384-dim vectors (or your custom embedder)
3. Quantization - 3-bit TurboQuant compression (384 float32 values = 1,536 bytes, down to ~144 bytes per vector at 3 bits per value)
4. Search - WASM SIMD dot products between the query and the indexed vectors, returns the top-K results
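
To make the steps above concrete, here is a rough plain-JavaScript sketch of extraction, quantization, and search. It is a stand-in, not the package's code: the quantizer is a simple uniform scalar quantizer over [-1, 1] rather than TurboQuant, the dot products run in ordinary JavaScript rather than WASM SIMD, and the names `extractText`, `quantize`, `dequantize`, and `topK` are hypothetical.

```javascript
// Step 1: join the chosen fields of one item into a single string.
function extractText(item, fields) {
  return fields
    .map((f) => item[f])
    .filter((v) => v !== undefined && v !== null)
    .map((v) => (Array.isArray(v) ? v.join(' ') : String(v)))
    .join(' ');
}

// Step 3 (stand-in): map each value in [-1, 1] to a `bits`-bit code
// and pack the codes tightly into bytes.
function quantize(vec, bits = 3) {
  const maxCode = (1 << bits) - 1; // 7 for 3 bits, so codes run 0..7
  const packed = new Uint8Array(Math.ceil((vec.length * bits) / 8));
  for (let i = 0; i < vec.length; i++) {
    const clamped = Math.max(-1, Math.min(1, vec[i]));
    const code = Math.round(((clamped + 1) / 2) * maxCode);
    for (let b = 0; b < bits; b++) {
      if ((code >> b) & 1) {
        const pos = i * bits + b; // absolute bit position in the buffer
        packed[pos >> 3] |= 1 << (pos & 7);
      }
    }
  }
  return packed;
}

// Inverse of quantize: unpack the codes and map them back into [-1, 1].
function dequantize(packed, dim, bits = 3) {
  const maxCode = (1 << bits) - 1;
  const out = new Float32Array(dim);
  for (let i = 0; i < dim; i++) {
    let code = 0;
    for (let b = 0; b < bits; b++) {
      const pos = i * bits + b;
      if ((packed[pos >> 3] >> (pos & 7)) & 1) code |= 1 << b;
    }
    out[i] = (code / maxCode) * 2 - 1;
  }
  return out;
}

// Step 4 (stand-in): score every packed vector by dot product with the
// query and return the k best as { index, score }, highest score first.
function topK(query, packedVectors, k, bits = 3) {
  const scored = packedVectors.map((packed, index) => {
    const vec = dequantize(packed, query.length, bits);
    let score = 0;
    for (let i = 0; i < vec.length; i++) score += query[i] * vec[i];
    return { index, score };
  });
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

With 3 bits the codes are spaced 2/7 apart, so each reconstructed value can be off by up to about 0.14 in this naive scheme. A 384-dim vector still packs into 144 bytes, matching the per-vector size quoted above.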

License

MIT
