@trovec/core

v2.0.0

Lightweight, zero-dependency vector database library for Node.js

@trovec/core

A lightweight, zero-dependency vector database library for Node.js. Store, query, and persist vector embeddings with support for multiple quantization types and similarity metrics.

Features

  • Zero runtime dependencies — only Node.js required
  • Multiple quantization modes — F32 (full precision), INT8 (compressed), BIT (binary)
  • Four similarity metrics — Cosine, Euclidean, Dot Product, Hamming
  • Fluent API (db.add(), db.query(), db.queryByText()) — clean and discoverable
  • Functional API — stateless functions for tree-shaking and backward compatibility
  • Dual ESM/CJS — works with both import and require
  • TypeScript-first — full type definitions included
  • Mixed ID types — supports both string and bigint entry IDs
  • Pluggable Embedder — bring your own embedding adapter for text-to-vector conversion

Quick Start

Installation

npm install @trovec/core

Basic Usage

import { create } from '@trovec/core';

// 1. Create an instance
const db = await create({ dimensions: 3 });

// 2. Add entries
db.add({ id: 'cat', embedding: [0.9, 0.1, 0.0], context: { type: 'animal' } });
db.add({ id: 'dog', embedding: [0.8, 0.2, 0.0], context: { type: 'animal' } });
db.add({ id: 'car', embedding: [0.0, 0.1, 0.9], context: { type: 'vehicle' } });

// 3. Query for similar vectors
const results = db.query({ vector: [1, 0, 0], topK: 2 });

console.log(results);
// [
//   { id: 'cat', score: 0.993..., context: { type: 'animal' } },
//   { id: 'dog', score: 0.970..., context: { type: 'animal' } }
// ]

With Quantization and Filtering

import { create } from '@trovec/core';

const db = await create({
  dimensions: 128,
  quantization: 'INT8',    // compress vectors to int8
  metric: 'euclidean',
});

// Batch insert
db.addMany([
  { id: 1n, embedding: new Array(128).fill(0.5), context: { category: 'A' } },
  { id: 2n, embedding: new Array(128).fill(0.3), context: { category: 'B' } },
  { id: 3n, embedding: new Array(128).fill(0.7), context: { category: 'A' } },
]);

// Query with filter
const results = db.query({
  vector: new Array(128).fill(0.6),
  topK: 5,
  filter: (ctx) => ctx?.category === 'A',
});

Persistence

Trovec provides two built-in storage drivers:

File Storage (recommended for most use cases)

Persists data to disk with automatic Brotli compression. Data survives app restarts.

import { create, createFileDriver } from '@trovec/core';

// Zero-config: defaults to .trovec/ directory with Brotli compression
const driver = createFileDriver();

// Or customize:
// const driver = createFileDriver({
//   directory: './my-data',    // default: '.trovec'
//   compression: true,         // default: true (Brotli)
//   compressionLevel: 1,       // default: 1 (fast), range: 0-11
// });

const db = await create({
  dimensions: 3,
  storageDriver: driver,
  collectionId: 'my-collection',
});
db.add({ id: 'a', embedding: [1, 2, 3] });
// Data auto-persists after a short debounce (default: 500ms)

// When done, close() flushes any pending changes and cleans up
await db.close();

// Later: create() auto-loads existing data from storage
const db2 = await create({
  dimensions: 3,
  storageDriver: driver,
  collectionId: 'my-collection',
});
// db2 already has the previously saved entries — no manual load needed

// Clean up all stored files when no longer needed
await driver.destroy();

The file driver:

  • Auto-creates the directory on first write
  • Uses atomic writes (temp file + rename) to prevent corruption
  • Applies Brotli compression by default (typically 60-80% size reduction)
  • Exposes driver.directory for inspecting the resolved path
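
The temp-file-plus-rename pattern and Brotli compression listed above can be sketched with Node's built-in modules. This is a minimal illustration, not the driver's actual code: `writeAtomic`, `readCompressed`, and the `.tmp` naming scheme are hypothetical names chosen for the example.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
import * as zlib from 'node:zlib';

// Hypothetical helper: compress `data` with Brotli and write it atomically.
function writeAtomic(filePath: string, data: Buffer, quality = 1): void {
  // Create the target directory on first write.
  fs.mkdirSync(path.dirname(filePath), { recursive: true });

  const compressed = zlib.brotliCompressSync(data, {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: quality },
  });

  // Write to a temp file in the same directory, then rename it over the
  // target. A rename within one filesystem swaps the file in a single step,
  // so a crash mid-write leaves the previous file intact.
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, compressed);
  fs.renameSync(tmpPath, filePath);
}

// Hypothetical counterpart: read the file back and decompress it.
function readCompressed(filePath: string): Buffer {
  return zlib.brotliDecompressSync(fs.readFileSync(filePath));
}

// Usage: round-trip a payload through a temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-demo-'));
const target = path.join(demoDir, 'nested', 'collection.bin');
writeAtomic(target, Buffer.from('hello vectors'));
const roundTrip = readCompressed(target).toString('utf8');
console.log(roundTrip); // hello vectors
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is a copy followed by a delete, which loses the single-step guarantee.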

Auto-flush: When a storageDriver is configured, data is automatically persisted after a short debounce (default: 500ms). You can disable this with autoFlush: false or customize the delay with autoFlush: 2000 (ms). See Configuration for details.

Memory Storage (for testing and ephemeral data)

Stores data in a Map — fast, but data is lost when the process exits.

import { create, createMemoryDriver } from '@trovec/core';

const driver = createMemoryDriver();
const db = await create({ dimensions: 3, storageDriver: driver, collectionId: 'test' });

db.add({ id: 'a', embedding: [1, 2, 3] });
// Auto-flushes after debounce; or call close() for immediate flush + cleanup
await db.close();

// Data auto-loads on create()
const db2 = await create({ dimensions: 3, storageDriver: driver, collectionId: 'test' });

Text Embedding (with adapter)

Trovec provides an Embedder interface for text-to-vector conversion. Install an adapter package, then use text-based methods:

import { create } from '@trovec/core';
import { createOpenAIEmbedder } from '@trovec/embedder-openai'; // adapter package

const db = await create({
  dimensions: 1536,
  embedder: createOpenAIEmbedder({ apiKey: process.env.OPENAI_API_KEY }),
});

// Add entries using text — embedding happens automatically
await db.addWithText({ id: 'doc1', text: 'The cat sat on the mat', context: { source: 'book' } });
await db.addWithText({ id: 'doc2', text: 'Dogs love to play fetch' });

// Query using text
const results = await db.queryByText({ text: 'animals sitting', topK: 5 });

No built-in embedder is included — this keeps Trovec zero-dependency. Available adapters:

| Adapter | Dimensions | Notes |
|---------|-----------|-------|
| @trovec/embedder-local | 64 | Trigram hash, zero deps, offline; for testing/demos |
| @trovec/embedder-ollama | 768 | Local Ollama server, no API key; good semantic quality |
| @trovec/embedder-openai | 1536 | OpenAI API; best semantic quality |

See Writing an Embedder Adapter below for how to create your own.

API Reference

create() returns a Trovec object with bound methods. All examples below use the fluent style. A functional API is also available for tree-shaking and backward compatibility (see Functional API).

Lifecycle

| Method | Signature | Description |
|--------|-----------|-------------|
| create | (config: TrovecConfig) => Promise<Trovec> | Create a new instance (auto-loads from storage) |
| db.flush() | () => Promise<void> | Persist all data to storage immediately |
| db.close() | () => Promise<void> | Flush pending changes and disable auto-flush |
| db.stats() | () => TrovecStats | Get instance statistics |

Collection Operations

| Method | Signature | Description |
|--------|-----------|-------------|
| db.add(entry) | (entry: Entry) => void | Insert or replace an entry |
| db.addMany(entries) | (entries: Entry[]) => void | Atomic batch insert (all-or-nothing) |
| db.delete(id) | (id: EntryId) => boolean | Remove an entry, returns true if it existed |
| db.get(id) | (id: EntryId) => Entry \| undefined | Retrieve an entry by ID |

Query

| Method | Signature | Description |
|--------|-----------|-------------|
| db.query(params) | (params: QueryParams) => QueryResult[] | Similarity search |

QueryParams:

  • vector: number[] — the query vector
  • topK?: number — max results to return (default: 10)
  • filter?: (context) => boolean — pre-scoring filter function

Embedder (text-based operations)

| Method | Signature | Description |
|--------|-----------|-------------|
| db.embed(input) | (input: string) => Promise<EmbedResult> | Embed a single string |
| db.embedMany(input) | (input: string[]) => Promise<EmbedResult[]> | Embed multiple strings |
| db.addWithText(entry) | (entry: TextEntry) => Promise<void> | Embed text and add entry |
| db.addManyWithText(entries) | (entries: TextEntry[]) => Promise<void> | Batch embed and add entries |
| db.queryByText(params) | (params: TextQueryParams) => Promise<QueryResult[]> | Embed query text and search |

All embedder methods throw TrovecError if no embedder is configured.

Serialization

| Method | Signature | Description |
|--------|-----------|-------------|
| db.serialize() | () => Buffer | Serialize all entries to a binary buffer |
| db.deserialize(buffer) | (buffer: Buffer) => void | Restore entries from a binary buffer |

Functional API

Every fluent method is also available as a standalone function that takes the instance as the first argument. This is useful for tree-shaking or when you prefer a functional style:

import { create, add, query, close } from '@trovec/core';

const db = await create({ dimensions: 3 });
add(db, { id: 'a', embedding: [1, 2, 3] });
const results = query(db, { vector: [1, 2, 3], topK: 1 });
await close(db);

Trovec objects are fully compatible with the standalone functions, so you can mix and match both styles on the same instance.

Configuration

interface TrovecConfig {
  dimensions: number;                  // required: vector dimensionality
  quantization?: 'F32' | 'INT8' | 'BIT';  // default: 'F32'
  metric?: 'cosine' | 'euclidean' | 'dot' | 'hamming'; // default: 'cosine'
  storageDriver?: StorageDriver;       // default: no-op (in-memory only)
  embedder?: Embedder;                 // default: none (install an adapter)
  collectionId?: string;               // default: auto-generated ('trovec_1', etc.)
  autoFlush?: boolean | number;        // default: true when storageDriver is set
}

Notes:

  • The hamming metric requires BIT quantization.
  • autoFlush: true (default with a storage driver) enables debounced auto-persistence with a 500ms delay. Pass a number for a custom delay in ms, or false to disable (manual flush() only).
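
The two notes above amount to a small amount of config logic, which can be sketched as follows. The helper names (`assertMetricCompatible`, `resolveAutoFlushDelay`) are hypothetical, the plain `Error` stands in for whatever error class the library actually throws, and treating a missing storage driver as "auto-flush off" is an assumption based on the documented defaults.

```typescript
const DEFAULT_FLUSH_DELAY_MS = 500;

// Hypothetical check for the rule "hamming requires BIT quantization".
function assertMetricCompatible(metric: string, quantization: string): void {
  if (metric === 'hamming' && quantization !== 'BIT') {
    throw new Error("metric 'hamming' requires quantization 'BIT'");
  }
}

// Hypothetical helper: turn the autoFlush option into a delay in ms,
// or null when auto-flush is disabled.
function resolveAutoFlushDelay(
  autoFlush: boolean | number | undefined,
  hasStorageDriver: boolean,
): number | null {
  if (!hasStorageDriver) return null;                  // assumption: nothing to persist to
  if (autoFlush === false) return null;                // manual flush() only
  if (typeof autoFlush === 'number') return autoFlush; // custom delay in ms
  return DEFAULT_FLUSH_DELAY_MS;                       // true or omitted
}

console.log(resolveAutoFlushDelay(undefined, true)); // 500
console.log(resolveAutoFlushDelay(2000, true));      // 2000
console.log(resolveAutoFlushDelay(false, true));     // null
```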

Architecture

src/
  index.ts                   Public API barrel export
  types.ts                   All type definitions (including Trovec interface)
  errors.ts                  TrovecError, DimensionMismatchError, InvalidConfigError
  validation.ts              Config/embedding validation, ID serialization
  core.ts                    create(), flush(), stats()
  fluent.ts                  wrapInstance() — binds methods to create the Trovec object
  collection.ts              add(), addMany(), delete(), get()
  query.ts                   Brute-force similarity search
  embedder.ts                Text-based convenience functions (embed, addWithText, queryByText)
  serialization.ts           Binary format for persistence
  quantization/
    index.ts                 Codec dispatcher
    f32.ts                   Float64 passthrough
    int8.ts                  Min-max linear mapping to [-128, 127]
    bit.ts                   Sign-threshold bit packing
  similarity/
    index.ts                 Metric dispatcher
    cosine.ts                dot(a,b) / (||a|| * ||b||)
    euclidean.ts             1 / (1 + distance)
    dot.ts                   Raw dot product
    hamming.ts               Matching bits / total bits
  storage/
    index.ts                 StorageDriver re-export
    memory.ts                In-memory Map-backed driver
    file.ts                  File system driver with Brotli compression
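
The formulas noted next to the similarity/ files can be written out directly. The sketch below operates on plain number arrays and packed bytes rather than the library's internal QuantizedVector type, so the signatures are assumptions made for illustration.

```typescript
// Raw dot product.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot(a, b) / (||a|| * ||b||). Returns 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Euclidean distance mapped to a similarity in (0, 1]: 1 / (1 + distance).
function euclidean(a: number[], b: number[]): number {
  let squared = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    squared += diff * diff;
  }
  return 1 / (1 + Math.sqrt(squared));
}

// Hamming similarity on bit-packed vectors: matching bits / total bits.
// Assumes both inputs pad unused trailing bits with zeros.
function hamming(a: Uint8Array, b: Uint8Array, totalBits: number): number {
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x !== 0) {
      x &= x - 1; // clear the lowest set bit
      differing++;
    }
  }
  return (totalBits - differing) / totalBits;
}

// The 'cat' entry from Basic Usage scored against the query [1, 0, 0].
console.log(cosine([0.9, 0.1, 0.0], [1, 0, 0])); // ≈ 0.9939
```

All four functions return higher values for closer vectors, which is what lets a single "sort descending" step serve every metric.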

How It Works

  1. create() validates configuration, resolves the quantization codec and similarity function once, checks the storage driver for existing data (auto-deserializes if found), and returns a Trovec object — the raw instance enriched with bound methods that delegate to the functional implementations (zero logic duplication).

  2. add() and addMany() validate embedding dimensions, quantize the vector through the codec, and store the quantized representation in a Map<string, StoredEntry>. addMany() validates all entries before mutating any state, so a batch is applied all-or-nothing.

  3. query() quantizes the query vector, iterates all entries (brute-force), applies the optional filter, computes similarity scores, sorts descending with deterministic tie-breaking (lower ID first), and returns the top-K results.

  4. get() dequantizes the stored vector back to number[] before returning, so callers always receive float arrays regardless of the quantization mode.

  5. flush() serializes all entries into a binary buffer and writes it through the StorageDriver interface. When auto-flush is enabled, this is called automatically after a debounce delay following mutations. close() flushes any pending changes, removes the beforeExit safety handler, and disables further auto-flush scheduling.
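
Step 3 above (filter, score, sort with tie-breaking, truncate) can be sketched as a standalone function. This is a simplified illustration under stated assumptions: it skips quantization, hard-codes cosine scoring, uses string IDs only, and the `StoredEntry` and `QueryHit` shapes are hypothetical stand-ins for the library's types.

```typescript
type Context = Record<string, unknown> | undefined;

interface StoredEntry { id: string; vector: number[]; context?: Context; }
interface QueryHit { id: string; score: number; context?: Context; }

function cosineScore(a: number[], b: number[]): number {
  let dotProduct = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dotProduct / denom;
}

function bruteForceQuery(
  entries: Map<string, StoredEntry>,
  vector: number[],
  topK = 10,
  filter?: (context: Context) => boolean,
): QueryHit[] {
  const hits: QueryHit[] = [];
  entries.forEach((entry) => {
    // Pre-scoring filter: rejected entries are never scored.
    if (filter && !filter(entry.context)) return;
    hits.push({ id: entry.id, score: cosineScore(vector, entry.vector), context: entry.context });
  });
  // Highest score first; equal scores fall back to the lower ID.
  hits.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return hits.slice(0, topK);
}

// Usage with the entries from Basic Usage.
const store = new Map<string, StoredEntry>();
store.set('cat', { id: 'cat', vector: [0.9, 0.1, 0.0], context: { type: 'animal' } });
store.set('dog', { id: 'dog', vector: [0.8, 0.2, 0.0], context: { type: 'animal' } });
store.set('car', { id: 'car', vector: [0.0, 0.1, 0.9], context: { type: 'vehicle' } });

const hits = bruteForceQuery(store, [1, 0, 0], 2);
console.log(hits.map((h) => h.id)); // [ 'cat', 'dog' ]
```

Sorting the full result list costs O(N log N) per query, which is fine at the scale a brute-force scan targets.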

Internal Precision

All math operations use float64 precision internally (Float64Array). The quantization type (F32, INT8, BIT) controls storage compression, not computation precision.
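
To make the storage-versus-computation distinction concrete, here is a rough sketch of the two lossy codecs as they are described in the Architecture listing (min-max linear mapping for INT8, sign-threshold packing for BIT). The function names, the stored `min`/`max` fields, and the choice of "strictly greater than zero" as the bit threshold are assumptions; the library's actual encoding may differ.

```typescript
interface Int8Quantized { values: Int8Array; min: number; max: number; }

// Map each component linearly from [min, max] onto the int8 range [-128, 127].
function encodeInt8(embedding: number[]): Int8Quantized {
  let min = Infinity;
  let max = -Infinity;
  for (const v of embedding) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min;
  const values = new Int8Array(embedding.length);
  for (let i = 0; i < embedding.length; i++) {
    // A constant vector has range 0; store every component as -128.
    values[i] = range === 0 ? -128 : Math.round(((embedding[i] - min) / range) * 255) - 128;
  }
  return { values, min, max };
}

// Invert the mapping; results come back as ordinary float64 numbers.
function decodeInt8(q: Int8Quantized): number[] {
  const range = q.max - q.min;
  const out: number[] = [];
  for (let i = 0; i < q.values.length; i++) {
    out.push(q.min + ((q.values[i] + 128) / 255) * range);
  }
  return out;
}

// Pack one bit per component: 1 when the value is positive, else 0.
function encodeBits(embedding: number[]): Uint8Array {
  const bytes = new Uint8Array(Math.ceil(embedding.length / 8));
  for (let i = 0; i < embedding.length; i++) {
    if (embedding[i] > 0) bytes[i >> 3] |= 1 << (7 - (i & 7));
  }
  return bytes;
}

const quantized = encodeInt8([0, 0.5, 1]);
console.log(Array.from(quantized.values)); // [ -128, 0, 127 ]
console.log(decodeInt8(quantized));        // close to [0, 0.5, 1], not exact
```

With this mapping the round-trip error per component is at most half a quantization step, that is (max - min) / 510, while storage drops from 8 bytes to 1 byte per component.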

Extensibility

Three extension points are available:

  • Embedder — text-to-vector conversion (see below)
  • QuantizationCodec — implement encode(embedding) => QuantizedVector and decode(quantized) => number[]
  • SimilarityFn — implement (a: QuantizedVector, b: QuantizedVector) => number

Writing an Embedder Adapter

An embedder adapter is any object that implements the Embedder interface:

import type { Embedder, EmbedResult } from '@trovec/core';

export function createMyEmbedder(options: { apiKey: string }): Embedder {
  // Defined as a local function so embedMany() does not rely on `this`,
  // which would be lost if a caller detached the method from the object.
  const embed = async (input: string): Promise<EmbedResult> => {
    // Call your embedding API/model here.
    // callEmbeddingAPI is a placeholder for your own client code.
    const embedding = await callEmbeddingAPI(input, options.apiKey);
    return { embedding };
  };

  return {
    embed,
    async embedMany(inputs: string[]): Promise<EmbedResult[]> {
      // Batch implementation (or loop over embed())
      return Promise.all(inputs.map((input) => embed(input)));
    },
  };
}

Publish as a separate package (e.g., @trovec/embedder-mymodel) to keep Trovec zero-dependency.
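
For a complete adapter that runs offline, the sketch below hashes character trigrams into a fixed-size vector. It is only an illustration of the Embedder shape shown above: the interfaces are redeclared locally so the snippet is self-contained, `trigramVector` and `createTrigramEmbedder` are hypothetical names, and the hashing scheme (FNV-1a, L2 normalization) is an assumption, not the algorithm used by @trovec/embedder-local.

```typescript
interface EmbedResult { embedding: number[]; }
interface Embedder {
  embed(input: string): Promise<EmbedResult>;
  embedMany(inputs: string[]): Promise<EmbedResult[]>;
}

// 32-bit FNV-1a hash of a short string.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Count each trigram in a hashed bucket, then L2-normalize the counts.
function trigramVector(input: string, dimensions: number): number[] {
  const embedding = new Array<number>(dimensions).fill(0);
  const text = input.toLowerCase();
  for (let i = 0; i + 3 <= text.length; i++) {
    embedding[fnv1a(text.slice(i, i + 3)) % dimensions] += 1;
  }
  const norm = Math.sqrt(embedding.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? embedding : embedding.map((v) => v / norm);
}

function createTrigramEmbedder(dimensions = 64): Embedder {
  return {
    embed: async (input) => ({ embedding: trigramVector(input, dimensions) }),
    embedMany: async (inputs) =>
      inputs.map((input) => ({ embedding: trigramVector(input, dimensions) })),
  };
}

const toyEmbedder = createTrigramEmbedder(64);
toyEmbedder.embed('The cat sat on the mat').then((result) => {
  console.log(result.embedding.length); // 64
});
```

An embedder like this captures surface spelling overlap rather than meaning, so it is suited to tests and demos only. The `dimensions` value passed to create() must match the adapter's output length.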

How Persistence Works

When using a storage driver, all data is loaded into memory for querying:

  1. On create(), existing data is automatically read from the storage driver and deserialized into an in-memory Map. Use a stable collectionId to ensure the same data is loaded across restarts.
  2. Queries run entirely in-memory via brute-force scan — the storage driver is never touched during search.
  3. Auto-flush — after each mutation (add, addMany, delete), a debounced timer schedules a flush(). Multiple rapid mutations are batched into a single write. A beforeExit handler provides a safety net: if the process exits gracefully without an explicit close(), pending changes are still persisted.
  4. On close(), any pending changes are flushed immediately, the debounce timer is cleared, and the beforeExit handler is removed. Read operations (get, query, stats) continue to work after close().
  5. On flush(), all entries are serialized and written back to storage. Manual flush() calls are still supported alongside auto-flush.

This design keeps queries fast (sub-millisecond for thousands of entries) but means the full dataset must fit in memory.
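
The debounced auto-flush from step 3 and the close() behavior from step 4 can be sketched like this. It is a simplified model, not the library's implementation: `createDebouncedFlusher` and the injectable `Scheduler` are hypothetical, `flush` is synchronous here although the real flush() returns a Promise, and the beforeExit safety handler is left out.

```typescript
type Cancel = () => void;
type Scheduler = (fn: () => void, delayMs: number) => Cancel;

// Default scheduler backed by setTimeout.
const timeoutScheduler: Scheduler = (fn, delayMs) => {
  const handle = setTimeout(fn, delayMs);
  return () => clearTimeout(handle);
};

function createDebouncedFlusher(
  flush: () => void,
  delayMs = 500,
  schedule: Scheduler = timeoutScheduler,
) {
  let cancel: Cancel | null = null;
  let dirty = false;
  return {
    // Call after every mutation. Restarting the timer each time means a
    // burst of mutations collapses into a single flush.
    markDirty(): void {
      dirty = true;
      if (cancel) cancel();
      cancel = schedule(() => {
        cancel = null;
        dirty = false;
        flush();
      }, delayMs);
    },
    // Stop the timer and flush right away if anything is pending.
    close(): void {
      if (cancel) {
        cancel();
        cancel = null;
      }
      if (dirty) {
        dirty = false;
        flush();
      }
    },
  };
}

// Usage with a manual scheduler so the demo does not depend on real time.
const timer: { pending: (() => void) | null } = { pending: null };
const manualScheduler: Scheduler = (fn) => {
  timer.pending = fn;
  return () => { timer.pending = null; };
};
function fireTimer(): void {
  const fn = timer.pending;
  timer.pending = null;
  if (fn) fn();
}

let flushCount = 0;
const flusher = createDebouncedFlusher(() => { flushCount++; }, 500, manualScheduler);
flusher.markDirty();
flusher.markDirty();
flusher.markDirty(); // three rapid mutations
fireTimer();         // the debounce timer fires once
console.log(flushCount); // 1
```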

Future Improvement Considerations

For larger datasets that exceed available memory, several strategies could be explored:

  • Streaming query — read and score entries in chunks directly from the binary buffer, keeping only the top-K results in a min-heap. Memory usage becomes O(K) instead of O(N).
  • Partitioned storage — split collections into fixed-size shards (e.g., 10K entries each). Query loads one shard at a time, merging top-K across shards. Memory stays bounded to a single shard.
  • Memory-mapped files — use mmap to map .trovec files into virtual address space. The OS pages data in/out on demand, giving near-memory speed for hot data without loading everything.
  • Approximate Nearest Neighbor (ANN) indexing — replace brute-force with structures like HNSW or IVF that only visit a subset of vectors per query. Index metadata stays in memory while vectors can remain on disk.
  • Hot/cold tiering — keep recently accessed entries in an LRU cache, everything else on disk. Queries hit the cache first, fall back to disk for misses.
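
As an illustration of the streaming-query idea, the sketch below keeps only the K best scores in a min-heap while consuming scored entries one at a time. It is a generic top-K routine under simplifying assumptions (entries arrive already scored, ties are not broken by ID), and none of these names exist in the library.

```typescript
interface Scored { id: string; score: number; }

// Binary min-heap ordered by score: the root is the weakest kept result.
class MinHeap {
  private items: Scored[] = [];

  get size(): number { return this.items.length; }
  peek(): Scored | undefined { return this.items[0]; }

  push(item: Scored): void {
    this.items.push(item);
    let i = this.items.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.items[parent].score <= this.items[i].score) break;
      [this.items[parent], this.items[i]] = [this.items[i], this.items[parent]];
      i = parent;
    }
  }

  // Overwrite the root and sift it down to restore the heap property.
  replaceRoot(item: Scored): void {
    this.items[0] = item;
    const n = this.items.length;
    let i = 0;
    while (true) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < n && this.items[left].score < this.items[smallest].score) smallest = left;
      if (right < n && this.items[right].score < this.items[smallest].score) smallest = right;
      if (smallest === i) break;
      [this.items[smallest], this.items[i]] = [this.items[i], this.items[smallest]];
      i = smallest;
    }
  }

  toSortedDescending(): Scored[] {
    return this.items.slice().sort((a, b) => b.score - a.score);
  }
}

// Consume scored entries one by one, holding at most k of them in memory.
function topKStreaming(stream: Iterable<Scored>, k: number): Scored[] {
  const heap = new MinHeap();
  for (const item of stream) {
    if (heap.size < k) {
      heap.push(item);
    } else if (k > 0 && item.score > heap.peek()!.score) {
      heap.replaceRoot(item);
    }
  }
  return heap.toSortedDescending();
}

const best = topKStreaming(
  [
    { id: 'a', score: 0.1 },
    { id: 'b', score: 0.9 },
    { id: 'c', score: 0.5 },
    { id: 'd', score: 0.7 },
    { id: 'e', score: 0.3 },
  ],
  2,
);
console.log(best.map((s) => s.id)); // [ 'b', 'd' ]
```

Each incoming entry costs O(log K), so a full pass is O(N log K) time with O(K) memory, compared with sorting all N scores at once.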

Development

npm install          # install dev dependencies
npm test             # run tests (vitest)
npm run test:watch   # run tests in watch mode
npm run build        # compile to dist/esm + dist/cjs
npm run clean        # remove dist/

License

MIT