@trovec/core
v2.0.0
A lightweight, zero-dependency vector database library for Node.js. Store, query, and persist vector embeddings with support for multiple quantization types and similarity metrics.
Features
- Zero runtime dependencies: only Node.js required
- Multiple quantization modes: F32 (full precision), INT8 (compressed), BIT (binary)
- Four similarity metrics: Cosine, Euclidean, Dot Product, Hamming
- Fluent API: `db.add()`, `db.query()`, `db.queryByText()`, clean and discoverable
- Functional API: stateless functions for tree-shaking and backward compatibility
- Dual ESM/CJS: works with both `import` and `require`
- TypeScript-first: full type definitions included
- Mixed ID types: supports both `string` and `bigint` entry IDs
- Pluggable Embedder: bring your own embedding adapter for text-to-vector conversion
Quick Start
Installation
```bash
npm install @trovec/core
```

Basic Usage

```ts
import { create } from '@trovec/core';

// 1. Create an instance
const db = await create({ dimensions: 3 });

// 2. Add entries
db.add({ id: 'cat', embedding: [0.9, 0.1, 0.0], context: { type: 'animal' } });
db.add({ id: 'dog', embedding: [0.8, 0.2, 0.0], context: { type: 'animal' } });
db.add({ id: 'car', embedding: [0.0, 0.1, 0.9], context: { type: 'vehicle' } });

// 3. Query for similar vectors
const results = db.query({ vector: [1, 0, 0], topK: 2 });
console.log(results);
// [
//   { id: 'cat', score: 0.993..., context: { type: 'animal' } },
//   { id: 'dog', score: 0.970..., context: { type: 'animal' } }
// ]
```

With Quantization and Filtering
```ts
import { create } from '@trovec/core';

const db = await create({
  dimensions: 128,
  quantization: 'INT8', // compress vectors to int8
  metric: 'euclidean',
});

// Batch insert
db.addMany([
  { id: 1n, embedding: new Array(128).fill(0.5), context: { category: 'A' } },
  { id: 2n, embedding: new Array(128).fill(0.3), context: { category: 'B' } },
  { id: 3n, embedding: new Array(128).fill(0.7), context: { category: 'A' } },
]);

// Query with filter
const results = db.query({
  vector: new Array(128).fill(0.6),
  topK: 5,
  filter: (ctx) => ctx?.category === 'A',
});
```

Persistence
Trovec provides two built-in storage drivers:
File Storage (recommended for most use cases)
Persists data to disk with automatic Brotli compression. Data survives app restarts.
```ts
import { create, createFileDriver } from '@trovec/core';

// Zero-config: defaults to .trovec/ directory with Brotli compression
const driver = createFileDriver();
// Or customize:
// const driver = createFileDriver({
//   directory: './my-data',   // default: '.trovec'
//   compression: true,        // default: true (Brotli)
//   compressionLevel: 1,      // default: 1 (fast), range: 0-11
// });

const db = await create({
  dimensions: 3,
  storageDriver: driver,
  collectionId: 'my-collection',
});
db.add({ id: 'a', embedding: [1, 2, 3] });
// Data auto-persists after a short debounce (default: 500ms)
// When done, close() flushes any pending changes and cleans up
await db.close();

// Later: create() auto-loads existing data from storage
const db2 = await create({
  dimensions: 3,
  storageDriver: driver,
  collectionId: 'my-collection',
});
// db2 already has the previously saved entries; no manual load needed

// Clean up all stored files when no longer needed
await driver.destroy();
```

The file driver:
- Auto-creates the directory on first write
- Uses atomic writes (temp file + rename) to prevent corruption
- Applies Brotli compression by default (typically 60-80% size reduction)
- Exposes `driver.directory` for inspecting the resolved path
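The atomic-write and compression behavior in the list above can be sketched with Node's built-in `node:fs` and `node:zlib` modules. This is a minimal illustration, not Trovec's file driver: the helper names `writeAtomic` and `readCompressed` and the temp-file naming are hypothetical, and the quality value of 1 simply mirrors the documented `compressionLevel` default.

```ts
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
import * as zlib from 'node:zlib';

// Hypothetical helper: Brotli-compress `data` and write it atomically.
// Writing to a temp file and then renaming means readers see either the old
// file or the complete new file, never a half-written one.
function writeAtomic(filePath: string, data: Buffer, quality = 1): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true }); // create the directory on first write
  const compressed = zlib.brotliCompressSync(data, {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: quality }, // 0 (fastest) to 11 (smallest)
  });
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, compressed);
  fs.renameSync(tmpPath, filePath); // replaces the target in one step on the same filesystem
}

// Hypothetical counterpart: read and decompress, or return undefined if nothing is stored yet.
function readCompressed(filePath: string): Buffer | undefined {
  if (!fs.existsSync(filePath)) return undefined;
  return zlib.brotliDecompressSync(fs.readFileSync(filePath));
}

// Usage: round-trip a buffer through a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'trovec-sketch-'));
const file = path.join(dir, 'nested', 'my-collection.bin');
const payload = Buffer.from('x'.repeat(10_000));
writeAtomic(file, payload);
console.log(readCompressed(file)?.equals(payload), fs.statSync(file).size < payload.length);
```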
Auto-flush: When a `storageDriver` is configured, data is automatically persisted after a short debounce (default: 500ms). You can disable this with `autoFlush: false` or customize the delay with `autoFlush: 2000` (ms). See Configuration for details.
Memory Storage (for testing and ephemeral data)
Stores data in a Map — fast, but data is lost when the process exits.
```ts
import { create, createMemoryDriver } from '@trovec/core';

const driver = createMemoryDriver();
const db = await create({ dimensions: 3, storageDriver: driver, collectionId: 'test' });
db.add({ id: 'a', embedding: [1, 2, 3] });
// Auto-flushes after debounce; or call close() for immediate flush + cleanup
await db.close();

// Data auto-loads on create()
const db2 = await create({ dimensions: 3, storageDriver: driver, collectionId: 'test' });
```

Text Embedding (with adapter)
Trovec provides an Embedder interface for text-to-vector conversion. Install an adapter package, then use text-based methods:
```ts
import { create } from '@trovec/core';
import { createOpenAIEmbedder } from '@trovec/embedder-openai'; // adapter package

const db = await create({
  dimensions: 1536,
  embedder: createOpenAIEmbedder({ apiKey: process.env.OPENAI_API_KEY }),
});

// Add entries using text; embedding happens automatically
await db.addWithText({ id: 'doc1', text: 'The cat sat on the mat', context: { source: 'book' } });
await db.addWithText({ id: 'doc2', text: 'Dogs love to play fetch' });

// Query using text
const results = await db.queryByText({ text: 'animals sitting', topK: 5 });
```

No built-in embedder is included, which keeps Trovec zero-dependency. Available adapters:
| Adapter | Dimensions | Notes |
|---------|-----------|-------|
| `@trovec/embedder-local` | 64 | Trigram hash, zero deps, offline; for testing/demos |
| `@trovec/embedder-ollama` | 768 | Local Ollama server, no API key; good semantic quality |
| `@trovec/embedder-openai` | 1536 | OpenAI API; best semantic quality |

See Writing an Embedder Adapter below for how to create your own.
API Reference
create() returns a Trovec object with bound methods. All examples below use the fluent style. A functional API is also available for tree-shaking and backward compatibility (see Functional API).
Lifecycle
| Method | Signature | Description |
|--------|-----------|-------------|
| create | (config: TrovecConfig) => Promise<Trovec> | Create a new instance (auto-loads from storage) |
| db.flush() | () => Promise<void> | Persist all data to storage immediately |
| db.close() | () => Promise<void> | Flush pending changes and disable auto-flush |
| db.stats() | () => TrovecStats | Get instance statistics |
Collection Operations
| Method | Signature | Description |
|--------|-----------|-------------|
| db.add(entry) | (entry: Entry) => void | Insert or replace an entry |
| db.addMany(entries) | (entries: Entry[]) => void | Atomic batch insert (all-or-nothing) |
| db.delete(id) | (id: EntryId) => boolean | Remove an entry, returns true if it existed |
| db.get(id) | (id: EntryId) => Entry \| undefined | Retrieve an entry by ID |
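The table does not say how `string` and `bigint` IDs share one collection, while How It Works notes that entries live in a `Map<string, StoredEntry>`. The sketch below shows one way to get both behaviors: a hypothetical `toKey()` prefix scheme keeps `'1'` and `1n` distinct, and `addMany` validates every entry before touching the map, which gives all-or-nothing semantics. The key format, class name, and error message are assumptions, not Trovec's internals.

```ts
type EntryId = string | bigint;

interface Entry {
  id: EntryId;
  embedding: number[];
  context?: Record<string, unknown>;
}

// Hypothetical key scheme: prefix by type so the string '1' and the bigint 1n never collide.
function toKey(id: EntryId): string {
  return typeof id === 'bigint' ? `b:${id.toString()}` : `s:${id}`;
}

class SketchCollection {
  private readonly entries = new Map<string, Entry>();
  private readonly dimensions: number;

  constructor(dimensions: number) {
    this.dimensions = dimensions;
  }

  private validate(entry: Entry): void {
    if (entry.embedding.length !== this.dimensions) {
      throw new Error(`Expected ${this.dimensions} dimensions, got ${entry.embedding.length}`);
    }
  }

  // Insert or replace a single entry.
  add(entry: Entry): void {
    this.validate(entry);
    this.entries.set(toKey(entry.id), entry);
  }

  // All-or-nothing: validate the whole batch first, mutate only if every entry passed.
  addMany(batch: Entry[]): void {
    batch.forEach((entry) => this.validate(entry));
    for (const entry of batch) this.entries.set(toKey(entry.id), entry);
  }

  get(id: EntryId): Entry | undefined {
    return this.entries.get(toKey(id));
  }

  delete(id: EntryId): boolean {
    return this.entries.delete(toKey(id));
  }

  get size(): number {
    return this.entries.size;
  }
}

const col = new SketchCollection(2);
col.add({ id: '1', embedding: [1, 0] });
col.add({ id: 1n, embedding: [0, 1] }); // distinct from '1'
try {
  col.addMany([
    { id: 'ok', embedding: [1, 1] },
    { id: 'bad', embedding: [1, 1, 1] }, // wrong dimension: the whole batch is rejected
  ]);
} catch {
  // expected
}
console.log(col.size, col.get('ok')); // 2 undefined
```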
Query
| Method | Signature | Description |
|--------|-----------|-------------|
| db.query(params) | (params: QueryParams) => QueryResult[] | Similarity search |
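QueryParams:
- `vector: number[]`: the query vector
- `topK?: number`: maximum number of results to return (default: 10)
- `filter?: (context) => boolean`: pre-scoring filter; entries for which it returns `false` are skipped before any similarity is computed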
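To see where the Quick Start scores come from, the three float metrics can be written out from the formulas listed under Architecture (cosine: dot(a,b) / (||a|| * ||b||), euclidean: 1 / (1 + distance), dot: raw dot product). This is a plain sketch over `number[]` that assumes F32 quantization, so no rounding is involved; the function names and the zero-vector guard are illustrative choices, not the library's code.

```ts
// Plain-array versions of the documented metrics (illustrative only).
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom; // zero-vector guard is an assumption
}

// Maps a distance in [0, inf) to a similarity in (0, 1]; identical vectors score 1.
function euclidean(a: number[], b: number[]): number {
  let sq = 0;
  for (let i = 0; i < a.length; i++) sq += (a[i] - b[i]) ** 2;
  return 1 / (1 + Math.sqrt(sq));
}

const q = [1, 0, 0];
console.log(cosine(q, [0.9, 0.1, 0.0]).toFixed(4)); // 0.9939 (cat)
console.log(cosine(q, [0.8, 0.2, 0.0]).toFixed(4)); // 0.9701 (dog)
console.log(cosine(q, [0.0, 0.1, 0.9]).toFixed(4)); // 0.0000 (car)
console.log(euclidean(q, q), dot(q, [0.9, 0.1, 0.0])); // 1 0.9
```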
Embedder (text-based operations)
| Method | Signature | Description |
|--------|-----------|-------------|
| db.embed(input) | (input: string) => Promise<EmbedResult> | Embed a single string |
| db.embedMany(input) | (input: string[]) => Promise<EmbedResult[]> | Embed multiple strings |
| db.addWithText(entry) | (entry: TextEntry) => Promise<void> | Embed text and add entry |
| db.addManyWithText(entries) | (entries: TextEntry[]) => Promise<void> | Batch embed and add entries |
| db.queryByText(params) | (params: TextQueryParams) => Promise<QueryResult[]> | Embed query text and search |
All embedder methods throw TrovecError if no embedder is configured.
Serialization
| Method | Signature | Description |
|--------|-----------|-------------|
| db.serialize() | () => Buffer | Serialize all entries to a binary buffer |
| db.deserialize(buffer) | (buffer: Buffer) => void | Restore entries from a binary buffer |
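The binary format behind `serialize()` is not documented here. As a hypothetical illustration of the idea only (not Trovec's actual layout), entries can be written as length-prefixed records: a UTF-8 ID, a vector length, then the values as little-endian Float64. The layout and the `serializeRows`/`deserializeRows` names are assumptions.

```ts
interface Row {
  id: string;
  vector: number[];
}

// Hypothetical record layout: [u32 idByteLength][id utf8][u32 dims][dims x f64 LE]
function serializeRows(rows: Row[]): Buffer {
  const parts: Buffer[] = [];
  for (const row of rows) {
    const idBytes = Buffer.from(row.id, 'utf8');
    const header = Buffer.alloc(4);
    header.writeUInt32LE(idBytes.length, 0);
    const body = Buffer.alloc(4 + row.vector.length * 8);
    body.writeUInt32LE(row.vector.length, 0);
    row.vector.forEach((v, i) => body.writeDoubleLE(v, 4 + i * 8));
    parts.push(header, idBytes, body);
  }
  return Buffer.concat(parts);
}

function deserializeRows(buf: Buffer): Row[] {
  const rows: Row[] = [];
  let offset = 0;
  while (offset < buf.length) {
    const idLen = buf.readUInt32LE(offset);
    offset += 4;
    const id = buf.toString('utf8', offset, offset + idLen);
    offset += idLen;
    const dims = buf.readUInt32LE(offset);
    offset += 4;
    const vector: number[] = [];
    for (let i = 0; i < dims; i++) {
      vector.push(buf.readDoubleLE(offset));
      offset += 8;
    }
    rows.push({ id, vector });
  }
  return rows;
}

const buf = serializeRows([
  { id: 'cat', vector: [0.9, 0.1, 0] },
  { id: 'dög', vector: [0.8, 0.2, 0] }, // non-ASCII IDs work because lengths are in bytes
]);
console.log(deserializeRows(buf));
```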
Functional API
Every fluent method is also available as a standalone function that takes the instance as the first argument. This is useful for tree-shaking or when you prefer a functional style:
```ts
import { create, add, query, close } from '@trovec/core';

const db = await create({ dimensions: 3 });
add(db, { id: 'a', embedding: [1, 2, 3] });
const results = query(db, { vector: [1, 2, 3], topK: 1 });
await close(db);
```

Trovec objects are fully compatible with the standalone functions, so you can mix and match both styles.
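How It Works describes the fluent object as the raw instance plus bound methods that delegate to the functional implementations. A minimal sketch of that pattern follows, with a hypothetical two-function API (`add`, `count`) standing in for the real one; the helper name mirrors `wrapInstance()` from `fluent.ts`, but its body here is an assumption.

```ts
// Functional core: stateless functions that take the instance as the first argument.
interface Instance {
  items: Map<string, number[]>;
}

function add(db: Instance, id: string, vector: number[]): void {
  db.items.set(id, vector);
}

function count(db: Instance): number {
  return db.items.size;
}

// Fluent wrapper: the same object, enriched with methods that delegate to the
// functional versions, so there is a single implementation behind both styles.
type Fluent = Instance & {
  add: (id: string, vector: number[]) => void;
  count: () => number;
};

function wrapInstance(raw: Instance): Fluent {
  return Object.assign(raw, {
    add: (id: string, vector: number[]) => add(raw, id, vector),
    count: () => count(raw),
  });
}

const db = wrapInstance({ items: new Map() });
db.add('a', [1, 2, 3]); // fluent style
add(db, 'b', [4, 5, 6]); // functional style on the same object
console.log(db.count(), count(db)); // 2 2
```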
Configuration
```ts
interface TrovecConfig {
  dimensions: number;                                    // required: vector dimensionality
  quantization?: 'F32' | 'INT8' | 'BIT';                 // default: 'F32'
  metric?: 'cosine' | 'euclidean' | 'dot' | 'hamming';   // default: 'cosine'
  storageDriver?: StorageDriver;                         // default: no-op (in-memory only)
  embedder?: Embedder;                                   // default: none (install an adapter)
  collectionId?: string;                                 // default: auto-generated ('trovec_1', etc.)
  autoFlush?: boolean | number;                          // default: true when storageDriver is set
}
```

Notes:
- The `hamming` metric requires `BIT` quantization.
- `autoFlush: true` (the default when a storage driver is set) enables debounced auto-persistence with a 500ms delay. Pass a `number` for a custom delay in ms, or `false` to disable it (manual `flush()` only).
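The two rules in these notes can be expressed as a small validation step. This sketch is hypothetical: `resolveConfig`, the error messages, the positive-integer check on `dimensions`, and the resolved `flushDelayMs` field (with `null` meaning auto-flush is off) are assumptions; only the defaults and the hamming/BIT rule come from the configuration above.

```ts
type Quantization = 'F32' | 'INT8' | 'BIT';
type Metric = 'cosine' | 'euclidean' | 'dot' | 'hamming';

interface ConfigInput {
  dimensions: number;
  quantization?: Quantization;
  metric?: Metric;
  hasStorageDriver?: boolean; // stands in for `storageDriver` being set
  autoFlush?: boolean | number;
}

interface ResolvedConfig {
  dimensions: number;
  quantization: Quantization;
  metric: Metric;
  flushDelayMs: number | null; // null: no auto-flush, manual flush() only
}

function resolveConfig(input: ConfigInput): ResolvedConfig {
  if (!Number.isInteger(input.dimensions) || input.dimensions <= 0) {
    throw new Error('dimensions must be a positive integer'); // assumed rule
  }
  const quantization = input.quantization ?? 'F32';
  const metric = input.metric ?? 'cosine';
  if (metric === 'hamming' && quantization !== 'BIT') {
    throw new Error("the 'hamming' metric requires 'BIT' quantization");
  }
  // autoFlush defaults to true only when a storage driver is configured.
  const autoFlush = input.autoFlush ?? Boolean(input.hasStorageDriver);
  let flushDelayMs: number | null = null;
  if (input.hasStorageDriver && autoFlush !== false) {
    flushDelayMs = autoFlush === true ? 500 : autoFlush; // true -> 500ms default, number -> custom delay
  }
  return { dimensions: input.dimensions, quantization, metric, flushDelayMs };
}

console.log(resolveConfig({ dimensions: 3, hasStorageDriver: true }).flushDelayMs); // 500
console.log(resolveConfig({ dimensions: 3, hasStorageDriver: true, autoFlush: 2000 }).flushDelayMs); // 2000
console.log(resolveConfig({ dimensions: 3 }).flushDelayMs); // null
```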
Architecture
```text
src/
  index.ts            Public API barrel export
  types.ts            All type definitions (including Trovec interface)
  errors.ts           TrovecError, DimensionMismatchError, InvalidConfigError
  validation.ts       Config/embedding validation, ID serialization
  core.ts             create(), flush(), stats()
  fluent.ts           wrapInstance(): binds methods to create the Trovec object
  collection.ts       add(), addMany(), delete(), get()
  query.ts            Brute-force similarity search
  embedder.ts         Text-based convenience functions (embed, addWithText, queryByText)
  serialization.ts    Binary format for persistence
  quantization/
    index.ts          Codec dispatcher
    f32.ts            Float64 passthrough
    int8.ts           Min-max linear mapping to [-128, 127]
    bit.ts            Sign-threshold bit packing
  similarity/
    index.ts          Metric dispatcher
    cosine.ts         dot(a,b) / (||a|| * ||b||)
    euclidean.ts      1 / (1 + distance)
    dot.ts            Raw dot product
    hamming.ts        Matching bits / total bits
  storage/
    index.ts          StorageDriver re-export
    memory.ts         In-memory Map-backed driver
    file.ts           File system driver with Brotli compression
```

How It Works
- `create()` validates configuration, resolves the quantization codec and similarity function once, checks the storage driver for existing data (auto-deserializing it if found), and returns a `Trovec` object: the raw instance enriched with bound methods that delegate to the functional implementations (zero logic duplication).
- `add()` / `addMany()` validate embedding dimensions, quantize the vector through the codec, and store the quantized representation in a `Map<string, StoredEntry>`. `addMany` validates all entries before mutating any state (atomic semantics).
- `query()` quantizes the query vector, iterates over all entries (brute force), applies the optional filter, computes similarity scores, sorts them in descending order with deterministic tie-breaking (lower ID first), and returns the top-K results.
- `get()` dequantizes the stored vector back to `number[]` before returning, so callers always receive float arrays regardless of the quantization mode.
- `flush()` serializes all entries into a binary buffer and writes it through the `StorageDriver` interface. When auto-flush is enabled, it is called automatically after a debounce delay following mutations.
- `close()` flushes any pending changes, removes the `beforeExit` safety handler, and disables further auto-flush scheduling.
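The `query()` steps above map onto a short brute-force loop. This sketch works on plain float vectors with cosine similarity and string IDs only; it omits quantization and assumes "lower ID first" means ordinary string comparison. All names are illustrative.

```ts
interface StoredEntry {
  id: string;
  vector: number[];
  context?: Record<string, unknown>;
}

interface QueryResult {
  id: string;
  score: number;
  context?: Record<string, unknown>;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function bruteForceQuery(
  entries: Map<string, StoredEntry>,
  vector: number[],
  topK = 10,
  filter?: (context: Record<string, unknown> | undefined) => boolean,
): QueryResult[] {
  const scored: QueryResult[] = [];
  for (const entry of entries.values()) {
    if (filter && !filter(entry.context)) continue; // filter runs before scoring
    scored.push({ id: entry.id, score: cosine(vector, entry.vector), context: entry.context });
  }
  // Highest score first; equal scores fall back to the lower ID, so results are deterministic.
  scored.sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return scored.slice(0, topK);
}

const entries = new Map<string, StoredEntry>([
  ['b', { id: 'b', vector: [1, 0], context: { type: 'x' } }],
  ['a', { id: 'a', vector: [2, 0], context: { type: 'x' } }], // same direction as 'b': a tie
  ['c', { id: 'c', vector: [0, 1], context: { type: 'y' } }],
]);
console.log(bruteForceQuery(entries, [1, 0], 2).map((r) => r.id)); // [ 'a', 'b' ]
console.log(bruteForceQuery(entries, [1, 0], 5, (ctx) => ctx?.type === 'y').map((r) => r.id)); // [ 'c' ]
```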
Internal Precision
All math operations use float64 precision internally (Float64Array). The quantization type (F32, INT8, BIT) controls storage compression, not computation precision.
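Architecture describes the INT8 codec as a min-max linear mapping to [-128, 127]. The sketch below illustrates that idea, assuming each vector's min and max are stored alongside its bytes (the `Int8Quantized` shape is a guess) and that decoding back to float64 happens on read. Storage drops to one byte per dimension; the cost is a rounding error of at most half a quantization step per component.

```ts
interface Int8Quantized {
  data: Int8Array;
  min: number;
  max: number;
}

// Map each value linearly from [min, max] onto the 256 integer levels [-128, 127].
function encodeInt8(embedding: number[]): Int8Quantized {
  let min = Infinity;
  let max = -Infinity;
  for (const v of embedding) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min;
  const data = new Int8Array(embedding.length);
  for (let i = 0; i < embedding.length; i++) {
    // A constant vector has zero range; map it to -128 (assumption).
    const t = range === 0 ? 0 : (embedding[i] - min) / range; // t in [0, 1]
    data[i] = Math.round(t * 255) - 128;
  }
  return { data, min, max };
}

// Invert the mapping in float64; per-component error is at most range / 510 (half a step).
function decodeInt8(q: Int8Quantized): number[] {
  const range = q.max - q.min;
  return Array.from(q.data, (b) => q.min + ((b + 128) / 255) * range);
}

const original = [-1, -0.5, 0, 0.25, 1];
const q = encodeInt8(original);
const restored = decodeInt8(q);
console.log(Array.from(q.data)); // [ -128, -64, 0, 31, 127 ]
console.log(restored);
```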
Extensibility
Three extension points are available:
- `Embedder`: text-to-vector conversion (see below)
- `QuantizationCodec`: implement `encode(embedding) => QuantizedVector` and `decode(quantized) => number[]`
- `SimilarityFn`: implement `(a: QuantizedVector, b: QuantizedVector) => number`
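As an example of the codec and similarity shapes, here is a sketch of sign-threshold bit packing and the "matching bits / total bits" Hamming score from Architecture. The `BitVector` type is hypothetical (the real `QuantizedVector` type is not shown in this README), and treating values greater than 0 as 1 bits is an assumption about the threshold.

```ts
// Hypothetical quantized form: packed bits plus the original dimension count.
interface BitVector {
  bits: Uint8Array;
  dimensions: number;
}

// encode: one bit per dimension, set when the value is positive (sign threshold).
function encodeBit(embedding: number[]): BitVector {
  const bits = new Uint8Array(Math.ceil(embedding.length / 8));
  embedding.forEach((v, i) => {
    if (v > 0) bits[i >> 3] |= 1 << (i & 7);
  });
  return { bits, dimensions: embedding.length };
}

// decode: lossy, only the sign survives, so return +1 / -1 per dimension.
function decodeBit(q: BitVector): number[] {
  const out: number[] = [];
  for (let i = 0; i < q.dimensions; i++) out.push(q.bits[i >> 3] & (1 << (i & 7)) ? 1 : -1);
  return out;
}

// SimilarityFn: fraction of matching bits, in [0, 1]. Padding bits are 0 in both
// vectors, so they never count as differences.
function hammingSimilarity(a: BitVector, b: BitVector): number {
  let differing = 0;
  for (let byte = 0; byte < a.bits.length; byte++) {
    let x = a.bits[byte] ^ b.bits[byte]; // 1 wherever the bits differ
    while (x) {
      x &= x - 1; // clear the lowest set bit
      differing++;
    }
  }
  return (a.dimensions - differing) / a.dimensions;
}

const a = encodeBit([0.3, -0.2, 0.9, -0.1]);
const b = encodeBit([0.5, 0.4, 0.8, -0.7]);
console.log(Array.from(a.bits), decodeBit(a), hammingSimilarity(a, b)); // [ 5 ] [ 1, -1, 1, -1 ] 0.75
```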
Writing an Embedder Adapter
An embedder adapter is any object that implements the Embedder interface:
```ts
import type { Embedder, EmbedResult } from '@trovec/core';

// Placeholder: replace with a call to your embedding API or model.
declare function callEmbeddingAPI(input: string, apiKey: string): Promise<number[]>;

export function createMyEmbedder(options: { apiKey: string }): Embedder {
  // A standalone function, so embedMany() does not depend on `this`
  async function embed(input: string): Promise<EmbedResult> {
    const embedding = await callEmbeddingAPI(input, options.apiKey);
    return { embedding };
  }

  return {
    embed,
    async embedMany(inputs: string[]): Promise<EmbedResult[]> {
      // Batch implementation (or loop over embed()); use a native batch endpoint if your API has one
      return Promise.all(inputs.map((input) => embed(input)));
    },
  };
}
```

Publish as a separate package (e.g., `@trovec/embedder-mymodel`) to keep Trovec zero-dependency.
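For offline tests, an adapter does not need a model at all. The sketch below follows the trigram-hash idea that the adapter table attributes to `@trovec/embedder-local`, but it is not that package's code: the FNV-1a hash, the 64-dimension default, the padding, and the L2 normalization are all assumptions. It declares local `EmbedResult` and `Embedder` types (assumed to match the shape used in the adapter example above) so it runs without installing `@trovec/core`; a real adapter would import them from the package and export its factory.

```ts
// Local stand-ins for the @trovec/core types (assumed shape).
interface EmbedResult {
  embedding: number[];
}
interface Embedder {
  embed(input: string): Promise<EmbedResult>;
  embedMany(inputs: string[]): Promise<EmbedResult[]>;
}

// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash every character trigram into one of `dimensions` buckets, then L2-normalize.
function trigramVector(text: string, dimensions: number): number[] {
  const vec = new Array<number>(dimensions).fill(0);
  const padded = `  ${text.toLowerCase()} `;
  for (let i = 0; i + 3 <= padded.length; i++) {
    vec[fnv1a(padded.slice(i, i + 3)) % dimensions] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vec : vec.map((v) => v / norm);
}

function createTrigramEmbedder(dimensions = 64): Embedder {
  const embed = async (input: string): Promise<EmbedResult> => ({
    embedding: trigramVector(input, dimensions),
  });
  return {
    embed,
    embedMany: (inputs) => Promise.all(inputs.map((input) => embed(input))),
  };
}
```

Because buckets are shared through hash collisions, this captures lexical overlap rather than meaning, which is consistent with the table recommending the local adapter only for testing and demos.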
How Persistence Works
When using a storage driver, all data is loaded into memory for querying:
- On `create()`, existing data is automatically read from the storage driver and deserialized into an in-memory `Map`. Use a stable `collectionId` to ensure the same data is loaded across restarts.
- Queries run entirely in memory via a brute-force scan; the storage driver is never touched during search.
- Auto-flush: after each mutation (`add`, `addMany`, `delete`), a debounced timer schedules a `flush()`. Multiple rapid mutations are batched into a single write. A `beforeExit` handler provides a safety net: if the process exits gracefully without an explicit `close()`, pending changes are still persisted. Node does not emit `beforeExit` for explicit `process.exit()` calls or uncaught exceptions, so call `close()` in those paths.
- On `close()`, any pending changes are flushed immediately, the debounce timer is cleared, and the `beforeExit` handler is removed. Read operations (`get`, `query`, `stats`) continue to work after `close()`.
- On `flush()`, all entries are serialized and written back to storage. Manual `flush()` calls are still supported alongside auto-flush.
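The auto-flush behavior in this list (debounce, batching, `beforeExit` safety net, `close()`) can be sketched with timers and Node's `process` events. `createAutoFlusher`, `markDirty`, and the callback shape are hypothetical names; only the behavior comes from the list above.

```ts
// Hypothetical auto-flush controller around an async flush() callback.
function createAutoFlusher(flush: () => Promise<void>, delayMs = 500) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let dirty = false;
  let closed = false;

  const flushNow = async (): Promise<void> => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
    if (!dirty) return;
    dirty = false;
    await flush();
  };

  // Safety net: 'beforeExit' fires when the event loop empties (not on process.exit()).
  const onBeforeExit = () => {
    void flushNow();
  };
  process.on('beforeExit', onBeforeExit);

  return {
    // Call after each mutation: restarting the timer batches a burst into one write.
    markDirty(): void {
      if (closed) return;
      dirty = true;
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(() => void flushNow(), delayMs);
    },
    flushNow,
    // Flush pending changes, cancel the timer, and remove the exit handler.
    async close(): Promise<void> {
      closed = true;
      process.off('beforeExit', onBeforeExit);
      await flushNow();
    },
  };
}

let writes = 0;
const flusher = createAutoFlusher(async () => {
  writes++;
}, 50);
flusher.markDirty();
flusher.markDirty();
flusher.markDirty(); // three mutations in quick succession
setTimeout(() => {
  console.log(writes); // 1: the burst was batched into a single write
  void flusher.close();
}, 120);
```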
This design keeps queries fast (sub-millisecond for thousands of entries) but means the full dataset must fit in memory.
Future Improvement Considerations
For larger datasets that exceed available memory, several strategies could be explored:
- Streaming query: read and score entries in chunks directly from the stored binary buffer, keeping only the top-K results in a min-heap. Memory usage becomes O(K) plus one chunk, instead of O(N).
- Partitioned storage: split collections into fixed-size shards (e.g., 10K entries each). A query loads one shard at a time, merging top-K results across shards. Memory stays bounded to a single shard.
- Memory-mapped files: use `mmap` to map `.trovec` files into virtual address space. The OS pages data in and out on demand, giving near-memory speed for hot data without loading everything.
- Approximate Nearest Neighbor (ANN) indexing: replace brute force with structures like HNSW or IVF that visit only a subset of vectors per query. Index metadata stays in memory while vectors can remain on disk.
- Hot/cold tiering: keep recently accessed entries in an LRU cache and everything else on disk. Queries hit the cache first and fall back to disk for misses.
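The streaming and partitioned ideas both rely on keeping only the best K candidates while entries arrive in chunks. Below is a sketch of that building block: a size-bounded min-heap whose root is the weakest kept result. The chunk source is simulated with arrays, the `TopK` class is illustrative, and nothing here is part of Trovec today; ties are not broken deterministically in this sketch.

```ts
interface Scored {
  id: string;
  score: number;
}

// Min-heap on score, capped at k items: memory is O(k) no matter how many items are offered.
class TopK {
  private heap: Scored[] = [];
  private readonly k: number;

  constructor(k: number) {
    this.k = k;
  }

  offer(item: Scored): void {
    if (this.heap.length < this.k) {
      this.heap.push(item);
      this.siftUp(this.heap.length - 1);
    } else if (this.k > 0 && item.score > this.heap[0].score) {
      this.heap[0] = item; // evict the weakest kept result
      this.siftDown(0);
    }
  }

  // Best first.
  results(): Scored[] {
    return [...this.heap].sort((a, b) => b.score - a.score);
  }

  private siftUp(i: number): void {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].score <= this.heap[i].score) break;
      [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
      i = parent;
    }
  }

  private siftDown(i: number): void {
    const n = this.heap.length;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < n && this.heap[left].score < this.heap[smallest].score) smallest = left;
      if (right < n && this.heap[right].score < this.heap[smallest].score) smallest = right;
      if (smallest === i) return;
      [this.heap[smallest], this.heap[i]] = [this.heap[i], this.heap[smallest]];
      i = smallest;
    }
  }
}

// Simulated chunks (e.g. shards, or slices of a stored file), consumed one at a time.
const chunks: Scored[][] = [
  [{ id: 'a', score: 0.2 }, { id: 'b', score: 0.9 }],
  [{ id: 'c', score: 0.5 }, { id: 'd', score: 0.95 }],
  [{ id: 'e', score: 0.1 }, { id: 'f', score: 0.7 }],
];
const top = new TopK(3);
for (const chunk of chunks) for (const item of chunk) top.offer(item);
console.log(top.results().map((r) => r.id)); // [ 'd', 'b', 'f' ]
```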
Development
```bash
npm install          # install dev dependencies
npm test             # run tests (vitest)
npm run test:watch   # run tests in watch mode
npm run build        # compile to dist/esm + dist/cjs
npm run clean        # remove dist/
```

License
MIT
