vectlite

v0.1.11

Published

a month ago

Embedded vector store for local-first AI applications.

mcsedition-hub

vectlite vector-database vector-search embeddings hnsw rag napi-rs

vectlite is a single-file, zero-dependency vector database written in Rust with Node.js bindings. It gives you dense + sparse hybrid search, HNSW indexing, metadata filtering, transactions, and crash-safe persistence in a single .vdb file -- no server, no Docker, no network calls.

Installation

npm install vectlite

Requires Node.js 18+. Pre-built binaries are available for macOS (x86_64, arm64), Linux (x86_64), and Windows (x86_64). Other platforms fall back to compiling from source (requires Rust/Cargo).

Quick Start

const vectlite = require('vectlite')

// Create or open a database
const db = vectlite.open('knowledge.vdb', { dimension: 384 })

// Insert records with vectors and metadata (sparse terms are covered under Hybrid Search)
db.upsert('doc1', embedding, { source: 'blog', title: 'Auth Guide' })
db.upsert('doc2', embedding2, { source: 'notes', title: 'Billing' })

// Search with filters
const results = db.search(embeddingQuery, { k: 5, filter: { source: 'blog' } })

// Count matching records without running a vector query
console.log(db.count({ filter: { source: 'blog' } }))

// Clean up
db.close()

Features

Core

Single-file storage -- one .vdb file per database, portable and easy to back up
Dense vectors -- cosine similarity with automatic HNSW indexing for large collections
Sparse vectors -- BM25-scored inverted index for keyword retrieval
Hybrid search -- dense + sparse fusion with linear or RRF strategies
Rich metadata -- string, number, boolean, null, array, and nested object values
Crash-safe WAL -- writes land in a write-ahead log first, then checkpoint with compact()
Transactions -- atomic batched writes with db.transaction()
File locking -- advisory locks prevent corruption from concurrent access
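
To make the "BM25-scored inverted index" item concrete, here is a minimal sketch of the general technique in plain JavaScript. It is not vectlite's implementation: the function names, the tokenizer, the `k1 = 1.2` and `b = 0.75` defaults, and the Lucene-style IDF formula are all assumptions chosen for illustration.

```javascript
// Illustrative only: a textbook BM25 ranker, not vectlite's internal code.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || []
}

// Build an inverted index: term -> Map(docId -> term frequency).
function buildIndex(docs) {
  const postings = new Map()
  const lengths = new Map()
  for (const [id, text] of Object.entries(docs)) {
    const tokens = tokenize(text)
    lengths.set(id, tokens.length)
    for (const token of tokens) {
      if (!postings.has(token)) postings.set(token, new Map())
      const posting = postings.get(token)
      posting.set(id, (posting.get(id) || 0) + 1)
    }
  }
  const total = [...lengths.values()].reduce((sum, len) => sum + len, 0)
  return { postings, lengths, avgLength: total / lengths.size }
}

// Score every document that shares at least one term with the query.
function bm25Search(index, query, k1 = 1.2, b = 0.75) {
  const scores = new Map()
  const docCount = index.lengths.size
  for (const term of new Set(tokenize(query))) {
    const posting = index.postings.get(term)
    if (!posting) continue
    const idf = Math.log(1 + (docCount - posting.size + 0.5) / (posting.size + 0.5))
    for (const [id, tf] of posting) {
      const lengthNorm = k1 * (1 - b + b * (index.lengths.get(id) / index.avgLength))
      const termScore = idf * (tf * (k1 + 1)) / (tf + lengthNorm)
      scores.set(id, (scores.get(id) || 0) + termScore)
    }
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }))
}

const index = buildIndex({
  doc1: 'How to configure SSO authentication',
  doc2: 'Billing and invoices overview',
  doc3: 'SSO troubleshooting',
})
const ranked = bm25Search(index, 'SSO authentication')
console.log(ranked.map((r) => r.id))
```

Documents that share no term with the query never receive a score, which is why an inverted index can skip them entirely.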

Search & Retrieval

Metadata filters -- MongoDB-style operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $contains, $exists, $and, $or, $not
Nested filters -- dot-path traversal (author.name), $elemMatch, $size on arrays and objects
Named vectors -- multiple vector spaces per record (vectors: { title: [...], body: [...] })
Multi-vector queries -- weighted search across vector spaces in a single call
MMR diversification -- mmrLambda controls relevance vs. diversity trade-off
Namespaces -- logical isolation with per-namespace or cross-namespace search
Observability -- searchWithStats() returns timings, BM25 term scores, ANN stats, and per-result explain payloads
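
The MMR item above is easier to reason about with the algorithm in view. The sketch below shows greedy maximal marginal relevance in plain JavaScript. It is a generic illustration, not vectlite's code: `cosine` and `mmrSelect` are hypothetical names, and it assumes a lambda of 1 means pure relevance, which may not match how vectlite interprets `mmrLambda`.

```javascript
// Illustrative only: generic greedy MMR, not vectlite's internal code.
function cosine(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Repeatedly pick the candidate that maximizes
// lambda * sim(query, c) - (1 - lambda) * max sim(c, already selected).
function mmrSelect(query, candidates, k, lambda) {
  const remaining = candidates.map((c) => ({ ...c, relevance: cosine(query, c.vector) }))
  const selected = []
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0
    let bestScore = -Infinity
    remaining.forEach((c, i) => {
      const redundancy = selected.length === 0
        ? 0
        : Math.max(...selected.map((s) => cosine(c.vector, s.vector)))
      const score = lambda * c.relevance - (1 - lambda) * redundancy
      if (score > bestScore) {
        bestScore = score
        bestIndex = i
      }
    })
    selected.push(remaining.splice(bestIndex, 1)[0])
  }
  return selected.map((c) => c.id)
}

const candidates = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0.99, 0.1] }, // near-duplicate of 'a'
  { id: 'c', vector: [0.6, 0.8] },  // less relevant, but different
]
const pureRelevance = mmrSelect([1, 0], candidates, 2, 1.0)
const diversified = mmrSelect([1, 0], candidates, 2, 0.3)
console.log(pureRelevance, diversified)
```

With lambda at 1.0 the near-duplicate `b` is kept as the second result; lowering lambda to 0.3 replaces it with the more distinct `c`.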

Data Management

Physical collections -- vectlite.openStore() manages a directory of independent databases
Bulk ingestion -- bulkIngest() with deferred index rebuilds for fast imports
Listing & filtered counts -- list() and count({ namespace, filter }) without a vector query
Delete by filter -- deleteByFilter() for bulk deletion by metadata filter
Snapshots -- db.snapshot(path) creates a self-contained copy
Backup / Restore -- db.backup(dir) and vectlite.restore(dir, path) for full roundtrips
Read-only mode -- vectlite.open(path, { readOnly: true }) for safe concurrent readers
Explicit close -- db.close() to release locks deterministically
Lock timeouts -- lockTimeout for bounded lock acquisition waits

Usage

Hybrid Search

const vectlite = require('vectlite')

const db = vectlite.open('knowledge.vdb', { dimension: 384 })

// Upsert with dense + sparse vectors
db.upsert(
  'doc1',
  denseEmbedding,
  { source: 'docs', title: 'Auth Setup', text: 'How to configure SSO...' },
  { sparse: vectlite.sparseTerms('How to configure SSO authentication') },
)

// Hybrid search
const results = db.search(queryEmbedding, {
  k: 10,
  sparse: vectlite.sparseTerms('SSO authentication'),
  fusion: 'rrf',
  filter: { source: 'docs' },
  explain: true,
})

for (const result of results) {
  console.log(result.id, result.score)
}
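
For readers unfamiliar with the `rrf` strategy, the following sketch shows how reciprocal rank fusion merges two ranked lists in general. It is not taken from vectlite: the helper name `rrfFuse` is hypothetical, and the constant `k = 60` is the value commonly used in the literature, not a documented vectlite default.

```javascript
// Illustrative only: generic reciprocal rank fusion, not vectlite's internal code.
// Each list contributes 1 / (k + rank) per id, where rank is 1-based.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map()
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + index + 1))
    })
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .map(([id, score]) => ({ id, score }))
}

const denseRanking = ['doc1', 'doc2', 'doc3']
const sparseRanking = ['doc3', 'doc1', 'doc4']
const fused = rrfFuse([denseRanking, sparseRanking])
console.log(fused.map((r) => r.id))
```

Because RRF uses only ranks, it needs no score normalization between cosine similarities and BM25 scores, which is its main appeal over a linear combination.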

Collections

const store = vectlite.openStore('./my_collections')
const products = store.createCollection('products', 384)
products.upsert('p1', embedding, { name: 'Widget', price: 9.99 })

const logs = store.openOrCreateCollection('logs', 128)
console.log(store.collections()) // ["logs", "products"]

Transactions

const tx = db.transaction()
try {
  tx.upsert('doc1', emb1, { source: 'a' })
  tx.upsert('doc2', emb2, { source: 'b' })
  tx.delete('old_doc')
  tx.commit() // All operations commit atomically
} catch (err) {
  tx.rollback() // Roll back on error
  throw err
}
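
The commit-or-rollback pattern above can be wrapped once and reused. The sketch below is one possible shape: `withTransaction` is a hypothetical helper, not part of vectlite, and `makeFakeDb` is a stand-in object that only mimics `db.transaction()`, `commit()`, and `rollback()` so the example runs without the native module.

```javascript
// Hypothetical helper (not part of vectlite): run work(tx), then commit,
// or roll back and rethrow if anything fails.
function withTransaction(db, work) {
  const tx = db.transaction()
  try {
    const result = work(tx)
    tx.commit()
    return result
  } catch (err) {
    tx.rollback()
    throw err
  }
}

// Stand-in for a vectlite handle; it only records which calls were made.
function makeFakeDb() {
  const log = []
  return {
    log,
    transaction() {
      return {
        upsert: (id) => log.push(`upsert:${id}`),
        commit: () => log.push('commit'),
        rollback: () => log.push('rollback'),
      }
    },
  }
}

const okDb = makeFakeDb()
withTransaction(okDb, (tx) => tx.upsert('doc1'))

const failingDb = makeFakeDb()
let caught = null
try {
  withTransaction(failingDb, () => {
    throw new Error('embedding failed')
  })
} catch (err) {
  caught = err
}
console.log(okDb.log, failingDb.log)
```

With a real database, `withTransaction(db, (tx) => { ... })` would replace the explicit try/catch block.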

Text Helpers

async function run() {
  // embedFn can be sync or async
  await vectlite.upsertText(db, 'doc1', 'Auth setup guide', embedFn, { source: 'docs' })
  const results = await vectlite.searchText(db, 'how to authenticate', embedFn, { k: 5 })
  return results
}
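
The snippet above leaves `embedFn` undefined. For local experiments, a deterministic placeholder can stand in for a real embedding model. The sketch below assumes `embedFn` takes a string and returns an array of `dimension` numbers; that signature is an assumption, not something this README states. `toyEmbed` is a hypothetical name, and feature hashing carries no semantic meaning, so swap in a real model for actual retrieval.

```javascript
// Toy feature-hashing embedder: NOT a semantic model, only a deterministic
// placeholder. The (text) -> number[] signature is an assumption.
function toyEmbed(text, dimension = 384) {
  const vector = new Array(dimension).fill(0)
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || []
  for (const token of tokens) {
    // FNV-1a 32-bit hash of the token selects the bucket.
    let hash = 0x811c9dc5
    for (let i = 0; i < token.length; i++) {
      hash ^= token.charCodeAt(i)
      hash = Math.imul(hash, 0x01000193) >>> 0
    }
    vector[hash % dimension] += 1
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0))
  return norm === 0 ? vector : vector.map((x) => x / norm)
}

const v1 = toyEmbed('Auth setup guide')
const v2 = toyEmbed('auth SETUP guide')
console.log(v1.length, v1.every((x, i) => x === v2[i]))
```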

Snapshots & Backup

db.snapshot('/backups/knowledge_2024.vdb') // Self-contained copy
db.backup('/backups/full/')                // Full backup with ANN sidecars

const restored = vectlite.restore('/backups/full/', 'restored.vdb')

Read-Only Mode

const ro = vectlite.open('knowledge.vdb', { readOnly: true, lockTimeout: 5 })
const results = ro.search(query, { k: 5 }) // Reads work
ro.upsert('doc3', embedding, { source: 'blog' }) // Throws VectLiteError

Listing, Counting, and Lifecycle

const db = vectlite.open('knowledge.vdb', { dimension: 384, lockTimeout: 5 })

const records = db.list({ namespace: 'docs', filter: { stale: false }, limit: 20 })
const count = db.count({ namespace: 'docs', filter: { source: 'blog' } })
const deleted = db.deleteByFilter({ stale: true }, { namespace: 'docs' })

db.close()
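
Since `list()` accepts `limit` and `offset`, large result sets can be walked page by page. The generator below is a sketch of that pattern: `listAll` is a hypothetical helper, and it assumes `list()` returns an array that is shorter than `limit` once the records run out. `fakeDb` is a stand-in so the example runs without the native module.

```javascript
// Hypothetical helper (not part of vectlite): yield every matching record,
// fetching one page at a time via list({ limit, offset }).
function* listAll(db, { namespace, filter, pageSize = 100 } = {}) {
  let offset = 0
  while (true) {
    const page = db.list({ namespace, filter, limit: pageSize, offset })
    yield* page
    if (page.length < pageSize) return // assumed end-of-data signal
    offset += page.length
  }
}

// Stand-in handle backed by a plain array.
const fakeDb = {
  records: Array.from({ length: 7 }, (_, i) => ({ id: `doc${i}` })),
  calls: 0,
  list({ limit, offset }) {
    this.calls += 1
    return this.records.slice(offset, offset + limit)
  },
}

const ids = [...listAll(fakeDb, { pageSize: 3 })].map((r) => r.id)
console.log(ids.length, fakeDb.calls)
```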

Search Diagnostics

const outcome = db.searchWithStats(query, {
  k: 5,
  sparse: terms,
  explain: true,
})

console.log(outcome.stats.timings)      // { dense_us: 120, sparse_us: 45, ... }
console.log(outcome.stats.used_ann)     // true
console.log(outcome.results[0].explain) // Detailed scoring breakdown

Database Methods Reference

Write Methods

| Method | Description |
|---|---|
| db.upsert(id, vector, metadata, options) | Insert or update a single record |
| db.insert(id, vector, metadata, options) | Insert a record (throws on duplicate id) |
| db.upsertMany(records, { namespace }) | Upsert a batch of records |
| db.insertMany(records, { namespace }) | Insert a batch |
| db.bulkIngest(records, { namespace, batchSize }) | Fastest bulk import with batched WAL writes |
| db.delete(id, { namespace }) | Delete a single record |
| db.deleteMany(ids, { namespace }) | Delete multiple records by id |
| db.deleteByFilter(filter, { namespace }) | Delete all records matching a filter |

Read Methods

| Method | Description |
|---|---|
| db.get(id, { namespace }) | Get a single record by id |
| db.search(query, options) | Search and return a list of results |
| db.searchWithStats(query, options) | Search with detailed performance stats |
| db.count({ namespace, filter }) | Count records, optionally scoped by namespace/filter |
| db.list({ namespace, filter, limit, offset }) | List records without issuing a vector query |
| db.namespaces() | List all namespaces |
| db.dimension | Vector dimension (property) |
| db.path | Database file path (property) |
| db.readOnly | Whether the database is read-only (property) |

Maintenance Methods

| Method | Description |
|---|---|
| db.compact() | Fold WAL into snapshot and persist ANN indexes |
| db.flush() | Alias for compact() |
| db.snapshot(dest) | Create a self-contained .vdb copy |
| db.backup(destDir) | Full backup including ANN sidecar files |
| db.transaction() | Begin an atomic transaction |
| db.close() | Flush pending state, release the file lock, and invalidate the handle |

Filter Operators

| Operator | Example | Description |
|---|---|---|
| $eq | { field: { $eq: 'value' } } | Equal (also { field: 'value' }) |
| $ne | { field: { $ne: 'value' } } | Not equal |
| $gt / $gte | { field: { $gt: 5 } } | Greater than (or equal) |
| $lt / $lte | { field: { $lt: 20 } } | Less than (or equal) |
| $in / $nin | { field: { $in: ['a', 'b'] } } | In / not in set |
| $contains | { field: { $contains: 'auth' } } | Substring match |
| $exists | { field: { $exists: true } } | Field presence |
| $and / $or | { $and: [{...}, {...}] } | Logical combinators |
| $not | { $not: {...} } | Logical negation |
| $elemMatch | { tags: { $elemMatch: { $eq: 'rust' } } } | Match array elements |
| $size | { tags: { $size: 3 } } | Array length |
| dot-path | { 'author.name': 'Alice' } | Nested field access |
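
To show how these operators compose, here is a small reference evaluator in plain JavaScript. It is a sketch of the semantics as described in the table, not vectlite's filter engine: `getPath` and `matches` are hypothetical names, `$elemMatch` is omitted for brevity, and edge cases (missing fields, type mismatches) may behave differently in the real library.

```javascript
// Illustrative only: a tiny evaluator for MongoDB-style filters,
// not vectlite's filter engine. $elemMatch is intentionally left out.

// Resolve a dot-path such as 'author.name' against a metadata object.
function getPath(obj, path) {
  return path.split('.').reduce((cur, key) => (cur == null ? undefined : cur[key]), obj)
}

const comparators = {
  $eq: (v, arg) => v === arg,
  $ne: (v, arg) => v !== arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $contains: (v, arg) => typeof v === 'string' && v.includes(arg),
  $exists: (v, arg) => (v !== undefined) === arg,
  $size: (v, arg) => Array.isArray(v) && v.length === arg,
}

// All top-level keys must match (implicit AND).
function matches(metadata, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === '$and') return cond.every((f) => matches(metadata, f))
    if (key === '$or') return cond.some((f) => matches(metadata, f))
    if (key === '$not') return !matches(metadata, cond)
    const value = getPath(metadata, key)
    const isOperatorObject = cond !== null && typeof cond === 'object' && !Array.isArray(cond)
    if (!isOperatorObject) return value === cond // shorthand for $eq
    return Object.entries(cond).every(([op, arg]) => {
      if (!Object.hasOwn(comparators, op)) throw new Error(`unsupported operator: ${op}`)
      return comparators[op](value, arg)
    })
  })
}

const record = {
  source: 'docs',
  price: 12,
  tags: ['rust', 'db', 'ann'],
  author: { name: 'Alice' },
}
const hit = matches(record, { source: 'docs', price: { $gt: 5, $lt: 20 }, 'author.name': 'Alice' })
const miss = matches(record, { $or: [{ source: 'blog' }, { tags: { $size: 2 } }] })
const negated = matches(record, { $not: { price: { $gte: 20 } } })
console.log(hit, miss, negated)
```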

How It Works

Records are stored in a compact binary .vdb snapshot file
Writes go through a crash-safe WAL (.wal) before being applied in memory
compact() folds the WAL into the snapshot and persists HNSW sidecar files
Dense search uses HNSW indexes (auto-built for collections above ~128 records)
Sparse search uses an inverted index with BM25 scoring
Hybrid fusion combines dense + sparse via linear combination or reciprocal rank fusion
Advisory file locks (flock) prevent concurrent write corruption
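
The snapshot-plus-WAL flow described above can be sketched with Node's `fs` module. This toy store is only an illustration of the general pattern (log first, replay on open, fold into a snapshot on compact). It is not vectlite's file format or code: `ToyStore`, the JSON encoding, and the file names are all assumptions, and it does not handle a torn final log line or file locking.

```javascript
// Illustrative only: a toy JSON snapshot + write-ahead log,
// not vectlite's on-disk format or implementation.
const fs = require('fs')
const os = require('os')
const path = require('path')

class ToyStore {
  constructor(snapshotPath) {
    this.snapshotPath = snapshotPath
    this.walPath = snapshotPath + '.wal'
    this.records = new Map()
    // Recovery: load the last snapshot, then replay any logged writes.
    if (fs.existsSync(this.snapshotPath)) {
      const entries = JSON.parse(fs.readFileSync(this.snapshotPath, 'utf8'))
      for (const [id, value] of entries) this.records.set(id, value)
    }
    if (fs.existsSync(this.walPath)) {
      const lines = fs.readFileSync(this.walPath, 'utf8').split('\n').filter(Boolean)
      for (const line of lines) this.apply(JSON.parse(line))
    }
  }

  apply(op) {
    if (op.type === 'upsert') this.records.set(op.id, op.value)
    else if (op.type === 'delete') this.records.delete(op.id)
  }

  write(op) {
    // Append to the log and fsync before touching in-memory state.
    const fd = fs.openSync(this.walPath, 'a')
    try {
      fs.writeSync(fd, JSON.stringify(op) + '\n')
      fs.fsyncSync(fd)
    } finally {
      fs.closeSync(fd)
    }
    this.apply(op)
  }

  compact() {
    // Write the new snapshot to a temp file, rename it into place, then drop the log.
    const tmp = this.snapshotPath + '.tmp'
    fs.writeFileSync(tmp, JSON.stringify([...this.records.entries()]))
    fs.renameSync(tmp, this.snapshotPath)
    fs.rmSync(this.walPath, { force: true })
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'toy-wal-'))
const file = path.join(dir, 'toy.json')

const first = new ToyStore(file)
first.write({ type: 'upsert', id: 'doc1', value: { source: 'blog' } })
first.write({ type: 'upsert', id: 'doc2', value: { source: 'notes' } })
first.write({ type: 'delete', id: 'doc2' })

// Simulate a crash: reopen without ever calling compact().
const recovered = new ToyStore(file)
recovered.compact()
const afterCompact = new ToyStore(file)
console.log([...afterCompact.records.keys()])
```

Replaying the log on open is what makes uncompacted writes survive a crash; compaction only shortens recovery time and keeps the log from growing without bound.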

License

MIT
