@notelab/notelab-vector
v0.1.0
notelab-vector
A file-based local vector database with ANN (Approximate Nearest Neighbor) indexing for Electron apps. Supports OpenAI-compatible embeddings, local transformers, and Ollama models.
notelab-vector has two indexing engines:
- TypeScript HNSW (`LocalIndex`) — Pure TypeScript, portable, no compilation
- Native C++ HNSW (`NativeIndex`) — SIMD-accelerated via a C++/N-API addon for maximum performance

Set `useNative: true` in `LocalDocumentIndex` to prefer the native engine; it automatically falls back to TypeScript if the addon is not available.
Quick Usage
```typescript
import { LocalDocumentIndex, OllamaEmbeddings } from 'notelab-vector';

const index = new LocalDocumentIndex({
  folderPath: './my-index',
  embeddings: new OllamaEmbeddings({ model: 'bge-m3' }),
  useNative: true, // optional: native-first, falls back to TypeScript if addon is unavailable
});

await index.createIndex({ dimensions: 1024 });
console.log(index.engine); // 'native' | 'typescript'

await index.upsertDocument('doc://1', 'Quantum mechanics describes matter and energy at atomic scales.', 'text/plain');
await index.upsertDocument('doc://2', 'A symphony orchestra performs classical compositions with strings, woodwinds, brass, and percussion.', 'text/plain');

const results = await index.queryDocuments('physics and particles', { maxDocuments: 3 });
console.log(results[0].document?.content); // "Quantum mechanics describes matter..."
```

Architecture
```
notelab-vector/
├── src/                          # TypeScript source
│   ├── LocalIndex.ts             # Pure-TS HNSW index (LocalIndex class)
│   ├── NativeIndex.ts            # Wrapper bridging to native/HNSWIndex.cc
│   ├── LocalDocumentIndex.ts     # Document-level API (chunks, metadata, text fields)
│   ├── LocalDocument.ts          # Document with URI, content, mimeType
│   ├── LocalDocumentResult.ts    # Query result with score
│   ├── OllamaEmbeddings.ts       # Ollama BGE-M3 / compatible API embeddings
│   ├── OpenAIEmbeddings.ts       # OpenAI text-embedding-3 / ada-002
│   ├── LocalEmbeddings.ts        # Local transformers.js embeddings
│   ├── TextSplitter.ts           # Text chunking (recursive, token-aware)
│   ├── ItemSelector.ts           # Metadata-filtered result selection
│   ├── codecs/                   # Persistence codecs (JSON, Protobuf)
│   │   ├── JsonCodec.ts
│   │   └── ProtobufCodec.ts
│   ├── storage/                  # Storage backends
│   │   ├── LocalFileStorage.ts   # File I/O on the local filesystem
│   │   └── VirtualFileStorage.ts # In-memory overlay (testing/browser)
│   └── types/index.ts            # Shared TypeScript interfaces
│
└── native/                       # C++ N-API addon (optional, for maximum speed)
    ├── hnswlib/
    │   ├── vec_simd.h            # SIMD vector math (Arm NEON, AVX2/AVX-512)
    │   ├── hnsw_index.h          # HNSW graph algorithm in C++
    │   ├── vec_simd.cpp
    │   └── hnsw_index.cpp
    ├── hnsw_index_wrapper.cc     # N-API C++ wrapper (Node.js ↔ C++)
    ├── binding.gyp               # node-gyp build configuration
    └── package.json
```

Core Classes
| Class | Description |
|---|---|
| LocalDocumentIndex | High-level document API — stores chunked documents with metadata, runs queries |
| LocalIndex | Pure TypeScript HNSW ANN index — no native dependencies |
| NativeIndex | Wrapper around the C++ N-API addon — transparent fallback to LocalIndex if unavailable |
| OllamaEmbeddings | Ollama API embeddings (BGE-M3 recommended for best quality) |
| OpenAIEmbeddings | OpenAI API compatible embeddings |
| LocalEmbeddings | Local inference via @huggingface/transformers (no API needed) |
Persistence Format
Documents and vectors are persisted as typed binary files in the folderPath:
```
<folderPath>/
├── index/            # HNSW graph data
│   ├── header.json   # Index metadata (dimensions, metric, HNSW params)
│   ├── nodes/        # One file per node: <id>.json | <id>.protobuf
│   └── lock          # Write lock during batch updates
├── metadata.json     # Document registry (URI → file mappings)
└── documents/        # Original document content
    └── <safe-uri>    # One file per document
```

Codecs
Codecs serialize/deserialize vector and graph data:
- `JsonCodec` — Human-readable JSON, portable across environments. Slower for large indices.
- `ProtobufCodec` — Compact binary Protobuf. ~10x smaller, ~2x faster serialization. Default for large indices.
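Conceptually, a codec is just an encode/decode pair over typed records. The sketch below illustrates the idea with a JSON-style codec; the `Codec` and `VectorRecord` names are illustrative, not the library's actual codec interface:

```typescript
// Illustrative sketch only: the real codec API may differ.
interface VectorRecord {
  id: string;
  vector: number[];
}

interface Codec {
  encode(rec: VectorRecord): Uint8Array;
  decode(buf: Uint8Array): VectorRecord;
}

// JSON-style codec: human-readable and portable, but verbose for
// large vectors, since every float is serialized as decimal text.
const jsonCodec: Codec = {
  encode: (rec) => new TextEncoder().encode(JSON.stringify(rec)),
  decode: (buf) => JSON.parse(new TextDecoder().decode(buf)) as VectorRecord,
};
```

A binary codec like `ProtobufCodec` trades this readability for a compact fixed-schema encoding, which is why it wins on size and serialization speed for large indices.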
Storage Backends
- `LocalFileStorage` — Reads/writes files to the local filesystem. Production use.
- `VirtualFileStorage` — In-memory Map store. For testing, browser, or ephemeral workloads.
HNSW Algorithm
notelab-vector implements HNSW (Hierarchical Navigable Small World), a graph-based ANN algorithm that offers an excellent trade-off between query speed and accuracy.
Index Build
For each inserted vector:
- Random level — The node gets a level `L` with probability `P(level) = 0.5^level` (max `maxLevel`)
- Greedy descent — Starting from the entry point, descend through upper graph levels to find the best insertion point for each level ≤ `L`
- Layer search — At each level, perform beam search (BFS + scoring) to collect `efConstruction` candidates
- Neighbor selection — From the candidates, select the `m` best neighbors using distance-based pruning (no two selected neighbors can be closer to each other than to the new node)
- Bidirectional edges — Add edges in both directions (new node → neighbor, neighbor → new node)
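Two of these insertion steps can be sketched in TypeScript: the random level assignment and the distance-based neighbor pruning. Function names and signatures here are illustrative, not the library's internals:

```typescript
// Illustrative sketch of two HNSW insertion steps; not the library's internals.

// Random level: P(level) = 0.5^level, capped at maxLevel.
function randomLevel(maxLevel: number, rand: () => number = Math.random): number {
  let level = 0;
  while (rand() < 0.5 && level < maxLevel) level++;
  return level;
}

// Neighbor selection: from the efConstruction candidates, keep at most m
// neighbors such that no kept neighbor is closer to another kept neighbor
// than it is to the new node (distance-based pruning).
function selectNeighbors(
  candidates: { id: number; dist: number }[], // dist = distance to the new node
  m: number,
  distBetween: (a: number, b: number) => number,
): number[] {
  const sorted = [...candidates].sort((a, b) => a.dist - b.dist);
  const kept: { id: number; dist: number }[] = [];
  for (const c of sorted) {
    if (kept.length >= m) break;
    // Prune c if it sits closer to an already-kept neighbor than to the new node.
    const dominated = kept.some((k) => distBetween(c.id, k.id) < c.dist);
    if (!dominated) kept.push(c);
  }
  return kept.map((k) => k.id);
}
```

The pruning rule spreads edges across directions rather than clustering them, which is what gives the HNSW graph its long-range navigability.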
Query
- Greedy descent — From entry point, descend through upper levels greedily picking the best-scoring neighbor at each level
- Base-level search — At level 0, perform beam search (BFS expanding from best candidates) to collect candidates
- Scoring — Score all candidates against the query vector, sort descending, return the top `maxResults`
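The upper-level greedy descent can be sketched over a toy layered graph as follows; the graph representation and names are illustrative, not the library's internals:

```typescript
// Toy sketch of the query-time greedy descent; not the library's internals.

// One adjacency map per level; index 0 is the base level.
type LayeredGraph = Map<number, number[]>[];

function greedyDescend(
  levels: LayeredGraph,
  entry: number,
  dist: (id: number) => number, // distance from the query to node `id`
): number {
  let current = entry;
  // Walk every level above the base level, greedily hopping to any
  // neighbor that is closer to the query until no neighbor improves.
  for (let level = levels.length - 1; level >= 1; level--) {
    let improved = true;
    while (improved) {
      improved = false;
      for (const neighbor of levels[level].get(current) ?? []) {
        if (dist(neighbor) < dist(current)) {
          current = neighbor;
          improved = true;
        }
      }
    }
  }
  // `current` now seeds the base-level beam search, which expands to
  // efSearch candidates before the final scoring/sorting step.
  return current;
}
```

The descent only has to get close; the base-level beam search then widens the frontier so that a locally greedy path does not cost recall.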
Parameters
| Parameter | Default | Effect |
|---|---|---|
| m | 16 | Max neighbors per node per level. Higher = better accuracy, slower build, more memory |
| efConstruction | 200 | Candidates considered during insertion. Higher = better accuracy, slower build |
| efSearch | 64 | Candidates considered during search. Higher = better accuracy, slower query |
| maxLevel | 10 | Upper bound on random level. Higher = deeper hierarchy, better long-range connectivity |
| dimensions | 1536 | Embedding vector dimension count |
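As an illustration of how these parameters combine, a recall-oriented configuration might raise all three knobs above their defaults; the values below are examples, not tuned recommendations:

```typescript
import { LocalIndex } from 'notelab-vector';

// Example only: trade build/query speed for accuracy by raising m,
// efConstruction, and efSearch above their defaults (16 / 200 / 64).
const accurateIndex = new LocalIndex({
  folderPath: './accurate-index',
  dimensions: 1024,
  metric: 'cosine',
  m: 32,               // more neighbors per node: better recall, more memory
  efConstruction: 400, // more insertion candidates: better graph, slower build
  efSearch: 128,       // more search candidates: better recall, slower query
});
```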
Supported Metrics
| Metric | Description | Use case |
|---|---|---|
| cosine (default) | Cosine similarity = 1 - cosine_distance. Vectors are L2-normalized before indexing | General text similarity |
| dotproduct | Raw dot product. Higher = more similar | Fine-tuned embeddings, recommendation |
| euclidean | Negative squared L2 distance. Higher = closer | Image retrieval, clustering |
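The three metrics can be expressed as plain scoring functions where a higher score always means more similar. This is a sketch for illustration, not the library's (SIMD-accelerated) implementation:

```typescript
// The three supported metrics as plain scoring functions (higher = more similar).
// Illustrative sketch; the library computes these with SIMD in the native path.

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// cosine: equals the raw dot product when inputs are L2-normalized,
// which is why vectors are normalized before indexing for this metric.
function cosineScore(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// euclidean: negative squared L2 distance, so larger scores mean closer vectors.
function euclideanScore(a: number[], b: number[]): number {
  return -a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}
```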
Native C++ Addon (NativeIndex)
For maximum performance, notelab-vector includes an optional C++ N-API addon compiled with SIMD vectorization.
SIMD Vector Math (vec_simd.h)
| Function | SIMD Paths | Fallback |
|---|---|---|
| cosine_simd | Arm NEON (Apple Silicon M1-M3), AVX2, AVX-512 | Scalar |
| l2_sq_simd | Arm NEON, AVX2, AVX-512 | Scalar |
| dot_simd | Arm NEON, AVX2, AVX-512 | Scalar |
| normalize_inplace | Arm NEON, AVX2 | Scalar |
On Apple Silicon (M1-M3), Arm NEON SIMD processes 4 single-precision floats per instruction, giving roughly a 4x throughput improvement over scalar code for the dominant vector operations.
Build
The addon is built with node-gyp:
```shell
cd native
npm install
node-gyp configure
node-gyp build
```

The resulting `.node` binary is loaded at runtime by `NativeIndex.ts`. If the addon is absent or fails to load, `NativeIndex` transparently falls back to `LocalIndex`.
To select the engine in `LocalDocumentIndex`, set:

```typescript
const index = new LocalDocumentIndex({
  folderPath: './my-index',
  embeddings,
  useNative: true, // native-first with TS fallback
});
```

Build requires:
- macOS: Xcode Command Line Tools + node-gyp
- Linux: `clang++`/`g++` with target flags (`-march=native`), `python3`, `node-gyp`
Performance Results
Benchmark Setup
- Hardware: Apple MacBook Pro M3 (12-core CPU, 24 GB RAM)
- Embeddings: Ollama BGE-M3 (1024 dimensions, `fp16`), served locally at `localhost:11434`
- Dataset: 100 documents × 10 topics (10 variations each)
- Queries: 10 domain-specific queries, 200 iterations each
- Comparison: `notelab-vector` Native C++ HNSW vs `vectra` (reference local vector DB)
Results
```
============================================================
Native C++ HNSW (SIMD + BFS)
============================================================
Build:     8.5 ms total (0.08 ms/doc)
Query avg: 0.06 ms (17,423 QPS)
Query p50: 0.06 ms
Query p95: 0.09 ms
Query p99: 0.13 ms

============================================================
Vectra (reference)
============================================================
Build:     10,734 ms total (107.34 ms/doc)
Query avg: 89.03 ms (11 QPS)
Query p50: 86.31 ms
Query p95: 104.87 ms
Query p99: 125.42 ms
```

Speed Comparison
| Metric | notelab-vector (native) | Vectra | Speedup |
|---|---|---|---|
| Build / doc | 0.08 ms | 107.34 ms | 1,342x faster |
| Query avg | 0.06 ms | 89.03 ms | 1,484x faster |
| Query p95 | 0.09 ms | 104.87 ms | 1,165x faster |
| Query p99 | 0.13 ms | 125.42 ms | 965x faster |
| QPS | 17,423 | 11 | 1,584x higher |
The Native C++ HNSW achieves ~1,500x faster queries than Vectra. The majority of this speedup comes from:
- SIMD vectorization — Arm NEON cosine similarity across 1024-dim vectors in a tight loop
- Memory layout — Contiguous `std::vector<float>` in C++ vs JavaScript `Float32Array` with V8 overhead
- Algorithmic efficiency — BFS layer search with unordered expansion + final sort (cache-friendly) vs complex priority queue management
Query Score Quality
Top-5 results for "quantum mechanics and particle physics wave duality" using notelab-vector (BGE-M3 embeddings):
```
doc://80 score: 0.7506 ← quantum physics variant
doc://73 score: 0.4431 ← quantum physics variant
doc://83 score: 0.4265 ← quantum physics variant
doc://97 score: 0.4074 ← quantum physics variant
doc://82 score: 0.4064 ← quantum physics variant
```

Results are semantically coherent — all top matches are quantum physics documents, correctly ranked above unrelated topics (music, art, etc.).
Metadata Filtering
notelab-vector supports MongoDB-style metadata filters on queries:
```typescript
const results = await index.queryDocuments('machine learning', {
  maxDocuments: 5,
  filter: {
    category: 'science',
    year: { $gte: 2020 },
    tags: { $in: ['AI', 'ML'] },
    $not: { status: 'archived' },
  },
});
```

Supported operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`, `$not`
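To make the operator semantics concrete, here is a minimal sketch of matching such a filter against a document's metadata. It covers scalar fields and the operators listed above, and is illustrative only, not the library's `ItemSelector` implementation:

```typescript
// Minimal MongoDB-style filter matcher; illustrative sketch only.
type Filter = Record<string, unknown>;

function matchesFilter(meta: Record<string, unknown>, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    // $not negates a whole sub-filter.
    if (key === '$not') return !matchesFilter(meta, cond as Filter);
    const value = meta[key];
    // An object condition holds operators; anything else is shorthand for $eq.
    if (cond !== null && typeof cond === 'object' && !Array.isArray(cond)) {
      return Object.entries(cond as Record<string, unknown>).every(([op, arg]) => {
        switch (op) {
          case '$eq': return value === arg;
          case '$ne': return value !== arg;
          case '$gt': return (value as number) > (arg as number);
          case '$gte': return (value as number) >= (arg as number);
          case '$lt': return (value as number) < (arg as number);
          case '$lte': return (value as number) <= (arg as number);
          case '$in': return (arg as unknown[]).includes(value);
          case '$nin': return !(arg as unknown[]).includes(value);
          case '$exists': return (key in meta) === Boolean(arg);
          default: return false;
        }
      });
    }
    return value === cond; // bare value: equality shorthand
  });
}
```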
Text Chunking
Documents are automatically chunked before indexing:
```typescript
// Default: recursive character split + token-aware merge
const chunks = splitText(document.content, {
  chunkSize: 256,    // Target tokens per chunk
  chunkOverlap: 64,  // Token overlap between chunks
  separators: ['\n\n', '\n', '. ', ' '],
});

// BM25-based split (requires wink-nlp)
const tokenChunks = splitTextByTokens(document.content, {
  countFor: 'word',  // or 'token' (uses tiktoken)
  chunkLength: 256,
  overlapLength: 64,
});
```

Installation
```shell
npm install @notelab/notelab-vector
```

Peer dependencies (optional but recommended):

```shell
npm install @xenova/transformers # Local embeddings (transformers.js)
```

Native addon (optional, for maximum performance):

```shell
cd node_modules/notelab-vector/native
npm install
node-gyp build
```

Engine Test Script (TS + Native + Ollama diagnostics)
Run the built-in engine smoke test:
```shell
npm run test:engines
```

What it does:
- runs TypeScript engine with deterministic embeddings
- runs native engine with deterministic embeddings
- prints performance metrics (create, ingest, avg query latency, QPS, and TS vs native query speedup)
- runs load test (default 1000 queries per engine) and prints p50/p95/p99 latency + QPS
- runs TypeScript and native again using Ollama embeddings
- prints explicit Ollama errors if Ollama is not running or the model is missing
Note: deterministic relevance assertions are strict for the TypeScript engine. Native results may differ between runs because HNSW graph construction is probabilistic; the script still reports native performance and warns on relevance drift.
To change the load size:

```shell
LOAD_TEST_QUERIES=5000 npm run test:engines
```

Optional environment overrides:

```shell
OLLAMA_MODEL=bge-m3 OLLAMA_BASE_URL=http://localhost:11434/api/embeddings npm run test:engines
```

API Reference
LocalDocumentIndex
```typescript
const index = new LocalDocumentIndex({
  folderPath: string,          // Directory for index files
  embeddings: EmbeddingsModel, // Ollama | OpenAI | Local
  codec?: IndexCodec,          // Default: JsonCodec
  storage?: FileStorage,       // Default: LocalFileStorage
  useNative?: boolean,         // Native-first, falls back to TypeScript if addon unavailable
  m?: number,                  // Default: 16
  efConstruction?: number,     // Default: 200
  efSearch?: number,           // Default: 64
  allowReplace?: boolean,      // Default: true (upsert semantics)
  metric?: 'cosine' | 'dotproduct' | 'euclidean',
});
```

Methods:
| Method | Description |
|---|---|
| createIndex(config) | Initialize the index |
| engine (property) | Active engine at runtime: 'native' or 'typescript' |
| loadIndex() | Load an existing index from folderPath |
| deleteIndex() | Delete all index data |
| upsertDocument(uri, content, mimeType) | Add or update a document |
| upsertDocuments(docs[]) | Batch add/update documents |
| deleteDocument(uri) | Remove a document |
| queryDocuments(query, options?) | ANN search with optional metadata filter |
| getDocument(uri) | Retrieve a document by URI |
| listDocuments() | List all document URIs |
LocalIndex
Pure TypeScript HNSW index (no native dependency):
```typescript
const index = new LocalIndex({
  folderPath: string,
  dimensions: number,
  metric?: 'cosine' | 'dotproduct' | 'euclidean',
  m?: number,
  efConstruction?: number,
  efSearch?: number,
});
```

EmbeddingsModel
```typescript
// Ollama (recommended)
const embeddings = new OllamaEmbeddings({
  baseUrl?: 'http://localhost:11434', // Default
  model?: 'bge-m3',                   // Default
  batchSize?: 32,                     // Batch size for bulk embedding
});

// OpenAI
const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
  model: 'text-embedding-3-small', // or 'ada-002'
});

// Local (transformers.js)
const embeddings = new LocalEmbeddings({
  model: 'Xenova/bge-small-en-v1.5', // or any HF sentence-transformers model
  device: 'webgpu',                  // 'cpu', 'wasm', 'webgpu'
});
```

Open Source
notelab-vector is MIT licensed and community contributions are welcome.
- Contribution process: see `CONTRIBUTING.md`
- Code of conduct: see `CODE_OF_CONDUCT.md`
- Security reporting: see `SECURITY.md`
- License: see `LICENSE`
If you are opening a pull request, use the provided PR template and include test evidence (build, lint, and test output summary).
