vcdb

v0.0.2

Published

6 months ago

Minimal, dependency‑free vector database with pluggable ANN strategies (bruteforce, HNSW, IVF), attribute filtering, and a clean persistence model (index/data split with CRUSH placement). Runs in Node.js, browsers (OPFS), Electron, and Tauri.

trkbt10

vcdb (WIP)

Features

ANN strategies: bruteforce, HNSW, IVF
Attribute filtering: callback or expression
Persistence: index/data split, CRUSH placement, manifest + catalog
Storage backends: Node FS, Memory, OPFS, S3
connect() API: open existing or create
CLI: inspect, search, edit, rebuild

Install

npm install vcdb
# or
pnpm add vcdb
# or
yarn add vcdb

CLI (via npx or global install):

npx vcdb          # run once without installing
# or
npm install -g vcdb
vcdb              # launch the interactive CLI

Quick Start (API)

import { connect } from "vcdb";
import { createNodeFileIO } from "vcdb/storage/node";

// Open existing by name ("db"); if missing, create then save
const client = await connect<{ tag?: string }>({
  storage: {
    index: createNodeFileIO("./.vcdb"),
    data: createNodeFileIO("./.vcdb/data"),
  },
  database: { dim: 3, metric: "cosine", strategy: "bruteforce" },
  index: { name: "db", shards: 1, segmented: true },
});

client.set(1, { vector: new Float32Array([1, 0, 0]), meta: { tag: "a" } });
client.set(2, { vector: new Float32Array([0, 1, 0]), meta: { tag: "b" } });

const hits = client.findMany(new Float32Array([1, 0, 0]), { k: 2 });
console.log(hits);

// Persist snapshot
await client.index.saveState(client.state, { baseName: "db" });
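
To make the result of the search above concrete, here is a minimal sketch of what a brute-force cosine search computes. It is not vcdb's implementation: `cosine`, `bruteForceTopK`, and the `{ id, score }` hit shape are names assumed for illustration only.

```typescript
// Illustrative only: not vcdb's internal code.
type Row = { id: number; vector: Float32Array };
type Hit = { id: number; score: number };

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Score every row against the query, sort by descending score, keep the top k.
function bruteForceTopK(rows: Row[], query: Float32Array, k: number): Hit[] {
  return rows
    .map((r) => ({ id: r.id, score: cosine(r.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const rows: Row[] = [
  { id: 1, vector: new Float32Array([1, 0, 0]) },
  { id: 2, vector: new Float32Array([0, 1, 0]) },
];
const sketchHits = bruteForceTopK(rows, new Float32Array([1, 0, 0]), 2);
```

With the two vectors from the Quick Start, id 1 matches the query exactly and id 2 is orthogonal to it, so id 1 ranks first.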

Configuration

The project loads a top-level config file named vectordb.config.{mjs,mts,ts,cjs,js} from the current directory (or from a given base path).

Patterns: vectordb.config.{mjs,mts,ts,cjs,js}
Locations: ./vectordb.config.* or ./<dir>/vectordb.config.*
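
The lookup described above can be pictured as a list of candidate paths. The sketch below is a hypothetical helper (`configCandidates` is not a vcdb export), and the order in which extensions are tried is an assumption; the README lists the extensions but does not state their precedence.

```typescript
// Hypothetical helper: lists the config paths that the patterns above describe.
// The extension order mirrors the README's list; actual precedence is an assumption.
const CONFIG_EXTENSIONS = ["mjs", "mts", "ts", "cjs", "js"] as const;

function configCandidates(baseDir: string = "."): string[] {
  // Drop trailing slashes so the joined path has exactly one separator.
  const dir = baseDir.replace(/\/+$/, "") || ".";
  return CONFIG_EXTENSIONS.map((ext) => `${dir}/vectordb.config.${ext}`);
}

const candidates = configCandidates("./app");
```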

Storage Options

The storage field accepts FileIOs, URI strings, or a mix. Built‑in registries support file: and mem: schemes.

index: string | FileIO
data: string | Record<string,string> | FileIO | (ns: string) => FileIO

Examples:

// URI-based, portable config
import { defineConfig } from "vcdb/config";

export default defineConfig({
  name: "db",
  storage: {
    index: ".vectordb/index",     // resolved as file:.vectordb/index
    data: ".vectordb/data/{ns}",  // {ns} expands to top-level name
  },
  database: { dim: 3, metric: "cosine", strategy: "bruteforce" },
  index: { segmented: true },
});

// Mixed: explicit FileIO for index, template for data
import { defineConfig } from "vcdb/config";
import { createNodeFileIO } from "vcdb/storage/node";

export default defineConfig({
  name: "db",
  storage: {
    index: createNodeFileIO(".vectordb/index"),
    data: "mem:{ns}",
  },
  database: { dim: 2 },
});

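// Fully explicit FileIOs (including a function for data)
import { defineConfig } from "vcdb/config";
import { createMemoryFileIO } from "vcdb/storage/memory";

export default defineConfig({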
  name: "db",
  storage: {
    index: createMemoryFileIO(),
    data: (ns) => createMemoryFileIO(),
  },
  database: { dim: 2 },
});

Notes:

When using TypeScript configs directly, a TS loader may be required in some environments. Alternatively, use .mjs/.js.
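
For the string and key→URI forms of storage.data, resolution can be sketched as follows. This is an illustration, not vcdb's resolver: `resolveDataUri` is a hypothetical name, and the rule that a bare path becomes a file: URI is inferred from the "resolved as file:.vectordb/index" comment in the examples.

```typescript
// Hypothetical sketch of URI resolution for storage.data; not vcdb's resolver.
type DataSpec = string | Record<string, string>;

function resolveDataUri(spec: DataSpec, ns: string): string {
  // A key→URI map picks the entry for this key; a plain string is used directly.
  const raw = typeof spec === "string" ? spec : spec[ns];
  if (raw === undefined) throw new Error(`no storage URI for "${ns}"`);
  // Expand every {ns} placeholder.
  const expanded = raw.split("{ns}").join(ns);
  // Anything without a scheme prefix is treated as a file: URI (assumed rule).
  return /^[a-z][a-z0-9+.-]*:/i.test(expanded) ? expanded : `file:${expanded}`;
}
```

For example, `resolveDataUri(".vectordb/data/{ns}", "db")` yields `file:.vectordb/data/db`, and `resolveDataUri("mem:{ns}", "db")` yields `mem:db`.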

Storage Adapters

Import per environment:

Node.js: import { createNodeFileIO } from "vcdb/storage/node"
Memory: import { createMemoryFileIO } from "vcdb/storage/memory"
OPFS (browser): import { saveToOPFS, loadFromOPFS } from "vcdb/storage/opfs"
S3: implement a FileIO using the AWS SDK (see example below)

All adapters implement the same FileIO interface:

import type { FileIO } from "vcdb/storage/types";
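
As a rough picture of what an adapter has to provide, here is a minimal Map-backed sketch. The `FileIOLike` shape is inferred from the methods used in this README (read, write, append, atomicWrite, del); the real FileIO type in "vcdb/storage/types" may declare more members, and `createMapFileIO` is a hypothetical name.

```typescript
// Local stand-in for the FileIO shape, inferred from the methods used in this README.
interface FileIOLike {
  read(key: string): Promise<Uint8Array>;
  write(key: string, data: Uint8Array | ArrayBuffer): Promise<void>;
  append(key: string, data: Uint8Array | ArrayBuffer): Promise<void>;
  atomicWrite(key: string, data: Uint8Array | ArrayBuffer): Promise<void>;
  del(key: string): Promise<void>;
}

function createMapFileIO(): FileIOLike {
  const files = new Map<string, Uint8Array>();
  // Copy incoming bytes so later mutation by the caller cannot change stored data.
  const toBytes = (d: Uint8Array | ArrayBuffer) =>
    d instanceof Uint8Array ? d.slice() : new Uint8Array(d.slice(0));
  return {
    async read(key) {
      const hit = files.get(key);
      if (!hit) throw new Error(`not found: ${key}`);
      return hit.slice();
    },
    async write(key, data) {
      files.set(key, toBytes(data));
    },
    async append(key, data) {
      // Concatenate onto the existing bytes (or onto nothing if the key is new).
      const prev = files.get(key) ?? new Uint8Array(0);
      const next = toBytes(data);
      const merged = new Uint8Array(prev.length + next.length);
      merged.set(prev, 0);
      merged.set(next, prev.length);
      files.set(key, merged);
    },
    async atomicWrite(key, data) {
      // Map.set is already all-or-nothing, so this matches write.
      files.set(key, toBytes(data));
    },
    async del(key) {
      files.delete(key);
    },
  };
}
```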

Example: CRUSH + S3 (AWS SDK)

This example shows how to spread data segments across several S3 buckets or prefixes using CRUSH (shards/replicas):

import { connect } from "vcdb";
import type { FileIO } from "vcdb/storage/types";
import { S3Client, PutObjectCommand, GetObjectCommand, DeleteObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: process.env.AWS_REGION });

// Minimal FileIO backed by AWS SDK
function s3FileIOFor(prefix: string): FileIO {
  function parse(key: string) {
    // Supports either raw keys or s3://bucket/prefix form
    if (prefix.startsWith("s3://")) {
      const u = new URL(prefix);
      return { Bucket: u.hostname, Key: `${u.pathname.replace(/^\//, "")}${key}` };
    }
    return { Bucket: process.env.S3_BUCKET!, Key: `${prefix}${key}` };
  }
  return {
    async read(key: string) {
      const { Bucket, Key } = parse(key);
      const res = await s3.send(new GetObjectCommand({ Bucket, Key }));
      // transformToByteArray() already resolves to a Uint8Array
      return await res.Body!.transformToByteArray();
    },
    async write(key: string, data: Uint8Array | ArrayBuffer) {
      const { Bucket, Key } = parse(key);
      const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
      await s3.send(new PutObjectCommand({ Bucket, Key, Body: bytes }));
    },
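    async append(key: string, data: Uint8Array | ArrayBuffer) {
      // S3 has no append: emulate with read + concat + write (not safe with concurrent writers).
      // Only a missing object is treated as empty; any other error propagates.
      const prev = await this.read(key).catch((err: unknown) => {
        if ((err as { name?: string }).name === "NoSuchKey") return new Uint8Array();
        throw err;
      });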
      const next = data instanceof Uint8Array ? data : new Uint8Array(data);
      const merged = new Uint8Array(prev.length + next.length);
      merged.set(prev, 0);
      merged.set(next, prev.length);
      await this.write(key, merged);
    },
    async atomicWrite(key: string, data: Uint8Array | ArrayBuffer) {
      // A single PutObject replaces the object atomically, so a plain write suffices
      await this.write(key, data);
    },
    async del(key: string) {
      const { Bucket, Key } = parse(key);
      await s3.send(new DeleteObjectCommand({ Bucket, Key }));
    },
  };
}

// Map CRUSH targetKey → S3 prefix. For example, across 3 buckets/prefixes.
const DATA_PREFIXES: Record<string, string> = {
  "0": "s3://bucket-a/data/",
  "1": "s3://bucket-b/data/",
  "2": "s3://bucket-c/data/",
};

const client = await connect<{ tag?: string }>({
  storage: {
    // Index artifacts (catalog/manifests/index file)
    index: s3FileIOFor("s3://bucket-index/index/"),
    // Data segments: CRUSH assigns a targetKey; return a FileIO for that location
    data: (targetKey: string) => s3FileIOFor(DATA_PREFIXES[targetKey] ?? DATA_PREFIXES["0"]),
  },
  database: { dim: 128, metric: "cosine", strategy: "hnsw", hnsw: { M: 16, efSearch: 50 } },
  index: {
    name: "products",
    shards: 3, // number of CRUSH targets
    replicas: 2, // write replicas per segment
    pgs: 64, // placement groups (higher → smoother distribution)
    segmented: true, // write segment files
    includeAnn: true, // persist ANN when saving
  },
});

// Upsert data as usual; CRUSH determines where segments go on save
client.upsert(
  { id: 1, vector: new Float32Array(128), meta: { tag: "a" } },
  { id: 2, vector: new Float32Array(128), meta: { tag: "b" } },
);

// Save snapshot (writes index + data segments to S3 via the API)
await client.index.saveState(client.state, { baseName: "products" });

Notes:

CRUSH uses shards, replicas, and pgs to place segments across storage.data(targetKey) destinations.
Index files typically live in a single place (one FileIO) while data segments fan out.
For read paths, index.openState({ baseName }) resolves segment locations via the manifest, falling back to CRUSH if needed.
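
To show how shards, replicas, and pgs interact, the sketch below uses a much simpler technique than CRUSH: FNV-1a hashing into a placement group, followed by rendezvous (highest-random-weight) hashing over flat target keys. It is a stand-in for illustration, not vcdb's placement code; `placeSegment` and the segment name are hypothetical.

```typescript
// Simplified stand-in for CRUSH-style placement (rendezvous hashing over flat targets).
// Not vcdb's algorithm; it only illustrates the roles of shards, replicas, and pgs.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function placeSegment(
  segmentName: string,
  opts: { shards: number; replicas: number; pgs: number },
): { pg: number; targets: string[] } {
  // 1) Hash the segment into a placement group.
  const pg = fnv1a(segmentName) % opts.pgs;
  // 2) Score every target against that pg and keep the top `replicas` of them.
  const scored = Array.from({ length: opts.shards }, (_, t) => ({
    key: String(t),
    score: fnv1a(`${pg}:${t}`),
  }));
  scored.sort((a, b) => b.score - a.score || a.key.localeCompare(b.key));
  const count = Math.min(opts.replicas, opts.shards);
  return { pg, targets: scored.slice(0, count).map((s) => s.key) };
}

const placement = placeSegment("products.seg-0001", { shards: 3, replicas: 2, pgs: 64 });
```

The returned target keys ("0", "1", "2") are the same kind of keys that a storage.data(targetKey) function receives. The mapping is deterministic, so a reader can recompute a segment's location without consulting the manifest.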

CLI

The vcdb CLI provides two main modes:

Interactive UI (default): Launch the TUI for database exploration and management
HTTP Server: Start a REST server for programmatic access

Usage:

vcdb [command] [options]

Commands:
  serve                 Start HTTP server (see HTTP Server section for details)

Options:
  --config, -c <path>   Path to config file (vectordb.config.*)
  --port, -p <number>   Override server port
  --host, -H <host>     Override server host
  --help, -h            Show help

Examples:

# Launch interactive UI
vcdb

# Start HTTP server (see HTTP Server section for configuration details)
vcdb serve

HTTP Server (Hono)

A REST server built on hono and @hono/node-server. It is started with the vcdb serve CLI command and configured through vectordb.config.* (an executable JS/TS module).

Starting the Server

# Start using config in current directory
vcdb serve

# Specify config file explicitly
vcdb serve --config ./vectordb.config.mjs

# Override port/host from command line
vcdb serve -p 8787 -H 0.0.0.0

Note: The serve command requires a valid executable config. CLI flags --port/-p and --host/-H take precedence over config values.
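
The precedence rule in the note above amounts to a layered merge. The sketch below is hypothetical (`resolveServerAddr` is not a vcdb function), and the default host and port are assumptions chosen for illustration.

```typescript
// Hypothetical merge: CLI flags win over config values, config wins over defaults.
type ServerAddr = { host: string; port: number };

function resolveServerAddr(
  config: Partial<ServerAddr>,
  flags: Partial<ServerAddr>,
  defaults: ServerAddr = { host: "127.0.0.1", port: 8787 }, // assumed defaults
): ServerAddr {
  return {
    host: flags.host ?? config.host ?? defaults.host,
    port: flags.port ?? config.port ?? defaults.port,
  };
}

// Config sets both values; the command line overrides only the port.
const addr = resolveServerAddr({ host: "0.0.0.0", port: 8787 }, { port: 9000 });
```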

Config: server options

Author an executable config with a server block (mjs, mts, ts, js, cjs supported; JSON is not supported):

// vectordb.config.* (mjs, mts, ts, js, cjs)
import { defineConfig } from "vcdb/config";

export default defineConfig({
  name: "db",
  storage: {
    index: "file:.vectordb/index", // index artifacts
    data: "file:.vectordb/data", // data segments (can be URI or key→URI map)
  },
  database: { dim: 3, metric: "cosine", strategy: "bruteforce" },
  index: { name: "db", segmented: true },
  server: {
    host: "0.0.0.0",
    port: 8787,
    cors: true,
    // Enable time-based result consistency (bounded-staleness read via HEAD); default: true
    resultConsistency: true,
    embeddings: {
      provider: "openai",
      model: "text-embedding-3-small",
      openAICompatRoute: true,
    },
  },
});

Result consistency (bounded staleness)

server.resultConsistency (default true)
- true: readers prefer .head.json when its commitTs is readable at clock.now() - epsilonMs.
- false: readers ignore .head.json and open the default manifest ${name}.manifest.json directly.
Related knobs (optional): server.clock, server.epsilonMs.
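
One possible reading of the rule above, sketched as a pure function. The `Head` shape (apart from commitTs), the `pickManifest` name, and the fallback to the default manifest when the head is too recent are all assumptions; the README does not spell out what happens in that case.

```typescript
// Hypothetical reader logic; field names other than commitTs are assumptions.
type Head = { manifest: string; commitTs: number };

function pickManifest(
  name: string,
  head: Head | null,
  opts: { resultConsistency: boolean; now: number; epsilonMs: number },
): string {
  const fallback = `${name}.manifest.json`;
  // With consistency disabled (or no head file), open the default manifest directly.
  if (!opts.resultConsistency || head === null) return fallback;
  // Bounded staleness: accept the head only if its commit is at least epsilonMs old.
  return head.commitTs <= opts.now - opts.epsilonMs ? head.manifest : fallback;
}
```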

Invalid configs print errors to stderr. Check server startup logs when troubleshooting.

Storage URIs are scheme-based:

Built‑in: file: (Node FS), mem: (in‑memory)
Other schemes (e.g., s3, gs, r2, dynamodb) can be registered with the server's driver registry at startup.
storage.data supports either a single URI or a key→URI map per CRUSH target. Templates with {ns} are supported, for example: "data": "s3://bucket/prefix/{ns}".

WAL: The server binds WAL to the index storage as <name>.wal (no separate config required).

Notes:

server.cors: set to true to allow all origins, or pass an object matching Hono's CORS options.
server.embeddings.provider: "openai" exposes POST /embeddings and /v1/embeddings (OpenAI-compatible). API key is read from OPENAI_API_KEY or server.embeddings.apiKey.
Config formats supported: mjs, mts, ts, js, cjs (JSON is not supported).

REST endpoints

GET /health → { ok: true }
GET /stats → { size, dim, metric, strategy }
GET /config → server config (secrets redacted)
GET /vectors/:id → { id, vector, meta }
DELETE /vectors/:id → { ok }
POST /vectors → Insert-only. Single { id, vector:number[], meta? } or bulk { rows:[{ id, vector, meta? }] }
PUT /vectors → Bulk upsert. Body { rows:[{ id, vector, meta? }] }
PUT /vectors/:id → Single upsert. Body { vector:number[], meta? }
POST /vectors/search → Body { vector:number[], k?, expr? } returns { hits }
POST /vectors/find → Body { vector:number[], expr? } returns { hit }
POST /save → Persist current state via index ops (single-writer enforced)
Embeddings (if enabled):
- POST /embeddings or /v1/embeddings → forwards to OpenAI embeddings with configured model/key
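
A client call to the search endpoint can be sketched as below. The base URL is a placeholder, `searchRequest` is a hypothetical helper, and the type of expr is left as unknown because the README does not specify it. The function only builds the arguments for fetch(), so it can be inspected without a running server.

```typescript
// Builds fetch() arguments for POST /vectors/search; the base URL is a placeholder.
type SearchBody = { vector: number[]; k?: number; expr?: unknown };
type JsonRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

function searchRequest(baseUrl: string, body: SearchBody): JsonRequest {
  return {
    // Trim trailing slashes so the path is joined with exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/vectors/search`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const req = searchRequest("http://localhost:8787/", { vector: [1, 0, 0], k: 2 });
// Against a running server:
//   const res = await fetch(req.url, req.init);
//   const { hits } = await res.json();
```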
