

@recvector/adapters

Concrete adapter implementations for @recvector/sdk. Provides the SQL storage adapter (Knex.js), vector DB client (Chroma), and all supported embedding model providers (OpenAI, Gemini, HuggingFace).

These adapters implement the pluggable interfaces defined in @recvector/sdk — the core engine never imports from this package directly, keeping the SDK decoupled from any specific infrastructure.


Installation

pnpm add @recvector/adapters @recvector/sdk

Install the dependencies for the adapters you plan to use:

# Database driver (pick one)
pnpm add pg             # PostgreSQL
pnpm add mysql2         # MySQL
pnpm add better-sqlite3 # SQLite

# Embedding provider (pick one)
pnpm add openai         # OpenAI
pnpm add @google/genai  # Gemini

# Vector DB
# Chroma is the only supported vector DB in v1
# The chromadb client is included as a peer dependency

Quick start

In almost all cases you do not instantiate adapters directly — createRecEngine() reads recvector.config.ts and constructs them automatically. You only need to import from this package when:

  • Passing a pre-built adapter to createRecEngine() to override the default
  • Writing tests with mock or in-memory adapters
  • Using an adapter standalone outside of RecVector

import { createRecEngine } from '@recvector/sdk'

// Default: createRecEngine reads recvector.config.ts and builds all adapters
const rec = await createRecEngine()

KnexStorageAdapter

Implements the StorageAdapter interface using Knex.js. Supports PostgreSQL, MySQL, and SQLite.

Responsibilities

  • Auto-creates the two SDK-managed tables (rec_user_profiles, rec_entity_stats) on initialize()
  • Reads user interaction history from your existing interaction tables (defined in rec_schema.json)
  • Reads entity features from your existing entity table, applying column and join feature mappings
  • Reads and writes user profile embeddings and entity popularity stats

Constructor

import knex from 'knex'
import { KnexStorageAdapter } from '@recvector/adapters'
import type { RecVectorSchema } from '@recvector/sdk'

const db = knex({
  client: 'pg',
  connection: process.env.DATABASE_URL,
})

const schema: RecVectorSchema = { /* your schema */ }

const storage = new KnexStorageAdapter(db, schema)

// Creates rec_user_profiles and rec_entity_stats if they don't exist
await storage.initialize()

Feature mapping

KnexStorageAdapter reads entity features by applying the features array from rec_schema.json. Two source modes are supported:

"column" source — reads a column directly from the entity table:

{
  "name": "category",
  "type": "categorical",
  "source": { "type": "column", "column": "category" }
}

"join" source — joins through a linking table to collect multi-value features:

{
  "name": "tags",
  "type": "multi_categorical",
  "source": {
    "type": "join",
    "join_table": "product_tags",
    "join_fk": "product_id",
    "value_column": "tag_name"
  }
}

Join features are collected into an array (e.g. ["audio", "wireless"]) and serialised as comma-separated text during embedding.
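The serialisation helper itself isn't exported from the package, so the following is only an illustrative sketch of the behaviour described above (the function name and the exact separator are assumptions):

```typescript
// Hypothetical sketch: multi-value join features are collected into an
// array, then flattened to comma-separated text before being handed to
// the embedding model. The ", " separator is an assumption.
function serialiseMultiCategorical(values: string[]): string {
  return values.join(', ')
}

// e.g. tags collected from the product_tags join table
const text = serialiseMultiCategorical(['audio', 'wireless'])
// text is "audio, wireless"
```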

SDK-managed tables

initialize() creates these two tables if they don't already exist. They are safe to add to an existing production database — no existing tables are touched.

rec_user_profiles

| Column | Type | Description |
|--------|------|-------------|
| user_id | TEXT PRIMARY KEY | User identifier |
| embedding | TEXT | JSON-serialised profile vector (number[]) |
| last_updated | DATETIME | Timestamp of last profile recomputation |
| version | TEXT | Schema version at update time |
| interaction_count_since_update | INTEGER | Batch threshold counter |
| accumulated_weight | REAL | Total weight accumulated (incremental strategy only) |

rec_entity_stats

| Column | Type | Description |
|--------|------|-------------|
| entity_id | TEXT PRIMARY KEY | Entity identifier |
| feedback_counts | TEXT | JSON object { interactionType: count } |
| version | TEXT | Schema version |

StorageAdapter interface

KnexStorageAdapter implements the full StorageAdapter interface:

| Method | Description |
|--------|-------------|
| initialize() | Auto-creates SDK tables |
| fetchUserInteractions(userId, since?) | Returns all interactions for a user, optionally since a date |
| fetchEntityById(entityId) | Fetches a single entity with its resolved features |
| fetchEntitiesBatch(entityIds) | Fetches multiple entities efficiently |
| fetchEntityStats(entityId) | Reads popularity counts from rec_entity_stats |
| upsertUserProfile(profile) | Insert-or-update a user profile row |
| fetchUserProfile(userId) | Load a user profile including accumulated weight |
| incrementInteractionCounter(userId, type) | Atomically increment the batch counter (transactional) |


ChromaVectorClient

Implements the VectorDbClient interface against a Chroma vector database.

Constructor

import { ChromaVectorClient } from '@recvector/adapters'

const vectorDb = new ChromaVectorClient({
  type: 'chroma',
  url: 'http://localhost:8000',
  collection: 'my-app',
  index: {
    metric: 'cosine', // 'cosine' | 'dot' | 'l2'
  },
})

ChromaVectorDbConfig

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | 'chroma' | Yes | Discriminant |
| url | string | Yes | Chroma server URL (http://host:port) |
| collection | string | Yes | Collection name for entity vectors |
| index.metric | 'cosine' \| 'dot' \| 'l2' | No | Distance metric (default: cosine) |
| namespace | string | No | Logical namespace — encoded as {collection}__{namespace} in Chroma |

Namespace encoding

Chroma has no native namespace concept. ChromaVectorClient encodes namespaces into the collection name: my-app__tenant-a. This means deleteAll({ namespace: 'tenant-a' }) drops exactly the my-app__tenant-a collection, leaving my-app untouched. Future adapters (Pinecone, Weaviate, Milvus) can map namespace to their native equivalents without any core logic changes.
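The encoding rule is simple enough to sketch directly; encodeCollectionName here is a hypothetical helper illustrating the rule, not part of the public API:

```typescript
// Sketch of the namespace encoding rule described above: Chroma has no
// native namespaces, so a namespace is folded into the collection name
// as {collection}__{namespace}; without a namespace, the bare collection
// name is used.
function encodeCollectionName(collection: string, namespace?: string): string {
  return namespace ? `${collection}__${namespace}` : collection
}

encodeCollectionName('my-app', 'tenant-a') // 'my-app__tenant-a'
encodeCollectionName('my-app')             // 'my-app'
```

Because each namespace maps to its own Chroma collection, dropping a namespace is just dropping that one collection, which is why deleteAll on a namespace cannot touch other tenants' vectors.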

Collection caching

Collections are lazily created on first use and cached in memory to avoid repeated getOrCreateCollection calls across requests.

Score normalisation

Chroma returns distances, not similarities. ChromaVectorClient normalises to a [0, 1] similarity score:

| Metric | Conversion |
|--------|------------|
| cosine | score = 1 - distance |
| l2 | score = 1 - distance |
| dot (ip) | score = -distance (Chroma returns negative inner product) |
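A minimal sketch of that conversion (distanceToScore is an illustrative name, not an exported function):

```typescript
// Sketch of the distance-to-similarity conversion in the table above.
type Metric = 'cosine' | 'dot' | 'l2'

function distanceToScore(metric: Metric, distance: number): number {
  // Chroma reports 'dot' distances as negative inner products, so
  // negating recovers the raw similarity; cosine and l2 are inverted.
  return metric === 'dot' ? -distance : 1 - distance
}

distanceToScore('cosine', 0.25) // 0.75
distanceToScore('dot', -0.9)    // 0.9
```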

VectorDbClient interface

| Method | Description |
|--------|-------------|
| upsertVectors({ ids, vectors, metadata?, namespace? }) | Upsert entity embeddings into the collection |
| query({ vector, topK, filter?, namespace? }) | HNSW nearest-neighbour search; returns scored results |
| fetchByIds({ ids, namespace? }) | Retrieve stored vectors by ID (used during profile computation) |
| delete({ ids, namespace? }) | Delete specific entity vectors |
| deleteAll(args?) | Drop the entire collection — recreated fresh on next operation |

Running Chroma locally

docker run -p 8000:8000 chromadb/chroma

Or with persistent storage:

docker run -p 8000:8000 -v chroma-data:/chroma/.chroma/index chromadb/chroma

Embedding Models

All embedding models implement the EmbeddingModel interface:

interface EmbeddingModel {
  embed(text: string): Promise<number[]>
  embedBatch(texts: string[]): Promise<number[][]>
}

Use the createEmbeddingModel factory to construct the right model from a config object, or instantiate a class directly for more control.

createEmbeddingModel(config)

import { createEmbeddingModel } from '@recvector/adapters'

const model = createEmbeddingModel({
  provider: 'openai',
  model: 'text-embedding-3-small',
  dimensions: 1536,
  apiKey: process.env.OPENAI_API_KEY,
})

const vector = await model.embed('wireless noise-cancelling headphones')

Supported providers: 'openai', 'gemini', 'huggingface'. For 'custom', pass an EmbeddingModel instance directly to createRecEngine({ embeddingModel }) instead.


OpenAIEmbeddingModel

Uses the OpenAI Embeddings API.

import { OpenAIEmbeddingModel } from '@recvector/adapters'

const model = new OpenAIEmbeddingModel({
  provider: 'openai',
  model: 'text-embedding-3-small',
  dimensions: 1536,
  apiKey: process.env.OPENAI_API_KEY,
})

Recommended models:

| Model | Dimensions | Notes |
|-------|------------|-------|
| text-embedding-3-small | 1536 | Best cost/quality ratio for most use cases |
| text-embedding-3-large | 3072 | Higher accuracy, higher cost |
| text-embedding-ada-002 | 1536 | Legacy model |

Config fields:

| Field | Default | Description |
|-------|---------|-------------|
| apiKey | OPENAI_API_KEY env | API key (falls back to env var if omitted) |
| model | required | Model ID |
| dimensions | required | Must match the model's output dimensions |


GeminiEmbeddingModel

Uses the Google Gemini Embeddings API via @google/genai.

import { GeminiEmbeddingModel } from '@recvector/adapters'

const model = new GeminiEmbeddingModel({
  provider: 'gemini',
  model: 'gemini-embedding-exp-03-07',
  dimensions: 768,
  apiKey: process.env.GEMINI_API_KEY,
})

Recommended models:

| Model | Dimensions | Notes |
|-------|------------|-------|
| gemini-embedding-exp-03-07 | 768 | Latest experimental, high quality |
| text-embedding-004 | 768 | Stable production model |

Rate limit handling:

GeminiEmbeddingModel automatically retries on HTTP 429 (rate limit) with exponential backoff and respects Retry-After headers. The free tier allows 100 requests/minute — embedBatch runs texts sequentially within each batch so the concurrency throttle in syncEntities is the only parallelism knob.
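The retry behaviour can be sketched roughly as follows; withBackoff is a hypothetical stand-in for the adapter's internal logic, with the request injected so it can be simulated, and the Retry-After handling omitted for brevity:

```typescript
// Sketch of retry-with-exponential-backoff on HTTP 429, as described
// above. Delays grow as baseDelayMs * 2^attempt (500ms, 1s, 2s, ...).
// The real adapter additionally honours Retry-After headers.
async function withBackoff<T>(
  request: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request()
    } catch (err: any) {
      // Only retry rate-limit errors, and only up to maxRetries times
      if (err?.status !== 429 || attempt >= maxRetries) throw err
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
}
```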

Config fields:

| Field | Default | Description |
|-------|---------|-------------|
| apiKey | required | Gemini API key |
| model | required | Model ID |
| dimensions | required | Output vector dimensions |


HuggingFaceEmbeddingModel

Uses the Hugging Face Inference API. Works with any model hosted on HuggingFace that supports the feature-extraction pipeline.

import { HuggingFaceEmbeddingModel } from '@recvector/adapters'

const model = new HuggingFaceEmbeddingModel({
  provider: 'huggingface',
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  dimensions: 384,
  apiKey: process.env.HF_API_KEY,
})

Popular models:

| Model | Dimensions | Notes |
|-------|------------|-------|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | Fast, lightweight, good for most domains |
| sentence-transformers/all-mpnet-base-v2 | 768 | Higher accuracy, slower |
| BAAI/bge-small-en-v1.5 | 384 | Strong English-language retrieval quality |

Self-hosted inference:

Point at your own inference server by passing baseUrl:

import { HuggingFaceEmbeddingModel } from '@recvector/adapters'

// instantiate directly (not via createEmbeddingModel) to access baseUrl
const model = new HuggingFaceEmbeddingModel({
  provider: 'huggingface',
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  dimensions: 384,
  apiKey: 'your-key',
  baseUrl: 'http://my-tgi-server:8080',
})

Config fields:

| Field | Default | Description |
|-------|---------|-------------|
| apiKey | required | Hugging Face API key (Bearer token) |
| model | required | Model ID on HuggingFace Hub |
| dimensions | required | Output vector dimensions |
| baseUrl | https://api-inference.huggingface.co | Override for self-hosted inference servers |


Custom Embedding Model

Implement EmbeddingModel to use any embedding source — local models, custom APIs, or caching wrappers.

import type { EmbeddingModel } from '@recvector/sdk'

class MyEmbeddingModel implements EmbeddingModel {
  async embed(text: string): Promise<number[]> {
    // call your API / run local model
    return myApi.embed(text)
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map(t => this.embed(t)))
  }
}

const rec = await createRecEngine({
  embeddingModel: new MyEmbeddingModel(),
})

Exports

import {
  // Storage
  KnexStorageAdapter,

  // Vector DB
  ChromaVectorClient,

  // Embedding models
  OpenAIEmbeddingModel,
  GeminiEmbeddingModel,
  HuggingFaceEmbeddingModel,
  createEmbeddingModel,
} from '@recvector/adapters'

License

MIT