@recvector/sdk

v0.1.8

Published

a month ago

RecVector core SDK - AI-powered content-based recommendations for any database


prospercoded

recommendation recommendations vector embeddings similarity content-based sdk ai

@recvector/sdk

The core RecVector SDK. Provides createRecEngine(), the complete RecEngineClient API, CLI commands (studio, bootstrap, reset), schema validation, and the full recommendation engine — including two profile update strategies, MMR re-ranking, and exploration noise.

Installation

pnpm add @recvector/sdk @recvector/adapters
# also install your DB driver
pnpm add pg           # PostgreSQL
pnpm add mysql2       # MySQL
pnpm add better-sqlite3  # SQLite

Configuration

Create recvector.config.ts at your project root. The CLI and createRecEngine() both discover this file automatically via find-up.

import { defineConfig } from '@recvector/sdk'

export default defineConfig({
  db: {
    client: 'postgresql',
    connection: process.env.DATABASE_URL,
  },
  schemaPath: './rec_schema.json',
  vectorDb: {
    type: 'chroma',
    url: 'http://localhost:8000',
    collection: 'my-app',
  },
  embeddingModel: {
    provider: 'openai',
    model: 'text-embedding-3-small',
    dimensions: 1536,
    apiKey: process.env.OPENAI_API_KEY,
  },
})

Config fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| db.client | 'postgresql' \| 'mysql2' \| 'sqlite3' \| 'better-sqlite3' | Yes | Knex driver name |
| db.connection | string \| object | Yes | Connection string or Knex connection object |
| schemaPath | string | Yes | Path to rec_schema.json (relative to config file) |
| vectorDb.type | 'chroma' | Yes | Vector DB adapter (only Chroma in v1) |
| vectorDb.url | string | Yes | Chroma server URL |
| vectorDb.collection | string | Yes | Chroma collection name |
| vectorDb.index.metric | 'cosine' \| 'dot' \| 'l2' | No | Similarity metric (default: cosine) |
| embeddingModel.provider | 'openai' \| 'gemini' \| 'huggingface' | Yes | Embedding provider |
| embeddingModel.model | string | Yes | Model name/ID |
| embeddingModel.dimensions | number | Yes | Output vector dimensions |
| embeddingModel.apiKey | string | No | API key (falls back to env var) |
| port | number | No | Studio server port (default: 4242) |

Schema JSON (`rec_schema.json`)

The schema file maps your existing database tables to RecVector's logical model. Generate it with npx recvector studio (visual drag-and-drop editor) or write it by hand.

Minimal schema

{
  "version": "1.0.0",
  "connections": {
    "user": {
      "table": "users",
      "id_column": "id"
    },
    "entity": {
      "table": "products",
      "id_column": "id"
    },
    "interactions": [
      {
        "name": "purchase",
        "table": "orders",
        "user_fk": "user_id",
        "entity_fk": "product_id",
        "timestamp_column": "created_at",
        "base_weight": 1.0
      },
      {
        "name": "view",
        "table": "product_views",
        "user_fk": "user_id",
        "entity_fk": "product_id",
        "timestamp_column": "viewed_at",
        "base_weight": 0.3
      }
    ]
  }
}

Feature mappings

Feature mappings define which attributes of each entity drive similarity. Without features, the engine has nothing to distinguish one entity from another, and recommendations become near-identical.

{
  "version": "1.0.0",
  "connections": { "...": "..." },
  "features": [
    {
      "name": "category",
      "type": "categorical",
      "source": { "type": "column", "column": "category" }
    },
    {
      "name": "description",
      "type": "text",
      "source": { "type": "column", "column": "description" }
    },
    {
      "name": "tags",
      "type": "multi_categorical",
      "source": {
        "type": "join",
        "join_table": "product_tags",
        "join_fk": "product_id",
        "value_column": "tag_name"
      }
    },
    {
      "name": "price",
      "type": "numeric",
      "source": { "type": "column", "column": "price_cents" }
    }
  ]
}

Feature source types:

| source.type | Description |
|---------------|-------------|
| "column" | Reads a column directly from the entity table |
| "join" | Joins through a linking table to collect multi-value features (e.g. tags) |

Feature types and how they are represented in the embedding text:

| type | Example output |
|--------|----------------|
| categorical | "category: electronics" |
| multi_categorical | "tags: audio, wireless, noise-cancelling" |
| text | raw value appended to embedding text |
| numeric | "price: 299" |
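To make the table above concrete, here is a minimal sketch of how a feature map could be turned into embedding text. The `FeatureMapping` type, the `renderFeature` and `buildEmbeddingText` helpers, and the newline separator are assumptions made for this example; the SDK's internal implementation is not shown in this README and may differ.

```typescript
// Hypothetical types for illustration only; not part of the SDK's public API.
type FeatureType = 'categorical' | 'multi_categorical' | 'text' | 'numeric'

interface FeatureMapping {
  name: string
  type: FeatureType
}

type FeatureValue = string | number | string[]

// Render a single feature following the "Example output" column above.
function renderFeature(mapping: FeatureMapping, value: FeatureValue): string {
  switch (mapping.type) {
    case 'multi_categorical': {
      const items = Array.isArray(value) ? value : [String(value)]
      return `${mapping.name}: ${items.join(', ')}`
    }
    case 'text':
      // Text features contribute their raw value, without a "name:" prefix.
      return String(value)
    case 'categorical':
    case 'numeric':
      return `${mapping.name}: ${String(value)}`
  }
}

// Assumption: parts are joined with a newline; the real separator is not documented.
function buildEmbeddingText(
  mappings: FeatureMapping[],
  features: Record<string, FeatureValue | undefined>,
): string {
  const parts: string[] = []
  for (const mapping of mappings) {
    const value = features[mapping.name]
    if (value === undefined) continue // skip features this entity does not have
    parts.push(renderFeature(mapping, value))
  }
  return parts.join('\n')
}
```

Because the order of `mappings` determines the order of the output, keeping the feature list stable keeps the embedding text stable for a given entity.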

Aggregation configuration

Controls when profiles are updated and which strategy is used.

{
  "aggregation": {
    "profile_update_strategy": "full_recompute",
    "batch_thresholds": {
      "purchase": 1,
      "view": 5
    },
    "time_fallback_hours": 24
  }
}

| Field | Default | Description |
|-------|---------|-------------|
| profile_update_strategy | "full_recompute" | "full_recompute" or "incremental" |
| batch_thresholds | {} | Per-interaction-type thresholds that trigger a profile update |
| time_fallback_hours | 24 | Force update after N hours even if threshold not reached |
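The interplay between `batch_thresholds` and `time_fallback_hours` can be sketched as a single decision function. This is an illustration under stated assumptions, not the SDK's code: `shouldUpdateProfile`, `AggregationConfig`, and `ProfileState` are hypothetical names, and the rule that an interaction type without a configured threshold only triggers via the time fallback is a guess.

```typescript
// Hypothetical shapes; field names mirror the aggregation config above.
interface AggregationConfig {
  batch_thresholds: Record<string, number>
  time_fallback_hours: number
}

interface ProfileState {
  interactionCountSinceUpdate: number
  lastUpdated: Date
}

const MS_PER_HOUR = 3_600_000

function shouldUpdateProfile(
  config: AggregationConfig,
  state: ProfileState,
  interactionType: string,
  now: Date,
): boolean {
  const threshold = config.batch_thresholds[interactionType]
  // Assumption: a type with no configured threshold never triggers on count alone.
  if (threshold !== undefined && state.interactionCountSinceUpdate >= threshold) {
    return true
  }
  // Time fallback: force an update once the profile is older than the limit.
  const hoursSinceUpdate = (now.getTime() - state.lastUpdated.getTime()) / MS_PER_HOUR
  return hoursSinceUpdate >= config.time_fallback_hours
}
```

With the example config above, a single purchase would trigger an update straight away, while views would wait until five have accumulated or 24 hours have passed.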

Ranking configuration

{
  "ranking": {
    "similarity_weight": 1.0,
    "popularity_weight": 0.1,
    "time_decay_factor": 0.95
  }
}

| Field | Default | Description |
|-------|---------|-------------|
| similarity_weight | 1.0 | Weight of cosine similarity in final score |
| popularity_weight | 0.1 | Weight of log(1 + interaction_count) popularity bonus |
| time_decay_factor | 0.95 | Exponential decay applied per day to older interactions |
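The three ranking fields can be read as two small formulas. The sketch below is a worked illustration rather than the SDK's source: `finalScore` and `decayedWeight` are hypothetical helper names, the logarithm is assumed to be the natural log, and decay is assumed to multiply an interaction's base weight by `time_decay_factor` once per day of age.

```typescript
interface RankingConfig {
  similarity_weight: number
  popularity_weight: number
  time_decay_factor: number
}

// Defaults taken from the table above.
const defaultRanking: RankingConfig = {
  similarity_weight: 1.0,
  popularity_weight: 0.1,
  time_decay_factor: 0.95,
}

// score = similarity_weight * similarity + popularity_weight * log(1 + interaction_count)
function finalScore(similarity: number, interactionCount: number, ranking: RankingConfig): number {
  return (
    ranking.similarity_weight * similarity +
    ranking.popularity_weight * Math.log(1 + interactionCount)
  )
}

// weight = base_weight * time_decay_factor ^ ageInDays
function decayedWeight(baseWeight: number, ageInDays: number, ranking: RankingConfig): number {
  return baseWeight * Math.pow(ranking.time_decay_factor, ageInDays)
}

// An entity with no interactions gets no popularity bonus: the score is the similarity.
const coldScore = finalScore(0.9, 0, defaultRanking) // 0.9
// A purchase (base_weight 1.0) from two days ago counts for roughly 0.9025.
const twoDayOldPurchase = decayedWeight(1.0, 2, defaultRanking)
```

Under these assumptions an interaction loses about half its weight after roughly 13.5 days at the default factor of 0.95, since 0.95^13.5 ≈ 0.5.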

API Reference

`createRecEngine(params?)`

Factory function. Discovers recvector.config.ts, initializes adapters, auto-creates SDK tables, and returns a fully wired RecEngineClient.

All parameters are optional — when omitted, configuration is read from the config file.

import { createRecEngine } from '@recvector/sdk'

const rec = await createRecEngine()

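// With overrides (an alternative to the call above, so it gets its own name here)
const recWithOverrides = await createRecEngine({
  configPath: './path/to/recvector.config.ts',
  embeddingModel: myCustomEmbeddingModel, // your own EmbeddingConfig or EmbeddingModel
})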

CreateRecEngineParams

| Field | Type | Description |
|-------|------|-------------|
| configPath | string | Override config file path (default: auto-discovered) |
| db | StorageAdapter | Pre-built storage adapter; skips internal Knex setup |
| rawDb | Knex | Raw Knex instance; required for syncEntities / bootstrapProfiles |
| schema | RecVectorSchema | Override schema (default: read from schemaPath) |
| vectorDb | VectorDbConfig \| VectorDbClient | Override vector DB config or pass a pre-built client |
| embeddingModel | EmbeddingConfig \| EmbeddingModel | Override embedding model config or pass a pre-built model |

`rec.logInteraction(params)`

Records that a user interacted with an entity. Internally increments the interaction counter and triggers a profile update (fire-and-forget) when the configured batch threshold is reached.

await rec.logInteraction({
  userId: 'user_123',
  entityId: 'product_456',
  type: 'purchase',
  timestamp: new Date(),   // optional, defaults to now
})

LogInteractionParams

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| userId | string | Yes | The user who interacted |
| entityId | string | Yes | The entity they interacted with |
| type | string | Yes | Interaction type matching a name in connections.interactions |
| timestamp | Date | No | When the interaction happened (default: new Date()) |
| metadata | Record<string, unknown> | No | Optional context (stored but not currently used in scoring) |
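Since `type` must match a `name` in `connections.interactions`, it can help to see that lookup spelled out. The sketch below is illustrative only: `InteractionMapping` mirrors the fields of the minimal schema shown earlier, while `resolveInteraction` and its error message are hypothetical and do not describe how the SDK reports an unknown type.

```typescript
// Mirrors one entry of connections.interactions in rec_schema.json.
interface InteractionMapping {
  name: string
  table: string
  user_fk: string
  entity_fk: string
  timestamp_column: string
  base_weight: number
}

// Hypothetical helper: find the schema entry for an interaction type, or fail loudly.
function resolveInteraction(interactions: InteractionMapping[], type: string): InteractionMapping {
  const match = interactions.find((interaction) => interaction.name === type)
  if (!match) {
    const known = interactions.map((interaction) => interaction.name).join(', ')
    throw new Error(`Unknown interaction type "${type}". Known types: ${known}`)
  }
  return match
}

// The two interactions from the minimal schema.
const exampleInteractions: InteractionMapping[] = [
  { name: 'purchase', table: 'orders', user_fk: 'user_id', entity_fk: 'product_id', timestamp_column: 'created_at', base_weight: 1.0 },
  { name: 'view', table: 'product_views', user_fk: 'user_id', entity_fk: 'product_id', timestamp_column: 'viewed_at', base_weight: 0.3 },
]

const viewWeight = resolveInteraction(exampleInteractions, 'view').base_weight // 0.3
```

A check like this in your own code, run before calling `logInteraction`, catches typos such as `'purchases'` early.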

`rec.recommend(params)`

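Returns personalised recommendations for a user. It loads the pre-computed profile vector (one DB read), queries the vector DB with an HNSW nearest-neighbour search, and scores each result as α × similarity + β × log(1 + popularity). Here α is similarity_weight, β is popularity_weight, and popularity is the entity's interaction count, as set in the ranking configuration. Results are optionally re-ranked with MMR (Maximal Marginal Relevance) for diversity.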

Returns [] if the user has no profile yet.

const recs = await rec.recommend({
  userId: 'user_123',
  topK: 10,
  lambda: 0.7,       // 1.0 = pure relevance, 0.0 = pure diversity
  exploration: 0.05, // adds slight randomness to vary results between calls
  filters: {
    excludeEntityIds: ['product_123', 'product_456'],
  },
})
// → [{ entityId: 'product_789', score: 0.94 }, ...]

RecommendParams

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| userId | string | Yes | User to get recommendations for |
| topK | number | Yes | Number of recommendations to return |
| lambda | number | No | MMR trade-off: 1.0 = pure relevance, 0.0 = pure diversity (default: 1.0) |
| exploration | number | No | Random perturbation epsilon: 0 = deterministic, 0.1 = subtle variation (default: 0) |
| filters.excludeEntityIds | string[] | No | Entity IDs to exclude (e.g. already seen/purchased) |
| filters.namespace | string | No | Restrict to a specific vector DB namespace |
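To show what the `lambda` parameter trades off, here is a sketch of the textbook greedy MMR (Maximal Marginal Relevance) procedure. It is the general technique, not a copy of the SDK's implementation: `Candidate`, `cosine`, and `mmrRerank` are names chosen for this example, and redundancy is floored at 0 for simplicity.

```typescript
interface Candidate {
  entityId: string
  score: number    // relevance score from the ranking step
  vector: number[] // entity embedding, used for candidate-to-candidate similarity
}

function cosine(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB)
}

// Greedy MMR: repeatedly pick the candidate that maximises
// lambda * relevance - (1 - lambda) * (max similarity to anything already picked).
function mmrRerank(candidates: Candidate[], topK: number, lambda: number): Candidate[] {
  const remaining = [...candidates]
  const selected: Candidate[] = []
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0
    let bestValue = -Infinity
    for (let i = 0; i < remaining.length; i++) {
      const candidate = remaining[i]
      if (candidate === undefined) continue
      let redundancy = 0
      for (const picked of selected) {
        redundancy = Math.max(redundancy, cosine(candidate.vector, picked.vector))
      }
      const value = lambda * candidate.score - (1 - lambda) * redundancy
      if (value > bestValue) {
        bestValue = value
        bestIndex = i
      }
    }
    selected.push(...remaining.splice(bestIndex, 1))
  }
  return selected
}
```

With `lambda: 1.0` the redundancy term vanishes and the output is plain relevance order. Lower values push near-duplicates of already selected items further down the list.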

Recommendation

interface Recommendation {
  entityId: string;
  score: number;        // composite: similarity + popularity bonus
  explanation?: string;
}
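The `exploration` parameter of `rec.recommend()` is described as a random perturbation epsilon. One plausible reading, sketched below, adds uniform noise in the range [-epsilon, +epsilon] to each score before sorting. The noise distribution and the `applyExploration` helper are assumptions for illustration; the README does not specify how the SDK draws its noise.

```typescript
interface ScoredEntity {
  entityId: string
  score: number
}

// Assumption: uniform noise in [-epsilon, +epsilon] added to every score.
// The random source is injectable so the behaviour can be tested deterministically.
function applyExploration(
  items: ScoredEntity[],
  epsilon: number,
  random: () => number = Math.random,
): ScoredEntity[] {
  return items
    .map((item) => ({
      entityId: item.entityId,
      score: item.score + (random() * 2 - 1) * epsilon,
    }))
    .sort((a, b) => b.score - a.score)
}
```

With `epsilon` set to 0 the noise term is zero and the result is simply the input sorted by score, which matches the "0 = deterministic" description in the parameter table.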

`rec.upsertEntity(params)`

Embeds an entity's features and stores the vector in the vector DB. Call this whenever a new entity is created or its features change.

await rec.upsertEntity({
  entityId: 'product_789',
  features: {
    category: 'electronics',
    description: 'Wireless noise-cancelling headphones',
    tags: ['audio', 'wireless', 'noise-cancelling'],
    price: 29900,
  },
})

The entity's features are concatenated into a text string, embedded via the configured model, and upserted to the vector DB. Your entity's row in the database is not modified.

`rec.syncEntities(params?)`

Batch-embeds all entities in your database. Paginates through the entity table, skips IDs already present in the vector DB (unless force: true), and calls upsertEntity on each new entity.

await rec.syncEntities({
  batchSize: 100,
  concurrency: 5,
  force: false,
  onProgress: (done, total) => {
    process.stdout.write(`\rSyncing: ${done}/${total}`)
  },
})

SyncEntitiesParams

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| batchSize | number | 100 | Rows per page for entity table pagination |
| concurrency | number | 5 | Number of pages processed in parallel |
| force | boolean | false | Re-embed entities already present in the vector DB |
| onProgress | (processed, total) => void | none | Progress callback called after each page |

Throws DbAdapterError if no rawDb (Knex instance) is available.
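The paginate-then-skip behaviour of `syncEntities` can be pictured with a small pure function. This is a sketch under assumptions: `pagesToEmbed` is a hypothetical helper that works on in-memory ID lists, whereas the SDK pages through the entity table and checks the vector DB itself.

```typescript
// Split entity IDs into pages of `batchSize` and, unless `force` is set,
// drop the IDs that already have a stored vector.
function pagesToEmbed(
  entityIds: string[],
  alreadyEmbedded: Set<string>,
  batchSize: number,
  force: boolean,
): string[][] {
  if (batchSize <= 0) throw new Error('batchSize must be positive')
  const pages: string[][] = []
  for (let offset = 0; offset < entityIds.length; offset += batchSize) {
    const page = entityIds.slice(offset, offset + batchSize)
    pages.push(force ? page : page.filter((id) => !alreadyEmbedded.has(id)))
  }
  return pages
}
```

For five entities with `batchSize: 2` and two of them already embedded, only three IDs remain to be embedded; passing `force: true` keeps all five.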

`rec.updateUserProfile(userId)`

Manually triggers a profile recomputation for a specific user. Fetches the user's last 100 interactions, retrieves their stored entity vectors from the vector DB, computes a weighted time-decayed average, and upserts the result to rec_user_profiles.

await rec.updateUserProfile('user_123')

This is the same operation that logInteraction triggers automatically at batch threshold. Use it to force an immediate update, e.g. after importing historical data.

Why fetchByIds instead of re-embedding: Entity vectors are embedded once at upsertEntity() time and stored permanently in the vector DB. Profile computation reuses those exact stored vectors. Re-embedding from SQL features would produce subtly different values due to embedding API non-determinism, placing the profile in a shifted coordinate space relative to the entity vectors it is compared against.
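The weighted time-decayed average described above can be sketched as follows. This is an illustration, not the SDK's source: `StoredInteraction` and `computeProfile` are hypothetical names, the per-interaction weight is assumed to be `base_weight × time_decay_factor^ageInDays`, and no vector normalisation is applied because the README does not mention one.

```typescript
interface StoredInteraction {
  entityId: string
  baseWeight: number // base_weight of the interaction type in the schema
  timestamp: Date
}

const MS_PER_DAY = 86_400_000

// Returns the weighted average of the stored entity vectors, or null when
// none of the interacted entities has a vector to reuse.
function computeProfile(
  interactions: StoredInteraction[],
  vectorsById: Record<string, number[] | undefined>,
  now: Date,
  timeDecayFactor: number = 0.95,
): number[] | null {
  let sum: number[] | null = null
  let totalWeight = 0
  for (const interaction of interactions) {
    const vector = vectorsById[interaction.entityId]
    if (!vector) continue // entity was never embedded; skip it
    const ageDays = Math.max(0, (now.getTime() - interaction.timestamp.getTime()) / MS_PER_DAY)
    const weight = interaction.baseWeight * Math.pow(timeDecayFactor, ageDays)
    if (sum === null) sum = vector.map(() => 0)
    for (let d = 0; d < vector.length; d++) {
      sum[d] = (sum[d] ?? 0) + weight * (vector[d] ?? 0)
    }
    totalWeight += weight
  }
  if (sum === null || totalWeight === 0) return null
  return sum.map((component) => component / totalWeight)
}
```

Here `vectorsById` stands in for the result of fetching stored vectors by ID, so the profile lives in the same coordinate space as the entity vectors it is later compared against.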

`rec.bootstrapProfiles(params?)`

Initialises profiles for all users who have interaction history in the database. Collects distinct user IDs from every interaction table defined in the schema, filters out users who already have a profile (unless force: true), then calls updateUserProfile() for each in parallel chunks.

await rec.bootstrapProfiles({
  concurrency: 5,
  force: false,
  onProgress: (done, total) => {
    process.stdout.write(`\rProfiles: ${done}/${total}`)
  },
})

BootstrapProfilesParams

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| concurrency | number | 5 | Parallel updateUserProfile calls per chunk |
| force | boolean | false | Re-process users who already have a profile |
| onProgress | (processed, total) => void | none | Progress callback called after each chunk |
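The "parallel chunks" pattern behind `concurrency` and `onProgress` looks roughly like the sketch below. `processInChunks` is a hypothetical helper written for this example; it shows the general chunk-then-`Promise.all` approach and is not taken from the SDK.

```typescript
// Run `worker` over `items`, at most `concurrency` at a time, reporting
// progress after each chunk completes.
async function processInChunks<T>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<void>,
  onProgress?: (processed: number, total: number) => void,
): Promise<void> {
  if (concurrency <= 0) throw new Error('concurrency must be positive')
  for (let start = 0; start < items.length; start += concurrency) {
    const chunk = items.slice(start, start + concurrency)
    // All calls in a chunk run in parallel; the next chunk waits for them.
    await Promise.all(chunk.map((item) => worker(item)))
    if (onProgress) onProgress(Math.min(start + concurrency, items.length), items.length)
  }
}
```

One consequence of chunking this way is that a single slow call holds up its whole chunk, so very high `concurrency` values mainly help when per-user update times are similar.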

`rec.destroy()`

Closes the internal Knex connection pool. Call on application shutdown to allow the process to exit cleanly.

process.on('SIGTERM', async () => {
  await rec.destroy()
  process.exit(0)
})

Profile Update Strategies

`full_recompute` (default)

Best for accuracy. Re-derives the profile from the full interaction history on every update.

Write path:

1. logInteraction() increments a counter in rec_user_profiles
2. When the counter hits the configured batch_thresholds for that interaction type (or time_fallback_hours expires) → updateUserProfile() fires asynchronously
3. updateUserProfile() fetches the last 100 interactions → retrieves their stored vectors via vectorDb.fetchByIds() → computes a weighted average with time decay → upserts the profile

Read path: recommend() loads the pre-computed profile row (1 DB read) → nearest-neighbour HNSW query.

`incremental`

Best for high-frequency interaction pipelines. The write path is O(D), where D is the embedding dimension: no history reads, no batch accumulation.

Write path:

- logInteraction() fetches the single interacted entity's stored vector from the vector DB → nudges the existing profile vector using a weighted moving average → immediately upserts the updated profile
- Time decay is NOT applied on the hot path. Schedule periodic updateUserProfile() calls (e.g. nightly) to re-apply decay and correct drift over time
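A weighted moving average of this kind can be written in a few lines. The sketch below is one possible form, assuming the running total lives in the `accumulated_weight` column of rec_user_profiles; `ProfileRow` and `nudgeProfile` are hypothetical names and the SDK's exact update rule may differ.

```typescript
interface ProfileRow {
  embedding: number[]
  accumulatedWeight: number
}

// new_profile = (old_profile * W + entity_vector * w) / (W + w)
// where W is the weight accumulated so far and w is this interaction's weight.
function nudgeProfile(
  current: ProfileRow | null,
  entityVector: number[],
  interactionWeight: number,
): ProfileRow {
  if (current === null || current.accumulatedWeight === 0) {
    // First interaction: the profile starts as a copy of the entity vector.
    return { embedding: [...entityVector], accumulatedWeight: interactionWeight }
  }
  const previousWeight = current.accumulatedWeight
  const total = previousWeight + interactionWeight
  const embedding = current.embedding.map(
    (value, d) => (value * previousWeight + (entityVector[d] ?? 0) * interactionWeight) / total,
  )
  return { embedding, accumulatedWeight: total }
}
```

Each update touches every dimension once and reads no history, which is where the O(D) cost comes from. Because old contributions are never down-weighted here, a periodic full recompute is what brings time decay back in.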

Read path: same as full_recompute.

CLI Commands

All CLI commands are available as npx recvector <command> or recvector <command> if installed globally.

`recvector studio`

Starts the visual schema editor — a React Flow drag-and-drop UI that introspects your database, lets you map tables to RecVector's logical model, and saves rec_schema.json to disk.

npx recvector studio
npx recvector studio --port 3000
npx recvector studio --no-open   # suppress automatic browser launch

Opens at http://localhost:4242 by default.

`recvector bootstrap`

One-time migration command that populates the vector DB with entity embeddings and builds initial user profiles from existing interaction history. Shows live progress for each step.

npx recvector bootstrap
npx recvector bootstrap --entities-only    # skip profile bootstrap
npx recvector bootstrap --profiles-only    # skip entity sync
npx recvector bootstrap --force            # re-process already synced records
npx recvector bootstrap --concurrency 10
npx recvector bootstrap --batch-size 200

| Option | Default | Description |
|--------|---------|-------------|
| --entities-only | none | Only sync entities, skip profile bootstrap |
| --profiles-only | none | Only build profiles, skip entity sync |
| --force | false | Re-process records that are already synced/profiled |
| --concurrency <n> | 5 | Parallel workers |
| --batch-size <n> | 100 | Rows per page for entity sync |

`recvector reset`

Wipes RecVector data — useful before a re-bootstrap with a different embedding model.

npx recvector reset --entities              # delete all vectors from the vector DB
npx recvector reset --profiles              # truncate rec_user_profiles
npx recvector reset --all                   # both entities and profiles
npx recvector reset --all --yes             # skip confirmation prompt
npx recvector reset --entities --namespace my-ns  # reset a specific namespace only

SDK-Managed Tables

RecVector automatically creates two tables in your existing database on the first createRecEngine() call. No other tables are touched.

`rec_user_profiles`

| Column | Type | Description |
|--------|------|-------------|
| user_id | TEXT PRIMARY KEY | Foreign key to your users table |
| embedding | TEXT | JSON-serialised profile vector |
| last_updated | DATETIME | Timestamp of last profile recomputation |
| version | TEXT | Schema version at time of last update |
| interaction_count_since_update | INTEGER | Counter used by batch threshold logic |
| accumulated_weight | REAL | Total interaction weight accumulated (incremental mode) |

`rec_entity_stats`

| Column | Type | Description |
|--------|------|-------------|
| entity_id | TEXT PRIMARY KEY | Foreign key to your entity table |
| feedback_counts | TEXT | JSON object of { interactionType: count } |
| version | TEXT | Schema version |

Typed Errors

All SDK errors extend RecVectorError (which extends Error) and include the original cause when available.

| Class | When thrown |
|-------|-------------|
| RecVectorError | Base class; never thrown directly |
| SchemaValidationError | Invalid or missing rec_schema.json |
| DbAdapterError | SQL query failure, missing rawDb, table not found |
| VectorDbError | Chroma API error, collection creation failure |
| EmbeddingError | Embedding API failure, missing API key |

import { DbAdapterError, VectorDbError, EmbeddingError } from '@recvector/sdk'

try {
  await rec.upsertEntity({ entityId: 'p1', features: { /* ... */ } })
} catch (err) {
  if (err instanceof EmbeddingError) {
    // embedding API issue — retry or log
  } else if (err instanceof VectorDbError) {
    // Chroma connection issue
  }
}
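For the "retry or log" branch, a retry wrapper with exponential backoff is one common option. The sketch below is self-contained so it can run without the package installed: the local `EmbeddingError` class is a stand-in for the one exported by the SDK, and `withRetry` is a hypothetical helper, not an SDK function.

```typescript
// Stand-in for the SDK's EmbeddingError; in real code, import it from '@recvector/sdk'.
class EmbeddingError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'EmbeddingError'
    Object.setPrototypeOf(this, new.target.prototype) // keeps instanceof reliable on older targets
  }
}

// Retry `operation` up to `maxAttempts` times, doubling the delay each time.
// Only EmbeddingError is treated as transient; any other error is rethrown at once.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation()
    } catch (err) {
      if (!(err instanceof EmbeddingError)) throw err
      lastError = err
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1)
        await new Promise<void>((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  throw lastError
}
```

A call such as `withRetry(() => rec.upsertEntity(params), 3, 200)` would then retry embedding failures while letting database or vector DB errors surface immediately. Whether a given embedding failure is worth retrying (for example a rate limit versus a missing API key) is something this sketch does not distinguish.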

CommonJS (CJS) Support

The SDK ships both ESM (dist/) and CommonJS (dist/cjs/) builds. CJS is the fallback for frameworks like NestJS that use module: CommonJS.

// ESM (default)
import { createRecEngine } from '@recvector/sdk'

// CommonJS
const { createRecEngine } = require('@recvector/sdk')

If your framework resolves modules via webpack (e.g. @nestjs/cli), add this to your webpack.config.js to ensure the correct build is selected:

module.exports = (options) => ({
  ...options,
  resolve: {
    ...options.resolve,
    conditionNames: ['require', 'node', 'default'],
  },
})

Performance Targets

| Operation | Target |
|-----------|--------|
| Recommendation (p95) | < 100ms at 1M entities with pre-computed profile |
| Entity embedding | < 500ms per entity |
| Batch sync throughput | > 1,000 entities/min |
| Profile update | < 500ms |
| Storage per profile | ~6KB (6GB at 1M users) |
| Tested scale | 1–5M entities on a single Chroma node |

License

MIT
