VectorSync
Real-time RAG Synchronization & Retrieval for MongoDB + Pinecone
VectorSync solves the problem of keeping your separate Vector Database (Pinecone) in sync with your primary application database (MongoDB). Instead of writing manual hooks or cron jobs, VectorSync attaches to MongoDB Change Streams to automatically reflect insert, update, and delete operations in real-time.
It also provides a built-in Retrieval / RAG Engine to chat with your data immediately.
Features
- Real-time Sync: Reacts instantly to data changes.
- Multi-Provider: Support for OpenAI, Google Gemini, and Ollama (Local).
- Smart Updates: Only regenerates embeddings if specific fields change (saves money).
- RAG Engine: Built-in VectorRetrieval class for context-aware chat, history, and citations.
- Local-First: Full offline support using Ollama for both embeddings and LLM.
Installation
npm install vectorsync
Prerequisites
- MongoDB: Must be a Replica Set (required for Change Streams). Atlas has this by default.
- Pinecone: An index created (dimension must match your model, e.g., 1536 for OpenAI, 768 for Gemini).
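For reference, a matching serverless index can be created with the official @pinecone-database/pinecone client (separate from VectorSync itself); the index name, cloud, and region below are placeholders:

import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

// Dimension must match your embedding model (1536 for text-embedding-3-small)
await pc.createIndex({
  name: 'my-index',
  dimension: 1536,
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
});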
Quick Start
1. The Sync Engine (Background Process)
This code acts as the listener. Run it when your server starts.
import { VectorSync } from 'vectorsync';
// 1. Initialize
const syncer = new VectorSync({
mongoUri: process.env.MONGO_URI!,
// Optional: dbName: 'my_db',
vectorDb: {
type: 'pinecone',
options: { apiKey: process.env.PINECONE_API_KEY!, index: 'my-index' }
},
embeddingProvider: {
type: 'openai', // or 'gemini', 'ollama'
options: { apiKey: process.env.OPENAI_API_KEY!, model: 'text-embedding-3-small' }
}
});
// 2. Start Watching Collections
// Only syncs when 'name' or 'description' changes
await syncer.createContext('products', {
fields: ['name', 'description']
});
console.log('VectorSync is processing changes...');
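To illustrate the field-level filter above: with the plain MongoDB driver, an update that touches only unwatched fields is ignored, while a change to 'name' or 'description' triggers a fresh embedding. The 'sku' field and values below are placeholders:

import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGO_URI!);
await client.connect();
const products = client.db().collection('products');

// Only 'price' changes: not in the watched fields, so no embedding is regenerated
await products.updateOne({ sku: 'CHAIR-01' }, { $set: { price: 199 } });

// 'description' changes: picked up by the change stream and re-embedded
await products.updateOne(
  { sku: 'CHAIR-01' },
  { $set: { description: 'Ergonomic mesh chair with lumbar support' } }
);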
2. The RAG Engine (Chat API)
Use this within your API routes to query the synced data.
import { VectorRetrieval } from 'vectorsync';
// Reuse the 'syncer' config, or create new instances of adapters
const retrieval = new VectorRetrieval(syncer.vectorDb, syncer.embeddingProvider);
// A. Stateful Chat (Maintains History)
const sessionId = retrieval.createSession({
systemPrompt: "You are a shopping assistant.",
contextSources: [{ collectionName: 'products', fields: ['name', 'description'] }],
model: 'gpt-4o',
provider: 'openai'
});
const response = await retrieval.query(sessionId, "Do you have ergonomic chairs?");
console.log(response.response); // "Yes, we have..."
console.log(response.retrievedDocuments); // [{ id: '...', score: 0.89, ... }]
// B. Stateless Query (One-off)
const result = await retrieval.queryOnce({
systemPrompt: "Answer based on context",
contextSources: [{ collectionName: 'products', fields: ['name'] }],
model: 'llama3.2',
provider: 'ollama'
}, "Tell me about product X", { debug: true });Supported Providers
Supported Providers
| Provider | Type | Config Type | Default Model |
| :--- | :--- | :--- | :--- |
| OpenAI | Embedding / LLM | 'openai' | text-embedding-3-small |
| Google Gemini | Embedding / LLM | 'gemini' | text-embedding-004 |
| Ollama (Local) | Embedding / LLM | 'ollama' | nomic-embed-text |
| Pinecone | Vector DB | 'pinecone' | - |
Configuration Reference
VectorSyncConfig
{
mongoUri: string;
dbName?: string;
vectorDb: {
type: 'pinecone' | 'custom';
options: { apiKey: string; index: string; };
};
embeddingProvider: {
type: 'openai' | 'gemini' | 'ollama' | 'custom';
options: {
apiKey?: string; // Not needed for Ollama
baseUrl?: string; // Mainly for Ollama (local server URL)
model?: string; // Optional override
};
};
}
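For a fully local embedding setup (see Local-First above), the same config can point the embedding provider at Ollama. The baseUrl below is Ollama's default local address and the index name is a placeholder; remember the index dimension must match the model (nomic-embed-text produces 768-dimensional vectors):

import { VectorSync } from 'vectorsync';

const localSyncer = new VectorSync({
  mongoUri: process.env.MONGO_URI!,
  vectorDb: {
    type: 'pinecone',
    options: { apiKey: process.env.PINECONE_API_KEY!, index: 'my-local-index' } // 768-dim index
  },
  embeddingProvider: {
    type: 'ollama',
    options: { baseUrl: 'http://localhost:11434', model: 'nomic-embed-text' } // no apiKey needed
  }
});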
Environment Variables
Typical setup in .env:
MONGO_URI=mongodb://localhost:27017/mydb?replSet=rs0
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AIza...
PINECONE_API_KEY=pc-...
License
ISC
