@memberjunction/ai-vector-sync
v5.11.0
MemberJunction: AI Vector/Entity Sync Package - handles synchronization between MemberJunction entities and vector databases
@memberjunction/ai-vector-sync
Synchronizes MemberJunction entity records with vector databases by transforming records into embeddings through a template-based pipeline. Handles batch processing, worker-based parallelism, Entity Document management, and Entity Record Document tracking.
Architecture
graph TD
subgraph SyncPkg["@memberjunction/ai-vector-sync"]
EVS["EntityVectorSyncer"]
EDC["EntityDocumentCache"]
EDTP["EntityDocumentTemplateParser"]
BW["BatchWorker"]
end
subgraph Pipeline["Vectorization Pipeline"]
FETCH["Fetch Records<br/>(batched)"] --> TEMPL["Parse Templates<br/>(text from fields)"]
TEMPL --> EMBED["Generate Embeddings<br/>(AI model)"]
EMBED --> UPSERT["Upsert to<br/>Vector DB"]
UPSERT --> TRACK["Create Entity<br/>Record Documents"]
end
subgraph MJEntities["MemberJunction Entities"]
ED["Entity Documents"]
EDT["Entity Document Types"]
ERD["Entity Record Documents"]
VDI["Vector Indexes"]
end
subgraph External["External Services"]
AI["Embedding Model<br/>(OpenAI, Mistral, etc.)"]
VDB["Vector Database<br/>(Pinecone, etc.)"]
end
EVS --> EDC
EVS --> EDTP
EVS --> BW
EDTP --> TEMPL
BW --> EMBED
BW --> UPSERT
BW --> TRACK
EVS --> ED
EVS --> ERD
BW --> AI
BW --> VDB
style SyncPkg fill:#2d6a9f,stroke:#1a4971,color:#fff
style Pipeline fill:#2d8659,stroke:#1a5c3a,color:#fff
style MJEntities fill:#b8762f,stroke:#8a5722,color:#fff
style External fill:#7c5295,stroke:#563a6b,color:#fff
Installation
npm install @memberjunction/ai-vector-sync
Overview
This package converts MemberJunction entity records into vector embeddings stored in a vector database. The process is driven by Entity Documents -- metadata records that define which entity to vectorize, how to generate text from it (via templates), which embedding model to use, and where to store the results.
Key capabilities:
- Batch processing with configurable sizes for fetching, embedding, and upserting
- Template-based text generation using Entity Document templates that reference entity fields
- Worker architecture for concurrent embedding and upsert operations
- Entity Document caching via a singleton cache to avoid repeated database lookups
- Default Entity Document creation for entities that lack one
- Resume support via StartingOffset for interrupted processes
- Entity Record Document tracking to record which records have been vectorized
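The stages listed above (template text, embedding, upsert, tracking) can be sketched for a single batch as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `vectorizeBatch`, `toText`, `embed`, `vectorStore`, and `tracked` are hypothetical in-memory stand-ins, and the "embedding" is just the text length.

```typescript
// Hypothetical in-memory stand-ins for each pipeline stage; not the package's API.
type Row = { ID: string; Name: string };
type SketchVector = { id: string; values: number[] };

const vectorStore: Record<string, number[]> = {}; // stands in for the vector database
const tracked: string[] = [];                     // stands in for Entity Record Documents

// Template step: turn a record into text.
const toText = (row: Row): string => `Name: ${row.Name}`;
// Fake embedding: a one-element vector holding the text length.
const embed = (text: string): number[] => [text.length];

function vectorizeBatch(rows: Row[]): void {
  const vectors: SketchVector[] = rows.map((row) => ({
    id: row.ID,
    values: embed(toText(row)),
  }));
  // Upsert: writing the same id again overwrites the earlier vector.
  for (const v of vectors) vectorStore[v.id] = v.values;
  // Tracking: remember which records were vectorized.
  for (const row of rows) tracked.push(row.ID);
}

vectorizeBatch([
  { ID: '1', Name: 'Jane' },
  { ID: '2', Name: 'Acme Corp' },
]);
console.log(vectorStore, tracked);
```

In the real pipeline each stage calls an external service, so the stages are asynchronous and batched separately; the sketch only shows the order of operations.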
Vectorization Flow
sequenceDiagram
participant Caller
participant EVS as EntityVectorSyncer
participant Cache as EntityDocumentCache
participant Parser as TemplateParser
participant Worker as BatchWorker
participant Model as Embedding Model
participant VDB as Vector Database
participant DB as MJ Database
Caller->>EVS: VectorizeEntity(params, user)
EVS->>EVS: Config(forceRefresh, user)
EVS->>Cache: Refresh (loads Entity Documents)
EVS->>Cache: GetDocument(entityDocumentID)
Cache-->>EVS: EntityDocumentEntity
EVS->>DB: Load template for Entity Document
EVS->>DB: Fetch entity records (batch)
loop For each batch
EVS->>Parser: Parse template for each record
Parser-->>EVS: Text strings
EVS->>Worker: VectorizeTemplates batch
Worker->>Model: createBatchEmbedding(texts)
Model-->>Worker: Embedding vectors
EVS->>Worker: UpsertVectors batch
Worker->>VDB: createRecords(vectors)
VDB-->>Worker: Success/failure
EVS->>Worker: Create EntityRecordDocuments
Worker->>DB: Save tracking records
end
EVS-->>Caller: VectorizeEntityResponse
Core Components
EntityVectorSyncer
The main class that orchestrates the entire vectorization process. Extends VectorBase from @memberjunction/ai-vectors.
Key methods:
| Method | Description |
|---|---|
| Config(forceRefresh, contextUser) | Initializes engines and caches; must be called before vectorization |
| VectorizeEntity(params, contextUser) | Runs the full vectorization pipeline for an entity |
| GetEntityDocument(id) | Retrieves an Entity Document by ID |
| GetEntityDocumentByName(name, user) | Retrieves an Entity Document by name |
| GetActiveEntityDocuments(entityNames?) | Gets all active Entity Documents, optionally filtered |
| CreateDefaultEntityDocument(entityID, vectorDB, aiModel) | Creates a default Entity Document when one does not exist |
EntityDocumentCache
A singleton cache that loads all Entity Document and Entity Document Type records into memory for fast lookup.
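The load-once singleton pattern described here can be sketched as below. This is a simplified, hypothetical stand-in: `DocCache`, `CachedDoc`, and the loader callback are assumptions made so the example is self-contained. Per the method list in this README, the real `Refresh` takes a user rather than a loader function.

```typescript
// Hypothetical, simplified stand-in for a load-once singleton cache.
type CachedDoc = { ID: string; Name: string };

class DocCache {
  private static instance: DocCache | null = null;
  private docs: Record<string, CachedDoc> = {};
  private loaded = false;

  // Private constructor: callers must go through DocCache.Instance.
  private constructor() {}

  static get Instance(): DocCache {
    if (!DocCache.instance) DocCache.instance = new DocCache();
    return DocCache.instance;
  }

  get IsLoaded(): boolean {
    return this.loaded;
  }

  // Loads documents once; later calls are no-ops unless forceRefresh is true.
  Refresh(forceRefresh: boolean, load: () => CachedDoc[]): void {
    if (this.loaded && !forceRefresh) return;
    this.docs = {};
    for (const doc of load()) this.docs[doc.ID] = doc;
    this.loaded = true;
  }

  GetDocument(id: string): CachedDoc | null {
    return this.docs[id] || null;
  }
}

let loadCount = 0;
const loadDocs = (): CachedDoc[] => {
  loadCount++; // counts simulated database round trips
  return [{ ID: 'doc-1', Name: 'Contacts Vectorization' }];
};

DocCache.Instance.Refresh(false, loadDocs);
DocCache.Instance.Refresh(false, loadDocs); // no-op: already loaded
console.log(loadCount); // 1
```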
classDiagram
class EntityDocumentCache {
-_instance : EntityDocumentCache
-_cache : Record~string, EntityDocumentEntity~
-_typeCache : Record~string, EntityDocumentTypeEntity~
+Instance : EntityDocumentCache
+IsLoaded : boolean
+GetDocument(id) EntityDocumentEntity
+GetDocumentByName(name) EntityDocumentEntity
+GetDocumentType(id) EntityDocumentTypeEntity
+GetDocumentTypeByName(name) EntityDocumentTypeEntity
+GetFirstActiveDocumentForEntityByID(entityID) EntityDocumentEntity
+GetFirstActiveDocumentForEntityByName(name) EntityDocumentEntity
+Refresh(forceRefresh, user) void
+SetCurrentUser(user) void
}
style EntityDocumentCache fill:#2d6a9f,stroke:#1a4971,color:#fff
EntityDocumentTemplateParser
Converts entity records into text strings by evaluating Entity Document templates. Templates use ${FieldName} syntax to reference entity field values.
// Template example: "${FirstName} ${LastName} works at ${Company} as ${Title}"
// With record { FirstName: 'Jane', LastName: 'Doe', Company: 'Acme', Title: 'Engineer' }
// Result: "Jane Doe works at Acme as Engineer"
BatchWorker
Handles the parallel execution of embedding generation, vector database upserts, and Entity Record Document creation. Configurable batch sizes allow tuning for memory and API rate limits.
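To illustrate how batch size and concurrency interact, here is a minimal sketch that splits texts into batches and embeds the batches concurrently. The names `chunk`, `embedBatch`, and `embedAll` are hypothetical, and the embedding call is faked. The real BatchWorker API may differ.

```typescript
// Hypothetical stand-ins; the real BatchWorker API may differ.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Fake embedding call: one tiny vector per text, standing in for a model request.
async function embedBatch(texts: string[]): Promise<number[][]> {
  return texts.map((t) => [t.length]);
}

// Embed all batches concurrently, then flatten the results in input order.
async function embedAll(texts: string[], batchSize: number): Promise<number[][]> {
  const perBatch = await Promise.all(
    chunk(texts, batchSize).map((batch) => embedBatch(batch))
  );
  return ([] as number[][]).concat(...perBatch);
}

embedAll(['a', 'bb', 'ccc'], 2).then((vectors) => console.log(vectors));
```

`Promise.all` starts every batch at once, so a smaller batch size means more simultaneous requests. Against a rate-limited embedding API you would likely cap the number of in-flight batches as well.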
Usage
Basic Vectorization
import { EntityVectorSyncer } from '@memberjunction/ai-vector-sync';
import { UserInfo } from '@memberjunction/core';

// contextUser: a UserInfo for the acting user, obtained from your application (not shown)
const syncer = new EntityVectorSyncer();

// Initialize (required once)
await syncer.Config(false, contextUser);

// Vectorize all records for an entity
await syncer.VectorizeEntity({
  entityID: 'entity-uuid',
  entityDocumentID: 'doc-uuid',
  listBatchCount: 50,
  VectorizeBatchCount: 50,
  UpsertBatchCount: 50
}, contextUser);
Vectorize a Specific List
await syncer.VectorizeEntity({
  entityID: 'entity-uuid',
  entityDocumentID: 'doc-uuid',
  listID: 'list-uuid' // Only records in this list
}, contextUser);
Resume Interrupted Processing
await syncer.VectorizeEntity({
  entityID: 'entity-uuid',
  entityDocumentID: 'doc-uuid',
  StartingOffset: 5000 // Skip first 5000 records
}, contextUser);
Manage Entity Documents
// Look up by name
const doc = await syncer.GetEntityDocumentByName('Contacts Vectorization', contextUser);

// Get all active documents
const activeDocs = await syncer.GetActiveEntityDocuments();

// Get active documents for specific entities only
const filtered = await syncer.GetActiveEntityDocuments(['Contacts', 'Companies']);

// Create a default document when none exists
const newDoc = await syncer.CreateDefaultEntityDocument(
  entityID, vectorDatabase, aiModel
);
Configuration Types
VectorizeEntityParams
type VectorizeEntityParams = {
  entityID: string;             // Entity to vectorize
  entityDocumentID?: string;    // Entity Document configuration
  listID?: string;              // Optional: vectorize only this list
  listBatchCount?: number;      // Records per fetch batch (default: 50)
  VectorizeBatchCount?: number; // Embedding batch size (default: 50)
  UpsertBatchCount?: number;    // DB upsert batch size (default: 50)
  StartingOffset?: number;      // Skip records for resume
  CurrentUser?: UserInfo;       // User context
};
EntitySyncConfig
type EntitySyncConfig = {
  EntityDocumentID: string;
  Interval: number; // Seconds between syncs
  RunViewParams: RunViewParams;
  IncludeInSync: boolean;
  LastRunDate: string;
  VectorIndexID: number;
  VectorID: number;
};
Entity Document Templates
Templates define how entity records are transformed into text for embedding generation.
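The `${FieldName}` substitution can be sketched as a small function. This is an illustrative stand-in, not the package's EntityDocumentTemplateParser: the `renderTemplate` name is hypothetical, and rendering missing or null fields as an empty string is an assumption.

```typescript
// Hypothetical helper, not part of @memberjunction/ai-vector-sync.
type RecordFields = Record<string, unknown>;

function renderTemplate(template: string, record: RecordFields): string {
  // Replace each ${FieldName} placeholder with the matching field value.
  return template.replace(/\$\{(\w+)\}/g, (_match: string, field: string) => {
    const value = record[field];
    // Assumption: missing or null fields render as an empty string.
    return value === undefined || value === null ? '' : String(value);
  });
}

// Single-quoted so that ${...} stays literal text rather than a JS template literal.
const text = renderTemplate(
  '${FirstName} ${LastName} works at ${Company} as ${Title}',
  { FirstName: 'Jane', LastName: 'Doe', Company: 'Acme', Title: 'Engineer' }
);
console.log(text); // "Jane Doe works at Acme as Engineer"
```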
graph LR
ED["Entity Document"] --> TMPL["Template<br/>${Field} syntax"]
TMPL --> PARSER["Template Parser"]
REC["Entity Record"] --> PARSER
PARSER --> TEXT["Plain Text"]
TEXT --> EMBED["Embedding Model"]
EMBED --> VEC["Vector"]
style ED fill:#2d6a9f,stroke:#1a4971,color:#fff
style TMPL fill:#2d8659,stroke:#1a5c3a,color:#fff
style PARSER fill:#b8762f,stroke:#8a5722,color:#fff
style EMBED fill:#7c5295,stroke:#563a6b,color:#fff
style REC fill:#2d8659,stroke:#1a5c3a,color:#fff
style TEXT fill:#b8762f,stroke:#8a5722,color:#fff
style VEC fill:#7c5295,stroke:#563a6b,color:#fff
Environment Variables
# Database
DB_HOST=your-sql-server
DB_PORT=1433
DB_USERNAME=your-username
DB_PASSWORD=your-password
DB_DATABASE=your-database
# AI Models
OPENAI_API_KEY=your-openai-key
MISTRAL_API_KEY=your-mistral-key
# Vector Database
PINECONE_API_KEY=your-pinecone-key
PINECONE_HOST=your-pinecone-host
PINECONE_DEFAULT_INDEX=your-default-index
# User Context
[email protected]
Dependencies
| Package | Purpose |
|---|---|
| @memberjunction/ai | BaseEmbeddings, GetAIAPIKey, EmbedTextsResult |
| @memberjunction/ai-vectordb | VectorDBBase, VectorRecord |
| @memberjunction/ai-vectors | VectorBase base class |
| @memberjunction/aiengine | AIEngine singleton |
| @memberjunction/core | Metadata, RunView, BaseEntity, UserInfo |
| @memberjunction/core-entities | Entity type definitions |
| @memberjunction/global | MJGlobal class factory |
| @memberjunction/templates | Template engine for text generation |
Performance Considerations
- Batch sizes: Adjust listBatchCount, VectorizeBatchCount, and UpsertBatchCount based on available memory and API rate limits
- Long-running: Full vectorization of large entities can take hours; use StartingOffset to resume
- Worker concurrency: The BatchWorker processes embedding and upsert operations concurrently within each batch
- Caching: EntityDocumentCache reduces database lookups for document metadata
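One way to get a value worth passing as StartingOffset is to record how many records finished before a failure. The sketch below shows that bookkeeping with hypothetical names (`processFrom`, `RunProgress`) and an in-memory record list. It assumes records come back in the same order on every run, which this README does not state.

```typescript
// Hypothetical sketch; these names are illustrative, not part of the package.
type RunProgress = { nextOffset: number; done: boolean };

function processFrom(
  records: string[],
  startingOffset: number,
  batchSize: number,
  handleBatch: (batch: string[]) => void
): RunProgress {
  if (batchSize <= 0) throw new Error('batchSize must be positive');
  let offset = startingOffset;
  while (offset < records.length) {
    const batch = records.slice(offset, offset + batchSize);
    try {
      handleBatch(batch);
    } catch {
      // Stop at the first failed batch; nextOffset is where a rerun should start.
      return { nextOffset: offset, done: false };
    }
    offset += batch.length;
  }
  return { nextOffset: offset, done: true };
}

const records = ['r0', 'r1', 'r2', 'r3', 'r4', 'r5'];

// First run: the handler fails when it reaches 'r4'.
const firstRun = processFrom(records, 0, 2, (batch) => {
  if (batch.indexOf('r4') !== -1) throw new Error('simulated outage');
});

// Second run: resume from the saved offset with a healthy handler.
const secondRun = processFrom(records, firstRun.nextOffset, 2, () => {});
console.log(firstRun, secondRun);
```

The failed batch is retried in full on resume, so the downstream write should tolerate repeats. The pipeline described in this README upserts rather than inserts.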
Development
# Build
npm run build
# Development mode
npm run start
License
ISC
