rag-system-pgvector

v2.4.9

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with dynamic embedding and model providers, structured data queries, and chat history - supports OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.

RAG System Package

A production-ready Retrieval-Augmented Generation (RAG) system package built with PostgreSQL pgvector, LangChain, and LangGraph. Supports multiple AI providers including OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.

🚀 Features

  • 📦 Easy Integration: Simple npm install and ready-to-use API
  • 🤖 Multi-Provider Support: OpenAI, Anthropic, HuggingFace, Azure, Google AI, Ollama
  • 📚 Multi-format Support: PDF, DOCX, TXT, HTML, Markdown, JSON
  • 🔍 Vector Search: High-performance similarity search with pgvector
  • 🎯 Structured Data Queries: Accept JSON data for precise, contextual responses
  • 💬 Chat History Support: Full conversation memory with summarization
  • ⚡ Production Ready: Error handling, connection pooling, monitoring
  • 🔧 Flexible Configuration: Choose your preferred embedding and LLM providers
  • 💾 Buffer Processing: Process documents directly from memory buffers
  • 🌐 URL Processing: Download and process documents from web URLs
  • 📊 Batch Operations: Efficient processing of multiple documents

📦 Installation

npm install rag-system-pgvector

# Choose your AI provider (one or more):
npm install @langchain/openai          # For OpenAI
npm install @langchain/anthropic       # For Anthropic Claude
npm install @langchain/azure-openai    # For Azure OpenAI
npm install @langchain/google-genai    # For Google AI
npm install @langchain/community       # For HuggingFace, Ollama, etc.

🚀 Quick Start

OpenAI Provider (Traditional)

import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

// Create provider instances
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatOpenAI({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'gpt-4',
  temperature: 0.7,
});

// Initialize RAG system
const rag = new RAGSystem({
  database: {
    host: 'localhost',
    database: 'your_db',
    username: 'postgres',
    password: 'your_password'
  },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});

await rag.initialize();

// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);

// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);

// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: { product: "iPhone", category: "smartphone" },
    constraints: ["Focus on latest features", "Include specifications"],
    responseFormat: "structured_list"
  }
});
console.log(structuredResult.answer);

Mixed Providers (Advanced)

import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatAnthropic({
  anthropicApiKey: 'your-anthropic-api-key',
  modelName: 'claude-3-haiku-20240307',
  temperature: 0.7,
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});

Local Models (Privacy-First)

import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});

const llm = new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2',
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});

Buffer Processing (New in v1.1.0)

import fs from 'fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process document from Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
    buffer, 
    'document.pdf', 
    'pdf',
    { source: 'api-upload', category: 'research' }
);

console.log(result.chunks); // Processed chunks with embeddings

URL Processing (New in v1.1.0)

import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process single URL
const result = await processor.processDocumentFromUrl(
    'https://example.com/document.pdf',
    { source: 'web-crawl', priority: 'high' }
);

// Process multiple URLs
const urls = [
    'https://example.com/doc1.pdf',
    'https://example.com/doc2.html',
    'https://example.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
    source: 'batch-import',
    maxConcurrent: 3
});

console.log(`Processed ${results.successful.length} documents`);

🎯 Structured Data Queries (New in v2.2.0)

The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.

Basic Structured Query

const result = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: {
      product: "iPhone",
      category: "smartphone",
      brand: "Apple"
    },
    constraints: [
      "Focus on latest model features",
      "Include technical specifications"
    ],
    context: {
      userType: "potential_buyer",
      priceRange: "premium"
    },
    responseFormat: "structured_list"
  }
});

Troubleshooting Query

const result = await rag.query("My device won't connect to WiFi", {
  structuredData: {
    intent: "troubleshooting",
    entities: {
      issue_type: "connectivity",
      device_category: "mobile",
      problem_area: "wifi"
    },
    constraints: [
      "Provide step-by-step solution",
      "Include alternative methods"
    ],
    responseFormat: "step_by_step_guide"
  }
});

Comparison Query

const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
  structuredData: {
    intent: "comparison",
    entities: {
      item1: "iPhone",
      item2: "Samsung Galaxy"
    },
    constraints: [
      "Compare key specifications",
      "Highlight main differences"
    ],
    responseFormat: "comparison_table"
  }
});

Combined with Chat History

const result = await rag.query("What about the camera quality?", {
  chatHistory: [
    { role: 'user', content: 'Tell me about iPhone features' },
    { role: 'assistant', content: 'The iPhone offers excellent features...' }
  ],
  structuredData: {
    intent: "follow_up_question",
    entities: {
      topic: "camera",
      context_reference: "previous_iphone_discussion"
    },
    responseFormat: "detailed_explanation"
  }
});

Structured Data Schema

interface StructuredData {
  intent: string;                    // Query intent/category (required)
  entities?: {                       // Named entities and values
    [key: string]: string | number;
  };
  constraints?: string[];            // Requirements/constraints
  context?: {                        // Additional context
    [key: string]: string | number | boolean;
  };
  responseFormat?: string;           // Desired response format
}

Common Intents

  • product_information - Product details and specifications
  • troubleshooting - Problem-solving and technical support
  • comparison - Comparing multiple items
  • how_to_guide - Step-by-step instructions
  • explanation - Detailed explanations
  • follow_up_question - Context-aware follow-ups

Response Formats

  • structured_list - Organized bullet points
  • step_by_step_guide - Numbered instructions
  • comparison_table - Side-by-side comparison
  • detailed_explanation - Comprehensive explanation
  • bullet_points - Simple bullet format
  • json_format - Structured JSON response

Advanced Filtering (New in v2.1.0)

import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();

// Add documents with user/knowledgebot metadata
const documentData = await processor.processDocumentFromBuffer(
    buffer, 
    'user-manual.pdf', 
    'pdf',
    {
        userId: 'user_123',
        knowledgebotId: 'tech_support_bot',
        department: 'engineering',
        priority: 'high'
    }
);

await rag.documentStore.saveDocument(documentData);

// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
    userId: 'user_123',
    limit: 5
});

// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
    knowledgebotId: 'tech_support_bot'
});

// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
    userId: 'user_123',
    filter: {
        priority: 'high',
        department: 'engineering'
    }
});

// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
    'documentation',
    'user_123'
);

// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');

Chat History & Session Persistence (New in v2.3.0)

Enable multi-turn conversations with persistent chat history stored in PostgreSQL.

Basic Chat History

// First query
const result1 = await rag.query('What is machine learning?');

// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
    chatHistory: result1.chatHistory
});

// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
    chatHistory: result2.chatHistory
});

Session Persistence

const sessionId = 'user_conversation_123';

// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
    sessionId: sessionId,
    persistSession: true,  // Auto-save after query
    userId: 'user_456',
    knowledgebotId: 'tech_bot'
});

// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
    sessionId: sessionId,
    persistSession: true
});

// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);

// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);

// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);

History Summarization

// Long conversations are automatically managed
const result = await rag.query('Complex question', {
    sessionId: sessionId,
    persistSession: true,
    maxHistoryLength: 20  // Keeps recent 20 messages
});

Testing Chat Features

# Basic chat history
npm run test:chat:basic

# Session management
npm run test:chat:session

# History summarization
npm run test:chat:summarization

# Session persistence
npm run test:chat:persistence

📚 API Documentation

DocumentProcessor Class

The DocumentProcessor class provides powerful document processing capabilities for files, buffers, and URLs.

Buffer Processing Methods

processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})

Process a document directly from a memory buffer.

import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');

const result = await processor.processDocumentFromBuffer(
    buffer,
    'test.txt',
    'txt',
    { source: 'api', category: 'test' }
);

// Returns:
// {
//   title: 'Test Document',
//   content: 'This is a test document',
//   chunks: [...], // Array of processed chunks with embeddings
//   metadata: { ... },
//   fileType: 'txt',
//   filePath: 'test.txt'
// }

Parameters:

  • buffer (Buffer): The document content as a Buffer object
  • fileName (string): Name of the file (used for metadata)
  • fileType (string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')
  • metadata (object): Additional metadata to attach to the document

Supported Buffer Types:

  • TXT: Plain text files
  • HTML: HTML documents (extracts text content)
  • Markdown: Markdown files
  • JSON: JSON files (converts to readable text)

extractTextFromBuffer(buffer, fileType)

Extract raw text from a buffer without processing into chunks.

const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text

URL Processing Methods

processDocumentFromUrl(url, metadata = {})

Download and process a document from a URL.

const result = await processor.processDocumentFromUrl(
    'https://example.com/document.pdf',
    { 
        source: 'web-crawl',
        priority: 'high',
        category: 'research' 
    }
);

// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes

Parameters:

  • url (string): HTTP/HTTPS URL to download from
  • metadata (object): Additional metadata for the document

Features:

  • Automatic file type detection from URL extension and Content-Type headers
  • Temporary file handling (auto-cleanup)
  • Support for redirects and various HTTP response types
  • Comprehensive error handling

processDocumentsFromUrls(urls, options = {})

Process multiple URLs in parallel with concurrency control.

const urls = [
    'https://site1.com/doc1.pdf',
    'https://site2.com/doc2.html',
    'https://site3.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
    maxConcurrent: 3,           // Process up to 3 URLs simultaneously
    metadata: { batch: 'import-2024' },
    timeout: 30000,             // 30 second timeout per URL
    retries: 2                  // Retry failed downloads
});

// Returns:
// {
//   successful: [...],         // Array of successfully processed documents
//   failed: [...],            // Array of failed URLs with error details
//   total: 3,
//   successCount: 2,
//   failureCount: 1
// }

Options:

  • maxConcurrent (number): Maximum concurrent downloads (default: 5)
  • metadata (object): Metadata applied to all documents
  • timeout (number): Timeout per URL in milliseconds
  • retries (number): Number of retry attempts for failed downloads
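
The concurrency control behind these options can be sketched as a shared work queue drained by a fixed number of async "lanes". This is an illustrative pattern only, not the package's actual internals; `mapWithConcurrency` and its result shape are hypothetical names modeled on the return value documented above.

```javascript
// Run at most `maxConcurrent` workers at once over `items`,
// collecting successes and failures separately.
async function mapWithConcurrency(items, worker, maxConcurrent = 5) {
  const successful = [];
  const failed = [];
  let next = 0;

  async function runLane() {
    while (next < items.length) {
      const index = next++; // synchronous claim, so no two lanes share an item
      const item = items[index];
      try {
        successful.push(await worker(item));
      } catch (error) {
        failed.push({ item, error: error.message });
      }
    }
  }

  // Start up to maxConcurrent lanes pulling from the shared queue.
  const lanes = Array.from(
    { length: Math.min(maxConcurrent, items.length) },
    () => runLane()
  );
  await Promise.all(lanes);

  return { successful, failed, total: items.length };
}
```

Each lane pulls the next unclaimed item as soon as its previous one settles, so throughput stays at `maxConcurrent` even when individual downloads take very different times.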

Error Handling

All methods include comprehensive error handling:

try {
    const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
    if (error.message.includes('Buffer is empty')) {
        console.log('Empty buffer provided');
    } else if (error.message.includes('Unsupported file type')) {
        console.log('File type not supported for buffer processing');
    } else {
        console.log('Processing error:', error.message);
    }
}

Integration with RAG System

Use processed documents with the RAG system:

import fs from 'fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();

await rag.initialize();

// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');

// Add to RAG system
await rag.documentStore.saveDocument(processed);

// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);

// Now query across all documents
const answer = await rag.query('What information is available?');

🌐 With Web Interface

const rag = new RAGSystem({
    // ... configuration
    server: { port: 3000, enableWebUI: true }
});

await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000


⚡ Quick Examples

Run the included examples:

# Basic usage example
npm run example:basic

# Web server example  
npm run example:server

# Advanced integration example
npm run example:advanced

# Usage patterns overview
npm run example:patterns

🛠️ Development & Contributing

For local development and contributions:

Prerequisites

  • Node.js v18+
  • PostgreSQL v12+ with pgvector extension
  • OpenAI API Key

Setup

# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install

# Configure environment
cp .env.example .env
# Edit .env with your credentials

# Initialize database
npm run setup

# Start development
npm run dev

Testing

# Run examples
npm run example:basic

# Run with web interface
npm run example:server

Upload Document

curl -X POST http://localhost:3000/documents/upload \
  -F "document=@path/to/your/document.pdf" \
  -F "title=My Document"

Process Document from File Path

curl -X POST http://localhost:3000/documents/process \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "/path/to/document.pdf",
    "title": "My Document"
  }'

Search/Query

curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic of the document?",
    "sessionId": "optional-session-id"
  }'

Get All Documents

curl http://localhost:3000/documents

Get Specific Document

curl http://localhost:3000/documents/{document-id}

Delete Document

curl -X DELETE http://localhost:3000/documents/{document-id}

Command Line Tools

Process Documents from Directory

npm run process-docs /path/to/documents/folder

Interactive Search

npm run search

Single Query Search

npm run search "Your question here"

🏗️ Architecture

System Components

  1. Document Processor (src/utils/documentProcessor.js)

    • Extracts text from various file formats
    • Splits documents into chunks with configurable overlap
    • Generates embeddings using OpenAI
  2. Document Store (src/services/documentStore.js)

    • Manages document and chunk storage in PostgreSQL
    • Performs vector similarity search using pgvector
    • Handles CRUD operations
  3. RAG Workflow (src/workflows/ragWorkflow.js)

    • LangGraph-based workflow orchestration
    • Three-step process: Retrieve → Rerank → Generate
    • Supports conversational context
  4. API Server (src/index.js)

    • Express.js REST API
    • File upload handling
    • Conversation session management
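
The Retrieve → Rerank → Generate flow above can be sketched as a plain async pipeline with stubbed stages. The real package orchestrates this with LangGraph; `answerQuery` and the stub stage names below are illustrative, not the package API.

```javascript
// Minimal sketch of the three-step workflow with pluggable stages.
async function answerQuery(query, { retrieve, rerank, generate }, topK = 3) {
  const candidates = await retrieve(query);   // vector search (pgvector)
  const ranked = rerank(query, candidates);   // score + sort candidates
  const context = ranked.slice(0, topK);      // keep only the top chunks
  return generate(query, context);            // LLM call with context
}

// Stub stages standing in for pgvector search and the LLM.
const stages = {
  retrieve: async () => [
    { content: 'pgvector stores embeddings', score: 0.8 },
    { content: 'unrelated text', score: 0.2 },
  ],
  rerank: (query, chunks) => [...chunks].sort((a, b) => b.score - a.score),
  generate: (query, context) =>
    `Answer based on ${context.length} chunk(s): ${context[0].content}`,
};
```

Swapping any stage (e.g. a different reranker) leaves the rest of the pipeline untouched, which is the point of the node-based design.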

Database Schema

-- Documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  file_path VARCHAR(500),
  file_type VARCHAR(50),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Document chunks with embeddings
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Search sessions for tracking
CREATE TABLE search_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  query TEXT NOT NULL,
  results JSONB,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Chat Sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id VARCHAR(255) UNIQUE NOT NULL,
  user_id VARCHAR(255),
  knowledgebot_id VARCHAR(255),
  history JSONB DEFAULT '[]'::jsonb,
  metadata JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  message_count INTEGER DEFAULT 0
);

-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);

LangGraph Workflow

graph TD
    A[Query Input] --> B[Retrieve Node]
    B --> C[Rerank Node]
    C --> D[Generate Node]
    D --> E[Response Output]
    
    B --> F[Vector Search]
    F --> G[Similar Chunks]
    
    C --> H[Score Ranking]
    H --> I[Top Chunks]
    
    D --> J[LLM Generation]
    J --> K[Contextual Response]

🔧 Configuration

The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.

Complete Configuration Example

import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

const rag = new RAGSystem({
  // ========================================
  // 1. Database Configuration (Required)
  // ========================================
  database: {
    host: 'localhost',              // Database host
    port: 5432,                     // Database port
    database: 'rag_db',             // Database name
    username: 'postgres',           // Database user
    password: 'your_password',      // Database password
    
    // Connection Pool Settings
    max: 10,                        // Max connections in pool
    min: 0,                         // Min connections in pool
    maxUses: Infinity,              // Max uses per connection
    allowExitOnIdle: false,         // Allow pool to close when idle
    maxLifetimeSeconds: 0,          // Max connection lifetime (0 = unlimited)
    idleTimeoutMillis: 10000        // Idle timeout (10 seconds)
  },

  // ========================================
  // 2. AI Provider Configuration (Required)
  // ========================================
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-ada-002'
  }),
  
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4',
    temperature: 0.7
  }),

  // ========================================
  // 3. Embedding Configuration
  // ========================================
  embeddingDimensions: 1536,        // Dimensions for embeddings
                                    // OpenAI ada-002: 1536
                                    // HuggingFace MiniLM: 384
                                    // Anthropic: varies

  // ========================================
  // 4. Vector Store Configuration
  // ========================================
  vectorStore: {
    tableName: 'document_chunks_vector',
    vectorColumnName: 'embedding',
    contentColumnName: 'content',
    metadataColumnName: 'metadata'
  },

  // ========================================
  // 5. Document Processing Configuration
  // ========================================
  processing: {
    chunkSize: 1000,                // Characters per chunk
    chunkOverlap: 200               // Overlap between chunks
  },

  // ========================================
  // 6. Chat History Configuration (NEW)
  // ========================================
  chatHistory: {
    enabled: true,                  // Enable chat history feature
    maxMessages: 20,                // Max messages before management kicks in
    maxTokens: 3000,                // Max tokens in chat history
    summarizeThreshold: 30,         // Trigger summarization after N messages
    keepRecentCount: 10,            // Recent messages to preserve
    alwaysKeepFirst: true,          // Always keep conversation starter
    persistSessions: true,          // Store sessions in database
    sessionTimeout: 3600000         // Session timeout (1 hour in ms)
  }
});

await rag.initialize();

Configuration Sections Explained

1. Database Configuration

Controls PostgreSQL connection and pool behavior:

database: {
  host: 'localhost',              // Where PostgreSQL is running
  port: 5432,                     // PostgreSQL port (default: 5432)
  database: 'rag_db',             // Your database name
  username: 'postgres',           // Database user
  password: 'your_password',      // User password
  
  // Pool Settings (Advanced)
  max: 10,                        // Maximum concurrent connections
  min: 0,                         // Minimum idle connections
  idleTimeoutMillis: 10000        // Close idle connections after 10s
}

Best Practices:

  • Use environment variables for sensitive data
  • Set max based on your application's concurrency needs
  • Monitor connection pool usage in production

2. AI Provider Configuration

Specify your embedding and language model providers:

OpenAI Example:

import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),

llm: new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'gpt-4',
  temperature: 0.7
})

Anthropic Example:

import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),

llm: new ChatAnthropic({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-3-sonnet-20240229',
  temperature: 0.7
})

Local Models Example:

import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

embeddings: new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),

llm: new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2'
})

3. Embedding Dimensions

Match this to your embedding model's output dimensions:

| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |

embeddingDimensions: 1536  // Must match your embedding model

Important: If you change embedding models, you must recreate the database schema!
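
One way to rebuild after a model switch, assuming the default `document_chunks` layout from the Database Schema section above (this discards existing embeddings, so re-ingest your documents afterwards):

```sql
-- WARNING: drops all stored chunks and their embeddings.
DROP TABLE IF EXISTS document_chunks;

CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(384),  -- new dimension, e.g. all-MiniLM-L6-v2
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```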

4. Vector Store Configuration

Customize the vector store table structure:

vectorStore: {
  tableName: 'document_chunks_vector',    // Table name for vectors
  vectorColumnName: 'embedding',          // Column for embeddings
  contentColumnName: 'content',           // Column for text content
  metadataColumnName: 'metadata'          // Column for metadata
}

Most users can use the defaults.

5. Document Processing

Control how documents are chunked:

processing: {
  chunkSize: 1000,      // Characters per chunk (500-2000 recommended)
  chunkOverlap: 200     // Overlap between chunks (10-20% of chunkSize)
}

Guidelines:

  • Small chunks (500): Better precision, more chunks, higher cost
  • Large chunks (2000): Better context, fewer chunks, lower cost
  • Overlap: Prevents context loss at boundaries (typically 10-20%)

Examples:

// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }

// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }

// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }

6. Chat History Configuration (NEW in v2.3.0)

Control conversation history management:

chatHistory: {
  enabled: true,                  // Enable/disable chat history
  maxMessages: 20,                // Start management after N messages
  maxTokens: 3000,                // Maximum tokens in history
  summarizeThreshold: 30,         // Summarize after N messages
  keepRecentCount: 10,            // Recent messages to always keep
  alwaysKeepFirst: true,          // Keep conversation starter
  persistSessions: true,          // Store in database
  sessionTimeout: 3600000         // 1 hour timeout (in milliseconds)
}

Chat History Options Explained:

  • enabled: Master switch for chat history feature
  • maxMessages: Soft limit before history management activates
  • maxTokens: Hard limit on token count (prevents API errors)
  • summarizeThreshold: When to trigger LLM-based summarization
  • keepRecentCount: Recent messages to preserve during summarization
  • alwaysKeepFirst: Preserve conversation context from the beginning
  • persistSessions: Save sessions to database for persistence
  • sessionTimeout: Milliseconds before session is considered inactive
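
The keep-first-plus-recent policy that `keepRecentCount` and `alwaysKeepFirst` describe can be sketched in a few lines. This is an illustration of the policy, not the package's internal implementation (which additionally summarizes the dropped middle via the LLM).

```javascript
// Trim a chat history to at most `keepRecentCount` recent messages,
// optionally preserving the very first message (the conversation starter).
function trimHistory(history, { keepRecentCount = 10, alwaysKeepFirst = true } = {}) {
  if (history.length <= keepRecentCount) return history.slice();

  const recent = history.slice(-keepRecentCount);
  if (alwaysKeepFirst && recent[0] !== history[0]) {
    // Prepend the opening message so the original context survives trimming.
    return [history[0], ...recent];
  }
  return recent;
}
```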

Preset Configurations:

// Minimal (cost-effective)
chatHistory: {
  enabled: true,
  maxMessages: 10,
  maxTokens: 1500,
  summarizeThreshold: 15,
  keepRecentCount: 5,
  persistSessions: false
}

// Balanced (recommended)
chatHistory: {
  enabled: true,
  maxMessages: 20,
  maxTokens: 3000,
  summarizeThreshold: 30,
  keepRecentCount: 10,
  persistSessions: true
}

// Maximum context (for complex conversations)
chatHistory: {
  enabled: true,
  maxMessages: 40,
  maxTokens: 6000,
  summarizeThreshold: 50,
  keepRecentCount: 20,
  persistSessions: true
}

// Disabled (for single-shot queries)
chatHistory: {
  enabled: false
}

Environment Variables

Create a .env file for sensitive configuration:

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...

# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...

# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536

Then use in your code:

import 'dotenv/config';

const rag = new RAGSystem({
  database: {
    host: process.env.DB_HOST,
    port: parseInt(process.env.DB_PORT),
    database: process.env.DB_NAME,
    username: process.env.DB_USER,
    password: process.env.DB_PASSWORD
  },
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});

Query-Time Configuration

You can also configure behavior at query time:

const result = await rag.query('Your question', {
  // Filtering
  userId: 'user_123',               // Filter by user
  knowledgebotId: 'bot_456',        // Filter by bot
  filter: { category: 'tech' },     // Custom metadata filters
  
  // Retrieval
  limit: 10,                        // Number of chunks to retrieve
  threshold: 0.5,                   // Similarity threshold (0-1)
  
  // Chat History
  chatHistory: previousHistory,     // Previous conversation
  maxHistoryLength: 15,             // Override default history length
  sessionId: 'session_789',         // Session identifier
  persistSession: true,             // Save session to database
  
  // Context
  context: additionalContext,       // Extra context to include
  metadata: { source: 'api' }       // Custom metadata
});
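
The `threshold` option filters retrieved chunks by cosine similarity between the query embedding and each chunk embedding. As a reference point, cosine similarity is computed like this (an illustrative helper, not part of the package API):

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1 (1 = identical direction).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A `threshold` of 0.5 therefore discards chunks whose embedding points less than "halfway" toward the query embedding; lowering it (see Troubleshooting) admits more loosely related chunks.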

Configuration Best Practices

  1. Security: Never hardcode API keys or passwords
  2. Environment-Specific: Use different configs for dev/staging/prod
  3. Performance: Monitor and adjust based on usage patterns
  4. Cost: Balance context size with API costs
  5. Testing: Test with different configurations to find optimal settings

📊 Performance Optimization

Database Indexes

The system creates optimized indexes:

-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding 
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- For document relationships
CREATE INDEX idx_document_chunks_document_id 
ON document_chunks(document_id);

Chunking Strategy

  • Recursive Character Text Splitter: Preserves semantic boundaries
  • Configurable overlap: Ensures context continuity
  • Multiple separators: Prioritizes paragraph, sentence, then word boundaries
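
The overlap mechanism can be sketched with a simplified character-based chunker. The package's recursive splitter additionally prefers paragraph, sentence, then word boundaries; this sketch only shows how `chunkOverlap` carries context across chunk edges.

```javascript
// Fixed-size chunking with overlap: each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one,
// so the last `chunkOverlap` characters are repeated at the seam.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) throw new Error('overlap must be < chunkSize');
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}
```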

🧪 Testing

Test Document Processing

# Create test documents directory
mkdir test-docs

# Add some test files (PDF, DOCX, TXT, etc.)
# Then process them
npm run process-docs ./test-docs

Test Search

# Interactive search
npm run search

# Or single query
npm run search "What is machine learning?"

🔍 Troubleshooting

Common Issues

  1. pgvector extension not found

    -- Install pgvector extension
    CREATE EXTENSION IF NOT EXISTS vector;
  2. OpenAI API quota exceeded

    • Check your OpenAI API usage
    • Consider using alternative embedding models
  3. Large document processing fails

    • Increase chunk size or reduce document size
    • Check memory limits
  4. Poor search results

    • Lower similarity threshold
    • Adjust chunk size and overlap
    • Verify document content quality

Debug Mode

Enable verbose logging by setting:

NODE_ENV=development

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • LangChain for the excellent AI/ML framework
  • pgvector for vector similarity search
  • OpenAI for embedding and language models

📚 Additional Resources