rag-system-pgvector

v2.4.9

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with dynamic embedding and model providers, structured data queries, and chat history - supports OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.

RAG System Package

A production-ready Retrieval-Augmented Generation (RAG) system package built with PostgreSQL pgvector, LangChain, and LangGraph. Supports multiple AI providers including OpenAI, Anthropic, HuggingFace, Azure, Google AI, and local models.

🚀 Features

  • 📦 Easy Integration: Simple npm install and ready-to-use API
  • 🤖 Multi-Provider Support: OpenAI, Anthropic, HuggingFace, Azure, Google AI, Ollama
  • 📚 Multi-format Support: PDF, DOCX, TXT, HTML, Markdown, JSON
  • 🔍 Vector Search: High-performance similarity search with pgvector
  • 🎯 Structured Data Queries: Accept JSON data for precise, contextual responses
  • 💬 Chat History Support: Full conversation memory with summarization
  • ⚡ Production Ready: Error handling, connection pooling, monitoring
  • 🔧 Flexible Configuration: Choose your preferred embedding and LLM providers
  • 💾 Buffer Processing: Process documents directly from memory buffers
  • 🌐 URL Processing: Download and process documents from web URLs
  • 📊 Batch Operations: Efficient processing of multiple documents

📦 Installation

npm install rag-system-pgvector

# Choose your AI provider (one or more):
npm install @langchain/openai          # For OpenAI
npm install @langchain/anthropic       # For Anthropic Claude
npm install @langchain/azure-openai    # For Azure OpenAI
npm install @langchain/google-genai    # For Google AI
npm install @langchain/community       # For HuggingFace, Ollama, etc.

🚀 Quick Start

OpenAI Provider (Traditional)

import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

// Create provider instances
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatOpenAI({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'gpt-4',
  temperature: 0.7,
});

// Initialize RAG system
const rag = new RAGSystem({
  database: {
    host: 'localhost',
    database: 'your_db',
    username: 'postgres',
    password: 'your_password'
  },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});

await rag.initialize();

// Add documents and query
await rag.addDocuments(['./docs/file1.pdf', './docs/file2.txt']);

// Simple query
const result = await rag.query("What is the main topic?");
console.log(result.answer);

// Query with structured data for precise responses
const structuredResult = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: { product: "iPhone", category: "smartphone" },
    constraints: ["Focus on latest features", "Include specifications"],
    responseFormat: "structured_list"
  }
});
console.log(structuredResult.answer);

Mixed Providers (Advanced)

import { RAGSystem } from 'rag-system-pgvector';
import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

// Use OpenAI for embeddings, Anthropic for chat
const embeddings = new OpenAIEmbeddings({
  openAIApiKey: 'your-openai-api-key',
  modelName: 'text-embedding-ada-002',
});

const llm = new ChatAnthropic({
  anthropicApiKey: 'your-anthropic-api-key',
  modelName: 'claude-3-haiku-20240307',
  temperature: 0.7,
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 1536,
});

Local Models (Privacy-First)

import { RAGSystem } from 'rag-system-pgvector';
import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

// Use local models (no API keys required)
const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2',
});

const llm = new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2',
});

const rag = new RAGSystem({
  database: { /* your config */ },
  embeddings: embeddings,
  llm: llm,
  embeddingDimensions: 384, // all-MiniLM-L6-v2 dimensions
});

Buffer Processing (New in v1.1.0)

import fs from 'fs';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process document from Buffer
const buffer = fs.readFileSync('document.pdf');
const result = await processor.processDocumentFromBuffer(
    buffer, 
    'document.pdf', 
    'pdf',
    { source: 'api-upload', category: 'research' }
);

console.log(result.chunks); // Processed chunks with embeddings

URL Processing (New in v1.1.0)

import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();

// Process single URL
const result = await processor.processDocumentFromUrl(
    'https://example.com/document.pdf',
    { source: 'web-crawl', priority: 'high' }
);

// Process multiple URLs
const urls = [
    'https://example.com/doc1.pdf',
    'https://example.com/doc2.html',
    'https://example.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
    source: 'batch-import',
    maxConcurrent: 3
});

console.log(`Processed ${results.successful.length} documents`);

🎯 Structured Data Queries (New in v2.2.0)

The RAG system now supports structured JSON data alongside natural language queries for more precise and contextual responses.

Basic Structured Query

const result = await rag.query("Tell me about iPhone features", {
  structuredData: {
    intent: "product_information",
    entities: {
      product: "iPhone",
      category: "smartphone",
      brand: "Apple"
    },
    constraints: [
      "Focus on latest model features",
      "Include technical specifications"
    ],
    context: {
      userType: "potential_buyer",
      priceRange: "premium"
    },
    responseFormat: "structured_list"
  }
});

Troubleshooting Query

const result = await rag.query("My device won't connect to WiFi", {
  structuredData: {
    intent: "troubleshooting",
    entities: {
      issue_type: "connectivity",
      device_category: "mobile",
      problem_area: "wifi"
    },
    constraints: [
      "Provide step-by-step solution",
      "Include alternative methods"
    ],
    responseFormat: "step_by_step_guide"
  }
});

Comparison Query

const result = await rag.query("Compare iPhone vs Samsung Galaxy", {
  structuredData: {
    intent: "comparison",
    entities: {
      item1: "iPhone",
      item2: "Samsung Galaxy"
    },
    constraints: [
      "Compare key specifications",
      "Highlight main differences"
    ],
    responseFormat: "comparison_table"
  }
});

Combined with Chat History

const result = await rag.query("What about the camera quality?", {
  chatHistory: [
    { role: 'user', content: 'Tell me about iPhone features' },
    { role: 'assistant', content: 'The iPhone offers excellent features...' }
  ],
  structuredData: {
    intent: "follow_up_question",
    entities: {
      topic: "camera",
      context_reference: "previous_iphone_discussion"
    },
    responseFormat: "detailed_explanation"
  }
});

Structured Data Schema

interface StructuredData {
  intent: string;                    // Query intent/category (required)
  entities?: {                       // Named entities and values
    [key: string]: string | number;
  };
  constraints?: string[];            // Requirements/constraints
  context?: {                        // Additional context
    [key: string]: string | number | boolean;
  };
  responseFormat?: string;           // Desired response format
}

Common Intents

  • product_information - Product details and specifications
  • troubleshooting - Problem-solving and technical support
  • comparison - Comparing multiple items
  • how_to_guide - Step-by-step instructions
  • explanation - Detailed explanations
  • follow_up_question - Context-aware follow-ups

Response Formats

  • structured_list - Organized bullet points
  • step_by_step_guide - Numbered instructions
  • comparison_table - Side-by-side comparison
  • detailed_explanation - Comprehensive explanation
  • bullet_points - Simple bullet format
  • json_format - Structured JSON response

Advanced Filtering (New in v2.1.0)

import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();
await rag.initialize();

// Add documents with user/knowledgebot metadata
const documentData = await processor.processDocumentFromBuffer(
    buffer, 
    'user-manual.pdf', 
    'pdf',
    {
        userId: 'user_123',
        knowledgebotId: 'tech_support_bot',
        department: 'engineering',
        priority: 'high'
    }
);

await rag.documentStore.saveDocument(documentData);

// Query with user filtering
const userResults = await rag.query('What technical info is available?', {
    userId: 'user_123',
    limit: 5
});

// Query with knowledgebot filtering
const botResults = await rag.query('Help with technical issues', {
    knowledgebotId: 'tech_support_bot'
});

// Query with multiple filters
const filteredResults = await rag.query('Show important documents', {
    userId: 'user_123',
    filter: {
        priority: 'high',
        department: 'engineering'
    }
});

// Direct search with filtering
const searchResults = await rag.searchDocumentsByUserId(
    'documentation',
    'user_123'
);

// Get all documents for a specific user
const userDocs = await rag.getDocumentsByUserId('user_123');

Chat History & Session Persistence (New in v2.3.0)

Enable multi-turn conversations with persistent chat history stored in PostgreSQL.

Basic Chat History

// First query
const result1 = await rag.query('What is machine learning?');

// Follow-up with context
const result2 = await rag.query('Can you give me examples?', {
    chatHistory: result1.chatHistory
});

// Another follow-up
const result3 = await rag.query('Which one is most popular?', {
    chatHistory: result2.chatHistory
});

Session Persistence

const sessionId = 'user_conversation_123';

// Query with automatic session save/load
const result = await rag.query('What is machine learning?', {
    sessionId: sessionId,
    persistSession: true,  // Auto-save after query
    userId: 'user_456',
    knowledgebotId: 'tech_bot'
});

// Continue conversation (automatically loads history)
const result2 = await rag.query('Tell me more', {
    sessionId: sessionId,
    persistSession: true
});

// Load session manually
const session = await rag.loadSession(sessionId);
console.log(`Session has ${session.messageCount} messages`);

// Get all user sessions
const userSessions = await rag.getUserSessions('user_456');
console.log(`User has ${userSessions.length} sessions`);

// Get session statistics
const stats = await rag.getSessionStats({ userId: 'user_456' });
console.log(`Total messages: ${stats.totalMessages}`);

History Summarization

// Long conversations are automatically managed
const result = await rag.query('Complex question', {
    sessionId: sessionId,
    persistSession: true,
    maxHistoryLength: 20  // Keeps recent 20 messages
});

Testing Chat Features

# Basic chat history
npm run test:chat:basic

# Session management
npm run test:chat:session

# History summarization
npm run test:chat:summarization

# Session persistence
npm run test:chat:persistence

📚 API Documentation

DocumentProcessor Class

The DocumentProcessor class provides powerful document processing capabilities for files, buffers, and URLs.

Buffer Processing Methods

processDocumentFromBuffer(buffer, fileName, fileType, metadata = {})

Process a document directly from a memory buffer.

import { DocumentProcessor } from 'rag-system-pgvector/utils';

const processor = new DocumentProcessor();
const buffer = Buffer.from('This is a test document', 'utf8');

const result = await processor.processDocumentFromBuffer(
    buffer,
    'test.txt',
    'txt',
    { source: 'api', category: 'test' }
);

// Returns:
// {
//   title: 'Test Document',
//   content: 'This is a test document',
//   chunks: [...], // Array of processed chunks with embeddings
//   metadata: { ... },
//   fileType: 'txt',
//   filePath: 'test.txt'
// }

Parameters:

  • buffer (Buffer): The document content as a Buffer object
  • fileName (string): Name of the file (used for metadata)
  • fileType (string): File type ('pdf', 'docx', 'txt', 'html', 'md', 'json')
  • metadata (object): Additional metadata to attach to the document

Supported Buffer Types:

  • TXT: Plain text files
  • HTML: HTML documents (extracts text content)
  • Markdown: Markdown files
  • JSON: JSON files (converts to readable text)

extractTextFromBuffer(buffer, fileType)

Extract raw text from a buffer without processing into chunks.

const text = await processor.extractTextFromBuffer(buffer, 'html');
console.log(text); // Extracted plain text

URL Processing Methods

processDocumentFromUrl(url, metadata = {})

Download and process a document from a URL.

const result = await processor.processDocumentFromUrl(
    'https://example.com/document.pdf',
    { 
        source: 'web-crawl',
        priority: 'high',
        category: 'research' 
    }
);

// Automatically detects file type from URL and content headers
// Downloads to temp directory and processes

Parameters:

  • url (string): HTTP/HTTPS URL to download from
  • metadata (object): Additional metadata for the document

Features:

  • Automatic file type detection from URL extension and Content-Type headers
  • Temporary file handling (auto-cleanup)
  • Support for redirects and various HTTP response types
  • Comprehensive error handling

processDocumentsFromUrls(urls, options = {})

Process multiple URLs in parallel with concurrency control.

const urls = [
    'https://site1.com/doc1.pdf',
    'https://site2.com/doc2.html',
    'https://site3.com/doc3.md'
];

const results = await processor.processDocumentsFromUrls(urls, {
    maxConcurrent: 3,           // Process up to 3 URLs simultaneously
    metadata: { batch: 'import-2024' },
    timeout: 30000,             // 30 second timeout per URL
    retries: 2                  // Retry failed downloads
});

// Returns:
// {
//   successful: [...],         // Array of successfully processed documents
//   failed: [...],            // Array of failed URLs with error details
//   total: 3,
//   successCount: 2,
//   failureCount: 1
// }

Options:

  • maxConcurrent (number): Maximum concurrent downloads (default: 5)
  • metadata (object): Metadata applied to all documents
  • timeout (number): Timeout per URL in milliseconds
  • retries (number): Number of retry attempts for failed downloads
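
The concurrency control behind these options can be sketched as a shared work queue drained by a fixed number of async "lanes". This is an illustrative pattern only, not the package's actual internals; `mapWithConcurrency` and its result shape are hypothetical names modeled on the return value documented above.

```javascript
// Run at most `maxConcurrent` workers at once over `items`,
// collecting successes and failures separately.
async function mapWithConcurrency(items, worker, maxConcurrent = 5) {
  const successful = [];
  const failed = [];
  let next = 0;

  async function runLane() {
    while (next < items.length) {
      const index = next++; // synchronous claim, so no two lanes share an item
      const item = items[index];
      try {
        successful.push(await worker(item));
      } catch (error) {
        failed.push({ item, error: error.message });
      }
    }
  }

  // Start up to maxConcurrent lanes pulling from the shared queue.
  const lanes = Array.from(
    { length: Math.min(maxConcurrent, items.length) },
    () => runLane()
  );
  await Promise.all(lanes);

  return { successful, failed, total: items.length };
}
```

Each lane pulls the next unclaimed item as soon as its previous one settles, so throughput stays at `maxConcurrent` even when individual downloads take very different times.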

Error Handling

All methods include comprehensive error handling:

try {
    const result = await processor.processDocumentFromBuffer(buffer, 'test.pdf', 'pdf');
} catch (error) {
    if (error.message.includes('Buffer is empty')) {
        console.log('Empty buffer provided');
    } else if (error.message.includes('Unsupported file type')) {
        console.log('File type not supported for buffer processing');
    } else {
        console.log('Processing error:', error.message);
    }
}

Integration with RAG System

Use processed documents with the RAG system:

import fs from 'fs';
import RAGSystem from 'rag-system-pgvector';
import { DocumentProcessor } from 'rag-system-pgvector/utils';

const rag = new RAGSystem(config);
const processor = new DocumentProcessor();

await rag.initialize();

// Process from buffer
const buffer = fs.readFileSync('document.pdf');
const processed = await processor.processDocumentFromBuffer(buffer, 'doc.pdf', 'pdf');

// Add to RAG system
await rag.documentStore.saveDocument(processed);

// Process from URL and add to RAG
const urlProcessed = await processor.processDocumentFromUrl('https://example.com/doc.html');
await rag.documentStore.saveDocument(urlProcessed);

// Now query across all documents
const answer = await rag.query('What information is available?');

🌐 With Web Interface

const rag = new RAGSystem({
    // ... configuration
    server: { port: 3000, enableWebUI: true }
});

await rag.initialize();
await rag.startServer();
// Visit http://localhost:3000


⚡ Quick Examples

Run the included examples:

# Basic usage example
npm run example:basic

# Web server example  
npm run example:server

# Advanced integration example
npm run example:advanced

# Usage patterns overview
npm run example:patterns

🛠️ Development & Contributing

For local development and contributions:

Prerequisites

  • Node.js v18+
  • PostgreSQL v12+ with pgvector extension
  • OpenAI API Key

Setup

# Clone and install
git clone https://github.com/yourusername/rag-system-pgvector.git
cd rag-system-pgvector
npm install

# Configure environment
cp .env.example .env
# Edit .env with your credentials

# Initialize database
npm run setup

# Start development
npm run dev

Testing

# Run examples
npm run example:basic

# Run with web interface
npm run example:server

Upload Document

curl -X POST http://localhost:3000/documents/upload \
  -F "document=@path/to/your/document.pdf" \
  -F "title=My Document"

Process Document from File Path

curl -X POST http://localhost:3000/documents/process \
  -H "Content-Type: application/json" \
  -d '{
    "filePath": "/path/to/document.pdf",
    "title": "My Document"
  }'

Search/Query

curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main topic of the document?",
    "sessionId": "optional-session-id"
  }'

Get All Documents

curl http://localhost:3000/documents

Get Specific Document

curl http://localhost:3000/documents/{document-id}

Delete Document

curl -X DELETE http://localhost:3000/documents/{document-id}

Command Line Tools

Process Documents from Directory

npm run process-docs /path/to/documents/folder

Interactive Search

npm run search

Single Query Search

npm run search "Your question here"

🏗️ Architecture

System Components

  1. Document Processor (src/utils/documentProcessor.js)

    • Extracts text from various file formats
    • Splits documents into chunks with configurable overlap
    • Generates embeddings using OpenAI
  2. Document Store (src/services/documentStore.js)

    • Manages document and chunk storage in PostgreSQL
    • Performs vector similarity search using pgvector
    • Handles CRUD operations
  3. RAG Workflow (src/workflows/ragWorkflow.js)

    • LangGraph-based workflow orchestration
    • Three-step process: Retrieve → Rerank → Generate
    • Supports conversational context
  4. API Server (src/index.js)

    • Express.js REST API
    • File upload handling
    • Conversation session management
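
The Retrieve → Rerank → Generate flow above can be sketched as a plain async pipeline with stubbed stages. The real package orchestrates this with LangGraph; `answerQuery` and the stub stage names below are illustrative, not the package API.

```javascript
// Minimal sketch of the three-step workflow with pluggable stages.
async function answerQuery(query, { retrieve, rerank, generate }, topK = 3) {
  const candidates = await retrieve(query);   // vector search (pgvector)
  const ranked = rerank(query, candidates);   // score + sort candidates
  const context = ranked.slice(0, topK);      // keep only the top chunks
  return generate(query, context);            // LLM call with context
}

// Stub stages standing in for pgvector search and the LLM.
const stages = {
  retrieve: async () => [
    { content: 'pgvector stores embeddings', score: 0.8 },
    { content: 'unrelated text', score: 0.2 },
  ],
  rerank: (query, chunks) => [...chunks].sort((a, b) => b.score - a.score),
  generate: (query, context) =>
    `Answer based on ${context.length} chunk(s): ${context[0].content}`,
};
```

Swapping any stage (e.g. a different reranker) leaves the rest of the pipeline untouched, which is the point of the node-based design.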

Database Schema

-- Documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  file_path VARCHAR(500),
  file_type VARCHAR(50),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Document chunks with embeddings
CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Search sessions for tracking
CREATE TABLE search_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  query TEXT NOT NULL,
  results JSONB,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Chat Sessions for conversation persistence (NEW)
CREATE TABLE chat_sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id VARCHAR(255) UNIQUE NOT NULL,
  user_id VARCHAR(255),
  knowledgebot_id VARCHAR(255),
  history JSONB DEFAULT '[]'::jsonb,
  metadata JSONB DEFAULT '{}'::jsonb,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  message_count INTEGER DEFAULT 0
);

-- Indexes for chat sessions
CREATE INDEX idx_chat_sessions_session_id ON chat_sessions(session_id);
CREATE INDEX idx_chat_sessions_user_id ON chat_sessions(user_id);
CREATE INDEX idx_chat_sessions_knowledgebot_id ON chat_sessions(knowledgebot_id);
CREATE INDEX idx_chat_sessions_last_activity ON chat_sessions(last_activity);

LangGraph Workflow

graph TD
    A[Query Input] --> B[Retrieve Node]
    B --> C[Rerank Node]
    C --> D[Generate Node]
    D --> E[Response Output]
    
    B --> F[Vector Search]
    F --> G[Similar Chunks]
    
    C --> H[Score Ranking]
    H --> I[Top Chunks]
    
    D --> J[LLM Generation]
    J --> K[Contextual Response]

🔧 Configuration

The RAG system is highly configurable. You can customize every aspect of its behavior through the constructor configuration object.

Complete Configuration Example

import RAGSystem from 'rag-system-pgvector';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

const rag = new RAGSystem({
  // ========================================
  // 1. Database Configuration (Required)
  // ========================================
  database: {
    host: 'localhost',              // Database host
    port: 5432,                     // Database port
    database: 'rag_db',             // Database name
    username: 'postgres',           // Database user
    password: 'your_password',      // Database password
    
    // Connection Pool Settings
    max: 10,                        // Max connections in pool
    min: 0,                         // Min connections in pool
    maxUses: Infinity,              // Max uses per connection
    allowExitOnIdle: false,         // Allow pool to close when idle
    maxLifetimeSeconds: 0,          // Max connection lifetime (0 = unlimited)
    idleTimeoutMillis: 10000        // Idle timeout (10 seconds)
  },

  // ========================================
  // 2. AI Provider Configuration (Required)
  // ========================================
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-ada-002'
  }),
  
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4',
    temperature: 0.7
  }),

  // ========================================
  // 3. Embedding Configuration
  // ========================================
  embeddingDimensions: 1536,        // Dimensions for embeddings
                                    // OpenAI ada-002: 1536
                                    // HuggingFace MiniLM: 384
                                    // Anthropic: varies

  // ========================================
  // 4. Vector Store Configuration
  // ========================================
  vectorStore: {
    tableName: 'document_chunks_vector',
    vectorColumnName: 'embedding',
    contentColumnName: 'content',
    metadataColumnName: 'metadata'
  },

  // ========================================
  // 5. Document Processing Configuration
  // ========================================
  processing: {
    chunkSize: 1000,                // Characters per chunk
    chunkOverlap: 200               // Overlap between chunks
  },

  // ========================================
  // 6. Chat History Configuration (NEW)
  // ========================================
  chatHistory: {
    enabled: true,                  // Enable chat history feature
    maxMessages: 20,                // Max messages before management kicks in
    maxTokens: 3000,                // Max tokens in chat history
    summarizeThreshold: 30,         // Trigger summarization after N messages
    keepRecentCount: 10,            // Recent messages to preserve
    alwaysKeepFirst: true,          // Always keep conversation starter
    persistSessions: true,          // Store sessions in database
    sessionTimeout: 3600000         // Session timeout (1 hour in ms)
  }
});

await rag.initialize();

Configuration Sections Explained

1. Database Configuration

Controls PostgreSQL connection and pool behavior:

database: {
  host: 'localhost',              // Where PostgreSQL is running
  port: 5432,                     // PostgreSQL port (default: 5432)
  database: 'rag_db',             // Your database name
  username: 'postgres',           // Database user
  password: 'your_password',      // User password
  
  // Pool Settings (Advanced)
  max: 10,                        // Maximum concurrent connections
  min: 0,                         // Minimum idle connections
  idleTimeoutMillis: 10000        // Close idle connections after 10s
}

Best Practices:

  • Use environment variables for sensitive data
  • Set max based on your application's concurrency needs
  • Monitor connection pool usage in production

2. AI Provider Configuration

Specify your embedding and language model providers:

OpenAI Example:

import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),

llm: new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'gpt-4',
  temperature: 0.7
})

Anthropic Example:

import { OpenAIEmbeddings } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';

embeddings: new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
  modelName: 'text-embedding-ada-002'
}),

llm: new ChatAnthropic({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-3-sonnet-20240229',
  temperature: 0.7
})

Local Models Example:

import { HuggingFaceTransformersEmbeddings } from '@langchain/community/embeddings/hf_transformers';
import { Ollama } from '@langchain/community/llms/ollama';

embeddings: new HuggingFaceTransformersEmbeddings({
  modelName: 'sentence-transformers/all-MiniLM-L6-v2'
}),

llm: new Ollama({
  baseUrl: 'http://localhost:11434',
  model: 'llama2'
})

3. Embedding Dimensions

Match this to your embedding model's output dimensions:

| Model | Dimensions | Provider |
|-------|------------|----------|
| text-embedding-ada-002 | 1536 | OpenAI |
| all-MiniLM-L6-v2 | 384 | HuggingFace |
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |

embeddingDimensions: 1536  // Must match your embedding model

Important: If you change embedding models, you must recreate the database schema!
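
One way to rebuild after a model switch, assuming the default `document_chunks` layout from the Database Schema section above (this discards existing embeddings, so re-ingest your documents afterwards):

```sql
-- WARNING: drops all stored chunks and their embeddings.
DROP TABLE IF EXISTS document_chunks;

CREATE TABLE document_chunks (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(384),  -- new dimension, e.g. all-MiniLM-L6-v2
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```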

4. Vector Store Configuration

Customize the vector store table structure:

vectorStore: {
  tableName: 'document_chunks_vector',    // Table name for vectors
  vectorColumnName: 'embedding',          // Column for embeddings
  contentColumnName: 'content',           // Column for text content
  metadataColumnName: 'metadata'          // Column for metadata
}

Most users can use the defaults.

5. Document Processing

Control how documents are chunked:

processing: {
  chunkSize: 1000,      // Characters per chunk (500-2000 recommended)
  chunkOverlap: 200     // Overlap between chunks (10-20% of chunkSize)
}

Guidelines:

  • Small chunks (500): Better precision, more chunks, higher cost
  • Large chunks (2000): Better context, fewer chunks, lower cost
  • Overlap: Prevents context loss at boundaries (typically 10-20%)

Examples:

// For technical documentation (needs precision)
processing: { chunkSize: 800, chunkOverlap: 150 }

// For books/long content (needs context)
processing: { chunkSize: 1500, chunkOverlap: 300 }

// For code documentation (needs structure)
processing: { chunkSize: 1000, chunkOverlap: 200 }

6. Chat History Configuration (NEW in v2.3.0)

Control conversation history management:

chatHistory: {
  enabled: true,                  // Enable/disable chat history
  maxMessages: 20,                // Start management after N messages
  maxTokens: 3000,                // Maximum tokens in history
  summarizeThreshold: 30,         // Summarize after N messages
  keepRecentCount: 10,            // Recent messages to always keep
  alwaysKeepFirst: true,          // Keep conversation starter
  persistSessions: true,          // Store in database
  sessionTimeout: 3600000         // 1 hour timeout (in milliseconds)
}

Chat History Options Explained:

  • enabled: Master switch for chat history feature
  • maxMessages: Soft limit before history management activates
  • maxTokens: Hard limit on token count (prevents API errors)
  • summarizeThreshold: When to trigger LLM-based summarization
  • keepRecentCount: Recent messages to preserve during summarization
  • alwaysKeepFirst: Preserve conversation context from the beginning
  • persistSessions: Save sessions to database for persistence
  • sessionTimeout: Milliseconds before session is considered inactive
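
The keep-first-plus-recent policy that `keepRecentCount` and `alwaysKeepFirst` describe can be sketched in a few lines. This is an illustration of the policy, not the package's internal implementation (which additionally summarizes the dropped middle via the LLM).

```javascript
// Trim a chat history to at most `keepRecentCount` recent messages,
// optionally preserving the very first message (the conversation starter).
function trimHistory(history, { keepRecentCount = 10, alwaysKeepFirst = true } = {}) {
  if (history.length <= keepRecentCount) return history.slice();

  const recent = history.slice(-keepRecentCount);
  if (alwaysKeepFirst && recent[0] !== history[0]) {
    // Prepend the opening message so the original context survives trimming.
    return [history[0], ...recent];
  }
  return recent;
}
```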

Preset Configurations:

// Minimal (cost-effective)
chatHistory: {
  enabled: true,
  maxMessages: 10,
  maxTokens: 1500,
  summarizeThreshold: 15,
  keepRecentCount: 5,
  persistSessions: false
}

// Balanced (recommended)
chatHistory: {
  enabled: true,
  maxMessages: 20,
  maxTokens: 3000,
  summarizeThreshold: 30,
  keepRecentCount: 10,
  persistSessions: true
}

// Maximum context (for complex conversations)
chatHistory: {
  enabled: true,
  maxMessages: 40,
  maxTokens: 6000,
  summarizeThreshold: 50,
  keepRecentCount: 20,
  persistSessions: true
}

// Disabled (for single-shot queries)
chatHistory: {
  enabled: false
}

Environment Variables

Create a .env file for sensitive configuration:

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=rag_db
DB_USER=postgres
DB_PASSWORD=your_secure_password

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...

# Azure (optional)
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...

# Processing (optional)
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
EMBEDDING_DIMENSIONS=1536

Then use in your code:

import 'dotenv/config';

const rag = new RAGSystem({
  database: {
    host: process.env.DB_HOST,
    port: parseInt(process.env.DB_PORT),
    database: process.env.DB_NAME,
    username: process.env.DB_USER,
    password: process.env.DB_PASSWORD
  },
  embeddings: new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  llm: new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY
  }),
  embeddingDimensions: parseInt(process.env.EMBEDDING_DIMENSIONS || '1536')
});

Query-Time Configuration

You can also configure behavior at query time:

const result = await rag.query('Your question', {
  // Filtering
  userId: 'user_123',               // Filter by user
  knowledgebotId: 'bot_456',        // Filter by bot
  filter: { category: 'tech' },     // Custom metadata filters
  
  // Retrieval
  limit: 10,                        // Number of chunks to retrieve
  threshold: 0.5,                   // Similarity threshold (0-1)
  
  // Chat History
  chatHistory: previousHistory,     // Previous conversation
  maxHistoryLength: 15,             // Override default history length
  sessionId: 'session_789',         // Session identifier
  persistSession: true,             // Save session to database
  
  // Context
  context: additionalContext,       // Extra context to include
  metadata: { source: 'api' }       // Custom metadata
});
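
The `threshold` option filters retrieved chunks by cosine similarity between the query embedding and each chunk embedding. As a reference point, cosine similarity is computed like this (an illustrative helper, not part of the package API):

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1 (1 = identical direction).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A `threshold` of 0.5 therefore discards chunks whose embedding points less than "halfway" toward the query embedding; lowering it (see Troubleshooting) admits more loosely related chunks.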

Configuration Best Practices

  1. Security: Never hardcode API keys or passwords
  2. Environment-Specific: Use different configs for dev/staging/prod
  3. Performance: Monitor and adjust based on usage patterns
  4. Cost: Balance context size with API costs
  5. Testing: Test with different configurations to find optimal settings

📊 Performance Optimization

Database Indexes

The system creates optimized indexes:

-- For vector similarity search
CREATE INDEX idx_document_chunks_embedding 
ON document_chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- For document relationships
CREATE INDEX idx_document_chunks_document_id 
ON document_chunks(document_id);

Chunking Strategy

  • Recursive Character Text Splitter: Preserves semantic boundaries
  • Configurable overlap: Ensures context continuity
  • Multiple separators: Prioritizes paragraph, sentence, then word boundaries
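
The overlap mechanism can be sketched with a simplified character-based chunker. The package's recursive splitter additionally prefers paragraph, sentence, then word boundaries; this sketch only shows how `chunkOverlap` carries context across chunk edges.

```javascript
// Fixed-size chunking with overlap: each chunk starts
// (chunkSize - chunkOverlap) characters after the previous one,
// so the last `chunkOverlap` characters are repeated at the seam.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) throw new Error('overlap must be < chunkSize');
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}
```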

🧪 Testing

Test Document Processing

# Create test documents directory
mkdir test-docs

# Add some test files (PDF, DOCX, TXT, etc.)
# Then process them
npm run process-docs ./test-docs

Test Search

# Interactive search
npm run search

# Or single query
npm run search "What is machine learning?"

🔍 Troubleshooting

Common Issues

  1. pgvector extension not found

    -- Install pgvector extension
    CREATE EXTENSION IF NOT EXISTS vector;
  2. OpenAI API quota exceeded

    • Check your OpenAI API usage
    • Consider using alternative embedding models
  3. Large document processing fails

    • Increase chunk size or reduce document size
    • Check memory limits
  4. Poor search results

    • Lower similarity threshold
    • Adjust chunk size and overlap
    • Verify document content quality

Debug Mode

Enable verbose logging by setting:

NODE_ENV=development

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • LangChain for the excellent AI/ML framework
  • pgvector for vector similarity search
  • OpenAI for embedding and language models

📚 Additional Resources