# pgvector-rag
Lightweight RAG toolkit for PostgreSQL + pgvector. Zero runtime dependencies.
Extracted from a production RAG pipeline serving thousands of queries. Provides the algorithms and SQL you need without the framework lock-in.
## Why this exists

| | pgvector-rag | LangChain | LlamaIndex |
|---|---|---|---|
| Runtime deps | 0 | 50+ | 40+ |
| Bundle size | ~15 KB | ~2 MB | ~1.5 MB |
| DB lock-in | pgvector only | Many adapters | Many adapters |
| Chunking | Built-in | Built-in | Built-in |
| Hybrid search SQL | Yes | No (needs driver) | No |
| MMR | Yes | Yes | Yes |
| RRF fusion | Yes | No | No |
| Bring your own DB client | Yes | No | No |
## Install

```sh
npm install pgvector-rag
```

## Quick Start
### 1. Chunk a document
```ts
import { chunk } from 'pgvector-rag';

const chunks = chunk(documentText, {
  maxChunkChars: 1200,
  overlapChars: 250,
});
// chunks = [{ index: 0, content: '...', type: 'heading' }, ...]
```

### 2. Create the table
```ts
import { createChunksTableSQL, createIndexesSQL } from 'pgvector-rag/sql';
import pg from 'pg';

const pool = new pg.Pool({ connectionString: DATABASE_URL });

const { text: createTable } = createChunksTableSQL({ dimensions: 1536 });
await pool.query(createTable);

for (const { text } of createIndexesSQL()) {
  await pool.query(text);
}
```

### 3. Upsert chunks with embeddings
```ts
import { upsertChunksSQL } from 'pgvector-rag/sql';

const records = chunks.map((c, i) => ({
  id: crypto.randomUUID(),
  documentId: 'doc-123',
  chunkIndex: c.index,
  content: c.content,
  embedding: embeddings[i], // from your embedding API
  metadata: { chunk_type: c.type },
}));

const { text, params } = upsertChunksSQL(records);
await pool.query(text, params);
```

### 4. Search with hybrid SQL
```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';
import { selectMMR, buildContext, normalizeScores } from 'pgvector-rag';

// Generate the search SQL
const { text, params } = hybridSearchSQL({
  documentId: 'doc-123',
  queryText: 'How does photosynthesis work?',
  embedding: queryEmbedding, // from your embedding API
  limit: 50,
});

// Execute with your DB client
const { rows } = await pool.query(text, params);

// Map to ScoredChunks
const scored = normalizeScores(rows.map(r => ({
  rrfScore: r.rrf_score,
  id: r.id,
  chunkIndex: r.chunk_index,
  content: r.content,
  embedding: r.embedding, // if you fetched it
})));

// Diversify with MMR
const selected = selectMMR(scored, 10, 0.7);

// Build context string for your LLM
const context = buildContext(selected, 5000);
```

## API Reference
### Chunking

#### `chunk(text, options?)`

Split text into chunks with section awareness, sentence boundaries, and overlap.
```ts
chunk(text: string, options?: {
  maxChunkChars?: number; // default: 1200
  overlapChars?: number;  // default: 250
  maxChunks?: number;     // default: Infinity
}): Chunk[]
```

#### `sanitizeText(text)`

Strip null bytes and control characters.

#### `detectChunkType(content)`

Classify a chunk as `'heading'`, `'list'`, or `'paragraph'`.
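To make the classification concrete, here is a rough sketch of what heading/list/paragraph detection can look like. The heuristics below (line length, terminal punctuation, bullet prefixes) are illustrative assumptions, not the library's actual rules:

```ts
// Sketch of chunk-type detection (hypothetical heuristics, for illustration).
type ChunkType = 'heading' | 'list' | 'paragraph';

function detectChunkTypeSketch(content: string): ChunkType {
  const lines = content.trim().split('\n');
  // A single short line without sentence-ending punctuation reads as a heading.
  if (lines.length === 1 && lines[0].length < 80 && !/[.!?]$/.test(lines[0])) {
    return 'heading';
  }
  // If at least half the lines carry a bullet or number prefix, call it a list.
  const listLines = lines.filter(l => /^\s*([-*+]|\d+[.)])\s/.test(l)).length;
  if (listLines >= lines.length / 2) return 'list';
  return 'paragraph';
}
```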
### Vector Math

#### `cosineSimilarity(a, b)`

Cosine similarity between two vectors. Returns [-1, 1].

#### `l2Normalize(vector)`

L2-normalize a vector to unit length. Returns a new array.
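For reference, the two operations are straightforward math (shown here as a sketch; the library exports its own implementations):

```ts
// Cosine similarity: dot product over the product of the two norms.
function cosineSimilaritySketch(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// L2 normalization: divide each component by the vector's Euclidean norm.
function l2NormalizeSketch(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / norm);
}
```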
### MMR (Maximal Marginal Relevance)

#### `selectMMR(candidates, k, lambda)`

Select `k` chunks balancing relevance and diversity.
- `lambda = 1.0` → pure relevance (no diversity)
- `lambda = 0.0` → pure diversity (ignore scores)
- `lambda = 0.7` → good default for QA
- `lambda = 0.5` → good default for summaries
Falls back to Jaccard token similarity when embeddings are absent.
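The lambda trade-off can be made concrete with a sketch of the standard greedy MMR loop. At each step it picks the candidate maximizing `lambda * relevance - (1 - lambda) * maxSimilarityToSelected` (illustration of the standard algorithm, not the library's exact code):

```ts
interface Scored { score: number; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function mmrSketch<T extends Scored>(candidates: T[], k: number, lambda: number): T[] {
  const selected: T[] = [];
  const pool = [...candidates];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0, bestVal = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Redundancy = similarity to the closest already-selected chunk.
      const redundancy = selected.length
        ? Math.max(...selected.map(s => cosine(pool[i].embedding, s.embedding)))
        : 0;
      const val = lambda * pool[i].score - (1 - lambda) * redundancy;
      if (val > bestVal) { bestVal = val; bestIdx = i; }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With `lambda = 0.7`, a near-duplicate of an already-selected chunk is penalized hard enough that a less relevant but novel chunk wins.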
### Context Building

#### `buildContext(chunks, maxChars)`

Format chunks into an LLM context string. Sorts by index, adds `---` gap separators, respects the character budget.
### Scoring

#### `normalizeScores(chunkRows)`

Convert raw `ChunkRow` objects (from hybrid search) into `ScoredChunk` objects.

#### `deduplicateByIndex(items)`

Keep the highest-scoring entry per `chunkIndex`.
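The deduplication step is simple enough to sketch in a few lines (illustration only; the library ships its own `deduplicateByIndex`):

```ts
// Keep the best-scoring entry for each chunkIndex.
function dedupeByIndexSketch<T extends { chunkIndex: number; score: number }>(items: T[]): T[] {
  const best = new Map<number, T>();
  for (const item of items) {
    const cur = best.get(item.chunkIndex);
    if (!cur || item.score > cur.score) best.set(item.chunkIndex, item);
  }
  return [...best.values()];
}
```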
### Summary Sampling

#### `selectSummaryRepresentatives(candidates, bucketSize?, maxReps?)`

Pick one representative per document section for broad coverage.
### Query Classification

#### `classifyQueryType(query)`

Regex-based classification: `'instructional'`, `'informational'`, or `'definitional'`.

#### `isInstructionalQuery(query)`

Quick boolean check.
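What regex-based classification might look like, as a sketch — the patterns below are hypothetical and may differ from the library's actual regexes:

```ts
type QueryType = 'instructional' | 'informational' | 'definitional';

function classifyQuerySketch(query: string): QueryType {
  const q = query.toLowerCase();
  // "what is X", "define X" → the user wants a definition.
  if (/\b(what is|what are|define|definition of|meaning of)\b/.test(q)) {
    return 'definitional';
  }
  // "how do I", "steps to" → the user wants a procedure.
  if (/\b(how (do|to|can)|steps to|guide|install|configure|set up)\b/.test(q)) {
    return 'instructional';
  }
  return 'informational';
}
```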
### RRF (Reciprocal Rank Fusion)

#### `getRRFWeights(query, queryType?, config?)`

Get RRF signal weights tuned for the query type.

#### `getThresholds(queryType?, config?)`

Get similarity/BM25 thresholds for the query type.
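The fusion itself happens inside the generated SQL, but the idea is standard RRF: each signal contributes `weight / (k + rank)` for every document it ranks. A minimal sketch:

```ts
// Standard Reciprocal Rank Fusion over weighted ranked lists
// (illustration; in this package the fusion runs inside hybridSearchSQL).
function rrfFuse(
  rankings: { weight: number; ids: string[] }[], // ids in rank order, best first
  k = 60, // the RRF constant (rrfK in the config)
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { weight, ids } of rankings) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return scores;
}
```

A document ranked by two signals beats one ranked by a single signal, which is why RRF needs no score calibration across signals.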
### Legacy Reranker

#### `legacyRerank(query, candidates, topN)`

Term-frequency + proximity reranker. Use as a fallback when a cross-encoder (Cohere, etc.) is unavailable.
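The term-frequency half of that idea, sketched (proximity omitted for brevity; the library's `legacyRerank` also weighs term distance, so treat this as an illustration):

```ts
// Score candidates by how often query terms occur in their content.
function tfRerankSketch(
  query: string,
  candidates: { content: string }[],
  topN: number,
) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return candidates
    .map(c => {
      const text = c.content.toLowerCase();
      const score = terms.reduce((s, t) => s + (text.split(t).length - 1), 0);
      return { ...c, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```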
### Configuration

#### `createConfig(overrides?)`

Create a `RAGConfig` with sensible production defaults, optionally overriding specific values.

#### `DEFAULT_CONFIG`

Frozen default config with 25+ tuning knobs. See `src/core/config.ts`.
### Concurrency

#### `new Semaphore(max)`

Counting semaphore for rate-limiting concurrent operations (e.g., embedding API calls).

```ts
const sem = new Semaphore(4);

await sem.acquire();
try {
  /* work */
} finally {
  sem.release();
}
```

### SQL Generators (`pgvector-rag/sql`)
#### `hybridSearchSQL(options)`

3-CTE query combining vector similarity + BM25 + phrase matching via RRF.

#### `createChunksTableSQL(options?)`

`CREATE TABLE` with vector column, tsvector, and unique constraint.

#### `createIndexesSQL(options?)`

HNSW vector index + GIN text-search index + `document_id` index.

#### `upsertChunksSQL(chunks, tableName?)`

Batch `INSERT … ON CONFLICT` with vector and jsonb casting.

#### `deleteChunksSQL(documentId, tableName?)`

`DELETE` all chunks for a document.
## Using with ORMs

### Knex

```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await knex.raw(text, params);
```

### Drizzle
```ts
import { sql } from 'drizzle-orm';
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await db.execute(sql.raw(text, ...params));
```

### Prisma
```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await prisma.$queryRawUnsafe(text, ...params);
```

## Configuration

Every algorithm is configurable via `createConfig()`:
```ts
import { createConfig } from 'pgvector-rag';

const config = createConfig({
  rrfK: 100,         // RRF constant (default: 60)
  kQA: 15,           // Final chunks for QA (default: 10)
  kSummary: 20,      // Final chunks for summaries (default: 14)
  mmrLambdaQA: 0.8,  // MMR trade-off for QA (default: 0.7)
  simThreshold: 0.2, // Minimum cosine similarity (default: 0.15)
});
```

Pass the resulting config to `getRRFWeights()` and `getThresholds()`.
## Coming Soon

- Pipeline builder (`createPipeline({ embedder, db })`)
- Embedder adapters (OpenAI, Cohere, HuggingFace)
- Reranker adapters (Cohere cross-encoder, BGE)
- Streaming chunk insertion
- Chunk overlap deduplication
## License
MIT
