# pgvector-rag
Lightweight RAG toolkit for PostgreSQL + pgvector. Zero runtime dependencies.
Extracted from a production RAG pipeline serving thousands of queries. Provides the algorithms and SQL you need without the framework lock-in.
## Why this exists

| | pgvector-rag | LangChain | LlamaIndex |
|---|---|---|---|
| Runtime deps | 0 | 50+ | 40+ |
| Bundle size | ~15 KB | ~2 MB | ~1.5 MB |
| DB lock-in | pgvector only | Many adapters | Many adapters |
| Chunking | Built-in | Built-in | Built-in |
| Hybrid search SQL | Yes | No (needs driver) | No |
| MMR | Yes | Yes | Yes |
| RRF fusion | Yes | No | No |
| Bring your own DB client | Yes | No | No |
## Install

```sh
npm install pgvector-rag
```

## Quick Start
### 1. Chunk a document
```ts
import { chunk } from 'pgvector-rag';

const chunks = chunk(documentText, {
  maxChunkChars: 1200,
  overlapChars: 250,
});
// chunks = [{ index: 0, content: '...', type: 'heading' }, ...]
```

### 2. Create the table
```ts
import { createChunksTableSQL, createIndexesSQL } from 'pgvector-rag/sql';
import pg from 'pg';

const pool = new pg.Pool({ connectionString: DATABASE_URL });

const { text: createTable } = createChunksTableSQL({ dimensions: 1536 });
await pool.query(createTable);

for (const { text } of createIndexesSQL()) {
  await pool.query(text);
}
```

### 3. Upsert chunks with embeddings
```ts
import { upsertChunksSQL } from 'pgvector-rag/sql';

const records = chunks.map((c, i) => ({
  id: crypto.randomUUID(),
  documentId: 'doc-123',
  chunkIndex: c.index,
  content: c.content,
  embedding: embeddings[i], // from your embedding API
  metadata: { chunk_type: c.type },
}));

const { text, params } = upsertChunksSQL(records);
await pool.query(text, params);
```

### 4. Search with hybrid SQL
```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';
import { selectMMR, buildContext, normalizeScores } from 'pgvector-rag';

// Generate the search SQL
const { text, params } = hybridSearchSQL({
  documentId: 'doc-123',
  queryText: 'How does photosynthesis work?',
  embedding: queryEmbedding, // from your embedding API
  limit: 50,
});

// Execute with your DB client
const { rows } = await pool.query(text, params);

// Map to ScoredChunks
const scored = normalizeScores(rows.map(r => ({
  rrfScore: r.rrf_score,
  id: r.id,
  chunkIndex: r.chunk_index,
  content: r.content,
  embedding: r.embedding, // if you fetched it
})));

// Diversify with MMR
const selected = selectMMR(scored, 10, 0.7);

// Build context string for your LLM
const context = buildContext(selected, 5000);
```

## API Reference
### Chunking

#### `chunk(text, options?)`

Split text into chunks with section awareness, sentence boundaries, and overlap.
```ts
chunk(text: string, options?: {
  maxChunkChars?: number; // default: 1200
  overlapChars?: number;  // default: 250
  maxChunks?: number;     // default: Infinity
}): Chunk[]
```

#### `sanitizeText(text)`

Strip null bytes and control characters.

#### `detectChunkType(content)`

Classify a chunk as `'heading'`, `'list'`, or `'paragraph'`.
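To make the classification concrete, here is a rough sketch of what heading/list/paragraph detection can look like. The heuristics below (line length, terminal punctuation, bullet prefixes) are illustrative assumptions, not the library's actual rules:

```ts
// Sketch of chunk-type detection (hypothetical heuristics, for illustration).
type ChunkType = 'heading' | 'list' | 'paragraph';

function detectChunkTypeSketch(content: string): ChunkType {
  const lines = content.trim().split('\n');
  // A single short line without sentence-ending punctuation reads as a heading.
  if (lines.length === 1 && lines[0].length < 80 && !/[.!?]$/.test(lines[0])) {
    return 'heading';
  }
  // If at least half the lines carry a bullet or number prefix, call it a list.
  const listLines = lines.filter(l => /^\s*([-*+]|\d+[.)])\s/.test(l)).length;
  if (listLines >= lines.length / 2) return 'list';
  return 'paragraph';
}
```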
### Vector Math

#### `cosineSimilarity(a, b)`

Cosine similarity between two vectors. Returns [-1, 1].

#### `l2Normalize(vector)`

L2-normalize a vector to unit length. Returns a new array.
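For reference, the two operations are straightforward math (shown here as a sketch; the library exports its own implementations):

```ts
// Cosine similarity: dot product over the product of the two norms.
function cosineSimilaritySketch(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// L2 normalization: divide each component by the vector's Euclidean norm.
function l2NormalizeSketch(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / norm);
}
```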
### MMR (Maximal Marginal Relevance)

#### `selectMMR(candidates, k, lambda)`

Select `k` chunks balancing relevance and diversity.
- `lambda = 1.0` → pure relevance (no diversity)
- `lambda = 0.0` → pure diversity (ignore scores)
- `lambda = 0.7` → good default for QA
- `lambda = 0.5` → good default for summaries
Falls back to Jaccard token similarity when embeddings are absent.
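The lambda trade-off can be made concrete with a sketch of the standard greedy MMR loop. At each step it picks the candidate maximizing `lambda * relevance - (1 - lambda) * maxSimilarityToSelected` (illustration of the standard algorithm, not the library's exact code):

```ts
interface Scored { score: number; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function mmrSketch<T extends Scored>(candidates: T[], k: number, lambda: number): T[] {
  const selected: T[] = [];
  const pool = [...candidates];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0, bestVal = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Redundancy = similarity to the closest already-selected chunk.
      const redundancy = selected.length
        ? Math.max(...selected.map(s => cosine(pool[i].embedding, s.embedding)))
        : 0;
      const val = lambda * pool[i].score - (1 - lambda) * redundancy;
      if (val > bestVal) { bestVal = val; bestIdx = i; }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With `lambda = 0.7`, a near-duplicate of an already-selected chunk is penalized hard enough that a less relevant but novel chunk wins.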
### Context Building

#### `buildContext(chunks, maxChars)`

Format chunks into an LLM context string. Sorts by index, adds `---` gap separators, respects the character budget.
### Scoring

#### `normalizeScores(chunkRows)`

Convert raw `ChunkRow` objects (from hybrid search) into `ScoredChunk` objects.

#### `deduplicateByIndex(items)`

Keep the highest-scoring entry per `chunkIndex`.
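The deduplication step is simple enough to sketch in a few lines (illustration only; the library ships its own `deduplicateByIndex`):

```ts
// Keep the best-scoring entry for each chunkIndex.
function dedupeByIndexSketch<T extends { chunkIndex: number; score: number }>(items: T[]): T[] {
  const best = new Map<number, T>();
  for (const item of items) {
    const cur = best.get(item.chunkIndex);
    if (!cur || item.score > cur.score) best.set(item.chunkIndex, item);
  }
  return [...best.values()];
}
```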
### Summary Sampling

#### `selectSummaryRepresentatives(candidates, bucketSize?, maxReps?)`

Pick one representative per document section for broad coverage.
### Query Classification

#### `classifyQueryType(query)`

Regex-based classification: `'instructional'`, `'informational'`, or `'definitional'`.

#### `isInstructionalQuery(query)`

Quick boolean check.
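What regex-based classification might look like, as a sketch — the patterns below are hypothetical and may differ from the library's actual regexes:

```ts
type QueryType = 'instructional' | 'informational' | 'definitional';

function classifyQuerySketch(query: string): QueryType {
  const q = query.toLowerCase();
  // "what is X", "define X" → the user wants a definition.
  if (/\b(what is|what are|define|definition of|meaning of)\b/.test(q)) {
    return 'definitional';
  }
  // "how do I", "steps to" → the user wants a procedure.
  if (/\b(how (do|to|can)|steps to|guide|install|configure|set up)\b/.test(q)) {
    return 'instructional';
  }
  return 'informational';
}
```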
### RRF (Reciprocal Rank Fusion)

#### `getRRFWeights(query, queryType?, config?)`

Get RRF signal weights tuned for the query type.

#### `getThresholds(queryType?, config?)`

Get similarity/BM25 thresholds for the query type.
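The fusion itself happens inside the generated SQL, but the idea is standard RRF: each signal contributes `weight / (k + rank)` for every document it ranks. A minimal sketch:

```ts
// Standard Reciprocal Rank Fusion over weighted ranked lists
// (illustration; in this package the fusion runs inside hybridSearchSQL).
function rrfFuse(
  rankings: { weight: number; ids: string[] }[], // ids in rank order, best first
  k = 60, // the RRF constant (rrfK in the config)
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { weight, ids } of rankings) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return scores;
}
```

A document ranked by two signals beats one ranked by a single signal, which is why RRF needs no score calibration across signals.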
### Legacy Reranker

#### `legacyRerank(query, candidates, topN)`

Term-frequency + proximity reranker. Use as a fallback when a cross-encoder (Cohere, etc.) is unavailable.
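The term-frequency half of that idea, sketched (proximity omitted for brevity; the library's `legacyRerank` also weighs term distance, so treat this as an illustration):

```ts
// Score candidates by how often query terms occur in their content.
function tfRerankSketch(
  query: string,
  candidates: { content: string }[],
  topN: number,
) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return candidates
    .map(c => {
      const text = c.content.toLowerCase();
      const score = terms.reduce((s, t) => s + (text.split(t).length - 1), 0);
      return { ...c, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```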
### Configuration

#### `createConfig(overrides?)`

Create a `RAGConfig` with sensible production defaults, optionally overriding specific values.

#### `DEFAULT_CONFIG`

Frozen default config with 25+ tuning knobs. See `src/core/config.ts`.
### Concurrency

#### `new Semaphore(max)`

Counting semaphore for rate-limiting concurrent operations (e.g., embedding API calls).

```ts
const sem = new Semaphore(4);

await sem.acquire();
try {
  /* work */
} finally {
  sem.release();
}
```

### SQL Generators (`pgvector-rag/sql`)
#### `hybridSearchSQL(options)`

3-CTE query combining vector similarity + BM25 + phrase matching via RRF.

#### `createChunksTableSQL(options?)`

`CREATE TABLE` with vector column, tsvector, and unique constraint.

#### `createIndexesSQL(options?)`

HNSW vector index + GIN text-search index + `document_id` index.

#### `upsertChunksSQL(chunks, tableName?)`

Batch `INSERT … ON CONFLICT` with vector and jsonb casting.

#### `deleteChunksSQL(documentId, tableName?)`

`DELETE` all chunks for a document.
## Using with ORMs

### Knex

```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await knex.raw(text, params);
```

### Drizzle
```ts
import { sql } from 'drizzle-orm';
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await db.execute(sql.raw(text, ...params));
```

### Prisma
```ts
import { hybridSearchSQL } from 'pgvector-rag/sql';

const { text, params } = hybridSearchSQL({ ... });
const rows = await prisma.$queryRawUnsafe(text, ...params);
```

## Configuration

Every algorithm is configurable via `createConfig()`:
```ts
import { createConfig } from 'pgvector-rag';

const config = createConfig({
  rrfK: 100,         // RRF constant (default: 60)
  kQA: 15,           // Final chunks for QA (default: 10)
  kSummary: 20,      // Final chunks for summaries (default: 14)
  mmrLambdaQA: 0.8,  // MMR trade-off for QA (default: 0.7)
  simThreshold: 0.2, // Minimum cosine similarity (default: 0.15)
});
```

Pass the resulting config to `getRRFWeights()` and `getThresholds()`.
## Coming Soon

- Pipeline builder (`createPipeline({ embedder, db })`)
- Embedder adapters (OpenAI, Cohere, HuggingFace)
- Reranker adapters (Cohere cross-encoder, BGE)
- Streaming chunk insertion
- Chunk overlap deduplication
## License
MIT
