vectra-js
v1.0.2
A production-ready, provider-agnostic Node.js SDK for End-to-End RAG pipelines.
Vectra (Node.js)
Vectra is a production-grade, provider-agnostic Node.js SDK for building end-to-end Retrieval-Augmented Generation (RAG) systems. It is designed for teams that need flexibility, extensibility, correctness, and observability across embeddings, vector databases, retrieval strategies, and LLM providers—without locking into a single vendor.
1. Overview
Vectra provides a fully modular RAG pipeline:
Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
Every stage is explicitly configurable, validated at runtime, and observable.
Key Characteristics
- Provider‑agnostic LLM & embedding layer
- Multiple vector backends (Postgres, Chroma, Qdrant, Milvus)
- Advanced retrieval strategies (HyDE, Multi‑Query, Hybrid RRF, MMR)
- Unified streaming interface
- Built‑in evaluation & observability
- CLI + SDK parity
2. Design Goals & Philosophy
Explicitness over Magic
Vectra avoids hidden defaults. Chunking, retrieval, grounding, memory, and generation behavior are always explicit.
Production‑First
Index helpers, rate limiting, embedding cache, observability, and evaluation are first‑class features.
Provider Neutrality
Swapping OpenAI → Gemini → Anthropic → Ollama requires no application code changes.
Extensibility
Every major subsystem (providers, vector stores, callbacks) is interface‑driven.
3. Feature Matrix
Providers
- Embeddings: OpenAI, Gemini, Ollama, HuggingFace
- Generation: OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
- Streaming: Unified async generator
Vector Stores
- PostgreSQL (Prisma + pgvector)
- PostgreSQL (native pg driver)
- ChromaDB
- Qdrant
- Milvus
Retrieval Strategies
- Naive cosine similarity
- HyDE (Hypothetical Document Embeddings)
- Multi‑Query expansion
- Hybrid semantic + lexical (RRF)
- MMR diversification
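To make the Hybrid RRF strategy concrete, here is a minimal reciprocal rank fusion sketch. This is an illustration of the general algorithm, not Vectra's internal implementation:

```javascript
// Reciprocal Rank Fusion: merge two or more ranked result lists.
// score(doc) = sum over lists of 1 / (k + rank), with k commonly set to 60.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: fuse a semantic ranking with a lexical (keyword) ranking
const fused = rrfFuse([
  ['docA', 'docB', 'docC'], // semantic order
  ['docB', 'docD', 'docA'], // lexical order
]);
console.log(fused); // → [ 'docB', 'docA', 'docD', 'docC' ]
```

Documents appearing in both lists (docA, docB) accumulate score from each, which is why hybrid retrieval tends to surface results that both signals agree on.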
4. Installation
Library
npm install vectra-js
# or
pnpm add vectra-js
Backends:
npm install pg # https://node-postgres.com/
npm install @prisma/client # https://prisma.io/docs
npm install chromadb # https://docs.trychroma.com/
npm install @qdrant/js-client-rest # https://qdrant.tech/documentation/
npm install @zilliz/milvus2-sdk-node # https://milvus.io/docs
CLI
npm i -g vectra-js
# or
pnpm add -g vectra-js
5. Quick Start
const { VectraClient, ProviderType } = require('vectra-js');
const { Pool } = require('pg');
const pool = new Pool({
connectionString: process.env.DATABASE_URL
});
const client = new VectraClient({
embedding: {
provider: ProviderType.OPENAI,
apiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-3-small'
},
llm: {
provider: ProviderType.GEMINI,
apiKey: process.env.GOOGLE_API_KEY,
modelName: 'gemini-2.5-flash'
},
database: {
type: 'postgres',
clientInstance: pool,
tableName: 'document',
columnMap: { 'content': 'content', 'metadata': 'metadata', 'vector': 'vector' }
}
});
await client.ingestDocuments('./docs');
const res = await client.queryRAG('What is the vacation policy?');
console.log(res.answer);
6. Core Concepts
Providers
Providers implement embeddings, generation, or both. Vectra normalizes outputs and streaming across providers.
Vector Stores
Vector stores persist embeddings and metadata. They are fully swappable via config.
Chunking
- Recursive: Character‑aware, separator‑aware splitting
- Agentic: LLM‑driven semantic propositions (best for policies, legal docs)
Retrieval
Controls recall vs precision using multiple strategies.
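For example, the naive strategy scores chunks by cosine similarity between the query embedding and each stored embedding. A self-contained sketch of that scoring function (illustrative, not Vectra's internals):

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // → 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // → 0 (orthogonal)
```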
Reranking
Optional LLM‑based reordering of retrieved chunks.
Metadata Enrichment
Optional per‑chunk summaries, keywords, and hypothetical questions generated at ingestion time.
Query Planning & Grounding
Controls how context is assembled and how strictly answers must be grounded in retrieved text.
Conversation Memory
Persist multi‑turn chat history across sessions.
7. Configuration Reference (Usage‑Driven)
All configuration is validated using Zod at runtime.
Embedding
embedding: {
provider: ProviderType.OPENAI,
apiKey: process.env.OPENAI_API_KEY,
modelName: 'text-embedding-3-small',
dimensions: 1536
}
Use dimensions when using pgvector to avoid runtime mismatches.
LLM
llm: {
provider: ProviderType.GEMINI,
apiKey: process.env.GOOGLE_API_KEY,
modelName: 'gemini-2.5-flash',
temperature: 0.3,
maxTokens: 1024
}
The configured LLM is used for:
- Answer generation
- HyDE & Multi‑Query
- Agentic chunking
- Reranking
Database
Supports Prisma, Postgres (native), Chroma, Qdrant, Milvus.
// PostgreSQL (native pg)
database: {
type: 'postgres',
clientInstance: pool, // new Pool(...)
tableName: 'document',
columnMap: { content: 'content', metadata: 'metadata', vector: 'vector' }
}
// Prisma
database: {
type: 'prisma',
clientInstance: prisma,
tableName: 'Document',
columnMap: { content: 'content', metadata: 'metadata', vector: 'embedding' }
}
// ChromaDB
database: {
type: 'chroma',
clientInstance: chromaClient,
collectionName: 'rag_collection'
}
// Qdrant
database: {
type: 'qdrant',
clientInstance: qdrantClient,
collectionName: 'rag_collection'
}
// Milvus
database: {
type: 'milvus',
clientInstance: milvusClient,
collectionName: 'rag_collection'
}
Chunking
chunking: {
strategy: ChunkingStrategy.RECURSIVE,
chunkSize: 1000,
chunkOverlap: 200
}
Agentic chunking:
chunking: {
strategy: ChunkingStrategy.AGENTIC,
agenticLlm: {
provider: ProviderType.OPENAI,
apiKey: process.env.OPENAI_API_KEY,
modelName: 'gpt-4o-mini'
}
}
Retrieval
retrieval: { strategy: RetrievalStrategy.HYBRID }
HYBRID is recommended for production.
Reranking
reranking: {
enabled: true,
windowSize: 20,
topN: 5
}
Memory
memory: { enabled: true, type: 'in-memory', maxMessages: 20 }
Redis and Postgres are supported.
// Redis
memory: {
enabled: true,
type: 'redis',
maxMessages: 20,
redis: {
clientInstance: redisClient,
keyPrefix: 'vectra:chat:'
}
}
// Postgres
memory: {
enabled: true,
type: 'postgres',
maxMessages: 20,
postgres: {
clientInstance: pool, // pg Pool
tableName: 'ChatMessage',
columnMap: {
sessionId: 'sessionId',
role: 'role',
content: 'content',
createdAt: 'createdAt'
}
}
}
Observability
observability: {
enabled: true,
sqlitePath: 'vectra-observability.db'
}
8. Ingestion Pipeline
await client.ingestDocuments('./documents');
Supports files or directories.
Formats: PDF, DOCX, XLSX, TXT, Markdown
9. Querying & Streaming
const res = await client.queryRAG('Refund policy?');
Streaming:
const stream = await client.queryRAG('Draft email', null, true);
for await (const chunk of stream) process.stdout.write(chunk.delta || '');
10. Conversation Memory
Pass a sessionId to maintain context across turns.
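As an illustration of what maxMessages does in the in-memory store, the trimming behavior can be sketched as a capped per-session buffer. This is a conceptual sketch, not Vectra's internal class:

```javascript
// Minimal sketch of a capped per-session chat buffer, mirroring the
// maxMessages behavior described for the in-memory memory type.
class InMemoryChatMemory {
  constructor(maxMessages = 20) {
    this.maxMessages = maxMessages;
    this.sessions = new Map(); // sessionId -> [{ role, content }]
  }

  append(sessionId, role, content) {
    const history = this.sessions.get(sessionId) || [];
    history.push({ role, content });
    // Keep only the most recent maxMessages entries
    this.sessions.set(sessionId, history.slice(-this.maxMessages));
  }

  get(sessionId) {
    return this.sessions.get(sessionId) || [];
  }
}

const memory = new InMemoryChatMemory(3);
memory.append('s1', 'user', 'What is the vacation policy?');
memory.append('s1', 'assistant', '20 days per year.');
memory.append('s1', 'user', 'Does it roll over?');
memory.append('s1', 'assistant', 'Up to 5 days.');
console.log(memory.get('s1').length); // → 3 (the oldest message was trimmed)
```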
11. Evaluation & Quality Measurement
await client.evaluate([{ question: 'Capital of France?', expectedGroundTruth: 'Paris' }]);
Metrics:
- Faithfulness
- Relevance
12. CLI
Ingest & Query
vectra ingest ./docs --config=./config.json
vectra query "What is our leave policy?" --config=./config.json --stream
WebConfig (Config Generator UI)
vectra webconfig
WebConfig launches a local web UI that:
- Guides you through building a valid vectra.config.json
- Validates all options interactively
- Prevents misconfiguration
This is ideal for:
- First‑time setup
- Non‑backend users
- Sharing configs across teams
Observability Dashboard
vectra dashboard
The Observability Dashboard is a local web UI backed by SQLite that visualizes:
- Ingestion latency
- Query latency
- Retrieval & generation traces
- Chat sessions
It helps you:
- Debug RAG quality issues
- Understand latency bottlenecks
- Monitor production‑like workloads
13. Observability & Callbacks
Observability
Tracks metrics, traces, and sessions automatically when enabled.
Callbacks
Lifecycle hooks:
- Ingestion
- Chunking
- Embedding
- Retrieval
- Reranking
- Generation
- Errors
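A sketch of how such hooks might be wired up. The hook names below are hypothetical, chosen only to illustrate the lifecycle stages listed above; consult the SDK's callback interface for the actual names and signatures:

```javascript
// Hypothetical callback shape — these hook names are illustrative, not
// Vectra's documented API.
const callbacks = {
  onIngestStart: (source) => console.log(`ingesting ${source}`),
  onRetrievalEnd: (chunks) => console.log(`retrieved ${chunks.length} chunks`),
  onError: (err, stage) => console.error(`[${stage}] ${err.message}`),
};

// Hooks are plain functions, so they are easy to unit-test in isolation:
callbacks.onRetrievalEnd([{ content: 'chunk-1' }, { content: 'chunk-2' }]);
// prints "retrieved 2 chunks"
```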
14. Telemetry
Vectra collects anonymous usage data to help us improve the SDK, prioritize features, and detect broken versions.
What we track
- Identity: A random UUID (distinct_id) stored locally in ~/.vectra/telemetry.json. No PII, emails, IPs, or hostnames.
- Events:
  - sdk_initialized: Config shape (providers used), OS/runtime version, session type (api/cli/chat).
  - ingest_started/completed: Source type, chunking strategy, duration bucket, chunk count bucket.
  - query_executed: Retrieval strategy, query mode (rag), result count, latency bucket.
  - feature_used: WebConfig/Dashboard usage.
  - evaluation_run: Dataset size bucket.
  - error_occurred: Error type and stage (no stack traces).
  - cli_command_used: Command name and flags.
Why we track it
- Detect broken versions: Spikes in error_occurred help us find bugs.
- Measure adoption: Helps us understand which providers (OpenAI vs Gemini) and vector stores are most popular.
- Drop support safely: We can see if anyone is still using Node 18 before dropping it.
How to opt-out
Telemetry is enabled by default. To disable it:
Option 1: Config
const client = new VectraClient({
// ...
telemetry: { enabled: false }
});
Option 2: Environment Variable
Set VECTRA_TELEMETRY_DISABLED=1 or DO_NOT_TRACK=1.
15. Database Schemas & Indexing
model Document {
id String @id @default(uuid())
content String
metadata Json
vector Unsupported("vector")?
createdAt DateTime @default(now())
}
16. Extending Vectra
Custom Vector Store
class MyStore extends VectorStore {
  // Argument shapes shown here are indicative; match the VectorStore interface.
  async addDocuments(documents) {
    // persist chunks, embeddings, and metadata
  }
  async similaritySearch(queryEmbedding, k) {
    // return the k nearest stored chunks
  }
}
17. Architecture Overview
- VectraClient: orchestrator
- Typed config schema
- Interface‑driven providers & stores
- Unified streaming abstraction
18. Development & Contribution Guide
- Node.js 18+
- pnpm recommended
- Lint: pnpm run lint
19. Production Best Practices
- Match embedding dimensions to pgvector
- Prefer HYBRID retrieval
- Enable observability in staging
- Evaluate before changing chunk sizes
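To illustrate the first point, a small guard that formats an embedding as a pgvector text literal and rejects dimension mismatches before they surface as Postgres errors. This is a sketch; the 1536 value assumes text-embedding-3-small, as configured earlier in this README:

```javascript
// Guard against embedding/column dimension mismatches before inserting.
// EXPECTED_DIM must match the vector(N) column in Postgres (assumed 1536
// here, matching the text-embedding-3-small config shown above).
const EXPECTED_DIM = 1536;

function toPgvectorLiteral(embedding, expectedDim = EXPECTED_DIM) {
  if (embedding.length !== expectedDim) {
    throw new Error(
      `Embedding has ${embedding.length} dims, column expects ${expectedDim}`
    );
  }
  return `[${embedding.join(',')}]`; // pgvector's text input format
}

console.log(toPgvectorLiteral([0.1, 0.2], 2)); // → "[0.1,0.2]"
```

Failing fast in application code gives a clearer error than a type mismatch raised deep inside an INSERT.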
Vectra scales cleanly from local prototypes to production‑grade RAG platforms.
