rag-sdk-server
v0.1.8
Server-side RAG (Retrieval-Augmented Generation) SDK for Node.js. Ingest documents into a vector store and query them with any LLM — with streaming support, incremental re-ingestion, and zero lock-in to any single provider.
Overview
rag-sdk-server is built around two independent constructors:
| Constructor | Purpose |
|---|---|
| RagIngesting | Load documents → chunk → embed → store. Tracks changes via a manifest so only diffs are processed on re-runs. |
| RagMessaging | Embed a query → retrieve relevant chunks → assemble a prompt → generate an answer via LLM. |
All providers (embedders, vector stores, LLMs) are hidden behind clean interfaces. Provider SDKs are optional peer dependencies — install only what you use.
Installation
npm install rag-sdk-server
Then install the provider packages you need:
# Embedders
npm install @langchain/openai openai # OpenAI
npm install @langchain/cohere cohere-ai # Cohere
npm install @langchain/community # Voyage, HuggingFace
# LLMs
npm install @langchain/anthropic @anthropic-ai/sdk # Claude
npm install @langchain/openai openai # GPT
npm install @langchain/google-genai @google/generative-ai # Gemini
npm install @langchain/cohere cohere-ai # Cohere Command
# Vector stores
npm install @qdrant/js-client-rest # Qdrant
npm install @langchain/pinecone @pinecone-database/pinecone # Pinecone
npm install chromadb # Chroma
npm install pg # pgvector
# PDF support (only if ingesting PDFs)
npm install pdf-parse
Quick start
import { RagIngesting, RagMessaging } from "rag-sdk-server";
// 1. Ingest documents: run once, or on a schedule to pick up changes
const ingest = new RagIngesting({
  dataSource: { type: "text-dir", path: "./knowledge" },
  embedder: { provider: "openai" },
  store: { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
});
await ingest.run((event) => {
  if (event.type === "progress") console.log(event.message);
  if (event.type === "done") console.log("Ingestion complete:", event.summary);
});
// 2. Query at runtime: construct once, call .query() per request
const rag = new RagMessaging({
  embedder: { provider: "openai" },
  store: { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
  llm: { provider: "anthropic", model: "claude-sonnet-4-6" },
  systemPrompt: "You are a helpful assistant. Answer using only the provided context.",
});
const { answer, sources } = await rag.query("What is the refund policy?");
console.log(answer);
API
new RagIngesting(config)
Ingests documents into a vector store. Tracks which files have changed via a manifest, so re-running only processes what changed.
const ingest = new RagIngesting({
  dataSource?: DataSourceConfig, // where to read documents from
  embedder?: EmbedderConfig, // how to embed text
  store?: VectorStoreConfig, // where to store vectors
  chunking?: ChunkingConfig, // chunk size and splitting settings
  manifest?: ManifestStoreConfig, // manifest persistence
});
const summary = await ingest.run(reporter?);
// { added: number, updated: number, removed: number, skipped: number }
new RagMessaging(config)
Runtime query handler. Construct once at server startup, call .query() per request.
const rag = new RagMessaging({
  embedder: EmbedderConfig, // must match the model used during ingest
  store: VectorStoreConfig,
  llm: LLMConfig,
  retrieval?: { topK?: number; minScore?: number },
  systemPrompt?: string,
});
const { answer, sources } = await rag.query("your question", {
  history?: Message[], // prior conversation turns
  filter?: Record<string, unknown>, // metadata filter for the vector store
  onToken?: (token: string) => void, // streaming callback
});
Config reference
DataSourceConfig
| type | Fields | Description |
|---|---|---|
| "text-dir" | path | All .txt / .md files in a directory (recursive) |
| "pdf-dir" | path | All .pdf files in a directory (recursive). Requires pdf-parse. |
| "text-file" | path | A single text file |
| "pdf-file" | path | A single PDF file. Requires pdf-parse. |
| "text-url" | urls: string[] | Fetch text from one or more URLs |
| "pdf-url" | urls: string[] | Fetch and parse PDFs from one or more URLs. Requires pdf-parse. |
{ type: "text-dir", path: "./docs" }
{ type: "pdf-file", path: "./report.pdf" }
{ type: "text-url", urls: ["https://example.com/page"] }
EmbedderConfig
| provider | Peer dep | Default model |
|---|---|---|
| "openai" | @langchain/openai | text-embedding-3-small |
| "cohere" | @langchain/cohere | embed-english-v3.0 |
| "voyage" | @langchain/community | voyage-2 |
| "huggingface" | @langchain/community | sentence-transformers/all-MiniLM-L6-v2 |
| "openai-compatible" | @langchain/openai | — (requires baseURL and model) |
| "custom" | — | — (pass embedder: Embedder) |
"openai-compatible" covers Azure OpenAI, Together AI, Mistral, and any OpenAI-compatible API.
If apiKey is omitted, the provider's standard environment variable is used (OPENAI_API_KEY, COHERE_API_KEY, etc.).
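The fallback described above can be pictured with a small sketch. `resolveApiKey` and `DEFAULT_ENV_VARS` are hypothetical names used only for illustration, not part of the SDK's API, and only the two variable names mentioned in this README are listed.

```typescript
// Hypothetical helper: illustrates "explicit apiKey first, environment second".
// It is an assumption about the behaviour, not the SDK's actual code.
type EnvLike = Record<string, string | undefined>;

// Only the environment variables named in this README.
const DEFAULT_ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  cohere: "COHERE_API_KEY",
};

function resolveApiKey(
  provider: string,
  explicitKey: string | undefined,
  env: EnvLike,
): string {
  // An apiKey passed in the config always wins.
  if (explicitKey) return explicitKey;

  const envVar = DEFAULT_ENV_VARS[provider];
  if (!envVar) {
    throw new Error(`No default environment variable known for "${provider}"`);
  }

  const fromEnv = env[envVar];
  if (!fromEnv) {
    throw new Error(`Missing API key: pass apiKey or set ${envVar}`);
  }
  return fromEnv;
}
```

In a Node.js process you would pass `process.env` as the third argument; taking it as a parameter keeps the helper easy to test.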
VectorStoreConfig
| provider | Peer dep | Key parameters |
|---|---|---|
| "in-memory" | none | collection? |
| "qdrant" | @qdrant/js-client-rest | url, collection, apiKey? |
| "pinecone" | @langchain/pinecone + @pinecone-database/pinecone | apiKey, index, namespace? |
| "chroma" | chromadb | url?, collection |
| "pgvector" | pg | connectionString, collection, tableName? |
| "weaviate" | weaviate-client | url, className, apiKey? |
The "in-memory" store does not persist across restarts — useful for development and tests.
LLMConfig
| provider | Peer dep | Default model |
|---|---|---|
| "anthropic" | @langchain/anthropic | claude-sonnet-4-6 |
| "openai" | @langchain/openai | gpt-4o-mini |
| "google" | @langchain/google-genai | gemini-1.5-flash |
| "cohere" | @langchain/cohere | command-r |
| "custom" | — | — (pass llm: LLM) |
baseURL on the "openai" provider lets you point to OpenRouter, Groq, Together AI, or any OpenAI-compatible endpoint.
ChunkingConfig
| Field | Type | Default | Description |
|---|---|---|---|
| size | number | 512 | Target chunk size in characters |
| overlap | number | 64 | Overlap between consecutive chunks in characters |
| separators | string[] | ["\n\n", "\n", " ", ""] | Ordered separators tried when splitting; falls back to the next if a chunk would exceed size |
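To make size and overlap concrete, here is a minimal sliding-window sketch. It is an illustration under simplifying assumptions: it ignores separators entirely, and `chunkText` is a hypothetical helper, not the SDK's splitter.

```typescript
// Hypothetical helper: fixed-size character windows with overlap.
// The SDK's real splitter also honours `separators`; this sketch does not.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be at least 0 and smaller than size");
  }

  const chunks: string[] = [];
  const step = size - overlap; // distance between consecutive chunk starts

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window already reaches the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

With the default values, each chunk starts 448 characters (size minus overlap) after the previous one, so the last 64 characters of one chunk are repeated at the start of the next.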
// Larger chunks for dense technical docs
{ size: 1024, overlap: 128 }
// Code-aware splitting
{ size: 512, overlap: 32, separators: ["\nfunction ", "\nclass ", "\n\n", "\n", " "] }
// Markdown-aware splitting
{ size: 768, overlap: 64, separators: ["\n## ", "\n### ", "\n\n", "\n", " "] }
ManifestStoreConfig
| type | Parameters | Description |
|---|---|---|
| "file" | dir? (default: .rag-manifest) | Persists to <dir>/<collection>.json on disk |
| "memory" | — | In-memory only; resets on restart (useful for tests) |
RagMessaging retrieval options
| Field | Type | Default | Description |
|---|---|---|---|
| retrieval.topK | number | 5 | Number of chunks to retrieve per query |
| retrieval.minScore | number | 0 | Minimum similarity score (0–1) to include a chunk |
| systemPrompt | string | "You are a helpful assistant." | System prompt prepended before retrieved context |
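As a rough illustration of how these options interact, the sketch below scores chunks, applies minScore and topK, and prepends the system prompt to the retrieved context. It assumes cosine similarity and an in-memory list of chunks; the actual scoring is done by the configured vector store, and every name here (`retrieve`, `buildSystemMessage`, the interfaces) is hypothetical.

```typescript
// Hypothetical types and helpers; real scoring happens inside the vector store.
interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
}

interface RetrievedChunk {
  id: string;
  text: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimensions");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks scoring at least minScore, best first, at most topK of them.
function retrieve(
  queryVector: number[],
  chunks: StoredChunk[],
  topK = 5,
  minScore = 0,
): RetrievedChunk[] {
  return chunks
    .map(({ id, text, vector }) => ({
      id,
      text,
      score: cosineSimilarity(queryVector, vector),
    }))
    .filter((chunk) => chunk.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Assumed prompt layout: system prompt first, then numbered context passages.
function buildSystemMessage(systemPrompt: string, retrieved: RetrievedChunk[]): string {
  const context = retrieved
    .map((chunk, i) => `[${i + 1}] ${chunk.text}`)
    .join("\n\n");
  return `${systemPrompt}\n\nContext:\n${context}`;
}
```

Raising minScore trades recall for precision: fewer, more relevant chunks reach the prompt, and a query with no chunk above the threshold yields an empty context.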
Incremental ingestion
RagIngesting hashes every source document (SHA-256). On each run() call:
- Unchanged — skipped entirely
- Modified — old chunks deleted, new chunks embedded and stored
- New — embedded and stored
- Deleted — chunks removed from the vector store
Calling run() repeatedly is safe and cheap — only diffs are processed.
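The four cases above amount to comparing the current hashes against the previous manifest. The following is a minimal sketch of that comparison, assuming the manifest maps a document id to its SHA-256 hex digest; `planIngestion` and the types are hypothetical and the SDK's manifest format may differ.

```typescript
import { createHash } from "node:crypto";

// Assumed manifest shape: document id -> SHA-256 hex digest of its content.
type Manifest = Record<string, string>;

interface IngestPlan {
  added: string[];
  updated: string[];
  removed: string[];
  skipped: string[];
  next: Manifest; // manifest to persist after this run
}

function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function has(record: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(record, key);
}

// Hypothetical planner: classify every document as added, updated, skipped or removed.
function planIngestion(previous: Manifest, current: Record<string, string>): IngestPlan {
  const plan: IngestPlan = { added: [], updated: [], removed: [], skipped: [], next: {} };

  for (const id of Object.keys(current)) {
    const hash = sha256(current[id]);
    plan.next[id] = hash;
    if (!has(previous, id)) plan.added.push(id); // new document
    else if (previous[id] !== hash) plan.updated.push(id); // content changed
    else plan.skipped.push(id); // unchanged
  }

  for (const id of Object.keys(previous)) {
    if (!has(current, id)) plan.removed.push(id); // source no longer exists
  }
  return plan;
}
```

Only the added and updated ids need to be embedded, which is why a second run over unchanged sources does no embedding work in this sketch.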
Streaming
Pass onToken to .query() to receive tokens as the LLM generates them:
await rag.query("Summarise the privacy policy", {
  onToken: (token) => process.stdout.write(token),
});
In an Express route handler (a WebSocket server would send each token over the socket instead of calling res.write):
await rag.query(userMessage, {
  onToken: (token) => res.write(token),
});
res.end();
Conversation history
Pass previous turns to maintain context across a multi-turn conversation:
import type { Message } from "rag-sdk-server";
const history: Message[] = [];
const { answer } = await rag.query("What is the return window?", { history });
history.push({ role: "user", content: "What is the return window?" });
history.push({ role: "assistant", content: answer });
const { answer: answer2 } = await rag.query("Does that apply to sale items too?", { history });
Custom providers
Any component can be replaced with a custom implementation.
Custom embedder:
import type { Embedder } from "rag-sdk-server";
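class MyEmbedder implements Embedder {
  readonly model = "my-model-v1";
  readonly dimensions = 768;
  async embed(texts: string[]): Promise<number[][]> {
    // Call your embedding API here and return one vector per input text.
    throw new Error("MyEmbedder.embed is not implemented yet");
  }
}
// Then pass it in the RagIngesting / RagMessaging config:
embedder: { provider: "custom", embedder: new MyEmbedder() }
Custom LLM: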
import type { LLM, Message } from "rag-sdk-server";
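class MyLLM implements LLM {
  async generate(opts: {
    system?: string;
    messages: Message[];
    onToken?: (t: string) => void;
  }): Promise<string> {
    // Call your LLM API here; invoke opts.onToken per token for streaming,
    // then return the full answer text.
    throw new Error("MyLLM.generate is not implemented yet");
  }
}
// Then pass it in the RagMessaging config:
llm: { provider: "custom", llm: new MyLLM() }
Error handling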
| Error | When thrown |
|---|---|
| MissingProviderError | A required peer dependency is not installed |
| EmbeddingModelMismatchError | Query-time embedder differs from the model used at ingest |
| ProviderConfigError | Invalid configuration for a provider |
| RagSdkError | Base class for all SDK errors |
import { EmbeddingModelMismatchError, MissingProviderError } from "rag-sdk-server";
try {
  const { answer } = await rag.query("...");
} catch (err) {
  if (err instanceof EmbeddingModelMismatchError) {
    console.error("Embedder mismatch: re-ingest with the current model.");
  }
  if (err instanceof MissingProviderError) {
    console.error(err.message); // tells you exactly which package to install
  }
}
If you change embedding models, re-ingest your documents from scratch: vectors from different models are not compatible.
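One way to picture the mismatch check is a guard that compares the embedder recorded at ingest time with the one used at query time. This is a sketch under assumptions: it supposes the model name and dimensions are stored alongside the collection, and the classes below are local stand-ins with hypothetical names, not the SDK's exported errors.

```typescript
// Local stand-ins for illustration only; the SDK's real error classes may differ.
class SketchRagError extends Error {}

class SketchEmbeddingMismatchError extends SketchRagError {
  readonly ingested: string;
  readonly current: string;

  constructor(ingested: string, current: string) {
    super(`Collection was embedded with ${ingested} but the query uses ${current}`);
    this.name = "SketchEmbeddingMismatchError";
    this.ingested = ingested;
    this.current = current;
  }
}

// Assumed metadata recorded at ingest time and known at query time.
interface EmbedderInfo {
  model: string;
  dimensions: number;
}

// Throws when the query-time embedder cannot be compared with stored vectors.
function assertSameEmbedder(ingested: EmbedderInfo, current: EmbedderInfo): void {
  const sameModel = ingested.model === current.model;
  const sameDimensions = ingested.dimensions === current.dimensions;
  if (!sameModel || !sameDimensions) {
    throw new SketchEmbeddingMismatchError(
      `${ingested.model} (${ingested.dimensions} dimensions)`,
      `${current.model} (${current.dimensions} dimensions)`,
    );
  }
}
```

Failing fast like this is preferable to querying anyway, because similarity scores between vectors from different models are meaningless even when the dimensions happen to match.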
