rag-sdk-server

v0.1.8

Published

17 days ago

Server-side RAG (Retrieval-Augmented Generation) SDK

Server-side RAG (Retrieval-Augmented Generation) SDK for Node.js. Ingest documents into a vector store and query them with any LLM — with streaming support, incremental re-ingestion, and zero lock-in to any single provider.

Overview

rag-sdk-server is built around two independent constructors:

| Constructor | Purpose |
|---|---|
| RagIngesting | Load documents → chunk → embed → store. Tracks changes via a manifest so only diffs are processed on re-runs. |
| RagMessaging | Embed a query → retrieve relevant chunks → assemble a prompt → generate an answer via LLM. |

All providers (embedders, vector stores, LLMs) are hidden behind clean interfaces. Provider SDKs are optional peer dependencies — install only what you use.

Installation

npm install rag-sdk-server

Then install the provider packages you need:

# Embedders
npm install @langchain/openai openai           # OpenAI
npm install @langchain/cohere cohere-ai        # Cohere
npm install @langchain/community               # Voyage, HuggingFace

# LLMs
npm install @langchain/anthropic @anthropic-ai/sdk      # Claude
npm install @langchain/openai openai                    # GPT
npm install @langchain/google-genai @google/generative-ai  # Gemini
npm install @langchain/cohere cohere-ai                 # Cohere Command

# Vector stores
npm install @qdrant/js-client-rest                      # Qdrant
npm install @langchain/pinecone @pinecone-database/pinecone  # Pinecone
npm install chromadb                                    # Chroma
npm install pg                                          # pgvector
npm install weaviate-client                             # Weaviate

# PDF support (only if ingesting PDFs)
npm install pdf-parse

Quick start

import { RagIngesting, RagMessaging } from "rag-sdk-server";

// 1. Ingest documents — run once, or on a schedule to pick up changes
const ingest = new RagIngesting({
  dataSource: { type: "text-dir", path: "./knowledge" },
  embedder:   { provider: "openai" },
  store:      { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
});

await ingest.run((event) => {
  if (event.type === "progress") console.log(event.message);
  if (event.type === "done")     console.log("Ingestion complete:", event.summary);
});

// 2. Query at runtime — construct once, call .query() per request
const rag = new RagMessaging({
  embedder:     { provider: "openai" },
  store:        { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
  llm:          { provider: "anthropic", model: "claude-sonnet-4-6" },
  systemPrompt: "You are a helpful assistant. Answer using only the provided context.",
});

const { answer, sources } = await rag.query("What is the refund policy?");
console.log(answer);

API

`new RagIngesting(config)`

Ingests documents into a vector store. Tracks which files have changed via a manifest, so re-running only processes what changed.

const ingest = new RagIngesting({
  dataSource?: DataSourceConfig,    // where to read documents from
  embedder?:   EmbedderConfig,      // how to embed text
  store?:      VectorStoreConfig,   // where to store vectors
  chunking?:   ChunkingConfig,      // chunk size and splitting settings
  manifest?:   ManifestStoreConfig, // manifest persistence
});

const summary = await ingest.run(reporter?);
// { added: number, updated: number, removed: number, skipped: number }

`new RagMessaging(config)`

Runtime query handler. Construct once at server startup, call .query() per request.

const rag = new RagMessaging({
  embedder:      EmbedderConfig,    // must match the model used during ingest
  store:         VectorStoreConfig,
  llm:           LLMConfig,
  retrieval?:    { topK?: number; minScore?: number },
  systemPrompt?: string,
});

const { answer, sources } = await rag.query("your question", {
  history?:  Message[],                // prior conversation turns
  filter?:   Record<string, unknown>,  // metadata filter for the vector store
  onToken?:  (token: string) => void,  // streaming callback
});

Config reference

`DataSourceConfig`

| type | Fields | Description |
|---|---|---|
| "text-dir" | path | All .txt / .md files in a directory (recursive) |
| "pdf-dir" | path | All .pdf files in a directory (recursive). Requires pdf-parse. |
| "text-file" | path | A single text file |
| "pdf-file" | path | A single PDF file. Requires pdf-parse. |
| "text-url" | urls: string[] | Fetch text from one or more URLs |
| "pdf-url" | urls: string[] | Fetch and parse PDFs from one or more URLs. Requires pdf-parse. |

{ type: "text-dir",  path: "./docs" }
{ type: "pdf-file",  path: "./report.pdf" }
{ type: "text-url",  urls: ["https://example.com/page"] }

`EmbedderConfig`

| provider | Peer dep | Default model |
|---|---|---|
| "openai" | @langchain/openai | text-embedding-3-small |
| "cohere" | @langchain/cohere | embed-english-v3.0 |
| "voyage" | @langchain/community | voyage-2 |
| "huggingface" | @langchain/community | sentence-transformers/all-MiniLM-L6-v2 |
| "openai-compatible" | @langchain/openai | none (requires baseURL and model) |
| "custom" | none | none (pass embedder: Embedder) |

"openai-compatible" covers Azure OpenAI, Together AI, Mistral, and any OpenAI-compatible API.

If apiKey is omitted, the provider's standard environment variable is used (OPENAI_API_KEY, COHERE_API_KEY, etc.).
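The fallback described above can be pictured with a small sketch. This is not the SDK's own code: `resolveApiKey` and `DEFAULT_ENV_VARS` are hypothetical names, and only the two environment variables named in this README are listed.

```typescript
// Hypothetical lookup table; only the variable names this README mentions.
const DEFAULT_ENV_VARS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  cohere: "COHERE_API_KEY",
};

// Prefer an explicit apiKey, otherwise fall back to the provider's env var.
// `env` is passed in (e.g. process.env) so the function is easy to test.
function resolveApiKey(
  provider: string,
  explicitKey: string | undefined,
  env: Record<string, string | undefined>,
): string {
  if (explicitKey) return explicitKey;
  const envVar = DEFAULT_ENV_VARS[provider];
  const fromEnv = envVar ? env[envVar] : undefined;
  if (!fromEnv) {
    const hint = envVar ? ` or set ${envVar}` : "";
    throw new Error(`No API key for "${provider}": pass apiKey${hint}`);
  }
  return fromEnv;
}
```

In a Node.js process you would typically call it as `resolveApiKey("openai", config.apiKey, process.env)`.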

`VectorStoreConfig`

| provider | Peer dep | Key parameters |
|---|---|---|
| "in-memory" | none | collection? |
| "qdrant" | @qdrant/js-client-rest | url, collection, apiKey? |
| "pinecone" | @langchain/pinecone + @pinecone-database/pinecone | apiKey, index, namespace? |
| "chroma" | chromadb | url?, collection |
| "pgvector" | pg | connectionString, collection, tableName? |
| "weaviate" | weaviate-client | url, className, apiKey? |

The "in-memory" store does not persist across restarts — useful for development and tests.

`LLMConfig`

| provider | Peer dep | Default model |
|---|---|---|
| "anthropic" | @langchain/anthropic | claude-sonnet-4-6 |
| "openai" | @langchain/openai | gpt-4o-mini |
| "google" | @langchain/google-genai | gemini-1.5-flash |
| "cohere" | @langchain/cohere | command-r |
| "custom" | none | none (pass llm: LLM) |

baseURL on the "openai" provider lets you point to OpenRouter, Groq, Together AI, or any OpenAI-compatible endpoint.

`ChunkingConfig`

| Field | Type | Default | Description |
|---|---|---|---|
| size | number | 512 | Target chunk size in characters |
| overlap | number | 64 | Overlap between consecutive chunks in characters |
| separators | string[] | ["\n\n", "\n", " ", ""] | Ordered separators tried when splitting; falls back to the next if a chunk would exceed size |

// Larger chunks for dense technical docs
{ size: 1024, overlap: 128 }

// Code-aware splitting
{ size: 512, overlap: 32, separators: ["\nfunction ", "\nclass ", "\n\n", "\n", " "] }

// Markdown-aware splitting
{ size: 768, overlap: 64, separators: ["\n## ", "\n### ", "\n\n", "\n", " "] }
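To make the interaction of size, overlap, and separators concrete, here is a simplified sketch of separator-fallback chunking. It is an illustration under assumptions, not the SDK's splitter: `chunkText`, `splitRecursive`, and `ChunkingOptions` are hypothetical names, and the real implementation may merge or trim pieces differently.

```typescript
interface ChunkingOptions {
  size: number;         // maximum chunk length in characters
  overlap: number;      // characters repeated from the previous chunk
  separators: string[]; // tried in order; "" means hard character cut
}

// Split on the first separator; pieces still longer than `limit` are split
// again with the remaining separators, ending in a hard cut if none are left.
function splitRecursive(text: string, limit: number, separators: string[]): string[] {
  if (text.length <= limit) return text.length > 0 ? [text] : [];
  const [sep, ...rest] = separators;
  if (sep === undefined || sep === "") {
    const out: string[] = [];
    for (let i = 0; i < text.length; i += limit) out.push(text.slice(i, i + limit));
    return out;
  }
  const parts = text.split(sep);
  // Re-attach the separator to every part except the last so no text is lost.
  return parts.flatMap((part, i) =>
    splitRecursive(i < parts.length - 1 ? part + sep : part, limit, rest),
  );
}

// Greedily merge pieces into chunks of at most `size` characters, seeding
// each new chunk with the last `overlap` characters of the previous one.
function chunkText(text: string, opts: ChunkingOptions): string[] {
  const { size, overlap, separators } = opts;
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be >= 0 and smaller than size");
  }
  // Pieces are capped at size - overlap so seed + piece never exceeds size.
  const pieces = splitRecursive(text, size - overlap, separators);
  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    if (current.length > 0 && current.length + piece.length > size) {
      chunks.push(current);
      current = overlap > 0 ? current.slice(-overlap) : "";
    }
    current += piece;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Listing coarse separators first (headings, blank lines) keeps related text together, while the later entries only come into play for pieces that are still too long.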

`ManifestStoreConfig`

| type | Parameters | Description |
|---|---|---|
| "file" | dir? (default: .rag-manifest) | Persists to <dir>/<collection>.json on disk |
| "memory" | none | In-memory only; resets on restart (useful for tests) |

`RagMessaging` retrieval options

| Field | Type | Default | Description |
|---|---|---|---|
| retrieval.topK | number | 5 | Number of chunks to retrieve per query |
| retrieval.minScore | number | 0 | Minimum similarity score (0–1) to include a chunk |
| systemPrompt | string | "You are a helpful assistant." | System prompt prepended before retrieved context |
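As a rough picture of how topK and minScore interact, the sketch below ranks stored vectors by cosine similarity, drops anything below the threshold, and keeps the best few. Cosine similarity is an assumption here (each vector store defines its own scoring), and `retrieve`, `StoredChunk`, and `cosineSimilarity` are hypothetical names.

```typescript
interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk, filter by minScore, then return the topK best matches.
function retrieve(query: number[], chunks: StoredChunk[], topK = 5, minScore = 0) {
  return chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(query, c.vector) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Raising minScore trades recall for precision: fewer, more relevant chunks reach the prompt, and a query with no good match may return no context at all.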

Incremental ingestion

RagIngesting hashes every source document (SHA-256). On each run() call:

- Unchanged: skipped entirely
- Modified: old chunks deleted, new chunks embedded and stored
- New: embedded and stored
- Deleted: chunks removed from the vector store

Calling run() repeatedly is safe and cheap — only diffs are processed.
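The four cases above can be sketched as a hash comparison against the previous manifest. This is a minimal illustration assuming the manifest is a map from document id to SHA-256 digest; `diffDocuments`, `Manifest`, and `DiffResult` are hypothetical names and the SDK's manifest format may differ.

```typescript
import { createHash } from "node:crypto";

// Assumed manifest shape: document id -> SHA-256 hex digest of its content.
type Manifest = Map<string, string>;

interface DiffResult {
  added: string[];
  updated: string[];
  removed: string[];
  skipped: string[];
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Classify each current document against the previous manifest and build
// the manifest to persist for the next run.
function diffDocuments(
  previous: Manifest,
  current: Map<string, string>, // document id -> content
): { diff: DiffResult; next: Manifest } {
  const diff: DiffResult = { added: [], updated: [], removed: [], skipped: [] };
  const next: Manifest = new Map();
  for (const [id, content] of current) {
    const hash = sha256(content);
    next.set(id, hash);
    const oldHash = previous.get(id);
    if (oldHash === undefined) diff.added.push(id);
    else if (oldHash !== hash) diff.updated.push(id);
    else diff.skipped.push(id);
  }
  for (const id of previous.keys()) {
    if (!current.has(id)) diff.removed.push(id);
  }
  return { diff, next };
}
```

Only the ids in `added` and `updated` would need embedding, which is why a re-run over an unchanged corpus costs little more than hashing the files.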

Streaming

Pass onToken to .query() to receive tokens as the LLM generates them:

await rag.query("Summarise the privacy policy", {
  onToken: (token) => process.stdout.write(token),
});

In an Express route handler (for a WebSocket server, send each token over the socket instead of calling res.write):

await rag.query(userMessage, {
  onToken: (token) => res.write(token),
});
res.end();

Conversation history

Pass previous turns to maintain context across a multi-turn conversation:

const history: Message[] = [];

const { answer } = await rag.query("What is the return window?", { history });
history.push({ role: "user", content: "What is the return window?" });
history.push({ role: "assistant", content: answer });

const { answer: answer2 } = await rag.query("Does that apply to sale items too?", { history });
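For longer conversations it can help to centralise the bookkeeping. The helper below is a sketch, not part of the SDK: `appendTurn` and its `maxTurns` cap are hypothetical, and the local `Message` type only mirrors the role and content fields used in the example above.

```typescript
// Local stand-in for the SDK's Message type, limited to the fields used above.
type Message = { role: "user" | "assistant"; content: string };

// Return a new history with the latest question/answer pair appended,
// keeping only the most recent `maxTurns` turns to bound prompt size.
function appendTurn(
  history: Message[],
  question: string,
  answer: string,
  maxTurns = 10,
): Message[] {
  const next: Message[] = [
    ...history,
    { role: "user", content: question },
    { role: "assistant", content: answer },
  ];
  const keep = Math.max(1, maxTurns) * 2; // two messages per turn
  return next.slice(-keep);
}

// Intended use with the SDK (not executed here):
//   const { answer } = await rag.query(question, { history });
//   history = appendTurn(history, question, answer);
```

Returning a new array instead of mutating the input keeps each request's history independent, which matters when one server process handles several conversations.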

Custom providers

Any component can be replaced with a custom implementation.

Custom embedder:

import type { Embedder } from "rag-sdk-server";

class MyEmbedder implements Embedder {
  readonly model = "my-model-v1";
  readonly dimensions = 768;
  async embed(texts: string[]): Promise<number[][]> {
    // Call your embedding API here and return one vector per input text.
    throw new Error("MyEmbedder.embed is not implemented");
  }
}

embedder: { provider: "custom", embedder: new MyEmbedder() }
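For tests that should not call a real embedding API, a deterministic toy embedder can fill the same slot. The sketch below declares a local `Embedder` interface that mirrors the shape shown above so it runs on its own; `HashingEmbedder` is a hypothetical bag-of-words hasher and is not suitable for real retrieval quality.

```typescript
// Local stand-in mirroring the Embedder shape shown in this README.
interface Embedder {
  readonly model: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Maps each word to a bucket with an FNV-1a hash and L2-normalises the counts.
class HashingEmbedder implements Embedder {
  readonly model = "hashing-bow-v1";
  readonly dimensions: number;

  constructor(dimensions = 64) {
    this.dimensions = dimensions;
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => this.embedOne(t));
  }

  embedOne(text: string): number[] {
    const vec = new Array<number>(this.dimensions).fill(0);
    for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      let h = 2166136261; // FNV-1a offset basis
      for (let i = 0; i < token.length; i++) {
        h ^= token.charCodeAt(i);
        h = Math.imul(h, 16777619); // FNV prime, 32-bit multiply
      }
      vec[(h >>> 0) % this.dimensions] += 1;
    }
    const norm = Math.hypot(...vec);
    return norm === 0 ? vec : vec.map((v) => v / norm);
  }
}
```

Because the output depends only on the input text, the same embedder instance can be used for both ingestion and querying in a test without network access.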

Custom LLM:

import type { LLM, Message } from "rag-sdk-server";

class MyLLM implements LLM {
  async generate(opts: {
    system?: string;
    messages: Message[];
    onToken?: (t: string) => void;
  }): Promise<string> {
    // Call your LLM API here; invoke opts.onToken per token for streaming
    // and return the full generated text.
    throw new Error("MyLLM.generate is not implemented");
  }
}

llm: { provider: "custom", llm: new MyLLM() }
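A canned LLM is a convenient way to exercise the streaming path in tests. The sketch below declares local `LLM` and `Message` types that mirror the shapes shown above so it is self-contained; `CannedLLM` is a hypothetical class that replays a fixed reply word by word.

```typescript
// Local stand-ins mirroring the shapes shown in this README.
type Message = { role: "user" | "assistant"; content: string };

interface LLM {
  generate(opts: {
    system?: string;
    messages: Message[];
    onToken?: (t: string) => void;
  }): Promise<string>;
}

// Replays a fixed reply, emitting it one whitespace-delimited token at a time.
class CannedLLM implements LLM {
  private readonly reply: string;

  constructor(reply: string) {
    this.reply = reply;
  }

  async generate(opts: {
    system?: string;
    messages: Message[];
    onToken?: (t: string) => void;
  }): Promise<string> {
    // Each token keeps its surrounding whitespace, so the concatenated
    // tokens reproduce the reply exactly.
    const tokens = this.reply.match(/\s*\S+\s*/g) ?? [];
    for (const token of tokens) opts.onToken?.(token);
    return this.reply;
  }
}
```

The returned string and the concatenation of streamed tokens are the same text, which is the property a caller relies on when it both streams to a client and logs the final answer.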

Error handling

| Error | When thrown |
|---|---|
| MissingProviderError | A required peer dependency is not installed |
| EmbeddingModelMismatchError | Query-time embedder differs from the model used at ingest |
| ProviderConfigError | Invalid configuration for a provider |
| RagSdkError | Base class for all SDK errors |

import { EmbeddingModelMismatchError, MissingProviderError } from "rag-sdk-server";

try {
  const { answer } = await rag.query("...");
} catch (err) {
  if (err instanceof EmbeddingModelMismatchError) {
    console.error("Embedder mismatch: re-ingest with the current model.");
  } else if (err instanceof MissingProviderError) {
    console.error(err.message); // tells you exactly which package to install
  } else {
    throw err; // not an error this handler knows how to report
  }
}

If you change embedding models, re-ingest your documents from scratch — vectors from different models are not compatible.
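Because RagSdkError is the base class, the order of instanceof checks matters: test the specific subclasses first and the base class last. The sketch below uses local stand-in classes with the same names so it runs on its own; in an application you would import the real classes from "rag-sdk-server". The `classifyError` helper and its labels are hypothetical.

```typescript
// Local stand-ins for the SDK's error classes, mirroring the hierarchy above.
class RagSdkError extends Error {}
class MissingProviderError extends RagSdkError {}
class EmbeddingModelMismatchError extends RagSdkError {}
class ProviderConfigError extends RagSdkError {}

type ErrorKind = "missing-provider" | "model-mismatch" | "bad-config" | "sdk" | "unknown";

// Check subclasses before the base class; otherwise every SDK error
// would be reported as the generic "sdk" kind.
function classifyError(err: unknown): ErrorKind {
  if (err instanceof MissingProviderError) return "missing-provider";
  if (err instanceof EmbeddingModelMismatchError) return "model-mismatch";
  if (err instanceof ProviderConfigError) return "bad-config";
  if (err instanceof RagSdkError) return "sdk";
  return "unknown";
}
```

A single classifier like this keeps logging and user-facing messages consistent across every place that calls .query() or run().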

License

Apache 2.0
