rag-sdk-server

v0.1.8

Server-side RAG (Retrieval-Augmented Generation) SDK

Server-side RAG (Retrieval-Augmented Generation) SDK for Node.js. Ingest documents into a vector store and query them with any LLM — with streaming support, incremental re-ingestion, and zero lock-in to any single provider.


Overview

rag-sdk-server is built around two independent constructors:

| Constructor | Purpose |
|---|---|
| RagIngesting | Load documents → chunk → embed → store. Tracks changes via a manifest so only diffs are processed on re-runs. |
| RagMessaging | Embed a query → retrieve relevant chunks → assemble a prompt → generate an answer via LLM. |

All providers (embedders, vector stores, LLMs) are hidden behind clean interfaces. Provider SDKs are optional peer dependencies — install only what you use.


Installation

npm install rag-sdk-server

Then install the provider packages you need:

# Embedders
npm install @langchain/openai openai           # OpenAI
npm install @langchain/cohere cohere-ai        # Cohere
npm install @langchain/community               # Voyage, HuggingFace

# LLMs
npm install @langchain/anthropic @anthropic-ai/sdk      # Claude
npm install @langchain/openai openai                    # GPT
npm install @langchain/google-genai @google/generative-ai  # Gemini
npm install @langchain/cohere cohere-ai                 # Cohere Command

# Vector stores
npm install @qdrant/js-client-rest                      # Qdrant
npm install @langchain/pinecone @pinecone-database/pinecone  # Pinecone
npm install chromadb                                    # Chroma
npm install pg                                          # pgvector
npm install weaviate-client                             # Weaviate

# PDF support (only if ingesting PDFs)
npm install pdf-parse

Quick start

import { RagIngesting, RagMessaging } from "rag-sdk-server";

// 1. Ingest documents — run once, or on a schedule to pick up changes
const ingest = new RagIngesting({
  dataSource: { type: "text-dir", path: "./knowledge" },
  embedder:   { provider: "openai" },
  store:      { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
});

await ingest.run((event) => {
  if (event.type === "progress") console.log(event.message);
  if (event.type === "done")     console.log("Ingestion complete:", event.summary);
});

// 2. Query at runtime — construct once, call .query() per request
const rag = new RagMessaging({
  embedder:     { provider: "openai" },
  store:        { provider: "qdrant", url: "http://localhost:6333", collection: "kb" },
  llm:          { provider: "anthropic", model: "claude-sonnet-4-6" },
  systemPrompt: "You are a helpful assistant. Answer using only the provided context.",
});

const { answer, sources } = await rag.query("What is the refund policy?");
console.log(answer);

API

new RagIngesting(config)

Ingests documents into a vector store. Tracks which files have changed via a manifest, so re-running only processes what changed.

const ingest = new RagIngesting({
  dataSource?: DataSourceConfig,    // where to read documents from
  embedder?:   EmbedderConfig,      // how to embed text
  store?:      VectorStoreConfig,   // where to store vectors
  chunking?:   ChunkingConfig,      // chunk size and splitting settings
  manifest?:   ManifestStoreConfig, // manifest persistence
});

const summary = await ingest.run(reporter?);
// { added: number, updated: number, removed: number, skipped: number }

new RagMessaging(config)

Runtime query handler. Construct once at server startup, call .query() per request.

const rag = new RagMessaging({
  embedder:      EmbedderConfig,    // must match the model used during ingest
  store:         VectorStoreConfig,
  llm:           LLMConfig,
  retrieval?:    { topK?: number; minScore?: number },
  systemPrompt?: string,
});

const { answer, sources } = await rag.query("your question", {
  history?:  Message[],                // prior conversation turns
  filter?:   Record<string, unknown>,  // metadata filter for the vector store
  onToken?:  (token: string) => void,  // streaming callback
});
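The README does not show how retrieved chunks and the system prompt are combined before the LLM call. As a rough illustration of that prompt-assembly step only, here is a minimal sketch. `RetrievedChunk`, `ChatMessage`, `assemblePrompt`, and the context layout are hypothetical names and choices made for this example; the SDK's real prompt format may differ.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface RetrievedChunk {
  text: string;
  source: string;
  score: number;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface AssembledPrompt {
  system: string;
  messages: ChatMessage[];
}

function assemblePrompt(
  systemPrompt: string,
  chunks: RetrievedChunk[],
  question: string,
  history: ChatMessage[] = [],
): AssembledPrompt {
  // Number each chunk so an answer can refer back to its source.
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");

  // The system prompt comes first, followed by the retrieved context (if any).
  const system =
    chunks.length > 0 ? `${systemPrompt}\n\nContext:\n${context}` : systemPrompt;

  // Prior turns go before the new question; the input history is not mutated.
  return { system, messages: [...history, { role: "user", content: question }] };
}
```

Keeping the context in the system text and the conversation in `messages` is one possible split; it lines up with the `system` / `messages` arguments of the custom LLM interface shown later in this README.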

Config reference

DataSourceConfig

| type | Fields | Description |
|---|---|---|
| "text-dir" | path | All .txt / .md files in a directory (recursive) |
| "pdf-dir" | path | All .pdf files in a directory (recursive). Requires pdf-parse. |
| "text-file" | path | A single text file |
| "pdf-file" | path | A single PDF file. Requires pdf-parse. |
| "text-url" | urls: string[] | Fetch text from one or more URLs |
| "pdf-url" | urls: string[] | Fetch and parse PDFs from one or more URLs. Requires pdf-parse. |

{ type: "text-dir",  path: "./docs" }
{ type: "pdf-file",  path: "./report.pdf" }
{ type: "text-url",  urls: ["https://example.com/page"] }

EmbedderConfig

| provider | Peer dep | Default model |
|---|---|---|
| "openai" | @langchain/openai | text-embedding-3-small |
| "cohere" | @langchain/cohere | embed-english-v3.0 |
| "voyage" | @langchain/community | voyage-2 |
| "huggingface" | @langchain/community | sentence-transformers/all-MiniLM-L6-v2 |
| "openai-compatible" | @langchain/openai | none (requires baseURL and model) |
| "custom" | none | none (pass embedder: Embedder) |

"openai-compatible" covers Azure OpenAI, Together AI, Mistral, and any other OpenAI-compatible API.

If apiKey is omitted, the provider's standard environment variable is used (OPENAI_API_KEY, COHERE_API_KEY, etc.).

VectorStoreConfig

| provider | Peer dep | Key parameters |
|---|---|---|
| "in-memory" | none | collection? |
| "qdrant" | @qdrant/js-client-rest | url, collection, apiKey? |
| "pinecone" | @langchain/pinecone + @pinecone-database/pinecone | apiKey, index, namespace? |
| "chroma" | chromadb | url?, collection |
| "pgvector" | pg | connectionString, collection, tableName? |
| "weaviate" | weaviate-client | url, className, apiKey? |

The "in-memory" store does not persist across restarts — useful for development and tests.

LLMConfig

| provider | Peer dep | Default model |
|---|---|---|
| "anthropic" | @langchain/anthropic | claude-sonnet-4-6 |
| "openai" | @langchain/openai | gpt-4o-mini |
| "google" | @langchain/google-genai | gemini-1.5-flash |
| "cohere" | @langchain/cohere | command-r |
| "custom" | none | none (pass llm: LLM) |

Setting baseURL on the "openai" provider lets you point to OpenRouter, Groq, Together AI, or any other OpenAI-compatible endpoint.

ChunkingConfig

| Field | Type | Default | Description |
|---|---|---|---|
| size | number | 512 | Target chunk size in characters |
| overlap | number | 64 | Overlap between consecutive chunks in characters |
| separators | string[] | ["\n\n", "\n", " ", ""] | Ordered separators tried when splitting; falls back to the next if a chunk would exceed size |

// Larger chunks for dense technical docs
{ size: 1024, overlap: 128 }

// Code-aware splitting
{ size: 512, overlap: 32, separators: ["\nfunction ", "\nclass ", "\n\n", "\n", " "] }

// Markdown-aware splitting
{ size: 768, overlap: 64, separators: ["\n## ", "\n### ", "\n\n", "\n", " "] }
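To make the separator fallback concrete, here is a minimal sketch of recursive splitting with an ordered separator list. It is not the SDK's implementation: `splitRecursive` and `withOverlap` are hypothetical helpers, and the overlap handling (prefixing each chunk with the tail of the previous one, so a chunk may exceed `size` by up to `overlap` characters) is a simplification chosen for this example.

```typescript
// Split text into pieces of at most `size` characters, trying each separator
// in order and falling back to the next one for pieces that are still too long.
function splitRecursive(text: string, size: number, separators: string[]): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (text.length <= size) return text.length > 0 ? [text] : [];

  const [sep, ...rest] = separators;
  if (sep === undefined || sep === "") {
    // No usable separator left: cut at fixed character offsets.
    const cuts: string[] = [];
    for (let i = 0; i < text.length; i += size) cuts.push(text.slice(i, i + size));
    return cuts;
  }

  const out: string[] = [];
  let current = "";
  for (const piece of text.split(sep)) {
    const candidate = current === "" ? piece : current + sep + piece;
    if (candidate.length <= size) {
      current = candidate; // keep merging small pieces into one chunk
      continue;
    }
    if (current !== "") out.push(current);
    if (piece.length <= size) {
      current = piece;
    } else {
      // This piece alone is too long: retry it with the remaining separators.
      out.push(...splitRecursive(piece, size, rest));
      current = "";
    }
  }
  if (current !== "") out.push(current);
  return out;
}

// Prefix every chunk after the first with the last `overlap` characters
// of the chunk before it.
function withOverlap(chunks: string[], overlap: number): string[] {
  return chunks.map((chunk, i) => {
    if (i === 0 || overlap <= 0) return chunk;
    const previous = chunks[i - 1] ?? "";
    return previous.slice(-overlap) + chunk;
  });
}
```

With this sketch, putting "\n## " first in the list keeps a Markdown section together whenever it fits within `size`, and only oversized sections are broken on the finer separators.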

ManifestStoreConfig

| type | Parameters | Description |
|---|---|---|
| "file" | dir? (default: .rag-manifest) | Persists to <dir>/<collection>.json on disk |
| "memory" | none | In-memory only; resets on restart (useful for tests) |

RagMessaging retrieval options

| Field | Type | Default | Description |
|---|---|---|---|
| retrieval.topK | number | 5 | Number of chunks to retrieve per query |
| retrieval.minScore | number | 0 | Minimum similarity score (0–1) to include a chunk |
| systemPrompt | string | "You are a helpful assistant." | System prompt prepended before retrieved context |
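To show how topK and minScore interact, here is a small in-memory sketch. It assumes cosine similarity as the score, which this README does not confirm (each vector store may use its own metric), and `StoredChunk`, `cosineSimilarity`, and `retrieve` are hypothetical names used only for this example.

```typescript
interface StoredChunk {
  id: string;
  vector: number[];
}

interface ScoredChunk {
  id: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimensions");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every chunk, drop those below minScore, then keep the topK best.
function retrieve(
  query: number[],
  chunks: StoredChunk[],
  topK = 5,
  minScore = 0,
): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .filter((c) => c.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

In this sketch the filter runs before the cut, so a high minScore can return fewer than topK chunks, or none at all.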


Incremental ingestion

RagIngesting hashes every source document (SHA-256). On each run() call:

  • Unchanged — skipped entirely
  • Modified — old chunks deleted, new chunks embedded and stored
  • New — embedded and stored
  • Deleted — chunks removed from the vector store

Calling run() repeatedly is safe and cheap — only diffs are processed.
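The four cases above can be sketched as a comparison between the previous manifest and the current set of documents. This is an illustration under assumptions, not the SDK's code: the manifest is modeled as a plain map from document id to SHA-256 digest, and `hashContent`, `planIngestion`, and `IngestPlan` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface IngestPlan {
  added: string[];
  updated: string[];
  removed: string[];
  skipped: string[];
}

// SHA-256 hex digest of a document's content.
function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// previous: document id -> hash recorded on the last run
// documents: document id -> current content
function planIngestion(
  previous: Map<string, string>,
  documents: Map<string, string>,
): IngestPlan {
  const plan: IngestPlan = { added: [], updated: [], removed: [], skipped: [] };

  for (const [id, content] of documents) {
    const known = previous.get(id);
    if (known === undefined) plan.added.push(id); // new document
    else if (known !== hashContent(content)) plan.updated.push(id); // content changed
    else plan.skipped.push(id); // unchanged
  }

  // Anything recorded last time but missing now has been deleted.
  for (const id of previous.keys()) {
    if (!documents.has(id)) plan.removed.push(id);
  }
  return plan;
}
```

The lengths of the four lists correspond to the `added`, `updated`, `removed`, and `skipped` counts in the summary returned by `run()`.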


Streaming

Pass onToken to .query() to receive tokens as the LLM generates them:

await rag.query("Summarise the privacy policy", {
  onToken: (token) => process.stdout.write(token),
});

In an Express route handler (for a WebSocket server, send each token over the socket instead of calling res.write):

await rag.query(userMessage, {
  onToken: (token) => res.write(token),
});
res.end();

Conversation history

Pass previous turns to maintain context across a multi-turn conversation:

import type { Message } from "rag-sdk-server";

const history: Message[] = [];

const { answer } = await rag.query("What is the return window?", { history });
history.push({ role: "user", content: "What is the return window?" });
history.push({ role: "assistant", content: answer });

const { answer: answer2 } = await rag.query("Does that apply to sale items too?", { history });
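Pushing two entries after every call gets repetitive, and an unbounded history makes every prompt longer. One way to handle both is a small helper like the sketch below. `appendTurn` and its `maxTurns` cap are hypothetical and not part of the SDK, and the local `Message` interface is only an assumed shape based on the objects pushed in the example above.

```typescript
// Assumed shape, mirroring the { role, content } objects used above.
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Return a new history with the latest question/answer pair appended,
// keeping only the most recent `maxTurns` turns (two messages per turn).
function appendTurn(
  history: Message[],
  question: string,
  answer: string,
  maxTurns = 10,
): Message[] {
  const next: Message[] = [
    ...history,
    { role: "user", content: question },
    { role: "assistant", content: answer },
  ];
  return maxTurns > 0 ? next.slice(-maxTurns * 2) : [];
}
```

Because the helper returns a new array instead of mutating its input, the same history can safely be shared between concurrent requests.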

Custom providers

Any component can be replaced with a custom implementation.

Custom embedder:

import type { Embedder } from "rag-sdk-server";

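class MyEmbedder implements Embedder {
  readonly model = "my-model-v1";
  readonly dimensions = 768;
  async embed(texts: string[]): Promise<number[][]> {
    // Call your embedding API here and return one vector per input text.
    throw new Error("MyEmbedder.embed is not implemented yet");
  }
}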

embedder: { provider: "custom", embedder: new MyEmbedder() }
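For tests or offline development, a deterministic embedder that needs no API key can be handy. The sketch below is a toy bag-of-words hashing embedder, not a real embedding model, and its vectors carry no semantic meaning. The `Embedder` interface is re-declared locally (copied from the shape shown above) so the block compiles on its own; `HashingEmbedder` and `embedOne` are hypothetical names.

```typescript
// Local copy of the interface shape shown above, for a self-contained sketch.
interface Embedder {
  readonly model: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

class HashingEmbedder implements Embedder {
  readonly model = "hashing-bow-v0";
  readonly dimensions: number;

  constructor(dimensions = 64) {
    this.dimensions = dimensions;
  }

  // Hash each word into a bucket, count occurrences, then L2-normalize.
  embedOne(text: string): number[] {
    const vector = new Array<number>(this.dimensions).fill(0);
    const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
    for (const word of words) {
      let hash = 2166136261; // FNV-1a offset basis
      for (let i = 0; i < word.length; i++) {
        hash ^= word.charCodeAt(i);
        hash = Math.imul(hash, 16777619); // FNV prime, 32-bit multiply
      }
      const bucket = (hash >>> 0) % this.dimensions;
      vector[bucket] = (vector[bucket] ?? 0) + 1;
    }
    const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
    return norm === 0 ? vector : vector.map((v) => v / norm);
  }

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => this.embedOne(t));
  }
}
```

It would be passed the same way as any other custom embedder, for example `{ provider: "custom", embedder: new HashingEmbedder(64) }`, and the same instance settings must be used for both ingestion and querying.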

Custom LLM:

import type { LLM, Message } from "rag-sdk-server";

class MyLLM implements LLM {
  async generate(opts: {
    system?: string;
    messages: Message[];
    onToken?: (t: string) => void;
  }): Promise<string> {
    // Call your LLM API here; invoke opts.onToken per token for streaming
    // and return the complete answer.
    throw new Error("MyLLM.generate is not implemented yet");
  }
}

llm: { provider: "custom", llm: new MyLLM() }

Error handling

| Error | When thrown |
|---|---|
| MissingProviderError | A required peer dependency is not installed |
| EmbeddingModelMismatchError | Query-time embedder differs from the model used at ingest |
| ProviderConfigError | Invalid configuration for a provider |
| RagSdkError | Base class for all SDK errors |

import { EmbeddingModelMismatchError, MissingProviderError } from "rag-sdk-server";

try {
  const { answer } = await rag.query("...");
} catch (err) {
  if (err instanceof EmbeddingModelMismatchError) {
    console.error("Embedder mismatch — re-ingest with the current model.");
  }
  if (err instanceof MissingProviderError) {
    console.error(err.message); // tells you exactly which package to install
  }
}

If you change embedding models, re-ingest your documents from scratch — vectors from different models are not compatible.
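Because RagSdkError is the base class, the order of instanceof checks matters: specific errors have to be tested before the base class, or they would all be caught by the generic branch. The sketch below shows that ordering with local stand-in classes that only mirror the names in the table, so it runs without the package; in a real project you would import the classes from "rag-sdk-server". `classifyError` and its return values are hypothetical.

```typescript
// Stand-ins mirroring the documented hierarchy; not the SDK's own classes.
class RagSdkError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}
class MissingProviderError extends RagSdkError {}
class EmbeddingModelMismatchError extends RagSdkError {}
class ProviderConfigError extends RagSdkError {}

type ErrorAction = "re-ingest" | "install-package" | "fix-config" | "sdk-error" | "unknown";

// Map an error to a follow-up action, checking subclasses before the base class.
function classifyError(err: unknown): ErrorAction {
  if (err instanceof EmbeddingModelMismatchError) return "re-ingest";
  if (err instanceof MissingProviderError) return "install-package";
  if (err instanceof ProviderConfigError) return "fix-config";
  if (err instanceof RagSdkError) return "sdk-error"; // any other SDK error
  return "unknown"; // not an SDK error; usually best to rethrow
}
```

A request handler could log the action and turn "unknown" into a rethrow so that unrelated failures are not swallowed.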


License

Apache 2.0