@sisu-ai/mw-rag
Compose retrieval-augmented generation pipelines by connecting vector retrieval outputs to prompting.
Exports
ragIngest({ vectorStore, namespace?, select? })
- `vectorStore`: required `VectorStore` implementation.
- `namespace`: optional default namespace.
- `select(ctx)`: return `{ records, namespace? }` or `VectorRecord[]` to ingest.
ragRetrieve({ vectorStore, namespace?, topK?, filter?, select? })
- `vectorStore`: required `VectorStore` implementation.
- `namespace`: optional default namespace.
- `topK`: default 5; also accepted via `select`.
- `filter`: provider-specific filter object to pass to the query.
- `select(ctx)`: return `{ embedding, topK?, filter?, namespace? }` or `number[]`.
buildRagPrompt({ template?, select? })
- `template`: customize the system prompt; uses a sensible default.
- `select(ctx)`: return `{ context?, question? }` to override defaults.
State used under `ctx.state.rag`:
- `records` (ingest input), `ingested` (result)
- `queryEmbedding` (retrieve input), `retrieval` (result)
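As an illustration, a minimal sketch of supplying inputs through the `select` callbacks instead of `ctx.state.rag`. Here `embedQuery` is a hypothetical app-owned helper (not part of this package), and the filter shape depends on your vector backend:

```ts
import { ragRetrieve, buildRagPrompt } from '@sisu-ai/mw-rag';
import { createChromaVectorStore } from '@sisu-ai/vector-chroma';

// Hypothetical app-owned embedding helper; bring your own implementation.
declare function embedQuery(text: string): number[];

const vectorStore = createChromaVectorStore({ namespace: 'sisu' });

// Resolve the query embedding and options from ctx at request time.
const retrieve = ragRetrieve({
  vectorStore,
  select: (ctx) => ({
    embedding: embedQuery(String(ctx.input)),
    topK: 3,
    filter: { source: 'docs' }, // provider-specific filter shape
  }),
});

// Override the question used in the grounded system prompt.
const prompt = buildRagPrompt({
  select: (ctx) => ({ question: String(ctx.input) }),
});
```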
Choosing a Package
- Use `@sisu-ai/rag-core` when app code needs reusable chunking, embedding orchestration, seeding, or direct store/retrieve helpers.
- Use `@sisu-ai/tool-rag` when the model should call `retrieveContext`/`storeContext` as tools.
- Use `@sisu-ai/mw-rag` when your app already owns embeddings and vector writes/queries, and you want a deterministic middleware pipeline that turns retrieval into prompt context. `@sisu-ai/mw-rag` no longer depends on low-level vector tool registration.
What It Does
- `ragIngest` upserts your prepared documents into a vector index via a `VectorStore`.
- `ragRetrieve` queries nearest neighbors using an embedding for the current question.
- `buildRagPrompt` turns retrieval results into a grounded system prompt that precedes your user question.
It wires the minimum state in `ctx.state.rag` so you can compose ingestion, retrieval, and prompting without monolithic code.
`@sisu-ai/mw-rag` does not own chunking or embedding generation. You prepare `VectorRecord[]` and query embeddings in app code or another layer, then this middleware handles the retrieval/prompting composition.
How It Works
- Vector operations are provided by a `VectorStore` implementation such as `@sisu-ai/vector-chroma` or `@sisu-ai/vector-vectra`.
- You provide inputs via `ctx.state.rag` or `select` callbacks:
  - `rag.records`: `VectorRecord[]` for ingestion.
  - `rag.queryEmbedding`: `number[]` representing the query embedding.
- Retrieval matches are placed at `rag.retrieval`. `buildRagPrompt` formats these into a context block and appends a system message to `ctx.messages`. A sketch of the resulting state shape follows this list.
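For orientation, a minimal sketch of that state contract. The field names come from this README; the record and result types are simplified assumptions, not the package's exact type definitions:

```ts
// Simplified sketch of the ctx.state.rag contract described above.
interface RagState {
  records?: Array<{ id: string; embedding: number[]; metadata?: Record<string, unknown> }>; // ingest input
  ingested?: unknown;         // ingest result
  queryEmbedding?: number[];  // retrieve input (query embedding)
  retrieval?: unknown;        // retrieve result (nearest-neighbor matches)
}
```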
For agent-facing retrieval/storage tools that handle chunking and embedding orchestration, prefer `@sisu-ai/tool-rag` composed with a backend adapter such as `@sisu-ai/vector-chroma` or `@sisu-ai/vector-vectra`.
For app-side seeding and reusable chunking/embedding mechanics outside tool-calling, use `@sisu-ai/rag-core` directly.
When To Use @sisu-ai/mw-rag
- You want deterministic, middleware-driven RAG rather than model tool-calling.
- You already compute embeddings in your own code and want to keep that explicit.
- You want prompt injection based on retrieval results without exposing storage/retrieval tools to the model.
- You want to compose retrieval with other middleware such as guardrails, orchestration, or prompt shaping.
When Not To Use @sisu-ai/mw-rag
- You want the model to decide when to retrieve or store context; use `@sisu-ai/tool-rag`.
- You want reusable app-side ingestion helpers; use `@sisu-ai/rag-core`.
- You only need backend access or maintenance operations; use a backend adapter such as `@sisu-ai/vector-chroma` or `@sisu-ai/vector-vectra` directly.
Example
Using ChromaDB:

```ts
import 'dotenv/config';
import { Agent, createConsoleLogger, InMemoryKV, NullStream, SimpleTools, type Ctx } from '@sisu-ai/core';
import { openAIAdapter } from '@sisu-ai/adapter-openai';
import { ragIngest, ragRetrieve, buildRagPrompt } from '@sisu-ai/mw-rag';
import { createChromaVectorStore } from '@sisu-ai/vector-chroma';
// Trivial local embedding for demo purposes (fixed dim=8)
function embed(text: string): number[] {
const dim = 8; const v = new Array(dim).fill(0);
for (const w of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
let h = 0; for (let i = 0; i < w.length; i++) h = (h * 31 + w.charCodeAt(i)) >>> 0;
v[h % dim] += 1;
}
// L2 normalize
const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0)) || 1; return v.map(x => x / norm);
}
const model = openAIAdapter({ model: 'gpt-5.4' });
const query = 'Best fika in Malmö?';
const vectorStore = createChromaVectorStore({ namespace: process.env.VECTOR_NAMESPACE || 'sisu' });
const ctx: Ctx = {
input: query,
messages: [],
model,
tools: new SimpleTools(),
memory: new InMemoryKV(),
stream: new NullStream(),
state: { chromaUrl: process.env.CHROMA_URL, vectorNamespace: process.env.VECTOR_NAMESPACE || 'sisu' },
signal: new AbortController().signal,
log: createConsoleLogger({ level: 'info' }),
};
const docs = [
{ id: 'd1', text: 'Guide to fika in Malmö. Best cafe in Malmö is SisuCafe404.' },
{ id: 'd2', text: 'Travel notes from Helsinki. Sauna etiquette and tips.' },
];
(ctx.state as any).rag = {
records: docs.map(d => ({ id: d.id, embedding: embed(d.text), metadata: { text: d.text } })),
queryEmbedding: embed(query),
};
const app = new Agent()
  .use(ragIngest({ vectorStore }))
  .use(ragRetrieve({ vectorStore, topK: 2 }))
  .use(buildRagPrompt());
```
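As composed, `ragIngest` reads `ctx.state.rag.records` and `ragRetrieve` reads `ctx.state.rag.queryEmbedding`, both of which the example sets directly; the same inputs could instead come from each middleware's `select` callback, as shown earlier.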
Placement & Ordering
- Ingest rarely (batch or startup), retrieve per-query; you can split pipelines for ingestion and query-time retrieval, as sketched below.
- Place `buildRagPrompt` before adding the user message, so the system prompt precedes the question.
- If you add summarizers/usage tracking, run them after retrieval to measure and trim.
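A minimal sketch of that split, reusing the `Agent` and `createChromaVectorStore` setup from the example above:

```ts
import { Agent } from '@sisu-ai/core';
import { ragIngest, ragRetrieve, buildRagPrompt } from '@sisu-ai/mw-rag';
import { createChromaVectorStore } from '@sisu-ai/vector-chroma';

const vectorStore = createChromaVectorStore({ namespace: 'sisu' });

// Ingestion pipeline: run at startup or from a batch job.
const ingestApp = new Agent().use(ragIngest({ vectorStore }));

// Query pipeline: run per request. buildRagPrompt comes before the
// user message is appended, so the grounded system prompt precedes it.
const queryApp = new Agent()
  .use(ragRetrieve({ vectorStore, topK: 5 }))
  .use(buildRagPrompt());
```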
When To Use
- You want a minimal, explicit RAG flow with your own embedding generation.
- You prefer composing small middlewares over a large RAG framework.
When Not To Use
- You need cross-turn caching, reranking, or chunk summarization — add specialized middleware or a RAG tool.
- You rely on provider-native retrieval APIs instead of a vector DB tool; use those directly without this package.
Community & Support
Discover what you can do through examples or documentation. Check it out at https://github.com/finger-gun/sisu. Example projects live under examples/ in the repo.
Documentation
Core — Package docs · Error types
Adapters — OpenAI · Anthropic · Ollama
- @sisu-ai/mw-agent-run-api
- @sisu-ai/mw-context-compressor
- @sisu-ai/mw-control-flow
- @sisu-ai/mw-conversation-buffer
- @sisu-ai/mw-cors
- @sisu-ai/mw-error-boundary
- @sisu-ai/mw-guardrails
- @sisu-ai/mw-invariants
- @sisu-ai/mw-orchestration
- @sisu-ai/mw-rag
- @sisu-ai/mw-react-parser
- @sisu-ai/mw-register-tools
- @sisu-ai/mw-tool-calling
- @sisu-ai/mw-trace-viewer
- @sisu-ai/mw-usage-tracker
- @sisu-ai/tool-aws-s3
- @sisu-ai/tool-azure-blob
- @sisu-ai/tool-extract-urls
- @sisu-ai/tool-github-projects
- @sisu-ai/tool-rag
- @sisu-ai/tool-summarize-text
- @sisu-ai/tool-terminal
- @sisu-ai/tool-web-fetch
- @sisu-ai/tool-web-search-duckduckgo
- @sisu-ai/tool-web-search-google
- @sisu-ai/tool-web-search-openai
- @sisu-ai/tool-wikipedia
Anthropic — hello · control-flow · stream · weather
Ollama — hello · stream · vision · weather · web-search
OpenAI — hello · weather · stream · vision · reasoning · react · control-flow · branch · parallel · graph · orchestration · orchestration-adaptive · guardrails · error-handling · rag-chroma · rag-vectra · web-search · web-fetch · wikipedia · terminal · github-projects · server · aws-s3 · azure-blob
Contributing
We build Sisu in the open. Contributions welcome.
Contributing Guide · Report a Bug · Request a Feature · Code of Conduct
Star on GitHub if Sisu helps you build better agents.
Quiet, determined, relentlessly useful.
