@halo-sdk/rag
v2.0.0
Published
Cache-aware retrieval-augmented generation for Halo AI SDK — sticky retrieval + append-only growth that keeps the prefix cache warm
Downloads
214
Maintainers
Readme
@halo-sdk/rag
Cache-aware retrieval-augmented generation for Halo AI SDK.
Most RAG implementations re-retrieve every turn and splice fresh documents into the prompt — which silently busts the provider's prefix cache on every message. @halo-sdk/rag produces a reusable cache segment with two cache-preserving policies:
- Sticky retrieval — near-duplicate consecutive queries skip re-retrieval entirely; the segment (and the cache after it) is untouched.
- Append-only growth — when a new result set extends the previous one, new docs are appended rather than rebuilding, so the already-cached prefix stays valid.
Usage
import { Halo } from "@halo-sdk/core";
import { CacheAwareRag, VectorRetriever, MemoryVectorStore } from "@halo-sdk/rag";
const store = new MemoryVectorStore();
// store.add(docs, await embedder.embed(docs.map(d => d.text)))
const retriever = new VectorRetriever(myEmbedder, store);
const rag = new CacheAwareRag({ retriever, k: 4 });
await rag.update("what is prompt caching?");
const agent = halo.agent({
/* ... */
});
agent.setContextSegments([rag.segment]);Seams
Embedder, VectorStore, and Retriever are interfaces — bring your own embedding model and vector DB. MemoryVectorStore (cosine) and VectorRetriever are included for tests, demos, and small corpora. Halo owns the cache-aware orchestration, not the embedding/storage.
