@msm-core/context
v0.2.2
Context assembly layer for msm-mini — parallel store queries, tier merging, token budget enforcement
@msm-core/context
Context assembly layer for AI agents — parallel store queries, tier merging, and token budget enforcement.
@msm-core/context retrieves, merges, and trims knowledge from multiple vector stores before each agent call. It is designed as the context-supply partner of @msm-core/mini, but works with any agent runtime or standalone.
Zero runtime dependencies. Pure TypeScript. HTTP calls use the fetch built into Node.js 18+, so no separate HTTP client is required.
Install
npm install @msm-core/context

Quick Start
import { createContextAssembler } from "@msm-core/context";
import { QdrantAdapter } from "@msm-core/context/adapters";

// Create a Qdrant adapter for each knowledge collection
const kbAdapter = QdrantAdapter.create({
  url: process.env.QDRANT_URL!,
  collection: "knowledge-base",
  embedProvider: "gemini",
  embedModel: "gemini-embedding-001",
  embedApiKey: process.env.GEMINI_API_KEY!,
});

const sessionAdapter = QdrantAdapter.create({
  url: process.env.QDRANT_URL!,
  collection: "session-notes",
  embedProvider: "gemini",
  embedModel: "gemini-embedding-001",
  embedApiKey: process.env.GEMINI_API_KEY!,
});

// Create the assembler
const assembler = createContextAssembler({
  tiers: [
    // Lower priority number = higher priority (included first)
    { name: "kb", priority: 1, adapters: [kbAdapter], topK: 8 },
    { name: "session", priority: 2, adapters: [sessionAdapter], topK: 5 },
  ],
  budget: { maxTokens: 6000 },
});

// Before each agent call:
const ctx = await assembler.build({
  text: userMessage,
  sessionId: "user-123",
});

console.log(
  `${ctx.results.length} results, ${ctx.totalTokens} tokens, truncated=${ctx.truncated}`,
);

Concepts
Tiers
A tier is a named group of adapters queried in parallel. Each tier has a priority (integer — lower = higher priority). Results from tier 1 always appear before tier 2 in the merged output, regardless of score.
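The ordering rule above (tier priority first, score second) can be sketched in a few lines. This is a minimal illustration under assumed shapes, not the library's implementation: `Hit`, `TierHits`, and `orderByTierThenScore` are hypothetical names introduced here.

```typescript
// Hypothetical shapes for illustration; not the library's internal types.
interface Hit {
  content: string;
  score: number;
}

interface TierHits {
  tierName: string;
  priority: number; // lower number = higher priority
  results: Hit[];
}

// Order by tier priority first, then by descending score within each tier.
function orderByTierThenScore(tiers: TierHits[]): Hit[] {
  const ordered: Hit[] = [];
  const byPriority = [...tiers].sort((a, b) => a.priority - b.priority);
  for (const tier of byPriority) {
    const byScore = [...tier.results].sort((a, b) => b.score - a.score);
    ordered.push(...byScore);
  }
  return ordered;
}

const ordered = orderByTierThenScore([
  { tierName: "session", priority: 2, results: [{ content: "note", score: 0.95 }] },
  {
    tierName: "kb",
    priority: 1,
    results: [
      { content: "kb-low", score: 0.4 },
      { content: "kb-high", score: 0.7 },
    ],
  },
]);

// The session note scores highest but still comes last, because its tier
// has the lower priority.
console.log(ordered.map((hit) => hit.content)); // [ 'kb-high', 'kb-low', 'note' ]
```

A consequence of this rule is that a weak match in a high-priority tier outranks a strong match in a lower one, so per-tier minScore thresholds matter more than they would in a purely score-sorted merge.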
Use tiers to model retrieval layers:
tiers: [
  { name: "ground_truth", priority: 1, adapters: [countryRegulations] }, // always first
  { name: "project_docs", priority: 2, adapters: [projectKb] },
  { name: "session_notes", priority: 3, adapters: [sessionNotes] },
];

Token Budget
The assembler enforces a token budget using a 4-chars-per-token estimate (configurable). Items marked neverTruncate: true are always kept; remaining budget is filled by normal results in tier+score order.
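A budget pass along these lines might look like the sketch below. It is not the library's `fitToBudget`: `BudgetItem`, `roughTokens`, and `fitSketch` are hypothetical names, and two behaviours are assumptions made for illustration, namely that the estimate rounds up and that an item that does not fit is skipped rather than ending the pass.

```typescript
// Hypothetical item shape; mirrors only the fields the budget pass needs.
interface BudgetItem {
  content: string;
  tokenCount?: number; // pre-computed count, used instead of the estimate
  neverTruncate?: boolean; // pinned: kept even if the budget is exceeded
}

const CHARS_PER_TOKEN = 4; // the rough estimate described above

// Assumption: round up, so a 5-character string costs 2 tokens.
function roughTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitSketch(items: BudgetItem[], maxTokens: number) {
  const cost = (item: BudgetItem): number =>
    item.tokenCount ?? roughTokens(item.content);

  // Pass 1: pinned items are charged against the budget up front.
  let totalTokens = 0;
  for (const item of items) {
    if (item.neverTruncate) totalTokens += cost(item);
  }

  // Pass 2: walk items in their existing (tier + score) order and keep
  // each unpinned item only if it still fits.
  const kept: BudgetItem[] = [];
  let truncated = false;
  for (const item of items) {
    if (item.neverTruncate) {
      kept.push(item);
      continue;
    }
    const tokens = cost(item);
    if (totalTokens + tokens <= maxTokens) {
      kept.push(item);
      totalTokens += tokens;
    } else {
      truncated = true; // assumption: skip this item and keep scanning
    }
  }
  return { kept, totalTokens, truncated };
}

const fitted = fitSketch(
  [
    { content: "b".repeat(20) }, // 5 tokens
    { content: "p".repeat(40), neverTruncate: true }, // 10 tokens, pinned
    { content: "c".repeat(40) }, // 10 tokens, does not fit
    { content: "d", tokenCount: 1 }, // pre-computed count
  ],
  16,
);

console.log(fitted.kept.length, fitted.totalTokens, fitted.truncated); // 3 16 true
```

Charging pinned items first means a large neverTruncate result can crowd out every normal result, so it is worth keeping pinned content small.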
Deduplication
If two results share the same first 200 characters of content, only the first one (by priority+score) is kept.
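The prefix rule can be sketched as a single pass with a Set. This is an illustration under assumptions, not the library's code: `Candidate` and `dedupeByPrefix` are hypothetical names, and the input is assumed to be already sorted by priority and score.

```typescript
// Hypothetical shape; only the fields needed for the example.
interface Candidate {
  content: string;
  source: string;
}

// Keep the first item for each distinct 200-character prefix.
// Assumes `sorted` is already in priority + score order.
function dedupeByPrefix(sorted: Candidate[], prefixLength = 200): Candidate[] {
  const seen = new Set<string>();
  const unique: Candidate[] = [];
  for (const item of sorted) {
    const key = item.content.slice(0, prefixLength);
    if (seen.has(key)) continue; // a higher-ranked item already has this prefix
    seen.add(key);
    unique.push(item);
  }
  return unique;
}

const sharedPrefix = "a".repeat(200);
const deduped = dedupeByPrefix([
  { content: sharedPrefix + " tail one", source: "kb:1" },
  { content: sharedPrefix + " tail two", source: "session:7" },
  { content: "short and distinct", source: "kb:2" },
]);

console.log(deduped.map((item) => item.source)); // [ 'kb:1', 'kb:2' ]
```

Note that two chunks which differ only after character 200 count as duplicates under this rule, which is the trade-off for a cheap comparison.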
Adapters
QdrantAdapter
Full Qdrant v1 REST adapter with embedding. Implements both StoreAdapter (for assembler) and a richer searchKnowledge / indexDocument API.
import { QdrantAdapter } from "@msm-core/context/adapters";

const adapter = QdrantAdapter.create({
  url: "http://localhost:6333",
  collection: "my-collection",
  // Embedding: pick one provider
  embedProvider: "gemini", // "gemini" | "openai" | "ollama"
  embedModel: "gemini-embedding-001",
  embedApiKey: process.env.GEMINI_API_KEY,
  // Optional Qdrant auth:
  apiKey: process.env.QDRANT_API_KEY,
});

// Used by the assembler automatically via StoreAdapter.search()

// Direct usage:
const hits = await adapter.searchKnowledge("solar energy regulations", {
  topK: 5,
  minScore: 0.3,
  tags: ["energy", "egypt"],
});

// Index a document (chunked automatically):
await adapter.indexDocument({
  docId: "report-2024",
  title: "Egypt Solar Report",
  text: longDocumentText,
  tags: ["solar", "egypt"],
});

Embedding providers:
| Provider | embedProvider | embedModel default |
| -------------- | --------------- | -------------------------- |
| Google Gemini | "gemini" | "gemini-embedding-001" |
| OpenAI | "openai" | "text-embedding-3-small" |
| Ollama (local) | "ollama" | "nomic-embed-text" |
Embeddings are cached in-process (LRU 256 entries, 5-min TTL).
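For readers unfamiliar with the pattern, an LRU cache with a TTL can be built on a `Map`, which iterates keys in insertion order. The sketch below is a generic illustration, not the adapter's actual cache: `TtlLruCache` and its injectable clock are hypothetical, with defaults chosen to match the 256-entry, 5-minute figures above.

```typescript
// Generic LRU + TTL cache. The first key in the Map is always the least
// recently used one, because reads re-insert the key at the end.
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(
    maxEntries = 256,
    ttlMs = 5 * 60 * 1000,
    now: () => number = () => Date.now(), // injectable clock, handy in tests
  ) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Tiny capacity and a fake clock to make the behaviour visible.
let clock = 0;
const cache = new TtlLruCache<number[]>(2, 1000, () => clock);
cache.set("a", [1]);
cache.set("b", [2]);
cache.get("a"); // touch "a", so "b" becomes least recently used
cache.set("c", [3]); // capacity reached: evicts "b"

console.log(cache.get("b")); // undefined
console.log(cache.get("a")); // [ 1 ]
```

Because a cache like this lives in process memory, it is not shared across workers and is lost on restart; repeated queries within one process are what benefit.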
InMemoryAdapter
For testing and local development. Matches by text substring.
import { InMemoryAdapter } from "@msm-core/context/adapters";

const adapter = new InMemoryAdapter(
  [
    { id: "doc1", content: "Egypt feasibility overview", score: 0.9 },
    { id: "doc2", content: "Saudi market analysis", score: 0.8 },
  ],
  "test-store",
);

const results = await adapter.search({ text: "egypt", topK: 3 });

NullAdapter
Always returns empty results. Useful as a disabled-tier placeholder.
import { NullAdapter } from "@msm-core/context/adapters";

const adapter = new NullAdapter();

Custom Adapter
Implement StoreAdapter to add any store (Postgres pgvector, Pinecone, Weaviate, etc.):
import type { StoreAdapter, StoreQuery, StoreResult } from "@msm-core/context";

class MyAdapter implements StoreAdapter {
  async search(query: StoreQuery): Promise<StoreResult[]> {
    const raw = await myDb.semanticSearch(query.text, query.topK ?? 5);
    return raw.map((r) => ({
      content: r.text,
      source: `mydb:${r.id}`,
      score: r.score,
    }));
  }
}

Budget & Utilities
import { fitToBudget, estimateTokens } from "@msm-core/context";

// Standalone budget enforcement
const { kept, totalTokens, truncated } = fitToBudget(results, {
  maxTokens: 4000,
  charsPerToken: 0.25, // default: 4 chars = 1 token
});

// Token estimation
const tokens = estimateTokens("Some text here", 0.25);

import { mergeResults } from "@msm-core/context";

// Standalone merge (sorted by priority then score, deduplicated)
const merged = mergeResults([
  { tierName: "primary", priority: 1, results: [...] },
  { tierName: "fallback", priority: 2, results: [...] },
]);

Architecture
assembler.build({ text, sessionId })
        │
        ├── queryTier("ground_truth")  ─┐
        ├── queryTier("project_docs")   ├─ parallel Promise.all
        └── queryTier("session_notes") ─┘
        │
        ▼
  mergeResults()
  (tier priority → score → dedup)
        │
        ▼
  fitToBudget()
  (neverTruncate pinned → fill remaining)
        │
        ▼
  AssembledContext {
    results, totalTokens,
    truncated, tierCounts
  }

Adapter errors are swallowed per-adapter: a failing Qdrant collection never breaks the whole assembly.
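One common way to get this behaviour with `Promise.all` is to attach a catch handler to each adapter call, so a rejection turns into an empty result instead of rejecting the whole batch. The sketch below assumes that approach; the source does not say how the library does it, and `Searchable`, `SearchHit`, and `queryAllTolerant` are hypothetical names.

```typescript
// Hypothetical minimal shapes for the example.
interface SearchHit {
  content: string;
  score: number;
}

interface Searchable {
  search(text: string): Promise<SearchHit[]>;
}

// Query all adapters in parallel; a failing adapter contributes no hits.
async function queryAllTolerant(
  adapters: Searchable[],
  text: string,
): Promise<SearchHit[]> {
  const perAdapter = await Promise.all(
    adapters.map((adapter) =>
      // Promise.resolve().then(...) also captures synchronous throws.
      Promise.resolve()
        .then(() => adapter.search(text))
        .catch((): SearchHit[] => []), // swallow the error for this adapter only
    ),
  );

  const hits: SearchHit[] = [];
  for (const adapterHits of perAdapter) hits.push(...adapterHits);
  return hits;
}

const healthy: Searchable = {
  search: async () => [{ content: "ok", score: 0.8 }],
};
const broken: Searchable = {
  search: async () => {
    throw new Error("connection refused");
  },
};

queryAllTolerant([broken, healthy], "solar").then((hits) => {
  console.log(hits.length); // 1
});
```

The cost of swallowing errors is that an outage looks the same as "no matches", so a production setup would likely want logging or a metric inside the catch handler.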
API Reference
createContextAssembler(config)
interface AssemblerConfig {
  tiers: TierConfig[];
  budget?: { maxTokens?: number; charsPerToken?: number };
  defaultTopK?: number; // default: 5
  defaultMinScore?: number; // default: 0.15
}

interface TierConfig {
  name: string;
  priority: number; // 1 = highest priority
  adapters: StoreAdapter[];
  topK?: number; // overrides defaultTopK for this tier
  minScore?: number; // overrides defaultMinScore for this tier
}

assembler.build(query)
interface AssemblerQuery {
  text: string;
  sessionId?: string;
  tierOverrides?: Record<
    string,
    { topK?: number; minScore?: number; filter?: Record<string, unknown> }
  >;
}

interface AssembledContext {
  results: StoreResult[];
  totalTokens: number;
  truncated: boolean;
  tierCounts: Record<string, number>;
}

StoreResult
interface StoreResult {
  content: string;
  source: string; // e.g. "kb:chunk-42"
  score: number; // 0–1
  tokenCount?: number; // pre-computed token count (skips estimation)
  neverTruncate?: boolean; // always kept regardless of budget
}

Integration with @msm-core/mini
import { createAgent, createGeminiBrain } from "@msm-core/mini";
import { createContextAssembler } from "@msm-core/context";
import { QdrantAdapter } from "@msm-core/context/adapters";

const assembler = createContextAssembler({ tiers: [...] });
const agent = createAgent({ brain, redis, tools: [...] });

// In your request handler:
const ctx = await assembler.build({ text: req.body.message, sessionId });

const outcome = await agent.handle({
  sessionId,
  message: req.body.message,
  context: {
    memories: ctx.results.map((r) => ({
      source: r.source,
      content: r.content,
      score: r.score,
      tokenCount: r.tokenCount,
    })),
  },
});

License
MIT
