# @llm-context/core

Core engine for Lang Context Attention — a topic-aware context routing system for LLM conversations.
## What It Does
In multi-turn LLM conversations, users jump between topics. This engine automatically:
- Clusters messages by topic using hybrid retrieval (vector + BM25 + RRF fusion)
- Routes each message to the correct topic via LLM judgment
- Assembles only relevant context with token budget management
- Streams responses back with full routing observability
The result: focused LLM responses with ~50% token savings.
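Reciprocal rank fusion (RRF) is the step that merges the vector and BM25 result lists into a single ranking. A minimal sketch of the standard formula, independent of this package's internals (the function name and shapes here are illustrative, not part of the API):

```ts
// Reciprocal Rank Fusion: each document's fused score is the sum of
// 1 / (k + rank) over every ranked list it appears in. The constant k
// (conventionally 60) damps top ranks so no single retriever dominates.
function rrfFuse(rankedLists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>()
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1 // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  }
  return scores
}

// A message ranked by both retrievers outscores one found by only one:
const vectorHits = ['msg-a', 'msg-b', 'msg-c'] // semantic ranking
const keywordHits = ['msg-b', 'msg-d']         // BM25 ranking
const fused = [...rrfFuse([vectorHits, keywordHits])]
  .sort((a, b) => b[1] - a[1])
// 'msg-b' ranks first: 1/62 + 1/61 beats 'msg-a' at 1/61
```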
## Install

```shell
pnpm add @llm-context/core
```

## Quick Start
```ts
import { createEngine } from '@llm-context/core'

const engine = createEngine({
  store: yourStoreProvider,         // Where to persist data
  vectorSearch: yourVectorSearch,   // Semantic similarity search
  keywordSearch: yourKeywordSearch, // BM25 keyword search
  chat: yourChatProvider,           // LLM for responses
  judge: yourJudgeProvider,         // LLM for topic classification
  embedding: yourEmbeddingProvider, // Text → vector embedding
})

// Create a session
const session = await engine.createSession('You are a helpful assistant.')

// Send messages — routing happens automatically
const { stream, routingDecision, rootQuestionId } =
  await engine.processMessage(session.id, 'How do I deploy to AWS?')

for await (const chunk of stream) {
  process.stdout.write(chunk)
}

// The next message is automatically routed to the right topic
const r2 = await engine.processMessage(session.id, 'What about using Docker on AWS?')
// → routed to the same topic as above

const r3 = await engine.processMessage(session.id, 'Best chocolate cake recipe?')
// → creates a new topic (unrelated to AWS)
```

## Default Implementations
Use these companion packages for zero-config setup:
| Package | Description |
|---------|-------------|
| @llm-context/store-sqlite | SQLite storage + sqlite-vec vector search + FTS5 keyword search |
| @llm-context/provider-ai-sdk | Vercel AI SDK providers (OpenAI, Anthropic, etc.) |
```ts
import { createEngine } from '@llm-context/core'
import {
  createDatabase,
  SqliteStore,
  SqliteVectorSearch,
  SqliteKeywordSearch,
} from '@llm-context/store-sqlite'
import {
  AiSdkChatProvider,
  AiSdkJudgeProvider,
  AiSdkEmbeddingProvider,
} from '@llm-context/provider-ai-sdk'
import { openai } from '@ai-sdk/openai'

const db = createDatabase('./conversations.db')

const engine = createEngine({
  store: new SqliteStore(db),
  vectorSearch: new SqliteVectorSearch(db, 1536),
  keywordSearch: new SqliteKeywordSearch(db),
  chat: new AiSdkChatProvider(openai('gpt-4o-mini')),
  judge: new AiSdkJudgeProvider({ model: openai('gpt-4o-mini') }),
  embedding: new AiSdkEmbeddingProvider({
    model: openai.embedding('text-embedding-3-small'),
    dimensions: 1536,
  }),
})
```

## Engine API
### Session Management
```ts
engine.createSession(systemPrompt: string, title?: string): Promise<Session>
engine.getSession(sessionId: string): Promise<Session | null>
```

### Message Processing
```ts
// Core method — handles the full routing pipeline
engine.processMessage(sessionId: string, userMessage: string): Promise<{
  stream: AsyncIterable<string>    // Streaming LLM response
  routingDecision: RoutingDecision // Full routing metadata
  rootQuestionId: string           // Which topic this was routed to
}>
```

### Query Methods
```ts
engine.getRootQuestions(sessionId): Promise<RootQuestion[]>  // All topics
engine.getMessages(rootQuestionId): Promise<Message[]>       // Messages in a topic
engine.getTimeline(sessionId): Promise<Message[]>            // All messages chronologically
engine.getRoutingDecision(messageId): Promise<RoutingDecision | null>
```

### Manual Operations
```ts
engine.reassignMessage(messageId, newTopicId): Promise<void>  // Fix routing errors
engine.linkQuestions(topicA, topicB): Promise<QuestionLink>   // Link related topics
engine.unlinkQuestions(linkId): Promise<void>
```

## Configuration
```ts
createEngine({
  // ... providers (required) ...
  topK: 5,                     // Candidates per retrieval (default: 5)
  rrfK: 60,                    // RRF fusion constant (default: 60)
  minFusedScoreForJudge: 0.01, // Score threshold for judge (default: 0.01)
  maxContextTokens: 4000,      // Token budget for context (default: 4000)
  summaryUpdateInterval: 5,    // Re-summarize every N messages (default: 5)
  summaryContextSize: 10,      // Messages for summary prompt (default: 10)

  // Callbacks
  onRoutingComplete: (decision) => { /* routing telemetry */ },
  onLinkSuggestion: (suggestion) => { /* UI notification */ },
})
```

## Provider Interfaces
Implement these to use your own storage, search, or LLM:
```ts
interface StoreProvider { /* Session, RootQuestion, Message, RoutingDecision, QuestionLink CRUD */ }
interface VectorSearchProvider { upsert, search, delete }
interface KeywordSearchProvider { upsert, search, delete }
interface ChatProvider { chat, streamChat }
interface JudgeProvider { judge }
interface EmbeddingProvider { embed, dimensions }
```

Full interface definitions: `interfaces.ts`
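As an illustration only, here is what a toy in-memory `KeywordSearchProvider` might look like. The method signatures below are assumptions made for this sketch; check `interfaces.ts` for the real shapes:

```ts
// Toy keyword search: scores each stored document by how many query
// terms it contains. Real deployments would use BM25 (e.g. SQLite FTS5).
// The upsert/search/delete signatures are assumed, not the package's API.
class InMemoryKeywordSearch {
  private docs = new Map<string, string>()

  async upsert(id: string, text: string): Promise<void> {
    this.docs.set(id, text.toLowerCase())
  }

  async search(query: string, topK: number): Promise<{ id: string; score: number }[]> {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean)
    const hits: { id: string; score: number }[] = []
    for (const [id, text] of this.docs) {
      const score = terms.filter((t) => text.includes(t)).length
      if (score > 0) hits.push({ id, score })
    }
    // Highest term-overlap first, truncated to topK
    return hits.sort((a, b) => b.score - a.score).slice(0, topK)
  }

  async delete(id: string): Promise<void> {
    this.docs.delete(id)
  }
}
```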
## Routing Flow

```
User Message → Embed → [Vector Search ∥ Keyword Search] → RRF Fusion → LLM Judge → Context Assembly → Stream Response
```

See the design spec for full architecture details.
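The Context Assembly step is where `maxContextTokens` is enforced. A common strategy, sketched here with a rough `estimateTokens` heuristic rather than the engine's actual implementation, is to pack the most recent on-topic messages until the budget runs out:

```ts
// Rough token estimate: ~4 characters per token for English text.
// Both this heuristic and the packing strategy are illustrative assumptions.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

function assembleContext(messages: string[], maxContextTokens: number): string[] {
  const selected: string[] = []
  let used = 0
  // Walk newest-first so recent turns survive when the budget is tight
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i])
    if (used + cost > maxContextTokens) break
    selected.unshift(messages[i]) // restore chronological order
    used += cost
  }
  return selected
}
```

Dropping the oldest messages first is what makes the topic summary (refreshed every `summaryUpdateInterval` messages) valuable: it preserves a compressed trace of turns that no longer fit in the budget.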
## License
MIT
