@stacklatte/context-manager
v0.3.1
Versioned AI context cache with per-model delta tracking and intent-aware context injection
@stacklatte/context-manager
Versioned AI context cache with per-model delta tracking.
Instead of sending your full context on every request, this package tracks which blocks each model has already seen and sends only what changed: adds, updates, removals, and reorders. It also classifies user messages so you can skip the context entirely when the user is just confirming, declining, or saying thanks.
The problem
Every token you send to an AI model costs money and burns through the context window. Most of your context — project background, decisions, step definitions — doesn't change between requests. Sending it all again anyway is wasteful.
This package maintains a manifest of what each model has already received. On each request you get a minimal delta: only the blocks that are new or changed since the last commit.
Install
npm install @stacklatte/context-manager
Quick start
import {
ContextCacheManager,
formatDeltaForPrompt,
hashContent,
} from '@stacklatte/context-manager';
const manager = new ContextCacheManager();
const blocks = [
{
id: 'goal',
type: 'project',
content: 'Build a user authentication system.',
priority: 1,
version: 1,
hash: hashContent('Build a user authentication system.'),
createdAt: Date.now(),
updatedAt: Date.now(),
},
];
// On each user message (openai, baseSystemPrompt, and userMessage come from your application):
const { delta } = manager.prepareIfNeeded('gpt-4o', userMessage, blocks);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
// inject context into system prompt — model treats it as silent background knowledge
content: [baseSystemPrompt, delta ? formatDeltaForPrompt(delta) : '']
.filter(Boolean)
.join('\n\n'),
},
{ role: 'user', content: userMessage },
],
});
// Commit after a successful response so the next request only sends what changed
manager.commit('gpt-4o', blocks);
Core concepts
Blocks
A ContextBlock is a unit of context, such as a project goal, a track definition, a decision, or a conversation excerpt. Each block has an id, type, content, priority, version, and hash, plus createdAt and updatedAt timestamps and an optional dependencies list.
type BlockType =
| 'project' | 'track' | 'phase' | 'step'
| 'knowledge' | 'decision' | 'conversation';
type ContextBlock = {
id: string;
type: BlockType;
content: string;
priority: number;
dependencies?: string[];
version: number;
hash: string;
createdAt: number;
updatedAt: number;
};
Use hashContent(content) to compute a stable, cross-platform hash for the hash field.
Delta packages
A DeltaPackage describes what changed since the model's last known state. Each entry is one of:
| op | meaning |
|---|---|
| add | block the model has never seen |
| update | block whose content, version, type, or dependencies changed |
| remove | block that no longer exists |
| reorder | block whose priority changed but nothing else did |
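
To make the four operations in the table concrete, here is a minimal sketch of how a diff between two block lists could assign them. It is an illustration only, not the library's implementation: `SketchBlock`, `SketchEntry`, `sameDeps`, and `classifyChanges` are hypothetical names, and a changed `hash` is assumed to stand in for changed content.

```typescript
// Illustrative only: a simplified diff, not the library's actual algorithm.
type SketchBlock = {
  id: string;
  type: string;
  content: string;
  priority: number;
  version: number;
  hash: string;
  dependencies?: string[];
};

type SketchOp = 'add' | 'update' | 'remove' | 'reorder';
type SketchEntry = { op: SketchOp; id: string };

// Compare dependency lists by value, treating undefined as an empty list.
function sameDeps(a: string[] = [], b: string[] = []): boolean {
  return a.length === b.length && a.every((dep, i) => dep === b[i]);
}

function classifyChanges(previous: SketchBlock[], current: SketchBlock[]): SketchEntry[] {
  const before = new Map<string, SketchBlock>();
  for (const block of previous) before.set(block.id, block);

  const entries: SketchEntry[] = [];
  const seen = new Set<string>();

  for (const block of current) {
    seen.add(block.id);
    const old = before.get(block.id);
    if (!old) {
      // Never seen before.
      entries.push({ op: 'add', id: block.id });
    } else if (
      old.hash !== block.hash ||
      old.version !== block.version ||
      old.type !== block.type ||
      !sameDeps(old.dependencies, block.dependencies)
    ) {
      // Content (via hash), version, type, or dependencies changed.
      entries.push({ op: 'update', id: block.id });
    } else if (old.priority !== block.priority) {
      // Only the priority moved.
      entries.push({ op: 'reorder', id: block.id });
    }
  }

  // Anything in the old set that is missing from the new set was removed.
  for (const block of previous) {
    if (!seen.has(block.id)) entries.push({ op: 'remove', id: block.id });
  }
  return entries;
}
```

Calling `classifyChanges(previous, current)` with two identical lists yields an empty array, which corresponds to a delta with no entries.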
type DeltaPackage = {
modelId: string;
fromVersion: number;
toVersion: number;
entries: DeltaEntry[];
fullSeed: boolean; // true on first request — no prior state
};
Manifests
A CacheManifest is the manager's internal record of what a model last received. You can inspect it via getManifest() but don't need to manage it — commit maintains it automatically.
Stateless APIs
By default the manager holds state in memory. In a stateless API (serverless functions, REST handlers) each request starts a fresh instance and the delta optimization is lost — every request looks like a first request and sends a full seed.
Use exportModelState and importModelState to persist the state externally and rehydrate it on each request:
// At the start of each request — rehydrate from your store
const manager = new ContextCacheManager();
const stored = await redis.get(`ctx:${userId}:${modelId}`);
if (stored) manager.importModelState(modelId, JSON.parse(stored));
// Normal flow
const { delta } = manager.prepareIfNeeded(modelId, userMessage, blocks);
const response = await callModel(modelId, systemPrompt, userMessage, delta);
manager.commit(modelId, blocks);
// Persist updated state before the request ends
await redis.set(`ctx:${userId}:${modelId}`, JSON.stringify(manager.exportModelState(modelId)));
ModelState is a plain JSON-serializable object, so you can store it anywhere: Redis, Postgres, DynamoDB, a session cookie.
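
When persisting state this way, it can help to guard the serialization boundary, since exportModelState is documented to return null before the first commit and stored values can be missing or corrupted. The helpers below are a hypothetical sketch, not part of the package: `serializeState` and `parseState` are made-up names, and the state is treated as an opaque JSON object.

```typescript
// Hypothetical helpers; the state shape is treated as opaque here.
function serializeState<T>(state: T | null): string | null {
  // Nothing to persist before the first commit.
  return state === null ? null : JSON.stringify(state);
}

function parseState<T>(raw: string | null | undefined): T | null {
  if (!raw) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    // A stored "null" or a non-object payload is treated as missing state.
    return typeof parsed === 'object' && parsed !== null ? (parsed as T) : null;
  } catch {
    // Corrupted entry: the caller can fall back to a full seed.
    return null;
  }
}
```

A caller would write to the store only when `serializeState` returns a string, and call importModelState only when `parseState` returns a non-null value.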
Request flow
user message
│
▼
prepareIfNeeded(modelId, message, blocks)
│
├─ 'confirmation' / 'decline' / 'gratitude'
│ delta = null, matchedBlockIds = null — skip context
│
└─ 'contextual'
│
▼
relevance check (keyword overlap against block content)
│
├─ no keyword match
│ delta = null, matchedBlockIds = [] — off-topic, skip context
│
└─ keyword match
delta = only changed blocks since last commit
matchedBlockIds = ['block-a', 'block-b']
│
▼
formatDeltaForPrompt(delta)
│
▼
inject into system prompt ← silent background knowledge
│
▼
model responds to user message
│
▼
commit(modelId, blocks) ← update baseline for next request
API
ContextCacheManager
prepareIfNeeded(modelId, userMessage, blocks, options?)
The main entry point. Applies two filters before deciding whether to send context:
- Intent: confirmations, declines, and expressions of gratitude skip context entirely ("yes", "ok", "nope", "thanks")
- Relevance: contextual messages that share no keywords with any block also skip context ("tell me a joke" when your blocks are about authentication)
const { delta, classification, matchedBlockIds } = manager.prepareIfNeeded(
'gpt-4o',
userMessage,
blocks
);
// delta → DeltaPackage | null
// classification → { intent: 'confirmation'|'decline'|'gratitude'|'contextual', needsContext: boolean }
// matchedBlockIds → null (confirmation/decline/gratitude) | [] (no match) | ['id1','id2'] (matched)
The relevance index is built from block content on the first call and cached until the block set changes, so repeated calls with the same blocks do not rebuild it.
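
The two filters and the three possible values of matchedBlockIds can be pictured as a small gate function. This is a sketch of the decision order only, under the assumption that intent is checked before relevance as described above; `GateResult`, `contextGate`, `isSmallTalk`, and `findRelevantIds` are hypothetical names rather than package exports.

```typescript
type GateResult = {
  sendContext: boolean;
  matchedBlockIds: string[] | null;
};

// Both filters are passed in as functions so the gate itself stays generic.
function contextGate(
  message: string,
  isSmallTalk: (message: string) => boolean,
  findRelevantIds: (message: string) => string[],
): GateResult {
  // Filter 1: confirmations, declines, and gratitude never need context.
  if (isSmallTalk(message)) {
    return { sendContext: false, matchedBlockIds: null };
  }
  // Filter 2: contextual messages need context only if some block is relevant.
  const ids = findRelevantIds(message);
  return { sendContext: ids.length > 0, matchedBlockIds: ids };
}
```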
Custom relevance checker — plug in embeddings or any other logic:
const { delta } = manager.prepareIfNeeded('gpt-4o', userMessage, blocks, {
checkRelevance: (message, blocks) => {
// return ids of blocks relevant to this message
// return [] to skip context
return myEmbeddingSearch(message, blocks);
},
});
prepare(modelId, blocks): DeltaPackage
Computes what changed since the last commit without classifying the message. Use this when you want the delta unconditionally.
const delta = manager.prepare('gpt-4o', blocks);
This does not update the manifest; call commit after a successful model response.
commit(modelId, blocks, removedIds?): void
Persists the current block state as the model's new baseline. Call this after a successful model response.
manager.commit('gpt-4o', blocks);
manager.commit('gpt-4o', updatedBlocks, ['deleted-block-id']);
reset(modelId): void
Clears the manifest and version counter for a model. The next prepare() returns a full seed — all blocks as add entries. Use this to force a periodic context refresh so the model never drifts too far from the current state.
const ROTATION_INTERVAL = 8;
if (++messageCount % ROTATION_INTERVAL === 0) {
manager.reset(modelId);
}
const { delta } = manager.prepareIfNeeded(modelId, message, blocks);
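// On rotation turns, any delta returned has delta.fullSeed === true
// (delta is still null when the message is skipped by intent or relevance)
getStaleModels(blocks): string[]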
Returns the IDs of all registered models whose cached state no longer matches the given blocks. Useful when blocks change outside of a request cycle and you need to know which models are out of date.
const stale = manager.getStaleModels(blocks);
// ['gpt-4o', 'claude-sonnet-4-6']: these models need a delta on the next request
exportModelState(modelId): ModelState | null
Returns the manifest and version counter for a model as a plain JSON-serializable object. Returns null if no commit has been made. Use this to persist state between requests in a stateless API.
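const state = manager.exportModelState(modelId);
// state is null until the first commit, so only persist when there is something to store
if (state) await redis.set(`ctx:${userId}:${modelId}`, JSON.stringify(state));
importModelState(modelId, state: ModelState): void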
Restores a previously exported state. After import, prepare() diffs against the restored manifest rather than treating the model as new.
const stored = await redis.get(`ctx:${userId}:${modelId}`);
if (stored) manager.importModelState(modelId, JSON.parse(stored));
getManifest(modelId): CacheManifest | null
Returns a read-only snapshot of the model's current manifest. Returns null if no commit has been made for this model.
keywordRelevance(message, blocks, index?): string[]
The built-in relevance checker used by prepareIfNeeded. Returns the ids of blocks whose content shares keywords with the message. Returns [] if no blocks match.
Stop words ("the", "is", "and", etc.) are filtered before matching. The index is built from block content — pass a pre-built index to avoid rebuilding on repeated calls.
import { keywordRelevance, buildIndex } from '@stacklatte/context-manager';
// one-shot
const matchedIds = keywordRelevance('fix the login bug', blocks);
// reuse a pre-built index across many messages
const index = buildIndex(blocks);
const ids1 = keywordRelevance('fix the login bug', blocks, index);
const ids2 = keywordRelevance('jwt token expiry', blocks, index);
Also exported: tokenize(text), buildIndex(blocks), findRelevantBlocks(message, index) for building custom relevance logic on top of the same primitives.
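
For readers who want to see the general technique behind keyword-overlap relevance, here is a self-contained sketch using an inverted index. It is not the package's code: `sketchTokenize`, `sketchBuildIndex`, and `sketchFindRelevant` are hypothetical names, the stop-word list is a tiny sample, and there is no stemming, so "token" and "tokens" count as different keywords.

```typescript
type TextBlock = { id: string; content: string };

// A tiny stop-word sample for illustration; the library's actual list is not shown in this README.
const STOP_WORDS = new Set<string>(['the', 'is', 'and', 'a', 'an', 'to', 'of', 'in', 'for', 'on', 'me']);

// Lowercase, split on non-alphanumerics, drop stop words and single characters.
function sketchTokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 1 && !STOP_WORDS.has(word));
}

// Inverted index: keyword -> ids of the blocks whose content contains it.
function sketchBuildIndex(blocks: TextBlock[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const block of blocks) {
    for (const word of sketchTokenize(block.content)) {
      let ids = index.get(word);
      if (!ids) {
        ids = new Set<string>();
        index.set(word, ids);
      }
      ids.add(block.id);
    }
  }
  return index;
}

// Collect every block id that shares at least one keyword with the message.
function sketchFindRelevant(message: string, index: Map<string, Set<string>>): string[] {
  const matched = new Set<string>();
  for (const word of sketchTokenize(message)) {
    const ids = index.get(word);
    if (ids) ids.forEach((id) => matched.add(id));
  }
  return Array.from(matched);
}
```

Building the index once and reusing it across messages is what makes repeated lookups cheap: each message costs one map lookup per keyword rather than a scan of every block.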
formatDeltaForPrompt(delta): string
Converts a DeltaPackage into text ready to inject into a system prompt. Returns an empty string if there are no entries.
Each block is wrapped in <context> tags. The output begins with an instruction telling the model to treat the content as silent background knowledge — not something to acknowledge or respond to.
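
As a rough picture of what such output could look like, the sketch below formats a list of entries into an instruction line followed by tagged blocks. Everything specific here is an assumption: the `PromptEntry` shape, the `id` and `op` attributes, and the wording of the instruction are invented for illustration, and the text the library actually emits may differ.

```typescript
// Hypothetical entry shape; the README does not show the real DeltaEntry type.
type PromptEntry = {
  op: 'add' | 'update' | 'remove' | 'reorder';
  id: string;
  content?: string;
};

function sketchFormatDelta(entries: PromptEntry[]): string {
  // No entries means nothing to inject.
  if (entries.length === 0) return '';
  // Assumed wording; the point is to tell the model not to respond to the update.
  const header = 'Background context update. Treat it as silent knowledge; do not acknowledge it.';
  const body = entries.map(
    (entry) => `<context id="${entry.id}" op="${entry.op}">${entry.content ?? ''}</context>`,
  );
  return [header, ...body].join('\n');
}
```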
import { formatDeltaForPrompt } from '@stacklatte/context-manager';
const contextText = delta ? formatDeltaForPrompt(delta) : '';
Always inject into the system prompt, not the user message. Models generally treat system prompt content as background instructions. Putting context in a user turn risks the model responding to the context update instead of the user's actual question.
// OpenAI / compatible API (contextText is '' when delta is null)
{
role: 'system',
content: [baseSystemPrompt, contextText].filter(Boolean).join('\n\n'),
}
// Anthropic API
{
system: [baseSystemPrompt, contextText].filter(Boolean).join('\n\n'),
}
classifyIntent(message): IntentClassification
Classifies a user message as confirmation, decline, gratitude, or contextual. Used internally by prepareIfNeeded — call it directly if you need the classification without preparing a delta.
All three non-contextual intents have needsContext: false. gratitude is distinct from confirmation so callers can respond differently (e.g. "You're welcome!"). Gratitude with a follow-up question is contextual: "thanks but can you clarify" → contextual.
import { classifyIntent } from '@stacklatte/context-manager';
classifyIntent('yes please'); // { intent: 'confirmation', needsContext: false }
classifyIntent('no thanks'); // { intent: 'decline', needsContext: false }
classifyIntent('thank you'); // { intent: 'gratitude', needsContext: false }
classifyIntent('how does this work?'); // { intent: 'contextual', needsContext: true }
Pattern coverage (~130 patterns total):
- Confirmations (~55): yes/yeah/yep/y, totally/for sure/bet/100, ok/okay/alright, exactly/precisely/spot on, noted/understood/got it/wilco, go ahead/proceed/carry on/ship it, great/perfect/brilliant/love it, and more
- Declines (~50): no/nope/nah, absolutely not/no way/hard no, not really/maybe not, stop/cancel/abort, never mind/forget it, hold on/hang on/one sec, not yet/later, wrong/off track/not even close, i doubt it/not convinced/skeptical, and more
- Gratitude (~25): thank you and all variants, appreciate it/much appreciated, cheers/ty/thx, much obliged/grateful, and more
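
The pattern families above can be approximated with a few whole-message regular expressions. The sketch below is a reduced stand-in, not the package's classifier: `sketchClassifyIntent` and `INTENT_PATTERNS` are hypothetical names, and only a handful of the listed phrases are included. Matching the whole message rather than a prefix is what lets "thanks but can you clarify" fall through to contextual.

```typescript
type SketchIntent = 'confirmation' | 'decline' | 'gratitude' | 'contextual';

// A handful of patterns per intent; the real pattern set is much larger.
const INTENT_PATTERNS: Array<[SketchIntent, RegExp]> = [
  ['gratitude', /^(thanks|thank you|thx|ty|cheers|much appreciated)$/],
  ['decline', /^(no|nope|nah|no thanks|not yet|never mind|cancel|stop)$/],
  ['confirmation', /^(yes|yeah|yep|y|ok|okay|yes please|go ahead|got it|perfect)$/],
];

function sketchClassifyIntent(message: string): { intent: SketchIntent; needsContext: boolean } {
  // Normalise case and strip trailing punctuation so "OK!" matches "ok".
  const text = message.trim().toLowerCase().replace(/[.!?,\s]+$/, '');
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(text)) return { intent, needsContext: false };
  }
  // Anything that is not a bare confirmation, decline, or thanks needs context.
  return { intent: 'contextual', needsContext: true };
}
```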
hashContent(content): string
Produces a stable 16-character hex hash of a string. Unicode-normalised (NFC) and line-ending-normalised (CRLF → LF) so the same logical content produces the same hash on any platform.
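
The normalisation steps are what make the hash portable, and they can be shown independently of the hash function itself. In the sketch below the hash is an FNV-1a-style function over UTF-16 code units, run twice with different seeds to get 16 hex characters. That choice is an assumption made for illustration; the README does not say which algorithm the package uses, and `sketchHashContent` and `fnv1a32` are hypothetical names.

```typescript
// One 32-bit FNV-1a-style pass over UTF-16 code units, rendered as 8 hex characters.
function fnv1a32(text: string, seed: number): string {
  let hash = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

function sketchHashContent(content: string): string {
  // Normalise first so that equivalent inputs hash identically:
  // NFC merges composed/decomposed Unicode forms, and CRLF becomes LF.
  const normalised = content.normalize('NFC').replace(/\r\n/g, '\n');
  // Two passes with different seeds give 16 hex characters in total.
  return fnv1a32(normalised, 0x811c9dc5) + fnv1a32(normalised, 0x01234567);
}
```

With this approach, a file checked out with Windows line endings and the same file with Unix line endings produce the same hash, so a block is not reported as updated just because it crossed platforms.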
import { hashContent } from '@stacklatte/context-manager';
const hash = hashContent('my block content');
bumpVersion(current): number
Safe version increment. Throws RangeError on NaN, floats, negatives, or overflow.
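
A guarded increment of this kind can be sketched in a few lines. The version below is an illustration rather than the package's code: `sketchBumpVersion` is a hypothetical name, and treating Number.MAX_SAFE_INTEGER as the overflow boundary is an assumption, since the README does not state the exact limit.

```typescript
function sketchBumpVersion(current: number): number {
  // isSafeInteger rejects NaN, Infinity, floats, and values beyond 2^53 - 1.
  if (!Number.isSafeInteger(current) || current < 0) {
    throw new RangeError(`Invalid version: ${current}`);
  }
  // Incrementing past the largest safe integer would lose precision.
  if (current === Number.MAX_SAFE_INTEGER) {
    throw new RangeError('Version overflow');
  }
  return current + 1;
}
```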
import { bumpVersion } from '@stacklatte/context-manager';
const nextVersion = bumpVersion(block.version);
License
MIT
