@stacklatte/context-manager

v0.3.1

Published

9 days ago

Versioned AI context cache with per-model delta tracking and intent-aware context injection

rasmuslagoni

ai context cache delta llm context-management context-window token-optimization intent-classification openai anthropic

Instead of sending your full context on every request, this package tracks which blocks each model has already seen and sends only what changed: adds, updates, removals, and reorders. It also classifies user messages so you can skip the context entirely when the user is just confirming, declining, or saying thanks.

The problem

Every token you send to an AI model costs money and burns through the context window. Most of your context — project background, decisions, step definitions — doesn't change between requests. Sending it all again anyway is wasteful.

This package maintains a manifest of what each model has already received. On each request you get a minimal delta: only the blocks that are new or changed since the last commit.

Install

npm install @stacklatte/context-manager

Quick start

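import OpenAI from 'openai';
import {
  ContextCacheManager,
  formatDeltaForPrompt,
  hashContent,
} from '@stacklatte/context-manager';

// Assumes the official openai package; the client reads OPENAI_API_KEY from the environment
const openai = new OpenAI();
const manager = new ContextCacheManager();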

const blocks = [
  {
    id: 'goal',
    type: 'project',
    content: 'Build a user authentication system.',
    priority: 1,
    version: 1,
    hash: hashContent('Build a user authentication system.'),
    createdAt: Date.now(),
    updatedAt: Date.now(),
  },
];

// On each user message (userMessage and baseSystemPrompt come from your application):
const { delta } = manager.prepareIfNeeded('gpt-4o', userMessage, blocks);

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      // inject context into system prompt — model treats it as silent background knowledge
      content: [baseSystemPrompt, delta ? formatDeltaForPrompt(delta) : '']
        .filter(Boolean)
        .join('\n\n'),
    },
    { role: 'user', content: userMessage },
  ],
});

// Commit after a successful response that actually carried the delta,
// so the baseline never records blocks the model has not seen
if (delta) manager.commit('gpt-4o', blocks);

Core concepts

Blocks

A ContextBlock is a unit of context: a project goal, a track definition, a decision, a conversation excerpt. Each block has an id, type, content, priority, version, and hash, plus createdAt and updatedAt timestamps and an optional dependencies list.

type BlockType =
  | 'project' | 'track' | 'phase' | 'step'
  | 'knowledge' | 'decision' | 'conversation';

type ContextBlock = {
  id: string;
  type: BlockType;
  content: string;
  priority: number;
  dependencies?: string[];
  version: number;
  hash: string;
  createdAt: number;
  updatedAt: number;
};

Use hashContent(content) to compute a stable, cross-platform hash for the hash field.
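To show how these fields relate when a block's content changes, here is a minimal sketch. The `reviseBlock` helper and its injected `hash` parameter are hypothetical and not part of the package; in real code you would pass the package's `hashContent`. It assumes that a content change should bump `version` by one and refresh `hash` and `updatedAt`, which this README does not state explicitly.

```typescript
type BlockType =
  | 'project' | 'track' | 'phase' | 'step'
  | 'knowledge' | 'decision' | 'conversation';

type ContextBlock = {
  id: string;
  type: BlockType;
  content: string;
  priority: number;
  dependencies?: string[];
  version: number;
  hash: string;
  createdAt: number;
  updatedAt: number;
};

// Hypothetical helper: returns a new block with revised content.
// `hash` is injected so the sketch stays self-contained; pass hashContent in practice.
function reviseBlock(
  block: ContextBlock,
  content: string,
  hash: (text: string) => string,
  now: number = Date.now(),
): ContextBlock {
  if (content === block.content) return block; // nothing changed, keep version and hash
  return {
    ...block,
    content,
    version: block.version + 1, // assumption: one bump per content change
    hash: hash(content),
    updatedAt: now,
  };
}
```

Returning a new object instead of mutating the old one keeps the previously committed block intact, which makes it easier to reason about what a model last saw.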

Delta packages

A DeltaPackage describes what changed since the model's last known state. Each entry is one of:

| op | meaning |
|---|---|
| add | block the model has never seen |
| update | block whose content, version, type, or dependencies changed |
| remove | block that no longer exists |
| reorder | block whose priority changed but nothing else did |

type DeltaPackage = {
  modelId: string;
  fromVersion: number;
  toVersion: number;
  entries: DeltaEntry[];
  fullSeed: boolean; // true on first request — no prior state
};
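The four operations above can be illustrated with a small diff routine. This is a sketch under stated assumptions, not the package's implementation: `SketchBlock`, `SketchEntry`, and `diffBlocks` are hypothetical names, and the real `DeltaEntry` shape is not shown in this README. It assumes a content change is detected through the `hash` field.

```typescript
type SketchBlock = {
  id: string;
  type: string;
  priority: number;
  version: number;
  hash: string;
  dependencies?: string[];
};

// Hypothetical entry shape; the package's DeltaEntry type is not shown in this README.
type SketchEntry = { op: 'add' | 'update' | 'remove' | 'reorder'; id: string };

function sameDependencies(a: string[] = [], b: string[] = []): boolean {
  return a.length === b.length && a.every((dep, i) => dep === b[i]);
}

function diffBlocks(previous: SketchBlock[], current: SketchBlock[]): SketchEntry[] {
  const seen = new Map<string, SketchBlock>();
  previous.forEach((block) => seen.set(block.id, block));

  const entries: SketchEntry[] = [];
  const currentIds = new Set<string>();

  current.forEach((block) => {
    currentIds.add(block.id);
    const old = seen.get(block.id);
    if (!old) {
      entries.push({ op: 'add', id: block.id }); // never seen before
    } else if (
      old.hash !== block.hash ||
      old.version !== block.version ||
      old.type !== block.type ||
      !sameDependencies(old.dependencies, block.dependencies)
    ) {
      entries.push({ op: 'update', id: block.id });
    } else if (old.priority !== block.priority) {
      entries.push({ op: 'reorder', id: block.id }); // only the priority moved
    }
  });

  // Anything in the old snapshot that is missing now was removed.
  previous.forEach((block) => {
    if (!currentIds.has(block.id)) entries.push({ op: 'remove', id: block.id });
  });
  return entries;
}
```

Checking `update` before `reorder` matters: a block whose content and priority both changed should be reported once, as an update.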

Manifests

A CacheManifest is the manager's internal record of what a model last received. You can inspect it via getManifest() but don't need to manage it — commit maintains it automatically.

Stateless APIs

By default the manager holds state in memory. In a stateless API (serverless functions, REST handlers) each request starts a fresh instance and the delta optimization is lost — every request looks like a first request and sends a full seed.

Use exportModelState and importModelState to persist the state externally and rehydrate it on each request:

// At the start of each request — rehydrate from your store
const manager = new ContextCacheManager();
const stored = await redis.get(`ctx:${userId}:${modelId}`);
if (stored) manager.importModelState(modelId, JSON.parse(stored));

// Normal flow
const { delta } = manager.prepareIfNeeded(modelId, userMessage, blocks);
const response = await callModel(modelId, systemPrompt, userMessage, delta);
if (delta) manager.commit(modelId, blocks); // only record blocks that were actually sent

// Persist updated state before the request ends (null until the first commit)
const state = manager.exportModelState(modelId);
if (state) await redis.set(`ctx:${userId}:${modelId}`, JSON.stringify(state));

ModelState is a plain JSON-serializable object — store it anywhere: Redis, Postgres, DynamoDB, a session cookie.
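Because the stored value comes from an external system, it may be missing, truncated, or corrupted, and a bare `JSON.parse` would then throw inside the request handler. The following is a minimal defensive-parse sketch. `parseStoredState` is a hypothetical helper, and since the shape of `ModelState` is not documented here, it only checks that the value is a plain object.

```typescript
// Hypothetical helper: returns the parsed state, or null if the stored value is unusable.
function parseStoredState(
  raw: string | null | undefined,
): Record<string, unknown> | null {
  if (!raw) return null; // nothing stored yet
  try {
    const value: unknown = JSON.parse(raw);
    const isPlainObject =
      typeof value === 'object' && value !== null && !Array.isArray(value);
    return isPlainObject ? (value as Record<string, unknown>) : null;
  } catch {
    return null; // malformed JSON: fall back to a full seed
  }
}

// Usage inside a handler (manager, redis, and ModelState as in the example above):
//   const state = parseStoredState(await redis.get(`ctx:${userId}:${modelId}`));
//   if (state) manager.importModelState(modelId, state as ModelState);
```

Falling back to `null` means the worst case is one extra full seed, which is cheaper than a failed request.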

Request flow

user message
     │
     ▼
prepareIfNeeded(modelId, message, blocks)
     │
     ├─ 'confirmation' / 'decline' / 'gratitude'
     │   delta = null, matchedBlockIds = null — skip context
     │
     └─ 'contextual'
          │
          ▼
     relevance check (keyword overlap against block content)
          │
          ├─ no keyword match
          │   delta = null, matchedBlockIds = [] — off-topic, skip context
          │
          └─ keyword match
              delta = only changed blocks since last commit
              matchedBlockIds = ['block-a', 'block-b']
                   │
                   ▼
              formatDeltaForPrompt(delta)
                   │
                   ▼
              inject into system prompt   ← silent background knowledge
                   │
                   ▼
              model responds to user message
                   │
                   ▼
              commit(modelId, blocks)     ← update baseline for next request
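The two gates in the diagram can be written as a small pure function, which makes the difference between `matchedBlockIds = null` and `matchedBlockIds = []` explicit. This is an illustrative sketch only: `gateContext` and `GateResult` are hypothetical names, and `findMatches` stands in for whatever relevance check is in use.

```typescript
type Intent = 'confirmation' | 'decline' | 'gratitude' | 'contextual';

type GateResult = {
  sendContext: boolean;
  matchedBlockIds: string[] | null;
};

// Hypothetical stand-in for the two filters in the diagram.
// `findMatches` is only called for contextual messages.
function gateContext(intent: Intent, findMatches: () => string[]): GateResult {
  if (intent !== 'contextual') {
    // The relevance check never ran, hence null rather than [].
    return { sendContext: false, matchedBlockIds: null };
  }
  const matched = findMatches();
  // An empty list means the check ran and found the message off-topic.
  return { sendContext: matched.length > 0, matchedBlockIds: matched };
}
```

Even when `sendContext` is true, the resulting delta may still have no entries if nothing changed since the last commit.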

API

`ContextCacheManager`

`prepareIfNeeded(modelId, userMessage, blocks, options?)`

The main entry point. Applies two filters before deciding whether to send context:

Intent: confirmations, declines, and gratitude skip context entirely ("yes", "ok", "nope", "thanks")
Relevance: contextual messages that share no keywords with any block also skip context ("tell me a joke" when your blocks are about authentication)

const { delta, classification, matchedBlockIds } = manager.prepareIfNeeded(
  'gpt-4o',
  userMessage,
  blocks
);
// delta              → DeltaPackage | null
// classification     → { intent: 'confirmation'|'decline'|'gratitude'|'contextual', needsContext: boolean }
// matchedBlockIds    → null (confirmation/decline/gratitude) | [] (no match) | ['id1','id2'] (matched)

The relevance index is built from block content on the first call and cached until the block set changes, so repeated calls with the same blocks do not rebuild it.

Custom relevance checker — plug in embeddings or any other logic:

const { delta } = manager.prepareIfNeeded('gpt-4o', userMessage, blocks, {
  checkRelevance: (message, blocks) => {
    // return ids of blocks relevant to this message
    // return [] to skip context
    return myEmbeddingSearch(message, blocks);
  },
});
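As one way to fill in `myEmbeddingSearch`, here is a sketch of a similarity-threshold checker with the same `(message, blocks) => string[]` shape. All names (`makeEmbeddingChecker`, `toyEmbed`, `VOCABULARY`) are hypothetical, and `toyEmbed` is a word-count stand-in for a real embedding model. Real embedding calls are usually asynchronous, and this README does not say whether `checkRelevance` may return a promise, so the sketch assumes a synchronous `embed` backed by precomputed or local vectors.

```typescript
type BlockLike = { id: string; content: string };
type Embed = (text: string) => number[];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    dot += x * (b[i] ?? 0);
    normA += x * x;
  });
  b.forEach((y) => { normB += y * y; });
  if (normA === 0 || normB === 0) return 0; // zero vectors match nothing
  return dot / Math.sqrt(normA * normB);
}

function makeEmbeddingChecker(embed: Embed, threshold = 0.5) {
  const cache = new Map<string, number[]>(); // block content -> vector
  return (message: string, blocks: BlockLike[]): string[] => {
    const query = embed(message);
    return blocks
      .filter((block) => {
        let vector = cache.get(block.content);
        if (!vector) {
          vector = embed(block.content);
          cache.set(block.content, vector);
        }
        return cosineSimilarity(query, vector) >= threshold;
      })
      .map((block) => block.id);
  };
}

// Toy stand-in for a real embedding model: counts words from a fixed vocabulary.
const VOCABULARY = ['login', 'password', 'token', 'invoice'];
const toyEmbed: Embed = (text) => {
  const words = text.toLowerCase().split(/[^a-z0-9]+/);
  return VOCABULARY.map((term) => words.filter((w) => w === term).length);
};
```

The returned function could then be passed as `checkRelevance: makeEmbeddingChecker(toyEmbed)`. Caching vectors by content means an edited block is re-embedded automatically, at the cost of keeping stale vectors in memory.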

`prepare(modelId, blocks): DeltaPackage`

Computes what changed since the last commit without classifying the message. Use this when you want the delta unconditionally.

const delta = manager.prepare('gpt-4o', blocks);

Does not update the manifest. Call commit after a successful model response that included the delta.

`commit(modelId, blocks, removedIds?): void`

Persists the current block state as the model's new baseline. Call this after a successful model response.

manager.commit('gpt-4o', blocks);
manager.commit('gpt-4o', updatedBlocks, ['deleted-block-id']);

`reset(modelId): void`

Clears the manifest and version counter for a model. The next prepare() returns a full seed — all blocks as add entries. Use this to force a periodic context refresh so the model never drifts too far from the current state.

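const ROTATION_INTERVAL = 8;

// messageCount is a per-conversation counter kept by your application
if (++messageCount % ROTATION_INTERVAL === 0) {
  manager.reset(modelId);
}

const { delta } = manager.prepareIfNeeded(modelId, message, blocks);
// On rotation turns delta.fullSeed === true, unless delta is null because the message needed no context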

`getStaleModels(blocks): string[]`

Returns the IDs of all registered models whose cached state no longer matches the given blocks. Useful when blocks change outside of a request cycle and you need to know which models are out of date.

const stale = manager.getStaleModels(blocks);
// ['gpt-4o', 'claude-sonnet-4-6'] — these models need a delta on next request

`exportModelState(modelId): ModelState | null`

Returns the manifest and version counter for a model as a plain JSON-serializable object. Returns null if no commit has been made. Use this to persist state between requests in a stateless API.

const state = manager.exportModelState(modelId);
await redis.set(`ctx:${userId}:${modelId}`, JSON.stringify(state));

`importModelState(modelId, state: ModelState): void`

Restores a previously exported state. After import, prepare() diffs against the restored manifest rather than treating the model as new.

const stored = await redis.get(`ctx:${userId}:${modelId}`);
if (stored) manager.importModelState(modelId, JSON.parse(stored));

`getManifest(modelId): CacheManifest | null`

Returns a read-only snapshot of the model's current manifest. Returns null if no commit has been made for this model.

`keywordRelevance(message, blocks, index?): string[]`

The built-in relevance checker used by prepareIfNeeded. Returns the ids of blocks whose content shares keywords with the message. Returns [] if no blocks match.

Stop words ("the", "is", "and", etc.) are filtered before matching. The index is built from block content — pass a pre-built index to avoid rebuilding on repeated calls.

import { keywordRelevance, buildIndex } from '@stacklatte/context-manager';

// one-shot
const matchedIds = keywordRelevance('fix the login bug', blocks);

// reuse a pre-built index across many messages
const index = buildIndex(blocks);
const ids1 = keywordRelevance('fix the login bug', blocks, index);
const ids2 = keywordRelevance('jwt token expiry', blocks, index);

Also exported: tokenize(text), buildIndex(blocks), findRelevantBlocks(message, index) for building custom relevance logic on top of the same primitives.
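For intuition about how keyword matching with a reusable index can work, here is a minimal inverted-index sketch. It is not the package's implementation: the `sketch*` functions are hypothetical stand-ins for `tokenize`, `buildIndex`, and `findRelevantBlocks`, and the stop-word list is deliberately tiny.

```typescript
type BlockLike = { id: string; content: string };
type KeywordIndex = Map<string, Set<string>>; // keyword -> ids of blocks containing it

const STOP_WORDS = new Set(['a', 'an', 'and', 'in', 'is', 'of', 'the', 'to']);

function sketchTokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

function sketchBuildIndex(blocks: BlockLike[]): KeywordIndex {
  const index: KeywordIndex = new Map();
  blocks.forEach((block) => {
    sketchTokenize(block.content).forEach((word) => {
      let ids = index.get(word);
      if (!ids) {
        ids = new Set<string>();
        index.set(word, ids);
      }
      ids.add(block.id);
    });
  });
  return index;
}

function sketchFindRelevant(message: string, index: KeywordIndex): string[] {
  const matched = new Set<string>();
  sketchTokenize(message).forEach((word) => {
    index.get(word)?.forEach((id) => matched.add(id));
  });
  return Array.from(matched);
}
```

Building the index costs one pass over all block content, while each lookup touches only the words in the message. Note that this sketch matches exact tokens only, so "tokens" would not match a block that says "token".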

`formatDeltaForPrompt(delta): string`

Converts a DeltaPackage into text ready to inject into a system prompt. Returns an empty string if there are no entries.

Each block is wrapped in <context> tags. The output begins with an instruction telling the model to treat the content as silent background knowledge — not something to acknowledge or respond to.

import { formatDeltaForPrompt } from '@stacklatte/context-manager';

const contextText = delta ? formatDeltaForPrompt(delta) : '';

Always inject into the system prompt, not the user message. Models generally treat system prompt content as background instructions, whereas context placed in a user turn risks the model responding to the context update instead of the user's actual question.

// contextText is '' when delta is null, so filter(Boolean) drops it

// OpenAI / compatible API
{
  role: 'system',
  content: [baseSystemPrompt, contextText].filter(Boolean).join('\n\n'),
}

// Anthropic API
{
  system: [baseSystemPrompt, contextText].filter(Boolean).join('\n\n'),
}

`classifyIntent(message): IntentClassification`

Classifies a user message as confirmation, decline, gratitude, or contextual. Used internally by prepareIfNeeded — call it directly if you need the classification without preparing a delta.

All three non-contextual intents have needsContext: false. gratitude is distinct from confirmation so callers can respond differently (e.g. "You're welcome!"). Gratitude with a follow-up question is contextual: "thanks but can you clarify" → contextual.

import { classifyIntent } from '@stacklatte/context-manager';

classifyIntent('yes please');         // { intent: 'confirmation', needsContext: false }
classifyIntent('no thanks');          // { intent: 'decline',      needsContext: false }
classifyIntent('thank you');          // { intent: 'gratitude',    needsContext: false }
classifyIntent('how does this work?'); // { intent: 'contextual',   needsContext: true }

Pattern coverage (~130 patterns total):

Confirmations (~55): yes/yeah/yep/y, totally/for sure/bet/100, ok/okay/alright, exactly/precisely/spot on, noted/understood/got it/wilco, go ahead/proceed/carry on/ship it, great/perfect/brilliant/love it, and more
Declines (~50): no/nope/nah, absolutely not/no way/hard no, not really/maybe not, stop/cancel/abort, never mind/forget it, hold on/hang on/one sec, not yet/later, wrong/off track/not even close, i doubt it/not convinced/skeptical, and more
Gratitude (~25): thank you and all variants, appreciate it/much appreciated, cheers/ty/thx, much obliged/grateful, and more
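The following sketch shows one way whole-message pattern matching can produce the behaviour described above, including the rule that gratitude followed by a question is contextual. It is not the package's classifier: `sketchClassifyIntent` and the three small pattern sets are hypothetical, and cover only a handful of the phrases listed.

```typescript
type Intent = 'confirmation' | 'decline' | 'gratitude' | 'contextual';
type IntentClassification = { intent: Intent; needsContext: boolean };

// Tiny illustrative pattern sets; the package's own lists are far larger.
const DECLINE = /^(no|nope|nah|no way|not yet|never mind|cancel|stop)( thanks| thank you)?$/;
const CONFIRMATION = /^(yes|yeah|yep|y|ok|okay|alright|sure|go ahead|got it)( please| thanks)?$/;
const GRATITUDE = /^(thanks|thank you|thx|ty|cheers|much appreciated)( so much| a lot)?$/;

function sketchClassifyIntent(message: string): IntentClassification {
  const text = message
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // drop punctuation
    .replace(/\s+/g, ' ')         // collapse runs of whitespace
    .trim();

  // Patterns are anchored to the whole message, so "thanks but can you clarify"
  // matches none of them and falls through to contextual.
  if (DECLINE.test(text)) return { intent: 'decline', needsContext: false };
  if (CONFIRMATION.test(text)) return { intent: 'confirmation', needsContext: false };
  if (GRATITUDE.test(text)) return { intent: 'gratitude', needsContext: false };
  return { intent: 'contextual', needsContext: true };
}
```

Anchoring each pattern to the full message is the design choice that keeps false positives low: a long message that merely starts with "ok" or "thanks" still gets its context.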

`hashContent(content): string`

Produces a stable 16-character hex hash of a string. Unicode-normalised (NFC) and line-ending-normalised (CRLF → LF) so the same logical content produces the same hash on any platform.

import { hashContent } from '@stacklatte/context-manager';

const hash = hashContent('my block content');
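To illustrate why normalising before hashing gives platform-independent results, here is a small sketch. The package's actual hash algorithm is not documented in this README, so `sketchHashContent` is a hypothetical stand-in that uses two FNV-1a-style 32-bit passes over UTF-16 code units to produce 16 hex characters. It is not cryptographic and will not produce the same values as `hashContent`.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units.
function fnv1a32(text: string, seed: number): number {
  let hash = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash;
}

function toHex8(value: number): string {
  return ('00000000' + value.toString(16)).slice(-8);
}

// Hypothetical stand-in for hashContent; the real algorithm may differ.
function sketchHashContent(content: string): string {
  const normalized = content
    .normalize('NFC')        // composed and decomposed characters hash alike
    .replace(/\r\n/g, '\n'); // CRLF -> LF
  // Two passes with different seeds give 2 x 8 = 16 hex characters.
  return toHex8(fnv1a32(normalized, 0x811c9dc5)) + toHex8(fnv1a32(normalized, 0x01234567));
}
```

Without the normalisation step, the same file checked out on Windows and Linux could hash differently and show up as a spurious update in every delta.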

`bumpVersion(current): number`

Safe version increment. Throws RangeError on NaN, floats, negatives, or overflow.

import { bumpVersion } from '@stacklatte/context-manager';

const nextVersion = bumpVersion(block.version);
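The checks listed above can be sketched in a few lines. `sketchBumpVersion` is a hypothetical stand-in that mirrors the documented behaviour; the package's exact error messages and overflow limit are assumptions.

```typescript
// Hypothetical stand-in mirroring the documented behaviour of bumpVersion.
function sketchBumpVersion(current: number): number {
  // Number.isInteger rejects NaN, Infinity, and floats in one check.
  if (!Number.isInteger(current) || current < 0) {
    throw new RangeError(`version must be a non-negative integer, got ${current}`);
  }
  // Assumption: overflow means leaving the range where integers are exact.
  if (current >= Number.MAX_SAFE_INTEGER) {
    throw new RangeError('version would exceed Number.MAX_SAFE_INTEGER');
  }
  return current + 1;
}
```

Guarding the increment matters because beyond `Number.MAX_SAFE_INTEGER` integer arithmetic is no longer exact, so a version comparison could silently report "unchanged".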

License

MIT
