sliding-context
v0.3.2
Provider-agnostic sliding window context manager for LLMs
sliding-context
Provider-agnostic sliding window context manager for LLMs. Keeps your conversation within a token budget by evicting old messages, summarizing context, and preserving tool call atomicity.
Description
sliding-context manages the conversation history portion of an LLM's context window. It divides the context into three zones -- a fixed system prompt, a rolling summary of older messages, and a verbatim recent-message buffer -- and automatically evicts and summarizes messages to keep the total token count within a configurable budget.
The package is provider-agnostic. It does not import any LLM SDK, make HTTP requests, or manage API keys. You supply a summarizer function that calls whichever LLM you use, and sliding-context orchestrates when and what to summarize. It works equally well with OpenAI, Anthropic, Google, Ollama, or any other provider.
Key properties:
- Zero runtime dependencies. All logic uses built-in JavaScript APIs.
- Tool call pair atomicity. Assistant messages with tool_calls and their corresponding tool result messages are always evicted together -- they are never split across the summary boundary.
- Pluggable token counting. Ships with a built-in approximate counter (Math.ceil(text.length / 4)) and accepts any custom counter (tiktoken, gpt-tokenizer, etc.).
- Serialization and persistence. Full context state can be serialized to JSON and restored across sessions, server restarts, or storage backends.
- Event hooks. Observe eviction, summarization, budget exceeded, and summary compression events for logging and monitoring.
- Three summarization strategies. incremental, rolling, and anchored -- choose the one that fits your use case.
Installation
npm install sliding-context
Requires Node.js >= 18.
Quick Start
import { createSlidingContext } from "sliding-context";
import type { Message } from "sliding-context";
const ctx = createSlidingContext({
tokenBudget: 8192,
systemPrompt: "You are a helpful assistant.",
summarizer: async (messages: Message[], existingSummary?: string) => {
// Call your LLM of choice to summarize the evicted messages.
// Return the summary as a plain string.
return "Summary of the conversation so far...";
},
strategy: "incremental",
});
// Add messages as the conversation progresses
await ctx.addMessage({ role: "user", content: "Hello!" });
await ctx.addMessage({ role: "assistant", content: "Hi there!" });
// Retrieve the context window -- always fits within tokenBudget
const messages = ctx.getMessages();
// => [{ role: 'system', content: 'You are a helpful assistant.' },
// { role: 'user', content: 'Hello!' },
// { role: 'assistant', content: 'Hi there!' }]
// Inspect token usage
const breakdown = ctx.getTokenBreakdown();
// => { system: 12, anchor: 0, summary: 0, recent: 18, total: 30 }
Features
Automatic Eviction and Summarization
When the total token count exceeds the configured budget, the oldest messages in the recent zone are evicted into a pending buffer. Once the pending buffer crosses a configurable threshold (by token count or message count), the summarizer is invoked to compress those messages into the rolling summary.
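The trigger condition can be pictured as a small predicate. This is an illustrative sketch of the documented behavior, not the package's internal code, and the names are hypothetical:

```typescript
// Sketch of the documented summarization trigger: the pending buffer
// is summarized once either threshold is crossed.
interface PendingBuffer {
  tokens: number; // total tokens of evicted-but-unsummarized messages
  count: number;  // number of messages in the buffer
}

function shouldSummarize(
  pending: PendingBuffer,
  thresholdTokens: number,   // default: Math.floor(tokenBudget * 0.1)
  thresholdMessages: number, // default: 6
): boolean {
  // Whichever threshold is reached first triggers the summarizer call.
  return pending.tokens >= thresholdTokens || pending.count >= thresholdMessages;
}
```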
Tool Call Pair Atomicity
Assistant messages with tool_calls are always evicted as an atomic unit together with all corresponding tool result messages. This prevents orphaned tool calls from confusing the LLM.
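The pairing rule can be sketched as follows. This helper is illustrative only (it is not part of the package API): an assistant message carrying tool_calls and the tool result messages answering those call ids form one indivisible unit.

```typescript
// Illustrative sketch of tool call pair atomicity, not the package's
// internal implementation.
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
}

// Returns the exclusive end index of the atomic unit starting at `start`:
// the assistant message plus all tool results answering its tool_calls.
// Eviction must take the whole [start, end) range or none of it.
function atomicUnitEnd(messages: Msg[], start: number): number {
  const msg = messages[start];
  if (msg.role !== "assistant" || !msg.tool_calls) return start + 1;
  const ids = new Set(msg.tool_calls.map((c) => c.id));
  let end = start + 1;
  while (
    end < messages.length &&
    messages[end].role === "tool" &&
    ids.has(messages[end].tool_call_id ?? "")
  ) {
    end++;
  }
  return end;
}
```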
Three Summarization Strategies
| Strategy | Behavior |
|---|---|
| incremental | Summarizes only newly evicted messages and merges with the existing summary. Most token-efficient. |
| rolling | Prepends the existing summary to evicted messages and re-summarizes the whole batch. Produces more coherent summaries. |
| anchored | Maintains a permanent anchor (never re-summarized) plus a rolling section for newer content. Best for preserving key context. |
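The difference between incremental and rolling comes down to what the summarizer receives. Here is a sketch based on the table above (illustrative names, not the package's internals):

```typescript
// Sketch of what each strategy hands to the summarizer, per the
// table above; not the package's internal code.
type Strategy = "incremental" | "rolling";

function summarizerInput(
  strategy: Strategy,
  evicted: string[], // evicted message contents
  existingSummary?: string,
): { messages: string[]; existingSummary?: string } {
  if (strategy === "rolling" && existingSummary) {
    // Re-summarize the old summary together with the new batch.
    return { messages: [existingSummary, ...evicted], existingSummary: undefined };
  }
  // incremental: only the new batch; merging with the old summary is
  // delegated to the summarizer via the existingSummary argument.
  return { messages: evicted, existingSummary };
}
```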
Dynamic Budget Changes
Call setTokenBudget() at any time to resize the context window. Reducing the budget triggers immediate eviction and, if thresholds are met, summarization. This supports multi-model workflows where you switch from a large-context model to a smaller one mid-conversation.
Serialization and Persistence
Export the full context state as JSON with serialize() or serializeContext(). Restore it later with restoreSlidingContext(), re-supplying function-valued options (summarizer, tokenCounter, hooks) that cannot be serialized. Store state in Redis, DynamoDB, localStorage, or any other backend.
Event Hooks
Attach callbacks for observability: onEvict, onSummarize, onBudgetExceeded, and onSummaryCompressed. Hooks that throw are caught internally and do not crash the context manager.
Multi-Part Message Support
Messages with array content (text blocks, image blocks) are supported. Text parts are counted with the configured token counter. Non-text parts (images, etc.) use a flat token cost of 85 per part.
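The counting rule for array content can be sketched like this (a minimal illustration of the behavior described above, not the library's own code):

```typescript
// Sketch of multi-part content counting: text parts go through the
// configured counter, non-text parts cost a flat 85 tokens each.
type Part = { type: "text"; text: string } | { type: "image"; url: string };
const NON_TEXT_PART_COST = 85;

function countPartTokens(
  parts: Part[],
  tokenCounter: (text: string) => number,
): number {
  return parts.reduce(
    (sum, p) =>
      sum + (p.type === "text" ? tokenCounter(p.text) : NON_TEXT_PART_COST),
    0,
  );
}
```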
API Reference
createSlidingContext(options)
Creates and returns a new SlidingContext instance.
import { createSlidingContext } from "sliding-context";
const ctx = createSlidingContext({ tokenBudget: 4096 });
Parameters:
- options (SlidingContextOptions) -- Configuration object. See Configuration below.
Returns: SlidingContext
Throws: RangeError if tokenBudget is not a finite number >= 100.
SlidingContext Instance Methods
addMessage(message: Message): Promise<void>
Appends a message to the context. After each addition, checks the token budget and triggers eviction and summarization as needed.
await ctx.addMessage({ role: "user", content: "What time is it?" });
getMessages(): Message[]
Returns the current message array, ready to send to any LLM API. The order is:
- System prompt (if configured)
- Anchor messages (if set)
- Pending buffer messages (evicted but not yet summarized)
- Summary message (if summarization has occurred)
- Recent messages (verbatim, newest last)
const messages = ctx.getMessages();
getSummary(): string | undefined
Returns the current rolling summary text, or undefined if no summarization has occurred.
const summary = ctx.getSummary();
getTokenCount(): number
Returns the total token count across all zones (system + anchor + summary + recent + pending).
const total = ctx.getTokenCount();
getTokenBreakdown(): { system, anchor, summary, recent, total }
Returns a breakdown of token usage by zone.
const bd = ctx.getTokenBreakdown();
// { system: 12, anchor: 0, summary: 150, recent: 800, total: 962 }
getRecentMessageCount(): number
Returns the count of recent messages (including the pending buffer, excluding system prompt, anchor, and summary).
const count = ctx.getRecentMessageCount();
getTotalMessageCount(): number
Returns the total count of all messages including system prompt, anchor, summary, pending buffer, and recent.
const total = ctx.getTotalMessageCount();
setAnchor(messages: Message[]): void
Replaces the anchor message set. Anchor messages are always included verbatim after the system prompt and are never evicted or summarized.
ctx.setAnchor([
{ role: "system", content: "Important context that must always be present." },
]);
setTokenBudget(budget: number): void
Updates the token budget dynamically. If the current token count exceeds the new budget, eviction is triggered immediately.
ctx.setTokenBudget(4096);
Throws: RangeError if budget is not a finite number >= 100.
clear(): void
Resets recent messages, pending buffer, summary, and summary round counter to their initial state. The system prompt is retained. Anchor messages are reset to the value from the original options.
ctx.clear();
serialize(): ContextState
Returns a serializable snapshot of the current context state. Function-valued options (summarizer, tokenCounter, hooks) are excluded. The returned object is JSON-safe.
const state = ctx.serialize();
// state.version === 1
serializeContext(ctx: SlidingContext): string
Serializes a SlidingContext instance to a JSON string suitable for persistence.
import { serializeContext } from "sliding-context";
const json = serializeContext(ctx);
await redis.set("ctx:user123", json);
restoreSlidingContext(data: string, options: SlidingContextOptions): SlidingContext
Restores a SlidingContext from a JSON string produced by serializeContext(). Re-supply function-valued options since they cannot be serialized.
import { restoreSlidingContext } from "sliding-context";
const json = await redis.get("ctx:user123");
const ctx = restoreSlidingContext(json, {
tokenBudget: 4096,
summarizer: mySummarizer,
tokenCounter: myTokenCounter,
});
serialize(state: ContextState): string
Low-level serialization. Converts a ContextState object to a JSON string with version: 1.
import { serialize } from "sliding-context";
const json = serialize(ctx.serialize());
deserialize(data: string): ContextState
Low-level deserialization. Parses a JSON string back to a ContextState object. Validates the version field.
import { deserialize } from "sliding-context";
const state = deserialize(json);
Throws: Error if the JSON is malformed or the version field does not match the expected schema version (currently 1).
approximateTokenCounter(text: string): number
Built-in approximate token counter. Returns Math.ceil(text.length / 4). Returns 0 for empty or falsy input. Approximates GPT-style tokenization without any external dependency.
import { approximateTokenCounter } from "sliding-context";
approximateTokenCounter("Hello world"); // 3
countMessageTokens(message: Message, tokenCounter: TokenCounter, messageOverhead: number): number
Counts tokens for a single Message, including:
- String content via the provided tokenCounter
- Array content (text parts summed, non-text parts at 85 tokens each)
- tool_calls JSON stringified and counted
- tool_call_id string counted
- Per-message messageOverhead added
import { countMessageTokens, approximateTokenCounter } from "sliding-context";
const tokens = countMessageTokens(
{ role: "user", content: "Hello" },
approximateTokenCounter,
4
);
// => 6 (2 content tokens + 4 overhead)
DEFAULT_MESSAGE_OVERHEAD
The default per-message token overhead constant: 4.
import { DEFAULT_MESSAGE_OVERHEAD } from "sliding-context";
evictMessages(messages, targetTokens, tokenCounter, overhead)
Low-level eviction function. Removes the oldest non-system messages from the array until total tokens fit within targetTokens. Tool call pairs are evicted atomically.
import { evictMessages } from "sliding-context";
const { evicted, remaining } = evictMessages(
messages,
1000,
approximateTokenCounter,
4
);
Parameters:
- messages (Message[]) -- The message array to evict from.
- targetTokens (number) -- The target token count for the remaining messages.
- tokenCounter (TokenCounter) -- Token counting function.
- overhead (number) -- Per-message token overhead.
Returns: { evicted: Message[]; remaining: Message[] }
allocateBudget(options, systemTokensUsed)
Computes the token budget allocation across the three context zones.
import { allocateBudget } from "sliding-context";
const allocation = allocateBudget({ tokenBudget: 4096, summarizer: fn }, 100);
// { systemTokens: 100, summaryTokens: 1228, recentTokens: 2768 }
Parameters:
- options (SlidingContextOptions) -- Configuration options.
- systemTokensUsed (number) -- Actual token count consumed by system + anchor + summary zones.
Returns: BudgetAllocation -- { systemTokens: number; summaryTokens: number; recentTokens: number }
Priority order: system tokens (fixed), summary tokens (capped at maxSummaryTokens; 0 if no summarizer), recent tokens (remainder, enforcing minRecentTokens).
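The priority order can be sketched as arithmetic, matching the example output above (a simplified illustration that omits the minRecentTokens floor; the function name is hypothetical):

```typescript
// Sketch of the documented allocation priority: system tokens are paid
// first, the summary zone gets its cap, recent takes the remainder.
// The real allocateBudget additionally enforces minRecentTokens.
function sketchAllocate(
  tokenBudget: number,
  systemTokensUsed: number,
  hasSummarizer: boolean,
  maxSummaryTokens = Math.floor(tokenBudget * 0.3), // documented default
) {
  const systemTokens = systemTokensUsed;
  const summaryTokens = hasSummarizer ? maxSummaryTokens : 0;
  const recentTokens = tokenBudget - systemTokens - summaryTokens;
  return { systemTokens, summaryTokens, recentTokens };
}
```

With tokenBudget 4096 and 100 system tokens, this reproduces the breakdown shown in the allocateBudget example: 100 / 1228 / 2768.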
runSummarizer(messages, summarizer, existingSummary)
Invokes the summarizer function with error handling. Returns null if the summarizer throws.
import { runSummarizer } from "sliding-context";
const summary = await runSummarizer(messages, mySummarizer, existingSummary);
if (summary === null) {
// summarizer failed; handle gracefully
}
Parameters:
- messages (Message[]) -- Messages to summarize.
- summarizer (Summarizer) -- The summarizer function.
- existingSummary (string | undefined) -- The current summary, if any.
Returns: Promise<string | null> -- The summary string, or null on failure.
defaultSummarizationPrompt(messages: Message[]): string
Returns a summarization prompt for the given messages. Formats each message with its role and content, then wraps in a "Summarize the following conversation concisely" instruction.
import { defaultSummarizationPrompt } from "sliding-context";
const prompt = defaultSummarizationPrompt(evictedMessages);
// "Summarize the following conversation concisely:\n\nUSER: Hello\nASSISTANT: Hi there"
Tool call messages are formatted as ASSISTANT: [tool_calls] <JSON>. Tool result messages are formatted as TOOL(<tool_call_id>): <content>.
Configuration
All options are passed to createSlidingContext().
| Option | Type | Default | Description |
|---|---|---|---|
| tokenBudget | number | required | Maximum total tokens for the context window. Must be >= 100. |
| systemPrompt | string | -- | System prompt text. Never evicted or summarized. Always first in getMessages(). |
| summarizer | Summarizer | -- | Async function to summarize evicted messages. If omitted, evicted messages accumulate in the pending buffer without summarization (truncation mode). |
| strategy | SummarizationStrategy | 'incremental' | Summarization strategy: 'incremental', 'rolling', or 'anchored'. |
| maxSummaryTokens | number | Math.floor(tokenBudget * 0.3) | Maximum tokens allocated to the summary zone. |
| minRecentTokens | number | Math.floor(tokenBudget * 0.3) | Minimum tokens guaranteed for the recent message zone. |
| summarizeThresholdTokens | number | Math.floor(tokenBudget * 0.1) | Token count in the pending buffer that triggers summarization. |
| summarizeThresholdMessages | number | 6 | Message count in the pending buffer that triggers summarization. Whichever threshold is reached first triggers the call. |
| tokenCounter | TokenCounter | approximateTokenCounter | Custom token counting function. Signature: (text: string) => number. |
| messageOverhead | number | 4 | Per-message token overhead added to every message's token count. |
| summaryRole | SummaryRole | 'system' | Role for the injected summary message: 'system' or 'user'. |
| maxSummaryRounds | number | 5 | Maximum number of summarization rounds before the summarizer stops being called. |
| anchor | Message[] | -- | Initial anchor messages. Included verbatim after the system prompt, never evicted. |
| maxAnchorTokens | number | Math.floor(maxSummaryTokens * 0.4) | Maximum tokens for the anchor section. |
| hooks | EventHooks | -- | Event callbacks for observability. See Error Handling and Event Hooks. |
Error Handling and Event Hooks
Construction Errors
createSlidingContext() throws a RangeError if tokenBudget is not a finite number >= 100:
createSlidingContext({ tokenBudget: 50 });
// RangeError: sliding-context: tokenBudget must be a finite number >= 100, got 50
Budget Enforcement Errors
setTokenBudget() throws a RangeError for invalid values:
ctx.setTokenBudget(Infinity);
// RangeError: sliding-context: tokenBudget must be a finite number >= 100, got Infinity
Deserialization Errors
deserialize() throws an Error for malformed JSON or version mismatches:
deserialize("not json");
// Error: sliding-context: failed to parse serialized state: ...
deserialize(JSON.stringify({ version: 99 }));
// Error: sliding-context: schema mismatch -- expected version 1, got 99
Summarizer Failure Handling
If the summarizer function throws, the error is caught internally. The pending buffer retains its messages for the next summarization attempt. Summarization failure never crashes the context manager.
Event Hooks
Attach callbacks via the hooks option:
const ctx = createSlidingContext({
tokenBudget: 4096,
summarizer: mySummarizer,
hooks: {
onEvict(messages, reason) {
// reason is 'budget' for normal eviction, 'truncation' for emergency drops
console.log(`Evicted ${messages.length} messages: ${reason}`);
},
onSummarize(inputMessages, existingSummary, newSummary, durationMs) {
console.log(`Summarized ${inputMessages.length} messages in ${durationMs}ms`);
},
onBudgetExceeded(totalTokens, budget) {
console.log(`Budget exceeded: ${totalTokens} / ${budget}`);
},
onSummaryCompressed(oldSummary, newSummary) {
console.log("Summary was compressed");
},
},
});
All hooks are optional. Omitting any hook is safe and produces no errors.
Advanced Usage
Using a Custom Token Counter
For accurate token counting, provide a function backed by a real tokenizer:
import { encoding_for_model } from "tiktoken";
const enc = encoding_for_model("gpt-4o");
const ctx = createSlidingContext({
tokenBudget: 8192,
tokenCounter: (text: string) => enc.encode(text).length,
summarizer: mySummarizer,
});
Provider Integration Examples
OpenAI:
import OpenAI from "openai";
import { createSlidingContext } from "sliding-context";
const openai = new OpenAI();
const ctx = createSlidingContext({
tokenBudget: 8192,
systemPrompt: "You are a helpful assistant.",
summarizer: async (messages, existingSummary) => {
const prompt = existingSummary
? `Existing summary:\n${existingSummary}\n\nNew messages to incorporate:`
: "Summarize the following conversation:";
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: prompt },
...messages,
],
});
return response.choices[0].message.content ?? "";
},
});
Anthropic:
import Anthropic from "@anthropic-ai/sdk";
import { createSlidingContext } from "sliding-context";
const anthropic = new Anthropic();
const ctx = createSlidingContext({
tokenBudget: 8192,
systemPrompt: "You are a helpful assistant.",
summarizer: async (messages) => {
const formatted = messages
.map((m) => `${m.role}: ${m.content}`)
.join("\n");
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 512,
messages: [
{
role: "user",
content: `Summarize this conversation concisely:\n\n${formatted}`,
},
],
});
return response.content[0].type === "text" ? response.content[0].text : "";
},
});
Truncation Mode (No Summarizer)
When no summarizer is provided, evicted messages accumulate in the pending buffer but are never summarized. The getMessages() output includes pending buffer messages along with recent messages. This mode is useful when you want eviction behavior without LLM-based summarization.
const ctx = createSlidingContext({
tokenBudget: 4096,
systemPrompt: "You are a helpful assistant.",
// No summarizer -- truncation mode
});
Persistence with Redis
import { createSlidingContext, serializeContext, restoreSlidingContext } from "sliding-context";
// Save
const ctx = createSlidingContext({ tokenBudget: 8192, summarizer: mySummarizer });
await ctx.addMessage({ role: "user", content: "Hello" });
const json = serializeContext(ctx);
await redis.set("session:abc", json);
// Restore
const saved = await redis.get("session:abc");
const restored = restoreSlidingContext(saved, {
tokenBudget: 8192,
summarizer: mySummarizer,
});
// Continue the conversation
await restored.addMessage({ role: "user", content: "I'm back!" });
Dynamic Model Switching
// Start with a large-context model
const ctx = createSlidingContext({
tokenBudget: 128000,
summarizer: mySummarizer,
});
// ... many messages later, switch to a smaller model
ctx.setTokenBudget(4096);
// Eviction fires immediately to fit the new budget
const messages = ctx.getMessages();
// messages fit within 4096 tokens
Using Anchor Messages
Anchor messages are pinned after the system prompt and are never evicted or summarized. Use them for persistent instructions, user profile data, or key facts.
const ctx = createSlidingContext({
tokenBudget: 8192,
systemPrompt: "You are a customer support agent.",
summarizer: mySummarizer,
strategy: "anchored",
});
ctx.setAnchor([
{
role: "system",
content:
"Customer: Jane Doe. Account: #12345. Plan: Enterprise. Issue: billing discrepancy.",
},
]);
TypeScript
sliding-context is written in TypeScript and ships with full type declarations. All public types are exported from the package root:
import type {
Message,
ToolCall,
TokenCounter,
Summarizer,
SummarizationStrategy,
SummaryRole,
EventHooks,
SlidingContextOptions,
ContextState,
SlidingContext,
} from "sliding-context";
Type Reference
| Type | Description |
|---|---|
| Message | Chat message with role ('system', 'user', 'assistant', 'tool'), content (string), optional tool_calls (ToolCall[]), optional tool_call_id (string), optional name (string). |
| ToolCall | Tool/function call descriptor: { id: string; type: 'function'; function: { name: string; arguments: string } }. |
| TokenCounter | (text: string) => number -- pluggable token counting function. |
| Summarizer | (messages: Message[], existingSummary?: string) => Promise<string> -- async summarization function. |
| SummarizationStrategy | 'incremental' \| 'rolling' \| 'anchored' |
| SummaryRole | 'system' \| 'user' -- role used for the injected summary message. |
| EventHooks | { onEvict?, onSummarize?, onBudgetExceeded?, onSummaryCompressed? } -- optional callbacks for observability. |
| SlidingContextOptions | Full configuration object for createSlidingContext(). |
| ContextState | Serializable snapshot of context state. Contains version: 1, options, messages, summary, anchor, pendingBuffer, summaryRounds, and tokenCounts. |
| SlidingContext | Public interface for the context manager instance with all methods documented above. |
License
MIT
