@agentsy/context
v0.2.4
Compression, drift detection, and reversible output shaping
Compression, drift detection, reversible output shaping, and cache-friendly prompt planning for LLM applications.
Status
Version: 0.2.0-alpha.0
License: GPL-3.0-or-later
Published: @agentsy/core v0.2.0, @agentsy/types v0.1.1
Installation
npm install @agentsy/context @agentsy/core @agentsy/types
Quick Start
Compress Conversation History
import { compressConversation } from '@agentsy/context';
interface Message {
role: 'user' | 'assistant' | 'system';
content: string;
}
const messages: Message[] = [
{ role: 'system', content: 'You are an AI programming assistant...' },
{ role: 'user', content: 'Help me refactor this function...' },
{ role: 'assistant', content: 'Here is the refactored code...' },
// ... more messages
];
const result = compressConversation(messages, {
maxTokens: 200000,
preserveLast: 2, // Keep last 2 messages for continuity
estimateTokens: (msg) => Math.ceil(msg.content.length / 4)
});
console.log(`Dropped ${result.droppedCount} messages`);
console.log(`Retained ${result.retained.length} messages`);
console.log(`Estimated tokens: ${result.estimatedTokens}`);
Compress Output
import { compressOutput } from '@agentsy/context';
const longResponse = `
This is a very long response that contains a lot of filler text.
Basically, you should consider removing unnecessary words.
Here's some code:
\`\`\`typescript
const example = "preserve this exactly";
\`\`\`
And a link: https://example.com/docs
`;
const compressed = compressOutput(longResponse, {
level: 'full',
preserve: {
codeFences: true,
inlineCode: true,
urls: true
}
});
console.log(compressed);
// Output: Code blocks, inline code, and URLs preserved exactly
// Filler words removed: "basically", "should consider", "a lot of"
Cache Prompt Plans
import { createCachePromptPlan } from '@agentsy/context';
import { applyOpenAIPromptCaching } from '@agentsy/providers/caching';
const plan = createCachePromptPlan({
prefix: 'ctx-v1',
provider: 'openai'
});
const cached = applyOpenAIPromptCaching('prompt body', plan);
console.log(cached.prompt_cache_key); // openai:ctx-v1
Manual Compaction
import { createManualCompaction } from '@agentsy/context';
const result = createManualCompaction({
focus: 'architecture',
maxTokens: 200,
messages: ['diff --git a/a b/a', 'plain prose'],
sessionId: 'sess-1'
});
console.log(result.summary.focus); // architecture
console.log(result.summary.nextSteps); // ['rehydrate:architecture']
Token Management: See @agentsy/tokenomics for createInMemoryTokenManager, PacingController, and budget management.
API Reference
compressConversation
Compresses a conversation history to fit within a token budget.
function compressConversation<TMessage>(
messages: readonly TMessage[],
options: CompressionOptions<TMessage>
): CompressionResult<TMessage>
Parameters:
- messages: Array of messages to compress
- options.maxTokens: Maximum tokens to retain
- options.preserveLast: Number of recent messages to always preserve (default: 0)
- options.estimateTokens: Function to estimate tokens per message
Returns:
- retained: Array of messages that fit in budget
- droppedCount: Number of messages dropped
- estimatedTokens: Estimated token count of retained messages
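The parameters and return fields above can be read as a simple contract. The following is a minimal sketch of one way such a trim could behave, assuming that the newest messages are preferred and that the preserved tail is kept even when it exceeds the budget. `trimToBudget`, `TrimOptions`, and `TrimResult` are hypothetical names used for illustration; this is not the package's implementation, and the real function may choose which messages to drop differently.

```typescript
// Hypothetical sketch: NOT the @agentsy/context implementation.
interface TrimOptions<T> {
  maxTokens: number;
  preserveLast?: number;
  estimateTokens: (message: T) => number;
}

interface TrimResult<T> {
  retained: T[];
  droppedCount: number;
  estimatedTokens: number;
}

function trimToBudget<T>(messages: readonly T[], options: TrimOptions<T>): TrimResult<T> {
  const { maxTokens, estimateTokens } = options;
  const preserveLast = Math.max(0, Math.min(options.preserveLast ?? 0, messages.length));
  const splitAt = messages.length - preserveLast;

  // Assumption: the last `preserveLast` messages are always kept.
  const tail = messages.slice(splitAt);
  let used = tail.reduce((sum, message) => sum + estimateTokens(message), 0);

  // Walk the older messages from newest to oldest and stop at the
  // first one that would push the estimate over the budget.
  const head: T[] = [];
  for (const message of messages.slice(0, splitAt).reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > maxTokens) break;
    used += cost;
    head.unshift(message);
  }

  const retained = [...head, ...tail];
  return {
    retained,
    droppedCount: messages.length - retained.length,
    estimatedTokens: used
  };
}

// Token estimates here are 1, 2, 1, 1 with a budget of 3.
const demo = trimToBudget(['aaaa', 'bbbbbbbb', 'cccc', 'dddd'], {
  maxTokens: 3,
  preserveLast: 1,
  estimateTokens: (text) => Math.ceil(text.length / 4)
});
console.log(demo.retained, demo.droppedCount, demo.estimatedTokens);
```

In this sketch the two oldest messages are dropped, so `retained` is `['cccc', 'dddd']`, `droppedCount` is 2, and `estimatedTokens` is 2.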
compressOutput
Compresses output text while preserving code blocks, URLs, and other critical elements.
function compressOutput(
input: string,
options?: OutputCompressionOptions
): string
Parameters:
- input: Text to compress
- options.level: Compression level: 'lite' (40-50%), 'full' (65-75%), 'ultra' (75-87%)
- options.preserve: What to preserve (codeFences, inlineCode, urls)
Returns: Compressed text string
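To illustrate how code and URLs can survive compression untouched, here is a minimal sketch of a mask, compress, restore pass. It is an assumption about the general technique, not the package's algorithm: `compressPreserving`, the `PROTECTED` pattern, and the short `FILLER` list are all hypothetical, and the sketch assumes the input contains no NUL characters.

```typescript
// Hypothetical sketch: NOT the @agentsy/context implementation.
const FENCE = '`'.repeat(3); // built at runtime so this example holds no literal fence

const PROTECTED = new RegExp(
  FENCE + '[\\s\\S]*?' + FENCE + // fenced code blocks
    '|`[^`\\n]+`' + // inline code
    '|https?://\\S+', // URLs
  'g'
);

// Illustrative filler list (assumption); a real list would be longer.
const FILLER = /\b(?:basically|actually|really|just)\b,?[ \t]*/gi;

function compressPreserving(input: string): string {
  const saved: string[] = [];

  // 1. Swap every protected span for a numbered NUL-delimited marker.
  const masked = input.replace(PROTECTED, (match) => {
    saved.push(match);
    return `\u0000${saved.length - 1}\u0000`;
  });

  // 2. Strip filler words and collapse repeated spaces in the prose only.
  const squeezed = masked.replace(FILLER, '').replace(/[ \t]{2,}/g, ' ');

  // 3. Put the protected spans back exactly as they were.
  return squeezed.replace(
    /\u0000(\d+)\u0000/g,
    (_marker, index: string) => saved[Number(index)] ?? ''
  );
}

const shaped = compressPreserving(
  'Basically, you should just read https://example.com/docs and call `run()`.'
);
console.log(shaped);
// prints: you should read https://example.com/docs and call `run()`.
```

Because the protected spans are removed before any rewriting happens, the filler pass cannot alter them, which is the property the `preserve` options describe.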
compressOutputDetailed / compressOutputV2
Use the detailed helpers when you need content kind, routing, metrics, or reversible markers.
Use Cases
1. Context Window Management
import { compressConversation } from '@agentsy/context';
function prepareLLMRequest(
messages: Message[],
maxTokens: number
): Message[] {
const result = compressConversation(messages, {
maxTokens: maxTokens - 10000, // Safety margin
preserveLast: 2,
estimateTokens: (msg) => Math.ceil(msg.content.length / 4)
});
if (result.droppedCount > 0) {
console.warn(`Dropped ${result.droppedCount} messages to fit budget`);
}
return result.retained;
}
2. Cost-Aware Request Routing
import { createInMemoryTokenManager } from '@agentsy/tokenomics';
const manager = createInMemoryTokenManager();
async function routeRequest(
model: string,
estimatedTokens: number
): Promise<string> {
const budget = await manager.createBudget({
maxCost: 10.0,
maxTokens: 100000,
model,
name: 'routing-budget',
periodMs: 3600000,
priority: 'medium',
provider: 'openai',
resetStrategy: 'rolling'
});
const allocation = await manager.requestTokens({
estimatedTokens,
estimatedCost: estimatedTokens * 0.00001, // $0.01 per 1K tokens
model,
provider: 'openai',
requestType: 'completion',
budgetId: budget.id
});
if (allocation.conditions) {
// Try cheaper model
return 'gpt-4o-mini';
}
return model;
}
3. Output Compression for Token Savings
import { compressOutput } from '@agentsy/context';
function compressAssistantResponse(response: string): string {
// 'full' is the 65-75% level listed in the API Reference
return compressOutput(response, {
level: 'full',
preserve: {
codeFences: true,
inlineCode: true,
urls: true
}
});
}
4. Multi-Budget Management
import { createInMemoryTokenManager } from '@agentsy/tokenomics';
const manager = createInMemoryTokenManager();
// Create separate budgets for different models
const gpt4Budget = await manager.createBudget({
maxCost: 50.0,
maxTokens: 500000,
model: 'gpt-4',
name: 'gpt-4-budget',
periodMs: 3600000,
priority: 'high',
provider: 'openai',
resetStrategy: 'rolling'
});
const claudeBudget = await manager.createBudget({
maxCost: 30.0,
maxTokens: 200000,
model: 'claude-3-5-sonnet',
name: 'claude-budget',
periodMs: 3600000,
priority: 'medium',
provider: 'anthropic',
resetStrategy: 'rolling'
});
// Request from appropriate budget
async function requestTokens(
model: string,
tokens: number
): Promise<TokenAllocation> {
const budgetId = model === 'gpt-4' ? gpt4Budget.id : claudeBudget.id;
return await manager.requestTokens({
budgetId,
estimatedTokens: tokens,
estimatedCost: tokens * 0.00001,
model,
provider: model === 'gpt-4' ? 'openai' : 'anthropic',
requestType: 'completion'
});
}
Performance Characteristics
Compression Performance
- Output compression: <10ms average for typical responses
- Conversation compression: <50ms for 100-message histories
- Token estimation: <1ms per message
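The figures above are the README's own. To check them on your hardware, a small timing harness along these lines may help. It is a sketch only: `averageMs` and the sample message are hypothetical, and it times the `length / 4` estimator from the Quick Start rather than any library call.

```typescript
// Hypothetical timing helper; not part of @agentsy/context.
function averageMs(fn: () => void, iterations = 1000): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  // Average wall-clock time per call, in milliseconds.
  return (Date.now() - start) / iterations;
}

// Sample message (assumption): 2,000 characters of content.
const sample = { role: 'user', content: 'x'.repeat(2000) };

let lastEstimate = 0;
const perMessageMs = averageMs(() => {
  lastEstimate = Math.ceil(sample.content.length / 4);
});

console.log(`token estimation: ${perMessageMs.toFixed(4)} ms per message`);
```

Swapping the callback for a call to `compressOutput` or `compressConversation` would give comparable numbers for the other two rows, with the usual caveat that warm-up and input size affect the result.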
Accuracy
- Token estimation: Conservative (never underestimates)
- Compression preservation: 100% accuracy for code blocks, URLs, paths
- Budget enforcement: Deterministic, no race conditions
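If you supply your own `estimateTokens` callback, the "conservative" property depends on that callback. The sketch below shows one way to bias an estimate upward, assuming a divisor of 3 for ASCII text and one token per non-ASCII UTF-16 code unit. `estimateTokensConservative` and both ratios are illustrative assumptions, not library defaults, and no character-based heuristic is guaranteed to overestimate for every tokenizer.

```typescript
// Hypothetical helper; the ratios below are illustrative assumptions.
function estimateTokensConservative(text: string): number {
  let ascii = 0;
  let other = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) < 128) ascii++;
    else other++;
  }
  // ASCII: 3 chars per token, which yields a higher count than chars / 4.
  // Non-ASCII: count each UTF-16 code unit as a full token.
  return Math.ceil(ascii / 3) + other;
}

const englishEstimate = estimateTokensConservative('abcdefgh');
const accentedEstimate = estimateTokensConservative('héllo');
console.log(englishEstimate, accentedEstimate);
```

For `'abcdefgh'` this returns 3, compared with 2 from `Math.ceil(length / 4)`. The accented string returns 3 because the single non-ASCII character is counted as its own token.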
Best Practices
1. Conservative Token Estimation
Always overestimate token counts to avoid exceeding model limits:
const estimateTokens = (msg: Message) => {
// ~4 chars per token is a typical average for English; use a smaller
// divisor (e.g. 3) for code or non-English text to stay conservative
return Math.ceil(msg.content.length / 4);
};
2. Preserve Critical Content
Always preserve code, URLs, and file paths in output compression:
compressOutput(response, {
level: 'full',
preserve: {
codeFences: true,
inlineCode: true,
urls: true
}
});
3. Use Safety Margins
Leave a buffer between the budget and actual usage:
const safeMaxTokens = model.maxInputTokens - 10000; // 10K safety margin
compressConversation(messages, { maxTokens: safeMaxTokens });
4. Monitor Budget Status
Regularly check budget status to proactively manage costs:
const status = await manager.getBudgetStatus(budgetId);
if (status.remainingCost < status.totalCost * 0.2) {
console.warn('Budget at 20% or less remaining');
}
License
GPL-3.0-or-later - See LICENSE.md for details.
Contributing
See IMPLEMENTATION-PLAN.md for development roadmap.
Related Packages
- @agentsy/core - Core compression utilities
- @agentsy/types - Shared type definitions
- @agentsy/providers - Provider adapters
- @agentsy/runtime - Runtime execution
- @agentsy/orchestrator - Agent orchestration
