@wundr.io/token-budget

v1.0.6

Token budget management for AI agents - cost calculation, usage tracking, and optimization suggestions

Token budget management for LLM context - comprehensive cost calculation, usage tracking, and optimization suggestions for AI agents.

Overview

@wundr.io/token-budget provides a complete solution for managing token consumption in LLM-powered applications. It enables:

  • Cost Calculation: Accurate pricing across multiple models (Claude, GPT-4, etc.)
  • Usage Tracking: Session-based tracking with filtering, aggregation, and export
  • Budget Enforcement: Hard and soft limits with warning thresholds
  • Optimization Suggestions: Intelligent recommendations to reduce costs
  • Context Window Management: Tools to optimize prompt and context sizes
  • Event-Driven Architecture: React to budget events in real-time

Installation

npm install @wundr.io/token-budget
# or
yarn add @wundr.io/token-budget
# or
pnpm add @wundr.io/token-budget

Quick Start

import { TokenBudgetManager, createBudgetManager } from '@wundr.io/token-budget';

// Create a budget manager with limits
const manager = createBudgetManager({
  limits: {
    maxTotalTokens: 100000,
    maxCostUsd: 10,
    warningThreshold: 0.8,
    criticalThreshold: 0.95,
  },
});

// Check budget before making an API call
const check = manager.checkBudget({
  inputTokens: 5000,
  outputTokens: 2000,
  model: 'claude-sonnet-4-20250514',
});

if (check.withinBudget) {
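  // Proceed with the LLM call (callLLM and prompt stand in for your own client code)
  const response = await callLLM(prompt);

  // Record actual usage (in practice, read the token counts from the API response)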
  manager.recordUsage({
    model: 'claude-sonnet-4-20250514',
    inputTokens: 5000,
    outputTokens: 1800,
  });
} else {
  console.log('Budget exceeded:', check.warnings);
  console.log('Suggestions:', check.suggestions);
}

// Get current status
const status = manager.getBudgetStatus();
console.log(`Used: ${status.utilizationPercent}%`);
console.log(`Cost: $${status.costUsedUsd.toFixed(4)}`);

Core Concepts

Budget Limits

Define constraints on token usage and costs:

const limits = {
  maxTotalTokens: 1000000,      // Total tokens (input + output)
  maxInputTokens: 800000,       // Input tokens only
  maxOutputTokens: 200000,      // Output tokens only
  maxCostUsd: 100,              // Maximum cost in USD
  timeWindowMs: 3600000,        // Rolling 1-hour window
  warningThreshold: 0.8,        // Warn at 80% utilization
  criticalThreshold: 0.95,      // Critical at 95% utilization
};
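Several limits can be active at once, so they have to be reduced to a single utilization figure before the thresholds can apply. One plausible rule is to take the highest used-to-allowed ratio among the limits that are set, so the tightest constraint decides the status. This rule is an assumption, not documented package behavior. A minimal sketch using hypothetical LimitSketch, UsageSketch and computeUtilization names:

```typescript
// Hypothetical shapes for illustration; not the package's exported types.
interface LimitSketch {
  maxTotalTokens?: number;
  maxInputTokens?: number;
  maxOutputTokens?: number;
  maxCostUsd?: number;
}

interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Returns the largest used/limit ratio among the limits that are defined,
// so whichever constraint is closest to its cap drives the status.
// Returns 0 when no limit is set.
function computeUtilization(limits: LimitSketch, usage: UsageSketch): number {
  const ratios: number[] = [];
  const total = usage.inputTokens + usage.outputTokens;
  if (limits.maxTotalTokens) ratios.push(total / limits.maxTotalTokens);
  if (limits.maxInputTokens) ratios.push(usage.inputTokens / limits.maxInputTokens);
  if (limits.maxOutputTokens) ratios.push(usage.outputTokens / limits.maxOutputTokens);
  if (limits.maxCostUsd) ratios.push(usage.costUsd / limits.maxCostUsd);
  return ratios.length > 0 ? Math.max(...ratios) : 0;
}

const utilization = computeUtilization(
  { maxTotalTokens: 1000000, maxCostUsd: 100 },
  { inputTokens: 400000, outputTokens: 100000, costUsd: 90 },
);
console.log(utilization); // 0.9 (cost at 90/100 is tighter than tokens at 500K/1M)
```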

Status Levels

The budget manager tracks four status levels:

| Status | Description |
|--------|-------------|
| ok | Below warning threshold |
| warning | Between warning and critical thresholds |
| critical | Between critical threshold and limit |
| exceeded | Budget limit exceeded |
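The table amounts to a simple mapping from a utilization ratio (used / limit) to a level. Below is a sketch with a hypothetical classifyStatus helper. It assumes that reaching a threshold exactly already counts, which matches "Warn at 80% utilization" above. It also assumes that a ratio of exactly 1.0 is still critical and only going past the limit is exceeded. The package may draw that boundary differently.

```typescript
type StatusLevel = 'ok' | 'warning' | 'critical' | 'exceeded';

// Maps utilization (used / limit) to a status level. Thresholds are
// inclusive; exactly 1.0 is treated as 'critical' (an assumption).
function classifyStatus(
  utilization: number,
  warningThreshold = 0.8,
  criticalThreshold = 0.95,
): StatusLevel {
  if (utilization > 1) return 'exceeded';
  if (utilization >= criticalThreshold) return 'critical';
  if (utilization >= warningThreshold) return 'warning';
  return 'ok';
}

console.log(classifyStatus(0.5));  // ok
console.log(classifyStatus(0.85)); // warning
console.log(classifyStatus(0.97)); // critical
console.log(classifyStatus(1.2));  // exceeded
```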

Model Pricing

Built-in pricing for popular models (as of 2025):

| Model | Input (per 1K tokens) | Output (per 1K tokens) | Context Window |
|-------|-----------------------|------------------------|----------------|
| claude-sonnet-4-20250514 | $0.003 | $0.015 | 200K |
| claude-sonnet-4-5-20250929 | $0.003 | $0.015 | 200K |
| claude-opus-4-20250514 | $0.015 | $0.075 | 200K |
| claude-3-haiku-20240307 | $0.00025 | $0.00125 | 200K |
| gpt-4-turbo | $0.01 | $0.03 | 128K |
| gpt-4o | $0.005 | $0.015 | 128K |
| gpt-4o-mini | $0.00015 | $0.0006 | 128K |
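The per-1K prices lead to straightforward arithmetic: (input / 1000) × input price + (output / 1000) × output price. The sketch below assumes input and output are priced independently and linearly, and it ignores cache discounts. PRICES and costUsd are hypothetical names, not package exports. It reproduces the Quick Start call: 5 × $0.003 + 2 × $0.015 = $0.045.

```typescript
interface Pricing {
  inputCostPer1K: number;  // USD per 1,000 input tokens
  outputCostPer1K: number; // USD per 1,000 output tokens
}

// A subset of the table above.
const PRICES: Record<string, Pricing> = {
  'claude-sonnet-4-20250514': { inputCostPer1K: 0.003, outputCostPer1K: 0.015 },
  'gpt-4o-mini': { inputCostPer1K: 0.00015, outputCostPer1K: 0.0006 },
};

// cost = (input / 1000) * inputPrice + (output / 1000) * outputPrice
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing for ${model}`);
  return (inputTokens / 1000) * p.inputCostPer1K + (outputTokens / 1000) * p.outputCostPer1K;
}

// Quick Start numbers: 0.015 + 0.030 = 0.045
console.log(costUsd('claude-sonnet-4-20250514', 5000, 2000).toFixed(4)); // 0.0450
```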

API Reference

TokenBudgetManager

The main class that combines cost calculation, usage tracking, and optimization.

Constructor

// Signature: new TokenBudgetManager(config?: Partial<TokenBudgetConfig>)
const manager = new TokenBudgetManager(); // every option is optional

Methods

checkBudget(options)

Checks if an operation would be within budget.

const result = manager.checkBudget({
  inputTokens: 5000,
  outputTokens: 2000,
  model: 'claude-sonnet-4-20250514',
  includeCurrentUsage: true,
});

// Result:
// {
//   withinBudget: boolean,
//   status: BudgetStatus,
//   estimatedCostUsd: number,
//   warnings: string[],
//   suggestions: OptimizationSuggestion[],
// }

recordUsage(options)

Records token usage and updates budget status.

const status = manager.recordUsage({
  model: 'claude-sonnet-4-20250514',
  inputTokens: 5000,
  outputTokens: 1800,
  taskId: 'task-123',
  operationType: 'chat',
  cacheHit: false,
  metadata: { endpoint: '/api/chat' },
});

getBudgetStatus()

Returns current budget status.

const status = manager.getBudgetStatus();
// {
//   totalTokensUsed: number,
//   inputTokensUsed: number,
//   outputTokensUsed: number,
//   costUsedUsd: number,
//   totalTokensRemaining?: number,
//   costRemainingUsd?: number,
//   utilizationPercent: number,
//   status: 'ok' | 'warning' | 'critical' | 'exceeded',
//   operationCount: number,
//   avgTokensPerOperation: number,
//   avgCostPerOperation: number,
//   lastUpdated: Date,
// }

suggestOptimization()

Generates optimization suggestions based on usage patterns.

const suggestions = manager.suggestOptimization();
// Returns array of OptimizationSuggestion objects

resetBudget()

Clears usage history and resets tracking.

manager.resetBudget();

updateConfig(config)

Updates budget configuration.

manager.updateConfig({
  limits: { maxCostUsd: 200 },
});

onEvent(handler) / offEvent(handler)

Registers/removes event handlers.

manager.onEvent((event) => {
  if (event.type === 'budget:warning') {
    console.log('Budget warning!', event.payload);
  }
});

exportAsJson()

Exports budget data as JSON.

const json = manager.exportAsJson();

Factory Functions

import {
  createBudgetManager,
  createStrictBudgetManager,
  createSessionBudgetManager,
} from '@wundr.io/token-budget';

// Basic manager
const manager = createBudgetManager(config);

// Strict limits
const strictManager = createStrictBudgetManager(maxTokens, maxCostUsd);

// Session-specific
const sessionManager = createSessionBudgetManager(sessionId, agentId, config);

CostCalculator

Standalone cost calculation utilities.

import { CostCalculator, createCostCalculator } from '@wundr.io/token-budget';

const calculator = createCostCalculator();

// Calculate cost
const cost = calculator.calculateCost({
  model: 'claude-sonnet-4-20250514',
  inputTokens: 1000,
  outputTokens: 500,
  cacheHit: false,
});

// Estimate with cache probability
const estimate = calculator.estimateCost({
  model: 'claude-sonnet-4-20250514',
  inputTokens: 1000,
  estimatedOutputTokens: 500,
  cacheHitProbability: 0.3,
});

// Compare models
const comparisons = calculator.compareModels(1000, 500);
// Returns models sorted by cost (cheapest first)

// Calculate batch cost
const batchResult = calculator.calculateBatchCost(usageRecords);

// Add custom pricing
calculator.addPricing({
  modelId: 'custom-model',
  inputCostPer1K: 0.002,
  outputCostPer1K: 0.01,
  contextWindow: 100000,
  isCached: false,
  cacheDiscount: 0,
});

// Calculate cache savings
const savings = calculator.calculateCacheSavings('claude-sonnet-4-20250514', 1000, 500);
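estimateCost takes a cacheHitProbability, and addPricing takes a cacheDiscount. The README does not say how the package combines them. A natural reading is an expected value over the hit and miss cases, with the discount applied to input tokens. The sketch below assumes exactly that, so treat it as an illustration rather than the package's formula. The prices match the my-custom-model entry under Custom Model Pricing ($0.001 / $0.005 per 1K, 50% discount). fullCost, cachedCost and expectedCost are hypothetical names.

```typescript
interface CachedPricing {
  inputCostPer1K: number;
  outputCostPer1K: number;
  cacheDiscount: number; // fraction taken off input cost on a cache hit (assumption)
}

function fullCost(p: CachedPricing, input: number, output: number): number {
  return (input / 1000) * p.inputCostPer1K + (output / 1000) * p.outputCostPer1K;
}

// Assumption: only input tokens are discounted on a hit.
function cachedCost(p: CachedPricing, input: number, output: number): number {
  return (input / 1000) * p.inputCostPer1K * (1 - p.cacheDiscount) +
    (output / 1000) * p.outputCostPer1K;
}

// Expected cost = P(hit) * cached + (1 - P(hit)) * full
function expectedCost(p: CachedPricing, input: number, output: number, hitProbability: number): number {
  return hitProbability * cachedCost(p, input, output) +
    (1 - hitProbability) * fullCost(p, input, output);
}

const pricing: CachedPricing = { inputCostPer1K: 0.001, outputCostPer1K: 0.005, cacheDiscount: 0.5 };
// full = 0.0035, cached = 0.0030, expected at 30% hits = 0.3 * 0.003 + 0.7 * 0.0035
console.log(expectedCost(pricing, 1000, 500, 0.3).toFixed(5)); // 0.00335
```

Under this model, the per-hit saving is simply fullCost minus cachedCost, here $0.0005.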

Utility Functions

import {
  quickCostCalculation,
  estimateTokensFromText,
  formatCost
} from '@wundr.io/token-budget';

// Quick calculation without creating a calculator
const cost = quickCostCalculation('claude-sonnet-4-20250514', 1000, 500);

// Estimate tokens from text (rough: ~4 chars per token)
const tokens = estimateTokensFromText('Hello, world!'); // ~4 tokens

// Format cost for display
const formatted = formatCost(0.00123); // "$0.0012"
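Both helpers are small enough to sketch. The version below is consistent with the comments above, but it is not the package's source. Rounding the 4-characters-per-token heuristic up is an assumption, as is formatting to four decimal places. The names estimateTokensSketch and formatCostSketch are hypothetical.

```typescript
// ~4 characters per token, rounded up (rounding direction is an assumption).
function estimateTokensSketch(text: string): number {
  return Math.ceil(text.length / 4);
}

// Four decimal places, matching the "$0.0012" example above.
function formatCostSketch(costUsd: number): string {
  return `$${costUsd.toFixed(4)}`;
}

console.log(estimateTokensSketch('Hello, world!')); // 4 (13 chars / 4 = 3.25, rounded up)
console.log(formatCostSketch(0.00123));            // $0.0012
```

Real tokenizers can differ noticeably from this heuristic, especially for code and non-English text, which is why the package describes these as rough estimates.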

UsageTracker

Session-based usage tracking.

import { UsageTracker, createUsageTracker } from '@wundr.io/token-budget';

const tracker = createUsageTracker({
  sessionId: 'session-123',
  agentId: 'agent-456',
});

// Record usage
const usage = tracker.recordUsage({
  model: 'claude-sonnet-4-20250514',
  inputTokens: 1000,
  outputTokens: 500,
  taskId: 'task-789',
  operationType: 'chat',
  cacheHit: false,
  metadata: { user: 'user-123' },
});

// Get usage history with filtering
const history = tracker.getUsageHistory({
  startTime: new Date('2024-01-01'),
  endTime: new Date(),
  model: 'claude-sonnet-4-20250514',
  limit: 100,
  offset: 0,
  sortDirection: 'desc',
});

// Get session summary
const summary = tracker.getSessionSummary();
// {
//   sessionId: string,
//   startTime: Date,
//   totalInputTokens: number,
//   totalOutputTokens: number,
//   totalTokens: number,
//   totalCostUsd: number,
//   operationCount: number,
//   byModel: { [model]: { ... } },
//   byOperationType: { [type]: { ... } },
//   cacheHitRate: number,
//   peakTokensPerMinute: number,
// }

// Get current totals (fast)
const totals = tracker.getCurrentTotals();

// Get usage in time window
const windowUsage = tracker.getUsageInWindow(3600000); // Last hour

// Get current rate
const rate = tracker.getCurrentRate();
// { tokensPerMinute, costPerMinute, operationsPerMinute, sessionDurationMinutes }

// Export data
const json = tracker.exportAsJson();
const csv = tracker.exportAsCsv();

// End session
const finalSummary = tracker.endSession();
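getSessionSummary() reports per-model totals and a cache hit rate. How the package builds these internally is not documented. The sketch below shows one straightforward way to fold individual records into those figures. It uses a hypothetical UsageRecord shape modeled on the recordUsage() options; the costUsd field in particular is an assumption. summarizeUsage is also a hypothetical name.

```typescript
// Hypothetical record shape; costUsd is assumed to be computed at record time.
interface UsageRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  cacheHit: boolean;
}

interface ModelTotals {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  operationCount: number;
}

// Folds records into per-model totals and an overall cache hit rate.
function summarizeUsage(records: UsageRecord[]): {
  byModel: Record<string, ModelTotals>;
  cacheHitRate: number;
} {
  const byModel: Record<string, ModelTotals> = {};
  let cacheHits = 0;
  for (const r of records) {
    const totals = (byModel[r.model] ??= {
      inputTokens: 0,
      outputTokens: 0,
      costUsd: 0,
      operationCount: 0,
    });
    totals.inputTokens += r.inputTokens;
    totals.outputTokens += r.outputTokens;
    totals.costUsd += r.costUsd;
    totals.operationCount += 1;
    if (r.cacheHit) cacheHits += 1;
  }
  return { byModel, cacheHitRate: records.length > 0 ? cacheHits / records.length : 0 };
}

// Costs are illustrative.
const { byModel, cacheHitRate } = summarizeUsage([
  { model: 'claude-sonnet-4-20250514', inputTokens: 1000, outputTokens: 500, costUsd: 0.0105, cacheHit: false },
  { model: 'claude-sonnet-4-20250514', inputTokens: 2000, outputTokens: 100, costUsd: 0.0075, cacheHit: true },
  { model: 'gpt-4o-mini', inputTokens: 300, outputTokens: 300, costUsd: 0.000225, cacheHit: false },
]);
console.log(byModel['claude-sonnet-4-20250514']?.inputTokens, cacheHitRate); // 3000 0.3333333333333333
```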

Budget Allocation Strategies

Fixed Budget

Set fixed limits that cover all recorded usage, with no time window:

const manager = createBudgetManager({
  limits: {
    maxTotalTokens: 100000,
    maxCostUsd: 10,
    warningThreshold: 0.8,
    criticalThreshold: 0.95,
  },
});

Time-Windowed Budget

Rolling budget: only usage recorded within the last timeWindowMs milliseconds counts toward the limits, so older usage ages out continuously:

const manager = createBudgetManager({
  limits: {
    maxTotalTokens: 50000,
    maxCostUsd: 5,
    timeWindowMs: 3600000, // 1 hour rolling window
    warningThreshold: 0.8,
    criticalThreshold: 0.95,
  },
});
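The only difference from a fixed budget is which records count. The sketch below shows the rolling-window sum, assuming each record carries an epoch-millisecond timestamp. TimedUsage and tokensInWindow are hypothetical names, not package exports.

```typescript
interface TimedUsage {
  timestamp: number; // epoch milliseconds
  totalTokens: number;
}

// Sums only usage whose timestamp falls inside (now - windowMs, now].
// Records age out one by one; the budget never resets all at once.
function tokensInWindow(records: TimedUsage[], windowMs: number, now: number = Date.now()): number {
  const cutoff = now - windowMs;
  return records
    .filter((r) => r.timestamp > cutoff && r.timestamp <= now)
    .reduce((sum, r) => sum + r.totalTokens, 0);
}

const HOUR = 3600000;
const now = Date.now();
const records: TimedUsage[] = [
  { timestamp: now - 2 * HOUR, totalTokens: 5000 }, // outside a 1-hour window
  { timestamp: now - 30 * 60000, totalTokens: 3000 },
  { timestamp: now - 60000, totalTokens: 1000 },
];
console.log(tokensInWindow(records, HOUR, now)); // 4000
```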

Per-Agent Budget

Track and limit usage per agent:

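import { createSessionBudgetManager, TokenBudgetManager } from '@wundr.io/token-budget';

const agentBudgets = new Map<string, TokenBudgetManager>();

// Lazily create one manager (and session) per agent, then reuse it
function getAgentBudget(agentId: string): TokenBudgetManager {
  let budget = agentBudgets.get(agentId);
  if (!budget) {
    budget = createSessionBudgetManager(`session-${Date.now()}`, agentId, {
      limits: {
        maxTotalTokens: 10000,
        maxCostUsd: 1,
        warningThreshold: 0.8,
        criticalThreshold: 0.95,
      },
    });
    agentBudgets.set(agentId, budget);
  }
  return budget;
}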

Tiered Budget

Different limits based on operation type:

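import type { TokenBudgetManager } from '@wundr.io/token-budget';

const tierLimits: Record<string, { maxTokens: number; maxCost: number }> = {
  chat: { maxTokens: 50000, maxCost: 5 },
  embedding: { maxTokens: 100000, maxCost: 2 },
  function_call: { maxTokens: 20000, maxCost: 3 },
};

function checkTierBudget(
  manager: TokenBudgetManager,
  operationType: string,
  inputTokens: number,
  outputTokens: number
): boolean {
  const summary = manager.getUsageTracker().getSessionSummary();
  const tierUsage = summary.byOperationType[operationType] ?? { totalTokens: 0, costUsd: 0 };
  const limits = tierLimits[operationType] ?? tierLimits.chat;

  // Include the new operation's cost so the cost check looks ahead like the
  // token check (assumes estimatedCostUsd prices this operation alone)
  const { estimatedCostUsd } = manager.checkBudget({ inputTokens, outputTokens });

  return (
    tierUsage.totalTokens + inputTokens + outputTokens <= limits.maxTokens &&
    tierUsage.costUsd + estimatedCostUsd <= limits.maxCost
  );
}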

Token Counting and Tracking

Estimating Tokens

import { estimateTokensFromText } from '@wundr.io/token-budget';

// Rough estimation (~4 chars per token for English)
const promptTokens = estimateTokensFromText(prompt);

// For more accurate counting, use a proper tokenizer
// This package provides rough estimates for budget planning

Tracking by Model

const summary = manager.getUsageTracker().getSessionSummary();

// Usage breakdown by model
for (const [model, stats] of Object.entries(summary.byModel)) {
  console.log(`${model}:`);
  console.log(`  Input tokens: ${stats.inputTokens}`);
  console.log(`  Output tokens: ${stats.outputTokens}`);
  console.log(`  Cost: $${stats.costUsd.toFixed(4)}`);
  console.log(`  Operations: ${stats.operationCount}`);
}

Tracking by Operation Type

const summary = manager.getUsageTracker().getSessionSummary();

// Usage breakdown by operation type
// Types: 'chat', 'completion', 'embedding', 'function_call', 'tool_use', 'other'
for (const [opType, stats] of Object.entries(summary.byOperationType)) {
  console.log(`${opType}: ${stats.totalTokens} tokens, $${stats.costUsd.toFixed(4)}`);
}

Historical Analysis

const tracker = manager.getUsageTracker();

// Get last 24 hours
const dayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);
const recentHistory = tracker.getUsageHistory({
  startTime: dayAgo,
  sortDirection: 'desc',
});

// Analyze patterns
const hourlyBuckets = new Map<number, number>();
for (const record of recentHistory) {
  const hour = record.timestamp.getHours();
  hourlyBuckets.set(hour, (hourlyBuckets.get(hour) || 0) + record.totalTokens);
}
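The same bucketing idea works at a finer granularity, for example to find the busiest minute. The session summary's peakTokensPerMinute field presumably measures something similar, but its exact definition is not documented. This sketch uses fixed calendar-minute buckets. A sliding 60-second window is a reasonable alternative and can report higher peaks when a burst straddles a minute boundary. StampedUsage and peakTokensPerMinute are hypothetical names.

```typescript
interface StampedUsage {
  timestamp: Date;
  totalTokens: number;
}

// Buckets records by wall-clock minute and returns the busiest minute's total.
function peakTokensPerMinute(records: StampedUsage[]): number {
  const perMinute = new Map<number, number>();
  for (const r of records) {
    const minute = Math.floor(r.timestamp.getTime() / 60000);
    perMinute.set(minute, (perMinute.get(minute) ?? 0) + r.totalTokens);
  }
  let peak = 0;
  perMinute.forEach((total) => {
    peak = Math.max(peak, total);
  });
  return peak;
}

const base = Date.UTC(2025, 0, 1, 12, 0, 0); // 12:00:00 UTC
const sample: StampedUsage[] = [
  { timestamp: new Date(base + 1000), totalTokens: 100 },  // 12:00:01
  { timestamp: new Date(base + 30000), totalTokens: 200 }, // 12:00:30
  { timestamp: new Date(base + 61000), totalTokens: 250 }, // 12:01:01
  { timestamp: new Date(base + 125000), totalTokens: 50 }, // 12:02:05
];
console.log(peakTokensPerMinute(sample)); // 300
```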

Context Window Optimization

Monitoring Context Usage

const manager = createBudgetManager({
  defaultModel: 'claude-sonnet-4-20250514',
  limits: {
    maxTotalTokens: 200000, // Match context window
    warningThreshold: 0.7,  // Warn at 70% to leave room
  },
});

// Check before adding to context
const check = manager.checkBudget({ inputTokens: newContextSize });
if (check.warnings.length > 0) {
  console.log('Consider trimming context:', check.suggestions);
}
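With maxTotalTokens matched to a 200K window and warningThreshold at 0.7, the warning trips at 140,000 tokens. That leaves 60,000 tokens of room for the response and any further context. The sketch below shows that arithmetic; contextHeadroom is a hypothetical helper, not part of the package.

```typescript
interface Headroom {
  warnAt: number;       // token count at which the warning threshold trips
  untilWarning: number; // tokens left before the warning
  untilLimit: number;   // tokens left before the limit itself
}

function contextHeadroom(usedTokens: number, contextWindow: number, warningThreshold: number): Headroom {
  // Rounded to absorb floating-point noise (200000 * 0.7 -> 140000)
  const warnAt = Math.round(contextWindow * warningThreshold);
  return {
    warnAt,
    untilWarning: Math.max(0, warnAt - usedTokens),
    untilLimit: Math.max(0, contextWindow - usedTokens),
  };
}

console.log(contextHeadroom(120000, 200000, 0.7));
// { warnAt: 140000, untilWarning: 20000, untilLimit: 80000 }
```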

Context Trimming Strategies

// Get suggestions when context is large
const suggestions = manager.suggestOptimization();

for (const suggestion of suggestions) {
  if (suggestion.type === 'reduce_context') {
    console.log(`Suggestion: ${suggestion.title}`);
    console.log(`Steps: ${suggestion.steps.join(', ')}`);
    console.log(`Estimated savings: ${suggestion.estimatedSavingsPercent}%`);
  }
}

Multi-Model Context Management

const calculator = createCostCalculator();

// Compare context costs across models
const comparisons = calculator.compareModels(contextSize, estimatedOutput);

// Keep only models whose context window can hold the context; since
// compareModels() sorts cheapest first, the first entry is the cheapest fit
const compatibleModels = comparisons.filter(m => {
  const pricing = calculator.getPricing(m.model);
  return pricing?.contextWindow && pricing.contextWindow >= contextSize;
});

console.log('Models that support this context size:');
for (const model of compatibleModels) {
  console.log(`${model.model}: $${model.cost.toFixed(4)}`);
}
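The same selection can be written without the calculator, which makes the trade-off easier to see. On price alone, gpt-4o-mini would win, but it cannot hold a 150K-token context. The standalone sketch below uses a subset of the pricing table. It also counts the expected output against the context window, because many providers count generated tokens against the same window; this is an assumption of the sketch. ModelInfo, MODELS and cheapestFitting are hypothetical names.

```typescript
interface ModelInfo {
  model: string;
  inputCostPer1K: number;
  outputCostPer1K: number;
  contextWindow: number;
}

// Subset of the pricing table above.
const MODELS: ModelInfo[] = [
  { model: 'claude-sonnet-4-20250514', inputCostPer1K: 0.003, outputCostPer1K: 0.015, contextWindow: 200000 },
  { model: 'claude-3-haiku-20240307', inputCostPer1K: 0.00025, outputCostPer1K: 0.00125, contextWindow: 200000 },
  { model: 'gpt-4o-mini', inputCostPer1K: 0.00015, outputCostPer1K: 0.0006, contextWindow: 128000 },
];

// Drops models whose window cannot fit context + output, then sorts by cost.
function cheapestFitting(models: ModelInfo[], contextTokens: number, outputTokens: number) {
  return models
    .filter((m) => m.contextWindow >= contextTokens + outputTokens)
    .map((m) => ({
      model: m.model,
      cost: (contextTokens / 1000) * m.inputCostPer1K + (outputTokens / 1000) * m.outputCostPer1K,
    }))
    .sort((a, b) => a.cost - b.cost);
}

const ranked = cheapestFitting(MODELS, 150000, 1000);
console.log(ranked[0]?.model); // claude-3-haiku-20240307
```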

Priority-Based Allocation

Priority Queue Pattern

interface PrioritizedRequest {
  id: string;
  priority: 'critical' | 'high' | 'medium' | 'low';
  inputTokens: number;
  outputTokens: number;
  model: string;
}

function processWithPriority(
  manager: TokenBudgetManager,
  requests: PrioritizedRequest[]
): PrioritizedRequest[] {
  // Sort by priority
  const sorted = [...requests].sort((a, b) => {
    const order = { critical: 0, high: 1, medium: 2, low: 3 };
    return order[a.priority] - order[b.priority];
  });

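  const approved: PrioritizedRequest[] = [];
  // checkBudget() only checks; it reserves nothing. Track tokens already approved
  // in this batch so each check sees them (confirm with recordUsage later).
  let pendingInput = 0;
  let pendingOutput = 0;

  for (const request of sorted) {
    const check = manager.checkBudget({
      inputTokens: pendingInput + request.inputTokens,
      outputTokens: pendingOutput + request.outputTokens,
      model: request.model, // approximation: prices pending tokens at this model's rates
    });

    if (check.withinBudget) {
      approved.push(request);
    } else if (request.priority === 'critical') {
      // Critical requests might bypass budget
      console.warn(`Critical request ${request.id} exceeds budget`);
      approved.push(request);
    } else {
      console.log(`Skipping ${request.priority} request ${request.id}: ${check.warnings[0]}`);
      continue;
    }

    pendingInput += request.inputTokens;
    pendingOutput += request.outputTokens;
  }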

  return approved;
}

Reserved Budget Pattern

// Reserve budget for critical operations
const totalBudget = 100; // USD
const reservedForCritical = totalBudget * 0.2; // 20% reserved
const availableForGeneral = totalBudget * 0.8;

const criticalManager = createStrictBudgetManager(50000, reservedForCritical);
const generalManager = createStrictBudgetManager(200000, availableForGeneral);

function routeRequest(priority: string): TokenBudgetManager {
  return priority === 'critical' ? criticalManager : generalManager;
}

Integration with JIT Tools and Context Engineering

Just-In-Time Context Loading

import { TokenBudgetManager } from '@wundr.io/token-budget';

interface ContextSource {
  id: string;
  priority: number;
  estimatedTokens: number;
  load: () => Promise<string>;
}

async function loadContextWithBudget(
  manager: TokenBudgetManager,
  sources: ContextSource[],
  maxContextTokens: number
): Promise<string[]> {
  // Sort by priority
  const sorted = [...sources].sort((a, b) => b.priority - a.priority);

  const loadedContext: string[] = [];
  let usedTokens = 0;

  for (const source of sorted) {
    const check = manager.checkBudget({
      inputTokens: usedTokens + source.estimatedTokens,
      outputTokens: 0,
    });

    if (usedTokens + source.estimatedTokens <= maxContextTokens && check.withinBudget) {
      const content = await source.load();
      loadedContext.push(content);
      usedTokens += source.estimatedTokens;
    } else {
      console.log(`Skipping context source ${source.id}: budget constraint`);
    }
  }

  return loadedContext;
}

Dynamic Context Compression

import { estimateTokensFromText, TokenBudgetManager } from '@wundr.io/token-budget';

function compressContextForBudget(
  manager: TokenBudgetManager,
  context: string,
  targetUtilization: number = 0.7
): string {
  const status = manager.getBudgetStatus();
  const limits = manager.getConfig().limits;

  if (!limits.maxTotalTokens) return context;

  const availableTokens = limits.maxTotalTokens * targetUtilization - status.totalTokensUsed;
  const currentTokens = estimateTokensFromText(context);

  if (currentTokens <= availableTokens) {
    return context;
  }

  // Already past the target: nothing fits (a negative ratio would make
  // slice() cut from the end of the string instead)
  if (availableTokens <= 0) {
    return '';
  }

  // Compress by ratio
  const ratio = availableTokens / currentTokens;
  const targetLength = Math.floor(context.length * ratio);

  // Simple truncation - in production, use smarter summarization
  return context.slice(0, targetLength) + '...';
}

Integration with RAG Systems

interface RAGResult {
  content: string;
  relevanceScore: number;
  tokenCount: number;
}

function selectRAGResultsWithBudget(
  manager: TokenBudgetManager,
  results: RAGResult[],
  maxContextTokens: number
): RAGResult[] {
  // Sort by relevance
  const sorted = [...results].sort((a, b) => b.relevanceScore - a.relevanceScore);

  const selected: RAGResult[] = [];
  let totalTokens = 0;

  for (const result of sorted) {
    const check = manager.checkBudget({
      inputTokens: totalTokens + result.tokenCount,
      outputTokens: 2000, // Estimated response
    });

    if (totalTokens + result.tokenCount <= maxContextTokens && check.withinBudget) {
      selected.push(result);
      totalTokens += result.tokenCount;
    }
  }

  return selected;
}

Streaming with Budget Monitoring

import { estimateTokensFromText, TokenBudgetManager } from '@wundr.io/token-budget';

async function streamWithBudgetMonitoring(
  manager: TokenBudgetManager,
  streamGenerator: AsyncGenerator<string>,
  maxOutputTokens: number
): Promise<string> {
  let output = '';
  let estimatedTokens = 0;

  for await (const chunk of streamGenerator) {
    output += chunk;
    estimatedTokens = estimateTokensFromText(output);

    // Check if we're approaching limits
    const status = manager.getBudgetStatus();
    if (status.status === 'critical' || estimatedTokens >= maxOutputTokens) {
      console.warn('Stopping stream: approaching budget limit');
      break;
    }
  }

  return output;
}

Configuration

Full Configuration Options

interface TokenBudgetConfig {
  // Default model for cost calculations
  defaultModel: string; // Default: 'claude-sonnet-4-20250514'

  // Budget limits
  limits: {
    maxTotalTokens?: number;     // Max total tokens
    maxInputTokens?: number;     // Max input tokens
    maxOutputTokens?: number;    // Max output tokens
    maxCostUsd?: number;         // Max cost in USD
    timeWindowMs?: number;       // Rolling time window (ms)
    warningThreshold: number;    // Warning threshold (0-1), default 0.8
    criticalThreshold: number;   // Critical threshold (0-1), default 0.95
  };

  // Custom model pricing
  pricingOverrides: ModelPricing[];

  // Features
  enableOptimizations: boolean;  // Enable suggestions, default true
  enableTracking: boolean;       // Enable usage tracking, default true

  // Identifiers
  sessionId?: string;
  agentId?: string;

  // Custom data
  metadata: Record<string, unknown>;
}
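The comments list defaults of 0.8 and 0.95 for the two thresholds. The sketch below fills those defaults in and rejects inconsistent values. The validation rules (warning above 0, critical at most 1, warning below critical) are assumptions, not documented behavior. LimitsInput, ResolvedLimits and resolveLimits are hypothetical names.

```typescript
interface LimitsInput {
  maxTotalTokens?: number;
  maxCostUsd?: number;
  warningThreshold?: number;
  criticalThreshold?: number;
}

interface ResolvedLimits extends LimitsInput {
  warningThreshold: number;
  criticalThreshold: number;
}

// Applies the documented defaults, then sanity-checks the thresholds.
function resolveLimits(input: LimitsInput = {}): ResolvedLimits {
  const resolved: ResolvedLimits = {
    ...input,
    warningThreshold: input.warningThreshold ?? 0.8,
    criticalThreshold: input.criticalThreshold ?? 0.95,
  };
  const { warningThreshold: w, criticalThreshold: c } = resolved;
  if (w <= 0 || c > 1 || w >= c) {
    throw new Error(`Invalid thresholds: warning=${w}, critical=${c}`);
  }
  return resolved;
}

console.log(resolveLimits({ maxCostUsd: 10 }));
// { maxCostUsd: 10, warningThreshold: 0.8, criticalThreshold: 0.95 }
```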

Custom Model Pricing

const manager = createBudgetManager({
  pricingOverrides: [
    {
      modelId: 'my-custom-model',
      inputCostPer1K: 0.001,
      outputCostPer1K: 0.005,
      contextWindow: 50000,
      isCached: true,
      cacheDiscount: 0.5, // 50% discount on cached requests
    },
  ],
});

Events

Subscribe to budget events for real-time monitoring:

manager.onEvent((event) => {
  switch (event.type) {
    case 'usage:recorded':
      console.log('Usage recorded:', event.payload.usage);
      break;
    case 'budget:warning':
      console.log('WARNING: Approaching budget limit');
      break;
    case 'budget:critical':
      console.log('CRITICAL: Near budget limit');
      break;
    case 'budget:exceeded':
      console.log('EXCEEDED: Budget limit reached');
      break;
    case 'session:started':
      console.log('Session started');
      break;
    case 'session:ended':
      console.log('Session ended:', event.payload.details);
      break;
    case 'optimization:suggested':
      console.log('Suggestions:', event.payload.suggestions);
      break;
  }
});
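Threshold events are usually most useful when they fire once per transition rather than on every recorded call. The README does not say whether the package deduplicates events this way. The sketch below shows the idea with a hypothetical ThresholdNotifier built on plain callbacks. It emits only when the level rises, and it re-arms when utilization falls back, for example as a rolling window ages out.

```typescript
type Level = 'ok' | 'warning' | 'critical' | 'exceeded';
type BudgetEventType = 'budget:warning' | 'budget:critical' | 'budget:exceeded';

interface BudgetEvent {
  type: BudgetEventType;
  utilization: number;
}

// Severity order, used to detect upward transitions.
const RANK: Record<Level, number> = { ok: 0, warning: 1, critical: 2, exceeded: 3 };
const EVENT_TYPE: Record<Exclude<Level, 'ok'>, BudgetEventType> = {
  warning: 'budget:warning',
  critical: 'budget:critical',
  exceeded: 'budget:exceeded',
};

class ThresholdNotifier {
  private level: Level = 'ok';
  private readonly handlers: Array<(event: BudgetEvent) => void> = [];
  private readonly warning: number;
  private readonly critical: number;

  constructor(warning = 0.8, critical = 0.95) {
    this.warning = warning;
    this.critical = critical;
  }

  onEvent(handler: (event: BudgetEvent) => void): void {
    this.handlers.push(handler);
  }

  // Call after each recorded usage; emits only when the level rises.
  update(utilization: number): void {
    const next: Level =
      utilization > 1 ? 'exceeded'
        : utilization >= this.critical ? 'critical'
        : utilization >= this.warning ? 'warning'
        : 'ok';
    if (next !== 'ok' && RANK[next] > RANK[this.level]) {
      const evt: BudgetEvent = { type: EVENT_TYPE[next], utilization };
      this.handlers.forEach((handler) => handler(evt));
    }
    // Dropping back re-arms later alerts.
    this.level = next;
  }
}

const notifier = new ThresholdNotifier();
const seen: string[] = [];
notifier.onEvent((e) => { seen.push(e.type); });
[0.5, 0.82, 0.85, 0.96, 0.97, 1.1].forEach((u) => notifier.update(u));
console.log(seen); // [ 'budget:warning', 'budget:critical', 'budget:exceeded' ]
```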

Types

Key Types

// Operation types
type OperationType = 'chat' | 'completion' | 'embedding' | 'function_call' | 'tool_use' | 'other';

// Budget status levels
type BudgetStatusLevel = 'ok' | 'warning' | 'critical' | 'exceeded';

// Optimization types
type OptimizationType =
  | 'reduce_context'
  | 'use_smaller_model'
  | 'enable_caching'
  | 'batch_requests'
  | 'truncate_output'
  | 'compress_input'
  | 'use_streaming'
  | 'reduce_frequency'
  | 'other';

// Priority levels
type SuggestionPriority = 'low' | 'medium' | 'high' | 'critical';

// Difficulty levels
type SuggestionDifficulty = 'easy' | 'medium' | 'hard';

Zod Schemas

All types have corresponding Zod schemas for runtime validation:

import {
  ModelPricingSchema,
  BudgetLimitSchema,
  TokenBudgetConfigSchema,
  TokenUsageSchema,
  BudgetStatusSchema,
  OptimizationSuggestionSchema,
  SessionUsageSummarySchema,
} from '@wundr.io/token-budget';

// Validate configuration
const config = TokenBudgetConfigSchema.parse(userInput);

// Validate usage record
const usage = TokenUsageSchema.parse(externalData);

License

MIT