# llm-token-budget

v1.0.0

Token budget manager for LLM applications — track usage, enforce limits, estimate costs per user/session. Zero dependencies. Works with OpenAI, Anthropic, Groq, Ollama, and more.
## Why?
When building LLM-powered applications, you quickly face these problems:
- Cost overruns — one user sends 10,000 messages and your bill explodes
- No visibility — which user/session is using the most tokens?
- No enforcement — you want per-user free tiers and paid quotas
- Multi-model complexity — different pricing for GPT-4o vs Claude vs Groq
llm-token-budget solves all four with a simple, zero-dependency API.
## Install

```bash
npm install llm-token-budget
```

## Quick Start
```js
const { BudgetManager } = require('llm-token-budget');

// Create a manager with per-user limits
const budget = new BudgetManager({
  perUserTokenLimit: 100_000, // 100K tokens per user
  perUserCostLimit: 1.00,     // $1.00 max spend per user
  defaultModel: 'gpt-4o-mini',
});

// Before calling your LLM — check if the user is within budget
const check = budget.check({
  estimatedInputTokens: 500,
  estimatedOutputTokens: 500,
  userKey: 'user-123',
  model: 'gpt-4o-mini',
});

if (!check.allowed) {
  console.log(`Blocked: ${check.reason}`);
  // → "user_token_limit_exceeded" or "user_cost_limit_exceeded"
} else {
  // Call your LLM here...
  const response = await openai.chat.completions.create({ ... });

  // After the call — record actual usage
  budget.record({
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    userKey: 'user-123',
    model: 'gpt-4o-mini',
  });
}
```

## API Reference
### new BudgetManager(options?)
| Option | Type | Description |
|--------|------|-------------|
| globalTokenLimit | number | Max total tokens across ALL users |
| globalCostLimit | number | Max total cost (USD) across ALL users |
| perUserTokenLimit | number | Max tokens per user key |
| perUserCostLimit | number | Max cost (USD) per user key |
| perSessionTokenLimit | number | Max tokens per session |
| defaultModel | string | Default model for cost calculation |
| customPricing | object | Custom pricing overrides { model: { input, output } } |
| resetIntervalMs | number | Auto-reset all counters every N ms (e.g. 86400000 for daily) |
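As a sketch, the options above can be combined into a single configuration object. The values below are illustrative only, not recommendations, and the per-1M-token pricing unit for `customPricing` is inferred from the `calculateCost` example later in this README:

```js
// Illustrative BudgetManager options combining several limits (example values only)
const options = {
  globalTokenLimit: 10_000_000,           // hard token cap across ALL users
  globalCostLimit: 50.00,                 // $50 total spend cap
  perUserTokenLimit: 100_000,             // per-user token cap
  perUserCostLimit: 1.00,                 // $1.00 per user
  perSessionTokenLimit: 20_000,           // per-session token cap
  defaultModel: 'gpt-4o-mini',            // used when check/record omit `model`
  customPricing: {
    // assumed unit: USD per 1M tokens, matching the built-in pricing examples
    'my-fine-tune': { input: 0.5, output: 1.5 },
  },
  resetIntervalMs: 24 * 60 * 60 * 1000,   // reset all counters daily
};

console.log(options.resetIntervalMs); // 86400000
```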
### .check(params) → { allowed, reason?, wouldCost }
Pre-flight check before making an LLM call.
```js
const result = budget.check({
  estimatedInputTokens: 500,
  estimatedOutputTokens: 500,
  userKey: 'user-abc',        // optional
  sessionId: 'sess-xyz',      // optional
  model: 'claude-3-5-haiku',  // optional, uses defaultModel if omitted
});

// result.allowed   → true | false
// result.reason    → 'user_token_limit_exceeded' | 'user_cost_limit_exceeded' | ...
// result.wouldCost → estimated cost in USD
```

Limit reasons:

- `global_token_limit_exceeded`
- `global_cost_limit_exceeded`
- `user_token_limit_exceeded`
- `user_cost_limit_exceeded`
- `session_token_limit_exceeded`
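When surfacing these reasons to end users, a lookup table keeps the handling in one place. A minimal sketch (the messages here are our own, not part of the library):

```js
// Hypothetical mapping from limit reasons to user-facing messages
const REASON_MESSAGES = {
  global_token_limit_exceeded: 'Service-wide token budget reached. Try again later.',
  global_cost_limit_exceeded: 'Service-wide spend cap reached. Try again later.',
  user_token_limit_exceeded: 'You have used your token quota.',
  user_cost_limit_exceeded: 'You have used your spend quota.',
  session_token_limit_exceeded: 'This session hit its token quota. Start a new session.',
};

function messageFor(reason) {
  return REASON_MESSAGES[reason] ?? 'Budget exceeded.';
}

console.log(messageFor('user_token_limit_exceeded')); // "You have used your token quota."
```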
### .record(params) → { cost, global }
Record actual token usage after a successful LLM call.
```js
budget.record({
  inputTokens: response.usage.prompt_tokens,
  outputTokens: response.usage.completion_tokens,
  userKey: 'user-abc',
  sessionId: 'sess-xyz',
  model: 'gpt-4o',
});
```

### .getUserStats(userKey) → UsageStats

```js
const stats = budget.getUserStats('user-abc');
// { inputTokens, outputTokens, totalCost, requests, blocked }
```

### .getGlobalStats() → GlobalStats

```js
const stats = budget.getGlobalStats();
// { inputTokens, outputTokens, totalTokens, totalCost, requests }
```

### .getAllUserStats() → Array<{ userKey, stats }>

Returns all users sorted by cost (highest first) — useful for admin dashboards.
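For example, assuming the `{ userKey, stats }` shape described above, a simple top-spenders view can be formatted like this (the sample data below is invented for illustration):

```js
// Sample of the shape .getAllUserStats() is documented to return (data invented here)
const allStats = [
  { userKey: 'user-abc', stats: { inputTokens: 9_000, outputTokens: 3_000, totalCost: 0.12, requests: 40, blocked: 2 } },
  { userKey: 'user-def', stats: { inputTokens: 1_000, outputTokens: 500, totalCost: 0.01, requests: 5, blocked: 0 } },
];

// Already sorted by cost (highest first); format one dashboard row per user
const rows = allStats.map(({ userKey, stats }) =>
  `${userKey}: $${stats.totalCost.toFixed(2)} (${stats.requests} requests, ${stats.blocked} blocked)`);

console.log(rows.join('\n'));
// user-abc: $0.12 (40 requests, 2 blocked)
// user-def: $0.01 (5 requests, 0 blocked)
```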
### .middleware(options?) — Express/Fastify Middleware
Drop-in middleware for Express, Fastify, or any Connect-compatible framework.
```js
const express = require('express');
const { BudgetManager } = require('llm-token-budget');

const app = express();
const budget = new BudgetManager({ perUserTokenLimit: 50_000 });

app.use('/api/chat', budget.middleware({
  getUserKey: (req) => req.user?.id || req.ip,
  getSessionId: (req) => req.headers['x-session-id'],
  getModel: (req) => req.body?.model || 'gpt-4o-mini',
  onBlocked: (req, res, reason) => {
    res.status(429).json({
      error: 'quota_exceeded',
      reason,
      message: 'You have exceeded your token quota. Upgrade to Pro for more.',
    });
  },
}));
```

When blocked, the default response is:

```json
{
  "error": "budget_exceeded",
  "reason": "user_token_limit_exceeded",
  "message": "Token budget exceeded. Please try again later or upgrade your plan."
}
```

## Utility Functions
### estimateTokens(text) → number
Quick token estimate with no external dependencies (accuracy roughly ±15%). For production-grade counts, pair it with tiktoken or a model-specific tokenizer.
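The package does not document its heuristic, but a common dependency-free approximation for English text is about four characters per token. A sketch of that idea (our own, not the library's actual implementation):

```js
// Rough character-based token estimate (~4 chars per token for English text).
// This is a sketch of the general heuristic, not llm-token-budget's internals.
function roughEstimateTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(roughEstimateTokens('Hello, how are you today?')); // 25 chars → 7
```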
```js
const { estimateTokens } = require('llm-token-budget');
estimateTokens('Hello, how are you today?'); // → ~7
```

### calculateCost(inputTokens, outputTokens, model) → CostResult

```js
const { calculateCost } = require('llm-token-budget');
calculateCost(10_000, 2_000, 'gpt-4o');
// → { inputCost: 0.025, outputCost: 0.02, totalCost: 0.045, currency: 'USD' }
```

## Supported Models (Built-in Pricing)

| Provider | Models |
|----------|--------|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini, o3-mini |
| Anthropic | claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-sonnet-4, claude-haiku-4 |
| Groq | llama-3.3-70b, llama-3.1-8b, mixtral-8x7b, gemma2-9b |
| Ollama | ollama (local, $0 cost) |

Custom models: pass the customPricing option or calculateCost(i, o, 'my-model', { 'my-model': { input: 0.5, output: 1.5 } }).
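The gpt-4o numbers above imply that pricing is expressed in USD per 1M tokens (10,000 input tokens at $2.50/1M is $0.025; 2,000 output tokens at $10.00/1M is $0.02). The underlying arithmetic can be sketched as:

```js
// Sketch of the cost arithmetic, assuming prices are USD per 1M tokens
// ($2.50 input / $10.00 output per 1M matches the gpt-4o example above).
function cost(inputTokens, outputTokens, pricing) {
  const inputCost = (inputTokens / 1_000_000) * pricing.input;
  const outputCost = (outputTokens / 1_000_000) * pricing.output;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}

console.log(cost(10_000, 2_000, { input: 2.50, output: 10.00 }));
// → inputCost ≈ 0.025, outputCost ≈ 0.02, totalCost ≈ 0.045
```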
## Patterns

### Daily Reset (Free Tier)
```js
const budget = new BudgetManager({
  perUserTokenLimit: 10_000,             // 10K tokens/day free
  resetIntervalMs: 24 * 60 * 60 * 1000,  // reset daily
});
```

### Per-User Pricing Tiers

```js
function getLimitForUser(user) {
  if (user.plan === 'pro') return { perUserTokenLimit: 1_000_000 };
  if (user.plan === 'free') return { perUserTokenLimit: 10_000 };
  return { perUserTokenLimit: 0 }; // blocked
}

// Create a separate BudgetManager per tier, or check dynamically
```

### Multi-tenant SaaS

```js
const budget = new BudgetManager({
  globalCostLimit: 100.00,  // $100/month hard cap
  perUserCostLimit: 5.00,   // $5/user/month
  defaultModel: 'gpt-4o-mini',
});

// Admin dashboard
app.get('/admin/usage', (req, res) => {
  res.json({
    global: budget.getGlobalStats(),
    users: budget.getAllUserStats(),
  });
});
```

## TypeScript
Full TypeScript support included (no @types/ package needed).
```ts
import { BudgetManager, BudgetManagerOptions, CheckResult } from 'llm-token-budget';

const options: BudgetManagerOptions = {
  perUserTokenLimit: 100_000,
  defaultModel: 'gpt-4o',
};

const budget = new BudgetManager(options);
const result: CheckResult = budget.check({ estimatedInputTokens: 500, userKey: 'user-1' });
```

## Comparison

| Feature | llm-token-budget | Manual counters | Commercial solutions |
|---------|------------------|-----------------|----------------------|
| Zero dependencies | ✅ | ✅ | ❌ |
| Per-user limits | ✅ | Manual | ✅ |
| Cost tracking | ✅ | Manual | ✅ |
| Express middleware | ✅ | Manual | ✅ |
| TypeScript support | ✅ | Varies | ✅ |
| Open source | ✅ | ✅ | ❌ |
| Free | ✅ | ✅ | ❌ |
## Contributing
Issues and PRs welcome at github.com/mariusfit/llm-token-budget.
## License
MIT © 2026 Marius Fit
