tokenfirewall
v2.0.1
Scalable, adapter-driven LLM cost enforcement middleware for Node.js with model discovery and context intelligence
TokenFirewall
Enterprise-grade LLM cost enforcement middleware for Node.js with automatic budget protection, intelligent model routing, and comprehensive multi-provider support.
Overview
TokenFirewall is production-ready middleware that automatically tracks and enforces budget limits for Large Language Model (LLM) API calls. It provides transparent cost monitoring, prevents budget overruns, routes intelligently across models with automatic failover, and supports multiple providers through a unified interface.
Key Features
- Never Exceed Your Budget - Automatically blocks API calls when spending limits are reached, preventing surprise bills
- Zero Code Changes Required - Drop-in middleware that works with any LLM API without modifying your existing code
- Automatic Failover - Intelligent router switches to backup models when primary fails, keeping your app running
- Real-time Cost Tracking - See exactly how much each API call costs based on actual token usage
- Multi-Provider Support - Works with OpenAI, Anthropic, Gemini, Grok, Kimi, and any custom LLM provider
- Custom Model Support - Register your own models with custom pricing and context limits at runtime
- Production Ready - Battle-tested with comprehensive error handling and edge case coverage
- TypeScript Native - Full type safety with included definitions
What's New in v2.0.0
- Intelligent Router - Automatic failover to backup models when API calls fail
- 40+ Latest Models - GPT-5, Claude 4.5, Gemini 3, with accurate 2026 pricing
- Dynamic Registration - Add custom models and pricing at runtime
- Production Hardened - Comprehensive validation, error handling, and edge case coverage
Table of Contents
- Installation
- Quick Start
- Core Concepts
- API Reference
- Intelligent Model Router
- Dynamic Model Registration
- Supported Providers
- Examples
- TypeScript Support
- Error Handling
- Best Practices
- License
Installation
npm install tokenfirewall
Requirements:
- Node.js >= 16.0.0
- TypeScript >= 5.0.0 (for TypeScript projects)
Quick Start
const { createBudgetGuard, patchGlobalFetch } = require("tokenfirewall");
// Step 1: Set up budget protection
createBudgetGuard({
monthlyLimit: 100, // $100 USD
mode: "block" // Throw error when exceeded
});
// Step 2: Patch global fetch
patchGlobalFetch();
// Step 3: Use any LLM API normally
const response = await fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Hello!" }]
})
});
// Costs are automatically tracked and logged
Core Concepts
Budget Guard
The Budget Guard tracks spending and enforces limits in two modes:
- Block Mode (mode: "block"): Throws an error when the budget is exceeded, preventing the API call
- Warn Mode (mode: "warn"): Logs a warning but allows the API call to proceed
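The difference between the two modes can be pictured with a small sketch. This is illustrative only; the `enforceBudget` helper and the exact check are assumptions, not TokenFirewall's internals (though the error text mirrors the message documented under Error Handling):

```javascript
// Illustrative sketch of the two enforcement modes -- not the library's
// actual internals; the function name and check order are assumptions.
function enforceBudget(spent, projectedCost, limit, mode) {
  if (spent + projectedCost <= limit) return; // within budget, proceed
  if (mode === "block") {
    // Block mode: refuse the call outright.
    throw new Error(
      `TokenFirewall: Budget exceeded! Would spend $${(spent + projectedCost).toFixed(2)} of $${limit} limit`
    );
  }
  // Warn mode: log, but let the call through.
  console.warn("TokenFirewall: budget exceeded (warn mode), allowing call");
}
```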
Automatic Interception
TokenFirewall intercepts HTTP requests at the fetch level, automatically:
- Detecting LLM API responses
- Extracting token usage information
- Calculating costs based on provider pricing
- Tracking against your budget
- Logging usage details
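The wrap-and-inspect technique behind this can be sketched in a few lines. This is a simplified illustration, not TokenFirewall's actual implementation: it assumes an OpenAI-style response body with `model` and `usage` fields, and the pricing table values are placeholders:

```javascript
// Simplified sketch of fetch-level interception -- not the real middleware,
// which handles many more providers and edge cases.
const PRICE_PER_1M = { "gpt-4o-mini": { input: 0.15, output: 0.6 } }; // placeholder rates
let totalSpent = 0;

function patchFetch(realFetch) {
  return async function patchedFetch(url, init) {
    const response = await realFetch(url, init);
    // Clone so the caller can still consume the original body.
    const body = await response.clone().json().catch(() => null);
    const pricing = body && body.usage && PRICE_PER_1M[body.model];
    if (pricing) {
      totalSpent +=
        (body.usage.prompt_tokens / 1e6) * pricing.input +
        (body.usage.completion_tokens / 1e6) * pricing.output;
    }
    return response;
  };
}
```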
Provider Adapters
Each LLM provider has a dedicated adapter that:
- Detects provider-specific response formats
- Normalizes token usage data
- Applies correct pricing models
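A hypothetical adapter might look like the following. The object shape (`matches`, `extractUsage`) is an assumption for illustration, not TokenFirewall's real internal interface, though the provider response fields shown (`usage.prompt_tokens` for OpenAI, `usage.input_tokens` for Anthropic) are the providers' actual formats:

```javascript
// Hypothetical adapter shape -- illustrative only.
const openAIAdapter = {
  // Detects an OpenAI-style chat completion body.
  matches: (body) => body && body.object === "chat.completion",
  // Normalizes provider-specific usage fields to a common shape.
  extractUsage: (body) => ({
    inputTokens: body.usage.prompt_tokens,
    outputTokens: body.usage.completion_tokens,
  }),
};

const anthropicAdapter = {
  matches: (body) => body && body.type === "message",
  extractUsage: (body) => ({
    inputTokens: body.usage.input_tokens,
    outputTokens: body.usage.output_tokens,
  }),
};

// A dispatcher picks the first adapter that recognizes the response.
function normalizeUsage(body, adapters = [openAIAdapter, anthropicAdapter]) {
  const adapter = adapters.find((a) => a.matches(body));
  return adapter ? adapter.extractUsage(body) : null;
}
```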
API Reference
Budget Management
createBudgetGuard(options)
Creates and configures a budget guard instance.
Parameters:
interface BudgetGuardOptions {
monthlyLimit: number; // Maximum spending limit in USD
mode?: "block" | "warn"; // Enforcement mode (default: "block")
}
Example:
const { createBudgetGuard } = require("tokenfirewall");
// Block mode - strict enforcement
createBudgetGuard({
monthlyLimit: 100,
mode: "block"
});
// Warn mode - soft limits
createBudgetGuard({
monthlyLimit: 500,
mode: "warn"
});
getBudgetStatus()
Retrieves the current budget status and usage statistics.
Returns:
interface BudgetStatus {
totalSpent: number; // Total amount spent in USD
limit: number; // Monthly limit in USD
remaining: number; // Remaining budget in USD
percentageUsed: number; // Percentage of budget used (0-100)
}
Example:
const { getBudgetStatus } = require("tokenfirewall");
const status = getBudgetStatus();
if (status) {
console.log(`Spent: $${status.totalSpent.toFixed(2)}`);
console.log(`Remaining: $${status.remaining.toFixed(2)}`);
console.log(`Usage: ${status.percentageUsed.toFixed(1)}%`);
}
resetBudget()
Resets the budget tracking to zero.
const { resetBudget } = require("tokenfirewall");
// Reset at the start of each month
resetBudget();
exportBudgetState() / importBudgetState(state)
Save and restore budget state for persistence.
const { exportBudgetState, importBudgetState } = require("tokenfirewall");
const fs = require("fs");
// Export state
const state = exportBudgetState();
fs.writeFileSync("budget.json", JSON.stringify(state));
// Import state
const savedState = JSON.parse(fs.readFileSync("budget.json"));
importBudgetState(savedState);
Interception
patchGlobalFetch()
Patches the global fetch function to intercept and track LLM API calls.
const { patchGlobalFetch } = require("tokenfirewall");
patchGlobalFetch();
// All subsequent fetch calls are intercepted
Model Discovery
listModels(options)
Lists available models from a provider with context limits and budget information.
Parameters:
interface ListModelsOptions {
provider: string; // Provider name
apiKey: string; // Provider API key
baseURL?: string; // Custom API endpoint
includeBudgetUsage?: boolean; // Include budget usage %
}
Example:
const { listModels } = require("tokenfirewall");
const models = await listModels({
provider: "openai",
apiKey: process.env.OPENAI_API_KEY,
includeBudgetUsage: true
});
models.forEach(model => {
console.log(`${model.model}: ${model.contextLimit} tokens`);
});
Intelligent Model Router
The Model Router provides automatic retry and model switching on failures.
createModelRouter(options)
Creates and configures an intelligent model router.
Parameters:
interface ModelRouterOptions {
strategy: "fallback" | "context" | "cost"; // Routing strategy
fallbackMap?: Record<string, string[]>; // Fallback model map
maxRetries?: number; // Max retry attempts (default: 1)
}
Example:
const { createModelRouter, patchGlobalFetch } = require("tokenfirewall");
// Fallback strategy - use predefined fallback models
createModelRouter({
strategy: "fallback",
fallbackMap: {
"gpt-4o": ["gpt-4o-mini", "gpt-3.5-turbo"],
"claude-3-5-sonnet-20241022": ["claude-3-5-haiku-20241022"]
},
maxRetries: 2
});
patchGlobalFetch();
// API calls will automatically retry with fallback models on failure
Routing Strategies
1. Fallback Strategy - Uses predefined fallback map
- Tries models in order from fallbackMap
- Best for: Known model preferences, production resilience
2. Context Strategy - Upgrades to larger context window
- Only triggers on context overflow errors
- Selects model with larger context from same provider
- Best for: Handling variable input sizes
3. Cost Strategy - Switches to cheaper model
- Selects cheaper model from same provider
- Best for: Cost optimization, rate limit handling
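The control flow of the fallback strategy can be sketched as follows. This is an assumed illustration of the general pattern, not the library's actual implementation; `callModel` stands in for whatever performs the API request:

```javascript
// Minimal sketch of fallback-strategy routing (assumed control flow):
// try the requested model, then each fallback in order, capped by maxRetries.
async function routeWithFallback(model, fallbackMap, callModel, maxRetries = 1) {
  // The original model plus up to maxRetries fallbacks.
  const candidates = [model, ...(fallbackMap[model] || [])].slice(0, maxRetries + 1);
  let lastError;
  for (const candidate of candidates) {
    try {
      return await callModel(candidate); // first success wins
    } catch (err) {
      lastError = err; // remember the failure and try the next model
    }
  }
  throw lastError; // every candidate failed
}
```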
Error Detection
The router automatically detects and classifies failures:
- rate_limit - HTTP 429 or rate limit errors
- context_overflow - Context length exceeded errors
- model_unavailable - HTTP 404 or model not found
- access_denied - HTTP 403 or unauthorized
- unknown - Other errors
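A classifier along these lines could implement that mapping. The exact heuristics (status codes plus message patterns) are assumptions sketched from the categories above, not the router's real logic:

```javascript
// Sketch of status/message-based failure classification (assumed logic;
// the real router's heuristics may differ).
function classifyFailure(status, message = "") {
  if (status === 429 || /rate limit/i.test(message)) return "rate_limit";
  if (/context length|maximum context/i.test(message)) return "context_overflow";
  if (status === 404 || /model not found/i.test(message)) return "model_unavailable";
  if (status === 403 || /unauthorized/i.test(message)) return "access_denied";
  return "unknown";
}
```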
disableModelRouter()
Disables the model router.
const { disableModelRouter } = require("tokenfirewall");
disableModelRouter();
Dynamic Model Registration
Register models with pricing and context limits at runtime.
registerModels(provider, models)
Bulk register models for a provider.
Parameters:
interface ModelConfig {
name: string; // Model identifier
contextLimit?: number; // Context window size in tokens
pricing?: { // Pricing per 1M tokens (USD)
input: number;
output: number;
};
}
Example:
const { registerModels, createModelRouter } = require("tokenfirewall");
// Register custom models
registerModels("my-provider", [
{
name: "my-large-model",
contextLimit: 200000,
pricing: { input: 5.0, output: 15.0 }
},
{
name: "my-small-model",
contextLimit: 50000,
pricing: { input: 1.0, output: 3.0 }
}
]);
// Router will use dynamically registered models
createModelRouter({
strategy: "cost",
maxRetries: 2
});
registerPricing(provider, model, pricing)
Register custom pricing for a specific model.
const { registerPricing } = require("tokenfirewall");
registerPricing("openai", "gpt-5", {
input: 5.0, // $5 per 1M input tokens
output: 15.0 // $15 per 1M output tokens
});
registerContextLimit(provider, model, contextLimit)
Register custom context window limit.
const { registerContextLimit } = require("tokenfirewall");
registerContextLimit("openai", "gpt-5", 256000);
Supported Providers
TokenFirewall includes built-in support for:
| Provider | Models | Pricing | Discovery |
|----------|--------|---------|-----------|
| OpenAI | GPT-5, GPT-5-mini, GPT-4.1, GPT-4o, o1, gpt-image-1 | Included | API |
| Anthropic | Claude 4.5 (Opus, Sonnet, Haiku), Claude 4, Claude 3.5 | Included | Static |
| Google Gemini | Gemini 3, Gemini 3.1, Gemini 2.5, Nano Banana | Included | API |
| Grok (X.AI) | Grok 3, Grok 2, Grok Vision | Included | API |
| Kimi (Moonshot) | Moonshot v1 (8k, 32k, 128k) | Included | API |
| Meta | Llama 3.3, Llama 3.1 | Included | Static |
| Mistral | Mistral Large, Mixtral | Included | Static |
| Cohere | Command R+, Command R | Included | Static |
| Custom | Any LLM API | Register | Custom |
Pricing (Per 1M Tokens)
OpenAI:
- GPT-5: $1.25 / $10.00
- GPT-5-mini: $0.25 / $2.00
- GPT-4.1: $2.00 / $8.00
- GPT-4o: $2.50 / $10.00
- o1: $15.00 / $60.00
Anthropic:
- Claude Opus 4.5: $5.00 / $25.00
- Claude Sonnet 4.5: $3.00 / $15.00
- Claude Haiku 4.5: $1.00 / $5.00
Gemini:
- Gemini 3 Pro: $2.00 / $12.00
- Gemini 3 Flash: $0.50 / $3.00
- Gemini 2.5 Pro: $1.25 / $10.00
- Gemini 2.5 Flash: $0.30 / $2.50
- Gemini 2.5 Flash Lite: $0.10 / $0.40
Pricing verified as of February 27, 2026. Standard tier, ≤200K input tokens.
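As a quick sanity check on how per-1M-token rates translate into dollars, here is the arithmetic using the GPT-5 rates listed above. The `costUSD` helper is illustrative, not a TokenFirewall export:

```javascript
// Worked example of per-1M-token billing, using the GPT-5 rates above
// ($1.25 input / $10.00 output per 1M tokens).
function costUSD(inputTokens, outputTokens, pricing) {
  return (inputTokens / 1e6) * pricing.input + (outputTokens / 1e6) * pricing.output;
}

const gpt5 = { input: 1.25, output: 10.0 };
// A call with 2,000 input and 500 output tokens:
// 0.002 * $1.25 + 0.0005 * $10.00 = $0.0025 + $0.0050 = $0.0075
const cost = costUSD(2000, 500, gpt5);
```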
Context Limits
- GPT-5: 256K tokens
- GPT-4.1: 200K tokens
- Claude 4.5: 200K tokens
- Gemini 3 Pro: 2M tokens
- o1: 200K tokens
Examples
See the examples/ directory for complete, runnable examples:
- Basic Usage - Core functionality and budget protection
- Multiple Providers - Unified tracking across providers
- Budget Persistence - Save and restore state
- Custom Provider - Add your own LLM provider
- Model Discovery - Find and compare models
- Intelligent Routing - Automatic retry and fallback
- Dynamic Models - Register models at runtime
TypeScript Support
TokenFirewall is written in TypeScript and includes full type definitions.
import {
createBudgetGuard,
patchGlobalFetch,
getBudgetStatus,
createModelRouter,
registerModels,
BudgetGuardOptions,
BudgetStatus,
ModelInfo,
ModelRouterOptions,
ModelConfig
} from "tokenfirewall";
// Full type safety
const options: BudgetGuardOptions = {
monthlyLimit: 100,
mode: "block"
};
createBudgetGuard(options);
patchGlobalFetch();
const status: BudgetStatus | null = getBudgetStatus();
Error Handling
TokenFirewall provides clear, actionable error messages:
try {
const response = await fetch(/* ... */);
} catch (error) {
if (error.message.includes("TokenFirewall: Budget exceeded")) {
console.error("Monthly budget exhausted");
// Handle budget limit
} else if (error.message.includes("TokenFirewall Router: Max routing retries exceeded")) {
console.error("All fallback models failed");
// Handle routing failure
} else {
console.error("API error:", error.message);
}
}
Common Errors:
| Error Message | Cause | Solution |
|---------------|-------|----------|
| Budget exceeded! Would spend $X of $Y limit | Budget limit reached | Increase limit or wait for reset |
| monthlyLimit must be a valid number | Invalid budget configuration | Provide positive number |
| Max routing retries exceeded | All fallback models failed | Check API status or fallback map |
| No pricing found for model "X" | Unknown model | Register custom pricing |
Best Practices
1. Initialize Early
// At application startup
createBudgetGuard({ monthlyLimit: 100, mode: "block" });
patchGlobalFetch();
2. Use Warn Mode in Development
const mode = process.env.NODE_ENV === "production" ? "block" : "warn";
createBudgetGuard({ monthlyLimit: 100, mode });
3. Persist Budget State
// Save on exit
process.on("beforeExit", () => {
const state = exportBudgetState();
if (state) saveToDatabase(state);
});
4. Monitor Usage
// Alert at 80% usage
const status = getBudgetStatus();
if (status && status.percentageUsed > 80) {
await sendAlert("Budget usage high");
}
5. Use Router for Resilience
// Automatic fallback on failures
createModelRouter({
strategy: "fallback",
fallbackMap: {
"gpt-4o": ["gpt-4o-mini", "gpt-3.5-turbo"]
},
maxRetries: 2
});
6. Register Models Dynamically
// Discover and register models from API
const models = await discoverModels(apiKey);
registerModels("provider", models.map(m => ({
name: m.id,
contextLimit: m.context_window,
pricing: { input: m.input_price, output: m.output_price }
)));
License
MIT © Ruthwik
Links
- GitHub: https://github.com/Ruthwik000/tokenfirewall
- npm: https://www.npmjs.com/package/tokenfirewall
- Issues: https://github.com/Ruthwik000/tokenfirewall/issues
- Changelog: CHANGELOG.md
Built with ❤️ for the AI developer community.
