cost-katana
v2.3.3
The simplest way to use AI with automatic cost tracking and optimization. Native SDK support for OpenAI and Google Gemini with automatic AWS Bedrock fallback.
Cost Katana 🥷
Cut your AI costs in half. Without cutting corners.
Cost Katana is a drop-in SDK that wraps your AI calls with automatic cost tracking, smart caching, and optimization—all in one line of code.
🚀 Get Started in 60 Seconds
Step 1: Install
npm install cost-katana
Step 2: Make Your First AI Call
import { ai, OPENAI } from 'cost-katana';
const response = await ai(OPENAI.GPT_4, 'Explain quantum computing in one sentence');
console.log(response.text); // "Quantum computing uses qubits to perform..."
console.log(response.cost); // 0.0012
console.log(response.tokens); // 47
That's it. No configuration files. No complex setup. Just results.
🌍 Provider-Independent by Design
Cost Katana is completely provider-agnostic. Never lock yourself into a single vendor.
✅ Use Capability-Based Routing
import { ai, ModelCapability } from 'cost-katana';
// Automatically selects best model for each task
const code = await ai(ModelCapability.CODE_GENERATION, 'Write a React component');
const chat = await ai(ModelCapability.CONVERSATION, 'Hello!');
const vision = await ai(ModelCapability.VISION, 'Describe this image', { image });
✅ Optimize by Performance Characteristics
import { ai } from 'cost-katana';
// Fastest model available
const fast = await ai({ speed: 'fastest' }, prompt);
// Cheapest model available
const cheap = await ai({ cost: 'cheapest' }, prompt);
// Best quality model
const best = await ai({ quality: 'best' }, prompt);
// Balanced approach
const balanced = await ai({ speed: 'fast', cost: 'cheap' }, prompt);
Benefits:
- 🔄 Automatic Failover - Seamlessly switch providers if one goes down
- 💰 Cost Optimization - Routes to the cheapest provider automatically
- 🚀 Future-Proof - New providers added without code changes
- 🔓 Zero Lock-In - Switch providers anytime, no refactoring needed
Read the full Provider-Agnostic Guide →
📖 Tutorial: Build a Cost-Aware Chatbot
Let's build something real. In this tutorial, you'll create a chatbot that:
- ✅ Tracks every dollar spent
- ✅ Caches repeated questions (saving 100% on duplicates)
- ✅ Optimizes long responses (40-75% savings)
Part 1: Basic Chat Session
import { chat, OPENAI } from 'cost-katana';
// Create a persistent chat session
const session = chat(OPENAI.GPT_4);
// Send messages and track costs
await session.send('Hello! What can you help me with?');
await session.send('Tell me a programming joke');
await session.send('Now explain it');
// See exactly what you spent
console.log(`💰 Total cost: $${session.totalCost.toFixed(4)}`);
console.log(`📊 Messages: ${session.messages.length}`);
console.log(`🎯 Tokens used: ${session.totalTokens}`);
Part 2: Add Smart Caching
Cache identical questions to avoid paying twice:
import { ai, OPENAI } from 'cost-katana';
// First call - hits the API
const response1 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
console.log(`Cached: ${response1.cached}`); // false
console.log(`Cost: $${response1.cost}`); // $0.0008
// Second call - served from cache (FREE!)
const response2 = await ai(OPENAI.GPT_4, 'What is 2+2?', { cache: true });
console.log(`Cached: ${response2.cached}`); // true
console.log(`Cost: $${response2.cost}`); // $0.0000 🎉Part 3: Enable Cortex Optimization
For long-form content, Cortex compresses prompts intelligently:
import { ai, OPENAI } from 'cost-katana';
const response = await ai(
OPENAI.GPT_4,
'Write a comprehensive guide to machine learning for beginners',
{
cortex: true, // Enable 40-75% cost reduction
maxTokens: 2000
}
);
console.log(`Optimized: ${response.optimized}`);
console.log(`Saved: $${response.savedAmount}`);
Part 4: Compare Models Side-by-Side
Find the best price-to-quality ratio for your use case:
import { ai, OPENAI, ANTHROPIC, GOOGLE } from 'cost-katana';
const prompt = 'Summarize the theory of relativity in 50 words';
const models = [
{ name: 'GPT-4', id: OPENAI.GPT_4 },
{ name: 'Claude 3.5 Sonnet', id: ANTHROPIC.CLAUDE_3_5_SONNET_20241022 },
{ name: 'Gemini 2.5 Pro', id: GOOGLE.GEMINI_2_5_PRO },
{ name: 'GPT-3.5 Turbo', id: OPENAI.GPT_3_5_TURBO }
];
console.log('📊 Model Cost Comparison\n');
for (const model of models) {
const response = await ai(model.id, prompt);
console.log(`${model.name.padEnd(20)} $${response.cost.toFixed(6)}`);
}
Sample Output:
📊 Model Cost Comparison
GPT-4 $0.001200
Claude 3.5 Sonnet $0.000900
Gemini 2.5 Pro $0.000150
GPT-3.5 Turbo $0.000080
🎯 Type-Safe Model Selection
Stop guessing model names. Get autocomplete and catch typos at compile time:
import { OPENAI, ANTHROPIC, GOOGLE, AWS_BEDROCK, XAI, DEEPSEEK } from 'cost-katana';
// OpenAI
OPENAI.GPT_5
OPENAI.GPT_4
OPENAI.GPT_4O
OPENAI.GPT_3_5_TURBO
OPENAI.O1
OPENAI.O3
// Anthropic
ANTHROPIC.CLAUDE_SONNET_4_5
ANTHROPIC.CLAUDE_3_5_SONNET_20241022
ANTHROPIC.CLAUDE_3_5_HAIKU_20241022
// Google
GOOGLE.GEMINI_2_5_PRO
GOOGLE.GEMINI_2_5_FLASH
GOOGLE.GEMINI_1_5_PRO
// AWS Bedrock
AWS_BEDROCK.NOVA_PRO
AWS_BEDROCK.NOVA_LITE
AWS_BEDROCK.CLAUDE_SONNET_4_5
// Others
XAI.GROK_2_1212
DEEPSEEK.DEEPSEEK_CHAT
Why constants over strings?
| Feature | String 'gpt-4' | Constant OPENAI.GPT_4 |
|---------|------------------|-------------------------|
| Autocomplete | ❌ | ✅ |
| Typo protection | ❌ | ✅ |
| Refactor safely | ❌ | ✅ |
| Self-documenting | ❌ | ✅ |
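The table above can be seen in action with a minimal TypeScript sketch. Note the `OPENAI` object and `callModel` function here are illustrative local stand-ins, not the SDK's actual exports — they only demonstrate how `as const` literal types give you autocomplete and compile-time typo protection:

```typescript
// Illustrative stand-in for the constants cost-katana exports.
const OPENAI = {
  GPT_4: 'gpt-4',
  GPT_3_5_TURBO: 'gpt-3.5-turbo',
} as const;

// A union of the literal model strings — the compiler rejects anything else.
type OpenAIModel = (typeof OPENAI)[keyof typeof OPENAI];

function callModel(model: OpenAIModel): string {
  return `calling ${model}`;
}

callModel(OPENAI.GPT_4);   // ✅ autocompletes, type-checked
// callModel('gpt-44');    // ❌ compile error: '"gpt-44"' is not assignable to OpenAIModel
```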
⚙️ Configuration
Environment Variables
# Recommended: Use Cost Katana API key for all features
COST_KATANA_API_KEY=dak_your_key_here
# Or use provider keys directly
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
# For AWS Bedrock
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
Programmatic Configuration
import { configure } from 'cost-katana';
await configure({
apiKey: 'dak_your_key',
cortex: true, // 40-75% cost savings
cache: true, // Smart caching
firewall: true // Block prompt injections
});
Request Options
const response = await ai(OPENAI.GPT_4, 'Your prompt', {
temperature: 0.7, // Creativity (0-2)
maxTokens: 500, // Response limit
systemMessage: 'You are a helpful AI', // System prompt
cache: true, // Enable caching
cortex: true, // Enable optimization
retry: true // Auto-retry on failures
});
🔌 Framework Integration
Next.js App Router
// app/api/chat/route.ts
import { ai, OPENAI } from 'cost-katana';
export async function POST(request: Request) {
const { prompt } = await request.json();
const response = await ai(OPENAI.GPT_4, prompt);
return Response.json(response);
}
Express.js
import express from 'express';
import { ai, OPENAI } from 'cost-katana';
const app = express();
app.use(express.json());
app.post('/api/chat', async (req, res) => {
const response = await ai(OPENAI.GPT_4, req.body.prompt);
res.json(response);
});
app.listen(3000);
Fastify
import fastify from 'fastify';
import { ai, OPENAI } from 'cost-katana';
const app = fastify();
app.post('/api/chat', async (request) => {
const { prompt } = request.body as { prompt: string };
return await ai(OPENAI.GPT_4, prompt);
});
app.listen({ port: 3000 });
NestJS
import { Controller, Post, Body } from '@nestjs/common';
import { ai, OPENAI } from 'cost-katana';
@Controller('api')
export class ChatController {
@Post('chat')
async chat(@Body() body: { prompt: string }) {
return await ai(OPENAI.GPT_4, body.prompt);
}
}
🛡️ Built-in Security
Firewall Protection
Block prompt injection attacks automatically:
import { configure, ai, OPENAI } from 'cost-katana';
await configure({ firewall: true });
try {
await ai(OPENAI.GPT_4, 'Ignore all previous instructions and...');
} catch (error) {
console.log('🛡️ Blocked:', error.message);
}
Protects against:
- Prompt injection attacks
- Jailbreak attempts
- Data exfiltration
- Malicious content generation
🔄 Auto-Failover
Never let provider outages break your app:
import { ai, OPENAI } from 'cost-katana';
// If OpenAI is down, automatically switches to Claude or Gemini
const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(`Provider used: ${response.provider}`);
// Could be 'openai', 'anthropic', or 'google' depending on availability
📊 Comprehensive Usage Tracking & Analytics
Real-time Performance Monitoring
Cost Katana tracks every request comprehensively, including network performance, client environment, and optimization opportunities:
import { AICostTracker, OPENAI } from 'cost-katana';
const tracker = new AICostTracker({
apiKey: process.env.COST_KATANA_API_KEY,
// Enable comprehensive tracking
comprehensiveTracking: true,
// Optional: configure tracking endpoints
trackingEndpoint: 'https://api.costkatana.com/usage/track-comprehensive'
});
const response = await tracker.chat(OPENAI.GPT_4, 'Explain quantum computing');
console.log('Response:', response.text);
console.log('Cost:', response.cost);
console.log('Tokens:', response.tokens);
console.log('Response Time:', response.responseTime);
// Comprehensive tracking data is automatically sent to your dashboard
// Including network metrics, client environment, and optimization suggestions
View Analytics in Dashboard
Once tracking is enabled, you can view detailed analytics at your dashboard:
- Network Performance: DNS lookup time, TCP connection time, total response time
- Client Environment: User agent, platform, IP geolocation
- Request/Response Data: Full request and response payloads (sanitized)
- Optimization Opportunities: AI-powered suggestions to reduce costs
- Performance Metrics: Real-time monitoring with anomaly detection
Manual Usage Tracking
For custom implementations or additional tracking:
import { AICostTracker } from 'cost-katana';
const tracker = new AICostTracker({
apiKey: process.env.COST_KATANA_API_KEY
});
// Manually track usage with additional metadata
await tracker.trackUsage({
model: 'gpt-4',
provider: 'openai',
prompt: 'Hello, world!',
completion: 'Hello! How can I help you today?',
promptTokens: 3,
completionTokens: 9,
totalTokens: 12,
cost: 0.00036,
responseTime: 850,
userId: 'user_123',
sessionId: 'session_abc',
tags: ['chat', 'greeting'],
// Additional metadata for comprehensive tracking
requestMetadata: {
userAgent: navigator?.userAgent,
clientIP: await fetch('https://api.ipify.org').then(r => r.text()),
feature: 'chat-interface'
}
});
Session Replay & Distributed Tracing
import { AICostTracker } from 'cost-katana';
const tracker = new AICostTracker({
apiKey: process.env.COST_KATANA_API_KEY,
sessionReplay: true,
distributedTracing: true
});
// Start a traced session
const sessionId = tracker.startSession({
userId: 'user_123',
feature: 'customer-support',
metadata: {
source: 'web-app',
version: '1.2.3'
}
});
// All requests in this session will be automatically traced
const response = await tracker.chat(OPENAI.GPT_4, 'How can I cancel my subscription?', {
sessionId,
tags: ['support', 'billing']
});
// End session and get analytics
const sessionStats = await tracker.endSession(sessionId);
console.log('Session cost:', sessionStats.totalCost);
console.log('Session duration:', sessionStats.duration);
console.log('Requests made:', sessionStats.requestCount);
💡 Cost Optimization Cheatsheet
| Strategy | Savings | When to Use |
|----------|---------|-------------|
| Use GPT-3.5 over GPT-4 | 90% | Simple tasks, translations |
| Enable caching | 100% on hits | Repeated queries, FAQs |
| Enable Cortex | 40-75% | Long-form content |
| Batch in sessions | 10-20% | Related queries |
| Use Gemini Flash | 95% vs GPT-4 | High-volume, cost-sensitive |
Quick Wins
// ❌ Expensive: Using GPT-4 for everything
await ai(OPENAI.GPT_4, 'What is 2+2?'); // $0.001
// ✅ Smart: Match model to task
await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?'); // $0.0001
// ✅ Smarter: Cache common queries
await ai(OPENAI.GPT_3_5_TURBO, 'What is 2+2?', { cache: true }); // $0 on repeat
// ✅ Smartest: Cortex for long content
await ai(OPENAI.GPT_4, 'Write a 2000-word essay', { cortex: true }); // 40-75% off
🔧 Error Handling
import { ai, OPENAI } from 'cost-katana';
try {
const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(response.text);
} catch (error) {
switch (error.code) {
case 'NO_API_KEY':
console.log('Set COST_KATANA_API_KEY or OPENAI_API_KEY');
break;
case 'RATE_LIMIT':
console.log('Rate limited. Retrying...');
break;
case 'INVALID_MODEL':
console.log('Model not found. Available:', error.availableModels);
break;
default:
console.log('Error:', error.message);
}
}
🌐 AI Gateway (simple mental model)
The gateway is an HTTP proxy: your app calls Cost Katana’s URL (for example https://api.costkatana.com/api/gateway/...) with your Cost Katana API key. The server can forward that request to OpenAI, Anthropic, Google, Cohere, etc., and attach usage tracking, caching, budgets, firewall, and other features.
What is CostKatana-Target-Url for?
It tells the proxy which provider’s base URL to use (for example https://api.anthropic.com or https://api.openai.com). The proxy then combines that origin with your route (such as /v1/messages) to build the real upstream request.
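The combination step is just URL joining. As a rough sketch — this is illustrative pseudologic, not the actual gateway source — the proxy takes the origin of `CostKatana-Target-Url` and appends your route:

```typescript
// Illustrative only: how a proxy might combine the CostKatana-Target-Url
// origin with the incoming route to build the real upstream request URL.
function upstreamUrl(targetUrl: string, route: string): string {
  const origin = new URL(targetUrl).origin; // strips any trailing path or query
  return origin + route;
}

upstreamUrl('https://api.anthropic.com', '/v1/messages');
// → 'https://api.anthropic.com/v1/messages'
```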
It is not your API key. Keys are either:
- Provider keys configured on the server (for example ANTHROPIC_API_KEY, OPENAI_API_KEY), or
- Proxy keys (ck-proxy-...) that map to a stored provider key in Cost Katana.
The target URL only answers: “which vendor’s HTTP API are we talking to?”
Easy default (you usually skip the header)
For normal routes (/v1/chat/completions, /v1/messages, Google generateContent, Cohere /v1/generate, etc.), the gateway infers the provider from the path, and the SDK omits CostKatana-Target-Url when you use createGatewayClientFromEnv() or createCostKatanaGatewayClient().
Set CostKatana-Target-Url (or targetUrl in SDK options) when you use a non-default base URL (Azure OpenAI, a private endpoint, another OpenAI-compatible host) or a path the server cannot infer.
On Cost Katana’s hosted gateway, /v1/messages (Anthropic) needs no extra SDK or client configuration: if the server has no ANTHROPIC_API_KEY, the gateway automatically runs Claude on AWS Bedrock (Cost Katana’s AWS account/credentials). Your app still calls the normal gateway URL and gateway.anthropic(...) as usual. Streaming (stream: true) is not supported on that automatic Bedrock path yet—use non-streaming or set ANTHROPIC_API_KEY on the server for direct Anthropic streaming.
import { createGatewayClientFromEnv } from 'cost-katana';
const gateway = createGatewayClientFromEnv();
// No target header needed — gateway infers Anthropic from /v1/messages
const res = await gateway.anthropic({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 256,
messages: [{ role: 'user', content: 'Hello' }],
});
📚 More Examples
Explore 45+ complete examples in our examples repository:
🔗 github.com/Hypothesize-Tech/costkatana-examples
| Category | Examples |
|----------|----------|
| Cost Tracking | Basic tracking, budgets, alerts |
| Gateway | Routing, load balancing, failover |
| Optimization | Cortex, caching, compression |
| Observability | OpenTelemetry, tracing, metrics |
| Security | Firewall, rate limiting, moderation |
| Workflows | Multi-step AI orchestration |
| Frameworks | Express, Next.js, Fastify, NestJS, FastAPI |
🔄 Migration Guides
From OpenAI SDK
// Before
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: 'sk-...' });
const completion = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello' }]
});
console.log(completion.choices[0].message.content);
// After
import { ai, OPENAI } from 'cost-katana';
const response = await ai(OPENAI.GPT_4, 'Hello');
console.log(response.text);
console.log(`Cost: $${response.cost}`); // Bonus: cost tracking!
From Anthropic SDK
// Before
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({ apiKey: 'sk-ant-...' });
const message = await anthropic.messages.create({
model: 'claude-3-5-sonnet-20241022',
messages: [{ role: 'user', content: 'Hello' }]
});
// After
import { ai, ANTHROPIC } from 'cost-katana';
const response = await ai(ANTHROPIC.CLAUDE_3_5_SONNET_20241022, 'Hello');
From LangChain
// Before
import { ChatOpenAI } from '@langchain/openai';
const model = new ChatOpenAI({ model: 'gpt-4' });
const response = await model.invoke('Hello');
// After
import { ai, OPENAI } from 'cost-katana';
const response = await ai(OPENAI.GPT_4, 'Hello');
🤝 Contributing
We welcome contributions! See our Contributing Guide.
git clone https://github.com/Hypothesize-Tech/costkatana-core.git
cd costkatana-core
npm install
npm run lint # Check code style
npm run lint:fix # Auto-fix issues
npm run format # Format code
npm test # Run tests
npm run build # Build
📞 Support
| Channel | Link |
|---------|------|
| Dashboard | costkatana.com |
| Documentation | docs.costkatana.com |
| GitHub | github.com/Hypothesize-Tech |
| Discord | discord.gg/D8nDArmKbY |
| Email | [email protected] |
📄 License
MIT © Cost Katana
Start cutting AI costs today 🥷
npm install cost-katana
import { ai, OPENAI } from 'cost-katana';
await ai(OPENAI.GPT_4, 'Hello, world!');