# Kukamba 🚀
Intelligent API key rotation and parallel request management for LLM providers
Kukamba is a lightweight TypeScript library that helps you avoid rate limits when working with LLM APIs. It provides smart key rotation, health tracking, circuit breaking, and parallel request management, all without bundling any provider-specific SDKs.
## Features
- ✨ **Smart Key Rotation** - Automatically rotates through API keys using configurable strategies
- 🏥 **Health Tracking** - Monitors success/failure rates and adapts key selection
- ⚡ **Circuit Breaking** - Temporarily disables unhealthy keys to prevent cascading failures
- 🔄 **Parallel Requests** - Execute multiple requests simultaneously across different keys
- 📊 **Load Balancing** - Choose from round-robin, weighted health, or least connections strategies
- 🪶 **Lightweight** - No bundled dependencies; bring your own LLM SDK
- 🔌 **Provider Agnostic** - Works with any LLM API (OpenAI, Anthropic, Google, etc.)
- 📝 **Full TypeScript** - Complete type safety and IntelliSense support
## Installation

```bash
bun add kukamba
```

## Quick Start
```ts
import { LlmClient, ApiKeyManager, LoadBalancingStrategy } from 'kukamba';
import { GoogleGenAI } from '@google/genai';

// Set up your API keys
const keyManager = new ApiKeyManager(
  ['key1', 'key2', 'key3'],
  LoadBalancingStrategy.WeightedHealth
);

// Create an adapter for your LLM provider
const geminiAdapter = async (prompt: string, apiKey: string) => {
  const ai = new GoogleGenAI({ apiKey });
  return await ai.models.generateContent({
    model: 'gemini-2.5-flash-lite',
    contents: prompt,
  });
};

// Create the client
const client = new LlmClient(
  geminiAdapter,
  (response) => response.text, // Extract text from response
  keyManager,
  { maxRetries: 3, initialDelay: 1000 }
);

// Generate with automatic retries and key rotation
const result = await client.generate(
  'Explain quantum computing',
  (text) => text.length > 100 // Validate response
);

console.log(result.content);
console.log(`Succeeded on attempt ${result.attempts}`);
```

## API Reference
### ApiKeyManager
Manages a pool of API keys with intelligent load balancing and health tracking.
```ts
const keyManager = new ApiKeyManager(
  keys: string[] | string,
  strategy?: LoadBalancingStrategy,
  config?: ApiKeyManagerConfig
);
```

**Parameters:**
- `keys` - Array of API keys or comma-separated string
- `strategy` - Load balancing strategy (default: `RoundRobin`)
- `config` - Configuration options:
  - `maxConsecutiveFailures` - Failures before circuit breaker opens (default: 3)
  - `circuitBreakerResetTime` - Reset time in milliseconds (default: 60000)
  - `minHealthScore` - Minimum health score to use key (default: 0.3)
  - `maxParallelRequestsPerKey` - Max concurrent requests per key (default: 10)
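Putting those options together (the values shown are the documented defaults, and the keys here are placeholders):

```ts
import { ApiKeyManager, LoadBalancingStrategy } from 'kukamba';

// Keys can also be passed as one comma-separated string
const keyManager = new ApiKeyManager('key1,key2,key3', LoadBalancingStrategy.WeightedHealth, {
  maxConsecutiveFailures: 3,      // open the circuit after 3 failures in a row
  circuitBreakerResetTime: 60000, // re-enable the key after 60 seconds
  minHealthScore: 0.3,            // skip keys whose health score drops below this
  maxParallelRequestsPerKey: 10,  // cap in-flight requests per key
});
```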
**Load Balancing Strategies:**
- `LoadBalancingStrategy.RoundRobin` - Simple rotation through keys
- `LoadBalancingStrategy.WeightedHealth` - Prefer healthier keys
- `LoadBalancingStrategy.LeastConnections` - Use key with fewest active requests
**Methods:**
- `getNext()` - Get next available API key
- `markRequestStart(key)` - Track start of request
- `recordSuccess(key)` - Record successful request
- `recordFailure(key, isRateLimit?)` - Record failed request
- `getHealthStatus()` - Get health metrics for all keys
- `getCount()` - Get total number of keys
- `getAvailableCount()` - Get number of healthy keys
- `resetHealth()` - Reset all health metrics
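If you are not using `LlmClient`, you can drive the manager yourself. A minimal sketch, assuming the method semantics implied by the names above (`callMyApi` is a hypothetical request function standing in for your own HTTP call):

```ts
// Hypothetical: callMyApi(prompt, key) performs the actual provider request
async function requestWithRotation(prompt: string): Promise<string> {
  const key = keyManager.getNext();    // pick a healthy key
  keyManager.markRequestStart(key);    // count it as an active request
  try {
    const response = await callMyApi(prompt, key);
    keyManager.recordSuccess(key);     // improves the key's health score
    return response;
  } catch (err: any) {
    // Assumes an HTTP-style error object with a numeric status
    keyManager.recordFailure(key, err?.status === 429); // flag rate limits
    throw err;
  }
}
```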
### LlmClient
Generic client for making LLM requests with retries, validation, and key rotation.
```ts
const client = new LlmClient(
  adapter: LlmAdapter,
  textExtractor: TextExtractor,
  keyManager: ApiKeyManager,
  retryConfig?: RetryConfig,
  logger?: Logger
);
```

**Parameters:**
- `adapter` - Function that calls your LLM provider's API
- `textExtractor` - Function to extract text from the LLM response
- `keyManager` - `ApiKeyManager` instance
- `retryConfig` - Retry configuration:
  - `maxRetries` - Maximum retry attempts (default: 3)
  - `initialDelay` - Initial delay in ms (default: 1000)
  - `maxDelay` - Maximum delay in ms (default: 30000)
  - `backoffMultiplier` - Exponential backoff multiplier (default: 2)
  - `timeout` - Request timeout in ms (default: 600000)
- `logger` - Custom logger (default: `console`)
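For reference, a client wired up with every retry option at its documented default (`myAdapter` and `extractText` stand in for your own adapter and text extractor):

```ts
const client = new LlmClient(
  myAdapter,    // (prompt, apiKey, config?) => provider response
  extractText,  // (response) => string | null
  keyManager,
  {
    maxRetries: 3,        // retry attempts before giving up
    initialDelay: 1000,   // first backoff delay (ms)
    maxDelay: 30000,      // backoff ceiling (ms)
    backoffMultiplier: 2, // delay grows exponentially each retry
    timeout: 600000,      // per-request timeout (ms)
  }
);
```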
**Methods:**
- `generate(prompt, validate, config?)` - Generate a single response
- `generateBatch(prompts, validate, config?, concurrency?)` - Generate multiple responses in parallel. `validate` can be a single function (applied to all prompts) or an array of functions (one per prompt)
- `getKeyHealthStatus()` - Get health status for all keys
- `resetHealth()` - Reset health metrics
## Examples

### Example 1: OpenAI with JSON Validation
```ts
import OpenAI from 'openai';
import { LlmClient, ApiKeyManager } from 'kukamba';

const keyManager = new ApiKeyManager(process.env.OPENAI_KEYS!.split(','));

const openaiAdapter = async (prompt: string, apiKey: string) => {
  const openai = new OpenAI({ apiKey });
  return await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
  });
};

const client = new LlmClient(
  openaiAdapter,
  (response) => response.choices[0]?.message?.content || null,
  keyManager
);

// Validate that the response is valid JSON with the expected fields
const result = await client.generate(
  'Generate a JSON object with name and age fields',
  (text) => {
    try {
      const json = JSON.parse(text);
      return Boolean(json.name && json.age);
    } catch {
      return false;
    }
  }
);

console.log(JSON.parse(result.content));
```

### Example 2: Anthropic Claude with Structured Output
```ts
import Anthropic from '@anthropic-ai/sdk';
import { LlmClient, ApiKeyManager, LoadBalancingStrategy } from 'kukamba';

const keyManager = new ApiKeyManager(
  process.env.ANTHROPIC_KEYS!.split(','),
  LoadBalancingStrategy.WeightedHealth
);

const claudeAdapter = async (prompt: string, apiKey: string) => {
  const anthropic = new Anthropic({ apiKey });
  return await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });
};

const client = new LlmClient(
  claudeAdapter,
  (response) => response.content[0]?.text || null,
  keyManager
);

// Validate structured response format
const result = await client.generate(
  'List 5 programming languages in the format: "1. Language - Description"',
  (text) => {
    const lines = text.split('\n').filter(l => l.trim());
    return lines.length === 5 && lines.every(l => /^\d+\./.test(l));
  }
);

console.log(result.content);
```

### Example 3: Batch Processing with Parallel Requests
```ts
import { LlmClient, ApiKeyManager } from 'kukamba';

// Process 100 prompts in parallel using 5 API keys
const keyManager = new ApiKeyManager([
  'key1', 'key2', 'key3', 'key4', 'key5'
]);

// yourAdapter / yourTextExtractor: your own provider adapter and
// text extractor, as in the Quick Start
const client = new LlmClient(
  yourAdapter,
  yourTextExtractor,
  keyManager
);

const prompts = Array.from({ length: 100 }, (_, i) =>
  `Summarize article ${i + 1}`
);

// Option 1: Single validator for all prompts
const results = await client.generateBatch(
  prompts,
  (text) => text.length > 50, // Same validation for all
  {},
  5 // Max 5 concurrent requests
);

console.log(`Processed ${results.length} articles`);
```

### Example 3b: Batch Processing with Per-Prompt Validation
```ts
import { LlmClient, ApiKeyManager } from 'kukamba';

// client: created as in Example 3

// Different prompts need different validation!
const prompts = [
  'Translate these 2 paragraphs into Arabic:\nP1 - Text\nP2 - Text2',
  'Translate this 1 paragraph into Arabic:\nP1 - Text3',
  'Translate these 3 paragraphs into Arabic:\nP1 - A\nP2 - B\nP3 - C',
];

// Create a validator for each prompt
const validators = [
  (text: string) => text.split('\n').filter(p => p.trim()).length === 2, // Expects 2 paragraphs
  (text: string) => text.split('\n').filter(p => p.trim()).length === 1, // Expects 1 paragraph
  (text: string) => text.split('\n').filter(p => p.trim()).length === 3, // Expects 3 paragraphs
];

// Pass an array of validators - one per prompt
const results = await client.generateBatch(
  prompts,
  validators, // Array of validators!
  {},
  3
);

// Each prompt was validated with its own validator
results.forEach((result, i) => {
  console.log(`Prompt ${i + 1}: ${result.isValid ? '✓' : '✗'} (${result.attempts} attempts)`);
});
```

### Example 4: Custom Validation Logic
```ts
import { LlmClient, ApiKeyManager } from 'kukamba';

// yourAdapter, yourTextExtractor, keyManager: set up as in the Quick Start
const client = new LlmClient(
  yourAdapter,
  yourTextExtractor,
  keyManager
);

// Validate that the response contains specific keywords
const validateKeywords = (text: string) => {
  const required = ['quantum', 'superposition', 'entanglement'];
  return required.every(keyword =>
    text.toLowerCase().includes(keyword)
  );
};

const result = await client.generate(
  'Explain quantum computing concepts',
  validateKeywords
);

// Validate response length and format
const validateBlogPost = (text: string) => {
  const wordCount = text.split(/\s+/).length;
  const hasHeadings = /#{1,6}\s/.test(text);
  return wordCount >= 500 && wordCount <= 1000 && hasHeadings;
};

const blogPost = await client.generate(
  'Write a blog post about TypeScript',
  validateBlogPost
);

// Validate code output
const validateCode = (text: string) => {
  const hasFunction = /function\s+\w+\s*\(/.test(text);
  const hasReturn = /return\s+/.test(text);
  const isComplete = text.includes('}');
  return hasFunction && hasReturn && isComplete;
};

const code = await client.generate(
  'Write a TypeScript function to sort an array',
  validateCode
);
```

### Example 5: Custom Logger Integration
```ts
import { LlmClient, ApiKeyManager, type Logger } from 'kukamba';
import winston from 'winston';

// Use a Winston logger
const winstonInstance = winston.createLogger({
  transports: [new winston.transports.Console()],
});

const winstonLogger: Logger = {
  log: (msg, ...args) => winstonInstance.info(msg, ...args),
  warn: (msg, ...args) => winstonInstance.warn(msg, ...args),
  error: (msg, ...args) => winstonInstance.error(msg, ...args),
};

const client = new LlmClient(
  yourAdapter,
  yourTextExtractor,
  keyManager,
  { maxRetries: 5 },
  winstonLogger
);

// Or use a silent logger for production
const silentLogger: Logger = {
  log: () => {},
  warn: () => {},
  error: () => {},
};
```

### Example 6: Google Gemini with Retry Logic
```ts
import { GoogleGenAI } from '@google/genai';
import { LlmClient, ApiKeyManager, LoadBalancingStrategy } from 'kukamba';

const keyManager = new ApiKeyManager(
  process.env.GEMINI_KEYS!.split(','),
  LoadBalancingStrategy.LeastConnections,
  {
    maxConsecutiveFailures: 5,
    circuitBreakerResetTime: 120000, // 2 minutes
    maxParallelRequestsPerKey: 20,
  }
);

const geminiAdapter = async (prompt: string, apiKey: string, config: any) => {
  const ai = new GoogleGenAI({
    apiKey,
    httpOptions: { timeout: config.timeout }
  });
  return await ai.models.generateContent({
    model: config.model || 'gemini-2.5-flash-lite',
    contents: prompt,
    config: { temperature: config.temperature || 0.1 },
  });
};

const client = new LlmClient(
  geminiAdapter,
  (response) => response.text || null,
  keyManager,
  {
    maxRetries: 5,
    initialDelay: 2000,
    maxDelay: 60000,
    timeout: 600000,
  }
);

// Generate with a custom per-request config
const result = await client.generate(
  'Explain machine learning',
  (text) => text.length > 200,
  {
    model: 'gemini-2.5-pro',
    temperature: 0.7,
    timeout: 120000
  }
);
```

### Example 7: Health Monitoring and Circuit Breaking
```ts
import { LlmClient, ApiKeyManager, LoadBalancingStrategy } from 'kukamba';

const keyManager = new ApiKeyManager(
  ['key1', 'key2', 'key3'],
  LoadBalancingStrategy.WeightedHealth,
  {
    maxConsecutiveFailures: 3,
    minHealthScore: 0.4,
    circuitBreakerResetTime: 90000,
  }
);

const client = new LlmClient(
  yourAdapter,
  yourTextExtractor,
  keyManager
);

// Monitor health during processing
setInterval(() => {
  const health = client.getKeyHealthStatus();
  health.forEach(key => {
    console.log(`Key: ${key.key}`);
    console.log(`  Health Score: ${key.healthScore.toFixed(2)}`);
    console.log(`  Success: ${key.successCount}, Failures: ${key.failureCount}`);
    console.log(`  Circuit Open: ${key.isCircuitOpen}`);
    console.log(`  Active Requests: ${key.activeRequests}`);
  });
}, 10000);

// Process many requests
const results = await client.generateBatch(
  prompts,
  validateFunction
);

// Reset health if needed
client.resetHealth();
```

## Error Handling
Kukamba automatically classifies and handles different error types:
- **Rate Limit (429)** - Applies exponential backoff with longer delays
- **Timeout** - Retries with backoff
- **Server Errors (5xx)** - Retries with backoff
- **Authentication (401)** - No retry, throws immediately
- **Bad Request (4xx)** - No retry, throws immediately
Circuit breakers automatically disable keys after consecutive failures and re-enable them after a timeout.
## Best Practices
- **Use validation functions** - Always validate LLM responses to ensure quality
- **Set appropriate retry limits** - Balance between success rate and latency
- **Monitor health status** - Track key performance to identify issues
- **Use batch processing** - Maximize throughput for multiple requests
- **Choose the right strategy** - `WeightedHealth` for reliability, `LeastConnections` for speed
- **Configure circuit breakers** - Prevent wasting requests on unhealthy keys
- **Provide custom loggers** - Integrate with your existing logging infrastructure
## Contributing
Contributions are welcome! Please read our AGENTS.md for development guidelines.
## License
MIT
## Support
For issues and questions, please open an issue on GitHub.
