LLM Pool Manager

A production-ready, fault-tolerant Node.js library for managing multiple LLM API providers with intelligent load balancing, automatic failover, and dynamic configuration management.

Features

🚀 Multi-Provider Support

  • OpenAI GPT models
  • Anthropic Claude models
  • Google Gemini models
  • Groq models
  • Together AI models
  • Cohere models
  • Easy to extend with new providers

⚖️ Intelligent Load Balancing

  • Priority-based provider selection
  • Success rate tracking
  • Response time optimization
  • Circuit breaker pattern

🔄 Automatic Failover

  • Seamless switching between providers
  • Configurable retry logic with exponential backoff
  • Rate limit detection and handling

📊 Advanced Rate Limiting

  • Per-minute and per-day request limits
  • Dynamic rate limit detection from API responses
  • Intelligent request distribution

🛡️ Fault Tolerance

  • Circuit breaker for failing providers
  • Request timeout handling
  • Network error recovery
  • Provider health monitoring

⚙️ Dynamic Configuration

  • Hot-reload configuration changes
  • Local file or remote URL configuration
  • Configuration validation and checksums
  • Zero-downtime updates

🖼️ Multi-Modal Support

  • Text and image message handling
  • Base64 image support
  • Provider-specific format conversion

📈 Comprehensive Monitoring

  • Real-time provider statistics
  • Cost tracking with token pricing
  • Performance metrics
  • Health checks and alerts

Installation

npm install llmpool

Quick Start

1. Create Configuration

Create a config.json file:

{
  "providers": [
    {
      "name": "groq-primary",
      "type": "groq",
      "api_key": "your-groq-api-key",
      "base_url": "https://api.groq.com/openai/v1",
      "model": "mixtral-8x7b-32768",
      "priority": 1,
      "requests_per_minute": 30,
      "requests_per_day": 1000
    },
    {
      "name": "openai-fallback",
      "type": "openai", 
      "api_key": "your-openai-api-key",
      "base_url": "https://api.openai.com/v1",
      "model": "gpt-4",
      "priority": 2,
      "requests_per_minute": 100,
      "requests_per_day": 5000
    }
  ]
}
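
Providers are tried in priority order (lower number = higher priority), so groq-primary handles requests first and openai-fallback only receives traffic when the primary is unavailable or rate limited.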

2. Basic Usage

const { LLMPool, createTextMessage } = require('llmpool');

async function main() {
  // Initialize pool
  const pool = new LLMPool({
    configPath: './config.json'
  });

  await pool.initialize();

  // Send chat request
  const response = await pool.chat({
    messages: [
      createTextMessage('system', 'You are a helpful assistant.'),
      createTextMessage('user', 'What is the capital of France?')
    ],
    temperature: 0.7,
    max_tokens: 1000
  });

  console.log('Response:', response.content);
  console.log('Provider:', response.provider);
  console.log('Tokens used:', response.usage.total_tokens);

  await pool.shutdown();
}

main().catch(console.error);
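
If the highest-priority provider fails or hits a rate limit, the same chat() call fails over to the next provider automatically; check response.provider to see which one actually served the request.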

Advanced Usage

Image Support

const { createImageMessage } = require('llmpool');

const response = await pool.chat({
  messages: [
    createImageMessage(
      'user',
      'What do you see in this image?',
      'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD...'
    )
  ]
});

Remote Configuration

const pool = new LLMPool({
  configUrl: 'https://your-domain.com/llm-config.json',
  checkInterval: 300000 // Check for updates every 5 minutes
});

pool.on('configChanged', (config) => {
  console.log('Configuration updated automatically');
});

Event Monitoring

pool.on('requestSuccess', (event) => {
  console.log(`✅ ${event.provider} succeeded on attempt ${event.attempt}`);
});

pool.on('requestError', (event) => {
  console.log(`❌ ${event.provider} failed: ${event.error}`);
});

pool.on('providersUpdated', (providers) => {
  console.log(`Updated ${providers.length} providers`);
});

Health Monitoring

// Get overall pool health
const health = pool.getPoolHealth();
console.log(`Available: ${health.availableProviders}/${health.totalProviders}`);

// Get detailed provider statistics
const stats = pool.getProviderStats();
Object.entries(stats).forEach(([name, stat]) => {
  console.log(`${name}:`);
  console.log(`  Success Rate: ${stat.performance.successRate.toFixed(2)}%`);
  console.log(`  Avg Response Time: ${stat.performance.averageResponseTime}ms`);
  console.log(`  Total Cost: $${stat.usage.totalCost.toFixed(4)}`);
});

Configuration Reference

Pool Configuration

const pool = new LLMPool({
  // Configuration source (choose one)
  configPath: './config.json',           // Local file path
  configUrl: 'https://example.com/config.json', // Remote URL
  
  // Behavior settings
  timeout: 30000,        // Request timeout (ms)
  maxRetries: 3,         // Maximum retry attempts
  retryDelay: 1000,      // Initial retry delay (ms)
  checkInterval: 300000, // Config check interval (ms)
  useTokenCounting: true // Enable token estimation
});

Provider Configuration

{
  "name": "provider-name",          // Unique identifier
  "type": "openai",                 // Provider type
  "api_key": "your-api-key",        // API authentication
  "base_url": "https://api.openai.com/v1",
  "model": "gpt-4",                 // Model to use
  "priority": 1,                    // Selection priority (lower = higher priority)
  
  // Rate limiting
  "requests_per_minute": 100,       // RPM limit
  "requests_per_day": 5000,         // Daily limit
  
  // Circuit breaker
  "circuit_breaker_threshold": 5,   // Failure threshold
  "circuit_breaker_timeout": 60000, // Recovery timeout (ms)
  
  // Request defaults
  "max_tokens": 4096,               // Default max tokens
  "temperature": 0.7,               // Default temperature
  "timeout": 30000,                 // Request timeout (ms)
  
  // Cost tracking (optional)
  "input_token_price": 0.03,        // Cost per 1K input tokens
  "output_token_price": 0.06        // Cost per 1K output tokens
}
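
Note: JSON does not support comments, so strip the // annotations above before using this as a real config file.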

Supported Provider Types

| Provider | Type | Base URL |
|----------|------|----------|
| OpenAI | openai | https://api.openai.com/v1 |
| Gemini | gemini | https://generativelanguage.googleapis.com/v1beta/openai |
| Anthropic | anthropic | https://api.anthropic.com/v1 |
| Groq | groq | https://api.groq.com/openai/v1 |
| Together AI | together | https://api.together.xyz/v1 |
| Cohere | cohere | https://api.cohere.ai/v1 |
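
For instance, a Gemini provider entry pairs the gemini type with its base URL from the table (the model name and limits below are placeholder values, not recommendations):

{
  "name": "gemini-primary",
  "type": "gemini",
  "api_key": "your-gemini-api-key",
  "base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
  "model": "gemini-1.5-flash",
  "priority": 1,
  "requests_per_minute": 60,
  "requests_per_day": 1500
}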

Error Handling

The library provides specific error types for different scenarios:

const { 
  ProviderError, 
  RateLimitError, 
  ConfigurationError 
} = require('llmpool');

try {
  const response = await pool.chat({ messages });
} catch (error) {
  if (error instanceof RateLimitError) {
    console.log(`Rate limited by ${error.provider}, retry in ${error.resetTime}s`);
  } else if (error instanceof ProviderError) {
    console.log(`Provider ${error.provider} failed: ${error.message}`);
    if (error.retryable) {
      // Can retry with different provider
    }
  } else if (error instanceof ConfigurationError) {
    console.log(`Configuration issue: ${error.message}`);
  }
}

Testing

Run the test suite:

npm test

Run specific test categories:

# Unit tests only
npm test -- --testNamePattern="LLMPool|Provider|ConfigManager"

# Integration tests
npm test -- --testNamePattern="Integration"

# Performance tests  
npm test -- --testNamePattern="Performance"

Performance Considerations

Concurrent Requests

The pool handles concurrent requests efficiently:

// Process multiple requests simultaneously
const promises = requests.map(request => 
  pool.chat({ messages: request.messages })
);

const results = await Promise.allSettled(promises);
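
Promise.allSettled waits for every request to finish regardless of individual failures, so one bad provider call doesn't reject the whole batch:

// Separate successes from failures after the batch settles
const succeeded = results
  .filter((r) => r.status === 'fulfilled')
  .map((r) => r.value);
const failed = results
  .filter((r) => r.status === 'rejected')
  .map((r) => r.reason);

console.log(`${succeeded.length} succeeded, ${failed.length} failed`);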

Memory Usage

  • Provider statistics are kept in memory with configurable history limits
  • Token counting uses efficient algorithms when enabled
  • Configuration changes don't cause memory leaks

Optimization Tips

  1. Set appropriate priorities - Put faster/cheaper providers first
  2. Configure realistic rate limits - Match provider specifications
  3. Use circuit breakers - Prevent cascading failures
  4. Monitor health regularly - Detect issues early
  5. Cache configurations - Reduce remote config fetches

Security Best Practices

API Key Management

// Use environment variables
const config = {
  providers: [{
    name: 'openai',
    type: 'openai',
    api_key: process.env.OPENAI_API_KEY,
    // ... other config
  }]
};
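
Keep real keys out of any config.json committed to source control, and treat a remote configUrl endpoint as sensitive, since the fetched document carries credentials.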

Request Validation

All requests are validated before sending:

  • Message format validation
  • Content length checks
  • Parameter sanitization
  • Provider capability verification
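
These checks happen inside the library before any network call is made. To fail even earlier, an application-side guard (purely illustrative, not part of the llmpool API) can be as simple as:

// Illustrative app-side guard; llmpool performs its own validation internally
function assertValidMessages(messages) {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  for (const message of messages) {
    if (!message.role || !message.content) {
      throw new Error('every message needs a role and content');
    }
  }
}

// usage: assertValidMessages(request.messages) before calling pool.chat()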

Network Security

  • HTTPS-only connections
  • Request timeout protection
  • Retry limit enforcement
  • Error message sanitization

Monitoring and Observability

Metrics Collection

// Set up periodic monitoring
setInterval(() => {
  const health = pool.getPoolHealth();
  const stats = pool.getProviderStats();
  
  // Log metrics to your monitoring system
  console.log('Pool Health:', health);
  
  // Alert on issues
  if (!health.healthy) {
    console.warn('🚨 Pool unhealthy - no available providers');
  }
  
  Object.entries(stats).forEach(([name, stat]) => {
    if (stat.performance.successRate < 90) {
      console.warn(`⚠️ ${name} has low success rate: ${stat.performance.successRate}%`);
    }
  });
}, 30000);

Integration with Monitoring Tools

The library emits structured events that can be integrated with monitoring tools:

// Prometheus metrics example
pool.on('requestSuccess', (event) => {
  prometheus.requestsTotal
    .labels({ provider: event.provider, status: 'success' })
    .inc();
});

pool.on('requestError', (event) => {
  prometheus.requestsTotal
    .labels({ provider: event.provider, status: 'error' })
    .inc();
});
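
The prometheus.requestsTotal counter above is assumed to be something you define yourself; with the prom-client package that setup might look like this (the metric name is an arbitrary choice):

const client = require('prom-client');

const prometheus = {
  requestsTotal: new client.Counter({
    name: 'llmpool_requests_total',
    help: 'LLM pool requests, labeled by provider and outcome',
    labelNames: ['provider', 'status']
  })
};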

Troubleshooting

Common Issues

No available providers

  • Check provider configurations
  • Verify API keys are valid
  • Check rate limits haven't been exceeded
  • Ensure network connectivity

High failure rates

  • Review circuit breaker thresholds
  • Check provider status pages
  • Verify request formats
  • Monitor network timeouts

Configuration not updating

  • Verify remote URL accessibility
  • Check file permissions for local configs
  • Review checkInterval setting
  • Monitor configChanged events

Debug Mode

Enable verbose logging:

const pool = new LLMPool({
  configPath: './config.json',
  debug: true
});

pool.on('debug', (message) => {
  console.log('DEBUG:', message);
});

Health Checks

Implement regular health checks:

async function healthCheck() {
  const health = pool.getPoolHealth();
  
  if (!health.healthy) {
    throw new Error('LLM Pool is unhealthy');
  }
  
  return {
    status: 'healthy',
    providers: health.availableProviders,
    total: health.totalProviders
  };
}
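
One way to expose this is over HTTP; here is a minimal sketch using Node's built-in http module (the /healthz path and port are arbitrary choices):

const http = require('http');

http.createServer(async (req, res) => {
  if (req.url !== '/healthz') {
    res.writeHead(404);
    return res.end();
  }
  try {
    // Report 200 when the pool has available providers, 503 otherwise
    const result = await healthCheck();
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result));
  } catch (error) {
    res.writeHead(503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'unhealthy', error: error.message }));
  }
}).listen(3000);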

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Development Setup

git clone https://github.com/KTBsomen/llmpool.git
cd llmpool
npm install
npm test

License

MIT License - see LICENSE file for details.

Changelog

v1.0.0

  • Initial release
  • Multi-provider support
  • Dynamic configuration
  • Circuit breaker implementation
  • Comprehensive test suite
  • Production-ready error handling

For more examples and advanced usage patterns, see the examples directory.