Multi-LLM 🤖
A unified TypeScript/JavaScript package to use LLMs across ALL platforms with support for streaming, MCP tools, and intelligent response parsing.
Features
- 🌐 Universal Provider Support: OpenAI, Anthropic, Google Gemini, Cohere, Mistral AI, Together AI, Fireworks AI, OpenRouter, Groq, Cerebras, Ollama, Azure OpenAI, Perplexity, DeepInfra, Replicate, Hugging Face, and AWS Bedrock (17 providers in total)
- ⚡ Streaming & Non-Streaming: Real-time streaming or batch processing
- 🔄 Intelligent Retry System: Exponential backoff retry logic for handling API failures and rate limits
- 🧠 Smart Response Parsing: Automatic extraction of code blocks, thinking sections, and structured content
- 🔧 MCP Integration: Add Model Context Protocol tools to enhance capabilities
- 📘 TypeScript Support: Full type definitions and IntelliSense
- 🎯 Unified API: Same interface across all 17 providers
- 🧪 Smart Testing: Conditional tests run only for configured providers
Installation
npm install multi-llm
Quick Start
import { MultiLLM } from 'multi-llm';
// Create a provider
const provider = MultiLLM.createProvider('openai', 'your-api-key');
// Get available models
const models = await provider.getModels();
console.log(models);
// Create LLM instance
const llm = provider.createLLM('gpt-4o-mini');
// Non-streaming chat
const result = await llm.chat('What is the capital of France?', {
temperature: 0.7,
maxTokens: 100,
system: 'You are a helpful geography assistant'
});
console.log(result.parsed.content);
// Chat with retry configuration for handling API failures
const robustResult = await llm.chat('What is the capital of France?', {
temperature: 0.7,
maxTokens: 100,
system: 'You are a helpful geography assistant',
retries: 3, // Retry up to 3 times on failure (default: 1)
retryInterval: 1000, // Initial retry delay: 1 second (default: 1000ms)
retryBackoff: 2 // Exponential backoff multiplier (default: 2)
});
console.log(robustResult.parsed.content);
// Streaming chat
const streamResult = await llm.chat('Tell me a story', {
temperature: 1.0
}, (chunk) => {
process.stdout.write(chunk); // Real-time streaming
});
Supported Providers
OpenAI
const provider = MultiLLM.createProvider('openai', 'sk-...');
const llm = provider.createLLM('gpt-4o-mini');
Anthropic
const provider = MultiLLM.createProvider('anthropic', 'sk-ant-...');
const llm = provider.createLLM('claude-3-5-sonnet-20241022');
OpenRouter
const provider = MultiLLM.createProvider('openrouter', 'sk-or-...');
const llm = provider.createLLM('microsoft/wizardlm-2-8x22b');
Groq
const provider = MultiLLM.createProvider('groq', 'gsk_...');
const llm = provider.createLLM('llama3-70b-8192');
Cerebras
const provider = MultiLLM.createProvider('cerebras', 'csk-...');
const llm = provider.createLLM('llama3.1-70b');
Ollama (Local)
const provider = MultiLLM.createProvider('ollama', '', 'http://localhost:11434');
const llm = provider.createLLM('llama3.2');
Azure OpenAI
const provider = MultiLLM.createProvider('azure', 'your-api-key', 'https://your-resource.openai.azure.com');
const llm = provider.createLLM('your-deployment-name');
Google Gemini
const provider = MultiLLM.createProvider('google', 'your-api-key');
const llm = provider.createLLM('gemini-2.5-pro');
Cohere
const provider = MultiLLM.createProvider('cohere', 'your-api-key');
const llm = provider.createLLM('command-r-plus');
Mistral AI
const provider = MultiLLM.createProvider('mistral', 'your-api-key');
const llm = provider.createLLM('mistral-large-latest');
Together AI
const provider = MultiLLM.createProvider('together', 'your-api-key');
const llm = provider.createLLM('meta-llama/Llama-3.2-3B-Instruct-Turbo');
Fireworks AI
const provider = MultiLLM.createProvider('fireworks', 'your-api-key');
const llm = provider.createLLM('accounts/fireworks/models/llama-v3p1-70b-instruct');
Perplexity
const provider = MultiLLM.createProvider('perplexity', 'your-api-key');
const llm = provider.createLLM('llama-3.1-sonar-large-128k-online');
DeepInfra
const provider = MultiLLM.createProvider('deepinfra', 'your-api-key');
const llm = provider.createLLM('meta-llama/Meta-Llama-3.1-8B-Instruct');
Replicate
const provider = MultiLLM.createProvider('replicate', 'your-api-key');
const llm = provider.createLLM('meta/llama-2-70b-chat');
Hugging Face
const provider = MultiLLM.createProvider('huggingface', 'your-api-key');
const llm = provider.createLLM('mistralai/Mixtral-8x7B-Instruct-v0.1');
Amazon Bedrock
const provider = MultiLLM.createProvider('bedrock', 'accessKeyId:secretAccessKey');
const llm = provider.createLLM('anthropic.claude-3-5-sonnet-20241022-v2:0');
Response Structure
Every chat response includes:
interface ChatResult {
raw: any; // Raw provider response
parsed: {
content: string; // Clean text content
codeBlocks: Array<{ // Extracted code blocks
language: string;
code: string;
}>;
thinking?: string; // Extracted thinking/reasoning
toolCalls?: Array<{ // MCP tool calls (if available)
id: string;
function: string;
args: any;
execute: () => Promise<any>;
}>;
};
usage?: { // Token usage stats
inputTokens: number;
outputTokens: number;
totalTokens: number;
};
}
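For example, the parsed fields can be consumed directly:
const { content, codeBlocks, thinking } = result.parsed;
for (const block of codeBlocks) {
  console.log(`Found ${block.language} code block:`);
  console.log(block.code);
}
if (thinking) console.log('Model reasoning:', thinking);
Retry Configuration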
Multi-LLM includes built-in retry functionality with exponential backoff to handle temporary API failures, rate limits, and network issues.
Basic Retry Usage
const result = await llm.chat('Your message', {
retries: 3, // Number of retry attempts (default: 1)
retryInterval: 1000, // Initial retry delay in ms (default: 1000)
retryBackoff: 2, // Backoff multiplier (default: 2)
// ... other chat options
});
Retry Behavior
The retry system implements exponential backoff:
- 1st retry: after retryInterval ms (e.g., 1000ms)
- 2nd retry: after retryInterval × retryBackoff ms (e.g., 2000ms)
- 3rd retry: after retryInterval × retryBackoff² ms (e.g., 4000ms)
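Equivalently, the delay before retry n is retryInterval × retryBackoff^(n-1). A minimal sketch of that schedule (illustrative only, not the package's internal code):
// Compute the backoff schedule described above.
function backoffDelays(retries: number, retryInterval = 1000, retryBackoff = 2): number[] {
  return Array.from({ length: retries }, (_, i) => retryInterval * Math.pow(retryBackoff, i));
}
console.log(backoffDelays(3));            // [1000, 2000, 4000]
console.log(backoffDelays(5, 2000, 1.5)); // [2000, 3000, 4500, 6750, 10125]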
Retry Examples
// Conservative retry for important requests
const result = await llm.chat('Critical business query', {
retries: 5,
retryInterval: 2000, // Start with 2 second delay
retryBackoff: 1.5, // Slower backoff: 2s, 3s, 4.5s, 6.75s, 10.125s
maxTokens: 500
});
// Quick retry for real-time applications
const result = await llm.chat('Fast query', {
retries: 2,
retryInterval: 200, // Quick 200ms initial delay
retryBackoff: 3, // Aggressive backoff: 200ms, 600ms
maxTokens: 50
});
// Disable retries entirely
const result = await llm.chat('One-shot request', {
retries: 0, // No retries, fail immediately
maxTokens: 100
});
Error Handling with Retries
try {
const result = await llm.chat('Your message', {
retries: 3,
retryInterval: 1000,
retryBackoff: 2
});
console.log(result.parsed.content);
} catch (error) {
// After exhausting all retries
console.error('Request failed:', error.message);
// Error message includes retry context:
// "Failed after 3 retries (ProviderName:model-id): Original error message"
}
When Retries Are Triggered
Retries are automatically triggered for:
- Network errors (connection timeouts, DNS failures)
- Rate limit errors (429 status codes)
- Server errors (5xx status codes)
- Authentication failures (invalid API keys)
- Model unavailability (temporary model issues)
Retries are NOT triggered for:
- Client errors (400, 404 - malformed requests)
- Successful responses (2xx status codes)
- Streaming responses (retries could cause duplicate content)
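This implies a status-based check along these lines (a sketch under assumptions: the package does not expose this predicate, and the ProviderError shape here is hypothetical):
// Hypothetical error shape: carries an HTTP status if a response was received.
interface ProviderError extends Error {
  status?: number;
}
function isRetryable(err: ProviderError): boolean {
  if (err.status === undefined) return true;                 // network error, no response
  if (err.status === 429) return true;                       // rate limited
  if (err.status === 401 || err.status === 403) return true; // auth failure (per the list above)
  if (err.status >= 500) return true;                        // server error
  return false;                                              // 400, 404, other client errors
}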
Default Configuration
If no retry options are specified, the system uses:
{
retries: 1, // 1 retry attempt
retryInterval: 1000, // 1 second initial delay
retryBackoff: 2 // Double delay each retry
}MCP (Model Context Protocol) Integration
Add tools to enhance your LLM's capabilities:
const llm = provider.createLLM('gpt-4o-mini');
// Add MCP server
llm.addMCP('python3 -m my_mcp_server');
// Chat with tool access
const result = await llm.chat('Calculate the fibonacci sequence', {});
// Execute tool calls if present
if (result.parsed.toolCalls?.length > 0) {
for (const toolCall of result.parsed.toolCalls) {
const toolResult = await toolCall.execute();
console.log(`Tool ${toolCall.function} result:`, toolResult);
}
}
Testing
The package includes comprehensive tests for each provider. Tests are only run for providers with valid environment variables.
Environment Setup
The test system automatically detects available providers based on environment variables. Only providers with valid credentials will run tests.
Create a .env file in the project root:
# Copy the example file
cp .env.example .env
# Edit .env with your API keys (add only the providers you want to test)
Provider Environment Variables (add only what you have):
# OpenRouter
OPENROUTER_API_KEY=your_openrouter_api_key
OPENROUTER_MODEL=microsoft/wizardlm-2-8x22b
# OpenAI
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4o-mini
# Anthropic
ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
# Google Gemini
GOOGLE_API_KEY=your_google_api_key
GOOGLE_MODEL=gemini-2.5-pro
# Cohere
COHERE_API_KEY=your_cohere_api_key
COHERE_MODEL=command-r-plus
# Mistral AI
MISTRAL_API_KEY=your_mistral_api_key
MISTRAL_MODEL=mistral-large-latest
# Together AI
TOGETHER_API_KEY=your_together_api_key
TOGETHER_MODEL=meta-llama/Llama-3.2-3B-Instruct-Turbo
# Fireworks AI
FIREWORKS_API_KEY=your_fireworks_api_key
FIREWORKS_MODEL=accounts/fireworks/models/llama-v3p1-70b-instruct
# Groq
GROQ_API_KEY=your_groq_api_key
GROQ_MODEL=llama3-70b-8192
# Cerebras
CEREBRAS_API_KEY=your_cerebras_api_key
CEREBRAS_MODEL=llama3.1-70b
# Ollama (local)
OLLAMA_MODEL=llama3.2
OLLAMA_BASE_URL=http://localhost:11434
# Azure OpenAI
AZURE_API_KEY=your_azure_api_key
AZURE_BASE_URL=https://your-resource.openai.azure.com
AZURE_MODEL=your-deployment-name
# Perplexity
PERPLEXITY_API_KEY=your_perplexity_api_key
PERPLEXITY_MODEL=llama-3.1-sonar-large-128k-online
# DeepInfra
DEEPINFRA_API_KEY=your_deepinfra_api_key
DEEPINFRA_MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct
# Replicate
REPLICATE_API_KEY=your_replicate_api_key
REPLICATE_MODEL=meta/llama-2-70b-chat
# Hugging Face
HUGGINGFACE_API_KEY=your_huggingface_api_key
HUGGINGFACE_MODEL=mistralai/Mixtral-8x7B-Instruct-v0.1
# AWS Bedrock
BEDROCK_API_KEY=accessKeyId:secretAccessKey
BEDROCK_MODEL=anthropic.claude-3-5-sonnet-20241022-v2:0
BEDROCK_REGION=us-east-1
Running Tests
# Install dependencies
npm install
# Build the project
npm run build
# Run all tests
# ✅ Providers with valid credentials will run tests
# ⏭️ Providers without credentials will be skipped
npm test
# Run tests for specific provider
npm test -- --testPathPattern=openrouter
# Run tests with coverage
npm test -- --coverage
# Run tests in watch mode
npm test -- --watch
Example Output:
📊 Provider Environment Status:
OpenRouter: ✅ Available
OpenAI: ❌ Missing credentials
Anthropic: ✅ Available
Google: ✅ Available
Cohere: ❌ Missing credentials
Mistral: ❌ Missing credentials
Together: ❌ Missing credentials
Fireworks: ❌ Missing credentials
Groq: ❌ Missing credentials
Cerebras: ❌ Missing credentials
Ollama: ❌ Missing credentials
Azure: ❌ Missing credentials
Perplexity: ❌ Missing credentials
DeepInfra: ❌ Missing credentials
Replicate: ❌ Missing credentials
HuggingFace: ❌ Missing credentials
Bedrock: ❌ Missing credentials
🎯 3 providers available for testing: openrouter, anthropic, google
✅ Test execution will run for 3 provider(s): openrouter, anthropic, google
🚀 Provider-specific tests will execute for configured providers
⏭️ Provider tests without credentials will be skipped
Test Categories
Each provider test suite includes:
- Provider Creation: Basic instantiation and configuration
- Model Management: Fetching available models and metadata
- Non-Streaming Chat: Standard request/response with performance metrics
- Streaming Chat: Real-time streaming with chunk analysis
- Error Handling: Invalid requests and edge cases
- Response Parsing: Code blocks, thinking extraction, and structured content
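Credential gating follows a standard Jest pattern; a minimal sketch of the idea (assuming Jest and the environment variable names above; the suite's actual helpers may differ):
import { MultiLLM } from 'multi-llm';
// Run a provider's suite only when its credentials are configured.
const describeIf = (cond: boolean) => (cond ? describe : describe.skip);
describeIf(!!process.env.OPENAI_API_KEY)('OpenAI provider', () => {
  test('non-streaming chat', async () => {
    const provider = MultiLLM.createProvider('openai', process.env.OPENAI_API_KEY!);
    const llm = provider.createLLM(process.env.OPENAI_MODEL ?? 'gpt-4o-mini');
    const result = await llm.chat('Say hi', { maxTokens: 5 });
    expect(result.parsed.content).toBeTruthy();
  });
});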
Performance Metrics
Tests automatically measure and report:
- Response time for non-streaming requests
- Time to first chunk for streaming requests
- Total streaming time
- Token usage statistics (when available)
- Chunk count and average size for streaming
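Time to first chunk, for example, can be measured directly against the chat API shown above; a minimal sketch:
// Measure time to first chunk and total streaming time for one request.
const start = Date.now();
let firstChunkAt: number | undefined;
await llm.chat('Tell me a story', { temperature: 1.0 }, (chunk) => {
  if (firstChunkAt === undefined) firstChunkAt = Date.now();
  process.stdout.write(chunk);
});
console.log(`Time to first chunk: ${(firstChunkAt ?? Date.now()) - start}ms`);
console.log(`Total streaming time: ${Date.now() - start}ms`);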
API Reference
MultiLLM
class MultiLLM {
static createProvider(type: ProviderType, apiKey: string, baseUrl?: string): Provider
}
Provider
abstract class Provider {
abstract getModels(): Promise<ModelInfo[]>
abstract createLLM(modelId: string): LLM
}
LLM
class LLM {
addMCP(startupCommand: string): void
chat(content: string, options: ChatOptions, streamCallback?: StreamCallback): Promise<ChatResult>
dispose(): void
}
ChatOptions
interface ChatOptions {
temperature?: number; // 0.0 to 2.0
maxTokens?: number; // Maximum output tokens
topP?: number; // Nucleus sampling parameter
topK?: number; // Top-K sampling parameter
system?: string; // System message
stream?: boolean; // Automatically set based on callback presence
// Retry configuration
retries?: number; // Number of retry attempts (default: 1)
retryInterval?: number; // Initial retry delay in ms (default: 1000)
retryBackoff?: number; // Exponential backoff multiplier (default: 2)
[key: string]: any; // Provider-specific options
}
Examples
See example.js for comprehensive usage examples across all providers.
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Add tests for your changes
- Run the test suite (npm test)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Changelog
v1.1.0
- 🔄 Intelligent Retry System: Added exponential backoff retry logic with customizable configuration
  - retries: Number of retry attempts (default: 1)
  - retryInterval: Initial retry delay in milliseconds (default: 1000)
  - retryBackoff: Exponential backoff multiplier (default: 2)
- 🧪 Comprehensive Retry Testing: 13 new test cases covering retry behavior, backoff timing, and error handling
- 📚 Enhanced Documentation: Complete retry configuration examples and best practices
- ⚡ Production Ready: Robust error handling for network issues, rate limits, and API failures
v1.0.0
- Initial release with support for 17 providers:
- Core Providers: OpenAI, Anthropic, Google Gemini, OpenRouter
- Performance Providers: Groq, Cerebras, Together AI, Fireworks AI
- Specialized Providers: Cohere, Mistral AI, Perplexity, DeepInfra
- Local/Custom: Ollama, Azure OpenAI
- Cloud Platforms: Replicate, Hugging Face, AWS Bedrock
- Streaming and non-streaming support across all providers
- Smart response parsing with code block and thinking extraction
- MCP integration framework for enhanced capabilities
- Conditional testing system that adapts to available credentials
- Comprehensive test suite with performance metrics
- Full TypeScript definitions and IntelliSense support
