@ai-orchestration/core v0.4.0
Modular AI orchestration framework for multiple LLM providers
# AI Orchestration Framework
A modular and extensible framework for orchestrating multiple AI/LLM providers in a consistent, configurable way.

> 📦 This is an npm package: API keys must be configured in the project that uses this package (via environment variables or a .env file in that project), not in the package itself.
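For example, a consuming project can fail fast at startup if a key is missing. This is an illustrative snippet, not part of the package; the variable names are just examples:

```typescript
// Illustrative: validate required provider keys in YOUR project at startup.
const REQUIRED_KEYS = ['GROQ_API_KEY', 'OPENROUTER_API_KEY'];

function loadApiKeys(): Record<string, string> {
  const keys: Record<string, string> = {};
  for (const name of REQUIRED_KEYS) {
    const value = process.env[name];
    if (!value) {
      // Surface a clear error instead of failing later inside a request
      throw new Error(`Missing environment variable: ${name}`);
    }
    keys[name] = value;
  }
  return keys;
}
```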
## Features
- 🔌 Plugin-based architecture: Add new providers or strategies without modifying the core
- 🎯 Multiple selection strategies: Round-robin, priority, fallback, weighted, health-aware
- 🌊 Native streaming: Full support for streaming responses using ReadableStream
- 🔄 Automatic fallback: Automatically tries multiple providers if one fails
- 💚 Health checks: Provider health monitoring with latency metrics
- 📦 Runtime agnostic: Compatible with Node.js and Bun
- 🎨 Declarative API: Simple configuration via JSON/JS objects
- 🔒 Type-safe: Fully typed with TypeScript
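The automatic-fallback behavior can be pictured with a minimal standalone sketch. The `TryProvider` shape below is hypothetical, not the package's actual API:

```typescript
// Each provider is modeled as a function that either answers or throws.
type TryProvider = (prompt: string) => Promise<string>;

// Try providers in order; return the first successful response.
async function chatWithFallback(providers: TryProvider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const tryChat of providers) {
    try {
      return await tryChat(prompt);
    } catch (err) {
      lastError = err; // remember the failure, move on to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```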
## Installation

```bash
npm install @ai-orchestration/core
```

### Module System Compatibility

This package supports both ESM (ECMAScript Modules) and CommonJS, so you can use it in any Node.js project.

ESM projects (recommended):

```js
import { createOrchestrator } from '@ai-orchestration/core';
```

CommonJS projects:

```js
const { createOrchestrator } = require('@ai-orchestration/core');
```

The package automatically exports the correct format based on your project's module system.
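Dual ESM/CommonJS publishing is usually driven by a conditional `exports` map in package.json. An illustrative shape (not necessarily this package's exact file or file names) looks like:

```json
{
  "name": "@ai-orchestration/core",
  "type": "module",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "require": "./dist/index.cjs"
    }
  }
}
```

Node resolves the `import` entry for ESM consumers and the `require` entry for CommonJS consumers.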
## Quick Start

### Basic Usage

```ts
import { createOrchestrator } from '@ai-orchestration/core';

// API keys should come from environment variables configured in YOUR project,
// e.g. `export GROQ_API_KEY="your-key"` or dotenv in your project.
const orchestrator = createOrchestrator({
  providers: [
    {
      id: 'groq-1',
      type: 'groq',
      apiKey: process.env.GROQ_API_KEY!, // Configure this variable in your project
      model: 'llama-3.3-70b-versatile',
    },
    {
      id: 'openrouter-1',
      type: 'openrouter',
      apiKey: process.env.OPENROUTER_API_KEY!,
      model: 'openai/gpt-3.5-turbo',
    },
  ],
  strategy: {
    type: 'round-robin',
  },
});

// Simple chat
const response = await orchestrator.chat([
  { role: 'user', content: 'Hello, world!' },
]);
console.log(response.content);

// Streaming chat
const stream = await orchestrator.chatStream([
  { role: 'user', content: 'Tell me a story' },
]);
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(value.content);
}
```

### Programmatic Usage
```ts
import {
  Orchestrator,
  RoundRobinStrategy,
  GroqProvider,
  OpenRouterProvider,
} from '@ai-orchestration/core';

// Create a strategy
const strategy = new RoundRobinStrategy();

// Create the orchestrator
const orchestrator = new Orchestrator(strategy);

// Register providers
// API keys should come from environment variables configured in YOUR project
orchestrator.registerProvider(
  new GroqProvider({
    id: 'groq-1',
    apiKey: process.env.GROQ_API_KEY!,
  })
);
orchestrator.registerProvider(
  new OpenRouterProvider({
    id: 'openrouter-1',
    apiKey: process.env.OPENROUTER_API_KEY!,
  })
);

// Use it
const response = await orchestrator.chat([
  { role: 'user', content: 'Hello!' },
]);
```

## Selection Strategies
### Round-Robin

Cycles through providers in order:

```ts
{
  strategy: {
    type: 'round-robin',
  },
}
```

### Priority
Selects providers by priority (lower number = higher priority):

```ts
{
  strategy: {
    type: 'priority',
    priorities: {
      'groq-1': 1,
      'openrouter-1': 2,
      'gemini-1': 3,
    },
  },
}
```

### Fallback
Tries providers in order until one succeeds:

```ts
{
  strategy: {
    type: 'fallback',
    order: ['groq-1', 'openrouter-1', 'gemini-1'],
  },
}
```

### Weighted
Distributes selection across providers according to their weights (useful for load balancing):

```ts
{
  strategy: {
    type: 'weighted',
    weights: {
      'groq-1': 0.7,
      'openrouter-1': 0.3,
    },
  },
}
```

### Weighted Cost-Aware
Factors cost per token into the weighting:

```ts
{
  strategy: {
    type: 'weighted',
    costAware: true,
    weights: {
      'groq-1': 1.0,
      'openrouter-1': 1.0,
    },
  },
}
```

### Health-Aware
Selects based on health metrics (latency, success rate):

```ts
{
  strategy: {
    type: 'health-aware',
    preferLowLatency: true,
    minHealthScore: 0.5,
  },
}
```

## Supported Providers
### Groq

```ts
{
  id: 'groq-1',
  type: 'groq',
  apiKey: 'your-api-key',
  model: 'llama-3.3-70b-versatile', // optional, default
  baseURL: 'https://api.groq.com/openai/v1', // optional
}
```

### OpenRouter
```ts
{
  id: 'openrouter-1',
  type: 'openrouter',
  apiKey: 'your-api-key',
  model: 'openai/gpt-3.5-turbo', // optional
  baseURL: 'https://openrouter.ai/api/v1', // optional
}
```

### Google Gemini
```ts
{
  id: 'gemini-1',
  type: 'gemini',
  apiKey: 'your-api-key',
  model: 'gemini-pro', // optional
  baseURL: 'https://generativelanguage.googleapis.com/v1beta', // optional
}
```

### Cerebras
Cerebras Inference API (OpenAI-compatible). Documentation: inference-docs.cerebras.ai

```ts
{
  id: 'cerebras-1',
  type: 'cerebras',
  apiKey: 'your-api-key', // Get one at: https://inference-docs.cerebras.ai
  model: 'llama-3.3-70b', // optional, default
  baseURL: 'https://api.cerebras.ai/v1', // optional
}
```

Note: The Cerebras API requires a User-Agent header to avoid CloudFront blocking. The package sets this header automatically.
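If you ever call the endpoint directly (outside this package), the header has to be set explicitly. A hedged sketch of building such a request, assuming the standard OpenAI-compatible `/chat/completions` path and an arbitrary example User-Agent value:

```typescript
// Illustrative request builder for a direct Cerebras call.
// The package normally handles this for you; 'my-app/1.0' is a placeholder.
function buildCerebrasRequest(apiKey: string, model: string, content: string) {
  return {
    url: 'https://api.cerebras.ai/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
        'User-Agent': 'my-app/1.0', // required to avoid CloudFront blocking
      },
      body: JSON.stringify({ model, messages: [{ role: 'user', content }] }),
    },
  };
}
```

The returned object can be passed to `fetch(req.url, req.init)`.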
### Local Models

For local models that expose an OpenAI-compatible API:

```ts
{
  id: 'local-1',
  type: 'local',
  baseURL: 'http://localhost:8000',
  model: 'local-model', // optional
  apiKey: 'optional-key', // optional
}
```

## Advanced Configuration
### Retry and Timeout Configuration

```ts
const orchestrator = createOrchestrator({
  providers: [...],
  strategy: {...},
  maxRetries: 3, // Maximum retry attempts (default: number of providers)
  requestTimeout: 30000, // Global timeout in milliseconds (default: 30000)
  retryDelay: 'exponential', // or a number in milliseconds (default: 1000)
});
```

### Circuit Breaker
Automatically disables a provider after consecutive failures:

```ts
const orchestrator = createOrchestrator({
  providers: [...],
  strategy: {...},
  circuitBreaker: {
    enabled: true,
    failureThreshold: 5, // Open the circuit after 5 consecutive failures
    resetTimeout: 60000, // Try the provider again after 60 seconds
  },
});
```

### Health Checks
Enhanced health check configuration:

```ts
const orchestrator = createOrchestrator({
  providers: [...],
  strategy: {...},
  healthCheck: {
    enabled: true,
    interval: 60000, // Check every 60 seconds
    timeout: 5000, // Health check timeout (default: 5000ms)
    maxConsecutiveFailures: 3, // Mark unhealthy after 3 failures (default: 3)
    latencyThreshold: 10000, // Max latency in ms (default: 10000ms)
  },
  // Legacy format still supported:
  // enableHealthChecks: true,
  // healthCheckInterval: 60000,
});
```

Or check health manually:

```ts
const health = await provider.checkHealth();
console.log(health.healthy, health.latency);
```

## Chat Options
```ts
const response = await orchestrator.chat(messages, {
  temperature: 0.7,
  maxTokens: 1000,
  topP: 0.9,
  topK: 40,
  stopSequences: ['\n\n'],
  responseLanguage: 'es', // Force response in Spanish
  frequencyPenalty: 0.5, // Reduce repetition
  presencePenalty: 0.3, // Encourage new topics
  seed: 42, // For reproducible outputs
  timeout: 30000, // Request timeout in milliseconds
  user: 'user-123', // User identifier for tracking
});
```

### Available Chat Options

- `temperature`: Controls randomness (0.0 to 2.0)
- `maxTokens`: Maximum tokens in the response
- `topP`: Nucleus sampling threshold
- `topK`: Top-K sampling
- `stopSequences`: Stop generation on these sequences
- `responseLanguage`: Force the response language (see below)
- `frequencyPenalty`: Penalize frequent tokens (-2.0 to 2.0)
- `presencePenalty`: Penalize tokens already present (-2.0 to 2.0)
- `seed`: Seed for reproducible outputs
- `timeout`: Request timeout in milliseconds (overrides the global timeout)
- `user`: User identifier for tracking/rate limiting
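Most of these options correspond directly to the snake_case parameters of OpenAI-compatible APIs. A hypothetical mapping helper (not the package's actual code) makes the correspondence concrete:

```typescript
// Subset of the chat options above, for illustration.
interface ChatOptions {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stopSequences?: string[];
  frequencyPenalty?: number;
  presencePenalty?: number;
  seed?: number;
  user?: string;
}

// Map camelCase framework options to snake_case request parameters.
function toOpenAIParams(options: ChatOptions): Record<string, unknown> {
  return {
    temperature: options.temperature,
    max_tokens: options.maxTokens,
    top_p: options.topP,
    stop: options.stopSequences,
    frequency_penalty: options.frequencyPenalty,
    presence_penalty: options.presencePenalty,
    seed: options.seed,
    user: options.user,
  };
}
```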
### Forcing Response Language

You can force the AI to respond in a specific language using the `responseLanguage` option:

```ts
// Using ISO 639-1 language codes
const response = await orchestrator.chat(messages, {
  responseLanguage: 'es', // Spanish
  // or 'en', 'fr', 'de', 'it', 'pt', 'ja', 'zh', 'ru', etc.
});

// Using full language names
const response2 = await orchestrator.chat(messages, {
  responseLanguage: 'spanish', // Also works
  // or 'english', 'french', 'german', 'italian', etc.
});
```

How it works: When `responseLanguage` is specified, the framework automatically prepends a system message instructing the model to respond in that language. If you already have a system message, the language instruction is prepended to it.

Supported languages: Spanish, English, French, German, Italian, Portuguese, Japanese, Chinese, Russian, Korean, Arabic, Hindi, Dutch, Polish, Swedish, Turkish (and more via ISO 639-1 codes).
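The prepending behavior described above can be sketched with a standalone helper. This is an illustration of the mechanism, not the package's internal implementation, and the instruction wording is an assumption:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Prepend a language instruction: merge into an existing system message
// if there is one, otherwise add a new system message up front.
function applyResponseLanguage(messages: ChatMessage[], language: string): ChatMessage[] {
  const instruction = `Respond only in ${language}.`;
  const [first, ...rest] = messages;
  if (first && first.role === 'system') {
    return [{ role: 'system', content: `${instruction} ${first.content}` }, ...rest];
  }
  return [{ role: 'system', content: instruction }, ...messages];
}
```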
## Metrics and Analytics

Track provider usage, costs, and strategy effectiveness:

```ts
const orchestrator = createOrchestrator({
  providers: [...],
  strategy: {...},
  enableMetrics: true, // Enabled by default
  onMetricsEvent: (event) => {
    // Optional: real-time event tracking
    console.log('Event:', event.type, event.providerId);
  },
});

// Make some requests...

// Get overall metrics
const metrics = orchestrator.getMetrics().getOrchestratorMetrics();
console.log('Total Requests:', metrics.totalRequests);
console.log('Total Cost:', metrics.totalCost);
console.log('Error Rate:', metrics.errorRate);

// Get provider-specific metrics
const providerMetrics = orchestrator.getMetrics().getProviderMetrics('groq-1');
console.log('Provider Requests:', providerMetrics?.totalRequests);
console.log('Provider Cost:', providerMetrics?.totalCost);
console.log('Success Rate:', providerMetrics?.successfulRequests / providerMetrics?.totalRequests);

// Get strategy metrics
const strategyMetrics = orchestrator.getMetrics().getStrategyMetrics();
console.log('Selections by Provider:', strategyMetrics.selectionsByProvider);
console.log('Average Selection Time:', strategyMetrics.averageSelectionTime);
```

### Available Metrics
- Provider Metrics: Requests, success/failure rates, latency, token usage, costs
- Strategy Metrics: Selection counts, distribution, selection time
- Overall Metrics: Total requests, costs, error rates, requests per minute
- Request History: Detailed history with filtering options
See examples/metrics.ts for a complete example.
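To make the aggregate figures concrete, here is a small standalone sketch of how totals and error rate can be derived from a request history. The `RequestRecord` shape is hypothetical, not the package's internal type:

```typescript
// Hypothetical per-request record, for illustration only.
interface RequestRecord {
  providerId: string;
  success: boolean;
  cost: number;
  latencyMs: number;
}

// Derive aggregate metrics from the history.
function summarize(history: RequestRecord[]) {
  const total = history.length;
  const failures = history.filter((r) => !r.success).length;
  return {
    totalRequests: total,
    totalCost: history.reduce((sum, r) => sum + r.cost, 0),
    errorRate: total === 0 ? 0 : failures / total, // guard against division by zero
  };
}
```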
## Extensibility

### Adding a New Provider

```ts
import { BaseProvider } from '@ai-orchestration/core';
import type {
  ChatMessage,
  ChatOptions,
  ChatResponse,
  ChatChunk,
  ProviderHealth,
  ProviderMetadata,
} from '@ai-orchestration/core';

export class CustomProvider extends BaseProvider {
  readonly id: string;
  readonly metadata: ProviderMetadata;

  constructor(config: CustomConfig) {
    super();
    this.id = config.id;
    this.metadata = {
      id: this.id,
      name: 'Custom Provider',
    };
  }

  async checkHealth(): Promise<ProviderHealth> {
    // Implement the health check
  }

  async chat(messages: ChatMessage[], options?: ChatOptions): Promise<ChatResponse> {
    // Implement chat
  }

  async chatStream(messages: ChatMessage[], options?: ChatOptions): Promise<ReadableStream<ChatChunk>> {
    // Implement streaming
  }

  protected formatMessages(messages: ChatMessage[]): unknown {
    // Convert the standard format to the provider format
  }

  protected parseResponse(response: unknown): ChatResponse {
    // Convert the provider response to the standard format
  }

  protected parseStream(stream: ReadableStream<unknown>): ReadableStream<ChatChunk> {
    // Convert the provider stream to the standard format
  }
}
```

### Adding a New Strategy
```ts
import { BaseStrategy } from '@ai-orchestration/core';
import type { AIService, SelectionContext } from '@ai-orchestration/core';

export class CustomStrategy extends BaseStrategy {
  async select(
    providers: AIService[],
    context?: SelectionContext
  ): Promise<AIService | null> {
    // Implement the selection logic
    return providers[0];
  }

  update?(provider: AIService, success: boolean, metadata?: unknown): void {
    // Optional: update internal state
  }
}
```

## Architecture
```
src/
├── core/
│   ├── interfaces.ts      # Main interfaces
│   ├── types.ts           # Shared types
│   ├── orchestrator.ts    # Orchestrator core
│   └── errors.ts          # Custom error classes
├── providers/
│   ├── base.ts            # Base class for providers
│   ├── groq.ts
│   ├── openrouter.ts
│   ├── gemini.ts
│   ├── cerebras.ts
│   └── local.ts
├── strategies/
│   ├── base.ts            # Base class for strategies
│   ├── round-robin.ts
│   ├── priority.ts
│   ├── fallback.ts
│   ├── weighted.ts
│   └── health-aware.ts
├── factory/
│   └── index.ts           # Factory for declarative creation
└── index.ts               # Main entry point
```

### Design Principles
- Single Responsibility: Each class has a single responsibility
- Open/Closed Principle: Extensible without modifying the core
- Plugin-based Architecture: Providers and strategies are plugins
- Composition over Inheritance: Preference for composition
- Configuration over Hard-coding: Declarative configuration
- Declarative APIs: Simple and expressive APIs
## Development

### Setup

```bash
# Install dependencies
npm install

# Build
npm run build

# Development with watch
npm run dev

# Type checking
npm run typecheck

# Tests
npm test
```

### Testing
#### Quick Test (No API Keys Required)

Test the framework with mock providers, no API keys needed:

```bash
npm run test:mock
```

#### Test with Real Providers

Note: The @ai-orchestration/core package does not include .env files. Environment variables must be configured in your project or in the examples.

1. Set environment variables:

```bash
export GROQ_API_KEY="your-key"
export OPENROUTER_API_KEY="your-key"
export GEMINI_API_KEY="your-key"
export CEREBRAS_API_KEY="your-key"
```

2. Run the tests:

```bash
npm run test:local
```

## Local Development in Other Projects
### Method 1: npm link (Recommended)

```bash
# In this directory (ai-orchestration)
npm run link

# In your other project
npm link @ai-orchestration/core
```

Now you can import normally:

```ts
import { createOrchestrator } from '@ai-orchestration/core';
```

### Method 2: npm pack

```bash
# In this directory
npm run pack:local

# In your other project
npm install ./@ai-orchestration-core-0.1.0.tgz
```

## Requirements
- Node.js: >= 18.0.0 (for native ReadableStream and test runner)
- TypeScript: 5.3+ (already included in devDependencies)
## Examples

See the examples/ directory for more code examples:

- `basic.ts`: Basic usage example
- `strategies.ts`: Strategy examples
- `test-local.ts`: Testing with real providers
- `test-mock.ts`: Testing with mock providers
- `chat-app/`: Full chat application example
## License

MIT

## Contributing

See CONTRIBUTING.md for guidelines on contributing to this project.

## Related Documentation

- ARCHITECTURE.md: Detailed architecture documentation
- CHANGELOG.md: Version history and changes
