🎯 model-orch-sdk
Enterprise-Grade Multi-Model Orchestration - Route requests intelligently across 19+ LLM providers with sophisticated strategies, ensemble logic, automatic failover, cost optimization, and production-ready observability.
🔍 Overview
model-orch-sdk is a comprehensive TypeScript SDK for orchestrating multiple LLM and AI model endpoints. It enables teams to register many model providers (OpenAI, Anthropic, Google, Azure, custom endpoints), configure sophisticated routing and ensemble strategies, manage credentials securely, and operate models reliably with monitoring, cost controls, fallback policies, and A/B/canary experiments.
Key Capabilities
- 🔀 Multi-Model Support: Register any LLM API (OpenAI, Anthropic, Google, Azure, Cohere, HuggingFace, Replicate, custom HTTP/gRPC, local models)
- 🎯 Intelligent Routing: Percent splits, conditional routing, confidence cascading, cost/latency optimization
- 🤝 Ensemble Strategies: Voting, confidence-weighted aggregation, rankers, synthesizers
- 🔄 Reliability: Circuit breakers, automatic fallbacks, retry with exponential backoff
- 💰 Cost Control: Per-request and per-project budgets, cost estimation, quota management
- 📊 Observability: Distributed tracing, metrics, telemetry, audit logs
- 🔐 Security: Encrypted credential vault, RBAC, data residency rules, rotation policies
- 🧪 Experimentation: A/B tests, canary deployments, traffic splits with sticky sessions
- ⚡ Performance: Rate limiting, token buckets, caching, concurrent request management
📋 Prerequisites
- Node.js 18 or higher
- TypeScript 5.0+ (for development)
🚀 Installation
```bash
npm install model-orch-sdk
```
Or with yarn:
```bash
yarn add model-orch-sdk
```
🎮 Quick Start
Basic Usage
```typescript
import {
  CredentialVault,
  TokenBucketRateLimiter,
  CircuitBreaker,
} from 'model-orch-sdk';

// Initialize credential vault
const vault = new CredentialVault({
  encryptionKey: process.env.ENCRYPTION_KEY!,
  enableAuditLog: true,
  maxAccessLogSize: 1000,
});

// Store API credentials
await vault.storeCredential({
  projectId: 'proj_123',
  name: 'OpenAI Key',
  type: 'api_key',
  value: process.env.OPENAI_API_KEY!,
  createdBy: 'admin',
});

// Set up rate limiting
const rateLimiter = new TokenBucketRateLimiter(
  100, // capacity
  10   // refill rate per second
);

// Create circuit breaker
const breaker = new CircuitBreaker(
  5,     // failure threshold
  2,     // success threshold
  30000  // reset timeout (30s)
);
```
Routing Configuration
```typescript
// RoutingPolicyType is used as a value below, so it needs a regular import
import { RoutingPolicyType } from 'model-orch-sdk';
import type { PercentSplitConfig } from 'model-orch-sdk';

// Define a percent split routing policy
const routingPolicy: PercentSplitConfig = {
  type: RoutingPolicyType.PERCENT_SPLIT,
  splits: [
    { connectorId: 'gpt-4', percentage: 70, weight: 0.7 },
    { connectorId: 'claude-3', percentage: 30, weight: 0.3 },
  ],
  stickySession: true,
  sessionKey: 'userId',
};
```
Ensemble Strategy
```typescript
// Both enums are used as values below, so they need regular imports
import { RoutingPolicyType, EnsembleStrategyType } from 'model-orch-sdk';
import type { EnsembleRoutingConfig, VotingConfig } from 'model-orch-sdk';

// Configure ensemble with voting
const ensembleConfig: EnsembleRoutingConfig = {
  type: RoutingPolicyType.ENSEMBLE,
  connectorIds: ['gpt-4', 'claude-3', 'gemini-pro'],
  executionMode: 'parallel',
  aggregationStrategy: EnsembleStrategyType.VOTING,
  aggregationConfig: {
    type: EnsembleStrategyType.VOTING,
    votingMethod: 'majority',
    normalization: 'semantic',
    tieBreaker: 'highest_confidence',
  } as VotingConfig,
  timeout: 30000,
  minSuccessful: 2,
};
```
📚 Core Concepts
1. Project
Top-level workspace grouping models, policies, quotas, and telemetry. Each organization can have multiple projects with isolated configurations.
2. Model Connector
Registered endpoint with metadata (type, endpoint URL, credentials, rate limits, cost info). Supports all major LLM providers and custom endpoints.
3. Model Pool
Logical group of connectors serving a single role (e.g., high_quality_llms, cheap_fallback, embedding_engines).
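The `ModelPool` type is exported from the package; as a sketch of what a pool definition might look like (the field names beyond the type name are illustrative assumptions, not the confirmed schema):
```typescript
import type { ModelPool } from 'model-orch-sdk';

// Hypothetical pool grouping two high-quality chat connectors. The field
// names below are assumptions for illustration, not the confirmed schema.
const highQualityPool = {
  id: 'pool_high_quality',
  name: 'high_quality_llms',
  connectorIds: ['gpt-4', 'claude-3'],
} as ModelPool;
```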
4. Routing Policy
Declarative rule set determining which model(s) to call for incoming requests:
- Percent Split: Distribute traffic by percentages (70/30, A/B tests)
- Conditional: Route based on input attributes (language, length, topic)
- Confidence Cascade: Try cheaper models first, escalate if confidence is low (see the sketch after this list)
- Cost/Latency Based: Optimize for budget or speed
- Ensemble: Call multiple models and aggregate responses
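For illustration, here is what a confidence cascade might look like. `ConfidenceCascadeConfig` and `RoutingPolicyType` are exported by the SDK, but the enum member and stage fields below are assumptions:
```typescript
import { RoutingPolicyType } from 'model-orch-sdk';
import type { ConfidenceCascadeConfig } from 'model-orch-sdk';

// Hypothetical cascade: try the cheap model first, escalate when the
// reported confidence falls below the stage threshold. The enum member
// and the stage field names are illustrative assumptions.
const cascadePolicy = {
  type: RoutingPolicyType.CONFIDENCE_CASCADE,
  stages: [
    { connectorId: 'gpt-3.5-turbo', minConfidence: 0.8 },
    { connectorId: 'gpt-4', minConfidence: 0 }, // final stage always accepts
  ],
} as ConfidenceCascadeConfig;
```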
5. Ensemble Strategy
How to combine responses when multiple models are invoked:
- Voting: Majority, plurality, or unanimous voting
- Confidence-Weighted: Weight outputs by model confidence scores
- Ranker: Use learned model to score and rank outputs
- Synthesizer: Call a high-quality model to merge outputs
- Diversity: Select diverse candidates for creative tasks
- Earliest: Return first acceptable response
6. Fallback Policy
Automatic failover when primary model fails, times out, or exceeds budget. Define fallback chains with multiple backup options.
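The README does not document a concrete fallback type, so the following is only a shape sketch; every field name is an illustrative assumption:
```typescript
// Purely illustrative fallback chain; the SDK's real fallback policy
// type is not shown in the exported types, so all field names here
// are assumptions.
const fallbackPolicy = {
  primaryConnectorId: 'gpt-4',
  fallbackConnectorIds: ['claude-3', 'gpt-3.5-turbo'], // tried in order
  triggers: ['error', 'timeout', 'budget_exceeded'],
  maxAttempts: 3,
};
```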
7. Circuit Breaker
Automatically disable failing connectors to prevent cascading failures. Auto-heal when service recovers.
8. Credential Vault
Secure, encrypted storage for API keys with audit logs, rotation policies, and RBAC.
🏗️ Architecture
High-Level Architecture
The SDK follows a layered architecture:
- API Layer: REST, GraphQL, and WebSocket interfaces for client applications
- Core Orchestrator: Central coordination engine for request routing and execution
- Middleware Layer: Caching, rate limiting, cost tracking, load balancing, circuit breakers, and retry logic
- Policy Engine: Intelligent routing, security, and compliance policies
- Provider Layer: Integration with multiple LLM providers (OpenAI, Anthropic, Cohere, Google, Groq, Together AI, etc.)
- Storage Layer: Redis cache, PostgreSQL database, and MongoDB for persistent storage
- Monitoring: Comprehensive logging, metrics, and observability
Request Processing Flow
Every request goes through a comprehensive processing pipeline (a composition sketch follows the list):
- Authentication: Verify API keys and user credentials
- Rate Limiting: Check and enforce rate limits
- Cache Check: Look for cached responses (3-tier cache)
- Policy Evaluation: Apply routing and security policies
- Provider Selection: Choose optimal model based on policies
- Load Balancing: Distribute load across available providers
- Circuit Breaker: Prevent requests to failing services
- Provider Request: Execute request with retry logic
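Several of these stages can be composed by hand from the classes documented in this README (`TokenBucketRateLimiter`, `CircuitBreaker`, `SimpleCache`; see the API section below). A minimal sketch, where `callProvider` is a hypothetical stand-in for the provider request step:
```typescript
import {
  TokenBucketRateLimiter,
  CircuitBreaker,
  SimpleCache,
} from 'model-orch-sdk';

const limiter = new TokenBucketRateLimiter(100, 10);
const breaker = new CircuitBreaker(5, 2, 30000);
const cache = new SimpleCache<string>(1000, 300);

// `callProvider` is a hypothetical stand-in for the provider request step.
async function handleRequest(
  cacheKey: string,
  callProvider: () => Promise<string>,
): Promise<string> {
  // Rate limiting: check and enforce rate limits
  if (!limiter.tryConsume(1)) throw new Error('rate limit exceeded');

  // Cache check: look for a cached response
  if (cache.has(cacheKey)) return cache.get(cacheKey)!;

  // Circuit breaker: prevent requests to failing services
  if (!breaker.allowRequest()) throw new Error('circuit open');

  // Provider request: execute and record the outcome
  try {
    const result = await callProvider();
    breaker.recordSuccess();
    cache.set(cacheKey, result);
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```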
Policy Engine
The Policy Engine supports multiple policy types:
- Routing Policies: Cost-based, latency-based, quality-based routing (see the conditional-routing sketch after this list)
- Security Policies: Authentication, authorization, data filtering
- Cost Policies: Budget limits, cost optimization, quota management
- Rate Limit Policies: Per-user, per-model, per-tenant limits
- Custom Policies: Business rules, compliance, user-defined logic
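As one example, `ConditionalRoutingConfig` is among the exported types; the condition fields in this sketch are illustrative assumptions about its shape:
```typescript
import { RoutingPolicyType } from 'model-orch-sdk';
import type { ConditionalRoutingConfig } from 'model-orch-sdk';

// Hypothetical conditional routing policy: short prompts go to a cheaper
// model, everything else to the default. The enum member and condition
// field names are illustrative assumptions.
const conditionalPolicy = {
  type: RoutingPolicyType.CONDITIONAL,
  conditions: [
    { attribute: 'inputLength', operator: 'lt', value: 500, connectorId: 'gpt-3.5-turbo' },
  ],
  defaultConnectorId: 'gpt-4',
} as ConditionalRoutingConfig;
```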
Caching Strategy
Multi-layer caching for optimal performance (a layered-lookup sketch follows the list):
- L1 (Memory Cache): In-process LRU cache (~10ms latency)
- L2 (Redis Cache): Distributed cache (~50ms latency)
- L3 (Semantic Cache): Similar query matching (~100ms latency)
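A layered read path can be sketched with the exported `SimpleCache` as the L1 tier; `redisGet` is a hypothetical stand-in for an L2 client, and the semantic L3 tier is omitted here:
```typescript
import { SimpleCache } from 'model-orch-sdk';

// L1 in-process cache from the SDK; `redisGet` is a hypothetical stand-in
// for an L2 distributed cache client. The semantic L3 tier is omitted.
const l1 = new SimpleCache<string>(10000, 60);

async function layeredLookup(
  key: string,
  redisGet: (k: string) => Promise<string | null>,
  compute: () => Promise<string>,
): Promise<string> {
  if (l1.has(key)) return l1.get(key)!; // L1 hit: in-process
  const fromL2 = await redisGet(key);   // L2: distributed cache
  if (fromL2 !== null) {
    l1.set(key, fromL2);                // promote to L1
    return fromL2;
  }
  const value = await compute();        // full miss: compute and backfill
  l1.set(key, value);
  return value;
}
```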
Cloud Deployment
Production-ready deployment on AWS with Kubernetes:
- Load Balancing: Application Load Balancer with health checks
- Kubernetes: Multi-replica services for high availability
- Storage: Redis cluster, PostgreSQL RDS, S3, ElastiCache
- Monitoring: Prometheus, Grafana, Jaeger, ELK Stack, AlertManager
- CI/CD: GitHub Actions with automated testing and deployment
- Infrastructure: Terraform for infrastructure as code
🔧 Configuration
Environment Variables
Create a .env file (see .env.example):
```bash
# Model API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...
AZURE_OPENAI_API_KEY=...
COHERE_API_KEY=...

# Security
CREDENTIAL_ENCRYPTION_KEY=<64-char-hex-string>

# Observability
ENABLE_TELEMETRY=true
METRICS_EXPORT_INTERVAL=60000
```
Generate encryption key:
```typescript
import { CredentialVault } from 'model-orch-sdk';

const key = CredentialVault.generateKey();
console.log('CREDENTIAL_ENCRYPTION_KEY=' + key);
```
📖 API Documentation
Type Definitions
All types are exported from the main package:
```typescript
import type {
  // Core
  Project,
  ModelConnector,
  ModelPool,
  RoutingPolicy,
  // Requests
  OrchestrationRequest,
  RequestPayload,
  RequestContext,
  // Responses
  OrchestrationResponse,
  ModelResponse,
  ExecutionTrace,
  // Configs
  PercentSplitConfig,
  ConditionalRoutingConfig,
  ConfidenceCascadeConfig,
  EnsembleRoutingConfig,
  // Enums
  ModelEndpointType,
  RoutingPolicyType,
  EnsembleStrategyType,
  ConnectorStatus,
  CircuitBreakerState,
} from 'model-orch-sdk';
```
Core Classes
CredentialVault
```typescript
const vault = new CredentialVault({
  encryptionKey: string,
  enableAuditLog: boolean,
  maxAccessLogSize: number,
});

// Store credential
await vault.storeCredential({
  projectId: string,
  name: string,
  type: 'api_key' | 'oauth' | 'basic_auth' | 'custom',
  value: string,
  createdBy: string,
});

// Get credential
const apiKey = await vault.getCredential(id, userId, ipAddress);

// Rotate credential
await vault.rotateCredential(id, newValue, userId, ipAddress);

// Check whether rotation is needed
const needsRotation = vault.needsRotation(id);
```
TokenBucketRateLimiter
```typescript
const limiter = new TokenBucketRateLimiter(capacity, refillRate);

// Try to consume tokens
if (limiter.tryConsume(1)) {
  // Request allowed
} else {
  // Rate limit exceeded
}

// Get available tokens
const available = limiter.getAvailableTokens();
```
CircuitBreaker
```typescript
const breaker = new CircuitBreaker(failureThreshold, successThreshold, resetTimeoutMs);

// Check whether requests are allowed
if (breaker.allowRequest()) {
  try {
    // Make request
    breaker.recordSuccess();
  } catch (error) {
    breaker.recordFailure();
  }
}

// Get state
const { state, failures, successes } = breaker.getState();
```
SimpleCache
```typescript
const cache = new SimpleCache<Response>(maxSize, ttlSeconds);

// Set value
cache.set(key, value);

// Get value
const cached = cache.get(key);

// Check existence
if (cache.has(key)) {
  // Use cached value
}
```
🧪 Testing
The SDK ships with a comprehensive test suite:
```bash
# Run all tests
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage
```
🛡️ Security
- Encryption: All credentials encrypted at rest using AES-256-GCM
- Audit Logs: Complete trail of credential access and modifications
- RBAC: Role-based access control for projects and resources
- Rotation: Automatic credential rotation with configurable policies
- Data Residency: Control which vendors/regions can process data
- No Plaintext: Credentials never logged or stored in plaintext
📊 Observability
The SDK provides comprehensive telemetry:
- Request Tracing: Distributed traces for every orchestration request
- Metrics: Latency (p50/p95/p99), error rates, token usage, costs
- Audit Logs: Complete history of policy changes, credential access
- Health Checks: Automatic monitoring of connector health
- Circuit Breaker Events: Track when connectors open/close
- Experiment Metrics: A/B test results and statistical significance
🤝 Contributing
Contributions are welcome! Please see our contribution guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup
```bash
git clone https://github.com/gyash1512/model-orch-sdk.git
cd model-orch-sdk
npm install
npm run build
npm test
```
Code Style
- TypeScript: Strict mode enabled, no `any` types allowed
- Linting: ESLint with TypeScript rules
- Formatting: Prettier for consistent code style
- Testing: Jest for unit and integration tests
📄 License
MIT License - See LICENSE file for details.
🙏 Acknowledgments
- OpenAI, Anthropic, Google, and other LLM providers for their APIs
- The TypeScript community for excellent tooling
- Contributors and users of this SDK
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Support
Built with ❤️ using TypeScript
For detailed API documentation, examples, and guides, visit the documentation.
