🎯 model-orch-sdk
Enterprise-Grade Multi-Model Orchestration - Route requests intelligently across 19+ LLM providers with sophisticated strategies, ensemble logic, automatic failover, cost optimization, and production-ready observability.
🔍 Overview
model-orch-sdk is a comprehensive TypeScript SDK for orchestrating multiple LLM and AI model endpoints. It enables teams to register many model providers (OpenAI, Anthropic, Google, Azure, custom endpoints), configure sophisticated routing and ensemble strategies, manage credentials securely, and operate models reliably with monitoring, cost controls, fallback policies, and A/B/canary experiments.
Key Capabilities
- 🔀 Multi-Model Support: Register any LLM API (OpenAI, Anthropic, Google, Azure, Cohere, HuggingFace, Replicate, custom HTTP/gRPC, local models)
- 🎯 Intelligent Routing: Percent splits, conditional routing, confidence cascading, cost/latency optimization
- 🤝 Ensemble Strategies: Voting, confidence-weighted aggregation, rankers, synthesizers
- 🔄 Reliability: Circuit breakers, automatic fallbacks, retry with exponential backoff
- 💰 Cost Control: Per-request and per-project budgets, cost estimation, quota management
- 📊 Observability: Distributed tracing, metrics, telemetry, audit logs
- 🔐 Security: Encrypted credential vault, RBAC, data residency rules, rotation policies
- 🧪 Experimentation: A/B tests, canary deployments, traffic splits with sticky sessions
- ⚡ Performance: Rate limiting, token buckets, caching, concurrent request management
📋 Prerequisites
- Node.js 18 or higher
- TypeScript 5.0+ (for development)
🚀 Installation
```bash
npm install model-orch-sdk
```
Or with yarn:
```bash
yarn add model-orch-sdk
```
🎮 Quick Start
Basic Usage
```typescript
import {
  CredentialVault,
  TokenBucketRateLimiter,
  CircuitBreaker,
} from 'model-orch-sdk';

// Initialize credential vault
const vault = new CredentialVault({
  encryptionKey: process.env.ENCRYPTION_KEY!,
  enableAuditLog: true,
  maxAccessLogSize: 1000,
});

// Store API credentials
await vault.storeCredential({
  projectId: 'proj_123',
  name: 'OpenAI Key',
  type: 'api_key',
  value: process.env.OPENAI_API_KEY!,
  createdBy: 'admin',
});

// Set up rate limiting
const rateLimiter = new TokenBucketRateLimiter(
  100, // capacity
  10   // refill rate per second
);

// Create circuit breaker
const breaker = new CircuitBreaker(
  5,     // failure threshold
  2,     // success threshold
  30000  // reset timeout (30s)
);
```
Routing Configuration
```typescript
// RoutingPolicyType is used as a value below, so it needs a regular import
import { RoutingPolicyType } from 'model-orch-sdk';
import type { PercentSplitConfig } from 'model-orch-sdk';

// Define a percent split routing policy
const routingPolicy: PercentSplitConfig = {
  type: RoutingPolicyType.PERCENT_SPLIT,
  splits: [
    { connectorId: 'gpt-4', percentage: 70, weight: 0.7 },
    { connectorId: 'claude-3', percentage: 30, weight: 0.3 },
  ],
  stickySession: true,
  sessionKey: 'userId',
};
```
Ensemble Strategy
```typescript
// Both enums are used as values below, so they need regular imports
import { RoutingPolicyType, EnsembleStrategyType } from 'model-orch-sdk';
import type { EnsembleRoutingConfig, VotingConfig } from 'model-orch-sdk';

// Configure ensemble with voting
const ensembleConfig: EnsembleRoutingConfig = {
  type: RoutingPolicyType.ENSEMBLE,
  connectorIds: ['gpt-4', 'claude-3', 'gemini-pro'],
  executionMode: 'parallel',
  aggregationStrategy: EnsembleStrategyType.VOTING,
  aggregationConfig: {
    type: EnsembleStrategyType.VOTING,
    votingMethod: 'majority',
    normalization: 'semantic',
    tieBreaker: 'highest_confidence',
  } as VotingConfig,
  timeout: 30000,
  minSuccessful: 2,
};
```
📚 Core Concepts
1. Project
Top-level workspace grouping models, policies, quotas, and telemetry. Each organization can have multiple projects with isolated configurations.
2. Model Connector
Registered endpoint with metadata (type, endpoint URL, credentials, rate limits, cost info). Supports all major LLM providers and custom endpoints.
3. Model Pool
Logical group of connectors serving a single role (e.g., high_quality_llms, cheap_fallback, embedding_engines).
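The `ModelPool` type is exported from the package; as a sketch of what a pool definition might look like (the field names beyond the type name are illustrative assumptions, not the confirmed schema):
```typescript
import type { ModelPool } from 'model-orch-sdk';

// Hypothetical pool grouping two high-quality chat connectors. The field
// names below are assumptions for illustration, not the confirmed schema.
const highQualityPool = {
  id: 'pool_high_quality',
  name: 'high_quality_llms',
  connectorIds: ['gpt-4', 'claude-3'],
} as ModelPool;
```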
4. Routing Policy
Declarative rule set determining which model(s) to call for incoming requests:
- Percent Split: Distribute traffic by percentages (70/30, A/B tests)
- Conditional: Route based on input attributes (language, length, topic)
- Confidence Cascade: Try cheaper models first, escalate if confidence is low (see the sketch after this list)
- Cost/Latency Based: Optimize for budget or speed
- Ensemble: Call multiple models and aggregate responses
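For illustration, here is what a confidence cascade might look like. `ConfidenceCascadeConfig` and `RoutingPolicyType` are exported by the SDK, but the enum member and stage fields below are assumptions:
```typescript
import { RoutingPolicyType } from 'model-orch-sdk';
import type { ConfidenceCascadeConfig } from 'model-orch-sdk';

// Hypothetical cascade: try the cheap model first, escalate when the
// reported confidence falls below the stage threshold. The enum member
// and the stage field names are illustrative assumptions.
const cascadePolicy = {
  type: RoutingPolicyType.CONFIDENCE_CASCADE,
  stages: [
    { connectorId: 'gpt-3.5-turbo', minConfidence: 0.8 },
    { connectorId: 'gpt-4', minConfidence: 0 }, // final stage always accepts
  ],
} as ConfidenceCascadeConfig;
```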
5. Ensemble Strategy
How to combine responses when multiple models are invoked:
- Voting: Majority, plurality, or unanimous voting
- Confidence-Weighted: Weight outputs by model confidence scores
- Ranker: Use learned model to score and rank outputs
- Synthesizer: Call a high-quality model to merge outputs
- Diversity: Select diverse candidates for creative tasks
- Earliest: Return first acceptable response
6. Fallback Policy
Automatic failover when primary model fails, times out, or exceeds budget. Define fallback chains with multiple backup options.
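The README does not document a concrete fallback type, so the following is only a shape sketch; every field name is an illustrative assumption:
```typescript
// Purely illustrative fallback chain; the SDK's real fallback policy
// type is not shown in the exported types, so all field names here
// are assumptions.
const fallbackPolicy = {
  primaryConnectorId: 'gpt-4',
  fallbackConnectorIds: ['claude-3', 'gpt-3.5-turbo'], // tried in order
  triggers: ['error', 'timeout', 'budget_exceeded'],
  maxAttempts: 3,
};
```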
7. Circuit Breaker
Automatically disable failing connectors to prevent cascading failures. Auto-heal when service recovers.
8. Credential Vault
Secure, encrypted storage for API keys with audit logs, rotation policies, and RBAC.
🏗️ Architecture
High-Level Architecture
The SDK follows a layered architecture:
- API Layer: REST, GraphQL, and WebSocket interfaces for client applications
- Core Orchestrator: Central coordination engine for request routing and execution
- Middleware Layer: Caching, rate limiting, cost tracking, load balancing, circuit breakers, and retry logic
- Policy Engine: Intelligent routing, security, and compliance policies
- Provider Layer: Integration with multiple LLM providers (OpenAI, Anthropic, Cohere, Google, Groq, Together AI, etc.)
- Storage Layer: Redis cache, PostgreSQL database, and MongoDB for persistent storage
- Monitoring: Comprehensive logging, metrics, and observability
Request Processing Flow
Every request goes through a comprehensive processing pipeline (a composition sketch follows the list):
- Authentication: Verify API keys and user credentials
- Rate Limiting: Check and enforce rate limits
- Cache Check: Look for cached responses (3-tier cache)
- Policy Evaluation: Apply routing and security policies
- Provider Selection: Choose optimal model based on policies
- Load Balancing: Distribute load across available providers
- Circuit Breaker: Prevent requests to failing services
- Provider Request: Execute request with retry logic
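Several of these stages can be composed by hand from the classes documented in this README (`TokenBucketRateLimiter`, `CircuitBreaker`, `SimpleCache`; see the API section below). A minimal sketch, where `callProvider` is a hypothetical stand-in for the provider request step:
```typescript
import {
  TokenBucketRateLimiter,
  CircuitBreaker,
  SimpleCache,
} from 'model-orch-sdk';

const limiter = new TokenBucketRateLimiter(100, 10);
const breaker = new CircuitBreaker(5, 2, 30000);
const cache = new SimpleCache<string>(1000, 300);

// `callProvider` is a hypothetical stand-in for the provider request step.
async function handleRequest(
  cacheKey: string,
  callProvider: () => Promise<string>,
): Promise<string> {
  // Rate limiting: check and enforce rate limits
  if (!limiter.tryConsume(1)) throw new Error('rate limit exceeded');

  // Cache check: look for a cached response
  if (cache.has(cacheKey)) return cache.get(cacheKey)!;

  // Circuit breaker: prevent requests to failing services
  if (!breaker.allowRequest()) throw new Error('circuit open');

  // Provider request: execute and record the outcome
  try {
    const result = await callProvider();
    breaker.recordSuccess();
    cache.set(cacheKey, result);
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```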
Policy Engine
The Policy Engine supports multiple policy types:
- Routing Policies: Cost-based, latency-based, quality-based routing (see the conditional-routing sketch after this list)
- Security Policies: Authentication, authorization, data filtering
- Cost Policies: Budget limits, cost optimization, quota management
- Rate Limit Policies: Per-user, per-model, per-tenant limits
- Custom Policies: Business rules, compliance, user-defined logic
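As one example, `ConditionalRoutingConfig` is among the exported types; the condition fields in this sketch are illustrative assumptions about its shape:
```typescript
import { RoutingPolicyType } from 'model-orch-sdk';
import type { ConditionalRoutingConfig } from 'model-orch-sdk';

// Hypothetical conditional routing policy: short prompts go to a cheaper
// model, everything else to the default. The enum member and condition
// field names are illustrative assumptions.
const conditionalPolicy = {
  type: RoutingPolicyType.CONDITIONAL,
  conditions: [
    { attribute: 'inputLength', operator: 'lt', value: 500, connectorId: 'gpt-3.5-turbo' },
  ],
  defaultConnectorId: 'gpt-4',
} as ConditionalRoutingConfig;
```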
Caching Strategy
Multi-layer caching for optimal performance (a layered-lookup sketch follows the list):
- L1 (Memory Cache): In-process LRU cache (~10ms latency)
- L2 (Redis Cache): Distributed cache (~50ms latency)
- L3 (Semantic Cache): Similar query matching (~100ms latency)
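A layered read path can be sketched with the exported `SimpleCache` as the L1 tier; `redisGet` is a hypothetical stand-in for an L2 client, and the semantic L3 tier is omitted here:
```typescript
import { SimpleCache } from 'model-orch-sdk';

// L1 in-process cache from the SDK; `redisGet` is a hypothetical stand-in
// for an L2 distributed cache client. The semantic L3 tier is omitted.
const l1 = new SimpleCache<string>(10000, 60);

async function layeredLookup(
  key: string,
  redisGet: (k: string) => Promise<string | null>,
  compute: () => Promise<string>,
): Promise<string> {
  if (l1.has(key)) return l1.get(key)!; // L1 hit: in-process
  const fromL2 = await redisGet(key);   // L2: distributed cache
  if (fromL2 !== null) {
    l1.set(key, fromL2);                // promote to L1
    return fromL2;
  }
  const value = await compute();        // full miss: compute and backfill
  l1.set(key, value);
  return value;
}
```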
Cloud Deployment
Production-ready deployment on AWS with Kubernetes:
- Load Balancing: Application Load Balancer with health checks
- Kubernetes: Multi-replica services for high availability
- Storage: Redis cluster, PostgreSQL RDS, S3, ElastiCache
- Monitoring: Prometheus, Grafana, Jaeger, ELK Stack, AlertManager
- CI/CD: GitHub Actions with automated testing and deployment
- Infrastructure: Terraform for infrastructure as code
🔧 Configuration
Environment Variables
Create a .env file (see .env.example):
```bash
# Model API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...
AZURE_OPENAI_API_KEY=...
COHERE_API_KEY=...

# Security
CREDENTIAL_ENCRYPTION_KEY=<64-char-hex-string>

# Observability
ENABLE_TELEMETRY=true
METRICS_EXPORT_INTERVAL=60000
```
Generate encryption key:
```typescript
import { CredentialVault } from 'model-orch-sdk';

const key = CredentialVault.generateKey();
console.log('CREDENTIAL_ENCRYPTION_KEY=' + key);
```
📖 API Documentation
Type Definitions
All types are exported from the main package:
```typescript
import type {
  // Core
  Project,
  ModelConnector,
  ModelPool,
  RoutingPolicy,
  // Requests
  OrchestrationRequest,
  RequestPayload,
  RequestContext,
  // Responses
  OrchestrationResponse,
  ModelResponse,
  ExecutionTrace,
  // Configs
  PercentSplitConfig,
  ConditionalRoutingConfig,
  ConfidenceCascadeConfig,
  EnsembleRoutingConfig,
  // Enums
  ModelEndpointType,
  RoutingPolicyType,
  EnsembleStrategyType,
  ConnectorStatus,
  CircuitBreakerState,
} from 'model-orch-sdk';
```
Core Classes
CredentialVault
```typescript
const vault = new CredentialVault({
  encryptionKey: string,
  enableAuditLog: boolean,
  maxAccessLogSize: number,
});

// Store credential
await vault.storeCredential({
  projectId: string,
  name: string,
  type: 'api_key' | 'oauth' | 'basic_auth' | 'custom',
  value: string,
  createdBy: string,
});

// Get credential
const apiKey = await vault.getCredential(id, userId, ipAddress);

// Rotate credential
await vault.rotateCredential(id, newValue, userId, ipAddress);

// Check whether rotation is needed
const needsRotation = vault.needsRotation(id);
```
TokenBucketRateLimiter
```typescript
const limiter = new TokenBucketRateLimiter(capacity, refillRate);

// Try to consume tokens
if (limiter.tryConsume(1)) {
  // Request allowed
} else {
  // Rate limit exceeded
}

// Get available tokens
const available = limiter.getAvailableTokens();
```
CircuitBreaker
```typescript
const breaker = new CircuitBreaker(failureThreshold, successThreshold, resetTimeoutMs);

// Check whether requests are allowed
if (breaker.allowRequest()) {
  try {
    // Make request
    breaker.recordSuccess();
  } catch (error) {
    breaker.recordFailure();
  }
}

// Get state
const { state, failures, successes } = breaker.getState();
```
SimpleCache
```typescript
const cache = new SimpleCache<Response>(maxSize, ttlSeconds);

// Set value
cache.set(key, value);

// Get value
const cached = cache.get(key);

// Check existence
if (cache.has(key)) {
  // Use cached value
}
```
🧪 Testing
The SDK ships with a comprehensive test suite:
```bash
# Run all tests
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage
```
🛡️ Security
- Encryption: All credentials encrypted at rest using AES-256-GCM
- Audit Logs: Complete trail of credential access and modifications
- RBAC: Role-based access control for projects and resources
- Rotation: Automatic credential rotation with configurable policies
- Data Residency: Control which vendors/regions can process data
- No Plaintext: Credentials never logged or stored in plaintext
📊 Observability
The SDK provides comprehensive telemetry:
- Request Tracing: Distributed traces for every orchestration request
- Metrics: Latency (p50/p95/p99), error rates, token usage, costs
- Audit Logs: Complete history of policy changes, credential access
- Health Checks: Automatic monitoring of connector health
- Circuit Breaker Events: Track when connectors open/close
- Experiment Metrics: A/B test results and statistical significance
🤝 Contributing
Contributions are welcome! Please see our contribution guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'feat: add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup
```bash
git clone https://github.com/gyash1512/model-orch-sdk.git
cd model-orch-sdk
npm install
npm run build
npm test
```
Code Style
- TypeScript: Strict mode enabled, no `any` types allowed
- Linting: ESLint with TypeScript rules
- Formatting: Prettier for consistent code style
- Testing: Jest for unit and integration tests
📄 License
MIT License - See LICENSE file for details.
🙏 Acknowledgments
- OpenAI, Anthropic, Google, and other LLM providers for their APIs
- The TypeScript community for excellent tooling
- Contributors and users of this SDK
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Support
Built with ❤️ using TypeScript
For detailed API documentation, examples, and guides, visit the documentation.
