
model-orch-sdk

v0.1.1


Multi-Model Routing & Orchestration Platform - Build secure, extensible orchestration for LLMs with routing strategies, ensemble logic, fallbacks, and monitoring


🎯 model-orch-sdk


Enterprise-Grade Multi-Model Orchestration - Route requests intelligently across 19+ LLM providers with sophisticated strategies, ensemble logic, automatic failover, cost optimization, and production-ready observability.


🔍 Overview

model-orch-sdk is a comprehensive TypeScript SDK for orchestrating multiple LLM and AI model endpoints. It enables teams to register many model providers (OpenAI, Anthropic, Google, Azure, custom endpoints), configure sophisticated routing and ensemble strategies, manage credentials securely, and operate models reliably with monitoring, cost controls, fallback policies, and A/B/canary experiments.

Key Capabilities

  • 🔀 Multi-Model Support: Register any LLM API (OpenAI, Anthropic, Google, Azure, Cohere, HuggingFace, Replicate, custom HTTP/gRPC, local models)
  • 🎯 Intelligent Routing: Percent splits, conditional routing, confidence cascading, cost/latency optimization
  • 🤝 Ensemble Strategies: Voting, confidence-weighted aggregation, rankers, synthesizers
  • 🔄 Reliability: Circuit breakers, automatic fallbacks, retry with exponential backoff
  • 💰 Cost Control: Per-request and per-project budgets, cost estimation, quota management
  • 📊 Observability: Distributed tracing, metrics, telemetry, audit logs
  • 🔐 Security: Encrypted credential vault, RBAC, data residency rules, rotation policies
  • 🧪 Experimentation: A/B tests, canary deployments, traffic splits with sticky sessions
  • ⚡ Performance: Rate limiting, token buckets, caching, concurrent request management

📋 Prerequisites

  • Node.js 18 or higher
  • TypeScript 5.0+ (for development)

🚀 Installation

npm install model-orch-sdk

Or with yarn:

yarn add model-orch-sdk

🎮 Quick Start

Basic Usage

import { 
  CredentialVault, 
  TokenBucketRateLimiter, 
  CircuitBreaker 
} from 'model-orch-sdk';

// Initialize credential vault
const vault = new CredentialVault({
  encryptionKey: process.env.ENCRYPTION_KEY!,
  enableAuditLog: true,
  maxAccessLogSize: 1000,
});

// Store API credentials
await vault.storeCredential({
  projectId: 'proj_123',
  name: 'OpenAI Key',
  type: 'api_key',
  value: process.env.OPENAI_API_KEY!,
  createdBy: 'admin',
});

// Set up rate limiting
const rateLimiter = new TokenBucketRateLimiter(
  100,  // capacity
  10    // refill rate per second
);

// Create circuit breaker
const breaker = new CircuitBreaker(
  5,     // failure threshold
  2,     // success threshold
  30000  // reset timeout (30s)
);

Routing Configuration

import { RoutingPolicyType } from 'model-orch-sdk';
import type { PercentSplitConfig } from 'model-orch-sdk';

// Define a percent split routing policy
const routingPolicy: PercentSplitConfig = {
  type: RoutingPolicyType.PERCENT_SPLIT,
  splits: [
    { connectorId: 'gpt-4', percentage: 70, weight: 0.7 },
    { connectorId: 'claude-3', percentage: 30, weight: 0.3 },
  ],
  stickySession: true,
  sessionKey: 'userId',
};

Ensemble Strategy

import { 
  RoutingPolicyType, 
  EnsembleStrategyType 
} from 'model-orch-sdk';
import type { 
  EnsembleRoutingConfig, 
  VotingConfig 
} from 'model-orch-sdk';

// Configure ensemble with voting
const ensembleConfig: EnsembleRoutingConfig = {
  type: RoutingPolicyType.ENSEMBLE,
  connectorIds: ['gpt-4', 'claude-3', 'gemini-pro'],
  executionMode: 'parallel',
  aggregationStrategy: EnsembleStrategyType.VOTING,
  aggregationConfig: {
    type: EnsembleStrategyType.VOTING,
    votingMethod: 'majority',
    normalization: 'semantic',
    tieBreaker: 'highest_confidence',
  } as VotingConfig,
  timeout: 30000,
  minSuccessful: 2,
};

📚 Core Concepts

1. Project

Top-level workspace grouping models, policies, quotas, and telemetry. Each organization can have multiple projects with isolated configurations.

2. Model Connector

Registered endpoint with metadata (type, endpoint URL, credentials, rate limits, cost info). Supports all major LLM providers and custom endpoints.

3. Model Pool

Logical group of connectors serving a single role (e.g., high_quality_llms, cheap_fallback, embedding_engines).

4. Routing Policy

Declarative rule set determining which model(s) to call for incoming requests:

  • Percent Split: Distribute traffic by percentages (70/30, A/B tests)
  • Conditional: Route based on input attributes (language, length, topic)
  • Confidence Cascade: Try cheaper models first, escalate if confidence is low
  • Cost/Latency Based: Optimize for budget or speed
  • Ensemble: Call multiple models and aggregate responses
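The conditional style can be sketched generically. This is an illustrative stand-in, not the SDK's `ConditionalRoutingConfig` API (whose exact fields aren't shown in this README); `RouteRule`, `pickConnector`, and the connector IDs are hypothetical names:

```typescript
// Illustrative conditional routing on input attributes (hypothetical
// helper, not the SDK's ConditionalRoutingConfig shape).
interface RouteRule {
  when: (input: { text: string; language?: string }) => boolean;
  connectorId: string;
}

function pickConnector(
  rules: RouteRule[],
  input: { text: string; language?: string },
  fallback: string,
): string {
  // First matching rule wins; otherwise fall through to the default.
  for (const rule of rules) {
    if (rule.when(input)) return rule.connectorId;
  }
  return fallback;
}

// Example rules: route French inputs and very long inputs to
// specialized connectors (IDs are made up for illustration).
const rules: RouteRule[] = [
  { when: (i) => i.language === 'fr', connectorId: 'fr-tuned-model' },
  { when: (i) => i.text.length > 4000, connectorId: 'long-context-model' },
];
```

A call like `pickConnector(rules, { text: 'bonjour', language: 'fr' }, 'default')` would select `fr-tuned-model`.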

5. Ensemble Strategy

How to combine responses when multiple models are invoked:

  • Voting: Majority, plurality, or unanimous voting
  • Confidence-Weighted: Weight outputs by model confidence scores
  • Ranker: Use learned model to score and rank outputs
  • Synthesizer: Call a high-quality model to merge outputs
  • Diversity: Select diverse candidates for creative tasks
  • Earliest: Return first acceptable response

6. Fallback Policy

Automatic failover when primary model fails, times out, or exceeds budget. Define fallback chains with multiple backup options.
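The README doesn't show the fallback policy's concrete API, so here is a self-contained sketch of the behavior it describes: try each model in chain order, racing each attempt against a timeout, and escalate to the next backup on failure.

```typescript
// Generic fallback chain (illustrative, not the SDK's FallbackPolicy API).
type ModelCall = (prompt: string) => Promise<string>;

async function withFallbacks(
  prompt: string,
  chain: ModelCall[],
  timeoutMs = 30_000,
): Promise<string> {
  let lastError: unknown;
  for (const call of chain) {
    try {
      // Race the call against a timeout so a hung model also fails over.
      return await Promise.race([
        call(prompt),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error('timeout')), timeoutMs),
        ),
      ]);
    } catch (err) {
      lastError = err; // fall through to the next backup in the chain
    }
  }
  throw lastError;
}
```

Budget-based failover would add a cost check before each attempt; the shape above only covers failures and timeouts.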

7. Circuit Breaker

Automatically disable failing connectors to prevent cascading failures. Auto-heal when service recovers.

8. Credential Vault

Secure, encrypted storage for API keys with audit logs, rotation policies, and RBAC.


🏗️ Architecture

High-Level Architecture


The SDK follows a layered architecture:

  • API Layer: REST, GraphQL, and WebSocket interfaces for client applications
  • Core Orchestrator: Central coordination engine for request routing and execution
  • Middleware Layer: Caching, rate limiting, cost tracking, load balancing, circuit breakers, and retry logic
  • Policy Engine: Intelligent routing, security, and compliance policies
  • Provider Layer: Integration with multiple LLM providers (OpenAI, Anthropic, Cohere, Google, Groq, Together AI, etc.)
  • Storage Layer: Redis cache, PostgreSQL database, and MongoDB for persistent storage
  • Monitoring: Comprehensive logging, metrics, and observability

Request Processing Flow


Every request goes through a comprehensive processing pipeline:

  1. Authentication: Verify API keys and user credentials
  2. Rate Limiting: Check and enforce rate limits
  3. Cache Check: Look for cached responses (3-tier cache)
  4. Policy Evaluation: Apply routing and security policies
  5. Provider Selection: Choose optimal model based on policies
  6. Load Balancing: Distribute load across available providers
  7. Circuit Breaker: Prevent requests to failing services
  8. Provider Request: Execute request with retry logic
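Steps 2, 3, 7, and 8 can be sketched end to end. The tiny `Bucket` and `Breaker` classes below are hand-rolled stand-ins for the SDK's `TokenBucketRateLimiter` and `CircuitBreaker`, written so the sketch is self-contained; they are not the SDK's internals.

```typescript
// Minimal stand-in for a token-bucket rate limiter (step 2).
class Bucket {
  private tokens: number;
  constructor(capacity: number) { this.tokens = capacity; }
  tryConsume(n = 1): boolean {
    if (this.tokens < n) return false;
    this.tokens -= n;
    return true;
  }
}

// Minimal stand-in for a circuit breaker (step 7).
class Breaker {
  private failures = 0;
  constructor(private threshold: number) {}
  allowRequest(): boolean { return this.failures < this.threshold; }
  recordFailure(): void { this.failures++; }
  recordSuccess(): void { this.failures = 0; }
}

const bucket = new Bucket(2);
const breaker = new Breaker(3);
const cache = new Map<string, string>();

async function handle(
  prompt: string,
  callModel: (p: string) => Promise<string>,
): Promise<string> {
  if (!bucket.tryConsume()) throw new Error('rate limited'); // step 2
  const hit = cache.get(prompt);                             // step 3
  if (hit !== undefined) return hit;
  if (!breaker.allowRequest()) throw new Error('circuit open'); // step 7
  try {
    const out = await callModel(prompt);                     // step 8
    breaker.recordSuccess();
    cache.set(prompt, out);
    return out;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```

A second request with the same prompt is served from the cache without touching the breaker or the provider.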

Policy Engine


The Policy Engine supports multiple policy types:

  • Routing Policies: Cost-based, latency-based, quality-based routing
  • Security Policies: Authentication, authorization, data filtering
  • Cost Policies: Budget limits, cost optimization, quota management
  • Rate Limit Policies: Per-user, per-model, per-tenant limits
  • Custom Policies: Business rules, compliance, user-defined logic

Caching Strategy


Multi-layer caching for optimal performance:

  • L1 (Memory Cache): In-process LRU cache (~10ms latency)
  • L2 (Redis Cache): Distributed cache (~50ms latency)
  • L3 (Semantic Cache): Similar query matching (~100ms latency)
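A read-through across tiers might look like the sketch below. Plain `Map`s stand in for the memory and Redis layers (the semantic layer is omitted, since similarity matching needs more machinery); the promotion-on-hit behavior is an assumption about typical tiered caches, not a documented SDK guarantee.

```typescript
// Illustrative read-through across two cache tiers.
const l1 = new Map<string, string>(); // stand-in for the in-process LRU
const l2 = new Map<string, string>(); // stand-in for the Redis layer

async function cachedLookup(
  key: string,
  compute: (k: string) => Promise<string>,
): Promise<string> {
  const m = l1.get(key);
  if (m !== undefined) return m;   // L1 hit: cheapest path
  const r = l2.get(key);
  if (r !== undefined) {
    l1.set(key, r);                // promote to L1 for next time
    return r;
  }
  const v = await compute(key);    // full miss: call the model
  l1.set(key, v);
  l2.set(key, v);
  return v;
}
```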

Cloud Deployment


Production-ready deployment on AWS with Kubernetes:

  • Load Balancing: Application Load Balancer with health checks
  • Kubernetes: Multi-replica services for high availability
  • Storage: Redis cluster, PostgreSQL RDS, S3, ElastiCache
  • Monitoring: Prometheus, Grafana, Jaeger, ELK Stack, AlertManager
  • CI/CD: GitHub Actions with automated testing and deployment
  • Infrastructure: Terraform for infrastructure as code

🔧 Configuration

Environment Variables

Create a .env file (see .env.example):

# Model API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_AI_API_KEY=...
AZURE_OPENAI_API_KEY=...
COHERE_API_KEY=...

# Security
CREDENTIAL_ENCRYPTION_KEY=<64-char-hex-string>

# Observability
ENABLE_TELEMETRY=true
METRICS_EXPORT_INTERVAL=60000

Generate encryption key:

import { CredentialVault } from 'model-orch-sdk';

const key = CredentialVault.generateKey();
console.log('CREDENTIAL_ENCRYPTION_KEY=' + key);

📖 API Documentation

Type Definitions

All types are exported from the main package:

import type {
  // Core
  Project,
  ModelConnector,
  ModelPool,
  RoutingPolicy,
  
  // Requests
  OrchestrationRequest,
  RequestPayload,
  RequestContext,
  
  // Responses
  OrchestrationResponse,
  ModelResponse,
  ExecutionTrace,
  
  // Configs
  PercentSplitConfig,
  ConditionalRoutingConfig,
  ConfidenceCascadeConfig,
  EnsembleRoutingConfig,
  
  // Enums
  ModelEndpointType,
  RoutingPolicyType,
  EnsembleStrategyType,
  ConnectorStatus,
  CircuitBreakerState,
} from 'model-orch-sdk';

Core Classes

CredentialVault

const vault = new CredentialVault({
  encryptionKey: string,
  enableAuditLog: boolean,
  maxAccessLogSize: number,
});

// Store credential
await vault.storeCredential({
  projectId: string,
  name: string,
  type: 'api_key' | 'oauth' | 'basic_auth' | 'custom',
  value: string,
  createdBy: string,
});

// Get credential
const apiKey = await vault.getCredential(id, userId, ipAddress);

// Rotate credential
await vault.rotateCredential(id, newValue, userId, ipAddress);

// Check rotation needed
const needsRotation = vault.needsRotation(id);

TokenBucketRateLimiter

const limiter = new TokenBucketRateLimiter(capacity, refillRate);

// Try to consume tokens
if (limiter.tryConsume(1)) {
  // Request allowed
} else {
  // Rate limit exceeded
}

// Get available tokens
const available = limiter.getAvailableTokens();

CircuitBreaker

const breaker = new CircuitBreaker(failureThreshold, successThreshold, resetTimeoutMs);

// Check if requests allowed
if (breaker.allowRequest()) {
  try {
    // Make request
    breaker.recordSuccess();
  } catch (error) {
    breaker.recordFailure();
  }
}

// Get state
const { state, failures, successes } = breaker.getState();

SimpleCache

const cache = new SimpleCache<Response>(maxSize, ttlSeconds);

// Set value
cache.set(key, value);

// Get value
const cached = cache.get(key);

// Check existence
if (cache.has(key)) {
  // Use cached value
}

🧪 Testing

Run the test suite with:

# Run all tests
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage

🛡️ Security

  • Encryption: All credentials encrypted at rest using AES-256-GCM
  • Audit Logs: Complete trail of credential access and modifications
  • RBAC: Role-based access control for projects and resources
  • Rotation: Automatic credential rotation with configurable policies
  • Data Residency: Control which vendors/regions can process data
  • No Plaintext: Credentials never logged or stored in plaintext

📊 Observability

The SDK provides comprehensive telemetry:

  • Request Tracing: Distributed traces for every orchestration request
  • Metrics: Latency (p50/p95/p99), error rates, token usage, costs
  • Audit Logs: Complete history of policy changes, credential access
  • Health Checks: Automatic monitoring of connector health
  • Circuit Breaker Events: Track when connectors open/close
  • Experiment Metrics: A/B test results and statistical significance

🤝 Contributing

Contributions are welcome! Please see our contribution guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

git clone https://github.com/gyash1512/model-orch-sdk.git
cd model-orch-sdk
npm install
npm run build
npm test

Code Style

  • TypeScript: Strict mode enabled, no any types allowed
  • Linting: ESLint with TypeScript rules
  • Formatting: Prettier for consistent code style
  • Testing: Jest for unit and integration tests

📄 License

MIT License - See LICENSE file for details.


🙏 Acknowledgments

  • OpenAI, Anthropic, Google, and other LLM providers for their APIs
  • The TypeScript community for excellent tooling
  • Contributors and users of this SDK



Built with ❤️ using TypeScript

For detailed API documentation, examples, and guides, visit the documentation.