@cascadeflow/core
TypeScript/JavaScript library for cascadeflow
Smart AI model cascading for cost optimization.
Save 40-85% on LLM costs with intelligent model routing. Available for Node.js, browser, and edge environments.
Installation
npm install @cascadeflow/core
# or
pnpm add @cascadeflow/core
# or
yarn add @cascadeflow/core
Quick Start
Recommended Setup (Claude Haiku + GPT-5)
import { CascadeAgent } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    {
      name: 'claude-haiku-4-5',
      provider: 'anthropic',
      cost: 0.001 // Fast, high-quality drafter
    },
    {
      name: 'gpt-5',
      provider: 'openai',
      cost: 0.00125 // Superior reasoning verifier (50% cheaper than GPT-4o!)
    }
  ]
});

const result = await agent.run('What is artificial intelligence?');
console.log(result.content);
console.log(`Cost: $${result.totalCost}`);
console.log(`Savings: ${result.savingsPercentage}%`);

Quality Configuration
Control when the cascade uses the drafter vs. verifier with quality thresholds:
// Recommended: Complexity-aware thresholds
const agent = new CascadeAgent({
  models: [
    { name: 'claude-haiku-4-5', provider: 'anthropic', cost: 0.001 },
    { name: 'gpt-5', provider: 'openai', cost: 0.00125 }
  ],
  quality: {
    confidenceThresholds: {
      simple: 0.6,   // "What is Python?" - Accept 60%+ confidence
      moderate: 0.7, // "Compare Python vs Java" - Accept 70%+
      hard: 0.8,     // "Analyze quantum computing" - Accept 80%+
      expert: 0.85   // "Implement distributed cache" - Accept 85%+
    }
  }
});

Quick Configuration Options:
// Option 1: Use CASCADE_QUALITY_CONFIG (optimized for 50-60% acceptance)
import { CascadeAgent, CASCADE_QUALITY_CONFIG } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [...],
  quality: CASCADE_QUALITY_CONFIG // Lower threshold (0.40) = more cost savings
});

// Option 2: Simple flat threshold
const agent = new CascadeAgent({
  models: [...],
  quality: {
    threshold: 0.7,          // 70% confidence required (default)
    requireMinimumTokens: 10 // Minimum response length
  }
});

// Option 3: Use defaults (no quality config needed)
const agent = new CascadeAgent({
  models: [...]
  // Automatically uses threshold: 0.7
});

When to adjust:
- Lower thresholds (0.4-0.6): More drafts accepted → higher cost savings, slightly lower quality
- Higher thresholds (0.8-0.9): Fewer drafts accepted → lower savings, maximum quality
- Complexity-aware: Best balance → adjusts automatically based on query difficulty
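For example, a high-volume workload can favor savings with a lower flat threshold (a minimal sketch using the threshold option from Option 2 above):

// Savings-first: accept more drafts at the cost of some quality
const costOptimizedAgent = new CascadeAgent({
  models: [
    { name: 'claude-haiku-4-5', provider: 'anthropic', cost: 0.001 },
    { name: 'gpt-5', provider: 'openai', cost: 0.00125 }
  ],
  quality: {
    threshold: 0.5 // In the 0.4-0.6 range recommended above for higher savings
  }
});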
⚠️ GPT-5 Requires Organization Verification
To use GPT-5, your OpenAI organization must be verified:
- Go to https://platform.openai.com/settings/organization/general
- Click "Verify Organization"
- Wait ~15 minutes for access to propagate
Works immediately: the cascade above works right away. Claude Haiku handles 75% of queries, and GPT-5 is called only when needed.
📝 Model Naming
Both naming conventions work with CascadeFlow:
- claude-haiku-4-5 (used in presets, recommended)
- claude-3-5-haiku-20241022 (Anthropic API format)

The library accepts both formats and routes them correctly.
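For example, either spelling works in a models entry; this sketch mirrors the recommended setup above:

const agent = new CascadeAgent({
  models: [
    // Preset-style alias (recommended):
    { name: 'claude-haiku-4-5', provider: 'anthropic', cost: 0.001 },
    // The Anthropic API format is equivalent:
    // { name: 'claude-3-5-haiku-20241022', provider: 'anthropic', cost: 0.001 },
    { name: 'gpt-5', provider: 'openai', cost: 0.00125 }
  ]
});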
OpenAI Only
const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-5', provider: 'openai', cost: 0.00125 } // Requires org verification
  ]
});

Even Easier: Use Presets
No configuration needed - just import a preset and go:
import { CascadeAgent, PRESET_ULTRA_FAST, PRESET_BEST_OVERALL } from '@cascadeflow/core';

// Ultra-fast with Groq (5-10x faster than OpenAI)
const agent = new CascadeAgent(PRESET_ULTRA_FAST);

// Or best overall (Claude Haiku + GPT-4o-mini)
const agent = new CascadeAgent(PRESET_BEST_OVERALL);

const result = await agent.run('Your query here');

Available Presets:
| Preset | Best For | Speed | Cost/Query | API Keys |
|--------|----------|-------|-----------|----------|
| PRESET_BEST_OVERALL | Most use cases | Fast (~2-3s) | ~$0.0008 | Anthropic + OpenAI |
| PRESET_ULTRA_FAST | Real-time apps | Ultra-fast (~1-2s) | ~$0.0002 | Groq |
| PRESET_ULTRA_CHEAP | High volume | Very fast (~1-3s) | ~$0.00008 | Groq + OpenAI |
| PRESET_OPENAI_ONLY | Single provider | Fast (~2-4s) | ~$0.0004 | OpenAI |
| PRESET_ANTHROPIC_ONLY | Claude fans | Fast (~2-3s) | ~$0.002 | Anthropic |
| PRESET_FREE_LOCAL | Privacy/offline | Moderate (~3-5s) | $0 (free) | None (Ollama) |
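For example, the free local preset needs no API keys at all (a minimal sketch; it assumes PRESET_FREE_LOCAL is exported like the presets above and a local Ollama server is running):

import { CascadeAgent, PRESET_FREE_LOCAL } from '@cascadeflow/core';

// Runs entirely against local Ollama models: $0 per query, no API keys
const localAgent = new CascadeAgent(PRESET_FREE_LOCAL);
const localResult = await localAgent.run('What is TypeScript?');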
Custom Presets:
import { CascadeAgent, createPreset } from '@cascadeflow/core';

const agent = new CascadeAgent(
  createPreset({
    quality: 'strict',   // 'cost-optimized' | 'balanced' | 'strict'
    performance: 'fast', // 'fast' | 'balanced' | 'reliable'
    includePremium: true // Add premium tier (gpt-4o)
  })
);

Advanced Features
🎚️ Quality Profiles
Control quality validation with predefined profiles optimized for different use cases:
import { CascadeAgent, QualityValidator } from '@cascadeflow/core';

// Strict Mode: Maximum quality with semantic validation
const strictAgent = new CascadeAgent({
  models: [...],
  cascade: {
    enabled: true,
    qualityConfig: {
      useProductionConfidence: true,
      strictMode: true,
      useSemanticValidation: true,
      minConfidence: 0.85,
      provider: 'openai',
    },
  },
});

// Or use factory methods
const strictValidator = QualityValidator.strict();
const prodValidator = QualityValidator.forProduction();    // Multi-signal confidence
const devValidator = QualityValidator.forDevelopment();    // Lenient for testing
const cascadeValidator = QualityValidator.forCascade();    // Optimized for 50-60% acceptance
const permissiveValidator = QualityValidator.permissive(); // Maximum throughput

Available Profiles:
- Strict: 85% confidence + semantic validation (maximum quality)
- Production: 70% confidence with multi-signal estimation (balanced)
- Development: 50% confidence, minimal word count (fast iteration)
- Cascade: 40% confidence, optimized for cost savings (50-60% draft acceptance)
- Permissive: 30% confidence, maximum throughput (highest savings)
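As a sketch, the Cascade profile's 40% confidence floor can also be expressed inline using the qualityConfig shape from the strict-mode example above (the exact defaults baked into QualityValidator.forCascade() may differ):

const savingsAgent = new CascadeAgent({
  models: [...],
  cascade: {
    enabled: true,
    qualityConfig: {
      useProductionConfidence: true,
      minConfidence: 0.40, // Cascade profile: targets 50-60% draft acceptance
    },
  },
});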
📡 Telemetry & Callbacks
Monitor cascade operations with event-driven callbacks:
import { CascadeAgent, CallbackManager, CallbackEvent } from '@cascadeflow/core';

const callbackManager = new CallbackManager(true); // verbose=true

// Track query lifecycle
callbackManager.register(CallbackEvent.QUERY_START, (data) => {
  console.log(`Query started: "${data.query}"`);
});

callbackManager.register(CallbackEvent.COMPLEXITY_DETECTED, (data) => {
  console.log(`Complexity: ${data.data.complexity} (confidence: ${data.data.confidence})`);
});

callbackManager.register(CallbackEvent.DRAFT_ACCEPTED, (data) => {
  console.log(`Draft accepted! Savings: $${data.data.savings}`);
});

const agent = new CascadeAgent({
  models: [...],
  callbacks: callbackManager,
  cascade: { enabled: true },
});

Available Events:
- QUERY_START / QUERY_COMPLETE - Query lifecycle
- COMPLEXITY_DETECTED - Query complexity analysis
- CASCADE_DECISION - Routing decisions
- QUALITY_VALIDATION - Quality checks
- DRAFT_ACCEPTED / DRAFT_REJECTED - Draft outcomes
- VERIFIER_CALLED - Verifier invocations
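For example, the rejection-side events can be wired up the same way (a sketch; payload shapes are assumed to mirror the handlers above):

callbackManager.register(CallbackEvent.DRAFT_REJECTED, () => {
  console.log('Draft rejected; escalating to the verifier');
});

callbackManager.register(CallbackEvent.VERIFIER_CALLED, () => {
  console.log('Verifier invoked');
});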
📦 Batch Processing
Process multiple queries with progress tracking and analytics:
import { BatchStrategy } from '@cascadeflow/core';

const queries = [
  'What is TypeScript?',
  'Explain async/await.',
  'What are design patterns?',
];

const batchResult = await agent.runBatch(queries, {
  strategy: BatchStrategy.SEQUENTIAL,
  continueOnError: true,
  onProgress: (completed, total, currentQuery) => {
    console.log(`[${(completed / total * 100).toFixed(0)}%] ${completed}/${total}`);
  },
});

// Analyze results
console.log(`Success rate: ${(batchResult.successCount / queries.length * 100).toFixed(1)}%`);
console.log(`Total cost: $${batchResult.results.reduce((sum, r) => sum + (r.result?.totalCost || 0), 0)}`);
console.log(`Draft acceptance: ${batchResult.results.filter(r => r.result?.draftAccepted).length}`);

🔀 Router Integration
Intelligent routing with complexity analysis and capability filtering:
import { createTool } from '@cascadeflow/core';

// PreRouter: Automatically routes based on query complexity
const simpleResult = await agent.run('What is 2 + 2?');
// → Uses draft model (simple query)

const complexResult = await agent.run(
  'Explain quantum computing theory with recent research references.'
);
// → Routes directly to best model (complex query)

// ToolRouter: Filters to tool-capable models
// Use a strict parser helper (see examples/nodejs/safe-math.ts).
const calculatorTool = createTool({
  name: 'calculator',
  description: 'Perform calculations',
  function: async ({ expression }) => safeCalculateExpression(expression),
});

const toolResult = await agent.run('Calculate 125 * 47', {
  tools: [calculatorTool],
});
// → Automatically excludes models without tool support

// Get router statistics
const stats = agent.getRouterStats();
console.log(stats.preRouter);  // Complexity-based routing stats
console.log(stats.toolRouter); // Tool filtering stats

👤 User Profiles & Workflows
Manage user tiers, budgets, and optimization preferences:
import { createUserProfile, createWorkflowProfile, TIER_PRESETS, WORKFLOW_PRESETS } from '@cascadeflow/core';

// Tier-based profiles
const freeProfile = createUserProfile({
  tier: TIER_PRESETS.free, // Max budget: $0.01, Quality: 0.60
});

const premiumProfile = createUserProfile({
  tier: TIER_PRESETS.premium, // Max budget: $0.10, Quality: 0.80
});

// Custom profile with optimization weights
const customProfile = createUserProfile({
  tier: { name: 'custom', maxBudget: 0.05, qualityThreshold: 0.75 },
  optimizationWeights: {
    cost: 0.5,    // 50% weight on cost
    speed: 0.3,   // 30% weight on speed
    quality: 0.2, // 20% weight on quality
  },
});

// Latency-aware profiles
const lowLatencyProfile = createUserProfile({
  tier: TIER_PRESETS.premium,
  latencyProfile: {
    maxTotalMs: 2000,     // 2 second total limit
    maxPerModelMs: 1000,  // 1 second per model
    preferParallel: true, // Prefer parallel execution
    skipCascadeThreshold: 1500,
  },
});

// Use with agent
const agent = new CascadeAgent({
  models: [...],
  profile: premiumProfile,
  cascade: { enabled: true },
});

Workflow Presets:
- WORKFLOW_PRESETS.production - High quality, reasonable latency
- WORKFLOW_PRESETS.realtime - Ultra-low latency, single model
- WORKFLOW_PRESETS.batch - Maximum throughput, relaxed constraints
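createWorkflowProfile is imported above but not shown in use; a hypothetical sketch, assuming it accepts a preset much like createUserProfile accepts a tier (see examples/nodejs/user-profiles-workflows.ts for the exact signature):

// Hypothetical usage: the 'workflow' field name is assumed, not confirmed
const realtimeProfile = createWorkflowProfile({
  workflow: WORKFLOW_PRESETS.realtime, // Ultra-low latency, single model
});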
🏭 Factory Methods
Simplified agent creation with auto-configuration:
import { CascadeAgent } from '@cascadeflow/core';

// Auto-detect providers from environment variables
const envAgent = CascadeAgent.fromEnv({
  quality: 'production', // 'strict' | 'production' | 'development'
});
// Checks for: OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, etc.

// Create from user profile
const profileAgent = CascadeAgent.fromProfile({
  profile: premiumProfile,
  preferredModels: ['gpt-4o-mini', 'gpt-4o', 'claude-3-5-haiku-20241022'],
});

// Traditional manual configuration (full control)
const manualAgent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
  cascade: { enabled: true },
});

Benefits:
- fromEnv(): Auto-detects available providers, sensible defaults
- fromProfile(): Multi-tenant applications, per-user configuration
- Manual config: Full control for production requirements
🌊 Enhanced Streaming
Event-driven streaming with real-time progress:
import { StreamEventType, collectResult } from '@cascadeflow/core';

const stream = agent.streamEvents('What is TypeScript?', {
  forceDirect: true,
});

for await (const event of stream) {
  switch (event.type) {
    case StreamEventType.START:
      console.log(`Streaming from: ${event.data.model}`);
      break;
    case StreamEventType.CHUNK:
      process.stdout.write(event.data.content);
      break;
    case StreamEventType.COMPLETE:
      console.log(`\nCost: $${event.data.totalCost?.toFixed(6)}`);
      console.log(`Time: ${event.data.timing?.total}ms`);
      break;
    case StreamEventType.ERROR:
      console.error(`Error: ${event.data.error}`);
      break;
  }
}

// Or collect the full result
const result = await collectResult(stream);
console.log(`Content: ${result.content}`);
console.log(`Model: ${result.modelUsed}`);

Use Cases:
- Interactive chat applications
- Real-time content generation
- Progressive content display
- Long-form content (articles, essays)
Features
- 🎯 Smart Cascading: Automatically tries smaller models first
- 💰 Cost Optimization: Save 40-85% on LLM costs
- ⚡ Fast: 2-10x faster responses with small models
- 🔀 Multi-Provider: OpenAI, Anthropic, Groq, and more
- ✅ Quality Validation: Multi-signal confidence with semantic analysis
- 📊 Telemetry: Event-driven monitoring with callbacks
- 📦 Batch Processing: Sequential processing with analytics
- 🔀 Intelligent Routing: Complexity-based and capability-aware
- 👤 User Profiles: Tier-based access control and budgets
- 🌊 Enhanced Streaming: Event-driven streaming with progress
- 🏭 Factory Methods: Simplified setup with auto-configuration
- 📈 Cost Tracking: Detailed metrics and savings analysis
Examples
All examples are available in the examples/nodejs directory:
- quality-profiles.ts - Quality validation profiles (strict, production, development, cascade, permissive)
- telemetry-callbacks.ts - Event-driven monitoring and callbacks
- batch-processing.ts - Batch processing with progress tracking
- router-integration.ts - PreRouter and ToolRouter integration
- user-profiles-workflows.ts - User profiles, tiers, and workflows
- factory-methods.ts - Factory methods (fromEnv, fromProfile)
- enhanced-streaming.ts - Enhanced streaming with events
Run any example with:
npx tsx examples/nodejs/<example-name>.ts

Documentation
See the main cascadeflow documentation for complete guides and examples.
License
MIT © Lemony Inc.
