@cascadeflow/ml
ML-based semantic detection for cascadeflow TypeScript.
Brings TypeScript to feature parity with Python's ML capabilities using Transformers.js.
Features
- 🎯 84-87% domain detection confidence (matches Python)
- 🧠 Semantic validation using cosine similarity
- 🚀 Works everywhere - Node.js, browser, edge functions
- 📦 Same model as Python - BGE-small-en-v1.5
- 🔄 Automatic fallback to rule-based detection
- ⚡ Fast inference - ~20-50ms per embedding
- 🎨 Request-scoped caching - 50% latency reduction
Installation
```bash
npm install @cascadeflow/ml
```

The model (~40MB) will be downloaded automatically on first use.
Usage
Enable ML Detection in CascadeAgent
ML-based semantic detection is automatically available when @cascadeflow/ml is installed. The CascadeAgent will use it for enhanced domain detection and routing.
```typescript
import { CascadeAgent } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('Calculate eigenvalues of [[1,2],[3,4]]');

// ML detection results are in metadata when available
console.log(result.metadata.domainDetected);   // 'MATH'
console.log(result.metadata.detectionMethod);  // 'semantic'
console.log(result.metadata.domainConfidence); // 0.87 (87%)
```

Direct Embedding Service Usage
```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

// Create service (lazy loads model)
const embedder = new UnifiedEmbeddingService();

// Check availability
if (await embedder.isAvailable()) {
  // Generate embeddings
  const embedding = await embedder.embed('Hello world');
  console.log(embedding?.dimensions); // 384

  // Compute similarity
  const similarity = await embedder.similarity('cat', 'kitten');
  console.log(similarity); // ~0.85 (high similarity)

  // Use caching for better performance
  const cache = new EmbeddingCache(embedder);
  const emb1 = await cache.getOrEmbed('query'); // Computes
  const emb2 = await cache.getOrEmbed('query'); // Cached!
}
```

How It Works
Model
Uses Xenova/bge-small-en-v1.5 (ONNX-converted BAAI/bge-small-en-v1.5):
- Size: ~40MB
- Dimensions: 384
- Inference: ~20-50ms per embedding
- MTEB Score: 91.8%
- Same as Python: Exact feature parity
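The model name above is the service's default. The optional modelName constructor argument (see API Reference) lets you pin it explicitly; this is a small illustrative sketch, not a required step:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Equivalent to the default; passing the name explicitly documents intent
// and guards against a future default change.
const embedder = new UnifiedEmbeddingService('Xenova/bge-small-en-v1.5');
```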
Semantic Domain Detection
Computes semantic similarity between the user query and per-domain exemplar texts (see the sketch after this list):
- Embed user query → 384-dim vector
- Compare to domain exemplars (8 per domain)
- Find highest similarity score
- Return domain with confidence
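A minimal sketch of that flow using only the public embedding API; the exemplar lists and cosine helper below are illustrative stand-ins, not the package's internal detector:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Illustrative exemplars; the real detector ships 8 per domain.
const EXEMPLARS: Record<string, string[]> = {
  MATH: ['Calculate the eigenvalues of a matrix', 'Solve this integral'],
  CODE: ['Write a function that parses JSON', 'Fix this TypeScript type error'],
};

// Cosine similarity over the raw Float32Array data of two embeddings.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function detectDomain(query: string) {
  const embedder = new UnifiedEmbeddingService();
  const q = await embedder.embed(query); // 384-dim vector
  if (!q) return null; // ML unavailable -> caller falls back to rules

  let best = { domain: 'GENERAL', confidence: 0 };
  for (const [domain, texts] of Object.entries(EXEMPLARS)) {
    for (const text of texts) {
      const e = await embedder.embed(text);
      if (!e) continue;
      const score = cosine(q.data, e.data);
      if (score > best.confidence) best = { domain, confidence: score };
    }
  }
  return best; // e.g. { domain: 'MATH', confidence: 0.87 }
}
```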
Graceful Fallback
If ML is unavailable (the model fails to load or the dependency is missing), detection degrades as follows (see the sketch after this list):
- ✅ Automatically falls back to rule-based detection
- ✅ All features continue to work
- ✅ No errors or crashes
- ⚠️ Slightly lower confidence (~60-75% vs 84-87%)
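The CascadeAgent handles this fallback internally. If you call the embedding service directly and want the same behavior, a wrapper along these lines works; the keyword regex and threshold are placeholders, not cascadeflow's actual rule-based detector:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

async function classify(query: string): Promise<{ domain: string; method: string }> {
  const embedder = new UnifiedEmbeddingService();

  if (await embedder.isAvailable()) {
    // Semantic path: compare against an exemplar and keep the score as confidence.
    const score = await embedder.similarity(query, 'solve this equation');
    if (score !== null && score > 0.6) return { domain: 'MATH', method: 'semantic' };
  }

  // Rule-based fallback (placeholder heuristic, lower confidence)
  if (/\b(integral|matrix|equation|derivative)\b/i.test(query)) {
    return { domain: 'MATH', method: 'rules' };
  }
  return { domain: 'GENERAL', method: 'rules' };
}
```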
Performance
Latency
- Cold start: ~200-500ms (model loading)
- Warm: ~20-50ms per embedding
- Cached: <1ms (request-scoped cache)
- Batch: ~30% faster than individual calls
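A quick sketch of the two cheaper paths, batching and the request-scoped cache, using only the documented API:

```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService();

// Batch known texts in one call instead of looping over embed();
// this is where the ~30% saving over individual calls comes from.
const vectors = await embedder.embedBatch(['query one', 'query two', 'query three']);
console.log(vectors?.length); // 3

// Request-scoped cache: repeated texts inside one request skip inference.
const cache = new EmbeddingCache(embedder);
await cache.getOrEmbed('same text'); // ~20-50ms (warm model)
await cache.getOrEmbed('same text'); // <1ms, served from cache
console.log(cache.cacheSize()); // 1
```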
Accuracy
Domain detection confidence:
- ML semantic: 84-87% (complex domains)
- Rule-based fallback: 60-75%
- Improvement: 15-20% higher confidence
Tested on domains: MATH, CODE, DATA, STRUCTURED, REASONING
Browser Support
Works in modern browsers with:
- WebAssembly support
- Sufficient memory (~100MB for model)
- ES2020+ JavaScript support
Tested on:
- ✅ Chrome 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Edge 90+
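In a browser app, a dynamic import keeps the package (and the ~40MB model download it triggers) off the critical rendering path; this is an illustrative pattern, not a required setup:

```typescript
// Load lazily so the initial page render is not blocked by the model download.
const { UnifiedEmbeddingService } = await import('@cascadeflow/ml');

const embedder = new UnifiedEmbeddingService();
if (await embedder.isAvailable()) {
  const embedding = await embedder.embed('user query');
  console.log(embedding?.dimensions); // 384
}
```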
Edge Functions
Supported edge runtimes:
- ✅ Vercel Edge Functions
- ✅ Cloudflare Workers
- ✅ Netlify Edge Functions
- ⚠️ AWS Lambda@Edge (check memory limits)
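A minimal Cloudflare Worker-style handler using module-worker syntax; treat it as a sketch, since bundler configuration and memory limits vary per runtime:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Constructed once per isolate so the loaded model is reused across requests.
const embedder = new UnifiedEmbeddingService();

export default {
  async fetch(request: Request): Promise<Response> {
    const { text } = (await request.json()) as { text: string };
    const embedding = await embedder.embed(text);
    return Response.json({
      available: embedding !== null,
      dimensions: embedding?.dimensions ?? null,
    });
  },
};
```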
API Reference
UnifiedEmbeddingService
```typescript
class UnifiedEmbeddingService {
  constructor(modelName?: string);
  isAvailable(): Promise<boolean>;
  embed(text: string): Promise<EmbeddingVector | null>;
  embedBatch(texts: string[]): Promise<EmbeddingVector[] | null>;
  similarity(text1: string, text2: string): Promise<number | null>;
}
```

EmbeddingCache
```typescript
class EmbeddingCache {
  constructor(embedder: UnifiedEmbeddingService);
  getOrEmbed(text: string): Promise<EmbeddingVector | null>;
  similarity(text1: string, text2: string): Promise<number | null>;
  clear(): void;
  cacheSize(): number;
  cacheInfo(): { size: number; texts: string[] };
}
```

EmbeddingVector
```typescript
interface EmbeddingVector {
  data: Float32Array;
  dimensions: number;
}
```

Troubleshooting
Model Loading Fails
```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Check if ML is available
const embedder = new UnifiedEmbeddingService();
const available = await embedder.isAvailable();

if (!available) {
  console.log('ML not available, using rule-based detection');
  // App continues to work with fallback
}
```

Memory Issues
The model requires ~100MB memory. For constrained environments:
- Use rule-based detection (no ML package)
- Implement model lazy loading
- Consider server-side ML service
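The "lazy loading" suggestion above can be as simple as deferring construction until the first request that actually needs embeddings; the helper name below is hypothetical:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

let embedder: UnifiedEmbeddingService | null = null;

// Only pay the ~100MB model cost on the code path that needs embeddings.
async function getEmbedder(): Promise<UnifiedEmbeddingService | null> {
  if (!embedder) {
    const candidate = new UnifiedEmbeddingService();
    embedder = (await candidate.isAvailable()) ? candidate : null;
  }
  return embedder; // null -> caller stays on rule-based detection
}
```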
Slow First Load
Model download (~40MB) happens once on first use. To preload:
```typescript
const embedder = new UnifiedEmbeddingService();
await embedder.embed('warmup query'); // Triggers model download
```

Comparison with Python
| Feature | Python | TypeScript | Notes |
|---------|--------|------------|-------|
| Model | FastEmbed | Transformers.js | Same BGE-small-en-v1.5 |
| Confidence | 84-87% | 84-87% | ✅ Parity |
| Latency | ~20-30ms | ~20-50ms | Similar |
| Size | ~40MB | ~40MB | Same |
| Fallback | ✅ | ✅ | Both graceful |
Result: Feature parity achieved! 🎉
Examples
See packages/core/examples/nodejs/production-patterns.ts for a complete production example that demonstrates ML-based semantic detection and validation.
License
MIT
Support
- Documentation: https://github.com/lemony-ai/cascadeflow
- Issues: https://github.com/lemony-ai/cascadeflow/issues
- Discussions: https://github.com/lemony-ai/cascadeflow/discussions
