@cascadeflow/ml
ML-based semantic detection for cascadeflow TypeScript.
Brings TypeScript to feature parity with Python's ML capabilities using Transformers.js.
Features
- 🎯 84-87% domain detection confidence (matches Python)
- 🧠 Semantic validation using cosine similarity
- 🚀 Works everywhere - Node.js, browser, edge functions
- 📦 Same model as Python - BGE-small-en-v1.5
- 🔄 Automatic fallback to rule-based detection
- ⚡ Fast inference - ~20-50ms per embedding
- 🎨 Request-scoped caching - 50% latency reduction
Installation
```bash
npm install @cascadeflow/ml
```

The model (~40MB) will be downloaded automatically on first use.
Usage
Enable ML Detection in CascadeAgent
ML-based semantic detection is automatically available when @cascadeflow/ml is installed. The CascadeAgent will use it for enhanced domain detection and routing.
```typescript
import { CascadeAgent } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.00015 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('Calculate eigenvalues of [[1,2],[3,4]]');

// ML detection results are in metadata when available
console.log(result.metadata.domainDetected);   // 'MATH'
console.log(result.metadata.detectionMethod);  // 'semantic'
console.log(result.metadata.domainConfidence); // 0.87 (87%)
```

Direct Embedding Service Usage
```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

// Create service (lazy loads model)
const embedder = new UnifiedEmbeddingService();

// Check availability
if (await embedder.isAvailable()) {
  // Generate embeddings
  const embedding = await embedder.embed('Hello world');
  console.log(embedding?.dimensions); // 384

  // Compute similarity
  const similarity = await embedder.similarity('cat', 'kitten');
  console.log(similarity); // ~0.85 (high similarity)

  // Use caching for better performance
  const cache = new EmbeddingCache(embedder);
  const emb1 = await cache.getOrEmbed('query'); // Computes
  const emb2 = await cache.getOrEmbed('query'); // Cached!
}
```

How It Works
Model
Uses Xenova/bge-small-en-v1.5 (ONNX-converted BAAI/bge-small-en-v1.5):
- Size: ~40MB
- Dimensions: 384
- Inference: ~20-50ms per embedding
- MTEB Score: 91.8%
- Same as Python: Exact feature parity
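The model name above is the service's default. The optional modelName constructor argument (see API Reference) lets you pin it explicitly; this is a small illustrative sketch, not a required step:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Equivalent to the default; passing the name explicitly documents intent
// and guards against a future default change.
const embedder = new UnifiedEmbeddingService('Xenova/bge-small-en-v1.5');
```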
Semantic Domain Detection
Computes semantic similarity between the user query and per-domain exemplar texts (see the sketch after this list):
- Embed user query → 384-dim vector
- Compare to domain exemplars (8 per domain)
- Find highest similarity score
- Return domain with confidence
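A minimal sketch of that flow using only the public embedding API; the exemplar lists and cosine helper below are illustrative stand-ins, not the package's internal detector:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Illustrative exemplars; the real detector ships 8 per domain.
const EXEMPLARS: Record<string, string[]> = {
  MATH: ['Calculate the eigenvalues of a matrix', 'Solve this integral'],
  CODE: ['Write a function that parses JSON', 'Fix this TypeScript type error'],
};

// Cosine similarity over the raw Float32Array data of two embeddings.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function detectDomain(query: string) {
  const embedder = new UnifiedEmbeddingService();
  const q = await embedder.embed(query); // 384-dim vector
  if (!q) return null; // ML unavailable -> caller falls back to rules

  let best = { domain: 'GENERAL', confidence: 0 };
  for (const [domain, texts] of Object.entries(EXEMPLARS)) {
    for (const text of texts) {
      const e = await embedder.embed(text);
      if (!e) continue;
      const score = cosine(q.data, e.data);
      if (score > best.confidence) best = { domain, confidence: score };
    }
  }
  return best; // e.g. { domain: 'MATH', confidence: 0.87 }
}
```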
Graceful Fallback
If ML is unavailable (the model fails to load or the dependency is missing), detection degrades as follows (see the sketch after this list):
- ✅ Automatically falls back to rule-based detection
- ✅ All features continue to work
- ✅ No errors or crashes
- ⚠️ Slightly lower confidence (~60-75% vs 84-87%)
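The CascadeAgent handles this fallback internally. If you call the embedding service directly and want the same behavior, a wrapper along these lines works; the keyword regex and threshold are placeholders, not cascadeflow's actual rule-based detector:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

async function classify(query: string): Promise<{ domain: string; method: string }> {
  const embedder = new UnifiedEmbeddingService();

  if (await embedder.isAvailable()) {
    // Semantic path: compare against an exemplar and keep the score as confidence.
    const score = await embedder.similarity(query, 'solve this equation');
    if (score !== null && score > 0.6) return { domain: 'MATH', method: 'semantic' };
  }

  // Rule-based fallback (placeholder heuristic, lower confidence)
  if (/\b(integral|matrix|equation|derivative)\b/i.test(query)) {
    return { domain: 'MATH', method: 'rules' };
  }
  return { domain: 'GENERAL', method: 'rules' };
}
```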
Performance
Latency
- Cold start: ~200-500ms (model loading)
- Warm: ~20-50ms per embedding
- Cached: <1ms (request-scoped cache)
- Batch: ~30% faster than individual calls
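A quick sketch of the two cheaper paths, batching and the request-scoped cache, using only the documented API:

```typescript
import { UnifiedEmbeddingService, EmbeddingCache } from '@cascadeflow/ml';

const embedder = new UnifiedEmbeddingService();

// Batch known texts in one call instead of looping over embed();
// this is where the ~30% saving over individual calls comes from.
const vectors = await embedder.embedBatch(['query one', 'query two', 'query three']);
console.log(vectors?.length); // 3

// Request-scoped cache: repeated texts inside one request skip inference.
const cache = new EmbeddingCache(embedder);
await cache.getOrEmbed('same text'); // ~20-50ms (warm model)
await cache.getOrEmbed('same text'); // <1ms, served from cache
console.log(cache.cacheSize()); // 1
```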
Accuracy
Domain detection confidence:
- ML semantic: 84-87% (complex domains)
- Rule-based fallback: 60-75%
- Improvement: 15-20% higher confidence
Tested on domains: MATH, CODE, DATA, STRUCTURED, REASONING
Browser Support
Works in modern browsers with:
- WebAssembly support
- Sufficient memory (~100MB for model)
- ES2020+ JavaScript support
Tested on:
- ✅ Chrome 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Edge 90+
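In a browser app, a dynamic import keeps the package (and the ~40MB model download it triggers) off the critical rendering path; this is an illustrative pattern, not a required setup:

```typescript
// Load lazily so the initial page render is not blocked by the model download.
const { UnifiedEmbeddingService } = await import('@cascadeflow/ml');

const embedder = new UnifiedEmbeddingService();
if (await embedder.isAvailable()) {
  const embedding = await embedder.embed('user query');
  console.log(embedding?.dimensions); // 384
}
```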
Edge Functions
Supported edge runtimes:
- ✅ Vercel Edge Functions
- ✅ Cloudflare Workers
- ✅ Netlify Edge Functions
- ⚠️ AWS Lambda@Edge (check memory limits)
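A minimal Cloudflare Worker-style handler using module-worker syntax; treat it as a sketch, since bundler configuration and memory limits vary per runtime:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Constructed once per isolate so the loaded model is reused across requests.
const embedder = new UnifiedEmbeddingService();

export default {
  async fetch(request: Request): Promise<Response> {
    const { text } = (await request.json()) as { text: string };
    const embedding = await embedder.embed(text);
    return Response.json({
      available: embedding !== null,
      dimensions: embedding?.dimensions ?? null,
    });
  },
};
```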
API Reference
UnifiedEmbeddingService
```typescript
class UnifiedEmbeddingService {
  constructor(modelName?: string);
  isAvailable(): Promise<boolean>;
  embed(text: string): Promise<EmbeddingVector | null>;
  embedBatch(texts: string[]): Promise<EmbeddingVector[] | null>;
  similarity(text1: string, text2: string): Promise<number | null>;
}
```

EmbeddingCache
```typescript
class EmbeddingCache {
  constructor(embedder: UnifiedEmbeddingService);
  getOrEmbed(text: string): Promise<EmbeddingVector | null>;
  similarity(text1: string, text2: string): Promise<number | null>;
  clear(): void;
  cacheSize(): number;
  cacheInfo(): { size: number; texts: string[] };
}
```

EmbeddingVector
```typescript
interface EmbeddingVector {
  data: Float32Array;
  dimensions: number;
}
```

Troubleshooting
Model Loading Fails
```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

// Check if ML is available
const embedder = new UnifiedEmbeddingService();
const available = await embedder.isAvailable();

if (!available) {
  console.log('ML not available, using rule-based detection');
  // App continues to work with fallback
}
```

Memory Issues
The model requires ~100MB memory. For constrained environments:
- Use rule-based detection (no ML package)
- Implement model lazy loading
- Consider server-side ML service
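The "lazy loading" suggestion above can be as simple as deferring construction until the first request that actually needs embeddings; the helper name below is hypothetical:

```typescript
import { UnifiedEmbeddingService } from '@cascadeflow/ml';

let embedder: UnifiedEmbeddingService | null = null;

// Only pay the ~100MB model cost on the code path that needs embeddings.
async function getEmbedder(): Promise<UnifiedEmbeddingService | null> {
  if (!embedder) {
    const candidate = new UnifiedEmbeddingService();
    embedder = (await candidate.isAvailable()) ? candidate : null;
  }
  return embedder; // null -> caller stays on rule-based detection
}
```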
Slow First Load
Model download (~40MB) happens once on first use. To preload:
```typescript
const embedder = new UnifiedEmbeddingService();
await embedder.embed('warmup query'); // Triggers model download
```

Comparison with Python
| Feature | Python | TypeScript | Notes |
|---------|--------|------------|-------|
| Model | FastEmbed | Transformers.js | Same BGE-small-en-v1.5 |
| Confidence | 84-87% | 84-87% | ✅ Parity |
| Latency | ~20-30ms | ~20-50ms | Similar |
| Size | ~40MB | ~40MB | Same |
| Fallback | ✅ | ✅ | Both graceful |
Result: Feature parity achieved! 🎉
Examples
See packages/core/examples/nodejs/production-patterns.ts for a complete production example that demonstrates ML-based semantic detection and validation.
License
MIT
Support
- Documentation: https://github.com/lemony-ai/cascadeflow
- Issues: https://github.com/lemony-ai/cascadeflow/issues
- Discussions: https://github.com/lemony-ai/cascadeflow/discussions
