@cascadeflow/langchain

v1.1.0

LangChain integration for cascadeflow - Add intelligent cost optimization to your LangChain models

Downloads

370

LangChain integration for CascadeFlow - Add intelligent cost optimization to your existing LangChain models without reconfiguration.

Features

  • 🔄 Zero Code Changes - Wrap your existing LangChain models, no refactoring needed
  • 💰 Automatic Cost Optimization - Save 40-60% on LLM costs through intelligent cascading
  • 🎯 Quality-Based Routing - Only escalate to expensive models when quality is insufficient
  • 📊 Full Visibility - Track costs, quality scores, and cascade decisions
  • 🔗 Chainable - All LangChain methods (bind(), bindTools(), etc.) work seamlessly
  • 📈 LangSmith Ready - Automatic cost metadata injection for observability
  • 🧭 Domain Policies - Per-domain threshold/routing overrides (qualityThreshold, forceVerifier, directToVerifier)
  • 🔁 CascadeAgent - Built-in closed-loop tool agent for multi-turn execution with max-step protection

Installation

npm install @cascadeflow/langchain @langchain/core
# or
pnpm add @cascadeflow/langchain @langchain/core
# or
yarn add @cascadeflow/langchain @langchain/core

Quick Start

import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
import { withCascade } from '@cascadeflow/langchain';

// Step 1: Configure your existing models (no changes needed!)
const drafter = new ChatOpenAI({
  model: 'gpt-5-mini',  // Fast, cheap model ($0.25/$2 per 1M tokens)
  temperature: 0.7
});

const verifier = new ChatAnthropic({
  model: 'claude-opus-4-6',  // Accurate, expensive model ($15/$75 per 1M tokens)
  temperature: 0.7
});

// Step 2: Wrap with cascade (a single withCascade call)
const cascadeModel = withCascade({
  drafter,
  verifier,
  qualityThreshold: 0.7,  // Quality bar for accepting drafter responses
});

// Step 3: Use like any LangChain model!
const result = await cascadeModel.invoke("What is TypeScript?");
console.log(result.content);

// Step 4: Check cascade statistics (getLastCascadeResult() may return undefined)
const stats = cascadeModel.getLastCascadeResult();
if (stats) {
  console.log(`Model used: ${stats.modelUsed}`);
  console.log(`Cost: $${stats.totalCost.toFixed(6)}`);
  console.log(`Savings: ${stats.savingsPercentage.toFixed(1)}%`);
}

// Optional: Enable LangSmith tracing (see traces at https://smith.langchain.com)
// Set LANGSMITH_API_KEY, LANGSMITH_PROJECT, LANGSMITH_TRACING=true
// Your ChatOpenAI/ChatAnthropic models will appear in LangSmith with cascade metadata

How It Works

CascadeFlow uses speculative execution to optimize costs:

  1. Try Drafter First - Executes the cheap, fast model
  2. Quality Check - Validates the response quality using heuristics or custom validators
  3. Cascade if Needed - Only calls the expensive model if quality is below threshold
  4. Track Everything - Records costs, latency, and cascade decisions

This approach provides:

  • No Latency Penalty - When the drafter's response passes the quality check, it is returned without a verifier call
  • Quality Guarantee - Verifier ensures high-quality responses for complex queries
  • Cost Savings - 40-60% reduction in API costs on average
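The four steps above can be sketched as a single function. This is a minimal illustration, not the library's implementation: the `Model`, `Validator`, and `SketchResult` types and the `cascadeOnce` function are hypothetical, and accepting a draft when its score is greater than or equal to the threshold is an assumption.

```typescript
// Hypothetical stand-ins for illustration; these are NOT the library's types.
type Model = (prompt: string) => Promise<string>;
type Validator = (text: string) => number | Promise<number>;

interface SketchResult {
  content: string;
  modelUsed: 'drafter' | 'verifier';
  accepted: boolean;
  drafterQuality: number;
}

async function cascadeOnce(
  prompt: string,
  drafter: Model,
  verifier: Model,
  validate: Validator,
  qualityThreshold = 0.7,
): Promise<SketchResult> {
  // 1. Try the cheap model first.
  const draft = await drafter(prompt);
  // 2. Score the draft on a 0-1 scale.
  const drafterQuality = await validate(draft);
  // 3. Accept the draft if it meets the threshold (assumed to be inclusive)...
  if (drafterQuality >= qualityThreshold) {
    return { content: draft, modelUsed: 'drafter', accepted: true, drafterQuality };
  }
  // ...otherwise escalate to the expensive model.
  const content = await verifier(prompt);
  return { content, modelUsed: 'verifier', accepted: false, drafterQuality };
}
```

The verifier is only ever called on the rejection path, which is where the cost savings come from.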

Configuration

Basic Configuration

const cascadeModel = withCascade({
  drafter: new ChatOpenAI({ model: 'gpt-5-mini' }),
  verifier: new ChatAnthropic({ model: 'claude-opus-4-6' }),
  qualityThreshold: 0.7,  // Default: 0.7 (70%)
});

Custom Quality Validator

const cascadeModel = withCascade({
  drafter,
  verifier,
  qualityValidator: async (response) => {
    // Custom logic - return quality score 0-1
    const text = response.generations[0].text;

    // Example: Use length and keywords
    const hasKeywords = ['typescript', 'javascript'].some(kw =>
      text.toLowerCase().includes(kw)
    );

    return text.length > 50 && hasKeywords ? 0.9 : 0.4;
  },
});

Disable Cost Tracking

const cascadeModel = withCascade({
  drafter,
  verifier,
  enableCostTracking: false,  // Disable metadata injection
});

Domain Policies

Use domain-specific routing rules without changing your chain code:

const cascadeModel = withCascade({
  drafter,
  verifier,
  qualityThreshold: 0.7,
  domainPolicies: {
    finance: { qualityThreshold: 0.5 }, // Easier acceptance for finance queries
    medical: { forceVerifier: true }, // Always verify after drafting
    legal: { directToVerifier: true }, // Skip drafter entirely
  },
});

const legalCascade = cascadeModel.bind({
  metadata: { cascadeflow_domain: "legal" },
});

const result = await legalCascade.invoke("Review this contract clause");
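One way to read the three policy fields is as a routing decision made before the drafter runs. The sketch below is an assumption about how such a lookup could behave: the field names come from the example above, but `resolvePlan`, `Plan`, and the route labels are hypothetical, and the precedence of `directToVerifier` over `forceVerifier` is a guess.

```typescript
// Field names follow the README; the resolution logic itself is an assumption.
interface DomainPolicy {
  qualityThreshold?: number;
  forceVerifier?: boolean;
  directToVerifier?: boolean;
}

type Route = 'verifier-only' | 'draft-then-verify' | 'draft-then-check';

interface Plan {
  route: Route;
  qualityThreshold: number;
}

function resolvePlan(
  domain: string | undefined,
  policies: Record<string, DomainPolicy>,
  defaultThreshold = 0.7,
): Plan {
  // Unknown or missing domains fall back to the global threshold.
  const policy = domain !== undefined ? policies[domain] : undefined;
  const qualityThreshold = policy?.qualityThreshold ?? defaultThreshold;
  // Skip the drafter entirely.
  if (policy?.directToVerifier) return { route: 'verifier-only', qualityThreshold };
  // Draft first, then always run the verifier.
  if (policy?.forceVerifier) return { route: 'draft-then-verify', qualityThreshold };
  // Default cascade: draft, score, escalate only if needed.
  return { route: 'draft-then-check', qualityThreshold };
}
```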

Advanced Usage

Streaming Responses

CascadeFlow supports real-time streaming with optimistic drafter execution:

const cascade = withCascade({
  drafter: new ChatOpenAI({ model: 'gpt-4o-mini' }),
  verifier: new ChatOpenAI({ model: 'gpt-4o' }),
});

// Stream responses in real-time
const stream = await cascade.stream('Explain TypeScript');

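for await (const chunk of stream) {
  // chunk.content can be a string or an array of content blocks; print text only
  if (typeof chunk.content === 'string') {
    process.stdout.write(chunk.content);
  }
}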

How Streaming Works:

  1. Optimistic Streaming (text-only) - Drafter response streams immediately (user sees output in real-time)
  2. Quality Check - After drafter completes, quality is validated
  3. Optional Cascade - If quality is insufficient, verifier output is streamed; switch notices are off by default and can be enabled via metadata (cascadeflow_emit_switch_message)
  4. Tool-safe Streaming - When tools are bound with bindTools(...), output is buffered until final routing so tool-call deltas stay consistent

For queries the drafter can handle, the user sees output as soon as the drafter starts streaming, with no added wait for the verifier.
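The text-only streaming path (steps 1 to 3) can be sketched with async generators. This is an illustration under stated assumptions, not the library's code: `StreamingModel` and `optimisticStream` are hypothetical names, switch notices and tool buffering are left out, and the threshold comparison is assumed to be inclusive.

```typescript
// Hypothetical stand-in; not the library's type.
type StreamingModel = (prompt: string) => AsyncIterable<string>;

async function* optimisticStream(
  prompt: string,
  drafter: StreamingModel,
  verifier: StreamingModel,
  validate: (text: string) => number,
  qualityThreshold = 0.7,
): AsyncGenerator<string> {
  let draft = '';
  // 1. Forward drafter chunks to the caller as they arrive.
  for await (const chunk of drafter(prompt)) {
    draft += chunk;
    yield chunk;
  }
  // 2. Score the completed draft; stop here if it is good enough.
  if (validate(draft) >= qualityThreshold) return;
  // 3. Draft rejected: stream the verifier's answer after it.
  for await (const chunk of verifier(prompt)) {
    yield chunk;
  }
}
```

Because the draft has already reached the caller by the time it is scored, a rejected draft cannot be unsent, which is presumably why the README mentions optional switch notices.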

Chaining with bind()

All LangChain chainable methods work seamlessly:

const cascadeModel = withCascade({ drafter, verifier });

// bind() works
const boundModel = cascadeModel.bind({ temperature: 0.1 });
const result = await boundModel.invoke("Be precise");

// Chain multiple times
const doubleChained = cascadeModel
  .bind({ temperature: 0.5 })
  .bind({ maxTokens: 100 });

Tool Calling

const tools = [
  {
    name: 'calculator',
    description: 'Useful for math calculations',
    func: async (input: string) => {
      // Use a strict parser helper (see examples/nodejs/safe-math.ts).
      return safeCalculateExpression(input).toString();
    },
  },
];

const modelWithTools = cascadeModel.bindTools(tools);
const result = await modelWithTools.invoke("What is 25 * 4?");
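The example above relies on a `safeCalculateExpression` helper that is not shown here. The sketch below is a hypothetical stand-in, not the contents of examples/nodejs/safe-math.ts: a small recursive-descent parser that accepts only numbers, parentheses, and the four basic operators, so model-supplied input never reaches `eval`.

```typescript
// Hypothetical stand-in; the real helper in examples/nodejs/safe-math.ts may differ.
function safeCalculateExpression(input: string): number {
  // Reject digits separated only by whitespace ("1 2") before stripping spaces.
  if (/[\d.]\s+[\d.]/.test(input)) throw new Error('Malformed number');
  const src = input.replace(/\s+/g, '');
  // Allow only digits, decimal points, parentheses, and + - * /.
  if (src.length === 0 || !/^[0-9.+\-*/()]+$/.test(src)) {
    throw new Error('Unsupported characters in expression');
  }
  let pos = 0;

  // expression := term (('+' | '-') term)*
  function parseExpression(): number {
    let value = parseTerm();
    while (src[pos] === '+' || src[pos] === '-') {
      const op = src[pos++];
      const rhs = parseTerm();
      value = op === '+' ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (('*' | '/') factor)*
  function parseTerm(): number {
    let value = parseFactor();
    while (src[pos] === '*' || src[pos] === '/') {
      const op = src[pos++];
      const rhs = parseFactor();
      value = op === '*' ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := '-' factor | '(' expression ')' | number
  function parseFactor(): number {
    if (src[pos] === '-') {
      pos++;
      return -parseFactor();
    }
    if (src[pos] === '(') {
      pos++;
      const value = parseExpression();
      if (src[pos] !== ')') throw new Error('Missing closing parenthesis');
      pos++;
      return value;
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`Unexpected token at position ${pos}`);
    pos += match[0].length;
    return Number(match[0]);
  }

  const result = parseExpression();
  if (pos !== src.length) throw new Error(`Unexpected token at position ${pos}`);
  if (!Number.isFinite(result)) throw new Error('Result is not a finite number');
  return result;
}
```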

Structured Output

const schema = {
  name: 'person',
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      age: { type: 'number' },
    },
  },
};

const structuredModel = cascadeModel.withStructuredOutput(schema);
const result = await structuredModel.invoke("Extract: John is 30 years old");
// Result is a parsed object that follows the schema (name, age)

Agentic Tool Loops (CascadeAgent)

CascadeAgent adds a closed agent/tool loop with explicit max-step safety:

import { CascadeAgent, withCascade } from '@cascadeflow/langchain';

const cascadeModel = withCascade({
  drafter,
  verifier,
  domainPolicies: {
    legal: { directToVerifier: true },
    medical: { forceVerifier: true },
  },
});

const agent = new CascadeAgent({
  model: cascadeModel.bindTools(tools),
  maxSteps: 6,
  toolHandlers: {
    // Use the same strict parser helper (see examples/nodejs/safe-math.ts).
    calculator: async ({ expression }) => safeCalculateExpression(expression).toString(),
  },
});

const run = await agent.run(
  [{ role: 'user', content: 'What is (25 * 4) + 10?' }],
  { systemPrompt: 'You are a precise calculator assistant.' }
);

console.log(run.status, run.steps, run.message.content);

Input can be a string, a LangChain BaseMessage[], or a role/content message list for multi-turn conversations. CascadeAgent also supports multiple tool calls in a single step and keeps system prompts at the front of looped executions.
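The model/tool/model loop with a step cap can be sketched as follows. This is a simplified illustration, not CascadeAgent's implementation: `runLoop` and the `ToolCall`, `ModelTurn`, `Message`, and `ToolHandler` shapes are hypothetical, and system-prompt handling is omitted. The status values and the default of 8 steps mirror the API reference in this README.

```typescript
// Hypothetical shapes for illustration; not the library's types.
interface ToolCall { name: string; args: Record<string, unknown>; }
interface ModelTurn { content: string; toolCalls: ToolCall[]; }
interface Message { role: 'system' | 'user' | 'assistant' | 'tool'; content: string; }
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

async function runLoop(
  model: (messages: Message[]) => Promise<ModelTurn>,
  toolHandlers: Record<string, ToolHandler>,
  messages: Message[],
  maxSteps = 8,
) {
  const history = [...messages];
  let steps = 0;
  while (steps < maxSteps) {
    const turn = await model(history);
    steps += 1;
    history.push({ role: 'assistant', content: turn.content });
    // No tool calls means the model has produced its final answer.
    if (turn.toolCalls.length === 0) {
      return { status: 'completed' as const, steps, message: turn.content, messages: history };
    }
    // Run every requested tool (several may arrive in one step).
    for (const call of turn.toolCalls) {
      const handler = toolHandlers[call.name];
      const output = handler ? await handler(call.args) : `Unknown tool: ${call.name}`;
      history.push({ role: 'tool', content: output });
    }
  }
  // The cap was hit before the model stopped asking for tools.
  const last = history[history.length - 1];
  return { status: 'max_steps_reached' as const, steps, message: last?.content ?? '', messages: history };
}
```

The cap guarantees termination even if the model keeps requesting tools on every turn.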

Accessing Cascade Statistics

const result = await cascadeModel.invoke("Complex question");

const stats = cascadeModel.getLastCascadeResult();
if (stats) {  // undefined when no cascade result is available
  console.log({
    content: stats.content,
    modelUsed: stats.modelUsed,  // 'drafter' or 'verifier'
    accepted: stats.accepted,  // Was drafter response accepted?
    drafterQuality: stats.drafterQuality,  // 0-1 quality score
    drafterCost: stats.drafterCost,  // $ spent on drafter
    verifierCost: stats.verifierCost,  // $ spent on verifier
    totalCost: stats.totalCost,  // Total $ spent
    savingsPercentage: stats.savingsPercentage,  // % saved vs verifier-only
    latencyMs: stats.latencyMs,  // Total latency in ms
  });
}
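To make the cost fields concrete, here is a worked example of how a per-call cost and a "saved vs verifier-only" percentage could be computed. The formula is an assumption (the README does not define it), `callCost` and `savingsPercentage` are hypothetical helpers, and the prices are the per-1M-token figures from the Quick Start comments.

```typescript
// Prices are USD per 1M tokens (input, output), taken from the Quick Start comments.
interface Price { inputPerM: number; outputPerM: number; }

function callCost(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}

// Assumed definition: savings relative to sending the same call to the verifier only.
function savingsPercentage(totalCost: number, verifierOnlyCost: number): number {
  if (verifierOnlyCost <= 0) return 0;
  return ((verifierOnlyCost - totalCost) / verifierOnlyCost) * 100;
}

const drafterPrice: Price = { inputPerM: 0.25, outputPerM: 2 };
const verifierPrice: Price = { inputPerM: 15, outputPerM: 75 };

// Example call: 1,000 input tokens and 500 output tokens.
const draftCost = callCost(drafterPrice, 1000, 500);     // 0.00125
const verifierOnly = callCost(verifierPrice, 1000, 500); // 0.0525

// Draft accepted: only the drafter was paid for.
const savedWhenAccepted = savingsPercentage(draftCost, verifierOnly);
// Draft rejected: both models were paid for, so savings go slightly negative.
const savedWhenEscalated = savingsPercentage(draftCost + verifierOnly, verifierOnly);
```

Under these assumptions an accepted draft saves roughly 97.6% on that call, while an escalated call costs about 2.4% more than going to the verifier directly. The overall figure therefore depends on how often drafts are accepted.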

LangSmith Integration

CascadeFlow works seamlessly with LangSmith for observability and cost tracking.

What You'll See in LangSmith

When you enable LangSmith tracing, you'll see:

  1. Your Actual Chat Models - ChatOpenAI, ChatAnthropic, etc. appear as separate traces
  2. Cascade Metadata - Decision info attached to each response
  3. Token Usage & Costs - Server-side calculation by LangSmith
  4. Nested Traces - Parent CascadeFlow trace with child model traces

Enabling LangSmith

// Set environment variables
process.env.LANGSMITH_API_KEY = 'lsv2_pt_...';
process.env.LANGSMITH_PROJECT = 'your-project';
process.env.LANGSMITH_TRACING = 'true';

// Use CascadeFlow normally - tracing happens automatically
const cascade = withCascade({
  drafter: new ChatOpenAI({ model: 'gpt-5-mini' }),
  verifier: new ChatAnthropic({ model: 'claude-opus-4-6' }),
  costTrackingProvider: 'cascadeflow', // Default (local pricing)
});

const result = await cascade.invoke("Your query");

Viewing Traces

In your LangSmith dashboard (https://smith.langchain.com):

  • For cascaded queries (drafter response accepted) - You'll see only the drafter model trace (e.g., ChatOpenAI with gpt-5-mini)
  • For escalated queries - You'll see BOTH drafter AND verifier traces (e.g., ChatOpenAI gpt-5-mini + ChatAnthropic claude-opus-4-6)
  • Metadata location - Click any trace → Outputs → response_metadata → cascade

Example Metadata

{
  "cascade": {
    "cascade_decision": "cascaded",
    "model_used": "drafter",
    "drafter_quality": 0.85,
    "savings_percentage": 66.7,
    "drafter_cost": 0,      // Calculated by LangSmith
    "verifier_cost": 0,     // Calculated by LangSmith
    "total_cost": 0         // Calculated by LangSmith
  }
}

Note: costTrackingProvider: 'cascadeflow' (default) computes costs locally using CascadeFlow's pricebook. If you use costTrackingProvider: 'langsmith', costs are calculated server-side and shown in the LangSmith UI (local cost values will be $0).

See docs/COST_TRACKING.md for more details on cost tracking options.
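If you want to read the cascade metadata in code rather than in the dashboard, a defensive reader could look like the sketch below. The field names are copied from the example metadata; `readCascadeMetadata` is a hypothetical helper, and passing it a message's `response_metadata` object is an assumption based on the metadata location described above.

```typescript
// Field names are copied from the example metadata; the helper is hypothetical.
interface CascadeMetadata {
  cascade_decision: string;
  model_used: string;
  drafter_quality: number;
  savings_percentage: number;
}

function readCascadeMetadata(responseMetadata: unknown): CascadeMetadata | undefined {
  if (typeof responseMetadata !== 'object' || responseMetadata === null) return undefined;
  const cascade = (responseMetadata as Record<string, unknown>).cascade;
  if (typeof cascade !== 'object' || cascade === null) return undefined;
  const c = cascade as Record<string, unknown>;
  // Validate each field before trusting it, since the shape is not guaranteed.
  if (typeof c.cascade_decision !== 'string' || typeof c.model_used !== 'string') return undefined;
  if (typeof c.drafter_quality !== 'number' || typeof c.savings_percentage !== 'number') return undefined;
  return {
    cascade_decision: c.cascade_decision,
    model_used: c.model_used,
    drafter_quality: c.drafter_quality,
    savings_percentage: c.savings_percentage,
  };
}
```

Returning `undefined` for anything unexpected keeps callers safe when cost tracking is disabled and no metadata is attached.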

Supported Models

Works with any LangChain-compatible chat model:

OpenAI

import { ChatOpenAI } from '@langchain/openai';

const drafter = new ChatOpenAI({ model: 'gpt-5-mini' });
const verifier = new ChatOpenAI({ model: 'gpt-5' });

Anthropic

import { ChatAnthropic } from '@langchain/anthropic';

const drafter = new ChatAnthropic({ model: 'claude-haiku-4-5-20251001' });
const verifier = new ChatAnthropic({ model: 'claude-opus-4-6' });

Mix and Match (Recommended)

// Use different providers for optimal cost/quality balance!
const drafter = new ChatOpenAI({ model: 'gpt-5-mini' });
const verifier = new ChatAnthropic({ model: 'claude-opus-4-6' });

Cost Optimization Tips

  1. Choose Your Drafter Wisely - Use the cheapest model that can handle most queries

    • GPT-5-mini: $0.25/$2.00 per 1M tokens (input/output)
    • GPT-4o-mini: $0.15/$0.60 per 1M tokens (input/output)
    • Claude Haiku 4.5: $0.80/$4.00 per 1M tokens
  2. Tune Quality Threshold - Higher threshold = more escalations to the verifier = higher cost but better quality

    • 0.6 - Aggressive cost savings, may sacrifice some quality
    • 0.7 - Balanced (recommended default)
    • 0.8 - Conservative, ensures high quality
  3. Use Custom Validators - Domain-specific validation can improve accuracy

    qualityValidator: (response) => {
      const text = response.generations[0].text;
      // Check for specific requirements
      return hasRelevantKeywords(text) && meetsLengthRequirement(text) ? 0.9 : 0.5;
    }
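The validator in tip 3 calls `hasRelevantKeywords` and `meetsLengthRequirement`, which this README does not define. The sketch below shows one hypothetical way to write them; the keyword list and the 50-character minimum are borrowed from the Custom Quality Validator example and are assumptions, not library defaults.

```typescript
// Hypothetical helpers: the README names them but does not define them.
const REQUIRED_KEYWORDS = ['typescript', 'javascript'];
const MIN_LENGTH = 50;

function hasRelevantKeywords(text: string, keywords: string[] = REQUIRED_KEYWORDS): boolean {
  const lower = text.toLowerCase();
  return keywords.some((kw) => lower.includes(kw));
}

function meetsLengthRequirement(text: string, minLength: number = MIN_LENGTH): boolean {
  return text.trim().length > minLength;
}

// Same scoring rule as the tip: 0.9 when both checks pass, 0.5 otherwise.
function scoreText(text: string): number {
  return hasRelevantKeywords(text) && meetsLengthRequirement(text) ? 0.9 : 0.5;
}
```

Keeping the checks as plain functions of the text makes them easy to unit test without calling any model.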

Performance

Typical cascade behavior:

| Query Type | Drafter Hit Rate | Avg Latency | Cost Savings |
|-----------|------------------|-------------|--------------|
| Simple Q&A | 85% | 500ms | 55-65% |
| Complex reasoning | 40% | 1200ms | 20-30% |
| Code generation | 60% | 800ms | 35-45% |
| Overall | 70% | 700ms | 40-60% |
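The link between hit rate and savings can be approximated with a simple model. This is a back-of-the-envelope sketch, not how the figures in the table were produced: it assumes every query pays for the drafter, rejected drafts additionally pay the full verifier price, and both models see similar token counts. `expectedSavings` is a hypothetical helper.

```typescript
// Assumed model: every query pays for the drafter; rejected drafts also pay for the verifier.
// drafterToVerifierCostRatio is the drafter's per-query cost divided by the verifier's.
function expectedSavings(hitRate: number, drafterToVerifierCostRatio: number): number {
  // Average cost per query, expressed as a fraction of a verifier-only call.
  const relativeCost = drafterToVerifierCostRatio + (1 - hitRate);
  return 1 - relativeCost;
}

// A 70% hit rate with a drafter costing 10% of the verifier gives roughly 0.6 (60% savings).
const overall = expectedSavings(0.7, 0.1);
```

In this model savings equal the hit rate minus the cost ratio, so a low hit rate combined with a relatively expensive drafter can make savings negative.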

TypeScript Support

Full TypeScript support with type inference:

import type { CascadeConfig, CascadeResult } from '@cascadeflow/langchain';

const config: CascadeConfig = {
  drafter,
  verifier,
  qualityThreshold: 0.7,
};

const stats: CascadeResult | undefined = cascadeModel.getLastCascadeResult();

Examples

See the examples directory for complete working examples.

API Reference

withCascade(config: CascadeConfig): CascadeFlow

Creates a cascade-wrapped LangChain model.

Parameters:

  • config.drafter - The cheap, fast model
  • config.verifier - The accurate, expensive model
  • config.qualityThreshold? - Minimum quality to accept drafter (default: 0.7)
  • config.qualityValidator? - Custom function to calculate quality
  • config.enableCostTracking? - Enable LangSmith metadata injection (default: true)
  • config.costTrackingProvider? - 'cascadeflow' (default, local pricing) or 'langsmith' (server-side)
  • config.domainPolicies? - Per-domain overrides: qualityThreshold, forceVerifier, directToVerifier

Returns: CascadeFlow - A LangChain-compatible model with cascade logic

new CascadeAgent(config: CascadeAgentConfig)

Creates a closed-loop agent around a LangChain model (or directly from cascade config).

Parameters:

  • config.model? - Any LangChain chat model (often withCascade(...).bindTools(...))
  • config.cascade? - Optional CascadeConfig used to create an internal CascadeFlow
  • config.maxSteps? - Loop safety cap (default: 8)
  • config.toolHandlers? - Tool name to handler map

CascadeAgent.run(input, options?): Promise<CascadeAgentRunResult>

Runs model/tool/model loops until completion or maxSteps is reached.

Returns: CascadeAgentRunResult with:

  • message - Final AIMessage
  • messages - Full message history (including tool messages)
  • steps - Executed model turns
  • status - 'completed' | 'max_steps_reached'
  • toolCalls - Collected tool calls across steps

CascadeFlow.getLastCascadeResult(): CascadeResult | undefined

Returns statistics from the last cascade execution.

Returns: CascadeResult with:

  • content - The final response text
  • modelUsed - Which model provided the response ('drafter' | 'verifier')
  • accepted - Whether drafter response was accepted
  • drafterQuality - Quality score of drafter response (0-1)
  • drafterCost - Cost of drafter call
  • verifierCost - Cost of verifier call (0 if not used)
  • totalCost - Total cost
  • savingsPercentage - Percentage saved vs verifier-only
  • latencyMs - Total latency in milliseconds

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT © Lemony Inc.

Related