@mullion/ai-sdk
v0.3.0
Mullion integration with Vercel AI SDK
Installation
```bash
npm install @mullion/ai-sdk ai zod
```
Overview
This package provides a seamless integration between Mullion and the Vercel AI SDK, enabling type-safe context management for LLM operations with automatic confidence tracking, provider-aware caching, and cost estimation.
Features
- ✅ Type-safe contexts - Full Mullion `Owned<T, S>` integration
- ✅ Automatic confidence scoring - Based on finish reasons
- ✅ Provider-aware caching - Anthropic/OpenAI optimizations
- ✅ Cost estimation & tracking - Pre-call estimates and actual costs
- ✅ Cache metrics - Hit rates, savings calculation
- ✅ Fork integration - Warmup strategies, schema conflict detection
- ✅ Safe-by-default caching - Never cache user content without opt-in
- ✅ TTL support - '5m', '1h', '1d' cache lifetimes
- ✅ All AI SDK providers - OpenAI, Anthropic, Google, custom
Quick Start
Basic Usage
```ts
import {createMullionClient} from '@mullion/ai-sdk';
import {openai} from '@ai-sdk/openai';
import {z} from 'zod';
// Create a client with your preferred model
const client = createMullionClient(openai('gpt-4'));
// Define your data schema
const EmailSchema = z.object({
intent: z.enum(['support', 'sales', 'billing', 'general']),
urgency: z.enum(['low', 'medium', 'high']),
entities: z.array(z.string()).describe('Key entities mentioned'),
});
// Scoped LLM operations
const analysis = await client.scope('email-intake', async (ctx) => {
const result = await ctx.infer(EmailSchema, userEmail);
// Automatic confidence checking
if (result.confidence < 0.8) {
throw new Error('Low confidence - needs human review');
}
return ctx.use(result);
});
```
Multi-Scope Workflows
```ts
// Process data across different security contexts
const result = await client.scope('admin', async (adminCtx) => {
// Admin scope: access sensitive data
const adminData = await adminCtx.infer(DataSchema, sensitiveInput);
return await client.scope('user', async (userCtx) => {
// User scope: safe for customer-facing operations
const bridged = userCtx.bridge(adminData); // ✅ Explicit bridge
const response = await userCtx.infer(ResponseSchema, bridged.value);
return userCtx.use(response);
});
});
```
Supported Providers
Works with all Vercel AI SDK providers:
OpenAI
```ts
import {openai} from '@ai-sdk/openai';
const client = createMullionClient(openai('gpt-4'));
// or
const client = createMullionClient(openai('gpt-3.5-turbo'));
```
Anthropic
```ts
import {anthropic} from '@ai-sdk/anthropic';
const client = createMullionClient(anthropic('claude-3-5-sonnet-20241022'));
```
Google
```ts
import {google} from '@ai-sdk/google';
const client = createMullionClient(google('gemini-1.5-pro'));
```
Custom Providers
```ts
import {createOpenAI} from '@ai-sdk/openai';
const customProvider = createOpenAI({
apiKey: process.env.CUSTOM_API_KEY,
baseURL: 'https://your-custom-endpoint.com/v1',
});
const client = createMullionClient(customProvider('your-model'));
```
Features
Automatic Confidence Scoring
Confidence is automatically extracted from LLM finish reasons:
```ts
const result = await ctx.infer(schema, input);
// Confidence mapping:
// stop: 1.0 - Model completed naturally
// tool-calls: 0.95 - Model made tool calls
// length: 0.75 - Output truncated due to token limit
// content-filter: 0.6 - Content was filtered
// other: 0.5 - Unknown reason
// error: 0.3 - Error occurred
console.log(`Confidence: ${result.confidence}`);
```
Schema Integration
Full Zod schema support with type inference:
```ts
const ProductSchema = z.object({
name: z.string().describe('Product name'),
price: z.number().positive().describe('Price in USD'),
category: z.enum(['electronics', 'clothing', 'books']),
features: z.array(z.string()).optional(),
});
const product = await ctx.infer(ProductSchema, productDescription);
// product.value is fully typed as ProductSchema's inferred type
```
Inference Options
Customize LLM behavior:
```ts
const result = await ctx.infer(schema, input, {
temperature: 0.7,
maxTokens: 500,
systemPrompt: 'You are a helpful assistant specialized in data extraction.',
});
```
Caching
Provider-aware caching with safe-by-default behavior and automatic optimization.
Basic Caching
```ts
const result = await client.scope('analysis', async (ctx) => {
// Add cacheable content
ctx.cache.addSystemPrompt('You are an expert data analyst.');
ctx.cache.addDeveloperContent(largeDocument, {
ttl: '5m', // Time-to-live: '5m' | '1h' | '1d'
scope: 'ephemeral', // or 'persistent'
});
// This inference benefits from caching on repeat calls
const analysis = await ctx.infer(AnalysisSchema, 'Analyze this data');
// Check cache performance
const stats = await ctx.getCacheStats();
console.log(`Cache hits: ${stats.cacheReadTokens} tokens`);
console.log(`Saved: $${stats.estimatedSavings.toFixed(4)}`);
return ctx.use(analysis);
});
```
Cache Segments API
```ts
// System prompts (always safe to cache)
ctx.cache.addSystemPrompt('You are a helpful assistant');
// Developer content (your content, safe to cache)
ctx.cache.addDeveloperContent(documentation, {
ttl: '1h',
scope: 'persistent',
});
// User content (requires explicit opt-in)
ctx.cache.addDeveloperContent(userQuery, {
scope: 'allow-user-content', // ⚠️ Only if safe!
ttl: '5m',
});
```
Provider-Specific Features:
| Provider  | Min Tokens | TTL Options | Auto-Cache      |
| --------- | ---------- | ----------- | --------------- |
| Anthropic | 1024-4096  | 5m, 1h, 1d  | No (explicit)   |
| OpenAI    | 1024       | 1h (fixed)  | Yes (automatic) |
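These limits can also be queried at runtime with the cache utilities listed in the API Reference below. A minimal sketch; the shape of the returned capabilities object is an assumption, so treat it as illustrative:
```ts
import {getCacheCapabilities, isValidTtl} from '@mullion/ai-sdk';

// Assumption: getCacheCapabilities returns a CacheCapabilities object
// describing the provider's limits (exact field names may differ).
const caps = getCacheCapabilities('anthropic', 'claude-3-5-sonnet-20241022');
console.log(caps);

// Validate a TTL string before building cache segments.
if (!isValidTtl('5m')) {
  throw new Error('Unsupported TTL');
}
```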
Learn more: See docs/reference/caching.md
Cost Estimation
Track and predict LLM costs before and after API calls.
Pre-Call Estimation
```ts
const estimate = await ctx.estimateNextCallCost(schema, input);
console.log(`Estimated cost: $${estimate.totalCost.toFixed(4)}`);
console.log(`Input tokens: ${estimate.inputTokens}`);
console.log(`Expected output tokens: ${estimate.outputTokens}`);
if (estimate.totalCost > 0.1) {
console.warn('High cost operation!');
}
```
Post-Call Tracking
```ts
const result = await ctx.infer(schema, input);
const actual = await ctx.getLastCallCost();
console.log(`Actual cost: $${actual.totalCost.toFixed(4)}`);
console.log(`Cache saved: $${actual.cacheSavings.toFixed(4)}`);
console.log(`Net cost: $${actual.netCost.toFixed(4)}`);
// Compare the pre-call estimate (from estimateNextCallCost above) with the actual cost
const diff = actual.totalCost - estimate.totalCost;
console.log(`Difference: $${diff.toFixed(4)}`);
```
Token Estimation
```ts
import {estimateTokens} from '@mullion/ai-sdk';
const estimate = estimateTokens(text, 'gpt-4');
console.log(`${estimate.tokens} tokens (${estimate.method})`);
```
Pricing API
```ts
import {getPricing, PRICING_DATA} from '@mullion/ai-sdk';
const pricing = getPricing('claude-3-5-sonnet-20241022');
console.log(`Input: $${pricing.inputTokenPrice}/token`);
console.log(`Output: $${pricing.outputTokenPrice}/token`);
console.log(`Cache write: $${pricing.cacheWritePrice}/token`);
console.log(`Cache read: $${pricing.cacheReadPrice}/token`);
// Custom pricing
PRICING_DATA['custom-model'] = {
modelId: 'custom-model',
provider: 'custom',
inputTokenPrice: 0.000002,
outputTokenPrice: 0.000008,
};
```
Learn more: See docs/reference/cost-estimation.md
Fork/Merge Integration
Parallel execution with cache optimization and cost tracking.
Warmup Strategies
```ts
const result = await ctx.fork({
branches: {
model1: (c) => c.infer(schema, prompt),
model2: (c) => c.infer(schema, prompt),
model3: (c) => c.infer(schema, prompt),
},
strategy: 'cache-optimized',
warmup: 'first-branch', // Prime cache with first branch
});
// Aggregate cache stats
const stats = await Promise.all(
Object.values(result).map((r) => r.context.getCacheStats()),
);
```
Schema Conflict Detection
```ts
import {detectSchemaConflict} from '@mullion/ai-sdk';
const result = await ctx.fork({
branches: {
simple: (c) => c.infer(SimpleSchema, prompt),
complex: (c) => c.infer(ComplexSchema, prompt), // Different schema!
},
onSchemaConflict: 'warn', // or 'error', 'ignore'
});
// Console warning: "Schema conflict detected - limited cache reuse"
```
Best Practice: Use unified schemas for fork branches:
```ts
// ✅ GOOD: Same schema, full cache reuse
const UnifiedSchema = z.object({
analysisA: SchemaA,
analysisB: SchemaB,
});
const result = await ctx.fork({
branches: {
a: (c) => c.infer(UnifiedSchema, prompt),
b: (c) => c.infer(UnifiedSchema, prompt),
},
});
```
Learn more: See docs/reference/fork.md
Advanced Examples
Error Handling with Confidence
```ts
async function processWithConfidence<T>(
ctx: Context<string>,
schema: z.ZodType<T>,
input: string,
minConfidence = 0.8,
): Promise<T> {
const result = await ctx.infer(schema, input);
if (result.confidence < minConfidence) {
throw new Error(
`Low confidence: ${result.confidence.toFixed(2)} < ${minConfidence}. ` +
`Trace ID: ${result.traceId}`,
);
}
return ctx.use(result);
}
```
Multi-Step Processing
```ts
const analysis = await client.scope('analysis', async (ctx) => {
// Step 1: Extract entities
const entities = await ctx.infer(EntitiesSchema, rawText);
// Step 2: Classify sentiment
const sentiment = await ctx.infer(SentimentSchema, rawText);
// Step 3: Combine results
if (entities.confidence > 0.8 && sentiment.confidence > 0.8) {
return {
entities: ctx.use(entities),
sentiment: ctx.use(sentiment),
};
} else {
throw new Error('Insufficient confidence for analysis');
}
});
```
Bridging Complex Data
```ts
const pipeline = await client.scope('ingestion', async (ingestCtx) => {
const rawData = await ingestCtx.infer(RawSchema, input);
return await client.scope('processing', async (processCtx) => {
const bridged = processCtx.bridge(rawData);
return await client.scope('output', async (outputCtx) => {
const processed = outputCtx.bridge(bridged);
const final = await outputCtx.infer(OutputSchema, processed.value);
// final.__scope is 'ingestion' | 'processing' | 'output'
return outputCtx.use(final);
});
});
});
```
API Reference
Client & Context
createMullionClient(model, options?)
- Creates Mullion client with AI SDK integration
- Returns: `MullionClient` with `scope()` method
MullionClient.scope<S, R>(name, fn)
- Creates scoped execution context
- Returns: `Promise<R>`
Context<S>.infer<T>(schema, input, options?)
- Infer structured data using LLM
- Returns: `Promise<Owned<T, S>>`
Context<S>.bridge<T, OS>(owned)
- Transfer value from another scope
- Returns: `Owned<T, S | OS>`
Context<S>.use<T>(owned)
- Extract raw value (scope-safe)
- Returns: `T`
Caching
Context Methods:
- `ctx.cache.addSystemPrompt(content)` - Add system prompt to cache
- `ctx.cache.addDeveloperContent(content, options)` - Add developer content
- `ctx.getCacheStats()` - Get cache performance metrics
Utilities:
- `getCacheCapabilities(provider, model)` - Get provider cache capabilities
- `supportsCacheFeature(provider, feature)` - Check feature support
- `isValidTtl(ttl)` - Validate TTL string
- `validateTtlOrdering(segments)` - Validate TTL ordering
- `createAnthropicAdapter(options)` - Create Anthropic adapter
- `createOpenAIAdapter(options)` - Create OpenAI adapter
- `createCacheSegmentManager(options)` - Create cache manager
- `parseAnthropicMetrics(response)` - Parse Anthropic metrics
- `parseOpenAIMetrics(response)` - Parse OpenAI metrics
- `aggregateCacheMetrics(stats)` - Aggregate metrics
- `estimateCacheSavings(stats, pricing)` - Estimate savings
Types:
`CacheSegmentManager`, `CacheSegment`, `CacheConfig`, `CacheStats`, `CacheCapabilities`, `CacheScope`, `CacheTTL`
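The metrics helpers above aren't demonstrated elsewhere in this README. A hedged sketch of how they might compose, assuming `aggregateCacheMetrics` folds per-scope stats into one `CacheStats` and `estimateCacheSavings` returns a dollar amount:
```ts
import {
  aggregateCacheMetrics,
  estimateCacheSavings,
  getPricing,
  type CacheStats,
} from '@mullion/ai-sdk';

// Assumption: per-scope stats come from ctx.getCacheStats() and
// estimateCacheSavings returns a number in USD.
function reportSavings(perScopeStats: CacheStats[]): void {
  const combined = aggregateCacheMetrics(perScopeStats);
  const savings = estimateCacheSavings(combined, getPricing('claude-3-5-sonnet-20241022'));
  console.log(`Estimated cache savings: $${savings.toFixed(4)}`);
}
```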
Cost Estimation
Context Methods:
- `ctx.estimateNextCallCost(schema, input, options?)` - Estimate before call
- `ctx.getLastCallCost()` - Get actual cost after call
Token Estimation:
- `estimateTokens(text, model?)` - Estimate token count
- `estimateTokensForSegments(segments, model)` - Estimate for segments
Pricing:
- `getPricing(modelId)` - Get pricing for model
- `getAllPricing()` - Get all pricing data
- `getPricingByProvider(provider)` - Get provider pricing
- `PRICING_DATA` - Global pricing object
- `exportPricingAsJSON()` - Export pricing
- `importPricingFromJSON(data)` - Import pricing
Cost Calculation:
- `calculateCost(params)` - Calculate cost from usage
- `estimateCost(params)` - Estimate cost
- `calculateBatchCost(calls)` - Calculate batch costs
- `formatCostBreakdown(cost)` - Format for display
- `compareCosts(estimated, actual)` - Compare costs
Types:
`TokenEstimate`, `ModelPricing`, `CostBreakdown`, `TokenUsage`
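None of the cost-calculation helpers are shown in the feature sections above. A minimal sketch; the params object passed to `calculateCost` is an assumption (illustrative field names, not the documented signature):
```ts
import {calculateCost, formatCostBreakdown} from '@mullion/ai-sdk';

// Assumption: calculateCost accepts a model id plus token counts and
// returns a CostBreakdown suitable for formatCostBreakdown.
const breakdown = calculateCost({
  modelId: 'gpt-4',
  inputTokens: 1200,
  outputTokens: 350,
});
console.log(formatCostBreakdown(breakdown));
```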
Fork/Merge Integration
Warmup:
- `explicitWarmup(config)` - Explicit cache warmup
- `firstBranchWarmup(branches)` - First-branch warmup
- `createWarmupExecutor(config)` - Create warmup executor
- `setupWarmupExecutor(config)` - Setup global executor
- `estimateWarmupCost(config)` - Estimate warmup cost
- `shouldWarmup(estimate)` - Warmup recommendation
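A sketch of gating warmup on its estimated cost; the config fields passed to `estimateWarmupCost` and the boolean return of `shouldWarmup` are assumptions:
```ts
import {estimateWarmupCost, shouldWarmup} from '@mullion/ai-sdk';

// Assumption: the config mirrors the fork() warmup options shown earlier
// and shouldWarmup turns the estimate into a go/no-go recommendation.
const warmupEstimate = estimateWarmupCost({
  warmup: 'first-branch',
  strategy: 'cache-optimized',
});
if (!shouldWarmup(warmupEstimate)) {
  console.warn('Warmup not worth the cost for this fork');
}
```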
Schema Conflicts:
- `computeSchemaSignature(schema)` - Compute schema hash
- `detectSchemaConflict(branches, options)` - Detect conflicts
- `handleSchemaConflict(conflict, behavior)` - Handle conflict
- `areSchemasCompatible(schemaA, schemaB)` - Check compatibility
- `describeSchemasDifference(schemaA, schemaB)` - Describe diff
Types:
`WarmupConfig`, `WarmupResult`, `SchemaInfo`, `DetectSchemaConflictOptions`, `DetailedSchemaConflictResult`
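Branch schemas can also be checked up front, instead of relying on fork's onSchemaConflict option. A sketch assuming `computeSchemaSignature` returns a stable string and `areSchemasCompatible` returns a boolean:
```ts
import {areSchemasCompatible, computeSchemaSignature} from '@mullion/ai-sdk';
import {z} from 'zod';

const SimpleSchema = z.object({summary: z.string()});
const ComplexSchema = z.object({summary: z.string(), score: z.number()});

// Assumption: identical signatures imply full cache reuse across branches.
console.log(computeSchemaSignature(SimpleSchema));
if (!areSchemasCompatible(SimpleSchema, ComplexSchema)) {
  console.warn('Fork branches will get limited cache reuse');
}
```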
Confidence Scoring
- `extractConfidenceFromFinishReason(reason)` - Extract confidence
Confidence Mapping:
- `stop`: 1.0 - Model completed naturally
- `tool-calls`: 0.95 - Model made tool calls
- `length`: 0.75 - Truncated by token limit
- `content-filter`: 0.6 - Content filtered
- `other`: 0.5 - Unknown reason
- `error`: 0.3 - Error occurred
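For pipelines that bypass ctx.infer, the same mapping is exposed directly; a short sketch assuming the finish-reason strings above are the accepted inputs:
```ts
import {extractConfidenceFromFinishReason} from '@mullion/ai-sdk';

// Maps an AI SDK finish reason onto the confidence table above.
const confidence = extractConfidenceFromFinishReason('length');
if (confidence < 0.8) {
  console.warn('Truncated output - consider raising maxTokens and retrying');
}
```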
Related Packages
- @mullion/core - Core types, fork/merge, tracing
- @mullion/eslint-plugin - Static analysis
Documentation
- Caching Guide - Complete caching documentation
- Cost Estimation - Cost tracking guide
- Fork API - Parallel execution guide
- Examples - Working examples
Integration with ESLint
Use with @mullion/eslint-plugin for compile-time leak detection:
```bash
npm install @mullion/eslint-plugin --save-dev
```
```js
// eslint.config.js
import mullion from '@mullion/eslint-plugin';
export default [
{
plugins: {'@mullion': mullion},
rules: {
'@mullion/no-context-leak': 'error',
'@mullion/require-confidence-check': 'warn',
},
},
];
```
Examples
See the examples directory for complete implementations:
- Basic Example - Core concepts demonstration
- Integration test instructions in docs/contributing/integration-tests.md
Contributing
Found a bug or want to contribute? See CONTRIBUTING.md for guidelines.
License
MIT - see LICENSE for details.
