@providerprotocol/ai
UPP (Unified Provider Protocol): a unified TypeScript SDK for AI inference across multiple providers. One API for LLMs, embeddings, and image generation.
bun add @providerprotocol/ai
Quick Start
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
const claude = llm({ model: anthropic('claude-sonnet-4-20250514') });
const turn = await claude.generate('Hello!');
console.log(turn.response.text);
Providers
| Provider | Import | LLM | Embedding | Image |
|----------|--------|:---:|:---------:|:-----:|
| Anthropic | @providerprotocol/ai/anthropic | ✓ | | |
| OpenAI | @providerprotocol/ai/openai | ✓ | ✓ | ✓ |
| Google | @providerprotocol/ai/google | ✓ | ✓ | ✓ |
| xAI | @providerprotocol/ai/xai | ✓ | | ✓ |
| Ollama | @providerprotocol/ai/ollama | ✓ | ✓ | |
| OpenRouter | @providerprotocol/ai/openrouter | ✓ | ✓ | ✓ |
| Groq | @providerprotocol/ai/groq | ✓ | | |
| Cerebras | @providerprotocol/ai/cerebras | ✓ | | |
| Moonshot | @providerprotocol/ai/moonshot | ✓ | | |
| OpenResponses | @providerprotocol/ai/responses | ✓ | | |
API keys are loaded automatically from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY, CEREBRAS_API_KEY, MOONSHOT_API_KEY, etc.).
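If you prefer not to rely on auto-detected environment variables, the apiKey option in config (documented under Configuration below) accepts an explicit value; a minimal sketch, where the alternate variable name is an arbitrary example:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
// Pass the key explicitly instead of relying on ANTHROPIC_API_KEY
const claude = llm({
  model: anthropic('claude-sonnet-4-20250514'),
  config: { apiKey: process.env.MY_ANTHROPIC_KEY! },
});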
LLM
Streaming
const stream = claude.stream('Count to 5');
for await (const event of stream) {
if (event.type === 'text_delta') {
process.stdout.write(event.delta.text);
}
}
const turn = await stream.turn;
Stream results are PromiseLike, so you can also await the stream directly to auto-drain:
const turn = await claude.stream('Count to 5');
Stream Control:
const stream = claude.stream('Write a long story');
// Abort the stream at any time
setTimeout(() => stream.abort(), 5000);
for await (const event of stream) {
// Process events until abort
}
Stream Events:
| Event | Description |
|-------|-------------|
| text_delta | Incremental text output |
| reasoning_delta | Incremental reasoning/thinking output |
| object_delta | Incremental structured output JSON |
| tool_call_delta | Tool call arguments being streamed |
| tool_execution_start | Tool execution has started |
| tool_execution_end | Tool execution has completed |
| message_start / message_stop | Message boundaries |
| content_block_start / content_block_stop | Content block boundaries |
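A single loop can route several of these event types; a sketch using only the delta shapes shown elsewhere in this README:
const stream = claude.stream('Explain streaming in one paragraph');
for await (const event of stream) {
  switch (event.type) {
    case 'reasoning_delta':
      // Thinking tokens (when reasoning/extended thinking is enabled)
      process.stdout.write(event.delta.text);
      break;
    case 'text_delta':
      // Regular assistant output
      process.stdout.write(event.delta.text);
      break;
    case 'message_stop':
      process.stdout.write('\n');
      break;
  }
}
const turn = await stream.turn;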
Multi-turn Conversations
import type { Message } from '@providerprotocol/ai';
const history: Message[] = [];
const t1 = await claude.generate(history, 'My name is Alice');
history.push(...t1.messages);
const t2 = await claude.generate(history, 'What is my name?');
// Response: "Your name is Alice"System-Only Inference
Both generate() and stream() can be called with no arguments for system-prompt-only inference:
const assistant = llm({
model: anthropic('claude-sonnet-4-20250514'),
system: 'You are a haiku generator. Generate a haiku about coding.',
});
// No user input needed
const turn = await assistant.generate();
console.log(turn.response.text);
Tools
const turn = await claude.generate({
tools: [{
name: 'getWeather',
description: 'Get weather for a location',
parameters: {
type: 'object',
properties: { location: { type: 'string' } },
required: ['location'],
},
run: async ({ location }) => ({ temp: 72, conditions: 'sunny' }),
}],
}, 'What is the weather in Tokyo?');
Tools with Zod Parameters
Tool parameters also accept Zod schemas:
import { z } from 'zod';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
tools: [{
name: 'get_weather',
description: 'Get weather for a location',
parameters: z.object({
location: z.string().describe('City name'),
units: z.enum(['celsius', 'fahrenheit']).optional(),
}),
run: async ({ location, units }) => fetchWeather(location, units),
}],
});
Structured Output
import { llm } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const extractor = llm({
model: openai('gpt-4o'),
structure: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
},
required: ['name', 'age'],
},
});
const turn = await extractor.generate('John is 30 years old');
console.log(turn.data); // { name: 'John', age: 30 }
Zod Schema Support
Structured output and tool parameters accept Zod schemas directly, with automatic conversion to JSON Schema:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { z } from 'zod';
const extractor = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: z.object({
name: z.string(),
age: z.number(),
tags: z.array(z.string()),
}),
});
const turn = await extractor.generate('Extract: John Doe, 30 years old, likes coding');
console.log(turn.data); // { name: "John Doe", age: 30, tags: ["coding"] }Requirements:
- Zod schemas must be object schemas (
z.object()). Non-object schemas will throw an error. - Zod is an optional peer dependency - install only if using Zod schemas:
bun add zod # v4+ for native JSON Schema conversion
bun add zod zod-to-json-schema # v3 requires additional packageMultimodal Input
import { Image, Document, Audio, Video } from '@providerprotocol/ai';
// Images
const img = await Image.fromPath('./photo.png');
const turn = await claude.generate([img, 'What is in this image?']);
// Documents (PDF, text)
const doc = await Document.fromPath('./report.pdf', 'Annual Report');
const docTurn = await claude.generate([doc.toBlock(), 'Summarize this document']);
// Audio (Google, OpenRouter); `gemini` below is an llm() instance built with the google provider
const audio = await Audio.fromPath('./recording.mp3');
const audioTurn = await gemini.generate([audio.toBlock(), 'Transcribe this audio']);
// Video (Google, OpenRouter)
const video = await Video.fromPath('./clip.mp4');
const videoTurn = await gemini.generate([video.toBlock(), 'Describe this video']);
Multimodal Support by Provider:
| Provider | Image | Document | Audio | Video |
|----------|:-----:|:--------:|:-----:|:-----:|
| Anthropic | ✓ | PDF, Text | | |
| OpenAI | ✓ | PDF, Text | | |
| Google | ✓ | PDF, Text | ✓ | ✓ |
| OpenRouter | ✓ | PDF, Text | ✓ | ✓ |
| xAI | ✓ | | | |
| Groq | ✓ | | | |
| Moonshot | ✓ | | | ✓* |
* Moonshot video input is experimental.
Anthropic Beta Features
Anthropic provides beta features through the betas export. Enable them at the model level:
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Native structured outputs with guaranteed JSON schema conformance
const model = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.structuredOutputs],
}),
structure: {
type: 'object',
properties: { answer: { type: 'string' } },
required: ['answer'],
},
});
// Extended thinking with interleaved tool calls
const thinker = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.interleavedThinking],
}),
params: {
thinking: { type: 'enabled', budget_tokens: 10000 },
},
});
Available Beta Features:
| Beta | Description |
|------|-------------|
| structuredOutputs | Guaranteed JSON schema conformance for responses |
| interleavedThinking | Claude can think between tool calls |
| devFullThinking | Developer mode for full thinking visibility |
| effort | Control response thoroughness vs efficiency (Opus 4.5) |
| computerUseLegacy | Computer use for Claude 3.x models |
| computerUse | Mouse, keyboard, screenshot control (Claude 4) |
| computerUseOpus | Computer use with extra commands (Opus 4.5) |
| codeExecution | Python/Bash sandbox execution |
| tokenEfficientTools | Up to 70% token reduction for tool calls |
| fineGrainedToolStreaming | Stream tool args without buffering |
| maxTokens35Sonnet | 8,192 output tokens for Claude 3.5 Sonnet |
| output128k | 128K token output length |
| context1m | 1 million token context window (Sonnet 4) |
| promptCaching | Reduced latency and costs via caching |
| extendedCacheTtl | 1-hour cache TTL (vs 5-minute default) |
| contextManagement | Automatic tool call clearing for context |
| modelContextWindowExceeded | Handle exceeded context windows |
| advancedToolUse | Tool Search, Programmatic Tool Calling |
| mcpClient | Connect to remote MCP servers |
| mcpClientLatest | Updated MCP client |
| filesApi | Upload and manage files |
| pdfs | PDF document support |
| tokenCounting | Token counting endpoint |
| messageBatches | Async batch processing at 50% cost |
| skills | Agent Skills (PowerPoint, Excel, Word, PDF) |
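Because betas accepts an array, multiple features can be enabled on one model; a sketch combining two entries from the table above:
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Prompt caching plus the 1M-token context window (Sonnet 4)
const longContext = llm({
  model: anthropic('claude-sonnet-4-20250514', {
    betas: [betas.promptCaching, betas.context1m],
  }),
});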
Anthropic Built-in Tools
Use Anthropic's built-in tools directly with the tools export:
import { anthropic, betas, tools } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Web search with optional user location
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
params: {
tools: [tools.webSearch({ max_results: 5 })],
},
});
// Computer use (requires beta)
const computerModel = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.computerUse],
}),
params: {
tools: [tools.computer({ display_width: 1920, display_height: 1080, display_number: 1 })],
},
});
// Code execution (requires beta)
const codeModel = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.codeExecution],
}),
params: {
tools: [tools.codeExecution()],
},
});
Available Built-in Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Search the web with optional max results and location |
| tools.computer() | Mouse, keyboard, and screenshot control |
| tools.textEditor() | Edit text files programmatically |
| tools.bash() | Execute bash commands |
| tools.codeExecution() | Run code in a sandboxed environment |
| tools.toolSearch() | Search through available tools |
Reasoning / Extended Thinking
Access model reasoning and extended thinking across providers with a unified API.
Anthropic
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
const claude = llm({
model: anthropic('claude-sonnet-4-20250514'),
params: {
max_tokens: 16000,
thinking: {
type: 'enabled',
budget_tokens: 5000,
},
},
});
const turn = await claude.generate('Solve this complex problem...');
console.log(turn.response.reasoning); // Reasoning blocks
OpenAI
import { llm } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const gpt = llm({
model: openai('o3-mini'),
params: {
max_output_tokens: 4000,
reasoning: {
effort: 'medium',
summary: 'detailed',
},
},
});
Google Gemini
import { llm } from '@providerprotocol/ai';
import { google } from '@providerprotocol/ai/google';
const gemini = llm({
model: google('gemini-2.5-flash'),
params: {
maxOutputTokens: 4000,
thinkingConfig: {
thinkingBudget: -1, // Dynamic
includeThoughts: true,
},
},
});
Cerebras
import { llm } from '@providerprotocol/ai';
import { cerebras } from '@providerprotocol/ai/cerebras';
const model = llm({
model: cerebras('gpt-oss-120b'),
params: {
reasoning_effort: 'high',
reasoning_format: 'parsed',
},
});
Streaming Reasoning
All providers emit ReasoningDelta events during streaming:
for await (const event of stream) {
if (event.type === 'reasoning_delta') {
process.stdout.write(event.delta.text);
}
}
Embeddings
import { embedding } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const embedder = embedding({ model: openai('text-embedding-3-small') });
// Single or batch
const result = await embedder.embed('Hello world');
const batch = await embedder.embed(['doc1', 'doc2', 'doc3']);
console.log(result.embeddings[0].vector); // [0.123, -0.456, ...]
console.log(result.embeddings[0].dimensions); // 1536
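Because each embedding exposes its raw vector, similarity math can be done directly on the results; a sketch computing cosine similarity between two inputs (the texts are arbitrary examples):
const pair = await embedder.embed(['I love coffee', 'I enjoy espresso']);
const [a, b] = pair.embeddings.map((e) => e.vector);
// Cosine similarity between the two vectors
const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
console.log(dot / (norm(a) * norm(b)));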
Chunked Processing
For large datasets with progress tracking:
const stream = embedder.embed(documents, {
chunked: true,
batchSize: 100,
concurrency: 2,
});
for await (const progress of stream) {
console.log(`${progress.percent.toFixed(1)}% complete`);
}
const result = await stream.result;
Image Generation
import { image } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const dalle = image({ model: openai('dall-e-3') });
const result = await dalle.generate('A sunset over mountains');
console.log(result.images[0].image.toBase64());
With Parameters
const hd = image({
model: openai('dall-e-3'),
params: { size: '1792x1024', quality: 'hd', style: 'natural' },
});
Image Editing
import { image, Image } from '@providerprotocol/ai';
const editor = image({ model: openai('dall-e-2') });
const source = await Image.fromPath('./photo.png');
const mask = await Image.fromPath('./mask.png');
const result = await editor.edit({
image: source,
mask,
prompt: 'Add a rainbow in the sky',
});
Configuration
import { llm, exponentialBackoff, roundRobinKeys } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const instance = llm({
model: openai('gpt-4o'),
config: {
apiKey: roundRobinKeys(['sk-key1', 'sk-key2']),
timeout: 30000,
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
},
params: {
temperature: 0.7,
max_tokens: 1000,
},
system: 'You are a helpful assistant.',
});
System Prompts
System prompts can be a simple string or a provider-specific array for advanced features:
// Simple string (all providers)
const simple = llm({
model: anthropic('claude-sonnet-4-20250514'),
system: 'You are a helpful assistant.',
});
// Anthropic cache_control format
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
const cached = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.promptCaching],
}),
system: [
{ type: 'text', text: 'Large context document...', cache_control: { type: 'ephemeral' } },
{ type: 'text', text: 'Instructions...' },
],
});
Provider Config Options
interface ProviderConfig {
apiKey?: string | (() => Promise<string>) | KeyStrategy; // API key, async getter, or strategy
baseUrl?: string; // Custom API endpoint
timeout?: number; // Per-attempt timeout (ms)
retryStrategy?: RetryStrategy; // Retry behavior
headers?: Record<string, string>; // Custom headers (merged with provider defaults)
fetch?: typeof fetch; // Custom fetch implementation
apiVersion?: string; // API version override
retryAfterMaxSeconds?: number; // Cap for Retry-After header (default: 3600)
}
Notes:
- timeout applies per attempt; total time can exceed this with retries
- headers are merged with model-level headers (explicit config takes precedence)
- retryAfterMaxSeconds prevents honoring excessively long Retry-After values
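A sketch exercising several of these options together (the baseUrl, header name, and values are illustrative):
import { llm, retryAfterStrategy } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const instance = llm({
  model: openai('gpt-4o'),
  config: {
    baseUrl: 'https://my-gateway.example.com/v1',            // custom endpoint
    timeout: 15000,                                           // per attempt, not total
    headers: { 'X-Request-Source': 'batch-job' },             // merged with provider defaults
    retryStrategy: retryAfterStrategy({ maxAttempts: 3, fallbackDelay: 5000 }),
    retryAfterMaxSeconds: 120,                                // cap honored Retry-After values
  },
});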
Key Strategies
import { roundRobinKeys, weightedKeys, dynamicKey } from '@providerprotocol/ai/http';
// Cycle through keys evenly
roundRobinKeys(['sk-1', 'sk-2', 'sk-3'])
// Weighted selection (70% key1, 30% key2)
weightedKeys([
{ key: 'sk-1', weight: 70 },
{ key: 'sk-2', weight: 30 },
])
// Dynamic fetching (secrets manager, etc.)
dynamicKey(async () => fetchKeyFromVault())
Retry Strategies
import {
exponentialBackoff,
linearBackoff,
noRetry,
retryAfterStrategy,
} from '@providerprotocol/ai';
// Exponential: 1s, 2s, 4s...
exponentialBackoff({
maxAttempts: 5,
baseDelay: 1000,
maxDelay: 30000,
jitter: true, // Randomize delays to prevent thundering herd (default: true)
})
// Linear: 1s, 2s, 3s...
linearBackoff({ maxAttempts: 3, delay: 1000 })
// Respect server Retry-After headers
retryAfterStrategy({ maxAttempts: 3, fallbackDelay: 5000 })
// No retries
noRetry()
Retryable Errors: RATE_LIMITED, NETWORK_ERROR, TIMEOUT, PROVIDER_ERROR
Streaming Retry: Retry strategies work with both .generate() and .stream(). During streaming, stream_retry events are emitted to notify consumers of retry attempts, and middleware can use the onRetry hook to reset accumulated state.
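A sketch surfacing retries to the consumer during streaming (assuming model is an llm() instance with a retryStrategy configured; only the event type is relied on, since the stream_retry payload is not documented here):
const stream = model.stream('Generate a long report');
for await (const event of stream) {
  if (event.type === 'stream_retry') {
    // The provider request failed and is being retried per the configured strategy
    console.warn('Stream interrupted; retrying...');
  } else if (event.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}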
Tool Execution Control
const turn = await claude.generate({
tools: [weatherTool, searchTool],
toolStrategy: {
maxIterations: 5,
onBeforeCall: (tool, params) => {
if (tool.name === 'dangerousTool') return false; // Block execution
return true;
},
onAfterCall: (tool, params, result) => {
console.log(`${tool.name} returned:`, result);
},
onError: (tool, params, error) => {
console.error(`${tool.name} failed:`, error);
},
},
}, 'Search for recent news about AI');
Thread Management
import { Thread } from '@providerprotocol/ai';
const thread = new Thread();
thread.user('Hello!');
const turn = await claude.generate(thread.toMessages(), 'How are you?');
thread.append(turn);
// Serialize for storage
const json = thread.toJSON();
localStorage.setItem('conversation', JSON.stringify(json));
// Restore later
const restored = Thread.fromJSON(JSON.parse(localStorage.getItem('conversation')!));
Middleware
Compose request/response/stream transformations with the middleware system. Middleware is imported from dedicated entry points.
Parsed Object Middleware
Automatically parse streaming JSON from structured output and tool call events:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { parsedObjectMiddleware } from '@providerprotocol/ai/middleware/parsed-object';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: {
type: 'object',
properties: {
city: { type: 'string' },
country: { type: 'string' },
population: { type: 'number' },
},
required: ['city', 'country', 'population'],
},
middleware: [parsedObjectMiddleware()],
});
for await (const event of model.stream('What is the capital of France?')) {
if (event.type === 'object_delta') {
// Access incrementally parsed structured data
console.log(event.delta.parsed);
// { city: "Par" } -> { city: "Paris" } -> { city: "Paris", country: "Fr" } -> ...
}
}
Logging Middleware
Add visibility into request lifecycle:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { loggingMiddleware } from '@providerprotocol/ai/middleware/logging';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [loggingMiddleware({ level: 'debug' })],
});
// Logs: [PP] [anthropic] Starting llm request (streaming)
// Logs: [PP] [anthropic] Completed in 1234ms
const result = await model.generate('Hello');
Persistence Middleware
Load and save conversation threads around LLM requests:
import { llm, Thread } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { persistenceMiddleware, PersistenceAdapter } from '@providerprotocol/ai/middleware/persistence';
const adapter = new PersistenceAdapter({
id: 'conversation-123',
load: async (id) => loadThreadFromDatabase(id), // Thread | ThreadJSON | null
save: async (id, thread, turn) => {
await saveThreadToDatabase(id, thread);
if (turn) {
await saveTurnToDatabase(id, turn);
}
},
});
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [persistenceMiddleware({ adapter })],
});
const turn = await model.generate('Hello!');
Pub-Sub Middleware (Stream Resumption)
Enable reconnecting clients to catch up on missed events during active generation. The middleware buffers events, publishes them to subscribers, and removes streams on completion/abort/error.
If a stream never reaches those hooks (for example, a process crash), the adapter may retain the entry. Custom adapters should invoke onComplete when remove() runs so subscriber streams can terminate.
Streams are created lazily on first append() or subscribe() call.
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { pubsubMiddleware, memoryAdapter } from '@providerprotocol/ai/middleware/pubsub';
import { webapi } from '@providerprotocol/ai/middleware/pubsub/server';
// Create a shared adapter instance
const adapter = memoryAdapter({ maxStreams: 1000 });
// Server route handling both new requests and reconnections
Bun.serve({
port: 3000,
async fetch(req) {
const { messages, streamId } = await req.json();
const exists = await adapter.exists(streamId);
if (!exists) {
// Start background generation (fire and forget)
// Stream is created lazily on first append()
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
// Both new and reconnect: subscribe to events
return new Response(webapi.createSubscriberStream(streamId, adapter), {
headers: { 'Content-Type': 'text/event-stream' },
});
},
});
Framework Adapters:
// Express
import { express } from '@providerprotocol/ai/middleware/pubsub/server';
app.post('/api/ai', async (req, res) => {
const { messages, streamId } = req.body;
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
express.streamSubscriber(streamId, adapter, res);
});
// Fastify
import { fastify } from '@providerprotocol/ai/middleware/pubsub/server';
app.post('/api/ai', async (request, reply) => {
const { messages, streamId } = request.body;
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
return fastify.streamSubscriber(streamId, adapter, reply);
});
// H3/Nuxt
import { h3 } from '@providerprotocol/ai/middleware/pubsub/server';
export default defineEventHandler(async (event) => {
const { messages, streamId } = await readBody(event);
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
return h3.streamSubscriber(streamId, adapter, event);
});
Custom Adapters:
Implement PubSubAdapter for custom backends (Redis, etc.):
import type { PubSubAdapter } from '@providerprotocol/ai/middleware/pubsub';
const redisAdapter: PubSubAdapter = {
async exists(streamId) { /* check if stream exists */ },
async append(streamId, event) { /* append event, create lazily */ },
async getEvents(streamId) { /* return events or [] */ },
subscribe(streamId, onEvent, onComplete, onFinalData) { /* subscribe to live events */ },
publish(streamId, event) { /* broadcast to subscribers */ },
setFinalData(streamId, data) { /* store final Turn data */ },
async remove(streamId) { /* notify onFinalData, onComplete, then delete */ },
};
Combining Middleware
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { loggingMiddleware } from '@providerprotocol/ai/middleware/logging';
import { parsedObjectMiddleware } from '@providerprotocol/ai/middleware/parsed-object';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: mySchema,
middleware: [
loggingMiddleware({ level: 'info' }),
parsedObjectMiddleware(),
],
});
Pipeline Middleware (Post-Turn Processing)
Run async tasks (image generation, embeddings, slug creation, etc.) after the LLM completes, with progress events streamed to connected clients:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { pubsubMiddleware, memoryAdapter } from '@providerprotocol/ai/middleware/pubsub';
import { pipelineMiddleware, isPipelineStageEvent } from '@providerprotocol/ai/middleware/pipeline';
const adapter = memoryAdapter();
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: BlogPostSchema,
middleware: [
pubsubMiddleware({ adapter, streamId: postId }),
pipelineMiddleware<BlogPost>({
stages: [
{
type: 'slug',
run: (turn, emit) => {
const slug = turn.data!.title.toLowerCase().replace(/\s+/g, '-');
(turn as { slug?: string }).slug = slug;
emit({ slug });
},
},
{
type: 'embedding',
run: async (turn, emit) => {
await vectorize(turn.data!);
emit({ embedded: true });
},
},
],
parallel: false, // Run stages sequentially (default)
continueOnError: false, // Stop on first error (default)
onStageError: ({ stage, error }) => {
console.error(`Stage ${stage.type} failed:`, error);
},
}),
],
});
// Stages run after streaming completes
model.stream(prompt).then(turn => {
const extended = turn as typeof turn & { slug?: string };
console.log(extended.slug);
});
Consuming Pipeline Events:
for await (const event of model.stream(prompt)) {
if (isPipelineStageEvent(event)) {
console.log(event.delta.stage, event.delta.payload);
// 'slug' { slug: 'my-blog-post' }
// 'embedding' { embedded: true }
}
}
Middleware Order: Place pipelineMiddleware after pubsubMiddleware in the array:
middleware: [
pubsubMiddleware({ ... }), // Setup runs first in onStart
pipelineMiddleware({ ... }), // Events run first in onTurn (reverse order)
]
This ensures pubsub sets up before pipeline stages execute, and pipeline events emit before pubsub cleanup.
Pipeline Configuration:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| stages | PipelineStage[] | required | Stages to run after turn completion |
| parallel | boolean | false | Run stages in parallel instead of sequential |
| continueOnError | boolean | false | Continue running subsequent stages if one fails |
| onStageError | function | - | Called when a stage throws an error |
Error Handling
All errors are normalized to UPPError with consistent error codes:
import { UPPError } from '@providerprotocol/ai';
try {
await claude.generate('Hello');
} catch (error) {
if (error instanceof UPPError) {
console.log(error.code); // 'RATE_LIMITED'
console.log(error.provider); // 'anthropic'
console.log(error.modality); // 'llm'
console.log(error.statusCode); // 429
console.log(error.cause); // Original error (if any)
switch (error.code) {
case 'RATE_LIMITED':
// Wait and retry
break;
case 'CONTEXT_LENGTH_EXCEEDED':
// Reduce input size
break;
case 'AUTHENTICATION_FAILED':
// Check API key
break;
case 'CONTENT_FILTERED':
// Content policy violation
break;
}
}
}
Error Codes: AUTHENTICATION_FAILED, RATE_LIMITED, CONTEXT_LENGTH_EXCEEDED, MODEL_NOT_FOUND, INVALID_REQUEST, INVALID_RESPONSE, CONTENT_FILTERED, QUOTA_EXCEEDED, PROVIDER_ERROR, NETWORK_ERROR, TIMEOUT, CANCELLED
API Gateway / Proxy
Build AI API gateways with your own authentication. Users authenticate with your platform; AI provider keys stay hidden on the server.
Security Note: The proxy works without any configuration, which means no authentication by default. Always add your own auth layer in production; the examples below show how.
Server (Bun/Deno/Cloudflare Workers)
import { llm, exponentialBackoff, roundRobinKeys } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { parseBody, toJSON, toSSE, toError } from '@providerprotocol/ai/proxy';
// Server manages AI provider keys - users never see them
const claude = llm({
model: anthropic('claude-sonnet-4-20250514'),
config: {
apiKey: roundRobinKeys([process.env.ANTHROPIC_KEY_1!, process.env.ANTHROPIC_KEY_2!]),
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
},
});
Bun.serve({
port: 3000,
async fetch(req) {
// Authenticate with YOUR platform credentials
const token = req.headers.get('Authorization')?.replace('Bearer ', '');
const user = await validatePlatformToken(token ?? '');
if (!user) return toError('Unauthorized', 401);
// Rate limit, track usage, bill user, etc.
await trackUsage(user.id);
const { messages, system, params } = parseBody(await req.json());
if (params?.stream) {
return toSSE(claude.stream(messages, { system }));
}
return toJSON(await claude.generate(messages, { system }));
},
});
Client
Clients authenticate with your platform token. They get automatic retry on network failures to your proxy.
import { llm, exponentialBackoff } from '@providerprotocol/ai';
import { proxy } from '@providerprotocol/ai/proxy';
const claude = llm({
model: proxy('https://api.yourplatform.com/ai'),
config: {
headers: { 'Authorization': 'Bearer user-platform-token' },
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
timeout: 30000,
},
});
const turn = await claude.generate('Hello!');
Framework Adapters
Server adapters for Express, Fastify, and Nuxt/H3:
// Express
import { express as expressAdapter, parseBody } from '@providerprotocol/ai/proxy';
app.post('/ai', authMiddleware, async (req, res) => {
const { messages, system, params } = parseBody(req.body);
if (params?.stream) {
expressAdapter.streamSSE(claude.stream(messages, { system }), res);
} else {
expressAdapter.sendJSON(await claude.generate(messages, { system }), res);
}
});
// Fastify
import { fastify as fastifyAdapter, parseBody } from '@providerprotocol/ai/proxy';
app.post('/ai', async (request, reply) => {
const { messages, system, params } = parseBody(request.body);
if (params?.stream) {
return fastifyAdapter.streamSSE(claude.stream(messages, { system }), reply);
}
return fastifyAdapter.sendJSON(await claude.generate(messages, { system }), reply);
});
// Nuxt/H3 (server/api/ai.post.ts)
import { sendStream } from 'h3';
import { h3 as h3Adapter, parseBody } from '@providerprotocol/ai/proxy';
export default defineEventHandler(async (event) => {
const { messages, system, params } = parseBody(await readBody(event));
if (params?.stream) {
return sendStream(event, h3Adapter.createSSEStream(claude.stream(messages, { system })));
}
return h3Adapter.sendJSON(await claude.generate(messages, { system }), event);
});
What this enables:
- Users auth with your platform credentials (JWT, API keys, sessions)
- You manage/rotate AI provider keys centrally
- Per-user rate limiting, usage tracking, billing
- Model access control (different users get different models; see the sketch after this list)
- Request/response logging, content filtering
- Double-layer retry: client retries to proxy, server retries to AI provider
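For example, model access control can be layered into the server route above by choosing the instance per user (a sketch; the tier field and model mapping are illustrative assumptions, not part of the library):
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { openai } from '@providerprotocol/ai/openai';
// Hypothetical tier-to-model mapping maintained by your platform
const models = {
  free: llm({ model: openai('gpt-4o') }),
  pro: llm({ model: anthropic('claude-sonnet-4-20250514') }),
};
// Inside the fetch handler, after validatePlatformToken(token):
// const instance = user.tier === 'pro' ? models.pro : models.free;
// return toJSON(await instance.generate(messages, { system }));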
OpenAI API Modes
OpenAI supports two API endpoints. The Responses API is the default and recommended approach:
import { openai } from '@providerprotocol/ai/openai';
// Responses API (default, recommended)
openai('gpt-4o')
// Chat Completions API (legacy)
openai('gpt-4o', { api: 'completions' })
The Responses API supports built-in tools and stateful conversations. Use completions for backward compatibility.
OpenAI Built-in Tools
With the Responses API, use OpenAI's built-in tools directly:
import { llm } from '@providerprotocol/ai';
import { openai, tools } from '@providerprotocol/ai/openai';
// Web search
const model = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.webSearch()],
},
});
// File search with vector stores
const researchModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.fileSearch({ vector_store_ids: ['vs_abc123'] })],
},
});
// Code interpreter
const codeModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.codeInterpreter()],
},
});
// Image generation
const creativeModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.imageGeneration()],
},
});
Available Built-in Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Search the web with optional user location |
| tools.fileSearch() | Search uploaded files in vector stores |
| tools.codeInterpreter() | Execute code in a sandboxed environment |
| tools.computer() | Computer use with display configuration |
| tools.imageGeneration() | Generate images via DALL-E |
| tools.mcp() | Connect to MCP servers |
xAI API Modes
xAI supports multiple API compatibility modes:
import { xai } from '@providerprotocol/ai/xai';
// Chat Completions (OpenAI-compatible, default)
xai('grok-3-fast')
// Responses API (stateful)
xai('grok-3-fast', { api: 'responses' })
// Messages API (Anthropic-compatible)
xai('grok-3-fast', { api: 'messages' })
Groq
Fast inference with Llama, Gemma, and Mixtral models:
import { llm } from '@providerprotocol/ai';
import { groq } from '@providerprotocol/ai/groq';
const model = llm({
model: groq('llama-3.3-70b-versatile'),
params: { max_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With web search:
const searchModel = llm({
model: groq('llama-3.3-70b-versatile'),
params: {
search_settings: { mode: 'auto' },
},
});
With RAG documents:
const ragModel = llm({
model: groq('llama-3.3-70b-versatile'),
params: {
documents: [
{ title: 'Doc 1', content: 'Document content here...' },
{ title: 'Doc 2', content: 'More content...' },
],
citation_options: { include: true },
},
});
Capabilities: Streaming, tool calling, structured output, image input (Llama 4 preview), web search, RAG with citations.
Environment: GROQ_API_KEY
Cerebras
Ultra-fast inference with Llama, Qwen, and GPT-OSS models:
import { llm } from '@providerprotocol/ai';
import { cerebras } from '@providerprotocol/ai/cerebras';
const model = llm({
model: cerebras('llama-3.3-70b'),
params: { max_completion_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With reasoning (GPT-OSS):
const model = llm({
model: cerebras('gpt-oss-120b'),
params: {
reasoning_effort: 'high',
reasoning_format: 'parsed',
},
});
Capabilities: Streaming, tool calling, structured output, reasoning parameters.
Environment: CEREBRAS_API_KEY
Moonshot
Kimi K2.5 with 256K context, thinking mode, vision, and server-side builtin tools:
import { llm } from '@providerprotocol/ai';
import { moonshot, tools } from '@providerprotocol/ai/moonshot';
const model = llm({
model: moonshot('kimi-k2.5'),
params: { max_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With thinking mode (default for K2.5):
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
max_tokens: 2000,
temperature: 1.0,
thinking: { type: 'enabled' },
},
});
// Response includes reasoning in turn.response.reasoning
const turn = await model.generate('Solve step by step: 2x + 5 = 13');
With instant mode (disabled thinking):
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
temperature: 0.6,
thinking: { type: 'disabled' },
},
});
With builtin tools:
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
tools: [
tools.webSearch(),
tools.codeRunner(),
tools.date(),
],
},
});
Available Builtin Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Real-time internet search |
| tools.codeRunner() | Python code execution with matplotlib/pandas |
| tools.quickjs() | JavaScript execution via QuickJS engine |
| tools.fetch() | URL content fetching with markdown extraction |
| tools.convert() | Unit conversion (length, mass, temperature, currency) |
| tools.date() | Date/time processing and timezone conversion |
| tools.base64Encode() | Base64 encoding |
| tools.base64Decode() | Base64 decoding |
| tools.memory() | Memory storage and retrieval system |
| tools.rethink() | Intelligent reasoning/reflection tool |
| tools.randomChoice() | Random selection with optional weights |
Capabilities: Streaming, tool calling, structured output, thinking mode, image input, video input (experimental).
Environment: MOONSHOT_API_KEY or KIMI_API_KEY
OpenResponses Provider
Connect to any server implementing the OpenResponses specification:
import { llm } from '@providerprotocol/ai';
import { responses } from '@providerprotocol/ai/responses';
// Using with OpenAI
const model = llm({
model: responses('gpt-5.2', {
host: 'https://api.openai.com/v1',
apiKeyEnv: 'OPENAI_API_KEY',
}),
params: { max_output_tokens: 1000 },
});
// Using with OpenRouter
const routerModel = llm({
model: responses('openai/gpt-4o', {
host: 'https://openrouter.ai/api/v1',
apiKeyEnv: 'OPENROUTER_API_KEY',
}),
});
// Using with self-hosted server
const localModel = llm({
model: responses('llama-3.3-70b', {
host: 'http://localhost:8080/v1',
}),
});
Capabilities: Full multimodal support, streaming, tool calling, structured output, reasoning summaries.
Alternative Import Style
Use the ai namespace for a grouped import style:
import { ai } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const model = ai.llm({ model: openai('gpt-4o') });
const embedder = ai.embedding({ model: openai('text-embedding-3-small') });
const dalle = ai.image({ model: openai('dall-e-3') });
TypeScript
Full type safety with no any types. All provider parameters are typed:
import type {
// Core types
Turn,
TurnJSON,
Message,
Tool,
ToolInput,
TokenUsage,
// Streaming
StreamEvent,
StreamResult,
// Content blocks
TextBlock,
ImageBlock,
ReasoningBlock,
DocumentBlock,
AudioBlock,
VideoBlock,
// Modality results
EmbeddingResult,
ImageResult,
// Errors
UPPError,
ErrorCode,
// Configuration
ProviderConfig,
KeyStrategy,
RetryStrategy,
LLMCapabilities,
// Middleware
Middleware,
MiddlewareContext,
StreamContext,
// Schema types (Zod support)
Structure,
ZodLike,
} from '@providerprotocol/ai';
Zod Utilities:
import {
isZodSchema,
isZodV4,
zodToJSONSchema,
zodToJSONSchemaSync,
resolveStructure,
resolveTools,
} from '@providerprotocol/ai/utils';
// Type guard for Zod schemas
if (isZodSchema(schema)) {
const jsonSchema = zodToJSONSchemaSync(schema);
}Error & ID Utilities:
import {
toError,
isCancelledError,
generateId,
generateShortId,
} from '@providerprotocol/ai/utils';
// Convert unknown thrown values to Error instances
const error = toError(unknownValue);
// Check if an error is a cancellation/abort error
if (isCancelledError(error)) {
console.log('Request was cancelled');
}
// Generate IDs
const uuid = generateId(); // UUID v4: "a1b2c3d4-..."
const shortId = generateShortId('req'); // "req_abc123xyz789"
Provider-Specific Types:
// OpenAI
import type { OpenAIHeaders, OpenAIImageParams } from '@providerprotocol/ai/openai';
// Google
import type { GoogleImagenParams } from '@providerprotocol/ai/google';
// Ollama
import type { OllamaHeaders } from '@providerprotocol/ai/ollama';
// OpenRouter
import type { OpenRouterProviderOptions } from '@providerprotocol/ai/openrouter';
Type-Safe Enums:
import {
StreamEventType,
ErrorCode,
ContentBlockType,
MessageRole,
ModalityType,
} from '@providerprotocol/ai';
// Use instead of magic strings
if (event.type === StreamEventType.TextDelta) { ... }
if (error.code === ErrorCode.RateLimited) { ... }
if (block.type === ContentBlockType.Text) { ... }
Custom Providers
Build custom providers with createProvider:
import { createProvider } from '@providerprotocol/ai';
const myProvider = createProvider({
name: 'my-provider',
version: '1.0.0',
handlers: {
llm: myLLMHandler,
embedding: myEmbeddingHandler,
},
});
License
MIT
