@providerprotocol/ai
UPP (Unified Provider Protocol): a unified TypeScript SDK for AI inference across multiple providers. One API for LLMs, embeddings, and image generation.
bun add @providerprotocol/ai
Quick Start
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
const claude = llm({ model: anthropic('claude-sonnet-4-20250514') });
const turn = await claude.generate('Hello!');
console.log(turn.response.text);
Providers
| Provider | Import | LLM | Embedding | Image |
|----------|--------|:---:|:---------:|:-----:|
| Anthropic | @providerprotocol/ai/anthropic | ✓ | | |
| OpenAI | @providerprotocol/ai/openai | ✓ | ✓ | ✓ |
| Google | @providerprotocol/ai/google | ✓ | ✓ | ✓ |
| xAI | @providerprotocol/ai/xai | ✓ | | ✓ |
| Ollama | @providerprotocol/ai/ollama | ✓ | ✓ | |
| OpenRouter | @providerprotocol/ai/openrouter | ✓ | ✓ | ✓ |
| Groq | @providerprotocol/ai/groq | ✓ | | |
| Cerebras | @providerprotocol/ai/cerebras | ✓ | | |
| Moonshot | @providerprotocol/ai/moonshot | ✓ | | |
| OpenResponses | @providerprotocol/ai/responses | ✓ | | |
API keys are loaded automatically from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY, CEREBRAS_API_KEY, MOONSHOT_API_KEY, etc.).
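If you prefer not to rely on auto-detected environment variables, the apiKey option in config (documented under Configuration below) accepts an explicit value; a minimal sketch, where the alternate variable name is an arbitrary example:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
// Pass the key explicitly instead of relying on ANTHROPIC_API_KEY
const claude = llm({
  model: anthropic('claude-sonnet-4-20250514'),
  config: { apiKey: process.env.MY_ANTHROPIC_KEY! },
});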
LLM
Streaming
const stream = claude.stream('Count to 5');
for await (const event of stream) {
if (event.type === 'text_delta') {
process.stdout.write(event.delta.text);
}
}
const turn = await stream.turn;
Stream results are PromiseLike, so you can also await the stream directly to auto-drain:
const turn = await claude.stream('Count to 5');
Stream Control:
const stream = claude.stream('Write a long story');
// Abort the stream at any time
setTimeout(() => stream.abort(), 5000);
for await (const event of stream) {
// Process events until abort
}
Stream Events:
| Event | Description |
|-------|-------------|
| text_delta | Incremental text output |
| reasoning_delta | Incremental reasoning/thinking output |
| object_delta | Incremental structured output JSON |
| tool_call_delta | Tool call arguments being streamed |
| tool_execution_start | Tool execution has started |
| tool_execution_end | Tool execution has completed |
| message_start / message_stop | Message boundaries |
| content_block_start / content_block_stop | Content block boundaries |
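A single loop can route several of these event types; a sketch using only the delta shapes shown elsewhere in this README:
const stream = claude.stream('Explain streaming in one paragraph');
for await (const event of stream) {
  switch (event.type) {
    case 'reasoning_delta':
      // Thinking tokens (when reasoning/extended thinking is enabled)
      process.stdout.write(event.delta.text);
      break;
    case 'text_delta':
      // Regular assistant output
      process.stdout.write(event.delta.text);
      break;
    case 'message_stop':
      process.stdout.write('\n');
      break;
  }
}
const turn = await stream.turn;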
Multi-turn Conversations
import type { Message } from '@providerprotocol/ai';
const history: Message[] = [];
const t1 = await claude.generate(history, 'My name is Alice');
history.push(...t1.messages);
const t2 = await claude.generate(history, 'What is my name?');
// Response: "Your name is Alice"System-Only Inference
Both generate() and stream() can be called with no arguments for system-prompt-only inference:
const assistant = llm({
model: anthropic('claude-sonnet-4-20250514'),
system: 'You are a haiku generator. Generate a haiku about coding.',
});
// No user input needed
const turn = await assistant.generate();
console.log(turn.response.text);
Tools
const turn = await claude.generate({
tools: [{
name: 'getWeather',
description: 'Get weather for a location',
parameters: {
type: 'object',
properties: { location: { type: 'string' } },
required: ['location'],
},
run: async ({ location }) => ({ temp: 72, conditions: 'sunny' }),
}],
}, 'What is the weather in Tokyo?');
Tools with Zod Parameters
Tool parameters also accept Zod schemas:
import { z } from 'zod';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
tools: [{
name: 'get_weather',
description: 'Get weather for a location',
parameters: z.object({
location: z.string().describe('City name'),
units: z.enum(['celsius', 'fahrenheit']).optional(),
}),
run: async ({ location, units }) => fetchWeather(location, units),
}],
});
Structured Output
import { llm } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const extractor = llm({
model: openai('gpt-4o'),
structure: {
type: 'object',
properties: {
name: { type: 'string' },
age: { type: 'number' },
},
required: ['name', 'age'],
},
});
const turn = await extractor.generate('John is 30 years old');
console.log(turn.data); // { name: 'John', age: 30 }
Zod Schema Support
Structured output and tool parameters accept Zod schemas directly, with automatic conversion to JSON Schema:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { z } from 'zod';
const extractor = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: z.object({
name: z.string(),
age: z.number(),
tags: z.array(z.string()),
}),
});
const turn = await extractor.generate('Extract: John Doe, 30 years old, likes coding');
console.log(turn.data); // { name: "John Doe", age: 30, tags: ["coding"] }Requirements:
- Zod schemas must be object schemas (
z.object()). Non-object schemas will throw an error. - Zod is an optional peer dependency - install only if using Zod schemas:
bun add zod # v4+ for native JSON Schema conversion
bun add zod zod-to-json-schema # v3 requires additional packageMultimodal Input
import { Image, Document, Audio, Video } from '@providerprotocol/ai';
// Images
const img = await Image.fromPath('./photo.png');
const turn = await claude.generate([img, 'What is in this image?']);
// Documents (PDF, text)
const doc = await Document.fromPath('./report.pdf', 'Annual Report');
const docTurn = await claude.generate([doc.toBlock(), 'Summarize this document']);
// Audio (Google, OpenRouter); `gemini` below is an llm() instance built with the google provider
const audio = await Audio.fromPath('./recording.mp3');
const audioTurn = await gemini.generate([audio.toBlock(), 'Transcribe this audio']);
// Video (Google, OpenRouter)
const video = await Video.fromPath('./clip.mp4');
const videoTurn = await gemini.generate([video.toBlock(), 'Describe this video']);
Multimodal Support by Provider:
| Provider | Image | Document | Audio | Video |
|----------|:-----:|:--------:|:-----:|:-----:|
| Anthropic | ✓ | PDF, Text | | |
| OpenAI | ✓ | PDF, Text | | |
| Google | ✓ | PDF, Text | ✓ | ✓ |
| OpenRouter | ✓ | PDF, Text | ✓ | ✓ |
| xAI | ✓ | | | |
| Groq | ✓ | | | |
| Moonshot | ✓ | | | ✓* |
* Moonshot video input is experimental.
Anthropic Beta Features
Anthropic provides beta features through the betas export. Enable them at the model level:
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Native structured outputs with guaranteed JSON schema conformance
const model = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.structuredOutputs],
}),
structure: {
type: 'object',
properties: { answer: { type: 'string' } },
required: ['answer'],
},
});
// Extended thinking with interleaved tool calls
const thinker = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.interleavedThinking],
}),
params: {
thinking: { type: 'enabled', budget_tokens: 10000 },
},
});
Available Beta Features:
| Beta | Description |
|------|-------------|
| structuredOutputs | Guaranteed JSON schema conformance for responses |
| interleavedThinking | Claude can think between tool calls |
| devFullThinking | Developer mode for full thinking visibility |
| effort | Control response thoroughness vs efficiency (Opus 4.5) |
| computerUseLegacy | Computer use for Claude 3.x models |
| computerUse | Mouse, keyboard, screenshot control (Claude 4) |
| computerUseOpus | Computer use with extra commands (Opus 4.5) |
| codeExecution | Python/Bash sandbox execution |
| tokenEfficientTools | Up to 70% token reduction for tool calls |
| fineGrainedToolStreaming | Stream tool args without buffering |
| maxTokens35Sonnet | 8,192 output tokens for Claude 3.5 Sonnet |
| output128k | 128K token output length |
| context1m | 1 million token context window (Sonnet 4) |
| promptCaching | Reduced latency and costs via caching |
| extendedCacheTtl | 1-hour cache TTL (vs 5-minute default) |
| contextManagement | Automatic tool call clearing for context |
| modelContextWindowExceeded | Handle exceeded context windows |
| advancedToolUse | Tool Search, Programmatic Tool Calling |
| mcpClient | Connect to remote MCP servers |
| mcpClientLatest | Updated MCP client |
| filesApi | Upload and manage files |
| pdfs | PDF document support |
| tokenCounting | Token counting endpoint |
| messageBatches | Async batch processing at 50% cost |
| skills | Agent Skills (PowerPoint, Excel, Word, PDF) |
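Because betas accepts an array, multiple features can be enabled on one model; a sketch combining two entries from the table above:
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Prompt caching plus the 1M-token context window (Sonnet 4)
const longContext = llm({
  model: anthropic('claude-sonnet-4-20250514', {
    betas: [betas.promptCaching, betas.context1m],
  }),
});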
Anthropic Built-in Tools
Use Anthropic's built-in tools directly with the tools export:
import { anthropic, betas, tools } from '@providerprotocol/ai/anthropic';
import { llm } from '@providerprotocol/ai';
// Web search with optional user location
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
params: {
tools: [tools.webSearch({ max_results: 5 })],
},
});
// Computer use (requires beta)
const computerModel = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.computerUse],
}),
params: {
tools: [tools.computer({ display_width: 1920, display_height: 1080, display_number: 1 })],
},
});
// Code execution (requires beta)
const codeModel = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.codeExecution],
}),
params: {
tools: [tools.codeExecution()],
},
});
Available Built-in Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Search the web with optional max results and location |
| tools.computer() | Mouse, keyboard, and screenshot control |
| tools.textEditor() | Edit text files programmatically |
| tools.bash() | Execute bash commands |
| tools.codeExecution() | Run code in a sandboxed environment |
| tools.toolSearch() | Search through available tools |
Reasoning / Extended Thinking
Access model reasoning and extended thinking across providers with a unified API.
Anthropic
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
const claude = llm({
model: anthropic('claude-sonnet-4-20250514'),
params: {
max_tokens: 16000,
thinking: {
type: 'enabled',
budget_tokens: 5000,
},
},
});
const turn = await claude.generate('Solve this complex problem...');
console.log(turn.response.reasoning); // Reasoning blocks
OpenAI
import { llm } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const gpt = llm({
model: openai('o3-mini'),
params: {
max_output_tokens: 4000,
reasoning: {
effort: 'medium',
summary: 'detailed',
},
},
});
Google Gemini
import { llm } from '@providerprotocol/ai';
import { google } from '@providerprotocol/ai/google';
const gemini = llm({
model: google('gemini-2.5-flash'),
params: {
maxOutputTokens: 4000,
thinkingConfig: {
thinkingBudget: -1, // Dynamic
includeThoughts: true,
},
},
});
Cerebras
import { llm } from '@providerprotocol/ai';
import { cerebras } from '@providerprotocol/ai/cerebras';
const model = llm({
model: cerebras('gpt-oss-120b'),
params: {
reasoning_effort: 'high',
reasoning_format: 'parsed',
},
});
Streaming Reasoning
All providers emit ReasoningDelta events during streaming:
for await (const event of stream) {
if (event.type === 'reasoning_delta') {
process.stdout.write(event.delta.text);
}
}
Embeddings
import { embedding } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const embedder = embedding({ model: openai('text-embedding-3-small') });
// Single or batch
const result = await embedder.embed('Hello world');
const batch = await embedder.embed(['doc1', 'doc2', 'doc3']);
console.log(result.embeddings[0].vector); // [0.123, -0.456, ...]
console.log(result.embeddings[0].dimensions); // 1536
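Because each embedding exposes its raw vector, similarity math can be done directly on the results; a sketch computing cosine similarity between two inputs (the texts are arbitrary examples):
const pair = await embedder.embed(['I love coffee', 'I enjoy espresso']);
const [a, b] = pair.embeddings.map((e) => e.vector);
// Cosine similarity between the two vectors
const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
console.log(dot / (norm(a) * norm(b)));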
Chunked Processing
For large datasets with progress tracking:
const stream = embedder.embed(documents, {
chunked: true,
batchSize: 100,
concurrency: 2,
});
for await (const progress of stream) {
console.log(`${progress.percent.toFixed(1)}% complete`);
}
const result = await stream.result;
Image Generation
import { image } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const dalle = image({ model: openai('dall-e-3') });
const result = await dalle.generate('A sunset over mountains');
console.log(result.images[0].image.toBase64());
With Parameters
const hd = image({
model: openai('dall-e-3'),
params: { size: '1792x1024', quality: 'hd', style: 'natural' },
});
Image Editing
import { image, Image } from '@providerprotocol/ai';
const editor = image({ model: openai('dall-e-2') });
const source = await Image.fromPath('./photo.png');
const mask = await Image.fromPath('./mask.png');
const result = await editor.edit({
image: source,
mask,
prompt: 'Add a rainbow in the sky',
});
Configuration
import { llm, exponentialBackoff, roundRobinKeys } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const instance = llm({
model: openai('gpt-4o'),
config: {
apiKey: roundRobinKeys(['sk-key1', 'sk-key2']),
timeout: 30000,
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
},
params: {
temperature: 0.7,
max_tokens: 1000,
},
system: 'You are a helpful assistant.',
});
System Prompts
System prompts can be a simple string or a provider-specific array for advanced features:
// Simple string (all providers)
const simple = llm({
model: anthropic('claude-sonnet-4-20250514'),
system: 'You are a helpful assistant.',
});
// Anthropic cache_control format
import { anthropic, betas } from '@providerprotocol/ai/anthropic';
const cached = llm({
model: anthropic('claude-sonnet-4-20250514', {
betas: [betas.promptCaching],
}),
system: [
{ type: 'text', text: 'Large context document...', cache_control: { type: 'ephemeral' } },
{ type: 'text', text: 'Instructions...' },
],
});
Provider Config Options
interface ProviderConfig {
apiKey?: string | (() => Promise<string>) | KeyStrategy; // API key, async getter, or strategy
baseUrl?: string; // Custom API endpoint
timeout?: number; // Per-attempt timeout (ms)
retryStrategy?: RetryStrategy; // Retry behavior
headers?: Record<string, string>; // Custom headers (merged with provider defaults)
fetch?: typeof fetch; // Custom fetch implementation
apiVersion?: string; // API version override
retryAfterMaxSeconds?: number; // Cap for Retry-After header (default: 3600)
}
Notes:
- timeout applies per attempt; total time can exceed this with retries
- headers are merged with model-level headers (explicit config takes precedence)
- retryAfterMaxSeconds prevents honoring excessively long Retry-After values
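A sketch exercising several of these options together (the baseUrl, header name, and values are illustrative):
import { llm, retryAfterStrategy } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const instance = llm({
  model: openai('gpt-4o'),
  config: {
    baseUrl: 'https://my-gateway.example.com/v1',            // custom endpoint
    timeout: 15000,                                           // per attempt, not total
    headers: { 'X-Request-Source': 'batch-job' },             // merged with provider defaults
    retryStrategy: retryAfterStrategy({ maxAttempts: 3, fallbackDelay: 5000 }),
    retryAfterMaxSeconds: 120,                                // cap honored Retry-After values
  },
});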
Key Strategies
import { roundRobinKeys, weightedKeys, dynamicKey } from '@providerprotocol/ai/http';
// Cycle through keys evenly
roundRobinKeys(['sk-1', 'sk-2', 'sk-3'])
// Weighted selection (70% key1, 30% key2)
weightedKeys([
{ key: 'sk-1', weight: 70 },
{ key: 'sk-2', weight: 30 },
])
// Dynamic fetching (secrets manager, etc.)
dynamicKey(async () => fetchKeyFromVault())
Retry Strategies
import {
exponentialBackoff,
linearBackoff,
noRetry,
retryAfterStrategy,
} from '@providerprotocol/ai';
// Exponential: 1s, 2s, 4s...
exponentialBackoff({
maxAttempts: 5,
baseDelay: 1000,
maxDelay: 30000,
jitter: true, // Randomize delays to prevent thundering herd (default: true)
})
// Linear: 1s, 2s, 3s...
linearBackoff({ maxAttempts: 3, delay: 1000 })
// Respect server Retry-After headers
retryAfterStrategy({ maxAttempts: 3, fallbackDelay: 5000 })
// No retries
noRetry()
Retryable Errors: RATE_LIMITED, NETWORK_ERROR, TIMEOUT, PROVIDER_ERROR
Streaming Retry: Retry strategies work with both .generate() and .stream(). During streaming, stream_retry events are emitted to notify consumers of retry attempts, and middleware can use the onRetry hook to reset accumulated state.
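A sketch surfacing retries to the consumer during streaming (assuming model is an llm() instance with a retryStrategy configured; only the event type is relied on, since the stream_retry payload is not documented here):
const stream = model.stream('Generate a long report');
for await (const event of stream) {
  if (event.type === 'stream_retry') {
    // The provider request failed and is being retried per the configured strategy
    console.warn('Stream interrupted; retrying...');
  } else if (event.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}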
Tool Execution Control
const turn = await claude.generate({
tools: [weatherTool, searchTool],
toolStrategy: {
maxIterations: 5,
onBeforeCall: (tool, params) => {
if (tool.name === 'dangerousTool') return false; // Block execution
return true;
},
onAfterCall: (tool, params, result) => {
console.log(`${tool.name} returned:`, result);
},
onError: (tool, params, error) => {
console.error(`${tool.name} failed:`, error);
},
},
}, 'Search for recent news about AI');
Thread Management
import { Thread } from '@providerprotocol/ai';
const thread = new Thread();
thread.user('Hello!');
const turn = await claude.generate(thread.toMessages(), 'How are you?');
thread.append(turn);
// Serialize for storage
const json = thread.toJSON();
localStorage.setItem('conversation', JSON.stringify(json));
// Restore later
const restored = Thread.fromJSON(JSON.parse(localStorage.getItem('conversation')!));
Middleware
Compose request/response/stream transformations with the middleware system. Middleware is imported from dedicated entry points.
Parsed Object Middleware
Automatically parse streaming JSON from structured output and tool call events:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { parsedObjectMiddleware } from '@providerprotocol/ai/middleware/parsed-object';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: {
type: 'object',
properties: {
city: { type: 'string' },
country: { type: 'string' },
population: { type: 'number' },
},
required: ['city', 'country', 'population'],
},
middleware: [parsedObjectMiddleware()],
});
for await (const event of model.stream('What is the capital of France?')) {
if (event.type === 'object_delta') {
// Access incrementally parsed structured data
console.log(event.delta.parsed);
// { city: "Par" } -> { city: "Paris" } -> { city: "Paris", country: "Fr" } -> ...
}
}
Logging Middleware
Add visibility into request lifecycle:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { loggingMiddleware } from '@providerprotocol/ai/middleware/logging';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [loggingMiddleware({ level: 'debug' })],
});
// Logs: [PP] [anthropic] Starting llm request (streaming)
// Logs: [PP] [anthropic] Completed in 1234ms
const result = await model.generate('Hello');
Persistence Middleware
Load and save conversation threads around LLM requests:
import { llm, Thread } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { persistenceMiddleware, PersistenceAdapter } from '@providerprotocol/ai/middleware/persistence';
const adapter = new PersistenceAdapter({
id: 'conversation-123',
load: async (id) => loadThreadFromDatabase(id), // Thread | ThreadJSON | null
save: async (id, thread, turn) => {
await saveThreadToDatabase(id, thread);
if (turn) {
await saveTurnToDatabase(id, turn);
}
},
});
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [persistenceMiddleware({ adapter })],
});
const turn = await model.generate('Hello!');
Pub-Sub Middleware (Stream Resumption)
Enable reconnecting clients to catch up on missed events during active generation. The middleware buffers events, publishes them to subscribers, and removes streams on completion/abort/error.
If a stream never reaches those hooks (for example, a process crash), the adapter may retain the entry. Custom adapters should invoke onComplete when remove() runs so subscriber streams can terminate.
Streams are created lazily on first append() or subscribe() call.
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { pubsubMiddleware, memoryAdapter } from '@providerprotocol/ai/middleware/pubsub';
import { webapi } from '@providerprotocol/ai/middleware/pubsub/server';
// Create a shared adapter instance
const adapter = memoryAdapter({ maxStreams: 1000 });
// Server route handling both new requests and reconnections
Bun.serve({
port: 3000,
async fetch(req) {
const { messages, streamId } = await req.json();
const exists = await adapter.exists(streamId);
if (!exists) {
// Start background generation (fire and forget)
// Stream is created lazily on first append()
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
// Both new and reconnect: subscribe to events
return new Response(webapi.createSubscriberStream(streamId, adapter), {
headers: { 'Content-Type': 'text/event-stream' },
});
},
});
Framework Adapters:
// Express
import { express } from '@providerprotocol/ai/middleware/pubsub/server';
app.post('/api/ai', async (req, res) => {
const { messages, streamId } = req.body;
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
express.streamSubscriber(streamId, adapter, res);
});
// Fastify
import { fastify } from '@providerprotocol/ai/middleware/pubsub/server';
app.post('/api/ai', async (request, reply) => {
const { messages, streamId } = request.body;
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
return fastify.streamSubscriber(streamId, adapter, reply);
});
// H3/Nuxt
import { h3 } from '@providerprotocol/ai/middleware/pubsub/server';
export default defineEventHandler(async (event) => {
const { messages, streamId } = await readBody(event);
// Guard: prevent duplicate generations on reconnect
if (!await adapter.exists(streamId)) {
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
middleware: [pubsubMiddleware({ adapter, streamId })],
});
model.stream(messages).then(turn => { /* save to DB */ });
}
return h3.streamSubscriber(streamId, adapter, event);
});
Custom Adapters:
Implement PubSubAdapter for custom backends (Redis, etc.):
import type { PubSubAdapter } from '@providerprotocol/ai/middleware/pubsub';
const redisAdapter: PubSubAdapter = {
async exists(streamId) { /* check if stream exists */ },
async append(streamId, event) { /* append event, create lazily */ },
async getEvents(streamId) { /* return events or [] */ },
subscribe(streamId, onEvent, onComplete, onFinalData) { /* subscribe to live events */ },
publish(streamId, event) { /* broadcast to subscribers */ },
setFinalData(streamId, data) { /* store final Turn data */ },
async remove(streamId) { /* notify onFinalData, onComplete, then delete */ },
};
Combining Middleware
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { loggingMiddleware } from '@providerprotocol/ai/middleware/logging';
import { parsedObjectMiddleware } from '@providerprotocol/ai/middleware/parsed-object';
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: mySchema,
middleware: [
loggingMiddleware({ level: 'info' }),
parsedObjectMiddleware(),
],
});
Pipeline Middleware (Post-Turn Processing)
Run async tasks (image generation, embeddings, slug creation, etc.) after the LLM completes, with progress events streamed to connected clients:
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { pubsubMiddleware, memoryAdapter } from '@providerprotocol/ai/middleware/pubsub';
import { pipelineMiddleware, isPipelineStageEvent } from '@providerprotocol/ai/middleware/pipeline';
const adapter = memoryAdapter();
const model = llm({
model: anthropic('claude-sonnet-4-20250514'),
structure: BlogPostSchema,
middleware: [
pubsubMiddleware({ adapter, streamId: postId }),
pipelineMiddleware<BlogPost>({
stages: [
{
type: 'slug',
run: (turn, emit) => {
const slug = turn.data!.title.toLowerCase().replace(/\s+/g, '-');
(turn as { slug?: string }).slug = slug;
emit({ slug });
},
},
{
type: 'embedding',
run: async (turn, emit) => {
await vectorize(turn.data!);
emit({ embedded: true });
},
},
],
parallel: false, // Run stages sequentially (default)
continueOnError: false, // Stop on first error (default)
onStageError: ({ stage, error }) => {
console.error(`Stage ${stage.type} failed:`, error);
},
}),
],
});
// Stages run after streaming completes
model.stream(prompt).then(turn => {
const extended = turn as typeof turn & { slug?: string };
console.log(extended.slug);
});
Consuming Pipeline Events:
for await (const event of model.stream(prompt)) {
if (isPipelineStageEvent(event)) {
console.log(event.delta.stage, event.delta.payload);
// 'slug' { slug: 'my-blog-post' }
// 'embedding' { embedded: true }
}
}
Middleware Order: Place pipelineMiddleware after pubsubMiddleware in the array:
middleware: [
pubsubMiddleware({ ... }), // Setup runs first in onStart
pipelineMiddleware({ ... }), // Events run first in onTurn (reverse order)
]
This ensures pubsub sets up before pipeline stages execute, and pipeline events emit before pubsub cleanup.
Pipeline Configuration:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| stages | PipelineStage[] | required | Stages to run after turn completion |
| parallel | boolean | false | Run stages in parallel instead of sequential |
| continueOnError | boolean | false | Continue running subsequent stages if one fails |
| onStageError | function | - | Called when a stage throws an error |
Error Handling
All errors are normalized to UPPError with consistent error codes:
import { UPPError } from '@providerprotocol/ai';
try {
await claude.generate('Hello');
} catch (error) {
if (error instanceof UPPError) {
console.log(error.code); // 'RATE_LIMITED'
console.log(error.provider); // 'anthropic'
console.log(error.modality); // 'llm'
console.log(error.statusCode); // 429
console.log(error.cause); // Original error (if any)
switch (error.code) {
case 'RATE_LIMITED':
// Wait and retry
break;
case 'CONTEXT_LENGTH_EXCEEDED':
// Reduce input size
break;
case 'AUTHENTICATION_FAILED':
// Check API key
break;
case 'CONTENT_FILTERED':
// Content policy violation
break;
}
}
}
Error Codes: AUTHENTICATION_FAILED, RATE_LIMITED, CONTEXT_LENGTH_EXCEEDED, MODEL_NOT_FOUND, INVALID_REQUEST, INVALID_RESPONSE, CONTENT_FILTERED, QUOTA_EXCEEDED, PROVIDER_ERROR, NETWORK_ERROR, TIMEOUT, CANCELLED
API Gateway / Proxy
Build AI API gateways with your own authentication. Users authenticate with your platform; AI provider keys stay hidden on the server.
Security Note: The proxy works without any configuration, which means no authentication by default. Always add your own auth layer in production; the examples below show how.
Server (Bun/Deno/Cloudflare Workers)
import { llm, exponentialBackoff, roundRobinKeys } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { parseBody, toJSON, toSSE, toError } from '@providerprotocol/ai/proxy';
// Server manages AI provider keys - users never see them
const claude = llm({
model: anthropic('claude-sonnet-4-20250514'),
config: {
apiKey: roundRobinKeys([process.env.ANTHROPIC_KEY_1!, process.env.ANTHROPIC_KEY_2!]),
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
},
});
Bun.serve({
port: 3000,
async fetch(req) {
// Authenticate with YOUR platform credentials
const token = req.headers.get('Authorization')?.replace('Bearer ', '');
const user = await validatePlatformToken(token ?? '');
if (!user) return toError('Unauthorized', 401);
// Rate limit, track usage, bill user, etc.
await trackUsage(user.id);
const { messages, system, params } = parseBody(await req.json());
if (params?.stream) {
return toSSE(claude.stream(messages, { system }));
}
return toJSON(await claude.generate(messages, { system }));
},
});
Client
Clients authenticate with your platform token. They get automatic retry on network failures to your proxy.
import { llm, exponentialBackoff } from '@providerprotocol/ai';
import { proxy } from '@providerprotocol/ai/proxy';
const claude = llm({
model: proxy('https://api.yourplatform.com/ai'),
config: {
headers: { 'Authorization': 'Bearer user-platform-token' },
retryStrategy: exponentialBackoff({ maxAttempts: 3 }),
timeout: 30000,
},
});
const turn = await claude.generate('Hello!');
Framework Adapters
Server adapters for Express, Fastify, and Nuxt/H3:
// Express
import { express as expressAdapter, parseBody } from '@providerprotocol/ai/proxy';
app.post('/ai', authMiddleware, async (req, res) => {
const { messages, system, params } = parseBody(req.body);
if (params?.stream) {
expressAdapter.streamSSE(claude.stream(messages, { system }), res);
} else {
expressAdapter.sendJSON(await claude.generate(messages, { system }), res);
}
});
// Fastify
import { fastify as fastifyAdapter, parseBody } from '@providerprotocol/ai/proxy';
app.post('/ai', async (request, reply) => {
const { messages, system, params } = parseBody(request.body);
if (params?.stream) {
return fastifyAdapter.streamSSE(claude.stream(messages, { system }), reply);
}
return fastifyAdapter.sendJSON(await claude.generate(messages, { system }), reply);
});
// Nuxt/H3 (server/api/ai.post.ts)
import { sendStream } from 'h3';
import { h3 as h3Adapter, parseBody } from '@providerprotocol/ai/proxy';
export default defineEventHandler(async (event) => {
const { messages, system, params } = parseBody(await readBody(event));
if (params?.stream) {
return sendStream(event, h3Adapter.createSSEStream(claude.stream(messages, { system })));
}
return h3Adapter.sendJSON(await claude.generate(messages, { system }), event);
});
What this enables:
- Users auth with your platform credentials (JWT, API keys, sessions)
- You manage/rotate AI provider keys centrally
- Per-user rate limiting, usage tracking, billing
- Model access control (different users get different models; see the sketch after this list)
- Request/response logging, content filtering
- Double-layer retry: client retries to proxy, server retries to AI provider
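For example, model access control can be layered into the server route above by choosing the instance per user (a sketch; the tier field and model mapping are illustrative assumptions, not part of the library):
import { llm } from '@providerprotocol/ai';
import { anthropic } from '@providerprotocol/ai/anthropic';
import { openai } from '@providerprotocol/ai/openai';
// Hypothetical tier-to-model mapping maintained by your platform
const models = {
  free: llm({ model: openai('gpt-4o') }),
  pro: llm({ model: anthropic('claude-sonnet-4-20250514') }),
};
// Inside the fetch handler, after validatePlatformToken(token):
// const instance = user.tier === 'pro' ? models.pro : models.free;
// return toJSON(await instance.generate(messages, { system }));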
OpenAI API Modes
OpenAI supports two API endpoints. The Responses API is the default and recommended approach:
import { openai } from '@providerprotocol/ai/openai';
// Responses API (default, recommended)
openai('gpt-4o')
// Chat Completions API (legacy)
openai('gpt-4o', { api: 'completions' })
The Responses API supports built-in tools and stateful conversations. Use completions for backward compatibility.
OpenAI Built-in Tools
With the Responses API, use OpenAI's built-in tools directly:
import { llm } from '@providerprotocol/ai';
import { openai, tools } from '@providerprotocol/ai/openai';
// Web search
const model = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.webSearch()],
},
});
// File search with vector stores
const researchModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.fileSearch({ vector_store_ids: ['vs_abc123'] })],
},
});
// Code interpreter
const codeModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.codeInterpreter()],
},
});
// Image generation
const creativeModel = llm({
model: openai('gpt-4o'),
params: {
tools: [tools.imageGeneration()],
},
});
Available Built-in Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Search the web with optional user location |
| tools.fileSearch() | Search uploaded files in vector stores |
| tools.codeInterpreter() | Execute code in a sandboxed environment |
| tools.computer() | Computer use with display configuration |
| tools.imageGeneration() | Generate images via DALL-E |
| tools.mcp() | Connect to MCP servers |
xAI API Modes
xAI supports multiple API compatibility modes:
import { xai } from '@providerprotocol/ai/xai';
// Chat Completions (OpenAI-compatible, default)
xai('grok-3-fast')
// Responses API (stateful)
xai('grok-3-fast', { api: 'responses' })
// Messages API (Anthropic-compatible)
xai('grok-3-fast', { api: 'messages' })
Groq
Fast inference with Llama, Gemma, and Mixtral models:
import { llm } from '@providerprotocol/ai';
import { groq } from '@providerprotocol/ai/groq';
const model = llm({
model: groq('llama-3.3-70b-versatile'),
params: { max_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With web search:
const searchModel = llm({
model: groq('llama-3.3-70b-versatile'),
params: {
search_settings: { mode: 'auto' },
},
});
With RAG documents:
const ragModel = llm({
model: groq('llama-3.3-70b-versatile'),
params: {
documents: [
{ title: 'Doc 1', content: 'Document content here...' },
{ title: 'Doc 2', content: 'More content...' },
],
citation_options: { include: true },
},
});
Capabilities: Streaming, tool calling, structured output, image input (Llama 4 preview), web search, RAG with citations.
Environment: GROQ_API_KEY
Cerebras
Ultra-fast inference with Llama, Qwen, and GPT-OSS models:
import { llm } from '@providerprotocol/ai';
import { cerebras } from '@providerprotocol/ai/cerebras';
const model = llm({
model: cerebras('llama-3.3-70b'),
params: { max_completion_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With reasoning (GPT-OSS):
const model = llm({
model: cerebras('gpt-oss-120b'),
params: {
reasoning_effort: 'high',
reasoning_format: 'parsed',
},
});
Capabilities: Streaming, tool calling, structured output, reasoning parameters.
Environment: CEREBRAS_API_KEY
Moonshot
Kimi K2.5 with 256K context, thinking mode, vision, and server-side builtin tools:
import { llm } from '@providerprotocol/ai';
import { moonshot, tools } from '@providerprotocol/ai/moonshot';
const model = llm({
model: moonshot('kimi-k2.5'),
params: { max_tokens: 1000 },
});
const turn = await model.generate('Hello!');
With thinking mode (default for K2.5):
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
max_tokens: 2000,
temperature: 1.0,
thinking: { type: 'enabled' },
},
});
// Response includes reasoning in turn.response.reasoning
const turn = await model.generate('Solve step by step: 2x + 5 = 13');
With instant mode (disabled thinking):
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
temperature: 0.6,
thinking: { type: 'disabled' },
},
});
With builtin tools:
const model = llm({
model: moonshot('kimi-k2.5'),
params: {
tools: [
tools.webSearch(),
tools.codeRunner(),
tools.date(),
],
},
});
Available Builtin Tools:
| Tool | Description |
|------|-------------|
| tools.webSearch() | Real-time internet search |
| tools.codeRunner() | Python code execution with matplotlib/pandas |
| tools.quickjs() | JavaScript execution via QuickJS engine |
| tools.fetch() | URL content fetching with markdown extraction |
| tools.convert() | Unit conversion (length, mass, temperature, currency) |
| tools.date() | Date/time processing and timezone conversion |
| tools.base64Encode() | Base64 encoding |
| tools.base64Decode() | Base64 decoding |
| tools.memory() | Memory storage and retrieval system |
| tools.rethink() | Intelligent reasoning/reflection tool |
| tools.randomChoice() | Random selection with optional weights |
Capabilities: Streaming, tool calling, structured output, thinking mode, image input, video input (experimental).
Environment: MOONSHOT_API_KEY or KIMI_API_KEY
OpenResponses Provider
Connect to any server implementing the OpenResponses specification:
import { llm } from '@providerprotocol/ai';
import { responses } from '@providerprotocol/ai/responses';
// Using with OpenAI
const model = llm({
model: responses('gpt-5.2', {
host: 'https://api.openai.com/v1',
apiKeyEnv: 'OPENAI_API_KEY',
}),
params: { max_output_tokens: 1000 },
});
// Using with OpenRouter
const routerModel = llm({
model: responses('openai/gpt-4o', {
host: 'https://openrouter.ai/api/v1',
apiKeyEnv: 'OPENROUTER_API_KEY',
}),
});
// Using with self-hosted server
const localModel = llm({
model: responses('llama-3.3-70b', {
host: 'http://localhost:8080/v1',
}),
});
Capabilities: Full multimodal support, streaming, tool calling, structured output, reasoning summaries.
Alternative Import Style
Use the ai namespace for a grouped import style:
import { ai } from '@providerprotocol/ai';
import { openai } from '@providerprotocol/ai/openai';
const model = ai.llm({ model: openai('gpt-4o') });
const embedder = ai.embedding({ model: openai('text-embedding-3-small') });
const dalle = ai.image({ model: openai('dall-e-3') });
TypeScript
Full type safety with no any types. All provider parameters are typed:
import type {
// Core types
Turn,
TurnJSON,
Message,
Tool,
ToolInput,
TokenUsage,
// Streaming
StreamEvent,
StreamResult,
// Content blocks
TextBlock,
ImageBlock,
ReasoningBlock,
DocumentBlock,
AudioBlock,
VideoBlock,
// Modality results
EmbeddingResult,
ImageResult,
// Errors
UPPError,
ErrorCode,
// Configuration
ProviderConfig,
KeyStrategy,
RetryStrategy,
LLMCapabilities,
// Middleware
Middleware,
MiddlewareContext,
StreamContext,
// Schema types (Zod support)
Structure,
ZodLike,
} from '@providerprotocol/ai';
Zod Utilities:
import {
isZodSchema,
isZodV4,
zodToJSONSchema,
zodToJSONSchemaSync,
resolveStructure,
resolveTools,
} from '@providerprotocol/ai/utils';
// Type guard for Zod schemas
if (isZodSchema(schema)) {
const jsonSchema = zodToJSONSchemaSync(schema);
}Error & ID Utilities:
import {
toError,
isCancelledError,
generateId,
generateShortId,
} from '@providerprotocol/ai/utils';
// Convert unknown thrown values to Error instances
const error = toError(unknownValue);
// Check if an error is a cancellation/abort error
if (isCancelledError(error)) {
console.log('Request was cancelled');
}
// Generate IDs
const uuid = generateId(); // UUID v4: "a1b2c3d4-..."
const shortId = generateShortId('req'); // "req_abc123xyz789"
Provider-Specific Types:
// OpenAI
import type { OpenAIHeaders, OpenAIImageParams } from '@providerprotocol/ai/openai';
// Google
import type { GoogleImagenParams } from '@providerprotocol/ai/google';
// Ollama
import type { OllamaHeaders } from '@providerprotocol/ai/ollama';
// OpenRouter
import type { OpenRouterProviderOptions } from '@providerprotocol/ai/openrouter';
Type-Safe Enums:
import {
StreamEventType,
ErrorCode,
ContentBlockType,
MessageRole,
ModalityType,
} from '@providerprotocol/ai';
// Use instead of magic strings
if (event.type === StreamEventType.TextDelta) { ... }
if (error.code === ErrorCode.RateLimited) { ... }
if (block.type === ContentBlockType.Text) { ... }
Custom Providers
Build custom providers with createProvider:
import { createProvider } from '@providerprotocol/ai';
const myProvider = createProvider({
name: 'my-provider',
version: '1.0.0',
handlers: {
llm: myLLMHandler,
embedding: myEmbeddingHandler,
},
});
License
MIT
