@earendil-works/pi-ai

v0.80.2

Published

5 days ago

Unified LLM API with automatic model discovery and provider configuration

0High
0Medium
0Low

mitsuhiko

badlogic

rwachtler

ai llm openai anthropic gemini bedrock unified api

@earendil-works/pi-ai

Unified LLM API with provider collections, automatic auth resolution, token and cost tracking, and simple context persistence and hand-off to other models mid-session.

Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.

Supported Providers

OpenAI
Ant Ling
Azure OpenAI (Responses)
OpenAI Codex (ChatGPT Plus/Pro subscription, requires OAuth, see below)
DeepSeek
NVIDIA NIM
Anthropic
Google
Vertex AI (Gemini via Vertex AI)
Mistral
Groq
Cerebras
Cloudflare AI Gateway
Cloudflare Workers AI
xAI
OpenRouter
Vercel AI Gateway
ZAI Coding Plan (Global) (with separate China provider)
MiniMax (with separate China provider)
Together AI
Hugging Face
Moonshot AI (with separate China provider)
GitHub Copilot (requires OAuth, see below)
Amazon Bedrock
OpenCode Zen
OpenCode Go
Fireworks (uses OpenAI- and Anthropic-compatible APIs)
Kimi For Coding (Moonshot AI subscription endpoint, uses Anthropic-compatible API)
Xiaomi MiMo (defaults to API billing endpoint, with separate Token Plan providers for cn/ams/sgp regions)
Any OpenAI-compatible API: Ollama, vLLM, LM Studio, etc.

Installation

npm install @earendil-works/pi-ai

TypeBox exports are re-exported from @earendil-works/pi-ai: Type, Static, and TSchema.

Quick Start

You build a Models collection of providers and stream through it. The quickest start registers every built-in provider; apps that care about bundle size register individual providers instead (see Provider Factories and Bundling and Tree Shaking).

import { Type, type Context, type Tool } from '@earendil-works/pi-ai';
import { builtinModels } from '@earendil-works/pi-ai/providers/all';

// A Models collection with every built-in provider registered
const models = builtinModels();

// Sync lookup against the collection
const model = models.getModel('openai', 'gpt-4o-mini')!;

// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [{
  name: 'get_time',
  description: 'Get the current time',
  parameters: Type.Object({
    timezone: Type.Optional(Type.String({ description: 'Optional timezone (e.g., America/New_York)' }))
  })
}];

// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What time is it?', timestamp: Date.now() }],
  tools
};

// Option 1: Streaming with all event types.
// Auth resolves through the provider (OPENAI_API_KEY from the environment here).
const s = models.stream(model, context);

for await (const event of s) {
  switch (event.type) {
    case 'start':
      console.log(`Starting with ${event.partial.model}`);
      break;
    case 'text_start':
      console.log('\n[Text started]');
      break;
    case 'text_delta':
      process.stdout.write(event.delta);
      break;
    case 'text_end':
      console.log('\n[Text ended]');
      break;
    case 'thinking_start':
      console.log('[Model is thinking...]');
      break;
    case 'thinking_delta':
      process.stdout.write(event.delta);
      break;
    case 'thinking_end':
      console.log('[Thinking complete]');
      break;
    case 'toolcall_start':
      console.log(`\n[Tool call started: index ${event.contentIndex}]`);
      break;
    case 'toolcall_delta':
      // Partial tool arguments are being streamed
      const partialCall = event.partial.content[event.contentIndex];
      if (partialCall.type === 'toolCall') {
        console.log(`[Streaming args for ${partialCall.name}]`);
      }
      break;
    case 'toolcall_end':
      console.log(`\nTool called: ${event.toolCall.name}`);
      console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
      break;
    case 'done':
      console.log(`\nFinished: ${event.reason}`);
      break;
    case 'error':
      console.error(`Error: ${event.error.errorMessage}`);
      break;
  }
}

// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);

// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
  const result = call.name === 'get_time'
    ? new Date().toLocaleString('en-US', {
        timeZone: call.arguments.timezone || 'UTC',
        dateStyle: 'full',
        timeStyle: 'long'
      })
    : 'Unknown tool';

  // Add tool result to context (supports text and images)
  context.messages.push({
    role: 'toolResult',
    toolCallId: call.id,
    toolName: call.name,
    content: [{ type: 'text', text: result }],
    isError: false,
    timestamp: Date.now()
  });
}

// Continue if there were tool calls
if (toolCalls.length > 0) {
  const continuation = await models.complete(model, context);
  context.messages.push(continuation);
  console.log('After tool execution:', continuation.content);
}

console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);

// Option 2: Get complete response without streaming
const response = await models.complete(model, context);

for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  } else if (block.type === 'toolCall') {
    console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
  }
}

Snippets in the rest of this README assume a models collection set up like this (with the relevant providers registered).

Providers and Models

A provider is the runtime unit: it owns its model catalog, its auth (API key resolution, OAuth flows), and its stream behavior. A Models collection holds providers and routes every request to the provider that owns the model.

Providers internally share API implementations (the wire protocols): Anthropic models use anthropic-messages, OpenAI uses openai-responses, while xAI, Groq, Cerebras, OpenRouter, and most others share openai-completions. Mixed-API providers (GitHub Copilot, OpenCode Zen) dispatch per model.

Provider Factories

For apps that only need specific providers, there is one factory per built-in provider, each a subpath import that pulls only that provider's catalog:

import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { openrouterProvider } from '@earendil-works/pi-ai/providers/openrouter';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';
// ...one module per provider in the Supported Providers list

const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openrouterProvider());

Provider factories import their model catalog and a lazy API wrapper. They do not import other providers. With bundler code splitting, SDK implementations (@anthropic-ai/sdk, openai, @google/genai, etc.) stay in lazy chunks loaded on the first request to a model of that API.

All Built-in Providers

For apps that want everything (as in Quick Start):

import { builtinModels } from '@earendil-works/pi-ai/providers/all';

const models = builtinModels(); // a Models collection with every built-in provider registered

This imports all catalogs and every built-in provider factory. It is the heavy, explicit entrypoint. builtinModels() accepts the same options as createModels() (credentials, authContext); builtinProviders() returns the provider array if you want to register them on your own collection.

Querying Models

Reads are synchronous and return the last-known lists:

const providers = models.getProviders();           // registered Provider objects
const provider = models.getProvider('anthropic');  // one provider

const all = models.getModels();                    // every model across providers
const anthropicModels = models.getModels('anthropic');
const model = models.getModel('anthropic', 'claude-sonnet-4-5');

for (const m of anthropicModels) {
  console.log(`${m.id}: ${m.name}`);
  console.log(`  API: ${m.api}`);
  console.log(`  Context: ${m.contextWindow} tokens`);
  console.log(`  Vision: ${m.input.includes('image')}`);
  console.log(`  Reasoning: ${m.reasoning}`);
}

Dynamically listed models are typed Model<Api>. Narrow with the hasApi() guard when you need API-specific option typing:

import { hasApi } from '@earendil-works/pi-ai';

const m = models.getModel('anthropic', 'claude-sonnet-4-5');
if (m && hasApi(m, 'anthropic-messages')) {
  // m: Model<'anthropic-messages'> — stream options fully typed
  models.stream(m, context, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
}

Static Catalog Reads

For tooling that wants the generated built-in catalog with full literal typing (provider and model IDs auto-complete), independent of any collection:

import { getBuiltinModel, getBuiltinModels, getBuiltinProviders } from '@earendil-works/pi-ai/providers/all';

const model = getBuiltinModel('openai', 'gpt-4o-mini'); // typed Model<'openai-responses'>
const providers = getBuiltinProviders();
const anthropic = getBuiltinModels('anthropic');

Dynamic Providers

Providers may have dynamic model lists (a llama.cpp server, a live OpenRouter listing). Reads stay sync; fetching is an explicit async verb:

// getModels() returns the last-known list (empty before the first refresh)
await models.refresh('llamacpp');        // fetch one provider's list; rejects on failure
await models.refresh();                  // refresh all providers concurrently, best-effort
const fresh = models.getModel('llamacpp', 'qwen3-30b');

Static built-in providers are no-ops for refresh(). See createProvider() for building a dynamic provider.

Auth

Every provider owns its auth: how API keys resolve (stored credentials, environment variables, ambient sources like AWS profiles or gcloud ADC) and, where supported, OAuth login/refresh flows.

How Auth Resolves

When you call models.stream(), the collection resolves auth through the owning provider and merges it into the request. Explicit per-request values always win:

// Resolved through the provider (env var, stored credential, OAuth token):
await models.complete(model, context);

// Explicit key wins over anything the provider would resolve:
await models.complete(model, context, { apiKey: 'sk-explicit' });

You can inspect resolution without making a request — useful for status UIs:

const auth = await models.getAuth(model);
if (auth) {
  console.log(`configured via ${auth.source}`); // e.g. "ANTHROPIC_API_KEY", "OAuth", "stored credential"
} else {
  console.log('not configured');
}

getAuth() resolves undefined for unconfigured providers and rejects with ModelsError when something is actually broken ("oauth": token refresh failed, credential preserved for re-login; "auth": key resolution or credential store failure). Request paths surface the same failures as stream errors.

Credential Store

Stored credentials (API keys entered interactively, OAuth tokens) live in a CredentialStore — one type-tagged credential per provider. pi-ai ships an in-memory default; apps inject persistent storage:

import { createModels, type CredentialStore } from '@earendil-works/pi-ai';

const models = createModels({ credentials: myFileBackedStore });
// builtinModels() takes the same options:
// const models = builtinModels({ credentials: myFileBackedStore });

The contract is small: read(providerId), modify(providerId, fn) (the only write path — a serialized read-modify-write), and delete(providerId). OAuth token refresh runs inside modify, so concurrent requests and processes cannot double-refresh a rotated token. A stored credential owns its provider: environment variables are only consulted when nothing is stored, and a failed refresh never silently falls back to an env key.

API-key credentials use the same discriminator as pi's auth.json and can carry provider-scoped env/config values:

const credential = {
  type: 'api_key',
  key: '...',
  env: {
    CLOUDFLARE_ACCOUNT_ID: 'account-id',
    CLOUDFLARE_GATEWAY_ID: 'gateway-id'
  }
} as const;

Environment Variables

Built-in providers resolve these env vars (Node.js; in browsers pass apiKey explicitly):

| Provider | Environment Variable(s) | |----------|------------------------| | OpenAI | OPENAI_API_KEY | | Ant Ling | ANT_LING_API_KEY | | Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL (e.g. https://{resource}.openai.azure.com) or AZURE_OPENAI_RESOURCE_NAME. Supports *.openai.azure.com and *.cognitiveservices.azure.com; root endpoints auto-normalize to /openai/v1. Optional: AZURE_OPENAI_API_VERSION (default v1), AZURE_OPENAI_DEPLOYMENT_NAME_MAP. | | Anthropic | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN | | DeepSeek | DEEPSEEK_API_KEY | | NVIDIA NIM | NVIDIA_API_KEY | | Google | GEMINI_API_KEY | | Vertex AI | GOOGLE_CLOUD_API_KEY or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) + GOOGLE_CLOUD_LOCATION + ADC | | Mistral | MISTRAL_API_KEY | | Groq | GROQ_API_KEY | | Cerebras | CEREBRAS_API_KEY | | Cloudflare AI Gateway | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID | | Cloudflare Workers AI | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID | | xAI | XAI_API_KEY | | Fireworks | FIREWORKS_API_KEY | | Together AI | TOGETHER_API_KEY | | OpenRouter | OPENROUTER_API_KEY | | Vercel AI Gateway | AI_GATEWAY_API_KEY | | ZAI Coding Plan (Global) | ZAI_API_KEY | | ZAI Coding Plan (China) | ZAI_CODING_CN_API_KEY | | MiniMax (Global) | MINIMAX_API_KEY | | MiniMax (China) | MINIMAX_CN_API_KEY | | Moonshot AI / Moonshot AI (China) | MOONSHOT_API_KEY | | Hugging Face | HF_TOKEN | | OpenCode Zen / OpenCode Go | OPENCODE_API_KEY | | Kimi For Coding | KIMI_API_KEY | | Xiaomi MiMo (API billing) | XIAOMI_API_KEY | | Xiaomi MiMo Token Plan (China) | XIAOMI_TOKEN_PLAN_CN_API_KEY | | Xiaomi MiMo Token Plan (Amsterdam) | XIAOMI_TOKEN_PLAN_AMS_API_KEY | | Xiaomi MiMo Token Plan (Singapore) | XIAOMI_TOKEN_PLAN_SGP_API_KEY | | GitHub Copilot | COPILOT_GITHUB_TOKEN |

Amazon Bedrock resolves ambient AWS credentials (AWS_PROFILE, access key pairs, AWS_BEARER_TOKEN_BEDROCK, ECS task roles, web identity tokens). Vertex AI resolves either an explicit key or gcloud Application Default Credentials plus project/location.

Tools

Tools enable LLMs to interact with external systems. This library uses TypeBox schemas for type-safe tool definitions with automatic validation using TypeBox's built-in validator and value conversion utilities. TypeBox schemas can be serialized and deserialized as plain JSON, making them ideal for distributed systems.

Defining Tools

import { Type, type Tool, StringEnum } from '@earendil-works/pi-ai';

// Define tool parameters with TypeBox
const weatherTool: Tool = {
  name: 'get_weather',
  description: 'Get current weather for a location',
  parameters: Type.Object({
    location: Type.String({ description: 'City name or coordinates' }),
    units: StringEnum(['celsius', 'fahrenheit'], { default: 'celsius' })
  })
};

// Note: For Google API compatibility, use StringEnum helper instead of Type.Enum
// Type.Enum generates anyOf/const patterns that Google doesn't support

const bookMeetingTool: Tool = {
  name: 'book_meeting',
  description: 'Schedule a meeting',
  parameters: Type.Object({
    title: Type.String({ minLength: 1 }),
    startTime: Type.String({ format: 'date-time' }),
    endTime: Type.String({ format: 'date-time' }),
    attendees: Type.Array(Type.String({ format: 'email' }), { minItems: 1 })
  })
};

Handling Tool Calls

Tool results use content blocks and can include both text and images:

import { readFileSync } from 'fs';

const context: Context = {
  messages: [{ role: 'user', content: 'What is the weather in London?', timestamp: Date.now() }],
  tools: [weatherTool]
};

const response = await models.complete(model, context);

// Check for tool calls in the response
for (const block of response.content) {
  if (block.type === 'toolCall') {
    // Execute your tool with the arguments
    // See "Validating Tool Arguments" section for validation
    const result = await executeWeatherApi(block.arguments);

    // Add tool result with text content
    context.messages.push({
      role: 'toolResult',
      toolCallId: block.id,
      toolName: block.name,
      content: [{ type: 'text', text: JSON.stringify(result) }],
      isError: false,
      timestamp: Date.now()
    });
  }
}

// Tool results can also include images (for vision-capable models)
const imageBuffer = readFileSync('chart.png');
context.messages.push({
  role: 'toolResult',
  toolCallId: 'tool_xyz',
  toolName: 'generate_chart',
  content: [
    { type: 'text', text: 'Generated chart showing temperature trends' },
    { type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
  ],
  isError: false,
  timestamp: Date.now()
});

Streaming Tool Calls with Partial JSON

During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available:

const s = models.stream(model, context);

for await (const event of s) {
  if (event.type === 'toolcall_delta') {
    const toolCall = event.partial.content[event.contentIndex];

    // toolCall.arguments contains partially parsed JSON during streaming
    // This allows for progressive UI updates
    if (toolCall.type === 'toolCall' && toolCall.arguments) {
      // BE DEFENSIVE: arguments may be incomplete
      // Example: Show file path being written even before content is complete
      if (toolCall.name === 'write_file' && toolCall.arguments.path) {
        console.log(`Writing to: ${toolCall.arguments.path}`);

        // Content might be partial or missing
        if (toolCall.arguments.content) {
          console.log(`Content preview: ${toolCall.arguments.content.substring(0, 100)}...`);
        }
      }
    }
  }

  if (event.type === 'toolcall_end') {
    // Here toolCall.arguments is complete (but not yet validated)
    const toolCall = event.toolCall;
    console.log(`Tool completed: ${toolCall.name}`, toolCall.arguments);
  }
}

Important notes about partial tool arguments:

During toolcall_delta events, arguments contains the best-effort parse of partial JSON
Fields may be missing or incomplete - always check for existence before use
String values may be truncated mid-word
Arrays may be incomplete
Nested objects may be partially populated
At minimum, arguments will be an empty object {}, never undefined
The Google provider does not support function call streaming. Instead, you will receive a single toolcall_delta event with the full arguments.

Validating Tool Arguments

When implementing your own tool execution loop, use validateToolCall to validate arguments before passing them to your tools:

import { validateToolCall, type Tool } from '@earendil-works/pi-ai';

const tools: Tool[] = [weatherTool, calculatorTool];
const s = models.stream(model, { messages, tools });

for await (const event of s) {
  if (event.type === 'toolcall_end') {
    const toolCall = event.toolCall;

    try {
      // Validate arguments against the tool's schema (throws on invalid args)
      const validatedArgs = validateToolCall(tools, toolCall);
      const result = await executeMyTool(toolCall.name, validatedArgs);
      // ... add tool result to context
    } catch (error) {
      // Validation failed - return error as tool result so model can retry
      context.messages.push({
        role: 'toolResult',
        toolCallId: toolCall.id,
        toolName: toolCall.name,
        content: [{ type: 'text', text: error.message }],
        isError: true,
        timestamp: Date.now()
      });
    }
  }
}

Complete Event Reference

All streaming events emitted during assistant message generation:

| Event Type | Description | Key Properties | |------------|-------------|----------------| | start | Stream begins | partial: Initial assistant message structure | | text_start | Text block starts | contentIndex: Position in content array | | text_delta | Text chunk received | delta: New text, contentIndex: Position | | text_end | Text block complete | content: Full text, contentIndex: Position | | thinking_start | Thinking block starts | contentIndex: Position in content array | | thinking_delta | Thinking chunk received | delta: New text, contentIndex: Position | | thinking_end | Thinking block complete | content: Full thinking, contentIndex: Position | | toolcall_start | Tool call begins | contentIndex: Position in content array | | toolcall_delta | Tool arguments streaming | delta: JSON chunk, partial.content[contentIndex].arguments: Partial parsed args | | toolcall_end | Tool call complete | toolCall: Complete validated tool call with id, name, arguments | | done | Stream complete | reason: Stop reason ("stop", "length", "toolUse"), message: Final assistant message | | error | Error occurred | reason: Error type ("error" or "aborted"), error: AssistantMessage with partial content |

Streaming events for different content blocks are not guaranteed to be contiguous. Providers may emit deltas for text, thinking, and tool calls in the same upstream chunk, and pi may surface corresponding events interleaved, for example text_start, text_delta, toolcall_start, text_delta, toolcall_delta. Consumers must use contentIndex to associate each delta/end event with its block and must not assume that a block's *_start/*_delta/*_end sequence is uninterrupted by events for other blocks.

Image Input

Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.

import { readFileSync } from 'fs';

const model = models.getModel('openai', 'gpt-4o-mini')!;

// Check if model supports images
if (model.input.includes('image')) {
  console.log('Model supports vision');
}

const imageBuffer = readFileSync('image.png');
const base64Image = imageBuffer.toString('base64');

const response = await models.complete(model, {
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: 'What is in this image?' },
      { type: 'image', data: base64Image, mimeType: 'image/png' }
    ],
    timestamp: Date.now()
  }]
});

// Access the response
for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  }
}

Image Generation

Image generation uses a separate API surface from text/chat generation, mirroring the chat-side design: an ImagesModels collection holds ImagesProviders, reads are sync, and auth resolves through the owning provider. Image generation is a one-shot API: generateImages() waits for the provider response and returns the final AssistantImages result — do not use the chat/stream APIs for it.

Basic Image Generation

import { builtinImagesModels } from '@earendil-works/pi-ai/providers/all';

// Every built-in image-generation provider; accepts the same options as createModels()
const imagesModels = builtinImagesModels();

const model = imagesModels.getModel('openrouter', 'google/gemini-2.5-flash-image')!;

// Auth resolves through the provider (OPENROUTER_API_KEY here); explicit apiKey wins
const result = await imagesModels.generateImages(model, {
  input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
});

for (const block of result.output) {
  if (block.type === 'text') {
    console.log(block.text);
  } else if (block.type === 'image') {
    console.log(block.mimeType);
    console.log(block.data.substring(0, 32));
  }
}

Like the chat side, you can build the collection from parts: createImagesModels({ credentials?, authContext? }), the openrouterImagesProvider() factory from @earendil-works/pi-ai/providers/openrouter-images, and createImagesProvider({ id, auth, models, refreshModels?, api }) for custom image providers (with imagesModels.refresh(provider?) for dynamic lists). Failures never reject — they return an AssistantImages with stopReason: "error". The collection's getAuth(model) works exactly like the chat-side one.

The old global API (getImageModel() / getImageModels() / getImageProviders() / generateImages()) remains available on the compat entrypoint:

import { getImageModel, generateImages } from '@earendil-works/pi-ai/compat';

const model = getImageModel('openrouter', 'google/gemini-2.5-flash-image');
const result = await generateImages(model, {
  input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
}, {
  apiKey: process.env.OPENROUTER_API_KEY
});

Some models also support image input:

import { readFileSync } from 'fs';

const imageBuffer = readFileSync('input.png');
const result = await imagesModels.generateImages(model, {
  input: [
    { type: 'text', text: 'Create a variation of this image with a blue background.' },
    { type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
  ]
});

Check capabilities on the model metadata:

console.log(model.input);   // ['text', 'image']
console.log(model.output);  // ['image'] or ['image', 'text']

Notes and Limitations

Image models live in ImagesModels collections, chat models in Models collections; the two are separate surfaces.
Use generateImages(), not the chat/stream APIs.
Image-generation models do not participate in tool calling.
Outputs are returned in AssistantImages.output and can include both base64-encoded ImageContent blocks and TextContent blocks.
Some models return only images, others return images plus text. Check model.output.
Some models accept image input, others are text-to-image only. Check model.input.
Like the streaming APIs, image generation supports options such as apiKey, signal, headers, onPayload, and onResponse, and results may include stopReason, responseId, and usage.
If you want a model to analyze images in a conversation or call tools, use the regular chat APIs with a model that supports image input.
At the moment, image generation is available through only one provider, OpenRouter.

Thinking/Reasoning

Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.

Unified Interface (streamSimple/completeSimple)

// Many models across providers support thinking/reasoning
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
// or models.getModel('openai', 'gpt-5-mini');
// or models.getModel('google', 'gemini-2.5-flash');
// or models.getModel('xai', 'grok-code-fast-1');

// Check if model supports reasoning
if (model.reasoning) {
  console.log('Model supports reasoning/thinking');
}

// Use the simplified reasoning option
const response = await models.completeSimple(model, {
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13', timestamp: Date.now() }]
}, {
  reasoning: 'medium'  // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
});

// Access thinking and text blocks
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log('Response:', block.text);
  }
}

Provider-Specific Options (stream/complete)

models.stream()/complete() accept the owning API's full option set. Use hasApi() to narrow a dynamically looked-up model to its API for full option typing:

import { hasApi } from '@earendil-works/pi-ai';

// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = models.getModel('openai', 'gpt-5-mini')!;
if (hasApi(openaiModel, 'openai-responses')) {
  await models.complete(openaiModel, context, {
    reasoningEffort: 'medium',
    reasoningSummary: 'detailed'  // OpenAI Responses API only
  });
}

// Anthropic Thinking
const anthropicModel = models.getModel('anthropic', 'claude-sonnet-4-5')!;
if (hasApi(anthropicModel, 'anthropic-messages')) {
  await models.complete(anthropicModel, context, {
    thinkingEnabled: true,
    thinkingBudgetTokens: 8192  // Optional token limit
  });
}

// Google Gemini Thinking
const googleModel = models.getModel('google', 'gemini-2.5-flash')!;
if (hasApi(googleModel, 'google-generative-ai')) {
  await models.complete(googleModel, context, {
    thinking: {
      enabled: true,
      budgetTokens: 8192  // -1 for dynamic, 0 to disable
    }
  });
}

Streaming Thinking Content

When streaming, thinking content is delivered through specific events:

const s = models.streamSimple(model, context, { reasoning: 'high' });

for await (const event of s) {
  switch (event.type) {
    case 'thinking_start':
      console.log('[Model started thinking]');
      break;
    case 'thinking_delta':
      process.stdout.write(event.delta);  // Stream thinking content
      break;
    case 'thinking_end':
      console.log('\n[Thinking complete]');
      break;
  }
}

Stop Reasons

Every AssistantMessage includes a stopReason field that indicates how the generation ended:

"stop" - Normal completion, the model finished its response
"length" - Output hit the maximum token limit
"toolUse" - Model is calling tools and expects tool results
"error" - An error occurred during generation
"aborted" - Request was cancelled via abort signal

AssistantMessage may also include responseId, a provider-specific upstream response or message identifier when the underlying API exposes one. Do not assume it is always present across providers.

Error Handling

Request failures never throw out of the stream functions: when a request ends with an error (including aborts and tool call validation errors), the streaming API emits an error event and the final message carries the details:

// In streaming
for await (const event of s) {
  if (event.type === 'error') {
    // event.reason is either "error" or "aborted"
    // event.error is the AssistantMessage with partial content
    console.error(`Error (${event.reason}):`, event.error.errorMessage);
    console.log('Partial content:', event.error.content);
  }
}

// The final message will have the error details
const message = await s.result();
if (message.stopReason === 'error' || message.stopReason === 'aborted') {
  console.error('Request failed:', message.errorMessage);
  // message.content contains any partial content received before the error
  // message.usage contains partial token counts and costs
}

Auth failures (no key configured, OAuth refresh failed, unknown provider) surface the same way: as a stream error with stopReason: "error".

Aborting Requests

The abort signal allows you to cancel in-progress requests. Aborted requests have stopReason === 'aborted':

const controller = new AbortController();

// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);

const s = models.stream(model, {
  messages: [{ role: 'user', content: 'Write a long story', timestamp: Date.now() }]
}, {
  signal: controller.signal
});

for await (const event of s) {
  if (event.type === 'text_delta') {
    process.stdout.write(event.delta);
  } else if (event.type === 'error') {
    // event.reason tells you if it was "error" or "aborted"
    console.log(`${event.reason === 'aborted' ? 'Aborted' : 'Error'}:`, event.error.errorMessage);
  }
}

// Get results (may be partial if aborted)
const response = await s.result();
if (response.stopReason === 'aborted') {
  console.log('Request was aborted:', response.errorMessage);
  console.log('Partial content received:', response.content);
  console.log('Tokens used:', response.usage);
}

Continuing After Abort

Aborted messages can be added to the conversation context and continued in subsequent requests:

const context = {
  messages: [
    { role: 'user', content: 'Explain quantum computing in detail', timestamp: Date.now() }
  ]
};

// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);

const partial = await models.complete(model, context, { signal: controller1.signal });

// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: 'user', content: 'Please continue', timestamp: Date.now() });

// Continue the conversation
const continuation = await models.complete(model, context);

Debugging Provider Payloads

Use the onPayload callback to inspect the request payload sent to the provider. This is useful for debugging request formatting issues or provider validation errors.

const response = await models.complete(model, context, {
  onPayload: (payload) => {
    console.log('Provider payload:', JSON.stringify(payload, null, 2));
  }
});

The callback is supported by stream, complete, streamSimple, and completeSimple.

Custom Providers

createProvider()

createProvider() builds a provider from parts: identity, auth, a model list, and an API implementation. Use it for local inference servers, proxies, or any OpenAI/Anthropic-compatible endpoint:

import { createModels, createProvider, envApiKeyAuth, type Model } from '@earendil-works/pi-ai';
import { openAICompletionsApi } from '@earendil-works/pi-ai/api/openai-completions.lazy';

const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000
};

const ollama = createProvider({
  id: 'ollama',
  name: 'Ollama',
  baseUrl: 'http://localhost:11434/v1',
  // Every provider declares auth; keyless local servers resolve as configured with no key.
  auth: { apiKey: { name: 'Ollama', resolve: async () => ({ auth: {} }) } },
  models: [ollamaModel],
  api: openAICompletionsApi(),
});

const models = createModels();
models.setProvider(ollama);

await models.complete(models.getModel('ollama', 'llama-3.1-8b')!, context);

For providers with real keys, envApiKeyAuth(displayName, envVars) gives the standard behavior (stored credential wins, then the first set env var):

const proxy = createProvider({
  id: 'my-proxy',
  auth: { apiKey: envApiKeyAuth('My proxy API key', ['MY_PROXY_API_KEY']) },
  models: [/* ... */],
  api: openAICompletionsApi(),
});

Mixed-API providers pass a map keyed by model.api; each model dispatches to its API's implementation:

import { anthropicMessagesApi } from '@earendil-works/pi-ai/api/anthropic-messages.lazy';
import { openAIResponsesApi } from '@earendil-works/pi-ai/api/openai-responses.lazy';

const gateway = createProvider({
  id: 'my-gateway',
  auth: { apiKey: envApiKeyAuth('Gateway key', ['GATEWAY_API_KEY']) },
  models: [/* models with api: 'anthropic-messages' or 'openai-responses' */],
  api: {
    'anthropic-messages': anthropicMessagesApi(),
    'openai-responses': openAIResponsesApi(),
  },
});

Dynamic model lists use refreshModels; the provider lists empty until the first models.refresh():

const llamacpp = createProvider({
  id: 'llamacpp',
  auth: { apiKey: { name: 'llama.cpp', resolve: async () => ({ auth: {} }) } },
  models: [],
  refreshModels: async () => fetchModelsFromServer('http://localhost:8080'),
  api: openAICompletionsApi(),
});

models.setProvider(llamacpp);
await models.refresh('llamacpp');

Custom models can carry headers (e.g. proxies behind bot detection) and compat flags — see OpenAI Compatibility Settings.

Some OpenAI-compatible servers do not understand the developer role used for reasoning-capable models. For those providers, set compat.supportsDeveloperRole to false so the system prompt is sent as a system message instead. If the server also does not support reasoning_effort, set compat.supportsReasoningEffort to false too. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.

Use model-level thinkingLevelMap to describe model-specific thinking controls. Keys are pi thinking levels (off, minimal, low, medium, high, xhigh). Missing keys use provider defaults, string values are sent to the provider, and null marks a level unsupported.

const ollamaReasoningModel: Model<'openai-completions'> = {
  id: 'gpt-oss:20b',
  name: 'GPT-OSS 20B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: true,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 131072,
  maxTokens: 32000,
  thinkingLevelMap: {
    minimal: null,
    low: null,
    medium: null,
    high: 'high',
    xhigh: null,
  },
  compat: {
    supportsDeveloperRole: false,
    supportsReasoningEffort: false,
  }
};

Calling API Implementations Directly

The API implementations are importable on their own. Each module exports exactly stream and streamSimple with that API's full option typing. Direct calls bypass provider auth — pass apiKey explicitly:

import { stream } from '@earendil-works/pi-ai/api/anthropic-messages';

const s = stream(claudeModel, context, {
  apiKey: process.env.ANTHROPIC_API_KEY,
  thinkingEnabled: true,
  thinkingBudgetTokens: 2048,
});

Built-in API implementations live under ./api/<api-id>:

| API id | Options type | |--------|--------------| | anthropic-messages | AnthropicOptions | | openai-completions | OpenAICompletionsOptions | | openai-responses | OpenAIResponsesOptions | | openai-codex-responses | OpenAICodexResponsesOptions | | azure-openai-responses | AzureOpenAIResponsesOptions | | google-generative-ai | GoogleOptions | | google-vertex | GoogleVertexOptions | | mistral-conversations | MistralOptions | | bedrock-converse-stream | BedrockOptions |

Importing an implementation module loads its SDK. The ./api/<id>.lazy wrappers (used by the provider factories) defer that load to the first request when the runtime or bundler supports dynamic import chunking. Legacy raw API subpaths from older releases (./anthropic, ./google, ./mistral, ./openai-completions, ...) were removed; use @earendil-works/pi-ai/api/<api-id>.

OpenAI Compatibility Settings

The openai-completions API is implemented by many providers with minor differences. By default, the library auto-detects compatibility settings based on baseUrl for a small set of known OpenAI-compatible providers (Cerebras, xAI, Chutes, DeepSeek, NVIDIA NIM, Together AI, zAi, OpenCode, Cloudflare Workers AI, etc.). For custom proxies or unknown endpoints, you can override these settings via the compat field. For openai-responses models, the compat field supports Responses-specific flags.

interface OpenAICompletionsCompat {
  supportsStore?: boolean;           // Whether provider supports the `store` field (default: true)
  supportsDeveloperRole?: boolean;   // Whether provider supports `developer` role vs `system` (default: true)
  supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
  supportsUsageInStreaming?: boolean; // Whether provider supports `stream_options: { include_usage: true }` (default: true)
  supportsStrictMode?: boolean;      // Whether provider supports `strict` in tool definitions (default: true)
  sendSessionAffinityHeaders?: boolean; // Whether to send `session_id`, `x-client-request-id`, and `x-session-affinity` from `sessionId` when caching is enabled (default: false)
  maxTokensField?: 'max_completion_tokens' | 'max_tokens';  // Which field name to use (default: max_completion_tokens)
  requiresToolResultName?: boolean;  // Whether tool results require the `name` field (default: false)
  requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
  requiresThinkingAsText?: boolean;  // Whether thinking blocks must be converted to text (default: false)
  requiresReasoningContentOnAssistantMessages?: boolean; // Whether all replayed assistant messages must include empty reasoning_content when reasoning is enabled (default: auto-detected for DeepSeek)
  thinkingFormat?: 'openai' | 'openrouter' | 'deepseek' | 'together' | 'zai' | 'qwen' | 'chat-template' | 'qwen-chat-template' | 'string-thinking' | 'ant-ling'; // Format for reasoning param: 'openai' uses reasoning_effort, 'openrouter' uses reasoning: { effort }, 'deepseek' uses thinking: { type } plus reasoning_effort when supported, 'together' uses reasoning: { enabled } plus reasoning_effort when supported, 'zai' uses thinking: { type }, 'qwen' uses enable_thinking, 'chat-template' uses configurable chat_template_kwargs, 'qwen-chat-template' uses chat_template_kwargs.enable_thinking and preserve_thinking, 'string-thinking' uses top-level thinking, 'ant-ling' uses reasoning: { effort } only for mapped efforts (default: openai)
  chatTemplateKwargs?: Record<string, string | number | boolean | null | { '$var': 'thinking.enabled' | 'thinking.effort'; omitWhenOff?: boolean }>; // chat_template_kwargs values; use $var for pi-controlled thinking values
  cacheControlFormat?: 'anthropic';  // Anthropic-style cache_control on system prompt, last tool, and last user/assistant text content
  openRouterRouting?: OpenRouterRouting; // OpenRouter routing preferences (default: {})
  vercelGatewayRouting?: VercelGatewayRouting; // Vercel AI Gateway routing preferences (default: {})
}

interface OpenAIResponsesCompat {
  supportsDeveloperRole?: boolean;   // Whether provider supports `developer` role vs `system` (default: true)
  sendSessionIdHeader?: boolean;     // Whether to send `session_id` from `sessionId` when caching is enabled (default: true)
  supportsLongCacheRetention?: boolean; // Whether provider supports `prompt_cache_retention: "24h"` (default: true)
}

If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. This is useful for:

LiteLLM proxies: May not support store field
Custom inference servers: May use non-standard field names
Self-hosted endpoints: May have different feature support

Faux Provider for Tests

fauxProvider() builds an in-memory provider with scripted responses for tests and demos:

import {
  createModels,
  fauxAssistantMessage,
  fauxProvider,
  fauxText,
  fauxThinking,
  fauxToolCall,
} from '@earendil-works/pi-ai';

const faux = fauxProvider({
  tokensPerSecond: 50 // optional
});

const models = createModels();
models.setProvider(faux.provider);

const model = faux.getModel();
const context = {
  messages: [{ role: 'user', content: 'Summarize package.json and then call echo', timestamp: Date.now() }]
};

faux.setResponses([
  fauxAssistantMessage([
    fauxThinking('Need to inspect package metadata first.'),
    fauxToolCall('echo', { text: 'package.json' })
  ], { stopReason: 'toolUse' })
]);

const first = await models.complete(model, context, {
  sessionId: 'session-1',
  cacheRetention: 'short'
});
context.messages.push(first);

context.messages.push({
  role: 'toolResult',
  toolCallId: first.content.find((block) => block.type === 'toolCall')!.id,
  toolName: 'echo',
  content: [{ type: 'text', text: 'package.json contents here' }],
  isError: false,
  timestamp: Date.now()
});

faux.setResponses([
  fauxAssistantMessage([
    fauxThinking('Now I can summarize the tool output.'),
    fauxText('Here is the summary.')
  ])
]);

const s = models.stream(model, context);
for await (const event of s) {
  console.log(event.type);
}

// Optional: multiple faux models for model-switching tests
const multiModel = fauxProvider({
  provider: 'faux-multi',
  models: [
    { id: 'faux-fast', reasoning: false },
    { id: 'faux-thinker', reasoning: true }
  ]
});
models.setProvider(multiModel.provider);
const thinker = multiModel.getModel('faux-thinker');

console.log(thinker?.reasoning);
console.log(faux.getPendingResponseCount());
console.log(faux.state.callCount);

Notes:

Responses are consumed from a queue in request start order.
If the queue is empty, the faux provider returns an assistant error message with errorMessage: "No more faux responses queued".
Use faux.setResponses([...]) to replace the remaining queue and faux.appendResponses([...]) to add more responses.
faux.models exposes all faux models. faux.getModel() returns the first one, and faux.getModel(id) returns a specific one.
Use fauxAssistantMessage(...) for scripted assistant replies. Use fauxText(...), fauxThinking(...), and fauxToolCall(...) to build content blocks without filling in low-level fields manually.
Usage is estimated at roughly 1 token per 4 characters. When sessionId is present and cacheRetention is not "none", prompt cache reads and writes are simulated automatically.
Tool call arguments stream incrementally via toolcall_delta chunks.
By default, each streamed chunk is emitted on its own microtask. Set tokensPerSecond to pace chunk delivery in real time.
The intended use is one deterministic scripted flow per handle. If you need independent concurrent flows, create separate faux providers with distinct provider ids.

Cross-Provider Handoffs

The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.

When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:

User and tool result messages are passed through unchanged
Assistant messages from the same provider/API are preserved as-is
Assistant messages from different providers have their thinking blocks converted to text with <thinking> tags
Tool calls and regular text are preserved unchanged

import { createModels, type Context } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { googleProvider } from '@earendil-works/pi-ai/providers/google';

const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openaiProvider());
models.setProvider(googleProvider());

const context: Context = { messages: [] };

// Start with Claude
const claude = models.getModel('anthropic', 'claude-sonnet-4-5')!;
context.messages.push({ role: 'user', content: 'What is 25 * 18?', timestamp: Date.now() });
context.messages.push(await models.completeSimple(claude, context, { reasoning: 'medium' }));

// Switch to GPT-5 - it will see Claude's thinking as <thinking> tagged text
const gpt5 = models.getModel('openai', 'gpt-5-mini')!;
context.messages.push({ role: 'user', content: 'Is that calculation correct?', timestamp: Date.now() });
context.messages.push(await models.complete(gpt5, context));

// Switch to Gemini
const gemini = models.getModel('google', 'gemini-2.5-flash')!;
context.messages.push({ role: 'user', content: 'What was the original question?', timestamp: Date.now() });
const geminiResponse = await models.complete(gemini, context);

All providers can handle messages from other providers — text, tool calls and results (including images), thinking blocks (transformed to tagged text), and aborted messages with partial content. This enables flexible workflows: start with a fast model, switch to a more capable one for complex reasoning, or maintain continuity across provider outages.

Context Serialization

The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:

const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [
    { role: 'user', content: 'What is TypeScript?', timestamp: Date.now() }
  ]
};

const model = models.getModel('openai', 'gpt-4o-mini')!;
const response = await models.complete(model, context);
context.messages.push(response);

// Serialize the entire context
const serialized = JSON.stringify(context);

// Save to database, localStorage, file, etc.
localStorage.setItem('conversation', serialized);

// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system', timestamp: Date.now() });

// Continue with any model
const newModel = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const continuation = await models.complete(newModel, restored);

Models are plain serializable data too — no functions or implementations attached — so persisting "which model was this conversation using" is a JSON.stringify away.

Note: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.

Browser Usage

The library supports browser environments. The core entrypoint and provider factories are side-effect free and bundle cleanly. Environment variables are not available in browsers, so pass API keys explicitly — or inject a CredentialStore (e.g. localStorage-backed) and let provider auth resolve from stored credentials:

import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';

const models = createModels();
models.setProvider(anthropicProvider());

const model = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const response = await models.complete(model, {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
}, {
  apiKey: 'your-api-key'
});

Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.

Browser compatibility notes:

Amazon Bedrock (bedrock-converse-stream) is not supported in browser environments. It can still appear in model lists; calls fail at runtime.
OAuth login flows are Node-only. They are lazy-loaded behind bundler-opaque imports, so registering an OAuth-capable provider does not pull Node-only code into a browser bundle — only actually logging in would.
Use a server-side proxy or backend service if you need Bedrock or OAuth-based auth from a web app.

Bundling and Tree Shaking

For small bundles, import only the providers you need:

import { createModels } from '@earendil-works/pi-ai';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';

const models = createModels();
models.setProvider(openaiProvider());

Rules:

@earendil-works/pi-ai is the core entrypoint and does not import built-in catalogs, provider factories, or SDK implementations.
@earendil-works/pi-ai/providers/<provider> imports that provider's catalog and lazy API wrapper only.
@earendil-works/pi-ai/providers/all imports every built-in provider factory and all catalogs. Use it only when you want the full built-in set.
With code splitting, provider SDKs stay in lazy chunks and load on first request.
Without code splitting, bundlers fold reachable lazy API implementations into the single bundle. A single-provider bundle then includes that provider's SDK; providers/all includes all statically visible SDKs. Bedrock is the exception: its AWS SDK implementation is loaded through a bundler-opaque Node-only import.
Importing @earendil-works/pi-ai/api/<api-id> directly loads that API implementation and its SDK immediately.

Avoid @earendil-works/pi-ai/compat in new bundled apps; it preserves the old global API and imports the full built-in catalog surface.

For single-file Node ESM bundles, some SDK dependencies may still use dynamic CommonJS require() internally. If you see errors such as Dynamic require of "child_process" is not supported, add a Node require shim to the bundle. With esbuild:

esbuild app.js --bundle --platform=node --format=esm \
  --banner:js='import { createRequire } from "module";const require = createRequire(import.meta.url);' \
  --outfile=app.bundle.js

This is only for Node bundles; it is not a browser or Cloudflare Workers workaround.

Bedrock is Node-only. Add it like any other provider:

import { createModels } from '@earendil-works/pi-ai';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';

const models = createModels();
models.setProvider(amazonBedrockProvider());

In normal Node package usage and code-split bundles, Bedrock loads its AWS SDK implementation lazily. For a standalone single-file bundle that must include Bedrock support, register the implementation module explicitly:

import { setBedrockProviderModule } from '@earendil-works/pi-ai/api/bedrock-converse-stream.lazy';
import { bedrockProviderModule } from '@earendil-works/pi-ai/bedrock-provider';

setBedrockProviderModule(bedrockProviderModule);

That explicit override bundles the AWS SDK. Without it, Bedrock's opaque runtime import expects the package's Bedrock implementation file to be available at runtime.

Provider-Scoped Environment Overrides

Pass env in stream options to scope provider configuration to a request. Values in env are used before process environment variables for provider auth and configuration such as Cloudflare account IDs, Azure OpenAI settings, Vertex project/location, Bedrock settings, PI_CACHE_RETENTION, and HTTP_PROXY/HTTPS_PROXY.

const models = builtinModels();
const model = models.getModel('cloudflare-ai-gateway', 'workers-ai/@cf/moonshotai/kimi-k2.6')!;

const response = await models.complete(model, context, {
  env: {
    CLOUDFLARE_API_KEY: '...',
    CLOUDFLARE_ACCOUNT_ID: 'account-id',
    CLOUDFLARE_GATEWAY_ID: 'gateway-id'
  }
});

Use this when one process needs different provider settings per request, or when ambient environment variables should not leak into a provider call.

OAuth Providers

Several providers support OAuth authentication instead of static API keys:

Anthropic (Claude Pro/Max subscription)
OpenAI Codex (ChatGPT Plus/Pro subscription, access to GPT-5.x Codex models)
GitHub Copilot (Copilot subscription)

Each of these providers carries an OAuthAuth on provider.auth.oauth with three operations: login(callbacks) runs the interactive flow and returns a credential, refresh(credential) exchanges the refresh token, and toAuth(credential) derives request auth (GitHub Copilot's per-account base URL comes from here). Refresh is automatic: models.getAuth() and the request paths refresh expired tokens under a credential-store lock, so concurrent requests and processes cannot double-refresh.

import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';

const models = createModels({ credentials: myStore }); // persistent CredentialStore
models.setProvider(anthropicProvider());

// Login: drive the flow with prompt()/notify() callbacks, persist the credential
const provider = models.getProvider('anthropic')!;
const credential = await provider.auth.oauth!.login({
  prompt: async (p) => {
    // p.type: 'text' | 'secret' | 'select' | 'manual_code'
    // manual_code prompts race a local callback server; p.signal aborts them when the server wins
    return await askUser(p.message);
  },
  notify: (event) => {
    // event.type: 'auth_url' | 'device_code' | 'progress'
    if (event.type === 'auth_url') console.log(`Open: ${event.url}`);
    if (event.type === 'device_code') console.log(`Code: ${event.userCode} at ${event.verificationUri}`);
    if (event.type === 'progress') console.log(event.message);
  },
});
await myStore.modify('anthropic', async () => credential);

// From here on, requests resolve and refresh the token automatically
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
await models.complete(model, context);

// Logout
await myStore.delete('anthropic');

Vertex AI

Vertex AI models support either a Google Cloud API key or Application Default Credentials (ADC):

API key: Set GOOGLE_CLOUD_API_KEY or pass apiKey in the call options.
Local development (ADC): Run gcloud auth application-default login
CI/Production (ADC): Set GOOGLE_APPLICATION_CREDENTIALS to point to a service account JSON key file

When using ADC, also set GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) and GOOGLE_CLOUD_LOCATION. You can also pass project/location in the call options. When using GOOGLE_CLOUD_API_KEY, project and location are not required.

# Local (uses your user credentials)
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"

# CI/Production (service account key file)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

Official docs: Application Default Credentials

CLI Login

The quickest way to authenticate:

npx @earendil-works/pi-ai login              # interactive provider selection
npx @earendil-works/pi-ai login anthropic    # login to specific provider
npx @earendil-works/pi-ai list               # list available providers

Credentials are saved to auth.json in the current directory.

Programmatic OAuth

The legacy flow functions remain available via the @earendil-works/pi-ai/oauth entry point (loginAnthropic, loginOpenAICodex, loginGitHubCopilot, refreshOAuthToken, getOAuthApiKey); credential storage is the caller's responsibility there. New code should prefer the provider-owned OAuthAuth shown above — it composes with the credential store and gets locked auto-refresh for free.

Provider notes:

OpenAI Codex: Requires a ChatGPT Plus or Pro subscription. Provides access to GPT-5.x Codex models with extended context windows and reasoning capabilities. The library automatically handles session-based prompt caching when sessionId is provided in stream options. You can set transport in stream options to "sse", "websocket", or "auto" for Codex Responses transport selection. When using WebSocket with a sessionId, connections are reused per session and expire after 5 minutes of inactivity.

Azure OpenAI (Responses): Uses the Responses API only. Set AZURE_OPENAI_API_KEY and either AZURE_OPENAI_BASE_URL or AZURE_OPENAI_RESOURCE_NAME. AZURE_OPENAI_BASE_URL supports both https://<resource>.openai.azure.com and https://<resource>.cognitiveservices.azure.com; root endpoints are normalized to .../openai/v1 automatically. Use AZURE_OPENAI_API_VERSION (defaults to v1) to override the API version if needed. Deployment names are treated as model IDs by default, override with azureDeploymentName or AZURE_OPENAI_DEPLOYMENT_NAME_MAP using comma-separated model-id=deployment pairs (for example gpt-4o-mini=my-deployment,gpt-4o=prod). Legacy deployment-based URLs are intentionally unsupported.

GitHub Copilot: If you get "The requested model is not supported" error, enable the model manually in VS Code: open Copilot Chat, click the model selector, select the model (warning icon), and click "Enable".

Migrating from the Old Global API

Older versions exposed a global API: stream()/complete() dispatching on model.api via a global registry, sync getModel()/getModels()/getProviders() catalog reads, registerApiProvider(), getEnvApiKey(), and per-API lazy stream functions. That surface lives unchanged on the compat entrypoint:

// Before
import { getModel, complete } from '@earendil-works/pi-ai';

// After (verbatim behavior, one import-path change)
import { getModel, complete } from '@earendil-works/pi-ai/compat';

Compat is a strict superset of the root entrypoint, so a file can switch its import path wholesale. It will be removed in a future release; migrate to createModels() + provider factories:

| Old | New | |-----|-----| | getModel('openai', 'gpt-4o-mini') | models.getModel('openai', 'gpt-4o-mini') or getBuiltinModel() from providers/all | | getModels('anthropic') / getProviders() | models.getModels('anthropic') / models.getProviders() or getBuiltin* | | stream(model, ctx, opts) (env-key injection) | models.stream(model, ctx, opts) (provider auth resolution) | | registerApiProvider({ api, stream, streamSimple }) | createProvider({ id, auth, models, api }) + models.setProvider() | | getEnvApiKey('openai') | await models.getAuth(model) | | streamAnthropic(model, ctx, opts) | stream from @earendil-works/pi-ai/api/anthropic-messages, or a provider in a collection | | registerFauxProvider() | fauxProvider() + models.setProvider() |

Development

Adding a New Provider

Adding a new LLM provider requires changes across multiple files. The layered layout: API implementations live in src/api/, provider factories in src/providers/, generated catalogs in src/providers/<id>.models.ts. This checklist covers all necessary steps:

1. Core Types (`src/types.ts`)

Add the API identifier to KnownApi (for example "bedrock-converse-stream"), if it is a new API
Add the provider name to KnownProvider (for example "amazon-bedrock")
Add the options type to ApiOptionsMap

2. API Implementation (`src/api/<api-id>.ts`, only for a new API)

Create a new API implementa

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@earendil-works/pi-ai

Table of Contents

Supported Providers

Installation

Quick Start

Providers and Models

Provider Factories

All Built-in Providers

Querying Models

Static Catalog Reads

Dynamic Providers

Auth

How Auth Resolves

Credential Store

Environment Variables

Tools

Defining Tools

Handling Tool Calls

Streaming Tool Calls with Partial JSON

Validating Tool Arguments

Complete Event Reference

Image Input

Image Generation

Basic Image Generation

Notes and Limitations

Thinking/Reasoning

Unified Interface (streamSimple/completeSimple)

Provider-Specific Options (stream/complete)

Streaming Thinking Content

Stop Reasons

Error Handling

Aborting Requests

Continuing After Abort

Debugging Provider Payloads

Custom Providers

createProvider()

Calling API Implementations Directly

OpenAI Compatibility Settings

Faux Provider for Tests

Cross-Provider Handoffs

Context Serialization

Browser Usage

Bundling and Tree Shaking

Provider-Scoped Environment Overrides

OAuth Providers

Vertex AI

CLI Login

Programmatic OAuth

Migrating from the Old Global API

Development

Adding a New Provider

1. Core Types (src/types.ts)

2. API Implementation (src/api/<api-id>.ts, only for a new API)

1. Core Types (`src/types.ts`)

2. API Implementation (`src/api/<api-id>.ts`, only for a new API)