@earendil-works/pi-ai
v0.80.2
Unified LLM API with automatic model discovery and provider configuration
@earendil-works/pi-ai
Unified LLM API with provider collections, automatic auth resolution, token and cost tracking, and simple context persistence and hand-off to other models mid-session.
Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.
Table of Contents
- Supported Providers
- Installation
- Quick Start
- Providers and Models
- Auth
- Tools
- Image Input
- Image Generation
- Thinking/Reasoning
- Stop Reasons
- Error Handling
- Custom Providers
- Faux Provider for Tests
- Cross-Provider Handoffs
- Context Serialization
- Browser Usage
- Bundling and Tree Shaking
- OAuth Providers
- Migrating from the Old Global API
- Development
- License
Supported Providers
- OpenAI
- Ant Ling
- Azure OpenAI (Responses)
- OpenAI Codex (ChatGPT Plus/Pro subscription, requires OAuth, see below)
- DeepSeek
- NVIDIA NIM
- Anthropic
- Google
- Vertex AI (Gemini via Vertex AI)
- Mistral
- Groq
- Cerebras
- Cloudflare AI Gateway
- Cloudflare Workers AI
- xAI
- OpenRouter
- Vercel AI Gateway
- ZAI Coding Plan (Global) (with separate China provider)
- MiniMax (with separate China provider)
- Together AI
- Hugging Face
- Moonshot AI (with separate China provider)
- GitHub Copilot (requires OAuth, see below)
- Amazon Bedrock
- OpenCode Zen
- OpenCode Go
- Fireworks (uses OpenAI- and Anthropic-compatible APIs)
- Kimi For Coding (Moonshot AI subscription endpoint, uses Anthropic-compatible API)
- Xiaomi MiMo (defaults to API billing endpoint, with separate Token Plan providers for cn/ams/sgp regions)
- Any OpenAI-compatible API: Ollama, vLLM, LM Studio, etc.
Installation
npm install @earendil-works/pi-ai
TypeBox exports are re-exported from @earendil-works/pi-ai: Type, Static, and TSchema.
Quick Start
You build a Models collection of providers and stream through it. The quickest start registers every built-in provider; apps that care about bundle size register individual providers instead (see Provider Factories and Bundling and Tree Shaking).
import { Type, type Context, type Tool } from '@earendil-works/pi-ai';
import { builtinModels } from '@earendil-works/pi-ai/providers/all';
// A Models collection with every built-in provider registered
const models = builtinModels();
// Sync lookup against the collection
const model = models.getModel('openai', 'gpt-4o-mini')!;
// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [{
name: 'get_time',
description: 'Get the current time',
parameters: Type.Object({
timezone: Type.Optional(Type.String({ description: 'Optional timezone (e.g., America/New_York)' }))
})
}];
// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [{ role: 'user', content: 'What time is it?', timestamp: Date.now() }],
tools
};
// Option 1: Streaming with all event types.
// Auth resolves through the provider (OPENAI_API_KEY from the environment here).
const s = models.stream(model, context);
for await (const event of s) {
switch (event.type) {
case 'start':
console.log(`Starting with ${event.partial.model}`);
break;
case 'text_start':
console.log('\n[Text started]');
break;
case 'text_delta':
process.stdout.write(event.delta);
break;
case 'text_end':
console.log('\n[Text ended]');
break;
case 'thinking_start':
console.log('[Model is thinking...]');
break;
case 'thinking_delta':
process.stdout.write(event.delta);
break;
case 'thinking_end':
console.log('[Thinking complete]');
break;
case 'toolcall_start':
console.log(`\n[Tool call started: index ${event.contentIndex}]`);
break;
case 'toolcall_delta': {
// Partial tool arguments are being streamed.
// The braces scope the const declaration to this case clause.
const partialCall = event.partial.content[event.contentIndex];
if (partialCall.type === 'toolCall') {
console.log(`[Streaming args for ${partialCall.name}]`);
}
break;
}
case 'toolcall_end':
console.log(`\nTool called: ${event.toolCall.name}`);
console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
break;
case 'done':
console.log(`\nFinished: ${event.reason}`);
break;
case 'error':
console.error(`Error: ${event.error.errorMessage}`);
break;
}
}
// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);
// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
const result = call.name === 'get_time'
? new Date().toLocaleString('en-US', {
timeZone: call.arguments.timezone || 'UTC',
dateStyle: 'full',
timeStyle: 'long'
})
: 'Unknown tool';
// Add tool result to context (supports text and images)
context.messages.push({
role: 'toolResult',
toolCallId: call.id,
toolName: call.name,
content: [{ type: 'text', text: result }],
isError: false,
timestamp: Date.now()
});
}
// Continue if there were tool calls
if (toolCalls.length > 0) {
const continuation = await models.complete(model, context);
context.messages.push(continuation);
console.log('After tool execution:', continuation.content);
}
console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);
// Option 2: Get complete response without streaming
const response = await models.complete(model, context);
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
} else if (block.type === 'toolCall') {
console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
}
}
Snippets in the rest of this README assume a models collection set up like this (with the relevant providers registered).
Providers and Models
A provider is the runtime unit: it owns its model catalog, its auth (API key resolution, OAuth flows), and its stream behavior. A Models collection holds providers and routes every request to the provider that owns the model.
Providers internally share API implementations (the wire protocols): Anthropic models use anthropic-messages, OpenAI uses openai-responses, while xAI, Groq, Cerebras, OpenRouter, and most others share openai-completions. Mixed-API providers (GitHub Copilot, OpenCode Zen) dispatch per model.
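The per-model dispatch that mixed-API providers perform can be pictured as a lookup keyed by the model's api field. The sketch below is illustrative only: ModelRef, ApiHandler, handlers, and dispatch are hypothetical stand-ins rather than the library's types, and the handlers return strings instead of opening streams.

```typescript
// Hypothetical stand-ins for illustration; not the library's real types.
type Api = 'anthropic-messages' | 'openai-responses' | 'openai-completions';

interface ModelRef {
  provider: string;
  id: string;
  api: Api;
}

type ApiHandler = (model: ModelRef, prompt: string) => string;

// One handler per wire protocol; a mixed-API provider registers several.
const handlers: Partial<Record<Api, ApiHandler>> = {
  'anthropic-messages': (model, prompt) => `[anthropic-messages] ${model.id}: ${prompt}`,
  'openai-responses': (model, prompt) => `[openai-responses] ${model.id}: ${prompt}`,
};

// Pick the implementation from the model's api field, not from the provider id.
function dispatch(model: ModelRef, prompt: string): string {
  const handler = handlers[model.api];
  if (!handler) {
    throw new Error(`No API implementation registered for ${model.api}`);
  }
  return handler(model, prompt);
}

// Two models owned by the same (hypothetical) provider, speaking different protocols.
const modelA: ModelRef = { provider: 'example-gateway', id: 'model-a', api: 'anthropic-messages' };
const modelB: ModelRef = { provider: 'example-gateway', id: 'model-b', api: 'openai-responses' };

console.log(dispatch(modelA, 'hello'));
console.log(dispatch(modelB, 'hello'));
```

Keying on the API rather than the provider is what lets one provider serve models that speak different wire protocols.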
Provider Factories
For apps that only need specific providers, there is one factory per built-in provider, each a subpath import that pulls only that provider's catalog:
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { openrouterProvider } from '@earendil-works/pi-ai/providers/openrouter';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';
// ...one module per provider in the Supported Providers list
const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openrouterProvider());
Provider factories import their model catalog and a lazy API wrapper. They do not import other providers. With bundler code splitting, SDK implementations (@anthropic-ai/sdk, openai, @google/genai, etc.) stay in lazy chunks loaded on the first request to a model of that API.
All Built-in Providers
For apps that want everything (as in Quick Start):
import { builtinModels } from '@earendil-works/pi-ai/providers/all';
const models = builtinModels(); // a Models collection with every built-in provider registered
This imports all catalogs and every built-in provider factory. It is the heavy, explicit entrypoint. builtinModels() accepts the same options as createModels() (credentials, authContext); builtinProviders() returns the provider array if you want to register them on your own collection.
Querying Models
Reads are synchronous and return the last-known lists:
const providers = models.getProviders(); // registered Provider objects
const provider = models.getProvider('anthropic'); // one provider
const all = models.getModels(); // every model across providers
const anthropicModels = models.getModels('anthropic');
const model = models.getModel('anthropic', 'claude-sonnet-4-5');
for (const m of anthropicModels) {
console.log(`${m.id}: ${m.name}`);
console.log(` API: ${m.api}`);
console.log(` Context: ${m.contextWindow} tokens`);
console.log(` Vision: ${m.input.includes('image')}`);
console.log(` Reasoning: ${m.reasoning}`);
}
Dynamically listed models are typed Model<Api>. Narrow with the hasApi() guard when you need API-specific option typing:
import { hasApi } from '@earendil-works/pi-ai';
const m = models.getModel('anthropic', 'claude-sonnet-4-5');
if (m && hasApi(m, 'anthropic-messages')) {
// m: Model<'anthropic-messages'> — stream options fully typed
models.stream(m, context, { thinkingEnabled: true, thinkingBudgetTokens: 2048 });
}
Static Catalog Reads
For tooling that wants the generated built-in catalog with full literal typing (provider and model IDs auto-complete), independent of any collection:
import { getBuiltinModel, getBuiltinModels, getBuiltinProviders } from '@earendil-works/pi-ai/providers/all';
const model = getBuiltinModel('openai', 'gpt-4o-mini'); // typed Model<'openai-responses'>
const providers = getBuiltinProviders();
const anthropic = getBuiltinModels('anthropic');
Dynamic Providers
Providers may have dynamic model lists (a llama.cpp server, a live OpenRouter listing). Reads stay sync; fetching is an explicit async verb:
// getModels() returns the last-known list (empty before the first refresh)
await models.refresh('llamacpp'); // fetch one provider's list; rejects on failure
await models.refresh(); // refresh all providers concurrently, best-effort
const fresh = models.getModel('llamacpp', 'qwen3-30b');
Static built-in providers are no-ops for refresh(). See createProvider() for building a dynamic provider.
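The split between sync reads and an explicit async refresh can be sketched without the library. Everything here (DynamicProvider, ModelListCache, fetchModelIds, refreshDemo) is a hypothetical stand-in, and the unknown-provider error is an assumption of the sketch. It only illustrates the described semantics: a single-provider refresh rejects on failure, a refresh of all providers is best-effort, and reads return the last-known list.

```typescript
// Hypothetical stand-in for a provider with a dynamic model list.
interface DynamicProvider {
  id: string;
  fetchModelIds: () => Promise<string[]>;
}

class ModelListCache {
  private readonly lists = new Map<string, string[]>();
  private readonly providers: DynamicProvider[];

  constructor(providers: DynamicProvider[]) {
    this.providers = providers;
  }

  // Sync read: the last-known list, empty before the first successful refresh.
  getModelIds(providerId: string): string[] {
    return this.lists.get(providerId) ?? [];
  }

  // With an id: refresh that provider and reject on failure.
  // Without an id: refresh all providers concurrently and swallow individual failures.
  async refresh(providerId?: string): Promise<void> {
    if (providerId !== undefined) {
      const provider = this.providers.find((p) => p.id === providerId);
      if (!provider) throw new Error(`Unknown provider: ${providerId}`);
      await this.refreshOne(provider);
      return;
    }
    await Promise.all(
      this.providers.map((p) => this.refreshOne(p).catch(() => undefined)),
    );
  }

  private async refreshOne(provider: DynamicProvider): Promise<void> {
    this.lists.set(provider.id, await provider.fetchModelIds());
  }
}

async function refreshDemo(): Promise<{ before: string[]; local: string[]; offline: string[] }> {
  const cache = new ModelListCache([
    { id: 'local', fetchModelIds: async () => ['model-a', 'model-b'] },
    { id: 'offline', fetchModelIds: async () => { throw new Error('connection refused'); } },
  ]);
  const before = cache.getModelIds('local'); // read before any refresh
  await cache.refresh();                     // best-effort: the offline provider does not reject
  return { before, local: cache.getModelIds('local'), offline: cache.getModelIds('offline') };
}

refreshDemo().then((result) => console.log(result));
```

A provider that fails during the collective refresh simply keeps its previous list, which is why callers who need to know about the failure refresh that provider by id.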
Auth
Every provider owns its auth: how API keys resolve (stored credentials, environment variables, ambient sources like AWS profiles or gcloud ADC) and, where supported, OAuth login/refresh flows.
How Auth Resolves
When you call models.stream(), the collection resolves auth through the owning provider and merges it into the request. Explicit per-request values always win:
// Resolved through the provider (env var, stored credential, OAuth token):
await models.complete(model, context);
// Explicit key wins over anything the provider would resolve:
await models.complete(model, context, { apiKey: 'sk-explicit' });
You can inspect resolution without making a request, which is useful for status UIs:
const auth = await models.getAuth(model);
if (auth) {
console.log(`configured via ${auth.source}`); // e.g. "ANTHROPIC_API_KEY", "OAuth", "stored credential"
} else {
console.log('not configured');
}
getAuth() resolves undefined for unconfigured providers and rejects with ModelsError when something is actually broken ("oauth": token refresh failed, credential preserved for re-login; "auth": key resolution or credential store failure). Request paths surface the same failures as stream errors.
Credential Store
Stored credentials (API keys entered interactively, OAuth tokens) live in a CredentialStore — one type-tagged credential per provider. pi-ai ships an in-memory default; apps inject persistent storage:
import { createModels, type CredentialStore } from '@earendil-works/pi-ai';
const models = createModels({ credentials: myFileBackedStore });
// builtinModels() takes the same options:
// const models = builtinModels({ credentials: myFileBackedStore });
The contract is small: read(providerId), modify(providerId, fn) (the only write path, a serialized read-modify-write), and delete(providerId). OAuth token refresh runs inside modify, so concurrent requests and processes cannot double-refresh a rotated token. A stored credential owns its provider: environment variables are only consulted when nothing is stored, and a failed refresh never silently falls back to an env key.
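To see why a serialized modify prevents double refreshes, here is a minimal in-memory sketch. The shapes are assumptions for illustration (StoredCredential, Modifier, SerializedMemoryStore, and the method signatures are hypothetical; the library's CredentialStore type may differ), and it serializes within one process only, whereas a persistent store would also need cross-process locking.

```typescript
// Hypothetical shapes for illustration; the library's CredentialStore types may differ.
interface StoredCredential {
  type: string; // discriminator tag
  value: string;
}

type Modifier = (
  current: StoredCredential | undefined,
) => StoredCredential | undefined | Promise<StoredCredential | undefined>;

class SerializedMemoryStore {
  private readonly entries = new Map<string, StoredCredential>();
  private readonly queues = new Map<string, Promise<unknown>>();

  async read(providerId: string): Promise<StoredCredential | undefined> {
    return this.entries.get(providerId);
  }

  // Each call waits for the previous modify() on the same provider to settle.
  modify(providerId: string, fn: Modifier): Promise<StoredCredential | undefined> {
    const previous: Promise<unknown> = this.queues.get(providerId) ?? Promise.resolve();
    const run = previous.then(async () => {
      const next = await fn(this.entries.get(providerId));
      if (next === undefined) this.entries.delete(providerId);
      else this.entries.set(providerId, next);
      return next;
    });
    // A failed modifier must not block later writers.
    this.queues.set(providerId, run.catch(() => undefined));
    return run;
  }

  async delete(providerId: string): Promise<void> {
    await this.modify(providerId, () => undefined);
  }
}

async function credentialDemo(): Promise<{ refreshCalls: number; value: string | undefined }> {
  const store = new SerializedMemoryStore();
  let refreshCalls = 0;
  const refreshIfStale: Modifier = async (current) => {
    // A caller that ran earlier in the queue may already have rotated the token.
    if (current !== undefined && current.value === 'fresh-token') return current;
    refreshCalls += 1;
    await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated network call
    return { type: 'example', value: 'fresh-token' };
  };
  await store.modify('example-provider', () => ({ type: 'example', value: 'stale-token' }));
  // Two concurrent requests both notice the stale token and try to refresh it.
  await Promise.all([
    store.modify('example-provider', refreshIfStale),
    store.modify('example-provider', refreshIfStale),
  ]);
  const final = await store.read('example-provider');
  return { refreshCalls, value: final?.value };
}

credentialDemo().then((result) => console.log(result));
```

Because the second modifier only runs after the first has written its result, it sees the rotated token and skips the refresh.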
API-key credentials use the same discriminator as pi's auth.json and can carry provider-scoped env/config values:
const credential = {
type: 'api_key',
key: '...',
env: {
CLOUDFLARE_ACCOUNT_ID: 'account-id',
CLOUDFLARE_GATEWAY_ID: 'gateway-id'
}
} as const;
Environment Variables
Built-in providers resolve these env vars (Node.js; in browsers pass apiKey explicitly):
| Provider | Environment Variable(s) |
|----------|------------------------|
| OpenAI | OPENAI_API_KEY |
| Ant Ling | ANT_LING_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL (e.g. https://{resource}.openai.azure.com) or AZURE_OPENAI_RESOURCE_NAME. Supports *.openai.azure.com and *.cognitiveservices.azure.com; root endpoints auto-normalize to /openai/v1. Optional: AZURE_OPENAI_API_VERSION (default v1), AZURE_OPENAI_DEPLOYMENT_NAME_MAP. |
| Anthropic | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN |
| DeepSeek | DEEPSEEK_API_KEY |
| NVIDIA NIM | NVIDIA_API_KEY |
| Google | GEMINI_API_KEY |
| Vertex AI | GOOGLE_CLOUD_API_KEY or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) + GOOGLE_CLOUD_LOCATION + ADC |
| Mistral | MISTRAL_API_KEY |
| Groq | GROQ_API_KEY |
| Cerebras | CEREBRAS_API_KEY |
| Cloudflare AI Gateway | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID |
| Cloudflare Workers AI | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID |
| xAI | XAI_API_KEY |
| Fireworks | FIREWORKS_API_KEY |
| Together AI | TOGETHER_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Vercel AI Gateway | AI_GATEWAY_API_KEY |
| ZAI Coding Plan (Global) | ZAI_API_KEY |
| ZAI Coding Plan (China) | ZAI_CODING_CN_API_KEY |
| MiniMax (Global) | MINIMAX_API_KEY |
| MiniMax (China) | MINIMAX_CN_API_KEY |
| Moonshot AI / Moonshot AI (China) | MOONSHOT_API_KEY |
| Hugging Face | HF_TOKEN |
| OpenCode Zen / OpenCode Go | OPENCODE_API_KEY |
| Kimi For Coding | KIMI_API_KEY |
| Xiaomi MiMo (API billing) | XIAOMI_API_KEY |
| Xiaomi MiMo Token Plan (China) | XIAOMI_TOKEN_PLAN_CN_API_KEY |
| Xiaomi MiMo Token Plan (Amsterdam) | XIAOMI_TOKEN_PLAN_AMS_API_KEY |
| Xiaomi MiMo Token Plan (Singapore) | XIAOMI_TOKEN_PLAN_SGP_API_KEY |
| GitHub Copilot | COPILOT_GITHUB_TOKEN |
Amazon Bedrock resolves ambient AWS credentials (AWS_PROFILE, access key pairs, AWS_BEARER_TOKEN_BEDROCK, ECS task roles, web identity tokens). Vertex AI resolves either an explicit key or gcloud Application Default Credentials plus project/location.
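The precedence rules from this section (an explicit per-request key wins, then a stored credential, then the first set environment variable) can be summarized in a small sketch. resolveKey, ResolveInput, and the EXAMPLE_* variable names are hypothetical; the sketch ignores OAuth and ambient sources, and it takes the environment as a parameter rather than reading process.env.

```typescript
// Hypothetical helper and option names; this only illustrates the precedence rules.
interface ResolvedKey {
  apiKey: string;
  source: string;
}

interface ResolveInput {
  explicitKey?: string;                    // per-request apiKey option
  storedKey?: string;                      // key held in the credential store
  envVars: string[];                       // candidate env var names, in priority order
  env: Record<string, string | undefined>; // e.g. process.env in Node.js
}

function resolveKey(input: ResolveInput): ResolvedKey | undefined {
  if (input.explicitKey) return { apiKey: input.explicitKey, source: 'explicit' };
  // A stored credential owns the provider: env vars are not consulted when one exists.
  if (input.storedKey) return { apiKey: input.storedKey, source: 'stored credential' };
  for (const name of input.envVars) {
    const value = input.env[name];
    if (value) return { apiKey: value, source: name };
  }
  return undefined; // not configured
}

const exampleEnv = { EXAMPLE_FALLBACK_KEY: 'env-key' };
const exampleVars = ['EXAMPLE_API_KEY', 'EXAMPLE_FALLBACK_KEY'];

console.log(resolveKey({ explicitKey: 'sk-explicit', storedKey: 'stored', envVars: exampleVars, env: exampleEnv }));
console.log(resolveKey({ storedKey: 'stored', envVars: exampleVars, env: exampleEnv }));
console.log(resolveKey({ envVars: exampleVars, env: exampleEnv }));
console.log(resolveKey({ envVars: exampleVars, env: {} }));
```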
Tools
Tools enable LLMs to interact with external systems. This library uses TypeBox schemas for type-safe tool definitions with automatic validation using TypeBox's built-in validator and value conversion utilities. TypeBox schemas can be serialized and deserialized as plain JSON, making them ideal for distributed systems.
Defining Tools
import { Type, type Tool, StringEnum } from '@earendil-works/pi-ai';
// Define tool parameters with TypeBox
const weatherTool: Tool = {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: Type.Object({
location: Type.String({ description: 'City name or coordinates' }),
units: StringEnum(['celsius', 'fahrenheit'], { default: 'celsius' })
})
};
// Note: For Google API compatibility, use StringEnum helper instead of Type.Enum
// Type.Enum generates anyOf/const patterns that Google doesn't support
const bookMeetingTool: Tool = {
name: 'book_meeting',
description: 'Schedule a meeting',
parameters: Type.Object({
title: Type.String({ minLength: 1 }),
startTime: Type.String({ format: 'date-time' }),
endTime: Type.String({ format: 'date-time' }),
attendees: Type.Array(Type.String({ format: 'email' }), { minItems: 1 })
})
};
Handling Tool Calls
Tool results use content blocks and can include both text and images:
import { readFileSync } from 'fs';
import { type Context } from '@earendil-works/pi-ai';
const context: Context = {
messages: [{ role: 'user', content: 'What is the weather in London?', timestamp: Date.now() }],
tools: [weatherTool]
};
const response = await models.complete(model, context);
// Check for tool calls in the response
for (const block of response.content) {
if (block.type === 'toolCall') {
// Execute your tool with the arguments
// See "Validating Tool Arguments" section for validation
const result = await executeWeatherApi(block.arguments);
// Add tool result with text content
context.messages.push({
role: 'toolResult',
toolCallId: block.id,
toolName: block.name,
content: [{ type: 'text', text: JSON.stringify(result) }],
isError: false,
timestamp: Date.now()
});
}
}
// Tool results can also include images (for vision-capable models)
const imageBuffer = readFileSync('chart.png');
context.messages.push({
role: 'toolResult',
toolCallId: 'tool_xyz',
toolName: 'generate_chart',
content: [
{ type: 'text', text: 'Generated chart showing temperature trends' },
{ type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
],
isError: false,
timestamp: Date.now()
});
Streaming Tool Calls with Partial JSON
During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available:
const s = models.stream(model, context);
for await (const event of s) {
if (event.type === 'toolcall_delta') {
const toolCall = event.partial.content[event.contentIndex];
// toolCall.arguments contains partially parsed JSON during streaming
// This allows for progressive UI updates
if (toolCall.type === 'toolCall' && toolCall.arguments) {
// BE DEFENSIVE: arguments may be incomplete
// Example: Show file path being written even before content is complete
if (toolCall.name === 'write_file' && toolCall.arguments.path) {
console.log(`Writing to: ${toolCall.arguments.path}`);
// Content might be partial or missing
if (toolCall.arguments.content) {
console.log(`Content preview: ${toolCall.arguments.content.substring(0, 100)}...`);
}
}
}
}
if (event.type === 'toolcall_end') {
// Here toolCall.arguments is complete (but not yet validated)
const toolCall = event.toolCall;
console.log(`Tool completed: ${toolCall.name}`, toolCall.arguments);
}
}
Important notes about partial tool arguments:
- During toolcall_delta events, arguments contains the best-effort parse of partial JSON
- Fields may be missing or incomplete - always check for existence before use
- String values may be truncated mid-word
- Arrays may be incomplete
- Nested objects may be partially populated
- At minimum, arguments will be an empty object {}, never undefined
- The Google provider does not support function call streaming. Instead, you will receive a single toolcall_delta event with the full arguments.
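One way to follow the notes above is to funnel every read of partial arguments through a small defensive accessor. This is a sketch, not part of the library: PartialArgs and previewString are hypothetical names, and the snapshots imitate what a write_file call might expose while its JSON streams in.

```typescript
// Hypothetical helper; PartialArgs stands in for the best-effort parse seen during toolcall_delta.
type PartialArgs = Record<string, unknown>;

// Returns a display-safe preview, or undefined while the field is missing or not yet a string.
function previewString(args: PartialArgs, key: string, maxLength = 100): string | undefined {
  const value = args[key];
  if (typeof value !== 'string') return undefined;
  return value.length > maxLength ? `${value.slice(0, maxLength)}...` : value;
}

// Successive snapshots of the same tool call as more JSON arrives.
const snapshots: PartialArgs[] = [
  {},                                              // nothing parsed yet
  { path: '/tmp/no' },                             // string cut mid-value
  { path: '/tmp/notes.txt' },
  { path: '/tmp/notes.txt', content: 'Hello, wor' },
];

for (const args of snapshots) {
  const path = previewString(args, 'path') ?? '(path pending)';
  const content = previewString(args, 'content') ?? '(content pending)';
  console.log(`${path} | ${content}`);
}
```

Treat anything read this way as display-only; act on the arguments only after toolcall_end and validation.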
Validating Tool Arguments
When implementing your own tool execution loop, use validateToolCall to validate arguments before passing them to your tools:
import { validateToolCall, type Tool } from '@earendil-works/pi-ai';
const tools: Tool[] = [weatherTool, calculatorTool];
// Reuse the conversation context so tool results pushed below reach the next request
const s = models.stream(model, { ...context, tools });
for await (const event of s) {
if (event.type === 'toolcall_end') {
const toolCall = event.toolCall;
try {
// Validate arguments against the tool's schema (throws on invalid args)
const validatedArgs = validateToolCall(tools, toolCall);
const result = await executeMyTool(toolCall.name, validatedArgs);
// ... add tool result to context
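} catch (error) {
// Validation failed - return error as tool result so model can retry
// The caught value is typed unknown, so narrow it before reading the message
const message = error instanceof Error ? error.message : String(error);
context.messages.push({
role: 'toolResult',
toolCallId: toolCall.id,
toolName: toolCall.name,
content: [{ type: 'text', text: message }],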
isError: true,
timestamp: Date.now()
});
}
}
}
Complete Event Reference
All streaming events emitted during assistant message generation:
| Event Type | Description | Key Properties |
|------------|-------------|----------------|
| start | Stream begins | partial: Initial assistant message structure |
| text_start | Text block starts | contentIndex: Position in content array |
| text_delta | Text chunk received | delta: New text, contentIndex: Position |
| text_end | Text block complete | content: Full text, contentIndex: Position |
| thinking_start | Thinking block starts | contentIndex: Position in content array |
| thinking_delta | Thinking chunk received | delta: New text, contentIndex: Position |
| thinking_end | Thinking block complete | content: Full thinking, contentIndex: Position |
| toolcall_start | Tool call begins | contentIndex: Position in content array |
| toolcall_delta | Tool arguments streaming | delta: JSON chunk, partial.content[contentIndex].arguments: Partial parsed args |
| toolcall_end | Tool call complete | toolCall: Complete validated tool call with id, name, arguments |
| done | Stream complete | reason: Stop reason ("stop", "length", "toolUse"), message: Final assistant message |
| error | Error occurred | reason: Error type ("error" or "aborted"), error: AssistantMessage with partial content |
Streaming events for different content blocks are not guaranteed to be contiguous. Providers may emit deltas for text, thinking, and tool calls in the same upstream chunk, and pi may surface corresponding events interleaved, for example text_start, text_delta, toolcall_start, text_delta, toolcall_delta. Consumers must use contentIndex to associate each delta/end event with its block and must not assume that a block's *_start/*_delta/*_end sequence is uninterrupted by events for other blocks.
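A consumer that honors this rule keeps one buffer per contentIndex instead of a single "current block". The sketch below uses trimmed-down, hypothetical event shapes (StreamEvent, BlockBuffer, and collectBlocks are not the library's types) and replays an interleaving like the one described above.

```typescript
// Hypothetical, trimmed-down event shapes; the real events carry more fields.
type StreamEvent =
  | { type: 'text_start' | 'toolcall_start'; contentIndex: number }
  | { type: 'text_delta' | 'toolcall_delta'; contentIndex: number; delta: string }
  | { type: 'text_end' | 'toolcall_end'; contentIndex: number };

interface BlockBuffer {
  kind: 'text' | 'toolCall';
  raw: string;   // concatenated deltas for this block
  done: boolean; // set once the matching *_end event arrives
}

// Route every event to its own buffer by contentIndex, never by arrival order.
function collectBlocks(events: Iterable<StreamEvent>): Map<number, BlockBuffer> {
  const blocks = new Map<number, BlockBuffer>();
  for (const event of events) {
    switch (event.type) {
      case 'text_start':
        blocks.set(event.contentIndex, { kind: 'text', raw: '', done: false });
        break;
      case 'toolcall_start':
        blocks.set(event.contentIndex, { kind: 'toolCall', raw: '', done: false });
        break;
      case 'text_delta':
      case 'toolcall_delta': {
        const block = blocks.get(event.contentIndex);
        if (block) block.raw += event.delta;
        break;
      }
      case 'text_end':
      case 'toolcall_end': {
        const block = blocks.get(event.contentIndex);
        if (block) block.done = true;
        break;
      }
    }
  }
  return blocks;
}

// Text and tool-call events interleaved, as a provider may emit them.
const interleaved: StreamEvent[] = [
  { type: 'text_start', contentIndex: 0 },
  { type: 'text_delta', contentIndex: 0, delta: 'Checking ' },
  { type: 'toolcall_start', contentIndex: 1 },
  { type: 'text_delta', contentIndex: 0, delta: 'the time.' },
  { type: 'toolcall_delta', contentIndex: 1, delta: '{"timezone":"UTC"}' },
  { type: 'text_end', contentIndex: 0 },
  { type: 'toolcall_end', contentIndex: 1 },
];

const blocks = collectBlocks(interleaved);
console.log(blocks.get(0)?.raw); // Checking the time.
console.log(blocks.get(1)?.raw); // {"timezone":"UTC"}
```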
Image Input
Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.
import { readFileSync } from 'fs';
const model = models.getModel('openai', 'gpt-4o-mini')!;
// Check if model supports images
if (model.input.includes('image')) {
console.log('Model supports vision');
}
const imageBuffer = readFileSync('image.png');
const base64Image = imageBuffer.toString('base64');
const response = await models.complete(model, {
messages: [{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image', data: base64Image, mimeType: 'image/png' }
],
timestamp: Date.now()
}]
});
// Access the response
for (const block of response.content) {
if (block.type === 'text') {
console.log(block.text);
}
}
Image Generation
Image generation uses a separate API surface from text/chat generation, mirroring the chat-side design: an ImagesModels collection holds ImagesProviders, reads are sync, and auth resolves through the owning provider. Image generation is a one-shot API: generateImages() waits for the provider response and returns the final AssistantImages result — do not use the chat/stream APIs for it.
Basic Image Generation
import { builtinImagesModels } from '@earendil-works/pi-ai/providers/all';
// Every built-in image-generation provider; accepts the same options as createModels()
const imagesModels = builtinImagesModels();
const model = imagesModels.getModel('openrouter', 'google/gemini-2.5-flash-image')!;
// Auth resolves through the provider (OPENROUTER_API_KEY here); explicit apiKey wins
const result = await imagesModels.generateImages(model, {
input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
});
for (const block of result.output) {
if (block.type === 'text') {
console.log(block.text);
} else if (block.type === 'image') {
console.log(block.mimeType);
console.log(block.data.substring(0, 32));
}
}
Like the chat side, you can build the collection from parts: createImagesModels({ credentials?, authContext? }), the openrouterImagesProvider() factory from @earendil-works/pi-ai/providers/openrouter-images, and createImagesProvider({ id, auth, models, refreshModels?, api }) for custom image providers (with imagesModels.refresh(provider?) for dynamic lists). Failures never reject; they return an AssistantImages with stopReason: "error". The collection's getAuth(model) works exactly like the chat-side one.
The old global API (getImageModel() / getImageModels() / getImageProviders() / generateImages()) remains available on the compat entrypoint:
import { getImageModel, generateImages } from '@earendil-works/pi-ai/compat';
const model = getImageModel('openrouter', 'google/gemini-2.5-flash-image');
const result = await generateImages(model, {
input: [{ type: 'text', text: 'Generate a red circle on a plain white background.' }]
}, {
apiKey: process.env.OPENROUTER_API_KEY
});
Some models also support image input:
import { readFileSync } from 'fs';
const imageBuffer = readFileSync('input.png');
const result = await imagesModels.generateImages(model, {
input: [
{ type: 'text', text: 'Create a variation of this image with a blue background.' },
{ type: 'image', data: imageBuffer.toString('base64'), mimeType: 'image/png' }
]
});
Check capabilities on the model metadata:
console.log(model.input); // ['text', 'image']
console.log(model.output); // ['image'] or ['image', 'text']
Notes and Limitations
- Image models live in ImagesModels collections, chat models in Models collections; the two are separate surfaces.
- Use generateImages(), not the chat/stream APIs.
- Image-generation models do not participate in tool calling.
- Outputs are returned in AssistantImages.output and can include both base64-encoded ImageContent blocks and TextContent blocks.
- Some models return only images, others return images plus text. Check model.output.
- Some models accept image input, others are text-to-image only. Check model.input.
- Like the streaming APIs, image generation supports options such as apiKey, signal, headers, onPayload, and onResponse, and results may include stopReason, responseId, and usage.
- If you want a model to analyze images in a conversation or call tools, use the regular chat APIs with a model that supports image input.
- At the moment, image generation is available through only one provider, OpenRouter.
Thinking/Reasoning
Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.
Unified Interface (streamSimple/completeSimple)
// Many models across providers support thinking/reasoning
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
// or models.getModel('openai', 'gpt-5-mini');
// or models.getModel('google', 'gemini-2.5-flash');
// or models.getModel('xai', 'grok-code-fast-1');
// Check if model supports reasoning
if (model.reasoning) {
console.log('Model supports reasoning/thinking');
}
// Use the simplified reasoning option
const response = await models.completeSimple(model, {
messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13', timestamp: Date.now() }]
}, {
reasoning: 'medium' // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
});
// Access thinking and text blocks
for (const block of response.content) {
if (block.type === 'thinking') {
console.log('Thinking:', block.thinking);
} else if (block.type === 'text') {
console.log('Response:', block.text);
}
}
Provider-Specific Options (stream/complete)
models.stream()/complete() accept the owning API's full option set. Use hasApi() to narrow a dynamically looked-up model to its API for full option typing:
import { hasApi } from '@earendil-works/pi-ai';
// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = models.getModel('openai', 'gpt-5-mini')!;
if (hasApi(openaiModel, 'openai-responses')) {
await models.complete(openaiModel, context, {
reasoningEffort: 'medium',
reasoningSummary: 'detailed' // OpenAI Responses API only
});
}
// Anthropic Thinking
const anthropicModel = models.getModel('anthropic', 'claude-sonnet-4-5')!;
if (hasApi(anthropicModel, 'anthropic-messages')) {
await models.complete(anthropicModel, context, {
thinkingEnabled: true,
thinkingBudgetTokens: 8192 // Optional token limit
});
}
// Google Gemini Thinking
const googleModel = models.getModel('google', 'gemini-2.5-flash')!;
if (hasApi(googleModel, 'google-generative-ai')) {
await models.complete(googleModel, context, {
thinking: {
enabled: true,
budgetTokens: 8192 // -1 for dynamic, 0 to disable
}
});
}
Streaming Thinking Content
When streaming, thinking content is delivered through specific events:
const s = models.streamSimple(model, context, { reasoning: 'high' });
for await (const event of s) {
switch (event.type) {
case 'thinking_start':
console.log('[Model started thinking]');
break;
case 'thinking_delta':
process.stdout.write(event.delta); // Stream thinking content
break;
case 'thinking_end':
console.log('\n[Thinking complete]');
break;
}
}
Stop Reasons
Every AssistantMessage includes a stopReason field that indicates how the generation ended:
- "stop" - Normal completion, the model finished its response
- "length" - Output hit the maximum token limit
- "toolUse" - Model is calling tools and expects tool results
- "error" - An error occurred during generation
- "aborted" - Request was cancelled via abort signal
AssistantMessage may also include responseId, a provider-specific upstream response or message identifier when the underlying API exposes one. Do not assume it is always present across providers.
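An agent loop typically branches on stopReason to decide what happens next. The sketch below is one possible mapping, not library behavior: StopReason mirrors the five values listed above, while NextStep and nextStep are hypothetical names. The never check makes the compiler flag any stop reason that is left unhandled.

```typescript
// Mirrors the stop reasons listed above.
type StopReason = 'stop' | 'length' | 'toolUse' | 'error' | 'aborted';

// Hypothetical follow-up actions for an application's own loop.
type NextStep = 'finish' | 'continue-or-truncate' | 'run-tools' | 'report-error' | 'keep-partial';

function nextStep(stopReason: StopReason): NextStep {
  switch (stopReason) {
    case 'stop':
      return 'finish';               // the model finished its response
    case 'length':
      return 'continue-or-truncate'; // output hit the token limit
    case 'toolUse':
      return 'run-tools';            // execute tools, push toolResult messages, call again
    case 'error':
      return 'report-error';         // details are in errorMessage
    case 'aborted':
      return 'keep-partial';         // partial content can stay in the context
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a reason is missed.
      const unreachable: never = stopReason;
      throw new Error(`Unhandled stop reason: ${String(unreachable)}`);
    }
  }
}

console.log(nextStep('toolUse'));
console.log(nextStep('aborted'));
```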
Error Handling
Request failures never throw out of the stream functions: when a request ends with an error (including aborts and tool call validation errors), the streaming API emits an error event and the final message carries the details:
// In streaming
for await (const event of s) {
if (event.type === 'error') {
// event.reason is either "error" or "aborted"
// event.error is the AssistantMessage with partial content
console.error(`Error (${event.reason}):`, event.error.errorMessage);
console.log('Partial content:', event.error.content);
}
}
// The final message will have the error details
const message = await s.result();
if (message.stopReason === 'error' || message.stopReason === 'aborted') {
console.error('Request failed:', message.errorMessage);
// message.content contains any partial content received before the error
// message.usage contains partial token counts and costs
}
Auth failures (no key configured, OAuth refresh failed, unknown provider) surface the same way: as a stream error with stopReason: "error".
Aborting Requests
The abort signal allows you to cancel in-progress requests. Aborted requests have stopReason === 'aborted':
const controller = new AbortController();
// Abort after 2 seconds
setTimeout(() => controller.abort(), 2000);
const s = models.stream(model, {
messages: [{ role: 'user', content: 'Write a long story', timestamp: Date.now() }]
}, {
signal: controller.signal
});
for await (const event of s) {
if (event.type === 'text_delta') {
process.stdout.write(event.delta);
} else if (event.type === 'error') {
// event.reason tells you if it was "error" or "aborted"
console.log(`${event.reason === 'aborted' ? 'Aborted' : 'Error'}:`, event.error.errorMessage);
}
}
// Get results (may be partial if aborted)
const response = await s.result();
if (response.stopReason === 'aborted') {
console.log('Request was aborted:', response.errorMessage);
console.log('Partial content received:', response.content);
console.log('Tokens used:', response.usage);
}
Continuing After Abort
Aborted messages can be added to the conversation context and continued in subsequent requests:
// Annotate as Context so assistant messages can be pushed onto messages later
const context: Context = {
messages: [
{ role: 'user', content: 'Explain quantum computing in detail', timestamp: Date.now() }
]
};
// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);
const partial = await models.complete(model, context, { signal: controller1.signal });
// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: 'user', content: 'Please continue', timestamp: Date.now() });
// Continue the conversation
const continuation = await models.complete(model, context);
Debugging Provider Payloads
Use the onPayload callback to inspect the request payload sent to the provider. This is useful for debugging request formatting issues or provider validation errors.
const response = await models.complete(model, context, {
onPayload: (payload) => {
console.log('Provider payload:', JSON.stringify(payload, null, 2));
}
});The callback is supported by stream, complete, streamSimple, and completeSimple.
Custom Providers
createProvider()
createProvider() builds a provider from parts: identity, auth, a model list, and an API implementation. Use it for local inference servers, proxies, or any OpenAI/Anthropic-compatible endpoint:
import { createModels, createProvider, envApiKeyAuth, type Model } from '@earendil-works/pi-ai';
import { openAICompletionsApi } from '@earendil-works/pi-ai/api/openai-completions.lazy';
const ollamaModel: Model<'openai-completions'> = {
id: 'llama-3.1-8b',
name: 'Llama 3.1 8B (Ollama)',
api: 'openai-completions',
provider: 'ollama',
baseUrl: 'http://localhost:11434/v1',
reasoning: false,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 128000,
maxTokens: 32000
};
const ollama = createProvider({
id: 'ollama',
name: 'Ollama',
baseUrl: 'http://localhost:11434/v1',
// Every provider declares auth; keyless local servers resolve as configured with no key.
auth: { apiKey: { name: 'Ollama', resolve: async () => ({ auth: {} }) } },
models: [ollamaModel],
api: openAICompletionsApi(),
});
const models = createModels();
models.setProvider(ollama);
await models.complete(models.getModel('ollama', 'llama-3.1-8b')!, context);For providers with real keys, envApiKeyAuth(displayName, envVars) gives the standard behavior (stored credential wins, then the first set env var):
const proxy = createProvider({
id: 'my-proxy',
auth: { apiKey: envApiKeyAuth('My proxy API key', ['MY_PROXY_API_KEY']) },
models: [/* ... */],
api: openAICompletionsApi(),
});Mixed-API providers pass a map keyed by model.api; each model dispatches to its API's implementation:
import { anthropicMessagesApi } from '@earendil-works/pi-ai/api/anthropic-messages.lazy';
import { openAIResponsesApi } from '@earendil-works/pi-ai/api/openai-responses.lazy';
const gateway = createProvider({
id: 'my-gateway',
auth: { apiKey: envApiKeyAuth('Gateway key', ['GATEWAY_API_KEY']) },
models: [/* models with api: 'anthropic-messages' or 'openai-responses' */],
api: {
'anthropic-messages': anthropicMessagesApi(),
'openai-responses': openAIResponsesApi(),
},
});Dynamic model lists use refreshModels; the provider lists empty until the first models.refresh():
const llamacpp = createProvider({
id: 'llamacpp',
auth: { apiKey: { name: 'llama.cpp', resolve: async () => ({ auth: {} }) } },
models: [],
refreshModels: async () => fetchModelsFromServer('http://localhost:8080'),
api: openAICompletionsApi(),
});
models.setProvider(llamacpp);
await models.refresh('llamacpp');Custom models can carry headers (e.g. proxies behind bot detection) and compat flags — see OpenAI Compatibility Settings.
Some OpenAI-compatible servers do not understand the developer role used for reasoning-capable models. For those providers, set compat.supportsDeveloperRole to false so the system prompt is sent as a system message instead. If the server also does not support reasoning_effort, set compat.supportsReasoningEffort to false too. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.
Use model-level thinkingLevelMap to describe model-specific thinking controls. Keys are pi thinking levels (off, minimal, low, medium, high, xhigh). Missing keys use provider defaults, string values are sent to the provider, and null marks a level unsupported.
const ollamaReasoningModel: Model<'openai-completions'> = {
id: 'gpt-oss:20b',
name: 'GPT-OSS 20B (Ollama)',
api: 'openai-completions',
provider: 'ollama',
baseUrl: 'http://localhost:11434/v1',
reasoning: true,
input: ['text'],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 131072,
maxTokens: 32000,
thinkingLevelMap: {
minimal: null,
low: null,
medium: null,
high: 'high',
xhigh: null,
},
compat: {
supportsDeveloperRole: false,
supportsReasoningEffort: false,
}
};Calling API Implementations Directly
The API implementations are importable on their own. Each module exports exactly stream and streamSimple with that API's full option typing. Direct calls bypass provider auth — pass apiKey explicitly:
import { stream } from '@earendil-works/pi-ai/api/anthropic-messages';
const s = stream(claudeModel, context, {
apiKey: process.env.ANTHROPIC_API_KEY,
thinkingEnabled: true,
thinkingBudgetTokens: 2048,
});Built-in API implementations live under ./api/<api-id>:
| API id | Options type |
|--------|--------------|
| anthropic-messages | AnthropicOptions |
| openai-completions | OpenAICompletionsOptions |
| openai-responses | OpenAIResponsesOptions |
| openai-codex-responses | OpenAICodexResponsesOptions |
| azure-openai-responses | AzureOpenAIResponsesOptions |
| google-generative-ai | GoogleOptions |
| google-vertex | GoogleVertexOptions |
| mistral-conversations | MistralOptions |
| bedrock-converse-stream | BedrockOptions |
Importing an implementation module loads its SDK. The ./api/<id>.lazy wrappers (used by the provider factories) defer that load to the first request when the runtime or bundler supports dynamic import chunking. Legacy raw API subpaths from older releases (./anthropic, ./google, ./mistral, ./openai-completions, ...) were removed; use @earendil-works/pi-ai/api/<api-id>.
OpenAI Compatibility Settings
The openai-completions API is implemented by many providers with minor differences. By default, the library auto-detects compatibility settings based on baseUrl for a small set of known OpenAI-compatible providers (Cerebras, xAI, Chutes, DeepSeek, NVIDIA NIM, Together AI, zAi, OpenCode, Cloudflare Workers AI, etc.). For custom proxies or unknown endpoints, you can override these settings via the compat field. For openai-responses models, the compat field supports Responses-specific flags.
interface OpenAICompletionsCompat {
supportsStore?: boolean; // Whether provider supports the `store` field (default: true)
supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
supportsUsageInStreaming?: boolean; // Whether provider supports `stream_options: { include_usage: true }` (default: true)
supportsStrictMode?: boolean; // Whether provider supports `strict` in tool definitions (default: true)
sendSessionAffinityHeaders?: boolean; // Whether to send `session_id`, `x-client-request-id`, and `x-session-affinity` from `sessionId` when caching is enabled (default: false)
maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
requiresToolResultName?: boolean; // Whether tool results require the `name` field (default: false)
requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
requiresThinkingAsText?: boolean; // Whether thinking blocks must be converted to text (default: false)
requiresReasoningContentOnAssistantMessages?: boolean; // Whether all replayed assistant messages must include empty reasoning_content when reasoning is enabled (default: auto-detected for DeepSeek)
thinkingFormat?: 'openai' | 'openrouter' | 'deepseek' | 'together' | 'zai' | 'qwen' | 'chat-template' | 'qwen-chat-template' | 'string-thinking' | 'ant-ling'; // Format for reasoning param: 'openai' uses reasoning_effort, 'openrouter' uses reasoning: { effort }, 'deepseek' uses thinking: { type } plus reasoning_effort when supported, 'together' uses reasoning: { enabled } plus reasoning_effort when supported, 'zai' uses thinking: { type }, 'qwen' uses enable_thinking, 'chat-template' uses configurable chat_template_kwargs, 'qwen-chat-template' uses chat_template_kwargs.enable_thinking and preserve_thinking, 'string-thinking' uses top-level thinking, 'ant-ling' uses reasoning: { effort } only for mapped efforts (default: openai)
chatTemplateKwargs?: Record<string, string | number | boolean | null | { '$var': 'thinking.enabled' | 'thinking.effort'; omitWhenOff?: boolean }>; // chat_template_kwargs values; use $var for pi-controlled thinking values
cacheControlFormat?: 'anthropic'; // Anthropic-style cache_control on system prompt, last tool, and last user/assistant text content
openRouterRouting?: OpenRouterRouting; // OpenRouter routing preferences (default: {})
vercelGatewayRouting?: VercelGatewayRouting; // Vercel AI Gateway routing preferences (default: {})
}
interface OpenAIResponsesCompat {
supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
sendSessionIdHeader?: boolean; // Whether to send `session_id` from `sessionId` when caching is enabled (default: true)
supportsLongCacheRetention?: boolean; // Whether provider supports `prompt_cache_retention: "24h"` (default: true)
}If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. This is useful for:
- LiteLLM proxies: May not support
storefield - Custom inference servers: May use non-standard field names
- Self-hosted endpoints: May have different feature support
Faux Provider for Tests
fauxProvider() builds an in-memory provider with scripted responses for tests and demos:
import {
createModels,
fauxAssistantMessage,
fauxProvider,
fauxText,
fauxThinking,
fauxToolCall,
} from '@earendil-works/pi-ai';
const faux = fauxProvider({
tokensPerSecond: 50 // optional
});
const models = createModels();
models.setProvider(faux.provider);
const model = faux.getModel();
const context = {
messages: [{ role: 'user', content: 'Summarize package.json and then call echo', timestamp: Date.now() }]
};
faux.setResponses([
fauxAssistantMessage([
fauxThinking('Need to inspect package metadata first.'),
fauxToolCall('echo', { text: 'package.json' })
], { stopReason: 'toolUse' })
]);
const first = await models.complete(model, context, {
sessionId: 'session-1',
cacheRetention: 'short'
});
context.messages.push(first);
context.messages.push({
role: 'toolResult',
toolCallId: first.content.find((block) => block.type === 'toolCall')!.id,
toolName: 'echo',
content: [{ type: 'text', text: 'package.json contents here' }],
isError: false,
timestamp: Date.now()
});
faux.setResponses([
fauxAssistantMessage([
fauxThinking('Now I can summarize the tool output.'),
fauxText('Here is the summary.')
])
]);
const s = models.stream(model, context);
for await (const event of s) {
console.log(event.type);
}
// Optional: multiple faux models for model-switching tests
const multiModel = fauxProvider({
provider: 'faux-multi',
models: [
{ id: 'faux-fast', reasoning: false },
{ id: 'faux-thinker', reasoning: true }
]
});
models.setProvider(multiModel.provider);
const thinker = multiModel.getModel('faux-thinker');
console.log(thinker?.reasoning);
console.log(faux.getPendingResponseCount());
console.log(faux.state.callCount);Notes:
- Responses are consumed from a queue in request start order.
- If the queue is empty, the faux provider returns an assistant error message with
errorMessage: "No more faux responses queued". - Use
faux.setResponses([...])to replace the remaining queue andfaux.appendResponses([...])to add more responses. faux.modelsexposes all faux models.faux.getModel()returns the first one, andfaux.getModel(id)returns a specific one.- Use
fauxAssistantMessage(...)for scripted assistant replies. UsefauxText(...),fauxThinking(...), andfauxToolCall(...)to build content blocks without filling in low-level fields manually. - Usage is estimated at roughly 1 token per 4 characters. When
sessionIdis present andcacheRetentionis not"none", prompt cache reads and writes are simulated automatically. - Tool call arguments stream incrementally via
toolcall_deltachunks. - By default, each streamed chunk is emitted on its own microtask. Set
tokensPerSecondto pace chunk delivery in real time. - The intended use is one deterministic scripted flow per handle. If you need independent concurrent flows, create separate faux providers with distinct
providerids.
Cross-Provider Handoffs
The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.
When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:
- User and tool result messages are passed through unchanged
- Assistant messages from the same provider/API are preserved as-is
- Assistant messages from different providers have their thinking blocks converted to text with
<thinking>tags - Tool calls and regular text are preserved unchanged
import { createModels, type Context } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
import { googleProvider } from '@earendil-works/pi-ai/providers/google';
const models = createModels();
models.setProvider(anthropicProvider());
models.setProvider(openaiProvider());
models.setProvider(googleProvider());
const context: Context = { messages: [] };
// Start with Claude
const claude = models.getModel('anthropic', 'claude-sonnet-4-5')!;
context.messages.push({ role: 'user', content: 'What is 25 * 18?', timestamp: Date.now() });
context.messages.push(await models.completeSimple(claude, context, { reasoning: 'medium' }));
// Switch to GPT-5 - it will see Claude's thinking as <thinking> tagged text
const gpt5 = models.getModel('openai', 'gpt-5-mini')!;
context.messages.push({ role: 'user', content: 'Is that calculation correct?', timestamp: Date.now() });
context.messages.push(await models.complete(gpt5, context));
// Switch to Gemini
const gemini = models.getModel('google', 'gemini-2.5-flash')!;
context.messages.push({ role: 'user', content: 'What was the original question?', timestamp: Date.now() });
const geminiResponse = await models.complete(gemini, context);All providers can handle messages from other providers — text, tool calls and results (including images), thinking blocks (transformed to tagged text), and aborted messages with partial content. This enables flexible workflows: start with a fast model, switch to a more capable one for complex reasoning, or maintain continuity across provider outages.
Context Serialization
The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:
const context: Context = {
systemPrompt: 'You are a helpful assistant.',
messages: [
{ role: 'user', content: 'What is TypeScript?', timestamp: Date.now() }
]
};
const model = models.getModel('openai', 'gpt-4o-mini')!;
const response = await models.complete(model, context);
context.messages.push(response);
// Serialize the entire context
const serialized = JSON.stringify(context);
// Save to database, localStorage, file, etc.
localStorage.setItem('conversation', serialized);
// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system', timestamp: Date.now() });
// Continue with any model
const newModel = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const continuation = await models.complete(newModel, restored);Models are plain serializable data too — no functions or implementations attached — so persisting "which model was this conversation using" is a JSON.stringify away.
Note: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.
Browser Usage
The library supports browser environments. The core entrypoint and provider factories are side-effect free and bundle cleanly. Environment variables are not available in browsers, so pass API keys explicitly — or inject a CredentialStore (e.g. localStorage-backed) and let provider auth resolve from stored credentials:
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
const models = createModels();
models.setProvider(anthropicProvider());
const model = models.getModel('anthropic', 'claude-3-5-haiku-20241022')!;
const response = await models.complete(model, {
messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
}, {
apiKey: 'your-api-key'
});Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.
Browser compatibility notes:
- Amazon Bedrock (
bedrock-converse-stream) is not supported in browser environments. It can still appear in model lists; calls fail at runtime. - OAuth login flows are Node-only. They are lazy-loaded behind bundler-opaque imports, so registering an OAuth-capable provider does not pull Node-only code into a browser bundle — only actually logging in would.
- Use a server-side proxy or backend service if you need Bedrock or OAuth-based auth from a web app.
Bundling and Tree Shaking
For small bundles, import only the providers you need:
import { createModels } from '@earendil-works/pi-ai';
import { openaiProvider } from '@earendil-works/pi-ai/providers/openai';
const models = createModels();
models.setProvider(openaiProvider());Rules:
@earendil-works/pi-aiis the core entrypoint and does not import built-in catalogs, provider factories, or SDK implementations.@earendil-works/pi-ai/providers/<provider>imports that provider's catalog and lazy API wrapper only.@earendil-works/pi-ai/providers/allimports every built-in provider factory and all catalogs. Use it only when you want the full built-in set.- With code splitting, provider SDKs stay in lazy chunks and load on first request.
- Without code splitting, bundlers fold reachable lazy API implementations into the single bundle. A single-provider bundle then includes that provider's SDK;
providers/allincludes all statically visible SDKs. Bedrock is the exception: its AWS SDK implementation is loaded through a bundler-opaque Node-only import. - Importing
@earendil-works/pi-ai/api/<api-id>directly loads that API implementation and its SDK immediately.
Avoid @earendil-works/pi-ai/compat in new bundled apps; it preserves the old global API and imports the full built-in catalog surface.
For single-file Node ESM bundles, some SDK dependencies may still use dynamic CommonJS require() internally. If you see errors such as Dynamic require of "child_process" is not supported, add a Node require shim to the bundle. With esbuild:
esbuild app.js --bundle --platform=node --format=esm \
--banner:js='import { createRequire } from "module";const require = createRequire(import.meta.url);' \
--outfile=app.bundle.jsThis is only for Node bundles; it is not a browser or Cloudflare Workers workaround.
Bedrock is Node-only. Add it like any other provider:
import { createModels } from '@earendil-works/pi-ai';
import { amazonBedrockProvider } from '@earendil-works/pi-ai/providers/amazon-bedrock';
const models = createModels();
models.setProvider(amazonBedrockProvider());In normal Node package usage and code-split bundles, Bedrock loads its AWS SDK implementation lazily. For a standalone single-file bundle that must include Bedrock support, register the implementation module explicitly:
import { setBedrockProviderModule } from '@earendil-works/pi-ai/api/bedrock-converse-stream.lazy';
import { bedrockProviderModule } from '@earendil-works/pi-ai/bedrock-provider';
setBedrockProviderModule(bedrockProviderModule);That explicit override bundles the AWS SDK. Without it, Bedrock's opaque runtime import expects the package's Bedrock implementation file to be available at runtime.
Provider-Scoped Environment Overrides
Pass env in stream options to scope provider configuration to a request. Values in env are used before process environment variables for provider auth and configuration such as Cloudflare account IDs, Azure OpenAI settings, Vertex project/location, Bedrock settings, PI_CACHE_RETENTION, and HTTP_PROXY/HTTPS_PROXY.
const models = builtinModels();
const model = models.getModel('cloudflare-ai-gateway', 'workers-ai/@cf/moonshotai/kimi-k2.6')!;
const response = await models.complete(model, context, {
env: {
CLOUDFLARE_API_KEY: '...',
CLOUDFLARE_ACCOUNT_ID: 'account-id',
CLOUDFLARE_GATEWAY_ID: 'gateway-id'
}
});Use this when one process needs different provider settings per request, or when ambient environment variables should not leak into a provider call.
OAuth Providers
Several providers support OAuth authentication instead of static API keys:
- Anthropic (Claude Pro/Max subscription)
- OpenAI Codex (ChatGPT Plus/Pro subscription, access to GPT-5.x Codex models)
- GitHub Copilot (Copilot subscription)
Each of these providers carries an OAuthAuth on provider.auth.oauth with three operations: login(callbacks) runs the interactive flow and returns a credential, refresh(credential) exchanges the refresh token, and toAuth(credential) derives request auth (GitHub Copilot's per-account base URL comes from here). Refresh is automatic: models.getAuth() and the request paths refresh expired tokens under a credential-store lock, so concurrent requests and processes cannot double-refresh.
import { createModels } from '@earendil-works/pi-ai';
import { anthropicProvider } from '@earendil-works/pi-ai/providers/anthropic';
const models = createModels({ credentials: myStore }); // persistent CredentialStore
models.setProvider(anthropicProvider());
// Login: drive the flow with prompt()/notify() callbacks, persist the credential
const provider = models.getProvider('anthropic')!;
const credential = await provider.auth.oauth!.login({
prompt: async (p) => {
// p.type: 'text' | 'secret' | 'select' | 'manual_code'
// manual_code prompts race a local callback server; p.signal aborts them when the server wins
return await askUser(p.message);
},
notify: (event) => {
// event.type: 'auth_url' | 'device_code' | 'progress'
if (event.type === 'auth_url') console.log(`Open: ${event.url}`);
if (event.type === 'device_code') console.log(`Code: ${event.userCode} at ${event.verificationUri}`);
if (event.type === 'progress') console.log(event.message);
},
});
await myStore.modify('anthropic', async () => credential);
// From here on, requests resolve and refresh the token automatically
const model = models.getModel('anthropic', 'claude-sonnet-4-5')!;
await models.complete(model, context);
// Logout
await myStore.delete('anthropic');Vertex AI
Vertex AI models support either a Google Cloud API key or Application Default Credentials (ADC):
- API key: Set
GOOGLE_CLOUD_API_KEYor passapiKeyin the call options. - Local development (ADC): Run
gcloud auth application-default login - CI/Production (ADC): Set
GOOGLE_APPLICATION_CREDENTIALSto point to a service account JSON key file
When using ADC, also set GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) and GOOGLE_CLOUD_LOCATION. You can also pass project/location in the call options. When using GOOGLE_CLOUD_API_KEY, project and location are not required.
# Local (uses your user credentials)
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"
# CI/Production (service account key file)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"Official docs: Application Default Credentials
CLI Login
The quickest way to authenticate:
npx @earendil-works/pi-ai login # interactive provider selection
npx @earendil-works/pi-ai login anthropic # login to specific provider
npx @earendil-works/pi-ai list # list available providersCredentials are saved to auth.json in the current directory.
Programmatic OAuth
The legacy flow functions remain available via the @earendil-works/pi-ai/oauth entry point (loginAnthropic, loginOpenAICodex, loginGitHubCopilot, refreshOAuthToken, getOAuthApiKey); credential storage is the caller's responsibility there. New code should prefer the provider-owned OAuthAuth shown above — it composes with the credential store and gets locked auto-refresh for free.
Provider notes:
OpenAI Codex: Requires a ChatGPT Plus or Pro subscription. Provides access to GPT-5.x Codex models with extended context windows and reasoning capabilities. The library automatically handles session-based prompt caching when sessionId is provided in stream options. You can set transport in stream options to "sse", "websocket", or "auto" for Codex Responses transport selection. When using WebSocket with a sessionId, connections are reused per session and expire after 5 minutes of inactivity.
Azure OpenAI (Responses): Uses the Responses API only. Set AZURE_OPENAI_API_KEY and either AZURE_OPENAI_BASE_URL or AZURE_OPENAI_RESOURCE_NAME. AZURE_OPENAI_BASE_URL supports both https://<resource>.openai.azure.com and https://<resource>.cognitiveservices.azure.com; root endpoints are normalized to .../openai/v1 automatically. Use AZURE_OPENAI_API_VERSION (defaults to v1) to override the API version if needed. Deployment names are treated as model IDs by default, override with azureDeploymentName or AZURE_OPENAI_DEPLOYMENT_NAME_MAP using comma-separated model-id=deployment pairs (for example gpt-4o-mini=my-deployment,gpt-4o=prod). Legacy deployment-based URLs are intentionally unsupported.
GitHub Copilot: If you get "The requested model is not supported" error, enable the model manually in VS Code: open Copilot Chat, click the model selector, select the model (warning icon), and click "Enable".
Migrating from the Old Global API
Older versions exposed a global API: stream()/complete() dispatching on model.api via a global registry, sync getModel()/getModels()/getProviders() catalog reads, registerApiProvider(), getEnvApiKey(), and per-API lazy stream functions. That surface lives unchanged on the compat entrypoint:
// Before
import { getModel, complete } from '@earendil-works/pi-ai';
// After (verbatim behavior, one import-path change)
import { getModel, complete } from '@earendil-works/pi-ai/compat';Compat is a strict superset of the root entrypoint, so a file can switch its import path wholesale. It will be removed in a future release; migrate to createModels() + provider factories:
| Old | New |
|-----|-----|
| getModel('openai', 'gpt-4o-mini') | models.getModel('openai', 'gpt-4o-mini') or getBuiltinModel() from providers/all |
| getModels('anthropic') / getProviders() | models.getModels('anthropic') / models.getProviders() or getBuiltin* |
| stream(model, ctx, opts) (env-key injection) | models.stream(model, ctx, opts) (provider auth resolution) |
| registerApiProvider({ api, stream, streamSimple }) | createProvider({ id, auth, models, api }) + models.setProvider() |
| getEnvApiKey('openai') | await models.getAuth(model) |
| streamAnthropic(model, ctx, opts) | stream from @earendil-works/pi-ai/api/anthropic-messages, or a provider in a collection |
| registerFauxProvider() | fauxProvider() + models.setProvider() |
Development
Adding a New Provider
Adding a new LLM provider requires changes across multiple files. The layered layout: API implementations live in src/api/, provider factories in src/providers/, generated catalogs in src/providers/<id>.models.ts. This checklist covers all necessary steps:
1. Core Types (src/types.ts)
- Add the API identifier to
KnownApi(for example"bedrock-converse-stream"), if it is a new API - Add the provider name to
KnownProvider(for example"amazon-bedrock") - Add the options type to
ApiOptionsMap
2. API Implementation (src/api/<api-id>.ts, only for a new API)
Create a new API implementa
