@nghyane/pi-ai

Unified LLM API with automatic model discovery, provider configuration, token and cost tracking, and simple context persistence and hand-off to other models mid-session.

Note: This library only includes models that support tool calling (function calling), as this is essential for agentic workflows.

Supported Providers

  • OpenAI
  • OpenAI Codex (ChatGPT Plus/Pro subscription, requires OAuth, see below)
  • Anthropic
  • Google
  • Vertex AI (Gemini via Vertex AI)
  • Mistral
  • Groq
  • Cerebras
  • Together
  • Moonshot (requires MOONSHOT_API_KEY)
  • Qianfan (requires QIANFAN_API_KEY)
  • NVIDIA (requires NVIDIA_API_KEY)
  • NanoGPT (requires NANO_GPT_API_KEY)
  • Hugging Face Inference
  • xAI
  • Venice (requires VENICE_API_KEY)
  • OpenRouter
  • LiteLLM (requires LITELLM_API_KEY)
  • zAI (requires ZAI_API_KEY)
  • MiniMax Coding Plan (requires MINIMAX_CODE_API_KEY or MINIMAX_CODE_CN_API_KEY)
  • Xiaomi MiMo (requires XIAOMI_API_KEY)
  • Qwen Portal (supports QWEN_OAUTH_TOKEN or QWEN_PORTAL_API_KEY)
  • Cloudflare AI Gateway (requires CLOUDFLARE_AI_GATEWAY_API_KEY and provider-specific gateway base URL)
  • Ollama (local OpenAI-compatible runtime; optional OLLAMA_API_KEY)
  • vLLM (OpenAI-compatible server; VLLM_API_KEY for secured deployments)
  • GitHub Copilot (requires OAuth, see below)
  • Google Gemini CLI (requires OAuth, see below)
  • Antigravity (requires OAuth, see below)
  • Any OpenAI-compatible API: LM Studio, custom proxies, etc.

Installation

npm install @nghyane/pi-ai

Quick Start

import { Type, getModel, stream, complete, Context, Tool, StringEnum } from "@nghyane/pi-ai";

// Fully typed with auto-complete support for both providers and models
const model = getModel("openai", "gpt-4o-mini");

// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [
	{
		name: "get_time",
		description: "Get the current time",
		parameters: Type.Object({
			timezone: Type.Optional(Type.String({ description: "Optional timezone (e.g., America/New_York)" })),
		}),
	},
];

// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
	systemPrompt: "You are a helpful assistant.",
	messages: [{ role: "user", content: "What time is it?" }],
	tools,
};

// Option 1: Streaming with all event types
const s = stream(model, context);

for await (const event of s) {
	switch (event.type) {
		case "start":
			console.log(`Starting with ${event.partial.model}`);
			break;
		case "text_start":
			console.log("\n[Text started]");
			break;
		case "text_delta":
			process.stdout.write(event.delta);
			break;
		case "text_end":
			console.log("\n[Text ended]");
			break;
		case "thinking_start":
			console.log("[Model is thinking...]");
			break;
		case "thinking_delta":
			process.stdout.write(event.delta);
			break;
		case "thinking_end":
			console.log("[Thinking complete]");
			break;
		case "toolcall_start":
			console.log(`\n[Tool call started: index ${event.contentIndex}]`);
			break;
		case "toolcall_delta":
			// Partial tool arguments are being streamed
			const partialCall = event.partial.content[event.contentIndex];
			if (partialCall.type === "toolCall") {
				console.log(`[Streaming args for ${partialCall.name}]`);
			}
			break;
		case "toolcall_end":
			console.log(`\nTool called: ${event.toolCall.name}`);
			console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
			break;
		case "done":
			console.log(`\nFinished: ${event.reason}`);
			break;
		case "error":
			console.error(`Error: ${event.error}`);
			break;
	}
}

// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);

// Handle tool calls if any
const toolCalls = finalMessage.content.filter((b) => b.type === "toolCall");
for (const call of toolCalls) {
	// Execute the tool
	const result =
		call.name === "get_time"
			? new Date().toLocaleString("en-US", {
					timeZone: call.arguments.timezone || "UTC",
					dateStyle: "full",
					timeStyle: "long",
				})
			: "Unknown tool";

	// Add tool result to context (supports text and images)
	context.messages.push({
		role: "toolResult",
		toolCallId: call.id,
		toolName: call.name,
		content: [{ type: "text", text: result }],
		isError: false,
		timestamp: Date.now(),
	});
}

// Continue if there were tool calls
if (toolCalls.length > 0) {
	const continuation = await complete(model, context);
	context.messages.push(continuation);
	console.log("After tool execution:", continuation.content);
}

console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);

// Option 2: Get complete response without streaming
const response = await complete(model, context);

for (const block of response.content) {
	if (block.type === "text") {
		console.log(block.text);
	} else if (block.type === "toolCall") {
		console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
	}
}

Tools

Tools enable LLMs to interact with external systems. This library uses TypeBox schemas for type-safe tool definitions with automatic validation using AJV. TypeBox schemas can be serialized and deserialized as plain JSON, making them ideal for distributed systems.

Defining Tools

import { Type, Tool, StringEnum } from "@nghyane/pi-ai";

// Define tool parameters with TypeBox
const weatherTool: Tool = {
	name: "get_weather",
	description: "Get current weather for a location",
	parameters: Type.Object({
		location: Type.String({ description: "City name or coordinates" }),
		units: StringEnum(["celsius", "fahrenheit"], { default: "celsius" }),
	}),
};

// Note: For Google API compatibility, use StringEnum helper instead of Type.Enum
// Type.Enum generates anyOf/const patterns that Google doesn't support

const bookMeetingTool: Tool = {
	name: "book_meeting",
	description: "Schedule a meeting",
	parameters: Type.Object({
		title: Type.String({ minLength: 1 }),
		startTime: Type.String({ format: "date-time" }),
		endTime: Type.String({ format: "date-time" }),
		attendees: Type.Array(Type.String({ format: "email" }), { minItems: 1 }),
	}),
};
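
Because these parameter schemas are plain JSON Schema objects, a tool definition can be round-tripped through JSON, which is what makes them practical for distributed setups. A minimal sketch, reusing the weatherTool defined above (Tool and Context imported as in the Quick Start):

// Serialize the tool definition (schema included) as plain JSON
const serializedTool = JSON.stringify(weatherTool);

// Later, or in another service: restore it and use it directly in a context
const restoredTool: Tool = JSON.parse(serializedTool);

const weatherContext: Context = {
	messages: [{ role: "user", content: "What's the weather in Berlin?" }],
	tools: [restoredTool],
};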

Handling Tool Calls

Tool results use content blocks and can include both text and images:

import * as fs from "node:fs";

const context: Context = {
	messages: [{ role: "user", content: "What is the weather in London?" }],
	tools: [weatherTool],
};

const response = await complete(model, context);

// Check for tool calls in the response
for (const block of response.content) {
	if (block.type === "toolCall") {
		// Execute your tool with the arguments
		// See "Validating Tool Arguments" section for validation
		const result = await executeWeatherApi(block.arguments);

		// Add tool result with text content
		context.messages.push({
			role: "toolResult",
			toolCallId: block.id,
			toolName: block.name,
			content: [{ type: "text", text: JSON.stringify(result) }],
			isError: false,
			timestamp: Date.now(),
		});
	}
}

// Tool results can also include images (for vision-capable models)
const imageBuffer = fs.readFileSync("chart.png");
context.messages.push({
	role: "toolResult",
	toolCallId: "tool_xyz",
	toolName: "generate_chart",
	content: [
		{ type: "text", text: "Generated chart showing temperature trends" },
		{ type: "image", data: imageBuffer.toBase64(), mimeType: "image/png" },
	],
	isError: false,
	timestamp: Date.now(),
});

Streaming Tool Calls with Partial JSON

During streaming, tool call arguments are progressively parsed as they arrive. This enables real-time UI updates before the complete arguments are available:

const s = stream(model, context);

for await (const event of s) {
	if (event.type === "toolcall_delta") {
		const toolCall = event.partial.content[event.contentIndex];

		// toolCall.arguments contains partially parsed JSON during streaming
		// This allows for progressive UI updates
		if (toolCall.type === "toolCall" && toolCall.arguments) {
			// BE DEFENSIVE: arguments may be incomplete
			// Example: Show file path being written even before content is complete
			if (toolCall.name === "write_file" && toolCall.arguments.path) {
				console.log(`Writing to: ${toolCall.arguments.path}`);

				// Content might be partial or missing
				if (toolCall.arguments.content) {
					console.log(`Content preview: ${toolCall.arguments.content.substring(0, 100)}...`);
				}
			}
		}
	}

	if (event.type === "toolcall_end") {
		// Here toolCall.arguments is complete (but not yet validated)
		const toolCall = event.toolCall;
		console.log(`Tool completed: ${toolCall.name}`, toolCall.arguments);
	}
}

Important notes about partial tool arguments:

  • During toolcall_delta events, arguments contains the best-effort parse of partial JSON
  • Fields may be missing or incomplete - always check for existence before use
  • String values may be truncated mid-word
  • Arrays may be incomplete
  • Nested objects may be partially populated
  • At minimum, arguments will be an empty object {}, never undefined
  • The Google provider does not support function call streaming. Instead, you will receive a single toolcall_delta event with the full arguments.
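
As one way to apply these notes, here is a small hypothetical helper for building a UI preview from partial arguments; the path and content field names are illustrative, not part of the library. It can be called from the toolcall_delta handler shown above.

// Hypothetical helper: defensively extract displayable fields from partial arguments.
// Every field may be missing or truncated while streaming, so check before use.
function previewPartialArgs(args: Record<string, unknown>): string {
	const path = typeof args.path === "string" ? args.path : "(pending)";
	const preview = typeof args.content === "string" ? args.content.slice(0, 80) : "";
	return preview ? `${path}: ${preview}...` : path;
}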

Validating Tool Arguments

When using agentLoop, tool arguments are automatically validated against your TypeBox schemas before execution. If validation fails, the error is returned to the model as a tool result, allowing it to retry.

When implementing your own tool execution loop with stream() or complete(), use validateToolCall to validate arguments before passing them to your tools:

import { stream, validateToolCall, Tool } from "@nghyane/pi-ai";

const tools: Tool[] = [weatherTool, calculatorTool];
const s = stream(model, { messages, tools });

for await (const event of s) {
	if (event.type === "toolcall_end") {
		const toolCall = event.toolCall;

		try {
			// Validate arguments against the tool's schema (throws on invalid args)
			const validatedArgs = validateToolCall(tools, toolCall);
			const result = await executeMyTool(toolCall.name, validatedArgs);
			// ... add tool result to context
		} catch (error) {
			// Validation failed - return error as tool result so model can retry
			context.messages.push({
				role: "toolResult",
				toolCallId: toolCall.id,
				toolName: toolCall.name,
				content: [{ type: "text", text: error.message }],
				isError: true,
				timestamp: Date.now(),
			});
		}
	}
}

Complete Event Reference

All streaming events emitted during assistant message generation:

| Event Type       | Description              | Key Properties                                                                            |
| ---------------- | ------------------------ | ----------------------------------------------------------------------------------------- |
| start            | Stream begins            | partial: Initial assistant message structure                                               |
| text_start       | Text block starts        | contentIndex: Position in content array                                                    |
| text_delta       | Text chunk received      | delta: New text, contentIndex: Position                                                    |
| text_end         | Text block complete      | content: Full text, contentIndex: Position                                                 |
| thinking_start   | Thinking block starts    | contentIndex: Position in content array                                                    |
| thinking_delta   | Thinking chunk received  | delta: New text, contentIndex: Position                                                    |
| thinking_end     | Thinking block complete  | content: Full thinking, contentIndex: Position                                             |
| toolcall_start   | Tool call begins         | contentIndex: Position in content array                                                    |
| toolcall_delta   | Tool arguments streaming | delta: JSON chunk, partial.content[contentIndex].arguments: Partial parsed args            |
| toolcall_end     | Tool call complete       | toolCall: Complete validated tool call with id, name, arguments                            |
| done             | Stream complete          | reason: Stop reason ("stop", "length", "toolUse"), message: Final assistant message        |
| error            | Error occurred           | reason: Error type ("error" or "aborted"), error: AssistantMessage with partial content    |

Image Input

Models with vision capabilities can process images. You can check if a model supports images via the input property. If you pass images to a non-vision model, they are silently ignored.

import * as fs from "node:fs";
import { getModel, complete } from "@nghyane/pi-ai";

const model = getModel("openai", "gpt-4o-mini");

// Check if model supports images
if (model.input.includes("image")) {
	console.log("Model supports vision");
}

const imageBuffer = fs.readFileSync("image.png");
const base64Image = imageBuffer.toBase64();

const response = await complete(model, {
	messages: [
		{
			role: "user",
			content: [
				{ type: "text", text: "What is in this image?" },
				{ type: "image", data: base64Image, mimeType: "image/png" },
			],
		},
	],
});

// Access the response
for (const block of response.content) {
	if (block.type === "text") {
		console.log(block.text);
	}
}

Thinking/Reasoning

Many models support thinking/reasoning capabilities where they can show their internal thought process. You can check if a model supports reasoning via the reasoning property. If you pass reasoning options to a non-reasoning model, they are silently ignored.

Unified Interface (streamSimple/completeSimple)

import { getModel, streamSimple, completeSimple } from "@nghyane/pi-ai";

// Many models across providers support thinking/reasoning
const model = getModel("anthropic", "claude-sonnet-4-20250514");
// or getModel('openai', 'gpt-5-mini');
// or getModel('google', 'gemini-2.5-flash');
// or getModel('xai', 'grok-code-fast-1');
// or getModel('groq', 'openai/gpt-oss-20b');
// or getModel('cerebras', 'gpt-oss-120b');
// or getModel('openrouter', 'z-ai/glm-4.5v');

// Check if model supports reasoning
if (model.reasoning) {
	console.log("Model supports reasoning/thinking");
}

// Use the simplified reasoning option
const response = await completeSimple(
	model,
	{
		messages: [{ role: "user", content: "Solve: 2x + 5 = 13" }],
	},
	{
		reasoning: "medium", // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh' (xhigh maps to high on non-OpenAI providers)
	}
);

// Access thinking and text blocks
for (const block of response.content) {
	if (block.type === "thinking") {
		console.log("Thinking:", block.thinking);
	} else if (block.type === "text") {
		console.log("Response:", block.text);
	}
}

Provider-Specific Options (stream/complete)

For fine-grained control, use the provider-specific options:

import { getModel, complete } from "@nghyane/pi-ai";

// OpenAI Reasoning (o1, o3, gpt-5)
const openaiModel = getModel("openai", "gpt-5-mini");
await complete(openaiModel, context, {
	reasoningEffort: "medium",
	reasoningSummary: "detailed", // OpenAI Responses API only
});

// Anthropic Thinking (Claude Sonnet 4)
const anthropicModel = getModel("anthropic", "claude-sonnet-4-20250514");
await complete(anthropicModel, context, {
	thinkingEnabled: true,
	thinkingBudgetTokens: 8192, // Optional token limit
});

// Google Gemini Thinking
const googleModel = getModel("google", "gemini-2.5-flash");
await complete(googleModel, context, {
	thinking: {
		enabled: true,
		budgetTokens: 8192, // -1 for dynamic, 0 to disable
	},
});

Streaming Thinking Content

When streaming, thinking content is delivered through specific events:

const s = streamSimple(model, context, { reasoning: "high" });

for await (const event of s) {
	switch (event.type) {
		case "thinking_start":
			console.log("[Model started thinking]");
			break;
		case "thinking_delta":
			process.stdout.write(event.delta); // Stream thinking content
			break;
		case "thinking_end":
			console.log("\n[Thinking complete]");
			break;
	}
}

Stop Reasons

Every AssistantMessage includes a stopReason field that indicates how the generation ended:

  • "stop" - Normal completion, the model finished its response
  • "length" - Output hit the maximum token limit
  • "toolUse" - Model is calling tools and expects tool results
  • "error" - An error occurred during generation
  • "aborted" - Request was cancelled via abort signal

Error Handling

When a request ends with an error (including aborts and tool call validation errors), the streaming API emits an error event:

// In streaming
for await (const event of stream) {
	if (event.type === "error") {
		// event.reason is either "error" or "aborted"
		// event.error is the AssistantMessage with partial content
		console.error(`Error (${event.reason}):`, event.error.errorMessage);
		console.log("Partial content:", event.error.content);
	}
}

// The final message will have the error details
const message = await stream.result();
if (message.stopReason === "error" || message.stopReason === "aborted") {
	console.error("Request failed:", message.errorMessage);
	// message.content contains any partial content received before the error
	// message.usage contains partial token counts and costs
}

Aborting Requests

The abort signal allows you to cancel in-progress requests. Aborted requests have stopReason === 'aborted':

import { getModel, stream } from "@nghyane/pi-ai";

const model = getModel("openai", "gpt-4o-mini");

// Abort after 2 seconds
const signal = AbortSignal.timeout(2000);

const s = stream(
	model,
	{
		messages: [{ role: "user", content: "Write a long story" }],
	},
	{
		signal,
	}
);

for await (const event of s) {
	if (event.type === "text_delta") {
		process.stdout.write(event.delta);
	} else if (event.type === "error") {
		// event.reason tells you if it was "error" or "aborted"
		console.log(`${event.reason === "aborted" ? "Aborted" : "Error"}:`, event.error.errorMessage);
	}
}

// Get results (may be partial if aborted)
const response = await s.result();
if (response.stopReason === "aborted") {
	console.log("Request was aborted:", response.errorMessage);
	console.log("Partial content received:", response.content);
	console.log("Tokens used:", response.usage);
}

Continuing After Abort

Aborted messages can be added to the conversation context and continued in subsequent requests:

const context = {
	messages: [{ role: "user", content: "Explain quantum computing in detail" }],
};

// First request gets aborted after 2 seconds
const controller1 = new AbortController();
setTimeout(() => controller1.abort(), 2000);

const partial = await complete(model, context, { signal: controller1.signal });

// Add the partial response to context
context.messages.push(partial);
context.messages.push({ role: "user", content: "Please continue" });

// Continue the conversation
const continuation = await complete(model, context);

Common Stream Options

All providers accept the base StreamOptions (in addition to provider-specific options):

  • apiKey: Override the provider API key
  • headers: Extra request headers merged on top of model-defined headers
  • sessionId: Provider-specific session identifier (prompt caching/routing)
  • signal: Abort in-flight requests
  • onPayload: Callback invoked with the provider request payload just before sending

Example:

const response = await complete(model, context, {
	apiKey: "sk-live",
	headers: { "X-Debug-Trace": "true" },
	onPayload: (payload) => {
		console.log("request payload", payload);
	},
});

APIs, Models, and Providers

The library implements 4 API interfaces, each with its own streaming function and options:

  • anthropic-messages: Anthropic's Messages API (streamAnthropic, AnthropicOptions)
  • google-generative-ai: Google's Generative AI API (streamGoogle, GoogleOptions)
  • openai-completions: OpenAI's Chat Completions API (streamOpenAICompletions, OpenAICompletionsOptions)
  • openai-responses: OpenAI's Responses API (streamOpenAIResponses, OpenAIResponsesOptions)

Providers and Models

A provider offers models through a specific API. For example:

  • Anthropic models use the anthropic-messages API
  • Google models use the google-generative-ai API
  • OpenAI models use the openai-responses API
  • Mistral, xAI, Cerebras, Groq, etc. models use the openai-completions API (OpenAI-compatible)

Querying Providers and Models

import { getProviders, getModels, getModel } from "@nghyane/pi-ai";

// Get all available providers
const providers = getProviders();
console.log(providers); // ['openai', 'anthropic', 'google', 'xai', 'groq', ...]

// Get all models from a provider (fully typed)
const anthropicModels = getModels("anthropic");
for (const model of anthropicModels) {
	console.log(`${model.id}: ${model.name}`);
	console.log(`  API: ${model.api}`); // 'anthropic-messages'
	console.log(`  Context: ${model.contextWindow} tokens`);
	console.log(`  Vision: ${model.input.includes("image")}`);
	console.log(`  Reasoning: ${model.reasoning}`);
}

// Get a specific model (both provider and model ID are auto-completed in IDEs)
const model = getModel("openai", "gpt-4o-mini");
console.log(`Using ${model.name} via ${model.api} API`);

Custom Models

You can create custom models for local inference servers or custom endpoints. For Ollama, OLLAMA_API_KEY is optional and is mainly needed for authenticated or self-hosted gateways.

import { Model, stream } from "@nghyane/pi-ai";

// Example: Ollama using OpenAI-compatible API
const ollamaModel: Model<"openai-completions"> = {
	id: "llama-3.1-8b",
	name: "Llama 3.1 8B (Ollama)",
	api: "openai-completions",
	provider: "ollama",
	baseUrl: "http://localhost:11434/v1",
	reasoning: false,
	input: ["text"],
	cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
	contextWindow: 128000,
	maxTokens: 32000,
};

// Example: LiteLLM proxy with explicit compat settings
const litellmModel: Model<"openai-completions"> = {
	id: "gpt-4o",
	name: "GPT-4o (via LiteLLM)",
	api: "openai-completions",
	provider: "litellm",
	baseUrl: "http://localhost:4000/v1",
	reasoning: false,
	input: ["text", "image"],
	cost: { input: 2.5, output: 10, cacheRead: 0, cacheWrite: 0 },
	contextWindow: 128000,
	maxTokens: 16384,
	compat: {
		supportsStore: false, // LiteLLM doesn't support the store field
	},
};

// Example: Custom endpoint with headers (bypassing Cloudflare bot detection)
const proxyModel: Model<"anthropic-messages"> = {
	id: "claude-sonnet-4",
	name: "Claude Sonnet 4 (Proxied)",
	api: "anthropic-messages",
	provider: "custom-proxy",
	baseUrl: "https://proxy.example.com/v1",
	reasoning: true,
	input: ["text", "image"],
	cost: { input: 3, output: 15, cacheRead: 0.3, cacheWrite: 3.75 },
	contextWindow: 200000,
	maxTokens: 8192,
	headers: {
		"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
		"X-Custom-Auth": "bearer-token-here",
	},
};

// Use the custom model
const response = await stream(ollamaModel, context, {
	apiKey: process.env.OLLAMA_API_KEY, // Optional; local Ollama usually runs without auth
});

OpenAI Compatibility Settings

The openai-completions API is implemented by many providers with minor differences. By default, the library auto-detects compatibility settings based on baseUrl for known providers (Cerebras, xAI, Mistral, Chutes, etc.). For custom proxies or unknown endpoints, you can override these settings via the compat field:

interface OpenAICompat {
	supportsStore?: boolean; // Whether provider supports the `store` field (default: true)
	supportsDeveloperRole?: boolean; // Whether provider supports `developer` role vs `system` (default: true)
	supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
	maxTokensField?: "max_completion_tokens" | "max_tokens"; // Which field name to use (default: max_completion_tokens)
}

If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. This is useful for:

  • LiteLLM proxies: May not support store field
  • Custom inference servers: May use non-standard field names
  • Self-hosted endpoints: May have different feature support
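
As a sketch, a partial override for a hypothetical self-hosted endpoint might set only the field that differs and let URL-based detection fill in the rest (ids and URL below are placeholders; the Model type is imported as in the Custom Models section above):

const customModel: Model<"openai-completions"> = {
	id: "my-model",
	name: "My Proxy Model",
	api: "openai-completions",
	provider: "custom-proxy",
	baseUrl: "https://llm.internal.example.com/v1",
	reasoning: false,
	input: ["text"],
	cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
	contextWindow: 128000,
	maxTokens: 8192,
	compat: {
		supportsStore: false, // only this differs; unspecified fields keep the detected defaults
	},
};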

Type Safety

Models are typed by their API, ensuring type-safe options:

// TypeScript knows this is an Anthropic model
const claude = getModel("anthropic", "claude-sonnet-4-20250514");

// So these options are type-checked for AnthropicOptions
await stream(claude, context, {
	thinkingEnabled: true, // ✓ Valid for anthropic-messages
	thinkingBudgetTokens: 2048, // ✓ Valid for anthropic-messages
	// reasoningEffort: 'high'  // ✗ TypeScript error: not valid for anthropic-messages
});

Cross-Provider Handoffs

The library supports seamless handoffs between different LLM providers within the same conversation. This allows you to switch models mid-conversation while preserving context, including thinking blocks, tool calls, and tool results.

How It Works

When messages from one provider are sent to a different provider, the library automatically transforms them for compatibility:

  • User and tool result messages are passed through unchanged
  • Assistant messages from the same provider/API are preserved as-is
  • Assistant messages from different providers have their thinking blocks converted to text with <thinking> tags
  • Tool calls and regular text are preserved unchanged

Example: Multi-Provider Conversation

import { getModel, complete, Context } from "@nghyane/pi-ai";

// Start with Claude
const claude = getModel("anthropic", "claude-sonnet-4-20250514");
const context: Context = {
	messages: [],
};

context.messages.push({ role: "user", content: "What is 25 * 18?" });
const claudeResponse = await complete(claude, context, {
	thinkingEnabled: true,
});
context.messages.push(claudeResponse);

// Switch to GPT-5 - it will see Claude's thinking as <thinking> tagged text
const gpt5 = getModel("openai", "gpt-5-mini");
context.messages.push({ role: "user", content: "Is that calculation correct?" });
const gptResponse = await complete(gpt5, context);
context.messages.push(gptResponse);

// Switch to Gemini
const gemini = getModel("google", "gemini-2.5-flash");
context.messages.push({ role: "user", content: "What was the original question?" });
const geminiResponse = await complete(gemini, context);

Provider Compatibility

All providers can handle messages from other providers, including:

  • Text content
  • Tool calls and tool results (including images in tool results)
  • Thinking/reasoning blocks (transformed to tagged text for cross-provider compatibility)
  • Aborted messages with partial content

This enables flexible workflows where you can:

  • Start with a fast model for initial responses
  • Switch to a more capable model for complex reasoning
  • Use specialized models for specific tasks
  • Maintain conversation continuity across provider outages

Context Serialization

The Context object can be easily serialized and deserialized using standard JSON methods, making it simple to persist conversations, implement chat history, or transfer contexts between services:

import { Context, getModel, complete } from "@nghyane/pi-ai";

// Create and use a context
const context: Context = {
	systemPrompt: "You are a helpful assistant.",
	messages: [{ role: "user", content: "What is TypeScript?" }],
};

const model = getModel("openai", "gpt-4o-mini");
const response = await complete(model, context);
context.messages.push(response);

// Serialize the entire context
const serialized = JSON.stringify(context);
console.log("Serialized context size:", serialized.length, "bytes");

// Save to database, localStorage, file, etc.
localStorage.setItem("conversation", serialized);

// Later: deserialize and continue the conversation
const restored: Context = JSON.parse(localStorage.getItem("conversation")!);
restored.messages.push({ role: "user", content: "Tell me more about its type system" });

// Continue with any model
const newModel = getModel("anthropic", "claude-haiku-4-5-20251001");
const continuation = await complete(newModel, restored);

Note: If the context contains images (encoded as base64 as shown in the Image Input section), those will also be serialized.
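
If you want to persist conversations without the image payloads, one hypothetical approach (not part of the library) is to strip image blocks from a plain JSON copy before saving:

// Hypothetical helper: remove base64 image blocks from a serialized context
// to keep persisted conversations small.
function stripImages(serializedContext: string): string {
	const copy = JSON.parse(serializedContext);
	for (const message of copy.messages) {
		if (Array.isArray(message.content)) {
			message.content = message.content.filter((block: { type: string }) => block.type !== "image");
		}
	}
	return JSON.stringify(copy);
}

localStorage.setItem("conversation", stripImages(JSON.stringify(context)));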

Browser Usage

The library supports browser environments. You must pass the API key explicitly since environment variables are not available in browsers:

import { getModel, complete } from "@nghyane/pi-ai";

// API key must be passed explicitly in browser
const model = getModel("anthropic", "claude-haiku-4-5-20251001");

const response = await complete(
	model,
	{
		messages: [{ role: "user", content: "Hello!" }],
	},
	{
		apiKey: "your-api-key",
	}
);

Security Warning: Exposing API keys in frontend code is dangerous. Anyone can extract and abuse your keys. Only use this approach for internal tools or demos. For production applications, use a backend proxy that keeps your API keys secure.
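
A minimal sketch of such a proxy, with the route shape and environment variable as assumptions rather than part of this library: the browser posts the Context, and the server holds the key and calls the model.

// Server-side route (framework-agnostic Request/Response handler; names are illustrative)
import { getModel, complete, type Context } from "@nghyane/pi-ai";

export async function handleChatRequest(req: Request): Promise<Response> {
	const context: Context = await req.json();
	const model = getModel("anthropic", "claude-haiku-4-5-20251001");
	const message = await complete(model, context, {
		apiKey: process.env.ANTHROPIC_API_KEY, // stays on the server, never shipped to the browser
	});
	return Response.json(message);
}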

Environment Variables (Node.js only)

In Node.js environments, you can set environment variables to avoid passing API keys:

| Provider              | Environment Variable(s)                                                  |
| --------------------- | ------------------------------------------------------------------------ |
| OpenAI                | OPENAI_API_KEY                                                            |
| Anthropic             | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN                                |
| Google                | GEMINI_API_KEY                                                            |
| Vertex AI             | GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) + GOOGLE_CLOUD_LOCATION + ADC    |
| Mistral               | MISTRAL_API_KEY                                                           |
| Groq                  | GROQ_API_KEY                                                              |
| Cerebras              | CEREBRAS_API_KEY                                                          |
| Together              | TOGETHER_API_KEY                                                          |
| Qianfan               | QIANFAN_API_KEY                                                           |
| Hugging Face          | HUGGINGFACE_HUB_TOKEN or HF_TOKEN                                         |
| Synthetic             | SYNTHETIC_API_KEY                                                         |
| NVIDIA                | NVIDIA_API_KEY                                                            |
| NanoGPT               | NANO_GPT_API_KEY                                                          |
| Venice                | VENICE_API_KEY                                                            |
| Moonshot              | MOONSHOT_API_KEY                                                          |
| xAI                   | XAI_API_KEY                                                               |
| OpenRouter            | OPENROUTER_API_KEY                                                        |
| LiteLLM               | LITELLM_API_KEY                                                           |
| Ollama                | OLLAMA_API_KEY (optional for local deployments)                           |
| Qwen Portal           | QWEN_OAUTH_TOKEN or QWEN_PORTAL_API_KEY                                   |
| zAI                   | ZAI_API_KEY                                                               |
| MiniMax Code          | MINIMAX_CODE_API_KEY (international) or MINIMAX_CODE_CN_API_KEY (China)   |
| Xiaomi MiMo           | XIAOMI_API_KEY                                                            |
| vLLM                  | VLLM_API_KEY                                                              |
| Cloudflare AI Gateway | CLOUDFLARE_AI_GATEWAY_API_KEY                                             |
| GitHub Copilot        | COPILOT_GITHUB_TOKEN or GH_TOKEN or GITHUB_TOKEN                          |

For Cloudflare AI Gateway models, use a provider base URL of the form https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic.

Provider endpoint defaults for the current OpenAI-compatible integrations:

  • Together: https://api.together.xyz/v1
  • Moonshot: https://api.moonshot.ai/v1
  • Qianfan: https://qianfan.baidubce.com/v2
  • NVIDIA: https://integrate.api.nvidia.com/v1
  • NanoGPT: https://nano-gpt.com/api/v1
  • Hugging Face Inference: https://router.huggingface.co/v1
  • Venice: https://api.venice.ai/api/v1
  • Xiaomi MiMo: https://api.xiaomimimo.com/anthropic
  • vLLM: http://127.0.0.1:8000/v1
  • Ollama: local OpenAI-compatible runtime
  • LiteLLM: http://localhost:4000/v1
  • Cloudflare AI Gateway: https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic
  • Qwen Portal: https://portal.qwen.ai/v1

When set, the library automatically uses these keys:

// Uses OPENAI_API_KEY from environment
const model = getModel("openai", "gpt-4o-mini");
const response = await complete(model, context);

// Or override with explicit key
const response = await complete(model, context, {
	apiKey: "sk-different-key",
});

Checking Environment Variables

import { getEnvApiKey } from "@nghyane/pi-ai";

// Check if an API key is set in environment variables
const key = getEnvApiKey("openai"); // checks OPENAI_API_KEY

OAuth Providers

Several providers support OAuth authentication (some also support static API keys):

  • Anthropic (Claude Pro/Max subscription)
  • OpenAI Codex (ChatGPT Plus/Pro subscription, access to GPT-5.x Codex models)
  • GitHub Copilot (Copilot subscription)
  • Google Gemini CLI (Gemini 2.0/2.5 via Google Cloud Code Assist; free tier or paid subscription)
  • Antigravity (Free Gemini 3, Claude, GPT-OSS via Google Cloud)
  • Qwen Portal (Qwen OAuth token or API key)

For paid Cloud Code Assist subscriptions, set GOOGLE_CLOUD_PROJECT or GOOGLE_CLOUD_PROJECT_ID to your project ID.

Vertex AI (ADC)

Vertex AI models use Application Default Credentials (ADC):

  • Local development: Run gcloud auth application-default login
  • CI/Production: Set GOOGLE_APPLICATION_CREDENTIALS to point to a service account JSON key file

Also set GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT) and GOOGLE_CLOUD_LOCATION. You can also pass project/location in the call options.

Example:

# Local (uses your user credentials)
gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT="my-project"
export GOOGLE_CLOUD_LOCATION="us-central1"

# CI/Production (service account key file)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

import { getModel, complete } from "@nghyane/pi-ai";

(async () => {
	const model = getModel("google-vertex", "gemini-2.5-flash");
	const response = await complete(model, {
		messages: [{ role: "user", content: "Hello from Vertex AI" }],
	});

	for (const block of response.content) {
		if (block.type === "text") console.log(block.text);
	}
})().catch(console.error);

Official docs: Application Default Credentials

CLI Login

The quickest way to authenticate:

bunx @nghyane/pi-ai login              # interactive provider selection
bunx @nghyane/pi-ai login anthropic    # login to specific provider
bunx @nghyane/pi-ai login vllm         # store vLLM API key (or placeholder for local no-auth)
bunx @nghyane/pi-ai list               # list available providers

Credentials are saved to agent.db in the agent directory. /login qianfan opens the Qianfan console and stores the pasted API key.

login supports OAuth providers (Anthropic, OpenAI Codex, GitHub Copilot, Gemini CLI, Antigravity) and API-key onboarding flows.

For the current OpenAI-compatible integrations, API-key onboarding covers Together, Moonshot, Qianfan, NVIDIA, NanoGPT, Hugging Face, Venice, Xiaomi, vLLM, LiteLLM, Cloudflare AI Gateway, and Qwen Portal. Ollama is typically local and unauthenticated; set OLLAMA_API_KEY only when your Ollama deployment enforces bearer auth.

Programmatic OAuth

The library provides login and token refresh functions. Credential storage is the caller's responsibility.

import {
	// Login functions (return credentials, do not store)
	loginAnthropic,
	loginOpenAICodex,
	loginGitHubCopilot,
	loginGeminiCli,
	loginAntigravity,
	loginCloudflareAiGateway,
	loginHuggingface,
	loginLiteLLM,
	loginMoonshot,
	loginNvidia,
	loginNanoGPT,
	loginQianfan,
	loginQwenPortal,
	loginTogether,
	loginVenice,
	loginVllm,
	loginXiaomi,

	// Token management
	refreshOAuthToken, // (provider, credentials) => new credentials
	getOAuthApiKey, // (provider, credentialsMap) => { newCredentials, apiKey } | null

	// Types
	type OAuthProvider, // includes 'anthropic', 'openai-codex', 'github-copilot', 'google-gemini-cli', 'google-antigravity', 'together', 'moonshot', 'qianfan', 'nvidia', 'nanogpt', 'huggingface', 'venice', 'xiaomi', 'vllm', 'litellm', 'cloudflare-ai-gateway', 'qwen-portal', ...
	type OAuthCredentials,
} from "@nghyane/pi-ai";

loginOpenAICodex accepts an optional originator value used in the OAuth flow:

await loginOpenAICodex({
	onAuth: ({ url }) => console.log(url),
	originator: "my-cli",
});

Login Flow Example

import { loginGitHubCopilot } from "@nghyane/pi-ai";
import * as fs from "node:fs";

const credentials = await loginGitHubCopilot({
	onAuth: (url, instructions) => {
		console.log(`Open: ${url}`);
		if (instructions) console.log(instructions);
	},
	onPrompt: async (prompt) => {
		return await getUserInput(prompt.message);
	},
	onProgress: (message) => console.log(message),
});

// Store credentials yourself
const auth = { "github-copilot": { type: "oauth", ...credentials } };
fs.writeFileSync("credentials.json", JSON.stringify(auth, null, 2));

Using OAuth Tokens

Use getOAuthApiKey() to get an API key, automatically refreshing if expired:

import { getModel, complete, getOAuthApiKey } from "@nghyane/pi-ai";
import * as fs from "node:fs";

// Load your stored credentials
const auth = JSON.parse(fs.readFileSync("credentials.json", "utf-8"));

// Get API key (refreshes if expired)
const result = await getOAuthApiKey("github-copilot", auth);
if (!result) throw new Error("Not logged in");

// Save refreshed credentials
auth["github-copilot"] = { type: "oauth", ...result.newCredentials };
fs.writeFileSync("credentials.json", JSON.stringify(auth, null, 2));

// Use the API key
const model = getModel("github-copilot", "gpt-4o");
const response = await complete(
	model,
	{
		messages: [{ role: "user", content: "Hello!" }],
	},
	{ apiKey: result.apiKey }
);

Provider Notes

OpenAI Codex: Requires a ChatGPT Plus or Pro subscription. Provides access to GPT-5.x Codex models with extended context windows and reasoning capabilities. The library automatically handles session-based prompt caching when sessionId is provided in stream options.
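
For example, reusing a stable sessionId (one of the common stream options) across requests lets session-based caching apply; the model shown is one used earlier in this README, and the same option applies to Codex models:

const model = getModel("openai", "gpt-4o-mini");
const s = stream(model, context, {
	sessionId: "conversation-1234", // keep this stable per conversation for caching/routing
});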

GitHub Copilot: If you get a "The requested model is not supported" error, enable the model manually in VS Code: open Copilot Chat, click the model selector, select the model (it will show a warning icon), and click "Enable".

Google Gemini CLI / Antigravity: These use Google Cloud OAuth. The apiKey returned by getOAuthApiKey() is a JSON string containing both the token and project ID, which the library handles automatically.

License

MIT