aio-llm
v1.0.6
All-In-One LLM Framework - Multi-provider LLM integration with auto-fallback, priority management, multimodal support, and XML-based tool calling
AIO
All-In-One LLM Framework - Multi-provider LLM integration with auto-fallback, priority management, multimodal support, and structured outputs for JavaScript/TypeScript.
✨ Features
- 🔄 Multi-Provider: Supports 5 providers (OpenRouter, Groq, Cerebras, Google AI, Nvidia)
- 🎯 Priority Management: Priority control for providers, models, and API keys
- 🔁 Auto Fallback: Automatically switches to another provider/model on failure
- 🔑 Key Rotation: Automatically tries other API keys when the current key fails
- 🖼️ Multimodal Support: Images, video, audio, and PDF (Google AI)
- 📊 Structured Outputs: JSON mode and JSON Schema validation
- 🛠️ Tool Calling: Text-based tool calling with streaming, validation, and retry
- 🌊 Streaming: Streaming responses with abort support
- 🛑 Abort Control: Cancel requests at any time
- 💪 TypeScript: Full TypeScript support with type definitions
- 📝 Logging & Validation: Winston logger and Zod validation
- 🔄 Retry Logic: Exponential backoff retry with error classification
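As a mental model of how priority management and auto-fallback interact, the ordering can be pictured as a sort over candidate provider/model/key combinations. This is an illustrative sketch only (`Candidate` and `orderCandidates` are hypothetical names, not part of the AIO API):

```typescript
// Hypothetical illustration of priority-based ordering (not the actual AIO internals).
interface Candidate {
  provider: string;
  model: string;
  key: string;
  priority: number; // combined score; higher = tried first
}

// Sort candidates so the highest-priority combination is attempted first;
// on failure the caller moves to the next entry (auto-fallback).
function orderCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => b.priority - a.priority);
}

const order = orderCandidates([
  { provider: "cerebras", model: "llama3.1-8b", key: "csk_xxx", priority: 8 },
  { provider: "groq", model: "llama-3.3-70b-versatile", key: "gsk_xxx", priority: 10 },
]);
console.log(order[0].provider); // groq is tried first
```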
📦 Installation
npm install aio
🚀 Quick Start
1. Basic Usage
import { AIO } from "aio";
const aio = new AIO({
providers: [
{
provider: "openrouter",
apiKeys: [{ key: "sk-or-v1-xxx" }],
models: [{ modelId: "arcee-ai/trinity-large-preview:free" }],
},
],
});
const response = await aio.chatCompletion({
provider: "openrouter",
model: "arcee-ai/trinity-large-preview:free",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
2. Auto Mode with Fallback
const aio = new AIO({
providers: [
{
provider: "groq",
apiKeys: [{ key: "gsk_xxx" }],
models: [{ modelId: "llama-3.3-70b-versatile" }],
priority: 10, // Highest priority
},
{
provider: "cerebras",
apiKeys: [{ key: "csk_xxx" }],
models: [{ modelId: "llama3.1-8b" }],
priority: 8, // Fallback
},
],
autoMode: true, // Enable auto mode
});
// No need to specify a provider/model
const response = await aio.chatCompletion({
messages: [
{ role: "user", content: "Hello!" },
],
});
// AIO automatically tries Groq first; if it fails, it falls back to Cerebras
3. Priority Management
const aio = new AIO({
providers: [
{
provider: "groq",
apiKeys: [
{ key: "gsk_primary", priority: 100 }, // Key chính
{ key: "gsk_backup1", priority: 50 }, // Backup 1
{ key: "gsk_backup2", priority: 10 }, // Backup 2
],
models: [
{ modelId: "llama-3.3-70b-versatile", priority: 100 }, // Model tốt nhất
{ modelId: "llama-3.1-8b-instant", priority: 50 }, // Model nhanh hơn
],
priority: 100, // Provider priority
},
],
autoMode: true,
});
// AIO will try in this order:
// 1. groq:llama-3.3-70b-versatile with gsk_primary
// 2. On failure → try gsk_backup1
// 3. On failure → try gsk_backup2
// 4. On failure → try groq:llama-3.1-8b-instant
4. Streaming
await aio.streamChatCompletion(
{
provider: "openrouter",
model: "arcee-ai/trinity-large-preview:free",
messages: [{ role: "user", content: "Write a poem" }],
},
(chunk) => {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
},
(error) => {
if (error) console.error("Error:", error);
else console.log("\nDone!");
}
);
5. Multimodal Input (Google AI Only)
// Image from base64
const response = await aio.chatCompletion({
provider: "google-ai",
model: "gemini-1.5-flash",
messages: [
{
role: "user",
content: [
{ type: "text", text: "Describe this image" },
{
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: "base64_encoded_image_data",
},
},
],
},
],
});
// Image from URL
const response2 = await aio.chatCompletion({
provider: "google-ai",
model: "gemini-1.5-flash",
messages: [
{
role: "user",
content: [
{ type: "text", text: "What's in this image?" },
{
type: "image",
source: {
type: "url",
media_type: "image/jpeg",
url: "https://example.com/image.jpg",
},
},
],
},
],
});
// PDF, Video, Audio
const response3 = await aio.chatCompletion({
provider: "google-ai",
model: "gemini-1.5-flash",
messages: [
{
role: "user",
content: [
{ type: "text", text: "Summarize this PDF" },
{
type: "file",
source: {
type: "base64",
media_type: "application/pdf",
data: "base64_encoded_pdf_data",
},
},
],
},
],
});
6. Structured Outputs (JSON Mode)
// JSON Object Mode
const response = await aio.chatCompletion({
provider: "openrouter",
model: "arcee-ai/trinity-large-preview:free",
messages: [
{
role: "user",
content: "Return a JSON with name, age, city for John, 25, New York",
},
],
response_format: { type: "json_object" },
});
const data = JSON.parse(response.choices[0].message.content);
console.log(data); // { name: "John", age: 25, city: "New York" }
7. Structured Outputs (JSON Schema)
const response = await aio.chatCompletion({
provider: "openrouter",
model: "arcee-ai/trinity-large-preview:free",
messages: [
{
role: "user",
content: "Extract: iPhone 15 Pro - Great camera, expensive. Rating: 4.5/5",
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "product_review",
strict: true,
schema: {
type: "object",
properties: {
product_name: { type: "string" },
rating: { type: "number" },
sentiment: {
type: "string",
enum: ["positive", "negative", "neutral"],
},
key_features: {
type: "array",
items: { type: "string" },
},
},
required: ["product_name", "rating", "sentiment", "key_features"],
additionalProperties: false,
},
},
},
});
const data = JSON.parse(response.choices[0].message.content);
// Guaranteed to match the schema!
8. System Prompt
const response = await aio.chatCompletion({
provider: "openrouter",
model: "arcee-ai/trinity-large-preview:free",
systemPrompt: "You are a helpful assistant that always responds in JSON format",
messages: [{ role: "user", content: "What is 2+2?" }],
});
9. Advanced Parameters
const response = await aio.chatCompletion({
provider: "google-ai",
model: "gemini-1.5-flash",
messages: [{ role: "user", content: "Tell me a story" }],
temperature: 0.7,
max_tokens: 1000,
top_p: 0.9,
top_k: 40, // Only for Google AI and OpenRouter
stop: ["END", "STOP"],
});
🆓 Nvidia Provider - Free Kimi K2.5
Nvidia offers Kimi K2.5 completely free of charge through an OpenAI-compatible API:
import { AIO } from "aio";
const aio = new AIO({
providers: [
{
provider: "nvidia",
apiKeys: [{ key: process.env.NVIDIA_API_KEY }],
models: [{ modelId: "moonshotai/kimi-k2.5" }],
},
],
});
const response = await aio.chatCompletion({
provider: "nvidia",
model: "moonshotai/kimi-k2.5",
messages: [{ role: "user", content: "Explain quantum computing" }],
temperature: 0.7,
});
Sign up for a free API key:
- Visit: https://build.nvidia.com/settings/api-keys
- Register and get an API key
- Base URL: https://integrate.api.nvidia.com/v1/chat/completions
- Model ID: moonshotai/kimi-k2.5
Features:
- ✅ Completely free
- ✅ OpenAI-compatible API
- ✅ Streaming support
- ✅ JSON response format
- ✅ Built into the AIO Framework
🛠️ Tool Calling (NEW in v1.0.1)
The AIO Framework supports text-based tool calling with real-time streaming. The framework automatically parses [tool]...[/tool] tags, validates parameters, retries on errors, and tracks execution metadata.
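To make the text-based mechanism concrete, here is a rough sketch of what parsing [tool]...[/tool] tags out of model output could look like. This is an illustrative stand-in (`parseToolTags` is a hypothetical helper, not the framework's internal parser):

```typescript
// Illustrative parser for [tool]...[/tool] tags (not the framework's internal code).
interface ParsedToolCall {
  name: string;
  params: Record<string, unknown>;
}

function parseToolTags(text: string): ParsedToolCall[] {
  const calls: ParsedToolCall[] = [];
  const re = /\[tool\]([\s\S]*?)\[\/tool\]/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    // Each tag body is expected to be a JSON object with "name" and "params".
    const parsed = JSON.parse(match[1]);
    calls.push({ name: parsed.name, params: parsed.params ?? {} });
  }
  return calls;
}

const calls = parseToolTags(
  'Checking... [tool]{"name": "get_weather", "params": {"city": "Tokyo"}}[/tool]'
);
console.log(calls[0].name); // get_weather
```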
Quick Start
import { AIO } from "aio";
const aio = new AIO({
providers: [
{
provider: "google-ai",
apiKeys: [{ key: "your-api-key" }],
models: [{ modelId: "gemini-flash-latest" }],
},
],
});
// 1. Define tools
const tools = [
{
name: "get_weather",
description: "Get current weather for a city",
parameters: {
city: {
type: "string",
description: "City name",
required: true,
},
unit: {
type: "string",
description: "Temperature unit",
required: false,
enum: ["celsius", "fahrenheit"],
default: "celsius", // Auto-applied if not provided
},
},
},
];
// 2. Implement tool handler
async function handleToolCall(call) {
console.log(`🔧 Calling: ${call.name}`, call.params);
if (call.name === "get_weather") {
// Your tool logic here
return {
temperature: 22,
condition: "Sunny",
unit: call.params.unit,
};
}
throw new Error(`Unknown tool: ${call.name}`);
}
// 3. Start streaming with tools
const stream = await aio.chatCompletionStream({
provider: "google-ai",
model: "gemini-flash-latest",
messages: [
{ role: "user", content: "What's the weather in Tokyo?" }
],
tools,
onToolCall: handleToolCall,
maxToolIterations: 5, // Default: 5
});
// 4. Process events
stream.on("data", (chunk) => {
const data = JSON.parse(chunk.toString().slice(6));
if (data.tool_call) {
// Tool call event: pending, executing, success, error
console.log("Tool:", data.tool_call.type);
} else if (data.choices[0].delta.content) {
// Text content
process.stdout.write(data.choices[0].delta.content);
}
});
stream.on("end", () => console.log("\n✅ Done!"));Automatic Features
1. Parameter Validation
The framework automatically validates:
- ✅ Required parameters
- ✅ Enum values
- ✅ Unknown parameters
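These checks can be sketched as follows. This is a hypothetical illustration of the behavior, not the framework's actual validator (`validateParams` is an invented name):

```typescript
// Hypothetical sketch of parameter validation: required, enum, and unknown checks.
interface ParamSpec {
  type: string;
  required?: boolean;
  enum?: string[];
}

function validateParams(
  specs: Record<string, ParamSpec>,
  params: Record<string, unknown>
): string[] {
  const errors: string[] = [];
  for (const [name, spec] of Object.entries(specs)) {
    if (spec.required && !(name in params)) {
      errors.push(`Missing required parameter: ${name}`);
    }
    if (name in params && spec.enum && !spec.enum.includes(String(params[name]))) {
      errors.push(`Invalid value for ${name}. Must be one of: ${spec.enum.join(", ")}`);
    }
  }
  for (const name of Object.keys(params)) {
    if (!(name in specs)) errors.push(`Unknown parameter: ${name}`);
  }
  return errors;
}

const errs = validateParams(
  { unit: { type: "string", enum: ["C", "F"], required: true } },
  { unit: "Kelvin" }
);
console.log(errs[0]); // Invalid value for unit. Must be one of: C, F
```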
// Tool definition
{
name: "set_temperature",
parameters: {
value: { type: "number", required: true },
unit: { type: "string", enum: ["C", "F"], required: true }
}
}
// AI calls with invalid enum
[tool]{"name": "set_temperature", "params": {"value": 25, "unit": "Kelvin"}}[/tool]
// Framework returns error
[tool_result]
Tool: set_temperature
Success: false
Error: Invalid value for unit. Must be one of: C, F
Suggestion: Check the tool definition and provide all required parameters.
[/tool_result]
2. Default Values
{
parameters: {
limit: { type: "number", default: 10 },
unit: { type: "string", enum: ["celsius", "fahrenheit"], default: "celsius" }
}
}
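Applying defaults can be pictured as a simple merge over the declared parameters. The helper below is an illustrative sketch under that assumption (`applyDefaults` is a hypothetical name, not a framework API):

```typescript
// Illustrative sketch: fill in declared defaults for parameters the AI omitted.
interface ParamWithDefault {
  type: string;
  default?: unknown;
}

function applyDefaults(
  specs: Record<string, ParamWithDefault>,
  params: Record<string, unknown>
): Record<string, unknown> {
  const out = { ...params };
  for (const [name, spec] of Object.entries(specs)) {
    // Only fill in a value when the parameter is absent and a default exists.
    if (!(name in out) && spec.default !== undefined) out[name] = spec.default;
  }
  return out;
}

console.log(applyDefaults({ limit: { type: "number", default: 10 } }, { query: "test" }));
// limit is filled in with its default of 10; query is left untouched
```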
// AI calls without defaults
{"name": "search", "params": {"query": "test"}}
// Framework applies automatically
{"name": "search", "params": {"query": "test", "limit": 10}}3. Retry Logic
The framework automatically retries up to 3 times with exponential backoff:
async function handleToolCall(call) {
// Simulate transient error
if (Math.random() < 0.5) {
throw new Error("Temporary network error");
}
return { success: true };
}
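As an illustrative sketch (not the framework's actual code), the documented delay schedule can be computed like this:

```typescript
// Sketch of the documented backoff schedule (hypothetical helper, not AIO's implementation):
// attempt 1 runs immediately; later attempts wait 1s, 2s, 4s..., capped at 5s.
function backoffDelayMs(attempt: number): number {
  if (attempt <= 1) return 0;
  return Math.min(1000 * 2 ** (attempt - 2), 5000);
}

console.log(backoffDelayMs(2)); // 1000
console.log(backoffDelayMs(4)); // 4000
console.log(backoffDelayMs(5)); // 5000 (capped)
```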
// Framework retries:
// Attempt 1: Immediate
// Attempt 2: Wait 1s
// Attempt 3: Wait 2s
// Attempt 4: Wait 4s (max 5s)
4. Execution Metadata
[tool_result]
Tool: get_weather
Success: true
Data: {"temperature": 22, "condition": "Sunny"}
Execution Time: 1234ms
Retries: 1
[/tool_result]
Multi-Step Tool Chaining
The AI automatically chains tools to complete complex tasks:
const tools = [
{
name: "search_docs",
description: "Search documentation",
parameters: {
query: { type: "string", required: true }
}
},
{
name: "read_file",
description: "Read file content",
parameters: {
path: { type: "string", required: true }
}
}
];
// User: "Find and read the authentication guide"
// AI automatically:
// 1. Calls search_docs → Gets file path
// 2. Calls read_file → Gets content
// 3. Answers the question with the content
Tool Call Events
The framework emits SSE events for each tool call:
// 1. Tool Call Pending
{
"tool_call": {
"type": "pending"
}
}
// 2. Tool Call Executing
{
"tool_call": {
"type": "executing",
"call": {
"name": "get_weather",
"params": {"city": "Tokyo", "unit": "celsius"}
}
}
}
// 3. Tool Call Success
{
"tool_call": {
"type": "success",
"call": {...},
"result": {
"temperature": 22,
"condition": "Sunny"
}
}
}
// 4. Tool Call Error
{
"tool_call": {
"type": "error",
"call": {...},
"error": "Weather API temporarily unavailable"
}
}
Advanced Tool Definition
{
name: "search_database",
description: "Search database with filters",
parameters: {
query: {
type: "string",
description: "Search query",
required: true,
},
limit: {
type: "number",
description: "Max results",
required: false,
default: 10, // Auto-applied
},
sort_by: {
type: "string",
description: "Sort field",
required: false,
enum: ["date", "relevance", "popularity"], // Validated
default: "relevance",
},
filters: {
type: "object",
description: "Additional filters",
required: false,
},
},
requireReasoning: true, // Force AI to explain why calling this tool
}
Configuration
const stream = await aio.chatCompletionStream({
messages: [...],
tools: [...],
onToolCall: handleToolCall,
maxToolIterations: 10, // Default: 5 (max tool call loops)
signal: abortController.signal, // Cancel anytime
});
Best Practices
- Force Reasoning - Require explanation parameter:
{
name: "delete_file",
parameters: {
path: { type: "string", required: true },
reasoning: {
type: "string",
description: "Explain why you need to delete this file",
required: true
}
}
}
- Clear Descriptions - Be specific:
// ✅ Good
description: "Search codebase for function definitions matching the query"
// ❌ Bad
description: "Search stuff"- Use Enums - Prevent invalid values:
{
sort_by: {
type: "string",
enum: ["date", "relevance", "popularity"],
default: "relevance"
}
}
- Provide Suggestions - Help the AI recover from errors:
async function handleToolCall(call) {
if (call.name === "read_file") {
if (!fs.existsSync(call.params.path)) {
throw new Error(
`File not found: ${call.params.path}. ` +
`Did you mean: ${suggestSimilarFiles(call.params.path).join(", ")}?`
);
}
}
}
Documentation
- 📖 Tool Calling User Guide - Detailed usage guide
- 🏗️ Tool Calling Architecture - Architecture comparison with Cursor and OpenAI
- 📝 Tool Calling History - How AI remembers tool calls and results
- 💡 Improvements Summary - What's new and why
Examples
- examples/tool-test-simple.ts - Basic tool calling
- examples/tool-calling.ts - Complex multi-tool example
- examples/tool-test-validation.ts - Validation & retry example
- examples/tool-test-history.ts - History management demonstration
Comparison with Native Function Calling
| Feature | AIO Text-based | OpenAI Function Calling |
|---------|----------------|-------------------------|
| Provider Support | ✅ Any LLM | ❌ OpenAI, Anthropic only |
| Streaming | ✅ Yes (streaming only) | ✅ Yes |
| Validation | ✅ Built-in | ✅ JSON Schema |
| Retry | ✅ Automatic (3x) | ❌ Manual |
| Metadata | ✅ Execution time, retry count | ❌ No |
| Default Values | ✅ Automatic | ❌ Manual |
| Format | Text tags | Native API |
📚 API Reference
AIO Class
Constructor
new AIO(config: AIOConfig)
Methods
chatCompletion(request: ChatCompletionRequest): Promise<ChatCompletionResponse>
chatCompletionStream(request: ChatCompletionRequest): AsyncGenerator<StreamChunk>
validateApiKey(provider: Provider, apiKey: string): Promise<boolean>
Types
AIOConfig
interface AIOConfig {
providers: ProviderConfig[];
autoMode?: boolean; // Default: false
maxRetries?: number; // Default: 3
retryDelay?: number; // Default: 1000ms
}
ProviderConfig
interface ProviderConfig {
provider: Provider; // "openrouter" | "groq" | "cerebras" | "google-ai" | "nvidia"
apiKeys: ApiKey[];
models: ModelConfig[];
priority?: number; // Default: 0 (higher = preferred)
isActive?: boolean; // Default: true
}
ApiKey
interface ApiKey {
key: string;
priority?: number; // Default: 0
isActive?: boolean; // Default: true
dailyLimit?: number;
requestsToday?: number;
}
ModelConfig
interface ModelConfig {
modelId: string;
priority?: number; // Default: 0
isActive?: boolean; // Default: true
}
ChatCompletionRequest
interface ChatCompletionRequest {
messages: Message[];
temperature?: number;
maxTokens?: number;
// Direct mode
provider?: Provider;
modelId?: string;
}
🎯 Supported Providers
| Provider | Base URL | Models |
|----------|----------|--------|
| OpenRouter | https://openrouter.ai/api/v1 | 30+ free models |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b, llama-3.1-8b, etc. |
| Cerebras | https://api.cerebras.ai/v1 | llama3.1-8b, llama3.1-70b |
| Google AI | https://generativelanguage.googleapis.com | gemini-1.5-flash, gemini-1.5-pro |
| Nvidia | https://integrate.api.nvidia.com/v1 | moonshotai/kimi-k2.5 (FREE) |
📖 Examples
See more examples in the examples/ directory:
- basic.ts - Basic usage with direct mode
- auto-mode.ts - Auto mode with fallback
- priority.ts - Priority management
- streaming.ts - Streaming responses
Run the examples:
npm run example:basic
npm run example:auto
npm run example:priority
🛠️ Development
# Install dependencies
npm install
# Build
npm run build
# Run examples
npm run dev
📄 License
MIT
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
🛑 Abort/Cancel Requests
Cancel Non-Streaming Request
const controller = new AbortController();
// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);
try {
const response = await aio.chatCompletion({
provider: "openrouter",
model: "openrouter/pony-alpha",
messages: [{ role: "user", content: "Long task..." }],
signal: controller.signal, // Pass abort signal
});
} catch (error) {
if (error.message.includes("cancel")) {
console.log("Request was cancelled");
}
}
Cancel Streaming Request
const controller = new AbortController();
const stream = await aio.chatCompletionStream({
provider: "openrouter",
model: "openrouter/pony-alpha",
messages: [{ role: "user", content: "Count to 100" }],
signal: controller.signal,
});
let chunks = 0;
for await (const chunk of stream) {
chunks++;
if (chunks >= 10) {
controller.abort(); // Cancel after 10 chunks
break;
}
}
Pre-cancelled Request
const controller = new AbortController();
controller.abort(); // Cancel before calling
try {
await aio.chatCompletion({
provider: "openrouter",
model: "openrouter/pony-alpha",
messages: [{ role: "user", content: "Test" }],
signal: controller.signal,
});
} catch (error) {
console.log("Request was pre-cancelled");
}
📊 Key Statistics
// Get key stats for a provider
const stats = aio.getKeyStats("openrouter");
console.log(stats);
// {
// total: 3,
// active: 2,
// disabled: 1,
// totalUsage: 150,
// totalErrors: 5
// }
// Reset daily counters (call this daily)
aio.resetDailyCounters();
// Get config summary
const summary = aio.getConfigSummary();
console.log(summary);
// {
// providers: 2,
// totalKeys: 5,
// totalModels: 8,
// autoMode: true,
// maxRetries: 3
// }
🔧 Configuration Options
interface AIOConfig {
providers: ProviderConfig[];
autoMode?: boolean; // Default: false
maxRetries?: number; // Default: 3
retryDelay?: number; // Default: 1000ms
enableLogging?: boolean; // Default: true
enableValidation?: boolean; // Default: true
}
interface ApiKey {
key: string;
priority?: number; // Higher = preferred (default: 0)
isActive?: boolean; // Default: true
dailyLimit?: number; // Max requests per day
requestsToday?: number; // Current usage
errorCount?: number; // Consecutive errors
lastError?: string; // Last error message
lastUsed?: Date; // Last usage timestamp
}
🎯 Error Classification
The framework automatically classifies errors:
- rate_limit: Rate limit exceeded (retryable, rotate key)
- auth: Authentication failed (not retryable, rotate key)
- invalid_request: Bad request (not retryable, don't rotate)
- server: Server error 5xx (retryable, don't rotate)
- network: Network timeout (retryable, don't rotate)
- unknown: Unknown error
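A rough sketch of how such a classification could be derived from an HTTP status code follows. This is hypothetical; the real logic behind AIOError.classify is internal to the framework (`classifyByStatus` is an invented helper):

```typescript
// Hypothetical sketch of error classification by HTTP status (not AIO's actual logic).
interface ErrorInfo {
  category: string;
  isRetryable: boolean;
  shouldRotateKey: boolean;
}

function classifyByStatus(status: number): ErrorInfo {
  if (status === 429)
    return { category: "rate_limit", isRetryable: true, shouldRotateKey: true };
  if (status === 401 || status === 403)
    return { category: "auth", isRetryable: false, shouldRotateKey: true };
  if (status >= 500)
    return { category: "server", isRetryable: true, shouldRotateKey: false };
  if (status >= 400)
    return { category: "invalid_request", isRetryable: false, shouldRotateKey: false };
  return { category: "unknown", isRetryable: false, shouldRotateKey: false };
}

console.log(classifyByStatus(429).category); // rate_limit
```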
const errorInfo = AIOError.classify(error);
console.log(errorInfo);
// {
// isRetryable: true,
// shouldRotateKey: true,
// category: "rate_limit"
// }
📁 Project Structure
aio-framework/
├── src/
│ ├── aio.ts # Main AIO class (284 lines)
│ ├── types.ts # TypeScript types
│ ├── index.ts # Public exports
│ ├── core/ # Core logic modules
│ │ ├── auto-mode.ts # Auto fallback logic
│ │ ├── direct-mode.ts # Direct mode with retry
│ │ └── stream-handler.ts # Streaming logic
│ ├── providers/ # Provider implementations
│ │ ├── base.ts
│ │ ├── openrouter.ts
│ │ ├── groq.ts
│ │ ├── cerebras.ts
│ │ └── google-ai.ts
│ └── utils/ # Utilities
│ ├── logger.ts # Winston logger
│ ├── retry.ts # Retry logic
│ ├── validation.ts # Zod schemas
│ ├── key-manager.ts # Key management
│ └── abort-manager.ts # Abort controller manager
└── examples/
├── basic.ts
├── streaming.ts
├── auto-mode.ts
├── priority.ts
├── test-simple.ts
├── test-new-features.ts
└── test-abort-simple.ts
🧪 Testing
# Simple test
npm run build
npx tsx examples/test-simple.ts
# Test all new features
npx tsx examples/test-new-features.ts
# Test abort functionality
npx tsx examples/test-abort-simple.ts
