@aid-on/unillm
v0.5.1
Edge-native unified LLM provider - pure fetch API, minimal dependencies (zod), WebStreams, memory optimization for Cloudflare Workers and edge computing environments
unillm is a unified LLM interface for edge computing. It provides a consistent, type-safe API across multiple LLM providers, with minimal dependencies and memory usage tuned for edge runtimes.
Features
- 🚀 Edge-First: ~50KB bundle size, ~10ms cold start, optimized for edge runtimes
- 🔄 Unified Interface: Single API for Anthropic, OpenAI, Groq, Gemini, Cloudflare, and more
- 🌊 Streaming Native: Built on Web Streams API with nagare integration
- 🎯 Type-Safe: Full TypeScript support with Zod schema validation
- 📦 Minimal Dependencies: Only Zod (~11KB) required
- ⚡ Memory Optimized: Automatic chunking and backpressure handling
Installation
npm install @aid-on/unillm
yarn add @aid-on/unillm
pnpm add @aid-on/unillm
Quick Start
import { unillm } from "@aid-on/unillm";
// Fluent API with type safety
const response = await unillm()
.model("openai:gpt-4o-mini")
.credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
.temperature(0.7)
.generate("Explain quantum computing in simple terms");
console.log(response.text);
Streaming with nagare
unillm returns an @aid-on/nagare Stream<T> for reactive stream processing:
import { unillm } from "@aid-on/unillm";
import type { Stream } from "@aid-on/nagare";
const stream: Stream<string> = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.stream("Write a story about AI");
// Use nagare's reactive operators
const enhanced = stream
.map(chunk => chunk.trim())
.filter(chunk => chunk.length > 0)
.throttle(16) // ~60fps for UI updates
.tap(chunk => console.log(chunk))
  .toSSE(); // Convert to Server-Sent Events
Structured Output
Generate type-safe structured data with Zod schemas:
import { z } from "zod";
const PersonSchema = z.object({
name: z.string(),
age: z.number(),
skills: z.array(z.string())
});
const result = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: "..." })
.schema(PersonSchema)
.generate("Generate a software engineer profile");
// Type-safe access
console.log(result.object.name); // string
console.log(result.object.skills); // string[]
Provider Shortcuts
Ultra-concise syntax for common models:
import { anthropic, openai, groq, gemini, cloudflare } from "@aid-on/unillm";
// One-liners for quick prototyping
await anthropic.sonnet("sk-ant-...").generate("Hello");
await openai.mini("sk-...").generate("Hello");
await groq.instant("gsk_...").generate("Hello");
await gemini.flash("AIza...").generate("Hello");
await cloudflare.llama({ accountId: "...", apiToken: "..." }).generate("Hello");
Supported Models (45 Models)
Anthropic (8 models) - v0.4.0
- anthropic:claude-opus-4-5-20251101 - Claude Opus 4.5 (Most Intelligent)
- anthropic:claude-haiku-4-5-20251001 - Claude Haiku 4.5 (Ultra Fast)
- anthropic:claude-sonnet-4-5-20250929 - Claude Sonnet 4.5 (Best for Coding)
- anthropic:claude-opus-4-1-20250805 - Claude Opus 4.1
- anthropic:claude-opus-4-20250514 - Claude Opus 4
- anthropic:claude-sonnet-4-20250514 - Claude Sonnet 4
- anthropic:claude-3-5-haiku-20241022 - Claude 3.5 Haiku
- anthropic:claude-3-haiku-20240307 - Claude 3 Haiku
OpenAI (9 models)
- openai:gpt-4o - GPT-4o (Latest, fastest GPT-4)
- openai:gpt-4o-mini - GPT-4o Mini (Cost-effective)
- openai:gpt-4o-2024-11-20 - GPT-4o November snapshot
- openai:gpt-4o-2024-08-06 - GPT-4o August snapshot
- openai:gpt-4-turbo - GPT-4 Turbo (High capability)
- openai:gpt-4-turbo-preview - GPT-4 Turbo Preview
- openai:gpt-4 - GPT-4 (Original)
- openai:gpt-3.5-turbo - GPT-3.5 Turbo (Fast & cheap)
- openai:gpt-3.5-turbo-0125 - GPT-3.5 Turbo Latest
Groq (7 models)
- groq:llama-3.3-70b-versatile - Llama 3.3 70B Versatile
- groq:llama-3.1-8b-instant - Llama 3.1 8B Instant
- groq:meta-llama/llama-guard-4-12b - Llama Guard 4 12B
- groq:openai/gpt-oss-120b - GPT-OSS 120B
- groq:openai/gpt-oss-20b - GPT-OSS 20B
- groq:groq/compound - Groq Compound
- groq:groq/compound-mini - Groq Compound Mini
Google Gemini (8 models)
- gemini:gemini-3-pro-preview - Gemini 3 Pro Preview
- gemini:gemini-3-flash-preview - Gemini 3 Flash Preview
- gemini:gemini-2.5-pro - Gemini 2.5 Pro
- gemini:gemini-2.5-flash - Gemini 2.5 Flash
- gemini:gemini-2.0-flash - Gemini 2.0 Flash
- gemini:gemini-2.0-flash-lite - Gemini 2.0 Flash Lite
- gemini:gemini-1.5-pro-002 - Gemini 1.5 Pro 002
- gemini:gemini-1.5-flash-002 - Gemini 1.5 Flash 002
Cloudflare Workers AI (13 models)
- cloudflare:@cf/meta/llama-4-scout-17b-16e-instruct - Llama 4 Scout
- cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast - Llama 3.3 70B FP8
- cloudflare:@cf/meta/llama-3.1-70b-instruct - Llama 3.1 70B
- cloudflare:@cf/meta/llama-3.1-8b-instruct-fast - Llama 3.1 8B Fast
- cloudflare:@cf/meta/llama-3.1-8b-instruct - Llama 3.1 8B
- cloudflare:@cf/openai/gpt-oss-120b - GPT-OSS 120B
- cloudflare:@cf/openai/gpt-oss-20b - GPT-OSS 20B
- cloudflare:@cf/ibm/granite-4.0-h-micro - IBM Granite 4.0
- cloudflare:@cf/mistralai/mistral-small-3.1-24b-instruct - Mistral Small 3.1
- cloudflare:@cf/mistralai/mistral-7b-instruct-v0.2 - Mistral 7B
- cloudflare:@cf/google/gemma-3-12b-it - Gemma 3 12B
- cloudflare:@cf/qwen/qwq-32b - QwQ 32B
- cloudflare:@cf/qwen/qwen2.5-coder-32b-instruct - Qwen 2.5 Coder
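Because the builder accepts any of these IDs with the same call shape, switching providers is a one-line change. A minimal sketch (the API keys are placeholders, and the geminiApiKey credential field is assumed by analogy with groqApiKey/openaiApiKey rather than confirmed above):
import { unillm } from "@aid-on/unillm";
// Same builder, different provider: only the model ID and credential field change.
const fromGroq = await unillm()
  .model("groq:llama-3.1-8b-instant")
  .credentials({ groqApiKey: "gsk_..." })
  .generate("Summarize edge computing in one sentence");
const fromGemini = await unillm()
  .model("gemini:gemini-2.5-flash")
  .credentials({ geminiApiKey: "AIza..." })
  .generate("Summarize edge computing in one sentence");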
Advanced Usage
Fluent Builder Pattern
const builder = unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.temperature(0.7)
.maxTokens(1000)
.topP(0.9)
.system("You are a helpful assistant")
.messages([
{ role: "user", content: "Previous question..." },
{ role: "assistant", content: "Previous answer..." }
]);
// Reusable configuration
const response1 = await builder.generate("New question");
const response2 = await builder.stream("Another question");
Memory Optimization
Automatic memory management for edge environments:
import { createMemoryOptimizedStream } from "@aid-on/unillm";
const stream = await createMemoryOptimizedStream(
largeResponse, // an existing upstream stream to wrap
{
maxMemory: 1024 * 1024, // 1MB limit
chunkSize: 512 // Optimal chunk size
}
);
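The optimized stream is consumed like any other stream in this README; a minimal sketch, assuming it supports async iteration the same way the provider streams do (run inside an async function):
// Read the optimized stream chunk by chunk; buffering stays within maxMemory.
for await (const chunk of stream) {
  console.log(chunk); // handle each bounded chunk as it arrives
}
Error Handling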
import { UnillmError, RateLimitError } from "@aid-on/unillm";
try {
const response = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.generate("Hello");
} catch (error) {
if (error instanceof RateLimitError) {
console.log(`Rate limited. Retry after ${error.retryAfter}ms`);
} else if (error instanceof UnillmError) {
console.log(`LLM error: ${error.message}`);
}
}
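Because RateLimitError exposes retryAfter in milliseconds, a small retry wrapper falls out naturally. A sketch (generateWithRetry is an illustrative helper, not a library export):
import { unillm, RateLimitError } from "@aid-on/unillm";
// Illustrative helper: retry once after the provider-suggested delay.
async function generateWithRetry(prompt: string): Promise<string> {
  const call = () =>
    unillm()
      .model("groq:llama-3.3-70b-versatile")
      .credentials({ groqApiKey: "..." })
      .generate(prompt);
  try {
    return (await call()).text;
  } catch (error) {
    if (error instanceof RateLimitError) {
      await new Promise(resolve => setTimeout(resolve, error.retryAfter));
      return (await call()).text;
    }
    throw error;
  }
}
Integration Examples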
With React
import { useState } from "react";
import { unillm } from "@aid-on/unillm";
export default function ChatComponent() {
const [response, setResponse] = useState("");
const [loading, setLoading] = useState(false);
const handleGenerate = async () => {
setLoading(true);
const stream = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: import.meta.env.VITE_GROQ_API_KEY })
.stream("Write a haiku");
    try {
      for await (const chunk of stream) {
        setResponse(prev => prev + chunk);
      }
    } finally {
      setLoading(false); // reset even if the stream errors
    }
};
return (
<div>
<button onClick={handleGenerate} disabled={loading}>
{loading ? "Generating..." : "Generate"}
</button>
<p>{response}</p>
</div>
);
}With Cloudflare Workers
import { unillm } from "@aid-on/unillm";
export default {
async fetch(request: Request, env: Env) {
const stream = await unillm()
.model("cloudflare:@cf/meta/llama-3.1-8b-instruct")
.credentials({
accountId: env.CF_ACCOUNT_ID,
apiToken: env.CF_API_TOKEN
})
.stream("Hello from the edge!");
return new Response(stream.toReadableStream(), {
headers: { "Content-Type": "text/event-stream" }
});
}
};
API Reference
unillm() Builder Methods
| Method | Description | Example |
|--------|-------------|---------|
| model(id) | Set the model ID | model("groq:llama-3.3-70b-versatile") |
| credentials(creds) | Set API credentials | credentials({ groqApiKey: "..." }) |
| temperature(n) | Set temperature (0-1) | temperature(0.7) |
| maxTokens(n) | Set max tokens | maxTokens(1000) |
| topP(n) | Set top-p sampling | topP(0.9) |
| schema(zod) | Set output schema | schema(PersonSchema) |
| system(text) | Set system prompt | system("You are...") |
| messages(msgs) | Set message history | messages([...]) |
| generate(prompt) | Generate response | await generate("Hello") |
| stream(prompt) | Stream response | await stream("Hello") |
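Putting several of these methods together, a small multi-turn chat helper might look like the sketch below (chatTurn and the ChatMessage type are illustrative; the message shape follows the messages() example above):
import { unillm } from "@aid-on/unillm";
type ChatMessage = { role: "user" | "assistant"; content: string };
// Illustrative helper: replay prior turns, then stream the next reply.
async function chatTurn(history: ChatMessage[], prompt: string) {
  return unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "gsk_..." })
    .system("You are a helpful assistant")
    .messages(history)
    .stream(prompt);
}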
License
MIT
