@aid-on/unillm
v0.5.1
Edge-native unified LLM provider - pure fetch API, minimal dependencies (zod), WebStreams, memory optimization for Cloudflare Workers and edge computing environments
unillm is a unified LLM interface for edge computing. It provides a consistent, type-safe API across multiple LLM providers, with minimal dependencies and memory usage tuned for edge runtimes.
Features
- 🚀 Edge-First: ~50KB bundle size, ~10ms cold start, optimized for edge runtimes
- 🔄 Unified Interface: Single API for Anthropic, OpenAI, Groq, Gemini, Cloudflare, and more
- 🌊 Streaming Native: Built on Web Streams API with nagare integration
- 🎯 Type-Safe: Full TypeScript support with Zod schema validation
- 📦 Minimal Dependencies: Only Zod (~11KB) required
- ⚡ Memory Optimized: Automatic chunking and backpressure handling
Installation
npm install @aid-on/unillm
yarn add @aid-on/unillm
pnpm add @aid-on/unillm
Quick Start
import { unillm } from "@aid-on/unillm";
// Fluent API with type safety
const response = await unillm()
.model("openai:gpt-4o-mini")
.credentials({ openaiApiKey: process.env.OPENAI_API_KEY })
.temperature(0.7)
.generate("Explain quantum computing in simple terms");
console.log(response.text);
Streaming with nagare
unillm returns an @aid-on/nagare Stream<T> for reactive stream processing:
import { unillm } from "@aid-on/unillm";
import type { Stream } from "@aid-on/nagare";
const stream: Stream<string> = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.stream("Write a story about AI");
// Use nagare's reactive operators
const enhanced = stream
.map(chunk => chunk.trim())
.filter(chunk => chunk.length > 0)
.throttle(16) // ~60fps for UI updates
.tap(chunk => console.log(chunk))
  .toSSE(); // Convert to Server-Sent Events
Structured Output
Generate type-safe structured data with Zod schemas:
import { z } from "zod";
const PersonSchema = z.object({
name: z.string(),
age: z.number(),
skills: z.array(z.string())
});
const result = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: "..." })
.schema(PersonSchema)
.generate("Generate a software engineer profile");
// Type-safe access
console.log(result.object.name); // string
console.log(result.object.skills); // string[]
Provider Shortcuts
Ultra-concise syntax for common models:
import { anthropic, openai, groq, gemini, cloudflare } from "@aid-on/unillm";
// One-liners for quick prototyping
await anthropic.sonnet("sk-ant-...").generate("Hello");
await openai.mini("sk-...").generate("Hello");
await groq.instant("gsk_...").generate("Hello");
await gemini.flash("AIza...").generate("Hello");
await cloudflare.llama({ accountId: "...", apiToken: "..." }).generate("Hello");
Supported Models (45 Models)
Anthropic (8 models) - v0.4.0
- anthropic:claude-opus-4-5-20251101 - Claude Opus 4.5 (Most Intelligent)
- anthropic:claude-haiku-4-5-20251001 - Claude Haiku 4.5 (Ultra Fast)
- anthropic:claude-sonnet-4-5-20250929 - Claude Sonnet 4.5 (Best for Coding)
- anthropic:claude-opus-4-1-20250805 - Claude Opus 4.1
- anthropic:claude-opus-4-20250514 - Claude Opus 4
- anthropic:claude-sonnet-4-20250514 - Claude Sonnet 4
- anthropic:claude-3-5-haiku-20241022 - Claude 3.5 Haiku
- anthropic:claude-3-haiku-20240307 - Claude 3 Haiku
OpenAI (9 models)
- openai:gpt-4o - GPT-4o (Latest, fastest GPT-4)
- openai:gpt-4o-mini - GPT-4o Mini (Cost-effective)
- openai:gpt-4o-2024-11-20 - GPT-4o November snapshot
- openai:gpt-4o-2024-08-06 - GPT-4o August snapshot
- openai:gpt-4-turbo - GPT-4 Turbo (High capability)
- openai:gpt-4-turbo-preview - GPT-4 Turbo Preview
- openai:gpt-4 - GPT-4 (Original)
- openai:gpt-3.5-turbo - GPT-3.5 Turbo (Fast & cheap)
- openai:gpt-3.5-turbo-0125 - GPT-3.5 Turbo Latest
Groq (7 models)
- groq:llama-3.3-70b-versatile - Llama 3.3 70B Versatile
- groq:llama-3.1-8b-instant - Llama 3.1 8B Instant
- groq:meta-llama/llama-guard-4-12b - Llama Guard 4 12B
- groq:openai/gpt-oss-120b - GPT-OSS 120B
- groq:openai/gpt-oss-20b - GPT-OSS 20B
- groq:groq/compound - Groq Compound
- groq:groq/compound-mini - Groq Compound Mini
Google Gemini (8 models)
- gemini:gemini-3-pro-preview - Gemini 3 Pro Preview
- gemini:gemini-3-flash-preview - Gemini 3 Flash Preview
- gemini:gemini-2.5-pro - Gemini 2.5 Pro
- gemini:gemini-2.5-flash - Gemini 2.5 Flash
- gemini:gemini-2.0-flash - Gemini 2.0 Flash
- gemini:gemini-2.0-flash-lite - Gemini 2.0 Flash Lite
- gemini:gemini-1.5-pro-002 - Gemini 1.5 Pro 002
- gemini:gemini-1.5-flash-002 - Gemini 1.5 Flash 002
Cloudflare Workers AI (13 models)
- cloudflare:@cf/meta/llama-4-scout-17b-16e-instruct - Llama 4 Scout
- cloudflare:@cf/meta/llama-3.3-70b-instruct-fp8-fast - Llama 3.3 70B FP8
- cloudflare:@cf/meta/llama-3.1-70b-instruct - Llama 3.1 70B
- cloudflare:@cf/meta/llama-3.1-8b-instruct-fast - Llama 3.1 8B Fast
- cloudflare:@cf/meta/llama-3.1-8b-instruct - Llama 3.1 8B
- cloudflare:@cf/openai/gpt-oss-120b - GPT-OSS 120B
- cloudflare:@cf/openai/gpt-oss-20b - GPT-OSS 20B
- cloudflare:@cf/ibm/granite-4.0-h-micro - IBM Granite 4.0
- cloudflare:@cf/mistralai/mistral-small-3.1-24b-instruct - Mistral Small 3.1
- cloudflare:@cf/mistralai/mistral-7b-instruct-v0.2 - Mistral 7B
- cloudflare:@cf/google/gemma-3-12b-it - Gemma 3 12B
- cloudflare:@cf/qwen/qwq-32b - QwQ 32B
- cloudflare:@cf/qwen/qwen2.5-coder-32b-instruct - Qwen 2.5 Coder
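Because the builder accepts any of these IDs with the same call shape, switching providers is a one-line change. A minimal sketch (the API keys are placeholders, and the geminiApiKey credential field is assumed by analogy with groqApiKey/openaiApiKey rather than confirmed above):
import { unillm } from "@aid-on/unillm";
// Same builder, different provider: only the model ID and credential field change.
const fromGroq = await unillm()
  .model("groq:llama-3.1-8b-instant")
  .credentials({ groqApiKey: "gsk_..." })
  .generate("Summarize edge computing in one sentence");
const fromGemini = await unillm()
  .model("gemini:gemini-2.5-flash")
  .credentials({ geminiApiKey: "AIza..." })
  .generate("Summarize edge computing in one sentence");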
Advanced Usage
Fluent Builder Pattern
const builder = unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.temperature(0.7)
.maxTokens(1000)
.topP(0.9)
.system("You are a helpful assistant")
.messages([
{ role: "user", content: "Previous question..." },
{ role: "assistant", content: "Previous answer..." }
]);
// Reusable configuration
const response1 = await builder.generate("New question");
const response2 = await builder.stream("Another question");
Memory Optimization
Automatic memory management for edge environments:
import { createMemoryOptimizedStream } from "@aid-on/unillm";
const stream = await createMemoryOptimizedStream(
largeResponse, // an existing upstream stream to wrap
{
maxMemory: 1024 * 1024, // 1MB limit
chunkSize: 512 // Optimal chunk size
}
);
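The optimized stream is consumed like any other stream in this README; a minimal sketch, assuming it supports async iteration the same way the provider streams do (run inside an async function):
// Read the optimized stream chunk by chunk; buffering stays within maxMemory.
for await (const chunk of stream) {
  console.log(chunk); // handle each bounded chunk as it arrives
}
Error Handling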
import { UnillmError, RateLimitError } from "@aid-on/unillm";
try {
const response = await unillm()
.model("groq:llama-3.3-70b-versatile")
.credentials({ groqApiKey: "..." })
.generate("Hello");
} catch (error) {
if (error instanceof RateLimitError) {
console.log(`Rate limited. Retry after ${error.retryAfter}ms`);
} else if (error instanceof UnillmError) {
console.log(`LLM error: ${error.message}`);
}
}
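Because RateLimitError exposes retryAfter in milliseconds, a small retry wrapper falls out naturally. A sketch (generateWithRetry is an illustrative helper, not a library export):
import { unillm, RateLimitError } from "@aid-on/unillm";
// Illustrative helper: retry once after the provider-suggested delay.
async function generateWithRetry(prompt: string): Promise<string> {
  const call = () =>
    unillm()
      .model("groq:llama-3.3-70b-versatile")
      .credentials({ groqApiKey: "..." })
      .generate(prompt);
  try {
    return (await call()).text;
  } catch (error) {
    if (error instanceof RateLimitError) {
      await new Promise(resolve => setTimeout(resolve, error.retryAfter));
      return (await call()).text;
    }
    throw error;
  }
}
Integration Examples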
With React
import { useState } from "react";
import { unillm } from "@aid-on/unillm";
export default function ChatComponent() {
const [response, setResponse] = useState("");
const [loading, setLoading] = useState(false);
const handleGenerate = async () => {
setLoading(true);
const stream = await unillm()
.model("groq:llama-3.1-8b-instant")
.credentials({ groqApiKey: import.meta.env.VITE_GROQ_API_KEY })
.stream("Write a haiku");
    try {
      for await (const chunk of stream) {
        setResponse(prev => prev + chunk);
      }
    } finally {
      setLoading(false); // reset even if the stream errors
    }
};
return (
<div>
<button onClick={handleGenerate} disabled={loading}>
{loading ? "Generating..." : "Generate"}
</button>
<p>{response}</p>
</div>
);
}With Cloudflare Workers
import { unillm } from "@aid-on/unillm";
export default {
async fetch(request: Request, env: Env) {
const stream = await unillm()
.model("cloudflare:@cf/meta/llama-3.1-8b-instruct")
.credentials({
accountId: env.CF_ACCOUNT_ID,
apiToken: env.CF_API_TOKEN
})
.stream("Hello from the edge!");
return new Response(stream.toReadableStream(), {
headers: { "Content-Type": "text/event-stream" }
});
}
};
API Reference
unillm() Builder Methods
| Method | Description | Example |
|--------|-------------|---------|
| model(id) | Set the model ID | model("groq:llama-3.3-70b-versatile") |
| credentials(creds) | Set API credentials | credentials({ groqApiKey: "..." }) |
| temperature(n) | Set temperature (0-1) | temperature(0.7) |
| maxTokens(n) | Set max tokens | maxTokens(1000) |
| topP(n) | Set top-p sampling | topP(0.9) |
| schema(zod) | Set output schema | schema(PersonSchema) |
| system(text) | Set system prompt | system("You are...") |
| messages(msgs) | Set message history | messages([...]) |
| generate(prompt) | Generate response | await generate("Hello") |
| stream(prompt) | Stream response | await stream("Hello") |
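Putting several of these methods together, a small multi-turn chat helper might look like the sketch below (chatTurn and the ChatMessage type are illustrative; the message shape follows the messages() example above):
import { unillm } from "@aid-on/unillm";
type ChatMessage = { role: "user" | "assistant"; content: string };
// Illustrative helper: replay prior turns, then stream the next reply.
async function chatTurn(history: ChatMessage[], prompt: string) {
  return unillm()
    .model("groq:llama-3.3-70b-versatile")
    .credentials({ groqApiKey: "gsk_..." })
    .system("You are a helpful assistant")
    .messages(history)
    .stream(prompt);
}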
License
MIT
