universal-llm-client

v4.0.0

A universal LLM client with transparent provider failover, streaming tool execution, pluggable reasoning, and native observability.

Downloads

325

universal-llm-client

A universal LLM client for JavaScript/TypeScript with transparent provider failover, streaming tool execution, pluggable reasoning strategies, and native observability.

import { AIModel } from 'universal-llm-client';

const model = new AIModel({
    model: 'gemini-2.5-flash',
    providers: [
        { type: 'google', apiKey: process.env.GOOGLE_API_KEY },
        { type: 'openai', url: 'https://openrouter.ai/api', apiKey: process.env.OPENROUTER_KEY },
        { type: 'ollama' },
    ],
});

const response = await model.chat([
    { role: 'user', content: 'Hello!' },
]);

One model, multiple backends. If Google fails, the client transparently fails over to OpenRouter, then to local Ollama. Your code never knows the difference.


Features

  • 🔄 Transparent Failover — Priority-ordered provider chain with retries, health tracking, and cooldowns
  • 🛠️ Tool Calling — Register tools once, works across all providers. Autonomous multi-turn execution loop
  • 📋 Structured Output — Zod schema validation, JSON Schema support, streaming, and type-safe responses
  • 🌊 Streaming — First-class async generator streaming with pluggable decoder strategies
  • 🧠 Reasoning — Native <think> tag parsing, interleaved reasoning, and model thinking support
  • 🔍 Observability — Built-in auditor interface for logging, cost tracking, and behavioral analysis
  • 🌐 Universal Runtime — Node.js 22+, Bun, Deno, and modern browsers
  • 🤖 MCP Native — Bridge MCP servers to LLM tools with zero glue code
  • 📊 Embeddings — Single and batch embedding generation
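
The health tracking and cooldowns mentioned in the failover bullet can be pictured with a small sketch. This is only an illustration under stated assumptions: `HealthTracker`, its failure threshold, and its cooldown length are hypothetical names and values, not the library's internals. Timestamps are passed in explicitly so the behavior is easy to follow.

```typescript
// Hypothetical health tracker; names and thresholds are illustrative only.
interface HealthState {
    failures: number;
    unhealthyUntil: number; // epoch ms; 0 means healthy
}

class HealthTracker {
    private readonly states = new Map<string, HealthState>();
    private readonly maxFailures: number;
    private readonly cooldownMs: number;

    constructor(maxFailures: number, cooldownMs: number) {
        this.maxFailures = maxFailures;
        this.cooldownMs = cooldownMs;
    }

    // Count a failure; after maxFailures in a row, start a cooldown window.
    recordFailure(id: string, now: number): void {
        const state = this.states.get(id) ?? { failures: 0, unhealthyUntil: 0 };
        state.failures += 1;
        if (state.failures >= this.maxFailures) {
            state.unhealthyUntil = now + this.cooldownMs;
            state.failures = 0;
        }
        this.states.set(id, state);
    }

    // A success clears both the failure count and any cooldown.
    recordSuccess(id: string): void {
        this.states.set(id, { failures: 0, unhealthyUntil: 0 });
    }

    // Unknown providers are treated as healthy.
    isHealthy(id: string, now: number): boolean {
        const state = this.states.get(id);
        return state === undefined || now >= state.unhealthyUntil;
    }
}

const tracker = new HealthTracker(2, 60_000);
tracker.recordFailure('google-0', 1_000);
const afterOneFailure = tracker.isHealthy('google-0', 1_000);  // true
tracker.recordFailure('google-0', 2_000);
const duringCooldown = tracker.isHealthy('google-0', 30_000);  // false
const afterCooldown = tracker.isHealthy('google-0', 62_000);   // true
```

A router built on this idea would skip providers for which `isHealthy` returns false and try them again once the cooldown has elapsed.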

Supported Providers

| Provider | Type | Notes | |---|---|---| | Ollama | ollama | Local or cloud models, NDJSON streaming, model pulling, vision/multimodal | | OpenAI | openai | GPT-4o, o3, etc. Also works with OpenRouter, Groq, LM Studio, vLLM | | Google AI Studio | google | Gemini models, system instructions, multimodal | | Vertex AI | vertex | Same as Google AI but with regional endpoints and Bearer tokens | | LlamaCpp | llamacpp | Local llama.cpp / llama-server instances |


Installation

bun add universal-llm-client
# or
npm install universal-llm-client

Optional: For MCP integration:

bun add @modelcontextprotocol/sdk

Quick Start

Basic Chat

import { AIModel } from 'universal-llm-client';

const model = new AIModel({
    model: 'qwen3:4b',
    providers: [{ type: 'ollama' }],
});

const response = await model.chat([
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
]);

console.log(response.message.content);
// "The capital of France is Paris."

Streaming

for await (const event of model.chatStream([
    { role: 'user', content: 'Write a haiku about code.' },
])) {
    if (event.type === 'text') {
        process.stdout.write(event.content);
    } else if (event.type === 'thinking') {
        // Model reasoning (when supported)
        console.log('[thinking]', event.content);
    }
}
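
When the full answer is needed after streaming, the events can be folded into separate buffers. The sketch below assumes an event shape with only the `text` and `thinking` types used above; `StreamEvent`, `fakeStream`, and `collect` are hypothetical names, and `fakeStream` stands in for a real streaming call so the example runs without a provider.

```typescript
// Hypothetical event shape modeled on the 'text' / 'thinking' events shown above.
type StreamEvent =
    | { type: 'text'; content: string }
    | { type: 'thinking'; content: string };

// Stand-in for a streaming call so the sketch runs without a provider.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
    yield { type: 'thinking', content: 'Count syllables. ' };
    yield { type: 'text', content: 'Silent loops ' };
    yield { type: 'text', content: 'unwind.' };
}

// Drain the stream, keeping answer text and reasoning in separate buffers.
async function collect(
    stream: AsyncIterable<StreamEvent>,
): Promise<{ text: string; thinking: string }> {
    let text = '';
    let thinking = '';
    for await (const event of stream) {
        if (event.type === 'text') {
            text += event.content;
        } else {
            thinking += event.content;
        }
    }
    return { text, thinking };
}
```

Calling `await collect(fakeStream())` yields the concatenated text and reasoning once the generator is exhausted.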

Tool Calling

model.registerTool(
    'get_weather',
    'Get current weather for a location',
    {
        type: 'object',
        properties: {
            city: { type: 'string', description: 'City name' },
        },
        required: ['city'],
    },
    async (args) => {
        const { city } = args as { city: string };
        return { temperature: 22, condition: 'sunny', city };
    },
);

// Autonomous tool execution — the model calls tools and loops until done
const response = await model.chatWithTools([
    { role: 'user', content: "What's the weather in Tokyo?" },
]);

console.log(response.message.content);
// "The weather in Tokyo is 22°C and sunny."
console.log(response.toolTrace);
// [{ name: 'get_weather', args: { city: 'Tokyo' }, result: {...}, duration: 5 }]
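
To make the "calls tools and loops until done" behavior concrete, here is a minimal sketch of such a loop. It is not the library's implementation: `runToolLoop`, the message and trace types, the `maxTurns` limit, and the scripted model are all assumptions made for illustration.

```typescript
// All names here are hypothetical; this is not the library's implementation.
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { content: string; toolCalls: ToolCall[] };
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;
type TraceEntry = { name: string; args: Record<string, unknown>; result: unknown };

async function runToolLoop(
    callModel: (messages: Message[]) => Promise<ModelTurn>,
    tools: Map<string, ToolHandler>,
    messages: Message[],
    maxTurns = 5,
): Promise<{ content: string; toolTrace: TraceEntry[] }> {
    const history = [...messages];
    const toolTrace: TraceEntry[] = [];
    for (let turn = 0; turn < maxTurns; turn++) {
        const reply = await callModel(history);
        // No tool calls means the model has produced its final answer.
        if (reply.toolCalls.length === 0) {
            return { content: reply.content, toolTrace };
        }
        history.push({ role: 'assistant', content: reply.content });
        for (const call of reply.toolCalls) {
            const handler = tools.get(call.name);
            const result = handler
                ? await handler(call.args)
                : { error: `Unknown tool: ${call.name}` };
            toolTrace.push({ name: call.name, args: call.args, result });
            // Feed the tool result back so the next turn can use it.
            history.push({ role: 'tool', content: JSON.stringify(result) });
        }
    }
    throw new Error(`Tool loop did not finish within ${maxTurns} turns`);
}

// Scripted stand-in for a model: requests the tool once, then answers.
const scriptedModel = async (messages: Message[]): Promise<ModelTurn> => {
    const toolReply = messages.find((m) => m.role === 'tool');
    if (!toolReply) {
        return { content: '', toolCalls: [{ name: 'get_weather', args: { city: 'Tokyo' } }] };
    }
    const weather = JSON.parse(toolReply.content) as { temperature: number; condition: string };
    return { content: `Tokyo is ${weather.temperature}°C and ${weather.condition}.`, toolCalls: [] };
};

const tools = new Map<string, ToolHandler>([
    ['get_weather', async (args) => ({ temperature: 22, condition: 'sunny', city: args['city'] })],
]);
```

The turn limit is a design choice in this sketch: it keeps a model that never stops requesting tools from looping forever.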

Provider Failover

const model = new AIModel({
    model: 'gemini-2.5-flash',
    retries: 2,        // retries per provider before failover
    timeout: 30000,    // request timeout in ms
    providers: [
        { type: 'google', apiKey: process.env.GOOGLE_API_KEY, priority: 0 },
        { type: 'openai', url: 'https://openrouter.ai/api', apiKey: process.env.OPENROUTER_KEY, priority: 1 },
        { type: 'ollama', url: 'http://localhost:11434', priority: 2 },
    ],
});

// If Google returns a 500, the client retries twice, then tries OpenRouter.
// If OpenRouter also fails, it falls back to local Ollama.
// Your code sees a single response.
const response = await model.chat([{ role: 'user', content: 'Hello' }]);

// Check provider health at any time
console.log(model.getProviderStatus());
// [{ id: 'google-0', healthy: true }, { id: 'openai-1', healthy: true }, ...]
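
The retry-then-failover flow described in the comments above can be sketched in a few lines. This is a simplified illustration, not the library's router: `Provider`, `chatWithFailover`, and the two stub providers are hypothetical, and the sketch assumes that `retries: 2` means one initial attempt plus two retries per provider.

```typescript
// Hypothetical types: a provider is just an id, a priority, and an async call.
type Provider = {
    id: string;
    priority: number;
    call: (prompt: string) => Promise<string>;
};

async function chatWithFailover(
    providers: Provider[],
    prompt: string,
    retries = 2,
): Promise<{ providerId: string; text: string; attempts: string[] }> {
    const attempts: string[] = [];
    // Lower priority values are tried first.
    const ordered = [...providers].sort((a, b) => a.priority - b.priority);
    let lastError: unknown;
    for (const provider of ordered) {
        // One initial attempt plus `retries` retries before moving on.
        for (let attempt = 0; attempt <= retries; attempt++) {
            attempts.push(provider.id);
            try {
                const text = await provider.call(prompt);
                return { providerId: provider.id, text, attempts };
            } catch (error) {
                lastError = error;
            }
        }
    }
    throw new Error(`All providers failed: ${String(lastError)}`);
}

// Stub providers: the first always fails, the second echoes the prompt.
const flaky: Provider = {
    id: 'google-0',
    priority: 0,
    call: async () => {
        throw new Error('HTTP 500');
    },
};
const backup: Provider = {
    id: 'openai-1',
    priority: 1,
    call: async (prompt) => `echo: ${prompt}`,
};
```

With these stubs, `chatWithFailover([backup, flaky], 'Hello')` tries `google-0` three times and then returns the response from `openai-1`. A production router would typically also add backoff between retries and skip providers that are cooling down.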

Multimodal (Vision)

import { AIModel, multimodalMessage } from 'universal-llm-client';

const model = new AIModel({
    model: 'gemini-2.5-flash',
    providers: [{ type: 'google', apiKey: process.env.GOOGLE_API_KEY }],
});

const response = await model.chat([
    multimodalMessage('What do you see in this image?', [
        'https://example.com/photo.jpg',
    ]),
]);

Embeddings

const embedModel = new AIModel({
    model: 'nomic-embed-text-v2-moe:latest',
    providers: [{ type: 'ollama' }],
});

const vector = await embedModel.embed('Hello world');
// [0.006, 0.026, -0.009, ...]

const vectors = await embedModel.embedArray(['Hello', 'World']);
// [[0.006, ...], [0.012, ...]]
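
Embedding vectors are usually compared with cosine similarity. The helper below is a generic sketch that works on plain `number[]` values such as those returned above; `cosineSimilarity` is not part of the library.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length === 0 || a.length !== b.length) {
        throw new Error('Vectors must be non-empty and of equal length');
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    a.forEach((x, i) => {
        const y = b[i] ?? 0;
        dot += x * y;
        normA += x * x;
        normB += y * y;
    });
    const denominator = Math.sqrt(normA) * Math.sqrt(normB);
    return denominator === 0 ? 0 : dot / denominator;
}

const same = cosineSimilarity([1, 0], [1, 0]);        // 1
const orthogonal = cosineSimilarity([1, 0], [0, 1]);  // 0
```

Values near 1 indicate vectors pointing in the same direction, which for embeddings usually means semantically similar text.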

Structured Output

Get typed, validated JSON responses from any LLM using Zod schemas:

import { AIModel } from 'universal-llm-client';
import { z } from 'zod';

const model = new AIModel({
    model: 'gemini-2.5-flash',
    providers: [
        { type: 'google', apiKey: process.env.GOOGLE_API_KEY },
        { type: 'ollama' },
    ],
});

// Define your schema
const UserSchema = z.object({
    name: z.string(),
    age: z.number(),
    email: z.string().email(),
    interests: z.array(z.string()),
});

// Method 1: generateStructured (throws on validation failure)
const user = await model.generateStructured(UserSchema, [
    { role: 'user', content: 'Generate a user profile for a software developer' },
]);

console.log(user.name);     // TypeScript knows this is string
console.log(user.age);      // TypeScript knows this is number
console.log(user.email);    // TypeScript knows this is string
console.log(user.interests); // TypeScript knows this is string[]

Non-throwing variant:

// Method 2: tryParseStructured (returns result object, never throws)
// `messages` is a chat message array, like the one passed to generateStructured above
const result = await model.tryParseStructured(UserSchema, messages);

if (result.ok) {
    console.log('User:', result.value.name);
} else {
    console.log('Error:', result.error.message);
    console.log('Raw LLM output:', result.rawOutput);
}
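
The non-throwing variant follows the familiar result-object pattern. The sketch below shows that pattern in isolation, with a hand-written type guard standing in for a Zod schema; `ParseResult`, `tryParseJson`, and `isUser` are hypothetical names that only mirror the `ok`, `value`, `error`, and `rawOutput` fields used above.

```typescript
// Hypothetical result shape mirroring the ok / value / error / rawOutput fields above.
type ParseResult<T> =
    | { ok: true; value: T }
    | { ok: false; error: Error; rawOutput: string };

// Parse and validate without throwing; failures carry the raw text for debugging.
function tryParseJson<T>(
    raw: string,
    validate: (data: unknown) => data is T,
): ParseResult<T> {
    try {
        const data: unknown = JSON.parse(raw);
        if (!validate(data)) {
            return { ok: false, error: new Error('Schema validation failed'), rawOutput: raw };
        }
        return { ok: true, value: data };
    } catch (error) {
        const wrapped = error instanceof Error ? error : new Error(String(error));
        return { ok: false, error: wrapped, rawOutput: raw };
    }
}

type User = { name: string; age: number };

// Hand-written guard standing in for a schema library.
const isUser = (data: unknown): data is User =>
    typeof data === 'object' &&
    data !== null &&
    typeof (data as Record<string, unknown>)['name'] === 'string' &&
    typeof (data as Record<string, unknown>)['age'] === 'number';

const good = tryParseJson('{"name":"Alice","age":30}', isUser);
const bad = tryParseJson('{"name":"Alice"}', isUser);
const broken = tryParseJson('not json', isUser);
```

Because the union is discriminated on `ok`, TypeScript only allows access to `value` after the success check, which is what makes this style convenient in calling code.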

Via chat options:

// Method 3: chat with output parameter
const response = await model.chat(messages, {
    output: { schema: UserSchema },
});

// response.structured is typed as { name: string, age: number, ... }
if (response.structured) {
    console.log(response.structured.name);
}

Streaming structured output:

// Stream partial validated objects as JSON generates
for await (const partial of model.generateStructuredStream(UserSchema, messages)) {
    console.log('Partial:', partial);
    // Partial: { name: 'Alice' }
    // Partial: { name: 'Alice', age: 30 }
    // Partial: { name: 'Alice', age: 30, email: 'alice@example.com' }
}

Raw JSON Schema (without Zod):

const response = await model.chat(messages, {
    jsonSchema: {
        type: 'object',
        properties: {
            name: { type: 'string' },
            age: { type: 'number' },
        },
        required: ['name', 'age'],
    },
    name: 'Person',  // Optional, used for LLM guidance
});

Separate module import (tree-shaking):

// Import only structured output types if you don't need the full client
import {
    StructuredOutputError,
    type StructuredOutputResult,
    type StructuredOutputOptions,
    parseStructured,
    tryParseStructured,
    zodToJsonSchema,
} from 'universal-llm-client/structured-output';

Vision with structured output:

const ImageAnalysisSchema = z.object({
    objects: z.array(z.string()),
    scene: z.string(),
    mood: z.string(),
});

const response = await model.generateStructured(ImageAnalysisSchema, [
    multimodalMessage('Analyze this image', ['https://example.com/photo.jpg']),
]);

Provider compatibility:

| Provider | Method | Notes |
|----------|--------|-------|
| OpenAI | response_format.json_schema | Strict mode enabled |
| Ollama | format: { schema } | Model must support grammar |
| Google | responseMimeType + responseSchema | Some features stripped |

Observability

import { AIModel, ConsoleAuditor, BufferedAuditor } from 'universal-llm-client';

// Simple console logging
const model = new AIModel({
    model: 'qwen3:4b',
    providers: [{ type: 'ollama' }],
    auditor: new ConsoleAuditor('[LLM]'),
});
// [LLM] REQUEST [ollama] (qwen3:4b) →
// [LLM] RESPONSE [ollama] (qwen3:4b) 1200ms 68 tokens

// Buffered for custom sinks (OpenTelemetry, DB, etc.)
const auditor = new BufferedAuditor({
    maxBufferSize: 100,
    onFlush: async (events) => {
        // sendToOpenTelemetry is a placeholder for your own exporter
        await sendToOpenTelemetry(events);
    },
});
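
The buffering idea can be illustrated with a small stand-alone class. This is a sketch, not the library's `BufferedAuditor`: `SimpleBufferedAuditor`, the `AuditEvent` fields, and the flush-when-full policy are assumptions; only the `maxBufferSize` and `onFlush` option names are taken from the example above.

```typescript
// Hypothetical event shape; the real event fields are not shown in this README.
type AuditEvent = { kind: 'request' | 'response'; provider: string; durationMs?: number };

class SimpleBufferedAuditor {
    private buffer: AuditEvent[] = [];
    private readonly maxBufferSize: number;
    private readonly onFlush: (events: AuditEvent[]) => Promise<void>;

    constructor(options: {
        maxBufferSize: number;
        onFlush: (events: AuditEvent[]) => Promise<void>;
    }) {
        this.maxBufferSize = options.maxBufferSize;
        this.onFlush = options.onFlush;
    }

    // Buffer the event and flush in the background once the buffer is full.
    record(event: AuditEvent): void {
        this.buffer.push(event);
        if (this.buffer.length >= this.maxBufferSize) {
            void this.flush();
        }
    }

    async flush(): Promise<void> {
        if (this.buffer.length === 0) return;
        // Swap the buffer first so events recorded during the flush are kept.
        const batch = this.buffer;
        this.buffer = [];
        await this.onFlush(batch);
    }
}

// Records three events with a buffer of two and returns the flushed batch sizes.
async function demo(): Promise<number[]> {
    const batchSizes: number[] = [];
    const auditor = new SimpleBufferedAuditor({
        maxBufferSize: 2,
        onFlush: async (events) => {
            batchSizes.push(events.length);
        },
    });
    auditor.record({ kind: 'request', provider: 'ollama' });
    auditor.record({ kind: 'response', provider: 'ollama', durationMs: 1200 });
    auditor.record({ kind: 'request', provider: 'ollama' });
    await auditor.flush(); // drain whatever is left
    return batchSizes;
}
```

In `demo`, the first two events are flushed together when the buffer fills, and the explicit `flush()` call drains the remaining one. Calling `flush()` before shutdown is what keeps trailing events from being lost.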

MCP Integration

import { AIModel, MCPToolBridge } from 'universal-llm-client';

const model = new AIModel({
    model: 'qwen3:4b',
    providers: [{ type: 'ollama' }],
});

const mcp = new MCPToolBridge({
    servers: {
        filesystem: {
            command: 'npx',
            args: ['-y', '@modelcontextprotocol/server-filesystem', './'],
        },
        weather: {
            url: 'https://mcp.example.com/weather',
        },
    },
});

await mcp.connect();
await mcp.registerTools(model);

// MCP tools are now callable via chatWithTools
const response = await model.chatWithTools([
    { role: 'user', content: 'List files in the current directory' },
]);

await mcp.disconnect();

Stream Decoders

import { AIModel, createDecoder } from 'universal-llm-client';

// Passthrough — raw text, no parsing
// Standard Chat — text + native reasoning + tool calls
// Interleaved Reasoning — parses <think> and <progress> tags from text streams

const decoder = createDecoder('interleaved-reasoning', (event) => {
    switch (event.type) {
        case 'text': console.log(event.content); break;
        case 'thinking': console.log('[think]', event.content); break;
        case 'progress': console.log('[progress]', event.content); break;
        case 'tool_call': console.log('[tool]', event.calls); break;
    }
});

decoder.push('<think>Let me analyze this</think>The answer is 42');
decoder.flush();

console.log(decoder.getCleanContent());  // "The answer is 42"
console.log(decoder.getReasoning());      // "Let me analyze this"
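
In a real stream a tag can be split across chunks (for example `<thi` followed by `nk>`), so a decoder has to hold back text that might be the start of a tag. The sketch below shows one way to do that for `<think>` only. It is an illustration under stated assumptions, not the library's decoder: `ThinkTagDecoder` and `longestTagPrefixSuffix` are hypothetical names.

```typescript
// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function longestTagPrefixSuffix(text: string, tag: string): number {
    const max = Math.min(text.length, tag.length - 1);
    for (let length = max; length > 0; length--) {
        if (text.endsWith(tag.slice(0, length))) return length;
    }
    return 0;
}

// A minimal stand-in for an interleaved-reasoning decoder.
class ThinkTagDecoder {
    private pending = ''; // unprocessed tail that may hold a partial tag
    private inThink = false;
    private text = '';
    private reasoning = '';

    push(chunk: string): void {
        this.pending += chunk;
        // Consume every complete tag currently in the buffer.
        for (;;) {
            const tag = this.inThink ? '</think>' : '<think>';
            const index = this.pending.indexOf(tag);
            if (index === -1) break;
            this.emit(this.pending.slice(0, index));
            this.pending = this.pending.slice(index + tag.length);
            this.inThink = !this.inThink;
        }
        // Hold back a suffix that could be the start of the next tag.
        const nextTag = this.inThink ? '</think>' : '<think>';
        const keep = longestTagPrefixSuffix(this.pending, nextTag);
        const cut = this.pending.length - keep;
        this.emit(this.pending.slice(0, cut));
        this.pending = this.pending.slice(cut);
    }

    // End of stream: whatever is left is ordinary content.
    flush(): void {
        this.emit(this.pending);
        this.pending = '';
    }

    getCleanContent(): string {
        return this.text;
    }

    getReasoning(): string {
        return this.reasoning;
    }

    private emit(content: string): void {
        if (this.inThink) this.reasoning += content;
        else this.text += content;
    }
}

const sketchDecoder = new ThinkTagDecoder();
for (const chunk of ['<thi', 'nk>Let me analyze', ' this</th', 'ink>The answer', ' is 42']) {
    sketchDecoder.push(chunk);
}
sketchDecoder.flush();
const cleanContent = sketchDecoder.getCleanContent(); // "The answer is 42"
const reasoningText = sketchDecoder.getReasoning();   // "Let me analyze this"
```

Holding back only the possible tag prefix, rather than the whole chunk, keeps latency low: ordinary text is released as soon as it cannot be part of a tag.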

API Reference

AIModel

The universal client. One class, multiple backends.

new AIModel(config: AIModelConfig)

Config:

| Property | Type | Default | Description |
|---|---|---|---|
| model | string | — | Model name (e.g., 'gemini-2.5-flash') |
| providers | ProviderConfig[] | — | Ordered list of provider backends |
| retries | number | 2 | Retries per provider before failover |
| timeout | number | 30000 | Request timeout in ms |
| auditor | Auditor | NoopAuditor | Observability sink |
| thinking | boolean | false | Enable model thinking/reasoning |
| debug | boolean | false | Debug logging |
| defaultParameters | object | — | Default parameters for all requests |

Provider Config:

| Property | Type | Description |
|---|---|---|
| type | string | 'ollama', 'openai', 'google', 'vertex', 'llamacpp' |
| url | string | Provider URL (has sensible defaults) |
| apiKey | string | API key or Bearer token |
| priority | number | Lower = tried first (defaults to array index) |
| model | string | Override model name for this provider |
| region | string | Vertex AI region (e.g., 'us-central1') |
| apiVersion | string | API version (e.g., 'v1beta') |
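
The `priority` rule ("lower = tried first, defaults to array index") can be sketched as a sort over the provider list. This is an illustration only: `ProviderEntry` and `orderProviders` are hypothetical, and the `type-index` id format is an assumption based on the ids shown in the `getProviderStatus()` example.

```typescript
type ProviderEntry = { type: string; priority?: number };

// Resolve the order in which providers would be tried.
function orderProviders(providers: ProviderEntry[]): string[] {
    return providers
        .map((provider, index) => ({
            id: `${provider.type}-${index}`,       // assumed id format
            priority: provider.priority ?? index,  // default: position in the array
        }))
        .sort((a, b) => a.priority - b.priority)   // stable: ties keep array order
        .map((entry) => entry.id);
}

const order = orderProviders([
    { type: 'ollama', priority: 2 },
    { type: 'google', priority: 0 },
    { type: 'openai' },
]);
// ['google-1', 'ollama-0', 'openai-2']
```

Here `openai` has no explicit priority, so it takes its array index (2) and ties with `ollama`; the stable sort keeps `ollama` ahead because it appears first in the array.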

Methods:

| Method | Returns | Description |
|---|---|---|
| chat(messages, options?) | Promise<LLMChatResponse> | Send chat request |
| chatWithTools(messages, options?) | Promise<LLMChatResponse> | Chat with autonomous tool execution |
| chatStream(messages, options?) | AsyncGenerator<DecodedEvent> | Stream chat response |
| generateStructured(schema, messages, options?) | Promise<T> | Generate typed JSON validated against Zod schema |
| tryParseStructured(schema, messages, options?) | Promise<StructuredOutputResult<T>> | Non-throwing variant returning result object |
| generateStructuredStream(schema, messages, options?) | AsyncGenerator<T, T> | Stream partial validated objects as JSON generates |
| embed(text) | Promise<number[]> | Generate single embedding |
| embedArray(texts) | Promise<number[][]> | Generate batch embeddings |
| registerTool(name, desc, params, handler) | void | Register a callable tool |
| registerTools(tools) | void | Register multiple tools |
| getModels() | Promise<string[]> | List available models |
| getModelInfo() | Promise<ModelMetadata> | Get model metadata |
| getProviderStatus() | ProviderStatus[] | Check provider health |
| setModel(name) | void | Switch model at runtime |
| dispose() | Promise<void> | Clean shutdown |

Structured Output

import { z } from 'zod';

// Define your schema
const UserSchema = z.object({
    name: z.string(),
    age: z.number(),
    email: z.string().email(),
});

// Generate typed JSON
const user = await model.generateStructured(UserSchema, messages);
// TypeScript infers: { name: string; age: number; email: string }

// Non-throwing variant
const result = await model.tryParseStructured(UserSchema, messages);
if (result.ok) {
    console.log(result.value.name);  // Fully typed
} else {
    console.log(result.error.message);
}

// Stream partial objects
for await (const partial of model.generateStructuredStream(UserSchema, messages)) {
    console.log(partial);  // Partial validated objects
}

Separate module import (tree-shaking):

import {
    StructuredOutputError,
    type StructuredOutputResult,
    parseStructured,
    tryParseStructured,
    zodToJsonSchema,
} from 'universal-llm-client/structured-output';

// Use without importing the full client
const schema = z.object({ name: z.string() });
const jsonSchema = zodToJsonSchema(schema);

ToolBuilder / ToolExecutor

import { ToolBuilder, ToolExecutor } from 'universal-llm-client';

// Fluent builder
const tool = new ToolBuilder('search')
    .description('Search the web')
    .addParameter('query', 'string', 'Search query', true)
    .addParameter('limit', 'number', 'Max results', false)
    .build();

// Execution wrappers
const safeHandler = ToolExecutor.compose(
    myHandler,
    h => ToolExecutor.withTimeout(h, 5000),
    h => ToolExecutor.safe(h),
    h => ToolExecutor.withValidation(h, ['query']),
);
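
Handler wrappers of this kind are ordinary higher-order functions. The sketch below shows how composition could work, assuming wrappers are applied left to right so that the last one listed ends up outermost; that ordering, and the names `compose`, `withValidation`, and `safe` as defined here, are assumptions for illustration and not the library's `ToolExecutor` code.

```typescript
type Handler = (args: Record<string, unknown>) => Promise<unknown>;
type Wrapper = (handler: Handler) => Handler;

// Apply wrappers left to right, so the last wrapper ends up outermost.
function compose(handler: Handler, ...wrappers: Wrapper[]): Handler {
    return wrappers.reduce((wrapped, wrap) => wrap(wrapped), handler);
}

// Reject calls that omit a required argument.
const withValidation = (required: string[]): Wrapper => (handler) => async (args) => {
    const missing = required.filter((key) => !(key in args));
    if (missing.length > 0) {
        throw new Error(`Missing arguments: ${missing.join(', ')}`);
    }
    return handler(args);
};

// Convert thrown errors into a plain { error } result the model can read.
const safe: Wrapper = (handler) => async (args) => {
    try {
        return await handler(args);
    } catch (error) {
        return { error: error instanceof Error ? error.message : String(error) };
    }
};

const search: Handler = async (args) => ({ results: [`hit for ${String(args['query'])}`] });

// `safe` is outermost here, so validation errors are also turned into { error } results.
const safeSearch = compose(search, withValidation(['query']), safe);
```

With this ordering, `await safeSearch({})` resolves to an `{ error }` object instead of rejecting. If the wrappers were listed the other way around, the validation error would escape `safe`, so the order is worth checking against the library's own documentation.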

Auditor Interface

Implement custom observability by providing an Auditor:

interface Auditor {
    record(event: AuditEvent): void;
    flush?(): Promise<void>;
}

Built-in implementations:

  • NoopAuditor — Zero overhead (default)
  • ConsoleAuditor — Structured console logging
  • BufferedAuditor — Collects events for custom sinks
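
As a sketch of a custom implementation, the class below totals tokens per provider. The `Auditor` interface is copied from above, but the fields on `AuditEvent` (`type`, `provider`, `tokens`) are assumptions, since the real event shape is not shown in this README; `TokenCountingAuditor` is a hypothetical name.

```typescript
// Hypothetical event shape; check the package's type definitions for the real fields.
interface AuditEvent {
    type: string;
    provider?: string;
    tokens?: number;
}

interface Auditor {
    record(event: AuditEvent): void;
    flush?(): Promise<void>;
}

// Sums tokens per provider across response events.
class TokenCountingAuditor implements Auditor {
    private readonly totals = new Map<string, number>();

    record(event: AuditEvent): void {
        if (event.type !== 'response' || event.provider === undefined) return;
        const current = this.totals.get(event.provider) ?? 0;
        this.totals.set(event.provider, current + (event.tokens ?? 0));
    }

    totalFor(provider: string): number {
        return this.totals.get(provider) ?? 0;
    }
}

const counter = new TokenCountingAuditor();
counter.record({ type: 'request', provider: 'ollama' });
counter.record({ type: 'response', provider: 'ollama', tokens: 68 });
counter.record({ type: 'response', provider: 'ollama', tokens: 12 });
const ollamaTokens = counter.totalFor('ollama'); // 80
```

Multiplying these totals by a per-provider price table would turn the same auditor into a simple cost tracker.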

Architecture

universal-llm-client
├── AIModel          ← Public API (the only class you import)
├── Router           ← Internal failover engine
├── BaseLLMClient    ← Abstract client with tool execution
├── Providers
│   ├── OllamaClient
│   ├── OpenAICompatibleClient  (OpenAI, OpenRouter, Groq, LM Studio, vLLM, LlamaCpp)
│   └── GoogleClient            (AI Studio + Vertex AI)
├── StreamDecoder    ← Pluggable reasoning strategies
├── Auditor          ← Observability interface
├── MCPToolBridge    ← MCP server integration
└── HTTP Utilities   ← Universal fetch-based transport

Design Principles

  1. Single import — AIModel is the only class users need
  2. Provider agnostic — Same code works with any backend
  3. Transparent failover — Health tracking and cooldowns happen behind the scenes
  4. Zero dependencies — Core library depends only on native fetch
  5. Agent-ready — Stateless, composable instances designed as foundation for agent frameworks
  6. Observable — Every request, response, tool call, retry, and failover is auditable

Runtime Support

| Runtime | Version | Status |
|---|---|---|
| Node.js | 22+ | ✅ Full support |
| Bun | 1.0+ | ✅ Full support |
| Deno | 2.0+ | ✅ Full support |
| Browsers | Modern | ✅ No stdio MCP, HTTP transport only |


For Agent Framework Authors

AIModel is designed as the transport layer for agentic systems:

  • Stateless — No conversation history stored. Your framework manages memory
  • Composable — Create separate instances for chat, embeddings, vision
  • Tool tracing — chatWithTools() returns full execution trace
  • Context budget — getModelInfo() exposes contextLength
  • Auditor as system bus — Inject custom sinks for cost tracking, behavioral scoring
  • StreamDecoder as UI bridge — Select decoder strategy per-call
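
Because the client is stateless, the calling framework decides which history to send on each request. A minimal sketch of trimming history to a context budget follows. The names `ChatMessage`, `estimateTokens`, and `trimToBudget` are hypothetical, and the four-characters-per-token estimate is a rough assumption; a real tokenizer would be more accurate.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough token estimate (about 4 characters per token).
const estimateTokens = (message: ChatMessage): number =>
    Math.ceil(message.content.length / 4);

// Keep system messages, then add the most recent turns until the budget is spent.
function trimToBudget(history: ChatMessage[], contextLength: number): ChatMessage[] {
    const system = history.filter((m) => m.role === 'system');
    const turns = history.filter((m) => m.role !== 'system');
    let remaining = contextLength - system.reduce((sum, m) => sum + estimateTokens(m), 0);
    const kept: ChatMessage[] = [];
    for (let i = turns.length - 1; i >= 0; i--) {
        const turn = turns[i];
        if (turn === undefined) continue;
        const cost = estimateTokens(turn);
        if (cost > remaining) break; // older turns no longer fit
        kept.unshift(turn);
        remaining -= cost;
    }
    return [...system, ...kept];
}

const history: ChatMessage[] = [
    { role: 'system', content: 'Be brief.' },       // 3 tokens
    { role: 'user', content: 'a'.repeat(40) },      // 10 tokens
    { role: 'assistant', content: 'b'.repeat(20) }, // 5 tokens
    { role: 'user', content: 'c'.repeat(8) },       // 2 tokens
];
const trimmed = trimToBudget(history, 12);
// Keeps the system message and the two most recent turns.
```

In practice the budget would come from the model metadata (the `contextLength` value mentioned above), minus whatever headroom is reserved for the response.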

License

MIT