# OmniLLM SDK
A unified SDK for accessing multiple LLM providers (Ollama, OpenAI, Anthropic) through a single interface.
## Installation

```bash
npm install @dukanify/omnillm
```

## Usage
```typescript
import { OmniLLM } from '@dukanify/omnillm';

// Initialize with configuration (baseURL is required)
const llm = new OmniLLM({
  baseURL: 'http://localhost:3000',   // Base URL for the proxy (without version) - REQUIRED
  apiVersion: 'v1',                   // API version (defaults to 'v1')
  apiKey: 'ollama',                   // Default API key
  defaultModel: 'llama3.2:3b',        // Default model to use
  provider: 'ollama',                 // Default provider
  providerApiKey: 'your-api-key',     // Provider-specific API key
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo',
    embeddingModel: 'text-embedding-ada-002'
  },
  maxRetries: 3                       // Maximum number of retries
});
```
```typescript
// For production:
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com', // Your deployed proxy URL
  defaultModel: 'llama3.2:3b'
});
```

```typescript
// Chat completion
const response = await llm.chat.completions.create({
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  model: 'llama2',
  temperature: 0.7,
  stream: false,
  tools: [],                      // Optional tools configuration
  tool_choice: 'auto',            // Optional tool choice
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});
```
```typescript
// Text completion (generate)
const completion = await llm.chat.generate({
  prompt: 'Write a short story about a robot:',
  model: 'llama2',
  temperature: 0.7,
  max_tokens: 100,
  stream: false,
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});
```
```typescript
// Text completion with images (vision models)
const completionWithImages = await llm.chat.generate({
  prompt: 'Describe what you see in this image:',
  model: 'llama3.2-vision:11b',
  images: ['base64string_without_prefix'], // Base64 strings without the data:image/...;base64, prefix
  temperature: 0.7,
  max_tokens: 200
});
```

Note: the total image payload is limited to 50MB, and base64 encoding increases size by roughly 33%, so keep images reasonably small.
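If you are working near that limit, you can estimate the encoded payload before sending a request. This is a minimal sketch that assumes the 50MB cap applies to the base64-encoded payload; `estimateBase64Size` is a hypothetical helper, not part of the SDK:

```typescript
import * as fs from 'node:fs';

const MAX_PAYLOAD_BYTES = 50 * 1024 * 1024; // 50MB proxy limit

// Hypothetical helper: base64 output is roughly 4/3 of the raw byte size
function estimateBase64Size(buffers: Buffer[]): number {
  return buffers.reduce((total, buf) => total + Math.ceil((buf.length * 4) / 3), 0);
}

const buffers = [fs.readFileSync('./photo-1.jpg'), fs.readFileSync('./photo-2.jpg')];
if (estimateBase64Size(buffers) > MAX_PAYLOAD_BYTES) {
  throw new Error('Image payload would exceed the 50MB limit; resize or compress the images first');
}
```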
```typescript
// Helper functions for working with images
import * as fs from 'node:fs';
import { imageToBase64, prepareImagesForVision } from '@dukanify/omnillm';

// Convert a Buffer to base64
const imageBuffer = fs.readFileSync('./image.jpg');
const base64Image = imageToBase64(imageBuffer);

// Or prepare multiple images (Buffers or base64 strings) with validation
const images = prepareImagesForVision([
  imageBuffer,
  'existing_base64_string',
  anotherBuffer
]);
```
```typescript
// Embeddings
const embeddings = await llm.embeddings.create({
  input: 'Hello, world!',
  model: 'llama2',
  encoding_format: 'float',       // or 'base64'
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});

// List available models
const models = await llm.listModels();

// Check health status
const health = await llm.healthCheck();

// Cleanup when done (optional - health checks are automatically cleaned up when no instances remain)
llm.destroy();
```
## Features
- Unified interface for multiple LLM providers (Ollama, OpenAI, Anthropic)
- Automatic fallback to OpenAI when the proxy is unavailable
- Support for chat completions, text completions (generate), and embeddings
- Automatic health checks for the proxy server
- Configurable retry mechanism
- TypeScript support
- Environment variable configuration support
## Configuration
The SDK accepts the following configuration options:
```typescript
interface OmniLLMConfig {
  baseURL: string;          // Base URL for the proxy - REQUIRED
  apiVersion?: string;      // API version (default: 'v1')
  apiKey?: string;          // API key for the proxy (default: 'ollama')
  defaultModel?: string;    // Default model to use (default: 'llama3.2:3b')
  provider?: Provider;      // Default provider ('ollama' | 'openai' | 'anthropic')
  providerApiKey?: string;  // Provider-specific API key
  fallback?: {
    provider: 'openai';
    apiKey: string;
    model?: string;
    embeddingModel?: string; // Optional embedding model for fallback
  };
  maxRetries?: number;      // Maximum number of retries (default: 3)
}
```

### Configuration Best Practices
**For Development:**

```typescript
// For local development
const llm = new OmniLLM({
  baseURL: 'http://localhost:3000'
});
```

**For Production:**
```typescript
// Always use a proper domain for production deployments
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  defaultModel: 'llama3.2:3b'
});
```

**With Fallback:**
```typescript
// Configure fallback to OpenAI when Ollama is unavailable
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo'
  }
});
```

### Environment Variables
The SDK does not read environment variables on its own; all configuration is passed explicitly through the constructor for clarity and control.
If you need to use environment variables, you can pass them explicitly:
```typescript
const llm = new OmniLLM({
  baseURL: process.env.OLLAMA_PROXY_URL, // Required
  apiKey: process.env.OLLAMA_PROXY_API_KEY || 'ollama',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY
  }
});
```

## Fallback Behavior
When the proxy server is unavailable, the SDK automatically falls back to the configured fallback provider, with no change needed in your calling code (see the sketch below). This is useful for:
- Development environments where the proxy might not be running
- Production environments where you want a backup provider
- Testing different providers without changing code
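A minimal sketch of what this looks like in practice, reusing the fallback configuration shown above (the call site stays the same whether the proxy or the fallback provider handles the request):

```typescript
import { OmniLLM } from '@dukanify/omnillm';

// Fallback is configured once; individual calls do not change.
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo'
  }
});

// If the proxy is unreachable, this request is served by the fallback provider instead.
const response = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [{ role: 'user', content: 'Hello!' }]
});
```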
## Health Checks
The SDK performs automatic health checks on the proxy server every 30 seconds. Health checks are shared per baseURL - if you create multiple OmniLLM instances with the same baseURL, they will share the same health check interval to prevent resource waste.
You can also manually check the health status using the healthCheck() method, which returns:
```typescript
{
  status: 'healthy' | 'unhealthy',
  details: {
    // ... proxy health details
  }
}
```
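For example, you might gate traffic on the result. This is a minimal sketch; the shape of `details` depends on your proxy deployment:

```typescript
const health = await llm.healthCheck();

if (health.status === 'healthy') {
  // Safe to route requests through the proxy.
} else {
  // Requests will use the fallback provider, if one is configured.
  console.warn('Proxy is unhealthy:', health.details);
}
```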
## Resource Management
### Health Check Lifecycle
The SDK uses a singleton pattern for health checks per `baseURL`:

- **Shared Health Checks**: Multiple instances with the same `baseURL` share one health check interval
- **Automatic Cleanup**: Health checks are automatically stopped when no instances remain for a `baseURL`
- **Manual Cleanup**: You can call `destroy()` on an instance to unsubscribe from health updates
```typescript
// Multiple instances share the same health check
const client1 = new OmniLLM({ baseURL: 'http://localhost:3000' });
const client2 = new OmniLLM({ baseURL: 'http://localhost:3000' });
// Only one health check runs for localhost:3000

// Manual cleanup (optional)
client1.destroy();
client2.destroy();
// Health check stops when both instances are destroyed
```

### Memory Management
- Health check intervals are automatically cleaned up when no instances remain
- No memory leaks from multiple instances
- Each `baseURL` maintains its own health state
## API Types

### Chat Completions vs Text Completions
**Chat Completions** (`llm.chat.completions.create`):
- Uses conversation-style messages with roles (user, assistant, system)
- Better for multi-turn conversations
- Supports tool calling and function calling (see the sketch after this list)
- Example: Customer service chatbots, conversation agents
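Tool calling goes through the `tools` and `tool_choice` parameters shown in the Usage section. The sketch below is illustrative only: it assumes the proxy accepts an OpenAI-compatible tool schema, and `get_weather` is a hypothetical tool, so check your provider's documentation for the exact request and response shapes.

```typescript
// Hypothetical tool definition, assuming an OpenAI-compatible schema
const response = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather', // hypothetical tool name
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string' }
          },
          required: ['city']
        }
      }
    }
  ],
  tool_choice: 'auto'
});
```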
**Text Completions** (`llm.chat.generate`):
- Uses simple text prompts
- Better for single-turn text generation
- More suitable for creative writing, code generation, etc.
- Example: Story writing, code completion, text summarization
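To make the distinction concrete, here is a minimal sketch of both styles side by side (the model name and prompts are just examples):

```typescript
// Multi-turn conversation: chat completions
const chatResponse = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [
    { role: 'system', content: 'You are a helpful support agent.' },
    { role: 'user', content: 'How do I reset my password?' }
  ]
});

// Single-turn generation: text completion
const story = await llm.chat.generate({
  model: 'llama3.2:3b',
  prompt: 'Write a haiku about autumn:',
  max_tokens: 60
});
```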
## Examples

Check the `examples` directory for complete usage examples:

- `full-demo.ts`: Complete demonstration of all features
- `fallback-test.ts`: Testing fallback functionality
## License
MIT
