# OmniLLM SDK
A unified SDK for accessing multiple LLM providers (Ollama, OpenAI, Anthropic) through a single interface.
## Installation

```bash
npm install @dukanify/omnillm
```

## Usage
```typescript
import { OmniLLM } from '@dukanify/omnillm';

// Initialize with configuration (baseURL is required)
const llm = new OmniLLM({
  baseURL: 'http://localhost:3000',   // Base URL for the proxy (without version) - REQUIRED
  apiVersion: 'v1',                   // API version (defaults to 'v1')
  apiKey: 'ollama',                   // Default API key
  defaultModel: 'llama3.2:3b',        // Default model to use
  provider: 'ollama',                 // Default provider
  providerApiKey: 'your-api-key',     // Provider-specific API key
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo',
    embeddingModel: 'text-embedding-ada-002'
  },
  maxRetries: 3                       // Maximum number of retries
});
```
```typescript
// For production:
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com', // Your deployed proxy URL
  defaultModel: 'llama3.2:3b'
});
```

```typescript
// Chat completion
const response = await llm.chat.completions.create({
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  model: 'llama2',
  temperature: 0.7,
  stream: false,
  tools: [],                      // Optional tools configuration
  tool_choice: 'auto',            // Optional tool choice
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});
```
```typescript
// Text completion (generate)
const completion = await llm.chat.generate({
  prompt: 'Write a short story about a robot:',
  model: 'llama2',
  temperature: 0.7,
  max_tokens: 100,
  stream: false,
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});
```
```typescript
// Text completion with images (vision models)
const completionWithImages = await llm.chat.generate({
  prompt: 'Describe what you see in this image:',
  model: 'llama3.2-vision:11b',
  images: ['base64string_without_prefix'], // Base64 strings without the data:image/...;base64, prefix
  temperature: 0.7,
  max_tokens: 200
});
```

Note: the total image payload is limited to 50MB, and base64 encoding increases size by roughly 33%, so keep images reasonably small.
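If you are working near that limit, you can estimate the encoded payload before sending a request. This is a minimal sketch that assumes the 50MB cap applies to the base64-encoded payload; `estimateBase64Size` is a hypothetical helper, not part of the SDK:

```typescript
import * as fs from 'node:fs';

const MAX_PAYLOAD_BYTES = 50 * 1024 * 1024; // 50MB proxy limit

// Hypothetical helper: base64 output is roughly 4/3 of the raw byte size
function estimateBase64Size(buffers: Buffer[]): number {
  return buffers.reduce((total, buf) => total + Math.ceil((buf.length * 4) / 3), 0);
}

const buffers = [fs.readFileSync('./photo-1.jpg'), fs.readFileSync('./photo-2.jpg')];
if (estimateBase64Size(buffers) > MAX_PAYLOAD_BYTES) {
  throw new Error('Image payload would exceed the 50MB limit; resize or compress the images first');
}
```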
```typescript
// Helper functions for working with images
import * as fs from 'node:fs';
import { imageToBase64, prepareImagesForVision } from '@dukanify/omnillm';

// Convert a Buffer to base64
const imageBuffer = fs.readFileSync('./image.jpg');
const base64Image = imageToBase64(imageBuffer);

// Or prepare multiple images (Buffers or base64 strings) with validation
const images = prepareImagesForVision([
  imageBuffer,
  'existing_base64_string',
  anotherBuffer
]);
```
```typescript
// Embeddings
const embeddings = await llm.embeddings.create({
  input: 'Hello, world!',
  model: 'llama2',
  encoding_format: 'float',       // or 'base64'
  provider: 'ollama',             // Optional provider override
  providerApiKey: 'your-api-key'  // Optional provider API key override
});

// List available models
const models = await llm.listModels();

// Check health status
const health = await llm.healthCheck();

// Cleanup when done (optional - health checks are automatically cleaned up when no instances remain)
llm.destroy();
```
## Features
- Unified interface for multiple LLM providers (Ollama, OpenAI, Anthropic)
- Automatic fallback to OpenAI when the proxy is unavailable
- Support for chat completions, text completions (generate), and embeddings
- Automatic health checks for the proxy server
- Configurable retry mechanism
- TypeScript support
- Environment variable configuration support
## Configuration
The SDK accepts the following configuration options:
```typescript
interface OmniLLMConfig {
  baseURL: string;          // Base URL for the proxy - REQUIRED
  apiVersion?: string;      // API version (default: 'v1')
  apiKey?: string;          // API key for the proxy (default: 'ollama')
  defaultModel?: string;    // Default model to use (default: 'llama3.2:3b')
  provider?: Provider;      // Default provider ('ollama' | 'openai' | 'anthropic')
  providerApiKey?: string;  // Provider-specific API key
  fallback?: {
    provider: 'openai';
    apiKey: string;
    model?: string;
    embeddingModel?: string; // Optional embedding model for fallback
  };
  maxRetries?: number;      // Maximum number of retries (default: 3)
}
```

### Configuration Best Practices
**For Development:**

```typescript
// For local development
const llm = new OmniLLM({
  baseURL: 'http://localhost:3000'
});
```

**For Production:**
```typescript
// Always use a proper domain for production deployments
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  defaultModel: 'llama3.2:3b'
});
```

**With Fallback:**
```typescript
// Configure fallback to OpenAI when Ollama is unavailable
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo'
  }
});
```

### Environment Variables
The SDK does not read environment variables on its own; all configuration is passed explicitly through the constructor for clarity and control.
If you need to use environment variables, you can pass them explicitly:
```typescript
const llm = new OmniLLM({
  baseURL: process.env.OLLAMA_PROXY_URL, // Required
  apiKey: process.env.OLLAMA_PROXY_API_KEY || 'ollama',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY
  }
});
```

## Fallback Behavior
When the proxy server is unavailable, the SDK automatically falls back to the configured fallback provider, with no change needed in your calling code (see the sketch below). This is useful for:
- Development environments where the proxy might not be running
- Production environments where you want a backup provider
- Testing different providers without changing code
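A minimal sketch of what this looks like in practice, reusing the fallback configuration shown above (the call site stays the same whether the proxy or the fallback provider handles the request):

```typescript
import { OmniLLM } from '@dukanify/omnillm';

// Fallback is configured once; individual calls do not change.
const llm = new OmniLLM({
  baseURL: 'https://your-ollama-proxy.com',
  fallback: {
    provider: 'openai',
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-3.5-turbo'
  }
});

// If the proxy is unreachable, this request is served by the fallback provider instead.
const response = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [{ role: 'user', content: 'Hello!' }]
});
```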
## Health Checks
The SDK performs automatic health checks on the proxy server every 30 seconds. Health checks are shared per baseURL - if you create multiple OmniLLM instances with the same baseURL, they will share the same health check interval to prevent resource waste.
You can also manually check the health status using the healthCheck() method, which returns:
```typescript
{
  status: 'healthy' | 'unhealthy',
  details: {
    // ... proxy health details
  }
}
```
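For example, you might gate traffic on the result. This is a minimal sketch; the shape of `details` depends on your proxy deployment:

```typescript
const health = await llm.healthCheck();

if (health.status === 'healthy') {
  // Safe to route requests through the proxy.
} else {
  // Requests will use the fallback provider, if one is configured.
  console.warn('Proxy is unhealthy:', health.details);
}
```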
## Resource Management
### Health Check Lifecycle
The SDK uses a singleton pattern for health checks per `baseURL`:

- **Shared Health Checks**: Multiple instances with the same `baseURL` share one health check interval
- **Automatic Cleanup**: Health checks are automatically stopped when no instances remain for a `baseURL`
- **Manual Cleanup**: You can call `destroy()` on an instance to unsubscribe from health updates
```typescript
// Multiple instances share the same health check
const client1 = new OmniLLM({ baseURL: 'http://localhost:3000' });
const client2 = new OmniLLM({ baseURL: 'http://localhost:3000' });
// Only one health check runs for localhost:3000

// Manual cleanup (optional)
client1.destroy();
client2.destroy();
// Health check stops when both instances are destroyed
```

### Memory Management
- Health check intervals are automatically cleaned up when no instances remain
- No memory leaks from multiple instances
- Each `baseURL` maintains its own health state
## API Types

### Chat Completions vs Text Completions
**Chat Completions** (`llm.chat.completions.create`):
- Uses conversation-style messages with roles (user, assistant, system)
- Better for multi-turn conversations
- Supports tool calling and function calling (see the sketch after this list)
- Example: Customer service chatbots, conversation agents
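Tool calling goes through the `tools` and `tool_choice` parameters shown in the Usage section. The sketch below is illustrative only: it assumes the proxy accepts an OpenAI-compatible tool schema, and `get_weather` is a hypothetical tool, so check your provider's documentation for the exact request and response shapes.

```typescript
// Hypothetical tool definition, assuming an OpenAI-compatible schema
const response = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather', // hypothetical tool name
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: {
            city: { type: 'string' }
          },
          required: ['city']
        }
      }
    }
  ],
  tool_choice: 'auto'
});
```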
**Text Completions** (`llm.chat.generate`):
- Uses simple text prompts
- Better for single-turn text generation
- More suitable for creative writing, code generation, etc.
- Example: Story writing, code completion, text summarization
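To make the distinction concrete, here is a minimal sketch of both styles side by side (the model name and prompts are just examples):

```typescript
// Multi-turn conversation: chat completions
const chatResponse = await llm.chat.completions.create({
  model: 'llama3.2:3b',
  messages: [
    { role: 'system', content: 'You are a helpful support agent.' },
    { role: 'user', content: 'How do I reset my password?' }
  ]
});

// Single-turn generation: text completion
const story = await llm.chat.generate({
  model: 'llama3.2:3b',
  prompt: 'Write a haiku about autumn:',
  max_tokens: 60
});
```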
## Examples

Check the `examples` directory for complete usage examples:

- `full-demo.ts`: Complete demonstration of all features
- `fallback-test.ts`: Testing fallback functionality
## License
MIT
