@tamasha/litellm-wrapper

v1.0.0

Published

2 months ago

Production-grade unified multimodal LLM client for the Tamasha LiteLLM proxy. Supports GPT and Gemini families through a single API surface with adapter, transformer, and strategy patterns.

Vulnerabilities: 0 High, 0 Medium, 0 Low

sde2-codes

nomercy0-aashish

litellm llm openai gpt gemini google-ai multimodal vision voice realtime tamasha

@tamasha-packages/litellm-wrapper

Production-grade unified multimodal LLM client for the Tamasha LiteLLM proxy. One API surface, multiple model families (gpt, gemini), full TypeScript types, ESM + CJS dual-published.

Features

No-modelFamily API: pass just the model id; the package looks up the family in its built-in catalog.
Adapter / Transformer / Strategy patterns keep family-specific concerns out of the public API.
Realtime WebSocket client for streaming/voice — single RealtimeSession API on top of GPT Realtime and Gemini Live.
Strict TypeScript (strict: true, noUncheckedIndexedAccess), Zod-validated config, structured pino logging, p-retry with exponential backoff.
Pluggable: register custom families via registerFamily(...) and custom models via registerModel(...).
Tested: Vitest unit + integration + e2e suites with v8 coverage thresholds.
Node 20+ ESM-first, CJS supplied via dual build.

Install

npm install @tamasha-packages/litellm-wrapper
# Optional: enables GPT video frame extraction
npm install ffmpeg-static

Quick start

import createClient, { TextTask } from '@tamasha-packages/litellm-wrapper';

const client = createClient({
  model: 'gpt-4o',                       // family inferred from model id
  litellmUrl: process.env.LITELLM_URL!,  // http(s), no trailing slash, no query/fragment
  apiKey: process.env.LITELLM_KEY!,      // ≥8 chars, no whitespace
});

const summary = await client.analyzeText(
  'Tamasha is a hyper-local audio social network.',
  TextTask.Summarize,
);

const vision = await client.analyzeImage(
  'https://signed.example.com/image.png',
  'What is in this image?',
);

const generated = await client.generateImage('A cinematic Mumbai skyline at dusk', {
  size: '1024x1024',
});

chat()

import createClient from '@tamasha-packages/litellm-wrapper';

const client = createClient({
  model: 'gpt-4o',
  litellmUrl: process.env.LITELLM_URL!,
  apiKey: process.env.LITELLM_KEY!,
});

const result = await client.chat({
  systemPrompt: 'You are a translator.',
  prompt: 'Translate "hello" to French.',
  outputFormat: {
    type: 'json_schema',
    name: 'translation',
    strict: true,
    schema: {
      type: 'object',
      properties: { fr: { type: 'string' } },
      required: ['fr'],
      additionalProperties: false,
    },
  },
});

const { fr } = JSON.parse(result.content);

input accepts tagged content ({kind:'image', value, mimeType?}). Setting outputModalities: ['image','text'] requests image output (e.g. with gemini-2.5-flash-image).
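
The JSON.parse(result.content) call in the chat() example trusts the model output. A minimal sketch of a more defensive parse follows, assuming only that content is a string. The Translation type and both helper names are hypothetical and not part of the package.

```typescript
// Hypothetical shape matching the json_schema requested in the chat() example.
interface Translation {
  fr: string;
}

// Narrow an unknown parsed value to the expected shape.
function isTranslation(value: unknown): value is Translation {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as Record<string, unknown>)['fr'] === 'string'
  );
}

// Parse model output, failing with a clear message instead of a raw SyntaxError.
function parseTranslation(content: string): Translation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    throw new Error('Model output was not valid JSON');
  }
  if (!isTranslation(parsed)) {
    throw new Error('Model output did not match the translation schema');
  }
  return parsed;
}
```

With this in place, const { fr } = parseTranslation(result.content) would replace the bare JSON.parse call.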

createEphemeralSession()

import createClient from '@tamasha-packages/litellm-wrapper';

const client = createClient({
  model: 'gpt-4o-realtime-preview',
  litellmUrl: process.env.LITELLM_URL!,
  apiKey: process.env.LITELLM_KEY!,
});

const { id, ephemeralKey } = await client.createEphemeralSession();
// Send `ephemeralKey` to your client app; the client opens its own WebSocket.

This is distinct from createRealtimeClient(), which opens a server-side WebSocket. Use createEphemeralSession() when the client app handles the WebSocket directly.

Built-in models

The wrapper uses actual model identifiers (not LiteLLM model_name aliases) so that family is obvious from inspection. Configure your LiteLLM proxy with model_name entries that match these identifiers exactly.

| Family | Models |
| --- | --- |
| GPT (gpt) | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, gpt-5-nano, gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, whisper-1, dall-e-3, dall-e-2, gpt-image-1 |
| Gemini (gemini) | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-live-001, gemini-3.1-flash-live-preview, gemini-2.5-flash-image, gemini-2.5-flash-image-preview |

Get the live list at runtime:

import { listSupportedModels, listModelsByFamily, ModelFamily } from '@tamasha-packages/litellm-wrapper';

listSupportedModels();                            // → ['dall-e-2', 'dall-e-3', ...]
listModelsByFamily(ModelFamily.GEMINI);           // → ['gemini-1.5-flash', ...]
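
To illustrate how a model id can resolve to a family, here is a minimal self-contained sketch of a catalog lookup. It is not the package's implementation: the Map, the function names, and the four sample entries are assumptions, and the results are sorted only because the sample output above appears sorted.

```typescript
// Hypothetical stand-in for the package's catalog; names are illustrative only.
type Family = 'gpt' | 'gemini';

const catalog = new Map<string, Family>([
  ['gpt-4o', 'gpt'],
  ['dall-e-3', 'gpt'],
  ['gemini-2.5-flash', 'gemini'],
  ['gemini-1.5-flash', 'gemini'],
]);

// Resolve a family from a model id, failing loudly on unknown ids.
function resolveFamily(model: string): Family {
  const family = catalog.get(model);
  if (family === undefined) {
    throw new Error(`Unsupported model: ${model}`);
  }
  return family;
}

// List model ids in sorted order, optionally restricted to one family.
function listModels(family?: Family): string[] {
  return Array.from(catalog.entries())
    .filter(([, f]) => family === undefined || f === family)
    .map(([id]) => id)
    .sort();
}
```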

Adding a new model

There are two ways to add a model.

1. Edit the built-in list

src/core/model-catalog.ts is the source of truth. Append the new identifier to the right family array:

export const BUILTIN_GPT_MODELS = [
  'gpt-4o',
  'gpt-4o-mini',
  // ...
  'gpt-5',          // ← add here
] as const;

The catalog and family resolution pick it up automatically.

2. Register at runtime (no fork needed)

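import createClient, { registerModel, ModelFamily } from '@tamasha-packages/litellm-wrapper';

// Register the ids before constructing a client: the model is validated at construction time.
registerModel('o3-mini', ModelFamily.GPT);
registerModel('gemini-3-pro', ModelFamily.GEMINI);

const client = createClient({
  model: 'o3-mini',
  litellmUrl: process.env.LITELLM_URL!,
  apiKey: process.env.LITELLM_KEY!,
});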

For a brand-new family altogether, register a provider too:

import { registerFamily, registerModel } from '@tamasha-packages/litellm-wrapper';

// ClaudeFamilyProvider stands for a provider class you supply yourself.
registerFamily('claude', (deps) => new ClaudeFamilyProvider(deps));
registerModel('claude-opus-4-7', 'claude');
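
The provider interface is not shown in this README, so the following is only a sketch of the registry idea behind registerFamily and registerModel: a family name maps to a factory, and a model id maps to a family. Every type and function name here (ProviderDeps, FamilyProvider, registerFamilySketch, and so on) is hypothetical.

```typescript
// All names below are hypothetical; the real provider interface may differ.
interface ProviderDeps {
  baseUrl: string;
}
interface FamilyProvider {
  describe(): string;
}
type ProviderFactory = (deps: ProviderDeps) => FamilyProvider;

const factories = new Map<string, ProviderFactory>();
const modelToFamily = new Map<string, string>();

function registerFamilySketch(family: string, factory: ProviderFactory): void {
  factories.set(family, factory);
}

// Assumption: a model may only be attached to a family that already exists.
function registerModelSketch(model: string, family: string): void {
  if (!factories.has(family)) {
    throw new Error(`Unknown family: ${family}`);
  }
  modelToFamily.set(model, family);
}

// Look up the family for a model, then build its provider from the factory.
function providerFor(model: string, deps: ProviderDeps): FamilyProvider {
  const family = modelToFamily.get(model);
  const factory = family === undefined ? undefined : factories.get(family);
  if (factory === undefined) {
    throw new Error(`Unsupported model: ${model}`);
  }
  return factory(deps);
}

registerFamilySketch('claude', (deps) => ({
  describe: () => `claude via ${deps.baseUrl}`,
}));
registerModelSketch('claude-opus-4-7', 'claude');
```

Keeping the factory lazy means a provider is only constructed when a client for that family is actually created.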

Realtime / voice

import { createRealtimeClient, RealtimeEventType } from '@tamasha-packages/litellm-wrapper';

const realtime = createRealtimeClient({
  model: 'gpt-4o-realtime-preview',     // family inferred from model id
  litellmUrl: process.env.LITELLM_URL!,
  apiKey: process.env.LITELLM_KEY!,
  voiceMode: true,
});

const session = await realtime.connect({ reconnect: true });
session.on(RealtimeEventType.TextDelta, (chunk) => process.stdout.write(String(chunk)));
// playPcm is a placeholder for your own audio playback function.
session.on(RealtimeEventType.AudioBuffer, (frame) => playPcm(frame));

session.send({ type: RealtimeEventType.TextDelta, data: 'Tell me a 30s bedtime story.' });

Public API

`createClient(config, options?)` (default export)

Returns a TamashaLitellmClient. Equivalent to new TamashaLitellmClient(config, options).

`TamashaLitellmClient`

| Method | Returns |
| --- | --- |
| analyzeVideo(input, prompt, options?) | Promise<UnifiedResponse> |
| analyzeAudio(input, prompt, options?) | Promise<UnifiedResponse> |
| analyzeText(text, task, options?) | Promise<UnifiedResponse> |
| analyzeImage(input, prompt, options?) | Promise<UnifiedResponse> |
| generateImage(prompt, options?) | Promise<UnifiedResponse> |
| getConfig() | Readonly<ResolvedClientConfig> (includes derived modelFamily) |
| getAdapter() | active IFamilyAdapter |

`TamashaRealtimeClient`

| Method | Returns |
| --- | --- |
| connect({ reconnect? }) | Promise<RealtimeSession> |
| getConfig() | Readonly<ResolvedRealtimeConfig> |

`RealtimeSession`

Methods: send(event), on(type, fn) (returns an unsubscribe function), off, and close.
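
The pattern of on returning an unsubscribe function can be sketched with a tiny emitter. This is a self-contained illustration, not the RealtimeSession implementation; TinyEmitter, its emit method, and the 'text.delta' event name are all hypothetical.

```typescript
type Handler = (payload: unknown) => void;

// Hypothetical emitter showing the subscribe/unsubscribe contract only.
class TinyEmitter {
  private readonly handlers = new Map<string, Set<Handler>>();

  // Subscribe and get back a function that removes this exact handler.
  on(type: string, fn: Handler): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler>();
    set.add(fn);
    this.handlers.set(type, set);
    return () => this.off(type, fn);
  }

  off(type: string, fn: Handler): void {
    this.handlers.get(type)?.delete(fn);
  }

  emit(type: string, payload: unknown): void {
    this.handlers.get(type)?.forEach((fn) => fn(payload));
  }
}

const emitter = new TinyEmitter();
const chunks: string[] = [];
const unsubscribe = emitter.on('text.delta', (chunk) => {
  chunks.push(String(chunk));
});

emitter.emit('text.delta', 'Once upon');
unsubscribe();
emitter.emit('text.delta', ' a time'); // ignored: the handler was removed
```

Returning the unsubscribe function saves callers from keeping a reference to the handler just to pass it to off later.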

Errors

All errors extend LitellmWrapperError and carry a stable code:

| Class | Code |
| --- | --- |
| ConfigurationError | CONFIGURATION_ERROR |
| UnsupportedFamilyError | UNSUPPORTED_FAMILY |
| UnsupportedModelError | UNSUPPORTED_MODEL |
| ValidationError | VALIDATION_ERROR |
| ProxyRequestError | PROXY_REQUEST_ERROR |
| TimeoutError | TIMEOUT_ERROR |
| RealtimeError | REALTIME_ERROR |
| TransformError | TRANSFORM_ERROR |
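
Because each error carries a stable code, callers can branch on the code rather than on message text. The sketch below uses a local stand-in class so it runs without the package. The choice of which codes count as transient is an assumption for illustration, not documented behaviour.

```typescript
// Local stand-in so the sketch runs without the package installed.
class WrapperErrorSketch extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
    this.name = 'WrapperErrorSketch';
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, WrapperErrorSketch.prototype);
  }
}

// Assumed policy: only timeouts and proxy failures are treated as transient.
const RETRYABLE_CODES = new Set<string>(['TIMEOUT_ERROR', 'PROXY_REQUEST_ERROR']);

// Map any thrown value to a coarse handling decision.
function classify(err: unknown): 'retry' | 'fail' | 'unknown' {
  if (!(err instanceof WrapperErrorSketch)) {
    return 'unknown';
  }
  return RETRYABLE_CODES.has(err.code) ? 'retry' : 'fail';
}
```

In real code the instanceof check would target LitellmWrapperError imported from the package.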

Configuration reference

interface ClientConfig {
  model: string;          // looked up in catalog → derives family
  litellmUrl: string;     // http(s), no trailing slash, no query/fragment
  apiKey: string;         // ≥8 printable-ASCII chars, no whitespace
  maxRetries?: number;    // 0..10, default 3
  timeout?: number;       // 1..600_000 ms, default 30_000
  logger?: LoggerLike;    // pino-compatible
  metrics?: MetricsHook;
}

interface RealtimeConfig extends ClientConfig {
  voiceMode?: boolean;
}
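
To make maxRetries concrete, here is a rough sketch of retrying with exponential backoff. The package is described as using p-retry. This hand-rolled loop is a substitute for illustration only, and the base delay, factor, and cap are assumed values rather than the wrapper's defaults.

```typescript
// Delay before retry number `attempt` (1-based): base * factor^(attempt - 1), capped.
function backoffDelayMs(attempt: number, baseMs = 250, factor = 2, maxMs = 10_000): number {
  return Math.min(maxMs, baseMs * Math.pow(factor, attempt - 1));
}

// Run fn once, then up to maxRetries more times, sleeping between attempts.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt + 1));
      }
    }
  }
  throw lastError;
}
```

Passing sleep as a parameter keeps the loop testable without real timers.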

Validation rules

| Field | Rule |
| --- | --- |
| model | must be present in defaultModelCatalog. Throws UnsupportedModelError otherwise. |
| litellmUrl | absolute http or https, has a host, no trailing slash, no query string, no fragment. |
| apiKey | ≥8 printable-ASCII chars, no whitespace. |
| maxRetries | integer, 0–10. |
| timeout | integer, 1–600 000 ms. |

Validation runs at construction time so misconfiguration fails fast.
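
The rules above can be approximated in plain TypeScript. The package is described as Zod-validated. This sketch deliberately avoids Zod so that it stays self-contained, and ConfigSketch and validateConfig are hypothetical names. It skips the model check, which needs the catalog.

```typescript
interface ConfigSketch {
  litellmUrl: string;
  apiKey: string;
  maxRetries?: number;
  timeout?: number;
}

// Returns a list of problems; an empty list means the config passed every rule.
function validateConfig(cfg: ConfigSketch): string[] {
  const problems: string[] = [];

  let url: URL | undefined;
  try {
    url = new URL(cfg.litellmUrl);
  } catch {
    problems.push('litellmUrl must be an absolute URL');
  }
  if (url !== undefined) {
    if (url.protocol !== 'http:' && url.protocol !== 'https:') {
      problems.push('litellmUrl must use http or https');
    }
    if (url.host === '') problems.push('litellmUrl must have a host');
    if (cfg.litellmUrl.endsWith('/')) problems.push('litellmUrl must not end with a slash');
    if (cfg.litellmUrl.includes('?')) problems.push('litellmUrl must not have a query string');
    if (cfg.litellmUrl.includes('#')) problems.push('litellmUrl must not have a fragment');
  }

  // 0x21..0x7E is printable ASCII without the space character.
  if (!/^[\x21-\x7e]{8,}$/.test(cfg.apiKey)) {
    problems.push('apiKey must be at least 8 printable ASCII characters without whitespace');
  }

  const maxRetries = cfg.maxRetries ?? 3;
  if (!Number.isInteger(maxRetries) || maxRetries < 0 || maxRetries > 10) {
    problems.push('maxRetries must be an integer from 0 to 10');
  }

  const timeout = cfg.timeout ?? 30_000;
  if (!Number.isInteger(timeout) || timeout < 1 || timeout > 600_000) {
    problems.push('timeout must be an integer from 1 to 600000 ms');
  }

  return problems;
}
```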

Logging

Bring your own pino-compatible logger via config.logger. Without one, a pino logger is used; the level can be set with LITELLM_WRAPPER_LOG_LEVEL=debug|info|warn|error|silent.

Metrics hooks

// `stats` is your own metrics client; further fields on the end event are omitted here.
config.metrics = {
  onRequestStart: ({ type, family, model }) => stats.inc('llm.start', { type, family, model }),
  onRequestEnd: ({ ok, durationMs }) => stats.timing('llm.end', durationMs, { ok }),
};
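
When no metrics backend is available, for example in tests, an in-memory hook can stand in. The sketch below assumes the two payload shapes from the fields destructured above. The interfaces and createInMemoryMetrics are hypothetical names, and the real MetricsHook type may carry more fields.

```typescript
// Assumed payload shapes, inferred from the fields used in the hooks above.
interface RequestStart {
  type: string;
  family: string;
  model: string;
}
interface RequestEnd {
  ok: boolean;
  durationMs: number;
}

// Collects counts and durations in memory and exposes a summary snapshot.
function createInMemoryMetrics() {
  let started = 0;
  let failed = 0;
  const durations: number[] = [];

  return {
    hooks: {
      onRequestStart: (_event: RequestStart): void => {
        started += 1;
      },
      onRequestEnd: (event: RequestEnd): void => {
        durations.push(event.durationMs);
        if (!event.ok) failed += 1;
      },
    },
    snapshot: () => ({
      started,
      failed,
      avgMs:
        durations.length === 0
          ? 0
          : durations.reduce((sum, ms) => sum + ms, 0) / durations.length,
    }),
  };
}
```

The hooks object would be passed as config.metrics, and snapshot() read afterwards to assert on request counts.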

Development

npm install
npm run lint
npm run typecheck
npm run test
npm run test:coverage
npm run build
