
@mate-academy/llm-gateway

v7.2.1

Published

A gateway package for LLM services.

Readme

LLM Gateway

A standardized interface for interacting with various Large Language Model (LLM) providers.

Overview

The @mate-academy/llm-gateway package provides a unified way to work with different LLM providers like OpenAI and Google Generative AI. It abstracts the implementation details of each provider, allowing you to easily switch between them without changing your application code.

Installation

npm install @mate-academy/llm-gateway

Features

  • Support for multiple LLM providers (OpenAI, Google Generative AI)
  • Structured Output: Type-safe JSON responses with schema validation
  • Extensible Schema Architecture: Driver pattern with provider-specific adapters
  • LLM Metrics & Reporting: Comprehensive metrics collection with automatic cost calculation
  • Cost Tracking: Real-time cost calculation for all LLM operations with flexible pricing functions supporting text and audio tokens, including cached token discounts and reasoning tokens
  • Cached Token Support: Automatic detection and separate pricing for cached tokens (prompt caching) with provider-specific discount rates
  • Instance Caching: Intelligent SDK instance pooling with LRU eviction to prevent memory leaks
  • Standardized completion service interface
  • Standardized assistance service interface with file handling and direct chat
  • Speech-to-text transcription capabilities
  • Text-to-speech generation capabilities
  • Abort signal support for cancelling long-running operations
  • Lightweight token counting using character-based approximation (no heavyweight tokenizer dependencies)
  • Type-safe prompt templates with dynamic replacements
  • Factory pattern for easy provider selection
  • Consistent logging across all providers
  • Comprehensive testing suite with integration tests and shared test utilities
  • Full backward compatibility
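
To make the cost model above concrete, here is a rough sketch of per-token pricing with a cached-token discount. All rates and the discount factor are invented for illustration; the package computes real costs from per-model pricing functions.

```typescript
// Hypothetical pricing sketch — the rates and discount below are made
// up for illustration; the gateway uses real per-model pricing.
function estimateCost(opts: {
  inputTokens: number;   // regular (uncached) input tokens
  cachedTokens: number;  // input tokens served from prompt cache
  outputTokens: number;
  inputRatePerM: number;  // USD per 1M input tokens
  outputRatePerM: number; // USD per 1M output tokens
  cachedDiscount: number; // e.g. 0.5 → cached tokens cost 50% of the input rate
}): number {
  const { inputTokens, cachedTokens, outputTokens,
          inputRatePerM, outputRatePerM, cachedDiscount } = opts;
  const inputCost =
    (inputTokens * inputRatePerM +
     cachedTokens * inputRatePerM * cachedDiscount) / 1_000_000;
  const outputCost = (outputTokens * outputRatePerM) / 1_000_000;
  return inputCost + outputCost;
}
```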

Usage

Logger Interface

The package accepts an optional logger implementing LLMLoggerInterface. Most logging libraries are compatible (@mate-academy/logger, winston, pino, etc.). If no logger is provided, logging is disabled.

// Option 1: use @mate-academy/logger (recommended)
import { logger } from '@mate-academy/logger';

// Option 2: a minimal console-backed logger
const consoleLogger = {
  info: (msg, meta) => console.log(msg, meta),
  error: (msg, meta) => console.error(msg, meta),
  warn: (msg, meta) => console.warn(msg, meta),
  child: (context) => consoleLogger,
};

Available Metrics:

The reporter automatically collects the following metrics:

interface LLMMetrics {
  // Provider & Model Information
  provider: 'OpenAI' | 'GoogleGenerativeAI';
  purpose: 'completion' | 'assistance' | 'speech_to_text' | 'text_to_speech';
  model: string | null;
  modelConfig: Record<string, any> | null;

  // Method Context
  method: string; // e.g., 'sendMessage', 'assistInChat', 'transcribe'
  status: 'success' | 'error' | 'cancelled';

  // Token Usage
  tokens: {
    input: number;  // Regular input tokens (excludes cached tokens)
    output: number; // Output tokens (includes reasoning tokens for o1/o3 models)
    total: number;  // Total tokens (input + output)
  };

  // Cost Tracking (automatically calculated with cached token discounts)
  costs: {
    input: number;   // Cost for input tokens (text + audio), with cached token discounts applied
    output: number;  // Cost for output tokens (text + audio + reasoning tokens)
    total: number;   // Total cost
    currency: string; // 'USD'
  };
}
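
A reporter consumes these metrics objects. As a hypothetical sketch (the formatting below and the idea of flattening metrics into a log line are illustrations, not the package's API — consult LLMReporterInterface for the actual contract), a reporter might render a metrics object like this:

```typescript
// Local structural type for illustration only — mirrors a subset of
// the LLMMetrics fields documented above.
type MetricsLike = {
  provider: string;
  method: string;
  status: string;
  tokens: { total: number };
  costs: { total: number; currency: string };
};

// Flatten a metrics object into a single log line.
function formatMetricsLine(m: MetricsLike): string {
  return `${m.provider}.${m.method}: ${m.status}, ` +
    `${m.tokens.total} tokens, ${m.costs.total.toFixed(4)} ${m.costs.currency}`;
}
```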

TypeScript Behavior:

The reporterContext parameter behavior depends on your reporter configuration:

// Case 1: Reporter with defined context type - reporterContext is REQUIRED
const reporterWithContext: LLMReporterInterface<{ userId: string }> = new MyReporter();

// Case 2: Reporter without context - reporterContext is OPTIONAL
const reporterWithoutContext: LLMReporterInterface<undefined> = new SimpleReporter();

// Case 3: No reporter - reporterContext is OPTIONAL
const service = LLMServiceFactory.getCompletionService({
  provider, options
  // no reporter
});

Basic Setup

import {
  LLMProviders,
  LLMServiceFactory,
  LLMCompletionService,
  LLMAssistanceService,
  LLMSpeechToTextService,
  LLMTextToSpeechService,
  LLMRoles,
  LLMMessageContentType,
  LLMUploadFileMimeTypes,
  LLMCompletionMessage,
  LLMModel,
  LLMSchema,
  createPromptTemplate,
  LLMLoggerInterface,
  LLMReporterInterface,
} from '@mate-academy/llm-gateway';

// Define provider options
const llmProviderOptions = LLMServiceFactory.resolveProviderOptions(
  LLMProviders.OpenAI, // chosen provider
  {
    [LLMProviders.OpenAI]: {
      apiKey: 'your-openai-api-key',
      organization: 'your-organization-id', // optional
      baseURL: 'https://api.openai.com/v1', // optional
    },
    [LLMProviders.GoogleGenerativeAI]: {
      apiKey: 'your-google-ai-api-key',
    },
  },
);

// Get completion service
const completionService = LLMServiceFactory.getCompletionService({
  provider: LLMProviders.OpenAI,
  options: llmProviderOptions, // optional, but recommended - otherwise must be set later via service.setOptions()
  logger, // optional - omit for no logging
  reporter, // optional - omit for no metrics collection
});

// Get assistance service
const assistanceService = LLMServiceFactory.getAssistanceService({
  provider: LLMProviders.OpenAI,
  options: llmProviderOptions, // optional, but recommended - otherwise must be set later via service.setOptions()
  logger, // optional - omit for no logging
  reporter, // optional - omit for no metrics collection
});

// Get speech-to-text service
const speechToTextService = LLMServiceFactory.getSpeechToTextService({
  provider: LLMProviders.OpenAI,
  options: llmProviderOptions, // optional, but recommended - otherwise must be set later via service.setOptions()
  logger, // optional - omit for no logging
  reporter, // optional - omit for no metrics collection
});

// Get text-to-speech service
const textToSpeechService = LLMServiceFactory.getTextToSpeechService({
  provider: LLMProviders.OpenAI,
  options: llmProviderOptions, // optional, but recommended - otherwise must be set later via service.setOptions()
  logger, // optional - omit for no logging
  reporter, // optional - omit for no metrics collection
});

Real-world Example

Here's how you might integrate the LLM Gateway in a use case class:

import {
  LLMProviders,
  LLMServiceFactory,
  LLMCompletionService,
  LLMAssistanceService,
  LLMSpeechToTextService,
  LLMTextToSpeechService,
  LLMRoles,
  LLMMessageContentType,
} from '@mate-academy/llm-gateway';

class MyUseCase {
  private llmCompletionService: LLMCompletionService<LLMProviders>;
  private llmAssistanceService: LLMAssistanceService<LLMProviders>;
  private llmSpeechToTextService: LLMSpeechToTextService<LLMProviders>;
  private llmTextToSpeechService: LLMTextToSpeechService<LLMProviders>;

  constructor(logger, reporter, config) {
    const llmProviderOptions = LLMServiceFactory.resolveProviderOptions(
      config.llmProvider,
      {
        [LLMProviders.OpenAI]: {
          apiKey: config.openAIApiKey,
          organization: config.openAIOrgId,
          baseURL: config.openAIBaseUrl,
        },
        [LLMProviders.GoogleGenerativeAI]: {
          apiKey: config.googleAIApiKey,
        },
      },
    );

    this.llmCompletionService = LLMServiceFactory.getCompletionService({
      provider: config.llmProvider,
      options: llmProviderOptions,
      logger,
      reporter,
    });

    this.llmAssistanceService = LLMServiceFactory.getAssistanceService({
      provider: config.llmProvider,
      options: llmProviderOptions,
      logger,
      reporter,
    });

    this.llmSpeechToTextService = LLMServiceFactory.getSpeechToTextService({
      provider: config.llmProvider,
      options: llmProviderOptions,
      logger,
      reporter,
    });

    this.llmTextToSpeechService = LLMServiceFactory.getTextToSpeechService({
      provider: config.llmProvider,
      options: llmProviderOptions,
      logger,
      reporter,
    });
  }

  async processRequest(prompt) {
    // Use completion service
    const completion = await this.llmCompletionService.sendMessage({
      message: {
        role: LLMRoles.User,
        content: [
          {
            type: LLMMessageContentType.TEXT,
            text: prompt,
          }
        ],
      },
      model: this.getPreferredModel(),
    });

    return completion;
  }

  async transcribeAudio(audioPath) {
    // Use speech-to-text service
    const transcription = await this.llmSpeechToTextService.transcribe({
      pathToAudio: audioPath,
      model: this.getPreferredModel(),
    });

    return transcription;
  }

  async generateSpeech(text) {
    // Use text-to-speech service
    const speech = await this.llmTextToSpeechService.createSpeech({
      text: text,
      model: this.getPreferredModel(),
      speechOptions: {
        voice: 'alloy', // OpenAI voice option
        response_format: 'mp3',
      },
    });

    return speech;
  }

  private getPreferredModel() {
    // Get the appropriate model from the service's available models
    const models = Object.values(this.llmCompletionService.models);
    return models[0]; // Use the first available model
  }
}

Usage Examples

Basic Text Completion

const result = await completionService.sendMessage({
  message: {
    role: LLMRoles.User,
    content: [
      {
        type: LLMMessageContentType.TEXT,
        text: 'Hello, how are you?',
      }
    ],
  },
  model: preferredModel,
  reporterContext: { // required if reporter is configured with context type
    userId: 12345,
    feature: 'chat',
    environment: 'production',
  },
});

console.log(result.text); // AI response

Chat-based Assistance

// Create a chat session
const chat = await assistanceService.createChat({
  model: preferredModel,
  instructions: 'You are a helpful coding assistant.',
});

// Send messages directly to the chat
const result = await assistanceService.assistInChat({
  model: preferredModel,
  message: {
    role: LLMRoles.User,
    content: [{
      type: LLMMessageContentType.TEXT,
      text: 'Help me understand React hooks',
    }],
  },
  chatId: chat.chatId,
  reporterContext: { // required if reporter is configured with context type
    userId: 12345,
    feature: 'coding_assistance',
    environment: 'production',
  },
});

console.log(result.text); // Assistant response

Direct Image Analysis

// Send image directly in message content (OpenAI)
const result = await assistanceService.assistInChat({
  model: preferredModel,
  message: {
    role: LLMRoles.User,
    content: [
      {
        type: LLMMessageContentType.TEXT,
        text: 'What do you see in this image?',
      },
      {
        type: LLMMessageContentType.IMAGE_URL,
        image_url: {
          url: 'https://example.com/image.png',
          detail: 'high',
        },
      },
    ],
  },
  chatId: chat.chatId,
});

console.log(result.text); // Image analysis response

Structured Output

The LLM Gateway supports structured output, allowing you to request type-safe, validated JSON responses from LLM providers. Built on Zod v4 with native JSON Schema generation for optimal performance.

Basic Structured Output

import { LLMSchema } from '@mate-academy/llm-gateway';

// Define your output schema
const personSchema = LLMSchema.object({
  name: LLMSchema.string(),
  age: LLMSchema.number().min(1),
  email: LLMSchema.string().email(),
  isActive: LLMSchema.boolean(),
});

// Request structured output
const result = await completionService.sendMessage({
  message: {
    role: LLMRoles.User,
    content: [{
      type: LLMMessageContentType.TEXT,
      text: 'Create a person profile for John Smith, 30 years old, email [email protected]'
    }]
  },
  model: preferredModel,
  schema: personSchema  // Add schema for structured output
});

// Type-safe access to structured data
if (result.data) {
  console.log(result.data.name);        // string
  console.log(result.data.age);         // number
  console.log(result.data.email);       // string
  console.log(result.data.isActive);    // boolean
}

// Always available as text fallback
console.log(result.text);

Complex Schema Example

const analysisSchema = LLMSchema.object({
  sentiment: LLMSchema.enum(['positive', 'negative', 'neutral']),
  confidence: LLMSchema.number().min(0).max(1),
  keywords: LLMSchema.array(LLMSchema.string()),
  summary: LLMSchema.string(),
  metadata: LLMSchema.object({
    processedAt: LLMSchema.string(),
    modelVersion: LLMSchema.string(),
  }),
});

const result = await completionService.sendMessage({
  message: {
    role: LLMRoles.User,
    content: [{
      type: LLMMessageContentType.TEXT,
      text: 'Analyze this text: "I love this new feature!"'
    }]
  },
  model: preferredModel,
  schema: analysisSchema
});

// Fully type-safe access
if (result.data) {
  console.log(result.data.sentiment);         // 'positive' | 'negative' | 'neutral'
  console.log(result.data.confidence);        // number (0-1)
  console.log(result.data.keywords);          // string[]
  console.log(result.data.summary);           // string
  console.log(result.data.metadata.processedAt); // string
}

Schema API Reference

The LLMSchema builder provides a fluent API for defining output schemas:

Basic Types:

  • LLMSchema.string() - String values
  • LLMSchema.number() - Numeric values
  • LLMSchema.boolean() - Boolean values
  • LLMSchema.array(itemSchema) - Arrays of items
  • LLMSchema.object({ ... }) - Object with specified properties
  • LLMSchema.enum(['option1', 'option2']) - Enumerated values
  • LLMSchema.literal('exact_value') - Exact literal values

Modifiers:

  • .optional() - Makes field optional and nullable (compatible with all providers)
  • .nullable() - Allows null values
  • .default(value) - Sets default value
  • .describe(text) - Adds description

String Modifiers:

  • .min(length) - Minimum string length
  • .max(length) - Maximum string length
  • .email() - Email validation
  • .url() - URL validation
  • .uuid() - UUID validation

Number Modifiers:

  • .int() - Integer values only
  • .positive() - Positive numbers (Note: Use .min(1) for better OpenAI compatibility)
  • .negative() - Negative numbers
  • .min(value) - Minimum value
  • .max(value) - Maximum value

Error Handling

const result = await completionService.sendMessage({
  message: { /* ... */ },
  model: preferredModel,
  schema: mySchema
});

if (result.data) {
  // Successfully parsed structured output
  console.log('Structured data:', result.data);
} else if (result.parseError) {
  // Parsing failed, but text is still available
  console.error('Parse error:', result.parseError);
  console.log('Raw text:', result.text);
} else if (result.error) {
  // Request failed entirely
  console.error('Request error:', result.error);
}

Provider Support

  • OpenAI: Uses native response_format with json_schema for optimal performance
  • Google Generative AI: Uses native responseSchema parameter for structured output
  • Backward Compatibility: All existing code continues to work unchanged

Provider Compatibility Notes

OpenAI Structured Output:

  • The .optional() modifier now automatically converts to required+nullable for OpenAI compatibility
  • Use .min(1) instead of .positive() for better compatibility
  • Nested objects and arrays are fully supported

Google Generative AI:

  • Supports all schema types and modifiers
  • Handles optional fields natively
  • More flexible with schema variations

Best Practices for Cross-Provider Compatibility:

  • Use .optional() for optional fields (automatically handled for all providers)
  • Use .min() and .max() instead of .positive(), .negative()
  • Both providers now have unified schema handling

Schema Architecture

The package uses a driver pattern for schema conversion, automatically adapting schemas to each provider's specific format while maintaining a unified API.

Schema adapters are automatically registered when providers are imported, so no manual configuration is needed. For detailed architecture information and extending with new providers, see the Developer Guide.

Model Configuration

Each model comes with default configuration values that can be customized for your specific needs.

Accessing and Customizing Models

// Get available models from the service
const models = completionService.models;

// Get a specific model
const model = models[OpenAIModelNames.GPT_4_1];

// Customize model configuration
const customModel = {
  ...model,
  config: {
    ...model.config,
    temperature: 0.8,  // Override temperature (0-2, controls randomness)
    top_p: 0.9,       // Override top_p (nucleus sampling)
  }
};

// Use customized model in requests
const result = await completionService.sendMessage({
  message: {
    role: LLMRoles.User,
    content: [{ type: LLMMessageContentType.TEXT, text: 'Hello!' }]
  },
  model: customModel,
});

Model-Specific Configuration

GPT-5 Models have special configuration options:

const gpt5Model = models[OpenAIModelNames.GPT_5];

const customGPT5Model = {
  ...gpt5Model,
  config: {
    ...gpt5Model.config,
    temperature: 1,  // Note: GPT-5 only supports temperature=1
    reasoning_effort: 'high',  // 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
    verbosity: 'low',  // 'low' | 'medium' | 'high'
  }
};

// GPT-5.1 and GPT-5.2 support 'none' reasoning_effort
const gpt51Model = models[OpenAIModelNames.GPT_5_1];

const customGPT51Model = {
  ...gpt51Model,
  config: {
    ...gpt51Model.config,
    temperature: 1,  // Note: GPT-5.1 only supports temperature=1
    reasoning_effort: 'none',  // 'none' | 'minimal' | 'low' | 'medium' | 'high' | 'xhigh'
    verbosity: 'low',  // 'low' | 'medium' | 'high'
  }
};

Note: All GPT-5 models require temperature: 1; it cannot be changed. GPT-5, GPT-5-MINI, and GPT-5-NANO support reasoning_effort values of 'minimal', 'low', 'medium', 'high', or 'xhigh'. GPT-5.1, GPT-5.2, and GPT-5.4 additionally support 'none'.

Gemini 3.x thinking models (gemini-3-flash-preview, gemini-3.1-pro-preview) have special configuration options:

import { ThinkingLevel } from '@google/genai';

const gemini31Model = models[GoogleGenerativeAIModelNames.GEMINI_3_1_PRO_PREVIEW];

const customGemini31Model = {
  ...gemini31Model,
  config: {
    ...gemini31Model.config,
    temperature: 1,  // Note: Gemini 3.x thinking models only support temperature=1
    thinkingConfig: {
      thinkingLevel: ThinkingLevel.HIGH,  // LOW | HIGH | THINKING_LEVEL_UNSPECIFIED
    },
  }
};

Note: Gemini 3.x thinking models require temperature: 1; it cannot be changed. The thinkingConfig setting controls the model's reasoning depth.

File-based Assistance

// Upload files for context
const uploadedFile = await assistanceService.uploadFile({
  name: 'document.txt',
  path: '/path/to/document.txt',
  mimeType: LLMUploadFileMimeTypes.PLAIN_TEXT,
});

// Create file storage with instructions
const storage = await assistanceService.createFileStorage({
  uploadedFiles: [uploadedFile],
  model: preferredModel,
  instructions: 'Help me analyze this document',  // Optional context for the storage
});

// Create chat with file context
const chat = await assistanceService.createChat({
  storageId: storage.storageId,
  model: preferredModel,
  history: [],
  files: [uploadedFile],
  instructions: 'Answer questions about the uploaded document',  // Chat instructions
});

// Send messages in the chat
const result = await assistanceService.assistInChat({
  model: preferredModel,
  message: {
    role: LLMRoles.User,
    content: [{
      type: LLMMessageContentType.TEXT,
      text: 'What are the key points in this document?',
    }],
  },
  chatId: chat.chatId,
  storageId: storage.storageId,
});

Speech-to-Text Transcription

const transcription = await speechToTextService.transcribe({
  pathToAudio: '/path/to/audio.mp3',
  mimeType: LLMUploadFileMimeTypes.AUDIO_MP3, // Optional, but recommended
  model: preferredModel,
  instructions: 'Transcribe the audio file', // Optional: custom prompt for transcription context
  reporterContext: { // required if reporter is configured with context type
    userId: 12345,
    feature: 'audio_transcription',
    environment: 'production',
  },
});

console.log(transcription.text); // Transcribed text

Text-to-Speech Generation

const speech = await textToSpeechService.createSpeech({
  text: 'Hello, this will be converted to speech',
  model: preferredModel,
  speechOptions: {
    voice: 'alloy', // OpenAI voice option
    response_format: 'mp3',
  },
});

// Save audio buffer to file
fs.writeFileSync('output.mp3', speech.audio);

Token Counting for Context Management

The LLM Gateway provides token counting functionality to help you manage context and optimize API usage. Token counting uses a lightweight character-based approximation (approximately 4 characters per token) to avoid heavyweight tokenizer dependencies.

Note: Token counts are approximate and may differ from actual provider token usage. For precise token counts, refer to the usage metrics returned in API responses.
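
The character-based approximation can be sketched in one line (the exact rounding the package applies is an assumption):

```typescript
// ~4 characters per token; Math.ceil rounding is an assumption —
// the package may round differently.
const approxTokenCount = (text: string): number =>
  Math.ceil(text.length / 4);

console.log(approxTokenCount('Hello, how are you?')); // 19 chars
```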

// Count tokens in messages before sending to optimize context usage
const messages = [
  {
    role: LLMRoles.User,
    content: [
      {
        type: LLMMessageContentType.TEXT,
        text: 'Analyze this document and provide insights.',
      },
      {
        type: LLMMessageContentType.TEXT,
        text: longDocumentContent, // Large text content
      },
      // ... potentially more content including images
    ],
  }
];

const tokenCount = await assistanceService.countTokens(messages, preferredModel);

console.log(`Total tokens: ${tokenCount}`);

// Make intelligent decisions based on token count
if (tokenCount > 50000) {
  // Use file upload approach for large content
  const uploadedFile = await assistanceService.uploadFile({
    name: 'document.txt',
    path: '/path/to/document.txt',
    mimeType: LLMUploadFileMimeTypes.PLAIN_TEXT,
  });

  const storage = await assistanceService.createFileStorage({
    uploadedFiles: [uploadedFile],
    model: preferredModel,
  });

  const chat = await assistanceService.createChat({
    storageId: storage.storageId,
    model: preferredModel,
    instructions: 'Analyze the uploaded document',
  });
} else {
  // Send content directly in messages
  const result = await assistanceService.assistInChat({
    model: preferredModel,
    message: messages[0],
    chatId: existingChatId,
  });
}

Abort Signal Support

All operations support abort signals for cancellation:

const controller = new AbortController();

// Cancel after 30 seconds
setTimeout(() => controller.abort(), 30000);

const result = await completionService.sendMessage({
  message: {
    role: LLMRoles.User,
    content: [{ type: LLMMessageContentType.TEXT, text: 'Long request...' }],
  },
  model: preferredModel,
  abortSignal: controller.signal,
});

if (result.error) {
  console.error('Operation failed or was cancelled:', result.error);
} else {
  console.log('Success:', result.text);
}

API Reference

LLMProviders

Enum of supported LLM providers:

enum LLMProviders {
  OpenAI = 'OpenAI',
  GoogleGenerativeAI = 'GoogleGenerativeAI',
  // other providers may be added in the future
}

LLMPurposes

Enum of supported service purposes:

enum LLMPurposes {
  Completion = 'completion',
  Assistance = 'assistance',
  SpeechToText = 'speech_to_text',
  TextToSpeech = 'text_to_speech',
}

LLMServiceFactory

Factory class for creating LLM service instances.

Methods

  • resolveProviderOptions(provider, optionsMap): Resolves the options for the specified provider
  • getCompletionService({ provider, options, logger, reporter }): Creates a completion service instance
  • getAssistanceService({ provider, options, logger, reporter }): Creates an assistance service instance
  • getSpeechToTextService({ provider, options, logger, reporter }): Creates a speech-to-text service instance
  • getTextToSpeechService({ provider, options, logger, reporter }): Creates a text-to-speech service instance

Service Instance Methods

All service instances provide the following common methods:

  • setOptions(options): Update the provider options (e.g., API key, base URL) for an existing service instance
  • getOptions(): Retrieve the current provider options
  • clearInstanceCache(): Clear the internal SDK instance cache (rarely needed)

Instance Caching:

The gateway implements intelligent SDK instance caching to prevent memory leaks when switching between different credentials:

// Create service with initial credentials
const service = LLMServiceFactory.getCompletionService({
  provider: LLMProviders.OpenAI,
  options: { apiKey: 'key-1' },
});

// First call creates and caches SDK instance for 'key-1'
await service.sendMessage({ /* ... */ });

// Update to different credentials
service.setOptions({ apiKey: 'key-2' });

// Second call creates and caches SDK instance for 'key-2'
await service.sendMessage({ /* ... */ });

// Switch back to original credentials
service.setOptions({ apiKey: 'key-1' });

// Reuses cached SDK instance for 'key-1' (no recreation needed)
await service.sendMessage({ /* ... */ });

The cache uses LRU (Least Recently Used) eviction and maintains up to 10 instances per service. This is particularly useful when:

  • Using the same service with multiple products/features that have different API keys
  • Implementing multi-tenant systems where credentials change frequently
  • Running tests with different credential sets
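
The eviction policy described above can be illustrated with a minimal LRU cache built on Map insertion order. This is illustrative only; the package's internal cache may be implemented differently.

```typescript
// Minimal LRU cache sketch illustrating the policy described above
// (up to 10 instances per service). Not the package's actual code.
class LRUCache<K, V> {
  private map = new Map<K, V>();

  constructor(private maxSize = 10) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Re-insert to mark the entry as most recently used
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxSize) {
      // Evict the least recently used entry (first in insertion order)
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}
```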

LLMCompletionService

Interface for text completion services.

Methods

  • sendMessage(options): Send a message to the LLM and get a completion response
    • message: The message to send
    • model: The LLM model to use (with optional config overrides)
    • history: Optional conversation history
    • instructions: Optional system instructions to guide the model's behavior
    • schema: Optional schema for structured output
    • reporterContext: Context data passed to reporter for metrics tracking (required if reporter is configured with a defined context type)
    • abortSignal: Optional abort signal for cancellation

LLMAssistanceService

Interface for chat/assistance services with file handling capabilities.

Methods

  • uploadFile(fileOptions): Upload a file to the LLM service
  • getFile(path): Retrieve the file information for a previously uploaded file by its path
  • deleteFile(fileId): Delete a file from the LLM service
  • createFileStorage(options): Create a file storage for organizing files
    • uploadedFiles: Array of previously uploaded files
    • model: The LLM model to use
    • instructions: Optional instructions for how to use the stored files
  • deleteFileStorage(fileStorageId): Delete a file storage
  • createChat(options): Create a new chat/conversation
    • model: The LLM model to use
    • instructions: Optional initial instructions for the conversation
    • history: Optional conversation history
    • files: Optional array of uploaded files
    • storageId: Optional storage ID to use
  • deleteChat(chatId): Delete a chat
  • assistInChat(options): Send a message in an existing chat and get a response
    • model: The LLM model to use (required)
    • message: The message to send
    • chatId: The ID of the chat to send the message to
    • storageId: Optional storage ID to use for file context
    • schema: Optional schema for structured output (supports type-safe JSON responses)
    • reporterContext: Context data passed to reporter for metrics tracking (required if reporter is configured with a defined context type)
  • countTokens(messages, model): Count tokens in messages for context management

LLMSpeechToTextService

Interface for converting speech audio to text.

Methods

  • transcribe(options): Convert audio file to text transcription
    • pathToAudio: Path to the audio file
    • mimeType: Optional MIME type of the audio file
    • model: The LLM model to use
    • instructions: Optional custom transcription prompt or context
    • reporterContext: Context data passed to reporter for metrics tracking (required if reporter is configured with a defined context type)

LLMTextToSpeechService

Interface for converting text to speech audio.

Methods

  • createSpeech(options): Convert text to speech audio file
    • text: Text to convert to speech
    • model: The LLM model to use
    • instructions: Optional instructions for speech generation
    • speechOptions: Provider-specific speech configuration
    • reporterContext: Context data passed to reporter for metrics tracking (required if reporter is configured with a defined context type)

Prompt Builder

The LLM Gateway includes a powerful prompt template system that provides type-safe string templates with dynamic replacements and conditional sections. This allows you to create reusable prompt templates with placeholders that can be replaced with actual values at runtime.

Features

  • Type Safety: Automatic extraction and validation of placeholder keys from template strings
  • Dynamic Replacements: Replace placeholders like {{variableName}} with actual values
  • Conditional Sections: Show or hide content based on variable values using {{#condition}}...{{/condition}} syntax
  • Nested Conditionals: Support for nested conditional sections for complex logic
  • Template Reusability: Create templates once and use them multiple times with different values
  • Zero Runtime Dependencies: Pure TypeScript utility functions

Basic Usage

import { createPromptTemplate } from '@mate-academy/llm-gateway';

// Create a prompt template with placeholders
const welcomePrompt = createPromptTemplate(`
  Generate a welcome message for a user who has just started their auto tech check attempt on {{topicTitle}}.
  The user's experience level is {{experienceLevel}} and they prefer {{learningStyle}} learning.
`);

// Use the template with actual values
const instruction = welcomePrompt({
  topicTitle: 'JavaScript Basics',
  experienceLevel: 'beginner',
  learningStyle: 'hands-on',
});

// Result: "Generate a welcome message for a user who has just started their auto tech check attempt on JavaScript Basics. The user's experience level is beginner and they prefer hands-on learning."
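
Placeholder substitution of this kind can be sketched with a regex replace. This is an illustrative reimplementation, not the package's actual createPromptTemplate:

```typescript
// Illustrative sketch of {{placeholder}} substitution — not the
// package's actual implementation, which also adds type safety.
function renderTemplate(
  template: string,
  values: Record<string, string>,
): string {
  // Replace each {{key}} with its value; unknown keys become ''.
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => values[key] ?? '');
}

console.log(renderTemplate('Hello, {{name}}!', { name: 'Ada' }));
```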

Conditional Sections

Conditional sections allow you to show or hide parts of the template based on variable values:

// Template with conditional sections
const coursePrompt = createPromptTemplate(`
  Generate a lesson plan for {{topicTitle}}.
  {{#hasPrerequisites}}
  Prerequisites: {{prerequisites}}
  {{/hasPrerequisites}}

  {{#includeExercises}}
  Include practical exercises and code examples.
  {{/includeExercises}}

  {{#difficultyLevel}}
  Adjust content for {{difficultyLevel}} level students.
  {{/difficultyLevel}}
`);

// Usage with all sections visible
const fullLesson = coursePrompt({
  topicTitle: 'React Hooks',
  hasPrerequisites: true,
  prerequisites: 'Basic React knowledge',
  includeExercises: true,
  difficultyLevel: 'intermediate'
});

// Usage with some sections hidden
const basicLesson = coursePrompt({
  topicTitle: 'React Hooks',
  hasPrerequisites: false,
  prerequisites: '',
  includeExercises: false,
  difficultyLevel: 'beginner'
});

Negative Conditional Sections

Negative conditions allow you to show content when a value is falsy:

const feedbackPrompt = createPromptTemplate(`
  Analyze the {{language}} code submission.
  {{#passed}}
  Great job! The tests passed successfully.
  {{/passed}}
  {{#!passed}}
  The tests did not pass. Please review the following issues:
  {{errors}}
  {{/passed}}

  {{#!skipSuggestions}}
  Here are some suggestions for improvement:
  - Consider refactoring for better readability
  - Add more comprehensive error handling
  {{/skipSuggestions}}
`);

// When tests pass
const successResult = feedbackPrompt({
  language: 'JavaScript',
  passed: true,
  errors: '',
  skipSuggestions: false
});
// Result: Shows success message and suggestions

// When tests fail
const failureResult = feedbackPrompt({
  language: 'Python',
  passed: false,
  errors: 'TypeError on line 15',
  skipSuggestions: false
});
// Result: Shows failure message with errors and suggestions

Nested Conditional Sections

You can nest conditional sections for more complex logic:

const reviewPrompt = createPromptTemplate(`
  Review the {{language}} code for {{focusArea}}.
  {{#includeMetrics}}
  Provide performance metrics.
  {{#includeDetailed}}
  Include detailed benchmark analysis and memory usage patterns.
  {{/includeDetailed}}
  {{/includeMetrics}}

  {{#suggestImprovements}}
  Suggest specific improvements for better {{improvementFocus}}.
  {{/suggestImprovements}}
`);

const detailedReview = reviewPrompt({
  language: 'TypeScript',
  focusArea: 'performance',
  includeMetrics: true,
  includeDetailed: true,
  suggestImprovements: true,
  improvementFocus: 'scalability'
});

Advanced Usage

// Template without placeholders (no parameters required)
const staticPrompt = createPromptTemplate(`
  Please analyze the provided code and suggest improvements.
`);
const staticInstruction = staticPrompt(); // No parameters needed

// Template with multiple placeholders and conditional sections
const codeReviewPrompt = createPromptTemplate(`
  Review the {{language}} code below for {{focusArea}}.
  {{#includeCriteria}}
  Pay special attention to {{criteria}} and provide {{outputFormat}} feedback.
  {{/includeCriteria}}

  {{#includeCode}}
  Code:
  {{codeSnippet}}
  {{/includeCode}}

  {{#provideExamples}}
  Include examples of best practices for {{language}}.
  {{/provideExamples}}
`);

const reviewInstruction = codeReviewPrompt({
  language: 'TypeScript',
  focusArea: 'performance optimization',
  includeCriteria: true,
  criteria: 'algorithmic efficiency and memory usage',
  outputFormat: 'structured',
  includeCode: true,
  codeSnippet: 'function example() { /* code here */ }',
  provideExamples: false,
});

Integration with LLM Services

import {
  createPromptTemplate,
  LLMServiceFactory,
  LLMProviders,
  LLMRoles,
  LLMMessageContentType,
} from '@mate-academy/llm-gateway';

// Define reusable prompt templates
const PROMPTS = {
  codeExplanation: createPromptTemplate(`
    Explain the following {{language}} code in simple terms for a {{level}} developer:
    {{#includeContext}}
    Context: {{context}}
    {{/includeContext}}

    {{code}}

    {{#includeExamples}}
    Provide practical examples of how this code would be used.
    {{/includeExamples}}
  `),

  bugFinding: createPromptTemplate(`
    Find potential bugs in this {{language}} code and suggest fixes:
    {{#focusArea}}
    Focus specifically on {{focusArea}} issues.
    {{/focusArea}}

    {{code}}

    {{#includeSeverity}}
    Rate the severity of each issue from 1-5.
    {{/includeSeverity}}
  `),

  optimization: createPromptTemplate(`
    Optimize the following code for {{optimizationType}}:
    {{code}}

    {{#includeMetrics}}
    Provide before/after performance metrics.
    {{/includeMetrics}}

    {{#includeAlternatives}}
    Suggest alternative approaches and explain trade-offs.
    {{/includeAlternatives}}
  `),
};

// Use with a completion service (`completionService` and `preferredModel`
// are assumed to be created elsewhere, e.g. via LLMServiceFactory)
async function explainCode(code: string, language: string, level: string, includeExamples = false) {
  const prompt = PROMPTS.codeExplanation({
    code,
    language,
    level,
    includeContext: false,
    includeExamples,
  });

  return await completionService.sendMessage({
    message: {
      role: LLMRoles.User,
      content: [{ type: LLMMessageContentType.TEXT, text: prompt }],
    },
    model: preferredModel,
  });
}

Type Safety Features

The prompt builder provides compile-time type checking for template placeholders and conditional sections:

// This will show TypeScript errors for missing or incorrect parameters
const template = createPromptTemplate(`
  Hello {{name}}, welcome to {{platform}}!
  {{#showBonus}}You have a bonus: {{bonusAmount}}{{/showBonus}}
`);

// ✅ Correct usage with all required variables
template({
  name: 'John',
  platform: 'LLM Gateway',
  showBonus: true,
  bonusAmount: '$50'
});

// ✅ Correct usage with conditional section hidden
template({
  name: 'John',
  platform: 'LLM Gateway',
  showBonus: false
  // bonusAmount is not required when showBonus is false
});

// ❌ TypeScript error: missing required parameter 'platform'
template({ name: 'John', showBonus: false });

// ❌ TypeScript error: unknown parameter 'age'
template({
  name: 'John',
  platform: 'LLM Gateway',
  showBonus: false,
  age: 25
});

API Reference

createPromptTemplate<T extends string>(template: T)

Creates a prompt template function from a template string.

  • Parameters:
    • template: T - The template string with placeholders and conditional sections
  • Returns: A function that accepts replacement values and returns the processed string
  • Type Safety: Automatically extracts placeholder names and conditional section names from the template string for type checking

Template Syntax

Variable Placeholders:

  • Placeholders must be enclosed in double curly braces: {{variableName}}
  • Whitespace around variable names is ignored: {{ variableName }} works the same as {{variableName}}
  • Variable names can contain letters, numbers, and underscores
  • Replacement values can be strings, numbers, or booleans (automatically converted to strings)

Conditional Sections:

  • Positive conditional sections use the syntax: {{#conditionName}}content{{/conditionName}}
    • The section content is included only if the condition variable is truthy
  • Negative conditional sections use the syntax: {{#!conditionName}}content{{/conditionName}}
    • The section content is included only if the condition variable is falsy
  • Truthy values: true, non-empty strings, non-zero numbers
  • Falsy values: false, empty strings, 0, null, undefined
  • Conditional variables are optional in the type system when used only as conditions
  • Variables used both as conditions and values are required in the type system

Nested Conditionals:

  • Conditional sections can be nested for complex logic
  • Inner sections are processed only if outer sections are visible
  • Variables inside nested sections follow the same truthy/falsy rules
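For intuition, the substitution rules above can be sketched as a tiny renderer. This is illustrative only — it has none of the package's compile-time type checking, and the real implementation may differ:

```typescript
// Minimal sketch of the template syntax described above (illustrative only).
type Values = Record<string, string | number | boolean | null | undefined>;

function render(template: string, values: Values): string {
  let out = template;
  // Resolve conditional sections until none remain. Because each pass strips
  // a section's delimiters, nested sections are handled on later passes.
  const section = /\{\{#(!?)(\w+)\}\}([\s\S]*?)\{\{\/\2\}\}/;
  let m: RegExpExecArray | null;
  while ((m = section.exec(out)) !== null) {
    const [whole, negate, name, body] = m;
    // false, '', 0, null, and undefined all hide a positive section
    const truthy = Boolean(values[name]);
    const keep = negate === '!' ? !truthy : truthy;
    out = out.slice(0, m.index) + (keep ? body : '') + out.slice(m.index + whole.length);
  }
  // Replace {{ variable }} placeholders, ignoring surrounding whitespace.
  return out.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name: string) => String(values[name] ?? ''));
}

const greeting =
  'Hi {{ name }}.{{#vip}} Thanks for subscribing!{{/vip}}{{#!vip}} Consider upgrading.{{/vip}}';

console.log(render(greeting, { name: 'Ann', vip: true }));  // "Hi Ann. Thanks for subscribing!"
console.log(render(greeting, { name: 'Bo', vip: false }));  // "Hi Bo. Consider upgrading."
```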

Supported Providers

OpenAI

Supports all service types: completion, assistance, speech-to-text, and text-to-speech APIs. For more information, see OpenAI API documentation.

Available Models:

| Model | Purpose | Max Input | Max Output | Notes |
|-------|---------|-----------|------------|-------|
| gpt-4.1 | Completion, Assistance | 1M+ | 32K | Extended context window |
| gpt-4.1-mini | Completion, Assistance | 1M+ | 32K | Cost-effective extended context |
| gpt-4.1-nano | Completion, Assistance | 1M+ | 32K | Fastest, most cost-efficient GPT-4.1 |
| gpt-5 | Completion, Assistance | 400K | 128K | Advanced reasoning, requires temperature=1 |
| gpt-5.1 | Completion, Assistance | 400K | 128K | Advanced reasoning with 'none' reasoning_effort support, requires temperature=1 |
| gpt-5.2 | Completion, Assistance | 400K | 128K | Advanced reasoning with 'none' reasoning_effort support, requires temperature=1 |
| gpt-5.4 | Completion, Assistance | 1M+ | 128K | Flagship model, supports 'none'/'xhigh' reasoning_effort, requires temperature=1 |
| gpt-5-mini | Completion, Assistance | 400K | 128K | Smaller GPT-5 variant, requires temperature=1 |
| gpt-5-nano | Completion, Assistance | 400K | 128K | Fastest GPT-5 variant, requires temperature=1 |
| gpt-4o-transcribe | Speech-to-Text | 16K | 2K | Optimized for transcription |
| gpt-4o-mini-transcribe | Speech-to-Text | 16K | 2K | Cost-effective transcription |
| tts-1 | Text-to-Speech | 2K | - | Standard TTS model |
| gpt-4o-mini-tts | Text-to-Speech | 2K | - | Alternative TTS model |

Google Generative AI

Supports completion, assistance, speech-to-text, and text-to-speech APIs through Google's Generative AI models. For more information, see Google Generative AI documentation.

Available Models:

| Model | Purpose | Max Input | Max Output | Notes |
|-------|---------|-----------|------------|-------|
| gemini-2.5-flash | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Fast, cost-effective, supports caching for long context |
| gemini-2.5-flash-lite | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Fastest, most cost-effective, supports caching |
| gemini-2.5-pro | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Advanced reasoning, supports caching for long context |
| gemini-3-flash-preview | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Thinking model, requires temperature=1, configurable thinking levels |
| gemini-3.1-flash-lite-preview | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Most cost-efficient Gemini 3.x model |
| gemini-3.1-pro-preview | Completion, Assistance, Speech-to-Text | 1M+ | 65K | Advanced reasoning with thinking capabilities, requires temperature=1 |
| gemini-2.5-flash-preview-tts | Text-to-Speech | 8K | 16K | Preview TTS model with flash performance |
| gemini-2.5-pro-preview-tts | Text-to-Speech | 8K | 16K | Preview TTS model with pro capabilities |

Note: Google Generative AI models support context caching for content longer than 32,768 tokens, which can significantly reduce costs for repeated queries on the same large context.

Gemini 3.x Thinking Model Configuration:

  • Applies to: gemini-3-flash-preview, gemini-3.1-pro-preview
  • Requires temperature: 1 (cannot be changed)
  • Supports thinkingConfig with thinkingLevel property (LOW, HIGH, THINKING_LEVEL_UNSPECIFIED)
  • Default thinking level is LOW

Testing

The LLM Gateway includes a comprehensive testing suite with both unit and integration tests.

Test Structure

The package includes:

  • Unit tests for core functionality (src/tests/unit/)
    • Schema builder and validation tests
    • Prompt template builder tests
    • Utility function tests
  • Integration tests for all service types (src/tests/integration/)
    • Real API integration tests with live providers
    • Service-specific functionality tests
    • Cross-provider compatibility tests
  • Shared test utilities (src/tests/integration/shared/)
    • Reusable test patterns for common functionality
    • Service initialization tests
    • Logger integration tests
    • Abort signal handling tests
    • Reporter integration tests
    • Instance caching tests
  • Test helpers for common testing utilities (src/tests/helpers.ts)
  • Mock implementations for testing environments
  • Audio test files for speech-to-text testing

Running Tests

# Run all tests with logging
npm test

# Run tests silently (without logs)
npm run test:silent

# Run only integration tests with verbose output
npm run test:integration

# Run specific test file
npm test -- LLMSchema.test.ts

# Run tests matching a pattern
npm test -- --testNamePattern="should handle basic schemas"

# Run interactive prepublish test selector
npm run prepublish-tests

# Run with specific environment variables
ENABLE_LOGGING=true npm test

Test Configuration

Integration tests require API keys for the respective providers:

# Required environment variables for OpenAI tests
OPENAI_SECRET_API_KEY=your_openai_api_key
OPENAI_ORG_ID=your_organization_id  # optional
OPENAI_BASE_URL=https://api.openai.com/v1  # optional

# Required environment variables for Google AI tests
GOOGLE_GENERATIVE_AI_API_KEY=your_google_ai_api_key

Test Coverage

The integration tests cover:

  1. Common Service Tests (via shared utilities):

    • Service initialization with and without options
    • Instance caching and credential switching
    • Logger integration and error handling
    • Reporter metrics collection (success/abort/fail)
    • Abort signal handling for all operations
  2. Completion Service Tests:

    • Basic message sending and responses
    • Message history handling
    • Custom instructions
    • Image analysis
    • Structured output with schemas
    • Error handling for invalid inputs
    • Model-specific functionality
  3. Assistance Service Tests:

    • File upload and storage creation
    • Chat creation and management
    • Direct chat interactions
    • File-based conversations
    • Direct image handling in messages
    • Multiple abort scenarios (file upload, storage, chat)
  4. Speech-to-Text Service Tests:

    • Audio file transcription
    • Multiple audio format support (MP3, WAV, WEBM, OGG)
    • Error handling for invalid files
    • Model-specific transcription quality
  5. Text-to-Speech Service Tests:

    • Text-to-audio conversion
    • Voice selection options
    • Audio format configuration
    • Instructions for speech generation
    • Error handling

Test Helpers

The package provides several test utilities:

resolveTestConfig Function

The resolveTestConfig function creates standardized test configurations for all supported providers based on the service purpose:

import { resolveTestConfig } from '@mate-academy/llm-gateway/tests/helpers';

// Get test config for completion services
const testConfig = resolveTestConfig(LLMPurposes.Completion);

// testConfig contains configuration for all providers:
// {
//   [LLMProviders.OpenAI]: {
//     provider: LLMProviders.OpenAI,
//     availableModels: {...}, // Models available for completion
//     clientOptions: { apiKey: process.env.OPENAI_SECRET_API_KEY, ... },
//     requireCredentials: () => void, // Throws if credentials missing
//     isEnabled: true
//   },
//   [LLMProviders.GoogleGenerativeAI]: {
//     provider: LLMProviders.GoogleGenerativeAI,
//     availableModels: {...}, // Models available for completion
//     clientOptions: { apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY },
//     requireCredentials: () => void, // Throws if credentials missing
//     isEnabled: true
//   }
// }

// Use in tests to iterate over all providers
Object.values(testConfig).forEach((config) => {
  const { provider, clientOptions, availableModels, requireCredentials } = config;

  describe(`${provider} Provider`, () => {
    beforeAll(() => {
      requireCredentials(); // Ensures API keys are present
    });

    it('should create service', () => {
      const service = LLMServiceFactory.getCompletionService({
        provider,
        options: clientOptions,
        logger: mockLogger, // Optional
      });
      expect(service).toBeDefined();
    });
  });
});

Mock Logger

// The package does not export a ready-made mock logger — when testing
// your integration, create your own:
const mockLogger = {
  info: jest.fn(),
  error: jest.fn(),
  warn: jest.fn(),
  child: jest.fn(() => mockLogger),
};

Type Guards

// Type guards for test assertions
if ('text' in result && result.text) {
  expect(result.text).toContain('expected content');
}

if ('error' in result && result.error) {
  expect(result.error).toBeDefined();
}

Writing Custom Tests

Example of writing a custom integration test using resolveTestConfig:

import {
  describe,
  it,
  expect,
  beforeAll,
} from '@jest/globals';
import {
  LLMServiceFactory,
  LLMPurposes,
  LLMRoles,
  LLMMessageContentType,
  resolveTestConfig,
} from '@mate-academy/llm-gateway';

// Create your own mock logger
const mockLogger = {
  info: jest.fn(),
  error: jest.fn(),
  warn: jest.fn(),
  child: jest.fn(() => mockLogger),
};

describe('Custom LLM Integration Test', () => {
  // Use resolveTestConfig for consistent test configuration
  const testConfig = resolveTestConfig(LLMPurposes.Completion);

  // Test all enabled providers
  Object.values(testConfig).forEach((config) => {
    const { provider, clientOptions, requireCredentials } = config;

    describe(`${provider} Provider`, () => {
      let service;

      beforeAll(() => {
        requireCredentials(); // Validates API keys are present

        service = LLMServiceFactory.getCompletionService({
          provider,
          options: clientOptions,
          logger: mockLogger, // Optional - can be omitted
        });
      });

      it('should process custom request', async () => {
        // Your custom test logic here
        const result = await service.sendMessage({
          message: {
            role: LLMRoles.User,
            content: [{ type: LLMMessageContentType.TEXT, text: 'Test message' }],
          },
          model: Object.values(config.availableModels)[0], // Use first available model
        });

        // Use type guards for assertions
        if ('text' in result && result.text) {
          expect(result.text).toBeDefined();
          expect(typeof result.text).toBe('string');
        } else if ('error' in result && result.error) {
          throw result.error;
        }
      });
    });
  });
});

Developer Guide

Pricing Model Architecture

The LLM Gateway uses a flexible function-based pricing model that supports different token types and caching:

interface LLMModelPricing {
  getPriceForTextInput: (tokens: number) => number;   // Price for regular text input tokens
  getPriceForTextOutput: (tokens: number) => number;  // Price for text output tokens
  getPriceForAudioInput: (tokens: number) => number;  // Price for regular audio input tokens
  getPriceForAudioOutput: (tokens: number) => number; // Price for audio output tokens

  // Optional: Cached token pricing (prompt caching)
  getPriceForCachedTextInput?: (tokens: number) => number;  // Discounted price for cached text tokens
  getPriceForCachedAudioInput?: (tokens: number) => number; // Discounted price for cached audio tokens

  currency: string; // Currency code (e.g., 'USD')
}

This architecture allows:

  • Dynamic Pricing: Support for tiered pricing based on token count
  • Multi-Modal Support: Separate pricing for text and audio tokens
  • Provider Flexibility: Each provider can implement custom pricing logic
  • Cached Token Discounts: Automatic detection and separate pricing for cached tokens with provider-specific discount rates
  • Reasoning Token Support: Proper cost calculation for reasoning tokens in o1/o3 models
  • Character-based Pricing: Some models (e.g., OpenAI TTS) use character count instead of tokens for input pricing

Cached Token Support:

  • OpenAI: 50% discount for cached tokens (GPT-4 models), 10% discount (GPT-5 models), 25% discount (GPT-4.1 models)
  • Google Gemini: 90% discount for cached tokens (all models with context caching)
  • Cached tokens are automatically detected and priced separately
  • If cached pricing is not defined, falls back to regular pricing

Important Notes:

  • Cached tokens are tracked separately and receive automatic discount rates
  • Reasoning tokens (o1/o3 models) are charged as output tokens
  • Text-to-Speech models may use character-based pricing for input (not token-based)
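As a sketch of how these pieces combine, here is a pricing object following the LLMModelPricing shape above, used to price a single request. All rates are invented for illustration (not any real model's prices), and the 50% cached discount simply mirrors the OpenAI example above:

```typescript
// Illustrative pricing object following the LLMModelPricing interface above.
// Rates are invented for the example, not any real model's prices.
const pricing = {
  getPriceForTextInput: (tokens: number) => (2.0 * tokens) / 1_000_000,  // $2.00 per 1M tokens
  getPriceForTextOutput: (tokens: number) => (8.0 * tokens) / 1_000_000, // $8.00 per 1M tokens
  getPriceForAudioInput: (_tokens: number) => 0,
  getPriceForAudioOutput: (_tokens: number) => 0,
  // Cached input tokens at a 50% discount
  getPriceForCachedTextInput: (tokens: number) => (1.0 * tokens) / 1_000_000,
  currency: 'USD',
};

// A request with 70K fresh input tokens, 30K cached input tokens,
// and 5K output tokens:
const cost =
  pricing.getPriceForTextInput(70_000) +
  // Fall back to regular input pricing when cached pricing is not defined.
  (pricing.getPriceForCachedTextInput?.(30_000) ?? pricing.getPriceForTextInput(30_000)) +
  pricing.getPriceForTextOutput(5_000);

console.log(cost.toFixed(2)); // 0.14 + 0.03 + 0.04 ≈ 0.21 USD
```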

Architecture Overview

The LLM Gateway uses several architectural patterns to provide a clean, extensible interface for multiple LLM providers:

Core Architecture Components

Service Layer:

  • Abstract Base Services: LLMBaseService provides common functionality for all services
  • Purpose-Specific Services: Separate services for Completion, Assistance, Speech-to-Text, and Text-to-Speech
  • Provider Implementations: Each provider extends abstract services with specific implementations
  • Instance Caching: Automatic SDK instance pooling with LRU eviction to prevent memory leaks

Instance Caching Architecture:

  • Shared Global Cache: All service types share a single static cache to optimize memory usage
  • Provider-based Caching: SDK instances are cached by provider and credential hash (SHA256)
  • Cross-Purpose Sharing: Different purposes (Completion, Assistance, etc.) share instances for the same provider and credentials
  • LRU Eviction: Maintains up to 10 instances globally using Least Recently Used eviction
  • Automatic Reuse: Instances are automatically reused when switching back to previously used credentials
  • Memory Safety: Prevents memory leaks when credentials change frequently (e.g., multi-tenant scenarios)
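The caching pattern described above can be sketched with a Map, whose insertion order doubles as recency order. This is illustrative only, not the package's actual code:

```typescript
import { createHash } from 'node:crypto';

// Illustrative sketch of provider/credential-keyed instance caching with
// LRU eviction, as described above (not the package's implementation).
const MAX_INSTANCES = 10;
const cache = new Map<string, object>();

function cacheKey(provider: string, credentials: object): string {
  const hash = createHash('sha256')
    .update(JSON.stringify(credentials))
    .digest('hex');
  return `${provider}:${hash}`;
}

function getInstance(
  provider: string,
  credentials: object,
  build: () => object,
): object {
  const key = cacheKey(provider, credentials);
  const existing = cache.get(key);
  if (existing) {
    // Refresh recency: delete + re-set moves the key to the end of the Map.
    cache.delete(key);
    cache.set(key, existing);
    return existing;
  }
  const instance = build();
  cache.set(key, instance);
  if (cache.size > MAX_INSTANCES) {
    // Evict the least recently used entry (first key in insertion order).
    const oldest = cache.keys().next().value as string;
    cache.delete(oldest);
  }
  return instance;
}
```

Switching back to previously used credentials then reuses the cached SDK instance instead of constructing a new one, which is what keeps memory bounded in multi-tenant scenarios.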

Metrics & Reporting:

  • LLMReporterInterface<ReporterContext>: Type-safe reporter interface for metrics collection
  • Automatic Cost Calculation: Built-in cost tracking using flexible pricing functions for text and audio tokens
  • Timer Integration: Automatic duration tracking via initMetricsWriter() pattern
  • Token Usage Tracking: Separate tracking for text and audio tokens (input/output)

Logging Infrastructure:

  • LLMLoggerInterface: Standard logging interface compatible with major logging libraries
  • Child Logger Support: Context propagation through service hierarchies

Schema Architecture Overview

The LLM Gateway uses a driver pattern for schema conversion, providing clean separation between core schema logic and provider-specific implementations.

Core Components:

  • LLMSchema: Core schema builder with unified API
  • SchemaAdapterInterface: Simple contract for provider-specific schema converters
  • SchemaAdapterRegistry: Type-safe registry ensuring all providers are handled
  • Provider Adapters: Convert generic JSON Schema to provider-specific formats

How It Works:

// The schema uses a unified API regardless of provider
const schema = LLMSchema.object({
  name: LLMSchema.string(),
  age: LLMSchema.number().min(1),
});

// Internally, services call _toProviderSchema() which automatically
// converts to the correct provider-specific format:
schema._toProviderSchema(LLMProviders.OpenAI);           // → OpenAI JSON Schema format
schema._toProviderSchema(LLMProviders.GoogleGenerativeAI); // → Google Type-based format

Provider Schema Adapters:

Each provider has its own schema adapter located in src/providers/{Provider}/schemas/:

  • OpenAISchemaAdapter: Converts to OpenAI's JSON Schema format, handles strict mode requirements
  • GoogleSchemaAdapter: Converts to Google's Type-based schema format using their Type enum

Type-Safe Registry:

Schema adapters are managed through a centralized, type-safe registry that ensures compile-time safety:

// src/utilities/schema/SchemaAdapterRegistry.ts
export const SCHEMA_ADAPTER_REGISTRY = {
  OpenAI: new OpenAISchemaAdapter(),
  GoogleGenerativeAI: new GoogleSchemaAdapter(),
} as const satisfies Record<LLMProviders, SchemaAdapterInterface | null>;

Adding a New Provider

To add support for a new LLM provider, follow these steps:

1. Create Provider Directory Structure

Create a new directory in src/providers with your provider name, following the established pattern:

src/providers/YourProvider/
├── index.ts                    # Entry point for provider exports
├── YourProvider.constants.ts   # Provider-specific constants
├── YourProvider.entity.ts      # Provider-specific entity
├── YourProvider.typedefs.ts    # TypeScript definitions
├── YourProviderService.factory.ts # Factory for your provider's services
├── schemas/                    # Schema conversion adapters
│   └── YourProviderSchemaAdapter.ts
└── services/                   # Provider service implementations
    ├── index.ts
    ├── YourProviderCompletionService.ts
    └── YourProviderAssistanceService.ts

2. Add Provider to LLM Providers Enum

Update the LLM providers enum in src/LLMService.typedefs.ts:

export enum LLMProviders {
  OpenAI = 'OpenAI',
  GoogleGenerativeAI = 'GoogleGenerativeAI',
  YourProvider = 'YourProvider',
}

3. Define Provider-Specific Types

Create type definitions in src/providers/YourProvider/YourProvider.typedefs.ts:

// Define model names as an enum for type safety
export enum YourProviderModelNames {
  MODEL_ONE = 'model-one',
  MODEL_TWO = 'model-extended',
}

// Define message roles if applicable
export enum YourProviderRoles {
  User = 'user',
  Assistant = 'assistant',
  System = 'system',
}

// Add any other provider-specific enums or interfaces

Then ensure your provider is properly integrated in the main type system by updating the necessary type mappings in src/LLMService.typedefs.ts:

// Add import for your provider's types
import { type YourProviderModelNames } from './providers/YourProvider/YourProvider.typedefs';


// Update LLMInstances type mapping
export type LLMInstances = {
  // ...existing code...
  [LLMProviders.YourProvider]: YourProviderClient; // Your provider's client type
};

// Update LLMInstanceOptions type mapping
export type LLMInstanceOptions = {
  // ...existing code...
  [LLMProviders.YourProvider]: {
    apiKey: string;
    // Add other provider-specific options
  };
};

// Update LLMModelName type mapping
export type LLMModelName = {
  // ...existing code...
  [LLMProviders.YourProvider]: YourProviderModelNames;
};

4. Create Schema Adapter

Implement a schema adapter in src/providers/YourProvider/schemas/YourProviderSchemaAdapter.ts:

import type { SchemaAdapterInterface } from '@/utilities/schema/SchemaAdapterInterface';

export class YourProviderSchemaAdapter implements SchemaAdapterInterface {
  convertSchema(jsonSchema: any): any {
    // Convert JSON Schema to your provider's specific format
    // Example: transform to provider-specific schema structure
    return this.transformToYourProviderFormat(jsonSchema);
  }

  private transformToYourProviderFormat(jsonSchema: any): any {
    // Implement provider-specific schema transformation logic
    // Handle objects, arrays, strings, numbers, etc.
    // Return the schema in your provider's expected format

    if (jsonSchema.type === 'object') {
      // Handle object schemas
      return {
        // Your provider's object schema format
      };
    }

    // Handle other schema types...
    return jsonSchema;
  }
}

5. Add Adapter to Schema Registry

Update the schema adapter registry in src/utilities/schema/SchemaAdapterRegistry.ts:

import { YourProviderSchemaAdapter } from '@/providers/YourProvider/schemas/YourProviderSchemaAdapter';

export const SCHEMA_ADAPTER_REGISTRY = {
  OpenAI: new OpenAISchemaAdapter(),
  GoogleGenerativeAI: new GoogleSchemaAdapter(),
  YourProvider: new YourProviderSchemaAdapter(), // Add your adapter here
  // TypeScript will enforce that ALL providers have adapters
} as const satisfies Record<LLMProviders, SchemaAdapterInterface | null>;

If your provider doesn't support structured output, set it to null:

YourProvider: null, // Provider doesn't support structured output

6. Implement Provider Constants

Define constants in src/providers/YourProvider/YourProvider.constants.ts:

import {
  type LLMProviderAvailableModels,
  type LLMProviderModelsByPurpose,
  type LLMProviders,
  LLMPurposes,
  type LLMServiceBuilder,
} from '@/LLMService.typedefs';
import { YourProviderModelNames } from './YourProvider.typedefs';
import {
  YourProviderAssistanceService,
  YourProviderCompletionService,
  YourProviderSpeechToTextService,
  YourProviderTextToSpeechService,
} from './services';
import { pick } from '@/utilities/functional.utils';

// Define available models with their capabilities, configurations and pricing
const YOUR_PROVIDER_AVAILABLE_MODELS = {
  [YourProviderModelNames.MODEL_ONE]: {
    name: YourProviderModelNames.MODEL_ONE,
    limits: {
      maxInputTokens: 8_000,
      maxOutputTokens: 2_000,
    },
    config: {
      temperature: 0.2,
    },
    pricing: {
      getPriceForTextInput: (tokens) => 0.5 * tokens / 1_000_000,  // Cost per million text input tokens
      getPriceForTextOutput: (tokens) => 1.5 * tokens / 1_000_000, // Cost per million text output tokens
      getPriceForAudioInput: (tokens) => 0, // Cost per million audio input tokens
      getPriceForAudioOutput: (tokens) => 0, // Cost per million audio output tokens
      currency: 'USD' as const,
    },
  },
  [YourProviderModelNames.MODEL_TWO]: {
    name: YourProviderModelNames.MODEL_TWO,
    limits: {
      maxInputTokens: 16_000,
      maxOutputTokens: 4_000,
    },
    config: {
      temperature: 0.2,
    },
    pricing: {
      getPriceForTextInput: (tokens) => 1 * tokens / 1_000_000,   // Cost per million text input tokens
      getPriceForTextOutput: (tokens) => 3 * tokens / 1_000_000,  // Cost per million text output tokens
      getPriceForAudioInput: (tokens) => 0, // Cost per million audio input tokens
      getPriceForAudioOutput: (tokens) => 0, // Cost per million audio output tokens
      currency: 'USD' as const,
    },
  },
} as const satisfies LLMProviderAvailableModels<
  LLMProviders.YourProvider
>;

// Specify which models are available for each purpose
export const YOUR_PROVIDER_MODELS = {
  [LLMPurposes.Completion]: pick(
    YOUR_PROVIDER_AVAILABLE_MODELS,
    [
      YourProviderModelNames.MODEL_ONE,
      YourProviderModelNames.MODEL_TWO,
    ],
  ),
  [LLMPurposes.Assistance]: pick(
    YOUR_PROVIDER_AVAILABLE_MODELS,
    [
      YourProviderModelNames.MODEL_TWO, // Only MODEL_TWO supports assistance
    ],
  ),
  [LLMPurposes.SpeechToText]: pick(
    YOUR_PROVIDER_AVAILABLE_MODELS,
    [
      YourProviderModelNames.MODEL_ONE, // Speech-to-text capable model
    ],
  ),
  [LLMPurposes.TextToSpeech]: pick(
    YOUR_PROVIDER_AVAILABLE_MODELS,
    [
      YourProviderModelNames.MODEL_ONE, // Text-to-speech capable model
    ],
  ),
} as const satisfies LLMProviderModelsByPurpose<
  LLMPurposes,
  LLMProviders.YourProvider
>;

// Define service builders for each LLM purpose
export const YOUR_PROVIDER_SERVICE_BUILDERS = {
  [LLMPurposes.Completion]: (logger, reporter, options) => (
    new YourProviderCompletionService(logger, reporter, options)
  ),
  [LLMPurposes.Assistance]: (logger, reporter, options) => (
    new YourProviderAssistanceService(logger, reporter, options)
  ),
  [LLMPurposes.SpeechToText]: (logger, reporter, options) => (
    new YourProviderSpeechToTextService(logger, reporter, options)
  ),
  [LLMPurposes.TextToSpeech]: (logger, reporter, options) => (
    new YourProviderTextToSpeechService(logger, reporter, options)
  ),
} as const satisfies {
  [purpose in LLMPurposes]: (
    LLMServiceBuilder<LLMProviders.YourProvider, purpose> | null
  )
};

7. Implement Provider Entity (if needed)

Create the entity class in src/providers/YourProvider/YourProvider.entity.ts:

export class YourProviderEntity {
  // Implement prov