

OpenRouter TypeScript SDK


A complete, type-safe TypeScript SDK for the OpenRouter API. Node.js only (ESM), with full API coverage, streaming support, and comprehensive error handling.

Features

  • Full API Coverage: Chat completions, streaming, models, providers, credits, analytics
  • Type Safety: Complete TypeScript types for all endpoints and responses
  • Streaming: ReadableStream (low-level) or AsyncIterable (recommended)
  • Advanced Features: Tool calling, structured outputs, multimodal (vision), provider preferences
  • Batch Requests: Execute multiple requests concurrently with rate limiting
  • Validation Helpers: Pre-validate parameters, check model capabilities, truncate messages
  • Reliability: Automatic retry with exponential backoff, timeouts, proper error handling
  • Security: Automatic redaction of sensitive data in logs
  • Logging: Multiple logger implementations (default, silent, formatted)
  • 100% Test Coverage: 92 tests covering all features

Installation

npm install @pierreraby/openrouter-client
# or
pnpm add @pierreraby/openrouter-client
# or
yarn add @pierreraby/openrouter-client

Quick Start

import OpenRouterClient from '@pierreraby/openrouter-client';

const client = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY
});

// Simple chat completion
const response = await client.createChatCompletion({
  model: 'openai/gpt-3.5-turbo',
  messages: [
    { role: 'user', content: 'Hello!' }
  ]
});

console.log(response.choices[0].message.content);

Streaming (Recommended)

// Using AsyncIterable (cleanest approach)
for await (const chunk of client.streamChatCompletion({
  model: 'openai/gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Tell me a story' }]
})) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
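
If you need lower-level control, createChatCompletionStream() returns a ReadableStream instead. A minimal sketch, assuming the stream yields parsed chunk objects (see examples/02-streaming.ts for the exact shape):

// Using ReadableStream (low-level) — assumes parsed chunk objects
const stream = await client.createChatCompletionStream({
  model: 'openai/gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Tell me a story' }]
});

const reader = stream.getReader();
try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const content = value?.choices?.[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} finally {
  reader.releaseLock();
}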

Examples

The examples/ directory contains comprehensive examples for all features:

Basic Usage (01-03)

  • 01-basic-usage.ts: Client initialization, simple chat completion
  • 02-streaming.ts: ReadableStream vs AsyncIterable streaming
  • 03-tool-calls.ts: Function calling with helpers

Advanced Features (04-07)

  • 04-structured-outputs.ts: JSON mode and json_schema
  • 05-multimodal.ts: Vision with images (URL, base64, multiple)
  • 06-provider-preferences.ts: Provider routing, fallbacks, quantization
  • 07-cost-tracking.ts: Cost monitoring with getGeneration(), getCredits()

Production Patterns (08-12)

  • 08-error-handling.ts: Robust error handling strategies
  • 09-retry-backoff.ts: Retry configuration and best practices
  • 10-prompt-caching.ts: Anthropic caching for 90% cost reduction
  • 11-model-capabilities.ts: Discover model features and validate compatibility
  • 12-rate-limits.ts: Monitor rate limits, budgets, and usage

Validation & Optimization (13-16)

  • 13-validation-helpers.ts: Parameter validation, feature checking, message truncation
  • 14-batch-requests.ts: Concurrent batch processing with rate limiting
  • 15-tool-message-validation.ts: Tool message formatting and common validation errors
  • 16-immediate-cost-tracking.ts: Immediate cost tracking via response.usage (recommended)

Run examples with:

tsx examples/01-basic-usage.ts

Configuration

const client = new OpenRouterClient({
  apiKey: string;              // Required: Your OpenRouter API key
  baseURL?: string;            // Default: 'https://openrouter.ai/api/v1'
  timeout?: number;            // Default: 30000 (30s)
  maxRetries?: number;         // Default: 3
  retryDelay?: number;         // Default: 1000 (1s initial delay)
  headers?: Record<string, string>; // Additional headers
  logger?: Logger;             // Custom logger
  logLevel?: LogLevel;         // 'error' | 'warn' | 'info' | 'debug'
});

Recommended Configurations

Development:

const client = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  maxRetries: 1,
  logLevel: 'debug'
});

Production:

const client = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  timeout: 60000,
  maxRetries: 5,
  retryDelay: 2000,
  logLevel: 'error'
});

Advanced Features

Tool Calling (Function Calling)

const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }
];

// Conversation history (reused when sending tool results back)
const messages = [
  { role: 'user', content: "What's the weather in Paris?" }
];

const response = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages,
  tools,
  tool_choice: 'auto'
});

// Parse and execute tool calls
if (response.choices[0].message.tool_calls) {
  // Keep the assistant message (which carries the tool_calls) in the history
  messages.push(response.choices[0].message);

  const parsedCalls = OpenRouterClient.parseToolCalls(
    response.choices[0].message.tool_calls
  );

  for (const call of parsedCalls) {
    // yourFunctions: your own map of tool name -> implementation
    const result = yourFunctions[call.function.name](call.function.arguments);

    // Expects string content; use createToolResponseFromResult for objects
    const toolMessage = OpenRouterClient.createToolResponseMessage(
      call.id,
      result,
      call.function.name
    );
    messages.push(toolMessage);
  }
}
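
With the tool results appended, send the conversation back so the model can produce its final answer. A sketch of this standard round trip (executeToolCalls() can also run the calls for you; see examples/03-tool-calls.ts):

// Second request: the model sees the tool results and answers in natural language
const finalResponse = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages
});
console.log(finalResponse.choices[0].message.content);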

Structured Outputs (JSON Schema)

const response = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [{ role: 'user', content: 'Generate a person profile' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person_profile',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'number' },
          occupation: { type: 'string' }
        },
        required: ['name', 'age', 'occupation']
      }
    }
  }
});

const person = JSON.parse(response.choices[0].message.content!);
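
Because the model's reply arrives as a string, you can pair the schema with a matching interface to keep type safety downstream (the PersonProfile interface here is illustrative):

interface PersonProfile {
  name: string;
  age: number;
  occupation: string;
}

const profile: PersonProfile = JSON.parse(response.choices[0].message.content!);
console.log(`${profile.name} (${profile.age}) works as ${profile.occupation}`);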

Multimodal (Vision)

const response = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg',
            detail: 'high'
          }
        }
      ]
    }
  ]
});
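
Base64-encoded images use the same message shape with a data URL, following the OpenAI-compatible format. A small sketch (the file path is illustrative; see examples/05-multimodal.ts):

import { readFileSync } from 'node:fs';

const imageBase64 = readFileSync('photo.jpg').toString('base64');

const response = await client.createChatCompletion({
  model: 'openai/gpt-4o-mini',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this photo' },
        {
          type: 'image_url',
          image_url: { url: `data:image/jpeg;base64,${imageBase64}` }
        }
      ]
    }
  ]
});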

Cost Tracking

// Get account credits
const credits = await client.getCredits();
console.log(`Remaining: $${credits.total_credits - credits.total_usage}`);

// Track specific generation (⚠️ NOT immediately available - see note below)
const response = await client.createChatCompletion({ /* ... */ });
const stats = await client.getGeneration(response.id);
console.log(`Cost: $${stats.total_cost}`);

// ⚠️ RECOMMENDED: use response.usage for immediate cost tracking
if (response.usage) {
  console.log(`Prompt tokens: ${response.usage.prompt_tokens}`);
  console.log(`Completion tokens: ${response.usage.completion_tokens}`);
  console.log(`Total tokens: ${response.usage.total_tokens}`);
  // Calculate approximate cost based on model pricing
}

// Estimate before request
const messages = [/* ... */];
const estimatedTokens = client.countMessagesTokens(messages);
console.log(`Estimated tokens: ${estimatedTokens}`);

Note: getGeneration() statistics are not immediately available after a request completes. OpenRouter needs time to process them. For real-time cost tracking, use response.usage instead (see example 16).
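
To turn the token estimate into a rough dollar figure before sending, you can combine it with the pricing returned by getModelCapabilities(). A sketch, assuming pricing is expressed per 1K tokens (as in the Model Capabilities section below); verify the unit for your SDK version before relying on it:

// Rough pre-flight cost estimate (assumes caps.pricing is per 1K tokens)
const caps = await client.getModelCapabilities('openai/gpt-3.5-turbo');
const estimatedCost = (estimatedTokens * caps.pricing.prompt) / 1000;
console.log(`Estimated prompt cost: ~$${estimatedCost.toFixed(5)}`);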

Prompt Caching (Anthropic)

Reduce costs by up to 90% by caching portions of your prompts with Anthropic's Claude models:

// Mark system prompt as cacheable (must be >1024 tokens for Claude 3.5 Sonnet)
const systemPrompt = OpenRouterClient.markMessageAsCacheable({
  role: 'system',
  content: 'Long instructions, examples, or context that will be reused...' // >1024 tokens
});

// First call: cache creation (10% surcharge)
const response1 = await client.createChatCompletion({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [
    systemPrompt,
    { role: 'user', content: 'First question' }
  ],
  usage: { include: true }  // ✅ Get detailed cache metrics
});

// Second call: cache hit (90% discount)
const response2 = await client.createChatCompletion({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [
    systemPrompt,
    { role: 'user', content: 'Second question' }
  ],
  usage: { include: true }
});

// Track cache performance (real-time)
console.log('Cached tokens:', response2.usage?.prompt_tokens_details?.cached_tokens);
// Output: 1668 (90% discount on these tokens!)

// Or track via generation ID (async, more accurate)
const stats = await client.getGeneration(response2.id);
console.log('Cache discount:', stats.cache_discount); // e.g., 0.0045036 ($)
console.log('Native cached tokens:', stats.native_tokens_cached); // e.g., 1668

Two methods to track cache metrics:

  1. Real-time with usage: { include: true } (recommended for development)

    • Returns prompt_tokens_details.cached_tokens in response
    • Adds ~200ms latency to final response
    • Best for debugging and real-time monitoring
  2. Async with getGeneration(id) (recommended for production)

    • Returns cache_discount (actual $ savings) and native_tokens_cached
    • No latency impact on responses
    • Best for cost analytics and reporting

Requirements:

  • Minimum 1024 tokens for Claude 3.7/3.5 Sonnet and 3 Opus
  • Minimum 2048 tokens for Claude 3.5/3 Haiku
  • Cache expires after 5 minutes of inactivity

Best practices:

  • Cache stable content (system prompts, reference docs, examples)
  • Don't cache dynamic content (user messages, real-time data)
  • Use provider-specific models (e.g., anthropic/claude-3.5-sonnet)
  • See examples/10-prompt-caching.ts for complete examples with both tracking methods

Model Capabilities Discovery

Automatically discover what features a model supports before using it:

const caps = await client.getModelCapabilities('anthropic/claude-3.5-sonnet');

// Check capabilities
if (caps.supportsVision) {
  // Can send images
}
if (caps.supportsTools) {
  // Can use function calling
}
if (caps.supportsJSON) {
  // Can use response_format
}

// Access detailed info
console.log('Context length:', caps.maxContextLength);
console.log('Input modalities:', caps.inputModalities); // ['text', 'image']
console.log('Supported params:', caps.supportedParameters);
console.log('Pricing:', caps.pricing); // { prompt: 0.003, completion: 0.015 }

Use cases:

  • Validate model compatibility before requests
  • Build dynamic UIs that adapt to model capabilities
  • Auto-select the best model for your needs (see the sketch below)
  • See examples/11-model-capabilities.ts for advanced patterns
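
For example, a minimal auto-selection sketch built on supportsFeature(), picking the first candidate model that can handle images (the candidate list is illustrative):

// Pick the first model from a candidate list that supports vision
const candidates = [
  'openai/gpt-4o-mini',
  'anthropic/claude-3.5-sonnet',
  'openai/gpt-3.5-turbo'
];

let chosen: string | undefined;
for (const id of candidates) {
  if (await client.supportsFeature(id, 'vision')) {
    chosen = id;
    break;
  }
}
console.log('Selected model:', chosen ?? 'none support vision');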

Rate Limits & Usage Monitoring

Track your API usage, budgets, and rate limits in real-time:

// Get detailed key information
const keyInfo = await client.getKeyInfo();
console.log('Usage:', keyInfo.usage);
console.log('Limit:', keyInfo.limit || 'Unlimited');
console.log('Free tier:', keyInfo.is_free_tier);
if (keyInfo.rate_limit) {
  console.log(`${keyInfo.rate_limit.requests} requests per ${keyInfo.rate_limit.interval}`);
}

// Get credits with current rate limit status
const credits = await client.getCredits();
console.log('Credits remaining:', credits.total_credits - credits.total_usage);
if (credits.rate_limit) {
  console.log('Requests remaining:', credits.rate_limit.remaining);
  console.log('Resets at:', new Date(credits.rate_limit.reset * 1000));
}

Benefits:

  • Prevent 429 errors with proactive throttling (see the sketch below)
  • Monitor budget usage in real-time
  • Set up alerts before hitting limits
  • See examples/12-rate-limits.ts for monitoring patterns
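
A minimal throttling sketch built on getCredits(), sleeping until the window resets when few requests remain (the threshold is illustrative; field names follow the example above):

// Wait out the rate-limit window when close to exhaustion
async function throttleIfNeeded(client: OpenRouterClient) {
  const credits = await client.getCredits();
  if (credits.rate_limit && credits.rate_limit.remaining < 2) {
    const resetMs = credits.rate_limit.reset * 1000 - Date.now();
    if (resetMs > 0) {
      console.warn(`Near rate limit, sleeping ${resetMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, resetMs));
    }
  }
}

await throttleIfNeeded(client);
const response = await client.createChatCompletion({ /* ... */ });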

Validation Helpers

Pre-validate requests before sending them to save costs and avoid errors:

// Check if a model supports a specific feature
const supportsVision = await client.supportsFeature(
  'anthropic/claude-3.5-sonnet',
  'vision'
);

if (!supportsVision) {
  console.log('This model cannot process images');
}

// Validate parameters against model capabilities
const validation = await client.validateParams('openai/gpt-3.5-turbo', {
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
  tools: [/* ... */],
  max_tokens: 5000
});

if (!validation.valid) {
  console.error('Errors:', validation.errors);
  // Example: ["Model doesn't support streaming", "max_tokens exceeds limit"]
}

if (validation.warnings?.length) {
  console.warn('Warnings:', validation.warnings);
  // Example: ["max_tokens is high and may be expensive"]
}

// Truncate conversation to fit context window
const longConversation = [
  { role: 'system', content: 'You are helpful' },
  // ... 50+ messages
];

const truncated = client.truncateMessages(longConversation, 4000);
// Keeps system message + most recent messages that fit in 4000 tokens

Benefits:

  • Validate before spending credits on invalid requests
  • Prevent errors for unsupported features
  • Auto-truncate long conversations (FIFO, preserves system message)
  • See examples/13-validation-helpers.ts for complete workflows

Batch Requests

Execute multiple chat completion requests concurrently with automatic rate limiting:

// Prepare multiple requests
const requests = [
  {
    model: 'openai/gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Translate "hello" to French' }]
  },
  {
    model: 'openai/gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Translate "hello" to Spanish' }]
  },
  {
    model: 'openai/gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Translate "hello" to German' }]
  }
];

// Execute with concurrency control
const results = await client.batchChatCompletion(requests, {
  maxConcurrent: 5,      // Max 5 concurrent requests (default)
  stopOnError: false     // Continue on errors (default)
});

// Process results
results.forEach((result, idx) => {
  if (result.success && result.response) {
    console.log(`Request ${idx}:`, result.response.choices[0].message.content);
  } else {
    console.error(`Request ${idx} failed:`, result.error?.message);
  }
});

Options:

  • maxConcurrent: Limit concurrent requests (default: 5)
  • stopOnError: Stop on first error (default: false)

Benefits:

  • 2-5x faster than sequential requests
  • Automatic concurrency control
  • Individual error handling per request
  • See examples/14-batch-requests.ts for advanced patterns

Error Handling

import { OpenRouterError } from '@pierreraby/openrouter-client';

try {
  const response = await client.createChatCompletion({ /* ... */ });
} catch (error) {
  if (error instanceof OpenRouterError) {
    console.error('OpenRouter Error:', {
      message: error.message,
      status: error.status,
      code: error.code,
      requestId: error.requestId
    });
    
    if (error.status === 429) {
      // Handle rate limit
    } else if (error.status && error.status >= 500) {
      // Handle server error
    }
  }
}

Logging

import { formattedLogger, createLogger, silentLogger } from '@pierreraby/openrouter-client';

// Formatted logger with timestamps and colors
const formattedClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: formattedLogger,
  logLevel: 'info'
});

// Custom prefixed logger
const prefixedClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: createLogger('MyApp'),
  logLevel: 'debug'
});

// Silent logger (no output)
const silentClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: silentLogger
});
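
You can also supply your own logger. A hypothetical sketch, assuming the Logger interface exposes one method per log level (error, warn, info, debug); check the TypeDoc for the exact signature:

// Hypothetical custom logger — verify the Logger interface in the TypeDoc
const jsonLogger = {
  error: (...args: unknown[]) => console.error(JSON.stringify({ level: 'error', args })),
  warn: (...args: unknown[]) => console.warn(JSON.stringify({ level: 'warn', args })),
  info: (...args: unknown[]) => console.info(JSON.stringify({ level: 'info', args })),
  debug: (...args: unknown[]) => console.debug(JSON.stringify({ level: 'debug', args }))
};

const jsonClient = new OpenRouterClient({
  apiKey: process.env.OPENROUTER_API_KEY!,
  logger: jsonLogger
});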

API Reference

📚 Complete API Documentation (TypeDoc)

See docs/INDEX.md for architectural decisions and contribution guidelines.

Main Methods

Chat Completions:

  • createChatCompletion(params) - Standard chat completion
  • streamChatCompletion(params) - Streaming with AsyncIterable (recommended)
  • createChatCompletionStream(params) - Streaming with ReadableStream
  • batchChatCompletion(requests, options?) - Execute multiple requests concurrently

Models & Providers:

  • listModels() - Get available models
  • getModel(id) - Get model details
  • getModelEndpoints(id) - Get model endpoints
  • getModelCapabilities(id) - Get detailed model capabilities
  • listProviders() - Get available providers

Account & Usage:

  • getCredits() - Get account credits (with rate limits)
  • getKeyInfo() - Get API key information and limits
  • getActivity() - Get activity analytics
  • getGeneration(id) - Get generation statistics

Validation & Helpers:

  • supportsFeature(modelId, feature) - Check if model supports a feature
  • validateParams(modelId, params) - Validate parameters against model
  • truncateMessages(messages, maxTokens) - Truncate messages to fit context
  • countTokens(text) - Estimate tokens in text
  • countMessagesTokens(messages) - Estimate tokens in messages
  • validateApiKey() - Validate API key

Static Helpers

  • OpenRouterClient.parseToolCalls(toolCalls) - Parse tool calls
  • OpenRouterClient.createToolResponseMessage(id, content, name?) - Create tool response (requires string content)
  • OpenRouterClient.createToolResponseFromResult(id, result, name?) - Create tool response from any object (auto-serializes)
  • OpenRouterClient.executeToolCalls(toolCalls, functions) - Execute tool calls
  • OpenRouterClient.markMessageAsCacheable(message) - Mark message for caching

Development

# Install dependencies
pnpm install

# Run tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Build
pnpm build

# Lint
pnpm lint

# Format
pnpm format

Requirements

  • Node.js 22.x LTS or later (native fetch support)
  • TypeScript 5.9.x or later
  • ESM only (no CommonJS)

License

MIT

Contributing

See docs/INDEX.md for contribution guidelines and architecture decisions.