@happyvertical/ai

v0.74.10

Standardized AI interface supporting OpenAI, LiteLLM, Bifrost, Ollama, Anthropic, Gemini, Bedrock, Hugging Face, Claude CLI, and Qwen3-TTS

Unified interface for AI model interactions across multiple providers. Supports OpenAI, LiteLLM, Bifrost, Ollama, Anthropic Claude, Google Gemini, AWS Bedrock, Hugging Face, Claude CLI, and Qwen3-TTS with a consistent API for chat, completions, embeddings, streaming, function calling, image operations, text-to-speech, and gateway admin provisioning where available.

Installation

pnpm add @happyvertical/ai

Requires @happyvertical/utils as a peer dependency.

Quick Start

import { getAI } from '@happyvertical/ai';

const ai = await getAI({
  type: 'openai',
  apiKey: process.env.OPENAI_API_KEY!,
  defaultModel: 'gpt-4o'
});

// Chat completion
const response = await ai.chat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is TypeScript?' }
]);
console.log(response.content);

// Simple message (convenience wrapper around chat)
const reply = await ai.message('Explain generics in one sentence');

// Streaming
for await (const chunk of ai.stream([
  { role: 'user', content: 'Write a haiku' }
])) {
  process.stdout.write(chunk);
}

Providers

// OpenAI (default when type is omitted)
const openai = await getAI({ apiKey: 'sk-...' });

// LiteLLM (OpenAI-compatible gateway)
const litellm = await getAI({
  type: 'litellm',
  apiKey: process.env.LITELLM_API_KEY!,
  baseUrl: process.env.LITELLM_BASE_URL || 'https://llm.happyvertical.com/v1',
  defaultModel: process.env.LITELLM_MODEL, // Use a model id returned by /v1/models
});

// Bifrost (OpenAI-compatible gateway with governance admin APIs)
const bifrost = await getAI({
  type: 'bifrost',
  apiKey: process.env.BIFROST_API_KEY!,
  adminUser: process.env.BIFROST_ADMIN_USER,
  adminPassword: process.env.BIFROST_ADMIN_PASSWORD,
  adminUrl: process.env.BIFROST_ADMIN_URL,
  baseUrl: process.env.BIFROST_BASE_URL || 'http://localhost:8080',
  defaultModel: process.env.BIFROST_MODEL,
});

// Ollama (local by default)
const ollama = await getAI({
  type: 'ollama',
  baseUrl: process.env.OLLAMA_BASE_URL || process.env.OLLAMA_HOST || 'http://localhost:11434',
  apiKey: process.env.OLLAMA_API_KEY, // Optional, only needed for remote/cloud hosts
  defaultModel: process.env.OLLAMA_MODEL, // Optional; otherwise the first compatible local model is selected
});

// Bare host:port values are also accepted and normalized to http://
const ollamaNode = await getAI({
  type: 'ollama',
  baseUrl: 'warthog:11434',
});

// Anthropic Claude
const claude = await getAI({ type: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY! });

// Google Gemini
const gemini = await getAI({ type: 'gemini', apiKey: process.env.GEMINI_API_KEY! });

// AWS Bedrock
const bedrock = await getAI({
  type: 'bedrock',
  region: 'us-east-1',
  credentials: { accessKeyId: '...', secretAccessKey: '...' }
});

// Hugging Face
const hf = await getAI({ type: 'huggingface', apiToken: process.env.HF_TOKEN! });

// Claude CLI (uses Claude Max subscription, no API key needed)
const cli = await getAI({ type: 'claude-cli', defaultModel: 'sonnet' });

// Qwen3-TTS (text-to-speech only)
const tts = await getAI({ type: 'qwen3-tts', endpoint: 'http://localhost:8880' });
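The Ollama example above notes that bare host:port values are normalized to http://. A minimal sketch of what such normalization could look like follows. The `normalizeBaseUrl` helper is hypothetical and is not taken from the package; the rules the package actually applies (for example, around trailing slashes) may differ.

```typescript
// Hypothetical helper, not part of @happyvertical/ai: one way to turn a
// bare host:port value into an http:// URL.
function normalizeBaseUrl(input: string): string {
  const trimmed = input.trim();
  // Values that already carry an http(s) scheme are left as they are.
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `http://${trimmed}`;
  // Assumption: trailing slashes are dropped so paths can be appended consistently.
  return withScheme.replace(/\/+$/, '');
}

console.log(normalizeBaseUrl('warthog:11434'));
```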

Gateway Admin

Gateway providers that support provisioning expose ai.admin.

const ai = await getAI({
  type: 'bifrost',
  apiKey: process.env.BIFROST_API_KEY!,
  adminUrl: process.env.BIFROST_ADMIN_URL || 'http://localhost:8080',
  adminUser: process.env.BIFROST_ADMIN_USER!,
  adminPassword: process.env.BIFROST_ADMIN_PASSWORD!,
  baseUrl: 'http://localhost:8080',
});

const project = await ai.admin!.createProject({
  name: 'Tenant A Production',
  tenantId: 'customer-tenant-a',
  budget: { maxLimit: 100, resetDuration: '1M' },
});

const key = await ai.admin!.createVirtualKey({
  name: 'Tenant A API Key',
  projectId: project.id,
  providerConfigs: [
    {
      provider: 'openai',
      weight: 1,
      allowedModels: ['gpt-4o-mini'],
    },
  ],
  keyIds: ['*'],
  budget: { maxLimit: 25, resetDuration: '1M' },
  rateLimit: {
    tokenMaxLimit: 10000,
    tokenResetDuration: '1h',
    requestMaxLimit: 100,
    requestResetDuration: '1m',
  },
});

console.log(key.key);

LiteLLM uses the same SDK surface, mapping projects to LiteLLM teams and virtual keys to /key/generate.

Opt-In Rate-Limit Pacing

Use rateLimit when multiple calls share the same provider budget and you want getAI() to serialize requests, honor Retry-After hints, and retry only rate-limit failures.

Pacing is enabled when:

  • you set enabled: true, or
  • you omit enabled and set any pacing field such as key, cooldownMs, initialDelayMs, or maxAttempts

const ai = await getAI({
  type: 'gemini',
  apiKey: process.env.GEMINI_API_KEY!,
  defaultModel: 'gemini-2.5-flash',
  rateLimit: {
    enabled: true,
    key: 'gemini:shared-batch-key',
    cooldownMs: 2000,
    initialDelayMs: 15000,
    maxAttempts: 3,
  },
});

  • key coordinates pacing across multiple clients in the same process
  • cooldownMs spaces successful calls that share the same budget
  • initialDelayMs is the fallback retry delay when the provider omits Retry-After
  • maxAttempts counts the first call plus any rate-limit retries

When rateLimit is omitted, or enabled: false is set explicitly, no pacing is applied and getAI() behaves exactly as it did before the option was introduced.
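The opt-in rule can be restated as a small predicate. This is a sketch under stated assumptions, not the package's implementation: `RateLimitOptions` is a local stand-in that covers only the four pacing fields named above, and `isPacingEnabled` is a hypothetical name.

```typescript
// Local stand-in for the documented rateLimit pacing fields.
interface RateLimitOptions {
  enabled?: boolean;
  key?: string;
  cooldownMs?: number;
  initialDelayMs?: number;
  maxAttempts?: number;
}

// Hypothetical predicate mirroring the documented opt-in rule.
function isPacingEnabled(rateLimit?: RateLimitOptions): boolean {
  if (!rateLimit) return false;
  // An explicit enabled value always wins, including enabled: false.
  if (rateLimit.enabled !== undefined) return rateLimit.enabled;
  // Otherwise any pacing field opts in.
  const pacingFields = [
    rateLimit.key,
    rateLimit.cooldownMs,
    rateLimit.initialDelayMs,
    rateLimit.maxAttempts,
  ];
  return pacingFields.some((value) => value !== undefined);
}

console.log(isPacingEnabled({ cooldownMs: 2000 }));
console.log(isPacingEnabled({ enabled: false, cooldownMs: 2000 }));
```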

rateLimit Options

| Field | Type | Default | Notes |
|------|------|---------|------|
| enabled | boolean | unset | Set to true for explicit opt-in, or false to force pacing off even if other pacing fields are present |
| key | string | derived | Shared budget key; clients with the same key coordinate with each other |
| cooldownMs | number | 0 | Minimum delay after a successful call before the next call with the same key |
| initialDelayMs | number | 5000 | Fallback retry delay when the provider does not return Retry-After |
| maxAttempts | number | 3 | Total attempts, including the initial call |
| requestsPerMinute | number | provider-specific | Used by qwen3-tts local token-bucket limiting |
| maxConcurrent | number | provider-specific | Used by qwen3-tts local concurrency limiting |

  • If key is omitted, @happyvertical/ai derives a provider-scoped key from the configured credentials
  • Setting any of key, cooldownMs, initialDelayMs, or maxAttempts also opts in when enabled is omitted
  • Only normalized rate-limit failures are retried
  • stream() is left unchanged; pacing is applied to the promise-returning request methods
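To illustrate how a shared key and cooldownMs could interact, here is a minimal in-process sketch. It is not the package's implementation: `KeyedPacer`, `Sleep`, and `realSleep` are hypothetical names, and skipping the cooldown after a failed call is an assumption made for this sketch. Calls that share a key run one at a time, and the next call waits for the cooldown after a successful one.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical pacer: serializes calls per key and spaces successful calls.
class KeyedPacer {
  // Tail of the promise chain for each budget key.
  private readonly tails = new Map<string, Promise<void>>();
  private readonly cooldownMs: number;
  private readonly sleep: Sleep;

  constructor(cooldownMs: number, sleep: Sleep = realSleep) {
    this.cooldownMs = cooldownMs;
    this.sleep = sleep;
  }

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // The task starts only after every earlier call on this key has settled.
    const result = previous.then(task);
    const tail: Promise<void> = result.then(
      () => this.sleep(this.cooldownMs), // success: wait out the cooldown
      (): void => {},                    // failure: no cooldown in this sketch
    );
    this.tails.set(key, tail);
    return result;
  }
}
```

Two clients that pass the same key to one `KeyedPacer` instance would never overlap, while calls on different keys proceed independently.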

Example quota-sensitive batch workload:

const ai = await getAI({
  type: 'gemini',
  apiKey: process.env.GEMINI_API_KEY!,
  defaultModel: 'gemini-2.5-flash',
  rateLimit: {
    enabled: true,
    key: 'praeco:multi-site-analysis',
    cooldownMs: 2000,
    initialDelayMs: 15000,
    maxAttempts: 3,
  },
});

for (const site of sites) {
  const summary = await ai.message(`Summarize anomalies for ${site.name}`);
  console.log(site.name, summary);
}

Environment Variables

getAI() reads HAVE_AI_* variables. Explicit options passed to getAI() take precedence over those env vars.

| Variable | Purpose |
|----------|---------|
| HAVE_AI_PROVIDER / HAVE_AI_TYPE | Provider type |
| HAVE_AI_MODEL / HAVE_AI_DEFAULT_MODEL | Default model |
| HAVE_AI_API_KEY | API key (fallback) |
| HAVE_AI_BASE_URL | Custom base URL |
| HAVE_AI_TIMEOUT | Request timeout (ms) |
| HAVE_AI_MAX_RETRIES | Max retry attempts |
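The precedence rule (explicit options first, HAVE_AI_* variables as fallback) can be sketched as follows. `resolveOptions`, `ResolvedOptions`, and `Env` are hypothetical names for illustration, only four of the variables are covered, and the order within each alias pair (HAVE_AI_PROVIDER before HAVE_AI_TYPE, HAVE_AI_MODEL before HAVE_AI_DEFAULT_MODEL) is an assumption that the source does not state.

```typescript
// Local stand-in for a subset of getAI() options.
interface ResolvedOptions {
  type?: string;
  defaultModel?: string;
  apiKey?: string;
  baseUrl?: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical resolver: explicit options win, env vars fill the gaps.
function resolveOptions(explicit: ResolvedOptions, env: Env): ResolvedOptions {
  return {
    // Alias order within each pair is an assumption.
    type: explicit.type ?? env['HAVE_AI_PROVIDER'] ?? env['HAVE_AI_TYPE'],
    defaultModel: explicit.defaultModel ?? env['HAVE_AI_MODEL'] ?? env['HAVE_AI_DEFAULT_MODEL'],
    apiKey: explicit.apiKey ?? env['HAVE_AI_API_KEY'],
    baseUrl: explicit.baseUrl ?? env['HAVE_AI_BASE_URL'],
  };
}

const resolved = resolveOptions(
  { type: 'gemini' },
  { HAVE_AI_PROVIDER: 'openai', HAVE_AI_MODEL: 'gemini-2.5-flash' },
);
console.log(resolved.type, resolved.defaultModel);
```

In a Node.js process the second argument would typically be process.env.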

Node Auto-Detection Env Vars

getAIAuto() also checks provider-specific Node.js environment variables:

  • LITELLM_BASE_URL, LITELLM_API_KEY
  • OLLAMA_HOST, OLLAMA_BASE_URL, OLLAMA_API_KEY
  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY
  • GEMINI_API_KEY, GOOGLE_API_KEY
  • HF_TOKEN
  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION

API Overview

Factory Functions

  • getAI(options) — Creates a provider instance by type
  • getAIAuto(options) — Auto-detects provider from credentials

AIInterface Methods

All providers implement AIInterface:

| Method | Description |
|--------|-------------|
| chat(messages, options?) | Chat completion returning AIResponse |
| message(text, options?) | Simple single-turn convenience method |
| complete(prompt, options?) | Text completion |
| stream(messages, options?) | Streaming chat (async iterable) |
| embed(text, options?) | Text embeddings |
| embedImage(image, options?) | Image embeddings (Gemini and Bedrock native, OpenAI and Ollama via describe-then-embed) |
| describeImage(image, prompt?, options?) | Image description via vision models |
| generateImage(prompt, options?) | Image generation (DALL-E, Imagen, Titan Image Generator, Ollama-compatible image models) |
| countTokens(text) | Token count estimation |
| getModels() | List available models |
| getCapabilities() | Query provider capabilities |
| synthesizeSpeech(text, options?) | Text-to-speech synthesis |
| streamSpeech(text, options?) | Streaming TTS |
| cloneVoice(options) | Clone a voice from audio sample |
| designVoice(options) | Design a voice via text description |
| getVoices(options?) | List available voices |

Error Types

All extend AIError: AuthenticationError, RateLimitError, ModelNotFoundError, ContextLengthError, ContentFilterError.

  • AIError.retryable distinguishes retryable failures from terminal ones
  • RateLimitError.retryAfter exposes provider retry hints in seconds when available

import { RateLimitError } from '@happyvertical/ai';

try {
  await ai.chat(messages);
} catch (error) {
  if (error instanceof RateLimitError && error.retryable) {
    console.log('retry after seconds:', error.retryAfter);
  }
}
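For callers who handle retries themselves rather than through rateLimit, a manual retry loop might look like the sketch below. The `AIError` and `RateLimitError` classes here are local stand-ins so the example runs on its own; in real code they would come from the package. `retryOnRateLimit` and `RetryOptions` are hypothetical names, and the option names simply echo the rateLimit fields described earlier.

```typescript
// Local stand-ins for the package's error classes.
class AIError extends Error {
  retryable: boolean;
  constructor(message: string, retryable = false) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.retryable = retryable;
  }
}

class RateLimitError extends AIError {
  retryAfter: number | undefined; // seconds, when the provider sent a hint
  constructor(message: string, retryAfter?: number) {
    super(message, true);
    this.retryAfter = retryAfter;
  }
}

interface RetryOptions {
  maxAttempts: number;    // total attempts, including the first call
  initialDelayMs: number; // fallback delay when no retry hint is present
  sleep?: (ms: number) => Promise<void>;
}

// Hypothetical helper: retries only retryable rate-limit failures.
async function retryOnRateLimit<T>(call: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (error) {
      if (!(error instanceof RateLimitError) || !error.retryable || attempt >= options.maxAttempts) {
        throw error; // terminal failure, or attempts exhausted
      }
      // retryAfter is in seconds; convert to milliseconds.
      const delayMs =
        error.retryAfter !== undefined ? error.retryAfter * 1000 : options.initialDelayMs;
      await sleep(delayMs);
    }
  }
}
```

A call such as `retryOnRateLimit(() => ai.chat(messages), { maxAttempts: 3, initialDelayMs: 15000 })` would then wrap a single request.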

Legacy Classes

AIClient, OpenAIClient, AIThread, and AIMessageClass are exported for backward compatibility. New code should use getAI() and the AIInterface methods.

Function Calling

const response = await ai.chat([
  { role: 'user', content: 'What is the weather in Tokyo?' }
], {
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get weather for a location',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location']
      }
    }
  }]
});

if (response.toolCalls) {
  console.log(response.toolCalls[0].function.name);
}
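Once the model returns tool calls, the application still has to run them. The sketch below shows one possible dispatch loop. It assumes the OpenAI-style convention in which `function.arguments` arrives as a JSON string; the source only shows `function.name`, so the package's actual type may differ. `ToolCall`, `ToolHandler`, and `runToolCalls` are hypothetical names.

```typescript
// Assumed shape: arguments as a JSON string (OpenAI-style convention).
interface ToolCall {
  function: { name: string; arguments: string };
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string> | string;

// Hypothetical dispatcher: looks up a handler by tool name and runs it.
async function runToolCalls(
  toolCalls: ToolCall[],
  handlers: Record<string, ToolHandler>,
): Promise<string[]> {
  const results: string[] = [];
  for (const call of toolCalls) {
    const handler = handlers[call.function.name];
    if (!handler) throw new Error(`No handler for tool: ${call.function.name}`);
    const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
    results.push(await handler(args));
  }
  return results;
}

// Example handler table for the get_weather tool defined above.
const weatherHandlers: Record<string, ToolHandler> = {
  get_weather: (args) => `Weather lookup requested for ${String(args['location'])}`,
};
```

With the response from the example above, `runToolCalls(response.toolCalls ?? [], weatherHandlers)` would produce one result string per tool call, provided the assumed shape holds.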

Usage Tracking

Track token usage, costs, and performance across all providers with the onUsage callback:

const ai = await getAI({
  type: 'openai',
  apiKey: process.env.OPENAI_API_KEY!,
  onUsage: (event) => {
    console.log(`[${event.provider}/${event.model}] ${event.operation}: ${event.usage?.totalTokens} tokens in ${event.duration}ms`);
    // Or: save to database, send to analytics, aggregate in-memory, etc.
  },
});

The UsageEvent payload:

| Field | Type | Description |
|-------|------|-------------|
| provider | string | Provider name ('openai', 'anthropic', 'gemini', etc.) |
| model | string | Model used (e.g. 'gpt-4o', 'claude-3-5-sonnet-20241022') |
| operation | string | 'chat' \| 'complete' \| 'message' \| 'embed' \| 'stream' \| ... |
| usage? | TokenUsage | { promptTokens, completionTokens, totalTokens } (if available) |
| duration | number | Wall-clock time in milliseconds |
| timestamp | Date | When the call completed |
| tags? | Record<string, string> | Merged from global + per-call usageTags |

  • Works with all providers and methods (chat, complete, message, embed, stream)
  • complete() and message() report through their underlying chat() call
  • Errors thrown inside onUsage are silently caught and will not affect API results
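As one example of the "aggregate in-memory" option, the sketch below totals calls, tokens, and duration per provider/model pair. The `UsageEvent` and `TokenUsage` interfaces are local stand-ins copied from the payload table, and `createUsageAggregator` and `UsageTotals` are hypothetical names rather than part of the package.

```typescript
// Local stand-ins mirroring the documented UsageEvent payload.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

interface UsageEvent {
  provider: string;
  model: string;
  operation: string;
  usage?: TokenUsage;
  duration: number;
  timestamp: Date;
  tags?: Record<string, string>;
}

interface UsageTotals {
  calls: number;
  totalTokens: number;
  totalDurationMs: number;
}

// Hypothetical aggregator keyed by "provider/model".
function createUsageAggregator() {
  const totals = new Map<string, UsageTotals>();
  return {
    onUsage(event: UsageEvent): void {
      const key = `${event.provider}/${event.model}`;
      const current = totals.get(key) ?? { calls: 0, totalTokens: 0, totalDurationMs: 0 };
      totals.set(key, {
        calls: current.calls + 1,
        // usage is optional, so missing token counts add zero.
        totalTokens: current.totalTokens + (event.usage?.totalTokens ?? 0),
        totalDurationMs: current.totalDurationMs + event.duration,
      });
    },
    snapshot(): Record<string, UsageTotals> {
      const out: Record<string, UsageTotals> = {};
      totals.forEach((value, key) => {
        out[key] = { ...value };
      });
      return out;
    },
  };
}
```

The returned `onUsage` function does not rely on `this`, so it can be passed directly as the onUsage option, with `snapshot()` read whenever totals are needed.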

Tagging Usage Events

Attach custom tags to correlate usage with features, users, or workflows:

// Global tags applied to every call
const ai = await getAI({
  type: 'openai',
  apiKey: process.env.OPENAI_API_KEY!,
  usageTags: { app: 'indagator', team: 'news' },
  onUsage: (event) => {
    console.log(event.tags); // For the call below: { app: 'indagator', team: 'news', feature: 'summarize', userId: 'u_123' }
  },
});

// Per-call tags merge over global tags
await ai.chat(messages, {
  usageTags: { feature: 'summarize', userId: 'u_123' },
});
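The merge behavior described above (per-call tags layered over global tags) can be pictured as a plain object spread. This is an illustrative sketch only: `mergeUsageTags` is a hypothetical helper, and the package may implement the merge differently.

```typescript
type UsageTags = Record<string, string>;

// Hypothetical helper: per-call tags override global tags on key conflicts.
function mergeUsageTags(globalTags: UsageTags = {}, callTags: UsageTags = {}): UsageTags {
  return { ...globalTags, ...callTags };
}

const merged = mergeUsageTags(
  { app: 'indagator', team: 'news' },
  { feature: 'summarize', userId: 'u_123' },
);
console.log(merged);
```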

Claude Code Context

Install context files for AI-assisted development:

npx have-ai-context

License

MIT