@anolilab/ai-model-registry (v1.3.0)
A comprehensive, unified registry for AI model providers and their metadata. This package provides a tree-shakable interface to access model information from 50+ AI providers including OpenAI, Anthropic, Google, Meta, Groq, and many others.
✨ Features
- 🔄 Unified Interface: Access models from multiple providers through a single API
- 🛡️ Type Safety: Full TypeScript support with Zod schema validation
- 📦 Tree Shaking: Import only what you need to minimize bundle size
- 💰 Rich Metadata: Comprehensive model information including capabilities, pricing, and limits
- 🔍 Powerful Search: Advanced search and filtering capabilities across all models
- 🔄 Auto-Sync: Automatic data synchronization between models with the same ID
- 💸 Pricing Integration: Real-time pricing data from Helicone API (840+ models)
- 📊 Provider Stats: Detailed statistics and analytics
📦 Installation
# Using npm
npm install @anolilab/ai-model-registry
# Using yarn
yarn add @anolilab/ai-model-registry
# Using pnpm
pnpm add @anolilab/ai-model-registry
🚀 Quick Start
import { getProviders, getModelsByProvider, getModelById, searchModels, getAllModels } from "@anolilab/ai-model-registry";
// Get all available providers
const providers = getProviders();
console.log(providers);
// ['Anthropic', 'Google', 'Groq', 'Meta', 'OpenAI', 'DeepSeek', ...]
// Get all models from a specific provider
const anthropicModels = getModelsByProvider("Anthropic");
console.log(`Found ${anthropicModels.length} Anthropic models`);
// Get a specific model by ID
const model = getModelById("claude-3-opus-20240229");
if (model) {
console.log(`Model: ${model.name}`);
console.log(`Provider: ${model.provider}`);
console.log(`Cost: $${model.cost.input}/1K input, $${model.cost.output}/1K output`);
console.log(`Context: ${model.limit.context?.toLocaleString()} tokens`);
}
// Search for models with specific capabilities
const visionModels = searchModels({ vision: true });
const reasoningModels = searchModels({ reasoning: true });
const toolCallModels = searchModels({ tool_call: true });
// Get all models for advanced filtering
const allModels = getAllModels();
📚 API Reference
Core Functions
getProviders(): string[]
Returns an array of all available provider names.
const providers = getProviders();
// ['Anthropic', 'Google', 'Groq', 'Meta', 'OpenAI', ...]
getModelsByProvider(provider: string): Model[]
Returns all models for a specific provider.
const openAIModels = getModelsByProvider("OpenAI");
const anthropicModels = getModelsByProvider("Anthropic");
getModelById(id: string): Model | undefined
Returns a specific model by its ID, or undefined if not found.
const gpt4 = getModelById("gpt-4");
const claude = getModelById("claude-3-opus-20240229");
getAllModels(): Model[]
Returns all models (useful for advanced filtering and custom logic).
const allModels = getAllModels();
const expensiveModels = allModels.filter((model) => (model.cost.input || 0) > 0.1 || (model.cost.output || 0) > 0.1);
getProviderStats(): Record<string, number>
Returns provider statistics with model counts.
const stats = getProviderStats();
console.log(stats);
// {
// 'OpenAI': 15,
// 'Anthropic': 8,
// 'Google': 12,
// 'Meta': 25,
// ...
// }
Advanced Search
searchModels(criteria: SearchCriteria): Model[]
Search models by various criteria with powerful filtering options.
interface SearchCriteria {
// Capability filters
vision?: boolean;
reasoning?: boolean;
tool_call?: boolean;
streaming_supported?: boolean;
preview?: boolean;
// Provider filter
provider?: string;
// Modality filters
modalities?: {
input?: string[];
output?: string[];
};
// Context window filters
context_min?: number;
context_max?: number;
// Cost filters
max_input_cost?: number;
max_output_cost?: number;
}
Search Examples
// Find all vision-capable models
const visionModels = searchModels({ vision: true });
// Find models with reasoning capabilities
const reasoningModels = searchModels({ reasoning: true });
// Find models that support tool calling
const toolCallModels = searchModels({ tool_call: true });
// Find models from a specific provider
const openAIModels = searchModels({ provider: "OpenAI" });
// Find models with large context windows
const largeContextModels = searchModels({ context_min: 100000 });
// Find affordable models
const affordableModels = searchModels({
max_input_cost: 0.01,
max_output_cost: 0.02,
});
// Find models that accept text and image input
const multimodalModels = searchModels({
modalities: {
input: ["text", "image"],
},
});
// Find models with streaming support
const streamingModels = searchModels({ streaming_supported: true });
// Find preview/beta models
const previewModels = searchModels({ preview: true });
🏗️ Model Schema
Each model follows a comprehensive schema with the following structure:
interface Model {
// Core identification
id: string;
name: string | null;
provider?: string;
providerId?: string;
// Provider metadata
providerEnv?: string[];
providerNpm?: string;
providerDoc?: string;
providerModelsDevId?: string;
// Date information
releaseDate?: string | null;
lastUpdated?: string | null;
launchDate?: string;
trainingCutoff?: string | null;
// Capabilities
attachment: boolean;
reasoning: boolean;
temperature: boolean;
toolCall: boolean;
openWeights: boolean;
vision?: boolean;
extendedThinking?: boolean;
preview?: boolean;
// Knowledge and context
knowledge?: string | null;
// Pricing structure
cost: {
input: number | null; // per 1K tokens
output: number | null; // per 1K tokens
inputCacheHit: number | null; // cache hit pricing
imageGeneration?: number | null;
imageGenerationUltra?: number | null;
videoGeneration?: number | null;
videoGenerationWithAudio?: number | null;
videoGenerationWithoutAudio?: number | null;
};
// Limits
limit: {
context: number | null; // max tokens
output: number | null; // max tokens
};
// Modalities
modalities: {
input: string[]; // ['text', 'image', 'audio', ...]
output: string[]; // ['text', 'image', 'audio', ...]
};
// Infrastructure
regions?: string[];
streamingSupported?: boolean | null;
deploymentType?: string;
version?: string | null;
// Provider-specific capabilities
cacheRead?: boolean;
codeExecution?: boolean;
searchGrounding?: boolean;
structuredOutputs?: boolean;
batchMode?: boolean;
audioGeneration?: boolean;
imageGeneration?: boolean;
compoundSystem?: boolean;
// Version management
versions?: {
stable?: string | null;
preview?: string | null;
};
// Additional metadata
description?: string;
ownedBy?: string;
originalModelId?: string;
providerStatus?: string;
supportsTools?: boolean;
supportsStructuredOutput?: boolean;
}
🌳 Tree Shaking
The package supports tree shaking, so you can import only what you need:
// Only import specific functions
import { getProviders, getModelById } from "@anolilab/ai-model-registry";
// Import schema for validation
import { ModelSchema } from "@anolilab/ai-model-registry/schema";
// Import icons (if needed)
import { getIcon } from "@anolilab/ai-model-registry/icons";
🏢 Supported Providers
The registry includes models from 50+ providers:
Major Providers
- OpenAI (GPT-4, GPT-3.5, O1, O3, etc.)
- Anthropic (Claude 3.5, Claude 3, Claude 2.1, etc.)
- Google (Gemini 2.5, Gemini 1.5, PaLM, etc.)
- Meta (Llama 3, Llama 2, Code Llama, etc.)
- Groq (Various models with ultra-fast inference)
- DeepSeek (DeepSeek R1, DeepSeek V3, etc.)
Specialized Providers
- Mistral AI (Mistral Large, Mixtral, etc.)
- Cohere (Command R, Command A, etc.)
- Perplexity (Sonar, Sonar Pro, etc.)
- Together AI (Various open models)
- Fireworks AI (Various models)
- Vercel (v0 models)
Open Source & Research
- HuggingFace (Various hosted models)
- ModelScope (Chinese models)
- OpenRouter (Aggregated models)
- GitHub Copilot (Code models)
- Azure (OpenAI models)
And many more...
🛠️ Development
Prerequisites
- Node.js 22+
- pnpm (recommended) or npm
Setup
# Clone the repository
git clone https://github.com/anolilab/ai-models.git
cd ai-models
# Install dependencies
pnpm install
# Complete build process
pnpm run download # Download provider data
pnpm run aggregate # Aggregate and enrich data
pnpm run generate-icons # Generate provider icons
pnpm run build # Build the package
# Run tests
pnpm test
Build Process
The complete build process involves several steps to download, aggregate, and generate all necessary data:
# 1. Download provider data
pnpm run download
# 2. Aggregate and enrich provider data (includes pricing from Helicone API)
pnpm run aggregate
# 3. Generate provider icons
pnpm run generate-icons
# 4. Build the package
pnpm run build
What Each Step Does
Download (pnpm run download): Downloads model data from various AI providers using the transformers in scripts/download/transformers/. This creates the raw provider data in data/providers/.
Aggregate (pnpm run aggregate):
- Reads all provider data from data/providers/
- Fetches pricing data from the Helicone API
- Enriches models with icon information
- Synchronizes data between models with the same ID
- Generates data/all-models.json and src/models-data.ts
Generate Icons (pnpm run generate-icons):
- Creates an SVG sprite sheet from LobeHub icons and custom icons
- Generates src/icons-sprite.ts with icon mappings
- Provides fallback icons for providers without official icons
Build (pnpm run build):
- Compiles TypeScript to JavaScript
- Generates type definitions
- Creates the final distributable package
Available Scripts
# Download provider data from various sources
pnpm run download
# Download data for a specific provider
pnpm run download --provider openai
pnpm run download --provider anthropic
# Aggregate provider data (includes pricing enrichment and synchronization)
pnpm run aggregate
# Generate provider icons
pnpm run generate-icons
# Build the package
pnpm run build
# Build for production
pnpm run build:prod
# Run tests
pnpm test
# Run tests with coverage
pnpm run test:coverage
# Lint code
pnpm run lint:eslint
# Type check
pnpm run lint:types
Project Structure
packages/ai-model-registry/
├── src/
│ ├── index.ts # Main exports
│ ├── schema.ts # Model schema definitions
│ └── models-data.ts # Generated model data
├── scripts/
│ ├── aggregate-providers.ts # Data aggregation script
│ ├── generate-svg-sprite.ts # Icon generation
│ └── download/ # Provider data downloaders
├── data/
│ ├── all-models.json # Generated model data
│ └── providers/ # Raw provider data
└── assets/
└── icons/ # Provider icons
💰 Pricing Data Integration
This package automatically includes pricing data from Helicone's LLM Cost API, fetched during the aggregation step of the build process.
Features
- 🔄 Automatic Enrichment: Pricing data is automatically added during aggregation
- 🎯 Smart Matching: Uses multiple strategies to match models with pricing data
- 🛡️ Non-Destructive: Preserves existing pricing data while filling in missing values
- 🔄 Cost Conversion: Automatically converts from per 1M tokens to per 1K tokens format
- 📊 840+ Models: Covers pricing for 840+ models across all major providers
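The cost conversion mentioned above is a straight division by 1000, since 1M tokens is 1000 × 1K tokens. A minimal sketch (the helper name is hypothetical, not part of the package API):

```typescript
// Hypothetical helper illustrating the per-1M-token to per-1K-token
// conversion applied during pricing enrichment.
function perMillionToPerThousand(costPerMillionTokens: number): number {
    // 1M tokens = 1000 × 1K tokens, so divide by 1000.
    return costPerMillionTokens / 1000;
}

// Example: $30 per 1M input tokens becomes $0.03 per 1K input tokens.
const perThousand = perMillionToPerThousand(30);
```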
Supported Pricing Providers
Helicone provides pricing data for models from:
- OpenAI (GPT-4, GPT-3.5, O1, O3, etc.)
- Anthropic (Claude models)
- Google (Gemini models)
- Meta (Llama models)
- Mistral (Mistral models)
- Groq (Various models)
- And many more...
Usage
# Aggregate all models with pricing data
pnpm run aggregate
# Build the package (includes aggregation with pricing)
pnpm run build
🔄 Model Data Synchronization
The provider registry includes a powerful data synchronization system that automatically merges missing data between models with the same ID across different providers.
How It Works
- Groups models by ID: Finds all models with the same ID across different providers
- Calculates completeness scores: Evaluates how complete each model's data is (excluding cost fields)
- Uses the most complete model as base: Selects the model with the highest data completeness
- Merges missing data: Fills in missing fields from other models with the same ID
- Preserves cost data: Never overwrites existing cost information
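The steps above can be sketched in simplified form. This is an illustrative approximation of the strategy, not the package's actual implementation (field handling and typing are reduced to the essentials):

```typescript
// Simplified sketch of the synchronization strategy: pick the most
// complete model as the base, fill in missing fields from the others,
// and never touch cost data.
type PartialModel = Record<string, unknown> & { id: string };

const PROTECTED_FIELDS = new Set(["cost"]);

function completeness(model: PartialModel): number {
    // Count filled (non-null, non-undefined) fields, excluding cost fields.
    return Object.entries(model).filter(
        ([key, value]) => !PROTECTED_FIELDS.has(key) && value !== null && value !== undefined,
    ).length;
}

function mergeModels(models: PartialModel[]): PartialModel {
    // The model with the highest completeness score becomes the base.
    const [base, ...rest] = [...models].sort((a, b) => completeness(b) - completeness(a));
    const merged: PartialModel = { ...base };
    for (const other of rest) {
        for (const [key, value] of Object.entries(other)) {
            // Fill only missing fields; protected cost fields are skipped entirely.
            if (!PROTECTED_FIELDS.has(key) && merged[key] == null && value != null) {
                merged[key] = value;
            }
        }
    }
    return merged;
}
```

Run against the gpt-4 example in the section below, this picks the OpenAI entry as the base (it has more filled fields) and pulls in the missing description from the Azure entry.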
Protected Fields
The following cost-related fields are never synchronized to preserve pricing accuracy:
- cost (entire cost object)
- input (input cost)
- output (output cost)
- inputCacheHit (cache hit pricing)
- imageGeneration (image generation pricing)
- videoGeneration (video generation pricing)
Example
If you have the same model (e.g., gpt-4) from multiple providers:
OpenAI Provider:
{
"id": "gpt-4",
"name": "GPT-4",
"cost": { "input": 0.03, "output": 0.06 },
"description": null,
"releaseDate": "2023-03-14"
}
Azure Provider:
{
"id": "gpt-4",
"name": null,
"cost": { "input": 0.03, "output": 0.06 },
"description": "GPT-4 is a large multimodal model",
"releaseDate": null
}
Result after synchronization:
{
"id": "gpt-4",
"name": "GPT-4",
"cost": { "input": 0.03, "output": 0.06 },
"description": "GPT-4 is a large multimodal model",
"releaseDate": "2023-03-14"
}
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Adding New Providers
- Create a transformer in scripts/download/transformers/
- Add provider configuration to scripts/config.ts
- Run pnpm run download --provider <provider-name>
- Test with pnpm run aggregate
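A transformer's job is to map a provider's raw API response into the registry's model shape. The sketch below is purely illustrative: the provider name, raw field names, and function signature are hypothetical, and the real transformer interface in scripts/download/transformers/ may differ.

```typescript
// Hypothetical transformer sketch: maps a raw provider payload into a
// subset of the registry's Model shape. Field names on the raw side
// (model_id, display_name, context_length) are invented for illustration.
interface RawProviderModel {
    model_id: string;
    display_name?: string;
    context_length?: number;
}

interface RegistryModel {
    id: string;
    name: string | null;
    provider: string;
    limit: { context: number | null; output: number | null };
}

function transformExampleProvider(raw: RawProviderModel[]): RegistryModel[] {
    return raw.map((m) => ({
        id: m.model_id,
        name: m.display_name ?? null,
        provider: "ExampleProvider",
        // Fields the raw payload does not supply are left null rather than guessed.
        limit: { context: m.context_length ?? null, output: null },
    }));
}
```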
Reporting Issues
Please use our Issue Tracker to report bugs or request features.
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
🙏 Acknowledgments
- Helicone for providing pricing data
- OpenRouter for reference data
- All the AI providers for their amazing models
- The open source community for inspiration and tools
📞 Support
- Documentation: GitHub Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Made with ❤️ by AnoliLab
