@helicone/ai-gateway
v0.2.0-beta.30
A high-performance LLM proxy router.
Helicone AI Gateway
The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
Built by the team at Helicone, open-sourced for the community.
🚀 Quick Start • 📖 Docs • 💬 Discord • 🌐 Website
🚆 1 API. 100+ models.
Open-source, lightweight, and built on Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
👩🏻‍💻 Set up in seconds
- Set up your `.env` file with your `PROVIDER_API_KEY`s:

  ```
  OPENAI_API_KEY=your_openai_key
  ANTHROPIC_API_KEY=your_anthropic_key
  ```

- Run locally in your terminal:

  ```sh
  npx @helicone/ai-gateway@latest
  ```

- Make your requests using any OpenAI SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key"  # Gateway handles API keys
)

# Route to any LLM provider through the same interface; we handle the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or 100+ other models...
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
```

That's it. No new SDKs to learn, no integrations to maintain. Fully featured and open-sourced.
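Since the gateway speaks the OpenAI protocol, you can also hit it without an SDK. A minimal sketch, assuming the standard OpenAI-compatible `/chat/completions` path under the `/ai` base URL shown above:

```sh
# Minimal sketch: plain HTTP against the gateway's OpenAI-compatible endpoint.
# Assumes the standard /chat/completions path under the /ai base URL above.
curl http://localhost:8080/ai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
  }'
```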
For advanced configuration, check out our configuration guide and the providers we support.
Why Helicone AI Gateway?
🌐 Unified interface
Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrations—use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
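Concretely, switching providers is just a different model string on the same client. A quick sketch reusing the `client` from the quick start above:

```python
# Same OpenAI client, different providers: only the model string changes.
for model in ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(model, "->", response.choices[0].message.content)
```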
⚡ Smart provider selection
Load balance to always hit the fastest, cheapest, or most reliable option. Built-in strategies include latency-based P2C + PeakEWMA, weighted distribution, and cost optimization. Always aware of provider uptime and rate limits.
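For example, a weighted split between providers might look like the sketch below, modeled on the router configuration format shown later in this README. The `strategy: weighted` and per-target `weight` keys are assumptions here; see the configuration guide for the exact syntax:

```yaml
routers:
  your-router-name:
    load-balance:
      chat:
        strategy: weighted    # assumed key, by analogy with `strategy: latency` below
        targets:
          - provider: openai
            weight: 0.8       # hypothetical weighting syntax; check the configuration guide
          - provider: anthropic
            weight: 0.2
```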
💰 Control your spending
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally with support for request counts, token usage, and dollar amounts.
🚀 Improve performance
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
📊 Simplified tracing
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
☁️ One-click deployment
Deploy to your own infrastructure in seconds using our Docker image or binary download, following our deployment guides.
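For example, with Docker (a minimal sketch; the `helicone/ai-gateway` image name is an assumption, so check the deployment guides for the published image and tags):

```sh
# Sketch: run the gateway in a container, mapping the default port
# and passing provider keys from the same .env file used above.
# The image name `helicone/ai-gateway` is an assumption; see the deployment guides.
docker run -d \
  -p 8080:8080 \
  --env-file .env \
  helicone/ai-gateway
```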
https://github.com/user-attachments/assets/ed3a9bbe-1c4a-47c8-98ec-2bb4ff16be1f
⚡ Scalable for production
| Metric       | Helicone AI Gateway | Typical Setup |
| ------------ | ------------------- | ------------- |
| P95 Latency  | <10ms               | ~60-100ms     |
| Memory Usage | ~64MB               | ~512MB        |
| Requests/sec | ~2,000              | ~500          |
| Binary Size  | ~15MB               | ~200MB        |
| Cold Start   | ~100ms              | ~2s           |
Note: These are preliminary performance metrics. See benchmarks/README.md for detailed benchmarking methodology and results.
🎥 Demo
https://github.com/user-attachments/assets/dd6b6df1-0f5c-43d4-93b6-3cc751efb5e1
🏗️ How it works
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Your App     │───▶│  Helicone AI    │───▶│  LLM Providers  │
│                 │    │    Gateway      │    │                 │
│   OpenAI SDK    │    │                 │    │ • OpenAI        │
│ (any language)  │    │ • Load Balance  │    │ • Anthropic     │
│                 │    │ • Rate Limit    │    │ • AWS Bedrock   │
│                 │    │ • Cache         │    │ • Google Vertex │
│                 │    │ • Trace         │    │ • 20+ more      │
│                 │    │ • Fallbacks     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │    Helicone     │
                       │  Observability  │
                       │                 │
                       │ • Dashboard     │
                       │ • Observability │
                       │ • Monitoring    │
                       │ • Debugging     │
                       └─────────────────┘
```

⚙️ Custom configuration
1. Set up your environment variables
Include your PROVIDER_API_KEYs in your .env file.
```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_API_KEY=sk-...
```

2. Customize your config file
Note: This is a sample config.yaml file. Please refer to our configuration guide for the full list of options, examples, and defaults.
See our full provider list here.
```yaml
helicone: # Include your HELICONE_API_KEY in your .env file
  features: all

cache-store:
  in-memory: {}

global: # Global settings for all routers
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name: # Single router configuration
    load-balance:
      chat:
        strategy: latency
        targets:
          - openai
          - anthropic
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # 1000 requests per minute
```

3. Run with your custom configuration
```sh
npx @helicone/ai-gateway@latest --config config.yaml
```

4. Make your requests
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/router/your-router-name",
    api_key="placeholder-api-key"  # Gateway handles API keys
)

# Route to any LLM provider through the same interface; we handle the rest.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or 100+ other models...
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
```

📚 Migration guide
From OpenAI (Python)
```diff
from openai import OpenAI

client = OpenAI(
-   api_key=os.getenv("OPENAI_API_KEY")
+   api_key="placeholder-api-key",  # Gateway handles API keys
+   base_url="http://localhost:8080/router/your-router-name"
)

# No other changes needed!
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

From OpenAI (TypeScript)
```diff
import { OpenAI } from "openai";

const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: "placeholder-api-key", // Gateway handles API keys
+   baseURL: "http://localhost:8080/router/your-router-name",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
```

📚 Resources
Documentation
- 📖 Full Documentation - Complete guides and API reference
- 🚀 Quickstart Guide - Get up and running in 1 minute
- 🔬 Advanced Configurations - Configuration reference & examples
Community
- 💬 Discord Server - Our community of passionate AI engineers
- 🐙 GitHub Discussions - Q&A and feature requests
- 🐦 Twitter - Latest updates and announcements
- 📧 Newsletter - Tips and tricks to deploying AI applications
Support
- 🎫 Report bugs: GitHub Issues
- 💼 Enterprise Support: Book a discovery call with our team
📄 License
The Helicone AI Gateway is licensed under the Apache License - see the LICENSE file for details.
Made with ❤️ by Helicone.
