
omniroute

v3.5.2

Published

Smart AI Router with auto fallback — route to FREE & cheap models, zero downtime. Works with Cursor, Cline, Claude Desktop, Codex, and any OpenAI-compatible tool.

Readme

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Smart routing to FREE & low-cost AI models with automatic fallback.

Your universal API proxy — one endpoint, 60+ providers, zero downtime. Now with MCP Server (25 tools), A2A Protocol, Memory/Skills Systems & Electron Desktop App.

Chat Completions • Embeddings • Image Generation • Video • Music • Audio • Reranking • Web Search • MCP Server • A2A Protocol • 100% TypeScript



🌐 Website · 🚀 Quick Start · 💡 Features · 📖 Docs · 💰 Pricing · 💬 WhatsApp

🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština


🖼️ Main Dashboard


📸 Dashboard Preview

Screenshot previews cover each dashboard page: Providers, Combos, Analytics, Health, Translator, Settings, CLI Tools, Usage Logs, and Endpoints.


🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota
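To sanity-check the endpoint before wiring up an agent, a minimal sketch (assuming OmniRoute is already running locally; the Bearer value is a placeholder, and any string works unless you have enabled API key protection) is to list the models the gateway exposes:

```shell
# List every model aggregated by the local OmniRoute gateway
# via the standard OpenAI-style /v1/models endpoint.
# "any-string" is a placeholder key, not a real credential.
curl -s http://localhost:20128/v1/models \
  -H "Authorization: Bearer any-string"
```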


🤔 Why OmniRoute?

Stop wasting money and hitting limits:

  • Subscription quota expires unused every month
  • Rate limits stop you mid-coding
  • Expensive APIs ($20-50/month per provider)
  • Manual switching between providers

OmniRoute solves this:

  • Maximize subscriptions - Track quota, use every bit before reset
  • Auto fallback - Subscription → API Key → Cheap → Free, zero downtime
  • Multi-account - Round-robin between accounts per provider
  • Universal - Works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool

📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.


🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌─────────────────────────────────────────┐
│        OmniRoute (Smart Router)         │
│  • Format translation (OpenAI ↔ Claude) │
│  • Quota tracking + Embeddings + Images │
│  • Auto token refresh                   │
└──────┬──────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost

🎯 What OmniRoute Solves — 30 Real Pain Points & Use Cases

Every developer using AI tools faces these problems daily. OmniRoute was built to solve them all — from cost overruns to regional blocks, from broken OAuth flows to protocol operations and enterprise observability.

Developers pay $20–200/month for Claude Pro, Codex Pro, or GitHub Copilot. Even paying, quota has a ceiling — 5h of usage, weekly limits, or per-minute rate limits. Mid-coding session, the provider stops responding and the developer loses flow and productivity.

How OmniRoute solves it:

  • Smart 4-Tier Fallback — If subscription quota runs out, automatically redirects to API Key → Cheap → Free with zero manual intervention
  • Provider Limits Tracking — Cached quota snapshots refresh on a server-side schedule (default PROVIDER_LIMITS_SYNC_INTERVAL_MINUTES=70) with manual refresh available in the UI
  • Multi-Account Support — Multiple accounts per provider with auto round-robin — when one runs out, switches to the next
  • Custom Combos — Customizable fallback chains with 9 balancing strategies (priority, weighted, fill-first, round-robin, P2C, random, least-used, cost-optimized, strict-random)
  • Codex Business Quotas — Business/Team workspace quota monitoring directly in the dashboard

OpenAI uses one format, Claude (Anthropic) uses another, Gemini yet another. If a dev wants to test models from different providers or fallback between them, they need to reconfigure SDKs, change endpoints, deal with incompatible formats. Custom providers (FriendLI, NIM) have non-standard model endpoints.

How OmniRoute solves it:

  • Unified Endpoint — A single http://localhost:20128/v1 serves as proxy for all 60+ providers
  • Format Translation — Automatic and transparent: OpenAI ↔ Claude ↔ Gemini ↔ Responses API
  • Response Sanitization — Strips non-standard fields (x_groq, usage_breakdown, service_tier) that break OpenAI SDK v1.83+
  • Role Normalization — Converts developer → system for non-OpenAI providers; system → user for GLM/ERNIE
  • Think Tag Extraction — Extracts <think> blocks from models like DeepSeek R1 into standardized reasoning_content
  • Structured Output for Gemini — Automatic json_schema → responseMimeType/responseSchema conversion
  • stream defaults to false — Aligns with OpenAI spec, avoiding unexpected SSE in Python/Rust/Go SDKs

Providers like OpenAI/Codex block access from certain geographic regions. Users get errors like unsupported_country_region_territory during OAuth and API connections. This is especially frustrating for developers from developing countries.

How OmniRoute solves it:

  • 3-Level Proxy Config — Configurable proxy at 3 levels: global (all traffic), per-provider (one provider only), and per-connection/key
  • Color-Coded Proxy Badges — Visual indicators: 🟢 global proxy, 🟡 provider proxy, 🔵 connection proxy, always showing the IP
  • OAuth Token Exchange Through Proxy — OAuth flow also goes through the proxy, solving unsupported_country_region_territory
  • Connection Tests via Proxy — Connection tests use the configured proxy (no more direct bypass)
  • SOCKS5 Support — Full SOCKS5 proxy support for outbound routing
  • TLS Fingerprint Spoofing — Browser-like TLS fingerprint via wreq-js to bypass bot detection
  • 🔏 CLI Fingerprint Matching — Reorders headers and body fields to match native CLI binary signatures, drastically reducing account flagging risk. The proxy IP is preserved — you get both stealth and IP masking simultaneously

Not everyone can pay $20–200/month for AI subscriptions. Students, devs from emerging countries, hobbyists, and freelancers need access to quality models at zero cost.

How OmniRoute solves it:

  • Free Tier Providers Built-in — Native support for 100% free providers: Qoder (5 unlimited models via OAuth: kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2, kimi-k2), Qwen (4 unlimited models: qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next, vision-model), Kiro (Claude + AWS Builder ID for free), Gemini CLI (180K tokens/month free)
  • Ollama Cloud — Cloud-hosted Ollama models at api.ollama.com with free "Light usage" tier; use ollamacloud/<model> prefix
  • Free-Only Combos — Chain gc/gemini-3-flash → if/kimi-k2-thinking → qw/qwen3-coder-plus = $0/month with zero downtime
  • NVIDIA NIM Free Access — ~40 RPM dev-forever free access to 70+ models at build.nvidia.com (transitioning from credits to pure rate limits)
  • Cost Optimized Strategy — Routing strategy that automatically chooses the cheapest available provider

When exposing an AI gateway to the network (LAN, VPS, Docker), anyone with the address can consume the developer's tokens/quota. Without protection, APIs are vulnerable to misuse, prompt injection, and abuse.

How OmniRoute solves it:

  • API Key Management — Generation, rotation, and scoping per provider with a dedicated /dashboard/api-manager page
  • Model-Level Permissions — Restrict API keys to specific models (openai/*, wildcard patterns), with Allow All/Restrict toggle
  • API Endpoint Protection — Require a key for /v1/models and block specific providers from the listing
  • Auth Guard + CSRF Protection — All dashboard routes protected with withAuth middleware + CSRF tokens
  • Rate Limiter — Per-IP rate limiting with configurable windows
  • IP Filtering — Allowlist/blocklist for access control
  • Prompt Injection Guard — Sanitization against malicious prompt patterns
  • AES-256-GCM Encryption — Credentials encrypted at rest

AI providers can become unstable, return 5xx errors, or hit temporary rate limits. If a dev depends on a single provider, they're interrupted. Without circuit breakers, repeated retries can crash the application.

How OmniRoute solves it:

  • Circuit Breaker per-model — Auto-open/close with configurable thresholds and cooldown (Closed/Open/Half-Open), scoped per-model to avoid cascading blocks
  • Exponential Backoff — Progressive retry delays
  • Anti-Thundering Herd — Mutex + semaphore protection against concurrent retry storms
  • Combo Fallback Chains — If the primary provider fails, automatically falls through the chain with no intervention
  • Combo Circuit Breaker — Auto-disables failing providers within a combo chain
  • Health Dashboard — Uptime monitoring, circuit breaker states, lockouts, cache stats, p50/p95/p99 latency

Developers use Cursor, Claude Code, Codex CLI, OpenClaw, Gemini CLI, Kilo Code... Each tool needs a different config (API endpoint, key, model). Reconfiguring when switching providers or models is a waste of time.

How OmniRoute solves it:

  • CLI Tools Dashboard — Dedicated page with one-click setup for Claude Code, Codex CLI, OpenClaw, Kilo Code, Antigravity, Cline
  • GitHub Copilot Config Generator — Generates chatLanguageModels.json for VS Code with bulk model selection
  • Onboarding Wizard — Guided 4-step setup for first-time users
  • One endpoint, all models — Configure http://localhost:20128/v1 once, access 60+ providers

Claude Code, Codex, Gemini CLI, Copilot — all use OAuth 2.0 with expiring tokens. Developers need to re-authenticate constantly, deal with client_secret is missing, redirect_uri_mismatch, and failures on remote servers. OAuth on LAN/VPS is particularly problematic.

How OmniRoute solves it:

  • Auto Token Refresh — OAuth tokens refresh in background before expiration
  • OAuth 2.0 (PKCE) Built-in — Automatic flow for Claude Code, Codex, Gemini CLI, Copilot, Kiro, Qwen, Qoder
  • Multi-Account OAuth — Multiple accounts per provider via JWT/ID token extraction
  • OAuth LAN/Remote Fix — Private IP detection for redirect_uri + manual URL mode for remote servers
  • OAuth Behind Nginx — Uses window.location.origin for reverse proxy compatibility
  • Remote OAuth Guide — Step-by-step guide for Google Cloud credentials on VPS/Docker

Developers use multiple paid providers but have no unified view of spending. Each provider has its own billing dashboard, but there's no consolidated view. Unexpected costs can pile up.

How OmniRoute solves it:

  • Cost Analytics Dashboard — Per-token cost tracking and budget management per provider
  • Budget Limits per Tier — Spending ceiling per tier that triggers automatic fallback
  • Per-Model Pricing Configuration — Configurable prices per model
  • Usage Statistics Per API Key — Request count and last-used timestamp per key
  • Analytics Dashboard — Stat cards, model usage chart, provider table with success rates and latency

When a call fails, the dev doesn't know if it was a rate limit, expired token, wrong format, or provider error. Fragmented logs across different terminals. Without observability, debugging is trial-and-error.

How OmniRoute solves it:

  • Unified Logs Dashboard — 4 tabs: Request Logs, Proxy Logs, Audit Logs, Console
  • Console Log Viewer — Real-time terminal-style viewer with color-coded levels, auto-scroll, search, filter
  • SQLite Proxy Logs — Persistent logs that survive server restarts
  • Translator Playground — 4 debugging modes: Playground (format translation), Chat Tester (round-trip), Test Bench (batch), Live Monitor (real-time)
  • Request Telemetry — p50/p95/p99 latency + X-Request-Id tracing
  • File-Based Logging with Rotation — App logs rotate by size, retention days, and archive count; call log artifacts rotate by retention days and file count
  • System Info Report — npm run system-info generates system-info.txt with your full environment (Node version, OmniRoute version, OS, CLI tools, Docker/PM2 status). Attach it when reporting issues for instant triage.

Installing, configuring, and maintaining an AI proxy across different environments (local, VPS, Docker, cloud) is labor-intensive. Problems like hardcoded paths, EACCES on directories, port conflicts, and cross-platform builds add friction.

How OmniRoute solves it:

  • npm global install — npm install -g omniroute && omniroute — done
  • Docker Multi-Platform — AMD64 + ARM64 native (Apple Silicon, AWS Graviton, Raspberry Pi)
  • Docker Compose Profiles — base (no CLI tools) and cli (with Claude Code, Codex, OpenClaw)
  • Electron Desktop App — Native app for Windows/macOS/Linux with system tray, auto-start, offline mode
  • Split-Port Mode — API and Dashboard on separate ports for advanced scenarios (reverse proxy, container networking)
  • Cloud Sync — Config synchronization across devices via Cloudflare Workers
  • DB Backups — Automatic backup, restore, export and import of all settings, with DISABLE_SQLITE_AUTO_BACKUP for externally managed backups

Teams in non-English-speaking countries, especially in Latin America, Asia, and Europe, struggle with English-only interfaces. Language barriers reduce adoption and increase configuration errors.

How OmniRoute solves it:

  • Dashboard i18n — 30 Languages — All 500+ keys translated including Arabic, Bulgarian, Danish, German, Spanish, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese (PT/BR), Romanian, Russian, Slovak, Swedish, Thai, Ukrainian, Vietnamese, Chinese, Filipino, English
  • RTL Support — Right-to-left support for Arabic and Hebrew
  • Multi-Language READMEs — 30 complete documentation translations
  • Language Selector — Globe icon in header for real-time switching

AI isn't just chat completion. Devs need to generate images, transcribe audio, create embeddings for RAG, rerank documents, and moderate content. Each API has a different endpoint and format.

How OmniRoute solves it:

  • Embeddings — /v1/embeddings with 6 providers and 9+ models
  • Image Generation — /v1/images/generations with 10 providers and 20+ models (OpenAI, xAI, Together, Fireworks, Nebius, Hyperbolic, NanoBanana, Antigravity, SD WebUI, ComfyUI)
  • Text-to-Video — /v1/videos/generations — ComfyUI (AnimateDiff, SVD) and SD WebUI
  • Text-to-Music — /v1/music/generations — ComfyUI (Stable Audio Open, MusicGen)
  • Audio Transcription — /v1/audio/transcriptions — Whisper + Nvidia NIM, HuggingFace, Qwen3
  • Text-to-Speech — /v1/audio/speech — ElevenLabs, Nvidia NIM, HuggingFace, Coqui, Tortoise, Qwen3, Inworld, Cartesia, PlayHT, + existing providers
  • Moderations — /v1/moderations — Content safety checks
  • Reranking — /v1/rerank — Document relevance reranking
  • Responses API — Full /v1/responses support for Codex

Developers want to know which model is best for their use case — code, translation, reasoning — but comparing manually is slow. No integrated eval tools exist.

How OmniRoute solves it:

  • LLM Evaluations — Golden set testing with 10 pre-loaded cases covering greetings, math, geography, code generation, JSON compliance, translation, markdown, safety refusal
  • 4 Match Strategies — exact, contains, regex, custom (JS function)
  • Translator Playground Test Bench — Batch testing with multiple inputs and expected outputs, cross-provider comparison
  • Chat Tester — Full round-trip with visual response rendering
  • Live Monitor — Real-time stream of all requests flowing through the proxy

As request volume grows, identical questions generate duplicate costs without caching, duplicate requests waste processing without idempotency, and per-provider rate limits must still be respected.

How OmniRoute solves it:

  • Semantic Cache — Two-tier cache (signature + semantic) reduces cost and latency
  • Request Idempotency — 5s deduplication window for identical requests
  • Rate Limit Detection — Per-provider RPM, min gap, and max concurrent tracking
  • Editable Rate Limits — Configurable defaults in Settings → Resilience with persistence
  • API Key Validation Cache — 3-tier cache for production performance
  • Health Dashboard with Telemetry — p50/p95/p99 latency, cache stats, uptime

Some developers want every response in a specific language or tone, or want to cap reasoning tokens. Configuring this in every tool and request is impractical.

How OmniRoute solves it:

  • System Prompt Injection — Global prompt applied to all requests
  • Thinking Budget Validation — Reasoning token allocation control per request (passthrough, auto, custom, adaptive)
  • 9 Routing Strategies — Global strategies that determine how requests are distributed
  • Wildcard Router — provider/* patterns route dynamically to any provider
  • Combo Enable/Disable Toggle — Toggle combos directly from the dashboard
  • Provider Toggle — Enable/disable all connections for a provider with one click
  • Blocked Providers — Exclude specific providers from /v1/models listing

Many AI gateways expose MCP only as a hidden implementation detail. Teams need a visible, manageable operation layer.

How OmniRoute solves it:

  • MCP appears in the dashboard navigation and endpoint protocol tab
  • Dedicated MCP management page with process, tools, scopes, and audit
  • Built-in quick-start for omniroute --mcp and client onboarding

Agent workflows need both direct replies and long-running streamed execution with lifecycle control.

How OmniRoute solves it:

  • A2A JSON-RPC endpoint (POST /a2a) with message/send and message/stream
  • SSE streaming with terminal state propagation
  • Task lifecycle APIs for tasks/get and tasks/cancel

Operational teams need to know if MCP is actually alive, not just whether an API is reachable.

How OmniRoute solves it:

  • Runtime heartbeat file with PID, timestamps, transport, tool count, and scope mode
  • MCP status API combining heartbeat + recent activity
  • UI status cards for process/uptime/heartbeat freshness

When tools mutate config or trigger ops actions, teams need forensic traceability.

How OmniRoute solves it:

  • SQLite-backed audit logging for MCP tool calls
  • Filters by tool, success/failure, API key, and pagination
  • Dashboard audit table + stats endpoints for automation

Different clients should have least-privilege access to tool categories.

How OmniRoute solves it:

  • 10 granular MCP scopes for controlled tool access
  • Scope enforcement and visibility in MCP management UI
  • Safe default posture for operational tooling

Teams need quick runtime changes during incidents or cost events.

How OmniRoute solves it:

  • Switch combo activation directly from MCP dashboard
  • Apply resilience profiles from pre-defined policy packs
  • Reset circuit breaker state from the same operations panel

Without lifecycle visibility, task incidents become hard to triage.

How OmniRoute solves it:

  • Task listing/filtering by state/skill with pagination
  • Drill-down on task metadata, events, and artifacts
  • Task cancellation endpoint and UI action with confirmation

Streaming workflows require operational insight into concurrency and live connections.

How OmniRoute solves it:

  • Active stream counters integrated into A2A status
  • Last task timestamp and per-state counts
  • A2A dashboard cards for real-time ops monitoring

External clients and orchestrators need machine-readable metadata for onboarding.

How OmniRoute solves it:

  • Agent Card exposed at /.well-known/agent.json
  • Capabilities and skills shown in management UI
  • A2A status API includes discovery metadata for automation

If users cannot discover protocol surfaces, adoption and support quality drop.

How OmniRoute solves it:

  • Consolidated Endpoints page with tabs for Proxy, MCP, A2A, and API Endpoints
  • Inline service status toggles (Online/Offline) for MCP and A2A
  • Links from overview to dedicated management tabs

Mock tests are not enough to validate protocol compatibility before release.

How OmniRoute solves it:

  • E2E suite that boots app and uses real MCP SDK client transport
  • A2A client tests for discovery, send, stream, get, and cancel flows
  • Cross-check assertions against MCP audit and A2A tasks APIs

Splitting observability by protocol creates blind spots and longer MTTR.

How OmniRoute solves it:

  • Unified dashboards/logs/analytics in one product
  • Health + audit + request telemetry across OpenAI, MCP, and A2A layers
  • Operational APIs for status and automation

Running many separate services increases operational cost and failure modes.

How OmniRoute solves it:

  • OpenAI-compatible proxy, MCP server, and A2A server in one stack
  • Shared auth, resilience, data store, and observability
  • Consistent policy model across all interaction surfaces

Teams lose velocity when stitching multiple ad-hoc services and scripts.

How OmniRoute solves it:

  • Unified endpoint strategy for clients and agents
  • Built-in protocol management UIs and smoke validation paths
  • Production-ready foundations (security, logging, resilience, backup)

Example Playbooks (Integrated Use Cases)

Playbook A: Maximize paid subscription + cheap backup

Combo: "maximize-claude"
  1. cc/claude-opus-4-6
  2. glm/glm-4.7
  3. if/kimi-k2-thinking

Monthly cost: $20 + small backup spend
Outcome: higher quality, near-zero interruption

Playbook B: Zero-cost coding stack

Combo: "free-forever"
  1. gc/gemini-3-flash
  2. if/kimi-k2-thinking
  3. qw/qwen3-coder-plus

Monthly cost: $0
Outcome: stable free coding workflow

Playbook C: 24/7 always-on fallback chain

Combo: "always-on"
  1. cc/claude-opus-4-6
  2. cx/gpt-5.2-codex
  3. glm/glm-4.7
  4. minimax/MiniMax-M2.1
  5. if/kimi-k2-thinking

Outcome: deep fallback depth for deadline-critical workloads

Playbook D: Agent ops with MCP + A2A

1) Start MCP transport (`omniroute --mcp`) for tool-driven operations
2) Run A2A tasks via `message/send` and `message/stream`
3) Observe via /dashboard/endpoint (MCP and A2A tabs)
4) Toggle services via inline status controls

🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
| ---- | -------------------------------------------------- | ------------------------------------------------------------------ |
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.
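As a sketch, many OpenAI-compatible SDKs and CLIs read their endpoint from environment variables (the official OpenAI SDKs honor the two below; other tools may use different names):

```shell
# Route OpenAI-compatible tooling through the local OmniRoute gateway.
export OPENAI_BASE_URL="http://localhost:20128/v1"
export OPENAI_API_KEY="any-string"   # any string works, per the setup above
```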

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

pnpm users: Run pnpm approve-builds -g after install to enable native build scripts required by better-sqlite3 and @swc/core:

pnpm install -g omniroute
pnpm approve-builds -g   # Select all packages → approve
omniroute

Dashboard opens at http://localhost:20128 and API base URL is http://localhost:20128/v1.

| Command | Description |
| ----------------------- | ----------------------------------------------------------- |
| omniroute | Start server (PORT=20128, API and dashboard on same port) |
| omniroute --port 3000 | Set canonical/API port to 3000 |
| omniroute --mcp | Start MCP server (stdio transport) |
| omniroute --no-open | Don't auto-open browser |
| omniroute --help | Show help |

Optional split-port mode:

PORT=20128 DASHBOARD_PORT=20129 omniroute
# API:       http://localhost:20128/v1
# Dashboard: http://localhost:20129

Long-Running Streaming Timeouts

For most deployments, you only need:

| Variable | Default | Purpose |
| ------------------------ | ----------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| REQUEST_TIMEOUT_MS | 600000 | Shared baseline for upstream fetch, hidden Undici timeouts, TLS fingerprint requests, and API bridge request/proxy timeouts |
| STREAM_IDLE_TIMEOUT_MS | inherits REQUEST_TIMEOUT_MS | Maximum gap between streaming chunks before OmniRoute aborts the SSE stream |

Backward compatibility is preserved: existing FETCH_TIMEOUT_MS, API_BRIDGE_PROXY_TIMEOUT_MS, and other per-layer timeout vars still work and override the shared baseline.

Advanced overrides are available if you need finer control:

| Variable | Default | Purpose |
| ---------------------------------------- | ------------------------------------------ | -------------------------------------------------------------------- |
| FETCH_TIMEOUT_MS | inherits REQUEST_TIMEOUT_MS | Total upstream request timeout used by the main fetch abort signal |
| FETCH_HEADERS_TIMEOUT_MS | inherits FETCH_TIMEOUT_MS | Undici time limit for receiving upstream response headers |
| FETCH_BODY_TIMEOUT_MS | inherits FETCH_TIMEOUT_MS | Undici time limit between upstream body chunks (0 disables it) |
| FETCH_CONNECT_TIMEOUT_MS | 30000 | Undici TCP connect timeout |
| FETCH_KEEPALIVE_TIMEOUT_MS | 4000 | Undici idle keep-alive socket timeout |
| TLS_CLIENT_TIMEOUT_MS | inherits FETCH_TIMEOUT_MS | Timeout for TLS fingerprint requests made through wreq-js |
| API_BRIDGE_PROXY_TIMEOUT_MS | inherits REQUEST_TIMEOUT_MS or 30000 | Timeout for /v1 proxy forwarding from API port to dashboard port |
| API_BRIDGE_SERVER_REQUEST_TIMEOUT_MS | max(API_BRIDGE_PROXY_TIMEOUT_MS, 300000) | Incoming request timeout on the API bridge server |
| API_BRIDGE_SERVER_HEADERS_TIMEOUT_MS | 60000 | Incoming header timeout on the API bridge server |
| API_BRIDGE_SERVER_KEEPALIVE_TIMEOUT_MS | 5000 | Keep-alive timeout on the API bridge server |
| API_BRIDGE_SERVER_SOCKET_TIMEOUT_MS | 0 | Socket inactivity timeout on the API bridge server (0 disables it) |

If you run OmniRoute behind Nginx, Caddy, Cloudflare, or another reverse proxy, make sure the proxy timeouts are also higher than your OmniRoute stream/fetch timeouts.
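For example, to budget very long streamed generations, you could raise the shared baseline and the inter-chunk allowance at launch (the values below are illustrative, not recommendations):

```shell
# 30-minute total request budget, 2-minute maximum gap between SSE chunks.
REQUEST_TIMEOUT_MS=1800000 STREAM_IDLE_TIMEOUT_MS=120000 omniroute
```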

2) Connect providers and create your API key

  1. Open Dashboard → Providers and connect at least one provider (OAuth or API key).
  2. Open Dashboard → Endpoints and create an API key.
  3. (Optional) Open Dashboard → Combos and set your fallback chain.

3) Point your coding tool to OmniRoute

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model prefix)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and OpenAI-compatible SDKs.
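Outside an IDE, the same endpoint can be exercised directly. A minimal chat completion sketch (assumes the server is running; substitute the key created on the Endpoints page):

```shell
# Send a chat completion through the gateway using a provider/model prefix.
# YOUR_OMNIROUTE_KEY is a placeholder for the key from the Endpoints page.
curl -s http://localhost:20128/v1/chat/completions \
  -H "Authorization: Bearer YOUR_OMNIROUTE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "if/kimi-k2-thinking",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```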

4) Enable and validate protocols (v2.0)

MCP (for tool-driven operations):

omniroute --mcp

Then connect your MCP client over stdio and test tools like:

  • omniroute_get_health
  • omniroute_list_combos

A2A (for agent-to-agent workflows):

curl http://localhost:20128/.well-known/agent.json
curl -X POST http://localhost:20128/a2a \
  -H 'content-type: application/json' \
  -d '{"jsonrpc":"2.0","id":"quickstart","method":"message/send","params":{"skill":"quota-management","messages":[{"role":"user","content":"Give me a short quota summary."}]}}'

5) Validate everything end-to-end (recommended)

npm run test:protocols:e2e

This suite validates real MCP and A2A client flows against a running app.

Alternative: run from source

cp .env.example .env
npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

Void Linux users can build a native package using xbps-src. Save this block as srcpkgs/omniroute/template:

# Template file for 'omniroute'
pkgname=omniroute
version=3.4.1
revision=1
hostmakedepends="nodejs python3 make"
depends="openssl"
short_desc="Universal AI gateway with smart routing for multiple LLM providers"
maintainer="zenobit <[email protected]>"
license="MIT"
homepage="https://github.com/diegosouzapw/OmniRoute"
distfiles="https://github.com/diegosouzapw/OmniRoute/archive/refs/tags/v${version}.tar.gz"
checksum=009400afee90a9f32599d8fe734145cfd84098140b7287990183dde45ae2245b
system_accounts="_omniroute"
omniroute_homedir="/var/lib/omniroute"
export NODE_ENV=production
export npm_config_engine_strict=false
export npm_config_loglevel=error
export npm_config_fund=false
export npm_config_audit=false

do_build() {
	# Determine target CPU arch for node-gyp
	local _gyp_arch
	case "$XBPS_TARGET_MACHINE" in
		aarch64*) _gyp_arch=arm64 ;;
		armv7*|armv6*) _gyp_arch=arm ;;
		i686*) _gyp_arch=ia32 ;;
		*) _gyp_arch=x64 ;;
	esac

	# 1) Install all deps – skip scripts (no network in do_build, native modules
	#    compiled separately below; better-sqlite3 is serverExternalPackage so
	#    Next.js does not execute it during next build)
	NODE_ENV=development npm ci --ignore-scripts

	# 2) Build the Next.js standalone bundle
	npm run build

	# 3) Copy static assets into standalone
	cp -r .next/static .next/standalone/.next/static
	[ -d public ] && cp -r public .next/standalone/public || true

	# 4) Compile better-sqlite3 native binding for the target architecture.
	#    Use node-gyp directly so CC/CXX from xbps-src cross-toolchain are used
	#    without npm altering them.
	local _node_gyp=/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js
	(cd node_modules/better-sqlite3 && node "$_node_gyp" rebuild --arch="$_gyp_arch")

	# 5) Place the compiled binding into the standalone bundle
	local _bs3_release=.next/standalone/node_modules/better-sqlite3/build/Release
	mkdir -p "$_bs3_release"
	cp node_modules/better-sqlite3/build/Release/better_sqlite3.node "$_bs3_release/"

	# 6) Remove arch-specific sharp bundles – upstream sets images.unoptimized=true
	#    so sharp is not used at runtime; x64 .so files would break aarch64 strip
	rm -rf .next/standalone/node_modules/@img

	# 7) Copy pino runtime deps omitted by Next.js static analysis:
	#    pino-abstract-transport – required by pino's worker thread
	#    split2 – dep of pino-abstract-transport
	#    process-warning – dep of pino itself
	for _mod in pino-abstract-transport split2 process-warning; do
		cp -r "node_modules/$_mod" .next/standalone/node_modules/
	done
}

do_check() {
	npm run test:unit
}

do_install() {
	vmkdir usr/lib/omniroute/.next

	vcopy .next/standalone/. usr/lib/omniroute/.next/standalone

	# Prevent removal of empty Next.js app router dirs by the post-install hook
	for _d in \
		.next/standalone/.next/server/app/dashboard \
		.next/standalone/.next/server/app/dashboard/settings \
		.next/standalone/.next/server/app/dashboard/providers; do
		touch "${DESTDIR}/usr/lib/omniroute/${_d}/.keep"
	done

	cat > "${WRKDIR}/omniroute" <<'EOF'
#!/bin/sh
export PORT="${PORT:-20128}"
export DATA_DIR="${DATA_DIR:-${XDG_DATA_HOME:-${HOME}/.local/share}/omniroute}"
export LOG_TO_FILE="${LOG_TO_FILE:-false}"
mkdir -p "${DATA_DIR}"
exec node /usr/lib/omniroute/.next/standalone/server.js "$@"
EOF
	vbin "${WRKDIR}/omniroute"
}

post_install() {
	vlicense LICENSE
}

🐳 Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Docker deployments now get a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. Enabling it for the first time downloads cloudflared (only if it is not already present), starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL.

Notes:

  • Quick Tunnel URLs are temporary and change after every restart.
  • Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
  • Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
  • Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
  • Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
  • SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
  • The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
  • Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be exposed securely over HTTPS using Caddy's automatic TLS certificate provisioning. Ensure your domain's DNS A record points to your server's IP before starting.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image                  | Tag    | Size   | Description           |
| ---------------------- | ------ | ------ | --------------------- |
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 1.0.3  | ~250MB | Current version       |


🖥️ Desktop App — Offline & Always-On

🆕 NEW! OmniRoute is now available as a native desktop application for Windows, macOS, and Linux.

Run OmniRoute as a standalone desktop app — no terminal, no browser, no internet required for local models. The Electron-based app includes:

  • 🖥️ Native Window — Dedicated app window with system tray integration
  • 🔄 Auto-Start — Launch OmniRoute on system login
  • 🔔 Native Notifications — Get alerts for quota exhaustion or provider issues
  • One-Click Install — NSIS (Windows), DMG (macOS), AppImage (Linux)
  • 🌐 Offline Mode — Works fully offline with bundled server

Quick Start

# Development mode
npm run electron:dev

# Build for your platform
npm run electron:build         # Current platform
npm run electron:build:win     # Windows (.exe)
npm run electron:build:mac     # macOS (.dmg) — x64 & arm64
npm run electron:build:linux   # Linux (.AppImage)

System Tray

When minimized, OmniRoute lives in your system tray with quick actions:

  • Open dashboard
  • Change server port
  • Quit application

📖 Full documentation: electron/README.md


💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
| --- | --- | --- | --- | --- |
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | $0.20/$0.50 per 1M 🆕 | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
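For comparing the paid tiers above, here is a tiny helper that turns per-million-token rates into a cost estimate for a given request mix. The rates are copied from the pricing table; treat them as a snapshot, not live pricing:

```python
# Per-1M-token (input, output) USD rates from the pricing table above.
RATES = {
    "deepseek-v3.2": (0.27, 1.10),
    "grok-4-fast": (0.20, 0.50),
    "grok-4": (0.20, 1.50),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request mix."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example mix: 100K tokens in, 20K tokens out.
for model in RATES:
    print(f"{model}: ${cost_usd(model, 100_000, 20_000):.4f}")
```

At that mix, Grok-4 Fast comes out cheapest ($0.03) and DeepSeek V3.2 costs $0.049; output-heavy workloads shift the comparison further toward the low-output-rate models.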

💡 $0 Combo Stack — The Complete Free Setup:

# 🆓 Ultimate Free Stack 2026 — 11 Providers, $0 Forever
Kiro (kr/)             → Claude Sonnet/Haiku UNLIMITED
Qoder (if/)            → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
LongCat Lite (lc/)     → LongCat-Flash-Lite — 50M tokens/day 🔥
Pollinations (pol/)    → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
Qwen (qw/)             → qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next UNLIMITED
Gemini (gemini/)       → Gemini 2.5 Flash — 1,500 req/day free API key
Cloudflare AI (cf/)    → Llama 70B, Gemma 3, Mistral — 10K Neurons/day
Scaleway (scw/)        → Qwen3 235B, Llama 70B — 1M free tokens (EU)
Groq (groq/)           → Llama/Gemma ultra-fast — 14.4K req/day
NVIDIA NIM (nvidia/)   → 70+ open models — 40 RPM forever
Cerebras (cerebras/)   → Llama/Qwen world-fastest — 1M tok/day

Zero cost. Never stops coding. Configure this as one OmniRoute combo and all fallbacks happen automatically — no manual switching ever.
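The fallback behaviour that combo relies on can be sketched as a priority chain: try each node in order, and move on when one reports an exhausted quota. This is an illustrative model, not OmniRoute's internal API; the provider names and the QuotaExhausted error are stand-ins:

```python
class QuotaExhausted(Exception):
    """Raised by a provider stub when its free quota is used up."""

def make_provider(name: str, has_quota: bool):
    def call(prompt: str) -> str:
        if not has_quota:
            raise QuotaExhausted(name)
        return f"{name}: answered {prompt!r}"
    return call

def route(providers, prompt: str) -> str:
    """Priority strategy: the first provider with remaining quota wins."""
    for model_id, call in providers:
        try:
            return call(prompt)
        except QuotaExhausted:
            continue  # fall through to the next node in the combo
    raise RuntimeError("all providers exhausted")

# Simulate the first two nodes being out of quota for the day.
combo = [
    ("kr/claude-sonnet-4.5", make_provider("kiro", has_quota=False)),
    ("lc/LongCat-Flash-Lite", make_provider("longcat", has_quota=False)),
    ("groq/llama-3.3-70b", make_provider("groq", has_quota=True)),
]
print(route(combo, "hello"))  # falls through to the Groq node
```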



🆓 Free Models — What You Actually Get

All models below are 100% free with zero credit card required. OmniRoute auto-routes between them when one quota runs out — combine them all for an unbreakable $0 combo.

🔵 CLAUDE MODELS (via Kiro — AWS Builder ID)

| Model | Prefix | Limit | Rate Limit |
| --- | --- | --- | --- |
| claude-sonnet-4.5 | kr/ | Unlimited | No reported daily cap |
| claude-haiku-4.5 | kr/ | Unlimited | No reported daily cap |
| claude-opus-4.6 | kr/ | Unlimited | Latest Opus via Kiro |

🟢 QODER MODELS (Free PAT via qodercli)

| Model | Prefix | Limit | Rate Limit |
| --- | --- | --- | --- |
| kimi-k2-thinking | if/ | Unlimited | No reported cap |
| qwen3-coder-plus | if/ | Unlimited | No reported cap |
| deepseek-r1 | if/ | Unlimited | No reported cap |
| minimax-m2.1 | if/ | Unlimited | No reported cap |
| kimi-k2 | if/ | Unlimited | No reported cap |

Recommended connection method: Personal Access Token + qodercli. Browser OAuth is experimental and disabled by default unless QODER_OAUTH_* environment variables are configured.

🟡 QWEN MODELS (Device Code Auth)

| Model | Prefix | Limit | Rate Limit |
| --- | --- | --- | --- |
| qwen3-coder-plus | qw/ | Unlimited | No reported cap |
| qwen3-coder-flash | qw/ | Unlimited | No reported cap |
| qwen3-coder-next | qw/ | Unlimited | No reported cap |
| vision-model | qw/ | Unlimited | Multimodal (images) |

🟣 GEMINI CLI (Google OAuth)

| Model | Prefix | Limit | Rate Limit |
| --- | --- | --- | --- |
| gemini-3-flash-preview | gc/ | 180K tok/month + 1K/day | Monthly reset |
| gemini-2.5-pro | gc/ | 180K/month (shared pool) | High quality |

⚫ NVIDIA NIM (Free API Key — build.nvidia.com)

| Tier | Daily Limit | Rate Limit | Notes |
| --- | --- | --- | --- |
| Free (Dev) | No token cap | ~40 RPM | 70+ models; transitioning to pure rate limits mid-2025 |

Popular free models: moonshotai/kimi-k2.5 (Kimi K2.5), z-ai/glm4.7 (GLM 4.7), deepseek-ai/deepseek-v3.2 (DeepSeek V3.2), nvidia/llama-3.3-70b-instruct, deepseek/deepseek-r1

⚪ CEREBRAS (Free API Key — inference.cerebras.ai)

| Tier | Daily Limit | Rate Limit | Notes |
| --- | --- | --- | --- |
| Free | 1M tokens/day | 60K TPM / 30 RPM | World's fastest LLM inference; resets daily |

Available free: llama-3.3-70b, llama-3.1-8b, deepseek-r1-distill-llama-70b

🔴 GROQ (Free API Key — console.groq.com)

| Tier | Daily Limit | Rate Limit | Notes |
| --- | --- | --- | --- |
| Free | 14.4K RPD | 30 RPM per model | No credit card; 429 on limit, not charged |

Available free: llama-3.3-70b-versatile, gemma2-9b-it, mixtral-8x7b, whisper-large-v3

🔴 LONGCAT AI (Free API Key — longcat.chat) 🆕

| Model | Prefix | Daily Free Quota | Notes |
| --- | --- | --- | --- |
| LongCat-Flash-Lite | lc/ | 50M tokens 💥 | Largest free quota ever |
| LongCat-Flash-Chat | lc/ | 500K tokens | Multi-turn chat |
| LongCat-Flash-Thinking | lc/ | 500K tokens | Reasoning / CoT |
| LongCat-Flash-Thinking-2601 | lc/ | 500K tokens | Jan 2026 version |
| LongCat-Flash-Omni-2603 | lc/ | 500K tokens | Multimodal |

100% free while in public beta. Sign up at longcat.chat with email or phone. Resets daily 00:00 UTC.

🟢 POLLINATIONS AI (No API Key Required) 🆕

| Model | Prefix | Rate Limit | Provider Behind |
| --- | --- | --- | --- |
| openai | pol/ | 1 req/15s | GPT-5 |
| claude | pol/ | 1 req/15s | Anthropic Claude |
| gemini | pol/ | 1 req/15s | Google Gemini |
| deepseek | pol/ | 1 req/15s | DeepSeek V3 |
| llama | pol/ | 1 req/15s | Meta Llama 4 Scout |
| mistral | pol/ | 1 req/15s | Mistral AI |

Zero friction: No signup, no API key. Add the Pollinations provider with an empty key field and it works immediately.

🟠 CLOUDFLARE WORKERS AI (Free API Key — cloudflare.com) 🆕

| Tier | Daily Neurons | Equivalent Usage | Notes |
| --- | --- | --- | --- |
| Free | 10,000 | ~150 LLM resp / 500s audio / 15K embeds | Global edge, 50+ models |

Popular free models: @cf/meta/llama-3.3-70b-instruct, @cf/google/gemma-3-12b-it, @cf/openai/whisper-large-v3-turbo (free audio!), @cf/qwen/qwen2.5-coder-15b-instruct

Requires API Token + Account ID from dash.cloudflare.com. Store Account ID in provider settings.
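The Neuron budget is easy to reason about with integer math: if ~150 LLM responses fit in the 10,000 free daily Neurons, each response costs roughly 66-67 Neurons. A quick budgeting helper under that assumption (the equivalence is a rough average from the table, not a guaranteed rate):

```python
DAILY_NEURONS = 10_000     # Cloudflare Workers AI free tier
RESPONSES_PER_DAY = 150    # rough LLM-response equivalence from the table

def responses_remaining(neurons_used: int) -> int:
    """Rough estimate of LLM responses left in today's free budget."""
    left = max(0, DAILY_NEURONS - neurons_used)
    return left * RESPONSES_PER_DAY // DAILY_NEURONS

print(responses_remaining(0))      # 150 — full daily budget
print(responses_remaining(4_000))  # 90 responses left
```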

🟣 SCALEWAY AI (1M Free Tokens — scaleway.com) 🆕

| Tier | Free Quota | Location | Notes |
| --- | --- | --- | --- |
| Free | 1M tokens | 🇫🇷 Paris, EU | No credit card needed within limits |

Available free: qwen3-235b-a22b-instruct-2507 (Qwen3 235B!), llama-3.1-70b-instruct, mistral-small-3.2-24b-instruct-2506, deepseek-v3-0324

EU/GDPR compliant. Get API key at console.scaleway.com.

💡 The Ultimate Free Stack (11 Providers, $0 Forever):

Kiro (kr/)             → Claude Sonnet/Haiku UNLIMITED
Qoder (if/)            → kimi-k2-thinking, qwen3-coder-plus, deepseek-r1 UNLIMITED
LongCat Lite (lc/)     → LongCat-Flash-Lite — 50M tokens/day 🔥
Pollinations (pol/)    → GPT-5, Claude, DeepSeek, Llama 4 — no key needed
Qwen (qw/)             → qwen3-coder models UNLIMITED
Gemini (gemini/)       → Gemini 2.5 Flash — 1,500 req/day free
Cloudflare AI (cf/)    → 50+ models — 10K Neurons/day
Scaleway (scw/)        → Qwen3 235B, Llama 70B — 1M free tokens (EU)
Groq (groq/)           → Llama/Gemma — 14.4K req/day ultra-fast
NVIDIA NIM (nvidia/)   → 70+ open models — 40 RPM forever
Cerebras (cerebras/)   → Llama/Qwen world-fastest — 1M tok/day

🎙️ Free Transcription Combo

Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

| Provider | Free Credits | Best Model | Rate Limit |
| --- | --- | --- | --- |
| 🟢 Deepgram | $200 free (signup) | nova-3 — best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | universal-3-pro — chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | whisper-large-v3 — OpenAI Whisper | 30 RPM (rate limited) |

Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          → uses $200 free first
  [2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    → free forever, emergency fallback

Then in /dashboard/mediaTranscription tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.

💡 Key Features

OmniRoute v2.0 is built as an operational platform, not just a relay proxy.

🆕 New — ClawRouter-Inspired Improvements (Mar 2026)

| Feature | What It Does |
| --- | --- |
| ⚡ Grok-4 Fast Family | xAI models at $0.20/$0.50/M — benchmarked 1143ms (30% faster than Gemini 2.5 Flash) |
| 🧠 GLM-5 via Z.AI | 128K output context, $0.5/1M — newest flagship from the GLM family |
| 🔮 MiniMax M2.5 | Reasoning + agentic tasks at $0.30/1M — significant upgrade from M2.1 |
| 🎯 toolCalling Flag per Model | Per-model toolCalling: true/false in registry — AutoCombo skips non-tool-capable models |
| 🌍 Multilingual Intent Detection | PT/ZH/ES/AR keywords in AutoCombo scoring — better model selection for no