arbiter-cli
v0.12.1
AI inference optimization — smart routing + context compression saves up to 98% on LLM costs
arbiter-cli
Cut LLM API costs 69% with one line of code. Smart routing proxy that sends each request to the cheapest model capable of handling it.
Quick Start
# Interactive chat (like Claude CLI, but 69% cheaper)
npx arbiter-cli chat
# AI coding agent (reads files, writes code, runs commands)
npx arbiter-cli code "add error handling to utils.py"
# Set up in your project (zero code changes to your app)
npx arbiter-cli init
# Check your savings
npx arbiter-cli stats
What it does
Arbiter routes every LLM request to the cheapest model that can handle it:
- Simple questions → Gemini Flash / GPT-4o Mini (95% cheaper)
- Medium code tasks → Qwen / Mistral (90% cheaper)
- Complex reasoning → Claude Sonnet 4 / GPT-4o (full quality)
You get the same quality. You pay 69% less on average.
Setup Options
Option 1: Interactive Chat
npx arbiter-cli chat
Chat like you would in Claude CLI. Each response shows which model was picked and how much you saved.
⚡ Arbiter Chat
› What is the capital of France?
Paris.
↳ gemini-2.5-flash · saved <$0.001 (95%)
› Design a CRDT for collaborative editing
Here's an approach using operation-based CRDTs...
↳ claude-sonnet-4.6 · saved $0.00 (0%) — frontier needed
Option 2: Coding Agent
npx arbiter-cli code "fix the bug in main.py"
npx arbiter-cli code  # interactive mode
The agent reads files, writes code, and runs commands. It routes simple file operations to cheap models and architecture decisions to frontier models.
Option 3: Drop-in Proxy (for your existing code)
npx arbiter-cli init
This adds OPENAI_BASE_URL to your .env. Your existing OpenAI SDK code then routes through Arbiter automatically, with no code changes.
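One caveat: the OpenAI Python SDK reads `OPENAI_BASE_URL` from the process environment and does not parse `.env` files itself. If your app does not already load `.env` (for example through a framework or a dotenv library), a stdlib-only loader along these lines may be enough. This is a minimal sketch; `load_env_file` is a hypothetical helper, not part of arbiter-cli or the OpenAI SDK.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables already set in the environment win, so a value exported
    in the shell is never overwritten. Returns the parsed pairs.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.exists():
        return parsed
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        os.environ.setdefault(key, value)
    return parsed
```

Call `load_env_file()` once at startup, before the OpenAI client is constructed, so the base URL is visible when the client is created.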
from openai import OpenAI

# Works unchanged: Arbiter routes behind the scenes
client = OpenAI()  # Picks up OPENAI_BASE_URL from the environment once .env is loaded
response = client.chat.completions.create(
    model="gpt-4o",  # Arbiter overrides intelligently
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → Routed to Gemini Flash, saved 95%
CLI Commands
| Command | Description |
|---------|-------------|
| chat | Interactive chat with smart routing |
| chat --fast | Prefer low-latency models |
| chat --model claude | Force a specific model |
| code | AI coding agent (interactive) |
| code "task" | One-shot coding task |
| init | Add Arbiter to current project |
| status | Check proxy connection |
| stats | View cost savings |
Chat Commands
| Command | Description |
|---------|-------------|
| /stats | Session cost breakdown |
| /model claude | Switch model (claude, gpt4o, flash, haiku, fable, auto) |
| /good or /bad | Rate response (improves routing) |
| /copy | Copy last response to clipboard |
| /save name | Save conversation |
| /load name | Load conversation |
| """ | Start/end multi-line input |
| quit | Exit |
How It Works
- Classify — Each request is analyzed for task type (code, reasoning, analysis, creative, etc.) and complexity (simple/medium/complex) in <1ms
- Route — Performance matrix picks the cheapest model that meets the quality bar
- Quality Gate — If the cheap model's response fails the quality check, Arbiter transparently retries the request on a frontier model
- Cache — Identical requests return instantly at $0
- Compress — Non-frontier responses use concise prompts (fewer output tokens)
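The first four steps above can be sketched in a few lines of Python. This is an illustration of the idea only, not Arbiter's actual implementation: the catalog, relative costs, keyword heuristic, and quality check are all made-up placeholders, and `call_model` is a function the caller supplies.

```python
from dataclasses import dataclass

LEVELS = {"simple": 0, "medium": 1, "complex": 2}


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: float  # made-up number, for ordering only
    max_level: int        # highest complexity this model is trusted with


# Hypothetical catalog; names echo the README, values are assumptions.
CATALOG = [
    Model("gemini-flash", 1.0, LEVELS["simple"]),
    Model("qwen", 2.0, LEVELS["medium"]),
    Model("claude-sonnet", 20.0, LEVELS["complex"]),
]
FRONTIER = max(CATALOG, key=lambda m: m.max_level)


def classify(prompt):
    """Toy heuristic: keywords and length stand in for a real classifier."""
    text = prompt.lower()
    if any(word in text for word in ("design", "architecture", "prove")):
        return "complex"
    if any(word in text for word in ("code", "function", "bug")) or len(text) > 200:
        return "medium"
    return "simple"


def route(complexity):
    """Pick the cheapest model whose capability covers the request."""
    capable = [m for m in CATALOG if m.max_level >= LEVELS[complexity]]
    return min(capable, key=lambda m: m.relative_cost)


def looks_ok(response):
    """Placeholder quality gate: reject empty or very short answers."""
    return len(response.strip()) >= 3


_cache = {}


def handle(prompt, call_model):
    """Run one request through cache, classify, route, and quality gate.

    call_model(model_name, prompt) -> str is supplied by the caller.
    Returns (model_name, response).
    """
    if prompt in _cache:  # Cache: identical request, no model call
        return _cache[prompt]
    model = route(classify(prompt))  # Classify, then Route
    response = call_model(model.name, prompt)
    if not looks_ok(response) and model != FRONTIER:
        model = FRONTIER  # Quality Gate: retry once on the frontier model
        response = call_model(model.name, prompt)
    _cache[prompt] = (model.name, response)
    return _cache[prompt]
```

Keeping `call_model` as a parameter makes the routing logic testable without network access; in practice it would wrap whatever client sends the request.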
Models Available
| Model | Best for | Cost |
|-------|----------|------|
| Claude Sonnet 4 | Complex reasoning, analysis | $$$ |
| Claude Fable 5 | Autonomous coding agents | $$$$ |
| GPT-4o | Complex code, multi-step | $$$ |
| Gemini 2.5 Flash | Simple Q&A, classification | $ |
| GPT-4o Mini | Simple tasks, extraction | $ |
| Qwen 2.5 72B | Code generation, math | $ |
| Llama 3.3 70B | General tasks | $ |
| Mistral Large | Code review, analysis | $$ |
| Claude 3.5 Haiku | Fast responses | $$ |
Requirements
- Node.js 18+
- An OpenRouter API key (one key, all models)
Set your key:
export OPENROUTER_API_KEY=sk-or-v1-...
# or add to .env in your project directory
Savings Breakdown
From real testing across 90 varied requests:
| Traffic Type | Routed To | Savings |
|-------------|-----------|---------|
| Simple Q&A (40%) | Gemini Flash | 95% |
| Classification (15%) | Gemini Flash | 95% |
| Code tasks (25%) | Qwen / GPT-4o | 50-93% |
| Complex reasoning (10%) | Claude Sonnet 4 | 0% |
| Analysis (10%) | Claude Sonnet 4 | 0% |
| Average | Mixed | 69% |
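As a sanity check, the average can be approximated as a traffic-weighted sum of the per-row savings in the table above. The sketch below assumes every request has the same baseline cost, which the README does not state, and treats the 50-93% code-task figure as a range rather than guessing a single value.

```python
# (traffic share, lowest saving, highest saving) for each row of the table.
ROWS = {
    "simple_qa":         (0.40, 0.95, 0.95),
    "classification":    (0.15, 0.95, 0.95),
    "code_tasks":        (0.25, 0.50, 0.93),
    "complex_reasoning": (0.10, 0.00, 0.00),
    "analysis":          (0.10, 0.00, 0.00),
}


def blended_savings(rows):
    """Return (low, high) traffic-weighted savings as fractions."""
    total_share = sum(share for share, _, _ in rows.values())
    assert abs(total_share - 1.0) < 1e-9, "traffic shares should sum to 100%"
    low = sum(share * lo for share, lo, _ in rows.values())
    high = sum(share * hi for share, _, hi in rows.values())
    return low, high


low, high = blended_savings(ROWS)
print(f"blended savings: {low:.1%} to {high:.1%}")
```

Under these assumptions the blended figure lands between roughly 65% and 75%, which brackets the reported 69% average.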
Links
- Landing page: https://arbiter-ai.com
- API docs: https://app.arbiter-ai.com/docs
- NPM: https://www.npmjs.com/package/arbiter-cli
License
MIT
