# api-key-lb

v1.0.5

Transparent API key load balancer with session-aware sticky routing. Works with any OpenAI-compatible API provider.
## Why?

Agentic systems (Hermes, OpenCode, Claude Code, etc.) build up context caches per API key. If you round-robin between keys, you lose cache affinity on every other request. This proxy uses sticky routing: the same session always hits the same key.
## Features
- Session-aware sticky routing — same session fingerprint → same key (cache-friendly)
- Automatic 429 fallback — throttled key triggers fallback, reverts when unthrottled
- Works with anything — Hermes, OpenCode/kimaki, Claude Code, curl, any OpenAI-compatible client
- Zero-config target — proxy is transparent, forwards any path to the target API
- Health endpoint — `GET /health` for monitoring
- macOS LaunchAgent — auto-starts on login, auto-restarts on crash
## Quick Start
```sh
# Install globally (or use directly from this dir)
npm install -g .

# Setup — saves config, patches known configs, installs LaunchAgent
api-key-lb setup \
  --keys "sk-key1,sk-key2" \
  --target "https://api.z.ai" \
  --port 4577

# Check status
api-key-lb status

# Stop
api-key-lb stop
```

## Config
Priority: CLI flags → env vars → config file → defaults

Config file: `~/.config/api-key-lb/config.json`
```json
{
  "target": "https://api.z.ai",
  "keys": "key1,key2",
  "port": 4577,
  "cooldown_ms": 60000,
  "session_ttl_ms": 3600000
}
```

Environment variables:
| Variable | Default | Description |
|---|---|---|
| `API_KEYS` | required | Comma-separated API keys |
| `TARGET` | `https://api.openai.com` | Target API base URL |
| `PORT` | `4577` | Proxy listen port |
| `COOLDOWN_MS` | `60000` | Per-key cooldown after a 429 (ms) |
| `SESSION_TTL_MS` | `3600000` | Session stickiness TTL (ms) |
| `API_KEY_LB_CONFIG` | — | Path to config file |
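The precedence above can be sketched as a simple merge, lowest priority spread first so higher-priority sources overwrite it (an illustrative sketch, not the package's actual code; `resolveConfig` is a hypothetical name):

```javascript
// Merge config sources so that higher-priority sources win.
// Spread order is lowest priority first: defaults < file < env < flags.
function resolveConfig({ flags = {}, env = {}, file = {} } = {}) {
  const defaults = { target: "https://api.openai.com", port: 4577 };
  return { ...defaults, ...file, ...env, ...flags };
}
```

With no sources supplied, the defaults apply; any key set by a CLI flag beats the same key from an env var, config file, or default.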
## How Sticky Routing Works
- Extracts a session fingerprint from the request body (`session_id`, `conversation_id`, or a hash of model + system prompt)
- Hashes the fingerprint to deterministically pick a key
- Same fingerprint always routes to the same key
- Different sessions get distributed across keys
- On 429: falls back to an alternate key, reverts to sticky routing when unthrottled
## Connecting Your Tools
Just change the base URL to point at the proxy:
Hermes (`~/.hermes/config.yaml`):

```yaml
model:
  base_url: http://127.0.0.1:4577/api/coding/paas/v4
```

OpenCode (`~/.config/opencode/opencode.json`):
```json
{
  "provider": {
    "zai": {
      "options": {
        "baseURL": "http://127.0.0.1:4577/api/coding/paas/v4"
      }
    }
  }
}
```

Any OpenAI-compatible client:
```sh
curl http://127.0.0.1:4577/v1/chat/completions \
  -H "Authorization: Bearer anything" \
  -d '{"model":"gpt-4","messages":[...]}'
```

The `Authorization` header gets replaced by the proxy — the key you pass doesn't matter.
## Health Check

```sh
curl http://127.0.0.1:4577/health
```

Returns per-key stats: requests, errors, cache hits, throttle status, active sessions.
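A response might look something like this (the field names here are hypothetical, inferred from the stats listed above; check the actual output of your running instance):

```json
{
  "keys": [
    { "requests": 120, "errors": 2, "cache_hits": 95, "throttled": false },
    { "requests": 87, "errors": 0, "cache_hits": 60, "throttled": true }
  ],
  "active_sessions": 4
}
```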
## Architecture

```
┌────────────┐     ┌──────────────────┐     ┌─────────┐
│   Client   │────▶│ api-key-lb proxy │────▶│   API   │
│  (Hermes)  │     │      :4577       │     │ (z.ai)  │
│ (OpenCode) │     │  sticky routing  │     │         │
│   (curl)   │     │   429 fallback   │     │         │
└────────────┘     └──────────────────┘     └─────────┘
```

The proxy is fully transparent — it forwards whatever path and headers the client sends, only replacing the `Authorization` bearer token and `Host` header.
## License
MIT
