# api-key-lb

v1.0.5

Transparent API key load balancer with session-aware sticky routing. Works with any OpenAI-compatible API provider.
## Why?

Agentic systems (Hermes, OpenCode, Claude Code, etc.) build up context caches per API key. If you round-robin between keys, you lose cache affinity on every other request. This proxy uses sticky routing: the same session always hits the same key.
## Features
- Session-aware sticky routing — same session fingerprint → same key (cache-friendly)
- Automatic 429 fallback — throttled key triggers fallback, reverts when unthrottled
- Works with anything — Hermes, OpenCode/kimaki, Claude Code, curl, any OpenAI-compatible client
- Zero-config target — proxy is transparent, forwards any path to the target API
- Health endpoint — `GET /health` for monitoring
- macOS LaunchAgent — auto-starts on login, auto-restarts on crash
## Quick Start
```sh
# Install globally (or use directly from this dir)
npm install -g .

# Setup — saves config, patches known configs, installs LaunchAgent
api-key-lb setup \
  --keys "sk-key1,sk-key2" \
  --target "https://api.z.ai" \
  --port 4577

# Check status
api-key-lb status

# Stop
api-key-lb stop
```

## Config
Priority: CLI flags → env vars → config file → defaults

Config file: `~/.config/api-key-lb/config.json`
```json
{
  "target": "https://api.z.ai",
  "keys": "key1,key2",
  "port": 4577,
  "cooldown_ms": 60000,
  "session_ttl_ms": 3600000
}
```

Environment variables:
| Variable | Default | Description |
|---|---|---|
| `API_KEYS` | required | Comma-separated API keys |
| `TARGET` | `https://api.openai.com` | Target API base URL |
| `PORT` | `4577` | Proxy listen port |
| `COOLDOWN_MS` | `60000` | Per-key cooldown after a 429 (ms) |
| `SESSION_TTL_MS` | `3600000` | Session stickiness TTL (ms) |
| `API_KEY_LB_CONFIG` | — | Path to config file |
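The precedence above can be sketched as a simple merge, lowest priority spread first so higher-priority sources overwrite it (an illustrative sketch, not the package's actual code; `resolveConfig` is a hypothetical name):

```javascript
// Merge config sources so that higher-priority sources win.
// Spread order is lowest priority first: defaults < file < env < flags.
function resolveConfig({ flags = {}, env = {}, file = {} } = {}) {
  const defaults = { target: "https://api.openai.com", port: 4577 };
  return { ...defaults, ...file, ...env, ...flags };
}
```

With no sources supplied, the defaults apply; any key set by a CLI flag beats the same key from an env var, config file, or default.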
## How Sticky Routing Works
- Extracts a session fingerprint from the request body (`session_id`, `conversation_id`, or a hash of model + system prompt)
- Hashes the fingerprint to deterministically pick a key
- Same fingerprint always routes to the same key
- Different sessions get distributed across keys
- On 429: falls back to an alternate key, reverts to sticky routing when unthrottled
## Connecting Your Tools
Just change the base URL to point at the proxy:
Hermes (`~/.hermes/config.yaml`):

```yaml
model:
  base_url: http://127.0.0.1:4577/api/coding/paas/v4
```

OpenCode (`~/.config/opencode/opencode.json`):
```json
{
  "provider": {
    "zai": {
      "options": {
        "baseURL": "http://127.0.0.1:4577/api/coding/paas/v4"
      }
    }
  }
}
```

Any OpenAI-compatible client:
```sh
curl http://127.0.0.1:4577/v1/chat/completions \
  -H "Authorization: Bearer anything" \
  -d '{"model":"gpt-4","messages":[...]}'
```

The `Authorization` header gets replaced by the proxy — the key you pass doesn't matter.
## Health Check

```sh
curl http://127.0.0.1:4577/health
```

Returns per-key stats: requests, errors, cache hits, throttle status, active sessions.
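A response might look something like this (the field names here are hypothetical, inferred from the stats listed above; check the actual output of your running instance):

```json
{
  "keys": [
    { "requests": 120, "errors": 2, "cache_hits": 95, "throttled": false },
    { "requests": 87, "errors": 0, "cache_hits": 60, "throttled": true }
  ],
  "active_sessions": 4
}
```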
## Architecture

```
┌────────────┐     ┌──────────────────┐     ┌─────────┐
│   Client   │────▶│ api-key-lb proxy │────▶│   API   │
│  (Hermes)  │     │      :4577       │     │ (z.ai)  │
│ (OpenCode) │     │  sticky routing  │     │         │
│   (curl)   │     │   429 fallback   │     │         │
└────────────┘     └──────────────────┘     └─────────┘
```

The proxy is fully transparent — it forwards whatever path and headers the client sends, only replacing the `Authorization` bearer token and `Host` header.
## License
MIT
