keymux

v0.3.0

Published

a month ago

Transparent API key pooling for the OpenAI SDK — smart scheduling with proactive rate limit avoidance

Why keymux?

Many LLM providers offer free tiers with generous token allowances, but each account's rate limit caps how much of that allowance you can actually use, and creating extra keys under the same account does not raise it. The only way to multiply your effective throughput is to pool keys from multiple accounts and rotate automatically when one hits its limit.

keymux does exactly that. Drop it in as a replacement for the OpenAI client and it handles rotation transparently — no changes to your existing calls required. With smart scheduling, it tracks per-key budgets and avoids 429s before they happen. Works with any OpenAI-compatible provider: Gemini, Groq, OpenRouter, and more.

How it works

  ┌───────────┐         ┌─────────────────────────────────┐         ┌─────────────┐
  │           │         │            keymux               │         │             │
  │  Your App │────────►│  ┌───────┐ ┌───────┐ ┌───────┐  │────────►│   LLM API   │
  │           │◄────────│  │ Key 1 │ │ Key 2 │ │ Key 3 │  │◄────────│             │
  └───────────┘         │  └───────┘ └───────┘ └───────┘  │         └─────────────┘
                        │                                 │
                        │  Smart: picks key with budget   │
                        │  Basic: auto-rotates on 429     │
                        └─────────────────────────────────┘

Free-tier providers with OpenAI-compatible endpoints

| Provider | Free tier | Rate limit | Base URL |
|----------|-----------|------------|----------|
| Gemini | Permanent | 15 RPM / 1,500 RPD | https://generativelanguage.googleapis.com/v1beta/openai |
| Groq | Permanent | 30 RPM / 6,000 tokens/min | https://api.groq.com/openai/v1 |
| Cerebras | Permanent | 30 RPM / 1M tokens/day | https://api.cerebras.ai/v1 |
| OpenRouter | Permanent (29+ free models) | 20 RPM / 200 req/day | https://openrouter.ai/api/v1 |
| NVIDIA NIM | Permanent | ~40 RPM / 100+ models | https://integrate.api.nvidia.com/v1 |

[!NOTE] Rate limits apply per API key. Each key must come from a separate account to get an independent quota — multiple keys from the same account share the same limit.
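As a rough illustration of the note above, pooled capacity scales with the number of distinct accounts, not with the number of keys. The helper below is hypothetical (it is not part of keymux), and the account labels are made up for the example.

```typescript
// Hypothetical helper for illustration; not part of keymux.
interface PooledKey {
  key: string
  account: string // the account the key was created under
}

interface FreeTierLimit {
  rpm: number // requests per minute
  rpd: number // requests per day
}

// Keys from the same account share one quota, so count distinct accounts.
function pooledCapacity(keys: PooledKey[], perAccount: FreeTierLimit): FreeTierLimit {
  const accounts = new Set(keys.map((k) => k.account)).size
  return { rpm: perAccount.rpm * accounts, rpd: perAccount.rpd * accounts }
}

const geminiFree: FreeTierLimit = { rpm: 15, rpd: 1_500 }
const capacity = pooledCapacity(
  [
    { key: 'key-1', account: 'alice' },
    { key: 'key-2', account: 'alice' }, // shares alice's quota
    { key: 'key-3', account: 'bob' },
  ],
  geminiFree,
)
console.log(capacity) // { rpm: 30, rpd: 3000 }
```

Three keys across two accounts yield twice the per-account limit, not three times.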

Installation

npm i keymux

[!IMPORTANT] keymux requires openai >= 6.0.0. The async apiKey function support used internally was introduced in v6.

Getting Started

Gemini (Google AI Studio)

import { KeyPool } from 'keymux'

const client = new KeyPool({
  keys: process.env.GEMINI_KEYS!, // "key1,key2,key3"
  baseURL: 'https://generativelanguage.googleapis.com/v1beta/openai',
  strategy: 'least-recently-used',
})

const response = await client.chat.completions.create({
  model: 'gemini-2.0-flash',
  messages: [{ role: 'user', content: 'Hello!' }],
})

console.log(response.choices[0]?.message.content)

Groq

const client = new KeyPool({
  keys: process.env.GROQ_KEYS!,
  baseURL: 'https://api.groq.com/openai/v1',
})

Cerebras

const client = new KeyPool({
  keys: process.env.CEREBRAS_KEYS!,
  baseURL: 'https://api.cerebras.ai/v1',
})

OpenRouter

const client = new KeyPool({
  keys: process.env.OPENROUTER_KEYS!,
  baseURL: 'https://openrouter.ai/api/v1',
})

NVIDIA NIM

const client = new KeyPool({
  keys: process.env.NVIDIA_KEYS!,
  baseURL: 'https://integrate.api.nvidia.com/v1',
})

[!TIP] keys accepts both a string array and a comma-separated string, so you can use a single env var per provider instead of one per key.
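A minimal sketch of how the two accepted shapes could be normalized into one list. The function name is hypothetical, and trimming whitespace and dropping empty entries are assumptions; the README does not say how keymux parses the string internally.

```typescript
// Hypothetical normalizer; keymux's real parsing rules may differ.
function normalizeKeys(keys: string | string[]): string[] {
  const list = Array.isArray(keys) ? keys : keys.split(',')
  // Assumption: surrounding whitespace and empty entries are ignored.
  const cleaned = list.map((k) => k.trim()).filter((k) => k.length > 0)
  if (cleaned.length === 0) {
    throw new Error('At least one API key is required')
  }
  return cleaned
}

console.log(normalizeKeys('key1, key2,,key3')) // [ 'key1', 'key2', 'key3' ]
console.log(normalizeKeys(['key1', 'key2']))   // [ 'key1', 'key2' ]
```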

By default, keymux retries automatically with the next key when one hits a 429. If all keys are exhausted, a KeyPoolExhaustedError is thrown.
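The reactive behavior described above can be sketched as a loop over keys. This is an illustration under stated assumptions, not keymux's implementation: `callWithRotation` and `AllKeysRateLimited` are hypothetical names, and the sketch assumes a rate-limit error exposes `status === 429`.

```typescript
// Assumed error shape: a rate-limit error carries an HTTP status of 429.
interface HttpLikeError {
  status?: number
}

// Hypothetical stand-in for KeyPoolExhaustedError.
class AllKeysRateLimited extends Error {
  tried: number
  constructor(tried: number) {
    super(`All ${tried} API keys are rate-limited`)
    this.name = 'AllKeysRateLimited'
    this.tried = tried
  }
}

// Try each key in turn; move to the next key only on a 429.
async function callWithRotation<T>(
  keys: string[],
  call: (apiKey: string) => Promise<T>,
): Promise<T> {
  for (const key of keys) {
    try {
      return await call(key)
    } catch (err) {
      const status = (err as HttpLikeError | null)?.status
      if (status !== 429) throw err // other errors are not rotated away
    }
  }
  throw new AllKeysRateLimited(keys.length)
}
```

Every rotation here costs one rejected request, which is the overhead smart scheduling is meant to avoid.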

Smart Scheduling

Add quotas to enable proactive rate limit avoidance. Instead of waiting for 429 errors, keymux tracks per-key budgets and picks the key with available capacity before sending the request.

import { KeyPool, KeyCooldownError } from 'keymux'

const client = new KeyPool({
  keys: [
    process.env.GEMINI_KEY_1!,
    process.env.GEMINI_KEY_2!,
    process.env.GEMINI_KEY_3!,
  ],
  baseURL: 'https://generativelanguage.googleapis.com/v1beta/openai',
  strategy: 'least-recently-used',
  quotas: 'gemini-free', // ← enables smart scheduling
})

try {
  const response = await client.chat.completions.create({
    model: 'gemini-2.0-flash',
    messages: [{ role: 'user', content: 'Hello!' }],
  })
} catch (err) {
  if (err instanceof KeyCooldownError) {
    // All keys are temporarily on cooldown, so you know exactly when to retry
    console.log(`Retry in ${err.retryAfterMs}ms`)
    await new Promise((r) => setTimeout(r, err.retryAfterMs))
  } else {
    throw err // do not swallow unrelated errors
  }
}

What smart scheduling does

| Feature | Without quotas | With quotas |
|---------|----------------|-------------|
| Key selection | Blind rotation (round-robin/LRU) | Picks key with available budget |
| Rate limit detection | After 429 (wasted request) | Before sending (proactive) |
| Daily limit handling | Keeps retrying dead keys | Marks key until midnight reset |
| Failing keys | Keeps using them | Circuit breaker excludes them |
| Error on exhaustion | KeyPoolExhaustedError | KeyCooldownError with retryAfterMs |
| Token tracking | None | Estimates before, corrects after with real usage |
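To make the "picks key with available budget" row concrete, here is a minimal sketch of budget-aware selection over a one-minute sliding window. It tracks requests per minute only and is an assumption about the general technique, not keymux's scheduler; `KeyBudget`, `KeyChoice`, and `pickKey` are hypothetical names.

```typescript
const WINDOW_MS = 60_000

interface KeyBudget {
  key: string
  sent: number[] // timestamps (ms) of requests inside the current window
}

// Either a usable key, or how long to wait until one frees up.
type KeyChoice = { key: string } | { retryAfterMs: number }

function pickKey(budgets: KeyBudget[], rpm: number, now: number): KeyChoice {
  let best: KeyBudget | undefined
  let soonestFree = Infinity
  for (const b of budgets) {
    // Drop timestamps that have slid out of the one-minute window.
    b.sent = b.sent.filter((t) => now - t < WINDOW_MS)
    if (b.sent.length < rpm) {
      // Prefer the key with the lowest utilization.
      if (!best || b.sent.length < best.sent.length) best = b
    } else {
      // A slot frees up when the oldest request leaves the window.
      const oldest = b.sent[0] ?? now
      soonestFree = Math.min(soonestFree, oldest + WINDOW_MS - now)
    }
  }
  if (!best) return { retryAfterMs: soonestFree } // no HTTP request is made
  best.sent.push(now)
  return { key: best.key }
}
```

When every key is at its limit, the caller learns the shortest wait instead of spending a request on a 429.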

Provider presets

Use a preset string for known providers:

| Preset | RPM | TPM | RPD |
|--------|-----|-----|-----|
| 'gemini-free' | 15 | n/a | 1,500 |
| 'openai-tier-1' | 60 | 60,000 | n/a |
| 'openai-tier-2' | 3,500 | 90,000 | n/a |
| 'groq-free' | 30 | n/a | 14,400 |
| 'openrouter-free' | 20 | n/a | 200 |

Or pass a custom QuotaConfig:

const client = new KeyPool({
  keys: [...],
  quotas: { rpm: 100, tpm: 50_000, rpd: 10_000 },
})

Health monitoring

Keys that return repeated errors (5xx, network failures) are automatically excluded via a circuit breaker:

3 failures within 60s → key excluded for 60s
After cooldown → one probe request allowed
Probe succeeds → key returns to rotation
Probe fails → excluded again with doubled cooldown (up to 5 min)

Disable with health: false. Configure thresholds with health: { threshold: 5, cooldownMs: 30_000 }.
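The breaker lifecycle listed above can be sketched as a small per-key state machine. This is an illustration using the documented defaults, not the code in health-monitor.ts; `KeyBreaker` and its method names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```typescript
interface BreakerConfig {
  threshold: number     // failures needed to trip
  windowSize: number    // failure counting window (ms)
  cooldownMs: number    // base exclusion time (ms)
  maxCooldownMs: number // cap for the doubled cooldown (ms)
}

const DEFAULTS: BreakerConfig = {
  threshold: 3,
  windowSize: 60_000,
  cooldownMs: 60_000,
  maxCooldownMs: 300_000,
}

class KeyBreaker {
  private cfg: BreakerConfig
  private failures: number[] = []
  private excludedUntil = 0
  private cooldown: number
  private tripped = false
  private probeInFlight = false

  constructor(cfg: BreakerConfig = DEFAULTS) {
    this.cfg = cfg
    this.cooldown = cfg.cooldownMs
  }

  // After the cooldown, exactly one probe request is let through.
  canUse(now: number): boolean {
    if (!this.tripped) return true
    if (now < this.excludedUntil || this.probeInFlight) return false
    this.probeInFlight = true
    return true
  }

  // Any success (including a probe) returns the key to rotation.
  onSuccess(): void {
    this.tripped = false
    this.probeInFlight = false
    this.failures = []
    this.cooldown = this.cfg.cooldownMs
  }

  onFailure(now: number): void {
    if (this.probeInFlight) {
      // Failed probe: exclude again with a doubled, capped cooldown.
      this.probeInFlight = false
      this.cooldown = Math.min(this.cooldown * 2, this.cfg.maxCooldownMs)
      this.excludedUntil = now + this.cooldown
      return
    }
    this.failures = this.failures.filter((t) => now - t < this.cfg.windowSize)
    this.failures.push(now)
    if (this.failures.length >= this.cfg.threshold) {
      this.tripped = true
      this.failures = []
      this.excludedUntil = now + this.cooldown
    }
  }
}
```

With the defaults, successive failed probes back off through 60s, 120s, 240s, and then stay at the 300s cap.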

Error handling

import { KeyPool, KeyCooldownError, KeyPoolExhaustedError } from 'keymux'

try {
  const response = await client.chat.completions.create({ ... })
} catch (err) {
  if (err instanceof KeyCooldownError) {
    // Smart scheduling: all keys temporarily on cooldown
    // No HTTP request was made — blocked proactively
    console.log(`Retry in ${err.retryAfterMs}ms`)
  } else if (err instanceof KeyPoolExhaustedError) {
    // Basic rotation: all keys hit 429 after retrying
    console.error(`All ${err.keys.length} keys exhausted:`, err.keys)
  } else {
    throw err // rethrow anything that is not a keymux pool error
  }
}
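Because KeyCooldownError reports how long to wait, a bounded retry wrapper is straightforward. The sketch below is hypothetical and not part of keymux. So that it runs without the package installed, it detects the error by its documented `name` and `retryAfterMs` properties; with keymux imported you would use `err instanceof KeyCooldownError` instead.

```typescript
// Returns the cooldown in ms if `err` looks like a KeyCooldownError.
function cooldownDelay(err: unknown): number | undefined {
  if (typeof err !== 'object' || err === null) return undefined
  const e = err as { name?: unknown; retryAfterMs?: unknown }
  return e.name === 'KeyCooldownError' && typeof e.retryAfterMs === 'number'
    ? e.retryAfterMs
    : undefined
}

// Hypothetical wrapper: waits out cooldowns, up to maxAttempts tries in total.
async function withCooldownRetry<T>(
  request: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await request()
    } catch (err) {
      const delay = cooldownDelay(err)
      // Rethrow non-cooldown errors, and give up after the last attempt.
      if (delay === undefined || attempt >= maxAttempts) throw err
      await sleep(delay)
    }
  }
}
```

A call would look like `withCooldownRetry(() => client.chat.completions.create(params))`, where `params` stands for your usual request body. The `sleep` parameter is injectable mainly so the wrapper can be tested without real delays.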

Provider Guides

[!WARNING] Keys must come from different accounts.
Multiple keys created under the same account share the same rate limit quota. Creating 10 keys from the same account does NOT give you 10× the rate limit. Each key must come from a completely separate account to get an independent quota.

[!TIP] Use strategy: 'least-recently-used' for free-tier providers with per-minute limits. It always picks the key unused for the longest time, maximizing the window between reuses.

Gemini

Go to Google AI Studio
Sign in with a Google account
Click Get API key → Create API key
Copy the key (format: AIzaSy...)
Repeat with different Google accounts to get more keys

Free tier: 15 RPM and 1,500 RPD per key with Gemini 2.0 Flash.

Groq

Go to console.groq.com and create an account
Navigate to API Keys → Create API Key
Repeat with different accounts to get more keys

Free tier: 30 RPM and 6,000 tokens/min per key.

Cerebras

Go to cloud.cerebras.ai and create an account
Navigate to API Keys → Create API Key
Repeat with different accounts to get more keys

Free tier: 30 RPM and 1M tokens/day per key. No credit card required.

OpenRouter

Go to openrouter.ai and create an account
Navigate to Keys → Create Key
Use model IDs ending in :free (e.g. meta-llama/llama-3.3-70b-instruct:free)
Repeat with different accounts to get more keys

Free tier: 20 RPM and 200 requests/day per key.

NVIDIA NIM

Join the NVIDIA Developer Program (free)
Navigate to any model page and click Get API Key
Copy the key (format: nvapi-...)
Repeat with different accounts to get more keys

Free tier: ~40 RPM, 100+ models available.

API Reference

`KeyPool`

KeyPool extends OpenAI. All methods, properties, and namespaces (.chat, .embeddings, .models, .images, .audio, etc.) are inherited.

const client = new KeyPool(config: KeyPoolConfig)

`KeyPoolConfig`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| keys | string[] \| string | required | API keys for rotation, as an array or a comma-separated string. Minimum 1; rotation is effective with 2+. |
| baseURL | string | OpenAI default | Provider base URL. |
| strategy | Strategy | 'round-robin' | Key rotation strategy. |
| maxRetries | number | keys.length | Maximum retry attempts before giving up. |
| onExhausted | (maskedKeys: string[]) => void | none | Called when all keys are exhausted via 429. Not called for KeyCooldownError. |
| quotas | ProviderPreset \| QuotaConfig | none | Enables smart scheduling. Pass a preset string or custom config. |
| health | HealthConfig \| false | {} | Circuit breaker config. false disables health monitoring. |
| tokenCounter | (body: unknown) => number | chars/4 heuristic | Custom token estimation function for budget tracking. |
| openaiOptions | Omit<ClientOptions, ...> | none | Pass-through options for the underlying OpenAI client. |

[!NOTE] Without quotas, keymux behaves exactly like v0.1.x — reactive rotation only. Smart scheduling is fully opt-in. When quotas is set, strategy is ignored — smart scheduling uses its own key selection (lowest budget utilization).
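For a sense of what a `tokenCounter` with the documented `(body: unknown) => number` signature might look like, here is a sketch of a chars/4 estimate. Which fields keymux's built-in heuristic actually counts is not stated in this README, so counting only string `content` in `messages` is an assumption, and `estimateTokens` is a hypothetical name.

```typescript
// Rough estimate: about 4 characters per token, over chat message text only.
function estimateTokens(body: unknown): number {
  if (typeof body !== 'object' || body === null) return 0
  const messages = (body as { messages?: unknown }).messages
  if (!Array.isArray(messages)) return 0
  let chars = 0
  for (const m of messages) {
    const content = (m as { content?: unknown } | null)?.content
    // Assumption: non-string content (e.g. image parts) is ignored.
    if (typeof content === 'string') chars += content.length
  }
  return Math.ceil(chars / 4)
}

const estimate = estimateTokens({
  model: 'gemini-2.0-flash',
  messages: [{ role: 'user', content: 'Hello!' }],
})
console.log(estimate) // 2
```

A function like this could be passed as `tokenCounter` when a provider's tokenizer differs noticeably from the default estimate.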

`Strategy`

type Strategy = 'round-robin' | 'least-recently-used'

round-robin (default): Cycles through keys in order. O(1). Deterministic.
least-recently-used: Returns the key that was used least recently. O(N). Best for free-tier providers with per-minute limits — maximizes time between reuses of the same key.
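The two strategies can be sketched as follows. The class and method names are hypothetical and this is not the code in scheduler.ts; it only illustrates the O(1) index cycle versus the O(N) scan for the stalest key.

```typescript
class RoundRobin {
  private keys: string[]
  private next = 0

  constructor(keys: string[]) {
    this.keys = keys
  }

  // O(1): advance an index and wrap around.
  pick(): string {
    const key = this.keys[this.next]
    if (key === undefined) throw new Error('key pool is empty')
    this.next = (this.next + 1) % this.keys.length
    return key
  }
}

class LeastRecentlyUsed {
  private entries: { key: string; usedAt: number }[]
  private clock = 0 // logical clock, incremented on every use

  constructor(keys: string[]) {
    this.entries = keys.map((key) => ({ key, usedAt: 0 }))
  }

  // Record a use that happened outside pick(), e.g. a retry on a specific key.
  markUsed(key: string): void {
    const entry = this.entries.find((e) => e.key === key)
    if (entry) entry.usedAt = ++this.clock
  }

  // O(N): scan for the key with the oldest last-use stamp.
  pick(): string {
    let oldest = this.entries[0]
    if (oldest === undefined) throw new Error('key pool is empty')
    for (const e of this.entries) {
      if (e.usedAt < oldest.usedAt) oldest = e
    }
    oldest.usedAt = ++this.clock
    return oldest.key
  }
}
```

If keys are only ever used through `pick()`, both strategies produce the same sequence. They diverge once a key is used out of order: after `markUsed('b')`, the LRU picker skips `b` until every other key has been used more recently.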

`QuotaConfig`

Custom rate limit configuration. rpm is required — other dimensions are optional (untracked if omitted).

interface QuotaConfig {
  rpm: number              // Requests per minute (required)
  tpm?: number             // Tokens per minute
  rpd?: number             // Requests per day
  tpd?: number             // Tokens per day
  dailyResetHour?: number  // UTC hour for daily reset (default: 7 = midnight Pacific Daylight Time)
}
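To illustrate how `dailyResetHour` could translate into a concrete reset time, here is a small sketch. `nextDailyReset` is a hypothetical helper, and treating a request made exactly at the reset hour as belonging to the new day is an assumption.

```typescript
// Next occurrence of `dailyResetHour`:00 UTC strictly after `now`.
function nextDailyReset(now: Date, dailyResetHour = 7): Date {
  const reset = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), dailyResetHour),
  )
  // If today's reset has already passed, roll over to tomorrow.
  if (reset.getTime() <= now.getTime()) {
    reset.setUTCDate(reset.getUTCDate() + 1)
  }
  return reset
}

const beforeReset = nextDailyReset(new Date('2025-01-15T06:00:00Z'))
console.log(beforeReset.toISOString()) // 2025-01-15T07:00:00.000Z

const afterReset = nextDailyReset(new Date('2025-01-31T12:00:00Z'))
console.log(afterReset.toISOString()) // 2025-02-01T07:00:00.000Z
```

A key that has used up its daily budget would stay out of rotation until this timestamp.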

`HealthConfig`

Circuit breaker configuration. All fields optional with sensible defaults.

interface HealthConfig {
  threshold?: number       // Failures to trip circuit (default: 3)
  windowSize?: number      // Failure counting window in ms (default: 60,000)
  cooldownMs?: number      // Base cooldown when tripped (default: 60,000)
  maxCooldownMs?: number   // Max cooldown after backoff (default: 300,000)
}

`KeyCooldownError`

Thrown proactively when smart scheduling determines no key has available budget. No HTTP request is made.

| Property | Type | Description |
|----------|------|-------------|
| name | 'KeyCooldownError' | For reliable instanceof checks. |
| message | string | e.g. 'All API keys are on cooldown. Retry after 23s' |
| retryAfterMs | number | Shortest cooldown remaining across all keys, in milliseconds. |

`KeyPoolExhaustedError`

Thrown reactively when all keys have been rate-limited after exhausting all retry attempts (429 errors).

| Property | Type | Description |
|----------|------|-------------|
| name | 'KeyPoolExhaustedError' | For reliable instanceof checks. |
| message | string | e.g. 'All 3 API keys are rate-limited' |
| keys | string[] | All keys that were tried, masked (e.g. 'AIza...cdef'). Safe to log. |
| cause | RateLimitError | The original RateLimitError from the OpenAI SDK. |

`maskKey(key: string): string`

Masks an API key for safe logging: shows the first 4 and last 4 characters separated by .... Keys shorter than 8 characters are returned as '***'.

import { maskKey } from 'keymux'

maskKey('AIzaSyB1234567890abcdef') // → 'AIza...cdef'
maskKey('sk-proj-abc123xyz')       // → 'sk-p...3xyz'
maskKey('short')                   // → '***'

[!NOTE] maskKey is exported so you can use it in your own logging — for example when you store keys in a database and want to display them safely in a UI.
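The masking rule described above is small enough to sketch directly. The function below is a re-implementation for illustration under the documented rule (first 4 and last 4 characters, '***' for keys shorter than 8), named `maskKeySketch` to make clear it is not the exported function itself.

```typescript
// Illustrative re-implementation of the documented masking rule.
function maskKeySketch(key: string): string {
  // Too short to reveal 8 characters without exposing most of the key.
  if (key.length < 8) return '***'
  return `${key.slice(0, 4)}...${key.slice(-4)}`
}

console.log(maskKeySketch('AIzaSyB1234567890abcdef')) // AIza...cdef
console.log(maskKeySketch('sk-proj-abc123xyz'))       // sk-p...3xyz
console.log(maskKeySketch('short'))                   // ***
```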

TypeScript

Full TypeScript types ship with the package. No @types/ package needed.

import { KeyPool, KeyCooldownError, KeyPoolExhaustedError, PRESETS, maskKey } from 'keymux'
import type { KeyPoolConfig, Strategy, QuotaConfig, HealthConfig, ProviderPreset } from 'keymux'

Project Structure

keymux/
├── src/
│   ├── index.ts              # Public exports
│   ├── key-pool.ts           # KeyPool — extends OpenAI, wires everything together
│   ├── smart-scheduler.ts    # 3-stage key selection (health → budget → tie-break)
│   ├── budget-tracker.ts     # Per-key sliding window RPM/TPM/RPD tracking
│   ├── health-monitor.ts     # Per-key circuit breaker with exponential backoff
│   ├── token-estimator.ts    # Pre-request token estimation (heuristic or custom)
│   ├── presets.ts            # Provider preset definitions and resolution
│   ├── request-context.ts    # AsyncLocalStorage for request-scoped state
│   ├── scheduler.ts          # KeyScheduler — round-robin and LRU logic
│   ├── errors.ts             # KeyPoolExhaustedError + KeyCooldownError + maskKey
│   ├── types.ts              # Shared type definitions
│   └── *.test.ts             # Co-located test files (136 tests)
├── dist/                     # Build output (ESM + CJS + .d.ts)
├── tsup.config.ts            # Build config
├── vitest.config.ts          # Test config
└── package.json

Contributing

Bug reports and feature requests are welcome — please use the issue templates.

For code contributions:

git clone https://github.com/iammalego/keymux.git
cd keymux
npm install

npm test           # run tests
npx tsc --noEmit   # type check
npm run build      # build dist/

[!NOTE] This project follows strict TDD — tests are written before implementation. All PRs must include tests for new behavior.

License

MIT
