swarm-gem

v1.0.0

Published

2 months ago

Google Gemini API Key Swarm Manager - Automatically rotate multiple API keys to handle rate limits

anhhuydng2

gemini google-ai api-key-manager rate-limit swarm generative-ai

🐝 SwarmGem

Google Gemini API Key Swarm Manager - Automatically rotate multiple API keys to handle rate limits and maximize throughput.

⚠️ Important Warning

CRITICAL: Gemini free tier has EXTREMELY LOW rate limits:

- gemini-2.5-flash: Only 5 requests per minute per API key
- gemini-3-flash: Only 5 requests per minute per API key

To sustain 10 requests/second, you would need at least 120 API keys (10 × 60 / 5)!

This package is suitable for:

✅ Burst traffic (short periods of high load)
✅ Moderate usage (< 5 req/min with few keys)
✅ Development and testing
❌ NOT suitable for sustained high throughput with free tier

For production high-throughput applications, consider upgrading to a paid tier with higher limits.

🚀 Quick Start

Installation

npm install swarm-gem

Setup Environment Variables

# .env
GEMINI_SWARM_KEYS="key1,key2,key3,key4,key5"
GEMINI_MODEL="gemini-2.5-flash"

Get your API keys from: https://aistudio.google.com/app/apikey

Basic Usage

import SwarmGem from 'swarm-gem'

// Initialize from environment variables
const sg = SwarmGem.init()

// Generate content
const response = await sg.generate("Hello, how are you?")
console.log(response)

📖 Usage Examples

With Options

import SwarmGem from 'swarm-gem'

const sg = SwarmGem.init()

const response = await sg.generate("Write a poem about the ocean", {
  temperature: 0.9,   // Higher creativity
  maxTokens: 1000     // Limit response length
})

console.log(response)

Manual Configuration

import SwarmGem from 'swarm-gem'

const sg = SwarmGem.init({
  keys: ['key1', 'key2', 'key3'],
  model: 'gemini-3-flash',
  rateLimitsPath: './custom-rate-limits.json'  // Optional
})

const response = await sg.generate("Explain quantum computing")
console.log(response)

Error Handling

import SwarmGem from 'swarm-gem'

const sg = SwarmGem.init()

try {
  const response = await sg.generate("Your prompt here")
  console.log(response)
} catch (error) {
  if (error.message.includes('exhausted')) {
    console.error('All API keys hit rate limit. Please wait or add more keys.')
  } else if (error.message.includes('permanently failed')) {
    console.error('Some API keys are invalid. Check your keys.')
  } else {
    console.error('API error:', error.message)
  }
}
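
When every key is cooling down, one option is to wait and try again instead of failing straight away. The helper below is a sketch and not part of swarm-gem: the name `retryWhenExhausted`, its options, and the 60-second default delay are assumptions, chosen to match the cooldown described in this README. It relies only on the 'exhausted' substring checked in the example above.

```javascript
// Hypothetical helper (not part of swarm-gem): retry a call while the
// error message says all keys are exhausted, waiting between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function retryWhenExhausted(call, { retries = 3, delayMs = 60_000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (error) {
      const exhausted = String(error?.message).includes('exhausted')
      // Give up on other errors, or once the retry budget is spent.
      if (!exhausted || attempt >= retries) throw error
      await sleep(delayMs)
    }
  }
}

// Usage (assumes `sg` was created with SwarmGem.init()):
// const text = await retryWhenExhausted(() => sg.generate('Your prompt here'))
```

Keep `retries` small: with a daily quota per key, waiting a minute will not help once the day counter is used up.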

⚙️ Configuration

Environment Variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| GEMINI_SWARM_KEYS | Comma-separated API keys | Yes | - |
| GEMINI_MODEL | Model name to use | No | gemini-2.5-flash |

Supported Models

Check config/rate-limits.json for all supported models:

| Model | RPM | TPM | RPD | Use Case |
|-------|-----|-----|-----|----------|
| gemini-2.5-flash-lite | 10 | 250K | 20 | Fastest, lightweight tasks |
| gemini-2.5-flash | 5 | 250K | 20 | Fast, general purpose |
| gemini-3-flash | 5 | 250K | 20 | Latest fast model |
| gemini-2.5-flash-tts | 3 | 10K | 10 | Text-to-speech |
| gemini-2.5-flash-native-audio-dialog | Unlimited | 1M | Unlimited | Live audio API |

RPM = Requests Per Minute | TPM = Tokens Per Minute | RPD = Requests Per Day

🔧 Custom Rate Limits

You can customize rate limits by providing your own configuration file:

1. Create Custom Configuration

[
  {
    "model": "gemini-2.5-flash",
    "category": "Text-out models",
    "rpm": "0 / 5",
    "tpm": "0 / 250K",
    "rpd": "0 / 20"
  }
]

Format Notes:

- Each value is written as "current / limit" (e.g., rpm: "0 / 10")
- Supports suffixes: K (thousands), M (millions)
- Use Unlimited for no limit
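
To make the format concrete, here is a rough sketch of how a "current / limit" string could be turned into a number. The function name `parseLimitSketch` is made up for illustration, and the package's own parser may differ in details such as error handling.

```javascript
// Illustrative parser for "current / limit" strings; not the package's code.
function parseLimitSketch(value) {
  // Keep only the part after the slash, e.g. "0 / 250K" -> "250K".
  const limit = value.split('/').pop().trim()
  if (limit.toLowerCase() === 'unlimited') return Infinity

  const match = /^(\d+(?:\.\d+)?)([KM])?$/i.exec(limit)
  if (!match) throw new Error(`Unrecognized rate limit: "${value}"`)

  const multipliers = { K: 1_000, M: 1_000_000 }
  const suffix = match[2] ? match[2].toUpperCase() : null
  return Number(match[1]) * (suffix ? multipliers[suffix] : 1)
}

console.log(parseLimitSketch('0 / 5'))         // 5
console.log(parseLimitSketch('0 / 250K'))      // 250000
console.log(parseLimitSketch('0 / Unlimited')) // Infinity
```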

2. Use Custom Configuration

const sg = SwarmGem.init({
  keys: ['key1', 'key2'],
  model: 'gemini-2.5-flash',
  rateLimitsPath: './my-rate-limits.json'
})

📊 How It Works

Key Rotation Logic

┌─────────────────────────────────────┐
│  Request comes in                   │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  Pick available key                 │
│  - Not in cooldown                  │
│  - Within rate limits               │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│  Make API call                      │
└──────────────┬──────────────────────┘
               │
        ┌──────┴──────┐
        │             │
        ▼             ▼
    Success      Error (429/401/403)
        │             │
        ▼             ▼
   Mark used    Mark failed
   Return       Try next key

Rate Limit Tracking

- Per-Key Tracking: Each key tracks its own usage counters
- Time Windows:
  - Minute counter resets after 60 seconds
  - Day counter resets after 24 hours
- Cooldown: Failed keys go into cooldown:
  - 429 (Rate Limit): 60 seconds (or the value of the retry-after header)
  - 401/403 (Auth): Permanent (Infinity)
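
The tracking rules above can be sketched with a small per-key state object. This is an illustration under assumptions, not the package's internals: the field names, the fixed-window reset, and the function names are all invented here.

```javascript
// Sketch of per-key state; field and function names are illustrative.
function createKeyState(key, { rpm, rpd }) {
  return {
    key, rpm, rpd,
    minuteCount: 0, dayCount: 0,
    minuteStart: 0, dayStart: 0,
    cooldownUntil: 0,
  }
}

function isAvailable(state, now = Date.now()) {
  // Reset any counter whose window has elapsed.
  if (now - state.minuteStart >= 60_000) { state.minuteStart = now; state.minuteCount = 0 }
  if (now - state.dayStart >= 86_400_000) { state.dayStart = now; state.dayCount = 0 }
  return now >= state.cooldownUntil &&
    state.minuteCount < state.rpm &&
    state.dayCount < state.rpd
}

function markUsed(state) {
  state.minuteCount++
  state.dayCount++
}

function markFailed(state, status, retryAfterSeconds = 60, now = Date.now()) {
  if (status === 429) state.cooldownUntil = now + retryAfterSeconds * 1000 // temporary
  else if (status === 401 || status === 403) state.cooldownUntil = Infinity // permanent
}
```

Passing `now` explicitly keeps the functions easy to test without waiting for real time to pass.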

Automatic Retry

When a key hits its rate limit:

1. Mark the key as failed and start its cooldown
2. Automatically pick the next available key
3. Retry the request with the new key
4. Continue until success or until all keys are exhausted
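
The retry steps above can be sketched as a loop over a key pool. This is a simplified illustration, not the package's implementation: `generateWithRotation`, the `pool` map, and the `callApi(key)` callback (which is assumed to reject with an error carrying an HTTP `status`) are all hypothetical, and the cooldown is fixed at 60 seconds rather than read from a retry-after header.

```javascript
// Sketch only: `pool` maps each key to the timestamp (ms) until which it is
// unusable; `callApi(key)` is a hypothetical function that resolves with
// text or rejects with an error that has an HTTP `status` property.
async function generateWithRotation(pool, callApi) {
  for (const [key, cooldownUntil] of pool) {
    if (Date.now() < cooldownUntil) continue // still cooling down, skip

    try {
      return await callApi(key)
    } catch (error) {
      if (error.status === 429) {
        pool.set(key, Date.now() + 60_000) // temporary cooldown
      } else if (error.status === 401 || error.status === 403) {
        pool.set(key, Infinity) // permanently failed
      } else {
        throw error // unrelated to the key, do not rotate
      }
    }
  }
  throw new Error('All API keys exhausted')
}

// Usage: start every key with no cooldown (`callGemini` is a placeholder).
// const pool = new Map(['key1', 'key2'].map((key) => [key, 0]))
// const text = await generateWithRotation(pool, (key) => callGemini(key, prompt))
```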

🧮 Calculating Required Keys

To determine how many keys you need:

Required Keys = (Requests per second × 60) / Rate limit per minute

Examples

Scenario 1: 1 request/second

(1 req/s × 60) / 5 RPM = 12 keys minimum

Scenario 2: 10 requests/second

(10 req/s × 60) / 5 RPM = 120 keys minimum

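Scenario 3: Single burst of 50 requests over 10 seconds

Each key can serve 5 requests within one minute, so a one-off burst is bounded by request count rather than rate:
50 requests / 5 RPM = 10 keys minimum

Sustaining that rate (5 req/s) would instead need (5 req/s × 60) / 5 RPM = 60 keys.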

Add a 20-30% buffer for safety and to account for retry delays.
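
As a quick check of the formula, the calculation can be wrapped in a small function. `requiredKeys` and its `buffer` parameter are names invented for this sketch; the result is rounded up because keys come in whole numbers.

```javascript
// Required Keys = (requests per second × 60) / rate limit per minute,
// optionally scaled by a safety buffer (0.25 means 25% extra).
function requiredKeys(requestsPerSecond, rpmPerKey, buffer = 0) {
  const base = (requestsPerSecond * 60) / rpmPerKey
  return Math.ceil(base * (1 + buffer))
}

console.log(requiredKeys(1, 5))        // 12
console.log(requiredKeys(10, 5))       // 120
console.log(requiredKeys(10, 5, 0.25)) // 150
```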

🔍 Monitoring & Debugging

Check Available Keys

import { SwarmKeyManager } from 'swarm-gem'

// After initialization, logs show:
// SwarmGem initialized: 5 keys, model: gemini-2.5-flash, available: 5/5

Console Output

The package automatically logs warnings for:

Rate limits hit
Authentication failures
Key cooldowns

Example output:

Rate limit hit on key AIzaSyAbc1... (retry after 60s). Attempting next key...
API key AIzaSyDef2... marked as permanently failed. Key is invalid or lacks permissions.
SwarmGem initialized: 5 keys, model: gemini-2.5-flash, available: 5/5

🛡️ Best Practices

1. Key Management

- Rotate Keys: Regularly rotate your API keys for security
- Separate Projects: Use different key pools for different projects
- Monitor Usage: Track which keys are hitting limits frequently
- Invalid Keys: The package automatically detects and disables invalid keys

2. Error Handling

Always wrap API calls in try-catch:

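async function safeGenerate(sg, prompt) {
  try {
    return await sg.generate(prompt)
  } catch (error) {
    // Log the error and alert your monitoring system here, then either
    // return a fallback value or rethrow so the caller can decide
    console.error('Generation failed:', error.message)
    throw error
  }
}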

3. Rate Limit Strategy

- Burst Traffic: Keep extra keys as a buffer (20-30% more than calculated)
- Sustained Load: Consider upgrading to a paid tier
- Off-Peak: Batch requests during off-peak hours when possible

4. Request Optimization

- Batch Prompts: Combine multiple questions into one request when possible
- Cache Responses: Cache common queries to reduce API calls
- Limit Tokens: Set reasonable maxTokens to avoid hitting TPM limits
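
Since the package itself does not cache, response caching has to live in your application. The wrapper below is one minimal way to do it, offered as a sketch: `withPromptCache` and `maxEntries` are hypothetical names, the cache is in-memory only, and it keys on the prompt text alone, so include the options in the key if you vary temperature or maxTokens.

```javascript
// Hypothetical wrapper: memoize results by prompt so identical prompts
// reuse the first response instead of spending another request.
function withPromptCache(generate, maxEntries = 100) {
  const cache = new Map()
  return async function cachedGenerate(prompt) {
    if (cache.has(prompt)) return cache.get(prompt)

    const pending = generate(prompt)
    cache.set(prompt, pending) // store the promise so concurrent calls share it
    if (cache.size > maxEntries) {
      cache.delete(cache.keys().next().value) // evict the oldest entry
    }
    try {
      return await pending
    } catch (error) {
      cache.delete(prompt) // do not cache failures
      throw error
    }
  }
}

// Usage (assumes `sg` was created with SwarmGem.init()):
// const generate = withPromptCache((prompt) => sg.generate(prompt))
// const text = await generate('Explain quantum computing')
```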

🧪 Testing

Run Tests

npm test

Example Test

import { describe, it, expect } from 'vitest'
import { parseRateLimit } from './src/utils.js'

describe('parseRateLimit', () => {
  it('should parse basic numbers', () => {
    expect(parseRateLimit('0 / 10')).toBe(10)
  })

  it('should parse K suffix', () => {
    expect(parseRateLimit('0 / 250K')).toBe(250000)
  })

  it('should parse M suffix', () => {
    expect(parseRateLimit('0 / 1M')).toBe(1000000)
  })

  it('should parse Unlimited', () => {
    expect(parseRateLimit('0 / Unlimited')).toBe(Infinity)
  })
})

📝 API Reference

`SwarmGem.init(config?)`

Initialize a new SwarmGem instance.

Parameters:

- config (optional): Configuration object
  - keys?: string[] - Array of API keys (overrides env)
  - model?: string - Model name (overrides env, defaults to 'gemini-2.5-flash')
  - rateLimitsPath?: string - Path to custom rate limits file

Returns: SwarmGem instance

Throws: Error if no keys provided or invalid configuration

`sg.generate(prompt, options?)`

Generate content using Gemini API with automatic key rotation.

Parameters:

- prompt: string - Text prompt to send to the model
- options?: GenerateOptions (optional)
  - temperature?: number - 0.0 to 1.0 (default: 0.7)
  - maxTokens?: number - Maximum tokens (default: 8192)

Returns: Promise<string> - Generated text response

Throws: Error if all keys exhausted or non-recoverable error

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

# Clone the repository
git clone https://github.com/yourusername/swarm-gem.git
cd swarm-gem

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run tests
npm test

📄 License

MIT License - see LICENSE file for details.

💡 Tips & Tricks

Reducing API Calls

- Cache Results: Store responses for identical prompts
- Debounce Requests: Delay rapid requests from user input
- Batch Processing: Process multiple items in fewer requests

Monitoring Costs

Free tier limits (per key per day):

- gemini-2.5-flash: 20 requests/day per key = 100 requests/day with 5 keys
- gemini-3-flash: 20 requests/day per key = 100 requests/day with 5 keys

Track daily usage to avoid hitting daily limits.
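
The daily limit matters as much as the per-minute one. The sketch below estimates the total daily budget of a key pool and how long it lasts at a given request rate; the function names are invented for illustration, and the figures assume the 20 RPD free-tier limit listed in this README.

```javascript
// Total requests a pool can serve per day.
function dailyCapacity(keyCount, rpdPerKey) {
  return keyCount * rpdPerKey
}

// Seconds a pool lasts at a steady request rate before the daily quota runs out.
function secondsUntilExhausted(keyCount, rpdPerKey, requestsPerSecond) {
  return dailyCapacity(keyCount, rpdPerKey) / requestsPerSecond
}

console.log(dailyCapacity(5, 20))               // 100
console.log(secondsUntilExhausted(120, 20, 10)) // 240
```

In other words, 120 free-tier keys can hold 10 req/s for about four minutes per day, which is why the free tier suits bursts rather than sustained load.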

Scaling Strategy

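| Stage | Keys Needed | Cost | Peak Throughput |
|-------|-------------|------|-----------------|
| Development | 2-3 | Free | < 0.3 req/s |
| Testing | 5-10 | Free | 0.4-0.8 req/s |
| Small Production | 20-50 | Free | 1.7-4.2 req/s |
| Medium Production | 100-200 | Free | 8.3-16.7 req/s |
| Large Production | Consider Paid Tier | $$$ | Depends on paid-tier limits |

Throughput figures assume 5 RPM per key (keys × 5 / 60). With 20 RPD per key, a free-tier pool can hold these rates only briefly before its daily quota runs out.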

❓ FAQ

Q: Why do I get "All API keys exhausted" error?

A: This means all your keys have hit their rate limits. Solutions:

- Wait for the minute/day window to reset
- Add more API keys
- Reduce request frequency
- Upgrade to a paid tier

Q: Can I mix free and paid tier keys?

A: Yes, but you'll need to update the rate limits config for paid keys to reflect their higher limits.

Q: What happens if one key is invalid?

A: The package automatically detects invalid keys (401/403 errors), marks them as permanently failed, and continues with remaining valid keys.

Q: Does this package cache responses?

A: No, SwarmGem focuses solely on key rotation and rate limiting. Implement caching in your application layer if needed.

Q: Can I use this with streaming responses?

A: Current version (v1.0.0) does not support streaming. This may be added in future versions.

Q: How accurate is the rate limit tracking?

A: Very accurate for request count limits (RPM, RPD). Token count limits (TPM) are not tracked, so you may still hit TPM limits if generating very long responses.

🐛 Troubleshooting

Issue: Module not found errors

# Make sure to build TypeScript first
npm run build

Issue: Environment variables not loading

# Install dotenv if needed
npm install dotenv

// Load dotenv before SwarmGem so the variables exist when init() runs
import 'dotenv/config'
import SwarmGem from 'swarm-gem'

Issue: Rate limits still being hit

- Check if you're hitting token limits (TPM) instead of request limits
- Reduce maxTokens in generation options
- Add more API keys
- Verify your keys are valid

Made with ❤️ for the Gemini API community