gemback

v0.7.1

Published

13 days ago

Smart Gemini API Fallback Library for Node.js & TypeScript

Vulnerabilities: 0 High, 0 Medium, 0 Low

laeyoung

gemini google-ai ai fallback retry api rate-limit typescript generative-ai

💎 Gem Back

Smart Gemini API Fallback Library with Multi-Key Rotation & Monitoring

Gem Back is an NPM library that provides an intelligent fallback system and production-grade monitoring for Google Gemini API, automatically handling RPM (Requests Per Minute) rate limits.

Korean documentation | Examples | Changelog

🎯 Why Gem Back?

The Gemini API has RPM (Requests Per Minute) limits on the free tier, causing 429 Too Many Requests errors in high-traffic scenarios. Gem Back solves this problem with:

Key Features ✨

✅ Automatic Fallback: Seamlessly switches to alternate models when one fails
✅ Smart Retry: Handles transient errors with Exponential Backoff
✅ Multi-Key Rotation: Rotates through multiple API keys to bypass RPM limits
✅ Streaming Support: Real-time response streaming (generateStream())
✅ Conversational Interface: Multi-turn chat support (chat())
✅ Statistics Tracking: Monitor usage and success rates per model/key
✅ Zero Configuration: Works out of the box with sensible defaults
✅ Full TypeScript Support: Complete type definitions and autocomplete
✅ Dual Module Format: CommonJS + ESM support
✅ Extensively Tested: 248 tests verify reliability
✅ Monitoring & Tracking: Rate limit prediction and model health monitoring

🚀 Supported Models

Gem Back supports automatic fallback across Gemini models:

Default Fallback Chain (Optimized for Free Tier — v0.7.0, RPD-first):

gemini-3.1-flash-lite — stable, 500 RPD (dominant daily quota)
gemini-3.5-flash — newest, highest quality (20 RPD)
gemini-3-flash-preview — backup (20 RPD) ⚠️

If output quality matters more than daily throughput, pass an explicit fallbackOrder putting gemini-3.5-flash first.

Free-Tier Quota Snapshot (2026-05-28):

| Model | RPM | TPM | RPD | Notes |
|---|---|---|---|---|
| gemini-3.1-flash-lite | 15 | 250K | 500 | |
| gemini-2.5-flash-lite | 10 | 250K | 20 | ⚠️ deprecated (shutdown 2026-07-22 → gemini-3.1-flash-lite) |
| gemini-3.5-flash | 5 | 250K | 20 | |
| gemini-3-flash-preview | 5 | 250K | 20 | ⚠️ preview |
| gemini-2.5-flash | 5 | 250K | 20 | ⚠️ deprecated (shutdown 2026-06-17 → gemini-3.1-flash-lite) |
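To make the "RPD-first" idea concrete, here is a minimal sketch that derives an order from the quota snapshot above. The `ModelQuota` shape and the `rpdFirstOrder` helper are hypothetical names used only for illustration; they are not part of the gemback API, and the library may compute its default chain differently.

```typescript
// Quota figures copied from the free-tier snapshot table above.
interface ModelQuota {
  model: string;
  rpm: number;
  rpd: number;
  deprecated: boolean;
  preview: boolean;
}

const FREE_TIER_SNAPSHOT: ModelQuota[] = [
  { model: 'gemini-3.1-flash-lite', rpm: 15, rpd: 500, deprecated: false, preview: false },
  { model: 'gemini-2.5-flash-lite', rpm: 10, rpd: 20, deprecated: true, preview: false },
  { model: 'gemini-3.5-flash', rpm: 5, rpd: 20, deprecated: false, preview: false },
  { model: 'gemini-3-flash-preview', rpm: 5, rpd: 20, deprecated: false, preview: true },
  { model: 'gemini-2.5-flash', rpm: 5, rpd: 20, deprecated: true, preview: false },
];

// Hypothetical helper: drop deprecated models, then order by daily quota
// (highest RPD first), preferring stable models over previews on a tie.
function rpdFirstOrder(quotas: ModelQuota[]): string[] {
  return quotas
    .filter((q) => !q.deprecated)
    .sort((a, b) => b.rpd - a.rpd || Number(a.preview) - Number(b.preview))
    .map((q) => q.model);
}

console.log(rpdFirstOrder(FREE_TIER_SNAPSHOT));
```

Under these assumptions the result matches the default chain listed earlier: gemini-3.1-flash-lite, then gemini-3.5-flash, then gemini-3-flash-preview.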

Paid-Only Models (still in ALL_MODELS; runtime warning when used on free-tier keys):

gemini-3.1-pro-preview
gemini-2.5-pro
gemini-2.0-flash
gemini-2.0-flash-lite

Deprecation Warnings (v0.6.0+): Models scheduled for shutdown are automatically tracked. Enable logLevel: 'warn' to see deprecation warnings, or use the DEPRECATED_MODELS export for programmatic access.

import { DEPRECATED_MODELS } from 'gemback';

// Check which models are deprecated
DEPRECATED_MODELS.forEach(({ model, shutdownDate, replacement }) => {
  console.log(`${model} → ${replacement} (by ${shutdownDate})`);
});
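One possible use of this data is to rewrite an existing fallbackOrder so that it no longer references models that are about to shut down. The sketch below works on a local array shaped like the fields destructured above; `DeprecatedEntry` and `migrateFallbackOrder` are hypothetical names, and treating `shutdownDate` as an ISO date string is an assumption.

```typescript
// Local stand-in shaped like the fields destructured in the snippet above.
interface DeprecatedEntry {
  model: string;
  shutdownDate: string; // assumption: an ISO date string
  replacement: string;
}

// Values taken from the free-tier quota table in this README.
const deprecatedEntries: DeprecatedEntry[] = [
  { model: 'gemini-2.5-flash-lite', shutdownDate: '2026-07-22', replacement: 'gemini-3.1-flash-lite' },
  { model: 'gemini-2.5-flash', shutdownDate: '2026-06-17', replacement: 'gemini-3.1-flash-lite' },
];

// Hypothetical helper: swap each deprecated model for its replacement and
// drop duplicates while keeping the first occurrence of every model.
function migrateFallbackOrder(order: string[], entries: DeprecatedEntry[]): string[] {
  const replacements: Record<string, string> = {};
  for (const entry of entries) {
    replacements[entry.model] = entry.replacement;
  }
  const migrated = order.map((model) => replacements[model] ?? model);
  return migrated.filter((model, index) => migrated.indexOf(model) === index);
}

console.log(
  migrateFallbackOrder(['gemini-2.5-flash', 'gemini-3.5-flash', 'gemini-2.5-flash-lite'], deprecatedEntries),
);
```

Both deprecated entries map to the same replacement here, so the migrated order ends up shorter than the input.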

Model Auto-Update System: The library includes automation scripts to keep the model list current with Google's API updates. See Contributing Guide for details on updating models.

📦 Installation

npm install gemback
# or
yarn add gemback
# or
pnpm add gemback

🔄 Migrating from v0.6 to v0.7

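v0.7.0 includes one unconditional breaking change (the type of DeprecatedModelInfo.reason) and one conditional change (the default fallback order, which matters only if you do not pass fallbackOrder). Most call sites need no update.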

Confirmed breaking change

DeprecatedModelInfo.reason is now a string-literal union ('replaced_by_newer' | 'removed_from_api' | 'tier_change') instead of free-form string. If you read reason, switch to the union. The old prose strings on existing entries were moved to a new optional notes: string field.
```
// Before (v0.6)
const reason: string = info.reason; // e.g. "Gemini 2.0 series end of life"

// After (v0.7)
const reason: DeprecationReason = info.reason; // 'replaced_by_newer' | ...
const detail: string | undefined = info.notes; // original prose, if any
```
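A practical benefit of the union is that the compiler can check that every reason is handled. The sketch below declares a local copy of the union described above so that it stands alone; in real code the type would presumably be imported from gemback. `describeReason` is a hypothetical helper and the message texts are illustrative wording, not library output.

```typescript
// Local copy of the union described above.
type DeprecationReason = 'replaced_by_newer' | 'removed_from_api' | 'tier_change';

// Hypothetical helper: map each reason to a human-readable message.
function describeReason(reason: DeprecationReason): string {
  switch (reason) {
    case 'replaced_by_newer':
      return 'A newer model replaces this one.';
    case 'removed_from_api':
      return 'The model is no longer served by the API.';
    case 'tier_change':
      return 'The model moved to a different tier.';
    default: {
      // If a new member is added to the union, this assignment stops
      // compiling, which flags the missing case at build time.
      const unhandled: never = reason;
      throw new Error(`Unhandled deprecation reason: ${String(unhandled)}`);
    }
  }
}

console.log(describeReason('tier_change'));
```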

Default fallback order changed

If you didn't pass fallbackOrder to GemBack, the default sequence now optimizes for daily RPD instead of model quality:

gemini-3.1-flash-lite → gemini-3.5-flash → gemini-3-flash-preview

To keep the v0.6 quality-first behavior, pass it explicitly:

new GemBack({
  apiKey: process.env.GEMINI_API_KEY,
  fallbackOrder: ['gemini-3.5-flash', 'gemini-3-flash-preview', 'gemini-3.1-flash-lite'],
});

New utilities you can adopt

REMOVED_MODELS and DEPRECATED_MODELS exports are the single source of truth for what was removed/replaced.
RateLimitStatus.currentTPM / maxTPM / tpmUtilizationPercent are now populated for free-tier models.
gemini-3.5-flash and stable gemini-3.1-flash-lite are available.

Paid-only models on free-tier keys

If you invoke gemini-2.5-pro, gemini-2.0-flash, gemini-2.0-flash-lite, or gemini-3.1-pro-preview on a free-tier API key, you'll see a one-time logger.warn explaining the 4xx you can expect. Either upgrade the key or pin fallbackOrder to free-tier models.

⚡ Quick Start

Basic Usage

import { GemBack } from 'gemback';

// Create client
const client = new GemBack({
  apiKey: process.env.GEMINI_API_KEY
});

// Generate text
const response = await client.generate('Hello, Gemini!');
console.log(response.text);
// Automatically selects the best model and handles fallback

Custom Fallback Order

const client = new GemBack({
  apiKey: process.env.GEMINI_API_KEY,
  fallbackOrder: [
    'gemini-3.5-flash',       // Optional: top-quality first if quality > daily throughput
    'gemini-3.1-flash-lite',  // Stable, highest free-tier RPD (500/day)
    'gemini-3-flash-preview', // Last-resort backup
  ],
  maxRetries: 3,
  timeout: 30000,
  debug: true, // Enable detailed logging
});

Streaming Response

const stream = client.generateStream('Tell me a long story');

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Multi-Key Rotation (New!)

Effectively bypass RPM limits by using multiple API keys:

const client = new GemBack({
  apiKeys: [
    process.env.GEMINI_API_KEY_1,
    process.env.GEMINI_API_KEY_2,
    process.env.GEMINI_API_KEY_3
  ],
  apiKeyRotationStrategy: 'round-robin' // or 'least-used'
});

// Automatically rotates through keys for each request
const response1 = await client.generate('First question'); // Uses key_1
const response2 = await client.generate('Second question'); // Uses key_2
const response3 = await client.generate('Third question'); // Uses key_3

// Check per-key statistics
const stats = client.getFallbackStats();
console.log(stats.apiKeyStats); // Usage and success rate per key

Rotation Strategies:

round-robin (default): Rotate through keys sequentially
least-used: Prioritize the least-used key
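The two strategies can be sketched as plain index-selection functions. This is an illustration of the general techniques, not gemback's internal code; `pickRoundRobin` and `pickLeastUsed` are hypothetical names.

```typescript
// Round-robin: the n-th request (0-based) uses key n modulo the key count.
function pickRoundRobin(keyCount: number, requestNumber: number): number {
  if (keyCount <= 0) throw new Error('At least one API key is required');
  return requestNumber % keyCount;
}

// Least-used: pick the key with the fewest recorded uses.
// The first key wins when several keys are tied.
function pickLeastUsed(usageCounts: number[]): number {
  if (usageCounts.length === 0) throw new Error('At least one API key is required');
  let best = 0;
  for (let i = 1; i < usageCounts.length; i++) {
    if (usageCounts[i] < usageCounts[best]) best = i;
  }
  return best;
}

// Three keys, four requests: the key index cycles back to the first key.
console.log([0, 1, 2, 3].map((n) => pickRoundRobin(3, n)));
// The second key (index 1) has served the fewest requests, so it is next.
console.log(pickLeastUsed([5, 2, 7]));
```

Round-robin needs no bookkeeping beyond a request counter, while least-used evens out load when some keys have absorbed more retries or failures than others.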

Monitoring & Tracking (New!)

Improve stability with real-time rate limit tracking and model health monitoring:

const client = new GemBack({
  apiKey: process.env.GEMINI_API_KEY,
  enableMonitoring: true  // Enable monitoring
});

// Use the API
await client.generate('Question 1');
await client.generate('Question 2');
// ...

// Get detailed monitoring statistics
const stats = client.getFallbackStats();

// Check rate limit status
console.log(stats.monitoring?.rateLimitStatus);
// [
//   {
//     model: 'gemini-3.1-flash-lite',
//     currentRPM: 5,          // Current requests per minute
//     maxRPM: 15,             // Maximum RPM
//     utilizationPercent: 33, // Utilization percentage
//     isNearLimit: false,     // Near limit warning
//     willExceedSoon: false,  // Will exceed soon warning
//     windowStats: {
//       requestsInLastMinute: 5,
//       requestsInLast5Minutes: 12,
//       averageRPM: 2.4
//     }
//   }
// ]

// Check model health status
console.log(stats.monitoring?.modelHealth);
// [
//   {
//     model: 'gemini-2.5-flash',
//     status: 'healthy',           // healthy | degraded | unhealthy
//     successRate: 0.98,           // Success rate
//     averageResponseTime: 1234,   // Average response time (ms)
//     availability: 0.99,          // Availability
//     consecutiveFailures: 0,      // Consecutive failures
//     metrics: {
//       totalRequests: 100,
//       successfulRequests: 98,
//       failedRequests: 2,
//       p50ResponseTime: 1100,     // 50th percentile
//       p95ResponseTime: 1800,     // 95th percentile
//       p99ResponseTime: 2100      // 99th percentile
//     }
//   }
// ]

// Overall summary
console.log(stats.monitoring?.summary);
// {
//   healthyModels: 3,
//   degradedModels: 1,
//   unhealthyModels: 0,
//   overallSuccessRate: 0.96,
//   averageResponseTime: 1500
// }
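For readers unfamiliar with the percentile fields in the sample output above, the sketch below shows one common way to compute them, the nearest-rank method. Whether gemback uses this exact method is not stated here, so treat it as an assumption; `percentile` and the sample data are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile() needs at least one sample');
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]');
  const sorted = samples.slice().sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds.
const responseTimesMs = [1200, 900, 1100, 2100, 1000, 1800, 950, 1050, 1300, 1150];

console.log({
  p50: percentile(responseTimesMs, 50),
  p95: percentile(responseTimesMs, 95),
  p99: percentile(responseTimesMs, 99),
});
```

With only ten samples, p95 and p99 both resolve to the slowest request, which is why tail percentiles become informative only after a larger number of requests.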

Monitoring Features:

✅ Rate Limit Tracking: Real-time RPM usage tracking per model
✅ Predictive Warnings: Automatic warnings before hitting limits (80%, 90% thresholds)
✅ Health Monitoring: Track success rate, response time, and availability per model
✅ Percentile Metrics: Analyze p50, p95, p99 response times
✅ Failure Detection: Automatic status detection (healthy/degraded/unhealthy)

📖 Core Features

1. Automatic Fallback

// Automatically falls back through the fallback chain
// when a model hits its rate limit (default v0.7.0: gemini-3.1-flash-lite → gemini-3.5-flash → gemini-3-flash-preview)
const response = await client.generate('Complex question');
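Conceptually, a fallback chain is a loop over models that stops at the first success and records every failure along the way. The sketch below illustrates that idea under simplifying assumptions: it treats every failure as a reason to move on, and all names (`generateWithFallback`, `ModelCall`, `AllModelsFailedError`) are hypothetical rather than gemback exports.

```typescript
interface Attempt {
  model: string;
  error: string;
}

// Raised when every model in the chain has failed.
class AllModelsFailedError extends Error {
  readonly attempts: Attempt[];

  constructor(attempts: Attempt[]) {
    super(`All ${attempts.length} models failed`);
    this.name = 'AllModelsFailedError';
    this.attempts = attempts;
  }
}

// Stand-in for a real API call: resolves with text or rejects with an error.
type ModelCall = (model: string) => Promise<string>;

async function generateWithFallback(
  models: string[],
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      const text = await call(model);
      return { model, text }; // first success ends the chain
    } catch (err) {
      // Remember the failure and move on to the next model.
      attempts.push({ model, error: err instanceof Error ? err.message : String(err) });
    }
  }
  throw new AllModelsFailedError(attempts);
}
```

Because the model call is passed in as a function, the loop can be exercised with a stub that rejects for the first model and resolves for the second, without any network access.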

2. Retry Logic

const client = new GemBack({
  apiKey: 'YOUR_KEY',
  maxRetries: 3, // Max retries per model
  retryDelay: 1000 // Initial retry delay (ms)
});

3. Error Handling

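import { GeminiBackError } from 'gemback'; // assumption: the error class is exported from the package root

try {
  const response = await client.generate('Hello');
  console.log(response.text);
} catch (error) {
  if (error instanceof GeminiBackError) {
    // Every model in the fallback chain failed
    console.log('Models attempted:', error.allAttempts);
    console.log('Last error:', error.message);
  } else {
    throw error; // Not a Gem Back failure; let it propagate
  }
}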

4. Statistics

const stats = client.getFallbackStats();
console.log(stats);
// {
//   totalRequests: 100,
//   successRate: 0.95,
//   failureCount: 5,
//   modelUsage: {
//     'gemini-3-flash-preview': 70,
//     'gemini-2.5-flash': 30
//   },
//   apiKeyStats: [  // Only in multi-key mode
//     {
//       keyIndex: 0,
//       totalRequests: 35,
//       successCount: 33,
//       failureCount: 2,
//       successRate: 0.94,
//       lastUsed: Date
//     },
//     // ... other keys
//   ],
//   monitoring: {  // Only when enableMonitoring: true
//     rateLimitStatus: [...],  // Rate limit status per model
//     modelHealth: [...],      // Health status per model
//     summary: {
//       healthyModels: 3,
//       degradedModels: 1,
//       unhealthyModels: 0,
//       overallSuccessRate: 0.96,
//       averageResponseTime: 1500
//     }
//   }
// }

5. System Instructions (v0.5.0+)

Control the model's behavior, personality, and response style:

// String format
const response = await client.generate('Explain TypeScript', {
  systemInstruction: 'You are a helpful programming tutor. Explain concepts clearly for beginners.',
});

// Structured Content format
const response2 = await client.generate('What is async/await?', {
  systemInstruction: {
    role: 'user',
    parts: [{ text: 'You are a senior engineer. Provide technical, detailed explanations.' }],
  },
});

// Works with all generation methods
const stream = client.generateStream('Explain promises', {
  systemInstruction: 'Keep explanations under 100 words. Use bullet points.',
});

const chatResponse = await client.chat(messages, {
  systemInstruction: 'You are a friendly coding mentor. Use analogies to explain.',
});

Use Cases:

Guide model personality and tone
Enforce output formatting requirements
Create role-based assistants (tutor, technical writer, etc.)
Maintain consistent behavior across conversations

6. Function Calling / Tool Use (v0.5.0+)

Enable the model to call external functions with structured parameters:

import type { FunctionDeclaration } from 'gemback';

// Define a function
const weatherFunction: FunctionDeclaration = {
  name: 'get_current_weather',
  description: 'Get the current weather in a given location',
  parameters: {
    type: 'object',
    properties: {
      location: {
        type: 'string',
        description: 'The city name, e.g. Tokyo, London',
      },
      unit: {
        type: 'string',
        enum: ['celsius', 'fahrenheit'],
      },
    },
    required: ['location'],
  },
};

// Use the function
const response = await client.generate("What's the weather in Tokyo?", {
  tools: [weatherFunction],
  toolConfig: {
    functionCallingMode: 'auto', // 'auto' | 'any' | 'none'
  },
});

// Check if model called the function
if (response.functionCalls && response.functionCalls.length > 0) {
  response.functionCalls.forEach((call) => {
    console.log('Function:', call.name);
    console.log('Arguments:', call.args);

    // Execute your actual function here
    const result = getCurrentWeather(call.args.location, call.args.unit);
    console.log('Result:', result);
  });
}
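When several tools are registered, looking up a handler by name keeps the dispatch code in one place. The sketch below assumes each function call carries a `name` and an `args` object, as in the example above; the `FunctionCall` shape, `dispatchFunctionCalls`, and the stub weather handler are hypothetical and defined locally.

```typescript
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

interface DispatchResult {
  name: string;
  response?: unknown;
  error?: string;
}

// Run each requested call through its registered handler.
// Unknown names produce an error entry instead of throwing.
function dispatchFunctionCalls(
  calls: FunctionCall[],
  handlers: Record<string, ToolHandler>,
): DispatchResult[] {
  return calls.map((call) => {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(handlers, call.name)) {
      return { name: call.name, error: `No handler registered for ${call.name}` };
    }
    return { name: call.name, response: handlers[call.name](call.args) };
  });
}

const toolHandlers: Record<string, ToolHandler> = {
  // Stub standing in for a real weather lookup.
  get_current_weather: (args) => ({
    location: String(args.location),
    temperature: 21,
    unit: args.unit ?? 'celsius',
  }),
};

console.log(
  dispatchFunctionCalls([{ name: 'get_current_weather', args: { location: 'Tokyo' } }], toolHandlers),
);
```

Returning an error entry for unknown names lets the caller report the problem back to the model rather than aborting the whole turn.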

Function Calling Modes:

auto: Model decides when to call functions (default)
any: Force model to call at least one function
none: Disable function calling

Advanced Features:

// Restrict to specific functions
const response = await client.generate(prompt, {
  tools: [weatherFunction, calculatorFunction, databaseFunction],
  toolConfig: {
    functionCallingMode: 'any',
    allowedFunctionNames: ['get_current_weather'], // Only allow weather
  },
});

// Multi-turn conversation with function results
const followUpResponse = await client.generateContent([
  { role: 'user', parts: [{ text: "What's the weather?" }] },
  { role: 'model', parts: [{ functionCall: { name: 'get_current_weather', args: {...} } }] },
  { role: 'user', parts: [{ functionResponse: { name: 'get_current_weather', response: {...} } }] },
  { role: 'user', parts: [{ text: 'Should I bring an umbrella?' }] },
]);

Use Cases:

Integrate with external APIs and databases
Perform calculations and data processing
Access real-time information
Create structured workflows and automation
Build AI agents with tool access

7. Safety Settings (v0.5.0+)

Configure content filtering and safety thresholds for different harm categories:

import { HarmCategory, HarmBlockThreshold } from '@google/genai';

// Basic safety settings
const response = await client.generate('Tell me about content moderation', {
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ],
});

// Strict filtering for children's content
const childContent = await client.generate('Tell a story for kids', {
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
      threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
      threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    },
  ],
});

// Combine with other options
const response3 = await client.generate('Write an educational article', {
  systemInstruction: 'You are an educational content writer.',
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ],
  temperature: 0.7,
});

Available Harm Categories:

HARM_CATEGORY_HARASSMENT
HARM_CATEGORY_HATE_SPEECH
HARM_CATEGORY_SEXUALLY_EXPLICIT
HARM_CATEGORY_DANGEROUS_CONTENT

Blocking Thresholds:

BLOCK_NONE: No blocking
BLOCK_ONLY_HIGH: Block only high severity content
BLOCK_MEDIUM_AND_ABOVE: Block medium and high severity (recommended)
BLOCK_LOW_AND_ABOVE: Block low, medium, and high severity (strictest)

Use Cases:

Child-safe content generation
Compliance with content policies
Brand-appropriate responses
Educational content filtering

8. JSON Mode (v0.5.0+)

Get structured JSON responses with schema validation:

import type { ResponseSchema } from 'gemback';

// Basic JSON mode
const response = await client.generate('Generate a user profile with name, age, and email', {
  responseMimeType: 'application/json',
});

console.log(response.json);  // Parsed JSON object
console.log(response.text);  // Raw JSON string

// JSON mode with schema validation
const userSchema: ResponseSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    age: { type: 'number' },
    email: { type: 'string' },
  },
  required: ['name', 'age', 'email'],
};

const response2 = await client.generate('Generate a user profile', {
  responseMimeType: 'application/json',
  responseSchema: userSchema,
});

// Type-safe usage
interface User {
  name: string;
  age: number;
  email: string;
}

const user = response2.json as User;
console.log(user.name, user.age, user.email);

// Array of objects
const productsSchema: ResponseSchema = {
  type: 'array',
  items: {
    type: 'object',
    properties: {
      id: { type: 'number' },
      name: { type: 'string' },
      price: { type: 'number' },
    },
    required: ['id', 'name', 'price'],
  },
};

const products = await client.generate('Generate 3 products', {
  responseMimeType: 'application/json',
  responseSchema: productsSchema,
});

// Complex nested structures
const blogPostSchema: ResponseSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    author: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        email: { type: 'string' },
      },
    },
    tags: {
      type: 'array',
      items: { type: 'string' },
    },
  },
  required: ['title', 'author'],
};
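The `as User` cast in the example above is checked only at compile time. Where stronger guarantees are needed, a runtime type guard can verify the parsed value before it is used. This sketch is independent of gemback; `isUser` and `parseUser` are hypothetical helpers.

```typescript
interface User {
  name: string;
  age: number;
  email: string;
}

// Runtime check that an unknown value has the User shape.
function isUser(value: unknown): value is User {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.name === 'string' &&
    typeof candidate.age === 'number' &&
    typeof candidate.email === 'string'
  );
}

// Parse a raw JSON string and fail loudly if the shape is wrong.
function parseUser(rawJson: string): User {
  const parsed: unknown = JSON.parse(rawJson);
  if (!isUser(parsed)) {
    throw new Error('Response does not match the User shape');
  }
  return parsed;
}

const sampleUser = parseUser('{"name":"Ada","age":36,"email":"ada@example.com"}');
console.log(sampleUser.name, sampleUser.age, sampleUser.email);
```

The same guard could be applied to the raw JSON string in `response.text`, since the guard takes a string and does not depend on how it was produced.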

Schema Types Supported:

object: Object with defined properties
array: Array of items
string, number, boolean, null: Primitive types

Use Cases:

API response formatting
Data extraction and structuring
Type-safe API integration
Structured content generation
Database-ready outputs

🔧 API Reference

`GemBack`

Constructor Options

import type { GeminiModel, RateLimitConfig } from 'gemback';

interface GemBackOptions {
  apiKey?: string;                   // Gemini API key (single key)
  apiKeys?: string[];                // Multiple API keys (multi-key mode)
  fallbackOrder?: GeminiModel[];     // Optional: Fallback order
  maxRetries?: number;               // Optional: Max retries (default: 2)
  timeout?: number;                  // Optional: Request timeout (default: 30000ms)
  retryDelay?: number;               // Optional: Initial retry delay (default: 1000ms)
  debug?: boolean;                   // Optional: Debug logging (default: false)
  logLevel?: 'debug' | 'info' | 'warn' | 'error' | 'silent';
  apiKeyRotationStrategy?: 'round-robin' | 'least-used'; // Key rotation strategy (default: round-robin)
  enableMonitoring?: boolean;        // Optional: Enable monitoring (default: false)
  enableRateLimitPrediction?: boolean; // Optional: Rate limit prediction warnings (default: false)
  customRateLimits?: Partial<Record<GeminiModel, Partial<RateLimitConfig>>>; // Optional: per-model
                                     // RPM/TPM/RPD overrides applied on top of
                                     // FREE_TIER_LIMITS defaults. Per-entry field merge,
                                     // so { 'gemini-2.5-flash': { rpm: 10 } } preserves
                                     // the existing tpm/rpd. Only consulted when
                                     // `enableMonitoring: true`.
}

Note: Either apiKey or apiKeys must be provided.
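The per-entry field merge described for customRateLimits can be illustrated with a small standalone sketch. The `RateLimitConfig` shape below uses the `rpm`, `tpm`, and `rpd` field names mentioned in the comment above, with values from the quota table; `mergeRateLimits` and `defaultLimits` are hypothetical names, and the real merge logic inside gemback may differ.

```typescript
interface RateLimitConfig {
  rpm: number;
  tpm: number;
  rpd: number;
}

type LimitTable = Record<string, RateLimitConfig>;
type LimitOverrides = Record<string, Partial<RateLimitConfig>>;

// Local stand-in for the library defaults, using values from the quota table.
const defaultLimits: LimitTable = {
  'gemini-3.1-flash-lite': { rpm: 15, tpm: 250000, rpd: 500 },
  'gemini-2.5-flash': { rpm: 5, tpm: 250000, rpd: 20 },
};

// Merge field by field: an override replaces only the fields it names.
// Overrides for models missing from the base table are ignored in this sketch.
function mergeRateLimits(base: LimitTable, overrides: LimitOverrides): LimitTable {
  const merged: LimitTable = {};
  for (const model of Object.keys(base)) {
    merged[model] = { ...base[model], ...(overrides[model] ?? {}) };
  }
  return merged;
}

const effectiveLimits = mergeRateLimits(defaultLimits, { 'gemini-2.5-flash': { rpm: 10 } });
console.log(effectiveLimits['gemini-2.5-flash']);
```

Only `rpm` changes for the overridden model; its `tpm` and `rpd` values, and every other model's entry, carry over from the defaults.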

Methods

`generate(prompt, options?)`

Generate text response

const response = await client.generate('Hello!', {
  model: 'gemini-2.5-flash',  // Specify model
  temperature: 0.7,
  maxTokens: 1000,
  systemInstruction: 'You are a helpful assistant',  // v0.5.0+
  tools: [weatherFunction],  // v0.5.0+
  toolConfig: { functionCallingMode: 'auto' },  // v0.5.0+
  safetySettings: [{ category: HarmCategory.HARM_CATEGORY_HARASSMENT, threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE }],  // v0.5.0+
  responseMimeType: 'application/json',  // v0.5.0+
  responseSchema: { type: 'object', properties: { ... } }  // v0.5.0+
});

GenerateOptions:

interface GenerateOptions {
  model?: GeminiModel;
  temperature?: number;           // 0.0 - 2.0
  maxTokens?: number;            // Max output tokens
  topP?: number;                 // 0.0 - 1.0
  topK?: number;                 // Top-K sampling
  systemInstruction?: string | Content;  // v0.5.0+: Control model behavior
  tools?: FunctionDeclaration[];         // v0.5.0+: Available functions
  toolConfig?: ToolConfig;               // v0.5.0+: Function calling config
  safetySettings?: SafetySetting[];      // v0.5.0+: Content filtering
  responseMimeType?: string;             // v0.5.0+: Response format (e.g., 'application/json')
  responseSchema?: ResponseSchema;       // v0.5.0+: JSON schema validation
}

interface ToolConfig {
  functionCallingMode?: 'auto' | 'any' | 'none';
  allowedFunctionNames?: string[];
}

`generateStream(prompt, options?)`

Generate streaming response

const stream = client.generateStream('Tell me a story');
for await (const chunk of stream) {
  console.log(chunk.text);
}

`chat(messages, options?)`

Conversational interface

const response = await client.chat([
  { role: 'user', content: 'Hello' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'Tell me about TypeScript' }
]);
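The chat messages above use an `assistant` role, while the Gemini content format shown in the function-calling example uses `model` with a `parts` array. A conversion between the two might look like the sketch below; this is an assumption about what happens internally, and `ChatMessage`, `Content`, and `toContents` are local, hypothetical definitions.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface Content {
  role: 'user' | 'model';
  parts: { text: string }[];
}

// Map chat-style messages to Gemini-style contents:
// "assistant" becomes "model", and the text moves into a parts array.
function toContents(messages: ChatMessage[]): Content[] {
  return messages.map(
    (message): Content => ({
      role: message.role === 'assistant' ? 'model' : 'user',
      parts: [{ text: message.content }],
    }),
  );
}

console.log(
  JSON.stringify(
    toContents([
      { role: 'user', content: 'Hello' },
      { role: 'assistant', content: 'Hi! How can I help?' },
    ]),
  ),
);
```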

`getFallbackStats()`

Get fallback statistics

const stats = client.getFallbackStats();

⚙️ Configuration

Basic Configuration

const client = new GemBack({
  apiKey: 'YOUR_KEY',

  // Specify models to use
  fallbackOrder: [
    'gemini-3.1-flash-lite',
    'gemini-3.5-flash',
    'gemini-3-flash-preview',
  ],

  // Retry settings
  maxRetries: 3,
  retryDelay: 2000,

  // Timeout settings
  timeout: 60000,

  // Logging settings
  debug: true,
  logLevel: 'info'
});

Advanced Configuration (v0.2.0+)

const client = new GemBack({
  // Multi-key rotation (v0.2.0+)
  apiKeys: ['KEY_1', 'KEY_2', 'KEY_3'],
  apiKeyRotationStrategy: 'least-used',  // or 'round-robin'

  // Monitoring & tracking (v0.2.0+)
  enableMonitoring: true,                // Enable monitoring
  enableRateLimitPrediction: true,       // Rate limit prediction warnings

  // Base settings
  fallbackOrder: ['gemini-3.1-flash-lite', 'gemini-3.5-flash', 'gemini-3-flash-preview'],
  maxRetries: 2,
  timeout: 30000,
  logLevel: 'info'
});

🔄 Fallback Behavior

Error Handling Scenarios

| Error Type | Handling |
|---|---|
| 429 RPM Limit | ⚡ Immediate fallback to next model |
| 5xx Server Error | 🔄 Retry then fallback |
| Timeout | 🔄 Retry then fallback |
| 401/403 Auth Error | ❌ Immediate failure (stop fallback) |
| All Models Failed | ❌ Return detailed error info |
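The table above can be read as a small decision function. The sketch below encodes the four per-request rows; `classifyFailure`, `Failure`, and `FailureAction` are hypothetical names, and the handling of statuses the table does not list (treated as an immediate failure here) is an assumption.

```typescript
type FailureAction = 'fallback' | 'retry-then-fallback' | 'fail';

interface Failure {
  status?: number; // HTTP status, when the server responded
  timedOut?: boolean; // true when the request hit the client timeout
}

function classifyFailure(failure: Failure): FailureAction {
  if (failure.timedOut) return 'retry-then-fallback';
  const httpStatus = failure.status;
  if (httpStatus === 401 || httpStatus === 403) return 'fail'; // auth: stop the chain
  if (httpStatus === 429) return 'fallback'; // rate limited: next model immediately
  if (httpStatus !== undefined && httpStatus >= 500 && httpStatus <= 599) {
    return 'retry-then-fallback'; // server error: retry, then move on
  }
  return 'fail'; // assumption: anything not listed in the table fails fast
}

console.log(classifyFailure({ status: 429 }));
console.log(classifyFailure({ status: 503 }));
console.log(classifyFailure({ timedOut: true }));
console.log(classifyFailure({ status: 401 }));
```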

Retry Strategy

Exponential Backoff: 1s → 2s → 4s → ...
Retryable Errors: 5xx, Timeout, Network Error
Non-retryable Errors: 4xx, including 401/403 auth errors (429 is not retried on the same model; it triggers an immediate fallback)
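The doubling schedule above can be sketched as follows. `backoffDelays` and `retryWithBackoff` are hypothetical helpers, not gemback exports; the sketch retries on every error and leaves out the retryable versus non-retryable distinction for brevity. The `sleep` function is injected so the logic can be exercised without real waiting.

```typescript
// Delay before each retry: initial, 2x initial, 4x initial, ...
function backoffDelays(initialDelayMs: number, maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(initialDelayMs * Math.pow(2, attempt));
  }
  return delays;
}

// Run the task once, then retry after each delay in the schedule.
// Rethrows the last error when all tries are exhausted.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  delays: number[],
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < delays.length) {
        await sleep(delays[attempt]);
      }
    }
  }
  throw lastError;
}

// With retryDelay: 1000 and maxRetries: 3 the schedule is 1s, 2s, 4s.
console.log(backoffDelays(1000, 3));
```

In real use, `sleep` would typically be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.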

📊 Logging Examples

Basic Logging (`debug: true`)

[GemBack] Attempting: gemini-3-flash-preview
[GemBack] Failed (429 RPM Limit): gemini-3-flash-preview
[GemBack] Fallback to: gemini-2.5-flash
[GemBack] Retry attempt 1/2: gemini-2.5-flash
[GemBack] Success: gemini-2.5-flash (2nd attempt)

With Monitoring Enabled (`enableMonitoring: true`)

[GemBack] Monitoring enabled: Rate limit tracking and health monitoring
[GemBack] Attempting: gemini-2.5-flash (API Key #1)
[GemBack] Rate limit warning for gemini-2.5-flash: 12/15 RPM
[GemBack] Success: gemini-2.5-flash (1234ms)

🗺️ Roadmap

Phase 1: Core Features ✅ (Completed - v0.1.0)

[x] Project structure
[x] Basic fallback logic
[x] 4 model support
[x] TypeScript type definitions
[x] Automatic retry with Exponential Backoff
[x] Streaming response support
[x] Conversational interface (chat)
[x] Statistics tracking
[x] Comprehensive test coverage (100 tests)
[x] Complete documentation and examples

Phase 2: Advanced Features ✅ (Completed - v0.2.0)

Phase 2 added advanced features to improve production stability.

🔐 Multi-Key Support & Rotation ✅

[x] Load balancing with multiple API keys
  - Automatic key rotation to bypass RPM limits
  - Support for round-robin and least-used strategies
  - Per-key usage tracking and statistics

📊 Monitoring & Tracking ✅

[x] Rate Limit Tracking & Prediction
  - Real-time usage tracking per model
  - Predictive warnings before hitting limits (80%, 90% thresholds)
  - Sliding window analysis (1-minute, 5-minute)
[x] Health Check & Model Status Monitoring
  - Status monitoring per model (response time, success rate, availability)
  - Real-time health status (healthy/degraded/unhealthy)
  - Percentile-based performance metrics (p50, p95, p99)
  - Consecutive failure detection and tracking

Phase 2 Achievements:

✅ 165 comprehensive tests (65% increase from Phase 1)
✅ Production-level monitoring system
✅ Multi-key rotation for RPM limit bypass
✅ Real-time model health tracking

Phase 2.5: Advanced Content Generation ✅ (Completed - v0.5.0)

Phase 2.5 adds production-grade content generation features from the Google GenAI SDK, including function calling, system instructions, safety controls, and structured outputs.

🎯 System Instructions ✅

[x] Control model behavior and response style
  - Guide model personality, tone, and output format
  - Support both string and structured Content format
  - Apply instructions across all generation methods
  - Maintain instructions through fallback chains

🔧 Function Calling (Tool Use) ✅

[x] Enable AI to call external functions
  - Define functions with structured parameters (JSON Schema)
  - Multiple function calling modes: auto, any, none
  - Restrict allowed functions with allowedFunctionNames
  - Extract function calls from model responses
  - Support multi-turn conversations with function results

🛡️ Safety Settings ✅

[x] Content filtering and moderation
  - Configure safety thresholds for different harm categories
  - Support for harassment, hate speech, sexually explicit, and dangerous content filtering
  - Multiple blocking levels: none, low, medium, high
  - Child-safe content generation
  - Compliance with content policies

📊 JSON Mode ✅

[x] Structured JSON responses
  - Automatic JSON parsing with response.json field
  - Schema validation with OpenAPI-compatible schemas
  - Support for objects, arrays, and nested structures
  - Type-safe integration with TypeScript interfaces
  - Structured data extraction and API response formatting

Phase 2.5 Achievements:

✅ 248 comprehensive tests
✅ Full GenAI SDK compatibility for all advanced features
✅ Production-ready content safety controls
✅ Type-safe structured outputs with schema validation
✅ Comprehensive examples for all features

Phase 3: Performance & Ecosystem (Planned)

Phase 3 will focus on performance optimization and ecosystem expansion.

⚡ Performance Optimization

[ ] Response Caching
  - Reduce API calls with caching
  - TTL-based cache expiration
  - Memory-efficient cache strategy
[ ] Connection Pooling
  - Improve performance with connection reuse
  - Optimize concurrent request handling
  - Efficient resource usage

🛡️ Advanced Reliability Patterns

[ ] Circuit Breaker Pattern
  - Temporary blocking on persistent failures
  - Automatic recovery and retry
  - System overload prevention

🌐 Ecosystem Expansion

[ ] CLI tools
[ ] Web dashboard (real-time monitoring)
[ ] Monitoring integration (Prometheus, Grafana)
[ ] Additional AI model support (Claude, GPT, etc.)

🤝 Contributing

Contributions are welcome! You can participate by:

Reporting issues
Suggesting features
Submitting pull requests
Improving documentation

See CONTRIBUTING.md for details.

📄 License

MIT License - Free to use, modify, and distribute.

🔗 Links

Documentation: API Documentation
Issues: GitHub Issues
NPM: npm package
Gemini API: Google AI Gemini

💡 FAQ

Q: Where can I get an API key?

A: Get a free API key at Google AI Studio.

Q: What happens when all models fail?

A: Gem Back throws a GeminiBackError that contains details of every attempt.

Q: Can I use only specific models?

A: Yes, pass your preferred models in the fallbackOrder option.

Q: What are the costs?

A: Only Gemini API costs apply. Gem Back is free and open-source.

🌟 Projects Using Gem Back

Be the first to showcase your project using Gem Back!

If you're using Gem Back in your project, we'd love to feature it here.

Updated: 2025-11-29

Made with ❤️ by Laeyoung
