# Cognigate

A unified AI gateway with budget controls, local fallback, and voice mode.

## Why Cognigate?

Never overspend on AI again. Cognigate gives you hard budget limits, automatic fallback to free local models, and seamless provider switching.
## Key Features
- 💰 Budget Protection - Hard daily spending limits, never exceed your budget
- 🔄 Smart Fallback - Automatically switches to Ollama/LM Studio/WebLLM when budget runs out
- 🎤 Voice Mode - Built-in speech recognition and text-to-speech for conversational AI
- 🚀 Multi-Provider - OpenAI, Anthropic, Google - one API for all
- 📊 Real-Time Monitoring - Track costs, get webhook alerts (Slack/Discord)
- ⚡ Performance - Semantic caching and compression reduce costs by 40%+
- 🌐 Cross-Platform - Works in Node.js, browsers, React, and Next.js
## Quick Start

### Installation

```bash
npm install cognigate
```

### Basic Usage

```typescript
import { createGateway } from 'cognigate';

const ai = createGateway({
  dailyBudget: 10, // $10/day limit
  cacheEnabled: true,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  localFallback: { enabled: true }
});

// Simple completion
const answer = await ai.complete("What is TypeScript?");
console.log(answer);

// Check budget
const status = ai.getBudgetStatus();
console.log(`Used: $${status.used.toFixed(2)} / $${status.dailyLimit}`);
```

## Features in Detail
### 💰 Budget Protection

Set daily spending limits and never worry about unexpected costs:

```typescript
const ai = createGateway({
  dailyBudget: 5.00, // Hard limit: $5/day
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// Track budget in real-time
const status = ai.getBudgetStatus();
console.log(`Used: $${status.used} / $${status.dailyLimit}`);
console.log(`Remaining: $${status.remaining}`);
console.log(`Resets at: ${status.resetAt}`);
```

The budget automatically resets at midnight UTC. When it is exceeded, requests either throw a `BudgetExceededError` or fall back to local models.
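As a concrete illustration of the midnight-UTC reset, the next reset timestamp can be computed as below. This is a standalone sketch of the documented rule, not the library's internals; `nextResetAt` is a hypothetical helper.

```typescript
// Sketch: compute the next midnight-UTC reset for a given moment.
// Date.UTC normalizes day-of-month overflow, so month/year rollovers
// (e.g. Jan 31 -> Feb 1) are handled automatically.
function nextResetAt(now: Date): Date {
  return new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // first instant of the next UTC day
  ));
}

const reset = nextResetAt(new Date(Date.UTC(2024, 0, 15, 13, 30)));
console.log(reset.toISOString()); // "2024-01-16T00:00:00.000Z"
```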
### 🔄 Multi-Provider Support

Use multiple AI providers with automatic fallback:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4o-mini', 'gpt-4o'] // Tries mini first
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      models: ['claude-3-haiku', 'claude-3-sonnet']
    },
    google: {
      apiKey: process.env.GOOGLE_API_KEY,
      models: ['gemini-1.5-flash']
    }
  }
});

// Automatically tries providers in order until one succeeds
const answer = await ai.complete("Hello!");
```

### 🏠 Local Fallback
When the budget runs out or cloud providers fail, the gateway automatically switches to free local models:

```typescript
const ai = createGateway({
  dailyBudget: 1.00, // Small budget
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  localFallback: {
    enabled: true,
    providers: ['ollama', 'lmstudio', 'webllm'] // Tries in order
  }
});

// If the budget is exceeded, automatically uses Ollama (free!)
const answer = await ai.complete("Hello!");
```

Supported local providers:

- Ollama - Most popular, GPU-accelerated
- LM Studio - Great for Mac with Metal acceleration
- WebLLM - Runs entirely in browser via WebGPU
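The try-in-order behavior described above can be pictured with a generic sketch. This is a hypothetical illustration of the pattern, not the package's actual code:

```typescript
// Sketch: try each provider in order, returning the first success.
type Provider<T> = () => Promise<T>;

async function tryInOrder<T>(providers: Provider<T>[]): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await provider(); // first success wins
    } catch (err) {
      lastError = err; // remember the failure, try the next one
    }
  }
  throw lastError; // every provider failed
}

// Usage: a cloud provider that refuses, then a free local model.
(async () => {
  const answer = await tryInOrder([
    async () => { throw new Error("budget exceeded"); },
    async () => "Hello from the local model!",
  ]);
  console.log(answer); // "Hello from the local model!"
})();
```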
### ⚡ Streaming Responses

Get tokens as they're generated for better UX:

```typescript
for await (const token of ai.stream("Write a poem")) {
  process.stdout.write(token); // Print each token immediately
}
```

### 🎤 Voice Mode
Built-in speech recognition and text-to-speech:

```typescript
import { createGateway } from 'cognigate';
import { Conversation } from 'cognigate/voice';

const ai = createGateway({ dailyBudget: 5 });
const conversation = new Conversation(ai, {
  continuous: true,
  autoSpeak: true,
  language: 'en-US'
});

// Listen for events
conversation.on('transcript', (text) => {
  console.log(`You said: ${text}`);
});
conversation.on('response', (text) => {
  console.log(`AI said: ${text}`);
});

// Start listening
await conversation.start();
```

### 📊 Webhook Alerts
Get notified when budget thresholds are reached:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  webhooks: {
    slack: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL',
    discord: 'https://discord.com/api/webhooks/YOUR/WEBHOOK/URL',
    custom: 'https://your-server.com/webhook'
  }
});

// Automatically sends alerts at 50%, 80%, and 100% budget usage
```

Alert types:

- 50% Warning - Yellow alert
- 80% Urgent - Orange alert
- 100% Exceeded - Red alert
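The threshold logic can be sketched in a few lines. This is a hypothetical illustration; the real alert dispatch lives inside the gateway:

```typescript
// Sketch: given budget usage before and after a request (as fractions of
// the daily limit), return which alert thresholds were newly crossed.
const ALERT_THRESHOLDS = [0.5, 0.8, 1.0] as const;

function crossedThresholds(before: number, after: number): number[] {
  return ALERT_THRESHOLDS.filter((t) => before < t && after >= t);
}

console.log(crossedThresholds(0.45, 0.55)); // [ 0.5 ]  -> warning fires
console.log(crossedThresholds(0.55, 1.0));  // [ 0.8, 1 ] -> urgent + exceeded
```

Tracking "newly crossed" rather than "currently above" is what keeps each alert from firing on every subsequent request.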
### 🗄️ Smart Caching

Reduce costs with intelligent caching:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cacheEnabled: true,
  semanticCaching: true, // Match similar prompts
  similarityThreshold: 0.9, // How similar prompts must be (0-1)
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// First call - hits the API
await ai.complete("What is TypeScript?"); // Cost: $0.0001

// Second call - cached!
await ai.complete("What is TypeScript?"); // Cost: $0 (exact cache hit)

// Third call - semantic match!
await ai.complete("Explain TypeScript"); // Cost: $0 (similar prompt)
```

### 🗜️ Prompt Compression
Reduce token usage with automatic compression:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  compressionLevel: 'high', // 'low' | 'medium' | 'high'
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// Automatically compresses prompts before sending
// High compression: ~40% token reduction
```

## Platform Support
### Node.js

```bash
npm install cognigate
```

```typescript
import { createGateway } from 'cognigate';

const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});
```

### Browser (CDN)
```html
<!DOCTYPE html>
<script type="module">
  import { createGateway } from 'https://cdn.jsdelivr.net/npm/cognigate@1.0.0/dist/index.mjs';

  const ai = createGateway({ dailyBudget: 0 }); // 0 = unlimited
  const answer = await ai.complete("Hello!");
  console.log(answer);
</script>
```

### React / Next.js
```tsx
'use client';

import { createGateway } from 'cognigate';
import { useState } from 'react';

export default function ChatPage() {
  const [ai] = useState(() => createGateway({
    dailyBudget: 5,
    cloudProviders: {
      // Note: NEXT_PUBLIC_ variables are exposed to the browser;
      // prefer a server route for production keys.
      openai: { apiKey: process.env.NEXT_PUBLIC_OPENAI_KEY }
    }
  }));
  const [response, setResponse] = useState('');

  const handleSubmit = async (prompt: string) => {
    const answer = await ai.complete(prompt);
    setResponse(answer);
  };

  return (
    <div>
      <button onClick={() => handleSubmit("Hello!")}>
        Ask AI
      </button>
      <p>{response}</p>
    </div>
  );
}
```

## API Reference
### createGateway(config)

Creates a new gateway instance.

```typescript
interface GatewayConfig {
  dailyBudget?: number; // Daily spending limit in USD (0 = unlimited)
  cacheEnabled?: boolean; // Enable response caching
  semanticCaching?: boolean; // Cache similar prompts
  similarityThreshold?: number; // Similarity threshold (0-1)
  compressionLevel?: 'low' | 'medium' | 'high';
  localFallback?: LocalFallbackConfig;
  cloudProviders?: CloudProvidersConfig;
  webhooks?: WebhooksConfig;
}
```

### ai.complete(prompt, options?)
Send a prompt and get a text response.

```typescript
const answer = await ai.complete("What is AI?", {
  model: 'gpt-4o', // Override the default model
  temperature: 0.7, // Creativity (0-2)
  maxTokens: 500, // Max response length
  forceProvider: 'cloud' // Force a provider type: 'cloud' | 'local'
});
```

### ai.stream(prompt, options?)
Stream response tokens in real-time.

```typescript
for await (const token of ai.stream("Write a story")) {
  console.log(token); // Each token as it's generated
}
```

### ai.getBudgetStatus()
Get current budget information.

```typescript
const status = ai.getBudgetStatus();
// Returns: { dailyLimit, used, remaining, resetAt }
```

### ai.clearCache()
Manually clear the response cache.

```typescript
ai.clearCache();
```

## Examples
See the `examples/` directory for complete working examples:

- `basic-chat.ts` - Simple Q&A with budget tracking
- `voice-assistant.ts` - Voice-to-voice conversation
- `budget-aware-app.ts` - Advanced budget management
- `streaming-chat.ts` - Real-time streaming responses
## Configuration Guide

### Environment Variables

```bash
# Cloud providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

# Webhooks (optional)
export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
```

### Cost Optimization
Reduce costs by 70%+ with these settings:

```typescript
const ai = createGateway({
  dailyBudget: 5,
  cacheEnabled: true,
  semanticCaching: true,
  similarityThreshold: 0.85, // Aggressive caching
  compressionLevel: 'high', // Max compression
  cloudProviders: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4o-mini'] // Use the cheapest model
    }
  },
  localFallback: {
    enabled: true // Free fallback
  }
});
```

## Troubleshooting

### "Budget exceeded" error
The budget resets at midnight UTC. Options:

- Increase `dailyBudget`
- Enable `localFallback` for free local models
- Wait for the reset (check `status.resetAt`)
### Caching not working

Enable caching explicitly:

```typescript
cacheEnabled: true,
semanticCaching: true
```

### Voice features not working
Voice requires:

- A browser environment (Chrome, Safari, or Edge)
- HTTPS or localhost
- Microphone permissions granted
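As a quick preflight for the requirements above, you can probe the environment before starting a conversation. This sketch uses the standard `isSecureContext` and Web Speech API globals; `voiceSupported` is a hypothetical helper, not part of cognigate:

```typescript
// Sketch: check the documented voice requirements before calling
// conversation.start(). Takes the globals as a parameter so it can
// run (and be tested) outside a browser too.
interface VoiceGlobals {
  isSecureContext?: boolean; // true on HTTPS and on localhost
  SpeechRecognition?: unknown; // standard Web Speech API
  webkitSpeechRecognition?: unknown; // Chrome/Safari-prefixed variant
  speechSynthesis?: unknown; // text-to-speech
}

function voiceSupported(g: VoiceGlobals): boolean {
  const hasRecognition = Boolean(g.SpeechRecognition ?? g.webkitSpeechRecognition);
  const hasSynthesis = Boolean(g.speechSynthesis);
  return Boolean(g.isSecureContext) && hasRecognition && hasSynthesis;
}

// In a browser: voiceSupported(window as unknown as VoiceGlobals)
```

Note that microphone permission cannot be checked synchronously; the browser still prompts for it at runtime, so this check covers only the environment, not the permission grant.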
### Local models not found

Install Ollama or LM Studio:

```bash
# macOS
brew install ollama
ollama serve

# Or download LM Studio
# https://lmstudio.ai/
```

## Development
```bash
# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Run examples
npx tsx examples/basic-chat.ts
```

## License
MIT License - see LICENSE for details.

## Support

- Documentation: README.md
- Examples: `examples/`
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by developers who hate surprise AI bills.
