# Cognigate

A unified AI gateway with budget controls, local fallback, and voice mode.

## Why Cognigate?

Never overspend on AI again. Cognigate gives you hard budget limits, automatic fallback to free local models, and seamless provider switching.
## Key Features
- 💰 Budget Protection - Hard daily spending limits, never exceed your budget
- 🔄 Smart Fallback - Automatically switches to Ollama/LM Studio/WebLLM when budget runs out
- 🎤 Voice Mode - Built-in speech recognition and text-to-speech for conversational AI
- 🚀 Multi-Provider - OpenAI, Anthropic, Google - one API for all
- 📊 Real-Time Monitoring - Track costs, get webhook alerts (Slack/Discord)
- ⚡ Performance - Semantic caching and compression reduce costs by 40%+
- 🌐 Cross-Platform - Works in Node.js, browsers, React, and Next.js
## Quick Start

### Installation

```bash
npm install cognigate
```

### Basic Usage

```typescript
import { createGateway } from 'cognigate';

const ai = createGateway({
  dailyBudget: 10, // $10/day limit
  cacheEnabled: true,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  localFallback: { enabled: true }
});

// Simple completion
const answer = await ai.complete("What is TypeScript?");
console.log(answer);

// Check budget
const status = ai.getBudgetStatus();
console.log(`Used: $${status.used.toFixed(2)} / $${status.dailyLimit}`);
```

## Features in Detail
### 💰 Budget Protection

Set daily spending limits and never worry about unexpected costs:

```typescript
const ai = createGateway({
  dailyBudget: 5.00, // Hard limit: $5/day
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// Track budget in real-time
const status = ai.getBudgetStatus();
console.log(`Used: $${status.used} / $${status.dailyLimit}`);
console.log(`Remaining: $${status.remaining}`);
console.log(`Resets at: ${status.resetAt}`);
```

The budget automatically resets at midnight UTC. When it is exceeded, requests either throw a `BudgetExceededError` or fall back to local models.
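As a concrete illustration of the midnight-UTC reset, the next reset timestamp can be computed as below. This is a standalone sketch of the documented rule, not the library's internals; `nextResetAt` is a hypothetical helper.

```typescript
// Sketch: compute the next midnight-UTC reset for a given moment.
// Date.UTC normalizes day-of-month overflow, so month/year rollovers
// (e.g. Jan 31 -> Feb 1) are handled automatically.
function nextResetAt(now: Date): Date {
  return new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // first instant of the next UTC day
  ));
}

const reset = nextResetAt(new Date(Date.UTC(2024, 0, 15, 13, 30)));
console.log(reset.toISOString()); // "2024-01-16T00:00:00.000Z"
```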
### 🔄 Multi-Provider Support

Use multiple AI providers with automatic fallback:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4o-mini', 'gpt-4o'] // Tries mini first
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      models: ['claude-3-haiku', 'claude-3-sonnet']
    },
    google: {
      apiKey: process.env.GOOGLE_API_KEY,
      models: ['gemini-1.5-flash']
    }
  }
});

// Automatically tries providers in order until one succeeds
const answer = await ai.complete("Hello!");
```

### 🏠 Local Fallback
When the budget runs out or cloud providers fail, the gateway automatically switches to free local models:

```typescript
const ai = createGateway({
  dailyBudget: 1.00, // Small budget
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  localFallback: {
    enabled: true,
    providers: ['ollama', 'lmstudio', 'webllm'] // Tries in order
  }
});

// If the budget is exceeded, automatically uses Ollama (free!)
const answer = await ai.complete("Hello!");
```

Supported local providers:

- Ollama - Most popular, GPU-accelerated
- LM Studio - Great for Mac with Metal acceleration
- WebLLM - Runs entirely in browser via WebGPU
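The try-in-order behavior described above can be pictured with a generic sketch. This is a hypothetical illustration of the pattern, not the package's actual code:

```typescript
// Sketch: try each provider in order, returning the first success.
type Provider<T> = () => Promise<T>;

async function tryInOrder<T>(providers: Provider<T>[]): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      return await provider(); // first success wins
    } catch (err) {
      lastError = err; // remember the failure, try the next one
    }
  }
  throw lastError; // every provider failed
}

// Usage: a cloud provider that refuses, then a free local model.
(async () => {
  const answer = await tryInOrder([
    async () => { throw new Error("budget exceeded"); },
    async () => "Hello from the local model!",
  ]);
  console.log(answer); // "Hello from the local model!"
})();
```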
### ⚡ Streaming Responses

Get tokens as they're generated for better UX:

```typescript
for await (const token of ai.stream("Write a poem")) {
  process.stdout.write(token); // Print each token immediately
}
```

### 🎤 Voice Mode
Built-in speech recognition and text-to-speech:

```typescript
import { createGateway } from 'cognigate';
import { Conversation } from 'cognigate/voice';

const ai = createGateway({ dailyBudget: 5 });
const conversation = new Conversation(ai, {
  continuous: true,
  autoSpeak: true,
  language: 'en-US'
});

// Listen for events
conversation.on('transcript', (text) => {
  console.log(`You said: ${text}`);
});
conversation.on('response', (text) => {
  console.log(`AI said: ${text}`);
});

// Start listening
await conversation.start();
```

### 📊 Webhook Alerts
Get notified when budget thresholds are reached:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  },
  webhooks: {
    slack: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL',
    discord: 'https://discord.com/api/webhooks/YOUR/WEBHOOK/URL',
    custom: 'https://your-server.com/webhook'
  }
});

// Automatically sends alerts at 50%, 80%, and 100% budget usage
```

Alert types:

- 50% Warning - Yellow alert
- 80% Urgent - Orange alert
- 100% Exceeded - Red alert
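The threshold logic can be sketched in a few lines. This is a hypothetical illustration; the real alert dispatch lives inside the gateway:

```typescript
// Sketch: given budget usage before and after a request (as fractions of
// the daily limit), return which alert thresholds were newly crossed.
const ALERT_THRESHOLDS = [0.5, 0.8, 1.0] as const;

function crossedThresholds(before: number, after: number): number[] {
  return ALERT_THRESHOLDS.filter((t) => before < t && after >= t);
}

console.log(crossedThresholds(0.45, 0.55)); // [ 0.5 ]  -> warning fires
console.log(crossedThresholds(0.55, 1.0));  // [ 0.8, 1 ] -> urgent + exceeded
```

Tracking "newly crossed" rather than "currently above" is what keeps each alert from firing on every subsequent request.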
### 🗄️ Smart Caching

Reduce costs with intelligent caching:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  cacheEnabled: true,
  semanticCaching: true, // Match similar prompts
  similarityThreshold: 0.9, // How similar prompts must be (0-1)
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// First call - hits the API
await ai.complete("What is TypeScript?"); // Cost: $0.0001

// Second call - cached!
await ai.complete("What is TypeScript?"); // Cost: $0 (exact cache hit)

// Third call - semantic match!
await ai.complete("Explain TypeScript"); // Cost: $0 (similar prompt)
```

### 🗜️ Prompt Compression
Reduce token usage with automatic compression:

```typescript
const ai = createGateway({
  dailyBudget: 10,
  compressionLevel: 'high', // 'low' | 'medium' | 'high'
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});

// Automatically compresses prompts before sending
// High compression: ~40% token reduction
```

## Platform Support
### Node.js

```bash
npm install cognigate
```

```typescript
import { createGateway } from 'cognigate';

const ai = createGateway({
  dailyBudget: 10,
  cloudProviders: {
    openai: { apiKey: process.env.OPENAI_API_KEY }
  }
});
```

### Browser (CDN)
```html
<!DOCTYPE html>
<script type="module">
  import { createGateway } from 'https://cdn.jsdelivr.net/npm/cognigate@1.0.0/dist/index.mjs';

  const ai = createGateway({ dailyBudget: 0 }); // 0 = unlimited
  const answer = await ai.complete("Hello!");
  console.log(answer);
</script>
```

### React / Next.js
```tsx
'use client';

import { createGateway } from 'cognigate';
import { useState } from 'react';

export default function ChatPage() {
  const [ai] = useState(() => createGateway({
    dailyBudget: 5,
    cloudProviders: {
      // Note: NEXT_PUBLIC_ variables are exposed to the browser;
      // prefer a server route for production keys.
      openai: { apiKey: process.env.NEXT_PUBLIC_OPENAI_KEY }
    }
  }));
  const [response, setResponse] = useState('');

  const handleSubmit = async (prompt: string) => {
    const answer = await ai.complete(prompt);
    setResponse(answer);
  };

  return (
    <div>
      <button onClick={() => handleSubmit("Hello!")}>
        Ask AI
      </button>
      <p>{response}</p>
    </div>
  );
}
```

## API Reference
### createGateway(config)

Creates a new gateway instance.

```typescript
interface GatewayConfig {
  dailyBudget?: number; // Daily spending limit in USD (0 = unlimited)
  cacheEnabled?: boolean; // Enable response caching
  semanticCaching?: boolean; // Cache similar prompts
  similarityThreshold?: number; // Similarity threshold (0-1)
  compressionLevel?: 'low' | 'medium' | 'high';
  localFallback?: LocalFallbackConfig;
  cloudProviders?: CloudProvidersConfig;
  webhooks?: WebhooksConfig;
}
```

### ai.complete(prompt, options?)
Send a prompt and get a text response.

```typescript
const answer = await ai.complete("What is AI?", {
  model: 'gpt-4o', // Override the default model
  temperature: 0.7, // Creativity (0-2)
  maxTokens: 500, // Max response length
  forceProvider: 'cloud' // Force a provider type: 'cloud' | 'local'
});
```

### ai.stream(prompt, options?)
Stream response tokens in real-time.

```typescript
for await (const token of ai.stream("Write a story")) {
  console.log(token); // Each token as it's generated
}
```

### ai.getBudgetStatus()
Get current budget information.

```typescript
const status = ai.getBudgetStatus();
// Returns: { dailyLimit, used, remaining, resetAt }
```

### ai.clearCache()
Manually clear the response cache.

```typescript
ai.clearCache();
```

## Examples
See the `examples/` directory for complete working examples:

- `basic-chat.ts` - Simple Q&A with budget tracking
- `voice-assistant.ts` - Voice-to-voice conversation
- `budget-aware-app.ts` - Advanced budget management
- `streaming-chat.ts` - Real-time streaming responses
## Configuration Guide

### Environment Variables

```bash
# Cloud providers
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

# Webhooks (optional)
export SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
```

### Cost Optimization
Reduce costs by 70%+ with these settings:

```typescript
const ai = createGateway({
  dailyBudget: 5,
  cacheEnabled: true,
  semanticCaching: true,
  similarityThreshold: 0.85, // Aggressive caching
  compressionLevel: 'high', // Max compression
  cloudProviders: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4o-mini'] // Use the cheapest model
    }
  },
  localFallback: {
    enabled: true // Free fallback
  }
});
```

## Troubleshooting

### "Budget exceeded" error
The budget resets at midnight UTC. Options:

- Increase `dailyBudget`
- Enable `localFallback` for free local models
- Wait for the reset (check `status.resetAt`)
### Caching not working

Enable caching explicitly:

```typescript
cacheEnabled: true,
semanticCaching: true
```

### Voice features not working
Voice requires:

- A browser environment (Chrome, Safari, or Edge)
- HTTPS or localhost
- Microphone permissions granted
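As a quick preflight for the requirements above, you can probe the environment before starting a conversation. This sketch uses the standard `isSecureContext` and Web Speech API globals; `voiceSupported` is a hypothetical helper, not part of cognigate:

```typescript
// Sketch: check the documented voice requirements before calling
// conversation.start(). Takes the globals as a parameter so it can
// run (and be tested) outside a browser too.
interface VoiceGlobals {
  isSecureContext?: boolean; // true on HTTPS and on localhost
  SpeechRecognition?: unknown; // standard Web Speech API
  webkitSpeechRecognition?: unknown; // Chrome/Safari-prefixed variant
  speechSynthesis?: unknown; // text-to-speech
}

function voiceSupported(g: VoiceGlobals): boolean {
  const hasRecognition = Boolean(g.SpeechRecognition ?? g.webkitSpeechRecognition);
  const hasSynthesis = Boolean(g.speechSynthesis);
  return Boolean(g.isSecureContext) && hasRecognition && hasSynthesis;
}

// In a browser: voiceSupported(window as unknown as VoiceGlobals)
```

Note that microphone permission cannot be checked synchronously; the browser still prompts for it at runtime, so this check covers only the environment, not the permission grant.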
### Local models not found

Install Ollama or LM Studio:

```bash
# macOS
brew install ollama
ollama serve

# Or download LM Studio
# https://lmstudio.ai/
```

## Development
```bash
# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Run examples
npx tsx examples/basic-chat.ts
```

## License
MIT License - see LICENSE for details.

## Support

- Documentation: README.md
- Examples: `examples/`
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by developers who hate surprise AI bills.
