# ai-gateway-kit

A boring, provider-agnostic AI Gateway for Node.js.
This library exists to solve the “production gateway” problems around LLM usage:
- Capability-based routing (agents request capabilities, not models)
- Ordered fallback (graceful degradation, never silent failure)
- In-memory rate limiting (instance-scoped by design)
- Observability hooks (you choose logging/metrics/tracing)
## Why capability-based routing?
Model names change, providers change, and quotas fluctuate. A gateway that routes by capability lets your agents stay stable while the model fleet evolves.
Example capabilities:
- `fast_text`
- `deep_reasoning`
- `search`
- `speech_to_text`
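For example, an agent that needs careful planning asks for `deep_reasoning` and is never routed to the `fast_text` model, even when both sit behind the same provider. A minimal sketch using the config shape from the Quick start below (the model IDs and limit values here are illustrative):

```ts
import { createAIGateway, createGitHubModelsProvider } from "ai-gateway-kit";

// Illustrative sketch: model IDs are examples; config shape as in Quick start.
const gateway = createAIGateway({
  models: [
    {
      id: "gpt-4o-mini",
      provider: "github",
      capabilities: ["fast_text"],
      limits: { rpm: 15, rpd: 150, tpmInput: 150000, tpmOutput: 20000, concurrency: 3 }
    },
    {
      id: "o3-mini",
      provider: "github",
      capabilities: ["deep_reasoning"],
      limits: { rpm: 5, rpd: 50, tpmInput: 100000, tpmOutput: 20000, concurrency: 1 }
    }
  ],
  providers: {
    github: createGitHubModelsProvider({ token: process.env.GITHUB_TOKEN! })
  }
});

// The agent names a capability, not a model; swapping o3-mini for another
// deep_reasoning model later requires no change here.
const result = await gateway.execute({
  capability: "deep_reasoning",
  input: { kind: "chat", messages: [{ role: "user", content: "Plan the rollout." }] }
});
```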
## Why in-memory state?
This kit intentionally uses in-memory rate limit state.
- Works in serverless environments (Vercel-compatible)
- No shared storage dependency
- Predictable failure modes
Trade-off: multi-instance deployments do not share quotas. Each instance enforces limits based on its own in-memory view.
If you need cross-instance coordination, you can replace the in-memory `RateLimitManager` with your own implementation, as sketched below.
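For instance, a Redis-backed limiter could share windowed counters across instances. The sketch below is hypothetical: the method names and decision shape are assumptions for illustration, not the actual `RateLimitManager` contract, so adapt it to the library's real types when swapping it in.

```ts
// Hypothetical sketch only: method names and shapes here are assumptions,
// not ai-gateway-kit's actual RateLimitManager interface.
interface RateLimitDecision {
  allowed: boolean;
  reason?: string;
}

// A cross-instance limiter backs its counters with shared storage
// (here a minimal Redis-like client) instead of process memory.
class SharedRateLimiter {
  constructor(
    private redis: {
      incr(key: string): Promise<number>;
      expire(key: string, seconds: number): Promise<unknown>;
    },
    private rpm: number
  ) {}

  async check(modelId: string): Promise<RateLimitDecision> {
    // One counter per model per minute window, shared by all instances.
    const windowKey = `rl:${modelId}:${Math.floor(Date.now() / 60_000)}`;
    const count = await this.redis.incr(windowKey);
    await this.redis.expire(windowKey, 60);
    return count <= this.rpm
      ? { allowed: true }
      : { allowed: false, reason: "rpm exceeded" };
  }
}
```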
## This is not a chat wrapper
This library is infrastructure:
- routing
- backoff
- fallbacks
- hooks
It does not provide prompt templates, product policies, UI, or agent logic.
## Install

```bash
npm install ai-gateway-kit
```

## Quick start

```ts
import { createAIGateway, createGitHubModelsProvider } from "ai-gateway-kit";

const gateway = createAIGateway({
  models: [
    {
      id: "gpt-4o-mini",
      provider: "github",
      capabilities: ["fast_text"],
      limits: { rpm: 15, rpd: 150, tpmInput: 150000, tpmOutput: 20000, concurrency: 3 }
    }
  ],
  providers: {
    github: createGitHubModelsProvider({
      token: process.env.GITHUB_TOKEN!
    })
  }
});

const result = await gateway.execute({
  capability: "fast_text",
  input: {
    kind: "chat",
    messages: [{ role: "user", content: "Say hi." }]
  }
});

console.log(result.output);
```

## Core Features
### Capability-based routing

Route requests by capability, not model names. See examples/02-capability-routing.ts.
### Automatic fallback

Graceful degradation across models. See examples/03-fallback-handling.ts.
### Rate limiting

In-memory rate limits (rpm, rpd, tpm, concurrency). See examples/03-fallback-handling.ts and the sketch below.
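The two features compose: register several models under the same capability, and when one model's budget is spent the gateway degrades to the next instead of failing. A sketch using the documented config shape (the assumption here is that fallback tries models in registration order):

```ts
import { createAIGateway, createGitHubModelsProvider } from "ai-gateway-kit";

// Both models advertise fast_text; when gpt-4o-mini's limits are exhausted,
// requests fall back to gpt-4o.
// (Assumption: fallback follows registration order; limit values illustrative.)
const gateway = createAIGateway({
  models: [
    {
      id: "gpt-4o-mini",
      provider: "github",
      capabilities: ["fast_text"],
      limits: { rpm: 15, rpd: 150, tpmInput: 150000, tpmOutput: 20000, concurrency: 3 }
    },
    {
      id: "gpt-4o",
      provider: "github",
      capabilities: ["fast_text"],
      limits: { rpm: 10, rpd: 50, tpmInput: 100000, tpmOutput: 10000, concurrency: 2 }
    }
  ],
  providers: {
    github: createGitHubModelsProvider({ token: process.env.GITHUB_TOKEN! })
  }
});
```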
### Multiple providers

GitHub Models, Gemini, or custom providers. See examples/04-multi-provider.ts.
## Advanced features

- JSON mode: examples/06-json-mode.ts
- Web search: examples/07-search-capability.ts
- Temperature control: examples/08-temperature-control.ts
- Request cancellation: examples/11-abort-requests.ts (see the sketch after this list)
- Dynamic registration: examples/12-dynamic-registration.ts
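As a taste of cancellation, the standard AbortController pattern applies. The sketch below assumes `gateway.execute` accepts an `AbortSignal` via a `signal` field; check examples/11-abort-requests.ts for the option the library actually uses.

```ts
// Assumption: execute() accepts a standard AbortSignal; consult
// examples/11-abort-requests.ts for the real option name.
// `gateway` is the instance from the Quick start above.
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5_000); // cancel after 5s

try {
  const result = await gateway.execute({
    capability: "fast_text",
    input: { kind: "chat", messages: [{ role: "user", content: "Summarize this." }] },
    signal: controller.signal
  });
  console.log(result.output);
} finally {
  clearTimeout(timeout);
}
```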
## Providers

- GitHub Models: OpenAI models via GitHub (docs)
- Gemini: Google Gemini models with search (docs)
- Custom provider: implement the `ProviderAdapter` interface (a sketch follows below)
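As a rough sketch of what a custom adapter involves (the shape below is an assumption for illustration; the real `ProviderAdapter` contract lives in the library's type definitions):

```ts
// Hypothetical sketch: field and method names below are assumptions, not
// the actual ProviderAdapter interface; consult the library's types.
interface ChatInput {
  kind: "chat";
  messages: { role: string; content: string }[];
}

const myProvider = {
  // Forward a chat request to an internal HTTP endpoint (URL is illustrative).
  async execute(modelId: string, input: ChatInput): Promise<{ output: string }> {
    const res = await fetch("https://llm.internal.example/v1/chat", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model: modelId, messages: input.messages })
    });
    if (!res.ok) throw new Error(`provider error: ${res.status}`);
    const data = (await res.json()) as { text: string };
    return { output: data.text };
  }
};

// Registered alongside the built-in providers, e.g.:
// providers: { mine: myProvider }
```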
## Observability hooks
You can subscribe to lifecycle events without taking a dependency on any logging stack:
- `onRequestStart` - When a request begins
- `onRequestEnd` - When a request completes (success or failure)
- `onRateLimit` - When rate limits are encountered
- `onFallback` - When falling back to another model
- `onError` - When errors occur
Example: examples/09-observability-hooks.ts
```ts
import { createAIGateway, createGitHubModelsProvider, type GatewayHooks } from "ai-gateway-kit";

const hooks: GatewayHooks = {
  onRequestStart: (event) => {
    console.log(`Starting: ${event.modelId}`);
  },
  onRequestEnd: (event) => {
    const duration = event.endedAt - event.startedAt;
    console.log(`${event.ok ? 'Success' : 'Failed'}: ${event.modelId} (${duration}ms)`);
  },
  onRateLimit: (event) => {
    console.log(`Rate limit: ${event.modelId} - ${event.decision.reason}`);
  },
  onFallback: (event) => {
    console.log(`Fallback: ${event.fromModelId} → ${event.toModelId}`);
  },
  onError: (event) => {
    console.error(`Error: ${event.modelId} - ${event.error.message}`);
  }
};

const gateway = createAIGateway({
  models: [...],
  providers: {
    github: createGitHubModelsProvider({ token: process.env.GITHUB_TOKEN! })
  },
  hooks
});
```

## Examples
The examples directory contains comprehensive examples for all features:
| Example | Description |
|---------|-------------|
| 01-basic-setup.ts | Minimal setup to get started |
| 02-capability-routing.ts | Route by capability, not model name |
| 03-fallback-handling.ts | Automatic fallback when rate limited |
| 04-multi-provider.ts | Use GitHub + Gemini together |
| 05-custom-routing.ts | Implement custom routing logic |
| 06-json-mode.ts | Request structured JSON output |
| 07-search-capability.ts | Web search with Gemini |
| 08-temperature-control.ts | Control creativity with temperature |
| 09-observability-hooks.ts | Monitor with lifecycle hooks |
| 10-agent-tracking.ts | Track multi-agent systems |
| 11-abort-requests.ts | Cancel in-flight requests |
| 12-dynamic-registration.ts | Add models at runtime |
## License
MIT
