ai-edge
v1.0.21
Published
Local LLM routing server with OpenAI compatibility
Readme
ai-edge
A local LLM API proxy server with rate limiting, caching, and multi-backend support. Works as an OpenAI-compatible API endpoint.
Quick Start
1. Initialize Configuration
Create a new model.jsonc configuration file:
npx ai-edge initOr skip prompts for automation:
npx ai-edge init --skip-prompts2. Configure Your Models
Edit the generated model.jsonc and add your LLM providers:
{
"$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json",
"state-adapter": "memory",
"models": {
"openai": [
{
"id": "primary-openai",
"name": "Primary Instance",
"models": ["gpt-3.5-turbo", "gpt-4"],
"individualLimit": true,
"baseUrl": "https://api.openai.com/v1",
"apiKey": "sk-your-api-key-here",
"rateLimit": {
"tokensPerMinute": 90000,
"requestsPerMinute": 3500,
"requestsPerDay": 200000,
},
},
],
},
}3. Start the Server
npx ai-edge serveThe server starts on port 25789 by default. If busy, it auto-selects the next available port.
With --skip-prompts, the server starts immediately without prompts.
Optional Runtime Tuning
AI_EDGE_UPSTREAM_TIMEOUT_MS- Upstream request timeout in milliseconds (default:45000).
CLI Commands
init
Initialize a new model.jsonc configuration:
npx ai-edge init
npx ai-edge init --skip-promptsserve
Start the LLM Proxy server:
npx ai-edge serve
npx ai-edge serve --skip-promptsWhat it does:
- Loads configuration from
model.jsonc - Starts on port 25789 (auto-selects next available if busy)
- Shows server configuration details
- Press Ctrl+C to stop
Options
--skip-prompts- Skip all interactive prompts and use defaults
Configuration Format
JSONC (JSON with Comments)
The configuration uses JSONC format for better user experience:
- ✅ Comments preserved for documentation
- ✅ Inline field explanations
- ✅ Example values shown
- ✅ Optional fields documented
Dynamic Schema References
The generated schema reference always points to the latest version:
"$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json"When installed as an NPM package and linked locally, it references:
"$schema": "./node_modules/ai-edge/schema.json"Models Configuration
Each backend configuration includes:
{
// Unique identifier for this backend
"id": "primary-openai",
// Display name
"name": "Primary Instance",
// Models this backend supports
"models": ["gpt-3.5-turbo", "gpt-4"],
// Track rate limits per instance
"individualLimit": true,
// API endpoint (OpenAI-compatible)
"baseUrl": "https://api.openai.com/v1",
// API authentication key
"apiKey": "sk-your-api-key-here",
// Rate limiting per backend
"rateLimit": {
"tokensPerMinute": 90000,
"requestsPerMinute": 3500,
"requestsPerDay": 200000,
},
}Code Interpreter (Daytona)
Configure a Daytona sandbox to handle OpenAI code_interpreter and Anthropic code_execution tool requests:
{
"tools": {
"codeInterpreter": {
"type": "daytona",
"apiKey": "${DAYTONA_API_KEY}",
"language": "python",
"timeout": 300,
"target": "us",
},
},
}code_interpreter is accepted as an alias for codeInterpreter.
Web Search Performance Tuning
Configure built-in web search defaults to keep latency bounded:
{
"tools": {
"webSearch": {
"defaults": {
"maxResults": 6,
"expandQueries": true,
"maxExpandedQueries": 2,
"parallelQueries": 2,
"softTimeoutMs": 8000,
"providerTimeoutMs": 7000
},
"tools": [
{
"type": "tavily",
"apiKey": "${TAVILY_API_KEY}",
"timeoutMs": 7000,
"options": {
"searchDepth": "basic",
"includeRawContent": false,
"includeAnswer": true,
"maxResults": 6
}
}
]
}
}
}Recommended for faster responses:
- Keep
maxExpandedQueriesat 1-2 for most prompts. - Use
searchDepth: "basic"and disable raw content unless needed. - Set provider timeout lower than total
softTimeoutMsso partial results can return sooner.
Tool Search Compatibility
- OpenAI
tool_search+defer_loadingpassthrough is supported onPOST /v1/responses. - For
POST /v1/chat/completions,tool_searchtools anddefer_loadingflags are removed for upstream compatibility. - Anthropic
tool_search_tool_*uses a proxy-side compatibility implementation: server search tools are removed, deferred tools are eagerly exposed as normal callable tools, and usage includesserver_tool_use.tool_search_requests.
API Endpoints
Get Cache Status
curl http://localhost:25789/Get Statistics
curl http://localhost:25789/statsAuto-loaded on server startup with current rate limit usage.
OpenAI Compatible Endpoints
POST /v1/chat/completions- Chat completionsPOST /v1/completions- Text completionsPOST /v1/embeddings- Embeddings (embeddings: trueproviders are reserved for this endpoint and excluded from chat/completions/responses routing)GET /v1/models- List available models
Example:
curl -X POST http://localhost:25789/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ai-edge" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello!"}]
}'Development
Run in Development Mode
bun run devType Checking
bun run typesRunning Tests
bun run testifyLink Package Locally
bun linkRegister locally for use in other projects:
bun link ai-edgeBuild CLI for Distribution
bun run buildCreates dist/cli.js (standalone CLI executable).
Features
✅ OpenAI Compatible - Drop-in replacement for OpenAI API
✅ Multi-Backend - Load balance across multiple providers
✅ Rate Limiting - Granular token/request limits
✅ Caching - Memory or Redis-backed state
✅ Auto-Load Stats - Statistics loaded on server startup
✅ Modular CLI - Separated commands and utilities
✅ Dynamic Templates - Always uses latest schema from GitHub
✅ JSONC Configuration - Comments for better documentation
✅ Bun Linked - Local package development support
✅ TypeScript - Full type safety
Error Handling
All errors are returned in OpenAI-compatible format:
- 400 - Invalid request (missing model, invalid parameters)
- 429 - Rate limit exceeded (tries next backend)
- 429 / 5xx - Affected provider+model pair is put on a temporary 30s cooldown before being considered again
- 502 - All backends failed
- 503 - No backend configured
Stack traces are suppressed, only user-friendly messages shown.
CLI Architecture
For detailed information about the CLI structure and development, see CLI_ARCHITECTURE.md.
Production Deployment
1. Build the CLI
bun run build2. Use in Projects
npx ai-edge init
npx ai-edge serveEnvironment Variables
Use ${VAR_NAME} in your model.jsonc to reference environment variables:
{
"models": {
"openai": [
{
"apiKey": "${OPENAI_API_KEY}",
"baseUrl": "https://api.openai.com/v1",
"models": ["gpt-3.5-turbo"],
},
],
},
}Architecture
- Hono - Fast, lightweight HTTP server
- Zod - Type-safe configuration validation
- @clack/prompts - Beautiful interactive CLI
- Bun - Fast JavaScript runtime
- Multi-backend - Load balancing across providers
- Rate Limiting - Granular usage tracking
- Caching - Pluggable adapters (memory/Redis)
License
MIT
Created with ❤️ by dotlab HQ
