ai-edge

v1.0.21, published 3 days ago

Local LLM routing server with OpenAI compatibility

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainers: dotlabs-admin

Keywords: llm, proxy, openai, api

ai-edge

A local LLM API proxy server with rate limiting, caching, and multi-backend support. Works as an OpenAI-compatible API endpoint.

Quick Start

1. Initialize Configuration

Create a new model.jsonc configuration file:

npx ai-edge init

Or skip prompts for automation:

npx ai-edge init --skip-prompts

2. Configure Your Models

Edit the generated model.jsonc and add your LLM providers:

{
  "$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json",
  "state-adapter": "memory",
  "models": {
    "openai": [
      {
        "id": "primary-openai",
        "name": "Primary Instance",
        "models": ["gpt-3.5-turbo", "gpt-4"],
        "individualLimit": true,
        "baseUrl": "https://api.openai.com/v1",
        "apiKey": "sk-your-api-key-here",
        "rateLimit": {
          "tokensPerMinute": 90000,
          "requestsPerMinute": 3500,
          "requestsPerDay": 200000,
        },
      },
    ],
  },
}

3. Start the Server

npx ai-edge serve

The server starts on port 25789 by default. If that port is busy, it auto-selects the next available port.

With --skip-prompts, the server starts immediately without prompts.
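
The port fallback described above can be pictured with a small sketch. This is not ai-edge's actual implementation: `pickPort`, the `isBusy` callback, and the `maxAttempts` bound are hypothetical names introduced for illustration, and only the default port 25789 comes from this README.

```typescript
// Hypothetical helper: illustrates "next available port", not the real ai-edge code.
const DEFAULT_PORT = 25789; // default port stated in this README

function pickPort(
  isBusy: (port: number) => boolean, // assumed probe supplied by the caller
  start: number = DEFAULT_PORT,
  maxAttempts: number = 100, // assumed upper bound on how far to scan
): number {
  for (let offset = 0; offset < maxAttempts; offset++) {
    const candidate = start + offset;
    if (candidate > 65535) break; // ran past the valid TCP port range
    if (!isBusy(candidate)) return candidate;
  }
  throw new Error("No free port found starting at " + start);
}
```

Passing the probe in as a callback keeps the scan logic testable without opening real sockets.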

Optional Runtime Tuning

AI_EDGE_UPSTREAM_TIMEOUT_MS - Upstream request timeout in milliseconds (default: 45000).
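
As a rough illustration of how a setting like this is typically resolved, the sketch below parses the raw value and falls back to the documented default of 45000. The function name `resolveUpstreamTimeout` and the fallback policy for invalid input are assumptions, not confirmed ai-edge behavior.

```typescript
const DEFAULT_UPSTREAM_TIMEOUT_MS = 45000; // default stated above

// Hypothetical helper: turns the raw environment value into a usable timeout.
function resolveUpstreamTimeout(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_UPSTREAM_TIMEOUT_MS;
  const parsed = Number(raw);
  // Assumed policy: non-numeric or non-positive input falls back to the default.
  if (!isFinite(parsed) || parsed <= 0) return DEFAULT_UPSTREAM_TIMEOUT_MS;
  return Math.floor(parsed);
}
```

In Node or Bun the argument would be `process.env.AI_EDGE_UPSTREAM_TIMEOUT_MS`, set for example by prefixing the serve command with `AI_EDGE_UPSTREAM_TIMEOUT_MS=60000`.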

CLI Commands

`init`

Initialize a new model.jsonc configuration:

npx ai-edge init
npx ai-edge init --skip-prompts

`serve`

Start the LLM Proxy server:

npx ai-edge serve
npx ai-edge serve --skip-prompts

What it does:

Loads configuration from model.jsonc
Starts on port 25789 (auto-selects the next available port if busy)
Prints the server configuration details
Runs until you press Ctrl+C

Options

--skip-prompts - Skip all interactive prompts and use defaults

Configuration Format

JSONC (JSON with Comments)

The configuration uses JSONC so that the file can carry its own documentation:

✅ Comments preserved for documentation
✅ Inline field explanations
✅ Example values shown
✅ Optional fields documented
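
Plain `JSON.parse` rejects comments and trailing commas, both of which appear in the examples in this README, so a JSONC file needs a pre-processing step or a dedicated parser. The sketch below shows one way to do the pre-processing by hand; `stripJsonc` and `skipTrivia` are hypothetical names, and this is not necessarily how ai-edge reads model.jsonc. For real projects a maintained parser such as the `jsonc-parser` package is likely the safer choice.

```typescript
// Skips whitespace and comments starting at `from`; returns the next significant index.
function skipTrivia(source: string, from: number): number {
  let i = from;
  while (i < source.length) {
    const ch = source[i];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") {
      i++;
    } else if (ch === "/" && source[i + 1] === "/") {
      while (i < source.length && source[i] !== "\n") i++; // line comment
    } else if (ch === "/" && source[i + 1] === "*") {
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2; // block comment
    } else {
      break;
    }
  }
  return i;
}

// Sketch only: removes comments and trailing commas so JSON.parse can read the result.
function stripJsonc(source: string): string {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === '"') {
      // Copy the whole string literal verbatim, honoring backslash escapes.
      const start = i;
      i++;
      while (i < source.length && source[i] !== '"') {
        i += source[i] === "\\" ? 2 : 1;
      }
      i++; // step past the closing quote
      out += source.slice(start, i);
    } else if (ch === "/" && (source[i + 1] === "/" || source[i + 1] === "*")) {
      i = skipTrivia(source, i); // drop the comment and any whitespace after it
    } else if (ch === ",") {
      const closer = source[skipTrivia(source, i + 1)];
      if (closer !== "}" && closer !== "]") out += ch; // keep non-trailing commas
      i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```

Handling string literals first is what keeps values like `"https://api.openai.com/v1"` from being mistaken for a `//` comment.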

Dynamic Schema References

The generated schema reference always points to the latest version:

"$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json"

When ai-edge is installed as an npm package and linked locally, the reference is:

"$schema": "./node_modules/ai-edge/schema.json"

Models Configuration

Each backend configuration includes:

{
  // Unique identifier for this backend
  "id": "primary-openai",
  // Display name
  "name": "Primary Instance",
  // Models this backend supports
  "models": ["gpt-3.5-turbo", "gpt-4"],
  // Track rate limits per instance
  "individualLimit": true,
  // API endpoint (OpenAI-compatible)
  "baseUrl": "https://api.openai.com/v1",
  // API authentication key
  "apiKey": "sk-your-api-key-here",
  // Rate limiting per backend
  "rateLimit": {
    "tokensPerMinute": 90000,
    "requestsPerMinute": 3500,
    "requestsPerDay": 200000,
  },
}
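
To make the three `rateLimit` fields concrete, here is a minimal fixed-window counter that enforces them. The field names mirror the configuration above, but the algorithm, the `Usage` shape, and the function names are assumptions; this README does not say which limiting strategy ai-edge actually uses.

```typescript
// Field names mirror the "rateLimit" block above; the algorithm is an assumption.
interface RateLimit {
  tokensPerMinute: number;
  requestsPerMinute: number;
  requestsPerDay: number;
}

interface Usage {
  minuteStart: number; // epoch ms at which the current minute window began
  dayStart: number; // epoch ms at which the current day window began
  tokensThisMinute: number;
  requestsThisMinute: number;
  requestsToday: number;
}

const MINUTE_MS = 60000;
const DAY_MS = 86400000;

function newUsage(now: number): Usage {
  return { minuteStart: now, dayStart: now, tokensThisMinute: 0, requestsThisMinute: 0, requestsToday: 0 };
}

// Returns true and records the request if it fits within every limit.
function tryConsume(limit: RateLimit, usage: Usage, tokens: number, now: number): boolean {
  if (now - usage.minuteStart >= MINUTE_MS) {
    usage.minuteStart = now; // start a fresh minute window
    usage.tokensThisMinute = 0;
    usage.requestsThisMinute = 0;
  }
  if (now - usage.dayStart >= DAY_MS) {
    usage.dayStart = now; // start a fresh day window
    usage.requestsToday = 0;
  }
  const fits =
    usage.requestsThisMinute + 1 <= limit.requestsPerMinute &&
    usage.requestsToday + 1 <= limit.requestsPerDay &&
    usage.tokensThisMinute + tokens <= limit.tokensPerMinute;
  if (!fits) return false; // caller would move on to another backend
  usage.requestsThisMinute += 1;
  usage.requestsToday += 1;
  usage.tokensThisMinute += tokens;
  return true;
}
```

With `individualLimit` set to true, one would presumably keep a separate `Usage` record per backend instance rather than sharing one across instances.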

Code Interpreter (Daytona)

Configure a Daytona sandbox to handle OpenAI code_interpreter and Anthropic code_execution tool requests:

{
  "tools": {
    "codeInterpreter": {
      "type": "daytona",
      "apiKey": "${DAYTONA_API_KEY}",
      "language": "python",
      "timeout": 300,
      "target": "us",
    },
  },
}

code_interpreter is accepted as an alias for codeInterpreter.

Web Search Performance Tuning

Configure built-in web search defaults to keep latency bounded:

{
  "tools": {
    "webSearch": {
      "defaults": {
        "maxResults": 6,
        "expandQueries": true,
        "maxExpandedQueries": 2,
        "parallelQueries": 2,
        "softTimeoutMs": 8000,
        "providerTimeoutMs": 7000
      },
      "tools": [
        {
          "type": "tavily",
          "apiKey": "${TAVILY_API_KEY}",
          "timeoutMs": 7000,
          "options": {
            "searchDepth": "basic",
            "includeRawContent": false,
            "includeAnswer": true,
            "maxResults": 6
          }
        }
      ]
    }
  }
}

Recommended for faster responses:

Keep maxExpandedQueries at 1-2 for most prompts.
Use searchDepth: "basic" and disable raw content unless needed.
Set providerTimeoutMs lower than softTimeoutMs so partial results can return sooner.
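
The recommendations above can be expressed as a small lint-style check over the `defaults` block. This is only a sketch: `tuningWarnings` is a hypothetical helper, ai-edge is not known to ship such a check, and the thresholds simply restate the guidance in this section.

```typescript
// Field names mirror the "defaults" block above.
interface WebSearchDefaults {
  maxResults: number;
  expandQueries: boolean;
  maxExpandedQueries: number;
  parallelQueries: number;
  softTimeoutMs: number;
  providerTimeoutMs: number;
}

// Hypothetical helper: returns one message per recommendation that is not followed.
function tuningWarnings(defaults: WebSearchDefaults): string[] {
  const warnings: string[] = [];
  if (defaults.expandQueries && defaults.maxExpandedQueries > 2) {
    warnings.push("maxExpandedQueries above 2 adds latency for most prompts");
  }
  if (defaults.providerTimeoutMs >= defaults.softTimeoutMs) {
    warnings.push("providerTimeoutMs should be lower than softTimeoutMs");
  }
  return warnings;
}
```

The example configuration in this section (7000 ms provider timeout against an 8000 ms soft timeout, two expanded queries) produces no warnings under these rules.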

Tool Search Compatibility

OpenAI tool_search + defer_loading passthrough is supported on POST /v1/responses.
For POST /v1/chat/completions, tool_search tools and defer_loading flags are removed for upstream compatibility.
Anthropic tool_search_tool_* uses a proxy-side compatibility implementation: server search tools are removed, deferred tools are eagerly exposed as normal callable tools, and usage includes server_tool_use.tool_search_requests.
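
The chat/completions behavior described above (dropping tool_search tools and defer_loading flags) might look roughly like the following. The tool shape is an assumption made for illustration: the sketch treats any entry whose `type` is `"tool_search"` as a search tool and expects `defer_loading` as a top-level field on each tool. `stripToolSearch` is a hypothetical name, not an ai-edge export.

```typescript
// Loose tool shape; only the fields this sketch touches are typed.
interface ToolDef {
  type: string;
  defer_loading?: boolean;
  [key: string]: unknown;
}

// Hypothetical helper: returns a cleaned copy and leaves the input untouched.
function stripToolSearch(tools: ToolDef[]): ToolDef[] {
  const result: ToolDef[] = [];
  for (const tool of tools) {
    if (tool.type === "tool_search") continue; // assumed marker for search tools
    const copy: ToolDef = { type: tool.type };
    for (const key in tool) {
      if (key !== "defer_loading") copy[key] = tool[key];
    }
    result.push(copy);
  }
  return result;
}
```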

API Endpoints

Get Cache Status

curl http://localhost:25789/

Get Statistics

curl http://localhost:25789/stats

Statistics are loaded automatically on server startup and include current rate limit usage.

OpenAI Compatible Endpoints

POST /v1/chat/completions - Chat completions
POST /v1/completions - Text completions
POST /v1/embeddings - Embeddings (embeddings: true providers are reserved for this endpoint and excluded from chat/completions/responses routing)
GET /v1/models - List available models

Example:

curl -X POST http://localhost:25789/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ai-edge" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
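
The same request can be assembled in TypeScript. The sketch below only builds the request description so that it stays runnable without a live server; `buildChatRequest` and the `HttpRequest` shape are hypothetical, while the URL, headers, and body fields are taken from the curl example above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper mirroring the curl example; baseUrl assumes the default port.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  baseUrl: string = "http://localhost:25789",
): HttpRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions", // tolerate a trailing slash
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer ai-edge",
    },
    body: JSON.stringify({ model: model, messages: messages }),
  };
}
```

In a runtime with a global `fetch` (Bun, recent Node), the result can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.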

Development

Run in Development Mode

bun run dev

Type Checking

bun run types

Running Tests

bun run testify

Link Package Locally

bun link

bun link ai-edge

Build CLI for Distribution

bun run build

Creates dist/cli.js (standalone CLI executable).

Features

✅ OpenAI Compatible - Drop-in replacement for OpenAI API
✅ Multi-Backend - Load balance across multiple providers
✅ Rate Limiting - Granular token/request limits
✅ Caching - Memory or Redis-backed state
✅ Auto-Load Stats - Statistics loaded on server startup
✅ Modular CLI - Separated commands and utilities
✅ Dynamic Templates - Always uses latest schema from GitHub
✅ JSONC Configuration - Comments for better documentation
✅ Bun Linked - Local package development support
✅ TypeScript - Full type safety

Error Handling

All errors are returned in OpenAI-compatible format:

400 - Invalid request (missing model, invalid parameters)
429 - Rate limit exceeded (the proxy tries the next backend)
429 / 5xx - The affected provider+model pair is placed on a temporary 30s cooldown before it is considered again
502 - All backends failed
503 - No backend configured

Stack traces are suppressed; only user-friendly messages are shown.
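
The failover and cooldown rules above can be sketched as follows. Only the 30s duration and the 429 / 5xx trigger come from this README; the `CooldownTracker` class, the key format, and the first-available selection order are assumptions and may differ from what ai-edge does internally.

```typescript
const COOLDOWN_MS = 30000; // 30s cooldown stated above

// 429 and 5xx responses trigger a cooldown, per the list above.
function shouldCooldown(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Hypothetical tracker keyed by provider+model pair.
class CooldownTracker {
  private until: Record<string, number> = {};

  markFailed(provider: string, model: string, now: number): void {
    this.until[provider + "::" + model] = now + COOLDOWN_MS;
  }

  isAvailable(provider: string, model: string, now: number): boolean {
    const expiry = this.until[provider + "::" + model];
    return expiry === undefined || now >= expiry;
  }
}

// Returns the first backend not cooling down for this model, or undefined if none is.
function pickBackend(
  backends: string[],
  model: string,
  tracker: CooldownTracker,
  now: number,
): string | undefined {
  for (const id of backends) {
    if (tracker.isAvailable(id, model, now)) return id;
  }
  return undefined; // caller would answer with 502 (all backends failed)
}
```

Keying the cooldown on the provider+model pair means a backend that is throttled for one model can keep serving its other models.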

CLI Architecture

For detailed information about the CLI structure and development, see CLI_ARCHITECTURE.md.

Production Deployment

1. Build the CLI

bun run build

2. Use in Projects

npx ai-edge init
npx ai-edge serve

Environment Variables

Use ${VAR_NAME} in your model.jsonc to reference environment variables:

{
  "models": {
    "openai": [
      {
        "apiKey": "${OPENAI_API_KEY}",
        "baseUrl": "https://api.openai.com/v1",
        "models": ["gpt-3.5-turbo"],
      },
    ],
  },
}
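
A placeholder expansion of this kind can be sketched in a few lines. `expandEnv` is a hypothetical helper, and throwing on a missing variable is an assumed policy; ai-edge may instead leave the placeholder in place or substitute an empty string.

```typescript
// Hypothetical helper: replaces ${VAR_NAME} placeholders using the supplied environment map.
function expandEnv(text: string, env: Record<string, string | undefined>): string {
  return text.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = env[name];
    if (value === undefined) {
      // Assumed policy: fail loudly rather than send an empty API key upstream.
      throw new Error("Missing environment variable: " + name);
    }
    return value;
  });
}
```

In Node or Bun, `process.env` can be passed as the second argument. Using a replacer function rather than a replacement string avoids special handling of `$` sequences inside the substituted value.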

Architecture

Hono - Fast, lightweight HTTP server
Zod - Type-safe configuration validation
@clack/prompts - Beautiful interactive CLI
Bun - Fast JavaScript runtime
Multi-backend - Load balancing across providers
Rate Limiting - Granular usage tracking
Caching - Pluggable adapters (memory/Redis)

License

MIT

Created with ❤️ by dotlab HQ
