ai-edge

v1.0.21

Local LLM routing server with OpenAI compatibility

A local LLM API proxy server with rate limiting, caching, and multi-backend support. Works as an OpenAI-compatible API endpoint.

Quick Start

1. Initialize Configuration

Create a new model.jsonc configuration file:

npx ai-edge init

Or skip prompts for automation:

npx ai-edge init --skip-prompts

2. Configure Your Models

Edit the generated model.jsonc and add your LLM providers:

{
  "$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json",
  "state-adapter": "memory",
  "models": {
    "openai": [
      {
        "id": "primary-openai",
        "name": "Primary Instance",
        "models": ["gpt-3.5-turbo", "gpt-4"],
        "individualLimit": true,
        "baseUrl": "https://api.openai.com/v1",
        "apiKey": "sk-your-api-key-here",
        "rateLimit": {
          "tokensPerMinute": 90000,
          "requestsPerMinute": 3500,
          "requestsPerDay": 200000,
        },
      },
    ],
  },
}

3. Start the Server

npx ai-edge serve

The server starts on port 25789 by default. If that port is busy, it auto-selects the next available port.

With --skip-prompts, the server starts immediately without prompts.

Optional Runtime Tuning

  • AI_EDGE_UPSTREAM_TIMEOUT_MS - Upstream request timeout in milliseconds (default: 45000).
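As a rough illustration of how a timeout variable like this is typically consumed, the sketch below reads the value and falls back to the documented default of 45000 ms. The helper name resolveUpstreamTimeoutMs is hypothetical, and falling back on invalid input is an assumption; the README does not say how ai-edge treats malformed values.

```typescript
// Hypothetical helper, not taken from the ai-edge source.
const DEFAULT_UPSTREAM_TIMEOUT_MS = 45000; // default documented above

function resolveUpstreamTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["AI_EDGE_UPSTREAM_TIMEOUT_MS"];
  // Unset or blank: use the documented default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_UPSTREAM_TIMEOUT_MS;
  const parsed = Number(raw);
  // Assumption: non-numeric or non-positive values also fall back to the default.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_UPSTREAM_TIMEOUT_MS;
  return Math.floor(parsed);
}
```

In a Node or Bun process this would be called as resolveUpstreamTimeoutMs(process.env).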

CLI Commands

init

Initialize a new model.jsonc configuration:

npx ai-edge init
npx ai-edge init --skip-prompts

serve

Start the LLM Proxy server:

npx ai-edge serve
npx ai-edge serve --skip-prompts

What it does:

  • Loads configuration from model.jsonc
  • Starts on port 25789 (auto-selects next available if busy)
  • Shows server configuration details
  • Runs until you press Ctrl+C

Options

  • --skip-prompts - Skip all interactive prompts and use defaults

Configuration Format

JSONC (JSON with Comments)

The configuration uses JSONC format for better user experience:

  • ✅ Comments preserved for documentation
  • ✅ Inline field explanations
  • ✅ Example values shown
  • ✅ Optional fields documented
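Plain JSON.parse rejects comments and trailing commas, so a JSONC file has to be preprocessed before it can be read. The sketch below shows one minimal way to do that. The function names are hypothetical, and the parser ai-edge actually uses is not stated in this README.

```typescript
// Remove // and /* */ comments while leaving string contents untouched.
function stripComments(text: string): string {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    const next = text.charAt(i + 1);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        out += next; // keep the escaped character verbatim
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip up to (but not including) the newline.
      while (i < text.length && text.charAt(i) !== "\n") i += 1;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unterminated).
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Drop commas that are followed only by whitespace and a closing } or ].
function stripTrailingCommas(text: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i += 1) {
    const ch = text.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\") {
        i += 1;
        out += text.charAt(i);
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') inString = true;
    if (ch === ",") {
      let j = i + 1;
      while (j < text.length && /\s/.test(text.charAt(j))) j += 1;
      const closer = text.charAt(j);
      if (closer === "}" || closer === "]") continue; // skip the trailing comma
    }
    out += ch;
  }
  return out;
}

function parseJsonc(text: string): unknown {
  return JSON.parse(stripTrailingCommas(stripComments(text)));
}
```

Comments are removed first so that the trailing-comma pass only has to skip whitespace when it looks ahead for a closing bracket.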

Dynamic Schema References

The generated schema reference always points to the latest version:

"$schema": "https://raw.githubusercontent.com/dotlab-hq/ai-edge/refs/heads/main/schema.json"

When ai-edge is installed as an npm package and linked locally, the reference points to the bundled copy instead:

"$schema": "./node_modules/ai-edge/schema.json"

Models Configuration

Each backend configuration includes:

{
  // Unique identifier for this backend
  "id": "primary-openai",
  // Display name
  "name": "Primary Instance",
  // Models this backend supports
  "models": ["gpt-3.5-turbo", "gpt-4"],
  // Track rate limits per instance
  "individualLimit": true,
  // API endpoint (OpenAI-compatible)
  "baseUrl": "https://api.openai.com/v1",
  // API authentication key
  "apiKey": "sk-your-api-key-here",
  // Rate limiting per backend
  "rateLimit": {
    "tokensPerMinute": 90000,
    "requestsPerMinute": 3500,
    "requestsPerDay": 200000,
  },
}
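To make the three rateLimit fields concrete, here is a minimal sketch of a fixed-window limiter that admits a request only when it fits every limit. BackendLimiter and its method names are hypothetical, and the fixed-window strategy is an assumption; the algorithm ai-edge actually uses is not described in this README.

```typescript
interface RateLimit {
  tokensPerMinute: number;
  requestsPerMinute: number;
  requestsPerDay: number;
}

interface UsageWindow {
  start: number; // window start, in milliseconds
  used: number;  // units consumed in the current window
}

const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

// Hypothetical fixed-window limiter; the clock is passed in so it is easy to test.
class BackendLimiter {
  private readonly limit: RateLimit;
  private minuteRequests: UsageWindow = { start: 0, used: 0 };
  private minuteTokens: UsageWindow = { start: 0, used: 0 };
  private dayRequests: UsageWindow = { start: 0, used: 0 };

  constructor(limit: RateLimit) {
    this.limit = limit;
  }

  // Returns true and records usage only if the request fits within every limit.
  tryAcquire(tokens: number, nowMs: number): boolean {
    this.roll(this.minuteRequests, MINUTE_MS, nowMs);
    this.roll(this.minuteTokens, MINUTE_MS, nowMs);
    this.roll(this.dayRequests, DAY_MS, nowMs);
    const fits =
      this.minuteRequests.used + 1 <= this.limit.requestsPerMinute &&
      this.minuteTokens.used + tokens <= this.limit.tokensPerMinute &&
      this.dayRequests.used + 1 <= this.limit.requestsPerDay;
    if (!fits) return false;
    this.minuteRequests.used += 1;
    this.minuteTokens.used += tokens;
    this.dayRequests.used += 1;
    return true;
  }

  // Start a fresh window once the current one has expired.
  private roll(w: UsageWindow, sizeMs: number, nowMs: number): void {
    if (nowMs - w.start >= sizeMs) {
      w.start = nowMs;
      w.used = 0;
    }
  }
}
```

With individualLimit set to true, one such limiter per backend instance would be the natural fit; that mapping is likewise an assumption.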

Code Interpreter (Daytona)

Configure a Daytona sandbox to handle OpenAI code_interpreter and Anthropic code_execution tool requests:

{
  "tools": {
    "codeInterpreter": {
      "type": "daytona",
      "apiKey": "${DAYTONA_API_KEY}",
      "language": "python",
      "timeout": 300,
      "target": "us",
    },
  },
}

code_interpreter is accepted as an alias for codeInterpreter.

Web Search Performance Tuning

Configure built-in web search defaults to keep latency bounded:

{
  "tools": {
    "webSearch": {
      "defaults": {
        "maxResults": 6,
        "expandQueries": true,
        "maxExpandedQueries": 2,
        "parallelQueries": 2,
        "softTimeoutMs": 8000,
        "providerTimeoutMs": 7000
      },
      "tools": [
        {
          "type": "tavily",
          "apiKey": "${TAVILY_API_KEY}",
          "timeoutMs": 7000,
          "options": {
            "searchDepth": "basic",
            "includeRawContent": false,
            "includeAnswer": true,
            "maxResults": 6
          }
        }
      ]
    }
  }
}

Recommended for faster responses:

  • Keep maxExpandedQueries at 1-2 for most prompts.
  • Use searchDepth: "basic" and disable raw content unless needed.
  • Set providerTimeoutMs (and each tool's timeoutMs) lower than softTimeoutMs so partial results can return sooner.
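The recommendations above can be turned into a small sanity check. The sketch below is a hypothetical lint-style helper, not something ai-edge ships; it only inspects the fields named in the example configuration.

```typescript
interface WebSearchDefaults {
  maxExpandedQueries: number;
  softTimeoutMs: number;
  providerTimeoutMs: number;
}

// Hypothetical helper: returns one warning per recommendation the defaults violate.
function checkWebSearchDefaults(defaults: WebSearchDefaults): string[] {
  const warnings: string[] = [];
  if (defaults.maxExpandedQueries > 2) {
    warnings.push("maxExpandedQueries above 2 may add latency");
  }
  if (defaults.providerTimeoutMs >= defaults.softTimeoutMs) {
    warnings.push("providerTimeoutMs should be lower than softTimeoutMs");
  }
  return warnings;
}
```

The values from the example configuration (2, 8000, 7000) produce no warnings.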

Tool Search Compatibility

  • OpenAI tool_search + defer_loading passthrough is supported on POST /v1/responses.
  • For POST /v1/chat/completions, tool_search tools and defer_loading flags are removed for upstream compatibility.
  • Anthropic tool_search_tool_* uses a proxy-side compatibility implementation: server search tools are removed, deferred tools are eagerly exposed as normal callable tools, and usage includes server_tool_use.tool_search_requests.
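The chat completions behavior described above amounts to filtering the tools array before forwarding the request. The following is a minimal sketch of that idea. The function name is hypothetical, and it assumes that tool_search appears as a tool's type and that defer_loading is a top-level flag on each tool; the real payload shapes may differ.

```typescript
// Assumed tool shape, loosely typed for illustration only.
type ToolSpec = { type: string; defer_loading?: boolean; [key: string]: unknown };

// Hypothetical helper: drop tool_search tools and strip defer_loading flags.
function sanitizeToolsForChatCompletions(tools: ToolSpec[]): ToolSpec[] {
  return tools
    .filter((tool) => tool.type !== "tool_search")
    .map((tool) => {
      const copy: ToolSpec = { ...tool }; // shallow copy so the input is not mutated
      delete copy.defer_loading;
      return copy;
    });
}
```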

API Endpoints

Get Cache Status

curl http://localhost:25789/

Get Statistics

curl http://localhost:25789/stats

Statistics are loaded automatically on server startup and include current rate limit usage.

OpenAI Compatible Endpoints

  • POST /v1/chat/completions - Chat completions
  • POST /v1/completions - Text completions
  • POST /v1/embeddings - Embeddings (embeddings: true providers are reserved for this endpoint and excluded from chat/completions/responses routing)
  • GET /v1/models - List available models

Example:

curl -X POST http://localhost:25789/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ai-edge" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
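The same request can be issued from TypeScript with the standard fetch API. The sketch below builds the request separately from sending it; buildChatRequest is a hypothetical helper, and the bearer token simply mirrors the placeholder used in the curl example.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds the URL and fetch options for a chat completion.
function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
  return {
    // Trim trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Same placeholder token as the curl example.
        Authorization: "Bearer ai-edge",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage (requires a running ai-edge server):
// const { url, init } = buildChatRequest("http://localhost:25789", "gpt-3.5-turbo", [
//   { role: "user", content: "Hello!" },
// ]);
// const response = await fetch(url, init);
// console.log(await response.json());
```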

Development

Run in Development Mode

bun run dev

Type Checking

bun run types

Running Tests

bun run testify

Link Package Locally

bun link

The command above registers the package as linkable. Then, from another project, link it in:

bun link ai-edge

Build CLI for Distribution

bun run build

Creates dist/cli.js (standalone CLI executable).

Features

  • OpenAI Compatible - Drop-in replacement for OpenAI API
  • Multi-Backend - Load balance across multiple providers
  • Rate Limiting - Granular token/request limits
  • Caching - Memory or Redis-backed state
  • Auto-Load Stats - Statistics loaded on server startup
  • Modular CLI - Separated commands and utilities
  • Dynamic Templates - Always uses latest schema from GitHub
  • JSONC Configuration - Comments for better documentation
  • Bun Linked - Local package development support
  • TypeScript - Full type safety

Error Handling

All errors are returned in OpenAI-compatible format:

  • 400 - Invalid request (missing model, invalid parameters)
  • 429 - Rate limit exceeded (tries next backend)
  • 429 / 5xx - Affected provider+model pair is put on a temporary 30s cooldown before being considered again
  • 502 - All backends failed
  • 503 - No backend configured

Stack traces are suppressed; only user-friendly messages are shown.
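The cooldown rule for 429 and 5xx responses can be sketched as a small tracker keyed by provider and model. The class and function names below are hypothetical and the selection order (first available candidate) is an assumption; only the 30 second duration and the triggering status codes come from the list above.

```typescript
const COOLDOWN_MS = 30000; // 30s cooldown, as documented above

// Hypothetical tracker; the clock is passed in so behavior is deterministic.
class CooldownTracker {
  private readonly coolingUntil = new Map<string, number>();

  private key(provider: string, model: string): string {
    return `${provider}::${model}`;
  }

  // Record a failed upstream response; only 429 and 5xx start a cooldown.
  markFailure(provider: string, model: string, status: number, nowMs: number): void {
    if (status === 429 || status >= 500) {
      this.coolingUntil.set(this.key(provider, model), nowMs + COOLDOWN_MS);
    }
  }

  isAvailable(provider: string, model: string, nowMs: number): boolean {
    const until = this.coolingUntil.get(this.key(provider, model));
    return until === undefined || nowMs >= until;
  }
}

// Pick the first candidate that is not cooling down for this model.
function pickBackend(
  candidates: string[],
  model: string,
  tracker: CooldownTracker,
  nowMs: number,
): string | undefined {
  return candidates.find((provider) => tracker.isAvailable(provider, model, nowMs));
}
```

When pickBackend returns undefined, a caller would respond with one of the error statuses listed above.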

CLI Architecture

For detailed information about the CLI structure and development, see CLI_ARCHITECTURE.md.

Production Deployment

1. Build the CLI

bun run build

2. Use in Projects

npx ai-edge init
npx ai-edge serve

Environment Variables

Use ${VAR_NAME} in your model.jsonc to reference environment variables:

{
  "models": {
    "openai": [
      {
        "apiKey": "${OPENAI_API_KEY}",
        "baseUrl": "https://api.openai.com/v1",
        "models": ["gpt-3.5-turbo"],
      },
    ],
  },
}
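One way to picture the ${VAR_NAME} substitution is as a string replacement applied to each configuration value. The sketch below is illustrative only: expandEnv is a hypothetical name, and throwing on an unset variable is an assumption, since the README does not say how ai-edge handles missing variables.

```typescript
// Hypothetical helper: replace ${VAR_NAME} placeholders with values from env.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    // Assumption: an unset variable is treated as a configuration error.
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

Values without a placeholder, such as the baseUrl above, pass through unchanged.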

Architecture

  • Hono - Fast, lightweight HTTP server
  • Zod - Type-safe configuration validation
  • @clack/prompts - Beautiful interactive CLI
  • Bun - Fast JavaScript runtime
  • Multi-backend - Load balancing across providers
  • Rate Limiting - Granular usage tracking
  • Caching - Pluggable adapters (memory/Redis)

License

MIT


Created with ❤️ by dotlab HQ