@matkoson/llm-proxy
v2026.309.1
Morph model router proxy between LLM agents and CLIProxyAPI
llm-proxy
Intelligent model routing proxy for LLM API requests. Sits between your AI agents and CLIProxyAPI, using Morph's Model Router to classify task difficulty and dynamically select the most cost-effective model.
Overview
llm-proxy is the repo-owned Bun proxy layer that sits in front of CLIProxyAPI and applies request-shaping logic that belongs to this repository. It keeps lifecycle ownership, local observability, compatibility rewrites, and router-driven model selection in one place while routine-worker owns the outer service supervision.
The package also exposes the thin filesystem command surface that routine-worker and local operators use to start, stop, health-check, and inspect the proxy. Backend-owned capabilities stay pass-through, while proxy-owned behavior stays implemented here.
Why not @matkoson/lib
This repository cannot live in @matkoson/lib because the code is not generic shared infrastructure. It is a proxy product with a concrete runtime contract, provider-specific request semantics, local process lifecycle scripts, and CLIProxyAPI integration rules that only make sense in the llm-proxy domain.
Moving it into the global lib would blur ownership boundaries, force proxy-specific behavior into shared packages, and make the routine-worker integration less explicit. The correct split is shared facades in @matkoson/lib, proxy behavior in this repository, and backend account/orchestration behavior in CLIProxyAPI.
Architecture
```
Agent (Claude Code / Codex / etc.)
        │
        ▼
llm-proxy (:8317) ◄── Bun.serve() proxy + JSONL logging
        │
        ├── Morph Router API ◄── classify difficulty → pick model
        │
        ▼
CLIProxyAPI (:18317) ◄── OAuth token management + round-robin
        │
        ▼
Provider APIs (Anthropic, OpenAI, Google)
```

What it does: Intercepts LLM API requests, classifies task difficulty with Morph's router, rewrites the model field to a cost-appropriate model, and forwards the request to CLIProxyAPI. Simple tasks get cheaper models; complex tasks keep powerful models. Cockatiel adds resilience (retry, timeout, circuit breaker), and every request is logged as JSONL.
What it doesn't do: It only mutates upstream-facing request details that must change for compatibility: the model field and proxy-only auth headers. Request/response bodies, streaming, and all other headers are preserved.
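The rewrite rule above can be sketched as a pure function: only the model field changes, everything else in the body is preserved. This is an illustrative sketch, not the proxy's actual code; `rewriteModel` and the `selectModel` callback are hypothetical names standing in for the router call.

```typescript
// Sketch: rewrite only the `model` field, leave every other field untouched.
// `selectModel` is a hypothetical stand-in for the Morph router decision.
type ChatRequest = { model: string; [key: string]: unknown };

function rewriteModel(
  body: ChatRequest,
  selectModel: (original: string) => string,
): ChatRequest {
  // Spread-copy the body so the agent's original request object is never mutated.
  return { ...body, model: selectModel(body.model) };
}
```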
Setup
Prerequisites
- Bun v1.3+
- CLIProxyAPI running on :18317
- Morph API key for model routing
Install
```sh
bun install
```

Configure
Copy .env.example to .env and fill in your values:
```sh
cp .env.example .env
```

Required:
- MORPH_API_KEY — your Morph API key
Optional (with defaults):
- MORPH_ROUTER_MODE — balanced (default) or aggressive
- MORPH_ROUTER_ENABLED — true (default) or false for pass-through
- MORPH_ROUTER_PROVIDER — anthropic (default), openai, or gemini
- LLM_PROXY_PORT — proxy port (default: 8317)
- UPSTREAM_URL — CLIProxyAPI address (default: http://127.0.0.1:18317)
- UPSTREAM_API_KEY — explicit CLIProxyAPI key override; otherwise llm-proxy auto-discovers the first key from the local Quotio config
- JSONL_LOG_DIR — JSONL request log directory (default: ./logs)
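The defaults above can be sketched as a small config loader. The variable names and default values come from this README; the `ProxyConfig` shape and `loadConfig` function are assumptions for illustration, not the package's actual config module.

```typescript
// Sketch: resolve proxy settings from the environment with the documented defaults.
type ProxyConfig = {
  routerMode: string;
  routerEnabled: boolean;
  port: number;
  upstreamUrl: string;
};

function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  return {
    routerMode: env.MORPH_ROUTER_MODE ?? "balanced",
    // Any value other than the string "true" disables routing.
    routerEnabled: (env.MORPH_ROUTER_ENABLED ?? "true") === "true",
    port: Number(env.LLM_PROXY_PORT ?? "8317"),
    upstreamUrl: env.UPSTREAM_URL ?? "http://127.0.0.1:18317",
  };
}
```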
Usage
Lifecycle commands
routine-worker should interact with llm-proxy through the explicit filesystem lifecycle surface:
```sh
./bin/start
./bin/health
./bin/stop
```

./bin/start is a one-shot launcher: it starts the Bun server in the background, waits for /health, writes runtime state under ./runtime/, and then returns. It does not stay resident as a supervisor.
Process stdout/stderr are written under ./logs/.
./bin/health returns deterministic exit codes for automation:
- 0 — healthy
- 1 — stopped
- 2 — port conflict
- 3 — unhealthy response
- 4 — health endpoint unreachable
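A supervisor consuming these exit codes only needs a lookup. This sketch restates the code-to-meaning mapping from this README; the `describeHealth` helper itself is hypothetical.

```typescript
// Sketch: map ./bin/health exit codes to human-readable labels for automation.
const HEALTH_EXIT: Record<number, string> = {
  0: "healthy",
  1: "stopped",
  2: "port conflict",
  3: "unhealthy response",
  4: "health endpoint unreachable",
};

function describeHealth(code: number): string {
  // Fall back to a generic label for codes outside the documented set.
  return HEALTH_EXIT[code] ?? `unknown exit code ${code}`;
}
```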
Point your agents at the proxy
Agents should point at the llm-proxy port directly:
```sh
# Agents keep using the same URL
ANTHROPIC_BASE_URL=http://localhost:8317
```

Backend pass-through commands
Backend-owned features from CLIProxyAPI are exposed as thin bin/ wrappers instead of being re-implemented locally.
Static backend settings are not exposed as one-command-per-knob wrappers anymore. Repo-owned defaults live in ./JSON/cliproxyapi-base-settings.json, and direct CLIProxyAPI binary calls build their base argument list from that file before adding command-specific flags.
Examples:
```sh
./bin/model-list
./bin/stats-usage
./bin/version-latest
./bin/logs 100
./bin/logs-request-by-id 1234abcd
./bin/auth-files
./bin/auth-upload ./auth.json
./bin/api-keys
./bin/api-keys --patch-json '{"old":null,"new":"sk-example"}'
./bin/stats-usage-import ./stats-usage-export.json
./bin/provider-vertex-import ./service-account.json
./bin/provider-claude-login
./bin/provider-gemini-auth-url
```

Most management commands require the plaintext MANAGEMENT_PASSWORD. Public API-key-backed commands such as ./bin/model-list use the local CLIProxyAPI API key. If the backend config stores remote-management.secret-key as a bcrypt hash, the wrapper cannot derive the plaintext and you must export it explicitly.
The same pass-through layer also exposes backend mutation flows without re-implementing them locally, for example ./bin/stats-usage-import, ./bin/auth-upload, ./bin/auth-delete, ./bin/auth-file-status, ./bin/auth-fields, ./bin/provider-vertex-import, and ./bin/api-call.
Usage statistics are mandatory in this repository. There is intentionally no ./bin/stats-usage-enabled command, and any backend configuration managed through this repo must keep usage-statistics-enabled: true.
These backend settings are repo-owned defaults in ./JSON/cliproxyapi-base-settings.json, not public bin/ commands:
- debug
- logging-to-file
- logs-max-total-size-mb
- error-logs-max-files
- logs.request
- request-retry
- max-retry-interval
- proxy-url
- force-model-prefix
- ws-auth
- routing-strategy
- quota-exceeded.switch-project
- quota-exceeded.switch-preview-model
- oauth-excluded-models as an array of strings
- oauth-model-alias as an array of strings
- openai-compatibility
- ampcode.upstream-url
- ampcode.restrict-management-to-localhost
- ampcode.force-model-mappings
- ampcode.model-mappings
Raw backend config surfaces are intentionally not exposed as public commands. There is no ./bin/config, ./bin/config-yaml, or ./bin/config-yaml-set; repo-owned static backend defaults must come from ./JSON/cliproxyapi-base-settings.json.
Log preservation is mandatory in this repository. There is intentionally no ./bin/logs-clear command, ./bin/logs --delete is blocked, and the repo-owned JSON defaults keep automatic log cleanup disabled.
Mutation wrappers remain thin. They pass the method, query, body, or form data straight through to the backend route instead of re-implementing backend logic.
Supported request-shaping flags:
- --put-bool
- --put-int
- --put-string
- --put-json
- --patch-json
- --delete
- --query key=value
- --body-file /absolute/path
- --form key=value
- --form-file field=/absolute/path
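To illustrate the flag shape (not the wrappers' actual parser), repeated `--query key=value` flags can be collected into search parameters before the wrapper calls the backend route. The `collectQueryFlags` helper below is a hypothetical sketch.

```typescript
// Sketch: gather repeated `--query key=value` flags into URL search params,
// the way a thin pass-through wrapper might before hitting a backend route.
function collectQueryFlags(argv: string[]): URLSearchParams {
  const params = new URLSearchParams();
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--query" && i + 1 < argv.length) {
      // Split only on the first "=", so values may themselves contain "=".
      const [key, ...rest] = argv[++i].split("=");
      params.append(key, rest.join("="));
    }
  }
  return params;
}
```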
Repo-local helpers
These remain local to this repository and are not backend duplicates:
```sh
./bin/qa
./bin/logs-follow
./bin/logs-tail
./bin/stats-update
```

API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check (includes upstream reachability) |
| /router/status | GET | Router config and status |
| /v1/messages | POST | Anthropic API — routed |
| /v1/chat/completions | POST | OpenAI API — routed |
| /* | * | All other paths forwarded transparently |
Response Headers
When MORPH_ROUTER_EXPOSE_HEADERS=true (default), routed requests include:
| Header | Example | Description |
|--------|---------|-------------|
| x-llm-proxy-original-model | claude-sonnet-4-5 | Model requested by the agent |
| x-llm-proxy-selected-model | claude-sonnet-4-5-20250929 | Model after routing |
| x-llm-proxy-difficulty | easy | Classification: easy, medium, hard |
| x-llm-proxy-routing-ms | 563 | Routing latency in ms |
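A client observing routing decisions only needs to read these headers off the response. The header names below come from the table above; the `readRoutingHeaders` helper is a hypothetical sketch, and the `Headers` object in the test is synthetic rather than a live proxy response.

```typescript
// Sketch: extract the proxy's routing headers from a response for observability.
type RoutingInfo = {
  originalModel: string | null;
  selectedModel: string | null;
  difficulty: string | null;
};

function readRoutingHeaders(headers: Headers): RoutingInfo {
  // Headers.get() returns null when the proxy did not route the request
  // (e.g. MORPH_ROUTER_EXPOSE_HEADERS=false or a pass-through path).
  return {
    originalModel: headers.get("x-llm-proxy-original-model"),
    selectedModel: headers.get("x-llm-proxy-selected-model"),
    difficulty: headers.get("x-llm-proxy-difficulty"),
  };
}
```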
How Routing Works
- Agent sends a request with a model (e.g., claude-sonnet-4-5)
- Proxy extracts the last user message text
- Morph Router classifies the task difficulty and selects a model
- Proxy rewrites model in the body and forwards to CLIProxyAPI
- Response streams back to the agent transparently
Fallback: If Morph API is unreachable or returns an error, the original model passes through unchanged. Routing never blocks a request.
Resilience: Morph API calls are wrapped with retry (3 attempts, exponential backoff), timeout (2s), and circuit breaker. Upstream calls have retry (1), timeout (configurable), and circuit breaker (opens after 5 consecutive failures).
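The fallback rule can be sketched as a race between the classification call and a timeout, with any failure collapsing to the original model. This is an illustration of the stated behavior under assumed names (`routeWithFallback`, `classify`), not the package's actual Cockatiel-based implementation.

```typescript
// Sketch of the fallback rule: if classification fails or times out,
// keep the original model. `classify` is a hypothetical router call.
async function routeWithFallback(
  originalModel: string,
  classify: () => Promise<string>,
  timeoutMs = 2000,
): Promise<string> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("routing timeout")), timeoutMs),
  );
  try {
    // Whichever settles first wins; rejections fall through to the catch.
    return await Promise.race([classify(), timeout]);
  } catch {
    // Routing never blocks a request: pass the original model through.
    return originalModel;
  }
}
```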
Model Selection (Morph SDK v0.2.114)
| Provider | Easy | Hard |
|----------|------|------|
| Anthropic | claude-4.5-haiku | claude-4.5-sonnet |
| OpenAI | gpt-5-mini | gpt-5-high |
| Gemini | gemini-2.5-flash | gemini-2.5-pro |
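The table above can be restated as a lookup. The model names come from the table; the treatment of "medium" is an assumption (the table does not specify it, so this sketch conservatively maps it to the hard model), and `pickModel` is a hypothetical helper.

```typescript
// Sketch: the provider/difficulty table as a lookup.
// ASSUMPTION: "medium" is not in the table, so it falls through to the hard model.
const MODELS: Record<string, { easy: string; hard: string }> = {
  anthropic: { easy: "claude-4.5-haiku", hard: "claude-4.5-sonnet" },
  openai: { easy: "gpt-5-mini", hard: "gpt-5-high" },
  gemini: { easy: "gemini-2.5-flash", hard: "gemini-2.5-pro" },
};

function pickModel(provider: string, difficulty: "easy" | "medium" | "hard"): string {
  const pair = MODELS[provider];
  return difficulty === "easy" ? pair.easy : pair.hard;
}
```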
Custom Model Maps
Override routing decisions with modelMap in config:
```ts
router: {
  modelMap: {
    easy: 'claude-haiku-4-5-20251001',
    hard: 'claude-sonnet-4-5-20250929',
  }
}
```

Development
```sh
bun test              # run tests
bun run typecheck     # TypeScript check
```

Project Structure
```
bin/            # explicit lifecycle commands, backend pass-throughs, QA, observability helpers
src/
├── server/     # Bun.serve() proxy (request handling, forwarding, resilience)
├── router/     # Morph SDK integration (classify, select model)
├── config/     # Environment + config loading
├── types/      # Shared TypeScript types
├── auth/       # Request authentication
├── errors/     # Error handling + response formatting
├── health/     # Health check endpoints
├── logging/    # Structured logging + JSONL request logging
└── converters/ # Claude ↔ OpenAI format converters
```

License
Private.
