@matkoson/llm-proxy
v2026.309.1
Morph model router proxy between LLM agents and CLIProxyAPI
llm-proxy
Intelligent model routing proxy for LLM API requests. Sits between your AI agents and CLIProxyAPI, using Morph's Model Router to classify task difficulty and dynamically select the most cost-effective model.
Overview
llm-proxy is the repo-owned Bun proxy layer that sits in front of CLIProxyAPI and applies request-shaping logic that belongs to this repository. It keeps lifecycle ownership, local observability, compatibility rewrites, and router-driven model selection in one place while routine-worker owns the outer service supervision.
The package also exposes the thin filesystem command surface that routine-worker and local operators use to start, stop, health-check, and inspect the proxy. Backend-owned capabilities stay pass-through, while proxy-owned behavior stays implemented here.
Why not @matkoson/lib
This repository cannot live in @matkoson/lib because the code is not generic shared infrastructure. It is a proxy product with a concrete runtime contract, provider-specific request semantics, local process lifecycle scripts, and CLIProxyAPI integration rules that only make sense in the llm-proxy domain.
Moving it into the global lib would blur ownership boundaries, force proxy-specific behavior into shared packages, and make the routine-worker integration less explicit. The correct split is shared facades in @matkoson/lib, proxy behavior in this repository, and backend account/orchestration behavior in CLIProxyAPI.
Architecture
```
Agent (Claude Code / Codex / etc.)
        │
        ▼
llm-proxy (:8317) ◄── Bun.serve() proxy + JSONL logging
        │
        ├── Morph Router API ◄── classify difficulty → pick model
        │
        ▼
CLIProxyAPI (:18317) ◄── OAuth token management + round-robin
        │
        ▼
Provider APIs (Anthropic, OpenAI, Google)
```

What it does: Intercepts LLM API requests, classifies task difficulty with Morph's router, rewrites the model field to a cost-appropriate model, and forwards the request to CLIProxyAPI. Simple tasks get cheaper models; complex tasks keep powerful models. Cockatiel adds resilience (retry, timeout, circuit breaker), and every request is logged as JSONL.
What it doesn't do: It only mutates upstream-facing request details that must change for compatibility: the model field and proxy-only auth headers. Request/response bodies, streaming, and all other headers are preserved.
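The rewrite rule above can be sketched as a pure function: only the model field changes, everything else in the body is preserved. This is an illustrative sketch, not the proxy's actual code; `rewriteModel` and the `selectModel` callback are hypothetical names standing in for the router call.

```typescript
// Sketch: rewrite only the `model` field, leave every other field untouched.
// `selectModel` is a hypothetical stand-in for the Morph router decision.
type ChatRequest = { model: string; [key: string]: unknown };

function rewriteModel(
  body: ChatRequest,
  selectModel: (original: string) => string,
): ChatRequest {
  // Spread-copy the body so the agent's original request object is never mutated.
  return { ...body, model: selectModel(body.model) };
}
```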
Setup
Prerequisites
- Bun v1.3+
- CLIProxyAPI running on :18317
- Morph API key for model routing
Install
```sh
bun install
```

Configure
Copy .env.example to .env and fill in your values:
```sh
cp .env.example .env
```

Required:
- MORPH_API_KEY — your Morph API key
Optional (with defaults):
- MORPH_ROUTER_MODE — balanced (default) or aggressive
- MORPH_ROUTER_ENABLED — true (default) or false for pass-through
- MORPH_ROUTER_PROVIDER — anthropic (default), openai, or gemini
- LLM_PROXY_PORT — proxy port (default: 8317)
- UPSTREAM_URL — CLIProxyAPI address (default: http://127.0.0.1:18317)
- UPSTREAM_API_KEY — explicit CLIProxyAPI key override; otherwise llm-proxy auto-discovers the first key from the local Quotio config
- JSONL_LOG_DIR — JSONL request log directory (default: ./logs)
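The defaults above can be sketched as a small config loader. The variable names and default values come from this README; the `ProxyConfig` shape and `loadConfig` function are assumptions for illustration, not the package's actual config module.

```typescript
// Sketch: resolve proxy settings from the environment with the documented defaults.
type ProxyConfig = {
  routerMode: string;
  routerEnabled: boolean;
  port: number;
  upstreamUrl: string;
};

function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  return {
    routerMode: env.MORPH_ROUTER_MODE ?? "balanced",
    // Any value other than the string "true" disables routing.
    routerEnabled: (env.MORPH_ROUTER_ENABLED ?? "true") === "true",
    port: Number(env.LLM_PROXY_PORT ?? "8317"),
    upstreamUrl: env.UPSTREAM_URL ?? "http://127.0.0.1:18317",
  };
}
```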
Usage
Lifecycle commands
routine-worker should interact with llm-proxy through the explicit filesystem lifecycle surface:
```sh
./bin/start
./bin/health
./bin/stop
```

./bin/start is a one-shot launcher: it starts the Bun server in the background, waits for /health, writes runtime state under ./runtime/, and then returns. It does not stay resident as a supervisor.
Process stdout/stderr are written under ./logs/.
./bin/health returns deterministic exit codes for automation:
- 0 — healthy
- 1 — stopped
- 2 — port conflict
- 3 — unhealthy response
- 4 — health endpoint unreachable
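A supervisor consuming these exit codes only needs a lookup. This sketch restates the code-to-meaning mapping from this README; the `describeHealth` helper itself is hypothetical.

```typescript
// Sketch: map ./bin/health exit codes to human-readable labels for automation.
const HEALTH_EXIT: Record<number, string> = {
  0: "healthy",
  1: "stopped",
  2: "port conflict",
  3: "unhealthy response",
  4: "health endpoint unreachable",
};

function describeHealth(code: number): string {
  // Fall back to a generic label for codes outside the documented set.
  return HEALTH_EXIT[code] ?? `unknown exit code ${code}`;
}
```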
Point your agents at the proxy
Agents should point at the llm-proxy port directly:
```sh
# Agents keep using the same URL
ANTHROPIC_BASE_URL=http://localhost:8317
```

Backend pass-through commands
Backend-owned features from CLIProxyAPI are exposed as thin bin/ wrappers instead of being re-implemented locally.
Static backend settings are not exposed as one-command-per-knob wrappers anymore. Repo-owned defaults live in ./JSON/cliproxyapi-base-settings.json, and direct CLIProxyAPI binary calls build their base argument list from that file before adding command-specific flags.
Examples:
```sh
./bin/model-list
./bin/stats-usage
./bin/version-latest
./bin/logs 100
./bin/logs-request-by-id 1234abcd
./bin/auth-files
./bin/auth-upload ./auth.json
./bin/api-keys
./bin/api-keys --patch-json '{"old":null,"new":"sk-example"}'
./bin/stats-usage-import ./stats-usage-export.json
./bin/provider-vertex-import ./service-account.json
./bin/provider-claude-login
./bin/provider-gemini-auth-url
```

Most management commands require the plaintext MANAGEMENT_PASSWORD. Public API-key-backed commands such as ./bin/model-list use the local CLIProxyAPI API key. If the backend config stores remote-management.secret-key as a bcrypt hash, the wrapper cannot derive the plaintext and you must export it explicitly.
The same pass-through layer also exposes backend mutation flows without re-implementing them locally, for example ./bin/stats-usage-import, ./bin/auth-upload, ./bin/auth-delete, ./bin/auth-file-status, ./bin/auth-fields, ./bin/provider-vertex-import, and ./bin/api-call.
Usage statistics are mandatory in this repository. There is intentionally no ./bin/stats-usage-enabled command, and any backend configuration managed through this repo must keep usage-statistics-enabled: true.
These backend settings are repo-owned defaults in ./JSON/cliproxyapi-base-settings.json, not public bin/ commands:
- debug
- logging-to-file
- logs-max-total-size-mb
- error-logs-max-files
- logs.request
- request-retry
- max-retry-interval
- proxy-url
- force-model-prefix
- ws-auth
- routing-strategy
- quota-exceeded.switch-project
- quota-exceeded.switch-preview-model
- oauth-excluded-models as an array of strings
- oauth-model-alias as an array of strings
- openai-compatibility
- ampcode.upstream-url
- ampcode.restrict-management-to-localhost
- ampcode.force-model-mappings
- ampcode.model-mappings
Raw backend config surfaces are intentionally not exposed as public commands. There is no ./bin/config, ./bin/config-yaml, or ./bin/config-yaml-set; repo-owned static backend defaults must come from ./JSON/cliproxyapi-base-settings.json.
Log preservation is mandatory in this repository. There is intentionally no ./bin/logs-clear command, ./bin/logs --delete is blocked, and the repo-owned JSON defaults keep automatic log cleanup disabled.
Mutation wrappers remain thin. They pass the method, query, body, or form data straight through to the backend route instead of re-implementing backend logic.
Supported request-shaping flags:
- --put-bool
- --put-int
- --put-string
- --put-json
- --patch-json
- --delete
- --query key=value
- --body-file /absolute/path
- --form key=value
- --form-file field=/absolute/path
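To illustrate the flag shape (not the wrappers' actual parser), repeated `--query key=value` flags can be collected into search parameters before the wrapper calls the backend route. The `collectQueryFlags` helper below is a hypothetical sketch.

```typescript
// Sketch: gather repeated `--query key=value` flags into URL search params,
// the way a thin pass-through wrapper might before hitting a backend route.
function collectQueryFlags(argv: string[]): URLSearchParams {
  const params = new URLSearchParams();
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--query" && i + 1 < argv.length) {
      // Split only on the first "=", so values may themselves contain "=".
      const [key, ...rest] = argv[++i].split("=");
      params.append(key, rest.join("="));
    }
  }
  return params;
}
```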
Repo-local helpers
These remain local to this repository and are not backend duplicates:
```sh
./bin/qa
./bin/logs-follow
./bin/logs-tail
./bin/stats-update
```

API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Health check (includes upstream reachability) |
| /router/status | GET | Router config and status |
| /v1/messages | POST | Anthropic API — routed |
| /v1/chat/completions | POST | OpenAI API — routed |
| /* | * | All other paths forwarded transparently |
Response Headers
When MORPH_ROUTER_EXPOSE_HEADERS=true (default), routed requests include:
| Header | Example | Description |
|--------|---------|-------------|
| x-llm-proxy-original-model | claude-sonnet-4-5 | Model requested by the agent |
| x-llm-proxy-selected-model | claude-sonnet-4-5-20250929 | Model after routing |
| x-llm-proxy-difficulty | easy | Classification: easy, medium, hard |
| x-llm-proxy-routing-ms | 563 | Routing latency in ms |
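A client observing routing decisions only needs to read these headers off the response. The header names below come from the table above; the `readRoutingHeaders` helper is a hypothetical sketch, and the `Headers` object in the test is synthetic rather than a live proxy response.

```typescript
// Sketch: extract the proxy's routing headers from a response for observability.
type RoutingInfo = {
  originalModel: string | null;
  selectedModel: string | null;
  difficulty: string | null;
};

function readRoutingHeaders(headers: Headers): RoutingInfo {
  // Headers.get() returns null when the proxy did not route the request
  // (e.g. MORPH_ROUTER_EXPOSE_HEADERS=false or a pass-through path).
  return {
    originalModel: headers.get("x-llm-proxy-original-model"),
    selectedModel: headers.get("x-llm-proxy-selected-model"),
    difficulty: headers.get("x-llm-proxy-difficulty"),
  };
}
```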
How Routing Works
- Agent sends a request with a model (e.g., claude-sonnet-4-5)
- Proxy extracts the last user message text
- Morph Router classifies the task difficulty and selects a model
- Proxy rewrites model in the body and forwards to CLIProxyAPI
- Response streams back to the agent transparently
Fallback: If Morph API is unreachable or returns an error, the original model passes through unchanged. Routing never blocks a request.
Resilience: Morph API calls are wrapped with retry (3 attempts, exponential backoff), timeout (2s), and circuit breaker. Upstream calls have retry (1), timeout (configurable), and circuit breaker (opens after 5 consecutive failures).
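The fallback rule can be sketched as a race between the classification call and a timeout, with any failure collapsing to the original model. This is an illustration of the stated behavior under assumed names (`routeWithFallback`, `classify`), not the package's actual Cockatiel-based implementation.

```typescript
// Sketch of the fallback rule: if classification fails or times out,
// keep the original model. `classify` is a hypothetical router call.
async function routeWithFallback(
  originalModel: string,
  classify: () => Promise<string>,
  timeoutMs = 2000,
): Promise<string> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("routing timeout")), timeoutMs),
  );
  try {
    // Whichever settles first wins; rejections fall through to the catch.
    return await Promise.race([classify(), timeout]);
  } catch {
    // Routing never blocks a request: pass the original model through.
    return originalModel;
  }
}
```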
Model Selection (Morph SDK v0.2.114)
| Provider | Easy | Hard |
|----------|------|------|
| Anthropic | claude-4.5-haiku | claude-4.5-sonnet |
| OpenAI | gpt-5-mini | gpt-5-high |
| Gemini | gemini-2.5-flash | gemini-2.5-pro |
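The table above can be restated as a lookup. The model names come from the table; the treatment of "medium" is an assumption (the table does not specify it, so this sketch conservatively maps it to the hard model), and `pickModel` is a hypothetical helper.

```typescript
// Sketch: the provider/difficulty table as a lookup.
// ASSUMPTION: "medium" is not in the table, so it falls through to the hard model.
const MODELS: Record<string, { easy: string; hard: string }> = {
  anthropic: { easy: "claude-4.5-haiku", hard: "claude-4.5-sonnet" },
  openai: { easy: "gpt-5-mini", hard: "gpt-5-high" },
  gemini: { easy: "gemini-2.5-flash", hard: "gemini-2.5-pro" },
};

function pickModel(provider: string, difficulty: "easy" | "medium" | "hard"): string {
  const pair = MODELS[provider];
  return difficulty === "easy" ? pair.easy : pair.hard;
}
```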
Custom Model Maps
Override routing decisions with modelMap in config:
```ts
router: {
  modelMap: {
    easy: 'claude-haiku-4-5-20251001',
    hard: 'claude-sonnet-4-5-20250929',
  }
}
```

Development
```sh
bun test              # run tests
bun run typecheck     # TypeScript check
```

Project Structure
```
bin/            # explicit lifecycle commands, backend pass-throughs, QA, observability helpers
src/
├── server/     # Bun.serve() proxy (request handling, forwarding, resilience)
├── router/     # Morph SDK integration (classify, select model)
├── config/     # Environment + config loading
├── types/      # Shared TypeScript types
├── auth/       # Request authentication
├── errors/     # Error handling + response formatting
├── health/     # Health check endpoints
├── logging/    # Structured logging + JSONL request logging
└── converters/ # Claude ↔ OpenAI format converters
```

License
Private.
