@hex4c59/claude-code-adapter

v0.0.3

Published

a month ago

API adapter for Claude Code - route requests to any LLM provider

hex4c59

claude claude-code llm adapter proxy anthropic openai gemini copilot

claude-code-adapter

A high-performance API adapter that lets Claude Code work with any LLM provider. Written in Rust.

Claude Code only speaks the Anthropic Messages API. This adapter sits between Claude Code and your preferred LLM provider, translating requests and responses on the fly — so you can use GPT-4o, Gemini, DeepSeek, Qwen, Ollama, GitHub Copilot, or any OpenAI-compatible API as the backend.
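As a rough illustration of what "translating requests" involves, the Python sketch here converts a simple text-only Anthropic Messages request into an OpenAI chat completions request. It is not the adapter's Rust implementation: the function name `anthropic_to_openai` is hypothetical, and tools, images, and streaming events are deliberately left out.

```python
from typing import Any


def anthropic_to_openai(request: dict[str, Any], target_model: str) -> dict[str, Any]:
    """Convert a text-only Anthropic Messages request to an OpenAI chat request."""
    messages: list[dict[str, Any]] = []

    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message in the list.
    system = request.get("system")
    if isinstance(system, str) and system:
        messages.append({"role": "system", "content": system})

    for message in request.get("messages", []):
        content = message["content"]
        if isinstance(content, list):
            # Keep only text blocks and join them into one string.
            content = "".join(
                block.get("text", "")
                for block in content
                if block.get("type") == "text"
            )
        messages.append({"role": message["role"], "content": content})

    converted: dict[str, Any] = {"model": target_model, "messages": messages}
    if "max_tokens" in request:
        converted["max_tokens"] = request["max_tokens"]
    if request.get("stream"):
        converted["stream"] = True
    return converted
```

The response path needs the reverse mapping, which this sketch does not cover.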

Features

Multi-provider support — OpenAI, Gemini, DeepSeek, Groq, Qwen, Ollama, GitHub Copilot, Anthropic passthrough, and any OpenAI-compatible API
Anthropic + OpenAI-compatible APIs — expose both /v1/messages and /v1/chat/completions
Scenario-based routing — route different types of requests to different models:
- default — normal requests
- think — extended thinking / plan mode requests
- background — lightweight background tasks
- long_context — requests exceeding a token threshold
- web_search — requests using a web-search tool
- image — requests containing image inputs
Custom routing — external router scripts plus config-based custom rules
Streaming — full SSE streaming support, just like the native Anthropic API
Fallback chains — retry configured fallback models on upstream failures before a response stream is committed
Transformer pipeline — built-in request/response/provider-request transforms such as cache cleanup, max-token caps, custom provider params, stream options, sampling defaults, and image-tool injection
Image support — OpenAI image_url input mapping and analyzeImage image-agent interception for non-streaming requests
Auth, rate limiting, and observability — API key auth, global/model/provider/key rate limits, token counting, token usage tracking, Prometheus metrics, hooks, logs API, and detailed health endpoints
Multi-format configuration — TOML, JSON, JSON5, and YAML, with project-level overrides
Presets and shell integration — export/install presets, statusline output, daemon mode, and activate shell integration
Auto model discovery — automatically discover available models from provider APIs
Hot management — add/remove providers and switch models at runtime via REST API or CLI
Web UI — built-in web interface for configuration
GitHub Copilot integration — use your Copilot subscription as an LLM backend with device flow login

Installation

npm (recommended)

npm install -g @hex4c59/claude-code-adapter

The npm package installs both claude-code-adapter and the shorter cca command.

Build from source

git clone https://github.com/Hex4C59/claude-code-adapter.git
cd claude-code-adapter
cargo build --release

The binary will be at target/release/claude-code-adapter.

Quick Start

1. Add a provider

# Interactive — prompts for API key, auto-discovers models
cca add deepseek

# With API key
cca add openai --api-key sk-xxx

# GitHub Copilot (opens browser for device flow login)
cca add copilot

# Local Ollama (no API key needed)
cca add ollama

2. Run Claude Code through the adapter

cca code

cca code starts the local adapter daemon if needed, waits for /health, injects the Claude Code environment variables, runs claude, and stops the daemon when Claude Code exits.

Advanced/manual mode is still available:

cca serve
cca serve --daemon
cca stop

If you want to reuse an already-running adapter, run cca code --no-start. If you want the daemon to remain running after Claude Code exits, run cca code --keep-alive.

CLI Commands

| Command | Description |
|---------|-------------|
| cca serve | Start the HTTP proxy server |
| cca serve --daemon | Start the server in the background |
| cca stop | Stop the background server |
| cca code | Auto-start the adapter, run Claude Code, then stop the adapter |
| cca code --keep-alive | Run Claude Code and leave the adapter daemon running |
| cca code --no-start | Run Claude Code against an already-running adapter |
| cca add <provider> | Add a provider (auto-discovers models) |
| cca remove <provider> | Remove a provider |
| cca models | List all available models |
| cca select | Interactively select a model (fuzzy search) |
| cca switch <model> | Switch the default model |
| cca model | Interactive provider, model, and routing management |
| cca login | Login to GitHub Copilot |
| cca activate | Print optional shell integration |
| cca status | Show detailed token usage status |
| cca statusline | Print a Claude Code statusline |
| cca preset export <name> | Export current config as a preset |
| cca preset install <source> | Install a preset from a file or URL |
| cca preset list | List installed presets |

Known Providers

These providers are auto-detected with pre-configured base URLs:

| Name | Type | Base URL |
|------|------|----------|
| openai | OpenAI | https://api.openai.com/v1 |
| deepseek | OpenAI | https://api.deepseek.com |
| groq | OpenAI | https://api.groq.com/openai/v1 |
| qwen / dashscope | OpenAI | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| ollama | OpenAI | http://localhost:11434/v1 |
| gemini / google | Gemini | Google AI API |
| copilot / github | Copilot | GitHub Copilot API |
| anthropic / claude | Anthropic | https://api.anthropic.com |

Any other name is treated as a custom OpenAI-compatible provider — you'll be prompted for the base URL.

Configuration

Configuration is read from config.toml by default, or from a custom path passed via --config. Supported formats are TOML, JSON, JSON5, and YAML (.yaml or .yml).

Project-level overrides are loaded from ~/.claude/projects/<project-hash>/claude-code-adapter.{toml,json,yaml,yml} and merged over the global config.

[server]
host = "127.0.0.1"
port = 8080
# Optional adapter API authentication
# api_key_env = "ADAPTER_API_KEY"
# Optional upstream HTTP/HTTPS proxy
# proxy = "http://proxy:8080"

# Optional rate limiting
# [server.rate_limit]
# enabled = true
# requests_per_minute = 120
# burst = 20
# models = { "deepseek-chat" = { requests_per_minute = 60, burst = 10 } }
# providers = { "openai" = { requests_per_minute = 100, burst = 20 } }
# keys = { "my-secret-key" = { requests_per_minute = 30, burst = 5 } }

[[providers]]
name = "deepseek"
type = "openai"
api_key_env = "DEEPSEEK_API_KEY"
base_url = "https://api.deepseek.com"

  [[providers.models]]
  name = "deepseek-chat"
  model_id = "deepseek-chat"

  [[providers.models]]
  name = "deepseek-reasoner"
  model_id = "deepseek-reasoner"

[[providers]]
name = "gemini"
type = "gemini"
api_key_env = "GEMINI_API_KEY"

  [[providers.models]]
  name = "gemini-2.5-pro"
  model_id = "gemini-2.5-pro-preview-05-06"

  [[providers.models]]
  name = "gemini-2.5-flash"
  model_id = "gemini-2.5-flash-preview-04-17"

[routing]
default_model = "deepseek-chat"
think = "deepseek-reasoner"
background = "gemini-2.5-flash"
long_context = "gemini-2.5-pro"
long_context_threshold = 60000
web_search = "deepseek-chat"
image = "gemini-2.5-pro"
# fallback = ["deepseek-reasoner", "deepseek-chat"]
# router_script = "/path/to/router.lua"

[[routing.custom_rules]]
name = "large-context"
min_tokens = 100000
route_to = "gemini-2.5-pro"

[logging]
# log_to_file = true
# log_dir = "~/.claude-code-adapter/logs"
# log_json = true

# [logging.api]
# enabled = false
# allow_delete = false
# max_read_bytes = 1048576

[metrics]
# enabled = true

# [tokenizer.default]
# type = "heuristic"
# [tokenizer.providers.openai]
# type = "openai_approx"

# [routing.subagent_tags]
# enabled = true
# strip_tags = true
# compat_ccr = true

[[transformers]]
name = "clean_cache"

[[transformers]]
name = "max_tokens"
options = { max = 8192 }

[[transformers]]
name = "inject_image_tool"

# [[transformers]]
# name = "custom_params"
# providers = ["openai"]
# options = { parallel_tool_calls = false }

# [[transformers]]
# name = "stream_options"
# providers = ["openai"]
# options = { include_usage = true }

# [[transformers]]
# name = "sampling"
# options = { temperature = 0.7, top_p = 0.95, mode = "fill_missing" }
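In the example config, the [routing] values appear to refer to a model's name, while model_id looks like the identifier sent to the upstream provider. Under that assumption, the Python sketch here shows how a scenario could resolve to a provider and an upstream model ID. `CONFIG` mirrors part of the example as a parsed dict, and `resolve` is a hypothetical helper, not part of the adapter.

```python
from typing import Any, Optional

# Mirrors what a TOML parser would produce for part of the example config.
CONFIG: dict[str, Any] = {
    "providers": [
        {
            "name": "deepseek",
            "type": "openai",
            "models": [
                {"name": "deepseek-chat", "model_id": "deepseek-chat"},
                {"name": "deepseek-reasoner", "model_id": "deepseek-reasoner"},
            ],
        },
        {
            "name": "gemini",
            "type": "gemini",
            "models": [
                {"name": "gemini-2.5-pro", "model_id": "gemini-2.5-pro-preview-05-06"},
                {"name": "gemini-2.5-flash", "model_id": "gemini-2.5-flash-preview-04-17"},
            ],
        },
    ],
    "routing": {
        "default_model": "deepseek-chat",
        "think": "deepseek-reasoner",
        "background": "gemini-2.5-flash",
    },
}


def resolve(config: dict[str, Any], scenario: str) -> Optional[tuple[str, str]]:
    """Return (provider name, upstream model_id) for a routing scenario, if any."""
    wanted = config["routing"].get(scenario)
    for provider in config["providers"]:
        for model in provider.get("models", []):
            # Assumption: routing entries match the model's `name` field.
            if model["name"] == wanted:
                return provider["name"], model["model_id"]
    return None
```

With this reading, `resolve(CONFIG, "background")` yields the gemini provider and the model ID `gemini-2.5-flash-preview-04-17`.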

API Key Configuration

Each provider supports two ways to configure the API key:

api_key — hardcoded in the config file
api_key_env — read from an environment variable (recommended)

The adapter itself can also be protected with [server] api_key or api_key_env. Authenticated requests may use Authorization: Bearer <key> or x-api-key: <key>.
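A minimal Python sketch of an authenticated client call, using only the standard library. The helper name `build_messages_request` and the key value are placeholders; the function builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request


def build_messages_request(
    base_url: str, adapter_key: str, body: dict
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /v1/messages."""
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The text above says either header form is accepted;
            # "x-api-key": adapter_key is the alternative.
            "Authorization": f"Bearer {adapter_key}",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` while the adapter is running.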

REST API

The adapter exposes management endpoints alongside the Anthropic- and OpenAI-compatible APIs:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/messages | POST | Anthropic Messages API (proxied) |
| /v1/messages/count_tokens | POST | Anthropic-compatible token counting |
| /v1/chat/completions | POST | OpenAI-compatible chat completions API |
| /v1/adapter/models | GET | List all models and routing config |
| /v1/adapter/switch | POST | Switch default model or scenario routing |
| /v1/adapter/provider | POST | Add a provider at runtime |
| /v1/adapter/provider/{name} | DELETE | Remove a provider |
| /v1/adapter/usage | GET | Session token usage and error summary |
| /v1/adapter/config | GET/PUT | Read redacted config or hot-apply a validated config |
| /v1/adapter/transformers | GET | List built-in transformers and phases |
| /v1/adapter/logs/files | GET | List configured log files when [logging.api] enabled = true |
| /v1/adapter/logs | GET/DELETE | Read bounded log slices or clear logs when explicitly enabled |
| /v1/adapter/restart | POST | Disabled-by-default restart hook; returns unsupported without supervisor |
| /health | GET | Basic JSON health check |
| /health/detail | GET | Configured provider/model health detail |
| /metrics | GET | Prometheus-style metrics when [metrics] enabled = true |
| / | GET | Web UI |

Routing

The adapter uses scenario-based routing to pick the best model for each request:

Router script — if [routing] router_script returns a configured model name
Custom rules — if a [[routing.custom_rules]] entry matches model pattern, tools, or token threshold
Web search — if the request uses a web_search tool, route to web_search
Image — if the request contains images, route to image
Background — if the request targets a lightweight model (e.g. haiku), route to background
Long context — if tokenizer-estimated tokens exceed the threshold (default: 60,000), route to long_context
Think — if the request includes extended thinking, route to think
Subagent tag override — if [routing.subagent_tags] enabled = true, <CCA-SUBAGENT-MODEL>model</CCA-SUBAGENT-MODEL> or CCR-compatible tags route to that configured model
Direct — if the requested model exists in the registry, use it directly
Default — fall back to default_model
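The built-in part of this list can be sketched in Python as a first-match chain. This is an illustration only: reading the list as a strict priority order is an assumption, the router script, custom rules, and subagent tags are omitted, and `pick_model`, `has_image`, and the way each condition is detected are hypothetical rather than taken from the adapter's source.

```python
from typing import Any


def has_image(request: dict[str, Any]) -> bool:
    """True if any message contains an Anthropic image content block."""
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(block, dict) and block.get("type") == "image"
            for block in content
        ):
            return True
    return False


def pick_model(
    request: dict[str, Any],
    routing: dict[str, Any],
    known_models: set[str],
    estimated_tokens: int,
) -> str:
    """Return the first configured scenario model that matches the request."""
    uses_web_search = any(
        str(tool.get("type", "")).startswith("web_search")
        for tool in request.get("tools", [])
    )
    thinking = request.get("thinking") or {}
    threshold = routing.get("long_context_threshold", 60000)

    # Checked in the order the scenarios are listed above (assumed priority).
    checks = [
        (uses_web_search, "web_search"),
        (has_image(request), "image"),
        ("haiku" in request.get("model", ""), "background"),
        (estimated_tokens > threshold, "long_context"),
        (thinking.get("type") == "enabled", "think"),
    ]
    for matched, scenario in checks:
        if matched and routing.get(scenario):
            return routing[scenario]

    # Direct: the requested model is itself a configured model.
    if request.get("model") in known_models:
        return request["model"]
    return routing["default_model"]
```

A scenario with no configured model is skipped here, so the request falls through to later checks and eventually to `default_model`.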

Router scripts are external processes. .lua files run through lua, .wasm files run through wasmtime, and other paths run directly. The request context is passed in CLAUDE_ADAPTER_ROUTING_CONTEXT; printing a model name selects it, while empty output or null defers to built-in routing.
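A router script could look like the following Python sketch. The format of CLAUDE_ADAPTER_ROUTING_CONTEXT is not specified here, so the script assumes it is JSON, and the `estimated_tokens` field is a hypothetical name used only for illustration; check the actual context payload before relying on any field.

```python
#!/usr/bin/env python3
import json
import os
from typing import Optional


def choose_model(context: dict) -> Optional[str]:
    """Return a configured model name, or None to defer to built-in routing."""
    # "estimated_tokens" is a hypothetical field name, not a documented one.
    tokens = context.get("estimated_tokens") or 0
    if tokens > 100_000:
        return "gemini-2.5-pro"
    return None


def main() -> None:
    raw = os.environ.get("CLAUDE_ADAPTER_ROUTING_CONTEXT", "")
    try:
        context = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        context = {}
    if not isinstance(context, dict):
        context = {}

    model = choose_model(context)
    # Printing nothing defers to the adapter's built-in routing.
    if model:
        print(model)


if __name__ == "__main__":
    main()
```

Because a script that is neither .lua nor .wasm is run directly, a file like this would presumably need the executable bit set and a valid shebang line.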

Configure providers, models, and routing interactively:

cca model

The menu can list current routing, switch default/think/background/long_context/web_search/image, add models to an existing provider, and create a new provider with model discovery or manual model entry.

Or via the REST API:

curl -X POST http://127.0.0.1:8080/v1/adapter/switch \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-reasoner", "scenario": "think"}'

Observability

Token usage is tracked for non-streaming responses and for streaming responses that include usage events. The usage endpoint returns session totals, per-provider token counts, request counts, and error counts:

curl http://127.0.0.1:8080/v1/adapter/usage

Enable Prometheus-style metrics with [metrics] enabled = true, then scrape GET /metrics. Metrics include request counters with provider/model/scenario labels, token counters, provider errors, and latency summaries.

Health endpoints:

GET /health — fast adapter health, version, uptime, model count, metrics status, and rate-limit status
GET /health/detail — configured provider/model detail plus cached probe status when [health] live_probes = true

Token counting:

curl http://127.0.0.1:8080/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-chat","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'

Image Input

OpenAI-compatible image_url message parts are converted to Anthropic image blocks. Data URLs such as data:image/png;base64,... become Anthropic base64 image sources; normal URLs become URL image sources.
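The mapping can be sketched in Python as follows. The block shapes follow the public OpenAI and Anthropic message formats; the function name `image_url_part_to_anthropic` is hypothetical, and the adapter's own conversion may handle more cases (for example, validating the media type).

```python
from typing import Any


def image_url_part_to_anthropic(part: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI image_url message part into an Anthropic image block."""
    url = part["image_url"]["url"]

    if url.startswith("data:"):
        # Data URL layout: data:<media type>;base64,<payload>
        header, _, payload = url.partition(",")
        media_type = header[len("data:"):].split(";")[0]
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": payload},
        }

    # Any other URL is passed through as a URL image source.
    return {"type": "image", "source": {"type": "url", "url": url}}
```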

When inject_image_tool is enabled, the adapter can inject an analyzeImage tool for non-vision models. In non-streaming Anthropic and OpenAI-compatible requests, the adapter intercepts analyzeImage, routes the image question to the configured image model, appends the tool result, and asks the original model to finish the response.

MCP Integration

This adapter can also run as an MCP server, providing tools for managing providers and models directly from Claude Code:

add_provider — add a new provider with models
remove_provider — remove a provider
list_models — list all configured models
switch_model — switch the active model
discover_models — auto-discover models from a provider API
login_copilot — GitHub Copilot device flow login

Roadmap

P0-P4 feature work is complete. Remaining items in docs/roadmap.md are optional future enhancements, mainly live upstream health probes and streaming image-agent interception.

Release

npm releases are published by pushing a version tag such as v0.0.2. Always bump versions before creating the tag:

bash scripts/version-bump.sh 0.0.2
git add Cargo.toml package.json npm/*/package.json
git commit -m "chore: release 0.0.2"
git push origin master

git tag v0.0.2
git push origin v0.0.2

See docs/release.md for the full npm/GitHub Actions release guide, including how to fix a tag created before the version bump.

License

MIT