@hex4c59/claude-code-adapter
v0.0.3
API adapter for Claude Code - route requests to any LLM provider
claude-code-adapter
A high-performance API adapter that lets Claude Code work with any LLM provider. Written in Rust.
Claude Code only speaks the Anthropic Messages API. This adapter sits between Claude Code and your preferred LLM provider, translating requests and responses on the fly — so you can use GPT-4o, Gemini, DeepSeek, Qwen, Ollama, GitHub Copilot, or any OpenAI-compatible API as the backend.
Features
- Multi-provider support — OpenAI, Gemini, DeepSeek, Groq, Qwen, Ollama, GitHub Copilot, Anthropic passthrough, and any OpenAI-compatible API
- Anthropic + OpenAI-compatible APIs — expose both /v1/messages and /v1/chat/completions
- Scenario-based routing — route different types of requests to different models:
  - default — normal requests
  - think — extended thinking / plan mode requests
  - background — lightweight background tasks
  - long_context — requests exceeding a token threshold
  - web_search — requests using a web-search tool
  - image — requests containing image inputs
- Custom routing — external router scripts plus config-based custom rules
- Streaming — full SSE streaming support, just like the native Anthropic API
- Fallback chains — retry configured fallback models on upstream failures before a response stream is committed
- Transformer pipeline — built-in request/response/provider-request transforms such as cache cleanup, max-token caps, custom provider params, stream options, sampling defaults, and image-tool injection
- Image support — OpenAI image_url input mapping and analyzeImage image-agent interception for non-streaming requests
- Auth, rate limiting, and observability — API key auth, global/model/provider/key rate limits, token counting, token usage tracking, Prometheus metrics, hooks, logs API, and detailed health endpoints
- Multi-format configuration — TOML, JSON, JSON5, and YAML, with project-level overrides
- Presets and shell integration — export/install presets, statusline output, daemon mode, and activate shell integration
- Auto model discovery — automatically discover available models from provider APIs
- Hot management — add/remove providers and switch models at runtime via REST API or CLI
- Web UI — built-in web interface for configuration
- GitHub Copilot integration — use your Copilot subscription as an LLM backend with device flow login
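The fallback behaviour listed above (retry before a response stream is committed) can be pictured with a small Python sketch. It is an illustration only, not the adapter's Rust implementation: `stream_with_fallback`, `open_stream`, and `UpstreamError` are hypothetical names, and "committed" is assumed to mean that the first chunk has already been sent to the client.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


class UpstreamError(Exception):
    """Hypothetical error raised when a provider call fails."""


def stream_with_fallback(
    models: Sequence[str],
    open_stream: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Yield chunks from the first model whose stream starts successfully.

    Fallback is attempted only until the first chunk has been produced.
    After that the response is committed and later errors propagate.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            chunks = iter(open_stream(model))
            first = next(chunks)  # a failure here is still recoverable
        except UpstreamError as err:
            last_error = err
            continue  # try the next configured fallback model
        except StopIteration:
            return  # empty stream: nothing to send and nothing to retry
        yield first  # committed from here on
        yield from chunks
        return
    raise UpstreamError(f"all models failed: {list(models)}") from last_error
```

With `models = ["deepseek-reasoner", "deepseek-chat"]`, a failure while opening the first stream moves on to the second model, while a failure after the first chunk is surfaced to the caller.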
Installation
npm (recommended)
npm install -g @hex4c59/claude-code-adapter
The npm package installs both claude-code-adapter and the shorter cca command.
Build from source
git clone https://github.com/Hex4C59/claude-code-adapter.git
cd claude-code-adapter
cargo build --release
The binary will be at target/release/claude-code-adapter.
Quick Start
1. Add a provider
# Interactive — prompts for API key, auto-discovers models
cca add deepseek
# With API key
cca add openai --api-key sk-xxx
# GitHub Copilot (opens browser for device flow login)
cca add copilot
# Local Ollama (no API key needed)
cca add ollama
2. Run Claude Code through the adapter
cca code
cca code starts the local adapter daemon if needed, waits for /health, injects the Claude Code environment variables, runs claude, and stops the daemon when Claude Code exits.
Advanced/manual mode is still available:
cca serve
cca serve --daemon
cca stop
If you want to reuse an already-running adapter, run cca code --no-start. If you want the daemon to remain running after Claude Code exits, run cca code --keep-alive.
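In manual mode, the steps that cca code automates (wait for /health, then point Claude Code at the adapter) might look like the Python sketch below. It assumes the adapter listens on 127.0.0.1:8080 and that Claude Code is redirected through the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables. The README does not list the variables that cca code injects, so treat those names, and the helper names `wait_for_health` and `claude_env`, as assumptions.

```python
import os
import time
import urllib.error
import urllib.request
from typing import Dict


def wait_for_health(base_url: str, timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll GET /health until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # adapter not ready yet, keep polling
        time.sleep(interval)
    return False


def claude_env(base_url: str, adapter_key: str = "") -> Dict[str, str]:
    """Return a copy of the environment that points Claude Code at the adapter."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url  # assumed variable name
    if adapter_key:
        env["ANTHROPIC_AUTH_TOKEN"] = adapter_key  # assumed variable name
    return env
```

A launcher would call `wait_for_health("http://127.0.0.1:8080")` after starting cca serve --daemon and, if it returns `True`, run `subprocess.run(["claude"], env=claude_env("http://127.0.0.1:8080"))`.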
CLI Commands
| Command | Description |
|---------|-------------|
| cca serve | Start the HTTP proxy server |
| cca serve --daemon | Start the server in the background |
| cca stop | Stop the background server |
| cca code | Auto-start the adapter, run Claude Code, then stop the adapter |
| cca code --keep-alive | Run Claude Code and leave the adapter daemon running |
| cca code --no-start | Run Claude Code against an already-running adapter |
| cca add <provider> | Add a provider (auto-discovers models) |
| cca remove <provider> | Remove a provider |
| cca models | List all available models |
| cca select | Interactively select a model (fuzzy search) |
| cca switch <model> | Switch the default model |
| cca model | Interactive provider, model, and routing management |
| cca login | Login to GitHub Copilot |
| cca activate | Print optional shell integration |
| cca status | Show detailed token usage status |
| cca statusline | Print a Claude Code statusline |
| cca preset export <name> | Export current config as a preset |
| cca preset install <source> | Install a preset from a file or URL |
| cca preset list | List installed presets |
Known Providers
These providers are auto-detected with pre-configured base URLs:
| Name | Type | Base URL |
|------|------|----------|
| openai | OpenAI | https://api.openai.com/v1 |
| deepseek | OpenAI | https://api.deepseek.com |
| groq | OpenAI | https://api.groq.com/openai/v1 |
| qwen / dashscope | OpenAI | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| ollama | OpenAI | http://localhost:11434/v1 |
| gemini / google | Gemini | Google AI API |
| copilot / github | Copilot | GitHub Copilot API |
| anthropic / claude | Anthropic | https://api.anthropic.com |
Any other name is treated as a custom OpenAI-compatible provider — you'll be prompted for the base URL.
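The lookup described above can be sketched in Python as a table of known names plus a custom fallback. The names and URLs are copied from the table; `KnownProvider` and `resolve_provider` are hypothetical helpers, and exact-match lookup (no case folding) is an assumption.

```python
from typing import Dict, NamedTuple, Optional


class KnownProvider(NamedTuple):
    kind: str                 # "openai", "gemini", "copilot", or "anthropic"
    base_url: Optional[str]   # None where the table lists no explicit URL


_QWEN = KnownProvider("openai", "https://dashscope.aliyuncs.com/compatible-mode/v1")
_GEMINI = KnownProvider("gemini", None)
_COPILOT = KnownProvider("copilot", None)
_ANTHROPIC = KnownProvider("anthropic", "https://api.anthropic.com")

# Aliases from the table share one entry.
KNOWN_PROVIDERS: Dict[str, KnownProvider] = {
    "openai": KnownProvider("openai", "https://api.openai.com/v1"),
    "deepseek": KnownProvider("openai", "https://api.deepseek.com"),
    "groq": KnownProvider("openai", "https://api.groq.com/openai/v1"),
    "qwen": _QWEN,
    "dashscope": _QWEN,
    "ollama": KnownProvider("openai", "http://localhost:11434/v1"),
    "gemini": _GEMINI,
    "google": _GEMINI,
    "copilot": _COPILOT,
    "github": _COPILOT,
    "anthropic": _ANTHROPIC,
    "claude": _ANTHROPIC,
}


def resolve_provider(name: str, custom_base_url: Optional[str] = None) -> KnownProvider:
    """Return the preset for a known name, else a custom OpenAI-compatible entry."""
    if name in KNOWN_PROVIDERS:
        return KNOWN_PROVIDERS[name]
    if not custom_base_url:
        # The CLI prompts for the URL at this point; the sketch just raises.
        raise ValueError(f"unknown provider {name!r}: a base URL is required")
    return KnownProvider("openai", custom_base_url)
```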
Configuration
Configuration is stored in config.toml by default, or at a custom path given with --config. TOML, JSON, JSON5, and YAML (.yaml or .yml) files are supported.
Project-level overrides are loaded from ~/.claude/projects/<project-hash>/claude-code-adapter.{toml,json,yaml,yml} and merged over the global config.
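The README says project-level files are merged over the global config but does not say how nested tables or arrays are combined. The Python sketch below assumes a recursive merge: nested tables are merged key by key, and any other value (arrays included) from the project file replaces the global one. `merge_config` is a hypothetical helper, not part of the adapter.

```python
from typing import Any, Dict


def merge_config(global_cfg: Dict[str, Any], project_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which project values override global ones."""
    merged = dict(global_cfg)
    for key, value in project_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and arrays are replaced wholesale (assumed behaviour).
            merged[key] = value
    return merged
```

Under that assumption, a project file containing only a `[routing]` table with `default_model = "gemini-2.5-pro"` would change the default model and leave the other routing keys and the `[server]` table untouched.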
[server]
host = "127.0.0.1"
port = 8080
# Optional adapter API authentication
# api_key_env = "ADAPTER_API_KEY"
# Optional upstream HTTP/HTTPS proxy
# proxy = "http://proxy:8080"
# Optional rate limiting
# [server.rate_limit]
# enabled = true
# requests_per_minute = 120
# burst = 20
# models = { "deepseek-chat" = { requests_per_minute = 60, burst = 10 } }
# providers = { "openai" = { requests_per_minute = 100, burst = 20 } }
# keys = { "my-secret-key" = { requests_per_minute = 30, burst = 5 } }
[[providers]]
name = "deepseek"
type = "openai"
api_key_env = "DEEPSEEK_API_KEY"
base_url = "https://api.deepseek.com"
[[providers.models]]
name = "deepseek-chat"
model_id = "deepseek-chat"
[[providers.models]]
name = "deepseek-reasoner"
model_id = "deepseek-reasoner"
[[providers]]
name = "gemini"
type = "gemini"
api_key_env = "GEMINI_API_KEY"
[[providers.models]]
name = "gemini-2.5-pro"
model_id = "gemini-2.5-pro-preview-05-06"
[[providers.models]]
name = "gemini-2.5-flash"
model_id = "gemini-2.5-flash-preview-04-17"
[routing]
default_model = "deepseek-chat"
think = "deepseek-reasoner"
background = "gemini-2.5-flash"
long_context = "gemini-2.5-pro"
long_context_threshold = 60000
web_search = "deepseek-chat"
image = "gemini-2.5-pro"
# fallback = ["deepseek-reasoner", "deepseek-chat"]
# router_script = "/path/to/router.lua"
[[routing.custom_rules]]
name = "large-context"
min_tokens = 100000
route_to = "gemini-2.5-pro"
[logging]
# log_to_file = true
# log_dir = "~/.claude-code-adapter/logs"
# log_json = true
# [logging.api]
# enabled = false
# allow_delete = false
# max_read_bytes = 1048576
[metrics]
# enabled = true
# [tokenizer.default]
# type = "heuristic"
# [tokenizer.providers.openai]
# type = "openai_approx"
# [routing.subagent_tags]
# enabled = true
# strip_tags = true
# compat_ccr = true
[[transformers]]
name = "clean_cache"
[[transformers]]
name = "max_tokens"
options = { max = 8192 }
[[transformers]]
name = "inject_image_tool"
# [[transformers]]
# name = "custom_params"
# providers = ["openai"]
# options = { parallel_tool_calls = false }
# [[transformers]]
# name = "stream_options"
# providers = ["openai"]
# options = { include_usage = true }
# [[transformers]]
# name = "sampling"
# options = { temperature = 0.7, top_p = 0.95, mode = "fill_missing" }
API Key Configuration
Each provider supports two ways to configure the API key:
- api_key — hardcoded in the config file
- api_key_env — read from an environment variable (recommended)
The adapter itself can also be protected with [server] api_key or api_key_env. Authenticated requests may use Authorization: Bearer <key> or x-api-key: <key>.
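As an illustration of the two accepted header styles, the Python sketch below builds (but does not send) a POST to /v1/messages. The base URL, key, and helper name `build_messages_request` are placeholders, and the body carries only the basic Anthropic Messages fields.

```python
import json
import urllib.request


def build_messages_request(
    base_url: str,
    adapter_key: str,
    model: str,
    prompt: str,
    use_bearer: bool = True,
    max_tokens: int = 256,
) -> urllib.request.Request:
    """Build an authenticated Anthropic-style request for the adapter."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {adapter_key}"
    else:
        headers["x-api-key"] = adapter_key
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Either header style should authenticate the same key, according to the paragraph above.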
REST API
The adapter exposes management endpoints alongside the Anthropic-compatible API:
| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/messages | POST | Anthropic Messages API (proxied) |
| /v1/messages/count_tokens | POST | Anthropic-compatible token counting |
| /v1/chat/completions | POST | OpenAI-compatible chat completions API |
| /v1/adapter/models | GET | List all models and routing config |
| /v1/adapter/switch | POST | Switch default model or scenario routing |
| /v1/adapter/provider | POST | Add a provider at runtime |
| /v1/adapter/provider/{name} | DELETE | Remove a provider |
| /v1/adapter/usage | GET | Session token usage and error summary |
| /v1/adapter/config | GET/PUT | Read redacted config or hot-apply a validated config |
| /v1/adapter/transformers | GET | List built-in transformers and phases |
| /v1/adapter/logs/files | GET | List configured log files when [logging.api] enabled = true |
| /v1/adapter/logs | GET/DELETE | Read bounded log slices or clear logs when explicitly enabled |
| /v1/adapter/restart | POST | Disabled-by-default restart hook; returns unsupported without supervisor |
| /health | GET | Basic JSON health check |
| /health/detail | GET | Configured provider/model health detail |
| /metrics | GET | Prometheus-style metrics when [metrics] enabled = true |
| / | GET | Web UI |
Routing
The adapter uses scenario-based routing to pick the best model for each request:
- Router script — if [routing] router_script returns a configured model name
- Custom rules — if a [[routing.custom_rules]] entry matches model pattern, tools, or token threshold
- Web search — if the request uses a web_search tool, route to web_search
- Image — if the request contains images, route to image
- Background — if the request targets a lightweight model (e.g. haiku), route to background
- Long context — if tokenizer-estimated tokens exceed the threshold (default: 60,000), route to long_context
- Think — if the request includes extended thinking, route to think
- Subagent tag override — if [routing.subagent_tags] enabled = true, <CCA-SUBAGENT-MODEL>model</CCA-SUBAGENT-MODEL> or CCR-compatible tags route to that configured model
- Direct — if the requested model exists in the registry, use it directly
- Default — fall back to default_model
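A simplified Python sketch of the built-in part of this list follows. It assumes the rules are checked in the order listed and the first match wins, which the README implies but does not state. It also assumes that a scenario with no configured model is skipped and that "lightweight model" means the requested name contains "haiku". Router scripts, custom rules, and subagent tags are left out, and `RoutedRequest` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass
class RoutedRequest:
    model: str                    # model name requested by the client
    estimated_tokens: int = 0     # tokenizer estimate for the request
    has_images: bool = False
    uses_web_search: bool = False
    thinking: bool = False


def pick_model(
    req: RoutedRequest,
    routing: Dict[str, Union[str, int]],
    registry: Set[str],
) -> str:
    """Pick a model using the scenario order from the list above."""
    threshold = int(routing.get("long_context_threshold", 60000))
    checks = [
        ("web_search", req.uses_web_search),
        ("image", req.has_images),
        ("background", "haiku" in req.model),        # assumed heuristic
        ("long_context", req.estimated_tokens > threshold),
        ("think", req.thinking),
    ]
    for scenario, matched in checks:
        if matched and scenario in routing:
            return str(routing[scenario])
    if req.model in registry:
        return req.model  # direct: the requested model is configured
    return str(routing["default_model"])
```

With the `[routing]` table from the Configuration section, a 70,000-token request for an unregistered model would go to gemini-2.5-pro, and a plain request for an unknown model would fall back to deepseek-chat.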
Router scripts are external processes. .lua files run through lua, .wasm files run through wasmtime, and other paths run directly. The request context is passed in CLAUDE_ADAPTER_ROUTING_CONTEXT; printing a model name selects it, while empty output or null defers to built-in routing.
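Because other paths run directly, a router script can be any executable, for example a Python file with a shebang line and the executable bit set. The sketch below assumes the routing context is JSON and contains a numeric `token_count` field. The README documents neither the encoding nor the field names, so both are hypothetical and need adjusting to the real payload.

```python
#!/usr/bin/env python3
import json
import os
from typing import Optional


def choose_model(raw_context: str) -> Optional[str]:
    """Return a configured model name, or None to defer to built-in routing."""
    try:
        ctx = json.loads(raw_context)
    except json.JSONDecodeError:
        return None  # unreadable context: let the adapter decide
    if not isinstance(ctx, dict):
        return None
    tokens = ctx.get("token_count")  # hypothetical field name
    if isinstance(tokens, int) and tokens > 100_000:
        return "gemini-2.5-pro"
    return None


def main() -> None:
    model = choose_model(os.environ.get("CLAUDE_ADAPTER_ROUTING_CONTEXT", ""))
    if model:
        print(model)  # printing a model name selects it
    # Printing nothing defers to the built-in routing rules.


if __name__ == "__main__":
    main()
```

Saved at the path given in router_script, the script prints gemini-2.5-pro for very large requests and stays silent otherwise.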
Configure providers, models, and routing interactively:
cca model
The menu can list current routing, switch default/think/background/long_context/web_search/image, add models to an existing provider, and create a new provider with model discovery or manual model entry.
Or via the REST API:
curl -X POST http://127.0.0.1:8080/v1/adapter/switch \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-reasoner", "scenario": "think"}'
Observability
Token usage is tracked for non-streaming responses and for streaming responses that include usage events. The usage endpoint returns session totals, per-provider token counts, request counts, and error counts:
curl http://127.0.0.1:8080/v1/adapter/usage
Enable Prometheus-style metrics with [metrics] enabled = true, then scrape GET /metrics. Metrics include request counters with provider/model/scenario labels, token counters, provider errors, and latency summaries.
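If no Prometheus server is at hand, the /metrics output can be inspected with a few lines of Python. The sketch below parses simple samples in the Prometheus text exposition format. The metric name `adapter_requests_total` used in the usage note is invented for illustration, because the README does not list the real metric names.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Sample]:
    """Parse sample lines, skipping comments, blanks, and malformed lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines
        match = _SAMPLE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total_by_label(samples: List[Sample], metric: str, label: str) -> Dict[str, float]:
    """Sum one metric's values, grouped by the given label."""
    totals: Dict[str, float] = {}
    for name, labels, value in samples:
        if name == metric and label in labels:
            totals[labels[label]] = totals.get(labels[label], 0.0) + value
    return totals
```

Calling `total_by_label(parse_metrics(body), "adapter_requests_total", "provider")` on the scraped body would give a per-provider request count, assuming a counter with that name and label exists.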
Health endpoints:
- GET /health — fast adapter health, version, uptime, model count, metrics status, and rate-limit status
- GET /health/detail — configured provider/model detail plus cached probe status when [health] live_probes = true
Token counting:
curl http://127.0.0.1:8080/v1/messages/count_tokens \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-chat","messages":[{"role":"user","content":"hello"}],"max_tokens":100}'
Image Input
OpenAI-compatible image_url message parts are converted to Anthropic image blocks. Data URLs such as data:image/png;base64,... become Anthropic base64 image sources; normal URLs become URL image sources.
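The mapping described above can be sketched in Python as follows. This is an illustration of the conversion rule, not the adapter's own code, and `image_url_part_to_anthropic` is a hypothetical helper name.

```python
import re
from typing import Any, Dict

# data:<media type>;base64,<payload>
_DATA_URL = re.compile(
    r"^data:(?P<media>[\w.+-]+/[\w.+-]+);base64,(?P<data>.*)$", re.DOTALL
)


def image_url_part_to_anthropic(part: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one OpenAI image_url message part to an Anthropic image block."""
    image_url = part["image_url"]
    # OpenAI wraps the URL in an object: {"url": "..."}; accept a bare string too.
    url = image_url["url"] if isinstance(image_url, dict) else image_url
    match = _DATA_URL.match(url)
    if match:
        source = {
            "type": "base64",
            "media_type": match.group("media"),
            "data": match.group("data"),
        }
    else:
        source = {"type": "url", "url": url}
    return {"type": "image", "source": source}
```

For example, a part whose URL is `data:image/png;base64,iVBORw0KGgo=` becomes a base64 source with media type `image/png`, while `https://example.com/cat.png` becomes a URL source.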
When inject_image_tool is enabled, the adapter can inject an analyzeImage tool for non-vision models. In non-streaming Anthropic and OpenAI-compatible requests, the adapter intercepts analyzeImage, routes the image question to the configured image model, appends the tool result, and asks the original model to finish the response.
MCP Integration
This adapter can also run as an MCP server, providing tools for managing providers and models directly from Claude Code:
- add_provider — add a new provider with models
- remove_provider — remove a provider
- list_models — list all configured models
- switch_model — switch the active model
- discover_models — auto-discover models from a provider API
- login_copilot — GitHub Copilot device flow login
Roadmap
P0-P4 feature work is complete. Remaining items in docs/roadmap.md are optional future enhancements, mainly live upstream health probes and streaming image-agent interception.
Release
npm releases are published by pushing a version tag such as v0.0.2. Always bump versions before creating the tag:
bash scripts/version-bump.sh 0.0.2
git add Cargo.toml package.json npm/*/package.json
git commit -m "chore: release 0.0.2"
git push origin master
git tag v0.0.2
git push origin v0.0.2
See docs/release.md for the full npm/GitHub Actions release guide, including how to fix a tag created before the version bump.
License
MIT
