@ascdong/copilot-proxy
v0.1.25
Published
GitHub Copilot Model API Proxy — expose Copilot as OpenAI/Anthropic-compatible endpoints for Claude Code, Codex, and Gemini CLIs.
Downloads
1,874
Maintainers
Readme
copilot-proxy
Expose your GitHub Copilot subscription as OpenAI-, Anthropic-, and Gemini-compatible HTTP endpoints, so any tool that speaks those APIs — Claude Code, Codex CLI, Gemini CLI, custom scripts — can route through Copilot.
Runs on Node.js ≥ 20 (or Bun in development).
Features
- OpenAI-compatible:
POST /v1/chat/completions,/chat/completions,/v1/responses(streaming + non-streaming), plusGET /v1/models - Anthropic-compatible:
POST /v1/messages,/v1/messages/count_tokens(streaming + non-streaming) - Gemini-compatible:
POST /v1beta/models/{model}:generateContent/:streamGenerateContent/:countTokens, plusGET /v1beta/modelsfor CLI preflight - Format translation: Anthropic ↔ OpenAI, Gemini ↔ OpenAI, and Responses ↔ Chat Completions for models that don't support
/v1/responsesnatively (e.g. Claude via Copilot) - Web search fallback via Tavily or WebIQ — when a model rejects Anthropic's
web_searchtool, the proxy runs the query and injects syntheticserver_tool_use/web_search_tool_resultblocks (Anthropic/v1/messagesonly — i.e. Claude Code) - Reasoning effort control — set one target effort (
low…max) for all thinking requests; the proxy clamps it to the nearest level each model supports. Tunable live via CLI or the web portal, no restart - Vision passthrough — image inputs are forwarded to vision-capable Copilot models
- GitHub Device Flow OAuth — one-time login, tokens persisted locally
- Auto-refreshing Copilot token — the short-lived upstream token is refreshed in the background
- Model aliasing —
claude-opus-4-6↔claude-opus-4.6, etc. - 5-category token tracking — every request logged with input / cache-creation / cache-read / output / reasoning tokens, duration, status
- One-shot client configuration for Claude Code, Codex, and Gemini CLIs
- systemd user service — install/uninstall as a background service on Linux/WSL
Getting started
Use this from inside WSL. Install the package in your WSL distro, log in once, then install it as a background service. The proxy listens on
127.0.0.1:8989and serves both WSL and Windows clients on that port — there is no Windows-side install needed.
1. Open a WSL shell
wsl # or launch your distro from the Start menuMake sure Node.js ≥ 20 is available inside WSL (node -v).
2. Install the package
npm install -g @ascdong/copilot-proxy3. Sign in to GitHub
copilot-proxy loginA device-flow code is printed; open the URL in your browser and confirm. Tokens are persisted under ~/.copilot-proxy/.
4. Install as a background service
copilot-proxy service installThis registers a user-level systemd unit that starts the proxy on login and restarts it on failure. Verify it's up:
systemctl --user status copilot-proxy
curl http://127.0.0.1:8989/v1/models # should return JSONFrom here, any OpenAI / Anthropic / Gemini client — running in WSL or in Windows — can point at http://127.0.0.1:8989. Any API key will do; the proxy uses your Copilot session, not the client-provided key.
5. (Optional) wire up a client CLI
copilot-proxy config claude # writes ~/.claude/settings.json (+ Windows path under WSL)
copilot-proxy config codex # writes ~/.codex/config.toml
copilot-proxy config gemini # writes ~/.gemini/.env + ~/.gemini/settings.jsonExisting settings are merged, not overwritten.
Service management
copilot-proxy service reinstall # apply package updates
copilot-proxy service uninstall
journalctl --user -u copilot-proxy -fConfiguration options
Optional YAML at ~/.copilot-proxy/config.yaml (auto-generated on first run). Defaults are fine for most users; override port / address / log level as needed. Environment variable GITHUB_TOKEN can be used in place of the interactive login.
Web search fallback (Claude Code only)
When a model rejects Anthropic's web_search tool, the proxy can transparently run the query through a search provider and synthesize server_tool_use / web_search_tool_result blocks so the client still gets a normal Anthropic-shaped response.
This only applies to the Anthropic
/v1/messagesendpoint — in practice, Claude Code. The OpenAI and Gemini endpoints do not have an equivalentweb_searchtool, so this fallback never triggers for Codex or Gemini CLI.
Configure it with the web-search command — no need to hand-edit the config file:
copilot-proxy web-search use tavily tvly-... # set Tavily key, switch provider, enable
copilot-proxy web-search use webiq <key> # set WebIQ key, switch provider, enable
copilot-proxy web-search use tavily # switch to a provider with a saved key (no re-entry)
copilot-proxy web-search on # enable (current provider)
copilot-proxy web-search off # disable
copilot-proxy web-search status # show current settings (key masked)Setting a provider key automatically enables web search and switches to that provider. Keys are saved in ~/.copilot-proxy/config.yaml, so switching providers later doesn't require re-entering them. Restart the proxy after any change for it to take effect.
Supported providers:
| Provider | Key from |
| --- | --- |
| Tavily | tvly-... API key |
| WebIQ (api.microsoft.ai) | Microsoft AI API key |
With web search disabled (the default) or no key set, web_search requests are passed through unchanged.
Reasoning effort
Reasoning-capable models accept an effort that controls how hard the model thinks. The proxy holds a single global target and applies it, clamped to the nearest effort the selected model actually supports (e.g. xhigh becomes max on a model whose ladder is low/medium/high/max).
copilot-proxy effort high # set target effort (low | medium | high | xhigh | max)
copilot-proxy effort max
copilot-proxy effort status # show the config-file value AND the running valueDefault is high. Setting it via the CLI applies instantly to the running proxy (the command calls the proxy's API, which hot-reloads its in-memory value and persists to config.yaml) — no restart needed. The same control is available on the web portal's Reasoning page. If the proxy isn't running, the CLI just writes config.yaml and the value is picked up on next start.
Scope: effort is injected for models that advertise a
reasoning_effortcapability on:
- the Anthropic
/v1/messagespath for thinking requests (Claude Code), and- the OpenAI
/v1/responsespath when the request already carries areasoningobject and the selected Copilot model supports the Responses API (for example Codex / GPT-5.x).Models without that capability (e.g. Claude 4.5 / Haiku), requests with no thinking/reasoning section, Gemini endpoints, and
/v1/responsesrequests that must fall back to/chat/completionsare left unchanged. On the direct/v1/responsespath, a caller-providedreasoning.effortis preserved (and clamped if needed); otherwise the proxy's configured effort is used as the default. The effort actually sent is recorded in the request logs (copilot-proxy logs).
CLI reference
copilot-proxy <command> [options]
Commands:
start [-p <port>] [-H <host>] start the proxy server (foreground)
login sign in to GitHub via Device Flow OAuth
config <claude|codex|gemini> configure a client tool
[-o <path>] override the settings output path
usage [-m <YYYY-MM>] show token usage statistics
logs [-l <n>] [-e] [--model <m>] show recent request logs
[-d <YYYY-MM-DD>]
web-search use <tavily|webiq> [key] configure web search fallback (Claude Code only)
| on | off | status
effort <low|medium|high|xhigh|max> set reasoning effort for supported requests
| status (applies live, no restart)
service <install|uninstall|reinstall>
manage the systemd user service (Linux/WSL)Run copilot-proxy --help for the full listing with examples.
logs — inspect requests
[13:34:22] 200 claude-opus-4.7 /v1/messages 1 in 12 cc 84 cr 938 out 0 r 21.2s
│ │ │ │ │ │ │ │ └─ reasoning tokens
│ │ │ │ │ │ │ └─ output tokens
│ │ │ │ │ │ └─ cache read tokens
│ │ │ │ │ └─ cache creation tokens
│ │ │ │ └─ fresh input tokens
│ │ │ └─ endpoint
│ │ └─ model (translated name when aliased)
│ └─ HTTP status
└─ request timeRequest logs live at ~/.copilot-proxy/logs/requests/YYYY-MM-DD.jsonl (180-day retention by default; cleanup runs on server startup).
References
License
MIT
