@dianshuv/copilot-api
v0.7.8
Turn GitHub Copilot into an OpenAI/Anthropic API-compatible server. Usable with Claude Code!
Copilot API Proxy (Fork)
[!NOTE] This is a fork of @hsupu/copilot-api, which itself is a fork of ericc-ch/copilot-api, with additional improvements and bug fixes.
[!WARNING] This is a reverse-engineered proxy of the GitHub Copilot API. It is not supported by GitHub and may break unexpectedly. Use at your own risk.
New Features (over @hsupu/copilot-api)
- Responses API endpoint: /v1/responses passthrough for codex models (e.g., gpt-5.2-codex, gpt-5.3-codex) used by tools like OpenCode. Includes stream ID synchronization for @ai-sdk/openai compatibility.
- SubagentStart marker support: Detects __SUBAGENT_MARKER__ injected by Claude Code hooks to override the X-Initiator header to "agent" for subagent requests, ensuring correct credit tier usage. Includes a ready-to-use Claude plugin (claude-plugin/).
- Token analytics tab: The /history page includes a Tokens tab with a per-model token usage summary table and a cumulative ECharts line chart for visualizing API consumption over time.
- Real-time history updates: The /history UI uses WebSocket for live updates instead of polling, with automatic fallback to polling and exponential backoff reconnection.
- Graceful shutdown: A 4-phase shutdown sequence that stops accepting requests, waits for in-flight requests to complete, sends an abort signal, then force-closes. Configurable via --shutdown-graceful-wait and --shutdown-abort-wait.
- Stream repetition detection: Detects when models get stuck in repetitive output loops using KMP-based pattern matching and logs a warning.
- Stale request reaping: Automatically force-fails requests that exceed a configurable maximum age (default 600s) to prevent resource leaks.
- Gemini API compatibility: /v1beta/models endpoints translate Gemini API requests to OpenAI format for Copilot. Enables Google Gemini CLI to use Copilot models via the GOOGLE_GEMINI_BASE_URL environment variable.
- PostHog analytics: Optional PostHog Cloud integration (--posthog-key) sends per-request token usage events for long-term trend analysis. The free tier (1M events/month) is more than sufficient for individual use.
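The stream repetition detection item mentions KMP-based pattern matching. The following is a minimal sketch of one way such a check can work, assuming the detector inspects the tail of the accumulated output; it is not the proxy's actual implementation. The names prefixFunction and detectRepetition and the thresholds minPatternLength and minRepeats are hypothetical.

```typescript
// Classic KMP prefix function: pi[i] is the length of the longest proper
// prefix of s[0..i] that is also a suffix of s[0..i].
function prefixFunction(s: string): number[] {
  const pi: number[] = new Array<number>(s.length).fill(0);
  for (let i = 1; i < s.length; i++) {
    let k = pi[i - 1] ?? 0;
    while (k > 0 && s.charCodeAt(i) !== s.charCodeAt(k)) {
      k = pi[k - 1] ?? 0;
    }
    if (s.charCodeAt(i) === s.charCodeAt(k)) {
      k++;
    }
    pi[i] = k;
  }
  return pi;
}

interface RepetitionResult {
  pattern: string; // the repeating unit as it appears at the end of the text
  repeats: number; // how many full copies were found back to back
}

// Returns the repeated unit if the tail of `text` is the same substring
// repeated at least `minRepeats` times, otherwise null.
// Both thresholds are illustrative assumptions.
function detectRepetition(
  text: string,
  minPatternLength = 3,
  minRepeats = 3,
): RepetitionResult | null {
  // Reverse the text so that its suffixes become prefixes, which is what
  // the prefix function analyses.
  const reversed = text.split("").reverse().join("");
  const pi = prefixFunction(reversed);
  for (let i = 0; i < reversed.length; i++) {
    const length = i + 1;
    // Smallest period of the suffix of `text` that has this length.
    const period = length - (pi[i] ?? 0);
    const repeats = Math.floor(length / period);
    if (period >= minPatternLength && repeats >= minRepeats) {
      return { pattern: text.slice(text.length - period), repeats };
    }
  }
  return null;
}
```

A caller would typically run detectRepetition on a bounded window of recent output (for example the last few thousand characters) so the cost per chunk stays linear in the window size.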
Quick Start
Install from npm (Recommended)
# Run directly with npx
npx @dianshuv/copilot-api start
# Or install globally
npm install -g @dianshuv/copilot-api
copilot-api start
Development
# Start the server (foreground, production mode)
make up
# Stop the server (graceful shutdown)
make down
Command Reference
| Command | Description |
|---------|-------------|
| start | Start the API server (handles auth if needed) |
| auth | Run GitHub authentication flow only |
| logout | Remove stored GitHub token |
| check-usage | Show Copilot usage and quota |
| debug | Display diagnostic information |
| patch-claude | Patch Claude Code's context window limit |
Start Command Options
| Option | Description | Default |
|--------|-------------|---------|
| --port, -p | Port to listen on | 4141 |
| --host, -H | Host/interface to bind to | (all interfaces) |
| --verbose, -v | Enable verbose logging | false |
| --account-type, -a | Account type (individual, business, enterprise) | individual |
| --manual | Manual request approval mode | false |
| --no-rate-limit | Disable adaptive rate limiting | false |
| --retry-interval | Seconds to wait before retrying after rate limit | 10 |
| --request-interval | Seconds between requests in rate-limited mode | 10 |
| --recovery-timeout | Minutes before attempting recovery | 10 |
| --consecutive-successes | Successes needed to exit rate-limited mode | 5 |
| --github-token, -g | Provide GitHub token directly | none |
| --claude-code, -c | Generate Claude Code launch command | false |
| --show-token | Show tokens on fetch/refresh | false |
| --proxy-env | Use proxy from environment | false |
| --no-history | Disable request history UI at /history | false |
| --history-limit | Max history entries in memory | 1000 |
| --no-auto-truncate | Disable auto-truncate when exceeding token limits | false |
| --compress-tool-results | Compress old tool results before truncating | false |
| --redirect-anthropic | Force Anthropic through OpenAI translation | false |
| --strip-server-tools | Strip server-side tools from Anthropic requests | false |
| --context-editing | Context editing mode: off, clear-thinking, clear-tooluse, clear-both | off |
| --timezone-offset | Timezone offset in hours from UTC for log timestamps (e.g., +8, -5, 0) | +8 |
| --posthog-key | PostHog API key for token usage analytics (opt-in) | none |
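The four rate-limiting options in the table (--retry-interval, --request-interval, --recovery-timeout, --consecutive-successes) suggest a two-mode state machine. The sketch below shows one plausible reading of how they could interact. It is an assumption based only on the option descriptions, not the proxy's real logic, and the class AdaptiveLimiterSketch and all of its members are hypothetical.

```typescript
type LimiterMode = "normal" | "rate-limited";

interface LimiterOptions {
  retryIntervalSec: number; // --retry-interval
  requestIntervalSec: number; // --request-interval
  recoveryTimeoutMin: number; // --recovery-timeout
  consecutiveSuccesses: number; // --consecutive-successes
}

// Defaults copied from the options table.
const DEFAULT_LIMITER_OPTIONS: LimiterOptions = {
  retryIntervalSec: 10,
  requestIntervalSec: 10,
  recoveryTimeoutMin: 10,
  consecutiveSuccesses: 5,
};

class AdaptiveLimiterSketch {
  private mode: LimiterMode = "normal";
  private successes = 0;
  private limitedSinceMs = 0;
  private readonly options: LimiterOptions;

  constructor(options: LimiterOptions = DEFAULT_LIMITER_OPTIONS) {
    this.options = options;
  }

  currentMode(): LimiterMode {
    return this.mode;
  }

  // Called when upstream reports a rate limit. Switches to rate-limited
  // mode and returns how long to wait before retrying, in milliseconds.
  onRateLimited(nowMs: number): number {
    if (this.mode === "normal") {
      this.limitedSinceMs = nowMs;
    }
    this.mode = "rate-limited";
    this.successes = 0;
    return this.options.retryIntervalSec * 1000;
  }

  // Called after a successful request. Enough successes in a row end
  // rate-limited mode.
  onSuccess(): void {
    if (this.mode !== "rate-limited") return;
    this.successes += 1;
    if (this.successes >= this.options.consecutiveSuccesses) {
      this.mode = "normal";
      this.successes = 0;
    }
  }

  // Delay to apply before sending the next request, in milliseconds.
  delayBeforeNextMs(nowMs: number): number {
    if (this.mode === "normal") return 0;
    const limitedForMs = nowMs - this.limitedSinceMs;
    if (limitedForMs >= this.options.recoveryTimeoutMin * 60 * 1000) {
      // Assumed recovery attempt: let a request through without spacing.
      return 0;
    }
    return this.options.requestIntervalSec * 1000;
  }
}
```

Passing --no-rate-limit would correspond to never constructing such a limiter at all.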
Patch-Claude Command Options
| Option | Description | Default |
|--------|-------------|---------|
| --limit, -l | Context window limit in tokens | 128000 |
| --restore, -r | Restore original 200k limit | false |
| --path, -p | Path to Claude Code cli.js | auto-detect |
| --status, -s | Show current patch status | false |
API Endpoints
OpenAI Compatible
| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/chat/completions | POST | Chat completions |
| /v1/models | GET | List available models |
| /v1/embeddings | POST | Text embeddings |
| /v1/responses | POST | Responses API (for codex models) |
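As an illustration of the OpenAI-compatible endpoints, here is a small sketch that builds (but does not send) a request for /v1/chat/completions. It assumes the proxy runs on the default port 4141 and accepts a placeholder bearer token, as the Claude Code settings elsewhere in this README suggest. buildChatCompletionRequest and HttpRequestSpec are hypothetical helper names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the pieces of an OpenAI-style chat completion request aimed at
// the local proxy. The default model name is taken from this README.
function buildChatCompletionRequest(
  messages: ChatMessage[],
  model = "gpt-4.1",
  baseUrl = "http://localhost:4141",
): HttpRequestSpec {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: any placeholder value is accepted by the proxy.
      Authorization: "Bearer dummy",
    },
    body: JSON.stringify({ model, messages, stream: false }),
  };
}
```

The returned url, method, headers, and body can be handed to fetch or any other HTTP client. Existing OpenAI SDKs can usually be pointed at the same base URL instead.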
Anthropic Compatible
| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/messages | POST | Messages API |
| /v1/messages/count_tokens | POST | Token counting |
| /v1/event_logging/batch | POST | Event logging (no-op) |
Gemini Compatible
| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1beta/models/{model}:generateContent | POST | Non-streaming generation |
| /v1beta/models/{model}:streamGenerateContent | POST | Streaming generation (SSE) |
| /v1beta/models/{model}:countTokens | POST | Token counting |
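The Gemini endpoints are described as translating Gemini API requests to OpenAI format. The sketch below shows roughly what such a translation involves for plain text conversations. It covers only a small subset of the Gemini request shape, and geminiToOpenAI, joinText, and the interfaces are hypothetical names, not the proxy's code.

```typescript
// Minimal subset of the Gemini generateContent request body.
interface GeminiPart {
  text?: string;
}
interface GeminiContent {
  role?: "user" | "model";
  parts: GeminiPart[];
}
interface GeminiRequest {
  contents: GeminiContent[];
  systemInstruction?: { parts: GeminiPart[] };
}

interface OpenAIMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface OpenAIChatRequest {
  model: string;
  messages: OpenAIMessage[];
  stream: boolean;
}

// Concatenates the text of all parts; non-text parts are ignored here.
function joinText(parts: GeminiPart[]): string {
  return parts.map((part) => part.text ?? "").join("");
}

// `model` comes from the {model} segment of the URL path; `stream` would
// be true for :streamGenerateContent and false for :generateContent.
function geminiToOpenAI(
  model: string,
  request: GeminiRequest,
  stream = false,
): OpenAIChatRequest {
  const messages: OpenAIMessage[] = [];
  if (request.systemInstruction) {
    messages.push({
      role: "system",
      content: joinText(request.systemInstruction.parts),
    });
  }
  for (const content of request.contents) {
    messages.push({
      // Gemini calls the assistant role "model".
      role: content.role === "model" ? "assistant" : "user",
      content: joinText(content.parts),
    });
  }
  return { model, messages, stream };
}
```

A complete translation would also need to handle function calls, inline data, and generation settings, which this sketch leaves out.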
Utility
| Endpoint | Method | Description |
|----------|--------|-------------|
| / | GET | Server status |
| /usage | GET | Copilot usage stats |
| /token | GET | Current Copilot token |
| /health | GET | Health check |
| /history | GET | Request history Web UI with token analytics (enabled by default) |
| /history/api/* | GET/DELETE | History API endpoints |
Auto-Truncate
When enabled (default), auto-truncate automatically compacts conversation history when it exceeds the model's token limit. This prevents request failures due to context overflow.
- Token-based truncation: Uses the model's max_context_window_tokens from the Copilot API to determine when truncation is needed. A 2% safety margin is applied.
- No preset byte limit: There is no hardcoded request body size limit. If the Copilot API returns a 413 (Request Entity Too Large), the proxy dynamically learns the byte limit and applies it to subsequent requests.
- Smart compression: With --compress-tool-results, old tool results are compressed before removing messages, preserving more conversation context.
- Orphan filtering: After truncation, orphaned tool results (without matching tool calls) are automatically removed.
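Two of the steps above, the 2% safety margin and orphan filtering, can be sketched as follows. This is an illustration under stated assumptions rather than the proxy's implementation: it assumes the margin simply lowers the token budget, that the oldest non-system messages are dropped first, and that messages use OpenAI-style tool_calls and tool_call_id fields. All function names and the countTokens callback are hypothetical.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
}

// Token budget after subtracting the safety margin (2% by default).
// Integer arithmetic avoids floating-point rounding surprises.
function effectiveTokenLimit(
  maxContextWindowTokens: number,
  marginPercent = 2,
): number {
  return Math.floor((maxContextWindowTokens * (100 - marginPercent)) / 100);
}

// Removes tool results whose originating tool call is no longer present.
function dropOrphanedToolResults(messages: Message[]): Message[] {
  const knownCallIds = new Set<string>();
  for (const message of messages) {
    for (const call of message.tool_calls ?? []) {
      knownCallIds.add(call.id);
    }
  }
  return messages.filter(
    (message) =>
      message.role !== "tool" ||
      (message.tool_call_id !== undefined &&
        knownCallIds.has(message.tool_call_id)),
  );
}

// Drops the oldest non-system messages until the conversation fits the
// budget, then removes any tool results left without a matching call.
function truncateToFit(
  messages: Message[],
  maxContextWindowTokens: number,
  countTokens: (message: Message) => number,
): Message[] {
  const limit = effectiveTokenLimit(maxContextWindowTokens);
  const kept = [...messages];
  let total = kept.reduce((sum, message) => sum + countTokens(message), 0);
  while (total > limit) {
    const index = kept.findIndex((message) => message.role !== "system");
    if (index === -1) break; // only system messages remain
    const removed = kept[index];
    if (removed === undefined) break;
    kept.splice(index, 1);
    total -= countTokens(removed);
  }
  return dropOrphanedToolResults(kept);
}
```

In this sketch the orphan filter runs last because dropping an assistant message that issued a tool call is exactly what leaves its tool result orphaned.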
Using with Claude Code
Create .claude/settings.json in your project:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:4141",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "gpt-4.1",
    "ANTHROPIC_SMALL_FAST_MODEL": "gpt-4.1",
    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  },
  "permissions": {
    "deny": ["WebSearch"]
  }
}
Or use the interactive setup:
bun run start --claude-code
Using with Gemini CLI
# Start the proxy
copilot-api start
# Configure Gemini CLI to use the proxy
export GEMINI_API_KEY="placeholder"
export GOOGLE_GEMINI_BASE_URL="http://localhost:4141"
# Basic conversation
gemini -p "Explain this code"
# Pipe review
git diff HEAD~1 | gemini -p "Review this diff for bugs"
Upstream Project
For the original project documentation, features, and updates, see: ericc-ch/copilot-api
