copilot-custom-endpoint
v1.3.14
Local proxies for VS Code Copilot custom endpoints — Kimi K2 & Qwen 3.x
GitHub Copilot Custom Endpoints
TL;DR — GitHub Copilot switched to usage-based billing on June 1, 2026. Every chat and agent session now burns AI credits — fast. This repo shows you how to plug cheaper non-GitHub models (DeepSeek, Kimi, Qwen, MiMo, MiniMax, GLM) into VS Code's Copilot chat — often 5–55× cheaper than the built-ins — while keeping agent mode, tools, streaming, and vision.
What is this?
VS Code lets you add your own language-model endpoint via a small JSON config file. Many providers advertise "OpenAI-compatible" APIs but reject the exact request shapes VS Code sends. This repo collects real, tested setups — one per provider — plus a tiny local proxy that smooths over the rough edges when needed.
If OpenRouter is blocked by your network or too generic for your model's quirks, this is the workaround.
How it works (4 steps)
- Pick a model from the table below.
- Add it to your VS Code config — copy the snippet from the model's doc.
- Set the API key through VS Code's UI (it goes to your OS keychain, not the file).
- Open chat and pick the model from the model picker.
That's it. No code, no servers to manage (unless the model specifically needs the local proxy — the table tells you).
Pick a model
| Model | Provider | Needs proxy? | Vision | Setup guide |
| ----------------------- | --------- | ---------------------- | ------------ | ----------- |
| MiMo V2 Flash | Xiaomi | No | ❌ | Setup |
| MiMo V2.5 | Xiaomi | No | ✅ | Setup |
| MiMo V2.5 Pro | Xiaomi | No | ❌ | Setup |
| Kimi K2.6 | Moonshot | Yes | ✅ | Setup |
| Qwen 3.7 Plus | DashScope | Optional | ✅ | Setup |
| Qwen 3.7 Max | DashScope | Optional | ❌ | Setup |
| MiniMax M3 | MiniMax | No | ✅ | Setup |
| GLM 5.1 | Z.ai | No | ❌ | Setup |
| GLM 5V Turbo | Z.ai | No | ✅ | Setup |
| DeepSeek V4 Pro / Flash | DeepSeek | No (uses an extension) | ✅ via proxy | Marketplace |
Setup
1. Find (or create) your config file
| OS | Path |
| ------- | ----------------------------------------------------------------- |
| Windows | %APPDATA%\Code\User\chatLanguageModels.json |
| macOS | ~/Library/Application Support/Code/User/chatLanguageModels.json |
| Linux | ~/.config/Code/User/chatLanguageModels.json |
If the file doesn't exist yet, create it with [] inside.
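If you'd rather script this step, here is a minimal TypeScript (Node.js) sketch. It resolves the per-OS path from the table above and creates the file with `[]` when the file is missing. The function names are hypothetical. It also assumes the stable VS Code build (the `Code` folder); other builds such as Insiders use a different folder name, and this sketch does not handle them.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const FILE_NAME = "chatLanguageModels.json";

// Resolve the config path for stable VS Code ("Code" folder), following the OS table above.
function chatModelsConfigPath(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = os.homedir(),
): string {
  if (platform === "win32") {
    const appData = env.APPDATA;
    if (!appData) throw new Error("APPDATA is not set");
    // Use win32 path rules so the result uses backslashes regardless of the host OS.
    return path.win32.join(appData, "Code", "User", FILE_NAME);
  }
  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Application Support", "Code", "User", FILE_NAME);
  }
  // Linux (other Unix-likes are assumed to follow the same layout).
  return path.posix.join(home, ".config", "Code", "User", FILE_NAME);
}

// Create the file containing an empty JSON array if it does not exist yet.
// Returns true when the file was created, false when it was already there.
function ensureConfigFile(file: string = chatModelsConfigPath()): boolean {
  if (fs.existsSync(file)) return false;
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, "[]\n", "utf8");
  return true;
}
```

Calling `ensureConfigFile()` with no arguments targets the current machine's path. It never overwrites an existing file, so it is safe to run more than once.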
2. Add a model entry
Open the setup guide for the model you picked (links in the table above) and copy its JSON snippet into the file. Each snippet is a single provider object inside the array.
⚠️ Leave `apiKey` as `""`. Never paste the key into the JSON file.
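To catch a key that was pasted into the file by mistake, you could run a small check over the parsed JSON. This TypeScript sketch is hypothetical. Only the `apiKey` field name comes from this guide; the walk itself is generic. It also assumes the file is plain JSON, because `JSON.parse` rejects comments and trailing commas.

```typescript
// Recursively collect the JSON paths of every non-empty string "apiKey" field.
// Only the "apiKey" field name comes from this guide; the rest of the walk is generic.
function findInlineApiKeys(value: unknown, where: string = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findInlineApiKeys(item, `${where}[${i}]`));
    });
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const here = `${where}.${key}`;
      if (key === "apiKey" && typeof child === "string" && child !== "") {
        hits.push(here); // a real secret may have been pasted here
      }
      hits.push(...findInlineApiKeys(child, here));
    }
  }
  return hits;
}
```

For example, `findInlineApiKeys(JSON.parse(text))` returns an empty array when every `apiKey` is `""`. Any path it returns points at a value that belongs in the keychain instead.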
3. Set the API key
- Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P).
- Run Chat: Manage Language Models.
- Find your provider in the list, right-click the group name → Update API Key.
- Paste your key. It's stored in your OS keychain.
4. Chat
- Open Copilot chat (Ctrl+Alt+I / Cmd+Ctrl+I).
- Click the model picker (top-right).
- Pick your model and ask something.
If a model needs a proxy, the setup guide will tell you to run a command first. Keep that terminal open while you chat.
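If you want a script to confirm that a proxy is up before you start chatting, a plain TCP connect is enough. This TypeScript sketch uses Node's `net` module. The port is deliberately left as a parameter: the port each proxy listens on comes from its setup guide or startup output, not from this sketch.

```typescript
import * as net from "node:net";

// Resolve true if something accepts TCP connections on host:port within timeoutMs.
function isListening(port: number, host: string = "127.0.0.1", timeoutMs: number = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy(); // we only care whether the connect succeeds
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}
```

For example, `isListening(PROXY_PORT).then((up) => ...)`, where `PROXY_PORT` is a hypothetical placeholder for the port your proxy reports.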
Common commands
Run from the repo root:
```bash
npm run proxy         # Start both proxies (Kimi + Qwen)
npm run proxy:kimi    # Start only the Kimi proxy
npm run proxy:qwen    # Start only the Qwen proxy
npm run clean:logs    # Remove debug_log/
npm test              # Run the test suite
```

Or run it through npx (no clone needed):

```bash
npx copilot-custom-endpoint          # Start both proxies
npx copilot-custom-endpoint kimi     # Kimi only
npx copilot-custom-endpoint qwen     # Qwen only
npx copilot-custom-endpoint clean    # Remove debug_log/
```

Pricing snapshot
All prices are USD per 1M tokens (cache miss). 1 AI credit = $0.01. MiniMax M3 figures reflect a permanent 50% off list price — see the model doc for the full rate card.
| Model | Input | Output | Context |
| ------------------------ | ----- | ------ | ------- |
| MiMo V2 Flash 🏆 | $0.10 | $0.30 | 256K |
| DeepSeek V4 Flash 🏆 | $0.14 | $0.28 | 1M |
| Kimi K2.6 (non-thinking) | $0.16 | $0.95 | 256K |
| MiniMax M3 | $0.30 | $1.20 | 1M |
| MiMo V2.5 | $0.40 | $2.00 | 1M |
| Qwen 3.7 Plus | $0.40 | $1.60 | 1M |
| MiMo V2.5 Pro | $1.00 | $3.00 | 1M |
| GLM 5V Turbo | $1.20 | $4.00 | 200K |
| GLM 5.1 | $1.40 | $4.40 | 200K |
| Qwen 3.7 Max | $2.50 | $7.50 | 1M |
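To turn these rates into a per-session estimate, multiply the token counts (in millions) by the input and output prices and add them. Dividing by $0.01 then gives the AI-credit equivalent for comparison with Copilot. This TypeScript sketch copies the cache-miss rates from the table above. It ignores cached-input discounts and any provider-specific surcharges, and these providers bill you in USD, not in Copilot credits. The credit figure is only a yardstick.

```typescript
// USD per 1M tokens (cache miss), copied from the pricing snapshot table.
interface Rate {
  input: number;
  output: number;
}

const RATES: Record<string, Rate> = {
  "MiMo V2 Flash": { input: 0.1, output: 0.3 },
  "DeepSeek V4 Flash": { input: 0.14, output: 0.28 },
  "Kimi K2.6 (non-thinking)": { input: 0.16, output: 0.95 },
  "MiniMax M3": { input: 0.3, output: 1.2 },
  "MiMo V2.5": { input: 0.4, output: 2.0 },
  "Qwen 3.7 Plus": { input: 0.4, output: 1.6 },
  "MiMo V2.5 Pro": { input: 1.0, output: 3.0 },
  "GLM 5V Turbo": { input: 1.2, output: 4.0 },
  "GLM 5.1": { input: 1.4, output: 4.4 },
  "Qwen 3.7 Max": { input: 2.5, output: 7.5 },
};

const USD_PER_CREDIT = 0.01; // 1 AI credit = $0.01

// Estimate a session's list-price cost in USD and its equivalent in AI credits.
function estimateCost(model: string, inputTokens: number, outputTokens: number): { usd: number; credits: number } {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate for model: ${model}`);
  const usd = (inputTokens / 1e6) * rate.input + (outputTokens / 1e6) * rate.output;
  return { usd, credits: usd / USD_PER_CREDIT };
}
```

As a worked example, 2M input tokens and 500K output tokens on MiMo V2 Flash cost 2 × $0.10 + 0.5 × $0.30 = $0.35, or about 35 credits. The same session on Qwen 3.7 Max costs 2 × $2.50 + 0.5 × $7.50 = $8.75, or about 875 credits.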
For the full pricing comparison (cached rates, full Copilot roster, footnotes, sources) see docs/pricing.md. For a copy-paste config containing all providers at once, see docs/example-config.md.
Companion tools
ℹ️ These are third-party tools, not built into the custom endpoints or proxies in this repo. Each one must be installed, configured, and (where applicable) billed for separately, directly with its own provider. Nothing here is bundled, proxied, or auto-configured by `copilot-custom-endpoint` or the per-model setups above. The entries below are just pointers to tools the author has found useful alongside the model configs.
These work alongside the providers above and fill gaps that VS Code's built-in tool surface doesn't cover natively.
🎬 Video Context MCP — video understanding for AI coding assistants
VS Code's built-in view_image tool only accepts static images (PNG, JPG, GIF, WebP). That's a hard wall if you want to ask an AI assistant about a screen recording, a screencast, a product demo, or any other video. Several vision-capable models in this repo actually accept video natively — but VS Code's tool pipeline never gets the chance to forward it.
Video Context MCP is a small MCP server that bridges that gap. It works with GitHub Copilot, Cursor, and Claude Code out of the box, and:
- Extracts frames from local files or remote URLs (no `ffmpeg` gymnastics required).
- Routes them through a multi-provider fallback chain: Gemini → GLM 4.6V Flash → Qwen3.7-plus → Kimi K2.6 → MiMo-V2.5.
- Answers natural-language questions about the video grounded in actual frames: "what does the speaker click in the last 30 seconds?", "summarize the demo", "find the frame where the error appears".
- Extras: timestamp search, audio transcription with speaker diarization, and video metadata (resolution, duration, codec).
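A fallback chain like the one above usually comes down to "try each provider in order and return the first success". This TypeScript sketch shows that general pattern. It is not Video Context MCP's actual code, and the `Provider` shape is a hypothetical stand-in for real API clients.

```typescript
// One step in a fallback chain: a label plus an async call that may reject.
interface Provider<T> {
  name: string;
  call: () => Promise<T>;
}

// Try each provider in order and return the first successful result.
// If every provider fails, throw a single error that lists each failure.
async function withFallback<T>(providers: Provider<T>[]): Promise<{ provider: string; result: T }> {
  const failures: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, result: await p.call() };
    } catch (err) {
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed\n${failures.join("\n")}`);
}
```

In use, each `call` would wrap one provider's client, for example `{ name: "Gemini", call: () => askGemini(frames) }`, where `askGemini` is hypothetical. Returning the provider name alongside the result makes it visible which step of the chain actually answered.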
🪣 Bitbucket MCP — secure, read-only Bitbucket access for VS Code Copilot, Cursor, and Claude Code
GitHub ships a first-party MCP server (and it's even bundled into Copilot), so asking "what's open in my org's repos / show me PR #123" works seamlessly on github.com. Bitbucket has no equivalent — Atlassian hasn't shipped one — which leaves Bitbucket Cloud users copy-pasting PR URLs, diffs, and file contents into chat by hand.
Bitbucket MCP is a small MCP server that closes that gap. It works with VS Code GitHub Copilot, Cursor, and Claude Code out of the box, and:
- 38 tools across 8 categories: repositories, pull requests, branches & commits, diffs & comparisons, CI/CD pipelines, issues, code search, and users.
- Read-only by design: `makeRequest()` blocks all non-GET requests at runtime, so no write, delete, or modify operation is possible.
- Token-friendly output: every tool supports `text`, `json`, and `toon` (Token-Oriented Object Notation) formats; `toon` cuts LLM token consumption by 30–60% on large PR/commit lists.
- JMESPath filtering on all 38 tools, so you can trim responses (e.g. only open PRs, or just title + author) before they hit the model.
- One-call PR context: `bb_get_context` bundles PR metadata, diffstat, CI statuses, and comments in a single request.
- Drop-in install: `npm install -g @tugudush/bitbucket-mcp` plus a short `.vscode/mcp.json` entry, authenticated with a Bitbucket API token and your Atlassian email.
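The "read-only by design" idea can be enforced with a single guard in front of the HTTP layer, so that a non-GET request never reaches the network. This TypeScript sketch illustrates the pattern only. The `readOnly` wrapper and `HttpRequest` type are hypothetical and are not Bitbucket MCP's `makeRequest()` implementation.

```typescript
// Minimal request shape for this sketch (hypothetical, not Bitbucket MCP's types).
interface HttpRequest {
  method?: string;
  url: string;
}

type Send<T> = (req: HttpRequest) => Promise<T>;

// Wrap any sender so that only GET requests reach it; every other method is
// rejected before any network I/O happens. A missing method defaults to GET.
function readOnly<T>(send: Send<T>): Send<T> {
  return async (req) => {
    const method = (req.method ?? "GET").toUpperCase();
    if (method !== "GET") {
      throw new Error(`Blocked ${method} ${req.url}: client is read-only`);
    }
    return send({ ...req, method });
  };
}
```

Putting the check in one wrapper, rather than in each tool, means a newly added tool cannot bypass it by accident.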
Need help?
- Per-model issues: check the troubleshooting section at the bottom of each model's doc.
- Repo questions / bugs: open an issue on GitHub.
Repo layout
```
.
├── docs/models/<provider>-<model>.md   # Per-model setup guides (the real docs)
├── proxy/                              # Local compatibility shims
├── tests/                              # Test assets
└── debug_log/                          # Runtime logs (git-ignored)
```

Want to add a new model?
- Create `docs/models/<provider>-<model>.md` with a clear walkthrough.
- If the provider needs request rewriting, add a proxy under `proxy/`.
- Submit a PR.
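When a provider rejects parts of the request VS Code sends, the heart of a compatibility shim can be a pure function over the JSON body. The HTTP plumbing around it just parses, rewrites, and forwards. This TypeScript sketch is hypothetical and is not the code in `proxy/`. Which fields to drop or override is provider-specific and must come from testing against that provider.

```typescript
// An OpenAI-style chat-completions request body, kept loosely typed on purpose.
type ChatBody = Record<string, unknown>;

interface RewriteRules {
  dropKeys: string[];   // top-level fields the upstream is assumed to reject
  overrides?: ChatBody; // fields to force on every request (assumed, provider-specific)
}

// Return a rewritten copy of the body. The copy is shallow: top-level keys are
// new, but nested values (e.g. the messages array) are shared with the input.
function rewriteChatRequest(body: ChatBody, rules: RewriteRules): ChatBody {
  const out: ChatBody = { ...body };
  for (const key of rules.dropKeys) {
    delete out[key];
  }
  return { ...out, ...(rules.overrides ?? {}) };
}
```

For example, `rewriteChatRequest(body, { dropKeys: ["parallel_tool_calls"] })` removes that one field. The key name here is illustrative only, not a claim about what any particular provider rejects. Keeping the rewrite pure makes it easy to unit-test separately from the proxy server.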
Limitations
- Chat only. Inline completions, semantic search, and next-edit suggestions still need a GitHub-hosted model.
- Each proxy is tuned for a specific provider family. Don't point the Kimi proxy at an arbitrary OpenAI-compatible endpoint.
Support
If this helped, consider sponsoring or donating:
Solana (SOL)
CWZccD3Ny3XotFZtnkcyzP3hapmu3ExknN1PF4rEvP3u