claude-nim
v1.0.10
Use NVIDIA NIM models (DeepSeek, Llama, Qwen, Mistral, 100+) with Claude Code. Translates Anthropic API to OpenAI-compatible format.
Claude-NIM Proxy
A VS Code extension that lets you use NVIDIA NIM models with Claude Code (and any Anthropic Messages API client).
It translates Anthropic Messages API requests into OpenAI-compatible requests for the NVIDIA NIM platform — no permanent config changes required.
How It Works
Claude Code     ──→   Claude-NIM Proxy  ──→  NVIDIA NIM API
(Anthropic API)       (localhost:3456)       (OpenAI-compatible)
- Install the VS Code extension and set your NVIDIA NIM API key
- Start the proxy from the VS Code status bar or command palette
- Use the "Launch Claude Code with Proxy" command to open a pre-configured terminal
- When you stop the proxy, everything reverts — zero permanent changes
Quick Start
# One command — installs, configures, and launches
npx --yes claude-nim
# Or install globally first
npm install -g claude-nim
claude-nim
Why Claude-NIM Proxy?
Compared to CLI-only proxies (claude-code-proxy, CCProxy, LiteLLM)
| Feature | Claude-NIM Proxy | CLI Proxies (Python/Go) |
|---------|-----------------|------------------------|
| VS Code integration | Status bar, commands, SecretStorage | None — manual env vars |
| One-click model selection | Browse 100+ NIM models in VS Code | Config file editing |
| Encrypted API key storage | VS Code SecretStorage (extension); AES-256-GCM (CLI) | Plaintext env vars or config files |
| Launch Claude Code | One command opens pre-configured terminal | Manual export ANTHROPIC_BASE_URL=... |
| Zero-config onboarding | Interactive key prompt, auto-detection | Requires Python/pip/Go, manual setup |
| Reactive settings | Port/timeout/cache changes apply live | Requires restart |
Compared to Claude Code Router (26k stars)
| Feature | Claude-NIM Proxy | Claude Code Router |
|---------|-----------------|-------------------|
| Setup | npm install + VS Code command | npm install + config.json + ccr start |
| VS Code integration | Full (status bar, commands, model browser) | None |
| Model-family adapters | 12 adapters fixing per-model quirks | Generic passthrough |
| Security | Prompt injection scrubbing, context pruning, body limits | None |
| Test coverage | 32 tests + 100-stream stress test | Minimal |
| Language | TypeScript (only runtime dependency: jsonrepair) | Node.js + config.json |
| Reasoning toggle | Control <think> visibility from VS Code | Not available |
Compared to CCProxy (Orchestre)
| Feature | Claude-NIM Proxy | CCProxy |
|---------|-----------------|---------|
| Language | TypeScript (zero deps besides jsonrepair) | Go (binary) |
| VS Code integration | Full extension | CLI only |
| Model adapters | 12 model-family adapters | Generic |
| Security | Prompt injection defense, context pruning | None |
| CLI mode | Standalone npx claude-nim with encrypted keys | Requires config file |
| Test coverage | 32 tests + stress test | None public |
What Makes This Different
1. Model-Family Adapters (Unique)
12 built-in adapters that fix per-model quirks so you don't have to:
| Adapter | What it fixes |
|---------|--------------|
| DeepSeek R1/V3 | Disables tool_choice: any, caps temperature 0.6 |
| Llama 3.x/4.x | Caps temperature 0.7, enables structured JSON outputs |
| Qwen 2.5/3 | Caps temperature 0.7, sets stop tokens |
| Kimi K2.x | Caps temperature 0.6 |
| Nemotron Ultra/Super | Caps temperature 0.6, enforces max_tokens |
| Mistral, Phi, Gemma, Command-R | Model-specific handling |
Why it matters: Generic proxies pass parameters through unchanged, which can cause hallucinations, tool call failures, or crashes on incompatible models.
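To make the adapter idea concrete, here is a minimal sketch of how a per-family rule might be applied to an outgoing OpenAI-style request. It is not the extension's code: ModelAdapter, ADAPTERS, and applyAdapter are hypothetical names, the id-matching regexes are assumptions about NIM model ids, and only the temperature caps and the DeepSeek tool_choice restriction from the table are modeled (Anthropic's "any" corresponds to OpenAI's "required").

```typescript
// Hypothetical types and names; not taken from the extension's source.
interface OpenAIRequest {
  model: string;
  temperature?: number;
  tool_choice?: "auto" | "required" | "none" | { type: "function"; function: { name: string } };
  [key: string]: unknown; // other fields pass through untouched
}

interface ModelAdapter {
  family: string;
  matches: RegExp; // tested against the NIM model id (assumed id formats)
  maxTemperature?: number;
  allowRequiredToolChoice?: boolean; // false => downgrade "required" to "auto"
}

// First match wins, so Nemotron comes before Llama: its ids also contain "llama".
const ADAPTERS: ModelAdapter[] = [
  { family: "deepseek", matches: /deepseek-(r1|v3)/i, maxTemperature: 0.6, allowRequiredToolChoice: false },
  { family: "nemotron", matches: /nemotron/i, maxTemperature: 0.6 },
  { family: "kimi", matches: /kimi-k2/i, maxTemperature: 0.6 },
  { family: "qwen", matches: /qwen(2\.5|3)/i, maxTemperature: 0.7 },
  { family: "llama", matches: /llama-?[34]/i, maxTemperature: 0.7 },
];

function applyAdapter(req: OpenAIRequest): OpenAIRequest {
  const adapter = ADAPTERS.find((a) => a.matches.test(req.model));
  if (!adapter) return req; // unknown family: pass through unchanged
  const out: OpenAIRequest = { ...req };
  if (adapter.maxTemperature !== undefined && out.temperature !== undefined) {
    out.temperature = Math.min(out.temperature, adapter.maxTemperature); // cap, never raise
  }
  if (adapter.allowRequiredToolChoice === false && out.tool_choice === "required") {
    out.tool_choice = "auto";
  }
  return out;
}
```

Ordering matters because the first matching rule wins; an id such as nvidia/llama-3.1-nemotron-ultra-253b-v1 contains "llama", so the Nemotron rule is checked before the Llama one.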
2. Full Anthropic Content Type Translation
| Content Type | Our Handling | Competitors |
|---|---|---|
| text blocks | Direct mapping | Basic |
| tool_use blocks | Converted to OpenAI tool_calls | Partial |
| tool_result blocks | Converted with tool_call_id | Often broken |
| Image (base64) | Converted to image_url data URI | Not handled |
| Mixed text+tool results | Split into separate messages | Crashes |
| tool_result with is_error | [ERROR] prefix preserved | Lost |
| system prompt (string/array) | Converted to system message | Basic |
| tool_choice (auto/any/tool) | Full mapping to OpenAI equivalents | auto only |
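The mapping in the table can be sketched as two pure functions, assuming the public field names of the Anthropic Messages API and the OpenAI Chat Completions API. translateMessage and translateToolChoice are illustrative names, not the proxy's exports, and system prompts are left out for brevity.

```typescript
// Subset of Anthropic Messages API content blocks (field names as in the public API).
type TextBlock = { type: "text"; text: string };
type AnthropicBlock =
  | TextBlock
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string | TextBlock[]; is_error?: boolean };
interface AnthropicMessage { role: "user" | "assistant"; content: string | AnthropicBlock[] }

// Subset of the OpenAI chat-completions message shapes.
type OpenAIPart = TextBlock | { type: "image_url"; image_url: { url: string } };
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: ToolCall[] };
type OpenAIMessage =
  | { role: "user"; content: OpenAIPart[] }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

function translateMessage(msg: AnthropicMessage): OpenAIMessage[] {
  const blocks: AnthropicBlock[] =
    typeof msg.content === "string" ? [{ type: "text", text: msg.content }] : msg.content;

  if (msg.role === "assistant") {
    const message: AssistantMessage = { role: "assistant", content: null };
    const text = blocks.filter((b): b is TextBlock => b.type === "text").map((b) => b.text).join("");
    if (text) message.content = text;
    const toolCalls: ToolCall[] = [];
    for (const b of blocks) {
      if (b.type === "tool_use") {
        // OpenAI expects tool arguments as a JSON string, not an object.
        toolCalls.push({ id: b.id, type: "function", function: { name: b.name, arguments: JSON.stringify(b.input) } });
      }
    }
    if (toolCalls.length > 0) message.tool_calls = toolCalls;
    return [message];
  }

  // User turn: each tool_result becomes its own "tool" message, emitted before
  // any text/image parts so they directly follow the assistant's tool_calls.
  const out: OpenAIMessage[] = [];
  const parts: OpenAIPart[] = [];
  for (const b of blocks) {
    if (b.type === "tool_result") {
      const text = typeof b.content === "string" ? b.content : b.content.map((c) => c.text).join("\n");
      out.push({ role: "tool", tool_call_id: b.tool_use_id, content: b.is_error ? `[ERROR] ${text}` : text });
    } else if (b.type === "text") {
      parts.push(b);
    } else if (b.type === "image") {
      parts.push({ type: "image_url", image_url: { url: `data:${b.source.media_type};base64,${b.source.data}` } });
    }
  }
  if (parts.length > 0) out.push({ role: "user", content: parts });
  return out;
}

type AnthropicToolChoice = { type: "auto" } | { type: "any" } | { type: "none" } | { type: "tool"; name: string };
type OpenAIToolChoice = "auto" | "required" | "none" | { type: "function"; function: { name: string } };

function translateToolChoice(tc: AnthropicToolChoice): OpenAIToolChoice {
  switch (tc.type) {
    case "auto": return "auto";
    case "any": return "required"; // model must call some tool
    case "none": return "none";
    case "tool": return { type: "function", function: { name: tc.name } };
  }
}
```

Emitting the tool messages first keeps them adjacent to the assistant message that issued the tool_calls, which OpenAI-compatible servers expect.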
3. Security (Unique Among Proxies)
No other proxy in the ecosystem includes:
- Prompt injection scrubbing: Neutralizes "ignore previous instructions" and "you are now" patterns before they reach the model
- Context pruning: Auto-trims large tool outputs (over 100K chars) to prevent context overflow
- 10 MB request body limit: Prevents memory exhaustion
- Unicode sanitization: Strips U+FFFD replacement characters, which signal encoding corruption
- Localhost-only binding: The server is never exposed to the network
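A minimal sketch of these checks as plain string functions. The 10 MB and 100K-character thresholds come from the list above; the regex patterns, the [filtered] marker, and keeping the head and tail of pruned output are assumptions, and all function names are hypothetical.

```typescript
// Thresholds from the README; everything else here is an illustrative assumption.
const MAX_BODY_BYTES = 10 * 1024 * 1024; // "10 MB", assumed to mean MiB
const MAX_TOOL_OUTPUT_CHARS = 100_000;   // context-pruning threshold

const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/gi,
  /you are now/gi,
];

function scrubInjection(text: string): string {
  return INJECTION_PATTERNS.reduce((acc, re) => acc.replace(re, "[filtered]"), text);
}

function stripReplacementChars(text: string): string {
  return text.replace(/\uFFFD/g, ""); // drop U+FFFD left behind by bad decoding
}

function pruneToolOutput(text: string, limit = MAX_TOOL_OUTPUT_CHARS): string {
  if (text.length <= limit) return text;
  const half = Math.floor(limit / 2);
  const omitted = text.length - 2 * half;
  // Keep the start and the end; drop the middle with a visible marker.
  return `${text.slice(0, half)}\n[... ${omitted} chars pruned ...]\n${text.slice(text.length - half)}`;
}

function bodyWithinLimit(byteLength: number): boolean {
  return byteLength <= MAX_BODY_BYTES;
}

// Order: fix encoding first, then scrub, then trim to size.
function sanitizeToolOutput(text: string): string {
  return pruneToolOutput(scrubInjection(stripReplacementChars(text)));
}
```

Keeping both ends of a long tool output preserves the command echo at the top and the final status or error at the bottom, which are usually the parts the model needs.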
4. VS Code Integration (Unique)
No CLI-only proxy offers:
- Status bar — Click to toggle proxy, shows running/stopped state with port
- API key management — Stored in VS Code SecretStorage (never plaintext)
- Model browser — Fetches 100+ NIM models, shows context windows, one-click selection
- One-click launch — Opens a pre-configured terminal ready for Claude Code
- Reactive settings — Port, timeout, cache TTL changes apply without restart
- Debug logging — Toggle and view proxy logs from VS Code
5. Standalone CLI
npx claude-nim # Interactive setup
npx claude-nim --port 8080 --model deepseek-r1 # Explicit config
Features:
- AES-256-GCM encrypted key storage: Machine-specific encryption key
- Dynamic port selection: Falls back to an ephemeral port if 3456 is busy
- Interactive onboarding: Prompts for the API key if none is stored
- Claude auto-detection: Checks whether the claude CLI is installed and offers to install it
- Zombie-free teardown: SIGINT/SIGTERM handlers kill all child processes
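AES-256-GCM with a machine-bound key can be sketched with Node's built-in crypto module. How the CLI derives its "machine-specific" key is not documented here, so this sketch assumes an scrypt derivation from the hostname and home directory; the EncryptedKey layout and function names are likewise hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";
import { homedir, hostname } from "node:os";

// Assumption: key material = hostname + home directory. The real CLI may differ.
function deriveMachineKey(salt: Buffer): Buffer {
  return scryptSync(`${hostname()}:${homedir()}`, salt, 32); // 32 bytes = AES-256 key
}

interface EncryptedKey { salt: string; iv: string; tag: string; data: string }

function encryptApiKey(apiKey: string): EncryptedKey {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveMachineKey(salt), iv);
  const data = Buffer.concat([cipher.update(apiKey, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // 16-byte authentication tag
    data: data.toString("base64"),
  };
}

function decryptApiKey(enc: EncryptedKey): string {
  const key = deriveMachineKey(Buffer.from(enc.salt, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(enc.iv, "base64"));
  decipher.setAuthTag(Buffer.from(enc.tag, "base64")); // final() throws if data or tag was altered
  return Buffer.concat([decipher.update(Buffer.from(enc.data, "base64")), decipher.final()]).toString("utf8");
}
```

Because the key is re-derived rather than stored, a copied ciphertext will not decrypt on another machine, and GCM's authentication tag makes decryption fail loudly if the stored data is altered. It does not protect against other code running as the same user, which can derive the same key.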
6. 7-Layer Error Handling
- HTTP status mapping (AUTH_FAILED → 401, RATE_LIMITED → 429)
- SSE error events in the stream
- VS Code error notifications
- Retry with exponential backoff (respects Retry-After headers)
- Configurable stream idle timeout
- 10 MB body size limit
- JSON parse error messages with context
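Two of these layers, backoff that honors Retry-After and in-stream SSE errors, can be sketched as pure functions. The base delay, the cap, and the UPSTREAM_ERROR code with its 502 status are assumptions; the error type strings follow the Anthropic API's published error types.

```typescript
// Hypothetical constants: the README does not publish the base delay or cap.
const BASE_DELAY_MS = 500;
const MAX_BACKOFF_MS = 30_000;

// Retry-After is either delta-seconds ("120") or an HTTP-date (RFC 9110).
function parseRetryAfterMs(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  return Number.isNaN(dateMs) ? null : Math.max(0, dateMs - nowMs);
}

function retryDelayMs(attempt: number, retryAfter: string | null, nowMs = Date.now()): number {
  const fromHeader = parseRetryAfterMs(retryAfter, nowMs);
  if (fromHeader !== null) return fromHeader; // the server's hint wins over our schedule
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_BACKOFF_MS); // 500, 1000, 2000, ...
}

// Status mapping from the list above; UPSTREAM_ERROR -> 502 is a hypothetical extra case.
type ProxyErrorCode = "AUTH_FAILED" | "RATE_LIMITED" | "UPSTREAM_ERROR";
const HTTP_STATUS: Record<ProxyErrorCode, number> = { AUTH_FAILED: 401, RATE_LIMITED: 429, UPSTREAM_ERROR: 502 };
const ANTHROPIC_ERROR_TYPE: Record<ProxyErrorCode, string> = {
  AUTH_FAILED: "authentication_error",
  RATE_LIMITED: "rate_limit_error",
  UPSTREAM_ERROR: "api_error",
};

// Once SSE headers are sent the status code can't change, so errors go in-band.
function sseErrorFrame(code: ProxyErrorCode, message: string): string {
  const payload = { type: "error", error: { type: ANTHROPIC_ERROR_TYPE[code], message } };
  return `event: error\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

The SSE frame matters because once a streaming response has started, its HTTP status has already been sent, so a failure partway through can only be reported inside the stream.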
7. Production-Ready Infrastructure
- Zero runtime dependencies (except jsonrepair)
- 32 tests, including a 100-concurrent-stream stress test
- TypeScript with full type safety
- CI/CD with GitHub Actions (build, lint, test, auto-package VSIX)
- Force-close connections — Tracks and destroys all sockets on stop (no hanging)
- CORS support — Works with browser-based clients
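Force-closing on stop is needed because Node's server.close() stops accepting new connections but waits for existing keep-alive sockets to end. A sketch, assuming Node's built-in http module; createStoppableServer is a hypothetical name.

```typescript
import { createServer, type RequestListener, type Server } from "node:http";
import type { Socket } from "node:net";

// Track every open socket so stop() can destroy idle keep-alive connections
// that would otherwise keep server.close() from completing.
function createStoppableServer(handler: RequestListener): {
  server: Server;
  stop: () => Promise<void>;
  openSockets: () => number;
} {
  const server = createServer(handler);
  const sockets = new Set<Socket>();

  server.on("connection", (socket: Socket) => {
    sockets.add(socket);
    socket.once("close", () => sockets.delete(socket));
  });

  const stop = () =>
    new Promise<void>((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
      for (const socket of sockets) socket.destroy(); // no hanging on lingering clients
    });

  return { server, stop, openSockets: () => sockets.size };
}
```

Start it with server.listen(3456, "127.0.0.1") to keep the localhost-only binding, and await stop() on shutdown. On Node 18.2 and later, http.Server also provides closeAllConnections(), which achieves the same effect without manual tracking.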
VS Code Commands
| Command | Description |
|---------|-------------|
| Claude NIM Proxy: Manage NVIDIA NIM API Key | Set, update, or clear your API key |
| Claude NIM Proxy: Toggle Proxy Server | Start or stop the proxy |
| Claude NIM Proxy: Toggle Debug Logging | Enable/disable debug output |
| Claude NIM Proxy: Open Debug Log | View proxy logs |
| Claude NIM Proxy: Select Default Model | Browse and select a default NIM model |
| Claude NIM Proxy: Launch Claude Code with Proxy | Open a terminal ready to use Claude Code |
| Claude NIM Proxy: Toggle Show Reasoning | Show/hide model thinking (<think>) output |
Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| nvidia-nim.proxyPort | 3456 | Port for the proxy server |
| nvidia-nim.defaultModel | "" | Default model (empty = require in request) |
| nvidia-nim.modelsCacheTTL | 5 | Model cache TTL in minutes |
| nvidia-nim.requestTimeout | 120 | Stream idle timeout in seconds |
Supported Models
Any model available on build.nvidia.com, including:
- DeepSeek R1, V3, V4
- Llama 3.x, 4.x
- Mistral Large, Medium
- Qwen 3, 2.5
- Kimi K2.x
- Nemotron Ultra, Super
- Gemma 3
- Phi 4
- Command-R+
- And 100+ more
Dynamic Model Switching
Switch models on-the-fly without restarting the proxy or Claude Code.
From Claude Code's /model Command
/model deepseek-r1 # Switch to DeepSeek R1
/model #3 # Select model #3 from /models list
/model # Show available models
API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/model | GET | Get current model |
| /api/model | POST | Set model (body: {"model": "deepseek-r1"}) |
| /api/models | GET | List all available NIM models |
| /api/key | POST | Update API key (body: {"apiKey": "nvapi-..."}) |
| /api/metrics | GET | SSE stream of real-time request metrics |
| /api/metrics/history | GET | Historical metrics (last 1000 requests) |
| /api/stats | GET | Aggregate stats (total requests, tokens, latency) |
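Both the /model command and POST /api/model take a model argument. The sketch below shows one hypothetical way to resolve the forms used in the /model examples: "#3" as a 1-based index into the model list, a full NIM id, or a short alias such as deepseek-r1. The 1-based indexing and the alias rule are assumptions, not documented behavior.

```typescript
// Hypothetical resolver; returns undefined for unknown or ambiguous input.
function resolveModelArg(arg: string, models: readonly string[]): string | undefined {
  const trimmed = arg.trim();
  const indexMatch = /^#(\d+)$/.exec(trimmed);
  if (indexMatch) {
    return models[Number(indexMatch[1]) - 1]; // "#1" is the first entry (assumed)
  }
  if (models.includes(trimmed)) return trimmed; // exact NIM id
  // Short alias: compare against the part after the publisher prefix ("deepseek-ai/...").
  const byAlias = models.filter((m) => m.split("/").pop()?.toLowerCase() === trimmed.toLowerCase());
  return byAlias.length === 1 ? byAlias[0] : undefined;
}
```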
Persistence
Model state and metrics persist across proxy restarts via ~/.claude-nim/:
- state.json: Current model, cache TTL
- metrics.jsonl: Request history (auto-rotated at 2 MB)
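A minimal sketch of the metrics.jsonl behavior (one JSON object per line, size-based rotation) using Node's fs module. The single backup generation (metrics.jsonl.1) and the record fields are assumptions; only the ~/.claude-nim/ location and the 2 MB threshold come from the text above.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync, renameSync, rmSync, statSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

// Location from the README; the rotation scheme (keep one "<file>.1") is an assumption.
const METRICS_FILE = join(homedir(), ".claude-nim", "metrics.jsonl");
const ROTATE_AT_BYTES = 2 * 1024 * 1024; // "auto-rotated at 2 MB" (MiB assumed)

// Append one request record as a single JSON line, rotating first if the file is full.
function appendMetric(record: Record<string, unknown>, file = METRICS_FILE, rotateAt = ROTATE_AT_BYTES): void {
  mkdirSync(dirname(file), { recursive: true });
  if (existsSync(file) && statSync(file).size >= rotateAt) {
    rmSync(`${file}.1`, { force: true }); // drop the previous generation
    renameSync(file, `${file}.1`);
  }
  appendFileSync(file, JSON.stringify(record) + "\n", "utf8");
}

// Read records back, e.g. readMetrics().slice(-1000) for a "last 1000 requests" view.
function readMetrics(file = METRICS_FILE): Record<string, unknown>[] {
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}
```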
Dashboard
Open the dashboard in your browser:
http://127.0.0.1:3456/dashboard
Features:
- Real-time packet animation — See requests flow between Claude Code ↔ Proxy ↔ NIM
- Live stats — Request count, total tokens, average latency, peak tokens/sec, uptime
- Model selector — Browse and switch between all available NIM models
- API key management — Update your key without restarting
- Request history — Table of all requests with model, token counts, latency, status
- Live logs — Color-coded real-time proxy log stream
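GET /api/metrics is described as an SSE stream, so a client such as the dashboard has to split the stream into events. This is a minimal incremental parser, assuming LF line endings and one JSON payload per event; the metric fields are not documented here, so payloads stay typed as unknown, and SseParser is a hypothetical name.

```typescript
interface SseEvent { event: string; data: unknown }

// Incremental text/event-stream parser (assumes "\n" line endings).
class SseParser {
  private buffer = "";

  // Feed raw text chunks; returns every event completed by this chunk.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let sep: number;
    while ((sep = this.buffer.indexOf("\n\n")) !== -1) {
      const block = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      let event = "message"; // SSE default event type
      const dataLines: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
        // lines starting with ":" are comments/heartbeats and are ignored
      }
      if (dataLines.length > 0) events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
    return events;
  }
}
```

Decoded chunks from the response body (for example via TextDecoder) can be passed to push() as they arrive; partial events stay buffered until their terminating blank line.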
ANTHROPIC_CUSTOM_MODEL_OPTION
The proxy injects the ANTHROPIC_CUSTOM_MODEL_OPTION environment variable when launching Claude Code, which makes all NIM models appear in Claude Code's native model picker.
Without this env var, Claude Code filters models to only show ones matching /^(claude|anthropic)/i. The custom option bypasses this filter.
// Example ANTHROPIC_CUSTOM_MODEL_OPTION value
[
{"value":"deepseek-r1","label":"DeepSeek R1","description":"via NVIDIA NIM"},
{"value":"meta/llama-3.3-70b-instruct","label":"Llama 3.3 70B","description":"via NVIDIA NIM"}
]
This is set automatically when using "Launch Claude Code with Proxy" or the npx claude-nim CLI.
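Launching Claude Code "with the proxy" amounts to starting it with extra environment variables. A sketch of building them, assuming ANTHROPIC_BASE_URL (the variable named in the comparison table) points at the proxy and that ANTHROPIC_CUSTOM_MODEL_OPTION takes the JSON array shown above; buildLaunchEnv, NimModel, and CLAUDE_CODE_FILTER are hypothetical names.

```typescript
interface NimModel { id: string; label: string }
interface ModelOption { value: string; label: string; description: string }

// Restates the filter described above, to show why NIM ids are hidden without the option.
const CLAUDE_CODE_FILTER = /^(claude|anthropic)/i;

function buildLaunchEnv(port: number, models: NimModel[]): Record<string, string> {
  const options: ModelOption[] = models.map((m) => ({
    value: m.id,
    label: m.label,
    description: "via NVIDIA NIM",
  }));
  return {
    ANTHROPIC_BASE_URL: `http://127.0.0.1:${port}`, // route Claude Code through the proxy
    ANTHROPIC_CUSTOM_MODEL_OPTION: JSON.stringify(options),
  };
}
```

Passing { ...process.env, ...buildLaunchEnv(3456, models) } as the env option of child_process.spawn applies the variables only to that child process, which is how a launch can leave no permanent configuration behind.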
Limitations
- Claude Code's native model picker shows NIM models only when Claude Code is launched via this proxy's terminal/CLI (requires ANTHROPIC_CUSTOM_MODEL_OPTION)
- Tool use works only for models that support it (DeepSeek R1, Llama 3.x/4.x, etc.); unsupported models may produce malformed tool calls
- Streaming requires the model to support SSE; some older NIM models may not stream correctly
- Max tokens is capped at 16K for most models; some support up to 32K
- Images are sent as base64 data URIs — the model must support vision
- Rate limiting is handled by retry with backoff, but aggressive usage may hit NIM quotas
- Reasoning/thinking toggle only affects display; the model still generates reasoning tokens
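Because the reasoning toggle only changes display, it can be implemented as a filter on the outgoing text. The sketch below is one hypothetical approach: it removes <think>...</think> spans from a text stream and holds back a possible partial tag at the end of each chunk, since tags can be split across streamed chunks. ThinkFilter and partialTagSuffix are illustrative names.

```typescript
// Hides <think>...</think> text when reasoning display is off. The model still
// generates those tokens; they are only dropped from what the client sees.
class ThinkFilter {
  private readonly showReasoning: boolean;
  private inThink = false;
  private carry = ""; // possible partial tag held over from the previous chunk

  constructor(showReasoning: boolean) {
    this.showReasoning = showReasoning;
  }

  push(chunk: string): string {
    if (this.showReasoning) return chunk;
    let text = this.carry + chunk;
    this.carry = "";
    let out = "";
    while (text.length > 0) {
      const tag = this.inThink ? "</think>" : "<think>";
      const idx = text.indexOf(tag);
      if (idx !== -1) {
        if (!this.inThink) out += text.slice(0, idx); // visible text before <think>
        text = text.slice(idx + tag.length);
        this.inThink = !this.inThink;
        continue;
      }
      const keep = partialTagSuffix(text, tag);
      if (!this.inThink) out += text.slice(0, text.length - keep);
      this.carry = text.slice(text.length - keep);
      break;
    }
    return out;
  }

  // Call at end of stream; an unterminated <think> block stays hidden.
  flush(): string {
    const rest = this.inThink ? "" : this.carry;
    this.carry = "";
    return rest;
  }
}

// Length of the longest suffix of `text` that is a proper prefix of `tag`.
function partialTagSuffix(text: string, tag: string): number {
  for (let n = Math.min(tag.length - 1, text.length); n > 0; n--) {
    if (tag.startsWith(text.slice(text.length - n))) return n;
  }
  return 0;
}
```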
Build
npm install
npm run compile # Compile TypeScript → out/
npm run test # Run 32 tests
npm run lint # ESLint check
npm run package:vsix # Package as .vsix
Contributing
Contributions are welcome. Please open an issue or submit a pull request at github.com/claude-server/claude-nim.
License
MIT — see LICENSE for details.
Author
Rithika Liyanage — github.com/k-rithik04
