@dev-mn/budgetai
v1.0.4
Claude Code × NVIDIA NIM — Free AI Coding, Zero Compromise
⚡ budgetai
AI Cost Saver for Coding — Route Claude Code to NVIDIA NIM for free
🚀 Drop-in proxy that routes Claude Code CLI to NVIDIA NIM's free inference backend.
Get 40 req/min of production-grade AI code completion — no Anthropic subscription required.
Quick Start · Configuration · Models · Troubleshooting
💸 The Story Behind This
I've been deep in the AI coding tools rabbit hole for a while now.
Started with GitHub Copilot, moved to Cursor, tried Claude Code with max mode, bounced between a dozen tools chasing the "best" experience. At some point I stopped counting subscriptions and just... paid. Month after month.
Then the bill came. $5,000+ in API token costs. Not including subscriptions.
That was the moment I sat down and asked myself: do I actually need to be paying this much?
So I started researching. Digging through free tiers, open-weight models, inference providers. Turns out NVIDIA NIM offers surprisingly capable models completely free — with a real API, low latency, and no credit card required for the free tier.
budgetai is what came out of that research. It's a proxy that makes Claude Code CLI talk to NIM's backend instead of Anthropic's — so you get the same familiar UX, for free.
⚠️ Honest disclaimer: This is not a perfect solution. It's a work in progress. The models aren't Claude, the quality varies, and there are rate limits. I'm actively researching better approaches and will keep improving this. If you're hitting limits or have ideas, open an issue — let's figure it out together.
✨ Features
| Feature | Details |
|---|---|
| 🎯 Drop-in Replacement | Fully Anthropic API-compatible — Claude Code works as-is |
| 🆓 Free Inference | NVIDIA NIM free tier: 40 req/min, no credit card needed |
| 🔀 Per-Model Routing | Route Opus / Sonnet / Haiku to different NIM models |
| ⚡ Trivial Request Optimization | 5 categories intercepted locally — zero latency, zero quota |
| 🛡️ Smart Rate Limiting | Rolling window throttle + automatic 429 handling |
| 🧠 Thinking Token Support | Parses <think> tags into native Anthropic-style blocks |
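The thinking-token feature in the table above can be illustrated with a small sketch. This is not the proxy's actual parser. It assumes well-formed, non-nested `<think>` tags in a complete (non-streamed) response. The `ContentBlock` type and `splitThinking` function are hypothetical names, loosely modeled on Anthropic-style content blocks.

```typescript
// Hypothetical block shapes, loosely modeled on Anthropic-style content blocks.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

// Split raw model output into thinking and text blocks.
// Assumption: <think>...</think> tags are well-formed and never nested.
function splitThinking(raw: string): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  const pattern = /<think>([\s\S]*?)<\/think>/g;
  let cursor = 0;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(raw)) !== null) {
    // Any text before the tag becomes a plain text block.
    const before = raw.slice(cursor, match.index).trim();
    if (before) blocks.push({ type: "text", text: before });

    // The tag's contents become a thinking block.
    const thought = (match[1] ?? "").trim();
    if (thought) blocks.push({ type: "thinking", thinking: thought });

    cursor = match.index + match[0].length;
  }

  // Whatever follows the last tag is the visible answer.
  const rest = raw.slice(cursor).trim();
  if (rest) blocks.push({ type: "text", text: rest });
  return blocks;
}
```

A streaming implementation would need to handle tags split across chunks, which this sketch deliberately ignores.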
📦 Installation
# Recommended: install globally from npm
npm install -g @dev-mn/budgetai
# Verify
budgetai --version
# Alternative: build from source
git clone https://github.com/dev-mn/budgetai.git
cd budgetai
npm install && npm run build
🚀 Quick Start
1 — Get your NVIDIA API key
Head to build.nvidia.com/settings/api-keys and create a free key.
2 — Initialize config
budgetai init
This creates ~/.config/budgetai/.env. Open it and paste your key:
NVIDIA_NIM_API_KEY=nvapi-xxxxxxxxxxxxxxxxxxxxxxx
3 — Start the proxy
budgetai start
✅ Claude Code NIM Proxy running on port 8082
Model : nvidia_nim/z-ai/glm4.7
Base URL : https://integrate.api.nvidia.com/v1
export ANTHROPIC_BASE_URL=http://localhost:8082
Then run: claude
4 — Run Claude Code
Open a second terminal:
export ANTHROPIC_BASE_URL=http://localhost:8082
claude
Windows (PowerShell)
$env:ANTHROPIC_BASE_URL="http://localhost:8082"
claude
VSCode Extension
- Open Settings → search claude-code.environmentVariables
- Click Edit in settings.json and add:
"claudeCode.environmentVariables": [
  { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" }
]
- Reload extensions. Done! ✅
⚙️ Configuration
All config lives in ~/.config/budgetai/.env:
| Variable | Default | Description |
|---|---|---|
| NVIDIA_NIM_API_KEY | required | Your key from build.nvidia.com |
| NIM_MODEL | nvidia_nim/z-ai/glm4.7 | Default model |
| NIM_BASE_URL | https://integrate.api.nvidia.com/v1 | NIM endpoint |
| PORT | 8082 | Local proxy port |
| RATE_LIMIT | 40 | Requests per minute |
| RATE_LIMIT_WINDOW | 60 | Window in seconds |
| ENABLE_THINKING | true | Parse thinking tokens |
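To make the RATE_LIMIT and RATE_LIMIT_WINDOW settings concrete, here is a minimal sketch of a rolling-window throttle. It shows the general technique, not budgetai's internal implementation. The class name `RollingWindowLimiter` and its `tryAcquire` method are hypothetical.

```typescript
// Rolling-window limiter: at most `limit` requests in any `windowSeconds` span.
// Hypothetical sketch; the proxy's real throttle may work differently.
class RollingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number, windowSeconds: number) {
    this.limit = limit;
    this.windowMs = windowSeconds * 1000;
  }

  // Returns 0 and records the request if it is allowed now;
  // otherwise returns how many milliseconds to wait before retrying.
  tryAcquire(now: number = Date.now()): number {
    // Keep only the timestamps still inside the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);

    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }
    // The oldest request leaving the window frees the next slot.
    const oldest = this.timestamps[0] ?? now;
    return oldest + this.windowMs - now;
  }
}

// Mirrors the documented defaults: RATE_LIMIT=40, RATE_LIMIT_WINDOW=60.
const limiter = new RollingWindowLimiter(40, 60);
```

A caller would sleep for the returned number of milliseconds before retrying, which matches the idea of waiting for the window to reset.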
Per-Model Routing
Map each Claude tier to a different NIM model:
MODEL_OPUS="nvidia_nim/minimaxai/minimax-m2.5"
MODEL_SONNET="nvidia_nim/qwen/qwen3.5-397b-a17b"
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"
🧩 NVIDIA NIM Models
| Model | Tag | Notes |
|---|---|---|
| GLM 4.7 | nvidia_nim/z-ai/glm4.7 | ⚡ Default — fast & reliable |
| MiniMax M2.5 | nvidia_nim/minimaxai/minimax-m2.5 | 🏆 High quality |
| Qwen 3.5 397B | nvidia_nim/qwen/qwen3.5-397b-a17b | 🔥 Largest model |
| Kimi K2.5 | nvidia_nim/moonshotai/kimi-k2.5 | 🌙 Great for long context |
| Step 3.5 Flash | nvidia_nim/stepfun-ai/step-3.5-flash | ⚡ Fastest alternative |
Browse the full catalog → build.nvidia.com/explore/discover
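The per-model routing described under Configuration can be pictured as a lookup from the Claude tier to a NIM tag. The sketch below is an assumption about how such a mapping could work, not the proxy's code. In particular, matching on a substring of the requested model name is a guess, and `RoutingConfig` and `resolveNimModel` are hypothetical names.

```typescript
// Hypothetical routing table mirroring NIM_MODEL, MODEL_OPUS, MODEL_SONNET, MODEL_HAIKU.
interface RoutingConfig {
  defaultModel: string;
  opus?: string;
  sonnet?: string;
  haiku?: string;
}

// Pick a NIM model tag for the model name Claude Code asked for.
// Assumption: the tier is detected by substring match on the requested name.
function resolveNimModel(requested: string, cfg: RoutingConfig): string {
  const name = requested.toLowerCase();
  if (name.includes("opus")) return cfg.opus ?? cfg.defaultModel;
  if (name.includes("sonnet")) return cfg.sonnet ?? cfg.defaultModel;
  if (name.includes("haiku")) return cfg.haiku ?? cfg.defaultModel;
  return cfg.defaultModel;
}

// Example values taken from the tables above.
const routing: RoutingConfig = {
  defaultModel: "nvidia_nim/z-ai/glm4.7",
  opus: "nvidia_nim/minimaxai/minimax-m2.5",
  sonnet: "nvidia_nim/qwen/qwen3.5-397b-a17b",
};
```

Unmapped tiers fall back to the default model, so leaving MODEL_HAIKU unset in this sketch routes Haiku requests to the NIM_MODEL value.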
🖥️ CLI Reference
budgetai init # Initialize config file
budgetai start # Start the proxy server
budgetai config # Show current configuration
budgetai --help # Show help
budgetai --version # Show version
🔌 API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/messages | Main streaming endpoint |
| POST | /v1/messages/count_tokens | Token counting |
| GET | /v1/models | List available models |
| GET | /health | Health check |
| GET | / | Proxy info |
| POST | /stop | Stop server |
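For testing the proxy without Claude Code, a request to the /v1/messages endpoint can be assembled by hand. The sketch below builds an Anthropic Messages-style request and leaves sending it to the caller. The function name `buildMessagesRequest`, the model string, and the placeholder API key are illustrative assumptions. Whether the proxy inspects the `x-api-key` header is not stated in this README.

```typescript
// Shape of a request that could be passed to fetch(url, { method, headers, body }).
interface ProxyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build an Anthropic Messages-style request aimed at the local proxy.
// Assumptions: the model name is illustrative, and the API key is a placeholder
// because the proxy is expected to use NVIDIA_NIM_API_KEY from its own config.
function buildMessagesRequest(
  prompt: string,
  baseUrl: string = "http://localhost:8082",
): ProxyRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
      "x-api-key": "placeholder",
    },
    body: JSON.stringify({
      model: "claude-sonnet-example",
      max_tokens: 256,
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The result can be handed to `fetch(req.url, req)` in a runtime that provides `fetch`. Because the table describes /v1/messages as a streaming endpoint, the response would likely arrive as server-sent events and not as a single JSON body.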
🛠️ Troubleshooting
Missing API key: re-run init, then add your key to the config file:
budgetai init
nano ~/.config/budgetai/.env
# Add: NVIDIA_NIM_API_KEY=your-key-here
Rate limit errors (429): the NVIDIA NIM free tier allows 40 req/min. The proxy automatically retries after the window resets (60s). You can also switch to a different NIM model or API key.
Claude Code not using the proxy: make sure the environment variable is actually set in the terminal where you run claude:
echo $ANTHROPIC_BASE_URL
# Should print: http://localhost:8082
If it's empty, re-run export ANTHROPIC_BASE_URL=http://localhost:8082 before launching Claude Code.
👩‍💻 Development
npm run dev # Dev mode with hot reload
npm run build # Production build
npm run typecheck # Type checking
npm run lint # Lint
🤝 Contributing
PRs and issues are welcome! Please open an issue first for major changes.
Made with ❤️ · MIT License · Powered by NVIDIA NIM 🟢
