@dev-mn/budgetai

v1.0.4

Published

22 days ago

Claude Code × NVIDIA NIM — Free AI Coding, Zero Compromise


turtuvshin

claude-code claude nvidia nim ai coding proxy free anthropic llm openai

⚡ budgetai

AI Cost Saver for Coding — Route Claude Code to NVIDIA NIM for free

🚀 Drop-in proxy that routes Claude Code CLI to NVIDIA NIM's free inference backend.
Get 40 req/min of production-grade AI code completion — no Anthropic subscription required.

Quick Start · Configuration · Models · Troubleshooting

💸 The Story Behind This

I've been deep in the AI coding tools rabbit hole for a while now.

Started with GitHub Copilot, moved to Cursor, tried Claude Code with max mode, bounced between a dozen tools chasing the "best" experience. At some point I stopped counting subscriptions and just... paid. Month after month.

Then the bill came. $5,000+ in API token costs. Not including subscriptions.

That was the moment I sat down and asked myself: do I actually need to be paying this much?

So I started researching. Digging through free tiers, open-weight models, inference providers. Turns out NVIDIA NIM offers surprisingly capable models completely free — with a real API, low latency, and no credit card required for the free tier.

budgetai is what came out of that research. It's a proxy that makes Claude Code CLI talk to NIM's backend instead of Anthropic's — so you get the same familiar UX, for free.

⚠️ Honest disclaimer: This is not a perfect solution. It's a work in progress. The models aren't Claude, the quality varies, and there are rate limits. I'm actively researching better approaches and will keep improving this. If you're hitting limits or have ideas, open an issue — let's figure it out together.

✨ Features

| Feature | Details |
|---|---|
| 🎯 Drop-in Replacement | Fully Anthropic API-compatible; Claude Code works as-is |
| 🆓 Free Inference | NVIDIA NIM free tier: 40 req/min, no credit card needed |
| 🔀 Per-Model Routing | Route Opus / Sonnet / Haiku to different NIM models |
| ⚡ Trivial Request Optimization | 5 categories intercepted locally: zero latency, zero quota |
| 🛡️ Smart Rate Limiting | Rolling window throttle + automatic 429 handling |
| 🧠 Thinking Token Support | Parses `<think>` tags into native Anthropic-style blocks |
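To make the Thinking Token Support row more concrete, here is a minimal sketch of splitting a finished response on `<think>` tags into Anthropic-style content blocks. It is not budgetai's actual parser: the names `splitThinking` and `ContentBlock` are hypothetical, the code assumes well-formed, non-nested tags, and it works on a complete string rather than a stream. Real Anthropic thinking blocks also carry a `signature` field, which is left out here.

```typescript
// Hypothetical block shapes, loosely modeled on Anthropic content blocks.
// The `signature` field of real thinking blocks is intentionally omitted.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

// Split a complete response into thinking and text blocks.
// Assumes well-formed, non-nested <think>...</think> pairs.
function splitThinking(raw: string): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  const pattern = /<think>([\s\S]*?)<\/think>/g;
  let cursor = 0;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(raw)) !== null) {
    // Plain text that appears before this <think> tag.
    const before = raw.slice(cursor, match.index).trim();
    if (before) blocks.push({ type: "text", text: before });

    // The reasoning captured between the tags.
    const thought = (match[1] ?? "").trim();
    if (thought) blocks.push({ type: "thinking", thinking: thought });

    cursor = match.index + match[0].length;
  }

  // Whatever follows the last closing tag is ordinary text.
  const rest = raw.slice(cursor).trim();
  if (rest) blocks.push({ type: "text", text: rest });
  return blocks;
}
```

A streaming proxy would need to track tag boundaries across chunks, which this sketch does not attempt.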

📦 Installation

# Recommended: install globally from npm
npm install -g @dev-mn/budgetai

# Verify
budgetai --version

# Alternative: build from source
git clone https://github.com/dev-mn/budgetai.git
cd budgetai
npm install && npm run build

🚀 Quick Start

1 — Get your NVIDIA API key

Head to build.nvidia.com/settings/api-keys and create a free key.

2 — Initialize config

budgetai init

This creates ~/.config/budgetai/.env. Open it and paste your key:

NVIDIA_NIM_API_KEY=nvapi-xxxxxxxxxxxxxxxxxxxxxxx

3 — Start the proxy

budgetai start

✅ Claude Code NIM Proxy running on port 8082
   Model   : nvidia_nim/z-ai/glm4.7
   Base URL : https://integrate.api.nvidia.com/v1

   export ANTHROPIC_BASE_URL=http://localhost:8082
   Then run: claude

4 — Run Claude Code

Open a second terminal:

export ANTHROPIC_BASE_URL=http://localhost:8082
claude

Windows (PowerShell)

$env:ANTHROPIC_BASE_URL="http://localhost:8082"
claude

VSCode Extension

Open Settings and search for claude-code.environmentVariables.
Click Edit in settings.json and add:

"claudeCode.environmentVariables": [
  { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" }
]

Reload extensions — done! ✅

⚙️ Configuration

All config lives in ~/.config/budgetai/.env:

| Variable | Default | Description |
|---|---|---|
| NVIDIA_NIM_API_KEY | required | Your key from build.nvidia.com |
| NIM_MODEL | nvidia_nim/z-ai/glm4.7 | Default model |
| NIM_BASE_URL | https://integrate.api.nvidia.com/v1 | NIM endpoint |
| PORT | 8082 | Local proxy port |
| RATE_LIMIT | 40 | Requests per minute |
| RATE_LIMIT_WINDOW | 60 | Window in seconds |
| ENABLE_THINKING | true | Parse thinking tokens |
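The table above can be read as a set of defaults applied over whatever the .env file provides. The sketch below shows one way that could look in TypeScript. It is an illustration under assumptions, not budgetai's real loader: `ProxyConfig`, `loadConfig`, and `toPositiveInt` are hypothetical names, and treating any value other than "false" as enabling thinking is a guess at how ENABLE_THINKING is interpreted.

```typescript
// Hypothetical config shape mirroring the variables in the table above.
interface ProxyConfig {
  apiKey: string;
  model: string;
  baseUrl: string;
  port: number;
  rateLimit: number;
  rateLimitWindowSec: number;
  enableThinking: boolean;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function toPositiveInt(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Build a config from environment-style key/value pairs,
// using the defaults listed in the configuration table.
function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const apiKey = env["NVIDIA_NIM_API_KEY"];
  if (!apiKey) throw new Error("NVIDIA_NIM_API_KEY is required");

  return {
    apiKey,
    model: env["NIM_MODEL"] ?? "nvidia_nim/z-ai/glm4.7",
    baseUrl: env["NIM_BASE_URL"] ?? "https://integrate.api.nvidia.com/v1",
    port: toPositiveInt(env["PORT"], 8082),
    rateLimit: toPositiveInt(env["RATE_LIMIT"], 40),
    rateLimitWindowSec: toPositiveInt(env["RATE_LIMIT_WINDOW"], 60),
    // Assumption: only the literal string "false" turns thinking off.
    enableThinking: (env["ENABLE_THINKING"] ?? "true").toLowerCase() !== "false",
  };
}
```

Failing fast on a missing key keeps the only required variable from being silently replaced by a default.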

Per-Model Routing

Map each Claude tier to a different NIM model:

MODEL_OPUS="nvidia_nim/minimaxai/minimax-m2.5"
MODEL_SONNET="nvidia_nim/qwen/qwen3.5-397b-a17b"
MODEL_HAIKU="nvidia_nim/z-ai/glm4.7"
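One plausible way to apply these three variables is to look for the tier name inside the model ID that Claude Code sends and fall back to NIM_MODEL otherwise. The sketch below assumes simple substring matching; budgetai's real matching rules are not documented here, and `resolveNimModel` is a hypothetical name.

```typescript
// Default taken from the configuration table.
const DEFAULT_NIM_MODEL = "nvidia_nim/z-ai/glm4.7";

// Map a requested Claude model ID to a NIM model tag.
// Assumption: the tier is detected by substring match on the model ID.
function resolveNimModel(
  requested: string,
  env: Record<string, string | undefined>,
): string {
  const name = requested.toLowerCase();
  const fallback = env["NIM_MODEL"] ?? DEFAULT_NIM_MODEL;

  if (name.includes("opus")) return env["MODEL_OPUS"] ?? fallback;
  if (name.includes("sonnet")) return env["MODEL_SONNET"] ?? fallback;
  if (name.includes("haiku")) return env["MODEL_HAIKU"] ?? fallback;
  return fallback;
}
```

With the three variables set as shown above, a request for a Sonnet model would resolve to nvidia_nim/qwen/qwen3.5-397b-a17b, and any unrecognized model would use the default.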

🧩 NVIDIA NIM Models

| Model | Tag | Notes |
|---|---|---|
| GLM 4.7 | nvidia_nim/z-ai/glm4.7 | ⚡ Default: fast & reliable |
| MiniMax M2.5 | nvidia_nim/minimaxai/minimax-m2.5 | 🏆 High quality |
| Qwen 3.5 397B | nvidia_nim/qwen/qwen3.5-397b-a17b | 🔥 Largest model |
| Kimi K2.5 | nvidia_nim/moonshotai/kimi-k2.5 | 🌙 Great for long context |
| Step 3.5 Flash | nvidia_nim/stepfun-ai/step-3.5-flash | ⚡ Fastest alternative |

Browse the full catalog → build.nvidia.com/explore/discover

🖥️ CLI Reference

budgetai init      # Initialize config file
budgetai start     # Start the proxy server
budgetai config    # Show current configuration
budgetai --help    # Show help
budgetai --version # Show version

🔌 API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /v1/messages | Main streaming endpoint |
| POST | /v1/messages/count_tokens | Token counting |
| GET | /v1/models | List available models |
| GET | /health | Health check |
| GET | / | Proxy info |
| POST | /stop | Stop server |
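Because the proxy is described as Anthropic API-compatible, a request to /v1/messages should follow the Anthropic Messages format. The sketch below only builds such a request so you can inspect it or pass it to fetch. The helper `buildMessagesRequest` is hypothetical, the `x-api-key` value is a placeholder (whether the proxy checks it is an assumption left open), and the model ID is just an example that per-model routing would map to a NIM model.

```typescript
// Hypothetical description of an HTTP request to the local proxy.
interface MessagesRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build an Anthropic Messages-style request aimed at the proxy.
function buildMessagesRequest(
  prompt: string,
  baseUrl: string = "http://localhost:8082",
): MessagesRequest {
  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    method: "POST",
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
      "x-api-key": "placeholder", // assumption: not validated by the proxy
    },
    body: JSON.stringify({
      model: "claude-3-5-haiku-20241022", // example ID; routed by tier
      max_tokens: 256,
      stream: true, // the endpoint is described as streaming
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

To send it, something like `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` should work in any runtime with a global fetch. With `stream: true`, expect a stream of server-sent events instead of a single JSON document.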

🛠️ Troubleshooting

API key not set

budgetai init
nano ~/.config/budgetai/.env
# Add: NVIDIA_NIM_API_KEY=your-key-here

Rate limit errors (429)

The NVIDIA NIM free tier allows 40 requests per minute. When you hit that limit, the proxy retries automatically once the 60-second window resets. You can also switch to a different NIM model or API key.
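The waiting behavior can be pictured as a rolling window over recent request timestamps. The sketch below is an assumption about how such a throttle might work, not budgetai's actual code, and `RollingWindowLimiter` is a hypothetical name. It returns 0 when a request may go out now, or the number of milliseconds to wait otherwise.

```typescript
// A minimal rolling-window limiter: at most `limit` requests
// in any span of `windowMs` milliseconds.
class RollingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number = 40, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the request if it is allowed now;
  // otherwise returns how long to wait, in milliseconds.
  tryAcquire(now: number = Date.now()): number {
    // Forget requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);

    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }

    // The next slot opens when the oldest request leaves the window.
    const oldest = this.timestamps[0] ?? now;
    return oldest + this.windowMs - now;
  }
}
```

A caller would sleep for the returned duration and try again. A 429 from the upstream API could be handled the same way, by waiting out the window before retrying.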

Claude Code is not using the proxy

Make sure the environment variable is actually set in the terminal where you run claude:

echo $ANTHROPIC_BASE_URL
# Should print: http://localhost:8082

If it's empty, re-run export ANTHROPIC_BASE_URL=http://localhost:8082 before launching Claude Code.

👩‍💻 Development

npm run dev        # Dev mode with hot reload
npm run build      # Production build
npm run typecheck  # Type checking
npm run lint       # Lint

🤝 Contributing

PRs and issues are welcome! Please open an issue first for major changes.