loclaude
v0.0.5
Claude Code with local Ollama LLMs - Zero API costs, no rate limits, complete privacy
Claude Code with Local LLMs
Stop burning through Claude API usage limits. Run Claude Code's powerful agentic workflow with local Ollama models on your own hardware.
Requires Ollama v0.14.2 or higher
Zero API costs. No rate limits. Complete privacy.
Quick Start • Why loclaude? • Installation • FAQ
Why loclaude?
Real Value
- No Rate Limits: Use Claude Code as much as you want
- Privacy: Your code never leaves your machine
- Cost Control: Use your own hardware, pay for electricity not tokens
- Offline Capable: Work without internet (after model download)
- GPU or CPU: Works with NVIDIA GPUs or CPU-only systems
What to Expect
loclaude provides:
- One-command setup for Ollama + Open WebUI containers
- Smart model management with auto-loading
- GPU auto-detection with CPU fallback
- Project scaffolding with Docker configs
Installation
# With npm (requires Node.js 18+)
npm install -g loclaude
# With bun (faster, recommended)
bun install -g loclaude # use bun-loclaude for commands
vs. Other Solutions
| Solution | Cost | Speed | Privacy | Limits |
|----------|------|-------|---------|--------|
| loclaude | Free after setup | Fast (GPU) | 100% local | None |
| Claude API/Web | $20-200+/month | Fast | Cloud-based | Rate limited |
| GitHub Copilot | $10-20/month | Fast | Cloud-based | Context limited |
| Cursor/Codeium | $20+/month | Fast | Cloud-based | Usage limits |
loclaude gives you the utility of Ollama with the convenience of a managed solution for Claude Code integration.
Quick Start (5 Minutes)
# 1. Install loclaude
npm install -g loclaude
# 2. Install Claude Code (if you haven't already)
npm install -g @anthropic-ai/claude-code
# 3. Setup your project (auto-detects GPU)
loclaude init
# 4. Start Ollama container
loclaude docker-up
# 5. Pull a model (choose based on your hardware)
loclaude models-pull qwen3-coder:30b # GPU with 16GB+ VRAM
# OR
loclaude models-pull qwen2.5-coder:7b # CPU or limited VRAM
# 6. Run Claude Code with unlimited local LLM
loclaude run
That's it! You now have unlimited Claude Code sessions with local models.
Prerequisites
Required:
- Docker with Docker Compose v2
- Claude Code CLI (npm install -g @anthropic-ai/claude-code)
Optional (for GPU acceleration):
- NVIDIA GPU with 16GB+ VRAM (RTX 3090, 4090, A5000, etc.)
- NVIDIA Container Toolkit
CPU-only systems work fine! Use the --no-gpu flag during init and choose a smaller model.
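If you'd rather verify the required pieces by hand, two quick checks cover them (standard tooling, nothing loclaude-specific):
# Docker with Compose v2
docker compose version
# Claude Code CLI (installed as the `claude` binary)
claude --version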
Or let loclaude run these checks for you:
loclaude doctor
Features
Automatic Model Loading
When you run loclaude run, it automatically:
- Checks whether your selected model is loaded in Ollama
- If it isn't, warms the model up with a 10-minute keep-alive (configurable through environment variables; see the sketch below)
- Shows a [loaded] indicator in the model selection list for running models
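The warm-up is an ordinary Ollama API call, so you can pre-load or inspect models yourself against the stock Ollama endpoints. This sketch assumes the default URL from the Configuration section:
# Load a model and keep it resident for 10 minutes
curl http://localhost:11434/api/generate -d '{"model": "qwen3-coder:30b", "keep_alive": "10m"}'
# See which models are currently loaded
curl http://localhost:11434/api/ps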
GPU Auto-Detection
loclaude init automatically detects NVIDIA GPUs and configures the appropriate Docker setup:
- GPU detected: Uses runtime: nvidia and CUDA-enabled images
- No GPU: Uses a CPU-only configuration with smaller default models
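Curious what the detection will find? The driver's own report shows the GPU name and available VRAM (a plain nvidia-smi query, not a loclaude command):
# GPU name and total VRAM as seen by the NVIDIA driver
nvidia-smi --query-gpu=name,memory.total --format=csv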
Commands
Running Claude Code
loclaude run # Interactive model selection
loclaude run -m qwen3-coder:30b # Use specific model
loclaude run -- --help # Pass args to claude
Project Setup
loclaude init # Auto-detect GPU, scaffold project
loclaude init --gpu # Force GPU mode
loclaude init --no-gpu # Force CPU-only mode
loclaude init --force # Overwrite existing files
loclaude init --no-webui # Skip Open WebUI in compose file
Docker Management
loclaude docker-up # Start containers (detached)
loclaude docker-up --no-detach # Start in foreground
loclaude docker-down # Stop containers
loclaude docker-status # Show container status
loclaude docker-logs # Show logs
loclaude docker-logs --follow # Follow logs
loclaude docker-restart # Restart containers
Model Management
loclaude models # List installed models
loclaude models-pull <name> # Pull a model
loclaude models-rm <name> # Remove a model
loclaude models-show <name> # Show model details
loclaude models-run <name> # Run model interactively (ollama CLI)
Diagnostics
loclaude doctor # Check prerequisites
loclaude config # Show current configuration
loclaude config-paths # Show config file search paths
Recommended Models
For GPU (16GB+ VRAM) - Best Experience
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| qwen3-coder:30b | ~17 GB | ~50-100 tok/s | Excellent | Most coding tasks, refactoring, debugging |
| deepseek-coder:33b | ~18 GB | ~40-80 tok/s | Excellent | Code understanding, complex logic |
Recommendation: Start with qwen3-coder:30b for the best balance of speed and quality.
For CPU or Limited VRAM (<16GB) - Still Productive
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| qwen2.5-coder:7b | ~4 GB | ~10-20 tok/s | Good | Code completion, simple refactoring |
| deepseek-coder:6.7b | ~4 GB | ~10-20 tok/s | Good | Understanding existing code |
| llama3.2:3b | ~2 GB | ~15-30 tok/s | Fair | Quick edits, file operations |
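Unsure which tier your hardware lands in? A low-risk approach is to start with the smallest model and work upward, using the model-management commands above (model names are the ones from these tables):
# Pull the smallest model and chat with it directly via the Ollama CLI
loclaude models-pull llama3.2:3b
loclaude models-run llama3.2:3b
# If that feels responsive, step up to a stronger coding model
loclaude models-pull qwen2.5-coder:7b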
Configuration
loclaude supports configuration via files and environment variables.
Config Files
Config files are loaded in priority order:
- ./.loclaude/config.json (project-local)
- ~/.config/loclaude/config.json (user global)
Example config:
{
"ollama": {
"url": "http://localhost:11434",
"defaultModel": "qwen3-coder:30b"
},
"docker": {
"composeFile": "./docker-compose.yml",
"gpu": true
},
"claude": {
"extraArgs": ["--verbose"]
}
}
Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| OLLAMA_URL | Ollama API endpoint | http://localhost:11434 |
| OLLAMA_MODEL | Default model name | qwen3-coder:30b |
| LOCLAUDE_COMPOSE_FILE | Path to docker-compose.yml | ./docker-compose.yml |
| LOCLAUDE_GPU | Enable GPU (true/false) | true |
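Environment variables are convenient for one-off overrides without editing a config file; the host and model values below are purely illustrative:
# Use a remote Ollama host and a smaller model for this shell session
export OLLAMA_URL=http://192.168.1.50:11434
export OLLAMA_MODEL=qwen2.5-coder:7b
loclaude run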
Priority
Configuration is merged in this order (highest priority first):
- CLI arguments
- Environment variables
- Project config (./.loclaude/config.json)
- User config (~/.config/loclaude/config.json)
- Default values
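A concrete illustration of the merge order, using the flag and variable documented above:
# The -m flag beats OLLAMA_MODEL, which beats any config file or default
OLLAMA_MODEL=qwen2.5-coder:7b loclaude run -m qwen3-coder:30b   # runs qwen3-coder:30b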
Service URLs
When containers are running:
| Service | URL | Description |
|---------|-----|-------------|
| Ollama API | http://localhost:11434 | LLM inference API |
| Open WebUI | http://localhost:3000 | Chat interface |
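To confirm both services are reachable once the containers are up (curl assumed to be installed; endpoints as listed above):
# Ollama returns a JSON list of installed models
curl http://localhost:11434/api/tags
# Open WebUI should answer with an HTTP 200
curl -I http://localhost:3000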
Project Structure
After running loclaude init:
.
├── .claude/
│ └── CLAUDE.md # Claude Code instructions
├── .loclaude/
│ └── config.json # Loclaude configuration
├── models/ # Ollama model storage (gitignored)
├── docker-compose.yml # Container definitions (GPU or CPU mode)
├── mise.toml # Task runner configuration
└── README.md
Using with mise
The init command creates a mise.toml with convenient task aliases:
mise run up # loclaude docker-up
mise run down # loclaude docker-down
mise run claude # loclaude run
mise run pull <model> # loclaude models-pull <model>
mise run doctor # loclaude doctor
FAQ
Is this really unlimited?
Yes! Once you have models downloaded, you can run as many sessions as you want with zero additional cost.
How does the quality compare to Claude API?
30B-parameter models like qwen3-coder:30b are roughly comparable to GPT-3.5 and handle most coding tasks reasonably well; larger models do somewhat better. The Claude API is still stronger, but loclaude lets you keep working after you hit that pesky usage limit.
Do I need a GPU?
No, but highly recommended. CPU-only mode works with smaller models at ~10-20 tokens/sec. A GPU (16GB+ VRAM) gives you 50-100 tokens/sec with larger, better models.
Can I use this with the Claude API too?
Absolutely! Keep using Claude API for critical tasks, use loclaude for everything else to save money and avoid limits.
Troubleshooting
Check System Requirements
loclaude doctor
This verifies:
- Docker and Docker Compose installation
- NVIDIA GPU detection (optional)
- NVIDIA Container Toolkit (optional)
- Claude Code CLI
- Ollama API connectivity
Container Issues
# View logs
loclaude docker-logs --follow
# Restart containers
loclaude docker-restart
# Full reset
loclaude docker-down && loclaude docker-up
Connection Issues
If Claude Code can't connect to Ollama:
- Verify Ollama is running: loclaude docker-status
- Check the API: curl http://localhost:11434/api/tags
- Verify your config: loclaude config
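If your Ollama instance is not on the default host or port, point loclaude at it explicitly with the OLLAMA_URL variable from the Configuration section (the address below is only an example):
# Run against a non-default Ollama endpoint
OLLAMA_URL=http://127.0.0.1:11435 loclaude run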
GPU Not Detected
If you have a GPU but it's not detected:
- Check NVIDIA drivers: nvidia-smi
- Test Docker GPU access: docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
- Install the NVIDIA Container Toolkit if it is missing
- Re-run loclaude init --gpu to force GPU mode
Running on CPU
If inference is slow on CPU:
- Use smaller, quantized models: qwen2.5-coder:7b, llama3.2:3b (see the sketch below)
- Expect ~10-20 tokens/sec on modern CPUs
- Consider cloud models via Ollama: glm-4.7:cloud
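One way to make the small model stick on a CPU-only machine is to set it as the project default. A minimal sketch; it overwrites any existing project config, so merge by hand if you already have one:
# Make qwen2.5-coder:7b the default model for this project
mkdir -p .loclaude
cat > .loclaude/config.json <<'EOF'
{
  "ollama": { "defaultModel": "qwen2.5-coder:7b" }
}
EOF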
Getting Help
- Issues/Bugs: GitHub Issues
- Questions: GitHub Discussions
- Documentation: Run loclaude --help or check this README
- System Check: Run loclaude doctor to diagnose problems
Development
Building from Source
git clone https://github.com/nicholasgalante1997/loclaude.git loclaude
cd loclaude
bun install
bun run build
Running Locally
# With bun (direct)
bun bin/index.ts --help
# With node (built)
node bin/index.mjs --help
Testing
# Test both runtimes
bun bin/index.ts doctor
node bin/index.mjs doctor
License
MIT
