claude-code-ollama

v0.1.0

Published

20 days ago

A Claude Code skill that routes inference to a locally running Ollama model or an Ollama Cloud model.

Vulnerabilities: 0 High, 0 Medium, 0 Low

vish000

agent-skill ollama local-model claude-code llm self-hosted

claude-code-ollama

Run Claude Code with any open-source model via Ollama. One script. Zero dependencies.

Quick Start

bash scripts/setup.sh glm-4.7-flash          # local
bash scripts/setup.sh deepseek-v4-pro:cloud   # cloud
bash scripts/setup.sh glm-4.7-flash --launch  # auto-open new terminal

What It Does

1. Detect OS (macOS / Linux / WSL)
2. Install Ollama if missing
3. Upgrade if below v0.14.0
4. Warn if your RAM can't handle the model
5. Pull the model if not present
6. Start Ollama server if not running
7. Print the launch command (or, with --launch, run it in a new terminal)
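
Steps 1 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/setup.sh: the function names `detect_os` and `version_ge` are hypothetical, the WSL check assumes the kernel release string contains "microsoft", and the version check assumes purely numeric `major.minor.patch` versions.

```shell
# Hypothetical helpers; the real scripts/setup.sh may be structured differently.

# Classify the platform from a kernel name (`uname -s`) and a kernel
# release string (`uname -r`). Both are passed as arguments so the
# function can be exercised without depending on the host.
detect_os() {
  case "$1" in
    Darwin) echo "macos" ;;
    Linux)
      # Assumption: WSL kernels carry "microsoft" in their release string.
      case "$2" in
        *[Mm]icrosoft*) echo "wsl" ;;
        *) echo "linux" ;;
      esac ;;
    *) echo "unsupported" ;;
  esac
}

# Succeed if version $1 >= version $2, comparing numerically field by
# field so that 0.9.0 sorts below 0.14.0 (a plain string compare would not).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}
```

A caller might use these as `detect_os "$(uname -s)" "$(uname -r)"` and `version_ge "$installed" 0.14.0 || echo "upgrade needed"`, where `$installed` is whatever version string the script extracts from the local Ollama install.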

How It Works

Claude Code is an agent loop, and the model behind it is swappable. This script sets three environment variables that point Claude Code at Ollama instead of Anthropic's API, so requests flow like this:

Claude Code CLI → localhost:11434 (Ollama) → Model (local or cloud)
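
The README does not name the three variables. The sketch below assumes they are `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_API_KEY`, which matches the setup commonly described for pointing Claude Code at Ollama's Anthropic-compatible endpoint; treat the names and values as assumptions, not as what scripts/setup.sh actually emits. The helper `launch_command` is hypothetical.

```shell
# Hypothetical helper: print a one-line launch command for a given model.
# Assumed values: Ollama listening on localhost:11434, a placeholder auth
# token (a local Ollama server does not validate it), and an empty API key
# so Claude Code does not fall back to a real Anthropic key.
launch_command() {
  printf 'ANTHROPIC_BASE_URL=%s ANTHROPIC_AUTH_TOKEN=%s ANTHROPIC_API_KEY="" claude --model %s\n' \
    "http://localhost:11434" "ollama" "$1"
}
```

Calling `launch_command glm-4.7-flash` prints a command that can be pasted into any shell. Setting the variables inline like this keeps them scoped to the single `claude` process rather than the whole session.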

Model Recommendations

Local (runs on your hardware)

| RAM    | Model                  | Why                                |
|--------|------------------------|------------------------------------|
| 8GB    | qwen3.5:7b             | Fits tight                         |
| 16GB   | qwen2.5-coder:14b      | Best coding bang per GB            |
| 32GB   | glm-4.7-flash          | 128K context + native tool calling |
| 64GB+  | qwen3-coder:30b-a3b    | MoE (30B total, 3B active)         |
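
The table can be read as a simple lookup. The sketch below assumes each RAM figure is a minimum (so a 24GB machine gets the 16GB pick); the function names `total_ram_gb` and `recommend_model` are hypothetical and not taken from scripts/setup.sh.

```shell
# Hypothetical helper: report total physical memory in whole GB.
# macOS exposes bytes via sysctl; Linux and WSL expose kB in /proc/meminfo.
total_ram_gb() {
  case "$(uname -s)" in
    Darwin) echo $(( $(sysctl -n hw.memsize) / 1073741824 )) ;;
    # Round to nearest, since a "16GB" Linux box usually reports ~15.5 GiB.
    Linux) awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo ;;
    *) echo 0 ;;
  esac
}

# Hypothetical helper: map a RAM size in GB to the local model from the
# table above, treating each row as a minimum requirement.
# Prints nothing and fails below 8GB, where the table has no entry.
recommend_model() {
  if [ "$1" -ge 64 ]; then echo "qwen3-coder:30b-a3b"
  elif [ "$1" -ge 32 ]; then echo "glm-4.7-flash"
  elif [ "$1" -ge 16 ]; then echo "qwen2.5-coder:14b"
  elif [ "$1" -ge 8 ]; then echo "qwen3.5:7b"
  else return 1
  fi
}
```

For example, `recommend_model "$(total_ram_gb)" || echo "consider a :cloud model"` picks a local model when one fits and points to the cloud option otherwise.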

Cloud (runs on Ollama's servers, `:cloud` suffix)

| Model                    | Strength                          |
|--------------------------|-----------------------------------|
| deepseek-v4-pro:cloud    | Top coding benchmarks, 1M context |
| kimi-k2.5:cloud          | Long-horizon agentic work         |
| glm-5:cloud              | Best open-weight on SWE-Bench Pro |

Terminal Support (--launch)

Auto-detects and opens a new tab/window:

macOS: iTerm2 → WezTerm → Terminal.app
Linux: gnome-terminal → xterm → konsole
WSL: Windows Terminal → cmd.exe

Falls back to printing the command if detection fails.
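
A fallback chain like this is often implemented by probing `PATH` in priority order. The following is a minimal sketch under that assumption; `first_available` is a hypothetical helper, and the actual detection in scripts/setup.sh (particularly for macOS apps, which are not always on `PATH`) may work differently.

```shell
# Hypothetical helper: print the first argument that resolves to a command
# on PATH, or fail if none do. Order of arguments is the priority order.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

On Linux, for instance, `term=$(first_available gnome-terminal xterm konsole) || echo "Run this yourself: $cmd"` would select a terminal in the documented order and fall back to printing the command, where `$cmd` stands for the launch command the script built.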

Troubleshooting

Invalid discriminator value error — You're probably pointing Claude Code at an OpenAI-compatible API (like LM Studio) instead of Ollama. Ollama v0.14.0+ has native Anthropic API support. Use that.

Model runs slow — Try a quantized variant: ollama pull <model>:q4_K_M. Or switch to a :cloud model.

Out of memory — The script warns you before pulling. If you ignored it... try :cloud.

License

MIT
