alt-llm-planner
v0.3.0
Offload architectural planning Q&A to an alternate LLM (Gemini today) in a side terminal. MCP tool for Claude Code and Cursor. Returns a dense markdown blueprint to Claude.
Turn a multi-hour architectural planning session with Claude into a one-shot tool call.
An MCP tool for Claude Code and Cursor that offloads the "ask clarifying questions, iterate, refine the design" loop to Gemini in a side terminal. When you type !finish, a dense markdown blueprint returns to Claude — ready to implement.
Your main Claude conversation stays clean. Your token bill stays low. Your context window stays open for the part that actually matters: shipping the code.
Claude Code             MCP tool call                     Gemini terminal
───────────             ─────────────                     ───────────────
"plan X"    ─────────►  interactive_plan()  ─────────►    ◇ Q1: scope?
                                                          ◇ Q2: storage?
                                                          ◇ Q3: failure mode?
                                                          ▶ !finish
            ◄─────────  dense md blueprint  ◄─────────
resumes implementation with the plan in context
The problem
Architectural planning with Claude is expensive:
- A thorough design discussion is 10-20 turns of Q&A
- Each turn re-hydrates project context — 5-20K tokens per round trip
- Your main conversation window fills up before implementation even starts
- Running planning iterations on Opus adds up fast
You want Opus focused on writing production code, not burning tokens asking "should this be a queue or a pub/sub?"
The fix
interactive_plan delegates the planning loop to Gemini, entirely outside your main Claude context.
- Claude pauses and calls the MCP tool
- A separate terminal opens in your IDE with a Gemini chat session
- Gemini asks one clarifying question at a time, iterating with you until the design is clear
- You type !finish
- Only the final blueprint (a dense markdown spec) returns to Claude as tool output
Claude gets a complete, structured plan in ~2-5K tokens — then gets to work.
What you save
Typical 15-turn planning session, ~5K tokens of context per turn.
| | Planning inside main Claude | With interactive_plan |
| ------------------------ | --------------------------- | ------------------------------ |
| Tokens into main context | ~75K | ~5K (blueprint only) |
| Main-window consumed | ~40% of Opus's 200K | ~3% |
| Rough cost on Opus 4.7 | ~$1.50-$2.50 | ~$0.10-$0.15 |
| Rough cost on free Gemma | — | $0 (stays within free tier) |
Illustrative figures. Actual savings scale with turn count and context size. See the pricing note under the model comparison tables below.
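The token rows in the table follow from simple arithmetic. The sketch below reproduces them from the round numbers quoted above (15 turns, ~5K tokens per turn, a 200K window, a ~5K blueprint). The function name `planningFootprint` is hypothetical and is not part of the package.

```typescript
// Illustrative arithmetic only: inputs are the round numbers from the table above.
// planningFootprint is a hypothetical helper, not part of alt-llm-planner.
function planningFootprint(
  turns: number,
  tokensPerTurn: number,
  contextWindow: number,
): { tokens: number; sharePct: number } {
  const tokens = turns * tokensPerTurn;
  // Percentage of the context window those tokens occupy.
  const sharePct = (tokens * 100) / contextWindow;
  return { tokens, sharePct };
}

// Planning inside the main conversation: 15 turns at ~5K tokens each.
const inline = planningFootprint(15, 5_000, 200_000);
// Offloaded planning: only the ~5K-token blueprint comes back.
const offloaded = planningFootprint(1, 5_000, 200_000);

console.log(inline.tokens, inline.sharePct);       // 75000 37.5
console.log(offloaded.tokens, offloaded.sharePct); // 5000 2.5
```

The table rounds these to ~40% and ~3%. Dollar costs are left out because they depend on current provider pricing.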
Install
# 1. Wire the MCP server into Claude Code (user scope = works in every project)
claude mcp add alt-llm-planner -s user -- npx -y alt-llm-planner
# 2. Configure: API key → tier → default model
npx alt-llm-planner setup
# 3. Install the Cursor / VS Code companion extension
npx alt-llm-planner install
Then reload your IDE window.
Get a Gemini API key at aistudio.google.com/apikey. The free tier is generous and works great for planning.
Usage
In Claude Code:
Plan out a rate limiter middleware for this service using Redis.
Claude invokes interactive_plan. A Gemini terminal opens in your IDE. Chat through the design. When you have what you need:
> !finish
The blueprint returns to Claude. Keep going, now with a plan in context and your main window intact.
Override the model per call
Plan the auth redesign using gemini-2.5-pro; this one needs deep reasoning.
Cancel a session
> !cancel
Model comparison
Gemini — use these for planning
| Model | Tier | Context | Best for | Input / Output (per 1M) |
| ------------------------ | ---- | ------- | ------------------------------------- | ----------------------- |
| gemma-3-27b-it | Free | 128K | Default. Fast, capable, zero cost | Free |
| gemma-3-12b-it | Free | 128K | Faster, lighter questions | Free |
| gemini-2.5-flash-lite | Paid | 1M | Cheapest paid option, long codebases | ~$0.05 / ~$0.20 |
| gemini-2.5-flash | Paid | 1M | Balanced speed / quality | ~$0.15 / ~$0.60 |
| gemini-2.5-pro | Paid | 2M | Deepest reasoning, massive context | ~$1.25 / ~$10 |
| gemini-3-*-preview | Paid | varies | Preview models (setup shows live list)| preview pricing |
Claude — what you're already using in Claude Code
| Model             | Context | Best for                     | Input / Output (per 1M) |
| ----------------- | ------- | ---------------------------- | ----------------------- |
| Claude Haiku 4.5  | 200K    | Fast, cheap, batch work      | ~$0.80 / ~$4            |
| Claude Sonnet 4.6 | 200K    | Balanced default             | ~$3 / ~$15              |
| Claude Opus 4.7   | 200K    | Deep reasoning, complex code | ~$15 / ~$75             |
Pricing is approximate as of early 2026. Always check the provider's current rates: Google AI pricing, Anthropic pricing.
Why this split is worth it
| Axis            | Gemini (for planning)         | Claude (for coding)    |
| --------------- | ----------------------------- | ---------------------- |
| Context window  | Up to 2M (Pro)                | 200K                   |
| Free tier       | Yes (Gemma, Flash Lite quota) | No                     |
| Cheapest option | $0                            | ~$0.80 / ~$4 per 1M    |
| Planning Q&A    | Covered by free tier          | Burns Opus tokens fast |
| Writing code    | Solid, but not Claude         | Best-in-class          |
You end up with Gemini burning its free-tier budget on the exploratory part and Claude's paid tokens focused on writing production code. Different tool for each job.
Commands
alt-llm-planner serve # MCP stdio server (default — what Claude Code runs)
alt-llm-planner setup # configure key, tier, model
alt-llm-planner install # install Cursor / VS Code extension
alt-llm-planner status # show config + install state
alt-llm-planner help
Config
Stored at ~/.alt-llm-planner/config.json (mode 0600).
Env var overrides (useful in CI or multi-user setups):
- GEMINI_API_KEY: API key
- GEMINI_MODEL: default model name
- PLANNER_TIMEOUT_MS: session timeout in ms (default 30 minutes)
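To make the override order concrete, here is a minimal sketch of how env vars could take precedence over values from config.json. The `PlannerConfig` shape, its field names, and `resolveConfig` are assumptions for illustration. The package's actual config schema and resolution logic are not documented here.

```typescript
// Hypothetical config shape: the real config.json schema may differ.
interface PlannerConfig {
  apiKey: string | undefined;
  model: string | undefined;
  timeoutMs: number;
}

// 30 minutes, matching the default timeout stated above.
const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000;

// Env vars win over file values; file values win over built-in defaults.
function resolveConfig(
  fileConfig: Partial<PlannerConfig>,
  env: Record<string, string | undefined>,
): PlannerConfig {
  // Ignore a missing, empty, non-numeric, or non-positive timeout override.
  const parsed = Number(env.PLANNER_TIMEOUT_MS);
  const envTimeout = Number.isFinite(parsed) && parsed > 0 ? parsed : undefined;
  return {
    apiKey: env.GEMINI_API_KEY ?? fileConfig.apiKey,
    model: env.GEMINI_MODEL ?? fileConfig.model,
    timeoutMs: envTimeout ?? fileConfig.timeoutMs ?? DEFAULT_TIMEOUT_MS,
  };
}

const resolved = resolveConfig(
  { apiKey: "key-from-file", model: "gemma-3-27b-it" },
  { GEMINI_MODEL: "gemini-2.5-flash", PLANNER_TIMEOUT_MS: "3600000" },
);
console.log(resolved.model, resolved.timeoutMs); // gemini-2.5-flash 3600000
```

In a real process the second argument would be `process.env`. Passing it explicitly keeps the function easy to test.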
How it works
- Claude Code calls the interactive_plan MCP tool over stdio.
- The server writes an atomic session file to os.tmpdir() and waits.
- The IDE extension watches tmpdir, sees the session file, and launches a new terminal running the Gemini chat companion.
- You interact with Gemini. !finish writes an atomic result file containing the blueprint; !cancel writes an error result.
- The server reads the result and returns it to Claude as tool output.
All IPC uses write-then-rename so readers never see partial files. Secrets live only in the user's own tmp dir with 0600 permissions. Stale session files (>1h) are swept on startup.
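The write-then-rename pattern and the stale-file sweep can be sketched with Node's built-in fs module. This is an illustration under stated assumptions, not the package's code: `writeAtomic`, `sweepStale`, the `.tmp` suffix, and the `session.json` file name are all hypothetical, and the demo runs in a private scratch directory rather than directly in os.tmpdir().

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write to a temporary name in the same directory, then rename into place.
// A rename within one filesystem is atomic on POSIX, so a reader sees either
// no file or the complete file. Watchers should ignore the ".tmp" names.
function writeAtomic(finalPath: string, data: string): void {
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, data, { mode: 0o600 }); // owner read/write only
  fs.renameSync(tmpPath, finalPath);
}

// Delete regular files in dir whose mtime is older than maxAgeMs.
// Returns the names that were removed.
function sweepStale(dir: string, maxAgeMs: number, now: number = Date.now()): string[] {
  const removed: string[] = [];
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.statSync(full);
    if (stat.isFile() && now - stat.mtimeMs > maxAgeMs) {
      fs.unlinkSync(full);
      removed.push(name);
    }
  }
  return removed;
}

// Demo: a scratch directory stands in for the session directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "planner-demo-"));
const sessionPath = path.join(demoDir, "session.json");
writeAtomic(sessionPath, JSON.stringify({ prompt: "plan X" }));
console.log(fs.readFileSync(sessionPath, "utf8")); // {"prompt":"plan X"}
```

Calling `sweepStale(demoDir, 60 * 60 * 1000)` on startup would mirror the one-hour sweep described above. The temporary file must sit in the same directory as the final path, since a rename across filesystems is not atomic.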
Requirements
- Node.js 20+
- Cursor or VS Code (for the companion terminal)
- Claude Code (or any MCP-compatible client)
- Gemini API key — get one free
Troubleshooting
| Symptom | Fix |
| --- | --- |
| interactive_plan not visible to Claude | Reload IDE window. Confirm with claude mcp list. |
| No terminal opens on tool call | Run npx alt-llm-planner install, then reload IDE. |
| GEMINI_API_KEY not set | Re-run npx alt-llm-planner setup, or export the env var. |
| Session hangs past 30 min | Bump PLANNER_TIMEOUT_MS. Session files in os.tmpdir() auto-sweep after 1h. |
| Want to change default model | npx alt-llm-planner setup re-prompts the model picker. |
Security
- GEMINI_API_KEY is stored at ~/.alt-llm-planner/config.json with mode 0600.
- Session IPC files live only in the user's os.tmpdir() and are deleted after the tool returns.
- No telemetry. No network calls outside the Gemini API.
- Never commit a .env containing a real key; .gitignore ships configured.
Contributing
Issues and PRs welcome. For substantial changes, open an issue first to discuss scope.
License
MIT © Vikrant Indi
