cheap-claude
v1.1.0
57% token cost reduction for Claude Code. Zero quality loss. Plugin + API proxy.
Cheap Claude
70% cost reduction on API calls. 57% on Claude Code sessions. Zero quality loss.
Why This Exists
On April 4, 2026, Anthropic blocked Claude Pro and Max subscribers from using their flat-rate plans with third-party AI agent frameworks — starting with OpenClaw, expanding to all third-party harnesses this month.
The impact is brutal:
- 135,000+ OpenClaw instances were running when the announcement hit
- A $200/month Max subscription was covering $1,000-$5,000 in actual compute — third-party tools bypass Anthropic's Prompt Cache optimizations that official tools use to cut costs
- Users are now forced onto pay-as-you-go API billing, where a heavy day of agentic coding can cost $20-50+
- Anthropic is offering a one-time credit and up to 30% off pre-purchased bundles, but that doesn't close the 5x price gap
- Google made a parallel move against Gemini CLI users connecting third-party tools — this is an industry-wide reckoning with the economics of subsidizing agentic compute at flat rates
Cheap Claude exists to close that gap. If you're one of the 135K+ developers who just lost flat-rate pricing, this tool automatically applies the same Prompt Cache optimizations that Anthropic's official tools use — bringing your API costs back down to something sustainable.
Before & After
| | Before (no Cheap Claude) | After (with Cheap Claude) |
|---|---|---|
| API session (10 turns): cost | $0.42/session | $0.13/session (70% saved) |
| API session (10 turns): cache hit rate | 0% | 90-98% |
| Claude Code session (12 turns): cost | $0.39/session | $0.17/session (57% saved) |
| Claude Code session (12 turns): cache hit rate | 0% | 75-94% |
┌──────────────────────────────────────────────────────────────┐
│ Monthly cost (20 devs, Opus 4.6, 5 sessions/day): │
│ │
│ API users: │
│ Before: ████████████████████████████████████████ $1,269 │
│ After: ████████████ $384 │
│ Saved: ░░░░░░░░░░░░░░░░░░░░░░░░░░ $885 │
│ │
│ Claude Code users: │
│ Before: ████████████████████████████████████████ $1,172 │
│ After: ██████████████████ $446 │
│ Saved: ░░░░░░░░░░░░░░░░░░░░░░ $726 │
└──────────────────────────────────────────────────────────────┘
Install
Claude Code Plugin (recommended)
# Option A: npm (fastest)
npm install -g cheap-claude
mkdir -p ~/.claude/plugins   # make sure the plugins directory exists
ln -sf "$(npm root -g)/cheap-claude/plugin" ~/.claude/plugins/cheap-claude
# Option B: clone
git clone https://github.com/ajsai47/cheap-claude.git ~/cheap-claude
mkdir -p ~/.claude/plugins   # make sure the plugins directory exists
ln -sf ~/cheap-claude/plugin ~/.claude/plugins/cheap-claude
Restart Claude Code. The plugin activates automatically: /cheap-stats becomes available and duplicate-read warnings appear in your sessions.
API Proxy (for SDKs, scripts, custom apps)
# Clone + start
git clone https://github.com/ajsai47/cheap-claude.git ~/cheap-claude
cd ~/cheap-claude && npm install
npx tsx src/proxy/server.ts &
export ANTHROPIC_BASE_URL=http://localhost:8082
# Dashboard at http://localhost:8082/dashboard
Commands
Once the plugin is installed, these are available in any Claude Code session:
| Command | What it does |
|---------|-------------|
| /cheap-stats | Show session costs, duplicate reads, saving tips |
| cheap_session_stats | MCP tool — compact cost summary (~50 tokens) |
| cheap_session_details | MCP tool — full breakdown with per-tool stats |
| cheap_dedup_report | MCP tool — list every duplicate file read this session |
| cheap_cost_tip | MCP tool — get personalized cost-saving suggestions |
| cheap_outline | MCP tool — structural file overview (4-8x cheaper than Read) |
| cheap_unfold | MCP tool — read just one function from a file |
| cheap_search | MCP tool — compact grep across directory (max 20 results) |
Example: /cheap-stats
> /cheap-stats
Cheap Claude — Session Stats
─────────────────────────────────────
Tool calls: 47
Estimated cost: $0.14
File reads: 12 (3 duplicates caught)
Tokens wasted: ~4,200 on re-reads
Tips:
• You re-read src/index.ts 3 times. Use your existing knowledge.
• 8 Bash calls — consider using Grep/Glob instead.
─────────────────────────────────────
How It Works
Plugin (Claude Code users)
The plugin drops into ~/.claude/plugins/ and activates automatically:
┌─────────────────────────────────────────────────────────┐
│ Claude Code │
│ │
│ ┌──────────┐ CLAUDE.md injects terse + efficiency │
│ │ Cheap │ hints every session │
│ │ Claude │ │
│ │ Plugin │ PreToolUse hook warns before re-reading │
│ │ │ files you already have in context │
│ │ │ │
│ │ │ PostToolUse hook logs every tool call │
│ │ │ and tracks file read patterns │
│ │ │ │
│ │ │ MCP server provides /cheap-stats and │
│ │ │ cost tracking tools │
│ └──────────┘ │
└─────────────────────────────────────────────────────────┘
| Component | File | What it does |
|-----------|------|-------------|
| Context injection | CLAUDE.md | Terse output rules, file-read efficiency hints |
| Read dedup warning | scripts/pre-tool-use.sh | Catches duplicate reads before they happen |
| Tool tracking | scripts/post-tool-use.sh | Logs all tool calls with token estimates |
| Session init | scripts/session-start.sh | Initializes per-session tracking |
| Cost tools | mcp/server.ts | cheap_session_stats, cheap_dedup_report, cheap_cost_tip |
| Stats command | skills/cheap-stats/ | /cheap-stats slash command |
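To make the read-dedup idea concrete, here is a minimal TypeScript sketch of per-session read tracking. It is not the plugin's implementation (the real hook is the shell script scripts/pre-tool-use.sh). The `ReadTracker` class, its method names, and the 4-characters-per-token estimate are all assumptions made for illustration.

```typescript
// Hypothetical sketch: track file reads in one session and flag repeats.
// ReadTracker and its methods are illustrative names, not the plugin's API.

interface ReadCheck {
  duplicate: boolean;   // true if this path was already read this session
  timesRead: number;    // how many times the path has been read, including now
  wastedTokens: number; // rough tokens spent on the re-read (0 on first read)
}

class ReadTracker {
  private reads = new Map<string, number>();

  // Record a read of `path` whose content is `sizeChars` characters long.
  recordRead(path: string, sizeChars: number): ReadCheck {
    const previous = this.reads.get(path) ?? 0;
    const timesRead = previous + 1;
    this.reads.set(path, timesRead);
    const duplicate = previous > 0;
    // Assumption: roughly 4 characters per token.
    const wastedTokens = duplicate ? Math.ceil(sizeChars / 4) : 0;
    return { duplicate, timesRead, wastedTokens };
  }

  // Paths read more than once, the kind of list a dedup report would show.
  duplicates(): string[] {
    return Array.from(this.reads.entries())
      .filter(([, count]) => count > 1)
      .map(([path]) => path);
  }
}

const tracker = new ReadTracker();
const firstRead = tracker.recordRead("src/index.ts", 4000);
const secondRead = tracker.recordRead("src/index.ts", 4000);
console.log(firstRead.duplicate, secondRead.duplicate, secondRead.wastedTokens);
// prints: false true 1000
```

A pre-tool hook built on this idea would only warn when `duplicate` is true, leaving the decision to re-read with the model.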
Proxy (API consumers)
Six engines run on every /v1/messages request:
Request ─→ Model Router ─→ MCP Optimizer ─→ History Compressor
│
api.anthropic.com ←─ Cache Maximizer ←─ Output Enforcer ←─ Result Dedup
│
└─→ SSE stream back to client (cost tracked per turn)
| # | Engine | What it does | Savings |
|---|--------|-------------|---------|
| 1 | Model Router | Routes simple turns ("yes", "run tests") to Haiku | ~5% on Opus |
| 2 | MCP Optimizer | Defers unused tool schemas from turn 1 | 85% tool overhead |
| 3 | History Compressor | Haiku summarizes old turns (keeps last 4 messages) | 50-70% on history |
| 4 | Result Deduplicator | SHA256 dedup of identical file re-reads | 2-5 dupes/session |
| 5 | Output Enforcer | Terse prompt injection (never caps max_tokens) | ~35% output |
| 6 | Cache Maximizer | Adds cache_control breakpoints to prefix | 80-94% cache hits |
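As an illustration of what engine 6 in the table above does, here is a minimal sketch of adding `cache_control` breakpoints to a Messages API request body. The `cache_control: { type: "ephemeral" }` field is part of the Anthropic Messages API. The `addCacheBreakpoints` function, the trimmed-down request types, and the choice of breakpoint positions (last tool, last system block) are assumptions for this example, not necessarily what the proxy does.

```typescript
// Hypothetical sketch: mark the stable prefix (tools + system prompt) as cacheable.
// The types below cover only the fields this example touches.

type CacheControl = { type: "ephemeral" };
interface TextBlock { type: "text"; text: string; cache_control?: CacheControl }
interface Tool { name: string; description?: string; input_schema: object; cache_control?: CacheControl }
interface MessagesRequest {
  model: string;
  max_tokens: number;
  system?: string | TextBlock[];
  tools?: Tool[];
  messages: { role: "user" | "assistant"; content: string }[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Returns a copy of the request with breakpoints on the last tool and the
// last system block. The input request is not mutated.
function addCacheBreakpoints(req: MessagesRequest): MessagesRequest {
  const out: MessagesRequest = { ...req };

  if (req.tools && req.tools.length > 0) {
    const lastTool = req.tools.length - 1;
    out.tools = req.tools.map((tool, i): Tool =>
      i === lastTool ? { ...tool, cache_control: EPHEMERAL } : tool
    );
  }

  if (typeof req.system === "string" && req.system.length > 0) {
    // A plain string has nowhere to carry cache_control, so wrap it in a text block.
    out.system = [{ type: "text", text: req.system, cache_control: EPHEMERAL }];
  } else if (Array.isArray(req.system) && req.system.length > 0) {
    const lastBlock = req.system.length - 1;
    out.system = req.system.map((block, i): TextBlock =>
      i === lastBlock ? { ...block, cache_control: EPHEMERAL } : block
    );
  }

  return out;
}

const marked = addCacheBreakpoints({
  model: "example-model-id", // placeholder, not a real model ID
  max_tokens: 1024,
  system: "You are a coding agent. (long, stable instructions go here)",
  tools: [
    { name: "read_file", input_schema: { type: "object" } },
    { name: "search", input_schema: { type: "object" } },
  ],
  messages: [{ role: "user", content: "run tests" }],
});
console.log(JSON.stringify(marked.tools, null, 2));
```

Placing the breakpoint at the end of the tools and system prompt means everything before the conversation messages can be served from cache, as long as that prefix does not change between turns.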
Dashboard
The proxy serves a live dashboard at http://localhost:8082/dashboard:
┌──────────────────────────────────────────────────┐
│ Cheap Claude proxy active │
│ │
│ Today Savings Setup │
│ ─────────────────────────────────────────────── │
│ │
│ Before: $5.82 / day │
│ Today: $2.51 / day ↓ 57% │
│ │
│ ████████████████████░░░ $3.31 saved today │
│ │
│ Cache Maximizer $1.42 42% │
│ MCP Optimizer $0.81 24% │
│ History Compressor $0.68 21% │
│ Model Router $0.22 7% │
│ Result Deduplicator $0.12 4% │
│ Output Enforcer $0.06 2% │
└──────────────────────────────────────────────────┘
Pricing Context
Why this matters — the cost difference between cached and uncached:
| Model | Input | Cache read | You save |
|-------|-------|-----------|----------|
| Opus 4.6 | $5.00/MTok | $0.50/MTok | 90% |
| Sonnet 4.6 | $3.00/MTok | $0.30/MTok | 90% |
| Haiku 4.5 | $1.00/MTok | $0.10/MTok | 90% |
The Cache Maximizer's job: make as much of every request cacheable as possible. On a typical work turn, 90% of input tokens are read from cache, and each cached token costs a tenth of the normal input price: 10 cents instead of a dollar.
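The per-token discount and the per-turn saving are not the same number, so a short worked example may help. The sketch below computes a blended input price for a given cache hit rate, using the Opus price from the table above. The function names are made up for this example, and it assumes cache reads cost 10% of the base input price while ignoring cache-write surcharges and output tokens.

```typescript
// Worked arithmetic: blended input price at a given cache hit rate.
// Assumptions: cache reads cost 10% of the base input price;
// cache-write surcharges and output tokens are ignored.

function blendedInputPrice(basePerMTok: number, cacheHitRate: number): number {
  const cacheReadPerMTok = basePerMTok * 0.1;
  return cacheHitRate * cacheReadPerMTok + (1 - cacheHitRate) * basePerMTok;
}

// Fraction of input cost saved compared with paying the base price for every token.
function inputSavings(basePerMTok: number, cacheHitRate: number): number {
  return 1 - blendedInputPrice(basePerMTok, cacheHitRate) / basePerMTok;
}

const opusBasePerMTok = 5.0; // $/MTok, from the pricing table
console.log(blendedInputPrice(opusBasePerMTok, 0.9).toFixed(2));           // "0.95"
console.log((inputSavings(opusBasePerMTok, 0.9) * 100).toFixed(0) + "%");  // "81%"
```

Under these assumptions, a 90% hit rate cuts input cost by about 81%, because the uncached 10% of tokens is still billed at full price.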
API Users — What to Expect
The proxy works for any app using the Anthropic SDK. Your savings depend on your system prompt size:
| Your system prompt | Cache hit rate | Savings | Notes |
|--------------------|----------------|---------|-------|
| 8K+ tokens (large app) | 90-98% | 70% | Verified on 10 turns |
| 2-8K tokens (medium app) | 80-95% | 40-60% | |
| Under 2K tokens | 0% | 0% | Too small to cache |
Minimum for caching:
- Haiku 4.5: 2,048 tokens (~8K chars)
- Sonnet/Opus: 1,024 tokens (~4K chars)
If your system prompt is under the minimum, the proxy can't help with caching. The Output Enforcer and Model Router still provide modest savings.
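A quick way to check which side of the minimum you are on is sketched below. The thresholds and the roughly-4-characters-per-token ratio are taken from the figures above. The function and constant names are hypothetical, and real token counts depend on the tokenizer, so treat the result as a rough estimate.

```typescript
// Hypothetical sketch: estimate whether a system prompt is long enough to cache.
// Thresholds come from the minimums listed above; the 4 chars/token ratio is approximate.

type ModelFamily = "haiku" | "sonnet" | "opus";

const MIN_CACHEABLE_TOKENS: Record<ModelFamily, number> = {
  haiku: 2048,
  sonnet: 1024,
  opus: 1024,
};

// Rough token estimate from character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// True when the estimated token count reaches the family's caching minimum.
function canCache(systemPrompt: string, family: ModelFamily): boolean {
  return estimateTokens(systemPrompt) >= MIN_CACHEABLE_TOKENS[family];
}

const prompt = "x".repeat(5000); // stand-in for a 5,000-character system prompt
console.log(estimateTokens(prompt));     // 1250
console.log(canCache(prompt, "sonnet")); // true
console.log(canCache(prompt, "haiku"));  // false
```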
Verified: 10-turn API chatbot with 8K system prompt
Turn 1: 0% cache $0.011 (one-time cache write)
Turn 2: 98% cache $0.001 ← 87% cheaper immediately
Turn 3: 97% cache $0.002
...
Turn 10: 90% cache $0.002
Total: $0.026 (would have been $0.085 → 70% saved)
Best for
- Multi-turn chatbots with stable system prompts → 70% savings
- AI agents with tools and growing history → 57% savings
- RAG apps with large static context + dynamic retrieval → 30-50% savings
- Batch processing with repeated prompts → 50-60% savings
Not helpful for
- One-shot API calls with tiny system prompts
- Apps that already implement cache_control manually
- Streaming-only apps where latency matters more than cost (proxy adds <5ms)
Quick test for your app
# Start the proxy
git clone https://github.com/ajsai47/cheap-claude.git && cd cheap-claude
npm install && npx tsx src/proxy/server.ts &
# Point your app at it
export ANTHROPIC_BASE_URL=http://localhost:8082
# Run your app normally, then check savings
curl http://localhost:8082/stats
The Honest Story
We ran 21+ iterations using an autoagent-style loop. Here's what actually happened:
| Iteration | Score | What we tried |
|-----------|-------|---------------|
| Baseline | 14.5% | 5 engines, nothing tuned |
| 1 | 56.9% | THE BIG ONE: MCP Optimizer from turn 1 |
| 3 | 57.9% | Tuned compression thresholds |
| 8 | 70.1% | Aggressive file truncation (reverted, loses context) |
| 15 | 76.6% | Keep only 1 recent message (reverted, loses context) |
| 23 | 79.3% | Cap max_tokens at 30% (reverted, breaks code gen) |
| Reset | 57.3% | Reverted everything that hurt quality |
| Final | 62% (projected on Opus) | Added model routing |
| API test | 70% | 10-turn chatbot with 8K system prompt (verified) |
The jump from 14% to 57% came from one insight: run the MCP Optimizer from turn 1 so the cached prefix is byte-identical across all turns. Everything else was tuning.
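Why "from turn 1" matters can be shown with a toy model. The sketch below treats the cacheable prefix as the serialized tool list plus system prompt and checks whether two consecutive turns would share it. This is a simplification of how prompt caching matches prefixes, and every name in it is hypothetical.

```typescript
// Simplified model: a cached prefix is reusable only if the serialized
// tools + system prompt are byte-identical to the previous turn.
// PrefixParts and both functions are hypothetical names for this sketch.

interface PrefixParts {
  tools: string[]; // tool names stand in for full tool schemas
  system: string;
}

// Serialize the parts that come before the conversation messages.
function prefixKey(parts: PrefixParts): string {
  return JSON.stringify([parts.tools, parts.system]);
}

// True when the next turn can reuse the previous turn's cached prefix.
function prefixReusable(prev: PrefixParts, next: PrefixParts): boolean {
  return prefixKey(prev) === prefixKey(next);
}

const system = "You are a coding agent.";

// Optimizer kicks in late: turn 1 sends every tool, turn 2 sends a trimmed list.
const lateTurn1: PrefixParts = { tools: ["read", "write", "search"], system };
const lateTurn2: PrefixParts = { tools: ["read"], system };

// Optimizer runs from turn 1: the trimmed list is the same on every turn.
const earlyTurn1: PrefixParts = { tools: ["read"], system };
const earlyTurn2: PrefixParts = { tools: ["read"], system };

console.log(prefixReusable(lateTurn1, lateTurn2));   // false: prefix changed, cache miss
console.log(prefixReusable(earlyTurn1, earlyTurn2)); // true: prefix unchanged, cache hit
```

In this model, trimming the tool list after the first turn invalidates whatever was cached on turn 1, which is why applying the trimmed list from the very first request keeps every later turn on the cached path.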
We hit 79% by cheating — truncating file content Claude needs, capping output length, stripping conversation context. It looked great on the eval. It would break real usage. We reverted all of it.
The 70% API number is real — tested on a 10-turn multi-turn chatbot with an 8K-token system prompt. Turns 2-10 all hit 90-98% cache.
57% is the honest number. 62% with model routing on Opus.
Project Structure
cheap-claude/
├── plugin/ # Claude Code plugin
│ ├── .claude-plugin/plugin.json
│ ├── .mcp.json # MCP server registration
│ ├── CLAUDE.md # Injected every session
│ ├── hooks/hooks.json # Lifecycle hooks
│ ├── mcp/server.ts # Cost tracking tools
│ ├── scripts/ # Hook implementations
│ └── skills/cheap-stats/ # /cheap-stats command
├── src/proxy/ # API proxy (6 engines)
│ ├── server.ts # HTTP proxy + dashboard
│ ├── pipeline.ts # Engine orchestrator
│ └── engines/ # The 6 engines
├── src/dashboard/index.html # Live cost dashboard
├── src/cli/ # init + stop
├── test/harness.ts # 12-turn benchmark
├── eval.sh # Automated eval
└── program.md # Autoagent config
Data
All data is stored in ~/.claude-cheap/. Nothing is written to your repo. Nothing leaves your machine.
License
MIT
