# ccprune

v4.3.1

Prune early messages from Claude Code session transcripts with AI summarization.
Next time your Claude Code context is running low, quit Claude Code and run `npx ccprune` - it auto-resumes you back into your last thread, now compacted with an intelligent rolling summary. Run it again whenever context gets low - the summaries stack, so context keeps rolling forward.

Fork of claude-prune with enhanced features: token-based pruning, AI summarization enabled by default, and improved UX.
## Features

- Zero-Config Default: Just run `ccprune` - auto-detects the latest session, keeps 55K tokens (~70K after Claude Code adds system context)
- Token-Based Pruning: Prunes based on actual token count, not message count
- Smart Threshold: Automatically skips pruning if the session is under 55K tokens
- AI Summarization: Automatically generates a summary of pruned content (enabled by default)
- Summary Synthesis: Re-pruning synthesizes the old summary + new pruned content into one cohesive summary
- Small Session Warning: Prompts for confirmation when auto-selecting sessions with < 5 messages
- Safe by Default: Always preserves session summaries and metadata
- Auto Backup: Creates timestamped backups before modifying files
- Restore Support: Easily restore from backups with the `restore` command
- Dry-Run Preview: Preview changes and summary before committing
## Installation

### Run directly (recommended)

```bash
# Using npx (Node.js)
npx ccprune

# Using bunx (Bun)
bunx ccprune
```

### Install globally

```bash
# Using npm
npm install -g ccprune

# Using bun
bun install -g ccprune
```

## Quick Start
1. Quit Claude Code - press `Ctrl+C` or type `/quit`
2. Run ccprune from the same project directory:

   ```bash
   npx ccprune
   ```

That's it! ccprune auto-detects your latest session, prunes old messages (keeping a summary), and resumes automatically.
## Setup (Recommended)

For fast, high-quality summarization, set up a Gemini API key:

1. Get a free key from Google AI Studio
2. Add it to your shell profile (`~/.zshrc` or `~/.bashrc`):

   ```bash
   export GEMINI_API_KEY=your_key
   ```

3. Restart your terminal or run `source ~/.zshrc`

With `GEMINI_API_KEY` set, ccprune automatically uses Gemini 2.5 Flash for fast summarization without chunking.

Note: If `GEMINI_API_KEY` is not set, ccprune automatically falls back to the Claude Code CLI for summarization (no additional setup required).
## Usage

```bash
# Zero-config: auto-detects latest session, keeps 55K tokens
ccprune

# Pick from available sessions interactively
ccprune --pick

# Explicit session ID (if you need a specific session)
ccprune <sessionId>

# Explicit token limit
ccprune --keep 55000
ccprune --keep-tokens 80000

# Subcommands
ccprune restore <sessionId> [--dry-run]
```

### Arguments

- `sessionId`: (Optional) UUID of the Claude Code session. Auto-detects the latest if omitted.
### Subcommands

| Subcommand | Description |
|------------|-------------|
| `restore <sessionId>` | Restore a session from the latest backup |
| `restore <sessionId> --dry-run` | Preview a restore without making changes |
### Options

| Option | Description |
|--------|-------------|
| `--pick` | Interactively select from available sessions |
| `-n, --no-resume` | Skip automatic session resume |
| `--yolo` | Resume with `--dangerously-skip-permissions` |
| `--resume-model <model>` | Model for the resumed session (opus, sonnet, haiku, opusplan) |
| `-k, --keep <number>` | Number of tokens to retain (default: 55000) |
| `--keep-tokens <number>` | Number of tokens to retain (alias for `-k`) |
| `--dry-run` | Preview changes and summary without modifying files |
| `--no-summary` | Skip AI summarization of pruned messages |
| `--summary-model <model>` | Model for summarization (haiku, sonnet, or a full model name) |
| `--summary-timeout <ms>` | Timeout for summarization in milliseconds (default: 360000) |
| `--gemini` | Use Gemini 3 Pro for summarization |
| `--gemini-flash` | Use Gemini 2.5 Flash for summarization |
| `--claude-code` | Use the Claude Code CLI for summarization (chunks large transcripts) |
| `--prune-tools` | Replace all non-protected tool outputs with placeholders |
| `--prune-tools-ai` | Use AI to identify which tool outputs to prune |
| `--prune-tools-dedup` | Deduplicate identical tool calls, keeping only the most recent |
| `--prune-tools-max` | Maximum savings: dedup + AI analysis combined |
| `--prune-tools-keep <tools>` | Comma-separated tools to never prune (default: Edit,Write,TodoWrite,TodoRead,AskUserQuestion) |
| `-h, --help` | Show help information |
| `-V, --version` | Show version number |
If no session ID is provided, ccprune auto-detects the most recently modified session. If no keep option is specified, it defaults to 55,000 tokens (~70K actual context after Claude Code adds the system prompt and CLAUDE.md).
Summarization priority:

1. `--claude-code` flag: Force the Claude Code CLI (chunks transcripts >30K chars)
2. `--gemini` or `--gemini-flash` flags: Use the Gemini API
3. Auto-detect: If `GEMINI_API_KEY` is set, uses Gemini 2.5 Flash
4. Fallback: Claude Code CLI (no API key needed)
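The priority order above can be sketched as a small selection function. This is an illustrative model, not ccprune's actual implementation; the flag names simply mirror the CLI options:

```typescript
type Backend = "gemini-pro" | "gemini-flash" | "claude-code";

interface Flags {
  claudeCode?: boolean;  // --claude-code
  gemini?: boolean;      // --gemini (Gemini 3 Pro)
  geminiFlash?: boolean; // --gemini-flash
}

// Mirrors the documented priority: explicit flags first, then the
// GEMINI_API_KEY auto-detect, then the Claude Code CLI fallback.
function selectBackend(
  flags: Flags,
  env: Record<string, string | undefined>,
): Backend {
  if (flags.claudeCode) return "claude-code";    // 1. forced CLI
  if (flags.gemini) return "gemini-pro";         // 2. explicit Gemini flags
  if (flags.geminiFlash) return "gemini-flash";
  if (env.GEMINI_API_KEY) return "gemini-flash"; // 3. auto-detect
  return "claude-code";                          // 4. fallback
}
```

Note that an explicit `--claude-code` wins even when `GEMINI_API_KEY` is set.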
## Examples

```bash
# Simplest: auto-detect, prune, and resume automatically
npx ccprune

# Prune only (don't resume)
npx ccprune -n

# Resume in yolo mode (--dangerously-skip-permissions)
npx ccprune --yolo

# Resume with a specific model (e.g., Opus 4.5)
npx ccprune --resume-model opus

# Combine yolo mode with Opus
npx ccprune --yolo --resume-model opus

# Pick from available sessions interactively
npx ccprune --pick

# Keep 55K tokens (default)
npx ccprune --keep 55000

# Keep 80K tokens (less aggressive pruning)
npx ccprune --keep-tokens 80000

# Preview what would be pruned (shows summary preview too)
npx ccprune --dry-run

# Skip summarization for faster pruning
npx ccprune --no-summary

# Use Claude Code CLI with haiku model (faster/cheaper)
npx ccprune --claude-code --summary-model haiku

# Use Gemini 3 Pro for summarization
npx ccprune --gemini

# Use Gemini 2.5 Flash (default when GEMINI_API_KEY is set)
npx ccprune --gemini-flash

# Force Claude Code CLI for summarization
npx ccprune --claude-code

# Target a specific session by ID
npx ccprune 03953bb8-6855-4e53-a987-e11422a03fc6 --keep 55000

# Restore from the latest backup
npx ccprune restore 03953bb8-6855-4e53-a987-e11422a03fc6
```

## Tool Output Pruning (Default)
Tool pruning runs automatically to reduce tokens before summarization:

1. Dedup: Identical tool calls are deduplicated (keeps only the most recent)
2. AI analysis: Intelligently prunes irrelevant outputs using your summarization backend

```bash
# Default behavior (dedup + AI) - runs automatically
ccprune

# Disable automatic tool pruning
ccprune --skip-tool-pruning

# Explicit modes for specific behavior:
ccprune --prune-tools        # Simple: replace ALL outputs (no AI)
ccprune --prune-tools-dedup  # Dedup only (no AI)
ccprune --prune-tools-ai     # AI only (no dedup)
ccprune --prune-tools-max    # Explicit dedup + AI (same as default)

# Custom protected tools
ccprune --prune-tools-keep "Edit,Write,Bash"
```

Protected tools (never pruned by default):

- `Edit`, `Write` - file modification context
- `TodoWrite`, `TodoRead` - task tracking
- `AskUserQuestion` - user interaction
Modes explained:

- Default (no flags): Runs dedup first (free), then AI analysis - maximum savings
- Simple (`--prune-tools`): Replaces all non-protected tool outputs with `[Pruned: {tool} output - {bytes} bytes]`
- AI (`--prune-tools-ai`): Uses your summarization backend (Gemini or Claude Code CLI) to intelligently identify which outputs are no longer relevant
- Dedup (`--prune-tools-dedup`): Keeps only the most recent output when the same tool is called with identical input. Annotates with `[{total} total calls]`
- Skip (`--skip-tool-pruning`): Disables automatic tool pruning entirely
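The dedup mode described above can be sketched roughly as follows. This is a simplified model: real transcript entries carry more fields, and the `ToolCall`/`dedupToolCalls` names are illustrative, not ccprune's internals:

```typescript
interface ToolCall {
  tool: string;
  input: string;  // serialized tool input
  output: string;
}

// Keep only the most recent output for each (tool, input) pair and
// annotate survivors with how many identical calls were collapsed.
function dedupToolCalls(calls: ToolCall[]): ToolCall[] {
  const counts = new Map<string, number>();
  const latest = new Map<string, ToolCall>();
  for (const call of calls) {
    const key = `${call.tool}\u0000${call.input}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
    latest.set(key, call); // later calls overwrite earlier ones
  }
  return [...latest.entries()].map(([key, call]) => {
    const total = counts.get(key)!;
    return total > 1
      ? { ...call, output: `${call.output} [${total} total calls]` }
      : call;
  });
}
```

Because `Map` preserves insertion order, the surviving calls stay in transcript order even though only the latest output is kept.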
## How It Works

```
BEFORE               AFTER FIRST PRUNE          AFTER RE-PRUNE
──────               ────────────────           ──────────────
┌───────────────┐    ┌───────────────┐          ┌───────────────┐
│ msg 1 (old)   │─┐  │ [SUMMARY]     │─┐        │ [NEW SUMMARY] │ ◄─ synthesized
│ msg 2 (old)   │ │  │ "Previously.."│ │        │ (old+middle)  │
│ ...           │ ├─►├───────────────┤ │        ├───────────────┤
│ msg N (old)   │─┘  │ msg N+1 (kept)│ ├──────► │ msg X (kept)  │
├───────────────┤    │ msg N+2 (kept)│ │        │ msg Y (kept)  │
│ msg N+1 (new) │───►│ msg N+3 (kept)│─┘        │ msg Z (kept)  │
│ msg N+2 (new) │    └───────────────┘          └───────────────┘
│ msg N+3 (new) │
└───────────────┘            ▲                          ▲
                             │                          │
                       old msgs become           old summary + middle
                       summary, recent kept      synthesized, recent kept
```

1. Locates Session File: Finds `$CLAUDE_CONFIG_DIR/projects/{project-path}/{sessionId}.jsonl`
2. Counts Tokens: Uses Claude's cumulative usage data from the last message: `input_tokens + cache_read_input_tokens + cache_creation_input_tokens`. This matches Claude Code's UI display exactly
3. Early Exit: If total tokens ≤ threshold (55K default), skips pruning and auto-resumes
4. Preserves Critical Data: Always keeps the first line (file-history-snapshot or session metadata)
5. Token-Based Cutoff: Scans right-to-left, accumulating tokens until adding the next message would exceed the threshold
6. Content Extraction: Extracts text from messages, including `tool_result` outputs and `thinking` blocks. Tool calls become `[Used tool: ToolName]` placeholders to provide context without verbose tool I/O
7. Orphan Cleanup: Removes `tool_result` blocks in kept messages that reference `tool_use` blocks from pruned messages
8. AI Summarization: Generates a structured summary with sections: Overview, What Was Accomplished, Files Modified, Key Technical Details, Current State & Pending Work
   - Summary Synthesis: Re-pruning synthesizes the old summary + new pruned content into one cohesive summary
   - Gemini (default with API key): Handles large transcripts natively without chunking
   - Claude Code CLI (fallback): May chunk transcripts >30K characters (see Claude Code CLI Summarization below)
9. Safe Backup: Creates a timestamped backup in `prune-backup/` before modifying
10. Auto-Resume: Optionally resumes the Claude Code session after pruning
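The token counting and cutoff steps can be sketched together: total context comes from the last message's cumulative usage, and the cutoff walks backwards until the budget is spent. This is an illustrative sketch (it omits the lenient-boundary extra message mentioned under migration notes, and the function names are assumptions):

```typescript
interface Usage {
  input_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

// Total context as Claude Code's UI reports it (step 2).
function totalTokens(u: Usage): number {
  return u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
}

// Step 5: scan right-to-left, keeping messages until the next one would
// exceed the budget. Returns the index of the first kept message.
function cutoffIndex(perMessageTokens: number[], keepTokens: number): number {
  let acc = 0;
  for (let i = perMessageTokens.length - 1; i >= 0; i--) {
    if (acc + perMessageTokens[i] > keepTokens) return i + 1;
    acc += perMessageTokens[i];
  }
  return 0; // everything fits under the budget
}
```

With a 65-token budget and three 30-token messages, only the last two survive; the first becomes summary input.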
## Claude Code CLI Summarization

When using the `--claude-code` flag (or when `GEMINI_API_KEY` is not set), ccprune uses the Claude Code CLI for summarization with these specific behaviors:
Chunking for Large Transcripts:
- Transcripts >30,000 characters are automatically split into chunks
- Each chunk is summarized independently
- Chunk summaries are then combined into a final unified summary
- Why: Ensures reliable summarization even for very long sessions
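A rough sketch of the chunk-then-combine flow described above. The 30,000-character threshold comes from this document; the naive splitting strategy and function names are assumptions for illustration:

```typescript
const CHUNK_LIMIT = 30_000; // characters, per the docs

// Naive splitter: cut the transcript into pieces at most `limit` long.
function splitTranscript(text: string, limit = CHUNK_LIMIT): string[] {
  if (text.length <= limit) return [text];
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += limit) {
    chunks.push(text.slice(i, i + limit));
  }
  return chunks;
}

// Summarize each chunk independently, then combine the partial
// summaries in one final pass.
async function summarizeChunked(
  text: string,
  summarize: (prompt: string) => Promise<string>,
): Promise<string> {
  const chunks = splitTranscript(text);
  if (chunks.length === 1) return summarize(chunks[0]);
  const partials = await Promise.all(chunks.map((c) => summarize(c)));
  return summarize(`Combine these partial summaries:\n${partials.join("\n---\n")}`);
}
```

The `summarize` callback stands in for whatever backend is selected; short transcripts skip chunking entirely.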
Model Selection:

- Default: Uses your Claude Code CLI default model
- Override with `--summary-model haiku` or `--summary-model sonnet`
- Supports full model names (e.g., `claude-3-5-sonnet-20241022`)

Timeout & Retries:

- Default timeout: 360 seconds (6 minutes)
- Override with `--summary-timeout <ms>`
- Automatic retries: Up to 2 attempts on failure
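The timeout-and-retry behavior can be modeled with a small wrapper. This is an illustrative sketch, not ccprune's code; `withTimeoutAndRetry` is a hypothetical name, and the defaults mirror the documented 360,000 ms timeout and 2 attempts:

```typescript
// Reject if the underlying promise takes longer than `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Up to `attempts` tries, each bounded by the --summary-timeout budget.
async function withTimeoutAndRetry<T>(
  fn: () => Promise<T>,
  { timeoutMs = 360_000, attempts = 2 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (e) {
      lastError = e; // retry on timeout or failure
    }
  }
  throw lastError;
}
```

A failed first attempt (error or timeout) is retried once before the error propagates.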
When to Use:
- No API key required (uses existing Claude Code subscription)
- Handles extremely large transcripts via chunking
- Works offline (if Claude Code CLI works offline)
Trade-offs:
- Slower than Gemini API (spawns subprocess)
- Chunking may lose some context coherence for very large sessions
- Requires Claude Code CLI to be installed and authenticated
## File Structure

Claude Code stores sessions in:

```
~/.claude/projects/{project-path-with-hyphens}/{sessionId}.jsonl
```

For example, a project at `/Users/alice/my-app` becomes:

```
~/.claude/projects/-Users-alice-my-app/{sessionId}.jsonl
```

## Environment Variables
### CLAUDE_CONFIG_DIR

By default, ccprune looks for session files in `~/.claude`. If Claude Code is configured to use a different directory, you can specify it with the `CLAUDE_CONFIG_DIR` environment variable:

```bash
CLAUDE_CONFIG_DIR=/custom/path/to/claude ccprune
```

### GEMINI_API_KEY

When set, ccprune automatically uses Gemini 2.5 Flash for summarization (recommended). Get your free API key from Google AI Studio.

```bash
export GEMINI_API_KEY=your_api_key_here
ccprune  # automatically uses Gemini 2.5 Flash
```

Use `--gemini` for Gemini 3 Pro, or `--claude-code` to force the Claude Code CLI.
## Migrating from claude-prune

If you were using the original claude-prune package, ccprune v3.x and v4.x introduced these changes:

```bash
# claude-prune v1.x (message-count based, summary was opt-in)
claude-prune <id> -k 10 --summarize-pruned

# ccprune v2.x (percentage-based, summary enabled by default)
ccprune <id>                      # defaults to 20% of messages
ccprune <id> --keep-percent 25    # keep latest 25% of messages

# ccprune v3.x (token-based, summary enabled by default)
ccprune <id>                      # defaults to 55K tokens
ccprune <id> -k 55000             # keep 55K tokens
ccprune <id> --keep-tokens 80000  # keep 80K tokens
```

Key changes in v3.x:

- Token-based pruning: `-k` now means tokens, not message count
- Removed: `-p, --keep-percent` flag (replaced by the token-based approach)
- Auto-skip: Sessions under 55K tokens are not pruned
- Lenient boundary: Includes one extra message at the boundary to preserve context
- Summary is enabled by default (use `--no-summary` to disable)
- Re-pruning synthesizes the old summary + new pruned content into one summary

Key changes in v4.x:

- Accurate token counting: Uses Claude's cumulative usage data (`input_tokens + cache_read + cache_creation`) to match the Claude Code UI
- Proportional scaling: Per-message tokens are scaled to match the total context for accurate pruning
- `--resume-model`: Specify which model to use when auto-resuming (opus, sonnet, haiku, opusplan)
- 55K default: Results in ~70K total context after Claude Code adds system prompt, CLAUDE.md, and other overhead
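The proportional-scaling idea in v4.x can be sketched as: take rough per-message token estimates (however they are produced) and scale them so they sum to the cumulative total Claude actually reports. The function name and shape are illustrative assumptions:

```typescript
// Scale rough per-message estimates so they sum to the true total
// reported by Claude's cumulative usage data.
function scaleToTotal(estimates: number[], trueTotal: number): number[] {
  const estimatedTotal = estimates.reduce((a, b) => a + b, 0);
  if (estimatedTotal === 0) return estimates.map(() => 0);
  const factor = trueTotal / estimatedTotal;
  return estimates.map((e) => e * factor);
}
```

This keeps each message's relative weight while making the cutoff arithmetic agree with the context figure Claude Code displays.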
## Development

```bash
# Clone and install
git clone https://github.com/nicobailon/claude-prune.git
cd claude-prune
bun install

# Run tests
bun run test

# Build
bun run build

# Test locally
./dist/index.js --help
```

## Credits

This project is a fork of claude-prune by Danny Aziz. Thanks for the original implementation!

## License

MIT
