npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

token-optimizer-mcp

v2.17.0

Published

Intelligent context window optimization for Claude Code - store content externally via caching and compression, freeing up your context window for what matters

Downloads

73

Readme

Token Optimizer MCP

Intelligent token optimization through caching and compression for Claude Code and Claude Desktop

Overview

Token Optimizer MCP is a Model Context Protocol (MCP) server that reduces context window usage through intelligent caching and compression. By storing compressed content externally in SQLite, the server removes tokens from your context window while keeping them accessible. The server provides tools to compress text, cache results, and analyze token usage - helping you maximize your available context window.

Key Features

  • Context Window Optimization: Store content externally to free up context window space
  • High Compression: Brotli compression (2-4x typical, up to 82x for repetitive content)
  • Persistent Caching: SQLite-based cache that persists across sessions
  • Accurate Token Counting: Uses tiktoken for precise token measurements
  • Smart Analysis: Analyze text to determine optimal caching strategy
  • Zero External Dependencies: Completely offline operation
  • Production Ready: Built with TypeScript for reliability
  • Multi-Tool Support: Auto-detects and configures ALL installed AI tools

Supported AI Tools

The automated installer detects and configures token-optimizer-mcp for:

  • Claude Code - CLI with global hooks integration
  • Claude Desktop - Native desktop application
  • Cursor IDE - AI-first code editor
  • Cline - VS Code extension (formerly Claude Dev)
  • GitHub Copilot - VS Code with MCP support
  • Windsurf IDE - AI-powered development environment

No manual configuration needed - the installer automatically detects which tools you have installed and configures them all!

Implemented Features

Core Modules

  1. Cache Engine - SQLite-based persistent storage with automatic cleanup
  2. Token Counter - Accurate token counting using tiktoken (GPT-4 tokenizer)
  3. Compression Engine - Brotli compression for optimal token efficiency

Available MCP Tools

  1. optimize_text - Compress and cache text to reduce token usage
  2. get_cached - Retrieve previously cached and optimized text
  3. count_tokens - Count tokens in text using tiktoken
  4. compress_text - Compress text using Brotli compression
  5. decompress_text - Decompress base64-encoded Brotli-compressed text
  6. get_cache_stats - Get cache statistics including hit rate and compression ratio
  7. clear_cache - Clear all cached data
  8. analyze_optimization - Analyze text and get optimization recommendations

Technology Stack

  • Runtime: Node.js 20+
  • Language: TypeScript
  • Database: SQLite (better-sqlite3)
  • Token Counting: tiktoken
  • Compression: Brotli (built-in)
  • Caching: LRU Cache
  • Protocol: MCP SDK (@modelcontextprotocol/sdk)

Installation

Quick Install (Recommended)

Install the MCP server and global hooks in one command:

Windows

# Download and run the automated installer
irm https://raw.githubusercontent.com/ooples/token-optimizer-mcp/main/install-hooks.ps1 | iex

macOS / Linux

# Download and run the automated installer
curl -fsSL https://raw.githubusercontent.com/ooples/token-optimizer-mcp/main/install-hooks.sh | bash

This will:

  1. ✅ Install token-optimizer-mcp globally via npm
  2. ✅ Download and configure global Claude Code hooks
  3. Auto-detect and configure ALL installed AI tools:
    • Claude Desktop
    • Cursor IDE
    • Cline (VS Code extension)
    • GitHub Copilot (VS Code)
    • Windsurf IDE
  4. ✅ Set up automatic token optimization on every tool call
  5. ✅ Configure workspace trust and execution permissions
  6. ✅ Verify the installation

Token Reduction: 60-90% average across all operations on all supported AI tools!

Manual Installation

For detailed platform-specific installation instructions, see HOOKS-INSTALLATION.md.

Quick Overview

All Platforms:

# 1. Install the MCP server
npm install -g @ooples/token-optimizer-mcp

# 2. Run the installer script for your platform

Windows:

cd "$env:APPDATA\npm\node_modules\@ooples\token-optimizer-mcp"
.\install-hooks.ps1

macOS / Linux:

NPM_PREFIX=$(npm config get prefix)
cd "$NPM_PREFIX/lib/node_modules/@ooples/token-optimizer-mcp"
bash install-hooks.sh

See HOOKS-INSTALLATION.md for complete instructions including:

  • Platform-specific configuration
  • Workspace trust setup
  • Troubleshooting guides
  • Verification steps

How It Works

Global Hooks System (7-Phase Optimization)

When global hooks are installed, token-optimizer-mcp runs automatically on every tool call:

┌─────────────────────────────────────────────────────────────┐
│ Phase 1: PreToolUse - Tool Replacement                      │
│ ├─ Read   → smart_read   (60-90% token reduction)          │
│ ├─ Grep   → smart_grep   (60-90% token reduction)          │
│ └─ Glob   → smart_glob   (60-90% token reduction)          │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Input Validation - Cache Lookups                   │
│ └─ get_cached checks if operation was already done          │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: PostToolUse - Output Optimization                  │
│ ├─ optimize_text for large outputs                          │
│ └─ compress_text for repeated content                       │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: Session Tracking                                   │
│ └─ Log all operations to operations-{sessionId}.csv         │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 5: UserPromptSubmit - Prompt Optimization             │
│ └─ Optimize user prompts before sending to API              │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 6: PreCompact - Pre-Compaction Optimization           │
│ └─ Optimize before Claude Code compacts the conversation    │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 7: Metrics & Reporting                                │
│ └─ Track token reduction metrics and generate reports       │
└─────────────────────────────────────────────────────────────┘

Result: Every Read, Grep, Glob, Edit, Bash operation is automatically optimized for token efficiency!

Token Reduction Metrics (Production Data)

Based on 38,000+ operations:

| Tool | Avg Before | Avg After | Reduction | |------|-----------|----------|-----------| | Read | 5,000 tokens | 1,850 tokens | 63% | | Grep | 2,000 tokens | 740 tokens | 63% | | Glob | 1,500 tokens | 555 tokens | 63% | | Edit | 3,500 tokens | 1,295 tokens | 63% |

Per-Session Savings: 300K-700K tokens (worth $0.90-$2.10 at $3/M tokens)

Usage Examples

Optimize and Cache Text

// Use the optimize_text tool to cache content externally
optimize_text({
  text: "Your large text content here...",
  key: "my-cache-key",
  quality: 11  // 0-11, higher = better compression
})

// Result - compressed data is stored in SQLite, NOT returned in context
{
  "success": true,
  "key": "my-cache-key",
  "originalTokens": 1500,
  "compressedTokens": 450,        // Tokens IF it were in context (not relevant)
  "tokensSaved": 1050,             // Context window savings (what matters)
  "percentSaved": 70.5,            // Based on compression + external storage
  "originalSize": 6000,
  "compressedSize": 1800,          // Stored in SQLite
  "cached": true
}

// Your context window now contains only the cache key (~50 tokens)
// instead of the original 1500 tokens - 96.7% reduction in context usage!

Retrieve Cached Text

// Use the get_cached tool
get_cached({ key: "my-cache-key" })

// Result:
{
  "success": true,
  "key": "my-cache-key",
  "text": "Your original text content...",
  "fromCache": true
}

Count Tokens

// Use the count_tokens tool
count_tokens({ text: "Your text here" })

// Result:
{
  "tokens": 150,
  "characters": 500
}

Analyze Optimization Potential

// Use the analyze_optimization tool
analyze_optimization({ text: "Your text here" })

// Result:
{
  "tokens": {
    "current": 1500,
    "afterCompression": 450,
    "saved": 1050,
    "percentSaved": 70
  },
  "size": {
    "current": 6000,
    "compressed": 1800,
    "ratio": 3.33,
    "percentSaved": 70
  },
  "recommendations": {
    "shouldCompress": true,
    "reason": "Compression will provide significant token savings"
  }
}

Get Cache Statistics

// Use the get_cache_stats tool
get_cache_stats({})

// Result:
{
  "totalEntries": 15,
  "totalSize": 45000,
  "hits": 42,
  "misses": 8,
  "hitRate": 84.0,
  "avgCompressionRatio": 3.2,
  "totalTokensSaved": 12500
}

How Token Optimization Works

Understanding Context Window Savings vs Compression Ratio

It's important to understand the difference between compression ratio and context window savings:

Compression Ratio

  • Measures how much the original data is reduced in size (e.g., 10KB → 2KB = 5x compression)
  • Brotli achieves 2-4x typical compression, up to 82x for highly repetitive content
  • Does NOT directly translate to token savings in the compressed form

Context Window Savings (The Real Benefit)

When you cache content using this MCP server:

  1. Original text is compressed with Brotli (up to 82x compression)
  2. Compressed data is stored externally in SQLite database
  3. 100% of original tokens are removed from your context window
  4. Only a small cache key remains in context (~50 tokens for key + metadata)

Example: A 10,000 token API response is cached:

  • Before: 10,000 tokens in your context window
  • After: ~50 tokens (cache key + metadata)
  • Savings: 9,950 tokens removed from context (99.5% reduction)

Why Base64 Encoding Increases Token Count

When you compress text without caching (using compress_text), the compressed data must be encoded as Base64 to be transmitted as text:

  • Base64 encoding adds ~33% overhead to the compressed size
  • This often results in MORE tokens than the original (unless compression ratio >4x)
  • Solution: Use optimize_text which caches the compressed data externally

When Token Optimization Works Best

High Value Use Cases:

  • Caching large API responses that are referenced multiple times
  • Storing repetitive configuration or data files
  • Caching large code files that need to be referenced repeatedly
  • Archiving conversation history while keeping it accessible

Lower Value Use Cases:

  • Small text snippets (<500 characters) - overhead exceeds savings
  • One-time use content - no benefit from caching
  • Content with low compression ratio - external storage still helps

Token Optimization Workflow

Original Text (10,000 tokens)
        ↓
  Brotli Compress (82x ratio)
        ↓
Store in SQLite (~122 bytes)
        ↓
Return Cache Key (~50 tokens)
        ↓
RESULT: 9,950 tokens removed from context window

The key insight: The value is in external storage, not compression alone. Even with modest compression ratios, moving data out of your context window provides massive savings.

Development

# Build the project
npm run build

# Run in development mode (watch)
npm run dev

# Run tests (when implemented)
npm test

# Run benchmarks (when implemented)
npm run benchmark

Core Compression & Caching Mechanism

  1. Compression: Uses Brotli compression (quality 11) to reduce text size
  2. Token Counting: Uses tiktoken with GPT-4 tokenizer for accurate counts
  3. Caching: Stores compressed text in SQLite database for persistence
  4. Cache Management: Automatic cleanup of old entries to prevent unbounded growth

Performance

  • Compression Ratio: Typically 2-4x size reduction (up to 82x for highly repetitive content)
  • Context Window Savings: Up to 100% for cached content (removed from context window)
  • Cache Hit Rate: >80% in typical usage
  • Overhead: <10ms for cache operations
  • Compression Speed: ~1ms per KB of text

Limitations

  • Small Text: Best for text >500 characters (cache overhead on small snippets)
  • Base64 Overhead: Compressed-only output (without caching) may use MORE tokens due to Base64 encoding
  • Cache Storage: Cache size limited to prevent disk usage issues (automatic cleanup after 7 days)
  • Compression Tradeoff: Quality setting affects speed vs ratio (default quality 11 is optimal)
  • Token Counting: Uses GPT-4 tokenizer (approximation for Claude, but close enough for optimization decisions)
  • One-Time Content: No benefit for content that won't be referenced again (caching provides the value)

License

ISC

Author

Built for optimal Claude Code token efficiency.