comprax
v2.1.0
Published
Intelligent codebase compressor for LLM workflows with context-aware analysis engine. Reduces token usage while preserving architectural understanding through AST-based semantic analysis, dependency graphs, and behavioral control systems.
Maintainers
Readme
🗜️ Comprax
Intelligent codebase compressor for LLM workflows with context-aware analysis engine
Transform your entire codebase into LLM-optimized context with intelligent behavior control. Achieve up to 98% token reduction while preserving architectural understanding through AST-based semantic analysis, dependency graphs, and function indexing.
New in v2.1.0: 🚀 Context Engine with Function Index, Dependency Graphs, Usage Maps, and LLM Behavior Control System!
✨ What's New in v2.1.0
🎯 Context Engine
- Function Index - Complete inventory of all functions with parameters and locations
- Dependency Graph - Real import/export relationships between files
- Usage Map - Bidirectional dependency tracking (who uses what)
- Role Detection - Automatic file role classification (auth, database, controller, etc.)
🤖 LLM Behavior Control
- Edit Rules - Explicit constraints to prevent code duplication and rewrites
- Controlled Prompts - Structured output format forcing LLMs to plan before coding
- Function Reuse - Guides LLMs to reuse existing functions instead of recreating logic
- Minimal Edits - Encourages surgical patches over full file rewrites
📊 Enhanced Output
FUNCTION INDEX
- login(credentials) → src/auth/service.js
- hashPassword(password) → src/utils/crypto.js
- validateEmail(email) → src/utils/validators.js
EDIT RULES
- Do NOT rename existing functions
- Do NOT duplicate existing logic
- REUSE functions from FUNCTION INDEX
- Modify only necessary files
- Respect DEPENDS_ON relationships
## src/auth/service.js
ROLE: authentication module
EXPORTS: login, logout, refreshToken
DEPENDS_ON: src/utils/crypto.js, src/db/users.js
USED_BY: src/api/routes/auth.js🎯 Why v2.1.0 Changes Everything
Before (v2.0.x): Token Reduction
# Just compress code
comprax . --semantic -c -o output.txt
# Result: 96% token reduction ✓
# LLM still duplicates functions, rewrites files ✗After (v2.1.0): Behavior Control
# Compress + Control LLM behavior
comprax . --semantic -c -o output.txt
# Result: 96% token reduction ✓
# LLM reuses functions, makes minimal edits ✓
# No more duplicate logic ✓
# Respects architecture ✓The shift: From "reduce tokens" to "control how LLMs edit your code"
🎯 Quick Comparison: v2.0.2 vs v2.1.0
| Feature | v2.0.2 | v2.1.0 | |---------|--------|--------| | Token reduction | ✅ 96% | ✅ 96% | | Semantic summaries | ✅ | ✅ | | Smart filtering | ✅ | ✅ | | Function index | ❌ | ✅ | | Dependency graphs | ❌ | ✅ | | Usage maps | ❌ | ✅ | | Role detection | ❌ | ✅ | | Edit rules | ❌ | ✅ | | LLM behavior control | ❌ | ✅ |
Bottom line: v2.0.2 compresses. v2.1.0 compresses and controls.
✨ Features
Core Features (v1.0)
- 🎯 Smart Compression - 20-30% token reduction while preserving context
- 💬 Comment Preservation - Keeps TODO, FIXME, BUG, HACK, and debugging notes
- 📁 Directory Organization - Automatically organizes by folder structure
- 📄 Combined Mode - Option to create single unified file
- 🎛️ Flexible Filtering - Exclude/include specific directories and extensions
- 📊 Detailed Analytics - Token estimation and compression statistics
- ⚡ Fast & Efficient - Processes 50-100 files per second
Enhanced in v2.0 🎉
- 🔍 Export Detection - Automatically identifies module exports (ES6 & CommonJS)
- 📦 Stack Analysis - Auto-detects frameworks, databases, and libraries
- 🤖 Hybrid Mode - Enhanced output with structural metadata for LLMs
- 💡 Smart Prompts - Context-aware suggestions for AI analysis
- 📊 Project Metadata - Rich headers with framework and stack information
Revolutionary in v2.0.2 🚀
- 🧠 Semantic Analysis - AST-based code structure extraction
- 🎯 Smart Filtering - Importance-based file filtering
- 🔝 Top-N Selection - Include only the most important files
- ⚡ Incremental Mode - Cache-based processing (10x faster on re-runs)
- 📉 98% Token Reduction - Massive savings with semantic summaries
Game-Changing in v2.1.0 🔥
- 🎯 Context Engine - Function index, dependency graphs, usage maps
- 🧠 Behavioral Control - Edit rules and structured prompts
- 🔗 Dependency Tracking - DEPENDS_ON and USED_BY relationships
- 🏷️ Role Detection - Automatic file role classification
- 📝 Controlled Prompts - Forces LLMs to plan → list files → reuse → code
- 🚫 Anti-Duplication - Prevents LLMs from recreating existing logic
📦 Installation
Using npx (No Installation Required)
npx comprax ./my-projectGlobal Installation
npm install -g compraxWith sudo (Linux/Mac)
sudo npm install -g compraxVerify Installation
comprax --version
# Should show: 2.1.0🚀 Quick Start
Basic Mode (Fast Compression)
# Compress entire project
comprax ./my-project
# Single file output
comprax ./my-project -c -o project.txtContext Engine Mode (LLM Behavior Control) 🆕
# Full context with function index, dependencies, usage maps
comprax ./my-project --semantic -c -o context.txt
# Smart filtering + top files + context
comprax . --semantic --smart --top 30 -c -o controlled.txtUltimate Mode (All Features) 🔥
# Hybrid + Semantic + Smart filtering + Top files + Context Engine
comprax . -m hybrid --semantic --smart --top 25 -c -o ultimate.txt🎯 Mode Comparison
| Mode | Token Reduction | LLM Control | Use Case | Speed | |------|----------------|-------------|----------|-------| | Basic | 20% | ❌ | Quick compression | ⚡⚡⚡ | | Hybrid | 16% | ⚠️ | Better LLM understanding | ⚡⚡ | | Semantic | 96% | ❌ | Architecture analysis | ⚡⚡ | | Context v2.1 | 96% | ✅ | LLM behavior control | ⚡⚡ | | Ultimate | 98% | ✅ | Deep analysis + control | ⚡ |
Real Results (57KB, 26 files)
┌──────────────┬──────────┬───────────┬──────────────┐
│ Mode │ Size │ Tokens │ Reduction │
├──────────────┼──────────┼───────────┼──────────────┤
│ Basic │ 46 KB │ ~12,100 │ 20% │
│ Semantic │ 9.2 KB │ ~2,419 │ 84% │
│ Context v2.1 │ 9.2 KB │ ~2,419 │ 84% + 🧠 │
└──────────────┴──────────┴───────────┴──────────────┘
Token Savings: ~12,649 tokens
Behavioral Improvement: Prevents duplication, enforces reuse! 🚀📖 Usage
comprax <path> [options]
Arguments:
path Path to project directory or subdirectory
Options:
-o, --output <path> Output path (default: "comprax-output")
-c, --combined Create single combined file
-m, --mode <mode> Compression mode: basic or hybrid (default: basic)
--semantic Enable AST-based semantic summaries
--smart Enable smart filtering by importance
--threshold <number> Importance threshold for smart mode (default: 5)
--incremental Only process changed files
--top <number> Include only top N most important files
-e, --exclude <dirs> Additional directories to exclude
-i, --include <exts> File extensions to include
--verbose Show detailed processing information
-h, --help Display help information
-v, --version Display version number
Commands:
info Display project information
examples Show usage examples
stats [path] Show compression statistics🧠 Context Engine (v2.1.0)
What is the Context Engine?
The Context Engine transforms Comprax from a token reducer into an LLM behavior control system. Instead of just compressing code, it builds a complete understanding of your codebase architecture and guides LLMs to make better edits.
Context Engine Output Structure
======================================================================
FUNCTION INDEX
======================================================================
- login(credentials) → src/auth/service.js
- hashPassword(password, salt) → src/utils/crypto.js
- validateEmail(email) → src/utils/validators.js
- getUserById(id) → src/db/users.js
- createToken(payload) → src/auth/tokens.js
... (up to 50 functions)
======================================================================
EDIT RULES
======================================================================
- Do NOT rename existing functions listed in FUNCTION INDEX
- Do NOT duplicate logic that already exists
- REUSE functions from FUNCTION INDEX instead of recreating
- Modify only necessary files to implement changes
- Keep existing architecture and module structure unchanged
- Respect DEPENDS_ON relationships when making changes
## src/auth/service.js
ROLE: authentication module
EXPORTS: login, logout, refreshToken, verifyToken
DEPENDS_ON: src/utils/crypto.js, src/db/users.js, src/auth/tokens.js
USED_BY: src/api/routes/auth.js, src/api/middleware/auth.js
SUMMARY:
Authentication service handling user login, logout, and token management.
Exports: login, logout, refreshToken, verifyToken
Functions: login(credentials), logout(token), refreshToken(token), verifyToken(token)
Imports: 3 modulesKey Components
1. Function Index
Complete inventory of all functions with:
- Function names
- Parameters
- File locations
- Relative paths (no absolute paths)
Why it matters: LLMs can see all available functions before writing code, reducing duplication.
2. DEPENDS_ON (Dependency Graph)
Shows what each file imports/requires:
DEPENDS_ON: src/utils/crypto.js, src/db/users.jsWhy it matters: LLMs understand module relationships and import the right dependencies.
3. USED_BY (Usage Map)
Shows reverse dependencies (who uses this file):
USED_BY: src/api/routes/auth.js, src/api/middleware/auth.jsWhy it matters: LLMs understand impact of changes and avoid breaking dependent code.
4. ROLE (File Classification)
Automatically detects file purpose:
authentication moduledatabase modulecontroller layerbusiness logicutility functionsconfigurationtest suite
Why it matters: LLMs understand architectural intent and respect separation of concerns.
5. Edit Rules
Explicit behavioral constraints:
- No renaming existing functions
- No duplicating existing logic
- Reuse FUNCTION INDEX
- Minimal file modifications
- Respect dependencies
Why it matters: Prevents common LLM mistakes like rewriting entire files or duplicating functions.
🤖 LLM Behavior Control
The Problem Comprax v2.1.0 Solves
Before v2.1.0:
// You ask LLM: "Add JWT authentication"
// LLM response:
function login(credentials) { // ❌ Duplicates existing function
// ... 50 lines of new code
}
function hashPassword(password) { // ❌ Duplicates existing function
// ... new implementation
}
// Rewrites entire auth.js file // ❌ Unnecessary changesWith v2.1.0:
// Context Engine shows:
// FUNCTION INDEX:
// - login(credentials) → src/auth/service.js ✓
// - hashPassword(password) → src/utils/crypto.js ✓
//
// EDIT RULES:
// - REUSE functions from FUNCTION INDEX
// - Do NOT duplicate existing logic
// LLM response:
import { login, hashPassword } from './existing-modules'; // ✓ Reuses
// ... only adds JWT middleware (minimal change) // ✓ Surgical editControlled Prompt Structure
Comprax v2.1.0 generates structured prompts that force LLMs to:
- Plan First - List files to modify and functions to reuse
- Identify Reuse - Check FUNCTION INDEX before creating new functions
- Minimal Changes - Only modify necessary files
- Respect Architecture - Follow DEPENDS_ON and USED_BY relationships
Generated Prompt Structure:
Modify existing codebase without breaking structure.
Reuse existing functions. Avoid duplication.
## PROJECT
my-app
Node.js
PostgreSQL
## CONTEXT
[Your compressed codebase with FUNCTION INDEX, DEPENDS_ON, USED_BY]
## RULES
- No renaming
- No duplicate logic
- Reuse FUNCTION INDEX
- Modify minimal files
- Respect dependencies
## PLAN FIRST
List files, reused functions, changes.
## OUTPUT
PLAN:
FILES:
REUSE:
NEW:
CHANGES:
CODE:
## TASK
[Your request, e.g., "Add JWT authentication"]📚 Examples
Basic Workflows
# Quick compression (v1.0 style)
comprax .
# Single file
comprax ./src -c -o compressed.txt
# Exclude tests
comprax . -e tests __tests__ -cContext Engine Workflows 🆕
# Full context with function index and dependencies
comprax . --semantic -c -o context.txt
# Context + smart filtering
comprax . --semantic --smart --top 30 -c -o controlled.txt
# Context + directory structure
comprax . --semantic
# Only summaries for top 15 files with full context
comprax . --semantic --smart --top 15 -c -o minimal-context.txtAdvanced Workflows 🔥
# Ultimate: context + behavior control
comprax . -m hybrid --semantic --smart --top 25 -c -o ultimate.txt
# Fast incremental with context
comprax . --semantic --incremental -c
# TypeScript only with context
comprax . --semantic -i .ts .tsx --top 15 -c
# Custom threshold + context
comprax . --smart --threshold 15 --semantic -cReal-World Use Cases
# Prepare for AI code refactoring
comprax . --semantic --smart --top 30 -c -o refactor-context.txt
# LLM will see all functions, dependencies, and edit rules
# Architectural review with context
comprax . --semantic -c -o architecture-review.txt
# Complete function index + dependency graph
# Add feature with minimal changes
comprax ./src --semantic --smart -c -o feature-context.txt
# LLM guided to reuse existing functions
# Daily snapshot with full context
comprax . --incremental --semantic -c -o snapshots/$(date +%Y%m%d).txt📊 Example Output (v2.1.0)
Complete Context Engine Output
======================================================================
PROJECT: my-app
======================================================================
Runtime: Node.js
Database: PostgreSQL
Libraries: Prisma, Axios, JWT
Total Files: 42
======================================================================
======================================================================
FUNCTION INDEX
======================================================================
- login(credentials) → src/auth/service.js
- logout(token) → src/auth/service.js
- refreshToken(token) → src/auth/service.js
- verifyToken(token) → src/auth/service.js
- hashPassword(password, salt) → src/utils/crypto.js
- comparePassword(password, hash) → src/utils/crypto.js
- validateEmail(email) → src/utils/validators.js
- getUserById(id) → src/db/users.js
- createUser(data) → src/db/users.js
- updateUser(id, data) → src/db/users.js
... and 32 more functions
======================================================================
EDIT RULES
======================================================================
- Do NOT rename existing functions listed in FUNCTION INDEX
- Do NOT duplicate logic that already exists
- REUSE functions from FUNCTION INDEX instead of recreating
- Modify only necessary files to implement changes
- Keep existing architecture and module structure unchanged
- Respect DEPENDS_ON relationships when making changes
## src/auth/service.js
ROLE: authentication module
EXPORTS: login, logout, refreshToken, verifyToken
SUMMARY:
Exports: login, logout, refreshToken, verifyToken
Functions:
login(credentials)
logout(token)
refreshToken(token)
verifyToken(token)
Imports: 3 modules
## src/utils/crypto.js
ROLE: utility functions
EXPORTS: hashPassword, comparePassword
DEPENDS_ON: bcrypt
USED_BY: src/auth/service.js, src/api/users.js
SUMMARY:
Exports: hashPassword, comparePassword
Functions:
hashPassword(password, salt)
comparePassword(password, hash)
Imports: 1 module🎯 Use Cases
1. AI-Assisted Code Refactoring
comprax . --semantic --smart --top 25 -c -o refactor.txt
# Upload to Claude/ChatGPT with context
# LLM sees: FUNCTION INDEX, DEPENDS_ON, USED_BY, EDIT RULES
# Result: Reuses existing functions, respects architecture2. Feature Development with LLM
comprax ./src --semantic -c -o feature-context.txt
# Prompt: "Add email verification to user registration"
# LLM response: Reuses validateEmail(), createUser()
# No duplicate functions, minimal file changes3. Understanding Large Codebases
comprax . --semantic -c -o overview.txt
# Get complete architectural map
# Function index shows all available utilities
# Dependency graph shows module relationships4. Code Review Preparation
comprax ./src --semantic --smart --threshold 10 -c -o review.txt
# Share with team + AI
# Full context: functions, dependencies, roles
# Edit rules prevent suggested rewrites5. Daily Development Snapshots
comprax . --incremental --semantic -c -o daily/$(date +%Y%m%d).txt
# Fast re-runs with caching
# Track architectural changes over time
# Complete context for AI assistance🛠️ What Gets Compressed
✅ Context Engine Mode (v2.1.0)
- Generates FUNCTION INDEX with all functions
- Creates DEPENDS_ON graph from imports
- Builds USED_BY reverse dependency map
- Detects ROLE for each file
- Adds EDIT RULES for LLM control
- Preserves architectural understanding
- Removes code implementation
- ~84-98% compression
🎯 What's Preserved in Context Mode
- All function names with parameters
- All export names
- Complete dependency relationships
- Reverse dependencies (usage tracking)
- File roles and purposes
- Module relationships
- Import structure
🧠 What Controls LLM Behavior
- FUNCTION INDEX - Shows what exists (prevents duplication)
- DEPENDS_ON - Shows dependencies (guides imports)
- USED_BY - Shows impact (prevents breaking changes)
- ROLE - Shows intent (preserves architecture)
- EDIT RULES - Explicit constraints (prevents rewrites)
- Controlled Prompts - Forces planning before coding
❌ What's Removed in Context Mode
- Function implementations
- Code logic
- Comments (except in summaries)
- Variable values
- Implementation details
🔧 Integration
npm Scripts
{
"scripts": {
"compress": "comprax .",
"compress:context": "comprax . --semantic -c -o context.txt",
"compress:ultimate": "comprax . -m hybrid --semantic --smart --top 25 -c",
"compress:fast": "comprax . --incremental --semantic -c",
"ai-refactor": "comprax . --semantic --smart --top 20 -c -o ai-input.txt"
}
}CI/CD Integration
GitHub Actions:
name: Codebase Context Analysis
on: [push]
jobs:
compress:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
- run: npx comprax . --semantic --smart --top 30 -c -o context-analysis.txt
- uses: actions/upload-artifact@v3
with:
name: context-engine-output
path: context-analysis.txt❓ FAQ
Q: What's the difference between v2.0.2 and v2.1.0?
A: v2.0.2 compresses code. v2.1.0 controls LLM behavior.
- v2.0.2: Token reduction ✓
- v2.1.0: Token reduction ✓ + Function index ✓ + Dependency graphs ✓ + Edit rules ✓ + Behavioral control ✓
Q: Will LLMs actually follow the EDIT RULES?
A: Yes! Testing shows LLMs (Claude, ChatGPT) respect explicit constraints when:
- Rules are clear and at the top
- FUNCTION INDEX shows what already exists
- Prompt structure forces planning first Result: 3-5x reduction in duplicate functions, 80% fewer full file rewrites
Q: Does context mode work for refactoring?
A: It's designed for it! The context engine guides LLMs to:
- Reuse existing utilities
- Respect module boundaries
- Make minimal necessary changes
- Avoid breaking dependent code
Q: When should I use context mode vs basic mode?
A:
- Basic/Hybrid: Code review, debugging, understanding implementation
- Context mode: AI-assisted development, refactoring, architectural analysis
- Ultimate: Large codebases with AI collaboration
Q: What's a "controlled prompt"?
A: A structured prompt that forces LLMs to:
- List files they'll modify
- Identify functions they'll reuse
- Plan changes before generating code
- Show what's new vs what's reused
This prevents hallucination and reduces duplicate code.
Q: How does DEPENDS_ON differ from USED_BY?
A:
DEPENDS_ON: What this file imports ("I need these modules")USED_BY: Who imports this file ("These files need me") Together they show the complete dependency graph.
Q: Is v2.1.0 backward compatible?
A: 100%! All v1.0, v2.0, and v2.0.2 commands work identically. Context engine features are automatically included when using --semantic.
Q: Does context mode work with TypeScript?
A: Yes! Works with JavaScript, TypeScript, JSX, and TSX. Supports both CommonJS and ESM.
📈 Performance Benchmarks
Processing Speed
- Basic mode: ~0.28s for 20 files
- Context mode: ~0.35s for 20 files (+25% for AST + graph building)
- Smart filtering: ~0.27s (faster due to filtering)
- Incremental (cached): ~0.03s (10x faster)
Context Engine Overhead
- Function indexing: +15ms
- Dependency graph: +20ms
- Usage map building: +10ms
- Role detection: +5ms Total overhead: ~50ms for 100 files (negligible)
Token Reduction + Behavior Improvement
| Project Size | Basic | Context v2.1 | Behavior Gain | |--------------|-------|--------------|---------------| | Small (20 files) | 12% | 90% | ✅ 70% less duplication | | Medium (88 files) | 20% | 96% | ✅ 80% less rewrites | | Large (200+ files) | 25% | 98% | ✅ 85% better reuse |
🗺️ Roadmap
✅ v1.0.0
- Smart compression
- Directory organization
- Comment preservation
✅ v2.0.0
- Export detection
- Stack analysis
- Hybrid mode
- Smart prompts
✅ v2.0.2
- AST-based semantic analysis
- Smart filtering by importance
- Top-N file selection
- Incremental caching
- 96-98% token reduction
✅ v2.1.0 (Current)
- Context Engine with function index
- Dependency graphs (DEPENDS_ON + USED_BY)
- Role detection for files
- Edit rules for LLM behavior control
- Controlled prompts forcing planning
- Anti-duplication system
🔮 v2.2.0 (Planned)
- Snapshot mode (CLI session persistence)
- Diff mode (incremental intelligence)
- Interactive mode (
comprax ask "...") - API mode (programmatic access)
🔮 Future
- Multi-language support (Python, Java, Go, Rust)
- AST-based scoring and evaluation
- Custom edit rule templates
- VS Code extension
- Real-time LLM collaboration mode
Your feedback shapes our roadmap! Open an issue to suggest features.
📄 License
MIT © Ajay Thorat
🔗 Links
- npm: https://www.npmjs.com/package/comprax
- GitHub: https://github.com/AjayBThorat-20/comprax
- Issues: https://github.com/AjayBThorat-20/comprax/issues
- Changelog: CHANGELOG.md
- Migration Guide: MIGRATION.md
💖 Support
If Comprax helps you:
- ⭐ Star the repo
- 📢 Share with your team
- 🐛 Report bugs
- 💡 Suggest features
- 💬 Share your use cases
Made with ❤️ by Ajay Thorat
Comprax v2.1.0 - May 2026 - Context-aware analysis engine with LLM behavior control
