@clawdcc/cvm-benchmark
v1.0.4
Comprehensive benchmarking and performance analysis tools for Claude Code versions
@clawd/cvm-benchmark
Comprehensive benchmarking and performance analysis tools for Claude Code versions managed by CVM.
Features
- --version Spawn Benchmarks: Measure Claude startup time using the --version flag
- Interactive PTY Benchmarks: Measure full interactive startup time with terminal signals
- Multi-Run Comparison: Compare multiple benchmark runs to verify consistency
- HTML Reports: Beautiful Chart.js-powered performance visualization
- Session Cleanup: Automatic cleanup of test sessions for fair benchmarking
- Viability Detection: Identify minimum viable Claude Code versions
Installation
Method 1: NPM + Symlink (Recommended)
# Install via NPM
npm install -g @clawd/cvm-benchmark
# Link to CVM plugins directory
ln -s $(npm root -g)/@clawd/cvm-benchmark/index.js ~/.cvm/plugins/benchmark.js
# Verify installation
cvm plugins
Method 2: Direct Clone
# Clone into a local directory
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark
npm install
# Link to CVM
ln -s $(pwd)/index.js ~/.cvm/plugins/benchmark.js
# Verify
cvm plugins
Usage
As CVM Plugin (Primary Method)
# Benchmark a specific version (interactive startup test)
cvm benchmark 2.0.42
# Benchmark all installed versions
cvm benchmark --all
# Compare multiple benchmark runs
cvm benchmark --compare 1 2
# Check loaded plugins
cvm plugins
Standalone Usage
# Run comprehensive benchmark suite
node index.js all
# Compare benchmark runs
node index.js compare 1 2
# Run specific benchmarks
node lib/benchmark-version.js
node lib/benchmark-interactive.js 2.0.42
node lib/benchmark-interactive-all.js
node lib/comprehensive-suite.js 3
Benchmark Types
1. --version Spawn Test
- Spawns Claude with the --version flag
- Measures process spawn and execution time
- Fast, simple performance indicator
- Ideal for quick version comparison
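A minimal sketch of how a spawn test like this could be timed, assuming a synchronous spawn and a monotonic clock. The helper name `timeVersionSpawn` is hypothetical and is not part of this package's API; the example times the current Node binary as a stand-in for a Claude Code binary.

```javascript
const { spawnSync } = require('child_process');

// Time `command --version` over several runs and return elapsed milliseconds.
// `timeVersionSpawn` is a hypothetical helper, not the package's own function.
function timeVersionSpawn(command, runs = 3) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    const child = spawnSync(command, ['--version'], { encoding: 'utf8' });
    const end = process.hrtime.bigint();
    if (child.error) throw child.error; // e.g. ENOENT when the binary is missing
    times.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  return times;
}

// Stand-in target: the Node binary running this script.
const samples = timeVersionSpawn(process.execPath, 3);
console.log(samples.map((ms) => ms.toFixed(1) + ' ms').join(', '));
```

Timings from a loop like this include process creation, runtime startup, and exit, so they are best read as a relative indicator between versions rather than an absolute figure.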
2. Interactive PTY Test
- Spawns Claude in pseudo-terminal (PTY)
- Detects ready state via terminal signals:
  - ESC[?2004h - Bracketed paste mode
  - ESC[?1004h - Focus events
  - > - Prompt character
- No timeout-based detection (signal-based only)
- Measures real interactive startup time
- Cleans up session files after each run
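The signal-based detection described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `createReadyDetector` is a hypothetical name, and the detector simply accumulates terminal output and checks for the three markers, so it works on any string chunks (for example those delivered by a PTY data callback).

```javascript
// Escape sequences named in the list above, plus the prompt character.
const SIGNALS = {
  bracketedPaste: '\x1b[?2004h',
  focusEvents: '\x1b[?1004h',
  prompt: '>',
};

// Track which signals have appeared in the output seen so far.
// `createReadyDetector` is a hypothetical name, not the package's API.
function createReadyDetector() {
  let buffer = '';
  return {
    // Feed one chunk of terminal output; returns true once all signals are seen.
    push(chunk) {
      buffer += chunk;
      return this.isReady();
    },
    signals() {
      const seen = {};
      for (const [name, marker] of Object.entries(SIGNALS)) {
        seen[name] = buffer.includes(marker);
      }
      return seen;
    },
    isReady() {
      return Object.values(this.signals()).every(Boolean);
    },
  };
}

const detector = createReadyDetector();
detector.push('\x1b[?2004h');            // bracketed paste only
detector.push('\x1b[?10');               // escape sequence split across chunks
const ready = detector.push('04h\n> ');  // focus events completed, prompt shown
console.log(ready, detector.signals());
```

Buffering the output before matching matters because a PTY may split an escape sequence across two chunks. A bare `>` can also appear in earlier output, so a real detector would likely need a stricter prompt check than this sketch uses.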
Trust Prompt Handling:
- Benchmark runs with cwd: process.cwd() (directory where script runs)
- Older versions (0.2.x, 1.0.x) show "Do you trust the files" security prompt
- Auto-accepts trust prompt by sending Enter key to proceed
- Each version spawns fresh in the benchmark directory
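As a rough illustration of the auto-accept step, the sketch below watches terminal output for the prompt text and sends Enter once. The handler name `createTrustPromptHandler` is hypothetical, the exact prompt wording in real versions may differ from the phrase quoted above, and a stub object stands in for the PTY so the example runs without node-pty.

```javascript
// Text quoted in the list above; the exact wording may differ between versions.
const TRUST_PROMPT = 'Do you trust the files';

// Returns a data handler that sends Enter once when the trust prompt appears.
// `term` is any object with a write(string) method, such as a PTY process.
function createTrustPromptHandler(term) {
  let buffer = '';
  let accepted = false;
  return function onData(chunk) {
    buffer += chunk;
    if (!accepted && buffer.includes(TRUST_PROMPT)) {
      accepted = true;
      term.write('\r'); // carriage return is what a terminal sends for Enter
    }
    return accepted;
  };
}

// Stub terminal that records writes, standing in for a real PTY.
const written = [];
const onData = createTrustPromptHandler({ write: (s) => written.push(s) });
onData('Do you trust ');
onData('the files in this folder?');
onData('Do you trust the files'); // a repeat must not trigger a second Enter
console.log(written);
```

The `accepted` flag keeps the handler from sending Enter more than once if the prompt text is redrawn.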
Version Requirement Detection:
- Detects versions < 1.0.24 that show "needs update" error
- Extracts minimum version requirement (expected: 1.0.24)
- Returns result: 'error_detected' with full error message
- Warns if minimum version changes from 1.0.24
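One way the requirement extraction might look, as a sketch only. The README does not give the real error text, so the sample message is invented, and every field other than `result` (`error`, `minimumVersion`, `minimumChanged`) is an assumption made for illustration.

```javascript
const EXPECTED_MINIMUM = '1.0.24';

// Pull the first x.y.z version out of an error message, or null if none.
function extractMinimumVersion(message) {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(message);
  return match ? match[0] : null;
}

// Build a result object; fields other than `result` are assumed names.
function checkVersionRequirement(message) {
  const minimum = extractMinimumVersion(message);
  return {
    result: 'error_detected',
    error: message,
    minimumVersion: minimum,
    minimumChanged: minimum !== null && minimum !== EXPECTED_MINIMUM,
  };
}

// Hypothetical error text; the real wording is not given in this README.
const sample = 'This version needs update: requires 1.0.24 or higher.';
const check = checkVersionRequirement(sample);
if (check.minimumChanged) {
  console.warn('Minimum version changed to ' + check.minimumVersion);
}
console.log(check.minimumVersion);
```

Taking the first version-like token is a simplification; if the real message also mentions the installed version, the pattern would need to anchor on the surrounding words.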
Output Structure
All benchmark data is stored in ~/.cvm/benchmarks/:
~/.cvm/benchmarks/
├── benchmarks-all-3run.json          # --version benchmarks for all versions
├── benchmark-startup-{version}.json  # Individual interactive benchmarks
├── STARTUP_COMPARISON.html           # Generated performance report
├── run-1/                            # Multi-run comparison data
│   ├── version/
│   │   └── benchmarks-all-3run.json
│   ├── interactive/
│   │   ├── benchmark-startup-0-2-9.json
│   │   ├── benchmark-startup-2-0-42.json
│   │   └── ...
│   └── metadata.json
└── run-2/
    └── ...
Reports
Performance Report
Generated from individual benchmark runs, showing:
- --version spawn times across all versions
- Interactive startup times across all versions
- Version viability markers (1.0.24+)
- Performance trends and outliers
Comparison Report
Overlays multiple benchmark runs to show:
- Measurement consistency across runs
- Performance variance
- Reliability of benchmark data
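Consistency across runs can be quantified in several ways; the sketch below uses the mean, sample standard deviation, and coefficient of variation. The statistics the comparison report actually computes are not stated here, so this is an assumed approach, and the timings are made-up values.

```javascript
// Mean, sample standard deviation, and coefficient of variation for timings in ms.
function summarize(times) {
  const n = times.length;
  const mean = times.reduce((sum, t) => sum + t, 0) / n;
  const variance = n > 1
    ? times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / (n - 1)
    : 0;
  const stddev = Math.sqrt(variance);
  return { mean, stddev, cv: mean === 0 ? 0 : stddev / mean };
}

// Hypothetical timings for one version across three runs.
const stats = summarize([980, 1000, 1020]);
console.log(stats);
```

A low coefficient of variation (here 2%) suggests the runs agree closely; a high one suggests the measurement was disturbed and may be worth repeating.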
Version States
The benchmark tool detects three version states:
- error_detected: Pre-0.2.103 versions that show error before UI
- ui_then_exit: Versions 0.2.103-1.0.23 that show UI with error but immediately close
- ready: Versions 1.0.24+ that are actually interactive
Minimum viable version: 1.0.24
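The version boundaries above can be expressed as a small lookup. This sketch predicts the expected state from the version number alone, which is an assumption made for illustration; the tool itself detects the state by observing each version's behavior. The helper names are hypothetical.

```javascript
// Compare two x.y.z strings numerically; returns -1, 0, or 1.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

// Expected state for a version, using the boundaries listed above.
function expectedState(version) {
  if (compareVersions(version, '0.2.103') < 0) return 'error_detected';
  if (compareVersions(version, '1.0.24') < 0) return 'ui_then_exit';
  return 'ready';
}

const states = ['0.2.9', '0.2.103', '1.0.23', '1.0.24', '2.0.42'].map(expectedState);
console.log(states);
```

Comparing the parts as numbers matters here: a plain string comparison would sort 0.2.9 after 0.2.103.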
Performance Data
Example benchmark results:
{
  "version": "2.0.42",
  "results": [
    {
      "time": 980,
      "result": "ready",
      "reason": "all terminal signals received and process stable",
      "signals": {
        "bracketedPaste": true,
        "focusEvents": true,
        "prompt": true
      }
    }
  ]
}
API
Plugin API
module.exports = {
  name: 'benchmark',
  version: '0.1.0',
  description: 'Benchmark and analyze Claude Code performance',
  commands: [...],
  hooks: {
    afterInstall: (version) => { /* ... */ }
  }
};
Module API
const benchmarkVersion = require('@clawd/cvm-benchmark/lib/benchmark-version');
const benchmarkInteractive = require('@clawd/cvm-benchmark/lib/benchmark-interactive');
const compareRuns = require('@clawd/cvm-benchmark/lib/compare-runs');
const comprehensiveSuite = require('@clawd/cvm-benchmark/lib/comprehensive-suite');

// Run benchmarks (await needs an async context in CommonJS)
async function main() {
  await benchmarkVersion.run({ runs: 3 });
  await benchmarkInteractive.run('2.0.42', 3);
  await comprehensiveSuite.runAll({ runNumber: 3 });
  compareRuns.compare(['1', '2', '3']);
}

main().catch(console.error);
Requirements
- Node.js >= 14.0.0
- CVM installed with at least one Claude Code version
- node-pty for interactive benchmarks
Development
# Clone the repo
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark
# Install dependencies
npm install
# Run tests
npm test
# Run benchmarks
node lib/benchmark-interactive.js 2.0.42
License
MIT
Related Projects
- @clawd/cvm - Claude Version Manager
- Claude Code - Official CLI for Claude
Credits
Built by the CVM team to enable comprehensive performance testing across all Claude Code versions.
Status: Production-ready, actively used for benchmarking 249 Claude Code versions (0.2.x → 2.0.x)
