@clawdcc/cvm-benchmark

v1.0.4

Published

3 months ago

Comprehensive benchmarking and performance analysis tools for Claude Code versions

0High
0Medium
0Low

humanwritten

cvm claude claude-code benchmark performance analysis version-testing

@clawd/cvm-benchmark

Comprehensive benchmarking and performance analysis tools for Claude Code versions managed by CVM.

Features

--version Spawn Benchmarks: Measure Claude startup time using --version flag
Interactive PTY Benchmarks: Measure full interactive startup time with terminal signals
Multi-Run Comparison: Compare multiple benchmark runs to verify consistency
HTML Reports: Beautiful Chart.js-powered performance visualization
Session Cleanup: Automatic cleanup of test sessions for fair benchmarking
Viability Detection: Identify minimum viable Claude Code versions

Installation

Method 1: NPM + Symlink (Recommended)

# Install via NPM
npm install -g @clawd/cvm-benchmark

# Link to CVM plugins directory
ln -s $(npm root -g)/@clawd/cvm-benchmark/index.js ~/.cvm/plugins/benchmark.js

# Verify installation
cvm plugins

Method 2: Direct Clone

# Clone into a local directory
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark
npm install

# Link to CVM
ln -s $(pwd)/index.js ~/.cvm/plugins/benchmark.js

# Verify
cvm plugins

Usage

As CVM Plugin (Primary Method)

# Benchmark a specific version (interactive startup test)
cvm benchmark 2.0.42

# Benchmark all installed versions
cvm benchmark --all

# Compare multiple benchmark runs
cvm benchmark --compare 1 2

# Check loaded plugins
cvm plugins

Standalone Usage

# Run comprehensive benchmark suite
node index.js all

# Compare benchmark runs
node index.js compare 1 2

# Run specific benchmarks
node lib/benchmark-version.js
node lib/benchmark-interactive.js 2.0.42
node lib/benchmark-interactive-all.js
node lib/comprehensive-suite.js 3

Benchmark Types

1. --version Spawn Test

Spawns Claude with --version flag
Measures process spawn and execution time
Fast, simple performance indicator
Ideal for quick version comparison

2. Interactive PTY Test

Spawns Claude in pseudo-terminal (PTY)
Detects ready state via terminal signals:
- ESC[?2004h - Bracketed paste mode
- ESC[?1004h - Focus events
- > - Prompt character
No timeout-based detection (signal-based only)
Measures real interactive startup time
Cleans up session files after each run

Trust Prompt Handling:

Benchmark runs with cwd: process.cwd() (directory where script runs)
Older versions (0.2.x, 1.0.x) show "Do you trust the files" security prompt
Auto-accepts trust prompt by sending Enter key to proceed
Each version spawns fresh in the benchmark directory

Version Requirement Detection:

Detects versions < 1.0.24 that show "needs update" error
Extracts minimum version requirement (expected: 1.0.24)
Returns result: 'error_detected' with full error message
Warns if minimum version changes from 1.0.24

Output Structure

All benchmark data is stored in ~/.cvm/benchmarks/:

~/.cvm/benchmarks/
├── benchmarks-all-3run.json          # --version benchmarks for all versions
├── benchmark-startup-{version}.json  # Individual interactive benchmarks
├── STARTUP_COMPARISON.html           # Generated performance report
├── run-1/                            # Multi-run comparison data
│   ├── version/
│   │   └── benchmarks-all-3run.json
│   ├── interactive/
│   │   ├── benchmark-startup-0-2-9.json
│   │   ├── benchmark-startup-2-0-42.json
│   │   └── ...
│   └── metadata.json
└── run-2/
    └── ...

Reports

Performance Report

Generated from individual benchmark runs, showing:

--version spawn times across all versions
Interactive startup times across all versions
Version viability markers (1.0.24+)
Performance trends and outliers

Comparison Report

Overlays multiple benchmark runs to show:

Measurement consistency across runs
Performance variance
Reliability of benchmark data

Version States

The benchmark tool detects three version states:

error_detected: Pre-0.2.103 versions that show error before UI
ui_then_exit: Versions 0.2.103-1.0.23 that show UI with error but immediately close
ready: Versions 1.0.24+ that are actually interactive

Minimum viable version: 1.0.24

Performance Data

Example benchmark results:

{
  "version": "2.0.42",
  "results": [
    {
      "time": 980,
      "result": "ready",
      "reason": "all terminal signals received and process stable",
      "signals": {
        "bracketedPaste": true,
        "focusEvents": true,
        "prompt": true
      }
    }
  ]
}

API

Plugin API

module.exports = {
  name: 'benchmark',
  version: '0.1.0',
  description: 'Benchmark and analyze Claude Code performance',
  commands: [...],
  hooks: {
    afterInstall: (version) => { /* ... */ }
  }
};

Module API

const benchmarkVersion = require('@clawd/cvm-benchmark/lib/benchmark-version');
const benchmarkInteractive = require('@clawd/cvm-benchmark/lib/benchmark-interactive');
const compareRuns = require('@clawd/cvm-benchmark/lib/compare-runs');
const comprehensiveSuite = require('@clawd/cvm-benchmark/lib/comprehensive-suite');

// Run benchmarks
await benchmarkVersion.run({ runs: 3 });
await benchmarkInteractive.run('2.0.42', 3);
await comprehensiveSuite.runAll({ runNumber: 3 });
compareRuns.compare(['1', '2', '3']);

Requirements

Node.js >= 14.0.0
CVM installed with at least one Claude Code version
node-pty for interactive benchmarks

Development

# Clone the repo
git clone https://github.com/clawd/cvm-benchmark.git
cd cvm-benchmark

# Install dependencies
npm install

# Run tests
npm test

# Run benchmarks
node lib/benchmark-interactive.js 2.0.42

License

MIT

Related Projects

@clawd/cvm - Claude Version Manager
Claude Code - Official CLI for Claude

Credits

Built by the CVM team to enable comprehensive performance testing across all Claude Code versions.

Status: Production-ready, actively used for benchmarking 249 Claude Code versions (0.2.x → 2.0.x)