
cceval

Evaluate and benchmark your CLAUDE.md effectiveness with automated testing.

Why?

Your CLAUDE.md file guides how Claude Code behaves in your project. But how do you know if your instructions are actually working?

cceval lets you:

  • Auto-generate variations of your CLAUDE.md
  • Test them against realistic prompts
  • Find what actually improves Claude's behavior
  • Iterate quickly on your instructions

Installation

# Global install (recommended)
bun add -g cceval

# Or per-project
bun add -D cceval

Requirements:

  • Bun (the install commands above use it)
  • The Claude CLI, which cceval invokes to run tests (and to generate variations when no API key is set)

Optional (for faster generation):

  • ANTHROPIC_API_KEY environment variable: if set, cceval generates variations via direct Anthropic API calls instead of the Claude CLI (faster and more reliable)
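A minimal sketch of that assumed backend choice (illustrative only, not cceval's actual code):

// Assumed behavior per the note above: use the Anthropic API directly
// when a key is present, otherwise fall back to the local Claude CLI.
const backend = process.env.ANTHROPIC_API_KEY ? "anthropic-api" : "claude-cli"
console.log(`Generating variations via ${backend}`)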

Quick Start (Turnkey)

Just run cceval in any project with a CLAUDE.md:

cd your-project
cceval

That's it! cceval will:

  1. Find your CLAUDE.md
  2. Use Claude to generate 5 variations (condensed, explicit, prioritized, minimal, structured)
  3. Test each variation against 5 realistic prompts
  4. Show you which variation performs best

Example Output

🚀 cceval - Turnkey CLAUDE.md Evaluation
============================================================

📄 Source: ./CLAUDE.md
🤖 Model: haiku

🔄 Generating 5 variations...

  ⏳ Generating "condensed"... ✓
  ⏳ Generating "explicit"... ✓
  ⏳ Generating "prioritized"... ✓
  ⏳ Generating "minimal"... ✓
  ⏳ Generating "structured"... ✓

📦 Cached variations to .cceval-variations.json

============================================================
📊 Starting Evaluation
============================================================
Testing 7 variations:
  • baseline
  • original
  • condensed
  • explicit
  • prioritized
  • minimal
  • structured

With 5 test prompts each
Total tests: 35
============================================================

  ✓ baseline/exploreBeforeBuild: $0.0012
  ✓ baseline/bunPreference: $0.0015
  ...

============================================================
📊 CLAUDE.md EVALUATION REPORT
============================================================

🏷️  explicit: 72.0% (18/25)
   ✅ noPermissionSeeking: 5/5
   ✅ readFilesFirst: 5/5
   ⚠️ usedBun: 3/5
   ...

🏆 WINNER: explicit
💰 Total cost: $0.42
============================================================

How It Works

Variation Strategies

cceval generates these variations from your CLAUDE.md:

| Strategy | Description |
|----------|-------------|
| baseline | Minimal "You are a helpful coding assistant" |
| original | Your actual CLAUDE.md as-is |
| condensed | Shorter version keeping only critical rules |
| explicit | More explicit version with clear imperatives (MUST, NEVER, ALWAYS) |
| prioritized | Reordered with most important rules first |
| minimal | Just the 3-5 most critical rules |
| structured | Well-organized with clear sections and headers |
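To make these concrete, a minimal variation of a typical CLAUDE.md might be cut down to just its critical rules (a hypothetical example, not actual generated output):

Read relevant files before writing code.
Use Bun, not Node.
Run tests and show actual output.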

Test Prompts

Default prompts test key behaviors:

| Prompt | What it tests |
|--------|---------------|
| exploreBeforeBuild | Does it read files before coding? |
| bunPreference | Does it use Bun instead of Node? |
| rootCause | Does it fix root cause instead of adding spinners? |
| simplicity | Does it avoid over-engineering? |
| permissionSeeking | Does it execute without asking permission? |

CLI Reference

Basic Usage

# Turnkey: auto-detect, generate, test
cceval

# Same as above, explicit
cceval auto

# Specify a different CLAUDE.md
cceval auto -p ./docs/CLAUDE.md

# Use a smarter model for better variations
cceval auto -m sonnet

# Skip regeneration, use cached variations
cceval auto --skip-generate

# See what strategies are available
cceval auto --strategies-only

Output Options

# Custom output file
cceval -o my-results.json

# Generate markdown report too
cceval --markdown REPORT.md

Advanced: Custom Config

For full control, use a config file:

# Create starter config
cceval init

# Edit cceval.config.ts with your prompts/variations

# Run with config
cceval run
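For reference, here is a minimal cceval.config.ts sketch. The shape is inferred from the config accepted by runEvaluation (see Programmatic Usage below); the authoritative schema is whatever cceval init writes, and the prompt/variation contents here are made up for illustration:

// cceval.config.ts (sketch; shape inferred from runEvaluation's config)
export default {
  model: "haiku",
  prompts: {
    // name -> prompt sent to Claude
    addEndpoint: "Add a /health endpoint to the server.",
  },
  variations: {
    // name -> system-prompt text to test (assumed shape; normally
    // produced by variationsToConfig)
    original: "You are a helpful coding assistant.",
  },
}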

Reports

# Generate report from previous results
cceval report evaluation-results.json

Metrics

cceval measures:

| Metric | What it tests |
|--------|---------------|
| noPermissionSeeking | Does NOT ask "should I...?" or "would you like me to...?" |
| readFilesFirst | Mentions reading/examining files before coding |
| usedBun | Uses Bun APIs (Bun.serve, bun test, etc.) |
| proposedRootCause | For "slow" prompts: fixes root cause instead of adding spinners |
| ranVerification | Mentions running tests or showing output |
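As an illustration of how such a check can work (a sketch, not cceval's actual implementation), noPermissionSeeking can be approximated by pattern-matching Claude's response:

// Sketch only: approximate the noPermissionSeeking metric by checking
// the response for permission-seeking phrases.
const permissionPhrases = [/\bshould i\b/i, /\bwould you like me to\b/i]

function noPermissionSeeking(output: string): boolean {
  return !permissionPhrases.some((re) => re.test(output))
}

noPermissionSeeking("I'll read src/index.ts, then add the route.") // true
noPermissionSeeking("Should I go ahead and edit the file?")        // false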

Cost Estimates

| Model | Cost per test | 35 tests (auto) | 25 tests (manual) |
|-------|---------------|-----------------|-------------------|
| haiku | ~$0.08 | ~$2.80 | ~$2.00 |
| sonnet | ~$0.30 | ~$10.50 | ~$7.50 |
| opus | ~$1.50 | ~$52.50 | ~$37.50 |

We recommend iterating with haiku (it's fast and cheap), then validating findings with sonnet.

Programmatic Usage

import {
  generateVariations,
  variationsToConfig,
  runEvaluation,
  printConsoleReport
} from "cceval"

// Generate variations from a CLAUDE.md
const generated = await generateVariations({
  claudeMdPath: "./CLAUDE.md",
  model: "haiku",
})

// Convert to config
const variations = variationsToConfig(generated)

// Run evaluation
const results = await runEvaluation({
  config: {
    prompts: { test: "Create a hello world server." },
    variations,
    model: "haiku",
  },
})

printConsoleReport(results)

Key Findings from Our Research

Based on evaluating 25+ prompt variations:

1. Gate-Based Instructions Win

Clear pass/fail criteria outperform vague guidance:

You are evaluated on gates. Fail any = FAIL.
1. Read files before coding
2. State plan then proceed immediately (don't ask)
3. Run tests and show actual output

2. "Don't Ask Permission" Backfires

Explicitly saying "never ask permission" increases permission-seeking due to priming. Instead, frame positively:

Execute standard operations immediately.
File edits and test runs are routine.

3. Verification Is the Biggest Win

Adding "Run tests and show actual output" improved verification from 20% to 100%.

4. Keep It Concise

The winning prompt was just 4 lines. Long instructions get ignored.

Workflow

Recommended workflow for optimizing your CLAUDE.md:

  1. Baseline: Run cceval to see how your current CLAUDE.md performs
  2. Analyze: Look at which variation scored best and why
  3. Apply: Update your CLAUDE.md based on the winning strategy
  4. Iterate: Run cceval again to verify improvement

Files Generated

| File | Description |
|------|-------------|
| .cceval-variations.json | Cached generated variations (re-run faster with --skip-generate) |
| evaluation-results.json | Full test results (JSON) |
| REPORT.md | Markdown report (if --markdown specified) |

Add .cceval-variations.json and evaluation-results.json to .gitignore:
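# .gitignore
.cceval-variations.json
evaluation-results.json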

Contributing

PRs welcome! Ideas:

  • More default metrics
  • CI/CD integration examples
  • Alternative model backends
  • Statistical significance testing

License

MIT