@dzhechkov/skills-bto

v1.3.0

Build-Benchmark-Test-Optimize skill pack for Claude Code — deterministic benchmarking, quality gates, witness chain, judge attestation, and optimization

Build-Benchmark-Test-Optimize skill pack for Claude Code

Multi-agent evaluation and iterative optimization pipeline for Claude Code skills, commands, and prompts. Includes deterministic benchmarking with golden sample comparison, test suites, consistency probes, and performance metrics. Part of the Keysarium ecosystem.


Quick Start

# One-command install via npx
npx @dzhechkov/skills-bto

# Or install globally
npm install -g @dzhechkov/skills-bto
skills-bto init

# Install into a project that already has @dzhechkov/keysarium
npx @dzhechkov/skills-bto init

After installation, open Claude Code in your project directory and start using BTO commands.


What You Get

| Component | Count | Description |
|-----------|-------|-------------|
| Skill | 1 | bto — core Build-Benchmark-Test-Optimize skill with 4 modules |
| Commands | 5 | /bto, /bto-build, /bto-benchmark, /bto-test, /bto-optimize |
| Rules | 1 | bto-quality-gates — quality gate enforcement (incl. benchmark gates) |
| Shards | 1 | bto-evaluation — context shard for BTO evaluation pipeline |
| Agent Templates | 2 | bto-judge-panel, bto-optimizer-worker |
| References | 5 | Eval patterns, judge rubrics, optimization methods, quality checklist, golden samples |
| Examples | 2 | Sample evaluation report, sample benchmark report |

Everything is installed into your project's .claude/ directory and works natively with Claude Code.


Commands

npx @dzhechkov/skills-bto                    # Full install (interactive, same as init)
npx @dzhechkov/skills-bto init               # Install all components
npx @dzhechkov/skills-bto init --force       # Overwrite existing files
npx @dzhechkov/skills-bto init --dry-run     # Preview without making changes
npx @dzhechkov/skills-bto update             # Update to latest version
npx @dzhechkov/skills-bto remove             # Clean uninstall
npx @dzhechkov/skills-bto list               # Show installed components
npx @dzhechkov/skills-bto doctor             # Health check

BTO Pipeline

BUILD ──→ BENCHMARK ──→ TEST ──→ OPTIMIZE
  │           │           │         │
  │           │           │         └── Evolutionary mutation + re-evaluation (3 rounds)
  │           │           └── Multi-layer evaluation: Layer 0 → Layer 1 → Layer 2
  │           └── Deterministic benchmarking: golden samples, test suite, consistency, metrics
  └── Generate skill/command from description

Usage in Claude Code

# Full BTO cycle: build → benchmark → test → optimize
/bto Create a skill for code review automation

# Build only — generate a new skill or command
/bto-build Create a skill that analyzes git commit patterns

# Benchmark only — deterministic benchmarking against golden samples
/bto-benchmark .claude/skills/my-skill/SKILL.md

# Test only — evaluate an existing artifact
/bto-test .claude/skills/my-skill/SKILL.md

# Optimize only — iteratively improve an artifact
/bto-optimize .claude/skills/my-skill/SKILL.md

Evaluation Architecture

Benchmark Layers (deterministic, pre-TEST)

| Layer | Cost | Purpose |
|-------|------|---------|
| B0 | Zero (deterministic) | Golden sample comparison — section coverage, ordering, proportions |
| B1 | Zero (deterministic) | Deterministic test suite — 5 tests per artifact type, PASS/FAIL |
| B2 | Minimal (3× haiku) | Consistency probe — 3 parallel agents, agreement measurement |
| B3 | Zero (deterministic) | Performance metrics — token efficiency, bloat detection, redundancy |

Scoring: BENCHMARK = B0×0.30 + B1×0.35 + B2×0.15 + B3×0.20

Gate: < 0.50 BLOCK | 0.50–0.70 WARN | > 0.70 PASS → proceed to TEST
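
As a rough illustration of how the weighted score and the gate fit together, here is a minimal Python sketch. It assumes each layer score is already normalized to [0, 1]; the function names are hypothetical, and because the README does not say which bucket a score of exactly 0.70 falls into, this sketch treats it as WARN.

```python
# Hypothetical sketch: combine the four benchmark layer scores (assumed in [0, 1])
# using the README's weights, then map the total onto the BLOCK / WARN / PASS gate.

WEIGHTS = {"B0": 0.30, "B1": 0.35, "B2": 0.15, "B3": 0.20}


def benchmark_score(layer_scores: dict[str, float]) -> float:
    """Weighted sum: BENCHMARK = B0*0.30 + B1*0.35 + B2*0.15 + B3*0.20."""
    missing = WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * layer_scores[name] for name in WEIGHTS)


def benchmark_gate(score: float) -> str:
    """< 0.50 BLOCK, 0.50 to 0.70 WARN, > 0.70 PASS (0.70 itself treated as WARN here)."""
    if score < 0.50:
        return "BLOCK"
    if score <= 0.70:
        return "WARN"
    return "PASS"


if __name__ == "__main__":
    scores = {"B0": 0.8, "B1": 1.0, "B2": 0.6, "B3": 0.5}
    total = benchmark_score(scores)
    print(f"{total:.2f} {benchmark_gate(total)}")
```

With layer scores of 0.8, 1.0, 0.6 and 0.5 the weighted sum is 0.24 + 0.35 + 0.09 + 0.10 = 0.78, which lands in PASS. Because the weights sum to 1.0, the total stays in [0, 1] whenever the inputs do.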

TEST Layer Model

| Layer | Agents | Model | Purpose |
|-------|--------|-------|---------|
| Layer 0 | 0 | — | Deterministic pre-checks (structure, completeness, encoding) |
| Layer 1 | 1 | haiku | Fast semantic evaluation across 5 dimensions |
| Layer 2 | 3 | sonnet | Full judge panel: Domain Expert + Critic + Completeness Auditor |
| Meta | 1 | opus | Disagreement resolution (triggered when score delta > 3) |
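
The point of the layer ordering is that cheap checks gate expensive ones. Below is a minimal sketch of that cascade, assuming each layer can be modeled as a function that returns pass/fail. `run_cascade`, `LayerResult` and the stand-in `layer0_structure` check are hypothetical and do not reflect the pack's actual checks.

```python
# Hypothetical sketch of cascading TEST layers: cheaper layers run first, and a
# failure stops the cascade before the more expensive judge layers are invoked.
from typing import Callable, NamedTuple


class LayerResult(NamedTuple):
    passed: bool
    detail: str


# A layer is a (name, check) pair; the check receives the artifact text.
Layer = tuple[str, Callable[[str], LayerResult]]


def run_cascade(artifact: str, layers: list[Layer]) -> list[tuple[str, LayerResult]]:
    """Run layers in order and stop at the first one that does not pass."""
    results = []
    for name, check in layers:
        result = check(artifact)
        results.append((name, result))
        if not result.passed:
            break
    return results


def layer0_structure(text: str) -> LayerResult:
    # Stand-in deterministic pre-check: non-empty and has at least one heading line.
    ok = bool(text.strip()) and any(line.startswith("#") for line in text.splitlines())
    return LayerResult(ok, "structure ok" if ok else "no heading found")
```

In practice the later entries in `layers` would wrap the haiku evaluation and the sonnet judge panel; the cascade itself only needs to know whether each one passed.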

Judge Panel

  • 3 independent judges evaluate each artifact in isolation
  • Judges never see each other's scores before submitting
  • Standard weights: Domain Expert (0.4) / Critic (0.3) / Completeness Auditor (0.3)
  • If max_score - min_score > 3 → meta-judge escalation
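
A minimal sketch of the panel arithmetic, assuming judges score on a 0 to 10 scale (the README gives the escalation threshold of 3 but not the scale). The dictionary keys and `aggregate_panel` are illustrative names, not part of the skill pack.

```python
# Hypothetical sketch of judge-panel aggregation: a weighted mean of three
# independent scores plus the "max - min > 3" meta-judge escalation rule.

JUDGE_WEIGHTS = {"domain_expert": 0.4, "critic": 0.3, "completeness_auditor": 0.3}
ESCALATION_SPREAD = 3.0


def aggregate_panel(scores: dict[str, float]) -> tuple[float, bool]:
    """Return (weighted score, whether the meta-judge should be triggered)."""
    if scores.keys() != JUDGE_WEIGHTS.keys():
        raise ValueError("expected exactly one score per judge")
    weighted = sum(JUDGE_WEIGHTS[judge] * scores[judge] for judge in JUDGE_WEIGHTS)
    spread = max(scores.values()) - min(scores.values())
    return weighted, spread > ESCALATION_SPREAD
```

For example, scores of 9, 5 and 8 (Domain Expert, Critic, Completeness Auditor) give a weighted score of 3.6 + 1.5 + 2.4 = 7.5, but the spread of 4 exceeds 3, so the meta-judge would be triggered.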

Quality Gates

  • BENCHMARK must clear the BLOCK threshold (score ≥ 0.50) before TEST begins
  • BENCHMARK score < 0.50 → BLOCK (artifact needs rework)
  • Layer 0 must pass before Layer 1
  • Layer 1 must pass before Layer 2
  • Optimization accepted only if new_score - prev_score > 0.5
  • 3 consecutive iterations with delta ≤ 0.5 → convergence declared
  • Score decrease > 1.0 → automatic rollback to previous best
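
The three optimization gates can be read as a small state machine. The sketch below is one interpretation, assuming "prev_score" means the score of the current best (last accepted) artifact and that a rolled-back iteration also counts toward the convergence streak. `OptimizerState` and `apply_gates` are hypothetical names.

```python
# Hypothetical sketch of the optimization gates. Thresholds come from the README;
# the state layout and function names are illustrative assumptions.
from dataclasses import dataclass

ACCEPT_DELTA = 0.5       # accept only if new_score - prev_score > 0.5
ROLLBACK_DROP = 1.0      # roll back if the score falls by more than 1.0
CONVERGENCE_STREAK = 3   # 3 consecutive iterations with delta <= 0.5


@dataclass
class OptimizerState:
    best_score: float
    stall_count: int = 0
    converged: bool = False


def apply_gates(state: OptimizerState, new_score: float) -> str:
    """Classify one iteration as ACCEPT, REJECT or ROLLBACK and update the state."""
    delta = new_score - state.best_score
    if delta > ACCEPT_DELTA:
        state.best_score = new_score
        state.stall_count = 0
        return "ACCEPT"
    # Any non-accepted iteration extends the streak of small deltas.
    state.stall_count += 1
    if state.stall_count >= CONVERGENCE_STREAK:
        state.converged = True
    if delta < -ROLLBACK_DROP:
        return "ROLLBACK"  # caller restores the artifact that scored best_score
    return "REJECT"
```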

Optimization Process

The optimizer runs up to 3 rounds of evolutionary improvement:

  1. Round 1 — 5 parallel haiku agents generate mutations, fast-rank variants
  2. Round 2 — Top variants evaluated by sonnet judge panel
  3. Round 3 — 3×3 parallel sonnet agents for full Layer 2 evaluation of finalists

Each round selects the best-performing variant and uses it as the base for the next iteration.
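
The round structure is essentially a greedy evolutionary loop. This sketch collapses the haiku and sonnet stages into a single caller-supplied `evaluate` function and a toy `mutate` function, both assumptions, and uses a fixed number of variants per round. It only shows the control flow: generate variants, pick the best, and carry it forward as the next base.

```python
# Hypothetical sketch of the round loop: mutate the current base, score the variants,
# and carry the best one forward. `mutate` and `evaluate` stand in for the agents.
import random
from typing import Callable


def optimize(
    base: str,
    evaluate: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    rounds: int = 3,
    variants_per_round: int = 5,
    seed: int = 0,
) -> tuple[str, float]:
    rng = random.Random(seed)  # seeded so a run is reproducible
    best, best_score = base, evaluate(base)
    for _ in range(rounds):
        variants = [mutate(best, rng) for _ in range(variants_per_round)]
        # Pick the highest-scoring variant of this round.
        top_score, top_variant = max((evaluate(v), v) for v in variants)
        # Keep the previous base if no variant beats it.
        if top_score > best_score:
            best, best_score = top_variant, top_score
    return best, best_score


if __name__ == "__main__":
    # Toy example: a variant appends "a" or "b"; the score counts the "a" characters.
    def toy_mutate(text: str, rng: random.Random) -> str:
        return text + rng.choice("ab")

    result, score = optimize("x", lambda t: float(t.count("a")), toy_mutate)
    print(result, score)
```

This sketch keeps a variant whenever it beats the current base at all; a stricter acceptance threshold would slot into the `if top_score > best_score` check.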


Integration with Keysarium

BTO works standalone but also integrates with @dzhechkov/keysarium when both are installed in the same project:

# Install Keysarium first (optional)
npx @dzhechkov/keysarium init

# Then add BTO — it detects Keysarium automatically
npx @dzhechkov/skills-bto init

When installed alongside Keysarium, BTO can evaluate and optimize any skill or command in the Keysarium toolkit.


Requirements

  • Claude Code CLI — installed and configured (installation guide)
  • Node.js >= 16.0.0 — required for the npm install method

License

MIT


Links