@versatly/skillbench
v2.0.0
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
@versatly/skillbench
Self-improving skill ecosystem for AI agents.
Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.
Part of the ClawVault ecosystem | tasktime | ClawHub
Install
npm i -g @versatly/skillbench
Quick Start
# Track a skill version
skillbench use [email protected]
# Run tests
skillbench test [email protected] --init # Create test suite
skillbench test [email protected] # Run tests
# Check scores
skillbench score
Commands (18 total)
Core
skillbench use [email protected] # Set active skill version
skillbench record "Task" --success # Record benchmark (auto-pulls from tasktime)
skillbench score # Show scoreboard with grades
skillbench compare v1.0.0 v1.1.0 # Compare versions
skillbench skills # List tracked skills
Testing
skillbench test [email protected] # Run smoke test
skillbench test [email protected] --init # Create test suite template
skillbench test [email protected] --suite full # Run named suite
Analysis
skillbench health # Health report with alerts
skillbench improve # Suggestions for weakest skill
skillbench trend tasktime --days 30 # Performance over time
skillbench leaderboard # Multi-agent comparison
Monitoring
skillbench watch --once # Run all test suites
skillbench watch --interval 300 # Continuous monitoring
skillbench baseline skill --set # Set performance baseline
skillbench baseline --check # Check baselines (CI-friendly)
CI/CD
skillbench ci # Run tests + baselines
skillbench ci --json # JSON output for automation
skillbench badge # Generate shields.io badges
skillbench schedule --interval 60 # Generate cron config
Export
skillbench export --format markdown # Export report
skillbench dashboard # Generate HTML dashboard
skillbench sync --clawhub # Import from ClawHub
skillbench sync --vault # Sync to ClawVault
Grading System
| Grade | Score | Meaning |
|-------|-------|---------|
| 🏆 A+ | 95-100 | Elite |
| ✅ A | 85-94 | Excellent |
| 👍 B | 70-84 | Good |
| ⚠️ C | 50-69 | Needs work |
| ❌ D | <50 | Broken |
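The table above can be read as a simple threshold lookup. The sketch below is illustrative only and is not taken from the skillbench source: `gradeFor` and `THRESHOLDS` are hypothetical names, and treating each lower bound as inclusive (so that a fractional score such as 94.5 falls into A) is an assumption, since the table only lists whole-number ranges.

```typescript
type Grade = "A+" | "A" | "B" | "C" | "D";

// Lower bounds copied from the grading table, highest first.
// Assumption: bounds are inclusive, and fractional scores fall into the band below.
const THRESHOLDS: ReadonlyArray<[number, Grade]> = [
  [95, "A+"],
  [85, "A"],
  [70, "B"],
  [50, "C"],
];

// Hypothetical helper: map a 0-100 score to its letter grade.
function gradeFor(score: number): Grade {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  for (const [min, grade] of THRESHOLDS) {
    if (score >= min) return grade;
  }
  return "D"; // anything below 50
}
```

For example, `gradeFor(72)` falls in the 70-84 band and returns `"B"`.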
GitHub Actions
Copy examples/github-action.yml to .github/workflows/skillbench.yml:
name: SkillBench CI
on: [push, pull_request]
jobs:
  skillbench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm i -g @versatly/skillbench
      - run: skillbench ci --json
      - run: skillbench baseline --check
Test Suites
Create YAML test suites in ~/.skillbench/suites/:
# ~/.skillbench/suites/myskill-smoke.yaml
name: My Skill Smoke Test
skill: myskill
version: "1.0.0"
steps:
  - name: "Help available"
    command: "myskill --help"
    expect_exit: 0
    timeout: 5s
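To make the step fields concrete, here is a rough sketch of how a step's expectations might be evaluated once its command has run. It is not skillbench's actual runner: `SuiteStep`, `StepResult`, `parseTimeoutMs`, and `stepPassed` are hypothetical names, the `ms` and `m` units are assumptions (only `5s` appears in the example suite), and treating a step as failed when it exceeds its timeout is also assumed.

```typescript
// Shape of one entry under `steps:` in the suite file above.
interface SuiteStep {
  name: string;
  command: string;
  expect_exit: number;
  timeout: string; // e.g. "5s"
}

// Hypothetical record of what happened when the command ran.
interface StepResult {
  exitCode: number;
  durationMs: number;
}

// Convert a duration string such as "5s" to milliseconds.
// Assumption: only ms, s, and m suffixes are accepted.
function parseTimeoutMs(timeout: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m)$/.exec(timeout.trim());
  if (match === null) {
    throw new Error(`unsupported timeout: ${timeout}`);
  }
  const value = Number(match[1]);
  const unit = match[2];
  const factor = unit === "ms" ? 1 : unit === "s" ? 1000 : 60000;
  return value * factor;
}

// A step passes when it finished within its timeout with the expected exit code.
function stepPassed(step: SuiteStep, result: StepResult): boolean {
  const withinTimeout = result.durationMs <= parseTimeoutMs(step.timeout);
  return withinTimeout && result.exitCode === step.expect_exit;
}
```

With the "Help available" step from the suite, a run that exits with code 0 after 120 ms passes, while a non-zero exit code or a run longer than 5 seconds fails.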
License
MIT
