@versatly/skillbench

v2.0.0

Published

8 days ago

CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements

0High
0Medium
0Low

g9pedro

cli benchmark skills agent productivity versatly clawvault tasktime ai-agent

@versatly/skillbench

Self-improving skill ecosystem for AI agents.

Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.

Part of the ClawVault ecosystem | tasktime | ClawHub

Install

npm i -g @versatly/skillbench

Quick Start

# Track a skill version
skillbench use [email protected]

# Run tests
skillbench test [email protected] --init  # Create test suite
skillbench test [email protected]         # Run tests

# Check scores
skillbench score

Commands (18 total)

Core

skillbench use [email protected]           # Set active skill version
skillbench record "Task" --success    # Record benchmark (auto-pulls from tasktime)
skillbench score                      # Show scoreboard with grades
skillbench compare v1.0.0 v1.1.0      # Compare versions
skillbench skills                     # List tracked skills

Testing

skillbench test [email protected]           # Run smoke test
skillbench test [email protected] --init    # Create test suite template
skillbench test [email protected] --suite full  # Run named suite

Analysis

skillbench health                     # Health report with alerts
skillbench improve                    # Suggestions for weakest skill
skillbench trend tasktime --days 30   # Performance over time
skillbench leaderboard                # Multi-agent comparison

Monitoring

skillbench watch --once               # Run all test suites
skillbench watch --interval 300       # Continuous monitoring
skillbench baseline skill --set       # Set performance baseline
skillbench baseline --check           # Check baselines (CI-friendly)

CI/CD

skillbench ci                         # Run tests + baselines
skillbench ci --json                  # JSON output for automation
skillbench badge                      # Generate shields.io badges
skillbench schedule --interval 60     # Generate cron config

Export

skillbench export --format markdown   # Export report
skillbench dashboard                  # Generate HTML dashboard
skillbench sync --clawhub             # Import from ClawHub
skillbench sync --vault               # Sync to ClawVault

Grading System

| Grade | Score | Meaning | |-------|-------|---------| | 🏆 A+ | 95-100 | Elite | | ✅ A | 85-94 | Excellent | | 👍 B | 70-84 | Good | | ⚠️ C | 50-69 | Needs work | | ❌ D | <50 | Broken |

GitHub Actions

Copy examples/github-action.yml to .github/workflows/skillbench.yml:

name: SkillBench CI
on: [push, pull_request]
jobs:
  skillbench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm i -g @versatly/skillbench
      - run: skillbench ci --json
      - run: skillbench baseline --check

Test Suites

Create YAML test suites in ~/.skillbench/suites/:

# ~/.skillbench/suites/myskill-smoke.yaml
name: My Skill Smoke Test
skill: myskill
version: "1.0.0"
steps:
  - name: "Help available"
    command: "myskill --help"
    expect_exit: 0
    timeout: 5s

ClawVault — Memory system for AI agents
tasktime — Task timer CLI
ClawHub — Skill marketplace

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@versatly/skillbench

Install

Quick Start

Commands (18 total)

Core

Testing

Analysis

Monitoring

CI/CD

Export

Grading System

GitHub Actions

Test Suites

Related

License