@skillmark/cli
v0.5.0
Published
CLI tool for benchmarking Claude agent skills
@skillmark/cli
CLI tool for benchmarking Claude AI agent skills with standardized test suites and public leaderboards.
Website: skillmark.sh
Installation
npm install -g @skillmark/cli
# or
npx @skillmark/cli
Usage
Run Benchmark
# Local skill
skillmark run ~/.claude/skills/my-skill
# Git repository
skillmark run https://github.com/user/skill-repo
# skill.sh reference
skillmark run skill.sh/user/skill-name
# With options
skillmark run ./my-skill \
  --tests ./tests \
  --model opus \
  --runs 5 \
  --output ./results
Publish Results
skillmark publish ./skillmark-results/result.json --api-key <your-key>
View Leaderboard
skillmark leaderboard
skillmark leaderboard my-skill-name
Test Definition Format
Create markdown files with YAML frontmatter in a tests/ directory:
---
name: multi-agent-reasoning
type: knowledge
concepts:
- orchestrator
- consensus
- context isolation
timeout: 120
---
# Prompt
How do you design multi-agent systems with context isolation?
# Expected
The response should cover:
- [ ] Orchestrator pattern for coordination
- [ ] Consensus mechanisms for decisions
- [ ] Context isolation strategies
Metrics
| Metric | Description |
|--------|-------------|
| accuracy | Percentage of expected concepts matched |
| tokens_total | Total tokens consumed |
| duration_ms | Wall-clock time |
| tool_count | Number of tool calls |
| cost_usd | Estimated API cost |
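To make the link between the test definition's concepts list and the accuracy metric concrete, here is a minimal Python sketch. It is not skillmark's implementation: this README does not document how concepts are matched, so the case-insensitive substring match is an assumption, and the names parse_concepts and accuracy are hypothetical. The frontmatter parser is hand-rolled and only handles the flat shape shown in the sample test file.

```python
# Illustrative sketch only; not taken from the skillmark source.

SAMPLE_TEST = """\
---
name: multi-agent-reasoning
type: knowledge
concepts:
- orchestrator
- consensus
- context isolation
timeout: 120
---
# Prompt
How do you design multi-agent systems with context isolation?
# Expected
The response should cover:
- [ ] Orchestrator pattern for coordination
- [ ] Consensus mechanisms for decisions
- [ ] Context isolation strategies
"""


def parse_concepts(test_markdown: str) -> list[str]:
    """Extract the `concepts` list from the YAML frontmatter.

    Assumption: the frontmatter is flat, as in the sample above.
    A real parser would use a YAML library instead.
    """
    parts = test_markdown.split("---", 2)
    if len(parts) < 3:
        raise ValueError("no YAML frontmatter found")
    concepts = []
    in_concepts = False
    for line in parts[1].splitlines():
        stripped = line.strip()
        if stripped == "concepts:":
            in_concepts = True
        elif in_concepts and stripped.startswith("- "):
            concepts.append(stripped[2:].strip())
        elif stripped:
            # Any other non-empty line ends the concepts list.
            in_concepts = False
    return concepts


def accuracy(response: str, concepts: list[str]) -> float:
    """Percentage of concepts found in the response.

    Assumption: a concept counts as matched when it appears as a
    case-insensitive substring of the response.
    """
    if not concepts:
        return 0.0
    text = response.lower()
    matched = sum(1 for concept in concepts if concept.lower() in text)
    return 100.0 * matched / len(concepts)


if __name__ == "__main__":
    concepts = parse_concepts(SAMPLE_TEST)
    reply = "Use an orchestrator and keep context isolation between agents."
    print(concepts)
    print(round(accuracy(reply, concepts), 2))
```

With the sample reply, two of the three concepts match (orchestrator and context isolation), so the score is two thirds of 100. A stricter or more semantic matcher would give different numbers for the same response.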
Output
skillmark-results/
├── result.json # Machine-readable metrics
└── report.md # Human-readable report
Documentation
Full docs at github.com/claudekit/skillmark
License
MIT
