ancoder-skill-cli
v0.13.1
CLI for managing everything-claude-code (ECC) components — agents, skills, commands, rules, hooks, MCP configs. Single binary, all assets embedded.
skill-cli
CLI for managing and testing Anthropic Agent Skills (e.g. anthropics/skills).
Install (npm)
# Global
npm install -g ancoder-skill-cli
# Or run without installing
npx ancoder-skill-cli --help
The npm package is self-contained and includes prebuilt binaries for:
- macOS arm64
- macOS x64
- Linux arm64
- Linux x64
- Windows x64
After install, the wrapper selects the correct bundled binary for the current platform automatically.
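The selection step can be pictured with a small sketch. This is not the actual wrapper (which is bin/skill-cli.js and is not shown here); it assumes the bundled binaries use the same names as the GitHub Release assets listed under "Publish to npm", and `binary_name` is a hypothetical helper.

```python
import platform

# Assumption: bundled binaries follow the release asset naming in this README
# (skill-cli-<os>-<arch>, with .exe on Windows). The real wrapper may differ.
_OS = {"Darwin": "darwin", "Linux": "linux", "Windows": "win32"}
_ARCH = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "x64", "AMD64": "x64"}
_SUPPORTED = {
    ("darwin", "arm64"), ("darwin", "x64"),
    ("linux", "arm64"), ("linux", "x64"),
    ("win32", "x64"),
}


def binary_name(system: str, machine: str) -> str:
    """Map a (system, machine) pair to a binary name, or raise if unsupported."""
    target = (_OS.get(system), _ARCH.get(machine))
    if target not in _SUPPORTED:
        raise RuntimeError(f"unsupported platform: {system}/{machine}")
    os_name, arch = target
    suffix = ".exe" if os_name == "win32" else ""
    return f"skill-cli-{os_name}-{arch}{suffix}"


def current_binary_name() -> str:
    """Resolve the binary name for the machine this script runs on."""
    return binary_name(platform.system(), platform.machine())
```

Failing loudly on an unlisted platform (for example Windows arm64) avoids launching a binary built for a different architecture.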
Build from source (Go)
cd skill-cli
go build -o bin/skill-cli .
# Then run: ./bin/skill-cli --help
# Or via npm: node bin/skill-cli.js --help
Commands
| Command | Description |
|--------|-------------|
| skill-cli validate <path> | Validate SKILL.md, skill.contract.yaml, and evals/*.yaml |
| skill-cli list [--path <dir>] | List installed skills |
| skill-cli create <name> [--path <dir>] | Create a skill scaffold with contract and smoke eval templates |
| skill-cli test <path> | Check that a skill has trigger docs, contract, and eval coverage |
| skill-cli verify <path> [--suite smoke] | Run a machine-readable verification suite end-to-end |
| skill-cli generate <name> --desc "..." | Generate a complete skill using Claude CLI with OMC autopilot (default) |
| skill-cli generate <name> --desc "..." --adversarial | Generate with OMC, then run isolated generator/evaluator contract negotiation and review |
| skill-cli install [--no-omc] | Install ECC components into ~/.claude/ (includes OMC by default) |
| skill-cli install --component omc | Install only the bundled OMC multi-agent orchestration layer |
Machine-Readable Skill Layout
Task-oriented skills can now include a deterministic verification harness:
my-skill/
├── SKILL.md
├── skill.contract.yaml
├── evals/
│ └── smoke.yaml
├── fixtures/
└── scripts/
- skill.contract.yaml defines the executable contract: entrypoint, inputs, outputs, invariants, and datasets.
- evals/*.yaml defines runnable verification suites with deterministic checks like file existence, required content, and JSON assertions.
- skill-cli verify materializes fixture data into a temp workspace, runs the skill entrypoint, and enforces the declared checks.
skill-cli verify executes local code declared by the skill contract, so only run it against trusted skills and repositories.
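To make the three deterministic check families concrete, here is a minimal sketch of how such checks could be evaluated against a workspace. The check type names (`file_exists`, `contains`, `json_equals`) and their fields are hypothetical and are not the actual evals/*.yaml schema.

```python
import json
import tempfile
from pathlib import Path


def run_check(workspace: Path, check: dict) -> bool:
    """Evaluate one deterministic check against files in the workspace.

    The check types and field names here are illustrative assumptions.
    """
    target = workspace / check["path"]
    kind = check["type"]
    if kind == "file_exists":
        return target.is_file()
    if kind == "contains":
        return target.is_file() and check["text"] in target.read_text(encoding="utf-8")
    if kind == "json_equals":
        if not target.is_file():
            return False
        data = json.loads(target.read_text(encoding="utf-8"))
        return isinstance(data, dict) and data.get(check["key"]) == check["value"]
    raise ValueError(f"unknown check type: {kind}")


def demo() -> list:
    """Write a fake skill output into a temp workspace and run four checks."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "out.json").write_text(json.dumps({"status": "ok"}), encoding="utf-8")
        checks = [
            {"type": "file_exists", "path": "out.json"},
            {"type": "contains", "path": "out.json", "text": "status"},
            {"type": "json_equals", "path": "out.json", "key": "status", "value": "ok"},
            {"type": "file_exists", "path": "missing.md"},
        ]
        return [run_check(ws, c) for c in checks]
```

Each check reads only files on disk, so the same workspace always gives the same verdict.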
Adversarial Skill Generation
skill-cli generate --adversarial adds an independent evaluator pass after the default OMC generation pipeline:
skill-cli generate pdf-to-md --desc "Convert PDF files to Markdown" --adversarial
This mode uses two isolated Claude CLI contexts:
- A generator-side claude -p process proposes concrete acceptance criteria for the generated skill.
- An evaluator-side claude -p process negotiates the contract, runs skill-cli validate, skill-cli test, and skill-cli verify, and writes .adversarial/diff-report.json.
The evaluator fails the run when critical issues are found or the score is below 0.80. Deterministic gate failures are always converted into critical diff items and cap the score below 0.60.
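The pass/fail rule above can be sketched as follows. The 0.80 threshold and the "below 0.60" cap come from this README; the exact cap value (0.59), the function name, and the diff item fields are assumptions for illustration.

```python
PASS_THRESHOLD = 0.80    # from the README: runs scoring below this fail
GATE_FAILURE_CAP = 0.59  # assumption: any value strictly below 0.60 would satisfy the rule


def evaluate_run(score: float, diffs: list, gate_failures: list) -> dict:
    """Combine evaluator diffs and deterministic gate failures into a verdict."""
    items = list(diffs)
    # Every deterministic gate failure becomes a critical diff item.
    for gate in gate_failures:
        items.append({"type": "gate_failure", "location": gate, "severity": "critical"})
    # A gate failure also caps the score below 0.60.
    if gate_failures:
        score = min(score, GATE_FAILURE_CAP)
    has_critical = any(item["severity"] == "critical" for item in items)
    passed = (not has_critical) and score >= PASS_THRESHOLD
    return {"passed": passed, "score": score, "diffs": items}
```

For example, `evaluate_run(0.90, [], ["validate"])` fails even though the evaluator's own score was above the threshold.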
Publish to npm
Set repository.url in package.json to your GitHub repo (e.g. git+https://github.com/your-org/skill-cli.git).
Build binaries per platform before publishing the npm package:
bash scripts/build-all.sh
Optionally attach the same binaries to a GitHub Release with names:
- skill-cli-darwin-arm64, skill-cli-darwin-x64
- skill-cli-linux-x64, skill-cli-linux-arm64
- skill-cli-win32-x64.exe
Publish the package:
npm login --registry=https://registry.npmjs.org/
npm publish --access public --registry=https://registry.npmjs.org/ --userconfig ~/.npmrc
Users who npm install -g ancoder-skill-cli get a fully bundled package. No extra binary download is required during install.
Test-Driven Skill Development (100:10:1 Architecture)
skill-cli adopts a test-driven approach to skill development, inspired by oh-my-claudecode's multi-agent orchestration patterns. The core principle: invest the majority of compute in building robust test skills, not the skill itself.
Time Allocation: 100:10:1
When creating a skill for a task, the system simultaneously creates a main skill and a test skill:
| Phase | Time Share | Purpose |
|-------|-----------|---------|
| Test skill development | 90% (100 units) | Build an automated evaluator that compares expected vs actual output, locating specific differences |
| Main skill development | 9% (10 units) | Implement the actual skill, guided by test skill feedback |
| Execution & verification | 1% (1 unit) | Final end-to-end smoke test |
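The percentages in the table are the 100:10:1 ratio rounded to whole numbers. A quick calculation makes the exact shares visible (the phase keys are just labels chosen for this sketch):

```python
def shares(ratio: dict) -> dict:
    """Convert unit counts into fractions of the total budget."""
    total = sum(ratio.values())
    return {phase: units / total for phase, units in ratio.items()}


RATIO = {"test_skill": 100, "main_skill": 10, "verification": 1}

for phase, share in shares(RATIO).items():
    print(f"{phase}: {share:.1%}")
# test_skill: 90.1%
# main_skill: 9.0%
# verification: 0.9%
```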
Architecture
Phase 1: Test Skill Development (90% compute)
  generate structured acceptance criteria
  -> N planners generate test strategies in parallel
  -> critic reviews + eliminates weak strategies
  -> N executors implement test skills in parallel
  -> golden test evaluation (tournament selection)
  -> repeat until precision threshold met
  -> best test skill selected
Phase 2: Main Skill Development (9% compute)
  generate main skill
  -> test skill verifies (independent executor)
  -> structured diff feedback injected into next prompt
  -> repeat until test skill passes
  -> main skill complete
Phase 3: Final Verification (1% compute)
  end-to-end smoke test
Key Design Principles
1. Separation of Author and Reviewer
The agent that generates the main skill and the agent that runs the test skill operate in separate contexts. This prevents self-approval bias. The verify phase spawns an independent executor to run the test skill, ensuring honest evaluation (borrowed from OMC's verifier lane pattern).
2. Structured Diff Feedback
Test skills output structured diff reports instead of simple pass/fail:
diffs:
  - location: "page 3, paragraph 2"
    type: "content_loss"
    severity: "critical"
    expected: "table with 3 columns and 5 rows"
    actual: "table missing entirely"
  - location: "page 5, heading"
    type: "format_drift"
    severity: "warning"
    expected: "## Second-level heading"
    actual: "### Third-level heading"
This structured feedback is injected back into the main skill's improvement loop, enabling targeted fixes rather than blind retries.
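One way to picture the injection step is a function that turns a diff report into text for the next generation prompt. The sketch below assumes the diffs have already been parsed into dictionaries with the fields shown in the example report; the function name and the wording of the feedback are hypothetical.

```python
SEVERITY_ORDER = {"critical": 0, "warning": 1}


def format_feedback(diffs: list) -> str:
    """Render diff items as prompt text, most severe first."""
    # sorted() is stable, so items of equal severity keep their report order.
    ordered = sorted(diffs, key=lambda d: SEVERITY_ORDER.get(d["severity"], 2))
    lines = ["Fix the following differences found by the test skill:"]
    for d in ordered:
        lines.append(
            f"- [{d['severity']}] {d['type']} at {d['location']}: "
            f"expected {d['expected']!r}, got {d['actual']!r}"
        )
    return "\n".join(lines)
```

Putting critical items first means the most important fixes survive if the prompt is later truncated.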
3. QA Cycling with Early Exit
Borrowed from OMC's UltraQA pattern:
- Test skill finds issues -> structured diagnosis -> main skill fixes -> retest -> loop
- Same error appearing 3 times triggers early exit (avoids infinite compute burn)
- Maximum 5 QA cycles per iteration
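The loop and its two stop conditions can be sketched like this. The limits (5 cycles, 3 repeats) come from the list above; `run_test` and `apply_fix` are hypothetical callbacks, and identifying "the same error" by a string fingerprint is an assumption.

```python
from collections import Counter
from typing import Callable, Optional

MAX_CYCLES = 5   # maximum QA cycles per iteration
MAX_REPEATS = 3  # the same error seen this many times triggers early exit


def qa_loop(run_test: Callable[[], Optional[str]],
            apply_fix: Callable[[str], None]) -> str:
    """Run test -> fix -> retest until pass, repeated error, or cycle limit.

    run_test returns None on success, otherwise an error fingerprint.
    """
    seen = Counter()
    for _ in range(MAX_CYCLES):
        error = run_test()
        if error is None:
            return "passed"
        seen[error] += 1
        if seen[error] >= MAX_REPEATS:
            return "early_exit"  # fixes are not converging on this error
        apply_fix(error)
    return "max_cycles"
```

Returning a distinct status for each outcome lets the caller decide whether to escalate, retry with a different strategy, or stop.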
4. Tournament Selection for Test Skills
During the 90% test skill development phase, multiple test strategies are generated in parallel and evaluated against golden tests (known-correct input/output pairs). The strategy with the highest detection precision wins, similar to OMC's self-improve tournament selection.
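A minimal sketch of the selection step, under stated assumptions: each golden case carries an output and a label saying whether it truly contains a defect, each strategy is a function that flags an output as defective or not, and precision is the share of flagged cases that were real defects. None of these names or data shapes come from skill-cli itself.

```python
def precision(flags: list, labels: list) -> float:
    """True positives divided by everything the strategy flagged."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    return tp / (tp + fp) if (tp + fp) else 0.0


def select_best(strategies: dict, cases: list):
    """Score every strategy on the golden cases and return the winner."""
    labels = [c["has_defect"] for c in cases]
    scores = {
        name: precision([detect(c["output"]) for c in cases], labels)
        for name, detect in strategies.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical golden cases and two toy strategies.
CASES = [
    {"output": "table missing", "has_defect": True},
    {"output": "ok", "has_defect": False},
    {"output": "heading missing", "has_defect": True},
]
STRATEGIES = {
    "flag_all": lambda out: True,
    "keyword": lambda out: "missing" in out,
}
```

Here `select_best(STRATEGIES, CASES)` picks `keyword`, because `flag_all` also flags the clean case and loses precision.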
5. PRD-Driven Acceptance Criteria
Test skills define concrete, testable acceptance criteria (not vague "implementation is complete"):
Bad: "PDF conversion works correctly"
Good: "All tables with merged cells are preserved as HTML <table> blocks
with correct colspan/rowspan attributes"
Example: PDF-to-Markdown Skill
For a PDF-to-Markdown conversion skill:
- Test skill (100 min): Compares original PDF content with generated Markdown, detecting content loss (missing paragraphs, tables, images), format drift (heading levels, list styles), and encoding issues. Outputs structured diffs with page/paragraph-level location info.
- Main skill (10 min): Implements PDF parsing and Markdown generation, iteratively improved by test skill feedback.
- Verification (1 min): End-to-end smoke test on fixture PDFs.
skill_eval Check Type
The verify system supports a skill_eval check type that invokes a test skill as a verification oracle:
checks:
  - id: quality-check
    type: skill_eval
    skill: pdf-to-md-test
    config:
      threshold: 0.95
      output_format: structured_diff
Verify Phase: Independent Executor
During the loop's verify phase, a separate Claude executor is spawned to run the test skill. This executor:
- Has no shared context with the main skill's executor
- Produces an objective evaluation report
- Returns structured diff feedback that feeds into the next iteration
This mirrors OMC's principle: "Keep authoring and review as separate passes."
oh-my-claudecode (OMC) Integration
skill-cli embeds the full oh-my-claudecode multi-agent orchestration bundle (synced from GitHub release v4.13.6) and installs it into ~/.claude/omc/ by default. This gives any skill-cli user a single-command path to OMC's agents, skills, hooks, and runtime scripts without needing to clone the OMC repo or configure the plugin marketplace separately.
What gets installed
When you run skill-cli install, OMC is installed alongside ECC components:
| OMC asset | Install target |
|-----------|---------------|
| 19 agents (analyst, architect, executor, planner, critic, verifier, …) | ~/.claude/omc/agents/ |
| 38 skills (autopilot, ralph, ralplan, deep-interview, team, ultrawork, ultraqa, self-improve, …) | ~/.claude/omc/skills/ |
| Runtime scripts (hook helpers, session lifecycle, skill injector, …) | ~/.claude/omc/scripts/ (executable bit preserved for .sh/.mjs/.cjs/.js/.ts) |
| hooks.json | Merged into ~/.claude/settings.json with $CLAUDE_PLUGIN_ROOT rewritten to the absolute OMC install path |
| Templates | ~/.claude/omc/templates/ |
| .claude-plugin/ manifest, LICENSE, CHANGELOG, VERSION | ~/.claude/omc/ |
Flags
# Default — installs ECC + OMC
skill-cli install
# Skip OMC entirely (opt-out)
skill-cli install --no-omc
# Install only the OMC bundle
skill-cli install --component omc
# Preview without writing files
skill-cli install --dry-run
Browse embedded OMC content
skill-cli list --type omc # list embedded OMC agents and skills
skill-cli info autopilot # show the autopilot skill content
skill-cli doctor # verify OMC install health and version
Why the hook rewrite matters
OMC hooks are authored for the Claude Code plugin system and reference scripts via $CLAUDE_PLUGIN_ROOT/scripts/.... Because skill-cli installs OMC as a plain directory (not as a marketplace plugin), the installer rewrites $CLAUDE_PLUGIN_ROOT → ${claudeDir}/omc at merge time so hooks resolve correctly without the plugin loader.
If you already have OMC installed via the Claude Code plugin marketplace, skill-cli install places a separate self-contained copy under ~/.claude/omc/ and does not touch the marketplace install. The two copies can coexist; hooks from both sources fire in sequence.
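The rewrite itself amounts to a string substitution applied to every value in the hooks document before it is merged. The sketch below illustrates the idea in Python; the actual installer is part of the Go binary, and the sample hook entry, script name, and install path are made up for the example.

```python
import json

PLACEHOLDER = "$CLAUDE_PLUGIN_ROOT"


def rewrite_plugin_root(node, omc_dir: str):
    """Return a copy of a parsed hooks document with the placeholder replaced."""
    if isinstance(node, str):
        return node.replace(PLACEHOLDER, omc_dir)
    if isinstance(node, list):
        return [rewrite_plugin_root(item, omc_dir) for item in node]
    if isinstance(node, dict):
        return {key: rewrite_plugin_root(value, omc_dir) for key, value in node.items()}
    return node  # numbers, booleans, and null pass through unchanged


# Hypothetical hook entry; example.sh is not a real OMC script name.
hooks = {
    "hooks": {
        "SessionStart": [
            {"hooks": [{"type": "command",
                        "command": "$CLAUDE_PLUGIN_ROOT/scripts/example.sh"}]}
        ]
    }
}
rewritten = rewrite_plugin_root(hooks, "/home/user/.claude/omc")
print(json.dumps(rewritten, indent=2))
```

Building a new structure instead of mutating the input keeps the embedded hooks.json reusable for a later install into a different directory.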
Upgrading OMC
The embedded OMC version is pinned to the release tagged in embedded/omc/VERSION. To bump it, re-run the sync workflow that downloads a fresh GitHub release tarball into embedded/omc/ and rebuild.
Meta-Harness (experimental)
meta-harness/ is a Python sub-project that implements the outer-loop harness
optimizer from arXiv:2603.28052 (Stanford, 2026).
Architecture
meta-harness search ← outer loop (Python, Claude Code proposer)
  │
  └─ skill-cli eval validate / run / ls / diff ← evaluator backend (Go)
       │
       └─ harness.py (user-supplied Python) ← inner execution layer
Two independent binaries, intentionally decoupled:
- skill-cli knows nothing about meta-harness; it only runs harness candidates and emits scores/traces.
- meta-harness knows nothing about OMC internals; it calls skill-cli via CLI contract only.
Quick start
# Build skill-cli
go build -o bin/skill-cli .
# Install meta-harness
cd meta-harness
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Run smoke test (no API key needed)
cd ..
bash scripts/meta-harness-smoke.sh
# Real search (requires ANTHROPIC_API_KEY + claude CLI)
meta-harness search \
  --suite meta-harness/domains/text_classification/suite.yaml \
  --out search-runs/run-01 \
  --max-iter 5 \
  --k 2 \
  --seed meta-harness/domains/text_classification/seeds/zero_shot.py \
  --seed meta-harness/domains/text_classification/seeds/few_shot.py \
  --skill-cli bin/skill-cli \
  --samples 20
CLI contract (skill-cli eval)
| Command | Description |
|---|---|
| skill-cli eval validate <dir> | Cheap structural check (exit 0 = valid) |
| skill-cli eval run <dir> --suite <f> --out <d> | Full eval → scores.json + traces/ |
| skill-cli eval ls --store <d> [--pareto] | List / filter candidates |
| skill-cli eval diff <a> <b> --store <d> | Code + score diff |
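For readers unfamiliar with Pareto filtering, the idea behind a flag like --pareto can be sketched as follows. Which objectives skill-cli actually compares is not stated here; this example assumes two hypothetical ones, accuracy (higher is better) and cost (lower is better).

```python
def dominates(a: dict, b: dict) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a["accuracy"] >= b["accuracy"] and a["cost"] <= b["cost"]
    better = a["accuracy"] > b["accuracy"] or a["cost"] < b["cost"]
    return no_worse and better


def pareto_front(candidates: list) -> list:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidate scores.
CANDIDATES = [
    {"name": "A", "accuracy": 0.9, "cost": 10},
    {"name": "B", "accuracy": 0.8, "cost": 5},
    {"name": "C", "accuracy": 0.7, "cost": 8},
]
```

With these numbers, C is dropped because B is both more accurate and cheaper, while A and B remain as distinct trade-offs.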
Tuning
The meta-harness/src/meta_harness/skill.md file is the most important lever on search quality.
Per Appendix D of the paper: run 3–5 short iterations (--max-iter 3) specifically to
debug and refine it before committing to a full run.
License
MIT
