@dzhechkov/harness-cli

v0.3.89

The dz CLI — install AI skills for Claude Code, Codex, OpenCode, Hermes. 11 commands, 7 presets, 4 platform adapters.



The dz CLI — the main entry point to the DZ Harness Hub. Install AI skills for Claude Code, Codex, OpenCode, Hermes, OpenClaude, GitHub Copilot from a single command.

Why dz?

dz is a package manager + cross-compiler for your AI agent harness. Write a skill once in one canonical form; dz installs it into any agent's harness, holds it to a quality bar, and lets the harness learn over time.

The problem. You accumulate ~117 skills (design-thinking, QE, devops, web3, MCP, academic…). Five pains follow:

Every agent wants a different layout. Claude Code reads .claude/skills/, Codex .codex/, OpenCode/Hermes/OpenClaude their own. Hand-maintaining N copies is sync hell.
Skills arrive from many upstream repos — they must be canonicalized (brought to one form) and kept in sync without losing provenance.
It's hard to know which skill to reach for out of a hundred.
Quality drifts — there's no single bar.
Experience doesn't accumulate — the harness doesn't learn from feedback.

The answer — one canon → many platforms. There is a single source of truth (a CanonicalSkill); dz compiles it for each target — so the same skill drops into .claude/skills/, .codex/, etc. without hand-copying.

Every command maps to one of five jobs:

| Job | Commands | What it does |
|-----|----------|--------------|
| Author / canonicalize | auto-canonicalize, sync-upstream, diff, create-skill | pull a skill from any repo into one canonical form + keep it in sync with upstream |
| Install / assemble | init, setup, install, compose, presets, upgrade | deploy the right set of skills into a chosen agent harness (6 targets) |
| Find / recommend | registry, scout, recommend, skill-advisor | for a task, suggest which skill / preset / package to use |
| Guarantee quality | benchmark (L0 A–F), verify, doctor | one bar: 20 deterministic checks per skill |
| Learn | teach, pretrain, roam (reward-learning) | accumulate patterns, improve recommendations over time |

(+ ops: publish, stats, downloads, dashboard, plugin.)

Analogy: npm for distribution, a compiler / Babel for one source → many targets (adapters for 6 agents), and a linter / CI for a quality bar (benchmark) — but for AI agent skills, not ordinary code.

Install

npm install -g @dzhechkov/harness-cli

# Update to latest version (run from outside any workspace project):
cd /tmp && npm install -g @dzhechkov/harness-cli@latest

Note: If you get EUNSUPPORTEDPROTOCOL workspace:*, you're inside a pnpm/yarn workspace. Run the install from /tmp or ~ instead.

User Journey — from install to mastery

All 32 commands mapped to a real workflow:

DISCOVER → INSTALL → USE → CREATE → MAINTAIN → SHARE

Phase 1: Discover (what's available?)

npm install -g @dzhechkov/harness-cli    # install the CLI

dz help                                   # see all commands
dz pretrain                                # analyze project files → recommend by tech stack
dz recommend "build API and deploy to K8s" # keyword match → skills + toolkits
dz recommend "work on this project"        # generic? → auto-runs pretrain → recommends by stack
dz stats                                  # 33 packages, 117 skills, 6 targets, 11 presets
dz dashboard                              # visual panel — packages, adapters, skill packs
dz registry                               # browse all 117 skills by category
dz registry search kubernetes             # find specific skills
dz registry --category devops             # filter by domain
dz downloads                              # npm weekly download stats

Phase 2: Install (set up your workspace)

# Full setup with self-learning (recommended):
dz setup --target claude-code --preset devops  # pretrain + hooks + JSONL memory

# With AgentDB vector memory (semantic search + self-learning):
dz setup --target claude-code --preset devops --memory agentdb  # .rvf + 41 MCP tools

# Or just install skills (no learning):
dz init --target claude-code --preset devops   # 28 DevOps skills
dz init --target openclaude --preset web3      # 12 DeFi skills for OpenClaude
dz init --target codex --preset mcp            # 16 MCP skills for Codex

# Or pick individual skills:
dz init --target claude-code --select terraform,kubernetes,docker-compose

# Or install from any npm package:
dz install @dzhechkov/skills-devops            # npm install + copy skills

# Verify everything is correct:
dz verify                                       # structural validation
dz doctor                                       # 7 health checks
dz list                                         # show installed skills
dz info --id terraform                          # detailed info about a skill

Phase 3: Use (work with your agent)

# Now use Claude Code / Codex / OpenCode / Hermes normally.
# Skills are auto-discovered from the platform's skills directory.
# Example in Claude Code:
#   "Review this PR" → pr-review skill activates
#   "Design an API" → api-design skill activates
#   "Fix this CI" → ci-fix skill activates

Phase 4: Create (build your own skills)

# Scaffold a new skill:
dz create-skill --name my-skill --description "What it does" --tier 2

# With BTO-compatible eval templates:
dz create-skill --name my-skill --bto

# Benchmark your skill (aim for Grade A):
dz benchmark .claude/skills/my-skill           # single skill — 20 L0 checks
dz benchmark packages/@dzhechkov/skills-devops --all   # batch all
dz benchmark skill-a --compare skill-b          # A/B compare

# Find skills to canonicalize from the ecosystem:
dz scout                                        # scan 9 sources (GitHub, npm+plugins, HN, ...)
dz scout --deep                                 # deep analysis with SKILL.md parsing
dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops

Phase 5: Maintain (keep skills fresh)

# Check for upstream changes (canonicalized skills):
dz sync-upstream --list                                 # which packages have external sources?
dz sync-upstream --all                                  # check all against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops  # check one

# Check installed skills vs canonical:
dz upgrade                                      # shows which skills need update
dz upgrade --target openclaude                  # check specific platform

# Sync canonical to legacy layout:
dz sync                                         # canonical → project skills
dz migrate                                      # detect legacy installations

# Orchestrate dynamic workflows:
dz workflow --task coverage-lift                 # parallel coverage improvement
dz workflow --task security-audit               # adversarial security scan

# Cross-host state sync:
dz roam --apply                                 # sync agent state across machines

Phase 6: Share (publish to the world)

# Publish updated packages to npm:
dz publish --dry-run                            # preview
dz publish --filter skills-devops               # publish specific package
dz publish                                      # publish all changed packages

Three Ways to Install Skills

| | Individual Skill | Preset | npx Package |
|---|---|---|---|
| What | 1 SKILL.md file | Curated list of skill names | Full toolkit with orchestration |
| Contains | Instructions for 1 task | N skill references | Skills + commands + rules + shards + agents + memory |
| Pipeline | No | No | Yes (phases, checkpoints, governance) |
| Self-learning | No | dz setup adds it | Built-in |
| Install | dz init --select X | dz setup --preset X | npx @dzhechkov/X init |
| Example | terraform | devops (28 skills) | keysarium (7-phase research) |

# One skill:
dz init --target claude-code --select design-thinking

# Curated set by topic (recommended):
dz setup --target claude-code --preset meta          # 16 development skills + self-learning

# Full toolkit with orchestrated pipeline:
npx @dzhechkov/keysarium init                        # 7-phase research + commands + memory

When to use which:

Need 1 specific capability → --select
Need a themed set that works together → --preset
Need a full pipeline with commands and governance → npx

Available Presets (11)

| Preset | Skills | Description |
|--------|--------|-------------|
| meta | 16 | Development process (explore, goap-research, problem-solver, design-thinking, feature-adr, knowledge-extractor, understand-anything-bridge, agentshield-scan, adversarial-verifier, skill-advisor) |
| qe-engineer | 20 | Quality engineering (test-gen, coverage, chaos, defect, ...) |
| bto | 1 | Build-Benchmark-Test-Optimize pipeline |
| health | 8 | Medical AI (diagnostics, drugs, labs, clinical decisions) |
| keysarium | 9 | Full research toolkit (feature-adr, presentation, reverse-eng) |
| p-replicator | 10 | AI product development (/replicate, SPARC PRD, pipeline-forge) |
| feature-adr | 5 | Feature pipeline (feature-adr, explore, frontend-design) |
| devops | 28 | DevOps skills (terraform, kubernetes, c4-architecture, incident-response, problem-management, risk-assessment, ...) |
| web3 | 12 | Web3/DeFi (quicknode, zerion, symbiosis, bankr, veil, neynar, ...) |
| mcp | 16 | MCP servers (agentdb, brave-search, gmail, gitlab, comfyui, notion, ...) |
| academic | 5 | Thesis defense (review, questions, doc-check, live defense + answer eval) |

Standalone Packages (install via npx, no dz CLI needed)

| Package | Install | What it does |
|---------|---------|-------------|
| @dzhechkov/keysarium | npx @dzhechkov/keysarium init | Full 7-phase research toolkit |
| @dzhechkov/design-thinking | npx @dzhechkov/design-thinking init | d.school 6-phase Design Thinking (8 skills) |
| @dzhechkov/trip-planner | npx @dzhechkov/trip-planner init | Travel itinerary → interactive mobile site (pending publish) |
| @dzhechkov/p-replicator | npx @dzhechkov/p-replicator init | AI product development (/replicate pipeline) |
| @dzhechkov/health-advisor | npx @dzhechkov/health-advisor init | Medical AI (25 skills) |
| @dzhechkov/skills-bto | npx @dzhechkov/skills-bto init | BTO benchmarking (Build-Test-Optimize) |
| @dzhechkov/skills-feature-adr | npx @dzhechkov/skills-feature-adr init | 11-step feature pipeline |
| @dzhechkov/skills-edu-site | npx @dzhechkov/skills-edu-site init | Gamified edu site generator |
| @dzhechkov/skills-transcript-site | npx @dzhechkov/skills-transcript-site init | Transcript → interactive site |
| @dzhechkov/skills-analyst-manual | npx @dzhechkov/skills-analyst-manual init | 3-phase analyst composite |

Difference: dz init --preset installs individual skills from .claude/skills/ source into a target platform tree. Standalone npx packages have their own CLI and install a complete toolkit with commands, rules, shards, and agents — a richer but self-contained experience.

A skill and its npx toolkit are not duplicates — they're a graduation. Several skills (e.g. feature-adr, design-thinking) exist BOTH as a skill inside a dz preset AND as a standalone npx package. The preset's SKILL.md is fully functional on its own (the whole methodology — modules + references — travels with it, and it auto-activates by description), and it's the only way to compile that capability to the non-Claude platforms (Codex/OpenCode/Hermes/OpenClaude) via dz. The npx package adds project-level runtime governance around the same skill: a slash command, governance rules, a context shard, and (for feature-adr) reward-learning + /harvest. So: pick the skill/preset for a working capability across platforms; pick the npx toolkit when you want it as a governed, command-driven fixture of one project.

All Commands (32)

dz setup             --target <name> [--preset <name>] [--memory agentdb] [--no-hooks] [--install-driver] [--force]
dz init              --target <name> [--preset <name>] [--select id,id,...] [--force]
dz install           <npm-pkg> [--target <name>] [--project <dir>]
dz teach             "<pattern>" [--reward <0-1>] [--domain <name>]
dz pretrain          [--project <dir>]
dz recommend         "<task description>"
dz compose           <preset1+preset2+...> [--target <name>]
dz diff              <skill-dir>
dz upgrade           [--target <name>] [--project <dir>]
dz verify            [--skills-dir <dir>] [--target <name>]
dz sync              [--canonical <dir>] [--project <dir>] [--dry-run] [--force]
dz update            (alias for sync)
dz list              [--skills-dir <dir>]
dz info              --id <skill-id> [--skills-dir <dir>]
dz create-skill      --name <id> [--description <text>] [--tier 1|2|3] [--bto]
dz registry          [search <query>] [--category <cat>]
dz benchmark         <skill-dir> [--compare <dir>] [--all]
dz mcp-scan          [path] [--json]   (static agent-permission audit; exit 0/1/2 = clean/medium/high)
dz publish           [--filter <name>] [--dry-run] [--bump-only]
dz auto-canonicalize --source <github-url> --pack <skills-pack>
dz sync-upstream     [--package <dir>] [--list] [--all]
dz scout             [--topics <list>] [--since <date>] [--deep]
dz workflow          --task <name> [--dry-run]
dz plugin            [--version <ver>]
dz downloads
dz migrate           [--project <dir>]
dz stats
dz dashboard
dz doctor            [--project <dir>]
dz roam              [--apply] [--slug <slug>]
dz import-ecc       [--local-path <dir>] [--select id,id,...] [--limit N] [--output <dir>] [--force]
dz help

Targets (5 platforms)

All 5 platforms natively support the agentskills.io SKILL.md format:

| Target | Skills directory | Native SKILL.md? |
|--------|-----------------|:---:|
| claude-code | .claude/skills/ | Yes |
| codex | .agents/skills/ | Yes (docs) |
| opencode | .opencode/skills/ | Yes (also scans .claude/skills/) |
| hermes | .hermes/skills/ | Yes |
| openclaude | .openclaude/skills/ | Yes |

Same SKILL.md file, different directory — no format conversion needed.
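
To make the "same file, different directory" idea concrete, here is a minimal sketch of the target-to-directory mapping from the table above. The helper names `skills_dir_for` and `place_skill` are hypothetical and are not part of the dz CLI; dz's own adapters may do more than a plain copy.

```shell
#!/bin/sh
# Map a target name to the skills directory listed in the table above.
# Illustrative helper only; not a dz command.
skills_dir_for() {
  case "$1" in
    claude-code) echo ".claude/skills" ;;
    codex)       echo ".agents/skills" ;;
    opencode)    echo ".opencode/skills" ;;
    hermes)      echo ".hermes/skills" ;;
    openclaude)  echo ".openclaude/skills" ;;
    *)           echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Copy one skill folder, unchanged, into a target's tree under a project root.
# $1 = skill folder, $2 = target name, $3 = project root
place_skill() {
  skill_dir=$1; target=$2; project=$3
  rel=$(skills_dir_for "$target") || return 1
  dest="$project/$rel/$(basename "$skill_dir")"
  mkdir -p "$dest" && cp -R "$skill_dir"/. "$dest"/
}
```

For example, `place_skill ./terraform hermes .` would put the folder under `.hermes/skills/terraform/` without touching its SKILL.md.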

Optional platform enrichment (skills work without these):

| Platform | Optional extra | What it adds |
|----------|---------------|-------------|
| Codex | agents/openai.yaml | UI metadata (icons, display_name, MCP deps) |
| OpenCode | opencode.json + .opencode/agents/*.md | Config, custom agents |
| Hermes | cli-config.yaml | Agent config, persona, memory |

Workflows (Opus 4.8+ dynamic workflows)

dz workflow --task coverage-lift     # parallel coverage improvement
dz workflow --task mutation-kill     # kill surviving mutants
dz workflow --task canonicalize      # canonicalize new packages
dz workflow --task security-audit    # adversarial security scan

Scout (ecosystem intelligence)

dz scout                              # quick scan — radar mode
dz scout --deep                       # deep analysis — AI analyst mode
dz scout --topics mcp-server,ai-agent # custom topics
dz scout --since 2026-05-01           # only recent repos

Radar mode (dz scout) scans 9 sources in parallel (GitHub + npm + HN + MCP Registry + Glama + OSSInsight + Smithery + Semantic Scholar + arXiv):

Detects skill format — SKILL.md, plugin.json, .claude/skills/, .claude-plugin/, MCP manifests
Scores relevance — format (40%) + stars (30%) + recency (20%) + novelty (10%)
Compares against our 32 packages — finds skills we don't have
Recommends — integrate (score ≥70) / monitor (40-69 + ≥50 stars) / skip
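
The weighting and thresholds above can be sketched as plain integer arithmetic. This is an illustration under one stated assumption: each component (format, stars, recency, novelty) has already been normalized to a 0-100 sub-score. How dz scout actually normalizes them is not documented here, and the function names are hypothetical.

```shell
#!/bin/sh
# Weighted relevance score from four sub-scores, each ASSUMED to be 0-100.
# $1 = format, $2 = stars, $3 = recency, $4 = novelty
scout_score() {
  echo $(( (40 * $1 + 30 * $2 + 20 * $3 + 10 * $4) / 100 ))
}

# Map a score and a raw star count to the recommendation bands listed above.
# $1 = score (0-100), $2 = GitHub stars
scout_recommend() {
  score=$1; stars=$2
  if [ "$score" -ge 70 ]; then
    echo integrate
  elif [ "$score" -ge 40 ] && [ "$stars" -ge 50 ]; then
    echo monitor
  else
    echo skip
  fi
}
```

With sub-scores 100, 50, 80, 20 the weighted score is 73, which lands in the integrate band; a score of 47 is monitor only if the repo also has at least 50 stars.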

Deep analyst mode (dz scout --deep) goes further for top-scored repos:

Downloads SKILL.md from each repo, parses frontmatter + body
Finds closest match in our inventory by keyword overlap
Explains the delta — what the found skill adds that ours doesn't
Recommends integration path:
- canonicalize — high-signal novel skill → new @dzhechkov/skills-* pack
- merge — similar to existing skill → add unique features to ours
- new-preset — novel skill → add to preset or create new pack
- skip — already in our inventory
Gap analysis — identifies trending categories across the ecosystem that our harness lacks
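
The "closest match by keyword overlap" step can be pictured with the sketch below. It counts distinct lowercase words shared by two descriptions and picks the inventory entry with the highest count. The tokenization, the `id|description` input format, and the function names are assumptions made for illustration; dz scout's real matcher may differ.

```shell
#!/bin/sh
# Split free text into distinct lowercase words, one per line.
words_of() {
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -cs 'a-z0-9' '\n' | LC_ALL=C sort -u
}

# Print how many distinct words two descriptions share.
keyword_overlap() {
  left=$(mktemp); right=$(mktemp)
  words_of "$1" > "$left"
  words_of "$2" > "$right"
  LC_ALL=C comm -12 "$left" "$right" | grep -c .
  rm -f "$left" "$right"
}

# $1 = candidate description; stdin = inventory lines in the
# hypothetical form "skill-id|description". Prints "<best-id> <overlap>".
closest_match() {
  best=""; best_n=-1
  while IFS='|' read -r id desc; do
    n=$(keyword_overlap "$1" "$desc")
    if [ "$n" -gt "$best_n" ]; then best=$id; best_n=$n; fi
  done
  echo "$best $best_n"
}
```

A raw overlap count favors long descriptions, so a real matcher would likely normalize by description length; the sketch keeps the simplest form.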

Example deep analysis output:

## 🔬 Deep Analysis

### cool/agent-toolkit (★500)
2/3 skills are novel

| Skill | Description | Closest match | Integration | Rationale |
|-------|------------|---------------|-------------|-----------|
| code-review | Automated OWASP-focused review | brutal-honesty-review | **merge** | Similar to ours — merge OWASP checklist |
| deploy-check | Pre-deploy validation gates | — | **canonicalize** | High-signal novel skill (500 stars) |

## 📊 Harness Gap Analysis

| Category | Frequency | Recommendation |
|----------|-----------|---------------|
| deploy-automation | 12 repos | Create @dzhechkov/skills-devops — high demand |
| data-pipeline | 5 repos | Monitor — emerging trend |

BTO integration (create-skill --bto)

# Scaffold a new skill with BTO-compatible 3-layer evaluation:
dz create-skill --name my-skill --bto

# What you get:
#   evals/my-skill.yaml       — BTO eval with L0/L1/L2 layers
#   references/judge-rubrics.md — scoring rubrics for 3-judge panel

The --bto flag generates eval templates compatible with /bto-test:

| Layer | What | Gate |
|-------|------|------|
| L0 | Deterministic checks (U1-U5 universal + S1-S15 skill-specific) | Pass rate >= 80% |
| L1 | Single LLM judge (Haiku); 5 dimensions: Clarity, Completeness, Actionability, Quality, Anti-patterns | Average >= 7.0 |
| L2 | 3-judge panel (Sonnet): Expert (0.40), Critic (0.30), Auditor (0.30); 5 dimensions: Methodology, Depth, Correctness, Usability, Robustness | Weighted avg >= 7.0 |

After scaffolding, fill in the SKILL.md protocol and run /bto-test .claude/skills/my-skill to evaluate.
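
As a worked example of the L2 gate, the sketch below combines three judge scores with the documented weights (Expert 0.40, Critic 0.30, Auditor 0.30) and checks the 7.0 threshold. It assumes each judge has already been reduced to a single 0-10 score; how /bto-test aggregates the five dimensions per judge is not shown here, and the function names are hypothetical.

```shell
#!/bin/sh
# Weighted L2 average. $1 = Expert, $2 = Critic, $3 = Auditor (each 0-10).
# Integer weights (40/30/30) avoid rounding noise from 0.4 and 0.3.
l2_weighted() {
  awk -v e="$1" -v c="$2" -v a="$3" \
    'BEGIN { printf "%.2f\n", (40 * e + 30 * c + 30 * a) / 100 }'
}

# Succeeds (exit 0) when the weighted average meets the 7.0 gate.
l2_gate() {
  awk -v s="$(l2_weighted "$1" "$2" "$3")" 'BEGIN { exit !(s >= 7.0) }'
}
```

Scores of 8, 7, 6 give 0.40·8 + 0.30·7 + 0.30·6 = 7.10, which passes; scores of 9, 5, 6 give 6.90, which fails even though the Expert score is high.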

dz install — install skills from any npm package

# Install skills from any npm package directly
dz install @dzhechkov/skills-devops
dz install @dzhechkov/skills-web3 --target openclaude
dz install @lythos/skill-curator --target claude-code

Runs npm install, discovers SKILL.md files in the package, and copies them to the target platform directory. Works with any agentskills.io-compatible npm package.
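
The discover-and-copy step can be sketched as follows. This illustrates the idea only and is not dz's implementation: it assumes every folder that contains a SKILL.md is one skill and that the folder name is the skill id. `copy_skills` is a hypothetical helper.

```shell
#!/bin/sh
# Find every SKILL.md under a package directory and copy its folder
# into the destination skills directory.
# $1 = unpacked package directory, $2 = destination (e.g. .claude/skills)
copy_skills() {
  pkg_dir=$1; dest=$2
  find "$pkg_dir" -type f -name SKILL.md | while IFS= read -r skill_md; do
    skill_dir=$(dirname "$skill_md")
    name=$(basename "$skill_dir")
    mkdir -p "$dest/$name" && cp -R "$skill_dir"/. "$dest/$name"/
    echo "installed $name"
  done
}
```

Copying the whole folder rather than the single file keeps companion files (schemas, scripts, evals) next to the SKILL.md.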

dz sync-upstream — check for upstream updates

dz sync-upstream --list                                    # show packages with external sources
dz sync-upstream --all                                     # check ALL packages against upstream
dz sync-upstream --package packages/@dzhechkov/skills-devops  # check one package

Discovers all skill packs with sources.json, fetches SKILL.md from the origin repos, and reports which skills have upstream changes.

dz upgrade — check installed skills for updates

dz upgrade                           # check .claude/skills/ against canonical
dz upgrade --target openclaude       # check .openclaude/skills/

Compares installed skills with the canonical source and reports which ones need dz init --force to update.
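
A rough picture of that comparison, assuming a byte-for-byte check of each SKILL.md (dz may compare versions or hashes instead; this sketch does not claim to match it). `outdated_skills` is a hypothetical helper.

```shell
#!/bin/sh
# Print the ids of installed skills whose SKILL.md differs from the canonical copy.
# $1 = canonical skills directory, $2 = installed skills directory
outdated_skills() {
  canonical=$1; installed=$2
  for dir in "$installed"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    # Skills with no canonical counterpart are ignored here.
    [ -f "$canonical/$name/SKILL.md" ] || continue
    cmp -s "$canonical/$name/SKILL.md" "$dir/SKILL.md" || echo "$name"
  done
}
```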

dz downloads — npm weekly download stats

dz downloads     # fetch weekly downloads for all 31 packages

dz benchmark — L0 quality gate

dz benchmark packages/@dzhechkov/skills-devops/terraform     # single skill
dz benchmark packages/@dzhechkov/skills-devops --all          # batch all
dz benchmark skill-a --compare skill-b                        # A/B compare

20 graded deterministic checks (U1-U5 universal + S1-S15 skill-specific) + S16 advisory (capability-declaration nudge, not graded). Grade A = 95%+. For L1/L2 LLM judges, use /bto-test inside Claude Code.
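
As a quick arithmetic check of the thresholds: with 20 graded checks, each check is worth 5 percentage points, so Grade A (95%+) allows at most one failure, while the BTO L0 gate (pass rate >= 80%) allows up to four. The sketch below assumes the grade is simply passed/total; only the Grade A cutoff is documented here, so other letter grades are left out. The function names are hypothetical.

```shell
#!/bin/sh
# Integer pass rate in percent. $1 = passed checks, $2 = total graded checks
l0_pass_rate() { echo $(( 100 * $1 / $2 )); }

# Grade A requires a pass rate of at least 95%.
is_grade_a() { [ "$(l0_pass_rate "$1" "$2")" -ge 95 ]; }

# The BTO L0 gate requires a pass rate of at least 80%.
passes_l0_gate() { [ "$(l0_pass_rate "$1" "$2")" -ge 80 ]; }
```

So 19/20 is Grade A, 18/20 passes the L0 gate without reaching Grade A, and 15/20 fails the gate.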

dz mcp-scan — static agent-permission audit

dz mcp-scan .                  # scan a project/pack (default: .)
dz mcp-scan . --json           # machine-readable report

"npm audit for agent tools." Reads (never executes) .claude/settings*.json and .mcp.json/.vscode/mcp.json, then emits a 3-tier verdict with capability-level findings. Exit codes: 0 clean · 1 medium · 2 high (so CI fails on any non-clean surface). Flags: wildcard/shell grants, secrets-reachable (Read + MCP active, no .env deny), hardcoded MCP env secrets, interpreter/package-runner MCP servers, enableAllProjectMcpServers, missing default-deny. Rules adapted from the MetaHarness threat-model.

# Build-time capability reconciliation (project grants vs installed skills' declarations):
dz mcp-scan . --reconcile                  # report under-grant (skill needs a denied capability) + over-grant
dz mcp-scan . --reconcile --emit-policy    # also write .dz/policy/mcp-policy.json (least-privilege, advisory)
dz mcp-scan . --reconcile --fail-on-undergrant   # CI: exit 1 if a skill declares a need the grants forbid

--reconcile is build-time and advisory — dz never enforces; it reports the grant-vs-declaration gap and (with --emit-policy) emits a least-privilege policy for a host to enforce. Under-grant is MEDIUM (the host will starve the skill); over-grant is an advisory CANDIDATE (a grant may be for the operator). Declared limits are reported but inert (settings.json has no timeout field). Verdict-neutral unless --fail-on-undergrant.
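
Two pieces of the behavior described above lend themselves to a short sketch: mapping the documented exit codes (0/1/2) to a verdict in CI, and the grant-vs-declaration gap as a pair of set differences. The one-capability-per-line file format and the function names are assumptions made for illustration; the real report comes from dz mcp-scan itself.

```shell
#!/bin/sh
# Translate a dz mcp-scan exit code into the documented verdict.
mcp_verdict() {
  case "$1" in
    0) echo clean ;;
    1) echo medium ;;
    2) echo high ;;
    *) echo unknown ;;
  esac
}

# Lines present in file $1 but not in file $2 (both treated as sets).
set_minus() {
  a=$(mktemp); b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# $1 = capabilities the skills declare, $2 = capabilities the project grants
under_grant() { set_minus "$1" "$2"; }   # declared but not granted
over_grant()  { set_minus "$2" "$1"; }   # granted but not declared

# CI usage (requires dz on PATH):
#   dz mcp-scan .; code=$?
#   echo "verdict: $(mcp_verdict "$code")"; exit "$code"
```

Because any non-zero code fails the job, a CI step that simply runs the scan already blocks on medium and high findings; the mapping is only there to make the log readable.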

dz publish — automated npm publish

dz publish --dry-run                          # preview what would publish
dz publish --filter skills-devops             # publish specific package
dz publish --filter skills-devops --bump-only # bump version only, no publish

dz auto-canonicalize — discover skills in GitHub repos

dz auto-canonicalize --source github.com/user/repo --pack packages/@dzhechkov/skills-devops

Scans a GitHub repo for SKILL.md files and generates the matching dz create-skill commands.

dz registry — searchable skill index

dz registry                    # visual panel: 117 skills in 6 categories
dz registry search security    # fuzzy search
dz registry --category mcp     # filter by category

dz stats + dz dashboard

dz stats        # Quick metrics: packages, skills, targets, presets
dz dashboard    # Visual panel with all packages, adapters, skill packs

Example: Thesis Defense Preparation (Academic Preset)

# Install with AgentDB (remembers patterns across students):
dz setup --target claude-code --preset academic --memory agentdb

# Or lightweight:
# dz init --target claude-code --preset academic

Prepare: Create a folder per student with thesis.pdf + review.pdf + external-review.pdf + antiplagiat.pdf.

Pre-defense (open Claude Code in student folder):

"Check document package completeness"     → document-checker
"Analyze this thesis"                     → dissertation-review (format, criteria, grade)
"Generate 6 defense questions"            → question-generator (basic → critical, page refs)

During defense (feed live transcript via Whisper + VB-Cable):

"Analyze this defense transcript"         → defense-evaluator (structure, coverage, delivery)
"Evaluate the student's answers"          → answer-assessor (completeness, depth, reviewer alignment)

| When | Skill | What it does |
|------|-------|-------------|
| Before | document-checker | Package completeness: thesis, reviews, antiplagiat (plagiarism report) |
| Before | dissertation-review | State Examination Commission (ГЭК) criteria, research/project format, grade 1-10, team project check |
| Before | question-generator | 4-6 questions with page refs and expected keywords |
| During | defense-evaluator | Live transcript → structure, coverage, delivery quality |
| During | answer-assessor | Q&A evaluation → completeness, depth, reviewer remarks |

Key features: Grade corridor, per-criterion 1-10 scoring, TO BE vs data detection, LTV/CAC > 10 warning, reviewer divergence, raise/lower conditions, compact mode (a 1-page summary report, "компактная справка"), summary table across all students. With AgentDB, patterns persist.

Skills contain only evaluation criteria and methodology — no student data.

Batch mode: S3 archive → agent swarm

# Download and extract: each student = subfolder with .zip
curl -o students.zip "https://s3.example.com/bucket/students.zip"
mkdir students && cd students && 7z x ../students.zip
# Extract each archive in a subshell so a failed extraction cannot leave the loop in the wrong directory
for f in *.zip; do mkdir -p "${f%.zip}" && (cd "${f%.zip}" && 7z x "../$f"); done

Then in Claude Code:

"For each student folder: run document-checker → dissertation-review → question-generator.
 Save справка.md (the summary report) per student with clickable inline links to pages (p. 45, sec. 2.3)
 and external sources ([JTBD](https://hbr.org/...)). Run all students in parallel."

With AgentDB, patterns persist across students — grading calibration improves with each analysis.

Example: Product Discovery with Design Thinking

# With self-learning (recommended — remembers HADI patterns, JTBD insights across sessions):
dz setup --target claude-code --preset meta
dz setup --target claude-code --preset meta --memory agentdb  # + semantic search

# Or without self-learning:
dz init --target claude-code --preset meta
# Or individually:
dz init --target claude-code --select design-thinking

Then in Claude Code:

"Design a mobile app for booking coworking spaces"
→ design-thinking skill activates
→ 6-phase protocol runs with complexity tier auto-selection

6-Phase Protocol

Phase 1: EMPATHIZE  → STOP gate: request user interview data + goap-research for market data
Phase 2: DEFINE     → JTBD Canvas + CJM AS IS + Ishikawa root cause analysis
Phase 3: IDEATE     → HADI hypotheses + Lean Canvas / Osterwalder BMC + GTM + Unit Economics
Phase 4: PROTOTYPE  → MVP (fidelity spectrum) + CJM/VSM TO BE (labeled as hypotheses)
Phase 5: TEST       → STOP gate: request usability test data + risk analysis + HADI validation
Phase 6: VALIDATE   → Pilot with variance analysis: projected vs actual → Scale/Iterate/Pivot/Kill

Complexity Tiers (auto-selected)

| Tier | When | Phases | Integrations |
|------|------|--------|-------------|
| S | Quick user insight | 1→2→5 | explore + goap-research |
| M | New feature | 1→2→3→4→5 | + frontend-design + six-thinking-hats |
| L | New product | 1→2→3→4→5→6 | + qcsd-swarm + reverse-engineering-unicorn |
| XL | Platform / ecosystem | All | All optional integrations (aqe init recommended) |

Key Safeguards

Never fabricates data — STOP gates pause for real interview/survey/test data
TO BE ≠ data — projections labeled as hypotheses, validated via pilot (Phase 6)
LTV/CAC > 10 flagged as suspicious (Skok 2013)
Loop-back protocol — Phase 5 can invalidate Phase 2 and return upstream
22 methodologies with academic validation tiers (Strong/Moderate/Practitioner/Weak)
23 validation rules (DT-001 through DT-023) enforce quality per tier
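
The LTV/CAC safeguard is simple arithmetic, sketched below. The check is written as LTV > 10 × CAC so it works with integers and needs no division. Inputs are assumed to be whole currency units, and `ltv_cac_flag` is a hypothetical name, not a function of the skill.

```shell
#!/bin/sh
# Flag unit economics whose LTV/CAC ratio exceeds 10.
# $1 = lifetime value, $2 = customer acquisition cost (whole currency units)
ltv_cac_flag() {
  if [ "$1" -gt $(( 10 * $2 )) ]; then
    echo suspicious
  else
    echo ok
  fi
}
```

An LTV of 1200 against a CAC of 100 (ratio 12) is flagged; a ratio of exactly 10 is not, since the rule is strictly "greater than 10".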

What's included vs what's optional

Core DT — the meta preset includes all required dependencies (16 skills):

dz setup --target claude-code --preset meta
# → explore, goap-research-ed25519, problem-solver-enhanced,
#   design-thinking, feature-adr, knowledge-extractor,
#   understand-anything-bridge, ... (16 total)

Full DT — for ALL optional integrations, install agentic-qe:

npm install -g agentic-qe && aqe init --auto
# → 94 QE skills + 55 agents in .claude/skills/ and .claude/agents/
# → six-thinking-hats, qcsd-ideation-swarm, frontend-design, brutal-honesty-review

Or cherry-pick: dz compose meta+keysarium for competitive analysis.

| Optional Skill | Source | What it adds |
|---------------|--------|-------------|
| frontend-design | aqe init / keysarium | HTML/React prototypes (Phase 4) |
| six-thinking-hats | aqe init | Team ideation (Phase 3) |
| qcsd-ideation-swarm | aqe init | 9-agent quality risk (Phase 2-3) |
| reverse-engineering-unicorn | keysarium | Competitor CJM+JTBD (Phase 1) |

Without optional skills, design-thinking uses built-in fallbacks.

BTO benchmark: L0 Grade A (100%), L2 Opus weighted 7.58/10.

Example: Import Skills from ECC

dz install @dzhechkov/skills-ecc                 # 20 curated ECC skills
dz import-ecc --limit 50                         # import 50 from GitHub
dz import-ecc --local-path /path/to/ECC          # from local clone (fast)
dz import-ecc --select docker-patterns,tdd       # cherry-pick

Example: Security Scan with AgentShield

# In Claude Code: "scan my agent config for security issues"
# → agentshield-scan skill activates (170 rules, 10 categories)
npx ecc-agentshield scan --format sarif           # SARIF for GitHub Code Scanning

Example: 4-Axis Risk Scoring

dz init --target codex --preset meta --enrich
# → agents/openai.yaml includes risk_level per skill
# Axes: base_tool + file_sensitivity + blast_radius + irreversibility

Example: Understand & Develop an Existing Project

# 1. Analyze project → get recommendations
dz pretrain                                     # detects stack, recommends presets
dz recommend "work on this Node.js API"         # suggests skills + toolkits

# 2. Install skills (choose your level)
dz setup --target claude-code --preset meta --memory agentdb  # 16 skills (includes feature-adr)
dz setup --target claude-code --preset qe-engineer             # + 20 QE skills

# Want the full feature-adr toolkit with /feature-adr command + governance?
npx @dzhechkov/skills-feature-adr init                         # adds slash command + rules + shards
# See: https://www.npmjs.com/package/@dzhechkov/skills-feature-adr

# preset = SKILL.md only (auto-activates on matching tasks)
# npx = full toolkit (slash command + governance + rules)

Install the Understand-Anything plugin, then in Claude Code:

# 3. Map the codebase
/understand                                      # builds knowledge graph
# → understand-anything-bridge feeds architecture context to all skills

# 4. Develop with full context
"Add a payment module"
# → feature-adr runs with architecture awareness (layers, hot spots, dependencies)
# → see: https://www.npmjs.com/package/@dzhechkov/skills-feature-adr
# → code generation informed by real dependency graph
# → QE review targets tests at high-impact files
# → agentshield-scan checks new configs for security

# 5. Verify impact
"What files are affected by my changes?"
# → blast radius calculation → targeted test generation

Architecture-aware development: every skill knows the codebase structure.

Example: AI-Assisted Reasoning & Self-Improvement

# Auto-select reasoning strategy:
"Compare 3 architectures"      → structured-reasoning: Tree-of-Thought (branches + scoring)
"Debug this test"              → structured-reasoning: Chain-of-Thought (linear trace)
"We've been looping"           → structured-reasoning: Reflection-Suppression (break loop)

# Self-review before delivering:
"Write a migration and verify" → reflection-loop: draft → critique → revise (max 3 rounds)

# Manage long sessions:
"Context is getting long"      → context-window-management: checkpoint + prune + continue

# Learn from success:
"Extract this as a skill"      → skill-crystallizer: trace → reusable SKILL.md

All of these are included in the meta preset.

Self-Learning: JSONL vs AgentDB

DZ Harness supports two memory backends for self-learning:

dz setup --target claude-code --preset devops                    # JSONL (default, lightweight)
dz setup --target claude-code --preset devops --memory agentdb   # AgentDB (vector memory)

| Capability | JSONL (default) | AgentDB (--memory agentdb) |
|------------|----------------|------------------------------|
| Session tracking | Append-only JSONL log | HNSW vector store (.rvf) |
| Pattern storage | dz teach → patterns.jsonl | dz teach → .rvf + agentdb_pattern_store |
| Search | Keyword (grep) | Semantic (HNSW nearest-neighbor, cosine similarity) |
| Retrieval | Sequential scan | O(log n) approximate nearest neighbor |
| Self-learning | Frequency-based | 9 RL algorithms + Thompson Sampling bandit |
| Memory tiers | Flat file | 3-tier (working → short-term → long-term) |
| Reflexion | Reward scores (0-1) | Episodic memory (task + outcome + self-critique) |
| Causal reasoning | No | Cypher-like graph queries (X caused Y) |
| Skill composition | Manual (presets) | Bandit-picked skill chains (A→B→C) |
| Audit trail | No | Cryptographic attestation log |
| Size | ~0 KB | 4.6 MB (agentdb) |
| MCP tools | 0 | 41 tools (pattern, reflexion, causal, skill, hierarchy) |
| Dependencies | None | agentdb (optional, via npx) |

AgentDB self-learning algorithms

When using --memory agentdb, the following algorithms automatically tune search quality:

Thompson Sampling — multi-armed bandit for ranking search results
UCB1 (Upper Confidence Bound) — exploration-exploitation balancing
EXP3 — adversarial bandit for non-stationary environments
Softmax — temperature-based action selection
Epsilon-Greedy — simple exploration with decay
Gradient Bandit — preference-based action selection
Contextual Bandit — context-aware ranking using features
REINFORCE — policy gradient for complex reward landscapes
PPO-lite — proximal policy optimization for stable learning

The bandit automatically selects the best algorithm for your usage pattern — no manual tuning needed.
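
For readers unfamiliar with the first algorithm in the list, the following is a textbook Beta-Bernoulli Thompson Sampling sketch in Python. The source does not describe how AgentDB implements it, so treat this as a generic illustration: the class name, the `simulate` helper, and the arm names are hypothetical.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # One [alpha, beta] pair per arm, starting from a uniform Beta(1, 1) prior.
        self.params = {arm: [1.0, 1.0] for arm in arms}

    def select(self):
        """Sample a plausible success rate for each arm and pick the highest."""
        samples = {
            arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        """Record a binary reward (1 = result was helpful, 0 = it was not)."""
        if reward:
            self.params[arm][0] += 1.0
        else:
            self.params[arm][1] += 1.0


def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against simulated arms; return pull counts per arm."""
    bandit = ThompsonBandit(true_rates, seed=seed)
    env = random.Random(seed + 1)
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.select()
        pulls[arm] += 1
        bandit.update(arm, 1 if env.random() < true_rates[arm] else 0)
    return pulls
```

Calling `simulate({"keyword": 0.2, "semantic": 0.8})` shows the usual behavior: early pulls are spread across both arms, and over time most pulls go to the arm with the higher success rate.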

How to enable AgentDB

# One command — everything is set up:
dz setup --target claude-code --preset devops --memory agentdb

This creates .dz/memory.rvf, registers the agentdb MCP server (41 tools), and configures session hooks. The agent can immediately use agentdb_pattern_store, agentdb_reflexion_recall, etc. — no additional dz init needed.

| Command | When to use |
|---------|-------------|
| dz setup --memory agentdb | Recommended: full setup in one step |
| dz init --select agentdb-memory | Lightweight: only the SKILL.md guide (see below) |

What does `dz init --select agentdb-memory` actually do?

This is the lightweight path — it installs only the skill documentation, without configuring the backend:

Step 1: Auto-discovers agentdb-memory/ in skills-mcp package
Step 2: Copies to .claude/skills/agentdb-memory/
          ├── SKILL.md              ← instructions for the agent
          ├── schemas/output.json
          ├── scripts/validate-config.json
          └── evals/agentdb-memory.yaml

Step 3: Claude Code auto-discovers the skill from .claude/skills/
Step 4: When agent encounters a matching task, it reads SKILL.md
Step 5: SKILL.md teaches the agent WHICH tools to call and WHEN

What it does NOT do (unlike dz setup --memory agentdb):

Does NOT create .dz/memory.rvf
Does NOT register agentdb MCP server
Does NOT configure session hooks

After dz init --select agentdb-memory, the user must manually add the MCP server:

claude mcp add agentdb -- npx agentdb@latest mcp start

When this is useful:

You already have agentdb installed separately and just want the skill guide
You want to teach the agent about agentdb tools without committing to the full .dz/ infrastructure
You're in a team where agentdb is managed centrally but each developer needs the skill docs

How it works

dz init compiles canonical skills from the agentskills.io standard into the target platform's layout
Writing is additive — existing files are never overwritten without --force
All 5 platform adapters produce byte-identical output (ADR-005)
dz doctor runs 7 health checks (including node version, adapters, config, SQLite, skills)
dz migrate detects legacy keysarium/bto installations and recommends migration path
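
The additive-write rule in the list above can be sketched as follows. This is a minimal Python illustration of the stated behavior (skip existing files unless forced), not dz's actual implementation; the function name `write_additive` and its return shape are assumptions.

```python
from pathlib import Path


def write_additive(root, files, force=False):
    """Write files under root, skipping existing ones unless force is set.

    files maps relative paths to text content.
    Returns (written, skipped) lists of relative paths.
    """
    written, skipped = [], []
    for rel_path, content in files.items():
        target = Path(root) / rel_path
        if target.exists() and not force:
            # Additive mode: leave the user's existing file untouched.
            skipped.append(rel_path)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(rel_path)
    return written, skipped
```

Returning the skipped paths lets a caller report which files would need `--force`, rather than silently ignoring them.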

Use Cases

1. Short-term product research (one-off study)

Goal: Quickly research a product idea, competitors, market — get a structured report.

# Option A: via dz CLI
dz init --target claude-code --preset meta
# Then in Claude Code:
#   /explore "Research the market for AI-powered code review tools"
#   /feature-adr "Summarize findings into an ADR"

# Option B: via keysarium (full 7-phase pipeline)
npx @dzhechkov/keysarium init
# Then in Claude Code:
#   /casarium "AI-powered code review tools — market analysis"
#   → Phase 0: Discovery → Phase 1: Exploration → Phase 2: Paranoid Research
#   → Phase 3: Solution Design → Phase 4: Architecture → Phase 5: Presentation

What you get:

meta preset: /explore clarifies the problem → /feature-adr structures findings as ADR decisions
keysarium: full 7-phase pipeline with dream cycles, background workers, and presentation generation

Best for: Quick study (hours), competitive analysis, technology evaluation.

2. Long-term product research (evolving over time)

Goal: Continuously gather data, add new sources, and "recalculate" the product vision as insights accumulate.

# Install keysarium (research pipeline) + evidence-wiki (knowledge base)
npx @dzhechkov/keysarium init
# Copy evidence-wiki plugin into your project:
npx @dzhechkov/evidence-wiki   # or git clone https://github.com/djd1m/evidence-wiki

npm install -g @dzhechkov/harness-cli
dz init --target claude-code --preset meta

Workflow — iterative research cycles with evidence wiki:

Week 1:  /casarium "Product X — initial research"
         → researches/ directory created with findings
         → .keysarium/memory/ stores patterns + reward scores

         /wiki-generate                              ← evidence-wiki
         → Scans researches/, ADRs, docs
         → Generates wiki/concepts/*.md (atomic pages with inline sources)
         → Builds wiki/graph.json (knowledge graph)
         → wiki/INDEX.md links everything

Week 2:  Add new data → /casarium "Product X — update with Q2 metrics"
         → Memory recalls Week 1 patterns (reward-calibrated learning)
         → New findings merged with existing, conflicts resolved

         /wiki-generate --check                      ← re-generates wiki
         → New concepts added, existing updated
         → Every claim verified: triple-pillar protocol requires N independent
           typed sources (ADR + methodology + research)
         → Stale concepts flagged, broken evidence links detected

         /triple-check wiki/concepts/pricing-model.md ← verify specific page
         → Checks that every factual claim has inline source citations
         → Flags unsupported statements

Week N:  /casarium "Product X — pivot analysis after customer feedback"
         → Full history in memory layer + evidence wiki
         → /harvest extracts reusable knowledge patterns
         → /wiki-generate rebuilds the entire knowledge graph
         → Product vision "recalculated" — the wiki IS the living product model

The evidence-wiki advantage:

| Without evidence-wiki | With evidence-wiki |
|-----------------------|--------------------|
| Research in markdown files | Atomic concept pages with inline sources |
| Findings scattered across researches/ | Interlinked knowledge graph (graph.json) |
| "I think we decided X" | Every claim has a cited source (triple-pillar) |
| Hard to see what changed | /wiki-generate --check diffs the knowledge base |
| No verification | /triple-check enforces evidence discipline |
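
As a rough illustration of the triple-pillar idea (each claim backed by independent sources of different types), here is a hypothetical Python sketch. The `Claim` data shape, the three type labels, and the default threshold of 3 are assumptions made for the example. They are not the evidence-wiki file format or the /triple-check implementation.

```python
from dataclasses import dataclass, field

# Assumed source types, taken from the "ADR + methodology + research" wording.
PILLARS = {"adr", "methodology", "research"}


@dataclass
class Claim:
    text: str
    # (source_type, reference) pairs; the shape is hypothetical.
    sources: list = field(default_factory=list)


def missing_pillars(claim):
    """Return the source types that the claim does not cite."""
    present = {kind for kind, _ in claim.sources if kind in PILLARS}
    return PILLARS - present


def unsupported(claims, required=3):
    """Return texts of claims citing fewer than `required` distinct types."""
    return [
        c.text for c in claims if len(PILLARS - missing_pillars(c)) < required
    ]
```

A claim that cites two research notes but no ADR or methodology document counts as one type only, so it would be flagged under this rule.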

Key features for long-term research:

Evidence wiki (@dzhechkov/evidence-wiki): atomic concept pages where every factual claim carries inline sources; knowledge graph for cross-referencing; triple-pillar protocol (N independent typed sources per claim)
Reward-calibrated memory (@dzhechkov/memory Reflexion): each checkpoint response trains the system — "ок" = excellent (1.0), feedback = good (0.7), rework = needs_work (0.3)
Agent SDK Dreaming: between sessions, patterns are consolidated and distilled
/harvest (knowledge-extractor skill): extracts reusable patterns from completed research into lib/ templates
SQLite + FTS5 backend: scales to 100k+ records with full-text search across all research sessions
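
The reward values in the list above can be read as a simple label-to-score mapping. The Python sketch below shows one way such scores could be accumulated into a running average per pattern. Only the three labels and their values come from the text; the `classify` rule and the `update_score` helper are hypothetical, and the real @dzhechkov/memory logic may differ.

```python
# Label-to-reward mapping as stated in the feature list.
REWARDS = {"excellent": 1.0, "good": 0.7, "needs_work": 0.3}


def classify(response):
    """Hypothetical rule for turning a checkpoint reply into a label."""
    text = response.strip().lower()
    if text in {"ok", "ок"}:  # Latin and Cyrillic spellings
        return "excellent"
    if text.startswith("rework"):
        return "needs_work"
    return "good"  # any other reply is treated as feedback


def update_score(mean, count, reward):
    """Fold one reward into a running mean; returns (new_mean, new_count)."""
    count += 1
    return mean + (reward - mean) / count, count
```

For example, three checkpoint replies scored 1.0, 0.7, and 0.3 in sequence leave the pattern with a running mean of about 0.67.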

Best for: Product strategy over months, continuous market monitoring, evolving product vision with evidence-backed decisions.

3. Product research + working prototype

Goal: Research the product AND build a functional prototype.

Option A: Sequential — research first, then code

# Step 1: Install research + development presets
npx @dzhechkov/keysarium init
# OR:
dz init --target claude-code --preset keysarium

# Step 2: Research phase
#   /casarium "SaaS platform for team retrospectives"
#   → Phase 0-2: Discovery, Exploration, Paranoid Research
#   → Phase 3: Solution Design (with CJM prototype)
#   → Result: researches/<slug>/ with full analysis

# Step 3: Switch to development
dz init --target claude-code --preset feature-adr

# Step 4: Build using research outputs
#   /feature-adr "Build the retrospective platform based on research in researches/<slug>/"
#   → Step 0: Router classifies as L/XL
#   → Step 1-5: Requirements, ADRs, DDD, Architecture (informed by research)
#   → Step 6: Implementation plan
#   → Step 7: Code generation (with /frontend-design for UI)
#   → Step 8-9: QE review + fleet assessment

What you get: Research artifacts in researches/, then code in features/<slug>/ + actual repository changes. Research directly feeds into ADR decisions.

Option B: Parallel — research and code simultaneously with p-replicator

# Install the full product development toolkit
npx @dzhechkov/p-replicator init

# Single pipeline: research → requirements → prototype
#   /replicate "SaaS platform for team retrospectives"
#   → Reverse-engineers similar products (reverse-engineering-unicorn)
#   → Generates SPARC PRD (sparc-prd-mini)
#   → Validates requirements (requirements-validator)
#   → Creates the project structure (pipeline-forge)
#   → Builds the prototype (cc-toolkit-generator-enhanced)
#   → Reviews with brutal honesty (brutal-honesty-review)

What you get: A working prototype generated from research in a single /replicate pipeline run. Faster but less deep than Option A.

Comparison

| Aspect | Option A (Sequential) | Option B (p-replicator) |
|--------|-----------------------|-------------------------|
| Research depth | Deep (7-phase keysarium) | Moderate (reverse-engineering) |
| Code quality | High (11-step feature-adr + QE) | Good (pipeline-forge + review) |
| Time | Days to weeks | Hours to days |
| Best for | Complex products, regulated domains | MVPs, hackathons, quick validation |
| Packages | keysarium + feature-adr preset | p-replicator |
| Research artifacts | researches/ directory | Embedded in PRD |
| Code artifacts | features/<slug>/ + repo changes | Generated project |

Tip: For maximum rigor, combine both — use p-replicator for a quick prototype, then run /feature-adr --full-qe-extended on the generated code for production-grade quality engineering.

Status

v0.3.89, published on npm. Also available as a Claude Code plugin. Part of DZ Harness Hub.

Claude Plugin

DZ Harness Hub is available as a Claude Code plugin:

# Via marketplace (when published):
claude plugin marketplace add djd1m/dz-harness-hub
claude plugin install dz-harness-hub@dz-harness-hub

# Or test locally:
claude --plugin-dir /path/to/dz-harness-hub

# Generate plugin manifest from current inventory:
dz plugin --version 0.3.86

The .claude-plugin/ directory contains plugin.json + marketplace.json compatible with pi-claude-marketplace and skill-hub.

Related Projects

Skill sources

agentic-qe — 20 QE skills + 55 agents (test generation, coverage, chaos, QCSD swarms)
ECC — 20 curated skills (agent patterns, autonomous loops, docker, git workflows)
AgentShield — Security scanning (170 rules for .claude/ configs)
Understand-Anything — Codebase knowledge graph → architecture context

Platform & infrastructure

AgentDB — Self-learning vector memory (--memory agentdb, 41 MCP tools)
agentskills.io — Open standard for SKILL.md format (adopted by all 5 platforms)
OpenAI Codex — 2nd target platform
OpenCode — 3rd target platform (160K+ stars)
Hermes Agent — 4th target platform
OpenClaude — 5th target platform (28K+ stars)
