dojo.md
v0.3.2
University for AI agents. Train any model through scenario-based courses, graduate with a proven SKILL.md. Works with Claude Code, OpenClaw, Cursor, Windsurf, OpenRouter, and any MCP-compatible agent. No fine-tuning required.
University for AI agents.
Train any model through scenario-based courses. Graduate with a SKILL.md — portable expertise that makes agents reliable in production. No fine-tuning. No weight modification. Just knowledge, distilled and proven.
Works with Claude Code, OpenClaw, Cursor, Windsurf, and any MCP-compatible agent framework.
dojo train stripe-refunds --model openai/gpt-4o --target 85
Level 1: ████████████ 3/3 (100%)
Level 2: ████████░░░░ 2/3 (67%)
Score: 83/100
Domain knowledge distilled:
→ "Verify customer identity before ANY charge lookup"
→ "Duplicate charges within 5 min window = single refund"
→ "Always explain refund timeline (5-10 business days)"
SKILL.md written → .claude/skills/stripe-refunds/openai--gpt-4o/SKILL.md
The Problem
AI agents are unreliable in production. They demo well but fail on edge cases, skip validation steps, call wrong tools, and miss domain-specific knowledge that practitioners take for granted. Fine-tuning is expensive, slow, and model-locked. Prompt engineering is fragile and doesn't scale.
The Solution
dojo.md runs agents through progressively difficult scenarios, evaluates them with a hybrid deterministic + LLM-judged assertion system, extracts both failure corrections AND curriculum knowledge from the course itself, and distills everything into a SKILL.md document that gets injected into the agent's context.
The SKILL.md is a knowledge graduation document — not just corrections. Even an agent scoring 100% graduates with a SKILL.md, because the domain expertise embedded in the course (specific thresholds, counter-intuitive strategies, platform rules) has standalone value.
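As a rough illustration of the hybrid evaluation described above, the sketch below scores one scenario by combining deterministic `api_called` checks with a pluggable judge callback. The type names, the `evaluate` function, the stub judge, and the equal weighting of assertions are all assumptions made for this example; they are not taken from the dojo.md source.

```typescript
// Hypothetical types for illustration; dojo.md's real internals may differ.
type ToolCall = { tool: string; params: Record<string, unknown> };

type Assertion =
  | { type: "api_called"; tool: string; params?: Record<string, unknown> }
  | { type: "llm_judge"; criteria: string };

// The judge is injected so the sketch runs without any network access.
type Judge = (criteria: string, transcript: string) => boolean;

// Deterministic check: the tool was called and every expected param matches.
function apiCalled(
  calls: ToolCall[],
  tool: string,
  params: Record<string, unknown> = {},
): boolean {
  return calls.some(
    (c) =>
      c.tool === tool &&
      Object.entries(params).every(([k, v]) => c.params[k] === v),
  );
}

// Returns the percentage of assertions passed (equal weights are an assumption).
function evaluate(
  assertions: Assertion[],
  calls: ToolCall[],
  transcript: string,
  judge: Judge,
): number {
  const passed = assertions.filter((a) =>
    a.type === "api_called"
      ? apiCalled(calls, a.tool, a.params)
      : judge(a.criteria, transcript),
  ).length;
  return Math.round((passed / assertions.length) * 100);
}

// Stub judge that only looks for a timeline mention in the agent's reply.
const stubJudge: Judge = (_criteria, transcript) =>
  transcript.includes("5-10 business days");

const score = evaluate(
  [
    { type: "api_called", tool: "stripe_customers_retrieve" },
    { type: "api_called", tool: "stripe_refunds_create", params: { charge: "ch_001" } },
    { type: "llm_judge", criteria: "Explains the refund timeline" },
  ],
  // The agent skipped the identity check, so the first assertion fails.
  [{ tool: "stripe_refunds_create", params: { charge: "ch_001" } }],
  "Your refund is on its way and should arrive in 5-10 business days.",
  stubJudge,
);
console.log(score);
```

Injecting the judge as a function keeps the deterministic checks testable on their own; a real judge would presumably call a model instead of matching a substring.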
Scenario YAML → Engine → Mock Services → Evaluator → Skill Generator
                               ↕             ↕              ↕
                           Isolated      LLM Judge extractCurriculum()
                             State    + Deterministic       ↓
                                        Assertions      SKILL.md
Quick Start
npm install -g dojo.md
export ANTHROPIC_API_KEY=sk-ant-... # For Claude models
export OPENROUTER_API_KEY=sk-or-... # For OpenRouter models
# Train Claude on customer support
dojo train stripe-refunds
# Train GPT-4o, judged by Claude
dojo train stripe-refunds --model openai/gpt-4o --judge claude-sonnet-4-6
# Auto-loop until 90% or plateau
dojo train stripe-refunds --model openai/gpt-4o --target 90
Why It Works
Two data streams, one artifact
| Source | What it captures | When present |
|--------|-----------------|--------------|
| Failure patterns | What the agent struggled with: wrong tools, missing validation, missed edge cases | When score < 100 |
| Curriculum extraction | What the course intended to teach: domain knowledge from assertion criteria | Always |
At 92/100, your SKILL.md contains the domain knowledge from all scenarios PLUS specific corrections for the 8-point gap. At 100/100, it contains pure domain expertise — the graduation diploma.
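A minimal sketch of how the two streams could combine into one artifact, assuming a simple section map. The `SkillInputs` shape, the `buildSkillSections` function, and the "Corrections" section name are hypothetical and used only to illustrate the rule that curriculum knowledge is always present while corrections appear only below 100.

```typescript
// Hypothetical shapes for illustration; not dojo.md's real types.
type SkillInputs = {
  score: number; // 0-100 training score
  curriculum: string[]; // knowledge extracted from assertion criteria
  failurePatterns: string[]; // corrections derived from failed assertions
};

// Curriculum knowledge is always included; corrections only when score < 100.
function buildSkillSections(inputs: SkillInputs): Record<string, string[]> {
  const sections: Record<string, string[]> = {
    "Domain Knowledge": inputs.curriculum,
  };
  if (inputs.score < 100 && inputs.failurePatterns.length > 0) {
    sections["Corrections"] = inputs.failurePatterns;
  }
  return sections;
}

const curriculum = ["Duplicate charges within 5 min window = single refund"];

const perfect = buildSkillSections({
  score: 100,
  curriculum,
  failurePatterns: [],
});

const partial = buildSkillSections({
  score: 92,
  curriculum,
  failurePatterns: ["Verify customer identity before ANY charge lookup"],
});

console.log(Object.keys(perfect), Object.keys(partial));
```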
Per-model skills
Different models fail differently. Claude might miss edge cases, GPT-4o might pick the wrong tools, and Llama might hallucinate parameters. Each gets its own SKILL.md:
.claude/skills/stripe-refunds/
├── anthropic--claude-sonnet-4-6/SKILL.md # Claude's blind spots
├── openai--gpt-4o/SKILL.md # GPT-4o's blind spots
└── meta-llama--llama-3.1-70b/SKILL.md # Llama's blind spots
Auto-training loop
Set a target. Dojo loops: train → evaluate → generate SKILL.md → re-inject → retrain. Stops on target reached, plateau, or max iterations.
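The stopping logic can be sketched roughly as follows. This is an assumption-laden illustration: the `autoTrain` function, the `StopReason` names, and especially the plateau rule (stop when the gain over the previous iteration falls below `minGain`, defaulting to 5 points) are hypothetical, since the actual threshold is not documented here. The scores are replayed from the sample run in this section rather than produced by a real model.

```typescript
// Hypothetical sketch of the loop; names and the plateau rule are assumptions.
type StopReason = "target" | "plateau" | "max-iterations";

type LoopResult = { scores: number[]; reason: StopReason };

function autoTrain(
  runIteration: (iteration: number) => number, // returns a 0-100 score
  target: number,
  maxIterations: number,
  minGain = 5, // assumed plateau threshold; the real value is not documented
): LoopResult {
  const scores: number[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    // In the real tool this step would train, evaluate, and re-inject SKILL.md.
    const score = runIteration(i);
    scores.push(score);
    if (score >= target) return { scores, reason: "target" };
    const previous = scores[scores.length - 2];
    if (previous !== undefined && score - previous < minGain) {
      return { scores, reason: "plateau" };
    }
  }
  return { scores, reason: "max-iterations" };
}

// Replay a fixed score sequence instead of calling a real model.
const replay = [25, 50, 68, 72];
const result = autoTrain((i) => replay[i - 1] ?? 0, 85, 5);
console.log(result.reason, result.scores);
```

Passing the iteration step in as a callback keeps the stopping rules separate from the expensive training call, which makes them easy to test against recorded score sequences.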
dojo train stripe-refunds --model openai/gpt-4o --target 85 --max-retrain 5
Iteration 1: 25/100
Iteration 2: 50/100 (+25) — SKILL.md injected
Iteration 3: 68/100 (+18)
Iteration 4: 72/100 (+4) — plateau detected, stopping
Any Model
Train any model via OpenRouter. 200+ models supported:
dojo train cold-email-b2b --model openai/gpt-4o
dojo train cold-email-b2b --model google/gemini-2.5-pro
dojo train cold-email-b2b --model meta-llama/llama-3.3-70b-instruct
dojo train cold-email-b2b --model deepseek/deepseek-v3.2
dojo train cold-email-b2b --model x-ai/grok-4.1-fast
47 Pre-Built Courses
Agents graduate with domain expertise across:
Customer Support — stripe-refunds, escalation handling, churn prevention, SLA breach communication, onboarding sequences
Sales — cold email B2B, objection handling, proposal writing, competitive battlecards, follow-up sequences
Marketing — Google Ads copy, Meta/Facebook ads, SEO blog writing, social media content, email campaigns
DevOps — incident response, deployment alerts, bug triage, GitHub issue management
Content — newsletter writing, Twitter/X threads, product launches, brand voice documentation
dojo list # See all courses
dojo generate "Handle Zendesk ticket routing and priority assignment" # Create your own
Works With Everything
dojo.md generates AgentSkills-standard SKILL.md files. They work anywhere skills work.
Claude Code
Train agents from inside your IDE. dojo.md is an MCP server:
{
  "mcpServers": {
    "dojo": {
      "command": "npx",
      "args": ["tsx", "path/to/dojomd/src/mcp/server.ts"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}
MCP tools: dojo_discover, dojo_train, dojo_results, dojo_skill, dojo_apply. Together they cover the full training workflow without leaving your editor.
OpenClaw
Drop your graduated SKILL.md into OpenClaw's skill directory and your agent has instant domain expertise. dojo.md skills follow the same AgentSkills standard that OpenClaw uses — they're cross-compatible by design.
# Train a skill
dojo train stripe-refunds --model claude-sonnet-4-6
# Graduated SKILL.md is ready for OpenClaw, Claude Code, Cursor, Windsurf, or any MCP agent
cat .claude/skills/stripe-refunds/anthropic--claude-sonnet-4-6/SKILL.md
ClawHub has 13,000+ community skills. The difference: dojo skills are earned, not written. Every SKILL.md that comes out of dojo has a training score, scenarios it was validated against, and failure patterns it addresses. It's a diploma, not a blog post.
Cursor, Windsurf, and any MCP-compatible agent
Same MCP server config. Same SKILL.md output. The skills are portable — train once, use everywhere.
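To reuse a graduated skill in another tool, you first need to locate the file. The helper below is a hypothetical sketch of the naming rule as inferred from the paths shown in this README: the `/` in a model ID becomes `--`, and a bare model name such as `claude-sonnet-4-6` appears to be stored under an `anthropic--` prefix. The `skillPath` function is not part of dojo.md, and the real rule may differ.

```typescript
// Hypothetical helper; the naming rule is inferred from the paths in this README.
function skillPath(
  course: string,
  model: string,
  root = ".claude/skills",
): string {
  // Assumption: bare model names (no provider prefix) are Anthropic models.
  const qualified = model.includes("/") ? model : `anthropic/${model}`;
  // "openai/gpt-4o" becomes "openai--gpt-4o".
  const dir = qualified.replace("/", "--");
  return `${root}/${course}/${dir}/SKILL.md`;
}

const gptPath = skillPath("stripe-refunds", "openai/gpt-4o");
const claudePath = skillPath("stripe-refunds", "claude-sonnet-4-6");
console.log(gptPath);
console.log(claudePath);
```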
The SKILL.md Standard
Generated skills follow the Anthropic Agent Skills open standard. Portable across any MCP-compatible framework:
---
name: stripe-refunds
description: >-
  Handle Stripe refund requests correctly. Use when processing
  refunds, duplicate charges, or customer disputes.
---
## Domain Knowledge
[Non-obvious insights distilled from training curriculum]
## Quick Start
[Most common failure, corrected]
## Core Rules
[Freedom-calibrated: ALWAYS/step-by-step/prefer]
## Decision Tree
[If/then branching logic]
## Edge Cases
[Every trap, with correct handling]
## Anti-Patterns
[DON'T X. Instead, Y.]
The description triggers loading: ~100 tokens idle, ~5,000 tokens when activated. Progressive disclosure keeps context clean.
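Progressive disclosure can be pictured with a small sketch: keep only the frontmatter in context while the skill is idle, and add the body once the skill is activated. The `splitSkill` and `contextFor` functions below are hypothetical and do not describe how any particular agent loads skills; the regex-based split also assumes a well-formed frontmatter block delimited by `---` lines.

```typescript
// Hypothetical loader sketch; not the actual mechanism of any specific agent.
type ParsedSkill = { frontmatter: string; body: string };

// Split a SKILL.md into its YAML frontmatter and its Markdown body.
function splitSkill(markdown: string): ParsedSkill {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(markdown);
  if (!match) return { frontmatter: "", body: markdown };
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}

// While idle only the frontmatter is in context; the body is added on activation.
function contextFor(skill: ParsedSkill, activated: boolean): string {
  return activated ? `${skill.frontmatter}\n${skill.body}` : skill.frontmatter;
}

const skill = splitSkill(
  [
    "---",
    "name: stripe-refunds",
    "description: Handle Stripe refund requests correctly.",
    "---",
    "## Domain Knowledge",
    "Verify customer identity before ANY charge lookup",
  ].join("\n"),
);

const idle = contextFor(skill, false);
const active = contextFor(skill, true);
console.log(idle.length, active.length);
```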
CLI Reference
| Command | Description |
|---------|-------------|
| dojo train <course> | Run training session |
| dojo train <course> -m openai/gpt-4o -j claude-sonnet-4-6 -t 85 | Full multi-model auto-loop |
| dojo retrain <course> | Auto-loop with defaults (target 90, max 5) |
| dojo arena <course> | Benchmark multiple models head-to-head |
| dojo results [course] | Show latest results |
| dojo list | List installed courses |
| dojo generate <skill> | Generate a course from description |
Train Options
| Flag | Description | Default |
|------|-------------|---------|
| -m, --model | Agent model | claude-sonnet-4-6 |
| -j, --judge | Judge model | claude-sonnet-4-6 |
| -t, --target | Target score (enables auto-loop) | — |
| --max-retrain | Max loop iterations | 5 |
| --level | Run specific level only | all |
| --report | Save detailed report | — |
Scenario Format
meta:
  id: simple-refund
  level: 1
  course: stripe-refunds
  description: Process a straightforward refund
  type: tool
state:
  customers:
    - id: cus_001
      email: [email protected]
      name: Alice Johnson
  charges:
    - id: ch_001
      amount: 5000
      customer: cus_001
      status: succeeded
trigger: >
  Customer Alice Johnson (cus_001) is requesting
  a refund for charge ch_001 ($50.00).
assertions:
  - type: api_called
    tool: stripe_customers_retrieve
    description: Verify customer identity
  - type: api_called
    tool: stripe_refunds_create
    params: { charge: ch_001 }
    description: Create the refund
  - type: llm_judge
    criteria: >
      Agent confirms refund was processed and explains
      the 5-10 business day timeline for the credit
      to appear on the customer's statement.
    description: Communicate success with timeline
Development
git clone https://github.com/edholofy/dojo.md
cd dojo.md
npm install
npm run build
npm test
# Dev mode
npm run dev -- train stripe-refunds
Mission
Turn experience into expertise for AI agents.
Today: Author courses, train models, graduate with SKILL.md.
Tomorrow: Production feedback loops that generate new scenarios from real failures.
Future: The open knowledge layer for agent expertise that is proven, portable, and model-agnostic.
License
MIT
