dojo.md

v0.3.2

Published

4 days ago

University for AI agents. Train any model through scenario-based courses, graduate with a proven SKILL.md. Works with Claude Code, OpenClaw, Cursor, Windsurf, OpenRouter, and any MCP-compatible agent. No fine-tuning required.

dojo.md

University for AI agents.

Train any model through scenario-based courses. Graduate with a SKILL.md — portable expertise that makes agents reliable in production. No fine-tuning. No weight modification. Just knowledge, distilled and proven.

Works with Claude Code, OpenClaw, Cursor, Windsurf, and any MCP-compatible agent framework.

dojo train stripe-refunds --model openai/gpt-4o --target 85

Level 1: ████████████ 3/3 (100%)
Level 2: ████████░░░░ 2/3 (67%)
Score: 83/100

Domain knowledge distilled:
  → "Verify customer identity before ANY charge lookup"
  → "Duplicate charges within 5 min window = single refund"
  → "Always explain refund timeline (5-10 business days)"

SKILL.md written → .claude/skills/stripe-refunds/openai--gpt-4o/SKILL.md
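The README does not document how the score is computed. The sample output is at least consistent with an unweighted pass rate across all scenarios (5 of 6 passed, rounded to 83). A minimal sketch under that assumption, where `LevelResult` and `overallScore` are hypothetical names and not part of the dojo.md API:

```typescript
// Hypothetical helper: the scoring formula is an assumption, not taken from dojo.md.
interface LevelResult {
  level: number;
  passed: number; // scenarios passed at this level
  total: number;  // scenarios run at this level
}

// Assumed formula: share of passed scenarios across all levels, rounded to an integer.
function overallScore(levels: LevelResult[]): number {
  const passed = levels.reduce((sum, l) => sum + l.passed, 0);
  const total = levels.reduce((sum, l) => sum + l.total, 0);
  return total === 0 ? 0 : Math.round((passed / total) * 100);
}

// The two levels from the sample output above.
const sampleRun: LevelResult[] = [
  { level: 1, passed: 3, total: 3 },
  { level: 2, passed: 2, total: 3 },
];

console.log(overallScore(sampleRun)); // 83
```

The real engine may weight levels or individual assertions differently; this only shows that the displayed numbers fit the simplest formula.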

The Problem

AI agents are unreliable in production. They demo well but fail on edge cases, skip validation steps, call the wrong tools, and miss domain-specific knowledge that practitioners take for granted. Fine-tuning is expensive, slow, and locked to a single model. Prompt engineering is fragile and doesn't scale.

The Solution

dojo.md runs agents through progressively harder scenarios and evaluates them with a hybrid of deterministic and LLM-judged assertions. It then extracts failure corrections from the run AND curriculum knowledge from the course itself, and distills both into a SKILL.md document that is injected into the agent's context.

The SKILL.md is a knowledge graduation document — not just corrections. Even an agent scoring 100% graduates with a SKILL.md, because the domain expertise embedded in the course (specific thresholds, counter-intuitive strategies, platform rules) has standalone value.

Scenario YAML → Engine → Mock Services → Evaluator → Skill Generator
                              ↕               ↕              ↕
                          Isolated        LLM Judge    extractCurriculum()
                          State           + Deterministic    ↓
                                          Assertions    SKILL.md

Quick Start

npm install -g dojo.md

export ANTHROPIC_API_KEY=sk-ant-...    # For Claude models
export OPENROUTER_API_KEY=sk-or-...    # For OpenRouter models

# Train the default model (claude-sonnet-4-6) on the stripe-refunds course
dojo train stripe-refunds

# Train GPT-4o, judged by Claude
dojo train stripe-refunds --model openai/gpt-4o --judge claude-sonnet-4-6

# Auto-loop until the score reaches 90, plateaus, or hits the iteration limit
dojo train stripe-refunds --model openai/gpt-4o --target 90

Why It Works

Two data streams, one artifact

| Source | What it captures | When present |
|--------|------------------|--------------|
| Failure patterns | What the agent struggled with: wrong tools, missing validation, missed edge cases | When score < 100 |
| Curriculum extraction | What the course intended to teach: domain knowledge from assertion criteria | Always |

At 92/100, your SKILL.md contains the domain knowledge from all scenarios PLUS specific corrections for the 8-point gap. At 100/100, it contains pure domain expertise — the graduation diploma.
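A rough sketch of how the two streams could be combined. Only the name `extractCurriculum` comes from the architecture diagram; its body, `extractCorrections`, `buildSkillInput`, and every type here are assumptions for illustration, with field names borrowed from the scenario YAML.

```typescript
// Hypothetical types: field names mirror the scenario YAML, not dojo.md's real internals.
interface Assertion {
  type: "api_called" | "llm_judge";
  description: string;
  criteria?: string; // present on llm_judge assertions
}

interface ScenarioResult {
  id: string;
  assertions: Assertion[];
  failedDescriptions: string[]; // descriptions of assertions that did not pass
}

interface SkillInput {
  curriculum: string[];  // always present
  corrections: string[]; // empty when the score is 100
}

// Assumed behaviour: collect what each assertion intended to teach, pass or fail.
function extractCurriculum(results: ScenarioResult[]): string[] {
  const lessons: string[] = [];
  for (const result of results) {
    for (const assertion of result.assertions) {
      const lesson = (assertion.criteria ?? assertion.description).trim();
      if (lessons.indexOf(lesson) === -1) lessons.push(lesson);
    }
  }
  return lessons;
}

// Assumed behaviour: one correction per failed assertion, tagged with its scenario.
function extractCorrections(results: ScenarioResult[]): string[] {
  const corrections: string[] = [];
  for (const result of results) {
    for (const failed of result.failedDescriptions) {
      corrections.push(`${result.id}: ${failed}`);
    }
  }
  return corrections;
}

function buildSkillInput(results: ScenarioResult[]): SkillInput {
  return {
    curriculum: extractCurriculum(results),
    corrections: extractCorrections(results),
  };
}

// A perfect run still yields curriculum knowledge, just no corrections.
const perfectRun: ScenarioResult[] = [
  {
    id: "simple-refund",
    assertions: [
      { type: "api_called", description: "Verify customer identity" },
      {
        type: "llm_judge",
        description: "Communicate success with timeline",
        criteria: "Agent explains the 5-10 business day timeline.",
      },
    ],
    failedDescriptions: [],
  },
];

const perfectInput = buildSkillInput(perfectRun);
console.log(perfectInput.curriculum.length, perfectInput.corrections.length); // 2 0
```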

Per-model skills

Different models fail differently. Claude may miss edge cases, GPT-4o may pick the wrong tools, and Llama may hallucinate parameters. Each gets its own SKILL.md:

.claude/skills/stripe-refunds/
├── anthropic--claude-sonnet-4-6/SKILL.md    # Claude's blind spots
├── openai--gpt-4o/SKILL.md                  # GPT-4o's blind spots
└── meta-llama--llama-3.1-70b/SKILL.md       # Llama's blind spots
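The directory names above look like the model ID with `/` replaced by `--`. A small sketch of that mapping, where `skillPath` is a hypothetical helper and the rule that bare IDs such as `claude-sonnet-4-6` get an `anthropic` prefix is an assumption inferred from the example paths in this README:

```typescript
// Hypothetical helper: reproduces the path layout shown in this README.
function skillPath(course: string, model: string): string {
  // Assumption: a model ID without a provider prefix belongs to Anthropic.
  const qualified = model.indexOf("/") === -1 ? `anthropic/${model}` : model;
  // "openai/gpt-4o" becomes "openai--gpt-4o", which is safe as a directory name.
  const dir = qualified.replace(/\//g, "--");
  return `.claude/skills/${course}/${dir}/SKILL.md`;
}

console.log(skillPath("stripe-refunds", "openai/gpt-4o"));
// .claude/skills/stripe-refunds/openai--gpt-4o/SKILL.md
```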

Auto-training loop

Set a target. Dojo loops: train → evaluate → generate SKILL.md → re-inject → retrain. Stops on target reached, plateau, or max iterations.

dojo train stripe-refunds --model openai/gpt-4o --target 85 --max-retrain 5

Iteration 1: 25/100
Iteration 2: 50/100 (+25) — SKILL.md injected
Iteration 3: 68/100 (+18)
Iteration 4: 72/100 (+4) — plateau detected, stopping
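The three stop conditions can be sketched as a simple loop. This is an illustration, not dojo.md's implementation: `autoTrain`, `LoopOptions`, and especially `plateauDelta` are hypothetical, and the 5-point plateau threshold is an assumption chosen only because it matches the sample run above (a gain of +18 continues, a gain of +4 stops).

```typescript
type StopReason = "target" | "plateau" | "max-iterations";

interface LoopOptions {
  target: number;       // corresponds to --target
  maxRetrain: number;   // corresponds to --max-retrain
  plateauDelta: number; // hypothetical: minimum gain that still counts as progress
}

interface LoopResult {
  scores: number[];
  reason: StopReason;
}

// runIteration stands in for one train + evaluate + regenerate-SKILL.md pass.
function autoTrain(
  runIteration: (iteration: number) => number,
  opts: LoopOptions,
): LoopResult {
  const scores: number[] = [];
  for (let i = 1; i <= opts.maxRetrain; i++) {
    const score = runIteration(i);
    const previous = scores.length > 0 ? scores[scores.length - 1] : undefined;
    scores.push(score);
    if (score >= opts.target) return { scores, reason: "target" };
    if (previous !== undefined && score - previous < opts.plateauDelta) {
      return { scores, reason: "plateau" };
    }
  }
  return { scores, reason: "max-iterations" };
}

// Replay the scores from the sample run instead of calling a real model.
const replayScores = [25, 50, 68, 72];
const replayResult = autoTrain((i) => replayScores[i - 1] ?? 0, {
  target: 85,
  maxRetrain: 5,
  plateauDelta: 5,
});

console.log(replayResult.reason, replayResult.scores.length); // plateau 4
```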

Any Model

Train any model via OpenRouter. 200+ models supported:

dojo train cold-email-b2b --model openai/gpt-4o
dojo train cold-email-b2b --model google/gemini-2.5-pro
dojo train cold-email-b2b --model meta-llama/llama-3.3-70b-instruct
dojo train cold-email-b2b --model deepseek/deepseek-v3.2
dojo train cold-email-b2b --model x-ai/grok-4.1-fast

47 Pre-Built Courses

Agents graduate with domain expertise across:

Customer Support — stripe-refunds, escalation handling, churn prevention, SLA breach communication, onboarding sequences

Sales — cold email B2B, objection handling, proposal writing, competitive battlecards, follow-up sequences

Marketing — Google Ads copy, Meta/Facebook ads, SEO blog writing, social media content, email campaigns

DevOps — incident response, deployment alerts, bug triage, GitHub issue management

Content — newsletter writing, Twitter/X threads, product launches, brand voice documentation

dojo list                    # See all courses
dojo generate "Handle Zendesk ticket routing and priority assignment"  # Create your own

Works With Everything

dojo.md generates AgentSkills-standard SKILL.md files. They work anywhere skills work.

Claude Code

Train agents without leaving Claude Code. dojo.md also runs as an MCP server; register it in your MCP configuration:

{
  "mcpServers": {
    "dojo": {
      "command": "npx",
      "args": ["tsx", "path/to/dojomd/src/mcp/server.ts"],
      "env": { "ANTHROPIC_API_KEY": "sk-ant-..." }
    }
  }
}

MCP tools: dojo_discover, dojo_train, dojo_results, dojo_skill, dojo_apply — full training workflow without leaving your editor.

OpenClaw

Drop your graduated SKILL.md into OpenClaw's skill directory and your agent has instant domain expertise. dojo.md skills follow the same AgentSkills standard that OpenClaw uses — they're cross-compatible by design.

# Train a skill
dojo train stripe-refunds --model claude-sonnet-4-6

# Graduated SKILL.md is ready for OpenClaw, Claude Code, Cursor, Windsurf, or any MCP agent
cat .claude/skills/stripe-refunds/anthropic--claude-sonnet-4-6/SKILL.md

ClawHub has 13,000+ community skills. The difference: dojo skills are earned, not written. Every SKILL.md that comes out of dojo has a training score, scenarios it was validated against, and failure patterns it addresses. It's a diploma, not a blog post.

Cursor, Windsurf, and any MCP-compatible agent

Same MCP server config. Same SKILL.md output. The skills are portable — train once, use everywhere.

The SKILL.md Standard

Generated skills follow the Anthropic Agent Skills open standard and are portable across any MCP-compatible framework. A generated file has this shape:

---
name: stripe-refunds
description: >-
  Handle Stripe refund requests correctly. Use when processing
  refunds, duplicate charges, or customer disputes.
---

## Domain Knowledge
[Non-obvious insights distilled from training curriculum]

## Quick Start
[Most common failure, corrected]

## Core Rules
[Freedom-calibrated: ALWAYS/step-by-step/prefer]

## Decision Tree
[If/then branching logic]

## Edge Cases
[Every trap, with correct handling]

## Anti-Patterns
[DON'T X. Instead, Y.]

The description is what triggers loading: an idle skill costs roughly 100 tokens of context, and roughly 5,000 once activated. This progressive disclosure keeps the context clean.
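To make the saving concrete, here is a back-of-the-envelope estimate using the approximate figures above. `contextCost` is a hypothetical helper, and it assumes an activated skill costs about 5,000 tokens in total rather than 5,000 on top of its description.

```typescript
// Approximate figures quoted in this README; real skills will vary.
const IDLE_TOKENS = 100;
const ACTIVE_TOKENS = 5000;

// Hypothetical estimate of context used by installed skills.
function contextCost(installed: number, active: number): number {
  const idle = installed - active;
  return idle * IDLE_TOKENS + active * ACTIVE_TOKENS;
}

// 20 skills installed, 1 activated for the current task.
console.log(contextCost(20, 1));  // 6900
// The same 20 skills if every body were loaded up front.
console.log(contextCost(20, 20)); // 100000
```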

CLI Reference

| Command | Description |
|---------|-------------|
| dojo train <course> | Run training session |
| dojo train <course> -m openai/gpt-4o -j claude-sonnet-4-6 -t 85 | Full multi-model auto-loop |
| dojo retrain <course> | Auto-loop with defaults (target 90, max 5) |
| dojo arena <course> | Benchmark multiple models head-to-head |
| dojo results [course] | Show latest results |
| dojo list | List installed courses |
| dojo generate <skill> | Generate a course from description |

Train Options

| Flag | Description | Default |
|------|-------------|---------|
| -m, --model | Agent model | claude-sonnet-4-6 |
| -j, --judge | Judge model | claude-sonnet-4-6 |
| -t, --target | Target score (enables auto-loop) | none |
| --max-retrain | Max loop iterations | 5 |
| --level | Run specific level only | all |
| --report | Save detailed report | none |

Scenario Format

meta:
  id: simple-refund
  level: 1
  course: stripe-refunds
  description: Process a straightforward refund
  type: tool

state:
  customers:
    - id: cus_001
      email: alice@example.com  # placeholder address
      name: Alice Johnson
  charges:
    - id: ch_001
      amount: 5000
      customer: cus_001
      status: succeeded

trigger: >
  Customer Alice Johnson (cus_001) is requesting
  a refund for charge ch_001 ($50.00).

assertions:
  - type: api_called
    tool: stripe_customers_retrieve
    description: Verify customer identity
  - type: api_called
    tool: stripe_refunds_create
    params: { charge: ch_001 }
    description: Create the refund
  - type: llm_judge
    criteria: >
      Agent confirms refund was processed and explains
      the 5-10 business day timeline for the credit
      to appear on the customer's statement.
    description: Communicate success with timeline

Development

git clone https://github.com/edholofy/dojo.md
cd dojo.md
npm install
npm run build
npm test

# Dev mode
npm run dev -- train stripe-refunds

Mission

Turn experience into expertise for AI agents.

Today: Author courses, train models, graduate with SKILL.md.

Tomorrow: Production feedback loops that generate new scenarios from real failures.

Future: The open knowledge layer for agent expertise, proven, portable, and model-agnostic.

License

MIT
