claude-harness

v1.0.0

Published

3 months ago

Battle-tested Claude Code scaffold. Same model, 2x output.

0High
0Medium
0Low

claude claude-code harness scaffold ai agent developer-tools productivity context-management memory hooks skills self-improving

The thesis: The same model with a different scaffold produces dramatically different results. Jeff Clune's Darwin Godel Machine achieved 20% to 50% on SWE-bench by improving the scaffold alone — same model, same weights. Anthropic's own harness engineering shows 10-22 point improvements from scaffold changes. The harness IS the product.

Quick Start
Why This Exists
What's Inside
How It Works
Before and After
Skills — 15 on-demand workflows
Hooks — 6 scripts + 2 inline blockers
Rules — 3 auto-loading rule files
Memory System — Three-tier compaction survival
Auto-Switch — Autonomous session management
Project Structure
Examples
Docs
The Self-Improvement Part
Contributing
License

Quick Start

npx claude-harness init

Asks your project type (web app, CLI tool, agent project). Copies a tuned .claude/ directory into your repo. 60 seconds.

claude   # Hooks activate automatically. Context loads. Memory persists.

Verify your setup:

bash scripts/verify.sh

Requirements: Claude Code CLI, git, bash. macOS and Linux.

Why This Exists

Claude Code users spend ~20% of every session re-establishing context. Architecture, coding standards, past debugging decisions — all lost to compaction.

claude-harness is 100+ sessions of those scattered improvements packaged into something anyone can install in 60 seconds.

What's Inside

| Component | Count | What You Get | | ------------- | -------------------- | --------------------------------------------------------------- | | Skills | 15 | On-demand workflows (~70 tokens each at startup) | | Hooks | 6 scripts + 2 inline | Lifecycle automation (context load, memory flush, quality gate) | | Rules | 3 | Auto-loading context for TypeScript, code quality, docs | | Templates | 3 | CLAUDE.md, MEMORY.md, LEARNINGS.md starting points | | Scripts | 2 | Setup verification + autonomous session management | | Examples | 3 | Per-project-type configs (web app, CLI, agent) |

How It Works

The Core Loop

Session starts ──► SessionStart hook injects context, memory, learnings, git status
       │
       ▼
   You work ──► Rules auto-load when you edit matching files (.ts → TypeScript rules)
       │
       ▼
 Context fills ──► PreCompact hook saves anchor state + backup to disk
       │
       ▼
After compaction ──► PostCompact hook re-injects what was saved
       │
       ▼
 Session ends ──► SessionEnd hook writes summary; Stop hook warns about uncommitted work
       │
       ▼
 Next session ──► SessionStart reads yesterday's summary. Continuity without effort.

The Feature That Changed Everything

The pre-compact memory flush. Claude Code fires a PreCompact hook right before compressing your context. A 30-line bash script running at that exact moment saves whatever you need to disk:

#!/bin/bash
set -euo pipefail
PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(pwd)}"
DAILY_FILE="$PROJECT_DIR/memory/daily/$(date '+%Y-%m-%d').md"
mkdir -p "$(dirname "$DAILY_FILE")"

echo "### $(date '+%H:%M') -- Pre-compaction flush" >> "$DAILY_FILE"
cat > "$PROJECT_DIR/.claude/anchor-state.md" << EOF
Active task: $(cat "$PROJECT_DIR/.claude/current-task.md" 2>/dev/null || echo "none")
Recent files: $(git -C "$PROJECT_DIR" diff --name-only HEAD~3 2>/dev/null | head -10)
EOF

Then SessionStart reads anchor-state.md back in after compaction. Context survives what would otherwise be permanent amnesia.

Before and After

| | Before (Raw Claude Code) | After (With Harness) | | ----------------------- | ----------------------------------------------- | --------------------------------------------- | | Monday morning | Re-explain architecture, standards, recent work | Session loads everything automatically | | After compaction | Lost context, re-debug solved issues | Memory flushed pre-compaction, restored after | | Same bug twice | Fix it again from scratch | LEARNINGS.md consulted first — fix referenced | | Forgot to commit | Discover at 2am, scramble | Stop hook warns before exit | | rm -rf accident | Panic | PreToolUse hook blocks it | | New team member | 2-hour onboarding walkthrough | Clone repo, context loads automatically | | Context degradation | Session quality silently drops | Monitor warns, auto-switch restarts fresh |

Skills

15 on-demand workflows. Each is a markdown file with trigger conditions and step-by-step instructions. Claude reads them when needed (~70 tokens of metadata at startup).

Meta (Self-Improvement)

| Skill | What It Does | | --------------------------------------------------- | -------------------------------------------------------- | | /self-improve | Analyze past sessions for patterns, auto-update harness | | /deep-think | Socratic questioning, self-critique, adversarial debate | | /skill-creator | Auto-generate new skills from discovered patterns | | /harness-review | Audit scaffold for quality and token efficiency | | /skill-routing | Progressive disclosure architecture for large skill sets |

Planning & Architecture

| Skill | What It Does | | ------------------------------------------------- | -------------------------------------------------------- | | /architect | Structured trade-off analysis for architecture decisions | | /sprint | Break goals into 2-hour chunks with dependency mapping | | /model-routing | Auto-select the right model tier for each task |

Operations

| Skill | What It Does | | ------------------------------------------------------------- | -------------------------------------------- | | /troubleshoot | 6-level error recovery ladder | | /validate | Check harness setup is internally consistent | | /recover | Recovery protocol when things go wrong | | /handoff | Write session continuity document | | /memory-compression | Memory management across compactions | | /integrate-resources | Process new resources and extract insights | | /loop-integration | Session-scoped cron jobs for monitoring |

Hooks

6 lifecycle scripts + 2 inline blockers. Configured in .claude/settings.local.json.

| Hook | Event | What It Does | | ------------------------------------------------------------------ | ---------------------- | ---------------------------------------------------------------- | | session-start-context.sh | SessionStart | Loads CLAUDE.md, memory, learnings, git status, skills inventory | | context-monitor.sh | SessionStart | Health checks: stale plans, memory bloat, uncommitted work | | pre-compact-memory-flush.sh | PreCompact | Saves anchor state + backup before context compression | | post-compact-restore.sh | SessionStart (compact) | Re-injects context after auto-compaction | | stop-verify.sh | Stop | Quality gate: warns about uncommitted work | | session-end-log.sh | SessionEnd | Logs session summary to daily memory |

Inline hooks (no script file, configured in settings.local.json):

Destructive Command Blocker PreToolUse: Bash — blocks rm -rf /, sudo rm -rf, dd if=/dev, etc.
Protected File Blocker PreToolUse: Write|Edit — prevents accidental edits to lockfiles and .env

Rules

3 auto-loading rule files. Claude loads the relevant rule when you edit matching file types.

| Rule | Glob Pattern | What It Enforces | | ------------------------------------------ | -------------------------------- | --------------------------------------------------------- | | typescript.md | *.ts, *.tsx | Strict TS, no any, Zod at boundaries, Result pattern | | code-quality.md | *.ts, *.tsx, *.js, *.jsx | Functional components, Tailwind, accessibility, dark mode | | docs.md | *.md, docs/**/*.md | Never remove content, concrete examples, evidence-backed |

Memory System

Three-tier architecture that survives compaction.

Tier 1: Bootstrap (always loaded)
├── CLAUDE.md              Project config and operating instructions
└── memory/MEMORY.md       Curated long-term decisions (cap: 500 lines)

Tier 2: On-Demand (loaded when needed)
├── .claude/skills/        Workflows loaded by trigger
└── .claude/rules/         Context loaded by file pattern

Tier 3: Recovery (loaded after compaction)
├── .claude/anchor-state.md     What you were doing pre-compaction
├── .claude/backups/            Full snapshots
├── memory/daily/{date}.md      Session journals
└── memory/LEARNINGS.md         Append-only error patterns

Each compaction retains only 20-30% of detail. After 2 compactions, you're working with 4-9% of the original. The memory system ensures critical state always survives.

See docs/MEMORY-SYSTEM.md for the full architecture.

Auto-Switch

scripts/auto-switch.sh wraps Claude Code in a session loop. When context degrades, it captures state and starts a fresh session with full context injection.

# Interactive mode (daytime)
./scripts/auto-switch.sh

# Autonomous overnight mode
./scripts/auto-switch.sh --overnight

# Background execution
tmux new-session -d -s work './scripts/auto-switch.sh --overnight'

Handles: context degradation (immediate restart), rate limits (60-min wait), crashes (30s retry), normal exit (stop). Max 20 sessions per loop.

See docs/AUTO-SWITCH.md for details.

Project Structure

claude-harness/
├── skills/                 # 15 on-demand workflows
│   ├── self-improve/           Analyze sessions, auto-update harness
│   ├── deep-think/             Socratic questioning for hard decisions
│   ├── architect/              Trade-off analysis, second-order effects
│   ├── sprint/                 Break goals into 2-hour focused chunks
│   ├── troubleshoot/           6-level error recovery chain
│   ├── memory-compression/     Manage memory across compactions
│   ├── skill-creator/          Auto-generate new skills from patterns
│   ├── harness-review/         Audit scaffold for efficiency
│   ├── model-routing/          Route tasks to right model tier
│   ├── integrate-resources/    Process new docs into knowledge
│   ├── skill-routing/          Progressive disclosure architecture
│   ├── loop-integration/       Scheduled recurring tasks
│   ├── validate/               Verify setup consistency
│   ├── recover/                Rollback protocol when things break
│   └── handoff/                Session continuity documents
├── hooks/                  # 6 lifecycle scripts
│   ├── session-start-context.sh
│   ├── context-monitor.sh
│   ├── pre-compact-memory-flush.sh
│   ├── post-compact-restore.sh
│   ├── stop-verify.sh
│   └── session-end-log.sh
├── rules/                  # 3 auto-loading rule files
│   ├── typescript.md
│   ├── code-quality.md
│   └── docs.md
├── templates/              # Drop-in config files
│   ├── CLAUDE.md.template
│   ├── settings.local.json
│   └── memory/
│       ├── MEMORY.md.template
│       └── LEARNINGS.md.template
├── scripts/
│   ├── auto-switch.sh          Session loop with auto-restart
│   └── verify.sh               Setup verification
├── examples/               # Per-project-type configs
│   ├── web-app/
│   ├── cli-tool/
│   └── agent-project/
├── docs/
│   ├── HOOKS.md
│   ├── SKILLS.md
│   ├── MEMORY-SYSTEM.md
│   └── AUTO-SWITCH.md
├── bin/
│   └── init.sh                 Interactive installer
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── SECURITY.md
├── LICENSE                     MIT
└── package.json

Examples

Three starter configs included:

| Example | Stack | Best For | | ------------------------------------------- | ---------------------------------------------------- | ------------------------------ | | web-app/ | Next.js, React, Tailwind, shadcn/ui, Supabase, Clerk | Full-stack web applications | | cli-tool/ | Node.js CLI, commander.js, Zod, zero-config | Command-line tools and scripts | | agent-project/ | Claude Agent SDK, structured tool calling, SOUL.md | AI agent development |

The installer asks your project type and copies the matching example as your CLAUDE.md.

Docs

| Document | What It Covers | | ----------------------------------------- | ------------------------------------------------------------------ | | HOOKS.md | Hook system, lifecycle events, environment variables, custom hooks | | SKILLS.md | Skill catalog, how skills work, creating new skills | | MEMORY-SYSTEM.md | Three-tier memory, compaction survival, file formats | | AUTO-SWITCH.md | Autonomous session management, exit detection, continuity |

The Self-Improvement Part

The /self-improve skill reads your past session logs and finds patterns:

Questions Claude asks repeatedly — adds the answer to CLAUDE.md
Rules Claude forgets mid-session — adds to .claude/rules/
Information Claude keeps re-discovering — promotes to bootstrap memory
Backtracking patterns — creates a skill that prevents them

Each improvement is small. But they compound. After 10 sessions, your scaffold knows your project's quirks. After 50 sessions, new team members inherit all that institutional knowledge just by cloning the repo.

The harness improves itself. That's the whole thesis.

Contributing

See CONTRIBUTING.md for guidelines. We welcome:

New skills — reusable workflows
New hooks — lifecycle automation
Bug fixes — especially cross-platform (macOS + Linux)
Better docs — examples, explanations, tutorials
Example configs — project-type-specific setups

License

MIT — use it however you want.