@cstan0824/loopful
v2.2.0
Loopful — a Claude Code plugin that enforces a disciplined, verifiable engineering workflow with 1 human gate and 2 agent gates.
Loopful
A Claude Code skill that enforces a disciplined, verifiable engineering workflow for multi-step development tasks. The workflow self-improves through reflective analysis after each run.
Repository: github.com/cstan0824/loopful
What It Does
Loopful guides Claude Code through 9 stages (0–8) with 1 human gate and 2 agent gates:
- Preference & Project Setup — auto-detect project context, initialize preferences
- Progressive Context Discovery — scan codebase at appropriate depth
- Clarification & Goal Setting — ask clarifying questions, define success criteria (HUMAN GATE)
- Planning — decompose tasks, identify risks, separate parallel from sequential work
- Execution Loop — small reversible steps with retry logic
- Verifier Gate — agent reviews code against stated goals
- Validation Gate — agent runs behavioral + visual tests (Playwright, screenshots)
- Review & Report — timestamped reports with full audit trail
- Learnings & Self-Improvement — save knowledge, perform reflective analysis, discover tools
Installation
Via npx (recommended)
Run inside your project directory:
npx @cstan0824/loopful@latest install
This copies the skill files into your project:
- .claude/skills/loopful/SKILL.md — the main skill definition
- .claude/skills/loopful/{build,fix,improve,explore,setup}/SKILL.md — mode-specific instructions
- .claude/skills/loopful/reference/ — supporting reference docs
- .loopful/ — example artifacts (preferences, goals, learnings, state, reports)
- CLAUDE.md.example — example project rules to merge into your CLAUDE.md
Manual
Copy the .claude/ directory into your project root:
cp -r .claude/ /path/to/your-project/.claude/
Optionally merge CLAUDE.md.example rules into your project's CLAUDE.md.
How Claude Code Skills Work
Claude Code discovers skills from the .claude/skills/ directory in your project. Each skill is a folder containing a SKILL.md file with YAML frontmatter:
---
name: loopful
version: 2.0.12
description: Staged engineering workflow for Claude Code tasks.
---
Modes are subdirectories with their own SKILL.md:
.claude/skills/loopful/
SKILL.md # Main skill (overview)
build/SKILL.md # /loopful:build
fix/SKILL.md # /loopful:fix
improve/SKILL.md # /loopful:improve
explore/SKILL.md # /loopful:explore
setup/SKILL.md # /loopful:setup
reference/ # Supporting docs
Once installed, the skill is available as slash commands in Claude Code:
/loopful:build <description>
/loopful:fix <description>
/loopful:improve <description>
/loopful:explore <description>
/loopful:setup
To verify the skill is installed, open Claude Code in your project and type /loopful. All 5 modes should appear in the autocomplete list.
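The installed layout can also be checked on disk as a scripted alternative to the autocomplete check. The sketch below is not part of Loopful: `missing_skill_files` is a hypothetical helper, and it assumes only the directory layout listed above.

```python
from pathlib import Path
from typing import List

# Modes listed in this README; each is expected to ship its own SKILL.md.
MODES = ("build", "fix", "improve", "explore", "setup")


def missing_skill_files(project_root: str) -> List[str]:
    """Return the expected SKILL.md paths (relative to the root) that are absent."""
    root = Path(project_root)
    skill_dir = root / ".claude" / "skills" / "loopful"
    expected = [skill_dir / "SKILL.md"]
    expected += [skill_dir / mode / "SKILL.md" for mode in MODES]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

Calling `missing_skill_files(".")` from the project root returns an empty list when the main skill and all five modes are present.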
To install globally (available in all projects), copy the skill to your home directory:
cp -r .claude/skills/loopful ~/.claude/skills/loopful
To update, re-run npx @cstan0824/loopful@latest install or manually replace the files in .claude/skills/loopful/.
Usage
/loopful:build add user authentication with JWT
/loopful:fix login endpoint returns 500 on invalid email
/loopful:improve refactor database layer for testability
/loopful:explore how does the caching system work
/loopful:setup
Supported Modes
| Mode | Purpose |
|------|---------|
| build | Create new feature or module |
| fix | Debug and repair an issue |
| improve | Refactor, optimize, or enhance existing code |
| explore | Understand and explain how a project works |
| setup | Initialize Loopful preferences |
Setup Flow
No upfront questionnaire. Setup happens contextually at each stage when decisions matter.
/loopful:setup auto-detects project context (language, framework, tests, code style, deployment) and creates .loopful/preferences.md with sensible defaults.
Then, as the workflow runs, each stage can ask one focused question when the decision is important:
| Stage | Question | Options |
|-------|----------|---------|
| Discovery (S1) | How deep to scan? | quick / standard / deep / auto |
| Planning (S3) | How to approach the plan? | minimal / balanced / thorough / auto |
| Validation (S6) | How thorough to test? | simple / standard / full / custom / auto |
| Report (S7) | How detailed the report? | brief / standard / detailed / auto |
Rules:
- Auto-decide by default — only ask when the decision is ambiguous or high-impact
- Ask once per stage per run — not a quiz
- User can answer, skip, or type "auto" to let the workflow decide
- Answers are saved to preferences.md — future runs use the saved answer
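The rules above amount to a small precedence order: a fresh answer wins, then a saved answer, then auto. The sketch below illustrates that order. The stage keys and the `resolve_option` helper are hypothetical names, and treating unrecognized input as a skip is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Optional

# Options per stage, copied from the table above.
STAGE_OPTIONS = {
    "discovery": ("quick", "standard", "deep", "auto"),
    "planning": ("minimal", "balanced", "thorough", "auto"),
    "validation": ("simple", "standard", "full", "custom", "auto"),
    "report": ("brief", "standard", "detailed", "auto"),
}


def resolve_option(stage: str, saved: Dict[str, str],
                   answer: Optional[str] = None) -> str:
    """Pick a stage option: new answer first, then saved answer, else auto."""
    if answer is not None:
        choice = answer.strip().lower()
        if choice in STAGE_OPTIONS[stage]:
            saved[stage] = choice  # remembered for future runs
            return choice
        # "skip" or any unrecognized input falls through (sketch assumption).
    return saved.get(stage, "auto")
```

For example, `resolve_option("discovery", prefs, "deep")` stores and returns `"deep"`, and a later call with no answer returns the stored value.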
Architecture Overview
Gate Model
Loopful uses a 1 human gate + 2 agent gates model to minimize interruptions while maintaining quality:
Clarification + Goal ← HUMAN GATE (only one)
│
▼
Planning ← auto-proceed
│
▼
Execution ← auto-proceed
│
▼
┌─────────────────┐
│ Verifier Gate │──── PASS ──────────────────────┐
│ (agent) │ │
└────────┬────────┘ │
│ FAIL │
▼ │
Fix + Retry (max 3) │
│ │
├─ PASS ────────────────────────────────────┤
│ │
└─ FAIL ×3 → ESCALATE TO HUMAN │
▼
┌─────────────────┐
│ Validation Gate │
│ (agent) │
│ • Behavioral │
│ • Visual │
│ • Edge cases │
└────────┬────────┘
│
PASS ──┤── FAIL
│ │
│ ▼
│ Fix + Retry (max 3)
│ │
│ ├─ PASS
│ │
│ └─ FAIL ×3 → ESCALATE
▼
Report (auto)
| Gate | Type | What It Checks |
|------|------|----------------|
| Clarification + Goal | Human | Requirements are clear, goal is correct |
| Verifier Gate | Agent | Code matches stated goals, no missing features |
| Validation Gate | Agent | App actually works (behavioral), looks right (visual), handles edge cases |
Why only 1 human gate? The clarification stage is where user intent matters most. After that, the goal IS the contract — everything else is execution and automated quality checks. Agent gates catch problems without slowing the user down.
When to add more verifier gates:
- Simple projects (calculator): 1 verifier + 1 validation after execution
- Medium projects (multi-component): +1 verifier after planning
- Complex projects (architecture decisions): +verifier gates at key milestones
Gate Failure Handling
Each agent gate follows a fix → retry → escalate loop with a maximum of 3 attempts.
On failure, the gate returns structured feedback:
- What it checked and what it expected
- What it actually found (with evidence)
- Which goal criterion failed
The main agent uses this feedback to make a targeted fix — not a blind retry. Each attempt should address the specific failures reported.
After 3 failed attempts, the gate escalates to the human with full context:
- The original goal and milestones
- What the verifier/validation found each attempt
- What the main agent tried and why it failed
- Actionable options (simplify goal, try different approach, manual fix)
Why 3 retries?
- 1: too aggressive; the failure might only need a simple fix
- 2: reasonable for simple bugs
- 3: if the gate still fails, the problem is likely architectural or the goal is unrealistic
- No limit: wastes tokens and could loop forever on an impossible requirement
See Gate Failure Detail for the full escalation protocol.
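The fix → retry → escalate loop described above can be sketched as follows. This is a minimal illustration, not Loopful's implementation: `GateResult`, `run_gate`, and the returned dictionary shape are hypothetical, and the sketch assumes "max 3" means three gate checks in total, with a targeted fix between checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GateResult:
    passed: bool
    failures: List[str] = field(default_factory=list)  # structured feedback


def run_gate(check: Callable[[], GateResult],
             fix: Callable[[List[str]], None],
             max_attempts: int = 3) -> dict:
    """Run the gate; on failure feed the feedback to fix and retry, then escalate."""
    history = []
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result.passed:
            return {"status": "pass", "attempts": attempt, "history": history}
        history.append({"attempt": attempt, "failures": result.failures})
        if attempt < max_attempts:
            fix(result.failures)  # targeted fix driven by the reported failures
    # Every attempt failed: hand the full history to the human.
    return {"status": "escalate", "attempts": max_attempts, "history": history}
```

Keeping the per-attempt history is what lets the escalation message show what the gate found on each attempt.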
Workflow Stages
Stage 0: Preference & Project Setup
Auto-detects project context (language, framework, test tool, code style, deployment, structure). Only runs on /loopful:setup or when no preferences exist.
Stage 1: Progressive Context Discovery
- quick-fix — scan directly relevant files only
- standard — scan repo structure, modules, tests
- deep — full architecture briefing and risk analysis
Reads .loopful/preferences.md including workflow adjustments from past runs.
Stage 2: Clarification & Goal Setting (HUMAN GATE)
The only human approval gate in the workflow. Two parts in one stage:
- Clarification — asks up to 3 questions with default assumptions. User can answer, approve assumptions, or type skip.
- Goal Setting — creates .loopful/goals.md with success criteria (each marked testable, manual-check, or unknown) and milestones. Presented to the user for a single approval.
After the user approves the goal, the workflow proceeds autonomously through all remaining stages.
Stage 3: Planning (auto)
Decomposes into ordered sub-tasks. Identifies what can run in parallel (analysis, docs review) vs what must be sequential (coupled edits, migrations). No human approval required — auto-proceeds after goal is set.
Stage 4: Execution Loop (auto)
Small reversible steps. Retries failed sub-tasks up to 3 times. Stops and marks blocked if a required milestone fails.
Stage 5: Verifier Gate (AGENT)
A separate agent reviews the implementation against the stated goals:
- All milestones implemented?
- Code matches stated requirements?
- No obvious bugs or missing features?
- Accessibility requirements met?
On PASS → continue to validation. On FAIL → returns structured feedback (what failed, expected vs actual, which goal criterion). Main agent fixes and re-verifies. Max 3 retries before escalating to human.
Stage 6: Validation Gate (AGENT)
Automated behavioral and visual validation using Playwright:
- Behavioral — open app, fill forms, click buttons, verify outputs
- Visual — screenshots reviewed by agent (font size, layout, contrast, mobile-friendliness)
- Edge cases — invalid inputs, boundary conditions, error handling
On PASS → continue to report. On FAIL → returns structured feedback (what test failed, screenshots, expected behavior). Main agent fixes and re-validates. Max 3 retries before escalating to human.
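The README does not say how the visual check measures contrast. Assuming it means the WCAG 2.x contrast ratio (the 4.5:1 figure in the sample validation output later in this README matches the WCAG AA threshold for normal text), the calculation can be sketched like this. The function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    lum = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lum[0] + 0.05) / (lum[1] + 0.05)


def passes_aa(fg: RGB, bg: RGB, threshold: float = 4.5) -> bool:
    """True when the pair meets the AA threshold for normal-size text."""
    return contrast_ratio(fg, bg) >= threshold
```

A validation agent would still need to extract the foreground and background colors from the rendered page. That step is outside this sketch.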
Stage 7: Review & Report (auto)
Generates .loopful/reports/<YYYY-MM-DD-HHmm>-<mode>-<task-slug>.md and updates latest.md.
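A report path in that shape could be built as shown below. This is a sketch under assumptions: `HHmm` is read as 24-hour hours and minutes, and the slug rule (lowercase, hyphen-separated, truncated) is hypothetical, since the README does not define how the task slug is derived.

```python
import re
from datetime import datetime


def slugify(task: str, max_len: int = 40) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim to max_len."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "task"


def report_path(mode: str, task: str, now: datetime) -> str:
    stamp = now.strftime("%Y-%m-%d-%H%M")  # matches <YYYY-MM-DD-HHmm>
    return f".loopful/reports/{stamp}-{mode}-{slugify(task)}.md"


print(report_path("fix", "Login endpoint returns 500!", datetime(2025, 1, 2, 9, 5)))
# .loopful/reports/2025-01-02-0905-fix-login-endpoint-returns-500.md
```

Because the timestamp is part of the name, two runs in different minutes never collide, which fits the statement elsewhere in this README that per-run reports are never overwritten.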
Stage 8: Learnings & Self-Improvement (auto)
Saves project knowledge and performs reflective analysis. The stages are fixed — what happens within each stage can be fine-tuned.
See Self-Improvement below.
Gate Failure Detail
Verifier Gate — Failure Protocol
The verifier agent is a separate agent from the builder. It receives the goals as a checklist and reviews evidence (code, file list, behavior descriptions).
Verifier output format (on failure):
VERIFIER RESULT: FAIL (attempt 1/3)
Checked against goal: "Shopping calculator with discounts"
Milestones: 6/6 marked complete
FAILURES:
[M4] Discount calculation
Expected: 10% off RM50 = RM45.00
Actual: 10% off RM50 = RM5.00
Evidence: app.js line 87 — formula divides by 100 but
multiplies by subtotal instead of unit price
[M2] Font size
Expected: 20px minimum
Actual: 16px on item list
Evidence: style.css line 142 — .item-row sets font-size: 16px
SUGGESTED FIX:
- M4: Change formula to `subtotal * (1 - discount/100)`
- M2: Change .item-row font-size to 20px or inherit from body
Main agent receives this feedback and must:
- Address each failure specifically (not rewrite everything)
- Explain what it changed and why
- Re-submit for verification
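The M4 failure in the sample output can be reproduced with a few lines of arithmetic. The sample's evidence line is ambiguous about the exact bug. One reading consistent with its Expected and Actual values is that the code returned the discount amount instead of the discounted price. Both function names below are illustrative.

```python
def discount_amount(subtotal: float, discount_pct: float) -> float:
    """Amount taken off; returning this as the total yields the 'Actual' value."""
    return subtotal * discount_pct / 100


def discounted_total(subtotal: float, discount_pct: float) -> float:
    """Price after discount, per the suggested fix: subtotal * (1 - discount/100)."""
    return subtotal * (1 - discount_pct / 100)


print(f"RM{discount_amount(50, 10):.2f}")   # RM5.00  (the failing value)
print(f"RM{discounted_total(50, 10):.2f}")  # RM45.00 (the expected value)
```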
After 3 failed attempts → escalate to human:
⚠️ VERIFIER GATE FAILED (3/3)
Goal: "Shopping calculator with discounts for elderly"
Milestones: 6/6 marked complete
Attempt 1: Discount formula wrong (divided instead of multiplied)
→ Fixed formula
Attempt 2: Fixed formula but broke percentage toggle
→ Rewrote toggle logic
Attempt 3: Toggle works but discount still wrong for edge case (0%)
→ Added guard clause, still fails
Root cause: The discount calculation has conflicting requirements
(percentage vs fixed amount) that the current architecture can't
handle cleanly.
YOUR OPTIONS:
1. Simplify — remove percentage discount, keep fixed amount only
2. Redesign — let me restructure the calculation module
3. Manual — here's exactly what's wrong, you fix it
Validation Gate — Failure Protocol
The validation agent uses Playwright to run the app and test it. It's a different agent from both the builder and the verifier.
Validation output format (on failure):
VALIDATION RESULT: FAIL (attempt 1/3)
BEHAVIORAL TESTS:
✓ Add item "Rice, qty 2, price RM5.00" — appears in list
✓ Grand total updates to RM10.00
✗ Add item with 10% discount — total shows RM9.00, expected RM9.00
BUT: savings display shows "Saved: RM1.00" when it should say "RM1.00 off"
✗ Remove item — button works but total doesn't update
Steps: add item → click ✕ → total still shows old value
VISUAL TESTS:
✓ Font size ≥ 20px on all text
✓ Buttons ≥ 56px tap target
✗ Contrast ratio on discount text: 3.8:1 (needs 4.5:1)
Screenshot: [attached]
EDGE CASES:
✗ Quantity = 0 — app adds item with RM0.00 total (should reject)
✓ Negative price — shows validation error (correct)
✓ Discount > 100% — shows warning (correct)
FAILURES: 2 behavioral, 1 visual, 1 edge case
Main agent receives this feedback and must:
- Fix each specific failure (with line numbers if available)
- Re-submit for validation
- Validation re-runs the same test suite
After 3 failed attempts → escalate to human:
⚠️ VALIDATION GATE FAILED (3/3)
Attempt 1: 4 failures (savings text, remove, contrast, zero quantity)
→ Fixed remove handler, contrast, input validation
Attempt 2: 2 failures (discount calculation still wrong, savings display)
→ Rewrote discount module
Attempt 3: 1 failure (discount works but savings text is wrong for fixed discounts)
→ Can't fix without changing the display logic which affects other features
Screenshots of current state: [attached]
YOUR OPTIONS:
1. Accept — savings text is minor, rest works
2. Simplify — remove savings display entirely
3. Manual — here's the exact code that needs changing
Escalation Protocol
When a gate fails 3 times, the workflow:
- Stops — does not proceed to the next stage
- Collects — all 3 attempts with feedback and fixes tried
- Presents — to the human with clear options
- Waits — for human decision before continuing
Human options on escalation:
- Simplify the goal — remove the failing requirement, continue with adjusted scope
- Try a different approach — agent suggests an alternative architecture
- Manual fix — human fixes the code directly, agent re-verifies
- Abort — stop the workflow, report what was completed
Key principle: Escalation is not failure. It's the system recognizing that the problem exceeds autonomous capability and asking for human judgment. The human gets full context to make an informed decision.
Approval Gates
Claude auto-advances through safe steps (reading, analyzing, planning, small edits, testing, reporting).
Claude MUST ask before:
- Deleting files
- Modifying .env, secrets, credentials, or production config
- Installing new packages not already declared in the project
- Database migrations or schema changes
- Auth/security architecture decisions
- Deployments
- External account/API/billing changes
- Large architectural rewrites
- Touching protected files listed in preferences
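The approval list above can be pictured as a simple pre-action check. The sketch below is only an illustration: the action labels, the filename patterns, and `needs_approval` are hypothetical, and Loopful expresses these rules as skill instructions rather than as code.

```python
from fnmatch import fnmatch
from typing import Tuple

# Action kinds from the list above (the labels themselves are hypothetical).
ALWAYS_ASK = {"delete_file", "install_package", "db_migration", "deploy",
              "auth_change", "billing_change", "large_rewrite"}

# Illustrative filename patterns for sensitive files.
SENSITIVE_PATTERNS = (".env", ".env.*", "*secret*", "*credential*")


def needs_approval(action: str, path: str = "",
                   protected: Tuple[str, ...] = ()) -> bool:
    """Return True when the action should pause for a human decision."""
    if action in ALWAYS_ASK:
        return True
    if action != "edit_file":
        return False  # reading, analyzing, testing, reporting auto-advance
    name = path.rsplit("/", 1)[-1]
    patterns = SENSITIVE_PATTERNS + tuple(protected)
    return any(fnmatch(name, p) or fnmatch(path, p) for p in patterns)
```

The `protected` argument stands in for the protected-file list kept in preferences. For example, `needs_approval("edit_file", "docs/README.md", protected=("docs/*",))` returns `True`.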
.loopful/ Artifacts
| File | Purpose | Git |
|------|---------|-----|
| preferences.md | Project context + workflow adjustments | Track |
| goals.md | Active goal with success criteria and milestones | Track |
| learnings.md | Accumulated project knowledge (capped) | Track |
| state.json | Machine-readable workflow state (ephemeral) | Ignore |
| project-briefing.md | Architecture and context briefing | Track |
| reports/<timestamp>-<mode>-<slug>.md | Per-run reports (never overwritten) | Track |
| reports/latest.md | Most recent report (overwritten each run) | Track |
Note: state.json is ephemeral per-run state. It is listed in .gitignore by default.
Self-Improvement
The workflow improves itself after each run through reflective analysis:
- Self-Question — "Why did this happen?" (third-person perspective)
- Root Cause Chain — Follow "why" to the deepest cause
- Identify Tuning — Which stage, which tuning point within it
- Write Adjustment — Full reasoning chain saved to preferences.md
- Validate — Evidence-backed? Root cause? Prevents recurrence?
Adjustments accumulate in preferences.md and are applied to future runs, so each run starts with the lessons recorded by earlier ones.
See .loopful/preferences.md.example for the adjustment format.
Tool & Skill Discovery
When verification is insufficient (defects leak past tests), the workflow:
- Scans what tools are already installed (project deps, Claude skills, MCP servers, CLI tools)
- Identifies the gap between failure type and available tools
- Activates existing tools or suggests setting up new ones
Discovered tools are saved to preferences.md and used automatically in future runs.
MCP Discovery (Dynamic)
The skill does NOT require pre-configured MCP servers. It discovers what's available at runtime:
- Uses ToolSearch to find all available MCP tools in the environment
- Maps capabilities to needs (e.g., "need database verification" → find any DB MCP)
- Uses whatever is available — works with any MCP server (AWS, GCP, PostgreSQL, MongoDB, Datadog, Sentry, etc.)
- Falls back to local tools when no MCP server is available
- Suggests MCP servers to install when a capability gap is found
No pre-configuration needed. The skill adapts to your environment.
Recovery & Resume
If a run is interrupted, the next run reads state.json to understand where it left off:
- state.json tracks current stage, milestone progress, and verification status
- The next run can resume from the last completed stage rather than starting over
- Goal state in goals.md persists independently — milestones marked complete stay complete
- If state.json is missing or stale (no active run), the workflow starts fresh
To manually resume a failed run, invoke the same mode again — the workflow will detect the existing state and pick up where it left off.
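The resume decision described above might look like the following. The README does not document the schema of state.json, so the `active` and `last_completed_stage` fields here are hypothetical, as is the `resume_stage` helper.

```python
import json
from pathlib import Path

LAST_STAGE = 8  # stages are numbered 0 to 8 in this README


def resume_stage(state_path: str) -> int:
    """Return the stage to start from; 0 means start fresh."""
    try:
        state = json.loads(Path(state_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return 0  # missing or unreadable state file
    if not isinstance(state, dict) or not state.get("active", False):
        return 0  # stale: no active run recorded
    last_done = state.get("last_completed_stage")
    if isinstance(last_done, bool) or not isinstance(last_done, int):
        return 0
    if not 0 <= last_done < LAST_STAGE:
        return 0  # out of range, or the previous run already finished
    return last_done + 1  # continue with the stage after the last completed one
```

Any doubt about the state file falls back to a fresh start, matching the "missing or stale" rule above.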
Goals System
Each task produces a goal with:
- Success criteria — what "done" looks like, each marked testable, manual-check, or unknown
- Milestones — ordered checkpoints
- Test strategy — auto-detected from project type
- Progress tracking — updated as milestones complete
A goal is not complete until all required criteria are verified.
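The completion rule can be stated as a one-line predicate over the criteria. In the sketch below the `Criterion` fields are hypothetical. In particular, the `required` flag is inferred from the phrase "all required criteria" and is not a documented field of goals.md.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    text: str
    kind: str              # "testable", "manual-check", or "unknown"
    required: bool = True
    verified: bool = False


def goal_complete(criteria: List[Criterion]) -> bool:
    """A goal is complete only when every required criterion is verified."""
    return all(c.verified for c in criteria if c.required)


def blocking(criteria: List[Criterion]) -> List[str]:
    """Required criteria that still prevent completion."""
    return [c.text for c in criteria if c.required and not c.verified]
```

Note that `goal_complete` returns `True` for a list with no required criteria, so a caller may want to reject empty goals separately.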
Learnings System
Learnings accumulate across runs and include:
- Confirmed Facts — verified project-specific knowledge
- Useful Patterns — conventions and approaches that worked
- Failed Assumptions — what was wrong (prevents repeated mistakes)
- Outcomes — what happened and why
Size cap: Learnings file is capped at ~500 lines. When it exceeds this, older entries are archived to .loopful/learnings-archive-<date>.md and the active file is trimmed to recent entries. Max 10 new entries per run. No secrets or credentials are saved.
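The size cap can be illustrated with a line-based split. This is a simplification: the README trims by entry, and entry boundaries are not specified, so the sketch works on raw lines and assumes entries are appended chronologically. The ISO date format in the archive name is also an assumption.

```python
from datetime import date
from typing import List, Tuple

MAX_LINES = 500  # approximate cap stated in this README


def trim_learnings(lines: List[str],
                   keep: int = MAX_LINES) -> Tuple[List[str], List[str]]:
    """Split lines into (archived, active), keeping the newest `keep` lines."""
    if len(lines) <= keep:
        return [], list(lines)
    cut = len(lines) - keep  # oldest lines come first
    return list(lines[:cut]), list(lines[cut:])


def archive_name(day: date) -> str:
    """Archive path in the learnings-archive-<date>.md shape."""
    return f".loopful/learnings-archive-{day.isoformat()}.md"
```

With 520 lines and the default cap, the oldest 20 lines go to the archive and the newest 500 stay in the active file.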
Report Format
Each report includes: mode, depth, task, date, context summary, assumptions, goal status, milestone completion, files changed, execution summary, retries, skipped/blocked work, verification results, success criteria status, learnings, remaining risks, and recommended next step.
Testing
Validate the skill package:
python scripts/run_all_checks.py
Or run individual checks:
python scripts/validate_skill_structure.py
python scripts/validate_skill_content.py
python scripts/validate_examples.py
pytest tests/ -v
Fixture Project (dogfood testing)
The examples/fixture-project/ directory is a minimal calculator for testing the skill:
cd examples/fixture-project
npm install
npm test
Use it to verify the skill can guide Claude through: discovery → clarification & goal → planning → execution → verifier → validation → report → learnings.
Limitations (MVP)
- Single orchestrator skill — no sub-skills yet
- No shell hooks
- No CI config
- MCP discovery is dynamic but no pre-built MCP integrations — uses whatever is available
Roadmap
- Sub-skills for specialized workflows (testing, security review, migration)
- Shell hooks for file protection and verification enforcement
- CI integration for automated skill validation
- MCP server for cross-tool orchestration
