@cstan0824/loopful

v2.2.0


Loopful — a Claude Code plugin that enforces a disciplined, verifiable engineering workflow with 1 human gate and 2 agent gates.


cstan0824


Loopful

A Claude Code skill that enforces a disciplined, verifiable engineering workflow for multi-step development tasks. The workflow self-improves through reflective analysis after each run.

Repository: github.com/cstan0824/loopful

What It Does

Loopful guides Claude Code through 9 stages (0–8) with 1 human gate and 2 agent gates:

Preference & Project Setup — auto-detect project context, initialize preferences
Progressive Context Discovery — scan codebase at appropriate depth
Clarification & Goal Setting — ask clarifying questions, define success criteria (HUMAN GATE)
Planning — decompose tasks, identify risks, separate parallel from sequential work
Execution Loop — small reversible steps with retry logic
Verifier Gate — agent reviews code against stated goals
Validation Gate — agent runs behavioral + visual tests (Playwright, screenshots)
Review & Report — timestamped reports with full audit trail
Learnings & Self-Improvement — save knowledge, perform reflective analysis, discover tools

Installation

Via npx (recommended)

Run inside your project directory:

npx @cstan0824/loopful@latest install

This copies the skill files into your project:

.claude/skills/loopful/SKILL.md — the main skill definition
.claude/skills/loopful/{build,fix,improve,explore,setup}/SKILL.md — mode-specific instructions
.claude/skills/loopful/reference/ — supporting reference docs
.loopful/ — example artifacts (preferences, goals, learnings, state, reports)
CLAUDE.md.example — example project rules to merge into your CLAUDE.md

Manual

From a clone of the repository, copy the .claude/ directory into your project root (this merges with an existing .claude/ directory if there is one):

cp -r .claude /path/to/your-project/

Optionally merge CLAUDE.md.example rules into your project's CLAUDE.md.

How Claude Code Skills Work

Claude Code discovers skills from the .claude/skills/ directory in your project. Each skill is a folder containing a SKILL.md file with YAML frontmatter:

---
name: loopful
version: 2.0.12
description: Staged engineering workflow for Claude Code tasks.
---

Modes are subdirectories with their own SKILL.md:

.claude/skills/loopful/
  SKILL.md              # Main skill (overview)
  build/SKILL.md        # /loopful:build
  fix/SKILL.md          # /loopful:fix
  improve/SKILL.md      # /loopful:improve
  explore/SKILL.md      # /loopful:explore
  setup/SKILL.md        # /loopful:setup
  reference/            # Supporting docs

Once installed, the skill is available as slash commands in Claude Code:

/loopful:build <description>
/loopful:fix <description>
/loopful:improve <description>
/loopful:explore <description>
/loopful:setup

To verify the skill is installed, open Claude Code in your project and type /loopful — it should show all 5 modes in the autocomplete list.
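The same check can be scripted. This is a minimal sketch that assumes the directory layout shown above; `missing_skill_files` is a hypothetical helper and is not shipped with Loopful.

```python
from pathlib import Path

# Modes listed in this README; each is expected to have its own SKILL.md.
MODES = ("build", "fix", "improve", "explore", "setup")


def missing_skill_files(project_root: str) -> list[str]:
    """Return the expected SKILL.md paths that are absent under project_root."""
    root = Path(project_root)
    skill_dir = root / ".claude" / "skills" / "loopful"
    expected = [skill_dir / "SKILL.md"]
    expected += [skill_dir / mode / "SKILL.md" for mode in MODES]
    # Report paths relative to the project root, with forward slashes.
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

An empty list means the main skill file and all five mode files are in place; it says nothing about whether Claude Code has picked them up.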

To install globally (available in all projects), copy the skill to your home directory:

mkdir -p ~/.claude/skills
cp -r .claude/skills/loopful ~/.claude/skills/

To update, re-run npx @cstan0824/loopful@latest install or manually replace the files in .claude/skills/loopful/.

Usage

/loopful:build add user authentication with JWT
/loopful:fix login endpoint returns 500 on invalid email
/loopful:improve refactor database layer for testability
/loopful:explore how does the caching system work
/loopful:setup

Supported Modes

| Mode | Purpose |
|------|---------|
| build | Create new feature or module |
| fix | Debug and repair an issue |
| improve | Refactor, optimize, or enhance existing code |
| explore | Understand and explain how a project works |
| setup | Initialize Loopful preferences |

Setup Flow

No upfront questionnaire. Setup happens contextually at each stage when decisions matter.

/loopful:setup auto-detects project context (language, framework, tests, code style, deployment) and creates .loopful/preferences.md with sensible defaults.

Then, as the workflow runs, each stage can ask one focused question when the decision is important:

| Stage | Question | Options |
|-------|----------|---------|
| Discovery (S1) | How deep to scan? | quick / standard / deep / auto |
| Planning (S3) | How to approach the plan? | minimal / balanced / thorough / auto |
| Validation (S6) | How thorough to test? | simple / standard / full / custom / auto |
| Report (S7) | How detailed the report? | brief / standard / detailed / auto |

Rules:

Auto-decide by default — only ask when the decision is ambiguous or high-impact
Ask once per stage per run — not a quiz
User can answer, skip, or type "auto" to let the workflow decide
Answers are saved to preferences.md — future runs use the saved answer
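The rules above can be read as a small decision procedure. The sketch below is one possible reading, not Loopful's actual logic: `resolve_option`, the `saved` dictionary standing in for preferences.md, and the `"standard"` default are all assumptions made for illustration.

```python
from typing import Callable, Optional


def resolve_option(
    stage: str,
    saved: dict[str, str],
    ambiguous: bool,
    ask: Callable[[str], Optional[str]],
    auto_choice: str = "standard",
) -> str:
    """Pick the option for one stage: saved answer, auto-decision, or one question."""
    if stage in saved:
        return saved[stage]      # a saved answer from an earlier run wins
    if not ambiguous:
        return auto_choice       # auto-decide by default
    answer = ask(stage)          # called at most once per stage per run
    if answer is None or answer == "auto":
        return auto_choice       # user skipped or typed "auto"
    saved[stage] = answer        # remember the answer for future runs
    return answer
```

Whether a typed "auto" is itself persisted is not stated in this README; the sketch assumes it is not.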

Architecture Overview

Gate Model

Loopful uses a 1 human gate + 2 agent gates model to minimize interruptions while maintaining quality:

Clarification + Goal   ← HUMAN GATE (only one)
        │
        ▼
    Planning            ← auto-proceed
        │
        ▼
    Execution           ← auto-proceed
        │
        ▼
  ┌─────────────────┐
  │  Verifier Gate   │──── PASS ──────────────────────┐
  │  (agent)         │                                 │
  └────────┬────────┘                                 │
           │ FAIL                                      │
           ▼                                           │
     Fix + Retry (max 3)                               │
           │                                           │
           ├─ PASS ────────────────────────────────────┤
           │                                           │
           └─ FAIL ×3 → ESCALATE TO HUMAN             │
                                                       ▼
                                          ┌─────────────────┐
                                          │  Validation Gate │
                                          │  (agent)         │
                                          │  • Behavioral    │
                                          │  • Visual        │
                                          │  • Edge cases    │
                                          └────────┬────────┘
                                                   │
                                            PASS ──┤── FAIL
                                                   │   │
                                                   │   ▼
                                                   │  Fix + Retry (max 3)
                                                   │   │
                                                   │   ├─ PASS
                                                   │   │
                                                   │   └─ FAIL ×3 → ESCALATE
                                                   ▼
                                              Report (auto)

| Gate | Type | What It Checks |
|------|------|----------------|
| Clarification + Goal | Human | Requirements are clear, goal is correct |
| Verifier Gate | Agent | Code matches stated goals, no missing features |
| Validation Gate | Agent | App actually works (behavioral), looks right (visual), handles edge cases |

Why only 1 human gate? The clarification stage is where user intent matters most. After that, the goal IS the contract — everything else is execution and automated quality checks. Agent gates catch problems without slowing the user down.

When to add more verifier gates:

Simple projects (calculator): 1 verifier + 1 validation after execution
Medium projects (multi-component): +1 verifier after planning
Complex projects (architecture decisions): +verifier gates at key milestones

Gate Failure Handling

Each agent gate follows a fix → retry → escalate loop with a maximum of 3 attempts.

On failure, the gate returns structured feedback:

What it checked and what it expected
What it actually found (with evidence)
Which goal criterion failed

The main agent uses this feedback to make a targeted fix — not a blind retry. Each attempt should address the specific failures reported.

After 3 failed attempts, the gate escalates to the human with full context:

The original goal and milestones
What the verifier/validation found each attempt
What the main agent tried and why it failed
Actionable options (simplify goal, try different approach, manual fix)

Why 3 retries?

A limit of 1 is too aggressive: the failure might only need a simple fix
A limit of 2 is reasonable for simple bugs
A limit of 3 is the cutoff: if the gate still fails, the problem is likely architectural or the goal is unrealistic
No limit wastes tokens and could loop forever on an impossible requirement
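The fix → retry → escalate loop can be sketched in a few lines. This is an illustration under stated assumptions, not Loopful's implementation: `GateResult`, `run_gate`, and the `check`/`fix` callables are hypothetical names, and the sketch assumes no further fix is attempted once the third check has failed.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 3  # the limit described in this section


@dataclass
class GateResult:
    passed: bool
    failures: list[str] = field(default_factory=list)  # structured feedback


def run_gate(
    check: Callable[[], GateResult],
    fix: Callable[[list[str]], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> tuple[str, list[list[str]]]:
    """Run a gate up to max_attempts times; return the status and failure history."""
    history: list[list[str]] = []
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result.passed:
            return "PASS", history
        history.append(result.failures)
        if attempt < max_attempts:
            fix(result.failures)  # targeted fix driven by the reported failures
    return "ESCALATE", history    # hand the full history to the human
```

Keeping the per-attempt failures in `history` is what lets the escalation message show what was found on each attempt.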

See Gate Failure Detail for the full escalation protocol.

Workflow Stages

Stage 0: Preference & Project Setup

Auto-detects project context (language, framework, test tool, code style, deployment, structure). Only runs on /loopful:setup or when no preferences exist.

Stage 1: Progressive Context Discovery

quick — scan directly relevant files only
standard — scan repo structure, modules, tests
deep — full architecture briefing and risk analysis

Reads .loopful/preferences.md including workflow adjustments from past runs.

Stage 2: Clarification & Goal Setting (HUMAN GATE)

The only human approval gate in the workflow. Two parts in one stage:

Clarification — asks up to 3 questions with default assumptions. User can answer, approve assumptions, or type skip.
Goal Setting — creates .loopful/goals.md with success criteria (each marked testable, manual-check, or unknown) and milestones. Presented to the user for a single approval.

After the user approves the goal, the workflow proceeds autonomously through all remaining stages.

Stage 3: Planning (auto)

Decomposes into ordered sub-tasks. Identifies what can run in parallel (analysis, docs review) vs what must be sequential (coupled edits, migrations). No human approval required — auto-proceeds after goal is set.

Stage 4: Execution Loop (auto)

Small reversible steps. Retries failed sub-tasks up to 3 times. Stops and marks blocked if a required milestone fails.

Stage 5: Verifier Gate (AGENT)

A separate agent reviews the implementation against the stated goals:

All milestones implemented?
Code matches stated requirements?
No obvious bugs or missing features?
Accessibility requirements met?

On PASS → continue to validation. On FAIL → returns structured feedback (what failed, expected vs actual, which goal criterion). Main agent fixes and re-verifies. Max 3 retries before escalating to human.

Stage 6: Validation Gate (AGENT)

Automated behavioral and visual validation using Playwright:

Behavioral — open app, fill forms, click buttons, verify outputs
Visual — screenshots reviewed by agent (font size, layout, contrast, mobile-friendliness)
Edge cases — invalid inputs, boundary conditions, error handling

On PASS → continue to report. On FAIL → returns structured feedback (what test failed, screenshots, expected behavior). Main agent fixes and re-validates. Max 3 retries before escalating to human.

Stage 7: Review & Report (auto)

Generates .loopful/reports/<YYYY-MM-DD-HHmm>-<mode>-<task-slug>.md and updates latest.md.
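As an illustration of the naming pattern, the sketch below builds such a path. The timestamp format follows the `<YYYY-MM-DD-HHmm>` pattern above; the slug rule (lowercase, runs of non-alphanumeric characters collapsed to hyphens) and the `report_path` helper are assumptions, since this README does not define how the task slug is derived.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def report_path(mode: str, task: str, now: datetime) -> str:
    """Build a report path of the form <YYYY-MM-DD-HHmm>-<mode>-<task-slug>.md."""
    stamp = now.strftime("%Y-%m-%d-%H%M")
    # Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return str(PurePosixPath(".loopful/reports") / f"{stamp}-{mode}-{slug}.md")
```

Because the timestamp is part of the name, two runs of the same task produce different files, which matches the "never overwritten" behavior of per-run reports.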

Stage 8: Learnings & Self-Improvement (auto)

Saves project knowledge and performs reflective analysis. The stages are fixed — what happens within each stage can be fine-tuned.

See Self-Improvement below.

Gate Failure Detail

Verifier Gate — Failure Protocol

The verifier agent is a separate agent from the builder. It receives the goals as a checklist and reviews evidence (code, file list, behavior descriptions).

Verifier output format (on failure):

VERIFIER RESULT: FAIL (attempt 1/3)

Checked against goal: "Shopping calculator with discounts"
Milestones: 6/6 marked complete

FAILURES:
  [M4] Discount calculation
    Expected: 10% off RM50 = RM45.00
    Actual:   10% off RM50 = RM5.00
    Evidence: app.js line 87 — formula computes the discount amount
              (subtotal * discount/100) but never subtracts it from the subtotal

  [M2] Font size
    Expected: 20px minimum
    Actual:   16px on item list
    Evidence: style.css line 142 — .item-row sets font-size: 16px

SUGGESTED FIX:
  - M4: Change formula to `subtotal * (1 - discount/100)`
  - M2: Change .item-row font-size to 20px or inherit from body

Main agent receives this feedback and must:

Address each failure specifically (not rewrite everything)
Explain what it changed and why
Re-submit for verification
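The [M4] numbers in the sample output can be checked by hand. The expected and actual values are consistent with a formula that returns the amount taken off rather than the reduced total; the sketch below shows both side by side. The function names are hypothetical, and the fixture's real code lives in app.js, not in Python.

```python
def discount_amount(subtotal: float, discount_pct: float) -> float:
    """Amount taken off. Returning this as the total gives the reported 'Actual'."""
    return round(subtotal * discount_pct / 100, 2)


def discounted_total(subtotal: float, discount_pct: float) -> float:
    """Suggested fix from the verifier output: subtotal * (1 - discount/100)."""
    return round(subtotal * (1 - discount_pct / 100), 2)
```

For a subtotal of RM50 and a 10% discount, the first function gives 5.0 (the reported actual value) and the second gives 45.0 (the expected value).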

After 3 failed attempts → escalate to human:

⚠️ VERIFIER GATE FAILED (3/3)

Goal: "Shopping calculator with discounts for elderly"
Milestones: 6/6 marked complete

Attempt 1: Discount formula wrong (divided instead of multiplied)
  → Fixed formula
Attempt 2: Fixed formula but broke percentage toggle
  → Rewrote toggle logic
Attempt 3: Toggle works but discount still wrong for edge case (0%)
  → Added guard clause, still fails

Root cause: The discount calculation has conflicting requirements
  (percentage vs fixed amount) that the current architecture can't
  handle cleanly.

YOUR OPTIONS:
  1. Simplify — remove percentage discount, keep fixed amount only
  2. Redesign — let me restructure the calculation module
  3. Manual — here's exactly what's wrong, you fix it

Validation Gate — Failure Protocol

The validation agent uses Playwright to run the app and test it. It's a different agent from both the builder and the verifier.

Validation output format (on failure):

VALIDATION RESULT: FAIL (attempt 1/3)

BEHAVIORAL TESTS:
  ✓ Add item "Rice, qty 2, price RM5.00" — appears in list
  ✓ Grand total updates to RM10.00
  ✗ Add item with 10% discount — total shows RM9.00 (correct), but the
    savings display shows "Saved: RM1.00" when it should say "RM1.00 off"
  ✗ Remove item — button works but total doesn't update
    Steps: add item → click ✕ → total still shows old value

VISUAL TESTS:
  ✓ Font size ≥ 20px on all text
  ✓ Buttons ≥ 56px tap target
  ✗ Contrast ratio on discount text: 3.8:1 (needs 4.5:1)
    Screenshot: [attached]

EDGE CASES:
  ✗ Quantity = 0 — app adds item with RM0.00 total (should reject)
  ✓ Negative price — shows validation error (correct)
  ✓ Discount > 100% — shows warning (correct)

FAILURES: 2 behavioral, 1 visual, 1 edge case

Main agent receives this feedback and must:

Fix each specific failure (with line numbers if available)
Re-submit for validation
Validation re-runs the same test suite

After 3 failed attempts → escalate to human:

⚠️ VALIDATION GATE FAILED (3/3)

Attempt 1: 4 failures (discount, remove, contrast, zero quantity)
  → Fixed remove handler, contrast, input validation
Attempt 2: 2 failures (discount calculation still wrong, savings display)
  → Rewrote discount module
Attempt 3: 1 failure (discount works but savings text is wrong for fixed discounts)
  → Can't fix without changing the display logic which affects other features

Screenshots of current state: [attached]

YOUR OPTIONS:
  1. Accept — savings text is minor, rest works
  2. Simplify — remove savings display entirely
  3. Manual — here's the exact code that needs changing

Escalation Protocol

When a gate fails 3 times, the workflow:

Stops — does not proceed to the next stage
Collects — all 3 attempts with feedback and fixes tried
Presents — to the human with clear options
Waits — for human decision before continuing

Human options on escalation:

Simplify the goal — remove the failing requirement, continue with adjusted scope
Try a different approach — agent suggests an alternative architecture
Manual fix — human fixes the code directly, agent re-verifies
Abort — stop the workflow, report what was completed

Key principle: Escalation is not failure. It's the system recognizing that the problem exceeds autonomous capability and asking for human judgment. The human gets full context to make an informed decision.

Approval Gates

Claude auto-advances through safe steps (reading, analyzing, planning, small edits, testing, reporting).

Claude MUST ask before:

Deleting files
Modifying .env, secrets, credentials, or production config
Installing new packages not already declared in the project
Database migrations or schema changes
Auth/security architecture decisions
Deployments
External account/API/billing changes
Large architectural rewrites
Touching protected files listed in preferences
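One way to picture this rule set is as a predicate evaluated before each action. The sketch below is purely illustrative: the action names, the filename patterns, and `must_ask` are assumptions, and the real skill expresses these rules as instructions to Claude rather than as code.

```python
from fnmatch import fnmatch

# Hypothetical action labels for the categories listed above.
ALWAYS_ASK = {
    "delete_file", "install_package", "db_migration", "auth_design",
    "deploy", "external_account_change", "large_rewrite",
}
# Assumed filename patterns for secrets and credentials.
SENSITIVE_PATTERNS = (".env", ".env.*", "*secret*", "*credential*")


def must_ask(action: str, path: str = "", protected: tuple[str, ...] = ()) -> bool:
    """Return True when the action needs explicit human approval."""
    if action in ALWAYS_ASK:
        return True
    name = path.rsplit("/", 1)[-1]  # file name without its directory
    patterns = SENSITIVE_PATTERNS + tuple(protected)
    return any(fnmatch(name, pat) or fnmatch(path, pat) for pat in patterns)
```

The `protected` argument stands in for the protected-file list kept in preferences; an ordinary edit to an unlisted source file returns False and auto-advances.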

`.loopful/` Artifacts

| File | Purpose | Git |
|------|---------|-----|
| preferences.md | Project context + workflow adjustments | Track |
| goals.md | Active goal with success criteria and milestones | Track |
| learnings.md | Accumulated project knowledge (capped) | Track |
| state.json | Machine-readable workflow state (ephemeral) | Ignore |
| project-briefing.md | Architecture and context briefing | Track |
| reports/<timestamp>-<mode>-<slug>.md | Per-run reports (never overwritten) | Track |
| reports/latest.md | Most recent report (overwritten each run) | Track |

Note: state.json is ephemeral per-run state. It is listed in .gitignore by default.

Self-Improvement

The workflow improves itself after each run through reflective analysis:

Self-Question — "Why did this happen?" (third-person perspective)
Root Cause Chain — Follow "why" to the deepest cause
Identify Tuning — Which stage, which tuning point within it
Write Adjustment — Full reasoning chain saved to preferences.md
Validate — Evidence-backed? Root cause? Prevents recurrence?

Adjustments accumulate in preferences.md and are applied to future runs, so each run starts from the lessons recorded by earlier ones.

See .loopful/preferences.md.example for the adjustment format.

Tool & Skill Discovery

When verification is insufficient (defects leak past tests), the workflow:

Scans what tools are already installed (project deps, Claude skills, MCP servers, CLI tools)
Identifies the gap between failure type and available tools
Activates existing tools or suggests setting up new ones

Discovered tools are saved to preferences.md and used automatically in future runs.

MCP Discovery (Dynamic)

The skill does NOT require pre-configured MCP servers. It discovers what's available at runtime:

Uses ToolSearch to find all available MCP tools in the environment
Maps capabilities to needs (e.g., "need database verification" → find any DB MCP)
Uses whatever is available — works with any MCP server (AWS, GCP, PostgreSQL, MongoDB, Datadog, Sentry, etc.)
Falls back to local tools when no MCP server is available
Suggests MCP servers to install when a capability gap is found

No pre-configuration needed. The skill adapts to your environment.
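The capability-to-tool mapping with a local fallback might look like the sketch below. It is a simplified model only: `pick_tool`, the capability tags, and the tool names in the usage note are hypothetical, and the skill's real discovery goes through ToolSearch at runtime rather than a static dictionary.

```python
from typing import Optional


def pick_tool(
    need: str,
    mcp_tools: dict[str, set[str]],
    local_tools: dict[str, set[str]],
) -> tuple[Optional[str], str]:
    """Return (tool name, source) for a capability, preferring MCP over local tools."""
    for name, capabilities in mcp_tools.items():
        if need in capabilities:
            return name, "mcp"
    for name, capabilities in local_tools.items():
        if need in capabilities:
            return name, "local"  # fall back to local tooling
    return None, "gap"            # nothing fits: suggest installing a server
```

For example, with `mcp_tools={"postgres-mcp": {"database"}}` a need of `"database"` resolves to the MCP tool, while a need that nothing provides comes back as a gap.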

Recovery & Resume

If a run is interrupted, the next run reads state.json to understand where it left off:

state.json tracks current stage, milestone progress, and verification status
The next run can resume from the last completed stage rather than starting over
Goal state in goals.md persists independently — milestones marked complete stay complete
If state.json is missing or stale (no active run), the workflow starts fresh

To manually resume a failed run, invoke the same mode again — the workflow will detect the existing state and pick up where it left off.
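The resume decision can be sketched as follows. The schema of state.json is not documented in this README, so the `active` and `last_completed_stage` fields, like the `next_stage` helper itself, are assumptions made for illustration.

```python
import json
from pathlib import Path

LAST_STAGE = 8  # stages are numbered 0 to 8


def next_stage(state_file: str) -> int:
    """Return the stage to resume from, or 0 to start fresh."""
    path = Path(state_file)
    if not path.is_file():
        return 0                                  # missing state: start fresh
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return 0                                  # unreadable state: start fresh
    if not isinstance(state, dict) or not state.get("active"):
        return 0                                  # stale state, no active run
    last = state.get("last_completed_stage")      # hypothetical field name
    if isinstance(last, bool) or not isinstance(last, int):
        return 0
    if not 0 <= last < LAST_STAGE:
        return 0                                  # finished or invalid
    return last + 1                               # resume after the last completed stage
```

Milestone status is deliberately left out here, since it persists separately in goals.md.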

Goals System

Each task produces a goal with:

Success criteria — what "done" looks like, each marked testable, manual-check, or unknown
Milestones — ordered checkpoints
Test strategy — auto-detected from project type
Progress tracking — updated as milestones complete

A goal is not complete until all required criteria are verified.
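The completion rule can be expressed as a small data structure. This is a sketch under assumptions: the `Criterion` fields and `goal_complete` are hypothetical and do not reflect the actual layout of goals.md, which is a markdown file.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    text: str
    kind: str               # "testable", "manual-check", or "unknown"
    required: bool = True   # assumed flag for required vs optional criteria
    verified: bool = False


def goal_complete(criteria: list[Criterion]) -> bool:
    """A goal is complete only when every required criterion is verified."""
    return all(c.verified for c in criteria if c.required)
```

An unverified optional criterion does not block completion, but a single unverified required one does, regardless of how it is marked.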

Learnings System

Learnings accumulate across runs and include:

Confirmed Facts — verified project-specific knowledge
Useful Patterns — conventions and approaches that worked
Failed Assumptions — what was wrong (prevents repeated mistakes)
Outcomes — what happened and why

Size cap: Learnings file is capped at ~500 lines. When it exceeds this, older entries are archived to .loopful/learnings-archive-<date>.md and the active file is trimmed to recent entries. Max 10 new entries per run. No secrets or credentials are saved.
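The size cap amounts to splitting the file into an archived head and a retained tail. The sketch below assumes entries are appended chronologically and trims by line count; the `keep` value and the `trim_learnings` helper are hypothetical, and a real implementation would presumably cut on entry boundaries rather than raw lines.

```python
def trim_learnings(
    lines: list[str], cap: int = 500, keep: int = 400
) -> tuple[list[str], list[str]]:
    """Return (active, archived) lines; archive only when the cap is exceeded."""
    if len(lines) <= cap:
        return lines, []                  # under the cap: nothing to archive
    return lines[-keep:], lines[:-keep]   # keep the most recent lines
```

The archived lines would then be written to a dated learnings-archive file and the active list written back to learnings.md.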

Report Format

Each report includes: mode, depth, task, date, context summary, assumptions, goal status, milestone completion, files changed, execution summary, retries, skipped/blocked work, verification results, success criteria status, learnings, remaining risks, and recommended next step.

Testing

Validate the skill package:

python scripts/run_all_checks.py

Or run individual checks:

python scripts/validate_skill_structure.py
python scripts/validate_skill_content.py
python scripts/validate_examples.py
pytest tests/ -v

Fixture Project (dogfood testing)

The examples/fixture-project/ directory is a minimal calculator for testing the skill:

cd examples/fixture-project
npm install
npm test

Use it to verify the skill can guide Claude through: discovery → clarification & goal → planning → execution → verifier → validation → report → learnings.

Limitations (MVP)

Single orchestrator skill — no sub-skills yet
No shell hooks
No CI config
MCP discovery is dynamic, but there are no pre-built MCP integrations; the skill uses whatever is available

Roadmap

Sub-skills for specialized workflows (testing, security review, migration)
Shell hooks for file protection and verification enforcement
CI integration for automated skill validation
MCP server for cross-tool orchestration