phi-cc

v1.0.11

Published

2 days ago

Agentic Coding Harness — AI pipeline that plans, implements, tests, and audits code changes

mhdsemps_

ai coding agent harness claude automation code-review ci

Phi — Agentic Coding Harness

Start any coding task with a single command. Phi plans, implements, tests, and audits your code — then hands you a plain-language report of everything that happened.

What Is This?

Phi is a set of shell scripts and AI agent definitions you install into any project with one command. Once installed, you describe what you want:

./phi --grill "Add rate limiting to the login endpoint"
./phi --implement

Phi runs a structured 6-agent pipeline that plans the work (with an interactive grilling phase), writes the code, checks the tests, audits for security, and produces a final report you can read in under a minute.

It does not run on its own. It does not run in the background. It only runs when you ask it to. It never commits or pushes code without your review.

Think of it like a very thorough assistant that plans before it acts, explains what changed and why, tests its own work honestly, and warns you when something is risky.

Usage Guide

Two-Phase Workflow (recommended for complex tasks)

For anything non-trivial, use the grill-then-implement pattern. Phase 1 stress-tests the plan before a single line of code is written. Phase 2 executes the hardened plan.

# Phase 1 — Plan and grill
./phi --grill "Add rate limiting to the login endpoint"
# Phi profiles your project, drafts a plan, then launches an interactive
# grill-with-docs interview. The Griller agent asks you one question at a time:
#   "Should the rate limit apply per-IP or per-account?"
#   "What status code on limit exceeded — 429 or 503?"
#   "Do you want the limit configurable via env var or config file?"
# Each answer crystallizes the plan. When the interview ends, the plan is ready.

# Phase 2 — Implement
./phi --implement
# Reads the hardened plan and runs the automated implementation pipeline:
# Implementer → Test Reviewer → Auditor. No further questions needed.

For simple, well-understood tasks, the single-command pipeline still works:

./phi "add a health-check endpoint at /health"

If the automated grill phase finds a material decision that the codebase cannot answer, Phi stops with NEEDS_USER_INPUT instead of letting the implementer make an assumption. Run ./phi --grill "<task>" to answer the questions interactively, then ./phi --implement.

Prerequisites

Node.js and npm (for npx — ships with Node.js)
Claude CLI — for the full pipeline: npm install -g @anthropic-ai/claude-code

For the shell entrypoint, the --grill and --implement flags require Claude CLI for the interactive interview and automated pipeline. Without Claude CLI, use ./phi "task" — Phi generates a paste-ready prompt for any capable AI tool. In Codex, the installed $phi skill adapts the workflow to Codex-native execution.

Beyond those prerequisites, Phi itself has zero dependencies: no package manager, no runtime install, no imported libraries. It is just the shell scripts and agent definitions.
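A quick preflight check can show which of the modes described above is available on a machine. The sketch below is illustrative and not part of Phi: the helper names `have_cmd` and `phi_mode` are hypothetical, and it only assumes the `node`, `npx`, and `claude` executables named in the prerequisites.

```shell
#!/bin/sh
# Hypothetical preflight helper; not shipped with Phi.

# have_cmd NAME: succeeds when NAME resolves to a command on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# phi_mode: print which workflow the installed tools allow.
phi_mode() {
    if have_cmd claude; then
        echo "full"      # --grill and --implement are usable
    else
        echo "prompt"    # ./phi "task" produces a paste-ready prompt instead
    fi
}

# Report each prerequisite, then the resulting mode.
for tool in node npx claude; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
echo "mode: $(phi_mode)"
```

Running it before `npx phi-cc@latest` makes it clear up front whether the interactive flags will work or whether only the paste-ready prompt path is available.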

Installation

Install Phi into any project with one command:

npx phi-cc@latest

The installer asks two questions:

phi-cc — Agentic Coding Harness Installer
========================================

Which AI coding tool will you use?
  1) Claude Code (recommended)
  2) Codex
  Enter 1-2 [1]:

Install globally (all projects) or locally (this project only)?
  1) Global — installs to your selected runtime config (~/.claude/ or ~/.codex/)
  2) Local  — installs to this project only (/path/to/your/project)
  Enter 1-2 [2]:

No flags needed — just answer two prompts. Prefer scripts? Use flags:

npx phi-cc@latest --claude --local               # Claude Code, this project
npx phi-cc@latest --claude --global              # Claude Code, all projects
npx phi-cc@latest --codex --local ./my-project   # Codex, specific project
npx phi-cc@latest --codex --global               # Codex, all projects

After installing, verify it works:

./phi --dry-run "describe what this project does"

In Codex, restart any already-open Codex session so new skills are discovered, then run:

$phi --dry-run "describe what this project does"

This runs the pipeline in preview mode — it plans but makes zero code changes. Check the report at .agentic-harness/runs/<run-id>/final-summary.md.

Version & Updates

Check your installed version:

./phi --version

Update Phi to the latest release:

./phi --update

This checks the npm registry for a newer version, backs up your runs/ directory, re-runs the installer, and restores your data. If you're already at the latest version, it prints a confirmation and exits.

Phi also checks for updates in the background whenever you run it, so the check never blocks on the network. If a new version is available, you'll see a banner when you run ./phi:

  ========================================
    A new version of Phi is available: v1.0.6
    Run: ./phi --update
  ========================================
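An update check of this kind comes down to comparing the installed version with the published one. The following is a minimal sketch of that comparison, not Phi's actual update code: `version_gt` is a hypothetical helper, it handles plain MAJOR.MINOR.PATCH strings only, and the two version values are hard-coded examples. In practice the published version could be read with `npm view phi-cc version`; the output format of `./phi --version` is not documented here, so parsing it is left out.

```shell
#!/bin/sh
# version_gt A B: succeeds when version A is strictly newer than version B.
# Handles plain MAJOR.MINOR.PATCH strings only (an optional leading "v" is ignored).
version_gt() (
    a=${1#v}; b=${2#v}
    IFS=.
    set -- $a $b          # $1..$3 = parts of A, $4..$6 = parts of B
    [ "$1" -gt "$4" ] && exit 0; [ "$1" -lt "$4" ] && exit 1
    [ "$2" -gt "$5" ] && exit 0; [ "$2" -lt "$5" ] && exit 1
    [ "$3" -gt "$6" ]
)

installed="1.0.6"     # example value
latest="1.0.11"       # example value; e.g. from: npm view phi-cc version

if version_gt "$latest" "$installed"; then
    echo "update available: v$latest"
else
    echo "already up to date"
fi
```

The numeric, field-by-field comparison matters: a plain string comparison would rank 1.0.6 above 1.0.11.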

Your First Run

Once installed, you can run Phi from anywhere in your project.

Two-phase (recommended for anything non-trivial):

./phi --grill "add a health-check endpoint at /health that returns {status: ok}"
# → interactive interview — answer a few questions about scope, error handling, etc.
./phi --implement
# → writes the code, runs tests, audits for safety

Single-phase (for simple, well-understood tasks):

./phi "add a health-check endpoint at /health that returns {status: ok}"

In Codex, use the installed skill:

$phi "add a health-check endpoint at /health that returns {status: ok}"

Phi profiles your project, plans the work, implements the change, reviews tests, audits for security, and writes a final summary. When it finishes, it prints the run ID and tells you where to find the report.

Safe preview — add --dry-run to see the plan without changing any code:

./phi --dry-run "rewrite the payment module"

No code is changed. You get a plan so you can see what Phi would do before committing to it.

The Pipeline Explained

Every run goes through the same sequence. No steps are skipped. Phi stops immediately if it detects something unsafe.

Profile → Plan → Grill (optional) → Implement → Test Review → Audit → Final Summary
                                         ↑              ↑
                                   (fix loop ≤3x)  (fix loop ≤3x)

| Step | Agent | What It Does | Output |
|------|-------|-------------|--------|
| 1. Profile | profile.sh | Scans your project for language, package manager, and tool commands (test, lint, build). Supports 11 languages. | repo-profile.md |
| 2. Plan | Planner | Inspects your codebase, understands the task, writes a concrete step-by-step plan with file targets and verification strategy. | plan.md |
| 3. Grill | Griller | Interactive via --grill; evidence-only in the single-command pipeline. Challenges fuzzy language, cross-references decisions against the codebase, updates plan.md as decisions crystallize. If unresolved user decisions remain, the pipeline stops before implementation. | grill-session.md |
| 4. Implement | Implementer | Follows the plan, writes code and tests, runs the test suite after each change, notes any deviations. | implementation-report.md |
| 5. Test Review | Test-Gap Reviewer | Audits test coverage on changed files, ranks gaps by risk (Critical → Low), sends implementer back up to 3 times if gaps exist. | test-gap-report.md |
| 6. Audit | Auditor | STRIDE security analysis, code quality review, best-practices check on every changed file. Sends implementer back up to 3 times total (shared with test-gap loop). | audit-report.md |
| 7. Summarize | Orchestrator | Synthesizes all reports into a plain-language final summary you can read in under a minute. | final-summary.md |

If your task touches existing code, an optional Legacy Characterizer documents the pre-existing patterns, tech debt, and risk surface before changes are made.

The fix loops share a combined maximum of 3 iterations. If the test reviewer finds gaps 3 times and they're not fully resolved, Phi stops looping and reports the remaining issues as warnings — it never loops forever.
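The shared budget can be pictured as a single counter that both review stages draw from. The sketch below is a toy model under that assumption, not Phi's orchestration code: `run_stage` is a hypothetical stand-in that simulates a reviewer reporting a fixed number of findings, one per round.

```shell
#!/bin/sh
# Illustrative model of one fix budget shared by two review stages.
# The stages are simulated stand-ins, not Phi's agents.

MAX_FIXES=3
fixes_used=0

# run_stage NAME FINDINGS: simulate a reviewer with FINDINGS rounds of
# problems, sending the implementer back once per round while budget
# remains. Returns 1 when it stops with findings still unresolved.
run_stage() {
    name=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        if [ "$fixes_used" -ge "$MAX_FIXES" ]; then
            echo "$name: budget exhausted, $remaining finding(s) left as warnings"
            return 1
        fi
        fixes_used=$((fixes_used + 1))
        remaining=$((remaining - 1))
        echo "$name: fix iteration $fixes_used of $MAX_FIXES"
    done
    echo "$name: clean"
}

status=PASS
run_stage "test-review" 2 || status=PASS_WITH_WARNINGS
run_stage "audit" 2       || status=PASS_WITH_WARNINGS
echo "final status: $status"
```

In this run the test review consumes two iterations, so the audit gets only one before the budget runs out, and the run ends as PASS_WITH_WARNINGS rather than looping again.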

Reading the Reports

Every run creates a dated folder:

.agentic-harness/runs/2026-01-15-143022-fix-login-rate-limiter/

The folder name is YYYY-MM-DD-HHMMSS-<slug>. Inside:

| File | When to Read | What It Tells You |
|------|-------------|-------------------|
| final-summary.md | Always. Read this first. | Plain-language summary: what changed, why, test results, risks, next steps. |
| plan.md | Before merging, or if the result surprises you. | The step-by-step plan Phi wrote before touching code. Compare to what actually happened. |
| implementation-report.md | When you want to see exactly what changed. | File-by-file diff of every change, test results per change, any deviations from plan. |
| test-gap-report.md | When tests fail or coverage is questionable. | Gaps ranked Critical → Low. Tells you what's untested and how risky it is. |
| audit-report.md | Before merging to production. | Security vulnerabilities, code quality issues, best-practices violations. |
| original-request.md | For reference. | Immutable record of exactly what you asked for. |
| repo-profile.md | For reference. | What Phi detected about your project. Lists your test/lint/build commands. |
| grill-session.md | When you used --grill. | Every question asked, every decision resolved, which artifacts were updated. |
| validation-log.md | When a run fails. | Which commands were run, which passed/failed, failure classification. |
| legacy-characterization.md | When touching existing code. | Pre-existing patterns, tech debt, and risks in the code that was modified. |

Each run folder is self-contained. Old runs never affect new runs. Delete old run folders anytime.
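Because run folders are named YYYY-MM-DD-HHMMSS-<slug>, a plain lexical sort is also a chronological sort, which makes the newest run easy to find from a script. The helpers below are a small sketch built on that naming rule; `latest_run` and `run_slug` are hypothetical names, not commands that Phi provides.

```shell
#!/bin/sh
# latest_run [RUNS_DIR]: print the newest run folder, or fail if none exist.
# Relies on the timestamp-first folder names sorting chronologically.
latest_run() (
    runs_dir=${1:-.agentic-harness/runs}
    newest=
    for d in "$runs_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*; do
        [ -d "$d" ] || continue
        newest=$d           # globs expand in sorted order, so the last match wins
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# run_slug RUN_DIR: strip the path and timestamp prefix, leaving the task slug.
run_slug() {
    name=${1##*/}
    printf '%s\n' "${name#????-??-??-??????-}"
}

if run=$(latest_run); then
    echo "latest report: $run/final-summary.md"
else
    echo "no runs found"
fi
```

The same ordering property means old runs can be pruned by name alone, without reading file timestamps.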

Status Labels

Every report starts with a status line. The final summary's status is the definitive verdict for the run:

| Label | Meaning | What to Do |
|-------|---------|-----------|
| PASS | Everything worked. Plan followed, tests pass, audit clean. | Review the diff and merge when ready. |
| PASS_WITH_WARNINGS | Task completed with minor caveats: some tests couldn't run, edge cases noted, or loop limit hit. | Read the warnings before merging. |
| FAIL | Something broke: tests fail, build broken, or plan couldn't be completed. | Read the report. Fix the issues before merging. |
| NEEDS_USER_INPUT | Phi needs your decision before it can continue. | Read the report. Answer the question, then re-run. |
| BLOCKED_UNSAFE | Stopped: the task involves destructive actions. | Reword your request. Do not proceed with this run. |
| BLOCKED_TOOLING | Required tool missing (test runner, compiler, etc.). | Fix the tooling issue and re-run. |

Status labels are consistent across all reports. If the final status is anything other than PASS, read the report before merging.
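If you want a script to act on the verdict, the label can be read from the top of the report and mapped to an exit code. This is a sketch under two assumptions that the text above does not confirm: that the label appears verbatim somewhere on the first line of the file, and that the exit codes chosen here (0, 1, 2, 3) suit your own tooling. `read_status` and `gate` are hypothetical helpers.

```shell
#!/bin/sh
# read_status FILE: print the status label found on the first line of a report.
# Assumption: the label appears verbatim somewhere on that line.
read_status() {
    first_line=
    [ -r "$1" ] || return 1
    IFS= read -r first_line < "$1" || :   # tolerate a missing trailing newline
    case $first_line in
        *PASS_WITH_WARNINGS*) echo PASS_WITH_WARNINGS ;;   # checked before PASS
        *NEEDS_USER_INPUT*)   echo NEEDS_USER_INPUT ;;
        *BLOCKED_UNSAFE*)     echo BLOCKED_UNSAFE ;;
        *BLOCKED_TOOLING*)    echo BLOCKED_TOOLING ;;
        *PASS*)               echo PASS ;;
        *FAIL*)               echo FAIL ;;
        *)                    return 1 ;;
    esac
}

# gate STATUS: only a clean PASS returns 0. The other codes are this
# sketch's own convention, not something Phi defines.
gate() {
    case $1 in
        PASS)               return 0 ;;
        PASS_WITH_WARNINGS) echo "read the warnings before merging" >&2; return 1 ;;
        NEEDS_USER_INPUT)   echo "answer the open question, then re-run" >&2; return 2 ;;
        *)                  echo "do not merge: $1" >&2; return 3 ;;
    esac
}

# Example: gate "$(read_status .agentic-harness/runs/<run-id>/final-summary.md)"
```

Matching PASS_WITH_WARNINGS before PASS matters, since the shorter label is a prefix of the longer one.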

Common Workflows

Add a feature (two-phase — recommended):

./phi --grill "add password reset via email link"
# → interactive grill session
# → answer questions, plan crystallizes
./phi --implement
# → Implementer → Test-Gap Reviewer → Auditor

Add a feature (single-phase — for simple tasks):

./phi "add password reset via email link"

The pipeline plans the new code, writes templates/routes/tests, checks that existing features still work, and runs a security audit on the new email path.

Investigate a problem (use --dry-run):

./phi --dry-run "figure out why the dashboard takes 12 seconds to load"

The planner inspects the code and data-fetching logic, flags slow queries or render bottlenecks, and produces a fix plan. No code is changed — you review the plan first.

Refactor:

./phi "replace all hardcoded API URLs with a config file"

Phi finds every hardcoded URL, creates a config file, updates all imports, runs the tests, and reports which files changed and why.

Explore an unfamiliar codebase:

./phi --dry-run "describe the authentication flow and identify any security concerns"

The planner reads through the codebase and produces a detailed analysis without touching anything.

When Things Go Wrong

If a run ends with FAIL or PASS_WITH_WARNINGS:

Open the final summary — .agentic-harness/runs/<run-id>/final-summary.md. Read "Test Results" and "Risks."
Check the test-gap report — gaps are ranked Critical → Low. Start with Critical.
Read the implementation report — it tells you exactly which files changed and how many tests pass/fail.
Run the tests yourself — repo-profile.md lists your project's test command. Run it to see the same results Phi saw.
Re-run with more specific context — "fix the bug" is vague. "fix the bug where clicking Save twice creates duplicate entries" gives Phi much more to work with.
Read the error report — if the run stopped early, error.md explains what failed, at which step, and what to do next.
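The reading order in the steps above can be turned into a small helper that lists whichever of those reports a run actually produced. `triage` is a hypothetical function for illustration; it uses only file names mentioned in this document and does not inspect their contents.

```shell
#!/bin/sh
# triage RUN_DIR: list the reports that exist in a run folder, in the
# reading order suggested above. Missing reports are skipped silently.
triage() {
    for report in final-summary.md test-gap-report.md \
                  implementation-report.md error.md; do
        [ -f "$1/$report" ] && printf '%s\n' "$1/$report"
    done
    return 0
}

# Example: triage .agentic-harness/runs/<run-id>
```

A run that stopped early may list only error.md, which is itself a useful signal about how far the pipeline got.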

Troubleshooting Quick Reference

| Problem | What's Happening | Fix |
|---------|-----------------|-----|
| "Claude CLI not found" | claude is not on your PATH. | Phi auto-generates a paste-ready prompt. Open universal-prompt.md and paste into any capable AI tool. Or install: npm install -g @anthropic-ai/claude-code |
| "No grill session found" | --implement was run before --grill. | Run ./phi --grill "<task>" first to create and harden a plan. |
| "plan.md not found or empty" | The grill interview didn't complete or didn't update the plan. | Re-run ./phi --grill "<task>" and complete the interview. |
| "grill session is not aligned" | The griller found unresolved decisions, or the interview did not complete. | Re-run ./phi --grill "<task>" and answer until grill-session.md reports ALIGNED, then run ./phi --implement. |
| "Claude CLI required" from --grill | The grill interview needs interactive Claude CLI. | Install: npm install -g @anthropic-ai/claude-code. Or use the single-phase pipeline: ./phi "<task>". |
| Agent produced no report | Pipeline stopped before that agent because an earlier step failed. | Read final-summary.md or error.md. They name the failed step and explain why. |
| Tests fail in the report | Phi found real test failures and reported them honestly. | Open test-gap-report.md for gaps by risk. Open implementation-report.md for per-file results. |
| "Unknown language" warning | Profiler didn't recognize your project's language. | Phi still works for planning, audit, and dry-run. Validation commands are marked "not found" and are never guessed. |
| Dry run produced no plan | Claude CLI wasn't available for the planner. | Phi writes a template-based plan from repo profiling data. The plan is less detailed but still useful. |
| $phi not found in Codex | Codex did not load the new skill yet, or the Codex install did not create .codex/skills/phi / ~/.codex/skills/phi. | Restart Codex after install. Verify SKILL.md exists in the matching skills directory, then run $phi --dry-run "<task>". |
| Install conflicts | A file already exists in the target and differs from the source. | install.sh prompts you interactively. If non-TTY, it skips differing files. Re-run in a terminal for prompts. |
| "Phi is not installed here" | You ran --update in a directory without Phi. | Change to a project with Phi installed, or run npx phi-cc@latest to install. |

Limitations

Phi is designed to be safe and transparent, but no automated tool is perfect. Read these before relying on it.

Safety Is Best-Effort, Not Guaranteed

Phi does not sandbox or isolate AI agents. Agents run with the same permissions as the user who started Phi.
Safety rules reduce risk but cannot prevent every bad command. AI reasoning is probabilistic — it can misunderstand instructions.
Agent reasoning is not mechanically tested. Phi verifies that agents produce report files, but cannot verify the content of those reports is correct. A plan might be flawed, an audit might miss a vulnerability.
Always review diffs before merging. Read the diff. Skim the reports. Run the tests yourself. Phi is an assistant, not a replacement for human judgment.

Operational Limits

Never auto-commits, never auto-pushes. Every change requires your manual review and commit.
Full pipeline requires Claude Code or Claude CLI. Without one, Phi generates a paste-ready prompt for use with another AI tool. This fallback works but is less automated.
Profiling recognizes 11 languages: JavaScript/TypeScript, Python, Go, Rust, Java, Kotlin, Flutter/Dart, iOS (Swift/ObjC), Ruby, PHP, and Elixir. Projects in other languages still work — Phi just can't auto-detect test/lint/build commands.
3-iteration loop maximum for fix-and-review cycles. Prevents infinite loops but means some issues may remain after the limit.
When validation is unavailable (no test runner or lint tool detected), Phi relies on static code review instead. Less thorough than running real tests.
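To give a feel for what language detection can look like, here is a marker-file heuristic covering the ecosystems listed above. It is an illustration only and makes no claim about how profile.sh works internally: `detect_language` is a hypothetical function, and the marker files are simply the conventional manifest names for each ecosystem.

```shell
#!/bin/sh
# detect_language DIR: guess a project's language from well-known marker
# files. Illustrative heuristic only; this is NOT Phi's profile.sh logic.
detect_language() {
    dir=${1:-.}
    if   [ -f "$dir/pubspec.yaml" ];     then echo "dart"
    elif [ -f "$dir/package.json" ];     then echo "javascript"
    elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; then echo "python"
    elif [ -f "$dir/go.mod" ];           then echo "go"
    elif [ -f "$dir/Cargo.toml" ];       then echo "rust"
    elif [ -f "$dir/build.gradle.kts" ]; then echo "kotlin"   # may also be a Java build
    elif [ -f "$dir/pom.xml" ] || [ -f "$dir/build.gradle" ]; then echo "java"
    elif [ -f "$dir/Package.swift" ];    then echo "swift"
    elif [ -f "$dir/Gemfile" ];          then echo "ruby"
    elif [ -f "$dir/composer.json" ];    then echo "php"
    elif [ -f "$dir/mix.exs" ];          then echo "elixir"
    else
        echo "unknown"    # report the miss rather than guessing commands
        return 1
    fi
}

echo "detected: $(detect_language .)"
```

Returning "unknown" instead of a best guess mirrors the behavior described above, where commands for unrecognized projects are not auto-detected.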

What Phi Cannot Do

Cannot guarantee generated code is correct, secure, or performant.
Cannot replace a human code reviewer, security auditor, or QA process.
Cannot run without access to an AI model.
Cannot commit, push, deploy, or publish your code.

Testing Phi Itself

Phi ships with a self-test suite that verifies the harness mechanics are working:

./test-harness

This runs 18 automated tests against small example projects and verifies that profiling, execution, installation, and syntax all work. All tests should pass.

Options:

./test-harness --profile-only    # Only profile tests
./test-harness --task-only       # Only task execution tests
./test-harness --install-only    # Only install tests
./test-harness --syntax-only     # Only POSIX syntax checks
./test-harness --keep            # Keep fixture directories after tests

File Overview

| Path | Purpose |
|------|---------|
| ./phi | Terminal entrypoint: start Phi from the command line |
| ./install.sh | Installer: copies Phi into a target project |
| ./test-harness | Self-test suite: verifies Phi works correctly |
| .agentic-harness/profile.sh | Project profiler: detects language, package manager, tool commands |
| .agentic-harness/agents/ | AI agent definitions (planner, griller, implementer, test-gap-reviewer, auditor, legacy-characterizer) |
| .agentic-harness/templates/ | Report templates: markdown templates that structure every run's output |
| .agentic-harness/runs/ | Run storage: each run creates a dated folder here with all reports |
| .agentic-harness/.last-grill-run | State file: records the most recent --grill run ID so --implement finds it |
| .agentic-harness/docs/ | Documentation: cheatsheet and reference files |
| .claude/commands/phi.md | Claude Code slash command: the /phi entrypoint |
| codex/skills/phi/SKILL.md | Codex skill source: installed to .codex/skills/phi or ~/.codex/skills/phi for the $phi entrypoint |
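The state file is what links the two phases, so a lookup along the following lines shows how --implement could locate the plan from the last --grill. This is a sketch, not Phi's implementation: it assumes the state file holds the run ID on its first line, `grill_run_dir` is a hypothetical helper, and the error messages only echo the wording used in the troubleshooting table.

```shell
#!/bin/sh
# grill_run_dir [HARNESS_DIR]: resolve the run folder recorded by the last
# --grill. Assumption: the state file holds the run ID on its first line.
grill_run_dir() (
    harness=${1:-.agentic-harness}
    state="$harness/.last-grill-run"
    [ -s "$state" ] || { echo "No grill session found" >&2; exit 1; }
    run_id=
    IFS= read -r run_id < "$state" || :   # tolerate a missing trailing newline
    run_dir="$harness/runs/$run_id"
    [ -d "$run_dir" ] || { echo "run folder missing: $run_dir" >&2; exit 1; }
    [ -s "$run_dir/plan.md" ] || { echo "plan.md not found or empty" >&2; exit 1; }
    printf '%s\n' "$run_dir"
)

# Example: plan="$(grill_run_dir)/plan.md"
```

Checking that plan.md is non-empty before continuing reflects the rule that implementation should not start from an unfinished interview.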

Requirements and Design Decisions

Phi was built against explicit requirements and design decisions:

REQUIREMENTS.md — The full capability contract with traceability from requirements to slices and validation evidence.
DECISIONS.md — Key design decisions: why shell scripts, why six status labels, how agent orchestration works, and more.

License

This project is provided as source-available. You are free to use, modify, and share it within your own projects and organizations.