@ralph-e-cli/ralph-e

v5.1.0, published 4 months ago

Autonomous AI development loop with integrated Playwright UI/UX validation, visual regression testing, and self-healing builds.

Ralph-E

Autonomous AI coding loop. Runs AI agents on tasks until done.

Install

npm install -g @ralph-e-cli/ralph-e

Quick Start

# Single task
ralph-e "add login button"

# Work through a task list
ralph-e --prd PRD.md

Two Modes

Single task - just tell it what to do:

ralph-e "add dark mode"
ralph-e "fix the auth bug"

Task list - work through a PRD:

ralph-e              # uses PRD.md
ralph-e --prd tasks.md

Project Config

Optional. Stores project settings, commands, boundaries, and rules the AI must follow.

ralph-e --init              # auto-detects project settings
ralph-e --config            # view config
ralph-e --add-rule "use TypeScript strict mode"

Creates .ralph-e/config.yaml:

project:
  name: "my-app"
  language: "TypeScript"
  framework: "Next.js"

commands:
  test: "npm test"
  lint: "npm run lint"
  build: "npm run build"

rules:
  - "use server actions not API routes"
  - "follow error pattern in src/utils/errors.ts"

boundaries:
  never_touch:
    - "src/legacy/**"
    - "*.lock"
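The boundaries.never_touch entries are glob patterns. The README does not spell out the exact matching rules, so the sketch below assumes common glob semantics: `**` spans directories, `*` stays within one path segment, and a pattern without a slash (like `*.lock`) matches the file name at any depth. `isProtected` and `globToRegExp` are hypothetical helpers for illustration, not Ralph-E's API.

```typescript
// Hypothetical sketch: does a repo-relative path match any never_touch pattern?
// Assumed semantics (not confirmed by Ralph-E's docs):
//   "**" matches across directories, "*" matches within one segment,
//   and a slash-free pattern such as "*.lock" is matched against the file name only.
function globToRegExp(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "*") {
      if (pattern.charAt(i + 1) === "*") {
        re += ".*"; // "**" crosses directory boundaries
        i++;
      } else {
        re += "[^/]*"; // "*" stays inside one path segment
      }
    } else if (ch === "?") {
      re += "[^/]"; // "?" matches exactly one non-slash character
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

function isProtected(filePath: string, patterns: string[]): boolean {
  const normalized = filePath.replace(/\\/g, "/").replace(/^\.\//, "");
  const basename = normalized.split("/").pop() ?? normalized;
  return patterns.some((pattern) => {
    const re = globToRegExp(pattern);
    return pattern.includes("/") ? re.test(normalized) : re.test(basename);
  });
}

const neverTouch = ["src/legacy/**", "*.lock"];
console.log(isProtected("src/legacy/billing/invoice.ts", neverTouch)); // true
console.log(isProtected("src/app/page.tsx", neverTouch)); // false
```

A real implementation would more likely use an existing glob library; the hand-rolled matcher just makes the assumed rules explicit.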

AI Engines

ralph-e              # Claude Code (default)
ralph-e --opencode   # OpenCode
ralph-e --cursor     # Cursor
ralph-e --codex      # Codex
ralph-e --qwen       # Qwen-Code
ralph-e --droid      # Factory Droid
ralph-e --copilot    # GitHub Copilot

Model Override

ralph-e --model sonnet "add feature"    # use sonnet with Claude
ralph-e --sonnet "add feature"          # shortcut for above
ralph-e --opencode --model opencode/glm-4.7-free "task"

Engine-Specific Arguments

Pass additional arguments to the underlying engine CLI after a -- separator:

ralph-e --copilot "add feature" -- --allow-all-tools --stream on
ralph-e --claude "fix bug" -- --no-permissions-prompt
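The `--` convention splits the command line in two: arguments before the first bare `--` belong to ralph-e, and everything after it is forwarded untouched to the engine CLI. A minimal sketch of that split, assuming this is how the separator is handled; `splitEngineArgs` is a hypothetical name, not Ralph-E's parser:

```typescript
// Hypothetical sketch: split argv at the first bare "--".
// Everything before it is for ralph-e; everything after it is passed through to the engine.
function splitEngineArgs(argv: string[]): { own: string[]; passthrough: string[] } {
  const sep = argv.indexOf("--");
  if (sep === -1) {
    return { own: argv, passthrough: [] }; // no separator: nothing to forward
  }
  return { own: argv.slice(0, sep), passthrough: argv.slice(sep + 1) };
}

// Mirrors: ralph-e --copilot "add feature" -- --allow-all-tools --stream on
const parsed = splitEngineArgs(["--copilot", "add feature", "--", "--allow-all-tools", "--stream", "on"]);
console.log(parsed.own); // [ '--copilot', 'add feature' ]
console.log(parsed.passthrough); // [ '--allow-all-tools', '--stream', 'on' ]
```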

Task Sources

Markdown file (default):

ralph-e --prd PRD.md

Markdown folder (for large projects):

ralph-e --prd ./prd/

Reads all .md files in the folder and aggregates tasks.

YAML:

ralph-e --yaml tasks.yaml

GitHub Issues:

ralph-e --github owner/repo
ralph-e --github owner/repo --github-label "ready"

Parallel Execution

ralph-e --parallel                  # 3 agents default
ralph-e --parallel --max-parallel 5 # 5 agents

Each agent gets its own isolated git worktree and branch. Without --create-pr, branches are auto-merged back with AI-assisted conflict resolution. With --create-pr, branches are kept and a PR is created for each. With --no-merge, branches are kept without merging.
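--max-parallel caps how many agents run at once. One common way to enforce such a cap is a bounded worker pool; the sketch below assumes that approach and uses a hypothetical `runAgent` callback in place of whatever actually launches an agent in its worktree or sandbox.

```typescript
// Hypothetical sketch: run tasks with at most `maxParallel` agents in flight.
// `runAgent` stands in for launching one agent; it is not Ralph-E's API.
async function runWithLimit<T, R>(
  items: T[],
  maxParallel: number,
  runAgent: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await runAgent(items[index] as T);
    }
  }
  const workerCount = Math.max(1, Math.min(maxParallel, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the original task order
}

// Usage: five tasks, three agents at a time (the documented default).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
void runWithLimit(["task-1", "task-2", "task-3", "task-4", "task-5"], 3, async (task) => {
  await sleep(10); // placeholder for real agent work
  return `${task}: done`;
}).then((results) => console.log(results));
```

Workers pull from a shared index instead of processing fixed chunks, so a slow task does not hold up a whole batch.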

Smart Scheduling

Use AI to predict which files each task will modify, then automatically group non-conflicting tasks for safe parallel execution:

ralph-e --parallel --smart-schedule
ralph-e --parallel --smart-schedule --planning-model haiku  # use cheaper model for planning

Smart scheduling works in four steps, using the DSatur graph coloring algorithm for the grouping step:

Predict file modifications for each task using AI
Build a conflict graph where edges represent file overlaps
Color the graph to group non-conflicting tasks
Execute each color group in parallel

This lets more tasks run at the same time while keeping tasks that are predicted to touch the same files in separate groups, which reduces the risk of merge conflicts.
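The last three steps can be sketched as follows, assuming the AI prediction step has already produced a map from task ID to predicted files. This is a generic DSatur implementation (ties broken by degree), not Ralph-E's source; `buildConflictGraph`, `dsatur`, and `scheduleBatches` are hypothetical names.

```typescript
// Hypothetical sketch of smart scheduling. `predicted` maps task ID -> predicted files.
function buildConflictGraph(predicted: Map<string, string[]>): Map<string, Set<string>> {
  const graph = new Map<string, Set<string>>();
  const tasks = [...predicted.keys()];
  for (const t of tasks) graph.set(t, new Set());
  for (let i = 0; i < tasks.length; i++) {
    const filesA = new Set(predicted.get(tasks[i]!)!);
    for (let j = i + 1; j < tasks.length; j++) {
      // An edge means the two tasks are predicted to modify a common file.
      if (predicted.get(tasks[j]!)!.some((f) => filesA.has(f))) {
        graph.get(tasks[i]!)!.add(tasks[j]!);
        graph.get(tasks[j]!)!.add(tasks[i]!);
      }
    }
  }
  return graph;
}

function dsatur(graph: Map<string, Set<string>>): Map<string, number> {
  const colors = new Map<string, number>();
  // Saturation = number of distinct colors already used by a vertex's neighbors.
  const saturation = (v: string): number =>
    new Set([...graph.get(v)!].filter((n) => colors.has(n)).map((n) => colors.get(n)!)).size;

  while (colors.size < graph.size) {
    // Pick the uncolored vertex with the highest saturation; break ties by degree.
    let best: string | undefined;
    for (const v of graph.keys()) {
      if (colors.has(v)) continue;
      if (
        best === undefined ||
        saturation(v) > saturation(best) ||
        (saturation(v) === saturation(best) && graph.get(v)!.size > graph.get(best)!.size)
      ) {
        best = v;
      }
    }
    // Give it the smallest color that no neighbor already uses.
    const used = new Set([...graph.get(best!)!].map((n) => colors.get(n)));
    let color = 0;
    while (used.has(color)) color++;
    colors.set(best!, color);
  }
  return colors;
}

// Each color becomes one batch; tasks within a batch can run in parallel.
function scheduleBatches(predicted: Map<string, string[]>): string[][] {
  const batches: string[][] = [];
  for (const [task, color] of dsatur(buildConflictGraph(predicted))) {
    (batches[color] ??= []).push(task);
  }
  return batches;
}

const plan = scheduleBatches(
  new Map([
    ["auth", ["src/auth.ts", "src/db.ts"]],
    ["ui", ["src/page.tsx"]],
    ["db", ["src/db.ts"]],
    ["docs", ["README.md"]],
  ]),
);
console.log(plan); // [ [ 'auth', 'ui', 'docs' ], [ 'db' ] ]
```

Here only `auth` and `db` share a predicted file, so they land in different batches while everything else runs alongside `auth`.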

Sandbox Mode and Parallel Reliability

For large repos with big node_modules or dependency directories, use sandbox mode instead of git worktrees:

ralph-e --parallel --sandbox

Sandboxes are faster because they:

Symlink read-only dependencies (node_modules, .git, vendor, .venv, etc.)
Copy only source files that agents might modify

This avoids duplicating gigabytes of dependencies across worktrees. Changes are synced back to the original directory after each task completes.
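The symlink-versus-copy split can be sketched with Node's built-in `fs` module. The shared directory list below is an assumption drawn from the examples above, and `createSandbox` is a hypothetical helper; syncing changes back after a task is not shown.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of a sandbox. The shared (symlinked) directory names are
// an assumption based on the README's examples, not Ralph-E's real list.
const SHARED_DIRS = new Set(["node_modules", ".git", "vendor", ".venv"]);

function createSandbox(sourceDir: string, sandboxDir: string): void {
  fs.mkdirSync(sandboxDir, { recursive: true });
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    const from = path.join(sourceDir, entry.name);
    const to = path.join(sandboxDir, entry.name);
    if (entry.isDirectory() && SHARED_DIRS.has(entry.name)) {
      // Symlink heavy dependency directories instead of duplicating them.
      fs.symlinkSync(path.resolve(from), to, "dir");
    } else {
      // Copy everything else so the agent's edits stay isolated from the original.
      fs.cpSync(from, to, { recursive: true });
    }
  }
}
```

Because the symlinked directories are shared, an agent that writes into them would affect every sandbox, which is why the README describes them as read-only.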

Parallel execution reliability:

If worktree operations fail (e.g., nested worktree repos), ralph-e falls back to sandbox mode automatically
Retryable rate-limit or quota errors are detected and deferred for later retry
Local changes are stashed before the merge phase and restored after
Agents should not modify PRD files, .ralph-e/progress.txt, .ralph-e-worktrees, or .ralph-e-sandboxes
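Deferring retryable errors can be pictured as "classify, then requeue at the back". A minimal sketch, assuming regex-based detection; the patterns, `isRetryable`, and `handleFailure` are hypothetical, and the retry cap mirrors the documented --max-retries default of 3.

```typescript
// Hypothetical sketch: decide whether a failed task is deferred or marked failed.
// The regex is an assumption about typical provider error messages.
const RETRYABLE = /rate.?limit|\b429\b|quota|too many requests|overloaded/i;

function isRetryable(message: string): boolean {
  return RETRYABLE.test(message);
}

interface QueuedTask {
  id: string;
  attempts: number;
}

// `queue` holds the remaining tasks; `task` has already been taken off it.
function handleFailure(
  queue: QueuedTask[],
  task: QueuedTask,
  message: string,
  maxRetries = 3,
): { queue: QueuedTask[]; failed: boolean } {
  if (isRetryable(message) && task.attempts < maxRetries) {
    // Defer to the back of the queue so other tasks run while the limit resets.
    return { queue: [...queue, { ...task, attempts: task.attempts + 1 }], failed: false };
  }
  return { queue, failed: true };
}
```

Appending to the end of the queue, rather than retrying immediately, gives the provider's rate-limit window time to reset while other tasks make progress.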

Branch Workflow

ralph-e --branch-per-task                # branch per task
ralph-e --branch-per-task --create-pr    # + create PRs
ralph-e --branch-per-task --draft-pr     # + draft PRs

Browser Automation

Ralph-E supports two approaches for browser-based testing:

Option 1: Playwright MCP (Test While Coding)

Give the AI agent direct browser access so it can test as it builds. This enables a code → test → fix loop during development.

Setup for Claude Code:

Add to your ~/.claude/mcp.json (or project .claude/mcp.json):

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@anthropics/mcp-playwright"]
    }
  }
}

The AI agent gets tools such as browser_navigate, browser_click, browser_screenshot, and browser_fill, so it can test UI changes immediately after writing code.

Setup for other agents:

Use agent-browser instead:

ralph-e "add login form" --browser    # enable browser automation
ralph-e "fix checkout" --no-browser   # disable browser automation

Option 2: Ralph-E Validation (Test After Task)

Ralph-E's built-in Playwright validation runs after each task completes, providing a final quality gate before marking the task done. See Playwright Visual Testing below.

Recommended: Use Both

For maximum coverage, use both approaches together:

Playwright MCP / agent-browser - AI tests while coding (catches issues early)
Ralph-E validation - Final gate before commit (visual regression, accessibility, performance)

# .ralph-e/config.yaml
playwright:
  enabled: true
  validateAfterTask: true
  onFailure: "block"  # Don't mark task complete if validation fails

This combines AI-driven testing during development with automated validation before each commit.

Playwright Visual Testing

Ralph-E includes integrated Playwright validation for UI/UX testing with visual regression, accessibility, and performance monitoring.

ralph-e "add feature" --playwright                    # enable Playwright validation
ralph-e "fix ui" --playwright --visual-regression     # with visual regression testing
ralph-e "update form" --playwright --accessibility    # with accessibility testing
ralph-e --playwright-url http://localhost:5173        # custom base URL

Features

Visual Regression Testing: Compare screenshots against baselines to catch unintended UI changes
Accessibility Testing: WCAG compliance checking (wcag2a, wcag2aa, wcag2aaa)
Console Error Detection: Catch JavaScript errors during page load
Network Failure Monitoring: Detect failed API requests
Performance Budgets: Monitor Core Web Vitals (LCP, CLS, TTFB)
Self-Healing Tests: Learn from repeated failures and avoid making the same mistakes
Rollback on Failure: Automatically revert changes when validation fails

Configuration

Add to .ralph-e/config.yaml:

playwright:
  enabled: true
  baseUrl: "http://localhost:3000"

  # Visual regression settings
  visualRegression: true
  baselineDir: ".ralph-e/baselines"
  diffDir: ".ralph-e/diffs"
  pixelThreshold: 0.1  # 0-1, how much difference is acceptable

  # Accessibility testing
  accessibilityCheck: true
  accessibilityStandard: "wcag2aa"  # wcag2a, wcag2aa, wcag2aaa

  # When to validate
  validateAfterTask: true
  validateBeforeCommit: true

  # What to do on failure
  onFailure: "warn"  # warn, block, or rollback

  # Routes to test
  routes:
    - "/"
    - "/dashboard"
    - "/settings"

  # Viewports to test
  viewports:
    - name: "desktop"
      width: 1280
      height: 720
    - name: "mobile"
      width: 375
      height: 667

  # Performance budgets (optional)
  performanceBudget:
    lcp: 2500   # Largest Contentful Paint (ms)
    cls: 0.1    # Cumulative Layout Shift
    ttfb: 800   # Time to First Byte (ms)

  # Dev server settings
  devServerCommand: "npm run dev"
  devServerReadyPattern: "ready|started|listening"
  devServerTimeout: 60000
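The performanceBudget block sets upper limits. A check against measured values might look like the sketch below; how Ralph-E actually collects LCP, CLS, and TTFB is not described here, and `budgetViolations` is a hypothetical helper. Presumably such a check would run once per route and viewport.

```typescript
// Hypothetical sketch: compare measured Core Web Vitals with the configured budget.
interface PerformanceBudget {
  lcp?: number; // Largest Contentful Paint, ms
  cls?: number; // Cumulative Layout Shift, unitless
  ttfb?: number; // Time to First Byte, ms
}

type Metrics = Required<PerformanceBudget>;

function budgetViolations(metrics: Metrics, budget: PerformanceBudget): string[] {
  const violations: string[] = [];
  for (const key of ["lcp", "cls", "ttfb"] as const) {
    const limit = budget[key];
    // Unset budget entries are skipped; a value above the limit is a violation.
    if (limit !== undefined && metrics[key] > limit) {
      violations.push(`${key} ${metrics[key]} exceeds budget ${limit}`);
    }
  }
  return violations;
}

const budget: PerformanceBudget = { lcp: 2500, cls: 0.1, ttfb: 800 };
console.log(budgetViolations({ lcp: 3100, cls: 0.05, ttfb: 420 }, budget));
// [ 'lcp 3100 exceeds budget 2500' ]
```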

Self-Healing & Guardrails

When validation fails repeatedly (2+ times within 24 hours), Ralph-E automatically creates guardrails in .ralph-e/guardrails.md. These rules are injected into the AI agent's prompt to prevent repeating the same mistakes.

Example guardrail:

## Accessibility Issue: /dashboard
- Route: `/dashboard`
- Issue: Button "Submit" missing aria-label
- **Rule**: Ensure all interactive elements have proper ARIA labels
- Suggestion: Add aria-label or aria-labelledby to interactive elements
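The "2+ failures within 24 hours" trigger can be sketched as a sliding-window count per issue. What counts as "the same mistake" is not specified, so grouping by route plus issue text is an assumption; `guardrailsToWrite` and the `ValidationFailure` fields are hypothetical. The output mirrors the example entry above.

```typescript
// Hypothetical sketch of the guardrail trigger. Field names are illustrative.
interface ValidationFailure {
  category: string; // e.g. "Accessibility"
  route: string;
  issue: string;
  rule: string;
  suggestion: string;
  timestamp: number; // milliseconds since the Unix epoch
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours
const MIN_FAILURES = 2;

function guardrailsToWrite(failures: ValidationFailure[], now: number): string[] {
  // Group recent failures by route + issue text (assumed identity of a repeated mistake).
  const groups = new Map<string, ValidationFailure[]>();
  for (const f of failures) {
    if (now - f.timestamp > WINDOW_MS) continue; // too old to count
    const key = `${f.route}|${f.issue}`;
    const group = groups.get(key) ?? [];
    group.push(f);
    groups.set(key, group);
  }

  const entries: string[] = [];
  for (const group of groups.values()) {
    if (group.length < MIN_FAILURES) continue;
    const f = group[group.length - 1]!; // most recent occurrence
    entries.push(
      [
        `## ${f.category} Issue: ${f.route}`,
        `- Route: \`${f.route}\``,
        `- Issue: ${f.issue}`,
        `- **Rule**: ${f.rule}`,
        `- Suggestion: ${f.suggestion}`,
      ].join("\n"),
    );
  }
  return entries;
}
```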

Rollback on Failure

When onFailure: "rollback" is set, Ralph-E creates a git checkpoint before validation. If validation fails, changes are automatically rolled back:

playwright:
  enabled: true
  onFailure: "rollback"  # automatically revert on validation failure

The rollback creates a git tag checkpoint, stashes any uncommitted changes, and resets to the pre-task state.

Options

| Flag | What it does |
|------|--------------|
| --prd PATH | task file or folder (auto-detected, default: PRD.md) |
| --yaml FILE | YAML task file |
| --github REPO | use GitHub issues |
| --github-label TAG | filter issues by label |
| --model NAME | override model for any engine |
| --sonnet | shortcut for --claude --model sonnet |
| --parallel | run tasks in parallel |
| --max-parallel N | max parallel agents (default: 3) |
| --sandbox | use lightweight sandboxes instead of git worktrees |
| --smart-schedule | use AI to predict file conflicts and optimize parallel grouping |
| --planning-model MODEL | model for smart scheduling predictions (default: same as the main model) |
| --no-merge | skip auto-merge in parallel mode |
| --branch-per-task | branch per task |
| --base-branch BRANCH | base branch for PRs |
| --create-pr | create PRs |
| --draft-pr | create draft PRs |
| --no-tests | skip tests |
| --no-lint | skip lint |
| --fast | skip tests and lint |
| --no-commit | don't auto-commit |
| --browser | enable browser automation |
| --no-browser | disable browser automation |
| --playwright | enable Playwright UI/UX validation |
| --no-playwright | disable Playwright validation |
| --playwright-url URL | base URL for Playwright (default: http://localhost:3000) |
| --visual-regression | enable visual regression testing (requires --playwright) |
| --accessibility | enable accessibility testing (requires --playwright) |
| --max-iterations N | stop after N tasks |
| --max-retries N | retries per task (default: 3) |
| --retry-delay N | delay between retries in seconds (default: 5) |
| --dry-run | preview only |
| -v, --verbose | debug output |
| --init | set up .ralph-e/ config |
| --config | show config |
| --add-rule "rule" | add rule to config |

Webhook Notifications

Get notified when sessions complete via Discord, Slack, or custom webhooks.

Configure in .ralph-e/config.yaml:

notifications:
  discord_webhook: "https://discord.com/api/webhooks/..."
  slack_webhook: "https://hooks.slack.com/services/..."
  custom_webhook: "https://your-api.com/webhook"
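Slack incoming webhooks accept a JSON body with a `text` field and Discord webhooks accept one with a `content` field. The sketch below builds one request per configured URL and posts them with Node's built-in `fetch`. The message wording, the custom webhook payload shape, and the `SessionSummary` fields are assumptions, not Ralph-E's format.

```typescript
// Hypothetical sketch of session-complete notifications.
interface Notifications {
  discord_webhook?: string;
  slack_webhook?: string;
  custom_webhook?: string;
}

interface SessionSummary {
  tasksCompleted: number;
  tasksFailed: number;
  durationSeconds: number;
}

function buildRequests(cfg: Notifications, s: SessionSummary): Array<{ url: string; body: unknown }> {
  const text = `Ralph-E session finished: ${s.tasksCompleted} done, ${s.tasksFailed} failed in ${s.durationSeconds}s`;
  const requests: Array<{ url: string; body: unknown }> = [];
  if (cfg.discord_webhook) requests.push({ url: cfg.discord_webhook, body: { content: text } });
  if (cfg.slack_webhook) requests.push({ url: cfg.slack_webhook, body: { text } });
  // Custom payload shape is an assumption: an event name plus the raw summary.
  if (cfg.custom_webhook) requests.push({ url: cfg.custom_webhook, body: { event: "session_complete", ...s } });
  return requests;
}

async function notify(cfg: Notifications, s: SessionSummary): Promise<void> {
  // allSettled: one failing webhook should not prevent the others from being sent.
  await Promise.allSettled(
    buildRequests(cfg, s).map((r) =>
      fetch(r.url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(r.body),
      }),
    ),
  );
}
```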

Telemetry (Opt-in)

Collect session data for building AI agent evaluation datasets:

# .ralph-e/config.yaml
telemetry:
  enabled: true
  privacyLevel: "anonymous"  # or "full" for prompts/responses
  format: "jsonl"            # or "deepeval" or "openai-evals"
  outputDir: ".ralph-e/telemetry"

Export formats:

jsonl: Raw session data, one session per line
deepeval: DeepEval-compatible format for LLM evaluation
openai-evals: OpenAI Evals-compatible format

Privacy levels:

anonymous: Only aggregate metrics (token counts, durations, success rates)
full: Full session data including prompts and responses
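To make the two privacy levels and the jsonl format concrete, here is a minimal sketch that strips prompts and responses in anonymous mode and appends one JSON object per line. The record fields and the `sessions.jsonl` file name are hypothetical; Ralph-E's actual schema is not documented here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical session record; field names are illustrative, not Ralph-E's schema.
interface SessionRecord {
  sessionId: string;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  success: boolean;
  prompt?: string;
  response?: string;
}

type PrivacyLevel = "anonymous" | "full";

// "anonymous" drops prompts and responses; "full" keeps everything.
function redact(record: SessionRecord, level: PrivacyLevel): SessionRecord {
  if (level === "full") return record;
  const copy: SessionRecord = { ...record }; // copy so the caller's object is untouched
  delete copy.prompt;
  delete copy.response;
  return copy;
}

// JSONL: one JSON object per line, appended so each session adds exactly one line.
function appendSession(outputDir: string, record: SessionRecord, level: PrivacyLevel): string {
  fs.mkdirSync(outputDir, { recursive: true });
  const file = path.join(outputDir, "sessions.jsonl"); // hypothetical file name
  fs.appendFileSync(file, JSON.stringify(redact(record, level)) + "\n");
  return file;
}
```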

Requirements

Node.js 18+ or Bun
AI CLI: Claude Code, OpenCode, Cursor, Codex, Qwen-Code, Factory Droid, or GitHub Copilot
gh (optional, for GitHub issues / --create-pr)
playwright (optional, for visual regression testing; installed automatically as an optional dependency)
@anthropics/mcp-playwright (optional, for AI-driven browser testing during development)

Credits

Ralph-E is based on ralphy by Michael Shimeles, which itself builds on the open-source Ralph project. Ralph-E extends it with integrated Playwright UI/UX validation, visual regression testing, accessibility checks, self-healing tests, and rollback capabilities.

License

MIT