flyee

v0.6.1

Published

17 hours ago

Autonomous Software Engineering Harness — 24 specialist agents, 88 skills, 46 workflows. Plan, build, evaluate, iterate — autonomously. All runtimes.

flyeelab


Flyee

Autonomous Software Engineering Harness — Plan, build, evaluate, iterate. Autonomously.

What is Flyee?

Flyee is an autonomous software engineering harness that turns an AI coding assistant into a self-correcting build system. It supports Antigravity, Claude Code, Cursor, GitHub Copilot, Codex, Windsurf, Gemini CLI, and OpenCode.

Instead of just assisting, Flyee enables your AI to plan, build, evaluate, and iterate autonomously — using a GAN-inspired Generator↔Evaluator architecture with mechanical enforcement and entropy management.

Inspired by Anthropic's Harness Design and OpenAI's Harness Engineering research.

Key Differentiators

| Feature | Without Flyee | With Flyee |
|---------|---------------|------------|
| Code quality | Hope the AI gets it right | Evaluator grades every output via Playwright |
| Architecture | Ad-hoc, drifts over time | Taste invariants enforce rules mechanically |
| Dead code | Accumulates silently | Entropy GC cleans automatically |
| Bug fixes | No regression check | Post-fix evaluation verifies the fix |
| Features | Ship and pray | Generator↔Evaluator loop with contracts |
| Documentation | Gets stale | Adapted into machine-parseable specs |

Quick Start

# Auto-detect your runtime and install
npx flyee

# Or specify your runtime
npx flyee --antigravity    # Gemini / Antigravity
npx flyee --claude          # Claude Code
npx flyee --cursor          # Cursor
npx flyee --copilot         # GitHub Copilot
npx flyee --codex           # OpenAI Codex
npx flyee --windsurf        # Windsurf
npx flyee --gemini          # Gemini CLI
npx flyee --opencode        # OpenCode
npx flyee --all             # All detected runtimes

After install, you can start using slash commands in your AI assistant.

Command Reference

🏗️ Build — Create Applications

# Build a complete app autonomously from a prompt
/harness "Create a project management tool with kanban boards"

# Build with deep quality tier (production-grade, more iterations)
/harness "Build a DAW for music production" --tier deep

# Build with PRD Socratic gate (asks 12 discovery questions first)
/harness --with-prd "SaaS analytics dashboard for e-commerce"

# Build a frontend-focused project with lite budget
/harness "Landing page for AI startup" --profile frontend-design --tier lite

# Start a new project with full PRD → SDD → Design System pipeline
/new-project my-saas-app

# Quick project without formal PRD
/new-project --quick my-prototype

⚡ Features — Add to Existing Projects

# Add a feature with automatic quality evaluation (Micro-Harness)
/new-task "Add dark mode toggle with system preference detection"

# Add a feature with explicit evaluation (for fix-type tasks)
/new-task "Fix login redirect loop" --eval

# Execute an existing task from your backlog
/execute "Implement notifications system"
/execute 3.3    # By task ID

# Enhance an existing project (discovers architecture, then adds features)
/harness --enhance "Add real-time notifications and user settings page"

# Lightweight feature evaluation
/harness --micro "Add export to CSV in reports page"

🐛 Debug — Fix Issues

# Systematic debugging with post-fix evaluation
/debug

# Debug with automatic hypothesis generation
/debug "Users can't upload files larger than 5MB"

# Debug a specific error
/debug "TypeError: Cannot read property 'map' of undefined in Dashboard"

🏛️ Legacy — Modernize Existing Projects

# Full legacy analysis + modernization with harness evaluation
/harness --legacy ./path/to/project

# Legacy project with alternative command
/legacy-project ./path/to/project

# Enhance-only: align foundation without adding features
/harness --enhance --tier lite

# Enhance with deep refactoring
/harness --enhance "Migrate to App Router and add i18n" --tier deep

✅ Quality — Evaluate & Maintain

# Evaluate a running application (no code changes, just scoring)
/harness --eval-only --profile full-stack

# Evaluate with frontend-design focus
/harness --eval-only --profile frontend-design

# Run entropy garbage collection (dead code, stale docs, pattern divergence)
/harness --gc

# Resume an interrupted harness session
/harness --resume

# Use an existing spec file
/harness --from-spec .flyee/spec/SPEC-my-app.md

🎨 Design System & UI

# Initialize design system with shadcn/ui
/ds-init

# Create design system components
/ds-components Button Card Modal

# Validate design system compliance
/ds-validate

# Build a page using the design system
/page-build /dashboard

# Review UI/UX of an existing page
/review-page /settings

# Holistic project audit across 6 dimensions
/audit

📋 Planning & Documentation

# Create a Product Requirements Document (feeds harness automatically)
/prd "Marketplace for freelance developers"

# Create a Software Design Document (feeds harness automatically)
/tdd new "Payment integration with Stripe"

# Generate task breakdown from SDD
/tdd breakdown

# Strategic planning
/plan "Q3 roadmap for mobile app"

# Structured brainstorming
/brainstorm "How to implement real-time collaboration"

🔄 Task Management

# Mark a task as complete (with verification gates)
/task-complete

# Update task progress
/task-update 2.1 "70% - API done, UI in progress"

# Check task status
/check-task 3.3

# View project status
/status

🧪 Testing & Deployment

# Generate and run tests
/test "auth flow"
/test coverage

# Production deployment with pre-flight checks
/deploy

# Start local preview server
/preview

How It Works

The Harness Loop

Every code change in Flyee passes through a quality gate:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   PLANNER   │────▶│  GENERATOR  │────▶│  EVALUATOR  │
│  (Expand    │     │  (Write     │     │  (Test via   │
│   prompts   │     │   code)     │◀────│   Playwright)│
│   into      │     │             │ fix │             │
│   specs)    │     └─────────────┘     └──────┬──────┘
└─────────────┘                                │
                                          PASS? ──▶ Ship ✅
                                          FAIL? ──▶ Iterate 🔄
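
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Flyee's implementation: `run_harness`, `Evaluation`, and the three toy callables are hypothetical names, and the real evaluator drives Playwright rather than comparing strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    passed: bool
    feedback: str  # corrective notes handed back to the generator


def run_harness(
    prompt: str,
    plan: Callable[[str], str],
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Evaluation],
    max_iterations: int = 5,
) -> tuple[str, bool, int]:
    """Plan once, then alternate generate/evaluate until PASS or the budget runs out."""
    spec = plan(prompt)
    feedback = ""
    code = ""
    for iteration in range(1, max_iterations + 1):
        code = generate(spec, feedback)
        result = evaluate(spec, code)
        if result.passed:
            return code, True, iteration
        feedback = result.feedback  # FAIL: iterate with the evaluator's notes
    return code, False, max_iterations


# Toy stand-ins (hypothetical): the first attempt fails, the second passes.
def toy_plan(prompt: str) -> str:
    return f"SPEC: {prompt}"


def toy_generate(spec: str, feedback: str) -> str:
    return "v2" if feedback else "v1"


def toy_evaluate(spec: str, code: str) -> Evaluation:
    if code == "v2":
        return Evaluation(True, "")
    return Evaluation(False, "button does nothing")


code, shipped, iterations = run_harness("kanban board", toy_plan, toy_generate, toy_evaluate)
print(code, shipped, iterations)  # v2 True 2
```

The planner runs once per request, while the generator and evaluator repeat. That is why the iteration budget applies only to the inner loop.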

Three Pillars

| Pillar | Source | What |
|--------|--------|------|
| Evaluation Loop | Anthropic | Generator↔Evaluator with sprint contracts and numerical scoring |
| Mechanical Enforcement | OpenAI | Taste invariants: linters that inject corrective instructions |
| Entropy Management | OpenAI | Automated GC to prevent codebase decay |
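
To make "linters that inject corrective instructions" concrete, here is a rough sketch of the idea. The rule names, regexes, and message format are all assumptions made for illustration; the README does not specify the format of TASTE-RULES.md or how Flyee evaluates it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TasteRule:
    name: str
    pattern: str      # regex that signals a violation
    instruction: str  # corrective text fed back to the agent


# Hypothetical rules, not taken from Flyee.
RULES = [
    TasteRule("no-console-log", r"\bconsole\.log\(", "Remove console.log calls; use the project logger."),
    TasteRule("no-any", r":\s*any\b", "Replace `any` with a concrete type."),
]


def check(source: str, rules: list[TasteRule] = RULES) -> list[str]:
    """Return one corrective instruction per violating line and rule."""
    messages = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                messages.append(f"line {line_no} [{rule.name}]: {rule.instruction}")
    return messages


sample = "const x: any = 1;\nconsole.log(x);\n"
for message in check(sample):
    print(message)
```

The output is an instruction rather than a bare error code, so the agent can act on it in its next turn.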

All Execution Scenarios → Harness

| Scenario | Command | Harness Mode | Max Iterations |
|----------|---------|--------------|----------------|
| New project | /harness or /new-project | Full | 5 per sprint |
| New feature | /new-task | Micro | 2 |
| Bug fix | /debug | Post-fix eval | 2 |
| Refactoring | /execute | Micro | 2 |
| Legacy project | /harness --legacy | Legacy | 5 per sprint |
| Continuation | /harness --enhance | Enhance | 5 per sprint |
| Evaluation only | /harness --eval-only | Eval | 1 |
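
The table can be read as a lookup from command to harness mode and iteration budget. The sketch below encodes only the rows shown above. `HarnessMode` and `resolve` are hypothetical names, and the flag-matching logic is an assumption about how a command line might be classified, not Flyee's actual dispatcher.

```python
import shlex
from typing import NamedTuple


class HarnessMode(NamedTuple):
    mode: str
    max_iterations: int
    per_sprint: bool  # True when the budget applies to each sprint


SCENARIOS = {
    "/harness": HarnessMode("Full", 5, True),
    "/new-project": HarnessMode("Full", 5, True),
    "/new-task": HarnessMode("Micro", 2, False),
    "/debug": HarnessMode("Post-fix eval", 2, False),
    "/execute": HarnessMode("Micro", 2, False),
    "/harness --legacy": HarnessMode("Legacy", 5, True),
    "/harness --enhance": HarnessMode("Enhance", 5, True),
    "/harness --eval-only": HarnessMode("Eval", 1, False),
}

HARNESS_FLAGS = ("--legacy", "--enhance", "--eval-only")


def resolve(command_line: str) -> HarnessMode:
    """Map a slash command to its table row; raises KeyError for unlisted commands."""
    tokens = shlex.split(command_line)  # keeps quoted prompts as single tokens
    command = tokens[0]
    if command == "/harness":
        for flag in HARNESS_FLAGS:
            if flag in tokens[1:]:
                return SCENARIOS[f"/harness {flag}"]
    return SCENARIOS[command]


print(resolve('/harness --enhance "Migrate to App Router and add i18n" --tier deep'))
print(resolve('/new-task "Add dark mode toggle"'))
```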

What You Get

🤖 24 Specialist Agents

| Agent | Domain |
|-------|--------|
| harness-planner | Autonomous spec expansion, PRD/SDD adaptation |
| harness-evaluator | QA via Playwright, scoring, micro/post-fix modes |
| orchestrator | Multi-agent coordination |
| frontend-specialist | Web UI/UX (React, Next.js, Vite) |
| backend-specialist | API, databases, server |
| mobile-developer | iOS, Android, React Native, Flutter |
| security-auditor | OWASP, vulnerability scanning |
| debugger | Systematic 4-phase debugging |
| game-developer | Game development |
| ... and 15 more | Various domains |

🧩 88 Modular Skills

Skills are knowledge modules that enhance agent capabilities:

- Taste Invariants: Mechanical code quality enforcement
- Agent Legibility: Code optimized for agent comprehension
- Entropy Management: Automated GC for agent codebases
- Sprint Contract: Generator↔Evaluator negotiation
- Harness Doc Adapter: Converts PRD/SDD to harness specs
- Evaluation Criteria: Domain-specific scoring profiles
- Design System Enforcement: Token-based UI consistency
- Testing Patterns: Unit, integration, E2E strategies
- Quality Gates: Structured quality verification
- Knowledge Persistence: Cross-session memory
- Hallucination Guard: Detects empty completions
- Stuck Detection: Breaks infinite agent loops
- ... and 76 more
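
As one illustration of what a skill such as Stuck Detection might do, the sketch below flags an agent that repeats the same action several times in a row. The `StuckDetector` class, the window size, and the exact-repeat criterion are assumptions; the README does not describe how Flyee detects loops.

```python
from collections import deque


class StuckDetector:
    """Flags a loop when the last `window` actions are all identical (hypothetical heuristic)."""

    def __init__(self, window: int = 3) -> None:
        self.recent: deque[str] = deque(maxlen=window)

    def observe(self, action: str) -> bool:
        """Record an action and return True when the agent looks stuck."""
        self.recent.append(action)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and len(set(self.recent)) == 1


detector = StuckDetector(window=3)
for action in ["edit app.ts", "run tests", "run tests", "run tests"]:
    if detector.observe(action):
        print(f"stuck on: {action}")
```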

Architecture

your-project/
├── .agent/              ← Installed by flyee (runtime-specific)
│   ├── agents/          ← 24 specialist agents
│   ├── skills/          ← 88 modular skills
│   ├── workflows/       ← 46 automated workflows
│   ├── scripts/         ← Automation scripts (harness_runner.py)
│   └── bridge/          ← Optional: Flyee SaaS sync
├── .flyee/              ← ALL Flyee artifacts (unified namespace)
│   ├── AGENTS.md        ← Agent-legible project overview
│   ├── harness-state.json  ← Generator↔Evaluator loop state
│   ├── KNOWLEDGE.md     ← Cross-session persistent knowledge
│   ├── spec/            ← Product specs, PRDs, SDDs, Design System
│   │   ├── SPEC-{name}.md
│   │   └── DESIGN-SYSTEM.md
│   ├── rules/           ← Mechanical enforcement
│   │   └── TASTE-RULES.md
│   ├── discovery/       ← Codebase analysis
│   │   └── DISCOVERY.md
│   ├── sprints/         ← Contracts, evaluations, progress
│   │   ├── CONTRACT-S{N}.md
│   │   ├── EVAL-S{N}-I{M}.md
│   │   └── PROGRESS.md
│   ├── audits/          ← Holistic evaluations & screenshots
│   │   └── AUDIT-{date}.md
│   └── decisions/       ← Architecture Decision Records
│       └── ADR-{NNN}.md
├── GEMINI.md            ← Engine file (or CLAUDE.md, COPILOT.md, etc.)
└── your code...
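
The naming patterns in the tree (`CONTRACT-S{N}.md`, `EVAL-S{N}-I{M}.md`, `ADR-{NNN}.md`) can be expressed as small path helpers. This is a sketch under assumptions: the helper names are hypothetical, `{N}` and `{M}` are taken to be plain sprint and iteration integers, and `{NNN}` is taken to mean a three-digit zero-padded number.

```python
from pathlib import PurePosixPath

FLYEE_DIR = PurePosixPath(".flyee")


def contract_path(sprint: int) -> PurePosixPath:
    """Sprint contract, following the CONTRACT-S{N}.md pattern."""
    return FLYEE_DIR / "sprints" / f"CONTRACT-S{sprint}.md"


def eval_path(sprint: int, iteration: int) -> PurePosixPath:
    """Evaluation report, following the EVAL-S{N}-I{M}.md pattern."""
    return FLYEE_DIR / "sprints" / f"EVAL-S{sprint}-I{iteration}.md"


def adr_path(number: int) -> PurePosixPath:
    """Architecture Decision Record; zero-padding to three digits is an assumption."""
    return FLYEE_DIR / "decisions" / f"ADR-{number:03d}.md"


print(contract_path(1))  # .flyee/sprints/CONTRACT-S1.md
print(eval_path(2, 3))   # .flyee/sprints/EVAL-S2-I3.md
print(adr_path(7))       # .flyee/decisions/ADR-007.md
```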

Optional: Flyee SaaS Integration

Connect to the Flyee Platform for team collaboration:

# Configure connection
npx flyee --connect <api-key>

When connected, you get:

📊 Team dashboard with project progress
🔄 Task sync across team members
💰 Cost analytics and budget tracking
📈 Gantt visualization of project timeline
🎯 OKR management and alignment

Without SaaS, flyee works 100% offline with local state in .flyee/.

Supported Runtimes

| Runtime | Status | Config Dir | Engine File |
|---------|--------|------------|-------------|
| Antigravity (Gemini) | ✅ | .agent/ | GEMINI.md |
| Claude Code | ✅ | .claude/ | CLAUDE.md |
| Cursor | ✅ | .cursor/ | .cursorrules |
| GitHub Copilot | ✅ | .github/ | COPILOT.md |
| OpenAI Codex | ✅ | .codex/ | CODEX.md |
| Windsurf | ✅ | .windsurf/ | WINDSURF.md |
| Gemini CLI | ✅ | .gemini/ | GEMINI.md |
| OpenCode | ✅ | .opencode/ | OPENCODE.md |
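
One plausible way to check which of these runtimes a project is already set up for is to look for the config directories in the table. The sketch below does only that. `detect_runtimes` is a hypothetical helper, and the README does not say how `npx flyee` performs its own auto-detection. Note that `.github/` exists in many repositories that do not use Copilot, so a real detector would need a stronger signal.

```python
import tempfile
from pathlib import Path

# Runtime flag name -> config directory, copied from the table above.
RUNTIME_DIRS = {
    "antigravity": ".agent",
    "claude": ".claude",
    "cursor": ".cursor",
    "copilot": ".github",
    "codex": ".codex",
    "windsurf": ".windsurf",
    "gemini": ".gemini",
    "opencode": ".opencode",
}


def detect_runtimes(project_root: Path) -> list[str]:
    """Return runtimes whose config directory exists under project_root."""
    return [
        name
        for name, config_dir in RUNTIME_DIRS.items()
        if (project_root / config_dir).is_dir()
    ]


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".claude").mkdir()
    (root / ".cursor").mkdir()
    print(detect_runtimes(root))  # ['claude', 'cursor']
```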

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT — see LICENSE.

Built with ❤️ by Bruno Santana