flyee
v0.6.1
Autonomous Software Engineering Harness — 24 specialist agents, 88 skills, 46 workflows. Plan, build, evaluate, iterate — autonomously. All runtimes.
Flyee
Autonomous Software Engineering Harness — Plan, build, evaluate, iterate. Autonomously.
What is Flyee?
Flyee is an autonomous software engineering harness that turns an AI coding assistant into a self-correcting build system. It works with multiple runtimes, including Antigravity, Claude Code, Cursor, GitHub Copilot, Codex, Windsurf, Gemini CLI, and OpenCode.
Instead of just assisting, Flyee enables your AI to plan, build, evaluate, and iterate autonomously — using a GAN-inspired Generator↔Evaluator architecture with mechanical enforcement and entropy management.
Inspired by Anthropic's Harness Design and OpenAI's Harness Engineering research.
Key Differentiators
| Feature | Without Flyee | With Flyee |
|---------|---------------|------------|
| Code quality | Hope the AI gets it right | Evaluator grades every output via Playwright |
| Architecture | Ad-hoc, drifts over time | Taste invariants enforce rules mechanically |
| Dead code | Accumulates silently | Entropy GC cleans automatically |
| Bug fixes | No regression check | Post-fix evaluation verifies the fix |
| Features | Ship and pray | Generator↔Evaluator loop with contracts |
| Documentation | Gets stale | Adapted into machine-parseable specs |
Quick Start
# Auto-detect your runtime and install
npx flyee
# Or specify your runtime
npx flyee --antigravity # Gemini / Antigravity
npx flyee --claude # Claude Code
npx flyee --cursor # Cursor
npx flyee --copilot # GitHub Copilot
npx flyee --codex # OpenAI Codex
npx flyee --windsurf # Windsurf
npx flyee --gemini # Gemini CLI
npx flyee --opencode # OpenCode
npx flyee --all # All detected runtimes
After install, you can start using slash commands in your AI assistant.
Command Reference
🏗️ Build — Create Applications
# Build a complete app autonomously from a prompt
/harness "Create a project management tool with kanban boards"
# Build with deep quality tier (production-grade, more iterations)
/harness "Build a DAW for music production" --tier deep
# Build with PRD socratic gate (asks 12 discovery questions first)
/harness --with-prd "SaaS analytics dashboard for e-commerce"
# Build a frontend-focused project with lite budget
/harness "Landing page for AI startup" --profile frontend-design --tier lite
# Start a new project with full PRD → SDD → Design System pipeline
/new-project my-saas-app
# Quick project without formal PRD
/new-project --quick my-prototype
⚡ Features — Add to Existing Projects
# Add a feature with automatic quality evaluation (Micro-Harness)
/new-task "Add dark mode toggle with system preference detection"
# Run a fix-type task with explicit evaluation
/new-task "Fix login redirect loop" --eval
# Execute an existing task from your backlog
/execute "Implement notifications system"
/execute 3.3 # By task ID
# Enhance an existing project (discovers architecture, then adds features)
/harness --enhance "Add real-time notifications and user settings page"
# Lightweight feature evaluation
/harness --micro "Add export to CSV in reports page"
🐛 Debug — Fix Issues
# Systematic debugging with post-fix evaluation
/debug
# Debug with automatic hypothesis generation
/debug "Users can't upload files larger than 5MB"
# Debug a specific error
/debug "TypeError: Cannot read property 'map' of undefined in Dashboard"
🏛️ Legacy — Modernize Existing Projects
# Full legacy analysis + modernization with harness evaluation
/harness --legacy ./path/to/project
# Legacy project with alternative command
/legacy-project ./path/to/project
# Enhance-only: align foundation without adding features
/harness --enhance --tier lite
# Enhance with deep refactoring
/harness --enhance "Migrate to App Router and add i18n" --tier deep
✅ Quality — Evaluate & Maintain
# Evaluate a running application (no code changes, just scoring)
/harness --eval-only --profile full-stack
# Evaluate with frontend-design focus
/harness --eval-only --profile frontend-design
# Run entropy garbage collection (dead code, stale docs, pattern divergence)
/harness --gc
# Resume an interrupted harness session
/harness --resume
# Use an existing spec file
/harness --from-spec .flyee/spec/SPEC-my-app.md
🎨 Design System & UI
# Initialize design system with shadcn/ui
/ds-init
# Create design system components
/ds-components Button Card Modal
# Validate design system compliance
/ds-validate
# Build a page using the design system
/page-build /dashboard
# Review UI/UX of an existing page
/review-page /settings
# Holistic project audit across 6 dimensions
/audit
📋 Planning & Documentation
# Create a Product Requirements Document (feeds harness automatically)
/prd "Marketplace for freelance developers"
# Create a Software Design Document (feeds harness automatically)
/tdd new "Payment integration with Stripe"
# Generate task breakdown from SDD
/tdd breakdown
# Strategic planning
/plan "Q3 roadmap for mobile app"
# Structured brainstorming
/brainstorm "How to implement real-time collaboration"
🔄 Task Management
# Mark a task as complete (with verification gates)
/task-complete
# Update task progress
/task-update 2.1 "70% - API done, UI in progress"
# Check task status
/check-task 3.3
# View project status
/status
🧪 Testing & Deployment
# Generate and run tests
/test "auth flow"
/test coverage
# Production deployment with pre-flight checks
/deploy
# Start local preview server
/preview
How It Works
The Harness Loop
Every code change in Flyee passes through a quality gate:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   PLANNER   │────▶│  GENERATOR  │────▶│  EVALUATOR  │
│  (Expand    │     │  (Write     │     │  (Test via  │
│   prompts   │     │   code)     │◀────│  Playwright)│
│   into      │     │             │ fix │             │
│   specs)    │     └─────────────┘     └──────┬──────┘
└─────────────┘                                │
                                        PASS? ──▶ Ship ✅
                                        FAIL? ──▶ Iterate 🔄
Three Pillars
| Pillar | Source | What |
|--------|--------|------|
| Evaluation Loop | Anthropic | Generator↔Evaluator with sprint contracts and numerical scoring |
| Mechanical Enforcement | OpenAI | Taste invariants — linters that inject corrective instructions |
| Entropy Management | OpenAI | Automated GC to prevent codebase decay |
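To make the "linters that inject corrective instructions" idea concrete, here is a minimal Python sketch. It is not Flyee's implementation: the `TasteRule` type, the `check` function, and both example rules are hypothetical and were not taken from TASTE-RULES.md. The point is only that each finding carries an instruction the agent can act on, rather than a bare error.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TasteRule:
    name: str
    pattern: str      # regex that marks a violation on a single line
    instruction: str  # corrective text handed back to the agent


# Hypothetical rules for illustration only; not Flyee's actual rule set.
RULES = [
    TasteRule("no-console-log", r"\bconsole\.log\(",
              "Replace console.log with the project logger."),
    TasteRule("no-any", r":\s*any\b",
              "Replace 'any' with a concrete type."),
]


def check(source: str, rules=RULES) -> list[str]:
    """Return one corrective instruction per rule violation, in line order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if re.search(rule.pattern, line):
                findings.append(f"line {lineno} [{rule.name}]: {rule.instruction}")
    return findings
```

Calling `check` on generated source returns a list of messages that could be appended to the next prompt, which is one plausible way a linter result becomes a corrective instruction.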
All Execution Scenarios → Harness
| Scenario | Command | Harness Mode | Max Iterations |
|----------|---------|-------------|----------------|
| New project | /harness or /new-project | Full | 5 per sprint |
| New feature | /new-task | Micro | 2 |
| Bug fix | /debug | Post-fix eval | 2 |
| Refactoring | /execute | Micro | 2 |
| Legacy project | /harness --legacy | Legacy | 5 per sprint |
| Continuation | /harness --enhance | Enhance | 5 per sprint |
| Evaluation only | /harness --eval-only | Eval | 1 |
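The iteration caps in the table suggest a bounded generate-evaluate loop. The Python sketch below shows one way such a loop could work, under stated assumptions: `run_loop`, `Evaluation`, and the mode keys are hypothetical names, and only the numeric caps come from the table. The real harness may structure its loop differently.

```python
from dataclasses import dataclass
from typing import Callable

# Caps copied from the table above; the mode keys are illustrative names.
MAX_ITERATIONS = {
    "full": 5,      # per sprint
    "micro": 2,
    "post-fix": 2,
    "legacy": 5,    # per sprint
    "enhance": 5,   # per sprint
    "eval": 1,
}


@dataclass
class Evaluation:
    passed: bool
    feedback: str


def run_loop(
    mode: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Evaluation],
) -> tuple[bool, int]:
    """Generate, evaluate, and retry with feedback until a pass or the cap.

    Returns (passed, iterations_used).
    """
    cap = MAX_ITERATIONS[mode]
    feedback = ""
    for iteration in range(1, cap + 1):
        artifact = generate(feedback)   # generator sees the last feedback
        result = evaluate(artifact)     # evaluator grades the artifact
        if result.passed:
            return True, iteration
        feedback = result.feedback
    return False, cap
```

With a generator that only succeeds once it has received feedback, a `"micro"` run passes on its second iteration, while an `"eval"` run stops after a single pass through the evaluator.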
What You Get
🤖 24 Specialist Agents
| Agent | Domain |
|-------|--------|
| harness-planner | Autonomous spec expansion, PRD/SDD adaptation |
| harness-evaluator | QA via Playwright, scoring, micro/post-fix modes |
| orchestrator | Multi-agent coordination |
| frontend-specialist | Web UI/UX (React, Next.js, Vite) |
| backend-specialist | API, databases, server |
| mobile-developer | iOS, Android, React Native, Flutter |
| security-auditor | OWASP, vulnerability scanning |
| debugger | Systematic 4-phase debugging |
| game-developer | Game development |
| ... and 15 more | Various domains |
🧩 88 Modular Skills
Skills are knowledge modules that enhance agent capabilities:
- Taste Invariants — Mechanical code quality enforcement
- Agent Legibility — Code optimized for agent comprehension
- Entropy Management — Automated GC for agent codebases
- Sprint Contract — Generator↔Evaluator negotiation
- Harness Doc Adapter — Converts PRD/SDD to harness specs
- Evaluation Criteria — Domain-specific scoring profiles
- Design System Enforcement — Token-based UI consistency
- Testing Patterns — Unit, integration, E2E strategies
- Quality Gates — Structured quality verification
- Knowledge Persistence — Cross-session memory
- Hallucination Guard — Detects empty completions
- Stuck Detection — Breaks infinite agent loops
- ... and 76 more
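Two of the skills listed above, Stuck Detection and Hallucination Guard, describe checks that are easy to picture in code. The Python sketch below is an assumption about how such checks might look, not Flyee's implementation: `StuckDetector`, its `window` parameter, and `is_empty_completion` are hypothetical names, and treating "stuck" as "the same output several times in a row" is a simplification.

```python
import hashlib
from collections import deque


class StuckDetector:
    """Flags a loop as stuck when the last `window` outputs are identical."""

    def __init__(self, window: int = 3):
        self.window = window
        self._recent = deque(maxlen=window)  # digests of the latest outputs

    def observe(self, output: str) -> bool:
        """Record an output and report whether the loop now looks stuck."""
        digest = hashlib.sha256(output.strip().encode("utf-8")).hexdigest()
        self._recent.append(digest)
        return len(self._recent) == self.window and len(set(self._recent)) == 1


def is_empty_completion(output: str) -> bool:
    """Treat whitespace-only output as an empty completion."""
    return not output.strip()
```

A harness could call `observe` after every agent turn and abort or change strategy when it returns `True`; a production check would likely also compare near-duplicates rather than exact matches.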
Architecture
your-project/
├── .agent/ ← Installed by flyee (runtime-specific)
│ ├── agents/ ← 24 specialist agents
│ ├── skills/ ← 88 modular skills
│ ├── workflows/ ← 46 automated workflows
│ ├── scripts/ ← Automation scripts (harness_runner.py)
│ └── bridge/ ← Optional: Flyee SaaS sync
├── .flyee/ ← ALL Flyee artifacts (unified namespace)
│ ├── AGENTS.md ← Agent-legible project overview
│ ├── harness-state.json ← Generator↔Evaluator loop state
│ ├── KNOWLEDGE.md ← Cross-session persistent knowledge
│ ├── spec/ ← Product specs, PRDs, SDDs, Design System
│ │ ├── SPEC-{name}.md
│ │ └── DESIGN-SYSTEM.md
│ ├── rules/ ← Mechanical enforcement
│ │ └── TASTE-RULES.md
│ ├── discovery/ ← Codebase analysis
│ │ └── DISCOVERY.md
│ ├── sprints/ ← Contracts, evaluations, progress
│ │ ├── CONTRACT-S{N}.md
│ │ ├── EVAL-S{N}-I{M}.md
│ │ └── PROGRESS.md
│ ├── audits/ ← Holistic evaluations & screenshots
│ │ └── AUDIT-{date}.md
│ └── decisions/ ← Architecture Decision Records
│ └── ADR-{NNN}.md
├── GEMINI.md ← Engine file (or CLAUDE.md, COPILOT.md, etc.)
└── your code...
Optional: Flyee SaaS Integration
Connect to the Flyee Platform for team collaboration:
# Configure connection
npx flyee --connect <api-key>
When connected, you get:
- 📊 Team dashboard with project progress
- 🔄 Task sync across team members
- 💰 Cost analytics and budget tracking
- 📈 Gantt visualization of project timeline
- 🎯 OKR management and alignment
Without the SaaS connection, flyee works fully offline and keeps its state locally in .flyee/.
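Because local state lives in .flyee/ and follows the naming templates shown in the Architecture tree, tooling around Flyee can locate artifacts by path alone. The Python sketch below builds those paths. The helper names are hypothetical, and two details are assumptions: sprint and iteration numbers are written without padding, and `ADR-{NNN}` is read as a three-digit zero-padded number.

```python
from pathlib import Path

FLYEE_DIR = Path(".flyee")


def contract_path(sprint: int) -> Path:
    """Path for CONTRACT-S{N}.md (assumes N is not zero-padded)."""
    return FLYEE_DIR / "sprints" / f"CONTRACT-S{sprint}.md"


def eval_path(sprint: int, iteration: int) -> Path:
    """Path for EVAL-S{N}-I{M}.md (assumes N and M are not zero-padded)."""
    return FLYEE_DIR / "sprints" / f"EVAL-S{sprint}-I{iteration}.md"


def adr_path(number: int) -> Path:
    """Path for ADR-{NNN}.md (assumes three-digit zero padding)."""
    return FLYEE_DIR / "decisions" / f"ADR-{number:03d}.md"
```

For example, `eval_path(2, 1)` points at the first evaluation of sprint 2 under `.flyee/sprints/`.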
Supported Runtimes
| Runtime | Status | Config Dir | Engine File |
|---------|--------|-----------|-------------|
| Antigravity (Gemini) | ✅ | .agent/ | GEMINI.md |
| Claude Code | ✅ | .claude/ | CLAUDE.md |
| Cursor | ✅ | .cursor/ | .cursorrules |
| GitHub Copilot | ✅ | .github/ | COPILOT.md |
| OpenAI Codex | ✅ | .codex/ | CODEX.md |
| Windsurf | ✅ | .windsurf/ | WINDSURF.md |
| Gemini CLI | ✅ | .gemini/ | GEMINI.md |
| OpenCode | ✅ | .opencode/ | OPENCODE.md |
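The table above can be read as a lookup from runtime to config directory and engine file. The Python sketch below encodes it that way and adds a naive detector. The keys reuse the CLI flag names from Quick Start; `detect_runtimes` is a hypothetical helper, and checking for an existing config directory is only a guess at how `npx flyee` auto-detection might work, since the README does not document it.

```python
from pathlib import Path

# runtime flag name -> (config dir, engine file), copied from the table above
RUNTIMES = {
    "antigravity": (".agent", "GEMINI.md"),
    "claude": (".claude", "CLAUDE.md"),
    "cursor": (".cursor", ".cursorrules"),
    "copilot": (".github", "COPILOT.md"),
    "codex": (".codex", "CODEX.md"),
    "windsurf": (".windsurf", "WINDSURF.md"),
    "gemini": (".gemini", "GEMINI.md"),
    "opencode": (".opencode", "OPENCODE.md"),
}


def detect_runtimes(project_root) -> list[str]:
    """List runtimes whose config directory exists under project_root.

    Assumption: directory presence is the detection signal. This is a
    heuristic for illustration, not Flyee's documented behavior.
    """
    root = Path(project_root)
    return [
        name
        for name, (config_dir, _engine_file) in RUNTIMES.items()
        if (root / config_dir).is_dir()
    ]
```

One caveat with this heuristic: `.github/` exists in many repositories that do not use Copilot, so a real detector would need a stronger signal for that runtime.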
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT — see LICENSE.
Built with ❤️ by Bruno Santana
