red64-cli v0.11.0
Red64 Flow Orchestrator - Deterministic, spec-driven development CLI
Red64 CLI
30,000 hours of experience building software. The SDLC that makes AI-generated code maintainable.
20 years of building products. 1+ year of AI-first development. Captured in a CLI.
TDD built in. Code smells to avoid. Documentation required. Quality gates enforced. The process that turns AI code into production-ready software. The result? Code that lives and evolves—not legacy the day it ships.
Quick Start · Why Red64 · Features · Documentation
🎯 The Problem
I've spent 20 years building products and writing software. 30,000 hours of experience. Then I went all-in on AI coding tools.
They're incredible for building a feature. But then you start iterating—and you hit a wall:
- ❌ Code quality goes down the drain
- ❌ No testing (or tests written after the fact)
- ❌ No documentation/specs (good luck iterating on anything)
- ❌ No careful design review, no code review
- ❌ No quality gates—code smells everywhere
- ❌ Large commits that can't be easily rolled back
- ❌ No non-regression tests, so things start breaking
This is the same problem that arises in any team with no processes, no gates, no constraints.
✅ The Solution
The solution is what I've been doing for 20 years: a Software Development Life Cycle (SDLC) and processes. The stuff tech leaders and experienced software professionals implement in their teams. The stuff that separates "it works" from "it's maintainable."
Red64 CLI captures both:
- My 30,000 hours of experience — code smells to avoid, patterns that scale, production wisdom
- My process for working with AI — the SDLC that makes AI-generated code maintainable
The process (HOW the software professional works):
- Isolate every feature in a branch (git worktree)
- Write tests FIRST (TDD built in)
- Small atomic commits (one thing per commit)
- Document everything (REQUIREMENTS.md, DESIGN.md)
- High test coverage enforced
- Quality gates at every phase
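The isolation step in the process above can be sketched with plain git. This is a hand-rolled illustration of what Red64 automates, not Red64's actual internals; the repository and branch names are made up:

```shell
# Sketch of per-feature isolation via git worktree (what Red64 automates).
# Paths and names are illustrative.
cd "$(mktemp -d)"
git init -q main && cd main
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "init"

# One worktree per feature: a separate directory on a separate branch,
# so the main checkout is never disturbed while the feature is built.
git worktree add ../user-auth -b feature/user-auth
ls ../user-auth   # a full, isolated checkout of the new branch
```

Because each feature lives in its own directory, two features can be worked on in parallel without stashing or branch-switching.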
The expertise (WHAT the software professional builds):
- Code smells to avoid (the stuff that breaks at 3 AM)
- Patterns and anti-patterns for Python, Next.js, Ruby, Rails, and more
- Stack-specific conventions (Next.js, Rails, FastAPI, etc.)
The result: Code that lives and evolves. We've rewritten features in another language in days because the documentation is so complete.
Legend: work in parallel in two separate worktrees; preview generated documentation with markdown and diagram rendering.
🚀 Quick Start
```shell
# Install
npm install -g red64-cli

# Initialize in your project
cd /path/to/your/project
red64 init --stack nextjs

# Start a feature (interactive mode)
red64 start "user-auth" "Add login and registration with JWT"

# Or YOLO mode — no babysitting required
red64 start "shopping-cart" "Full cart with checkout" --sandbox -y
```

That's it. Red64 generates requirements → design → tests → implementation → documentation.
Each phase has review checkpoints. Each task = one clean commit. Tests first. Docs included.
🔥 YOLO Mode (No Babysitting)
Tired of approving every line?
```shell
red64 start "feature-name" "description" --sandbox -y
```

- `--sandbox` = Docker isolation (the AI can't break your system; pulls the image from `ghcr.io/red64llc/red64-sandbox`)
- `-y` = Auto-approve all phases (total autonomy)
Start a feature. Go to lunch. Come back to a completed branch—with tests, docs, and clean commits.
With other tools, YOLO mode means "write code fast with no oversight." With Red64, autonomous mode means "follow the SDLC with no babysitting."
The AI still:
- Writes tests FIRST (TDD enforced)
- Documents everything (REQUIREMENTS.md, DESIGN.md)
- Makes atomic commits (easy to review, easy to rollback)
- Passes quality gates (no code smells ship)
Review the PR when it's done. Like a senior engineer delegating to a junior who's been properly onboarded.
🏆 Battle-Tested
We built 6 production products with Red64 at red64.io/ventures:
| Company | Industry | Status |
|---------|----------|--------|
| Saife | InsurTech | Production |
| EngineValue | Engineering Scorecards | Production |
| MediaPulse | Digital Presence | Production |
| QueryVault | Data Platform | Production |
| Kafi (internal product) | Virtual Executive Assistant | Production |
Same tool. Same encoded experience. Now open source.
💡 Why Red64?
Two Decades of Experience, Encoded
I've spent 20 years building products—30,000 hours of learning what works and what breaks. Then I spent a year going all-in on AI coding tools.
The pattern is always the same:
- Week 1: "This is amazing! I shipped a feature in a day!"
- Week 4: "Why is everything breaking? Why is the code so messy?"
- Week 8: "I'm afraid to touch anything. Time to rewrite."
The missing ingredient? SDLC. The stuff that takes 20 years to learn. The stuff I've been teaching engineers my entire career.
Red64 gives you both:
| What Goes Wrong Without SDLC | Red64 Solution |
|------------------------------|----------------|
| No tests → things break when you iterate | TDD built in (tests FIRST) |
| No docs → can't remember why anything works | REQUIREMENTS.md + DESIGN.md per feature |
| Huge commits → can't rollback, can't review | Atomic commits (one task = one commit) |
| No quality gates → code smells everywhere | Guardrails from 30K hours of experience |
| Babysitting every line → slow, exhausting | Autonomous mode with SDLC guardrails |
What You Get Per Feature
```
feature-branch/
├── REQUIREMENTS.md      # What we're building and why
├── DESIGN.md            # How it works, architecture decisions
├── TASKS.md             # Atomic breakdown with acceptance criteria
├── src/
│   ├── feature.ts       # Implementation
│   └── feature.test.ts  # Tests (written first)
└── docs/
    └── feature.md       # User-facing documentation
```

Every decision traceable. Every line has a reason. Code that survives iteration.
📊 Comparison
| Feature | Red64 | Cursor | Copilot | Claude Code | Gemini CLI | Aider |
|---------|:-----:|:------:|:-------:|:-----------:|:----------:|:-----:|
| 30K hours expertise encoded | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SDLC/Process enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Autonomous mode | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Sandboxed execution | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| MCP support | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| TDD enforced (tests first) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| High coverage enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Auto-generates docs | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Git worktree isolation | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Atomic commits enforced | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Phase gates with review | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Code smell guardrails | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Resumable multi-step flows | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Multi-model support | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Battle-tested (production) | ✅ 6 cos | N/A | N/A | N/A | N/A | N/A |
Key: ✅ = Built-in & enforced | ⚠️ = Partial/Optional | ❌ = Not available
The difference: Other tools have autonomous modes. Red64 has autonomous mode plus the encoded expertise and enforced process that produces production-quality code.
When to Use Red64
✅ Use Red64 when:
- Building complete features (not quick fixes)
- You want code with tests, docs, and clean history
- You need to walk away and let AI work autonomously
- You're tired of babysitting every line
- You want code that's safe to refactor
❌ Use other tools when:
- Making quick, single-file edits
- You want real-time IDE autocomplete
- Exploring or prototyping ideas
⚡ Features
Multi-Agent Support
Use your preferred AI:
```shell
red64 init --agent claude   # Default
red64 init --agent gemini   # Google Gemini
red64 init --agent codex    # OpenAI Codex
```

Local Development Stack (Ollama)
Run Red64 with local open-source models via Ollama — no API costs, full privacy:
```shell
# One-time setup
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder-next   # ~46GB, runs on a 64GB MacBook

# Use with Red64
red64 start "feature" "description" --model qwen3-coder-next --ollama
```

The `--ollama` flag configures Red64 to use your local Ollama instance at localhost:11434.
Works with sandbox mode:
```shell
red64 start "feature" "description" --model qwen3-coder-next --ollama --sandbox -y
```

Or set environment variables directly:
```shell
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
red64 start "feature" "description" --model qwen3-coder-next
```

Recommended local models:

- `qwen3-coder-next` — 80B MoE, 3B active params (best quality)
- `deepseek-coder-v2` — strong coding performance
- `codellama` — Meta's code-focused Llama
Note: Local models are slower than cloud APIs (~10-30 tok/s vs instant) and have smaller context windows (32K-64K vs 200K). Best for cost-sensitive development or air-gapped environments.
Smart Resume
Interrupted? Just run start again:
```shell
red64 start "shopping-cart" "..."
# Detects in-progress flow, offers to resume
```

MCP Server Support
Configure MCP servers once, and Red64 automatically injects them into whichever agent you use (Claude, Gemini, or Codex):
```shell
# Add an MCP server
red64 mcp add context7 npx -y @upstash/context7-mcp

# List configured servers
red64 mcp list

# Remove a server
red64 mcp remove context7
```

MCP servers are stored in .red64/config.json and translated into each agent's native config format before invocation. Configs are cleaned up after execution so your personal agent settings stay untouched.
Works in both local and --sandbox mode (stdio servers run inside the container).
Note on Playwright
Playwright's capabilities are already included in the Docker image via Vercel's AI-native browser automation CLI, so there's no need to add the Playwright MCP server when running in sandbox mode.
Steering Documents
Customize AI behavior in .red64/steering/:
- product.md — Product vision, user personas
- tech.md — Stack standards, code smells to avoid
- structure.md — Codebase organization
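As a concrete example, here is one way to bootstrap a minimal tech.md steering document. The file contents are purely illustrative, there is no required schema; any markdown the AI should follow will do:

```shell
# Create a minimal, illustrative tech.md steering document.
mkdir -p .red64/steering
cat > .red64/steering/tech.md <<'EOF'
# Tech Standards

## Stack
- Next.js 14 (App Router), TypeScript strict mode

## Code smells to avoid
- Functions over 40 lines
- `any` types outside test fixtures
- Business logic inside React components
EOF
```

Red64 reads these files from .red64/steering/ and uses them to steer the agent's output for every feature in the project.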
📖 Documentation
🛠 Commands
```shell
red64 init --agent gemini      # Initialize Red64 in your project
red64 start <feature> <desc>   # Start a new feature
red64 start ... --sandbox -y   # YOLO mode (autonomous)
red64 status [feature]         # Check flow status
red64 list                     # List all active flows
red64 abort <feature>          # Abort and clean up
red64 mcp list                 # List configured MCP servers
red64 mcp add <name> <cmd>     # Add an MCP server
red64 mcp remove <name>        # Remove an MCP server
```

Flags
| Flag | Description |
|------|-------------|
| -y, --yes | Auto-approve all phases (YOLO mode) |
| --sandbox | Run in Docker isolation (uses GHCR image by default) |
| --local-image | Build and use local sandbox image instead of GHCR (init only) |
| -m, --model | Override AI model |
| -a, --agent | Set coding agent (claude/gemini/codex) |
| --ollama | Use local Ollama backend (localhost:11434) |
| --verbose | Show detailed logs |
🤝 Contributing
We'd love your help encoding more production wisdom:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: `npm test`
- Submit a pull request
What we're looking for:
- More code smells to catch
- Stack-specific best practices
- Bug fixes and improvements
📜 License
MIT — Built by Yacin Bahi at Red64.io
The code isn't the asset. The documentation + tests + history is the asset. The code is just the current implementation.
⭐ Star this repo if you believe AI should write code like a senior engineer.
