solo-cto-agent

v1.4.0

Published

a month ago

The full CTO loop for solo founders: dual-agent review, 3-round consensus, auto-rework, before/after visual report, natural-language orders, Telegram/Discord control surface, auto-merge.

solo-cto-agent

The full CTO loop for solo founders — dual-agent review, multi-turn consensus, auto-rework, before/after visual reports, and a Telegram/Discord control surface.

You push code. Two AI agents review it, debate for up to three rounds until they reach consensus, auto-push fixes for any blockers, shoot before/after screenshots of your Vercel preview, and ping your phone on Telegram with action buttons. When you tap ✅ Merge (or set the auto-merge-when-ready label), GitHub merges it once CI is green. You stay in the loop from your phone; you never touch YAML.

npm i -g solo-cto-agent
solo-cto-agent init --wizard         # pick your repos, pick your tier
solo-cto-agent do "fix the auth bug in tribo"   # natural-language work order
# ...PR opens → review → rework → merge, all visible in Telegram.

Surfaces: CLI · GitHub Action · VS Code Extension · Telegram bot · optional Discord mirror.

Languages: English (primary); Korean guide below.

Quickstart

# 1. Install + wizard (wizard auto-discovers your repos via gh CLI,
#    creates the orchestrator repo on GitHub, sets TRACKED_REPOS,
#    and installs workflows on every product repo you pick.)
npm install -g solo-cto-agent
npx solo-cto-agent init --wizard

# 2. Keys (all set in your shell, then copied to repo secrets by setup)
export ANTHROPIC_API_KEY="sk-ant-..."     # required — Claude review + NL orders
export OPENAI_API_KEY="sk-..."            # required for dual-agent / CTO tier
export ORCHESTRATOR_PAT="ghp_..."         # required — cross-repo dispatch + write
export TELEGRAM_BOT_TOKEN="..."           # optional — PR notifications + /commands
export TELEGRAM_CHAT_ID="..."             # optional
export DISCORD_WEBHOOK_URL="https://..."  # optional — mirror of Telegram output
export VERCEL_TOKEN="..."                 # optional — before/after visual reports

# 3. Run setup-pipeline (reads saved wizard selection; no manual --repos needed)
solo-cto-agent setup-pipeline --org <your-github-org>

# 4. Verify
solo-cto-agent doctor

# 5. Kick off a real work order
solo-cto-agent do "add a monthly ARPU chart to the tribo admin dashboard"
#   → LLM picks target repo, drafts a spec issue, labels agent-claude
#   → claude worker opens a PR
#   → cross-reviewer.js runs 3-round consensus
#   → rework-agent.js pushes fixes if needed
#   → visual-report.yml posts before/after screenshots
#   → combined-pr-gate.yml sends "all checks passed" to Telegram
#   → auto-merge-when-ready label makes GitHub merge on CI green
#   → pr-merge-notify.yml sends final "✅ merged" to Telegram/Discord

Every step above ships end-to-end today. The doctor subcommand reports anything that is missing, along with the exact command to fix it.
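The 3-round consensus step can be pictured as a bounded loop. The sketch below is illustrative only: `runConsensus`, the reviewer callback shape, and the fallback to REQUEST_CHANGES when the agents still disagree are assumptions, not the actual cross-reviewer.js API. Real reviewers would be asynchronous LLM calls; they are plain functions here to keep the example short.

```javascript
// Hypothetical sketch of a bounded consensus loop (not the real cross-reviewer.js).
// Each reviewer is a function (diff, peerPosition) -> { verdict, blockers }.
function runConsensus(diff, reviewerA, reviewerB, maxRounds = 3) {
  let a = reviewerA(diff, null);
  let b = reviewerB(diff, null);
  for (let round = 1; round <= maxRounds; round++) {
    const blockers = [...new Set([...a.blockers, ...b.blockers])];
    if (a.verdict === b.verdict) {
      return { consensus: true, round, verdict: a.verdict, blockers };
    }
    if (round === maxRounds) {
      // Assumption: with no agreement, the stricter verdict wins.
      return { consensus: false, round, verdict: "REQUEST_CHANGES", blockers };
    }
    // Each agent sees the other's previous position before the next round.
    const nextA = reviewerA(diff, b);
    const nextB = reviewerB(diff, a);
    a = nextA;
    b = nextB;
  }
}

// Usage: the second reviewer concedes once it sees the peer's position.
const approve = () => ({ verdict: "APPROVE", blockers: [] });
const concede = (diff, peer) =>
  peer
    ? { verdict: peer.verdict, blockers: [] }
    : { verdict: "REQUEST_CHANGES", blockers: ["missing RLS policy"] };
const outcome = runConsensus("diff text", approve, concede);
```

Capping the loop at a fixed number of rounds keeps review cost predictable, which matters when every round is a paid API call.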

Telegram bot — the phone-first control surface

After you set TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID, the bot gives you:

| Command | What it does |
|---|---|
| /status [repo] | Open, non-draft PRs + review state across tracked repos |
| /list [repo] | Last 10 PRs, one-line summary each |
| /rework <pr> | Force a rework cycle on an existing PR |
| /approve <pr> | Approve the PR |
| /do "<instruction>" | Natural-language work order (same as CLI do) |
| /digest | Today's PR activity summary |
| /merge <pr> | Immediate merge (admin-only: TELEGRAM_ADMIN_CHAT_IDS) |

Every review / rework / visual-report message includes inline buttons — ✅ Approve · ❌ Reject · 🔧 Rework · 🔀 Merge. Tap to act without leaving Telegram.
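For readers wiring up something similar, inline buttons in the Telegram Bot API are sent as a `reply_markup.inline_keyboard` array on a `sendMessage` call. The sketch below only builds the payload; the `buildReviewMessage` name and the `action:repo:pr` callback format are made up for illustration and are not taken from this package.

```javascript
// Builds a sendMessage payload with the four action buttons.
// The "action:repo:pr" callback_data format is a hypothetical convention.
function buildReviewMessage(chatId, repo, prNumber, summary) {
  const actions = [
    ["✅ Approve", "approve"],
    ["❌ Reject", "reject"],
    ["🔧 Rework", "rework"],
    ["🔀 Merge", "merge"],
  ];
  const buttons = actions.map(([text, action]) => {
    const callback_data = `${action}:${repo}:${prNumber}`;
    // Telegram limits callback_data to 64 bytes.
    if (Buffer.byteLength(callback_data, "utf8") > 64) {
      throw new Error(`callback_data too long for ${repo}#${prNumber}`);
    }
    return { text, callback_data };
  });
  return {
    chat_id: chatId,
    text: `PR #${prNumber} (${repo}): ${summary}`,
    // Two rows of two buttons each.
    reply_markup: { inline_keyboard: [buttons.slice(0, 2), buttons.slice(2)] },
  };
}

const payload = buildReviewMessage("123456", "tribo", 42, "review finished");
```

The payload would then be POSTed as JSON to the bot's `sendMessage` endpoint; a tap on a button arrives back as a callback query carrying the same `callback_data` string.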

If DISCORD_WEBHOOK_URL is set, visual-change screenshots and auto-diagnose reports mirror to Discord as attachments.

External services

| Service | Used for | Required? | Setup |
|---|---|---|---|
| GitHub | orchestrator repo + product repo workflows | ✅ required | wizard auto-creates orchestrator repo via gh repo create |
| Anthropic | Claude consensus review, NL orders, rework | ✅ required | ANTHROPIC_API_KEY env var |
| OpenAI | Codex counter-review (dual-agent) | CTO tier | OPENAI_API_KEY env var |
| Vercel | preview URLs for before/after visual-report | optional | VERCEL_TOKEN + VERCEL_PROJECT_ID. Works with Netlify / Cloudflare Pages / Render / Railway previews too if their deployment_status webhooks fire; the visual stage resolves the URL from SHA and shoots whichever host serves it. |
| Telegram | notifications + CTO commands | optional | /telegram wizard command + bot token |
| Discord | optional mirror of Telegram output | optional | DISCORD_WEBHOOK_URL on orchestrator secrets |
| Browserless | alternate screenshot provider (skips Playwright install cost) | optional | VISUAL_REVIEW_PROVIDER=browserless + BROWSERLESS_API_KEY |

Compatibility

Stack-agnostic. The toolkit never touches your application code directly. Agents produce patches that land on your PR branch; your repo's existing CI / build tools verify them. Works with Next.js, Vite, Remix, SvelteKit, FastAPI, Rails — anything with a PR workflow.
Hosting-agnostic. Vercel is the default for the visual-report preview URL resolver, but any host that ships preview URLs tied to commit SHAs works. For hosts without that (plain Docker, bare-metal, self-hosted): set VISUAL_REVIEW_PROVIDER=off and the pipeline just skips the visual stage — everything else still runs.
Database-agnostic. The toolkit doesn't read or write your database. Postgres (Supabase / Neon / PlanetScale-Postgres), MySQL, SQLite, MongoDB — all fine. Agent workers DO see your schema files if they're in the repo (prisma/schema.prisma, supabase/schema.sql, etc.) so suggested fixes can be schema-aware.
Docker. If your product repo is dockerized, nothing changes — GitHub Actions runners handle the build per your existing Dockerfile. The agents commit to the PR branch, your CI rebuilds the container, the auto-merge gate waits on that CI.
Windows / macOS / Linux for the CLI. GitHub Actions runners are Linux for all automation paths.

Platform-specific CLI setup

macOS / Linux

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."   # optional for cowork-main, required for codex-main
solo-cto-agent doctor

Windows PowerShell

$env:ANTHROPIC_API_KEY="sk-ant-..."
$env:OPENAI_API_KEY="sk-..."   # optional for cowork-main, required for codex-main
solo-cto-agent doctor

If you choose codex-main during the wizard, also set up:

GitHub CLI: cli.github.com
GitHub PAT for cross-repo dispatch: github.com/settings/personal-access-tokens/new
- Classic PAT: check repo + workflow scopes.
- Fine-grained PAT: grant Contents: write, Issues: write, Pull requests: write, Actions: write on every product repo listed in setup-pipeline --repos. The orchestrator pushes fix commits and posts comments on those repos on your behalf.

If you choose codex-main, template drift audit is enabled by default:

local check: solo-cto-agent template-audit
scheduled check: template-audit.yml in the orchestrator repo
policy: report-only by default, so drift is detected but never auto-overwritten

I made this because I got tired of using AI coding tools that were good at writing code, but still left me doing all the messy CTO work around it.

The hard part was rarely "write the feature." It was everything around the feature:

catching missing env vars before a deploy breaks
not re-explaining the same stack every new session
stopping error loops before they waste half an hour
getting honest pushback on ideas instead of empty encouragement
cleaning up UI that looks obviously AI-generated

This repo is my attempt to package those habits into a small set of reusable skills. It is not magic. It is not a replacement for judgment. It is just a better operating system for the kind of AI agent I wanted to work with.

What this is

solo-cto-agent is an opinionated CTO toolkit for solo founders, indie hackers, and small teams using AI coding agents in their build workflow.

Primary workflow: Claude Cowork + OpenAI Codex. Cowork-only is supported for single-agent use, but this document assumes Cowork + Codex unless noted.

The point is simple:

less repetitive setup work
less context loss between sessions
two models cross-checking each other's review (not one model's opinion)
actual criticism before you commit to bad ideas
secrets caught before they leave your machine

What changes in practice

This is the difference I wanted in day-to-day use:

| Without this | With this |
| --- | --- |
| Same build error over and over | Circuit breaker stops the loop and summarizes the likely cause |
| "Please add this manually to your dashboard" | Agent checks setup earlier and asks once when needed |
| New session, same explanation again | Important decisions get reused |
| Rounded-blue-gradient AI UI | Design checks push for more intentional output |
| "Looks good to me" feedback | Review forces actual criticism |
| Agent asks permission for every tiny step | Low-risk work gets done without constant back-and-forth |

Production numbers

This is running on three private repos (Next.js + Supabase, Vite + React, Next.js + Prisma). Numbers from the last 30 days:

| Metric | Value |
| --- | --- |
| PRs opened | 53 |
| PRs merged | 48 |
| Mean time to merge | 0.64 hours |
| Test suite | 996 tests, 57 files, all passing |
| CLI commands | 25 subcommands |
| Skills | 8 (44 reference docs) |
| npm version | 1.3.2 |

Dual-agent cross-review and Managed Agents deep review are live and tested against real diffs. Decision tracking is wired but the decision queue has not produced enough data for meaningful stats yet.

Who this is for

This repo is probably useful if you:

build mostly alone or with a very small team
already use Claude Cowork (optionally with Codex) as your primary AI coding workflow
want the agent to take more initiative
care about startup execution, not just code completion
are okay with opinionated defaults

It is probably not a good fit if you:

work in a tightly locked-down enterprise environment
do not want agents touching files or setup
want every action manually approved
prefer a neutral framework-agnostic starter pack with very conservative defaults

Operating modes

Choose a mode during init --wizard. The same package supports both.

| Mode | Default behavior | Best for |
|---|---|---|
| codex-main | Full CI/CD automation (GitHub Actions, auto-review, auto-rework) | Stable GitHub Actions + webhook environments |
| cowork-main | Local-first with manual sync (wizard + local review/sync) | Offline work, minimal external dependencies |

codex-main — PR opened → Claude review → Codex cross-review → rework loop → merge conditions. Agent scores auto-updated in orchestrator repo.

cowork-main — Local review/learn commands work without GitHub Actions. sync --apply pulls latest scores/patterns when you choose.

The selected mode is saved in ~/.claude/skills/solo-cto-agent/SKILL.md.
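The codex-main flow above ends in "merge conditions". One way to think about such a gate is as a pure function over PR state. This is a hedged sketch: the `auto-merge-when-ready` label comes from the Quickstart, but the `mergeDecision` name, the shape of the `pr` object, and the exact set of conditions are assumptions rather than the pipeline's real logic.

```javascript
// Hypothetical merge gate: returns whether to merge and, if not, why.
// The pr fields (draft, labels, ciStatus, reviewVerdict) are illustrative.
function mergeDecision(pr) {
  const reasons = [];
  if (pr.draft) reasons.push("PR is still a draft");
  if (!pr.labels.includes("auto-merge-when-ready")) {
    reasons.push("auto-merge-when-ready label not set");
  }
  if (pr.ciStatus !== "success") reasons.push(`CI is ${pr.ciStatus}`);
  if (pr.reviewVerdict !== "APPROVE") {
    reasons.push(`review verdict is ${pr.reviewVerdict}`);
  }
  return { merge: reasons.length === 0, reasons };
}

const ready = mergeDecision({
  draft: false,
  labels: ["agent-claude", "auto-merge-when-ready"],
  ciStatus: "success",
  reviewVerdict: "APPROVE",
});
```

Returning the list of reasons, not just a boolean, makes it easy to report in a notification exactly what is still blocking a merge.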

Tool entry points

This pack is designed for Cowork + Codex. Start from the Claude entry point and expand only if you need automation.

| Tool | Entry point | Status |
|---|---|---|
| Claude (Cowork + CLI) | docs/claude.md | Supported (primary) |

Gamma users can still use the toolkit today, but Gamma is not a core runtime. The intended flow is:

use solo-cto-agent to generate, review, tighten, and validate the content or product narrative
move the final output into Gamma for presentation publishing

That keeps the core position stable: Cowork + Codex are the operating surface, Gamma is a downstream publishing surface.

Examples

Real-world flows, each written in the same four-part shape (input -> agent behavior -> output -> pain reduced). Start with whichever subfolder matches your bottleneck:

examples/build/ - writing features, escaping recurring error loops
examples/ship/ - pre-deploy env lint, idempotent release pipeline
examples/review/ - dual-review blockers, UI/UX vision gates
examples/founder-workflow/ - session brief, idea critique

If you want the live codex-main proof first, start here:

docs/codex-main-validation.svg - one-page proof card for codex solo and codex + cowork
docs/codex-main-live-validation.md - repeatable runbook for fresh codex solo and codex + cowork validation on real repos
examples/ship/codex-main-setup-on-live-project.md - real full-auto install on a private Next.js app
examples/review/codex-main-codex-solo-routing.md - verified single-agent Codex routing path
examples/review/codex-main-codex-plus-cowork.md - verified dual-agent routing path
examples/review/codex-main-live-pr-review.md - real PR-open automation timings and outputs
examples/founder-workflow/codex-main-live-rework-and-digest.md - real rework-round comments and scheduled digest behavior

See examples/README.md for the full index.

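Korean guide

solo-cto-agent is a practical CTO skill pack optimized for the Claude Cowork + OpenAI Codex workflow. The goal is not "writing code" in itself, but helping you make better judgments across deployment, review, design, and decision-making.

Key points:

Repeated build/deploy errors are cut off automatically by the circuit breaker.
Dual-review + cross-check catches real blockers instead of producing empty-praise reviews.
Vision checks in UI/UX design block generic AI-looking design.
Session queue, briefing, and memory reduce context loss.

Quick start: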

npm install -g solo-cto-agent
solo-cto-agent init
export ANTHROPIC_API_KEY="sk-ant-..."   # https://console.anthropic.com/settings/keys
solo-cto-agent doctor                    # check install status + report anything missing
solo-cto-agent review                    # run your first review

Guide links:

Claude entry point: docs/claude.md
Example collection: examples/README.md
Install/setup (Korean): docs/cowork-main-install.md
Configuration/customization: docs/configuration.md
Tier comparison/examples: docs/tier-matrix.md, docs/tier-examples.md
CTO tier policy: docs/cto-policy.md
External loop policy: docs/external-loop-policy.md
Feedback guide: docs/feedback-guide.md
Skill slimming: docs/skill-slimming.md

What's inside

solo-cto-agent/
  bin/                        # CLI (25 commands)
    cli.js
    self-evolve/              # 9 modules: error-collector, quality-analyzer, skill-scout, ...
  skills/
    build/      SKILL.md + references/   # prereq scan, error loop breaker, deploy checklist
    ship/       SKILL.md + references/   # deploy monitor, circuit breaker, recovery loop
    craft/      SKILL.md + references/   # typography, color, spacing, anti-slop
    spark/      SKILL.md + references/   # market scan, unit economics, risk framing
    review/     SKILL.md + references/   # 3-perspective critique, scoring, synthesis
    memory/     SKILL.md + references/   # session context, knowledge articles, episode log
    orchestrate/ SKILL.md               # multi-agent routing, CI/CD dispatch
    self-evolve/ SKILL.md               # error pattern learning, skill improvement, weekly report
    _shared/                             # agent-spec, skill-context (shared across skills)
  templates/
    orchestrator/             # full orchestrator repo scaffold (workflows, agents, api, ops)
    product-repo/             # product repo scaffold (workflows, STATE.md, .env.example)
    builder-defaults/         # routing-policy.json, agent-scores.json
    workflows/                # solo-cto-review.yml (3-pass auto-review)
  tests/                      # 996 tests across 57 files
  benchmarks/                 # effectiveness reports, metrics
  docs/                       # claude.md, tier-matrix, configuration, policies
  examples/                   # real-world flows: build, ship, review, founder-workflow
  completions/                # bash + zsh tab completion

Three Axes: Tier / Agent / Mode

At a glance:

| | Cowork (semi-auto) | Codex (full-auto) |
|---|---|---|
| Builder | local + manual review | CI dispatch + auto-fix |
| CTO | local + dual-agent cross-review | CI dispatch + dual + cross-review + scoring |

Cowork runs in your terminal with you in the loop. Codex runs in CI and reworks itself until the PR passes.

solo-cto-agent is configured across three independent axes. You choose each based on your workflow.

| Axis | Decision | Options |
|---|---|---|
| Tier | Scope of capability | Builder / CTO |
| Agent | Who reviews | Cowork (Claude) / Cowork + Codex |
| Mode | Automation depth | Semi-auto (cowork-main) / Full-auto (codex-main) |

Quick pick if you are unsure:

Start with Builder + Cowork (single Claude agent, semi-auto, optional Telegram bot).
Move to CTO + Full-auto when you want dual-agent cross-review and always-on CI/CD across repos.

Agents (summary)

| Agent | What it means | When to use |
|---|---|---|
| Cowork (Claude) | single-agent review and fixes | cost-sensitive, fast iteration |
| Cowork + Codex | dual review + cross-check | higher confidence, higher cost |

Modes (summary)

| | Semi-auto (cowork-main) | Full-auto (codex-main) |
|---|---|---|
| Runtime | Cowork desktop + CLI | GitHub Actions + orchestrator |
| Triggers | manual / scheduled | webhook + repository_dispatch |
| Data freshness | manual sync (dry-run default) | auto-commits scores + patterns |
| Infra | local-first, minimal | CI/CD + orchestrator repo |
| Best for | low infra, private repos | full automation, multi-repo |

Mode notes:

Semi-auto keeps network side-effects off by default. You run sync --apply only when you want remote data.
Full-auto assumes CI/CD is active and runs reviews, scoring, and reporting automatically.

Full-auto requires:

an orchestrator repo
GitHub Actions secrets: ORCHESTRATOR_PAT, ANTHROPIC_API_KEY, OPENAI_API_KEY
pipelines installed via setup-pipeline or setup.sh

Full-auto adds:

auto reviews + rework dispatch
decision queue + daily briefing
agent scores + routing
UI/UX quality gate + visual checks

Tiers (summary)

Not sure which tier? One question:

Solo dev shipping code with one Claude agent? → Builder (default, recommended for most users)
Want dual-agent cross-review (Claude + Codex) and multi-repo CI/CD? → CTO

| Tier | Includes | Agents | Extras | Recommended for |
|---|---|---|---|---|
| Builder | spark + review + memory + craft + build + ship | solo Claude | optional Telegram bot for PR notify/approve | solo dev shipping |
| CTO | Builder + orchestrate | Claude + Codex (dual-agent cross-review) | agent scoring, routing, decision queue, daily briefing | multi-agent CI/CD across repos |

Details: docs/tier-matrix.md, docs/tier-examples.md, docs/cto-policy.md, docs/cowork-main-install.md, docs/configuration.md.

Install

npm (recommended)

npm install -g solo-cto-agent
solo-cto-agent init

Platform notes

macOS: supported directly. zsh is the default shell assumed by most examples.
Windows: supported for the CLI. Use PowerShell environment variables during setup. Some Cowork-side shell snippets still assume POSIX-style commands.
Gamma: supported as a downstream presentation tool for decks/docs/content, not as a primary execution surface.

Maintainer note (publish)

Publishing requires either:

an Automation token with Bypass 2FA enabled, or
a 6-digit OTP from an Authenticator app

Quick install (Claude Code)

curl -sSL https://raw.githubusercontent.com/seunghunbae-3svs/solo-cto-agent/main/setup.sh | bash

Manual install

git clone https://github.com/seunghunbae-3svs/solo-cto-agent.git
cp -r solo-cto-agent/skills/* ~/.claude/skills/
cat solo-cto-agent/autopilot.md >> ~/.claude/CLAUDE.md

Only want one skill?

cp -r solo-cto-agent/skills/build ~/.claude/skills/

Then open the skill file and replace the placeholders with your actual stack. Example:

{{YOUR_OS}}        -> macOS / Windows / Linux
{{YOUR_EDITOR}}    -> Cowork / VSCode / etc.
{{YOUR_DEPLOY}}    -> Vercel / Railway / Netlify / etc.
{{YOUR_FRAMEWORK}} -> Next.js / Remix / SvelteKit / etc.
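If you would rather script the substitution than edit by hand, something like the following works on the text of a skill file. It is a minimal sketch: `fillPlaceholders` is a hypothetical helper, not part of the CLI, and it assumes placeholders always look like `{{UPPER_SNAKE_CASE}}`.

```javascript
// Replaces {{KEY}} placeholders and reports any that had no value supplied.
function fillPlaceholders(template, values) {
  const missing = new Set();
  const output = template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match, key) => {
    if (Object.prototype.hasOwnProperty.call(values, key)) return values[key];
    missing.add(key);
    return match; // leave unknown placeholders untouched
  });
  return { output, missing: [...missing] };
}

const filled = fillPlaceholders(
  "OS: {{YOUR_OS}}, deploy: {{YOUR_DEPLOY}}",
  { YOUR_OS: "macOS" }
);
```

Leaving unknown placeholders in place, and listing them, makes it obvious which parts of the skill file still need attention.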

Using with Cowork + Codex

Codex is a first-class target. Use the SKILL.md files directly as your instruction source. No extra Codex-specific files are required - Cowork reads SKILL.md natively, and Codex (via OpenAI API) is invoked through the CLI when both keys are set.

Shell completions

Tab completion for all commands, flags, and options.

# Bash — add to ~/.bashrc
source <(solo-cto-agent --completions bash)

# Zsh — add to ~/.zshrc
source <(solo-cto-agent --completions zsh)

How I use autonomy

Most agent workflows feel too timid in the wrong places and too reckless in the dangerous ones. So I split behavior into 3 levels.

L1 - just do it

Small, low-risk work should not need approval. Examples:

fixing typos
creating obvious files
loading context
choosing an output format
doing routine search or setup checks

L2 - do it, then explain

If something is a bit ambiguous but still low-risk, the agent makes the best assumption, does the work, and tells me what it assumed. That is usually better than spending 10 messages clarifying something that could have been resolved in one pass.

L3 - ask first

Some things still need explicit approval:

production deploys
schema changes
cost-increasing decisions
anything sent under my name
actions that could cause irreversible damage

That split has worked much better for me than asking permission every 30 seconds.
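The three levels can be written down as a small decision function. This is a sketch of the idea, not code from the skills: the action kinds and the `lowRisk` / `ambiguous` flags are hypothetical, and treating anything not known to be low-risk as L3 is an added assumption on the cautious side.

```javascript
// Action kinds that always need explicit approval (mirrors the L3 list above).
const ASK_FIRST = new Set([
  "production-deploy",
  "schema-change",
  "cost-increase",
  "send-as-user",
  "irreversible",
]);

// action: { kind: string, lowRisk: boolean, ambiguous: boolean } (hypothetical shape)
function autonomyLevel(action) {
  if (ASK_FIRST.has(action.kind) || !action.lowRisk) return "L3"; // ask first
  return action.ambiguous ? "L2" : "L1"; // L2: do it, then explain; L1: just do it
}

const levels = [
  autonomyLevel({ kind: "typo-fix", lowRisk: true, ambiguous: false }),
  autonomyLevel({ kind: "rename-helper", lowRisk: true, ambiguous: true }),
  autonomyLevel({ kind: "schema-change", lowRisk: true, ambiguous: false }),
];
```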

Skills

build

This is the one I use most. Its job is to reduce the annoying parts of implementation work:

check prerequisites before coding
catch missing env vars, packages, migrations, or config earlier
keep scope from drifting
stop repeated error loops
keep build and deploy problems from bouncing back to the user too quickly

The core idea is simple:

do more of the setup thinking before writing code, not after something fails.

ship

The job is not done when the code is written. It is done when the deploy works.

This skill treats deploy failures as part of the work:

monitor the build
read the logs
try reasonable fixes
stop when a circuit breaker is hit
escalate clearly instead of spiraling

craft

This exists because AI-generated UI often has a very obvious look. Too many gradients. Too much rounded everything. Too many generic SaaS defaults that look "fine" but still feel cheap.

This skill is an opinionated design filter:

typography rules
color discipline
spacing consistency
motion sanity
anti-slop checks

It does not guarantee great design, but it helps avoid lazy AI design.

spark

For idea work, I wanted something better than "this market is huge."

This skill takes an early idea and forces it through structure:

market scan
competitors
unit economics
scenarios
risk framing
PRD direction

Useful when an idea is still vague but you need something more testable.

review

This skill is intentionally not friendly. It looks at a plan from three perspectives:

investor
target user
smart competitor

The point is to expose weak points early, not to make the founder feel good.

memory

This is for reducing repeat explanation and preserving useful context.

Not everything needs to be remembered forever. But decisions, repeated failure patterns, and project context should not disappear every session.

orchestrate

This is the CTO-tier skill. It handles multi-agent routing when you have both Claude and Codex running.

It decides which agent handles which task, dispatches reviews across repos, tracks agent scores over time, and runs the daily briefing loop. If you only use one agent, you do not need this.
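Score-based routing can be as simple as picking the agent with the best track record for a task type. The sketch below assumes a made-up score layout (`{ agent: { taskType: score } }`); the real agent-scores.json format and routing policy are not shown in this README and may differ.

```javascript
// Picks the agent with the highest score for a task type.
// Falls back to a default when no agent has a score for that task yet.
function pickAgent(scores, taskType, fallback = "claude") {
  let best = null;
  for (const [agent, byTask] of Object.entries(scores)) {
    const score = byTask[taskType];
    if (typeof score !== "number") continue;
    if (best === null || score > best.score) best = { agent, score };
  }
  return best ? best.agent : fallback;
}

// Hypothetical score data for illustration only.
const exampleScores = {
  claude: { review: 0.82, rework: 0.7 },
  codex: { review: 0.9 },
};
const reviewer = pickAgent(exampleScores, "review");
```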

self-evolve

This runs in the background after work is done. It collects error patterns from failed builds, analyzes rework cycles to find recurring issues, and adjusts skill behavior based on what keeps going wrong.

Nine modules: error-collector, quality-analyzer, rework-learner, skill-improver, skill-scout, feedback-collector, external-trends, weekly-report, and the orchestrator that ties them together. Most of it is invisible unless you check the weekly report or the error-patterns file.
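To make "collects error patterns" concrete: one common approach is to normalize error messages so that paths and numbers do not hide repeats, then count. The functions below are an illustrative sketch, not the error-collector module itself; the normalization rules are assumptions.

```javascript
// Strips details that vary between occurrences of the same underlying error.
function normalizeError(message) {
  return message
    .replace(/(^|\s)\/\S+/g, "$1<path>") // absolute paths, including :line:col
    .replace(/\b\d+\b/g, "<n>") // standalone numbers
    .trim();
}

// Returns patterns seen at least minCount times, most frequent first.
function recurringPatterns(messages, minCount = 2) {
  const counts = new Map();
  for (const message of messages) {
    const key = normalizeError(message);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minCount)
    .sort((x, y) => y[1] - x[1])
    .map(([pattern, count]) => ({ pattern, count }));
}

const patterns = recurringPatterns([
  "Cannot find module '@prisma/client' at /app/src/db.ts:12:5",
  "Cannot find module '@prisma/client' at /app/src/api/user.ts:3:1",
  "Type error in /app/page.tsx:40",
]);
```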

Skill slimming

When skills grow past 150 lines, most of that weight is reference data the agent doesn't need on every activation. The references/ pattern splits hot-path logic from cold-path data, cutting token costs by 58-79% per skill without losing functionality.

See docs/skill-slimming.md for the pattern, measured results, and how to apply it.

Feedback and personalization

The system learns from CI/CD events automatically, but you can accelerate it with explicit feedback. See docs/feedback-guide.md for how to send feedback, what categories exist, and how the routing engine uses it.

Design principles

Agent does the work, user makes decisions

If the agent can reasonably figure something out, it should do that. The user should spend time on judgment calls, not repetitive setup.

Risks before strengths

Good review starts with what is broken, vague, or contradictory. Praise comes after that.

Facts over vibes

If a number appears, it should have a source, a formula, or a clear label like:

[confirmed]
[estimated]
[unverified]

Pre-scan, don't surprise

A lot of agent frustration comes from late discovery: missing env vars, missing package installs, missing DB changes, missing credentials. This pack tries to catch those earlier.
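A pre-scan for missing env vars can be quite small. The sketch below treats every key in a `.env.example` file as required, which is an assumption; `requiredEnvKeys` and `missingEnvVars` are hypothetical names, not the build skill's implementation.

```javascript
// Extracts variable names from .env.example-style text.
function requiredEnvKeys(envExampleText) {
  return envExampleText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith("#"))
    .map((line) => line.replace(/^export\s+/, "").split("=")[0].trim())
    .filter((key) => /^[A-Za-z_][A-Za-z0-9_]*$/.test(key));
}

// Returns the keys that are absent or empty in the given environment object.
function missingEnvVars(envExampleText, env) {
  return requiredEnvKeys(envExampleText).filter((key) => !env[key]);
}

const exampleFile = [
  "# Stripe",
  "STRIPE_SECRET_KEY=",
  "STRIPE_WEBHOOK_SECRET=",
  "DATABASE_URL=postgres://localhost/dev",
  "",
].join("\n");
const missingKeys = missingEnvVars(exampleFile, { DATABASE_URL: "postgres://db/prod" });
```

In a real check you would pass `process.env` as the second argument and stop before the build starts if the list is non-empty.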

Keep the loop bounded

If the same problem keeps happening, stop and report clearly. An agent that loops forever is worse than one that asks for help.
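A bounded retry loop of this kind can be sketched in a few lines. The limit of 3 matches the retry limit mentioned in the FAQ; the `runWithCircuitBreaker` name and the summary format are illustrative assumptions, and a real build step would be asynchronous.

```javascript
// Runs attempt(n) up to maxAttempts times, then stops and summarizes.
function runWithCircuitBreaker(attempt, maxAttempts = 3) {
  const errors = [];
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return { ok: true, attempts: n, value: attempt(n) };
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  const distinct = [...new Set(errors)];
  return {
    ok: false,
    attempts: errors.length,
    summary: `stopped after ${errors.length} failed attempts; distinct errors: ${distinct.join(" | ")}`,
  };
}

// Usage: a step that fails the same way every time trips the breaker.
const tripped = runWithCircuitBreaker(() => {
  throw new Error("missing env var STRIPE_SECRET_KEY");
});
```

Collapsing identical errors in the summary is what makes the hand-back useful: three failures with one distinct message point at a single root cause rather than three separate problems.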

What this is not

This is not:

a hosted product
a full framework
a universal standard for agent behavior
a replacement for technical judgment

It is just a set of operating rules that worked well enough for me to package and share.

Recommended first use

If you want to try this without changing your whole workflow:

install only build and review
replace the stack placeholders
use them on one real feature or bug
see whether the agent becomes more useful or just more opinionated

That is the easiest way to tell whether this fits how you work.

Support

If this tool saves you time, consider sponsoring the project. Every contribution helps maintain and improve solo-cto-agent.

License

MIT - fork it, modify it, ship it.

Post-install verification

After installation, verify the pack works:

Check skills exist in your agent directory (e.g. ~/.claude/skills)
Confirm each skill has valid frontmatter (--- block)
Run a simple prompt like "Use build to fix a TypeScript error"
Run bash scripts/validate.sh to check file integrity
Confirm no auto-merge or deploy happens without approval

If something fails, re-run setup.sh --update and check again.
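The frontmatter check in the list above can be automated. This sketch assumes "valid" means an opening `---` on the first line, a closing `---`, and at least one `key: value` line in between; the pack's own validate.sh may apply different rules.

```javascript
// Returns true when the text starts with a closed --- block containing a key.
function hasFrontmatter(skillText) {
  const lines = skillText.replace(/^\uFEFF/, "").split(/\r?\n/);
  if (lines[0].trim() !== "---") return false;
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  // Require a closing fence and at least one "key: value" line inside the block.
  return close > 1 && lines.slice(1, close).some((line) => /^[\w-]+\s*:/.test(line));
}

const validSkill = "---\nname: build\ndescription: prereq scan\n---\n# Build\n";
const frontmatterOk = hasFrontmatter(validSkill);
```

To check an installed skill, read its SKILL.md with `fs.readFileSync(path, "utf8")` and pass the text in.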

Sample output

Build (preflight + fix)

[build] pre-scan: missing env vars: STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET
[build] request: please provide the 2 keys above before proceeding
[build] applied: fixed prisma client mismatch
[build] build: npm run build -> OK
[build] report: 3 files changed, 1 risk flagged, rollback path noted

Review + rework

[review] Codex: REQUEST_CHANGES (blocker: missing RLS policy)
[review] Claude: APPROVE (nits: copy, spacing)
[rework] round 1/2 -> fixed RLS policy + added tests
[decision] recommendation: HOLD until preview verified

FAQ

Q: Do I need all eight skills? A: No. Start with build and review. Add the others if you find yourself wanting them. Each skill is independent.

Q: Why does the agent stop retrying after 3 attempts? A: Infinite loops waste more time than they save. If something fails 3 times, the agent summarizes what it knows and hands control back to you.

Q: Why is the design skill so opinionated? A: Because default AI output tends toward the same rounded-gradient look. The rules push for more intentional choices. Override whatever doesn't fit your taste.

Q: Does this work outside Cowork + Codex? A: Yes. Provider abstraction supports any OpenAI-compatible or Anthropic-compatible API (Ollama, LM Studio, Groq, etc.). Set OPENAI_API_BASE or ANTHROPIC_API_BASE to point at your provider. See docs/configuration.md for setup details.

Q: Why a separate orchestrator repo? A: The orchestrator holds cross-repo logic (agent routing, score tracking, visual baselines, daily briefings) that doesn't belong in any single product repo. It dispatches workflows across your product repos and collects results centrally. If you only have one product repo, you can still use it - the separation keeps CI/CD config out of your application code.

Q: How much do the API calls cost? A: Typical per-PR cost depends on your review depth. A Claude auto-review of a medium PR (under 500 lines changed) uses roughly 5K-15K input tokens and 1K-3K output tokens. At Anthropic's Sonnet pricing that is well under $0.10 per review. If you add Codex cross-review (CTO tier), add roughly $0.05-0.15 per review for the OpenAI side. A solo dev doing 2-3 PRs per day can stay comfortably under $5/month on Anthropic and $5/month on OpenAI. Visual checks (Playwright screenshots) use no API tokens - they run in GitHub Actions compute only.
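The arithmetic behind that estimate can be checked directly. The prices below (3 USD per million input tokens, 15 USD per million output tokens) are assumptions for illustration; substitute the current rates for whichever model you use. The sketch counts one review pass per PR, so multi-round consensus would cost proportionally more.

```javascript
// Prices are USD per million tokens. The defaults are assumed, not authoritative.
function reviewCostUsd(inputTokens, outputTokens, pricePerMTok = { input: 3, output: 15 }) {
  return (inputTokens * pricePerMTok.input + outputTokens * pricePerMTok.output) / 1e6;
}

function monthlyCostUsd(costPerReview, reviewsPerDay, days = 30) {
  return costPerReview * reviewsPerDay * days;
}

const lowPerReview = reviewCostUsd(5000, 1000); // small end of the quoted range
const highPerReview = reviewCostUsd(15000, 3000); // large end of the quoted range
const lowMonthly = monthlyCostUsd(lowPerReview, 2);
const highMonthly = monthlyCostUsd(highPerReview, 3);
```

Under these assumed prices the quoted token ranges come to roughly $0.03 to $0.09 per review, and 2 to 3 reviews per day comes to roughly $1.80 to $8.10 per month. The under-$5 figure therefore corresponds to the lower and middle part of the range; consistently large PRs would push past it.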

Q: How do I set up the 3-pass auto-review on my repos? A: Copy templates/workflows/solo-cto-review.yml from this package to .github/workflows/solo-cto-review.yml in your repo and add ANTHROPIC_API_KEY to your repo secrets. Every PR will then automatically get a 3-pass review:

Pass 1 — Code Review: structure, security, performance, bugs
Pass 2 — Cross-Check: validates Pass 1 findings, catches missed issues
Pass 3 — UI/UX Review: accessibility, responsiveness, usability

Final verdict: APPROVE (merge-ready) or REQUEST_CHANGES (fix and push to re-trigger). See the workflow file for the full YAML.

Q: Can I use this without GitHub Actions? A: Yes. The skills (build, review, craft, etc.) work independently of CI/CD. You can install them and use them in your editor without ever running setup-pipeline. The CI/CD automation is an optional layer on top.

Q: How do I keep local skill data in sync with CI/CD results? A: Run solo-cto-agent sync --org <your-org>. This fetches agent scores, workflow results, PR reviews, and error patterns from your orchestrator repo via the GitHub API. By default it runs in dry-run mode (display only). Add --apply to merge remote data into local files. This way you always preview what will change before any local files are modified.
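The dry-run-by-default behaviour follows a simple pattern: always compute the diff, and only write when asked. The sketch below models it on plain objects; `planSync` and `syncData` are hypothetical names and do not reflect how the CLI stores or fetches data.

```javascript
// Lists the keys whose remote value differs from the local one.
function planSync(local, remote) {
  const changes = [];
  for (const [key, value] of Object.entries(remote)) {
    if (JSON.stringify(local[key]) !== JSON.stringify(value)) {
      changes.push({ key, from: local[key], to: value });
    }
  }
  return changes;
}

// Without apply, the local data is returned unchanged alongside the preview.
function syncData(local, remote, { apply = false } = {}) {
  const changes = planSync(local, remote);
  if (!apply) return { applied: false, changes, result: local };
  const result = { ...local };
  for (const { key, to } of changes) result[key] = to;
  return { applied: true, changes, result };
}

const localScores = { claude: 0.8, codex: 0.7 };
const remoteScores = { claude: 0.8, codex: 0.75 };
const preview = syncData(localScores, remoteScores);
const applied = syncData(localScores, remoteScores, { apply: true });
```

Because the apply path builds a new object instead of mutating `local`, the preview and the applied result can be compared side by side before anything is written to disk.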

Q: What does a real review look like? A: Here is a trimmed example from a production PR review:

[claude-review] PR #42 - Add group-buying countdown timer
  CHANGES_REQUESTED
  - Missing error boundary around countdown component
  - useEffect cleanup not handling unmount (memory leak risk)
  - Hardcoded timezone offset - use Intl.DateTimeFormat instead
  - Price calculation should use Decimal, not float
  Good: proper loading states, accessible aria-labels

The review targets real issues (memory leaks, timezone bugs, floating-point money) rather than style nits.

Q: What happens on Day 1 with no data? A: Everything works - skills activate, build checks run, reviews trigger. The system starts empty and accumulates value over time. Agent scores begin tracking from the first PR. Error patterns grow as the failure catalog catches new issues. By session 10+ you will notice fewer repeated errors and more context-aware reviews.

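Q: Does this make network calls automatically? A: Not in semi-auto (cowork-main) mode. status reads only local files. sync is manual and opt-in: you run it explicitly when you want CI/CD data pulled from GitHub. Error pattern merging from sync is dry-run by default; use sync --apply to actually write changes. There is no background network activity and no telemetry. Full-auto (codex-main) is different by design: it runs in GitHub Actions, where reviews, scoring, and reporting happen automatically.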
