@unimatrix27/ralph-harness
v1.1.1
Published
Throwaway-EC2 loop that picks one ready-for-agent GitHub issue, implements it, opens a PR, runs one auto-review pass, then terminates.
Readme
ralph-harness
Throwaway-EC2 loop that picks one ready-for-agent GitHub issue, implements it
on a fresh branch, opens a PR, and incorporates one auto-review pass — then
terminates.
Generic. Target repo + target-specific config supplied at runtime.
Status: iteration 1, in progress (single-fire from laptop). See issue #1 for the PRD and the open issues for the slice plan.
Out of scope (iteration 1)
Iteration 1 is single-fire-from-laptop and intentionally minimal. The following are deferred to later iterations:
- Iteration 2 — management EC2. A long-lived management box that
schedules
fire.shitself on a cron so the laptop stops being in the loop. Reuses the slice 5 launcher and the slice 6 cloud-init module unchanged. - Iteration 2 — Lambda janitor. Out-of-band sweeper that
force-terminates any
Project=ralphEC2 still running pastMaxLifetimeMin, defending against a launcher process killed before its ceiling fires. - Iteration 3 — custom AMI. Bake the dependency install (Node, .NET,
gh, Docker,uv,claude,jq,yq) so cold-start drops from minutes to seconds. Iteration 1 installs at boot viadotnet-install.shetc. - Iteration 3 — spot + ARM (
t4g). Iteration 1 firesm7a.xlargeon-demand for predictability under the wall-clock backstop. - Iteration 3 — multi-target dispatch. One run, one target. Routing across multiple target repos is deferred.
- Iteration 3 — observability dashboard. Iteration 1 ships per-instance CloudWatch streams plus the milestone-log GitHub issue (caveman-format). A grafana/CW-dashboard view is later.
Architecture: iteration-2-extensible
The slice 5 launcher (lib/fire-launcher.sh) and slice 6 cloud-init module
(lib/cloud-init/bootstrap.sh) are written so the iteration-2 management
EC2 can call fire::run exactly the way the laptop does today. No core
script needs a rewrite to move the trigger off the laptop — the management
box just needs the same env (RALPH_TARGET_REPO, IAM permissions for the
launcher) and a cron entry. See docs/smoke-test.md
for the manual end-to-end runbook iteration 2 will automate.
OAuth-vs-API-key migration footnote
Iteration 1 ships OAuth-via-Keychain. The EC2 worker authenticates to
Claude using the OAuth credential synced from the operator's macOS Keychain
into SSM (slice 4). Switching to a long-lived API key is a one-file
change in lib/cloud-init/bootstrap.sh (the boot__fetch_secrets step):
swap the SSM key, drop the credential file at the API-key consumer's
expected location instead. No other module changes. Tracked as iteration-2
work in issue #7.
OAuth caveats that apply today (and will go away if you migrate to an API key):
- Plan limits. The EC2 worker burns the engineer's Claude plan limits on every run.
- Rotation. Each desktop
claude /loginmay invalidate the prior refresh token. Re-runralph-sync-credentialimmediately after every login, or the worker will fail at the firstclaude --printcall. - Concurrent use unverified. Simultaneous use of the same OAuth credential by the desktop app and an EC2 worker is not validated. Assume one active consumer at a time.
Public-safe
This repository contains zero target-specific identifiers. Every target knob is
supplied at runtime via the target repo's .ralph/config.yaml and via
environment variables / SSM. See CONTRIBUTING.md for the
contract.
What's here today
Slice 10 — end-to-end smoke-test runbook (HITL):
docs/smoke-test.md— operator-driven runbook that exercises the full chain on a fresh AWS account:bootstrap-aws→sync-credential→ seed PAT →fire→ tail CloudWatch → verify PR → verify cleanup. Acceptance signal is "every phase reachedPHASE_ENDand the instance isterminated."
Slice 1 — target-config-schema:
docs/config-schema.md— full schema for the.ralph/config.yamlfile every target repo must commit.lib/target-config-schema.sh— sourceable bash module that loads and validates a target repo's.ralph/config.yaml. Fails loud with a useful message on any error.bin/load-config— thin CLI over the validator. Exits 0 on valid input, non-zero on bad input.
Slice 2 — github-state-mutator:
lib/github-state-mutator.sh— idempotent shell wrappers aroundghfor the four state mutations the orchestrator needs:swap_label,comment_issue,find_or_create_milestone_log_issue,append_caveman_log.bin/gsm— CLI for manual verification against a sandbox repo.
Slice 3 — aws-bootstrap (TS, slice-4 port):
src/lib/aws-bootstrap.ts— idempotentensure*functions for every AWS-side resource the harness needs (KMS aliasalias/ralph, SSM SecureString placeholders, EC2 IAM role + instance profile with minimum-scope inline policy, security group in the default VPC, CloudWatch log group) plus the target-sideagent-stucklabel. Implemented over the AWS SDK v3 clients (@aws-sdk/client-{kms,ssm,iam,ec2,cloudwatch-logs,sts}).ralph-bootstrap-aws(src/bin/ralph-bootstrap-aws.ts) — single-shot CLI: reads config from env, ensures every resource, second run is a clean no-op. Region is forced toeu-central-1.
Slice 5 — fire-launcher (single-fire EC2 + CloudWatch streaming):
lib/fire-launcher.sh— fires one throwaway EC2 ineu-central-1(m7a.xlarge, 60 GB gp3, AL2023 from public SSM AMI parameter, default-VPC public subnet, auto-assigned public IP, IMDSv2 required) using the bootstrapped IAM instance profile and security group. Tags every instance + volumeProject=ralphplus a UTCLaunchedAttimestamp andMaxLifetimeMin. Launches with--instance-initiated-shutdown-behavior terminate, then pollsdescribe-instancesuntilterminated. On the 75-minute ceiling the launcher force-terminate-instancesand exits non-zero.bin/fire.sh— single-shot launcher CLI.
Slice 9 — review call (post-PR) + post-hoc agent-stuck detection:
prompts/review.md— generic review prompt template. Target context (repo, default branch, work dir, build/test commands, picked issue number, PR number, PR branch, configuredreview_bot.usernameand.source, optionalprompt_extensions.review) is injected via{{...}}placeholders; the template itself contains zero target-specific identifiers.lib/ec2-orchestrator.sh—orch::runnow chains a review call onto thePR_OPENEDbranch from slice 8: bashsleep RALPH_REVIEW_WAIT_SEC(default600s — the configured external review bot's window), renderprompts/review.md, invokeclaude --print, wrap the call inPHASE_START phase=review issue=<n> pr=<m>/PHASE_END phase=review duration_s=… status=…markers (status comes from the result file), verify/tmp/ralph/review-result.json, branch on itsstatus. Right after the discoveryPICKEDbranch the orchestrator emits a stablePICKED_ISSUE=<n>marker so the launcher's post-hoc check can correlate hard-killed instances back to a source issue.- The review call writes
/tmp/ralph/review-result.jsonwith one of two shapes:{"status":"NO_REVIEW","reason":"…"}— no comments / reviews from the configuredreview_bot.username(with matchingreview_bot.source) within the 10-minute window. No revision, no caveman log entry, no state mutation.{"status":"REVISION_APPLIED","issue":<n>,"pr_number":<m>,"summary":"…","gotcha":"…"}— exactly ONE revision pass: the call addresses the configured bot's verdict, runs build + test until green, commits with areview: address …message, and pushes to the same PR branch. No follow-up rounds, no agent-vs-agent ping-pong.
- Bash branches on
review-result.json.status:NO_REVIEW→OUTCOME=pr_opened issue=<n> pr=<m> review=none;REVISION_APPLIED→OUTCOME=pr_opened issue=<n> pr=<m> review=revisedand one caveman-format comment is appended to the milestone-log issue viagsm::append_caveman_log(fromlib/github-state-mutator.sh, bundled into the EC2 user-data alongside the orchestrator). - Multi-reviewer case: only the configured
review_botis consulted; Copilot, humans, and other bots on the same PR are ignored. lib/fire-launcher.sh— embedsprompts/review.mdalongside discovery + implementation, exportsRALPH_REVIEW_PROMPT=/opt/ralph/prompts/review.md, and bundleslib/github-state-mutator.shinto the user-data so the orchestrator can callgsm::append_caveman_logon the worker. Afterwait_for_terminated(regardless of clean exit or wall-clock breach) the launcher runs a post-hoc agent-stuck check from the laptop:- List target-repo PRs whose body contains
<!-- ralph-launch: <RALPH_LAUNCH_TAG> -->(the marker the impl call embeds). If at least one is found, the run is treated as a clean termination. - Otherwise, fetch the per-instance CloudWatch stream
(
aws logs filter-log-events) for the orchestrator'sPICKED_ISSUE=<n>marker. If found, apply the configuredagent_stuck_label(defaultagent-stuck) to that source issue viagh issue edit. Covers both wall-clock-killed and orchestrator-crashed cases where the impl call could not self-label. - If neither a tagged PR nor a
PICKED_ISSUEmarker is recoverable, the post-hoc check is a no-op (clean exit).
- List target-repo PRs whose body contains
Slice 8 — implementation call:
prompts/implementation.md— generic implementation prompt template. Target context (repo, default branch, work dir, build/test commands, branch prefix, agent-stuck label, per-launch tag, optionalprompt_extensions.implementation) is injected via{{...}}placeholders at render time; the template itself contains zero target-specific identifiers.lib/ec2-orchestrator.sh—orch::runnow chains the implementation call onto thePICKEDbranch from discovery: rendersprompts/implementation.md, appends the crafted context discovery wrote to/tmp/ralph/crafted-prompt.md, invokesclaude --print, wraps the call inPHASE_START phase=implementation/PHASE_END phase=implementation duration_s=... issue=<n> status=...markers (status comes from the result file so a CloudWatch grep yields a one-line per-iteration summary), and verifies the result file before branching.- The implementation call writes
/tmp/ralph/impl-result.jsonwith one of two shapes:{"status":"PR_OPENED","issue":<n>,"pr_number":<m>,"pr_url":...,"branch":...}{"status":"AGENT_STUCK","issue":<n>,"reason":"..."}
- Bash branches on
impl-result.json.status:PR_OPENED→OUTCOME=pr_opened issue=<n> pr=<m>;AGENT_STUCK→OUTCOME=agent_stuck issue=<n>. Any other status, missing output, or invalid JSON aborts with exit 3 — the EC2 instance still terminates via the cloud-init EXIT trap. - The PR body carries an HTML-comment marker
<!-- ralph-launch: <RALPH_LAUNCH_TAG> -->(defaults to the EC2 instance id, set inlib/cloud-init/bootstrap.sh). Slice 9's launcher post-hoc check uses this to correlate when the EC2 was hard-killed before recording state. agent-stuckescape: the prompt instructs the impl call to self-stop when ANY of (>3 build/test fix iterations on the same failure surface) OR (>15 file edits without a green build) OR (self-judged futility) hits. On stuck: the source issue gets theagent_stuck_labellabel (defaultagent-stuck), no PR is opened, andimpl-result.jsonrecordsstatus=AGENT_STUCK.- The launcher embeds
prompts/implementation.mdinto the rendered user-data alongside the discovery prompt and exportsRALPH_IMPLEMENTATION_PROMPT=/opt/ralph/prompts/implementation.md.
Slice 7 — discovery call:
prompts/discovery.md— generic discovery prompt template. Target context (repo, default branch, work dir, build/test commands, branch prefix, optionalprompt_extensions.discovery) is injected via{{...}}placeholders at render time; the template itself contains zero target-specific identifiers.lib/ec2-orchestrator.sh—orch::runnow fires the real discovery call: renders the template, invokesclaude --print(with--permission-mode bypassPermissionsby default; override viaRALPH_CLAUDE_FLAGS), wraps the call inPHASE_START phase=discovery/PHASE_END phase=discovery duration_s=...markers, and verifies the four output files exist before branching. Each invocation is a fresh claude session (memoryMCP excluded by slice 6's MCP set).- The discovery call writes four files under
$RALPH_OUT_DIR(default/tmp/ralph/):decision.json—status(PICKED|NONE|ALL_BLOCKED), picked issue number, reasoning.issue.json— fullgh issue view --json ...payload of the picked issue (or{}for non-PICKED).crafted-prompt.md— implementation prompt for slice 8, with target conventions surfaced from the target'sCLAUDE.md,AGENTS.md,CONTEXT.md, and anydocs/adr/*files.milestone-log.json—{milestone, log_issue}pointing at the[log] <milestone>issue (find-or-create) for cross-iteration learnings.
- Bash branches on
decision.json.status:NONE→OUTCOME=no_work,ALL_BLOCKED→OUTCOME=all_blocked,PICKED→OUTCOME=picked issue=<n>(slices 8/9 plug in here). Any other status, missing output file, or invalid JSON aborts with exit 3 — the EC2 instance still terminates via the cloud-init EXIT trap. - The launcher embeds
prompts/discovery.mdinto the rendered user-data via a quoted heredoc and exportsRALPH_DISCOVERY_PROMPT=/opt/ralph/prompts/discovery.md, so the prompt template ships alongside the lib bundle without a network fetch.
Slice 6 — ec2-bootstrap (deps, secrets, fresh clone, hello orchestrator):
lib/cloud-init/bootstrap.sh— slice 6's cloud-init payload. Runs once on first boot. Installs OS dependencies (Node 20, .NET 10 SDK viadotnet-install.sh,gh, Docker,uv,claudeCLI, plusgit,jq, andyq). Configures the five MCPs the harness uses (serena,morph-mcp,context7,github,sequential-thinking); thememoryMCP is intentionally NOT added so every iteration starts with fresh context. Fetches the GitHub PAT and the Claude OAuth credential from SSM SecureString (/ralph/github-pat,/ralph/claude-oauth-credential) into mode-0600 files at their consumer's expected on-disk locations; never echoes the values. Clones the target repo fresh on its resolved default branch (gh repo view --json defaultBranchRef, nomain/masterassumption). Runs safety guards (on default branch, clean working tree, origin matchesRALPH_TARGET_REPO). Validates.ralph/config.yamlviatcs::validate. Hands off toorch::run. Ships stdout to CloudWatch (/ralph/main, stream-per-instance) and runsshutdown -h nowvia a bashtrap EXITso the box terminates on any exit (success or failure).lib/ec2-orchestrator.sh— sourceable shell module definingorch::run. Slice 6 shipped a stub; slice 7 replaces it with the real discovery call (see above). Slices 8/9 plug in implementation and review after thePICKEDbranch.- The launcher renders one self-contained user-data script by
concatenating
lib/target-config-schema.sh,lib/ec2-orchestrator.sh, andlib/cloud-init/bootstrap.shafter a small env shim that exports the five required runtime knobs (RALPH_TARGET_REPO,RALPH_AWS_REGION,RALPH_GITHUB_TOKEN_SSM_KEY,RALPH_CLAUDE_OAUTH_SSM_KEY,RALPH_LOG_GROUP). No SSH at any layer; SSM Session Manager is the only debug entry point.
# Fire one EC2 (slice 6 ec2-bootstrap, stub orchestrator):
RALPH_TARGET_REPO=owner/target ralph-fire
# Tail the per-instance CloudWatch stream:
aws --region eu-central-1 logs tail /ralph/main \
--log-stream-names <i-...> --followSlice 4 — operator helper CLIs (TS, macOS-only Keychain syncer):
src/lib/credential-syncer.ts— reads theClaude Code-credentialsentry from the macOS Keychain and uploads it to the SSM SecureString at/ralph/claude-oauth-credential(overridable viaRALPH_CLAUDE_OAUTH_SSM_KEY), encrypted underalias/ralph.ralph-sync-credential(src/bin/ralph-sync-credential.ts) — thin CLI wrapper. Region is forced toeu-central-1. The credential is passed to@aws-sdk/client-ssmas the request body ofPutParameter, so it is never visible on any process's argv and is never echoed in info or error output.ralph-sync-github-pat— net-new operator CLI: reads a GitHub PAT from stdin and uploads it to the SSM SecureString at/ralph/github-pat(overridable viaRALPH_GITHUB_TOKEN_SSM_KEY). Pass the token on stdin only — argv is rejected.ralph-tail-logs— thin wrapper aroundaws logs tailthat defaults to/ralph/mainand the worker's per-instance stream.
Run after every desktop claude /login:
ralph-sync-credential
# or with a custom key:
RALPH_CLAUDE_OAUTH_SSM_KEY=/ralph/claude-oauth-credential ralph-sync-credentialCaveats:
- Rotation: every desktop
claude /loginmay invalidate the prior refresh token. Re-runralph-sync-credentialimmediately after each login so the EC2 worker picks up the fresh credential. - Concurrent use unverified: simultaneous use of the same credential by the desktop app and an EC2 worker has not been validated; assume one active consumer at a time.
- Plan limits: the EC2 worker burns the engineer's Claude plan limits while running.
- OAuth-vs-API-key: iteration 1 ships OAuth-via-Keychain. Switching to a
long-lived API key is a one-file change in the future
ec2-bootstrapmodule (see issue #7).
src/ holds the TS modules and bin entries with co-located vitest
tests (*.test.ts). Run with npm test.
# Validate a config file by hand:
ralph-validate-config path/to/.ralph/config.yaml
# Manually swap a label on a sandbox repo:
ralph-gsm swap-label owner/sandbox 1 ready-for-agent ready-for-human
# Bootstrap AWS resources for a target repo (idempotent):
RALPH_TARGET_REPO=owner/target ralph-bootstrap-aws
# Sync the macOS Keychain credential into SSM (re-run after every claude /login):
ralph-sync-credential
# Seed the GitHub PAT into SSM (one-time):
echo "$GITHUB_PAT" | ralph-sync-github-pat
# Fire one throwaway EC2 instance (full discovery → impl → review chain):
RALPH_TARGET_REPO=owner/target ralph-fire
# Run the test suite:
npm testDependencies (operator side): Node ≥ 24 and the harness installed globally
(npm install -g @unimatrix27/[email protected]), plus jq, gh, and
aws CLI v2. See CONTRIBUTING.md for install hints.
Schema at a glance
A target repo's .ralph/config.yaml looks like:
build_cmd: "make build"
test_cmd: "make test"
branch_prefix: "ralph"
review_bot:
username: "claude"
source: "comment"
# optional:
agent_stuck_label: "agent-stuck"
prompt_extensions:
discovery: |
...extra discovery-prompt instructions...
implementation: |
...extra impl-prompt instructions...
review: |
...extra review-prompt instructions...See docs/config-schema.md for the full spec, types,
and validation rules.
