@joabundis15/forge
v0.1.2
Self-validating AI code agent CLI: per-worktree services, agent-driven validate-fix loop, and PR review mode.
Forge
A self-validating AI code agent CLI. Forge runs an AI agent inside an isolated git worktree, sets up the project's environment, runs the application, hits its endpoints, and feeds failures back to the agent until validation passes.
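The loop described above can be sketched in a few lines. This is a minimal illustration, not Forge's actual orchestrator: the names `Agent`, `Validate`, `ValidationResult`, `buildRetryPrompt`, `validateFixLoop`, and the `maxAttempts` parameter are all hypothetical.

```typescript
// Hypothetical shapes: none of these names come from Forge's source.
interface ValidationResult {
  ok: boolean;
  failures: string[]; // human-readable descriptions of failed checks
}

type Agent = (prompt: string) => Promise<void>;
type Validate = () => Promise<ValidationResult>;

// Restate the task and append what failed, so the agent sees both.
function buildRetryPrompt(task: string, failures: string[]): string {
  return `${task}\n\nValidation failed:\n${failures.join("\n")}`;
}

// Run the agent, validate, and feed failures back until validation
// passes or the attempt budget (an assumed knob) is exhausted.
async function validateFixLoop(
  task: string,
  agent: Agent,
  validate: Validate,
  maxAttempts = 3,
): Promise<{ passed: boolean; attempts: number }> {
  let prompt = task;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await agent(prompt);
    const result = await validate();
    if (result.ok) return { passed: true, attempts: attempt };
    prompt = buildRetryPrompt(task, result.failures);
  }
  return { passed: false, attempts: maxAttempts };
}
```

Injecting the agent and the validator as plain functions is one way to make such a loop testable with a scripted agent and no real API calls.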
What's in v1
- CLI: forge run "<task>" and forge init
- Git worktree sandbox
- Hand-rolled tool-use loop on the Anthropic SDK (bash, read_file, str_replace, write_file)
- Curl-based endpoint validation with status and JSON-subset body matching
- Bundled Next.js demo (examples/sample-api/) with a deliberately broken endpoint
- Built on Effect 3.x throughout
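As an illustration of the "JSON-subset body matching" item above, here is a minimal sketch of one way such a check could work. The function name `isJsonSubset` is hypothetical, and the array rule (same length, element-by-element) is an assumption; Forge's validator may treat arrays differently.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// True when everything in `expected` also appears in `actual`.
// Extra object keys in `actual` are ignored.
// Assumption: arrays must have the same length and match position by position.
function isJsonSubset(expected: Json, actual: Json): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual) || actual.length !== expected.length) return false;
    for (let i = 0; i < expected.length; i++) {
      if (!isJsonSubset(expected[i], actual[i])) return false;
    }
    return true;
  }
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object" || Array.isArray(actual)) return false;
    for (const key of Object.keys(expected)) {
      if (!(key in actual) || !isJsonSubset(expected[key], actual[key])) return false;
    }
    return true;
  }
  // Primitives and null compare by strict equality.
  return expected === actual;
}
```

With this rule, an expectation of `{ id: 1 }` accepts a response body of `{ id: 1, name: "Ada" }` but rejects `{ id: 2, name: "Ada" }`.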
Out of scope for v1: docker, services orchestration (Postgres/Redis), Playwright validation, test-suite validation, PR-review mode, dashboard.
Install
npm install -g @joabundis15/forge
forge --help
Or use without installing:
npx @joabundis15/forge run "<task>" --cwd /path/to/project
Requirements
- Node.js >= 20
- npm (bundled with Node)
- Docker, if you use forge up services or the docker sandbox
- One of:
  - The claude CLI installed and logged in (claude login) for the claudeCode provider (used by the bundled demo, works with a Claude subscription)
  - An ANTHROPIC_API_KEY environment variable for the claude provider (direct API)
Demo
npm install
npm run demo
The bundled demo uses the claudeCode provider, so it inherits whatever auth your local claude CLI is set up with. Make sure claude --version works and that you are logged in via claude login. If you would rather run the demo with an API key, edit examples/sample-api/.forge/config.ts and switch provider from "claudeCode" to "claude", then export ANTHROPIC_API_KEY before running.
npm run demo automatically prepares examples/sample-api/: it initializes a git repo inside the example (required because Forge uses git worktree), copies .env.example to .env, and runs npm install for the example. It then invokes Forge against the example to fix the broken /api/users endpoint.
You can also run Forge against any other project that has a forge.config.ts or .forge/config.ts:
npx tsx src/cli.ts run "<task>" --cwd /path/to/project
The target project must be its own git repository.
Running tests
npm test # unit + service + e2e (no real API calls)
npm run typecheck
The e2e test runs the orchestrator against examples/sample-api/ with a scripted Anthropic client.
Configuration
A forge.config.ts (or .forge/config.ts) at the project root declares setup, run, validate, and agent settings. See examples/sample-api/.forge/config.ts for a working example.
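To give a rough picture of the four sections named above, here is a hypothetical sketch. It is not Forge's real schema: `SketchConfig` is a local type invented for this illustration, and every field marked "assumed" is a guess. Only `agent.provider`, `agent.agentMaxTurns`, `run.healthCheck`, and the `sandbox` block are named elsewhere in this README. Treat examples/sample-api/.forge/config.ts as the authoritative reference.

```typescript
// Illustrative only: this interface is NOT Forge's exported config type.
interface SketchConfig {
  setup: { commands: string[] }; // assumed shape
  run: {
    command: string; // assumed field
    healthCheck: { url: string }; // `run.healthCheck` is named in the README; `url` is assumed
  };
  validate: Array<{ url: string; expectStatus: number; expectBody?: unknown }>; // assumed shape
  agent: { provider: "claude" | "claudeCode"; agentMaxTurns?: number };
}

const sketch: SketchConfig = {
  setup: { commands: ["npm install"] },
  run: {
    command: "npm run dev",
    healthCheck: { url: "http://localhost:3000/" },
  },
  validate: [
    // Status check plus a JSON-subset expectation on the body.
    { url: "http://localhost:3000/api/users", expectStatus: 200, expectBody: [] },
  ],
  agent: { provider: "claudeCode" },
};
```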
Docker sandbox
Forge can optionally run install + the application inside a Docker container. Add to your forge.config.ts:
sandbox: {
  type: "docker",
  docker: {
    image: "node:20", // base image
    // dockerfile: ".forge/Dockerfile", // or build a custom image
    // ports: [3000], // override port mapping; defaults to the run.healthCheck port
  },
}
Requirements: docker CLI installed (Docker Desktop or Docker Engine). The agent's file edits still happen on the host filesystem; the bind mount surfaces them inside the container. Ports declared here are published to localhost so the host-side validator can hit the app.
Caveats: bind-mount perf is slower on macOS than native (about 2-3x for npm install). First run pulls the base image (~1GB for node:20).
- Linux file ownership: files written inside the container as root will appear as root-owned on the host. If this matters, use --user $(id -u):$(id -g) in a future sandbox.docker.runArgs setting (not yet supported in v1).
- Windows bind mounts: paths are passed straight to -v. Native Windows paths likely won't work without conversion. macOS and Linux are tested.
- Container must bind 0.0.0.0: when the run command starts a server, it must listen on 0.0.0.0 (not 127.0.0.1/localhost) for the published port to be reachable from the host validator. Most frameworks do this by default; Next.js dev mode binds localhost unless you pass --hostname 0.0.0.0.
Providers
The agent.provider config field selects which agent backend Forge uses:
- claude (default): hand-rolled tool-use loop built on @anthropic-ai/sdk. Forge owns the conversation, the tools (bash, read_file, str_replace, write_file), and the per-turn budget (agent.agentMaxTurns). Requires ANTHROPIC_API_KEY.
- claudeCode: shells out to the Claude Code CLI (claude --print) inside the worktree. Inherits whatever auth claude login set up (Claude Pro/Max subscription or API key), so you can run Forge without an Anthropic API key. The CLI manages its own tools and turns; Forge just pipes the prompt in and reads stdout. Requires the claude binary to be installed and on PATH. The agent.agentMaxTurns field is ignored by this provider.
The bundled demo (examples/sample-api/) is configured with provider: "claudeCode" so npm run demo works without an API key as long as you're logged into the Claude CLI.
Architecture notes
Each subsystem is an Effect Context.Tag interface with a Layer implementation. The orchestrator composes Sandbox, Environment, Runner, Validator, and Agent. Resources are managed via Effect.Scope, so the worktree and the spawned application process are torn down automatically on success, failure, or interrupt.
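The teardown guarantee can be illustrated without Effect. The sketch below is a plain-TypeScript analog, not Effect code and not Forge's implementation: `MiniScope` and `withScope` are hypothetical names. It shows the property that matters here: finalizers registered while acquiring resources run in reverse order when the scope closes, whether the body returns or throws.

```typescript
type Finalizer = () => void;

// A tiny stand-in for a resource scope: collects finalizers and
// runs them last-registered-first when closed.
class MiniScope {
  private finalizers: Finalizer[] = [];

  addFinalizer(finalizer: Finalizer): void {
    this.finalizers.push(finalizer);
  }

  close(): void {
    while (this.finalizers.length > 0) {
      const finalizer = this.finalizers.pop()!;
      finalizer();
    }
  }
}

// Run `use` with a fresh scope and always close it afterwards,
// on normal return as well as on a thrown error.
function withScope<A>(use: (scope: MiniScope) => A): A {
  const scope = new MiniScope();
  try {
    return use(scope);
  } finally {
    scope.close();
  }
}
```

In Forge's terms, the worktree would be registered first and the application process second, so the process is stopped before the worktree is removed. That ordering is an inference from the description above, not something the source states.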
Spec: docs/superpowers/specs/2026-05-01-forge-v1-design.md.
Plan: docs/superpowers/plans/2026-05-01-forge-v1-implementation.md.
Constraints
- The target project must be its own git repository. Forge uses git worktree add, which requires the target directory to be (or contain) a real git repo. If you point Forge at a non-git directory, sandbox creation will fail or, worse, create a worktree of an enclosing parent repo. The bundled examples/sample-api/ is set up automatically by npm run demo.
- Forge copies your .env files into the worktree directory under <projectRoot>/.forge/worktrees/<runId>/. Make sure your project's .gitignore excludes .forge/worktrees/ so secrets do not get committed.
- v1 assumes nothing else is bound to the same port the run command listens on. There is no automatic port assignment.
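If you want to guard against the first constraint before invoking Forge, a pre-flight check along these lines may help. This is a sketch under assumptions: `isRepoRoot` and `gitTopLevel` are hypothetical helpers, not part of Forge. It relies on git rev-parse --show-toplevel, which prints the root of the enclosing repository.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Pure comparison: does git's reported top level equal the project root?
// Note: symlinked paths may need fs.realpathSync on both sides first.
function isRepoRoot(projectRoot: string, gitTopLevelOutput: string): boolean {
  return path.resolve(projectRoot) === path.resolve(gitTopLevelOutput.trim());
}

// Ask git for the enclosing repository root; null when `cwd` is not
// inside any repository (or git is unavailable).
function gitTopLevel(cwd: string): string | null {
  try {
    return execFileSync("git", ["rev-parse", "--show-toplevel"], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
  } catch {
    return null;
  }
}
```

A caller could refuse to proceed when `gitTopLevel(projectRoot)` is null, or when `isRepoRoot` returns false, since the latter means the directory sits inside a parent repository rather than being its own.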
Status
This is a portfolio-grade v1: scope is the smallest slice that proves the self-validation loop end-to-end. See the spec for what is intentionally deferred.
