@joabundis15/forge

v0.1.2

Published

2 days ago

Self-validating AI code agent CLI: per-worktree services, agent-driven validate-fix loop, and PR review mode.

joabundis15

ai agent cli claude anthropic validate sandbox worktree docker effect

Forge

A self-validating AI code agent CLI. Forge runs an AI agent inside an isolated git worktree, sets up the project's environment, runs the application, hits its endpoints, and feeds failures back to the agent until validation passes.
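The validate-fix loop described above can be pictured roughly as follows. This is a minimal sketch, not Forge's actual orchestrator: the names validateFixLoop, LoopDeps, ValidationFailure, and the maxAttempts default are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical types; Forge's real interfaces may look different.
interface ValidationFailure {
  endpoint: string;
  message: string;
}

interface LoopDeps {
  // Runs every endpoint check and returns the failures (empty array = pass).
  validate: () => Promise<ValidationFailure[]>;
  // Gives the agent the task plus the latest failures so it can edit files.
  runAgent: (task: string, failures: ValidationFailure[]) => Promise<void>;
}

async function validateFixLoop(
  task: string,
  deps: LoopDeps,
  maxAttempts = 3, // assumed budget, not a documented Forge default
): Promise<{ passed: boolean; attempts: number }> {
  let failures: ValidationFailure[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The first call sees no failures; later calls see the previous round's.
    await deps.runAgent(task, failures);
    failures = await deps.validate();
    if (failures.length === 0) {
      return { passed: true, attempts: attempt };
    }
  }
  return { passed: false, attempts: maxAttempts };
}
```

The point of the shape is that the agent never decides on its own that it is done: only an empty failure list from the validator ends the loop early.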

What's in v1

CLI: forge run "<task>" and forge init
Git worktree sandbox
Hand-rolled tool-use loop on the Anthropic SDK (bash, read_file, str_replace, write_file)
Curl-based endpoint validation with status and JSON-subset body matching
Bundled Next.js demo (examples/sample-api/) with a deliberately broken endpoint
Built on Effect 3.x throughout
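To illustrate what "JSON-subset body matching" could mean in practice, here is a small sketch of a subset check. It is an assumption about the semantics, not Forge's validator: the function name isSubset is hypothetical, and the choice to compare arrays element by element with equal length is one possible rule among several.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns true when every key/value in `expected` also appears in `actual`.
// Extra keys in `actual` are ignored; that is what makes it a subset match.
function isSubset(expected: Json, actual: Json): boolean {
  // Primitives and null must match exactly.
  if (expected === null || typeof expected !== "object") {
    return expected === actual;
  }
  // Assumed rule: arrays must have the same length and match position by position.
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual) || actual.length !== expected.length) return false;
    for (let i = 0; i < expected.length; i++) {
      if (!isSubset(expected[i], actual[i])) return false;
    }
    return true;
  }
  // `expected` is a plain object here, so `actual` must be one too.
  if (actual === null || typeof actual !== "object" || Array.isArray(actual)) {
    return false;
  }
  for (const [key, value] of Object.entries(expected)) {
    if (!(key in actual) || !isSubset(value, actual[key])) return false;
  }
  return true;
}
```

With this rule, an expectation of { id: 1 } accepts a response body of { id: 1, name: "Ada" } but rejects { id: 2 }.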

Out of scope for v1: docker, services orchestration (Postgres/Redis), Playwright validation, test-suite validation, PR-review mode, dashboard.

Install

npm install -g @joabundis15/forge
forge --help

Or use without installing:

npx @joabundis15/forge run "<task>" --cwd /path/to/project

Requirements

Node.js >= 20
npm (bundled with Node)
Docker, if you use forge up services or the docker sandbox
One of:
- The claude CLI installed and logged in (claude login) for the claudeCode provider (used by the bundled demo, works with a Claude subscription)
- An ANTHROPIC_API_KEY environment variable for the claude provider (direct API)

Demo

npm install
npm run demo

The bundled demo uses the claudeCode provider, so it inherits whatever auth your local claude CLI is set up with. Make sure claude --version works and that you are logged in via claude login. If you would rather run the demo with an API key, edit examples/sample-api/.forge/config.ts, switch provider from "claudeCode" to "claude", and export ANTHROPIC_API_KEY before running.

npm run demo automatically prepares examples/sample-api/: it initializes a git repo inside the example (required because Forge uses git worktree), copies .env.example to .env, and runs npm install for the example. It then invokes Forge against the example to fix the broken /api/users endpoint.

You can also run Forge against any other project that has a forge.config.ts or .forge/config.ts:

npx tsx src/cli.ts run "<task>" --cwd /path/to/project

The target project must be its own git repository.

Running tests

npm test          # unit + service + e2e (no real API calls)
npm run typecheck

The e2e test runs the orchestrator against examples/sample-api/ with a scripted Anthropic client.

Configuration

A forge.config.ts (or .forge/config.ts) at the project root declares setup, run, validate, and agent settings. See examples/sample-api/.forge/config.ts for a working example.

Docker sandbox

Forge can optionally run install + the application inside a Docker container. Add to your forge.config.ts:

sandbox: {
  type: "docker",
  docker: {
    image: "node:20",                    // base image
    // dockerfile: ".forge/Dockerfile",  // or build a custom image
    // ports: [3000],                    // override port mapping; defaults to the run.healthCheck port
  },
}

Requirements: the docker CLI must be installed (Docker Desktop or Docker Engine). The agent's file edits still happen on the host filesystem; a bind mount makes them visible inside the container. Ports declared here are published to localhost so the host-side validator can reach the app.

Caveats: bind mounts on macOS are slower than the native filesystem (about 2-3x for npm install). The first run pulls the base image (~1GB for node:20).

Linux file ownership: files written inside the container as root will appear as root-owned on the host. v1 has no setting to change this; a future sandbox.docker.runArgs setting could pass --user $(id -u):$(id -g).
Windows bind mounts: paths are passed straight to -v, so native Windows paths likely won't work without conversion. Only macOS and Linux are tested.
Container must bind 0.0.0.0: when the run command starts a server, it must listen on 0.0.0.0 (not 127.0.0.1 / localhost) for the published port to be reachable from the host validator. Most frameworks do this by default; Next.js dev mode binds localhost unless you pass --hostname 0.0.0.0.
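As an illustration of the 0.0.0.0 requirement, the sketch below starts a plain Node HTTP server on the wildcard address. The helper name startServer and the JSON body are hypothetical; your framework will have its own way to set the host (for Next.js dev mode, the --hostname 0.0.0.0 flag mentioned above).

```typescript
import { createServer, type Server } from "node:http";

// Starts a tiny HTTP server. The default host is the wildcard address so that
// a port published by Docker is reachable from the host machine.
function startServer(port: number, host = "0.0.0.0"): Promise<Server> {
  const server = createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ ok: true }));
  });
  return new Promise((resolve, reject) => {
    server.once("error", reject); // e.g. EADDRINUSE when the port is taken
    server.listen(port, host, () => resolve(server));
  });
}
```

Calling startServer(3000) inside the container listens on every interface. Passing "127.0.0.1" as the host instead would accept connections only from inside the container, which is the failure mode this caveat warns about.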

Providers

The agent.provider config field selects which agent backend Forge uses:

claude (default): hand-rolled tool-use loop built on @anthropic-ai/sdk. Forge owns the conversation, the tools (bash, read_file, str_replace, write_file), and the turn budget (agent.agentMaxTurns). Requires ANTHROPIC_API_KEY.
claudeCode: shells out to the Claude Code CLI (claude --print) inside the worktree. Inherits whatever auth claude login set up (Claude Pro/Max subscription or API key), so you can run Forge without an Anthropic API key. The CLI manages its own tools and turns; Forge just pipes the prompt in and reads stdout. Requires the claude binary to be installed and on PATH. The agent.agentMaxTurns field is ignored by this provider.

The bundled demo (examples/sample-api/) is configured with provider: "claudeCode" so npm run demo works without an API key as long as you're logged into the Claude CLI.
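The claudeCode provider's "pipe the prompt in, read stdout" behavior can be sketched with Node's child_process module. This is an illustrative approximation, not Forge's code: runCli and runClaudeCode are hypothetical names, and the only CLI detail taken from this README is the claude --print invocation.

```typescript
import { spawn } from "node:child_process";

// Generic helper: writes `input` to the child's stdin and resolves with its stdout.
function runCli(binary: string, args: string[], input: string, cwd: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(binary, args, { cwd });
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });
    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits early
    child.on("error", reject); // e.g. the binary is not on PATH
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`${binary} exited with code ${code}: ${stderr}`));
    });
    child.stdin.end(input);
  });
}

// Hypothetical wrapper: run the Claude Code CLI inside the worktree directory.
const runClaudeCode = (prompt: string, worktreePath: string): Promise<string> =>
  runCli("claude", ["--print"], prompt, worktreePath);
```

Because the CLI manages its own tools and turns, a wrapper like this has nothing to count, which is consistent with agent.agentMaxTurns being ignored by this provider.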

Architecture notes

Each subsystem is an Effect Context.Tag interface with a Layer implementation. The orchestrator composes Sandbox, Environment, Runner, Validator, and Agent. Resources are managed via Effect.Scope, so the worktree and the spawned application process are torn down automatically on success, failure, or interrupt.
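The pattern described above (a Context.Tag interface, a Layer implementation, and scope-based teardown) looks roughly like this in Effect 3.x. The Sandbox shape, the path, and the log array are hypothetical stand-ins; Forge's real service interfaces are not shown in this README.

```typescript
import { Context, Effect, Layer } from "effect";

// Hypothetical service interface: only a worktree path.
class Sandbox extends Context.Tag("Sandbox")<Sandbox, { readonly path: string }>() {}

// Records what happened so the teardown order is visible.
const log: string[] = [];

// Layer.scoped ties the release step to the scope the layer is built in.
const SandboxLive = Layer.scoped(
  Sandbox,
  Effect.acquireRelease(
    Effect.sync(() => {
      log.push("create worktree");
      return { path: "/tmp/forge-worktree" }; // placeholder path
    }),
    () => Effect.sync(() => { log.push("remove worktree"); }),
  ),
);

// A stand-in orchestrator that uses the sandbox and then fails.
const orchestrate = Effect.gen(function* () {
  const sandbox = yield* Sandbox;
  log.push(`validate in ${sandbox.path}`);
  return yield* Effect.fail("validation failed" as const);
});

// Effect.provide builds the layer, runs the program, then closes the scope,
// so the release step runs even though the program failed.
const main = orchestrate.pipe(Effect.provide(SandboxLive), Effect.either);
```

Running Effect.runPromise(main) resolves with a Left carrying the failure, and by then the release step has already run. The same mechanism covers interruption, which is why no explicit try/finally appears in the orchestrator.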

Spec: docs/superpowers/specs/2026-05-01-forge-v1-design.md. Plan: docs/superpowers/plans/2026-05-01-forge-v1-implementation.md.

Constraints

The target project must be its own git repository. Forge uses git worktree add, which requires the target directory to be (or contain) a real git repo. If you point Forge at a non-git directory, sandbox creation will fail or, worse, create a worktree of an enclosing parent repo. The bundled examples/sample-api/ is set up automatically by npm run demo.
Forge copies your .env files into the worktree directory under <projectRoot>/.forge/worktrees/<runId>/. Make sure your project's .gitignore excludes .forge/worktrees/ so secrets do not get committed.
v1 assumes nothing else is bound to the same port the run command listens on. There is no automatic port assignment.
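Since there is no automatic port assignment, it can help to check the port yourself before starting a run. The sketch below is a best-effort probe and is not part of Forge; isPortFree is a hypothetical helper, and the result can vary by platform when another process is bound to a different interface on the same port.

```typescript
import { createServer } from "node:net";

// Tries to bind the port briefly. Any bind error (EADDRINUSE, EACCES, ...)
// is treated as "not free".
function isPortFree(port: number, host = "0.0.0.0"): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = createServer();
    probe.once("error", () => resolve(false));
    probe.listen(port, host, () => {
      // Bound successfully: release the port again before reporting.
      probe.close(() => resolve(true));
    });
  });
}
```

For example, awaiting isPortFree(3000) before forge run tells you whether a leftover dev server still holds the port. There is an unavoidable gap between the probe and the real bind, so treat the answer as a hint rather than a guarantee.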

Status

This is a portfolio-grade v1: scope is the smallest slice that proves the self-validation loop end-to-end. See the spec for what is intentionally deferred.