promptmill

v0.1.24

Published

8 days ago

Run an agent prompt repeatedly in a batch loop, tee-ing each run to the console and a per-run log file.

kaspernj

claude agent batch loop cli prompt

Promptmill

Run an agent prompt repeatedly in a batch loop — feeding a prompt file to an agent CLI N times, and tee-ing each run's output to the console and a per-run log file. Useful for batch-testing an autonomous prompt for consistency. Supports Claude Code (default), the Google Gemini CLI, the OpenAI Codex CLI, the Antigravity CLI (agy), and the OpenCode CLI via --agent.

Install

npm install -g promptmill
# or run without installing
npx promptmill prompts/my-prompt.md

Quick start

promptmill prompts/my-prompt.md --runs 50

Each run reads the prompt file fresh, spawns the agent with the prompt, streams stdout/stderr to your terminal, and appends the same output to a per-run file in the shared .promptmill-runs/ directory, named <YYYY-MM-DD HH-MM-SS> <Agent> (run <n>).log (e.g. 2026-05-31 07-36-50 Claude (run 1).log). A run that exits non-zero is logged and the batch continues.

CLI

promptmill <prompt-file> [options] [-- <agent args...>]

| Option | Default | Env | Description |
| --- | --- | --- | --- |
| --agent <name> | claude | | Agent to run: claude, gemini, codex, antigravity, or opencode. Sets the default command and label. |
| --awesometasks <t> | (none) | | AwesomeTasks mode. <t> is a board/project id, project name, or board URL on tasks.diestoeckels.de. The positional <prompt-file> becomes optional; without one Promptmill uses its shipped default prompt. Either way, {{AWESOMETASKS_TARGET}} in the prompt is replaced with <t> before the agent runs. See AwesomeTasks mode. |
| --runs <n> | 100 (min 0) | RUNS | Number of runs |
| --max-turns <n> | (off, no cap) | MAX_TURNS | Max agent turns per run, min 1 (Claude only; other agents ignore it). Default is no cap so long autonomous runs can finish. Opt in if you want a hard ceiling. |
| --log-dir <path> | .promptmill-runs | LOG_DIR | Per-run log directory, shared by all agents (each run's filename carries the agent label) |
| --command <cmd> | the agent's | | Agent executable to spawn (e.g. claude or gemini) |
| --model <name> | agent's highest | | Model to use. Defaults to the agent's highest (see below). Antigravity has no model flag; OpenCode uses its own configured default unless you pass provider/model. |
| --level <name> | agent's highest | | Reasoning level, where it's separate from the model name. Defaults to the agent's highest. Gemini/Antigravity/OpenCode have no level. |
| --cwd <path> | current dir | | Working directory |
| --output-format <fmt> | pretty | | Output mode: pretty (live readable progress), text (final result only), json, or stream-json (raw JSON events) |
| --log-file-prefix <s> | (none) | | Optional prefix prepended to each log filename |
| --label <s> | the agent's | | Console banner label |
| --session-id <name> | promptmill | | Logical session name reused across runs and invocations so the agent resumes the same session each time. See Sessions. |
| --no-line-prefix | (prefix on) | | Don't prefix each output line with [run N/total] |
| -h, --help | | | Show help |

Precedence for runs / max-turns / log-dir: flag > env var > default.
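
The precedence rule can be illustrated with a small sketch. The helper names (resolveSetting, parseIntegerSetting, resolveRuns) are hypothetical and not part of promptmill's API, and treating an empty env var as unset is an assumption:

```javascript
// Hypothetical helper: pick a value using flag > env var > default.
function resolveSetting(flagValue, envValue, defaultValue) {
  if (flagValue !== undefined) return flagValue;
  // Assumption: an empty env var counts as "not set".
  if (envValue !== undefined && envValue !== "") return envValue;
  return defaultValue;
}

// Hypothetical helper: parse an integer option and enforce its minimum.
function parseIntegerSetting(name, raw, min) {
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

// --runs: flag, then RUNS, then the documented default of 100 (min 0).
function resolveRuns(flags, env) {
  const raw = resolveSetting(flags.runs, env.RUNS, 100);
  return parseIntegerSetting("runs", raw, 0);
}
```

With this shape, resolveRuns({runs: "5"}, {RUNS: "9"}) picks the flag, and resolveRuns({}, {}) falls back to 100.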

Model & reasoning level

By default promptmill runs each agent at its highest model and reasoning level; override with --model / --level. (These are the highest at time of writing — they live in the agent registry and may need bumping as new models ship.)

| Agent | default --model | default --level |
| --- | --- | --- |
| claude | opus (--model) | xhigh (--effort; scale low/medium/high/xhigh/max) |
| gemini | pro (-m) | (no level flag) |
| codex | gpt-5.5 (-m) | xhigh (-c model_reasoning_effort=) |
| antigravity | (no model flag) | (no level flag) |
| opencode | (uses OpenCode's configured default; pass provider/model via --model) | (no level flag) |

Passing --model/--level to an agent that has no such flag (e.g. --agent antigravity --model …) is an error.
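
As a rough sketch of how such a registry could drive both the defaults and the error, the table can be written as data. The AGENTS object and modelAndLevelArgs are illustrative names, not promptmill's actual registry, and the exact argument shapes (in particular the Codex -c value) are assumptions read off the table:

```javascript
// Mirrors the defaults table above; illustrative only.
const AGENTS = {
  claude: {
    model: {flag: "--model", default: "opus"},
    level: {flag: "--effort", default: "xhigh"}
  },
  gemini: {model: {flag: "-m", default: "pro"}, level: null},
  codex: {
    model: {flag: "-m", default: "gpt-5.5"},
    level: {flag: "-c", prefix: "model_reasoning_effort=", default: "xhigh"}
  },
  antigravity: {model: null, level: null},
  opencode: {model: {flag: "--model", default: null}, level: null}
};

// Build the model/level arguments for one agent, or throw when the
// caller asks for a flag the agent does not have.
function modelAndLevelArgs(agentName, options = {}) {
  const agent = AGENTS[agentName];
  if (!agent) throw new Error(`unknown agent "${agentName}"`);
  const args = [];
  for (const kind of ["model", "level"]) {
    const spec = agent[kind];
    const requested = options[kind];
    if (!spec) {
      if (requested !== undefined) {
        throw new Error(`--${kind} is not supported by ${agentName}`);
      }
      continue;
    }
    const value = requested ?? spec.default;
    if (value === null) continue; // no forced default (OpenCode)
    args.push(spec.flag, `${spec.prefix ?? ""}${value}`);
  }
  return args;
}
```

Keeping the defaults in one data structure means a new top model is a one-line change.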

Agents

By default promptmill drives Claude Code (claude). Pass --agent gemini to drive the Google Gemini CLI instead — it must be installed (npm i -g @google/gemini-cli) and authenticated. promptmill runs Gemini headless with --approval-mode yolo, feeds the prompt on stdin, and (in the default pretty mode) renders Gemini's stream-json events into the same live, readable progress. --max-turns applies to Claude only and is off by default.

promptmill prompts/my-prompt.md --agent gemini --runs 25

Pass --agent codex to drive the OpenAI Codex CLI — it must be installed and signed in (codex login). promptmill runs codex exec --dangerously-bypass-approvals-and-sandbox, feeds the prompt on stdin, and (in pretty mode) renders Codex's --json events (thread.started, command executions, file changes, the agent's message, token usage) into live, readable progress. Codex must run inside a git repository (its own guard — pass -- --skip-git-repo-check to bypass). Codex has no turn limit, so --max-turns is ignored. promptmill also passes --disable shell_snapshot, because Codex's shell-snapshot validation can fail on valid Bash emitted by shell managers like RVM (its function dumps use extglob patterns that Codex re-parses without extglob); disabling it does not remove your RVM-managed Ruby, which stays available via PATH.

promptmill prompts/my-prompt.md --agent codex --runs 25

Pass --agent antigravity to drive the Antigravity CLI (agy) — it must be installed and authenticated. promptmill runs agy --print --dangerously-skip-permissions (raising --print-timeout so long runs aren't cut at 5 min), feeds the prompt on stdin, and prints the agent's text response. Antigravity has no JSON/event-stream output, so it is text-only: every output mode (including pretty) shows its plain response — there is no live event rendering. --max-turns is ignored.

promptmill prompts/my-prompt.md --agent antigravity --runs 25

Headless / SSH sign-in: agy's silent auth reads its token from the OS keyring via the freedesktop Secret Service over D-Bus, with a hard 5-second timeout. On a headless box with no D-Bus session bus (a plain SSH session), agy tries to autolaunch one and the attempt can intermittently stall past that timeout — at which point agy abandons silent auth and escalates to an interactive browser sign-in that a non-interactive batch can never complete, so the run hangs. When promptmill detects no session bus (DBUS_SESSION_BUS_ADDRESS unset and no $XDG_RUNTIME_DIR/bus), it points agy at an unreachable bus so the keyring lookup fails fast and agy falls through to its valid file-based token — reliably and without prompting (it prints a one-line notice when it does this). When a real session bus is present, promptmill leaves the environment untouched so the keyring stays the source of truth. As a safety net, if agy still requests interactive sign-in, promptmill aborts that run immediately and stops the batch with a message telling you to sign in once interactively (run agy yourself) rather than hanging until --print-timeout.
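
The session-bus check described above could look roughly like this. The function names are hypothetical, the filesystem probe is injected so the sketch stays self-contained, and the unreachable address is a placeholder because the value promptmill actually uses is not documented here:

```javascript
// True when a D-Bus session bus looks available: either the address
// variable is set, or the $XDG_RUNTIME_DIR/bus socket exists.
function hasSessionBus(env, pathExists) {
  if (env.DBUS_SESSION_BUS_ADDRESS) return true;
  if (env.XDG_RUNTIME_DIR && pathExists(`${env.XDG_RUNTIME_DIR}/bus`)) return true;
  return false;
}

// Returns the environment to spawn agy with.
function environmentForAgy(env, pathExists) {
  if (hasSessionBus(env, pathExists)) return env; // real bus: leave untouched
  // Placeholder address (assumption): anything unreachable makes the
  // keyring lookup fail fast instead of stalling.
  return {...env, DBUS_SESSION_BUS_ADDRESS: "unix:path=/nonexistent/promptmill-no-bus"};
}
```

In real use pathExists would wrap a filesystem check; here it is a parameter so the decision logic can be read on its own.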

Pass --agent opencode to drive the OpenCode CLI — it must be installed and authenticated (opencode auth login once; credentials are stored in a file and work headless over SSH). promptmill runs opencode run --dangerously-skip-permissions to auto-approve tools. Unlike the other agents, OpenCode reads the prompt as a positional argument (not stdin), so promptmill passes the prompt text as the trailing message. OpenCode has no promptmill event renderer, so it is text-only: pretty/text use --format default (its human-readable output) and json/stream-json use --format json (raw JSON events passed through). No model is forced — OpenCode uses its own configured default unless you pass --model provider/model (e.g. --model anthropic/claude-sonnet-4-...); there is no level flag. --max-turns is ignored.

PATH note: OpenCode's installer typically adds its bin dir to PATH only in ~/.bashrc, so opencode may work in your interactive terminal but not be found when promptmill spawns it. promptmill handles this automatically: when opencode isn't on PATH it falls back to the installer's canonical location ~/.opencode/bin/opencode (printing a one-line notice). If your binary lives somewhere else, point promptmill at it with --command /path/to/opencode. A command that is neither on PATH nor in a known location fails fast with could not start "<command>" — command not found on PATH.
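
A minimal sketch of that lookup order, assuming a hypothetical resolveOpencodeCommand helper with the executable check injected as a parameter. The notice and error texts here are illustrative, not promptmill's exact wording:

```javascript
// Look for opencode on PATH first, then in the installer's canonical
// location ~/.opencode/bin/opencode, and fail fast otherwise.
function resolveOpencodeCommand({pathDirs, homeDir, isExecutable}) {
  for (const dir of pathDirs) {
    const candidate = `${dir}/opencode`;
    if (isExecutable(candidate)) return {command: candidate, notice: null};
  }
  const fallback = `${homeDir}/.opencode/bin/opencode`;
  if (isExecutable(fallback)) {
    return {command: fallback, notice: `opencode not on PATH, using ${fallback}`};
  }
  throw new Error("opencode: command not found on PATH or in ~/.opencode/bin");
}
```

Passing --command /path/to/opencode skips this search entirely, which is the documented escape hatch for non-standard installs.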

Choosing the model: pass --model provider/model and promptmill forwards it to opencode run --model …; omit it and OpenCode uses the default model from its own config ("model" in ~/.config/opencode/opencode.json). List available ids with opencode models. To make a model the permanent default for every run (promptmill or plain opencode), set it as "model" in that config instead of passing --model.

# Use a specific model (e.g. DeepSeek V4 Pro) for this batch:
promptmill prompts/my-prompt.md --agent opencode --model deepseek/deepseek-v4-pro --runs 25

# Or omit --model to use OpenCode's configured default:
promptmill prompts/my-prompt.md --agent opencode --runs 25

Stopping: press Ctrl+C once for a graceful stop — the current run finishes, the next one is skipped, and promptmill exits. Press Ctrl+C again to interrupt the current run and exit immediately.

Exit codes: 0 all runs finished · 1 fatal (missing prompt file, invalid runs/max-turns, or an unexpected error) · 130 stopped with Ctrl+C (SIGINT/SIGTERM), gracefully or interrupted. A run that exits non-zero does not fail the batch.
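
The stop behaviour and the exit codes can be modelled as a tiny state tracker. createStopTracker is a hypothetical name, and giving a fatal error priority over an interrupt is an assumption the text does not spell out:

```javascript
// Tracks how many interrupts arrived and derives the batch's behaviour.
function createStopTracker() {
  let interrupts = 0;
  return {
    // First interrupt: let the current run finish. Second: abort it.
    onInterrupt() {
      interrupts += 1;
      return interrupts === 1 ? "finish-current-run" : "abort-now";
    },
    // No further runs are started once any interrupt has been seen.
    shouldStartNextRun() {
      return interrupts === 0;
    },
    // 1 = fatal, 130 = stopped by interrupt, 0 = all runs finished.
    // A run that exits non-zero is deliberately not an input here.
    exitCode({fatal = false} = {}) {
      if (fatal) return 1;
      return interrupts > 0 ? 130 : 0;
    }
  };
}
```

In a real process the onInterrupt method would be wired to the SIGINT and SIGTERM handlers.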

Sessions

By default every promptmill run — and every invocation — resumes the same agent session, named promptmill. The first run starts a fresh session; subsequent runs (within the batch and across invocations) continue it, so the agent keeps the memory it built up. Override the name with --session-id <name> to keep unrelated batches isolated:

promptmill prompts/feature-a.md --session-id feature-a
promptmill prompts/feature-b.md --session-id feature-b

Per-agent details:

Claude and Gemini: promptmill derives a deterministic UUID v5 from the session name (same name → same UUID across machines and time). The first run for a name uses --session-id <uuid> to create the session; the moment promptmill sees the agent's stream-json init event it writes the UUID to <log-dir>/sessions.json as a "session created" marker, and every subsequent run (in this batch and future invocations) uses --resume <uuid>. The marker is written even when the run later exits non-zero (e.g. error_max_turns), because by then the session has already been created on the agent's side. Both CLIs treat --session-id as strictly create-only, so the marker is what prevents Session ID … is already in use. Neither --output-format text (plain text) nor --output-format json (single final object) emits the NDJSON event stream the extractor reads, so promptmill silently runs the first capture run in stream-json for both. Once captured, subsequent runs honor the user's chosen format again.
If the session already existed before promptmill recorded it (e.g. left over from an earlier release that wrote no marker), the first run fails with Error: Session ID … is already in use. Promptmill recognizes that exact message — with the UUID it itself derived — and records the marker from the error, so the next run resumes cleanly without any user action.
Codex cannot pin a session id up front. Promptmill runs codex exec fresh on first use, captures the assigned thread id from the --json stream's thread.started event, persists it to <log-dir>/sessions.json, and uses codex exec resume <id> for every subsequent run.
Antigravity is best-effort. Promptmill scans agy --print output for a recognizable conversation id; if found it is persisted and reused via --conversation <id>, otherwise each run starts fresh.
OpenCode assigns its own session id (shape ses_…), but only prints it under --format json — never in --format default (what pretty/text map to). So promptmill runs the first capture run in --format json, reads the id from the event stream, persists it to <log-dir>/sessions.json, and reuses it via --session <id> for every subsequent run (which honor your chosen format again). That one capture run's live output is raw JSON; later runs are not.

The session UUID is printed at startup (Session: promptmill (b8c4… )). Markers in <log-dir>/sessions.json are keyed by <agent>:<name> (e.g. "claude:promptmill"), so the shared .promptmill-runs/ directory cannot cross-pollinate sessions between agents. Delete the entry (or the whole file) to force the next run to create a new session under the same name.
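
The create-or-resume decision for Claude and Gemini can be sketched as follows. The helper names are hypothetical, and the marker value shape (a plain id string under the <agent>:<name> key) is an assumption about sessions.json rather than its documented layout:

```javascript
// Key format described above: "<agent>:<name>", e.g. "claude:promptmill".
function markerKey(agent, sessionName) {
  return `${agent}:${sessionName}`;
}

// First run for a name creates the session; later runs resume it.
function sessionArgs(markers, agent, sessionName, uuid) {
  const created = Boolean(markers[markerKey(agent, sessionName)]);
  return created ? ["--resume", uuid] : ["--session-id", uuid];
}

// Record the "session created" marker without mutating the input.
function recordCreated(markers, agent, sessionName, id) {
  return {...markers, [markerKey(agent, sessionName)]: id};
}
```

Because the key includes the agent, a marker for claude:promptmill never affects gemini:promptmill, and deleting one entry forces only that agent to create a new session.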

Resuming a session interactively

To take over the conversation yourself — to inspect what the batch did or continue it by hand — run promptmill resume. It opens the agent's interactive CLI attached to the persisted session, using the same --agent, --session-id, --log-dir, --cwd, and --command you would pass to a batch:

promptmill resume --agent claude                       # claude --resume <uuid>
promptmill resume --agent codex --session-id project-a # codex resume <thread-id>

Per agent it runs claude --resume <uuid> / gemini --resume <uuid> / codex resume <id> / agy --conversation <id> / opencode --session <id>. Claude and Gemini always have their reproducible session UUID; Codex, Antigravity, and OpenCode need a prior batch run to have recorded their session id, so resume stops with a clear message if none exists yet. Anything after -- is appended to the agent command (e.g. -- --fork).

There is also promptmill continue, which opens the agent's interactive CLI on its most recent session via the agent's own "continue last" flag — claude --continue / agy --continue / opencode --continue. Unlike resume it needs no recorded id (it targets whatever the agent's last session is), so it works even when promptmill never captured one. continue is unsupported for Gemini (no continue-last flag) and Codex (its resume --last picks the globally newest session, ignoring the working directory, so it could open an unrelated repo) — use resume for those.

promptmill continue --agent opencode  # opencode --continue (last session in this dir)

The difference: resume reopens the specific session promptmill tracked; continue reopens the agent's latest session. For OpenCode (sessions are per-directory) both target the work promptmill just did, as long as you run from the same directory.

AwesomeTasks mode

Point Promptmill at an AwesomeTasks board on tasks.diestoeckels.de instead of giving it a prompt file. The agent (which needs its own awesometasks skill / tooling) picks scoped Backlog tasks, moves each to Doing, implements, opens a PR, moves it to Review, and comments the result.

promptmill --awesometasks https://tasks.diestoeckels.de/boards/42 --agent codex --runs 1

--awesometasks accepts a board id, project id, project name, or a board URL — the value is forwarded verbatim into the prompt as {{AWESOMETASKS_TARGET}} and the agent's skill resolves it against the live API. Each Promptmill run drains every in-scope Backlog task it can find; --runs N just repeats the cycle (useful for polling).
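
The placeholder substitution amounts to a literal string replacement. A minimal sketch, with substituteTarget as a hypothetical name:

```javascript
// Replace every {{AWESOMETASKS_TARGET}} with the target, verbatim.
function substituteTarget(promptText, target) {
  // A replacer function keeps "$" sequences in the target (possible in
  // URLs or project names) from being treated as replacement patterns.
  return promptText.replaceAll("{{AWESOMETASKS_TARGET}}", () => target);
}
```

The target is inserted as-is, with no validation, which matches the description that the agent's skill resolves it against the live API.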

The shipped prompt lives at src/prompts/awesometasks.md. To use your own instead, pass it as the positional argument — {{AWESOMETASKS_TARGET}} placeholders are still substituted:

promptmill prompts/my-awesometasks-worker.md --awesometasks 113 --agent codex

The agent needs valid AwesomeTasks credentials in its environment (see the awesometasks skill for the token lookup order). Promptmill itself never touches the API.

Output

By default (pretty) promptmill runs claude in stream-json under the hood and renders the events into live, readable progress — assistant messages, tool calls (→ Bash: …, → Read: …), errors, and a final ✓ done (N turns, $cost, time) summary — so you can watch a long run as it works:

[run 1/30] · session started (claude-opus-4-7)
[run 1/30] Reading the repo conventions first.
[run 1/30] → Bash: git rev-parse --abbrev-ref HEAD
[run 1/30] → Read: AGENTS.md
[run 1/30] → Edit: app/auth/session.rb
[run 1/30] ✓ done (14 turns, $0.42, 3m12s)

The other modes pass Claude's raw output of that format through unchanged:

promptmill prompts/my-prompt.md                              # pretty: live readable progress (default)
promptmill prompts/my-prompt.md --output-format stream-json  # raw JSON events (full fidelity for logs/parsing)
promptmill prompts/my-prompt.md --output-format text         # only each run's final result (non-streaming — silent until the run ends)
promptmill prompts/my-prompt.md --output-format json         # a single JSON result object per run

pretty assumes Claude's stream-json event schema. With a different --command, prefer stream-json, text, or json; if you stay on pretty, any non-JSON lines are passed through unchanged.
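
To make the rendering idea concrete, here is a simplified sketch that turns one line of event output into display lines and passes non-JSON lines through. renderEventLine is a hypothetical name, and the event field names (type, subtype, model, message.content, num_turns, total_cost_usd) are assumptions about Claude's stream-json schema, not something this README specifies:

```javascript
// Render one line of agent output into zero or more display lines.
function renderEventLine(line) {
  let event;
  try {
    event = JSON.parse(line);
  } catch {
    return [line]; // not JSON: pass through unchanged
  }
  if (event === null || typeof event !== "object") return [line];

  if (event.type === "system" && event.subtype === "init") {
    return [`· session started (${event.model})`];
  }
  if (event.type === "assistant") {
    const parts = event.message?.content ?? [];
    return parts.flatMap((part) => {
      if (part.type === "text") return [part.text];
      if (part.type === "tool_use") {
        const detail = part.input?.command ?? part.input?.file_path ?? "";
        return [`→ ${part.name}: ${detail}`];
      }
      return [];
    });
  }
  if (event.type === "result") {
    return [`✓ done (${event.num_turns} turns, $${event.total_cost_usd})`];
  }
  return []; // other event types are not shown in this sketch
}
```

The real renderer also reports elapsed time and errors; this version only covers the event kinds visible in the sample output.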

Every output line is prefixed with the run it belongs to, e.g. [run 3/20] …, so you always know where you are in the batch. Pass --no-line-prefix for unprefixed output (e.g. when piping --output-format stream-json to a JSON parser).
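
The prefixing itself is simple to sketch. prefixLines is a hypothetical helper; it assumes it receives whole lines, so a real streaming implementation would additionally need to buffer partial lines between chunks:

```javascript
// Prefix each line of a chunk with "[run N/total] ".
function prefixLines(chunk, run, total, enabled = true) {
  if (!enabled || chunk === "") return chunk; // --no-line-prefix or nothing to do
  const prefix = `[run ${run}/${total}] `;
  const lines = chunk.split("\n");
  // A trailing newline yields one empty final element; keep the newline
  // but do not emit a dangling prefix after it.
  const endsWithNewline = lines[lines.length - 1] === "";
  if (endsWithNewline) lines.pop();
  const body = lines.map((line) => prefix + line).join("\n");
  return endsWithNewline ? body + "\n" : body;
}
```

With the prefix disabled the chunk is returned untouched, which is what a downstream JSON parser needs.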

Use a different agent

Point promptmill at another agent CLI with --command, and pass extra args after -- (appended to the default args):

promptmill prompts/my-prompt.md --command codex -- --some-flag value

Programmatic API

import {runAgentBatch} from "promptmill"

const {runs, failures} = await runAgentBatch({
  promptFile: "prompts/my-prompt.md",
  runs: 10,
  logDir: ".promptmill-runs"
})

Also exported: spawnAgentRun, parseCliOptions, runCli, DEFAULTS, defaultClaudeArgs, timestampForLogFile, integerOption, buildLogFileName.

Logs

One file per run in the shared .promptmill-runs/ directory (override with --log-dir), named <YYYY-MM-DD HH-MM-SS> <agent label> (run <n>).log — e.g. 2026-05-31 07-36-50 Claude (run 1).log — with an optional --log-file-prefix prepended. Each file holds that run's tee'd stdout/stderr plus a final status line.
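
The naming scheme can be reproduced with a few lines. This sketch does not call the exported buildLogFileName or timestampForLogFile, whose signatures are not documented here; sketchLogFileName is a hypothetical stand-in, and both the use of local time and the direct concatenation of the prefix are assumptions:

```javascript
// Zero-pad a number to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Build "<YYYY-MM-DD HH-MM-SS> <agent label> (run <n>).log".
function sketchLogFileName(date, agentLabel, runNumber, prefix = "") {
  const day = `${date.getFullYear()}-${pad2(date.getMonth() + 1)}-${pad2(date.getDate())}`;
  const time = `${pad2(date.getHours())}-${pad2(date.getMinutes())}-${pad2(date.getSeconds())}`;
  return `${prefix}${day} ${time} ${agentLabel} (run ${runNumber}).log`;
}
```

Hyphens replace the colons in the time so the name is valid on filesystems that reject ":".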

Development

npm install
npm run all-checks   # typecheck + lint + test

License

MIT