@yul-labs/agent-relay

v0.1.14

Published

17 days ago

Vendor-agnostic LLM coding agent session orchestrator (Claude, Codex, ...)

Downloads

1,892

0High
0Medium
0Low

yul-labs

llm agent orchestrator claude codex codex-cli claude-code cli automation session

agent-relay

A vendor-agnostic interactive LLM coding-agent session orchestrator.

agent-relay runs a coding agent (Claude Code, Codex) in its real interactive terminal, watches for the approval / choice prompts it raises, and a pluggable Decider (a rule policy, an LLM, or any function/API) answers them — so the agent runs unattended without you sitting at the keyboard. Every run is tracked as a session with a log, and completion is detected so success can be judged.

This is the core idea: don't suppress the agent's prompts — let them happen and have an LLM/policy answer them, interactively.

✅ Drives the real interactive TUI of Claude & Codex over a PTY (node-pty)
✅ A Decider answers approvals/choices: rule (default), command (an LLM CLI like claude -p / codex exec), function/API, or always-approve
✅ Both Claude and Codex are first-class and verified end-to-end
✅ Vendor-agnostic core: every agent is an AgentAdapter; nothing in core/ imports Claude/Codex
✅ Hard timeout, idle timeout, max-interactions, graceful cancel on every run
✅ Deterministic fake interactive agent makes the whole loop testable offline
✅ "Done" means pnpm typecheck + pnpm test pass — not "code was written"

Requirements

Node.js 18.19 – 24 (the prebuilt PTY covers Node 18, 19, 20, 21, 22, 23, 24).
OS: macOS (x64 / arm64) and Linux (x64 / arm64, glibc and musl/Alpine).
A PTY backend installed automatically as a prebuilt dependency (@homebridge/node-pty-prebuilt-multiarch, aliased to node-pty) — no manual setup:
- Linux: the binary is bundled in the package, so install needs no compiler, no Python, no download — it works behind a firewall / from an npm mirror alone. (The original node-pty compiles via node-gyp at install and fails on hosts with only old Python — the usual CI-runner breakage.)
- macOS: the binary is fetched by prebuild-install at install (with the system toolchain as a fallback).
agent-relay self-heals the spawn-helper execute bit at runtime.
pnpm — only to build from source.
Optional, per adapter:
- claude on PATH
- codex on PATH

Designed for macOS / Linux CLI use.

Install

# Global CLI (the bin is `agent-relay`):
npm i -g @yul-labs/agent-relay        # or: pnpm add -g @yul-labs/agent-relay

# Or run without installing:
npx @yul-labs/agent-relay run --adapter claude --prompt "..."

# From a local checkout:
pnpm install
pnpm build                            # tsup -> dist/cli.js + dist/index.js + .d.ts
npm i -g .

agent-relay orchestrates Claude Code / Codex — it does not bundle them. Install and authenticate claude / codex yourself (a server can inject the token via env: CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY / OPENAI_API_KEY). Run from source without building: pnpm dev -- <command>.

Quick start

Zero-config — init is optional. With no agent-relay.config.json, the CLI uses built-in defaults and creates its session/log dirs on first run, so you can go straight to npx:

npx @yul-labs/agent-relay run --adapter claude --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"

agent-relay init                                   # OPTIONAL: write a config you can edit
agent-relay adapters                               # list adapters
agent-relay doctor                                 # check environment

# Deterministic, no real agent:
agent-relay run --adapter fake --prompt "test task"

# Real agents, driven interactively (the decider answers their prompts):
agent-relay run --adapter codex  --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"
agent-relay run --adapter claude --prompt-file ./examples/task.md

agent-relay resume --session-id <sessionId>

Run agent-relay init only when you want a config file on disk to customize (pick a different default adapter, point the decider at an LLM/local model, or change the session/log dirs). Pass --root /path/to/project to run against another project — the real claude/codex then loads that project's CLAUDE.md, skills, MCP servers, and hooks natively.

How it works

prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
                                                        │
                              output settles (idle)     ▼
                                              ┌──> prompt detected? ──yes──> Decider.decide()
                                              │                                     │
                                              │         keystrokes  <───────────────┘
                                              │  (Enter / arrow+Enter / y/n / text)
                                              │
                                              └──> no prompt + sustained idle ──> quit cleanly (completed)

The agent is launched interactively in a pseudo-terminal under pure autonomy (the project's concept): Claude with --dangerously-skip-permissions (+ --effort xhigh by default), Codex with -s danger-full-access -a never (full bypass by default — tighten with defaults.sandbox / adapters.codex.sandbox, e.g. workspace-write). The agent rarely asks — but the prompts that still appear (the directory-trust dialog, the occasional choice) are what the Decider answers. approvalPolicy: "gated" makes the agent ask more — and routes those asks to the Decider — via Claude --permission-mode default (its normal "ask before each edit/command" mode; not acceptEdits, which would silently auto-approve edits) and Codex -a on-request; "readonly" sandboxes it (Claude --permission-mode plan, Codex -s read-only).
When the terminal goes idle, agent-relay strips ANSI and detects a prompt: a numbered/▶ menu (single choice), a multi-select menu (checkboxes [ ]/[x]/◯/◉), a yes/no approval, or — opt-in — a free-text input step. Multi-step flows (menu → menu → input → submit) are handled as a sequence: each settled screen is detected, decided, answered, and the consumed screen is dropped so the next step is read fresh.
The Decider decides; the keystrokes are written back into the PTY (Enter to confirm a pre-selected option, arrow-down+Enter to pick another, y/n, typed text, or — for a multi-select — a single batch that navigates with ↑/↓, toggles the chosen rows with SPACE, and submits with →).
When the agent finishes and stays idle for completionIdleMs (default 8s, tune with --completion-idle-ms), the session quits the TUI and is marked completed. While the agent is working (its TUI shows a "… esc to interrupt" indicator) the run is never treated as done — so a multi-minute think/build is not cut short, however long it stays quiet.

Driving a real TUI by scraping its output is inherently heuristic. The detector patterns and keymaps are tuned for the current Claude/Codex TUIs and are configurable per adapter; if a vendor changes its TUI, the patterns may need a tweak. Both agents are verified working today (see Testing).

Multi-step menus work out of the box for the real agents. A free-text input step, however, is opt-in: the detector only recognizes one when you give it an inputPattern, and it is off by default for Claude/Codex because their idle main input box would otherwise be misread as "the agent is asking for text". If you enable it, anchor the pattern to an unambiguous prompt phrase (e.g. /enter .*name:/i, not a bare /enter/i that would also fire on "Press Enter to continue"). Detection order within a screen is choice → approval → input.

Multi-select (checkbox) menus are detected automatically when most rows carry a [ ]/[x]/◯/◉ box. The decider returns optionIndexes (the set that should end up checked); the keymap then emits ONE batch — navigate ↑/↓, SPACE to toggle only the rows that differ, then submit. The submit key defaults to → (works for the common step-wizard "advance/Submit"); it is the multiSelectSubmit arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned. The rule decider keeps the current selection and submits (no judgment); an LLM decider picks the task-relevant rows.

Getting a result back (what you can and can't read)

agent-relay's job is to drive the agent and answer its prompts — not to recover the agent's prose answer from the screen. The TUI text is heavily mangled after ANSI stripping (words run together, spinners interleave), so scraping a structured answer out of stdout is not reliable and is deliberately not attempted. Two things you can rely on:

For a structured result, have the agent WRITE IT TO A FILE. Put it in the prompt — e.g. "...and write the final JSON to result.json" — then read that file after the run. This is the validated pattern for getting data back out.
Token usage is surfaced as result.meta.usage, read from the agent's OWN session transcript (~/.claude/projects/… for Claude, ~/.codex/sessions/… for Codex) — so it works on every machine regardless of TUI / status-line settings, and the token counts are the API's real numbers, not a scrape. Token fields: { source, model, inputTokens, outputTokens, cachedInputTokens, cacheCreationTokens, reasoningTokens, totalTokens }.
Cost is COMPUTED, not scraped — usage.costUsd = Σ(tokens × the model's list price) from a built-in table (override per config.pricing). It is null (with a warning) when the model's price is unknown, so you can tell "unpriced" from a real 0. The scraped nominal "Session $" (often 0 on Team/Max seats) is kept separately as usage.subscriptionSessionCostUsd; usage.contextPercent is another status-line extra. source is "transcript" (authoritative tokens) or "status-line".
costUsd is a REFERENCE (shadow) cost — what the run would bill via the API. agent-relay drives the interactive TUI, which on a Claude subscription is covered by the plan, so for those runs the real marginal cost is ~0 and costUsd is a reference, not the amount charged. Only the non-interactive paths (claude -p / Agent SDK — and, per Anthropic, from 2026-06-15 no longer counted against subscription limits) actually bill at these rates. Staying on the interactive PTY path is what keeps unattended runs on the subscription.

A structured-first backend (Claude --output-format stream-json) would return the answer text + usage + cost natively — but it requires -p/--print (non-interactive), which is a different execution model from the TUI and, on a Claude subscription, is not covered by the plan (per Anthropic, claude -p / Agent SDK fall outside subscription limits from 2026-06-15). So driving the interactive TUI here is deliberate: it keeps unattended runs on the subscription. The file-write pattern is the supported way to recover answers without leaving that path.

The Decider

The Decider answers every detected prompt. Choose it in config (decider) or per run:

The guiding policy is proceed by default, redirect only off-task danger: let the agent do its work (so the project actually progresses) and only refuse an action that is both dangerous and unrelated to the task. Judging "on-task vs off-task" needs the goal, so it is an LLM job — not a regex.

| Type | What it does | | --- | --- | | rule (default) | Deterministic & offline, no judgment: approves every y/n and confirms the recommended (first) option on a menu so the task proceeds. (No label guessing by default — the TUI already pre-selects its affirmative choice.) Opt into denyPatterns (rm -rf, sudo, ...) to make it label-aware: deny dangerous approvals and steer menus to a negative option — see the caveat below. | | command | Delegate to an LLM CLI — claude -p, codex exec, etc. The model gets the task and approves on-task work (even risky) while denying off-task danger with a redirect comment. | | api | Same task-aware policy via an OpenAI-compatible HTTP endpoint (llama.cpp / vLLM / Ollama / LM Studio / OpenAI). Works with local open models. | | always-approve | Approve/confirm everything. Maximum autonomy, no judgment. |

// agent-relay.config.json — pick ONE "decider":

// a) local open model (llama.cpp / vLLM / Ollama, OpenAI-compatible):
"decider": { "type": "api", "url": "http://localhost:9090/v1/chat/completions", "model": "default", "maxTokens": 1024 }

// b) Claude CLI as the decider:
"decider": { "type": "command", "command": "claude", "args": ["-p"] }

// c) Codex CLI as the decider (note: `codex exec` does NOT accept `-a`):
"decider": { "type": "command", "command": "codex", "args": ["exec", "--skip-git-repo-check", "-s", "read-only"] }

…or per run, with no config file — the --decider* flags build the same override on the fly (great for npx):

# local open model (api):
agent-relay run -a claude -p "..." \
  --decider api --decider-url http://localhost:9090/v1/chat/completions \
  --decider-model default --decider-max-tokens 1024

# an LLM CLI (command) — pass the WHOLE command line as one quoted string:
agent-relay run -a claude -p "..." \
  --decider command --decider-command "codex exec --skip-git-repo-check -s read-only"

# force a simple policy:
agent-relay run -a codex -p "..." --decider always-approve

The type is inferred when obvious (--decider-url ⇒ api, --decider-command ⇒ command), so --decider itself is optional in those cases. A --decider* flag overrides config.decider for that run; with no flag, the configured (or default rule) decider is used. Complex command args with shell quoting aren't supported on the flag — use a config file for those.

All four backends are verified. The task-aware policy is the important part: given the task "delete build/ and node_modules", an LLM decider approves rm -rf build node_modules ("directly implements the task"); given the task "fix the README typo", it denies rm -rf ~/Documents ("dangerous and off-task") with a redirect comment — the same command, opposite decisions. The local model + rule deciders have also driven real Claude/Codex sessions end-to-end. Embedders can plug any model via a FunctionDecider (async (request) => decision) or the onApprovalRequest run-hook.

A denial can carry a redirect text; agent-relay types it back to the agent when its TUI next asks "what should I do instead?" (best-effort, TUI-dependent).

Latency note: the agent waits at its prompt while the decider thinks, so a slow decider (a large local model can take ~20 s/decision) is fine for menus the agent waits on, but raise --idle-timeout-ms for long sessions. The fast rule decider is the default for unattended runs.

Reasoning models: the api decider's maxTokens (default 2048) is a cap, not a target — a normal decision still costs only what it needs. But a reasoning model emits a long chain-of-thought before its JSON answer, so a too-small cap truncates it mid-thought (empty content → unparseable → safe deny). Keep --decider-max-tokens generous (≥1024) for such models; content is parsed first, with reasoning_content as a fallback.

Commands

init — write agent-relay.config.json + .agent-relay/{sessions,logs}.
adapters — list adapters (claude, codex, fake), mode, resume support.
doctor — Node version, config validity, dirs, and whether claude/codex are on PATH (missing CLIs are warnings, never errors).
run — options: -a/--adapter, -p/--prompt, -f/--prompt-file, --cwd, --max-turns, --timeout-ms, --idle-timeout-ms, --completion-idle-ms, --completion-sentinel <path>, --approval <auto|gated|readonly>, --workflow, --ultracode, --decider*, --verbose, --max-log-bytes, --dry-run, --extra <args...>, --root. Exit code 0 on completed, non-zero otherwise. (--approval gated keeps the agent asking so the decider answers more; auto is pure-autonomy. --workflow / --ultracode are Claude-only — see Multi-agent workflows below.)
Async multi-agent runs (--completion-sentinel <path>): when the driven agent fans work out to async background workers (Claude's Workflow tool / agent-teams) and then sits idle awaiting results, "idle == done" is FALSE — the idle-completion detector would quit early and orphan the in-flight work. Pass --completion-sentinel result.json and have your prompt write that file when truly done; the run then finishes only once the file exists (and parses, if JSON), with the hard --timeout-ms as the backstop. ("idle == done" holds only for synchronous work.)
sessions — list recorded sessions with log sizes + total. Prune with sessions --prune --keep <n> / --older-than <days> / --all (deletes the session JSON and its log). Logs accumulate forever otherwise.
resume — continue a prior session with a follow-up prompt: resume --session-id <id> -p "...". Both verified end-to-end (recall prior context + run the follow-up): Claude via --continue; Codex via its captured NATIVE session id — codex resume <id> "<prompt>" (the codex adapter records the rollout UUID from ~/.codex/sessions/… after each run into the session's sessionRef; a legacy session with no captured id falls back to resume --last).

Multi-agent workflows (Claude)

Claude Code's dynamic workflows fan a task out across many subagents from a Claude-authored script (codebase sweeps, large migrations, cross-checked research). It's a built-in Claude Code feature (default-on on paid plans; v2.1.154+), and separate from the agent-teams primitive. agent-relay gives you two Claude-only ways to trigger it, plus a sensible default:

--workflow — run THIS task as a workflow, at your current model/effort (no Opus, no effort change). agent-relay prepends Claude's per-task ultracode opt-in keyword to the prompt. Lightweight — use it when you want fan-out but not the heavier ultracode preset.
--ultracode — the session-wide preset: /effort ultracode (xhigh effort) + Opus, so every substantial task is planned as a workflow. Heavier (more tokens/latency); pick it for deep, autonomous orchestration.
Or just write it into your prompt: include the word ultracode (or "use a workflow") in --prompt / --prompt-file — Claude treats that as the same opt-in, no flag needed.

When --workflow or --ultracode is set, agent-relay turns the agent-teams env off (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=0) so Claude uses workflows rather than the separate (less smooth) agent-teams path. Override via adapters.claude.env; disable workflows entirely with adapters.claude.env: { "CLAUDE_CODE_DISABLE_WORKFLOWS": "1" }.

Workflows run their subagents async in the background while the lead sits idle, so pair them with --completion-sentinel <file> (have the prompt write that file when truly done) — otherwise the idle detector quits early and orphans the in-flight work. result.meta.usage then aggregates the orchestrator plus every workflow subagent (transcripts, per-session breakdown[], summed costUsd).

Attached mode (watch & type)

By default agent-relay runs the agent headless, like claude -p — you see agent-relay's event stream, not the agent. --attach flips that: the agent's real TUI takes over your terminal exactly as if you had launched it yourself, while agent-relay keeps working underneath — any approval or menu the agent raises is still auto-answered by the decider, so the run never stalls waiting for a click.

agent-relay run --adapter claude --approval gated --attach \
  --prompt "refactor the auth module"

You can type too. Your keystrokes are forwarded straight into the agent (raw mode), so you can answer a prompt yourself, steer mid-run, or queue a follow-up message. While you're actively typing, auto-answering defers (default 2.5 s hands-off window) — whatever you're interacting with is yours, and a prompt you answer by hand is never re-answered for you.
The screen belongs to the TUI. agent-relay's live event lines are suppressed (errors still go to stderr); everything is recorded in the session log as usual, and the result summary prints after the TUI exits.
Ending the run: a pure-watch attach (you never type) still auto-ends on idle, like a headless run. But once you type, agent-relay stops auto-quitting — it's your live session now, exactly like running the agent by hand, so it won't pull the rug out while you pause to think. End it by quitting the agent's TUI (e.g. double Ctrl-C in Claude), or it stops at the hard --timeout-ms. In raw mode Ctrl-C goes to the agent, not to agent-relay; to cancel the relay itself, send it SIGTERM from another terminal (the run records as cancelled and your terminal is restored).
Terminal resizes are forwarded; closing the window (SIGHUP) cancels cleanly; resume --attach works the same way (it needs a -p follow-up prompt).
Pairs well with --approval gated: the agent asks before each edit/command, you watch the decisions happen live on the real screen.

Configuration

agent-relay init writes agent-relay.config.json (validated by a zod schema; see agent-relay.config.example.json):

{
  "defaultAdapter": "claude",
  "sessionsDir": ".agent-relay/sessions",
  "logsDir": ".agent-relay/logs",
  "defaults": {
    "maxTurns": 20,            // max interactions (prompts answered) before aborting
    "timeoutMs": 1800000,      // hard wall-clock cap
    "idleTimeoutMs": 300000,   // abort if no event for this long
    "approvalPolicy": "auto",  // "readonly" => sandbox the agent; otherwise it asks and the decider answers
    "sandbox": "workspace-write"
  },
  "decider": { "type": "rule" },
  "hooks": { "onComplete": "echo \"[$AGENT_RELAY_STATUS] $AGENT_RELAY_SESSION_ID\"" },
  "adapters": {
    "claude": { "type": "claude", "mode": "pty", "command": "claude", "args": [] },
    "codex":  { "type": "codex",  "mode": "pty", "command": "codex",  "args": [] },
    "fake":   { "type": "fake",   "mode": "test" }
  }
}

decider decides every prompt — this is the real approval mechanism (see The Decider). approvalPolicy no longer "approves"; it only selects readonly (Codex -s read-only, Claude --permission-mode plan) vs. letting the agent ask normally.
Inject a model / effort (or any underlying-CLI flag) two ways:
- Persistent — adapters.<name>.args: passed straight through (after the adapter's own flags, before the prompt). e.g. adapters.claude.args: ["--model", "opus", "--effort", "max"] (Claude effort levels low|medium|high|xhigh|max), or adapters.codex.args: ["-m", "gpt-5-codex", "-c", "model_reasoning_effort=high"].
- Per run — after --: run -a claude -p "..." -- --model sonnet --effort high. (--extra <flag> also works for a SINGLE flag; for several, use --.) The last --model / --effort wins, so this overrides the defaults.
adapters.<name>.env injects/overrides spawn env vars — the Claude adapter sets CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 by default (override with adapters.claude.env: { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "0" }).
defaults.completionIdleMs (or --completion-idle-ms) tunes how long the finished agent must stay quiet before its TUI is quit (default 8s).
For a fully autonomous, no-prompt agent (no decider involvement), set adapters.claude.args: ["--dangerously-skip-permissions"] / adapters.codex.args: ["--full-auto"].
hooks are shell commands run at lifecycle points with AGENT_RELAY_* env vars (SESSION_ID, STATUS, ADAPTER, LOG_FILE, EXIT_CODE, CWD).

⚠️ Safety: the default rule decider approves everything so the task can progress (danger-regex blocking is opt-in via denyPatterns, and is unreliable on mangled TUI text anyway). For real control, use approvalPolicy: "readonly" (sandbox), or an LLM command/api decider that approves on-task work and denies off-task danger. Don't point an auto-approving run at an untrusted task.

Sessions, logs, completion

.agent-relay/sessions/<id>.json — metadata (prompt, adapter, status, times, logFile, exitCode, error, meta). Tiny; list/prune with agent-relay sessions.
.agent-relay/logs/<id>.log — header + one line per event + footer. By default the raw stdout/stderr stream is omitted (a TUI redraws constantly — it's ~98% noise), so a run logs only its meaningful events and stays ~1 KB. Use --verbose for the full stream when debugging, and --max-log-bytes to cap it.
Status lifecycle: created → running → { completed | failed | timeout | cancelled | waiting_approval }. The CompletionDetector maps the run to a terminal status (timeout/idle → timeout, cancel → cancelled, success/exit 0 → completed, else failed) and is extensible.
meta.completedCleanly (exit 0, not aborted) is the unambiguous success flag; meta.settled is true when the run answered a prompt or completed cleanly (so a 0-interaction skip-mode run is settled: true, not falsely false).

Operational notes

It writes into the target repo. sessionsDir / logsDir default to <root>/.agent-relay/{sessions,logs} — add .agent-relay/ to the target repo's .gitignore, or point them elsewhere (absolute / out-of-root paths work). --dry-run still records a session entry, clearly marked meta.dryRun: true.
Pin the binary. Without adapters.<name>.command, the first claude / codex on PATH is used — ambiguous when several exist (version managers, shell wrappers, cmux). Set command to an absolute path; either way the resolved path is logged at run start (resolved claude → …).
Effort/cost defaults. Claude runs at --effort xhigh and Codex at -s danger-full-access by default (full autonomy) — both raise cost/latency and access; override with adapters.<name>.args / defaults.sandbox.
First-run "Bypass Permissions" acceptance (Claude) is handled for you. The first time --dangerously-skip-permissions runs on a machine, Claude shows a one-time "❯ 1. No, exit / 2. Yes, I accept" warning — on a fresh CI runner it appears every run and would otherwise stall an unattended job (the default decider confirms option 0, "No, exit", and Claude exits 1). agent-relay suppresses it via --settings '{"skipDangerousModePermissionPrompt":true}', and if the menu still appears it affirms it automatically. No setup needed.
Reasoning-model deciders. An api decider pointed at a reasoning model (e.g. gpt-oss) spends tokens on reasoning_content before the JSON answer; too small a --decider-max-tokens empties content. agent-relay clamps it up to a 512 floor and falls back to parsing reasoning_content, so a normal config won't return an empty decision — but give verbose reasoners more headroom.

Architecture

src/core/        types · config(zod) · session · logger · completion · decider ·
                 hooks · runner · errors · util/{ansi,line-splitter,which,...}
src/adapters/    registry · fake-adapter ·
   interactive/  pty-session (the detect→decide→respond loop) · prompt-detector ·
                 interactive-adapter · claude-interactive · codex-interactive
src/commands/    init · adapters · doctor · run · resume
src/cli.ts       commander CLI       src/index.ts  programmatic API

Decoupling rule: core/ depends only on the AgentAdapter interface and receives an adapter factory from adapters/registry.ts; it never imports a vendor adapter. The Decider reaches adapters via the run context, so any decision backend works with any agent.

import { runAgent, createAdapterFactory, RuleDecider } from "agent-relay";
const outcome = await runAgent({
  config, rootDir: process.cwd(), adapterName: "codex",
  prompt: "Fix the failing test",
  resolveAdapter: createAdapterFactory(),
  decider: new RuleDecider(),                 // or a CommandDecider / FunctionDecider
});

Testing

pnpm typecheck
pnpm test                                  # offline, deterministic
AGENT_RELAY_RUN_REAL_AGENT_TESTS=1 pnpm test   # opt-in real Claude/Codex (self-skips if absent)

The default suite is fully offline — its centerpiece drives a deterministic fake interactive agent through runPtySession, so the detect → decide → respond → complete loop (plus deny/abort, multi-step and multi-select flows, completion and the decider backends) is exercised without a real CLI. Both Codex and Claude are also verified end-to-end against their real TUIs in the opt-in suite.

License

MIT — see LICENSE.