@yul-labs/agent-relay
v0.1.14
Published
Vendor-agnostic LLM coding agent session orchestrator (Claude, Codex, ...)
Downloads
1,892
Maintainers
Readme
agent-relay
A vendor-agnostic interactive LLM coding-agent session orchestrator.
agent-relay runs a coding agent (Claude Code, Codex) in its real
interactive terminal, watches for the approval / choice prompts it raises, and a
pluggable Decider (a rule policy, an LLM, or any function/API) answers them —
so the agent runs unattended without you sitting at the keyboard. Every run
is tracked as a session with a log, and completion is detected so success can
be judged.
This is the core idea: don't suppress the agent's prompts — let them happen and have an LLM/policy answer them, interactively.
- ✅ Drives the real interactive TUI of Claude & Codex over a PTY (node-pty)
- ✅ A Decider answers approvals/choices:
rule(default),command(an LLM CLI likeclaude -p/codex exec),function/API, oralways-approve - ✅ Both Claude and Codex are first-class and verified end-to-end
- ✅ Vendor-agnostic core: every agent is an
AgentAdapter; nothing incore/imports Claude/Codex - ✅ Hard timeout, idle timeout, max-interactions, graceful cancel on every run
- ✅ Deterministic fake interactive agent makes the whole loop testable offline
- ✅ "Done" means
pnpm typecheck+pnpm testpass — not "code was written"
Requirements
Node.js 18.19 – 24 (the prebuilt PTY covers Node 18, 19, 20, 21, 22, 23, 24).
OS: macOS (x64 / arm64) and Linux (x64 / arm64, glibc and musl/Alpine).
A PTY backend installed automatically as a prebuilt dependency (
@homebridge/node-pty-prebuilt-multiarch, aliased tonode-pty) — no manual setup:- Linux: the binary is bundled in the package, so install needs no
compiler, no Python, no download — it works behind a firewall / from an npm
mirror alone. (The original
node-ptycompiles via node-gyp at install and fails on hosts with only old Python — the usual CI-runner breakage.) - macOS: the binary is fetched by
prebuild-installat install (with the system toolchain as a fallback).
agent-relay self-heals the
spawn-helperexecute bit at runtime.- Linux: the binary is bundled in the package, so install needs no
compiler, no Python, no download — it works behind a firewall / from an npm
mirror alone. (The original
pnpm — only to build from source.
Optional, per adapter:
Designed for macOS / Linux CLI use.
Install
# Global CLI (the bin is `agent-relay`):
npm i -g @yul-labs/agent-relay # or: pnpm add -g @yul-labs/agent-relay
# Or run without installing:
npx @yul-labs/agent-relay run --adapter claude --prompt "..."
# From a local checkout:
pnpm install
pnpm build # tsup -> dist/cli.js + dist/index.js + .d.ts
npm i -g .agent-relay orchestrates Claude Code / Codex — it does not bundle them.
Install and authenticate claude / codex yourself (a server can inject the
token via env: CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_API_KEY / OPENAI_API_KEY).
Run from source without building: pnpm dev -- <command>.
Quick start
Zero-config — init is optional. With no agent-relay.config.json, the CLI
uses built-in defaults and creates its session/log dirs on first run, so you can
go straight to npx:
npx @yul-labs/agent-relay run --adapter claude --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"agent-relay init # OPTIONAL: write a config you can edit
agent-relay adapters # list adapters
agent-relay doctor # check environment
# Deterministic, no real agent:
agent-relay run --adapter fake --prompt "test task"
# Real agents, driven interactively (the decider answers their prompts):
agent-relay run --adapter codex --prompt "현재 프로젝트의 타입 오류를 확인하고 수정해줘"
agent-relay run --adapter claude --prompt-file ./examples/task.md
agent-relay resume --session-id <sessionId>Run agent-relay init only when you want a config file on disk to customize
(pick a different default adapter, point the decider at an LLM/local model, or
change the session/log dirs). Pass --root /path/to/project to run against
another project — the real claude/codex then loads that project's
CLAUDE.md, skills, MCP servers, and hooks natively.
How it works
prompt ──> spawn agent in a PTY (its real TUI) ──> watch terminal output
│
output settles (idle) ▼
┌──> prompt detected? ──yes──> Decider.decide()
│ │
│ keystrokes <───────────────┘
│ (Enter / arrow+Enter / y/n / text)
│
└──> no prompt + sustained idle ──> quit cleanly (completed)- The agent is launched interactively in a pseudo-terminal under pure
autonomy (the project's concept): Claude with
--dangerously-skip-permissions(+--effort xhighby default), Codex with-s danger-full-access -a never(full bypass by default — tighten withdefaults.sandbox/adapters.codex.sandbox, e.g.workspace-write). The agent rarely asks — but the prompts that still appear (the directory-trust dialog, the occasional choice) are what the Decider answers.approvalPolicy: "gated"makes the agent ask more — and routes those asks to the Decider — via Claude--permission-mode default(its normal "ask before each edit/command" mode; notacceptEdits, which would silently auto-approve edits) and Codex-a on-request;"readonly"sandboxes it (Claude--permission-mode plan, Codex-s read-only). - When the terminal goes idle, agent-relay strips ANSI and detects a prompt:
a numbered/▶ menu (single choice), a multi-select menu (checkboxes
[ ]/[x]/◯/◉), a yes/no approval, or — opt-in — a free-text input step. Multi-step flows (menu → menu → input → submit) are handled as a sequence: each settled screen is detected, decided, answered, and the consumed screen is dropped so the next step is read fresh. - The Decider decides; the keystrokes are written back into the PTY
(Enter to confirm a pre-selected option, arrow-down+Enter to pick another,
y/n, typed text, or — for a multi-select — a single batch that navigates with ↑/↓, toggles the chosen rows with SPACE, and submits with→). - When the agent finishes and stays idle for
completionIdleMs(default 8s, tune with--completion-idle-ms), the session quits the TUI and is marked completed. While the agent is working (its TUI shows a "… esc to interrupt" indicator) the run is never treated as done — so a multi-minute think/build is not cut short, however long it stays quiet.
Driving a real TUI by scraping its output is inherently heuristic. The detector patterns and keymaps are tuned for the current Claude/Codex TUIs and are configurable per adapter; if a vendor changes its TUI, the patterns may need a tweak. Both agents are verified working today (see Testing).
Multi-step menus work out of the box for the real agents. A free-text
input step, however, is opt-in: the detector only recognizes one when you
give it an inputPattern, and it is off by default for Claude/Codex because
their idle main input box would otherwise be misread as "the agent is asking for
text". If you enable it, anchor the pattern to an unambiguous prompt phrase
(e.g. /enter .*name:/i, not a bare /enter/i that would also fire on
"Press Enter to continue"). Detection order within a screen is choice →
approval → input.
Multi-select (checkbox) menus are detected automatically when most rows carry
a [ ]/[x]/◯/◉ box. The decider returns optionIndexes (the set that
should end up checked); the keymap then emits ONE batch — navigate ↑/↓, SPACE to
toggle only the rows that differ, then submit. The submit key defaults to →
(works for the common step-wizard "advance/Submit"); it is the multiSelectSubmit
arg of the keymap, so a TUI that submits with Enter-on-a-Next-row can be retuned.
The rule decider keeps the current selection and submits (no judgment); an LLM
decider picks the task-relevant rows.
Getting a result back (what you can and can't read)
agent-relay's job is to drive the agent and answer its prompts — not to recover the agent's prose answer from the screen. The TUI text is heavily mangled after ANSI stripping (words run together, spinners interleave), so scraping a structured answer out of stdout is not reliable and is deliberately not attempted. Two things you can rely on:
- For a structured result, have the agent WRITE IT TO A FILE. Put it in the
prompt — e.g. "...and write the final JSON to
result.json" — then read that file after the run. This is the validated pattern for getting data back out. - Token usage is surfaced as
result.meta.usage, read from the agent's OWN session transcript (~/.claude/projects/…for Claude,~/.codex/sessions/…for Codex) — so it works on every machine regardless of TUI / status-line settings, and the token counts are the API's real numbers, not a scrape. Token fields:{ source, model, inputTokens, outputTokens, cachedInputTokens, cacheCreationTokens, reasoningTokens, totalTokens }. - Cost is COMPUTED, not scraped —
usage.costUsd= Σ(tokens × the model's list price) from a built-in table (override perconfig.pricing). It isnull(with a warning) when the model's price is unknown, so you can tell "unpriced" from a real0. The scraped nominal "Session $" (often0on Team/Max seats) is kept separately asusage.subscriptionSessionCostUsd;usage.contextPercentis another status-line extra.sourceis"transcript"(authoritative tokens) or"status-line".costUsdis a REFERENCE (shadow) cost — what the run would bill via the API. agent-relay drives the interactive TUI, which on a Claude subscription is covered by the plan, so for those runs the real marginal cost is ~0 andcostUsdis a reference, not the amount charged. Only the non-interactive paths (claude -p/ Agent SDK — and, per Anthropic, from 2026-06-15 no longer counted against subscription limits) actually bill at these rates. Staying on the interactive PTY path is what keeps unattended runs on the subscription.
A structured-first backend (Claude --output-format stream-json) would
return the answer text + usage + cost natively — but it requires -p/--print
(non-interactive), which is a different execution model from the TUI and, on a
Claude subscription, is not covered by the plan (per Anthropic, claude -p /
Agent SDK fall outside subscription limits from 2026-06-15). So driving the
interactive TUI here is deliberate: it keeps unattended runs on the
subscription. The file-write pattern is the supported way to recover answers
without leaving that path.
The Decider
The Decider answers every detected prompt. Choose it in config (decider) or
per run:
The guiding policy is proceed by default, redirect only off-task danger: let the agent do its work (so the project actually progresses) and only refuse an action that is both dangerous and unrelated to the task. Judging "on-task vs off-task" needs the goal, so it is an LLM job — not a regex.
| Type | What it does |
| --- | --- |
| rule (default) | Deterministic & offline, no judgment: approves every y/n and confirms the recommended (first) option on a menu so the task proceeds. (No label guessing by default — the TUI already pre-selects its affirmative choice.) Opt into denyPatterns (rm -rf, sudo, ...) to make it label-aware: deny dangerous approvals and steer menus to a negative option — see the caveat below. |
| command | Delegate to an LLM CLI — claude -p, codex exec, etc. The model gets the task and approves on-task work (even risky) while denying off-task danger with a redirect comment. |
| api | Same task-aware policy via an OpenAI-compatible HTTP endpoint (llama.cpp / vLLM / Ollama / LM Studio / OpenAI). Works with local open models. |
| always-approve | Approve/confirm everything. Maximum autonomy, no judgment. |
// agent-relay.config.json — pick ONE "decider":
// a) local open model (llama.cpp / vLLM / Ollama, OpenAI-compatible):
"decider": { "type": "api", "url": "http://localhost:9090/v1/chat/completions", "model": "default", "maxTokens": 1024 }
// b) Claude CLI as the decider:
"decider": { "type": "command", "command": "claude", "args": ["-p"] }
// c) Codex CLI as the decider (note: `codex exec` does NOT accept `-a`):
"decider": { "type": "command", "command": "codex", "args": ["exec", "--skip-git-repo-check", "-s", "read-only"] }…or per run, with no config file — the --decider* flags build the same
override on the fly (great for npx):
# local open model (api):
agent-relay run -a claude -p "..." \
--decider api --decider-url http://localhost:9090/v1/chat/completions \
--decider-model default --decider-max-tokens 1024
# an LLM CLI (command) — pass the WHOLE command line as one quoted string:
agent-relay run -a claude -p "..." \
--decider command --decider-command "codex exec --skip-git-repo-check -s read-only"
# force a simple policy:
agent-relay run -a codex -p "..." --decider always-approveThe type is inferred when obvious (--decider-url ⇒ api, --decider-command
⇒ command), so --decider itself is optional in those cases. A --decider*
flag overrides config.decider for that run; with no flag, the configured
(or default rule) decider is used. Complex command args with shell quoting
aren't supported on the flag — use a config file for those.
All four backends are verified. The task-aware policy is the important part:
given the task "delete build/ and node_modules", an LLM decider approves
rm -rf build node_modules ("directly implements the task"); given the task
"fix the README typo", it denies rm -rf ~/Documents ("dangerous and
off-task") with a redirect comment — the same command, opposite decisions.
The local model + rule deciders have also driven real Claude/Codex sessions
end-to-end. Embedders can plug any model via a FunctionDecider
(async (request) => decision) or the onApprovalRequest run-hook.
A denial can carry a redirect
text; agent-relay types it back to the agent when its TUI next asks "what should I do instead?" (best-effort, TUI-dependent).
Latency note: the agent waits at its prompt while the decider thinks, so a slow decider (a large local model can take ~20 s/decision) is fine for menus the agent waits on, but raise
--idle-timeout-msfor long sessions. The fastruledecider is the default for unattended runs.
Reasoning models: the
apidecider'smaxTokens(default 2048) is a cap, not a target — a normal decision still costs only what it needs. But a reasoning model emits a long chain-of-thought before its JSON answer, so a too-small cap truncates it mid-thought (emptycontent→ unparseable → safe deny). Keep--decider-max-tokensgenerous (≥1024) for such models;contentis parsed first, withreasoning_contentas a fallback.
Commands
init— writeagent-relay.config.json+.agent-relay/{sessions,logs}.adapters— list adapters (claude,codex,fake), mode, resume support.doctor— Node version, config validity, dirs, and whetherclaude/codexare on PATH (missing CLIs are warnings, never errors).run— options:-a/--adapter,-p/--prompt,-f/--prompt-file,--cwd,--max-turns,--timeout-ms,--idle-timeout-ms,--completion-idle-ms,--completion-sentinel <path>,--approval <auto|gated|readonly>,--workflow,--ultracode,--decider*,--verbose,--max-log-bytes,--dry-run,--extra <args...>,--root. Exit code0oncompleted, non-zero otherwise. (--approval gatedkeeps the agent asking so the decider answers more;autois pure-autonomy.--workflow/--ultracodeare Claude-only — see Multi-agent workflows below.)- Async multi-agent runs (
--completion-sentinel <path>): when the driven agent fans work out to async background workers (Claude's Workflow tool / agent-teams) and then sits idle awaiting results, "idle == done" is FALSE — the idle-completion detector would quit early and orphan the in-flight work. Pass--completion-sentinel result.jsonand have your prompt write that file when truly done; the run then finishes only once the file exists (and parses, if JSON), with the hard--timeout-msas the backstop. ("idle == done" holds only for synchronous work.) sessions— list recorded sessions with log sizes + total. Prune withsessions --prune --keep <n>/--older-than <days>/--all(deletes the session JSON and its log). Logs accumulate forever otherwise.resume— continue a prior session with a follow-up prompt:resume --session-id <id> -p "...". Both verified end-to-end (recall prior context + run the follow-up): Claude via--continue; Codex via its captured NATIVE session id —codex resume <id> "<prompt>"(the codex adapter records the rollout UUID from~/.codex/sessions/…after each run into the session'ssessionRef; a legacy session with no captured id falls back toresume --last).
Multi-agent workflows (Claude)
Claude Code's dynamic workflows fan a task out across many subagents from a
Claude-authored script (codebase sweeps, large migrations, cross-checked
research). It's a built-in Claude Code feature (default-on on paid plans;
v2.1.154+), and separate from the agent-teams primitive. agent-relay gives
you two Claude-only ways to trigger it, plus a sensible default:
--workflow— run THIS task as a workflow, at your current model/effort (no Opus, no effort change). agent-relay prepends Claude's per-taskultracodeopt-in keyword to the prompt. Lightweight — use it when you want fan-out but not the heavier ultracode preset.--ultracode— the session-wide preset:/effort ultracode(xhigh effort) + Opus, so every substantial task is planned as a workflow. Heavier (more tokens/latency); pick it for deep, autonomous orchestration.- Or just write it into your prompt: include the word
ultracode(or "use a workflow") in--prompt/--prompt-file— Claude treats that as the same opt-in, no flag needed.
When --workflow or --ultracode is set, agent-relay turns the agent-teams env
off (CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=0) so Claude uses workflows rather
than the separate (less smooth) agent-teams path. Override via
adapters.claude.env; disable workflows entirely with
adapters.claude.env: { "CLAUDE_CODE_DISABLE_WORKFLOWS": "1" }.
Workflows run their subagents async in the background while the lead sits idle,
so pair them with --completion-sentinel <file> (have the prompt write that
file when truly done) — otherwise the idle detector quits early and orphans the
in-flight work. result.meta.usage then aggregates the orchestrator plus every
workflow subagent (transcripts, per-session breakdown[], summed costUsd).
Attached mode (watch & type)
By default agent-relay runs the agent headless, like claude -p — you see
agent-relay's event stream, not the agent. --attach flips that: the agent's
real TUI takes over your terminal exactly as if you had launched it yourself,
while agent-relay keeps working underneath — any approval or menu the agent
raises is still auto-answered by the decider, so the run never stalls waiting
for a click.
agent-relay run --adapter claude --approval gated --attach \
--prompt "refactor the auth module"- You can type too. Your keystrokes are forwarded straight into the agent (raw mode), so you can answer a prompt yourself, steer mid-run, or queue a follow-up message. While you're actively typing, auto-answering defers (default 2.5 s hands-off window) — whatever you're interacting with is yours, and a prompt you answer by hand is never re-answered for you.
- The screen belongs to the TUI. agent-relay's live event lines are suppressed (errors still go to stderr); everything is recorded in the session log as usual, and the result summary prints after the TUI exits.
- Ending the run: a pure-watch attach (you never type) still auto-ends on
idle, like a headless run. But once you type, agent-relay stops auto-quitting
— it's your live session now, exactly like running the agent by hand, so it
won't pull the rug out while you pause to think. End it by quitting the agent's
TUI (e.g. double Ctrl-C in Claude), or it stops at the hard
--timeout-ms. In raw mode Ctrl-C goes to the agent, not to agent-relay; to cancel the relay itself, send it SIGTERM from another terminal (the run records ascancelledand your terminal is restored). - Terminal resizes are forwarded; closing the window (SIGHUP) cancels cleanly;
resume --attachworks the same way (it needs a-pfollow-up prompt). - Pairs well with
--approval gated: the agent asks before each edit/command, you watch the decisions happen live on the real screen.
Configuration
agent-relay init writes agent-relay.config.json (validated by a zod schema;
see agent-relay.config.example.json):
{
"defaultAdapter": "claude",
"sessionsDir": ".agent-relay/sessions",
"logsDir": ".agent-relay/logs",
"defaults": {
"maxTurns": 20, // max interactions (prompts answered) before aborting
"timeoutMs": 1800000, // hard wall-clock cap
"idleTimeoutMs": 300000, // abort if no event for this long
"approvalPolicy": "auto", // "readonly" => sandbox the agent; otherwise it asks and the decider answers
"sandbox": "workspace-write"
},
"decider": { "type": "rule" },
"hooks": { "onComplete": "echo \"[$AGENT_RELAY_STATUS] $AGENT_RELAY_SESSION_ID\"" },
"adapters": {
"claude": { "type": "claude", "mode": "pty", "command": "claude", "args": [] },
"codex": { "type": "codex", "mode": "pty", "command": "codex", "args": [] },
"fake": { "type": "fake", "mode": "test" }
}
}deciderdecides every prompt — this is the real approval mechanism (see The Decider).approvalPolicyno longer "approves"; it only selectsreadonly(Codex-s read-only, Claude--permission-mode plan) vs. letting the agent ask normally.- Inject a model / effort (or any underlying-CLI flag) two ways:
- Persistent —
adapters.<name>.args: passed straight through (after the adapter's own flags, before the prompt). e.g.adapters.claude.args: ["--model", "opus", "--effort", "max"](Claude effort levelslow|medium|high|xhigh|max), oradapters.codex.args: ["-m", "gpt-5-codex", "-c", "model_reasoning_effort=high"]. - Per run — after
--:run -a claude -p "..." -- --model sonnet --effort high. (--extra <flag>also works for a SINGLE flag; for several, use--.) The last--model/--effortwins, so this overrides the defaults.
- Persistent —
adapters.<name>.envinjects/overrides spawn env vars — the Claude adapter setsCLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1by default (override withadapters.claude.env: { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "0" }).defaults.completionIdleMs(or--completion-idle-ms) tunes how long the finished agent must stay quiet before its TUI is quit (default 8s).- For a fully autonomous, no-prompt agent (no decider involvement), set
adapters.claude.args: ["--dangerously-skip-permissions"]/adapters.codex.args: ["--full-auto"]. hooksare shell commands run at lifecycle points withAGENT_RELAY_*env vars (SESSION_ID,STATUS,ADAPTER,LOG_FILE,EXIT_CODE,CWD).
⚠️ Safety: the default
ruledecider approves everything so the task can progress (danger-regex blocking is opt-in viadenyPatterns, and is unreliable on mangled TUI text anyway). For real control, useapprovalPolicy: "readonly"(sandbox), or an LLMcommand/apidecider that approves on-task work and denies off-task danger. Don't point an auto-approving run at an untrusted task.
Sessions, logs, completion
.agent-relay/sessions/<id>.json— metadata (prompt, adapter, status, times, logFile, exitCode, error, meta). Tiny; list/prune withagent-relay sessions..agent-relay/logs/<id>.log— header + one line per event + footer. By default the raw stdout/stderr stream is omitted (a TUI redraws constantly — it's ~98% noise), so a run logs only its meaningful events and stays ~1 KB. Use--verbosefor the full stream when debugging, and--max-log-bytesto cap it.- Status lifecycle:
created → running → { completed | failed | timeout | cancelled | waiting_approval }. TheCompletionDetectormaps the run to a terminal status (timeout/idle →timeout, cancel →cancelled, success/exit 0 →completed, elsefailed) and is extensible. meta.completedCleanly(exit 0, not aborted) is the unambiguous success flag;meta.settledistruewhen the run answered a prompt or completed cleanly (so a 0-interaction skip-mode run issettled: true, not falselyfalse).
Operational notes
- It writes into the target repo.
sessionsDir/logsDirdefault to<root>/.agent-relay/{sessions,logs}— add.agent-relay/to the target repo's.gitignore, or point them elsewhere (absolute / out-of-root paths work).--dry-runstill records a session entry, clearly markedmeta.dryRun: true. - Pin the binary. Without
adapters.<name>.command, the firstclaude/codexonPATHis used — ambiguous when several exist (version managers, shell wrappers,cmux). Setcommandto an absolute path; either way the resolved path is logged at run start (resolved claude → …). - Effort/cost defaults. Claude runs at
--effort xhighand Codex at-s danger-full-accessby default (full autonomy) — both raise cost/latency and access; override withadapters.<name>.args/defaults.sandbox. - First-run "Bypass Permissions" acceptance (Claude) is handled for you. The
first time
--dangerously-skip-permissionsruns on a machine, Claude shows a one-time "❯ 1. No, exit / 2. Yes, I accept" warning — on a fresh CI runner it appears every run and would otherwise stall an unattended job (the default decider confirms option 0, "No, exit", and Claude exits 1). agent-relay suppresses it via--settings '{"skipDangerousModePermissionPrompt":true}', and if the menu still appears it affirms it automatically. No setup needed. - Reasoning-model deciders. An
apidecider pointed at a reasoning model (e.g. gpt-oss) spends tokens onreasoning_contentbefore the JSON answer; too small a--decider-max-tokensemptiescontent. agent-relay clamps it up to a 512 floor and falls back to parsingreasoning_content, so a normal config won't return an empty decision — but give verbose reasoners more headroom.
Architecture
src/core/ types · config(zod) · session · logger · completion · decider ·
hooks · runner · errors · util/{ansi,line-splitter,which,...}
src/adapters/ registry · fake-adapter ·
interactive/ pty-session (the detect→decide→respond loop) · prompt-detector ·
interactive-adapter · claude-interactive · codex-interactive
src/commands/ init · adapters · doctor · run · resume
src/cli.ts commander CLI src/index.ts programmatic APIDecoupling rule: core/ depends only on the AgentAdapter interface and
receives an adapter factory from adapters/registry.ts; it never imports a
vendor adapter. The Decider reaches adapters via the run context, so any
decision backend works with any agent.
import { runAgent, createAdapterFactory, RuleDecider } from "agent-relay";
const outcome = await runAgent({
config, rootDir: process.cwd(), adapterName: "codex",
prompt: "Fix the failing test",
resolveAdapter: createAdapterFactory(),
decider: new RuleDecider(), // or a CommandDecider / FunctionDecider
});Testing
pnpm typecheck
pnpm test # offline, deterministic
AGENT_RELAY_RUN_REAL_AGENT_TESTS=1 pnpm test # opt-in real Claude/Codex (self-skips if absent)The default suite is fully offline — its centerpiece drives a deterministic fake
interactive agent through runPtySession, so the detect → decide → respond →
complete loop (plus deny/abort, multi-step and multi-select flows, completion and
the decider backends) is exercised without a real CLI. Both Codex and Claude are
also verified end-to-end against their real TUIs in the opt-in suite.
License
MIT — see LICENSE.
