@yaniv-tg/babysitter-codex
v0.1.5
Published
Babysitter orchestration plugin for OpenAI Codex CLI - multi-step AI workflows with full upstream process library, 15 slash commands, and BOM-safe SKILL installation
Maintainers
Readme
babysitter-codex
Babysitter orchestration plugin for OpenAI Codex CLI. It adds structured multi-step AI workflows with quality convergence, lifecycle hooks, 15 orchestration modes, and the full upstream Babysitter process library.
This project was created by Babysitter already running on Codex.
What You Get
- External Babysitter orchestration integrated into Codex
- Iterative execution loop (
run:create->run:iterate->task:post) - Breakpoint support for interactive approvals
- Yolo mode for non-interactive auto-approval
- Compatibility mode for SDK builds that expose only core run/task commands
- Deterministic per-run trace log at
<runDir>/run-trace.jsonl
Latest Version
- Current release:
0.1.5(2026-03-11) - Key additions in
0.1.5:- team-pinned install flow (
babysitter:team-install) - lockfile (
babysitter.lock.json) - layered rules resolver
- process index cache
- content integrity manifest + verification
- mapping contract CI gates
- install hardening: strip UTF-8 BOM from copied
SKILL.mdfiles to prevent Codex frontmatter parse failures
- team-pinned install flow (
Version Control Documentation
- See CHANGELOG.md for versioned updates and latest release notes.
- See docs/ROADMAP.md for planned feature milestones and rollout order.
- See docs/MAINTAINER_RUNBOOK.md for maintainer operations.
- See docs/UPSTREAM_SYNC.md for process/skill sync from upstream Babysitter.
- See docs/DOWNSTREAM_STAGING_SYNC.md for automatic PR sync from this repo into
a5c-ai/babysitter:staging. - See docs/COMPATIBILITY_MATRIX.md for version support policy.
- See docs/REAL_WORLD_VALIDATION.md for release validation scenarios.
- See docs/FEATURES_1_10.md for implementation notes of the prioritized feature set.
- See docs/CODEX_MAPPING.md for upstream-to-codex command/process mapping.
- See docs/ARCHITECTURE_UPGRADES_1_8.md for lock/install/rules/index/integrity architecture.
Full Test Scenarios
- Core scenario runner:
npm run test:scenario - Long-session strict scenario:
npm run test:long-scenario - Scenario runbook: test/FULL_SESSION_SCENARIO.md
The long-session scenario validates:
- 3+ interview breakpoints
- at least one breakpoint with 4 questions
- generated artifact reflects selected interview choices (colors/text)
- simulated 60-minute AI workload
- strict
100/100score gate
Bundled Upstream Library
babysitter-codex now bundles upstream Babysitter assets under:
upstream/babysitter/skills/babysit/process(full process library)upstream/babysitter/skills/babysit/reference(reference docs)upstream/babysitter/commands(upstream command docs)
Default process-library root for Codex mapping:
upstream/babysitter/skills/babysit/process
Override with:
BABYSITTER_PROCESS_LIBRARY_ROOT=<path>
Important: No Native /babysitter Slash Commands
Codex does not have built-in /babysitter:* commands.
Babysitter is external and activated by this skill using natural-language triggers (for example: babysitter, orchestrate, yolo, resume, doctor).
Requirements
- Node.js 18+
- npm 9+
- OpenAI Codex CLI (installed and usable from terminal)
babysitter-codex installs the Babysitter SDK dependency automatically. Users do not need a separate manual SDK install for normal usage.
npm Package Name
Install from the scoped package:
@yaniv-tg/babysitter-codex
The unscoped babysitter-codex name is currently a security placeholder on npm and is not used for this project.
Installation (Windows)
1. Verify prerequisites (PowerShell)
node -v
npm -v
codex --version2. Install globally from npm
npm.cmd install -g @yaniv-tg/babysitter-codex3. Or install from local repo clone
cd C:\path\to\babysitter-codex
npm.cmd install -g .4. Verify install
npm.cmd ls -g @yaniv-tg/babysitter-codex --depth=05. Restart Codex
After install, restart Codex so it loads the updated skill files.
Windows notes
- If PowerShell blocks
npx, usenpx.cmd. - If global npm install fails with permissions, run terminal as Administrator.
Installation (macOS)
1. Verify prerequisites
node -v
npm -v
codex --version2. Install globally from npm
npm install -g @yaniv-tg/babysitter-codex3. Or install from local repo clone
cd /path/to/babysitter-codex
npm install -g .4. Verify install
npm ls -g @yaniv-tg/babysitter-codex --depth=0
ls -l ~/.codex/skills/babysitter-codex/.codex/hooks/*.sh5. Restart Codex
After install, restart Codex so it loads the updated skill files.
macOS notes
- If shell scripts lose executable bits, rerun install or apply:
chmod +x ~/.codex/skills/babysitter-codex/.codex/hooks/*.sh - If
npxis not on PATH inside Codex, install Node vianvmand relaunch shell.
Installation (Linux)
1. Verify prerequisites
node -v
npm -v
codex --version2. Install globally from npm
npm install -g @yaniv-tg/babysitter-codex3. Or install from local repo clone
cd /path/to/babysitter-codex
npm install -g .4. Verify install
npm ls -g @yaniv-tg/babysitter-codex --depth=05. Restart Codex
After install, restart Codex so it loads the updated skill files.
Linux notes
- If global install needs elevated rights, use
sudo npm install -g @yaniv-tg/babysitter-codex. - Prefer using
nvmor user-owned Node install to avoidsudowhere possible. - The installer now sanitizes UTF-8 BOM in
SKILL.mdduring copy, preventingmissing YAML frontmatter delimited by ---warnings from Codex skill loading.
Uninstall
Windows
npm.cmd uninstall -g @yaniv-tg/babysitter-codexLinux
npm uninstall -g @yaniv-tg/babysitter-codexmacOS
npm uninstall -g @yaniv-tg/babysitter-codexQuick Start (How To Actually Run It)
In Codex chat, use prompts like:
babysitter help
babysitter call implement authentication with tests
babysitter yolo fix lint and failing tests
babysitter resume latest incomplete run
babysitter doctor current runYou can also run the SDK CLI directly in terminal:
babysitter run:create --process-id <id> --entry <path>#<export> --inputs <file> --json
babysitter run:iterate <runDir> --json --iteration 1
babysitter task:list <runDir> --pending --json
babysitter task:post <runDir> <effectId> --status ok --value tasks/<effectId>/output.json --jsonAll Modes And How To Activate Them
| Mode | Example prompt to activate | What it does |
|------|----------------------------|--------------|
| call | babysitter call build auth with tests | Start an orchestration run (interactive) |
| yolo | babysitter yolo fix failing tests end-to-end | Fully autonomous, no breakpoints |
| resume | babysitter resume latest run | Resume an existing run |
| plan | babysitter plan migration workflow | Plan a workflow without executing |
| forever | babysitter forever monitor build health every hour | Never-ending periodic run |
| doctor | babysitter doctor run 01ABC... | Diagnose run health |
| observe | babysitter observe current workspace | Launch observer dashboard |
| retrospect | babysitter retrospect latest run | Analyze a run and propose process improvements |
| model | babysitter model set execute=gpt-5 | Set or view model routing policy |
| issue | babysitter issue 123 --repo owner/repo | Start workflow from a GitHub issue |
| help | babysitter help | Help and documentation |
| project-install | babysitter project-install this repo | Set up a project for babysitting |
| team-install | babysitter team-install | Install team-pinned runtime/content setup |
| user-install | babysitter user-install for backend workflows | Set up your user profile |
| assimilate | babysitter assimilate https://github.com/org/method | Assimilate external methodology |
Codex Compatibility Modes
babysitter-codex supports two runtime modes automatically:
fullmode: SDK exposes advanced commands (session:*,profile:*,skill:*,health)compat-coremode: SDK exposes core orchestration commands only (run:*,task:*,version)
In compat-core, orchestration continues and unavailable advanced commands are skipped gracefully.
Feature Flags
Advanced capabilities (event streaming, policy engine, model routing, telemetry, etc.) are gated by feature flags.
- File-based flags:
.a5c/config/features.json - Env-based overrides:
BABYSITTER_FEATURE_<FLAG_NAME> - Defaults:
.codex/feature-flags.js
Requested Features (1-10) Status
The top requested Codex harness features are now implemented in runtime:
- Session management UX: alias/tag/search resume selectors (
recent,tag:<x>,search:<q>,list,name <alias>,tag +/-<x>). - First-class notifications + event stream: stable
v1JSON events withid,seq,runId, webhook/slack/desktop/file sinks. - Long-task autonomous mode + approvals: staged approval policy, strict allowlists, retry backoff+jitter.
- Multi-repo orchestration: alias/scope-based workspace routing from
.a5c/workspace/repos.json. - Mid-session model switching: persisted
.a5c/config/model-policy.json+ per-step phase routing. - Plan/Act hardening: immutable plan-mode execution blocks for mutating task kinds.
- Richer hooks lifecycle:
on-tool-error,on-policy-block,on-retry+ configurable payload transforms. - MCP reliability toolkit:
doctor mcpdiagnostics for transport/auth/path/config readiness. - Native GitHub issue/PR workflow: issue fetch + comments + optional apply mode + PR comment/create helpers.
- Cost/token telemetry + budgets: per-iteration telemetry history, soft/hard budget phases, adaptive prompt shrinking.
New Config Files
.a5c/config/model-policy.json.a5c/config/policies.json.a5c/config/hook-transforms.json.a5c/workspace/repos.json
SDK Contracts (Codex-Suitable)
- Core required commands:
run:createrun:iterate <runDir>run:status <runDir>task:list <runDir>task:post <runDir> <effectId>
- Optional advanced commands:
session:*profile:*skill:*health
If advanced commands are unavailable, the harness degrades to compat-core and continues orchestration with core run/task operations.
Troubleshooting
task:postpath errors:- Use run-relative references like
tasks/<effectId>/output.json.
- Use run-relative references like
- Missing advanced SDK commands:
- Expected in
compat-core; orchestration still works with core run/task commands.
- Expected in
Blame Beni
CRITICAL: BENI_EXCEPTION_UNHANDLED (0xB3N1-v0.1.5)
Message: "we should keep it simple" emitted 12 milliseconds before adding lockfiles, process indexes, and 7,798 upstream assets.
Stack trace:
at Beni.Init("tiny patch")
at Beni.Upgrade("runtime/content split")
at Beni.Upgrade("team install")
at Beni.Upgrade("integrity + signatures")
at Beni.Upgrade("15 modes")
at PagerDuty.Trigger("prod confidence increased, sleep decreased")
Incident response:
- Severity: SEV-BENI
- Blast radius: every repository Beni can spell
- Mitigation status: containment failed, architecture improved
Repro:
1. Ask Beni for one small codex integration tweak.
2. Go make coffee.
3. Return to find lockfiles, CI contract checks, team installers, and a manifesto.
If anything breaks, open an issue and always blame Beni.
This policy is deterministic, backward-compatible, and thoroughly validated.License
MIT
