@sebastianandreasson/pi-autonomous-agents
v0.15.2
Portable unattended PI harness for developer/tester/visual-review loops.
PI Autonomous Agents
@sebastianandreasson/pi-autonomous-agents is an npm package for running a bounded unattended PI workflow inside another repository.
It orchestrates:
- a developer turn
- a fast local verification step
- an independent tester turn
- an optional focused developerFix turn when verification/tester finds a real issue
- optional periodic visual review from screenshots
The package is intentionally generic. It handles supervision, prompts, runtime state, telemetry, retries, and guardrails. The consuming repo still owns its own tasks, instructions, tests, model endpoints, and screenshot capture flow.
Install
npm install -D @sebastianandreasson/pi-autonomous-agents
Then in the consuming repo, tell your agent:
Find SETUP.md in @sebastianandreasson/pi-autonomous-agents and set everything up for this repository.
The package ships a top-level SETUP.md specifically for that workflow.
What This Package Owns
- unattended loop orchestration
- PI Node SDK integration
- config loading
- prompt assembly
- verification/tester/visual-review handoff
- timeout and loop guards
- telemetry and run summaries
- runtime isolation and stale-run recovery
What Each Repo Must Provide
- TODOS.md
- repo-specific pi/DEVELOPER.md
- repo-specific pi/TESTER.md
- a fast bounded testCommand
- model configuration that actually matches the local/cloud providers in use
- optionally a screenshot capture command for visual review
Quick Start In A Repo
The normal setup shape is:
TODOS.md
pi.config.json
pi/
  DEVELOPER.md
  TESTER.md
Typical scripts:
- pi:once / pi:run use the default sdk transport
- pi:run also hosts the web UI on 127.0.0.1:4317 by default
- pi:mock skips real agent execution
{
  "scripts": {
    "pi:mock": "PI_CONFIG_FILE=pi.config.json PI_TRANSPORT=mock PI_TEST_CMD= pi-harness once",
    "pi:once": "PI_CONFIG_FILE=pi.config.json pi-harness once",
    "pi:run": "PI_CONFIG_FILE=pi.config.json pi-harness run",
    "pi:report": "PI_CONFIG_FILE=pi.config.json pi-harness report",
    "pi:visual:once": "PI_CONFIG_FILE=pi.config.json pi-harness visual-once",
    "pi:visualize": "PI_CONFIG_FILE=pi.config.json pi-harness visualize"
  }
}
Start from templates/pi.config.example.json, templates/DEVELOPER.md, templates/TESTER.md, and templates/gitignore.fragment.
Request telemetry is enabled by default for SDK runs. pi-harness writes a managed Pi extension package under .pi/extensions/pi-harness-request-telemetry/ in the consuming repo, with a package.json manifest and index.mjs shim that Pi auto-discovers on the next resource reload. Disable that with PI_REQUEST_TELEMETRY_ENABLED=0 or "piRequestTelemetryEnabled": false.
By default the extension now stores compact request telemetry only:
- requests.jsonl with exact request totals and summarized tool/file attribution
- spans.jsonl with byte counts and attribution metadata, but not full prompt text
Verbose hook traces and raw span text are opt-in for debugging:
- PI_REQUEST_TELEMETRY_STORE_HOOKS=1 or "piRequestTelemetryStoreHooks": true
- PI_REQUEST_TELEMETRY_STORE_SPAN_TEXT=1 or "piRequestTelemetryStoreSpanText": true
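The telemetry switches above each have an environment-variable form and a config-file form. A minimal sketch of how the two could be combined follows. It assumes that a set environment variable wins over the config file, which the README does not state, and `resolveTelemetryOptions` is a hypothetical helper rather than part of the package.

```javascript
// Sketch only: the precedence (environment variable over config file) is an
// assumption, and resolveTelemetryOptions is a hypothetical helper.
function envFlag(value) {
  if (value === undefined || value === "") return undefined; // unset: defer to config
  return value !== "0";
}

function resolveTelemetryOptions(env, config) {
  const pick = (envName, configKey, fallback) => {
    const fromEnv = envFlag(env[envName]);
    if (fromEnv !== undefined) return fromEnv;
    return typeof config[configKey] === "boolean" ? config[configKey] : fallback;
  };
  return {
    // Request telemetry is on by default; the verbose stores are opt-in.
    enabled: pick("PI_REQUEST_TELEMETRY_ENABLED", "piRequestTelemetryEnabled", true),
    storeHooks: pick("PI_REQUEST_TELEMETRY_STORE_HOOKS", "piRequestTelemetryStoreHooks", false),
    storeSpanText: pick("PI_REQUEST_TELEMETRY_STORE_SPAN_TEXT", "piRequestTelemetryStoreSpanText", false),
  };
}

const telemetryOptions = resolveTelemetryOptions(
  { PI_REQUEST_TELEMETRY_STORE_HOOKS: "1" },
  { piRequestTelemetryEnabled: true }
);
console.log(telemetryOptions);
// { enabled: true, storeHooks: true, storeSpanText: false }
```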
CLI
pi-harness once
pi-harness run
pi-harness report
pi-harness clear-history
pi-harness visual-once
pi-harness visualize
pi-harness debug-live
pi-harness visual-review-worker
Use PI_CONFIG_FILE to point at the repo-local config file:
PI_CONFIG_FILE=pi.config.json pi-harness once
If PI_CONFIG_FILE is not set, the package falls back to the bundled generic pi.config.json.
Core Workflow
Each real iteration works like this:
- developer implements one unchecked task from TODOS.md.
- The harness runs the configured fast verification command.
- If verification passes, tester reviews the change independently.
- If tester or verification fails, the findings go back to developerFix for one focused repair pass.
- If tester reaches PASS, tester creates the final commit directly by default.
- Every N successful iterations, optional visual review can inspect screenshots and veto the success if it finds a real problem.
The default commit model is commitMode: "agent". The older harness-managed parsed commit-plan flow still exists as commitMode: "plan", but it is now a compatibility mode rather than the default.
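The iteration flow described above can be sketched as a small control function. This is an illustration under stated assumptions, not the harness's implementation: the role callbacks, their return shapes, and the single recheck after the repair pass are all hypothetical.

```javascript
// Hypothetical sketch of one iteration; the role callbacks and result shapes
// are illustrative and are not the package's API.
function runIteration(roles) {
  const trace = ["developer"];
  roles.developer();

  // Verification gates the tester: tester only runs when verification passes.
  const check = () => {
    trace.push("verify");
    const verification = roles.verify();
    if (!verification.ok) return { pass: false, findings: verification.output };
    trace.push("tester");
    const review = roles.tester();
    return { pass: review.verdict === "PASS", findings: review.findings };
  };

  let result = check();
  if (!result.pass) {
    // One focused repair pass, then (assumption) a single recheck.
    trace.push("developerFix");
    roles.developerFix(result.findings);
    result = check();
  }
  return { success: result.pass, trace };
}

// Visual review is periodic: due after every N successful iterations.
function isVisualReviewDue(successfulIterations, everyN) {
  return everyN > 0 && successfulIterations > 0 && successfulIterations % everyN === 0;
}

let verifyCalls = 0;
const outcome = runIteration({
  developer: () => {},
  verify: () => ({ ok: ++verifyCalls > 1, output: "1 test failed" }),
  tester: () => ({ verdict: "PASS", findings: "" }),
  developerFix: () => {},
});
console.log(outcome.trace.join(" -> "));
// developer -> verify -> developerFix -> verify -> tester
```

In this run the first verification fails, so the tester is skipped and the findings go straight to the repair pass.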
Recommended Model Setup
The package supports:
- one default text model via piModel
- one default visual-review model via visualReviewModel
- optional per-role overrides via roleModels
- per-model endpoint config in models
- default transport via transport (sdk or mock)
Typical pattern:
- local model for developer
- local model for developerRetry
- local model for developerFix
- local or slightly stronger model for tester
- stronger frontier model only for visualReview
Example:
{
  "piModel": "local/text-model",
  "visualReviewModel": "cloud/vision-model",
  "models": {
    "local/text-model": {
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "local",
      "vision": false
    },
    "local/tester-model": {
      "baseUrl": "http://localhost:8000/v1",
      "apiKey": "local",
      "vision": false
    },
    "cloud/vision-model": {
      "baseUrl": "https://api.openai.com/v1",
      "apiKeyEnv": "OPENAI_API_KEY",
      "vision": true
    }
  },
  "roleModels": {
    "developer": "local/text-model",
    "developerRetry": "local/text-model",
    "developerFix": "local/text-model",
    "tester": "local/tester-model",
    "visualReview": "cloud/vision-model"
  }
}
Important:
- do not guess model ids
- if using a custom OpenAI-compatible provider, verify <baseUrl>/models
- if using PI models directly, verify pi --list-models
- if PI_CODING_AGENT_DIR points at a repo-local PI home, make sure it is bootstrapped and contains models.json
The harness now preflights those checks before starting a real run.
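To make the model fields concrete, here is a rough sketch of how a role could be mapped to a model id and how a config could be sanity-checked before a run. The fallback order (roleModels first, then visualReviewModel or piModel) is inferred from the field descriptions, and `resolveRoleModel` and `findUnknownModels` are hypothetical names. The check only applies to configs that declare every model under models; it is not the harness's actual preflight.

```javascript
// Sketch under assumptions: the fallback order below is inferred from the field
// descriptions, and resolveRoleModel / findUnknownModels are hypothetical names.
function resolveRoleModel(config, role) {
  const override = (config.roleModels || {})[role];
  if (override) return override;
  return role === "visualReview" ? config.visualReviewModel : config.piModel;
}

// Lists roles whose resolved model id has no entry in `models`.
function findUnknownModels(config, roles) {
  const known = config.models || {};
  return roles.filter((role) => {
    const id = resolveRoleModel(config, role);
    return !id || !Object.prototype.hasOwnProperty.call(known, id);
  });
}

const exampleConfig = {
  piModel: "local/text-model",
  visualReviewModel: "cloud/vision-model",
  models: {
    "local/text-model": { baseUrl: "http://localhost:8000/v1", apiKey: "local", vision: false },
    "cloud/vision-model": { baseUrl: "https://api.openai.com/v1", apiKeyEnv: "OPENAI_API_KEY", vision: true },
  },
  // "local/tester-model" is deliberately missing from `models` above.
  roleModels: { tester: "local/tester-model" },
};

console.log(resolveRoleModel(exampleConfig, "developer")); // local/text-model
console.log(findUnknownModels(exampleConfig, ["developer", "tester", "visualReview"])); // [ 'tester' ]
```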
Important Config Fields
Common fields in pi.config.json:
- taskFile
- developerInstructionsFile
- testerInstructionsFile
- transport (sdk or mock)
- piModel
- piRequestTelemetryEnabled
- models
- roleModels
- commitMode
- promptMode
- testCommand
- visualReviewEnabled
- visualCaptureCommand
- failureArtifactDir
- continueAfterSeconds
- toolContinueAfterSeconds
- noEventTimeoutSeconds
- toolNoEventTimeoutSeconds
- sameFileLoopBudget
- loopHistoryLimit
- largeFileWarningLines
- largeSpecWarningLines
Key defaults:
- transport: sdk
- commitMode: agent
- promptMode: compact
- piTools: read,edit,write,find,ls,bash
- continueAfterSeconds: 300
- toolContinueAfterSeconds: 900
- noEventTimeoutSeconds: 900
- toolNoEventTimeoutSeconds: 1800
- sameFileLoopBudget: 2
- loopHistoryLimit: 25
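As an illustration of how these defaults interact with a repo-local config, the sketch below overlays user values on the defaults listed above. The values are copied from that list; the `withKeyDefaults` helper is hypothetical, and representing piTools as a comma-separated string is an assumption about the config format.

```javascript
// Default values copied from the "Key defaults" list; the merge helper itself
// is a hypothetical illustration, not the package's config loader.
const KEY_DEFAULTS = {
  transport: "sdk",
  commitMode: "agent",
  promptMode: "compact",
  piTools: "read,edit,write,find,ls,bash", // string form is an assumption
  continueAfterSeconds: 300,
  toolContinueAfterSeconds: 900,
  noEventTimeoutSeconds: 900,
  toolNoEventTimeoutSeconds: 1800,
  sameFileLoopBudget: 2,
  loopHistoryLimit: 25,
};

function withKeyDefaults(userConfig) {
  // Drop undefined values so they do not mask a default.
  const defined = Object.fromEntries(
    Object.entries(userConfig).filter(([, value]) => value !== undefined)
  );
  return { ...KEY_DEFAULTS, ...defined };
}

const effective = withKeyDefaults({ commitMode: "plan", noEventTimeoutSeconds: 1200 });
console.log(effective.commitMode, effective.transport, effective.noEventTimeoutSeconds);
// plan sdk 1200
```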
Prompt and Tooling Behavior
The package is optimized for local models by default:
- prompts are compacted before handoff
- changed-file lists and feedback excerpts are capped
- prompts prefer read for source inspection
- shell is intended for git, tests, and narrow diagnostics
- SDK transport carries forward oversized shell-read warnings and loop/timeout guards
- repeated same-file loop failures are remembered across iterations and escalate the next edit strategy
- the supervisor emits large-file/spec warnings when touched files are getting risky
This is deliberate. Large monolith files, huge e2e specs, and broad TODO items are among the main causes of local-model drift and retry loops.
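Two of the behaviors above, capping changed-file lists and warning about large files, are easy to picture with a small sketch. The helper names, the cap of 2, and the 800-line threshold are made up for the example; the README does not state the actual limits or the default for largeFileWarningLines.

```javascript
// Hypothetical helpers; the cap size and the 800-line threshold used in the
// example are made-up values, not package defaults.
function capList(items, maxItems) {
  if (items.length <= maxItems) return items.slice();
  const hidden = items.length - maxItems;
  return [...items.slice(0, maxItems), `... and ${hidden} more`];
}

function largeFileWarnings(lineCounts, warningLines) {
  return Object.entries(lineCounts)
    .filter(([, lines]) => lines >= warningLines)
    .map(([file, lines]) => `${file} has ${lines} lines (warning threshold ${warningLines})`);
}

console.log(capList(["a.js", "b.js", "c.js", "d.js"], 2));
// [ 'a.js', 'b.js', '... and 2 more' ]
console.log(largeFileWarnings({ "src/store.js": 1200, "src/util.js": 90 }, 800));
// [ 'src/store.js has 1200 lines (warning threshold 800)' ]
```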
Recommended repo shape:
- keep TODO items very small and implementation-shaped
- split giant stores/modules before they become constant edit hotspots
- split ever-growing end-to-end specs into scenario files
- keep the default testCommand to a bounded smoke check, not a multi-minute happy-path run
Runtime Isolation And Recovery
Recent versions of the package isolate each run more aggressively:
- active ownership lock at .pi-runtime/active-run.json
- per-run runtime directory under .pi-runtime/runs/<runId>/
- per-run PI sessions and telemetry
- runId added to telemetry
- in-progress iteration state persisted before agent work starts
- stale run locks recovered when the owning PID is gone
- timeout cleanup kills the full spawned process group, not only the direct child
- parent-death watchers shut down orphaned supervisor layers instead of letting them continue under PPID 1
That is meant to prevent orphaned timed-out agents or concurrent supervisors from corrupting shared state.
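The stale-lock rule ("recovered when the owning PID is gone") can be illustrated with Node's signal-0 liveness check. This is a sketch, not the package's recovery code: the lock shape `{ pid, runId }` is an assumption, since the README does not document the contents of .pi-runtime/active-run.json.

```javascript
// Sketch only: the lock shape ({ pid, runId }) is an assumption; the README
// does not document the contents of .pi-runtime/active-run.json.
function isProcessAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0 only checks existence; nothing is delivered
    return true;
  } catch (error) {
    // EPERM means the process exists but belongs to another user.
    return error.code === "EPERM";
  }
}

// A lock is stale when it is missing or its owning process is gone.
function isLockStale(lock, isAlive = isProcessAlive) {
  return !lock || !isAlive(lock.pid);
}

console.log(isLockStale({ pid: process.pid, runId: "example" }));
// false, because the current process is alive
```

The `isAlive` parameter exists so the check can be exercised without depending on which PIDs happen to be running.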
Debugging Artifacts
Useful files during a run:
- .pi-last-prompt.txt: Exact assembled prompt for the current role.
- .pi-last-output.txt: Latest agent output snapshot.
- .pi-last-verification.txt: Latest verification output snapshot.
- .pi-last-iteration.json: Structured summary of the last completed iteration.
- pi-output/failure-artifacts/: Compact failure artifacts with command, exit code, changed files, tester summary, and output excerpt.
- .pi-state.json: Persistent harness state, including in-progress iteration data.
- pi.log: Main run log.
- pi_telemetry.jsonl
- pi_telemetry.csv
- pi-output/token-usage/events.jsonl: Normalized token-attribution event stream for downstream tools. Each row includes phase, role, kind, session/model, attribution bucket, tool/file context, and token counts.
- pi-output/token-usage/summary.json: Derived structured token summary with totals plus breakdowns by phase, model, session, attribution, tool, file, and directory.
- .pi-runtime/active-run.json
- .pi-runtime/runs/<runId>/...
Each run also gets run-scoped token artifacts under .pi-runtime/runs/<runId>/token-usage.events.jsonl and .pi-runtime/runs/<runId>/token-usage.summary.json.
pi-harness report summarizes recent telemetry and token artifacts and surfaces things like terminal reasons, large-file warnings, failure artifacts, and top token hotspots.
pi-harness run now also starts a lightweight local web UI for the orchestration flow. By default it listens on 127.0.0.1:4317. Override this with PI_VISUALIZER_HOST and PI_VISUALIZER_PORT. Set PI_VISUALIZER=0 to disable the embedded web UI for a run.
The visualizer uses SSE (server-sent events) for live updates instead of browser polling.
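For readers unfamiliar with SSE, the sketch below shows the standard wire format of one event frame and how a client would decode it. Only the frame layout (an event line, a data line, and a terminating blank line) is standard; the event name and payload fields are hypothetical, because the README does not document the visualizer's event schema or endpoint.

```javascript
// Formats one server-sent event frame. The event name and payload fields are
// hypothetical; only the frame layout ("event:", "data:", blank line) is standard.
function formatSseEvent(eventName, payload) {
  // JSON.stringify without indentation yields a single line, so one data line suffices.
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Parses a single frame back into its event name and JSON payload.
function parseSseFrame(frame) {
  let event = "message"; // default event type when no "event:" line is present
  const dataLines = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
  }
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

const frame = formatSseEvent("iteration", { stage: "tester", runId: "example" });
const parsed = parseSseFrame(frame);
console.log(parsed.event, parsed.data.stage);
// iteration tester
```

Because the server pushes each frame as it happens, the browser holds one open connection instead of re-requesting state on a timer.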
pi-harness visualize still exists as a standalone viewer if you want to inspect run history without starting a new run.
Visualizer now includes:
- TODO-centric main view with current task open by default
- run history selector from .pi-runtime/runs/
- orchestration flow for selected todo
- 50/50 split between live worker feed and current repo edits
- per-iteration stage graph with retries/rechecks in diagnostics
- clickable graph nodes and timeline rows that show full event JSON
- historical run summaries and per-run last output snapshots
- live worker feed with thinking text, assistant text, tool calls, and tool output
- feed controls to hide thinking and collapse repetitive deltas
- pinned latest tool output panel
Visual Review Contract
Visual review is optional and generic. The harness does not know how to navigate your app.
If enabled, your repo must provide a real screenshot capture command that writes a manifest under the configured capture directory. The manifest shape is documented in docs/PI_SUPERVISOR.md.
Visual review should be used as a periodic audit, not as the default inner-loop gate.
Resetting Harness State
If you want to wipe harness-generated state and start fresh:
PI_CONFIG_FILE=pi.config.json pi-harness clear-history
That clears configured harness runtime/history artifacts and verifies they are gone. It does not remove project source files.
Docs
- SETUP.md Agent-facing setup instructions for consuming repos.
- docs/PI_SUPERVISOR.md More detailed flow, transport, telemetry, and runtime documentation.
- docs/TOKEN_USAGE_ARTIFACTS.md Agent-facing contract and usage guidance for token-usage artifacts and downstream tooling.
- docs/PI_REQUEST_TELEMETRY_EXTENSION.md Repo-local Pi extension prototype for request-level telemetry via Pi hooks.
- templates/PROJECT_SETUP.md Minimal consuming-repo layout summary.
Development
In this package repo:
npm run check
npm test
For local visualizer iteration against a fake live SDK agent:
npm run debug:live-ui
Scenario variants:
node src/cli.mjs debug-live --reset --scenario noisy --task-count 24
node src/cli.mjs debug-live --reset --scenario retry
For the React/Vite visualizer UI dev loop:
npm run dev:visualizer:ui
For a production visualizer UI build:
npm run build:visualizer:ui
Publishing now automatically runs check, tests, and the UI build via prepublishOnly.
The debug-live command seeds .pi-debug/live-ui/, runs the harness there with a streaming fake SDK fixture, hosts the visualizer, and gives a stable local repro loop for UI work. The React app lives in visualizer-ui/. The visualizer server now serves built assets from visualizer-ui/dist/ and falls back to a build-instructions page if build artifacts are missing.
See docs/VISUALIZER_UI_PLAN.md for the migration plan.
The package requires Node >=20.
