@steve-life/code-agent-sdk
v0.0.3
Experimental multi-agent Codex orchestration SDK for coding workflows.
Code Agent SDK
Experimental TypeScript SDK for orchestrating coding work across multiple model tiers and agent roles.
The SDK treats a coding request as a planned workflow instead of a single model
call. A planner produces a TaskDAG, specialized workers implement scoped
pieces in isolated workspaces, a merge broker validates patches, verifiers run
commands, reviewers inspect the result, and the caller receives both the final
result and the raw thread-level trace.
This package is intended for developer tools, internal coding platforms, CI assistants, and agent runners that need to show how a coding task was planned, executed, verified, and reviewed.
Chinese documentation is available at docs/zh-CN-usage.md.
Status
This project is early-stage infrastructure. The public API is usable, but the orchestration model is still evolving.
- Package name: @steve-life/code-agent-sdk
- Runtime: Node.js 18 or newer
- Language: TypeScript, ESM
- Primary integration boundary: ModelRunner
- Real model adapter: CodexModelRunnerAdapter
- Local deterministic mode: MockModelRunner
Install
npm install @steve-life/code-agent-sdk
The SDK expects the target project to be a Git repository. Real Codex-backed
runs also require the authentication and environment expected by
@openai/codex-sdk.
For local SDK development:
git clone https://github.com/SteveLaiTVT/code-agent-sdk.git
cd code-agent-sdk
npm install
npm test
Quick Start
Use runCodingTask() when you only need the final orchestration result.
import { runCodingTask } from "@steve-life/code-agent-sdk";
const result = await runCodingTask(
  "Build a playable snake game. Put pure game logic in small functions.",
  "/path/to/target-repo",
  "main",
);
console.log(result.status);
console.log(result.summary);
console.log(result.modelUsage.totals);
runCodingTask(message, repo, branch, options?) will:
- verify that repo is a Git repository
- fetch origin/<branch> when possible
- check out branch
- create a ProjectSpace
- run the orchestrator with a real CodexModelRunnerAdapter
- return an OrchestrationResult
Run this against a clean working tree when possible. The SDK can apply patches to the target repo after merge validation succeeds.
Streamed Runs
Use runCodingTaskStreamed() when a UI, CLI, dashboard, or log collector needs
to render the process while it is running.
import { runCodingTaskStreamed } from "@steve-life/code-agent-sdk";
const stream = await runCodingTaskStreamed(
  "Add a task board screen with loading, empty, and error states.",
  "/path/to/target-repo",
  "main",
);
for await (const event of stream.events) {
  switch (event.type) {
    case "run.started":
      console.log("run", event.runId, event.project.root);
      break;
    case "planner.completed":
      console.log("planned tasks", event.dag.tasks.length);
      break;
    case "task.started":
      console.log("task", event.task.taskId, event.workerId);
      break;
    case "thread.event":
      console.log("thread", event.threadRunId, event.model, event.sdkEvent.type);
      break;
    case "model.usage":
      console.log("usage", event.model, event.usage);
      break;
    case "run.completed":
    case "run.failed":
      console.log(event.result.status, event.result.summary);
      break;
  }
}
const result = await stream.result;
console.log(result.trace);
console.log(result.modelUsage.byModel);
The stream exposes orchestration-level events and raw Codex SDK thread events. This is useful when the caller needs a replayable trace for each planner, worker, and reviewer thread.
Manual Plan Review
Manual plan review is opt-in. When enabled, the orchestrator validates the
planner's TaskDAG, emits plan.review.required, and waits before starting
workers, merge, verification, or review.
const stream = await runCodingTaskStreamed(
  "Refactor the settings screen.",
  "/path/to/target-repo",
  "main",
  {
    orchestrator: {
      planReview: { mode: "manual" },
    },
  },
);
for await (const event of stream.events) {
  if (event.type === "plan.review.required") {
    renderPlan(event.dag, event.options);
    stream.planReview?.approve();
  }
}
The controller supports approve(), revise(feedback), and cancel(reason).
revise() sends feedback back to the planner and emits a new
plan.review.required event for the revised DAG. cancel() completes the run
with status: "cancelled" and does not create task workspaces or apply patches.
Non-streamed APIs reject manual plan review because they cannot expose the
controller.
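A caller could choose between the three controller actions with a small policy function. The sketch below is illustrative only: PlannedTask, PlanDecision, decidePlanReview, the maxTasks limit, and the rule "ask for a revision when any task is high or critical risk" are assumptions made for this example and are not part of the SDK. Only taskId, riskLevel, and the approve/revise/cancel actions come from this README.

```typescript
// Hypothetical policy helper, not an SDK export.
type PlanRiskLevel = "low" | "medium" | "high" | "critical";

interface PlannedTask {
  taskId: string;
  riskLevel: PlanRiskLevel;
}

type PlanDecision =
  | { action: "approve" }
  | { action: "revise"; feedback: string }
  | { action: "cancel"; reason: string };

function decidePlanReview(tasks: PlannedTask[], maxTasks = 20): PlanDecision {
  // An empty plan cannot produce any work, so stop the run.
  if (tasks.length === 0) {
    return { action: "cancel", reason: "Planner returned an empty plan." };
  }
  // Oversized plans go back to the planner with feedback.
  if (tasks.length > maxTasks) {
    return {
      action: "revise",
      feedback: `Plan has ${tasks.length} tasks; keep it at ${maxTasks} or fewer.`,
    };
  }
  // Assumed rule: high and critical risk tasks need another planning pass.
  const risky = tasks.filter(
    (t) => t.riskLevel === "high" || t.riskLevel === "critical",
  );
  if (risky.length > 0) {
    const ids = risky.map((t) => t.taskId).join(", ");
    return { action: "revise", feedback: `Split or de-risk these tasks: ${ids}.` };
  }
  return { action: "approve" };
}
```

Inside the plan.review.required handler, the decision would map onto stream.planReview?.approve(), revise(decision.feedback), or cancel(decision.reason). Because revise() triggers another plan.review.required event, a real policy would probably also cap the number of revision rounds.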
Pre-Merge Validation
Implementation patches are validated before they are merged into the target
project. The SDK does not assume Node, Android, iOS, Flutter, or any other
platform. The planner must set each task's validationTools and
verificationCommands; those task-level commands run in a temporary validation
workspace. When pre-merge validation is enabled, the SDK treats
verificationCommands as patch-level gates and does not replay the same
commands after merge. Use explicit verifier tasks or
fullVerificationCommands for repo-wide post-merge checks. The caller can add
extra global commands for stricter gates:
const stream = await runCodingTaskStreamed(message, repo, "main", {
  orchestrator: {
    preMergeValidation: {
      commands: ["npm run build"],
    },
  },
});
If pre-merge validation fails, the task fails and the patch is not merged into
the project. The stream emits task.validation.completed with the command
results.
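The split between patch-level gates and post-merge checks described above can be sketched as two small pure functions. This is a simplified model under stated assumptions, not the SDK's implementation: TaskCommands, CommandPlan, CommandOutcome, splitCommands, and patchMayMerge are hypothetical names, deduplicating repeated commands is a choice made for this example, and "exit code 0 means pass" is assumed.

```typescript
// Illustrative stand-ins; none of these types are SDK exports.
interface TaskCommands {
  taskId: string;
  verificationCommands: string[];
}

interface CommandPlan {
  preMerge: string[];  // run in the temporary validation workspace
  postMerge: string[]; // run after the patch is merged
}

interface CommandOutcome {
  command: string;
  exitCode: number;
}

// Keep the first occurrence of each command, preserving order.
function dedupeCommands(commands: string[]): string[] {
  return commands.filter((cmd, index) => commands.indexOf(cmd) === index);
}

// With pre-merge validation enabled, task-level commands become patch gates
// and are not replayed after merge; only full verification runs post-merge.
function splitCommands(
  task: TaskCommands,
  globalPreMergeCommands: string[],
  fullVerificationCommands: string[],
): CommandPlan {
  return {
    preMerge: dedupeCommands([
      ...task.verificationCommands,
      ...globalPreMergeCommands,
    ]),
    postMerge: dedupeCommands(fullVerificationCommands),
  };
}

// Assumed gate: a patch may merge only if every command exited with code 0.
function patchMayMerge(outcomes: CommandOutcome[]): boolean {
  return outcomes.every((o) => o.exitCode === 0);
}
```

For example, a task with verificationCommands of "npm run lint" and "npm run build", combined with a global "npm run build" gate, yields a two-command pre-merge list under this model, with "npm test" left for the post-merge stage if it is configured as a full verification command.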
When To Use Direct Codex vs Plan Mode
Use direct Codex, either through runSingleCodexTask() or plain
@openai/codex-sdk, when speed matters more than orchestration structure.
Route to full plan mode when you need task boundaries, approval gates, or
artifact-quality traceability.
| Route | Best for | Why |
| --- | --- | --- |
| Direct Codex | Single-file edits, quick bug triage, prompt exploration, tiny refactors, one-shot docs or tests | Lowest overhead and fastest feedback loop |
| Plan mode | Multi-file bugs, risky refactors, shared infra, public API changes, build/config work, security-sensitive tasks | Gives you DAG planning, scoped writes, pre-merge validation, review, and replayable trace |
Practical default:
- Start with direct Codex when the request likely stays within 1 to 2 files and does not need manual approval or audit artifacts.
- Route to plan mode when the task spans modules, needs human plan approval, or should leave a verifiable patch/review trail.
- Hard-route to plan mode for migrations, build tooling changes, shared library contracts, auth/security logic, and regressions that need structured debugging.
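The practical default above could be encoded as a routing function in the calling application. The sketch below is one possible reading of those three rules: RoutingInput, TaskCategory, RouteChoice, and chooseRoute are hypothetical names, and the SDK itself does not make this decision for the caller.

```typescript
// Hypothetical request descriptor filled in by the calling application.
type TaskCategory =
  | "migration"
  | "build-tooling"
  | "shared-library-contract"
  | "auth-security"
  | "structured-regression"
  | "other";

interface RoutingInput {
  estimatedFiles: number;
  spansModules: boolean;
  needsPlanApproval: boolean;
  needsAuditTrail: boolean;
  category: TaskCategory;
}

type RouteChoice = "direct-codex" | "plan-mode";

function chooseRoute(input: RoutingInput): RouteChoice {
  // Hard-route categories always go through plan mode.
  if (input.category !== "other") {
    return "plan-mode";
  }
  // Cross-module work, approval gates, and audit needs also require plan mode.
  if (input.spansModules || input.needsPlanApproval || input.needsAuditTrail) {
    return "plan-mode";
  }
  // Small requests (1 to 2 files) start with direct Codex.
  return input.estimatedFiles <= 2 ? "direct-codex" : "plan-mode";
}
```

A "direct-codex" result would lead to runSingleCodexTask(), and "plan-mode" to runCodingTask() or runCodingTaskStreamed().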
Public API
| API | Purpose |
| --- | --- |
| runCodingTask(message, repo, branch, options?) | Run a real Codex-backed orchestration and return the final result. |
| runCodingTaskStreamed(message, repo, branch, options?) | Run a real Codex-backed orchestration and expose events while it runs. |
| test(message, repo, branch, options?) | Alias for runCodingTask(). |
| runSingleCodexTask(message, repo, branch) | Run one direct Codex thread as a baseline. |
| AgentOrchestrator | Low-level orchestrator class for custom model runners and workspace policies. |
| MockModelRunner | Deterministic local model runner for tests and demos. |
| CodexModelRunnerAdapter | Real adapter backed by @openai/codex-sdk. |
| WorkspaceManager | Creates task and review workspaces. |
| MergeBroker | Validates and applies worker patches. |
| ReviewAggregator | Aggregates reviewer output into a final status. |
| createCodexOptions | Builds role-aware sandbox and tool-permission options. |
The package also exports core types such as TaskContract, TaskDAG,
ProjectSpace, TaskScope, WorkerResult, ReviewResult,
OrchestrationEvent, ThreadRunTrace, ModelUsageSummary, and
OrchestrationResult.
Options
import { runCodingTask } from "@steve-life/code-agent-sdk";
const result = await runCodingTask(
  "Refactor the settings page into smaller components.",
  "/path/to/target-repo",
  "main",
  {
    projectId: "settings-refactor",
    modelConfig: {
      plannerModel: "gpt-5.5",
      componentWorkerModel: "gpt-5.3-codex-spark",
      layoutWorkerModel: "gpt-5.4-mini",
      screenWorkerModel: "gpt-5.5",
      reviewerModel: "gpt-5.5",
    },
    orchestrator: {
      maxSparkWorkers: 4,
      maxMiniWorkers: 2,
      maxGpt55Workers: 1,
      fullVerificationCommands: ["npm test"],
    },
  },
);
RunCodingTaskOptions fields:
| Field | Description |
| --- | --- |
| projectId | Optional stable identifier for the target project. Defaults to the repo directory name. |
| modelConfig | Model names used by CodexModelRunnerAdapter. |
| orchestrator | Partial AgentOrchestratorOptions for concurrency, verification, custom model runners, and workspace policies. |
runCodingTask() defaults to real Codex-backed execution. If you want a fully
deterministic local run, instantiate AgentOrchestrator with MockModelRunner
or use the local demo.
Orchestration Model
The SDK separates responsibility by role.
| Role | Default tier | Responsibility |
| --- | --- | --- |
| planner | GPT-5.5 xhigh | Understand the request and produce a TaskDAG with contracts, ownership, dependencies, models, and review plan. |
| component-worker | Spark | Implement low-risk pure functions, pure components, validators, formatters, and mappers. |
| layout-worker | mini | Compose layout-level pieces such as cards, grids, drawers, dialogs, loading states, empty states, and error states. |
| screen-worker | GPT-5.5 high/medium | Implement screen logic, state coordination, routing, permissions, data loading, and cross-cutting integration. |
| verifier | program | Run lint, typecheck, tests, builds, and smoke checks. |
| reviewer | GPT-5.5 high/xhigh | Review code and reports without directly editing source. |
| merge-broker | program | Validate, merge, and verify worker patches. |
The planner decides the concrete task.model for each task. Model usage is
then collected from streamed thread events and summarized by model in the final
OrchestrationResult.
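The per-model summary can be pictured as a fold over model.usage events. The sketch below assumes a simplified usage shape: only byModel, totals, and outputTokens are named in this README, while inputTokens, turns, UsageSample, UsageBucket, and summarizeUsage are hypothetical and may not match the SDK's ModelUsageSummary type.

```typescript
// Simplified stand-ins for illustration; not the SDK's real types.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

interface UsageSample {
  model: string;
  usage: TurnUsage;
}

interface UsageBucket extends TurnUsage {
  turns: number;
}

interface UsageSummary {
  byModel: Record<string, UsageBucket>;
  totals: UsageBucket;
}

function summarizeUsage(samples: UsageSample[]): UsageSummary {
  const summary: UsageSummary = {
    byModel: {},
    totals: { inputTokens: 0, outputTokens: 0, turns: 0 },
  };
  for (const sample of samples) {
    // Create the per-model bucket on first sight of a model name.
    const bucket =
      summary.byModel[sample.model] ??
      { inputTokens: 0, outputTokens: 0, turns: 0 };
    bucket.inputTokens += sample.usage.inputTokens;
    bucket.outputTokens += sample.usage.outputTokens;
    bucket.turns += 1; // one sample is treated as one completed turn
    summary.byModel[sample.model] = bucket;

    summary.totals.inputTokens += sample.usage.inputTokens;
    summary.totals.outputTokens += sample.usage.outputTokens;
    summary.totals.turns += 1;
  }
  return summary;
}
```

A caller that only consumes the stream could collect the model and usage fields of each model.usage event into an array and pass it to a function like this for live cost reporting.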
TaskDAG And Task Contracts
A planner returns a TaskDAG.
interface TaskDAG {
  dagId: string;
  tasks: TaskContract[];
  edges: TaskDAGEdge[];
}
Each TaskContract defines the worker role, model, file scope, dependencies,
acceptance criteria, validation tools, verification commands, and risk level for
one unit of work.
interface TaskContract {
  taskId: string;
  title: string;
  role: AgentRole;
  model: string;
  modelTier: ModelTier;
  reasoningEffort: ReasoningEffort;
  objective: string;
  readPaths: string[];
  writePaths: string[];
  forbiddenPaths: string[];
  dependencies: string[];
  acceptanceCriteria: string[];
  validationTools: string[];
  verificationCommands: string[];
  riskLevel: "low" | "medium" | "high" | "critical";
  expectedOutputs: string[];
  notes: string[];
}
The orchestrator validates the DAG, checks task scope safety, runs ready tasks in dependency order, and avoids running overlapping write scopes in the same parallel batch.
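One way to picture "dependency order without overlapping write scopes" is a greedy batching loop. The sketch below is an illustration under assumptions, not the orchestrator's actual scheduler: SchedTask, planBatches, and the overlap rule (two paths overlap when they are equal or one is a parent directory of the other) are hypothetical, and the real SDK may match paths differently.

```typescript
// Minimal stand-in for the scheduling-relevant fields of a TaskContract.
interface SchedTask {
  taskId: string;
  dependencies: string[];
  writePaths: string[];
}

// Assumed overlap rule: same path, or one path is a parent directory of the other.
function pathsOverlap(a: string, b: string): boolean {
  const x = a.replace(/\/+$/, "");
  const y = b.replace(/\/+$/, "");
  const under = (child: string, parent: string) =>
    child.slice(0, parent.length + 1) === parent + "/";
  return x === y || under(x, y) || under(y, x);
}

function scopesOverlap(a: SchedTask, b: SchedTask): boolean {
  return a.writePaths.some((p) => b.writePaths.some((q) => pathsOverlap(p, q)));
}

// Returns task ids grouped into batches that could run in parallel.
function planBatches(tasks: SchedTask[]): string[][] {
  const done: Record<string, boolean> = {};
  let pending = tasks.slice();
  const batches: string[][] = [];

  while (pending.length > 0) {
    // A task is ready once every dependency has finished in an earlier batch.
    const ready = pending.filter((t) =>
      t.dependencies.every((d) => done[d] === true),
    );
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in task graph");
    }
    // Greedily add ready tasks whose write scope is disjoint from the batch.
    const batch: SchedTask[] = [];
    for (const task of ready) {
      if (!batch.some((b) => scopesOverlap(b, task))) {
        batch.push(task);
      }
    }
    for (const task of batch) {
      done[task.taskId] = true;
    }
    pending = pending.filter((t) => done[t.taskId] !== true);
    batches.push(batch.map((t) => t.taskId));
  }
  return batches;
}
```

A ready task that conflicts with the current batch is deferred to a later batch rather than rejected, so two tasks that write the same file run one after the other.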
Result Shape
OrchestrationResult is the durable output to store in a database, job log, PR
comment, or build artifact.
| Field | Description |
| --- | --- |
| status | Final status: pass, needs_changes, reject, failed, or cancelled. |
| dag | The planner-produced TaskDAG. |
| taskResults | Worker and verifier results. |
| mergeResults | Patch validation and merge results. |
| verificationResults | Command-level verification output. |
| reviewResults | Structured reviewer reports. |
| trace | Replayable thread traces grouped by planner, worker, and reviewer run. |
| modelUsage | Token and turn counts grouped by model plus totals. |
| summary | Human-readable summary of the run. |
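For CI use, the five status values listed above can be mapped to a process exit code with an exhaustive switch. The exit code scheme here is an arbitrary choice for this example, and OrchestrationStatus and exitCodeFor are hypothetical names rather than SDK exports.

```typescript
// The five status values come from the table above; the type name is assumed.
type OrchestrationStatus =
  | "pass"
  | "needs_changes"
  | "reject"
  | "failed"
  | "cancelled";

// Example mapping only: 0 = success, 1 = review did not pass,
// 2 = run failed, 3 = run was cancelled.
function exitCodeFor(status: OrchestrationStatus): number {
  switch (status) {
    case "pass":
      return 0;
    case "needs_changes":
    case "reject":
      return 1;
    case "failed":
      return 2;
    case "cancelled":
      return 3;
    default: {
      // Compile-time exhaustiveness check: a new status value breaks the build here.
      const unreachable: never = status;
      throw new Error("Unknown status: " + String(unreachable));
    }
  }
}
```

A CI wrapper would call something like exitCodeFor(result.status) after storing the result, so that a needs_changes or reject review fails the job while still leaving the trace available as an artifact.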
modelUsage has both byModel and totals.
console.log(result.modelUsage.byModel["gpt-5.5"]);
console.log(result.modelUsage.totals.outputTokens);
Stream Event Reference
The streamed API emits these event families:
| Event | Description |
| --- | --- |
| run.started | A new orchestration run started. |
| planner.started | Planner thread is about to run. |
| planner.completed | Planner produced a TaskDAG. |
| planner.failed | Planner failed before a valid DAG was produced. |
| plan.review.required | Plan review is waiting for caller approval, revision, or cancellation. |
| plan.review.approved | Caller approved the current plan revision. |
| plan.review.revision_requested | Caller requested a revised plan with feedback. |
| plan.review.cancelled | Caller cancelled before implementation started. |
| task.started | A worker, verifier, or reviewer task started. |
| task.completed | A task completed successfully. |
| task.failed | A task failed. |
| task.validation.completed | Pre-merge validation finished for a worker patch. |
| merge.completed | Merge broker completed patch validation and apply for a task. |
| verification.completed | Verification command group completed. |
| review.completed | Reviewer produced a structured report. |
| thread.event | Raw streamed Codex SDK event for a planner, worker, or reviewer thread. |
| model.usage | Usage extracted from a completed turn. |
| run.completed | Run completed with a non-failed result. |
| run.failed | Run completed in failed state. |
When building a UI, render thread.event for detailed timeline visibility and
use higher-level events for task status, DAG visualization, and summary panels.
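A UI can keep its task panel in sync by folding events into a small state object. The sketch below is a minimal reducer under assumptions: BoardEvent models only four event types, the payload shape of task.completed and task.failed (a task object with taskId) is assumed to mirror task.started, and BoardState and reduceBoard are hypothetical names.

```typescript
type TaskUiStatus = "running" | "completed" | "failed";

// Simplified event union; the real OrchestrationEvent type has more members.
type BoardEvent =
  | { type: "task.started"; task: { taskId: string }; workerId: string }
  | { type: "task.completed"; task: { taskId: string } }
  | { type: "task.failed"; task: { taskId: string } }
  | { type: "thread.event"; threadRunId: string };

interface BoardState {
  tasks: Record<string, TaskUiStatus>;
  timelineLength: number; // number of raw thread events seen so far
}

const emptyBoard: BoardState = { tasks: {}, timelineLength: 0 };

// Pure reducer: returns a new state and never mutates the previous one.
function reduceBoard(state: BoardState, ev: BoardEvent): BoardState {
  switch (ev.type) {
    case "task.started":
      return { ...state, tasks: { ...state.tasks, [ev.task.taskId]: "running" } };
    case "task.completed":
      return { ...state, tasks: { ...state.tasks, [ev.task.taskId]: "completed" } };
    case "task.failed":
      return { ...state, tasks: { ...state.tasks, [ev.task.taskId]: "failed" } };
    case "thread.event":
      return { ...state, timelineLength: state.timelineLength + 1 };
    default:
      return state;
  }
}
```

Inside the for await loop, the caller would apply the reducer to each incoming event and re-render from the resulting state, keeping the detailed thread timeline separate from the task status panel.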
Workspace And Merge Flow
Implementation workers do not edit the main project root directly.
The default runCodingTask() path uses:
- WorkspaceManager({ strategy: "git-worktree", keepWorkspaces: true })
- MergeBroker
- CodexModelRunnerAdapter
- executeVerificationCommands: true
The flow is:
- create an isolated task workspace
- run the worker in that workspace
- generate a patch from the worker workspace
- run task-level pre-merge validation in a temporary validation workspace
- validate changed files against TaskContract.writePaths and apply the patch only after all gates pass
- run explicit verifier tasks or configured full verification
- run reviewers
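The write-scope gate in the flow above can be illustrated with a small check over the files a patch touches. This is a sketch under assumptions, not MergeBroker's real logic: ScopeReport, entryCovers, and checkWriteScope are hypothetical, scope entries are treated as exact files or parent directories (the SDK may use globs or another matching rule), and forbiddenPaths is assumed to take precedence over writePaths.

```typescript
interface ScopeReport {
  accepted: string[];
  rejected: string[];
}

// Assumption: an entry covers a file when it is that file or a parent directory.
function entryCovers(entry: string, file: string): boolean {
  const dir = entry.replace(/\/+$/, "");
  return file === dir || file.slice(0, dir.length + 1) === dir + "/";
}

function checkWriteScope(
  changedFiles: string[],
  writePaths: string[],
  forbiddenPaths: string[],
): ScopeReport {
  const report: ScopeReport = { accepted: [], rejected: [] };
  for (const file of changedFiles) {
    const allowed = writePaths.some((entry) => entryCovers(entry, file));
    const forbidden = forbiddenPaths.some((entry) => entryCovers(entry, file));
    // A file must be inside the write scope and outside every forbidden path.
    if (allowed && !forbidden) {
      report.accepted.push(file);
    } else {
      report.rejected.push(file);
    }
  }
  return report;
}
```

Under this model a patch would be applied only when rejected is empty, which matches the rule that the patch is applied only after all gates pass.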
Generated workspaces live under .agent-orchestrator/. Keep this path ignored
in target repositories unless you intentionally want to inspect saved
workspaces.
Permission Model
Permissions are derived from:
AgentRole + ProjectSpace + TaskScope
There is no projectType switch. File access is path-scoped and every path is
checked against ProjectSpace.root.
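A root containment check of this kind can be sketched with plain string handling. The example below is a simplified illustration, not the SDK's implementation: toSegments and resolveInsideRoot are hypothetical helpers, they assume POSIX-style paths and an absolute root, and they do not resolve symlinks, which a production check would need to handle (for example with Node's path and fs modules).

```typescript
// Split a POSIX path into normalized segments, resolving "." and "..".
function toSegments(p: string): string[] {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop(); // popping an empty array is a no-op, like "/.." resolving to "/"
    } else {
      out.push(seg);
    }
  }
  return out;
}

// Returns the normalized absolute path when it stays inside root, else null.
function resolveInsideRoot(root: string, candidate: string): string | null {
  const rootSegs = toSegments(root);
  const isAbsolute = candidate.charAt(0) === "/";
  const fullSegs = toSegments(isAbsolute ? candidate : root + "/" + candidate);
  // Compare whole segments so "/repo-other" is not mistaken for "/repo".
  const inside = rootSegs.every((seg, i) => fullSegs[i] === seg);
  return inside ? "/" + fullSegs.join("/") : null;
}
```

Comparing segment by segment rather than using a raw string prefix avoids accepting sibling directories whose names merely start with the root's name.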
Network permissions are modeled separately:
- shellNetwork: command-level network access such as curl, npm install, git clone, or wget
- webSearch: controlled web search or documentation lookup
- mcpRead: read-only MCP access such as GitHub, Slack, Jira, or docs
- mcpWrite: side-effecting MCP access such as PR creation, issue updates, or messages
Only shellNetwork maps to Codex sandbox network access. Tool permissions such
as webSearch, mcpRead, and mcpWrite stay at the orchestration layer.
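The separation between sandbox network access and orchestration-level tool permissions can be sketched as a split of one permission object into two. The types and the splitPermissions function below are hypothetical and only model the rule stated above; networkAccessEnabled is the option name mentioned in this README, but the full shape of the options that createCodexOptions builds is not shown here and is not assumed.

```typescript
// Hypothetical permission shape using the four names from the list above.
interface ToolPermissionSet {
  shellNetwork: boolean;
  webSearch: boolean;
  mcpRead: boolean;
  mcpWrite: boolean;
}

// Only this part would reach the Codex sandbox in this model.
interface SandboxSettings {
  networkAccessEnabled: boolean;
}

// These stay with the orchestrator and are never passed to the sandbox.
interface OrchestrationToolPolicy {
  webSearch: boolean;
  mcpRead: boolean;
  mcpWrite: boolean;
}

function splitPermissions(p: ToolPermissionSet): {
  sandbox: SandboxSettings;
  tools: OrchestrationToolPolicy;
} {
  return {
    // shellNetwork is the only flag that maps to sandbox network access.
    sandbox: { networkAccessEnabled: p.shellNetwork },
    tools: { webSearch: p.webSearch, mcpRead: p.mcpRead, mcpWrite: p.mcpWrite },
  };
}
```

Under this model, granting webSearch or mcpRead to a role does not open network access for shell commands in the sandbox.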
Mock Runner
The mock runner is useful for tests, demos, and local integration work when you do not want to call real models.
import {
  AgentOrchestrator,
  MockModelRunner,
  type ProjectSpace,
} from "@steve-life/code-agent-sdk";
const project: ProjectSpace = {
  projectId: "demo",
  root: process.cwd(),
};
const orchestrator = new AgentOrchestrator({
  modelRunner: new MockModelRunner(),
  executeVerificationCommands: false,
});
const result = await orchestrator.run(
  "Implement a task-card UI workflow.",
  project,
);
The mock planner returns a fixed task-card DAG, mock workers return deterministic patch metadata, and mock reviewers return structured pass reports.
Local Demo
npm run agent:demo
The demo:
- creates a ProjectSpace rooted at the current project
- asks the mock planner for a task-card TaskDAG
- runs Spark component tasks in parallel
- generates mock patches
- validates paths through MergeBroker
- runs mock verification
- aggregates contract, integration, architecture, and security review reports
Real Model Integration
The adapter boundary is ModelRunner.
export interface ModelRunner {
  runPlanner(requirement: string, context?: ModelRunnerPlannerContext): Promise<TaskDAG>;
  runWorker(input: ModelRunnerWorkerInput): Promise<ModelRunnerWorkerOutput>;
  runReviewer(input: ModelRunnerReviewerInput): Promise<ReviewResult>;
}
Use CodexModelRunnerAdapter for real Codex SDK execution. The adapter starts
Codex threads for the planner, workers, and reviewers, consumes
thread.runStreamed(), forwards raw thread events into the orchestration
stream, and returns model usage from completed turns.
Custom adapters can implement the same interface for other model providers, internal routing layers, or offline test harnesses.
Development
npm install
npm test
npm run agent:demo
Before publishing:
npm run build
npm pack --dry-run
Automated npm publishing
Pushes to main run the GitHub Actions workflow in
.github/workflows/publish-npm.yml.
The workflow:
- installs dependencies with pnpm install --frozen-lockfile
- runs pnpm test
- checks whether the current package.json version already exists on npm
- publishes with npm publish --access public --provenance only when that version is not published yet
Configure the repository secret NPM_TOKEN with an npm automation token before
the first automated release. To release a new package version, update
package.json's version, then push or merge the change to main.
Current Mocked Or Experimental Parts
- MockModelRunner returns a fixed task-card TaskDAG.
- Mock workers can return JSON patch files instead of real model edits.
- Workspace management supports mock copy mode and a git-worktree strategy boundary.
- Verification commands can be mocked unless executeVerificationCommands is enabled.
- Review workers return structured mock review reports unless backed by a real ModelRunner.
- The orchestration API is still experimental and may change before a stable release.
Operational Notes
- Run against a clean target repository when possible.
- runCodingTask() may check out the requested branch in the target repo.
- Real workers run with networkAccessEnabled: false by default.
- Planner and reviewer behavior depends on the configured model and adapter.
- Store result.trace and result.modelUsage if you need auditability, replay, or cost reporting.
- Prefer the streamed API for user-facing products so callers can see planning, execution, verification, review, and raw thread progress.
License
MIT
