@worklab-ai/agent-runtime
v0.1.9
Agent runtime supporting Claude SDK, Claude CLI, Codex CLI, and PI SDK out of the box
Generic agent runtime that supports four backends out of the box:
- Claude SDK (`@anthropic-ai/claude-agent-sdk`)
- Claude Code CLI (the `claude` binary)
- Pi SDK (`@earendil-works/pi-agent-core`, used for OpenAI / Codex / Gemini / OpenRouter / Ollama / etc. via Pi providers)
- Codex CLI (the `codex` app-server)
Hosts wire in their own pricing, persistence, credential, and compaction-recording callbacks. The runtime returns raw text + raw structured output; hosts that want a domain-specific contract (e.g. worklab_result) parse it on their end.
See ARCHITECTURE.md for the package boundary, runtime selection flow, lifecycle diagrams, and host responsibilities.
Install
npm install @worklab-ai/agent-runtime
Peer requirements:
- Node.js ≥ 20
- `claude` CLI on PATH (only for executionMode: "cli" with the `claude` SDK)
- `codex` CLI on PATH (only for executionMode: "cli" with the `codex` SDK; override via the `codexAppServerCommand` option)
- `ripgrep` on PATH (or supplied via `ripgrepPath`), required for the `Glob` and `Grep` built-in tools
Quick start
import { createRuntime } from "@worklab-ai/agent-runtime";
const runtime = createRuntime({
// Host integration (all optional)
workspace: "/path/to/repo",
ripgrepPath: "/usr/bin/rg",
});
const result = await runtime.run("You are a helpful assistant.", {
model: { sdk: "claude", model: "claude-sonnet-4-6" },
executionMode: "sdk",
messages: [{ role: "user", content: "Read README.md and summarize it." }],
cwd: "/path/to/repo",
allowedTools: ["Read", "Bash"],
maxTurns: 10,
onEvent: (event) => console.log(event.type),
});
console.log(result.text);
When to reach for this vs. other JS agent runtimes
@worklab-ai/agent-runtime is purpose-built for autonomous, long-running agent work with provider portability and operational resilience as first-class concerns. It is not a streaming-chat UI kit. Where each peer fits:
- Vercel AI SDK: best when you're building a chat / generative-UI experience inside a React or Next.js app. `useChat`, `useCompletion`, streaming server components, and edge-runtime compatibility are its strengths. Its provider list is curated (Anthropic, OpenAI, Google, etc., via `@ai-sdk/*` packages); there's no Pi gateway, no Claude Code CLI, no Codex CLI app-server, and no per-call provider fallback. If you're rendering a streaming chat into a browser, use it. If you're orchestrating multi-turn autonomous work that must survive a rate-limited primary provider, use us.
- Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`): first-party Anthropic SDK. Tight integration with Claude features (canUseTool, sub-agents, hooks, MCP). We wrap it as one of our four backends and add context compaction, transcript-resume across provider drops, a 22-kind failure taxonomy, a tool-bloat guard with artifact persistence, and a provider fallback router. Reach for the bare Anthropic SDK when you only ever talk to Claude and don't need cross-provider portability or resume.
- Mastra: a workflow engine + memory + RAG stack. Different category: it's the layer above a runtime. You can layer Mastra workflows on top of `@worklab-ai/agent-runtime` if you want both.
- OpenAI Agents SDK: first-party OpenAI SDK. Same trade-off as the Claude Agent SDK: tight integration with OpenAI, no other providers. Pi providers in our runtime cover OpenAI plus a dozen others through a single API.
- LangChain.js: kitchen sink with deep abstraction stacks. We're deliberately lean; if you want chains, agents, vector stores, and parsers under one umbrella, LangChain is built for that. If you want a focused runtime kernel, use us.
What we natively bridge (no extra packages):
- Anthropic Claude via the Claude Agent SDK (`claude` SDK).
- Anthropic Claude via the `claude` Code CLI binary.
- OpenAI's Codex via the `codex` app-server CLI.
- OpenAI, Google Gemini, AWS Bedrock, OpenRouter, xAI, Groq, Mistral, Perplexity, DeepSeek, Ollama, LlamaCPP, GLM, Vercel AI Gateway, GitHub Copilot, Gemini CLI, all through the Pi (`@earendil-works/pi-ai`) provider gateway, which our SDK adapter speaks directly.
At-a-glance:
| Need | Use this | Use Vercel AI SDK | Use Claude Agent SDK |
|---|---|---|---|
| Streaming chat UI in React/Next | ✗ | ✓ | ✗ |
| Multi-provider portability | ✓ (4 backends, 15+ providers) | partial | ✗ |
| CLI providers (claude/codex binaries) | ✓ | ✗ | ✗ |
| Provider fallback on rate limit / overload | ✓ (createRouterRuntime) | ✗ | ✗ |
| Aggressive context compaction with summarization | ✓ | ✗ | partial |
| Transcript-tail resume after provider drops | ✓ | ✗ | ✗ |
| Tool-output bloat guard + artifact persistence | ✓ | ✗ | ✗ |
| MCP transports out of the box (stdio/SSE/HTTP) | ✓ | partial | ✓ |
| HITL approval gates with risk tiers | ✓ | ✗ | partial (canUseTool) |
| Multi-subscriber observer with cost/cache metrics | ✓ | partial | partial |
| Edge-runtime compatibility | ✗ | ✓ | partial |
Honest summary: if the agent runs without a human watching the screen for minutes-to-hours and must survive provider blips, this is the right tool. If a human is watching a streaming chat, Vercel's SDK is the right tool. Both can coexist in the same app.
Picking a backend
The runtime picks a backend from options.model + options.executionMode:
| model.sdk | executionMode | Backend |
|---|---|---|
| "claude" | "sdk" (or omitted) | Claude SDK |
| "claude" | "cli" | claude CLI |
| "pi" | any | Pi SDK |
| "codex" | "cli" | Codex app-server CLI |
A model reference can be the parsed shape { sdk, model, provider? } or a string ("pi:openai:gpt-5.5", "claude:claude-sonnet-4-6", etc.) that you parse with the package's parseRuntimeModelReference helper.
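To make the two reference shapes and the selection table concrete, here is a minimal sketch. Both functions are hypothetical stand-ins written for this illustration: the package's own parseRuntimeModelReference may accept more forms or behave differently, and the string grammar is inferred only from the two examples above.

```typescript
// Hypothetical stand-ins for illustration only; not the package's code.
type Sdk = "claude" | "pi" | "codex";
type ExecutionMode = "sdk" | "cli";

interface ModelRef {
  sdk: Sdk;
  model: string;
  provider?: string;
}

// Parse "claude:<model>" or "pi:<provider>:<model>" into the object shape.
function parseModelRefSketch(ref: string): ModelRef {
  const [sdk, ...rest] = ref.split(":");
  if (sdk !== "claude" && sdk !== "pi" && sdk !== "codex") {
    throw new Error(`Unknown sdk in model reference: ${ref}`);
  }
  if (sdk === "pi") {
    const [provider, ...modelParts] = rest;
    // Re-join so model ids that themselves contain ":" survive.
    const model = modelParts.join(":");
    if (!provider || !model) {
      throw new Error(`Expected pi:<provider>:<model>, got: ${ref}`);
    }
    return { sdk, provider, model };
  }
  const model = rest.join(":");
  if (!model) throw new Error(`Missing model in reference: ${ref}`);
  return { sdk, model };
}

// Restates the selection table: model.sdk + executionMode -> backend label.
function pickBackendSketch(ref: ModelRef, executionMode: ExecutionMode = "sdk"): string {
  if (ref.sdk === "pi") return "Pi SDK"; // any executionMode
  if (ref.sdk === "claude") return executionMode === "cli" ? "claude CLI" : "Claude SDK";
  if (ref.sdk === "codex" && executionMode === "cli") return "Codex app-server CLI";
  throw new Error(`No backend listed for sdk=${ref.sdk}, executionMode=${executionMode}`);
}
```

The sketch throws for combinations the table does not list (for example "codex" with "sdk"); how the real runtime treats those is not stated here.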
createRuntime(host)
Pass host-level integration once at boot. All keys are optional.
createRuntime({
// -- host callbacks --
resolveCustomPricing, // (parsed) => NormalizedPricing | null
resolvePiApiKey, // async (provider) => string | undefined
persistArtifact, // ({ filename, buffer, toolName, toolUseId }) => path | null
onCompactionRecorded, // (compactionRow) => void
// -- tool runtime context (process-level config for the tool kernel) --
workspace, // primary allowed root for path-based tools
repoRoot, // secondary allowed root
ripgrepPath, // explicit path to `rg`; falls back to vendored binary, then PATH
qaOutputDir, // fallback dir for Playwright MCP filename routing
// -- observers (multi-subscriber telemetry) --
// Optional. Each observer receives every event the runtime emits.
// Built-in createMetricsObserver() aggregates cost, cache hit rate, token
// counts, tool-call counts, errors, and turn-latency percentiles.
observers: [],
// -- approval gates (HITL) --
// Optional. When set, the runtime asks the host before every tool call
// whose risk tier is "medium" or "high" (and not session-allowlisted).
// See the "Approval gates" section below for the request/response shape.
onToolApprovalRequest, // async (req) => { decision, reason? }
toolRiskTiers: { Bash: "high" }, // per-tool tier override (low|medium|high)
approvalDefaultRiskTier: "medium",
approvalTimeoutMs: 60_000, // timeout → auto-deny
approvalAlwaysAllowTools: [], // start with these in session allowlist
// -- host-customisable identity strings (all optional, defaults shown) --
runtimeBrand: {
schemaPrefix: "worklab", // prefix for snapshot/result schema ids
mcpClientName: "worklab", // MCP client name reported to MCP servers
mcpClientVersion: "0.1.0", // MCP client version
tempdirPrefix: "worklab-cli-", // mkdtemp prefix for CLI provider scratch dirs
providerModelPrefix: "worklab",// id prefix for custom Pi providers
doctorCommand: "worklab doctor", // command suggested in tool error messages
serviceName: "worklab", // Codex app-server serviceName
clientInfoName: "worklab", // Codex app-server clientInfo.name
clientInfoTitle: "Worklab", // Codex app-server clientInfo.title
},
});
runtimeBrand lets an external host reskin the package without forking string-by-string. Defaults preserve worklab strings, so worklab itself doesn't need to set anything.
Returns:
- `run(systemPrompt, options)`: async, runs one agent turn against the chosen backend.
- `configureTools(next)`: update the tool runtime context after construction.
runtime.run(systemPrompt, options)
Per-call options (a non-exhaustive selection):
| Option | Type | Notes |
|---|---|---|
| model | object \| string | Required. See "Picking a backend". |
| executionMode | "sdk" \| "cli" | Default "sdk". |
| messages | Message[] | Conversation history. |
| cwd | string | Working directory for the agent's tools. |
| allowedTools | string[] | Built-in tool allowlist. Default: all. |
| disallowedTools | string[] | Block list. |
| mcpServers | Record<string, McpServerConfig> | Configured MCP servers (stdio / sse / http). |
| maxTurns | number | Hard cap on agent turns. |
| outputSchema | JSONSchema | If set, the agent is asked to produce structured JSON matching this schema. The result lands in result.structuredResult. |
| abortSignal | AbortSignal | Cancel the run. |
| liveInput | LiveInputQueue | Stream of in-flight user messages (for human-in-the-loop steering). |
| onEvent | (event) => void | Fired for every event the provider emits (assistant text, tool calls/results, runtime warnings, structured output). |
| runId | string | Tag this run for downstream callbacks (e.g. onCompactionRecorded). |
| providerSessionId | string | Resume a prior provider session. |
| runArtifactDir | string | Used by some providers as the Playwright MCP filename target. |
| piCodexTransport | string | Forwarded to Pi when running OpenAI Codex models. |
| codexAppServerCommand | string | Override the Codex CLI binary. |
| codexAppServerArgs | string[] | Override the Codex CLI arguments. |
Returns:
{
text: string, // raw assistant text
structuredResult?: any, // JSON returned via outputSchema (if any)
structuredResultSource?: string, // where structuredResult came from
events: RuntimeEvent[], // full event stream (for host-side parsing)
usage: {
input_tokens, output_tokens,
cache_read_tokens, cache_creation_tokens,
cost_usd,
},
durationMs: number,
numTurns: number,
model: string,
effort: string,
sdk: "claude" | "pi" | "codex",
cancelled: boolean,
error: string | null,
errorDetails: object | null,
failureKind: string | null,
providerSessionId: string | null,
runtimeWarnings: RuntimeWarning[],
diagnostics: object,
capabilitiesUsed: { // what the backend actually did this call
prompt_cache_active: true|false|null,
thinking_enabled: true|false|null,
structured_output_enforced: boolean,
subagent_invoked: true|false|null,
mcp_servers_used: string[],
native_subagents_used: string[],
tool_compaction_applied: boolean,
context_compaction_applied: true|false|null,
},
}
capabilitiesUsed is the per-call complement to runtimeCapabilities(). Tristate fields use null to mean "this provider can't tell", which is distinct from false ("definitely off"). It's also emitted as a capabilities_resolved event near the end of the run, so observers can capture it without inspecting the result object.
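Because null and false mean different things in the tristate fields, host code that reports on a run should keep them apart. The following is a minimal sketch of one way to do that. `RunResultSlice`, `describeTristate`, and `summarizeRun` are hypothetical names, and the rule that `cancelled` is checked before `error` is an assumption of this example, not documented behavior.

```typescript
// Only the result fields this sketch reads; the full result has more.
type Tristate = boolean | null;

interface RunResultSlice {
  cancelled: boolean;
  error: string | null;
  failureKind: string | null;
  capabilitiesUsed: {
    prompt_cache_active: Tristate;
    thinking_enabled: Tristate;
    structured_output_enforced: boolean;
  };
}

// null means "this provider can't tell", which is not the same as false.
function describeTristate(value: Tristate): "on" | "off" | "unknown" {
  if (value === null) return "unknown";
  return value ? "on" : "off";
}

// One-line summary of a run for logs or dashboards.
function summarizeRun(result: RunResultSlice): string {
  if (result.cancelled) return "cancelled";
  if (result.error !== null) {
    return `failed (${result.failureKind ?? "unclassified"}): ${result.error}`;
  }
  const caps = result.capabilitiesUsed;
  return [
    "ok",
    `cache=${describeTristate(caps.prompt_cache_active)}`,
    `thinking=${describeTristate(caps.thinking_enabled)}`,
    `structured=${caps.structured_output_enforced ? "on" : "off"}`,
  ].join(" ");
}
```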
Built-in tools
The agent kernel ships with: Read, Write, Edit, Glob, Grep, Bash, WebFetch, WebSearch. Select among them via allowedTools. Tool implementations honor:
- `cwd` (required for path-based tools)
- The runtime context's `workspace` / `repoRoot` allow-list (paths outside both, plus /tmp and process.cwd(), are rejected)
- Output truncation with optional artifact persistence (`{toolArtifactDir}/tool-output/{runId}/...` when `toolArtifactDir` is configured)
Override or extend the tool surface by passing mcpServers for MCP-backed tools.
Structured output
Pass options.outputSchema (a JSON Schema). On Claude SDK / Codex app-server / Pi SDK, the runtime wires the schema into the provider's structured-output API. The matched JSON lands in result.structuredResult.
The package does not validate structuredResult against your schema — it only forwards what the provider produced. Hosts run their own validation (Zod, AJV, etc.).
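Since validation is left to the host, a hand-written type guard is the smallest dependency-free option (Zod or AJV would do the same job with less code for larger schemas). The sketch below assumes a made-up contract, `{ summary: string, files: string[] }`, chosen only for illustration. `SummaryResult` and both function names are hypothetical.

```typescript
// Hypothetical contract for illustration: { summary: string, files: string[] }.
interface SummaryResult {
  summary: string;
  files: string[];
}

// Type guard that checks the untrusted value field by field.
function isSummaryResult(value: unknown): value is SummaryResult {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.summary === "string" &&
    Array.isArray(candidate.files) &&
    candidate.files.every((f) => typeof f === "string")
  );
}

// Returns the typed value, or throws so the host can decide how to recover.
function requireSummaryResult(structuredResult: unknown): SummaryResult {
  if (!isSummaryResult(structuredResult)) {
    throw new Error("structuredResult does not match the expected shape");
  }
  return structuredResult;
}
```

A host would pass result.structuredResult to `requireSummaryResult` and treat a throw as a failed run, or retry with a corrective message.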
Provider fallback router
createRouterRuntime({ host, chain }) wraps the standard runtime with an ordered chain of model references. On a retryable provider failure (rate limit, overload, network blip — classified via the same taxonomy as retryableProviderFailureInfo), it retries the same logical run against the next chain entry, replaying the transcript-tail snapshot of the previous attempt so the next provider continues rather than starts over.
import { createRouterRuntime } from "@worklab-ai/agent-runtime";
const router = createRouterRuntime({
host: { /* same shape as createRuntime */ },
chain: [
{ sdk: "claude", model: "claude-opus-4-7" },
{ sdk: "claude", model: "claude-sonnet-4-6" },
{ model: { sdk: "pi", provider: "openai", model: "gpt-5.5" }, requires: { structured_output: true } },
],
});
const result = await router.run("...", { /* same shape as runtime.run */ });
console.log(result.failoverHistory);
// [{ model, failureKind, requestId, retryableSubkind }, ...] one entry per attempt that didn't succeed.
Behaviour:
- Successful run on entry N → returns the result with `failoverHistory` set to attempts 0..N-1.
- Retryable failure → emits `provider_failover_started`, builds a transcript snapshot, and retries on the next entry.
- Non-retryable failure (auth, billing, invalid request) → returns immediately with `failoverHistory` containing the one attempt.
- Cancellation → returns immediately.
- Chain exhausted → `failureKind: "provider_unavailable_exhausted"`, `failoverHistory` lists every attempt.
Chain entries can require backend capabilities via requires: { structured_output: true, supports_mcp: true, ... }; entries that don't satisfy the requirements are skipped (logged in failoverHistory as failureKind: "skipped_capability_mismatch").
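The chain-walk rules can be restated as a small, synchronous model. This is a simplified sketch for reasoning about the outcomes, not the package's implementation: it omits transcript snapshots, events, and cancellation. The `offers` field and the `attempt` callback are hypothetical devices for the example.

```typescript
// Simplified model of the documented chain walk; NOT the package's code.
interface ChainEntry {
  model: string;
  requires: string[]; // capabilities this entry demands
  offers: string[];   // hypothetical: what the entry's backend supports
}

type AttemptOutcome =
  | { ok: true; text: string }
  | { ok: false; failureKind: string; retryable: boolean };

interface FailoverRecord {
  model: string;
  failureKind: string;
}

interface RouterOutcome {
  text: string | null;
  failureKind: string | null;
  failoverHistory: FailoverRecord[];
}

function walkChainSketch(
  chain: ChainEntry[],
  attempt: (entry: ChainEntry) => AttemptOutcome,
): RouterOutcome {
  const failoverHistory: FailoverRecord[] = [];
  for (const entry of chain) {
    // Entries whose requirements are not met are skipped, not attempted.
    if (!entry.requires.every((cap) => entry.offers.includes(cap))) {
      failoverHistory.push({ model: entry.model, failureKind: "skipped_capability_mismatch" });
      continue;
    }
    const outcome = attempt(entry);
    if (outcome.ok) {
      return { text: outcome.text, failureKind: null, failoverHistory };
    }
    failoverHistory.push({ model: entry.model, failureKind: outcome.failureKind });
    // Non-retryable failures (auth, billing, invalid request) stop the walk.
    if (!outcome.retryable) {
      return { text: null, failureKind: outcome.failureKind, failoverHistory };
    }
  }
  // Every entry was skipped or failed with a retryable error.
  return { text: null, failureKind: "provider_unavailable_exhausted", failoverHistory };
}
```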
Observers & metrics
The runtime emits structured events for everything that happens during a run — assistant messages, tool calls, runtime warnings, cache hits/misses, cost updates, provider request start/end, approval lifecycle. Hosts can subscribe via host.observers[] (any number) or the simpler options.onEvent callback (one subscriber). Both work simultaneously.
A built-in aggregator covers the common metrics:
import { createRuntime, createMetricsObserver } from "@worklab-ai/agent-runtime";
const metrics = createMetricsObserver();
const runtime = createRuntime({ observers: [metrics] });
await runtime.run("...", { model: { sdk: "claude", model: "claude-sonnet-4-6" } });
console.log(metrics.snapshot());
// {
// events: { total, byType: { tool_use: 5, assistant: 8, ... } },
// tokens: { input, output, cacheReadTokens, cacheCreationTokens },
// cost: { cumulativeUsd },
// cache: { hits, misses, hitRatio, readTokensFromEvents },
// tools: { callsByName: { Bash: 3, Read: 2 }, errorsByName: { ... } },
// errors: { total, byKind: { provider_unavailable: 1 } },
// turns: { count, latencyMsP50, latencyMsP95 },
// approvals: { pending, granted, denied },
// }
Custom observers implement { recordEvent(event), recordMetric(metric)?, flush()? }. Fan-out is synchronous on the hot path; observers that need to do I/O must buffer internally.
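A custom observer that needs I/O can keep recordEvent cheap by appending to an in-memory buffer and deferring the write to flush(). The sketch below shows that pattern under a few assumptions: events are only assumed to carry a `type` string, flush() is assumed to be allowed to return a promise, and `createBufferingObserver`, `sink`, and `counts` are hypothetical names.

```typescript
// Assumed event shape: only `type` is relied on here.
interface RuntimeEventLike {
  type: string;
  [key: string]: unknown;
}

// Observer that buffers synchronously and defers I/O to flush().
function createBufferingObserver(sink: (batch: RuntimeEventLike[]) => Promise<void>) {
  let buffer: RuntimeEventLike[] = [];
  const countsByType: Record<string, number> = {};
  return {
    // Called on the hot path: keep it cheap and never await here.
    recordEvent(event: RuntimeEventLike): void {
      buffer.push(event);
      countsByType[event.type] = (countsByType[event.type] ?? 0) + 1;
    },
    // Swap the buffer out first so events recorded during the write
    // land in the next batch, then hand the batch to the sink.
    async flush(): Promise<void> {
      if (buffer.length === 0) return;
      const batch = buffer;
      buffer = [];
      await sink(batch);
    },
    // Snapshot of per-type counts (extra helper, not part of the observer contract).
    counts(): Record<string, number> {
      return { ...countsByType };
    },
  };
}
```

The observer would be passed in the observers array of createRuntime; the sink could append to a file or post to a telemetry endpoint.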
Notable new events emitted by the bridges:
- `provider_request_started` / `_completed`: at the boundary of each LLM call (sdk, model, runtime, timestamp, durationMs).
- `cache_hit` / `cache_miss`: when the provider reports cached / cache-creation input tokens.
- `cost_accumulated`: running cost in USD with cumulative token breakdown.
Approval gates (human-in-the-loop)
Pass onToolApprovalRequest to gate tool calls behind a runtime approval. The runtime calls your callback once per tool invocation whose risk tier requires it, and pauses the agent until you respond.
const runtime = createRuntime({
toolRiskTiers: { Bash: "high", Read: "low" },
async onToolApprovalRequest(req) {
// req = { requestId, toolName, toolUseId, argumentsSummary, riskTier, model }
// argumentsSummary is already secret-redacted (API keys, Bearer tokens,
// and known JSON fields like "api_key" / "password" stripped).
if (req.toolName === "Bash" && req.argumentsSummary.includes("rm -rf")) {
return { decision: "deny", reason: "destructive" };
}
return { decision: "approve" };
},
});
Tiers (configurable per tool):
- low — auto-approved; the callback is not called.
- medium (default) — calls the host; if no callback is supplied, auto-approves.
- high — calls the host; if no callback is supplied, fails closed (deny).
Responses:
- `{ decision: "approve" }`: allow this call.
- `{ decision: "deny", reason? }`: block; the agent receives a tool error.
- `{ decision: "always" }`: allow, and session-allowlist the tool for the run.
Backend coverage: Claude SDK (via canUseTool) and Pi SDK (via tool dispatch wrapping). Claude CLI and Codex CLI bridge into their backend's own approval models (permissionMode / approvalPolicy) — per-call runtime gates aren't available there.
Approval lifecycle is observable via onEvent:
- `tool_approval_pending`: emitted before calling the host.
- `tool_approval_granted`: host approved.
- `tool_approval_denied`: host denied, timed out, threw, or no callback for a high-risk tool.
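The tier rules reduce to a small decision table, which can be handy when choosing per-tool tiers. The sketch below restates the documented rules only. `tierForSketch` and `gateActionSketch` are hypothetical helpers, not package exports, and the precedence of the session allowlist over the tier is an assumption drawn from the createRuntime comments.

```typescript
type RiskTier = "low" | "medium" | "high";
type GateAction = "auto_approve" | "ask_host" | "auto_deny";

// Per-tool override first, then the default tier ("medium" unless changed).
function tierForSketch(
  toolName: string,
  toolRiskTiers: Partial<Record<string, RiskTier>>,
  defaultTier: RiskTier = "medium",
): RiskTier {
  return toolRiskTiers[toolName] ?? defaultTier;
}

// low never reaches the host; medium and high do, and differ only in
// what happens when no callback was supplied.
function gateActionSketch(
  tier: RiskTier,
  hasCallback: boolean,
  sessionAllowlisted: boolean,
): GateAction {
  if (tier === "low" || sessionAllowlisted) return "auto_approve";
  if (hasCallback) return "ask_host";
  return tier === "high" ? "auto_deny" : "auto_approve";
}
```

With `toolRiskTiers: { Bash: "high" }` and no callback, this model denies Bash and auto-approves everything else, which matches the fail-closed rule for high-risk tools.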
Tool-result bloat handling
@worklab-ai/agent-runtime/agent/tool-bloat.js enforces a 256 KB default cap per tool_result. When a payload exceeds the cap, the kernel:
- Calls your `persistArtifact({ filename, buffer, toolName, toolUseId })` callback (if you supplied one).
- Substitutes a compact text reference in the agent's transcript.
- Emits a `runtime_warning` with `warning_kind: "tool_payload_truncated"` and the saved-paths array.
Hosts that don't supply persistArtifact get the truncation summary but no on-disk capture.
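A persistArtifact callback can be quite small. The sketch below builds one around an injected `writeFile` function so it stays testable without touching disk; in Node that function could be `fs.writeFileSync`. `makePersistArtifact`, the path layout, and the filename sanitizing are assumptions of this example, as is treating a null return as "nothing was saved".

```typescript
interface ArtifactInput {
  filename: string;
  buffer: Uint8Array; // Node's Buffer is a Uint8Array subclass
  toolName: string;
  toolUseId: string;
}

type WriteFile = (path: string, data: Uint8Array) => void;

// Builds a persistArtifact-shaped callback: (input) => path | null.
function makePersistArtifact(dir: string, writeFile: WriteFile) {
  // Replace anything outside a conservative character set, including "/",
  // so a hostile filename cannot escape `dir`.
  const safe = (s: string): string => s.replace(/[^A-Za-z0-9._-]/g, "_");
  return ({ filename, buffer, toolName, toolUseId }: ArtifactInput): string | null => {
    const path = `${dir}/${safe(toolName)}-${safe(toolUseId)}-${safe(filename)}`;
    try {
      writeFile(path, buffer);
      return path;
    } catch {
      // Assumption: null tells the caller that nothing was saved.
      return null;
    }
  };
}
```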
Context compaction
@worklab-ai/agent-runtime/agent/compaction.js provides createAgentCompactionManager(...) which the Pi SDK provider invokes automatically. Configure via the agent's settings (agent_compaction_* keys). When a compaction completes, the kernel hands a structured row to your onCompactionRecorded(record) callback so the host can persist it however it likes.
Advanced exports
The package exposes its inner pieces via subpath imports:
import { resolveRuntimeBridge, listRuntimeBridges, runtimeCapabilities } from "@worklab-ai/agent-runtime/ai/runtime/registry.js";
import { generateClaudeResponse } from "@worklab-ai/agent-runtime/ai/providers/claude-sdk.js";
import { createAgentCompactionManager, estimateFirstTurnInput } from "@worklab-ai/agent-runtime/agent/compaction.js";
import { configureToolRuntime, readToolRuntime } from "@worklab-ai/agent-runtime/agent/tools/shared/runtime-context.js";
// ...
These are stable but treated as advanced API. Most consumers should reach for createRuntime first.
Example consumer
See examples/echo-agent/ for a runnable consumer that imports @worklab-ai/agent-runtime, runs a single Claude SDK turn with the Bash tool, and prints the result.
License
GPL-3.0-only.
