@worklab-ai/agent-runtime

v0.1.9

Agent runtime supporting Claude SDK, Claude CLI, Codex CLI, and PI SDK out of the box

Generic agent runtime that supports four backends out of the box:

  • Claude SDK (@anthropic-ai/claude-agent-sdk)
  • Claude Code CLI (the claude binary)
  • Pi SDK (@earendil-works/pi-agent-core, used for OpenAI / Codex / Gemini / OpenRouter / Ollama / etc. via Pi providers)
  • Codex CLI (the codex app-server)

Hosts wire in their own pricing, persistence, credential, and compaction-recording callbacks. The runtime returns raw text + raw structured output; hosts that want a domain-specific contract (e.g. worklab_result) parse it on their end.

See ARCHITECTURE.md for the package boundary, runtime selection flow, lifecycle diagrams, and host responsibilities.

Install

npm install @worklab-ai/agent-runtime

Peer requirements:

  • Node.js ≥ 20
  • claude CLI on PATH (only for executionMode: "cli" with model.sdk: "claude")
  • codex CLI on PATH (only for executionMode: "cli" with model.sdk: "codex"; override the binary via the codexAppServerCommand option)
  • ripgrep on PATH (or supplied via ripgrepPath) — required for the Glob and Grep built-in tools

Quick start

import { createRuntime } from "@worklab-ai/agent-runtime";

const runtime = createRuntime({
  // Host integration (all optional)
  workspace: "/path/to/repo",
  ripgrepPath: "/usr/bin/rg",
});

const result = await runtime.run("You are a helpful assistant.", {
  model: { sdk: "claude", model: "claude-sonnet-4-6" },
  executionMode: "sdk",
  messages: [{ role: "user", content: "Read README.md and summarize it." }],
  cwd: "/path/to/repo",
  allowedTools: ["Read", "Bash"],
  maxTurns: 10,
  onEvent: (event) => console.log(event.type),
});

console.log(result.text);

When to reach for this vs. other JS agent runtimes

@worklab-ai/agent-runtime is purpose-built for autonomous, long-running agent work with provider portability and operational resilience as first-class concerns. It is not a streaming-chat UI kit. Where each peer fits:

  • Vercel AI SDK — best when you're building a chat / generative-UI experience inside a React or Next.js app. useChat, useCompletion, streaming server components, and edge-runtime compatibility are their strengths. Their provider list is curated (Anthropic, OpenAI, Google, etc., via @ai-sdk/* packages); there's no Pi gateway, no Claude Code CLI, no Codex CLI app-server, and no per-call provider fallback. If you're rendering a streaming chat into a browser, use them. If you're orchestrating multi-turn autonomous work that must survive a rate-limited primary provider, use us.
  • Claude Agent SDK (@anthropic-ai/claude-agent-sdk) — first-party Anthropic SDK. Tight integration with Claude features (canUseTool, sub-agents, hooks, MCP). We wrap it as one of our four backends and add context compaction, transcript-resume across provider drops, a 22-kind failure taxonomy, a tool-bloat guard with artifact persistence, and a provider fallback router. Use the Claude Agent SDK on its own when you only ever talk to Claude and don't need cross-provider portability or resume.
  • Mastra — a workflow engine + memory + RAG stack. Different category: it's the layer above a runtime. You can layer Mastra workflows on top of @worklab-ai/agent-runtime if you want both.
  • OpenAI Agents SDK — first-party OpenAI SDK. Same trade-off as the Claude Agent SDK: tight integration with OpenAI, no other providers. Pi providers in our runtime cover OpenAI plus a dozen others through a single API.
  • LangChain.js — kitchen sink with deep abstraction stacks. We're deliberately lean; if you want chains, agents, vector stores, and parsers under one umbrella, LangChain is built for that. If you want a focused runtime kernel, use us.

What we natively bridge (no extra packages):

  • Anthropic Claude via the Claude Agent SDK (model.sdk: "claude", executionMode: "sdk").
  • Anthropic Claude via the Claude Code CLI (the claude binary; model.sdk: "claude", executionMode: "cli").
  • OpenAI's Codex via the codex app-server CLI.
  • OpenAI, Google Gemini, AWS Bedrock, OpenRouter, xAI, Groq, Mistral, Perplexity, DeepSeek, Ollama, LlamaCPP, GLM, Vercel AI Gateway, GitHub Copilot, Gemini CLI — all through the Pi (@earendil-works/pi-ai) provider gateway, which our SDK adapter speaks directly.

At-a-glance:

| Need | @worklab-ai/agent-runtime | Vercel AI SDK | Claude Agent SDK |
|---|---|---|---|
| Streaming chat UI in React/Next | ✗ | ✓ | ✗ |
| Multi-provider portability | ✓ (4 backends, 15+ providers) | partial | ✗ |
| CLI providers (claude/codex binaries) | ✓ | ✗ | ✗ |
| Provider fallback on rate limit / overload | ✓ (createRouterRuntime) | ✗ | ✗ |
| Aggressive context compaction with summarization | ✓ | ✗ | partial |
| Transcript-tail resume after provider drops | ✓ | ✗ | ✗ |
| Tool-output bloat guard + artifact persistence | ✓ | ✗ | ✗ |
| MCP transports out of the box (stdio/SSE/HTTP) | ✓ | partial | ✓ |
| HITL approval gates with risk tiers | ✓ | ✗ | partial (canUseTool) |
| Multi-subscriber observer with cost/cache metrics | ✓ | partial | partial |
| Edge-runtime compatibility | ✗ | ✓ | partial |

Honest summary: if the agent runs without a human watching the screen for minutes-to-hours and must survive provider blips, this is the right tool. If a human is watching a streaming chat, Vercel's SDK is the right tool. Both can coexist in the same app.

Picking a backend

The runtime picks a backend from options.model + options.executionMode:

| model.sdk | executionMode | Backend |
|---|---|---|
| "claude" | "sdk" (or omitted) | Claude SDK |
| "claude" | "cli" | claude CLI |
| "pi" | any | Pi SDK |
| "codex" | "cli" | Codex app-server CLI |
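The selection table reads as a small decision function. Here is a minimal sketch of it; the function name `pickBackend` and the backend labels it returns are hypothetical, not part of the package's API (the registry's `resolveRuntimeBridge` is the real entry point). The table does not list `"codex"` with `executionMode: "sdk"`, so this sketch simply rejects that combination.

```javascript
// Hypothetical mirror of the backend-selection table; the returned labels are illustrative.
function pickBackend(model, executionMode = "sdk") {
  switch (model.sdk) {
    case "claude":
      // "sdk" (or omitted) selects the Claude Agent SDK; "cli" selects the claude binary.
      return executionMode === "cli" ? "claude-cli" : "claude-sdk";
    case "pi":
      // Any execution mode selects the Pi SDK.
      return "pi-sdk";
    case "codex":
      if (executionMode === "cli") return "codex-app-server";
      throw new Error('codex models need executionMode: "cli" in this sketch');
    default:
      throw new Error(`unknown sdk: ${model.sdk}`);
  }
}

pickBackend({ sdk: "claude", model: "claude-sonnet-4-6" });              // "claude-sdk"
pickBackend({ sdk: "claude", model: "claude-sonnet-4-6" }, "cli");       // "claude-cli"
pickBackend({ sdk: "pi", provider: "openai", model: "gpt-5.5" }, "cli"); // "pi-sdk"
```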

A model reference can be the parsed shape { sdk, model, provider? } or a string ("pi:openai:gpt-5.5", "claude:claude-sonnet-4-6", etc.) that you parse with the package's parseRuntimeModelReference helper.
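For illustration, here is a hypothetical parser for the string form. It assumes the layout `pi:provider:model` for Pi references and `sdk:model` for everything else, and it splits only on the leading colons so that model ids containing a colon survive intact. It is not `parseRuntimeModelReference`; use the package's helper in real code.

```javascript
// Hypothetical parser for string model references; NOT the package's helper.
function parseModelRef(ref) {
  if (typeof ref !== "string") return ref; // already the parsed { sdk, model, provider? } shape
  const first = ref.indexOf(":");
  if (first === -1) throw new Error(`missing sdk prefix in "${ref}"`);
  const sdk = ref.slice(0, first);
  const rest = ref.slice(first + 1);
  if (sdk === "pi") {
    // Pi references carry a provider segment: pi:<provider>:<model>.
    const second = rest.indexOf(":");
    if (second === -1) throw new Error(`pi reference needs a provider: "${ref}"`);
    return { sdk, provider: rest.slice(0, second), model: rest.slice(second + 1) };
  }
  return { sdk, model: rest };
}

parseModelRef("pi:openai:gpt-5.5");        // { sdk: "pi", provider: "openai", model: "gpt-5.5" }
parseModelRef("claude:claude-sonnet-4-6"); // { sdk: "claude", model: "claude-sonnet-4-6" }
```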

createRuntime(host)

Pass host-level integration once at boot. All keys are optional.

createRuntime({
  // -- host callbacks --
  resolveCustomPricing,    // (parsed) => NormalizedPricing | null
  resolvePiApiKey,         // async (provider) => string | undefined
  persistArtifact,         // ({ filename, buffer, toolName, toolUseId }) => path | null
  onCompactionRecorded,    // (compactionRow) => void

  // -- tool runtime context (process-level config for the tool kernel) --
  workspace,               // primary allowed root for path-based tools
  repoRoot,                // secondary allowed root
  ripgrepPath,             // explicit path to `rg`; falls back to vendored binary, then PATH
  qaOutputDir,             // fallback dir for Playwright MCP filename routing

  // -- observers (multi-subscriber telemetry) --
  // Optional. Each observer receives every event the runtime emits.
  // Built-in createMetricsObserver() aggregates cost, cache hit rate, token
  // counts, tool-call counts, errors, and turn-latency percentiles.
  observers: [],

  // -- approval gates (HITL) --
  // Optional. When set, the runtime asks the host before every tool call
  // whose risk tier is "medium" or "high" (and not session-allowlisted).
  // See the "Approval gates" section below for the request/response shape.
  onToolApprovalRequest,           // async (req) => { decision, reason? }
  toolRiskTiers: { Bash: "high" }, // per-tool tier override (low|medium|high)
  approvalDefaultRiskTier: "medium",
  approvalTimeoutMs: 60_000,       // timeout → auto-deny
  approvalAlwaysAllowTools: [],    // start with these in session allowlist

  // -- host-customisable identity strings (all optional, defaults shown) --
  runtimeBrand: {
    schemaPrefix: "worklab",       // prefix for snapshot/result schema ids
    mcpClientName: "worklab",      // MCP client name reported to MCP servers
    mcpClientVersion: "0.1.0",     // MCP client version
    tempdirPrefix: "worklab-cli-", // mkdtemp prefix for CLI provider scratch dirs
    providerModelPrefix: "worklab",// id prefix for custom Pi providers
    doctorCommand: "worklab doctor", // command suggested in tool error messages
    serviceName: "worklab",        // Codex app-server serviceName
    clientInfoName: "worklab",     // Codex app-server clientInfo.name
    clientInfoTitle: "Worklab",    // Codex app-server clientInfo.title
  },
});

runtimeBrand lets an external host reskin the package without forking it to replace hard-coded strings one by one. The defaults preserve the worklab strings, so worklab itself doesn't need to set anything.

Returns:

  • run(systemPrompt, options) — async; runs one agent invocation (up to options.maxTurns turns) against the chosen backend.
  • configureTools(next) — update the tool runtime context after construction.
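As an example of a host callback, here is a sketch of `resolvePiApiKey` that reads keys from environment variables. The provider ids and variable names in `PROVIDER_ENV` are assumptions chosen for illustration; map them to however your deployment stores credentials. Returning `undefined` matches the `string | undefined` return type listed above.

```javascript
// Hypothetical provider-id -> environment-variable mapping; adjust to your setup.
const PROVIDER_ENV = {
  openai: "OPENAI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
  groq: "GROQ_API_KEY",
};

// Same shape as the documented callback: async (provider) => string | undefined.
// The optional env parameter only exists to make the sketch easy to test.
async function resolvePiApiKey(provider, env = process.env) {
  const name = PROVIDER_ENV[provider];
  const value = name ? env[name] : undefined;
  // Treat unset or blank variables as "no key" so the runtime receives undefined.
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}
```

You would then pass it at boot as `createRuntime({ resolvePiApiKey })`.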

runtime.run(systemPrompt, options)

Per-call options (a non-exhaustive selection):

| Option | Type | Notes |
|---|---|---|
| model | object \| string | Required. See "Picking a backend". |
| executionMode | "sdk" \| "cli" | Default "sdk". |
| messages | Message[] | Conversation history. |
| cwd | string | Working directory for the agent's tools. |
| allowedTools | string[] | Built-in tool allowlist. Default: all. |
| disallowedTools | string[] | Block list. |
| mcpServers | Record<string, McpServerConfig> | Configured MCP servers (stdio / sse / http). |
| maxTurns | number | Hard cap on agent turns. |
| outputSchema | JSONSchema | If set, the agent is asked to produce structured JSON matching this schema. The result lands in result.structuredResult. |
| abortSignal | AbortSignal | Cancel the run. |
| liveInput | LiveInputQueue | Stream of in-flight user messages (for human-in-the-loop steering). |
| onEvent | (event) => void | Fired for every event the provider emits (assistant text, tool calls/results, runtime warnings, structured output). |
| runId | string | Tag this run for downstream callbacks (e.g. onCompactionRecorded). |
| providerSessionId | string | Resume a prior provider session. |
| runArtifactDir | string | Used by some providers as the Playwright MCP filename target. |
| piCodexTransport | string | Forwarded to Pi when running OpenAI Codex models. |
| codexAppServerCommand | string | Override the Codex CLI binary. |
| codexAppServerArgs | string[] | Override the Codex CLI arguments. |

Returns:

{
  text: string,                     // raw assistant text
  structuredResult?: any,           // JSON returned via outputSchema (if any)
  structuredResultSource?: string,  // where structuredResult came from
  events: RuntimeEvent[],           // full event stream (for host-side parsing)
  usage: {
    input_tokens, output_tokens,
    cache_read_tokens, cache_creation_tokens,
    cost_usd,
  },
  durationMs: number,
  numTurns: number,
  model: string,
  effort: string,
  sdk: "claude" | "pi" | "codex",
  cancelled: boolean,
  error: string | null,
  errorDetails: object | null,
  failureKind: string | null,
  providerSessionId: string | null,
  runtimeWarnings: RuntimeWarning[],
  diagnostics: object,
  capabilitiesUsed: {                  // what the backend actually did this call
    prompt_cache_active: true|false|null,
    thinking_enabled: true|false|null,
    structured_output_enforced: boolean,
    subagent_invoked: true|false|null,
    mcp_servers_used: string[],
    native_subagents_used: string[],
    tool_compaction_applied: boolean,
    context_compaction_applied: true|false|null,
  },
}

capabilitiesUsed is the per-call complement to runtimeCapabilities(). Tristate fields use null to mean "this provider can't tell" — distinct from false ("definitely off"). It's also emitted as a capabilities_resolved event near the end of the run, so observers can capture it without inspecting the result object.
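A host that logs capabilitiesUsed should keep the three states apart. A minimal sketch, using the field names from the result shape above; the output format is arbitrary, and treating a missing field like `null` is an assumption of this sketch.

```javascript
// Render a tristate capability: null (or a missing field) means the provider
// could not tell, which is different from a definite false.
function describeTristate(value) {
  if (value == null) return "unknown";
  return value ? "on" : "off";
}

// Build a one-line summary of result.capabilitiesUsed for logs.
function summarizeCapabilities(caps) {
  const mcp = caps.mcp_servers_used ?? [];
  return [
    `prompt cache: ${describeTristate(caps.prompt_cache_active)}`,
    `thinking: ${describeTristate(caps.thinking_enabled)}`,
    `subagents: ${describeTristate(caps.subagent_invoked)}`,
    `context compaction: ${describeTristate(caps.context_compaction_applied)}`,
    `structured output enforced: ${caps.structured_output_enforced ? "yes" : "no"}`,
    `MCP servers: ${mcp.length > 0 ? mcp.join(", ") : "none"}`,
  ].join("; ");
}
```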

Built-in tools

The agent kernel ships with: Read, Write, Edit, Glob, Grep, Bash, WebFetch, WebSearch. Use allowedTools to choose which of them the agent may call. Tool implementations honor:

  • cwd (required for path-based tools)
  • The runtime context's workspace / repoRoot allow-list (paths outside workspace, repoRoot, /tmp, and process.cwd() are rejected)
  • Output truncation with optional artifact persistence ({toolArtifactDir}/tool-output/{runId}/... when toolArtifactDir is configured)
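The allow-list rule can be sketched as a pure function. This sketch reads the rule as: a path is accepted only if it resolves inside workspace, repoRoot, /tmp, or process.cwd(). It handles POSIX paths only, resolves `.` and `..` lexically without following symlinks, and its names are hypothetical, so treat it as an illustration of the check rather than the kernel's implementation.

```javascript
// Collapse "." and ".." segments of a POSIX path, resolving relative paths
// against cwd. A simplified, symlink-unaware stand-in for path.resolve.
function resolvePosix(cwd, p) {
  const input = p.startsWith("/") ? p : `${cwd}/${p}`;
  const out = [];
  for (const seg of input.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop(); // ".." above the root stays at the root
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True if target equals root or sits below it ("/repo" must not match "/repository").
function isInsideRoot(target, root) {
  return target === root || target.startsWith(root === "/" ? "/" : root + "/");
}

// Hypothetical check: accept only paths under one of the allowed roots.
function isPathAllowed(p, { cwd, workspace, repoRoot, processCwd }) {
  const target = resolvePosix(cwd, p);
  const roots = [workspace, repoRoot, "/tmp", processCwd]
    .filter(Boolean)
    .map((r) => resolvePosix("/", r));
  return roots.some((root) => isInsideRoot(target, root));
}
```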

Override or extend the tool surface by passing mcpServers for MCP-backed tools.

Structured output

Pass options.outputSchema (a JSON Schema). On Claude SDK / Codex app-server / Pi SDK, the runtime wires the schema into the provider's structured-output API. The matched JSON lands in result.structuredResult.

The package does not validate structuredResult against your schema — it only forwards what the provider produced. Hosts run their own validation (Zod, AJV, etc.).
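Because validation is the host's job, here is a dependency-free sketch for a hypothetical schema that asks for a `summary` string and an `issues` array of strings. In practice you would usually compile the same JSON Schema with AJV or mirror it in Zod; the hand-written checks only keep the example self-contained, and they cover just the properties listed below.

```javascript
// Hypothetical JSON Schema you might pass as options.outputSchema.
const summarySchema = {
  type: "object",
  required: ["summary", "issues"],
  properties: {
    summary: { type: "string" },
    issues: { type: "array", items: { type: "string" } },
  },
};

// Hand-written checks that mirror summarySchema.
function validateSummaryResult(value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { ok: false, errors: ["result is not an object"] };
  }
  const errors = [];
  if (typeof value.summary !== "string") errors.push("summary must be a string");
  if (!Array.isArray(value.issues) || !value.issues.every((i) => typeof i === "string")) {
    errors.push("issues must be an array of strings");
  }
  return { ok: errors.length === 0, errors };
}

// The runtime only forwards what the provider produced, so check presence first.
function readStructured(result) {
  if (result.structuredResult == null) {
    return { ok: false, errors: ["provider returned no structured result"] };
  }
  return validateSummaryResult(result.structuredResult);
}
```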

Provider fallback router

createRouterRuntime({ host, chain }) wraps the standard runtime with an ordered chain of model references. On a retryable provider failure (rate limit, overload, network blip — classified via the same taxonomy as retryableProviderFailureInfo), it retries the same logical run against the next chain entry, replaying the transcript-tail snapshot of the previous attempt so the next provider continues rather than starts over.

import { createRouterRuntime } from "@worklab-ai/agent-runtime";

const router = createRouterRuntime({
  host: { /* same shape as createRuntime */ },
  chain: [
    { sdk: "claude", model: "claude-opus-4-7" },
    { sdk: "claude", model: "claude-sonnet-4-6" },
    { model: { sdk: "pi", provider: "openai", model: "gpt-5.5" }, requires: { structured_output: true } },
  ],
});

const result = await router.run("...", { /* same shape as runtime.run */ });
console.log(result.failoverHistory);
// [{ model, failureKind, requestId, retryableSubkind }, ...]  one entry per attempt that didn't succeed.

Behaviour:

  • Successful run on entry N → returns the result with failoverHistory set to attempts 0..N-1.
  • Retryable failure → emits provider_failover_started, builds a transcript snapshot, and retries on the next entry.
  • Non-retryable failure (auth, billing, invalid request) → returns immediately with failoverHistory containing the one attempt.
  • Cancellation → returns immediately.
  • Chain exhausted → failureKind: "provider_unavailable_exhausted", failoverHistory lists every attempt.

Chain entries can require backend capabilities via requires: { structured_output: true, supports_mcp: true, ... }; entries that don't satisfy the requirements are skipped (logged in failoverHistory as failureKind: "skipped_capability_mismatch").
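The behaviour above boils down to a loop over the chain. The following is a minimal sketch of that loop, not the router's source: `runOnce`, `capabilitiesOf`, the `transcriptTail` field, and the failure-kind names in `RETRYABLE` are hypothetical stand-ins for the package's 22-kind taxonomy and snapshot machinery. It accepts both chain-entry shapes from the example (a bare model reference, or `{ model, requires }`).

```javascript
// Hypothetical retryable kinds; the real taxonomy is richer.
const RETRYABLE = new Set(["rate_limited", "overloaded", "network_error"]);

async function runWithFailover(chain, runOnce, capabilitiesOf) {
  const failoverHistory = [];
  let snapshot = null; // transcript tail replayed into the next attempt
  for (const entry of chain) {
    // Entries are either a bare model reference or { model, requires }.
    const wrapped = entry.model !== null && typeof entry.model === "object";
    const model = wrapped ? entry.model : entry;
    const requires = (wrapped && entry.requires) || {};
    const caps = capabilitiesOf(model);
    if (Object.keys(requires).some((k) => requires[k] && !caps[k])) {
      failoverHistory.push({ model, failureKind: "skipped_capability_mismatch" });
      continue;
    }
    const attempt = await runOnce(model, snapshot);
    // Success or cancellation ends the loop immediately.
    if (attempt.cancelled || !attempt.failureKind) return { ...attempt, failoverHistory };
    failoverHistory.push({ model, failureKind: attempt.failureKind });
    // Non-retryable failures (auth, billing, invalid request) also stop here.
    if (!RETRYABLE.has(attempt.failureKind)) return { ...attempt, failoverHistory };
    snapshot = attempt.transcriptTail;
  }
  return { failureKind: "provider_unavailable_exhausted", failoverHistory };
}
```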

Observers & metrics

The runtime emits structured events for everything that happens during a run — assistant messages, tool calls, runtime warnings, cache hits/misses, cost updates, provider request start/end, approval lifecycle. Hosts can subscribe via host.observers[] (any number) or the simpler options.onEvent callback (one subscriber). Both work simultaneously.

A built-in aggregator covers the common metrics:

import { createRuntime, createMetricsObserver } from "@worklab-ai/agent-runtime";

const metrics = createMetricsObserver();
const runtime = createRuntime({ observers: [metrics] });

await runtime.run("...", { model: { sdk: "claude", model: "claude-sonnet-4-6" } });

console.log(metrics.snapshot());
// {
//   events: { total, byType: { tool_use: 5, assistant: 8, ... } },
//   tokens: { input, output, cacheReadTokens, cacheCreationTokens },
//   cost: { cumulativeUsd },
//   cache: { hits, misses, hitRatio, readTokensFromEvents },
//   tools: { callsByName: { Bash: 3, Read: 2 }, errorsByName: { ... } },
//   errors: { total, byKind: { provider_unavailable: 1 } },
//   turns: { count, latencyMsP50, latencyMsP95 },
//   approvals: { pending, granted, denied },
// }

Custom observers implement { recordEvent(event), recordMetric(metric)?, flush()? }. Fan-out is synchronous on the hot path; observers that need to do I/O must buffer internally.
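A custom observer that needs I/O can keep recordEvent cheap by only appending to an in-memory buffer and shipping batches from flush(). In this sketch the `sink` function (for example, an HTTP exporter) and the `maxBuffer` bound are hypothetical, and it assumes something calls flush() at a convenient point; events that arrive while the buffer is full are counted and dropped instead of blocking the hot path.

```javascript
// Hypothetical buffering observer: recordEvent is synchronous and cheap,
// flush() hands the accumulated batch to an async, host-supplied sink.
function createBufferingObserver(sink, { maxBuffer = 500 } = {}) {
  let buffer = [];
  let dropped = 0;
  return {
    recordEvent(event) {
      if (buffer.length >= maxBuffer) {
        dropped += 1; // never block or throw on the hot path
        return;
      }
      buffer.push(event);
    },
    async flush() {
      if (buffer.length === 0) return;
      const batch = buffer;
      const droppedCount = dropped;
      buffer = [];
      dropped = 0;
      // If the sink rejects, this batch is lost; add retries in real code.
      await sink(batch, { dropped: droppedCount });
    },
  };
}
```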

Notable events emitted by the bridges:

  • provider_request_started / provider_request_completed — at the boundary of each LLM call (sdk, model, runtime, timestamp, durationMs).
  • cache_hit / cache_miss — when the provider reports cached / cache-creation input tokens.
  • cost_accumulated — running cost in USD with cumulative token breakdown.
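To show how numbers like latencyMsP50 and hitRatio can be derived from these events, here is a sketch that assumes each provider_request_completed event carries a numeric `durationMs` and counts one hit or miss per cache_hit / cache_miss event. It uses the nearest-rank percentile method; createMetricsObserver() may define turn latency and the hit ratio differently.

```javascript
// Nearest-rank percentile over an ascending array; null when there is no data.
function percentile(sorted, p) {
  if (sorted.length === 0) return null;
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical aggregation over a list of runtime events.
function summarize(events) {
  const durations = events
    .filter((e) => e.type === "provider_request_completed" && typeof e.durationMs === "number")
    .map((e) => e.durationMs)
    .sort((a, b) => a - b);
  const hits = events.filter((e) => e.type === "cache_hit").length;
  const misses = events.filter((e) => e.type === "cache_miss").length;
  return {
    latencyMsP50: percentile(durations, 50),
    latencyMsP95: percentile(durations, 95),
    hitRatio: hits + misses === 0 ? null : hits / (hits + misses),
  };
}
```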

Approval gates (human-in-the-loop)

Pass onToolApprovalRequest to gate tool calls behind a runtime approval. The runtime calls your callback once per tool invocation whose risk tier requires it, and pauses the agent until you respond.

import { createRuntime } from "@worklab-ai/agent-runtime";

const runtime = createRuntime({
  toolRiskTiers: { Bash: "high", Read: "low" },
  async onToolApprovalRequest(req) {
    // req = { requestId, toolName, toolUseId, argumentsSummary, riskTier, model }
    // argumentsSummary is already secret-redacted (API keys, Bearer tokens,
    // and known JSON fields like "api_key" / "password" stripped).
    if (req.toolName === "Bash" && req.argumentsSummary.includes("rm -rf")) {
      return { decision: "deny", reason: "destructive" };
    }
    return { decision: "approve" };
  },
});

Tiers (configurable per tool):

  • low — auto-approved; the callback is not called.
  • medium (default) — calls the host; if no callback is supplied, auto-approves.
  • high — calls the host; if no callback is supplied, fails closed (deny).

Responses:

  • { decision: "approve" } — allow this call.
  • { decision: "deny", reason? } — block; the agent receives a tool error.
  • { decision: "always" } — allow + session-allowlist for the run.
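The tier and response rules can be written as one async function. This is a hypothetical sketch (`gateToolCall` and its options are not package exports) that mirrors the documented behaviour: low tiers and session-allowlisted tools skip the callback, a missing callback approves medium and denies high, a timeout or a throwing callback denies, and "always" adds the tool to the session allowlist.

```javascript
// Hypothetical approval gate mirroring the documented tier rules.
async function gateToolCall(req, { tiers = {}, defaultTier = "medium", onRequest, timeoutMs = 60_000, allowlist }) {
  if (allowlist.has(req.toolName)) return { decision: "approve" };
  const tier = tiers[req.toolName] ?? defaultTier;
  if (tier === "low") return { decision: "approve" }; // callback is not called
  if (!onRequest) {
    // No callback: medium fails open, high fails closed.
    return tier === "high" ? { decision: "deny", reason: "no approval callback" } : { decision: "approve" };
  }
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ decision: "deny", reason: "timeout" }), timeoutMs);
  });
  try {
    const res = await Promise.race([onRequest({ ...req, riskTier: tier }), timeout]);
    if (res.decision === "always") {
      allowlist.add(req.toolName); // skip the callback for this tool for the rest of the run
      return { decision: "approve" };
    }
    // Anything other than an explicit approve is treated as a denial.
    return res.decision === "approve" ? res : { decision: "deny", reason: res.reason };
  } catch (err) {
    return { decision: "deny", reason: `callback threw: ${err.message}` };
  } finally {
    clearTimeout(timer);
  }
}
```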

Backend coverage: Claude SDK (via canUseTool) and Pi SDK (via tool dispatch wrapping). Claude CLI and Codex CLI bridge into their backend's own approval models (permissionMode / approvalPolicy) — per-call runtime gates aren't available there.

Approval lifecycle is observable via onEvent:

  • tool_approval_pending — emitted before calling the host.
  • tool_approval_granted — host approved.
  • tool_approval_denied — host denied, timed out, threw, or no callback for a high-risk tool.

Tool-result bloat handling

@worklab-ai/agent-runtime/agent/tool-bloat.js enforces a 256 KB default cap per tool_result. When a payload exceeds the cap, the kernel:

  1. Calls your persistArtifact({ filename, buffer, toolName, toolUseId }) callback (if you supplied one).
  2. Substitutes a compact text reference in the agent's transcript.
  3. Emits a runtime_warning with warning_kind: "tool_payload_truncated" and the saved-paths array.

Hosts that don't supply persistArtifact get the truncation summary but no on-disk capture.
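The three steps can be sketched as follows. Only the 256 KB default, the persistArtifact argument shape, and the tool_payload_truncated warning kind come from the text above; the reference wording, the preview length, the artifact filename, and the rest of the warning object are hypothetical. The sketch treats persistArtifact as synchronous, matching the `=> path | null` signature listed under createRuntime.

```javascript
const DEFAULT_CAP_BYTES = 256 * 1024;

// Hypothetical cap-and-persist guard for a single tool_result payload.
function guardToolResult(payload, { toolName, toolUseId, persistArtifact, capBytes = DEFAULT_CAP_BYTES, previewChars = 2000 }) {
  const size = Buffer.byteLength(payload, "utf8");
  if (size <= capBytes) return { content: payload, warning: null };

  // 1. Offer the full payload to the host, if it supplied a callback.
  const buffer = Buffer.from(payload, "utf8");
  const savedPath = persistArtifact
    ? persistArtifact({ filename: `${toolUseId}.txt`, buffer, toolName, toolUseId })
    : null;
  const savedPaths = savedPath ? [savedPath] : [];

  // 2. Replace the payload with a compact reference plus a short preview.
  const content =
    `[${toolName} output truncated: ${size} bytes exceeds the ${capBytes}-byte cap` +
    (savedPath ? `; full output saved to ${savedPath}` : "") +
    "]\n" +
    payload.slice(0, previewChars);

  // 3. Report the truncation as a runtime warning.
  return {
    content,
    warning: { warning_kind: "tool_payload_truncated", toolName, toolUseId, savedPaths },
  };
}
```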

Context compaction

@worklab-ai/agent-runtime/agent/compaction.js provides createAgentCompactionManager(...), which the Pi SDK provider invokes automatically. Configure it via the agent's settings (agent_compaction_* keys). When a compaction completes, the kernel hands a structured row to your onCompactionRecorded(record) callback so the host can persist it however it likes.

Advanced exports

The package exposes its inner pieces via subpath imports:

import { resolveRuntimeBridge, listRuntimeBridges, runtimeCapabilities } from "@worklab-ai/agent-runtime/ai/runtime/registry.js";
import { generateClaudeResponse } from "@worklab-ai/agent-runtime/ai/providers/claude-sdk.js";
import { createAgentCompactionManager, estimateFirstTurnInput } from "@worklab-ai/agent-runtime/agent/compaction.js";
import { configureToolRuntime, readToolRuntime } from "@worklab-ai/agent-runtime/agent/tools/shared/runtime-context.js";
// ...

These are stable but treated as advanced API. Most consumers should reach for createRuntime first.

Example consumer

See examples/echo-agent/ for a runnable consumer that imports @worklab-ai/agent-runtime, runs a single Claude SDK turn with the Bash tool, and prints the result.

License

GPL-3.0-only.