@agent-compose/sdk

v0.5.7

Published

6 days ago

Client library for agent-compose — define agents, runtimes, and workflows, and invoke them against an agent-compose server.

chris-moller

TypeScript SDK for agent-compose. Use it to:

- Author workflows that run agentic LLM loops inside isolated sandboxes
- Define runtimes that wrap a coding-CLI tool (Claude Code, OpenAI Desktop, …) into a sandbox-portable agent loop
- Register, invoke, observe, and cancel workflows via the HTTP API (AgentComposeClient)
- Manage factories, secrets, API keys, and snapshots programmatically

The hierarchy: a team owns one or more factories (project containers); each factory owns workflow templates, secrets, and runs. Workflows are versioned per (factory, name, version). Code that does not specify a factory lands in the default factory, which every team has.
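To make the (factory, name, version) identity concrete, here is a minimal sketch. The `TemplateRef` shape, the numeric `version`, and the `templateKey` helper are hypothetical illustrations and not part of the SDK; only the fallback to the "default" factory comes from the description above.

```typescript
// Hypothetical shape for illustration; the SDK's real row types may differ.
interface TemplateRef {
  factorySlug?: string; // omitted means the team's "default" factory
  name: string;
  version: number; // assumption: versions are numeric
}

const DEFAULT_FACTORY = "default";

// Build a unique key from the (factory, name, version) triple.
function templateKey(ref: TemplateRef): string {
  const factory = ref.factorySlug ?? DEFAULT_FACTORY;
  return `${factory}/${ref.name}@${ref.version}`;
}

console.log(templateKey({ name: "my-workflow", version: 3 }));
// prints "default/my-workflow@3"
```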

Installation

npm install @agent-compose/sdk
# peer dep:
npm install zod

Authoring a workflow

A workflow is an async function, (ctx, sandbox) => T, with two positional arguments:

- ctx carries the run identity (run.id), the caller's input, and observability helpers (setMetadata, step).
- sandbox is a capability the engine constructs once for the run. Pass it to agent({ sandbox, ... }) and to any helper that takes a SandboxProvider (file writers, git utilities, command runners).

// my-workflow.ts
import { defineWorkflow, agent, claudeRuntime } from "@agent-compose/sdk";
import PROMPT from "./prompt.md" with { type: "text" };

export default defineWorkflow({
  async run(ctx, sandbox) {
    const repo = (ctx.input?.repo as string | undefined) ?? "owner/repo";

    const result = await agent({
      sandbox,
      runtime:    claudeRuntime,
      prompt:     `${PROMPT}\n\nRepository: ${repo}`,
      tools:      ["Bash", "Read", "Edit", "Write", "Grep", "Glob"],
      budget:     { turnsPerIteration: 40, maxIterations: 8 },
    });

    await ctx.setMetadata({ summary: result.status?.summary });
    return { ok: result.status?.completed ?? false };
  },
  // Optional: outbound network rules that the runner sandbox will enforce
  // (Vercel only — E2B ignores). Use `$VAR` placeholders for secrets that
  // get resolved from the per-workflow secret store at dispatch time.
  networkPolicy: {
    allow: {
      "*": [],
      "api.anthropic.com": [{ transform: [{ headers: { "x-api-key": "$ANTHROPIC_API_KEY" } }] }],
    },
  },
});

defineWorkflow is thin sugar: it returns the bare run function with networkPolicy / placeholders / snapshots attached as metadata that the bundler picks up at registration time. A plain export default async (ctx, sandbox) => {...} is also valid; you just lose the metadata channel.
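The `$VAR` placeholders in networkPolicy are described as being resolved from the per-workflow secret store at dispatch time. The sketch below shows roughly what such a substitution could look like. It is not the server's implementation: `resolvePlaceholders`, the token grammar (`$` followed by an upper-case identifier), and the choice to throw on a missing secret are all assumptions.

```typescript
type HeaderMap = Record<string, string>;

// Replace each `$NAME` token in a header value with the matching secret.
// Assumption: an unknown placeholder is an error rather than passed through.
function resolvePlaceholders(headers: HeaderMap, secrets: Record<string, string>): HeaderMap {
  const resolved: HeaderMap = {};
  for (const [header, value] of Object.entries(headers)) {
    resolved[header] = value.replace(/\$([A-Z_][A-Z0-9_]*)/g, (_match, key: string) => {
      const secret = secrets[key];
      if (secret === undefined) {
        throw new Error("No secret registered for $" + key);
      }
      return secret;
    });
  }
  return resolved;
}

// Usage with a made-up secret value:
const headers = resolvePlaceholders(
  { "x-api-key": "$ANTHROPIC_API_KEY" },
  { ANTHROPIC_API_KEY: "sk-test" },
);
console.log(headers["x-api-key"]); // prints "sk-test"
```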

What the workflow can do with `ctx`

interface WorkflowCtx {
  run:         { id: string };
  input?:      Record<string, unknown>;
  setMetadata: (data: Record<string, unknown>) => Promise<void>;
  step<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

step("phase-name", () => …) wraps a phase for the run timeline — emits step_started / step_completed / step_failed lifecycle events with duration. Use it for setup, external API calls, or anything you want visible on the dashboard's run detail page.
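For intuition about the lifecycle events, a wrapper with the same signature as `ctx.step` could be sketched as follows. The `StepEvent` shape, the `durationMs` field name, and the `makeStep` factory are assumptions made for this example; the engine's actual event payloads may differ.

```typescript
// Hypothetical event shape; only the three event names come from the text above.
interface StepEvent {
  event: "step_started" | "step_completed" | "step_failed";
  name: string;
  durationMs?: number;
}

// Build a step() helper that reports lifecycle events to `emit`.
function makeStep(emit: (ev: StepEvent) => void) {
  return async function step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const startedAt = Date.now();
    emit({ event: "step_started", name });
    try {
      const value = await fn();
      emit({ event: "step_completed", name, durationMs: Date.now() - startedAt });
      return value;
    } catch (err) {
      emit({ event: "step_failed", name, durationMs: Date.now() - startedAt });
      throw err; // the failure still propagates to the workflow
    }
  };
}
```

Inside a workflow you would call the real helper instead, for example `await ctx.step("clone-repo", () => cloneRepo())`, where `cloneRepo` stands in for your own setup function.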

The `agent` loop

agent({
  sandbox,            // the workflow's sandbox arg
  runtime,            // claudeRuntime, or your own via createClaudeRuntime / defineRuntime
  prompt,             // raw markdown — `--- frontmatter ---` is auto-stripped
  tools?,             // model tool allowlist (defaults inside agentLoop)
  budget?,            // { turnsPerIteration, maxIterations }
  workingDir?,        // every shell command runs here
  responseSchema?,    // zod — when set, the loop demands a `<response>` block on exit
  onAgentEvent?,      // per-message hook (e.g. wire to telemetry)
  onIteration?,       // per-iteration hook with parsed `<status>` block
})
// → AgentLoopResult { status?, response? (when responseSchema set), iterations, … }

The protocol is simple: the model emits XML-tagged blocks (<status> / <response>) the loop parses. See sdk/src/agent/protocol-suffix.md for the full instructions appended to every prompt.
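As an illustration of the tag-based protocol, the helper below pulls the body of a `<status>` or `<response>` block out of model output. It is a simplified stand-in, not the SDK's parseAgentStatus / parseAgentResponse: `extractBlock` is a hypothetical name, taking the last occurrence is an assumption, and the format of the block body is left uninterpreted.

```typescript
// Return the trimmed body of the last <tag>…</tag> block, or undefined if absent.
// Assumption: when several blocks appear, the final one is the one that counts.
function extractBlock(text: string, tag: "status" | "response"): string | undefined {
  const pattern = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, "g");
  let last: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    last = match[1].trim();
  }
  return last;
}

const output = "Ran the tests.\n<status>\nall green\n</status>";
console.log(extractBlock(output, "status")); // prints "all green"
```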

Defining a runtime

A "runtime" wraps an agent's underlying execution model — usually a coding CLI like Claude Code or OpenAI Desktop — so agent can drive it. The SDK ships built-ins; you only need a custom one for an exotic provider.

Built-in runtimes

import {
  createClaudeRuntime,    // factory, takes config
  claudeRuntime,           // pre-built default (DEFAULT_CLAUDE_MODEL)
  ClaudeRunner,            // class, if you need to override
} from "@agent-compose/sdk";

// openAIDesktopRuntime is NOT in the package root (it pulls in `sharp` for
// screenshot capture; the native binding can't be cross-compiled). Import
// directly when you actually want the desktop runtime:
import openAIDesktopRuntime from "@agent-compose/sdk/runtimes/openai-desktop.js";

Custom runtime

import { defineRuntime, type AgentRuntime } from "@agent-compose/sdk";

const myRuntime: AgentRuntime = defineRuntime({
  create: (sandbox, opts) => {
    // Return a ModelExecutionContract — see sdk/src/types/runtime.ts
    return {
      sendMessage({ prompt, sessionId, signal }) {
        // Async generator that yields AgentMessage chunks the loop parses.
        return /* … */;
      },
    };
  },
});

AgentRuntime is a tagged record with create(sandbox, RuntimeOptions) → ModelExecutionContract. There is no provider field on it — the runtime is bound to the workflow at author time (you pass it to agent), not selected by the server.

Registering a workflow

The agentc CLI handles the bundling-and-registration step for you:

agentc register my-workflow.ts -n my-workflow

Under the hood that calls bundleWorkflow(workflowPath) (resolves imports, inlines runtime sources via dynamic-require traversal) and POST /api/v1/factories/<slug>/templates with the bundled source. If you need to drive registration from your own build pipeline, you can do the same thing via the SDK directly:

import { AgentComposeClient, bundleWorkflow } from "@agent-compose/sdk";

const client = new AgentComposeClient(
  "https://your-server.example.com",
  process.env.AGENT_COMPOSE_API_KEY!,
);

const bundled = await bundleWorkflow("./my-workflow.ts");
await client.register({
  name:        "my-workflow",
  source:      bundled.source,
  runtimes:    bundled.runtimes,        // [{ name, source }] — embedded so the runner has them locally
  schedule:    "*/30 * * * *",          // optional cron
  factorySlug: "default",                // optional — defaults to "default"
  // snapshots, networkPolicy, placeholders — all optional
});

register() requires the caller's API key to carry the admin scope (or full team-access for legacy keys without scopes).
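The scope rule can be pictured with a small check. This is a sketch of the stated rule only, not the server's authorization code: `ApiKeyInfo` and `canPerform` are hypothetical names, and treating each scope independently (admin not implying read or invoke) is an assumption.

```typescript
type Scope = "read" | "invoke" | "admin";

interface ApiKeyInfo {
  name: string;
  scopes?: Scope[]; // legacy keys carry no scopes at all
}

// Legacy keys without scopes get full team access; scoped keys must list the scope.
function canPerform(key: ApiKeyInfo, required: Scope): boolean {
  if (key.scopes === undefined) return true;
  return key.scopes.includes(required);
}

console.log(canPerform({ name: "ci-dispatcher", scopes: ["read", "invoke"] }, "admin")); // prints false
console.log(canPerform({ name: "old-key" }, "admin")); // prints true
```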

Invoking a workflow

Two flavours:

// Fire-and-forget — returns the run id immediately.
const { id } = await client.invoke("my-workflow", {
  repo: "owner/repo",
});

// Block until the run settles (default 30min timeout, 1s poll).
const status = await client.invokeAndWait("my-workflow", { repo: "owner/repo" }, {
  timeoutMs:      5 * 60_000,
  pollIntervalMs: 2000,
});
console.log(status.status);   // "success" | "failed" | "abandoned" | "canceled"
console.log(status.output);   // workflow's return value

output is the workflow's run() return value (whatever defineWorkflow({ async run() { return … } }) resolves to). setMetadata() writes to a separate metadata field — useful for "side-channel" facts (PR url, plan url) without polluting the structured return.

invoke and invokeAndWait both accept { factorySlug, snapshots, networkPolicy, placeholders, parentRunId } as the third argument. Per-invocation snapshots merges field-by-field with the registered default. factorySlug defaults to "default".
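The blocking behaviour of invokeAndWait can be approximated by a plain polling loop. The sketch below is an assumption about the general shape, not the SDK's code: `waitForTerminal` and `getStatus` are hypothetical, any status outside the four terminal values is treated as "still running", and it throws a plain Error where the SDK throws AgentComposeError(504, …).

```typescript
// The four terminal statuses listed above; anything else is treated as in progress.
const TERMINAL_STATUSES = new Set(["success", "failed", "abandoned", "canceled"]);

interface PollOptions {
  timeoutMs?: number;
  pollIntervalMs?: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `getStatus` until it reports a terminal status or the deadline passes.
async function waitForTerminal(
  getStatus: () => Promise<string>,
  { timeoutMs = 30 * 60_000, pollIntervalMs = 1000 }: PollOptions = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (TERMINAL_STATUSES.has(status)) return status;
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Run did not settle within ${timeoutMs} ms`);
    }
    await sleep(pollIntervalMs);
  }
}
```

The defaults mirror the documented 30 minute timeout and 1 second poll interval.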

Auto parent/child tracing

The SDK detects process.env.RUN_ID (set by the runner sandbox on every dispatch) and automatically threads it as parentRunId on subsequent invoke() calls. Workflows that fan out to other workflows get a parent/child tree in the dashboard for free. Pass parentRunId: null to opt out.
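The three cases (omitted, explicit id, explicit null) can be summarized in a small resolution function. This is a sketch of the documented behaviour with a hypothetical helper name, `resolveParentRunId`; the environment is passed in as a parameter so the example stays self-contained.

```typescript
// undefined: inherit RUN_ID from the environment
// null:      explicit opt-out, no parent is recorded
// string:    use the caller-supplied id as given
function resolveParentRunId(
  explicit: string | null | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  if (explicit === null) return undefined;
  return explicit ?? env.RUN_ID;
}

const env = { RUN_ID: "run_parent" }; // made-up id for illustration
console.log(resolveParentRunId(undefined, env)); // prints "run_parent"
console.log(resolveParentRunId(null, env)); // prints undefined
```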

Cancelling a run

await client.cancelRun(runId);

Idempotent — cancelling an already-terminal run returns the current state without throwing. The server stamps the run as canceled, kills any live sandboxes, and emits a run_canceled event on the stream.

Streaming live logs

streamRunLogs returns an async generator of RunEvents in real time, re-attaching via SSE under the hood. Pass lastEventId (the highest seq you've already processed) to resume after a reconnect.

for await (const ev of client.streamRunLogs(runId, { lastEventId: 0 })) {
  console.log(ev.event, ev.seq, ev.data);
  if (ev.event === "run_complete" || ev.event === "run_failed" || ev.event === "run_canceled") {
    break;
  }
}

AbortSignal works too — pass { signal } and call controller.abort() to tear the stream down from the caller side.
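If you need to survive dropped connections yourself, one possible pattern is to track the highest seq seen and reopen the stream from there. The wrapper below is a sketch under assumptions: `resumable`, `OpenStream`, and the reconnect limit are hypothetical, and it assumes the server only replays events with a seq greater than lastEventId.

```typescript
interface SeqEvent {
  seq: number;
  event: string;
}

// Stand-in for a call such as (opts) => client.streamRunLogs(runId, opts).
type OpenStream = (opts: { lastEventId: number }) => AsyncIterable<SeqEvent>;

// Re-open the stream after a failure, resuming from the highest seq already yielded.
async function* resumable(open: OpenStream, maxReconnects = 3): AsyncGenerator<SeqEvent> {
  let lastEventId = 0;
  let reconnects = 0;
  for (;;) {
    try {
      for await (const ev of open({ lastEventId })) {
        lastEventId = Math.max(lastEventId, ev.seq);
        yield ev;
      }
      return; // the stream ended cleanly
    } catch (err) {
      reconnects += 1;
      if (reconnects > maxReconnects) throw err;
    }
  }
}
```

A production version would likely add a backoff delay between reconnects; that is omitted here to keep the sketch short.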

Factories

Factories are project containers within a team. Each factory has its own workflow templates, secrets, runs, and (optionally) scoped API keys. New projects don't need to think about them — default is auto-created per team and is what the SDK falls back to when factorySlug is omitted.

// CRUD on factories
await client.createFactory({ slug: "ci-bots", name: "CI Bots", description: "…" });
const factories = await client.listFactories();
const f         = await client.getFactory("ci-bots");
await client.updateFactory("ci-bots", { name: "Continuous-Integration Bots" });
await client.deleteFactory("ci-bots");

// Templates list — flat across factories, or scoped to one
const all   = await client.listTemplates();
const scoped = await client.listTemplates({ factorySlug: "ci-bots" });

// Register / invoke / secret operations all accept factorySlug
await client.register({ name: "scrape", source, factorySlug: "ci-bots", … });
await client.invoke("scrape", { url: "…" }, { factorySlug: "ci-bots" });
await client.setSecret("scrape", "GH_TOKEN", "ghp_…", { factorySlug: "ci-bots" });

CLI equivalents: agentc factory list | create | get | update | delete, plus --factory <slug> on every other command.

Per-workflow secrets

Secrets live in GCP Secret Manager, one row per (factory, workflow, key). They're injected as env vars into the runner sandbox at dispatch time, never persisted in the VM. Values are write-only — the API only returns metadata (key, timestamps).

await client.setSecret("my-workflow", "ANTHROPIC_API_KEY", process.env.ANTHROPIC_API_KEY!);
const list = await client.listSecrets("my-workflow"); // [{ key, createdAt, updatedAt }]
await client.deleteSecret("my-workflow", "STALE_KEY");

// Scope to a non-default factory:
await client.setSecret("scrape", "GH_TOKEN", "ghp_…", { factorySlug: "ci-bots" });

Mutations require admin scope.

API keys

Mint and list scoped keys programmatically (requires an admin-scoped caller key). New keys are returned once, in the same response as the metadata — copy the ac_… value immediately.

const created = await client.createApiKey({
  name:      "ci-dispatcher",
  scopes:    ["read", "invoke"],
  expiresAt: new Date(Date.now() + 30 * 86_400_000).toISOString(),  // 30 days
  // factorySlug: "ci-bots"  // optional — scopes the key to a single factory
});
console.log(created.key);   // "ac_…" — the only time you'll see this

const all = await client.listApiKeys();

CLI equivalent: agentc keys create <name> --scopes read,invoke --expires-in 30d.

Usage

const usage = await client.getUsage(
  new Date(Date.now() - 30 * 86_400_000),
  new Date(),
);
// usage.rows: [{ day, runs, sandbox_seconds, … }]

CLI equivalent: agentc usage.
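A common follow-up is to total the daily rows. The sketch below assumes `runs` and `sandbox_seconds` are numbers and `day` is a string; `UsageRow` here is a local stand-in rather than the SDK's UsageRollupRow, and `summarizeUsage` is a hypothetical helper.

```typescript
// Local stand-in covering only the fields named in the comment above.
interface UsageRow {
  day: string;
  runs: number;
  sandbox_seconds: number;
}

// Total runs and sandbox time across all rows.
function summarizeUsage(rows: UsageRow[]): { runs: number; sandboxHours: number } {
  const runs = rows.reduce((sum, row) => sum + row.runs, 0);
  const sandboxSeconds = rows.reduce((sum, row) => sum + row.sandbox_seconds, 0);
  return { runs, sandboxHours: sandboxSeconds / 3600 };
}

// Usage with made-up rows:
const totals = summarizeUsage([
  { day: "2025-01-01", runs: 2, sandbox_seconds: 1800 },
  { day: "2025-01-02", runs: 3, sandbox_seconds: 5400 },
]);
console.log(totals); // prints { runs: 5, sandboxHours: 2 }
```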

Snapshots (replay-friendly sandboxes)

Long-running workflows can capture the runner sandbox as a Vercel snapshot on success (snapshots: { saveLatest: true }). Other workflows reference that snapshot via snapshots.bootFrom to boot into the same prepared VM (deps installed, repo cloned, etc.) instead of repeating setup.

// Capture per-invocation:
await client.invoke("my-workflow", input, { snapshots: { saveLatest: true } });

// Boot from a captured snapshot — pick the id from `agentc snapshot
// list` or the dashboard snapshots page:
await client.invoke("my-workflow", input, {
  snapshots: { bootFrom: { snapshotId: "snap_…" } },
});

// Set a default at registration time:
defineWorkflow({ run, snapshots: { saveLatest: true } });

// Retain every step's snapshot (not just the latest):
defineWorkflow({ run, snapshots: { saveLatest: true, retainSteps: true } });

// Browse / clean up:
const page = await client.listSnapshotsPage({ workflow: "my-workflow", limit: 50 });
const snaps = page.data;
await client.deleteRunSnapshot(snaps[0].runId, snaps[0].snapshotId);

CLI equivalents: agentc snapshot list / agentc snapshot delete <run-id> <snapshot-id>.

Per-invocation overrides merge field-by-field

The snapshots object on invoke() is merged with the registered template's snapshots config — you can override bootFrom alone without losing saveLatest, or vice versa.
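A field-by-field merge of this kind might look like the sketch below. It is an illustration of the documented behaviour, not the server's code: `SnapshotsConfig` lists only the fields that appear in this README, `mergeSnapshots` is a hypothetical name, and treating `bootFrom` as a single replaceable field is an assumption.

```typescript
// Only the fields shown in this README; the real config may carry more.
interface SnapshotsConfig {
  saveLatest?: boolean;
  retainSteps?: boolean;
  bootFrom?: { snapshotId: string };
}

// Each field of the override wins when set; otherwise the registered default is kept.
function mergeSnapshots(
  registered: SnapshotsConfig = {},
  override: SnapshotsConfig = {},
): SnapshotsConfig {
  return {
    saveLatest: override.saveLatest ?? registered.saveLatest,
    retainSteps: override.retainSteps ?? registered.retainSteps,
    bootFrom: override.bootFrom ?? registered.bootFrom,
  };
}

// "snap_example" is a made-up id for illustration.
const merged = mergeSnapshots({ saveLatest: true }, { bootFrom: { snapshotId: "snap_example" } });
console.log(merged.saveLatest, merged.bootFrom?.snapshotId); // prints true snap_example
```

Using `??` rather than `||` means an explicit `saveLatest: false` on the invocation still overrides a registered `true`.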

Authentication

The SDK accepts a Bearer API key (ac_…). Mint one from the dashboard: sign in at <server-url>/login, then Settings → API Keys → Create key.

Default scopes (read + invoke) are right for a CI / dispatch caller. Tick admin only if this key needs to register templates, mint other keys, or manage secrets.

const client = new AgentComposeClient(
  process.env.AGENT_COMPOSE_URL!,
  process.env.AGENT_COMPOSE_API_KEY!,
);

The dashboard itself uses the cookie-bound session path; the SDK is for programmatic / server-to-server callers.

Public exports — quick reference

| Export | What |
|---|---|
| defineWorkflow | Attach metadata to a workflow run function |
| defineRuntime | Wrap an agent execution provider as an AgentRuntime |
| defineSandboxEnvironment | Sugar for declaring a workflow whose primary purpose is to build a snapshot for others to boot from |
| agent / agentLoop | Embed an LLM loop inside a workflow |
| runWorkflow | Local engine for running a workflow in-process (test harness) |
| bundleWorkflow | Resolve + inline a workflow's runtime sources for registration |
| claudeRuntime / createClaudeRuntime / ClaudeRunner | Built-in Claude Code runtime + factory |
| AgentComposeClient | HTTP client: register, invoke, cancel, stream logs, factories, snapshots, secrets, API keys, usage |
| AgentComposeError | Thrown by every non-2xx HTTP response |
| parseAgentStatus / parseAgentResponse / AgentStatusSchema / AgentMessageSchema | Protocol parsers |
| parseSseStream | Generic SSE chunk decoder (used by streamRunLogs) |
| createSandbox / reconnectSandbox / killAllSandboxes / killSandboxById / getSandboxQuotas / listOwnedSandboxes / deleteSandboxSnapshot | Sandbox-provider helpers (Vercel + E2B) |

Type exports: WorkflowFn, WorkflowCtx, WorkflowDefinition, WorkflowHooks, AgentBudget, AgentRuntime, RuntimeOptions, ModelExecutionContract, McpServerConfig, AgentMessage (and its variants), AgentStatus, RunStatus, RegisterResult, RunEvent, FactoryRow, SnapshotListEntry, ApiKey, ApiKeyCreated, UsageRollupRow, UsageResponse, CancelRunResponse, AgentLoopResult, AgentOpts, SandboxProvider, DesktopSandboxProvider, SandboxNetworkPolicy, SandboxCreateOpts, OwnedSandbox, BundledWorkflow.

For the canonical signatures, follow your IDE's go-to-definition into @agent-compose/sdk — sdk/src/index.ts is the public surface and the files it re-exports from carry full inline docstrings.

Errors

All non-2xx HTTP responses throw AgentComposeError(status, message). The message is the server's { error: string } body when present, falling back to the HTTP status text:

import { AgentComposeError } from "@agent-compose/sdk";

try {
  await client.invoke("missing-workflow");
} catch (err) {
  if (err instanceof AgentComposeError && err.status === 404) {
    // template not registered
  }
}

invokeAndWait throws AgentComposeError(504, …) on timeout, so a timeout can be handled through the same error type as an HTTP failure.
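The message fallback described in this section can be sketched as follows. `HttpStatusError` and `errorFromResponse` are local stand-ins written for this example, not the SDK's AgentComposeError or its internals; only the rule (use the `{ error: string }` body when present, else the status text) is taken from the text.

```typescript
// Local stand-in with the same (status, message) shape as described above.
class HttpStatusError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

// Prefer the server's { error: string } body; fall back to the HTTP status text.
function errorFromResponse(status: number, statusText: string, body: unknown): HttpStatusError {
  const candidate =
    typeof body === "object" && body !== null ? (body as { error?: unknown }).error : undefined;
  const message = typeof candidate === "string" ? candidate : statusText;
  return new HttpStatusError(status, message);
}

console.log(errorFromResponse(404, "Not Found", { error: "template not registered" }).message);
// prints "template not registered"
console.log(errorFromResponse(500, "Internal Server Error", null).message);
// prints "Internal Server Error"
```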
