experimental-ai-sdk-code-mode
v1.0.14
QuickJS-backed code mode tool for AI SDK
AI SDK Code Mode
experimental-ai-sdk-code-mode provides an AI SDK tool that runs JavaScript or
type-stripped TypeScript in an isolated QuickJS WASM sandbox. It is meant for
agents that need to call several tools, combine their results, run independent
tool calls concurrently, or do structured JSON transformations in one step.
Installation
pnpm add ai experimental-ai-sdk-code-mode
The runtime uses Node.js worker threads and is intended for server-side AI SDK tools, not browser execution.
ai is a peer dependency. The package supports AI SDK 6 stable and AI SDK 7
beta:
pnpm add ai@^6 experimental-ai-sdk-code-mode
pnpm add ai@beta experimental-ai-sdk-code-mode
Quick Start
import { generateText, tool } from "ai";
import { z } from "zod";
import { createCodeModeTool } from "experimental-ai-sdk-code-mode";
const search = tool({
description: "Search indexed documents.",
inputSchema: z.object({
query: z.string(),
limit: z.number().int().optional(),
}),
outputSchema: z.object({
results: z.array(
z.object({
id: z.string(),
title: z.string(),
}),
),
}),
execute: async ({ query, limit }) => {
// searchDocuments and readDocumentById are placeholders for your own data access.
return { results: await searchDocuments(query, limit) };
},
});
const readDocument = tool({
description: "Read a document by id.",
inputSchema: z.object({
id: z.string(),
}),
outputSchema: z.object({
id: z.string(),
title: z.string(),
body: z.string(),
}),
execute: async ({ id }) => {
return await readDocumentById(id);
},
});
const codeMode = createCodeModeTool(
{
search,
readDocument,
},
{
executionPolicy: {
timeoutMs: 30_000,
memoryLimitBytes: 64 * 1024 * 1024,
},
},
);
const result = await generateText({
model, // your AI SDK language model instance
tools: { codeMode },
prompt: "Search for the latest internal QuickJS notes and summarize them.",
});
What The Model Sees
createCodeModeTool(tools) generates the code-mode tool description from the
provided AI SDK tools. The description includes:
- sandbox rules
- whether fetch is available, plus the configured fetch policy when present
- TypeScript call signatures for every provided tool, including return types when the tool provides an AI SDK outputSchema
- examples for calling tools and returning the final value
For example, if you pass search and readDocument, the model sees guidance
like this in the code-mode tool description:
declare const tools: {
/** Search indexed documents. */
search: (input: {
query: string;
limit?: number;
}) => Promise<{
results: Array<{
id: string;
title: string;
}>;
}>;
/** Read a document by id. */
readDocument: (input: {
id: string;
}) => Promise<{
id: string;
title: string;
body: string;
}>;
};
Inside code mode, calls look like normal async JavaScript:
const { results } = await tools.search({ query: "QuickJS", limit: 5 });
const documents = await Promise.all(
results.map((item) => tools.readDocument({ id: item.id })),
);
return {
count: documents.length,
titles: documents.map((doc) => doc.title),
};
JSON.parse and JSON.stringify are available in the sandbox. Returned values
and tool inputs/outputs must be JSON-serializable.
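Before handing tool outputs to code mode, a host may want to confirm they are plain JSON. The helper below is a minimal sketch; findNonJsonPath is hypothetical and not exported by the package. It reports the first path whose value would not survive a JSON.stringify/JSON.parse round trip unchanged.

```typescript
// Hypothetical host-side helper, not part of experimental-ai-sdk-code-mode.
// Returns the first path whose value would not survive JSON.stringify/JSON.parse
// unchanged, or undefined when the whole value is plain JSON.
function findNonJsonPath(value: unknown, path = "$"): string | undefined {
  if (value === null || typeof value === "string" || typeof value === "boolean") {
    return undefined;
  }
  if (typeof value === "number") {
    // NaN and ±Infinity serialize as null, so they do not round-trip.
    return Number.isFinite(value) ? undefined : path;
  }
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      const bad = findNonJsonPath(value[i], `${path}[${i}]`);
      if (bad !== undefined) return bad;
    }
    return undefined;
  }
  if (typeof value === "object") {
    // Only plain objects round-trip; Date, Map, Set, and class instances do not.
    const proto: unknown = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return path;
    for (const [key, child] of Object.entries(value)) {
      const bad = findNonJsonPath(child, `${path}.${key}`);
      if (bad !== undefined) return bad;
    }
    return undefined;
  }
  // undefined, functions, symbols, and bigint values are not JSON values.
  return path;
}
```

For example, findNonJsonPath({ createdAt: new Date() }) returns "$.createdAt", which points at the field to convert (for instance to an ISO string) before returning it from a tool.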
API
createCodeModeTool(tools, options?)
Returns an AI SDK tool() whose input schema is:
{
js: string;
}
The js string is wrapped in an async function, so top-level await and
return are supported:
const first = await tools.search({ query: "sandbox" });
return { first };
The package also exports the lower-level runner:
import { tool } from "ai";
import { z } from "zod";
import { runCodeMode } from "experimental-ai-sdk-code-mode";
const output = await runCodeMode({
js: "return await tools.add({ a: 1, b: 2 });",
tools: {
add: tool({
inputSchema: z.object({ a: z.number(), b: z.number() }),
execute: async ({ a, b }) => ({ sum: a + b }),
}),
},
});
Options
interface CodeModeOptions {
executionPolicy?: {
timeoutMs?: number;
memoryLimitBytes?: number;
maxStackSizeBytes?: number;
maxResultBytes?: number;
maxSourceBytes?: number;
maxToolInputBytes?: number;
maxToolOutputBytes?: number;
maxBridgeRequests?: number;
maxInFlightBridgeRequests?: number;
};
fetchPolicy?: false | {
fetch?: typeof globalThis.fetch;
allowedOrigins?: string[];
allowedUrlPrefixes?: string[];
allowedMethods?: string[];
maxResponseBytes?: number;
allowRedirects?: boolean;
maxRedirects?: number;
};
approval?: {
mode?: "callback" | "interrupt";
onApprovalRequired?: (request: {
toolName: string;
input: unknown;
toolCallId: string;
}) =>
| "approved"
| "denied"
| { approved: boolean; reason?: string }
| Promise<"approved" | "denied" | { approved: boolean; reason?: string }>;
};
lifecycle?: {
onNestedToolCall?: (event: CodeModeNestedToolCallEvent) => void | Promise<void>;
onNestedToolResult?: (event: CodeModeNestedToolResultEvent) => void | Promise<void>;
onFetchRequest?: (event: CodeModeFetchRequestEvent) => void | Promise<void>;
onFetchResult?: (event: CodeModeFetchResultEvent) => void | Promise<void>;
onInterrupt?: (event: CodeModeInterruptEvent) => void | Promise<void>;
onTrace?: (trace: CodeModeTrace) => void | Promise<void>;
onHookError?: (
error: unknown,
event: CodeModeLifecycleHookErrorEvent,
) => void | Promise<void>;
};
telemetry?: {
isEnabled?: boolean;
tracer?: unknown;
recordInputs?: boolean;
recordOutputs?: boolean;
functionId?: string;
metadata?: Record<string, unknown>;
};
modelOutput?: {
includeNestedToolSummary?: boolean;
includeNestedToolOutputs?: boolean;
includeFetchSummary?: boolean;
maxSummaryEntries?: number;
};
}
Defaults:
| Option | Default |
| --- | --- |
| executionPolicy.timeoutMs | 30_000 |
| executionPolicy.memoryLimitBytes | 64 * 1024 * 1024 |
| executionPolicy.maxStackSizeBytes | 2 * 1024 * 1024 |
| executionPolicy.maxResultBytes | 1024 * 1024 |
| executionPolicy.maxSourceBytes | 256 * 1024 |
| executionPolicy.maxToolInputBytes | 1024 * 1024 |
| executionPolicy.maxToolOutputBytes | 4 * 1024 * 1024 |
| executionPolicy.maxBridgeRequests | 256 |
| executionPolicy.maxInFlightBridgeRequests | 32 |
| fetchPolicy | disabled |
| fetchPolicy.maxResponseBytes | 1024 * 1024 |
| fetchPolicy.allowRedirects | false |
| fetchPolicy.maxRedirects | 10 |
| approval.mode | "callback" |
Worker-pool size is process-global. By default, code mode uses a dynamic memory-based limit capped at 32 workers. The default admits at least one active invocation, then only admits another worker when available memory can cover the configured QuickJS memory limit plus runtime overhead. Override it explicitly with:
import { setMaxWorkers } from "experimental-ai-sdk-code-mode";
setMaxWorkers(8);
setMaxWorkers(undefined); // reset to the dynamic memory-based default
Bundled Worker Assets
Normal Node.js usage does not require worker asset configuration. By default, the package starts an inline Node.js worker from a generated data URL with the QuickJS runtime bundle and asyncify WASM bytes embedded in that worker source, so serverless packagers do not need to preserve sibling worker or WASM files next to the bundled package entry.
If an environment disallows data: URL workers, or if you want to ship an
explicit custom worker asset, configure the runtime before starting code-mode
invocations:
import { setCodeModeWorkerUrl } from "experimental-ai-sdk-code-mode";
setCodeModeWorkerUrl(new URL("./code-mode-worker.mjs", import.meta.url));
Custom workers must be self-contained. The package does not publish worker or WASM asset subpaths.
Concurrency
Code mode uses a bounded worker pool. Each active invocation checks out one worker and creates a fresh QuickJS module, runtime, and context for that run. When the run completes normally, the worker returns to the idle pool. When a run times out, aborts, or the worker fails, that worker is retired and replaced on a future invocation.
The worker boundary is intentional. QuickJS can suspend while host tools execute, so each active invocation still needs an independent asyncified QuickJS/WASM instance. Workers also give the host a hard termination boundary for runaway code; instantiating multiple WASM modules in the main thread would preserve asyncify independence, but it would not provide the same event-loop isolation or reliable timeout kill path.
Tool calls inside one sandbox can also run concurrently:
const [profile, invoices, tickets] = await Promise.all([
tools.getProfile({ userId }),
tools.listInvoices({ userId }),
tools.listTickets({ userId }),
]);
return { profile, invoices, tickets };
Use setMaxWorkers to cap the number of active pooled workers. When the limit
is reached, new invocations fail with CodeModeConcurrencyError. The slot stays
occupied until the sandbox result and any accepted host bridge work have settled
or observed abort, so detached host work cannot silently outlive accounting.
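The admission rules described in this README (admit at least one invocation, cap the dynamic default at 32 workers, require free memory for the QuickJS limit plus overhead, honor an explicit setMaxWorkers value, and fail instead of queueing) can be sketched as a fail-fast slot counter. Everything here is hypothetical: WorkerSlots, PoolFullError, and the overheadBytes estimate are illustrative names, the package's real overhead accounting is not documented, and the package itself throws CodeModeConcurrencyError.

```typescript
// Hypothetical sketch of fail-fast worker admission; not the package's code.
class PoolFullError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PoolFullError";
  }
}

interface AdmissionOptions {
  explicitMax?: number; // what setMaxWorkers(n) would configure
  memoryLimitBytes: number; // executionPolicy.memoryLimitBytes
  overheadBytes: number; // assumed per-worker runtime overhead
  availableBytes: () => number; // free-memory probe supplied by the host
}

const DYNAMIC_WORKER_CAP = 32;

class WorkerSlots {
  private active = 0;
  private readonly opts: AdmissionOptions;

  constructor(opts: AdmissionOptions) {
    this.opts = opts;
  }

  get activeCount(): number {
    return this.active;
  }

  private canAdmit(): boolean {
    const { explicitMax, memoryLimitBytes, overheadBytes, availableBytes } = this.opts;
    if (explicitMax !== undefined) return this.active < explicitMax;
    if (this.active === 0) return true; // always admit at least one invocation
    if (this.active >= DYNAMIC_WORKER_CAP) return false;
    return availableBytes() >= memoryLimitBytes + overheadBytes;
  }

  // Returns a release callback. The caller invokes it only after the sandbox
  // result and all accepted bridge work have settled; extra calls are ignored.
  acquire(): () => void {
    if (!this.canAdmit()) {
      throw new PoolFullError(`worker limit reached (${this.active} active)`);
    }
    this.active++;
    let released = false;
    return () => {
      if (!released) {
        released = true;
        this.active--;
      }
    };
  }
}
```

Failing fast keeps the accounting simple: a caller that prefers waiting can retry at a higher level instead of holding a hidden queue inside the pool.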
Every tool or fetch promise created inside code mode must be awaited or otherwise
handled before returning. An unawaited bridge call fails with
CodeModeDetachedBridgeRequestError; an observed bridge call that is still
pending when the script returns is aborted and also fails the invocation.
maxBridgeRequests limits total bridge calls per invocation, and
maxInFlightBridgeRequests limits concurrent tool/fetch calls inside one
sandbox.
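A per-invocation limiter for those two counters might look like the sketch below. It rests on assumptions: BridgeLimiter and BridgeLimitError are hypothetical (the package's own error is CodeModeBridgeLimitError), and the README does not say whether calls beyond maxInFlightBridgeRequests are rejected or queued; this sketch rejects them. assertSettled models the rule that a bridge call still pending when the script returns fails the invocation.

```typescript
// Hypothetical per-invocation bridge limiter; not the package's implementation.
class BridgeLimitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BridgeLimitError";
  }
}

class BridgeLimiter {
  private total = 0;
  private inFlight = 0;
  private readonly maxTotal: number;
  private readonly maxInFlight: number;

  // Defaults mirror maxBridgeRequests (256) and maxInFlightBridgeRequests (32).
  constructor(maxTotal = 256, maxInFlight = 32) {
    this.maxTotal = maxTotal;
    this.maxInFlight = maxInFlight;
  }

  get inFlightCount(): number {
    return this.inFlight;
  }

  // Runs one tool/fetch bridge call if both limits allow it.
  async run<T>(call: () => Promise<T>): Promise<T> {
    if (this.total >= this.maxTotal) {
      throw new BridgeLimitError(`more than ${this.maxTotal} bridge requests`);
    }
    if (this.inFlight >= this.maxInFlight) {
      throw new BridgeLimitError(`more than ${this.maxInFlight} concurrent bridge requests`);
    }
    this.total++;
    this.inFlight++;
    try {
      return await call();
    } finally {
      this.inFlight--;
    }
  }

  // Called when the script returns: any call still pending fails the invocation.
  assertSettled(): void {
    if (this.inFlight > 0) {
      throw new Error(`${this.inFlight} bridge request(s) still pending at return`);
    }
  }
}
```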
Tool Semantics
Nested tool calls preserve the important AI SDK behavior:
- tool inputs are validated against each tool's inputSchema
- execute receives forwarded ToolExecutionOptions, including abort signals
- nested calls get derived toolCallId values for tracing
- thrown tool errors are propagated
- async iterable tool outputs are consumed and the final output is returned
- tools without execute are rejected
- unknown tools fail clearly
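The async-iterable rule can be illustrated with a small helper. finalToolOutput is hypothetical; it only shows the idea that streamed preliminary outputs are drained and the last value is what sandbox code receives. Throwing on an empty stream is a choice of this sketch, not documented package behavior.

```typescript
// Hypothetical helper: drain an AsyncIterable tool output and keep only its
// final value; plain (non-iterable) outputs pass through unchanged.
async function finalToolOutput<T>(output: T | AsyncIterable<T>): Promise<T> {
  if (typeof output === "object" && output !== null && Symbol.asyncIterator in output) {
    let last: T | undefined;
    let seen = false;
    for await (const chunk of output as AsyncIterable<T>) {
      last = chunk; // earlier chunks are preliminary results
      seen = true;
    }
    if (!seen) throw new Error("async iterable tool output produced no values");
    return last as T;
  }
  return output as T;
}
```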
Only top-level tool names are intended for the public API:
await tools.search({ query: "..." });
Observability
Code mode exposes nested bridge activity without parsing generated code or final tool results. Lifecycle hooks fire for nested tool calls, nested tool results, fetch requests, fetch results, interrupts, and the final per-invocation trace:
const codeMode = createCodeModeTool(tools, {
lifecycle: {
onNestedToolCall: (event) => {
console.log(event.toolName, event.toolCallId);
},
onTrace: (trace) => {
console.log(trace.status, trace.bridgeRequests.length);
},
},
});
Lifecycle hook errors are isolated from sandbox execution. Provide
lifecycle.onHookError if hook failures should be recorded.
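One way to get that isolation is to wrap every hook call, as in this hypothetical sketch. invokeHookIsolated is not a package export, and for simplicity it passes the original event to onHookError, whereas the package passes a CodeModeLifecycleHookErrorEvent.

```typescript
// Hypothetical sketch: a throwing or rejecting lifecycle hook is reported to
// onHookError (when provided) and never propagates into the sandbox run.
type LifecycleHook<E> = (event: E) => void | Promise<void>;

async function invokeHookIsolated<E>(
  hook: LifecycleHook<E> | undefined,
  event: E,
  onHookError?: (error: unknown, event: E) => void | Promise<void>,
): Promise<void> {
  if (hook === undefined) return;
  try {
    await hook(event);
  } catch (error) {
    try {
      await onHookError?.(error, event);
    } catch {
      // A failing error handler is swallowed too, so execution is unaffected.
    }
  }
}
```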
For OpenTelemetry, pass an OTEL-compatible tracer:
const codeMode = createCodeModeTool(tools, {
telemetry: {
isEnabled: true,
tracer,
functionId: "agent.code_mode",
metadata: { runtime: "ash" },
},
});
This emits spans for the outer code-mode invocation and each nested tool/fetch
bridge request. Raw source, inputs, and outputs are not recorded; telemetry
attributes include names, ids, status, replay flags, and byte sizes. Set
recordInputs: false or recordOutputs: false to omit size attributes for
those directions.
To expose nested bridge activity to the model, enable the AI SDK
toModelOutput mapping:
const codeMode = createCodeModeTool(tools, {
modelOutput: {
includeNestedToolSummary: true,
includeNestedToolOutputs: true,
includeFetchSummary: true,
},
});
Completed code_mode results remain unchanged for host code, but the
model-visible tool output becomes:
{
result: { foo, barId: { id: bar.id } },
nestedTools: [
{
kind: "tool",
toolName: "getBar",
toolCallId: "call_1:tool-1",
status: "fulfilled",
replayed: false,
output: { type: "json", value: { id: bar.id } },
},
],
}
Inputs are not included in this model-visible summary. Nested outputs are only
included when includeNestedToolOutputs is enabled. When a nested tool defines
AI SDK toModelOutput, code mode uses it; otherwise it applies AI SDK's
default text/json tool-output mapping. Interruption
results are not wrapped, so approval and host-interrupt continuation helpers can
still find the pending continuation.
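For a nested tool without its own toModelOutput, the default mapping and the includeNestedToolOutputs switch can be sketched as follows. The function names and the NestedToolSummaryEntry type are hypothetical; the text/json split follows the output shape shown above and AI SDK's default tool-output mapping, but this is not the package's code.

```typescript
// Hypothetical sketch of building one model-visible nestedTools entry.
type ModelVisibleOutput =
  | { type: "text"; value: string }
  | { type: "json"; value: unknown };

interface NestedToolSummaryEntry {
  kind: "tool";
  toolName: string;
  toolCallId: string;
  status: string;
  replayed: boolean;
  output?: ModelVisibleOutput;
}

// Default mapping: strings become text parts, everything else becomes JSON.
function defaultToolModelOutput(output: unknown): ModelVisibleOutput {
  return typeof output === "string"
    ? { type: "text", value: output }
    : { type: "json", value: output };
}

function toSummaryEntry(
  call: { toolName: string; toolCallId: string; status: string; replayed: boolean; input: unknown; output: unknown },
  includeOutputs: boolean,
): NestedToolSummaryEntry {
  // The input is deliberately never copied into the summary.
  const entry: NestedToolSummaryEntry = {
    kind: "tool",
    toolName: call.toolName,
    toolCallId: call.toolCallId,
    status: call.status,
    replayed: call.replayed,
  };
  return includeOutputs ? { ...entry, output: defaultToolModelOutput(call.output) } : entry;
}
```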
Model-visible summaries are bound to the specific code_mode execution that
created them. If the runtime cannot bind a summary to the current invocation, it
returns the normal output with an empty summary rather than using a stale or
shared trace.
Host Interrupts
Host tools can pause code mode for external work that is not approval, such as
connection OAuth. Call requestCodeModeInterrupt from inside the host tool and
store the returned CodeModeInterrupt in your session state.
For example, a connection-backed tool can interrupt with
{ kind: "connection-auth", ... }, let the host start and complete the OAuth
flow, then resume the same code-mode invocation with
continueCodeModeInterrupt. The replay ledger prevents already-completed tool
and fetch calls from running again before the interrupted tool receives the
OAuth resolution.
import { tool } from "ai";
import { z } from "zod";
import {
continueCodeModeInterrupt,
createCodeModeTool,
isCodeModeInterrupt,
replaceCodeModeInterruptResult,
requestCodeModeInterrupt,
unwrapCodeModeResult,
type CodeModeToolExecutionOptions,
} from "experimental-ai-sdk-code-mode";
const tools = {
connectionTool: tool({
inputSchema: z.object({ connectionId: z.string() }),
execute: async ({ connectionId }, options) => {
const { codeModeInterrupt } = options as CodeModeToolExecutionOptions;
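// First run: no interrupt has been resolved yet, so pause and ask the host for OAuth.
// On continuation, codeModeInterrupt carries the host-provided resolution.
if (codeModeInterrupt === undefined) {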
requestCodeModeInterrupt({
kind: "connection-auth",
connectionId,
scopes: ["read:items"],
});
}
return fetchWithConnection({
connectionId,
token: codeModeInterrupt.resolution.token,
});
},
}),
};
const codeMode = createCodeModeTool(tools);
const result = await codeMode.execute?.(
{
js: `
const response = await tools.connectionTool({ connectionId: "conn_1" });
return { id: response.id, title: response.title };
`,
},
{ toolCallId: "call_1", messages },
);
const normalized = unwrapCodeModeResult(result);
if (normalized.status === "interrupted" && isCodeModeInterrupt(normalized.interrupt)) {
session.state.codeMode = normalized.interrupt;
// Start and complete OAuth using normalized.interrupt.payload.
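}
// Later, after OAuth completes (oauthToken comes from your OAuth flow):
const storedInterrupt = session.state.codeMode;
const finalOutput = await continueCodeModeInterrupt({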
interrupt: storedInterrupt,
resolution: { token: oauthToken },
tools,
});
messages = replaceCodeModeInterruptResult(messages, storedInterrupt, finalOutput);
CodeModeInterrupt is a JSON-serializable record describing the paused nested
call (interruptId, toolName, toolCallId, outerToolCallId, input,
payload) plus an opaque, host-signed continuation replay capability. Persist
the whole interruption and pass it back to continueCodeModeInterrupt to resume.
Generic interruptions do not synthesize AI SDK approval messages;
approval-specific helpers below still do.
Approval
Code mode preserves AI SDK approval semantics for nested host tools. If sandbox
code calls a tool with needsApproval: true, approval is requested for that
inner tool name and input, not for the outer code_mode call.
There are two approval modes.
Callback Approval
Callback approval is the default mode. It is useful when the host can decide synchronously or asynchronously during the same code-mode invocation.
Without an approval callback, an approval-required nested tool fails with
CodeModeToolApprovalRequiredError:
const codeMode = createCodeModeTool({
deleteFile: tool({
inputSchema: z.object({ path: z.string() }),
needsApproval: true,
execute: async ({ path }) => deleteFile(path),
}),
});
Provide approval.onApprovalRequired to approve or deny before the nested tool
executes:
const codeMode = createCodeModeTool(tools, {
approval: {
onApprovalRequired: async ({ toolName, input, toolCallId }) => {
const approved = await askUserForApproval({ toolName, input, toolCallId });
return approved ? "approved" : { approved: false, reason: "User denied" };
},
},
});
If the callback denies approval, the invocation fails with
CodeModeToolApprovalDeniedError.
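onApprovalRequired may answer with a string or an object. A minimal sketch of folding those three accepted shapes into one decision, and of refusing to run execute on denial, could look like this. normalizeApprovalDecision, runWithApproval, and ApprovalDeniedError are hypothetical names; the package itself throws CodeModeToolApprovalDeniedError.

```typescript
// Hypothetical sketch; mirrors the return type of approval.onApprovalRequired.
type ApprovalDecision = "approved" | "denied" | { approved: boolean; reason?: string };

class ApprovalDeniedError extends Error {
  constructor(toolName: string, reason?: string) {
    super(`Approval denied for ${toolName}${reason === undefined ? "" : `: ${reason}`}`);
    this.name = "ApprovalDeniedError";
  }
}

function normalizeApprovalDecision(decision: ApprovalDecision): { approved: boolean; reason?: string } {
  if (decision === "approved") return { approved: true };
  if (decision === "denied") return { approved: false };
  return { approved: decision.approved, reason: decision.reason };
}

// Asks for approval before running the nested tool's execute function.
async function runWithApproval<T>(
  toolName: string,
  decide: () => ApprovalDecision | Promise<ApprovalDecision>,
  execute: () => Promise<T>,
): Promise<T> {
  const { approved, reason } = normalizeApprovalDecision(await decide());
  if (!approved) throw new ApprovalDeniedError(toolName, reason);
  return execute();
}
```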
AI SDK Approval Flow
Use interrupt approval when you want to plug into an existing AI SDK or Ash human-in-the-loop approval flow.
const codeMode = createCodeModeTool(tools, {
approval: {
mode: "interrupt",
},
});
In interrupt mode, an approval-required nested tool returns a
CodeModeApprovalInterrupt instead of executing. Approval is built on the
generic host-interrupt machinery: a CodeModeApprovalInterrupt is a
CodeModeInterrupt whose payload kind is the reserved
"ai-sdk-code-mode/tool-approval". It exposes the inner tool name/input, an
interruptId (used as the AI SDK approval id), and the opaque continuation.
Store the interrupt by interruptId; the continuation is host state and should
not be reconstructed from model-visible messages.
Approval responses are runtime-validated. getCodeModeApprovalResponse ignores
malformed approval response parts, including non-boolean approved values, and
continueCodeModeApproval rejects malformed responses before replay.
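The kind of runtime validation described above can be sketched as a type guard plus a lookup. The part shape ({ type: "tool-approval-response", approvalId, approved, reason? }) is an assumption modeled on AI SDK approval response parts, and isApprovalResponsePart and findApprovalResponse are hypothetical helpers, not the package's getCodeModeApprovalResponse.

```typescript
// Hypothetical sketch; the part shape is an assumption modeled on AI SDK
// tool-approval-response parts, not copied from this package.
interface ApprovalResponsePart {
  type: "tool-approval-response";
  approvalId: string;
  approved: boolean;
  reason?: string;
}

function isApprovalResponsePart(part: unknown): part is ApprovalResponsePart {
  if (typeof part !== "object" || part === null) return false;
  const p = part as Record<string, unknown>;
  return (
    p.type === "tool-approval-response" &&
    typeof p.approvalId === "string" &&
    typeof p.approved === "boolean" && // rejects "yes", 1, null, and so on
    (p.reason === undefined || typeof p.reason === "string")
  );
}

// Returns the first well-formed response for approvalId, skipping malformed parts.
function findApprovalResponse(
  messages: ReadonlyArray<{ role: string; content: unknown }>,
  approvalId: string,
): ApprovalResponsePart | undefined {
  for (const message of messages) {
    if (!Array.isArray(message.content)) continue;
    for (const part of message.content) {
      if (isApprovalResponsePart(part) && part.approvalId === approvalId) return part;
    }
  }
  return undefined;
}
```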
The flow is:
- The model calls code_mode.
- Sandbox code calls an approval-required nested tool.
- Code mode returns CodeModeApprovalInterrupt.
- toCodeModeApprovalMessages(interrupt) creates AI SDK approval message parts for the original nested tool.
- Your approval UI records a tool-approval-response.
- getCodeModeApprovalResponse(messages, interrupt) reads that response.
- continueCodeModeApproval(...) restarts the same code with the stored continuation.
import type { ModelMessage } from "ai";
import {
continueCodeModeApproval,
createCodeModeTool,
getCodeModeApprovalResponse,
isCodeModeApprovalInterrupt,
toCodeModeApprovalMessages,
type CodeModeApprovalInterrupt,
} from "experimental-ai-sdk-code-mode";
const codeMode = createCodeModeTool(tools, {
approval: {
mode: "interrupt",
},
});
const pendingApprovals = new Map<string, CodeModeApprovalInterrupt>();
const messages: ModelMessage[] = [];
const result = await codeMode.execute?.(
{
js: `
const file = await tools.readFile({ path: "notes.md" });
await tools.deleteFile({ path: "notes.md" });
return { deleted: true, file };
`,
},
{
toolCallId: "call_1",
messages,
},
);
if (isCodeModeApprovalInterrupt(result)) {
pendingApprovals.set(result.interruptId, result);
messages.push(...toCodeModeApprovalMessages(result));
// Render the approval request with your AI SDK/Ash approval UI.
}
// Later, after the UI appends a tool-approval-response message. The AI SDK
// approval id is the interrupt id of the stored approval interrupt:
async function continueAfterApproval(approvalId: string) {
const interrupt = pendingApprovals.get(approvalId);
if (interrupt === undefined) {
throw new Error(`Unknown approval: ${approvalId}`);
}
const approvalResponse = getCodeModeApprovalResponse(messages, interrupt);
if (approvalResponse === undefined) {
throw new Error(`Approval response is still pending: ${approvalId}`);
}
const output = await continueCodeModeApproval({
interrupt,
approvalResponse,
tools,
});
pendingApprovals.delete(interrupt.interruptId);
return output;
}
toCodeModeApprovalMessages exposes the approval as the original inner tool
name and input. The user approves deleteFile, not code_mode.
Continuation Replay
Approval and generic interruption continuation use restart-and-replay. Code mode
restarts the same program and replays the recorded bridge ledger so
already-completed tool and fetch calls are not repeated. If replayed code does
not issue the same bridge calls in the same order, continuation fails with
CodeModeProtocolError instead of guessing.
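Restart-and-replay can be sketched as an ordered ledger: completed calls are recorded, a restarted run gets recorded outputs back instead of re-executing, and any divergence fails rather than guessing. BridgeLedger and ReplayMismatchError are hypothetical, and the sketch assumes sequential bridge calls so their order is stable; it does not model how the package orders concurrent calls.

```typescript
// Hypothetical restart-and-replay ledger; not the package's internals.
// Assumes bridge calls are issued sequentially so their order is stable.
interface LedgerEntry {
  kind: "tool" | "fetch";
  name: string;
  input: string; // JSON-serialized input
  output: unknown;
}

class ReplayMismatchError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ReplayMismatchError";
  }
}

class BridgeLedger {
  readonly entries: LedgerEntry[];
  private cursor = 0;

  constructor(recorded: LedgerEntry[] = []) {
    this.entries = recorded;
  }

  async call(kind: "tool" | "fetch", name: string, input: unknown, execute: () => Promise<unknown>): Promise<unknown> {
    const serializedInput = JSON.stringify(input);
    const expected = this.entries[this.cursor];
    if (expected !== undefined) {
      // Replaying: the call must match the recorded one exactly.
      if (expected.kind !== kind || expected.name !== name || expected.input !== serializedInput) {
        throw new ReplayMismatchError(`bridge call #${this.cursor} (${name}) diverged from the ledger`);
      }
      this.cursor++;
      return expected.output; // completed earlier, so execute is not run again
    }
    // Past the end of the ledger: run for real and record the result.
    const output = await execute();
    this.entries.push({ kind, name, input: serializedInput, output });
    this.cursor++;
    return output;
  }
}
```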
Continuations also replay deterministic guest state for no-argument Date,
Date.now(), and Math.random(). After each completed async host bridge call,
the guest clock is reset to the recorded host timestamp for that call. WebCrypto
and performance are not exposed in the sandbox.
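The same record-then-replay idea applies to guest nondeterminism. In this hypothetical sketch, each Math.random() or Date.now() value would be routed through next(): recorded values come back in order, and once execution moves past the point where the earlier run stopped, fresh values are produced and recorded. DeterministicSource is an illustrative name, and the sketch does not model the package's clock reset after each completed host bridge call.

```typescript
// Hypothetical sketch of deterministic replay for guest nondeterminism.
class DeterministicSource {
  readonly values: number[];
  private index = 0;

  constructor(recorded: number[] = []) {
    this.values = recorded;
  }

  // Returns the next recorded value, or produces and records a fresh one.
  next(produce: () => number): number {
    const recorded = this.values[this.index];
    this.index++;
    if (recorded !== undefined) return recorded;
    const fresh = produce();
    this.values.push(fresh);
    return fresh;
  }
}
```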
Continuation and interruption objects are signed bearer capabilities. A sandboxed program can return JSON shaped like a code-mode interrupt, but helper APIs only treat host-signed continuations as resumable. Continuations expire after one hour by default.
The default signing key is random and process-local. Hosts that need continuations to survive process restarts must configure a stable secret before creating or resuming continuations:
import { setCodeModeContinuationSigningKey } from "experimental-ai-sdk-code-mode";
setCodeModeContinuationSigningKey(process.env.CODE_MODE_CONTINUATION_KEY);
The secret is used to authenticate the source, replay ledger, deterministic
state, interrupt ids, tool names, and tool inputs recorded in the continuation.
Mutating any signed field causes replay to fail with CodeModeProtocolError.
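A signed, expiring capability of this kind can be sketched with HMAC-SHA256 from node:crypto. The "<base64url payload>.<base64url mac>" format, the payload fields, and both function names are assumptions for illustration; the package's continuation format is opaque and may differ. The one-hour expiry mirrors the documented default.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of a signed, expiring continuation using HMAC-SHA256.
interface ContinuationPayload {
  source: string;
  ledger: unknown[];
  expiresAt: number; // epoch milliseconds
}

const ONE_HOUR_MS = 60 * 60 * 1000;

function signContinuation(source: string, ledger: unknown[], key: string, now = Date.now()): string {
  const payload: ContinuationPayload = { source, ledger, expiresAt: now + ONE_HOUR_MS };
  const body = Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
  const mac = createHmac("sha256", key).update(body).digest("base64url");
  return `${body}.${mac}`;
}

function verifyContinuation(token: string, key: string, now = Date.now()): ContinuationPayload {
  const parts = token.split(".");
  const body = parts[0];
  const mac = parts[1];
  if (parts.length !== 2 || body === undefined || mac === undefined) {
    throw new Error("malformed continuation");
  }
  const expected = createHmac("sha256", key).update(body).digest();
  const actual = Buffer.from(mac, "base64url");
  // timingSafeEqual requires equal lengths, so check that first.
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("continuation signature mismatch");
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as ContinuationPayload;
  if (now > payload.expiresAt) throw new Error("continuation expired");
  return payload;
}
```

Because the MAC covers the whole serialized payload, changing any field (source, ledger, or expiry) invalidates the token, and a different key (for example a new random per-process key after a restart) rejects every earlier token.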
Fetch
fetch is not available by default. Enable it by passing a host fetch function
and an allow policy:
const codeMode = createCodeModeTool(tools, {
fetchPolicy: {
fetch: globalThis.fetch,
allowedOrigins: ["https://api.example.com"],
allowedMethods: ["GET", "POST"],
maxResponseBytes: 256 * 1024,
},
});
Fetch policy rules:
- URLs must be http: or https:
- the original URL and final response URL must match allowedOrigins or allowedUrlPrefixes
- allowedUrlPrefixes entries are origin plus path prefixes only; query strings and fragments in configured prefixes are rejected
- allowed methods default to GET and HEAD
- redirects are not followed unless allowRedirects is true; when enabled, code mode follows each redirect with another host fetch that is subject to the same fetch policy
- response bodies are size-limited while streaming where the host Response exposes a readable body, and always before they enter the sandbox
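The URL and method rules above can be sketched as a single predicate. Field names follow fetchPolicy, but isFetchAllowed and FetchPolicySketch are hypothetical, and the path-segment-aware prefix match (so /v1 does not match /v10) is a choice of this sketch rather than documented package behavior. A host would apply it to the original URL and again to the final response URL.

```typescript
// Hypothetical sketch of the fetch URL/method checks; not the package's code.
interface FetchPolicySketch {
  allowedOrigins?: string[];
  allowedUrlPrefixes?: string[];
  allowedMethods?: string[];
}

function parseUrl(raw: string): URL | undefined {
  try {
    return new URL(raw);
  } catch {
    return undefined; // not an absolute URL
  }
}

function isFetchAllowed(rawUrl: string, method: string, policy: FetchPolicySketch): boolean {
  const parsed = parseUrl(rawUrl);
  if (parsed === undefined) return false;
  const url: URL = parsed;
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const methods = (policy.allowedMethods ?? ["GET", "HEAD"]).map((m) => m.toUpperCase());
  if (!methods.includes(method.toUpperCase())) return false;

  if ((policy.allowedOrigins ?? []).some((origin) => new URL(origin).origin === url.origin)) {
    return true;
  }
  return (policy.allowedUrlPrefixes ?? []).some((prefix) => {
    const allowed = new URL(prefix);
    if (allowed.search !== "" || allowed.hash !== "") {
      throw new Error(`URL prefix must not contain a query or fragment: ${prefix}`);
    }
    const base = allowed.pathname;
    // Path-segment-aware match: /v1 matches /v1 and /v1/x but not /v10.
    const pathMatches = base.endsWith("/")
      ? url.pathname.startsWith(base)
      : url.pathname === base || url.pathname.startsWith(`${base}/`);
    return url.origin === allowed.origin && pathMatches;
  });
}
```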
The sandbox fetch response supports ok, status, statusText, url,
headers.get(), headers.entries(), text(), json(), and arrayBuffer().
Isolation And Security
Every invocation gets a fresh global scope. The sandbox disables or omits common host escape hatches:
- eval
- Function
- Node globals such as process, require, and module
- module loading
- host filesystem access
The runtime also applies source-size, memory, stack, timeout, result-size, tool-input-size, tool-output-size, bridge-count, bridge-concurrency, and fetch-response-size limits.
Treat the sandbox as defense in depth. Any capability you expose through tools
or fetch is available to generated code, so keep tools narrow and validate
their inputs.
TypeScript
Code mode strips TypeScript syntax before execution. This is type stripping only; it is not a full TypeScript compiler. TypeScript types are accepted for model ergonomics, but the sandbox executes JavaScript.
Errors
The package exports these error classes:
CodeModeError
CodeModeTimeoutError
CodeModeAbortedError
CodeModeConcurrencyError
CodeModeSourceTooLargeError
CodeModeBridgeLimitError
CodeModeDetachedBridgeRequestError
CodeModeProtocolError
CodeModeToolError
CodeModeToolApprovalRequiredError
CodeModeToolApprovalDeniedError
CodeModeFetchError
All code-mode-specific errors include a code string and may include details
for debugging.
Errors thrown by host tools or host fetch implementations are sanitized before
they cross into sandboxed code. Sandboxed code can read a safe name, message,
and code, but not host stack traces or diagnostic details. Full diagnostics
remain available to host lifecycle hooks, traces, and telemetry.
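The sanitization boundary can be sketched as a projection onto a few safe fields. sanitizeHostError and SandboxSafeError are hypothetical names; the sketch keeps name, message, and a string code, while the original error (with its stack and extra properties) stays on the host side for hooks, traces, and telemetry.

```typescript
// Hypothetical sketch of sanitizing a host error before it reaches sandbox code.
interface SandboxSafeError {
  name: string;
  message: string;
  code?: string;
}

function sanitizeHostError(error: unknown): SandboxSafeError {
  if (error instanceof Error) {
    const code = (error as Error & { code?: unknown }).code;
    // Copy only the safe fields; stack and diagnostic properties are dropped.
    return typeof code === "string"
      ? { name: error.name, message: error.message, code }
      : { name: error.name, message: error.message };
  }
  // Non-Error throwables carry no trustworthy structure; use a generic shape.
  return { name: "Error", message: typeof error === "string" ? error : "Host call failed" };
}
```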
Development
pnpm install
pnpm format:check
pnpm lint
pnpm typecheck
pnpm test
pnpm validate
pnpm pack:check
pnpm lint runs Biome and Knip. pnpm format applies Biome formatting, import
sorting, and safe fixes.
The test suite covers core execution, generated prompts, tool bridging, approvals, fetch, exceptions, sandbox hardening, worker concurrency, and concurrent tool calls within one worker.
End-to-end tests use the Vercel AI Gateway with
anthropic/claude-haiku-4.5 and are not part of pnpm test or
pnpm validate. They write the model-generated code-mode programs to
code-samples/*.ts. Set AI_GATEWAY_API_KEY, then run:
pnpm test:e2e
Release
The GitHub PR workflow runs pnpm validate and pnpm pack:check. It does not
run e2e tests because those require AI Gateway credentials.
npm publishing is handled by .github/workflows/release.yml when a GitHub
release is published. Before the first release, configure npm trusted publishing
for:
- package: experimental-ai-sdk-code-mode
- owner/repository: vercel-labs/ai-sdk-code-mode
- workflow filename: release.yml
- environment: npm
- allowed action: npm publish
To release, update package.json to the target version, create a matching
GitHub release tag like v1.0.1, and publish the GitHub release. The workflow
checks that the tag matches the package version, validates the project, checks
the npm package contents, and publishes to npm. GitHub prereleases publish with
the next npm tag; normal releases publish with latest.
Benchmark
pnpm bench
The benchmark in benchmark/three-roundtrips.mjs measures a minimal script that
does three sequential sandbox-to-host tool round trips and no meaningful
compute. Use BENCH_WARMUP and BENCH_ITERATIONS to adjust run length:
BENCH_WARMUP=50 BENCH_ITERATIONS=1000 pnpm bench
License
MIT License. Copyright (c) 2026 Vercel Inc.
