tool-sandbox

Let AI agents write and execute sandboxed code to call tools.
Library for executing code and calling tools in a sandbox. Particularly useful for letting AI agents write and execute code, following the code execution pattern for AI agents.
Why?
When agents call tools directly, every tool definition and intermediate result flows through the context window. This gets expensive fast.
Code execution solves this. The agent writes code that calls tools, which:
- Saves tokens: Load tool definitions on-demand, filter data before returning to the model.
- Enables complex logic: Loops, conditionals, error handling in one execution instead of many tool calls.
- Keeps data private: Intermediate results stay in the execution environment.
- Runs safely: Unlike eval(), code runs in a WASM sandbox with no filesystem, network, or Node.js access.
- Runs anywhere: Works in Node.js, browsers, Deno, Bun, and Cloudflare Workers.
Quick Start
npm install tool-sandbox

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'listUsers',
description: 'List all users',
inputSchema: {type: 'object'},
handler: async () => [
// In real life, you'd fetch this from an API
// or plug in an MCP server (see below!)
{email: '[email protected]', active: false},
{email: '[email protected]', active: true},
],
},
{
name: 'sendReactivationEmail',
description: 'Send a reactivation email to a user',
inputSchema: {type: 'object', properties: {to: {type: 'string'}}},
handler: async () => ({sent: true}),
},
];
const sandbox = await createSandbox({tools});
// Code can be generated by an LLM - see "Using with LLMs" below
await sandbox.execute.handler({
code: `
const users = await tool('listUsers', {});
const inactiveUsers = users.filter(u => !u.active);
for (const user of inactiveUsers) {
await tool('sendReactivationEmail', {to: user.email});
}
return {notified: inactiveUsers.length};
`,
});

Fetches users, filters to inactive, sends reactivation emails - all in one execution instead of many tool calls.
Using with LLMs
sandbox.execute is a Tool object you can pass to any LLM:

Anthropic (TypeScript SDK):
import Anthropic from '@anthropic-ai/sdk';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const anthropic = new Anthropic();
const sandbox = await createSandbox({tools});
const messages: Anthropic.MessageParam[] = [
{role: 'user', content: 'What is the weather in Tokyo and Paris?'},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages,
});
messages.push({role: 'assistant', content: response.content});
if (response.stop_reason === 'end_turn') {
const text = response.content.filter((b): b is Anthropic.TextBlock => b.type === 'text');
console.log('Response:', text.map((b) => b.text).join(''));
break;
}
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'execute') {
const result = await sandbox.execute.handler(block.input);
toolResults.push({type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result)});
}
}
messages.push({role: 'user', content: toolResults});
}

Anthropic API (fetch):

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const sandbox = await createSandbox({tools});
const messages = [
{role: 'user', content: 'What is the weather in Tokyo?'},
];
while (true) {
const response = await fetch('https://api.anthropic.com/v1/messages', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': process.env.ANTHROPIC_API_KEY,
'anthropic-version': '2023-06-01',
},
body: JSON.stringify({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages,
}),
});
const data = await response.json();
messages.push({role: 'assistant', content: data.content});
if (data.stop_reason === 'end_turn') {
console.log('Response:', data.content.filter((b) => b.type === 'text').map((b) => b.text).join(''));
break;
}
const toolResults = [];
for (const block of data.content) {
if (block.type === 'tool_use' && block.name === 'execute') {
const result = await sandbox.execute.handler(block.input);
toolResults.push({type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result)});
}
}
messages.push({role: 'user', content: toolResults});
}

OpenAI (TypeScript SDK):

import OpenAI from 'openai';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const openai = new OpenAI();
const sandbox = await createSandbox({tools});
const messages: OpenAI.ChatCompletionMessageParam[] = [
{role: 'user', content: 'What is the weather in Tokyo and Paris?'},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-5-mini-2025-08-07',
tools: [{
type: 'function',
function: {
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
},
}],
messages,
});
const choice = response.choices[0];
messages.push(choice.message);
if (!choice.message.tool_calls) {
console.log('Response:', choice.message.content);
break;
}
for (const toolCall of choice.message.tool_calls) {
if (toolCall.function.name === 'execute') {
const args = JSON.parse(toolCall.function.arguments);
const result = await sandbox.execute.handler(args);
messages.push({role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result)});
}
}
}

OpenAI API (fetch):

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const sandbox = await createSandbox({tools});
const messages = [
{role: 'user', content: 'What is the weather in Tokyo?'},
];
while (true) {
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
model: 'gpt-5-mini-2025-08-07',
tools: [{
type: 'function',
function: {
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
},
}],
messages,
}),
});
const data = await response.json();
const choice = data.choices[0];
messages.push(choice.message);
if (!choice.message.tool_calls) {
console.log('Response:', choice.message.content);
break;
}
for (const toolCall of choice.message.tool_calls) {
if (toolCall.function.name === 'execute') {
const args = JSON.parse(toolCall.function.arguments);
const result = await sandbox.execute.handler(args);
messages.push({role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result)});
}
}
}

Google Gemini (TypeScript SDK):

import {GoogleGenAI, type Content} from '@google/genai';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const genai = new GoogleGenAI({apiKey: process.env.GEMINI_API_KEY});
const sandbox = await createSandbox({tools});
const contents: Content[] = [
{role: 'user', parts: [{text: 'What is the weather in Tokyo and Paris?'}]},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await genai.models.generateContent({
model: 'gemini-3-flash-preview',
contents,
config: {
tools: [{
functionDeclarations: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
}],
}],
},
});
const calls = response.functionCalls;
if (!calls?.length) {
console.log('Response:', response.text);
break;
}
contents.push({role: 'model', parts: response.candidates?.[0]?.content?.parts ?? []});
const functionResponses = [];
for (const call of calls) {
if (call.name === 'execute') {
const result = await sandbox.execute.handler(call.args);
functionResponses.push({name: call.name, response: result});
}
}
contents.push({role: 'user', parts: functionResponses.map((r) => ({functionResponse: r}))});
}

Using with MCP
Convert MCP clients to sandbox tools:
npm install @modelcontextprotocol/sdk

import {createSandbox, fromMcpClients} from 'tool-sandbox';
import {Client} from '@modelcontextprotocol/sdk/client/index.js';
import {StreamableHTTPClientTransport} from '@modelcontextprotocol/sdk/client/streamableHttp.js';
// Connect to an MCP server (e.g. https://github.com/domdomegg/gmail-mcp)
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'));
const client = new Client({name: 'my-app', version: '1.0.0'});
await client.connect(transport);
const tools = await fromMcpClients({gmail: client});
const sandbox = await createSandbox({tools});

fromMcpClients fetches and wraps:
- Tools → gmail__send, gmail__search, etc.
- Prompts (with arguments) → gmail__prompt__compose
- Resources → gmail__resource__inbox
- Resource templates → parameterized resources like files__resource__file with {path}
Not supported: sampling, elicitation, roots, notifications, or other advanced MCP features.
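Once wrapped, sandbox code calls these tools by their prefixed names. A sketch of code an agent might execute against the Gmail tools above (the `query` argument is illustrative, not a confirmed part of the gmail-mcp schema):

```typescript
// Hypothetical agent-written code for sandbox.execute, using the
// prefixed tool names produced by fromMcpClients.
const agentCode = `
const threads = await tool('gmail__search', {query: 'is:unread'});
// Filter inside the sandbox so only the count reaches the model
return {unread: threads.length};
`;

// Then: await sandbox.execute.handler({code: agentCode});
```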
Sandbox Environment
Code in the sandbox has access to:
| API | Description |
|-----|-------------|
| tool(name, args) | Call a tool and await its result |
| console.log(...) | Debug output (visible to host) |
| store | Persistent object across executions |
| store._prev | Result from previous execution (read-only) |
Examples:
const _examples = [
// Get a tool's schema on-demand
`const schema = await tool('describe_tool', {name: 'gmail__send'});`,
// Use store._prev to continue from last result
`const updated = store._prev.map(x => x * 2); return updated;`,
// Use store to accumulate across executions
`store.total = (store.total || 0) + 1; return store.total;`,
];

Permissions
There are a couple of ways to control what the sandbox can do:
Per-tool-call checks
Use onBeforeToolCall to inspect each tool call and block dangerous ones:
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'readFile',
description: 'Read a file',
inputSchema: {type: 'object', properties: {path: {type: 'string'}}},
annotations: {readOnlyHint: true},
handler: async (args) => 'file contents...',
},
{
name: 'deleteFile',
description: 'Delete a file',
inputSchema: {type: 'object', properties: {path: {type: 'string'}}},
handler: async (args) => ({deleted: true}),
},
];
const readonlySandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
const tool = tools.find((t) => t.name === event.toolName);
// Only allow tools marked as read-only
if (!tool?.annotations?.readOnlyHint) {
throw new Error(`Tool ${event.toolName} is not allowed`);
}
},
});

Pre-execution review
Review the code before executing it, using another model, a SAST tool, or other logic:
import Anthropic from '@anthropic-ai/sdk';
import {createSandbox, type Tool} from 'tool-sandbox';
const anthropic = new Anthropic();
const sandbox = await createSandbox({tools});
// Get code from the model
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages: [{role: 'user', content: userPrompt}],
});
const block = response.content.find((b): b is Anthropic.ToolUseBlock => b.type === 'tool_use');
if (block) {
const {code} = block.input as {code: string};
// Review code somehow e.g. with SAST tool or another AI model
const review = await anthropic.messages.create({
model: 'claude-haiku-4-5',
system: 'Review this code against <policy>. Respond COMPLIANT or NON_COMPLIANT with a brief reason.',
messages: [{role: 'user', content: code}],
});
const verdict = (review.content[0] as Anthropic.TextBlock).text;
if (!verdict.startsWith('COMPLIANT')) {
throw new Error(`Code rejected: ${verdict}`);
}
await sandbox.execute.handler({code});
}

Other approaches: allowlists (only include safe tools), SAST tools, capability tokens.
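The allowlist approach is just filtering the tools array before creating the sandbox. A self-contained sketch (the Tool type is copied locally so the snippet runs standalone; `safeToolNames` is a hypothetical allowlist):

```typescript
// Local copy of the library's Tool shape, so this sketch is self-contained
type Tool = {
	name: string;
	description?: string;
	inputSchema: {type: 'object'; properties?: Record<string, unknown>; required?: string[]};
	handler: (args: unknown) => Promise<unknown>;
};

const allTools: Tool[] = [
	{name: 'readFile', inputSchema: {type: 'object'}, handler: async () => 'file contents...'},
	{name: 'deleteFile', inputSchema: {type: 'object'}, handler: async () => ({deleted: true})},
];

// Hypothetical allowlist of tools considered safe to expose
const safeToolNames = new Set(['readFile']);
const allowedTools = allTools.filter((t) => safeToolNames.has(t.name));

// Then: const sandbox = await createSandbox({tools: allowedTools});
```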
Other Use Cases
Audit logging:

import {createSandbox, type Tool} from 'tool-sandbox';
const auditLog: Array<{tool: string; args: unknown; result: unknown}> = [];
const tools: Tool[] = [
{
name: 'getUser',
description: 'Get user by ID',
inputSchema: {type: 'object', properties: {id: {type: 'string'}}},
handler: async (args) => ({id: '1', name: 'Alice'}),
},
];
const sandbox = await createSandbox({
tools,
onToolCallSuccess(event) {
// Could also add onToolCallError to log failures
// or onBeforeToolCall to log before tools are called
auditLog.push({
tool: event.toolName,
args: event.args,
result: event.result,
});
},
});

Caching:

import {createSandbox, type Tool} from 'tool-sandbox';
const cache = new Map<string, unknown>();
const tools: Tool[] = [
{
name: 'expensiveQuery',
description: 'Run an expensive database query',
inputSchema: {type: 'object', properties: {query: {type: 'string'}}},
handler: async (args) => ({rows: []}),
},
];
const sandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
const key = JSON.stringify([event.toolName, event.args]);
if (cache.has(key)) {
event.returnValue = cache.get(key);
}
},
onToolCallSuccess(event) {
const key = JSON.stringify([event.toolName, event.args]);
cache.set(key, event.result);
},
});

Replace PII with tokens before results reach the model, then restore real values when the model uses them in tool calls:
import {createSandbox, type Tool} from 'tool-sandbox';
// Bidirectional token store
const realToToken = new Map<string, string>();
const tokenToReal = new Map<string, string>();
let counter = 0;
function tokenize(value: string): string {
if (!realToToken.has(value)) {
const token = `[EMAIL_${++counter}]`;
realToToken.set(value, token);
tokenToReal.set(token, value);
}
return realToToken.get(value)!;
}
function redact(obj: unknown): unknown {
if (typeof obj === 'string') {
return obj; // Bare strings are left alone; only values under email-like keys are tokenized below
}
if (typeof obj !== 'object' || obj === null) return obj;
const result: Record<string, unknown> = {};
for (const [key, value] of Object.entries(obj)) {
if (key.toLowerCase().includes('email') && typeof value === 'string') {
result[key] = tokenize(value);
} else {
result[key] = redact(value);
}
}
return result;
}
function restore(obj: unknown): unknown {
if (typeof obj === 'string') {
return tokenToReal.get(obj) ?? obj;
}
if (typeof obj !== 'object' || obj === null) return obj;
const result: Record<string, unknown> = {};
for (const [key, value] of Object.entries(obj)) {
result[key] = restore(value);
}
return result;
}
const tools: Tool[] = [
{
name: 'getCustomers',
description: 'Get customer list',
inputSchema: {type: 'object'},
handler: async () => [{name: 'Alice', email: '[email protected]'}],
},
{
name: 'sendEmail',
description: 'Send an email',
inputSchema: {type: 'object', properties: {to: {type: 'string'}}},
handler: async (args) => ({sent: true}),
},
];
const sandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
// Restore real emails before calling tools
event.args = restore(event.args);
},
onToolCallSuccess(event) {
// Tokenize emails before returning to model
event.result = redact(event.result);
},
});
// Model sees: [{name: 'Alice', email: '[EMAIL_1]'}]
// Model calls: sendEmail({to: '[EMAIL_1]'})
// Actual call: sendEmail({to: '[email protected]'})

API Reference
createSandbox(options)
| Option | Description |
|--------|-------------|
| tools | Tool[] — Tools available in the sandbox |
| onBeforeToolCall | Called before each tool call |
| onToolCallSuccess | Called after successful tool call |
| onToolCallError | Called after failed tool call |
Sandbox
| Property/Method | Description |
|-----------------|-------------|
| execute | Tool object for code execution. Pass to LLM, call .handler({code}) |
| tools | Current tools (read-only) |
| store | Persistent store, shared with sandbox code |
| addTool(tool) | Add a tool at runtime |
| removeTool(name) | Remove a tool by name |
Tool
type Tool = {
name: string;
description?: string;
inputSchema: {type: 'object'; properties?: Record<string, unknown>; required?: string[]};
handler: (args: unknown) => Promise<unknown>;
};

Contributing
Pull requests are welcome on GitHub! To get started:
- Install Git and Node.js
- Clone the repository
- Install dependencies with npm install
- Run npm run test to run tests
- Build with npm run build
Releases
Versions follow the semantic versioning spec.
To release:
- Use npm version <major | minor | patch> to bump the version
- Run git push --follow-tags to push with tags
- Wait for GitHub Actions to publish to the NPM registry.