tool-sandbox

Let AI agents write and execute sandboxed code to call tools.
Library for executing code and calling tools in a sandbox. Particularly useful for letting AI agents write and execute code, following the code execution pattern for AI agents.
Why?
When agents call tools directly, every tool definition and intermediate result flows through the context window. This gets expensive fast.
Code execution solves this. The agent writes code that calls tools, which:
- Saves tokens: Load tool definitions on-demand, filter data before returning to the model.
- Enables complex logic: Loops, conditionals, error handling in one execution instead of many tool calls.
- Keeps data private: Intermediate results stay in the execution environment.
- Runs safely: Unlike eval(), code runs in a WASM sandbox with no filesystem, network, or Node.js access.
- Runs anywhere: Works in Node.js, browsers, Deno, Bun, and Cloudflare Workers.
Quick Start
npm install tool-sandbox

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'listUsers',
description: 'List all users',
inputSchema: {type: 'object'},
handler: async () => [
// In real life, you'd fetch this from an API
// or plug in an MCP server (see below!)
{email: '[email protected]', active: false},
{email: '[email protected]', active: true},
],
},
{
name: 'sendReactivationEmail',
description: 'Send a reactivation email to a user',
inputSchema: {type: 'object', properties: {to: {type: 'string'}}},
handler: async () => ({sent: true}),
},
];
const sandbox = await createSandbox({tools});
// Code can be generated by an LLM - see "Using with LLMs" below
await sandbox.execute.handler({
code: `
const users = await tool('listUsers', {});
const inactiveUsers = users.filter(u => !u.active);
for (const user of inactiveUsers) {
await tool('sendReactivationEmail', {to: user.email});
}
return {notified: inactiveUsers.length};
`,
});

Fetches users, filters to inactive, sends reactivation emails - all in one execution instead of many tool calls.
Using with LLMs
sandbox.execute is a Tool object you can pass to any LLM:

Anthropic (TypeScript SDK):
import Anthropic from '@anthropic-ai/sdk';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const anthropic = new Anthropic();
const sandbox = await createSandbox({tools});
const messages: Anthropic.MessageParam[] = [
{role: 'user', content: 'What is the weather in Tokyo and Paris?'},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages,
});
messages.push({role: 'assistant', content: response.content});
if (response.stop_reason === 'end_turn') {
const text = response.content.filter((b): b is Anthropic.TextBlock => b.type === 'text');
console.log('Response:', text.map((b) => b.text).join(''));
break;
}
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'execute') {
const result = await sandbox.execute.handler(block.input);
toolResults.push({type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result)});
}
}
messages.push({role: 'user', content: toolResults});
}

Anthropic API (fetch):

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const sandbox = await createSandbox({tools});
const messages = [
{role: 'user', content: 'What is the weather in Tokyo?'},
];
while (true) {
const response = await fetch('https://api.anthropic.com/v1/messages', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': process.env.ANTHROPIC_API_KEY,
'anthropic-version': '2023-06-01',
},
body: JSON.stringify({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages,
}),
});
const data = await response.json();
messages.push({role: 'assistant', content: data.content});
if (data.stop_reason === 'end_turn') {
console.log('Response:', data.content.filter((b) => b.type === 'text').map((b) => b.text).join(''));
break;
}
const toolResults = [];
for (const block of data.content) {
if (block.type === 'tool_use' && block.name === 'execute') {
const result = await sandbox.execute.handler(block.input);
toolResults.push({type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result)});
}
}
messages.push({role: 'user', content: toolResults});
}

OpenAI (TypeScript SDK):

import OpenAI from 'openai';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const openai = new OpenAI();
const sandbox = await createSandbox({tools});
const messages: OpenAI.ChatCompletionMessageParam[] = [
{role: 'user', content: 'What is the weather in Tokyo and Paris?'},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await openai.chat.completions.create({
model: 'gpt-5-mini-2025-08-07',
tools: [{
type: 'function',
function: {
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
},
}],
messages,
});
const choice = response.choices[0];
messages.push(choice.message);
if (!choice.message.tool_calls) {
console.log('Response:', choice.message.content);
break;
}
for (const toolCall of choice.message.tool_calls) {
if (toolCall.function.name === 'execute') {
const args = JSON.parse(toolCall.function.arguments);
const result = await sandbox.execute.handler(args);
messages.push({role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result)});
}
}
}

OpenAI API (fetch):

import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const sandbox = await createSandbox({tools});
const messages = [
{role: 'user', content: 'What is the weather in Tokyo?'},
];
while (true) {
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify({
model: 'gpt-5-mini-2025-08-07',
tools: [{
type: 'function',
function: {
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
},
}],
messages,
}),
});
const data = await response.json();
const choice = data.choices[0];
messages.push(choice.message);
if (!choice.message.tool_calls) {
console.log('Response:', choice.message.content);
break;
}
for (const toolCall of choice.message.tool_calls) {
if (toolCall.function.name === 'execute') {
const args = JSON.parse(toolCall.function.arguments);
const result = await sandbox.execute.handler(args);
messages.push({role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result)});
}
}
}

Google Gemini (TypeScript SDK):

import {GoogleGenAI, type Content} from '@google/genai';
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'getWeather',
description: 'Get weather for a city',
inputSchema: {type: 'object', properties: {city: {type: 'string'}}, required: ['city']},
handler: async (args) => ({temp: 72, conditions: 'sunny'}),
},
];
const genai = new GoogleGenAI({apiKey: process.env.GEMINI_API_KEY});
const sandbox = await createSandbox({tools});
const contents: Content[] = [
{role: 'user', parts: [{text: 'What is the weather in Tokyo and Paris?'}]},
];
// Agent loop - continue until model stops calling tools
while (true) {
const response = await genai.models.generateContent({
model: 'gemini-3-flash-preview',
contents,
config: {
tools: [{
functionDeclarations: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
parameters: sandbox.execute.inputSchema,
}],
}],
},
});
const calls = response.functionCalls;
if (!calls?.length) {
console.log('Response:', response.text);
break;
}
contents.push({role: 'model', parts: response.candidates?.[0]?.content?.parts ?? []});
const functionResponses = [];
for (const call of calls) {
if (call.name === 'execute') {
const result = await sandbox.execute.handler(call.args);
functionResponses.push({name: call.name, response: result});
}
}
contents.push({role: 'user', parts: functionResponses.map((r) => ({functionResponse: r}))});
}

Using with MCP
Convert MCP clients to sandbox tools:
npm install @modelcontextprotocol/sdk

import {createSandbox, fromMcpClients} from 'tool-sandbox';
import {Client} from '@modelcontextprotocol/sdk/client/index.js';
import {StreamableHTTPClientTransport} from '@modelcontextprotocol/sdk/client/streamableHttp.js';
// Connect to an MCP server (e.g. https://github.com/domdomegg/gmail-mcp)
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:3000/mcp'));
const client = new Client({name: 'my-app', version: '1.0.0'});
await client.connect(transport);
const tools = await fromMcpClients({gmail: client});
const sandbox = await createSandbox({tools});

fromMcpClients fetches and wraps:
- Tools → gmail__send, gmail__search, etc.
- Prompts (with arguments) → gmail__prompt__compose
- Resources → gmail__resource__inbox
- Resource templates → parameterized resources like files__resource__file with {path}
Not supported: sampling, elicitation, roots, notifications, or other advanced MCP features.
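Once wrapped, sandbox code calls these tools by their prefixed names. A sketch of code an agent might execute against the Gmail tools above (the `query` argument is illustrative, not a confirmed part of the gmail-mcp schema):

```typescript
// Hypothetical agent-written code for sandbox.execute, using the
// prefixed tool names produced by fromMcpClients.
const agentCode = `
const threads = await tool('gmail__search', {query: 'is:unread'});
// Filter inside the sandbox so only the count reaches the model
return {unread: threads.length};
`;

// Then: await sandbox.execute.handler({code: agentCode});
```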
Sandbox Environment
Code in the sandbox has access to:
| API | Description |
|-----|-------------|
| tool(name, args) | Call a tool and await its result |
| console.log(...) | Debug output (visible to host) |
| store | Persistent object across executions |
| store._prev | Result from previous execution (read-only) |
Examples:
const _examples = [
// Get a tool's schema on-demand
`const schema = await tool('describe_tool', {name: 'gmail__send'});`,
// Use store._prev to continue from last result
`const updated = store._prev.map(x => x * 2); return updated;`,
// Use store to accumulate across executions
`store.total = (store.total || 0) + 1; return store.total;`,
];

Permissions
There are a couple of ways to control what the sandbox can do:
Per-tool-call checks
Use onBeforeToolCall to inspect each tool call and block dangerous ones:
import {createSandbox, type Tool} from 'tool-sandbox';
const tools: Tool[] = [
{
name: 'readFile',
description: 'Read a file',
inputSchema: {type: 'object', properties: {path: {type: 'string'}}},
annotations: {readOnlyHint: true},
handler: async (args) => 'file contents...',
},
{
name: 'deleteFile',
description: 'Delete a file',
inputSchema: {type: 'object', properties: {path: {type: 'string'}}},
handler: async (args) => ({deleted: true}),
},
];
const readonlySandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
const tool = tools.find((t) => t.name === event.toolName);
// Only allow tools marked as read-only
if (!tool?.annotations?.readOnlyHint) {
throw new Error(`Tool ${event.toolName} is not allowed`);
}
},
});

Pre-execution review
Review the code before executing it, using another model, a SAST tool, or other logic:
import Anthropic from '@anthropic-ai/sdk';
import {createSandbox, type Tool} from 'tool-sandbox';
const anthropic = new Anthropic();
const sandbox = await createSandbox({tools});
// Get code from the model
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
tools: [{
name: sandbox.execute.name,
description: sandbox.execute.description,
input_schema: sandbox.execute.inputSchema,
}],
messages: [{role: 'user', content: userPrompt}],
});
const block = response.content.find((b): b is Anthropic.ToolUseBlock => b.type === 'tool_use');
if (block) {
const {code} = block.input as {code: string};
// Review code somehow e.g. with SAST tool or another AI model
const review = await anthropic.messages.create({
model: 'claude-haiku-4-5',
system: 'Review this code against <policy>. Respond COMPLIANT or NON_COMPLIANT with a brief reason.',
messages: [{role: 'user', content: code}],
});
const verdict = (review.content[0] as Anthropic.TextBlock).text;
if (!verdict.startsWith('COMPLIANT')) {
throw new Error(`Code rejected: ${verdict}`);
}
await sandbox.execute.handler({code});
}

Other approaches: allowlists (only include safe tools), SAST tools, capability tokens.
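The allowlist approach is just filtering the tools array before creating the sandbox. A self-contained sketch (the Tool type is copied locally so the snippet runs standalone; `safeToolNames` is a hypothetical allowlist):

```typescript
// Local copy of the library's Tool shape, so this sketch is self-contained
type Tool = {
	name: string;
	description?: string;
	inputSchema: {type: 'object'; properties?: Record<string, unknown>; required?: string[]};
	handler: (args: unknown) => Promise<unknown>;
};

const allTools: Tool[] = [
	{name: 'readFile', inputSchema: {type: 'object'}, handler: async () => 'file contents...'},
	{name: 'deleteFile', inputSchema: {type: 'object'}, handler: async () => ({deleted: true})},
];

// Hypothetical allowlist of tools considered safe to expose
const safeToolNames = new Set(['readFile']);
const allowedTools = allTools.filter((t) => safeToolNames.has(t.name));

// Then: const sandbox = await createSandbox({tools: allowedTools});
```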
Other Use Cases
Audit logging:

import {createSandbox, type Tool} from 'tool-sandbox';
const auditLog: Array<{tool: string; args: unknown; result: unknown}> = [];
const tools: Tool[] = [
{
name: 'getUser',
description: 'Get user by ID',
inputSchema: {type: 'object', properties: {id: {type: 'string'}}},
handler: async (args) => ({id: '1', name: 'Alice'}),
},
];
const sandbox = await createSandbox({
tools,
onToolCallSuccess(event) {
// Could also add onToolCallError to log failures
// or onBeforeToolCall to log before tools are called
auditLog.push({
tool: event.toolName,
args: event.args,
result: event.result,
});
},
});

Caching:

import {createSandbox, type Tool} from 'tool-sandbox';
const cache = new Map<string, unknown>();
const tools: Tool[] = [
{
name: 'expensiveQuery',
description: 'Run an expensive database query',
inputSchema: {type: 'object', properties: {query: {type: 'string'}}},
handler: async (args) => ({rows: []}),
},
];
const sandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
const key = JSON.stringify([event.toolName, event.args]);
if (cache.has(key)) {
event.returnValue = cache.get(key);
}
},
onToolCallSuccess(event) {
const key = JSON.stringify([event.toolName, event.args]);
cache.set(key, event.result);
},
});

Replace PII with tokens before results reach the model, then restore real values when the model uses them in tool calls:
import {createSandbox, type Tool} from 'tool-sandbox';
// Bidirectional token store
const realToToken = new Map<string, string>();
const tokenToReal = new Map<string, string>();
let counter = 0;
function tokenize(value: string): string {
if (!realToToken.has(value)) {
const token = `[EMAIL_${++counter}]`;
realToToken.set(value, token);
tokenToReal.set(token, value);
}
return realToToken.get(value)!;
}
function redact(obj: unknown): unknown {
if (typeof obj === 'string') {
return obj; // Bare strings are left alone; only values under email-like keys are tokenized below
}
if (typeof obj !== 'object' || obj === null) return obj;
const result: Record<string, unknown> = {};
for (const [key, value] of Object.entries(obj)) {
if (key.toLowerCase().includes('email') && typeof value === 'string') {
result[key] = tokenize(value);
} else {
result[key] = redact(value);
}
}
return result;
}
function restore(obj: unknown): unknown {
if (typeof obj === 'string') {
return tokenToReal.get(obj) ?? obj;
}
if (typeof obj !== 'object' || obj === null) return obj;
const result: Record<string, unknown> = {};
for (const [key, value] of Object.entries(obj)) {
result[key] = restore(value);
}
return result;
}
const tools: Tool[] = [
{
name: 'getCustomers',
description: 'Get customer list',
inputSchema: {type: 'object'},
handler: async () => [{name: 'Alice', email: '[email protected]'}],
},
{
name: 'sendEmail',
description: 'Send an email',
inputSchema: {type: 'object', properties: {to: {type: 'string'}}},
handler: async (args) => ({sent: true}),
},
];
const sandbox = await createSandbox({
tools,
onBeforeToolCall(event) {
// Restore real emails before calling tools
event.args = restore(event.args);
},
onToolCallSuccess(event) {
// Tokenize emails before returning to model
event.result = redact(event.result);
},
});
// Model sees: [{name: 'Alice', email: '[EMAIL_1]'}]
// Model calls: sendEmail({to: '[EMAIL_1]'})
// Actual call: sendEmail({to: '[email protected]'})

API Reference
createSandbox(options)
| Option | Description |
|--------|-------------|
| tools | Tool[] — Tools available in the sandbox |
| onBeforeToolCall | Called before each tool call |
| onToolCallSuccess | Called after successful tool call |
| onToolCallError | Called after failed tool call |
Sandbox
| Property/Method | Description |
|-----------------|-------------|
| execute | Tool object for code execution. Pass to LLM, call .handler({code}) |
| tools | Current tools (read-only) |
| store | Persistent store, shared with sandbox code |
| addTool(tool) | Add a tool at runtime |
| removeTool(name) | Remove a tool by name |
Tool
type Tool = {
name: string;
description?: string;
inputSchema: {type: 'object'; properties?: Record<string, unknown>; required?: string[]};
handler: (args: unknown) => Promise<unknown>;
};

Contributing
Pull requests are welcome on GitHub! To get started:
- Install Git and Node.js
- Clone the repository
- Install dependencies with npm install
- Run npm run test to run tests
- Build with npm run build
Releases
Versions follow the semantic versioning spec.
To release:
- Use npm version <major | minor | patch> to bump the version
- Run git push --follow-tags to push with tags
- Wait for GitHub Actions to publish to the NPM registry.