pi-code-tool
v0.6.1
Code-mode tool for the pi coding agent: a sandboxed Python meta-tool (via @pydantic/monty) that lets the agent write code against host tools and build ephemeral code tools
pi-code-tool
A code-mode meta-tool for agent harnesses: the agent writes sandboxed Python (via @pydantic/monty) that calls host tools as plain functions — and can save working code as named, reusable ephemeral tools. First target harness: pi.
- Plan: PLAN.md — M0–M5 (MVP) complete
- Research notes: docs/research/
Install (as a pi package)
pi install npm:pi-code-tool
pi install /path/to/checkout # or straight from a local clone (builds dist/ first: npm run build)
@pydantic/monty (including its platform-specific native binary) and typebox are
regular npm dependencies, so pi's installer pulls them in automatically;
@earendil-works/pi-coding-agent is a peer dependency satisfied by pi itself.
Why
LLMs compose code better than they compose chained JSON tool calls: loops, filtering,
and aggregation happen inside the sandbox, and intermediate data never enters model
context — only what the code print()s. See docs/research/01-code-mode-articles.md
for the evidence (Cloudflare Code Mode, Anthropic programmatic tool calling, smolagents).
What you get
Installing registers a code tool. pi's own tools are bridged into the sandbox
as plain Python functions: read, grep, find, ls dispatch directly, so the
model can loop/filter/compose them in one snippet instead of one model round-trip
per call. The workspace is also mounted read-only at /workspace for plain
open()/pathlib reads (monty enforces read-only mode, symlink-escape and
..-traversal protection). You also get http_get (host-side fetch) and
save_tool/delete_tool/list_saved_tools/read_tool for building a toolbox in
.pi/code-tools/*.py (plain, user-editable Python files that auto-load into future
sessions). Variables persist across calls; state rides in tool-result details, so
it survives session restore and branching.
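As a rough illustration of the kind of snippet this enables, the sketch below counts TODO markers per file in a single run. The find and read functions defined here are stand-ins so the example runs outside the sandbox; inside the sandbox pi's bridged tools would take their place, and their real signatures and return types may differ (the ones shown are assumptions).

```python
# Stand-ins for the bridged tools so this sketch runs on its own.
# Assumption: the real find/read signatures and return types may differ.
FAKE_FILES = {
    "src/a.py": "x = 1  # TODO: rename\ny = 2\n",
    "src/b.py": "print('hi')\n",
    "src/c.py": "# TODO: one\n# TODO: two\n",
}

def find(suffix: str) -> list[str]:
    """Hypothetical: return workspace paths ending with the given suffix."""
    return sorted(p for p in FAKE_FILES if p.endswith(suffix))

def read(path: str) -> str:
    """Hypothetical: return the text content of a workspace file."""
    return FAKE_FILES[path]

def todo_counts(suffix: str) -> dict[str, int]:
    """Loop, filter, and aggregate in one snippet instead of one call per file."""
    counts = {}
    for path in find(suffix):
        n = read(path).count("TODO")
        if n:  # keep only files that actually contain a marker
            counts[path] = n
    return counts

summary = todo_counts(".py")
# Only this compact summary is printed; file contents stay inside the snippet.
for path, n in summary.items():
    print(f"{path}: {n}")
```

Three reads and a filter collapse into one tool call here, and the model sees two short lines of output rather than the contents of every file.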
Mutating tools (bash, edit, write) are approval-gated: the script freezes
mid-execution at the call, pi shows a confirm dialog with the exact invocation, and
your answer resumes the script in place. If you deny, the sandbox raises a catchable
PermissionError. If you pick "Decide later", the run is suspended entirely: completed
work stays cached, the suspension survives pi restarts, and {"resume": true}
continues the script from the exact gated call, even days later; {"abandon": true}
discards the pending gated call without resetting the rest of the session. The model
can write a 30-line codemod, and you approve each mutation without the approvals
costing a single extra token.
Replayed (already-approved) calls never re-prompt. Headless runs deny gated calls
unless you opt in with autoApprove.
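Because a denial surfaces as an ordinary PermissionError, a script can skip the denied step and carry on. A minimal sketch, assuming a bridged edit(path, old, new) function: that signature is hypothetical, and the stand-in below simulates a user who denies the edit to one file so the example runs outside the sandbox.

```python
# Simulated user decisions: edits to these paths are denied.
DENIED = {"config/prod.yaml"}

def edit(path: str, old: str, new: str) -> str:
    """Stand-in for the approval-gated edit tool (hypothetical signature)."""
    if path in DENIED:
        raise PermissionError(f"edit denied: {path}")
    return f"edited {path}"

applied, skipped = [], []
for path in ["src/app.py", "config/prod.yaml", "src/util.py"]:
    try:
        edit(path, "old_name", "new_name")
        applied.append(path)
    except PermissionError:
        # A denied call does not abort the run; record it and continue.
        skipped.append(path)

print(f"applied={applied} skipped={skipped}")
```

Catching the error per call keeps one denial from discarding the rest of the codemod; letting it propagate instead would end the script at the first refusal.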
Code is statically type-checked before execution (monty's bundled ty) against
typed stubs of every host tool — wrong argument types, bad methods on tool results,
and undefined names come back as compiler diagnostics before any side effects run,
instead of as tracebacks after three tool calls already happened.
Example pi session
$ pi
> Use the code tool to fetch https://api.github.com/repos/pydantic/monty,
report the star count, and save a reusable tool gh_stars(repo) for next time.
● Code
import json
data = json.loads(http_get("https://api.github.com/repos/pydantic/monty"))
print(f"stars: {data['stargazers_count']}")
tool_code = (
"def gh_stars(repo):\n"
" import json\n"
" url = f'https://api.github.com/repos/{repo}'\n"
" return json.loads(http_get(url))['stargazers_count']\n"
)
save_tool("gh_stars", tool_code, "Return the GitHub star count for owner/repo.")
pydantic/monty currently has 1,234 stars. I saved gh_stars(repo) for future use.
In a later session, no redefinition is needed because gh_stars auto-loads:
> how many stars does josephkern/pi-code-tool have?
● Code
gh_stars("josephkern/pi-code-tool")
Because the work happens in one sandboxed snippet, intermediate data (the full API response, loop iterations, file contents) never enters the model's context; only what the code prints comes back.
Configuration
The default export works out of the box. For custom host tools or a different tool
name, re-export from your own extension file (e.g. .pi/extensions/code.ts):
import { createPythonExtension } from 'pi-code-tool/pi'

export default createPythonExtension({
  toolName: 'code',            // rename if your model responds better to e.g. 'python'
  root: process.cwd(),         // workspace root for mounts, file tools, and the default store
  mountWorkspace: true,        // false: no /workspace mount, read_file tool instead
  bridgePiTools: true,         // false: don't expose pi's read/grep/find/ls/bash/edit/write
  noBuiltins: false,           // true: skip read_file/list_files/http_get starter tools
  toolStore: '.pi/code-tools', // false: disable save_tool/delete_tool/list/read helpers
  typeCheck: true,             // false: skip the pre-execution type-check gate
  autoApprove: false,          // true: run bash/edit/write without asking (headless automation)
  limits: { maxDurationSecs: 5, maxMemory: 64 * 1024 * 1024 },
  tools: [
    {
      name: 'query_db',
      description: 'Run a read-only SQL query.',
      params: [{ name: 'sql', type: 'str' }],
      returns: 'list[dict]',
      returnsDescription: 'rows as dicts',
      // db is your own database client; it is not provided by pi-code-tool
      execute: async ([sql]) => db.query(String(sql)),
    },
  ],
})
Develop against this repo
pi -e src/pi/extension.ts # load straight from source, no build needed
Use as a library
import { CodeRunner, createBuiltinTools } from 'pi-code-tool'
const runner = new CodeRunner({ tools: createBuiltinTools({ root: process.cwd() }) })
const result = await runner.run('len(list_files("."))')
if (result.status === 'ok') console.log(result.output)
RunResult is a discriminated union with status: 'ok' | 'error' | 'suspended',
stdout, stdoutTruncated, and per-call traces. Session adds persistent state
across runs (replay with a tool-call cache — earlier side effects never repeat) and
serializes to JSON with dump() / Session.load(). If an approval gate suspends a
run, call session.resume() to continue or session.abandon() to discard it before
running new code. ToolStore adds the saved-tools layer.
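The replay idea can be pictured with a small conceptual sketch. This is written in Python for illustration and is not the library's TypeScript implementation; the ReplaySession class, its method names, and the fetch tool are all hypothetical. It rebuilds variables by re-running earlier snippets while serving earlier tool calls from a cache, so only new calls reach the host.

```python
import json

class ReplaySession:
    """Conceptual sketch only: rebuild state by re-running earlier snippets,
    serving earlier tool calls from a cache so side effects never repeat."""

    def __init__(self, tools, transcript=None, cache=None):
        self.tools = tools                   # name -> host callable
        self.transcript = transcript or []   # snippets run so far, in order
        self.cache = cache or []             # tool-call results, in call order

    def run(self, code):
        self.transcript.append(code)
        cursor = 0  # position in the cache during this replay

        def bridge(name):
            def call(*args):
                nonlocal cursor
                if cursor < len(self.cache):
                    result = self.cache[cursor]        # replayed: no side effect
                else:
                    result = self.tools[name](*args)   # new call: hits the host
                    self.cache.append(result)
                cursor += 1
                return result
            return call

        scope = {name: bridge(name) for name in self.tools}
        for snippet in self.transcript:  # earlier snippets first, then the new one
            exec(snippet, scope)
        return scope

    def dump(self):
        return json.dumps({"transcript": self.transcript, "cache": self.cache})

    @classmethod
    def load(cls, tools, data):
        state = json.loads(data)
        return cls(tools, state["transcript"], state["cache"])


calls = []  # records every real host invocation

def fetch(key):
    calls.append(key)
    return len(key)

tools = {"fetch": fetch}
session = ReplaySession(tools)
session.run("a = fetch('alpha')")

# Serialize, restore, and continue: 'a' is rebuilt from the cached result.
restored = ReplaySession.load(tools, session.dump())
scope = restored.run("b = a + fetch('be')")
print(scope["a"], scope["b"], calls)  # prints: 5 7 ['alpha', 'be']
```

The host sees fetch('alpha') exactly once even though the first snippet runs twice, which is the property that makes replay safe for tools with side effects. The sketch assumes tool results are JSON-serializable and that snippets are deterministic between tool calls.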
Develop
npm install
npm test # vitest (111 tests)
npm run typecheck
npm run smoke # verifies monty primitives on your machine
npx tsx examples/demo.ts
Architecture
src/core/
  runner.ts     CodeRunner: owns monty's start/resume loop; tool dispatch,
                tracebacks, limits, abort, per-call traces
  registry.ts   ToolRegistry + Python/typecheck stub rendering + prompt rules
  builtins.ts   read_file / list_files / http_get starter tools
  session.ts    Persistent state via transcript replay + tool-call cache
  toolstore.ts  Agent-saved tools as plain .py files + manage-from-sandbox
src/pi/
  extension.ts  pi adapter: `code` tool (configurable name), streaming output, branch-safe state
  bridge.ts     pi built-in tools → Python stubs; bash/edit/write approval-gated
Known monty 0.0.18 quirks we code around are documented in docs/research/03-monty.md.
