pi-code-tool

v0.6.1

Published 6 days ago

Code-mode tool for the pi coding agent: a sandboxed Python meta-tool (via @pydantic/monty) that lets the agent write code against host tools and build ephemeral code tools

Vulnerabilities: 0 High, 0 Medium, 0 Low

Maintainers: josephakern

Keywords: pi-package, pi, python, sandbox, code-mode, monty, code-execution

pi-code-tool

A code-mode meta-tool for agent harnesses: the agent writes sandboxed Python (via @pydantic/monty) that calls host tools as plain functions — and can save working code as named, reusable ephemeral tools. First target harness: pi.

Plan: PLAN.md — M0–M5 (MVP) complete
Research notes: docs/research/

Install (as a pi package)

pi install npm:pi-code-tool
pi install /path/to/checkout   # or straight from a local clone (build dist/ first with: npm run build)

@pydantic/monty (including its platform-specific native binary) and typebox are regular npm dependencies, so pi's installer pulls them in automatically; @earendil-works/pi-coding-agent is a peer dependency satisfied by pi itself.

Why

LLMs compose code better than they compose chained JSON tool calls: loops, filtering, and aggregation happen inside the sandbox, and intermediate data never enters model context — only what the code print()s. See docs/research/01-code-mode-articles.md for the evidence (Cloudflare Code Mode, Anthropic programmatic tool calling, smolagents).

What you get

Installing registers a code tool. pi's own tools are bridged into the sandbox as plain Python functions: read, grep, find, and ls dispatch directly, so the model can loop over, filter, and compose them in one snippet instead of spending one model round-trip per call. The workspace is also mounted read-only at /workspace for plain open()/pathlib reads; monty enforces read-only mode and protects against symlink escapes and ..-traversal. Also included are http_get (a host-side fetch) and save_tool/delete_tool/list_saved_tools/read_tool for building a toolbox in .pi/code-tools/*.py: plain, user-editable Python files that auto-load into future sessions. Variables persist across calls. State rides in tool-result details, so it survives session restore and branching.
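
As a rough illustration of the read path, the snippet below is the kind of code an agent might run against the mounted workspace. It uses only pathlib. The count_todos and summarize helpers and the root parameter are hypothetical names introduced here (the parameter exists so the sketch also runs outside the sandbox), and whether monty supports every pathlib method used is an assumption.

```python
from pathlib import Path


def count_todos(root: str = "/workspace") -> dict[str, int]:
    """Count TODO markers per Python file under root (hypothetical helper)."""
    counts: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = text.count("TODO")
        if hits:
            # Store paths relative to the mount so the summary stays short.
            counts[str(path.relative_to(root))] = hits
    return counts


def summarize(counts: dict[str, int]) -> str:
    """Reduce the per-file counts to the one line the model needs to see."""
    total = sum(counts.values())
    return f"{total} TODOs in {len(counts)} files"


# Inside the sandbox the call would simply be:
#     print(summarize(count_todos()))
```

File contents are read and discarded inside the snippet; only the one-line summary would be printed back to the model.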

Mutating tools (bash, edit, write) are approval-gated. The script freezes mid-execution at the call, pi shows a confirm dialog with the exact invocation, and your answer resumes the script in place. If you deny, the sandbox raises a catchable PermissionError. If you pick "Decide later", the run is suspended entirely: completed work stays cached, the suspension survives pi restarts, and {"resume": true} continues the script from the exact gated call, even days later. {"abandon": true} discards the pending gated call without resetting the rest of the session. The model writes a 30-line codemod, and you approve each mutation without it burning a single extra token. Replayed (already-approved) calls never re-prompt. Headless runs deny gated calls unless you opt in with autoApprove.
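
From the script's side, a denied gate is just an exception. The sketch below shows the pattern with a stand-in edit function. Its (path, old, new) signature, the DENIED set, and apply_codemod are hypothetical and exist only so the example runs outside pi, where the real edit is bridged in and the denial comes from the confirm dialog.

```python
# Stand-in for pi's approval-gated edit tool. The real signature is not
# documented here; (path, old, new) is an assumption for illustration.
DENIED = {"secrets.env"}  # hypothetical: paths the user would decline


def edit(path: str, old: str, new: str) -> str:
    if path in DENIED:
        raise PermissionError(f"edit denied: {path}")
    return f"edited {path}"


def apply_codemod(paths: list[str]) -> tuple[list[str], list[str]]:
    """Try every edit; collect denials instead of aborting the whole run."""
    applied: list[str] = []
    skipped: list[str] = []
    for path in paths:
        try:
            edit(path, "old_name", "new_name")
            applied.append(path)
        except PermissionError:
            skipped.append(path)
    return applied, skipped


applied, skipped = apply_codemod(["a.py", "secrets.env", "b.py"])
print(f"applied={len(applied)} skipped={skipped}")
# prints: applied=2 skipped=['secrets.env']
```

Catching the error per call lets one declined mutation be reported without discarding the edits that were approved.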

Code is statically type-checked before execution (monty's bundled ty) against typed stubs of every host tool — wrong argument types, bad methods on tool results, and undefined names come back as compiler diagnostics before any side effects run, instead of as tracebacks after three tool calls already happened.

Example pi session

$ pi
> Use the code tool to fetch https://api.github.com/repos/pydantic/monty,
  report the star count, and save a reusable tool gh_stars(repo) for next time.

  ● Code
    import json
    data = json.loads(http_get("https://api.github.com/repos/pydantic/monty"))
    print(f"stars: {data['stargazers_count']}")
    tool_code = (
        "def gh_stars(repo):\n"
        "    import json\n"
        "    url = f'https://api.github.com/repos/{repo}'\n"
        "    return json.loads(http_get(url))['stargazers_count']\n"
    )
    save_tool("gh_stars", tool_code, "Return the GitHub star count for owner/repo.")

pydantic/monty currently has 1,234 stars. I saved gh_stars(repo) for future use.

In a later session — no redefinition needed, gh_stars auto-loads:

> how many stars does josephkern/pi-code-tool have?

  ● Code
    gh_stars("josephkern/pi-code-tool")

Because the work happens in one sandboxed snippet, intermediate data (the full API response, loop iterations, file contents) never enters the model's context — only what the code prints comes back.

Configuration

The default export works out of the box. For custom host tools or a different tool name, re-export from your own extension file (e.g. .pi/extensions/code.ts):

import { createPythonExtension } from 'pi-code-tool/pi'

export default createPythonExtension({
  toolName: 'code',              // rename if your model responds better to e.g. 'python'
  root: process.cwd(),           // workspace root for mounts, file tools, and the default store
  mountWorkspace: true,          // false: no /workspace mount, read_file tool instead
  bridgePiTools: true,           // false: don't expose pi's read/grep/find/ls/bash/edit/write
  noBuiltins: false,             // true: skip read_file/list_files/http_get starter tools
  toolStore: '.pi/code-tools',   // false: disable save_tool/delete_tool/list/read helpers
  typeCheck: true,               // false: skip the pre-execution type-check gate
  autoApprove: false,            // true: run bash/edit/write without asking (headless automation)
  limits: { maxDurationSecs: 5, maxMemory: 64 * 1024 * 1024 },
  tools: [
    {
      name: 'query_db',
      description: 'Run a read-only SQL query.',
      params: [{ name: 'sql', type: 'str' }],
      returns: 'list[dict]',
      returnsDescription: 'rows as dicts',
      execute: async ([sql]) => db.query(String(sql)),  // db: your own database client, not part of pi-code-tool
    },
  ],
})
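
Inside the sandbox, a tool declared this way would be called like any other Python function. The stub below is a guess at the shape implied by the params and returns fields above, not the exact text the registry renders, and the in-memory body with fake rows is a stand-in so the snippet runs on its own.

```python
# Assumed shape of the Python-side signature for the query_db tool above:
# params [{name: 'sql', type: 'str'}] and returns 'list[dict]'.
def query_db(sql: str) -> list[dict]:
    # Stand-in body with fake rows; in pi the call would be dispatched to
    # the host-side execute() function instead.
    return [
        {"id": 1, "status": "open"},
        {"id": 2, "status": "closed"},
        {"id": 3, "status": "open"},
    ]


rows = query_db("SELECT id, status FROM tickets")
open_ids = [row["id"] for row in rows if row["status"] == "open"]
print(f"{len(open_ids)} open tickets: {open_ids}")
# prints: 2 open tickets: [1, 3]

# With typeCheck enabled, a call such as query_db(42) would be expected to
# fail the pre-execution check, since 42 is not a str.
```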

Develop against this repo

pi -e src/pi/extension.ts        # load straight from source, no build needed

Use as a library

import { CodeRunner, createBuiltinTools } from 'pi-code-tool'

const runner = new CodeRunner({ tools: createBuiltinTools({ root: process.cwd() }) })
const result = await runner.run('len(list_files("."))')
if (result.status === 'ok') console.log(result.output)

RunResult is a discriminated union with status: 'ok' | 'error' | 'suspended', stdout, stdoutTruncated, and per-call traces. Session adds persistent state across runs (replay with a tool-call cache — earlier side effects never repeat) and serializes to JSON with dump() / Session.load(). If an approval gate suspends a run, call session.resume() to continue or session.abandon() to discard it before running new code. ToolStore adds the saved-tools layer.
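
The replay idea can be illustrated in a few lines. This is a conceptual sketch in Python, not the library's implementation (which is TypeScript and may differ in every detail): ReplaySession, its cache key, and the run signature are hypothetical. Each run re-executes the whole transcript, but tool calls already seen are answered from the cache, so their side effects happen only once.

```python
import json
from typing import Any, Callable


class ReplaySession:
    """Toy session: state is rebuilt by replaying all snippets so far."""

    def __init__(self, tools: dict[str, Callable[..., Any]]):
        self.tools = tools
        self.transcript: list[str] = []   # every snippet run so far
        self.cache: dict[str, Any] = {}   # tool-call key -> recorded result

    def _wrap(self, name: str, counter: list[int]) -> Callable[..., Any]:
        def call(*args: Any) -> Any:
            # Key on call order, tool name, and arguments.
            key = json.dumps([counter[0], name, args])
            counter[0] += 1
            if key not in self.cache:
                self.cache[key] = self.tools[name](*args)  # real side effect
            return self.cache[key]
        return call

    def run(self, code: str) -> dict[str, Any]:
        self.transcript.append(code)
        counter = [0]
        scope: dict[str, Any] = {n: self._wrap(n, counter) for n in self.tools}
        for snippet in self.transcript:  # earlier snippets first, then the new one
            exec(snippet, scope)
        return scope

    def dump(self) -> str:
        return json.dumps({"transcript": self.transcript, "cache": self.cache})


calls: list[str] = []


def fetch(url: str) -> str:
    calls.append(url)  # visible side effect, so repeats can be counted
    return f"body of {url}"


session = ReplaySession({"fetch": fetch})
session.run('page = fetch("https://example.com")')
scope = session.run("size = len(page)")
print(scope["size"], len(calls))
```

The second run sees page even though it never fetched it, and fetch itself ran only once; that is the property the tool-call cache is described as providing.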

Develop

npm install
npm test            # vitest (111 tests)
npm run typecheck
npm run smoke       # verifies monty primitives on your machine
npx tsx examples/demo.ts

Architecture

src/core/    runner.ts    CodeRunner: owns monty's start/resume loop; tool dispatch,
                          tracebacks, limits, abort, per-call traces
             registry.ts  ToolRegistry + Python/typecheck stub rendering + prompt rules
             builtins.ts  read_file / list_files / http_get starter tools
             session.ts   Persistent state via transcript replay + tool-call cache
             toolstore.ts Agent-saved tools as plain .py files + manage-from-sandbox
src/pi/      extension.ts pi adapter: `code` tool (configurable name), streaming output, branch-safe state
             bridge.ts    pi built-in tools → Python stubs; bash/edit/write approval-gated

Known monty 0.0.18 quirks we code around are documented in docs/research/03-monty.md.
