npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@tanstack/ai-code-mode

v0.1.8

Published

Code Mode for TanStack AI - LLM-driven code execution in secure sandboxes

Readme

@tanstack/ai-code-mode

Code Mode for TanStack AI — let LLMs write and execute TypeScript in secure sandboxes with typed tool access.

Overview

Code Mode gives your AI agent an execute_typescript tool. Instead of one tool call per action, the LLM writes a small TypeScript program that orchestrates multiple tool calls with loops, conditionals, Promise.all, and data transformations — all running in an isolated sandbox.

Installation

pnpm add @tanstack/ai-code-mode

You also need an isolate driver:

# Node.js (fastest, uses V8 isolates via isolated-vm)
pnpm add @tanstack/ai-isolate-node

# QuickJS WASM (browser-compatible, no native deps)
pnpm add @tanstack/ai-isolate-quickjs

# Cloudflare Workers (edge execution)
pnpm add @tanstack/ai-isolate-cloudflare

Quick Start

import { chat, toolDefinition } from '@tanstack/ai'
import { createCodeMode } from '@tanstack/ai-code-mode'
import { createNodeIsolateDriver } from '@tanstack/ai-isolate-node'
import { z } from 'zod'

// Define tools that the LLM can call from inside the sandbox
const weatherTool = toolDefinition({
  name: 'fetchWeather',
  description: 'Get weather for a city',
  inputSchema: z.object({ location: z.string() }),
  outputSchema: z.object({ temperature: z.number(), condition: z.string() }),
}).server(async ({ location }) => {
  // Your implementation
  return { temperature: 72, condition: 'sunny' }
})

// Create the execute_typescript tool and system prompt
const { tool, systemPrompt } = createCodeMode({
  driver: createNodeIsolateDriver(),
  tools: [weatherTool],
})

const result = await chat({
  adapter: yourAdapter,
  model: 'gpt-4o',
  systemPrompts: ['You are a helpful assistant.', systemPrompt],
  tools: [tool],
  messages: [
    { role: 'user', content: 'Compare weather in Tokyo, Paris, and NYC' },
  ],
})

The LLM will generate code like:

const cities = ['Tokyo', 'Paris', 'NYC']
const results = await Promise.all(
  cities.map((city) => external_fetchWeather({ location: city })),
)
const warmest = results.reduce((prev, curr) =>
  curr.temperature > prev.temperature ? curr : prev,
)
return { warmestCity: warmest.location, temperature: warmest.temperature }

API Reference

createCodeMode(config)

Creates both the execute_typescript tool and its matching system prompt. This is the recommended entry point.

Config:

  • driver — An IsolateDriver (Node, QuickJS, or Cloudflare)
  • tools — Array of ServerTool or ToolDefinition instances. Exposed as external_* functions in the sandbox
  • timeout — Execution timeout in ms (default: 30000)
  • memoryLimit — Memory limit in MB (default: 128, supported by Node and QuickJS drivers)
  • getSkillBindings — Optional async function returning dynamic bindings

createCodeModeTool(config) / createCodeModeSystemPrompt(config)

Lower-level functions if you need only the tool or only the prompt. createCodeMode calls both internally.

Advanced

These utilities are used internally and exported for custom pipelines:

  • stripTypeScript(code) — Strips TypeScript syntax using esbuild.
  • toolsToBindings(tools, prefix?) — Converts tools to ToolBinding records for sandbox injection.
  • generateTypeStubs(bindings, options?) — Generates TypeScript type declarations from tool bindings.

Driver Selection Guide

| Driver | Best For | Native Deps | Browser | Memory Limit | | --------------------------------- | -------------------------------------------- | ------------------- | ------- | ------------ | | @tanstack/ai-isolate-node | Server-side Node.js apps | Yes (isolated-vm) | No | Yes | | @tanstack/ai-isolate-quickjs | Browser, edge, or no-native-dep environments | No (WASM) | Yes | Yes | | @tanstack/ai-isolate-cloudflare | Cloudflare Workers deployments | No | N/A | N/A |

Custom Events

Code Mode emits custom events during execution that you can observe via the TanStack AI event system:

| Event | Description | | ----------------------------- | --------------------------------------------------- | | code_mode:execution_started | Emitted when code execution begins | | code_mode:console | Emitted for each console.log/error/warn/info call | | code_mode:external_call | Emitted before each external_* function call | | code_mode:external_result | Emitted after a successful external_* call | | code_mode:external_error | Emitted when an external_* call fails |

Models eval (development)

The benchmark lives in a separate workspace package so @tanstack/ai-code-mode does not depend on @tanstack/ai-isolate-node (avoids an Nx build cycle). See models-eval/package.json (@tanstack/ai-code-mode-models-eval).

  1. packages/typescript/ai-code-mode/models-eval/pull-models.sh — pull recommended Ollama models
  2. pnpm --filter @tanstack/ai-code-mode-models-eval eval:capture — run models and capture raw outputs/telemetry only (no judge LLM call)
  3. pnpm --filter @tanstack/ai-code-mode-models-eval eval:judge — judge latest captured session from logs (no model rerun)
  4. pnpm --filter @tanstack/ai-code-mode-models-eval eval — single-pass run+judge (legacy convenience mode)
  5. pnpm --filter @tanstack/ai-code-mode-models-eval eval -- --ollama-only — only Ollama models from eval-config.ts
  6. pnpm --filter @tanstack/ai-code-mode-models-eval eval -- --ollama-only --models qwen3-coder — one or more model ids (comma-separated)

Judge-phase flags:

  • --judge-latest judge latest captured session
  • --rejudge re-run judging even if logs already contain judge fields

The default list omits some small Ollama models that rarely complete code-mode successfully (see comments in eval-config.ts). You can still benchmark them with --models granite4:3b etc. if pulled locally.

Model comparison metrics

The models eval now tracks seven decision-oriented metrics plus an overall rating:

  • accuracy (1-10): numerical/factual correctness vs gold report
  • comprehensiveness (1-10): whether the response covers everything requested by the user query
  • typescriptQuality (1-10): quality/readability/type-safety of generated TypeScript
  • codeModeEfficiency (1-10): how efficiently the model uses code-mode/tooling to reach the answer
  • speedTier (1-5): relative wall-clock speed against peers in the same category (local or cloud)
  • tokenEfficiencyTier (1-5): relative token efficiency against peers in the same category
  • stabilityTier (1-5): success consistency over the latest 5 logged runs for that model
  • stars (1-3): weighted rollup score across all metrics

Raw run telemetry also includes compile/runtime failures, redundant schema checks, total tool calls, TTFT, token totals, stability sample size/rate, and per-model logs.

Methodology

Canonical output is written to packages/typescript/ai-code-mode/models-eval/results.json after each capture or judge run.

  • Benchmark: single code-mode benchmark prompt over the in-memory customers / products / purchases dataset
  • Primary quality scores (judge): accuracy, comprehensiveness, typescriptQuality, codeModeEfficiency
  • Computed comparative scores: speedTier, tokenEfficiencyTier, stabilityTier
  • Stability definition: a run is "stable" if it has no top-level run error, produces a non-empty candidate report, and has at least one successful execute_typescript call
  • Star rollup weights:
    • accuracy: 25%
    • comprehensiveness: 15%
    • typescriptQuality: 15%
    • codeModeEfficiency (with compile/runtime failure penalty): 10%
    • speedTier: 10%
    • tokenEfficiencyTier: 10%
    • stabilityTier: 15%

Model comparison table

The table below is transcribed from canonical models-eval/results.json (session 2026-03-26T15:38:44.006Z).

| Provider | Model | Category | Stars | Accuracy | Comprehensiveness | TypeScript | Code-Mode | Speed Tier | Token Tier | Stability Tier | | --------- | ----------------------------- | -------- | ----- | -------- | ----------------- | ---------- | --------- | ---------- | ---------- | -------------- | | Ollama | gpt-oss:20b | local | ★★★ | 10 | 8 | 5 | 5 | 5 | 5 | 5 | | Ollama | nemotron-cascade-2 | local | ★★☆ | 3 | 5 | 6 | 5 | 1 | 5 | 5 | | Anthropic | claude-haiku-4-5 | cloud | ★★★ | 10 | 10 | 6 | 7 | 3 | 2 | 5 | | OpenAI | gpt-4o-mini | cloud | ★★★ | 10 | 8 | 7 | 9 | 3 | 1 | 5 | | Gemini | gemini-2.5-flash | cloud | ★★★ | 10 | 8 | 7 | 10 | 4 | 2 | 5 | | xAI | grok-4-1-fast-non-reasoning | cloud | ★★★ | 10 | 8 | 6 | 10 | 4 | 5 | 5 | | Groq | llama-3.3-70b-versatile | cloud | ★★★ | 10 | 7 | 6 | 9 | 5 | 3 | 4 | | Groq | qwen/qwen3-32b | cloud | ★★☆ | 10 | 8 | 5 | 4 | 1 | 2 | 5 |

Suggested interpretation:

  • Local-first: favor stars >= 2 with high speed tier.
  • Cloud-first quality: favor high accuracy + typescriptQuality, then compare stars.
  • Cost-sensitive: prioritize tokenEfficiencyTier and speedTier together.

License

MIT