zidane

v5.4.2

Zidane

An agent that goes straight to the goal. A minimal TypeScript agent loop built with Bun, with every lifecycle point exposed as a hook through the hookable library. Built to be embedded.

Features

Small, hookable core with sensible defaults. Three principles guide the design:

⚖️ Token discipline — cache, dedup, compaction, byte-accounting.
🩹 Self-healing fault paths — auto-coerce args, hallucinated-tool fallback, error rewriting.
🔁 Provider parity — server-side features on Anthropic, client-side equivalents elsewhere.

Everything below is in service of those:

🧠 Providers — Anthropic, OpenAI Codex, OpenRouter, Cerebras, plus openaiCompat (Baseten, Fireworks, Groq, local). OAuth + API key with auto-refresh.
🪝 Hookable turn loop — every text/thinking delta, tool call, MCP, session, skill, spawn, OAuth, validation, and budget event is observable and (mostly) mutable.
🛠️ First-class tools — shell, read_file, write_file, edit, multi_edit, glob, grep, spawn, human-in-the-loop, plus any MCP server. Per-call gates, arg auto-coerce, hallucinated-tool fallback, error rewriting. Lazy MCP disclosure via tool_search. Optional todowrite / todoread for persistent task checkpointing across prompts.
🧮 Per-tool concurrency — every ToolDef carries isConcurrencySafe (default false); the dispatcher fans safe siblings out in parallel and barriers unsafe ones. Order is preserved at yield time. Cap via behavior.maxConcurrentTools.
✂️ Token-aware — paginated reads, tail-truncated shell, idempotent write_file; outputBytes everywhere. toolOutputBudget, toolBudgets, thinkingDecay.
🗜️ Context discipline — cache_control breakpoints; server-side compaction on Anthropic, client-side compactStrategy: 'tail' elsewhere. Per-session read dedup + requireReadBeforeEdit; generalized dedupTools.
🎯 Reasoning + structured output — thinking levels with optional exact budgets; force final response to a JSON Schema (Zod v4 interop).
💾 Sessions, skills, multimodal — pluggable stores, incremental persistence; Agent Skills spec; images + documents via PromptPart[].
🧵 Sub-agents + execution contexts — child events bubble to parent; run tools in-process, Docker, or any SandboxProvider.
🧭 Typed errors + 1000+ tests — AgentContextExceededError / AgentProviderError / AgentAbortedError. Suite under 2s with mocks.

Quickstart

bun install
bun run auth                                    # Anthropic + OpenAI Codex OAuth (--openai / --anthropic to scope)
bun start --prompt "create a hello world app"

Agent Setup

import { createAgent } from 'zidane'
import { basic } from 'zidane/presets'
import { anthropic } from 'zidane/providers'

const agent = createAgent({
  ...basic,
  provider: anthropic({ apiKey: 'sk-ant-...' }),
})

const stats = await agent.run({ prompt: 'build a REST API' })
console.log(`Done in ${stats.turns} turns`)

createAgent options:

createAgent({
  provider,                          // required
  name: 'basic',                     // display name (traces/logs)
  system: 'You are a helpful...',
  tools: { shell, readFile },        // default: {}
  toolAliases: { shell: 'Bash' },    // canonical → LLM-facing names
  session,
  behavior: {
    maxConcurrentTools: 10,          // cap on in-flight tools per turn; set 1 to force sequential
    maxTurns: 50,
    maxTokens: 16384,
    thinkingBudget: 10240,
    thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 },
    cache: true,                     // prompt-cache breakpoints
    toolOutputBudget: 32768,         // soft per-turn byte cap (off by default)
    dedupReads: true,                // dedup re-reads in `read_file`
    dedupTools: { execute_sql: i => typeof i.query === 'string' ? i.query.trim() : undefined },
    requireReadBeforeEdit: false,    // refuse edits against unread/stale files
    toolBudgets: { execute_sql: { max: 20, onExceed: 'steer' } },
    compactStrategy: 'off',          // 'off' | 'tail' (non-Anthropic compaction)
    compactThreshold: 131_072,       // 128 KiB
    compactKeepTurns: 4,
    toolDisclosure: 'eager',         // 'eager' | 'lazy' (hide MCP schemas behind tool_search)
    toolSearch: { tool: true, limit: 20 },
  },
  execution: createProcessContext(),
  mcpServers: [],
  eager: true,                       // pre-warm MCP in background
  skills: {},
})

Presets are Partial<AgentOptions> — spread, override:

createAgent({ ...basic, provider, system: 'be concise' })

agent.run() options:

await agent.run({
  prompt: 'your task',       // optional when session has turns
  model: 'claude-opus-4-7',
  system: 'be concise',
  thinking: 'medium',        // off | minimal | low | medium | high | adaptive (Anthropic-only)
  behavior: { maxTurns: 10, maxTokens: 4096, thinkingBudget: 8192 },
  tools: {},                 // {} = no tools for this run
  images: [],
  signal: abortController.signal,
})

prompt is optional when the session already has turns — the agent resumes. Useful when the user message is persisted before the run (WebSocket → session → queue → agent).

Precedence: run.behavior > agent.behavior > defaults.
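
The precedence rule reads as a layered merge. The sketch below assumes a shallow, key-by-key merge; `BehaviorSketch`, `DEFAULTS`, and `resolveBehavior` are hypothetical names, and the default values are illustrative rather than zidane's real ones.

```typescript
// Hypothetical subset of the behavior options; not zidane's real type.
interface BehaviorSketch {
  maxTurns?: number
  maxTokens?: number
  thinkingBudget?: number
}

// Illustrative defaults only; the library's actual defaults may differ.
const DEFAULTS: Required<BehaviorSketch> = { maxTurns: 50, maxTokens: 16384, thinkingBudget: 10240 }

// Later spreads win: run-level keys override agent-level keys, which override defaults.
function resolveBehavior(agent: BehaviorSketch, run: BehaviorSketch): Required<BehaviorSketch> {
  return { ...DEFAULTS, ...agent, ...run }
}

const resolved = resolveBehavior({ maxTurns: 20, maxTokens: 8192 }, { maxTurns: 10 })
// maxTurns comes from the run, maxTokens from the agent, thinkingBudget from the defaults.
console.log(resolved)
```

Keys absent from the run options fall through to the agent options, so a per-run override only needs to name what it changes.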

CLI

# Only --prompt is required. Accepted values:
#   --provider  anthropic | openai | openrouter | cerebras
#   --thinking  off | minimal | low | medium | high
#   --context   process | docker
bun start \
  --prompt "your task" \
  --model claude-opus-4-7 \
  --provider anthropic \
  --preset basic \
  --system "be concise" \
  --thinking off \
  --context process \
  --mcp '{"name":"fs","transport":"stdio","command":"npx","args":["-y","@modelcontextprotocol/server-filesystem","."]}'

Providers

All providers accept runtime credentials via params. Env vars are fallbacks.

Anthropic

import { anthropic } from 'zidane/providers'

anthropic({ apiKey: 'sk-ant-...' })
anthropic({ access: 'sk-ant-oat-...', refresh: '...', expires: Date.now() + 3600_000 }) // OAuth + auto-refresh

// First-party betas + server-side compaction:
anthropic({
  apiKey: '...',
  extraBetas: [
    'context-management-2025-06-27',     // token-accurate compaction
    'token-efficient-tools-2026-03-28',  // ~4.5% output reduction
    'interleaved-thinking-2025-05-14',   // think between tool calls
  ],
  contextManagement: {
    edits: [{
      type: 'clear_tool_uses_20250919',
      trigger: { type: 'input_tokens', value: 180_000 },
      clear_at_least: { type: 'input_tokens', value: 140_000 },
      clear_tool_inputs: ['Read', 'Bash', 'Grep', 'Glob'],
    }],
  },
})

Fallback: params.apiKey > params.access > ANTHROPIC_API_KEY env > .credentials.json. extraBetas merge with OAuth defaults and de-dupe. contextManagement is sent as context_management; pair with the matching beta. Non-Anthropic equivalent: behavior.compactStrategy: 'tail'.

extraBodyParams passes un-typed Messages API fields through (factory options win on collision). Use when Anthropic ships a beta before zidane has a knob. openaiCompat accepts the same field (e.g. reasoning_effort, metadata, OpenRouter provider routing).

OpenRouter / OpenAI / Cerebras

import { openrouter, openai, cerebras } from 'zidane/providers'

openrouter({ apiKey: 'sk-or-...', defaultModel: 'google/gemini-pro' })
openai()                                                                  // OpenAI Codex OAuth
openai({ access: 'eyJ...', refresh: '...', expires: Date.now() + 3600_000, accountId: 'acct_123' })
cerebras({ apiKey: 'csk-...', defaultModel: 'zai-glm-4.7' })

Fallbacks: params.apiKey > params.access (Codex) > <PROVIDER>_API_KEY env > .credentials.json (Codex). Pass full OAuth fields on openai() to auto-refresh without reading .credentials.json.
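
The fallback chain can be pictured as a first-match resolver. This is a sketch of the ordering only, not the providers' actual code: `resolveCredential`, its parameters, and the `Source` labels are hypothetical, and the environment and credentials-file token are passed in as plain values to keep the example self-contained.

```typescript
interface CredentialParams { apiKey?: string, access?: string }
type Source = 'params.apiKey' | 'params.access' | 'env' | 'credentials-file'

// Returns the first credential found, checking sources in the documented order.
function resolveCredential(
  params: CredentialParams,
  env: Record<string, string | undefined>,
  envVar: string,
  fileToken?: string, // stand-in for a token read from .credentials.json
): { source: Source, value: string } | undefined {
  if (params.apiKey) return { source: 'params.apiKey', value: params.apiKey }
  if (params.access) return { source: 'params.access', value: params.access }
  const fromEnv = env[envVar]
  if (fromEnv) return { source: 'env', value: fromEnv }
  if (fileToken) return { source: 'credentials-file', value: fileToken }
  return undefined
}

const fakeEnv = { OPENROUTER_API_KEY: 'sk-or-env' }
const explicit = resolveCredential({ apiKey: 'sk-or-param' }, fakeEnv, 'OPENROUTER_API_KEY')
const fromEnvironment = resolveCredential({}, fakeEnv, 'OPENROUTER_API_KEY')
const fromFile = resolveCredential({}, {}, 'OPENROUTER_API_KEY', 'stored-token')
console.log(explicit?.source, fromEnvironment?.source, fromFile?.source)
```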

OpenAI-compatible (custom endpoints)

Any OpenAI Chat Completions endpoint — Baseten, Fireworks, Groq, local LM servers, corporate proxies:

import { openaiCompat } from 'zidane/providers'

openaiCompat({
  name: 'baseten',
  apiKey: process.env.BASETEN_API_KEY!,
  baseURL: process.env.BASETEN_PROXY_URL!,
  authHeader: { name: 'Authorization', scheme: 'Api-Key' },  // vendor-specific
  capabilities: { vision: false, imageInToolResult: false },
  cacheBreakpoints: false,                                    // true only when endpoint honors `cache_control`
})

openrouter and cerebras are thin wrappers with vendor defaults pinned. Use openaiCompat directly for new backends.

Prompt caching

behavior.cache (default on). cache_control: { type: 'ephemeral' } is inserted on three stable prefixes (system, last tool, last message's final block). Hits + writes surface on TurnUsage.cacheRead / cacheCreation via the usage hook.

| Provider | Behavior |
|---|---|
| anthropic | Honored natively. |
| openrouter | Forwarded; Anthropic + Gemini honor; OpenAI / DeepSeek / Grok / Groq / Moonshot cache automatically and ignore the markers. |
| openaiCompat | Opt-in via cacheBreakpoints: true. Off by default. |
| cerebras | Off. |
| openai (Codex) | Not affected (pi-ai wire format). |

Presets

basic ships:

| Tool | Description |
|---|---|
| shell | Combined stdout+stderr tail-truncated at 8 KB. maxOutputBytes: 0 disables. |
| readFile | Line range, default 1..2000, 64 KB cap. Paging footer; binary marker. |
| writeFile | Returns Created / Updated / No change needed: … for no-op detection. |
| edit | Surgical old_string → new_string. Clear errors on non-unique / not-found (with nearest-match preview). |
| multiEdit | Atomic edits to one file. All-or-nothing. |
| listFiles | Directory listing. |
| spawn | Sub-agent. |

Opt-in (via import from 'zidane'): glob (Bun.Glob; shells out in docker/sandbox), grep (ripgrep + Bun.Glob fallback; full Claude Code Grep semantics), createInteractionTool (HITL factory).

skills_use / skills_read / skills_run_script auto-inject when the skills catalog is non-empty.

import { basicTools, definePreset } from 'zidane/presets'

createAgent({ ...definePreset({ name: 'researcher', tools: basicTools }), provider })
createAgent({ provider })                            // no tools
await agent.run({ prompt: 'just chat', tools: {} }) // no tools for one run

Thinking

Named levels or exact budgets. Traces persist as { type: 'thinking', text } blocks and stream via stream:thinking. Supported by Anthropic (native) and OpenRouter/Cerebras (reasoning_content/reasoning SSE fields).

| Level | Default budget |
|---|---|
| off | disabled |
| minimal | 1,024 |
| low | 4,096 |
| medium | 10,240 |
| high | 32,768 |
| adaptive | model self-budgets |

await agent.run({ prompt: '…', thinking: 'high' })
await agent.run({ prompt: '…', thinking: 'high', behavior: { thinkingBudget: 50000 } })  // exact
await agent.run({ prompt: '…', thinking: 'adaptive', behavior: { thinkingBudget: 32000 } })

adaptive is Anthropic-only (thinking.type='adaptive', avoids the opus 4.6+ deprecation warning). Pairing it with thinkingBudget caps max_tokens = min(maxTokens, thinkingBudget) to bound runaway reasoning. Other providers fall back to no reasoning on adaptive.

Hooks

Hooks fire at every lifecycle point via hookable. Awaited in registration order; ctx is shared per firing (last-writer wins). See docs/SKILL.md for the full hook reference table.

Practical examples

// Refuse or substitute a tool call.
agent.hooks.hook('tool:gate', (ctx) => {
  if (ctx.name === 'shell' && String(ctx.input.command).includes('rm -rf')) {
    ctx.block = true
    ctx.reason = 'dangerous command'
  }
  if (ctx.name === 'todowrite' && (ctx.runToolCounts.todowrite ?? 0) > 0)
    ctx.result = 'Already recorded; no-op.'  // `block` wins if both set
})

// Redact secrets before the model sees a tool result.
agent.hooks.hook('tool:transform', (ctx) => {
  if (typeof ctx.result === 'string')
    ctx.result = ctx.result.replace(/\b(API_KEY|TOKEN|PASSWORD)\s*=\s*\S+/gi, '$1=<redacted>')
})

// Substitute for hallucinated tool names instead of erroring.
agent.hooks.hook('tool:unknown', (ctx) => {
  if (ctx.name === 'EnterPlanMode') {
    ctx.result = 'EnterPlanMode is not available — use shell to draft a plan as comments.'
    ctx.suppressError = true
  }
})

// Per-turn observation.
agent.hooks.hook('turn:after', (ctx) => { /* ctx.turn, ctx.usage, ctx.message — always fires */ })
agent.hooks.hook('stream:text', (ctx) => { /* ctx.delta, ctx.text */ })
agent.hooks.hook('agent:done', (ctx) => { /* AgentStats — cumulative incl. children */ })

// Mutate messages / system before the provider call.
agent.hooks.hook('context:transform', (ctx) => {
  if (ctx.messages.length > 30) ctx.messages.splice(2, ctx.messages.length - 30)
})
agent.hooks.hook('system:transform', (ctx) => {
  if (ctx.session && ctx.turn > 1)
    ctx.system += `\n\n## Reminder: keep responses concise after turn ${ctx.turn}.`
})

Mutable hooks: tool:gate (block / reason / result), tool:transform (result / isError), tool:error + tool:unknown (result), context:transform (messages), system:transform + system:before (system), skills:catalog (catalog), mcp:tool:gate (block / reason / result), mcp:tool:transform (result). All tool hooks include turnId + callId. outputBytes is pre-mutation on *:transform, post-mutation on *:after — reproduce via toolOutputByteLength(). ctx.coercions is omitted when no coercion happened — guard with if (ctx.coercions).

Hook recipes

Three patterns the framework can't auto-infer. Copy-paste and tune.

// 1. Truncate MCP tool results — sizes vary too much for a default.
agent.hooks.hook('mcp:tool:transform', (ctx) => {
  if (ctx.outputBytes <= 8192 || typeof ctx.result !== 'string')
    return
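  const tail = ctx.result.slice(-4096)  // slices UTF-16 code units, so measure the kept bytes separately
  // toolOutputByteLength is the zidane helper that uses the same formula as outputBytes.
  ctx.result = `…(${ctx.outputBytes - toolOutputByteLength(tail)} bytes truncated from head)…\n${tail}`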
})

// 2. Substitute a friendly response when the model invents a tool name.
agent.hooks.hook('tool:unknown', (ctx) => {
  if (ctx.name === 'EnterPlanMode') {
    ctx.result = 'EnterPlanMode is not available — use shell to draft a plan as comments.'
    ctx.suppressError = true
  }
})

// 3. Drop old turns past a soft cap.
agent.hooks.hook('context:transform', (ctx) => {
  const KEEP_RECENT = 30
  if (ctx.messages.length > KEEP_RECENT) {
    const trimmed = [ctx.messages[0], ...ctx.messages.slice(-KEEP_RECENT + 1)]
    ctx.messages.splice(0, ctx.messages.length, ...trimmed)
  }
})

mcp:tool:transform, tool:unknown, and context:transform are the highest-leverage entries the framework doesn't auto-handle. Most production agents end up with one of each.

Per-turn output budget

behavior.toolOutputBudget injects a "summarize before continuing" message when a turn's combined post-tool:transform bytes exceed the cap. Off by default. Subscribe via budget:exceeded (byte) and tool-budget:exceeded (per-tool, fields: tool, count, max, turnId, mode).
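
A rough model of the byte check, assuming output size is measured as UTF-8 bytes and that the injected message is plain text. `checkTurnBudget` and the message wording are hypothetical; the real accounting is whatever toolOutputByteLength() computes.

```typescript
// Assumption: bytes are counted as UTF-8, which may not match zidane's formula exactly.
const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length

// Returns a steering message when the turn's combined tool output exceeds the cap,
// or undefined when the turn is within budget.
function checkTurnBudget(results: string[], budget: number): string | undefined {
  const total = results.reduce((sum, r) => sum + utf8Bytes(r), 0)
  return total > budget
    ? `Tool output this turn was ${total} bytes (budget ${budget}). Summarize before continuing.`
    : undefined
}

const overBudget = checkTurnBudget(['a'.repeat(20000), 'b'.repeat(20000)], 32768)
const withinBudget = checkTurnBudget(['ok'], 32768)
console.log(overBudget, withinBudget)
```

Because the cap is soft, nothing is truncated in this model: the tools run in full and the message only asks the model to condense before the next turn.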

Client-side context compaction (non-Anthropic)

behavior.compactStrategy: 'tail' elides older tool_result blocks once their combined size exceeds compactThreshold (default 128 KiB); the newest compactKeepTurns (default 4) stay intact. Anthropic users should prefer the server-side context-management-2025-06-27 beta via anthropic({ extraBetas, contextManagement }) — token-accurate.
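
One way to picture the 'tail' strategy, under stated assumptions: the threshold is compared against the combined tool_result bytes of the whole history, sizes are UTF-8 bytes, and elided outputs become a short placeholder. `compactTail`, the simplified `Block` type, and the placeholder text are hypothetical, not zidane's implementation.

```typescript
type Block =
  | { type: 'text', text: string }
  | { type: 'tool_result', callId: string, output: string }
interface Turn { blocks: Block[] }

const byteLength = (s: string): number => new TextEncoder().encode(s).length

// Combined size of every tool_result output in the history.
function toolResultBytes(turns: Turn[]): number {
  let total = 0
  for (const turn of turns) {
    for (const block of turn.blocks) {
      if (block.type === 'tool_result') total += byteLength(block.output)
    }
  }
  return total
}

// Below the threshold the history is returned untouched. Above it, tool_result
// outputs in all but the newest `keepTurns` turns are replaced by a placeholder.
function compactTail(turns: Turn[], threshold: number, keepTurns: number): Turn[] {
  if (toolResultBytes(turns) <= threshold) return turns
  const cutoff = Math.max(0, turns.length - keepTurns)
  return turns.map((turn, i) => {
    if (i >= cutoff) return turn
    const blocks = turn.blocks.map((block): Block =>
      block.type === 'tool_result'
        ? { ...block, output: `[elided ${byteLength(block.output)} bytes]` }
        : block)
    return { blocks }
  })
}

// Six turns of 100 bytes each: 600 bytes total against a 500-byte threshold.
const history: Turn[] = Array.from({ length: 6 }, (_, i): Turn => ({
  blocks: [{ type: 'tool_result', callId: `call_${i}`, output: 'x'.repeat(100) }],
}))
const compacted = compactTail(history, 500, 4)
console.log(compacted.map(turn => turn.blocks[0]))
```

Text blocks and the newest turns pass through unchanged, so the model keeps its recent working set while older tool output stops consuming context.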

Read dedup + read-before-edit guard

behavior.dedupReads (default on) — read_file returns "unchanged since the previous read" on identical re-reads. Per-session content-hash.
behavior.requireReadBeforeEdit (default off) — edit / multi_edit reject when the file hasn't been read this session or has drifted. Recommended for eval-grade runs.

Tracking and the dedup short-circuit are independent — turning dedupReads off does NOT disable requireReadBeforeEdit. Read-state keys are canonical absolute paths (resolved through node:path), so read_file('src/x') and edit('./src/x', …) resolve to the same entry. For subagent-shared tracking without shared turn history, use createSpawnTool({ shareReadState: true }).
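
The path canonicalization can be illustrated with node:path alone. This sketch covers only the keying; drift detection and the content hash are left out, and `createReadState` is a hypothetical name.

```typescript
import { resolve } from 'node:path'

// Hypothetical read-state tracker keyed by canonical absolute path.
function createReadState(cwd: string) {
  const seen = new Set<string>()
  const key = (file: string): string => resolve(cwd, file)
  return {
    markRead: (file: string): void => { seen.add(key(file)) },
    wasRead: (file: string): boolean => seen.has(key(file)),
  }
}

const reads = createReadState('/workspace')
reads.markRead('src/x')
console.log(reads.wasRead('./src/x'))      // true: both spellings resolve to the same absolute path
console.log(reads.wasRead('src/../src/x')) // true
console.log(reads.wasRead('src/y'))        // false
```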

Per-tool concurrency

Every ToolDef carries an isConcurrencySafe?: boolean | ((input) => boolean) flag (default false). The unified dispatcher walks calls in submission order:

Safe siblings fan out in parallel up to behavior.maxConcurrentTools (default 10).
Unsafe calls act as barriers — they wait for the in-flight fleet to drain, run alone, then unblock the queue.
Results are yielded in submission order regardless of completion order, so the model sees a deterministic transcript.

Builtin annotations: read_file, glob, grep, list_files, tool_search, todoread, spawn, skills_read are true. shell is conditional — ls, cat, git status, rg, … fan out; rm, npm install, pipes, redirects, command chains all barrier. edit, multi_edit, write_file, todowrite, interaction, custom tools, and third-party MCP tools default to false (barrier).

A shell error in a concurrent fleet cascade-cancels its siblings (shell commands often have implicit dependency chains); other failures are isolated.

Set behavior.maxConcurrentTools: 1 to force fully sequential dispatch for deterministic debugging.
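
The dispatch rules above can be sketched as a small scheduler. This is an illustration of the fan-out, barrier, and ordering behavior only: `dispatch` and `Call` are hypothetical, `safe` stands in for an already-resolved isConcurrencySafe, and error handling (including the shell cascade-cancel) is omitted.

```typescript
interface Call<T> {
  safe: boolean            // stand-in for a resolved isConcurrencySafe flag
  run: () => Promise<T>
}

async function dispatch<T>(calls: Call<T>[], maxConcurrent: number): Promise<T[]> {
  const limit = Math.max(1, maxConcurrent)
  const results = new Array<T>(calls.length)
  const inFlight = new Set<Promise<void>>()

  const start = (i: number): void => {
    const p: Promise<void> = calls[i].run().then((value) => {
      results[i] = value   // stored by submission index, not completion order
      inFlight.delete(p)
    })
    inFlight.add(p)
  }

  for (let i = 0; i < calls.length; i++) {
    if (!calls[i].safe) {
      await Promise.all(inFlight)        // barrier: wait for the in-flight fleet to drain
      results[i] = await calls[i].run()  // run alone, then let the queue continue
      continue
    }
    while (inFlight.size >= limit) await Promise.race(inFlight)
    start(i)
  }
  await Promise.all(inFlight)
  return results
}

// Demo: two safe reads overlap, the unsafe write waits for both, the last read follows.
const log: string[] = []
const sleep = (ms: number) => new Promise<void>(done => setTimeout(done, ms))
const mk = (name: string, safe: boolean, ms: number): Call<string> => ({
  safe,
  run: async () => {
    log.push(`start ${name}`)
    await sleep(ms)
    log.push(`end ${name}`)
    return name
  },
})

const demo = dispatch([mk('read_a', true, 30), mk('read_b', true, 10), mk('write', false, 5), mk('read_c', true, 5)], 10)
demo.then(results => console.log(results, log))
```

In the demo, read_b finishes before read_a, yet the results array still lists read_a first because each result is written to its submission index.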

Generic per-tool dedup

behavior.dedupTools extends the pattern to arbitrary tools via a hasher keyed by canonical name. Requires a session. Hasher contract — three returns, three meanings:

| Return | Meaning |
|---|---|
| non-empty string | Cache key. Equal keys replay the prior result. |
| undefined | Skip dedup for this call. Tool runs normally. |
| '' or non-string | Treated as undefined (defensive). |

behavior: {
  dedupTools: {
    todowrite: input => JSON.stringify(input),
    execute_sql: (input) => {
      const q = typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined
      if (!q || q.includes('now()') || q.includes('random()')) return undefined  // non-cacheable
      return q
    },
  },
}

Tools with side effects or non-determinism (network, time, randomness) must not be listed. For MCP tools, key by the namespaced wire name (mcp_<server>_<tool>).
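
The hasher contract can be modeled with an in-memory wrapper. This is a sketch under assumptions, not the session-backed implementation: `createDedup` and its cache layout are hypothetical, and results are plain strings.

```typescript
type Hasher = (input: Record<string, unknown>) => unknown

// Hypothetical in-memory wrapper illustrating the three-return contract.
function createDedup(hashers: Record<string, Hasher>) {
  const cache = new Map<string, string>()
  return function call(
    tool: string,
    input: Record<string, unknown>,
    run: () => string,
  ): { result: string, replayed: boolean } {
    const key = hashers[tool]?.(input)
    // undefined, '' and non-string returns all mean "skip dedup for this call".
    if (typeof key !== 'string' || key === '') return { result: run(), replayed: false }
    const cacheKey = `${tool}:${key}`
    const prior = cache.get(cacheKey)
    if (prior !== undefined) return { result: prior, replayed: true }
    const result = run()
    cache.set(cacheKey, result)
    return { result, replayed: false }
  }
}

let executions = 0
const runSql = (): string => `rows#${++executions}`
const call = createDedup({
  execute_sql: input => typeof input.query === 'string' ? input.query.trim().toLowerCase() : undefined,
})

const first = call('execute_sql', { query: 'SELECT 1' }, runSql)
const second = call('execute_sql', { query: '  select 1 ' }, runSql) // same key after trim + lowercase
const third = call('execute_sql', { query: 42 }, runSql)             // hasher returns undefined, so the tool runs
console.log(first, second, third, executions)
```

The second call replays the first result without executing, which is why hashers for non-deterministic queries should return undefined rather than a key.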

Per-tool call budgets

behavior.toolBudgets caps per-tool calls per run. 'steer' lets the call run then nudges the model to commit (once per tool per run); 'block' refuses with Blocked: <reason>.

behavior: {
  toolBudgets: {
    todowrite: { max: 6, onExceed: 'steer' },
    execute_sql: { max: 3, onExceed: 'block' },
  },
}

Pass a function for custom messages: onExceed: ctx => ({ mode: 'steer', message: '...' }). Counts include dedup hits — by design.
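
A sketch of how the two modes differ, assuming the budget trips on the first call past max and that blocked attempts still count. `createBudgetTracker`, the `Decision` shape, and the message wording are hypothetical.

```typescript
type Mode = 'steer' | 'block'
interface Budget { max: number, onExceed: Mode }
type Decision =
  | { action: 'run' }
  | { action: 'run-and-steer', message: string }
  | { action: 'block', message: string }

function createBudgetTracker(budgets: Record<string, Budget>) {
  const counts = new Map<string, number>()
  const steered = new Set<string>()
  return function check(tool: string): Decision {
    const budget = budgets[tool]
    const count = (counts.get(tool) ?? 0) + 1
    counts.set(tool, count)
    if (!budget || count <= budget.max) return { action: 'run' }
    if (budget.onExceed === 'block')
      return { action: 'block', message: `Blocked: ${tool} exceeded ${budget.max} calls` }
    if (steered.has(tool)) return { action: 'run' } // the nudge is sent once per tool per run
    steered.add(tool)
    return { action: 'run-and-steer', message: `${tool} passed ${budget.max} calls; commit to a result.` }
  }
}

const check = createBudgetTracker({
  todowrite: { max: 2, onExceed: 'steer' },
  execute_sql: { max: 1, onExceed: 'block' },
})
const todoActions = [1, 2, 3, 4].map(() => check('todowrite').action)
const sqlActions = [1, 2].map(() => check('execute_sql').action)
console.log(todoActions, sqlActions)
```

In this model 'steer' never stops a call: the third todowrite runs and carries the nudge, and the fourth runs silently. 'block' refuses every call past the cap.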

Adaptive thinking budget

behavior.thinkingDecay tapers thinking across turns. Late turns are usually checkpoint work where reasoning rarely pays off.

behavior: {
  thinkingBudget: 8192,
  thinkingDecay: { afterTurn: 5, factor: 0.5, floor: 1024 },
  // turn 1-5 → 8192, turn 6 → 4096, turn 7 → 2048, turn 8+ → 1024
}

Pass a function for arbitrary curves: thinkingDecay: (turn, base) => base / Math.sqrt(turn). No-op when thinkingBudget is unset.
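
The object form can be read as geometric decay with a floor. The sketch below reproduces the schedule in the comment above; `decayedBudget` and `DecaySketch` are hypothetical names, and the exact rounding the library applies is an assumption.

```typescript
// Hypothetical config shape mirroring the object form of thinkingDecay.
interface DecaySketch { afterTurn: number, factor: number, floor: number }

// Budget for a 1-indexed turn: the full budget through `afterTurn`, then multiplied
// by `factor` once per additional turn, never dropping below `floor`.
function decayedBudget(turn: number, base: number, decay: DecaySketch): number {
  if (turn <= decay.afterTurn) return base
  const steps = turn - decay.afterTurn
  return Math.max(decay.floor, Math.floor(base * Math.pow(decay.factor, steps)))
}

const decay: DecaySketch = { afterTurn: 5, factor: 0.5, floor: 1024 }
const schedule = [1, 5, 6, 7, 8, 9].map(turn => decayedBudget(turn, 8192, decay))
console.log(schedule) // 8192, 8192, 4096, 2048, 1024, 1024
```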

Steering and Follow-up

agent.steer(msg) — inject mid-run, delivered between tool calls.
agent.followUp(msg) — queue for after the run finishes.

agent.steer('focus only on the tests directory')
agent.followUp('now write tests for what you built')

Sub-agent Spawning

spawn delegates to independent child agents. Children inherit the parent's preset (tools, system, aliases, MCP servers, skills, behavior) by default. Pass preset on createSpawnTool() to override per child.

import { basicTools, definePreset, createSpawnTool } from 'zidane'

definePreset({
  name: 'orchestrator',
  tools: {
    ...basicTools,
    spawn: createSpawnTool({ maxConcurrent: 5, model: 'claude-haiku-4-5-20251001', thinking: 'low' }),
  },
})

Interaction Tool

Pause the agent and request structured input. Not in any preset by default. onRequest may be async — the agent waits. Return a string or object.

import { createInteractionTool } from 'zidane'

const askUser = createInteractionTool({
  name: 'ask_user',
  schema: { type: 'object', properties: { question: { type: 'string' } }, required: ['question'] },
  onRequest: async ({ question }) => ({ answer: await promptUser(question) }),
})

Sessions

Persistent turn history + run metadata across calls. Turns persist incrementally — a crash leaves history up to the last completed turn.

import { createAgent, createSession } from 'zidane'
import { createSqliteStore } from 'zidane/session/sqlite'  // Bun-only subpath

const store = createSqliteStore({ path: './sessions.db' })
const session = await createSession({ store })

const agent = createAgent({ ...basic, provider, session })
await agent.run({ prompt: 'hello' })
await session.save()

Storage backends — createMemoryStore() (in-memory), createSqliteStore({ path }) from zidane/session/sqlite (Bun-only subpath; WAL, per-turn flush), createRemoteStore({ url }) (HTTP), createFileMapStore(adapter) (any { get, save, delete } backend; turns.jsonl + meta.json).

Restore via await loadSession(store, id). Session hooks: session:start, session:turns, session:end (always fires, carries turnRange).

MCP Servers

Connect any MCP server. Tools are namespaced mcp_{server}_{tool}. Connections are lazy (first run()) and reused; all servers bootstrap in parallel.

const agent = createAgent({
  ...basic,
  provider,
  mcpServers: [
    { name: 'fs', transport: 'stdio', command: 'npx', args: ['-y', '@modelcontextprotocol/server-filesystem', '.'], bootstrapTimeout: 10_000 },
    { name: 'api', transport: 'streamable-http', url: 'http://localhost:3002/mcp', disclosure: 'lazy' },
  ],
})

Per-server disclosure: 'lazy' | 'eager' overrides behavior.toolDisclosure (see Progressive tool disclosure).

Hiding bootstrap latency

The first run() still waits on the slowest server. Two knobs:

await Promise.all([agent.warmup(), authenticate(), loadConfig()])  // pre-warm manually
const agent = createAgent({ provider, mcpServers, eager: true })   // or kick off automatically

warmup() fans MCP connect + skills resolution out in parallel — both idempotent and concurrency-safe. Failures surface on the next warmup() / run(), not on the eager kickoff. agent.activateSkill(name) also auto-resolves the catalog on its first call, so a TUI can wire /slash-style activation at submit time without ordering it against run().

Two hooks fire per bootstrap regardless of outcome — attribute cold-start latency per server:

agent.hooks.hook('mcp:bootstrap:end', (ctx) => {
  // ctx.name, ctx.transport, ctx.durationMs, ctx.ok
  // ok ? ctx.toolCount : ctx.error
})

OAuth 2.1

HTTP MCP servers can be tagged auth: 'oauth' to wire the SDK's OAuth 2.1 client (PKCE, RFC 7591 dynamic client registration, RFC 9728 resource metadata, token refresh). Zidane provides the storage + redirect halves:

import { createMemoryMcpCredentialStore, McpOAuthProvider, connectMcpServers, loginMcpServer } from 'zidane'
import { createFileMcpCredentialStore } from 'zidane/chat'

const store = createFileMcpCredentialStore('/path/to/data-dir')

const agent = createAgent({
  provider,
  mcpServers: [{ name: 'linear', transport: 'streamable-http', url: 'https://mcp.linear.app/mcp', auth: 'oauth' }],
  mcpConnector: configs => connectMcpServers(configs, undefined, agent.hooks, {
    // Non-interactive bootstrap: uses stored tokens, refreshes on expiry,
    // never opens a browser. Missing tokens fire `mcp:auth:required`.
    buildAuthProvider: cfg => new McpOAuthProvider({ name: cfg.name, store }),
  }),
})

agent.hooks.hook('mcp:auth:required', ({ name }) => {
  console.log(`${name} needs login — run loginMcpServer(...) when ready`)
})

// Interactive login (opens browser, waits on a loopback callback).
// linearConfig is the same 'linear' server entry passed in mcpServers above.
await loginMcpServer(linearConfig, { store, hooks: agent.hooks })
// Tokens persisted. Next `agent.run()` / `warmup()` connects Linear with the new token.

Auto-detect: a server without auth: 'oauth' that returns 401 + RFC 9728 metadata is promoted to OAuth only if no static Authorization header is set. The header check stops the harness from second-guessing user-managed bearer tokens.

The TUI exposes this via the MCP picker (l to login, o to logout). State is driven by four hooks: mcp:auth:required, mcp:auth:url, mcp:auth:success, mcp:auth:error.

Progressive tool disclosure

With hundreds of MCP tools, every turn ships every schema. behavior.toolDisclosure: 'lazy' flips MCP tools to a name-only catalog and auto-injects a tool_search native tool. Native + skill tools stay eager.

const agent = createAgent({
  ...basic,
  provider,
  mcpServers: [
    { name: 'github', transport: 'stdio', command: 'gh-mcp' },                    // 200+ tools
    { name: 'fs', transport: 'stdio', command: 'fs-mcp', disclosure: 'eager' },   // per-server override
  ],
  behavior: { toolDisclosure: 'lazy', toolSearch: { limit: 20 } },
})

System prompt gains <searchable_tools> with name + description per lazy tool. tool_search accepts query (substring), names, server, limit — matches unlock for the rest of the run. A tool:gate middleware refuses dispatch on un-surfaced lazy tools (covers custom/mock providers; production providers also refuse server-side). Catalog + search results show the wire name; the unlock set keys on canonical so dispatch and session.turns stay alias-stable.

Cost: one cache miss per discovery wave (the tool list grows); subsequent turns hit cache. Opt out via behavior.toolSearch.tool: false (catalog still emits, call-to-action drops). A pre-existing host tool named tool_search shadows the auto-injection.
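
The search-then-unlock flow can be sketched as follows. Assumptions: the substring query matches against both name and description, and the unlock set is keyed by wire name for brevity (the text above says the real set keys on canonical names). `createToolSearch` and `LazyTool` are hypothetical.

```typescript
interface LazyTool { wireName: string, server: string, description: string }

function createToolSearch(catalog: LazyTool[], defaultLimit = 20) {
  const unlocked = new Set<string>()
  return {
    // Substring search; every match is unlocked for the rest of the run.
    search(opts: { query?: string, server?: string, limit?: number }): string[] {
      const q = opts.query?.toLowerCase()
      const hits = catalog
        .filter(t => !opts.server || t.server === opts.server)
        .filter(t => !q || t.wireName.toLowerCase().includes(q) || t.description.toLowerCase().includes(q))
        .slice(0, opts.limit ?? defaultLimit)
      for (const t of hits) unlocked.add(t.wireName)
      return hits.map(t => t.wireName)
    },
    // Gate: a lazy tool may be dispatched only after a search has surfaced it.
    canDispatch: (wireName: string): boolean => unlocked.has(wireName),
  }
}

const tools = createToolSearch([
  { wireName: 'mcp_github_create_issue', server: 'github', description: 'Create an issue' },
  { wireName: 'mcp_github_list_issues', server: 'github', description: 'List issues' },
  { wireName: 'mcp_github_merge_pr', server: 'github', description: 'Merge a pull request' },
])
const hits = tools.search({ query: 'issue' })
console.log(hits, tools.canDispatch('mcp_github_create_issue'), tools.canDispatch('mcp_github_merge_pr'))
```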

Skills

Reusable instruction packages following the Agent Skills standard.

my-skill/
  SKILL.md       # frontmatter + instructions
  scripts/       # optional
  references/    # optional
  assets/        # optional

---
name: my-skill
description: When to activate this skill.
model: claude-opus-4-7
thinking: low
allowed-tools: Bash Read Write
paths: "src/**/*.ts, test/**/*.ts"
---

Full instructions the agent receives on activation.

Default scan paths (first found wins): {cwd}/.agents/skills, {cwd}/.zidane/skills, ~/.agents/skills, ~/.zidane/skills. Instructions support !`command` placeholders: the command runs during resolution and its output replaces the placeholder.

import { createAgent, defineSkill } from 'zidane'

createAgent({
  ...basic,
  provider,
  skills: {
    scan: ['./custom-skills'],
    write: [defineSkill({ name: 'review', description: 'Code review.', instructions: '...' })],
    exclude: ['deprecated-skill'],
    enabled: ['review', 'deploy'],
  },
})

Execution Contexts

Where tools run. Defaults to in-process. Docker isolates; sandbox runs remotely (E2B, Rivet, custom).

import { createSandboxContext } from 'zidane'
// Docker lives behind its own subpath so `dockerode` (and the ssh2 →
// cpu-features native chain) only enters your bundle when you opt in.
import { createDockerContext } from 'zidane/contexts/docker'

createDockerContext({ image: 'node:22', cwd: '/workspace', limits: { memory: 512, cpu: '1.0' } })
createSandboxContext(myProvider)  // implement SandboxProvider

State Management

agent.isRunning           // run in progress?
agent.turns               // SessionTurn[]
agent.abort()             // cancel current run
agent.reset()             // clear turns + queues
await agent.warmup()      // pre-connect MCP + resolve skills (idempotent)
await agent.destroy()     // clean up context + MCP
await agent.waitForIdle() // wait for run to complete

Message Format

Canonical format. Providers convert to/from wire formats internally.

type SessionContentBlock =
  | { type: 'text', text: string }
  | { type: 'image', mediaType: string, data: string }
  | { type: 'tool_call', id: string, name: string, input: Record<string, unknown> }
  | { type: 'tool_result', callId: string, output: string | ToolResultContent[], isError?: boolean }
  | { type: 'thinking', text: string, signature?: string }

type ToolResultContent =
  | { type: 'text', text: string }
  | { type: 'image', mediaType: string, data: string }

Image-producing tools (MCP browsers, screenshots) return ToolResultContent[] — routed natively on providers with imageInToolResult: true, via companion user message elsewhere. Flatten with toolResultToText(output).

External interop converters: fromAnthropic, toAnthropic, fromOpenAI, toOpenAI, autoDetectAndConvert (re-exported from zidane).

Typed Errors

Provider failures are wrapped before leaving agent.run(). Match on instanceof, not strings. Every provider ships classifyError(err); unrecognized errors fall through as AgentProviderError. Abort paths (agent.abort() / AbortSignal) always produce AgentAbortedError.

import { AgentAbortedError, AgentContextExceededError, AgentProviderError } from 'zidane'

try {
  await agent.run({ prompt })
}
catch (err) {
  if (err instanceof AgentContextExceededError) { /* prune history, retry */ }
  else if (err instanceof AgentAbortedError) { /* user cancelled */ }
  else if (err instanceof AgentProviderError) {
    console.error(`${err.provider}: ${err.message} (${err.providerCode})`)
  }
}

Structured Output

Force the final response to a JSON Schema via provider-level tool forcing. Lands on stats.output and fires the output hook (ctx.output, ctx.schema).

const stats = await agent.run({
  prompt: 'Extract the entities',
  behavior: {
    schema: {
      type: 'object',
      properties: { name: { type: 'string' }, age: { type: 'number' } },
      required: ['name', 'age'],
    },
  },
})
console.log(stats.output) // { name: 'Alice', age: 30 }

For Zod v4, normalize via zodToJsonSchema(z.toJSONSchema(schema)), which strips $schema (some providers reject it).

Usage Tracking

stats.totalIn / stats.totalOut / stats.cost are cumulative (parent + recursive children). stats.turns and stats.turnUsage cover the parent loop only. Use helpers for tree-wide breakdowns.

import { flattenTurns, statsByModel } from 'zidane'

stats.totalIn / stats.totalOut / stats.cost  // cumulative
stats.turnUsage                              // parent loop only
stats.children                               // ChildRunStats[] in completion order
stats.timeTillFirstTokenMs                   // ms to first stream/tool event

flattenTurns(stats)   // every TurnUsage in tree (DFS)
statsByModel(stats)   // Map<modelId, { input, output, cost, cacheRead, cacheCreation, turns }>
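
To make the tree-wide helpers concrete, here is a sketch of a depth-first flatten and a per-model rollup over a simplified stats tree. The `StatsSketch` and `UsageSketch` shapes, the pre-order traversal, and the function names are assumptions for illustration; the real helpers operate on zidane's own types.

```typescript
interface UsageSketch { model: string, input: number, output: number }
interface StatsSketch { turnUsage: UsageSketch[], children?: StatsSketch[] }

// Pre-order DFS: a node's own turns first, then each child's subtree in order.
function flattenTurnsSketch(stats: StatsSketch): UsageSketch[] {
  const out = [...stats.turnUsage]
  for (const child of stats.children ?? []) out.push(...flattenTurnsSketch(child))
  return out
}

// Sums input/output tokens and counts turns per model across the whole tree.
function totalsByModel(stats: StatsSketch): Map<string, { input: number, output: number, turns: number }> {
  const totals = new Map<string, { input: number, output: number, turns: number }>()
  for (const usage of flattenTurnsSketch(stats)) {
    const entry = totals.get(usage.model) ?? { input: 0, output: 0, turns: 0 }
    entry.input += usage.input
    entry.output += usage.output
    entry.turns += 1
    totals.set(usage.model, entry)
  }
  return totals
}

// Parent ran two turns; one child ran one turn and spawned a grandchild with one turn.
const tree: StatsSketch = {
  turnUsage: [{ model: 'opus', input: 100, output: 10 }, { model: 'opus', input: 200, output: 20 }],
  children: [{
    turnUsage: [{ model: 'haiku', input: 50, output: 5 }],
    children: [{ turnUsage: [{ model: 'haiku', input: 30, output: 3 }] }],
  }],
}
const flat = flattenTurnsSketch(tree)
const perModel = totalsByModel(tree)
console.log(flat.length, perModel.get('opus'), perModel.get('haiku'))
```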

Types & Helpers

Types from zidane/types (Agent, SessionTurn, TurnUsage, Provider, ToolDef, ValidationResult, hook contexts).

Helpers re-exported from zidane:

toolResultToText(content) — flatten for logging.
toolOutputByteLength(content) — same formula as outputBytes.
validateToolArgs(input, schema) — the loop's validator.

Testing & Benchmarks

bun test

1000+ tests with mock provider + execution context. Under 2 s; no API keys or Docker.

Benchmark harnesses live in benchmarks/. First integration: Terminal-Bench.

Docs

| Doc | What |
|---|---|
| docs/ARCHITECTURE.md | Mermaid diagrams of the agent loop, tool execution, hook firing order, dependency graph. |
| docs/SKILL.md | Authoritative reference for createAgent + hooks + tools + sessions + skills + MCP. Long-form. |
| docs/CHAT.md | zidane/chat: renderer-agnostic chat engine (auth, sessions, safe-mode, settings, theme). Consumed by the TUI and any future GUI. |
| docs/TUI.md | zidane/tui: terminal shell built on zidane/chat. runTui(), screens, modals, keyboard, theming. |

License

ISC