clavue-agent-sdk
v2.2.0
Clavue Agent SDK
Production-grade TypeScript agent runtime. Embed run(), query(), or
createAgent() directly in your Node.js process — no daemon, no subprocess,
no local CLI dependency. Multi-provider (Anthropic + OpenAI-compatible),
streaming-first, prompt-cache-aware, with a v3 capability layer that ships
multi-agent graph, live tracing, 4-scope guardrails, capability-token sandbox,
RAG, framework-agnostic Generative UI, and provider-agnostic voice.
npm install clavue-agent-sdk # latest = 1.0.5
Also available in Go: clavue-agent-sdk-go
What's in 1.0.3
The 1.0.x line closes the 17-item v2 audit, ships the v3 seven-axis
capability layer (1.0.1), and adds the Tier A performance landings plus the
Tier B list-cache family (1.0.3). Hard numbers (reproduce with npm run
bench):
| Metric | 0.7.5 baseline | 1.0.3 | Δ |
|---|---:|---:|---:|
| src/engine.ts lines | 1537 | 1041 | -32% |
| Engine helper modules | 0 | 15 | +15 files |
| Tests passing | 272 | 720 | +165% |
| Token estimator content classes separated | 1 (constant 0.25) | 4 (0.25 / 0.33 / 0.39 / 0.63) | — |
| Worker_thread spawn p50 | n/a (stub) | ~60 ms | runtime real |
| Worker_thread preflight abort p95 | n/a | <0.1 ms | short-circuit |
| listMemories / listAgentJobs / listSessions / listIssueWorkflowRuns warm calls | O(N) readdir + readFile | O(1) (cached, invalidated on write) | new in 1.0.3 |
See docs/v2_benchmark_report.md for the
v2 methodology and raw output, and
docs/tier-a-summary.md for the 1.0.3 Tier A /
Tier B landings (per-item goal, files, public API, trace surface, tests).
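The "cached, invalidated on write" row in the table above can be pictured with a small sketch. This is not the SDK's implementation: ListCache, saveMemory, and the in-memory disk map are hypothetical names used only to illustrate the pattern, with a Map standing in for the readdir + readFile pass.

```typescript
// Hypothetical sketch of a write-invalidated list cache.
// None of these names come from clavue-agent-sdk.
class ListCache<T> {
  private cached: T[] | undefined = undefined;
  private readonly load: () => T[];
  loads = 0; // number of cold loads, for illustration

  constructor(load: () => T[]) {
    this.load = load;
  }

  // Warm calls return the cached array without touching the loader.
  list(): T[] {
    if (this.cached === undefined) {
      this.cached = this.load();
      this.loads += 1;
    }
    return this.cached;
  }

  // Writers call this so the next list() reloads from the source.
  invalidate(): void {
    this.cached = undefined;
  }
}

// In-memory stand-in for a directory of memory files.
const disk = new Map<string, string>([["a.json", "first"]]);
const memories = new ListCache<string>(() => [...disk.values()]);

function saveMemory(name: string, body: string): void {
  disk.set(name, body);
  memories.invalidate(); // every write drops the cached list
}

memories.list(); // cold: runs the loader
memories.list(); // warm: served from the cache
saveMemory("b.json", "second");
const afterWrite = memories.list(); // cold again, sees the new entry
```

The trade-off in this pattern is that every write path has to remember to invalidate; a missed call serves stale lists until the next write.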
Headline capabilities at 1.0.3
- Streaming first. client.messages.stream(...) for Anthropic, SSE for OpenAI Chat. Set includePartialMessages: true and consume partial_message events for sub-second first-token latency in your UI.
- Prompt caching on by default. Anthropic cache_control: ephemeral is applied to system and the trailing tool; multi-turn input cost drops 70-80% on cache hits, with no code change required from hosts.
- Real subagent isolation. runtime: 'worker_thread' spawns a fresh V8 isolate per subagent. Pre-flight abort short-circuit, hard timeout, curated env forwarding, no shared registries.
- Structured outputs that actually work. outputSchema synthesizes an _output tool with tool_choice (Anthropic) or response_format: { type: 'json_schema' } (OpenAI). The 0.7.x jsonSchema field was wired but never read; in 1.0.1 it is a deprecated alias that is actually plumbed through.
- Real issue-workflow loop. runIssueWorkflowWithAgent(input) now drives a build → verify → review → fix loop with an Agent and a Verifier you supply (default CommandVerifier shells npm test / npm run lint). The legacy runIssueWorkflow is kept for back-compat but only records what the host evaluates.
- Subpath imports. clavue-agent-sdk/core, /tools, /contracts, /workflow, /retro, /testing, plus the v3 axes (/graph, /guardrails, /tracing, /sandbox, /rag, /genui, /voice). Root barrel still re-exports everything for back-compat.
- Honest deprecations. jsonSchema, maxThinkingTokens, runIssueWorkflow, and the cost result field have ≥90 days of overlap before removal in 1.1.0. See docs/v1_to_v2_migration.md.
v3 capability layer
Every axis is exercised by tests in this repo. The paths below point at live source so you can verify, not just trust the README.
| Capability | Where it lives | Tests | Example |
|---|---|---|---|
| Multi-agent graph DSL (6 node kinds: agent / verifier / router / parallel / human / retriever) | src/graph/runtime.ts, src/graph/types.ts | tests/graph.test.ts (+ retriever) | examples/19-graph-dsl.ts, 28-rag-graph.ts |
| 4-scope guardrails (input / output / tool_input / tool_output) with 'abort' \| 'skip' \| 'continue' policy hook | src/guardrails/runtime.ts, src/engine.ts | tests/guardrails.test.ts (×3) | examples/20-guardrails.ts |
| Live tracing + replay + OTel SDK bridge | src/tracing/runtime.ts, src/tracing/otel-shim.ts | tests/tracing*.test.ts (×3) | examples/21-tracing-replay.ts, 27-trace-exporter.ts, 29-otel-shim.ts |
| Capability tokens + sandbox primitives | src/sandbox/runtime.ts | tests/sandbox.test.ts | examples/22-capability-tokens.ts |
| RAG: RetrieverInterface + InMemoryRetriever + PgvectorRetriever + retriever graph node | src/rag/runtime.ts, src/rag/pgvector.ts | tests/rag*.test.ts (×3) | examples/23-rag-retriever.ts, 28-rag-graph.ts |
| Framework-agnostic Generative UI stream (UiStreamSink / UiStreamSource) | src/genui/index.ts | tests/genui.test.ts | examples/24-generative-ui.ts |
| Provider-agnostic voice (ASR + TTS) with Deepgram / Whisper-OpenAI / ElevenLabs stub adapters | src/voice/runtime.ts, src/voice/adapters.ts | tests/voice*.test.ts (×2) | examples/25-voice.ts, 30-voice-adapters.ts |
Run any axis offline with no API key:
npx tsx examples/19-graph-dsl.ts # 6-node-kind graph
npx tsx examples/20-guardrails.ts # 4-scope guardrails
npx tsx examples/21-tracing-replay.ts # TraceStore + replay
npx tsx examples/22-capability-tokens.ts # sandbox capability tokens
npx tsx examples/23-rag-retriever.ts # InMemoryRetriever
npx tsx examples/24-generative-ui.ts # framework-agnostic GenUI stream
npx tsx examples/27-trace-exporter.ts # console + JSONL exporters
npx tsx examples/28-rag-graph.ts # retriever node kind
npx tsx examples/29-otel-shim.ts # OtelTraceExporter bridge
npx tsx examples/30-voice-adapters.ts # Deepgram / Whisper / ElevenLabs (stub fetch)
npx tsx examples/32-tier-a-performance.ts # tool cache + adaptive concurrency + fallback chain + memory consolidation
examples/31-worker-thread-subagent.ts needs a real API key (it spawns a Worker that constructs its own Agent from env credentials):
CLAVUE_AGENT_API_KEY=... npx tsx examples/31-worker-thread-subagent.ts
See docs/USAGE.md for detailed usage of each capability axis (when to use it, how to use it, examples, caveats).
Documentation map
Get started
- docs/USAGE.md: feature-by-feature usage guide (when, how, gotchas) for every v3 capability + worker_thread + caching + structured outputs.
- docs/programmatic-integration-guide.md: embedding patterns for services, CI, workers, and internal platforms.
- docs/desktop-im-archiver-integration.md: handbook for embedding the SDK into a desktop app that archives WeChat / Feishu / QQ / DingTalk messages via clipboard + screenshots (no client internals touched). Pairs with examples/33-im-archiver.ts.
- docs/desktop-tools-integration.md: the opt-in clavue-agent-sdk/desktop subpath (clipboard, screen capture, file search, open folders/apps/URLs, draft emails, system notifications), with per-app recipes for WeChat / Lark / Office / WPS / Gmail. Pairs with examples/34-desktop-tools.ts.
Upgrade & contracts
- docs/v1_to_v2_migration.md: upgrade path from 0.7.x to 1.0.x with deprecation table.
- docs/v2_audit_report.md: 17-item audit of 0.7.5 with reconciliation showing what's fixed in 1.0.1.
- docs/v2_benchmark_report.md: every improvement paired with a reproducible measurement.
- docs/tier-a-summary.md: Tier A performance landings (tool-result cache, adaptive concurrency, fallback chain, prompt_cache_key, memory list cache, memory consolidation) and the Tier B list-cache family (listMemories, listAgentJobs, listSessions, listIssueWorkflowRuns).
Architecture & roadmap
- docs/v2_architecture.md: turn-pipeline target design (final pipeline lands in 1.1.x).
- docs/v3_rfc.md: capability layer RFC and decision log for the seven axes.
- docs/v2_v3_v4_upgrade_chain.md: full upgrade chain through the v4 platform layer.
- docs/v2_roadmap.md: milestone breakdown (M0..M7) with benchmark SLOs and breaking-change overlap windows.
Why Clavue
- Library-first agent runtime: embed run(), query(), or createAgent() directly in Node.js services, CI, workers, web backends, and internal platforms. No subprocess.
- Production controls baked in: named toolsets, allow/deny filters, hooks, koa-style middleware, permission modes, workspace path guards, schema-versioned events, policy traces, quality gates, and budget controls.
- Durable workflow contracts: background AgentJobs, real runIssueWorkflowWithAgent loop, WORKFLOW.md parsing, proof-of-work artifacts, orchestration policy helpers, runtime namespaces, session persistence, memory injection, self-improvement capture, retro/eval loops.
- Provider portability: Anthropic Messages and OpenAI-compatible providers share the same tool, memory, event, and result contracts. Models from third-party gateways (OpenRouter, etc.) are first-class. fallbackModel accepts a single string or an ordered array and tries each entry in turn when the primary provider fails (for example GPT → Claude → GLM).
- Honest measurements: npm run bench produces reproducible numbers. Every README claim points at code or a measurement, not marketing.
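The fallbackModel behavior described above (a single string or an ordered array, tried in order after the primary fails) can be sketched as follows. This is a hedged illustration, not the SDK's code: callWithFallback, CallModel, and the model-a/b/c names are hypothetical, and the real SDK may decide which failures are retryable differently.

```typescript
// Hypothetical sketch of an ordered fallback chain; not SDK code.
type CallModel<T> = (model: string) => Promise<T>;

async function callWithFallback<T>(
  primary: string,
  fallbackModel: string | string[] | undefined,
  call: CallModel<T>,
): Promise<{ model: string; value: T }> {
  // Normalize "string | string[] | undefined" into one ordered chain.
  const fallbacks =
    fallbackModel === undefined
      ? []
      : Array.isArray(fallbackModel)
        ? fallbackModel
        : [fallbackModel];
  const failures: string[] = [];
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, value: await call(model) };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${model}: ${message}`);
    }
  }
  throw new Error(`All models failed: ${failures.join("; ")}`);
}

// Fake provider call: the primary always fails, the fallbacks succeed.
const flakyCall: CallModel<string> = async (model) => {
  if (model === "model-a") throw new Error("overloaded");
  return `ok from ${model}`;
};

const fallbackDemo = callWithFallback("model-a", ["model-b", "model-c"], flakyCall);
```

Collecting every failure into the final error keeps the chain debuggable when all providers are down.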
Quick start
Use directly with npx
No local install is required for quick automation from a terminal or CI job.
export CLAVUE_AGENT_API_KEY=your-api-key
npx clavue-agent-sdk "Read package.json and summarize this project"
# Safer read-only review
npx clavue-agent-sdk "Review src for obvious bugs" --toolset repo-readonly
# Combine named toolsets
npx clavue-agent-sdk "Research and review this repo" --toolset repo-readonly,research
# OpenAI-compatible model
npx clavue-agent-sdk \
  --api-type openai-completions \
  --model gpt-5.4 \
  --base-url https://api.openai.com/v1 \
  "Explain the repository structure"
# Opt-in run learning
npx clavue-agent-sdk \
  --self-improvement \
  --allow Read,Glob,Grep \
  "Review package.json for release readiness risks"
# Or enable it from CI/env
CLAVUE_AGENT_SELF_IMPROVEMENT=true \
  npx clavue-agent-sdk --allow Read,Glob,Grep "Review package.json"
CLI options: --prompt, --model, --api-type, --api-key, --base-url, --cwd, --max-turns, --autonomy, --permission-mode, --allow, --toolset, --deny, --self-improvement, --json. Issue subcommand only: --max-iterations, --passing-score, --require-gate.
Environment variables: CLAVUE_AGENT_API_KEY, CLAVUE_AGENT_AUTH_TOKEN, CLAVUE_AGENT_API_TYPE, CLAVUE_AGENT_MODEL, CLAVUE_AGENT_BASE_URL, CLAVUE_AGENT_AUTONOMY, CLAVUE_AGENT_PERMISSION_MODE, CLAVUE_AGENT_SELF_IMPROVEMENT, AGENT_SDK_MAX_TOOL_CONCURRENCY.
Best practices
Pick the right integration mode
- Use npx clavue-agent-sdk ... for quick terminal automation, CI checks, and one-off repository analysis.
- Use run() for backend jobs where you want one prompt in, one typed AgentRunResult out.
- Use query() for streaming UIs, logs, dashboards, and integrations that need live assistant/tool events.
- Use createAgent() for long-lived apps that need multi-turn state, sessions, hooks, MCP servers, custom subagents, or repeated prompts.
Start narrow, then expand tools
Prefer the smallest tool surface that can complete the task. Start with read-only tools for review and analysis, then add write or shell tools only when the workflow needs them.
# Read-only repository review
npx clavue-agent-sdk "Review this repo for release risks" \
  --toolset repo-readonly \
  --max-turns 6
# Focused code change with explicit tools
npx clavue-agent-sdk "Fix the failing package payload test" \
  --allow Read,Glob,Grep,Edit,Bash \
  --permission-mode trustedAutomation \
  --autonomy autonomous \
  --max-turns 10
# Safer low-confirmation local edits
npx clavue-agent-sdk "Update usage docs and run verification" \
  --toolset repo-edit \
  --permission-mode acceptEdits \
  --autonomy autonomous \
  --max-turns 8
# CI-friendly JSON output
npx clavue-agent-sdk "Check whether package.json is release-ready" \
  --toolset repo-readonly \
  --json
Set cwd, model, and budgets explicitly
For automation, set cwd, model, maxTurns, and tool permissions explicitly so runs are reproducible and bounded.
import { run } from "clavue-agent-sdk";
const result = await run({
prompt: "Review the package for publish-readiness and return concise findings.",
options: {
cwd: process.cwd(),
model: "claude-sonnet-4-6",
toolsets: ["repo-readonly"],
maxTurns: 6,
},
});
if (result.status !== "completed") {
throw new Error(result.errors?.join("\n") || result.subtype);
}
console.log(result.text);
Use structured outputs in automation
In CI or services, prefer run() or CLI --json instead of scraping assistant text from stdout. Check status, subtype, errors, usage, and total_cost_usd before deciding whether a job passed.
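A pass/fail decision over those fields might look like the sketch below. RunSummary is a local mirror of only the fields named in this README, so the exact SDK types are an assumption; evaluateRun, maxCostUsd, and the "success" / "failed" sample values are hypothetical and not part of the SDK.

```typescript
// Local mirror of the AgentRunResult fields named above.
// The real SDK type may differ; this shape is an assumption.
interface RunSummary {
  status: string;
  subtype: string;
  errors?: string[];
  usage: { input_tokens: number; output_tokens: number };
  total_cost_usd: number;
}

interface Verdict {
  passed: boolean;
  reason: string;
}

// Hypothetical CI gate: maxCostUsd is a host-side budget, not an SDK option.
function evaluateRun(result: RunSummary, maxCostUsd: number): Verdict {
  if (result.status !== "completed") {
    return { passed: false, reason: result.errors?.join("; ") || result.subtype };
  }
  if (result.errors && result.errors.length > 0) {
    return { passed: false, reason: result.errors.join("; ") };
  }
  if (result.total_cost_usd > maxCostUsd) {
    return {
      passed: false,
      reason: `cost ${result.total_cost_usd} exceeds budget ${maxCostUsd}`,
    };
  }
  return { passed: true, reason: "completed within budget" };
}

// Sample values are illustrative only.
const okVerdict = evaluateRun(
  {
    status: "completed",
    subtype: "success",
    usage: { input_tokens: 1200, output_tokens: 300 },
    total_cost_usd: 0.02,
  },
  0.5,
);
const failedVerdict = evaluateRun(
  {
    status: "failed",
    subtype: "error_quality_gate_failed",
    usage: { input_tokens: 900, output_tokens: 100 },
    total_cost_usd: 0.01,
  },
  0.5,
);
```

Returning a reason string alongside the boolean gives CI logs something actionable without parsing assistant text.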
Enforce production controls
For production hosts, combine narrow toolsets, permissionMode, qualityGatePolicy, memory policy, doctor(), and runBenchmarks() instead of relying only on prompt instructions.
import { doctor, run, runBenchmarks } from "clavue-agent-sdk";
const health = await doctor({
toolsets: ["repo-readonly"],
memory: { enabled: true },
});
if (health.status === "error") throw new Error("SDK runtime is not ready");
const result = await run({
prompt: "Review the current package and report release blockers.",
options: {
toolsets: ["repo-readonly"],
permissionMode: "default",
memory: { enabled: true, policy: { mode: "brainFirst" } },
quality_gates: [{ name: "release-review", status: "passed" }],
qualityGatePolicy: { required: ["release-review"] },
maxTurns: 6,
},
});
if (result.subtype === "error_quality_gate_failed") {
throw new Error(result.errors?.join("\n") || "Required quality gate failed");
}
const benchmarks = await runBenchmarks({ iterations: 3 });
console.log(benchmarks.metrics);
The current memory trace records policy, query, repo path, selected memory IDs, selected memory score/reason metadata, source/scope/confidence, validation state, retrieval steps, injected count, and whether retrieval happened before the first model call.
The current capability upgrade program is tracked in docs/agent-sdk-capability-upgrade-program.md. It expands the SDK beyond coding automation into collection, organization, planning, problem solving, memory intelligence, skill creation, self-learning, reusable agents, and workflow templates.
Keep prompts operational
Good prompts specify the goal, boundaries, expected output format, and verification command. Avoid broad prompts that mix unrelated work.
Good: Review src/providers/openai.ts for cancellation bugs. Do not edit files. Return findings with file:line references.
Good: Update README quick-start examples only. Run npm run build after editing.
Avoid: Make the project better.
Recommended production pattern
- Store credentials in environment variables, not source code.
- Pin CLAVUE_AGENT_MODEL or pass model in code for predictable behavior.
- Use allowedTools or toolsets for every automated workflow.
- Set maxTurns for bounded execution.
- Log the final AgentRunResult metadata: status, subtype, num_turns, usage, duration_ms, and total_cost_usd.
- Enable selfImprovement only for workflows where persisting run lessons is expected.
- Close reusable agents with await agent.close() so sessions, MCP connections, and memory hooks flush cleanly.
Common recipes
# Explain a repository
npx clavue-agent-sdk "Explain this repository architecture" --toolset repo-readonly
# Review a pull-request checkout
npx clavue-agent-sdk "Review the current diff for bugs and release risks" --toolset repo-readonly
# Generate a machine-readable report
npx clavue-agent-sdk "Return JSON listing package release blockers" --toolset repo-readonly --json
1. Install as a library
npm install clavue-agent-sdk
2. Configure
Set the environment variables once, then start using the SDK immediately.
export CLAVUE_AGENT_API_KEY=your-api-key
# Optional
# export CLAVUE_AGENT_MODEL=claude-sonnet-4-6
OpenAI-compatible setup
export CLAVUE_AGENT_API_TYPE=openai-completions
export CLAVUE_AGENT_API_KEY=sk-...
export CLAVUE_AGENT_BASE_URL=https://api.openai.com/v1
export CLAVUE_AGENT_MODEL=gpt-4o
Anthropic-compatible gateway setup
export CLAVUE_AGENT_BASE_URL=https://openrouter.ai/api
export CLAVUE_AGENT_API_KEY=sk-or-...
export CLAVUE_AGENT_MODEL=anthropic/claude-sonnet-4
3. Easiest integration for another program
If another Node.js service just needs one clear call, use run(). It creates an agent, executes the prompt, closes the agent, and returns a complete typed artifact.
import { run } from "clavue-agent-sdk";
const result = await run({
prompt: "Read package.json and return the name and version as JSON.",
options: {
cwd: process.cwd(),
allowedTools: ["Read"],
maxTurns: 3,
},
});
if (result.status !== "completed") {
throw new Error(result.errors?.join("\n") || result.subtype);
}
console.log(result.text);
run() returns AgentRunResult: status, subtype, final text, events, messages, usage, num_turns, duration_ms, duration_api_ms, total_cost_usd, timestamps, optional errors, and optional self_improvement artifacts when enabled.
4. Streaming events
Use query() when your program wants live events: assistant text, tool calls, tool results, and the final result.
import { query } from "clavue-agent-sdk";
for await (const message of query({
prompt: "Read package.json and tell me the project name.",
options: {
allowedTools: ["Read", "Glob"],
},
})) {
if (message.type === "assistant") {
for (const block of message.message.content) {
if ("text" in block) console.log(block.text);
}
}
if (message.type === "result") {
console.log(`Done in ${message.num_turns} turns`);
}
}
Partial-message streaming (live text deltas)
Pass includePartialMessages: true to receive partial_message events carrying each text delta as the model emits it, which is useful for character-by-character UI rendering. The aggregated assistant event still arrives after all partials. The option defaults to false, so existing callers are unaffected.
import { createAgent } from "clavue-agent-sdk";
const agent = createAgent({
model: "claude-sonnet-4-6",
includePartialMessages: true,
});
for await (const event of agent.query("Count from 1 to 5.")) {
if (event.type === "partial_message" && event.partial?.type === "text") {
process.stdout.write(event.partial.text); // live deltas
}
if (event.type === "assistant") {
process.stdout.write("\n"); // aggregated message arrives last
}
}
See examples/18-streaming.ts for a runnable demo.
5. Reusable agent
Use createAgent() when your application needs multi-turn state, session persistence, MCP connections, hooks, or repeated calls.
import { createAgent } from "clavue-agent-sdk";
const agent = createAgent({ model: "claude-sonnet-4-6" });
try {
const result = await agent.prompt("What files are in this project?");
console.log(result.text);
console.log(
`Turns: ${result.num_turns}, Tokens: ${result.usage.input_tokens + result.usage.output_tokens}`,
);
} finally {
await agent.close();
}
6. OpenAI / GPT models
import { createAgent } from "clavue-agent-sdk";
const agent = createAgent({
apiType: "openai-completions",
model: "gpt-4o",
apiKey: "sk-...",
baseURL: "https://api.openai.com/v1",
});
const result = await agent.prompt("What files are in this project?");
console.log(result.text);
The apiType can also be auto-detected from the model name: models containing gpt-, o1, o3, o4, deepseek, qwen, glm, grok, kimi, moonshot, gemini, mistral, gemma, yi-, etc. automatically use openai-completions.
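As a rough illustration of name-based detection, the sketch below does plain substring matching over the keywords listed above. The SDK's real rules (case handling, match order, anchoring) are not documented in this README, so treat looksOpenAiCompatible and OPENAI_STYLE_HINTS as hypothetical.

```typescript
// Keywords copied from the README; the matching strategy itself is an assumption.
const OPENAI_STYLE_HINTS: string[] = [
  "gpt-", "o1", "o3", "o4", "deepseek", "qwen", "glm", "grok",
  "kimi", "moonshot", "gemini", "mistral", "gemma", "yi-",
];

// Hypothetical helper: true when the model name contains any listed keyword.
function looksOpenAiCompatible(model: string): boolean {
  const name = model.toLowerCase();
  return OPENAI_STYLE_HINTS.some((hint) => name.includes(hint));
}

console.log(looksOpenAiCompatible("gpt-4o"));
console.log(looksOpenAiCompatible("claude-sonnet-4-6"));
```

Substring matching is loose by nature, so passing apiType explicitly remains the safer choice when a model name is unusual or routed through a gateway.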
7. Web demo
npm run web
# Open http://localhost:8081
Use this when you want a fast local sandbox for prompt-tool behavior and event streaming.
More examples
Multi-turn conversation
import { createAgent } from "clavue-agent-sdk";
const agent = createAgent({ maxTurns: 5 });
const r1 = await agent.prompt(
'Create a file /tmp/hello.txt with "Hello World"',
);
console.log(r1.text);
const r2 = await agent.prompt("Read back the file you just created");
console.log(r2.text);
console.log(`Session messages: ${agent.getMessages().length}`);
Custom tools (Zod schema)
import { z } from "zod";
import { query, tool, createSdkMcpServer } from "clavue-agent-sdk";
const getWeather = tool(
"get_weather",
"Get the temperature for a city",
{ city: z.string().describe("City name") },
async ({ city }) => ({
content: [{ type: "text", text: `${city}: 22°C, sunny` }],
}),
);
const server = createSdkMcpServer({ name: "weather", tools: [getWeather] });
for await (const msg of query({
prompt: "What is the weather in Tokyo?",
options: { mcpServers: { weather: server } },
})) {
if (msg.type === "result")
console.log(`Done: $${msg.total_cost_usd?.toFixed(4)}`);
}
Custom tools (low-level)
import {
createAgent,
getAllBaseTools,
defineTool,
} from "clavue-agent-sdk";
const calculator = defineTool({
name: "Calculator",
description: "Evaluate a math expression",
inputSchema: {
type: "object",
properties: { expression: { type: "string" } },
required: ["expression"],
},
isReadOnly: true,
async call(input) {
// Demo only: Function() evaluates arbitrary code, so never feed it untrusted input.
const result = Function(`'use strict'; return (${input.expression})`)();
return `${input.expression} = ${result}`;
},
});
const agent = createAgent({ tools: [...getAllBaseTools(), calculator] });
const r = await agent.prompt("Calculate 2**10 * 3");
console.log(r.text);
Skills
Skills are reusable executable workflows that extend agent capabilities. Bundled skills include coding/review helpers such as simplify, commit, review, debug, and test, plus lifecycle workflows such as define, plan, build, verify, workflow-review, ship, and repair.
import {
createAgent,
registerSkill,
getAllSkills,
} from "clavue-agent-sdk";
// Register a custom skill
registerSkill({
name: "explain",
description: "Explain a concept in simple terms",
userInvocable: true,
async getPrompt(args) {
return [
{
type: "text",
text: `Explain in simple terms: ${args || "Ask what to explain."}`,
},
];
},
});
console.log(`${getAllSkills().length} skills registered`);
// The model can invoke skills via the Skill tool
const agent = createAgent();
const result = await agent.prompt('Use the "explain" skill to explain git rebase');
console.log(result.text);
Skills can also run in a forked subagent context by setting context: "fork". Forked skills create durable background AgentJobs, inherit the parent provider and permission policy, apply skill-level model and allowedTools, and preserve the subagent trace, evidence, and quality_gates on the final job record.
import {
SkillTool,
getAgentJob,
registerAgents,
registerSkill,
} from "clavue-agent-sdk";
registerAgents({
reviewer: {
description: "Specialized review agent",
prompt: "Review carefully and produce concise findings.",
tools: ["Read", "Glob", "Grep"],
},
}, { runtimeNamespace: "docs-forked-skill" });
registerSkill({
name: "deep-review",
description: "Run a durable background code review",
context: "fork",
agent: "reviewer",
allowedTools: ["Read", "Glob", "Grep"],
model: "gpt-5.4",
userInvocable: true,
async getPrompt(args) {
return [{ type: "text", text: `Review this target: ${args}` }];
},
}, { runtimeNamespace: "docs-forked-skill" });
const result = await SkillTool.call(
{ skill: "deep-review", args: "src/agent.ts" },
{
cwd: process.cwd(),
runtimeNamespace: "docs-forked-skill",
model: "gpt-5.4",
provider, // not defined in this snippet: assumes a provider instance created elsewhere by the host
},
);
const { job_id } = JSON.parse(String(result.content));
const job = await getAgentJob(job_id, { runtimeNamespace: "docs-forked-skill" });
console.log(job?.status, job?.trace, job?.evidence, job?.quality_gates);
Self-improvement memory
Enable selfImprovement when you want each structured run to capture reusable operational lessons for future runs. It is opt-in and stores bounded improvement memories after Agent.run() / top-level run() completes.
import { createAgent, queryMemories } from "clavue-agent-sdk";
const agent = createAgent({
cwd: process.cwd(),
memory: {
enabled: true,
autoInject: true,
repoPath: process.cwd(),
},
selfImprovement: {
memory: {
repoPath: process.cwd(),
maxEntriesPerRun: 4,
},
},
});
try {
const run = await agent.run("Verify the package release is ready.");
console.log(run.self_improvement?.savedMemories.length ?? 0);
const lessons = await queryMemories({
repoPath: process.cwd(),
type: "improvement",
text: "package release verification",
limit: 5,
});
console.log(lessons.map((lesson) => lesson.title));
} finally {
await agent.close();
}
By default this captures failed tool-result signals and terminal run failures. Successful run patterns are only saved when selfImprovement.memory.captureSuccessfulRuns is explicitly enabled. Captured text is trimmed, common API keys and bearer tokens are redacted, and future runs must still verify current repo state before applying a remembered lesson.
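To make "trimmed and redacted" concrete, here is a minimal sketch of that kind of sanitizing step. The patterns, the [REDACTED] marker, and the redactAndTrim name are illustrative assumptions; the SDK's actual redaction rules are not spelled out in this README.

```typescript
// Illustrative patterns only; not the SDK's real redaction list.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{8,}/g, // "sk-..." style API keys
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, // Authorization bearer tokens
];

// Hypothetical helper: redact known secret shapes, then cap the length.
function redactAndTrim(text: string, maxLength = 200): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out.length > maxLength ? out.slice(0, maxLength) + "..." : out;
}

const lesson = redactAndTrim(
  "Request failed with Authorization: Bearer abc.def-123 using key sk-test_1234567890",
);
console.log(lesson);
```

Redacting before trimming matters: trimming first could cut a token in half and leave a fragment that no longer matches any pattern.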
You can combine run learning with the deterministic retro/eval cycle, and optionally allow a bounded retry loop guarded by verification gates:
const run = await agent.run("Improve this SDK safely.", {
selfImprovement: {
memory: { repoPath: process.cwd() },
retro: {
enabled: true,
targetName: "clavue-agent-sdk",
gates: [
{ name: "build", command: "npm", args: ["run", "build"] },
{ name: "test", command: "npm", args: ["test"] },
],
loop: {
enabled: true,
maxAttempts: 3,
retryPrompt: "Fix the highest-priority verified issue, then stop.",
},
},
},
});
console.log(run.self_improvement?.retroLoop?.summary.completedAttempts);
console.log(run.self_improvement?.retroCycle?.summary.statusLine);
Nested retry runs automatically disable nested selfImprovement capture to keep the loop bounded. retroCycle always points at the final cycle for compatibility; retroLoop contains every cycle and retry lineage when loop mode is enabled.
Exported helpers: extractRunImprovementCandidates(run, config, options) for dry-run extraction and runSelfImprovement(run, config, options) for direct persistence/retro orchestration.
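The bounded retry loop guarded by verification gates can be pictured with the sketch below. It is a simplified stand-in, not the SDK's retro loop: Gate, runBoundedLoop, and attemptFix are hypothetical, and gates here are plain functions rather than shell commands.

```typescript
// Hypothetical shapes for illustration; not SDK types.
interface Gate {
  name: string;
  check: () => boolean;
}

interface LoopResult {
  completedAttempts: number;
  passed: boolean;
  failedGates: string[];
}

// Run the gates, attempt a fix on failure, and stop after maxAttempts.
function runBoundedLoop(
  gates: Gate[],
  attemptFix: (failed: string[]) => void,
  maxAttempts: number,
): LoopResult {
  let failedGates: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    failedGates = gates.filter((gate) => !gate.check()).map((gate) => gate.name);
    if (failedGates.length === 0) {
      return { completedAttempts: attempt, passed: true, failedGates };
    }
    // No fix after the final verification: the loop must terminate.
    if (attempt < maxAttempts) attemptFix(failedGates);
  }
  return { completedAttempts: maxAttempts, passed: false, failedGates };
}

// Simulated project whose "test" gate passes after one fix.
let testsFixed = false;
const outcome = runBoundedLoop(
  [
    { name: "build", check: () => true },
    { name: "test", check: () => testsFixed },
  ],
  () => {
    testsFixed = true;
  },
  3,
);
console.log(outcome);
```

The hard cap on attempts is what keeps the loop bounded even when a fix never lands.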
Retro / eval core
Run a deterministic engine-level evaluation loop and get structured findings, scores, and upgrade workstreams. createDefaultRetroEvaluators() inspects package/import/build/test/onboarding readiness across the four core dimensions:
import {
createDefaultRetroEvaluators,
runRetroEvaluation,
} from "clavue-agent-sdk";
const evaluators = createDefaultRetroEvaluators();
const result = await runRetroEvaluation({
target: { name: "my-project", cwd: process.cwd() },
evaluators,
});
console.log(result.scores.overall.score);
console.log(result.proposed_workstreams);
Run the full retro cycle in one call:
import {
createDefaultRetroEvaluators,
runRetroCycle,
} from "clavue-agent-sdk";
const cycle = await runRetroCycle({
target: { name: "my-project", cwd: process.cwd() },
evaluators: createDefaultRetroEvaluators(),
gates: [
{ name: "build", command: "npm", args: ["run", "build"] },
{ name: "test", command: "npm", args: ["test"] },
],
runId: "run-current",
previousRunId: "run-previous",
policy: { maxAttempts: 3 },
});
console.log(cycle.run.summary);
console.log(cycle.verification?.summary);
console.log(cycle.action.kind);
console.log(cycle.decision.disposition); // accepted | rejected | retry
console.log(cycle.summary.statusLine);
console.log(cycle.summary.text);
Or use the built-in defaults with just a target:
import { runRetroCycle } from "clavue-agent-sdk";
const cycle = await runRetroCycle({
target: { name: "my-project", cwd: process.cwd() },
});
console.log(cycle.verification?.gates.map((gate) => gate.name)); // ["build", "test"]
Persist a run for later comparison:
import {
compareRetroRuns,
loadRetroCycle,
loadRetroRun,
saveRetroCycle,
saveRetroRun,
} from "clavue-agent-sdk";
await saveRetroRun("run-2026-04-14", result);
await saveRetroCycle("cycle-2026-04-14", cycle);
const previous = await loadRetroRun("run-2026-04-13");
const previousCycle = await loadRetroCycle("cycle-2026-04-13");
if (previous) {
const drift = compareRetroRuns(previous, result);
console.log(drift.scoreDeltas.overall.delta);
console.log(drift.newFindings);
}
console.log(previousCycle?.decision.disposition);
Run fixed quality gates before or after a retro pass:
import { runRetroVerification } from "clavue-agent-sdk";
const verification = await runRetroVerification({
target: { name: "my-project", cwd: process.cwd() },
gates: [
{ name: "build", command: "npm", args: ["run", "build"] },
{ name: "test", command: "npm", args: ["test"] },
],
});
console.log(verification.passed);
console.log(verification.gates);Decide the next machine action from retro state:
import {
compareRetroRuns,
decideRetroAction,
loadRetroRun,
runRetroEvaluation,
runRetroVerification,
saveRetroRun,
} from "clavue-agent-sdk";
const verification = await runRetroVerification({
target: { name: "my-project", cwd: process.cwd() },
});
const current = await runRetroEvaluation({
target: { name: "my-project", cwd: process.cwd() },
evaluators,
});
const previous = await loadRetroRun("run-previous");
const comparison = previous ? compareRetroRuns(previous, current) : undefined;
const action = decideRetroAction({
run: current,
verification,
previousRun: previous ?? undefined,
comparison,
attemptCount: 0,
policy: { maxAttempts: 3 },
});
await saveRetroRun("run-current", current);
console.log(verification.summary);
console.log(action.kind);
Hooks (lifecycle events)
import { createAgent, createHookRegistry } from "clavue-agent-sdk";
const hooks = createHookRegistry({
PreToolUse: [
{
handler: async (input) => {
console.log(`About to use: ${input.toolName}`);
// Return { block: true } to prevent tool execution
},
},
],
PostToolUse: [
{
handler: async (input) => {
console.log(`Tool ${input.toolName} completed`);
},
},
],
});
20 lifecycle events: PreToolUse, PostToolUse, PostToolUseFailure, SessionStart, SessionEnd, Stop, SubagentStart, SubagentStop, UserPromptSubmit, PermissionRequest, PermissionDenied, TaskCreated, TaskCompleted, ConfigChange, CwdChanged, FileChanged, Notification, PreCompact, PostCompact, TeammateIdle.
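The PreToolUse comment above says a handler can return { block: true } to prevent tool execution. The sketch below is one way to picture that contract in isolation. It does not use the SDK: HookInput, HookResult, and runPreToolUseHooks are hypothetical names, and running handlers in registration order with the first block winning is an assumption, not documented dispatch behavior.

```typescript
// Hypothetical shapes for illustration; the SDK's real hook types may differ.
interface HookInput {
  toolName: string;
}
type HookResult = { block?: boolean } | undefined;
type HookHandler = (input: HookInput) => Promise<HookResult> | HookResult;

// Runs handlers in order and reports whether any of them asked to block.
// Assumption: the first handler that returns { block: true } stops the chain.
async function runPreToolUseHooks(
  handlers: HookHandler[],
  input: HookInput,
): Promise<boolean> {
  for (const handler of handlers) {
    const result = await handler(input);
    if (result !== undefined && result.block === true) return true;
  }
  return false;
}

// Example policy: block the Bash tool, let every other tool through.
const denyBash: HookHandler = (input) =>
  input.toolName === "Bash" ? { block: true } : undefined;
```

A host would skip the tool call whenever runPreToolUseHooks resolves to true.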
MCP server integration
import { createAgent } from "clavue-agent-sdk";
const agent = createAgent({
mcpServers: {
filesystem: {
command: "npx",
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
},
},
});
const result = await agent.prompt("List files in /tmp");
console.log(result.text);
await agent.close();
Subagents
import { query } from "clavue-agent-sdk";
for await (const msg of query({
prompt: "Use the code-reviewer agent to review src/index.ts",
options: {
agents: {
"code-reviewer": {
description: "Expert code reviewer",
prompt: "Analyze code quality. Focus on security and performance.",
tools: ["Read", "Glob", "Grep"],
},
},
},
})) {
if (msg.type === "result") console.log("Done");
}
Durable background AgentJobs
Use AgentTool with run_in_background: true when a subagent should continue without blocking the parent turn. The tool returns a durable job envelope immediately:
{
"success": true,
"type": "clavue.agent.job",
"version": 1,
"job_id": "agent_job_...",
"status": "queued"
}
The job is persisted under the current runtime namespace, stores final output, trace, evidence, quality gates, errors, and heartbeat status, and can be inspected or cancelled through tools or SDK APIs.
import {
AgentTool,
AgentJobListTool,
AgentJobGetTool,
AgentJobStopTool,
getAgentJob,
listAgentJobs,
} from "clavue-agent-sdk";
const context = {
cwd: process.cwd(),
runtimeNamespace: "docs-background-demo",
model: "gpt-5.4",
provider, // an LLM provider instance, e.g. one created with createProvider(apiType, opts)
};
const started = await AgentTool.call({
prompt: "Review src/ for security risks.",
description: "security review",
subagent_type: "Explore",
run_in_background: true,
}, context);
const { job_id } = JSON.parse(String(started.content));
console.log(await listAgentJobs({ runtimeNamespace: context.runtimeNamespace }));
console.log(await getAgentJob(job_id, { runtimeNamespace: context.runtimeNamespace }));
await AgentJobListTool.call({}, context);
await AgentJobGetTool.call({ id: job_id }, context);
await AgentJobStopTool.call({ id: job_id, reason: "no longer needed" }, context);
Exported helpers include createAgentJob(), getAgentJob(), listAgentJobs(), stopAgentJob(), clearAgentJobs(), and the public types AgentJobRecord, AgentJobStatus, AgentJobKind, AgentJobCompletion, AgentJobStoreOptions, and CreateAgentJobInput.
AgentJob storage defaults to ~/.clavue-agent-sdk/agent-jobs; set CLAVUE_AGENT_JOBS_DIR or pass AgentJobStoreOptions.dir to isolate stores in tests or multi-tenant hosts.
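The example above reads job_id straight out of JSON.parse(String(started.content)). A host that wants to fail safely on unexpected output could validate the envelope first. This is a minimal sketch: the field list is taken from the JSON envelope shown earlier, while AgentJobEnvelope, isAgentJobEnvelope, and parseJobId are illustrative names, not SDK exports.

```typescript
// Fields copied from the documented envelope; the guard itself is illustrative.
interface AgentJobEnvelope {
  success: boolean;
  type: "clavue.agent.job";
  version: number;
  job_id: string;
  status: string;
}

function isAgentJobEnvelope(value: unknown): value is AgentJobEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["success"] === "boolean" &&
    v["type"] === "clavue.agent.job" &&
    typeof v["version"] === "number" &&
    typeof v["job_id"] === "string" &&
    typeof v["status"] === "string"
  );
}

// Returns the job id, or undefined when the payload is not valid JSON
// or does not look like a job envelope.
function parseJobId(content: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return undefined;
  }
  return isAgentJobEnvelope(parsed) ? parsed.job_id : undefined;
}
```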
Permissions and tool execution safety
import { query } from "clavue-agent-sdk";
// Trusted automation is the default; restrict tools for a read-only agent.
for await (const msg of query({
prompt: "Review the code in src/ for best practices.",
options: {
toolsets: ["repo-readonly"],
disallowedTools: ["WebSearch"],
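canUseTool: async (tool, input) => {
// Allow Read as-is.
if (tool.name === "Read") return { behavior: "allow" };
// Allow everything else as well; updatedInput is where a rewritten input would go.
return { behavior: "allow", updatedInput: input };
},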
},
})) {
// ...
}
Tool access is controlled in layers: toolsets and allowedTools choose the available tool names, disallowedTools removes names last, canUseTool can deny or rewrite a specific tool input, and hooks can block lifecycle events. Subagents inherit the parent permission policy.
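To make the name-selection layers concrete, here is a small self-contained sketch of the first three steps: toolsets and allowedTools add names, and disallowedTools removes names last. It is not the SDK's implementation. The toolset contents in EXAMPLE_TOOLSETS are hypothetical, and treating toolsets and allowedTools as a union is an assumption based on the examples in this README.

```typescript
// Hypothetical toolset contents; the real groups are defined by the SDK.
const EXAMPLE_TOOLSETS: Record<string, string[]> = {
  "repo-readonly": ["Read", "Glob", "Grep"],
  research: ["WebSearch"],
};

interface ToolFilterOptions {
  toolsets?: string[];
  allowedTools?: string[];
  disallowedTools?: string[];
}

// Union of toolset members and allowedTools, with disallowedTools removed last.
function resolveToolNames(options: ToolFilterOptions): string[] {
  const selected = new Set<string>();
  for (const name of options.toolsets ?? []) {
    for (const tool of EXAMPLE_TOOLSETS[name] ?? []) selected.add(tool);
  }
  for (const tool of options.allowedTools ?? []) selected.add(tool);
  for (const tool of options.disallowedTools ?? []) selected.delete(tool);
  return Array.from(selected);
}
```

Because removal runs last, a name listed in disallowedTools stays unavailable even if a toolset or allowedTools also names it. canUseTool and hooks then act per call on whatever survives this step.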
permissionMode also has built-in semantics. The mode named default allows read-only tools only; note that this is not the option's default value, which is trustedAutomation. plan freezes mutating tools while allowing planning/read tools. acceptEdits allows local file edits but blocks shell, network, external-state, destructive, or approval-required tools. trustedAutomation and bypassPermissions are high-trust modes; still use allowedTools, disallowedTools, and canUseTool for least privilege.
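The mode semantics above can be read as a decision table. The sketch below encodes that reading for five of the modes. The ToolRisk classes are hypothetical labels invented for this example, and letting acceptEdits run planning tools is an assumption; the SDK's internal classification may differ.

```typescript
// Hypothetical risk classes; not an SDK type.
type ToolRisk = "read" | "plan" | "edit" | "shell" | "network" | "destructive";
type PermissionMode =
  | "default"
  | "plan"
  | "acceptEdits"
  | "trustedAutomation"
  | "bypassPermissions";

function modeAllows(mode: PermissionMode, risk: ToolRisk): boolean {
  switch (mode) {
    case "default":
      return risk === "read"; // read-only tools only
    case "plan":
      return risk === "read" || risk === "plan"; // mutating tools are frozen
    case "acceptEdits":
      // Local edits pass; shell, network, and destructive tools do not.
      return risk === "read" || risk === "plan" || risk === "edit";
    case "trustedAutomation":
    case "bypassPermissions":
      return true; // high-trust modes; narrow them with tool filters
  }
}
```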
Low-confirmation development mode
Use autonomyMode: "autonomous" when the user has already authorized a development task and wants the agent to inspect, edit, verify, and repair without routine confirmation prompts. This changes initiative and question-asking behavior only; it does not bypass permissionMode, tool filters, hooks, or host canUseTool.
import { run } from "clavue-agent-sdk";
const result = await run({
prompt: "Resolve the P0-P3 todo list, fix failures, and run verification.",
options: {
cwd: process.cwd(),
model: "gpt-5.5",
toolsets: ["repo-edit"],
allowedTools: ["Bash"],
permissionMode: "trustedAutomation",
autonomyMode: "autonomous",
maxTurns: 16,
},
});
console.log(result.trace?.policy_decisions);
CLI equivalent:
CLAVUE_AGENT_AUTONOMY=autonomous \
CLAVUE_AGENT_PERMISSION_MODE=trustedAutomation \
npx clavue-agent-sdk "Fix the P0-P3 todo list and verify" \
--toolset repo-edit \
--allow Bash \
--json
For safer local-edit-only automation, combine autonomyMode: "autonomous" with permissionMode: "acceptEdits" and omit shell/network tools. Run traces include policy_decisions for both allows and denials, with a safe input summary instead of raw tool input, plus the backward-compatible permission_denials list.
Local issue workflows
Use the issue workflow when you want a bounded builder, reviewer, fixer, and verifier loop around a concrete bug report or todo item. issue run creates the workflow record and background jobs without executing the full loop. issue execute runs the local workflow loop immediately.
# Create a workflow from inline text
npx clavue-agent-sdk issue run "Fix provider retry handling for 429 responses" \
--passing-score 85 \
--require-gate tests \
--json
# Execute from a local markdown issue
npx clavue-agent-sdk issue execute .clavue/issues/p0-provider-retry.md \
--max-iterations 3 \
--passing-score 90 \
--require-gate build,tests \
--json
# Inspect and stop workflow runs
npx clavue-agent-sdk issue list --json
npx clavue-agent-sdk issue get issue_run_... --json
npx clavue-agent-sdk issue stop issue_run_... --json
Programmatic usage:
import { normalizeIssueInput, runIssueWorkflow } from "clavue-agent-sdk";
const workflow = await runIssueWorkflow({
cwd: process.cwd(),
issue: normalizeIssueInput("Fix flaky package payload verification."),
maxIterations: 3,
passingScore: 90,
requiredGates: ["build", "tests"],
});
console.log(workflow.status, workflow.finalScore);
console.log(workflow.proof_of_work.status, workflow.proof_of_work.verification);
Issue workflow records are stored under ~/.clavue-agent-sdk/issue-runs by default. Use the SDK store options to isolate runs for tests, CI, or multi-tenant hosts. runIssueWorkflow() returns proof_of_work, so hosts get a standard handoff artifact without the SDK owning GitHub, PR, CI, Linear, or Jira integrations.
Workflow contracts, proof of work, and orchestration policy
For host applications that want Symphony-style discipline without coupling the SDK to a tracker or daemon, use the SDK-core workflow primitives:
import {
createProofOfWork,
loadWorkflowDefinition,
renderWorkflowPrompt,
resolveWorkflowServiceConfig,
selectDispatchCandidates,
validateWorkflowDispatchConfig,
} from "clavue-agent-sdk";
const definition = await loadWorkflowDefinition({ cwd: repoPath });
const config = resolveWorkflowServiceConfig(definition);
const configIssues = validateWorkflowDispatchConfig(config, { requireTracker: false });
if (configIssues.length > 0) throw new Error(configIssues[0]!.message);
const selection = selectDispatchCandidates({
config,
issues: [{
id: "issue-42",
identifier: "SDK-42",
title: "Fix autonomous workflow handoff",
state: "Todo",
priority: 1,
}],
});
const prompt = renderWorkflowPrompt(definition, {
issue: {
identifier: selection.selected[0]?.identifier,
title: selection.selected[0]?.title,
description: "Produce a tested SDK-core implementation and proof of work.",
},
});
const proof = createProofOfWork({
target: { kind: "issue", id: "SDK-42", title: "Fix autonomous workflow handoff" },
evidence: [{ type: "test", summary: "Focused verification passed", source: "external" }],
quality_gates: [{ name: "tests", status: "passed" }],
required_gates: ["tests"],
references: [{ type: "issue", label: "Host issue", url: "https://tracker.example/SDK-42" }],
});
console.log(prompt);
console.log(proof.status, proof.handoff);
The SDK standardizes the contract, proof, and policy layers. Your host application still owns task polling, external tracker updates, PR creation, CI execution, dashboards, and worker lifecycle.
Runtime profiles
Runtime profiles turn a high-level workflow mode into concrete toolsets, permission mode, memory policy, autonomy mode, prompt guidance, and quality-gate behavior. This is the recommended path for hosts that want consistent behavior across collect, organize, plan, solve, build, verify, review, and ship flows.
import { getAllRuntimeProfiles, run } from "clavue-agent-sdk";
console.log(getAllRuntimeProfiles().map((profile) => profile.name));
const result = await run({
prompt: "Verify this package is ready to publish.",
options: {
workflowMode: "verify",
cwd: process.cwd(),
maxTurns: 6,
},
});
console.log(result.status, result.trace?.policy_decisions);
The engine only parallelizes tool calls when a tool declares both isReadOnly() and isConcurrencySafe(). Mutating tools and read-only tools that are not concurrency-safe run serially. Set maxToolConcurrency per run to cap safe parallel batches; when omitted, AGENT_SDK_MAX_TOOL_CONCURRENCY is used as the fallback. Invalid, zero, or negative values fall back to 10 so runs do not hang. Run traces include tool_concurrency_limit, tool_concurrency_source, and the existing concurrency_batches.
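The fallback rules for the concurrency limit can be sketched as a small pure function. This is an illustration of the rules as described above, not the engine's code: resolveToolConcurrency, parsePositiveInt, and the source labels "option", "env", and "default" are hypothetical, and non-integer values are treated as invalid by assumption.

```typescript
const DEFAULT_TOOL_CONCURRENCY = 10;

// Accepts only integers >= 1; anything else counts as invalid.
function parsePositiveInt(value: number | string | undefined): number | undefined {
  if (value === undefined) return undefined;
  const n = typeof value === "string" ? Number(value) : value;
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

interface ConcurrencyLimit {
  limit: number;
  source: "option" | "env" | "default"; // hypothetical labels
}

// The per-run option wins when present; the env value is used only when the
// option is omitted; invalid, zero, or negative values fall back to 10.
function resolveToolConcurrency(
  optionValue: number | undefined,
  envValue: string | undefined,
): ConcurrencyLimit {
  const usingOption = optionValue !== undefined;
  const parsed = parsePositiveInt(usingOption ? optionValue : envValue);
  if (parsed === undefined) return { limit: DEFAULT_TOOL_CONCURRENCY, source: "default" };
  return { limit: parsed, source: usingOption ? "option" : "env" };
}

// A call may join a parallel batch only when the tool declares both properties.
interface ToolLike {
  isReadOnly(): boolean;
  isConcurrencySafe(): boolean;
}
const canRunInParallel = (tool: ToolLike): boolean =>
  tool.isReadOnly() && tool.isConcurrencySafe();
```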
Tool result memoization (turn-scoped)
When a turn dispatches multiple tool_use blocks for the same read-only concurrency-safe tool with the same input — including duplicates within a single concurrent batch — the engine reuses one tool.call() instead of running it once per block. Permission checks, PreToolUse / PostToolUse hooks, and tool_input / tool_output guardrails still run on every block; only the tool's own work is elided. The cache resets between turns, so model state never sees a stale read. is_error: true results are never retained. Aggregate counters land in trace.tool_cache = { hits, misses }.
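The memoization behavior described above can be approximated with a promise-valued map that lives for one turn. The sketch below is a simplified model, not the engine's implementation: TurnToolCache and ToolResult are hypothetical names, and keying on JSON.stringify(input) is an assumption (a real key would need canonical property ordering). Permission checks, hooks, and guardrails would run outside this class on every block.

```typescript
interface ToolResult {
  content: string;
  is_error?: boolean;
}

// One instance per turn; drop it when the turn ends so no read goes stale.
class TurnToolCache {
  hits = 0;
  misses = 0;
  private readonly entries = new Map<string, Promise<ToolResult>>();

  // Assumption: tool name plus serialized input identifies a call.
  private keyFor(toolName: string, input: unknown): string {
    return `${toolName}:${JSON.stringify(input)}`;
  }

  async call(
    toolName: string,
    input: unknown,
    execute: () => Promise<ToolResult>,
  ): Promise<ToolResult> {
    const key = this.keyFor(toolName, input);
    const cached = this.entries.get(key);
    if (cached) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    // Store the in-flight promise so duplicates in the same batch share it.
    const pending = execute();
    this.entries.set(key, pending);
    try {
      const result = await pending;
      if (result.is_error) this.entries.delete(key); // error results are not retained
      return result;
    } catch (error) {
      this.entries.delete(key);
      throw error;
    }
  }
}
```

Storing the promise rather than the resolved value is what lets two identical blocks in one concurrent batch share a single execution.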
OpenAI prompt prefix caching
Both Chat Completions and Responses requests now include a stable prompt_cache_key derived from (model, system prompt, tool schema). When the prefix is unchanged across turns, OpenAI's prompt cache reuses the cached prefix and bills only the suffix tokens. The key intentionally excludes the conversation input, since that grows every turn. Gateways that don't recognize the field simply ignore it. Anthropic's cache_control: ephemeral markers continue to be applied automatically in the Anthropic provider.
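As a rough illustration of a key derived from (model, system prompt, tool schema), the sketch below hashes a serialized prefix and leaves the conversation out. It is not the SDK's derivation: promptCacheKey, CacheKeyInput, and the pfx- prefix are hypothetical, and the small FNV-1a-style hash is used only to keep the example dependency-free, where a production key would more likely use SHA-256.

```typescript
interface CacheKeyInput {
  model: string;
  systemPrompt: string;
  toolSchemas: unknown[]; // tool definitions as sent to the provider
}

// FNV-1a-style 32-bit hash over UTF-16 code units. Illustrative only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Conversation input is deliberately not part of the key, since it grows
// every turn and would change the key each time.
function promptCacheKey(input: CacheKeyInput): string {
  const prefix = JSON.stringify([input.model, input.systemPrompt, input.toolSchemas]);
  return `pfx-${fnv1a(prefix)}`;
}
```

With this shape the key stays the same across turns as long as the model, system prompt, and tool schemas are unchanged, and it changes as soon as any of them does.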
Provider retries and tolerance
Provider calls automatically retry transient API and network failures with exponential backoff. Retryable conditions include rate limits, common 5xx/overload statuses, fetch/socket failures, and responses that carry a Retry-After header; abort signals are honored during backoff.
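For readers who want to reason about the retry schedule, here is a minimal sketch of status classification and delay computation. It covers only the arithmetic, not timers or abort handling. The status list, isRetryableStatus, retryDelayMs, and BackoffOptions are assumptions for illustration; the SDK's actual codes, base delay, cap, and jitter are not documented here.

```typescript
// Assumed retryable set: 429 plus common 5xx and overload codes.
const RETRYABLE_STATUS = new Set<number>([429, 500, 502, 503, 504, 529]);

function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUS.has(status);
}

interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound on any single wait
}

// attempt is 1-based. A numeric Retry-After value (seconds) takes precedence
// over the exponential schedule but is still capped at maxMs. The HTTP-date
// form of Retry-After is not parsed in this sketch and falls through.
function retryDelayMs(
  attempt: number,
  retryAfterHeader: string | undefined,
  options: BackoffOptions,
): number {
  if (retryAfterHeader !== undefined && retryAfterHeader.trim() !== "") {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, options.maxMs);
    }
  }
  const exponential = options.baseMs * 2 ** (attempt - 1);
  return Math.min(exponential, options.maxMs);
}
```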
For OpenAI-compatible GPT-5 models, the SDK uses the Responses API by default and falls back to Chat Completions when a gateway does not support /responses. Incomplete Responses output caused by output-token limits maps to max_tokens so the engine can continue; failed or cancelled Responses runs surface as errors instead of empty text.
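The status mapping described above might look roughly like the following. The status values and the max_output_tokens reason follow the OpenAI Responses API. Everything else is an assumption for illustration: mapResponsesStatus and ResponsesResult are hypothetical names, "end_turn" is a made-up engine-side label, and treating other incomplete reasons as errors is a choice of this sketch.

```typescript
// Subset of a Responses payload that matters for termination handling.
interface ResponsesResult {
  status: "completed" | "incomplete" | "failed" | "cancelled";
  incomplete_details?: { reason?: string } | null;
  error?: { message?: string } | null;
}

// "max_tokens" comes from the text above; "end_turn" is a hypothetical label.
type StopReason = "end_turn" | "max_tokens";

function mapResponsesStatus(result: ResponsesResult): StopReason {
  switch (result.status) {
    case "completed":
      return "end_turn";
    case "incomplete":
      // Output-token exhaustion is recoverable: the engine can continue.
      if (result.incomplete_details?.reason === "max_output_tokens") return "max_tokens";
      throw new Error(
        `Responses run incomplete: ${result.incomplete_details?.reason ?? "unknown reason"}`,
      );
    case "failed":
    case "cancelled":
      // Surface an error instead of returning empty text.
      throw new Error(
        `Responses run ${result.status}: ${result.error?.message ?? "no error detail"}`,
      );
  }
}
```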
Web UI
A built-in web chat interface is included for testing:
npx tsx examples/web/server.ts
# Open http://localhost:8081
API reference
Which API should I use?
| Need | Use |
| ---- | --- |
| Terminal or CI one-off task | npx clavue-agent-sdk "prompt" |
| Simplest Node.js integration | run({ prompt, options }) |
| Streaming UI or progress logs | query({ prompt, options }) |
| Multi-turn service, sessions, MCP, hooks | createAgent(options) |
Program logic
- Your app calls run(), query(), or a reusable agent.prompt() / agent.query().
- The SDK builds the system context from options, repo context files, git status, tools, MCP servers, skills, hooks, and permission policy.
- The provider layer sends normalized messages and tool schemas to Anthropic Messages or an OpenAI-compatible chat endpoint.
- When the model requests a tool, the engine applies allow/deny filters, canUseTool, permission mode, and hooks, then executes the tool.
- Tool results are appended to the conversation and the engine repeats until the provider returns a final answer or the run reaches limits.
- The SDK returns either streaming SDKMessage events or a structured AgentRunResult artifact. Reusable agents can persist sessions under ~/.clavue-agent-sdk, and background AgentJobs persist under ~/.clavue-agent-sdk/agent-jobs.
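The steps above amount to a bounded request/tool loop. The sketch below models only that control flow with injected stand-ins for the provider, the policy check, and tool execution. It is a simplified illustration, not the engine: runLoop, LoopDeps, ModelReply, and the string-based history are all hypothetical, and reporting a denial back to the model as a tool result is an assumption.

```typescript
type ModelReply =
  | { kind: "final"; text: string }
  | { kind: "tool_call"; toolName: string; input: unknown };

interface LoopDeps {
  callModel(history: string[]): Promise<ModelReply>;
  isToolAllowed(toolName: string, input: unknown): boolean;
  runTool(toolName: string, input: unknown): Promise<string>;
}

interface LoopResult {
  status: "completed" | "max_turns";
  text: string;
  turns: number;
}

async function runLoop(prompt: string, deps: LoopDeps, maxTurns: number): Promise<LoopResult> {
  const history: string[] = [`user: ${prompt}`];
  for (let turn = 1; turn <= maxTurns; turn += 1) {
    const reply = await deps.callModel(history);
    if (reply.kind === "final") {
      return { status: "completed", text: reply.text, turns: turn };
    }
    // The policy check runs before execution; a denial becomes the tool result.
    const output = deps.isToolAllowed(reply.toolName, reply.input)
      ? await deps.runTool(reply.toolName, reply.input)
      : `denied: ${reply.toolName}`;
    history.push(`tool(${reply.toolName}): ${output}`);
  }
  // The turn limit was reached without a final answer.
  return { status: "max_turns", text: "", turns: maxTurns };
}
```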
Top-level functions
| Function | Description |
| ------------------------------------- | -------------------------------------------------------------- |
| run({ prompt, options }) | One-shot blocking run, returns Promise<AgentRunResult> |
| query({ prompt, options }) | One-shot streaming query, returns AsyncGenerator<SDKMessage> |
| createAgent(options) | Create a reusable agent with session persistence |
| tool(name, desc, schema, handler) | Create a tool with Zod schema validation |
| createSdkMcpServer({ name, tools }) | Bundle tools into an in-process MCP server |
| defineTool(config) | Low-level tool definition helper |
| doctor(options) | Run structured provider, tool, skill, MCP, storage, and package checks |
| runBenchmarks(options) | Run offline benchmark metrics without live model calls |
| getAllBaseTools() | Get all 35+ built-in tools |
| registerSkill(definition) | Register a custom skill |
| getAllSkills() | Get all registered skills |
| createAgentJob(input, opts) | Create a durable background agent job record |
| getAgentJob(id, opts) | Read a durable background job by ID |
| listAgentJobs(opts) | List durable background jobs in a runtime namespace |
| stopAgentJob(id, reason, opts) | Cancel a queued or running background job |
| clearAgentJobs(opts) | Clear background jobs for a runtime namespace |
| runSelfImprovement(run, config, opts) | Persist bounded improvement memories and optionally run retro/eval feedback |
| extractRunImprovementCandidates(run, config, opts) | Inspect which improvement memories a run would generate |
| runRetroEvaluation(input) | Run deterministic retro/eval orchestration and return typed results |
| createDefaultRetroEvaluators() | Inspect package/import/build/test/onboarding readiness across the core dimensions |
| compareRetroRuns(previous, current) | Compare two retro runs for score deltas and finding drift |
| decideRetroAction(input) | Decide the next machine action from current retro state |
| runRetroVerification(input) | Run fixed quality gates and return pass/fail command results |
| runRetroCycle(input) | Run evaluation, verification, policy, comparison, and optional persistence in one call |
| saveRetroRun(runId, result, opts) | Persist a retro run result to the run ledger |
| loadRetroRun(runId, opts) | Load a persisted retro run result from the run ledger |
| saveRetroCycle(cycleId, result, opts) | Persist a full retro cycle result including decision and summary |
| loadRetroCycle(cycleId, opts) | Load a persisted retro cycle result from the run ledger |
| normalizeIssueInput(input, source?) | Normalize inline or file-backed issue text into a workflow record |
| createIssueWorkflowRun(input, opts) | Create a durable local issue workflow with role-based AgentJobs |
| runIssueWorkflow(input, opts) | Execute a bounded local builder/reviewer/fixer/verifier loop and return proof_of_work |
| listIssueWorkflowRuns(opts) | List persisted issue workflow runs |
| loadIssueWorkflowRun(id, opts) | Load one persisted issue workflow run |
| stopIssueWorkflowRun(id, reason, opts) | Stop an issue workflow run and cancel its associated jobs |
| loadWorkflowDefinition(opts) | Load a repository-owned WORKFLOW.md contract |
| renderWorkflowPrompt(def, input) | Strictly render an issue/task prompt from a workflow contract |
| resolveWorkflowServiceConfig(def) | Resolve workflow defaults, env indirection, workspaces, and runtime settings |
| validateWorkflowDispatchConfig(config) | Validate workflow config before dispatch |
| selectDispatchCandidates(input) | Select eligible issues under active/terminal state and concurrency policy |
| calculateRetryDelayMs(input) | Compute continuation or capped exponential retry delay |
| shouldReleaseIssueForState(state, config) | Decide whether an issue state should release a claim |
| createProofOfWork(input) | Create a standard proof-of-work handoff artifact |
| getRuntimeProfile(mode) | Read a built-in workflow profile |
| getAllRuntimeProfiles() | List built-in workflow profiles |
| applyRuntimeProfile(options) | Expand workflowMode into concrete runtime options |
| normalizeFindings(findings) | Normalize retro findings into a stable schema |
| scoreFindings(findings) | Compute per-dimension and overall retro scores |
| planUpgrades(findings) | Turn retro findings into prioritized workstreams |
| createProvider(apiType, opts) | Create an LLM provider directly |
| createHookRegistry(config) | Create a hook registry for lifecycle events |
| listSessions() | List persisted sessions |
| forkSession(id) | Fork a session for branching |
Agent methods
| Method | Description |
| ------------------------------- | ----------------------------------------------------- |
| agent.query(prompt) | Streaming query, returns AsyncGenerator<SDKMessage> |
| agent.run(text, overrides) | Blocking run, returns full AgentRunResult including self_improvement when enabled |
| agent.prompt(text) | Blocking query, returns Promise<QueryResult> |
| agent.getMessages() | Get conversation history |
| agent.clear() | Reset session |
| agent.interrupt() | Abort current query |
| agent.setModel(model) | Change model mid-session |
| agent.setPermissionMode(mode) | Change permission mode |
| agent.stopTask(id) | Stop a durable AgentJob by ID, then fall back to legacy task cancellation |
| agent.getApiType() | Get current API type |
| agent.close() | Close MCP connections, persist session |
Options
| Option | Type | Default | Description |
| -------------------- | --------------------------------------- | ---------------------- | -------------------------------------------------------------------- |
| apiType | string | auto-detected | 'anthropic-messages' or 'openai-completions' |
| model | string | claude-sonnet-4-6 | LLM model ID |
| apiKey | string | CLAVUE_AGENT_API_KEY | API key |
| baseURL | string | — | Custom API endpoint |
| cwd | string | process.cwd() | Working directory |
| systemPrompt | string | — | System prompt override |
| appendSystemPrompt | string | — | Append to default system prompt |
| tools | ToolDefinition[] | All built-in | Available tools |
| toolsets | ToolsetName[] | — | Named built-in tool groups |
| allowedTools | string[] | — | Tool allow-list |
| disallowedTools | string[] | — | Tool deny-list |
| permissionMode | string | trustedAutomation | trustedAutomation / auto / default / acceptEdits / dontAsk / bypassPermissions / plan |
| autonomyMode | string | inferred from permission/profile | supervised / proactive / autonomous; controls initiative and confirmations without bypassing permissions |
| canUseTool | function | allow all | Custom tool guard or input modifier |
| qualityGatePolicy | QualityGatePolicy | — | Mark a successful run as failed when required quality gates fail or are missing |
| maxTurns | number | 10 | Max agentic turns |
| maxToolConcurrency | number | env or 10 | Max concurrent read-only concurrency-safe tool calls per batch |
| maxBudgetUsd | number | — | Spending cap |
| thinking | ThinkingConfig | { type: 'adaptive' } | Extended thinking |
| effort | string | high | Reasoning effort: low / medium / high / max |
| mcpServers | Record<string, McpServerConfig> | — | MCP server connections |
| agents | Record<string, AgentDefinition> | — | Subagent definitions |
| hooks | Record<string, HookCallbackMatcher[]> | — | Lifecycle hooks |
| memory | MemoryConfig | — | Structured memory injection, off / autoInject / brainFirst policy, and session-summary persistence |
| selfImprovement | boolean \| SelfImprovementConfig | false | Opt-in run learning via improvement memories and optional retro cycle |
| resume | string | — | Resume session by ID |
| continue | boolean | false | Continue most recent session |
| persistSession | boolean | true | Persist session to disk |
| sessionId | string | auto | Explicit session ID |
| outputFormat | { type: 'json_schema', schema } | — | Structured output |
| sandbox | SandboxSettings | — | Filesystem/network sandbox |
| settingSources | SettingSource[] | — | Load AGENT.md, project settings |
| env | Record<string, string> | — | Environment variables |
| abortController | AbortController | — | Cancellation controller |
Named toolsets
Use toolsets in the SDK or --toolset in the CLI to enable named groups of built-in tools without listing every tool name. The SDK also exports TOOLSET_NAMES, isToolsetName(), and getToolsetTools() for validation and UI generation.
import { TOOLSET_NAMES, getToolsetTools, isToolsetName, run } from "clavue-agent-sdk";
const selected = "repo-readonly";
if (!isToolsetName(selected)) throw new Error("Unknown toolset");
const result = await run({
prompt: "Review this repository and check current docs.",
options: {
toolsets: [selected, "research"],
disallowedTools: ["WebSearch"],
},
})