@fengjunhui31/task-relay

v0.8.0

Published

2 months ago

CC 外部模型 Subagent 中继 — 让 Claude Code 调用任意 AI 模型

0High
0Medium
0Low

fengjunhui31

claude-code subagent ai relay anthropic

Goal

Claude Code has a powerful built-in subagent system (the Agent tool): it spawns child agents that inherit the full tool suite, share permissions, and return structured results. But these subagents always use Anthropic's own models, consuming CC subscription tokens.

Task Relay's goal is to replace CC's built-in subagent with external models — while preserving the same capabilities. The external model gets the same tools, the same permission context, the same workflow. CC doesn't need to know the difference.

This is not a human-facing CLI. It's an internal component that CC calls automatically.

Background: CC Native Agent vs relay

CC's Agent tool is an in-process AsyncGenerator — parent and child share memory, MCP connections, and prompt cache. Task Relay replaces this with claude -p subprocess + ANTHROPIC_BASE_URL pointing to an external model. The key challenge is mapping every native capability to claude -p equivalents.

Capability Mapping

| Capability | CC Native Agent | Task Relay | How | |-----------|----------------|-----------|-----| | Tool access | Inherits full tool suite | Inherits full tool suite | claude -p natively provides all CC tools | | Tool restriction | tools / disallowedTools in agent definition | --allowedTools / --disallowedTools / --tools flags | Same granularity, different syntax | | Permission bubbling | In-process toolPermissionContext, prompts bubble to parent | tool_deferred → relay catches → auto-resume | CC's defer mechanism in -p mode | | Agent behavior | Built-in system prompts per agent type | CC passes agent constraints via prompt / --append-system-prompt | CC already knows each agent's role | | Agent behavior | Built-in system prompts per agent type | CC passes agent constraints via prompt / --append-system-prompt | CC already knows each agent's role | | Model selection | model field (opus/sonnet/haiku) | External model via config | No tier distinction — uses preference model | | Background execution | Native run_in_background | Bash({ run_in_background: true }) | CC's Bash tool handles async | | Worktree isolation | isolation: "worktree" | --worktree flag | Direct passthrough to claude -p | | Session persistence | In-memory message history | Session ID + --resume | claude -p saves transcripts to disk | | Output format | Structured AgentToolResult | JSON result + session ID | --output-format json parsing | | Prompt cache | Shared across parent/child (fork path) | relay fork resumes parent session → cache inherited | --fork-session --resume <sid> | | MCP connections | Shared parent MCP clients | Independent connections per subprocess | Architectural limitation | | SendMessage | In-process bidirectional messaging | AskUserQuestion + defer for subprocess→parent | Enabled by tool_deferred mechanism | | Real-time progress | yield Message streaming | Final result only | Architectural limitation |

What relay can't do

A few native capabilities are fundamentally tied to the in-process architecture:

MCP connection sharing — each subprocess opens its own MCP connections.
Real-time progress streaming — relay returns the final result, not intermediate messages.
isolation: "remote" — CC's remote execution (CCR) is an internal feature.

These are architectural limitations of the claude -p subprocess model, not relay bugs.

Architecture

┌─────────────────────────────────────────────────────┐
│              Claude Code (interactive session)       │
│                                                     │
│  CC calls Agent(subagent_type, prompt, ...)          │
│         │                                           │
│         ▼                                           │
│  ┌─────────────────────────────────┐                │
│  │  PreToolUse Hook (relay-hook)   │                │
│  │                                 │                │
│  │  Creating teammate?             │──yes──▶ allow  │
│  │    (team_name+name or           │    (native)    │
│  │     leadSessionId+name)         │                │
│  │                                 │                │
│  │  Passthrough whitelist?         │──yes──▶ allow  │
│  │    (statusline-setup, etc.)     │    (native)    │
│  │                                 │                │
│  │  agents config match?           │──yes──▶ deny   │
│  │    → precise model + tools      │    + command   │
│  │                                 │                │
│  │  no match?                      │──yes──▶ deny   │
│  │    → relay help reference       │    + help      │
│  └─────────────────────────────────┘                │
│         │                                           │
│         ▼                                           │
│  CC runs: Bash("relay <model> '<task>' ...")     │
│         │                                           │
└─────────┼───────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────┐
│              Task Relay (thin wrapper)               │
│                                                     │
│  1. Load ~/.relay.yaml                              │
│  2. Resolve endpoint + API key                      │
│  3. Inject ANTHROPIC_BASE_URL env                   │
│  4. Spawn claude -p "<task>" [options]               │
│  5. If tool_deferred → auto-resume loop             │
│  6. Parse JSON result + session ID                  │
│  7. Return result to parent CC                      │
└─────────┬───────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────┐
│         claude -p (external model subprocess)        │
│                                                     │
│  ANTHROPIC_BASE_URL → external endpoint             │
│  Full CC tools: Read, Write, Edit, Bash, Glob, ...  │
│  Skills, MCP servers (independent connections)      │
└─────────────────────────────────────────────────────┘

Interception Logic

When CC tries to spawn a subagent via the Agent tool, a PreToolUse hook intercepts and routes:

Teammate creation → passthrough (team_name+name explicit, or leadSessionId+name for context-inherited cases)
Passthrough whitelist → allow (statusline-setup, claude-code-guide, etc.)
agents config match → deny + precise relay command (model, tools, options)
No match → deny + redirect to relay help mapping reference

Team lead's own subagent calls (without name) are intercepted like any other — only the teammate-creation action is exempt. Teammates' internal subagent calls are also intercepted (different session_id, not a team lead).

Config Structure

# Agent mapping: CC subagent_type → relay model + tools
agents:
  Explore:
    model: minimax-m27
    tools: Read,Glob,Grep,Bash       # maps to --allowedTools
    bare: true
  Plan:
    model: minimax-m27
    tools: Read,Glob,Grep
    bare: true

# Endpoints: how to connect
endpoints:
  minimax:
    base_url: https://api.minimaxi.com/anthropic
    api_key: sk-xxx

# Models: what they can do
models:
  minimax-m27:
    endpoint: minimax
    capabilities: [text, code, agent]
    context_window: 256000

# Preferences: fallback routing for Skill
preferences:
  coding: [minimax-m27]
  code_review: [minimax-m27]
  default: [minimax-m27]

Prerequisites

Claude Code installed and authenticated (claude command available)
Node.js >= 18
An API key from an Anthropic-compatible provider (e.g. MiniMax, DeepSeek)

Quick Start

# 1. Install
npm install -g @fengjunhui31/task-relay

# 2. Install Skill + Hook into Claude Code, generate config template
relay install claude

# 3. Edit ~/.relay.yaml — add your API key
# 4. Test
relay models
relay minimax-m27 "say hello"

Done. Claude Code now automatically delegates subagent tasks to external models.

Commands

| Command | Description | |---------|-------------| | relay models | List models, capabilities, preferences | | relay <model> "<task>" | Relay a task — returns result + session ID | | relay fork <model> "<task>" | Fork CC's current session into an external model | | relay install claude | Install Skill + Hook into Claude Code | | relay update claude | Update the Skill + Hook |

Features

Permission Handling via `defer`

claude -p is non-interactive and auto-allows most tools. For edge cases where a tool is deferred (by external hooks or the permission system), relay's run loop handles tool_deferred and auto-resumes — no --dangerously-skip-permissions needed.

Session Persistence

Every relay returns a session ID. Resume with --session-id:

relay minimax-m27 "analyze the codebase"
# → result + [relay] session: abc-123

relay minimax-m27 "now refactor auth" --session-id abc-123
# → continues with full context

Context-Aware Fork

Fork CC's live session into an external model. Auto-detects the active session and warns if context exceeds 60% of the target model's window:

relay fork minimax-m27 "continue this analysis"

`claude -p` Passthrough

Extra arguments are forwarded directly to claude -p:

relay minimax-m27 "review code" --allowedTools "Read,Glob,Grep"
relay minimax-m27 "write docs" --bare
relay minimax-m27 "refactor" --worktree
relay minimax-m27 "explain" --append-system-prompt "用中文回复"

Benchmark: CC Token Savings

10 tasks (6 bug fix + 4 code review), comparing CC doing everything vs CC orchestrating + relay executing (MiniMax M2.7).

| Metric | CC only | CC + relay | Delta | |--------|:-:|:-:|:-:| | Work tokens (input+output) | 8,977 | 5,381 | -40% | | Total tokens (incl. cache) | 628,247 | 512,546 | -18% | | Cost (CC subscription) | $1.32 | $1.13 | -14% | | Correctness | 10/10 | 10/10 | — |

Work tokens cut by ~40% — CC's orchestration overhead is much smaller than doing the work itself
Code review benefits most — 57% output token reduction
Cache overhead dominates — both groups pay ~48K-96K tokens per task for system context

Run node bench/run.js && node bench/report.js to reproduce.

Compatible Endpoints

Any Anthropic-compatible API endpoint works. Non-Anthropic models (GPT-4o, Qwen, etc.) require a proxy like one-api or litellm.

License

MIT

中文说明

项目目标

Claude Code 内置了强大的 subagent 系统（Agent 工具）：它在进程内通过 AsyncGenerator 生成子 agent，共享内存、权限、MCP 连接和 prompt cache。但这些子 agent 始终使用 Anthropic 自己的模型，消耗 CC 订阅 token。

Task Relay 的目标是用外部模型替代 CC 的内置 subagent，同时保留相同的能力。 外部模型获得相同的工具、相同的权限上下文、相同的工作流程。CC 无需感知差异。

这不是一个给人用的 CLI 工具，而是 CC 自动调用的内部组件。

原生 Agent vs relay 能力对比

| 能力 | CC 原生 Agent | Task Relay | 实现方式 | |------|-------------|-----------|---------| | 工具访问 | 继承全套工具 | 继承全套工具 | claude -p 天然提供 | | 工具限制 | tools / disallowedTools | --allowedTools / --disallowedTools / --tools | 相同粒度，不同语法 | | 权限冒泡 | 进程内 toolPermissionContext，冒泡到父终端 | tool_deferred → relay 捕获 → 自动 resume | CC 的 defer 机制 | | Agent 行为 | 内置 system prompt | CC 通过 prompt / --append-system-prompt 传递 | CC 知道每个 agent 的职责 | | 后台执行 | 原生 run_in_background | Bash({ run_in_background: true }) | CC 的 Bash 工具 | | 会话持久化 | 内存消息历史 | session ID + --resume | 磁盘 transcript | | Prompt cache | 父子共享（fork 路径） | relay fork 继承父 session → cache 复用 | --fork-session --resume <sid> | | MCP 连接共享 | 共享父 MCP 客户端 | 每个子进程独立连接 | 架构限制 | | SendMessage | 进程内双向消息 | AskUserQuestion + defer | tool_deferred 机制 |

核心原理

claude -p 支持 ANTHROPIC_BASE_URL 环境变量。Task Relay 利用这一点，将 CC 的子 agent 运行时指向任意 Anthropic 兼容端点——外部模型天然获得 CC 全套工具。

不需要自建 agentic loop，不需要重实现工具，不需要 MCP Server。只是 claude -p 上的薄配置层。

两层拦截机制

CC 调用 Agent 工具时，PreToolUse Hook 拦截并路由：

Tier 1 — agents 直接映射：~/.relay.yaml 的 agents 配置按 subagent_type 匹配，命中则给出精确的 relay 命令和工具限制。CC 负责把 agent 的行为约束写进 prompt。
Tier 2 — Skill 回退：无 agents 匹配时，Hook 引导 CC 通过 task-relay Skill 自行决策路由。

安装

npm install -g @fengjunhui31/task-relay
relay install claude    # 安装 Skill + Hook

编辑 ~/.relay.yaml，填入模型端点和 API key 即可使用。

评测结果

10 道题（6 bug fix + 4 code review），CC 独立 vs CC + relay：

| 指标 | CC 独立 | CC + relay | 节省 | |------|:-:|:-:|:-:| | 工作 token | 8,977 | 5,381 | -40% | | CC 订阅成本 | $1.32 | $1.13 | -14% | | 正确率 | 10/10 | 10/10 | — |

运行 node bench/run.js && node bench/report.js 复现。