@salesforce/sfdx-agent-harness-claude
v0.24.0
Claude Agent SDK-backed AgentHarness implementation for @salesforce/sfdx-agent-sdk
Claude Agent SDK-backed AgentHarness implementation for @salesforce/sfdx-agent-sdk. Provides
ClaudeHarnessFactory, the entry point consumers pass to createAgentManager.
Closed source. This package is published to npm under the Salesforce Public Code License and is for use by Salesforce only.
Install
npm install @salesforce/sfdx-agent-sdk @salesforce/sfdx-agent-harness-claude
Quick start
import { createAgentManager, DefaultAgentConnectivityResolver } from '@salesforce/sfdx-agent-sdk';
import { ClaudeHarnessFactory } from '@salesforce/sfdx-agent-harness-claude';
const harnessFactory = new ClaudeHarnessFactory({
  permissionMode: 'bypassPermissions',
});
const manager = await createAgentManager('/path/to/storage', harnessFactory, {
  connectivityResolver: new DefaultAgentConnectivityResolver(),
});
Connectivity (gateway URL, JWT, headers, native model id) is resolved by an AgentConnectivityResolver on the SDK and
handed to the harness as a ModelConnectivityInfo bag. The harness factory itself owns harness-only concerns
(permission mode, tool-approval timeout, bypass list, query defaults). For BYOK / direct-Anthropic / LLMG-Express
deployments use ApiKeyConnectivityResolver; for non-Salesforce hosts implement your own resolver. See the
@salesforce/sfdx-agent-sdk README for the full resolver surface.
See the @salesforce/sfdx-agent-sdk README for the full consumer-facing API. For
internal architecture, see ARCHITECTURE.md.
Public API
| Export | Kind | Description |
| ------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ClaudeHarnessFactory | class | HarnessFactory that produces a Claude-backed AgentHarness bound to a storage root. |
| ClaudeHarnessFactoryConfig | interface | Configuration for ClaudeHarnessFactory (permissionMode, optional queryDefaults, toolApprovalTimeoutMs, deprecated bypassApprovalTools, subprocessEnv). |
| CLAUDE_BUILT_IN_TOOL_POLICIES | const | Frozen ReadonlyArray<ToolPolicyRule> the harness feeds the SDK's resolveToolApprovalPolicy as the tiers.harness slice — require-approval rules for Claude's mutating built-ins Bash / Edit / MultiEdit / Write / NotebookEdit. |
| CLAUDE_BUILT_IN_TOOL_NAMES | const | The five built-in tool names CLAUDE_BUILT_IN_TOOL_POLICIES covers. |
| ClaudeHarnessPermissionMode | type | Narrowed permission mode: 'bypassPermissions' \| 'acceptEdits' \| 'plan'. |
| ClaudeAgentHarness | type | Branded AgentHarness subtype carrying the Claude-specific config inference. Inferred at manager.createAgent automatically — annotation rarely needed. |
| ClaudeAgentConfig | type | AgentConfig extended with Claude-only fields (skillSearch, toolSearch). |
| ClaudeSkillSearchConfig | type | Shape of ClaudeAgentConfig.skillSearch (topK). Opts the in-process skill_bridge MCP server into search-on-demand mode. |
| ClaudeToolSearchConfig | type | Shape of ClaudeAgentConfig.toolSearch (alwaysActive). Stamps _meta['anthropic/alwaysLoad'] on matching tools so they bypass Claude tool-search deferral. |
| ClaudeQueryDefaults | type | Consumer-overridable defaults for the Claude query() call (excludes model). |
Permission modes
ClaudeHarnessFactoryConfig.permissionMode is required and narrowed to the modes that work without an interactive
prompt source:
- 'bypassPermissions': all tools run without prompting (sets allowDangerouslySkipPermissions: true).
- 'acceptEdits': auto-accept file edits; prompt for everything else.
- 'plan': planning mode; the model proposes actions without executing.
'default', 'dontAsk', and 'auto' are intentionally excluded because they require a prompt UI the harness does not
provide. For human-in-the-loop gating, configure AgentConfig.toolPolicies (see "Tool-approval policy" below) — when
gating engages, the harness installs a PreToolUse SDK hook for the turn that resolves every tool call through the
SDK's policy resolver, regardless of the factory-pinned permission mode.
HTTPS proxy routing
The harness honors HTTPS_PROXY / HTTP_PROXY / NO_PROXY (either casing) for the host-side MCP HTTP transport via
@salesforce/agentic-common's resolveProxyDispatcher(). When either proxy var is set at factory create() time, the
harness wraps the transport fetch with an undici-backed routing function — no globalThis mutation. When no proxy env
is set, fetch runs unwrapped (zero overhead).
The Claude subprocess inherits HTTPS_PROXY from the host env separately and is unaffected by the host-side wrapping.
Subprocess-side proxy routing follows whatever env was set when the subprocess spawned.
Snapshot timing. HTTPS_PROXY / HTTP_PROXY are captured at create() time and pinned for the harness's lifetime;
NO_PROXY is re-evaluated per request. Set the proxy env vars before constructing the factory's first harness.
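The snapshot timing described above can be illustrated with a minimal sketch. Everything here is hypothetical (`createProxyRouter`, `isBypassed`, the simplified suffix-matching NO_PROXY rule); it is not the harness's or @salesforce/agentic-common's implementation, only a model of "proxy URL pinned at creation, NO_PROXY read per request".

```typescript
// Hypothetical helper names; a sketch of the timing rule, not the real code.
type Env = Record<string, string | undefined>;

// Read a variable in either casing, preferring the upper-case form.
function readEnv(env: Env, name: string): string | undefined {
  return env[name.toUpperCase()] ?? env[name.toLowerCase()];
}

// Simplified NO_PROXY check (assumption): comma-separated host suffixes,
// with "*" matching every host.
function isBypassed(host: string, noProxy: string | undefined): boolean {
  if (!noProxy) return false;
  return noProxy
    .split(',')
    .map((entry) => entry.trim().toLowerCase().replace(/^\./, ''))
    .filter((entry) => entry.length > 0)
    .some((entry) => entry === '*' || host === entry || host.endsWith('.' + entry));
}

// The proxy URL is captured once, at creation; NO_PROXY is read on each call.
function createProxyRouter(env: Env): (url: string) => string | undefined {
  const pinnedProxy = readEnv(env, 'HTTPS_PROXY') ?? readEnv(env, 'HTTP_PROXY');
  return (url: string) => {
    if (!pinnedProxy) return undefined; // no proxy configured: direct fetch
    const host = new URL(url).hostname.toLowerCase();
    return isBypassed(host, readEnv(env, 'NO_PROXY')) ? undefined : pinnedProxy;
  };
}
```

In this model, changing HTTPS_PROXY after the router exists has no effect, while a later NO_PROXY change applies to the next request.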
Skill search
AgentConfig.skills is a cross-harness field; both Mastra and Claude surface configured skills to the model. By default
the Claude harness exposes the catalog via the in-process skill_bridge MCP server's load_skill tool, whose
description enumerates skill names and descriptions so the model can pick directly. That shape is the right call at
small N (no extra round-trip).
For larger catalogs, set ClaudeAgentConfig.skillSearch to switch the same in-process server into a
search_skills(query) → topK + load_skill(name) shape. The catalog is dropped from load_skill's description so it
never sits in the prompt at rest, and the model is forced to discover skills via search. The opt-in field mirrors
MastraAgentConfig.skillSearch so cross-harness consumers configure the on-demand path the same way on both harnesses.
const agent = await manager.createAgent(projectRoot, {
  instructions: '...',
  skills: ['/path/to/skills'],
  skillSearch: { topK: 5 }, // ranked, on-demand discovery; defaults to 5 if omitted
});
skillSearch.topK is advisory; defaults to 5. The ranker is an in-tree TF-style scorer with name-match boost: no
embedder dependency, no external service, deterministic across runs (alphabetical tie-break). The model sees both
mcp__skill_bridge__search_skills and mcp__skill_bridge__load_skill in its tool catalog and the existing always-load
meta keeps both tools exempt from Claude tool-search deferral.
When skillSearch is unset (the default), the existing description-enumeration behavior is preserved unchanged.
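To make the ranking behavior concrete, here is a minimal sketch of a TF-style scorer with a name-match boost and an alphabetical tie-break. It is not the in-tree ranker: the `SkillEntry` shape, the `NAME_BOOST` weight, the tokenizer, and the choice to drop zero-score skills are all assumptions made for illustration.

```typescript
// Hypothetical catalog entry; the real skill metadata shape may differ.
interface SkillEntry {
  name: string;
  description: string;
}

// Lower-case and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score = occurrences of each query token in the description,
// plus a fixed boost for each query token that appears in the name.
function rankSkills(query: string, catalog: SkillEntry[], topK = 5): SkillEntry[] {
  const NAME_BOOST = 2; // assumed weight, chosen only for the example
  const queryTokens = tokenize(query);
  const scored = catalog.map((skill) => {
    const nameTokens = new Set(tokenize(skill.name));
    const descTokens = tokenize(skill.description);
    let score = 0;
    for (const q of queryTokens) {
      score += descTokens.filter((t) => t === q).length;
      if (nameTokens.has(q)) score += NAME_BOOST;
    }
    return { skill, score };
  });
  return scored
    .filter((s) => s.score > 0) // assumption: unrelated skills are not returned
    .sort((a, b) => {
      if (b.score !== a.score) return b.score - a.score;
      // Deterministic alphabetical tie-break on the skill name.
      return a.skill.name < b.skill.name ? -1 : a.skill.name > b.skill.name ? 1 : 0;
    })
    .slice(0, topK)
    .map((s) => s.skill);
}
```

Because the scorer uses only string operations and a stable tie-break, the same query against the same catalog always yields the same order, which is the property the paragraph above describes.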
Multi-file skills. load_skill returns the SKILL.md body prefixed with the skill's on-disk location (its
directory and absolute SKILL.md path), so a skill whose body references sibling files (e.g. references/api-guide.md,
assets/...) gives the model an absolute anchor to read them — including when the skill lives outside the agent's
project directory. This holds in both the default and skillSearch modes.
Tool-exposure policy (toolSearch.alwaysActive)
The Claude Agent SDK enables tool-search deferral once the active tool surface crosses a threshold (≈ tens of tools);
once on, individual tools are hidden behind a tool_search → load → invoke round-trip until the model asks for them. On
the Salesforce LLM Gateway's Bedrock-Sonnet path that round-trip can stall the turn (see ARCHITECTURE.md →
_meta['anthropic/alwaysLoad']). To pin specific tools into the model's catalog regardless of how the surrounding
surface grows, list them on ClaudeAgentConfig.toolSearch.alwaysActive:
const agent = await manager.createAgent(projectRoot, {
  instructions: '...',
  mcpServers: {
    sfdx: { type: 'stdio', command: '...' },
    other: { type: 'stdio', command: '...' },
  },
  toolSearch: {
    alwaysActive: [
      { serverName: 'sfdx' }, // every tool from this server (post-discovery)
      { serverName: 'other', toolName: 'lookup_account' }, // exactly one tool
      { toolName: 'resume_tool_operation' }, // any tool by this name, regardless of source
    ],
  },
});
Each entry covers one of three patterns:
| Entry shape | Matches |
| ------------------------------------ | ---------------------------------------------------------------------- |
| { serverName: 'X' } | every tool advertised by server X |
| { serverName: 'X', toolName: 'Y' } | exactly tool Y on server X |
| { toolName: 'Y' } | any tool named Y regardless of source (built-ins, workspace, MCP, …) |
At least one of serverName / toolName must be present; an empty {} is rejected at config time. The field mirrors
MastraAgentConfig.toolSearch.alwaysActive so cross-harness consumers configure tool-exposure policy the same way on
both harnesses.
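The three entry shapes reduce to a simple predicate. The sketch below is an illustration only: `AlwaysActiveEntry`, `DiscoveredTool`, `validateEntry`, and `isAlwaysActive` are hypothetical names, and the harness's real matching code may differ.

```typescript
// Hypothetical shapes, modelled on the three patterns in the table above.
interface AlwaysActiveEntry {
  serverName?: string;
  toolName?: string;
}

// serverName is absent for tools that do not come from an MCP server.
interface DiscoveredTool {
  serverName?: string;
  toolName: string;
}

// An entry with neither field set is invalid.
function validateEntry(entry: AlwaysActiveEntry): void {
  if (entry.serverName === undefined && entry.toolName === undefined) {
    throw new Error('alwaysActive entry needs serverName and/or toolName');
  }
}

// A tool is pinned when any entry matches every field that entry specifies.
function isAlwaysActive(tool: DiscoveredTool, entries: AlwaysActiveEntry[]): boolean {
  return entries.some((entry) => {
    validateEntry(entry);
    const serverOk = entry.serverName === undefined || entry.serverName === tool.serverName;
    const toolOk = entry.toolName === undefined || entry.toolName === tool.toolName;
    return serverOk && toolOk;
  });
}
```

An omitted field acts as a wildcard, which is why `{ toolName: 'Y' }` matches a tool named Y from any source.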
Caveat — Claude built-ins: the Claude SDK has no host-side hook for stamping _meta['anthropic/alwaysLoad'] on
built-in tools (Bash, Read, Edit, …). A { toolName: 'Bash' } entry today is a no-op on built-ins and only takes
effect when a tool by that name comes from a configured MCP server or consumer-declared AgentConfig.tools.
Multimodal input
chatSession.chat() / harness stream() accept a MessagePart[] carrying image (PNG/JPEG) and file (PDF) parts
alongside text (see the SDK README for the part shapes). The harness lowers them to Anthropic image / document
content blocks sent over the Bedrock-native path. Files are validated pre-stream against the agent's gateway model
capabilities and caps — an unsupported format, an oversized file, or too many files rejects with
AgentSDKError(MULTIMODAL_NOT_SUPPORTED); a malformed input part rejects with INVALID_MESSAGE_CONTENT. The caps
mirror the gateway SDK's conservative limits so a file behaves identically here and on the Mastra harness.
Tool-approval policy
The harness gates harness-executed tools through the SDK's resolveToolApprovalPolicy (configure rules via
AgentConfig.toolPolicies / AgentConfig.defaultToolDecision — see the SDK README). The PreToolUse hook is the
decision site: for each non-consumer tool it resolves a decision and acts on it:
- allow: the tool runs without surfacing a tool-approval-request.
- deny: the SDK synthesizes a tool_result(isError=true) so the model can recover; no tool-approval-request surfaces and the assistant turn continues.
- require-approval: a tool-approval-request is emitted and the SDK hook parks until the consumer settles via chatSession.approveToolCall(toolCallId) / declineToolCall(toolCallId).
Gating is opt-in: it engages only when the agent's config sets toolPolicies or defaultToolDecision, or the
chat() call sets the deprecated requireToolApproval or batchApprovals. With none configured, tools run without
gating (no PreToolUse hook is installed), matching the SDK's "no policy ⇒ no gating" default. The harness adds
CLAUDE_BUILT_IN_TOOL_POLICIES (require-approval rules for the mutating built-ins Bash / Edit / MultiEdit /
Write / NotebookEdit) as the tiers.harness slice and the SDK's skill_bridge meta-tool rules as allow defaults,
so capability-discovery never prompts and read-only built-ins (Read / Glob / Grep / Task) run free by absence.
Consumer-executed tools (AgentConfig.tools) always bypass the gate — their execution is the consumer's responsibility
via submitToolResult. Set StreamOptions.batchApprovals: true to surface parallel require-approval requests on one
stream for a batch-approval UI.
StreamOptions.requireToolApproval is deprecated, superseded by per-tool AgentConfig.toolPolicies. It still works for one release: a truthy value engages all-or-nothing gating (every non-consumer tool requires approval), and 'batch' maps onto batchApprovals.
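The opt-in rule and the deprecated mapping can be summarized as a small predicate. This is a sketch under stated assumptions, not the harness's logic: the `AgentGatingConfig`, `StreamGatingOptions`, and `GatingPlan` shapes are hypothetical, and treating "set" as "not undefined" is a guess about edge cases the text does not cover.

```typescript
type Decision = 'allow' | 'deny' | 'require-approval';

// Hypothetical slices of AgentConfig and StreamOptions, limited to gating fields.
interface AgentGatingConfig {
  toolPolicies?: unknown[];
  defaultToolDecision?: Decision;
}
interface StreamGatingOptions {
  requireToolApproval?: boolean | 'batch'; // deprecated
  batchApprovals?: boolean;
}

interface GatingPlan {
  engaged: boolean; // whether a PreToolUse hook would be installed
  allOrNothing: boolean; // legacy mode: every non-consumer tool requires approval
  batch: boolean; // parallel approval requests surface on one stream
}

function planGating(agent: AgentGatingConfig, stream: StreamGatingOptions): GatingPlan {
  const legacy = stream.requireToolApproval;
  // 'batch' on the deprecated option maps onto batchApprovals.
  const batch = stream.batchApprovals === true || legacy === 'batch';
  // Any truthy legacy value engages all-or-nothing gating.
  const allOrNothing = Boolean(legacy);
  const engaged =
    agent.toolPolicies !== undefined ||
    agent.defaultToolDecision !== undefined ||
    allOrNothing ||
    batch;
  return { engaged, allOrNothing, batch };
}
```

With nothing configured on either side, `engaged` is false, which corresponds to the "no policy, no gating" default.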
Unlike Mastra (which suspends every tool and re-stamps a declined native result), Claude's deny is the SDK's native
permissionDecision: 'deny' — the Claude Agent SDK synthesizes the tool_result(isError=true) itself, so no
re-stamping is needed.
The require-approval flow on the chat()-returned eventStream:
- Emits a tool-approval-request event carrying { toolCall: { toolCallId, toolName, args } }.
- Parks the SDK hook Promise until the consumer settles (approveToolCall / declineToolCall return Promise<void>).
- Continues delivering events on the same eventStream after the consumer settles: the resulting tool-result, the model's follow-up text-delta events, and the terminal finish all arrive on that one stream.
  - On approve: the SDK's tool_use runs and the tool-result lands on the stream.
  - On decline: the SDK synthesizes a tool_result with isError: true, the model receives the denial and emits an acknowledgement, then finish. The decline does not abort the assistant turn.
This is the post-#529 single-stream-per-turn contract — see the SDK README's "Tool Approval Flow" for consumer-facing
patterns. Single-tool approve / decline, parallel batches (batchApprovals: true), approval-timeout enforcement,
abortSignal propagation, and shutdown / destroyThread cleanup all share that contract.
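A consumer loop for the single-stream contract might look like the sketch below. It runs against a mock: `MockSession`, the `ChatEvent` union, and the `'stop'` finish reason are stand-ins invented for this example, not the SDK's types. Only the event names and the approveToolCall / declineToolCall method names come from the text above.

```typescript
// Simplified, hypothetical event shapes.
type ChatEvent =
  | { type: 'tool-approval-request'; toolCall: { toolCallId: string; toolName: string; args: unknown } }
  | { type: 'tool-result'; toolCallId: string; isError: boolean }
  | { type: 'text-delta'; text: string }
  | { type: 'finish'; finishReason: string };

// Stand-in for a chat session: parks on a Promise until the consumer settles.
class MockSession {
  private settle: ((approved: boolean) => void) | undefined;

  async approveToolCall(_toolCallId: string): Promise<void> {
    this.settle?.(true);
  }

  async declineToolCall(_toolCallId: string): Promise<void> {
    this.settle?.(false);
  }

  async *chat(): AsyncGenerator<ChatEvent> {
    const toolCallId = 'call-1';
    const decision = new Promise<boolean>((resolve) => {
      this.settle = resolve;
    });
    yield { type: 'tool-approval-request', toolCall: { toolCallId, toolName: 'Bash', args: { command: 'ls' } } };
    const approved = await decision; // parked until approve/decline
    yield { type: 'tool-result', toolCallId, isError: !approved };
    yield { type: 'text-delta', text: approved ? 'Done.' : 'Understood, skipping.' };
    yield { type: 'finish', finishReason: 'stop' };
  }
}

// Settle in-line and keep iterating the same stream until it ends.
async function runTurn(session: MockSession, approve: boolean): Promise<string[]> {
  const seen: string[] = [];
  for await (const event of session.chat()) {
    seen.push(event.type);
    if (event.type === 'tool-approval-request') {
      const id = event.toolCall.toolCallId;
      await (approve ? session.approveToolCall(id) : session.declineToolCall(id));
    }
  }
  return seen;
}
```

Both the approve and the decline path end with the same event sequence on one stream; only the `isError` flag on the tool result and the follow-up text differ.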
Abort signal
When StreamOptions.abortSignal fires during an active turn with requireToolApproval: true:
- All pending permission Promises reject with an AbortError.
- The underlying Claude subprocess is killed (existing AbortController wiring).
- The single per-turn eventStream ends with an error event (code: 'aborted', error.name === 'AbortError') followed by a finish event (finishReason: 'error'). The 'error' finish reason is used because the SDK's FinishReason type has no 'aborted' value; the preceding error event carries the abort identity.
If the signal is already aborted when stream() is called, stream() throws synchronously (existing
AbortSignal.throwIfAborted() behavior — unchanged).
Shutdown and thread teardown
Calling harness.shutdown() disposes every tracked coordinator (across all agents and threads). For each pending
approval, the parked Promise rejects with an AgentSDKError whose type is 'DISPOSED'; the per-turn eventStream
ends with a single error event (no finish) and the underlying Claude subprocess is killed. Calling
destroyThread(agentId, threadId) performs the same cleanup but only for the coordinator associated with that
(agentId, threadId) pair — coordinators on other threads are untouched. Both entry points are idempotent (the
harness's shuttingDown guard prevents double-dispose; the coordinator's disposed flag does the same).
Approval timeout
Each pending approval is auto-denied after ClaudeHarnessFactoryConfig.toolApprovalTimeoutMs (default 600_000, i.e.
10 minutes — matching the Mastra MCP timeout default). When the timeout fires:
- The parked PreToolUse hook Promise resolves with permissionDecision: 'deny' plus decision: 'block' so the agentic loop terminates instead of letting the model retry the timed-out call.
- An error event with code: 'tool-approval-timeout' is emitted on the consumer's currently-iterating eventStream, then that stream ends.
- Any subsequent approveToolCall / declineToolCall for the timed-out tool throws because the coordinator marks every emitted approval as auto-resolved during teardown.
Calling approveToolCall or declineToolCall before the timeout cancels the timer — no spurious error event surfaces
afterwards.
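The timer behavior can be modelled as a parked Promise with a cancellable timeout. The sketch below is an assumption-laden illustration: `parkApproval`, `ParkedApproval`, and the `reason` field are hypothetical, and throwing on any second settle (not only after a timeout) is a simplification.

```typescript
// Hypothetical outcome shape; only permissionDecision / decision mirror the text above.
type HookOutcome =
  | { permissionDecision: 'allow' }
  | { permissionDecision: 'deny'; decision?: 'block'; reason: 'declined' | 'timeout' };

interface ParkedApproval {
  outcome: Promise<HookOutcome>;
  approve(): void;
  decline(): void;
}

function parkApproval(timeoutMs: number): ParkedApproval {
  let settled = false;
  let resolveOutcome: (o: HookOutcome) => void = () => {};
  const outcome = new Promise<HookOutcome>((resolve) => {
    resolveOutcome = resolve;
  });

  // Auto-deny (and block the loop) if nobody settles in time.
  const timer = setTimeout(() => {
    settled = true;
    resolveOutcome({ permissionDecision: 'deny', decision: 'block', reason: 'timeout' });
  }, timeoutMs);

  const settle = (o: HookOutcome): void => {
    if (settled) throw new Error('approval already resolved');
    settled = true;
    clearTimeout(timer); // settling first cancels the timer, so no late timeout fires
    resolveOutcome(o);
  };

  return {
    outcome,
    approve: () => settle({ permissionDecision: 'allow' }),
    decline: () => settle({ permissionDecision: 'deny', reason: 'declined' }),
  };
}
```

Calling `approve()` or `decline()` before the deadline clears the timer; calling either after the timeout throws, matching the third bullet above.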
Consumer pattern: to observe the timeout (or abort / dispose) error on the eventStream, iterate the stream to its natural end rather than breaking out after the tool-approval-request event. Calling iterator.return() (which break does inside for await) finalizes the harness's source generator before the terminal event can reach you. Settle in-line and keep iterating; the loop exits naturally once the terminal event arrives (finish, or error on the timeout and dispose paths).
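The effect of breaking early is standard JavaScript async-iteration behavior and can be reproduced without the harness. In this sketch `turnEvents` is a hypothetical stand-in for the harness's source generator; the event names are plain strings for brevity.

```typescript
// Stand-in source generator: one approval request, then a terminal event.
async function* turnEvents(log: string[]): AsyncGenerator<string> {
  try {
    yield 'tool-approval-request';
    yield 'error'; // the terminal event the consumer wants to observe
  } finally {
    log.push('source finalized');
  }
}

// Breaking out of for await calls iterator.return(), which finalizes the
// generator immediately; the terminal event is never produced.
async function breakEarly(): Promise<{ seen: string[]; log: string[] }> {
  const seen: string[] = [];
  const log: string[] = [];
  for await (const event of turnEvents(log)) {
    seen.push(event);
    if (event === 'tool-approval-request') break;
  }
  return { seen, log };
}

// Iterating to the natural end delivers every event, including the terminal one.
async function iterateToEnd(): Promise<{ seen: string[]; log: string[] }> {
  const seen: string[] = [];
  const log: string[] = [];
  for await (const event of turnEvents(log)) {
    seen.push(event);
  }
  return { seen, log };
}
```

In both cases the generator's `finally` block runs, but only `iterateToEnd` sees the `error` event.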
Connectivity (gateway URL, auth, native model id)
The Claude harness reads connectivity per-spawn from a ModelConnectivityInfo bag the SDK hands it. The bag carries the
gateway base URL, the native model id (e.g. anthropic.claude-sonnet-4-6), the provider hint (bedrock-anthropic or
anthropic), and a getHeaders() callback the harness invokes once per subprocess spawn. JWT rotation rides on that
callback — the next stream() call reads the rotated JWT through the same closure without rebuilding anything.
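As an illustration of why rotation needs no rebuild, the sketch below models the bag as a plain object whose `getHeaders` closure reads the current token at call time. The `ConnectivityInfoSketch` field names, the synchronous `getHeaders` signature, the `Authorization` header, and the example URL are all assumptions; consult the SDK README for the actual ModelConnectivityInfo shape.

```typescript
// Hypothetical approximation of the connectivity bag described above.
interface ConnectivityInfoSketch {
  baseUrl: string;
  nativeModelId: string;
  providerHint: 'bedrock-anthropic' | 'anthropic';
  getHeaders(): Record<string, string>;
}

// The closure captures a token getter, not a token value.
function createConnectivity(getJwt: () => string): ConnectivityInfoSketch {
  return {
    baseUrl: 'https://gateway.example.invalid', // placeholder URL
    nativeModelId: 'anthropic.claude-sonnet-4-6',
    providerHint: 'bedrock-anthropic',
    getHeaders: () => ({ Authorization: `Bearer ${getJwt()}` }),
  };
}

// Called once per (simulated) subprocess spawn.
function headersForSpawn(info: ConnectivityInfoSketch): Record<string, string> {
  return info.getHeaders();
}
```

Because the token is read inside the callback, a spawn that happens after rotation picks up the new value from the same bag.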
The SDK ships two bundled resolvers and a documented extension point — see the
@salesforce/sfdx-agent-sdk README:
- DefaultAgentConnectivityResolver: Salesforce-org gateway path. Targets the LLM Gateway's pass-through Bedrock endpoint (/invoke-with-response-stream) using an org JWT.
- ApiKeyConnectivityResolver: direct-Anthropic / OpenAI-direct / LLMG-Express. Constructor takes { getApiKey, baseUrl, providerHint, ... }.
- Custom AgentConnectivityResolver: for deployments neither bundled resolver covers, implement the SDK's AgentConnectivityResolver interface and pass it as the connectivityResolver option to createAgentManager.
The harness factory itself takes no resolver, gateway URL, or auth field — those concerns live entirely on the connectivity resolver.
Operational subprocess env (subprocessEnv)
Rule: subprocessEnv is for operational subprocess/runtime vars with no model-connectivity meaning — anything
dynamic or auth/endpoint-shaped belongs on the connectivity resolver. The map is static for the factory's lifetime;
per-spawn-dynamic values (a rotating key, a fresh trace id) belong on the resolver's getHeaders(), not here.
Connectivity (auth headers, gateway/endpoint URL, provider hint) flows through the connectivity resolver. Some
deployments also need to set operational environment variables on the Claude subprocess that aren't connectivity and
therefore have no home on ModelConnectivityInfo. ClaudeHarnessFactoryConfig.subprocessEnv is the injection point:
const harnessFactory = new ClaudeHarnessFactory({
  permissionMode: 'bypassPermissions',
  subprocessEnv: {
    NODE_EXTRA_CA_CERTS: '/etc/corp/ca-bundle.pem', // corporate-proxy CA trust
    NODE_TLS_REJECT_UNAUTHORIZED: '1',
    DISABLE_TELEMETRY: '1',
    DISABLE_AUTOUPDATER: '1',
  },
});
Layering (last wins): inherited host env (with the dangerous auth vars stripped) → subprocessEnv →
connectivity-derived transport env → CLAUDE_CONFIG_DIR. Two consequences fall out of that order:
- subprocessEnv overrides inherited host vars; that's the point of the slot.
- subprocessEnv cannot override auth / endpoint. Connectivity is layered last, so a subprocessEnv entry for ANTHROPIC_BEDROCK_BASE_URL / ANTHROPIC_CUSTOM_HEADERS / etc. is overwritten by the resolver-derived value. Keep auth on the connectivity resolver; put only operational vars here.
The three dangerous vars stripped from inherited env (ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL)
are not stripped from subprocessEnv — a value placed there is an explicit consumer choice, not an accidental host
leak.
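The layering order can be expressed as a single object spread. This is a sketch of the stated rule, not the harness's code: `buildSubprocessEnv` and its parameter names are hypothetical, and the connectivity-derived variables are passed in as an opaque map.

```typescript
type EnvMap = Record<string, string>;

// The three vars stripped from the inherited host env (per the text above).
const STRIPPED_FROM_HOST = ['ANTHROPIC_API_KEY', 'ANTHROPIC_AUTH_TOKEN', 'ANTHROPIC_BASE_URL'];

function buildSubprocessEnv(
  hostEnv: EnvMap,
  subprocessEnv: EnvMap,
  connectivityEnv: EnvMap,
  claudeConfigDir: string,
): EnvMap {
  // Layer 1: inherited host env, minus the dangerous auth vars.
  const inherited: EnvMap = {};
  for (const key of Object.keys(hostEnv)) {
    if (STRIPPED_FROM_HOST.indexOf(key) === -1) inherited[key] = hostEnv[key];
  }
  // Later spreads win: subprocessEnv, then connectivity, then CLAUDE_CONFIG_DIR.
  return {
    ...inherited,
    ...subprocessEnv, // not stripped: explicit consumer choice
    ...connectivityEnv,
    CLAUDE_CONFIG_DIR: claudeConfigDir,
  };
}
```

A `subprocessEnv` value therefore beats the host's value for the same key but loses to a connectivity-derived one.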
Note: ANTHROPIC_DEFAULT_{HAIKU,SONNET,OPUS}_MODEL (the Claude CLI's internal tier-model ids) is a valid current use case for this slot. If a future consumer needs those resolved dynamically per model rather than statically per factory, that's the signal to grow ModelConnectivityInfo rather than widen this slot to accept a function.
Per-server MCP reconnect (eager semantics)
The Claude harness implements Agent.reconnectMcpServer(name) eagerly: the request closes the existing host-owned
@modelcontextprotocol/sdk Client, rebuilds a fresh transport pair, and runs discovery (client.listTools()) before
resolving. By the time await agent.reconnectMcpServer('sf') returns, agent.getMcpServerInfo() reflects the
post-reconnect state.
This matches the Mastra harness's shape — both harnesses now own MCP clients in the host process and reconnect synchronously rather than deferring to the next stream.
const failed = agent.getMcpServerInfo().find((s) => s.status === 'error');
if (failed) {
  await agent.reconnectMcpServer(failed.name); // eager: returns when discovery has settled
  agent.getMcpServerInfo(); // reflects the reconnect
}
Validation failures (unknown server, disabled server) throw. Transport / handshake failures during the reconnect are recorded on
getMcpServerInfo() as status: 'error' and emit mcp-server-discovery-failed telemetry; the call itself does not
reject.
Supplying query defaults
const harnessFactory = new ClaudeHarnessFactory({
  permissionMode: 'bypassPermissions',
  queryDefaults: {
    maxTurns: 5,
  },
});
Suppressing Claude built-ins for cross-harness consumer tools
Claude's subprocess ships built-in tools the model can invoke without the host wiring them up — AskUserQuestion,
Bash, Read, WebSearch, etc. When a consumer registers an equivalent cross-harness tool via AgentConfig.tools
(which works on both Mastra and Claude), the model can wind up choosing the Claude built-in over the consumer tool,
breaking parity with Mastra. Pass disallowedTools through queryDefaults to keep the model's choice deterministic
across harnesses.
disallowedTools flows through to the Claude SDK verbatim (it's a field on the SDK's Options shape that
ClaudeQueryDefaults does not strip). It blocks the model from emitting tool_use for the named built-ins entirely.
import { createAgentManager, DefaultAgentConnectivityResolver } from '@salesforce/sfdx-agent-sdk';
import { ClaudeHarnessFactory } from '@salesforce/sfdx-agent-harness-claude';
const harnessFactory = new ClaudeHarnessFactory({
  permissionMode: 'bypassPermissions',
  // Suppress the built-in `AskUserQuestion` so the cross-harness consumer
  // tool below is the only ask-the-user surface the model sees on Claude
  // (matching what Mastra consumers experience by default).
  queryDefaults: {
    disallowedTools: ['AskUserQuestion'],
  },
});
const manager = await createAgentManager('/path/to/storage', harnessFactory, {
  connectivityResolver: new DefaultAgentConnectivityResolver(),
});
const agent = await manager.createAgent(projectRoot, {
  instructions: '...',
  tools: [
    {
      name: 'ask_user_choice',
      description: 'Ask the human user a multiple-choice question and wait for the answer.',
      inputSchema: {
        type: 'object',
        properties: {
          question: { type: 'string' },
          options: { type: 'array', items: { type: 'string' } },
        },
        required: ['question', 'options'],
      },
    },
  ],
});
// In your event loop, answer the consumer tool's `tool-call` via
// `session.submitToolResult({ toolCallId, toolName, result })`. Consumer
// tools never reach the approval gate: the options builder's `PreToolUse`
// matcher excludes the consumer-tool wire prefix unconditionally, so they
// surface as a normal `tool-call` event with no preceding
// `tool-approval-request`, regardless of any policy configured.
This is the recommended pattern when the consumer wants identical behavior across Mastra and Claude. For Claude-only consumers who prefer to drive the built-in directly, see "Bypassing the approval gate for specific tools" below.
Bypassing the approval gate for specific tools (deprecated)
Deprecated. ClaudeHarnessFactoryConfig.bypassApprovalTools is superseded by per-tool AgentConfig.toolPolicies (e.g. definePolicy({ AskUserQuestion: 'allow' })). It still works for one release: the harness runtime-merges the list into the policy resolver as synthetic 'allow' rules (the tiers.factory slice) and logs a once-per-process warning pointing at toolPolicies. It will be removed in a later phase. Prefer toolPolicies for new code.
When gating is active, every built-in / MCP tool call reaches the PreToolUse hook, which resolves it against the
policy resolver. (Consumer-declared tools from AgentConfig.tools[i] always bypass — see "Tool-approval policy" above.)
bypassApprovalTools opts specific tool names to 'allow':
const harnessFactory = new ClaudeHarnessFactory({
  permissionMode: 'bypassPermissions',
  // DEPRECATED; equivalent to:
  //   toolPolicies: definePolicy({ AskUserQuestion: 'allow' })
  // Calls to these tools resolve to 'allow'; they still emit `tool-call` /
  // `tool-result` ChatEvents so the consumer can drive UX off the `tool-call`.
  bypassApprovalTools: ['AskUserQuestion'],
});
The names become synthetic { type: 'builtin', name, decision: 'allow' } rules in the resolver's tiers.factory slice.
They sit below the SDK built-ins and the harness built-ins but above consumer toolPolicies, so a consumer deny of
the same tool still wins (cross-tier deny-wins). Bare-name form only — consumer-declared tools already bypass via the
always-on consumer-tool prefix, so the old dual-form (mcp__sfdx-agent-sdk-consumer-tools__<name>) expansion is no
longer needed.
Note: permissionMode: 'plan' denies mutating built-ins (Edit, Write, Bash, …) at the SDK's permission cache rather than running them natively. Resolving such a tool to 'allow' does not "unlock" it under plan mode; the SDK still denies it. Plan mode applies to the whole subprocess; the policy resolver only chooses whether the harness's coordinator gates the call.
When bypassApprovalTools is undefined or empty and no toolPolicies / defaultToolDecision is set, gating is off and
no PreToolUse hook is installed — no behavior change for existing consumers. (Consumer-declared tools always bypass.)
Tool-result redaction
Tool-result redaction is configured at the manager layer, not the factory. Pass a hooksForAgent callback to
createAgentManager; the SDK threads the resolved AgentHooks bag through createAgent's options.hooks, and the
Claude harness wires hooks.onToolResult to the Claude Agent SDK's PostToolUse hook (updatedToolOutput)
automatically. The same ToolResultRedactor works on the Mastra harness. See the SDK README → "Tool-Result Redaction"
for the full pattern (including a failClosed(...) wrapper consumers can use to substitute a safe stub on throw).
Bash gotcha. Claude's built-in Bash tool requires the { stdout, stderr, interrupted } shape on the redactor's
replacement output. A bare-string return is rejected by the Claude Agent SDK and the original leaks. The harness does
NOT validate this — the redactor knows what tool it's redacting.
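A redactor that respects the Bash shape could be sketched as follows. The function signature is hypothetical (the actual ToolResultRedactor contract is defined in the SDK README), and the secret-matching pattern is only an example; the one grounded detail is that a Bash replacement keeps the { stdout, stderr, interrupted } shape instead of collapsing to a string.

```typescript
// Shape the Bash tool's replacement output must keep (per the gotcha above).
interface BashOutput {
  stdout: string;
  stderr: string;
  interrupted: boolean;
}

// Example pattern only: masks values in "api_key=..." / "token=..." pairs.
const SECRET_PATTERN = /(api[_-]?key|token)=\S+/gi;

function scrub(text: string): string {
  return text.replace(SECRET_PATTERN, '$1=[REDACTED]');
}

// Hypothetical redactor signature: tool name plus raw output in, replacement out.
function redactToolResult(toolName: string, output: unknown): unknown {
  if (toolName === 'Bash') {
    const bash = (output ?? {}) as Partial<BashOutput>;
    // Always return the full object shape for Bash, never a bare string.
    return {
      stdout: scrub(String(bash.stdout ?? '')),
      stderr: scrub(String(bash.stderr ?? '')),
      interrupted: Boolean(bash.interrupted),
    };
  }
  // Other tools in this sketch: scrub plain strings, pass everything else through.
  return typeof output === 'string' ? scrub(output) : output;
}
```

Branching on the tool name keeps the shape decision inside the redactor, which is where the text above says that responsibility sits.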
If you also register your own PostToolUse hooks via queryDefaults.hooks.PostToolUse, both fire — the harness's hook
is appended last in the list so its updatedToolOutput has final say over the value the model sees.
Development
See DEVELOPING.md for build, test, and packaging instructions.
