@diabolicallabs/agent-sdk
v3.1.0
Published
Cost-tracking middleware for @diabolicallabs/llm-client. Async fire-and-forget ingestion to Agent Spend Dashboard. © Diabolical Labs
Cost-tracking middleware for @diabolicallabs/llm-client. Drop-in wrapper that captures call records and dispatches them asynchronously to the Agent Spend Dashboard. © Diabolical Labs
Pre-1.0. APIs may change between minor versions.
Status
Published — v3.0.1. instrumentClient() wraps all five LlmClient methods: complete(), stream(), structured(), streamStructured(), withTools(). Cost propagation (v1.1.0), failover requestedModel tracking (v1.2.0), and streamStructured() (v1.3.0) are included.
v2.0.0 — architecture migration complete: all 5 call types now route through a single buildAfterCallDispatch() function. The bespoke stream() and streamStructured() usage-capture wrappers retained in v1.4.0 have been deleted. LlmAfterCallContext.usage is now populated by @diabolicallabs/llm-client for streaming paths, so agent-sdk no longer needs its own generator iteration to capture usage. The public API is unchanged.
v3.0.1 — peer-dep cleanup: @diabolicallabs/llm-pricing is no longer declared as a peer dependency. The only usage was two import type statements compiled away at build time. The LlmCost type is now defined inline in agent-sdk — no consumer-side install change required. If your project uses @diabolicallabs/llm-client with pricing enabled, install @diabolicallabs/llm-pricing via the llm-client peer-dep path (or directly) — not as an agent-sdk requirement.
Install
```bash
pnpm add @diabolicallabs/agent-sdk @diabolicallabs/llm-client
```
Usage
```typescript
import { createClientFromEnv } from '@diabolicallabs/llm-client';
import { instrumentClient } from '@diabolicallabs/agent-sdk';
import { z } from 'zod';

const base = createClientFromEnv('anthropic', 'claude-sonnet-4-6');

const client = instrumentClient(base, {
  identity: { agentId: process.env.AGENT_ID!, taskLabel: 'geo-audit' },
  ingestionUrl: process.env.SPEND_INGESTION_URL!,
  ingestionKey: process.env.SPEND_INGESTION_KEY!,
});

const messages = [{ role: 'user' as const, content: 'Hello' }];

// complete(): non-streaming
const response = await client.complete(messages);
// The CallRecord is dispatched asynchronously; the response is returned immediately

// streamStructured(): token streaming + validated output (v1.3.0)
const schema = z.object({ name: z.string(), score: z.number() });
for await (const event of client.streamStructured(messages, schema)) {
  if (event.type === 'token') process.stdout.write(event.token);
  if (event.type === 'done') console.log(event.data, event.usage);
}
// One CallRecord is dispatched after the done event, not one per token
```
API
instrumentClient(client, config): InstrumentedLlmClient
Wraps any LlmClient with cost-tracking middleware. The returned InstrumentedLlmClient is a drop-in replacement — it implements the same interface.
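To make the drop-in idea concrete, here is a minimal sketch of the wrapping pattern with no dependency on either package. MiniClient, MiniRecord and wrapWithRecorder are hypothetical names, and only one method is shown where the real LlmClient has five. The wrapper returns an object with the same interface, times the call, and hands the record to a dispatcher without awaiting it, so the caller receives the response as soon as the underlying call resolves.

```typescript
// Hypothetical one-method client; the real LlmClient exposes five methods.
interface MiniClient {
  complete(prompt: string): Promise<string>;
}

// Hypothetical record: a small subset of the documented CallRecord fields.
interface MiniRecord {
  latency_ms: number;
  timestamp: string;
}

// Returns an object with the same interface as `inner`, so callers cannot tell
// the difference. `dispatch` is assumed to be an async function.
function wrapWithRecorder(
  inner: MiniClient,
  dispatch: (record: MiniRecord) => Promise<void>,
): MiniClient {
  return {
    async complete(prompt: string): Promise<string> {
      const started = Date.now();
      const result = await inner.complete(prompt);
      const record: MiniRecord = {
        latency_ms: Date.now() - started,
        timestamp: new Date().toISOString(),
      };
      // Fire-and-forget: not awaited, and a rejected dispatch is swallowed
      // so ingestion problems never reach the caller.
      void dispatch(record).catch(() => {});
      return result;
    },
  };
}
```

Because the dispatch promise is never awaited and its rejection is caught, a slow or failing ingestion endpoint cannot delay or break the wrapped call.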
Config:
| Field | Type | Default | Description |
|---|---|---|---|
| identity.agentId | string | required | UUID from Spend Dashboard agent registry |
| identity.taskLabel | string? | — | Optional label for this call (max 200 chars) |
| identity.projectId | string? | — | Optional project override |
| ingestionUrl | string | required | Agent Spend Dashboard /api/ingest endpoint |
| ingestionKey | string | required | Agent-scoped bearer token |
| maxIngestionRetries | number | 3 | Retries before dropping the record |
| ingestionTimeoutMs | number | 5000 | Ingestion request timeout — never blocks the LLM call |
| disabled | boolean | false | Set true in test/dev to skip all instrumentation |
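One way the table's defaults and limits could be applied is sketched below, using a hypothetical resolveConfig helper that is not an export of this package. It checks the required fields, fills in the documented defaults (3 retries, 5000 ms timeout, disabled false), and enforces the 200-character taskLabel limit. Whether the real library rejects, truncates, or ignores an over-long label is not stated in this README; throwing is an assumption.

```typescript
interface InstrumentConfig {
  identity: { agentId: string; taskLabel?: string; projectId?: string };
  ingestionUrl: string;
  ingestionKey: string;
  maxIngestionRetries?: number;
  ingestionTimeoutMs?: number;
  disabled?: boolean;
}

// Same shape with every optional tuning field filled in.
type ResolvedConfig = Required<Omit<InstrumentConfig, 'identity'>> & {
  identity: InstrumentConfig['identity'];
};

// Hypothetical helper: validates required fields and applies the documented defaults.
function resolveConfig(config: InstrumentConfig): ResolvedConfig {
  const { identity, ingestionUrl, ingestionKey } = config;
  if (!identity.agentId) throw new Error('identity.agentId is required');
  if (!ingestionUrl) throw new Error('ingestionUrl is required');
  if (!ingestionKey) throw new Error('ingestionKey is required');
  if (identity.taskLabel !== undefined && identity.taskLabel.length > 200) {
    // Assumption: reject rather than truncate.
    throw new Error('identity.taskLabel exceeds 200 characters');
  }
  return {
    identity,
    ingestionUrl,
    ingestionKey,
    // `??` (not `||`) so an explicit 0 retries or 0 is kept as given.
    maxIngestionRetries: config.maxIngestionRetries ?? 3,
    ingestionTimeoutMs: config.ingestionTimeoutMs ?? 5000,
    disabled: config.disabled ?? false,
  };
}
```

Using the nullish-coalescing operator matters here: with `||`, a caller who sets maxIngestionRetries: 0 to disable retries would silently get 3.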
Instrumented methods and CallRecord behavior:
| Method | CallRecord timing | Cost propagated | Notes |
|---|---|---|---|
| complete() | After response | Yes | requestedModel included on failover |
| stream() | After final chunk | No | Usage taken from whichever chunk carries it |
| structured() | After response | Yes | requestedModel included on failover |
| streamStructured() | After done event | No | One record per call, usage from the done event |
| withTools() | After response | Yes | tool_calls array included for per-tool attribution |
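For the two streaming rows, the table says a single record is produced when the stream finishes, not one per chunk. The sketch below shows that pattern as an async-generator wrapper. The StreamEvent shape (token events, then a done event carrying usage) is modelled on the usage example but is an assumption, not the package's actual type, and recordOnDone is a hypothetical name.

```typescript
// Assumed event shape, modelled on the streamStructured() example.
type Usage = { prompt_tokens: number; completion_tokens: number };
type StreamEvent =
  | { type: 'token'; token: string }
  | { type: 'done'; usage: Usage };

// Re-yields every event unchanged. Once the source stream ends, calls
// onComplete exactly once with the usage captured from the done event.
async function* recordOnDone(
  source: AsyncIterable<StreamEvent>,
  onComplete: (usage: Usage, latencyMs: number) => void,
): AsyncGenerator<StreamEvent> {
  const started = Date.now();
  let usage: Usage | undefined;
  for await (const event of source) {
    if (event.type === 'done') usage = event.usage;
    yield event;
  }
  // No done event (e.g. the stream ended early): nothing to record.
  if (usage !== undefined) onComplete(usage, Date.now() - started);
}
```

If the consumer breaks out of its loop early, the generator is closed at the pending yield and the code after the loop never runs, so in this sketch an abandoned stream produces no record.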
Ingestion contract
Every LLM call produces a CallRecord dispatched to the ingestion URL:
```typescript
interface CallRecord {
  agent_id: string;
  model: string;
  requestedModel?: string; // Present when provider failover fired (v1.2.0+)
  prompt_tokens: number;
  completion_tokens: number;
  cache_creation_tokens?: number; // Anthropic prompt cache only
  cache_read_tokens?: number; // Anthropic prompt cache only
  latency_ms: number;
  task_label?: string;
  project_id?: string;
  timestamp: string; // ISO 8601 UTC
  call_id: string; // UUID v4, used as the idempotency key
  tool_calls?: LlmToolCall[]; // withTools() only; omitted when the array is empty
  cost?: LlmCost; // Present when the LlmClient has pricing configured (v1.1.0+)
}
```
When to use instrumentClient vs raw createClient({ hooks })
@diabolicallabs/llm-client v1.5.0+ ships a native hooks API (beforeCall/afterCall) on createClient(). Use it directly when you want request-level interception: PII redaction, cache short-circuit, custom logging.
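The composition example in this section calls a redactPii helper that neither package provides. A hypothetical version is sketched below, assuming messages are { role, content } objects with string content and that masking email addresses is enough for your case. Real PII redaction usually needs far more than one pattern.

```typescript
// Assumed message shape; the real llm-client message type may carry more fields.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Deliberately simple email pattern: adequate for a sketch, not for compliance.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: returns new message objects with email addresses masked.
// The input array and its messages are left untouched.
function redactPii(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) => ({
    ...m,
    content: m.content.replace(EMAIL, '[redacted-email]'),
  }));
}
```

Returning copies rather than mutating keeps the original messages available to the caller, which suits a beforeCall hook that only replaces what is sent to the provider.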
Use instrumentClient() when you want structured CallRecord ingestion to the Agent Spend Dashboard. It owns the CallRecord schema and the ingestion retry/backoff contract. In v2.0.0, all 5 call types dispatch through a single uniform afterCall handler — the public entry point is always instrumentClient(client, config).
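To illustrate what "one handler for all five call types" can look like, here is a sketch that maps a generic after-call context onto the documented CallRecord fields. AfterCallInfo and toCallRecord are hypothetical (this is not LlmAfterCallContext), cost is omitted, and tool calls are typed as unknown[] because LlmToolCall is not reproduced in this README. It follows the contract's rules: call_id is a fresh UUID v4, timestamp is ISO 8601 UTC, and tool_calls is omitted when empty. Treating "requested model differs from served model" as the failover signal is an assumption.

```typescript
import { randomUUID } from 'node:crypto';

// Assumed shape of what the single handler receives, whatever the call type.
interface AfterCallInfo {
  requestedModel: string;
  servedModel: string;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    cache_creation_tokens?: number;
    cache_read_tokens?: number;
  };
  latencyMs: number;
  toolCalls?: unknown[]; // stand-in for LlmToolCall[]
}

interface Identity { agentId: string; taskLabel?: string; projectId?: string }

// Subset of the documented CallRecord (cost left out for brevity).
interface CallRecord {
  agent_id: string;
  model: string;
  requestedModel?: string;
  prompt_tokens: number;
  completion_tokens: number;
  cache_creation_tokens?: number;
  cache_read_tokens?: number;
  latency_ms: number;
  task_label?: string;
  project_id?: string;
  timestamp: string;
  call_id: string;
  tool_calls?: unknown[];
}

// One mapping for complete, stream, structured, streamStructured and withTools.
function toCallRecord(info: AfterCallInfo, identity: Identity): CallRecord {
  const record: CallRecord = {
    agent_id: identity.agentId,
    model: info.servedModel,
    prompt_tokens: info.usage.prompt_tokens,
    completion_tokens: info.usage.completion_tokens,
    latency_ms: info.latencyMs,
    timestamp: new Date().toISOString(), // ISO 8601, always UTC ("Z" suffix)
    call_id: randomUUID(), // UUID v4, the record's idempotency key
  };
  // Assumption: failover is detected as requested model !== served model.
  if (info.requestedModel !== info.servedModel) record.requestedModel = info.requestedModel;
  // Optional fields are only set when present, so they are absent from the JSON otherwise.
  if (info.usage.cache_creation_tokens !== undefined) record.cache_creation_tokens = info.usage.cache_creation_tokens;
  if (info.usage.cache_read_tokens !== undefined) record.cache_read_tokens = info.usage.cache_read_tokens;
  if (identity.taskLabel !== undefined) record.task_label = identity.taskLabel;
  if (identity.projectId !== undefined) record.project_id = identity.projectId;
  if (info.toolCalls && info.toolCalls.length > 0) record.tool_calls = info.toolCalls;
  return record;
}
```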
Composition: if you set hooks on an LlmClient config and then pass that client to instrumentClient(), both hooks run. The consumer's afterCall fires first; the ingestion dispatch fires second.
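The ordering rule can be modelled in a few lines. This is a sketch, not the package's code: composeAfterCall is a hypothetical name, and it assumes both hooks are async functions receiving the same context. What happens when a consumer hook throws is not documented here; in this sketch the error propagates and ingestion is skipped.

```typescript
type AfterCallHook<Ctx> = (ctx: Ctx) => Promise<void>;

// Hypothetical composition: the consumer hook runs to completion first,
// then ingestion is started but not awaited, and its errors are swallowed.
function composeAfterCall<Ctx>(
  consumer: AfterCallHook<Ctx> | undefined,
  ingestion: AfterCallHook<Ctx>,
): AfterCallHook<Ctx> {
  return async (ctx) => {
    if (consumer) await consumer(ctx);
    void ingestion(ctx).catch(() => {});
  };
}
```

Awaiting the consumer hook before starting ingestion is what guarantees the documented order; leaving ingestion unawaited keeps it off the caller's critical path.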
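```typescript
import { createClient } from '@diabolicallabs/llm-client';
import { instrumentClient } from '@diabolicallabs/agent-sdk';

// redactPii is your own redaction function; neither package exports it.
declare function redactPii<T>(messages: T): T;

const base = createClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6',
  apiKey: process.env.ANTHROPIC_API_KEY!,
  hooks: {
    beforeCall: async (ctx) => ({
      messages: redactPii(ctx.messages),
    }),
  },
});

// Both the PII hook and the ingestion dispatch run on every call
const client = instrumentClient(base, {
  identity: { agentId: process.env.AGENT_ID! },
  ingestionUrl: process.env.SPEND_INGESTION_URL!,
  ingestionKey: process.env.SPEND_INGESTION_KEY!,
});
```
Failure behavior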
Ingestion failures are always silent — they never surface to the LLM caller.
- Endpoint down or slow: retried up to maxIngestionRetries times with exponential backoff.
- Retries exhausted: the record is dropped and a structured warning is logged (including call_id for audit).
- disabled: true: all instrumentation is skipped and the underlying client is returned directly.
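These rules amount to a bounded retry loop that never throws. The sketch below assumes a hypothetical send function that rejects when its AbortSignal fires (as fetch does), reads maxIngestionRetries as "retries after the first attempt", uses an assumed 100 ms base delay that doubles per retry (the real backoff schedule is not documented here), and uses console.warn in place of the package's structured logger. dispatchWithRetry and baseDelayMs are illustrative names, not package API.

```typescript
interface RetryOptions {
  maxIngestionRetries: number; // retries after the first attempt (default 3 in the config table)
  ingestionTimeoutMs: number; // per-attempt timeout (default 5000)
  baseDelayMs?: number; // assumed backoff base; not a documented option
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// One attempt plus up to maxIngestionRetries retries with exponential backoff.
// Resolves true on success, false when the record is dropped. Never rejects.
async function dispatchWithRetry(
  record: { call_id: string },
  send: (record: { call_id: string }, signal: AbortSignal) => Promise<void>,
  opts: RetryOptions,
): Promise<boolean> {
  const base = opts.baseDelayMs ?? 100;
  for (let attempt = 0; attempt <= opts.maxIngestionRetries; attempt++) {
    // Each attempt gets its own timeout; send must reject when the signal aborts.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.ingestionTimeoutMs);
    try {
      await send(record, controller.signal);
      return true;
    } catch {
      // Swallowed: ingestion errors never reach the LLM caller.
    } finally {
      clearTimeout(timer);
    }
    // Back off 1x, 2x, 4x, ... the base delay before the next retry.
    if (attempt < opts.maxIngestionRetries) await sleep(base * 2 ** attempt);
  }
  // Retries exhausted: drop the record, keeping call_id in the warning for audit.
  console.warn(JSON.stringify({ level: 'warn', msg: 'ingestion record dropped', call_id: record.call_id }));
  return false;
}
```

Because the function resolves to a boolean instead of rejecting, a caller can start it without awaiting and never risk an unhandled rejection.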
