# @ai-eval/bedrock-tracer
TypeScript SDK for integrating your Node.js / Amazon Bedrock agent with the Argus evaluation and observability dashboard.
The SDK provides two integration modes:
| Mode | When to use |
|---|---|
| `instrument()` | Live tracing — wrap an existing `BedrockRuntimeClient` to stream real-time traces into Argus with zero agent logic changes |
| `EvalTracer` + `EvalServer` | Batch evaluation — expose an eval endpoint that the Python runner calls to score offline prompt sets |
## Live Tracing — `instrument()`
### Installation

```bash
npm install
npm run build   # compiles TypeScript → dist/
```

### Usage
```typescript
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";
import { instrument } from "./dist/instrument";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

const tracer = instrument(client, {
  server: "http://localhost:7070",    // Argus UI base URL
  apiKey: "aek_...",                  // project API key from Argus Settings
  runName: "My Agent — Production",   // optional label shown in the UI
  tags: ["prod", "v2"],
  agentId: "my-agent",
  verbose: true,                      // log each push to console (default: true)
});

// Use `client` exactly as before — instrument() intercepts transparently
const response = await client.send(converseCommand);

// Stop intercepting when done
tracer.stop();
console.log("run ID:", tracer.runId);
```

### How it works
`instrument()` patches `client.send()` to intercept `ConverseCommand` and `ConverseStreamCommand` calls.

- On the first call of each agent invocation it captures:
  - **System prompt** from `command.input.system`
  - **Context** — the injected content block(s) prepended before the user question (e.g. page URL, user state JSON)
  - **User prompt** — the actual last user question (last content block in the current turn)
- After each LLM step it pushes a trace snapshot to `POST /api/ingest` — the same run is updated in-place, so you see live progress in the dashboard.
- When the model returns `end_turn`, the run is marked complete and state is reset for the next invocation.
- A conversation ID is auto-derived from a djb2 hash of the system prompt + first user message — the same conversation always gets the same ID across turns with no extra code (see the sketch below).
### Context vs. User Prompt separation
If your agent prepends an injected context block as a separate content block before the user's question, the SDK detects and separates them automatically:
```typescript
command.input.messages = [
  {
    role: "user",
    content: [
      { text: "Current page url is https://... user state: {...}" }, // ← context
      { text: "Give me a login overview" }                           // ← user prompt
    ]
  }
]
```

- `context` is sent separately in the ingest payload and shown in a collapsible Context card in the Argus trace detail panel.
- `prompt` is the actual user question — shown in the traces table and the User Prompt card.
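A rough sketch of the separation rule implied above, assuming the split happens on the text blocks of the final user message (the SDK's actual detection heuristics may differ):

```typescript
// Sketch only: with two or more text blocks in the final user message,
// treat everything before the last block as context.
interface TextBlock { text?: string }

function splitContextAndPrompt(blocks: TextBlock[]) {
  const texts = blocks.map((b) => b.text ?? "").filter((t) => t.length > 0);
  if (texts.length < 2) {
    return { context: undefined, prompt: texts[0] ?? "" };
  }
  return {
    context: texts.slice(0, -1).join("\n"),
    prompt: texts[texts.length - 1],
  };
}
```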
### InstrumentOptions

| Option | Type | Default | Description |
|---|---|---|---|
| `server` | `string` | required | Argus UI base URL, e.g. `"http://localhost:7070"` |
| `apiKey` | `string` | — | Project API key — automatically assigns traces to the matching project |
| `runName` | `string` | `"Live — <date>"` | Label for the run shown in the UI |
| `tags` | `string[]` | — | Tags attached to the run |
| `agentId` | `string` | — | Agent identifier stored in run metadata |
| `verbose` | `boolean` | `true` | Log each push to console |
### Instrumenter interface

```typescript
interface Instrumenter {
  readonly runId: string | null;  // eval-server run ID (null until first push succeeds)
  stop(): void;                   // restores the original client.send()
}
```

## Batch Evaluation — `EvalTracer` + `EvalServer`
Use this mode when the Python eval runner drives evaluation (it calls your agent server with each prompt from a prompt set).
```
Python eval runner
        │
        │  POST /eval { prompt, system_prompt, run_id, prompt_id }
        ▼
EvalServer (this SDK)
        │
        │  calls your handler
        ▼
EvalTracer.run(prompt, toolHandler, tools, systemPrompt)
        │
        │  Bedrock ConverseCommand loop
        ▼
Bedrock (Claude / Nova / Llama …)
        │
        │  tool_use → toolHandler → tool results
        ▼
EvalTracer returns { content, trace }
        │
        │  POST response { content, trace: EvalTrace }
        ▼
Python eval runner  ← evaluators run against content + trace
```
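To make the wire format concrete, one exchange could look like this; the field names come from the diagram above, while the values are invented for illustration:

```typescript
// Illustrative payloads only: values are made up, shapes follow the diagram.
const evalRequest = {
  prompt: "Give me a login overview",
  system_prompt: "You are a helpful assistant.",
  run_id: "run-2024-01-15",    // assigned by the Python runner
  prompt_id: "prompt-007",     // identifies the entry in the prompt set
};

const evalResponse = {
  content: "Here is an overview of the login flow...",
  trace: { /* EvalTrace, see "Shared types" below */ },
};
```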
### Quick start

```typescript
import {
  EvalTracer,
  EvalServer,
  ToolHandler,
} from "@ai-eval/bedrock-tracer";
import { Tool } from "@aws-sdk/client-bedrock-runtime";

const tools: Tool[] = [
  {
    toolSpec: {
      name: "calculator",
      description: "Evaluate a simple arithmetic expression",
      inputSchema: {
        json: {
          type: "object",
          properties: {
            expression: { type: "string", description: "e.g. '2 + 2'" },
          },
          required: ["expression"],
        },
      },
    },
  },
];

const toolHandler: ToolHandler = async (name, input) => {
  if (name === "calculator") {
    // Demo only: avoid evaluating untrusted input in production.
    const result = Function(`"use strict"; return (${input.expression})`)();
    return String(result);
  }
  throw new Error(`Unknown tool: ${name}`);
};

const tracer = new EvalTracer({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  region: "us-east-1",
  maxIterations: 10,
});

const server = new EvalServer({
  port: 3000,
  handler: async (req) => {
    const { content, trace } = await tracer.run(
      req.prompt,
      toolHandler,
      tools,
      req.system_prompt ?? undefined
    );
    return { content, trace };
  },
});

server.start();
// Listening on http://localhost:3000
//   POST /eval   — eval endpoint
//   GET  /health — health check
```
### EvalTracer options

| Option | Type | Default | Description |
|---|---|---|---|
| `modelId` | `string` | required | Bedrock model ID |
| `region` | `string` | `AWS_REGION` env / `"us-east-1"` | AWS region |
| `maxIterations` | `number` | `20` | Max agentic loop iterations |
| `client` | `BedrockRuntimeClient` | auto-created | Bring your own client |
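For example, `client` lets you bring a preconfigured client; a sketch, assuming the option simply replaces the auto-created `BedrockRuntimeClient`:

```typescript
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";
import { EvalTracer } from "@ai-eval/bedrock-tracer";

// Sketch: custom region and retry behaviour via your own client.
const tracer = new EvalTracer({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  client: new BedrockRuntimeClient({ region: "eu-west-1", maxAttempts: 5 }),
});
```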
```typescript
tracer.run(
  prompt: string,
  toolHandler?: ToolHandler,
  tools?: Tool[],
  systemPrompt?: string,
): Promise<{ content: string; trace: EvalTrace }>
```
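Both tool parameters are optional, so a minimal no-tool call reduces to:

```typescript
// Sketch: single-shot call with no tools and the default system prompt.
const { content, trace } = await tracer.run("What is 2 + 2?");
console.log(content, trace.totalOutputTokens, trace.terminatedReason);
```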
### EvalServer options

| Option | Type | Default | Description |
|---|---|---|---|
| `port` | `number` | `3000` | Listen port |
| `evalPath` | `string` | `"/eval"` | POST endpoint path |
| `healthPath` | `string` | `"/health"` | GET health check path |
| `handler` | `AgentHandler` | required | Your agent function |
| `logger` | Console-like | `console` | Custom logger |
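If the defaults don't fit your deployment, the port, paths, and logger can all be overridden; a sketch using the options from the table above:

```typescript
// Sketch: non-default port and endpoint paths; logger defaults to console.
const server = new EvalServer({
  port: 8080,
  evalPath: "/run-eval",
  healthPath: "/ping",
  handler: async (req) => tracer.run(req.prompt),
  logger: console,
});
server.start();
```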
### Connecting to the Python eval runner
```python
from eval_framework.connectors.node_connector import NodeAgentConnector
from eval_framework.runner.runner import EvaluationRunner

connector = NodeAgentConnector(
    base_url="http://localhost:3000",
    timeout_seconds=120,
)
runner = EvaluationRunner(connector=connector, ...)
```

## Shared types
```typescript
interface EvalTrace {
  steps: StepTrace[];          // one per Converse API call
  toolCallChain: string[];     // ordered list of every tool invoked
  toolsUsed: string[];         // unique tools (ordered by first use)
  totalInputTokens: number;
  totalOutputTokens: number;
  totalToolCalls: number;
  totalLatencyMs: number;
  finalResponse: string;
  terminatedReason: "end_turn" | "max_iterations" | "error" | "unknown";
}
```
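As an illustration of how these fields compose in practice, a custom check might assert over a finished trace (hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper: verifies the agent used the calculator tool,
// finished cleanly, and stayed within a tool-call budget.
function passesToolPolicy(trace: EvalTrace): boolean {
  return (
    trace.toolsUsed.includes("calculator") &&
    trace.terminatedReason === "end_turn" &&
    trace.totalToolCalls <= 5
  );
}
```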
## Building

```bash
npm run build   # compiles to dist/
npm run dev     # watch mode
```

## AWS credentials
Credentials are resolved by `@aws-sdk/client-bedrock-runtime` in standard priority order:

1. `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_SESSION_TOKEN` env vars
2. `~/.aws/credentials` profile (`AWS_PROFILE` env var)
3. IAM instance / task role
