@amplitude/ai
v0.3.4
Amplitude AI SDK - LLM usage tracking for Amplitude Analytics
@amplitude/ai
Agent analytics for Amplitude. Track every LLM call, user message, tool call, and quality signal as events in your Amplitude project — then build funnels, cohorts, and retention charts across AI and product behavior.
```bash
npm install @amplitude/ai @amplitude/analytics-node
```

```ts
import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ amplitude: ai, apiKey: process.env.OPENAI_API_KEY });
const agent = ai.agent('my-agent');

app.post('/chat', async (req, res) => {
  const session = agent.session({ userId: req.userId, sessionId: req.sessionId });
  const result = await session.run(async (s) => {
    s.trackUserMessage(req.body.message);
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    return response.choices[0].message.content;
  });
  await ai.flush();
  res.json({ response: result });
});

// Events: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency),
// [Agent] Session Start, [Agent] Session End — all tied to userId and sessionId
```

How to Get Started
Instrument with a coding agent (recommended)
```bash
npm install @amplitude/ai
npx amplitude-ai
```

The CLI prints a prompt to paste into any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.):

```
Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md
```
The agent reads the guide, scans your project, discovers your agents and LLM call sites, and instruments everything — provider wrappers, session lifecycle, multi-agent delegation, tool tracking, scoring, and a verification test. You review and approve each step.
Other paths
| Your situation | Recommended path | What happens |
|---|---|---|
| Manual setup | Follow the Quick Start guide | Agents + sessions + provider wrappers — the full event model |
| Just want to verify the SDK works | patch() (details below) | Aggregate cost/latency monitoring only — no user analytics, no funnels |
Start with full instrumentation. The coding agent workflow defaults to agents + sessions + provider wrappers. This gives you every event type, per-user analytics, and server-side enrichment.
`patch()` exists for quick verification or legacy codebases where you can't modify call sites, but it only captures `[Agent] AI Response` without user identity — no funnels, no cohorts, no retention.
| Property | Value |
| --- | --- |
| Name | @amplitude/ai |
| Version | 0.2.1 |
| Runtime | Node.js |
| Peer dependency | @amplitude/analytics-node >= 1.3.0 |
| Optional peers | openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime, @pydantic/genai-prices (cost), tiktoken or js-tiktoken (token counting) |
Table of Contents
- How to Get Started
- Installation
- Quick Start
- What You Get at Each Level
- Core Concepts
- Configuration
- Context Dict Conventions
- Privacy & Content Control
- Cache-Aware Cost Tracking
- Semantic Cache Tracking
- Model Tier Classification
- Provider Wrappers
- Streaming Tracking
- Attachment Tracking
- Implicit Feedback
- tool() and observe() HOFs
- Scoring Patterns
- Enrichments
- Debug and Dry-Run Modes
- Patching
- Auto-Instrumentation CLI
- Integrations
- Data Flow
- Which Integration Should I Use?
- Integration Patterns
- Serverless Environments
- Error Handling and Reliability
- Testing
- Troubleshooting
- Context Propagation
- Middleware
- Bulk Conversation Import
- Event Schema
- Event Property Reference
- Event JSON Examples
- Sending Events Without the SDK
- Register Event Schema in Your Data Catalog
- Utilities and Type Exports
- Constants
- API Reference
- For AI Coding Agents
- For Python SDK Migrators
- Need Help?
- Contributing
- License
Installation
```bash
npm install @amplitude/ai @amplitude/analytics-node
```

Install provider SDKs based on what you use (for example: openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime).
Quick Start
5-minute quick start
- Install:
  ```bash
  npm install @amplitude/ai @amplitude/analytics-node
  ```
- Get your API key: In Amplitude, go to Settings > Projects and copy the API key.
- Auto-instrument: Run `npx amplitude-ai` and paste the printed prompt into your AI coding agent — it scans your project, generates a bootstrap file, instruments your LLM call sites, and creates a verification test. Or follow the manual patterns below.
- Set your API key in the generated `.env` file and replace the placeholder `userId`/`sessionId`.
- Run your app. You should see `[Agent] User Message`, `[Agent] AI Response`, and `[Agent] Session End` within 30 seconds.
To verify locally before checking Amplitude, add `debug: true`:

```ts
const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ debug: true }),
});
// Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms
```

Tip: Call `enableLivePriceUpdates()` at startup so cost tracking stays accurate when new models are released. See Cache-Aware Cost Tracking.
Current Limitations
| Area | Status |
| ---- | ------ |
| Runtime | Node.js only (no browser). Python SDK available separately (amplitude-ai on PyPI). |
| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral, Bedrock (Converse/ConverseStream only). |
| CrewAI | Python-only; the Node.js export throws ProviderError by design. Use LangChain or OpenTelemetry integrations instead. |
| OTEL scope filtering | Not yet supported (Python SDK has allowed_scopes/blocked_scopes). |
| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts required for other providers' streamed responses. |
Is this for me?
Yes, if you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.
Already using an LLM observability tool? Keep it. The OTEL bridge adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.
Why this SDK?
Most AI observability tools give you traces. This SDK gives you per-turn events that live in your product analytics so you can:
- Build funnels from "user opens chat" through "AI responds" to "user converts"
- Create cohorts of users with low AI quality scores and measure their 7-day retention
- Answer "is this AI feature helping or hurting?" without moving data between tools
The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces one event per conversation turn with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.
Every AI event carries your product user_id. No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude.
Server-side enrichment does the evals for you. When content is available (contentMode: 'full'), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as [Agent] Score events with rubric scores, [Agent] Topic Classification events with category labels, and [Agent] Session Evaluation summaries, all queryable in charts, cohorts, and funnels alongside your product events.
Quality signals from every source in one event type. User thumbs up/down (source: 'user'), automated rubric scores from the enrichment pipeline (source: 'ai'), and reviewer assessments (source: 'reviewer') all produce [Agent] Score events differentiated by [Agent] Evaluation Source. One chart shows all three side by side. Filter by source or view them together. Filter by [Agent] Agent ID for per-agent quality attribution.
Three content-control tiers. full sends content and Amplitude runs enrichments for you. metadata_only sends zero content (you still get cost, latency, tokens, session grouping). customer_enriched sends zero content but lets you provide your own structured labels via trackSessionEnrichment().
Cache-aware cost tracking. Pass cacheReadTokens and cacheCreationTokens for accurate blended costs. Without this breakdown, naive cost calculation can overestimate by 2-5x for cache-heavy workloads.
What you can build
Once AI events are in Amplitude alongside your product events:
- Cohorts. "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.
- Funnels. "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.
- Retention. Do users with successful AI sessions retain better than those with failures? Segment retention curves by `[Agent] Overall Outcome` or task completion score.
- Agent analytics. Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.
How quality measurement works
The SDK captures quality signals at three layers, from most direct to most comprehensive:
1. Explicit user feedback — Instrument thumbs up/down, star ratings, or CSAT scores via trackScore(). Each call produces an [Agent] Score event with source: 'user':
```ts
ai.trackScore({
  userId: 'u1', name: 'user-feedback', value: 1,
  targetId: aiMessageId, targetType: 'message', source: 'user',
});
```

2. Implicit behavioral signals — The SDK auto-tracks behavioral proxies for quality on every turn, with zero additional instrumentation:
| Signal | Property | Event | Interpretation |
|--------|----------|-------|----------------|
| Copy | [Agent] Was Copied | [Agent] AI Response | User copied the output — positive |
| Regeneration | [Agent] Is Regeneration | [Agent] User Message | User asked for a redo — negative |
| Edit | [Agent] Is Edit | [Agent] User Message | User refined their prompt — friction |
| Abandonment | [Agent] Abandonment Turn | [Agent] Session End | User left after N turns — potential failure |
3. Automated server-side evaluation — When contentMode: 'full', Amplitude's enrichment pipeline runs LLM-as-judge evaluators on every session after it closes. No eval code to write or maintain:
| Rubric | What it measures | Scale |
|--------|-----------------|-------|
| task_completion | Did the agent accomplish what the user asked? | 0–2 |
| response_quality | Was the response clear, accurate, and helpful? | 0–2 |
| user_satisfaction | Did the user seem satisfied based on conversation signals? | 0–2 |
| agent_confusion | Did the agent misunderstand or go off track? | 0–2 |
Plus boolean detectors: negative_feedback (frustration phrases), task_failure (agent failed to deliver), data_quality_issues, and behavioral_patterns (clarification loops, topic drift). All results are emitted as [Agent] Score events with source: 'ai'.
All three layers use the same [Agent] Score event type, differentiated by [Agent] Evaluation Source ('user', 'ai', or 'reviewer'). One chart shows user feedback alongside automated evals. No joins, no separate tables.
What You Set vs What You Get
| You set | Where it comes from | What you unlock |
|---|---|---|
| API key | Amplitude project settings | Events reach Amplitude |
| userId | Your auth layer (JWT, session cookie, API token) | Per-user analytics, cohorts, retention |
| agentId | Your choice (e.g. 'chat-handler') | Per-agent cost, latency, quality dashboards |
| sessionId | Your conversation/thread/ticket ID | Multi-turn analysis, session enrichment, quality scores |
| description | Your choice (e.g. 'Handles support queries via GPT-4o') | Human-readable agent registry from event streams |
| *contentMode + redactPii* | Config (defaults work) | Server enrichment (automatic), PII scrubbing |
| *model, tokens, cost* | Auto-captured by provider wrappers | Cost analytics, latency monitoring |
| *parentAgentId* | Auto via child()/runAs() | Multi-agent hierarchy |
| env, agentVersion, context | Your deploy pipeline | Segmentation, regression detection |
Italicized rows require zero developer effort — they're automatic or have sensible defaults.
The minimum viable setup is 4 fields: API key, userId, agentId, sessionId. Everything else is either automatic or a progressive enhancement.
What You Get at Each Level
The coding agent workflow defaults to full instrumentation — the top row below. Lower levels exist as fallbacks, not as recommended starting points.
| Level | Events you get | What it unlocks in Amplitude |
|---|---|---|
| Full (agents + sessions + wrappers) | User Message, AI Response, Tool Call, Session Start/End, Score, Enrichments | Per-user funnels, cohorts, retention, session replay linking, quality scoring |
| Wrappers only (no sessions) | AI Response (with cost, tokens, latency) | Aggregate cost monitoring, model comparison |
| patch() only (no wrappers, no sessions) | AI Response (basic) | Aggregate call counts — useful for verification only |
Support matrix
- Fully supported in Node.js: OpenAI chat completions, OpenAI Responses API, Azure OpenAI chat completions, Anthropic messages, Gemini, Mistral, Bedrock, LangChain, OpenTelemetry, LlamaIndex.
- Partial support: zero-code `patch()` is best-effort by installed SDK and provider surface; OpenAI Agents tracing depends on incoming span payload shape from the host SDK.
- Not currently supported in Node.js: `AmplitudeCrewAIHooks` is Python-only and throws in Node.js.
Parity and runtime limitations
This section is the source of truth for behavior that is intentionally different from Python due to runtime constraints:
- `AmplitudeCrewAIHooks` is unsupported in Node.js (CrewAI is Python-only).
- `tool()` does not auto-generate JSON Schema from runtime type hints; pass `inputSchema` explicitly.
- Tool timeout behavior is async `Promise.race` based and cannot preempt synchronous CPU-bound code.
- Auto-instrument bootstrap differs by runtime (`node --import` in Node vs `sitecustomize` in Python).
- Request middleware differs by runtime (Express-compatible in Node vs ASGI middleware in Python).
Zero-code (for verification or legacy codebases)
patch() monkey-patches provider SDKs so existing LLM calls are tracked without code changes. This is useful for verifying the SDK works or for legacy codebases where you can't modify call sites. It only captures [Agent] AI Response without user identity — for the full event model, use agents + sessions (see Quick Start).
```ts
import { AmplitudeAI, patch } from '@amplitude/ai';
// OpenAI/Azure OpenAI chat completions (+ parse), OpenAI Responses, Anthropic, Gemini, Mistral,
// and Bedrock Converse calls are tracked when patching succeeds.
// No changes to your existing code needed.
import OpenAI from 'openai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
patch({ amplitudeAI: ai });

const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
// ^ automatically tracked as [Agent] AI Response
```

Warning: Patched calls that fire outside an active session context are silently dropped — no event is emitted and no error is thrown. If you instrument with `patch()` but see no events, this is the most likely cause. Wrap your LLM calls in `session.run()`, use the Express middleware, or pass context explicitly. See Session and Middleware.
Or use the CLI to auto-patch at process start without touching application code:
```bash
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js
```

Wrap (recommended for production)
Replace the provider constructor with the Amplitude-instrumented version for automatic tracking with full control over options per call:
```ts
import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
  amplitude: ai,
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
  // AI response tracked automatically via wrapper

  const responseV2 = await openai.responses.create({
    model: 'gpt-4.1',
    instructions: 'You are concise.',
    input: [{ role: 'user', content: 'Summarize this in one sentence.' }],
  });
  // OpenAI Responses API is also tracked automatically
});
```

Or wrap an existing client instance (supports OpenAI, Azure OpenAI, and Anthropic):
```ts
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);
```

All provider constructors and `wrap()` accept either an AmplitudeAI instance or a raw Amplitude client — both work:

```ts
new OpenAI({ amplitude: ai }); // AmplitudeAI instance
new OpenAI({ amplitude: ai.amplitude }); // raw Amplitude client
wrap(client, ai); // AmplitudeAI instance
wrap(client, ai.amplitude); // raw Amplitude client
```

Note: `wrap()` only supports OpenAI, Azure OpenAI, and Anthropic clients. For Gemini, Mistral, and Bedrock, use the SDK's provider classes directly (e.g., `new Gemini({ amplitude: ai })`).
Full control
Call tracking methods directly for maximum flexibility. Works with any LLM provider, including custom or self-hosted models:
```ts
import { AmplitudeAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session({ userId: 'user-123' });

await session.run(async (s) => {
  s.trackUserMessage('Summarize this document');
  const start = performance.now();
  const response = await myCustomLLM.generate('Summarize this document');
  const latencyMs = performance.now() - start;
  s.trackAiMessage(response.text, 'my-model-v2', 'custom', latencyMs, {
    inputTokens: response.usage.input,
    outputTokens: response.usage.output,
  });
});
```

Core Concepts
AmplitudeAI
Main client that wraps Amplitude analytics-node. Create it with an API key or an existing Amplitude instance:
```ts
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY' });
// Or with existing client:
const ai = new AmplitudeAI({ amplitude: existingAmplitudeClient });
```

BoundAgent
Agent with pre-bound defaults (agentId, description, userId, env, etc.). Use agent() to create:
```ts
const agent = ai.agent('support-bot', {
  description: 'Handles customer support queries via OpenAI GPT-4o',
  userId: 'user-123',
  env: 'production',
  customerOrgId: 'org-456',
});
```

Child agents inherit context from their parent and automatically set `parentAgentId` (note: `description` is agent-specific and is not inherited — pass it explicitly if needed):
```ts
const orchestrator = ai.agent('orchestrator', {
  description: 'Routes queries to specialized child agents',
  userId: 'user-123',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer', {
  description: 'Drafts responses using retrieved context',
});
// researcher.parentAgentId === 'orchestrator'
// description is not inherited: researcher has none; writer has its own
```

TenantHandle
Multi-tenant helper that pre-binds customerOrgId for all agents created from it:
```ts
const tenant = ai.tenant('org-456', { env: 'production' });
const agent = tenant.agent('support-bot', { userId: 'user-123' });
```

User Identity
User identity flows through the session, per-call, or middleware -- not at agent creation or patch time. This keeps the agent reusable across users.
Via sessions (recommended): pass userId when opening a session:
```ts
const agent = ai.agent('support-bot', { env: 'production' });
const session = agent.session({ userId: 'user-42' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  // userId inherited from session context
});
```

Per-call: pass `userId` on each tracking call (useful with the zero-code tier):
```ts
agent.trackUserMessage('Hello', {
  userId: 'user-42',
  sessionId: 'sess-1',
});
```

Via middleware: `createAmplitudeAIMiddleware` extracts user identity from the request (see Middleware):
```ts
app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) => req.headers['x-user-id'] ?? null,
  }),
);
```

Session
Async context manager using AsyncLocalStorage. Use session.run() to execute a callback within session context; session end is tracked automatically on exit:
```ts
const session = agent.session({ userId: 'user-123' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage(response.content, 'gpt-4', 'openai', latencyMs);
});
```

Start a new trace within an ongoing session to group related operations:
```ts
await session.run(async (s) => {
  const traceId = s.newTrace();
  s.trackUserMessage('Follow-up question');
  s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs);
});
```

For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass `idleTimeoutMinutes` so Amplitude knows the session is still active:
```ts
const session = agent.session({
  userId: 'user-123',
  idleTimeoutMinutes: 240, // expect up to 4-hour gaps
});
```

Without this, sessions with long idle periods may be closed and evaluated prematurely. The default is 30 minutes.
Link to Session Replay: If your frontend uses Amplitude's Session Replay, pass the browser's deviceId and browserSessionId to link AI sessions to browser recordings:
```ts
const session = agent.session({
  userId: 'user-123',
  deviceId: req.headers['x-amp-device-id'],
  browserSessionId: req.headers['x-amp-session-id'],
});
await session.run(async (s) => {
  s.trackUserMessage('What is retention?');
  // All events now carry [Amplitude] Session Replay ID = deviceId/browserSessionId
});
```

tool()
Higher-order function wrapping functions to auto-track as [Agent] Tool Call events:
```ts
import { tool } from '@amplitude/ai';

const searchDb = tool(
  async (query: { q: string }) => {
    return await db.search(query.q);
  },
  {
    name: 'search_db',
    inputSchema: { type: 'object', properties: { q: { type: 'string' } } },
  },
);
```

Note on `inputSchema`: Unlike the Python SDK, which accepts a Pydantic model class and extracts the JSON Schema automatically, the TypeScript SDK accepts a raw JSON Schema object. For type-safe schema generation, consider using Zod with zod-to-json-schema:
```ts
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const QuerySchema = z.object({ q: z.string(), limit: z.number().optional() });
const searchDb = tool(mySearchFn, {
  name: 'search_db',
  inputSchema: zodToJsonSchema(QuerySchema),
});
```

observe()
Higher-order function wrapping functions to auto-track as [Agent] Span events:
```ts
import { observe } from '@amplitude/ai';

const processRequest = observe(
  async (input: Request) => {
    return await handleRequest(input);
  },
  { name: 'process_request' },
);
```

Configuration
```ts
import { AIConfig, AmplitudeAI, ContentMode } from '@amplitude/ai';

const config = new AIConfig({
  contentMode: ContentMode.FULL, // FULL | METADATA_ONLY | CUSTOMER_ENRICHED — both ContentMode.FULL and 'full' work
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
  debug: false,
  dryRun: false,
});
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY', config });
```

| Option | Description |
| --- | --- |
| contentMode | 'full' (default), 'metadata_only', or 'customer_enriched'. Both ContentMode.FULL and 'full' work. |
| redactPii | Redact email, phone, SSN, credit card patterns |
| customRedactionPatterns | Additional regex patterns for redaction |
| debug | Log events to stderr |
| dryRun | Log without sending to Amplitude |
| validate | Enable strict validation of required fields |
| onEventCallback | Callback invoked after every tracked event: (event, statusCode, message) => void |
| propagateContext | Enable cross-service context propagation |
Context Dict Conventions
The context parameter on ai.agent() accepts an arbitrary Record<string, unknown> that is JSON-serialized and attached to every event as [Agent] Context. This is the recommended way to add segmentation dimensions without requiring new global properties.
Recommended keys:
| Key | Example Values | Use Case |
| --- | --- | --- |
| agent_type | "planner", "executor", "retriever", "router" | Filter/group analytics by agent role in multi-agent systems. |
| experiment_variant | "control", "treatment-v2", "prompt-rewrite-a" | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. |
| feature_flag | "new-rag-pipeline", "reasoning-model-enabled" | Track which feature flags were active during the session. |
| surface | "chat", "search", "copilot", "email-draft" | Identify which UI surface or product area triggered the AI interaction. |
| prompt_revision | "v7", "abc123", "2026-02-15" | Track which prompt version was used. Detect prompt regression when combined with agentVersion. |
| deployment_region | "us-east-1", "eu-west-1" | Segment by deployment region for latency analysis or compliance tracking. |
| canary_group | "canary", "stable" | Identify canary vs. stable deployments for progressive rollout monitoring. |
Example:

```ts
const agent = ai.agent('support-bot', {
  userId: 'u1',
  description: 'Handles customer support queries via OpenAI GPT-4o',
  agentVersion: '4.2.0',
  context: {
    agent_type: 'executor',
    experiment_variant: 'reasoning-enabled',
    surface: 'chat',
    feature_flag: 'new-rag-pipeline',
    prompt_revision: 'v7',
  },
});
// All events from this agent (and its sessions, child agents, and provider
// wrappers) will include [Agent] Context with these keys.
```

Context merging in child agents:
```ts
const parent = ai.agent('orchestrator', {
  context: { experiment_variant: 'treatment', surface: 'chat' },
});
const child = parent.child('researcher', {
  context: { agent_type: 'retriever' },
});
// child context = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
// Child keys override parent keys; parent keys absent from the child are preserved.
```

Querying in Amplitude: The `[Agent] Context` property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by `[Agent] Context.agent_type` to see metrics by agent role.
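The context-merging rule is a shallow merge. As a minimal sketch (illustrative only — not the SDK's internal implementation, and `mergeAgentContext` is a hypothetical name):

```typescript
// Shallow merge matching the inheritance rule described above:
// child keys override parent keys; parent keys absent from the child survive.
function mergeAgentContext(
  parent: Record<string, unknown>,
  child: Record<string, unknown>,
): Record<string, unknown> {
  return { ...parent, ...child };
}

const merged = mergeAgentContext(
  { experiment_variant: 'treatment', surface: 'chat' },
  { agent_type: 'retriever' },
);
// merged = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
```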
Note on `experiment_variant` and server-generated events: Context keys appear on all SDK-emitted events (`[Agent] User Message`, `[Agent] AI Response`, etc.). Server-generated events (`[Agent] Session Evaluation`, `[Agent] Score` with `source: "ai"`) do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from `[Agent] Context` on SDK events.
Privacy & Content Control
Three content modes control what data is sent to Amplitude:
| Mode | Message Content | Token/Cost/Latency | Session Grouping | Server Enrichments |
| ------------------- | ------------------------- | ------------------ | ---------------- | ------------------ |
| FULL | Sent (with PII redaction) | Yes | Yes | Yes (auto) |
| METADATA_ONLY | Not sent | Yes | Yes | No |
| CUSTOMER_ENRICHED | Not sent | Yes | Yes | Yes (you provide) |
FULL mode (default)
Message content is captured and sent to Amplitude. When you opt in with redactPii: true, built-in PII redaction patterns scrub emails, phone numbers, SSNs, credit card numbers, and base64 image data before the event leaves your process:
```ts
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
});
```

With `redactPii: true`, a message like "Contact me at jane@example.com or 555-123-4567" is sanitized to "Contact me at [email] or [phone]" before being sent.
Built-in phone and SSN detection are currently tuned for common US formats. If you need broader international coverage, add explicit customRedactionPatterns for your locales.
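To illustrate the behavior (these simplified patterns are stand-ins, not the SDK's actual built-in regexes):

```typescript
// Illustrative only — the SDK ships its own redaction patterns.
// These simplified US-format regexes merely demonstrate the substitution.
function redactPiiSketch(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[email]')
    .replace(/\b\d{3}[-.]\d{3}[-.]\d{4}\b/g, '[phone]');
}

redactPiiSketch('Call 555-123-4567'); // → 'Call [phone]'
```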
Add custom redaction patterns for domain-specific PII:
```ts
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
  customRedactionPatterns: ['ACCT-\\d{6,}', 'internal-key-[a-f0-9]+'],
});
```

Custom redaction patterns are your responsibility: avoid expensive or catastrophic regexes in performance-sensitive paths.
Message content is stored at full length with no truncation or size limits. The $llm_message property is whitelisted server-side, and the Node SDK does not apply per-property string truncation.
METADATA_ONLY mode
No message content is sent. You still get token counts, cost, latency, model name, and session grouping — everything needed for cost analytics and performance monitoring:
```ts
const config = new AIConfig({
  contentMode: ContentMode.METADATA_ONLY,
});
```

Use this when you cannot send user content to a third-party analytics service (e.g., regulated industries, sensitive data).
CUSTOMER_ENRICHED mode
Like METADATA_ONLY (no content sent), but designed for workflows where you enrich sessions with your own classifications, quality scores, and topic labels via the SessionEnrichments API:
```ts
const config = new AIConfig({
  contentMode: ContentMode.CUSTOMER_ENRICHED,
});

// Later, after running your own classification pipeline:
const enrichments = new SessionEnrichments({
  qualityScore: 0.85,
  overallOutcome: 'resolved',
});
session.setEnrichments(enrichments);
```

PrivacyConfig (advanced)
PrivacyConfig is derived from AIConfig via config.toPrivacyConfig(). For advanced use, create directly:
```ts
import { PrivacyConfig } from '@amplitude/ai';

const privacy = new PrivacyConfig({
  privacyMode: true,
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
});
```

When to use which mode
- FULL: You want to see actual conversation content in Amplitude, debug individual sessions, and leverage server-side enrichment pipelines. Best for development, internal tools, and applications where data sharing agreements permit it.
- METADATA_ONLY: You want cost/performance analytics without exposing any message content. Best for regulated environments (healthcare, finance) or when content contains proprietary data.
- CUSTOMER_ENRICHED: You want the privacy of METADATA_ONLY but also want structured analytics (topic classification, quality scores) that you compute on your own infrastructure before sending to Amplitude.
Cache-Aware Cost Tracking
When using provider prompt caching (Anthropic's cache, OpenAI's cached completions, etc.), pass cache token breakdowns for accurate cost calculation:
```ts
s.trackAiMessage(
  response.content,
  'claude-3.5-sonnet',
  'anthropic',
  latencyMs,
  {
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    cacheReadTokens: response.usage.cache_read_input_tokens,
    cacheCreationTokens: response.usage.cache_creation_input_tokens,
  },
);
```

Without cache breakdowns, cost calculation treats all input tokens at the standard rate. With caching enabled, cache-read tokens are typically 10x cheaper than standard input tokens and cache-creation tokens are ~25% more expensive. Naive cost calculation without this breakdown can overestimate costs by 2-5x for cache-heavy workloads.
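To see why the breakdown matters, here is the blended-cost arithmetic as a sketch. The rates are hypothetical round numbers; the 10x and +25% multipliers come from the ratios above, and real pricing comes from @pydantic/genai-prices:

```typescript
// Hypothetical rates: $3 per 1M standard input tokens, $15 per 1M output tokens.
// Cache reads at 10x cheaper, cache creation at +25%, per the ratios above.
function blendedCostUsd(u: {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}): number {
  const INPUT = 3 / 1_000_000;
  const OUTPUT = 15 / 1_000_000;
  return (
    u.inputTokens * INPUT +
    u.outputTokens * OUTPUT +
    u.cacheReadTokens * INPUT * 0.1 + // cache reads: 10x cheaper
    u.cacheCreationTokens * INPUT * 1.25 // cache creation: +25%
  );
}

// 90k of 100k input tokens served from cache:
const blended = blendedCostUsd({
  inputTokens: 10_000,
  outputTokens: 1_000,
  cacheReadTokens: 90_000,
  cacheCreationTokens: 0,
});
// Naive: all 100k input tokens billed at the standard rate:
const naive = blendedCostUsd({
  inputTokens: 100_000,
  outputTokens: 1_000,
  cacheReadTokens: 0,
  cacheCreationTokens: 0,
});
// naive / blended ≈ 4.4 — the 2-5x overestimate described above
```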
The SDK tracks four token categories:
- `[Agent] Input Tokens` — standard (non-cached) input tokens
- `[Agent] Output Tokens` — generated output tokens
- `[Agent] Cache Read Tokens` — tokens read from provider cache (cheap)
- `[Agent] Cache Creation Tokens` — tokens written to provider cache (slightly more expensive)
Cost is auto-calculated when token counts are provided and the @pydantic/genai-prices package is installed. When genai-prices is not available, calculateCost() returns 0 (never null). You can also pass totalCostUsd directly if you compute cost yourself:
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
totalCostUsd: 0.0034,
});

Note — pricing data freshness. Cost calculation relies on pricing data bundled in the installed @pydantic/genai-prices package. Newly released models may return $0 until the package is updated. To get the latest pricing between package releases, opt in to live updates at startup:

import { enableLivePriceUpdates } from '@amplitude/ai';
enableLivePriceUpdates(); // fetches latest prices from the genai-prices GitHub repo hourly

This makes periodic HTTPS requests to raw.githubusercontent.com (~26 KB each). Only enable this in environments where outbound network access is permitted.
Semantic Cache Tracking
Track full-response semantic cache hits (distinct from token-level prompt caching above):
s.trackAiMessage(cachedResponse.content, 'gpt-4o', 'openai', latencyMs, {
wasCached: true, // served from Redis/semantic cache
});

Maps to [Agent] Was Cached. Enables "cache hit rate" charts and cost optimization analysis. The property is only emitted when true; it is omitted (not set to false) when the response was not cached.
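Because the property is omitted rather than set to false, downstream hit-rate math should treat absence as a miss. A minimal sketch (event shape here is illustrative, not the SDK's wire format):

```typescript
// Compute a cache hit rate from Was Cached flags. Absence counts as a miss,
// since the SDK only emits the property when it is true.
function cacheHitRate(events: Array<{ wasCached?: boolean }>): number {
  if (events.length === 0) return 0;
  return events.filter((e) => e.wasCached === true).length / events.length;
}
```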
Model Tier Classification
Models are automatically classified into tiers for cost/performance analysis:
| Tier | Examples | When to Use |
| ----------- | -------------------------------------------------------- | ------------------------------ |
| fast | gpt-4o-mini, claude-3-haiku, gemini-flash, gpt-3.5-turbo | High-volume, latency-sensitive |
| standard | gpt-4o, claude-3.5-sonnet, gemini-pro, llama, command | General purpose |
| reasoning | o1, o3-mini, deepseek-r1, claude with extended thinking | Complex reasoning tasks |
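The mapping in the table above can be approximated with a small substring classifier. This is an illustrative sketch only; the SDK's actual inferModelTier export may use different rules.

```typescript
type ModelTier = 'fast' | 'standard' | 'reasoning';

// Sketch of name-based tier inference mirroring the table above
// (not the SDK's internal implementation).
function sketchInferTier(model: string): ModelTier {
  const m = model.toLowerCase();
  if (m.startsWith('o1') || m.startsWith('o3') || m.includes('deepseek-r1')) {
    return 'reasoning';
  }
  if (['mini', 'haiku', 'flash', '3.5-turbo'].some((k) => m.includes(k))) {
    return 'fast';
  }
  return 'standard'; // gpt-4o, claude-3.5-sonnet, gemini-pro, etc.
}
```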
The tier is inferred automatically from the model name and attached as [Agent] Model Tier on every [Agent] AI Response event:
import {
inferModelTier,
TIER_FAST,
TIER_REASONING,
TIER_STANDARD,
} from '@amplitude/ai';
inferModelTier('gpt-4o-mini'); // 'fast'
inferModelTier('claude-3.5-sonnet'); // 'standard'
inferModelTier('o1-preview'); // 'reasoning'

Override the auto-inferred tier for custom or fine-tuned models:
s.trackAiMessage(
response.content,
'ft:gpt-4o:my-org:custom',
'openai',
latencyMs,
{
modelTier: 'standard',
inputTokens: response.usage.prompt_tokens,
outputTokens: response.usage.completion_tokens,
},
);

Provider Wrappers
Use instrumented provider wrappers for automatic tracking:
| Provider | Class | Package |
| ----------- | ------------- | ------------------------------- |
| OpenAI | OpenAI | openai |
| Anthropic | Anthropic | @anthropic-ai/sdk |
| Gemini | Gemini | @google/generative-ai |
| AzureOpenAI | AzureOpenAI | openai |
| Bedrock | Bedrock | @aws-sdk/client-bedrock-runtime |
| Mistral | Mistral | @mistralai/mistralai |
Feature coverage by provider:
| Feature | OpenAI | Anthropic | Gemini | AzureOpenAI | Bedrock | Mistral |
| --------------------- | ------ | --------- | ------ | ----------- | ------- | ------- |
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool call tracking | Yes | Yes | No | Yes | Yes | No |
| TTFB measurement | Yes | Yes | No | Yes | No | No |
| Cache token stats | Yes | Yes | No | No | No | No |
| Responses API | Yes | - | - | - | - | - |
| Reasoning content | Yes | Yes | No | Yes | No | No |
| System prompt capture | Yes | Yes | Yes | Yes | Yes | Yes |
| Cost estimation | Yes | Yes | Yes | Yes | Yes | Yes |
Provider wrappers use injected TrackFn callbacks instead of class hierarchy casts, enabling easier composition and custom tracking logic.
Bedrock model IDs like us.anthropic.claude-3-5-sonnet are automatically normalized for price lookup (e.g., to claude-3-5-sonnet).
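That normalization can be sketched as string surgery. The exact rules (which prefixes and suffixes are stripped) are an assumption here; only the `us.anthropic.claude-3-5-sonnet` → `claude-3-5-sonnet` example comes from the text above.

```typescript
// Illustrative normalization of Bedrock model IDs for price lookup:
// strip a region prefix (us./eu./apac.) and the vendor segment, then
// drop a trailing "-vN:0"-style version suffix if present (assumed shape).
function normalizeBedrockModelId(modelId: string): string {
  const withoutRegion = modelId.replace(/^(us|eu|apac)\./, '');
  const parts = withoutRegion.split('.');
  const name = parts.length > 1 ? parts.slice(1).join('.') : withoutRegion;
  return name.replace(/-v\d+:\d+$/, '');
}
```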
OpenAI example:
import { AmplitudeAI, OpenAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
amplitude: ai,
apiKey: process.env.OPENAI_API_KEY,
});
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async (s) => {
const resp = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello' }],
});
// AI response tracked automatically via wrapper
});

Or wrap an existing client:
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);

Streaming Tracking
Automatic streaming (provider wrappers)
Provider wrappers (OpenAI, AzureOpenAI, Anthropic, Gemini, Mistral, Bedrock) automatically detect supported streaming responses and track them transparently. The wrapper intercepts the AsyncIterable, accumulates chunks, measures TTFB, and emits an [Agent] AI Response event after the stream is fully consumed:
const openai = new OpenAI({ amplitude: ai, apiKey: '...' });
// Streaming is handled automatically — just iterate the result
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// ^ AI Response event emitted automatically after the loop ends

Manual streaming
Track streaming responses manually with time-to-first-byte (TTFB) for latency analysis:
s.trackAiMessage(fullContent, 'gpt-4o', 'openai', totalMs, {
isStreaming: true,
ttfbMs: timeToFirstByte,
inputTokens: usage.prompt_tokens,
outputTokens: usage.completion_tokens,
});

The SDK tracks two timing properties for streaming:
- [Agent] Latency Ms — total wall-clock time from request to final chunk
- [Agent] TTFB Ms — time-to-first-byte, the delay before the first token arrives
StreamingAccumulator
For manual streaming, use StreamingAccumulator to collect chunks and automatically measure TTFB:
import { StreamingAccumulator } from '@amplitude/ai';
const accumulator = new StreamingAccumulator();
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
accumulator.addContent(content);
}
}
accumulator.setUsage({
inputTokens: finalUsage.prompt_tokens,
outputTokens: finalUsage.completion_tokens,
});
s.trackAiMessage(
accumulator.content,
'gpt-4o',
'openai',
accumulator.elapsedMs,
{
isStreaming: true,
ttfbMs: accumulator.ttfbMs,
inputTokens: accumulator.inputTokens,
outputTokens: accumulator.outputTokens,
finishReason: accumulator.finishReason,
},
);

The accumulator records TTFB automatically when addContent() is called for the first time, and tracks total elapsed time via elapsedMs. For streaming errors, call setError(message) to set isError and errorMessage, which are included on the tracked AI Response event.
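The timing behavior described above (TTFB captured on the first chunk, elapsed time on demand) can be re-sketched in a few lines. This is a conceptual model, not the real StreamingAccumulator, whose API surface may differ.

```typescript
// Minimal sketch of the accumulator timing semantics: TTFB is recorded on
// the first addContent() call; elapsedMs is measured from construction.
class SketchAccumulator {
  private start = Date.now();
  private firstChunkAt: number | null = null;
  content = '';

  addContent(chunk: string): void {
    if (this.firstChunkAt === null) this.firstChunkAt = Date.now();
    this.content += chunk;
  }

  get ttfbMs(): number | null {
    return this.firstChunkAt === null ? null : this.firstChunkAt - this.start;
  }

  get elapsedMs(): number {
    return Date.now() - this.start;
  }
}
```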
Attachment Tracking
Track files sent with user messages (images, PDFs, URLs):
s.trackUserMessage('Analyze this document', {
attachments: [
{ type: 'image', name: 'chart.png', size_bytes: 102400 },
{ type: 'pdf', name: 'report.pdf', size_bytes: 2048576 },
],
});

The SDK automatically derives aggregate properties from the attachment array:
- [Agent] Has Attachments — boolean, true when attachments are present
- [Agent] Attachment Count — number of attachments
- [Agent] Attachment Types — deduplicated list of attachment types (e.g., ["image", "pdf"])
- [Agent] Total Attachment Size Bytes — sum of all size_bytes values
- [Agent] Attachments — serialized JSON of the full attachment metadata
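The derivation is mechanical and can be sketched as below. The function name and return keys are illustrative; only the aggregation rules (count, dedup, sum, JSON serialization) come from the property list above.

```typescript
interface Attachment {
  type: string;
  name: string;
  size_bytes: number;
}

// Sketch of how the aggregate attachment properties are derived
// (not the SDK's internal code).
function deriveAttachmentProps(attachments: Attachment[]) {
  return {
    hasAttachments: attachments.length > 0,
    attachmentCount: attachments.length,
    attachmentTypes: [...new Set(attachments.map((a) => a.type))],
    totalAttachmentSizeBytes: attachments.reduce((s, a) => s + a.size_bytes, 0),
    attachments: JSON.stringify(attachments),
  };
}

const props = deriveAttachmentProps([
  { type: 'image', name: 'chart.png', size_bytes: 102_400 },
  { type: 'pdf', name: 'report.pdf', size_bytes: 2_048_576 },
]);
```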
Attachments can also be tracked on AI responses (e.g., when the model generates images or files):
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
attachments: [{ type: 'image', name: 'generated.png', size_bytes: 204800 }],
});

Implicit Feedback
Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:
// User asks a question
s.trackUserMessage('How do I create a funnel?');
// AI responds — user copies the answer (positive signal)
s.trackAiMessage('To create a funnel, go to...', 'gpt-4o', 'openai', latencyMs, {
wasCopied: true,
});
// User regenerates (negative signal — first response wasn't good enough)
s.trackUserMessage('How do I create a funnel?', {
isRegeneration: true,
});
// User edits their question (refining intent)
s.trackUserMessage('How do I create a conversion funnel for signups?', {
isEdit: true,
editedMessageId: originalMsgId, // links the edit to the original
});

Track abandonment at session end — a low abandonmentTurn (e.g., 1) strongly signals first-response dissatisfaction:
agent.trackSessionEnd({
sessionId: 'sess-1',
abandonmentTurn: 1, // user left after first AI response
});

These signals map to [Agent] Was Copied, [Agent] Is Regeneration, [Agent] Is Edit, [Agent] Edited Message ID, and [Agent] Abandonment Turn. Use them in Amplitude to build quality dashboards without requiring user surveys.
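As a downstream illustration, copy and regeneration rates can be computed from these signals. The event shape here is hypothetical (not the SDK's wire format); it only shows how the boolean flags above translate into quality metrics.

```typescript
interface TrackedEvent {
  type: 'user_message' | 'ai_response';
  wasCopied?: boolean; // positive signal on AI responses
  isRegeneration?: boolean; // negative signal on user messages
}

// Aggregate implicit-feedback signals into simple rates.
function feedbackRates(events: TrackedEvent[]) {
  const ai = events.filter((e) => e.type === 'ai_response');
  const user = events.filter((e) => e.type === 'user_message');
  return {
    copyRate: ai.length ? ai.filter((e) => e.wasCopied).length / ai.length : 0,
    regenerationRate: user.length
      ? user.filter((e) => e.isRegeneration).length / user.length
      : 0,
  };
}

const rates = feedbackRates([
  { type: 'user_message' },
  { type: 'ai_response', wasCopied: true },
  { type: 'user_message', isRegeneration: true },
  { type: 'ai_response' },
]);
```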
tool() and observe() HOFs
tool()
Wraps an async function to track as [Agent] Tool Call:
import { tool, ToolCallTracker } from '@amplitude/ai';
ToolCallTracker.setAmplitude(ai.amplitude, 'user-123', {
sessionId: 'sess-1',
traceId: 'trace-1',
agentId: 'my-agent',
privacyConfig: ai.config.toPrivacyConfig(),
});
const fetchWeather = tool(
async (args: { city: string }) => {
return await weatherApi.get(args.city);
},
{
name: 'fetch_weather',
inputSchema: { type: 'object', properties: { city: { type: 'string' } } },
timeoutMs: 5000,
onError: (err, name) => console.error(`Tool ${name} failed:`, err),
},
);

observe()
Wraps a function to track as [Agent] Span:
import { observe } from '@amplitude/ai';
const enrichData = observe(async (data: unknown) => transform(data), {
name: 'enrich_data',
agentId: 'enricher',
});

Scoring Patterns
Track quality feedback from multiple sources using the score() method. Scores are emitted as [Agent] Score events.
User Feedback (thumbs up/down)
s.score('thumbs-up', 1, messageId, { source: 'user' });
s.score('thumbs-down', 0, messageId, { source: 'user' });

Numeric Rating
s.score('rating', 4, messageId, {
source: 'user',
comment: 'Very helpful but slightly verbose',
});

LLM-as-Judge
s.score('quality', 0.85, messageId, {
source: 'ai',
comment: 'Clear and accurate response with proper citations',
});

Session-Level Scoring
Score an entire session rather than a single message by setting targetType to 'session':
s.score('session-quality', 0.9, session.sessionId, {
targetType: 'session',
source: 'ai',
});

Score Properties
Each [Agent] Score event includes:
- [Agent] Score Name — the name you provide (e.g., "thumbs-up", "quality")
- [Agent] Score Value — numeric value
- [Agent] Target ID — the message ID or session ID being scored
- [Agent] Target Type — "message" (default) or "session"
- [Agent] Evaluation Source — "user" (default) or "ai"
- [Agent] Comment — optional free-text comment (respects content mode)
Enrichments
Session Enrichments
Attach structured metadata to sessions for analytics. Enrichments are included when the session auto-ends:
import {
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
const enrichments = new SessionEnrichments({
qualityScore: 0.85,
sentimentScore: 0.7,
overallOutcome: 'resolved',
topicClassifications: {
intent: new TopicClassification({
l1: 'billing',
primary: 'billing',
values: ['billing', 'refund'],
subcategories: ['REFUND_REQUEST', 'PRICING_QUESTION'],
}),
},
rubrics: [
new RubricScore({
name: 'helpfulness',
score: 4,
rationale: 'Provided clear step-by-step instructions',
}),
new RubricScore({
name: 'accuracy',
score: 5,
rationale: 'All information was factually correct',
}),
],
agentChain: ['orchestrator', 'researcher', 'writer'],
rootAgentName: 'orchestrator',
requestComplexity: 'medium',
});
session.setEnrichments(enrichments);
// Enrichments are included automatically when session.run() completes

Track Enrichments Separately
Send enrichments as a standalone event without ending the session:
agent.trackSessionEnrichment(enrichments, {
sessionId: 'sess-abc123',
});

End-to-End Example: customer_enriched Mode
This mode is for teams that run their own evaluation pipeline (or can't send message content to Amplitude) but still want rich session-level analytics. Here's a complete workflow:
import {
AIConfig,
AmplitudeAI,
ContentMode,
MessageLabel,
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
// 1. Configure: no content sent to Amplitude
const ai = new AmplitudeAI({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
config: new AIConfig({
contentMode: ContentMode.CUSTOMER_ENRICHED,
}),
});
const agent = ai.agent('support-bot', {
description: 'Handles support conversations in metadata-only mode',
agentVersion: '2.1.0',
});
// 2. Run the conversation — content is NOT sent (metadata only)
const session = agent.session({ userId: 'user-42' });
const { sessionId, messageIds } = await session.run(async (s) => {
const msgIds: string[] = [];
msgIds.push(s.trackUserMessage('Why was I charged twice?'));
msgIds.push(
s.trackAiMessage(
aiResponse.content,
'gpt-4o',
'openai',
latencyMs,
),
);
return { sessionId: s.sessionId, messageIds: msgIds };
});
// 3. Run your eval pipeline on the raw messages (e.g., your own LLM judge)
const evalResults = await myEvalPipeline(conversationHistory);
// 4. Ship enrichments back to Amplitude
const enrichments = new SessionEnrichments({
qualityScore: evalResults.quality,
sentimentScore: evalResults.sentiment,
overallOutcome: evalResults.outcome,
topicClassifications: {
'billing': new TopicClassification({
topic: 'billing-dispute',
confidence: 0.92,
}),
},
rubricScores: [
new RubricScore({ name: 'accuracy', score: 4, maxScore: 5 }),
new RubricScore({ name: 'helpfulness', score: 5, maxScore: 5 }),
],
messageLabels: {
[messageIds[0]]: [
new MessageLabel({ key: 'intent', value: 'billing-dispute', confidence: 0.94 }),
],
},
customMetadata: { eval_model: 'gpt-4o-judge-v2' },
});
agent.trackSessionEnrichment(enrichments, { sessionId });

This produces the same Amplitude event properties as Amplitude's built-in server-side enrichment (topics, rubrics, outcomes, message labels), but sourced from your pipeline. Use it when compliance requires zero-content transmission, or when you need custom evaluation logic beyond what the built-in enrichment provides.
Available Enrichment Fields
- Quality & Sentiment: qualityScore, sentimentScore
- Outcome: overallOutcome, hasTaskFailure, taskFailureType, taskFailureReason
- Topics: topicClassifications — a map of taxonomy name to TopicClassification
- Rubrics: rubrics — array of RubricScore with name, score, rationale, and evidence
- Failure Signals: hasNegativeFeedback, hasDataQualityIssues, hasTechnicalFailure
- Error Analysis: errorCategories, technicalErrorCount
- Behavioral: behavioralPatterns, negativeFeedbackPhrases, dataQualityIssues
- Agent Topology: agentChain, rootAgentName
- Complexity: requestComplexity
- Labels: messageLabels — per-message labels keyed by message ID
- Custom: customMetadata — arbitrary key/value data for your own analytics
Message Labels
Attach classification labels to individual messages within a session. Labels are flexible key-value pairs for filtering and segmentation in Amplitude.
Common use cases: routing tags (flow, surface), classifier output (intent, sentiment, toxicity), business context (tier, plan).
Inline labels (at tracking time):
import { MessageLabel } from '@amplitude/ai';
s.trackUserMessage('I want to cancel my subscription', {
labels: [
new MessageLabel({
key: 'intent',
value: 'cancellation',
confidence: 0.95,
}),
new MessageLabel({
key: 'sentiment',
value: 'frustrated',
confidence: 0.8,
}),
],
});

Retrospective labels (after the session, from a background pipeline):
When classifier results arrive after the session ends, attach them via SessionEnrichments.messageLabels, keyed by the messageId returned from tracking calls:
import { MessageLabel, SessionEnrichments } from '@amplitude/ai';
const enrichments = new SessionEnrichments({
messageLabels: {
[userMsgId]: [
new MessageLabel({ key: 'intent', value: 'cancellation', confidence: 0.94 }),
],
[aiMsgId]: [
new MessageLabel({ key: 'quality', value: 'good', confidence: 0.91 }),
],
},
});
agent.trackSessionEnrichment(enrichments, { sessionId: 'sess-abc123' });

Labels are emitted as [Agent] Message Labels on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".
Debug and Dry-Run Modes
Debug Mode
Prints a colored (ANSI) summary of every tracked event to stderr. All 8 event types (User Message, AI Response, Tool Call, Embedding, Span, Session End, Session Enrichment, Score) are formatted. Events are still sent to Amplitude:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ debug: true }),
});
// stderr output for each event:
// [amplitude-ai] [Agent] AI Response | user=user-123 session=sess-abc agent=my-agent model=gpt-4o latency=1203ms tokens=150→847 cost=$0.0042
// [amplitude-ai] [Agent] Tool Call | user=user-123 session=sess-abc agent=my-agent tool=search_db success=true latency=340ms
// [amplitude-ai] [Agent] User Message | user=user-123 session=sess-abc agent=my-agent

Dry-Run Mode
Logs the full event JSON to stderr WITHOUT sending to Amplitude. Events are never transmitted:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ dryRun: true }),
});
// stderr: full JSON of each event
// Useful for local development, CI pipelines, and validating event shape

Environment Variable Configuration
Both modes can be enabled via environment variables when using auto-instrumentation:
AMPLITUDE_AI_DEBUG=true amplitude-ai-instrument node app.js

Patching
Monkey-patch provider SDKs to auto-track without changing call sites. This is useful for quick verification that the SDK is connected, or for legacy codebases where modifying call sites is impractical. For the full event model (user messages, sessions, scoring, enrichments), use agents + sessions as shown in Quick Start.
import {
AmplitudeAI,
patch,
patchOpenAI,
unpatch,
unpatchOpenAI,
} from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
// Patch installed/available providers (OpenAI, Anthropic, Gemini, Mistral, Bedrock)
patch({ amplitudeAI: ai });
// Or patch specific provider
patchOpenAI({ amplitudeAI: ai });
// Unpatch
unpatch();
unpatchOpenAI();

Available patch functions: patchOpenAI, patchAnthropic, patchAzureOpenAI, patchGemini, patchMistral, patchBedrock. Each has a corresponding unpatch function: unpatchOpenAI, unpatchAnthropic, unpatchAzureOpenAI, unpatchGemini, unpatchMistral, unpatchBedrock.
patch() returns a string[] of providers where at least one supported surface was successfully patched (e.g., ['openai', 'anthropic']), matching the Python SDK's return signature.
Patch surface notes:
- OpenAI/Azure OpenAI: chat.completions.create, chat.completions.parse, and the Responses API are instrumented (including streaming shapes where exposed by the SDK).
- Bedrock: only ConverseCommand and ConverseStreamCommand are instrumented when patching client.send.
Auto-Instrumentation CLI
Preload the register module to auto-patch providers at process start:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js

Or directly with Node's ESM preload flag:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true node --import @amplitude/ai/register app.js

Environment variables:
| Variable | Description |
| --------------------------- | ----------------------------------------------- |
| AMPLITUDE_AI_API_KEY | Required for auto-patch |
| AMPLITUDE_AI_AUTO_PATCH | Must be "true" to enable |
| AMPLITUDE_AI_CONTENT_MODE | full, metadata_only, or customer_enriched |
| AMPLITUDE_AI_DEBUG | "true" for debug output to stderr |
Doctor CLI
Validate setup (env, provider deps, mock event capture, mock flush path):
amplitude-ai doctor

Useful flags:
amplitude-ai doctor --no-mock-check
Status
Show the installed SDK version, detected provider packages, and environment variable configuration at a glance:
amplitude-ai status

Shell Completions
Enable tab-completion for all CLI commands and flags:
# bash
eval "$(amplitude-ai-completions bash)"
# zsh
eval "$(amplitude-ai-completions zsh)"

MCP Server
Run the SDK-local MCP server over stdio:
amplitude-ai mcp

MCP surface:
| Tool | Description |
| ------------------------- | -------------------------------------------------------------------------- |
| scan_project | Scan project structure, detect providers, frameworks, and multi-agent patterns |
| validate_file | Analyze a source file to detect uninstrumented LLM call sites |
| instrument_file | Apply instrumentation transforms to a source file |
| generate_verify_test | Generate a dry-run verification test using MockAmplitudeAI |
| get_event_schema | Return the full event schema and property definitions |
| get_integration_pattern | Return canonical instrumentation code patterns |
| validate_setup | Check env vars and dependency presence |
| suggest_instrumentation | Context-aware next steps based on your framework and provider |
| search_docs | Full-text search across SDK documentation (README, llms-full.txt) |
Resources: amplitude-ai://event-schema, amplitude-ai://integration-patterns, amplitude-ai://instrument-guide
Prompt: instrument_app — guided walkthrough for instrumenting an application
Examples and AI Coding Agent Guide
- amplitude-ai.md — self-contained instrumentation guide for any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.). Run npx amplitude-ai to see the prompt that points your agent to this file.
- Mock-based examples demonstrating the event model (also used as CI smoke tests):
  - examples/zero-code.ts
  - examples/wrap-openai.ts
  - examples/multi-agent.ts
  - examples/framework-integration.ts
- Real provider examples (require API keys):
  - examples/real-openai.ts — end-to-end OpenAI integration with session tracking and flush
  - examples/real-anthropic.ts — end-to-end Anthropic integration with session tracking and flush
Integrations
LangChain
import { AmplitudeAI, AmplitudeCallbackHandler } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const handler = new AmplitudeCallbackHandler({
amplitudeAI: ai,
userId: 'user-123',
sessionId: 'sess-1',
});
// Pass handler to LangChain callbacks

OpenTelemetry
Two exporters add Amplitude as a destination alongside your existing trace backend (Datadog, Honeycomb, Jaeger, etc.):
import {
AmplitudeAgentExporter,
AmplitudeGenAIExporter,
} from '@amplitude/ai';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import {
BatchSpanProcessor,
SimpleSpanProcessor,
} from '@opentelemetry/sdk-trace-base';
const provider = new NodeTracerProvider();
// GenAI exporter — converts gen_ai.* spans into Amplitude AI events
provider.addSpanProcessor(
new BatchSpanProcessor(
new AmplitudeGenAIExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
// Agent exporter — converts agent.* spans into Amplitude session events
provider.addSpanProcessor(
new SimpleSpanProcessor(
new AmplitudeAgentExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
provider.register();

Only spans with gen_ai.provider.name or gen_ai.system attributes are processed; all other spans are silently ignored. This means it is safe to add the exporter to a pipeline that produces mixed (GenAI + HTTP + DB) spans.
Attribute mapping reference:
| OTEL Span Attribute | Amplitude Event Property | Notes |
| --- | --- | --- |
| gen_ai.response.model / gen_ai.request.model | [Agent] Model | Response model preferred |
| gen_ai.system / gen_ai.provider.name | [Agent] Provider | |
| gen_ai.usage.input_tokens | [Agent] Input Tokens | |
| gen_ai.usage.output_tokens | [Agent] Output Tokens | |
| gen_ai.usage.total_tokens | [Agent] Total Tokens | Derived if not present |
| gen_ai.usage.cache_read.input_tokens | [Agent] Cache Read Tokens | |
| gen_ai.usage.cache_creation.input_tokens | [Agent] Cache Creation Tokens | |
| gen_ai.request.temperature | [Agent] Temperature | |
| gen_ai.request.top_p | [Agent] Top P | |
| gen_ai.request.max_output_tokens | [Agent] Max Output Tokens | |
| gen_ai.response.finish_reasons | [Agent] Finish Reason | |
| gen_ai.input.messages | [Agent] LLM Message | Only if content mode allows |
| Span duration | [Agent] Latency Ms | |
| Span status ERROR | [Agent] Is Error, [Agent] Error Message | |
Not available via OTEL (use native wrappers): reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id).
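The attribute mapping table can be sketched as a plain translation function. This is illustrative, not the exporter's actual code; it shows the fallback order (response model over request model) and the derived total-token behavior from the table.

```typescript
// Map a GenAI span's attributes to Amplitude-style event properties,
// following the mapping table above (sketch only).
function mapGenAiAttributes(attrs: Record<string, unknown>) {
  const input = Number(attrs['gen_ai.usage.input_tokens'] ?? 0);
  const output = Number(attrs['gen_ai.usage.output_tokens'] ?? 0);
  return {
    // Response model preferred over request model.
    '[Agent] Model':
      attrs['gen_ai.response.model'] ?? attrs['gen_ai.request.model'],
    '[Agent] Provider':
      attrs['gen_ai.system'] ?? attrs['gen_ai.provider.name'],
    '[Agent] Input Tokens': input,
    '[Agent] Output Tokens': output,
    // Total is derived when the span does not carry it explicitly.
    '[Agent] Total Tokens': Number(
      attrs['gen_ai.usage.total_tokens'] ?? input + output,
    ),
  };
}

const props = mapGenAiAttributes({
  'gen_ai.request.model': 'gpt-4o',
  'gen_ai.system': 'openai',
  'gen_ai.usage.input_tokens': 10,
  'gen_ai.usage.output_tokens': 5,
});
```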
When to use OTEL vs. native wrappers: If you already have @opentelemetry/instrumentation-openai or similar producing GenAI spans, the OTEL bridge gives you Amplitude analytics with zero code changes. If you need the features listed above as unavailable via OTEL, use the native provider wrappers instead.