@amplitude/ai
v0.3.4
Amplitude AI SDK - LLM usage tracking for Amplitude Analytics
@amplitude/ai
Agent analytics for Amplitude. Track every LLM call, user message, tool call, and quality signal as events in your Amplitude project — then build funnels, cohorts, and retention charts across AI and product behavior.
```bash
npm install @amplitude/ai @amplitude/analytics-node
```

```ts
import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ amplitude: ai, apiKey: process.env.OPENAI_API_KEY });
const agent = ai.agent('my-agent');

app.post('/chat', async (req, res) => {
  const session = agent.session({ userId: req.userId, sessionId: req.sessionId });
  const result = await session.run(async (s) => {
    s.trackUserMessage(req.body.message);
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    return response.choices[0].message.content;
  });
  await ai.flush();
  res.json({ response: result });
});

// Events: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency),
// [Agent] Session Start, [Agent] Session End — all tied to userId and sessionId
```

How to Get Started
Instrument with a coding agent (recommended)
```bash
npm install @amplitude/ai
npx amplitude-ai
```

The CLI prints a prompt to paste into any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.):

```
Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md
```
The agent reads the guide, scans your project, discovers your agents and LLM call sites, and instruments everything — provider wrappers, session lifecycle, multi-agent delegation, tool tracking, scoring, and a verification test. You review and approve each step.
Other paths
| Your situation | Recommended path | What happens |
|---|---|---|
| Manual setup | Follow the Quick Start guide | Agents + sessions + provider wrappers — the full event model |
| Just want to verify the SDK works | patch() (details below) | Aggregate cost/latency monitoring only — no user analytics, no funnels |
Start with full instrumentation. The coding agent workflow defaults to agents + sessions + provider wrappers. This gives you every event type, per-user analytics, and server-side enrichment.
`patch()` exists for quick verification or legacy codebases where you can't modify call sites, but it only captures `[Agent] AI Response` without user identity — no funnels, no cohorts, no retention.
| Property | Value |
| --- | --- |
| Name | @amplitude/ai |
| Version | 0.2.1 |
| Runtime | Node.js |
| Peer dependency | @amplitude/analytics-node >= 1.3.0 |
| Optional peers | openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime, @pydantic/genai-prices (cost), tiktoken or js-tiktoken (token counting) |
Table of Contents
- How to Get Started
- Installation
- Quick Start
- What You Get at Each Level
- Core Concepts
- Configuration
- Context Dict Conventions
- Privacy & Content Control
- Cache-Aware Cost Tracking
- Semantic Cache Tracking
- Model Tier Classification
- Provider Wrappers
- Streaming Tracking
- Attachment Tracking
- Implicit Feedback
- tool() and observe() HOFs
- Scoring Patterns
- Enrichments
- Debug and Dry-Run Modes
- Patching
- Auto-Instrumentation CLI
- Integrations
- Data Flow
- Which Integration Should I Use?
- Integration Patterns
- Serverless Environments
- Error Handling and Reliability
- Testing
- Troubleshooting
- Context Propagation
- Middleware
- Bulk Conversation Import
- Event Schema
- Event Property Reference
- Event JSON Examples
- Sending Events Without the SDK
- Register Event Schema in Your Data Catalog
- Utilities and Type Exports
- Constants
- API Reference
- For AI Coding Agents
- For Python SDK Migrators
- Need Help?
- Contributing
- License
Installation
```bash
npm install @amplitude/ai @amplitude/analytics-node
```

Install provider SDKs based on what you use (for example: openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime).
Quick Start
5-minute quick start
- Install:
  ```bash
  npm install @amplitude/ai @amplitude/analytics-node
  ```
- Get your API key: In Amplitude, go to Settings > Projects and copy the API key.
- Auto-instrument: Run `npx amplitude-ai` and paste the printed prompt into your AI coding agent — it scans your project, generates a bootstrap file, instruments your LLM call sites, and creates a verification test. Or follow the manual patterns below.
- Set your API key in the generated `.env` file and replace the placeholder `userId`/`sessionId`.
- Run your app. You should see `[Agent] User Message`, `[Agent] AI Response`, and `[Agent] Session End` within 30 seconds.
To verify locally before checking Amplitude, add `debug: true`:

```ts
const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ debug: true }),
});
// Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms
```

Tip: Call `enableLivePriceUpdates()` at startup so cost tracking stays accurate when new models are released. See Cache-Aware Cost Tracking.
Current Limitations
| Area | Status |
| ---- | ------ |
| Runtime | Node.js only (no browser). Python SDK available separately (amplitude-ai on PyPI). |
| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral, Bedrock (Converse/ConverseStream only). |
| CrewAI | Python-only; the Node.js export throws ProviderError by design. Use LangChain or OpenTelemetry integrations instead. |
| OTEL scope filtering | Not yet supported (Python SDK has allowed_scopes/blocked_scopes). |
| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts required for other providers' streamed responses. |
Is this for me?
Yes, if you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.
Already using an LLM observability tool? Keep it. The OTEL bridge adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.
Why this SDK?
Most AI observability tools give you traces. This SDK gives you per-turn events that live in your product analytics so you can:
- Build funnels from "user opens chat" through "AI responds" to "user converts"
- Create cohorts of users with low AI quality scores and measure their 7-day retention
- Answer "is this AI feature helping or hurting?" without moving data between tools
The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces one event per conversation turn with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.
Every AI event carries your product user_id. No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude.
Server-side enrichment does the evals for you. When content is available (contentMode: 'full'), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as [Agent] Score events with rubric scores, [Agent] Topic Classification events with category labels, and [Agent] Session Evaluation summaries, all queryable in charts, cohorts, and funnels alongside your product events.
Quality signals from every source in one event type. User thumbs up/down (source: 'user'), automated rubric scores from the enrichment pipeline (source: 'ai'), and reviewer assessments (source: 'reviewer') all produce [Agent] Score events differentiated by [Agent] Evaluation Source. One chart shows all three side by side. Filter by source or view them together. Filter by [Agent] Agent ID for per-agent quality attribution.
Three content-control tiers. full sends content and Amplitude runs enrichments for you. metadata_only sends zero content (you still get cost, latency, tokens, session grouping). customer_enriched sends zero content but lets you provide your own structured labels via trackSessionEnrichment().
Cache-aware cost tracking. Pass cacheReadTokens and cacheCreationTokens for accurate blended costs. Without this breakdown, naive cost calculation can overestimate by 2-5x for cache-heavy workloads.
What you can build
Once AI events are in Amplitude alongside your product events:
- Cohorts. "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.
- Funnels. "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.
- Retention. Do users with successful AI sessions retain better than those with failures? Segment retention curves by `[Agent] Overall Outcome` or task completion score.
- Agent analytics. Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.
How quality measurement works
The SDK captures quality signals at three layers, from most direct to most comprehensive:
1. Explicit user feedback — Instrument thumbs up/down, star ratings, or CSAT scores via trackScore(). Each call produces an [Agent] Score event with source: 'user':
```ts
ai.trackScore({
  userId: 'u1', name: 'user-feedback', value: 1,
  targetId: aiMessageId, targetType: 'message', source: 'user',
});
```

2. Implicit behavioral signals — The SDK auto-tracks behavioral proxies for quality on every turn, with zero additional instrumentation:
| Signal | Property | Event | Interpretation |
|--------|----------|-------|----------------|
| Copy | [Agent] Was Copied | [Agent] AI Response | User copied the output — positive |
| Regeneration | [Agent] Is Regeneration | [Agent] User Message | User asked for a redo — negative |
| Edit | [Agent] Is Edit | [Agent] User Message | User refined their prompt — friction |
| Abandonment | [Agent] Abandonment Turn | [Agent] Session End | User left after N turns — potential failure |
3. Automated server-side evaluation — When contentMode: 'full', Amplitude's enrichment pipeline runs LLM-as-judge evaluators on every session after it closes. No eval code to write or maintain:
| Rubric | What it measures | Scale |
|--------|-----------------|-------|
| task_completion | Did the agent accomplish what the user asked? | 0–2 |
| response_quality | Was the response clear, accurate, and helpful? | 0–2 |
| user_satisfaction | Did the user seem satisfied based on conversation signals? | 0–2 |
| agent_confusion | Did the agent misunderstand or go off track? | 0–2 |
Plus boolean detectors: negative_feedback (frustration phrases), task_failure (agent failed to deliver), data_quality_issues, and behavioral_patterns (clarification loops, topic drift). All results are emitted as [Agent] Score events with source: 'ai'.
All three layers use the same [Agent] Score event type, differentiated by [Agent] Evaluation Source ('user', 'ai', or 'reviewer'). One chart shows user feedback alongside automated evals. No joins, no separate tables.
What You Set vs What You Get
| You set | Where it comes from | What you unlock |
|---|---|---|
| API key | Amplitude project settings | Events reach Amplitude |
| userId | Your auth layer (JWT, session cookie, API token) | Per-user analytics, cohorts, retention |
| agentId | Your choice (e.g. 'chat-handler') | Per-agent cost, latency, quality dashboards |
| sessionId | Your conversation/thread/ticket ID | Multi-turn analysis, session enrichment, quality scores |
| description | Your choice (e.g. 'Handles support queries via GPT-4o') | Human-readable agent registry from event streams |
| *contentMode + redactPii* | Config (defaults work) | Server enrichment (automatic), PII scrubbing |
| *model, tokens, cost* | Auto-captured by provider wrappers | Cost analytics, latency monitoring |
| *parentAgentId* | Auto via child()/runAs() | Multi-agent hierarchy |
| env, agentVersion, context | Your deploy pipeline | Segmentation, regression detection |
Italicized rows require zero developer effort — they're automatic or have sensible defaults.
The minimum viable setup is 4 fields: API key, userId, agentId, sessionId. Everything else is either automatic or a progressive enhancement.
What You Get at Each Level
The coding agent workflow defaults to full instrumentation — the top row below. Lower levels exist as fallbacks, not as recommended starting points.
| Level | Events you get | What it unlocks in Amplitude |
|---|---|---|
| Full (agents + sessions + wrappers) | User Message, AI Response, Tool Call, Session Start/End, Score, Enrichments | Per-user funnels, cohorts, retention, session replay linking, quality scoring |
| Wrappers only (no sessions) | AI Response (with cost, tokens, latency) | Aggregate cost monitoring, model comparison |
| patch() only (no wrappers, no sessions) | AI Response (basic) | Aggregate call counts — useful for verification only |
Support matrix
- Fully supported in Node.js: OpenAI chat completions, OpenAI Responses API, Azure OpenAI chat completions, Anthropic messages, Gemini, Mistral, Bedrock, LangChain, OpenTelemetry, LlamaIndex.
- Partial support: zero-code `patch()` is best-effort by installed SDK and provider surface; OpenAI Agents tracing depends on incoming span payload shape from the host SDK.
- Not currently supported in Node.js: `AmplitudeCrewAIHooks` is Python-only and throws in Node.js.
Parity and runtime limitations
This section is the source of truth for behavior that is intentionally different from Python due to runtime constraints:
- `AmplitudeCrewAIHooks` is unsupported in Node.js (CrewAI is Python-only).
- `tool()` does not auto-generate JSON Schema from runtime type hints; pass `inputSchema` explicitly.
- Tool timeout behavior is async `Promise.race` based and cannot preempt synchronous CPU-bound code.
- Auto-instrument bootstrap differs by runtime (`node --import` in Node vs `sitecustomize` in Python).
- Request middleware differs by runtime (Express-compatible in Node vs ASGI middleware in Python).
Zero-code (for verification or legacy codebases)
patch() monkey-patches provider SDKs so existing LLM calls are tracked without code changes. This is useful for verifying the SDK works or for legacy codebases where you can't modify call sites. It only captures [Agent] AI Response without user identity — for the full event model, use agents + sessions (see Quick Start).
```ts
import { AmplitudeAI, patch } from '@amplitude/ai';
// OpenAI/Azure OpenAI chat completions (+ parse), OpenAI Responses, Anthropic, Gemini, Mistral,
// and Bedrock Converse calls are tracked when patching succeeds.
// No changes to your existing code needed.
import OpenAI from 'openai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
patch({ amplitudeAI: ai });

const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
// ^ automatically tracked as [Agent] AI Response
```

Warning: Patched calls that fire outside an active session context are silently dropped — no event is emitted and no error is thrown. If you instrument with `patch()` but see no events, this is the most likely cause. Wrap your LLM calls in `session.run()`, use the Express middleware, or pass context explicitly. See Session and Middleware.
Or use the CLI to auto-patch at process start without touching application code:
```bash
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js
```

Wrap (recommended for production)
Replace the provider constructor with the Amplitude-instrumented version for automatic tracking with full control over options per call:
```ts
import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
  amplitude: ai,
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
  // AI response tracked automatically via wrapper

  const responseV2 = await openai.responses.create({
    model: 'gpt-4.1',
    instructions: 'You are concise.',
    input: [{ role: 'user', content: 'Summarize this in one sentence.' }],
  });
  // OpenAI Responses API is also tracked automatically
});
```

Or wrap an existing client instance (supports OpenAI, Azure OpenAI, and Anthropic):
```ts
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);
```

All provider constructors and `wrap()` accept either an AmplitudeAI instance or a raw Amplitude client — both work:

```ts
new OpenAI({ amplitude: ai }); // AmplitudeAI instance
new OpenAI({ amplitude: ai.amplitude }); // raw Amplitude client
wrap(client, ai); // AmplitudeAI instance
wrap(client, ai.amplitude); // raw Amplitude client
```

Note: `wrap()` only supports OpenAI, Azure OpenAI, and Anthropic clients. For Gemini, Mistral, and Bedrock, use the SDK's provider classes directly (e.g., `new Gemini({ amplitude: ai })`).
Full control
Call tracking methods directly for maximum flexibility. Works with any LLM provider, including custom or self-hosted models:
```ts
import { AmplitudeAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session({ userId: 'user-123' });

await session.run(async (s) => {
  s.trackUserMessage('Summarize this document');
  const start = performance.now();
  const response = await myCustomLLM.generate('Summarize this document');
  const latencyMs = performance.now() - start;
  s.trackAiMessage(response.text, 'my-model-v2', 'custom', latencyMs, {
    inputTokens: response.usage.input,
    outputTokens: response.usage.output,
  });
});
```

Core Concepts
AmplitudeAI
Main client that wraps Amplitude analytics-node. Create it with an API key or an existing Amplitude instance:
```ts
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY' });
// Or with existing client:
const ai = new AmplitudeAI({ amplitude: existingAmplitudeClient });
```

BoundAgent
Agent with pre-bound defaults (agentId, description, userId, env, etc.). Use agent() to create:
```ts
const agent = ai.agent('support-bot', {
  description: 'Handles customer support queries via OpenAI GPT-4o',
  userId: 'user-123',
  env: 'production',
  customerOrgId: 'org-456',
});
```

Child agents inherit context from their parent and automatically set `parentAgentId` (note: `description` is agent-specific and is not inherited — pass it explicitly if needed):
```ts
const orchestrator = ai.agent('orchestrator', {
  description: 'Routes queries to specialized child agents',
  userId: 'user-123',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer', {
  description: 'Drafts responses using retrieved context',
});
// researcher.parentAgentId === 'orchestrator'
// description is not inherited: researcher has none; writer has its own
```

TenantHandle
Multi-tenant helper that pre-binds customerOrgId for all agents created from it:
```ts
const tenant = ai.tenant('org-456', { env: 'production' });
const agent = tenant.agent('support-bot', { userId: 'user-123' });
```

User Identity
User identity flows through the session, per-call, or middleware -- not at agent creation or patch time. This keeps the agent reusable across users.
Via sessions (recommended): pass userId when opening a session:
```ts
const agent = ai.agent('support-bot', { env: 'production' });
const session = agent.session({ userId: 'user-42' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  // userId inherited from session context
});
```

Per-call: pass `userId` on each tracking call (useful with the zero-code tier):
```ts
agent.trackUserMessage('Hello', {
  userId: 'user-42',
  sessionId: 'sess-1',
});
```

Via middleware: `createAmplitudeAIMiddleware` extracts user identity from the request (see Middleware):
```ts
app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) => req.headers['x-user-id'] ?? null,
  }),
);
```

Session
Async context manager using AsyncLocalStorage. Use session.run() to execute a callback within session context; session end is tracked automatically on exit:
```ts
const session = agent.session({ userId: 'user-123' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage(response.content, 'gpt-4', 'openai', latencyMs);
});
```

Start a new trace within an ongoing session to group related operations:
```ts
await session.run(async (s) => {
  const traceId = s.newTrace();
  s.trackUserMessage('Follow-up question');
  s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs);
});
```

For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass `idleTimeoutMinutes` so Amplitude knows the session is still active:
```ts
const session = agent.session({
  userId: 'user-123',
  idleTimeoutMinutes: 240, // expect up to 4-hour gaps
});
```

Without this, sessions with long idle periods may be closed and evaluated prematurely. The default is 30 minutes.
Link to Session Replay: If your frontend uses Amplitude's Session Replay, pass the browser's deviceId and browserSessionId to link AI sessions to browser recordings:
```ts
const session = agent.session({
  userId: 'user-123',
  deviceId: req.headers['x-amp-device-id'],
  browserSessionId: req.headers['x-amp-session-id'],
});
await session.run(async (s) => {
  s.trackUserMessage('What is retention?');
  // All events now carry [Amplitude] Session Replay ID = deviceId/browserSessionId
});
```

tool()
Higher-order function wrapping functions to auto-track as [Agent] Tool Call events:
```ts
import { tool } from '@amplitude/ai';

const searchDb = tool(
  async (query: { q: string }) => {
    return await db.search(query.q);
  },
  {
    name: 'search_db',
    inputSchema: { type: 'object', properties: { q: { type: 'string' } } },
  },
);
```

Note on `inputSchema`: Unlike the Python SDK, which accepts a Pydantic model class and extracts the JSON Schema automatically, the TypeScript SDK accepts a raw JSON Schema object. For type-safe schema generation, consider using Zod with zod-to-json-schema:
```ts
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const QuerySchema = z.object({ q: z.string(), limit: z.number().optional() });
const searchDb = tool(mySearchFn, {
  name: 'search_db',
  inputSchema: zodToJsonSchema(QuerySchema),
});
```

observe()
Higher-order function wrapping functions to auto-track as [Agent] Span events:
```ts
import { observe } from '@amplitude/ai';

const processRequest = observe(
  async (input: Request) => {
    return await handleRequest(input);
  },
  { name: 'process_request' },
);
```

Configuration
```ts
import { AIConfig, AmplitudeAI, ContentMode } from '@amplitude/ai';

const config = new AIConfig({
  contentMode: ContentMode.FULL, // FULL | METADATA_ONLY | CUSTOMER_ENRICHED — both ContentMode.FULL and 'full' work
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
  debug: false,
  dryRun: false,
});
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY', config });
```

| Option | Description |
| --- | --- |
| contentMode | 'full' (default), 'metadata_only', or 'customer_enriched'. Both ContentMode.FULL and 'full' work. |
| redactPii | Redact email, phone, SSN, credit card patterns |
| customRedactionPatterns | Additional regex patterns for redaction |
| debug | Log events to stderr |
| dryRun | Log without sending to Amplitude |
| validate | Enable strict validation of required fields |
| onEventCallback | Callback invoked after every tracked event: (event, statusCode, message) => void |
| propagateContext | Enable cross-service context propagation |
Context Dict Conventions
The context parameter on ai.agent() accepts an arbitrary Record<string, unknown> that is JSON-serialized and attached to every event as [Agent] Context. This is the recommended way to add segmentation dimensions without requiring new global properties.
Recommended keys:
| Key | Example Values | Use Case |
| --- | --- | --- |
| agent_type | "planner", "executor", "retriever", "router" | Filter/group analytics by agent role in multi-agent systems. |
| experiment_variant | "control", "treatment-v2", "prompt-rewrite-a" | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. |
| feature_flag | "new-rag-pipeline", "reasoning-model-enabled" | Track which feature flags were active during the session. |
| surface | "chat", "search", "copilot", "email-draft" | Identify which UI surface or product area triggered the AI interaction. |
| prompt_revision | "v7", "abc123", "2026-02-15" | Track which prompt version was used. Detect prompt regression when combined with agentVersion. |
| deployment_region | "us-east-1", "eu-west-1" | Segment by deployment region for latency analysis or compliance tracking. |
| canary_group | "canary", "stable" | Identify canary vs. stable deployments for progressive rollout monitoring. |
Example:

```ts
const agent = ai.agent('support-bot', {
  userId: 'u1',
  description: 'Handles customer support queries via OpenAI GPT-4o',
  agentVersion: '4.2.0',
  context: {
    agent_type: 'executor',
    experiment_variant: 'reasoning-enabled',
    surface: 'chat',
    feature_flag: 'new-rag-pipeline',
    prompt_revision: 'v7',
  },
});
// All events from this agent (and its sessions, child agents, and provider
// wrappers) will include [Agent] Context with these keys.
```

Context merging in child agents:
```ts
const parent = ai.agent('orchestrator', {
  context: { experiment_variant: 'treatment', surface: 'chat' },
});
const child = parent.child('researcher', {
  context: { agent_type: 'retriever' },
});
// child context = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
// Child keys override parent keys; parent keys absent from the child are preserved.
```

Querying in Amplitude: The `[Agent] Context` property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by `[Agent] Context.agent_type` to see metrics by agent role.
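The context-merging rule is a shallow merge. As a minimal sketch (illustrative only — not the SDK's internal implementation, and `mergeAgentContext` is a hypothetical name):

```typescript
// Shallow merge matching the inheritance rule described above:
// child keys override parent keys; parent keys absent from the child survive.
function mergeAgentContext(
  parent: Record<string, unknown>,
  child: Record<string, unknown>,
): Record<string, unknown> {
  return { ...parent, ...child };
}

const merged = mergeAgentContext(
  { experiment_variant: 'treatment', surface: 'chat' },
  { agent_type: 'retriever' },
);
// merged = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
```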
Note on `experiment_variant` and server-generated events: Context keys appear on all SDK-emitted events (`[Agent] User Message`, `[Agent] AI Response`, etc.). Server-generated events (`[Agent] Session Evaluation`, `[Agent] Score` with `source: "ai"`) do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from `[Agent] Context` on SDK events.
Privacy & Content Control
Three content modes control what data is sent to Amplitude:
| Mode | Message Content | Token/Cost/Latency | Session Grouping | Server Enrichments |
| ------------------- | ------------------------- | ------------------ | ---------------- | ------------------ |
| FULL | Sent (with PII redaction) | Yes | Yes | Yes (auto) |
| METADATA_ONLY | Not sent | Yes | Yes | No |
| CUSTOMER_ENRICHED | Not sent | Yes | Yes | Yes (you provide) |
FULL mode (default)
Message content is captured and sent to Amplitude. When you opt in with redactPii: true, built-in PII redaction patterns scrub emails, phone numbers, SSNs, credit card numbers, and base64 image data before the event leaves your process:
```ts
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
});
```

With `redactPii: true`, a message like "Contact me at jane@example.com or 555-123-4567" is sanitized to "Contact me at [email] or [phone]" before being sent.
Built-in phone and SSN detection are currently tuned for common US formats. If you need broader international coverage, add explicit customRedactionPatterns for your locales.
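To illustrate the behavior (these simplified patterns are stand-ins, not the SDK's actual built-in regexes):

```typescript
// Illustrative only — the SDK ships its own redaction patterns.
// These simplified US-format regexes merely demonstrate the substitution.
function redactPiiSketch(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, '[email]')
    .replace(/\b\d{3}[-.]\d{3}[-.]\d{4}\b/g, '[phone]');
}

redactPiiSketch('Call 555-123-4567'); // → 'Call [phone]'
```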
Add custom redaction patterns for domain-specific PII:
```ts
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
  customRedactionPatterns: ['ACCT-\\d{6,}', 'internal-key-[a-f0-9]+'],
});
```

Custom redaction patterns are your responsibility: avoid expensive or catastrophic regexes in performance-sensitive paths.
Message content is stored at full length with no truncation or size limits. The $llm_message property is whitelisted server-side, and the Node SDK does not apply per-property string truncation.
METADATA_ONLY mode
No message content is sent. You still get token counts, cost, latency, model name, and session grouping — everything needed for cost analytics and performance monitoring:
```ts
const config = new AIConfig({
  contentMode: ContentMode.METADATA_ONLY,
});
```

Use this when you cannot send user content to a third-party analytics service (e.g., regulated industries, sensitive data).
CUSTOMER_ENRICHED mode
Like METADATA_ONLY (no content sent), but designed for workflows where you enrich sessions with your own classifications, quality scores, and topic labels via the SessionEnrichments API:
```ts
const config = new AIConfig({
  contentMode: ContentMode.CUSTOMER_ENRICHED,
});

// Later, after running your own classification pipeline:
const enrichments = new SessionEnrichments({
  qualityScore: 0.85,
  overallOutcome: 'resolved',
});
session.setEnrichments(enrichments);
```

PrivacyConfig (advanced)
PrivacyConfig is derived from AIConfig via config.toPrivacyConfig(). For advanced use, create directly:
```ts
import { PrivacyConfig } from '@amplitude/ai';

const privacy = new PrivacyConfig({
  privacyMode: true,
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
});
```

When to use which mode
- FULL: You want to see actual conversation content in Amplitude, debug individual sessions, and leverage server-side enrichment pipelines. Best for development, internal tools, and applications where data sharing agreements permit it.
- METADATA_ONLY: You want cost/performance analytics without exposing any message content. Best for regulated environments (healthcare, finance) or when content contains proprietary data.
- CUSTOMER_ENRICHED: You want the privacy of METADATA_ONLY but also want structured analytics (topic classification, quality scores) that you compute on your own infrastructure before sending to Amplitude.
Cache-Aware Cost Tracking
When using provider prompt caching (Anthropic's cache, OpenAI's cached completions, etc.), pass cache token breakdowns for accurate cost calculation:
```ts
s.trackAiMessage(
  response.content,
  'claude-3.5-sonnet',
  'anthropic',
  latencyMs,
  {
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    cacheReadTokens: response.usage.cache_read_input_tokens,
    cacheCreationTokens: response.usage.cache_creation_input_tokens,
  },
);
```

Without cache breakdowns, cost calculation treats all input tokens at the standard rate. With caching enabled, cache-read tokens are typically 10x cheaper than standard input tokens and cache-creation tokens are ~25% more expensive. Naive cost calculation without this breakdown can overestimate costs by 2-5x for cache-heavy workloads.
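To see why the breakdown matters, here is the blended-cost arithmetic as a sketch. The rates are hypothetical round numbers; the 10x and +25% multipliers come from the ratios above, and real pricing comes from @pydantic/genai-prices:

```typescript
// Hypothetical rates: $3 per 1M standard input tokens, $15 per 1M output tokens.
// Cache reads at 10x cheaper, cache creation at +25%, per the ratios above.
function blendedCostUsd(u: {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}): number {
  const INPUT = 3 / 1_000_000;
  const OUTPUT = 15 / 1_000_000;
  return (
    u.inputTokens * INPUT +
    u.outputTokens * OUTPUT +
    u.cacheReadTokens * INPUT * 0.1 + // cache reads: 10x cheaper
    u.cacheCreationTokens * INPUT * 1.25 // cache creation: +25%
  );
}

// 90k of 100k input tokens served from cache:
const blended = blendedCostUsd({
  inputTokens: 10_000,
  outputTokens: 1_000,
  cacheReadTokens: 90_000,
  cacheCreationTokens: 0,
});
// Naive: all 100k input tokens billed at the standard rate:
const naive = blendedCostUsd({
  inputTokens: 100_000,
  outputTokens: 1_000,
  cacheReadTokens: 0,
  cacheCreationTokens: 0,
});
// naive / blended ≈ 4.4 — the 2-5x overestimate described above
```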
The SDK tracks four token categories:
- `[Agent] Input Tokens` — standard (non-cached) input tokens
- `[Agent] Output Tokens` — generated output tokens
- `[Agent] Cache Read Tokens` — tokens read from provider cache (cheap)
- `[Agent] Cache Creation Tokens` — tokens written to provider cache (slightly more expensive)
Cost is auto-calculated when token counts are provided and the @pydantic/genai-prices package is installed. When genai-prices is not available, calculateCost() returns 0 (never null). You can also pass totalCostUsd directly if you compute cost yourself:
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
totalCostUsd: 0.0034,
});

Note — pricing data freshness. Cost calculation relies on pricing data bundled in the installed @pydantic/genai-prices package. Newly released models may return $0 until the package is updated. To get the latest pricing between package releases, opt in to live updates at startup:

import { enableLivePriceUpdates } from '@amplitude/ai';
enableLivePriceUpdates(); // fetches latest prices from the genai-prices GitHub repo hourly

This makes periodic HTTPS requests to raw.githubusercontent.com (~26 KB each). Only enable this in environments where outbound network access is permitted.
Semantic Cache Tracking
Track full-response semantic cache hits (distinct from token-level prompt caching above):
s.trackAiMessage(cachedResponse.content, 'gpt-4o', 'openai', latencyMs, {
wasCached: true, // served from Redis/semantic cache
});

Maps to [Agent] Was Cached. Enables "cache hit rate" charts and cost optimization analysis. The property is only emitted when true; it is omitted (not set to false) when the response was not cached.
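Because the property is omitted rather than set to false, downstream hit-rate math should treat absence as a miss. A minimal sketch (event shape here is illustrative, not the SDK's wire format):

```typescript
// Compute a cache hit rate from Was Cached flags. Absence counts as a miss,
// since the SDK only emits the property when it is true.
function cacheHitRate(events: Array<{ wasCached?: boolean }>): number {
  if (events.length === 0) return 0;
  return events.filter((e) => e.wasCached === true).length / events.length;
}
```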
Model Tier Classification
Models are automatically classified into tiers for cost/performance analysis:
| Tier | Examples | When to Use |
| ----------- | -------------------------------------------------------- | ------------------------------ |
| fast | gpt-4o-mini, claude-3-haiku, gemini-flash, gpt-3.5-turbo | High-volume, latency-sensitive |
| standard | gpt-4o, claude-3.5-sonnet, gemini-pro, llama, command | General purpose |
| reasoning | o1, o3-mini, deepseek-r1, claude with extended thinking | Complex reasoning tasks |
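The mapping in the table above can be approximated with a small substring classifier. This is an illustrative sketch only; the SDK's actual inferModelTier export may use different rules.

```typescript
type ModelTier = 'fast' | 'standard' | 'reasoning';

// Sketch of name-based tier inference mirroring the table above
// (not the SDK's internal implementation).
function sketchInferTier(model: string): ModelTier {
  const m = model.toLowerCase();
  if (m.startsWith('o1') || m.startsWith('o3') || m.includes('deepseek-r1')) {
    return 'reasoning';
  }
  if (['mini', 'haiku', 'flash', '3.5-turbo'].some((k) => m.includes(k))) {
    return 'fast';
  }
  return 'standard'; // gpt-4o, claude-3.5-sonnet, gemini-pro, etc.
}
```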
The tier is inferred automatically from the model name and attached as [Agent] Model Tier on every [Agent] AI Response event:
import {
inferModelTier,
TIER_FAST,
TIER_REASONING,
TIER_STANDARD,
} from '@amplitude/ai';
inferModelTier('gpt-4o-mini'); // 'fast'
inferModelTier('claude-3.5-sonnet'); // 'standard'
inferModelTier('o1-preview'); // 'reasoning'

Override the auto-inferred tier for custom or fine-tuned models:
s.trackAiMessage(
response.content,
'ft:gpt-4o:my-org:custom',
'openai',
latencyMs,
{
modelTier: 'standard',
inputTokens: response.usage.prompt_tokens,
outputTokens: response.usage.completion_tokens,
},
);

Provider Wrappers
Use instrumented provider wrappers for automatic tracking:
| Provider | Class | Package |
| ----------- | ------------- | ------------------------------- |
| OpenAI | OpenAI | openai |
| Anthropic | Anthropic | @anthropic-ai/sdk |
| Gemini | Gemini | @google/generative-ai |
| AzureOpenAI | AzureOpenAI | openai |
| Bedrock | Bedrock | @aws-sdk/client-bedrock-runtime |
| Mistral | Mistral | @mistralai/mistralai |
Feature coverage by provider:
| Feature | OpenAI | Anthropic | Gemini | AzureOpenAI | Bedrock | Mistral |
| --------------------- | ------ | --------- | ------ | ----------- | ------- | ------- |
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool call tracking | Yes | Yes | No | Yes | Yes | No |
| TTFB measurement | Yes | Yes | No | Yes | No | No |
| Cache token stats | Yes | Yes | No | No | No | No |
| Responses API | Yes | - | - | - | - | - |
| Reasoning content | Yes | Yes | No | Yes | No | No |
| System prompt capture | Yes | Yes | Yes | Yes | Yes | Yes |
| Cost estimation | Yes | Yes | Yes | Yes | Yes | Yes |
Provider wrappers use injected TrackFn callbacks instead of class hierarchy casts, enabling easier composition and custom tracking logic.
Bedrock model IDs like us.anthropic.claude-3-5-sonnet are automatically normalized for price lookup (e.g., to claude-3-5-sonnet).
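That normalization can be sketched as string surgery. The exact rules (which prefixes and suffixes are stripped) are an assumption here; only the `us.anthropic.claude-3-5-sonnet` → `claude-3-5-sonnet` example comes from the text above.

```typescript
// Illustrative normalization of Bedrock model IDs for price lookup:
// strip a region prefix (us./eu./apac.) and the vendor segment, then
// drop a trailing "-vN:0"-style version suffix if present (assumed shape).
function normalizeBedrockModelId(modelId: string): string {
  const withoutRegion = modelId.replace(/^(us|eu|apac)\./, '');
  const parts = withoutRegion.split('.');
  const name = parts.length > 1 ? parts.slice(1).join('.') : withoutRegion;
  return name.replace(/-v\d+:\d+$/, '');
}
```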
OpenAI example:
import { AmplitudeAI, OpenAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
amplitude: ai,
apiKey: process.env.OPENAI_API_KEY,
});
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async (s) => {
const resp = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello' }],
});
// AI response tracked automatically via wrapper
});

Or wrap an existing client:
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);

Streaming Tracking
Automatic streaming (provider wrappers)
Provider wrappers (OpenAI, AzureOpenAI, Anthropic, Gemini, Mistral, Bedrock) automatically detect supported streaming responses and track them transparently. The wrapper intercepts the AsyncIterable, accumulates chunks, measures TTFB, and emits an [Agent] AI Response event after the stream is fully consumed:
const openai = new OpenAI({ amplitude: ai, apiKey: '...' });
// Streaming is handled automatically — just iterate the result
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// ^ AI Response event emitted automatically after the loop ends

Manual streaming
Track streaming responses manually with time-to-first-byte (TTFB) for latency analysis:
s.trackAiMessage(fullContent, 'gpt-4o', 'openai', totalMs, {
isStreaming: true,
ttfbMs: timeToFirstByte,
inputTokens: usage.prompt_tokens,
outputTokens: usage.completion_tokens,
});

The SDK tracks two timing properties for streaming:
- [Agent] Latency Ms — total wall-clock time from request to final chunk
- [Agent] TTFB Ms — time-to-first-byte, the delay before the first token arrives
StreamingAccumulator
For manual streaming, use StreamingAccumulator to collect chunks and automatically measure TTFB:
import { StreamingAccumulator } from '@amplitude/ai';
const accumulator = new StreamingAccumulator();
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
accumulator.addContent(content);
}
}
accumulator.setUsage({
inputTokens: finalUsage.prompt_tokens,
outputTokens: finalUsage.completion_tokens,
});
s.trackAiMessage(
accumulator.content,
'gpt-4o',
'openai',
accumulator.elapsedMs,
{
isStreaming: true,
ttfbMs: accumulator.ttfbMs,
inputTokens: accumulator.inputTokens,
outputTokens: accumulator.outputTokens,
finishReason: accumulator.finishReason,
},
);

The accumulator records TTFB automatically when addContent() is called for the first time, and tracks total elapsed time via elapsedMs. For streaming errors, call setError(message) to set isError and errorMessage, which are included on the tracked AI Response event.
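The timing behavior described above (TTFB captured on the first chunk, elapsed time on demand) can be re-sketched in a few lines. This is a conceptual model, not the real StreamingAccumulator, whose API surface may differ.

```typescript
// Minimal sketch of the accumulator timing semantics: TTFB is recorded on
// the first addContent() call; elapsedMs is measured from construction.
class SketchAccumulator {
  private start = Date.now();
  private firstChunkAt: number | null = null;
  content = '';

  addContent(chunk: string): void {
    if (this.firstChunkAt === null) this.firstChunkAt = Date.now();
    this.content += chunk;
  }

  get ttfbMs(): number | null {
    return this.firstChunkAt === null ? null : this.firstChunkAt - this.start;
  }

  get elapsedMs(): number {
    return Date.now() - this.start;
  }
}
```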
Attachment Tracking
Track files sent with user messages (images, PDFs, URLs):
s.trackUserMessage('Analyze this document', {
attachments: [
{ type: 'image', name: 'chart.png', size_bytes: 102400 },
{ type: 'pdf', name: 'report.pdf', size_bytes: 2048576 },
],
});

The SDK automatically derives aggregate properties from the attachment array:
- [Agent] Has Attachments — boolean, true when attachments are present
- [Agent] Attachment Count — number of attachments
- [Agent] Attachment Types — deduplicated list of attachment types (e.g., ["image", "pdf"])
- [Agent] Total Attachment Size Bytes — sum of all size_bytes values
- [Agent] Attachments — serialized JSON of the full attachment metadata
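The derivation is mechanical and can be sketched as below. The function name and return keys are illustrative; only the aggregation rules (count, dedup, sum, JSON serialization) come from the property list above.

```typescript
interface Attachment {
  type: string;
  name: string;
  size_bytes: number;
}

// Sketch of how the aggregate attachment properties are derived
// (not the SDK's internal code).
function deriveAttachmentProps(attachments: Attachment[]) {
  return {
    hasAttachments: attachments.length > 0,
    attachmentCount: attachments.length,
    attachmentTypes: [...new Set(attachments.map((a) => a.type))],
    totalAttachmentSizeBytes: attachments.reduce((s, a) => s + a.size_bytes, 0),
    attachments: JSON.stringify(attachments),
  };
}

const props = deriveAttachmentProps([
  { type: 'image', name: 'chart.png', size_bytes: 102_400 },
  { type: 'pdf', name: 'report.pdf', size_bytes: 2_048_576 },
]);
```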
Attachments can also be tracked on AI responses (e.g., when the model generates images or files):
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
attachments: [{ type: 'image', name: 'generated.png', size_bytes: 204800 }],
});

Implicit Feedback
Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:
// User asks a question
s.trackUserMessage('How do I create a funnel?');
// AI responds — user copies the answer (positive signal)
s.trackAiMessage('To create a funnel, go to...', 'gpt-4o', 'openai', latencyMs, {
wasCopied: true,
});
// User regenerates (negative signal — first response wasn't good enough)
s.trackUserMessage('How do I create a funnel?', {
isRegeneration: true,
});
// User edits their question (refining intent)
s.trackUserMessage('How do I create a conversion funnel for signups?', {
isEdit: true,
editedMessageId: originalMsgId, // links the edit to the original
});

Track abandonment at session end — a low abandonmentTurn (e.g., 1) strongly signals first-response dissatisfaction:
agent.trackSessionEnd({
sessionId: 'sess-1',
abandonmentTurn: 1, // user left after first AI response
});

These signals map to [Agent] Was Copied, [Agent] Is Regeneration, [Agent] Is Edit, [Agent] Edited Message ID, and [Agent] Abandonment Turn. Use them in Amplitude to build quality dashboards without requiring user surveys.
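As a downstream illustration, copy and regeneration rates can be computed from these signals. The event shape here is hypothetical (not the SDK's wire format); it only shows how the boolean flags above translate into quality metrics.

```typescript
interface TrackedEvent {
  type: 'user_message' | 'ai_response';
  wasCopied?: boolean; // positive signal on AI responses
  isRegeneration?: boolean; // negative signal on user messages
}

// Aggregate implicit-feedback signals into simple rates.
function feedbackRates(events: TrackedEvent[]) {
  const ai = events.filter((e) => e.type === 'ai_response');
  const user = events.filter((e) => e.type === 'user_message');
  return {
    copyRate: ai.length ? ai.filter((e) => e.wasCopied).length / ai.length : 0,
    regenerationRate: user.length
      ? user.filter((e) => e.isRegeneration).length / user.length
      : 0,
  };
}

const rates = feedbackRates([
  { type: 'user_message' },
  { type: 'ai_response', wasCopied: true },
  { type: 'user_message', isRegeneration: true },
  { type: 'ai_response' },
]);
```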
tool() and observe() HOFs
tool()
Wraps an async function to track as [Agent] Tool Call:
import { tool, ToolCallTracker } from '@amplitude/ai';
ToolCallTracker.setAmplitude(ai.amplitude, 'user-123', {
sessionId: 'sess-1',
traceId: 'trace-1',
agentId: 'my-agent',
privacyConfig: ai.config.toPrivacyConfig(),
});
const fetchWeather = tool(
async (args: { city: string }) => {
return await weatherApi.get(args.city);
},
{
name: 'fetch_weather',
inputSchema: { type: 'object', properties: { city: { type: 'string' } } },
timeoutMs: 5000,
onError: (err, name) => console.error(`Tool ${name} failed:`, err),
},
);

observe()
Wraps a function to track as [Agent] Span:
import { observe } from '@amplitude/ai';
const enrichData = observe(async (data: unknown) => transform(data), {
name: 'enrich_data',
agentId: 'enricher',
});

Scoring Patterns
Track quality feedback from multiple sources using the score() method. Scores are emitted as [Agent] Score events.
User Feedback (thumbs up/down)
s.score('thumbs-up', 1, messageId, { source: 'user' });
s.score('thumbs-down', 0, messageId, { source: 'user' });

Numeric Rating
s.score('rating', 4, messageId, {
source: 'user',
comment: 'Very helpful but slightly verbose',
});

LLM-as-Judge
s.score('quality', 0.85, messageId, {
source: 'ai',
comment: 'Clear and accurate response with proper citations',
});

Session-Level Scoring
Score an entire session rather than a single message by setting targetType to 'session':
s.score('session-quality', 0.9, session.sessionId, {
targetType: 'session',
source: 'ai',
});

Score Properties
Each [Agent] Score event includes:
- [Agent] Score Name — the name you provide (e.g., "thumbs-up", "quality")
- [Agent] Score Value — numeric value
- [Agent] Target ID — the message ID or session ID being scored
- [Agent] Target Type — "message" (default) or "session"
- [Agent] Evaluation Source — "user" (default) or "ai"
- [Agent] Comment — optional free-text comment (respects content mode)
Enrichments
Session Enrichments
Attach structured metadata to sessions for analytics. Enrichments are included when the session auto-ends:
import {
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
const enrichments = new SessionEnrichments({
qualityScore: 0.85,
sentimentScore: 0.7,
overallOutcome: 'resolved',
topicClassifications: {
intent: new TopicClassification({
l1: 'billing',
primary: 'billing',
values: ['billing', 'refund'],
subcategories: ['REFUND_REQUEST', 'PRICING_QUESTION'],
}),
},
rubrics: [
new RubricScore({
name: 'helpfulness',
score: 4,
rationale: 'Provided clear step-by-step instructions',
}),
new RubricScore({
name: 'accuracy',
score: 5,
rationale: 'All information was factually correct',
}),
],
agentChain: ['orchestrator', 'researcher', 'writer'],
rootAgentName: 'orchestrator',
requestComplexity: 'medium',
});
session.setEnrichments(enrichments);
// Enrichments are included automatically when session.run() completes

Track Enrichments Separately
Send enrichments as a standalone event without ending the session:
agent.trackSessionEnrichment(enrichments, {
sessionId: 'sess-abc123',
});

End-to-End Example: customer_enriched Mode
This mode is for teams that run their own evaluation pipeline (or can't send message content to Amplitude) but still want rich session-level analytics. Here's a complete workflow:
import {
AIConfig,
AmplitudeAI,
ContentMode,
MessageLabel,
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
// 1. Configure: no content sent to Amplitude
const ai = new AmplitudeAI({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
config: new AIConfig({
contentMode: ContentMode.CUSTOMER_ENRICHED,
}),
});
const agent = ai.agent('support-bot', {
description: 'Handles support conversations in metadata-only mode',
agentVersion: '2.1.0',
});
// 2. Run the conversation — content is NOT sent (metadata only)
const session = agent.session({ userId: 'user-42' });
const { sessionId, messageIds } = await session.run(async (s) => {
const msgIds: string[] = [];
msgIds.push(s.trackUserMessage('Why was I charged twice?'));
msgIds.push(
s.trackAiMessage(
aiResponse.content,
'gpt-4o',
'openai',
latencyMs,
),
);
return { sessionId: s.sessionId, messageIds: msgIds };
});
// 3. Run your eval pipeline on the raw messages (e.g., your own LLM judge)
const evalResults = await myEvalPipeline(conversationHistory);
// 4. Ship enrichments back to Amplitude
const enrichments = new SessionEnrichments({
qualityScore: evalResults.quality,
sentimentScore: evalResults.sentiment,
overallOutcome: evalResults.outcome,
topicClassifications: {
'billing': new TopicClassification({
topic: 'billing-dispute',
confidence: 0.92,
}),
},
rubricScores: [
new RubricScore({ name: 'accuracy', score: 4, maxScore: 5 }),
new RubricScore({ name: 'helpfulness', score: 5, maxScore: 5 }),
],
messageLabels: {
[messageIds[0]]: [
new MessageLabel({ key: 'intent', value: 'billing-dispute', confidence: 0.94 }),
],
},
customMetadata: { eval_model: 'gpt-4o-judge-v2' },
});
agent.trackSessionEnrichment(enrichments, { sessionId });

This produces the same Amplitude event properties as Amplitude's built-in server-side enrichment (topics, rubrics, outcomes, message labels), but sourced from your pipeline. Use it when compliance requires zero-content transmission, or when you need custom evaluation logic beyond what the built-in enrichment provides.
Available Enrichment Fields
- Quality & Sentiment: qualityScore, sentimentScore
- Outcome: overallOutcome, hasTaskFailure, taskFailureType, taskFailureReason
- Topics: topicClassifications — a map of taxonomy name to TopicClassification
- Rubrics: rubrics — array of RubricScore with name, score, rationale, and evidence
- Failure Signals: hasNegativeFeedback, hasDataQualityIssues, hasTechnicalFailure
- Error Analysis: errorCategories, technicalErrorCount
- Behavioral: behavioralPatterns, negativeFeedbackPhrases, dataQualityIssues
- Agent Topology: agentChain, rootAgentName
- Complexity: requestComplexity
- Labels: messageLabels — per-message labels keyed by message ID
- Custom: customMetadata — arbitrary key/value data for your own analytics
Message Labels
Attach classification labels to individual messages within a session. Labels are flexible key-value pairs for filtering and segmentation in Amplitude.
Common use cases: routing tags (flow, surface), classifier output (intent, sentiment, toxicity), business context (tier, plan).
Inline labels (at tracking time):
import { MessageLabel } from '@amplitude/ai';
s.trackUserMessage('I want to cancel my subscription', {
labels: [
new MessageLabel({
key: 'intent',
value: 'cancellation',
confidence: 0.95,
}),
new MessageLabel({
key: 'sentiment',
value: 'frustrated',
confidence: 0.8,
}),
],
});

Retrospective labels (after the session, from a background pipeline):
When classifier results arrive after the session ends, attach them via SessionEnrichments.messageLabels, keyed by the messageId returned from tracking calls:
import { MessageLabel, SessionEnrichments } from '@amplitude/ai';
const enrichments = new SessionEnrichments({
messageLabels: {
[userMsgId]: [
new MessageLabel({ key: 'intent', value: 'cancellation', confidence: 0.94 }),
],
[aiMsgId]: [
new MessageLabel({ key: 'quality', value: 'good', confidence: 0.91 }),
],
},
});
agent.trackSessionEnrichment(enrichments, { sessionId: 'sess-abc123' });

Labels are emitted as [Agent] Message Labels on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".
Debug and Dry-Run Modes
Debug Mode
Prints a colored (ANSI) summary of every tracked event to stderr. All 8 event types (User Message, AI Response, Tool Call, Embedding, Span, Session End, Session Enrichment, Score) are formatted. Events are still sent to Amplitude:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ debug: true }),
});
// stderr output for each event:
// [amplitude-ai] [Agent] AI Response | user=user-123 session=sess-abc agent=my-agent model=gpt-4o latency=1203ms tokens=150→847 cost=$0.0042
// [amplitude-ai] [Agent] Tool Call | user=user-123 session=sess-abc agent=my-agent tool=search_db success=true latency=340ms
// [amplitude-ai] [Agent] User Message | user=user-123 session=sess-abc agent=my-agent

Dry-Run Mode
Logs the full event JSON to stderr WITHOUT sending to Amplitude. Events are never transmitted:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ dryRun: true }),
});
// stderr: full JSON of each event
// Useful for local development, CI pipelines, and validating event shape

Environment Variable Configuration
Both modes can be enabled via environment variables when using auto-instrumentation:
AMPLITUDE_AI_DEBUG=true amplitude-ai-instrument node app.js

Patching
Monkey-patch provider SDKs to auto-track without changing call sites. This is useful for quick verification that the SDK is connected, or for legacy codebases where modifying call sites is impractical. For the full event model (user messages, sessions, scoring, enrichments), use agents + sessions as shown in Quick Start.
import {
AmplitudeAI,
patch,
patchOpenAI,
unpatch,
unpatchOpenAI,
} from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
// Patch installed/available providers (OpenAI, Anthropic, Gemini, Mistral, Bedrock)
patch({ amplitudeAI: ai });
// Or patch specific provider
patchOpenAI({ amplitudeAI: ai });
// Unpatch
unpatch();
unpatchOpenAI();

Available patch functions: patchOpenAI, patchAnthropic, patchAzureOpenAI, patchGemini, patchMistral, patchBedrock. Each has a corresponding unpatch function: unpatchOpenAI, unpatchAnthropic, unpatchAzureOpenAI, unpatchGemini, unpatchMistral, unpatchBedrock.
patch() returns a string[] of providers where at least one supported surface was successfully patched (e.g., ['openai', 'anthropic']), matching the Python SDK's return signature.
Patch surface notes:
- OpenAI/Azure OpenAI: chat.completions.create, chat.completions.parse, and the Responses API are instrumented (including streaming shapes where exposed by the SDK).
- Bedrock: only ConverseCommand and ConverseStreamCommand are instrumented when patching client.send.
Auto-Instrumentation CLI
Preload the register module to auto-patch providers at process start:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js

Or directly with Node's ESM preload flag:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true node --import @amplitude/ai/register app.js

Environment variables:
| Variable | Description |
| --------------------------- | ----------------------------------------------- |
| AMPLITUDE_AI_API_KEY | Required for auto-patch |
| AMPLITUDE_AI_AUTO_PATCH | Must be "true" to enable |
| AMPLITUDE_AI_CONTENT_MODE | full, metadata_only, or customer_enriched |
| AMPLITUDE_AI_DEBUG | "true" for debug output to stderr |
Doctor CLI
Validate setup (env, provider deps, mock event capture, mock flush path):
amplitude-ai doctor

Useful flags:
amplitude-ai doctor --no-mock-check
Status
Show the installed SDK version, detected provider packages, and environment variable configuration at a glance:
amplitude-ai status

Shell Completions
Enable tab-completion for all CLI commands and flags:
# bash
eval "$(amplitude-ai-completions bash)"
# zsh
eval "$(amplitude-ai-completions zsh)"

MCP Server
Run the SDK-local MCP server over stdio:
amplitude-ai mcp

MCP surface:
| Tool | Description |
| ------------------------- | -------------------------------------------------------------------------- |
| scan_project | Scan project structure, detect providers, frameworks, and multi-agent patterns |
| validate_file | Analyze a source file to detect uninstrumented LLM call sites |
| instrument_file | Apply instrumentation transforms to a source file |
| generate_verify_test | Generate a dry-run verification test using MockAmplitudeAI |
| get_event_schema | Return the full event schema and property definitions |
| get_integration_pattern | Return canonical instrumentation code patterns |
| validate_setup | Check env vars and dependency presence |
| suggest_instrumentation | Context-aware next steps based on your framework and provider |
| search_docs | Full-text search across SDK documentation (README, llms-full.txt) |
Resources: amplitude-ai://event-schema, amplitude-ai://integration-patterns, amplitude-ai://instrument-guide
Prompt: instrument_app — guided walkthrough for instrumenting an application
Examples and AI Coding Agent Guide
- amplitude-ai.md — self-contained instrumentation guide for any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.). Run npx amplitude-ai to see the prompt that points your agent to this file.
- Mock-based examples demonstrating the event model (also used as CI smoke tests):
  - examples/zero-code.ts
  - examples/wrap-openai.ts
  - examples/multi-agent.ts
  - examples/framework-integration.ts
- Real provider examples (require API keys):
  - examples/real-openai.ts — end-to-end OpenAI integration with session tracking and flush
  - examples/real-anthropic.ts — end-to-end Anthropic integration with session tracking and flush
Integrations
LangChain
import { AmplitudeAI, AmplitudeCallbackHandler } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const handler = new AmplitudeCallbackHandler({
amplitudeAI: ai,
userId: 'user-123',
sessionId: 'sess-1',
});
// Pass handler to LangChain callbacks

OpenTelemetry
Two exporters add Amplitude as a destination alongside your existing trace backend (Datadog, Honeycomb, Jaeger, etc.):
import {
AmplitudeAgentExporter,
AmplitudeGenAIExporter,
} from '@amplitude/ai';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import {
BatchSpanProcessor,
SimpleSpanProcessor,
} from '@opentelemetry/sdk-trace-base';
const provider = new NodeTracerProvider();
// GenAI exporter — converts gen_ai.* spans into Amplitude AI events
provider.addSpanProcessor(
new BatchSpanProcessor(
new AmplitudeGenAIExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
// Agent exporter — converts agent.* spans into Amplitude session events
provider.addSpanProcessor(
new SimpleSpanProcessor(
new AmplitudeAgentExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
provider.register();

Only spans with gen_ai.provider.name or gen_ai.system attributes are processed; all other spans are silently ignored. This means it is safe to add the exporter to a pipeline that produces mixed (GenAI + HTTP + DB) spans.
Attribute mapping reference:
| OTEL Span Attribute | Amplitude Event Property | Notes |
| --- | --- | --- |
| gen_ai.response.model / gen_ai.request.model | [Agent] Model | Response model preferred |
| gen_ai.system / gen_ai.provider.name | [Agent] Provider | |
| gen_ai.usage.input_tokens | [Agent] Input Tokens | |
| gen_ai.usage.output_tokens | [Agent] Output Tokens | |
| gen_ai.usage.total_tokens | [Agent] Total Tokens | Derived if not present |
| gen_ai.usage.cache_read.input_tokens | [Agent] Cache Read Tokens | |
| gen_ai.usage.cache_creation.input_tokens | [Agent] Cache Creation Tokens | |
| gen_ai.request.temperature | [Agent] Temperature | |
| gen_ai.request.top_p | [Agent] Top P | |
| gen_ai.request.max_output_tokens | [Agent] Max Output Tokens | |
| gen_ai.response.finish_reasons | [Agent] Finish Reason | |
| gen_ai.input.messages | [Agent] LLM Message | Only if content mode allows |
| Span duration | [Agent] Latency Ms | |
| Span status ERROR | [Agent] Is Error, [Agent] Error Message | |
Not available via OTEL (use native wrappers): reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id).
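The attribute mapping table can be sketched as a plain translation function. This is illustrative, not the exporter's actual code; it shows the fallback order (response model over request model) and the derived total-token behavior from the table.

```typescript
// Map a GenAI span's attributes to Amplitude-style event properties,
// following the mapping table above (sketch only).
function mapGenAiAttributes(attrs: Record<string, unknown>) {
  const input = Number(attrs['gen_ai.usage.input_tokens'] ?? 0);
  const output = Number(attrs['gen_ai.usage.output_tokens'] ?? 0);
  return {
    // Response model preferred over request model.
    '[Agent] Model':
      attrs['gen_ai.response.model'] ?? attrs['gen_ai.request.model'],
    '[Agent] Provider':
      attrs['gen_ai.system'] ?? attrs['gen_ai.provider.name'],
    '[Agent] Input Tokens': input,
    '[Agent] Output Tokens': output,
    // Total is derived when the span does not carry it explicitly.
    '[Agent] Total Tokens': Number(
      attrs['gen_ai.usage.total_tokens'] ?? input + output,
    ),
  };
}

const props = mapGenAiAttributes({
  'gen_ai.request.model': 'gpt-4o',
  'gen_ai.system': 'openai',
  'gen_ai.usage.input_tokens': 10,
  'gen_ai.usage.output_tokens': 5,
});
```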
When to use OTEL vs. native wrappers: If you already have @opentelemetry/instrumentation-openai or similar producing GenAI spans, the OTEL bridge gives you Amplitude analytics with zero code changes. If you need the features listed above as unavailable via OTEL, use the native provider wrappers instead.