@llmtracer/sdk
See where your AI budget goes. Lightweight LLM cost tracking SDK for OpenAI.
Wrap your OpenAI client in two lines and get automatic tracking of every API call -- tokens, latency, cost, and model usage -- with zero changes to your application code.
Install
npm install @llmtracer/sdk
Quickstart
import { LLMTracer } from "@llmtracer/sdk";
import OpenAI from "openai";
const tracer = new LLMTracer({
apiKey: process.env.LLMTRACER_KEY,
});
const openai = new OpenAI();
// 2 lines -- that's it
tracer.instrumentOpenAI(openai, {
tags: { feature: "customer-support-bot", env: "production" },
});
// Every OpenAI call is now automatically tracked
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
// In serverless (Lambda, Cloud Functions), flush before returning
await tracer.flush();
Tagging Guide
Tags let you slice costs by any dimension in the dashboard. Global tags (set in the instrumentOpenAI options) apply to every call made through the instrumented client. Per-call tags (set via the llmtracer property on a request) override the globals for that specific call.
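For instance, if the Quickstart above set a global feature tag, a per-call tag with the same key wins for just that request (a minimal sketch; the tag values are illustrative):
// Globals from instrumentOpenAI: { feature: "customer-support-bot", env: "production" }
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this ticket" }],
  // Overrides the global feature tag for this call only
  llmtracer: { tags: { feature: "ticket-summarizer" } },
});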
Track cost by feature
await openai.chat.completions.create({
model: "gpt-4o",
messages: [...],
llmtracer: { tags: { feature: "chat" } }
});
Track cost by user (for B2B apps)
await openai.chat.completions.create({
model: "gpt-4o",
messages: [...],
llmtracer: { tags: { feature: "chat", user_id: req.user.id } }
});
Track cost by customer/tenant
await openai.chat.completions.create({
model: "gpt-4o",
messages: [...],
llmtracer: { tags: { customer: req.tenant.name, feature: "search" } }
});
Track cost by conversation
await openai.chat.completions.create({
model: "gpt-4o",
messages: [...],
llmtracer: { tags: { conversation_id: sessionId, feature: "chat" } }
});
Track environment (global tag)
const tracer = new LLMTracer({
apiKey: "lt_...",
});
tracer.instrumentOpenAI(openai, {
tags: { env: process.env.NODE_ENV } // applies to all calls
});
Tags appear in the dashboard's Breakdown page and Top Tags card. Use them to answer questions like "which customer costs the most?" or "which feature should I optimize?"
Serverless Usage
In environments like AWS Lambda or Google Cloud Functions, call flush() before your function returns to ensure all events are sent:
export async function handler(event) {
const response = await openai.chat.completions.create({ ... });
await tracer.flush();
return response;
}
Agentic Workflow Tracking
Group related LLM calls into traces with named phases:
await tracer.trace("user-request-123", async (t) => {
await t.phase("planning", async () => {
await openai.chat.completions.create({ ... });
});
await t.phase("execution", async () => {
await openai.chat.completions.create({ ... });
});
});
Streaming Support
Streaming calls are instrumented automatically. Token counts are captured from the final chunk:
const stream = await openai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
stream: true,
});
for await (const chunk of stream) {
// use chunk as normal
}
Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Your LLM Tracer API key (starts with lt_) |
| endpoint | string | Production URL | Ingestion endpoint URL |
| maxBatchSize | number | 50 | Max events per batch before auto-flush |
| flushIntervalMs | number | 10000 | Auto-flush interval in milliseconds |
| maxQueueSize | number | 10000 | Max events in queue before dropping oldest |
| maxRetries | number | 3 | Max retry attempts for failed flushes |
| retryBaseMs | number | 1000 | Base delay for exponential backoff |
| sampleRate | number | 1.0 | Sampling rate (0.0-1.0). 1.0 captures everything |
| capturePrompt | boolean | false | Whether to capture full prompt content |
| debug | boolean | false | Enable debug logging to console |
| onFlush | function | null | Callback after each flush with stats |
| onError | function | null | Callback on transport errors |
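All options except apiKey are optional. A sketch showing a few of them combined (the values here are illustrative, not recommendations):
const tracer = new LLMTracer({
  apiKey: process.env.LLMTRACER_KEY,
  sampleRate: 0.25,       // capture roughly 1 in 4 calls
  flushIntervalMs: 5000,  // auto-flush every 5 seconds
  maxBatchSize: 100,      // send up to 100 events per batch
  capturePrompt: false,   // keep full prompt content out of tracked events
});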
API Reference
new LLMTracer(config)
Create a new tracer instance. See Configuration for options.
tracer.instrumentOpenAI(client, options?)
Instrument an OpenAI client instance. All subsequent chat.completions.create calls (streaming and non-streaming) will be tracked automatically.
- client -- an OpenAI client instance
- options.tags -- key-value pairs attached to every event (e.g. { env: "production" })
tracer.flush(): Promise<void>
Flush all buffered events to the backend. Call this in serverless environments before the function returns.
tracer.trace(traceId, fn): Promise<void>
Track an agentic workflow. All LLM calls within the callback are grouped under the given traceId. Use t.phase(name, fn) inside the callback to label phases.
tracer.shutdown(): Promise<void>
Flush remaining events and stop the auto-flush timer. Call this on graceful shutdown.
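For long-lived servers, one way to wire this up is a signal handler (a sketch; the signal handling belongs to your application, not the SDK):
process.on("SIGTERM", async () => {
  await tracer.shutdown(); // flush remaining events and stop the auto-flush timer
  process.exit(0);
});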
Reliability
The SDK is designed to never interfere with your application:
- Never throws -- all internal errors are swallowed silently (enable debug: true for visibility; see the sketch after this list)
- Batching -- events are queued and sent in configurable batches
- Retry with backoff -- failed flushes are retried with exponential backoff and jitter
- Circuit breaker -- after 5 consecutive failures, stops attempting for 60 seconds
- Queue overflow -- drops oldest events when the queue exceeds maxQueueSize
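If you want visibility into errors that would otherwise be swallowed, a minimal sketch using the debug, onFlush, and onError options from Configuration:
const tracer = new LLMTracer({
  apiKey: process.env.LLMTRACER_KEY,
  debug: true,                                              // log internal activity to the console
  onFlush: (stats) => console.log("llmtracer flush", stats), // inspect flush stats
  onError: (err) => {
    // forward transport errors to your own logging/monitoring
    console.error("llmtracer transport error:", err);
  },
});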
License
MIT
