@burnwise/sdk
Track and optimize your LLM costs with zero friction. Wrap your AI SDK client, use it as normal, and see exactly where your money goes.
Features
- Zero friction: Wrap once, track forever
- All major providers: OpenAI, Anthropic, Google, Mistral, xAI, DeepSeek, Perplexity
- Multi-modal support: Track text, image, video, and audio generation costs
- Streaming support: Full support for streaming responses with automatic token tracking
- Privacy first: We only track metadata (tokens, cost, model, latency). We never read prompts or completions.
- Feature tagging: Track costs by feature to understand your spending
- Cost optimization insights: Discover cheaper alternatives with comparable quality
- Real-time dashboard: See costs, anomalies, and optimization opportunities
Installation
npm install @burnwise/sdk
Quick Start
import { burnwise } from "@burnwise/sdk";
import OpenAI from "openai";
// Initialize with your API key
burnwise.init({
apiKey: process.env.BURNWISE_API_KEY!,
});
// Wrap your client
const openai = burnwise.openai.wrap(new OpenAI(), {
feature: "chat-support", // Tag for cost attribution
});
// Use normally - costs are tracked automatically!
const response = await openai.chat.completions.create({
model: "gpt-5.2", // Latest GPT model
messages: [{ role: "user", content: "Hello!" }],
});
Supported Providers
OpenAI
import OpenAI from "openai";
const openai = burnwise.openai.wrap(new OpenAI(), {
feature: "chat-support",
});
// Flagship model - best quality
await openai.chat.completions.create({
model: "gpt-5.2",
messages: [{ role: "user", content: "Hello!" }],
});
// Mini model - 90% cheaper, great for simple tasks
await openai.chat.completions.create({
model: "gpt-5.2-mini",
messages: [{ role: "user", content: "Summarize this text" }],
});
// Reasoning model - for complex logic
await openai.chat.completions.create({
model: "o3-mini", // or "o3" for max reasoning
messages: [{ role: "user", content: "Solve this math problem" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| gpt-5.2 | Complex tasks, creativity | $1.75 / $14 |
| gpt-5.1 | Stable flagship | $1.25 / $10 |
| gpt-5-mini | Simple tasks, high volume | $0.30 / $1 |
| gpt-4.1 | Stable production, 1M context | $2 / $8 |
| gpt-4.1-mini | Cost-efficient, long context | $0.40 / $1.60 |
| gpt-4.1-nano | Ultra-fast, embeddings | $0.10 / $0.40 |
| o3 | Advanced reasoning | $10 / $40 |
| o3-mini | Fast reasoning, math/code | $1.10 / $4.40 |
| o4-mini | Optimized reasoning | $1.10 / $4.40 |
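As a worked example using the table above: a request with 2,000 input tokens and 500 output tokens costs roughly $0.0105 on gpt-5.2 (2,000 × $1.75/1M + 500 × $14/1M) versus about $0.0011 on gpt-5-mini, roughly a 90% saving for simple tasks.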
Anthropic
import Anthropic from "@anthropic-ai/sdk";
const anthropic = burnwise.anthropic.wrap(new Anthropic(), {
feature: "analysis",
});
// Most intelligent model
await anthropic.messages.create({
model: "claude-opus-4-5-20251101",
max_tokens: 1024,
messages: [{ role: "user", content: "Analyze this complex document" }],
});
// Best coding model - recommended for most use cases
await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Review this code" }],
});
// Streaming - usage is tracked automatically when stream completes
const stream = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a story" }],
stream: true,
});
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
// Usage is automatically tracked after the stream completes
// Fast & cheap - great for simple tasks
await anthropic.messages.create({
model: "claude-haiku-4-5-20251001",
max_tokens: 1024,
messages: [{ role: "user", content: "Classify this text" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| claude-opus-4-5-20251101 | Complex reasoning, enterprise | $5 / $25 |
| claude-sonnet-4-5-20250929 | Coding, agents, balanced | $3 / $15 |
| claude-haiku-4-5-20251001 | Fast responses, high volume | $1 / $5 |
Google Gemini
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
// Flagship model - #1 on LMArena
const pro = burnwise.google.wrapModel(
genAI.getGenerativeModel({ model: "gemini-3.0-pro" }),
{ feature: "analysis" }
);
// Fast & efficient - great default
const flash = burnwise.google.wrapModel(
genAI.getGenerativeModel({ model: "gemini-3.0-flash" }),
{ feature: "summarization" }
);
const result = await flash.generateContent("Summarize this article");
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| gemini-3-pro-preview | Complex reasoning, flagship | $2 / $12 |
| gemini-3-flash-preview | Fast, cost-efficient | $0.50 / $3 |
| gemini-2.5-pro | Stable production | $1.25 / $10 |
| gemini-2.5-flash | Fast responses | $0.30 / $2.50 |
| gemini-2.0-flash | Ultra-fast, cheap | $0.10 / $0.40 |
Mistral
import { Mistral } from "@mistralai/mistralai";
const mistral = burnwise.mistral.wrap(new Mistral(), {
feature: "code-completion",
});
// Flagship MoE model (675B params, 41B active)
await mistral.chat.complete({
model: "mistral-large-3",
messages: [{ role: "user", content: "Complex analysis" }],
});
// Small models - run locally or ultra-cheap
await mistral.chat.complete({
model: "ministral-8b", // or "ministral-3b" for even smaller
messages: [{ role: "user", content: "Quick task" }],
});
// Coding specialist
await mistral.chat.complete({
model: "devstral-2", // or "devstral-small-2" for efficiency
messages: [{ role: "user", content: "Write a function" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| mistral-large-3 | Complex tasks, MoE flagship | $2 / $6 |
| mistral-medium-3 | Balanced performance | $1 / $3 |
| mistral-small-3 | Cost-efficient | $0.20 / $0.60 |
| ministral-8b | Edge/local deployment | $0.10 / $0.10 |
| ministral-3b | Ultra-lightweight | $0.04 / $0.04 |
| devstral-2 | Code agents (123B) | $0.50 / $1.50 |
| devstral-small-2 | Fast coding (24B) | $0.10 / $0.30 |
xAI (Grok)
import OpenAI from "openai";
const xai = burnwise.xai.wrap(
new OpenAI({
baseURL: "https://api.x.ai/v1",
apiKey: process.env.XAI_API_KEY!,
}),
{ feature: "reasoning" }
);
// Top reasoning model (#1 on LMArena Text Arena)
await xai.chat.completions.create({
model: "grok-4.1",
messages: [{ role: "user", content: "Complex reasoning task" }],
});
// Fast variant for agents (2M context!)
await xai.chat.completions.create({
model: "grok-4.1-fast",
messages: [{ role: "user", content: "Agent task" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| grok-4 | Top reasoning, flagship | $3 / $15 |
| grok-4.1-fast | Agents, 2M context, ultra-cheap | $0.20 / $0.50 |
| grok-4-fast | Fast inference | $0.20 / $0.50 |
| grok-3 | Stable production | $3 / $15 |
| grok-3-mini | Cost-efficient | $0.30 / $0.50 |
| grok-2-vision | Vision tasks | $2 / $10 |
DeepSeek
import OpenAI from "openai";
const deepseek = burnwise.deepseek.wrap(
new OpenAI({
baseURL: "https://api.deepseek.com/v1",
apiKey: process.env.DEEPSEEK_API_KEY!,
}),
{ feature: "coding" }
);
// Latest hybrid model with thinking
await deepseek.chat.completions.create({
model: "deepseek-v3.2",
messages: [{ role: "user", content: "Code review" }],
});
// Reasoning model
await deepseek.chat.completions.create({
model: "deepseek-r1",
messages: [{ role: "user", content: "Solve this problem" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| deepseek-v3.2 | Hybrid thinking + tools | $0.27 / $1.10 |
| deepseek-r1 | Deep reasoning | $0.55 / $2.19 |
| deepseek-chat | Fast chat | $0.14 / $0.28 |
Perplexity
import OpenAI from "openai";
const perplexity = burnwise.perplexity.wrap(
new OpenAI({
baseURL: "https://api.perplexity.ai",
apiKey: process.env.PERPLEXITY_API_KEY!,
}),
{ feature: "research" }
);
// Deep research with citations
await perplexity.chat.completions.create({
model: "sonar-deep-research",
messages: [{ role: "user", content: "Research this topic" }],
});
// Fast search
await perplexity.chat.completions.create({
model: "sonar",
messages: [{ role: "user", content: "Quick search" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| sonar-pro | Pro search with citations | $3 / $15 |
| sonar-reasoning-pro | Reasoning + search | $2 / $8 |
| sonar-reasoning | Fast reasoning | $1 / $5 |
| sonar | Quick search | $1 / $1 |
Note: Perplexity also charges per-request fees based on search context ($5-$14 per 1K requests).
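For example, 10,000 sonar requests would add between $50 and $140 in request fees on top of token costs.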
Multi-Modal Support
Burnwise tracks costs across all content types: text (LLM completions), images, videos, and audio.
Image Generation
import { calculateImageCost, IMAGE_PRICING } from "@burnwise/sdk";
// DALL-E 3 pricing (per image)
const cost = calculateImageCost("dall-e-3", 4, "1024x1024", "standard");
// → $0.16 (4 images × $0.04)
const hdCost = calculateImageCost("dall-e-3", 2, "1792x1024", "hd");
// → $0.24 (2 images × $0.12)
Available image models:
| Model | Provider | Cost per image |
|-------|----------|----------------|
| dall-e-3 (1024x1024) | OpenAI | $0.04 |
| dall-e-3 (1024x1024, HD) | OpenAI | $0.08 |
| dall-e-3 (1792x1024) | OpenAI | $0.08 |
| dall-e-3 (1792x1024, HD) | OpenAI | $0.12 |
| dall-e-2 (1024x1024) | OpenAI | $0.02 |
| imagen-4.0-generate-001 | Google | $0.04 |
| imagen-4.0-ultra-generate-001 | Google | $0.06 |
| imagen-4.0-fast-generate-001 | Google | $0.02 |
| grok-2-image-1212 | xAI | $0.07 |
Video Generation
import { calculateVideoCost, VIDEO_PRICING } from "@burnwise/sdk";
// Veo 3.1 pricing (per second)
const cost = calculateVideoCost("veo-3.1-generate-preview", 8);
// → $3.20 (8 seconds × $0.40)
const fastCost = calculateVideoCost("veo-3.1-fast-generate-preview", 8);
// → $1.20 (8 seconds × $0.15)
Available video models:
| Model | Provider | Cost per second |
|-------|----------|-----------------|
| veo-3.1-generate-preview | Google | $0.40 |
| veo-3.1-fast-generate-preview | Google | $0.15 |
| veo-3.0-generate-001 | Google | $0.40 |
| veo-3.0-fast-generate-001 | Google | $0.15 |
Audio (Text-to-Speech)
import { calculateAudioCost, AUDIO_PRICING } from "@burnwise/sdk";
// TTS pricing (per 1K characters)
const cost = calculateAudioCost("tts-1", 5000);
// → $0.075 (5K chars × $0.015)
const hdCost = calculateAudioCost("tts-1-hd", 5000);
// → $0.15 (5K chars × $0.030)
Available audio models:
| Model | Provider | Cost |
|-------|----------|------|
| tts-1 | OpenAI | $0.015 / 1K chars |
| tts-1-hd | OpenAI | $0.030 / 1K chars |
| whisper-1 | OpenAI | $0.0001 / second |
Manual Multi-Modal Tracking
import { track } from "@burnwise/sdk";
// Track image generation
await track({
provider: "openai",
model: "dall-e-3",
contentType: "image",
feature: "avatar-generation",
imageCount: 4,
imageResolution: "1024x1024",
imageQuality: "hd",
costUsd: 0.32, // 4 × $0.08
latencyMs: 12000,
});
// Track video generation
await track({
provider: "google",
model: "veo-3.1-generate-preview",
contentType: "video",
feature: "marketing-video",
videoDurationSec: 15,
videoResolution: "1080p",
costUsd: 6.0, // 15s × $0.40
latencyMs: 45000,
});
// Track TTS
await track({
provider: "openai",
model: "tts-1-hd",
contentType: "audio",
feature: "podcast-narration",
audioCharacters: 10000,
audioVoice: "nova",
costUsd: 0.30, // 10K chars × $0.03
latencyMs: 5000,
});
Streaming Support
All provider wrappers support streaming responses with automatic token tracking. The SDK intercepts the stream, captures usage data from stream events, and tracks costs when the stream completes.
How It Works
- For OpenAI-compatible APIs (OpenAI, xAI, DeepSeek, Perplexity): The SDK automatically adds stream_options: { include_usage: true } to ensure token counts are included in the final chunk.
- For Anthropic: Usage is extracted from message_start (input tokens) and message_delta (output tokens) events.
- For Google Gemini: Both generateContent() and generateContentStream() are wrapped, with usage extracted from usageMetadata.
- For Mistral: The chat.stream() method is wrapped to capture usage from stream chunks.
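Conceptually, the interception works like the simplified sketch below (illustrative only, not the SDK's actual internals): a pass-through async generator that remembers the usage field from the final chunk and reports it once the stream ends.
// Illustrative sketch of the interception pattern, not Burnwise internals:
// pass chunks through unchanged and report usage once the stream completes.
async function* withUsageTracking<T extends { usage?: unknown }>(
  stream: AsyncIterable<T>,
  onUsage: (usage: unknown) => void
): AsyncGenerator<T> {
  let usage: unknown;
  for await (const chunk of stream) {
    if (chunk.usage) usage = chunk.usage; // final chunk carries token counts
    yield chunk; // consumer sees the stream exactly as the provider sent it
  }
  if (usage) onUsage(usage); // track cost only after the stream has fully completed
}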
Examples
// OpenAI streaming
const stream = await openai.chat.completions.create({
model: "gpt-5.2",
messages: [{ role: "user", content: "Tell me a story" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Usage tracked automatically when loop completes
// Anthropic streaming
const stream = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a poem" }],
stream: true,
});
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
// Usage tracked automatically
// Google Gemini streaming
const result = await model.generateContentStream("Explain quantum computing");
for await (const chunk of result.stream) {
process.stdout.write(chunk.text());
}
// Usage tracked automatically
// Mistral streaming
const stream = await mistral.chat.stream({
model: "mistral-large-3",
messages: [{ role: "user", content: "Hello!" }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Usage tracked automatically
Cost Optimization Tips
Burnwise automatically identifies optimization opportunities. Here are common patterns:
1. Use Mini/Small Models for Simple Tasks
// Instead of using gpt-5.2 for everything...
// Use mini models for:
// - Text classification
// - Simple extractions
// - Summarization
// - Formatting
// Save 90% on costs with comparable quality:
const model = isComplexTask ? "gpt-5.2" : "gpt-5.2-mini";
2. Match Model to Task Complexity
| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| Classification | gpt-5.2-mini, claude-haiku-4-5 | Simple pattern matching |
| Summarization | gemini-3.0-flash, ministral-8b | Fast, good enough |
| Code generation | claude-sonnet-4-5, devstral-2 | Specialized for code |
| Complex reasoning | gpt-5.2, claude-opus-4-5, grok-4.1 | Full capability needed |
| Research | sonar-deep-research | Built-in search |
| High volume | deepseek-chat, ministral-3b | Ultra-cheap |
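As an illustration (not an SDK API; the mapping simply mirrors the table and model names above), this kind of routing could look like:
// Illustrative only: route each task category to a model from the table above.
type TaskType = "classification" | "summarization" | "code" | "complex-reasoning";
const MODEL_BY_TASK: Record<TaskType, string> = {
  classification: "gpt-5.2-mini",
  summarization: "gemini-3.0-flash",
  code: "claude-sonnet-4-5-20250929",
  "complex-reasoning": "claude-opus-4-5-20251101",
};
const model = MODEL_BY_TASK["summarization"]; // "gemini-3.0-flash"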
3. Consider Open-Weight Alternatives
DeepSeek and Mistral offer open-weight models with excellent quality at a fraction of the cost:
| Proprietary | Open Alternative | Cost Savings |
|-------------|------------------|--------------|
| GPT-5.2 | DeepSeek-V3.2 | ~95% cheaper |
| Claude Sonnet | Devstral-2 | ~85% cheaper |
| GPT-5.2-mini | Ministral-8b | ~80% cheaper |
Configuration Options
burnwise.init({
// Required: Your Burnwise API key
apiKey: "bw_live_xxx",
// Optional: API endpoint (defaults to Burnwise cloud)
endpoint: "https://api.burnwise.io",
// Optional: Enable debug logging
debug: false,
// Optional: Batch size for sending events (default: 10)
batchSize: 10,
// Optional: Flush interval in ms (default: 5000)
flushInterval: 5000,
});
Wrap Options
const client = burnwise.openai.wrap(new OpenAI(), {
// Required: Feature name for cost attribution
feature: "chat-support",
// Optional: Project ID (auto-detected if not provided)
projectId: "proj_xxx",
// Optional: Additional metadata
metadata: {
environment: "production",
userId: "user_123",
},
});
Manual Tracking
For advanced use cases, you can track events manually:
import { track } from "@burnwise/sdk";
await track({
provider: "openai",
model: "gpt-5.2",
feature: "custom-feature",
inputTokens: 100,
outputTokens: 50,
latencyMs: 1200,
metadata: { custom: "data" },
});
Hierarchical Tracing (Agent Orchestration)
When building AI agents that call other agents, you often want to track costs both individually AND as a total for the orchestrating agent. Burnwise supports hierarchical tracing with automatic context propagation using Node.js AsyncLocalStorage.
Basic Usage
import { burnwise } from "@burnwise/sdk";
// Wrap a function to create a trace span
await burnwise.trace("idea-analysis", async () => {
// All LLM calls inside this function will be tagged with:
// - traceId: unique ID for this entire execution tree
// - spanId: unique ID for this specific span
// - spanName: "idea-analysis"
// - traceDepth: 0 (root level)
const market = await burnwise.trace("market-scan", async () => {
// This nested span will have:
// - same traceId as parent
// - its own spanId
// - parentSpanId pointing to "idea-analysis"
// - traceDepth: 1
return await marketAgent.run(idea);
});
const financial = await burnwise.trace("financial-analysis", async () => {
// Nesting works up to 3 levels deep
const projections = await burnwise.trace("projections", async () => {
return await projectionsAgent.run();
});
const risks = await burnwise.trace("risk-assessment", async () => {
return await riskAgent.run();
});
return { projections, risks };
});
return { market, financial };
});
How It Works
Automatic Context Propagation: When you call burnwise.trace(), it creates a trace context using Node.js AsyncLocalStorage. All LLM calls made within that function automatically inherit the trace context.
Tree Structure: Each span has:
- traceId: UUID shared by all spans in the same execution tree
- spanId: UUID unique to this specific span
- parentSpanId: UUID of the parent span (undefined for root)
- spanName: Human-readable name (e.g., "market-scan")
- traceDepth: Level in the tree (0 = root, max 3)
Depth Limit: Maximum 3 levels of nesting. If you exceed this, a warning is logged and the function runs without creating a new span.
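For context, here is a minimal sketch of that propagation pattern in plain Node.js (illustrative only; traceSketch is a hypothetical helper, not the SDK's implementation):
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";
// Simplified illustration of context propagation, not Burnwise's internal code:
// anything awaited inside run() can read the current span context without it
// being passed through function arguments.
interface SpanContext { traceId: string; spanId: string; spanName: string; }
const storage = new AsyncLocalStorage<SpanContext>();
async function traceSketch<T>(spanName: string, fn: () => Promise<T>): Promise<T> {
  const parent = storage.getStore(); // undefined at the root
  const context: SpanContext = {
    traceId: parent?.traceId ?? randomUUID(), // new tree when there is no parent
    spanId: randomUUID(),
    spanName,
  };
  return storage.run(context, fn); // fn (and the LLM calls inside it) inherits `context`
}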
Querying Traces
In the dashboard, you can:
- View all spans belonging to a trace grouped together
- See the total cost of an agent orchestration (sum of all spans in a trace)
- See individual sub-agent costs
- Visualize the call tree timeline
API Reference
// Async trace (most common)
const result = await burnwise.trace("span-name", async () => {
return await doSomething();
});
// Sync trace (for synchronous functions)
const result = burnwise.traceSync("span-name", () => {
return doSomethingSync();
});
// Trace with detailed result info
const { result, spanId, traceId, durationMs } = await burnwise.traceWithResult(
"span-name",
async () => {
return await doSomething();
}
);
// Check if currently inside a trace
if (burnwise.isInTrace()) {
console.log("Currently in a trace");
}
// Get current trace context
const context = burnwise.getTraceContext();
if (context) {
console.log(`Trace: ${context.traceId}, Span: ${context.spanId}`);
}
Example: Multi-Agent System
import { burnwise } from "@burnwise/sdk";
import Anthropic from "@anthropic-ai/sdk";
burnwise.init({ apiKey: process.env.BURNWISE_API_KEY! });
const anthropic = burnwise.anthropic.wrap(new Anthropic(), {
feature: "idea-analysis",
});
async function analyzeIdea(idea: string) {
return burnwise.trace("idea-analysis", async () => {
// Market analysis sub-agent
const market = await burnwise.trace("market-scan", async () => {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 2000,
messages: [{ role: "user", content: `Analyze market for: ${idea}` }],
});
return response.content[0].text;
});
// Competitor analysis sub-agent
const competitors = await burnwise.trace("competitor-analysis", async () => {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 2000,
messages: [{ role: "user", content: `Find competitors for: ${idea}` }],
});
return response.content[0].text;
});
// Final synthesis
const synthesis = await burnwise.trace("synthesis", async () => {
const response = await anthropic.messages.create({
model: "claude-opus-4-5-20251101",
max_tokens: 4000,
messages: [{
role: "user",
content: `Synthesize analysis:\nMarket: ${market}\nCompetitors: ${competitors}`,
}],
});
return response.content[0].text;
});
return { market, competitors, synthesis };
});
}
// All 4 LLM calls will be tracked with the same traceId
// You can see total cost of "idea-analysis" and individual costs
const analysis = await analyzeIdea("AI-powered recipe generator");
Privacy
Burnwise is designed with privacy as a core principle:
- We only track metadata: token counts, cost, model, latency
- We never read, store, or transmit prompt content
- We never read, store, or transmit completion content
- All data is encrypted in transit and at rest
- GDPR compliant
License
MIT
