@burnwise/sdk
Track and optimize your LLM costs with zero friction. Wrap your AI SDK client, use it as normal, and see exactly where your money goes.
Features
- Zero friction: Wrap once, track forever
- All major providers: OpenAI, Anthropic, Google, Mistral, xAI, DeepSeek, Perplexity
- Multi-modal support: Track text, image, video, and audio generation costs
- Streaming support: Full support for streaming responses with automatic token tracking
- Privacy first: We only track metadata (tokens, cost, model, latency). We never read prompts or completions.
- Feature tagging: Track costs by feature to understand your spending
- Cost optimization insights: Discover cheaper alternatives with comparable quality
- Real-time dashboard: See costs, anomalies, and optimization opportunities
Installation
npm install @burnwise/sdk
Quick Start
import { burnwise } from "@burnwise/sdk";
import OpenAI from "openai";
// Initialize with your API key
burnwise.init({
apiKey: process.env.BURNWISE_API_KEY!,
});
// Wrap your client
const openai = burnwise.openai.wrap(new OpenAI(), {
feature: "chat-support", // Tag for cost attribution
});
// Use normally - costs are tracked automatically!
const response = await openai.chat.completions.create({
model: "gpt-5.2", // Latest GPT model
messages: [{ role: "user", content: "Hello!" }],
});
Supported Providers
OpenAI
import OpenAI from "openai";
const openai = burnwise.openai.wrap(new OpenAI(), {
feature: "chat-support",
});
// Flagship model - best quality
await openai.chat.completions.create({
model: "gpt-5.2",
messages: [{ role: "user", content: "Hello!" }],
});
// Mini model - 90% cheaper, great for simple tasks
await openai.chat.completions.create({
model: "gpt-5.2-mini",
messages: [{ role: "user", content: "Summarize this text" }],
});
// Reasoning model - for complex logic
await openai.chat.completions.create({
model: "o3-mini", // or "o3" for max reasoning
messages: [{ role: "user", content: "Solve this math problem" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| gpt-5.2 | Complex tasks, creativity | $1.75 / $14 |
| gpt-5.1 | Stable flagship | $1.25 / $10 |
| gpt-5-mini | Simple tasks, high volume | $0.30 / $1 |
| gpt-4.1 | Stable production, 1M context | $2 / $8 |
| gpt-4.1-mini | Cost-efficient, long context | $0.40 / $1.60 |
| gpt-4.1-nano | Ultra-fast, embeddings | $0.10 / $0.40 |
| o3 | Advanced reasoning | $10 / $40 |
| o3-mini | Fast reasoning, math/code | $1.10 / $4.40 |
| o4-mini | Optimized reasoning | $1.10 / $4.40 |
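As a worked example using the table above: a request with 2,000 input tokens and 500 output tokens costs roughly $0.0105 on gpt-5.2 (2,000 × $1.75/1M + 500 × $14/1M) versus about $0.0011 on gpt-5-mini, roughly a 90% saving for simple tasks.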
Anthropic
import Anthropic from "@anthropic-ai/sdk";
const anthropic = burnwise.anthropic.wrap(new Anthropic(), {
feature: "analysis",
});
// Most intelligent model
await anthropic.messages.create({
model: "claude-opus-4-5-20251101",
max_tokens: 1024,
messages: [{ role: "user", content: "Analyze this complex document" }],
});
// Best coding model - recommended for most use cases
await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Review this code" }],
});
// Streaming - usage is tracked automatically when stream completes
const stream = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a story" }],
stream: true,
});
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
// Usage is automatically tracked after the stream completes
// Fast & cheap - great for simple tasks
await anthropic.messages.create({
model: "claude-haiku-4-5-20251001",
max_tokens: 1024,
messages: [{ role: "user", content: "Classify this text" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| claude-opus-4-5-20251101 | Complex reasoning, enterprise | $5 / $25 |
| claude-sonnet-4-5-20250929 | Coding, agents, balanced | $3 / $15 |
| claude-haiku-4-5-20251001 | Fast responses, high volume | $1 / $5 |
Google Gemini
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
// Flagship model - #1 on LMArena
const pro = burnwise.google.wrapModel(
genAI.getGenerativeModel({ model: "gemini-3.0-pro" }),
{ feature: "analysis" }
);
// Fast & efficient - great default
const flash = burnwise.google.wrapModel(
genAI.getGenerativeModel({ model: "gemini-3.0-flash" }),
{ feature: "summarization" }
);
const result = await flash.generateContent("Summarize this article");
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| gemini-3-pro-preview | Complex reasoning, flagship | $2 / $12 |
| gemini-3-flash-preview | Fast, cost-efficient | $0.50 / $3 |
| gemini-2.5-pro | Stable production | $1.25 / $10 |
| gemini-2.5-flash | Fast responses | $0.30 / $2.50 |
| gemini-2.0-flash | Ultra-fast, cheap | $0.10 / $0.40 |
Mistral
import { Mistral } from "@mistralai/mistralai";
const mistral = burnwise.mistral.wrap(new Mistral(), {
feature: "code-completion",
});
// Flagship MoE model (675B params, 41B active)
await mistral.chat.complete({
model: "mistral-large-3",
messages: [{ role: "user", content: "Complex analysis" }],
});
// Small models - run locally or ultra-cheap
await mistral.chat.complete({
model: "ministral-8b", // or "ministral-3b" for even smaller
messages: [{ role: "user", content: "Quick task" }],
});
// Coding specialist
await mistral.chat.complete({
model: "devstral-2", // or "devstral-small-2" for efficiency
messages: [{ role: "user", content: "Write a function" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| mistral-large-3 | Complex tasks, MoE flagship | $2 / $6 |
| mistral-medium-3 | Balanced performance | $1 / $3 |
| mistral-small-3 | Cost-efficient | $0.20 / $0.60 |
| ministral-8b | Edge/local deployment | $0.10 / $0.10 |
| ministral-3b | Ultra-lightweight | $0.04 / $0.04 |
| devstral-2 | Code agents (123B) | $0.50 / $1.50 |
| devstral-small-2 | Fast coding (24B) | $0.10 / $0.30 |
xAI (Grok)
import OpenAI from "openai";
const xai = burnwise.xai.wrap(
new OpenAI({
baseURL: "https://api.x.ai/v1",
apiKey: process.env.XAI_API_KEY!,
}),
{ feature: "reasoning" }
);
// Top reasoning model (#1 on LMArena Text Arena)
await xai.chat.completions.create({
model: "grok-4.1",
messages: [{ role: "user", content: "Complex reasoning task" }],
});
// Fast variant for agents (2M context!)
await xai.chat.completions.create({
model: "grok-4.1-fast",
messages: [{ role: "user", content: "Agent task" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| grok-4 | Top reasoning, flagship | $3 / $15 |
| grok-4.1-fast | Agents, 2M context, ultra-cheap | $0.20 / $0.50 |
| grok-4-fast | Fast inference | $0.20 / $0.50 |
| grok-3 | Stable production | $3 / $15 |
| grok-3-mini | Cost-efficient | $0.30 / $0.50 |
| grok-2-vision | Vision tasks | $2 / $10 |
DeepSeek
import OpenAI from "openai";
const deepseek = burnwise.deepseek.wrap(
new OpenAI({
baseURL: "https://api.deepseek.com/v1",
apiKey: process.env.DEEPSEEK_API_KEY!,
}),
{ feature: "coding" }
);
// Latest hybrid model with thinking
await deepseek.chat.completions.create({
model: "deepseek-v3.2",
messages: [{ role: "user", content: "Code review" }],
});
// Reasoning model
await deepseek.chat.completions.create({
model: "deepseek-r1",
messages: [{ role: "user", content: "Solve this problem" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| deepseek-v3.2 | Hybrid thinking + tools | $0.27 / $1.10 |
| deepseek-r1 | Deep reasoning | $0.55 / $2.19 |
| deepseek-chat | Fast chat | $0.14 / $0.28 |
Perplexity
import OpenAI from "openai";
const perplexity = burnwise.perplexity.wrap(
new OpenAI({
baseURL: "https://api.perplexity.ai",
apiKey: process.env.PERPLEXITY_API_KEY!,
}),
{ feature: "research" }
);
// Deep research with citations
await perplexity.chat.completions.create({
model: "sonar-deep-research",
messages: [{ role: "user", content: "Research this topic" }],
});
// Fast search
await perplexity.chat.completions.create({
model: "sonar",
messages: [{ role: "user", content: "Quick search" }],
});
Available models:
| Model | Best for | Cost (input/output per 1M tokens) |
|-------|----------|-----------------------------------|
| sonar-pro | Pro search with citations | $3 / $15 |
| sonar-reasoning-pro | Reasoning + search | $2 / $8 |
| sonar-reasoning | Fast reasoning | $1 / $5 |
| sonar | Quick search | $1 / $1 |
Note: Perplexity also charges per-request fees based on search context ($5-$14 per 1K requests).
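For example, 10,000 sonar requests would add between $50 and $140 in request fees on top of token costs.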
Multi-Modal Support
Burnwise tracks costs across all content types: text (LLM completions), images, videos, and audio.
Image Generation
import { calculateImageCost, IMAGE_PRICING } from "@burnwise/sdk";
// DALL-E 3 pricing (per image)
const cost = calculateImageCost("dall-e-3", 4, "1024x1024", "standard");
// → $0.16 (4 images × $0.04)
const hdCost = calculateImageCost("dall-e-3", 2, "1792x1024", "hd");
// → $0.24 (2 images × $0.12)
Available image models:
| Model | Provider | Cost per image |
|-------|----------|----------------|
| dall-e-3 (1024x1024) | OpenAI | $0.04 |
| dall-e-3 (1024x1024, HD) | OpenAI | $0.08 |
| dall-e-3 (1792x1024) | OpenAI | $0.08 |
| dall-e-3 (1792x1024, HD) | OpenAI | $0.12 |
| dall-e-2 (1024x1024) | OpenAI | $0.02 |
| imagen-4.0-generate-001 | Google | $0.04 |
| imagen-4.0-ultra-generate-001 | Google | $0.06 |
| imagen-4.0-fast-generate-001 | Google | $0.02 |
| grok-2-image-1212 | xAI | $0.07 |
Video Generation
import { calculateVideoCost, VIDEO_PRICING } from "@burnwise/sdk";
// Veo 3.1 pricing (per second)
const cost = calculateVideoCost("veo-3.1-generate-preview", 8);
// → $3.20 (8 seconds × $0.40)
const fastCost = calculateVideoCost("veo-3.1-fast-generate-preview", 8);
// → $1.20 (8 seconds × $0.15)
Available video models:
| Model | Provider | Cost per second |
|-------|----------|-----------------|
| veo-3.1-generate-preview | Google | $0.40 |
| veo-3.1-fast-generate-preview | Google | $0.15 |
| veo-3.0-generate-001 | Google | $0.40 |
| veo-3.0-fast-generate-001 | Google | $0.15 |
Audio (Text-to-Speech)
import { calculateAudioCost, AUDIO_PRICING } from "@burnwise/sdk";
// TTS pricing (per 1K characters)
const cost = calculateAudioCost("tts-1", 5000);
// → $0.075 (5K chars × $0.015)
const hdCost = calculateAudioCost("tts-1-hd", 5000);
// → $0.15 (5K chars × $0.030)
Available audio models:
| Model | Provider | Cost |
|-------|----------|------|
| tts-1 | OpenAI | $0.015 / 1K chars |
| tts-1-hd | OpenAI | $0.030 / 1K chars |
| whisper-1 | OpenAI | $0.0001 / second |
Manual Multi-Modal Tracking
import { track } from "@burnwise/sdk";
// Track image generation
await track({
provider: "openai",
model: "dall-e-3",
contentType: "image",
feature: "avatar-generation",
imageCount: 4,
imageResolution: "1024x1024",
imageQuality: "hd",
costUsd: 0.32, // 4 × $0.08
latencyMs: 12000,
});
// Track video generation
await track({
provider: "google",
model: "veo-3.1-generate-preview",
contentType: "video",
feature: "marketing-video",
videoDurationSec: 15,
videoResolution: "1080p",
costUsd: 6.0, // 15s × $0.40
latencyMs: 45000,
});
// Track TTS
await track({
provider: "openai",
model: "tts-1-hd",
contentType: "audio",
feature: "podcast-narration",
audioCharacters: 10000,
audioVoice: "nova",
costUsd: 0.30, // 10K chars × $0.03
latencyMs: 5000,
});
Streaming Support
All provider wrappers support streaming responses with automatic token tracking. The SDK intercepts the stream, captures usage data from stream events, and tracks costs when the stream completes.
How It Works
- For OpenAI-compatible APIs (OpenAI, xAI, DeepSeek, Perplexity): The SDK automatically adds stream_options: { include_usage: true } to ensure token counts are included in the final chunk.
- For Anthropic: Usage is extracted from message_start (input tokens) and message_delta (output tokens) events.
- For Google Gemini: Both generateContent() and generateContentStream() are wrapped, with usage extracted from usageMetadata.
- For Mistral: The chat.stream() method is wrapped to capture usage from stream chunks.
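Conceptually, the interception works like the simplified sketch below (illustrative only, not the SDK's actual internals): a pass-through async generator that remembers the usage field from the final chunk and reports it once the stream ends.
// Illustrative sketch of the interception pattern, not Burnwise internals:
// pass chunks through unchanged and report usage once the stream completes.
async function* withUsageTracking<T extends { usage?: unknown }>(
  stream: AsyncIterable<T>,
  onUsage: (usage: unknown) => void
): AsyncGenerator<T> {
  let usage: unknown;
  for await (const chunk of stream) {
    if (chunk.usage) usage = chunk.usage; // final chunk carries token counts
    yield chunk; // consumer sees the stream exactly as the provider sent it
  }
  if (usage) onUsage(usage); // track cost only after the stream has fully completed
}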
Examples
// OpenAI streaming
const stream = await openai.chat.completions.create({
model: "gpt-5.2",
messages: [{ role: "user", content: "Tell me a story" }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Usage tracked automatically when loop completes
// Anthropic streaming
const stream = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 1024,
messages: [{ role: "user", content: "Write a poem" }],
stream: true,
});
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
// Usage tracked automatically
// Google Gemini streaming
const result = await model.generateContentStream("Explain quantum computing");
for await (const chunk of result.stream) {
process.stdout.write(chunk.text());
}
// Usage tracked automatically
// Mistral streaming
const stream = await mistral.chat.stream({
model: "mistral-large-3",
messages: [{ role: "user", content: "Hello!" }],
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Usage tracked automatically
Cost Optimization Tips
Burnwise automatically identifies optimization opportunities. Here are common patterns:
1. Use Mini/Small Models for Simple Tasks
// Instead of using gpt-5.2 for everything...
// Use mini models for:
// - Text classification
// - Simple extractions
// - Summarization
// - Formatting
// Save 90% on costs with comparable quality:
const model = isComplexTask ? "gpt-5.2" : "gpt-5.2-mini";
2. Match Model to Task Complexity
| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| Classification | gpt-5.2-mini, claude-haiku-4-5 | Simple pattern matching |
| Summarization | gemini-3.0-flash, ministral-8b | Fast, good enough |
| Code generation | claude-sonnet-4-5, devstral-2 | Specialized for code |
| Complex reasoning | gpt-5.2, claude-opus-4-5, grok-4.1 | Full capability needed |
| Research | sonar-deep-research | Built-in search |
| High volume | deepseek-chat, ministral-3b | Ultra-cheap |
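As an illustration (not an SDK API; the mapping simply mirrors the table and model names above), this kind of routing could look like:
// Illustrative only: route each task category to a model from the table above.
type TaskType = "classification" | "summarization" | "code" | "complex-reasoning";
const MODEL_BY_TASK: Record<TaskType, string> = {
  classification: "gpt-5.2-mini",
  summarization: "gemini-3.0-flash",
  code: "claude-sonnet-4-5-20250929",
  "complex-reasoning": "claude-opus-4-5-20251101",
};
const model = MODEL_BY_TASK["summarization"]; // "gemini-3.0-flash"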
3. Consider Open-Weight Alternatives
DeepSeek and Mistral offer open-weight models with excellent quality at a fraction of the cost:
| Proprietary | Open Alternative | Cost Savings |
|-------------|------------------|--------------|
| GPT-5.2 | DeepSeek-V3.2 | ~95% cheaper |
| Claude Sonnet | Devstral-2 | ~85% cheaper |
| GPT-5.2-mini | Ministral-8b | ~80% cheaper |
Configuration Options
burnwise.init({
// Required: Your Burnwise API key
apiKey: "bw_live_xxx",
// Optional: API endpoint (defaults to Burnwise cloud)
endpoint: "https://api.burnwise.io",
// Optional: Enable debug logging
debug: false,
// Optional: Batch size for sending events (default: 10)
batchSize: 10,
// Optional: Flush interval in ms (default: 5000)
flushInterval: 5000,
});
Wrap Options
const client = burnwise.openai.wrap(new OpenAI(), {
// Required: Feature name for cost attribution
feature: "chat-support",
// Optional: Project ID (auto-detected if not provided)
projectId: "proj_xxx",
// Optional: Additional metadata
metadata: {
environment: "production",
userId: "user_123",
},
});
Manual Tracking
For advanced use cases, you can track events manually:
import { track } from "@burnwise/sdk";
await track({
provider: "openai",
model: "gpt-5.2",
feature: "custom-feature",
inputTokens: 100,
outputTokens: 50,
latencyMs: 1200,
metadata: { custom: "data" },
});
Hierarchical Tracing (Agent Orchestration)
When building AI agents that call other agents, you often want to track costs both individually AND as a total for the orchestrating agent. Burnwise supports hierarchical tracing with automatic context propagation using Node.js AsyncLocalStorage.
Basic Usage
import { burnwise } from "@burnwise/sdk";
// Wrap a function to create a trace span
await burnwise.trace("idea-analysis", async () => {
// All LLM calls inside this function will be tagged with:
// - traceId: unique ID for this entire execution tree
// - spanId: unique ID for this specific span
// - spanName: "idea-analysis"
// - traceDepth: 0 (root level)
const market = await burnwise.trace("market-scan", async () => {
// This nested span will have:
// - same traceId as parent
// - its own spanId
// - parentSpanId pointing to "idea-analysis"
// - traceDepth: 1
return await marketAgent.run(idea);
});
const financial = await burnwise.trace("financial-analysis", async () => {
// Nesting works up to 3 levels deep
const projections = await burnwise.trace("projections", async () => {
return await projectionsAgent.run();
});
const risks = await burnwise.trace("risk-assessment", async () => {
return await riskAgent.run();
});
return { projections, risks };
});
return { market, financial };
});
How It Works
Automatic Context Propagation: When you call burnwise.trace(), it creates a trace context using Node.js AsyncLocalStorage. All LLM calls made within that function automatically inherit the trace context.
Tree Structure: Each span has:
- traceId: UUID shared by all spans in the same execution tree
- spanId: UUID unique to this specific span
- parentSpanId: UUID of the parent span (undefined for root)
- spanName: Human-readable name (e.g., "market-scan")
- traceDepth: Level in the tree (0 = root, max 3)
Depth Limit: Maximum 3 levels of nesting. If you exceed this, a warning is logged and the function runs without creating a new span.
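For context, here is a minimal sketch of that propagation pattern in plain Node.js (illustrative only; traceSketch is a hypothetical helper, not the SDK's implementation):
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";
// Simplified illustration of context propagation, not Burnwise's internal code:
// anything awaited inside run() can read the current span context without it
// being passed through function arguments.
interface SpanContext { traceId: string; spanId: string; spanName: string; }
const storage = new AsyncLocalStorage<SpanContext>();
async function traceSketch<T>(spanName: string, fn: () => Promise<T>): Promise<T> {
  const parent = storage.getStore(); // undefined at the root
  const context: SpanContext = {
    traceId: parent?.traceId ?? randomUUID(), // new tree when there is no parent
    spanId: randomUUID(),
    spanName,
  };
  return storage.run(context, fn); // fn (and the LLM calls inside it) inherits `context`
}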
Querying Traces
In the dashboard, you can:
- View all spans belonging to a trace grouped together
- See the total cost of an agent orchestration (sum of all spans in a trace)
- See individual sub-agent costs
- Visualize the call tree timeline
API Reference
// Async trace (most common)
const result = await burnwise.trace("span-name", async () => {
return await doSomething();
});
// Sync trace (for synchronous functions)
const result = burnwise.traceSync("span-name", () => {
return doSomethingSync();
});
// Trace with detailed result info
const { result, spanId, traceId, durationMs } = await burnwise.traceWithResult(
"span-name",
async () => {
return await doSomething();
}
);
// Check if currently inside a trace
if (burnwise.isInTrace()) {
console.log("Currently in a trace");
}
// Get current trace context
const context = burnwise.getTraceContext();
if (context) {
console.log(`Trace: ${context.traceId}, Span: ${context.spanId}`);
}
Example: Multi-Agent System
import { burnwise } from "@burnwise/sdk";
import Anthropic from "@anthropic-ai/sdk";
burnwise.init({ apiKey: process.env.BURNWISE_API_KEY! });
const anthropic = burnwise.anthropic.wrap(new Anthropic(), {
feature: "idea-analysis",
});
async function analyzeIdea(idea: string) {
return burnwise.trace("idea-analysis", async () => {
// Market analysis sub-agent
const market = await burnwise.trace("market-scan", async () => {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 2000,
messages: [{ role: "user", content: `Analyze market for: ${idea}` }],
});
return response.content[0].text;
});
// Competitor analysis sub-agent
const competitors = await burnwise.trace("competitor-analysis", async () => {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-5-20250929",
max_tokens: 2000,
messages: [{ role: "user", content: `Find competitors for: ${idea}` }],
});
return response.content[0].text;
});
// Final synthesis
const synthesis = await burnwise.trace("synthesis", async () => {
const response = await anthropic.messages.create({
model: "claude-opus-4-5-20251101",
max_tokens: 4000,
messages: [{
role: "user",
content: `Synthesize analysis:\nMarket: ${market}\nCompetitors: ${competitors}`,
}],
});
return response.content[0].text;
});
return { market, competitors, synthesis };
});
}
// All 4 LLM calls will be tracked with the same traceId
// You can see total cost of "idea-analysis" and individual costs
const analysis = await analyzeIdea("AI-powered recipe generator");
Privacy
Burnwise is designed with privacy as a core principle:
- We only track metadata: token counts, cost, model, latency
- We never read, store, or transmit prompt content
- We never read, store, or transmit completion content
- All data is encrypted in transit and at rest
- GDPR compliant
License
MIT
