llm-observatory
v0.5.0
LLM Observatory SDK
Early Access — This SDK is in early access. The API may change between versions. We'd love your feedback — report an issue if you run into anything.
The official Node.js SDK for LLM Observatory — monitor your LLM costs, latency, and quality with zero code changes.
```shell
npm install llm-observatory
```

Quick Start

```typescript
import { Observatory } from 'llm-observatory';
import Anthropic from '@anthropic-ai/sdk';

const observatory = new Observatory({
  apiKey: 'lo_your_api_key',
});

const anthropic = new Anthropic();
const traced = observatory.anthropic(anthropic);

// Use exactly like the original client — traces are captured automatically
const response = await traced.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

That's it. Every LLM call is now traced with model, tokens, cost, latency, and full input/output.
For full documentation visit the docs.
Supported Providers
| Provider | Wrapper | Streaming | Tool Calls |
|----------|---------|-----------|------------|
| Anthropic | `observatory.anthropic(client)` | `stream: true`, `.messages.stream()` | Yes |
| OpenAI | `observatory.openai(client)` | `stream: true` | Yes |
| Google Gemini | `observatory.gemini(client)` | `generateContentStream()`, `chat.sendMessageStream()` | Yes |
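Tracing a streamed response means observing chunks without disturbing them. One common way to do this (a sketch under our own naming, not the SDK's internals) is an async generator that re-yields every chunk and finalizes the trace when the stream ends:

```typescript
// Sketch: pass chunks through while counting them, finalizing the trace
// when the stream completes. `onDone` stands in for real trace submission.
async function* traceStream<T>(
  stream: AsyncIterable<T>,
  onDone: (chunkCount: number) => void,
): AsyncGenerator<T> {
  let chunkCount = 0;
  for await (const chunk of stream) {
    chunkCount++;
    yield chunk; // the consumer sees the stream unchanged
  }
  onDone(chunkCount);
}

// Illustrative fake stream standing in for a provider's streaming response
async function* fakeChunks(): AsyncGenerator<string> {
  yield 'Hel';
  yield 'lo';
  yield '!';
}
```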
Install

```shell
npm install llm-observatory
```

Configuration

```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',                     // Required for Cloud. Optional for self-hosted.
  captureInput: true,                   // Capture input messages (default: true)
  captureOutput: true,                  // Capture output content (default: true)
  flushInterval: 5000,                  // Flush buffer every N ms (default: 5000)
  maxBatchSize: 25,                     // Max traces before auto-flush (default: 25)
  debug: false,                         // Log SDK activity to console (default: false)
  promptSlug: 'my-agent',               // Default prompt slug for all traces
  tags: ['production'],                 // Default tags (merged with per-call tags)
  metadata: { service: 'backend' },     // Default metadata (merged with per-call metadata)
  maxBudgetUsd: 10.00,                  // Global budget limit (throws BudgetExceededError)
  tagBudgets: { chat: 5.00 },           // Per-tag budget limits
  onBudgetWarning: (info) => { ... },   // Warning callback at 80%, 90%, 95%
});
```

Note: `baseUrl` defaults to Observatory Cloud (https://api.myllmobservatory.com). You don't need to set it unless you're self-hosting — see below.
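The `flushInterval`/`maxBatchSize` pair describes a buffered exporter: traces accumulate in memory and ship in batches. A minimal sketch of that pattern (illustrative names, with the interval timer elided and `sendBatch` standing in for the real HTTP export):

```typescript
// Sketch of batched trace export: flush when the buffer reaches
// maxBatchSize; a periodic timer would also call flush().
function makeTraceBuffer(maxBatchSize: number, sendBatch: (batch: object[]) => void) {
  let buffer: object[] = [];
  return {
    add(trace: object) {
      buffer.push(trace);
      if (buffer.length >= maxBatchSize) this.flush();
    },
    flush() {
      if (buffer.length === 0) return; // nothing to send
      sendBatch(buffer);
      buffer = [];
    },
  };
}
```

Batching keeps tracing off the hot path: each traced call only pushes to an in-memory array, and network I/O happens once per batch.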
Cost Budget
Set spending limits to prevent runaway costs. The SDK estimates cost locally using a built-in pricing table and throws `BudgetExceededError` before making any call that would exceed the limit.
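Conceptually, the pre-call check multiplies token counts by per-token rates and refuses the call if the estimate would push total spend past the limit. A sketch under illustrative assumptions (the rates below are made up, and `BudgetExceededError` here is a local stand-in for the class the SDK exports):

```typescript
// Illustrative per-million-token pricing table (not the SDK's real data)
const PRICING_PER_MTOK: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50, output: 10.00 },
};

class BudgetExceededError extends Error {}

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING_PER_MTOK[model];
  if (!p) return 0; // unknown model: no local estimate
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

function checkBudget(currentUsd: number, limitUsd: number, estimatedUsd: number): void {
  // Throw before the request is sent, so no spend occurs past the limit
  if (currentUsd + estimatedUsd > limitUsd) {
    throw new BudgetExceededError(
      `estimated $${estimatedUsd.toFixed(4)} would exceed the $${limitUsd} limit`,
    );
  }
}
```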
Global Budget
```typescript
import { Observatory, BudgetExceededError } from 'llm-observatory';
import OpenAI from 'openai';

const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 10.00, // Max $10 total spend
});

const traced = observatory.openai(new OpenAI());

try {
  const response = await traced.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.log(`Budget exceeded: ${err.message}`);
    // err.currentUsd, err.limitUsd, err.estimatedCostUsd
  }
}
```

Per-Tag Budgets
Limit spend per use case (e.g., chat, summarization, tools):
```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 50.00,
  tagBudgets: {
    'chat': 20.00,
    'summarization': 10.00,
    'embedding': 5.00,
  },
});

// This call's cost counts against the 'chat' tag budget
await traced.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  __tags: ['chat'],
});
```

Budget Warnings
Get notified before hitting the limit:
```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 100.00,
  warningThresholds: [0.8, 0.9, 0.95], // default
  onBudgetWarning: (info) => {
    console.warn(`${info.type} budget at ${info.percentUsed}%: $${info.currentUsd}/$${info.limitUsd}`);
    // info.type: 'global' | 'tag'
    // info.tag: string (only for tag warnings)
  },
});
```

Budget Snapshot
Check budget status at any time:
```typescript
const budget = observatory.getBudget();
// { totalCostUsd: 4.52, tagCosts: { chat: 3.10, tools: 1.42 }, globalLimit: 10, tagLimits: { chat: 5 } }
```

Budgets work with all providers (OpenAI, Anthropic, Gemini), all modes (streaming and non-streaming), and self-hosted deployments. Cost is estimated locally using the built-in pricing table and synced with server data on each flush.
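The accounting behind the snapshot and the warnings can be sketched as a small ledger: spend is tallied globally and per tag, and each warning threshold fires once when crossed. All names below are illustrative, not the SDK's internals:

```typescript
// Sketch of budget accounting: global + per-tag tallies, one-shot warnings.
type Warning = { type: 'global'; percentUsed: number; currentUsd: number; limitUsd: number };

function makeBudgetLedger(
  globalLimit: number,
  tagLimits: Record<string, number>,
  thresholds: number[] = [0.8, 0.9, 0.95],
  onWarning: (info: Warning) => void = () => {},
) {
  let totalCostUsd = 0;
  const tagCosts: Record<string, number> = {};
  const fired = new Set<number>(); // thresholds that already warned

  return {
    record(costUsd: number, tags: string[] = []) {
      totalCostUsd += costUsd;
      for (const tag of tags) tagCosts[tag] = (tagCosts[tag] ?? 0) + costUsd;
      for (const t of thresholds) {
        if (!fired.has(t) && totalCostUsd >= globalLimit * t) {
          fired.add(t);
          onWarning({
            type: 'global',
            percentUsed: Math.round(t * 100),
            currentUsd: totalCostUsd,
            limitUsd: globalLimit,
          });
        }
      }
    },
    getBudget() {
      return { totalCostUsd, tagCosts, globalLimit, tagLimits };
    },
  };
}
```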
Privacy & Data
Your prompts and completions are yours. Observatory is built with privacy as a core principle:
- Full control over what's captured — disable input/output capture at any time with `captureInput: false` and `captureOutput: false`. Traces still record model, tokens, cost, and latency — just without the actual content.
- We never sell, share, or train on your data — your traces are stored securely and only accessible to your team.
- Self-hosted mode — point the SDK to your own backend and only metrics are sent (no prompts or completions leave your infrastructure).
- Encryption in transit — all data is sent over HTTPS with TLS encryption.
Deployment Modes
The SDK supports three deployment modes:
| Mode | `apiKey` | `baseUrl` | What's sent |
|------|----------|-----------|-------------|
| Observatory Cloud (default) | Required (`lo_...`) | Not needed | Full traces: model, tokens, cost, latency, input, output |
| Self-hosted with auth | Optional (your own keys) | Your backend URL | Metrics only: model, tokens, cost, latency, status, tags |
| Self-hosted, no auth | Not needed | Your backend URL | Metrics only: model, tokens, cost, latency, status, tags |
Observatory Cloud
The default mode. Sign up at myllmobservatory.com to get your lo_... API key. Full traces including input/output are captured.
Self-Hosted Mode
If you prefer to run your own Observatory backend, set a custom baseUrl:
```typescript
// With your own auth
const observatory = new Observatory({
  apiKey: 'my-internal-key',
  baseUrl: 'https://observatory.your-company.com',
});

// Without auth (internal networks, dev environments)
const observatory = new Observatory({
  baseUrl: 'http://localhost:3001',
});
```

In self-hosted mode, the SDK sends metrics only: model, tokens, cost, latency, status, tags, and metadata. Input and output content (your prompts and completions) are never sent to custom endpoints, keeping your data fully within your infrastructure.
What "metrics only" means concretely:
| Data | Cloud | Self-Hosted |
|------|-------|-------------|
| Model name | Sent | Sent |
| Token counts (input, output, cache, reasoning) | Sent | Sent |
| Latency (start/end time) | Sent | Sent |
| Status (success/error/timeout) | Sent | Sent |
| Tags & metadata | Sent | Sent |
| Prompt slug & version | Sent | Sent |
| Input messages (your prompts) | Sent | Never sent |
| Output content (completions) | Sent | Never sent |
| Tool call details | Sent | Never sent |
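One plausible way to implement this split is to strip the content fields from a trace before export whenever a custom backend is configured. A sketch with illustrative field names (not the SDK's real wire format):

```typescript
// Sketch of "metrics only" filtering for self-hosted export.
type FullTrace = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: string;
  tags: string[];
  input?: unknown;  // prompt messages
  output?: unknown; // completion content
};

function toExportPayload(trace: FullTrace, selfHosted: boolean): Record<string, unknown> {
  if (!selfHosted) return { ...trace }; // Cloud: full trace
  // Self-hosted: destructure the content fields away, send the rest
  const { input, output, ...metrics } = trace;
  return metrics;
}
```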
When you're ready for full trace visibility, prompt debugging, eval analytics, and team collaboration — upgrade to Observatory Cloud and simply remove the baseUrl parameter.
Dashboard
The SDK captures traces automatically, but the real power is in the Observatory Dashboard. Sign up for a free account to get:
- Real-time trace viewer — inspect every LLM call with full request/response detail
- Cost analytics — track spend across models and providers, set budget alerts
- Latency monitoring — p50/p95/p99 latencies, compare providers side by side
- Prompt management — version and track prompt templates with `__promptSlug`
- Team collaboration — shared dashboards, role-based access, and audit logs
- Eval analytics — run automated quality evaluations on your traces
Support
Found a bug or have a feature request? Report an issue on our website.
License
MIT
