llm-observatory
v0.5.0
LLM Observatory SDK
Early Access — This SDK is in early access. The API may change between versions. We'd love your feedback — report an issue if you run into anything.
The official Node.js SDK for LLM Observatory — monitor your LLM costs, latency, and quality with zero code changes.
```shell
npm install llm-observatory
```

Quick Start

```typescript
import { Observatory } from 'llm-observatory';
import Anthropic from '@anthropic-ai/sdk';

const observatory = new Observatory({
  apiKey: 'lo_your_api_key',
});

const anthropic = new Anthropic();
const traced = observatory.anthropic(anthropic);

// Use exactly like the original client — traces are captured automatically
const response = await traced.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

That's it. Every LLM call is now traced with model, tokens, cost, latency, and full input/output.
For full documentation visit the docs.
Supported Providers
| Provider | Wrapper | Streaming | Tool Calls |
|----------|---------|-----------|------------|
| Anthropic | `observatory.anthropic(client)` | `stream: true`, `.messages.stream()` | Yes |
| OpenAI | `observatory.openai(client)` | `stream: true` | Yes |
| Google Gemini | `observatory.gemini(client)` | `generateContentStream()`, `chat.sendMessageStream()` | Yes |
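Tracing a streamed response means observing chunks without disturbing them. One common way to do this (a sketch under our own naming, not the SDK's internals) is an async generator that re-yields every chunk and finalizes the trace when the stream ends:

```typescript
// Sketch: pass chunks through while counting them, finalizing the trace
// when the stream completes. `onDone` stands in for real trace submission.
async function* traceStream<T>(
  stream: AsyncIterable<T>,
  onDone: (chunkCount: number) => void,
): AsyncGenerator<T> {
  let chunkCount = 0;
  for await (const chunk of stream) {
    chunkCount++;
    yield chunk; // the consumer sees the stream unchanged
  }
  onDone(chunkCount);
}

// Illustrative fake stream standing in for a provider's streaming response
async function* fakeChunks(): AsyncGenerator<string> {
  yield 'Hel';
  yield 'lo';
  yield '!';
}
```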
Install

```shell
npm install llm-observatory
```

Configuration

```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',                     // Required for Cloud. Optional for self-hosted.
  captureInput: true,                   // Capture input messages (default: true)
  captureOutput: true,                  // Capture output content (default: true)
  flushInterval: 5000,                  // Flush buffer every N ms (default: 5000)
  maxBatchSize: 25,                     // Max traces before auto-flush (default: 25)
  debug: false,                         // Log SDK activity to console (default: false)
  promptSlug: 'my-agent',               // Default prompt slug for all traces
  tags: ['production'],                 // Default tags (merged with per-call tags)
  metadata: { service: 'backend' },     // Default metadata (merged with per-call metadata)
  maxBudgetUsd: 10.00,                  // Global budget limit (throws BudgetExceededError)
  tagBudgets: { chat: 5.00 },           // Per-tag budget limits
  onBudgetWarning: (info) => { ... },   // Warning callback at 80%, 90%, 95%
});
```

Note: `baseUrl` defaults to Observatory Cloud (https://api.myllmobservatory.com). You don't need to set it unless you're self-hosting — see below.
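The `flushInterval`/`maxBatchSize` pair describes a buffered exporter: traces accumulate in memory and ship in batches. A minimal sketch of that pattern (illustrative names, with the interval timer elided and `sendBatch` standing in for the real HTTP export):

```typescript
// Sketch of batched trace export: flush when the buffer reaches
// maxBatchSize; a periodic timer would also call flush().
function makeTraceBuffer(maxBatchSize: number, sendBatch: (batch: object[]) => void) {
  let buffer: object[] = [];
  return {
    add(trace: object) {
      buffer.push(trace);
      if (buffer.length >= maxBatchSize) this.flush();
    },
    flush() {
      if (buffer.length === 0) return; // nothing to send
      sendBatch(buffer);
      buffer = [];
    },
  };
}
```

Batching keeps tracing off the hot path: each traced call only pushes to an in-memory array, and network I/O happens once per batch.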
Cost Budget
Set spending limits to prevent runaway costs. The SDK estimates cost locally using a built-in pricing table and throws `BudgetExceededError` before making any call that would exceed the limit.
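Conceptually, the pre-call check multiplies token counts by per-token rates and refuses the call if the estimate would push total spend past the limit. A sketch under illustrative assumptions (the rates below are made up, and `BudgetExceededError` here is a local stand-in for the class the SDK exports):

```typescript
// Illustrative per-million-token pricing table (not the SDK's real data)
const PRICING_PER_MTOK: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.50, output: 10.00 },
};

class BudgetExceededError extends Error {}

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING_PER_MTOK[model];
  if (!p) return 0; // unknown model: no local estimate
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

function checkBudget(currentUsd: number, limitUsd: number, estimatedUsd: number): void {
  // Throw before the request is sent, so no spend occurs past the limit
  if (currentUsd + estimatedUsd > limitUsd) {
    throw new BudgetExceededError(
      `estimated $${estimatedUsd.toFixed(4)} would exceed the $${limitUsd} limit`,
    );
  }
}
```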
Global Budget
```typescript
import { Observatory, BudgetExceededError } from 'llm-observatory';
import OpenAI from 'openai';

const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 10.00, // Max $10 total spend
});

const traced = observatory.openai(new OpenAI());

try {
  const response = await traced.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.log(`Budget exceeded: ${err.message}`);
    // err.currentUsd, err.limitUsd, err.estimatedCostUsd
  }
}
```

Per-Tag Budgets
Limit spend per use case (e.g., chat, summarization, tools):
```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 50.00,
  tagBudgets: {
    'chat': 20.00,
    'summarization': 10.00,
    'embedding': 5.00,
  },
});

// This call's cost counts against the 'chat' tag budget
await traced.chat.completions.create({
  model: 'gpt-4o',
  messages: [...],
  __tags: ['chat'],
});
```

Budget Warnings
Get notified before hitting the limit:
```typescript
const observatory = new Observatory({
  apiKey: 'lo_...',
  maxBudgetUsd: 100.00,
  warningThresholds: [0.8, 0.9, 0.95], // default
  onBudgetWarning: (info) => {
    console.warn(`${info.type} budget at ${info.percentUsed}%: $${info.currentUsd}/$${info.limitUsd}`);
    // info.type: 'global' | 'tag'
    // info.tag: string (only for tag warnings)
  },
});
```

Budget Snapshot
Check budget status at any time:
```typescript
const budget = observatory.getBudget();
// { totalCostUsd: 4.52, tagCosts: { chat: 3.10, tools: 1.42 }, globalLimit: 10, tagLimits: { chat: 5 } }
```

Budgets work with all providers (OpenAI, Anthropic, Gemini), all modes (streaming and non-streaming), and self-hosted deployments. Cost is estimated locally using the built-in pricing table and synced with server data on each flush.
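The accounting behind the snapshot and the warnings can be sketched as a small ledger: spend is tallied globally and per tag, and each warning threshold fires once when crossed. All names below are illustrative, not the SDK's internals:

```typescript
// Sketch of budget accounting: global + per-tag tallies, one-shot warnings.
type Warning = { type: 'global'; percentUsed: number; currentUsd: number; limitUsd: number };

function makeBudgetLedger(
  globalLimit: number,
  tagLimits: Record<string, number>,
  thresholds: number[] = [0.8, 0.9, 0.95],
  onWarning: (info: Warning) => void = () => {},
) {
  let totalCostUsd = 0;
  const tagCosts: Record<string, number> = {};
  const fired = new Set<number>(); // thresholds that already warned

  return {
    record(costUsd: number, tags: string[] = []) {
      totalCostUsd += costUsd;
      for (const tag of tags) tagCosts[tag] = (tagCosts[tag] ?? 0) + costUsd;
      for (const t of thresholds) {
        if (!fired.has(t) && totalCostUsd >= globalLimit * t) {
          fired.add(t);
          onWarning({
            type: 'global',
            percentUsed: Math.round(t * 100),
            currentUsd: totalCostUsd,
            limitUsd: globalLimit,
          });
        }
      }
    },
    getBudget() {
      return { totalCostUsd, tagCosts, globalLimit, tagLimits };
    },
  };
}
```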
Privacy & Data
Your prompts and completions are yours. Observatory is built with privacy as a core principle:
- Full control over what's captured — disable input/output capture at any time with `captureInput: false` and `captureOutput: false`. Traces still record model, tokens, cost, and latency — just without the actual content.
- We never sell, share, or train on your data — your traces are stored securely and only accessible to your team.
- Self-hosted mode — point the SDK to your own backend and only metrics are sent (no prompts or completions leave your infrastructure).
- Encryption in transit — all data is sent over HTTPS with TLS encryption.
Deployment Modes
The SDK supports three deployment modes:
| Mode | `apiKey` | `baseUrl` | What's sent |
|------|----------|-----------|-------------|
| Observatory Cloud (default) | Required (`lo_...`) | Not needed | Full traces: model, tokens, cost, latency, input, output |
| Self-hosted with auth | Optional (your own keys) | Your backend URL | Metrics only: model, tokens, cost, latency, status, tags |
| Self-hosted, no auth | Not needed | Your backend URL | Metrics only: model, tokens, cost, latency, status, tags |
Observatory Cloud
The default mode. Sign up at myllmobservatory.com to get your lo_... API key. Full traces including input/output are captured.
Self-Hosted Mode
If you prefer to run your own Observatory backend, set a custom baseUrl:
```typescript
// With your own auth
const observatory = new Observatory({
  apiKey: 'my-internal-key',
  baseUrl: 'https://observatory.your-company.com',
});

// Without auth (internal networks, dev environments)
const observatory = new Observatory({
  baseUrl: 'http://localhost:3001',
});
```

In self-hosted mode, the SDK sends metrics only: model, tokens, cost, latency, status, tags, and metadata. Input and output content (your prompts and completions) are never sent to custom endpoints, keeping your data fully within your infrastructure.
What "metrics only" means concretely:
| Data | Cloud | Self-Hosted |
|------|-------|-------------|
| Model name | Sent | Sent |
| Token counts (input, output, cache, reasoning) | Sent | Sent |
| Latency (start/end time) | Sent | Sent |
| Status (success/error/timeout) | Sent | Sent |
| Tags & metadata | Sent | Sent |
| Prompt slug & version | Sent | Sent |
| Input messages (your prompts) | Sent | Never sent |
| Output content (completions) | Sent | Never sent |
| Tool call details | Sent | Never sent |
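One plausible way to implement this split is to strip the content fields from a trace before export whenever a custom backend is configured. A sketch with illustrative field names (not the SDK's real wire format):

```typescript
// Sketch of "metrics only" filtering for self-hosted export.
type FullTrace = {
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: string;
  tags: string[];
  input?: unknown;  // prompt messages
  output?: unknown; // completion content
};

function toExportPayload(trace: FullTrace, selfHosted: boolean): Record<string, unknown> {
  if (!selfHosted) return { ...trace }; // Cloud: full trace
  // Self-hosted: destructure the content fields away, send the rest
  const { input, output, ...metrics } = trace;
  return metrics;
}
```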
When you're ready for full trace visibility, prompt debugging, eval analytics, and team collaboration — upgrade to Observatory Cloud and simply remove the baseUrl parameter.
Dashboard
The SDK captures traces automatically, but the real power is in the Observatory Dashboard. Sign up for a free account to get:
- Real-time trace viewer — inspect every LLM call with full request/response detail
- Cost analytics — track spend across models and providers, set budget alerts
- Latency monitoring — p50/p95/p99 latencies, compare providers side by side
- Prompt management — version and track prompt templates with `__promptSlug`
- Team collaboration — shared dashboards, role-based access, and audit logs
- Eval analytics — run automated quality evaluations on your traces
Support
Found a bug or have a feature request? Report an issue on our website.
License
MIT
