@reallyartificial/agent-obs
v0.1.0
Zero-dependency agent observability SDK with OTel GenAI semantic conventions
Trace what your AI agents do. Zero dependencies. No server. No vendor lock-in.
npm install @reallyartificial/agent-obs
What it does
Wraps your agent's LLM calls and tool invocations in spans. When the agent finishes, you get a trace tree with latency, token counts, and cost:
Trace d4f2a1 | research-agent | 1,247ms | $0.0034
Agent: research-agent [ok] 1,247ms
  LLM: gpt-4o [ok] 823ms | 150 in / 50 out | $0.0009
  Tool: web-search [ok] 312ms
Tokens: 150 in / 50 out | Cost: $0.0034 | Errors: 0
Quick start
import { createTracer, ConsoleExporter } from "@reallyartificial/agent-obs";
const tracer = createTracer({
  serviceName: "research-agent",
  exporters: [new ConsoleExporter()],
});
const agent = tracer.startAgentSpan("research-agent");
const llm = agent.startLLMSpan("gpt-4o", { provider: "openai" });
llm.setTokenUsage({ input: 150, output: 50 });
llm.end();
const tool = agent.startToolSpan("web-search", { input: { query: "AI news" } });
tool.setOutput({ results: ["..."] });
tool.end();
agent.end(); // flushes the full trace tree
Span types
AgentSpan: wraps an agent run. Nest LLM, tool, and sub-agent spans inside it.
const agent = tracer.startAgentSpan("orchestrator");
const subAgent = agent.startAgentSpan("researcher");
LLMSpan: a single LLM API call. Tracks the model and token counts, and calculates cost automatically on end().
const llm = agent.startLLMSpan("claude-sonnet-4-5-20250929", {
  provider: "anthropic",
  temperature: 0.7,
});
llm.setTokenUsage({ input: 200, output: 100 });
llm.setFinishReason("stop");
llm.end(); // cost calculated automatically
ToolSpan: a tool or function call. Tracks input and output.
const tool = agent.startToolSpan("database-query", {
  input: { sql: "SELECT ..." },
});
tool.setOutput({ rows: 42 });
tool.end();
Any span can create child spans of any type, nested to arbitrary depth.
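To make the nesting concrete, here is a minimal sketch of how a flat list of spans could be folded back into an indented tree. The `FlatSpan` shape and its `spanId` / `parentSpanId` fields are assumptions made for illustration; check the package's `SpanData` type for the real field names.

```typescript
// Hypothetical flat span shape; the real SpanData fields may differ.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface SpanNode extends FlatSpan {
  children: SpanNode[];
}

// Fold a flat span list into root nodes with nested children.
function buildTree(spans: FlatSpan[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.spanId, { ...span, children: [] });
  }
  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // no known parent: treat as a root span
    }
  });
  return roots;
}

// Render one line per span, indented two spaces per nesting level.
function renderTree(nodes: SpanNode[], depth = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push("  ".repeat(depth) + node.name);
    lines.push(...renderTree(node.children, depth + 1));
  }
  return lines;
}
```

Because children are attached by parent ID rather than by span type, the same code handles an agent inside a tool inside an agent without special cases.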
Exporters
ConsoleExporter
Pretty-prints trace trees to stdout.
new ConsoleExporter()
new ConsoleExporter({ logger: myCustomLogger })
JSONExporter
Writes one JSON object per span (JSONL format).
// Callback
new JSONExporter({ onLine: (json) => sendToMyBackend(json) })
// File (Node.js only)
new JSONExporter({ filePath: "./traces.jsonl" })
MemoryExporter
Stores spans in memory. Built for tests.
const exporter = new MemoryExporter();
// after running your agent...
exporter.getSpans(); // all spans
exporter.getSpansByKind("llm"); // just LLM spans
exporter.getTraces(); // grouped by traceId
exporter.reset(); // clear
Custom exporters
Implement SpanExporter:
import type { SpanExporter, SpanData } from "@reallyartificial/agent-obs";
const myExporter: SpanExporter = {
  export(spans: SpanData[]) {
    // send to your backend, database, whatever
  },
  shutdown() {
    // optional cleanup
  },
};
The export method can return void or Promise<void>. tracer.flush() and tracer.shutdown() await async exporters before resolving.
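As one possible async exporter, the sketch below buffers spans and hands them to a callback in fixed-size batches. `BatchingExporter`, `send`, and `batchSize` are hypothetical names, and the local `SpanData` / `SpanExporter` declarations are stand-ins so the block compiles on its own (including the assumption that `shutdown` is optional); in a real project, import the types from the package instead.

```typescript
// Local stand-ins; replace with imports from "@reallyartificial/agent-obs".
interface SpanData {
  traceId: string;
  name: string;
}
interface SpanExporter {
  export(spans: SpanData[]): void | Promise<void>;
  shutdown?(): void | Promise<void>;
}

type Send = (batch: SpanData[]) => void | Promise<void>;

// Hypothetical exporter that forwards spans in batches of `batchSize`.
class BatchingExporter implements SpanExporter {
  private buffer: SpanData[] = [];
  private readonly send: Send;
  private readonly batchSize: number;

  constructor(send: Send, batchSize = 10) {
    this.send = send;
    this.batchSize = batchSize;
  }

  async export(spans: SpanData[]): Promise<void> {
    this.buffer.push(...spans);
    // Drain full batches; leftovers wait for the next call or shutdown().
    while (this.buffer.length >= this.batchSize) {
      await this.send(this.buffer.splice(0, this.batchSize));
    }
  }

  // Send whatever is left so no spans are lost on shutdown.
  async shutdown(): Promise<void> {
    if (this.buffer.length > 0) {
      await this.send(this.buffer.splice(0));
    }
  }
}
```

With an exporter like this, calling `await tracer.shutdown()` before the process exits matters, since the final partial batch is only sent from `shutdown()`.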
Cost tracking
Built-in pricing for 24 models: GPT-4o/4/3.5, o1/o3, Claude Opus/Sonnet/Haiku, Gemini 2.0/1.5, Llama 3.1. Cost is auto-calculated when an LLM span ends.
Override or add models:
const tracer = createTracer({
  serviceName: "my-agent",
  pricing: {
    "my-fine-tune": { inputPer1M: 5, outputPer1M: 15 },
    "gpt-4o": { inputPer1M: 3, outputPer1M: 12 }, // override default
  },
});
Unknown models are assigned zero cost, so missing pricing never breaks a trace.
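For a sense of the arithmetic, here is a rough sketch of a per-call cost estimate. It assumes `inputPer1M` and `outputPer1M` are prices per one million tokens, as the names suggest; `estimateCost` is a hypothetical helper, and the package's internal calculation may differ in rounding or in the token categories it counts.

```typescript
// Pricing shape as used in the config example: price per one million tokens.
interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
}

// Illustrative formula only, not the package's implementation.
function estimateCost(
  pricing: Record<string, ModelPricing>,
  model: string,
  usage: { input: number; output: number },
): number {
  const price = pricing[model];
  if (!price) return 0; // unknown model: zero cost instead of throwing
  return (usage.input * price.inputPer1M + usage.output * price.outputPer1M) / 1_000_000;
}

const pricing: Record<string, ModelPricing> = {
  "my-fine-tune": { inputPer1M: 5, outputPer1M: 15 },
};

const knownCost = estimateCost(pricing, "my-fine-tune", { input: 200, output: 100 });
const unknownCost = estimateCost(pricing, "not-in-table", { input: 200, output: 100 });
console.log(knownCost, unknownCost);
```

With 200 input and 100 output tokens at those rates, the estimate is (200 × 5 + 100 × 15) / 1,000,000 = 0.0025, while the model missing from the table comes out as 0.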
Error tracking
const tool = agent.startToolSpan("risky-operation");
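try {
  const result = await doSomething();
  tool.setOutput(result);
} catch (err) {
  // err is `unknown` under strict TypeScript, so narrow it before reading .message
  tool.setStatus("error", err instanceof Error ? err.message : String(err));
} finally {
  tool.end();
}
Errors are counted in the trace summary and shown in console output.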
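If the try/catch/finally pattern repeats across many tools, it can be pulled into a small wrapper. The sketch below is one way to do that; `runInToolSpan`, `toMessage`, and the `ToolSpanLike` interface are hypothetical names, with the interface listing only the span methods the helper needs so the block type-checks without the package installed.

```typescript
// Minimal stand-in for the ToolSpan methods used by the helper.
interface ToolSpanLike {
  setOutput(value: unknown): void;
  setStatus(status: "ok" | "error" | "unset", message?: string): void;
  end(): void;
}

// Normalise any thrown value into a message string.
function toMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Run `fn`, record its result or failure on `span`, always end the span,
// and rethrow so the caller still sees the original error.
async function runInToolSpan<T>(span: ToolSpanLike, fn: () => Promise<T>): Promise<T> {
  try {
    const result = await fn();
    span.setOutput(result);
    return result;
  } catch (err) {
    span.setStatus("error", toMessage(err));
    throw err;
  } finally {
    span.end();
  }
}
```

Unlike the inline example, this helper rethrows after marking the span, which keeps the trace accurate without hiding the failure from the calling code. Drop the `throw err` line if the agent should carry on after a failed tool call.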
Flush and shutdown
Traces auto-export when the root span ends. For long-running agents, flush manually:
await tracer.flush(); // export all buffered spans
await tracer.shutdown(); // flush + call exporter.shutdown()
Default attributes
Attributes applied to every span in a trace:
const tracer = createTracer({
  serviceName: "my-agent",
  defaultAttributes: {
    environment: "production",
    version: "1.2.0",
  },
});
service.name is always set from serviceName. All defaults propagate to child spans. Span-specific attributes override defaults.
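The precedence described above can be pictured as a simple object merge. This is a sketch of the rule, not the package's code: `resolveAttributes` is a hypothetical helper, and it assumes that `service.name` cannot be overridden by a span, which the text above does not state explicitly.

```typescript
type Attributes = Record<string, unknown>;

// Illustrative merge order: defaults first, span-specific values on top,
// and service.name applied last (assumed to always win).
function resolveAttributes(
  serviceName: string,
  defaults: Attributes,
  spanAttributes: Attributes,
): Attributes {
  return { ...defaults, ...spanAttributes, "service.name": serviceName };
}

const resolved = resolveAttributes(
  "my-agent",
  { environment: "production", version: "1.2.0" },
  { environment: "staging" }, // span-specific override
);
console.log(resolved);
```

Here the span's own `environment` replaces the default, while `version` is inherited unchanged.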
Testing
Use TestClock for deterministic timestamps and MemoryExporter to inspect output:
import { createTracer, MemoryExporter, TestClock } from "@reallyartificial/agent-obs";
const clock = new TestClock(1000);
const exporter = new MemoryExporter();
const tracer = createTracer({
  serviceName: "test",
  exporters: [exporter],
  clock,
});
const agent = tracer.startAgentSpan("agent");
clock.advance(500);
agent.end();
const spans = exporter.getSpans();
expect(spans[0].endTime - spans[0].startTime).toBe(500);
OTel compatibility
Span attributes follow OpenTelemetry GenAI semantic conventions:
| Attribute | Set by |
|-----------|--------|
| gen_ai.operation.name | All spans |
| gen_ai.request.model | LLMSpan |
| gen_ai.system | LLMSpan (provider) |
| gen_ai.usage.input_tokens | LLMSpan |
| gen_ai.usage.output_tokens | LLMSpan |
| gen_ai.usage.cost | LLMSpan (auto) |
| gen_ai.agent.name | AgentSpan |
| gen_ai.tool.name | ToolSpan |
| gen_ai.tool.input | ToolSpan |
| gen_ai.tool.output | ToolSpan |
Attribute names therefore need no translation for an OTel-compatible backend (Jaeger, Grafana Tempo, Datadog); you only need to write a custom exporter that sends the spans.
API reference
createTracer(config: TracerConfig): Tracer
| Config field | Type | Default |
|-------------|------|---------|
| serviceName | string | required |
| exporters | SpanExporter[] | [] |
| pricing | Record<string, ModelPricing> | built-in table |
| defaultAttributes | Record<string, unknown> | {} |
| clock | Clock | RealClock |
Tracer
| Method | Returns |
|--------|---------|
| startAgentSpan(name, opts?) | AgentSpan |
| startLLMSpan(model, opts?) | LLMSpan |
| startToolSpan(name, opts?) | ToolSpan |
| startSpan(name) | BaseSpan |
| flush() | Promise<void> |
| shutdown() | Promise<void> |
BaseSpan (all spans)
| Method | Description |
|--------|-------------|
| setAttribute(key, value) | Set an attribute |
| addEvent(name, attrs?) | Add a timestamped event |
| setStatus(status, message?) | Set "ok", "error", or "unset" |
| end() | End the span |
| startAgentSpan(name, opts?) | Create child agent span |
| startLLMSpan(model, opts?) | Create child LLM span |
| startToolSpan(name, opts?) | Create child tool span |
| startCustomSpan(name, attrs?) | Create child custom span |
| toJSON() | Serialize to SpanData |
LLMSpan (extends BaseSpan)
| Method | Description |
|--------|-------------|
| setTokenUsage({ input, output, total? }) | Record token counts |
| setResponseModel(model) | Actual model in response |
| setFinishReason(...reasons) | e.g. "stop", "length" |
ToolSpan (extends BaseSpan)
| Method | Description |
|--------|-------------|
| setInput(value) | Record tool input |
| setOutput(value) | Record tool output |
All setters are no-ops after end() is called.
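In practice this means any value set after `end()` is silently dropped, so record token usage, outputs, and status first. The toy class below illustrates that behaviour; `GuardedSpan` and `snapshot` are made-up names and this is not the package's source.

```typescript
// Illustrative span with an "ended" guard.
class GuardedSpan {
  private ended = false;
  private readonly attributes: Record<string, unknown> = {};

  setAttribute(key: string, value: unknown): void {
    if (this.ended) return; // writes after end() are ignored, not errors
    this.attributes[key] = value;
  }

  end(): void {
    this.ended = true;
  }

  // Return a copy of the attributes recorded so far.
  snapshot(): Record<string, unknown> {
    return { ...this.attributes };
  }
}

const span = new GuardedSpan();
span.setAttribute("step", 1);
span.end();
span.setAttribute("step", 2); // ignored: the span has already ended
```

After this runs, the recorded value of `step` is still 1.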
Requirements
- Node.js >= 18 (or any runtime with crypto.randomUUID)
- TypeScript >= 5.4 (for development)
License
MIT
