@reallyartificial/agent-obs

v0.1.0

Published

3 months ago

Zero-dependency agent observability SDK with OTel GenAI semantic conventions

josharsh

Trace what your AI agents do. Zero dependencies. No server. No vendor lock-in.

npm install @reallyartificial/agent-obs

What it does

Wraps your agent's LLM calls and tool invocations in spans. When the agent finishes, you get a trace tree with latency, token counts, and cost:

Trace d4f2a1 | research-agent | 1,247ms | $0.0009
  Agent: research-agent [ok] 1,247ms
    LLM: gpt-4o [ok] 823ms | 150 in / 50 out | $0.0009
    Tool: web-search [ok] 312ms
  Tokens: 150 in / 50 out | Cost: $0.0009 | Errors: 0

Quick start

import { createTracer, ConsoleExporter } from "@reallyartificial/agent-obs";

const tracer = createTracer({
  serviceName: "research-agent",
  exporters: [new ConsoleExporter()],
});

const agent = tracer.startAgentSpan("research-agent");

const llm = agent.startLLMSpan("gpt-4o", { provider: "openai" });
llm.setTokenUsage({ input: 150, output: 50 });
llm.end();

const tool = agent.startToolSpan("web-search", { input: { query: "AI news" } });
tool.setOutput({ results: ["..."] });
tool.end();

agent.end(); // flushes the full trace tree

Span types

AgentSpan — wraps an agent run. Nest LLM, tool, and sub-agent spans inside it.

const agent = tracer.startAgentSpan("orchestrator");
const subAgent = agent.startAgentSpan("researcher");

LLMSpan — a single LLM API call. Tracks model, tokens, and auto-calculates cost on end().

const llm = agent.startLLMSpan("claude-sonnet-4-5-20250929", {
  provider: "anthropic",
  temperature: 0.7,
});
llm.setTokenUsage({ input: 200, output: 100 });
llm.setFinishReason("stop");
llm.end(); // cost calculated automatically

ToolSpan — a tool/function call. Tracks input and output.

const tool = agent.startToolSpan("database-query", {
  input: { sql: "SELECT ..." },
});
tool.setOutput({ rows: 42 });
tool.end();

Any span can create any child span. Nesting is arbitrary.
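
To illustrate how arbitrarily nested spans can be turned back into a tree like the one printed by the console output, here is a minimal sketch. It does not use the package: the FlatSpan shape and its spanId, parentSpanId, and label fields are assumptions made for the example, not the confirmed SpanData type.

```typescript
// Assumed span shape: these field names are illustrative only.
interface FlatSpan {
  spanId: string;
  parentSpanId?: string;
  label: string;
}

interface SpanTreeNode {
  span: FlatSpan;
  children: SpanTreeNode[];
}

// Link each span to its parent through a lookup table (span ids assumed unique).
function buildSpanTree(spans: FlatSpan[]): SpanTreeNode[] {
  const nodesById = new Map<string, SpanTreeNode>();
  for (const span of spans) {
    nodesById.set(span.spanId, { span, children: [] });
  }

  const roots: SpanTreeNode[] = [];
  for (const span of spans) {
    const node = nodesById.get(span.spanId);
    if (node === undefined) continue;
    const parentNode =
      span.parentSpanId === undefined
        ? undefined
        : nodesById.get(span.parentSpanId);
    if (parentNode !== undefined) {
      parentNode.children.push(node);
    } else {
      roots.push(node); // no known parent: treat as a root
    }
  }
  return roots;
}

// Render the tree with two spaces of indentation per nesting level.
function renderSpanTree(nodes: SpanTreeNode[], depth = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push("  ".repeat(depth) + node.span.label);
    lines.push(...renderSpanTree(node.children, depth + 1));
  }
  return lines;
}

const exampleLines = renderSpanTree(
  buildSpanTree([
    { spanId: "a", label: "Agent: orchestrator" },
    { spanId: "b", parentSpanId: "a", label: "Agent: researcher" },
    { spanId: "c", parentSpanId: "b", label: "LLM: gpt-4o" },
    { spanId: "d", parentSpanId: "a", label: "Tool: web-search" },
  ]),
);
console.log(exampleLines.join("\n"));
```

The depth of the tree is limited only by how the spans reference each other, which is what arbitrary nesting implies.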

Exporters

ConsoleExporter

Pretty-prints trace trees to stdout.

new ConsoleExporter()
new ConsoleExporter({ logger: myCustomLogger })

JSONExporter

Writes one JSON object per span (JSONL format).

// Callback
new JSONExporter({ onLine: (json) => sendToMyBackend(json) })

// File (Node.js only)
new JSONExporter({ filePath: "./traces.jsonl" })
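
Since each line is a standalone JSON object, reading a trace file back is a matter of splitting on newlines and parsing each line. The sketch below shows that on an in-memory string; the kind and durationMs fields are placeholders chosen for the example, not the exporter's confirmed output schema.

```typescript
// Parse JSONL text into one object per non-empty line.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}

// Hypothetical file contents; the field names are placeholders.
const sampleJsonl =
  '{"kind":"llm","durationMs":823}\n' +
  '{"kind":"tool","durationMs":312}\n';

const parsedSpans = parseJsonl(sampleJsonl);
console.log(parsedSpans.length);
```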

MemoryExporter

Stores spans in memory. Built for tests.

const exporter = new MemoryExporter();

// after running your agent...
exporter.getSpans();            // all spans
exporter.getSpansByKind("llm"); // just LLM spans
exporter.getTraces();           // grouped by traceId
exporter.reset();               // clear

Custom exporters

Implement SpanExporter:

import type { SpanExporter, SpanData } from "@reallyartificial/agent-obs";

const myExporter: SpanExporter = {
  export(spans: SpanData[]) {
    // send to your backend, database, whatever
  },
  shutdown() {
    // optional cleanup
  },
};

export can return void or Promise<void>. Async exporters are properly awaited by tracer.flush() and tracer.shutdown().
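
As one possible use of this interface, the sketch below buffers spans and forwards them in batches. To stay self-contained it declares local stand-ins (SketchSpanData, SketchExporter) rather than importing the package's types, and it treats shutdown as optional, which is an assumption based on the comment above. The send callback and batchSize parameter are hypothetical.

```typescript
// Local stand-ins for the package's types so the sketch runs on its own.
// With the real package you would import SpanExporter and SpanData instead.
type SketchSpanData = Record<string, unknown>;

interface SketchExporter {
  export(spans: SketchSpanData[]): void | Promise<void>;
  shutdown?(): void | Promise<void>;
}

// Buffers spans and calls `send` once at least `batchSize` spans are waiting.
function createBatchingExporter(
  send: (batch: SketchSpanData[]) => void,
  batchSize: number,
): SketchExporter {
  let buffer: SketchSpanData[] = [];

  const drain = (): void => {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    send(batch);
  };

  return {
    export(spans) {
      buffer.push(...spans);
      if (buffer.length >= batchSize) drain();
    },
    shutdown() {
      drain(); // send whatever is left in the buffer
    },
  };
}

const sentBatches: SketchSpanData[][] = [];
const batching = createBatchingExporter((batch) => {
  sentBatches.push(batch);
}, 2);

batching.export([{ id: 1 }]);            // buffered, below the threshold
batching.export([{ id: 2 }, { id: 3 }]); // threshold reached, batch sent
batching.export([{ id: 4 }]);            // buffered again
batching.shutdown?.();                   // remaining span sent
```

Draining in shutdown matters here: without it, spans still sitting in the buffer would be lost when the tracer shuts down.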

Cost tracking

Built-in pricing for 24 models: GPT-4o/4/3.5, o1/o3, Claude Opus/Sonnet/Haiku, Gemini 2.0/1.5, Llama 3.1. Cost is auto-calculated when an LLM span ends.

Override or add models:

const tracer = createTracer({
  serviceName: "my-agent",
  pricing: {
    "my-fine-tune": { inputPer1M: 5, outputPer1M: 15 },
    "gpt-4o": { inputPer1M: 3, outputPer1M: 12 }, // override default
  },
});

Unknown models are assigned zero cost, so a missing pricing entry never breaks a trace.
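
The arithmetic behind per-million-token pricing can be sketched as follows. The inputPer1M and outputPer1M field names come from the override example above; the estimateCost function and the exact formula are assumptions about how such prices are typically applied, not the package's confirmed implementation.

```typescript
interface ModelPricingSketch {
  inputPer1M: number;  // USD per 1,000,000 input tokens
  outputPer1M: number; // USD per 1,000,000 output tokens
}

// Hypothetical helper: scale each token count by its per-million price.
function estimateCost(
  pricing: Record<string, ModelPricingSketch>,
  model: string,
  usage: { input: number; output: number },
): number {
  // Own-property check so inherited keys such as "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(pricing, model)) {
    return 0; // unknown model: zero cost instead of an error
  }
  const price = pricing[model];
  return (
    (usage.input / 1_000_000) * price.inputPer1M +
    (usage.output / 1_000_000) * price.outputPer1M
  );
}

const pricingTable: Record<string, ModelPricingSketch> = {
  "my-fine-tune": { inputPer1M: 5, outputPer1M: 15 },
};

const knownCost = estimateCost(pricingTable, "my-fine-tune", {
  input: 200,
  output: 100,
});
const unknownCost = estimateCost(pricingTable, "not-in-table", {
  input: 200,
  output: 100,
});
console.log(knownCost.toFixed(4), unknownCost);
```

With these numbers, 200 input tokens at $5 per million plus 100 output tokens at $15 per million comes to $0.0025, and the model missing from the table costs 0.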

Error tracking

const tool = agent.startToolSpan("risky-operation");
try {
  const result = await doSomething();
  tool.setOutput(result);
} catch (err) {
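  // err is typed unknown in strict TypeScript, so narrow it before reading .message
  tool.setStatus("error", err instanceof Error ? err.message : String(err));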
} finally {
  tool.end();
}

Errors are counted in the trace summary and shown in console output.
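
If the try/catch/finally pattern repeats across many tools, it can be factored into a small helper. The sketch below is one way to do that; runInSpan and the EndableSpan interface are hypothetical names, and the interface is a local stand-in that only mirrors the setStatus and end methods used above. Unlike the example above, this helper rethrows after recording the error, which is a design choice rather than documented package behavior. It is shown in synchronous form to keep it short.

```typescript
// Minimal span surface the helper needs; a local stand-in, not the real type.
interface EndableSpan {
  setStatus(status: "ok" | "error" | "unset", message?: string): void;
  end(): void;
}

// Runs `work`, marks the span as failed if it throws, and always ends the span.
function runInSpan<T>(span: EndableSpan, work: () => T): T {
  try {
    return work();
  } catch (err) {
    span.setStatus("error", err instanceof Error ? err.message : String(err));
    throw err; // rethrow so callers still see the failure
  } finally {
    span.end();
  }
}

// A fake span that records calls, used here in place of a real ToolSpan.
const recorded: string[] = [];
const fakeSpan: EndableSpan = {
  setStatus: (status, message) => {
    recorded.push(`${status}:${message ?? ""}`);
  },
  end: () => {
    recorded.push("end");
  },
};

let caught = "";
try {
  runInSpan(fakeSpan, () => {
    throw new Error("boom");
  });
} catch (err) {
  caught = err instanceof Error ? err.message : String(err);
}
console.log(recorded, caught);
```

An async variant would accept `() => Promise<T>` and `await work()` inside the same `try` block so that rejections are recorded before the span ends.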

Flush and shutdown

Traces auto-export when the root span ends. For long-running agents, flush manually:

await tracer.flush();    // export all buffered spans
await tracer.shutdown(); // flush + call exporter.shutdown()

Default attributes

Attributes applied to every span in a trace:

const tracer = createTracer({
  serviceName: "my-agent",
  defaultAttributes: {
    environment: "production",
    version: "1.2.0",
  },
});

service.name is always set from serviceName. All defaults propagate to child spans. Span-specific attributes override defaults.
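
The precedence described above can be pictured as a single object merge. This is a sketch of the rule, not the package's code: resolveAttributes is a hypothetical function, and letting service.name win over span-specific attributes is an assumption read from "always set".

```typescript
type AttributeMap = Record<string, unknown>;

// Precedence assumed here, lowest to highest:
// tracer defaults < span-specific attributes < service.name from serviceName.
function resolveAttributes(
  serviceName: string,
  defaults: AttributeMap,
  spanAttributes: AttributeMap,
): AttributeMap {
  return { ...defaults, ...spanAttributes, "service.name": serviceName };
}

const resolved = resolveAttributes(
  "my-agent",
  { environment: "production", version: "1.2.0" },
  { environment: "staging" }, // span-specific value overrides the default
);
console.log(resolved);
```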

Testing

Use TestClock for deterministic timestamps and MemoryExporter to inspect output:

import { createTracer, MemoryExporter, TestClock } from "@reallyartificial/agent-obs";

const clock = new TestClock(1000);
const exporter = new MemoryExporter();
const tracer = createTracer({
  serviceName: "test",
  exporters: [exporter],
  clock,
});

const agent = tracer.startAgentSpan("agent");
clock.advance(500);
agent.end();

const spans = exporter.getSpans();
expect(spans[0].endTime - spans[0].startTime).toBe(500);
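
To show why an injectable clock makes durations deterministic, here is a minimal sketch of the idea without the package. ClockSketch, ManualClock, and measure are hypothetical names, and the assumption that a clock exposes a single now() method returning milliseconds may not match the package's actual Clock interface.

```typescript
// Assumed clock contract: one now() method returning milliseconds.
interface ClockSketch {
  now(): number;
}

class ManualClock implements ClockSketch {
  private current: number;

  constructor(startMs: number) {
    this.current = startMs;
  }

  now(): number {
    return this.current;
  }

  // Time moves only when the test says so.
  advance(ms: number): void {
    this.current += ms;
  }
}

// A toy timer that reads time only through the injected clock.
function measure(clock: ClockSketch, work: () => void): number {
  const startedAt = clock.now();
  work();
  return clock.now() - startedAt;
}

const manualClock = new ManualClock(1000);
const elapsedMs = measure(manualClock, () => manualClock.advance(500));
console.log(elapsedMs);
```

Because nothing reads the system time, the measured duration depends only on the calls to advance, so the assertion cannot flake on a slow machine.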

OTel compatibility

Span attributes follow OpenTelemetry GenAI semantic conventions:

| Attribute | Set by |
|-----------|--------|
| gen_ai.operation.name | All spans |
| gen_ai.request.model | LLMSpan |
| gen_ai.system | LLMSpan (provider) |
| gen_ai.usage.input_tokens | LLMSpan |
| gen_ai.usage.output_tokens | LLMSpan |
| gen_ai.usage.cost | LLMSpan (auto) |
| gen_ai.agent.name | AgentSpan |
| gen_ai.tool.name | ToolSpan |
| gen_ai.tool.input | ToolSpan |
| gen_ai.tool.output | ToolSpan |

Because the attribute names already follow these conventions, a custom exporter for an OTel-compatible backend (Jaeger, Grafana Tempo, Datadog) only needs to convert spans to that backend's wire format.
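
One piece of such a custom exporter is converting a flat attribute map into the key/value list used by OTLP's JSON encoding. The sketch below covers only that step, assuming attributes arrive as a plain object; toOtlpAttributes is a hypothetical helper, and nested values are flattened to JSON strings as a simplification rather than mapped to OTLP array or kvlist values.

```typescript
// OTLP/JSON represents each attribute as { key, value: AnyValue }.
type OtlpAnyValue =
  | { stringValue: string }
  | { boolValue: boolean }
  | { intValue: string } // int64 is carried as a decimal string in JSON
  | { doubleValue: number };

interface OtlpKeyValue {
  key: string;
  value: OtlpAnyValue;
}

function toOtlpAttributes(attributes: Record<string, unknown>): OtlpKeyValue[] {
  return Object.keys(attributes)
    .filter((key) => attributes[key] !== undefined) // skip unset attributes
    .map((key) => {
      const raw = attributes[key];
      let value: OtlpAnyValue;
      if (typeof raw === "boolean") {
        value = { boolValue: raw };
      } else if (typeof raw === "number") {
        value = Number.isSafeInteger(raw)
          ? { intValue: String(raw) }
          : { doubleValue: raw };
      } else if (typeof raw === "string") {
        value = { stringValue: raw };
      } else {
        // Objects, arrays, null: fall back to a JSON string (a simplification).
        value = { stringValue: JSON.stringify(raw) };
      }
      return { key, value };
    });
}

const otlpAttributes = toOtlpAttributes({
  "gen_ai.request.model": "gpt-4o",
  "gen_ai.usage.input_tokens": 150,
  "gen_ai.usage.cost": 0.0009,
});
console.log(JSON.stringify(otlpAttributes));
```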

API reference

`createTracer(config: TracerConfig): Tracer`

| Config field | Type | Default |
|-------------|------|---------|
| serviceName | string | required |
| exporters | SpanExporter[] | [] |
| pricing | Record<string, ModelPricing> | built-in table |
| defaultAttributes | Record<string, unknown> | {} |
| clock | Clock | RealClock |

Tracer

| Method | Returns |
|--------|---------|
| startAgentSpan(name, opts?) | AgentSpan |
| startLLMSpan(model, opts?) | LLMSpan |
| startToolSpan(name, opts?) | ToolSpan |
| startSpan(name) | BaseSpan |
| flush() | Promise<void> |
| shutdown() | Promise<void> |

BaseSpan (all spans)

| Method | Description |
|--------|-------------|
| setAttribute(key, value) | Set an attribute |
| addEvent(name, attrs?) | Add a timestamped event |
| setStatus(status, message?) | Set "ok", "error", or "unset" |
| end() | End the span |
| startAgentSpan(name, opts?) | Create child agent span |
| startLLMSpan(model, opts?) | Create child LLM span |
| startToolSpan(name, opts?) | Create child tool span |
| startCustomSpan(name, attrs?) | Create child custom span |
| toJSON() | Serialize to SpanData |

LLMSpan (extends BaseSpan)

| Method | Description |
|--------|-------------|
| setTokenUsage({ input, output, total? }) | Record token counts |
| setResponseModel(model) | Actual model in response |
| setFinishReason(...reasons) | e.g. "stop", "length" |

ToolSpan (extends BaseSpan)

| Method | Description |
|--------|-------------|
| setInput(value) | Record tool input |
| setOutput(value) | Record tool output |

All setters are no-ops after end() is called.
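
The no-op-after-end rule amounts to a guard at the top of each setter. The toy class below illustrates the pattern; GuardedSpan is a hypothetical name and this is not the package's implementation.

```typescript
// A toy span illustrating the guard; not the package's implementation.
class GuardedSpan {
  private ended = false;
  private readonly attributes: Record<string, unknown> = {};

  setAttribute(key: string, value: unknown): void {
    if (this.ended) return; // ignore writes once the span is closed
    this.attributes[key] = value;
  }

  end(): void {
    this.ended = true;
  }

  toJSON(): Record<string, unknown> {
    return { ...this.attributes };
  }
}

const guarded = new GuardedSpan();
guarded.setAttribute("step", "plan");
guarded.end();
guarded.setAttribute("step", "late write"); // silently ignored
const finalAttributes = guarded.toJSON();
console.log(finalAttributes);
```

In practice this means a value set after end(), for example from a callback that resolves late, will not appear in the exported span.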

Requirements

Node.js >= 18 (or any runtime with crypto.randomUUID)
TypeScript >= 5.4 (for development)

License

MIT
