kelet v0.11.0 (published a month ago)

Kelet TypeScript SDK for AI observability

Kelet analyzes production failures 24/7. Each trace takes 15-25 minutes to debug manually—finding patterns requires analyzing hundreds of traces. That's weeks of engineering time per root cause. Kelet does this automatically, surfacing issues like data imbalance, concept drift, prompt poisoning, and model laziness hidden in production noise.

What Kelet Does

Kelet runs 24/7 analyzing every production trace:

Captures every interaction, user signal, and failure context automatically
Analyzes hundreds of failures in parallel to detect repeatable patterns
Identifies root causes (data issues, prompt problems, model behavior)
Delivers targeted fixes, not just dashboards

Unlike observability tools that show you data, Kelet analyzes it and tells you what to fix.

Not magic: Kelet is in alpha. It won't catch everything yet and sometimes needs your guidance. But it's already doing analysis that would take weeks manually.

Three lines of code to start.

Installation

npm install kelet @opentelemetry/api @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-http

Or with your preferred package manager:

# pnpm
pnpm add kelet @opentelemetry/api @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-http

# yarn
yarn add kelet @opentelemetry/api @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-http

# bun
bun add kelet @opentelemetry/api @opentelemetry/sdk-trace-node @opentelemetry/exporter-trace-otlp-http

For Vercel AI SDK reasoning capture, the right peers depend on your ai version — see Reasoning Capture for Vercel AI SDK below.

Set your API key:

export KELET_API_KEY=your_api_key
export KELET_PROJECT=production  # Required — create a project at console.kelet.ai

Or configure in code:

import { configure } from 'kelet';

configure({
  apiKey: 'your_api_key',
  project: 'production',  // Groups traces by project/environment
});

Quick Start

Node.js / General Setup

import { configure } from 'kelet';

// Set up tracing (once at app startup)
// Creates exporter + span processor + provider automatically
configure({
  apiKey: process.env.KELET_API_KEY,
  project: 'production',
});

Works with any OpenTelemetry-instrumented framework or library.

If you already have a TracerProvider, pass it in:

import { configure } from 'kelet';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

const provider = new NodeTracerProvider();
provider.register();

configure({
  apiKey: process.env.KELET_API_KEY,
  project: 'production',
  tracerProvider: provider,
});

Next.js Setup

1. Install dependencies:

npm install kelet @vercel/otel @opentelemetry/api @opentelemetry/exporter-trace-otlp-http

2. Create instrumentation.ts in your project root:

import { registerOTel } from '@vercel/otel';
import { KeletExporter } from 'kelet';

export function register() {
  registerOTel({
    serviceName: 'my-app',
    traceExporter: new KeletExporter({
      apiKey: process.env.KELET_API_KEY,
      project: 'production',
    }),
  });
}

3. Enable instrumentation in next.config.js (needed on Next.js 14 and earlier; Next.js 15 loads instrumentation.ts without this flag):

/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
};

module.exports = nextConfig;

Vercel AI SDK

Enable telemetry in your AI SDK calls:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4'),
  prompt: 'Book a flight to NYC',
  experimental_telemetry: {
    isEnabled: true,
    metadata: {
      userId: 'user-123',
      sessionId: 'session-456',
    },
  },
});

Capturing User Feedback

import { signal, SignalKind, SignalSource } from 'kelet';

// Capture explicit user feedback
await signal({
  kind: SignalKind.FEEDBACK,
  source: SignalSource.HUMAN,
  sessionId: 'user-123-session',
  score: 0.0,  // User unhappy? Kelet analyzes why.
  value: 'Response was incorrect',
  triggerName: 'thumbs_down',
});

// Capture metric signals
await signal({
  kind: SignalKind.METRIC,
  source: SignalSource.SYNTHETIC,
  traceId: 'trace-abc-123',
  triggerName: 'accuracy',
  score: 0.85,
});

That's it. Kelet now runs 24/7 analyzing every trace, clustering failure patterns, and identifying root causes—work that would take weeks manually.
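A minimal sketch of how UI feedback could be normalized into the 0.0 to 1.0 score that the signal() calls above expect. The helper names (toScore, buildFeedbackPayload) and the trigger names 'thumbs_up' and 'star_rating' are assumptions made for illustration and are not part of the Kelet SDK; the 'feedback' and 'human' strings mirror the SignalKind and SignalSource values listed under Types.

```typescript
// Hypothetical helpers: not part of the kelet package.
type UiFeedback =
  | { type: 'thumbs'; up: boolean }
  | { type: 'stars'; stars: number; maxStars?: number };

interface FeedbackPayload {
  kind: 'feedback';
  source: 'human';
  sessionId: string;
  score: number;
  triggerName: string;
  value?: string;
}

// Normalize a UI widget value to a score between 0.0 and 1.0.
function toScore(feedback: UiFeedback): number {
  if (feedback.type === 'thumbs') {
    return feedback.up ? 1.0 : 0.0;
  }
  const max = feedback.maxStars ?? 5;
  if (max < 2 || feedback.stars < 1 || feedback.stars > max) {
    throw new RangeError(`stars must be between 1 and ${max}`);
  }
  // Map 1..max linearly onto 0..1.
  return (feedback.stars - 1) / (max - 1);
}

// Build an object shaped like the options passed to signal() above.
function buildFeedbackPayload(
  sessionId: string,
  feedback: UiFeedback,
  comment?: string,
): FeedbackPayload {
  const triggerName =
    feedback.type === 'thumbs'
      ? feedback.up ? 'thumbs_up' : 'thumbs_down'
      : 'star_rating';
  return {
    kind: 'feedback',
    source: 'human',
    sessionId,
    score: toScore(feedback),
    triggerName,
    ...(comment !== undefined ? { value: comment } : {}),
  };
}
```

The resulting object could then be passed to signal(), for example await signal(buildFeedbackPayload('user-123-session', { type: 'thumbs', up: false }, 'Response was incorrect')).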

Session Grouping

Use agenticSession to group spans and signals under a session/user. All spans created and signal() calls made inside the callback automatically inherit the session context.

import { configure, agenticSession, signal, SignalKind, SignalSource } from 'kelet';

// configure() sets up the exporter + KeletSpanProcessor + provider
configure({
  apiKey: process.env.KELET_API_KEY,
  project: 'my-project',
});

// Group work under a session
await agenticSession({ sessionId: 'sess-123', userId: 'user-1' }, async () => {
  // All spans created here get gen_ai.conversation.id + user.id attributes
  // signal() auto-resolves sessionId from context — no need to pass it explicitly
  await signal({ kind: SignalKind.FEEDBACK, source: SignalSource.HUMAN, score: 1.0 });
});

configure() automatically wires a KeletSpanProcessor that stamps kelet.project, gen_ai.conversation.id, and user.id on every span. Inside agenticSession, signal() automatically picks up the sessionId — no need to pass it explicitly.

Using with Temporal

If your agents run on Temporal, register KeletPlugin on both the Client and Worker (the TS SDK doesn't auto-propagate plugins from client to worker — Python does):

import { KeletPlugin } from 'kelet/temporal';
import { Client } from '@temporalio/client';
import { Worker } from '@temporalio/worker';

const plugin = new KeletPlugin({ otelPluginOptions: { resource, spanProcessor } });
const client = new Client({ plugins: [plugin] });
const worker = await Worker.create({ /* ...your worker options */ plugins: [plugin] });

Session context flows through Temporal headers across workflows, child workflows, and activities. Peer deps: @temporalio/plugin, @temporalio/interceptors-opentelemetry.

📖 Full setup, options, plugin ordering: docs.kelet.ai/integrations/temporal

Easy Feedback UI for React

Building a React frontend? Use the Kelet Feedback UI component to collect implicit and explicit feedback. See the live demo and documentation for the full integration guide.

What Gets Captured

Kelet is built on OpenTelemetry and supports multiple semantic conventions for AI/LLM observability:

| Semantic Convention | Supported Frameworks |
|---------------------|----------------------|
| GenAI Semantic Conventions | Pydantic AI, LiteLLM, Langfuse SDK |
| Vercel AI SDK | Next.js, Vercel AI |
| OpenInference | Arize Phoenix |
| OpenLLMetry / Traceloop | LangChain, LangGraph, LlamaIndex, OpenAI SDK, Anthropic SDK |

Any framework that exports OpenTelemetry traces using the GenAI semantic conventions will work automatically.

Captured data includes:

LLM calls: Model, provider, tokens, latency, errors
Agent sessions: Multi-step interactions grouped by user session
Custom context: User IDs, session metadata, business-specific attributes

Works with any OpenTelemetry-compatible AI framework out of the box.

Reasoning Capture for Claude Agent SDK

Claude Code redacts reasoning (thinking) text in its native OTLP log records. Kelet's @anthropic-ai/claude-agent-sdk integration observes the SDK's message stream and emits kelet.reasoning log records with the full thinking content so Kelet's server can attach it back to the completion.

The integration is installed automatically by configure() when @anthropic-ai/claude-agent-sdk is resolvable. See docs/claude-agent-sdk.md for manual install, the contract fields (body, reasoning.text, reasoning.message_id, session.id), and Next.js / ESM caveats.

Reasoning Capture for Vercel AI SDK

The right setup depends on which version of ai you're on. configure() auto-detects and does the right thing in v6 / v7+; v4–v5 still need the legacy entry points.

| ai version | What you do | What you get |
|---|---|---|
| ^4.x, ^5.x | Use kelet/aisdk import OR --import kelet/reasoning/register | ai.response.reasoning span attribute |
| ^6.0.74+ | Just call configure(); nothing else | ai.response.reasoning (emitted natively by AI SDK) |
| ^7.0.0-beta+ | npm i @ai-sdk/otel, then configure() | Full OpenTelemetry GenAI semconv via @ai-sdk/otel |
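The version table can be read as a small decision function. The sketch below is illustrative only: reasoningSetupFor and its return labels are hypothetical names, and it is not how configure() performs its own detection. Versions the table does not mention (for example 6.0.0 through 6.0.73) are reported as 'not-covered' rather than guessed.

```typescript
// Hypothetical helper mirroring the table above; not part of the kelet package.
type ReasoningSetup =
  | 'legacy-entry-points'   // ^4.x, ^5.x: kelet/aisdk or --import kelet/reasoning/register
  | 'configure-only'        // ^6.0.74+: just call configure()
  | 'install-ai-sdk-otel'   // ^7.0.0-beta+: install @ai-sdk/otel, then configure()
  | 'not-covered';          // anything the table does not list

// Extract major.minor.patch, ignoring pre-release suffixes such as "-beta.1".
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

function reasoningSetupFor(aiVersion: string): ReasoningSetup {
  const [major, minor, patch] = parseVersion(aiVersion);
  if (major === 4 || major === 5) return 'legacy-entry-points';
  if (major === 6) {
    const atLeast6_0_74 = minor > 0 || patch >= 74;
    return atLeast6_0_74 ? 'configure-only' : 'not-covered';
  }
  if (major >= 7) return 'install-ai-sdk-otel';
  return 'not-covered';
}
```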

v6.0.74+ (recommended)

import { configure } from 'kelet';
configure({ apiKey: process.env.KELET_API_KEY!, project: 'my-project' });
// generateText/streamText with experimental_telemetry: { isEnabled: true }
// now writes ai.response.reasoning natively. Nothing else needed.

v7-beta + `@ai-sdk/otel` (gen_ai semconv)

npm install ai@^7.0.0-beta @ai-sdk/otel

import { configure } from 'kelet';
configure({ apiKey: process.env.KELET_API_KEY, project: 'my-project' });
// configure() auto-registers `@ai-sdk/otel`'s OpenTelemetry integration.
// Every generateText/streamText emits gen_ai.input.messages,
// gen_ai.output.messages (with reasoning parts), gen_ai.system_instructions,
// gen_ai.provider.name, gen_ai.operation.name.

Pass injectAiSdkTelemetry: false to configure() to opt out of auto-registration (e.g., when you want a custom OpenTelemetry({tracer, enrichSpan})).

v4 / v5 (legacy paths)

Vercel AI SDK didn't emit reasoning text in telemetry until ai@6.0.74. For consumers stuck on v4 or v5, two entry points are still shipped:

Loader hook (Node 18.19+):

node --import kelet/reasoning/register app.js
npx tsx --import kelet/reasoning/register app.ts

Drop-in (Bun, or when you can't pass --import):

// Replace the import:
import { generateText, streamText, wrapExporter } from 'kelet/aisdk';

Both add ai.response.reasoning to the AI SDK span. We plan to remove the legacy paths in a future major release once v4/v5 reach end-of-life; until then they remain available.

Claude Agent SDK

When you use @anthropic-ai/claude-agent-sdk, Kelet automatically wires the bundled claude subprocess to send OTLP traces, logs, and metrics to your Kelet project — plus captures redacted thinking/reasoning blocks as kelet.reasoning log records.

Quick start (no code changes)

kelet.configure() populates the seven CLAUDE_CODE_* / OTEL_EXPORTER_OTLP_* env vars on process.env if they're not already set. The spawned claude subprocess inherits them automatically:

import { configure } from 'kelet';

configure({
  apiKey: process.env.KELET_API_KEY,
  project: 'my-project',
});

// Use the SDK normally — no other changes needed.
import { query } from '@anthropic-ai/claude-agent-sdk';
for await (const msg of query({ prompt: 'hello' })) { /* ... */ }

Note: Node ESM and Bun freeze module namespace bindings, so post-import patching cannot wrap destructured imports like import { query } from '@anthropic-ai/claude-agent-sdk'. Layer A (process.env injection) covers the common case anyway. If you also pass a custom options.env (which would otherwise wipe process.env from the spawned subprocess), use the loader or shim below. Both Layer A and Layer B emit the claude_code.sdk_query wrapper span (under scope kelet.claude_agent_sdk) and share an internal sentinel (WRAPPER_BRACKETED_MARKER) so the same query() call never gets bracketed twice when both install paths run in one process.

Loader (Node.js)

node --import kelet/claude-agent-sdk/register app.js
npx tsx --import kelet/claude-agent-sdk/register app.ts

The loader hooks @anthropic-ai/claude-agent-sdk exports at module-load time so destructured imports get the wrapped versions.

Drop-in shim (Bun and elsewhere)

Bun ignores --import loader hooks. Change one import:

// Before
import { query, ClaudeSDKClient } from '@anthropic-ai/claude-agent-sdk';

// After
import { query, ClaudeSDKClient } from 'kelet/claude-agent-sdk/shim';

Conflict handling

OTLP env vars are single-valued per signal: the SDK can only point at one backend at a time. If process.env already has OTEL_EXPORTER_OTLP_ENDPOINT (or any of the other six) set for a different backend (Sentry, Datadog, custom collector), Kelet does not override the value. Claude Code (CC) telemetry will continue to route to that backend, and a one-shot WARNING is logged.

To route CC telemetry to Kelet anyway:

Unset the conflicting env vars before calling configure(), or
Pass per-call options.env to ClaudeAgentOptions. The loader/shim merges Kelet's keys with set-if-missing semantics, so your keys still win.
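To make the set-if-missing behavior concrete, here is a minimal sketch of such a merge. It is not Kelet's implementation: mergeSetIfMissing, the EnvVars type, and the example URLs are hypothetical, and treating only undefined as "missing" is an assumption.

```typescript
// Hypothetical helper illustrating set-if-missing semantics.
type EnvVars = Record<string, string | undefined>;

interface MergeResult {
  env: EnvVars;       // merged environment
  skipped: string[];  // keys where an existing, different value was kept
}

function mergeSetIfMissing(userEnv: EnvVars, defaults: EnvVars): MergeResult {
  const env: EnvVars = { ...userEnv };
  const skipped: string[] = [];
  for (const [key, value] of Object.entries(defaults)) {
    const existing = env[key];
    if (existing === undefined) {
      // Nothing set by the user: fill in the default.
      env[key] = value;
    } else if (existing !== value) {
      // The user's value wins; the caller can log a one-shot warning.
      skipped.push(key);
    }
  }
  return { env, skipped };
}

// Example with placeholder values (not real Kelet endpoints).
const merged = mergeSetIfMissing(
  { OTEL_EXPORTER_OTLP_ENDPOINT: 'https://collector.example.com' },
  {
    OTEL_EXPORTER_OTLP_ENDPOINT: 'https://kelet.example.invalid',
    CLAUDE_CODE_ENABLE_TELEMETRY: '1',
  },
);
```

In this example the existing endpoint is kept and reported in skipped, while CLAUDE_CODE_ENABLE_TELEMETRY is filled in because nothing was set for it.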

Opt out

Pass injectCcTelemetry: false to disable env injection entirely:

configure({
  apiKey: process.env.KELET_API_KEY,
  project: 'my-project',
  injectCcTelemetry: false,
});

If you opt out and don't set CLAUDE_CODE_ENABLE_TELEMETRY=1 yourself, the SDK emits a one-shot informational warning so you don't lose CC telemetry by accident.

Reasoning capture (optional)

kelet.reasoning log records are emitted via @opentelemetry/sdk-logs + @opentelemetry/exporter-logs-otlp-http. They're optional peer deps; install them if you want reasoning capture:

npm install @opentelemetry/sdk-logs @opentelemetry/exporter-logs-otlp-http @opentelemetry/api-logs

Without these, env injection still works — only thinking/reasoning capture is disabled.

Configuration

Set via environment variables:

export KELET_API_KEY=your_api_key    # Required
export KELET_PROJECT=production      # Required — create a project at console.kelet.ai
export KELET_API_URL=https://...     # Optional, defaults to api.kelet.ai

Or pass directly to the exporter:

import { KeletExporter } from 'kelet';

const exporter = new KeletExporter({
  apiKey: 'your_api_key',
  project: 'production',
  apiUrl: 'https://custom.api.kelet.ai',  // Optional
});

API Reference

KeletExporter

OpenTelemetry trace exporter for sending traces to Kelet.

import { KeletExporter } from 'kelet';
import { NodeSDK } from '@opentelemetry/sdk-node';

const exporter = new KeletExporter({
  apiKey?: string,     // KELET_API_KEY env var if not provided
  project?: string,    // KELET_PROJECT env var if not set here — required
  apiUrl?: string,     // defaults to "https://api.kelet.ai"
});

const sdk = new NodeSDK({ traceExporter: exporter });
sdk.start();

signal()

Capture signals for AI responses. Inside agenticSession(), sessionId and traceId are resolved automatically from context.

import { signal, SignalKind, SignalSource } from 'kelet';

await signal({
  kind: SignalKind.FEEDBACK,       // feedback | edit | event | metric | arbitrary
  source: SignalSource.HUMAN,      // human | label | synthetic
  sessionId?: string,              // Auto-resolved from agenticSession, or pass explicitly
  traceId?: string,                // Auto-resolved from active span, or pass explicitly
  triggerName?: string,            // e.g., "thumbs_down", "user_copy"
  score?: number,                  // 0.0 to 1.0
  value?: string,                  // Text content
  confidence?: number,             // 0.0 to 1.0
  metadata?: Record<string, unknown>,  // Additional metadata
  timestamp?: Date | string,       // Event timestamp
  raiseOnFailure?: boolean,        // Re-raise transport/HTTP failures (default: false)
});

Transport and HTTP failures are retried with exponential backoff, then logged via console.warn and swallowed by default — signal() is a telemetry call and won't crash your code path. Pass raiseOnFailure: true to opt into re-raising after retries are exhausted. Validation errors (bad score/confidence ranges, missing identifier) always throw regardless.
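The failure-handling behavior described above can be sketched as follows. This is an illustration under assumptions, not the SDK's code: sendWithRetry, backoffDelays, validateScore, and the doubling schedule are hypothetical, and the real retry count and delays are not documented here.

```typescript
// Hypothetical sketch of retry-with-backoff plus opt-in re-raise.
interface RetryOptions {
  retries: number;          // attempts after the first one
  baseDelayMs: number;      // delay before the first retry
  raiseOnFailure?: boolean; // re-throw once retries are exhausted
}

// Validation errors always throw, regardless of raiseOnFailure.
function validateScore(name: string, value: number | undefined): void {
  if (value !== undefined && (Number.isNaN(value) || value < 0 || value > 1)) {
    throw new RangeError(`${name} must be between 0.0 and 1.0`);
  }
}

// Exponential schedule: base, 2x base, 4x base, ...
function backoffDelays(retries: number, baseDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

// Returns true on success, false if every attempt failed and errors are swallowed.
async function sendWithRetry(
  send: () => Promise<void>,
  options: RetryOptions,
): Promise<boolean> {
  const delays = backoffDelays(options.retries, options.baseDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      await send();
      return true;
    } catch (error) {
      lastError = error;
      const delay = delays[attempt];
      if (delay !== undefined) {
        // Wait before the next attempt; no wait after the final one.
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  if (options.raiseOnFailure) {
    throw lastError;
  }
  console.warn('signal delivery failed after retries:', lastError);
  return false;
}
```

Swallowing by default keeps a telemetry outage from breaking the calling code path, while raiseOnFailure: true suits tests or batch jobs where a lost signal should be visible.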

agenticSession()

Group spans and signals under a session/user context.

import { agenticSession } from 'kelet';

await agenticSession({
  sessionId: string,                                    // Required: session identifier
  userId?: string,                                      // Optional: user identifier
  metadata?: Record<string, string | number | boolean>, // Optional: stamped as metadata.{key} on all spans
}, async () => {
  // All spans and signals inside inherit session context
});
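For readers wondering how code inside the callback can see the session without receiving it as an argument, one common Node.js technique is AsyncLocalStorage. The sketch below shows that technique only; whether Kelet uses it internally is an assumption, and withSession and currentSessionId are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape, modeled on the agenticSession() options above.
interface SessionContext {
  sessionId: string;
  userId?: string;
}

const sessionStore = new AsyncLocalStorage<SessionContext>();

// Run a callback with the given session visible to everything it calls,
// including code that runs after awaits inside the callback.
function withSession<T>(context: SessionContext, callback: () => T): T {
  return sessionStore.run(context, callback);
}

// Read the ambient session, or undefined when called outside withSession().
function currentSessionId(): string | undefined {
  return sessionStore.getStore()?.sessionId;
}

// Nested sessions shadow the outer one for the duration of the inner callback.
const observed = withSession({ sessionId: 'outer' }, () => {
  const inner = withSession({ sessionId: 'inner' }, () => currentSessionId());
  return [inner, currentSessionId()];
});
```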

KeletSpanProcessor

SpanProcessor that stamps kelet.project, session, and user attributes on every span. Used automatically by configure() — only needed for manual OTEL setups.

import { KeletSpanProcessor } from 'kelet';
import { SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';

const processor = new KeletSpanProcessor(
  new SimpleSpanProcessor(exporter),
  { project: 'my-project' }
);
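As a rough picture of what "stamping" means, the sketch below builds the attribute map named in this README (kelet.project, gen_ai.conversation.id, user.id, and metadata.{key}). The function sessionAttributes and the SessionInfo type are hypothetical; the real processor's internals may differ.

```typescript
// Hypothetical helper: computes the attributes a span would be stamped with.
type AttributeValue = string | number | boolean;

interface SessionInfo {
  sessionId?: string;
  userId?: string;
  metadata?: Record<string, AttributeValue>;
}

function sessionAttributes(
  project: string,
  session: SessionInfo = {},
): Record<string, AttributeValue> {
  const attributes: Record<string, AttributeValue> = { 'kelet.project': project };
  if (session.sessionId !== undefined) {
    attributes['gen_ai.conversation.id'] = session.sessionId;
  }
  if (session.userId !== undefined) {
    attributes['user.id'] = session.userId;
  }
  // Session metadata is namespaced as metadata.{key}.
  for (const [key, value] of Object.entries(session.metadata ?? {})) {
    attributes[`metadata.${key}`] = value;
  }
  return attributes;
}
```

In a manual OTEL setup, a wrapper processor would typically apply such a map to each span when it starts, before delegating to the wrapped processor.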

Context Helpers

import { getSessionId, getUserId, getMetadata, getTraceId } from 'kelet';

getSessionId()  // Current session ID from agenticSession, or undefined
getUserId()     // Current user ID from agenticSession, or undefined
getMetadata()   // Current metadata record from agenticSession, or undefined
getTraceId()    // Current trace ID from active OpenTelemetry span, or undefined

configure()

Configure the SDK and set up the OTEL tracing pipeline. Creates an exporter, KeletSpanProcessor, and TracerProvider automatically.

import { configure } from 'kelet';

configure({
  apiKey?: string,              // KELET_API_KEY env var if not provided
  project?: string,             // KELET_PROJECT env var if not set here — required
  apiUrl?: string,              // defaults to "https://api.kelet.ai"
  tracerProvider?: BasicTracerProvider,  // Optional: use existing provider
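  spanProcessor?: SpanProcessor,         // Optional: use this instead of default KeletSpanProcessor
  injectAiSdkTelemetry?: boolean,        // Optional: pass false to skip @ai-sdk/otel auto-registration
  injectCcTelemetry?: boolean,           // Optional: pass false to disable Claude Code env injection
  strict?: boolean,                      // Optional: pass true to fail fast on missing credentials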
});

shutdown()

Flush any pending spans and release SDK resources. Called automatically on beforeExit (natural event-loop drain). Call it manually from your own signal handlers or before a hard process.exit(N).

import { shutdown } from 'kelet';

// Flush on SIGINT/SIGTERM from your own handler:
process.on('SIGTERM', async () => {
  await shutdown();
  process.exit(143);
});

The SDK intentionally does not install SIGINT/SIGTERM handlers — attaching a listener suppresses Node's default exit-on-signal, and calling process.exit() from a library would override your app's graceful-shutdown logic. Errors from individual processors are logged and swallowed (best-effort).
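If you write your own handlers, two small helpers can keep them consistent: one computes the conventional exit code (128 plus the signal number, so 130 for SIGINT and 143 for SIGTERM), and one bounds how long the flush may take. Both are hypothetical sketches, and the 5000 ms deadline in the usage comment is an arbitrary example value.

```typescript
// Hypothetical helpers for app-owned signal handlers.
const SIGNAL_NUMBERS = { SIGINT: 2, SIGTERM: 15 } as const;
type HandledSignal = keyof typeof SIGNAL_NUMBERS;

// Conventional shell exit code for a process terminated by a signal.
function exitCodeFor(signal: HandledSignal): number {
  return 128 + SIGNAL_NUMBERS[signal];
}

type FlushOutcome = 'flushed' | 'flush-failed' | 'timed-out';

// Wait for flush() but never longer than deadlineMs, so a slow exporter
// cannot block process exit indefinitely.
function flushWithDeadline(
  flush: () => Promise<void>,
  deadlineMs: number,
): Promise<FlushOutcome> {
  return new Promise<FlushOutcome>((resolve) => {
    const timer = setTimeout(() => resolve('timed-out'), deadlineMs);
    flush().then(
      () => { clearTimeout(timer); resolve('flushed'); },
      () => { clearTimeout(timer); resolve('flush-failed'); },
    );
  });
}

// Usage sketch (requires kelet and a Node.js process):
//   import { shutdown } from 'kelet';
//   for (const sig of ['SIGINT', 'SIGTERM'] as const) {
//     process.once(sig, async () => {
//       await flushWithDeadline(shutdown, 5000);
//       process.exit(exitCodeFor(sig));
//     });
//   }
```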

Types

// Signal kind enum
const SignalKind = {
  FEEDBACK: 'feedback',   // User feedback (ratings, thumbs)
  EDIT: 'edit',            // User edited AI output
  EVENT: 'event',          // System/app event
  METRIC: 'metric',        // Numeric measurement
  ARBITRARY: 'arbitrary',  // Custom signal
} as const;

// Signal source enum
const SignalSource = {
  HUMAN: 'human',          // From a human user
  LABEL: 'label',          // From labeling process
  SYNTHETIC: 'synthetic',  // Synthetically generated
} as const;

Production-Ready

The SDK never disrupts your application:

Async: Telemetry exports in background, zero blocking
Fail-safe: Network errors handled with retries and exponential backoff
Safe on missing credentials: If KELET_API_KEY or KELET_PROJECT can't be resolved, configure() logs a single warning and installs a no-op — signal() becomes a silent no-op while agenticSession() still runs the callback with context but no spans are exported. Pass strict: true to configure() to fail-fast on CI / staging.
Graceful: If Kelet is down, your agent keeps running
Standard: Built on OpenTelemetry, works with any OTEL-compatible setup
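The missing-credentials behavior can be sketched as a small resolver: explicit options win over environment variables, a missing value yields a warning and a null (no-op) result, and strict mode throws instead. The names resolveCredentials, CredentialOptions, and EnvSource are hypothetical, and the warning text is invented for the example.

```typescript
// Hypothetical sketch of credential resolution with a strict mode.
interface CredentialOptions {
  apiKey?: string;
  project?: string;
  strict?: boolean;
}

type EnvSource = Record<string, string | undefined>;

interface ResolvedCredentials {
  apiKey: string;
  project: string;
}

// Returns null when credentials are missing (caller installs a no-op),
// or throws when strict is true.
function resolveCredentials(
  options: CredentialOptions,
  env: EnvSource,
): ResolvedCredentials | null {
  const apiKey = options.apiKey ?? env['KELET_API_KEY'];
  const project = options.project ?? env['KELET_PROJECT'];
  if (apiKey && project) {
    return { apiKey, project };
  }
  const missing: string[] = [];
  if (!apiKey) missing.push('KELET_API_KEY');
  if (!project) missing.push('KELET_PROJECT');
  const message = `Kelet disabled, missing: ${missing.join(', ')}`;
  if (options.strict) {
    throw new Error(message);
  }
  console.warn(message);
  return null;
}
```

Passing the environment as a parameter (for example resolveCredentials({}, process.env)) keeps the function easy to test; in CI or staging, strict: true turns a silent no-op into a visible failure.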

Alpha Status

Kelet is in alpha. What this means:

It works: Already analyzing thousands of production traces for early users
Not perfect: Won't catch every failure pattern yet, sometimes needs guidance
Improving fast: The AI learns from more production data every day
We need feedback: Help us make it better—tell us what it catches and what it misses

Even in alpha, Kelet does analysis that would take your team weeks to do manually.

The alternative? Spending 15-25 minutes per trace, across hundreds of failures, trying to spot patterns by hand. Most teams just don't do it, and ship broken agents.

Learn More

Website: kelet.ai
Early Access: We're onboarding teams with production AI agents
Support: GitHub Issues

Built for teams shipping mission-critical AI agents.