@hydrotik/air

v0.1.1

Published

8 days ago

Real-time AI observability — monitor context windows, tool calls, token costs, and compaction for coding agents.

Downloads

213

0High
0Medium
0Low

djdonovanbk

ai observability rum telemetry mcp rag llm coding-agent context-window token-usage claude codex pi dashboard drift-detection cost-tracking

⚡ AIr

Real-time AI observability for coding agents.

AIr monitors your AI coding sessions — context windows, tool calls, token costs, latency, output quality, prompt effectiveness, and drift detection — streaming everything to a live dashboard with built-in data redaction.

Install

npm install @hydrotik/air

This gives you:

air CLI — starts the server + dashboard (single process)
SDK — instrument MCP servers, RAG pipelines, and custom tools
Event types — TypeScript definitions for all telemetry events

🔌 Integrate Your Own Services

Want to wire your RAG pipeline, MCP server, or custom tools into AIr?
📖 Integration Guide — everything you need:
Copy-paste prompt for AI coding agents (Claude Code, Codex, ChatGPT) to auto-instrument your services
HTTP API reference — just POST JSON, no SDK required
Language examples — JavaScript, Python, Go, and cURL
Step-by-step walkthrough — config file → verify → instrument → dashboard
Most integrations take under 5 minutes. The only required field is source.

Quick Start

1. Start the AIr server + dashboard

Standalone (installed via npm):

npx @hydrotik/air                    # default port 5200
npx @hydrotik/air --port 8080        # custom port

Monorepo (development):

pnpm turbo run dev --filter=@hydrotik/air

Background / detached (cross-platform — macOS, Linux, Windows):

# Start as a detached background process (survives terminal closure)
pnpm --filter @hydrotik/air start:detached

# Custom port
pnpm --filter @hydrotik/air start:detached -- --port 8080

# Stop the background server
pnpm --filter @hydrotik/air stop

⚠ Troubleshooting: Server won't start from AI agents / CI tools
AI coding agents (pi, Cursor, VS Code tasks) run commands in a process group that gets cleaned up when the shell exits. A simple node cli.js & will die when the parent tool finishes. Use start:detached instead — it spawns the server as a fully detached process with child.unref(), writes a PID file to $TMPDIR/air-server.pid, and exits cleanly. Works on all platforms.

Opens:

Dashboard → http://localhost:5200 (production) or http://localhost:5201 (dev)
API → http://localhost:5200/api/health

2. Connect your coding agent

AIr works with multiple AI coding agents. Pick your setup:

Pi

cd .pi/extensions/ai-rum-collector && npm install

Then /reload in pi. The extension auto-discovers and streams tool calls, turns, token usage, context breakdown, and compaction events via WebSocket. See Pi collector docs.

Claude Code

One command installs the hooks:

npx air-install-claude-code

This copies two hook scripts into .claude/hooks/ and wires them into .claude/settings.json:

.claude/hooks/
├── air-session-start.js     ← SessionStart hook
└── air-post-tool-use.js     ← PostToolUse hook

What happens automatically:

When Claude Code starts a session, air-session-start.js checks if the AIr server is running and starts it if needed
Every tool call triggers air-post-tool-use.js, which POSTs the event to AIr via HTTP
Session IDs are persisted to a temp file so events from the same session are correlated

Manual install (if you prefer not to use the installer):

Copy session-start.js and post-tool-use.js from src/collectors/claude-code/ into .claude/hooks/
Add to .claude/settings.json:

{
  "hooks": {
    "SessionStart": [{
      "hooks": [{
        "type": "command",
        "command": "node .claude/hooks/air-session-start.js"
      }]
    }],
    "PostToolUse": [{
      "hooks": [{
        "type": "command",
        "command": "node .claude/hooks/air-post-tool-use.js"
      }]
    }]
  }
}

Configuration:

| Variable | Default | Description | |----------|---------|-------------| | AIR_URL | http://localhost:5200 | AIr server HTTP endpoint | | AIR_PORT | 5200 | Server port (used by auto-start) | | AIR_ENABLED | true | Set to "false" to disable collection | | AIR_AUTOSTART | true | Set to "false" to skip auto-starting the server |

What it collects vs Pi:

| Feature | Pi | Claude Code | |---------|-----|-------------| | Tool call events | ✅ With precise timing | ✅ No duration (PostToolUse only) | | Token usage per turn | ✅ From API response | ❌ Not exposed in hooks | | Context breakdown | ✅ Full treemap | ⚠️ Total % only (via statusline bridge) | | Compaction events | ✅ Direct hook | ❌ Not exposed | | Connection | WebSocket (persistent) | HTTP POST (per-event) |

See Claude Code collector docs.

Codex CLI

Run the watcher alongside Codex — it tails session files in real time:

# Terminal 1: Start the AIr server
npx air

# Terminal 2: Start the Codex watcher
npx air-codex-watcher

The watcher monitors ~/.codex/sessions/ for new and updated .jsonl files, maps Codex events to AIr telemetry, and POSTs them to the server.

Options:

# Watch all sessions (live + new)
npx air-codex-watcher

# Watch a specific session ID
npx air-codex-watcher --session 019c7e7f

# Replay a past session into AIr (backfill)
npx air-codex-watcher --replay ~/.codex/sessions/2026/02/20/rollout-2026-02-20T23-39-53-019c7e7f.jsonl

What it collects:

| Codex Event | AIr Event | Data | |-------------|-----------|------| | session_meta | session_start | session_id, model, cwd | | event_msg:task_started | turn_start | turn index | | event_msg:task_complete | turn_end | tool call count | | event_msg:token_count | token_usage | input/output tokens | | response_item:function_call | tool_call_start | tool name, call_id, input preview | | response_item:function_call_output | tool_call_end | call_id, output, duration, errors | | response_item:custom_tool_call | tool_call_start/end | apply_patch, etc. | | compacted | compaction | summary length |

Configuration:

| Variable | Default | Description | |----------|---------|-------------| | AIR_URL | http://localhost:5200 | AIr server HTTP endpoint | | AIR_ENABLED | true | Set to "false" to disable | | CODEX_HOME | ~/.codex | Codex home directory |

See Codex collector docs.

Any Agent (SDK)

import { AirClient } from '@hydrotik/air/sdk';

const air = new AirClient({ url: 'ws://localhost:5200/ws/collector' });
air.trace('my_tool', { input: 'data' }, async () => doWork());

3. Open the dashboard

That's it. Every tool call, turn, and context change streams to the dashboard in real-time.

Sessions from different agents are labeled — the dashboard shows whether each session came from Pi, Claude Code, or Codex.

What You See

KPI Cards

Total tokens in context window, session cost, tool call count, turn count, compactions, and context utilization percentage — all updating live. Context % changes color dynamically: pink (<80%), yellow/amber (≥80%), red (≥90%).

Context Window Treemap

D3 treemap showing what fills your context — system prompt, user messages, assistant responses, tool results, thinking blocks. Like a webpack bundle analyzer for your LLM context. Hover any segment for a tooltip with token count and percentage.

Context Utilization Over Time

Area chart tracking context window fill percentage with 80%/95% warning thresholds. Know when you're approaching compaction. Charts fill their full panel height via flex layout.

Token Flow

Cache read / output / input tokens per turn as stacked area chart with gradient fills. See cache efficiency, cost drivers, and compaction sawtooth patterns. Auto-scales Y-axis, hidden X-axis labels for maximum data density.

Tool Call Waterfall

DevTools-style timeline of tool executions with durations. Spot slow reads, long builds, and error patterns.

Live Event Feed

Scrolling log of all telemetry events — color-coded by type, newest-first, with inline summaries.

Data Security & Redaction

AIr is designed to store metadata only — sizes, durations, counts, rates — not your prompts, code, or conversations.

Redaction Levels

Set via --redaction flag or AIR_REDACTION_LEVEL env var:

| Level | What's stored | Use case | |-------|--------------|----------| | preview (default) | Content truncated to 50 chars, sensitive patterns scrubbed | Production — safe observability | | full | ALL content fields stripped, only numeric metadata remains | Strict compliance environments | | none | Everything stored as-is | Local development only |

npx air --redaction full    # maximum privacy
npx air --redaction preview # balanced (default)
npx air --redaction none    # development only ⚠️

What Gets Scrubbed

At preview and full levels, the server automatically detects and redacts:

API keys and tokens (sk-..., Bearer ..., AKIA...)
JWT tokens
Email addresses
Private key blocks
Database connection strings with credentials
.env-style KEY=VALUE patterns

Security Principles

No raw prompts stored — prompt tracking uses one-way SHA-256 hashes
No code content stored — tool I/O stores byte sizes, not actual content
Redaction at ingestion — data is sanitized before it hits SQLite
Local-only by default — server binds to localhost, no external network calls
No telemetry about telemetry — AIr itself sends nothing to external services

Latency Monitoring

AIr tracks timing at multiple granularities:

Automatic (from collectors)

Turn latency — time from user message to final response (Pi collector)
Tool call duration — per-tool execution time with waterfall visualization
API call latency — MCP and RAG operation timing (SDK)

Manual (via SDK)

import { AirClient } from '@hydrotik/air/sdk';
const air = new AirClient({ sessionId: 'my-session' });

// Auto-measure an operation
const result = await air.measureLatency('api_call', async () => {
  return await fetch('https://api.example.com/data');
}, { model: 'gpt-4o' });

// Record with phase breakdown
air.recordLatency('turn', 1500, {
  ttftMs: 200,
  phases: [
    { name: 'thinking', durationMs: 800 },
    { name: 'tool_execution', durationMs: 500 },
    { name: 'response_generation', durationMs: 200 },
  ],
});

API

| Endpoint | Description | |----------|-------------| | GET /api/sessions/:id/latency | Latency stats by operation (avg/min/max) | | GET /api/sessions/:id/latency/timeseries?operation=turn | Time series |

Cost Monitoring

AIr tracks costs automatically using a built-in pricing table for common models (Claude, GPT-4o, Gemini, Codex, etc.) and supports manual cost recording.

Automatic Cost Calculation

When a token_usage event arrives with zero cost, AIr auto-computes it from the model pricing table. The Pi collector emits cost events automatically.

Budget Alerts

const air = new AirClient({
  sessionId: 'my-session',
  budgetLimit: 5.00, // Alert when session cost exceeds $5
});

When cumulative cost crosses the budget limit, a cost event with budgetExceeded: true is emitted and stored.

Built-in Model Pricing

Prices per 1M tokens (USD). Override via custom events if your pricing differs.

| Model | Input | Output | Cache Read | |-------|-------|--------|------------| | claude-4-sonnet | $3.00 | $15.00 | $0.30 | | claude-4-opus | $15.00 | $75.00 | $1.50 | | gpt-4o | $2.50 | $10.00 | $1.25 | | gpt-4.1 | $2.00 | $8.00 | $0.50 | | gpt-4.1-mini | $0.40 | $1.60 | $0.10 | | gemini-2.5-pro | $1.25 | $10.00 | — |

API

| Endpoint | Description | |----------|-------------| | GET /api/sessions/:id/cost | Cost breakdown by model | | GET /api/sessions/:id/cost/timeseries | Cumulative cost over time |

Output Evaluation

AIr tracks quality signals for every LLM turn — no content stored, only metrics:

Tool success rate — what % of tool calls succeeded
Response token count — verbosity tracking
Cache hit rate — context efficiency
Retry detection — did the turn need a correction?
Response latency — time from prompt to response
User rating — optional 1-5 star rating via SDK

Automatic (Pi collector)

The Pi collector emits output_eval events automatically on every turn with tool success rate, cache hit rate, response latency, and token counts.

Manual (SDK)

air.recordOutputEval(3, 'claude-4-sonnet', 'anthropic', {
  responseTokens: 450,
  toolCallCount: 5,
  toolErrorCount: 1,
  responseLatencyMs: 3200,
  cacheHitRate: 0.65,
}, { userRating: 4, tags: ['accurate', 'concise'] });

API

| Endpoint | Description | |----------|-------------| | GET /api/sessions/:id/evals | Aggregate quality metrics by model | | GET /api/sessions/:id/evals/timeseries | Quality signals over time |

Prompt Rating & Comparison

Track which prompt variants work best — without storing prompt content.

How It Works

Prompts are identified by a SHA-256 hash (first 16 chars) — the raw text never leaves your machine
Each prompt gets a variant label (baseline, v2-concise, v3-cot, etc.)
After a task completes, record effectiveness metrics
Query the API to compare variants by goal achievement, cost, latency, and error rates

SDK Usage

import { AirClient, hashPrompt } from '@hydrotik/air/sdk';
const air = new AirClient({ sessionId: 'my-session' });

// Rate after a successful interaction
air.ratePrompt('v2-concise', 'system', systemPromptText, {
  goalAchieved: true,
  turnsToComplete: 3,
  totalTokens: 5000,
  totalCost: 0.02,
  totalLatencyMs: 15000,
  toolErrorRate: 0,
  requiredCompaction: false,
}, 4); // 4/5 stars

// Compare variants via API
// GET /api/prompts → all variants ranked by goal rate
// GET /api/prompts?hash=abc123 → variants for specific prompt

API

| Endpoint | Description | |----------|-------------| | GET /api/prompts | All prompt variants ranked by effectiveness | | GET /api/prompts?hash=<hash> | Compare variants of a specific prompt | | GET /api/prompts/:variant | All ratings for a specific variant |

Drift Detection

AIr automatically detects when model behavior changes — latency spikes, cost increases, error rate jumps, or token usage shifts.

How It Works

The server maintains rolling baselines for key metrics (window of 50 samples)
When a new value deviates beyond the threshold, a drift event is emitted
Drift events are stored and queryable for post-mortem analysis

Monitored Metrics

| Metric | What it tracks | |--------|---------------| | latency | Tool call and turn duration | | cost | Per-turn cost | | token_usage | Total tokens per turn | | output_tokens | Response verbosity | | error_rate | Tool failure rate | | cache_hit_rate | Context cache efficiency |

Severity Thresholds

| Severity | Default Deviation | |----------|------------------| | info | ≥25% from baseline | | warning | ≥50% from baseline | | critical | ≥100% from baseline |

API

| Endpoint | Description | |----------|-------------| | GET /api/drift | Recent drift events (all sessions) | | GET /api/drift?session=<id> | Drift events for a session | | GET /api/drift/summary | Drift counts by metric and severity |

Bring Your Own RAG

🚀 Quick integration? See the Integration Guide for step-by-step instructions and a prompt you can paste into any AI coding agent to auto-instrument your services.

AIr supports any RAG system through three integration paths — from zero-code config to full SDK instrumentation.

1. Configuration File (`.air.json`)

Drop a .air.json in your project root to register your RAG providers. The dashboard shows them immediately — even before data flows.

{
  "providers": {
    "rag": [
      {
        "name": "product-search",
        "type": "qdrant",
        "description": "Product catalog vector search",
        "embeddingModel": "text-embedding-3-small",
        "dimensions": 1536
      },
      {
        "name": "docs-kb",
        "type": "pinecone",
        "description": "Documentation knowledge base"
      }
    ],
    "mcp": [
      { "name": "design-mcp", "description": "Design system MCP server" }
    ]
  },
  "redaction": "preview",
  "budgetLimit": 10.00
}

The server reads this config on startup and registers providers in the dashboard's Integrations panel with status indicators (active/inactive/never seen).

2. HTTP API (Language-Agnostic)

Your RAG system — Python, Go, Rust, whatever — POSTs simple JSON to dedicated endpoints. No SDK needed, no full event schema required. Only source is mandatory.

Log a retrieval:

curl -X POST http://localhost:5200/api/rag/retrieval \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "product-search",
    "query": "red running shoes",
    "resultCount": 10,
    "topScore": 0.92,
    "durationMs": 45
  }'

Log an embedding:

curl -X POST http://localhost:5200/api/rag/embedding \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "product-search",
    "model": "text-embedding-3-small",
    "inputTokens": 150,
    "durationMs": 12,
    "dimensions": 1536
  }'

Log an indexing operation:

curl -X POST http://localhost:5200/api/rag/index \
  -H 'Content-Type: application/json' \
  -d '{
    "source": "docs-kb",
    "documentCount": 500,
    "totalTokens": 250000,
    "durationMs": 3200
  }'

Register a provider at runtime:

curl -X POST http://localhost:5200/api/providers/rag \
  -H 'Content-Type: application/json' \
  -d '{ "name": "my-rag", "type": "custom", "description": "My RAG pipeline" }'

The server auto-fills defaults from .air.json config (e.g., embedding model, dimensions) and applies redaction before storage.

3. SDK (TypeScript)

For TypeScript/Node.js RAG systems, use the SDK for automatic tracing with async/await wrappers.

Config-driven (reads .air.json):

import { createRagTracersFromConfig } from '@hydrotik/air/sdk';

// Creates a tracer for each provider in .air.json
const rag = createRagTracersFromConfig({ sessionId: 'my-session' });

// Use by provider name
const results = await rag['product-search'].traceRetrieval('red shoes', async () => {
  return await qdrant.search({ vector, limit: 10 });
}, {
  extractResults: (r) => ({
    count: r.length,
    topScore: r[0]?.score,
    chunkSizes: r.map(doc => doc.tokenCount),
  }),
});

Manual setup:

import { createRagTracer } from '@hydrotik/air/sdk';

const rag = createRagTracer('product-search', {
  sessionId: 'my-session',
  defaultEmbeddingModel: 'text-embedding-3-small',
  defaultDimensions: 1536,
});

await rag.traceRetrieval('query', fetchFn);
await rag.traceEmbedding('text-embedding-3-small', 150, embedFn);
await rag.traceIndex(500, 250000, indexFn);

What Shows Up in the Dashboard

| Panel | What it shows | |-------|--------------| | Integrations | All registered providers with status (active/idle/never seen), type, event count, last seen | | RAG Pipeline | Stats table: source, type, call count, avg latency, avg results, relevance scores, token volumes | | Live Event Feed | Real-time stream of rag_retrieval, rag_embedding, rag_index events | | Drift Detection | Alerts when RAG latency, error rate, or result quality shifts from baseline |

Supported Vector DB Types

The type field in .air.json is for display only — AIr works with any backend:

| Type | Icon | |------|------| | pinecone | 🌲 | | qdrant | 🔷 | | weaviate | 🕸 | | chroma | 🎨 | | pgvector | 🐘 | | milvus | 🔬 | | custom | ⚙️ |

Provider API

| Endpoint | Description | |----------|-------------| | GET /api/providers | All registered RAG + MCP providers with status | | GET /api/providers/rag | RAG providers only | | POST /api/providers/rag | Register a new RAG provider at runtime | | POST /api/rag/retrieval | Log a retrieval (simplified — only source required) | | POST /api/rag/embedding | Log an embedding generation | | POST /api/rag/index | Log a document indexing operation |

Configuration

Configuration File

Create .air.json or air.config.json in your project root. AIr searches up to 5 parent directories.

{
  "providers": {
    "rag": [{ "name": "my-rag", "type": "qdrant" }],
    "mcp": [{ "name": "my-mcp" }]
  },
  "redaction": "preview",
  "budgetLimit": 10.00,
  "port": 5200
}

Environment Variables

| Variable | Default | Description | |----------|---------|-------------| | AIR_URL | ws://localhost:5200/ws/collector | AIr server WebSocket endpoint | | AIR_ENABLED | true | Set to "false" to disable collection | | AIR_REDACTION_LEVEL | preview | Data redaction: none, preview, full |

Database

Telemetry persists to SQLite at ~/.hydrotik/air/telemetry.db (WAL mode). Delete the file to reset.

Ports

| Port | Service | |------|---------| | 5200 | AIr server (Fastify + WebSocket + REST API) | | 5201 | AIr dashboard (Vite dev server, proxies to 5200) |

Ports are configured in @hydrotik/config (packages/hy-config/src/ports.ts).

SDK — Instrument MCP, RAG, and Custom Tools

AIr ships a lightweight SDK for instrumenting anything that talks to an LLM — MCP servers, RAG pipelines, custom tools, or your own agent framework.

MCP Instrumentation

Wrap an MCP client to auto-instrument all callTool, readResource, and getPrompt calls:

import { Client } from '@modelcontextprotocol/sdk/client';
import { instrumentMcp } from '@hydrotik/air/sdk';

const client = new Client({ name: 'my-app', version: '1.0' });

// Proxy wraps all MCP methods with telemetry
const instrumented = instrumentMcp(client, 'my-mcp-server', {
  sessionId: 'my-session',
});

// All calls are now auto-traced in the AIr dashboard
const result = await instrumented.callTool('search', { query: 'hello' });

Or use the manual tracer for more control:

import { createMcpTracer } from '@hydrotik/air/sdk';

const mcp = createMcpTracer('my-server');
const result = await mcp.traceToolCall('search', { query: 'hello' }, async () => {
  return await myMcpClient.callTool('search', { query: 'hello' });
});

RAG Instrumentation

Instrument vector DB queries, embedding generation, and document indexing:

import { createRagTracer } from '@hydrotik/air/sdk';

const rag = createRagTracer('pinecone', { sessionId: 'my-session' });

// Trace a retrieval with result extraction
const results = await rag.traceRetrieval('search query', async () => {
  return await pinecone.query({ vector, topK: 5 });
}, {
  extractResults: (r) => ({
    count: r.matches.length,
    topScore: r.matches[0]?.score,
    chunkSizes: r.matches.map(m => m.metadata.tokenCount),
  }),
});

// Trace embedding generation
const embedding = await rag.traceEmbedding('text-embedding-3-small', 150, async () => {
  return await openai.embeddings.create({ model: 'text-embedding-3-small', input: text });
}, { dimensions: 1536 });

// Trace document indexing
await rag.traceIndex(100, 50000, async () => {
  return await pinecone.upsert(vectors);
});

Generic Client — Any Custom Tool

For anything not covered by MCP or RAG helpers, use the base AirClient:

import { AirClient } from '@hydrotik/air/sdk';

const air = new AirClient({
  sessionId: 'my-session',
  provider: 'my-custom-tool',
});

// Trace any async operation
const result = await air.trace('database_query', { table: 'users', filter: 'active' }, async () => {
  return await db.query('SELECT * FROM users WHERE active = true');
});

// Or emit raw events
air.emit({
  type: 'custom',
  provider: 'my-tool',
  eventName: 'cache_hit',
  data: { key: 'user:123', ttl: 300 },
});

Dashboard Auto-Detection

MCP and RAG panels appear automatically in the dashboard when data from those providers flows in. No configuration needed — the dashboard detects event types and renders the appropriate panels:

MCP Servers — table of server/method/tool stats with call counts, avg/min/max latency, error rates
RAG Pipeline — table of source/type stats with retrieval counts, avg relevance scores, token volumes

REST API

All endpoints return JSON. All ingested data is subject to server-side redaction.

| Endpoint | Description | |----------|-------------| | Server | | | GET /api/health | Server uptime, connected clients, redaction level | | GET /api/config | Server configuration and enabled features | | Sessions | | | GET /api/sessions | List all sessions with summary stats | | GET /api/sessions/:id | Single session summary | | GET /api/sessions/:id/events | All events for a session | | GET /api/events/recent | Recent events across all sessions | | Tools | | | GET /api/sessions/:id/tool-calls | Tool call records with timing | | GET /api/sessions/:id/tool-stats | Per-tool aggregate stats (count, avg/min/max ms, errors) | | Context | | | GET /api/sessions/:id/context | Context utilization snapshots over time | | GET /api/sessions/:id/context/latest | Latest context breakdown with segments | | Latency | | | GET /api/sessions/:id/latency | Latency stats by operation | | GET /api/sessions/:id/latency/timeseries | Latency time series (optional ?operation=) | | Cost | | | GET /api/sessions/:id/cost | Cost breakdown by model | | GET /api/sessions/:id/cost/timeseries | Cumulative cost over time | | Quality | | | GET /api/sessions/:id/evals | Output evaluation stats by model | | GET /api/sessions/:id/evals/timeseries | Quality signals over time | | Prompts | | | GET /api/prompts | All prompt variants ranked by effectiveness | | GET /api/prompts?hash=<hash> | Compare variants of a specific prompt | | GET /api/prompts/:variant | All ratings for a variant | | Drift | | | GET /api/drift | Recent drift events (optional ?session=) | | GET /api/drift/summary | Drift counts by metric and severity | | Integrations | | | GET /api/sessions/:id/mcp-stats | MCP call stats grouped by server/method/tool | | GET /api/sessions/:id/rag-stats | RAG stats grouped by source/type | | GET /api/sessions/:id/providers | Event type summary for all providers | | GET /api/providers | Registered RAG + MCP providers with status | | GET /api/providers/rag | RAG providers only | | POST /api/providers/rag | Register a new RAG provider at runtime | | RAG Ingest (simplified) | | | POST /api/rag/retrieval | Log a retrieval — only source required | | POST /api/rag/embedding | Log an embedding — only source required | | POST /api/rag/index | Log an indexing op — only source required | | Ingestion (full events) | | | POST /api/ingest | Ingest a single event (redacted before storage) | | POST /api/ingest/batch | Ingest multiple events at once |

Architecture

 ┌──────────────┐   WebSocket    ┌──────────────┐   WebSocket    ┌──────────────┐
 │  Pi Agent    │ ─────────────→ │              │ ─────────────→ │              │
 │  + Extension │  /ws/collector │              │  /ws/dashboard │              │
 └──────────────┘                │              │                │              │
 ┌──────────────┐   HTTP POST    │  AIr Server  │                │  Dashboard   │
 │  Claude Code │ ─────────────→ │  (Fastify)   │                │  (React+D3)  │
 │  + Hooks     │  /api/ingest   │              │                │              │
 └──────────────┘                │              │                │              │
 ┌──────────────┐   HTTP POST    │              │                │              │
 │  Codex CLI   │ ─────────────→ │              │                │              │
 │  + Watcher   │  /api/ingest   └──────┬───────┘                └──────────────┘
 └──────────────┘                       │
                                   SQLite DB
                                 ~/.hydrotik/air/
                                  telemetry.db

Ingestion Paths

AIr accepts telemetry via two protocols:

WebSocket (/ws/collector) — persistent connection for long-lived processes (Pi extension, SDK clients)
HTTP POST (/api/ingest, /api/ingest/batch) — fire-and-forget for short-lived processes (Claude Code hooks, Codex watcher)

Both paths go through the same TelemetryStore.ingestEvent() — same DB, same broadcast to dashboard clients.

Pi Collector (WebSocket)

Hooks into pi's event system via ExtensionAPI:

tool_execution_start / tool_execution_end — tool call timing and I/O sizes
turn_start / turn_end — LLM roundtrip tracking + token usage from response
agent_start / agent_end — session lifecycle
session_compact — compaction events
model_select — model changes
ctx.getContextUsage() — real token count from pi
ctx.sessionManager.getBranch() — context breakdown by message category
ctx.getSystemPrompt() — system prompt size

The collector is silent and non-blocking — if the AIr server isn't running, events are dropped without disrupting pi. Reconnects automatically every 5 seconds.

Claude Code Collector (HTTP)

Two Node.js hook scripts that run as short-lived processes on each Claude Code event:

SessionStart — auto-starts AIr server, emits session_start, persists session ID to temp file
PostToolUse — emits tool_call_start + tool_call_end with correlated IDs, reads context metrics from statusline bridge

Hooks use HTTP POST because they're ephemeral processes — no time to establish a WebSocket.

Codex CLI Collector (HTTP)

A long-running watcher that tails ~/.codex/sessions/*.jsonl:

Watches for file changes via fs.watch (falls back to 2s polling)
Tracks byte offsets per file to avoid re-processing
Maps Codex JSONL entries (session_meta, function_call, task_started, etc.) to AIr events
Extracts wall-time duration from Codex output metadata
Supports --replay mode for backfilling historical sessions

Server (Fastify + SQLite)

Ingests events via WebSocket, persists to SQLite with WAL mode
Broadcasts events to connected dashboard clients in real-time
Serves REST API for historical queries
Four tables: sessions, events, tool_calls, context_snapshots

Dashboard (React + Vite)

Connects to server via WebSocket for live updates
Falls back to REST API for historical data on session switch
D3.js for treemap and waterfall visualizations
Recharts for time-series charts
Styled with @hydrotik/tokens via vanilla-extract (dark theme)

Package Exports

@hydrotik/air          → Event types, MODEL_PRICING, computeCost, DriftDetector, redaction utils
@hydrotik/air/sdk      → AirClient, hashPrompt, instrumentMcp, createRagTracer
@hydrotik/air/server   → createServer() for programmatic use

bin:

air → starts server + serves built dashboard on a single port
air-install-claude-code → installs Claude Code hooks into .claude/
air-codex-watcher → tails Codex session files and streams to AIr

What Ships in the Package

dist/
├── server/          ← Fastify server + CLI (ESM)
│   ├── cli.js       ← npx entry point
│   └── index.js     ← createServer() export
├── sdk/             ← SDK for instrumentation (ESM + CJS + DTS)
│   ├── index.js
│   ├── index.cjs
│   └── index.d.ts
├── shared/          ← Event type definitions (ESM + CJS + DTS)
│   ├── index.js
│   ├── index.cjs
│   └── index.d.ts
└── dashboard/       ← Pre-built React SPA
    ├── index.html
    └── assets/      ← JS + CSS bundles (~650KB gzip: ~190KB)

Runtime dependencies: fastify, better-sqlite3, ws, @fastify/cors, @fastify/static, @fastify/websocket

Dashboard (React, D3, Recharts, vanilla-extract) is pre-built at publish time — zero React dependency at runtime.

Tech Stack

| Layer | Technology | |-------|-----------| | Server | Fastify 5, better-sqlite3, @fastify/websocket, data redaction, drift detection | | Dashboard | React 19, Vite 6, D3.js 7, Recharts 2 (pre-built) | | Styling | vanilla-extract, @hydrotik/tokens (compiled to CSS) | | Collectors | Pi ExtensionAPI (WebSocket), Claude Code hooks (HTTP), Codex watcher (HTTP) | | SDK | WebSocket (ws), crypto (prompt hashing), zero external deps | | Storage | SQLite 3 (WAL mode), ~/.hydrotik/air/telemetry.db | | Security | 3-level content redaction, SHA-256 prompt hashing, sensitive pattern scrubbing |

Troubleshooting

Dashboard shows "Reconnecting…" The AIr server isn't running. Start it with npx @hydrotik/air or pnpm turbo run dev --filter=@hydrotik/air.

"0 sessions" after reload The collector connects to the server async. Send a message in pi — the first tool call or turn will create a session.

Context % doesn't match pi footer Delete ~/.hydrotik/air/telemetry.db to clear stale data, restart the server, then /reload in pi.

Pi extension not loading Check that npm install was run inside .pi/extensions/ai-rum-collector/ (the ws package must be in node_modules). Run /reload in pi after fixing.

Claude Code hooks not firing Verify .claude/settings.json has the hook entries under hooks.SessionStart and hooks.PostToolUse. Run npx air-install-claude-code again to re-install. Check that the hook scripts exist at .claude/hooks/air-session-start.js and .claude/hooks/air-post-tool-use.js.

Claude Code sessions not correlating The SessionStart hook persists a session ID to $TMPDIR/air-claude-code/session.json. If PostToolUse events show up as separate sessions, check that the temp directory is writable and both hooks run in the same OS user context.

Codex watcher not seeing sessions Verify ~/.codex/sessions/ exists and contains .jsonl files. Run npx air-codex-watcher --replay <file> on a specific file to test the pipeline. Check AIR_URL if the server is on a non-default port.

Codex watcher missing events The watcher processes the 3 most recent files on startup. Older sessions need --replay to backfill. If fs.watch isn't working (some network filesystems), the watcher falls back to 2s polling automatically.

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

⚡ AIr

Install

🔌 Integrate Your Own Services

Quick Start

1. Start the AIr server + dashboard

2. Connect your coding agent

Pi

Claude Code

Codex CLI

Any Agent (SDK)

3. Open the dashboard

What You See

KPI Cards

Context Window Treemap

Context Utilization Over Time

Token Flow

Tool Call Waterfall

Live Event Feed

Data Security & Redaction

Redaction Levels

What Gets Scrubbed

Security Principles

Latency Monitoring

Automatic (from collectors)

Manual (via SDK)

API

Cost Monitoring

Automatic Cost Calculation

Budget Alerts

Built-in Model Pricing

API

Output Evaluation

Automatic (Pi collector)

Manual (SDK)

API

Prompt Rating & Comparison

How It Works

SDK Usage

API

Drift Detection

How It Works

Monitored Metrics

Severity Thresholds

API

Bring Your Own RAG

1. Configuration File (.air.json)

2. HTTP API (Language-Agnostic)

3. SDK (TypeScript)

What Shows Up in the Dashboard

Supported Vector DB Types

Provider API

Configuration

Configuration File

Environment Variables

Database

Ports

SDK — Instrument MCP, RAG, and Custom Tools

MCP Instrumentation

RAG Instrumentation

Generic Client — Any Custom Tool

Dashboard Auto-Detection

REST API

Architecture

Ingestion Paths

Pi Collector (WebSocket)

Claude Code Collector (HTTP)

Codex CLI Collector (HTTP)

Server (Fastify + SQLite)

Dashboard (React + Vite)

Package Exports

What Ships in the Package

Tech Stack

Troubleshooting

License

1. Configuration File (`.air.json`)