llm-metrics
Metrics collection system for LLMs and AI agents
Track performance, latency, and usage metrics for agents, tools, and LLM requests. Perfect for monitoring LLM applications, AI agents, and agentic systems.
Installation • Quick Start • Documentation • Contributing
A professional, framework-agnostic metrics collection system designed specifically for LLM applications and AI agents. Built with TypeScript, featuring type-safe APIs, comprehensive validation, and flexible persistence backends.
✨ Features
- 🚀 Framework-agnostic - Works with any JavaScript/TypeScript project (Next.js, Express, Hono, etc.)
- 📊 Multiple metric types - Track agents, tools, latency, and request timing
- 💾 Flexible persistence - In-memory by default, pluggable persistence backends (PostgreSQL, MongoDB, Redis)
- ✅ Type-safe - Full TypeScript support with strict types and IntelliSense
- 🔍 Validation - Built-in metric validation (configurable, prevents invalid data)
- 📈 Aggregations - Built-in summary statistics, percentiles, and histograms
- 🎨 Formatting - Human-readable metric formatting utilities for logging
- 🔌 Extensible - Custom persistence backends, loggers, and event hooks
- ⚡ Zero dependencies - No runtime dependencies, lightweight and fast
- 🔎 Query API - Flexible filtering by context, time range, duration, metadata, etc.
- 📦 Batch operations - Efficient batch recording for migrations and imports
- 📊 Derived metrics - Rate calculations, error rates, and trend analysis
- 🧪 Well tested - Comprehensive test suite (154 tests, 290+ assertions)
- 📦 ESM-only - Modern JavaScript, no CommonJS legacy code
📦 Installation
Install llm-metrics from npm:
```bash
npm install llm-metrics
```

Or using your preferred package manager:

```bash
# Bun
bun add llm-metrics

# Yarn
yarn add llm-metrics

# pnpm
pnpm add llm-metrics
```

Requirements
- Node.js >= 22.0.0 (LTS) or Bun >= 1.3.0
- ESM-only - This package uses ES Modules only (no CommonJS support)
- TypeScript 5.6+ (recommended for type safety)
🚀 Quick Start
Get started with llm-metrics in under 2 minutes:
```typescript
import { metricsCollector, measureAgent, measureTool } from 'llm-metrics';

// Measure an agent execution (e.g., LLM agent, AI assistant)
const result = await measureAgent(
  'memory-manager',     // Agent identifier
  'conversation-123',   // Context ID (conversation, session, etc.)
  async () => {
    // Your agent code here
    const facts = await extractFacts();
    return { facts, count: facts.length };
  }
);

// Measure a tool execution (e.g., database query, API call)
const toolResult = await measureTool(
  'search-database',    // Tool name
  'conversation-123',   // Context ID
  async () => {
    // Your tool code here
    return await db.query('SELECT * FROM users');
  }
);

// Get summary statistics
const summary = metricsCollector.getSummary(3600000); // Last hour
console.log(`Agents executed: ${summary.totalAgentsExecuted}`);
console.log(`Average duration: ${summary.averageAgentDuration}ms`);
console.log(`Tools called: ${summary.totalToolsCalled}`);
```

📚 Core Concepts
Metrics Types
llm-metrics supports four types of metrics optimized for LLM and AI agent workflows:
🤖 Agent Metrics - Track execution of AI agents, LLM calls, or long-running processes
- Duration, success/failure, custom metadata
- Perfect for monitoring agent performance and reliability
🔧 Tool Metrics - Track individual tool/function calls (function calling, RAG queries, etc.)
- Success rate, execution time, error tracking
- Essential for debugging tool usage in agentic systems
⏱️ Latency Metrics - Track specific operations or bottlenecks
- Embedding generation, vector search, cache lookups
- Identify performance bottlenecks in your LLM pipeline
📡 Request Timing Metrics - Track client vs server timing for requests
- Client-side latency, server processing time, streaming duration
- Understand end-to-end user experience
Storage Architecture
In-memory - Fast access, limited by `maxMetrics` (default: 1000)
- Perfect for real-time monitoring and debugging
- Automatically rotates oldest metrics when the limit is reached

Persistence - Optional backend for long-term storage
- PostgreSQL, MongoDB, Redis, or any custom backend
- Implement the `MetricsPersistence` interface for your database
💡 Use Cases
Perfect for:
- LLM Applications - Monitor GPT-4, Claude, Gemini API calls
- AI Agents - Track agent execution, tool usage, and performance
- RAG Systems - Measure vector search, embedding generation latency
- Agentic Workflows - Monitor multi-step agent operations
- Production Monitoring - Track metrics in production LLM applications
📖 Usage Examples
Basic Agent Tracking
```typescript
import { measureAgent } from 'llm-metrics';

const result = await measureAgent(
  'data-processor',
  'session-123',
  async () => {
    // Process data
    const processed = await processData();
    return processed;
  }
);
```

Agent Tracking with Custom Metadata
```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'memory-manager',
  'conversation-456',
  async () => {
    const facts = await extractFacts();
    return { facts, count: facts.length };
  },
  (result) => ({
    factsExtracted: result.count,
    summaryLength: result.summary?.length || 0,
  })
);
```

Tool Tracking
```typescript
import { measureTool } from 'llm-metrics';

const result = await measureTool(
  'database-query',
  'request-789',
  async () => {
    return await db.query('SELECT * FROM users');
  }
);
```

Manual Metric Recording
```typescript
import { metricsCollector } from 'llm-metrics';

// Record agent metrics manually
metricsCollector.recordAgent({
  agentId: 'custom-agent',
  contextId: 'context-123',
  startTime: Date.now() - 5000,
  endTime: Date.now(),
  duration: 5000,
  metadata: {
    customField: 'value',
    itemsProcessed: 42,
  },
});

// Record latency metrics
metricsCollector.recordLatency({
  operation: 'cache-lookup',
  startTime: Date.now() - 100,
  endTime: Date.now(),
  duration: 100,
  metadata: {
    cacheHit: true,
  },
});
```

Request Timing (Client vs Server)
```typescript
import { metricsCollector } from 'llm-metrics';

metricsCollector.recordRequestTiming({
  contextId: 'request-123',
  serverTimeToFirstChunk: 500,
  serverStreamDuration: 2000,
  serverTotalDuration: 2500,
  clientTimeToFirstChunk: 800,           // From Performance API
  clientRequestStart: performance.now(),
  networkLatencyEstimate: 300,           // client - server difference
  metadata: {
    model: 'gpt-4',
    messageCount: 5,
  },
});
```

⚙️ Configuration
Customize llm-metrics to fit your needs:
Custom Persistence Backend
```typescript
import { MetricsPersistence, metricsCollector } from 'llm-metrics';
import type { AgentMetrics, ToolMetrics, LatencyMetrics, RequestTimingMetrics } from 'llm-metrics';

class MyDatabasePersistence implements MetricsPersistence {
  async persistAgentMetrics(metrics: AgentMetrics): Promise<void> {
    // Save to your database
    await db.insert('agent_metrics', metrics);
  }

  async persistToolMetrics(metrics: ToolMetrics): Promise<void> {
    await db.insert('tool_metrics', metrics);
  }

  async persistLatencyMetrics(metrics: LatencyMetrics): Promise<void> {
    await db.insert('latency_metrics', metrics);
  }

  async persistRequestTimingMetrics(metrics: RequestTimingMetrics): Promise<void> {
    await db.insert('request_timing_metrics', metrics);
  }

  async getAgentMetrics(timeRangeMs?: number, contextId?: string): Promise<AgentMetrics[]> {
    // Retrieve from database
    return await db.query('SELECT * FROM agent_metrics WHERE ...');
  }

  // ... implement other get methods
}

// Configure persistence
metricsCollector.setPersistence(new MyDatabasePersistence());
```

Custom Logger
```typescript
import { MetricsLogger, metricsCollector } from 'llm-metrics';

class MyLogger implements MetricsLogger {
  info(message: string, data?: Record<string, unknown>): void {
    console.log(`[INFO] ${message}`, data);
  }

  debug(message: string, data?: Record<string, unknown>): void {
    console.debug(`[DEBUG] ${message}`, data);
  }

  warn(message: string, data?: Record<string, unknown>): void {
    console.warn(`[WARN] ${message}`, data);
  }

  error(message: string, data?: Record<string, unknown>): void {
    console.error(`[ERROR] ${message}`, data);
  }
}

metricsCollector.setLogger(new MyLogger());
```

Collector Configuration
```typescript
import { MetricsCollector, MetricsCollectorConfig, metricsCollector } from 'llm-metrics';

const config: MetricsCollectorConfig = {
  maxMetrics: 5000,               // Keep more metrics in memory
  validateMetrics: true,          // Enable validation (default)
  throwOnValidationError: false,  // Don't throw, just log (default)
};

const customCollector = new MetricsCollector(undefined, undefined, config);

// Or configure the existing collector
metricsCollector.configure({
  maxMetrics: 2000,
});
```

API Reference
MetricsCollector
Methods
- `recordAgent(metrics: AgentMetrics): void` - Record agent metrics
- `recordTool(metrics: ToolMetrics): void` - Record tool metrics
- `recordLatency(metrics: LatencyMetrics): void` - Record latency metrics
- `recordRequestTiming(metrics: RequestTimingMetrics): void` - Record request timing
- `getSnapshot(): MetricsSnapshot` - Get all current metrics
- `getSummary(timeRangeMs?: number): MetricsSummary` - Get aggregated statistics
- `getContextMetrics(contextId: string): Promise<...>` - Get metrics for a context
- `clear(): void` - Clear all metrics
- `setPersistence(persistence: MetricsPersistence): void` - Configure persistence
- `setLogger(logger: MetricsLogger): void` - Configure logger
- `configure(config: Partial<MetricsCollectorConfig>): void` - Update configuration
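For instance, a quick way to inspect and reset the in-memory store (a minimal sketch using the methods listed above; the snapshot fields match those used in the derived-metrics example below):

```typescript
import { metricsCollector } from 'llm-metrics';

// Inspect everything currently held in memory
const snapshot = metricsCollector.getSnapshot();
console.log(`Agents in memory: ${snapshot.agents.length}`);
console.log(`Tools in memory: ${snapshot.tools.length}`);

// Reset in-memory metrics (e.g., between test runs)
metricsCollector.clear();
```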
Helper Functions
- `measureAgent<T>(agentId, contextId?, execute): Promise<T>` - Measure agent execution
- `measureAgentWithMetrics<T>(agentId, contextId, execute, extractMetadata): Promise<T>` - Measure with metadata extraction
- `measureTool<T>(toolName, contextId, execute): Promise<T>` - Measure tool execution
- `measureToolWithMetadata<T>(toolName, contextId, execute, extractMetadata): Promise<T>` - Measure tool with metadata
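`measureToolWithMetadata` has no dedicated example elsewhere in this README; here is a minimal sketch following the signature above (the `db` call and the `rowCount` field are illustrative placeholders):

```typescript
import { measureToolWithMetadata } from 'llm-metrics';

const rows = await measureToolWithMetadata(
  'database-query',   // Tool name
  'request-789',      // Context ID
  async () => {
    return await db.query('SELECT * FROM users');
  },
  (result) => ({
    rowCount: result.length, // Illustrative metadata extracted from the tool result
  })
);
```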
Formatting Utilities
- `formatDuration(ms: number): string` - Format duration (e.g., "1.5s", "2m 5s")
- `formatDurationDetailed(ms: number): string` - Detailed duration format
- `formatAgentMetrics(metrics: AgentMetrics): string` - Human-readable agent metrics
- `formatToolMetrics(metrics: ToolMetrics): string` - Human-readable tool metrics
- `formatLatencyMetrics(metrics: LatencyMetrics): string` - Human-readable latency metrics
- `formatMetricsSummary(summary: MetricsSummary): string` - Human-readable summary
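For example, a minimal logging sketch using the formatters (the output strings are illustrative; exact formatting comes from the library):

```typescript
import { metricsCollector, formatDuration, formatMetricsSummary } from 'llm-metrics';

console.log(formatDuration(125000)); // e.g. "2m 5s"

const summary = metricsCollector.getSummary(3600000); // Last hour
console.log(formatMetricsSummary(summary));           // Human-readable summary for logs
```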
Validation
- `validateAgentMetrics(metrics: AgentMetrics): ValidationResult` - Validate agent metrics
- `validateToolMetrics(metrics: ToolMetrics): ValidationResult` - Validate tool metrics
- `validateLatencyMetrics(metrics: LatencyMetrics): ValidationResult` - Validate latency metrics
- `validateRequestTimingMetrics(metrics: RequestTimingMetrics): ValidationResult` - Validate request timing
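A sketch of validating a metric before recording it manually. The exact shape of `ValidationResult` is not documented in this README, so the `valid` and `errors` fields below are assumptions — check the exported type in your editor:

```typescript
import { validateAgentMetrics, metricsCollector } from 'llm-metrics';
import type { AgentMetrics } from 'llm-metrics';

const metrics: AgentMetrics = {
  agentId: 'custom-agent',
  startTime: Date.now() - 1000,
  endTime: Date.now(),
  duration: 1000,
};

const result = validateAgentMetrics(metrics);
if (result.valid) {            // Assumed field name
  metricsCollector.recordAgent(metrics);
} else {
  console.warn('Invalid metrics:', result.errors); // Assumed field name
}
```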
Types
AgentMetrics
```typescript
interface AgentMetrics {
  agentId: string;
  contextId?: string;                 // Generic context ID (conversationId, sessionId, requestId, etc.)
  startTime: number;                  // Timestamp in milliseconds
  endTime?: number;                   // Timestamp in milliseconds
  duration?: number;                  // Duration in milliseconds
  metadata?: Record<string, unknown>; // Custom metadata
  error?: string;                     // Error message if failed
}
```

ToolMetrics
```typescript
interface ToolMetrics {
  toolName: string;
  contextId?: string;
  startTime: number;
  endTime?: number;
  duration?: number;
  success: boolean;
  error?: string;
  metadata?: Record<string, unknown>;
}
```

LatencyMetrics
```typescript
interface LatencyMetrics {
  operation: string;
  startTime: number;
  endTime: number;
  duration: number;
  metadata?: Record<string, unknown>;
}
```

RequestTimingMetrics
```typescript
interface RequestTimingMetrics {
  contextId?: string;
  serverTimeToFirstChunk: number;  // milliseconds
  serverStreamDuration: number;    // milliseconds
  serverTotalDuration: number;     // milliseconds
  clientTimeToFirstChunk?: number; // milliseconds (from Performance API)
  clientRequestStart?: number;     // performance.now() timestamp
  networkLatencyEstimate?: number; // milliseconds
  metadata?: Record<string, unknown>;
}
```

Examples
See the examples/ directory for complete, runnable examples:
- Next.js API Route - Integration with Next.js API routes
- Express Middleware - Express middleware for automatic request tracking
- AI SDK Integration - Integration with Vercel AI SDK
- Export Metrics - Export metrics to JSON and CSV
- Aggregations - Advanced aggregations and histograms
- Event Hooks - Event hooks for integrations and alerting
Advanced Usage
Event Hooks
Use event hooks to integrate with external systems, dashboards, or alerting:
```typescript
import { metricsCollector, MetricsCollector } from 'llm-metrics';

// Set up callbacks
metricsCollector.setCallbacks({
  onAgentRecorded: (metrics) => {
    // Send to monitoring service, update dashboard, etc.
    console.log('Agent executed:', metrics.agentId, metrics.duration);
  },
  onToolRecorded: (metrics) => {
    // Track tool usage, alert on failures, etc.
    if (!metrics.success) {
      console.error('Tool failed:', metrics.toolName);
    }
  },
});

// Or configure during construction
const collector = new MetricsCollector(persistence, logger, {
  callbacks: {
    onAgentRecorded: (metrics) => { /* ... */ },
    onToolRecorded: (metrics) => { /* ... */ },
  },
});
```

See examples/event-hooks.ts for complete examples.
Query and Filter API
Query metrics with flexible filter criteria:
```typescript
import { metricsCollector } from 'llm-metrics';

// Filter by multiple context IDs
const metrics = metricsCollector.queryMetrics({
  contextIds: ['session-123', 'session-456'],
});

// Filter by agent IDs
const agentMetrics = metricsCollector.queryMetrics({
  agentIds: ['data-processor'],
});

// Filter by time range
const recentMetrics = metricsCollector.queryMetrics({
  startTime: Date.now() - 3600000, // Last hour
  endTime: Date.now(),
});

// Filter by duration range
const slowMetrics = metricsCollector.queryMetrics({
  minDuration: 5000, // Slower than 5 seconds
});

// Filter by metadata
const dataMetrics = metricsCollector.queryMetrics({
  metadata: { category: 'data' },
});

// Combine multiple filters
const complexFilter = metricsCollector.queryMetrics({
  contextIds: ['session-123'],
  minDuration: 1000,
  maxDuration: 5000,
  metadata: { category: 'data' },
});
```

See examples/query-filter.ts for complete examples.
Batch Operations
Record multiple metrics efficiently in batch:
```typescript
import { metricsCollector } from 'llm-metrics';

// Record multiple agents in batch
metricsCollector.recordAgents([
  { agentId: 'agent-1', startTime: Date.now(), /* ... */ },
  { agentId: 'agent-2', startTime: Date.now(), /* ... */ },
]);

// Record multiple tools in batch
metricsCollector.recordTools([
  { toolName: 'tool-1', startTime: Date.now(), success: true, /* ... */ },
  { toolName: 'tool-2', startTime: Date.now(), success: false, /* ... */ },
]);

// Record multiple latency metrics in batch
metricsCollector.recordLatencies([
  { operation: 'op-1', startTime: Date.now() - 100, endTime: Date.now(), duration: 100 },
  { operation: 'op-2', startTime: Date.now() - 50, endTime: Date.now(), duration: 50 },
]);

// Record multiple request timings in batch
metricsCollector.recordRequestTimings([
  { contextId: 'req-1', serverTimeToFirstChunk: 500, serverStreamDuration: 2000, serverTotalDuration: 2500 },
  { contextId: 'req-2', serverTimeToFirstChunk: 300, serverStreamDuration: 1000, serverTotalDuration: 1300 },
]);
```

Batch operations are useful for:
- Migrating metrics from another system
- Importing historical data
- Bulk operations that are more efficient than individual `record*()` calls
See examples/batch-operations.ts for complete examples.
Derived Metrics
Calculate simple derived metrics like rates and trends:
```typescript
import { metricsCollector, calculateAgentDerivedMetrics, calculateToolDerivedMetrics, calculateTrend } from 'llm-metrics';

const snapshot = metricsCollector.getSnapshot();

// Calculate agent derived metrics
const agentDerived = calculateAgentDerivedMetrics(snapshot.agents, 3600000); // Last hour
console.log(`Error Rate: ${agentDerived.errorRate}%`);
console.log(`Requests/Second: ${agentDerived.requestsPerSecond}`);

// Calculate tool derived metrics
const toolDerived = calculateToolDerivedMetrics(snapshot.tools, 3600000);
console.log(`Success Rate: ${toolDerived.successRate}%`);

// Calculate trends (compare a current rate against a previous period's rate)
const currentRate = agentDerived.requestsPerSecond;
const previousRate = 0.5; // e.g. the rate computed for the previous hour
const trend = calculateTrend(currentRate, previousRate);
console.log(`Change: ${trend.changePercent}%`);
```

Available derived metrics:
- Rates: Requests per second, operations per second
- Error Rates: Error percentage, success percentage
- Trends: Change between time periods, percentage change
See examples/derived-metrics.ts for complete examples.
Custom Metadata Extraction
```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'data-processor',
  'batch-123',
  async () => {
    const data = await processBatch();
    return {
      items: data.items,
      errors: data.errors,
      stats: data.stats,
    };
  },
  (result) => ({
    itemsProcessed: result.items.length,
    errorCount: result.errors.length,
    averageScore: result.stats.averageScore,
    customMetric: result.stats.customValue,
  })
);
```

Time-Range Filtering
```typescript
import { metricsCollector } from 'llm-metrics';

// Last hour
const lastHour = metricsCollector.getSummary(3600000);

// Last 24 hours
const lastDay = metricsCollector.getSummary(86400000);

// All time
const allTime = metricsCollector.getSummary();
```

Context-Based Queries
```typescript
import { metricsCollector } from 'llm-metrics';

// Get all metrics for a specific context (conversation, session, etc.)
const contextMetrics = await metricsCollector.getContextMetrics('conversation-123');
console.log(`Agents: ${contextMetrics.agents.length}`);
console.log(`Tools: ${contextMetrics.tools.length}`);
console.log(`Latency operations: ${contextMetrics.latency.length}`);
```

Best Practices
- Use context IDs - Always provide `contextId` to track metrics across operations
- Extract meaningful metadata - Use metadata to store domain-specific information
- Configure persistence - For production, use a persistence backend
- Enable validation - Keep validation enabled to catch errors early
- Monitor memory usage - Adjust `maxMetrics` based on your needs
- Use helper functions - Prefer `measureAgent`/`measureTool` over manual recording
Performance Considerations
- In-memory storage - Fast but limited by `maxMetrics` (default: 1000)
- Persistence is async - Persistence operations don't block metric recording
- Validation overhead - Can be disabled for maximum performance if needed (see the sketch below)
- FIFO eviction - Oldest metrics are removed when the limit is reached
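For example, a minimal sketch of trading validation for throughput, using the `configure()` option shown in Collector Configuration above:

```typescript
import { metricsCollector } from 'llm-metrics';

// Skip per-metric validation on hot paths where inputs are already trusted
metricsCollector.configure({
  validateMetrics: false,
});
```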
🔨 Building with Bun
This package is fully compatible with Bun and can be bundled directly:
```bash
# Bundle with Bun
bun build ./src/index.ts --outdir ./dist --target bun

# Or use Bun's bundler in your project
bun build node_modules/llm-metrics/dist/index.js --outdir ./bundled
```

🛠️ Technical Details
Modern JavaScript Only
This package uses ES Modules (ESM) only:
- ✅ ES2022+ syntax
- ✅ Native ESM imports/exports
- ✅ Compatible with Bun 1.3+, Node.js 22+ (LTS), Deno
- ❌ No CommonJS support
- ❌ No legacy browser support
Requirements:
- Node.js >= 22.0.0 (LTS)
- Bun >= 1.3.0
📊 Project Status
- ✅ v0.7.0 - Latest release
- ✅ 154 tests passing (290+ assertions)
- ✅ ~95% code coverage (comprehensive edge case coverage)
- ✅ 100% TypeScript type coverage
- ✅ ESM-only (modern JavaScript)
- ✅ Zero dependencies (runtime)
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
🔌 Creating Custom Adapters
Want to create your own persistence adapter? See src/adapters/README.md for:
- Adapter interface documentation
- PostgreSQL adapter example
- MongoDB adapter example
- Redis adapter example
- Best practices and testing guidelines
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup
```bash
# Clone the repository
git clone https://github.com/Arakiss/llm-metrics.git
cd llm-metrics

# Install dependencies
bun install

# Run tests
bun test

# Build
bun run build
```

📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built for the LLM and AI agent ecosystem
- Inspired by the need for better observability in agentic systems
- Designed with performance and developer experience in mind
🔗 Links
- npm: https://www.npmjs.com/package/llm-metrics
- GitHub: https://github.com/Arakiss/llm-metrics
- Issues: https://github.com/Arakiss/llm-metrics/issues
- Releases: https://github.com/Arakiss/llm-metrics/releases
- Changelog: https://github.com/Arakiss/llm-metrics/blob/main/CHANGELOG.md
- Contributing: https://github.com/Arakiss/llm-metrics/blob/main/CONTRIBUTING.md
- Security: https://github.com/Arakiss/llm-metrics/blob/main/SECURITY.md
Made with ❤️ for the LLM community
