llm-metrics
Metrics collection system for LLMs and AI agents
Track performance, latency, and usage metrics for agents, tools, and LLM requests. Perfect for monitoring LLM applications, AI agents, and agentic systems.
Installation • Quick Start • Documentation • Contributing
A professional, framework-agnostic metrics collection system designed specifically for LLM applications and AI agents. Built with TypeScript, featuring type-safe APIs, comprehensive validation, and flexible persistence backends.
✨ Features
- 🚀 Framework-agnostic - Works with any JavaScript/TypeScript project (Next.js, Express, Hono, etc.)
- 📊 Multiple metric types - Track agents, tools, latency, and request timing
- 💾 Flexible persistence - In-memory by default, pluggable persistence backends (PostgreSQL, MongoDB, Redis)
- ✅ Type-safe - Full TypeScript support with strict types and IntelliSense
- 🔍 Validation - Built-in metric validation (configurable, prevents invalid data)
- 📈 Aggregations - Built-in summary statistics, percentiles, and histograms
- 🎨 Formatting - Human-readable metric formatting utilities for logging
- 🔌 Extensible - Custom persistence backends, loggers, and event hooks
- ⚡ Zero dependencies - No runtime dependencies, lightweight and fast
- 🔎 Query API - Flexible filtering by context, time range, duration, metadata, etc.
- 📦 Batch operations - Efficient batch recording for migrations and imports
- 📊 Derived metrics - Rate calculations, error rates, and trend analysis
- 🧪 Well tested - Comprehensive test suite (154 tests, 290+ assertions)
- 📦 ESM-only - Modern JavaScript, no CommonJS legacy code
📦 Installation
Install llm-metrics from npm:
```bash
npm install llm-metrics
```

Or using your preferred package manager:

```bash
# Bun
bun add llm-metrics

# Yarn
yarn add llm-metrics

# pnpm
pnpm add llm-metrics
```

Requirements
- Node.js >= 22.0.0 (LTS) or Bun >= 1.3.0
- ESM-only - This package uses ES Modules only (no CommonJS support)
- TypeScript 5.6+ (recommended for type safety)
🚀 Quick Start
Get started with llm-metrics in under 2 minutes:
```typescript
import { metricsCollector, measureAgent, measureTool } from 'llm-metrics';

// Measure an agent execution (e.g., LLM agent, AI assistant)
const result = await measureAgent(
  'memory-manager',     // Agent identifier
  'conversation-123',   // Context ID (conversation, session, etc.)
  async () => {
    // Your agent code here
    const facts = await extractFacts();
    return { facts, count: facts.length };
  }
);

// Measure a tool execution (e.g., database query, API call)
const toolResult = await measureTool(
  'search-database',    // Tool name
  'conversation-123',   // Context ID
  async () => {
    // Your tool code here
    return await db.query('SELECT * FROM users');
  }
);

// Get summary statistics
const summary = metricsCollector.getSummary(3600000); // Last hour
console.log(`Agents executed: ${summary.totalAgentsExecuted}`);
console.log(`Average duration: ${summary.averageAgentDuration}ms`);
console.log(`Tools called: ${summary.totalToolsCalled}`);
```

📚 Core Concepts
Metrics Types
llm-metrics supports four types of metrics optimized for LLM and AI agent workflows:
🤖 Agent Metrics - Track execution of AI agents, LLM calls, or long-running processes
- Duration, success/failure, custom metadata
- Perfect for monitoring agent performance and reliability
🔧 Tool Metrics - Track individual tool/function calls (function calling, RAG queries, etc.)
- Success rate, execution time, error tracking
- Essential for debugging tool usage in agentic systems
⏱️ Latency Metrics - Track specific operations or bottlenecks
- Embedding generation, vector search, cache lookups
- Identify performance bottlenecks in your LLM pipeline
📡 Request Timing Metrics - Track client vs server timing for requests
- Client-side latency, server processing time, streaming duration
- Understand end-to-end user experience
Storage Architecture
In-memory - Fast access, limited by `maxMetrics` (default: 1000)
- Perfect for real-time monitoring and debugging
- Automatically rotates oldest metrics when the limit is reached

Persistence - Optional backend for long-term storage
- PostgreSQL, MongoDB, Redis, or any custom backend
- Implement the `MetricsPersistence` interface for your database
💡 Use Cases
Perfect for:
- LLM Applications - Monitor GPT-4, Claude, Gemini API calls
- AI Agents - Track agent execution, tool usage, and performance
- RAG Systems - Measure vector search, embedding generation latency
- Agentic Workflows - Monitor multi-step agent operations
- Production Monitoring - Track metrics in production LLM applications
📖 Usage Examples
Basic Agent Tracking
```typescript
import { measureAgent } from 'llm-metrics';

const result = await measureAgent(
  'data-processor',
  'session-123',
  async () => {
    // Process data
    const processed = await processData();
    return processed;
  }
);
```

Agent Tracking with Custom Metadata
```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'memory-manager',
  'conversation-456',
  async () => {
    const facts = await extractFacts();
    return { facts, count: facts.length };
  },
  (result) => ({
    factsExtracted: result.count,
    summaryLength: result.summary?.length || 0,
  })
);
```

Tool Tracking
```typescript
import { measureTool } from 'llm-metrics';

const result = await measureTool(
  'database-query',
  'request-789',
  async () => {
    return await db.query('SELECT * FROM users');
  }
);
```

Manual Metric Recording
```typescript
import { metricsCollector } from 'llm-metrics';

// Record agent metrics manually
metricsCollector.recordAgent({
  agentId: 'custom-agent',
  contextId: 'context-123',
  startTime: Date.now() - 5000,
  endTime: Date.now(),
  duration: 5000,
  metadata: {
    customField: 'value',
    itemsProcessed: 42,
  },
});

// Record latency metrics
metricsCollector.recordLatency({
  operation: 'cache-lookup',
  startTime: Date.now() - 100,
  endTime: Date.now(),
  duration: 100,
  metadata: {
    cacheHit: true,
  },
});
```

Request Timing (Client vs Server)
```typescript
import { metricsCollector } from 'llm-metrics';

metricsCollector.recordRequestTiming({
  contextId: 'request-123',
  serverTimeToFirstChunk: 500,
  serverStreamDuration: 2000,
  serverTotalDuration: 2500,
  clientTimeToFirstChunk: 800,           // From Performance API
  clientRequestStart: performance.now(),
  networkLatencyEstimate: 300,           // client - server difference
  metadata: {
    model: 'gpt-4',
    messageCount: 5,
  },
});
```

⚙️ Configuration
Customize llm-metrics to fit your needs:
Custom Persistence Backend
```typescript
import { MetricsPersistence, metricsCollector } from 'llm-metrics';
import type { AgentMetrics, ToolMetrics, LatencyMetrics, RequestTimingMetrics } from 'llm-metrics';

class MyDatabasePersistence implements MetricsPersistence {
  async persistAgentMetrics(metrics: AgentMetrics): Promise<void> {
    // Save to your database
    await db.insert('agent_metrics', metrics);
  }

  async persistToolMetrics(metrics: ToolMetrics): Promise<void> {
    await db.insert('tool_metrics', metrics);
  }

  async persistLatencyMetrics(metrics: LatencyMetrics): Promise<void> {
    await db.insert('latency_metrics', metrics);
  }

  async persistRequestTimingMetrics(metrics: RequestTimingMetrics): Promise<void> {
    await db.insert('request_timing_metrics', metrics);
  }

  async getAgentMetrics(timeRangeMs?: number, contextId?: string): Promise<AgentMetrics[]> {
    // Retrieve from database
    return await db.query('SELECT * FROM agent_metrics WHERE ...');
  }

  // ... implement other get methods
}

// Configure persistence
metricsCollector.setPersistence(new MyDatabasePersistence());
```

Custom Logger
```typescript
import { MetricsLogger, metricsCollector } from 'llm-metrics';

class MyLogger implements MetricsLogger {
  info(message: string, data?: Record<string, unknown>): void {
    console.log(`[INFO] ${message}`, data);
  }

  debug(message: string, data?: Record<string, unknown>): void {
    console.debug(`[DEBUG] ${message}`, data);
  }

  warn(message: string, data?: Record<string, unknown>): void {
    console.warn(`[WARN] ${message}`, data);
  }

  error(message: string, data?: Record<string, unknown>): void {
    console.error(`[ERROR] ${message}`, data);
  }
}

metricsCollector.setLogger(new MyLogger());
```

Collector Configuration
```typescript
import { MetricsCollector, MetricsCollectorConfig, metricsCollector } from 'llm-metrics';

const config: MetricsCollectorConfig = {
  maxMetrics: 5000,               // Keep more metrics in memory
  validateMetrics: true,          // Enable validation (default)
  throwOnValidationError: false,  // Don't throw, just log (default)
};

const customCollector = new MetricsCollector(undefined, undefined, config);

// Or configure the existing collector
metricsCollector.configure({
  maxMetrics: 2000,
});
```

API Reference
MetricsCollector
Methods
- `recordAgent(metrics: AgentMetrics): void` - Record agent metrics
- `recordTool(metrics: ToolMetrics): void` - Record tool metrics
- `recordLatency(metrics: LatencyMetrics): void` - Record latency metrics
- `recordRequestTiming(metrics: RequestTimingMetrics): void` - Record request timing
- `getSnapshot(): MetricsSnapshot` - Get all current metrics
- `getSummary(timeRangeMs?: number): MetricsSummary` - Get aggregated statistics
- `getContextMetrics(contextId: string): Promise<...>` - Get metrics for a context
- `clear(): void` - Clear all metrics
- `setPersistence(persistence: MetricsPersistence): void` - Configure persistence
- `setLogger(logger: MetricsLogger): void` - Configure logger
- `configure(config: Partial<MetricsCollectorConfig>): void` - Update configuration
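For instance, a quick way to inspect and reset the in-memory store (a minimal sketch using the methods listed above; the snapshot fields match those used in the derived-metrics example below):

```typescript
import { metricsCollector } from 'llm-metrics';

// Inspect everything currently held in memory
const snapshot = metricsCollector.getSnapshot();
console.log(`Agents in memory: ${snapshot.agents.length}`);
console.log(`Tools in memory: ${snapshot.tools.length}`);

// Reset in-memory metrics (e.g., between test runs)
metricsCollector.clear();
```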
Helper Functions
- `measureAgent<T>(agentId, contextId?, execute): Promise<T>` - Measure agent execution
- `measureAgentWithMetrics<T>(agentId, contextId, execute, extractMetadata): Promise<T>` - Measure with metadata extraction
- `measureTool<T>(toolName, contextId, execute): Promise<T>` - Measure tool execution
- `measureToolWithMetadata<T>(toolName, contextId, execute, extractMetadata): Promise<T>` - Measure tool with metadata
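`measureToolWithMetadata` has no dedicated example elsewhere in this README; here is a minimal sketch following the signature above (the `db` call and the `rowCount` field are illustrative placeholders):

```typescript
import { measureToolWithMetadata } from 'llm-metrics';

const rows = await measureToolWithMetadata(
  'database-query',   // Tool name
  'request-789',      // Context ID
  async () => {
    return await db.query('SELECT * FROM users');
  },
  (result) => ({
    rowCount: result.length, // Illustrative metadata extracted from the tool result
  })
);
```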
Formatting Utilities
- `formatDuration(ms: number): string` - Format duration (e.g., "1.5s", "2m 5s")
- `formatDurationDetailed(ms: number): string` - Detailed duration format
- `formatAgentMetrics(metrics: AgentMetrics): string` - Human-readable agent metrics
- `formatToolMetrics(metrics: ToolMetrics): string` - Human-readable tool metrics
- `formatLatencyMetrics(metrics: LatencyMetrics): string` - Human-readable latency metrics
- `formatMetricsSummary(summary: MetricsSummary): string` - Human-readable summary
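For example, a minimal logging sketch using the formatters (the output strings are illustrative; exact formatting comes from the library):

```typescript
import { metricsCollector, formatDuration, formatMetricsSummary } from 'llm-metrics';

console.log(formatDuration(125000)); // e.g. "2m 5s"

const summary = metricsCollector.getSummary(3600000); // Last hour
console.log(formatMetricsSummary(summary));           // Human-readable summary for logs
```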
Validation
- `validateAgentMetrics(metrics: AgentMetrics): ValidationResult` - Validate agent metrics
- `validateToolMetrics(metrics: ToolMetrics): ValidationResult` - Validate tool metrics
- `validateLatencyMetrics(metrics: LatencyMetrics): ValidationResult` - Validate latency metrics
- `validateRequestTimingMetrics(metrics: RequestTimingMetrics): ValidationResult` - Validate request timing
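A sketch of validating a metric before recording it manually. The exact shape of `ValidationResult` is not documented in this README, so the `valid` and `errors` fields below are assumptions — check the exported type in your editor:

```typescript
import { validateAgentMetrics, metricsCollector } from 'llm-metrics';
import type { AgentMetrics } from 'llm-metrics';

const metrics: AgentMetrics = {
  agentId: 'custom-agent',
  startTime: Date.now() - 1000,
  endTime: Date.now(),
  duration: 1000,
};

const result = validateAgentMetrics(metrics);
if (result.valid) {            // Assumed field name
  metricsCollector.recordAgent(metrics);
} else {
  console.warn('Invalid metrics:', result.errors); // Assumed field name
}
```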
Types
AgentMetrics
```typescript
interface AgentMetrics {
  agentId: string;
  contextId?: string;                 // Generic context ID (conversationId, sessionId, requestId, etc.)
  startTime: number;                  // Timestamp in milliseconds
  endTime?: number;                   // Timestamp in milliseconds
  duration?: number;                  // Duration in milliseconds
  metadata?: Record<string, unknown>; // Custom metadata
  error?: string;                     // Error message if failed
}
```

ToolMetrics
```typescript
interface ToolMetrics {
  toolName: string;
  contextId?: string;
  startTime: number;
  endTime?: number;
  duration?: number;
  success: boolean;
  error?: string;
  metadata?: Record<string, unknown>;
}
```

LatencyMetrics
```typescript
interface LatencyMetrics {
  operation: string;
  startTime: number;
  endTime: number;
  duration: number;
  metadata?: Record<string, unknown>;
}
```

RequestTimingMetrics
```typescript
interface RequestTimingMetrics {
  contextId?: string;
  serverTimeToFirstChunk: number;  // milliseconds
  serverStreamDuration: number;    // milliseconds
  serverTotalDuration: number;     // milliseconds
  clientTimeToFirstChunk?: number; // milliseconds (from Performance API)
  clientRequestStart?: number;     // performance.now() timestamp
  networkLatencyEstimate?: number; // milliseconds
  metadata?: Record<string, unknown>;
}
```

Examples
See the examples/ directory for complete, runnable examples:
- Next.js API Route - Integration with Next.js API routes
- Express Middleware - Express middleware for automatic request tracking
- AI SDK Integration - Integration with Vercel AI SDK
- Export Metrics - Export metrics to JSON and CSV
- Aggregations - Advanced aggregations and histograms
- Event Hooks - Event hooks for integrations and alerting
Advanced Usage
Event Hooks
Use event hooks to integrate with external systems, dashboards, or alerting:
```typescript
import { metricsCollector, MetricsCollector } from 'llm-metrics';

// Set up callbacks
metricsCollector.setCallbacks({
  onAgentRecorded: (metrics) => {
    // Send to monitoring service, update dashboard, etc.
    console.log('Agent executed:', metrics.agentId, metrics.duration);
  },
  onToolRecorded: (metrics) => {
    // Track tool usage, alert on failures, etc.
    if (!metrics.success) {
      console.error('Tool failed:', metrics.toolName);
    }
  },
});

// Or configure during construction
const collector = new MetricsCollector(persistence, logger, {
  callbacks: {
    onAgentRecorded: (metrics) => { /* ... */ },
    onToolRecorded: (metrics) => { /* ... */ },
  },
});
```

See examples/event-hooks.ts for complete examples.
Query and Filter API
Query metrics with flexible filter criteria:
```typescript
import { metricsCollector } from 'llm-metrics';

// Filter by multiple context IDs
const metrics = metricsCollector.queryMetrics({
  contextIds: ['session-123', 'session-456'],
});

// Filter by agent IDs
const agentMetrics = metricsCollector.queryMetrics({
  agentIds: ['data-processor'],
});

// Filter by time range
const recentMetrics = metricsCollector.queryMetrics({
  startTime: Date.now() - 3600000, // Last hour
  endTime: Date.now(),
});

// Filter by duration range
const slowMetrics = metricsCollector.queryMetrics({
  minDuration: 5000, // Slower than 5 seconds
});

// Filter by metadata
const dataMetrics = metricsCollector.queryMetrics({
  metadata: { category: 'data' },
});

// Combine multiple filters
const complexFilter = metricsCollector.queryMetrics({
  contextIds: ['session-123'],
  minDuration: 1000,
  maxDuration: 5000,
  metadata: { category: 'data' },
});
```

See examples/query-filter.ts for complete examples.
Batch Operations
Record multiple metrics efficiently in batch:
```typescript
import { metricsCollector } from 'llm-metrics';

// Record multiple agents in batch
metricsCollector.recordAgents([
  { agentId: 'agent-1', startTime: Date.now(), /* ... */ },
  { agentId: 'agent-2', startTime: Date.now(), /* ... */ },
]);

// Record multiple tools in batch
metricsCollector.recordTools([
  { toolName: 'tool-1', startTime: Date.now(), success: true, /* ... */ },
  { toolName: 'tool-2', startTime: Date.now(), success: false, /* ... */ },
]);

// Record multiple latency metrics in batch
metricsCollector.recordLatencies([
  { operation: 'op-1', startTime: Date.now() - 100, endTime: Date.now(), duration: 100 },
  { operation: 'op-2', startTime: Date.now() - 50, endTime: Date.now(), duration: 50 },
]);

// Record multiple request timings in batch
metricsCollector.recordRequestTimings([
  { contextId: 'req-1', serverTimeToFirstChunk: 500, serverStreamDuration: 2000, serverTotalDuration: 2500 },
  { contextId: 'req-2', serverTimeToFirstChunk: 300, serverStreamDuration: 1000, serverTotalDuration: 1300 },
]);
```

Batch operations are useful for:
- Migrating metrics from another system
- Importing historical data
- Bulk operations that are more efficient than individual `record*()` calls
See examples/batch-operations.ts for complete examples.
Derived Metrics
Calculate simple derived metrics like rates and trends:
```typescript
import { metricsCollector, calculateAgentDerivedMetrics, calculateToolDerivedMetrics, calculateTrend } from 'llm-metrics';

const snapshot = metricsCollector.getSnapshot();

// Calculate agent derived metrics
const agentDerived = calculateAgentDerivedMetrics(snapshot.agents, 3600000); // Last hour
console.log(`Error Rate: ${agentDerived.errorRate}%`);
console.log(`Requests/Second: ${agentDerived.requestsPerSecond}`);

// Calculate tool derived metrics
const toolDerived = calculateToolDerivedMetrics(snapshot.tools, 3600000);
console.log(`Success Rate: ${toolDerived.successRate}%`);

// Calculate trends (compare a current rate against a previous period's rate)
const currentRate = agentDerived.requestsPerSecond;
const previousRate = 0.5; // e.g. the rate computed for the previous hour
const trend = calculateTrend(currentRate, previousRate);
console.log(`Change: ${trend.changePercent}%`);
```

Available derived metrics:
- Rates: Requests per second, operations per second
- Error Rates: Error percentage, success percentage
- Trends: Change between time periods, percentage change
See examples/derived-metrics.ts for complete examples.
Custom Metadata Extraction
```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'data-processor',
  'batch-123',
  async () => {
    const data = await processBatch();
    return {
      items: data.items,
      errors: data.errors,
      stats: data.stats,
    };
  },
  (result) => ({
    itemsProcessed: result.items.length,
    errorCount: result.errors.length,
    averageScore: result.stats.averageScore,
    customMetric: result.stats.customValue,
  })
);
```

Time-Range Filtering
```typescript
import { metricsCollector } from 'llm-metrics';

// Last hour
const lastHour = metricsCollector.getSummary(3600000);

// Last 24 hours
const lastDay = metricsCollector.getSummary(86400000);

// All time
const allTime = metricsCollector.getSummary();
```

Context-Based Queries
```typescript
import { metricsCollector } from 'llm-metrics';

// Get all metrics for a specific context (conversation, session, etc.)
const contextMetrics = await metricsCollector.getContextMetrics('conversation-123');
console.log(`Agents: ${contextMetrics.agents.length}`);
console.log(`Tools: ${contextMetrics.tools.length}`);
console.log(`Latency operations: ${contextMetrics.latency.length}`);
```

Best Practices
- Use context IDs - Always provide `contextId` to track metrics across operations
- Extract meaningful metadata - Use metadata to store domain-specific information
- Configure persistence - For production, use a persistence backend
- Enable validation - Keep validation enabled to catch errors early
- Monitor memory usage - Adjust `maxMetrics` based on your needs
- Use helper functions - Prefer `measureAgent`/`measureTool` over manual recording
Performance Considerations
- In-memory storage - Fast but limited by `maxMetrics` (default: 1000)
- Persistence is async - Persistence operations don't block metric recording
- Validation overhead - Can be disabled for maximum performance if needed (see the sketch below)
- FIFO eviction - Oldest metrics are removed when the limit is reached
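For example, a minimal sketch of trading validation for throughput, using the `configure()` option shown in Collector Configuration above:

```typescript
import { metricsCollector } from 'llm-metrics';

// Skip per-metric validation on hot paths where inputs are already trusted
metricsCollector.configure({
  validateMetrics: false,
});
```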
🔨 Building with Bun
This package is fully compatible with Bun and can be bundled directly:
```bash
# Bundle with Bun
bun build ./src/index.ts --outdir ./dist --target bun

# Or use Bun's bundler in your project
bun build node_modules/llm-metrics/dist/index.js --outdir ./bundled
```

🛠️ Technical Details
Modern JavaScript Only
This package uses ES Modules (ESM) only:
- ✅ ES2022+ syntax
- ✅ Native ESM imports/exports
- ✅ Compatible with Bun 1.3+, Node.js 22+ (LTS), Deno
- ❌ No CommonJS support
- ❌ No legacy browser support
Requirements:
- Node.js >= 22.0.0 (LTS)
- Bun >= 1.3.0
📊 Project Status
- ✅ v0.7.0 - Latest release
- ✅ 154 tests passing (290+ assertions)
- ✅ ~95% code coverage (comprehensive edge case coverage)
- ✅ 100% TypeScript type coverage
- ✅ ESM-only (modern JavaScript)
- ✅ Zero dependencies (runtime)
🤝 Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
🔌 Creating Custom Adapters
Want to create your own persistence adapter? See src/adapters/README.md for:
- Adapter interface documentation
- PostgreSQL adapter example
- MongoDB adapter example
- Redis adapter example
- Best practices and testing guidelines
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup
```bash
# Clone the repository
git clone https://github.com/Arakiss/llm-metrics.git
cd llm-metrics

# Install dependencies
bun install

# Run tests
bun test

# Build
bun run build
```

📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built for the LLM and AI agent ecosystem
- Inspired by the need for better observability in agentic systems
- Designed with performance and developer experience in mind
🔗 Links
- npm: https://www.npmjs.com/package/llm-metrics
- GitHub: https://github.com/Arakiss/llm-metrics
- Issues: https://github.com/Arakiss/llm-metrics/issues
- Releases: https://github.com/Arakiss/llm-metrics/releases
- Changelog: https://github.com/Arakiss/llm-metrics/blob/main/CHANGELOG.md
- Contributing: https://github.com/Arakiss/llm-metrics/blob/main/CONTRIBUTING.md
- Security: https://github.com/Arakiss/llm-metrics/blob/main/SECURITY.md
Made with ❤️ for the LLM community
