@reaatech/agent-eval-harness-latency
v0.1.0
Latency monitoring, SLA enforcement, and optimization analysis for agent-eval-harness
Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.
Turn-level and trajectory-level latency monitoring with SLA enforcement and optimization analysis. Computes P50/P90/P99 percentiles, detects anomalies, and provides actionable bottleneck recommendations for AI agent latency budgets.
Installation
npm install @reaatech/agent-eval-harness-latency
Feature Overview
- Percentile computation: P50, P90, P99 latency metrics computed per turn and aggregated across the full trajectory
- Component breakdown: separates LLM call latency from tool invocation latency and system overhead for targeted optimization
- SLA enforcement: configurable per-turn and per-trajectory latency thresholds with severity-graded violation detection and early-warning signals
- Three latency presets: strict (P50: 500ms, P90: 1000ms, P99: 2000ms), moderate (P50: 1000ms, P90: 2000ms, P99: 5000ms), lenient (P50: 2000ms, P90: 4000ms, P99: 10000ms)
- Anomaly detection: identifies outlier turns whose latency exceeds a configurable multiplier of the average, with a minimum 1000ms floor
- Optimization analysis: ranked bottleneck identification (LLM call, tool invocation, overhead, total) with priority-ordered recommendations covering model selection, batching, streaming, caching, prompt shortening, and turn reduction
- Latency trend tracking: LatencyTracker class records history and computes improvement trends across evaluation runs
Quick Start
import { monitorLatency, enforceBudget, createLatencyBudget } from '@reaatech/agent-eval-harness-latency';
import type { Trajectory } from '@reaatech/agent-eval-harness-types';
// Assume trajectory loaded from JSONL
const result = monitorLatency(trajectory);
console.log(`P50: ${result.p50Ms}ms, P99: ${result.p99Ms}ms, Total: ${result.totalLatencyMs}ms`);
const budget = createLatencyBudget('moderate');
const enforcement = enforceBudget(result, budget);
console.log(`Within SLA: ${enforcement.passed}, Violations: ${enforcement.violations.length}`);
API Reference
Monitoring Functions
| Export | Signature | Description |
|--------|-----------|-------------|
| monitorLatency | (trajectory: Trajectory) => LatencyResult | Extracts per-turn latency from agent turns, computes P50/P90/P99 percentiles, total, average, min, and max latency |
| getComponentBreakdown | (result: LatencyResult) => ComponentBreakdown | Breaks down latency into average and total LLM call, tool invocation, and overhead components |
| compareLatency | (baseline: LatencyResult, candidate: LatencyResult) => { avgDiffMs, p99DiffMs, faster, percentageChange } | Compares two latency results and returns differences with directional indication |
| detectAnomalies | (result: LatencyResult, thresholdMultiplier?: number) => TurnLatency[] | Returns turns where latency exceeds avgLatencyMs * thresholdMultiplier (default 2x) and is above 1000ms |
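The anomaly rule described for detectAnomalies can be sketched in a few lines. This is a standalone illustration, not the package's source: the TurnLatencySketch shape and its latencyMs field are assumptions made for the example, and the strict "greater than" comparisons are a guess at how the two conditions are combined.

```typescript
// Hypothetical stand-in for the package's TurnLatency type; only the fields used here.
interface TurnLatencySketch {
  turnId: number;
  latencyMs: number;
}

// Floor taken from the table above: turns at or below 1000ms are never flagged.
const ANOMALY_FLOOR_MS = 1000;

// Flags turns that exceed both (average * multiplier) and the 1000ms floor.
function detectAnomaliesSketch(
  turns: TurnLatencySketch[],
  thresholdMultiplier = 2,
): TurnLatencySketch[] {
  if (turns.length === 0) return [];
  const avg = turns.reduce((sum, t) => sum + t.latencyMs, 0) / turns.length;
  return turns.filter(
    (t) => t.latencyMs > avg * thresholdMultiplier && t.latencyMs > ANOMALY_FLOOR_MS,
  );
}

const sampleTurns: TurnLatencySketch[] = [
  { turnId: 1, latencyMs: 400 },
  { turnId: 2, latencyMs: 500 },
  { turnId: 3, latencyMs: 600 },
  { turnId: 4, latencyMs: 4500 },
];
// Average is 1500ms, so the cutoff is 3000ms; only turn 4 is above it.
const anomalies = detectAnomaliesSketch(sampleTurns);

// All turns are fast: 900ms exceeds 2x the 300ms average but stays under the floor.
const fastTurns: TurnLatencySketch[] = [
  { turnId: 1, latencyMs: 100 },
  { turnId: 2, latencyMs: 100 },
  { turnId: 3, latencyMs: 100 },
  { turnId: 4, latencyMs: 900 },
];
const fastAnomalies = detectAnomaliesSketch(fastTurns);
```

The floor matters for short trajectories: without it, a 900ms turn in an otherwise very fast run would be reported as an outlier even though it is well inside every preset.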
Budget Enforcement Functions
| Export | Signature | Description |
|--------|-----------|-------------|
| enforceBudget | (result: LatencyResult, budget: LatencyBudget) => BudgetEnforcementResult | Validates latency result against budget thresholds, returns violations, warnings, and a composite score (0–1) |
| createLatencyBudget | (preset: 'strict' \| 'moderate' \| 'lenient') => LatencyBudget | Returns a pre-configured budget with P50/P90/P99, max turn, trajectory total, and component thresholds |
| formatLatency | (ms: number) => string | Formats milliseconds into human-readable strings: ms, s, or m |
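To illustrate the kind of output formatLatency is described as producing, here is a minimal sketch. The unit boundaries (1000ms and 60000ms) and the number of decimal places are assumptions for this example; the package's actual formatting may differ.

```typescript
// Hypothetical formatter: picks ms, s, or m depending on magnitude.
// Boundaries and rounding are assumed, not taken from the package.
function formatLatencySketch(ms: number): string {
  if (ms < 1000) return `${Math.round(ms)}ms`;
  if (ms < 60000) return `${(ms / 1000).toFixed(2)}s`;
  return `${(ms / 60000).toFixed(1)}m`;
}

const formattedShort = formatLatencySketch(450);
const formattedSeconds = formatLatencySketch(1500);
const formattedMinutes = formatLatencySketch(90000);
```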
Optimization Functions
| Export | Signature | Description |
|--------|-----------|-------------|
| analyzeOptimization | (result: LatencyResult, trajectory?: Trajectory) => OptimizationResult | Identifies bottlenecks, generates ranked recommendations with estimated improvement, and computes an optimization score |
| LatencyTracker | class | Maintains latency history and exposes trends (getTrend()), average scores (getAverageScore()), and the recorded history (getHistory()) |
Types
LatencyBudget
| Field | Type | Description |
|-------|------|-------------|
| p50 | number? | Maximum allowed P50 latency in ms |
| p90 | number? | Maximum allowed P90 latency in ms |
| p99 | number? | Maximum allowed P99 latency in ms |
| maxTurn | number? | Maximum allowed per-turn latency in ms |
| total | number? | Maximum allowed total trajectory latency in ms |
| components | ComponentBudget? | Per-component budget thresholds |
LatencyResult
| Field | Type | Description |
|-------|------|-------------|
| turns | TurnLatency[] | Per-turn latency breakdown |
| totalLatencyMs | number | Sum of all agent turn latencies |
| avgLatencyMs | number | Mean latency across agent turns |
| p50Ms | number | 50th percentile |
| p90Ms | number | 90th percentile |
| p99Ms | number | 99th percentile |
| maxLatencyMs | number | Maximum single-turn latency |
| minLatencyMs | number | Minimum single-turn latency |
| turnCount | number | Number of agent turns evaluated |
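The summary fields above can be derived from a plain list of per-turn latencies. The sketch below uses the nearest-rank percentile method, which is an assumption: this README does not say which percentile definition or interpolation the package uses, so values from monitorLatency may differ slightly on small samples. The function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
function percentileNearestRank(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Builds an object with the same field names as LatencyResult (minus `turns`).
function summarizeLatencies(latenciesMs: number[]) {
  const total = latenciesMs.reduce((sum, v) => sum + v, 0);
  const count = latenciesMs.length;
  return {
    totalLatencyMs: total,
    avgLatencyMs: count === 0 ? 0 : total / count,
    p50Ms: percentileNearestRank(latenciesMs, 50),
    p90Ms: percentileNearestRank(latenciesMs, 90),
    p99Ms: percentileNearestRank(latenciesMs, 99),
    maxLatencyMs: count === 0 ? 0 : Math.max(...latenciesMs),
    minLatencyMs: count === 0 ? 0 : Math.min(...latenciesMs),
    turnCount: count,
  };
}

const latencySummary = summarizeLatencies([
  120, 300, 450, 800, 950, 1100, 1500, 2000, 2600, 4000,
]);
```

With only ten turns, P99 under nearest-rank is simply the slowest turn, which is worth remembering when reading P99 values for short trajectories.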
LatencyViolation
| Field | Type | Description |
|-------|------|-------------|
| type | ViolationType | Category (p50_exceeded, p90_exceeded, p99_exceeded, max_turn_exceeded, total_exceeded, llm_call_exceeded, tool_invocation_exceeded, overhead_exceeded) |
| severity | 'low' \| 'medium' \| 'high' \| 'critical' | Impact level of the violation |
| description | string | Human-readable violation description |
| actual | number | Measured value in ms |
| threshold | number | Budget threshold in ms |
| turnId | number? | Affected turn (for max_turn_exceeded violations) |
ComponentBreakdown
| Field | Type | Description |
|-------|------|-------------|
| avgLlmCallMs | number | Average LLM call latency across turns |
| avgToolInvocationMs | number | Average tool invocation latency across turns |
| avgOverheadMs | number | Average system overhead across turns |
| totalLlmCallMs | number | Sum of all LLM call latencies |
| totalToolInvocationMs | number | Sum of all tool invocation latencies |
| totalOverheadMs | number | Sum of all overhead latencies |
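As a rough illustration of how these six fields relate, the sketch below aggregates per-turn component timings into totals and averages. The per-turn field names (llmCallMs, toolInvocationMs, overheadMs) are assumptions for the example; the real TurnLatency shape is not documented in this README.

```typescript
// Hypothetical per-turn component timings.
interface TurnComponentsSketch {
  llmCallMs: number;
  toolInvocationMs: number;
  overheadMs: number;
}

// Mirrors the ComponentBreakdown fields listed above.
interface ComponentBreakdownSketch {
  avgLlmCallMs: number;
  avgToolInvocationMs: number;
  avgOverheadMs: number;
  totalLlmCallMs: number;
  totalToolInvocationMs: number;
  totalOverheadMs: number;
}

function breakdownSketch(turns: TurnComponentsSketch[]): ComponentBreakdownSketch {
  const totalLlmCallMs = turns.reduce((s, t) => s + t.llmCallMs, 0);
  const totalToolInvocationMs = turns.reduce((s, t) => s + t.toolInvocationMs, 0);
  const totalOverheadMs = turns.reduce((s, t) => s + t.overheadMs, 0);
  const n = turns.length || 1; // avoid dividing by zero on an empty trajectory
  return {
    avgLlmCallMs: totalLlmCallMs / n,
    avgToolInvocationMs: totalToolInvocationMs / n,
    avgOverheadMs: totalOverheadMs / n,
    totalLlmCallMs,
    totalToolInvocationMs,
    totalOverheadMs,
  };
}

const componentBreakdown = breakdownSketch([
  { llmCallMs: 800, toolInvocationMs: 150, overheadMs: 50 },
  { llmCallMs: 1200, toolInvocationMs: 0, overheadMs: 40 },
  { llmCallMs: 400, toolInvocationMs: 450, overheadMs: 60 },
]);
```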
Latency Presets
| Preset | P50 | P90 | P99 | Max Turn | Trajectory Total |
|--------|-----|-----|-----|----------|-------------------|
| strict | 500ms | 1000ms | 2000ms | 3000ms | 15000ms |
| moderate | 1000ms | 2000ms | 5000ms | 8000ms | 30000ms |
| lenient | 2000ms | 4000ms | 10000ms | 15000ms | 60000ms |
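To make the preset table concrete, the sketch below encodes it as a typed constant and checks a latency summary against one preset. The threshold values come from the table above and the violation names from the LatencyViolation type list; everything else (the function name, the strict "greater than" comparison, the input shape) is an assumption and not how enforceBudget is necessarily implemented.

```typescript
type PresetName = 'strict' | 'moderate' | 'lenient';

interface PresetThresholds {
  p50: number;
  p90: number;
  p99: number;
  maxTurn: number;
  total: number;
}

// Values copied from the Latency Presets table (all in ms).
const PRESET_TABLE: Record<PresetName, PresetThresholds> = {
  strict: { p50: 500, p90: 1000, p99: 2000, maxTurn: 3000, total: 15000 },
  moderate: { p50: 1000, p90: 2000, p99: 5000, maxTurn: 8000, total: 30000 },
  lenient: { p50: 2000, p90: 4000, p99: 10000, maxTurn: 15000, total: 60000 },
};

interface LatencySummarySketch {
  p50Ms: number;
  p90Ms: number;
  p99Ms: number;
  maxLatencyMs: number;
  totalLatencyMs: number;
}

// Returns the violation types whose measured value is above the preset threshold.
function exceededThresholds(summary: LatencySummarySketch, preset: PresetName): string[] {
  const t = PRESET_TABLE[preset];
  const checks: Array<[string, number, number]> = [
    ['p50_exceeded', summary.p50Ms, t.p50],
    ['p90_exceeded', summary.p90Ms, t.p90],
    ['p99_exceeded', summary.p99Ms, t.p99],
    ['max_turn_exceeded', summary.maxLatencyMs, t.maxTurn],
    ['total_exceeded', summary.totalLatencyMs, t.total],
  ];
  return checks.filter(([, actual, limit]) => actual > limit).map(([type]) => type);
}

const measured: LatencySummarySketch = {
  p50Ms: 950,
  p90Ms: 2600,
  p99Ms: 4000,
  maxLatencyMs: 4000,
  totalLatencyMs: 13820,
};
const strictViolations = exceededThresholds(measured, 'strict');
const moderateViolations = exceededThresholds(measured, 'moderate');
const lenientViolations = exceededThresholds(measured, 'lenient');
```

The same measurements fail four of five strict thresholds, one moderate threshold (P90), and none of the lenient ones, which shows how much headroom each step between presets adds.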
Advanced: Component-Level SLA Enforcement
Each preset also includes per-component budgets. Pass a custom LatencyBudget with a components field to enforce LLM call, tool invocation, and overhead thresholds independently:
import { monitorLatency, enforceBudget, createLatencyBudget } from '@reaatech/agent-eval-harness-latency';
// `trajectory` is a Trajectory loaded as in Quick Start
const budget = createLatencyBudget('strict');
// budget.components = { llmCall: 400, toolInvocation: 100, overhead: 50 }
const result = monitorLatency(trajectory);
const enforcement = enforceBudget(result, budget);
for (const v of enforcement.violations) {
  console.log(`[${v.severity.toUpperCase()}] ${v.type}: ${v.description}`);
}
// Enforcement score: 1.0 = perfect, deducts 0.4 for critical, 0.25 for high, etc.
console.log(`Enforcement score: ${enforcement.score}`);
Advanced: Optimization Analysis
The optimizer identifies the most impactful bottlenecks and generates actionable, priority-ranked recommendations:
import { monitorLatency, analyzeOptimization, LatencyTracker } from '@reaatech/agent-eval-harness-latency';
// `trajectory` is a Trajectory loaded as in Quick Start
const result = monitorLatency(trajectory);
const optimization = analyzeOptimization(result, trajectory);
console.log(`Bottlenecks: ${optimization.bottlenecks.length}`);
for (const b of optimization.bottlenecks) {
  console.log(`  ${b.type}: severity=${b.severity.toFixed(2)}, ${b.description}`);
}
console.log(`Top recommendations:`);
for (const r of optimization.recommendations.slice(0, 3)) {
  console.log(`  [${r.priority}] ${r.description} (effort: ${r.effort}, est. gain: ${r.expectedImprovementMs}ms)`);
}
// Track latency across multiple evaluation runs
const tracker = new LatencyTracker();
tracker.record(result);
console.log(`Trend: ${tracker.getTrend().improving ? 'improving' : 'degrading'}`);
console.log(`Average score: ${tracker.getAverageScore()}`);
Related Packages
| Package | Description |
|---------|-------------|
| @reaatech/agent-eval-harness-types | Shared domain types and schemas |
| @reaatech/agent-eval-harness-trajectory | Trajectory evaluation |
| @reaatech/agent-eval-harness-tool-use | Tool-use validation |
| @reaatech/agent-eval-harness-cost | Cost tracking |
| @reaatech/agent-eval-harness-latency | Latency monitoring |
| @reaatech/agent-eval-harness-judge | LLM-as-judge |
| @reaatech/agent-eval-harness-golden | Golden trajectories |
| @reaatech/agent-eval-harness-suite | Suite runner |
| @reaatech/agent-eval-harness-gate | CI gates |
| @reaatech/agent-eval-harness-mcp-server | MCP server |
| @reaatech/agent-eval-harness-cli | CLI |
| @reaatech/agent-eval-harness-observability | Observability |
License
MIT
