@bloom-perf/hikaku v0.2.1

Lightweight Prometheus metrics analysis and regression detection for Kyara load testing
Hikaku 比較
Hikaku (比較, "comparison" in Japanese) is a lightweight metrics analysis library for Kyara load testing. It reads Prometheus metrics directly from the prom-client registry, computes structured snapshots, detects performance regressions by comparing against JSON baselines, and optionally generates natural language reports via LLM.
Table of Contents
- Why Hikaku?
- How It Works
- Installation
- Quick Start
- API Reference
- Baseline Format
- Thresholds Configuration
- LLM Reporting
- Design Decisions
- Contributing
- License
Why Hikaku?
Kyara generates detailed Prometheus metrics during load tests — HTTP request counts, latency histograms, error rates, CPU/RAM consumption — all labelled by scenario and iteration. The challenge is: how do you detect regressions between test runs?
Why not use Prometheus directly?
Prometheus excels at monitoring continuously running services, but it has fundamental limitations for discrete load test comparison:
| Limitation | Impact |
|---|---|
| No concept of "test run" | Prometheus sees a continuous time series, not isolated runs. Comparing run A vs run B requires knowing exact timestamps, managed externally. |
| Continuous-service assumption | Functions like avg_over_time and stddev_over_time assume a steady stream of data. Load tests produce isolated bursts separated by long silences — rolling averages become meaningless. |
| Cardinality explosion | Adding a run_id label to distinguish runs creates unbounded label values — a well-known Prometheus anti-pattern that causes memory growth and potential OOM. |
| Ephemeral environments | In CI pipelines, the Prometheus instance may not persist between runs, making historical comparisons impossible. |
| Expensive subqueries | Computing histogram_quantile inside a subquery (for comparison over time) is costly and has alignment pitfalls flagged by Prometheus maintainers. |
Why not use Keptn / Iter8?
Existing tools like Keptn (quality gates) and Iter8 (SLO validation) require Kubernetes operators, Helm charts, and significant infrastructure. They are designed for production deployment pipelines, not lightweight CI load test checks.
Hikaku's approach
Hikaku takes a different path: read metrics in-memory, compare against a file.
- Zero infrastructure — no Prometheus server, no Kubernetes, no operator
- In-process — reads directly from prom-client's Registry, no HTTP scraping
- File-based baselines — a JSON snapshot stored in git or as a CI artifact
- Run-to-run semantics — each baseline is a discrete snapshot, no timestamp bookkeeping
- Same math as Prometheus — percentiles use identical linear interpolation (histogram_quantile)
How It Works
```
Kyara load test run
        │
        ▼
prom-client Registry (in-memory)
        │
  createSnapshot(registry)
        │
        ▼
┌─────────────┐
│ RunSnapshot │──── saveBaseline() ──▶ baseline.json
└─────────────┘                            │
        │                                  │
        │       loadBaseline() ◀───────────┘
        │              │
        ▼              ▼
compare(current, baseline, thresholds)
        │
        ▼
┌──────────────────┐
│ ComparisonReport │
│  verdict: pass   │
│     or fail      │
└──────────────────┘
        │
 generateReport(report, ...)
        │
        ▼
┌──────────────────┐
│    LLM Report    │
│    (optional)    │
└──────────────────┘
```

- Snapshot — At the end of a Kyara run, createSnapshot() reads all metrics and produces a RunSnapshot structured by scenario, iteration, and hostname
- Baseline — The first run's snapshot is saved as a baseline JSON file via saveBaseline()
- Compare — Subsequent runs compare their snapshot against the baseline via compare(), producing a ComparisonReport with per-scenario deltas and a pass/fail verdict
Installation
```shell
npm install @bloom-perf/hikaku
```

prom-client is a peer dependency — it must be installed separately (Kyara already includes it).
Quick Start
```typescript
import { createSnapshot, saveBaseline, loadBaseline, baselineExists, compare } from '@bloom-perf/hikaku';
import { Registry } from 'prom-client';

// After your Kyara load test completes...
const registry: Registry = getYourPromRegistry();

// Take a snapshot of current metrics
const snapshot = await createSnapshot(registry);

const baselinePath = './baseline.json';

if (!baselineExists(baselinePath)) {
  // First run: save as baseline
  saveBaseline(snapshot, baselinePath);
  console.log('Baseline saved.');
} else {
  // Subsequent runs: compare against baseline
  const baseline = loadBaseline(baselinePath);
  const report = compare(snapshot, baseline);

  console.log(`Verdict: ${report.overallVerdict}`);
  console.log(`Scenarios: ${report.summary.passed} passed, ${report.summary.failed} failed`);

  if (report.overallVerdict === 'fail') {
    for (const regression of report.summary.regressions) {
      console.log(`  ⚠ ${regression.metricName}: ${regression.deltaPercent.toFixed(1)}% increase`);
    }
    process.exit(1);
  }
}
```

API Reference
createSnapshot(registry): Promise<RunSnapshot>
Reads all metrics from a prom-client Registry and produces a structured RunSnapshot.
- Extracts counters grouped by (scenario, iteration, hostname)
- Computes p50/p75/p90/p95/p99 percentiles from histogram buckets
- Groups resource metrics (CPU, RAM) by browser
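The grouping step can be sketched as follows. This is an illustrative reimplementation, not the library's internal code: the flat Sample shape and the groupRequestCounts helper are assumptions for the example, not prom-client or Hikaku APIs.

```typescript
// A labelled counter sample as it might come out of the registry (illustrative shape)
interface Sample {
  labels: { scenario: string; iteration: number; hostname: string };
  value: number;
}

// Sum request counts per (scenario, iteration, hostname) composite key
function groupRequestCounts(samples: Sample[]): Map<string, number> {
  const grouped = new Map<string, number>();
  for (const s of samples) {
    const key = `${s.labels.scenario}:${s.labels.iteration}:${s.labels.hostname}`;
    grouped.set(key, (grouped.get(key) ?? 0) + s.value);
  }
  return grouped;
}
```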
saveBaseline(snapshot, filePath): void
Saves a RunSnapshot as a versioned JSON baseline file.
loadBaseline(filePath): Baseline
Loads and validates a baseline file. Throws on missing file or unsupported version.
baselineExists(filePath): boolean
Returns true if a baseline file exists at the given path.
compare(current, baseline, thresholds?): ComparisonReport
Compares a current RunSnapshot against a Baseline and returns a detailed report.
Matching: Scenarios are matched by composite key "scenario:iteration".
Metrics compared:
- p95_latency — 95th percentile of request duration (aggregated across hosts)
- p50_latency — 50th percentile of request duration
- error_rate — ratio of failed requests to total requests
Verdicts:
- A scenario fails if p95_latency or p50_latency increases beyond the threshold, or if error_rate increases beyond its threshold
- The overall verdict is fail if any scenario fails
- New scenarios (present in current but absent from baseline) are skipped
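The pass/fail rule reduces to a relative-delta check. The sketch below is illustrative (deltaPercent and exceedsThreshold are hypothetical helper names, not the library's API):

```typescript
// Relative increase of the current value over the baseline, in percent
function deltaPercent(current: number, baseline: number): number {
  if (baseline === 0) return current === 0 ? 0 : Infinity;
  return ((current - baseline) / baseline) * 100;
}

// A metric regresses when its relative increase exceeds the configured maximum
function exceedsThreshold(current: number, baseline: number, maxIncreasePercent: number): boolean {
  return deltaPercent(current, baseline) > maxIncreasePercent;
}
```

For example, with a baseline p95 of 0.62s, a current p95 of 0.80s is a ~29% increase, which fails the default 20% threshold, while 0.70s (~13%) passes.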
generateReport(report, current, baseline, options): Promise<string>
Generates a natural language performance report from a ComparisonReport using an LLM provider.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| provider | LlmProvider | (required) | LLM backend to use |
| locale | 'en' \| 'fr' | 'en' | Report language |
| format | 'markdown' \| 'text' | 'markdown' | Output format |
| includeRecommendations | boolean | true | Include investigation suggestions |
createAnthropicProvider(apiKey, model?): LlmProvider
Creates an LLM provider using the Anthropic SDK. Requires @anthropic-ai/sdk as a peer dependency.
- apiKey — Anthropic API key
- model — Model name (default: claude-sonnet-4-20250514)
histogramQuantile(quantile, buckets): number
Low-level utility: computes a quantile from histogram buckets using linear interpolation, matching the Prometheus histogram_quantile() algorithm.
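For reference, the interpolation can be sketched in a few lines. This is an illustrative reimplementation under an assumed cumulative-bucket shape ({ le, count }, with a +Inf bucket last), not the library's exact signature:

```typescript
// Cumulative histogram bucket: count of observations with value <= le
interface Bucket { le: number; count: number }

// Sketch of Prometheus-style histogram_quantile with linear interpolation
function histogramQuantileSketch(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total; // target rank among all observations
  let prevLe = 0;         // Prometheus assumes the lowest bucket starts at 0
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      // +Inf bucket: fall back to the highest finite bucket boundary
      if (!Number.isFinite(b.le)) return prevLe;
      // Linear interpolation within the bucket, as histogram_quantile does
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / (b.count - prevCount));
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}
```

With buckets { le: 0.1, count: 50 }, { le: 0.5, count: 90 }, { le: Infinity, count: 100 }, the p50 interpolates to 0.1 and the p90 lands exactly on the 0.5 boundary, matching what a Grafana panel using histogram_quantile would show for the same data.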
Baseline Format
Baselines are human-readable JSON files, suitable for version control:
```json
{
  "version": 1,
  "createdAt": "2026-03-08T15:00:00.000Z",
  "snapshot": {
    "timestamp": "2026-03-08T15:00:00.000Z",
    "scenarios": [
      {
        "scenario": "Login",
        "iteration": 0,
        "hosts": [
          {
            "hostname": "api.example.com",
            "requestCount": 150,
            "requestFinishedCount": 148,
            "requestFailedCount": 2,
            "responseCount": 148,
            "durationPercentiles": { "p50": 0.12, "p75": 0.25, "p90": 0.45, "p95": 0.62, "p99": 1.1 },
            "durationSum": 28.5,
            "durationCount": 150
          }
        ],
        "totalRequests": 150,
        "totalRequestsFailed": 2,
        "errorRate": 0.0133,
        "aggregatedDurationPercentiles": { "p50": 0.12, "p75": 0.25, "p90": 0.45, "p95": 0.62, "p99": 1.1 }
      }
    ],
    "resources": []
  }
}
```

Thresholds Configuration
Default thresholds:
| Parameter | Default | Description |
|---|---|---|
| defaultMaxIncreasePercent | 20 | Max allowed percentage increase for p50/p95 latency |
| defaultMaxErrorRateIncreasePercent | 10 | Max allowed percentage increase for error rate |
Custom thresholds:
```typescript
const report = compare(snapshot, baseline, {
  defaultMaxIncreasePercent: 15,           // Stricter: 15% max
  defaultMaxErrorRateIncreasePercent: 5,   // Very strict on errors
  perScenario: {
    'Checkout:0': { maxIncreasePercent: 30 },        // More tolerant for checkout
    'Login:0': { maxErrorRateIncreasePercent: 0 },   // Zero tolerance on login errors
  },
});
```

Per-scenario overrides use the key format "scenarioName:iteration".
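The resolution order for a given scenario key can be sketched as a simple fallback lookup. This is a hypothetical helper, not part of the library's API; the Thresholds shape mirrors the options shown above:

```typescript
interface ScenarioOverride { maxIncreasePercent?: number; maxErrorRateIncreasePercent?: number }

interface Thresholds {
  defaultMaxIncreasePercent: number;
  defaultMaxErrorRateIncreasePercent: number;
  perScenario?: Record<string, ScenarioOverride>;
}

// A per-scenario override wins; otherwise the global default applies
function latencyThresholdFor(scenarioKey: string, t: Thresholds): number {
  return t.perScenario?.[scenarioKey]?.maxIncreasePercent ?? t.defaultMaxIncreasePercent;
}
```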
LLM Reporting
Hikaku can generate human-readable performance reports using an LLM. This is optional and requires @anthropic-ai/sdk as a peer dependency (or a custom LlmProvider).
```shell
npm install @anthropic-ai/sdk
```

Using the built-in Anthropic provider
```typescript
import { compare, generateReport, createAnthropicProvider } from '@bloom-perf/hikaku';

const report = compare(snapshot, baseline);

if (report.overallVerdict === 'fail') {
  const provider = createAnthropicProvider(process.env.ANTHROPIC_API_KEY!);

  const analysis = await generateReport(report, snapshot, baseline, {
    provider,
    locale: 'fr',
    format: 'markdown',
  });

  console.log(analysis);
}
```

Using a custom LLM provider
Implement the LlmProvider interface to use any LLM backend:
```typescript
import type { LlmProvider } from '@bloom-perf/hikaku';

const myProvider: LlmProvider = {
  async complete(systemPrompt, userMessage) {
    // Call your preferred LLM API here
    return await myLlmClient.chat(systemPrompt, userMessage);
  },
};

const analysis = await generateReport(report, snapshot, baseline, { provider: myProvider });
```

Design Decisions
| Decision | Rationale |
|---|---|
| prom-client as peer dependency | Hikaku reads the registry in-memory — it must share the same instance as Kyara. No runtime dependencies. |
| JSON baselines | Human-readable, diffable in git, portable as CI artifacts. The version field enables future format evolution. |
| Same histogram_quantile algorithm | Percentiles computed from bucket boundaries using linear interpolation, identical to Prometheus. Results are directly comparable to Grafana dashboards. |
| Match by scenario:iteration | Leverages the labels added to Kyara's metrics in v2.0.0, enabling per-scenario regression detection without high-cardinality run_id labels. |
| Stateless comparison | No database, no time-series storage. Each comparison is a pure function: (current, baseline) → report. |
| Injectable LLM provider | LlmProvider interface allows using any LLM backend. Anthropic SDK provided as a convenience, but not a hard dependency (optional peer dep + dynamic import). |
| LLM only on demand | Report generation is opt-in: no API calls unless explicitly requested. Keeps the core library fast and free. |
Bloom-Perf Ecosystem
Hikaku is part of the Bloom-Perf load testing ecosystem:
| Package | Role |
|---|---|
| kyara | Load testing engine — Puppeteer + Firefox + YAML scenarios + Prometheus metrics |
| yaml-pptr | YAML-to-Puppeteer scenario interpreter |
| hikaku | Metrics analysis and regression detection |
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
```shell
# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Format code
npm run format
```

License
Hikaku is licensed under the Apache 2.0 License.
