@bloom-perf/hikaku v0.2.1

Lightweight Prometheus metrics analysis and regression detection for Kyara load testing
Hikaku 比較
Hikaku (比較, "comparison" in Japanese) is a lightweight metrics analysis library for Kyara load testing. It reads Prometheus metrics directly from the prom-client registry, computes structured snapshots, detects performance regressions by comparing against JSON baselines, and optionally generates natural language reports via LLM.
Table of Contents
- Why Hikaku?
- How It Works
- Installation
- Quick Start
- API Reference
- Baseline Format
- Thresholds Configuration
- LLM Reporting
- Design Decisions
- Contributing
- License
Why Hikaku?
Kyara generates detailed Prometheus metrics during load tests — HTTP request counts, latency histograms, error rates, CPU/RAM consumption — all labelled by scenario and iteration. The challenge is: how do you detect regressions between test runs?
Why not use Prometheus directly?
Prometheus excels at monitoring continuously running services, but it has fundamental limitations for discrete load test comparison:
| Limitation | Impact |
|---|---|
| No concept of "test run" | Prometheus sees a continuous time series, not isolated runs. Comparing run A vs run B requires knowing exact timestamps, managed externally. |
| Continuous-service assumption | Functions like avg_over_time and stddev_over_time assume a steady stream of data. Load tests produce isolated bursts separated by long silences — rolling averages become meaningless. |
| Cardinality explosion | Adding a run_id label to distinguish runs creates unbounded label values — a well-known Prometheus anti-pattern that causes memory growth and potential OOM. |
| Ephemeral environments | In CI pipelines, the Prometheus instance may not persist between runs, making historical comparisons impossible. |
| Expensive subqueries | Computing histogram_quantile inside a subquery (for comparison over time) is costly and has alignment pitfalls flagged by Prometheus maintainers. |
Why not use Keptn / Iter8?
Existing tools like Keptn (quality gates) and Iter8 (SLO validation) require Kubernetes operators, Helm charts, and significant infrastructure. They are designed for production deployment pipelines, not lightweight CI load test checks.
Hikaku's approach
Hikaku takes a different path: read metrics in-memory, compare against a file.
- Zero infrastructure — no Prometheus server, no Kubernetes, no operator
- In-process — reads directly from prom-client's Registry, no HTTP scraping
- File-based baselines — a JSON snapshot stored in git or as a CI artifact
- Run-to-run semantics — each baseline is a discrete snapshot, no timestamp bookkeeping
- Same math as Prometheus — percentiles use identical linear interpolation (histogram_quantile)
How It Works
```
Kyara load test run
        │
        ▼
prom-client Registry (in-memory)
        │
  createSnapshot(registry)
        │
        ▼
┌─────────────┐
│ RunSnapshot │──── saveBaseline() ──▶ baseline.json
└─────────────┘                            │
        │                                  │
        │       loadBaseline() ◀───────────┘
        │              │
        ▼              ▼
compare(current, baseline, thresholds)
        │
        ▼
┌──────────────────┐
│ ComparisonReport │
│  verdict: pass   │
│     or fail      │
└──────────────────┘
        │
 generateReport(report, ...)
        │
        ▼
┌──────────────────┐
│    LLM Report    │
│    (optional)    │
└──────────────────┘
```

- Snapshot — At the end of a Kyara run, createSnapshot() reads all metrics and produces a RunSnapshot structured by scenario, iteration, and hostname
- Baseline — The first run's snapshot is saved as a baseline JSON file via saveBaseline()
- Compare — Subsequent runs compare their snapshot against the baseline via compare(), producing a ComparisonReport with per-scenario deltas and a pass/fail verdict
Installation
```shell
npm install @bloom-perf/hikaku
```

prom-client is a peer dependency — it must be installed separately (Kyara already includes it).
Quick Start
```typescript
import { createSnapshot, saveBaseline, loadBaseline, baselineExists, compare } from '@bloom-perf/hikaku';
import { Registry } from 'prom-client';

// After your Kyara load test completes...
const registry: Registry = getYourPromRegistry();

// Take a snapshot of current metrics
const snapshot = await createSnapshot(registry);

const baselinePath = './baseline.json';

if (!baselineExists(baselinePath)) {
  // First run: save as baseline
  saveBaseline(snapshot, baselinePath);
  console.log('Baseline saved.');
} else {
  // Subsequent runs: compare against baseline
  const baseline = loadBaseline(baselinePath);
  const report = compare(snapshot, baseline);

  console.log(`Verdict: ${report.overallVerdict}`);
  console.log(`Scenarios: ${report.summary.passed} passed, ${report.summary.failed} failed`);

  if (report.overallVerdict === 'fail') {
    for (const regression of report.summary.regressions) {
      console.log(`  ⚠ ${regression.metricName}: ${regression.deltaPercent.toFixed(1)}% increase`);
    }
    process.exit(1);
  }
}
```

API Reference
createSnapshot(registry): Promise<RunSnapshot>
Reads all metrics from a prom-client Registry and produces a structured RunSnapshot.
- Extracts counters grouped by (scenario, iteration, hostname)
- Computes p50/p75/p90/p95/p99 percentiles from histogram buckets
- Groups resource metrics (CPU, RAM) by browser
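The grouping step can be sketched as follows. This is an illustrative reimplementation, not the library's internal code: the flat Sample shape and the groupRequestCounts helper are assumptions for the example, not prom-client or Hikaku APIs.

```typescript
// A labelled counter sample as it might come out of the registry (illustrative shape)
interface Sample {
  labels: { scenario: string; iteration: number; hostname: string };
  value: number;
}

// Sum request counts per (scenario, iteration, hostname) composite key
function groupRequestCounts(samples: Sample[]): Map<string, number> {
  const grouped = new Map<string, number>();
  for (const s of samples) {
    const key = `${s.labels.scenario}:${s.labels.iteration}:${s.labels.hostname}`;
    grouped.set(key, (grouped.get(key) ?? 0) + s.value);
  }
  return grouped;
}
```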
saveBaseline(snapshot, filePath): void
Saves a RunSnapshot as a versioned JSON baseline file.
loadBaseline(filePath): Baseline
Loads and validates a baseline file. Throws on missing file or unsupported version.
baselineExists(filePath): boolean
Returns true if a baseline file exists at the given path.
compare(current, baseline, thresholds?): ComparisonReport
Compares a current RunSnapshot against a Baseline and returns a detailed report.
Matching: Scenarios are matched by composite key "scenario:iteration".
Metrics compared:
- p95_latency — 95th percentile of request duration (aggregated across hosts)
- p50_latency — 50th percentile of request duration
- error_rate — ratio of failed requests to total requests
Verdicts:
- A scenario fails if p95_latency or p50_latency increases beyond the threshold, or if error_rate increases beyond its threshold
- The overall verdict is fail if any scenario fails
- New scenarios (present in current but absent from baseline) are skipped
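The pass/fail rule reduces to a relative-delta check. The sketch below is illustrative (deltaPercent and exceedsThreshold are hypothetical helper names, not the library's API):

```typescript
// Relative increase of the current value over the baseline, in percent
function deltaPercent(current: number, baseline: number): number {
  if (baseline === 0) return current === 0 ? 0 : Infinity;
  return ((current - baseline) / baseline) * 100;
}

// A metric regresses when its relative increase exceeds the configured maximum
function exceedsThreshold(current: number, baseline: number, maxIncreasePercent: number): boolean {
  return deltaPercent(current, baseline) > maxIncreasePercent;
}
```

For example, with a baseline p95 of 0.62s, a current p95 of 0.80s is a ~29% increase, which fails the default 20% threshold, while 0.70s (~13%) passes.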
generateReport(report, current, baseline, options): Promise<string>
Generates a natural language performance report from a ComparisonReport using an LLM provider.
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| provider | LlmProvider | (required) | LLM backend to use |
| locale | 'en' \| 'fr' | 'en' | Report language |
| format | 'markdown' \| 'text' | 'markdown' | Output format |
| includeRecommendations | boolean | true | Include investigation suggestions |
createAnthropicProvider(apiKey, model?): LlmProvider
Creates an LLM provider using the Anthropic SDK. Requires @anthropic-ai/sdk as a peer dependency.
- apiKey — Anthropic API key
- model — Model name (default: claude-sonnet-4-20250514)
histogramQuantile(quantile, buckets): number
Low-level utility: computes a quantile from histogram buckets using linear interpolation, matching the Prometheus histogram_quantile() algorithm.
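For reference, the interpolation can be sketched in a few lines. This is an illustrative reimplementation under an assumed cumulative-bucket shape ({ le, count }, with a +Inf bucket last), not the library's exact signature:

```typescript
// Cumulative histogram bucket: count of observations with value <= le
interface Bucket { le: number; count: number }

// Sketch of Prometheus-style histogram_quantile with linear interpolation
function histogramQuantileSketch(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total; // target rank among all observations
  let prevLe = 0;         // Prometheus assumes the lowest bucket starts at 0
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      // +Inf bucket: fall back to the highest finite bucket boundary
      if (!Number.isFinite(b.le)) return prevLe;
      // Linear interpolation within the bucket, as histogram_quantile does
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / (b.count - prevCount));
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}
```

With buckets { le: 0.1, count: 50 }, { le: 0.5, count: 90 }, { le: Infinity, count: 100 }, the p50 interpolates to 0.1 and the p90 lands exactly on the 0.5 boundary, matching what a Grafana panel using histogram_quantile would show for the same data.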
Baseline Format
Baselines are human-readable JSON files, suitable for version control:
```json
{
  "version": 1,
  "createdAt": "2026-03-08T15:00:00.000Z",
  "snapshot": {
    "timestamp": "2026-03-08T15:00:00.000Z",
    "scenarios": [
      {
        "scenario": "Login",
        "iteration": 0,
        "hosts": [
          {
            "hostname": "api.example.com",
            "requestCount": 150,
            "requestFinishedCount": 148,
            "requestFailedCount": 2,
            "responseCount": 148,
            "durationPercentiles": { "p50": 0.12, "p75": 0.25, "p90": 0.45, "p95": 0.62, "p99": 1.1 },
            "durationSum": 28.5,
            "durationCount": 150
          }
        ],
        "totalRequests": 150,
        "totalRequestsFailed": 2,
        "errorRate": 0.0133,
        "aggregatedDurationPercentiles": { "p50": 0.12, "p75": 0.25, "p90": 0.45, "p95": 0.62, "p99": 1.1 }
      }
    ],
    "resources": []
  }
}
```

Thresholds Configuration
Default thresholds:
| Parameter | Default | Description |
|---|---|---|
| defaultMaxIncreasePercent | 20 | Max allowed percentage increase for p50/p95 latency |
| defaultMaxErrorRateIncreasePercent | 10 | Max allowed percentage increase for error rate |
Custom thresholds:
```typescript
const report = compare(snapshot, baseline, {
  defaultMaxIncreasePercent: 15,           // Stricter: 15% max
  defaultMaxErrorRateIncreasePercent: 5,   // Very strict on errors
  perScenario: {
    'Checkout:0': { maxIncreasePercent: 30 },        // More tolerant for checkout
    'Login:0': { maxErrorRateIncreasePercent: 0 },   // Zero tolerance on login errors
  },
});
```

Per-scenario overrides use the key format "scenarioName:iteration".
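The resolution order for a given scenario key can be sketched as a simple fallback lookup. This is a hypothetical helper, not part of the library's API; the Thresholds shape mirrors the options shown above:

```typescript
interface ScenarioOverride { maxIncreasePercent?: number; maxErrorRateIncreasePercent?: number }

interface Thresholds {
  defaultMaxIncreasePercent: number;
  defaultMaxErrorRateIncreasePercent: number;
  perScenario?: Record<string, ScenarioOverride>;
}

// A per-scenario override wins; otherwise the global default applies
function latencyThresholdFor(scenarioKey: string, t: Thresholds): number {
  return t.perScenario?.[scenarioKey]?.maxIncreasePercent ?? t.defaultMaxIncreasePercent;
}
```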
LLM Reporting
Hikaku can generate human-readable performance reports using an LLM. This is optional and requires @anthropic-ai/sdk as a peer dependency (or a custom LlmProvider).
```shell
npm install @anthropic-ai/sdk
```

Using the built-in Anthropic provider
```typescript
import { compare, generateReport, createAnthropicProvider } from '@bloom-perf/hikaku';

const report = compare(snapshot, baseline);

if (report.overallVerdict === 'fail') {
  const provider = createAnthropicProvider(process.env.ANTHROPIC_API_KEY!);

  const analysis = await generateReport(report, snapshot, baseline, {
    provider,
    locale: 'fr',
    format: 'markdown',
  });

  console.log(analysis);
}
```

Using a custom LLM provider
Implement the LlmProvider interface to use any LLM backend:
```typescript
import type { LlmProvider } from '@bloom-perf/hikaku';

const myProvider: LlmProvider = {
  async complete(systemPrompt, userMessage) {
    // Call your preferred LLM API here
    return await myLlmClient.chat(systemPrompt, userMessage);
  },
};

const analysis = await generateReport(report, snapshot, baseline, { provider: myProvider });
```

Design Decisions
| Decision | Rationale |
|---|---|
| prom-client as peer dependency | Hikaku reads the registry in-memory — it must share the same instance as Kyara. No runtime dependencies. |
| JSON baselines | Human-readable, diffable in git, portable as CI artifacts. The version field enables future format evolution. |
| Same histogram_quantile algorithm | Percentiles computed from bucket boundaries using linear interpolation, identical to Prometheus. Results are directly comparable to Grafana dashboards. |
| Match by scenario:iteration | Leverages the labels added to Kyara's metrics in v2.0.0, enabling per-scenario regression detection without high-cardinality run_id labels. |
| Stateless comparison | No database, no time-series storage. Each comparison is a pure function: (current, baseline) → report. |
| Injectable LLM provider | LlmProvider interface allows using any LLM backend. Anthropic SDK provided as a convenience, but not a hard dependency (optional peer dep + dynamic import). |
| LLM only on demand | Report generation is opt-in: no API calls unless explicitly requested. Keeps the core library fast and free. |
Bloom-Perf Ecosystem
Hikaku is part of the Bloom-Perf load testing ecosystem:
| Package | Role |
|---|---|
| kyara | Load testing engine — Puppeteer + Firefox + YAML scenarios + Prometheus metrics |
| yaml-pptr | YAML-to-Puppeteer scenario interpreter |
| hikaku | Metrics analysis and regression detection |
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
```shell
# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Format code
npm run format
```

License
Hikaku is licensed under the Apache 2.0 License.
