@apeleghq/benchmark

v1.0.6

Published

2 months ago

A statistically rigorous benchmarking library with paired t-tests, baseline correction, and confidence intervals


er-ci-bot

corrideat

benchmark benchmarking performance statistics paired-t-test confidence-interval profiling timing measurement statistical-analysis


Features

No runtime dependencies (except picocolors, which some of the built-in reporters use).
Runtime and framework agnostic (ESM and CJS dual-package).
Automatically injects a no-op baseline function to subtract loop and call overhead from every measurement on a per-trial basis.
Runs all functions within the same trial to enable paired statistical tests — shared noise sources (GC pauses, thermal throttling, etc.) cancel out in per-trial differences $d_i = a_i - b_i$.
Computes descriptive statistics per function: mean, median, standard deviation, SEM, min, max, and percentiles ($p_5$, $p_{25}$, $p_{75}$, $p_{95}$).
Reports 95% confidence intervals for all means.
Performs all $\binom{k}{2}$ pairwise paired t-tests between the $k$ benchmarked functions, with two-tailed p-values and significance flags.
Randomises execution order within each trial to reduce systematic bias.
Per-benchmark and suite-level setup/teardown hooks with shared typed context.
Configurable warmup iterations, iterations per trial, and number of trials.
Includes three built-in reporters: simple, advanced, and xUnit.
Fully JSON-serialisable output for custom reporting or CI integration.
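The README states that execution order is randomised within each trial but does not say how. The sketch below assumes a Fisher–Yates shuffle, a standard way to draw a uniformly random permutation, applied once per trial to the list of functions (including the injected baseline). `shuffledOrder` is a hypothetical helper for illustration, not a library export.

```typescript
// Hypothetical helper: returns a uniformly random permutation of `items`
// using the Fisher–Yates shuffle. The library's actual algorithm is not
// documented; this only illustrates per-trial order randomisation.
function shuffledOrder<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const order = items.slice(); // copy so the caller's array is left untouched
  for (let i = order.length - 1; i > 0; i--) {
    // Pick j uniformly from 0..i and swap it into position i.
    const j = Math.floor(random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

// Each trial visits every function, baseline included, in a fresh order.
const names = ['@@null', 'Array.from', 'Spread operator'];
for (let trial = 0; trial < 3; trial++) {
  console.log(`trial ${trial}:`, shuffledOrder(names).join(', '));
}
```

Injecting `random` makes the shuffle deterministic in tests; in a real run the default `Math.random` is sufficient, since the goal is only to avoid a fixed position bias, not cryptographic randomness.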

Screenshot: Simple report

Screenshot: Advanced report

Installation

npm install @apeleghq/benchmark

yarn add @apeleghq/benchmark

Usage

Basic suite

import { runSuite } from '@apeleghq/benchmark';
import simpleReport from '@apeleghq/benchmark/reporters/simple';

type Ctx = {
  array: unknown[];
};

const result = await runSuite<Ctx>({
  name: 'Array copy comparison',
  setup() {
    this.array = [1, 2, 3];
  },
  functions: [
    {
      name: 'Array.from',
      fn() {
        Array.from(this.array);
      },
    },
    {
      name: 'Spread operator',
      fn() {
        [...this.array];
      },
    },
  ],
});

simpleReport(result);

Advanced reporter

import advancedReport from '@apeleghq/benchmark/reporters/advanced';

advancedReport(result);

xUnit reporter (for CI integration)

import xunitReport from '@apeleghq/benchmark/reporters/xunit';

console.log(xunitReport(result));

Configuration options

All fields in ISuiteConfig beyond name and functions are optional:

const result = await runSuite({
  name: 'My suite',
  warmupIterations: 10,    // default: 10
  iterationsPerTrial: 1000, // default: 1000
  trials: 30,               // default: 30
  setup() { /* suite-level context setup */ },
  teardown() { /* suite-level context teardown */ },
  functions: [
    {
      name: 'my-fn',
      fn() { /* ... */ },
      setup() { /* function-level setup, runs before warmup+measurement */ },
      teardown() { /* function-level teardown, runs after measurement */ },
    },
  ],
});
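As a minimal sketch of how the documented defaults combine with user overrides, the snippet below resolves the three optional timing fields. `TimingConfig` and `resolveTiming` are hypothetical names, not exports of @apeleghq/benchmark, and the remark about degrees of freedom assumes each trial contributes one paired sample per function, as the per-trial formulas in this README suggest.

```typescript
// Hypothetical sketch: applying the documented defaults for the optional
// timing fields. These names are illustrative, not library exports.
interface TimingConfig {
  warmupIterations: number;
  iterationsPerTrial: number;
  trials: number;
}

// Defaults as listed in the configuration example.
const DEFAULT_TIMING: TimingConfig = {
  warmupIterations: 10,
  iterationsPerTrial: 1000,
  trials: 30,
};

function resolveTiming(overrides: Partial<TimingConfig> = {}): TimingConfig {
  // `??` (rather than object spread) also falls back to the default when a
  // field is explicitly set to undefined.
  return {
    warmupIterations: overrides.warmupIterations ?? DEFAULT_TIMING.warmupIterations,
    iterationsPerTrial: overrides.iterationsPerTrial ?? DEFAULT_TIMING.iterationsPerTrial,
    trials: overrides.trials ?? DEFAULT_TIMING.trials,
  };
}

const timing = resolveTiming({ trials: 50 });
// Assuming one paired sample per trial, the t-tests use n = trials
// and n - 1 degrees of freedom.
console.log(timing, 'degrees of freedom:', timing.trials - 1);
```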

Using the statistics module directly

The stats subpackage exposes the underlying statistical primitives:

import { mean, median, stdDev, tDistPValue } from '@apeleghq/benchmark/stats';
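The signatures of the stats exports are not documented here, so the following is a self-contained sketch of what such descriptive-statistics primitives typically compute, not the package's own code. It assumes non-empty numeric arrays, uses the sample (n - 1) standard deviation, and computes percentiles by linear interpolation between closest ranks; other percentile conventions exist and the library may use a different one.

```typescript
// Self-contained sketches; NOT the @apeleghq/benchmark/stats implementations.
// All functions assume a non-empty array.

function mean(xs: readonly number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (Bessel's correction: divides by n - 1).
function stdDev(xs: readonly number[]): number {
  const m = mean(xs);
  const ss = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// Standard error of the mean.
function sem(xs: readonly number[]): number {
  return stdDev(xs) / Math.sqrt(xs.length);
}

// Percentile (p in [0, 100]) by linear interpolation between closest ranks.
function percentile(xs: readonly number[], p: number): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

function median(xs: readonly number[]): number {
  return percentile(xs, 50);
}

const samples = [1.2, 0.9, 1.1, 1.4, 1.0]; // made-up timings (ms)
console.log({
  mean: mean(samples),
  median: median(samples),
  stdDev: stdDev(samples),
  sem: sem(samples),
  p95: percentile(samples, 95),
});
```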

How it works

Baseline correction

A no-op function (internally named @@null) is automatically added to every suite. Its per-trial measurement captures pure loop and call overhead. This baseline value is subtracted from every other function's measurement on a per-trial basis before any statistics are computed:

$$\hat{x}_{i,\text{fn}} = x_{i,\text{fn}} - x_{i,\text{null}}$$

All reported aggregate statistics (mean, median, standard deviation, confidence intervals) are computed from these baseline-corrected samples.
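A minimal sketch of the per-trial subtraction, assuming a hypothetical data layout of one timing map per trial keyed by function name (this is not the actual shape of ISuiteReport.trials). The timings are made up for illustration.

```typescript
// Sketch only: x̂_{i,fn} = x_{i,fn} - x_{i,null}, computed trial by trial.
// `TrialTimings` is a hypothetical layout, not the library's report format.
type TrialTimings = Record<string, number>; // milliseconds per function

const BASELINE = '@@null';

function correctedSamples(trials: readonly TrialTimings[], fnName: string): number[] {
  return trials.map((timings, i) => {
    const raw = timings[fnName];
    const overhead = timings[BASELINE];
    if (typeof raw !== 'number' || typeof overhead !== 'number') {
      throw new Error(`trial ${i} is missing '${fnName}' or '${BASELINE}'`);
    }
    // Subtract the overhead measured in the same trial, not a global average.
    return raw - overhead;
  });
}

// Made-up timings (ms) for three trials.
const trials: TrialTimings[] = [
  { '@@null': 0.5, 'Array.from': 3.0, 'Spread operator': 2.25 },
  { '@@null': 0.75, 'Array.from': 3.5, 'Spread operator': 2.5 },
  { '@@null': 0.5, 'Array.from': 2.75, 'Spread operator': 2.0 },
];

console.log(correctedSamples(trials, 'Array.from'));      // [ 2.5, 2.75, 2.25 ]
console.log(correctedSamples(trials, 'Spread operator')); // [ 1.75, 1.75, 1.5 ]
```

Subtracting per trial rather than a single averaged overhead means a trial in which the whole process ran slower is corrected by an overhead measured under the same conditions.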

Paired t-test

Because all functions are measured within the same trial, comparisons use a paired (dependent) t-test rather than an independent-samples test. Per-trial differences are:

$$d_i = a_i - b_i$$

The t-statistic is:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

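where $\bar{d}$ is the mean of the differences, $s_d$ is their sample standard deviation, and $n$ is the number of paired samples (trials). This is more powerful than an unpaired test because noise that affects both functions within the same trial cancels out in $d_i$, so $s_d$ can be much smaller than the spread of either function's own samples.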
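To make the power argument concrete, this sketch computes the paired t-statistic from the formula above and, for comparison only, Welch's unpaired t-statistic on the same made-up, baseline-corrected timings. Neither function is the library's code.

```typescript
// Sketch only: paired vs. unpaired t-statistics on the same made-up samples.
function meanOf(xs: readonly number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

function sampleVariance(xs: readonly number[]): number {
  const m = meanOf(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Paired t: t = dBar / (s_d / sqrt(n)), with d_i = a_i - b_i.
function pairedT(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length < 2) {
    throw new Error('paired samples must have equal length n >= 2');
  }
  const d = a.map((ai, i) => ai - b[i]);
  return meanOf(d) / Math.sqrt(sampleVariance(d) / d.length);
}

// Welch's unpaired t, for comparison only: ignores which trial each sample came from.
function welchT(a: readonly number[], b: readonly number[]): number {
  const se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  return (meanOf(a) - meanOf(b)) / se;
}

// Made-up baseline-corrected timings (ms) from three trials.
const a = [2.5, 2.75, 2.25];
const b = [1.75, 1.75, 1.5];
console.log(pairedT(a, b).toFixed(2)); // 10.00
console.log(welchT(a, b).toFixed(2));  // 5.00
```

Here the paired statistic is twice the unpaired one: `a` and `b` are positively correlated across trials, so $\operatorname{Var}(d) = \operatorname{Var}(a) + \operatorname{Var}(b) - 2\operatorname{Cov}(a, b)$ is smaller than $\operatorname{Var}(a) + \operatorname{Var}(b)$.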

Confidence intervals

The 95% confidence interval for the true mean difference is:

$$\bar{d} \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$$

where $\alpha = 0.05$ and $t_{\alpha/2,\, n-1}$ is the critical value from Student's t-distribution with $n-1$ degrees of freedom.
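A minimal sketch of the interval above. To stay short it takes the critical value $t_{\alpha/2,\, n-1}$ as a parameter instead of inverting the t-distribution; 4.303 is the tabulated two-sided 95% value for 2 degrees of freedom, and 2.045 would apply to 29 degrees of freedom (the default 30 trials). `pairedCI` is a hypothetical name and the differences are made up.

```typescript
// Sketch only: 95% CI for the mean paired difference, given a critical value.
function pairedCI(d: readonly number[], tCrit: number): [number, number] {
  const n = d.length;
  if (n < 2) throw new Error('need at least two paired differences');
  const dBar = d.reduce((s, x) => s + x, 0) / n;
  const sD = Math.sqrt(d.reduce((s, x) => s + (x - dBar) ** 2, 0) / (n - 1));
  const halfWidth = tCrit * (sD / Math.sqrt(n)); // t_{α/2, n-1} · s_d / √n
  return [dBar - halfWidth, dBar + halfWidth];
}

// Made-up per-trial differences d_i (ms), n = 3, so df = 2.
const differences = [0.75, 1.0, 0.75];
console.log(pairedCI(differences, 4.303));
```

For the same $\alpha$, the interval excludes 0 exactly when the two-tailed paired t-test gives $p < \alpha$, so the CI and the significance flag tell a consistent story.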

Output format

runSuite returns a fully JSON-serialisable ISuiteReport object:

| Field | Description |
|---|---|
| name | Suite name |
| config | Effective configuration (iterations, trials, warmup) |
| trials | Every trial with raw timings and execution order |
| functions | Per-function aggregate statistics (corrected and raw samples) |
| comparisons | All pairwise paired t-test results |
| baselineName | Name of the injected null baseline (@@null) |

Each entry in comparisons includes:

| Field | Description |
|---|---|
| meanDifference | $\bar{d}$, the mean of paired differences (ms) |
| relativeDifference | $\bar{d} / \bar{b}$, the relative difference as a ratio |
| tStatistic | Computed t-statistic |
| pValue | Two-tailed p-value |
| significant | true if pValue < 0.05 |
| confidenceInterval | 95% CI for the true mean difference (ms) |
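The sketch below shows one way to compute such a comparison record from two baseline-corrected sample arrays. It is not the package's implementation: the field names mirror the table above, while `comparePaired`, `twoTailedP` and the helpers are hypothetical. The two-tailed p-value uses the identity $p = I_x(\nu/2, 1/2)$ with $x = \nu / (\nu + t^2)$ and $\nu = n - 1$, where $I$ is the regularised incomplete beta function, evaluated here with the continued fraction described in Numerical Recipes. confidenceInterval is omitted because it needs a critical value.

```typescript
// Sketch only; not the @apeleghq/benchmark implementation.
// Lanczos approximation (g = 7) of ln Γ(z); valid for z >= 0.5, which covers
// every argument used below because ν >= 1.
const LANCZOS = [
  0.99999999999980993, 676.5203681218851, -1259.1392167224028,
  771.32342877765313, -176.61502916214059, 12.507343278686905,
  -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
];
function logGamma(z: number): number {
  const x = z - 1;
  let sum = LANCZOS[0];
  for (let i = 1; i < LANCZOS.length; i++) sum += LANCZOS[i] / (x + i);
  const t = x + 7.5;
  return 0.5 * Math.log(2 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(sum);
}

// Continued fraction for the incomplete beta function (modified Lentz method).
function betaContinuedFraction(a: number, b: number, x: number): number {
  const clamp = (v: number) => (Math.abs(v) < 1e-300 ? 1e-300 : v);
  let c = 1;
  let d = 1 / clamp(1 - ((a + b) * x) / (a + 1));
  let h = d;
  for (let m = 1; m <= 300; m++) {
    const m2 = 2 * m;
    let aa = (m * (b - m) * x) / ((a - 1 + m2) * (a + m2)); // even step
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    h *= d * c;
    aa = (-(a + m) * (a + b + m) * x) / ((a + m2) * (a + 1 + m2)); // odd step
    d = 1 / clamp(1 + aa * d);
    c = clamp(1 + aa / c);
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < 1e-14) break;
  }
  return h;
}

// Regularised incomplete beta I_x(a, b), using the symmetry
// I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
function regularizedBeta(a: number, b: number, x: number): number {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const front = Math.exp(
    logGamma(a + b) - logGamma(a) - logGamma(b) + a * Math.log(x) + b * Math.log(1 - x),
  );
  return x < (a + 1) / (a + b + 2)
    ? (front * betaContinuedFraction(a, b, x)) / a
    : 1 - (front * betaContinuedFraction(b, a, 1 - x)) / b;
}

// Two-tailed p-value for a t-statistic with df degrees of freedom.
function twoTailedP(t: number, df: number): number {
  return regularizedBeta(df / 2, 0.5, df / (df + t * t));
}

interface PairedComparison {
  meanDifference: number;
  relativeDifference: number;
  tStatistic: number;
  pValue: number;
  significant: boolean;
}

function comparePaired(a: readonly number[], b: readonly number[]): PairedComparison {
  if (a.length !== b.length || a.length < 2) throw new Error('need equal lengths, n >= 2');
  const n = a.length;
  const d = a.map((ai, i) => ai - b[i]); // d_i = a_i - b_i
  const dBar = d.reduce((s, x) => s + x, 0) / n;
  const sD = Math.sqrt(d.reduce((s, x) => s + (x - dBar) ** 2, 0) / (n - 1));
  const tStatistic = dBar / (sD / Math.sqrt(n)); // NaN if every d_i is 0
  const pValue = twoTailedP(tStatistic, n - 1);
  const bBar = b.reduce((s, x) => s + x, 0) / n;
  return { meanDifference: dBar, relativeDifference: dBar / bBar, tStatistic, pValue, significant: pValue < 0.05 };
}

// Made-up baseline-corrected timings (ms) from three trials.
console.log(comparePaired([2.5, 2.75, 2.25], [1.75, 1.75, 1.5]));
```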

Exports

| Export path | Description |
|---|---|
| @apeleghq/benchmark | runSuite (main entry point) |
| @apeleghq/benchmark/stats | Statistical primitives (mean, stdDev, t-test, etc.) |
| @apeleghq/benchmark/reporters/simple | Simple console reporter |
| @apeleghq/benchmark/reporters/advanced | Advanced console reporter with full statistics |
| @apeleghq/benchmark/reporters/xunit | xUnit XML reporter for CI systems |

Contributing

Contributions are welcome. Please open issues or pull requests on the repository. If you extend the library, consider adding unit tests for edge cases as well as additional reporters.

License

This project is licensed under the Apache 2.0 License with the LLVM exception. You are free to use this package in compliance with the terms of the license. For more information, see the LICENSE file.
