

@benchkit/format

Benchmark result types and format parsers for benchkit. Parses Go bench output, Rust bench output, Hyperfine JSON, benchmark-action JSON, pytest-benchmark JSON, and OTLP metrics JSON into a single normalized OTLP metrics document.

Installation

Note: @benchkit/format is not yet published to the npm registry. Until the first release, install from source as described below.

Clone the repository, install dependencies, and build the package:

git clone https://github.com/strawgate/octo11y.git
cd octo11y
npm ci
npm run build --workspace=packages/format

Then, from your project directory, link the local package (adjust the path to where you cloned benchkit):

npm link <path-to-benchkit>/packages/format

Or use a file: reference in your project's package.json:

{
  "dependencies": {
    "@benchkit/format": "file:<path-to-benchkit>/packages/format"
  }
}

Once published, you will be able to install directly:

npm install @benchkit/format

Quick start

import { parseBenchmarks } from "@benchkit/format";

// Auto-detect the format and parse
const result = parseBenchmarks(input);

for (const bench of result.benchmarks) {
  for (const [name, metric] of Object.entries(bench.metrics)) {
    console.log(`${bench.name} ${name}: ${metric.value} ${metric.unit ?? ""}`);
  }
}

Building OTLP results programmatically

If your benchmark does not come from a tool like go test -bench, you can build OTLP results programmatically using buildOtlpResult:

import { buildOtlpResult } from "@benchkit/format";

const doc = buildOtlpResult({
  benchmarks: [
    {
      name: "mock-http-ingest",
      metrics: {
        events_per_sec: { value: 13240.5, unit: "events/sec" },
        p95_batch_ms: { value: 143.2, unit: "ms", direction: "smaller_is_better" },
        service_rss_mb: { value: 543.1, unit: "MB", direction: "smaller_is_better" },
      },
    },
  ],
  context: { sourceFormat: "otlp" },
});

const json = JSON.stringify(doc, null, 2);
// write json to results.json, then stash with format: otlp
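For example, persisting the document with Node's fs (a minimal sketch; results.json is just the filename from the comment above):

import { writeFileSync } from "fs";

writeFileSync("results.json", json);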

Shorthands:

  • numeric metrics like { parse_errors: 0 } are accepted
  • direction is inferred from unit when omitted, e.g. events/sec becomes bigger_is_better
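Both shorthands together, in a minimal sketch reusing buildOtlpResult from above:

const doc2 = buildOtlpResult({
  benchmarks: [
    {
      name: "mock-http-ingest",
      metrics: {
        parse_errors: 0,                                        // bare-number shorthand
        events_per_sec: { value: 13240.5, unit: "events/sec" }, // direction inferred: bigger_is_better
      },
    },
  ],
  context: { sourceFormat: "otlp" },
});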

Parser entry points

parseBenchmarks(input, format?)

Main entry point. Accepts a string and an optional format hint. When format is omitted or "auto", the parser inspects the input and picks the right strategy:

| Detected shape | Trigger | Format |
|---|---|---|
| JSON object with a benchmarks array whose entries have a stats object | benchmarks[0].stats is an object | pytest-benchmark |
| JSON object with a resourceMetrics array | Top-level resourceMetrics key present | otlp |
| JSON object with a results array | Top-level results key with objects containing a command string | hyperfine |
| JSON array of objects | Array whose first element has both a string name and a numeric value | benchmark-action |
| Plain text lines | Lines matching /^Benchmark\w.*\s+\d+\s+[\d.]+\s+\w+\/\w+/ | go |
| Plain text lines | Lines matching /^test\s+\S+\s+\.\.\.\s+bench:/ | rust |

If auto-detection fails, parseBenchmarks throws with a message listing the supported formats.

import { parseBenchmarks } from "@benchkit/format";

// Explicit format hint
const goResult = parseBenchmarks(goOutput, "go");

// Auto-detect (default)
const autoResult = parseBenchmarks(unknownInput);
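Since detection can throw on unrecognized input, a minimal guard might look like this (a sketch; we assume a plain Error is thrown, per the behavior described above):

try {
  const parsed = parseBenchmarks(unknownInput);
  console.log(`parsed ${parsed.benchmarks.length} benchmarks`);
} catch (err) {
  // The message lists the supported formats when detection fails.
  console.error(`Could not parse input: ${(err as Error).message}`);
}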

parseOtlp(input)

Parses OTLP metrics JSON and provides helpers for:

  • reading resource and datapoint attributes
  • discriminating metric kinds (gauge, sum, histogram)
  • reading aggregation temporality

import { parseOtlp } from "@benchkit/format";

const document = parseOtlp(otlpJson);
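The helper names themselves are not shown here. As a sketch, assuming the returned document exposes the standard OTLP metrics JSON shape (resourceMetrics → scopeMetrics → metrics), discriminating metric kinds by payload key looks like:

for (const rm of document.resourceMetrics) {
  for (const sm of rm.scopeMetrics) {
    for (const metric of sm.metrics) {
      // An OTLP metric carries exactly one of these payload keys.
      const kind = metric.gauge ? "gauge" : metric.sum ? "sum" : "histogram";
      console.log(metric.name, kind);
    }
  }
}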

parseGoBench(input)

Parses Go testing.B text output. Each benchmark line produces one Benchmark entry. The trailing processor-count suffix (e.g. the -8 in BenchmarkSort/small-8) is stripped from the name and extracted into a procs tag. Multiple value/unit pairs on the same line produce separate named metrics.

Input (typical go test -bench=. -benchmem output):

goos: linux
goarch: amd64
BenchmarkSort/small-8     500000     2345 ns/op     128 B/op     3 allocs/op
BenchmarkSort/large-8       1000   987654 ns/op   65536 B/op   512 allocs/op
BenchmarkHash-8          1000000      890 ns/op       0 B/op     0 allocs/op
PASS
ok  	example.com/mypackage	3.456s

Call:

import { parseGoBench } from "@benchkit/format";

const input = `
BenchmarkSort/small-8     500000     2345 ns/op     128 B/op     3 allocs/op
BenchmarkSort/large-8       1000   987654 ns/op   65536 B/op   512 allocs/op
BenchmarkHash-8          1000000      890 ns/op       0 B/op     0 allocs/op
`.trim();

const result = parseGoBench(input);

Result (abbreviated):

{
  "benchmarks": [
    {
      "name": "BenchmarkSort/small",
      "tags": { "procs": "8" },
      "metrics": {
        "ns_per_op":     { "value": 2345,   "unit": "ns/op",     "direction": "smaller_is_better" },
        "bytes_per_op":  { "value": 128,    "unit": "B/op",      "direction": "smaller_is_better" },
        "allocs_per_op": { "value": 3,      "unit": "allocs/op", "direction": "smaller_is_better" }
      }
    },
    {
      "name": "BenchmarkSort/large",
      "tags": { "procs": "8" },
      "metrics": {
        "ns_per_op":     { "value": 987654, "unit": "ns/op",     "direction": "smaller_is_better" },
        "bytes_per_op":  { "value": 65536,  "unit": "B/op",      "direction": "smaller_is_better" },
        "allocs_per_op": { "value": 512,    "unit": "allocs/op", "direction": "smaller_is_better" }
      }
    },
    {
      "name": "BenchmarkHash",
      "tags": { "procs": "8" },
      "metrics": {
        "ns_per_op":     { "value": 890, "unit": "ns/op", "direction": "smaller_is_better" },
        "bytes_per_op":  { "value": 0,   "unit": "B/op",  "direction": "smaller_is_better" },
        "allocs_per_op": { "value": 0,   "unit": "allocs/op", "direction": "smaller_is_better" }
      }
    }
  ]
}

parseBenchmarkAction(input)

Parses the JSON array format used by benchmark-action/github-action-benchmark. Each array entry becomes one Benchmark with a single metric called value. The range string (e.g. "± 300") is parsed into a numeric range field.

Input (JSON produced by the benchmark tool):

[
  { "name": "encode/small",  "value": 125430, "unit": "ops/sec", "range": "± 1200" },
  { "name": "encode/medium", "value":  48200, "unit": "ops/sec", "range": "± 480" },
  { "name": "decode/small",  "value":  98700, "unit": "ops/sec" },
  { "name": "latency/p99",   "value":    4.2, "unit": "ms",      "range": "+/- 0.3" }
]

Call:

import { parseBenchmarkAction } from "@benchkit/format";

const result = parseBenchmarkAction(input);

Result (abbreviated):

{
  "benchmarks": [
    {
      "name": "encode/small",
      "metrics": {
        "value": { "value": 125430, "unit": "ops/sec", "direction": "bigger_is_better", "range": 1200 }
      }
    },
    {
      "name": "encode/medium",
      "metrics": {
        "value": { "value": 48200, "unit": "ops/sec", "direction": "bigger_is_better", "range": 480 }
      }
    },
    {
      "name": "decode/small",
      "metrics": {
        "value": { "value": 98700, "unit": "ops/sec", "direction": "bigger_is_better" }
      }
    },
    {
      "name": "latency/p99",
      "metrics": {
        "value": { "value": 4.2, "unit": "ms", "direction": "smaller_is_better", "range": 0.3 }
      }
    }
  ]
}

parseRustBench(input)

Parses Rust cargo bench (libtest) text output. Each benchmark line produces one Benchmark entry.

import { parseRustBench } from "@benchkit/format";

const result = parseRustBench(
  "test sort::bench_sort   ... bench:         320 ns/iter (+/- 42)"
);
// result.benchmarks[0].metrics => { ns_per_iter: { value: 320, unit: "ns/iter", range: 42 } }

parseHyperfine(input)

Parses the JSON export from Hyperfine (hyperfine --export-json). Each result becomes a benchmark named after the command, with mean, stddev, median, min, and max metrics.

import { parseHyperfine } from "@benchkit/format";

const result = parseHyperfine(JSON.stringify({
  results: [
    {
      command: "sleep 0.1",
      mean: 0.105,
      stddev: 0.002,
      median: 0.105,
      min: 0.103,
      max: 0.108,
      times: [0.103, 0.105, 0.108]
    }
  ]
}));
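The resulting benchmark is named after the command. Abbreviated result (a sketch; we assume the timing metrics keep Hyperfine's native seconds):

// result.benchmarks[0].name         => "sleep 0.1"
// result.benchmarks[0].metrics.mean => { value: 0.105, unit: "s", direction: "smaller_is_better" }
// ...plus stddev, median, min, and max metrics of the same shape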

parsePytestBenchmark(input)

Parses pytest-benchmark JSON output (pytest --benchmark-json=results.json). Each benchmark entry becomes a Benchmark with metrics for mean (primary, seconds), ops, rounds, median, min, max, and stddev.

import { parsePytestBenchmark } from "@benchkit/format";

const result = parsePytestBenchmark(JSON.stringify({
  benchmarks: [
    {
      name: "test_sort",
      fullname: "tests/test_perf.py::test_sort",
      stats: {
        min: 0.000123,
        max: 0.000156,
        mean: 0.000134,
        stddev: 0.0000089,
        rounds: 1000,
        median: 0.000132,
        ops: 7462.68
      }
    }
  ]
}));
// result.benchmarks[0].metrics.mean  => { value: 0.000134, unit: "s", direction: "smaller_is_better", range: 0.0000089 }
// result.benchmarks[0].metrics.ops   => { value: 7462.68, unit: "ops/s", direction: "bigger_is_better" }
// result.benchmarks[0].metrics.rounds => { value: 1000, direction: "bigger_is_better" }

Python example — generate and consume pytest-benchmark output. First, the benchmark test:

# test_perf.py
def test_sort(benchmark):
    benchmark(sorted, range(1000))

Run the suite and export the results:

pytest --benchmark-json=results.json

Then consume the JSON from Node:

import { readFileSync } from "fs";
import { parsePytestBenchmark } from "@benchkit/format";

const result = parsePytestBenchmark(readFileSync("results.json", "utf-8"));
for (const bench of result.benchmarks) {
  console.log(`${bench.name}: ${bench.metrics.mean.value}s (${bench.metrics.ops.value} ops/s)`);
}

inferDirection(unit)

Infers whether a unit string represents a "bigger is better" or "smaller is better" metric. Used internally by all parsers when no explicit direction is provided.

import { inferDirection } from "@benchkit/format";

inferDirection("ops/sec");   // "bigger_is_better"
inferDirection("MB/s");      // "bigger_is_better"
inferDirection("throughput"); // "bigger_is_better"
inferDirection("ns/op");     // "smaller_is_better"
inferDirection("ms");        // "smaller_is_better"
inferDirection("B/op");      // "smaller_is_better"

The heuristic scans the lowercased unit string for substrings:

| Matched substring | Direction | Example units |
|---|---|---|
| ops/s | bigger_is_better | ops/sec, ops/s |
| op/s | bigger_is_better | op/sec, op/s |
| /sec | bigger_is_better | req/sec, events/sec |
| mb/s | bigger_is_better | MB/s, mb/s |
| throughput | bigger_is_better | throughput |
| events | bigger_is_better | events, events/sec |
| (no match) | smaller_is_better | ns/op, ms, B/op, allocs/op, ns/iter, bytes |

compareRuns(current, baseline[], config?)

Compare a current benchmark run against one or more baseline runs.

import { compareRuns } from "@benchkit/format";

const result = compareRuns(current, [baseline]);
if (result.hasRegression) {
  console.log("Regressions detected!");
}

Types

All types mirror the JSON schemas in schema/.

Sample

A time-series data point within a benchmark run. t is seconds since benchmark start; all other keys are metric values at that instant.

interface Sample {
  t: number;
  [metricName: string]: number;
}
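For example, a poll taken 2.5 seconds into a run (the metric names here are illustrative, not prescribed by the type):

const sample: Sample = {
  t: 2.5,            // seconds since benchmark start
  cpu_percent: 41.2, // any numeric metric keys are allowed by the index signature
  rss_mb: 512.7,
};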

MonitorContext

Metadata about the resource monitoring context (when monitor output is merged via stash action).

interface MonitorContext {
  monitor_version: string;
  poll_interval_ms: number;
  duration_ms: number;
  runner_os?: string;
  runner_arch?: string;
  poll_count?: number;
  kernel?: string;
  cpu_model?: string;
  cpu_count?: number;
  total_memory_mb?: number;
}

Series and index types

These types describe the aggregated files on the bench-data branch (see Data files below):

| Type | Schema | Purpose |
|---|---|---|
| IndexFile | index.schema.json | Run listing with per-run metadata |
| RunEntry | (inline in index schema) | Single entry inside IndexFile.runs |
| SeriesFile | series.schema.json | Pre-aggregated time-series for one metric |
| SeriesEntry | (inline in series schema) | Points array for one benchmark within a series |
| DataPoint | (inline in series schema) | Single {timestamp, value} point |

Metric naming conventions

When the Go and benchmark-action parsers normalize metric units into metric names, they apply these rules:

| Go unit | Metric name | Rule |
|---|---|---|
| ns/op | ns_per_op | Replace / with _per_, lowercase |
| B/op | bytes_per_op | Known alias |
| allocs/op | allocs_per_op | Replace / with _per_, lowercase |
| MB/s | mb_per_s | Known alias |

General algorithm: replace every / with _per_, replace spaces with _, then lowercase. Specific aliases (B/op → bytes_per_op, MB/s → mb_per_s, ns/iter → ns_per_iter) take precedence.
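As a sketch, the rule above could be implemented like this (metricNameFromUnit is a hypothetical helper name; the package's internal implementation may differ):

const UNIT_ALIASES: Record<string, string> = {
  "B/op": "bytes_per_op",
  "MB/s": "mb_per_s",
  "ns/iter": "ns_per_iter",
};

function metricNameFromUnit(unit: string): string {
  // Specific aliases take precedence over the general rule.
  const alias = UNIT_ALIASES[unit];
  if (alias !== undefined) return alias;
  // General rule: "/" -> "_per_", spaces -> "_", then lowercase.
  return unit.replace(/\//g, "_per_").replace(/ /g, "_").toLowerCase();
}

metricNameFromUnit("ns/op"); // "ns_per_op"
metricNameFromUnit("B/op");  // "bytes_per_op"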

Direction semantics

Every metric may declare whether higher or lower values represent improvement.

| Direction | Meaning | Examples |
|---|---|---|
| bigger_is_better | Higher values are improvements | throughput, ops/sec, MB/s |
| smaller_is_better | Lower values are improvements | latency, ns/op, allocations |

When direction is not specified in the input, all parsers call inferDirection(unit) to infer it from the unit string. See the inferDirection section for the full list of recognized unit patterns.

If no unit is provided and no direction is set, consumers should treat the metric as smaller_is_better.
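A consumer-side sketch of this precedence (explicit direction first, then unit inference via inferDirection, then the smaller_is_better fallback):

import { inferDirection } from "@benchkit/format";

function effectiveDirection(metric: { unit?: string; direction?: string }): string {
  if (metric.direction) return metric.direction;
  if (metric.unit) return inferDirection(metric.unit);
  return "smaller_is_better"; // fallback when neither is set
}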

Data files

The bench-stash and bench-aggregate actions maintain a set of JSON files on a dedicated Git branch (default bench-data). The branch layout is:

data/
├── index.json              # All runs (IndexFile)
├── runs/
│   ├── {runId}/
│   │   ├── benchmark.otlp.json     # OTLP benchmark metrics JSON for one run
│   │   └── telemetry.otlp.jsonl.gz # Optional OTLP telemetry sidecar
│   └── ...
└── series/
    ├── {metricName}.json   # Time-series for one metric (SeriesFile)
    └── ...

| File | Schema | Written by |
|---|---|---|
| data/index.json | index.schema.json | bench-aggregate |
| data/runs/{id}/benchmark.otlp.json | OTLP metrics JSON | bench-stash |
| data/series/{metric}.json | series.schema.json | bench-aggregate |
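For example, reading the run listing from a checkout of the bench-data branch (a minimal sketch; it relies only on IndexFile.runs per the table above — the fields inside each RunEntry are defined by the index schema):

import { readFileSync } from "fs";
import type { IndexFile } from "@benchkit/format";

const index = JSON.parse(readFileSync("data/index.json", "utf-8")) as IndexFile;
console.log(`${index.runs.length} runs recorded`);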

Validating your own output

Validate aggregated output against the JSON schemas:

npx ajv-cli validate -s schema/index.schema.json -d data/index.json
npx ajv-cli validate -s schema/series.schema.json -d data/series/ns_per_op.json

License

MIT