@jsleekr/reqbench
v1.0.0
API benchmarking with A/B comparison and statistical significance testing
⚡ reqbench
API benchmarking with statistical comparison
Measure latency percentiles, compare endpoints with Welch's t-test, and automate performance gates in CI
Why This Exists | Quick Start | Commands | Example Output | Scenarios | CI Integration
Why This Exists
Most load testing tools give you numbers but not answers. You get a p99 and a mean -- but is version B actually faster than version A, or did noise just fall your way this run?
reqbench goes further -- when you compare two endpoints, it runs Welch's t-test and tells you whether the difference is statistically significant or just noise. A warm-up phase discards early results so connection pool effects and DNS caches don't skew your measurements. Scenarios let you chain multi-step auth flows with variable extraction. And CI-friendly exit codes mean you can block a deploy when latency regresses past a threshold.
- Welch's t-test A/B comparison -- tells you whether the difference is real or noise, not just bigger or smaller
- Warm-up phase -- discards early requests to eliminate connection pool and JIT effects before measurement
- Multi-step YAML scenarios -- chain requests with variable extraction for auth flows and stateful endpoints
- Zero heavy dependencies -- only commander and js-yaml; no Rust binaries, no native modules
Requirements
- Node.js >= 18.0.0
Quick Start
# Install globally
npm install -g @jsleekr/reqbench
# Benchmark a single endpoint
reqbench run https://api.example.com/health
# A/B compare two endpoints
reqbench compare https://api-v1.example.com/data https://api-v2.example.com/data
# With options
reqbench run https://api.example.com/users \
-c 50 \
-d 30 \
-m POST \
-H "Authorization: Bearer TOKEN" \
-f json

Commands
reqbench run <url>
Benchmark a single endpoint and display latency percentiles, RPS, error rate, and a histogram.
| Option | Description | Default |
|--------|-------------|---------|
| -c, --concurrency <n> | Concurrent connections | 10 |
| -d, --duration <seconds> | Test duration in seconds | 10 |
| -m, --method <method> | HTTP method (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS) | GET |
| -H, --header <header> | HTTP header in Key: Value format (repeatable) | -- |
| -b, --body <body> | Request body string | -- |
| -w, --warmup <seconds> | Warm-up duration (results discarded) | 2 |
| -t, --timeout <ms> | Per-request timeout in milliseconds | 5000 |
| -f, --format <format> | Output format: terminal, json, markdown | terminal |
| -p, --profile <name> | Load options from a saved profile | -- |
reqbench compare <url1> <url2>
Benchmark both endpoints under identical conditions and compare the results with Welch's t-test. Reports p-value, statistical significance, and winner.
reqbench compare https://api-v1.example.com/endpoint https://api-v2.example.com/endpoint
# With higher concurrency and longer duration for more reliable results
reqbench compare https://v1.example.com/api https://v2.example.com/api -c 20 -d 60
# JSON output for CI pipelines
reqbench compare https://v1.example.com https://v2.example.com -f json

reqbench scenario <file>
Run a multi-step scenario from a YAML file. Supports variable extraction between steps for auth flows, token-based workflows, and any multi-request sequence.
reqbench scenario auth-flow.yaml
reqbench scenario api-workflow.yaml -c 10 -d 30

reqbench profile save|list|delete
Manage named connection profiles to avoid repeating common options.
# Save a profile
reqbench profile save myapi -u https://api.example.com -m POST -H "Authorization: Bearer TOKEN"
# List saved profiles
reqbench profile list
# Use a profile in a benchmark
reqbench run https://api.example.com/endpoint -p myapi
# Delete a profile
reqbench profile delete myapi

Profiles are stored as JSON files in ~/.reqbench/profiles/. Profile names are validated to prevent path traversal.
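The path-traversal check mentioned above can be sketched like this (the real rule lives in src/validation.ts; this regex and length cap are assumptions for illustration, not reqbench's exact implementation):

```typescript
// Sketch of a safe profile-name check: allow only simple names so a
// profile file can never resolve outside ~/.reqbench/profiles/.
// (Illustrative -- the actual validator may differ.)
function isSafeProfileName(name: string): boolean {
  // letters, digits, dashes, underscores only -- no "..", "/", or "\"
  return /^[A-Za-z0-9_-]+$/.test(name) && name.length <= 64;
}
```

Validating before any filesystem call means a hostile name like `../../etc/passwd` is rejected without ever touching the disk.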
Example Output
Terminal (default)
URL: https://api.example.com/health
Duration: 10.02s
Requests: 4523
RPS: 451.40
Error Rate: 0.00%
Latency (ms):
p50: 18.32
p95: 45.67
p99: 98.41
mean: 22.14
stdev: 15.82
min: 3.21
max: 152.88
Histogram:
0.0-30.0ms ########################## 3200 (70.8%)
30.0-60.0ms ######## 900 (19.9%)
60.0-90.0ms ### 320 ( 7.1%)
90.0-150.0ms # 103 ( 2.3%)

A/B Comparison
─────────────────────────────────────────────────────
Metric A B Diff
─────────────────────────────────────────────────────
p50 (ms) 18.32 42.15 -23.83
p95 (ms) 45.67 89.24 -43.57
p99 (ms) 98.41 156.30 -57.89
RPS 451.40 220.38 +231.02
Mean (ms) 22.14 48.33 -26.19
Stdev (ms) 15.82 32.70 -16.88
Error Rate 0.00 0.00 0.00
─────────────────────────────────────────────────────
p-value: 0.000012
Significant: Yes
Winner: A
A is faster than B by 56.5% (p=0.000012).

JSON output
reqbench run https://api.example.com/health -f json

{
"url": "https://api.example.com/health",
"duration": 10.02,
"requests": 4523,
"rps": 451.40,
"errorRate": 0.00,
"latency": {
"p50": 18.32,
"p95": 45.67,
"p99": 98.41,
"mean": 22.14,
"stdev": 15.82,
"min": 3.21,
"max": 152.88
}
}

Scenario Files
Create a YAML file to describe multi-step workflows. Variable extraction lets you pass values (such as auth tokens) between steps.
name: Auth Flow
concurrency: 5
duration: 30
steps:
- name: Login
url: https://api.example.com/auth/login
method: POST
body: '{"username":"test","password":"pass"}'
extract:
token: token
- name: Get Profile
url: https://api.example.com/users/me
method: GET
headers:
Authorization: "Bearer {{token}}"
- name: Update Profile
url: https://api.example.com/users/me
method: PUT
headers:
Authorization: "Bearer {{token}}"
body: '{"displayName":"Test User"}'

The extract map reads a field from the JSON response body and stores it as a variable. Use {{variableName}} in subsequent steps.
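The extract-then-substitute mechanics can be sketched as follows (helper names and the dot-path lookup are illustrative assumptions, not reqbench's actual API):

```typescript
// Sketch of scenario variable handling (illustrative, not reqbench's API).
type Vars = Record<string, string>;

// Pull a (possibly nested, dot-separated) field out of a parsed JSON body.
function extractField(body: unknown, path: string): string | undefined {
  let cur: any = body;
  for (const key of path.split(".")) {
    if (cur == null) return undefined;
    cur = cur[key];
  }
  return cur === undefined ? undefined : String(cur);
}

// Replace every {{name}} placeholder in a later step's header or body.
function interpolate(template: string, vars: Vars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}
```

With the Auth Flow example above, `extract: { token: token }` would read `token` from the login response, and `interpolate("Bearer {{token}}", vars)` would produce the Authorization header for the next step.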
CI Integration
Block deploys on latency regression
name: Performance Gate
on:
pull_request:
branches: [main]
jobs:
bench:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Start staging server
run: npm start &
- name: Compare PR vs main latency
run: |
npx reqbench compare \
https://main.example.com/api \
https://staging.example.com/api \
-c 20 -d 30 -f json > result.json
- name: Check regression
run: |
P50_DIFF=$(jq '.comparison.p50Diff' result.json)
if (( $(echo "$P50_DIFF > 20" | bc -l) )); then
echo "p50 latency regressed by ${P50_DIFF}ms -- blocking merge"
exit 1
fi

Post benchmark results as PR comment
- name: Run benchmark
id: bench
run: |
OUTPUT=$(npx reqbench run https://api.example.com/health -f markdown)
echo "report<<EOF" >> $GITHUB_OUTPUT
echo "$OUTPUT" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
- name: Comment on PR
uses: actions/github-script@v7
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Benchmark Results\n\n${{ steps.bench.outputs.report }}`
})

Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success -- benchmark completed normally |
| 1 | Comparison result -- B is faster (useful for A/B regression checks) |
| 2 | Validation error -- invalid URL, method, headers |
| 3 | Runtime error -- connection refused, timeout, etc. |
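A CI wrapper can branch on these codes directly; a minimal sketch (the numeric values mirror the table above, while handleExit and its labels are illustrative):

```typescript
// Map reqbench's documented exit codes to a CI decision.
// Code values come from the Exit Codes table; the helper is illustrative.
function handleExit(code: number): string {
  switch (code) {
    case 0: return "pass";         // benchmark completed normally
    case 1: return "regression";   // compare mode: B was faster
    case 2: return "config-error"; // invalid URL, method, or headers
    case 3: return "infra-error";  // connection refused, timeout, etc.
    default: return "unknown";
  }
}
```

Distinguishing code 2 from code 3 lets a pipeline fail fast on misconfiguration instead of retrying a benchmark that can never succeed.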
How It Works
Request Phase Warmup Filter Measurement Statistics Output
───────────── ───────────── ───────────── ───────────── ──────────
concurrent → discard first → collect → p50/p95/p99 → terminal
workers W seconds latencies mean/stdev json
RPS markdown
error rate
Welch's t-test
(compare mode)

- Warm-up -- Fires requests for the configured warm-up period and discards results. Eliminates connection pool startup, DNS resolution, and server-side JIT effects.
- Measurement -- Concurrent workers fire requests for the configured duration. Each latency sample is recorded with microsecond precision.
- Statistics -- Percentiles are computed from the collected sample array. Standard deviation uses Welch's online algorithm. For compare mode, Welch's t-test evaluates significance.
- Histogram -- Built with a loop-based accumulator safe for 500K+ samples (no Math.min(...array) stack overflow).
- Output -- Results are formatted and written to stdout. Exit codes reflect outcome for CI consumption.
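The statistics step above can be sketched as follows (simplified; the real implementation lives in src/stats.ts, and the nearest-rank percentile shown here is one common convention, not necessarily the one reqbench uses):

```typescript
// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Welford-style online mean/variance: one pass, numerically stable,
// no need to keep squared sums that can lose precision.
function onlineStdev(samples: number[]): { mean: number; stdev: number } {
  let n = 0, mean = 0, m2 = 0;
  for (const x of samples) {
    n++;
    const delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
  }
  return { mean, stdev: n > 1 ? Math.sqrt(m2 / (n - 1)) : 0 };
}

// Loop-based min/max: safe for 500K+ samples, where spreading the
// array into Math.min(...samples) would overflow the call stack.
function minMax(samples: number[]): [number, number] {
  let lo = Infinity, hi = -Infinity;
  for (const x of samples) { if (x < lo) lo = x; if (x > hi) hi = x; }
  return [lo, hi];
}
```

The loop-based min/max is the same trick the histogram bucketing relies on: never pass a half-million-element array as spread arguments.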
Architecture
src/
types.ts # Core types (BenchResult, CompareResult, ScenarioStep, etc.)
errors.ts # ReqBenchError, ValidationError with descriptive hints
bench.ts # Single endpoint benchmark engine
compare.ts # A/B comparison with Welch's t-test
scenario.ts # Multi-step YAML scenario runner
reporter.ts # Output formatters (terminal, json, markdown)
profile.ts # Profile save/load/delete with path traversal protection
validation.ts # URL, method, header, profile name validators
stats.ts # Statistics utilities (percentiles, stdev, t-test)
cli.ts # CLI entry point
index.ts       # Public re-exports

Security
- URL validation -- rejects non-HTTP protocols (ftp://, file://, javascript:), embedded credentials, and URLs over 2048 characters
- Header injection prevention -- rejects headers containing CRLF (\r\n)
- Method whitelisting -- only GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS are accepted
- Profile path traversal -- profile names containing ../ or special characters are rejected before any filesystem operation
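The first two checks can be sketched with Node's WHATWG URL parser (illustrative only; the actual validators live in src/validation.ts):

```typescript
// Sketch of URL and header validation (illustrative, not the exact rules).
function isSafeUrl(raw: string): boolean {
  if (raw.length > 2048) return false; // length cap from the list above
  try {
    const url = new URL(raw);
    const httpOnly = url.protocol === "http:" || url.protocol === "https:";
    const noCreds = url.username === "" && url.password === "";
    return httpOnly && noCreds;
  } catch {
    return false; // not parseable as a URL at all
  }
}

function isSafeHeaderValue(value: string): boolean {
  return !/[\r\n]/.test(value); // reject CRLF injection
}
```

Checking `url.protocol` rather than string prefixes means tricks like `JaVaScRiPt:` or whitespace padding are handled by the parser, not by hand-rolled matching.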
FAQ
Q: How does the statistical comparison work?
A: reqbench uses Welch's t-test (two-tailed) to compare the latency distributions of both endpoints. A p-value below 0.05 indicates a statistically significant difference. Welch's variant (rather than Student's) accounts for unequal sample sizes and variances. See docs/advanced-guide.md for the full formula and interpretation guide.
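The t statistic itself is simple to compute; a standalone sketch (for illustration only -- converting t and the Welch-Satterthwaite degrees of freedom into a p-value additionally requires the Student-t CDF, which reqbench handles internally and is omitted here):

```typescript
// Welch's two-sample t statistic and degrees of freedom (sketch).
function welchT(a: number[], b: number[]): { t: number; df: number } {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const sampleVar = (xs: number[], m: number) =>
    xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
  const ma = mean(a), mb = mean(b);
  const va = sampleVar(a, ma) / a.length; // variance of the mean of a
  const vb = sampleVar(b, mb) / b.length; // variance of the mean of b
  const t = (ma - mb) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation for the degrees of freedom
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```

Because each sample's variance is scaled by its own size before pooling, the test stays valid when the two runs collect different numbers of requests or have different spreads, which is exactly the situation in A/B latency comparisons.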
Q: What does "Winner: tie" mean?
A: No statistically significant difference was detected. The observed latency gap could be due to random variation. Run with -d 60 or -c 50 to collect more samples for a more reliable result.
Q: Can I benchmark HTTPS endpoints?
A: Yes. reqbench automatically detects the protocol from the URL and uses the appropriate Node.js http or https module.
Q: Does the warm-up phase affect results?
A: No. The warm-up phase fires requests but discards all latency samples. Only measurements after the warm-up window are included in statistics.
Q: Why does RPS drop with very high concurrency?
A: At high concurrency the server becomes the bottleneck, not the client. The measured RPS accurately reflects the server's throughput limit -- this is expected behavior, not a bug in reqbench.
Q: Can I run reqbench against localhost?
A: Yes. reqbench run http://localhost:3000/health works normally. Use a warm-up period to allow your local server to reach steady state before measuring.
Troubleshooting
| Problem | Likely Cause | Solution |
|---------|--------------|----------|
| Request timeout | Endpoint too slow for default timeout | Increase with -t 10000 (10s) |
| ECONNREFUSED | Server not running or wrong port | Verify the URL and that the server is accepting connections |
| ENOTFOUND | DNS resolution failed | Check the hostname; verify network connectivity |
| Error Rate: 100% | All requests failing | Check URL, method, headers, and server logs |
| 0 requests in short tests | Duration too short for slow endpoints | Increase -d (e.g., -d 30) |
| Low RPS with high concurrency | Server is the bottleneck | This is accurate -- check server resources |
| DEPTH_ZERO_SELF_SIGNED_CERT | Self-signed SSL certificate | Set NODE_TLS_REJECT_UNAUTHORIZED=0 (dev only) |
| Profile load error | Corrupt JSON in profile file | Delete with reqbench profile delete <name> and re-save |
Documentation
- Advanced Guide -- Welch's t-test formula, interpreting p-values, sample size recommendations, percentile meanings
- Integration Patterns -- GitHub Actions, Danger.js, custom CI scripts, JSON pipeline examples
License
MIT
