@jsleekr/reqbench
v1.0.0
API benchmarking with A/B comparison and statistical significance testing
⚡ reqbench
API benchmarking with statistical comparison
Measure latency percentiles, compare endpoints with Welch's t-test, and automate performance gates in CI
Why This Exists | Quick Start | Commands | Example Output | Scenarios | CI Integration
Why This Exists
Most load testing tools give you numbers but not answers. You get a p99 and a mean -- but is version B actually faster than version A, or did noise just fall your way this run?
reqbench goes further -- when you compare two endpoints, it runs Welch's t-test and tells you whether the difference is statistically significant or just noise. A warm-up phase discards early results so connection pool effects and DNS caches don't skew your measurements. Scenarios let you chain multi-step auth flows with variable extraction. And CI-friendly exit codes mean you can block a deploy when latency regresses past a threshold.
- Welch's t-test A/B comparison -- tells you whether the difference is real or noise, not just bigger or smaller
- Warm-up phase -- discards early requests to eliminate connection pool and JIT effects before measurement
- Multi-step YAML scenarios -- chain requests with variable extraction for auth flows and stateful endpoints
- Zero heavy dependencies -- only commander and js-yaml; no Rust binaries, no native modules
Requirements
- Node.js >= 18.0.0
Quick Start
# Install globally
npm install -g @jsleekr/reqbench
# Benchmark a single endpoint
reqbench run https://api.example.com/health
# A/B compare two endpoints
reqbench compare https://api-v1.example.com/data https://api-v2.example.com/data
# With options
reqbench run https://api.example.com/users \
-c 50 \
-d 30 \
-m POST \
-H "Authorization: Bearer TOKEN" \
-f json

Commands
reqbench run <url>
Benchmark a single endpoint and display latency percentiles, RPS, error rate, and a histogram.
| Option | Description | Default |
|--------|-------------|---------|
| -c, --concurrency <n> | Concurrent connections | 10 |
| -d, --duration <seconds> | Test duration in seconds | 10 |
| -m, --method <method> | HTTP method (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS) | GET |
| -H, --header <header> | HTTP header in Key: Value format (repeatable) | -- |
| -b, --body <body> | Request body string | -- |
| -w, --warmup <seconds> | Warm-up duration (results discarded) | 2 |
| -t, --timeout <ms> | Per-request timeout in milliseconds | 5000 |
| -f, --format <format> | Output format: terminal, json, markdown | terminal |
| -p, --profile <name> | Load options from a saved profile | -- |
reqbench compare <url1> <url2>
Benchmark both endpoints under identical conditions and compare the results with Welch's t-test. Reports p-value, statistical significance, and winner.
reqbench compare https://api-v1.example.com/endpoint https://api-v2.example.com/endpoint
# With higher concurrency and longer duration for more reliable results
reqbench compare https://v1.example.com/api https://v2.example.com/api -c 20 -d 60
# JSON output for CI pipelines
reqbench compare https://v1.example.com https://v2.example.com -f json

reqbench scenario <file>
Run a multi-step scenario from a YAML file. Supports variable extraction between steps for auth flows, token-based workflows, and any multi-request sequence.
reqbench scenario auth-flow.yaml
reqbench scenario api-workflow.yaml -c 10 -d 30

reqbench profile save|list|delete
Manage named connection profiles to avoid repeating common options.
# Save a profile
reqbench profile save myapi -u https://api.example.com -m POST -H "Authorization: Bearer TOKEN"
# List saved profiles
reqbench profile list
# Use a profile in a benchmark
reqbench run https://api.example.com/endpoint -p myapi
# Delete a profile
reqbench profile delete myapi

Profiles are stored as JSON files in ~/.reqbench/profiles/. Profile names are validated to prevent path traversal.
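The path-traversal check mentioned above can be sketched like this (the real rule lives in src/validation.ts; this regex and length cap are assumptions for illustration, not reqbench's exact implementation):

```typescript
// Sketch of a safe profile-name check: allow only simple names so a
// profile file can never resolve outside ~/.reqbench/profiles/.
// (Illustrative -- the actual validator may differ.)
function isSafeProfileName(name: string): boolean {
  // letters, digits, dashes, underscores only -- no "..", "/", or "\"
  return /^[A-Za-z0-9_-]+$/.test(name) && name.length <= 64;
}
```

Validating before any filesystem call means a hostile name like `../../etc/passwd` is rejected without ever touching the disk.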
Example Output
Terminal (default)
URL: https://api.example.com/health
Duration: 10.02s
Requests: 4523
RPS: 451.40
Error Rate: 0.00%
Latency (ms):
p50: 18.32
p95: 45.67
p99: 98.41
mean: 22.14
stdev: 15.82
min: 3.21
max: 152.88
Histogram:
0.0-30.0ms ########################## 3200 (70.8%)
30.0-60.0ms ######## 900 (19.9%)
60.0-90.0ms ### 320 ( 7.1%)
90.0-150.0ms # 103 ( 2.3%)

A/B Comparison
─────────────────────────────────────────────────────
Metric A B Diff
─────────────────────────────────────────────────────
p50 (ms) 18.32 42.15 -23.83
p95 (ms) 45.67 89.24 -43.57
p99 (ms) 98.41 156.30 -57.89
RPS 451.40 220.38 +231.02
Mean (ms) 22.14 48.33 -26.19
Stdev (ms) 15.82 32.70 -16.88
Error Rate 0.00 0.00 0.00
─────────────────────────────────────────────────────
p-value: 0.000012
Significant: Yes
Winner: A
A is faster than B by 56.5% (p=0.000012).

JSON output
reqbench run https://api.example.com/health -f json

{
"url": "https://api.example.com/health",
"duration": 10.02,
"requests": 4523,
"rps": 451.40,
"errorRate": 0.00,
"latency": {
"p50": 18.32,
"p95": 45.67,
"p99": 98.41,
"mean": 22.14,
"stdev": 15.82,
"min": 3.21,
"max": 152.88
}
}

Scenario Files
Create a YAML file to describe multi-step workflows. Variable extraction lets you pass values (such as auth tokens) between steps.
name: Auth Flow
concurrency: 5
duration: 30
steps:
- name: Login
url: https://api.example.com/auth/login
method: POST
body: '{"username":"test","password":"pass"}'
extract:
token: token
- name: Get Profile
url: https://api.example.com/users/me
method: GET
headers:
Authorization: "Bearer {{token}}"
- name: Update Profile
url: https://api.example.com/users/me
method: PUT
headers:
Authorization: "Bearer {{token}}"
body: '{"displayName":"Test User"}'

The extract map reads a field from the JSON response body and stores it as a variable. Use {{variableName}} in subsequent steps.
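The extract-then-substitute mechanics can be sketched as follows (helper names and the dot-path lookup are illustrative assumptions, not reqbench's actual API):

```typescript
// Sketch of scenario variable handling (illustrative, not reqbench's API).
type Vars = Record<string, string>;

// Pull a (possibly nested, dot-separated) field out of a parsed JSON body.
function extractField(body: unknown, path: string): string | undefined {
  let cur: any = body;
  for (const key of path.split(".")) {
    if (cur == null) return undefined;
    cur = cur[key];
  }
  return cur === undefined ? undefined : String(cur);
}

// Replace every {{name}} placeholder in a later step's header or body.
function interpolate(template: string, vars: Vars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}
```

With the Auth Flow example above, `extract: { token: token }` would read `token` from the login response, and `interpolate("Bearer {{token}}", vars)` would produce the Authorization header for the next step.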
CI Integration
Block deploys on latency regression
name: Performance Gate
on:
pull_request:
branches: [main]
jobs:
bench:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Start staging server
run: npm start &
- name: Compare PR vs main latency
run: |
npx reqbench compare \
https://main.example.com/api \
https://staging.example.com/api \
-c 20 -d 30 -f json > result.json
- name: Check regression
run: |
P50_DIFF=$(jq '.comparison.p50Diff' result.json)
if (( $(echo "$P50_DIFF > 20" | bc -l) )); then
echo "p50 latency regressed by ${P50_DIFF}ms -- blocking merge"
exit 1
fi

Post benchmark results as PR comment
- name: Run benchmark
id: bench
run: |
OUTPUT=$(npx reqbench run https://api.example.com/health -f markdown)
echo "report<<EOF" >> $GITHUB_OUTPUT
echo "$OUTPUT" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
- name: Comment on PR
uses: actions/github-script@v7
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Benchmark Results\n\n${{ steps.bench.outputs.report }}`
})

Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success -- benchmark completed normally |
| 1 | Comparison result -- B is faster (useful for A/B regression checks) |
| 2 | Validation error -- invalid URL, method, headers |
| 3 | Runtime error -- connection refused, timeout, etc. |
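A CI wrapper can branch on these codes directly; a minimal sketch (the numeric values mirror the table above, while handleExit and its labels are illustrative):

```typescript
// Map reqbench's documented exit codes to a CI decision.
// Code values come from the Exit Codes table; the helper is illustrative.
function handleExit(code: number): string {
  switch (code) {
    case 0: return "pass";         // benchmark completed normally
    case 1: return "regression";   // compare mode: B was faster
    case 2: return "config-error"; // invalid URL, method, or headers
    case 3: return "infra-error";  // connection refused, timeout, etc.
    default: return "unknown";
  }
}
```

Distinguishing code 2 from code 3 lets a pipeline fail fast on misconfiguration instead of retrying a benchmark that can never succeed.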
How It Works
Request Phase Warmup Filter Measurement Statistics Output
───────────── ───────────── ───────────── ───────────── ──────────
concurrent → discard first → collect → p50/p95/p99 → terminal
workers W seconds latencies mean/stdev json
RPS markdown
error rate
Welch's t-test
(compare mode)

- Warm-up -- Fires requests for the configured warm-up period and discards results. Eliminates connection pool startup, DNS resolution, and server-side JIT effects.
- Measurement -- Concurrent workers fire requests for the configured duration. Each latency sample is recorded with microsecond precision.
- Statistics -- Percentiles are computed from the collected sample array. Standard deviation uses Welch's online algorithm. For compare mode, Welch's t-test evaluates significance.
- Histogram -- Built with a loop-based accumulator safe for 500K+ samples (no Math.min(...array) stack overflow).
- Output -- Results are formatted and written to stdout. Exit codes reflect outcome for CI consumption.
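The statistics step above can be sketched as follows (simplified; the real implementation lives in src/stats.ts, and the nearest-rank percentile shown here is one common convention, not necessarily the one reqbench uses):

```typescript
// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// Welford-style online mean/variance: one pass, numerically stable,
// no need to keep squared sums that can lose precision.
function onlineStdev(samples: number[]): { mean: number; stdev: number } {
  let n = 0, mean = 0, m2 = 0;
  for (const x of samples) {
    n++;
    const delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
  }
  return { mean, stdev: n > 1 ? Math.sqrt(m2 / (n - 1)) : 0 };
}

// Loop-based min/max: safe for 500K+ samples, where spreading the
// array into Math.min(...samples) would overflow the call stack.
function minMax(samples: number[]): [number, number] {
  let lo = Infinity, hi = -Infinity;
  for (const x of samples) { if (x < lo) lo = x; if (x > hi) hi = x; }
  return [lo, hi];
}
```

The loop-based min/max is the same trick the histogram bucketing relies on: never pass a half-million-element array as spread arguments.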
Architecture
src/
types.ts # Core types (BenchResult, CompareResult, ScenarioStep, etc.)
errors.ts # ReqBenchError, ValidationError with descriptive hints
bench.ts # Single endpoint benchmark engine
compare.ts # A/B comparison with Welch's t-test
scenario.ts # Multi-step YAML scenario runner
reporter.ts # Output formatters (terminal, json, markdown)
profile.ts # Profile save/load/delete with path traversal protection
validation.ts # URL, method, header, profile name validators
stats.ts # Statistics utilities (percentiles, stdev, t-test)
cli.ts # CLI entry point
index.ts       # Public re-exports

Security
- URL validation -- rejects non-HTTP protocols (ftp://, file://, javascript:), embedded credentials, and URLs over 2048 characters
- Header injection prevention -- rejects headers containing CRLF (\r\n)
- Method whitelisting -- only GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS are accepted
- Profile path traversal -- profile names containing ../ or special characters are rejected before any filesystem operation
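The first two checks can be sketched with Node's WHATWG URL parser (illustrative only; the actual validators live in src/validation.ts):

```typescript
// Sketch of URL and header validation (illustrative, not the exact rules).
function isSafeUrl(raw: string): boolean {
  if (raw.length > 2048) return false; // length cap from the list above
  try {
    const url = new URL(raw);
    const httpOnly = url.protocol === "http:" || url.protocol === "https:";
    const noCreds = url.username === "" && url.password === "";
    return httpOnly && noCreds;
  } catch {
    return false; // not parseable as a URL at all
  }
}

function isSafeHeaderValue(value: string): boolean {
  return !/[\r\n]/.test(value); // reject CRLF injection
}
```

Checking `url.protocol` rather than string prefixes means tricks like `JaVaScRiPt:` or whitespace padding are handled by the parser, not by hand-rolled matching.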
FAQ
Q: How does the statistical comparison work?
A: reqbench uses Welch's t-test (two-tailed) to compare the latency distributions of both endpoints. A p-value below 0.05 indicates a statistically significant difference. Welch's variant (rather than Student's) accounts for unequal sample sizes and variances. See docs/advanced-guide.md for the full formula and interpretation guide.
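The t statistic itself is simple to compute; a standalone sketch (for illustration only -- converting t and the Welch-Satterthwaite degrees of freedom into a p-value additionally requires the Student-t CDF, which reqbench handles internally and is omitted here):

```typescript
// Welch's two-sample t statistic and degrees of freedom (sketch).
function welchT(a: number[], b: number[]): { t: number; df: number } {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const sampleVar = (xs: number[], m: number) =>
    xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
  const ma = mean(a), mb = mean(b);
  const va = sampleVar(a, ma) / a.length; // variance of the mean of a
  const vb = sampleVar(b, mb) / b.length; // variance of the mean of b
  const t = (ma - mb) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation for the degrees of freedom
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```

Because each sample's variance is scaled by its own size before pooling, the test stays valid when the two runs collect different numbers of requests or have different spreads, which is exactly the situation in A/B latency comparisons.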
Q: What does "Winner: tie" mean?
A: No statistically significant difference was detected. The observed latency gap could be due to random variation. Run with -d 60 or -c 50 to collect more samples for a more reliable result.
Q: Can I benchmark HTTPS endpoints?
A: Yes. reqbench automatically detects the protocol from the URL and uses the appropriate Node.js http or https module.
Q: Does the warm-up phase affect results?
A: No. The warm-up phase fires requests but discards all latency samples. Only measurements after the warm-up window are included in statistics.
Q: Why does RPS drop with very high concurrency?
A: At high concurrency the server becomes the bottleneck, not the client. The measured RPS accurately reflects the server's throughput limit -- this is expected behavior, not a bug in reqbench.
Q: Can I run reqbench against localhost?
A: Yes. reqbench run http://localhost:3000/health works normally. Use a warm-up period to allow your local server to reach steady state before measuring.
Troubleshooting
| Problem | Likely Cause | Solution |
|---------|--------------|----------|
| Request timeout | Endpoint too slow for default timeout | Increase with -t 10000 (10s) |
| ECONNREFUSED | Server not running or wrong port | Verify the URL and that the server is accepting connections |
| ENOTFOUND | DNS resolution failed | Check the hostname; verify network connectivity |
| Error Rate: 100% | All requests failing | Check URL, method, headers, and server logs |
| 0 requests in short tests | Duration too short for slow endpoints | Increase -d (e.g., -d 30) |
| Low RPS with high concurrency | Server is the bottleneck | This is accurate -- check server resources |
| DEPTH_ZERO_SELF_SIGNED_CERT | Self-signed SSL certificate | Set NODE_TLS_REJECT_UNAUTHORIZED=0 (dev only) |
| Profile load error | Corrupt JSON in profile file | Delete with reqbench profile delete <name> and re-save |
Documentation
- Advanced Guide -- Welch's t-test formula, interpreting p-values, sample size recommendations, percentile meanings
- Integration Patterns -- GitHub Actions, Danger.js, custom CI scripts, JSON pipeline examples
License
MIT
