llm-speed-bench

v1.5.2

Published a month ago

A CLI tool to benchmark the performance of OpenAI-compatible LLM providers.


mcowger

llm benchmark performance openai speed

LLM Speed Bench

llm-speed-bench is a command-line interface (CLI) tool for benchmarking the performance of Large Language Model (LLM) providers that offer an OpenAI-compatible API.

It reports detailed data on the output speed and latency of different models and providers. Each request is timed from the moment it is sent until the final token of the response is received, using the provider's streaming API.
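Timing a request from send to final token means reading the response as it streams. The TypeScript sketch below shows one way to timestamp each streamed chunk. It is not taken from the tool's source. It assumes the standard OpenAI chat-completions streaming format, where each `data:` line carries `choices[0].delta.content` and the stream ends with `data: [DONE]`. It also assumes a base URL that already includes the version path, as in `https://api.openai.com/v1`. `SseDeltaParser` and `streamTimedDeltas` are hypothetical names.

```typescript
// Sketch only: timestamps each content delta of an OpenAI-style SSE stream.
// Assumes events like `data: {"choices":[{"delta":{"content":"..."}}]}`
// followed by a final `data: [DONE]`.

interface TimedDelta {
  text: string;
  atMs: number; // milliseconds since the request was sent
}

class SseDeltaParser {
  private buffer = "";
  readonly deltas: TimedDelta[] = [];
  done = false;

  constructor(private readonly startMs: number) {}

  // Feed one raw text chunk. An SSE line can be split across network chunks,
  // so the trailing partial line stays buffered until its newline arrives.
  push(chunk: string, nowMs: number): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? "";
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data:")) continue; // blank separators, comments
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        this.done = true;
        continue;
      }
      const event = JSON.parse(payload);
      // The first chunk often carries only the role, with no content.
      const text: string | undefined = event.choices?.[0]?.delta?.content;
      if (text) this.deltas.push({ text, atMs: nowMs - this.startMs });
    }
  }
}

// Hypothetical driver: sends one streaming chat completion and returns the
// timed deltas. A delta is usually, but not always, a single token.
async function streamTimedDeltas(
  apiBaseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): Promise<TimedDelta[]> {
  const start = performance.now();
  const res = await fetch(`${apiBaseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, stream: true, messages: [{ role: "user", content: prompt }] }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: HTTP ${res.status}`);
  const parser = new SseDeltaParser(start);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    parser.push(decoder.decode(value, { stream: true }), performance.now());
  }
  // Flush the decoder and any unterminated last line.
  parser.push(decoder.decode() + "\n", performance.now());
  return parser.deltas;
}
```

Keeping the parser separate from the network code lets it be tested with hand-written chunks, including lines split across chunk boundaries.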

Features

OpenAI-Compatible: Works with any API that adheres to the OpenAI specification for streaming chat completions.
Streaming First: Benchmarks performance by leveraging the provider's streaming API to get detailed timing data.
Detailed Performance Metrics: Collects and calculates a comprehensive set of metrics, including token counts, time to first token, inter-token latency, and overall throughput.
ASCII Graphs: Visualize tokens per second (TPS) and inter-token latency over time with the --graph option.
Flexible Configuration: Manage inputs via both command-line arguments and environment variables.
Multiple Output Formats: Presents results in a clean, human-readable format, with an option for machine-readable JSON.
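Most of these metrics can be derived from a single list of token arrival times. This is a minimal sketch. It assumes one timestamp per output token, in milliseconds since the request was sent. It also assumes the overall output rate is output tokens divided by total wall-clock seconds; the tool may instead exclude the time to first token. `computeMetrics` and `BasicMetrics` are hypothetical names.

```typescript
// Sketch only: headline metrics from per-token arrival times.
// Assumptions: one timestamp per output token, in milliseconds since the
// request was sent; output rate = tokens / total wall-clock seconds.

interface BasicMetrics {
  timeToFirstTokenMs: number;
  totalWallClockTimeMs: number;
  overallOutputRateTps: number;
  interTokenLatenciesMs: number[];
}

function computeMetrics(arrivalsMs: number[]): BasicMetrics {
  if (arrivalsMs.length === 0) throw new Error("no output tokens received");
  const timeToFirstTokenMs = arrivalsMs[0];
  // The last token's arrival marks the end of the measurement.
  const totalWallClockTimeMs = arrivalsMs[arrivalsMs.length - 1];
  // Gap between each token and the one before it.
  const interTokenLatenciesMs = arrivalsMs.slice(1).map((t, i) => t - arrivalsMs[i]);
  const overallOutputRateTps = arrivalsMs.length / (totalWallClockTimeMs / 1000);
  return { timeToFirstTokenMs, totalWallClockTimeMs, overallOutputRateTps, interTokenLatenciesMs };
}
```

For example, `computeMetrics([250, 500, 750, 1000])` gives a time to first token of 250 ms, a wall-clock time of 1,000 ms, an output rate of 4 tokens/sec, and inter-token latencies of `[250, 250, 250]`.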

Installation

Bun (Recommended)

```
bun install
```

Running

```
bun run src/index.ts [options]
```

Usage

Configuration can be provided through command-line arguments or environment variables.

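Configuration

--api-base-url (env: LLM_API_URL): Base URL of the OpenAI-compatible API, e.g. https://api.openai.com/v1
--api-key (env: LLM_API_KEY): API key for the provider
--model (env: LLM_MODEL_NAME): Name of the model to benchmark
--prompt (env: LLM_PROMPT): Prompt to send to the model
--json: Print results as machine-readable JSON
--graph: Display ASCII graphs of TPS and inter-token latency over time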

Examples

Using Command-Line Arguments

```
bun run src/index.ts \
  --api-base-url "https://api.openai.com/v1" \
  --api-key "sk-..." \
  --model "gpt-4o" \
  --prompt "Tell me a short story about a robot who discovers music."
```

Using Environment Variables

```
export LLM_API_URL="https://api.openai.com/v1"
export LLM_API_KEY="sk-..."
export LLM_MODEL_NAME="gpt-4o"
export LLM_PROMPT="Tell me a short story about a robot who discovers music."

bun run src/index.ts
```
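This README documents both configuration sources but not which one wins when both are set. The sketch below assumes that a command-line flag overrides the matching environment variable. It also treats all four settings as required. Both rules are assumptions. The variable names come from the environment-variable example; `resolveConfig`, `BenchConfig`, and `ENV_NAMES` are hypothetical.

```typescript
// Sketch only: merge command-line flags with environment variables.
// Assumptions: a flag overrides its environment variable, and all four
// settings are required. Only the variable names come from the README.

interface BenchConfig {
  apiBaseUrl: string;
  apiKey: string;
  model: string;
  prompt: string;
}

// Environment variable names as used in the example above.
const ENV_NAMES: Record<keyof BenchConfig, string> = {
  apiBaseUrl: "LLM_API_URL",
  apiKey: "LLM_API_KEY",
  model: "LLM_MODEL_NAME",
  prompt: "LLM_PROMPT",
};

function resolveConfig(
  flags: Partial<BenchConfig>,
  env: Record<string, string | undefined>,
): BenchConfig {
  const missing: string[] = [];
  // Prefer the flag, fall back to the environment, record anything absent.
  const pick = (key: keyof BenchConfig): string => {
    const value = flags[key] ?? env[ENV_NAMES[key]];
    if (!value) missing.push(ENV_NAMES[key]);
    return value ?? "";
  };
  const config: BenchConfig = {
    apiBaseUrl: pick("apiBaseUrl"),
    apiKey: pick("apiKey"),
    model: pick("model"),
    prompt: pick("prompt"),
  };
  if (missing.length > 0) throw new Error(`missing configuration: ${missing.join(", ")}`);
  return config;
}
```

Under this rule, `resolveConfig({ model: "gpt-4o" }, process.env)` would take the model from `--model` and everything else from the exported variables.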

Getting JSON Output

```
bun run src/index.ts --json > results.json
```

Showing ASCII Graphs

```
bun run src/index.ts --graph
```

This displays two ASCII graphs:

TPS Over Time: Tokens per second throughout the response
Inter-Token Latency Over Time: Latency between consecutive tokens

The graphs automatically adjust to your terminal width.
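Fitting a graph to the terminal width mostly means reducing a long series to one value per column. The sketch below illustrates the general approach and is not the tool's renderer. It averages each bucket of samples and draws columns of `#`. `downsample` and `renderChart` are hypothetical names.

```typescript
// Sketch only, not the tool's renderer: fit a series to the terminal width
// by averaging buckets, then draw it as columns of '#' characters.

function downsample(values: number[], width: number): number[] {
  if (values.length <= width) return values.slice();
  const out: number[] = [];
  for (let col = 0; col < width; col++) {
    // Each column covers a contiguous, non-empty slice of the input.
    const start = Math.floor((col * values.length) / width);
    const end = Math.floor(((col + 1) * values.length) / width);
    const bucket = values.slice(start, end);
    out.push(bucket.reduce((sum, v) => sum + v, 0) / bucket.length);
  }
  return out;
}

function renderChart(values: number[], width: number, height = 8): string {
  const cols = downsample(values, width);
  const max = Math.max(0, ...cols);
  const rows: string[] = [];
  for (let level = height; level >= 1; level--) {
    // A column is filled on this row if its scaled height reaches the row.
    const row = cols.map((v) => (max > 0 && Math.round((v / max) * height) >= level ? "#" : " "));
    rows.push(row.join(""));
  }
  return rows.join("\n");
}
```

A call such as `renderChart(tpsSeries, process.stdout.columns ?? 80)` would size the graph to the current terminal and fall back to 80 columns when output is piped. Here `tpsSeries` is a placeholder. Averaging smooths out spikes. Taking the maximum of each bucket instead would keep latency outliers visible.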

Output Format

Standard Output

The default output is a human-readable summary:

```
LLM Benchmark Results
=======================

Configuration
-----------------------
Provider API Base:   https://api.groq.com/openai
Model:               llama3-70b-8192

Metrics
-----------------------
Time to First Token:   152 ms
Total Wall Clock Time: 2,130 ms
Overall Output Rate:   234.7 tokens/sec

Token Counts
-----------------------
Prompt Tokens:         35 (estimated)
Output Tokens:         450

Inter-Token Latency (ms)
-----------------------
Min:                 2 ms
Mean:                4.1 ms
Median:              4 ms
Max:                 15 ms
p90:                 6 ms
p95:                 8 ms
p99:                 12 ms
```
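The inter-token latency block summarizes the gaps between consecutive tokens. This sketch computes these statistics using nearest-rank percentiles. That method is an assumption. An interpolating definition, which the tool may use, can give slightly different p90, p95, and p99 values for the same data. `latencyStats` and `percentile` are hypothetical names.

```typescript
// Sketch only: summary statistics for inter-token latencies.
// Assumption: nearest-rank percentiles; an interpolating method (which the
// tool might use) can give slightly different p90/p95/p99 values.

interface LatencyStats {
  min: number;
  mean: number;
  median: number;
  max: number;
  p90: number;
  p95: number;
  p99: number;
}

// Nearest-rank: smallest sample with at least p% of samples at or below it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

function latencyStats(latenciesMs: number[]): LatencyStats {
  if (latenciesMs.length === 0) throw new Error("need at least one latency sample");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const mid = Math.floor(n / 2);
  // Even-length samples average the two middle values.
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    min: sorted[0],
    mean,
    median,
    max: sorted[n - 1],
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```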

JSON Output (`--json`)

The JSON output includes all the calculated metrics and configuration details.

```
{
  "configuration": {
    "apiBaseUrl": "https://api.groq.com/openai",
    "model": "llama3-70b-8192"
  },
  "metrics": {
    "timeToFirstTokenMs": 152,
    "totalWallClockTimeMs": 2130,
    "overallOutputRateTps": 234.7
  },
  "tokenCounts": {
    "promptTokens": 35,
    "outputTokens": 450
  },
  "interTokenLatencyMs": {
    "min": 2,
    "mean": 4.1,
    "median": 4,
    "max": 15,
    "p90": 6,
    "p95": 8,
    "p99": 12
  }
}
```
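For scripts that post-process `results.json`, a TypeScript type that mirrors the sample above is a convenient starting point. This is a sketch. The field names come from the sample output, and the real output may contain more fields. `BenchResult`, `parseBenchResult`, `loadBenchResult`, and `summarize` are hypothetical names.

```typescript
import { readFileSync } from "node:fs";

// Sketch only: a type mirroring the sample JSON output, for scripts that
// post-process `--json` results. The real output may contain more fields.

interface BenchResult {
  configuration: { apiBaseUrl: string; model: string };
  metrics: {
    timeToFirstTokenMs: number;
    totalWallClockTimeMs: number;
    overallOutputRateTps: number;
  };
  tokenCounts: { promptTokens: number; outputTokens: number };
  interTokenLatencyMs: {
    min: number; mean: number; median: number; max: number;
    p90: number; p95: number; p99: number;
  };
}

function parseBenchResult(text: string): BenchResult {
  const data = JSON.parse(text);
  // Light structural check so an unrelated or truncated file fails loudly.
  if (
    typeof data?.metrics?.timeToFirstTokenMs !== "number" ||
    typeof data?.tokenCounts?.outputTokens !== "number"
  ) {
    throw new Error("not a benchmark result");
  }
  return data as BenchResult;
}

function loadBenchResult(path: string): BenchResult {
  return parseBenchResult(readFileSync(path, "utf8"));
}

// One-line summary, e.g. for comparing several saved runs.
function summarize(r: BenchResult): string {
  return `${r.configuration.model}: TTFT ${r.metrics.timeToFirstTokenMs} ms, ${r.metrics.overallOutputRateTps} tokens/sec`;
}
```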

Development

Running with ts-node

To run the tool in development mode without building, you can use ts-node:

```
npx ts-node src/index.ts --api-base-url ...
```

Local Installation and Testing

To test the CLI locally as if it were installed globally, use npm link. This lets you check the final command-line experience before publishing.

Build the project: Make sure your latest changes are compiled.
```
npm run build
```
Link the package: This creates a global symbolic link to your local project.
```
npm link
```
Run the command globally: You can now run the command from any directory.
```
llm-speed-bench --api-base-url "..." --api-key "..."
```
Rebuild after changes: Whenever you change the source code, just re-run the build command. The symbolic link will ensure your global command always uses the latest compiled code.
```
npm run build
```
Unlink the package: When you're done with local testing, you can remove the global link.
```
npm unlink -g llm-speed-bench
```