llm-speed-bench
v1.5.2
A CLI tool to benchmark the performance of OpenAI-compatible LLM providers.
LLM Speed Bench
llm-speed-bench is a command-line interface (CLI) tool for benchmarking the performance of Large Language Model (LLM) providers that offer an OpenAI-compatible API.
It is designed to provide detailed, actionable data on the output speed and latency characteristics of different models and providers. It measures key performance indicators from the moment a request is sent until the final token of the response is received, with a focus on streaming APIs.
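The following minimal sketch shows how per-chunk timestamps could be collected from an OpenAI-style stream. It assumes the response body has already been split into Server-Sent Events lines (`data: {...}`, ending with `data: [DONE]`), and it treats every chunk with a non-empty `choices[0].delta.content` as one arrival. A single chunk can carry more than one token, so this counts chunks, not exact tokens. `measureStream`, `extractDelta`, and `StreamTiming` are hypothetical names, not this tool's internals.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface StreamTiming {
  timeToFirstTokenMs: number | null; // null when no content ever arrived
  interTokenLatenciesMs: number[];   // gaps between consecutive content chunks
  totalWallClockTimeMs: number;      // request start until the stream ends
  contentChunks: number;             // chunks that carried non-empty text
}

// Returns the text delta carried by one SSE line, or null if there is none
// (blank lines, comments, role-only chunks, and the [DONE] sentinel).
function extractDelta(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  const content = chunk?.choices?.[0]?.delta?.content;
  return typeof content === "string" && content.length > 0 ? content : null;
}

// Records the arrival time of every content-bearing chunk.
// `startMs` must come from the same clock, taken just before the request is sent.
async function measureStream(
  lines: AsyncIterable<string>,
  startMs: number,
  now: () => number = () => performance.now(),
): Promise<StreamTiming> {
  const arrivals: number[] = [];
  for await (const line of lines) {
    if (extractDelta(line) !== null) arrivals.push(now());
  }
  const endMs = now();
  // arrivals[i + 1] - arrivals[i] for each consecutive pair
  const gaps = arrivals.slice(1).map((t, i) => t - arrivals[i]);
  return {
    timeToFirstTokenMs: arrivals.length > 0 ? arrivals[0] - startMs : null,
    interTokenLatenciesMs: gaps,
    totalWallClockTimeMs: endMs - startMs,
    contentChunks: arrivals.length,
  };
}
```

Injecting the `now` clock keeps the function deterministic in tests. In real use, take `startMs = performance.now()` immediately before issuing the request.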
Features
- OpenAI-Compatible: Works with any API that adheres to the OpenAI specification for streaming chat completions.
- Streaming First: Benchmarks performance by leveraging the provider's streaming API to get detailed timing data.
- Detailed Performance Metrics: Collects and calculates a comprehensive set of metrics, including token counts, time to first token, inter-token latency, and overall throughput.
- ASCII Graphs: Visualize TPS (tokens per second) and inter-token latency over time with the `--graph` option.
- Flexible Configuration: Manage inputs via both command-line arguments and environment variables.
- Multiple Output Formats: Presents results in a clean, human-readable format, with an option for machine-readable JSON.
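The inter-token latency statistics listed under Detailed Performance Metrics (min, mean, median, max, p90, p95, p99) can be derived from the raw gaps between chunks. This sketch assumes the nearest-rank percentile definition and an average-of-the-two-middle-values median. The tool itself may interpolate, so small differences from its reported numbers are possible. `summarizeLatencies` and `LatencySummary` are hypothetical names.

```typescript
// Hypothetical summary shape mirroring the "Inter-Token Latency (ms)" report.
interface LatencySummary {
  min: number;
  mean: number;
  median: number;
  max: number;
  p90: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array:
// the smallest value such that at least p% of samples are <= it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const clamped = Math.min(Math.max(rank, 1), sorted.length);
  return sorted[clamped - 1];
}

function summarizeLatencies(latenciesMs: number[]): LatencySummary {
  if (latenciesMs.length === 0) {
    throw new Error("no inter-token latencies to summarize");
  }
  const sorted = [...latenciesMs].sort((a, b) => a - b); // numeric sort
  const n = sorted.length;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  return {
    min: sorted[0],
    mean,
    median,
    max: sorted[n - 1],
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

Sorting a copy keeps the caller's array in arrival order, which matters if the same series is later plotted over time.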
Installation
Bun (Recommended)
```bash
bun install
```
Running
```bash
bun run src/index.ts [options]
```
Usage
Configuration can be provided through command-line arguments or environment variables.
Configuration
| Parameter | CLI Argument | Environment Variable | Required | Description |
| :--- | :--- | :--- | :--- | :--- |
| API Base URL | --api-base-url <url> | LLM_API_URL | Yes | The base URL for the OpenAI-compatible API. |
| API Key | --api-key <key> | LLM_API_KEY | Yes | The authentication key for the API. |
| Model Name | --model <name> | LLM_MODEL_NAME | Yes | The specific model to be benchmarked (e.g., gpt-4o). |
| Prompt | --prompt <text> | LLM_PROMPT | Yes | The input text to send to the model. |
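Resolving these parameters from two sources could look like the following sketch, which uses `parseArgs` from `node:util` (available in Node.js 18.3+ and in Bun) and also reads the `--json` and `--graph` flags used in the examples. The precedence rule, where a CLI argument overrides its environment variable, is an assumption, since the configuration table does not say which source wins. `resolveConfig` and `BenchConfig` are hypothetical names.

```typescript
import { parseArgs } from "node:util";

// Hypothetical shape of the resolved settings; not this tool's actual type.
interface BenchConfig {
  apiBaseUrl: string;
  apiKey: string;
  model: string;
  prompt: string;
  json: boolean;
  graph: boolean;
}

// [config key, CLI flag, environment variable], matching the configuration table.
const REQUIRED = [
  ["apiBaseUrl", "api-base-url", "LLM_API_URL"],
  ["apiKey", "api-key", "LLM_API_KEY"],
  ["model", "model", "LLM_MODEL_NAME"],
  ["prompt", "prompt", "LLM_PROMPT"],
] as const;

function resolveConfig(
  argv: string[],
  env: Record<string, string | undefined>,
): BenchConfig {
  // strict mode (the default) rejects unknown flags and positionals.
  const { values } = parseArgs({
    args: argv,
    options: {
      "api-base-url": { type: "string" },
      "api-key": { type: "string" },
      model: { type: "string" },
      prompt: { type: "string" },
      json: { type: "boolean" },
      graph: { type: "boolean" },
    },
  });
  const flags = values as Record<string, string | boolean | undefined>;

  const resolved: Partial<Record<(typeof REQUIRED)[number][0], string>> = {};
  const missing: string[] = [];
  for (const [key, flag, envVar] of REQUIRED) {
    // Assumption: an explicit CLI flag takes precedence over the env var.
    const fromCli = flags[flag];
    const value = typeof fromCli === "string" ? fromCli : env[envVar];
    if (value) resolved[key] = value; // empty strings count as missing
    else missing.push(`--${flag} or ${envVar}`);
  }
  if (missing.length > 0) {
    throw new Error(`Missing required configuration: ${missing.join(", ")}`);
  }
  return {
    apiBaseUrl: resolved.apiBaseUrl!,
    apiKey: resolved.apiKey!,
    model: resolved.model!,
    prompt: resolved.prompt!,
    json: flags.json === true,
    graph: flags.graph === true,
  };
}
```

A typical call would be `resolveConfig(process.argv.slice(2), process.env)`. Collecting every missing parameter before throwing reports all gaps in one error instead of one per run.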
Examples
Using Command-Line Arguments
```bash
bun run src/index.ts \
  --api-base-url "https://api.openai.com/v1" \
  --api-key "sk-..." \
  --model "gpt-4o" \
  --prompt "Tell me a short story about a robot who discovers music."
```
Using Environment Variables
```bash
export LLM_API_URL="https://api.openai.com/v1"
export LLM_API_KEY="sk-..."
export LLM_MODEL_NAME="gpt-4o"
export LLM_PROMPT="Tell me a short story about a robot who discovers music."
bun run src/index.ts
```
Getting JSON Output
```bash
bun run src/index.ts --json > results.json
```
Showing ASCII Graphs
```bash
bun run src/index.ts --graph
```
This displays two ASCII graphs:
- TPS Over Time: Tokens per second throughout the response
- Inter-Token Latency Over Time: Latency between each token
The graphs automatically adjust to your terminal width.
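One way to fit a graph to the terminal is to average the series into one bucket per available column and scale bar heights to the series maximum. The sketch below does that for any numeric series, such as TPS samples or inter-token latencies. The bucketing scheme and the `#` bar style are assumptions rather than a description of how this tool draws its graphs, and `bucketize` and `renderAsciiGraph` are hypothetical names.

```typescript
// Averages `values` into at most `width` contiguous buckets (one per column).
function bucketize(values: number[], width: number): number[] {
  if (values.length === 0) return [];
  const columns = Math.max(1, Math.min(Math.floor(width), values.length));
  const buckets: number[] = [];
  for (let c = 0; c < columns; c++) {
    // Every bucket gets at least one sample because columns <= values.length.
    const start = Math.floor((c * values.length) / columns);
    const end = Math.floor(((c + 1) * values.length) / columns);
    const slice = values.slice(start, end);
    buckets.push(slice.reduce((sum, v) => sum + v, 0) / slice.length);
  }
  return buckets;
}

// Renders a vertical bar chart `height` rows tall, scaled from 0 to the max.
function renderAsciiGraph(values: number[], width: number, height = 8): string {
  const buckets = bucketize(values, width);
  if (buckets.length === 0) return "";
  const max = Math.max(...buckets);
  const rows: string[] = [];
  for (let row = height; row >= 1; row--) {
    // A column is filled in this row once its value reaches row/height of max.
    const threshold = (row / height) * max;
    rows.push(buckets.map((v) => (max > 0 && v >= threshold ? "#" : " ")).join(""));
  }
  return rows.join("\n");
}
```

For example, `console.log(renderAsciiGraph(latenciesMs, process.stdout.columns ?? 80))` spans the current terminal. `process.stdout.columns` is `undefined` when output is piped, which is why the fallback width is needed.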
Output Format
Standard Output
The default output is a human-readable summary:
```
LLM Benchmark Results
=======================
Configuration
-----------------------
Provider API Base: https://api.groq.com/openai
Model: llama3-70b-8192
Metrics
-----------------------
Time to First Token: 152 ms
Total Wall Clock Time: 2,130 ms
Overall Output Rate: 234.7 tokens/sec
Token Counts
-----------------------
Prompt Tokens: 35 (estimated)
Output Tokens: 450
Inter-Token Latency (ms)
-----------------------
Min: 2 ms
Mean: 4.1 ms
Median: 4 ms
Max: 15 ms
p90: 6 ms
p95: 8 ms
p99: 12 ms
```
JSON Output (--json)
The JSON output includes all the calculated metrics and configuration details.
```json
{
  "configuration": {
    "apiBaseUrl": "https://api.groq.com/openai",
    "model": "llama3-70b-8192"
  },
  "metrics": {
    "timeToFirstTokenMs": 152,
    "totalWallClockTimeMs": 2130,
    "overallOutputRateTps": 234.7
  },
  "tokenCounts": {
    "promptTokens": 35,
    "outputTokens": 450
  },
  "interTokenLatencyMs": {
    "min": 2,
    "mean": 4.1,
    "median": 4,
    "max": 15,
    "p90": 6,
    "p95": 8,
    "p99": 12
  }
}
```
Development
Running with ts-node
To run the tool in development mode without building, you can use ts-node:
```bash
npx ts-node src/index.ts --api-base-url ...
```
Local Installation and Testing
To test the CLI locally as if it were globally installed, you can use npm link. This is the best way to test the final command-line experience before publishing.
Build the project: Make sure your latest changes are compiled.
```bash
npm run build
```
Link the package: This creates a global symbolic link to your local project.
```bash
npm link
```
Run the command globally: You can now run the command from any directory.
```bash
llm-speed-bench --api-base-url "..." --api-key "..."
```
Rebuild after changes: Whenever you change the source code, re-run the build command. The symbolic link ensures that your global command always uses the latest compiled code.
```bash
npm run build
```
Unlink the package: When you're done with local testing, remove the global link. The `-g` flag is needed because the link was created globally.
```bash
npm unlink -g llm-speed-bench
```
