LLM API Benchmark
A command-line tool for benchmarking various Large Language Model (LLM) APIs. This tool allows you to test different LLM models with standardized prompts to evaluate their performance and capabilities.
Features
- Support for different LLM models and APIs
- Multiple testing modes:
  - Chat completion testing
  - Reasoning capability testing
- Configurable API endpoints and parameters
- Streaming response support
- Easy-to-use command-line interface
Usage
Install and run directly using npx:
npx llmbench run [options]
Options
- -m, --model <model>: Specify the LLM model name to test
- -u, --url <url>: Specify the LLM API URL to test
- -k, --api-key <api-key>: Provide the LLM API key
- -t, --type <type>: Choose the type of test (default: "chat")
  - Available types: "chat", "reason"
Examples
- Run a basic chat completion test:
npx llmbench run -m gpt-4 -u https://api.openai.com/v1 -k your-api-key
- Run a reasoning capability test:
npx llmbench run -m gpt-4 -u https://api.openai.com/v1 -k your-api-key -t reason
Test Types
Chat Completion Test
The chat completion test uses a simple prompt asking for a Shakespeare-style poem about a cat. This test evaluates the basic generation capabilities of the model.
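As a rough illustration, a streamed chat test against an OpenAI-compatible endpoint could be implemented with the OpenAI SDK along these lines. This is a minimal sketch, not the tool's actual source: the function name runChatTest, the prompt string, and the timing logic are illustrative assumptions.

import OpenAI from "openai";

// Hypothetical sketch: measures time to first token and total duration
// for a streamed chat completion against an OpenAI-compatible API.
async function runChatTest(model: string, baseURL: string, apiKey: string) {
  const client = new OpenAI({ baseURL, apiKey });
  const prompt = "Write a short poem about a cat in the style of Shakespeare."; // illustrative prompt

  const start = Date.now();
  let firstTokenAt: number | null = null;
  let output = "";

  const stream = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta && firstTokenAt === null) firstTokenAt = Date.now();
    output += delta;
  }

  const total = Date.now() - start;
  console.log(`first token: ${(firstTokenAt ?? start) - start} ms, total: ${total} ms`);
  console.log(output);
}

The model, URL, and key would be the same values passed on the command line via -m, -u, and -k.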
Reasoning Test
The reasoning test uses a more complex system prompt that evaluates the model's ability to follow guidelines and produce structured responses while staying within the constraints the prompt defines.
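Structurally, the only difference from the chat test is the message list: a system message carrying the guidelines plus the user task. The prompt text below is illustrative only; the tool's actual system prompt is not reproduced here.

// Illustrative only: a reasoning-style request reuses the same streaming
// create() call as the chat test above, but adds a system message.
const reasoningMessages = [
  {
    role: "system" as const,
    content:
      "Follow the guidelines below, keep answers structured, and respect the stated constraints.",
  },
  { role: "user" as const, content: "Outline the steps needed to solve the given task." },
];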
Development
Local Installation
- Clone the repository:
git clone <repository-url>
cd llm-api-benchmark
- Install dependencies:
npm install
- Build the project:
npm run build
This project is built with TypeScript and uses the following key dependencies:
- Commander.js for the CLI interface (see the sketch after this list)
- OpenAI SDK for API interactions
- Chalk for colored console output
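For orientation, the run command and the flags listed under Options could be declared with Commander.js roughly as follows. This is a sketch of the general pattern, not the project's actual source; the handler body is a placeholder.

import { Command } from "commander";

// Hypothetical sketch of the CLI wiring; option names mirror the
// Options section above.
const program = new Command();

program
  .name("llmbench")
  .description("Benchmark LLM APIs from the command line");

program
  .command("run")
  .option("-m, --model <model>", "LLM model name to test")
  .option("-u, --url <url>", "LLM API URL to test")
  .option("-k, --api-key <api-key>", "LLM API key")
  .option("-t, --type <type>", "type of test to run", "chat")
  .action((options) => {
    // options.model, options.url, options.apiKey, and options.type
    // would be handed to the benchmark runner here.
    console.log(options);
  });

program.parse(process.argv);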
To contribute to the project:
- Fork the repository
- Create your feature branch
- Make your changes
- Submit a pull request
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
