evalops-cli
v1.0.0
CLI tool for evaluating code against Large Language Models using the EvalOps platform
EvalOps CLI
The EvalOps CLI is a powerful tool for evaluating code against Large Language Models (LLMs) using the EvalOps platform. It allows you to define, validate, and run evaluations directly from your command line.
Features
- Initialize Projects: Quickly set up a new EvalOps project with evalops init
- Validate Configurations: Ensure your evalops.yaml file is correctly formatted and your test cases are discoverable with evalops validate
- Upload Test Suites: Upload your evaluation configurations to the EvalOps platform with evalops upload
- Local Evaluations (Coming Soon): Run evaluations locally against different providers with evalops run
- Automatic Test Discovery: Automatically discover test cases in your codebase using Tree-sitter parsing
- TypeScript & JavaScript Support: Full support for both TypeScript and JavaScript test files
- Multiple Test Patterns: Support for decorators, function calls, and various file patterns
Installation
Install globally via npm:
    npm install -g evalops-cli
Or install locally in your project:
    npm install --save-dev evalops-cli
Getting Started
Initialize a new project:
    evalops init
This will create an evalops.yaml file in your current directory. You can use the interactive prompt to configure your project or start with a template:
    evalops init --template basic
Define your evaluation in evalops.yaml:
The evalops.yaml file is the heart of your evaluation. Here you can define:
- A description and version for your evaluation.
- The prompts to be used.
- The LLM providers to test against.
- Default and specific test cases with assertions.
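As a rough mental model, the sections listed above might be described by TypeScript types like the ones below. This is a sketch inferred from the example configuration in this README, not the CLI's actual schema: the type names (PromptMessage, ProviderSpec, AssertionSpec, EvalConfig), the missingSections helper, and the choice of which sections count as required are all assumptions made for illustration.

```typescript
// Hypothetical types inferred from the README's example evalops.yaml.
// They are NOT taken from the evalops-cli source.
type PromptMessage = { role: string; content: string };
type ProviderSpec = string | { provider: string; model: string; temperature?: number };
type AssertionSpec = { type: string; value: string; weight?: number };

interface EvalConfig {
  description: string;
  version: string;
  prompts: string | PromptMessage | PromptMessage[];
  providers: ProviderSpec[];
  defaultTest?: { assert: AssertionSpec[] };
  tests?: unknown[];
}

// Assumed sanity check: lists which of the main sections are absent.
function missingSections(config: Partial<EvalConfig>): string[] {
  const required: (keyof EvalConfig)[] = ["description", "version", "prompts", "providers"];
  return required.filter((key) => config[key] === undefined);
}

const exampleConfig: EvalConfig = {
  description: "My Code Evaluation Project",
  version: "1.0",
  prompts: [{ role: "user", content: "Analyze this code: {{code}}" }],
  providers: ["openai/gpt-4"],
  tests: [],
};

console.log(missingSections(exampleConfig));
console.log(missingSections({ description: "Only a description" }));
```

Running evalops validate remains the authoritative check; the types above only help when reading or generating a configuration by hand.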
Add test cases to your code:
The CLI can automatically discover test cases in your code. You can define test cases in special .eval.ts or .eval.js files using decorators or function calls.
Using Decorator (TypeScript):
    // mycode.eval.ts
    @evalops_test({
      prompt: 'Analyze this function: {{code}}',
      asserts: [
        { type: 'contains', value: 'function', weight: 0.5 },
        { type: 'llm-judge', value: 'Is the analysis accurate?', weight: 0.8 }
      ],
      tags: ['analysis', 'functions']
    })
    function testMyFunction() {
      /**
       * This function calculates the factorial of a number
       */
      function factorial(n: number): number {
        if (n <= 1) return 1;
        return n * factorial(n - 1);
      }
      return factorial;
    }
Using Function Call (JavaScript):
    // mycode.eval.js
    evalops_test({
      prompt: 'Review this code for potential issues: {{code}}',
      asserts: [
        { type: 'contains', value: 'error handling', weight: 0.6 },
        { type: 'llm-judge', value: 'Does the review identify key issues?', weight: 0.9 }
      ],
      description: 'Test async function review'
    }, function() {
      async function fetchData(url) {
        const response = await fetch(url);
        return response.json();
      }
      return fetchData;
    });
File Patterns: The CLI automatically discovers files matching these patterns:
- **/*.eval.{js,ts} - Dedicated evaluation files
- **/*.test.{js,ts} - Test files with evaluation decorators
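To check which files in a project would be picked up, the two patterns above can be approximated with a single regular expression on the file name. This is a minimal sketch, assuming the patterns behave like ordinary globs; isDiscoverable is a hypothetical helper, and the CLI's real discovery logic may apply extra rules (for example, ignored directories).

```typescript
// Assumed equivalent of the two documented patterns:
//   **/*.eval.{js,ts} and **/*.test.{js,ts}
// This is an illustration, not the CLI's own matcher.
const EVAL_FILE_PATTERN = /\.(eval|test)\.(js|ts)$/;

function isDiscoverable(path: string): boolean {
  // "**/" matches any directory depth, so only the file name matters.
  const fileName = path.split(/[\\/]/).pop() ?? "";
  return EVAL_FILE_PATTERN.test(fileName);
}

const candidates = [
  "src/mycode.eval.ts",
  "lib/api.test.js",
  "src/index.ts",
  "docs/notes.eval.md",
];
const matched = candidates.filter(isDiscoverable);
console.log(matched); // [ 'src/mycode.eval.ts', 'lib/api.test.js' ]
```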
Validate your configuration:
Before uploading, it's a good practice to validate your configuration and discover your test cases:
    evalops validate
Upload your test suite:
Once you're ready, upload your test suite to the EvalOps platform:
    evalops upload
You will need to provide your EvalOps API key. You can do this by setting the EVALOPS_API_KEY environment variable or by using the --api-key flag.
CLI Commands
init
Initialize a new EvalOps project.
Options:
- -f, --force: Overwrite existing evalops.yaml file.
- --template <template>: Use a specific template (basic, advanced).
validate
Validate the evalops.yaml file and discovered test cases.
Options:
- -v, --verbose: Show detailed validation output.
- -f, --file <file>: Path to evalops.yaml file (default: ./evalops.yaml).
upload
Upload test suite to the EvalOps platform.
Options:
- -f, --file <file>: Path to evalops.yaml file (default: ./evalops.yaml).
- --api-key <key>: EvalOps API key.
- --api-url <url>: EvalOps API URL (default: https://api.evalops.dev).
- --name <name>: Name for the test suite.
- --dry-run: Preview what would be uploaded without actually uploading.
run
Run evaluation locally (not yet implemented).
Options:
- -f, --file <file>: Path to evalops.yaml file (default: ./evalops.yaml).
- --provider <provider>: Specify provider to use.
- --output <output>: Output file path.
Configuration
The evalops.yaml file supports the following main sections:
Basic Configuration
    description: "My Code Evaluation Project"
    version: "1.0"
    # Prompts can be strings, objects, or arrays
    prompts:
      - role: "system"
        content: "You are a helpful code reviewer."
      - role: "user"
        content: "Analyze this code: {{code}}"
    # Providers can be simple strings or detailed configurations
    providers:
      - "openai/gpt-4"
      - provider: "anthropic"
        model: "claude-2"
        temperature: 0.7
    # Default assertions applied to all test cases
    defaultTest:
      assert:
        - type: "contains"
          value: "analysis"
          weight: 0.5
        - type: "llm-judge"
          value: "Is the analysis helpful?"
          weight: 0.8
    # Test cases (auto-discovered from code or defined manually)
    tests: []
    # Execution settings
    config:
      iterations: 1
      parallel: true
      timeout: 60
    # Output configuration
    outputPath: "results.json"
    outputFormat: "json"
    # Sharing settings
    sharing:
      public: false
      allowForks: true
File References
You can reference external files using the @ prefix:
    prompts: "@prompts/system-prompt.txt"
    # Or in nested structures
    prompts:
      - role: "system"
        content: "@prompts/system.txt"
      - role: "user"
        content: "@prompts/user.txt"
Assertion Types
The CLI supports various assertion types:
- contains/not-contains: Check if output contains specific text
- equals/not-equals: Exact match comparisons
- llm-judge: Use another LLM to judge the output quality
- regex: Regular expression matching
- json-path: Extract and validate JSON path values
- similarity: Semantic similarity scoring
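To make the text-based assertion types and their weights more concrete, here is a minimal sketch of how such checks could be evaluated against a model output. It covers only contains, not-contains, equals, not-equals, and regex. The TextAssertion type, the passes and weightedScore helpers, and the scoring rule (weighted fraction of passing assertions, default weight 1) are assumptions for illustration; this README does not describe how EvalOps actually combines weights.

```typescript
// Hypothetical checker for a subset of the documented assertion types.
type TextAssertion = {
  type: "contains" | "not-contains" | "equals" | "not-equals" | "regex";
  value: string;
  weight?: number;
};

function passes(output: string, assertion: TextAssertion): boolean {
  switch (assertion.type) {
    case "contains":
      return output.includes(assertion.value);
    case "not-contains":
      return !output.includes(assertion.value);
    case "equals":
      return output === assertion.value;
    case "not-equals":
      return output !== assertion.value;
    case "regex":
      return new RegExp(assertion.value).test(output);
  }
}

// Assumed scoring rule: weighted fraction of assertions that pass.
function weightedScore(output: string, asserts: TextAssertion[]): number {
  let earned = 0;
  let total = 0;
  for (const assertion of asserts) {
    const weight = assertion.weight ?? 1;
    total += weight;
    if (passes(output, assertion)) earned += weight;
  }
  return total === 0 ? 1 : earned / total;
}

const score = weightedScore("This function lacks error handling.", [
  { type: "contains", value: "error handling", weight: 0.5 }, // passes
  { type: "regex", value: "^This", weight: 0.25 },            // passes
  { type: "equals", value: "ok", weight: 0.25 },              // fails
]);
console.log(score); // 0.75
```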
Environment Variables
- EVALOPS_API_KEY: Your EvalOps API key
- EVALOPS_API_URL: Custom API URL (defaults to https://api.evalops.dev)
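A script that wraps the CLI could resolve these settings along the following lines. This is a sketch only: resolveApiSettings is a hypothetical helper, and the precedence shown (command-line flag first, then environment variable, then the documented default URL) is an assumption, since the README does not say which source wins when both are set.

```typescript
// Hypothetical helper; flag-over-environment precedence is an assumption.
const DEFAULT_API_URL = "https://api.evalops.dev"; // default documented above

type CliFlags = { apiKey?: string; apiUrl?: string };
type Environment = Record<string, string | undefined>;

function resolveApiSettings(
  flags: CliFlags,
  env: Environment,
): { apiKey: string; apiUrl: string } {
  const apiKey = flags.apiKey ?? env["EVALOPS_API_KEY"];
  if (!apiKey) {
    throw new Error("Missing API key: set EVALOPS_API_KEY or pass --api-key");
  }
  const apiUrl = flags.apiUrl ?? env["EVALOPS_API_URL"] ?? DEFAULT_API_URL;
  return { apiKey, apiUrl };
}

// In a Node.js script, process.env would be passed as the second argument.
const settings = resolveApiSettings({}, { EVALOPS_API_KEY: "example-key" });
console.log(settings.apiUrl); // https://api.evalops.dev
```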
Examples
Check the examples/ directory for complete examples:
- examples/basic.eval.ts - TypeScript decorator examples
- examples/functional-approach.eval.js - JavaScript function call examples
Development
To build and test the CLI locally:
    # Install dependencies
    npm install
    # Build the project
    npm run build
    # Run tests
    npm test
    # Test CLI locally
    npm run dev -- init --template basic
Contributing
Contributions are welcome! Please read the contributing guidelines and submit pull requests to the main repository.
License
MIT License - see LICENSE file for details.
