prompt-versioning-cli
v1.0.0
Promptforge CLI
Promptforge is a CLI tool for managing, versioning, and evaluating LLM prompts. It helps you organize prompts, track changes across versions, and run systematic evaluations against test fixtures.
Features
- 📝 Prompt Management: Organize prompts in a structured workspace with versioning
- 🔄 Version Control: Track changes between prompt versions with visual diffs
- ✅ Evaluation Engine: Test prompts against fixtures with multiple LLM providers
- 📊 Reporting: Generate detailed reports with pass rates, failures, and metrics
- 🔌 Provider Support: Works with OpenAI, Anthropic, and mock providers
- 🚀 CI/CD Ready: Exit codes and structured output for continuous integration
Installation
Global Installation
npm install -g prompt-cli
Local Installation
npm install prompt-cli
npx prompt-cli --help
Quick Start
- Initialize a workspace:
prompt-cli init
This creates a promptforge.yaml config file and sets up the directory structure:
  - prompts/ - Store your prompt templates
  - evals/ - Store evaluation fixtures
  - runs/ - Store evaluation results
- Create a prompt:
Create a prompt directory structure:
prompts/
  greeting/
    v1/
      greeting.prompt.md
      greeting.meta.json
greeting.prompt.md:
You are a helpful assistant.
User: {{userName}}
Please greet them in a friendly way.
greeting.meta.json:
{
  "name": "greeting",
  "version": "v1",
  "description": "A friendly greeting prompt",
  "variables": ["userName"],
  "tags": ["greeting", "example"]
}
- Create an evaluation fixture:
Create evals/greeting.jsonl:
{"prompt":"greeting@v1","variables":{"userName":"Alice"},"expectations":[{"type":"contains","value":"Alice"}]}
- Run evaluations:
prompt-cli eval
Commands
init - Initialize Workspace
Initialize a new Promptforge workspace in the current directory.
prompt-cli init
prompt-cli init --force # Overwrite existing workspace
prompt-cli init --name my-project # Set custom workspace name
What it creates:
- promptforge.yaml - Configuration file
- prompts/ - Directory for prompt templates
- evals/ - Directory for evaluation fixtures
- runs/ - Directory for evaluation results
diff - Compare Prompt Versions
Compare two versions of a prompt to see what changed.
# Compare two specific versions
prompt-cli diff greeting@v1 greeting@v2
# Compare v1 to latest version
prompt-cli diff greeting@v1 greeting
# Compare latest to v2
prompt-cli diff greeting greeting@v2
# Show only template changes
prompt-cli diff greeting@v1 greeting@v2 --template
# Show only metadata changes
prompt-cli diff greeting@v1 greeting@v2 --metadata
Output includes:
- Template changes (additions, deletions, modifications)
- Metadata changes (variables, tags, descriptions)
- Color-coded diff visualization
- Summary statistics
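The diff examples above use two reference forms: `greeting@v1` names a specific version, while a bare `greeting` means the latest one. The sketch below shows one way such references could be resolved. The helper names are hypothetical and are not part of the prompt-cli API, and it assumes version directories are named `v1`, `v2`, and so on, with the highest number treated as latest.

```typescript
// Hypothetical helpers for illustration; not part of the prompt-cli API.
interface PromptRef {
  name: string;
  version: string | null; // null means "use the latest version"
}

// Split "greeting@v1" into name and version; a bare "greeting" has no version.
function parsePromptRef(ref: string): PromptRef {
  const at = ref.lastIndexOf("@");
  if (at === -1) {
    if (ref === "") throw new Error("Empty prompt reference");
    return { name: ref, version: null };
  }
  const name = ref.slice(0, at);
  const version = ref.slice(at + 1);
  if (name === "" || version === "") {
    throw new Error(`Invalid prompt reference: "${ref}"`);
  }
  return { name, version };
}

// Assumption: version directories are named v1, v2, ... and the highest
// number is the latest. Comparing numerically keeps v10 after v2.
function latestVersion(versions: string[]): string {
  let best: string | null = null;
  let bestNumber = -1;
  for (const v of versions) {
    const match = /^v(\d+)$/.exec(v);
    if (match === null) continue; // ignore directories that are not versions
    const n = Number(match[1]);
    if (n > bestNumber) {
      best = v;
      bestNumber = n;
    }
  }
  if (best === null) throw new Error("No versioned directories found");
  return best;
}

// Resolve a reference to a concrete "name@version" string.
function resolveRef(ref: string, availableVersions: string[]): string {
  const parsed = parsePromptRef(ref);
  const version = parsed.version ?? latestVersion(availableVersions);
  return `${parsed.name}@${version}`;
}
```

Under these assumptions, `resolveRef("greeting", ["v1", "v2", "v10"])` yields `greeting@v10`, because versions are compared as numbers rather than as strings.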
eval - Run Evaluations
Run evaluations on prompt fixtures and generate reports.
# Evaluate all JSONL files in evals/
prompt-cli eval
# Evaluate a specific file
prompt-cli eval greeting.jsonl
# Show latest evaluation results
prompt-cli eval --latest
# Show detailed report
prompt-cli eval --latest --detailed
# Show compact summary
prompt-cli eval --latest --compact
# Aggregate results from all runs
prompt-cli eval --aggregate
# Aggregate results for a specific prompt
prompt-cli eval --aggregate-prompt greeting
# Aggregate last 5 runs
prompt-cli eval --aggregate-recent 5
# Use a specific provider
prompt-cli eval --provider mock
prompt-cli eval --provider openai
# Don't save results
prompt-cli eval --no-save
Evaluation Output:
- Pass/fail status for each fixture
- Failed expectations with details
- Pass rate and statistics
- Token usage and latency metrics
- Exit code 0 for success, 1 for failures (CI-friendly)
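To make the reported figures concrete, here is a minimal sketch of how a run summary could be derived from per-fixture results. The `FixtureResult` shape and its field names are assumptions made for illustration; the files actually written to runs/ may be structured differently.

```typescript
// Hypothetical result shape; the real run files may differ.
interface FixtureResult {
  prompt: string;     // e.g. "greeting@v1"
  passed: boolean;    // true when every expectation held
  tokens: number;     // tokens used for this fixture
  latencyMs: number;  // provider latency in milliseconds
}

interface RunSummary {
  total: number;
  passed: number;
  failed: number;
  passRate: number;      // fraction between 0 and 1
  totalTokens: number;
  avgLatencyMs: number;
  exitCode: 0 | 1;       // 0 when nothing failed, 1 otherwise
}

function summarize(results: FixtureResult[]): RunSummary {
  const total = results.length;
  const passed = results.filter((r) => r.passed).length;
  const failed = total - passed;
  const totalTokens = results.reduce((sum, r) => sum + r.tokens, 0);
  const totalLatency = results.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    total,
    passed,
    failed,
    // Guard against division by zero when no fixtures ran.
    passRate: total === 0 ? 0 : passed / total,
    totalTokens,
    avgLatencyMs: total === 0 ? 0 : totalLatency / total,
    exitCode: failed === 0 ? 0 : 1,
  };
}
```

Treating an empty run as exit code 0 is a choice made in this sketch; a stricter pipeline might prefer to fail when no fixtures were found.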
Configuration
promptforge.yaml
The workspace configuration file:
version: '1.0'
paths:
  prompts: prompts # Directory for prompts
  evals: evals # Directory for evaluation fixtures
  runs: runs # Directory for evaluation results
provider:
  type: mock # Default provider: mock, openai, anthropic
  # Provider-specific config (e.g., model, temperature)
evaluation:
  defaultProvider: mock
Provider Configuration
Mock Provider (Default)
provider:
  type: mock
OpenAI Provider
provider:
  type: openai
  model: gpt-4
  temperature: 0.7
Set your API key:
export OPENAI_API_KEY=your_api_key_here
Anthropic Provider
provider:
  type: anthropic
  model: claude-3-opus-20240229
Set your API key:
export ANTHROPIC_API_KEY=your_api_key_here
Prompt Structure
Prompts are organized in a versioned directory structure:
prompts/
  {prompt-name}/
    v{version-number}/
      {prompt-name}.prompt.md # Markdown template
      {prompt-name}.meta.json # Metadata
Template File (.prompt.md)
Markdown file with variable interpolation using {{variableName}}:
You are a helpful assistant.
User: {{userName}}
Context: {{context}}
Please respond in a {{tone}} tone.
Metadata File (.meta.json)
JSON file with prompt metadata:
{
  "name": "greeting",
  "version": "v1",
  "description": "A friendly greeting prompt",
  "variables": ["userName", "tone"],
  "tags": ["greeting", "social"],
  "createdAt": "2024-01-15T10:30:00Z"
}
For detailed information, see Prompt Structure Documentation.
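The template and metadata files are linked through variable names: the template uses `{{variableName}}` placeholders and the metadata lists them under `variables`. The following sketch shows how interpolation and a consistency check between the two could work. The function names are hypothetical, and the placeholder pattern (word characters, optional spaces inside the braces) is an assumption rather than the CLI's documented behavior.

```typescript
// Hypothetical helpers; the pattern assumes names are word characters and
// tolerates spaces inside the braces, which the CLI may or may not do.
const VARIABLE_PATTERN = /\{\{\s*(\w+)\s*\}\}/g;

// Replace each {{name}} with its value; throw if a value is missing.
function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(VARIABLE_PATTERN, (_match: string, name: string): string => {
    const value = variables[name];
    if (value === undefined) throw new Error(`Missing variable: ${name}`);
    return value;
  });
}

// List the variable names a template uses, in order of first appearance.
function templateVariables(template: string): string[] {
  const names: string[] = [];
  // A fresh regex avoids sharing lastIndex state with other callers.
  const pattern = new RegExp(VARIABLE_PATTERN.source, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    const name = match[1];
    if (name !== undefined && names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Compare the names used in a template with those declared in metadata.
function checkVariables(
  template: string,
  declared: string[]
): { undeclared: string[]; unused: string[] } {
  const used = templateVariables(template);
  return {
    undeclared: used.filter((n) => declared.indexOf(n) === -1),
    unused: declared.filter((n) => used.indexOf(n) === -1),
  };
}
```

For example, a template that uses `{{userName}}` and `{{context}}` checked against a declared list of `["userName", "tone"]` would report `context` as undeclared and `tone` as unused.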
Evaluation Fixtures
Evaluation fixtures are stored as JSONL (JSON Lines) files in the evals/ directory.
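Because JSONL stores one independent JSON object per line, a fixture file can be parsed line by line, and a malformed line can be reported by its line number. The sketch below illustrates this on a string that has already been read from disk. The type and function names are hypothetical, and the validation rules (a string `prompt` and an array of `expectations`) are assumptions based on the fixture examples in this README.

```typescript
// Hypothetical types mirroring the fixture fields used in this README.
interface RawExpectation {
  type: string;
  value: string | number;
  flags?: string;
}

interface Fixture {
  prompt: string;
  variables: Record<string, string>;
  expectations: RawExpectation[];
  metadata?: Record<string, unknown>;
}

// Parse JSONL text into fixtures, skipping blank lines.
function parseFixtures(jsonl: string): Fixture[] {
  const fixtures: Fixture[] = [];
  const lines = jsonl.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = (lines[i] ?? "").trim();
    if (line === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`Line ${i + 1}: not valid JSON`);
    }
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error(`Line ${i + 1}: expected a JSON object`);
    }
    const candidate = parsed as Partial<Fixture>;
    // Assumed minimum: a prompt reference and a list of expectations.
    if (typeof candidate.prompt !== "string" || !Array.isArray(candidate.expectations)) {
      throw new Error(`Line ${i + 1}: missing "prompt" or "expectations"`);
    }
    fixtures.push(parsed as Fixture);
  }
  return fixtures;
}
```

Reporting the line number matters in practice, since a single pretty-printed object pasted into a .jsonl file breaks every line it spans.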
Fixture Format
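Each line in the JSONL file is one complete JSON object. The example below is expanded across several lines for readability; in a fixture file it must be written on a single line: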
{
  "prompt": "greeting@v1",
  "variables": {
    "userName": "Alice"
  },
  "expectations": [
    {"type": "contains", "value": "Alice"},
    {"type": "contains", "value": "Hello"},
    {"type": "maxWords", "value": 20},
    {"type": "regex", "value": "^Hello", "flags": "i"}
  ],
  "metadata": {
    "description": "Test greeting for Alice"
  }
}
Expectation Types
- contains: Output must contain the specified string
- regex: Output must match the regex pattern
- maxWords: Output must have ≤ N words
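The three expectation types can be read as simple predicates over the model output. The sketch below is one possible interpretation, not the CLI's actual implementation: it assumes `contains` is case-sensitive, that `regex` passes `flags` straight to the regular expression, and that `maxWords` counts whitespace-separated words.

```typescript
// Hypothetical model of the three expectation types described above.
type Expectation =
  | { type: "contains"; value: string }
  | { type: "regex"; value: string; flags?: string }
  | { type: "maxWords"; value: number };

// Return true when the output satisfies a single expectation.
function checkExpectation(output: string, expectation: Expectation): boolean {
  switch (expectation.type) {
    case "contains":
      // Assumption: a case-sensitive substring match.
      return output.indexOf(expectation.value) !== -1;
    case "regex":
      return new RegExp(expectation.value, expectation.flags).test(output);
    case "maxWords": {
      // Assumption: words are runs of non-whitespace characters.
      const words = output.trim().split(/\s+/).filter((w) => w.length > 0);
      return words.length <= expectation.value;
    }
  }
}

// Collect the expectations that did not hold, for reporting.
function failedExpectations(output: string, expectations: Expectation[]): Expectation[] {
  return expectations.filter((e) => !checkExpectation(output, e));
}
```

A fixture passes in this model when `failedExpectations` returns an empty list; anything it returns can be shown as the failure detail.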
Example Fixture File
evals/greeting.jsonl:
{"prompt":"greeting@v1","variables":{"userName":"Alice"},"expectations":[{"type":"contains","value":"Alice"}]}
{"prompt":"greeting@v1","variables":{"userName":"Bob"},"expectations":[{"type":"contains","value":"Bob"},{"type":"maxWords","value":15}]}
{"prompt":"greeting@v2","variables":{"userName":"Charlie"},"expectations":[{"type":"regex","value":"^Hello", "flags":"i"}]}
CI/CD Integration
Promptforge returns exit codes that CI/CD pipelines can act on directly. The project itself includes a GitHub Actions workflow (.github/workflows/ci.yml) that runs:
- Linting and code formatting checks
- TypeScript compilation
- Unit tests across multiple Node.js versions
- Integration tests for CLI commands
- Build verification
Using in Your CI/CD Pipeline
GitHub Actions example:
name: Evaluate Prompts
on: [push, pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm install -g prompt-cli
      - run: prompt-cli eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Exit Codes:
- 0 - All evaluations passed
- 1 - One or more evaluations failed
Because any failed evaluation produces a non-zero exit code, the CI step that runs prompt-cli eval fails the build without extra scripting.
Examples
Example 1: Basic Prompt with Evaluation
- Create a prompt:
mkdir -p prompts/customer-support/v1
prompts/customer-support/v1/customer-support.prompt.md:
You are a customer support agent.
Customer: {{customerName}}
Question: {{question}}
Please provide a helpful response.
prompts/customer-support/v1/customer-support.meta.json:
{
  "name": "customer-support",
  "version": "v1",
  "description": "Customer support prompt",
  "variables": ["customerName", "question"],
  "tags": ["support"]
}
- Create evaluation fixture:
evals/customer-support.jsonl:
{"prompt":"customer-support@v1","variables":{"customerName":"Alice","question":"How do I reset my password?"},"expectations":[{"type":"contains","value":"password"},{"type":"maxWords","value":100}]}
- Run evaluation:
prompt-cli eval customer-support.jsonl
Example 2: Comparing Versions
After creating v2 of your prompt:
prompt-cli diff customer-support@v1 customer-support@v2
This shows:
- Template changes (what text was added/removed/modified)
- Metadata changes (new variables, updated tags, etc.)
- Summary of changes
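As an illustration of what a metadata comparison involves, the sketch below computes added and removed entries for the `variables` and `tags` lists of two metadata objects. It is a simplified model under assumed names and does not reproduce the CLI's diff output or its template diffing.

```typescript
// Hypothetical shape based on the .meta.json examples in this README.
interface PromptMeta {
  name: string;
  version: string;
  description: string;
  variables: string[];
  tags: string[];
}

interface ListDiff {
  added: string[];
  removed: string[];
}

// Entries present only in `after` are added; only in `before`, removed.
function diffList(before: string[], after: string[]): ListDiff {
  return {
    added: after.filter((x) => before.indexOf(x) === -1),
    removed: before.filter((x) => after.indexOf(x) === -1),
  };
}

// Compare the fields of two metadata versions that a reviewer cares about.
function diffMeta(before: PromptMeta, after: PromptMeta) {
  return {
    descriptionChanged: before.description !== after.description,
    variables: diffList(before.variables, after.variables),
    tags: diffList(before.tags, after.tags),
  };
}
```

A newly added variable is worth flagging in review, because existing fixtures for the prompt will not supply a value for it until they are updated.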
Example 3: Aggregated Reporting
View aggregated statistics across multiple evaluation runs:
# Aggregate all runs
prompt-cli eval --aggregate
# Aggregate last 10 runs
prompt-cli eval --aggregate-recent 10
# Aggregate for specific prompt
prompt-cli eval --aggregate-prompt customer-support
Project Structure
your-project/
├── promptforge.yaml # Workspace configuration
├── prompts/ # Prompt templates
│   └── greeting/
│       ├── v1/
│       │   ├── greeting.prompt.md
│       │   └── greeting.meta.json
│       └── v2/
│           ├── greeting.prompt.md
│           └── greeting.meta.json
├── evals/ # Evaluation fixtures
│   └── greeting.jsonl
└── runs/ # Evaluation results
    └── eval-2024-01-15T10-30-00.json
Requirements
- Node.js >= 20
- npm or yarn
License
ISC
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
