prompt-version-manager v0.1.4
Centralized prompt management system for Human Behavior AI agents
PVM - Prompt Version Management
Git-like version control for AI prompts with automatic execution tracking
Features • Quick Start • Simplified API • Documentation • Examples
Overview
PVM (Prompt Version Management) brings Git-like version control to AI prompt development, with automatic tracking of every execution. The simplified API makes it easy to integrate prompt management into any AI workflow.
Quick Start with Simplified API
Installation
pip install prompt-version-management
# or
npm install prompt-version-management
Basic Usage in 3 Lines
from pvm import prompts, model
# Load and render a prompt template
prompt = prompts("assistant.md", ["expert", "analyze this data"])
# Execute with automatic tracking
response = await model.complete("gpt-4", prompt, "analysis-task")
print(response.content)
That's it! Every execution is automatically tracked with timing, tokens, costs, and model details.
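For example, the tracked details are available on the returned response object; the fields used below are the ones documented in the ModelResponse section later in this README:
print(response.execution_id)              # unique ID for this execution
print(response.metadata["tokens"])        # token usage breakdown
print(response.metadata["latency_ms"])    # execution time in milliseconds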
Simplified API Reference
prompts() - Load and Render Templates
Load prompt templates with variable substitution:
# Python
prompt = prompts("template_name.md", ["var1", "var2", "var3"])
# TypeScript
const prompt = prompts("template_name.md", ["var1", "var2", "var3"])
How it works:
- Templates are stored in the .pvm/prompts/ directory
- Variables are replaced in order of appearance (see the sketch below)
- YAML frontmatter is supported for metadata
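A minimal sketch of the "order of appearance" rule (an illustration of the idea, not PVM's actual implementation; it assumes each distinct {{placeholder}}, in order of first appearance, is paired with the corresponding list entry):
import re

def render_positional(template: str, values: list[str]) -> str:
    # Collect distinct placeholder names in order of first appearance.
    names: list[str] = []
    for name in re.findall(r"\{\{(\w+)\}\}", template):
        if name not in names:
            names.append(name)
    # Pair them with the supplied values and substitute every occurrence.
    mapping = dict(zip(names, values))
    return re.sub(r"\{\{(\w+)\}\}", lambda m: mapping.get(m.group(1), m.group(0)), template)

print(render_positional("You are a {{role}} assistant specialized in {{domain}}.",
                        ["helpful AI", "data analysis"]))
# -> You are a helpful AI assistant specialized in data analysis.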
Example template (.pvm/prompts/assistant.md):
---
version: 1.0.0
description: General assistant prompt
---
You are a {{role}} assistant specialized in {{domain}}.
Task: {{task}}
Please be {{tone}} in your response.
Usage:
prompt = prompts("assistant.md", [
"helpful AI", # {{role}}
"data analysis", # {{domain}}
"analyze sales", # {{task}}
"concise" # {{tone}}
])
model.complete() - Execute with Auto-Tracking
Execute any LLM with automatic tracking:
# Python
response = await model.complete(
model_name="gpt-4", # Model to use
prompt="Analyze this text", # Prompt or previous response
tag="analysis", # Tag for tracking
json_output=MyModel, # Optional: structured output
temperature=0.7 # Optional: model parameters
)
# TypeScript
const response = await model.complete(
"gpt-4", // Model to use
"Analyze this text", // Prompt or previous response
"analysis", // Tag for tracking
{ schema: mySchema }, // Optional: structured output
{ temperature: 0.7 } // Optional: model parameters
)
Automatic features:
- Provider detection (OpenAI, Anthropic, Google)
- Execution tracking (tokens, latency, cost)
- SHA-256 prompt hashing
- Git-like auto-commits
- Chain detection
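As an illustration of the provider-detection idea (a sketch of the concept, not PVM's actual detection logic), the provider can be inferred from the model name:
def detect_provider(model_name: str) -> str:
    # Hypothetical mapping from well-known model-name prefixes to providers.
    prefixes = {"gpt": "openai", "claude": "anthropic", "gemini": "google"}
    for prefix, provider in prefixes.items():
        if model_name.startswith(prefix):
            return provider
    raise ValueError(f"Cannot detect provider for model: {model_name}")

print(detect_provider("gpt-4"))       # openai
print(detect_provider("claude-3"))    # anthropic
print(detect_provider("gemini-pro"))  # google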
Automatic Chaining
Pass a response directly to create execution chains:
# First agent
analysis = await model.complete("gpt-4", prompt1, "analyze")
# Second agent - automatically chains!
summary = await model.complete("claude-3", analysis, "summarize")
# Third agent - chain continues
translation = await model.complete("gemini-pro", summary, "translate")
PVM automatically:
- Links executions with a chain ID
- Tracks parent-child relationships
- Preserves context across models
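To see the link, inspect the tracking metadata on a chained response; chain_id and tag are among the metadata fields documented in the ModelResponse section below (whether the very first execution in a chain also records a chain ID is an assumption here, so only the chained response is inspected):
# Continuing the example above: `summary` was produced by passing `analysis` in.
print(summary.metadata["chain_id"])   # shared chain identifier
print(summary.metadata["tag"])        # "summarize"
print(analysis.execution_id)          # parent execution in the chain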
Native Structured Output
Get typed responses using each provider's native capabilities:
from pydantic import BaseModel
class Analysis(BaseModel):
    sentiment: str
    confidence: float
    keywords: list[str]
# Automatic structured output
result = await model.complete(
"gpt-4",
"Analyze: PVM is amazing!",
"sentiment",
json_output=Analysis # Pass Pydantic model
)
print(result.content.sentiment) # "positive"
print(result.content.confidence) # 0.95
print(result.content.keywords)   # ["PVM", "amazing"]
Provider-specific implementations:
- OpenAI: uses response_format with Pydantic models
- Anthropic: uses tool/function calling
- Google: uses response_schema with Type constants
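Whichever provider you call, the json_output usage stays the same. A sketch reusing the Analysis model defined above (it assumes the listed model names are available with your API keys):
# Same Pydantic model, different providers; PVM selects the native mechanism.
openai_result = await model.complete(
    "gpt-4", "Analyze: PVM is amazing!", "sentiment", json_output=Analysis
)
claude_result = await model.complete(
    "claude-3", "Analyze: PVM is amazing!", "sentiment", json_output=Analysis
)
print(openai_result.content.sentiment, claude_result.content.sentiment)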
ModelResponse Object
Every execution returns a ModelResponse with:
response.content # The actual response (string or object)
response.model # Model used
response.execution_id # Unique execution ID
response.prompt_hash # SHA-256 of prompt
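#   For example (an assumption about how the hash is computed; exact normalization may differ):
#   hashlib.sha256(prompt.encode("utf-8")).hexdigest() == response.prompt_hash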
response.metadata # Execution details:
- tag # Your tracking tag
- provider # Detected provider
- chain_id # Chain ID (if chained)
- tokens # Token usage breakdown
- latency_ms # Execution time
- structured # Whether structured output was used
# Methods
str(response) # Convert to string
response.to_prompt()   # Format for chaining
Full Version Control Features
Initialize Repository
pvm init
Add and Track Prompts
# Add a prompt file
pvm add prompts/assistant.md -m "Add assistant prompt"
# Track executions automatically
python my_agent.py # Uses simplified API
# View execution history
pvm log
# See execution dashboard
pvm dashboard
Branching and Experimentation
# Create experiment branch
pvm branch experiment/new-approach
pvm checkout experiment/new-approach
# Make changes and test
# ... edit prompts ...
# Merge back when ready
pvm checkout main
pvm merge experiment/new-approach
Analytics and Insights
# View execution analytics
pvm analytics summary
# See token usage
pvm analytics tokens --days 7
# Track costs
pvm analytics cost --group-by model
Repository Structure
.pvm/
├── prompts/              # Prompt templates
│   ├── assistant.md
│   ├── analyzer.md
│   └── summarizer.md
├── config.json           # Repository configuration
├── HEAD                  # Current branch reference
└── objects/              # Content-addressed storage
    ├── commits/          # Commit objects
    ├── prompts/          # Prompt versions
    └── executions/       # Execution records
Advanced Features
Templates with Complex Variables
# Example template with several sections, saved as .pvm/prompts/complex_template.md
template = """
You are a {{role}}.
## Instructions
{{instructions}}
## Context
{{context}}
## Output Format
{{format}}
"""
# Load and render it with prompts(), passing variables in order of appearance
prompt = prompts("complex_template.md", [
"senior data analyst",
"Analyze trends and patterns",
"Q4 sales data from 2023",
"Bullet points with insights"
])
Multi-Provider Execution
providers = ["gpt-4", "claude-3-opus", "gemini-ultra"]
# Run same prompt across providers
for model_name in providers:
    response = await model.complete(
        model_name,
        prompt,
        f"comparison-{model_name}"
    )
    print(f"{model_name}: {response.content}")
Custom Tracking Metadata
response = await model.complete(
"gpt-4",
prompt,
"analysis",
temperature=0.3,
max_tokens=1000,
# Custom parameters stored in metadata
user_id="12345",
session_id="abc-def",
experiment="v2"
)
Working with Branches
from pvm import Repository
repo = Repository(".")
# Create and switch branch
await repo.branch("feature/new-prompts")
await repo.checkout("feature/new-prompts")
# Make changes
prompt = prompts("new_assistant.md", ["expert", "task"])
response = await model.complete("gpt-4", prompt, "test")
# Commit and merge
await repo.commit("Test new prompts")
await repo.checkout("main")
await repo.merge("feature/new-prompts")
Environment Setup
Set your API keys:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
Or use a .env file:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
Execution Analytics
View detailed analytics:
# Summary statistics
pvm analytics summary
# Token usage over time
pvm analytics tokens --days 30 --group-by model
# Cost analysis
pvm analytics cost --group-by tag
# Chain visualization
pvm analytics chains --limit 10
Testing
Run tests for both implementations:
# Python tests
pytest tests/
# TypeScript tests
npm test
Examples
Two-Agent Analysis Pipeline
from pvm import prompts, model
from pydantic import BaseModel
# Define output structures
class SentimentResult(BaseModel):
    sentiment: str
    confidence: float

class Summary(BaseModel):
    title: str
    key_points: list[str]
# Input to analyze (example value so the snippet is self-contained)
text = "The latest release shipped on time and the support response was quick."
# Agent 1: Sentiment Analysis
sentiment_prompt = prompts("sentiment.md", ["expert analyst", text])
sentiment = await model.complete(
"gpt-4",
sentiment_prompt,
"sentiment-analysis",
json_output=SentimentResult
)
# Agent 2: Summarization (chained)
summary = await model.complete(
"claude-3",
sentiment, # Chains automatically!
"summarization",
json_output=Summary
)
print(f"Sentiment: {sentiment.content.sentiment}")
print(f"Summary: {summary.content.title}")A/B Testing Prompts
# Shared inputs for both prompt versions (example values)
variables = ["helpful AI", "data analysis", "analyze sales", "concise"]
# Version A
prompt_a = prompts("assistant_v1.md", variables)
response_a = await model.complete("gpt-4", prompt_a, "test-v1")
# Version B
prompt_b = prompts("assistant_v2.md", variables)
response_b = await model.complete("gpt-4", prompt_b, "test-v2")
# Compare results
print(f"Version A tokens: {response_a.metadata['tokens']['total']}")
print(f"Version B tokens: {response_b.metadata['tokens']['total']}")Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
MIT License - see LICENSE for details.
