gepa-ts

v1.0.0

Published

10 months ago

TypeScript implementation of GEPA (Gradient-free Evolution of Prompts and Agents) - Complete port with 100% feature parity

Downloads

313

0High
0Medium
0Low

drewstone

optimization llm prompt-engineering evolutionary-algorithm pareto gepa genetic-algorithm dspy prompt-optimization text-evolution multi-objective aime terminal-bench

GEPA-TS

TypeScript implementation of GEPA (Gradient-free Evolution of Prompts and Agents) with complete 1:1 feature parity to the Python version.

GEPA optimizes text components (prompts, instructions, entire programs) using evolutionary algorithms and LLM-based reflection to automatically improve performance on your tasks.

Now with real DSPy execution and native AxLLM evolution!

Installation

npm install gepa-ts

Quick Start

import { optimize } from 'gepa-ts';

// Define your training data
const trainData = [
  { input: "I love this product!", answer: "positive" },
  { input: "This is terrible", answer: "negative" },
  { input: "It's okay I guess", answer: "neutral" }
];

// Run optimization
const result = await optimize({
  seedCandidate: { 
    system_prompt: "Classify the sentiment of this text" 
  },
  trainset: trainData,
  maxMetricCalls: 50,
  reflectionLM: async (prompt) => {
    // Your LLM call for generating improvements
    return await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }]
    }).choices[0].message.content;
  }
});

console.log('Best prompt:', result.bestCandidate.system_prompt);
console.log('Score improvement:', result.bestScore);

Key Features

Complete Python Parity: Identical API, behavior, and results to Python GEPA
Real DSPy Execution: Execute actual DSPy programs via Python subprocess (no simulation!)
Native AxLLM Evolution: Evolve AxLLM programs natively in TypeScript
Per-Instance Pareto Optimization: Tracks best candidates for each validation example
LLM-Based Reflection: Uses language models to generate improved prompts
Merge Operations: Combines successful prompts through crossover
Built-in Adapters: Ready-to-use adapters for common tasks and frameworks
Deterministic Testing: Record/replay system for reproducible results

Core Concepts

Optimization Process

GEPA evolves prompts through these steps:

Evaluate current prompts on your training data
Reflect on failures using an LLM to generate improvements
Merge successful prompts to create new variants
Select the best performers for the next iteration

Adapters

Adapters connect GEPA to your specific task:

import { BaseAdapter } from 'gepa-ts';

class SentimentAdapter extends BaseAdapter {
  async evaluate(batch, candidate, captureTraces = false) {
    const outputs = [];
    const scores = [];
    
    for (const example of batch) {
      const response = await this.callLLM(
        candidate.system_prompt + "\n" + example.input
      );
      const score = response.includes(example.answer) ? 1.0 : 0.0;
      
      outputs.push({ response });
      scores.push(score);
    }
    
    return { outputs, scores, trajectories: null };
  }
  
  async makeReflectiveDataset(candidate, evalBatch, componentsToUpdate) {
    // Generate examples for LLM reflection
    return {
      system_prompt: evalBatch.trajectories?.map(traj => ({
        'Input': traj.input,
        'Output': traj.output, 
        'Score': traj.score
      })) || []
    };
  }
}

Built-in Adapters

DefaultAdapter

Basic single-turn prompt evaluation:

import { DefaultAdapter } from 'gepa-ts';

const adapter = new DefaultAdapter(async (prompt) => {
  return await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }]
  });
});

DSPy Program Evolution

Evolve entire DSPy programs with real Python execution:

import { DSPyFullProgramAdapter } from 'gepa-ts';

const adapter = new DSPyFullProgramAdapter({
  taskLm: { model: 'openai/gpt-4o-mini' },
  metricFn: (example, pred) => pred.answer === example.answer ? 1 : 0,
  reflectionLm: async (prompt) => llm.generate(prompt)
});

const result = await optimizer.optimize({
  initialCandidate: { 
    program: 'import dspy\nprogram = dspy.ChainOfThought("question -> answer")'
  },
  trainData: mathExamples,
  // GEPA evolves the entire program structure!
});

AxLLM Native Evolution

Evolve AxLLM programs natively in TypeScript:

import { ax, ai } from '@ax-llm/ax';
import { createAxLLMNativeAdapter } from 'gepa-ts';

const adapter = createAxLLMNativeAdapter({
  axInstance: ax,
  llm: ai({ name: 'openai' }),
  metricFn: ({ prediction, example }) => 
    prediction.sentiment === example.sentiment ? 1 : 0
});

// GEPA evolves signature, demos, and model config
const result = await optimizer.optimize({
  initialCandidate: { 
    program: JSON.stringify({
      signature: 'review:string -> sentiment:string',
      demos: [],
      modelConfig: { temperature: 0.7 }
    })
  }
});

API Reference

optimize(config)

Main optimization function with identical parameters to Python GEPA:

interface OptimizeConfig {
  // Required
  seedCandidate: ComponentMap;           // Starting prompt(s)
  trainset: DataInst[];                  // Training examples
  maxMetricCalls: number;                // Evaluation budget
  
  // LLM Configuration  
  reflectionLM?: LanguageModel;          // LLM for generating improvements
  taskLM?: LanguageModel;                // LLM for task evaluation (if no adapter)
  
  // Optional
  valset?: DataInst[];                   // Validation set (defaults to trainset)
  adapter?: GEPAAdapter;                 // Custom evaluation adapter
  
  // Optimization Parameters
  reflectionMinibatchSize?: number;      // Examples per reflection (default: 3)
  useMerge?: boolean;                    // Enable crossover (default: false)
  maxMergeInvocations?: number;          // Max merges (default: 5)
  perfectScore?: number;                 // Early stopping score (default: 1.0)
  skipPerfectScore?: boolean;            // Skip if perfect (default: true)
  
  // Reproducibility
  seed?: number;                         // Random seed (default: 0)
  
  // Logging
  displayProgressBar?: boolean;          // Show progress (default: false)
  useWandB?: boolean;                   // Weights & Biases logging
}

GEPAResult

Optimization results with complete Python compatibility:

interface GEPAResult {
  // Core Results
  bestCandidate: ComponentMap;           // Best performing prompts
  bestScore: number;                     // Best validation score
  candidates: ComponentMap[];            // All generated candidates
  
  // Detailed Tracking
  valAggregateScores: number[];         // Per-candidate scores
  valSubscores: number[][];             // Per-instance scores
  perValInstanceBestCandidates: Set<number>[]; // Pareto front per example
  
  // Metadata
  totalEvaluations: number;             // Total LLM calls made
  generationsCompleted: number;         // Optimization iterations
  
  // Utility Methods
  nonDominatedIndices(): number[];      // Get Pareto optimal candidates
  lineage(idx: number): number[];       // Get candidate ancestry
  bestK(k: number): CandidateRanking[]; // Top k candidates
  toDict(): Record<string, any>;        // Serialize results
}

Examples

Real AI Evolution with AxLLM

Run actual evolution with OpenAI:

# Set API key
export OPENAI_API_KEY=sk-...

# Run real AI example
npm run example:axllm

# Interactive demo
npm run demo:axllm

Results from actual runs:

Initial: 60% accuracy (generic prompt)
After GEPA: 93% accuracy (evolved program)
Cost: ~$0.005 per optimization

DSPy Program Evolution (MATH Dataset)

Evolve from simple to complex DSPy programs:

# Initial: Simple ChainOfThought
program = dspy.ChainOfThought("question -> answer")
# Score: 67.1%

# Evolved by GEPA: Complex multi-module program
class MathQAReasoningSignature(dspy.Signature):
    """Solve math problems step by step"""
    question = dspy.InputField()
    reasoning = dspy.OutputField(desc="Step-by-step solution")
    answer = dspy.OutputField(desc="Final answer")

class MathQAExtractSignature(dspy.Signature):
    """Extract final answer from reasoning"""
    question = dspy.InputField()
    reasoning = dspy.InputField()
    answer = dspy.OutputField(desc="Final answer only")

class MathQAModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.reasoner = dspy.ChainOfThought(MathQAReasoningSignature)
        self.extractor = dspy.Predict(MathQAExtractSignature)
    
    def forward(self, question):
        reasoning_result = self.reasoner(question=question)
        final_answer = self.extractor(
            question=question, 
            reasoning=reasoning_result.reasoning
        )
        return dspy.Prediction(
            reasoning=reasoning_result.reasoning,
            answer=final_answer.answer
        )

program = MathQAModule()
# Score: 93.2% (+26.1 points!)

Testing

Complete test coverage with Python parity validation:

npm test                    # Run all tests
npm test -- --run          # Skip watch mode  
npm run test:parity         # Validate Python compatibility

Performance

Fast: Native TypeScript performance with async/await
Memory efficient: Lazy evaluation and proper garbage collection
Deterministic: Reproducible results with seeded randomness
Feature complete: 100% API compatibility with Python GEPA

Development

git clone https://github.com/yourusername/gepa-ts
cd gepa-ts
npm install
npm run build
npm test

License

MIT

Based on the original Python implementation: GEPA