eva-ts v1.0.2

eva

A powerful TypeScript evaluation framework for running concurrent evaluations with progress tracking and result persistence. Perfect for testing AI models, APIs, data processing pipelines, and any system that needs systematic evaluation against expected results.

✨ Features

  • 🚀 Concurrent Processing - Configurable concurrency limits for optimal performance
  • 📊 Progress Tracking - Visual progress bars with real-time updates
  • 📁 Result Persistence - JSONL output for detailed analysis
  • 🔍 Flexible Scoring - Support for string matching, JSON comparison, numerical analysis
  • 🎯 Type Safety - Full TypeScript support with generic types
  • Async Support - Handle async data providers, tasks, and scorers
  • 🛡️ Error Handling - Robust error handling in concurrent execution

📦 Installation

npm install eva-ts

🚀 Quick Start

import { Eval } from "eva-ts";
import type { DataItem, BaseScore } from "eva-ts";

// Define your evaluation types
interface MyInput {
  question: string;
}

interface MyExpected {
  answer: string;
}

interface MyOutput {
  response: string;
}

interface MyScore extends BaseScore {
  name: string;
  value: number;
}

// Create and run evaluation
const evaluation = new Eval<MyInput, MyExpected, MyOutput, MyScore>({
  // Provide test data
  dataProvider: () => [
    {
      input: { question: "What is 2+2?" },
      expected: { answer: "4" },
    },
    {
      input: { question: "What is the capital of France?" },
      expected: { answer: "Paris" },
    },
  ],

  // Define the task to evaluate
  taskFn: async ({ data }) => {
    // Your system under test (e.g., API call, model inference)
    const response = await myAIModel.generate(data.input.question);
    return { response };
  },

  // Define scoring functions
  scorers: [
    ({ output, data }) => ({
      name: "exact-match",
      value:
        output.response.toLowerCase() === data.expected?.answer.toLowerCase()
          ? 1
          : 0,
    }),
    ({ output, data }) => ({
      name: "contains-answer",
      value: output.response
        .toLowerCase()
        .includes(data.expected?.answer.toLowerCase() || "")
        ? 1
        : 0,
    }),
  ],

  // Configuration
  config: {
    name: "ai-model-evaluation",
    maxConcurrency: 3,
    outputDir: "./results",
  },
});

// Run the evaluation
const results = await evaluation.evaluate();
console.log(`Completed ${results.scores.length} evaluations`);

📚 Core Concepts

DataItem

Represents a single evaluation case:

interface DataItem<Input, Expected> {
  input: Input; // The input to your system
  expected?: Expected; // Expected output (optional)
  metadata?: Record<string, unknown>; // Additional context
}
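
For example, reusing the MyInput and MyExpected types from the Quick Start, a single case might look like the sketch below (the metadata keys are arbitrary illustrations, not required fields):

import type { DataItem } from "eva-ts";

// MyInput and MyExpected are the example types defined in the Quick Start above
const item: DataItem<MyInput, MyExpected> = {
  input: { question: "What is 2+2?" },
  expected: { answer: "4" },
  metadata: { difficulty: "easy", category: "arithmetic" }, // free-form context, visible to scorers
};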

Scorers

Functions that evaluate output quality:

type Scorer<Input, Expected, Output, Score> = ({
  output,
  data,
}: {
  output: Output;
  data: DataItem<Input, Expected>;
}) => Score | Promise<Score>;
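
Because a scorer may return a Promise, it can call out to external services. Below is a sketch of an async scorer that delegates to a hypothetical judgeAnswer helper (not part of eva-ts), assumed to resolve to a rating between 0 and 1:

// Sketch only: judgeAnswer is a hypothetical external call; the types are the ones from the Quick Start
const llmJudgeScorer = async ({
  output,
  data,
}: {
  output: MyOutput;
  data: DataItem<MyInput, MyExpected>;
}): Promise<MyScore> => ({
  name: "llm-judge",
  value: await judgeAnswer(data.input.question, output.response), // assumed to return a number in [0, 1]
});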

Configuration

interface EvalConfig {
  name: string; // Evaluation name
  maxConcurrency: number; // Concurrent task limit
  outputDir?: string; // Optional JSONL output directory
  projectName?: string; // Optional database project name
  evalDescription?: string; // Optional evaluation description
}

🗄️ Database Setup (Optional)

Eva supports optional database persistence for evaluation runs and results. We recommend using Supabase for easy setup and management.

Using Supabase (Recommended)

  1. Create a Supabase project:

    • Go to supabase.com and create a new project
    • Note your database host and password from the project settings
  2. Set environment variable:

    export DATABASE_URL="postgresql://postgres:[PASSWORD]@[HOST]:6543/postgres?pgbouncer=true&connection_limit=1"

    Replace [PASSWORD] and [HOST] with your Supabase credentials. Use port 6543 for connection pooling (recommended) or 5432 for direct connection.

  3. Configure your evaluation:

    const evaluation = new Eval({
      config: {
        name: "my-evaluation",
        maxConcurrency: 3,
        projectName: "my-project", // Required for database storage
        evalDescription: "Testing my AI model", // Optional description
      },
      // ... rest of configuration
    });

Database Schema

When database config is provided, Eva automatically creates a hierarchical structure:

  • Projects → Evaluation Names → Evaluation Runs → Results
  • Stores input, expected output, actual output, scores, and metadata in JSONB format
  • Supports querying, filtering, and statistical analysis of evaluation data

Alternative Database Setup

Eva uses Drizzle ORM and supports any PostgreSQL database. For other providers, simply set the DATABASE_URL environment variable to your PostgreSQL connection string.

🎯 Scoring Examples

String Matching

// Exact string match
({ output, data }) => ({
  name: 'exact-match',
  value: output.text === data.expected?.text ? 1 : 0
})

// Fuzzy string matching
({ output, data }) => ({
  name: 'similarity',
  value: calculateStringSimilarity(output.text, data.expected?.text || '')
})
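
calculateStringSimilarity is not provided by eva-ts; one possible implementation is a normalized Levenshtein similarity, sketched here:

// Sketch of calculateStringSimilarity: 1 - (Levenshtein distance / length of the longer string)
function calculateStringSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = prev[0];
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = prev[j];
      prev[j] = Math.min(
        prev[j] + 1, // deletion
        prev[j - 1] + 1, // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      diagonal = temp;
    }
  }
  return 1 - prev[b.length] / Math.max(a.length, b.length);
}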

JSON Comparison

// Deep JSON equality
({ output, data }) => ({
  name: 'json-match',
  value: JSON.stringify(output.data) === JSON.stringify(data.expected?.data) ? 1 : 0
})

// Field-specific validation
({ output, data }) => ({
  name: 'has-required-fields',
  value: output.data.id && output.data.name ? 1 : 0
})
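
Note that JSON.stringify comparison is sensitive to key order. If key order may vary, an order-insensitive deep comparison such as the sketch below (not an eva-ts helper) is safer:

// Order-insensitive deep equality for plain JSON values (sketch)
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aKeys = Object.keys(a as object);
  const bKeys = Object.keys(b as object);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every((key) =>
    deepEqual((a as Record<string, unknown>)[key], (b as Record<string, unknown>)[key]),
  );
}

// Usage in a scorer
({ output, data }) => ({
  name: 'json-deep-match',
  value: deepEqual(output.data, data.expected?.data) ? 1 : 0
})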

Numerical Analysis

// Absolute error
({ output, data }) => ({
  name: 'absolute-error',
  value: Math.abs(output.value - (data.expected?.value || 0))
})

// Relative error
({ output, data }) => ({
  name: 'relative-error',
  value: data.expected?.value
    ? Math.abs(output.value - data.expected.value) / Math.abs(data.expected.value)
    : 0
})

🔧 Advanced Usage

Async Data Provider

const evaluation = new Eval({
  dataProvider: async () => {
    const response = await fetch("/api/test-cases");
    return await response.json();
  },
  // ... rest of configuration
});

Custom Metadata Scoring

const scorers = [
  // Score based on input context
  ({ output, data }) => ({
    name: "difficulty-adjusted",
    value:
      data.metadata?.difficulty === "hard"
        ? output.score * 2 // Double points for hard questions
        : output.score,
  }),
];

Error Handling

const taskFn = async ({ data }) => {
  try {
    const result = await riskyApiCall(data.input);
    return { ...result, success: true }; // mark successful calls so scorers can detect them
  } catch (error) {
    return { error: error instanceof Error ? error.message : String(error), success: false };
  }
};

const scorers = [
  ({ output }) => ({
    name: "success-rate",
    value: output.success ? 1 : 0,
  }),
];

📊 Output Format

When outputDir is specified, Eva generates JSONL files with detailed results:

{"scores":[{"name":"exact-match","value":1}],"index":0,"input":{"question":"What is 2+2?"},"expected":{"answer":"4"},"metadata":{},"output":{"response":"4"}}
{"scores":[{"name":"exact-match","value":0}],"index":1,"input":{"question":"Capital of France?"},"expected":{"answer":"Paris"},"metadata":{},"output":{"response":"The capital is Paris"}}
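
These files are easy to post-process. As a sketch (the file path below is an example; point it at whichever .jsonl file Eva wrote into your outputDir), the following script computes the average value per scorer:

import { readFileSync } from "node:fs";

// Example path only; adjust to the actual file produced in your outputDir
const lines = readFileSync("./results/my-eval.jsonl", "utf8").trim().split("\n");

const totals = new Map<string, { sum: number; count: number }>();
for (const line of lines) {
  const record = JSON.parse(line) as { scores: { name: string; value: number }[] };
  for (const score of record.scores) {
    const entry = totals.get(score.name) ?? { sum: 0, count: 0 };
    entry.sum += score.value;
    entry.count += 1;
    totals.set(score.name, entry);
  }
}

for (const [name, { sum, count }] of totals) {
  console.log(`${name}: ${(sum / count).toFixed(3)} (n=${count})`);
}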

🏗️ Architecture

Eva is built with performance and flexibility in mind:

  • Concurrent Execution: Uses p-limit for controlled concurrency
  • Progress Tracking: Real-time progress bars via cli-progress
  • Type Safety: Full TypeScript generics support
  • Memory Efficient: Streams results to disk for large evaluations
  • Error Resilient: Continues evaluation even if individual tasks fail

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run tests (npm test)
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with TypeScript for type safety
  • Uses cli-progress for beautiful progress bars
  • Powered by p-limit for concurrency control

Made with ❤️ by Lilac Labs