
@frontsail_ai/frontevals

v0.1.0


frontevals

A minimal, vitest-native evals library for LLM applications.

Features

  • Vitest-native: Works seamlessly inside your existing vitest test suite
  • Simple API: Just evalSuite() or evalTest() - no custom CLI needed
  • Built-in metrics: exact(), contains(), startsWith(), endsWith(), jsonValid(), hasKeys()
  • LLM-as-judge: G-Eval style custom criteria evaluation via OpenAI
  • Autoevals integration: Re-exports popular scorers like Factuality, Levenshtein, etc.
  • Multi-run support: Run cases multiple times with pass rate thresholds for non-deterministic tasks
  • Pretty output: Console tables showing scores and pass/fail status

Installation

npm install @frontsail_ai/frontevals

Quick Start

Basic Usage

import { describe, it, expect } from 'vitest';
import { evalSuite } from '@frontsail_ai/frontevals';
import { exact, contains } from '@frontsail_ai/frontevals/metrics';

describe('Greeting Evals', () => {
  it('passes all cases', async () => {
    const result = await evalSuite({
      name: 'greet function',
      data: [
        { input: 'Alice', expected: 'Hello, Alice!' },
        { input: 'Bob', expected: 'Hello, Bob!' },
      ],
      task: (name) => `Hello, ${name}!`,
      scorers: [exact(), contains('Hello')],
    });

    expect(result.summary.passRate).toBe(1);
  });
});

Even Simpler: evalTest Helper

import { evalTest } from '@frontsail_ai/frontevals/vitest';
import { exact } from '@frontsail_ai/frontevals/metrics';

describe('String Utils', () => {
  evalTest('uppercase works', {
    data: [
      { input: 'hello', expected: 'HELLO' },
      { input: 'world', expected: 'WORLD' },
    ],
    task: (s) => s.toUpperCase(),
    scorers: [exact()],
  });
});

API Reference

evalSuite(config)

Run an evaluation suite and return structured results.

interface EvalSuiteConfig<TInput, TOutput> {
  name: string;                    // Suite name for display
  data: EvalData<TInput>[];        // Test cases (or async function returning them)
  task: (input: TInput) => TOutput | Promise<TOutput>;  // Function to evaluate
  scorers: Scorer<TOutput>[];      // Scoring functions
  runs?: number;                   // Times to run each case (default: 1)
  threshold?: number;              // Pass threshold 0-1 (default: 1.0)
}

interface EvalData<TInput> {
  input: TInput;
  expected?: unknown;
  name?: string;                   // Custom display name
}
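
For example, a config that exercises the optional fields might look like this (a sketch; the task and data are purely illustrative):

import { evalSuite } from '@frontsail_ai/frontevals';
import { exact } from '@frontsail_ai/frontevals/metrics';

const result = await evalSuite({
  name: 'slugify',
  data: [
    // `name` overrides the display name derived from the input
    { input: 'Hello World', expected: 'hello-world', name: 'basic phrase' },
    { input: '  Trim Me  ', expected: 'trim-me' },
  ],
  task: (s: string) => s.trim().toLowerCase().replace(/\s+/g, '-'),
  scorers: [exact()],
  runs: 3,          // run each case 3 times
  threshold: 0.67,  // at least 2 of 3 runs must pass
});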

evalTest(name, config, threshold?)

Vitest helper that creates a test case automatically.

evalTest('test name', {
  data: [...],
  task: (input) => output,
  scorers: [...],
}, 0.9);  // Optional threshold (default: 1.0)

Built-in Metrics

Import from @frontsail_ai/frontevals/metrics:

| Metric | Description |
|--------|-------------|
| exact() | Exact string match |
| contains(str) | Output contains substring |
| startsWith(str) | Output starts with prefix |
| endsWith(str) | Output ends with suffix |
| jsonValid() | Output is valid JSON |
| hasKeys(keys) | Output object has required keys |

Example

import { exact, contains, jsonValid, hasKeys } from '@frontsail_ai/frontevals/metrics';

const result = await evalSuite({
  name: 'API Response',
  data: [{ input: 'query', expected: '{"status":"ok"}' }],
  task: async (q) => await api.call(q),
  scorers: [
    jsonValid(),
    hasKeys(['status']),
    contains('ok'),
  ],
});

G-Eval (LLM-as-Judge)

Use custom criteria evaluated by an LLM:

import { gEval } from '@frontsail_ai/frontevals/metrics';

const result = await evalSuite({
  name: 'Support Bot',
  data: [
    { input: 'My order is late', expected: 'Apologize and offer help' },
  ],
  task: async (q) => await bot.respond(q),
  scorers: [
    gEval({ criteria: 'Is the response empathetic and helpful?' }),
    gEval({
      name: 'actionable',
      criteria: 'Does the response provide a clear next step?',
      threshold: 0.8,
    }),
  ],
});

gEval Options

gEval({
  name?: string;           // Scorer name (default: 'gEval')
  criteria: string;        // Plain language criteria
  steps?: string[];        // Optional evaluation steps
  threshold?: number;      // Pass threshold (default: 0.7)
  model?: string;          // OpenAI model (default: 'gpt-4o-mini')
})

Note: Requires OPENAI_API_KEY environment variable.
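
Putting these options together, a fully specified judge might look like this (a sketch; the criteria and steps are illustrative):

import { gEval } from '@frontsail_ai/frontevals/metrics';

const toneJudge = gEval({
  name: 'tone',
  criteria: 'Is the response polite and professional?',
  steps: [
    'Read the response.',
    'Check for rude or dismissive language.',
    'Score higher when the tone stays professional throughout.',
  ],
  threshold: 0.8,
  model: 'gpt-4o-mini',
});

// Use it like any other scorer, e.g. scorers: [toneJudge]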

Autoevals Integration

Re-exported scorers from the autoevals library:

import { Factuality, Levenshtein } from '@frontsail_ai/frontevals/autoevals';

const result = await evalSuite({
  name: 'Chatbot Quality',
  data: [
    { input: 'What is TypeScript?', expected: 'A typed superset of JavaScript' },
  ],
  task: async (q) => await chatbot.answer(q),
  scorers: [Factuality, Levenshtein],
});

Available: Factuality, Levenshtein, ClosedQA, Battle, Humor, Security, Summary, Translation

Multiple Runs (Non-Deterministic Tasks)

For LLM tasks with variable outputs, run multiple times and set a pass threshold:

const result = await evalSuite({
  name: 'Chatbot Consistency',
  data: [
    { input: 'Explain recursion', expected: 'A clear explanation' },
  ],
  task: async (q) => await chatbot.answer(q),
  scorers: [gEval({ criteria: 'Is the explanation clear?' })],
  runs: 5,          // Run each case 5 times
  threshold: 0.8,   // 80% of runs must pass
});

console.log(result.results[0].passRate); // e.g., 0.8

Console Output

Single Run Mode

┌─────────────────────────────────────────────────────────────────┐
│                     greet function                              │
├──────────┬────────────────┬───────┬──────────┬─────────┬────────┤
│ Case     │ Output         │ exact │ contains │ Score   │ Status │
├──────────┼────────────────┼───────┼──────────┼─────────┼────────┤
│ Alice    │ Hello, Alice!  │ 1.00  │ 1.00     │ 1.00    │ ✓ PASS │
│ Bob      │ Hello, Bob!    │ 1.00  │ 1.00     │ 1.00    │ ✓ PASS │
├──────────┴────────────────┴───────┴──────────┴─────────┴────────┤
│ Summary: 2/2 passed (100%)                                      │
└─────────────────────────────────────────────────────────────────┘

Multiple Runs Mode

┌───────────────────────────────────────────────────────────────────────┐
│                     Chatbot Consistency (5 runs, 80% threshold)       │
├────────────────────┬─────────┬───────────┬──────────┬─────────────────┤
│ Case               │ gEval   │ Runs      │ PassRate │ Status          │
├────────────────────┼─────────┼───────────┼──────────┼─────────────────┤
│ Explain recursion  │ 0.85    │ 4/5       │ 80%      │ ✓ PASS          │
│ What is a closure  │ 0.60    │ 3/5       │ 60%      │ ✗ FAIL (< 80%)  │
├────────────────────┴─────────┴───────────┴──────────┴─────────────────┤
│ Summary: 1/2 cases passed | 7/10 total trials passed (70%)            │
└───────────────────────────────────────────────────────────────────────┘

Result Types

interface SuiteResult {
  name: string;
  results: EvalResult[];
  summary: {
    total: number;
    passed: number;
    failed: number;
    passRate: number;
    totalTrials: number;
    byScorer: Record<string, { avgScore: number; passRate: number }>;
  };
  runs: number;
  threshold: number;
}

interface EvalResult {
  name: string;
  input: unknown;
  expected?: unknown;
  trials: TrialResult[];
  passRate: number;
  pass: boolean;
}

interface TrialResult {
  trialIndex: number;
  output: unknown;
  scores: Array<{ name: string; score: number; pass: boolean }>;
  pass: boolean;
}
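
These fields can be asserted on directly. Continuing the Quick Start example above, a sketch (assuming byScorer is keyed by scorer name):

expect(result.summary.total).toBe(2);
expect(result.summary.byScorer['exact'].avgScore).toBe(1);

// Per-case details
for (const caseResult of result.results) {
  console.log(caseResult.name, caseResult.passRate, caseResult.trials[0].scores);
}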

Custom Scorers

Create custom scorers matching the autoevals signature:

import type { Scorer } from '@frontsail_ai/frontevals';

const lengthScorer: Scorer<string> = ({ output, expected }) => {
  const diff = Math.abs(output.length - String(expected).length);
  const score = Math.max(0, 1 - diff / 100);
  return { score, name: 'length' };
};

// Async scorers are supported
const apiScorer: Scorer<string> = async ({ output }) => {
  const result = await externalApi.evaluate(output);
  return { score: result.score, name: 'api', metadata: result };
};
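
Custom scorers can then be passed alongside the built-in ones (a sketch; summarize() is a hypothetical task):

import { evalSuite } from '@frontsail_ai/frontevals';
import { contains } from '@frontsail_ai/frontevals/metrics';

const result = await evalSuite({
  name: 'Summaries',
  data: [{ input: 'A long article about TypeScript...', expected: 'A short summary' }],
  task: async (text) => await summarize(text), // summarize() is hypothetical
  scorers: [contains('TypeScript'), lengthScorer, apiScorer],
});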

Development

Setup

git clone <repo-url>
cd frontevals
npm install

Scripts

npm test          # Run tests
npm run test:watch # Run tests in watch mode
npm run build     # Build TypeScript

Running Tests with Coverage

npx vitest run --coverage

Project Structure

frontevals/
├── src/
│   ├── index.ts          # Main exports
│   ├── types.ts          # TypeScript interfaces
│   ├── eval-suite.ts     # Core evalSuite function
│   ├── reporter.ts       # Console table output
│   ├── vitest.ts         # evalTest helper
│   ├── autoevals.ts      # Re-exports from autoevals
│   └── metrics/
│       ├── index.ts      # Built-in metrics
│       └── geval.ts      # G-Eval LLM-as-judge
├── tests/
│   ├── eval-suite.test.ts
│   ├── metrics.test.ts
│   ├── geval.test.ts
│   ├── reporter.test.ts
│   ├── vitest-helper.test.ts
│   ├── exports.test.ts
│   └── output.test.ts
├── package.json
├── tsconfig.json
└── vitest.config.ts

Adding New Metrics

  1. Create scorer function in src/metrics/index.ts or a new file
  2. Export from src/metrics/index.ts
  3. Add tests in tests/metrics.test.ts

Example:

// src/metrics/index.ts
export function regex(pattern: RegExp): Scorer<unknown> {
  return ({ output }): ScoreResult => {
    const score = pattern.test(String(output)) ? 1 : 0;
    return { score, name: 'regex' };
  };
}
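
A matching test in tests/metrics.test.ts might look like this (a sketch; it assumes a scorer can be invoked directly with an { output } argument, as in the custom scorer examples above):

// tests/metrics.test.ts
import { describe, it, expect } from 'vitest';
import { regex } from '../src/metrics';

describe('regex metric', () => {
  it('scores 1 when the pattern matches', () => {
    const scorer = regex(/^\d+$/);
    expect(scorer({ output: '12345' })).toEqual({ score: 1, name: 'regex' });
  });

  it('scores 0 when the pattern does not match', () => {
    const scorer = regex(/^\d+$/);
    expect(scorer({ output: 'abc' })).toEqual({ score: 0, name: 'regex' });
  });
});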

Dependencies

  • autoevals: LLM evaluation metrics
  • console-table-printer: Pretty console tables
  • openai: Required for gEval

Peer Dependencies

  • vitest: >=1.0.0

License

MIT