eval-kit
A TypeScript SDK for evaluating content quality using traditional metrics and AI-powered evaluation.
Features
- Traditional Metrics: BLEU, TER, BERTScore, Coherence, Perplexity
- AI-Powered Evaluation: LLM-based evaluator with prompt templating (via Vercel AI SDK)
- Batch Processing: Concurrent execution, progress tracking, retry logic, CSV/JSON export
Installation
```sh
npm install @loveholidays/eval-kit
# or
pnpm add @loveholidays/eval-kit
```
For AI evaluation, you'll also need an AI SDK provider:
```sh
npm install @ai-sdk/openai
# or @ai-sdk/anthropic, @ai-sdk/google, etc.
```
Quick Start
Traditional Metrics
```ts
import { calculateBleu, calculateCoherence } from '@loveholidays/eval-kit';

// BLEU score for translation quality
const bleuResult = calculateBleu(
  'The cat sits on the mat',
  ['The cat is on the mat']
);
console.log(bleuResult.score); // 75.98

// Coherence for text flow
const coherenceResult = calculateCoherence(
  'The cat sat on the mat. It was comfortable.'
);
console.log(coherenceResult.score); // 65.43
```
AI-Powered Evaluation
```ts
import { openai } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

const evaluator = Evaluator.create('fluency', openai('gpt-4'));

const result = await evaluator.evaluate({
  candidateText: 'The quick brown fox jumps over the lazy dog.'
});
console.log(result.score); // 95
console.log(result.feedback); // "Excellent fluency..."
```
Batch Evaluation
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { BatchEvaluator, Evaluator } from '@loveholidays/eval-kit';

const evaluator = new Evaluator({
  name: 'quality',
  model: anthropic('claude-3-5-haiku-20241022'),
  evaluationPrompt: 'Rate the quality of this text from 1-10.',
  scoreConfig: { type: 'numeric', min: 1, max: 10 },
});

const batchEvaluator = new BatchEvaluator({
  evaluators: [evaluator],
  concurrency: 5,
  onResult: (result) => console.log(`Row ${result.rowId}: ${result.results[0].score}`),
});

const result = await batchEvaluator.evaluate({ filePath: './data.csv' });

await batchEvaluator.export({
  format: 'csv',
  destination: './results.csv',
});
```
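Results can also be exported as JSON; continuing the example above, a minimal sketch assuming `format: 'json'` is accepted alongside the `'csv'` option (see the Export guide for the confirmed options):
```ts
// Assumption: 'json' mirrors the 'csv' export option shown above.
await batchEvaluator.export({
  format: 'json',
  destination: './results.json',
});
```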
Documentation

| Guide | Description |
|-------|-------------|
| Metrics | BLEU, TER, BERTScore, Coherence, Perplexity |
| Evaluator | AI-powered evaluation and scoring |
| Batch Evaluation | Concurrent processing, progress tracking |
| Export | CSV and JSON export options |
Supported LLM Providers
Via Vercel AI SDK: OpenAI, Anthropic, Google, Mistral, Groq, Cohere, and any OpenAI-compatible endpoint.
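To use an OpenAI-compatible endpoint, pass a provider configured with a custom `baseURL`; a sketch using `createOpenAI` from `@ai-sdk/openai` (the URL, API key, and model name below are placeholders, e.g. a local Ollama server):
```ts
import { createOpenAI } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

// Placeholder endpoint and model: a local server speaking the
// OpenAI wire format; swap in your own baseURL, key, and model.
const local = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
});

const evaluator = Evaluator.create('fluency', local('llama3.1'));
```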
Development
```sh
pnpm install   # Install dependencies
pnpm build     # Build the project
pnpm test      # Run tests
pnpm lint      # Lint code
```
Publishing
This package uses Changesets for version management and is published to the npm registry.
Creating a Release
Add a changeset when you make changes that should be released:
```sh
pnpm changeset
```
- Select the version bump type (patch/minor/major)
- Write a summary of your changes
- This creates a markdown file in `.changeset/`

Merge to main, and the CI will automatically:
- Detect changesets
- Bump the version in `package.json`
- Update `CHANGELOG.md`
- Publish to the npm registry
- Push git tags
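Each changeset is a small markdown file pairing a bump type with a summary; one might look like this (the summary text is illustrative):
```md
---
"@loveholidays/eval-kit": patch
---

Fix rounding in coherence score output
```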
Manual Publishing
For local testing or manual releases:
```sh
pnpm build              # Build the package
pnpm changeset version  # Apply version bumps
pnpm changeset publish  # Publish to registry
```
Version Types
| Type | When to use |
|------|-------------|
| patch | Bug fixes, small updates |
| minor | New features (backwards compatible) |
| major | Breaking changes |
License
MIT
