eval-kit
A TypeScript SDK for evaluating content quality using traditional metrics and AI-powered evaluation.
Features
- Traditional Metrics: BLEU, TER, BERTScore, Coherence, Perplexity
- AI-Powered Evaluation: LLM-based evaluator with prompt templating (via Vercel AI SDK)
- Batch Processing: Concurrent execution, progress tracking, retry logic, CSV/JSON export
Installation
```sh
npm install @loveholidays/eval-kit
# or
pnpm add @loveholidays/eval-kit
```
For AI evaluation, you'll also need an AI SDK provider:
```sh
npm install @ai-sdk/openai
# or @ai-sdk/anthropic, @ai-sdk/google, etc.
```
Quick Start
Traditional Metrics
```ts
import { calculateBleu, calculateCoherence } from '@loveholidays/eval-kit';

// BLEU score for translation quality
const bleuResult = calculateBleu(
  'The cat sits on the mat',
  ['The cat is on the mat']
);
console.log(bleuResult.score); // 75.98

// Coherence for text flow
const coherenceResult = calculateCoherence(
  'The cat sat on the mat. It was comfortable.'
);
console.log(coherenceResult.score); // 65.43
```
AI-Powered Evaluation
```ts
import { openai } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

const evaluator = Evaluator.create('fluency', openai('gpt-4'));

const result = await evaluator.evaluate({
  candidateText: 'The quick brown fox jumps over the lazy dog.'
});
console.log(result.score); // 95
console.log(result.feedback); // "Excellent fluency..."
```
Batch Evaluation
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { BatchEvaluator, Evaluator } from '@loveholidays/eval-kit';

const evaluator = new Evaluator({
  name: 'quality',
  model: anthropic('claude-3-5-haiku-20241022'),
  evaluationPrompt: 'Rate the quality of this text from 1-10.',
  scoreConfig: { type: 'numeric', min: 1, max: 10 },
});

const batchEvaluator = new BatchEvaluator({
  evaluators: [evaluator],
  concurrency: 5,
  onResult: (result) => console.log(`Row ${result.rowId}: ${result.results[0].score}`),
});

const result = await batchEvaluator.evaluate({ filePath: './data.csv' });

await batchEvaluator.export({
  format: 'csv',
  destination: './results.csv',
});
```
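Results can also be exported as JSON; continuing the example above, a minimal sketch assuming `format: 'json'` is accepted alongside the `'csv'` option (see the Export guide for the confirmed options):
```ts
// Assumption: 'json' mirrors the 'csv' export option shown above.
await batchEvaluator.export({
  format: 'json',
  destination: './results.json',
});
```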
Documentation

| Guide | Description |
|-------|-------------|
| Metrics | BLEU, TER, BERTScore, Coherence, Perplexity |
| Evaluator | AI-powered evaluation and scoring |
| Batch Evaluation | Concurrent processing, progress tracking |
| Export | CSV and JSON export options |
Supported LLM Providers
Via Vercel AI SDK: OpenAI, Anthropic, Google, Mistral, Groq, Cohere, and any OpenAI-compatible endpoint.
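To use an OpenAI-compatible endpoint, pass a provider configured with a custom `baseURL`; a sketch using `createOpenAI` from `@ai-sdk/openai` (the URL, API key, and model name below are placeholders, e.g. a local Ollama server):
```ts
import { createOpenAI } from '@ai-sdk/openai';
import { Evaluator } from '@loveholidays/eval-kit';

// Placeholder endpoint and model: a local server speaking the
// OpenAI wire format; swap in your own baseURL, key, and model.
const local = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
});

const evaluator = Evaluator.create('fluency', local('llama3.1'));
```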
Development
```sh
pnpm install   # Install dependencies
pnpm build     # Build the project
pnpm test      # Run tests
pnpm lint      # Lint code
```
Publishing
This package uses Changesets for version management and is published to the npm registry.
Creating a Release
Add a changeset when you make changes that should be released:
```sh
pnpm changeset
```
- Select the version bump type (patch/minor/major)
- Write a summary of your changes
- This creates a markdown file in `.changeset/`

Merge to main, and the CI will automatically:
- Detect changesets
- Bump the version in `package.json`
- Update `CHANGELOG.md`
- Publish to the npm registry
- Push git tags
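Each changeset is a small markdown file pairing a bump type with a summary; one might look like this (the summary text is illustrative):
```md
---
"@loveholidays/eval-kit": patch
---

Fix rounding in coherence score output
```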
Manual Publishing
For local testing or manual releases:
```sh
pnpm build              # Build the package
pnpm changeset version  # Apply version bumps
pnpm changeset publish  # Publish to registry
```
Version Types
| Type | When to use |
|------|-------------|
| patch | Bug fixes, small updates |
| minor | New features (backwards compatible) |
| major | Breaking changes |
License
MIT
