
@caleblawson/evals

v0.10.4

Downloads: 2

@mastra/evals

A comprehensive evaluation framework for assessing AI model outputs across multiple dimensions.

Installation

npm install @mastra/evals

Overview

@mastra/evals provides a suite of evaluation metrics for assessing AI model outputs. The package includes both LLM-based and NLP-based metrics, enabling automated as well as model-assisted evaluation of AI responses.

Features

LLM-Based Metrics

  1. Answer Relevancy

    • Evaluates how well an answer addresses the input question
    • Considers uncertainty weighting for more nuanced scoring
    • Returns detailed reasoning for scores
  2. Bias Detection

    • Identifies potential biases in model outputs
    • Analyzes opinions and statements for bias indicators
    • Provides explanations for detected biases
    • Configurable scoring scale
  3. Context Precision & Relevancy

    • Assesses how well responses use provided context
    • Evaluates accuracy of context usage
    • Measures relevance of context to the response
    • Analyzes context positioning in responses
  4. Faithfulness

    • Verifies that responses are faithful to provided context
    • Detects hallucinations or fabricated information
    • Evaluates claims against provided context
    • Provides detailed analysis of faithfulness breaches
  5. Prompt Alignment

    • Measures how well responses follow given instructions
    • Evaluates adherence to multiple instruction criteria
    • Provides per-instruction scoring
    • Supports custom instruction sets
  6. Toxicity

    • Detects toxic or harmful content in responses
    • Provides detailed reasoning for toxicity verdicts
    • Configurable scoring thresholds
    • Considers both input and output context
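
As a quick illustration of the LLM-based metrics listed above, the sketch below runs the Answer Relevancy metric against a single question/answer pair. It assumes AnswerRelevancyMetric is exported from the package root and takes the same options-object constructor as the Usage examples further down; check the package's type definitions for the exact signature.

import { openai } from '@ai-sdk/openai';
import { AnswerRelevancyMetric } from '@mastra/evals'; // assumed export, see note above

// Assumed constructor shape, mirroring the ToxicityMetric example in Usage
const relevancyMetric = new AnswerRelevancyMetric({
  model: openai('gpt-4'),
  scale: 1,
});

const result = await relevancyMetric.measure(
  'What is the capital of France?',
  'Paris is the capital of France.',
);

console.log('Relevancy Score:', result.score); // normalized score
console.log('Reasoning:', result.reason);      // explanation behind the score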

NLP-Based Metrics

  1. Completeness

    • Analyzes structural completeness of responses
    • Identifies missing elements from input requirements
    • Provides detailed element coverage analysis
    • Tracks input-output element ratios
  2. Content Similarity

    • Measures text similarity between inputs and outputs
    • Configurable for case and whitespace sensitivity
    • Returns normalized similarity scores
    • Uses string comparison algorithms for accuracy
  3. Keyword Coverage

    • Tracks presence of key terms from input in output
    • Provides detailed keyword matching statistics
    • Calculates coverage ratios
    • Useful for ensuring comprehensive responses
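
The NLP-based metrics above do not call a model, so they can run entirely offline. A minimal sketch using Keyword Coverage follows; it assumes KeywordCoverageMetric takes no constructor arguments and exposes the same measure(input, output) interface as the other metrics.

import { KeywordCoverageMetric } from '@mastra/evals'; // assumed export, see note above

// No model required: coverage is computed with plain NLP techniques
const coverageMetric = new KeywordCoverageMetric();

const result = await coverageMetric.measure(
  'Explain photosynthesis, chlorophyll, and sunlight.',
  'Photosynthesis uses chlorophyll to turn sunlight into chemical energy.',
);

console.log('Coverage Score:', result.score); // ratio of input keywords found in the output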

Usage

Basic Example

import { openai } from '@ai-sdk/openai'; // model provider used by the LLM-based metric (assumed; substitute your provider)
import { ContentSimilarityMetric, ToxicityMetric } from '@mastra/evals';

// Initialize metrics
const similarityMetric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: true,
});

const toxicityMetric = new ToxicityMetric({
  model: openai('gpt-4'),
  scale: 1, // Optional: adjust scoring scale
});

// Evaluate outputs
const input = 'What is the capital of France?';
const output = 'Paris is the capital of France.';

const similarityResult = await similarityMetric.measure(input, output);
const toxicityResult = await toxicityMetric.measure(input, output);

console.log('Similarity Score:', similarityResult.score);
console.log('Toxicity Score:', toxicityResult.score);

Context-Aware Evaluation

import { openai } from '@ai-sdk/openai'; // assumed model provider import, as in the Basic Example
import { FaithfulnessMetric } from '@mastra/evals';

// Initialize with context
const faithfulnessMetric = new FaithfulnessMetric({
  model: openai('gpt-4'),
  context: ['Paris is the capital of France', 'Paris has a population of 2.2 million'],
  scale: 1,
});

// Evaluate response against context
const result = await faithfulnessMetric.measure(
  'Tell me about Paris',
  'Paris is the capital of France with 2.2 million residents',
);

console.log('Faithfulness Score:', result.score);
console.log('Reasoning:', result.reason);

Metric Results

Each metric returns a standardized result object containing:

  • score: Normalized score (typically 0-1)
  • info: Detailed information about the evaluation
  • Additional metric-specific data (e.g., matched keywords, missing elements)

Some metrics also provide:

  • reason: Detailed explanation of the score
  • verdicts: Individual judgments that contributed to the final score
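
To make the result shape concrete, the sketch below logs the common fields and guards the optional ones. It reuses the toxicityMetric from the Basic Example; only score, info, reason, and verdicts come from the list above, everything else is illustrative.

const result = await toxicityMetric.measure(input, output);

// Always present
console.log('Score:', result.score); // normalized, typically 0-1
console.log('Info:', result.info);   // detailed evaluation information

// Provided by some metrics only, so guard before using
if (result.reason) console.log('Reason:', result.reason);
if (result.verdicts) console.log('Verdicts:', result.verdicts);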

Telemetry and Logging

The package includes built-in telemetry and logging capabilities:

  • Automatic evaluation tracking through Mastra Storage
  • Integration with OpenTelemetry for performance monitoring
  • Detailed evaluation traces for debugging

import { attachListeners } from '@mastra/evals';

// Enable basic evaluation tracking
await attachListeners();

// Store evals in Mastra Storage (if storage is enabled); here `mastra` is your
// configured Mastra instance from @mastra/core
await attachListeners(mastra);
// Note: When using in-memory storage, evaluations are isolated to the test process.
// When using file storage, evaluations are persisted and can be queried later.

Environment Variables

Required for LLM-based metrics:

  • OPENAI_API_KEY: For OpenAI model access
  • Additional provider keys as needed (Cohere, Anthropic, etc.)
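
A small, hedged sketch of a fail-fast check before constructing any LLM-based metric; the check itself is not part of the package, only the variable name comes from the list above.

// Fail fast if the key needed by LLM-based metrics is missing
if (!process.env.OPENAI_API_KEY) {
  throw new Error('OPENAI_API_KEY is required for LLM-based metrics such as ToxicityMetric');
}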

Package Exports

// Main package exports
import { evaluate } from '@mastra/evals';
// NLP-specific metrics
import { ContentSimilarityMetric } from '@mastra/evals/nlp';

Related Packages

  • @mastra/core: Core framework functionality
  • @mastra/engine: LLM execution engine
  • @mastra/mcp: Model Context Protocol integration