
@future-agi/ai-evaluation

TypeScript SDK for Future AGI's AI evaluation platform. Evaluate LLM outputs with 50+ metrics, including factual accuracy, groundedness, and relevance.

Installation

npm install @future-agi/ai-evaluation
# or
pnpm add @future-agi/ai-evaluation
# or
yarn add @future-agi/ai-evaluation

Quick Start

import { evaluate } from '@future-agi/ai-evaluation';

// Set your API credentials
process.env.FI_API_KEY = 'your-api-key';
process.env.FI_SECRET_KEY = 'your-secret-key';

// Run an evaluation
const result = await evaluate(
  'Factual Accuracy',
  {
    response: ['The capital of France is Paris.'],
    context: ['Paris is the capital and largest city of France.']
  }
);

console.log(result.eval_results[0].output); // Score: 0-1
console.log(result.eval_results[0].reason); // Explanation

Features

  • Cloud Evaluations: 50+ evaluation metrics via Future AGI API
  • Local Evaluations: Run heuristic metrics offline without API calls
  • Hybrid Mode: Automatically route between local and cloud execution
  • Local LLM Support: Use Ollama for LLM-as-judge evaluations locally
  • Platform Integration: Langfuse integration for observability
  • Pipeline Evaluation: Evaluate entire ML pipelines

Usage

Cloud Evaluation (Default)

import { Evaluator } from '@future-agi/ai-evaluation';

const evaluator = new Evaluator({
  fiApiKey: 'your-api-key',
  fiSecretKey: 'your-secret-key'
});

// Single evaluation
const result = await evaluator.evaluate(
  'Groundedness',
  {
    query: ['What is machine learning?'],
    response: ['Machine learning is a subset of AI...'],
    context: ['Machine learning (ML) is a field of AI...']
  },
  { modelName: 'gpt-4o' }
);

// Async evaluation (returns immediately, poll for results)
const asyncResult = await evaluator.evaluate(
  'Factual Accuracy',
  { response: ['...'], context: ['...'] },
  { isAsync: true }
);

// Get async result later
const finalResult = await evaluator.getEvalResult(asyncResult.eval_id);
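
A minimal polling sketch for async results; the completion signal below is an assumption, so inspect the returned object to confirm when eval_results is populated:

// Hypothetical helper: assumes eval_results stays empty until the
// evaluation finishes. Verify against the actual response shape.
async function waitForEvalResult(evalId: string, intervalMs = 2000) {
  for (;;) {
    const res: any = await evaluator.getEvalResult(evalId);
    if (res.eval_results && res.eval_results.length > 0) return res;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

const settled = await waitForEvalResult(asyncResult.eval_id);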

Local Evaluation (Offline)

Run evaluations locally without API calls using heuristic metrics:

import { LocalEvaluator } from '@future-agi/ai-evaluation/local';

const evaluator = new LocalEvaluator();

// String metrics
const containsResult = evaluator.evaluate(
  'contains',
  [{ response: 'Hello world' }],
  { keyword: 'world' }
);
// Score: 1.0 (contains the keyword)

// JSON validation
const jsonResult = evaluator.evaluate(
  'json_schema',
  [{ response: '{"name": "John", "age": 30}' }],
  {
    schema: {
      type: 'object',
      properties: { name: { type: 'string' }, age: { type: 'number' } },
      required: ['name']
    }
  }
);

// Similarity metrics
const bleuResult = evaluator.evaluate(
  'bleu_score',
  [{ response: 'The cat sat on the mat' }],
  { reference: 'The cat is on the mat' }
);

Available Local Metrics

| Category | Metrics |
|----------|---------|
| String | regex, contains, contains_all, contains_any, contains_none, one_line, equals, starts_with, ends_with, length_less_than, length_greater_than, length_between |
| JSON | contains_json, is_json, json_schema |
| Similarity | bleu_score, rouge_score, recall_score, levenshtein_similarity, numeric_similarity, semantic_list_contains |
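
A quick sketch exercising a few of these locally (the option names pattern, min, and max are assumptions; check the metric signatures in your version):

import { LocalEvaluator } from '@future-agi/ai-evaluation/local';

const local = new LocalEvaluator();

// regex: the option name `pattern` is an assumption
const emailCheck = local.evaluate(
  'regex',
  [{ response: 'Contact: dev@example.com' }],
  { pattern: '[\\w.]+@[\\w.]+' }
);

// length_between: the option names `min`/`max` are assumptions
const lengthCheck = local.evaluate(
  'length_between',
  [{ response: 'short answer' }],
  { min: 5, max: 100 }
);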

Hybrid Evaluation

Automatically route between local and cloud execution:

import { HybridEvaluator, OllamaLLM } from '@future-agi/ai-evaluation/local';
import { Evaluator } from '@future-agi/ai-evaluation';

// Setup hybrid evaluator with local LLM
const localLLM = new OllamaLLM({ model: 'llama3.2' });
const cloudEvaluator = new Evaluator();

const hybrid = new HybridEvaluator({
  localLLM,
  cloudEvaluator,
  preferLocal: true,      // Prefer local when possible
  fallbackToCloud: true,  // Fall back to cloud if local fails
  offlineMode: false      // Set true to disable cloud entirely
});

// Heuristic metrics run locally
const localResult = await hybrid.evaluate(
  'contains',
  [{ response: 'Hello world' }],
  { keyword: 'world' }
);

// LLM-based metrics use local Ollama if available
const llmResult = await hybrid.evaluate(
  'groundedness',
  [{
    query: 'What is AI?',
    response: 'AI is artificial intelligence.',
    context: 'Artificial intelligence (AI) is...'
  }]
);
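
For a fully offline setup, a sketch (that cloudEvaluator may be omitted when offlineMode is set is an assumption):

import { HybridEvaluator, OllamaLLM } from '@future-agi/ai-evaluation/local';

// Fully offline: heuristic metrics run in-process, LLM-based metrics
// go to local Ollama, and the cloud API is never called.
const offline = new HybridEvaluator({
  localLLM: new OllamaLLM({ model: 'llama3.2' }),
  offlineMode: true  // disable cloud entirely
  // Assumption: no cloudEvaluator is needed in offline mode.
});

const offlineResult = await offline.evaluate(
  'contains',
  [{ response: 'Hello world' }],
  { keyword: 'world' }
);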

Local LLM with Ollama

Use Ollama for local LLM-as-judge evaluations:

import { OllamaLLM } from '@future-agi/ai-evaluation/local';

// Ensure Ollama is running: ollama serve
const llm = new OllamaLLM({
  model: 'llama3.2',           // Model name
  baseUrl: 'http://localhost:11434',  // Ollama URL
  temperature: 0.0,            // Deterministic output
  maxTokens: 1024,
  timeout: 120                 // Seconds
});

// Check availability
const isAvailable = await llm.isAvailable();

// Direct generation
const response = await llm.generate('Explain quantum computing');

// Chat completion
const chatResponse = await llm.chat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is machine learning?' }
]);

// LLM-as-judge
const judgment = await llm.judge(
  'What is the capital of France?',      // Query
  'The capital of France is Paris.',     // Response to evaluate
  'Evaluate factual accuracy. Score 0-1.', // Criteria
  'Paris is the capital of France.'      // Optional context
);
// Returns: { score: 1.0, passed: true, reason: '...' }
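
In practice it is worth gating judge calls on availability, for example:

// Only use the local judge when Ollama is reachable; otherwise
// fall back to the cloud evaluator or skip the check.
if (await llm.isAvailable()) {
  const verdict = await llm.judge(
    'What is 2 + 2?',
    'The answer is 4.',
    'Evaluate arithmetic correctness. Score 0-1.'
  );
  console.log(verdict.passed);
} else {
  console.warn('Ollama not reachable; skipping local judge');
}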

Pipeline Evaluation

Evaluate entire ML pipelines:

const evaluator = new Evaluator();

// Submit pipeline evaluation
await evaluator.evaluatePipeline(
  'my-project',
  'v1.0.0',
  [
    { input: 'query1', output: 'response1', context: 'ctx1' },
    { input: 'query2', output: 'response2', context: 'ctx2' }
  ]
);

// Get results for multiple versions
const results = await evaluator.getPipelineResults(
  'my-project',
  ['v1.0.0', 'v1.1.0', 'v2.0.0']
);
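
Records can be adapted from whatever your pipeline already logs; a sketch (RunLog and toRecords are hypothetical helpers, not part of the SDK):

// Hypothetical adapter from application logs to the record shape above.
interface RunLog { question: string; answer: string; retrieved: string }

function toRecords(logs: RunLog[]) {
  return logs.map((l) => ({
    input: l.question,
    output: l.answer,
    context: l.retrieved
  }));
}

await evaluator.evaluatePipeline('my-project', 'v1.1.0', toRecords([
  { question: 'query1', answer: 'response1', retrieved: 'ctx1' }
]));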

Langfuse Integration

Enable observability with Langfuse:

const evaluator = new Evaluator({
  fiApiKey: process.env.FI_API_KEY,
  fiSecretKey: process.env.FI_SECRET_KEY,
  langfuseSecretKey: process.env.LANGFUSE_SECRET_KEY,
  langfusePublicKey: process.env.LANGFUSE_PUBLIC_KEY,
  langfuseHost: process.env.LANGFUSE_HOST
});

// Evaluations will be logged to Langfuse
const result = await evaluator.evaluate(
  'Groundedness',
  { response: ['...'], context: ['...'] },
  { platform: 'langfuse', customEvalName: 'my-eval' }
);

API Reference

Main Exports (@future-agi/ai-evaluation)

| Export | Description |
|--------|-------------|
| Evaluator | Main class for cloud evaluations |
| evaluate() | Convenience function for single evaluation |
| list_evaluations() | List available evaluation templates |
| get_eval_result() | Get async evaluation result |
| evaluate_pipeline() | Evaluate a pipeline |
| get_pipeline_results() | Get pipeline results |

Local Exports (@future-agi/ai-evaluation/local)

| Export | Description |
|--------|-------------|
| LocalEvaluator | Run heuristic metrics locally |
| HybridEvaluator | Route between local and cloud |
| OllamaLLM | Local LLM client via Ollama |
| LocalLLMFactory | Factory for creating LLM instances |
| canRunLocally() | Check if metric runs locally |
| requiresLLM() | Check if metric needs LLM |
| Individual metrics | contains, regex, bleuScore, etc. |
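
canRunLocally() and requiresLLM() can drive routing decisions; a sketch assuming both take a metric name and return a boolean:

import { canRunLocally, requiresLLM } from '@future-agi/ai-evaluation/local';

// Hypothetical router built on the helpers above; signatures are
// assumed to be (metricName: string) => boolean.
function routeMetric(name: string): 'local' | 'local-llm' | 'cloud' {
  if (!canRunLocally(name)) return 'cloud';
  return requiresLLM(name) ? 'local-llm' : 'local';
}

console.log(routeMetric('contains')); // expected: 'local'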

Environment Variables

| Variable | Description |
|----------|-------------|
| FI_API_KEY | Future AGI API key |
| FI_SECRET_KEY | Future AGI secret key |
| FI_BASE_URL | API base URL (optional) |
| LANGFUSE_SECRET_KEY | Langfuse secret key (optional) |
| LANGFUSE_PUBLIC_KEY | Langfuse public key (optional) |
| LANGFUSE_HOST | Langfuse host URL (optional) |
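
A typical fail-fast setup, as a sketch (only the two FI_* keys are treated as required, per the table above):

import { Evaluator } from '@future-agi/ai-evaluation';

// Fail fast when required credentials are missing; the optional
// variables simply fall through as undefined.
const { FI_API_KEY, FI_SECRET_KEY } = process.env;
if (!FI_API_KEY || !FI_SECRET_KEY) {
  throw new Error('FI_API_KEY and FI_SECRET_KEY must be set');
}

const evaluator = new Evaluator({
  fiApiKey: FI_API_KEY,
  fiSecretKey: FI_SECRET_KEY
});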

Requirements

  • Node.js >= 18.0.0
  • For local LLM: Ollama installed and running

License

MIT
