@reaatech/classifier-evals-exporters

v0.1.0

Exporters for classifier evaluation results (JSON, HTML, Phoenix, Langfuse)

Status: Pre-1.0 — APIs may change in minor versions. Pin to a specific version in production.

Exporters for classifier evaluation results. Supports JSON (machine-readable), HTML (interactive report), Arize Phoenix (traces and embeddings), and Langfuse (observability traces).

Installation

npm install @reaatech/classifier-evals-exporters
# or
pnpm add @reaatech/classifier-evals-exporters

Feature Overview

  • JSON export — machine-readable output with optional sample inclusion, PII-aware redaction
  • HTML report — interactive SVG-based confusion matrix heatmap, per-class bar charts, metrics dashboard
  • Phoenix export — Arize Phoenix trace export with full metrics as span attributes and HTTP transport
  • Langfuse export — Langfuse trace ingestion with authentication, session grouping, and structured metadata
  • Dual ESM/CJS output — works with import and require

Quick Start

import {
  exportToJson,
  exportToHtml,
  exportToPhoenix,
  exportToLangfuse,
} from "@reaatech/classifier-evals-exporters";

// JSON export (PII-redacted by default)
const jsonResult = exportToJson({ evalRun });
console.log(jsonResult.json);

// HTML report with confusion matrix and per-class metrics
const htmlResult = exportToHtml(evalRun, {
  includeConfusionMatrix: true,
  includePerClassMetrics: true,
  title: "Classifier v2 Evaluation",
});
console.log(htmlResult.html);

// Phoenix trace export
await exportToPhoenix({
  evalRun,
  options: { endpoint: "http://localhost:6006", datasetName: "intent-classifier-v2" },
});

// Langfuse trace export
await exportToLangfuse({
  evalRun,
  options: {
    publicKey: process.env.LANGFUSE_PUBLIC_KEY,
    secretKey: process.env.LANGFUSE_SECRET_KEY,
    traceName: "classifier-evaluation",
  },
});

API Reference

JSON Exporter

exportToJson(input: JsonExportInput): ExportResult

Exports an EvalRun as a structured JSON payload with configurable options.

const result = exportToJson({
  evalRun,
  options: { includeSamples: false },
});
// result.json → string, result.success → boolean

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| includeSamples | boolean | false | Include raw sample data (PII consideration) |
| includePerClass | boolean | true | Include per-class breakdown |
| includeVisualizationData | boolean | false | Include visualization-ready data |

The JSON payload includes: run_id, dataset_name, dataset_path, total_samples, duration_ms, started_at, completed_at, metrics, confusion_matrix, gate_results (if present), judge summary (if judged), and redacted metadata.
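As a rough illustration of consuming this payload downstream, the sketch below parses the JSON string and reads a few of the fields listed above. The `PayloadSketch` interface and `summarize` helper are hypothetical, the field types are assumptions (only the field names are documented here), and the sample values are made up.

```typescript
// Hypothetical, partial view of the payload. Field names come from the list
// above; the types are assumptions.
interface PayloadSketch {
  run_id: string;
  dataset_name: string;
  total_samples: number;
  duration_ms: number;
  metrics: Record<string, unknown>;
  gate_results?: unknown; // documented as present only when gates exist
}

// Parse the exported JSON string and build a one-line summary.
function summarize(json: string): string {
  const payload = JSON.parse(json) as PayloadSketch;
  const gates = payload.gate_results === undefined ? "no gates" : "gates present";
  return (
    `${payload.dataset_name} (${payload.run_id}): ` +
    `${payload.total_samples} samples in ${payload.duration_ms} ms, ${gates}`
  );
}

// Made-up values purely for illustration; in practice this would be result.json.
const exampleJson = JSON.stringify({
  run_id: "run-001",
  dataset_name: "intents",
  total_samples: 200,
  duration_ms: 1500,
  metrics: {},
});

const summaryLine = summarize(exampleJson);
console.log(summaryLine);
```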

HTML Exporter

exportToHtml(evalRun: EvalRun, options?: HtmlExportOptions): HtmlExportResult

Generates a self-contained interactive HTML report with inline SVG charts.

const report = exportToHtml(evalRun, {
  includeConfusionMatrix: true,   // SVG heatmap (default: true)
  includePerClassMetrics: true,   // Per-class metrics table (default: true)
  includeBaselineComparison: true, // Baseline comparison section (default: false)
  includeJudgeAnalysis: true,     // LLM judge results section (default: true)
  title: "My Evaluation Report",
});

The report includes:

  • Header — dataset name, run ID, total samples, duration, timestamps
  • Metrics grid — accuracy, macro/micro F1, precision, recall, MCC, Cohen's Kappa
  • Confusion matrix — SVG heatmap with color-coded cells and labels
  • Per-class metrics — table with precision, recall, F1, and support per label
  • Gate results — pass/fail status for each gate (if gate results present)
  • Judge results — agreement rate, cost breakdown (if judged)
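The per-class figures in the report follow the standard definitions over a confusion matrix. The sketch below computes them under the assumption that rows are true labels and columns are predicted labels; the matrix orientation is not stated here, and `perClassMetrics` and `accuracy` are hypothetical helpers, not part of this package.

```typescript
interface ClassMetrics {
  label: string;
  precision: number;
  recall: number;
  f1: number;
  support: number;
}

// Assumption: matrix[i][j] counts samples with true label i predicted as j.
function perClassMetrics(labels: string[], matrix: number[][]): ClassMetrics[] {
  return labels.map((label, i) => {
    const row = matrix[i] ?? [];
    const tp = row[i] ?? 0;
    // Row sum: number of samples whose true label is this class.
    const support = row.reduce((sum, count) => sum + count, 0);
    // Column sum: number of samples predicted as this class.
    const predicted = matrix.reduce((sum, r) => sum + (r[i] ?? 0), 0);
    const precision = predicted === 0 ? 0 : tp / predicted;
    const recall = support === 0 ? 0 : tp / support;
    const f1 =
      precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
    return { label, precision, recall, f1, support };
  });
}

// Accuracy is the diagonal total divided by the grand total.
function accuracy(matrix: number[][]): number {
  let correct = 0;
  let total = 0;
  matrix.forEach((row, i) => {
    row.forEach((count, j) => {
      total += count;
      if (i === j) correct += count;
    });
  });
  return total === 0 ? 0 : correct / total;
}

// Made-up two-class matrix for illustration.
const demoLabels = ["a", "b"];
const demoMatrix = [
  [8, 2],
  [1, 9],
];
const demoMetrics = perClassMetrics(demoLabels, demoMatrix);
const demoAccuracy = accuracy(demoMatrix);
console.log(demoMetrics, demoAccuracy);
```

Macro F1 is then the unweighted mean of the per-class `f1` values.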

SVG charts are generated inline — no external dependencies or CDN requests.

Phoenix Exporter

exportToPhoenix(input: PhoenixExportInput): Promise<ExportResult>

Publishes evaluation results to an Arize Phoenix server as OpenTelemetry traces.

await exportToPhoenix({
  evalRun,
  options: {
    endpoint: "http://localhost:6006",  // Phoenix endpoint (default)
    datasetName: "intent-classifier",   // Dataset name in Phoenix (default: "classifier-evals")
    apiKey: process.env.PHOENIX_API_KEY, // Optional API key
    metadata: { model: "v2", env: "staging" },
  },
});

The export creates a single trace per eval run with one span containing:

  • Dataset info — name, path, total samples
  • Full metrics — all 14 ClassificationMetrics as span attributes
  • Confusion matrix metadata — labels array and class count
  • Custom metadata — user-provided metadata merged with PII-redacted content

Requests are sent with fetch() under a 30-second timeout. Authentication is configurable and uses an Authorization: Bearer header.
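For readers unfamiliar with the pattern, here is a minimal sketch of a `fetch()` call with a timeout and an optional Bearer token. It is not the package's actual transport code: `buildBearerRequest` and `sendWithTimeout` are hypothetical names, the JSON content type is an assumption, and the request path on the Phoenix server is left to the caller because it is not documented here.

```typescript
interface RequestSketch {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request description; the Authorization header is added only
// when an API key is supplied.
function buildBearerRequest(url: string, payload: unknown, apiKey?: string): RequestSketch {
  const headers: Record<string, string> = {
    "Content-Type": "application/json", // assumption: JSON body
  };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return { url, method: "POST", headers, body: JSON.stringify(payload) };
}

// Send the request and abort it if no response arrives within timeoutMs.
async function sendWithTimeout(req: RequestSketch, timeoutMs = 30000): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(req.url, {
      method: req.method,
      headers: req.headers,
      body: req.body,
      signal: controller.signal,
    });
    return res.status;
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}

const authedRequest = buildBearerRequest("http://localhost:6006", { ok: true }, "example-key");
const anonymousRequest = buildBearerRequest("http://localhost:6006", { ok: true });
console.log(authedRequest.headers, anonymousRequest.headers);
```

Splitting request construction from sending keeps the header logic testable without a running server.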

Langfuse Exporter

exportToLangfuse(input: LangfuseExportInput): Promise<ExportResult>

Publishes evaluation results to Langfuse as trace events.

await exportToLangfuse({
  evalRun,
  options: {
    publicKey: process.env.LANGFUSE_PUBLIC_KEY,
    secretKey: process.env.LANGFUSE_SECRET_KEY,
    baseUrl: "https://cloud.langfuse.com",  // Default
    traceName: "classifier-evaluation",       // Default
    sessionId: `eval-${Date.now()}`,
  },
});

The export creates a trace event with:

  • Input — dataset path, total samples
  • Output — full metrics, gate pass status
  • Metadata — duration, run ID, confusion matrix class count

Requests use HTTP Basic Authentication (publicKey:secretKey) and are sent with fetch() under a 30-second timeout.
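HTTP Basic Authentication encodes `publicKey:secretKey` as Base64 and sends it in the Authorization header. The sketch below shows that encoding, with a guard for missing keys since environment variables can be undefined. `basicAuthHeader` is a hypothetical helper, and the key values are placeholders.

```typescript
// Build a Basic auth header value from a key pair.
// btoa handles Latin-1 input only, which is sufficient for ASCII API keys.
function basicAuthHeader(publicKey: string | undefined, secretKey: string | undefined): string {
  if (!publicKey || !secretKey) {
    throw new Error("Both publicKey and secretKey are required");
  }
  return `Basic ${btoa(`${publicKey}:${secretKey}`)}`;
}

// Placeholder keys for illustration only.
const exampleHeader = basicAuthHeader("pk", "sk");
console.log(exampleHeader);
```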

Usage Patterns

Export Pipeline

import { promises as fs } from "node:fs";
import { createEvalRunFromSamples } from "@reaatech/classifier-evals-metrics";
import { exportToJson, exportToHtml } from "@reaatech/classifier-evals-exporters";

// Build the eval run (`samples` is assumed to be loaded elsewhere)
const evalRun = createEvalRunFromSamples({ datasetPath: "./test.csv", samples });

// Export to JSON for CI artifacts
const jsonResult = exportToJson({ evalRun });
await fs.writeFile("results.json", jsonResult.json);

// Export to HTML for human review
const htmlResult = exportToHtml(evalRun, { title: "Production Eval — v2.1.0" });
await fs.writeFile("report.html", htmlResult.html);
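The JSON result carries a `success` flag alongside `json`, so a pipeline may want to check it before writing an artifact. The guard below is a small sketch of that idea; `JsonResultSketch` is a local stand-in for the real result type, and `jsonOrThrow` is a hypothetical helper.

```typescript
// Local stand-in for the documented result shape (json: string, success: boolean).
interface JsonResultSketch {
  success: boolean;
  json: string;
}

// Return the JSON string, or fail loudly so CI does not publish a bad artifact.
function jsonOrThrow(result: JsonResultSketch): string {
  if (!result.success) {
    throw new Error("JSON export failed; refusing to write artifact");
  }
  return result.json;
}

// Made-up result for illustration.
const checkedJson = jsonOrThrow({ success: true, json: "{}" });
console.log(checkedJson);
```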

PII-Safe Export

// JSON export redacts PII by default (includeSamples: false)
const safe = exportToJson({ evalRun });

// Metadata is automatically redacted for PII before inclusion
// Phoenix export also redacts metadata via redactObjectPII()
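To give a sense of what recursive metadata redaction can look like, here is a minimal sketch that masks email addresses in nested values. It is not the package's `redactObjectPII()` implementation: the email pattern, the `[REDACTED]` placeholder, and the `redactEmailsSketch` name are all assumptions made for illustration.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Assumption: a simple email pattern; real PII detection covers far more cases.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Walk strings, arrays, and plain objects, masking emails in every string.
function redactEmailsSketch(value: Json): Json {
  if (typeof value === "string") {
    return value.replace(EMAIL_PATTERN, "[REDACTED]");
  }
  if (Array.isArray(value)) {
    return value.map(redactEmailsSketch);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = redactEmailsSketch(value[key] as Json);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

// Made-up metadata for illustration.
const redactedExample = redactEmailsSketch({
  owner: "a@b.co",
  tags: ["x", "mail c@d.org"],
  n: 1,
});
console.log(JSON.stringify(redactedExample));
```

The function returns a new structure rather than mutating its input, so the original metadata stays available to the caller.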

Related Packages

License

MIT