@tars-inc/eval-lib

v0.1.1

TypeScript building blocks for evaluating RAG retrieval pipelines and CX agents: chunkers, embedders, retrievers, span-based metrics, synthetic data generation, transcript analysis, and LangSmith integration.

Composable TypeScript building blocks for evaluating RAG retrieval pipelines and CX (customer experience) agents end-to-end.

Capabilities:

  • Span-based RAG evaluation — character-level recall, precision, IoU, F1 against ground-truth spans (not just chunk IDs)
  • Configurable retrieval pipelines — mix and match index strategies (Plain / Contextual / Summary / ParentChild), query rewriting (HyDE, MultiQuery, StepBack), search backends (Dense / BM25 / Hybrid), and refinement steps (Rerank / Threshold / Dedup / MMR / ExpandContext)
  • Synthetic dataset generation — three strategies: SimpleStrategy, DimensionDrivenStrategy, RealWorldGroundedStrategy, plus token-level ground-truth assignment
  • Conversation analysis — transcript parsing, microtopic extraction, message-type classification, agent-level statistics
  • Source ingestion — HTML scraping (ContentScraper) and file processing (PDF, Markdown, HTML → Markdown)
  • LangSmith integration — dataset upload, experiment runner, evaluator factory
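
Of the refinement steps listed above, MMR (maximal marginal relevance) trades relevance against redundancy so the top-k results are not near-duplicates. Here is a minimal sketch of the standard greedy MMR selection. The Candidate shape, the mmr function, and the default lambda = 0.5 are illustrative assumptions, not eval-lib's API.

```typescript
// Greedy Maximal Marginal Relevance (MMR): a generic sketch, not eval-lib's MMR step.
// lambda = 1 ranks purely by relevance; lower values penalize near-duplicates.

interface Candidate {
  id: string;
  relevance: number; // similarity to the query, e.g. cosine(query, chunk)
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function mmr(candidates: Candidate[], k: number, lambda = 0.5): Candidate[] {
  const selected: Candidate[] = [];
  const remaining = [...candidates]; // copy so the caller's array is untouched
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const c = remaining[i];
      // Redundancy: highest similarity to anything already selected (0 when nothing is).
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const s of selected) {
        redundancy = Math.max(redundancy, cosine(c.embedding, s.embedding));
      }
      const score = lambda * c.relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With lambda = 1 this reduces to a plain relevance sort; with lambda = 0.5, a second chunk whose embedding duplicates the first is skipped in favor of a less relevant but different one.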

Install

pnpm add @tars-inc/eval-lib@beta

Optional peer dependencies — install whichever providers you use:

pnpm add openai           # OpenAIEmbedder, pipeline LLM client
pnpm add cohere-ai        # CohereEmbedder, CohereReranker
pnpm add @anthropic-ai/sdk  # Claude-based conversation classification
pnpm add langsmith        # LangSmith dataset / experiment runner

Quick start: span-based evaluation with a custom retriever

import {
  createDocument,
  createCorpus,
  CallbackRetriever,
  computeMetrics,
  recall,
  precision,
  f1,
  PositionAwareChunkId,
  DocumentId,
} from "@tars-inc/eval-lib";

const corpus = createCorpus([
  createDocument({
    id: "faq.md",
    content: "How do I reset my password? Click 'Forgot Password' on the login page.",
  }),
]);

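// Toy retriever for illustration: it ignores `query` and `k` and always returns
// one chunk. `start`/`end` are character offsets of that chunk in faq.md.
const retriever = new CallbackRetriever({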
  name: "keyword-matcher",
  retrieveFn: async (query, k) => [
    {
      id: PositionAwareChunkId("chunk-1"),
      content: "Click 'Forgot Password' on the login page.",
      docId: DocumentId("faq.md"),
      start: 28,
      end: 70,
      metadata: {},
    },
  ],
});

await retriever.init(corpus);

const result = await computeMetrics({
  retriever,
  corpus,
  metrics: [recall, precision, f1],
  examples: [
    {
      inputs: { query: "how do I reset my password" },
      outputs: {
        relevantSpans: [{
          docId: "faq.md",
          start: 28,
          end: 70,
          text: "Click 'Forgot Password' on the login page.",
        }],
      },
      metadata: {},
    },
  ],
});

console.log(result);
await retriever.cleanup();
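
The callback in the quick start returns a hard-coded chunk regardless of the query. A more realistic retrieveFn scores pre-chunked content against the query. The following is a minimal keyword-overlap sketch; ChunkLike and keywordRetrieve are hypothetical names, and ChunkLike uses plain strings where the library uses branded ids (PositionAwareChunkId, DocumentId).

```typescript
// Hypothetical stand-in for the chunk objects returned in the quick start.
interface ChunkLike {
  id: string;
  content: string;
  docId: string;
  start: number; // character offset in the source document
  end: number;
  metadata: Record<string, unknown>;
}

// Lowercase alphanumeric tokens; punctuation is dropped.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Rank chunks by how many distinct query terms they contain and keep the
// top k chunks that match at least one term.
function keywordRetrieve(chunks: ChunkLike[], query: string, k: number): ChunkLike[] {
  const terms = Array.from(new Set(tokenize(query)));
  return chunks
    .map((chunk) => {
      const words = new Set(tokenize(chunk.content));
      const hits = terms.filter((t) => words.has(t)).length;
      return { chunk, hits };
    })
    .filter((scored) => scored.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

Inside the quick start's retrieveFn, a call such as keywordRetrieve(chunks, query, k) could replace the hard-coded array, provided the chunk ids are wrapped with PositionAwareChunkId and DocumentId as shown there.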

Using a built-in retriever preset

import { createHybridRerankedRetriever } from "@tars-inc/eval-lib";
import { OpenAIEmbedder } from "@tars-inc/eval-lib/embedders/openai";
import { CohereReranker } from "@tars-inc/eval-lib/rerankers/cohere";

const embedder = await OpenAIEmbedder.create({ model: "text-embedding-3-small" });
const reranker = new CohereReranker({ model: "rerank-english-v3.0" });

const retriever = createHybridRerankedRetriever({ embedder, reranker });
// `corpus` is the one built in the quick start above.
await retriever.init(corpus);
const hits = await retriever.retrieve("how do I reset my password", 10);
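
Hybrid search generally merges a dense (embedding) ranking with a BM25 ranking, and the sub-path table lists reciprocalRankFusion and weightedScoreFusion as the fusion helpers. Their signatures are not documented in this README, so this is a standalone sketch of the standard reciprocal rank fusion formula, score(id) = Σ 1 / (k + rank), using the commonly cited k = 60 as an assumed default. fuseRankings is a hypothetical name.

```typescript
// Reciprocal rank fusion (RRF): merge several ranked id lists into one.
// Every list adds 1 / (k + rank) for each id it contains (rank is 1-based),
// so ids ranked well by several retrievers float to the top.
function fuseRankings(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

For example, fusing a dense ranking ["a", "b", "c"] with a BM25 ranking ["b", "d", "a"] puts "b" first, because it ranks highly in both lists. RRF uses only ranks, so it needs no score normalization across retrievers, which is the usual reason to prefer it over weighted score fusion.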

Generating a synthetic evaluation dataset

import {
  SimpleStrategy,
  GroundTruthAssigner,
  RecursiveCharacterChunker,
  openAIClientAdapter,
} from "@tars-inc/eval-lib";
import OpenAI from "openai";

const llm = openAIClientAdapter(new OpenAI());
const strategy = new SimpleStrategy({ queriesPerDoc: 5 });
const chunker = new RecursiveCharacterChunker({ chunkSize: 500, chunkOverlap: 50 });

// `corpus` is the one built in the quick start above.
const queries = await strategy.generate({ corpus, llm, model: "gpt-4o-mini" });
const groundTruth = await new GroundTruthAssigner({ chunker }).assign(queries, corpus);
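
Ground-truth assignment and span-based metrics rely on chunks that keep their character offsets. RecursiveCharacterChunker's internals are not described here, so as a simplified illustration of what chunkSize and chunkOverlap mean, here is a fixed-window chunker that records start/end offsets. chunkWithOffsets and OffsetChunk are hypothetical names, and unlike a recursive splitter this sketch does not look for paragraph or sentence boundaries.

```typescript
// Fixed-size character chunker with overlap that records source offsets.
// Simplified stand-in for illustration; it is NOT a recursive splitter.
interface OffsetChunk {
  content: string;
  start: number; // inclusive offset in the source text
  end: number; // exclusive offset in the source text
}

function chunkWithOffsets(text: string, chunkSize: number, chunkOverlap: number): OffsetChunk[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const chunks: OffsetChunk[] = [];
  const step = chunkSize - chunkOverlap; // each window starts this far after the previous one
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ content: text.slice(start, end), start, end });
    if (end === text.length) break; // last window reached the end of the text
  }
  return chunks;
}
```

Because every chunk satisfies text.slice(start, end) === content, any chunk can be mapped back to the exact characters it covers, which is what span-overlap metrics need.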

Other strategies:

  • DimensionDrivenStrategy — generates orthogonal coverage across dimensions you define (task type, difficulty, persona, etc.)
  • RealWorldGroundedStrategy — matches a list of real user questions to documents via embedding similarity, then synthesizes variants
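
The matching step described for RealWorldGroundedStrategy can be pictured as a nearest-document search by cosine similarity. Here is a minimal sketch, assuming embeddings are already computed. The types, the function names, and the minSimilarity cutoff are assumptions for illustration, and the variant-synthesis step is not shown.

```typescript
// Nearest-document matching by cosine similarity (hypothetical names throughout).
// Producing the vectors (e.g. with an embedder) is outside this sketch.
interface Embedded {
  id: string;
  vector: number[];
}

interface QuestionMatch {
  questionId: string;
  docId: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For each question, keep its single best document if it clears minSimilarity.
function matchQuestionsToDocs(questions: Embedded[], docs: Embedded[], minSimilarity = 0.5): QuestionMatch[] {
  const matches: QuestionMatch[] = [];
  for (const q of questions) {
    let bestDocId = "";
    let bestSimilarity = -Infinity;
    for (const d of docs) {
      const similarity = cosineSimilarity(q.vector, d.vector);
      if (similarity > bestSimilarity) {
        bestSimilarity = similarity;
        bestDocId = d.id;
      }
    }
    if (docs.length > 0 && bestSimilarity >= minSimilarity) {
      matches.push({ questionId: q.id, docId: bestDocId, similarity: bestSimilarity });
    }
  }
  return matches;
}
```

Questions with no sufficiently similar document are dropped rather than forced onto a poor match; raising minSimilarity trades coverage for grounding quality.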

Sub-path entry points

Provider-specific code lives in sub-paths so you only pay for what you import:

| Path | Contents |
|---|---|
| @tars-inc/eval-lib | Core types, chunkers, retrievers, metrics, synthetic generation, presets |
| @tars-inc/eval-lib/embedders/openai | OpenAIEmbedder |
| @tars-inc/eval-lib/embedders/cohere | CohereEmbedder |
| @tars-inc/eval-lib/embedders/voyage | VoyageEmbedder |
| @tars-inc/eval-lib/embedders/jina | JinaEmbedder |
| @tars-inc/eval-lib/rerankers/cohere | CohereReranker |
| @tars-inc/eval-lib/rerankers/jina | JinaReranker |
| @tars-inc/eval-lib/rerankers/voyage | VoyageReranker |
| @tars-inc/eval-lib/pipeline/internals | BM25SearchIndex, fusion (weightedScoreFusion, reciprocalRankFusion), dimension discovery, refinement defaults |
| @tars-inc/eval-lib/pipeline/llm-openai | OpenAIPipelineLLM for query expansion / rewrite |
| @tars-inc/eval-lib/llm | createLLMClient, createEmbedder, getModel, DEFAULT_MODEL (Node-only) |
| @tars-inc/eval-lib/langsmith | getLangSmithClient, uploadDataset, runLangSmithExperiment, createLangSmithEvaluator (Node-only) |
| @tars-inc/eval-lib/utils | Hashing, span helpers, retry, concurrency, cosine similarity |
| @tars-inc/eval-lib/shared | Constants and shared types (JobStatus, SerializedSpan, ExperimentResult) |
| @tars-inc/eval-lib/file-processing | processFile, htmlToMarkdown, pdfToMarkdown |
| @tars-inc/eval-lib/scraper | ContentScraper, filterLinks, normalizeUrl, seed-entity helpers |
| @tars-inc/eval-lib/registry | Component registries for embedders, rerankers, chunkers, strategies, presets |
| @tars-inc/eval-lib/data-analysis | parseTranscript, parseBotFlowInput, computeBasicStats, classifyMessageTypes, extractMicrotopics |
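
BM25SearchIndex, listed under pipeline/internals, backs the BM25 search option. Its parameters are not documented here, so the following is a generic Okapi BM25 scorer using the conventional defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF. These choices are assumptions, not taken from the library, and SimpleBM25 is a hypothetical name.

```typescript
// Generic Okapi BM25 scorer (a sketch, not eval-lib's BM25SearchIndex).
// Tokenization: lowercase alphanumeric runs; no stemming or stop words.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

class SimpleBM25 {
  private readonly docs: string[][];
  private readonly docFreq = new Map<string, number>(); // term -> number of docs containing it
  private readonly avgDocLength: number;
  private readonly k1: number; // term-frequency saturation
  private readonly b: number; // strength of document-length normalization

  constructor(texts: string[], k1 = 1.2, b = 0.75) {
    this.k1 = k1;
    this.b = b;
    this.docs = texts.map((t) => tokenize(t));
    for (const tokens of this.docs) {
      for (const term of Array.from(new Set(tokens))) {
        this.docFreq.set(term, (this.docFreq.get(term) ?? 0) + 1);
      }
    }
    const totalTokens = this.docs.reduce((sum, d) => sum + d.length, 0);
    this.avgDocLength = this.docs.length > 0 ? totalTokens / this.docs.length : 0;
  }

  // Lucene-style IDF: the "+ 1" keeps it positive even for very common terms.
  private idf(term: string): number {
    const n = this.docFreq.get(term) ?? 0;
    const N = this.docs.length;
    return Math.log((N - n + 0.5) / (n + 0.5) + 1);
  }

  score(query: string, docIndex: number): number {
    const tokens = this.docs[docIndex];
    const lengthNorm = 1 - this.b + this.b * (tokens.length / this.avgDocLength);
    let total = 0;
    for (const term of Array.from(new Set(tokenize(query)))) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      total += (this.idf(term) * tf * (this.k1 + 1)) / (tf + this.k1 * lengthNorm);
    }
    return total;
  }

  // Top-k documents with a positive score, best first.
  search(query: string, k: number): Array<{ index: number; score: number }> {
    return this.docs
      .map((_, index) => ({ index, score: this.score(query, index) }))
      .filter((hit) => hit.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

A rarer query term (here, "reset" versus "password") carries a larger IDF, so a document containing both outranks one that only contains the common term.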

Key concepts

  • PositionAwareChunker: chunkers that preserve start/end character offsets in the source document. Required for span-based metrics.
  • Retriever interface: init(corpus), then retrieve(query, k), then cleanup(). retrieve returns PositionAwareChunk[] with character offsets, which enables span-overlap metrics rather than chunk-ID matching.
  • Span-based metrics: recall, precision, iou, and f1 operate on CharacterSpan[] and compute character-level overlap. mergeOverlappingSpans coalesces overlapping spans before comparison.
  • PipelineRetriever: a config-driven retriever composed of IndexConfig × QueryConfig × SearchConfig × RefinementStepConfig[]. Use the preset factories (createBaselineVectorRagRetriever, createBM25Retriever, createHybridRetriever, createHybridRerankedRetriever) for common combinations.
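
To make the character-overlap idea concrete, here is a standalone sketch of the four span metrics. It assumes end offsets are exclusive (consistent with the quick start, where start: 28, end: 70 covers the 42-character answer) and uses recall = overlap / relevant characters, precision = overlap / retrieved characters, IoU = overlap / union, and F1 as the harmonic mean of recall and precision. The function names are hypothetical; eval-lib's own recall, precision, iou, and f1 may differ in details such as how they treat empty inputs.

```typescript
// Character-level span metrics: a standalone sketch, not eval-lib's implementation.
// Offsets are [start, end) within a document, i.e. end is exclusive.
interface Span {
  docId: string;
  start: number;
  end: number;
}

// Sort by document and start offset, then coalesce overlapping or touching spans.
function mergeSpans(spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) =>
    a.docId === b.docId ? a.start - b.start : a.docId < b.docId ? -1 : 1,
  );
  const merged: Span[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.docId === s.docId && s.start <= last.end) {
      last.end = Math.max(last.end, s.end);
    } else {
      merged.push({ ...s }); // copy so the caller's spans are not mutated
    }
  }
  return merged;
}

function totalLength(spans: Span[]): number {
  return spans.reduce((sum, s) => sum + (s.end - s.start), 0);
}

// Characters covered by both sets; inputs must already be merged so nothing is double-counted.
function overlapLength(a: Span[], b: Span[]): number {
  let total = 0;
  for (const x of a) {
    for (const y of b) {
      if (x.docId === y.docId) {
        total += Math.max(0, Math.min(x.end, y.end) - Math.max(x.start, y.start));
      }
    }
  }
  return total;
}

function spanMetrics(retrieved: Span[], relevant: Span[]) {
  const r = mergeSpans(retrieved);
  const g = mergeSpans(relevant);
  const overlap = overlapLength(r, g);
  const retrievedChars = totalLength(r);
  const relevantChars = totalLength(g);
  const recall = relevantChars === 0 ? 0 : overlap / relevantChars;
  const precision = retrievedChars === 0 ? 0 : overlap / retrievedChars;
  const union = retrievedChars + relevantChars - overlap;
  const iou = union === 0 ? 0 : overlap / union;
  const f1 = recall + precision === 0 ? 0 : (2 * recall * precision) / (recall + precision);
  return { recall, precision, iou, f1 };
}
```

For instance, retrieving faq.md characters 0 to 50 against the ground-truth span 28 to 70 overlaps on 22 characters, giving recall 22/42, precision 22/50, and IoU 22/70. Retrieving the same region twice does not change the numbers, because the spans are merged first.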

What this library is not

  • Not a vector database — InMemoryVectorStore is included for dev/test only
  • Not an LLM provider — wraps OpenAI / Cohere / Anthropic SDKs
  • Not a UI library or deployment platform
  • Not a multi-turn chat engine — focused on single-turn retrieval and conversation analysis

Source

Source, tests, and end-to-end examples: Tars-Technologies/cx-agent-evals under packages/eval-lib/.

License

MIT