@tars-inc/eval-lib
v0.1.1
TypeScript building blocks for evaluating RAG retrieval pipelines and CX agents: chunkers, embedders, retrievers, span-based metrics, synthetic data generation, transcript analysis, and LangSmith integration.
Composable TypeScript building blocks for evaluating RAG retrieval pipelines and CX (customer experience) agents end-to-end.
Capabilities:
- Span-based RAG evaluation: character-level recall, precision, IoU, F1 against ground-truth spans (not just chunk IDs)
- Configurable retrieval pipelines: mix and match index strategies (Plain / Contextual / Summary / ParentChild), query rewriting (HyDE, MultiQuery, StepBack), search backends (Dense / BM25 / Hybrid), and refinement steps (Rerank / Threshold / Dedup / MMR / ExpandContext)
- Synthetic dataset generation: three strategies (SimpleStrategy, DimensionDrivenStrategy, RealWorldGroundedStrategy), plus token-level ground-truth assignment
- Conversation analysis: transcript parsing, microtopic extraction, message-type classification, agent-level statistics
- Source ingestion: HTML scraping (ContentScraper) and file processing (PDF, Markdown, HTML → Markdown)
- LangSmith integration: dataset upload, experiment runner, evaluator factory
Install
pnpm add @tars-inc/eval-lib@beta
Optional peer dependencies (install whichever providers you use):
pnpm add openai # OpenAIEmbedder, pipeline LLM client
pnpm add cohere-ai # CohereEmbedder, CohereReranker
pnpm add @anthropic-ai/sdk # Claude-based conversation classification
pnpm add langsmith # LangSmith dataset / experiment runner
Quick start: span-based evaluation with a custom retriever
import {
  createDocument,
  createCorpus,
  CallbackRetriever,
  computeMetrics,
  recall,
  precision,
  f1,
  PositionAwareChunkId,
  DocumentId,
} from "@tars-inc/eval-lib";
const corpus = createCorpus([
  createDocument({
    id: "faq.md",
    content: "How do I reset my password? Click 'Forgot Password' on the login page.",
  }),
]);
const retriever = new CallbackRetriever({
  name: "keyword-matcher",
  retrieveFn: async (query, k) => [
    {
      id: PositionAwareChunkId("chunk-1"),
      content: "Click 'Forgot Password' on the login page.",
      docId: DocumentId("faq.md"),
      start: 28,
      end: 70,
      metadata: {},
    },
  ],
});
await retriever.init(corpus);
const result = await computeMetrics({
  retriever,
  corpus,
  metrics: [recall, precision, f1],
  examples: [
    {
      inputs: { query: "how do I reset my password" },
      outputs: {
        relevantSpans: [{
          docId: "faq.md",
          start: 28,
          end: 70,
          text: "Click 'Forgot Password' on the login page.",
        }],
      },
      metadata: {},
    },
  ],
});
console.log(result);
await retriever.cleanup();
Using a built-in retriever preset
import { createHybridRerankedRetriever } from "@tars-inc/eval-lib";
import { OpenAIEmbedder } from "@tars-inc/eval-lib/embedders/openai";
import { CohereReranker } from "@tars-inc/eval-lib/rerankers/cohere";
const embedder = await OpenAIEmbedder.create({ model: "text-embedding-3-small" });
const reranker = new CohereReranker({ model: "rerank-english-v3.0" });
const retriever = createHybridRerankedRetriever({ embedder, reranker });
await retriever.init(corpus);
const hits = await retriever.retrieve("how do I reset my password", 10);
Generating a synthetic evaluation dataset
import {
  SimpleStrategy,
  GroundTruthAssigner,
  RecursiveCharacterChunker,
  openAIClientAdapter,
} from "@tars-inc/eval-lib";
import OpenAI from "openai";
const llm = openAIClientAdapter(new OpenAI());
const strategy = new SimpleStrategy({ queriesPerDoc: 5 });
const chunker = new RecursiveCharacterChunker({ chunkSize: 500, chunkOverlap: 50 });
const queries = await strategy.generate({ corpus, llm, model: "gpt-4o-mini" });
const groundTruth = await new GroundTruthAssigner({ chunker }).assign(queries, corpus);
Other strategies:
- DimensionDrivenStrategy: generates orthogonal coverage across dimensions you define (task type, difficulty, persona, etc.)
- RealWorldGroundedStrategy: matches a list of real user questions to documents via embedding similarity, then synthesizes variants
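To make "matches real user questions to documents via embedding similarity" concrete, here is a minimal standalone sketch of that matching step. It is not the library's code: `cosineSimilarity`, `EmbeddedDoc`, and `matchQuestionToDoc` are hypothetical names, and the vectors are hard-coded where a real run would use an embedder.

```typescript
// Hypothetical sketch of embedding-similarity matching; not the library's implementation.
interface EmbeddedDoc {
  id: string;
  vector: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the document most similar to the question, or null if none reaches minScore.
function matchQuestionToDoc(
  questionVector: number[],
  docs: EmbeddedDoc[],
  minScore = 0,
): { docId: string; score: number } | null {
  let best: { docId: string; score: number } | null = null;
  for (const doc of docs) {
    const score = cosineSimilarity(questionVector, doc.vector);
    if (score >= minScore && (best === null || score > best.score)) {
      best = { docId: doc.id, score };
    }
  }
  return best;
}

// Toy two-dimensional vectors standing in for real embeddings.
const embeddedDocs: EmbeddedDoc[] = [
  { id: "faq.md", vector: [1, 0] },
  { id: "billing.md", vector: [0, 1] },
];
const match = matchQuestionToDoc([0.9, 0.1], embeddedDocs);
console.log(match?.docId); // "faq.md"
```

A `minScore` threshold is one way to drop real questions that no document answers, so they do not produce misleading synthetic variants.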
Sub-path entry points
Provider-specific code lives in sub-paths so you only pay for what you import:
| Path | Contents |
|---|---|
| @tars-inc/eval-lib | Core types, chunkers, retrievers, metrics, synthetic generation, presets |
| @tars-inc/eval-lib/embedders/openai | OpenAIEmbedder |
| @tars-inc/eval-lib/embedders/cohere | CohereEmbedder |
| @tars-inc/eval-lib/embedders/voyage | VoyageEmbedder |
| @tars-inc/eval-lib/embedders/jina | JinaEmbedder |
| @tars-inc/eval-lib/rerankers/cohere | CohereReranker |
| @tars-inc/eval-lib/rerankers/jina | JinaReranker |
| @tars-inc/eval-lib/rerankers/voyage | VoyageReranker |
| @tars-inc/eval-lib/pipeline/internals | BM25SearchIndex, fusion (weightedScoreFusion, reciprocalRankFusion), dimension discovery, refinement defaults |
| @tars-inc/eval-lib/pipeline/llm-openai | OpenAIPipelineLLM for query expansion / rewrite |
| @tars-inc/eval-lib/llm | createLLMClient, createEmbedder, getModel, DEFAULT_MODEL (Node-only) |
| @tars-inc/eval-lib/langsmith | getLangSmithClient, uploadDataset, runLangSmithExperiment, createLangSmithEvaluator (Node-only) |
| @tars-inc/eval-lib/utils | Hashing, span helpers, retry, concurrency, cosine similarity |
| @tars-inc/eval-lib/shared | Constants and shared types (JobStatus, SerializedSpan, ExperimentResult) |
| @tars-inc/eval-lib/file-processing | processFile, htmlToMarkdown, pdfToMarkdown |
| @tars-inc/eval-lib/scraper | ContentScraper, filterLinks, normalizeUrl, seed-entity helpers |
| @tars-inc/eval-lib/registry | Component registries for embedders, rerankers, chunkers, strategies, presets |
| @tars-inc/eval-lib/data-analysis | parseTranscript, parseBotFlowInput, computeBasicStats, classifyMessageTypes, extractMicrotopics |
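For readers unfamiliar with the fusion helpers named in the pipeline/internals row, the following is a standalone sketch of the general reciprocal rank fusion technique. The function name `rrfSketch`, its signature, and the default `k = 60` (the constant commonly used in the RRF literature) are assumptions for illustration; the library's reciprocalRankFusion may take different inputs.

```typescript
// Illustrative reciprocal rank fusion; signature is an assumption, not the library's API.
// Each input ranking is a list of chunk IDs, best first.
// score(id) = sum over rankings of 1 / (k + rank), with rank starting at 1.
function rrfSketch(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Fuse a dense ranking with a BM25 ranking.
const denseRanking = ["a", "b", "c"];
const bm25Ranking = ["b", "d", "a"];
const fused = rrfSketch([denseRanking, bm25Ranking]);
console.log(fused.map((r) => r.id)); // ["b", "a", "d", "c"]
```

Because it uses only ranks, this kind of fusion does not require dense and BM25 scores to be on comparable scales, which is the usual reason to prefer it over weighted score fusion when scores are not normalized.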
Key concepts
- PositionAwareChunker: chunkers that preserve start/end character offsets in the source document. Required for span-based metrics.
- Retriever interface: init(corpus) → retrieve(query, k) → cleanup(). Returns PositionAwareChunk[] with character offsets, enabling span-overlap metrics rather than chunk-ID matching.
- Span-based metrics: recall, precision, iou, f1 operate on CharacterSpan[] and compute character-level overlap. mergeOverlappingSpans coalesces overlapping spans before comparison.
- PipelineRetriever: config-driven retriever composed of IndexConfig × QueryConfig × SearchConfig × RefinementStepConfig[]. Use the preset factories (createBaselineVectorRagRetriever, createBM25Retriever, createHybridRetriever, createHybridRerankedRetriever) for common combinations.
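The character-level overlap idea behind the span-based metrics can be sketched in a few lines. This is a self-contained illustration, not the library's source: the `Span` type, `mergeSpans`, and `spanMetrics` are hypothetical stand-ins for CharacterSpan, mergeOverlappingSpans, and the exported metrics, and the half-open `[start, end)` convention is an assumption.

```typescript
// Illustrative only: types and helpers are assumptions, not the library's source.
interface Span {
  docId: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset (assumed convention)
}

// Merge overlapping or touching spans within each document.
function mergeSpans(spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) =>
    a.docId < b.docId ? -1 : a.docId > b.docId ? 1 : a.start - b.start,
  );
  const merged: Span[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.docId === s.docId && s.start <= last.end) {
      last.end = Math.max(last.end, s.end);
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}

const totalLength = (spans: Span[]): number =>
  spans.reduce((sum, s) => sum + (s.end - s.start), 0);

// Characters covered by both sets; both inputs must already be merged.
function overlapLength(a: Span[], b: Span[]): number {
  let total = 0;
  for (const x of a) {
    for (const y of b) {
      if (x.docId !== y.docId) continue;
      total += Math.max(0, Math.min(x.end, y.end) - Math.max(x.start, y.start));
    }
  }
  return total;
}

function spanMetrics(retrieved: Span[], relevant: Span[]) {
  const r = mergeSpans(retrieved);
  const g = mergeSpans(relevant);
  const overlap = overlapLength(r, g);
  const retrievedLen = totalLength(r);
  const relevantLen = totalLength(g);
  const union = retrievedLen + relevantLen - overlap;
  const recall = relevantLen === 0 ? 0 : overlap / relevantLen;
  const precision = retrievedLen === 0 ? 0 : overlap / retrievedLen;
  const f1 =
    recall + precision === 0 ? 0 : (2 * recall * precision) / (recall + precision);
  const iou = union === 0 ? 0 : overlap / union;
  return { recall, precision, f1, iou };
}

// Two overlapping retrieved chunks cover [0, 50); the ground truth is [28, 70).
const example = spanMetrics(
  [
    { docId: "faq.md", start: 0, end: 30 },
    { docId: "faq.md", start: 20, end: 50 },
  ],
  [{ docId: "faq.md", start: 28, end: 70 }],
);
console.log(example); // overlap is 22 chars: recall 22/42, precision 22/50, iou 22/70
```

Merging first matters: without it, the overlapping characters 20 to 30 of the two retrieved chunks would be counted twice and precision would be understated.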
What this library is not
- Not a vector database: InMemoryVectorStore is included for dev/test only
- Not an LLM provider: wraps OpenAI / Cohere / Anthropic SDKs
- Not a UI library or deployment platform
- Not a multi-turn chat engine: focused on single-turn retrieval and conversation analysis
Source
Source, tests, and end-to-end examples: Tars-Technologies/cx-agent-evals under packages/eval-lib/.
License
MIT
