mongodb-assistant-eval
v0.0.8
Evaluation library for the MongoDB Assistant API. This package bundles dataset schemas and loaders, evaluation scorers, metrics, and a programmatic CLI helper.
Install
npm install mongodb-assistant-eval

Required Peer Dependencies
These are expected to already be in your project:
npm install braintrust openai zod

Optional Peer Dependencies
Install only the ones you need:
# LLM-as-judge scorers (Factuality, Faithfulness, PromptAdherence, ContextRelevancy)
npm install ai @ai-sdk/openai @ai-sdk/azure autoevals
# MongoDB dataset loader (`mongodb@^6.3.0` or `mongodb@^7.1.0`)
npm install mongodb
# Slack reporting
npm install @slack/web-api slack-block-builder slackify-markdown

| Dependency | Required for |
|---|---|
| braintrust | Core eval runner, datasets, results |
| openai | Tool call types used by scorers |
| zod | Schema validation, makeConversationEvalCaseSchema |
| ai + @ai-sdk/openai + @ai-sdk/azure | createEvalJudgeModel for LLM-as-judge scorers |
| autoevals | Built-in LLM scorers (Factuality, Faithfulness, etc.) |
| mongodb (^6.3.0 or ^7.1.0) | getConversationEvalCasesFromMongoDb loader |
| @slack/web-api + slack-block-builder + slackify-markdown | Slack evaluation reporting |
Dataset Schema (Base)
import {
  ConversationEvalCaseSchema,
  type ConversationEvalCaseBase,
} from "mongodb-assistant-eval/schema";

const dataset: ConversationEvalCaseBase[] = [
  {
    name: "hello-world",
    input: {
      messages: [{ role: "user", content: "Hello!" }],
    },
    expected: {
      referenceAnswer: "Hi there!",
    },
  },
];

// Throws if any case does not match the schema.
ConversationEvalCaseSchema.array().parse(dataset);

Loaders
import { getConversationEvalCasesFromYaml } from "mongodb-assistant-eval/datasets";
import fs from "fs";

const yaml = fs.readFileSync("./cases.yaml", "utf8");
const cases = getConversationEvalCasesFromYaml({ yaml });

CLI Helper
This package does not ship a binary. Use the helper to build your own CLI.
import { createMongoDbAssistantEvalCli } from "mongodb-assistant-eval/cli";
import { createSlackReporter } from "mongodb-assistant-eval/reporters";

const cli = createMongoDbAssistantEvalCli({
  projectName: "my-braintrust-project",
  datasets: { /* ... */ },
  models: { /* ... */ },
  tasks: { /* ... */ },
  scorers: { /* ... */ },
  validators: { /* ... */ },
  reporters: {
    slack: createSlackReporter({
      slackToken: process.env.SLACK_BOT_TOKEN ?? "",
      slackConversationId: process.env.SLACK_EVAL_CHANNEL_ID ?? "",
    }),
  },
});

cli.parse();

Scorers and Metrics
import type { ConversationEvalScorer } from "mongodb-assistant-eval/eval";
import { binaryNdcgAtK, normalizeUrl } from "mongodb-assistant-eval/metrics";

export const SearchQuality: ConversationEvalScorer = async ({ expected, output }) => {
  // Without expected links or returned URLs there is nothing to score.
  if (!expected.links || !output.urls) {
    return { name: "SearchQuality", score: null };
  }
  const score = binaryNdcgAtK(
    expected.links,
    output.urls,
    // Compare URLs after normalization.
    (a, b) => normalizeUrl({ url: a }) === normalizeUrl({ url: b })
  );
  return { name: "SearchQuality", score };
};
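
To make the score from the scorer above easier to interpret, here is a minimal standalone sketch of how binary nDCG@k is commonly computed. It is not the library's `binaryNdcgAtK`: the name `sketchBinaryNdcgAtK`, the explicit `k` parameter, and the rule that each expected item is credited at most once are assumptions made for illustration, and the real implementation may differ.

```typescript
// Hypothetical stand-in for illustration only; not the library's binaryNdcgAtK.
type Match<T> = (expected: T, retrieved: T) => boolean;

function sketchBinaryNdcgAtK<T>(
  expected: T[],
  retrieved: T[],
  matches: Match<T>,
  k: number
): number {
  if (expected.length === 0 || k <= 0) return 0;

  // Assumption: each expected item is credited once, so duplicate results earn nothing extra.
  const credited = new Set<number>();
  let dcg = 0;
  retrieved.slice(0, k).forEach((item, rank) => {
    const hit = expected.findIndex((e, i) => !credited.has(i) && matches(e, item));
    if (hit !== -1) {
      credited.add(hit);
      // rank is 0-based, so the first position contributes 1 / log2(2) = 1.
      dcg += 1 / Math.log2(rank + 2);
    }
  });

  // Ideal ordering: every expected item packed into the top positions.
  const idealHits = Math.min(k, expected.length);
  let idcg = 0;
  for (let rank = 0; rank < idealHits; rank++) {
    idcg += 1 / Math.log2(rank + 2);
  }

  return dcg / idcg;
}

// Usage: the only relevant link is returned in second place.
const sameUrl: Match<string> = (a, b) => a.toLowerCase() === b.toLowerCase();
const exampleScore = sketchBinaryNdcgAtK(
  ["https://example.com/a"],
  ["https://example.com/x", "https://EXAMPLE.com/a"],
  sameUrl,
  5
);
console.log(exampleScore.toFixed(3)); // 0.631
```

In this sketch a score of 1 means every expected link appears at the top of the results, and a relevant link that appears lower down is discounted logarithmically by its rank.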