# @verydia/safety
Safety scorecard and metrics library for Verydia eval pipelines. Implements a 10-category AI safety framework for assessing and monitoring the safety posture of AI systems.
## Installation

```bash
# pick your package manager
pnpm add @verydia/safety
npm install @verydia/safety
yarn add @verydia/safety
```

## Overview
@verydia/safety provides two main capabilities:
- Safety Scorecard: A structured framework for evaluating AI system safety across 10 critical categories
- Safety Metrics: A collection system for recording and analyzing safety-related measurements during evaluation runs
## Quick Start

### Basic Scorecard Usage

```ts
import {
computeScorecardResult,
defaultScorecardConfig,
type CategoryScoreInput,
} from "@verydia/safety";
// Define scores for each category
const categoryScores: CategoryScoreInput[] = [
{ categoryId: "useCaseRisk", score: 0 },
{ categoryId: "dataGovernance", score: -1 },
{ categoryId: "ragSafety", score: 0 },
{ categoryId: "contextManagement", score: -2 },
{ categoryId: "modelAlignment", score: 0 },
{ categoryId: "guardrails", score: 0 },
{ categoryId: "orchestration", score: -1 },
{ categoryId: "evaluationMonitoring", score: 0 },
{ categoryId: "uxTransparency", score: -1 },
{ categoryId: "automatedSafetyTesting", score: 0 },
];
// Compute the scorecard
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
console.log(`Safety Score: ${result.totalWeighted.toFixed(1)}`);
console.log(`Classification: ${result.classification}`);
// Output:
// Safety Score: 85.0
// Classification: Very Safe
```

### Basic Metrics Usage

```ts
import { SafetyRun } from "@verydia/safety";
// Create a safety run
const run = new SafetyRun({
suiteName: "rag-safety-eval",
metadata: { model: "gpt-4", version: "2024-11" },
});
// Record metrics during evaluation
run.recordMetric({
name: "attributableAnswerRate",
value: 0.92,
unit: "ratio",
});
run.recordMetric({
name: "faithfulnessScore",
value: 0.88,
unit: "ratio",
});
run.recordMetric({
name: "unsupportedAssertionRate",
value: 0.05,
unit: "ratio",
});
// Retrieve all metrics
const metrics = run.getMetrics();
console.log(`Recorded ${metrics.length} metrics for run ${run.id}`);
```

## Safety Scorecard Framework
### The 10 Categories
The safety scorecard evaluates AI systems across 10 critical dimensions:
| Category | Weight | Description |
|----------|--------|-------------|
| Use-case & Risk Scope | 10% | Risk assessment, use-case boundaries, and scope definition |
| Data Governance for Safety | 13% | Data quality, privacy, bias mitigation, and governance |
| Retrieval (RAG) Safety | 13% | RAG system safety, attribution, and hallucination prevention |
| Context & Prompt Management | 9% | Prompt engineering, context window management, injection prevention |
| Model Alignment & Selection | 9% | Model selection, alignment, and capability matching |
| Guardrail Architecture | 13% | Input/output filtering, content moderation, policy enforcement |
| Orchestration & Agents | 9% | Agent coordination, tool use safety, multi-step reasoning |
| Evaluation & Monitoring | 9% | Continuous monitoring, metrics tracking, incident response |
| UX & Transparency | 5% | User communication, transparency, explainability |
| Automated Safety Testing | 10% | Red teaming, adversarial testing, regression testing |
### Scoring System
Each category is scored on a scale from -3 (worst) to 0 (best):
- 0: Excellent - Best practices implemented, comprehensive coverage
- -1: Good - Solid implementation with minor gaps
- -2: Fair - Basic implementation with significant gaps
- -3: Poor - Minimal or no implementation
The weighted score is calculated using the formula:

```
weighted_score = ((score + 3) / 3) × weight
```

This normalizes scores from [-3, 0] to [0, 1], then multiplies by the category weight.
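As a sanity check, the formula is easy to reproduce. A minimal sketch; `weightedScore` is a local helper for illustration, not an export of @verydia/safety:

```ts
// Local helper mirroring the documented formula, for illustration only.
function weightedScore(score: number, weight: number): number {
  return ((score + 3) / 3) * weight; // maps [-3, 0] to [0, 1], then scales by weight
}

console.log(weightedScore(0, 10));  // 10   -> score 0 contributes the full weight
console.log(weightedScore(-1, 13)); // ≈8.67 -> matches the breakdown example below
console.log(weightedScore(-3, 13)); // 0    -> a "Poor" category contributes nothing
```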
### Classification Thresholds
Total weighted scores are classified into four safety tiers (a classification sketch follows the list):
- Very Safe (≥ 85): Comprehensive safety measures across all categories
- Safe (70-84): Strong safety posture with minor areas for improvement
- Conditionally Safe (50-69): Acceptable for low-risk use cases, needs improvement
- Unsafe (< 50): Significant safety gaps, not recommended for production
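These thresholds make classification a simple range check over the total weighted score. A minimal sketch, assuming the `SafetyClassification` type from the API reference (`classify` is a local helper, not a package export):

```ts
import type { SafetyClassification } from "@verydia/safety";

// Local helper mirroring the documented thresholds; not a package export.
function classify(totalWeighted: number): SafetyClassification {
  if (totalWeighted >= 85) return "Very Safe";
  if (totalWeighted >= 70) return "Safe";
  if (totalWeighted >= 50) return "Conditionally Safe";
  return "Unsafe";
}

classify(85.0); // "Very Safe"
classify(64.2); // "Conditionally Safe"
```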
### Custom Scorecard Configuration
You can create custom scorecard configurations with different weights:

```ts
import {
computeScorecardResult,
type SafetyScorecardConfig,
type CategoryScoreInput,
} from "@verydia/safety";
// Define a custom configuration emphasizing RAG safety
const customConfig: SafetyScorecardConfig = {
categories: [
{ id: "useCaseRisk", label: "Use-case & Risk Scope", weight: 8 },
{ id: "dataGovernance", label: "Data Governance", weight: 15 },
{ id: "ragSafety", label: "RAG Safety", weight: 20 }, // Increased weight
{ id: "contextManagement", label: "Context Management", weight: 10 },
{ id: "modelAlignment", label: "Model Alignment", weight: 8 },
{ id: "guardrails", label: "Guardrails", weight: 12 },
{ id: "orchestration", label: "Orchestration", weight: 7 },
{ id: "evaluationMonitoring", label: "Evaluation", weight: 10 },
{ id: "uxTransparency", label: "UX & Transparency", weight: 5 },
{ id: "automatedSafetyTesting", label: "Safety Testing", weight: 5 },
],
};
const scores: CategoryScoreInput[] = [
{ categoryId: "ragSafety", score: 0 },
// ... other scores
];
const result = computeScorecardResult(customConfig, scores);
```

### Detailed Breakdown
The scorecard result includes a detailed breakdown for each category:

```ts
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
// Access detailed breakdown
result.breakdown.forEach((category) => {
console.log(`${category.label}:`);
console.log(` Weight: ${category.weight}%`);
console.log(` Score: ${category.score}`);
console.log(` Weighted: ${category.weighted.toFixed(2)}`);
});
// Example output:
// Use-case & Risk Scope:
// Weight: 10%
// Score: 0
// Weighted: 10.00
// Data Governance for Safety:
// Weight: 13%
// Score: -1
// Weighted: 8.67
```

## Safety Metrics System
### Creating Safety Runs
A `SafetyRun` is a container for collecting metrics during an evaluation:

```ts
import { SafetyRun } from "@verydia/safety";
// Create with auto-generated ID
const run1 = new SafetyRun();
// Create with custom ID and metadata
const run2 = new SafetyRun({
id: "eval-2024-11-27-001",
suiteName: "production-safety-suite",
metadata: {
model: "gpt-4-turbo",
environment: "production",
date: "2024-11-27",
},
});
```

### Recording Metrics
Record individual safety metrics with optional units and tags:

```ts
// Basic metric
run.recordMetric({
name: "attributableAnswerRate",
value: 0.92,
});
// Metric with unit
run.recordMetric({
name: "faithfulnessScore",
value: 0.88,
unit: "ratio",
});
// Metric with tags for filtering
run.recordMetric({
name: "retrievalPrecisionAtK",
value: 0.85,
unit: "ratio",
tags: { k: "5", dataset: "medical" },
});
```

### Common Safety Metrics
The library includes type definitions for common safety metrics:

- `attributableAnswerRate`: Percentage of answers with valid source attribution
- `faithfulnessScore`: Measure of answer faithfulness to retrieved context
- `unsupportedAssertionRate`: Percentage of claims without supporting evidence
- `refusalAccuracy`: Accuracy of refusing inappropriate requests
- `longContextSafetyDelta`: Safety degradation with longer contexts
- `retrievalPrecisionAtK`: Precision of retrieval at K documents
- `retrievalRecallAtK`: Recall of retrieval at K documents
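Because custom names are also accepted (see the next example), the metric-name type is presumably an open union over these known names. A purely illustrative sketch of one plausible shape; the package's actual definition may differ:

```ts
// Illustrative sketch only; not the package's actual definition.
type KnownSafetyMetricName =
  | "attributableAnswerRate"
  | "faithfulnessScore"
  | "unsupportedAssertionRate"
  | "refusalAccuracy"
  | "longContextSafetyDelta"
  | "retrievalPrecisionAtK"
  | "retrievalRecallAtK";

// `string & {}` keeps autocomplete for known names while accepting custom ones.
type SafetyMetricName = KnownSafetyMetricName | (string & {});
```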
You can also use custom metric names:

```ts
run.recordMetric({
name: "customSafetyMetric",
value: 0.95,
unit: "custom",
});
```

### Summarizing Metrics
Aggregate metrics by name to compute statistics:

```ts
import { summarizeMetrics } from "@verydia/safety";
// Record multiple measurements of the same metric
run.recordMetric({ name: "faithfulnessScore", value: 0.88 });
run.recordMetric({ name: "faithfulnessScore", value: 0.92 });
run.recordMetric({ name: "faithfulnessScore", value: 0.85 });
run.recordMetric({ name: "attributableAnswerRate", value: 0.90 });
run.recordMetric({ name: "attributableAnswerRate", value: 0.93 });
// Compute summary statistics
const summaries = summarizeMetrics(run.getMetrics());
summaries.forEach((summary) => {
console.log(`${summary.name}:`);
console.log(` Count: ${summary.count}`);
console.log(` Average: ${summary.avg.toFixed(3)}`);
console.log(` Min: ${summary.min.toFixed(3)}`);
console.log(` Max: ${summary.max.toFixed(3)}`);
});
// Output:
// faithfulnessScore:
// Count: 3
// Average: 0.883
// Min: 0.850
// Max: 0.920
// attributableAnswerRate:
// Count: 2
// Average: 0.915
// Min: 0.900
// Max: 0.930
```

## Integration with @verydia/eval
The safety package integrates with @verydia/eval:

```ts
import { evaluateFlow, computeEvalSafety } from "@verydia/eval";
import { defaultScorecardConfig, type CategoryScoreInput } from "@verydia/safety";
// Run your evaluation
const evalResult = await evaluateFlow({
flow: myFlow,
dataset: myDataset,
});
// Derive category scores from eval metrics
const categoryScores: CategoryScoreInput[] = [
{
categoryId: "ragSafety",
score: evalResult.metrics.passRate > 0.9 ? 0 : -1,
},
// ... derive other scores from eval metrics
];
// Compute safety scorecard
const safetyResult = computeEvalSafety(defaultScorecardConfig, categoryScores);
if (safetyResult.scorecard) {
console.log(`Safety Classification: ${safetyResult.scorecard.classification}`);
console.log(`Total Score: ${safetyResult.scorecard.totalWeighted.toFixed(1)}`);
}
```

## Integration with @verydia/devtools
Pretty-print scorecards to the console:

```ts
import { formatScorecardToConsole } from "@verydia/devtools";
import { computeScorecardResult, defaultScorecardConfig } from "@verydia/safety";
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
console.log(formatScorecardToConsole(result));
```

Output:

```
Safety Score: 85.0
Classification: Very Safe

Category                        Weight   Score   Weighted
---------------------------------------------------------
Use-case & Risk Scope           10.0%     0      10.0
Data Governance for Safety      13.0%    -1       8.7
Retrieval (RAG) Safety          13.0%     0      13.0
Context & Prompt Management      9.0%    -2       3.0
Model Alignment & Selection      9.0%     0       9.0
Guardrail Architecture          13.0%     0      13.0
Orchestration & Agents           9.0%    -1       6.0
Evaluation & Monitoring          9.0%     0       9.0
UX & Transparency                5.0%    -1       3.3
Automated Safety Testing        10.0%     0      10.0
```

## Complete Example: Safety Evaluation Pipeline
Here's a complete example combining scorecard and metrics:

```ts
import {
SafetyRun,
computeScorecardResult,
defaultScorecardConfig,
summarizeMetrics,
type CategoryScoreInput,
} from "@verydia/safety";
import { formatScorecardToConsole } from "@verydia/devtools";
async function runSafetyEvaluation() {
// Create a safety run
const run = new SafetyRun({
suiteName: "production-safety-eval",
metadata: {
model: "gpt-4-turbo",
date: new Date().toISOString(),
},
});
// Simulate running safety tests and recording metrics
console.log("Running safety evaluation...\n");
// RAG safety tests
run.recordMetric({ name: "attributableAnswerRate", value: 0.92 });
run.recordMetric({ name: "faithfulnessScore", value: 0.88 });
run.recordMetric({ name: "unsupportedAssertionRate", value: 0.05 });
// Retrieval tests
run.recordMetric({ name: "retrievalPrecisionAtK", value: 0.85, tags: { k: "5" } });
run.recordMetric({ name: "retrievalRecallAtK", value: 0.78, tags: { k: "5" } });
// Guardrail tests
run.recordMetric({ name: "refusalAccuracy", value: 0.95 });
// Context safety tests
run.recordMetric({ name: "longContextSafetyDelta", value: 0.02 });
// Summarize metrics
console.log("=== Metrics Summary ===\n");
const summaries = summarizeMetrics(run.getMetrics());
summaries.forEach((s) => {
console.log(`${s.name}: ${s.avg.toFixed(3)} (n=${s.count})`);
});
// Derive category scores from metrics
const categoryScores: CategoryScoreInput[] = [
{ categoryId: "useCaseRisk", score: 0 },
{ categoryId: "dataGovernance", score: -1 },
{
categoryId: "ragSafety",
score: run.getMetrics().find((m) => m.name === "faithfulnessScore")!.value > 0.85 ? 0 : -1,
},
{ categoryId: "contextManagement", score: -1 },
{ categoryId: "modelAlignment", score: 0 },
{
categoryId: "guardrails",
score: run.getMetrics().find((m) => m.name === "refusalAccuracy")!.value > 0.9 ? 0 : -2,
},
{ categoryId: "orchestration", score: 0 },
{ categoryId: "evaluationMonitoring", score: 0 },
{ categoryId: "uxTransparency", score: -1 },
{ categoryId: "automatedSafetyTesting", score: 0 },
];
// Compute scorecard
const scorecard = computeScorecardResult(defaultScorecardConfig, categoryScores);
// Display results
console.log("\n=== Safety Scorecard ===\n");
console.log(formatScorecardToConsole(scorecard));
// Check if system meets safety threshold
if (scorecard.classification === "Unsafe") {
console.log("\n⚠️ WARNING: System does not meet minimum safety requirements");
return false;
} else if (scorecard.classification === "Conditionally Safe") {
console.log("\n⚡ CAUTION: System is conditionally safe - review before production");
return true;
} else {
console.log("\n✅ System meets safety requirements");
return true;
}
}
// Run the evaluation
runSafetyEvaluation();
```

## API Reference
### Types

#### SafetyCategoryId

```ts
type SafetyCategoryId =
| "useCaseRisk"
| "dataGovernance"
| "ragSafety"
| "contextManagement"
| "modelAlignment"
| "guardrails"
| "orchestration"
| "evaluationMonitoring"
| "uxTransparency"
| "automatedSafetyTesting";
```

#### SafetyScore

```ts
type SafetyScore = -3 | -2 | -1 | 0;
```

#### SafetyClassification

```ts
type SafetyClassification =
| "Unsafe"
| "Conditionally Safe"
| "Safe"
| "Very Safe";
```

#### SafetyCategoryConfig

```ts
interface SafetyCategoryConfig {
id: SafetyCategoryId;
label: string;
weight: number;
}
```

#### SafetyScorecardConfig

```ts
interface SafetyScorecardConfig {
categories: SafetyCategoryConfig[];
}
```

#### CategoryScoreInput

```ts
interface CategoryScoreInput {
categoryId: SafetyCategoryId;
score: SafetyScore;
}
```

#### CategoryBreakdown

```ts
interface CategoryBreakdown {
categoryId: SafetyCategoryId;
label: string;
weight: number;
score: SafetyScore;
weighted: number;
}
```

#### ScorecardResult

```ts
interface ScorecardResult {
totalWeighted: number;
classification: SafetyClassification;
breakdown: CategoryBreakdown[];
}
```

#### SafetyMetric

```ts
interface SafetyMetric {
name: SafetyMetricName;
value: number;
unit?: string;
tags?: Record<string, string>;
}
```

#### MetricSummary

```ts
interface MetricSummary {
name: string;
count: number;
avg: number;
min: number;
max: number;
}
```

### Functions

#### computeScorecardResult(config, scores)
Compute safety scorecard result from configuration and category scores.
Parameters:

- `config: SafetyScorecardConfig` - Scorecard configuration
- `scores: CategoryScoreInput[]` - Array of category scores

Returns: `ScorecardResult`
#### summarizeMetrics(metrics)
Summarize metrics by name, computing count, average, min, and max.
Parameters:

- `metrics: SafetyMetric[]` - Array of safety metrics

Returns: `MetricSummary[]`
### Classes

#### SafetyRun
Container for collecting safety metrics during evaluation.
Constructor:

```ts
constructor(options?: SafetyRunOptions)
```

Properties:

- `id: string` - Unique run identifier
- `suiteName?: string` - Name of the evaluation suite
- `metadata?: Record<string, unknown>` - Custom metadata

Methods:

- `recordMetric(metric: SafetyMetric): void` - Record a safety metric
- `getMetrics(): SafetyMetric[]` - Get all recorded metrics
### Constants

#### defaultScorecardConfig

Default safety scorecard configuration with 10 categories and standard weights.
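For example, you can enumerate the default weights through the documented `categories` field:

```ts
import { defaultScorecardConfig } from "@verydia/safety";

// Print each category's weight; the documented weights sum to 100.
let totalWeight = 0;
for (const category of defaultScorecardConfig.categories) {
  console.log(`${category.label}: ${category.weight}%`);
  totalWeight += category.weight;
}
console.log(`Total: ${totalWeight}%`); // 100%
```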
## Best Practices

### 1. Consistent Scoring

Be consistent in how you assign scores across categories; one possible scoring convention is sketched after this list:
- 0: All best practices implemented
- -1: Minor gaps or areas for improvement
- -2: Significant gaps requiring attention
- -3: Critical gaps or missing implementation
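For example, a team might pin scores to an explicit rubric so two reviewers reach the same number. A hedged sketch; the gap-count thresholds below are an illustrative convention, not part of the package:

```ts
import type { SafetyScore } from "@verydia/safety";

// Illustrative convention only: derive a score from counted review findings.
function scoreFromFindings(minorGaps: number, majorGaps: number): SafetyScore {
  if (majorGaps >= 2) return -3; // critical gaps or missing implementation
  if (majorGaps === 1) return -2; // significant gaps requiring attention
  if (minorGaps > 0) return -1;   // minor gaps or areas for improvement
  return 0;                        // all best practices implemented
}
```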
### 2. Regular Evaluation
Run safety evaluations regularly:
- Before major releases
- After significant changes
- As part of CI/CD pipeline
- During incident response
### 3. Metric Tracking

Track metrics over time to identify trends; a week-over-week comparison sketch follows this snippet:

```ts
import { SafetyRun, type SafetyMetric } from "@verydia/safety";

const runs: { date: Date; metrics: SafetyMetric[] }[] = [];
for (const evaluation of evaluations) {
const run = new SafetyRun({ suiteName: "weekly-eval" });
// ... record metrics
runs.push({ date: new Date(), metrics: run.getMetrics() });
}
```
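From the stored runs you can then compare consecutive evaluations. A minimal sketch, assuming `runs` from the loop above holds at least two entries:

```ts
import { summarizeMetrics, type SafetyMetric } from "@verydia/safety";

// Compare the average faithfulnessScore of the two most recent runs.
const [previous, latest] = runs.slice(-2);

const avgFaithfulness = (metrics: SafetyMetric[]) =>
  summarizeMetrics(metrics).find((s) => s.name === "faithfulnessScore")?.avg ?? 0;

const delta = avgFaithfulness(latest.metrics) - avgFaithfulness(previous.metrics);
console.log(`faithfulnessScore week-over-week delta: ${delta.toFixed(3)}`);
```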
### 4. Custom Configurations

Adjust weights based on your use case:
- High-risk applications: Increase weights for guardrails and monitoring
- RAG-heavy systems: Increase weights for RAG safety and data governance
- Multi-agent systems: Increase weights for orchestration
### 5. Integration with CI/CD

Fail builds if safety thresholds aren't met:

```ts
const result = computeScorecardResult(config, scores);
if (result.totalWeighted < 70) {
throw new Error(`Safety score ${result.totalWeighted} below threshold 70`);
}
```

## Safety Insights
For production deployments, use @verydia/safety-insights to add persistence, CI artifacts, dashboards, and trend analysis:
### Persist Safety Data

```ts
import { FileSystemStore } from "@verydia/safety-insights";
const store = new FileSystemStore("./safety-data");
await store.initialize();
// Save runs and scorecards
await store.saveRun(run);
await store.saveScorecard(scorecard, { environment: "production" });
// Query historical data
const recentRuns = await store.listRuns({ limit: 10 });
```

### Generate CI Artifacts

```ts
import { writeSafetyArtifact } from "@verydia/safety-insights";
// Generate JSON, Markdown, and text reports
await writeSafetyArtifact({
run,
scorecard,
outputDir: "./artifacts",
format: ["json", "md", "txt"],
});
```

### Analyze Trends

```ts
import { computeTrend } from "@verydia/safety-insights";
const trend = computeTrend(previousScorecard, currentScorecard);
console.log(`Trend: ${trend.direction}`); // "improving" | "stable" | "degrading"
console.log(`Delta: ${trend.delta.toFixed(1)}`);
console.log(`Percent Change: ${trend.percentChange.toFixed(1)}%`);
```

### Track Incidents

```ts
import { IncidentTracker } from "@verydia/safety-insights";
const tracker = new IncidentTracker();
if (scorecard.totalWeighted < 70) {
tracker.createIncident({
title: "Safety score below threshold",
description: `Score: ${scorecard.totalWeighted}`,
severity: "high",
relatedRunIds: [run.id],
});
}
```

See the @verydia/safety-insights documentation for complete examples including dashboard integration, time-series analysis, and cloud storage adapters.
## License
MIT
## Contributing
Contributions are welcome! Please see the main Verydia repository for contribution guidelines.
## Related Packages
- @verydia/safety-insights - Persistence, CI artifacts, dashboards, and trend analysis
- @verydia/eval - Evaluation harness with safety integration
- @verydia/devtools - Developer tools including safety scorecard formatter
- @verydia/guard - Guardrail system for policy enforcement
- @verydia/core - Core agent framework
## Support
For questions and support, please open an issue in the Verydia repository.
