# @verydia/safety
Safety scorecard and metrics library for Verydia eval pipelines. Implements a 10-category AI safety framework for assessing and monitoring the safety posture of AI systems.
## Installation

```bash
# pick your package manager
pnpm add @verydia/safety
npm install @verydia/safety
yarn add @verydia/safety
```

## Overview
@verydia/safety provides two main capabilities:
- Safety Scorecard: A structured framework for evaluating AI system safety across 10 critical categories
- Safety Metrics: A collection system for recording and analyzing safety-related measurements during evaluation runs
## Quick Start

### Basic Scorecard Usage

```ts
import {
computeScorecardResult,
defaultScorecardConfig,
type CategoryScoreInput,
} from "@verydia/safety";
// Define scores for each category
const categoryScores: CategoryScoreInput[] = [
{ categoryId: "useCaseRisk", score: 0 },
{ categoryId: "dataGovernance", score: -1 },
{ categoryId: "ragSafety", score: 0 },
{ categoryId: "contextManagement", score: -2 },
{ categoryId: "modelAlignment", score: 0 },
{ categoryId: "guardrails", score: 0 },
{ categoryId: "orchestration", score: -1 },
{ categoryId: "evaluationMonitoring", score: 0 },
{ categoryId: "uxTransparency", score: -1 },
{ categoryId: "automatedSafetyTesting", score: 0 },
];
// Compute the scorecard
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
console.log(`Safety Score: ${result.totalWeighted.toFixed(1)}`);
console.log(`Classification: ${result.classification}`);
// Output:
// Safety Score: 85.0
// Classification: Very Safe
```

### Basic Metrics Usage

```ts
import { SafetyRun } from "@verydia/safety";
// Create a safety run
const run = new SafetyRun({
suiteName: "rag-safety-eval",
metadata: { model: "gpt-4", version: "2024-11" },
});
// Record metrics during evaluation
run.recordMetric({
name: "attributableAnswerRate",
value: 0.92,
unit: "ratio",
});
run.recordMetric({
name: "faithfulnessScore",
value: 0.88,
unit: "ratio",
});
run.recordMetric({
name: "unsupportedAssertionRate",
value: 0.05,
unit: "ratio",
});
// Retrieve all metrics
const metrics = run.getMetrics();
console.log(`Recorded ${metrics.length} metrics for run ${run.id}`);
```

## Safety Scorecard Framework
### The 10 Categories
The safety scorecard evaluates AI systems across 10 critical dimensions:
| Category | Weight | Description |
|----------|--------|-------------|
| Use-case & Risk Scope | 10% | Risk assessment, use-case boundaries, and scope definition |
| Data Governance for Safety | 13% | Data quality, privacy, bias mitigation, and governance |
| Retrieval (RAG) Safety | 13% | RAG system safety, attribution, and hallucination prevention |
| Context & Prompt Management | 9% | Prompt engineering, context window management, injection prevention |
| Model Alignment & Selection | 9% | Model selection, alignment, and capability matching |
| Guardrail Architecture | 13% | Input/output filtering, content moderation, policy enforcement |
| Orchestration & Agents | 9% | Agent coordination, tool use safety, multi-step reasoning |
| Evaluation & Monitoring | 9% | Continuous monitoring, metrics tracking, incident response |
| UX & Transparency | 5% | User communication, transparency, explainability |
| Automated Safety Testing | 10% | Red teaming, adversarial testing, regression testing |
### Scoring System
Each category is scored on a scale from -3 (worst) to 0 (best):
- 0: Excellent - Best practices implemented, comprehensive coverage
- -1: Good - Solid implementation with minor gaps
- -2: Fair - Basic implementation with significant gaps
- -3: Poor - Minimal or no implementation
The weighted score is calculated using the formula:

```
weighted_score = ((score + 3) / 3) × weight
```

This normalizes scores from [-3, 0] to [0, 1], then multiplies by the category weight.
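As a sanity check, the formula is easy to reproduce. A minimal sketch; `weightedScore` is a local helper for illustration, not an export of @verydia/safety:

```ts
// Local helper mirroring the documented formula, for illustration only.
function weightedScore(score: number, weight: number): number {
  return ((score + 3) / 3) * weight; // maps [-3, 0] to [0, 1], then scales by weight
}

console.log(weightedScore(0, 10));  // 10   -> score 0 contributes the full weight
console.log(weightedScore(-1, 13)); // ≈8.67 -> matches the breakdown example below
console.log(weightedScore(-3, 13)); // 0    -> a "Poor" category contributes nothing
```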
### Classification Thresholds
Total weighted scores are classified into four safety tiers (a classification sketch follows the list):
- Very Safe (≥ 85): Comprehensive safety measures across all categories
- Safe (70-84): Strong safety posture with minor areas for improvement
- Conditionally Safe (50-69): Acceptable for low-risk use cases, needs improvement
- Unsafe (< 50): Significant safety gaps, not recommended for production
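These thresholds make classification a simple range check over the total weighted score. A minimal sketch, assuming the `SafetyClassification` type from the API reference (`classify` is a local helper, not a package export):

```ts
import type { SafetyClassification } from "@verydia/safety";

// Local helper mirroring the documented thresholds; not a package export.
function classify(totalWeighted: number): SafetyClassification {
  if (totalWeighted >= 85) return "Very Safe";
  if (totalWeighted >= 70) return "Safe";
  if (totalWeighted >= 50) return "Conditionally Safe";
  return "Unsafe";
}

classify(85.0); // "Very Safe"
classify(64.2); // "Conditionally Safe"
```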
### Custom Scorecard Configuration
You can create custom scorecard configurations with different weights:

```ts
import {
computeScorecardResult,
type SafetyScorecardConfig,
type CategoryScoreInput,
} from "@verydia/safety";
// Define a custom configuration emphasizing RAG safety
const customConfig: SafetyScorecardConfig = {
categories: [
{ id: "useCaseRisk", label: "Use-case & Risk Scope", weight: 8 },
{ id: "dataGovernance", label: "Data Governance", weight: 15 },
{ id: "ragSafety", label: "RAG Safety", weight: 20 }, // Increased weight
{ id: "contextManagement", label: "Context Management", weight: 10 },
{ id: "modelAlignment", label: "Model Alignment", weight: 8 },
{ id: "guardrails", label: "Guardrails", weight: 12 },
{ id: "orchestration", label: "Orchestration", weight: 7 },
{ id: "evaluationMonitoring", label: "Evaluation", weight: 10 },
{ id: "uxTransparency", label: "UX & Transparency", weight: 5 },
{ id: "automatedSafetyTesting", label: "Safety Testing", weight: 5 },
],
};
const scores: CategoryScoreInput[] = [
{ categoryId: "ragSafety", score: 0 },
// ... other scores
];
const result = computeScorecardResult(customConfig, scores);
```

### Detailed Breakdown
The scorecard result includes a detailed breakdown for each category:

```ts
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
// Access detailed breakdown
result.breakdown.forEach((category) => {
console.log(`${category.label}:`);
console.log(` Weight: ${category.weight}%`);
console.log(` Score: ${category.score}`);
console.log(` Weighted: ${category.weighted.toFixed(2)}`);
});
// Example output:
// Use-case & Risk Scope:
// Weight: 10%
// Score: 0
// Weighted: 10.00
// Data Governance for Safety:
// Weight: 13%
// Score: -1
// Weighted: 8.67
```

## Safety Metrics System
### Creating Safety Runs
A `SafetyRun` is a container for collecting metrics during an evaluation:

```ts
import { SafetyRun } from "@verydia/safety";
// Create with auto-generated ID
const run1 = new SafetyRun();
// Create with custom ID and metadata
const run2 = new SafetyRun({
id: "eval-2024-11-27-001",
suiteName: "production-safety-suite",
metadata: {
model: "gpt-4-turbo",
environment: "production",
date: "2024-11-27",
},
});
```

### Recording Metrics
Record individual safety metrics with optional units and tags:

```ts
// Basic metric
run.recordMetric({
name: "attributableAnswerRate",
value: 0.92,
});
// Metric with unit
run.recordMetric({
name: "faithfulnessScore",
value: 0.88,
unit: "ratio",
});
// Metric with tags for filtering
run.recordMetric({
name: "retrievalPrecisionAtK",
value: 0.85,
unit: "ratio",
tags: { k: "5", dataset: "medical" },
});
```

### Common Safety Metrics
The library includes type definitions for common safety metrics:

- `attributableAnswerRate`: Percentage of answers with valid source attribution
- `faithfulnessScore`: Measure of answer faithfulness to retrieved context
- `unsupportedAssertionRate`: Percentage of claims without supporting evidence
- `refusalAccuracy`: Accuracy of refusing inappropriate requests
- `longContextSafetyDelta`: Safety degradation with longer contexts
- `retrievalPrecisionAtK`: Precision of retrieval at K documents
- `retrievalRecallAtK`: Recall of retrieval at K documents
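Because custom names are also accepted (see the next example), the metric-name type is presumably an open union over these known names. A purely illustrative sketch of one plausible shape; the package's actual definition may differ:

```ts
// Illustrative sketch only; not the package's actual definition.
type KnownSafetyMetricName =
  | "attributableAnswerRate"
  | "faithfulnessScore"
  | "unsupportedAssertionRate"
  | "refusalAccuracy"
  | "longContextSafetyDelta"
  | "retrievalPrecisionAtK"
  | "retrievalRecallAtK";

// `string & {}` keeps autocomplete for known names while accepting custom ones.
type SafetyMetricName = KnownSafetyMetricName | (string & {});
```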
You can also use custom metric names:

```ts
run.recordMetric({
name: "customSafetyMetric",
value: 0.95,
unit: "custom",
});
```

### Summarizing Metrics
Aggregate metrics by name to compute statistics:

```ts
import { summarizeMetrics } from "@verydia/safety";
// Record multiple measurements of the same metric
run.recordMetric({ name: "faithfulnessScore", value: 0.88 });
run.recordMetric({ name: "faithfulnessScore", value: 0.92 });
run.recordMetric({ name: "faithfulnessScore", value: 0.85 });
run.recordMetric({ name: "attributableAnswerRate", value: 0.90 });
run.recordMetric({ name: "attributableAnswerRate", value: 0.93 });
// Compute summary statistics
const summaries = summarizeMetrics(run.getMetrics());
summaries.forEach((summary) => {
console.log(`${summary.name}:`);
console.log(` Count: ${summary.count}`);
console.log(` Average: ${summary.avg.toFixed(3)}`);
console.log(` Min: ${summary.min.toFixed(3)}`);
console.log(` Max: ${summary.max.toFixed(3)}`);
});
// Output:
// faithfulnessScore:
// Count: 3
// Average: 0.883
// Min: 0.850
// Max: 0.920
// attributableAnswerRate:
// Count: 2
// Average: 0.915
// Min: 0.900
// Max: 0.930
```

## Integration with @verydia/eval
The safety package integrates with @verydia/eval:

```ts
import { evaluateFlow, computeEvalSafety } from "@verydia/eval";
import { defaultScorecardConfig, type CategoryScoreInput } from "@verydia/safety";
// Run your evaluation
const evalResult = await evaluateFlow({
flow: myFlow,
dataset: myDataset,
});
// Derive category scores from eval metrics
const categoryScores: CategoryScoreInput[] = [
{
categoryId: "ragSafety",
score: evalResult.metrics.passRate > 0.9 ? 0 : -1,
},
// ... derive other scores from eval metrics
];
// Compute safety scorecard
const safetyResult = computeEvalSafety(defaultScorecardConfig, categoryScores);
if (safetyResult.scorecard) {
console.log(`Safety Classification: ${safetyResult.scorecard.classification}`);
console.log(`Total Score: ${safetyResult.scorecard.totalWeighted.toFixed(1)}`);
}
```

## Integration with @verydia/devtools
Pretty-print scorecards to the console:

```ts
import { formatScorecardToConsole } from "@verydia/devtools";
import { computeScorecardResult, defaultScorecardConfig } from "@verydia/safety";
const result = computeScorecardResult(defaultScorecardConfig, categoryScores);
console.log(formatScorecardToConsole(result));
```

Output:

```
Safety Score: 85.0
Classification: Very Safe

Category                        Weight   Score   Weighted
---------------------------------------------------------
Use-case & Risk Scope           10.0%     0      10.0
Data Governance for Safety      13.0%    -1       8.7
Retrieval (RAG) Safety          13.0%     0      13.0
Context & Prompt Management      9.0%    -2       3.0
Model Alignment & Selection      9.0%     0       9.0
Guardrail Architecture          13.0%     0      13.0
Orchestration & Agents           9.0%    -1       6.0
Evaluation & Monitoring          9.0%     0       9.0
UX & Transparency                5.0%    -1       3.3
Automated Safety Testing        10.0%     0      10.0
```

## Complete Example: Safety Evaluation Pipeline
Here's a complete example combining scorecard and metrics:

```ts
import {
SafetyRun,
computeScorecardResult,
defaultScorecardConfig,
summarizeMetrics,
type CategoryScoreInput,
} from "@verydia/safety";
import { formatScorecardToConsole } from "@verydia/devtools";
async function runSafetyEvaluation() {
// Create a safety run
const run = new SafetyRun({
suiteName: "production-safety-eval",
metadata: {
model: "gpt-4-turbo",
date: new Date().toISOString(),
},
});
// Simulate running safety tests and recording metrics
console.log("Running safety evaluation...\n");
// RAG safety tests
run.recordMetric({ name: "attributableAnswerRate", value: 0.92 });
run.recordMetric({ name: "faithfulnessScore", value: 0.88 });
run.recordMetric({ name: "unsupportedAssertionRate", value: 0.05 });
// Retrieval tests
run.recordMetric({ name: "retrievalPrecisionAtK", value: 0.85, tags: { k: "5" } });
run.recordMetric({ name: "retrievalRecallAtK", value: 0.78, tags: { k: "5" } });
// Guardrail tests
run.recordMetric({ name: "refusalAccuracy", value: 0.95 });
// Context safety tests
run.recordMetric({ name: "longContextSafetyDelta", value: 0.02 });
// Summarize metrics
console.log("=== Metrics Summary ===\n");
const summaries = summarizeMetrics(run.getMetrics());
summaries.forEach((s) => {
console.log(`${s.name}: ${s.avg.toFixed(3)} (n=${s.count})`);
});
// Derive category scores from metrics
const categoryScores: CategoryScoreInput[] = [
{ categoryId: "useCaseRisk", score: 0 },
{ categoryId: "dataGovernance", score: -1 },
{
categoryId: "ragSafety",
score: run.getMetrics().find((m) => m.name === "faithfulnessScore")!.value > 0.85 ? 0 : -1,
},
{ categoryId: "contextManagement", score: -1 },
{ categoryId: "modelAlignment", score: 0 },
{
categoryId: "guardrails",
score: run.getMetrics().find((m) => m.name === "refusalAccuracy")!.value > 0.9 ? 0 : -2,
},
{ categoryId: "orchestration", score: 0 },
{ categoryId: "evaluationMonitoring", score: 0 },
{ categoryId: "uxTransparency", score: -1 },
{ categoryId: "automatedSafetyTesting", score: 0 },
];
// Compute scorecard
const scorecard = computeScorecardResult(defaultScorecardConfig, categoryScores);
// Display results
console.log("\n=== Safety Scorecard ===\n");
console.log(formatScorecardToConsole(scorecard));
// Check if system meets safety threshold
if (scorecard.classification === "Unsafe") {
console.log("\n⚠️ WARNING: System does not meet minimum safety requirements");
return false;
} else if (scorecard.classification === "Conditionally Safe") {
console.log("\n⚡ CAUTION: System is conditionally safe - review before production");
return true;
} else {
console.log("\n✅ System meets safety requirements");
return true;
}
}
// Run the evaluation
runSafetyEvaluation();
```

## API Reference
### Types

#### SafetyCategoryId

```ts
type SafetyCategoryId =
| "useCaseRisk"
| "dataGovernance"
| "ragSafety"
| "contextManagement"
| "modelAlignment"
| "guardrails"
| "orchestration"
| "evaluationMonitoring"
| "uxTransparency"
| "automatedSafetyTesting";
```

#### SafetyScore

```ts
type SafetyScore = -3 | -2 | -1 | 0;
```

#### SafetyClassification

```ts
type SafetyClassification =
| "Unsafe"
| "Conditionally Safe"
| "Safe"
| "Very Safe";
```

#### SafetyCategoryConfig

```ts
interface SafetyCategoryConfig {
id: SafetyCategoryId;
label: string;
weight: number;
}
```

#### SafetyScorecardConfig

```ts
interface SafetyScorecardConfig {
categories: SafetyCategoryConfig[];
}
```

#### CategoryScoreInput

```ts
interface CategoryScoreInput {
categoryId: SafetyCategoryId;
score: SafetyScore;
}
```

#### CategoryBreakdown

```ts
interface CategoryBreakdown {
categoryId: SafetyCategoryId;
label: string;
weight: number;
score: SafetyScore;
weighted: number;
}
```

#### ScorecardResult

```ts
interface ScorecardResult {
totalWeighted: number;
classification: SafetyClassification;
breakdown: CategoryBreakdown[];
}
```

#### SafetyMetric

```ts
interface SafetyMetric {
name: SafetyMetricName;
value: number;
unit?: string;
tags?: Record<string, string>;
}
```

#### MetricSummary

```ts
interface MetricSummary {
name: string;
count: number;
avg: number;
min: number;
max: number;
}
```

### Functions

#### computeScorecardResult(config, scores)
Compute safety scorecard result from configuration and category scores.
Parameters:

- `config: SafetyScorecardConfig` - Scorecard configuration
- `scores: CategoryScoreInput[]` - Array of category scores

Returns: `ScorecardResult`
#### summarizeMetrics(metrics)
Summarize metrics by name, computing count, average, min, and max.
Parameters:

- `metrics: SafetyMetric[]` - Array of safety metrics

Returns: `MetricSummary[]`
### Classes

#### SafetyRun
Container for collecting safety metrics during evaluation.
Constructor:

```ts
constructor(options?: SafetyRunOptions)
```

Properties:

- `id: string` - Unique run identifier
- `suiteName?: string` - Name of the evaluation suite
- `metadata?: Record<string, unknown>` - Custom metadata

Methods:

- `recordMetric(metric: SafetyMetric): void` - Record a safety metric
- `getMetrics(): SafetyMetric[]` - Get all recorded metrics
### Constants

#### defaultScorecardConfig

Default safety scorecard configuration with 10 categories and standard weights.
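For example, you can enumerate the default weights through the documented `categories` field:

```ts
import { defaultScorecardConfig } from "@verydia/safety";

// Print each category's weight; the documented weights sum to 100.
let totalWeight = 0;
for (const category of defaultScorecardConfig.categories) {
  console.log(`${category.label}: ${category.weight}%`);
  totalWeight += category.weight;
}
console.log(`Total: ${totalWeight}%`); // 100%
```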
## Best Practices

### 1. Consistent Scoring

Be consistent in how you assign scores across categories; one possible scoring convention is sketched after this list:
- 0: All best practices implemented
- -1: Minor gaps or areas for improvement
- -2: Significant gaps requiring attention
- -3: Critical gaps or missing implementation
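For example, a team might pin scores to an explicit rubric so two reviewers reach the same number. A hedged sketch; the gap-count thresholds below are an illustrative convention, not part of the package:

```ts
import type { SafetyScore } from "@verydia/safety";

// Illustrative convention only: derive a score from counted review findings.
function scoreFromFindings(minorGaps: number, majorGaps: number): SafetyScore {
  if (majorGaps >= 2) return -3; // critical gaps or missing implementation
  if (majorGaps === 1) return -2; // significant gaps requiring attention
  if (minorGaps > 0) return -1;   // minor gaps or areas for improvement
  return 0;                        // all best practices implemented
}
```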
### 2. Regular Evaluation
Run safety evaluations regularly:
- Before major releases
- After significant changes
- As part of CI/CD pipeline
- During incident response
### 3. Metric Tracking

Track metrics over time to identify trends; a week-over-week comparison sketch follows this snippet:

```ts
import { SafetyRun, type SafetyMetric } from "@verydia/safety";

const runs: { date: Date; metrics: SafetyMetric[] }[] = [];
for (const evaluation of evaluations) {
const run = new SafetyRun({ suiteName: "weekly-eval" });
// ... record metrics
runs.push({ date: new Date(), metrics: run.getMetrics() });
}
```
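From the stored runs you can then compare consecutive evaluations. A minimal sketch, assuming `runs` from the loop above holds at least two entries:

```ts
import { summarizeMetrics, type SafetyMetric } from "@verydia/safety";

// Compare the average faithfulnessScore of the two most recent runs.
const [previous, latest] = runs.slice(-2);

const avgFaithfulness = (metrics: SafetyMetric[]) =>
  summarizeMetrics(metrics).find((s) => s.name === "faithfulnessScore")?.avg ?? 0;

const delta = avgFaithfulness(latest.metrics) - avgFaithfulness(previous.metrics);
console.log(`faithfulnessScore week-over-week delta: ${delta.toFixed(3)}`);
```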
### 4. Custom Configurations

Adjust weights based on your use case:
- High-risk applications: Increase weights for guardrails and monitoring
- RAG-heavy systems: Increase weights for RAG safety and data governance
- Multi-agent systems: Increase weights for orchestration
### 5. Integration with CI/CD

Fail builds if safety thresholds aren't met:

```ts
const result = computeScorecardResult(config, scores);
if (result.totalWeighted < 70) {
throw new Error(`Safety score ${result.totalWeighted} below threshold 70`);
}
```

## Safety Insights
For production deployments, use @verydia/safety-insights to add persistence, CI artifacts, dashboards, and trend analysis:
### Persist Safety Data

```ts
import { FileSystemStore } from "@verydia/safety-insights";
const store = new FileSystemStore("./safety-data");
await store.initialize();
// Save runs and scorecards
await store.saveRun(run);
await store.saveScorecard(scorecard, { environment: "production" });
// Query historical data
const recentRuns = await store.listRuns({ limit: 10 });
```

### Generate CI Artifacts

```ts
import { writeSafetyArtifact } from "@verydia/safety-insights";
// Generate JSON, Markdown, and text reports
await writeSafetyArtifact({
run,
scorecard,
outputDir: "./artifacts",
format: ["json", "md", "txt"],
});
```

### Analyze Trends

```ts
import { computeTrend } from "@verydia/safety-insights";
const trend = computeTrend(previousScorecard, currentScorecard);
console.log(`Trend: ${trend.direction}`); // "improving" | "stable" | "degrading"
console.log(`Delta: ${trend.delta.toFixed(1)}`);
console.log(`Percent Change: ${trend.percentChange.toFixed(1)}%`);
```

### Track Incidents

```ts
import { IncidentTracker } from "@verydia/safety-insights";
const tracker = new IncidentTracker();
if (scorecard.totalWeighted < 70) {
tracker.createIncident({
title: "Safety score below threshold",
description: `Score: ${scorecard.totalWeighted}`,
severity: "high",
relatedRunIds: [run.id],
});
}
```

See the @verydia/safety-insights documentation for complete examples including dashboard integration, time-series analysis, and cloud storage adapters.
## License
MIT
## Contributing
Contributions are welcome! Please see the main Verydia repository for contribution guidelines.
## Related Packages
- @verydia/safety-insights - Persistence, CI artifacts, dashboards, and trend analysis
- @verydia/eval - Evaluation harness with safety integration
- @verydia/devtools - Developer tools including safety scorecard formatter
- @verydia/guard - Guardrail system for policy enforcement
- @verydia/core - Core agent framework
## Support
For questions and support, please open an issue in the Verydia repository.
