@verydia/safety-insights
v0.2.0
Persistence, CI artifacts, dashboards, and trend analysis for Verydia safety evaluations. Extends @verydia/safety with production-ready tooling for tracking, storing, and visualizing safety metrics over time.
Installation
pnpm add @verydia/safety-insights
npm install @verydia/safety-insights
yarn add @verydia/safety-insights
Features
- 📦 Persistence Layer - Store safety runs and scorecards with multiple adapters (FileSystem, Memory, S3 and S3-compatible backends, plus GCS and Azure stubs)
- 📄 CI Artifacts - Generate JSON, Markdown, and text reports for CI/CD pipelines
- 📊 Dashboard JSON - Create dashboard-ready data for visualization tools
- 📈 Trend Analysis - Compare scorecards over time and compute moving averages
- 🚨 Incident Tracking - Track and manage safety-related incidents
Quick Start
Persist Safety Data
import { FileSystemStore } from "@verydia/safety-insights";
import { SafetyRun, computeScorecardResult, defaultScorecardConfig } from "@verydia/safety";
// Create a file system store
const store = new FileSystemStore("./safety-data");
await store.initialize();
// Create and save a safety run
const run = new SafetyRun({ suiteName: "production-eval" });
run.recordMetric({ name: "faithfulnessScore", value: 0.92 });
await store.saveRun(run);
// Compute and save a scorecard (categoryScores: your per-category scores, as in the complete example)
const scorecard = computeScorecardResult(defaultScorecardConfig, categoryScores);
await store.saveScorecard(scorecard, { environment: "production" });
// List all runs
const runs = await store.listRuns({ limit: 10 });
Generate CI Artifacts
import { writeSafetyArtifact } from "@verydia/safety-insights";
// Write artifacts in multiple formats
const files = await writeSafetyArtifact({
run,
scorecard,
outputDir: "./artifacts",
format: ["json", "md", "txt"],
filename: "safety-report",
});
console.log(`Generated artifacts: ${files.join(", ")}`);
Create Dashboard Data
import { createDashboardJson, createTimeSeriesDashboard } from "@verydia/safety-insights";
// Single data point
const dashboardData = createDashboardJson({
environment: "production",
run,
scorecard,
});
// Time series for trending
const scorecards = await store.listScorecards({ limit: 30 });
const timeSeries = createTimeSeriesDashboard("production", scorecards);
Analyze Trends
import { computeTrend } from "@verydia/safety-insights";
const scorecards = await store.listScorecards({ limit: 2 });
const [current, previous] = scorecards;
const trend = computeTrend(previous.result, current.result);
console.log(`Trend: ${trend.direction}`);
console.log(`Delta: ${trend.delta.toFixed(1)}`);
console.log(`Percent Change: ${trend.percentChange.toFixed(1)}%`);
// Category-level trends
trend.categoryTrends.forEach((cat) => {
console.log(`${cat.label}: ${cat.direction} (${cat.delta.toFixed(1)})`);
});
Store Adapters
Configuring the Store via Environment Variables
Use createStoreFromEnv() to automatically configure a store based on environment variables. This is ideal for CLI tools and applications that need flexible storage configuration.
import { createStoreFromEnv } from "@verydia/safety-insights";
// Uses environment variables to determine store type and configuration
const store = createStoreFromEnv();
await store.saveRun(run);
await store.saveScorecard(scorecard);
Environment Variables:
| Variable | Description | Default |
|----------|-------------|---------|
| VERYDIA_SAFETY_STORE | Store type: fs, s3, or memory | fs |
| VERYDIA_SAFETY_FS_DIR | FileSystem base directory | .verydia/safety-insights |
| VERYDIA_SAFETY_S3_BUCKET | S3 bucket name (required for S3) | - |
| VERYDIA_SAFETY_S3_PREFIX | S3 key prefix | safety-insights |
| VERYDIA_SAFETY_S3_REGION | AWS region | us-east-1 |
| VERYDIA_SAFETY_S3_ENDPOINT | Custom S3 endpoint | - |
| VERYDIA_SAFETY_S3_FORCE_PATH_STYLE | Force path-style URLs (true/false) | false |
| AWS_ACCESS_KEY_ID | AWS access key | - |
| AWS_SECRET_ACCESS_KEY | AWS secret key | - |
| AWS_SESSION_TOKEN | AWS session token (optional) | - |
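To make the table concrete, here is a minimal sketch of how these variables and their defaults could be resolved into a store configuration. It is not the implementation of createStoreFromEnv(); the names resolveStoreConfig, StoreConfig, and Env are hypothetical, and only the variable names and defaults come from the table above.

```typescript
// Hypothetical sketch: NOT the library's createStoreFromEnv() implementation.
// Only the variable names and default values are taken from the table above.
type StoreConfig =
  | { type: "fs"; dir: string }
  | { type: "memory" }
  | {
      type: "s3";
      bucket: string;
      prefix: string;
      region: string;
      endpoint?: string;
      forcePathStyle: boolean;
    };

type Env = Record<string, string | undefined>;

function resolveStoreConfig(env: Env): StoreConfig {
  // The store type defaults to the file system store.
  const type = env.VERYDIA_SAFETY_STORE ?? "fs";
  switch (type) {
    case "fs":
      return { type: "fs", dir: env.VERYDIA_SAFETY_FS_DIR ?? ".verydia/safety-insights" };
    case "memory":
      return { type: "memory" };
    case "s3": {
      const bucket = env.VERYDIA_SAFETY_S3_BUCKET;
      if (!bucket) {
        // The bucket has no default, so fail early with a clear message.
        throw new Error("VERYDIA_SAFETY_S3_BUCKET is required when VERYDIA_SAFETY_STORE=s3");
      }
      return {
        type: "s3",
        bucket,
        prefix: env.VERYDIA_SAFETY_S3_PREFIX ?? "safety-insights",
        region: env.VERYDIA_SAFETY_S3_REGION ?? "us-east-1",
        endpoint: env.VERYDIA_SAFETY_S3_ENDPOINT,
        forcePathStyle: env.VERYDIA_SAFETY_S3_FORCE_PATH_STYLE === "true",
      };
    }
    default:
      throw new Error(`Unknown store type: ${type}`);
  }
}
```

In a Node.js process this sketch would be called as resolveStoreConfig(process.env). Treating any value other than the literal string "true" as false for the path-style flag is an assumption of the sketch.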
Example: Default FileSystem Store
# No configuration needed - uses FileSystem by default
node my-app.js
Example: AWS S3
export VERYDIA_SAFETY_STORE=s3
export VERYDIA_SAFETY_S3_BUCKET=my-safety-data
export VERYDIA_SAFETY_S3_REGION=us-east-1
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
node my-app.js
Example: NetApp StorageGrid
export VERYDIA_SAFETY_STORE=s3
export VERYDIA_SAFETY_S3_BUCKET=safety-data
export VERYDIA_SAFETY_S3_ENDPOINT=https://storagegrid.example.com
export VERYDIA_SAFETY_S3_FORCE_PATH_STYLE=true
export AWS_ACCESS_KEY_ID=your-storagegrid-access-key
export AWS_SECRET_ACCESS_KEY=your-storagegrid-secret-key
node my-app.js
Example: MinIO (Local Development)
export VERYDIA_SAFETY_STORE=s3
export VERYDIA_SAFETY_S3_BUCKET=safety-data
export VERYDIA_SAFETY_S3_ENDPOINT=http://localhost:9000
export VERYDIA_SAFETY_S3_FORCE_PATH_STYLE=true
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
node my-app.js
FileSystemStore
Stores data as JSON files in a directory structure.
import { FileSystemStore } from "@verydia/safety-insights";
const store = new FileSystemStore("./safety-data");
await store.initialize();
// Save data
await store.saveRun(run);
await store.saveScorecard(scorecard);
// Query data
const recentRuns = await store.listRuns({
suiteName: "production-eval",
startDate: new Date("2024-11-01"),
limit: 10,
});
// Get a specific run (named storedRun so it does not clash with the run saved above)
const storedRun = await store.getRun("run-id-123");
// Delete a run
await store.deleteRun("run-id-123");
Directory Structure:
safety-data/
├── runs/
│   ├── run-123.json
│   └── run-456.json
└── scorecards/
    ├── scorecard-1701234567890.json
    └── scorecard-1701234567891.json
MemoryStore
In-memory storage for testing and development.
import { MemoryStore } from "@verydia/safety-insights";
const store = new MemoryStore();
await store.saveRun(run);
await store.saveScorecard(scorecard);
const runs = await store.listRuns();
// Clear all data
store.clear();
S3 & S3-Compatible Stores (AWS, NetApp StorageGrid, MinIO, etc.)
Full S3 implementation supporting AWS S3 and any S3-compatible object storage backend.
AWS S3
import { S3Store } from "@verydia/safety-insights";
const store = new S3Store({
bucket: "my-safety-data",
region: "us-east-1",
accessKeyId: process.env.AWS_ACCESS_KEY_ID,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
});
// Use like any other store
await store.saveRun(run);
await store.saveScorecard(scorecard);
const runs = await store.listRuns({ limit: 10 });
NetApp StorageGrid
import { S3Store } from "@verydia/safety-insights";
const store = new S3Store({
bucket: "safety-data",
endpoint: "https://storagegrid.example.com",
forcePathStyle: true, // Required for StorageGrid
accessKeyId: process.env.STORAGEGRID_ACCESS_KEY,
secretAccessKey: process.env.STORAGEGRID_SECRET_KEY,
});
await store.saveRun(run);
await store.saveScorecard(scorecard);
MinIO
import { S3Store } from "@verydia/safety-insights";
const store = new S3Store({
bucket: "safety-data",
endpoint: "http://localhost:9000",
forcePathStyle: true, // Required for MinIO
accessKeyId: "minioadmin",
secretAccessKey: "minioadmin",
});
await store.saveRun(run);
await store.saveScorecard(scorecard);
Custom S3 Client
You can also provide a pre-configured S3 client:
import { S3Client } from "@aws-sdk/client-s3";
import { S3Store } from "@verydia/safety-insights";
const s3Client = new S3Client({
region: "us-west-2",
credentials: {
accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
},
});
const store = new S3Store({
bucket: "my-safety-data",
s3Client, // Use custom client
});
S3Store Options:
interface S3StoreOptions {
bucket: string; // S3 bucket name
prefix?: string; // Key prefix (default: "safety-insights")
region?: string; // AWS region (default: "us-east-1")
endpoint?: string; // Custom endpoint for S3-compatible backends
forcePathStyle?: boolean; // Force path-style URLs (required for some backends)
accessKeyId?: string; // AWS access key ID
secretAccessKey?: string; // AWS secret access key
sessionToken?: string; // AWS session token (for temporary credentials)
s3Client?: S3Client; // Pre-configured S3 client
}
Key Layout:
{prefix}/runs/{runId}.json
{prefix}/scorecards/scorecard-{timestamp}.json
Cloud Stores (GCS, Azure - Stubs)
Stubs for other cloud storage providers. Implementations coming soon.
import { GCSStore, AzureBlobStore } from "@verydia/safety-insights";
// Google Cloud Storage (stub)
const gcsStore = new GCSStore("my-bucket", "safety-insights");
// Azure Blob Storage (stub)
const azureStore = new AzureBlobStore("my-container", "safety-insights");
CI Artifact Generation
Supported Formats
- JSON - Machine-readable format for downstream processing
- Markdown - Human-readable reports with tables
- Text - Plain text format for console output
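As an illustration of the Markdown format, the sketch below renders a metrics table with the same Metric / Value / Unit columns that appear in the Markdown report example in this README. The helper renderMetricsTable and the Metric type are hypothetical and only approximate what writeSafetyArtifact might emit; the "-" placeholder for a missing unit is an assumption.

```typescript
// Hypothetical helper: a rough sketch of Markdown table rendering,
// not the code used by writeSafetyArtifact.
interface Metric {
  name: string;
  value: number;
  unit?: string;
}

function renderMetricsTable(metrics: Metric[]): string {
  const header = ["| Metric | Value | Unit |", "|--------|-------|------|"];
  // One table row per metric; a missing unit is shown as "-" (an assumption).
  const rows = metrics.map((m) => `| ${m.name} | ${m.value} | ${m.unit ?? "-"} |`);
  return [...header, ...rows].join("\n");
}

console.log(renderMetricsTable([{ name: "faithfulnessScore", value: 0.92, unit: "ratio" }]));
```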
Example: GitHub Actions
name: Safety Evaluation
on: [push]
jobs:
  safety-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - run: pnpm install
      - run: pnpm run safety:eval
      - name: Upload Safety Artifacts
        if: always() # upload reports even when the evaluation step fails
        uses: actions/upload-artifact@v4
        with:
          name: safety-reports
          path: ./artifacts/safety-report.*
Artifact Examples
JSON Output:
{
"run": {
"id": "run-123",
"suiteName": "production-eval",
"metadata": { "model": "gpt-4" },
"metrics": [...]
},
"scorecard": {
"totalWeighted": 78.3,
"classification": "Safe",
"breakdown": [...]
}
}
Markdown Output:
# Safety Report
## Safety Run
**Run ID:** run-123
**Suite:** production-eval
### Metrics
| Metric | Value | Unit |
|--------|-------|------|
| faithfulnessScore | 0.92 | ratio |
...
## Safety Scorecard
**Total Score:** 78.3
**Classification:** Safe
### Category Breakdown
| Category | Weight | Score | Weighted |
|----------|--------|-------|----------|
| Use-case & Risk Scope | 10% | 0 | 10.0 |
...
Dashboard Integration
Dashboard Data Format
interface DashboardData {
environment: string;
timestamp: string;
run?: {
id: string;
suiteName?: string;
metrics: Array<{
name: string;
value: number;
unit?: string;
}>;
};
scorecard?: {
totalWeighted: number;
classification: string;
breakdown: Array<{...}>;
};
}
Example: Grafana Integration
import { createDashboardJson } from "@verydia/safety-insights";
import { promises as fs } from "node:fs";
// Generate dashboard data
const data = createDashboardJson({
environment: process.env.ENV || "production",
run,
scorecard,
});
// Write to file for Grafana to consume
await fs.writeFile(
"/var/lib/grafana/dashboards/safety.json",
JSON.stringify(data, null, 2)
);
Time Series Visualization
import { createTimeSeriesDashboard } from "@verydia/safety-insights";
const scorecards = await store.listScorecards({
startDate: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000), // Last 30 days
});
const timeSeries = createTimeSeriesDashboard("production", scorecards);
// Output format suitable for charting libraries
// {
// environment: "production",
// dataPoints: [
// { timestamp: "2024-11-01T...", totalWeighted: 75.2, classification: "Safe" },
// { timestamp: "2024-11-02T...", totalWeighted: 78.3, classification: "Safe" },
// ...
// ]
// }
Trend Analysis
Compare Two Scorecards
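The trend fields used in this section (delta, percentChange, direction) are related by simple arithmetic. The sketch below shows one plausible way to derive them from two total scores. The function summarizeTrend and its stableBand cutoff are assumptions made for illustration; computeTrend may classify direction differently.

```typescript
// Hypothetical sketch of the arithmetic behind a trend summary.
// The stableBand cutoff is an assumed value, not taken from the library.
type Direction = "improving" | "stable" | "degrading";

interface TrendSummary {
  delta: number;
  percentChange: number;
  direction: Direction;
}

function summarizeTrend(previousTotal: number, currentTotal: number, stableBand = 0.5): TrendSummary {
  const delta = currentTotal - previousTotal;
  // Avoid dividing by zero when the previous score is 0.
  const percentChange = previousTotal === 0 ? 0 : (delta / previousTotal) * 100;
  let direction: Direction = "stable";
  if (delta > stableBand) direction = "improving";
  else if (delta < -stableBand) direction = "degrading";
  return { delta, percentChange, direction };
}

const summary = summarizeTrend(75.2, 78.3);
console.log(summary.direction, summary.delta.toFixed(1), summary.percentChange.toFixed(1));
```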
import { computeTrend } from "@verydia/safety-insights";
const trend = computeTrend(previousScorecard, currentScorecard);
// Overall trend
console.log(trend.direction); // "improving" | "stable" | "degrading"
console.log(trend.delta); // e.g. 3.1
console.log(trend.percentChange); // e.g. 4.1 (percent)
// Previous vs current
console.log(trend.previous.totalWeighted); // 75.2
console.log(trend.current.totalWeighted); // 78.3
// Category-level trends
trend.categoryTrends.forEach((cat) => {
if (cat.direction === "degrading") {
console.warn(`⚠️ ${cat.label} is degrading by ${Math.abs(cat.delta).toFixed(1)}`);
}
});
Moving Average
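For intuition about what a moving average over scorecard totals smooths out, here is a minimal standalone sketch of a simple trailing moving average. The function simpleMovingAverage is hypothetical. It emits one value per full window, which is an assumption; the package's computeMovingAverage may treat partial windows differently.

```typescript
// Hypothetical sketch of a simple trailing moving average.
// Assumption: only full windows produce a value.
function simpleMovingAverage(values: number[], windowSize = 7): number[] {
  if (!Number.isInteger(windowSize) || windowSize < 1) {
    throw new RangeError("windowSize must be a positive integer");
  }
  const averages: number[] = [];
  let sum = 0;
  for (let i = 0; i < values.length; i++) {
    sum += values[i];
    // Drop the value that just left the window.
    if (i >= windowSize) sum -= values[i - windowSize];
    // Emit an average once the first full window is available.
    if (i >= windowSize - 1) averages.push(sum / windowSize);
  }
  return averages;
}

console.log(simpleMovingAverage([1, 2, 3, 4, 5], 3)); // [ 2, 3, 4 ]
```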
import { computeMovingAverage } from "@verydia/safety-insights";
const scorecards = await store.listScorecards({ limit: 30 });
const scores = scorecards.map((s) => s.result);
// Moving average over a window of 7 scorecards (7 days only if one scorecard is saved per day)
const movingAvg = computeMovingAverage(scores, 7);
console.log("Moving averages:", movingAvg);
Trend Thresholds
const trend = computeTrend(previous, current);
// Alert on degradation
if (trend.direction === "degrading" && Math.abs(trend.delta) > 5) {
console.error("🚨 Safety score degraded by more than 5 points!");
// Send alert, create incident, etc.
}
// Celebrate improvements
if (trend.direction === "improving" && trend.delta > 10) {
console.log("🎉 Safety score improved significantly!");
}
Incident Tracking
Create and Manage Incidents
import { IncidentTracker } from "@verydia/safety-insights";
const tracker = new IncidentTracker();
// Create an incident
const incident = tracker.createIncident({
title: "Faithfulness score dropped below threshold",
description: "RAG faithfulness score dropped to 0.75 in production",
severity: "high",
metadata: { threshold: 0.85, actual: 0.75 },
relatedRunIds: ["run-123", "run-124"],
});
// Update incident status
tracker.updateStatus(incident.id, "investigating");
tracker.updateStatus(incident.id, "resolved");
// List incidents
const openIncidents = tracker.listIncidents({ status: "open" });
const criticalIncidents = tracker.listIncidents({ severity: "critical" });
// Get incidents for a specific run
const runIncidents = tracker.getIncidentsByRun("run-123");
Incident Workflow
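The score-drop cutoffs used in this workflow (more than 5 points for "high", more than 10 for "critical") can be kept in one place. The helper below is a hypothetical sketch, not part of the package. It treats a negative delta as a degradation, which is an assumption about how delta is signed.

```typescript
// Hypothetical helper that centralizes the workflow's score-drop thresholds.
// Assumption: a negative delta means the score got worse.
type Severity = "high" | "critical";

function severityForScoreDrop(delta: number): Severity | null {
  if (delta >= 0) return null; // no drop, no incident
  const drop = Math.abs(delta);
  if (drop > 10) return "critical";
  if (drop > 5) return "high";
  return null; // drops of 5 points or less do not open an incident
}

console.log(severityForScoreDrop(-12)); // "critical"
console.log(severityForScoreDrop(-6)); // "high"
console.log(severityForScoreDrop(-4)); // null
```

A caller would create an incident only when the helper returns a severity, which keeps the thresholds out of the incident-creation code.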
// Automated incident creation based on trends
const trend = computeTrend(previous, current);
if (trend.direction === "degrading" && Math.abs(trend.delta) > 5) {
tracker.createIncident({
title: `Safety score degraded by ${Math.abs(trend.delta).toFixed(1)} points`,
description: `Score dropped from ${trend.previous.totalWeighted} to ${trend.current.totalWeighted}`,
severity: Math.abs(trend.delta) > 10 ? "critical" : "high",
relatedRunIds: [run.id],
});
}
// Category-specific incidents
trend.categoryTrends.forEach((cat) => {
if (cat.direction === "degrading" && Math.abs(cat.delta) > 3) {
tracker.createIncident({
title: `${cat.label} degraded`,
description: `Category score dropped by ${Math.abs(cat.delta).toFixed(1)}`,
severity: "medium",
metadata: { category: cat.categoryId },
});
}
});
Complete Example: Production Pipeline
import {
FileSystemStore,
writeSafetyArtifact,
createDashboardJson,
computeTrend,
IncidentTracker,
} from "@verydia/safety-insights";
import {
SafetyRun,
computeScorecardResult,
defaultScorecardConfig,
} from "@verydia/safety";
import { promises as fs } from "node:fs";
async function runSafetyPipeline() {
// 1. Initialize store
const store = new FileSystemStore("./safety-data");
await store.initialize();
// 2. Run safety evaluation
const run = new SafetyRun({
suiteName: "production-eval",
metadata: {
environment: "production",
commit: process.env.GIT_COMMIT,
date: new Date().toISOString(),
},
});
// Record metrics (from your eval pipeline)
run.recordMetric({ name: "faithfulnessScore", value: 0.92 });
run.recordMetric({ name: "attributableAnswerRate", value: 0.88 });
run.recordMetric({ name: "refusalAccuracy", value: 0.95 });
// 3. Compute scorecard
const categoryScores = [
{ categoryId: "useCaseRisk", score: 0 },
{ categoryId: "dataGovernance", score: -1 },
{ categoryId: "ragSafety", score: 0 },
{ categoryId: "contextManagement", score: -1 },
{ categoryId: "modelAlignment", score: 0 },
{ categoryId: "guardrails", score: 0 },
{ categoryId: "orchestration", score: 0 },
{ categoryId: "evaluationMonitoring", score: 0 },
{ categoryId: "uxTransparency", score: -1 },
{ categoryId: "automatedSafetyTesting", score: 0 },
];
const scorecard = computeScorecardResult(defaultScorecardConfig, categoryScores);
// 4. Save to store
await store.saveRun(run);
await store.saveScorecard(scorecard, {
environment: "production",
commit: process.env.GIT_COMMIT,
});
// 5. Generate CI artifacts
await writeSafetyArtifact({
run,
scorecard,
outputDir: "./artifacts",
format: ["json", "md", "txt"],
});
// 6. Create dashboard data
const dashboardData = createDashboardJson({
environment: "production",
run,
scorecard,
});
await fs.writeFile(
"./dashboard/safety.json",
JSON.stringify(dashboardData, null, 2)
);
// 7. Analyze trends
const recentScorecards = await store.listScorecards({ limit: 2 });
if (recentScorecards.length >= 2) {
const [current, previous] = recentScorecards;
const trend = computeTrend(previous.result, current.result);
console.log(`\n📊 Trend Analysis:`);
console.log(`Direction: ${trend.direction}`);
console.log(`Delta: ${trend.delta > 0 ? "+" : ""}${trend.delta.toFixed(1)}`);
console.log(`Percent Change: ${trend.percentChange.toFixed(1)}%`);
// 8. Track incidents
const tracker = new IncidentTracker();
if (trend.direction === "degrading" && Math.abs(trend.delta) > 5) {
tracker.createIncident({
title: "Safety score degradation detected",
description: `Score dropped from ${trend.previous.totalWeighted.toFixed(1)} to ${trend.current.totalWeighted.toFixed(1)}`,
severity: Math.abs(trend.delta) > 10 ? "critical" : "high",
relatedRunIds: [run.id],
});
}
}
// 9. Check thresholds
if (scorecard.totalWeighted < 70) {
console.error("❌ Safety score below threshold!");
process.exit(1);
}
console.log("✅ Safety evaluation passed!");
}
runSafetyPipeline().catch((error) => {
console.error(error);
process.exit(1); // fail the CI job on unexpected errors
});
API Reference
Store Types
SafetyInsightsStore
Interface for persistence adapters.
interface SafetyInsightsStore {
saveRun(run: SafetyRun): Promise<void>;
saveScorecard(result: ScorecardResult, metadata?: Record<string, unknown>): Promise<void>;
listRuns(query?: StoreQuery): Promise<StoredSafetyRun[]>;
listScorecards(query?: StoreQuery): Promise<StoredScorecard[]>;
getRun(id: string): Promise<StoredSafetyRun | null>;
deleteRun(id: string): Promise<void>;
}
StoreQuery
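One way to read the StoreQuery fields documented in this section is as a filter followed by pagination. The sketch below applies such a query to an in-memory array. The names applyQuery, QueryLike, and RecordLike are hypothetical; inclusive date bounds and leaving out the metadata filter are assumptions, and the actual adapters may behave differently.

```typescript
// Hypothetical sketch: filter-then-paginate over in-memory records.
// Assumptions: date bounds are inclusive; the metadata filter is omitted.
interface QueryLike {
  suiteName?: string;
  startDate?: Date;
  endDate?: Date;
  limit?: number;
  offset?: number;
}

interface RecordLike {
  suiteName?: string;
  timestamp: Date;
}

function applyQuery<T extends RecordLike>(records: T[], query: QueryLike = {}): T[] {
  const filtered = records.filter((r) => {
    if (query.suiteName !== undefined && r.suiteName !== query.suiteName) return false;
    if (query.startDate && r.timestamp.getTime() < query.startDate.getTime()) return false;
    if (query.endDate && r.timestamp.getTime() > query.endDate.getTime()) return false;
    return true;
  });
  // Pagination is applied after filtering.
  const offset = query.offset ?? 0;
  const end = query.limit === undefined ? undefined : offset + query.limit;
  return filtered.slice(offset, end);
}
```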
interface StoreQuery {
suiteName?: string;
startDate?: Date;
endDate?: Date;
limit?: number;
offset?: number;
metadata?: Record<string, unknown>;
}
Artifact Functions
writeSafetyArtifact(options)
Generate CI artifacts in multiple formats.
function writeSafetyArtifact(options: {
run?: SafetyRun;
scorecard?: ScorecardResult;
outputDir: string;
format: "json" | "md" | "txt" | Array<"json" | "md" | "txt">;
filename?: string;
}): Promise<string[]>
Dashboard Functions
createDashboardJson(options)
Create dashboard-ready JSON.
function createDashboardJson(options: {
environment: string;
run?: SafetyRun;
scorecard?: ScorecardResult;
timestamp?: Date;
}): DashboardData
createTimeSeriesDashboard(environment, scorecards)
Create time-series data for trending.
function createTimeSeriesDashboard(
environment: string,
scorecards: Array<{ result: ScorecardResult; timestamp: Date }>
): TimeSeriesDashboardData
Trend Functions
computeTrend(previous, current)
Analyze trend between two scorecards.
function computeTrend(
previous: ScorecardResult,
current: ScorecardResult
): TrendAnalysis
computeMovingAverage(scorecards, windowSize)
Compute moving average of scores.
function computeMovingAverage(
scorecards: ScorecardResult[],
windowSize?: number
): number[]
Incident Tracking
IncidentTracker
class IncidentTracker {
createIncident(options: CreateIncidentOptions): SafetyIncident;
updateStatus(id: string, status: IncidentStatus): SafetyIncident | null;
getIncident(id: string): SafetyIncident | null;
listIncidents(filter?: { status?: IncidentStatus; severity?: IncidentSeverity }): SafetyIncident[];
getIncidentsByRun(runId: string): SafetyIncident[];
clear(): void;
}
Best Practices
1. Consistent Storage
Use the same store instance across your pipeline:
// config/safety.ts
import { FileSystemStore } from "@verydia/safety-insights";
export const safetyStore = new FileSystemStore(process.env.SAFETY_DATA_DIR || "./safety-data");
2. Metadata Enrichment
Add rich metadata to scorecards for better filtering:
await store.saveScorecard(scorecard, {
environment: process.env.ENV,
commit: process.env.GIT_COMMIT,
branch: process.env.GIT_BRANCH,
buildId: process.env.BUILD_ID,
timestamp: new Date().toISOString(),
});
3. Automated Trending
Run trend analysis in CI/CD:
const recentScorecards = await store.listScorecards({ limit: 10 });
const movingAvg = computeMovingAverage(recentScorecards.map((s) => s.result), 5);
if (scorecard.totalWeighted < movingAvg[movingAvg.length - 1] - 5) {
console.warn("Score is more than 5 points below the 5-scorecard moving average");
}
4. Incident Response
Integrate with alerting systems:
const openIncidents = tracker.listIncidents({ status: "open", severity: "critical" });
if (openIncidents.length > 0) {
// Send to PagerDuty, Slack, etc.
await sendAlert({
message: `${openIncidents.length} critical safety incidents open`,
incidents: openIncidents,
});
}
5. Data Retention
Implement retention policies:
// Delete runs older than 90 days
const cutoffDate = new Date(Date.now() - 90 * 24 * 60 * 60 * 1000);
const oldRuns = await store.listRuns({ endDate: cutoffDate });
for (const stored of oldRuns) {
await store.deleteRun(stored.run.id);
}
License
MIT
Related Packages
- @verydia/safety - Core safety scorecard and metrics library
- @verydia/eval - Evaluation harness for testing Verydia flows
- @verydia/devtools - Developer tools and telemetry
Support
For questions and support, please open an issue in the Verydia repository.
