@ikrigel/ragas-lib-typescript v0.1.1
RAG Assessment Library
A lightweight, extensible Node.js library for evaluating Retrieval-Augmented Generation (RAG) systems
Documentation · Quick Start · API Reference · Examples · Contributing
Overview
RAG Assessment is a production-ready evaluation framework for measuring and improving the quality of RAG (Retrieval-Augmented Generation) systems. It provides:
✅ Multiple Evaluation Metrics - Faithfulness, Relevance, Coherence, Context Precision, Context Recall
✅ Flexible Dataset Management - Import/export Q&A pairs from JSON, CSV, or APIs
✅ Batch Evaluation - Run evaluations on hundreds of test cases with progress tracking
✅ LLM Provider Agnostic - Works with Gemini, Perplexity, OpenAI, Anthropic, and more
✅ Rich Reporting - Generate JSON, CSV, and HTML reports with statistical analysis
✅ CLI Tools - Command-line interface for evaluation without coding
✅ Type Safe - Full TypeScript support with comprehensive interfaces
Unlike Python-based RAGAS, this library is built for JavaScript/Node.js ecosystems and integrates seamlessly with Express, Next.js, LangChain, and LlamaIndex.
Why RAG Assessment?
RAG systems combine retrieval and generation to answer questions based on domain knowledge. But how do you know if your RAG is good?
Without measurement, you can't:
- 🚫 Detect quality degradation after changes
- 🚫 Compare different retrieval strategies
- 🚫 Justify performance to stakeholders
- 🚫 Identify failing edge cases
- 🚫 Track improvements over time
RAG Assessment solves this by providing automated quality metrics you can run in CI/CD pipelines, dashboards, and development workflows.
Quick Comparison
| Feature | RAG Assessment | RAGAS (Python) |
|---------|----------------|----------------|
| Language | JavaScript/TypeScript | Python |
| Setup Time | <5 min | ~15 min |
| CLI Support | ✅ Yes | ✅ Yes |
| Custom Metrics | ✅ Easy | ✅ Complex |
| LLM Providers | 3+ built-in | 1 (OpenAI-focused) |
| Node.js Integration | ✅ Native | ⚠️ Via subprocess |
| License | MIT | Apache 2.0 |
Installation
Prerequisites
- Node.js 18+ (LTS recommended)
- npm 8+ or yarn
- API key for at least one LLM provider (Gemini, Perplexity, OpenAI, etc.)
Quick Install
npm install @ragas-lib/core
With Specific LLM Provider
# For Gemini
npm install @ragas-lib/core @ragas-lib/gemini
# For Perplexity
npm install @ragas-lib/core @ragas-lib/perplexity
# For OpenAI
npm install @ragas-lib/core @ragas-lib/openai
From Source (Development)
git clone https://github.com/ikrigel/ragas-lib-typescript.git
cd ragas-lib
npm install
npm run build
TypeScript Support
This library is built with TypeScript and provides full type definitions out of the box. No additional @types/ packages needed.
TypeScript Configuration
For the best experience, configure your tsconfig.json:
{
"compilerOptions": {
"target": "ES2020",
"module": "ESNext",
"lib": ["ES2020"],
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
}
}
Type Definitions
All exports include complete type definitions:
// TypeScript automatically provides types for all imports
import {
RAGAssessment, // ✅ Typed
DatasetManager, // ✅ Typed
BaseMetric, // ✅ Typed
EvaluationResults, // ✅ Typed interface
EvaluationConfig, // ✅ Typed interface
GroundTruthPair, // ✅ Typed interface
LLMProvider, // ✅ Typed interface
EvaluationResult, // ✅ Typed interface
} from '@ragas-lib/core';
IDE IntelliSense
Full JSDoc documentation on all types for IDE support:
const config: EvaluationConfig = {
provider: new GeminiProvider(),
// ✅ IDE autocomplete shows all available options
// ✅ Hover shows documentation and type hints
// ✅ Type validation catches errors before runtime
metrics: ['faithfulness', 'relevance'],
timeout: 30000,
retries: 3,
};
Strict Mode
The library is compatible with TypeScript's strict mode:
// Even in strict mode, full type safety
const evaluator = new RAGAssessment(config);
const results = await evaluator.evaluate({...}); // ✅ No type errors
Quick Start
1. Set Up API Credentials
Create a .env file in your project root:
# For Gemini (free tier available)
GEMINI_API_KEY=AIzaSy...
# Or Perplexity
PERPLEXITY_API_KEY=pplx-...
# Or OpenAI
OPENAI_API_KEY=sk-...
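The snippets below read the key from process.env. If your runtime does not load .env files automatically, load them at startup; a minimal sketch, assuming you use the dotenv package:
import 'dotenv/config'; // populates process.env from .env before the provider is constructed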
2. Create Your First Evaluation (TypeScript)
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
// Initialize the evaluator
const evaluator = new RAGAssessment({
provider: new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
}),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
// Define ground truth Q&A pairs
const dataset = [
{
question: 'What are lev-boots?',
expectedAnswer: 'Lev-boots are gravity-reversing footwear that allow users to levitate and hover.',
},
{
question: 'How do lev-boots work?',
expectedAnswer: 'They use localized gravity reversal technology to counteract gravitational force on the wearer.',
},
];
// Run evaluation
const results = await evaluator.evaluate({
dataset,
ragAnswers: [
'Lev-boots enable levitation through advanced physics.',
'Localized gravity reversal creates upward force.',
],
});
console.log(results);
// Output:
// {
// overall_score: 8.2,
// metrics: {
// faithfulness: 8.5,
// relevance: 8.1,
// coherence: 8.0
// },
// per_question: [
// { question: '...', scores: { ... }, explanation: '...' },
// ...
// ]
// }
3. Using the CLI
# Initialize configuration (interactive setup)
npx ragas config
# Evaluate a dataset
npx ragas evaluate --dataset questions.json --output results.json
# Generate reports
npx ragas report --input results.json --format html --output report.html
# Import dataset from CSV
npx ragas import --from questions.csv --format csv
4. Evaluate Your RAG System
import { RAGAssessment, DatasetManager } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
// Step 1: Load or create a dataset
const datasetManager = new DatasetManager();
await datasetManager.loadFromJSON('ground_truth.json');
// Step 2: Initialize evaluator
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
// Step 3: Get answers from your RAG system
const ragAnswers: string[] = [];
for (const pair of datasetManager.getAll()) {
const ragAnswer = await yourRagSystem.ask(pair.question);
ragAnswers.push(ragAnswer);
}
// Step 4: Run evaluation
const results = await evaluator.evaluate({
dataset: datasetManager.getAll(),
ragAnswers,
contexts: retrievedContextChunks, // Optional: for context-based metrics
});
// Step 5: Generate reports
const report = await evaluator.generateReport(results, {
format: 'html',
includeCharts: true,
outputPath: './evaluation_report.html',
});
console.log(`Report saved to ${report.path}`);
console.log(`Overall Score: ${results.overall_score}/10`);
API Reference
Core Classes
RAGAssessment
Main class for running evaluations.
const evaluator = new RAGAssessment(config);
// Methods
evaluator.evaluate(options) // Run evaluation
evaluator.generateReport(results) // Create report
evaluator.registerMetric(metric) // Add custom metric
Configuration:
interface RAGAssessmentConfig {
provider: LLMProvider; // LLM provider instance
metrics?: string[]; // Metric names to use
timeout?: number; // Timeout per question (ms)
retries?: number; // Max retries on failure
parallelConcurrency?: number; // Parallel evaluation count
verbose?: boolean; // Enable logging
}
DatasetManager
Manage ground truth Q&A datasets.
const manager = new DatasetManager();
manager.add(pair) // Add Q&A pair
manager.remove(id) // Delete pair
manager.update(id, pair) // Update pair
manager.getAll() // Get all pairs
manager.loadFromJSON(filePath) // Import from JSON
manager.saveToJSON(filePath) // Export to JSON
manager.loadFromCSV(filePath) // Import from CSV
manager.validate() // Validate dataset
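A minimal sketch of building a small dataset in code and persisting it, assuming add() accepts the same pair shape used in the Quick Start examples:
import { DatasetManager } from '@ragas-lib/core';
const manager = new DatasetManager();
manager.add({
  id: 'q1',
  question: 'What are lev-boots?',
  expectedAnswer: 'Gravity-reversing footwear that lets the wearer hover.',
});
manager.validate(); // check the dataset before running an evaluation
await manager.saveToJSON('ground_truth.json'); // reuse later from the CLI or CI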
Metrics
Built-in Metrics
// Faithfulness (0-10)
// Measures: How well does the answer align with retrieved context?
// Higher = More faithful to sources
// Relevance (0-10)
// Measures: How well does the answer address the question?
// Higher = More relevant and on-topic
// Coherence (0-10)
// Measures: Is the answer clear, well-structured, and grammatically correct?
// Higher = More coherent
// ContextPrecision (0-1)
// Measures: What % of context chunks are relevant to the answer?
// Higher = Fewer irrelevant chunks retrieved
// ContextRecall (0-1)
// Measures: Did retrieval find enough context to fully answer the question?
// Higher = Complete context retrieved
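The two context-based metrics only apply when retrieved chunks are passed in alongside the answers. A minimal sketch that reuses the dataset, ragAnswers, and retrievedContextChunks variables from the Quick Start; the string identifiers 'context_precision' and 'context_recall' are assumptions (they are not listed above, and per the roadmap these metrics may only ship in v0.2.0):
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
const evaluator = new RAGAssessment({
  provider: new GeminiProvider(),
  // assumed identifiers; check the API reference for the exact names
  metrics: ['faithfulness', 'context_precision', 'context_recall'],
});
const results = await evaluator.evaluate({
  dataset,                          // ground-truth pairs, as in the Quick Start
  ragAnswers,                       // one RAG answer string per pair
  contexts: retrievedContextChunks, // one array of retrieved chunks per question
});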
Custom Metrics
Create your own evaluation metrics:
import { BaseMetric } from '@ragas-lib/core';
class CustomMetric extends BaseMetric {
name = 'my_metric';
description = 'My custom RAG metric';
async compute(input: {
question: string;
answer: string;
context: string;
expectedAnswer?: string;
}): Promise<{ score: number; explanation: string }> {
// Your evaluation logic here
const score = 0; // replace with your scoring logic (0-10)
return {
score,
explanation: 'Why this score?',
};
}
}
// Register and use
evaluator.registerMetric(new CustomMetric());
LLM Providers
Gemini Provider
import { GeminiProvider } from '@ragas-lib/gemini';
const provider = new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
model: 'gemini-2.0-flash', // Optional
temperature: 0.7, // Optional
maxTokens: 1024, // Optional
});
Perplexity Provider
import { PerplexityProvider } from '@ragas-lib/perplexity';
const provider = new PerplexityProvider({
apiKey: process.env.PERPLEXITY_API_KEY,
model: 'sonar',
temperature: 0.5,
});
OpenAI Provider
import { OpenAIProvider } from '@ragas-lib/openai';
const provider = new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4-turbo',
temperature: 0.3,
});
Mock Provider (for Testing)
import { MockProvider } from '@ragas-lib/core';
const provider = new MockProvider();
// Returns deterministic scores for testing
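Because MockProvider needs no API key or network access, it is convenient for unit-testing your evaluation wiring. A minimal sketch using Node's built-in test runner (any test framework works; the assertion is illustrative):
import test from 'node:test';
import assert from 'node:assert/strict';
import { RAGAssessment, MockProvider } from '@ragas-lib/core';
test('evaluation pipeline returns a score in range', async () => {
  const evaluator = new RAGAssessment({
    provider: new MockProvider(), // deterministic scores, no external calls
    metrics: ['faithfulness', 'relevance'],
  });
  const results = await evaluator.evaluate({
    dataset: [{ question: 'What are lev-boots?', expectedAnswer: 'Gravity-reversing footwear.' }],
    ragAnswers: ['Lev-boots are boots that reverse gravity so the wearer can hover.'],
  });
  assert.ok(results.overall_score >= 0 && results.overall_score <= 10);
});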
Results Format
interface EvaluationResults {
overall_score: number; // Weighted average (0-10)
metadata: {
timestamp: string;
provider: string;
model: string;
metrics_used: string[];
total_questions: number;
evaluation_duration_ms: number;
};
metrics: {
[metricName: string]: number; // Average score per metric
};
per_question: Array<{
question: string;
answer: string;
expected_answer?: string;
scores: { [metricName: string]: number };
explanations: { [metricName: string]: string };
overall_score: number;
}>;
statistics: {
mean: number;
median: number;
std_dev: number;
min: number;
max: number;
passed_threshold_percentage: number; // % scoring >7
};
}
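A short sketch of consuming this structure, for example to surface the weakest questions after a run; it uses only fields defined in the interface above:
import type { EvaluationResults } from '@ragas-lib/core';
function reportWeakest(results: EvaluationResults, threshold = 7): void {
  console.log(`Overall: ${results.overall_score}/10 (std dev ${results.statistics.std_dev})`);
  // Questions that scored below the threshold, worst first
  const weakest = results.per_question
    .filter(q => q.overall_score < threshold)
    .sort((a, b) => a.overall_score - b.overall_score);
  for (const q of weakest) {
    console.log(`[${q.overall_score}] ${q.question}`);
    for (const [metric, score] of Object.entries(q.scores)) {
      console.log(`  ${metric}: ${score} (${q.explanations[metric]})`);
    }
  }
}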
Custom Metric Types
When creating custom metrics, use the provided interfaces for type safety:
import { BaseMetric, MetricInput, MetricOutput } from '@ragas-lib/core';
class MyCustomMetric extends BaseMetric {
name = 'my_metric';
description = 'My custom metric';
async compute(input: MetricInput): Promise<MetricOutput> {
const { question, answer, context, expectedAnswer } = input;
// ✅ All properties are typed and required/optional as needed
const score = 0; // replace with your scoring logic (0-10)
return {
score,
explanation: 'Why this score?',
metadata: { /* optional metadata */ },
};
}
}
Examples
Example 1: Evaluate a Simple RAG System
// rag-evaluation.ts
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
async function evaluateRAG() {
const evaluator = new RAGAssessment({
provider: new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
}),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
const testCases = [
{
question: 'What is machine learning?',
expectedAnswer: 'Machine learning is a subset of AI where systems learn from data patterns.',
ragAnswer: 'ML enables computers to learn from data without explicit programming.',
},
{
question: 'Name three ML algorithms',
expectedAnswer: 'Decision Trees, Random Forest, Neural Networks',
ragAnswer: 'Common ML algorithms include Decision Trees, SVM, and K-means clustering.',
},
];
const results = await evaluator.evaluate({
dataset: testCases.map(t => ({
question: t.question,
expectedAnswer: t.expectedAnswer,
})),
ragAnswers: testCases.map(t => t.ragAnswer),
});
console.log(`Overall Score: ${results.overall_score}/10`);
console.log(`\nDetailed Results:`, JSON.stringify(results, null, 2));
}
evaluateRAG().catch(console.error);
Example 2: CI/CD Integration (GitHub Actions)
# .github/workflows/rag-evaluation.yml
name: RAG Quality Check
on: [push, pull_request]
jobs:
evaluate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '18'
- run: npm install
- name: Run RAG Evaluation
env:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
run: npx ragas evaluate --dataset ground_truth.json --threshold 7.5 --output results.json
- name: Generate Report
if: always()
run: npx ragas report --input results.json --format html --output evaluation_report.html
- name: Upload Report
uses: actions/upload-artifact@v3
if: always()
with:
name: evaluation-report
path: evaluation_report.html
- name: Fail if Score Too Low
run: |
SCORE=$(jq '.overall_score' results.json)
if (( $(echo "$SCORE < 7.5" | bc -l) )); then
echo "RAG score ($SCORE) below threshold (7.5)"
exit 1
fi
Example 3: Express.js Integration
// server.ts
import express from 'express';
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
const app = express();
app.use(express.json()); // parse JSON request bodies
const evaluator = new RAGAssessment({
provider: new GeminiProvider({ apiKey: process.env.GEMINI_API_KEY }),
metrics: ['faithfulness', 'relevance'],
});
app.post('/api/evaluate', async (req, res) => {
const { dataset, ragAnswers } = req.body;
try {
const results = await evaluator.evaluate({
dataset,
ragAnswers,
});
res.json(results);
} catch (error) {
res.status(500).json({ error: (error as Error).message });
}
});
app.listen(3000, () => console.log('Server running on port 3000'));
Example 4: Batch Evaluation with Progress Tracking
import { RAGAssessment, DatasetManager } from '@ragas-lib/core';
import { OpenAIProvider } from '@ragas-lib/openai';
import fs from 'fs';
async function batchEvaluation() {
const datasetManager = new DatasetManager();
await datasetManager.loadFromJSON('1000_qa_pairs.json');
const evaluator = new RAGAssessment({
provider: new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY }),
parallelConcurrency: 5, // Evaluate 5 at a time
});
// Collect one answer per Q&A pair from your RAG system before evaluating
const ragAnswersBatch: string[] = [];
for (const pair of datasetManager.getAll()) {
ragAnswersBatch.push(await yourRagSystem.ask(pair.question));
}
const results = await evaluator.evaluate({
dataset: datasetManager.getAll(),
ragAnswers: ragAnswersBatch,
onProgress: (current, total) => {
console.log(`Progress: ${current}/${total} (${Math.round(current/total*100)}%)`);
},
});
// Save results
fs.writeFileSync('results.json', JSON.stringify(results, null, 2));
console.log(`\nEvaluation complete! Results saved to results.json`);
}
batchEvaluation().catch(console.error);
Dataset Format
JSON Format
[
{
"id": "q1",
"question": "What are lev-boots?",
"expected_answer": "Gravity-reversing footwear...",
"metadata": {
"source": "research_paper",
"difficulty": "easy",
"tags": ["product", "physics"]
}
},
{
"id": "q2",
"question": "How do they work?",
"expected_answer": "Using localized gravity reversal...",
"metadata": {
"source": "white_paper",
"difficulty": "hard"
}
}
]
CSV Format
question,expected_answer,source,difficulty
"What are lev-boots?","Gravity-reversing footwear...",research_paper,easy
"How do they work?","Using localized gravity reversal...",white_paper,hardImporting from Different Sources
import { DatasetManager } from '@ragas-lib/core';
const manager = new DatasetManager();
// From JSON
await manager.loadFromJSON('questions.json');
// From CSV
await manager.loadFromCSV('questions.csv');
// From API
await manager.loadFromAPI('https://api.example.com/qa-pairs', {
auth: 'Bearer token',
});
// From Database
await manager.loadFromDatabase(connection, {
query: 'SELECT question, expected_answer FROM qa_pairs',
});
Performance & Optimization
Parallel Evaluation
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
parallelConcurrency: 10, // Process 10 questions simultaneously
timeout: 30000, // 30 second timeout per question
});
Rate Limiting & Throttling
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
rateLimit: {
requestsPerMinute: 100,
burstSize: 20,
},
});
Cost Estimation
import { CostEstimator } from '@ragas-lib/core';
const estimator = new CostEstimator();
const cost = estimator.estimate({
provider: 'gemini',
numQuestions: 1000,
metricsPerQuestion: 3,
});
console.log(`Estimated cost: $${cost.totalCost}`);
console.log(`Input tokens: ${cost.inputTokens}`);
console.log(`Output tokens: ${cost.outputTokens}`);
Troubleshooting
Issue: API Key Not Found
Error: GEMINI_API_KEY not found in environment variables
Solution:
1. Create .env file with your API key
2. Or set environment variable: export GEMINI_API_KEY=your_key_here
3. Verify with: echo $GEMINI_API_KEY
Issue: Rate Limit Exceeded
Error: Rate limit exceeded. Max 60 requests per minute.
Solution:
1. Reduce parallelConcurrency (default: 5)
2. Enable rate limiting with requestsPerMinute setting
3. Use retry logic with exponential backoff, as shown in the sketch below
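A sketch combining those three mitigations with options already documented above (parallelConcurrency, rateLimit, retries); whether the retry delay actually backs off exponentially depends on the library, so treat that as an assumption:
const evaluator = new RAGAssessment({
  provider: new GeminiProvider(),
  parallelConcurrency: 2,   // fewer simultaneous requests
  rateLimit: {
    requestsPerMinute: 50,  // stay under the provider quota
    burstSize: 10,
  },
  retries: 3,               // re-attempt transient failures (backoff behavior assumed)
});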
Issue: Timeout on Large Datasets
// Increase timeout for complex evaluations
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
timeout: 60000, // 60 seconds
retries: 3,
});
Debug Mode
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
verbose: true, // Enable detailed logging
logger: console, // Use custom logger
});
Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a branch (git checkout -b feature/your-feature)
- Make changes and write tests
- Run tests (npm test)
- Submit a PR with a clear description
Development Setup
git clone https://github.com/ikrigel/ragas-lib-typescript.git
cd ragas-lib
npm install
npm run build
npm test
Areas Welcoming Contributions
- 🆕 New LLM provider adapters (Claude, Cohere, etc.)
- 📊 Additional evaluation metrics
- 📁 Dataset format loaders (Excel, Notion, etc.)
- 🐛 Bug fixes and performance improvements
- 📝 Documentation and examples
- 🧪 Test coverage improvements
See CONTRIBUTING.md for detailed guidelines.
Documentation
- Full API Documentation - Detailed reference for all classes and methods
- Architecture Guide - System design and extensibility
- Provider Integration Guide - How to implement custom providers
- Metric Development Guide - Creating custom evaluation metrics
- CLI Reference - Command-line tool documentation
- Troubleshooting - Common issues and solutions
Examples Repository
Full working examples are available at:
- ragas-lib-examples
- Express.js integration
- Next.js dashboard
- LangChain integration
- Docker deployment
Roadmap
v0.1.0 (Current)
- ✅ Core metrics (Faithfulness, Relevance, Coherence)
- ✅ Dataset management
- ✅ Batch evaluation
- ✅ JSON/CSV reports
- ✅ Gemini provider
v0.2.0 (Q1 2026)
- 🔄 Context Precision & Recall metrics
- 🔄 Perplexity & OpenAI providers
- 🔄 Full CLI interface
- 🔄 HTML report generation
v0.3.0 (Q2 2026)
- 🔄 Custom metric composition
- 🔄 Database adapters (PostgreSQL, SQLite)
- 🔄 Web dashboard
- 🔄 Webhook integrations
v1.0.0 (Q3 2026)
- 🔄 Stable API
- 🔄 Production-grade performance
- 🔄 Large-scale benchmark datasets
- 🔄 Enterprise support
License
This project is licensed under the MIT License - see LICENSE file for details.
Support & Community
- 💬 Discussions: GitHub Discussions
- 🐛 Issues: Report a bug
- 💡 Feature Requests: Suggest a feature
- 📧 Email: [email protected]
Acknowledgments
This library builds on:
- RAGAS (Python) - Pioneering RAG evaluation framework
- LevBoots Project - Real-world RAG implementation patterns
- LangChain & LlamaIndex - RAG ecosystem leadership
- Community feedback - Invaluable insights and use cases
Made with ❤️ by Polo and ikrigel for the RAG community. Thank you, Jona, for the challenge ❤️🙏
