@ikrigel/ragas-lib-typescript v0.1.1
RAG Assessment Library
A lightweight, extensible Node.js library for evaluating Retrieval-Augmented Generation (RAG) systems
Documentation · Quick Start · API Reference · Examples · Contributing
Overview
RAG Assessment is a production-ready evaluation framework for measuring and improving the quality of RAG (Retrieval-Augmented Generation) systems. It provides:
✅ Multiple Evaluation Metrics - Faithfulness, Relevance, Coherence, Context Precision, Context Recall
✅ Flexible Dataset Management - Import/export Q&A pairs from JSON, CSV, or APIs
✅ Batch Evaluation - Run evaluations on hundreds of test cases with progress tracking
✅ LLM Provider Agnostic - Works with Gemini, Perplexity, OpenAI, Anthropic, and more
✅ Rich Reporting - Generate JSON, CSV, and HTML reports with statistical analysis
✅ CLI Tools - Command-line interface for evaluation without coding
✅ Type Safe - Full TypeScript support with comprehensive interfaces
Unlike Python-based RAGAS, this library is built for JavaScript/Node.js ecosystems and integrates seamlessly with Express, Next.js, LangChain, and LlamaIndex.
Why RAG Assessment?
RAG systems combine retrieval and generation to answer questions based on domain knowledge. But how do you know if your RAG is good?
Without measurement, you can't:
- 🚫 Detect quality degradation after changes
- 🚫 Compare different retrieval strategies
- 🚫 Justify performance to stakeholders
- 🚫 Identify failing edge cases
- 🚫 Track improvements over time
RAG Assessment solves this by providing automated quality metrics you can run in CI/CD pipelines, dashboards, and development workflows.
Quick Comparison
| Feature | RAG Assessment | RAGAS (Python) |
|---------|----------------|----------------|
| Language | JavaScript/TypeScript | Python |
| Setup Time | <5 min | ~15 min |
| CLI Support | ✅ Yes | ✅ Yes |
| Custom Metrics | ✅ Easy | ✅ Complex |
| LLM Providers | 3+ built-in | 1 (OpenAI-focused) |
| Node.js Integration | ✅ Native | ⚠️ Via subprocess |
| License | MIT | Apache 2.0 |
Installation
Prerequisites
- Node.js 18+ (LTS recommended)
- npm 8+ or yarn
- API key for at least one LLM provider (Gemini, Perplexity, OpenAI, etc.)
Quick Install
npm install @ragas-lib/core
With Specific LLM Provider
# For Gemini
npm install @ragas-lib/core @ragas-lib/gemini
# For Perplexity
npm install @ragas-lib/core @ragas-lib/perplexity
# For OpenAI
npm install @ragas-lib/core @ragas-lib/openai
From Source (Development)
git clone https://github.com/ikrigel/ragas-lib-typescript.git
cd ragas-lib
npm install
npm run build
TypeScript Support
This library is built with TypeScript and provides full type definitions out of the box. No additional @types/ packages needed.
TypeScript Configuration
For the best experience, configure your tsconfig.json:
{
"compilerOptions": {
"target": "ES2020",
"module": "ESNext",
"lib": ["ES2020"],
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
}
}
Type Definitions
All exports include complete type definitions:
// TypeScript automatically provides types for all imports
import {
RAGAssessment, // ✅ Typed
DatasetManager, // ✅ Typed
BaseMetric, // ✅ Typed
EvaluationResults, // ✅ Typed interface
EvaluationConfig, // ✅ Typed interface
GroundTruthPair, // ✅ Typed interface
LLMProvider, // ✅ Typed interface
EvaluationResult, // ✅ Typed interface
} from '@ragas-lib/core';
IDE IntelliSense
Full JSDoc documentation on all types for IDE support:
const config: EvaluationConfig = {
provider: new GeminiProvider(),
// ✅ IDE autocomplete shows all available options
// ✅ Hover shows documentation and type hints
// ✅ Type validation catches errors before runtime
metrics: ['faithfulness', 'relevance'],
timeout: 30000,
retries: 3,
};
Strict Mode
The library is compatible with TypeScript's strict mode:
// Even in strict mode, full type safety
const evaluator = new RAGAssessment(config);
const results = await evaluator.evaluate({...}); // ✅ No type errors
Quick Start
1. Set Up API Credentials
Create a .env file in your project root:
# For Gemini (free tier available)
GEMINI_API_KEY=AIzaSy...
# Or Perplexity
PERPLEXITY_API_KEY=pplx-...
# Or OpenAI
OPENAI_API_KEY=sk-...
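The snippets below read the key from process.env. If your runtime does not load .env files automatically, load them at startup; a minimal sketch, assuming you use the dotenv package:
import 'dotenv/config'; // populates process.env from .env before the provider is constructed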
2. Create Your First Evaluation (TypeScript)
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
// Initialize the evaluator
const evaluator = new RAGAssessment({
provider: new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
}),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
// Define ground truth Q&A pairs
const dataset = [
{
question: 'What are lev-boots?',
expectedAnswer: 'Lev-boots are gravity-reversing footwear that allow users to levitate and hover.',
},
{
question: 'How do lev-boots work?',
expectedAnswer: 'They use localized gravity reversal technology to counteract gravitational force on the wearer.',
},
];
// Run evaluation
const results = await evaluator.evaluate({
dataset,
ragAnswers: [
'Lev-boots enable levitation through advanced physics.',
'Localized gravity reversal creates upward force.',
],
});
console.log(results);
// Output:
// {
// overall_score: 8.2,
// metrics: {
// faithfulness: 8.5,
// relevance: 8.1,
// coherence: 8.0
// },
// per_question: [
// { question: '...', scores: { ... }, explanation: '...' },
// ...
// ]
// }
3. Using the CLI
# Initialize configuration (interactive setup)
npx ragas config
# Evaluate a dataset
npx ragas evaluate --dataset questions.json --output results.json
# Generate reports
npx ragas report --input results.json --format html --output report.html
# Import dataset from CSV
npx ragas import --from questions.csv --format csv
4. Evaluate Your RAG System
import { RAGAssessment, DatasetManager } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
// Step 1: Load or create a dataset
const datasetManager = new DatasetManager();
await datasetManager.loadFromJSON('ground_truth.json');
// Step 2: Initialize evaluator
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
// Step 3: Get answers from your RAG system
const ragAnswers: string[] = [];
for (const pair of datasetManager.getAll()) {
const ragAnswer = await yourRagSystem.ask(pair.question);
ragAnswers.push(ragAnswer);
}
// Step 4: Run evaluation
const results = await evaluator.evaluate({
dataset: datasetManager.getAll(),
ragAnswers,
contexts: retrievedContextChunks, // Optional: for context-based metrics
});
// Step 5: Generate reports
const report = await evaluator.generateReport(results, {
format: 'html',
includeCharts: true,
outputPath: './evaluation_report.html',
});
console.log(`Report saved to ${report.path}`);
console.log(`Overall Score: ${results.overall_score}/10`);
API Reference
Core Classes
RAGAssessment
Main class for running evaluations.
const evaluator = new RAGAssessment(config);
// Methods
evaluator.evaluate(options) // Run evaluation
evaluator.generateReport(results) // Create report
evaluator.registerMetric(metric) // Add custom metric
Configuration:
interface RAGAssessmentConfig {
provider: LLMProvider; // LLM provider instance
metrics?: string[]; // Metric names to use
timeout?: number; // Timeout per question (ms)
retries?: number; // Max retries on failure
parallelConcurrency?: number; // Parallel evaluation count
verbose?: boolean; // Enable logging
}
DatasetManager
Manage ground truth Q&A datasets.
const manager = new DatasetManager();
manager.add(pair) // Add Q&A pair
manager.remove(id) // Delete pair
manager.update(id, pair) // Update pair
manager.getAll() // Get all pairs
manager.loadFromJSON(filePath) // Import from JSON
manager.saveToJSON(filePath) // Export to JSON
manager.loadFromCSV(filePath) // Import from CSV
manager.validate() // Validate dataset
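A minimal sketch of building a small dataset in code and persisting it, assuming add() accepts the same pair shape used in the Quick Start examples:
import { DatasetManager } from '@ragas-lib/core';
const manager = new DatasetManager();
manager.add({
  id: 'q1',
  question: 'What are lev-boots?',
  expectedAnswer: 'Gravity-reversing footwear that lets the wearer hover.',
});
manager.validate(); // check the dataset before running an evaluation
await manager.saveToJSON('ground_truth.json'); // reuse later from the CLI or CI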
Metrics
Built-in Metrics
// Faithfulness (0-10)
// Measures: How well does the answer align with retrieved context?
// Higher = More faithful to sources
// Relevance (0-10)
// Measures: How well does the answer address the question?
// Higher = More relevant and on-topic
// Coherence (0-10)
// Measures: Is the answer clear, well-structured, and grammatically correct?
// Higher = More coherent
// ContextPrecision (0-1)
// Measures: What % of context chunks are relevant to the answer?
// Higher = Fewer irrelevant chunks retrieved
// ContextRecall (0-1)
// Measures: Did retrieval find enough context to fully answer the question?
// Higher = Complete context retrieved
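The two context-based metrics only apply when retrieved chunks are passed in alongside the answers. A minimal sketch that reuses the dataset, ragAnswers, and retrievedContextChunks variables from the Quick Start; the string identifiers 'context_precision' and 'context_recall' are assumptions (they are not listed above, and per the roadmap these metrics may only ship in v0.2.0):
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
const evaluator = new RAGAssessment({
  provider: new GeminiProvider(),
  // assumed identifiers; check the API reference for the exact names
  metrics: ['faithfulness', 'context_precision', 'context_recall'],
});
const results = await evaluator.evaluate({
  dataset,                          // ground-truth pairs, as in the Quick Start
  ragAnswers,                       // one RAG answer string per pair
  contexts: retrievedContextChunks, // one array of retrieved chunks per question
});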
Custom Metrics
Create your own evaluation metrics:
import { BaseMetric } from '@ragas-lib/core';
class CustomMetric extends BaseMetric {
name = 'my_metric';
description = 'My custom RAG metric';
async compute(input: {
question: string;
answer: string;
context: string;
expectedAnswer?: string;
}): Promise<{ score: number; explanation: string }> {
// Your evaluation logic here
const score = 0; // replace with your scoring logic (0-10)
return {
score,
explanation: 'Why this score?',
};
}
}
// Register and use
evaluator.registerMetric(new CustomMetric());
LLM Providers
Gemini Provider
import { GeminiProvider } from '@ragas-lib/gemini';
const provider = new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
model: 'gemini-2.0-flash', // Optional
temperature: 0.7, // Optional
maxTokens: 1024, // Optional
});
Perplexity Provider
import { PerplexityProvider } from '@ragas-lib/perplexity';
const provider = new PerplexityProvider({
apiKey: process.env.PERPLEXITY_API_KEY,
model: 'sonar',
temperature: 0.5,
});
OpenAI Provider
import { OpenAIProvider } from '@ragas-lib/openai';
const provider = new OpenAIProvider({
apiKey: process.env.OPENAI_API_KEY,
model: 'gpt-4-turbo',
temperature: 0.3,
});
Mock Provider (for Testing)
import { MockProvider } from '@ragas-lib/core';
const provider = new MockProvider();
// Returns deterministic scores for testing
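Because MockProvider needs no API key or network access, it is convenient for unit-testing your evaluation wiring. A minimal sketch using Node's built-in test runner (any test framework works; the assertion is illustrative):
import test from 'node:test';
import assert from 'node:assert/strict';
import { RAGAssessment, MockProvider } from '@ragas-lib/core';
test('evaluation pipeline returns a score in range', async () => {
  const evaluator = new RAGAssessment({
    provider: new MockProvider(), // deterministic scores, no external calls
    metrics: ['faithfulness', 'relevance'],
  });
  const results = await evaluator.evaluate({
    dataset: [{ question: 'What are lev-boots?', expectedAnswer: 'Gravity-reversing footwear.' }],
    ragAnswers: ['Lev-boots are boots that reverse gravity so the wearer can hover.'],
  });
  assert.ok(results.overall_score >= 0 && results.overall_score <= 10);
});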
Results Format
interface EvaluationResults {
overall_score: number; // Weighted average (0-10)
metadata: {
timestamp: string;
provider: string;
model: string;
metrics_used: string[];
total_questions: number;
evaluation_duration_ms: number;
};
metrics: {
[metricName: string]: number; // Average score per metric
};
per_question: Array<{
question: string;
answer: string;
expected_answer?: string;
scores: { [metricName: string]: number };
explanations: { [metricName: string]: string };
overall_score: number;
}>;
statistics: {
mean: number;
median: number;
std_dev: number;
min: number;
max: number;
passed_threshold_percentage: number; // % scoring >7
};
}
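A short sketch of consuming this structure, for example to surface the weakest questions after a run; it uses only fields defined in the interface above:
import type { EvaluationResults } from '@ragas-lib/core';
function reportWeakest(results: EvaluationResults, threshold = 7): void {
  console.log(`Overall: ${results.overall_score}/10 (std dev ${results.statistics.std_dev})`);
  // Questions that scored below the threshold, worst first
  const weakest = results.per_question
    .filter(q => q.overall_score < threshold)
    .sort((a, b) => a.overall_score - b.overall_score);
  for (const q of weakest) {
    console.log(`[${q.overall_score}] ${q.question}`);
    for (const [metric, score] of Object.entries(q.scores)) {
      console.log(`  ${metric}: ${score} (${q.explanations[metric]})`);
    }
  }
}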
Custom Metric Types
When creating custom metrics, use the provided interfaces for type safety:
import { BaseMetric, MetricInput, MetricOutput } from '@ragas-lib/core';
class MyCustomMetric extends BaseMetric {
name = 'my_metric';
description = 'My custom metric';
async compute(input: MetricInput): Promise<MetricOutput> {
const { question, answer, context, expectedAnswer } = input;
// ✅ All properties are typed and required/optional as needed
const score = 0; // replace with your scoring logic (0-10)
return {
score,
explanation: 'Why this score?',
metadata: { /* optional metadata */ },
};
}
}
Examples
Example 1: Evaluate a Simple RAG System
// rag-evaluation.ts
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
async function evaluateRAG() {
const evaluator = new RAGAssessment({
provider: new GeminiProvider({
apiKey: process.env.GEMINI_API_KEY,
}),
metrics: ['faithfulness', 'relevance', 'coherence'],
});
const testCases = [
{
question: 'What is machine learning?',
expectedAnswer: 'Machine learning is a subset of AI where systems learn from data patterns.',
ragAnswer: 'ML enables computers to learn from data without explicit programming.',
},
{
question: 'Name three ML algorithms',
expectedAnswer: 'Decision Trees, Random Forest, Neural Networks',
ragAnswer: 'Common ML algorithms include Decision Trees, SVM, and K-means clustering.',
},
];
const results = await evaluator.evaluate({
dataset: testCases.map(t => ({
question: t.question,
expectedAnswer: t.expectedAnswer,
})),
ragAnswers: testCases.map(t => t.ragAnswer),
});
console.log(`Overall Score: ${results.overall_score}/10`);
console.log(`\nDetailed Results:`, JSON.stringify(results, null, 2));
}
evaluateRAG().catch(console.error);
Example 2: CI/CD Integration (GitHub Actions)
# .github/workflows/rag-evaluation.yml
name: RAG Quality Check
on: [push, pull_request]
jobs:
evaluate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '18'
- run: npm install
- name: Run RAG Evaluation
env:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
run: npx ragas evaluate --dataset ground_truth.json --threshold 7.5 --output results.json
- name: Generate Report
if: always()
run: npx ragas report --input results.json --format html --output evaluation_report.html
- name: Upload Report
uses: actions/upload-artifact@v3
if: always()
with:
name: evaluation-report
path: evaluation_report.html
- name: Fail if Score Too Low
run: |
SCORE=$(jq '.overall_score' results.json)
if (( $(echo "$SCORE < 7.5" | bc -l) )); then
echo "RAG score ($SCORE) below threshold (7.5)"
exit 1
fi
Example 3: Express.js Integration
// server.ts
import express from 'express';
import { RAGAssessment } from '@ragas-lib/core';
import { GeminiProvider } from '@ragas-lib/gemini';
const app = express();
app.use(express.json()); // parse JSON request bodies
const evaluator = new RAGAssessment({
provider: new GeminiProvider({ apiKey: process.env.GEMINI_API_KEY }),
metrics: ['faithfulness', 'relevance'],
});
app.post('/api/evaluate', async (req, res) => {
const { dataset, ragAnswers } = req.body;
try {
const results = await evaluator.evaluate({
dataset,
ragAnswers,
});
res.json(results);
} catch (error) {
res.status(500).json({ error: (error as Error).message });
}
});
app.listen(3000, () => console.log('Server running on port 3000'));
Example 4: Batch Evaluation with Progress Tracking
import { RAGAssessment, DatasetManager } from '@ragas-lib/core';
import { OpenAIProvider } from '@ragas-lib/openai';
import fs from 'fs';
async function batchEvaluation() {
const datasetManager = new DatasetManager();
await datasetManager.loadFromJSON('1000_qa_pairs.json');
const evaluator = new RAGAssessment({
provider: new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY }),
parallelConcurrency: 5, // Evaluate 5 at a time
});
// Collect one answer per Q&A pair from your RAG system before evaluating
const ragAnswersBatch: string[] = [];
for (const pair of datasetManager.getAll()) {
ragAnswersBatch.push(await yourRagSystem.ask(pair.question));
}
const results = await evaluator.evaluate({
dataset: datasetManager.getAll(),
ragAnswers: ragAnswersBatch,
onProgress: (current, total) => {
console.log(`Progress: ${current}/${total} (${Math.round(current/total*100)}%)`);
},
});
// Save results
fs.writeFileSync('results.json', JSON.stringify(results, null, 2));
console.log(`\nEvaluation complete! Results saved to results.json`);
}
batchEvaluation().catch(console.error);
Dataset Format
JSON Format
[
{
"id": "q1",
"question": "What are lev-boots?",
"expected_answer": "Gravity-reversing footwear...",
"metadata": {
"source": "research_paper",
"difficulty": "easy",
"tags": ["product", "physics"]
}
},
{
"id": "q2",
"question": "How do they work?",
"expected_answer": "Using localized gravity reversal...",
"metadata": {
"source": "white_paper",
"difficulty": "hard"
}
}
]
CSV Format
question,expected_answer,source,difficulty
"What are lev-boots?","Gravity-reversing footwear...",research_paper,easy
"How do they work?","Using localized gravity reversal...",white_paper,hardImporting from Different Sources
import { DatasetManager } from '@ragas-lib/core';
const manager = new DatasetManager();
// From JSON
await manager.loadFromJSON('questions.json');
// From CSV
await manager.loadFromCSV('questions.csv');
// From API
await manager.loadFromAPI('https://api.example.com/qa-pairs', {
auth: 'Bearer token',
});
// From Database
await manager.loadFromDatabase(connection, {
query: 'SELECT question, expected_answer FROM qa_pairs',
});
Performance & Optimization
Parallel Evaluation
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
parallelConcurrency: 10, // Process 10 questions simultaneously
timeout: 30000, // 30 second timeout per question
});
Rate Limiting & Throttling
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
rateLimit: {
requestsPerMinute: 100,
burstSize: 20,
},
});
Cost Estimation
import { CostEstimator } from '@ragas-lib/core';
const estimator = new CostEstimator();
const cost = estimator.estimate({
provider: 'gemini',
numQuestions: 1000,
metricsPerQuestion: 3,
});
console.log(`Estimated cost: $${cost.totalCost}`);
console.log(`Input tokens: ${cost.inputTokens}`);
console.log(`Output tokens: ${cost.outputTokens}`);
Troubleshooting
Issue: API Key Not Found
Error: GEMINI_API_KEY not found in environment variables
Solution:
1. Create .env file with your API key
2. Or set environment variable: export GEMINI_API_KEY=your_key_here
3. Verify with: echo $GEMINI_API_KEY
Issue: Rate Limit Exceeded
Error: Rate limit exceeded. Max 60 requests per minute.
Solution:
1. Reduce parallelConcurrency (default: 5)
2. Enable rate limiting with requestsPerMinute setting
3. Use retry logic with exponential backoff, as shown in the sketch below
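A sketch combining those three mitigations with options already documented above (parallelConcurrency, rateLimit, retries); whether the retry delay actually backs off exponentially depends on the library, so treat that as an assumption:
const evaluator = new RAGAssessment({
  provider: new GeminiProvider(),
  parallelConcurrency: 2,   // fewer simultaneous requests
  rateLimit: {
    requestsPerMinute: 50,  // stay under the provider quota
    burstSize: 10,
  },
  retries: 3,               // re-attempt transient failures (backoff behavior assumed)
});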
Issue: Timeout on Large Datasets
// Increase timeout for complex evaluations
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
timeout: 60000, // 60 seconds
retries: 3,
});
Debug Mode
const evaluator = new RAGAssessment({
provider: new GeminiProvider(),
verbose: true, // Enable detailed logging
logger: console, // Use custom logger
});
Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a branch (git checkout -b feature/your-feature)
- Make changes and write tests
- Run tests (npm test)
- Submit a PR with a clear description
Development Setup
git clone https://github.com/ikrigel/ragas-lib-typescript.git
cd ragas-lib
npm install
npm run build
npm test
Areas Welcoming Contributions
- 🆕 New LLM provider adapters (Claude, Cohere, etc.)
- 📊 Additional evaluation metrics
- 📁 Dataset format loaders (Excel, Notion, etc.)
- 🐛 Bug fixes and performance improvements
- 📝 Documentation and examples
- 🧪 Test coverage improvements
See CONTRIBUTING.md for detailed guidelines.
Documentation
- Full API Documentation - Detailed reference for all classes and methods
- Architecture Guide - System design and extensibility
- Provider Integration Guide - How to implement custom providers
- Metric Development Guide - Creating custom evaluation metrics
- CLI Reference - Command-line tool documentation
- Troubleshooting - Common issues and solutions
Examples Repository
Full working examples are available at:
- ragas-lib-examples
- Express.js integration
- Next.js dashboard
- LangChain integration
- Docker deployment
Roadmap
v0.1.0 (Current)
- ✅ Core metrics (Faithfulness, Relevance, Coherence)
- ✅ Dataset management
- ✅ Batch evaluation
- ✅ JSON/CSV reports
- ✅ Gemini provider
v0.2.0 (Q1 2026)
- 🔄 Context Precision & Recall metrics
- 🔄 Perplexity & OpenAI providers
- 🔄 Full CLI interface
- 🔄 HTML report generation
v0.3.0 (Q2 2026)
- 🔄 Custom metric composition
- 🔄 Database adapters (PostgreSQL, SQLite)
- 🔄 Web dashboard
- 🔄 Webhook integrations
v1.0.0 (Q3 2026)
- 🔄 Stable API
- 🔄 Production-grade performance
- 🔄 Large-scale benchmark datasets
- 🔄 Enterprise support
License
This project is licensed under the MIT License - see LICENSE file for details.
Support & Community
- 💬 Discussions: GitHub Discussions
- 🐛 Issues: Report a bug
- 💡 Feature Requests: Suggest a feature
- 📧 Email: [email protected]
Acknowledgments
This library builds on:
- RAGAS (Python) - Pioneering RAG evaluation framework
- LevBoots Project - Real-world RAG implementation patterns
- LangChain & LlamaIndex - RAG ecosystem leadership
- Community feedback - Invaluable insights and use cases
Made with ❤️ by Polo and ikrigel for the RAG community. Thank you, Jona, for the challenge ❤️🙏
