@vezlo/ai-validator

v1.2.0

Published

4 months ago

AI Response Validator - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses

AI Validator

AI Response Validator - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses.

🎯 Purpose

AI Validator helps you ensure the quality and reliability of AI-generated responses by:

✅ LLM-as-Judge Context Validation - Semantic accuracy checking using OpenAI/Claude
✅ Developer Mode - Strict code grounding validation for technical queries
✅ Automated Accuracy Checking - Verify AI responses against source documents
✅ Hallucination Prevention - Detect when AI invents information not in sources
✅ Confidence Scoring - Get reliability scores for every response
✅ Query Classification - Skip validation for greetings, typos, and small talk
✅ Multi-LLM Support - Works with OpenAI and Claude

Perfect for RAG systems, knowledge bases, codebase Q&A, and any application where AI response quality matters.

🚀 Quick Start

Installation

npm install @vezlo/ai-validator

Or install globally for CLI access:

npm install -g @vezlo/ai-validator

For Local Development/Testing

# Clone the repository
git clone https://github.com/vezlo/ai-validator.git
cd ai-validator

# Install dependencies
npm install

# Build the project
npm run build

# Run the test CLI
npm test

💻 Usage

1. CLI Testing (Interactive)

Test the validator interactively without writing code:

# Using npx (no installation required)
npx vezlo-validator-test

# Or if installed globally
vezlo-validator-test

The CLI will guide you through:

Selecting LLM provider (OpenAI or Claude)
Entering API keys
Choosing models (any OpenAI or Claude model)
Configuring validation settings
Testing with your own queries and responses
Easy text input for sources (no JSON required)

2. Code Usage (Programmatic)

Basic Example

import { AIValidator } from '@vezlo/ai-validator';

// Initialize with your API key and provider
const validator = new AIValidator({
  openaiApiKey: 'sk-your-openai-key',  // Your OpenAI API key
  llmProvider: 'openai'                 // 'openai' or 'claude'
});

// Validate a response
const validation = await validator.validate({
  query: "What is machine learning?",
  response: "Machine learning is a subset of AI that focuses on algorithms.",
  sources: [
    {
      content: "Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models.",
      title: "ML Guide",
      url: "https://example.com/ml-guide"
    }
  ]
});

// Check results
console.log(`Confidence: ${(validation.confidence * 100).toFixed(1)}%`);
console.log(`Valid: ${validation.valid}`);
console.log(`Accuracy: ${validation.accuracy.verified ? 'Verified' : 'Not verified'}`);
console.log(`Hallucination Risk: ${(validation.hallucination.risk * 100).toFixed(1)}%`);
console.log(`Warnings: ${validation.warnings.join(', ')}`);

Advanced Configuration

import { AIValidator } from '@vezlo/ai-validator';

const validator = new AIValidator({
  // API Keys (at least one required)
  openaiApiKey: 'sk-your-openai-key',
  claudeApiKey: 'sk-ant-your-claude-key',
  
  // LLM Provider (required)
  llmProvider: 'openai', // 'openai' or 'claude'
  
  // Model Selection (optional)
  openaiModel: 'gpt-4o-mini',  // Default for LLM Judge
  claudeModel: 'claude-3-haiku-20240307',  // Default for LLM Judge
  
  // Validation Settings (optional)
  confidenceThreshold: 0.7,              // 0.0 - 1.0 (default: 0.7)
  enableQueryClassification: true,       // Skip validation for greetings/typos
  enableContextValidation: true,         // Context relevance validation (default: true)
  useLLMJudge: true,                    // Use LLM-as-Judge for context (default: false)
  developerMode: false,                  // Strict code grounding mode (default: false)
  enableAccuracyCheck: false,           // LLM-based accuracy checking (default: false)
  enableHallucinationDetection: false   // LLM-based hallucination detection (default: false)
});

Integration with RAG Systems

// Example with a RAG system
const ragResponse = await yourRAGSystem.query(userQuestion);
const sources = await yourRAGSystem.getSources(userQuestion);

const validation = await validator.validate({
  query: userQuestion,
  response: ragResponse.content,
  sources: sources.map(s => ({
    content: s.text,
    title: s.title,
    url: s.url
  }))
});

if (validation.valid) {
  // Show response to user
  return ragResponse.content;
} else {
  // Handle low confidence response
  console.warn('Low confidence response:', validation.warnings);
  return "I'm not confident about this answer. Please consult additional sources.";
}

📊 Validation Results

interface ValidationResult {
  confidence: number;        // 0.0 - 1.0
  valid: boolean;            // true if confidence >= threshold
  accuracy: {
    verified: boolean;
    verification_rate: number;
    reason?: string;
  };
  context: {
    source_relevance: number;
    source_usage_rate: number;
    valid: boolean;
  };
  hallucination: {
    detected: boolean;
    risk: number;
    hallucinated_parts?: string[];
  };
  warnings: string[];
  query_type?: string;       // 'greeting', 'question', etc.
  skip_validation?: boolean; // true for greetings/typos
}

🔧 Configuration

Configuration Options

All configuration is done in code when initializing the validator:

interface AIValidatorConfig {
  // API Keys (at least one required)
  openaiApiKey?: string;      // Your OpenAI API key
  claudeApiKey?: string;       // Your Claude API key
  
  // Provider (required)
  llmProvider: 'openai' | 'claude';
  
  // Models (optional - specify any valid model from the chosen provider)
  openaiModel?: string;        // Default: 'gpt-4o'
  claudeModel?: string;        // Default: 'claude-sonnet-4-5-20250929'
  
  // Validation Settings (optional)
  confidenceThreshold?: number;           // Default: 0.7
  enableQueryClassification?: boolean;    // Default: true
  enableAccuracyCheck?: boolean;         // Default: true
  enableHallucinationDetection?: boolean; // Default: true
}

Model Support

OpenAI Models: You can use any OpenAI chat model by specifying it in openaiModel. Common choices include:

gpt-4o (default, recommended)
gpt-4o-mini (faster, cheaper)
gpt-4 (previous flagship)
gpt-4-turbo
Or any other OpenAI chat completion model

Claude Models: You can use any Claude model by specifying it in claudeModel. Common choices include:

claude-sonnet-4-5-20250929 (default, Claude 4.5 Sonnet)
claude-opus-4-1-20250805 (Claude 4.1 Opus)
claude-3-7-sonnet-20250219 (Claude 3.7 Sonnet)
Or any other Claude model identifier

The validator will work with any model supported by the respective provider's API.

CLI Commands

# Interactive testing CLI
npx vezlo-validator-test

# Development commands
npm run build   # Build the project
npm run clean   # Clean build files
npm test        # Run the test CLI

🎯 Use Cases

1. RAG Systems

Validate responses against retrieved documents to ensure accuracy.

2. Customer Support Bots

Prevent incorrect information from reaching customers.

3. Knowledge Base Applications

Ensure AI answers are grounded in your documentation.

4. Content Generation

Validate AI-generated content against source materials.

5. Educational Applications

Ensure AI tutoring responses are accurate and helpful.

⚡ Performance

Validation Time: 2-5 seconds per response (depending on LLM provider)
Cost: Additional LLM API calls for validation
Accuracy: High accuracy for responses with good sources
Reliability: Graceful handling of edge cases

🔍 How It Works

Query Classification - Identifies greetings, typos, and small talk (skips validation)
Accuracy Checking - Uses LLM to verify facts against source documents
Hallucination Detection - Identifies information not present in sources
Context Validation - Ensures response relevance to the query
Confidence Scoring - Combines all metrics into a single score

📝 Examples

High Confidence Response

{
  confidence: 0.92,
  valid: true,
  accuracy: { verified: true, verification_rate: 0.95 },
  hallucination: { detected: false, risk: 0.05 },
  warnings: []
}

Low Confidence Response

{
  confidence: 0.35,
  valid: false,
  accuracy: { verified: false, verification_rate: 0.2 },
  hallucination: { detected: true, risk: 0.8 },
  warnings: ["No sources provided - high hallucination risk"]
}

Skipped Validation (Greeting)

{
  confidence: 1.0,
  valid: true,
  query_type: "greeting",
  skip_validation: true,
  warnings: []
}

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is dual-licensed:

Non-Commercial Use: Free under AGPL-3.0 license
Commercial Use: Requires a commercial license - contact us for details

See the LICENSE file for complete AGPL-3.0 license terms.

🆘 Support

Issues: GitHub Issues
Documentation: GitHub Wiki
Discussions: GitHub Discussions

🔗 Related Projects

@vezlo/assistant-server - AI Assistant Server with RAG capabilities
@vezlo/src-to-kb - Convert source code to knowledge base

Status: ✅ Production Ready | Version: 1.2.0 | License: AGPL-3.0 | Node.js: 20+

Made with ❤️ by Vezlo