@vezlo/ai-validator
v1.2.0
AI Validator
AI Response Validator - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses.
🎯 Purpose
AI Validator helps you ensure the quality and reliability of AI-generated responses by:
- ✅ LLM-as-Judge Context Validation - Semantic accuracy checking using OpenAI/Claude
- ✅ Developer Mode - Strict code grounding validation for technical queries
- ✅ Automated Accuracy Checking - Verify AI responses against source documents
- ✅ Hallucination Prevention - Detect when AI invents information not in sources
- ✅ Confidence Scoring - Get reliability scores for every response
- ✅ Query Classification - Skip validation for greetings, typos, and small talk
- ✅ Multi-LLM Support - Works with OpenAI and Claude
Perfect for RAG systems, knowledge bases, codebase Q&A, and any application where AI response quality matters.
🚀 Quick Start
Installation
npm install @vezlo/ai-validator
Or install globally for CLI access:
npm install -g @vezlo/ai-validator
For Local Development/Testing
# Clone the repository
git clone https://github.com/vezlo/ai-validator.git
cd ai-validator
# Install dependencies
npm install
# Build the project
npm run build
# Run the test CLI
npm test
💻 Usage
1. CLI Testing (Interactive)
Test the validator interactively without writing code:
# Using npx (no installation required)
npx vezlo-validator-test
# Or if installed globally
vezlo-validator-test
The CLI will guide you through:
- Selecting LLM provider (OpenAI or Claude)
- Entering API keys
- Choosing models (any OpenAI or Claude model)
- Configuring validation settings
- Testing with your own queries and responses
- Easy text input for sources (no JSON required)
2. Code Usage (Programmatic)
Basic Example
import { AIValidator } from '@vezlo/ai-validator';

// Initialize with your API key and provider
const validator = new AIValidator({
  openaiApiKey: 'sk-your-openai-key', // Your OpenAI API key
  llmProvider: 'openai'               // 'openai' or 'claude'
});

// Validate a response
const validation = await validator.validate({
  query: "What is machine learning?",
  response: "Machine learning is a subset of AI that focuses on algorithms.",
  sources: [
    {
      content: "Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models.",
      title: "ML Guide",
      url: "https://example.com/ml-guide"
    }
  ]
});

// Check results
console.log(`Confidence: ${(validation.confidence * 100).toFixed(1)}%`);
console.log(`Valid: ${validation.valid}`);
console.log(`Accuracy: ${validation.accuracy.verified ? 'Verified' : 'Not verified'}`);
console.log(`Hallucination Risk: ${(validation.hallucination.risk * 100).toFixed(1)}%`);
console.log(`Warnings: ${validation.warnings.join(', ')}`);
Advanced Configuration
import { AIValidator } from '@vezlo/ai-validator';

const validator = new AIValidator({
  // API Keys (at least one required)
  openaiApiKey: 'sk-your-openai-key',
  claudeApiKey: 'sk-ant-your-claude-key',

  // LLM Provider (required)
  llmProvider: 'openai', // 'openai' or 'claude'

  // Model Selection (optional)
  openaiModel: 'gpt-4o-mini',             // Default for LLM Judge
  claudeModel: 'claude-3-haiku-20240307', // Default for LLM Judge

  // Validation Settings (optional)
  confidenceThreshold: 0.7,           // 0.0 - 1.0 (default: 0.7)
  enableQueryClassification: true,    // Skip validation for greetings/typos
  enableContextValidation: true,      // Context relevance validation (default: true)
  useLLMJudge: true,                  // Use LLM-as-Judge for context (default: false)
  developerMode: false,               // Strict code grounding mode (default: false)
  enableAccuracyCheck: false,         // LLM-based accuracy checking (default: false)
  enableHallucinationDetection: false // LLM-based hallucination detection (default: false)
});
Integration with RAG Systems
// Example with a RAG system
// (answerWithValidation is an illustrative wrapper name; yourRAGSystem and
// validator are assumed to be in scope)
async function answerWithValidation(userQuestion: string): Promise<string> {
  const ragResponse = await yourRAGSystem.query(userQuestion);
  const sources = await yourRAGSystem.getSources(userQuestion);

  const validation = await validator.validate({
    query: userQuestion,
    response: ragResponse.content,
    sources: sources.map(s => ({
      content: s.text,
      title: s.title,
      url: s.url
    }))
  });

  if (validation.valid) {
    // Show response to user
    return ragResponse.content;
  } else {
    // Handle low confidence response
    console.warn('Low confidence response:', validation.warnings);
    return "I'm not confident about this answer. Please consult additional sources.";
  }
}
📊 Validation Results
interface ValidationResult {
  confidence: number; // 0.0 - 1.0
  valid: boolean;     // true if confidence >= threshold
  accuracy: {
    verified: boolean;
    verification_rate: number;
    reason?: string;
  };
  context: {
    source_relevance: number;
    source_usage_rate: number;
    valid: boolean;
  };
  hallucination: {
    detected: boolean;
    risk: number;
    hallucinated_parts?: string[];
  };
  warnings: string[];
  query_type?: string;       // 'greeting', 'question', etc.
  skip_validation?: boolean; // true for greetings/typos
}
🔧 Configuration
Configuration Options
All configuration is done in code when initializing the validator:
interface AIValidatorConfig {
  // API Keys (at least one required)
  openaiApiKey?: string; // Your OpenAI API key
  claudeApiKey?: string; // Your Claude API key

  // Provider (required)
  llmProvider: 'openai' | 'claude';

  // Models (optional - specify any valid model from the chosen provider)
  openaiModel?: string; // Default: 'gpt-4o'
  claudeModel?: string; // Default: 'claude-sonnet-4-5-20250929'

  // Validation Settings (optional)
  confidenceThreshold?: number;           // Default: 0.7
  enableQueryClassification?: boolean;    // Default: true
  enableContextValidation?: boolean;      // Default: true
  useLLMJudge?: boolean;                  // Default: false
  developerMode?: boolean;                // Default: false
  enableAccuracyCheck?: boolean;          // Default: true
  enableHallucinationDetection?: boolean; // Default: true
}
Model Support
OpenAI Models:
You can use any OpenAI chat model by specifying it in openaiModel. Common choices include:
- gpt-4o (default, recommended)
- gpt-4o-mini (faster, cheaper)
- gpt-4 (previous flagship)
- gpt-4-turbo
- Or any other OpenAI chat completion model
Claude Models:
You can use any Claude model by specifying it in claudeModel. Common choices include:
- claude-sonnet-4-5-20250929 (default, Claude 4.5 Sonnet)
- claude-opus-4-1-20250805 (Claude 4.1 Opus)
- claude-3-7-sonnet-20250219 (Claude 3.7 Sonnet)
- Or any other Claude model identifier
The validator will work with any model supported by the respective provider's API.
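The configuration options above can also be assembled from environment variables instead of hard-coded keys. The following is a minimal sketch, not part of the package: the helper name buildValidatorConfig, the local ValidatorConfigSketch type, and the variable names OPENAI_API_KEY and ANTHROPIC_API_KEY are all assumptions made for illustration. It mirrors only a few of the documented options and fails fast when the key for the selected provider is missing.

```typescript
// Local mirror of a few documented options; the real type ships with the package.
type Provider = 'openai' | 'claude';

interface ValidatorConfigSketch {
  openaiApiKey?: string;
  claudeApiKey?: string;
  llmProvider: Provider;
  confidenceThreshold?: number;
}

// Hypothetical helper: builds a config from an env-like object and throws
// early if the key for the selected provider is absent.
function buildValidatorConfig(
  env: Record<string, string | undefined>,
  llmProvider: Provider,
  confidenceThreshold = 0.7
): ValidatorConfigSketch {
  if (confidenceThreshold < 0 || confidenceThreshold > 1) {
    throw new RangeError('confidenceThreshold must be between 0.0 and 1.0');
  }

  const openaiApiKey = env['OPENAI_API_KEY'];    // assumed variable name
  const claudeApiKey = env['ANTHROPIC_API_KEY']; // assumed variable name

  const selectedKey = llmProvider === 'openai' ? openaiApiKey : claudeApiKey;
  if (!selectedKey) {
    throw new Error(`Missing API key for provider "${llmProvider}"`);
  }

  // Only set the keys that are actually present.
  const config: ValidatorConfigSketch = { llmProvider, confidenceThreshold };
  if (openaiApiKey) config.openaiApiKey = openaiApiKey;
  if (claudeApiKey) config.claudeApiKey = claudeApiKey;
  return config;
}
```

Under these assumptions, the result of buildValidatorConfig(process.env, 'openai') could be passed to the AIValidator constructor in place of an inline object.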
CLI Commands
# Interactive testing CLI
npx vezlo-validator-test
# Development commands
npm run build # Build the project
npm run clean # Clean build files
npm test # Run the test CLI
🎯 Use Cases
1. RAG Systems
Validate responses against retrieved documents to ensure accuracy.
2. Customer Support Bots
Prevent incorrect information from reaching customers.
3. Knowledge Base Applications
Ensure AI answers are grounded in your documentation.
4. Content Generation
Validate AI-generated content against source materials.
5. Educational Applications
Ensure AI tutoring responses are accurate and helpful.
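For use cases such as support bots, where an incorrect answer should never reach the user, a validation result can be mapped to an explicit action. The sketch below is one possible policy, not something the package provides: decideAction, the Action type, and the maxRisk cutoff of 0.5 are hypothetical, and ValidationSummary is a local subset of the documented ValidationResult fields.

```typescript
// Subset of the documented ValidationResult fields used by this sketch.
interface ValidationSummary {
  confidence: number;
  valid: boolean;
  // Optional in this sketch so that skipped results without it still type-check.
  hallucination?: { detected: boolean; risk: number };
  warnings: string[];
  skip_validation?: boolean;
}

type Action = 'show' | 'show_with_caveat' | 'block';

// Hypothetical policy; the maxRisk default is illustrative, not a package default.
function decideAction(result: ValidationSummary, maxRisk = 0.5): Action {
  // Greetings and small talk were never validated, so pass them through.
  if (result.skip_validation) return 'show';

  // Block anything flagged as hallucinated or above the risk cutoff.
  const h = result.hallucination;
  if (h && (h.detected || h.risk > maxRisk)) return 'block';

  // Block responses that fell below the configured confidence threshold.
  if (!result.valid) return 'block';

  // Valid responses with warnings are shown with a caveat.
  return result.warnings.length > 0 ? 'show_with_caveat' : 'show';
}
```

Keeping the policy in a single function makes it easy to tighten for customer-facing bots or relax for internal tools without touching the validation call itself.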
⚡ Performance
- Validation Time: 2-5 seconds per response (depending on LLM provider)
- Cost: Additional LLM API calls for validation
- Accuracy: High accuracy for responses with good sources
- Reliability: Graceful handling of edge cases
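Because each validation adds latency and extra LLM API calls, repeated validation of identical inputs can be avoided with a small cache. This is a generic sketch and not a feature of the package: cachedValidator is a hypothetical wrapper around any async validate function, and it keys entries on the JSON-serialized input, which assumes inputs are plain serializable objects with a stable property order.

```typescript
// Hypothetical wrapper: identical inputs are validated once and the pending
// or settled promise is reused for later calls.
function cachedValidator<I, R>(
  validate: (input: I) => Promise<R>
): (input: I) => Promise<R> {
  const cache = new Map<string, Promise<R>>();

  return (input: I): Promise<R> => {
    const key = JSON.stringify(input);
    let pending = cache.get(key);
    if (!pending) {
      pending = validate(input);
      cache.set(key, pending);
      // Drop failed validations so a later call can retry.
      pending.catch(() => cache.delete(key));
    }
    return pending;
  };
}
```

The cache in this sketch is unbounded, so a long-running service would likely want a size limit or expiry on top of it.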
🔍 How It Works
- Query Classification - Identifies greetings, typos, and small talk (skips validation)
- Accuracy Checking - Uses LLM to verify facts against source documents
- Hallucination Detection - Identifies information not present in sources
- Context Validation - Ensures response relevance to the query
- Confidence Scoring - Combines all metrics into a single score
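The final step above combines the individual metrics into one score and compares it to the threshold. The package's actual formula is not documented here, so the following is only an illustration of how such a combination could work: the weights, the Metrics type, and the combineConfidence function are all hypothetical.

```typescript
// Inputs to the hypothetical combination, each in the range 0.0 - 1.0.
interface Metrics {
  verificationRate: number;  // share of claims verified against sources
  hallucinationRisk: number; // higher means more likely invented content
  sourceRelevance: number;   // how relevant the sources are to the query
}

// Hypothetical weights that sum to 1; the package's real formula may differ.
const WEIGHTS = { accuracy: 0.4, hallucination: 0.4, context: 0.2 };

function combineConfidence(
  m: Metrics,
  threshold = 0.7
): { confidence: number; valid: boolean } {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));

  // Weighted average; hallucination risk is inverted so that low risk
  // raises the score.
  const confidence =
    WEIGHTS.accuracy * clamp(m.verificationRate) +
    WEIGHTS.hallucination * (1 - clamp(m.hallucinationRisk)) +
    WEIGHTS.context * clamp(m.sourceRelevance);

  return { confidence, valid: confidence >= threshold };
}
```

A weighted average keeps the score in the 0.0 - 1.0 range used by confidenceThreshold, which is why it was chosen for this sketch.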
📝 Examples
High Confidence Response
{
  confidence: 0.92,
  valid: true,
  accuracy: { verified: true, verification_rate: 0.95 },
  hallucination: { detected: false, risk: 0.05 },
  warnings: []
}
Low Confidence Response
{
  confidence: 0.35,
  valid: false,
  accuracy: { verified: false, verification_rate: 0.2 },
  hallucination: { detected: true, risk: 0.8 },
  warnings: ["No sources provided - high hallucination risk"]
}
Skipped Validation (Greeting)
{
  confidence: 1.0,
  valid: true,
  query_type: "greeting",
  skip_validation: true,
  warnings: []
}
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is dual-licensed:
- Non-Commercial Use: Free under AGPL-3.0 license
- Commercial Use: Requires a commercial license - contact us for details
See the LICENSE file for complete AGPL-3.0 license terms.
🆘 Support
- Issues: GitHub Issues
- Documentation: GitHub Wiki
- Discussions: GitHub Discussions
🔗 Related Projects
- @vezlo/assistant-server - AI Assistant Server with RAG capabilities
- @vezlo/src-to-kb - Convert source code to knowledge base
Status: ✅ Production Ready | Version: 1.2.0 | License: AGPL-3.0 | Node.js: 20+
Made with ❤️ by Vezlo
