@hypnodroid/to-make-sense
v1.0.3
Vitest matcher that validates AI responses make sense in production contexts
@loqwai/to-make-sense
A Vitest custom matcher that uses LLMs (via Ollama) to validate whether AI-generated responses make sense in production contexts.
Why?
When building AI agents, chatbots, or any system that generates text, you need to ensure the outputs are:
- Coherent and contextually appropriate
- Free from hallucinations or nonsensical content
- Consistent with the intended personality or voice
- Safe for production use
This matcher helps catch issues like:
- Responses that contradict themselves
- Impossible claims presented as facts
- Random word salad that sounds AI-generated
- Responses that break character or voice
Installation
npm install --save-dev @loqwai/to-make-sense
Prerequisites
This package requires Ollama to be installed and running locally:
- Install Ollama: https://ollama.ai
- Pull a model (we recommend gemma2:2b for speed): ollama pull gemma2:2b
- Ensure Ollama is running (it starts automatically on most systems)
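Because every assertion depends on a live Ollama server, it can help to fail fast with a clear message when the server is down or the model has not been pulled. The sketch below is not part of this package: `hasModel` and `checkOllama` are hypothetical helpers, and it assumes Node 18+ (for the global `fetch`) and Ollama's default address, `http://localhost:11434`. It relies on Ollama's `GET /api/tags` endpoint, which lists the models available locally.

```typescript
// Hypothetical preflight helpers (not part of @loqwai/to-make-sense).
// Assumes Node 18+ global fetch and Ollama's default address.

interface OllamaTagsResponse {
  models: { name: string }[]
}

// Ollama reports untagged models with an implicit ":latest" suffix,
// so normalise both sides before comparing (simplified: ignores registry hosts).
function normaliseModelName(name: string): string {
  return name.includes(':') ? name : `${name}:latest`
}

// True if the model appears in a /api/tags response.
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  const wanted = normaliseModelName(model)
  return tags.models.some((m) => normaliseModelName(m.name) === wanted)
}

// Throws a descriptive error if Ollama is unreachable or the model is missing.
async function checkOllama(
  model = 'gemma2:2b',
  baseUrl = 'http://localhost:11434',
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/tags`).catch(() => {
    throw new Error(`Ollama is not reachable at ${baseUrl}. Is it running?`)
  })
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`)
  }
  const tags = (await res.json()) as OllamaTagsResponse
  if (!hasModel(tags, model)) {
    throw new Error(`Model "${model}" not found. Run: ollama pull ${model}`)
  }
}
```

You might call `checkOllama()` from a Vitest `globalSetup` file so that a missing model surfaces once, before any test runs, rather than as a string of timeouts.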
Usage
Basic Setup
import { expect } from 'vitest'
import '@loqwai/to-make-sense'
// The matcher is now available globally
Testing AI Responses
import { describe, it, expect } from 'vitest'
import '@loqwai/to-make-sense'
describe('AI Assistant', () => {
it('should generate coherent responses', async () => {
const conversation = {
messages: [
{ role: 'user', content: 'What is the capital of France?' },
{ role: 'assistant', content: 'The capital of France is Paris.' }
]
}
await expect(conversation).toMakeSense()
})
it('should reject nonsensical responses', async () => {
const conversation = {
messages: [
{ role: 'user', content: 'How do I reset my password?' },
{ role: 'assistant', content: 'Purple monkey dishwasher in the quantum realm!' }
]
}
await expect(conversation).not.toMakeSense()
})
})
Configuration Options
await expect(conversation).toMakeSense({
model: 'gemma2:2b', // Ollama model to use
temperature: 0.3, // LLM temperature (0-1)
endpoint: 'http://localhost:11434/api/chat', // Ollama endpoint
systemPrompt: 'Custom prompt...' // Override the default validation prompt
})
Example: Testing Character Consistency
describe('Fantasy Game NPC', () => {
it('should maintain character voice', async () => {
const mysticalKeeper = {
messages: [
{ role: 'user', content: 'Where can I find healing potions?' },
{ role: 'assistant', content: '*sighs with ancient weariness* Seven vials remain in the eastern chamber, though my incorporeal form can no longer grasp them. The third shelf, behind the cobwebs of centuries...' }
]
}
// This should pass - maintains mystical character
await expect(mysticalKeeper).toMakeSense()
})
it('should reject out-of-character responses', async () => {
const brokenNPC = {
messages: [
{ role: 'user', content: 'Where can I find healing potions?' },
{ role: 'assistant', content: 'Yo dawg, check aisle 3 at the supermarket lol' }
]
}
// This should fail - breaks character
await expect(brokenNPC).not.toMakeSense()
})
})
How It Works
The matcher sends the conversation to an LLM together with a validation prompt (replaceable via the systemPrompt option) that instructs it to judge whether the response "makes sense" given the context. The LLM considers:
- Logical Coherence: Does the response follow logically from the question?
- Contextual Appropriateness: Is the response suitable for the context?
- Consistency: Are there internal contradictions?
- Realism: Are claims plausible within the established context?
The matcher distinguishes between creative fiction (which can "make sense" within its context) and true nonsense or hallucinations.
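The flow described above can be sketched against Ollama's `POST /api/chat` endpoint. This is an illustration under stated assumptions, not the package's implementation: the default prompt text, the `{"makesSense", "reason"}` JSON verdict, and the helper names (`toTranscript`, `buildChatRequest`, `parseVerdict`, `evaluate`) are all hypothetical. The defaults mirror the documented configuration options (`gemma2:2b`, temperature `0.3`, `http://localhost:11434/api/chat`), and `format: 'json'` uses Ollama's JSON mode to keep the evaluator's reply machine-readable.

```typescript
// Hypothetical sketch of a "makes sense" check via Ollama's /api/chat.
// Prompt wording, verdict shape and helper names are assumptions.

interface Message {
  role: 'system' | 'user' | 'assistant'
  content: string
}
interface Conversation {
  messages: Message[]
}
interface Options {
  model?: string
  temperature?: number
  endpoint?: string
  systemPrompt?: string
}
interface Verdict {
  makesSense: boolean
  reason: string
}

// Illustrative stand-in for the package's real validation prompt.
const DEFAULT_PROMPT =
  'You review AI assistant replies. Judge whether the final assistant message is ' +
  'logically coherent, contextually appropriate, internally consistent and plausible ' +
  'within the established context (consistent fiction is acceptable). ' +
  'Reply only with JSON: {"makesSense": boolean, "reason": string}.'

// Flatten the conversation into a transcript the evaluator model can read.
function toTranscript(conversation: Conversation): string {
  return conversation.messages.map((m) => `${m.role.toUpperCase()}: ${m.content}`).join('\n')
}

// Build the request body for Ollama's POST /api/chat endpoint.
function buildChatRequest(conversation: Conversation, opts: Options = {}) {
  return {
    model: opts.model ?? 'gemma2:2b',
    stream: false, // one JSON response instead of a token stream
    format: 'json', // Ollama JSON mode constrains the reply to valid JSON
    options: { temperature: opts.temperature ?? 0.3 }, // ?? keeps an explicit 0
    messages: [
      { role: 'system', content: opts.systemPrompt ?? DEFAULT_PROMPT },
      { role: 'user', content: toTranscript(conversation) },
    ],
  }
}

// Parse the evaluator's reply; a malformed reply throws rather than
// counting as "does not make sense".
function parseVerdict(content: string): Verdict {
  let parsed: unknown
  try {
    parsed = JSON.parse(content)
  } catch {
    throw new Error(`Evaluator reply is not JSON: ${content}`)
  }
  const v = parsed as Partial<Verdict> | null
  if (v === null || typeof v !== 'object' || typeof v.makesSense !== 'boolean') {
    throw new Error(`Evaluator reply lacks a boolean "makesSense": ${content}`)
  }
  return { makesSense: v.makesSense, reason: String(v.reason ?? '') }
}

// Send the conversation to Ollama and return the parsed verdict.
async function evaluate(conversation: Conversation, opts: Options = {}): Promise<Verdict> {
  const endpoint = opts.endpoint ?? 'http://localhost:11434/api/chat'
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(conversation, opts)),
  })
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`)
  // With stream: false, /api/chat returns { message: { role, content }, ... }
  const data = (await res.json()) as { message: { content: string } }
  return parseVerdict(data.message.content)
}
```

`parseVerdict` throws on a malformed reply instead of treating it as a failed verdict; otherwise an evaluator glitch would make `.not.toMakeSense()` pass silently. A Vitest matcher registered with `expect.extend` could then map the verdict to `{ pass: verdict.makesSense, message: () => verdict.reason }`.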
Performance Considerations
- LLM calls take time (typically 1-5 seconds with gemma2:2b)
- Tests run with a 20-second timeout by default
- Consider using smaller, faster models for testing
- Run tests in sequence to avoid overloading Ollama
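One way to apply these suggestions project-wide is through Vitest's own configuration. This is a sketch assuming Vitest 1.1 or later (`fileParallelism` was added in 1.1); the setup file path `./test/setup.ts` is hypothetical and would contain just `import '@loqwai/to-make-sense'`. The 20-second value mirrors the timeout quoted in this section.

```typescript
// vitest.config.ts: a sketch assuming Vitest >= 1.1.
// The setup file path is hypothetical.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    // Register the matcher once for every test file instead of
    // importing it at the top of each file.
    setupFiles: ['./test/setup.ts'],
    // LLM-backed assertions are slow; allow 20 seconds per test.
    testTimeout: 20_000,
    // Run test files one after another so several files do not
    // query Ollama at the same time.
    fileParallelism: false,
  },
})
```

Tests inside a single file already run sequentially unless marked `concurrent`, so disabling file parallelism is what stops several files from sending requests to Ollama at once.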
Development
# Clone the repository
git clone https://github.com/loqwai/to-make-sense.git
cd to-make-sense
# Install dependencies
npm install
# Run tests (requires Ollama)
npm test
# Type checking
npm run typecheck
# Build
npm run build
# Deploy to npm (runs tests first)
npm run deploy
Philosophy
This project follows a "no mocking" philosophy. All tests use real LLM integrations to ensure we're validating actual behavior, not our assumptions about how LLMs work.
License
MIT
Contributing
Contributions are welcome! Please ensure:
- All tests pass with real Ollama integration
- No mocking of LLM calls
- Follow the existing code style
- Add tests for new features
Credits
Created by @loqwai for the Loqwai project.
