vibecheck-tdd
v0.2.0
Pre-commit hooks that enforce test-first discipline on AI coding agents
A CI-native test integrity pipeline that prevents AI coding agents from gaming their own test suites. Measures test quality, not just test existence.
The Problem
AI coding agents (Claude Code, Copilot, Cursor, etc.) have full context of both tests and implementation simultaneously. Nothing stops an agent from:
- Writing trivially satisfiable tests (expect(result).toBeDefined())
- Weakening assertions to make broken implementations pass
- Deleting or skipping tests that are hard to satisfy
- Hardcoding return values that satisfy specific example-based test cases
Pre-commit hooks that lock test files are a logical constraint: the agent understands the rule and routes around it. vibecheck instead relies on structural constraints that measure what the tests actually verify, which makes them far harder to game.
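To make the first failure mode concrete, here is a minimal sketch of why a trivially satisfiable assertion cannot detect a broken implementation. It imitates by hand what a mutation tool does and does not use Stryker or vibecheck. All names (`BinaryFn`, `killsMutant`, `weakTest`, `strongTest`) are hypothetical and exist only for illustration.

```typescript
// Hypothetical names throughout; this is not vibecheck's or Stryker's API.
type BinaryFn = (a: number, b: number) => number;

// The implementation under test.
const original: BinaryFn = (a, b) => a + b;
// A hand-written mutant: the arithmetic operator '+' is replaced with '-'.
const mutant: BinaryFn = (a, b) => a - b;

// A test returns true when all of its assertions pass.
type Test = (fn: BinaryFn) => boolean;

// Weak: only checks that something came back, like toBeDefined().
const weakTest: Test = (fn) => fn(2, 3) !== undefined;
// Strong: pins the exact value, like toBe(5).
const strongTest: Test = (fn) => fn(2, 3) === 5;

// A mutant is "killed" when the test passes on the original
// implementation but fails on the mutated one.
function killsMutant(test: Test): boolean {
  return test(original) && !test(mutant);
}

console.log(killsMutant(weakTest));   // false: the mutant survives
console.log(killsMutant(strongTest)); // true: the mutant is killed
```

A suite made of tests like `weakTest` stays green no matter what the code does, which is why surviving mutants are a useful signal of weak assertions.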
How It Works
vibecheck layers four complementary checks into a composite integrity score (0-100):
| Analyzer | What it catches | Weight |
|----------|----------------|--------|
| Mutation testing (Stryker) | Weak assertions that survive code mutations | 40% |
| Hidden test suites | Tests fitted to implementation instead of spec | 30% |
| Property-based tests | Hardcoded return values | 20% |
| Semantic diff analysis | Retroactive assertion weakening | 10% |
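As a rough illustration of how the weights above could combine into one number, the sketch below computes a weighted average over the analyzers that actually ran. The README does not document the exact formula, so two things here are assumptions: that disabled analyzers are dropped and the remaining weights renormalized, and that the result is rounded to an integer. The names `WEIGHTS`, `Scores`, and `compositeScore` are hypothetical, and the output need not match the score vibecheck reports.

```typescript
// Hypothetical sketch; not vibecheck's actual scoring code.
type AnalyzerName = "mutation" | "hiddenTests" | "propertyTests" | "semanticDiff";

// Weights taken from the table above (percent).
const WEIGHTS: Record<AnalyzerName, number> = {
  mutation: 40,
  hiddenTests: 30,
  propertyTests: 20,
  semanticDiff: 10,
};

// Per-analyzer scores on a 0-100 scale; an omitted key means "disabled".
type Scores = Partial<Record<AnalyzerName, number>>;

// Assumption: weights of disabled analyzers are redistributed
// proportionally across the analyzers that ran.
function compositeScore(scores: Scores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const name of Object.keys(WEIGHTS) as AnalyzerName[]) {
    const score = scores[name];
    if (score === undefined) continue;
    weighted += score * WEIGHTS[name];
    totalWeight += WEIGHTS[name];
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}

// Only mutation (90) and semantic diff (40) enabled:
// (90 * 40 + 40 * 10) / 50 = 80
console.log(compositeScore({ mutation: 90, semanticDiff: 40 })); // 80
```

Renormalizing keeps the score on a 0-100 scale when analyzers are switched off. A tool could equally treat a disabled analyzer as a zero or as a full score, so treat this as one possible design.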
Quick Start
npm install vibecheck-tdd --save-dev
npx vibecheck init
This creates:
- vibecheck.config.ts with sensible defaults
- .vibecheck-hidden/ directory for hidden tests
- GitHub Actions workflow (if .github/workflows/ exists)
- CLAUDE.md snippet for AI agent instructions
CLI Commands
npx vibecheck check # Run all enabled analyzers
npx vibecheck check --mutation # Run mutation analysis only
npx vibecheck check --semantic # Run semantic diff only
npx vibecheck check --threshold 90 # Override score threshold
npx vibecheck score # Output composite score (0-100)
npx vibecheck report # Generate full integrity report
Configuration
// vibecheck.config.ts
import { defineConfig } from 'vibecheck-tdd'

export default defineConfig({
  mutation: {
    enabled: true,
    tool: 'stryker',
    threshold: 80,
    perFileThreshold: 60,
    include: ['src/**/*.ts'],
    exclude: ['src/**/*.d.ts'],
  },
  semanticDiff: {
    enabled: true,
    enforcement: 'block', // 'block' | 'warn' | 'comment'
  },
  hiddenTests: {
    enabled: true,
    source: 'directory',
    path: '.vibecheck-hidden',
  },
  propertyTests: {
    enabled: false,
    framework: 'fast-check',
    requiredFor: ['src/core/**/*.ts'],
  },
  reporters: ['console'],
})
CI Integration
GitHub Actions
vibecheck ships a reusable workflow template. After running vibecheck init in a repo with .github/workflows/, you'll get a reusable workflow (triggered via workflow_call) that you can invoke from a workflow running on every PR:
# .github/workflows/vibecheck.yml (generated by init)
name: Vibecheck Test Integrity
on:
  workflow_call:
    inputs:
      mutation-threshold:
        type: number
        default: 80
jobs:
  vibecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx vibecheck check --threshold ${{ inputs.mutation-threshold }}
Example Output
vibecheck: Test Integrity Score — 74/100 (threshold: 80) FAIL
Mutation Score: 82% (threshold: 80)
Semantic Diff: 70%
Surviving mutants:
src/core/calculator.ts:42 — ArithmeticOperator: replaced with -
src/core/calculator.ts:58 — ConditionalExpression: replaced with true
Assertion weakening detected:
src/core/calculator.test.ts — precision-reduction: .toEqual() weakened to .toBeDefined()
Semantic Diff Analysis
vibecheck detects assertion weakening by comparing test files before and after changes:
| Pattern | Example |
|---------|---------|
| Precision reduction | toEqual(42) → toBeDefined() |
| Test deletion | Removing a test block entirely |
| Skip addition | Adding .skip to an existing test |
| Assertion count reduction | Removing expect() calls from a test |
Assertion strength rankings: toBe (10), toEqual (9), toStrictEqual (10), toHaveLength (8), toMatchObject (7), toContain (6), toThrowError (7), toThrow (4), toBeTruthy (3), toBeFalsy (3), toBeDefined (2).
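The sketch below shows one way three of the patterns above (precision reduction, skip addition, assertion count reduction) could be detected from the before and after text of a test file, using the strength rankings just listed. It is a deliberately naive, regex-based illustration and not vibecheck's implementation: a real analyzer would more likely work on a parsed syntax tree and match assertions per test block. The names `STRENGTH`, `Finding`, `matchers`, and `compareTests` are hypothetical, and pairing matchers by position is an assumption made to keep the example short.

```typescript
// Hypothetical sketch; not vibecheck's actual analyzer.
// Strength values copied from the rankings listed above.
const STRENGTH = new Map<string, number>([
  ["toBe", 10], ["toStrictEqual", 10], ["toEqual", 9], ["toHaveLength", 8],
  ["toMatchObject", 7], ["toThrowError", 7], ["toContain", 6], ["toThrow", 4],
  ["toBeTruthy", 3], ["toBeFalsy", 3], ["toBeDefined", 2],
]);

interface Finding {
  pattern: "precision-reduction" | "skip-addition" | "assertion-count-reduction";
  detail: string;
}

// Collect known matcher names in the order they appear in the source.
function matchers(source: string): string[] {
  const found: string[] = [];
  const re = /\.(to[A-Z]\w*)\s*\(/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const name = m[1];
    if (name !== undefined && STRENGTH.has(name)) found.push(name);
  }
  return found;
}

// Count non-overlapping matches of a global regex.
function count(source: string, re: RegExp): number {
  return (source.match(re) ?? []).length;
}

const SKIP = /\b(?:it|test|describe)\.skip\b/g;
const EXPECT = /\bexpect\s*\(/g;

function compareTests(before: string, after: string): Finding[] {
  const findings: Finding[] = [];
  const b = matchers(before);
  const a = matchers(after);
  // Naive assumption: the i-th matcher before corresponds to the i-th after.
  for (let i = 0; i < Math.min(b.length, a.length); i++) {
    const from = b[i];
    const to = a[i];
    if (from === undefined || to === undefined) continue;
    if ((STRENGTH.get(to) ?? 0) < (STRENGTH.get(from) ?? 0)) {
      findings.push({ pattern: "precision-reduction", detail: `${from} -> ${to}` });
    }
  }
  if (count(after, SKIP) > count(before, SKIP)) {
    findings.push({ pattern: "skip-addition", detail: ".skip added" });
  }
  if (count(after, EXPECT) < count(before, EXPECT)) {
    findings.push({ pattern: "assertion-count-reduction", detail: "expect() calls removed" });
  }
  return findings;
}

const findings = compareTests(
  'it("adds", () => { expect(add(2, 3)).toEqual(5); });',
  'it("adds", () => { expect(add(2, 3)).toBeDefined(); });',
);
console.log(findings); // one precision-reduction finding: "toEqual -> toBeDefined"
```

Test deletion is left out here. It could be approximated the same way by comparing the number of test blocks before and after the change.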
v0.1.0 (Current)
- Mutation testing via Stryker
- Semantic diff assertion weakening detection
- Composite integrity score with weighted components
- Console reporter
- vibecheck init/check/score/report CLI
- GitHub Actions workflow template
- CLAUDE.md template for AI agents
Roadmap
v0.2.0: Hidden test suite runner, property-based test enforcement, GitHub PR comment reporter
v0.3.0: VSCode extension, audit mode, dashboard, monorepo support
License
MIT
