vibecheck-tdd

v0.2.0 · Published 21 days ago · Maintainer: brandonapol

Pre-commit hooks that enforce test-first discipline on AI coding agents

Downloads: 244 · Vulnerabilities: 0 high, 0 medium, 0 low

Keywords: ai-agents tdd pre-commit claude-code git-hooks testing devtools

vibecheck-tdd

A CI-native test integrity pipeline that prevents AI coding agents from gaming their own test suites. Measures test quality, not just test existence.

The Problem

AI coding agents (Claude Code, Copilot, Cursor, etc.) can see both the tests and the implementation at the same time. Nothing stops an agent from:

Writing trivially satisfiable tests (expect(result).toBeDefined())
Weakening assertions to make broken implementations pass
Deleting or skipping tests that are hard to satisfy
Hardcoding return values that satisfy specific example-based test cases
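To make the last failure mode concrete, here is a minimal sketch showing how a hardcoded return value passes an example-based test but fails a property that must hold for every input. The `add`/`addHardcoded` functions and the hand-rolled randomized check are hypothetical illustrations; the randomized check stands in for a property-based library such as fast-check and is not vibecheck's own checker.

```typescript
// Hypothetical functions: an honest implementation and one "fitted"
// to a single example-based test case.
function add(a: number, b: number): number {
  return a + b;
}

function addHardcoded(_a: number, _b: number): number {
  return 5; // satisfies add(2, 3) === 5 and nothing else
}

// Example-based test: checks a single input/output pair.
function exampleTest(fn: (a: number, b: number) => number): boolean {
  return fn(2, 3) === 5;
}

// Deterministic Park-Miller PRNG so the randomized check is reproducible.
function makeRng(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) state += 2147483646;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

// Property: for random integers a and b, fn(a, b) - b must equal a.
// A hand-rolled stand-in for a fast-check property.
function propertyTest(fn: (a: number, b: number) => number, runs = 100): boolean {
  const rand = makeRng(42);
  for (let i = 0; i < runs; i++) {
    const a = Math.floor(rand() * 2000) - 1000;
    const b = Math.floor(rand() * 2000) - 1000;
    if (fn(a, b) - b !== a) return false; // counterexample found
  }
  return true;
}
```

A commutativity property (`fn(a, b) === fn(b, a)`) would not catch this case, because a constant function is trivially commutative; the property has to pin down how the output relates to the inputs.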

Pre-commit hooks that lock test files are a logical constraint: the agent understands the rule and routes around it. vibecheck instead uses structural checks that measure whether the tests actually constrain the implementation, which makes them much harder to game.

How It Works

vibecheck layers four complementary checks into a composite integrity score (0-100):

| Analyzer | What it catches | Weight |
|----------|----------------|--------|
| Mutation testing (Stryker) | Weak assertions that survive code mutations | 40% |
| Hidden test suites | Tests fitted to implementation instead of spec | 30% |
| Property-based tests | Hardcoded return values | 20% |
| Semantic diff analysis | Retroactive assertion weakening | 10% |
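The weights above suggest a weighted average of per-analyzer scores. The sketch below is one way to compute it, assuming each analyzer reports a 0-100 score. The README does not say how disabled analyzers are handled, so the proportional redistribution of their weight and the "equal to threshold passes" rule are assumptions, and all names here are hypothetical.

```typescript
// Hypothetical sketch of a weighted composite score; not vibecheck's source.
type Analyzer = "mutation" | "hiddenTests" | "propertyTests" | "semanticDiff";

const ANALYZERS: Analyzer[] = ["mutation", "hiddenTests", "propertyTests", "semanticDiff"];

// Weights from the table above, as integer percentages.
const WEIGHTS: Record<Analyzer, number> = {
  mutation: 40,
  hiddenTests: 30,
  propertyTests: 20,
  semanticDiff: 10,
};

// Each enabled analyzer reports a 0-100 score; disabled ones are left out.
type ComponentScores = Partial<Record<Analyzer, number>>;

// ASSUMPTION: the weights of disabled analyzers are redistributed
// proportionally across the enabled ones.
function compositeScore(scores: ComponentScores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const name of ANALYZERS) {
    const score = scores[name];
    if (score === undefined) continue; // analyzer disabled
    weighted += WEIGHTS[name] * score;
    totalWeight += WEIGHTS[name];
  }
  if (totalWeight === 0) throw new Error("no analyzers reported a score");
  return weighted / totalWeight;
}

// ASSUMPTION: a score equal to the threshold passes.
function passes(score: number, threshold: number): boolean {
  return score >= threshold;
}
```

Treat the redistribution rule as a placeholder rather than vibecheck's actual formula; a tool could just as well count a disabled analyzer as 0 or refuse to score.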

Quick Start

npm install vibecheck-tdd --save-dev
npx vibecheck init

This creates:

vibecheck.config.ts with sensible defaults
.vibecheck-hidden/ directory for hidden tests
GitHub Actions workflow (if .github/workflows/ exists)
CLAUDE.md snippet for AI agent instructions

CLI Commands

npx vibecheck check             # Run all enabled analyzers
npx vibecheck check --mutation  # Run mutation analysis only
npx vibecheck check --semantic  # Run semantic diff only
npx vibecheck check --threshold 90  # Override score threshold
npx vibecheck score             # Output composite score (0-100)
npx vibecheck report            # Generate full integrity report

Configuration

// vibecheck.config.ts
import { defineConfig } from 'vibecheck-tdd'

export default defineConfig({
  mutation: {
    enabled: true,
    tool: 'stryker',
    threshold: 80,
    perFileThreshold: 60,
    include: ['src/**/*.ts'],
    exclude: ['src/**/*.d.ts'],
  },
  semanticDiff: {
    enabled: true,
    enforcement: 'block', // 'block' | 'warn' | 'comment'
  },
  hiddenTests: {
    enabled: true,
    source: 'directory',
    path: '.vibecheck-hidden',
  },
  propertyTests: {
    enabled: false,
    framework: 'fast-check',
    requiredFor: ['src/core/**/*.ts'],
  },
  reporters: ['console'],
})
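A `defineConfig` helper of this kind is usually an identity function whose only job is to attach types to the config object, the same pattern Vite uses. The sketch below mirrors the example's fields with local types; the types, the `filesBelowThreshold` helper, and the reading of `perFileThreshold` as a per-file minimum mutation score are assumptions for illustration, not vibecheck-tdd's published definitions.

```typescript
// Hypothetical types mirroring the example config; the real
// vibecheck-tdd definitions may differ.
interface VibecheckConfig {
  mutation: {
    enabled: boolean;
    tool: "stryker";
    threshold: number;         // minimum overall mutation score (percent)
    perFileThreshold?: number; // ASSUMPTION: minimum score for each file
    include?: string[];
    exclude?: string[];
  };
  semanticDiff: { enabled: boolean; enforcement: "block" | "warn" | "comment" };
  hiddenTests: { enabled: boolean; source: "directory"; path: string };
  propertyTests: { enabled: boolean; framework: "fast-check"; requiredFor?: string[] };
  reporters: string[];
}

// Identity helper: returns its argument unchanged. The parameter type is
// what lets the compiler reject typos such as enforcement: 'blok'.
function defineConfig(config: VibecheckConfig): VibecheckConfig {
  return config;
}

// Hypothetical helper: files whose mutation score is below perFileThreshold.
function filesBelowThreshold(scores: { [file: string]: number }, config: VibecheckConfig): string[] {
  const min = config.mutation.perFileThreshold;
  if (min === undefined) return [];
  return Object.keys(scores)
    .filter((file) => scores[file] < min)
    .sort();
}

const config = defineConfig({
  mutation: { enabled: true, tool: "stryker", threshold: 80, perFileThreshold: 60 },
  semanticDiff: { enabled: true, enforcement: "block" },
  hiddenTests: { enabled: true, source: "directory", path: ".vibecheck-hidden" },
  propertyTests: { enabled: false, framework: "fast-check" },
  reporters: ["console"],
});
```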

CI Integration

GitHub Actions

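vibecheck ships a reusable workflow template. Running vibecheck init in a repo that already has a .github/workflows/ directory generates the workflow below. Because it is triggered by workflow_call, it does not run on its own: to run it on every PR, call it (via jobs.<job_id>.uses) from a workflow in your repo that is triggered by pull_request.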

# .github/workflows/vibecheck.yml (generated by init)
name: Vibecheck Test Integrity

on:
  workflow_call:
    inputs:
      mutation-threshold:
        type: number
        default: 80

jobs:
  vibecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx vibecheck check --threshold ${{ inputs.mutation-threshold }}

Example Output

vibecheck: Test Integrity Score — 74/100 (threshold: 80) FAIL

  Mutation Score:       82% (threshold: 80)
  Semantic Diff:        70%

  Surviving mutants:
    src/core/calculator.ts:42 — ArithmeticOperator: replaced with -
    src/core/calculator.ts:58 — ConditionalExpression: replaced with true

  Assertion weakening detected:
    src/core/calculator.test.ts — precision-reduction: .toEqual() weakened to .toBeDefined()
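The surviving-mutant lines above come from mutation testing. This hand-rolled sketch (hypothetical functions, not Stryker's API) shows why a weak assertion lets an `ArithmeticOperator` mutant survive while an exact assertion kills it. It also computes a mutation score the way Stryker defines it: detected mutants (killed plus timed out) divided by valid mutants (detected plus survived plus no coverage).

```typescript
// Hand-rolled illustration of mutation testing, not Stryker's API.
// Stryker generates mutants automatically and reruns your real test suite.
type BinaryOp = (a: number, b: number) => number;

const original: BinaryOp = (a, b) => a + b;
// An ArithmeticOperator mutant: `+` replaced with `-`.
const mutant: BinaryOp = (a, b) => a - b;

// Weak test, similar in spirit to toBeDefined(): only checks that a number came back.
const weakTest = (fn: BinaryOp): boolean => typeof fn(2, 3) === "number";
// Strong test, similar to toBe(5): checks the exact value.
const strongTest = (fn: BinaryOp): boolean => fn(2, 3) === 5;

// A mutant is killed when a test that passes on the original fails on the
// mutant; otherwise it survives.
function kills(test: (fn: BinaryOp) => boolean): boolean {
  return test(original) && !test(mutant);
}

interface MutantCounts {
  killed: number;
  timeout: number;
  survived: number;
  noCoverage: number;
}

// Mutation score as Stryker defines it: detected / valid * 100, where
// detected = killed + timeout and valid = detected + survived + noCoverage.
function mutationScore(c: MutantCounts): number {
  const detected = c.killed + c.timeout;
  const valid = detected + c.survived + c.noCoverage;
  if (valid === 0) throw new Error("no valid mutants");
  return (detected * 100) / valid;
}
```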

Semantic Diff Analysis

vibecheck detects assertion weakening by comparing test files before and after changes:

| Pattern | Example |
|---------|---------|
| Precision reduction | toEqual(42) → toBeDefined() |
| Test deletion | Removing a test block entirely |
| Skip addition | Adding .skip to an existing test |
| Assertion count reduction | Removing expect() calls from a test |
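Three of these patterns can be approximated by counting constructs in the old and new versions of a test file. The sketch below is a hypothetical, regex-based approximation for Jest/Vitest-style files; the finding labels are made up for illustration, and a robust checker would parse the source (for example with the TypeScript compiler API) rather than pattern-match it.

```typescript
// Hypothetical, regex-based sketch for Jest/Vitest-style test files.
interface TestFileStats {
  tests: number;      // it(...) / test(...) blocks, skipped ones included
  skipped: number;    // .skip variants and x-prefixed aliases
  assertions: number; // expect(...) calls
}

function countMatches(source: string, pattern: RegExp): number {
  const found = source.match(pattern);
  return found ? found.length : 0;
}

function collectStats(source: string): TestFileStats {
  return {
    tests: countMatches(source, /\b(?:it|test)(?:\.skip|\.only)?\s*\(|\bx(?:it|test)\s*\(/g),
    skipped: countMatches(source, /\b(?:it|test|describe)\.skip\s*\(|\bx(?:it|test|describe)\s*\(/g),
    assertions: countMatches(source, /\bexpect\s*\(/g),
  };
}

// Hypothetical finding labels for three of the patterns in the table.
type Finding = "test-deletion" | "skip-addition" | "assertion-count-reduction";

function compareTestFiles(before: string, after: string): Finding[] {
  const b = collectStats(before);
  const a = collectStats(after);
  const findings: Finding[] = [];
  if (a.tests < b.tests) findings.push("test-deletion");
  if (a.skipped > b.skipped) findings.push("skip-addition");
  if (a.assertions < b.assertions) findings.push("assertion-count-reduction");
  return findings;
}
```

Because this compares whole-file totals, deleting one test while adding another elsewhere goes unnoticed; comparing per test, keyed by test name, closes that gap.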

Assertion strength rankings (higher is stronger): toBe (10), toStrictEqual (10), toEqual (9), toHaveLength (8), toMatchObject (7), toThrowError (7), toContain (6), toThrow (4), toBeTruthy (3), toBeFalsy (3), toBeDefined (2).
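Precision reduction can be flagged by looking up the matcher before and after a change and checking whether its strength dropped. This minimal sketch uses the rankings above; `matcherStrength` and `isPrecisionReduction` are hypothetical helpers, the matcher extraction is a naive regex that ignores `.not`, and pairing each old assertion with its rewritten counterpart is assumed to happen elsewhere.

```typescript
// Strength scores copied from the rankings above (higher = stronger).
const STRENGTH: { [matcher: string]: number } = {
  toBe: 10, toStrictEqual: 10, toEqual: 9, toHaveLength: 8,
  toMatchObject: 7, toThrowError: 7, toContain: 6, toThrow: 4,
  toBeTruthy: 3, toBeFalsy: 3, toBeDefined: 2,
};

// Hypothetical helper: strength of the first `.toXxx(` call in one assertion,
// or null when that matcher is not ranked. Naive: ignores `.not`.
function matcherStrength(assertion: string): number | null {
  const match = /\.(to[A-Za-z]+)\s*\(/.exec(assertion);
  if (!match || !Object.prototype.hasOwnProperty.call(STRENGTH, match[1])) return null;
  return STRENGTH[match[1]];
}

// Flags precision reduction: the rewritten assertion uses a weaker matcher.
function isPrecisionReduction(before: string, after: string): boolean {
  const was = matcherStrength(before);
  const now = matcherStrength(after);
  return was !== null && now !== null && now < was;
}
```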

v0.1.0 (Current)

Mutation testing via Stryker
Semantic diff assertion weakening detection
Composite integrity score with weighted components
Console reporter
vibecheck init / check / score / report CLI
GitHub Actions workflow template
CLAUDE.md template for AI agents

Roadmap

v0.2.0: Hidden test suite runner, property-based test enforcement, GitHub PR comment reporter

v0.3.0: VSCode extension, audit mode, dashboard, monorepo support

License

MIT
