vibeval v0.9.0 (published a month ago)

vibeval (Vibe Coding Eval) — AI application testing framework

sandykid

vibeval — Vibe Coding Eval

A fast evaluation framework for AI applications. Install Claude Code and run vibeval via npx to get an end-to-end workflow from code analysis to test generation to evaluation.

What Problem Does It Solve

Traditional software testing frameworks cannot assess the quality of AI outputs; traditional AI evaluation platforms rely on dataset construction and cannot keep up with the pace of feature iteration. vibeval strikes a balance between the two:

Analyze your code via VibeCoding to quickly generate synthetic data and test cases
Deterministic rules + LLM semantic judgment for dual-layer evaluation
Cross-version comparison to track quality changes over time
Language-agnostic: generated test code adapts to your project's framework without depending on the vibeval package
Per-tool validation for Agent projects (custom tools, MCP tools, sub-agents) with a 5-dimension coverage matrix enforced by the Evaluator
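
The dual-layer idea (deterministic rules first, then a semantic judgment) can be illustrated with a minimal sketch. Everything here is hypothetical: the `Rule`, `Judge`, and `evaluate` names, the 0.7 threshold, and the decision to skip the judge when a rule fails are assumptions made for illustration, not vibeval's actual API or behavior. The judge is a plain stub function so the example runs offline; a real one would call an LLM.

```typescript
// Hypothetical types for illustration; none of these names come from vibeval.
type Rule = { name: string; check: (output: string) => boolean };
type Judge = (output: string) => { score: number; reason: string };
type Verdict = { passed: boolean; layer: "rules" | "judge"; details: string[] };

// Layer 1 runs cheap, reproducible checks. Layer 2 runs only if layer 1 passes.
function evaluate(
  output: string,
  rules: Rule[],
  judge: Judge,
  threshold = 0.7, // assumed cut-off for the judge score
): Verdict {
  const failed = rules.filter((r) => !r.check(output)).map((r) => r.name);
  if (failed.length > 0) {
    // A hard rule failed, so the (more expensive) judge is not consulted.
    return { passed: false, layer: "rules", details: failed };
  }
  const { score, reason } = judge(output);
  return { passed: score >= threshold, layer: "judge", details: [reason] };
}

const rules: Rule[] = [
  { name: "non-empty", check: (o) => o.trim().length > 0 },
  { name: "max-500-chars", check: (o) => o.length <= 500 },
];

// Stub judge standing in for an LLM call: rewards outputs that name an action item.
const stubJudge: Judge = (o) =>
  o.toLowerCase().includes("action item")
    ? { score: 0.9, reason: "mentions action items" }
    : { score: 0.4, reason: "no action items found" };

const good = evaluate("Action item: Dana ships the fix by Friday.", rules, stubJudge);
const weak = evaluate("We talked.", rules, stubJudge);
const empty = evaluate("   ", rules, stubJudge);
```

In this sketch `good` passes at the judge layer, `weak` fails at the judge layer, and `empty` fails at the rules layer without the judge ever being called.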

Prerequisites

Claude Code
Node.js 20+ (for npx / npm)

Installation

Install the vibeval skill into Claude Code with one command:

npx --yes vibeval install            # global: ~/.claude/skills/vibeval/
# or scope it to the current project:
npx --yes vibeval install --local    # ./.claude/skills/vibeval/

Then open Claude Code and run /vibeval (or just ask it to test an AI feature). Later, npx --yes vibeval update refreshes the skill and npx --yes vibeval uninstall removes it.

The CLI itself needs no install — every invocation runs via npx --yes vibeval .... If you call it frequently and want to skip the npx lookup latency, do a one-time global install:

npm install -g vibeval
# then you can use `vibeval ...` directly in place of `npx --yes vibeval ...`

Usage

Before first use, verify that the LLM provider is set up correctly:

npx --yes vibeval check

Then run the unified workflow inside Claude Code:

/vibeval meeting_summary

The /vibeval command detects your project state and guides you through the appropriate phase:

New project — Scans for AI code, suggests features to test, runs the full pipeline
In progress — Verifies existing artifacts, continues from where you left off
Complete — Detects code changes for incremental updates, or lets you re-run, add tests, or modify designs

Each phase (analyze → design → code → synthesize → run) pauses for your review before continuing. Every step produces editable intermediate files.
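
As a rough mental model of the phase ordering and the three project states described above, consider the sketch below. The phase names come from this README; the `nextPhase` and `projectState` helpers and the idea of tracking completed phases in a set are assumptions for illustration only. How /vibeval actually detects state from artifacts on disk is not specified here.

```typescript
// Phase order as listed in this README.
const PHASES = ["analyze", "design", "code", "synthesize", "run"] as const;
type Phase = (typeof PHASES)[number];

// Hypothetical labels mirroring "New project", "In progress", "Complete".
type ProjectState = "new" | "in_progress" | "complete";

// Returns the earliest phase that has not been completed, or null if all are done.
function nextPhase(done: ReadonlySet<Phase>): Phase | null {
  for (const phase of PHASES) {
    if (!done.has(phase)) return phase;
  }
  return null;
}

// Classifies a project by which phases have already produced artifacts.
function projectState(done: ReadonlySet<Phase>): ProjectState {
  if (done.size === 0) return "new";
  return nextPhase(done) === null ? "complete" : "in_progress";
}

const partial = new Set<Phase>(["analyze", "design"]);
const resumeAt = nextPhase(partial);
const partialState = projectState(partial);
```

Here `resumeAt` is `"code"` and `partialState` is `"in_progress"`, matching the "continues from where you left off" behavior in spirit.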

Cross-Version Comparison

# Statistical comparison
npx --yes vibeval diff meeting_summary run_a run_b

# LLM deep comparison
npx --yes vibeval compare meeting_summary run_a run_b
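
To give a feel for what a statistical comparison between two runs can involve, here is a minimal sketch that compares per-test pass/fail results. The `RunResults` shape, the `diffRuns` helper, and the chosen metrics (pass-rate delta, regressions, fixes) are assumptions for illustration; this README does not document which statistics `vibeval diff` actually reports.

```typescript
// Hypothetical result shape: test id -> whether the test passed in that run.
type RunResults = Record<string, boolean>;

// Fraction of tests that passed; defined as 0 for an empty run.
function passRate(run: RunResults): number {
  const ids = Object.keys(run);
  if (ids.length === 0) return 0;
  return ids.filter((id) => run[id]).length / ids.length;
}

// Compares run b against baseline a, looking only at tests present in both.
function diffRuns(a: RunResults, b: RunResults) {
  const shared = Object.keys(a).filter((id) => id in b);
  return {
    passRateDelta: passRate(b) - passRate(a),
    regressions: shared.filter((id) => a[id] && !b[id]), // passed before, fails now
    fixes: shared.filter((id) => !a[id] && b[id]), // failed before, passes now
  };
}

const runA: RunResults = { t1: true, t2: false, t3: true, t4: false };
const runB: RunResults = { t1: true, t2: true, t3: false, t4: true };
const report = diffRuns(runA, runB);
```

With this sample data the pass rate rises from 0.5 to 0.75, `t3` is flagged as a regression, and `t2` and `t4` are flagged as fixes. A net improvement in pass rate can still hide individual regressions, which is why the sketch reports them separately.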

Interactive Dashboard

npx --yes vibeval serve

Launches a web dashboard to browse all features, view test results and traces, visualize trends across runs, and manage datasets and judge specs. The server binds to 127.0.0.1:8080 by default; pass --open to also open the dashboard in your default browser, or --host / --port to change where it listens.

Data Validation

# Validate datasets, results, and analysis/design artifacts against the protocol
npx --yes vibeval validate meeting_summary

Checks manifest structure, judge specs, data item fields, _mock_context, trace format, and the agent-tools 5-dimension coverage matrix (Rule 7) when analysis.yaml + design.yaml are present.

Other Commands

# Show evaluation summary
npx --yes vibeval summary meeting_summary latest

# List features and runs
npx --yes vibeval features
npx --yes vibeval runs meeting_summary

# See all commands
npx --yes vibeval --help

Development

The project is a single TypeScript package (esbuild + vitest) at the repo root:

npm install
npm test        # vitest
npm run build   # → dist/cli.js
node dist/cli.js --help

See CLAUDE.md for the project guide.

License

MIT