vibeval
v0.9.0
vibeval (Vibe Coding Eval) — AI application testing framework
vibeval — Vibe Coding Eval
A fast evaluation framework for AI applications. Install Claude Code and run vibeval via npx to get an end-to-end workflow from code analysis to test generation to evaluation.
What Problem Does It Solve
Traditional software testing frameworks cannot assess the quality of AI outputs; traditional AI evaluation platforms rely on dataset construction and cannot keep up with the pace of feature iteration. vibeval strikes a balance between the two:
- Analyze your code via VibeCoding to quickly generate synthetic data and test cases
- Deterministic rules + LLM semantic judgment for dual-layer evaluation
- Cross-version comparison to track quality changes over time
- Language-agnostic: generated test code adapts to your project's framework without depending on the vibeval package
- Per-tool validation for Agent projects (custom tools, MCP tools, sub-agents) with a 5-dimension coverage matrix enforced by the Evaluator
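The dual-layer idea from the list above can be sketched as follows. This is a minimal TypeScript illustration, not vibeval's actual implementation: the `Rule`, `Judge`, and `evaluate` names, the 0.7 threshold, and the choice to skip the judge when a rule fails are all assumptions made for the example. The judge is stubbed as a synchronous function, whereas a real LLM call would be asynchronous.

```typescript
// Hypothetical types for illustration; these are not vibeval's actual API.
type Rule = { name: string; check: (output: string) => boolean };
type Judge = (output: string) => number; // stand-in for an LLM call, returns 0..1

interface EvalResult {
  passed: boolean;
  failedRules: string[];
  judgeScore: number | null; // null when the judge was skipped
}

function evaluate(
  output: string,
  rules: Rule[],
  judge: Judge,
  threshold = 0.7, // assumed cutoff, chosen only for this example
): EvalResult {
  // Layer 1: cheap deterministic checks.
  const failedRules = rules.filter((r) => !r.check(output)).map((r) => r.name);
  if (failedRules.length > 0) {
    // Assumption: skip the costlier semantic layer once a hard rule has failed.
    return { passed: false, failedRules, judgeScore: null };
  }
  // Layer 2: semantic judgment of outputs that passed every rule.
  const judgeScore = judge(output);
  return { passed: judgeScore >= threshold, failedRules, judgeScore };
}

const rules: Rule[] = [
  { name: "non_empty", check: (o) => o.trim().length > 0 },
  { name: "max_500_chars", check: (o) => o.length <= 500 },
];

// Stub judge: a real one would prompt an LLM with a judge spec.
const stubJudge: Judge = (o) => (o.includes("action items") ? 0.9 : 0.4);

const good = evaluate("Summary with action items: ...", rules, stubJudge);
const empty = evaluate("   ", rules, stubJudge);
```

Here `good` passes both layers, while `empty` fails the `non_empty` rule and never reaches the judge. Running the deterministic layer first keeps obviously broken outputs from consuming LLM calls.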
Prerequisites
- Claude Code
- Node.js 20+ (for npx/npm)
Installation
Install the vibeval skill into Claude Code with one command:
npx --yes vibeval install # global: ~/.claude/skills/vibeval/
# or scope it to the current project:
npx --yes vibeval install --local # ./.claude/skills/vibeval/
Then open Claude Code and run /vibeval (or just ask it to test an AI feature). Later, npx --yes vibeval update refreshes the skill and npx --yes vibeval uninstall removes it.
The CLI itself needs no install — every invocation runs via npx --yes vibeval .... If you call it frequently and want to skip the npx lookup latency, do a one-time global install:
npm install -g vibeval
# then you can use `vibeval ...` directly in place of `npx --yes vibeval ...`
Usage
Before first use, verify that the LLM provider is set up correctly:
npx --yes vibeval check
Then run the unified workflow inside Claude Code:
/vibeval meeting_summary
The /vibeval command detects your project state and guides you through the appropriate phase:
- New project — Scans for AI code, suggests features to test, runs the full pipeline
- In progress — Verifies existing artifacts, continues from where you left off
- Complete — Detects code changes for incremental updates, or lets you re-run, add tests, or modify designs
Each phase (analyze → design → code → synthesize → run) pauses for your review before continuing. Every step produces editable intermediate files.
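One way to picture the state detection and resume behavior described above is the sketch below. Only the five phase names come from this README. The `detectState` function, the `ProjectState` type, and the idea of tracking completed phases as a set are hypothetical and do not reflect how vibeval inspects its artifacts.

```typescript
// Phase names come from the README; everything else is a hypothetical sketch.
const PHASES = ["analyze", "design", "code", "synthesize", "run"] as const;
type Phase = (typeof PHASES)[number];

type ProjectState =
  | { kind: "new"; next: Phase }
  | { kind: "in_progress"; next: Phase }
  | { kind: "complete" };

// Given the phases whose artifacts already exist, decide where to resume.
function detectState(done: ReadonlySet<Phase>): ProjectState {
  // The first phase without artifacts is the one to run next.
  const next = PHASES.find((p) => !done.has(p));
  if (next === undefined) return { kind: "complete" };
  return done.size === 0
    ? { kind: "new", next }
    : { kind: "in_progress", next };
}

const fresh = detectState(new Set<Phase>());
const resumed = detectState(new Set<Phase>(["analyze", "design"]));
const finished = detectState(new Set<Phase>(PHASES));
```

In this sketch `fresh` starts at `analyze`, `resumed` continues at `code`, and `finished` reports `complete`, which is where the real workflow would offer incremental updates or re-runs.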
Cross-Version Comparison
# Statistical comparison
npx --yes vibeval diff meeting_summary run_a run_b
# LLM deep comparison
npx --yes vibeval compare meeting_summary run_a run_b
Interactive Dashboard
npx --yes vibeval serve
Launches a web dashboard to browse all features, view test results and traces, visualize trends across runs, and manage datasets and judge specs. The server binds to 127.0.0.1:8080 by default; pass --open to also open the dashboard in your default browser, or --host / --port to change where it listens.
Data Validation
# Validate datasets, results, and analysis/design artifacts against the protocol
npx --yes vibeval validate meeting_summary
Checks manifest structure, judge specs, data item fields, _mock_context, trace format, and the agent-tools 5-dimension coverage matrix (Rule 7) when analysis.yaml + design.yaml are present.
Other Commands
# Show evaluation summary
npx --yes vibeval summary meeting_summary latest
# List features and runs
npx --yes vibeval features
npx --yes vibeval runs meeting_summary
# See all commands
npx --yes vibeval --help
Development
The project is a single TypeScript package (esbuild + vitest) at the repo root:
npm install
npm test # vitest
npm run build # → dist/cli.js
node dist/cli.js --help
See CLAUDE.md for the project guide.
License
MIT
