hustle-vercel-ai
v0.1.0
Registry-driven eval dashboard + AGENTS.md + docs index for Next.js AI projects. Covers Vercel AI Gateway, AI SDK v6, AI Elements, and the eval system. Achieves 100% agent eval pass rate (+47pp over baseline) via the AGENTS.md + docs index methodology.
Eval System — hustle-vercel-ai
A portable, registry-driven eval and testing dashboard for Next.js + Vercel AI SDK projects.
Drop this folder into any project that follows the naming contract and it works automatically — zero configuration.
Install
Via npm (recommended)
```sh
# Install into the current directory
npx hustle-vercel-ai

# Install into a specific directory
npx hustle-vercel-ai ./my-new-project
```

The CLI copies all files, adds package.json scripts, creates .github/workflows/ci.yml, and prints a setup checklist.
Note on docs-Vercel/: the Vercel AI SDK reference docs (~200 files) are not bundled in the npm package due to size. Download them from the GitHub repo and copy docs-Vercel/ into your project manually, or skip them if you don't need offline docs.
Via shell script (from source)
```sh
chmod +x eval-system/install.sh
./eval-system/install.sh /path/to/your-new-project

# Or install into the current directory:
./eval-system/install.sh .
```

The shell script also copies docs-Vercel/ automatically if it's present alongside eval-system/.
What gets installed
AGENTS.md ← fill in [TODO] sections for your project
NAMING-CONTRACT.md ← registry interface contract
.github/workflows/ci.yml ← GitHub Actions: typecheck + unit tests on every push
docs/ ← QUICK-START, EVAL-TESTING-STRATEGY, AI-SDK-PRIMITIVES
docs-Vercel/ ← AI SDK, Gateway, Elements docs (shell script only)
lib/eval/ ← eval types, cost calculator, api payloads, theme injector
lib/hooks/use-eval-registry.ts ← central hook (auto-discovers all tools + routes)
app/eval/page.tsx ← /eval live dashboard (4 tabs)
app/api/eval/run/route.ts ← AI eval runner (MockLanguageModelV3 or real)
app/api/eval/save/route.ts ← writes public/eval/report.json
tests/unit/lib/tool-registry-sync.test.ts ← registry sync guard (catches missing tool pieces)
tests/unit/lib/api-routes-sync.test.ts ← route sync guard (catches unregistered routes)
tests/e2e/eval-dashboard.spec.ts ← Playwright CI spec
public/eval/report.json ← seed report
package.json scripts ← eval:components, eval:e2e, eval:report

See docs/QUICK-START.md for the full step-by-step setup guide.
What It Does
- Auto-discovers every tool, API route, and UI component from your project's registries — no manual test maintenance
- Renders every tool's UI (custom + interactive) in all 4 states × light + dark mode simultaneously
- Health-checks every API route with real HTTP requests + token cost tracking
- Evaluates AI output quality using MockLanguageModelV3 (free) or real models
- Guards your registries on every push via GitHub Actions — CI fails if a new route or tool is missing required pieces
- Generates a browsable JSON report at /eval/report.json
Built on Vercel AI SDK ai/test primitives: MockLanguageModelV3, mockValues, simulateReadableStream, generateObject.
Quick Start
1. Update project-specific files
lib/eval/cost-calculator.ts — update COST_PER_M with your model IDs and pricing:
```ts
export const COST_PER_M = {
  'your-provider/your-model': { input: 3.00, output: 15.00, label: 'Your Model' },
};
```

lib/eval/api-payloads.ts — add one minimal payload per API route key:

```ts
export const MINIMAL_PAYLOADS = {
  'your-route': { param: 'minimal valid value' },
  'your-get-route': null, // GET routes: set null, isGetRoute() auto-detects from registry
  'your-skip-route': null, // multipart/form-data routes: set null to skip
};
```

GET and skip logic is automatic — isGetRoute() reads methods: ['GET'] from your API_ROUTE_REGISTRY, and shouldSkipRoute() derives from MINIMAL_PAYLOADS[key] === null. No GET_ROUTES or SKIP_ROUTES sets to maintain.
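To make the derivation concrete, here is a minimal, dependency-free sketch of what isGetRoute() and shouldSkipRoute() might look like. The registry shapes follow the naming contract, but the route keys and the exact implementation are illustrative, not the code shipped in lib/eval/:

```ts
// Hypothetical registry entries, shaped per the naming contract.
type RouteEntry = { key: string; path: string; methods: string[] };

const API_ROUTE_REGISTRY: RouteEntry[] = [
  { key: 'generate-component', path: '/api/generate-component', methods: ['POST'] },
  { key: 'health', path: '/api/health', methods: ['GET'] },
  { key: 'upload', path: '/api/upload', methods: ['POST'] }, // multipart: skipped
];

const MINIMAL_PAYLOADS: Record<string, object | null> = {
  'generate-component': { prompt: 'hello' },
  health: null, // GET route: no body needed
  upload: null, // multipart route: skip
};

// A route is GET if the registry declares the GET method for it.
function isGetRoute(key: string): boolean {
  const entry = API_ROUTE_REGISTRY.find(r => r.key === key);
  return entry?.methods.includes('GET') ?? false;
}

// A route is skipped when its payload is explicitly null AND it is not a GET route.
function shouldSkipRoute(key: string): boolean {
  return MINIMAL_PAYLOADS[key] === null && !isGetRoute(key);
}
```

The point of deriving both checks from data you already maintain is that there is no second list to forget to update.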
2. Add package.json scripts
```json
{
  "typecheck": "tsc --noEmit",
  "test": "vitest run",
  "eval:components": "vitest run tests/unit/components/",
  "eval:e2e": "playwright test tests/e2e/eval-dashboard.spec.ts",
  "eval:report": "open http://localhost:3000/eval/report.json",
  "eval:real": "node_modules/.bin/esbuild scripts/run-real-eval.ts --bundle --platform=node --format=cjs --alias:@=. --outfile=.eval-bundle.cjs && node .eval-bundle.cjs; rm -f .eval-bundle.cjs"
}
```

eval:real hits your deployed Vercel URL with real API keys. Requires the EVAL_BASE_URL env var.
3. Run
```sh
pnpm dev
open http://localhost:3000/eval
```

The Dashboard (/eval)
Tab 1 — Components
Renders every tool (custom UI + interactive) in a 4-state × 2-theme matrix:
| State | ☀ Light | 🌙 Dark |
|-------|---------|---------|
| Streaming | render | render |
| Loading | render | render |
| Result | render | render |
| Error | render | render |
- Custom UI tools (ui: 'custom') — rendered via TOOL_RENDERERS with mock data
- Interactive tools (ui: 'interactive') — rendered via renderInteractiveToolPreview for the input-available state; other states marked skip
- Pass = non-null render, no thrown error
- Fail = thrown error or null/undefined output
- Skip = state not applicable for this tool type
- Click any row to expand and see the live rendered cells
- Filter by category or pass/fail/skip status
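As a rough sketch, the 4-state × 2-theme matrix and its pass/fail classification could be enumerated like this. The render callback is a stub standing in for the real renderer lookup; the cell-evaluation logic here is illustrative, not the dashboard's actual implementation:

```ts
// States and themes from the matrix above.
const STATES = ['streaming', 'loading', 'result', 'error'] as const;
const THEMES = ['light', 'dark'] as const;

type CellStatus = 'pass' | 'fail' | 'skip';

// Render each (state, theme) cell and classify the outcome:
// pass = non-null render, fail = thrown error or null/undefined output.
function evaluateTool(
  render: (state: string, theme: string) => unknown,
): Record<string, Record<string, CellStatus>> {
  const matrix: Record<string, Record<string, CellStatus>> = {};
  for (const state of STATES) {
    matrix[state] = {};
    for (const theme of THEMES) {
      try {
        const out = render(state, theme);
        matrix[state][theme] = out == null ? 'fail' : 'pass';
      } catch {
        matrix[state][theme] = 'fail';
      }
    }
  }
  return matrix;
}
```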
Tab 2 — API Routes
Real HTTP health checks against the running dev server:
| Field | What it shows |
|-------|--------------|
| Status | HTTP response code (200 = green, 4xx = amber, 5xx = red) |
| Latency | Round-trip ms |
| Shape | Required top-level keys present in response |
| Tokens | Input + output token counts |
| Cost | Estimated USD from COST_PER_M pricing table |
Session cost total shown in the header — tracks every real API call made during the session.
Routes that require special auth or multipart data are automatically marked skip.
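The Cost column is simple arithmetic over the COST_PER_M table from the Quick Start: tokens divided by one million, times the per-million rate, summed for input and output. A minimal sketch (the pricing values are the placeholder ones from step 1, not real rates):

```ts
// Placeholder pricing from the Quick Start example: USD per million tokens.
const COST_PER_M: Record<string, { input: number; output: number }> = {
  'your-provider/your-model': { input: 3.0, output: 15.0 },
};

// (tokens / 1,000,000) × price-per-million, summed for input and output sides.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = COST_PER_M[model];
  if (!price) return 0; // unknown model: report zero rather than guess
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}
```

The session total in the header is just the sum of this value over every real call made.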
Tab 3 — AI Evals
Quality evaluation for all non-interview tools (generation, editing, analysis, brand-profile) using the Evaluator-Optimizer pattern:
Generate output → Evaluate (score 1–10) → Flag issues

- Mock mode (▶ Run All) — MockLanguageModelV3, zero cost, instant
- Real mode (⚡ Run All Real) — live model call, real tokens, cost tracked
Metrics per tool:
- Schema valid ✓/✗
- Output non-empty ✓/✗
- Quality score 1–10
- Latency ms + token counts + estimated cost
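The generate → score → flag-issues control flow can be sketched without the SDK. The generator and scorer below are stubs standing in for MockLanguageModelV3 or a real model call; the quality threshold of 7 is an assumption for illustration, not the runner's actual cutoff:

```ts
type EvalResult = {
  output: string;
  outputNonEmpty: boolean;
  qualityScore: number; // 1-10
  issues: string[];
};

// Evaluator-Optimizer loop, one iteration: generate, then score, then flag.
function runEval(
  generate: () => string,          // stub for the model generation step
  score: (output: string) => number, // stub for the evaluator step
): EvalResult {
  const output = generate();
  const qualityScore = score(output);
  const issues: string[] = [];
  if (output.trim().length === 0) issues.push('empty output');
  if (qualityScore < 7) issues.push(`low quality score: ${qualityScore}`);
  return {
    output,
    outputNonEmpty: output.trim().length > 0,
    qualityScore,
    issues,
  };
}
```

The real runner additionally validates output shape (schema valid ✓/✗, via generateObject + Zod) and records latency, tokens, and cost per call.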
Tab 4 — Costs
Sorted breakdown of every real API call made during the session — by cost descending. Shows model, tokens, latency, and USD per call.
How Auto-Discovery Works
Everything derives from two registries. Add an entry → it appears in the dashboard automatically.
```ts
// Components tab — all tools with renderers OR interactive UI
const componentEvals = TOOL_REGISTRY.filter(
  t => t.name in TOOL_RENDERERS || t.ui === 'interactive'
);

// AI Evals tab — all non-interview tools
const aiEvals = TOOL_REGISTRY.filter(
  t => !NO_AI_EVAL_CATEGORIES.includes(t.category)
);

// Routes tab — all registered API routes
const routeEvals = API_ROUTE_REGISTRY.map(r => ({
  ...r,
  payload: MINIMAL_PAYLOADS[r.key] ?? {},
}));
```

Adding a New AI Tool
4 files, then pnpm test catches anything missing:
1. lib/ai/tool-meta.ts ← add metadata (name, label, type, ui, category)
→ tool immediately appears in Components + AI Evals tabs
2. app/api/chat/route.ts ← add tool() definition with inputSchema + execute
→ omit execute for client-side interactive tools
3. components/.../toolRenderers.tsx ← add { streaming, loading, result, error } renderers
→ required only for ui: 'custom' tools
4. lib/registry/mock-tool-data.ts ← add MOCK_INPUTS[name] + MOCK_OUTPUTS[name]

pnpm test will fail with the exact tool name if you skip any step:
| What you forgot | Test that fails | Error message |
|---|---|---|
| route.ts entry | tool-registry-sync | Tools in TOOL_META but missing from route.ts: myTool |
| Renderer | tool-registry-sync | Custom tools missing renderers: myTool |
| Mock data | tool-registry-sync | Server tools with no real mock output: myTool |
| Renderer crashes | tool-renderers | Stack trace with state + theme |
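At heart, a sync guard like tool-registry-sync is a set-difference check between registries. A minimal sketch of the renderer check (registry shapes and tool names are illustrative; the real test covers route.ts entries and mock data the same way):

```ts
// Illustrative registry shapes; the real ones live in lib/ai and components/.
const TOOL_META: Record<string, { ui: 'custom' | 'interactive' }> = {
  myTool: { ui: 'custom' },
  pickColor: { ui: 'interactive' },
};

const TOOL_RENDERERS: Record<string, object> = {
  // myTool intentionally missing its renderer
};

// Custom tools must have an entry in TOOL_RENDERERS; interactive ones need not.
function missingRenderers(): string[] {
  return Object.entries(TOOL_META)
    .filter(([name, meta]) => meta.ui === 'custom' && !(name in TOOL_RENDERERS))
    .map(([name]) => name);
}

const missing = missingRenderers();
if (missing.length > 0) {
  // In the real test this message is a test failure, naming the exact tool:
  console.log(`Custom tools missing renderers: ${missing.join(', ')}`);
}
```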
Adding a New API Route
3 files, then pnpm test catches anything missing:
1. app/api/my-route/route.ts ← create the route file
2. lib/registry/api-routes.ts ← register it (key, label, methods: ['POST'], category)
→ route immediately appears in Routes tab
→ methods: ['GET'] → isGetRoute() auto-detects it
3. lib/eval/api-payloads.ts ← add minimal JSON payload (or null to skip)

GET and skip logic is fully automatic:

- methods: ['GET'] in the registry → isGetRoute() returns true, no body sent
- MINIMAL_PAYLOADS[key] = null → shouldSkipRoute() returns true, route skipped
- No GET_ROUTES or SKIP_ROUTES sets to maintain anywhere
pnpm test will fail if you skip step 2:
| What you forgot | Test that fails | Error message |
|---|---|---|
| Registry entry | api-routes-sync | Route files on disk but missing from API_ROUTES: my-route |
| Route file deleted | api-routes-sync | Registry entries with no route file: my-route |
| Method mismatch | api-routes-sync | my-route: exports POST but registry only declares [GET] |
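The disk ↔ registry check is a set difference run in both directions. A minimal sketch (in the real test the on-disk list comes from scanning app/api/ for route files; here it is an in-memory array, and the route keys are illustrative):

```ts
// Illustrative inputs: route keys found on disk vs. keys in API_ROUTE_REGISTRY.
const routeFilesOnDisk = ['generate-component', 'my-route'];
const API_ROUTES = ['generate-component'];

// Flag files without a registry entry, and registry entries without a file.
function syncErrors(onDisk: string[], registered: string[]): string[] {
  const errors: string[] = [];
  const unregistered = onDisk.filter(k => !registered.includes(k));
  const orphaned = registered.filter(k => !onDisk.includes(k));
  if (unregistered.length > 0)
    errors.push(`Route files on disk but missing from API_ROUTES: ${unregistered.join(', ')}`);
  if (orphaned.length > 0)
    errors.push(`Registry entries with no route file: ${orphaned.join(', ')}`);
  return errors;
}
```

Running the sketch on the inputs above would flag my-route as unregistered, mirroring the error message in the table.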
CI — Automatic on Every Push
.github/workflows/ci.yml runs on every push and PR to main/develop:
```
git push
 └── GitHub Actions
      ├── pnpm typecheck           ← TypeScript errors block merge
      └── pnpm test                ← Registry sync tests block merge
           ├── api-routes-sync     ← new route file not registered → FAIL
           ├── tool-registry-sync  ← new tool missing renderer/mock → FAIL
           └── tool-renderers      ← renderer crashes with mock data → FAIL
```

A PR that adds a new route or tool but skips required steps will be blocked from merging.
JSON Report
After running evals, click 💾 Save Report to write public/eval/report.json.
Browse it at: http://localhost:3000/eval/report.json
```json
{
  "timestamp": "2026-02-18T17:34:00Z",
  "projectName": "My Project",
  "summary": {
    "total": 48,
    "pass": 45,
    "fail": 2,
    "skip": 1,
    "totalCostUsd": 0.0042
  },
  "components": [
    {
      "tool": "generateComponent",
      "states": {
        "input-streaming": { "light": { "status": "pass" }, "dark": { "status": "pass" } },
        "output-available": { "light": { "status": "pass" }, "dark": { "status": "pass" } }
      }
    }
  ],
  "routes": [
    {
      "key": "generate-component",
      "status": 200,
      "latencyMs": 1240,
      "shapeValid": true,
      "tokens": { "input": 312, "output": 890 },
      "costUsd": 0.00089
    }
  ],
  "aiEvals": [
    {
      "tool": "generateComponent",
      "schemaValid": true,
      "outputNonEmpty": true,
      "qualityScore": 9,
      "tokens": { "input": 50, "output": 120 },
      "costUsd": 0
    }
  ]
}
```

Vercel AI SDK Primitives Used
| Primitive | Where used |
|-----------|-----------|
| MockLanguageModelV3 | /api/eval/run — mock AI evals, zero cost |
| mockValues(...arr) | Cycling responses for evaluator-optimizer loops |
| generateObject + Zod | Schema validation + quality scoring |
| generateText | Text output generation evals |
| simulateReadableStream | Streaming correctness tests |
All from ai/test — no real API calls in mock mode.
Files Reference
```
eval-system/
├── README.md                    ← This file
├── NAMING-CONTRACT.md           ← Exact interface each registry must satisfy
├── AGENTS.md                    ← AI agent config template (fill in [TODO] sections)
│
├── .github/
│   └── workflows/
│       └── ci.yml               ← GitHub Actions: typecheck + unit tests on push
│
├── lib/
│   ├── eval/
│   │   ├── eval-types.ts        ← Shared types (EvalStatus, EvalReport, EvalProgress)
│   │   ├── cost-calculator.ts   ← Token → $ calculator (update COST_PER_M)
│   │   ├── api-payloads.ts      ← Minimal payloads per route (update per project)
│   │   └── theme-injector.ts    ← data-theme attr helpers for light/dark columns
│   └── hooks/
│       └── use-eval-registry.ts ← Central hook — auto-discovers, runs, tracks cost
│
├── app/
│   ├── eval/
│   │   └── page.tsx             ← /eval live dashboard (4 tabs)
│   └── api/
│       └── eval/
│           ├── run/route.ts     ← Mock/real AI eval runner
│           └── save/route.ts    ← Writes public/eval/report.json
│
├── tests/
│   ├── unit/
│   │   └── lib/
│   │       ├── tool-registry-sync.test.ts ← TOOL_META ↔ route.ts ↔ renderers ↔ mock data
│   │       └── api-routes-sync.test.ts    ← Disk ↔ API_ROUTES ↔ HTTP methods
│   └── e2e/
│       └── eval-dashboard.spec.ts ← Playwright: matrix + screenshots
│
└── public/
    └── eval/
        └── report.json          ← Seed file (auto-overwritten on save)
```

Agent Context — AGENTS.md + Docs Index
This package ships AGENTS.md and docs-Vercel/ implementing the AGENTS.md + docs index methodology from Vercel's agent evals research.
> "AGENTS.md outperforms skills in our agent evals" — Vercel, 2025
Vercel's findings (tested against Next.js 16 APIs not in model training data):
| Configuration | Pass Rate | vs Baseline |
|---|---|---|
| Baseline (no docs) | 53% | — |
| Skill (default behavior) | 53% | +0pp |
| Skill with explicit instructions | 79% | +26pp |
| AGENTS.md + docs index | 100% | +47pp |
The approach: instead of hoping agents invoke a skill, embed a compressed docs index directly in AGENTS.md. The agent knows where every doc file lives and reads exactly what it needs — no decision point, no async loading, no ordering fragility.
What's included:
- AGENTS.md — compressed index of all doc paths (8KB, 80% smaller than full docs)
- docs-Vercel/ — full Vercel AI SDK v6, Gateway, Elements, and Capabilities reference docs
- AGENTS.template.md — template for new projects (fill in [TODO] sections)
Note: docs-Vercel/ (~200 files) is not bundled in the npm package due to size. Install via shell script or download from the GitHub repo to get offline docs.
Naming Contract Summary
See NAMING-CONTRACT.md for full details.
| Required export | From path | Notes |
|----------------|-----------|-------|
| TOOL_REGISTRY | lib/registry | Array of all tools with name, label, ui, category |
| API_ROUTE_REGISTRY | lib/registry | Array of all routes with key, path, methods |
| TOOL_RENDERERS | components/.../chat | Map of tool name → { streaming, loading, result, error } |
| renderInteractiveToolPreview(name) | components/.../HustleChat | Renders interactive tool form for eval preview |
| getMockToolPart(name, state) | lib/registry/mock-tool-data | Returns mock ToolPart for any tool + state |
| COST_PER_M | lib/eval/cost-calculator | Update this — model ID → { input, output } $/M tokens |
| MINIMAL_PAYLOADS | lib/eval/api-payloads | Update this — route key → minimal valid request body (null = skip) |
GET/skip is automatic — isGetRoute() and shouldSkipRoute() derive from API_ROUTE_REGISTRY.methods and MINIMAL_PAYLOADS respectively. No extra sets to maintain.
