hustle-vercel-ai
v0.1.0
Registry-driven eval dashboard + AGENTS.md + docs index for Next.js AI projects. Covers Vercel AI Gateway, AI SDK v6, AI Elements, and the eval system. Achieves 100% agent eval pass rate (+47pp over baseline) via the AGENTS.md + docs index methodology.
Eval System — hustle-vercel-ai
A portable, registry-driven eval and testing dashboard for Next.js + Vercel AI SDK projects.
Drop this folder into any project that follows the naming contract and it works automatically — zero configuration.
Install
Via npm (recommended)
```sh
# Install into the current directory
npx hustle-vercel-ai

# Install into a specific directory
npx hustle-vercel-ai ./my-new-project
```

The CLI copies all files, adds package.json scripts, creates .github/workflows/ci.yml, and prints a setup checklist.
Note on docs-Vercel/: the Vercel AI SDK reference docs (~200 files) are not bundled in the npm package due to size. Download them from the GitHub repo and copy docs-Vercel/ into your project manually, or skip them if you don't need offline docs.
Via shell script (from source)
```sh
chmod +x eval-system/install.sh
./eval-system/install.sh /path/to/your-new-project

# Or install into the current directory:
./eval-system/install.sh .
```

The shell script also copies docs-Vercel/ automatically if it's present alongside eval-system/.
What gets installed
AGENTS.md ← fill in [TODO] sections for your project
NAMING-CONTRACT.md ← registry interface contract
.github/workflows/ci.yml ← GitHub Actions: typecheck + unit tests on every push
docs/ ← QUICK-START, EVAL-TESTING-STRATEGY, AI-SDK-PRIMITIVES
docs-Vercel/ ← AI SDK, Gateway, Elements docs (shell script only)
lib/eval/ ← eval types, cost calculator, api payloads, theme injector
lib/hooks/use-eval-registry.ts ← central hook (auto-discovers all tools + routes)
app/eval/page.tsx ← /eval live dashboard (4 tabs)
app/api/eval/run/route.ts ← AI eval runner (MockLanguageModelV3 or real)
app/api/eval/save/route.ts ← writes public/eval/report.json
tests/unit/lib/tool-registry-sync.test.ts ← registry sync guard (catches missing tool pieces)
tests/unit/lib/api-routes-sync.test.ts ← route sync guard (catches unregistered routes)
tests/e2e/eval-dashboard.spec.ts ← Playwright CI spec
public/eval/report.json ← seed report
package.json scripts ← eval:components, eval:e2e, eval:report

See docs/QUICK-START.md for the full step-by-step setup guide.
What It Does
- Auto-discovers every tool, API route, and UI component from your project's registries — no manual test maintenance
- Renders every tool's UI (custom + interactive) in all 4 states × light + dark mode simultaneously
- Health-checks every API route with real HTTP requests + token cost tracking
- Evaluates AI output quality using MockLanguageModelV3 (free) or real models
- Guards your registries on every push via GitHub Actions — CI fails if a new route or tool is missing required pieces
- Generates a browsable JSON report at /eval/report.json
Built on Vercel AI SDK ai/test primitives: MockLanguageModelV3, mockValues, simulateReadableStream, generateObject.
Quick Start
1. Update project-specific files
lib/eval/cost-calculator.ts — update COST_PER_M with your model IDs and pricing:
```ts
export const COST_PER_M = {
  'your-provider/your-model': { input: 3.00, output: 15.00, label: 'Your Model' },
};
```

lib/eval/api-payloads.ts — add one minimal payload per API route key:

```ts
export const MINIMAL_PAYLOADS = {
  'your-route': { param: 'minimal valid value' },
  'your-get-route': null, // GET routes: set null, isGetRoute() auto-detects from registry
  'your-skip-route': null, // multipart/form-data routes: set null to skip
};
```

GET and skip logic is automatic — isGetRoute() reads methods: ['GET'] from your API_ROUTE_REGISTRY, and shouldSkipRoute() derives from MINIMAL_PAYLOADS[key] === null. No GET_ROUTES or SKIP_ROUTES sets to maintain.
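To make the derivation concrete, here is a minimal, dependency-free sketch of what isGetRoute() and shouldSkipRoute() might look like. The registry shapes follow the naming contract, but the route keys and the exact implementation are illustrative, not the code shipped in lib/eval/:

```ts
// Hypothetical registry entries, shaped per the naming contract.
type RouteEntry = { key: string; path: string; methods: string[] };

const API_ROUTE_REGISTRY: RouteEntry[] = [
  { key: 'generate-component', path: '/api/generate-component', methods: ['POST'] },
  { key: 'health', path: '/api/health', methods: ['GET'] },
  { key: 'upload', path: '/api/upload', methods: ['POST'] }, // multipart: skipped
];

const MINIMAL_PAYLOADS: Record<string, object | null> = {
  'generate-component': { prompt: 'hello' },
  health: null, // GET route: no body needed
  upload: null, // multipart route: skip
};

// A route is GET if the registry declares the GET method for it.
function isGetRoute(key: string): boolean {
  const entry = API_ROUTE_REGISTRY.find(r => r.key === key);
  return entry?.methods.includes('GET') ?? false;
}

// A route is skipped when its payload is explicitly null AND it is not a GET route.
function shouldSkipRoute(key: string): boolean {
  return MINIMAL_PAYLOADS[key] === null && !isGetRoute(key);
}
```

The point of deriving both checks from data you already maintain is that there is no second list to forget to update.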
2. Add package.json scripts
```json
{
  "typecheck": "tsc --noEmit",
  "test": "vitest run",
  "eval:components": "vitest run tests/unit/components/",
  "eval:e2e": "playwright test tests/e2e/eval-dashboard.spec.ts",
  "eval:report": "open http://localhost:3000/eval/report.json",
  "eval:real": "node_modules/.bin/esbuild scripts/run-real-eval.ts --bundle --platform=node --format=cjs --alias:@=. --outfile=.eval-bundle.cjs && node .eval-bundle.cjs; rm -f .eval-bundle.cjs"
}
```

eval:real hits your deployed Vercel URL with real API keys. Requires the EVAL_BASE_URL env var.
3. Run
```sh
pnpm dev
open http://localhost:3000/eval
```

The Dashboard (/eval)
Tab 1 — Components
Renders every tool (custom UI + interactive) in a 4-state × 2-theme matrix:
| State | ☀ Light | 🌙 Dark |
|-------|---------|---------|
| Streaming | render | render |
| Loading | render | render |
| Result | render | render |
| Error | render | render |
- Custom UI tools (ui: 'custom') — rendered via TOOL_RENDERERS with mock data
- Interactive tools (ui: 'interactive') — rendered via renderInteractiveToolPreview for the input-available state; other states marked skip
- Pass = non-null render, no thrown error
- Fail = thrown error or null/undefined output
- Skip = state not applicable for this tool type
- Click any row to expand and see the live rendered cells
- Filter by category or pass/fail/skip status
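As a rough sketch, the 4-state × 2-theme matrix and its pass/fail classification could be enumerated like this. The render callback is a stub standing in for the real renderer lookup; the cell-evaluation logic here is illustrative, not the dashboard's actual implementation:

```ts
// States and themes from the matrix above.
const STATES = ['streaming', 'loading', 'result', 'error'] as const;
const THEMES = ['light', 'dark'] as const;

type CellStatus = 'pass' | 'fail' | 'skip';

// Render each (state, theme) cell and classify the outcome:
// pass = non-null render, fail = thrown error or null/undefined output.
function evaluateTool(
  render: (state: string, theme: string) => unknown,
): Record<string, Record<string, CellStatus>> {
  const matrix: Record<string, Record<string, CellStatus>> = {};
  for (const state of STATES) {
    matrix[state] = {};
    for (const theme of THEMES) {
      try {
        const out = render(state, theme);
        matrix[state][theme] = out == null ? 'fail' : 'pass';
      } catch {
        matrix[state][theme] = 'fail';
      }
    }
  }
  return matrix;
}
```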
Tab 2 — API Routes
Real HTTP health checks against the running dev server:
| Field | What it shows |
|-------|--------------|
| Status | HTTP response code (200 = green, 4xx = amber, 5xx = red) |
| Latency | Round-trip ms |
| Shape | Required top-level keys present in response |
| Tokens | Input + output token counts |
| Cost | Estimated USD from COST_PER_M pricing table |
Session cost total shown in the header — tracks every real API call made during the session.
Routes that require special auth or multipart data are automatically marked skip.
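The Cost column is simple arithmetic over the COST_PER_M table from the Quick Start: tokens divided by one million, times the per-million rate, summed for input and output. A minimal sketch (the pricing values are the placeholder ones from step 1, not real rates):

```ts
// Placeholder pricing from the Quick Start example: USD per million tokens.
const COST_PER_M: Record<string, { input: number; output: number }> = {
  'your-provider/your-model': { input: 3.0, output: 15.0 },
};

// (tokens / 1,000,000) × price-per-million, summed for input and output sides.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = COST_PER_M[model];
  if (!price) return 0; // unknown model: report zero rather than guess
  return (inputTokens / 1_000_000) * price.input + (outputTokens / 1_000_000) * price.output;
}
```

The session total in the header is just the sum of this value over every real call made.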
Tab 3 — AI Evals
Quality evaluation for all non-interview tools (generation, editing, analysis, brand-profile) using the Evaluator-Optimizer pattern:
Generate output → Evaluate (score 1–10) → Flag issues

- Mock mode (▶ Run All) — MockLanguageModelV3, zero cost, instant
- Real mode (⚡ Run All Real) — live model call, real tokens, cost tracked
Metrics per tool:
- Schema valid ✓/✗
- Output non-empty ✓/✗
- Quality score 1–10
- Latency ms + token counts + estimated cost
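The generate → score → flag-issues control flow can be sketched without the SDK. The generator and scorer below are stubs standing in for MockLanguageModelV3 or a real model call; the quality threshold of 7 is an assumption for illustration, not the runner's actual cutoff:

```ts
type EvalResult = {
  output: string;
  outputNonEmpty: boolean;
  qualityScore: number; // 1-10
  issues: string[];
};

// Evaluator-Optimizer loop, one iteration: generate, then score, then flag.
function runEval(
  generate: () => string,          // stub for the model generation step
  score: (output: string) => number, // stub for the evaluator step
): EvalResult {
  const output = generate();
  const qualityScore = score(output);
  const issues: string[] = [];
  if (output.trim().length === 0) issues.push('empty output');
  if (qualityScore < 7) issues.push(`low quality score: ${qualityScore}`);
  return {
    output,
    outputNonEmpty: output.trim().length > 0,
    qualityScore,
    issues,
  };
}
```

The real runner additionally validates output shape (schema valid ✓/✗, via generateObject + Zod) and records latency, tokens, and cost per call.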
Tab 4 — Costs
Sorted breakdown of every real API call made during the session — by cost descending. Shows model, tokens, latency, and USD per call.
How Auto-Discovery Works
Everything derives from two registries. Add an entry → it appears in the dashboard automatically.
```ts
// Components tab — all tools with renderers OR interactive UI
const componentEvals = TOOL_REGISTRY.filter(
  t => t.name in TOOL_RENDERERS || t.ui === 'interactive'
);

// AI Evals tab — all non-interview tools
const aiEvals = TOOL_REGISTRY.filter(
  t => !NO_AI_EVAL_CATEGORIES.includes(t.category)
);

// Routes tab — all registered API routes
const routeEvals = API_ROUTE_REGISTRY.map(r => ({
  ...r,
  payload: MINIMAL_PAYLOADS[r.key] ?? {},
}));
```

Adding a New AI Tool
4 files, then pnpm test catches anything missing:
1. lib/ai/tool-meta.ts ← add metadata (name, label, type, ui, category)
→ tool immediately appears in Components + AI Evals tabs
2. app/api/chat/route.ts ← add tool() definition with inputSchema + execute
→ omit execute for client-side interactive tools
3. components/.../toolRenderers.tsx ← add { streaming, loading, result, error } renderers
→ required only for ui: 'custom' tools
4. lib/registry/mock-tool-data.ts ← add MOCK_INPUTS[name] + MOCK_OUTPUTS[name]

pnpm test will fail with the exact tool name if you skip any step:
| What you forgot | Test that fails | Error message |
|---|---|---|
| route.ts entry | tool-registry-sync | Tools in TOOL_META but missing from route.ts: myTool |
| Renderer | tool-registry-sync | Custom tools missing renderers: myTool |
| Mock data | tool-registry-sync | Server tools with no real mock output: myTool |
| Renderer crashes | tool-renderers | Stack trace with state + theme |
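At heart, a sync guard like tool-registry-sync is a set-difference check between registries. A minimal sketch of the renderer check (registry shapes and tool names are illustrative; the real test covers route.ts entries and mock data the same way):

```ts
// Illustrative registry shapes; the real ones live in lib/ai and components/.
const TOOL_META: Record<string, { ui: 'custom' | 'interactive' }> = {
  myTool: { ui: 'custom' },
  pickColor: { ui: 'interactive' },
};

const TOOL_RENDERERS: Record<string, object> = {
  // myTool intentionally missing its renderer
};

// Custom tools must have an entry in TOOL_RENDERERS; interactive ones need not.
function missingRenderers(): string[] {
  return Object.entries(TOOL_META)
    .filter(([name, meta]) => meta.ui === 'custom' && !(name in TOOL_RENDERERS))
    .map(([name]) => name);
}

const missing = missingRenderers();
if (missing.length > 0) {
  // In the real test this message is a test failure, naming the exact tool:
  console.log(`Custom tools missing renderers: ${missing.join(', ')}`);
}
```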
Adding a New API Route
3 files, then pnpm test catches anything missing:
1. app/api/my-route/route.ts ← create the route file
2. lib/registry/api-routes.ts ← register it (key, label, methods: ['POST'], category)
→ route immediately appears in Routes tab
→ methods: ['GET'] → isGetRoute() auto-detects it
3. lib/eval/api-payloads.ts ← add minimal JSON payload (or null to skip)

GET and skip logic is fully automatic:

- methods: ['GET'] in the registry → isGetRoute() returns true, no body sent
- MINIMAL_PAYLOADS[key] = null → shouldSkipRoute() returns true, route skipped
- No GET_ROUTES or SKIP_ROUTES sets to maintain anywhere
pnpm test will fail if you skip step 2:
| What you forgot | Test that fails | Error message |
|---|---|---|
| Registry entry | api-routes-sync | Route files on disk but missing from API_ROUTES: my-route |
| Route file deleted | api-routes-sync | Registry entries with no route file: my-route |
| Method mismatch | api-routes-sync | my-route: exports POST but registry only declares [GET] |
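The disk ↔ registry check is a set difference run in both directions. A minimal sketch (in the real test the on-disk list comes from scanning app/api/ for route files; here it is an in-memory array, and the route keys are illustrative):

```ts
// Illustrative inputs: route keys found on disk vs. keys in API_ROUTE_REGISTRY.
const routeFilesOnDisk = ['generate-component', 'my-route'];
const API_ROUTES = ['generate-component'];

// Flag files without a registry entry, and registry entries without a file.
function syncErrors(onDisk: string[], registered: string[]): string[] {
  const errors: string[] = [];
  const unregistered = onDisk.filter(k => !registered.includes(k));
  const orphaned = registered.filter(k => !onDisk.includes(k));
  if (unregistered.length > 0)
    errors.push(`Route files on disk but missing from API_ROUTES: ${unregistered.join(', ')}`);
  if (orphaned.length > 0)
    errors.push(`Registry entries with no route file: ${orphaned.join(', ')}`);
  return errors;
}
```

Running the sketch on the inputs above would flag my-route as unregistered, mirroring the error message in the table.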
CI — Automatic on Every Push
.github/workflows/ci.yml runs on every push and PR to main/develop:
```
git push
 └── GitHub Actions
      ├── pnpm typecheck           ← TypeScript errors block merge
      └── pnpm test                ← Registry sync tests block merge
           ├── api-routes-sync     ← new route file not registered → FAIL
           ├── tool-registry-sync  ← new tool missing renderer/mock → FAIL
           └── tool-renderers      ← renderer crashes with mock data → FAIL
```

A PR that adds a new route or tool but skips required steps will be blocked from merging.
JSON Report
After running evals, click 💾 Save Report to write public/eval/report.json.
Browse it at: http://localhost:3000/eval/report.json
```json
{
  "timestamp": "2026-02-18T17:34:00Z",
  "projectName": "My Project",
  "summary": {
    "total": 48,
    "pass": 45,
    "fail": 2,
    "skip": 1,
    "totalCostUsd": 0.0042
  },
  "components": [
    {
      "tool": "generateComponent",
      "states": {
        "input-streaming": { "light": { "status": "pass" }, "dark": { "status": "pass" } },
        "output-available": { "light": { "status": "pass" }, "dark": { "status": "pass" } }
      }
    }
  ],
  "routes": [
    {
      "key": "generate-component",
      "status": 200,
      "latencyMs": 1240,
      "shapeValid": true,
      "tokens": { "input": 312, "output": 890 },
      "costUsd": 0.00089
    }
  ],
  "aiEvals": [
    {
      "tool": "generateComponent",
      "schemaValid": true,
      "outputNonEmpty": true,
      "qualityScore": 9,
      "tokens": { "input": 50, "output": 120 },
      "costUsd": 0
    }
  ]
}
```

Vercel AI SDK Primitives Used
| Primitive | Where used |
|-----------|-----------|
| MockLanguageModelV3 | /api/eval/run — mock AI evals, zero cost |
| mockValues(...arr) | Cycling responses for evaluator-optimizer loops |
| generateObject + Zod | Schema validation + quality scoring |
| generateText | Text output generation evals |
| simulateReadableStream | Streaming correctness tests |
All from ai/test — no real API calls in mock mode.
Files Reference
```
eval-system/
├── README.md                    ← This file
├── NAMING-CONTRACT.md           ← Exact interface each registry must satisfy
├── AGENTS.md                    ← AI agent config template (fill in [TODO] sections)
│
├── .github/
│   └── workflows/
│       └── ci.yml               ← GitHub Actions: typecheck + unit tests on push
│
├── lib/
│   ├── eval/
│   │   ├── eval-types.ts        ← Shared types (EvalStatus, EvalReport, EvalProgress)
│   │   ├── cost-calculator.ts   ← Token → $ calculator (update COST_PER_M)
│   │   ├── api-payloads.ts      ← Minimal payloads per route (update per project)
│   │   └── theme-injector.ts    ← data-theme attr helpers for light/dark columns
│   └── hooks/
│       └── use-eval-registry.ts ← Central hook — auto-discovers, runs, tracks cost
│
├── app/
│   ├── eval/
│   │   └── page.tsx             ← /eval live dashboard (4 tabs)
│   └── api/
│       └── eval/
│           ├── run/route.ts     ← Mock/real AI eval runner
│           └── save/route.ts    ← Writes public/eval/report.json
│
├── tests/
│   ├── unit/
│   │   └── lib/
│   │       ├── tool-registry-sync.test.ts ← TOOL_META ↔ route.ts ↔ renderers ↔ mock data
│   │       └── api-routes-sync.test.ts    ← Disk ↔ API_ROUTES ↔ HTTP methods
│   └── e2e/
│       └── eval-dashboard.spec.ts ← Playwright: matrix + screenshots
│
└── public/
    └── eval/
        └── report.json          ← Seed file (auto-overwritten on save)
```

Agent Context — AGENTS.md + Docs Index
This package ships AGENTS.md and docs-Vercel/ implementing the AGENTS.md + docs index methodology from Vercel's agent evals research.
> "AGENTS.md outperforms skills in our agent evals" — Vercel, 2025
Vercel's findings (tested against Next.js 16 APIs not in model training data):
| Configuration | Pass Rate | vs Baseline |
|---|---|---|
| Baseline (no docs) | 53% | — |
| Skill (default behavior) | 53% | +0pp |
| Skill with explicit instructions | 79% | +26pp |
| AGENTS.md + docs index | 100% | +47pp |
The approach: instead of hoping agents invoke a skill, embed a compressed docs index directly in AGENTS.md. The agent knows where every doc file lives and reads exactly what it needs — no decision point, no async loading, no ordering fragility.
What's included:
- AGENTS.md — compressed index of all doc paths (8KB, 80% smaller than full docs)
- docs-Vercel/ — full Vercel AI SDK v6, Gateway, Elements, and Capabilities reference docs
- AGENTS.template.md — template for new projects (fill in [TODO] sections)
Note: docs-Vercel/ (~200 files) is not bundled in the npm package due to size. Install via shell script or download from the GitHub repo to get offline docs.
Naming Contract Summary
See NAMING-CONTRACT.md for full details.
| Required export | From path | Notes |
|----------------|-----------|-------|
| TOOL_REGISTRY | lib/registry | Array of all tools with name, label, ui, category |
| API_ROUTE_REGISTRY | lib/registry | Array of all routes with key, path, methods |
| TOOL_RENDERERS | components/.../chat | Map of tool name → { streaming, loading, result, error } |
| renderInteractiveToolPreview(name) | components/.../HustleChat | Renders interactive tool form for eval preview |
| getMockToolPart(name, state) | lib/registry/mock-tool-data | Returns mock ToolPart for any tool + state |
| COST_PER_M | lib/eval/cost-calculator | Update this — model ID → { input, output } $/M tokens |
| MINIMAL_PAYLOADS | lib/eval/api-payloads | Update this — route key → minimal valid request body (null = skip) |
GET/skip is automatic — isGetRoute() and shouldSkipRoute() derive from API_ROUTE_REGISTRY.methods and MINIMAL_PAYLOADS respectively. No extra sets to maintain.
