# yo-bug 🐛
"Yo, bug!" — Point at bugs, AI fixes them.
MCP Server that gives AI coding assistants QA superpowers. One install, then your AI handles the entire test-feedback-fix loop.
In vibe coding, the bottleneck is testing: humans find bugs but struggle to describe them. yo-bug solves this by letting users point, click, and annotate — while the AI automatically receives element locations, console errors, network failures, action recordings, and annotated screenshots.
## What it does
| For the Human | For the AI |
|---|---|
| Click a broken element → done | Gets: CSS selector + computed styles + React/Vue component name |
| Draw a circle on a screenshot → done | Gets: annotated screenshot as image content |
| Just use the app normally | Gets: last 100 user actions (clicks, inputs, navigation) |
| Check off a test list | Gets: which tests passed/failed with linked feedback |
The AI drives the entire workflow through MCP tools. Humans never need to learn commands, configure proxies, or modify their code.
## Install

```sh
npx yo-bug install
# That's it. Your AI now has QA superpowers.
```

This auto-detects your AI tool (Claude Code / Cursor / Windsurf) and writes the MCP config. One time, done forever.
## How it works

```
AI writes code
→ AI calls start_test_session()
→ Browser opens with test overlay injected (zero code changes)
→ AI pushes a test checklist (8 QA dimensions, 40+ sub-scenarios)
→ Human tests, clicks problems, checks off items
→ AI calls list_feedbacks() → sees everything
→ AI fixes → calls resolve_feedback() → browser asks human to verify
→ Loop until done
```

## MCP Tools (9 total)
### Session Control
| Tool | Description |
|---|---|
| start_test_session(port?, open?) | Start test mode: auto-detect dev server, launch reverse proxy with SDK injection, open browser |
| stop_test_session() | Stop test mode, return session summary (feedback stats, checklist results, weak dimensions) |
### Feedback
| Tool | Description |
|---|---|
| list_feedbacks(status?, type?, limit?) | List submitted feedback (filter by status: open/verify/resolved) |
| get_feedback(id) | Full details: element info, console errors, network errors, action steps, annotated screenshot |
| resolve_feedback(id) | Mark as fixed → pushes verification request to browser → human confirms |
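For orientation, here is a sketch of what a feedback record and the `list_feedbacks` filter might look like behind these tools. The field names and the status union are assumptions inferred from the tool descriptions above, not the actual yo-bug schema:

```typescript
// Hypothetical feedback record, inferred from the tool table above —
// not the real yo-bug data model.
type FeedbackStatus = "open" | "verify" | "resolved";

interface FeedbackRecord {
  id: string;
  status: FeedbackStatus;
  type: string;              // e.g. "quick" | "describe" | "screenshot"
  selector?: string;         // CSS selector of the flagged element
  consoleErrors: string[];   // captured console.error/warn entries
  createdAt: number;
}

// What list_feedbacks(status?, type?, limit?) might do internally:
// filter by status and type, then cap the result count.
function filterFeedbacks(
  all: FeedbackRecord[],
  status?: FeedbackStatus,
  type?: string,
  limit = 20,
): FeedbackRecord[] {
  return all
    .filter((f) => (status ? f.status === status : true))
    .filter((f) => (type ? f.type === type : true))
    .slice(0, limit);
}
```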
### Test Checklist
| Tool | Description |
|---|---|
| create_checklist(title, items) | Push structured test checklist to browser. Items have step, expected result, priority, and dimension |
| get_checklist_status() | See which items passed/failed and any user feedback |
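A checklist item carries the fields named above (step, expected result, priority, dimension). As a hedged sketch of the data flow, here is one plausible item shape and the pass/fail rollup that `get_checklist_status` might compute; the status values are assumptions:

```typescript
// Hypothetical checklist item, using the fields named in the table above.
// The status values are an assumption, not the yo-bug wire format.
interface ChecklistItem {
  step: string;
  expected: string;
  priority: "high" | "medium" | "low";
  dimension: string;          // one of the 8 QA dimensions
  status: "pending" | "passed" | "failed";
}

// A rollup like the one get_checklist_status() might report.
function summarize(items: ChecklistItem[]) {
  return {
    passed: items.filter((i) => i.status === "passed").length,
    failed: items.filter((i) => i.status === "failed").length,
    pending: items.filter((i) => i.status === "pending").length,
  };
}
```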
### Test History
| Tool | Description |
|---|---|
| save_test_record(module, ...) | Save test results per module. Accumulates history for future reference |
| get_test_history(module) | Get historical test records. Shows frequently failing scenarios |
## 8 QA Test Dimensions

The create_checklist tool embeds professional QA methodology. The AI is guided to systematically cover:
- Happy path — Core functionality works end-to-end
- Empty/boundary — Empty inputs, special chars, max length, zero/negative values
- Error states — Offline, server errors, timeouts, recovery
- Duplicate ops — Double-click, re-submit, concurrent requests
- State recovery — Refresh, back/forward, deep links, tab close/reopen
- Loading/async — Loading states, failed loads, stale data
- Responsive — 375px mobile width, touch targets, overflow
- Interaction detail — Tab order, Enter/Escape, disabled states, focus
Each dimension has detailed sub-scenario templates that the AI selects from based on actual code changes.
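To make the coverage idea concrete, here is an illustrative helper that reports which of the eight dimensions a checklist has not touched yet. The dimension keys are shorthand derived from this README; the helper itself is a sketch, not yo-bug code:

```typescript
// The 8 QA dimensions from the list above, as shorthand keys (assumption:
// these identifiers are ours, derived from the README, not yo-bug's).
const QA_DIMENSIONS = [
  "happy-path",
  "empty-boundary",
  "error-states",
  "duplicate-ops",
  "state-recovery",
  "loading-async",
  "responsive",
  "interaction-detail",
] as const;

type Dimension = (typeof QA_DIMENSIONS)[number];

// Given the dimensions a checklist already covers, report the gaps.
function uncoveredDimensions(covered: Dimension[]): Dimension[] {
  const seen = new Set(covered);
  return QA_DIMENSIONS.filter((d) => !seen.has(d));
}
```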
## Three Feedback Modes
Users choose how to report issues:
| Mode | Shortcut | How it works |
|---|---|---|
| Quick | Alt+Q | Click an element → instantly flagged with full context. One click, done. |
| Describe | Alt+D | Click an element → fill a form with problem type and description |
| Screenshot | Alt+S | Drag to select screen region → annotate with arrows/shapes/text → submit |
Other shortcuts: Alt+X toggles test mode, Esc exits.
## Auto-Captured Context
Every feedback submission automatically includes:
- Console errors — last 50 `console.error`/`console.warn` entries with stack traces
- Network failures — failed `fetch`/XHR requests with status, URL, duration
- Unhandled exceptions — `window.onerror` + `unhandledrejection`
- Action recording — last 100 user actions (clicks, inputs, keypresses, navigation) with timestamps
- Element info — CSS selector, tag, text content, bounding rect, computed styles, React/Vue component name
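Capped histories like "last 50 entries" or "last 100 actions" are typically kept in a small ring-style buffer. This is an illustrative sketch of that pattern, not the yo-bug SDK source:

```typescript
// Capped buffer: keeps only the most recent `capacity` entries, the way
// the SDK keeps the last 50 console entries or last 100 user actions.
class CappedBuffer<T> {
  private items: T[] = [];
  constructor(private capacity: number) {}

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) {
      this.items.shift(); // drop the oldest entry
    }
  }

  snapshot(): T[] {
    return [...this.items];
  }
}

// In a browser SDK, console.error could be wrapped to feed the buffer
// (sketch only):
//   const orig = console.error;
//   console.error = (...args) => { buf.push(String(args)); orig(...args); };
```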
## Verify-Fix Flow

When the AI calls resolve_feedback(), instead of silently marking the feedback done:
1. The browser shows a "Verify Fix" card to the user
2. The user clicks "Fixed" or "Still broken"
3. The status updates accordingly — the AI knows whether the fix actually worked
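The loop above behaves like a small status machine. Here is a hedged sketch of one plausible implementation, reusing the open/verify/resolved status values from the `list_feedbacks` filter; the function names are ours, not yo-bug's:

```typescript
// Hypothetical status machine for the verify-fix loop described above.
// Status names follow the list_feedbacks() filter values (open/verify/resolved).
type Status = "open" | "verify" | "resolved";

// resolve_feedback() moves an open feedback into the "verify" state.
function onResolveFeedback(current: Status): Status {
  return current === "open" ? "verify" : current;
}

// The human's verdict settles it: "Fixed" resolves, "Still broken" reopens.
function onHumanVerdict(current: Status, fixed: boolean): Status {
  if (current !== "verify") return current;
  return fixed ? "resolved" : "open";
}
```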
## i18n

The SDK auto-detects `<html lang="...">`:

- `zh-*` → Chinese interface
- Everything else → English interface
MCP tool descriptions are in English (AI translates to user's language naturally).
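The detection rule above amounts to a one-line prefix check. A minimal sketch, assuming the input comes from `document.documentElement.lang` in the browser:

```typescript
// Locale selection as described above: any zh-* lang attribute gets the
// Chinese interface, everything else falls back to English.
function pickLocale(htmlLang: string): "zh" | "en" {
  return htmlLang.trim().toLowerCase().startsWith("zh") ? "zh" : "en";
}
```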
## Architecture

```
Browser → Reverse Proxy (localhost:3695) → Dev Server (localhost:5173)
              │
              ├─ Auto-injects SDK into HTML responses
              ├─ WebSocket passthrough (HMR works normally)
              ├─ Feedback API (POST/GET)
              ├─ Checklist API (push/poll/update)
              └─ Verify API (push/confirm)

MCP Server (stdio) → AI Tool (Claude Code / Cursor / Windsurf)
              │
              ├─ start/stop_test_session → controls proxy lifecycle
              ├─ feedback tools → read user submissions
              ├─ checklist tools → push test plans, read results
              └─ history tools → persist test records per module
```

The proxy auto-detects the dev server framework (Vite, Next.js, CRA, Webpack, Nuxt, Angular, Svelte, Astro) and port. If the dev server is already running, it connects to it; if not, it starts one.
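The "auto-injects SDK into HTML responses" step can be pictured as a string rewrite on proxied HTML. This is an illustrative sketch; the script URL is made up, and the real injection lives inside yo-bug's proxy:

```typescript
// Sketch of the proxy's SDK injection step: place a <script> tag just
// before </body> in each proxied HTML response. The "/__yo-bug/sdk.js"
// URL is a hypothetical placeholder, not yo-bug's actual path.
function injectSdk(html: string, sdkUrl = "/__yo-bug/sdk.js"): string {
  const tag = `<script src="${sdkUrl}"></script>`;
  const idx = html.lastIndexOf("</body>");
  if (idx === -1) return html + tag; // no </body>: append at the end
  return html.slice(0, idx) + tag + html.slice(idx);
}
```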
## Security

- All data stays local (`~/.yo-bug/`)
- Feedback IDs are validated against path traversal
- Input fields are whitelist-filtered and length-limited
- The network interceptor uses exact pathname matching (no substring false positives)
- No data is sent to any external service
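The "exact pathname matching" point boils down to comparing a parsed pathname instead of doing a substring search, so a user URL that merely mentions the API path is not intercepted. A minimal sketch with a hypothetical "/__yo-bug/feedback" endpoint:

```typescript
// Exact pathname matching: decide whether a request targets the feedback
// API by comparing the parsed pathname, never by substring search.
// The "/__yo-bug/feedback" path is illustrative, not yo-bug's real route.
function isFeedbackApi(rawUrl: string): boolean {
  try {
    const { pathname } = new URL(rawUrl, "http://localhost");
    return pathname === "/__yo-bug/feedback";
  } catch {
    return false; // unparseable URL: treat as not ours
  }
}
```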
## Requirements
- Node.js >= 18
- Any MCP-compatible AI tool
- Any web application running in a browser
## License
MIT
