voice-walkthrough

v0.1.0 · Published 17 days ago

Hold a key, click through your app, talk about a bug. Captures voice + screenshots + console errors and writes a Markdown report that an AI agent can use to fix the bug unattended.


leobraun

qa bug-report voice whisper claude-code ai-agent nextjs developer-tools


Hold a key, click through your app, talk about a bug. The tool transcribes what you said, captures every click + scroll + navigation as a numbered screenshot timeline, grabs the browser runtime errors logged since the listener mounted, and writes a Markdown report designed to be pasted (or fswatch'd) straight into Claude Code / Codex.

A 30-second QA bug report becomes a self-contained agent task.

```
You                                          Your AI agent
─────                                        ─────────────
hold right Shift on /finance/invoices        ./scripts/walkthrough-watch.sh wait
"the edit button is dead after I switch      → fires
 invoices, watch — *click click click*"      reads debug/walkthroughs/guide.md
release                                      reads entry #4
                                             walks the 6-step screenshot timeline
                                             diagnoses, edits, runs tests
                                             commits + cleans up the entry
~30s                                         ~3-5 min, unattended
```

What it captures

Each "walkthrough" recording produces:

Voice transcript of what you said (Whisper).
Workflow timeline: every click, scroll-stop, and URL change during the recording, with a viewport screenshot taken ~120 ms after each event.
Console errors that fired anywhere on the page since the listener mounted — including React validation warnings that only ever reach console.error (e.g. <Select.Item /> must have a value prop).
Page metadata: URL, viewport size, theme, locale, focused element.
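The workflow timeline above depends on two timing rules: a scroll only becomes a step once scrolling stops, and every step's screenshot is taken about 120 ms after the event. The sketch below models both as pure functions. It assumes a simple debounce for "scroll-stop" with a hypothetical 150 ms idle threshold; `scrollStops` and `screenshotTimes` are illustrative names, not the package's API.

```typescript
// Minimal sketch, not the package's workflow-capture.ts. Turns a sorted list of
// scroll-event timestamps (ms) into "scroll-stop" times and schedules screenshots.
const SCROLL_IDLE_MS = 150; // assumption: a gap this long ends a scroll gesture
const SCREENSHOT_DELAY_MS = 120; // the "~120 ms after each event" from the text

// A scroll event closes its burst if no later event arrives within idleMs.
// The stop is reported when a debounce timer armed at that event would fire.
function scrollStops(timestamps: number[], idleMs: number = SCROLL_IDLE_MS): number[] {
  const stops: number[] = [];
  for (let i = 0; i < timestamps.length; i++) {
    const isLast = i === timestamps.length - 1;
    if (isLast || timestamps[i + 1] - timestamps[i] > idleMs) {
      stops.push(timestamps[i] + idleMs);
    }
  }
  return stops;
}

// Each timeline event (click, scroll-stop, URL change) gets a screenshot shortly
// after it, giving the UI a moment to repaint before capture.
function screenshotTimes(eventTimes: number[], delayMs: number = SCREENSHOT_DELAY_MS): number[] {
  return eventTimes.map((t) => t + delayMs);
}

// Usage: two bursts of scrolling collapse into two scroll-stop steps.
const stops = scrollStops([0, 50, 100, 400, 420]); // [250, 570]
const shotsAt = screenshotTimes(stops); // [370, 690]
```

Debouncing keeps a long fling from flooding the timeline with dozens of near-identical screenshots.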

All bundled into one Markdown entry like this:

```markdown
## #4 — 2026-05-07 14:22:11

- **Path:** `/finance/invoices/abc-123`
- **Viewport:** 1512×828
- **Workflow:** 6 steps · [`screenshots/4/`](screenshots/4/)
- **Errors:** [`errors/4.md`](errors/4.md) (3 — e.g. "INTERNAL_SERVER_ERROR …")

### Note
The edit button is dead after I switch invoices…

### Workflow
1. **0.0 s · Start** — Recording started on /finance/invoices
   ![](screenshots/4/01-start.png)
2. **2.4 s · Click** — Clicked button "Edit"
   ![](screenshots/4/02-click.png)
…
```
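To make the entry format concrete, here is a minimal renderer that produces the same shape from a typed record. The `WalkthroughEntry` and `WorkflowStep` types and the `renderEntry` function are hypothetical (the package's real types live in `src/types.ts` and may differ), and this sketch omits the first-error excerpt shown on the **Errors** line.

```typescript
// Hypothetical shapes; the package's real types may differ.
type WorkflowStep = {
  tSec: number; // seconds since recording started
  kind: string; // e.g. "Start", "Click"
  label: string; // human-readable description of the step
  file: string; // e.g. "02-click.png"
};

type WalkthroughEntry = {
  id: number;
  timestamp: string; // e.g. "2026-05-07 14:22:11"
  path: string;
  viewport: { width: number; height: number };
  note: string; // the voice transcript
  steps: WorkflowStep[];
  errorCount: number;
};

// Render one entry in the same layout as the example above.
function renderEntry(e: WalkthroughEntry): string {
  const shots = `screenshots/${e.id}/`;
  const out: string[] = [
    `## #${e.id} \u2014 ${e.timestamp}`,
    "",
    `- **Path:** \`${e.path}\``,
    `- **Viewport:** ${e.viewport.width}\u00d7${e.viewport.height}`,
    `- **Workflow:** ${e.steps.length} steps · [\`${shots}\`](${shots})`,
  ];
  // Assumption: the errors side file is only linked when something was captured.
  if (e.errorCount > 0) {
    out.push(`- **Errors:** [\`errors/${e.id}.md\`](errors/${e.id}.md) (${e.errorCount})`);
  }
  out.push("", "### Note", e.note, "", "### Workflow");
  e.steps.forEach((s, i) => {
    out.push(`${i + 1}. **${s.tSec.toFixed(1)} s · ${s.kind}** \u2014 ${s.label}`);
    out.push(`   ![](${shots}${s.file})`);
  });
  return out.join("\n");
}
```

Keeping heavy data (PNGs, full error dumps) in side files means the log entry stays short enough for an agent to read in one pass.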

The success toast offers a "Copy prompt" button that copies a ready-to-paste agent prompt referencing the entry id, transcript, and side-file paths — including the cleanup commands the agent must run when it's done, so entries don't pile up.
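A prompt builder along those lines might look like the sketch below. The function name, the input fields, the prompt wording, and the exact cleanup commands are all assumptions for illustration; the package's builder in `use-walkthrough-note.tsx` may phrase things differently.

```typescript
// Hypothetical input; field names are illustrative.
type PromptInput = {
  id: number;
  transcript: string;
  outputDir: string; // e.g. "debug/walkthroughs"
  guidePath: string; // e.g. "debug/walkthroughs/guide.md"
  hasErrors: boolean;
};

// Build a self-contained agent prompt that also tells the agent how to clean up.
function buildAgentPrompt(p: PromptInput): string {
  const shots = `${p.outputDir}/screenshots/${p.id}/`;
  const errors = `${p.outputDir}/errors/${p.id}.md`;
  const log = `${p.outputDir}/walkthrough-log.md`;
  const lines: string[] = [
    `Fix walkthrough entry #${p.id}. Follow ${p.guidePath}.`,
    `Transcript: "${p.transcript}"`,
    `Log: ${log} (entry #${p.id})`,
    `Screenshots: ${shots}`,
  ];
  if (p.hasErrors) lines.push(`Console errors: ${errors}`);
  // Assumed cleanup steps so entries don't pile up once the fix is done.
  lines.push(
    "When done, clean up:",
    `  rm -rf ${shots}`,
    ...(p.hasErrors ? [`  rm -f ${errors}`] : []),
    `  then delete the "## #${p.id}" section from ${log}`,
  );
  return lines.join("\n");
}
```

Putting the cleanup instructions in the prompt itself lets the agent finish the whole loop without a follow-up message.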

Why

Reproducing a bug from a Slack message ("the edit button doesn't work sometimes") is most of the cost of fixing it. With voice-walkthrough you record the bug while it's happening: your voice, the screen state, the clicks and scrolls you actually made, and the console errors that fired at that moment. An AI agent can pick that up and fix the bug without any back-and-forth, often unattended via the file-watcher loop.

It's a developer tool, not a customer-facing one. The client listener only runs when NODE_ENV !== "production", and the server route refuses requests in production as well.
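A server-side guard of this kind can be sketched as a wrapper that refuses before touching the request or the filesystem. This assumes a framework-agnostic handler shape; `walkthroughAllowed`, `withDevOnly`, `SimpleResponse`, and the choice of 404 are hypothetical, not the package's actual `createWalkthroughRoute` / `processWalkthroughRequest` signatures.

```typescript
// Minimal sketch; names and response shape are assumptions.
type SimpleResponse = { status: number; body: string };

// Mirrors the rule in the text: allowed unless NODE_ENV is exactly "production".
function walkthroughAllowed(nodeEnv: string | undefined): boolean {
  return nodeEnv !== "production";
}

// Wrap a handler so production requests are refused before any file is written.
function withDevOnly<Req>(
  nodeEnv: string | undefined,
  handler: (req: Req) => Promise<SimpleResponse>,
): (req: Req) => Promise<SimpleResponse> {
  return async (req: Req) => {
    if (!walkthroughAllowed(nodeEnv)) {
      return { status: 404, body: "Not found" }; // assumption: 404 rather than 403
    }
    return handler(req);
  };
}
```

In a Next.js route you would pass `process.env.NODE_ENV`. On the client, writing the same check as a literal `process.env.NODE_ENV !== "production"` condition is what lets Next.js inline the value at build time, so the bundler can drop the listener from production bundles entirely.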

Install

```sh
pnpm add voice-walkthrough sonner lucide-react html-to-image
```

Add OPENAI_API_KEY to .env.local (see .env.example).

Then follow docs/integration-nextjs.md — 5 minutes to a working setup. Other frameworks: notes at the bottom of that file.

Hotkeys

| Key | Effect |
|---|---|
| Right Shift (hold) | Start recording. Works anywhere. Release to save. |
| Esc (during hold) | Cancel: discard the recording, nothing is saved. |
| Right Shift + any other key | Treated as a Shift-modifier shortcut (e.g. capitalize, select); recording is silently discarded. |

Optional: switch to triggerKey: "hash" if you'd rather hold #. It only fires outside editable elements (so you can still type # in inputs).
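The hold-to-record rules in the table can be modeled as a small state machine over key events. In this sketch, `reduceHotkey` and its action names are hypothetical; `"ShiftRight"` and `"Escape"` are the standard `KeyboardEvent.code` values. The `hash` trigger and its editable-element check are not modeled.

```typescript
// Hypothetical reducer for the Right Shift hotkey rules; not the package's code.
type HotkeyState = "idle" | "recording";
type HotkeyAction = "none" | "start" | "save" | "cancel" | "discard";
type KeyEventLike = { type: "keydown" | "keyup"; code: string; repeat?: boolean };

function reduceHotkey(
  state: HotkeyState,
  ev: KeyEventLike,
): { state: HotkeyState; action: HotkeyAction } {
  const isTrigger = ev.code === "ShiftRight";
  if (state === "idle") {
    // Only a fresh press (not OS key-repeat) starts a recording.
    if (ev.type === "keydown" && isTrigger && !ev.repeat) return { state: "recording", action: "start" };
    return { state, action: "none" };
  }
  // Recording: releasing the trigger saves.
  if (ev.type === "keyup" && isTrigger) return { state: "idle", action: "save" };
  // Esc during the hold cancels without saving.
  if (ev.type === "keydown" && ev.code === "Escape") return { state: "idle", action: "cancel" };
  // Any other key means Shift was used as a modifier: drop the recording quietly.
  if (ev.type === "keydown" && !isTrigger) return { state: "idle", action: "discard" };
  // Repeated ShiftRight keydowns while held change nothing.
  return { state, action: "none" };
}
```

In a browser you would feed it from `window.addEventListener("keydown", ...)` and `"keyup"` listeners, keeping the current state in a ref. After a cancel or discard, the eventual Shift release arrives in `idle` and is ignored, so nothing gets saved by accident.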

File layout in your project

```
debug/walkthroughs/
├── guide.md               ← agent workflow doc (committed; copy from this package)
├── walkthrough-log.md     ← entries appended here (gitignored)
├── screenshots/<id>/      ← per-entry screenshot folders (gitignored)
└── errors/<id>.md         ← per-entry console error dumps (gitignored)
```

Configurable via the outputDir option (server) and matching outputDirHint / guidePath (client).
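Given an `outputDir`, the per-entry paths follow directly from the layout above. This is a minimal sketch; `entryPaths` and `stepFileName` are hypothetical helpers, and the screenshot naming (`01-start.png`, `02-click.png`) is inferred from the example entry.

```typescript
// Hypothetical helpers mirroring the documented layout.
type EntryPaths = { log: string; screenshotDir: string; errorsFile: string };

function entryPaths(outputDir: string, id: number): EntryPaths {
  const root = outputDir.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    log: `${root}/walkthrough-log.md`,
    screenshotDir: `${root}/screenshots/${id}/`,
    errorsFile: `${root}/errors/${id}.md`,
  };
}

// 1-based step index, zero-padded to two digits: (1, "Start") -> "01-start.png".
function stepFileName(stepIndex: number, kind: string): string {
  return `${String(stepIndex).padStart(2, "0")}-${kind.toLowerCase()}.png`;
}
```

Because everything for entry `<id>` sits under predictable paths, cleanup is a couple of deletions plus removing the entry's section from the log.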

Auto-fix loop

A cross-platform voice-walkthrough-watch CLI + a Claude Code prompt let you run a continuous "watch the log → dispatch a subagent → verify cleanup → repeat" loop.

```sh
# One-time: install the slash command
mkdir -p .claude/commands
cp node_modules/voice-walkthrough/templates/.claude/commands/walkthrough-watch.md \
   .claude/commands/walkthrough-watch.md
# edit it once to substitute your project's typecheck/test/migrate commands
```

Then, in any Claude Code session in your repo (auto mode), type:

```
/walkthrough-watch
```

The watcher starts, waits for new entries in the log, and dispatches a subagent for each one. Other invocation methods (a headless one-liner, or pasting the prompt by hand) are described in docs/claude-watcher-prompt.md.
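The core of a "wait for the next entry" loop is diffing entry ids between two snapshots of the log. The sketch below assumes entry headings start with `## #<id>` as in the example entry; `entryIds` and `newEntryIds` are hypothetical, and the real CLI's logic may differ. The `fs.watchFile` wiring is shown only as a comment.

```typescript
// Extract entry ids from headings like "## #4 ... 2026-05-07 14:22:11".
function entryIds(logText: string): number[] {
  const ids: number[] = [];
  const re = /^## #(\d+)\b/gm; // "### Note" and similar do not match
  let m: RegExpExecArray | null;
  while ((m = re.exec(logText)) !== null) {
    ids.push(Number(m[1]));
  }
  return ids;
}

// Ids present in the current log text but not in the previous snapshot.
function newEntryIds(previous: string, current: string): number[] {
  const seen = new Set(entryIds(previous));
  return entryIds(current).filter((id) => !seen.has(id));
}

// Wiring it to Node's polling watcher could look like this (not executed here):
//   fs.watchFile(logPath, { interval: 1000 }, (curr, prev) => {
//     if (curr.mtimeMs === prev.mtimeMs) return;
//     const text = fs.readFileSync(logPath, "utf8");
//     for (const id of newEntryIds(lastText, text)) dispatchAgent(id); // hypothetical
//     lastText = text;
//   });
```

The same id list doubles as the "verify cleanup" check: once the agent is done, its id should no longer appear in `entryIds` of the log.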

Costs

Whisper transcription: ~$0.006/min of audio. A typical walkthrough (10–30 s) is well under a cent.
Disk: each PNG is ~200 KB at 1× ratio, capped at 1600 px on the long edge. A 10-step workflow ≈ 2 MB. Cleanup deletes everything per-entry.
AI agent compute: paid by you, in your Claude Code / Codex subscription. Not metered by this package.
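The figures above reduce to simple arithmetic. For example, a 30 s walkthrough is 0.5 min × $0.006 = $0.003, and 10 screenshots × ~200 KB ≈ 2 MB. The helpers below are a back-of-envelope sketch using only those quoted figures. `capLongEdge` shows one way to enforce the 1600 px long-edge cap while keeping aspect ratio; its output could be fed to html-to-image's `canvasWidth`/`canvasHeight` options, though whether the package does it that way is an assumption.

```typescript
// Estimates from the quoted figures; not measured values.
const WHISPER_USD_PER_MIN = 0.006;
const APPROX_PNG_BYTES = 200 * 1024; // "~200 KB" per screenshot
const MAX_LONG_EDGE_PX = 1600;

function whisperCostUsd(audioSeconds: number): number {
  return (audioSeconds / 60) * WHISPER_USD_PER_MIN;
}

function workflowDiskBytes(steps: number): number {
  return steps * APPROX_PNG_BYTES;
}

// Scale (width, height) so the longer edge is at most maxEdge, keeping aspect ratio.
function capLongEdge(
  width: number,
  height: number,
  maxEdge: number = MAX_LONG_EDGE_PX,
): { width: number; height: number } {
  const longEdge = Math.max(width, height);
  if (longEdge <= maxEdge) return { width, height };
  const scale = maxEdge / longEdge;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

A 1512×828 viewport at 1× is already under the cap and passes through unchanged; a 3200×1600 capture would be scaled to 1600×800.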

What's where in the source

```
src/
├── client/                          # mounts in your React app
│   ├── walkthrough-listener.tsx     # headless component, mount once
│   ├── use-walkthrough-note.tsx     # main hook + prompt builder
│   ├── use-voice-recorder.ts        # MediaRecorder + speech detection
│   ├── workflow-capture.ts          # click/scroll/navigate listeners
│   ├── walkthrough-error-buffer.ts  # window.error + console.error capture
│   └── screenshot.ts                # html-to-image viewport capture
├── server/                          # Next.js route + framework-agnostic core
│   ├── handler.ts                   # processWalkthroughRequest()
│   └── next.ts                      # createWalkthroughRoute()
└── types.ts                         # shared types between client + server

src/cli/
└── walkthrough-watch.ts             # cross-platform watcher CLI (Node fs.watchFile)
                                     # exposed as `voice-walkthrough-watch` bin

scripts/
└── walkthrough-watch.sh             # legacy bash variant (fswatch); identical surface

docs/
├── guide.md                         # agent workflow doc (copy into your project)
├── integration-nextjs.md            # 5-min Next.js setup
└── claude-watcher-prompt.md         # the prompt to paste into Claude Code
```
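`walkthrough-error-buffer.ts` is described as capturing `window.error` and `console.error`. A minimal sketch of the `console.error` half follows: it wraps a console-like object so every call is also kept in a bounded buffer that the recorder can drain when saving. `installErrorBuffer`, `ConsoleLike`, and the 50-entry default limit are hypothetical; uncaught exceptions never pass through `console.error`, so they would need the separate `window` listener noted in the comments.

```typescript
// Minimal sketch, not the package's walkthrough-error-buffer.ts.
type CapturedError = { at: number; message: string };
type ConsoleLike = { error: (...args: unknown[]) => void };

function installErrorBuffer(target: ConsoleLike, limit: number = 50) {
  const buffer: CapturedError[] = [];
  const original = target.error;

  target.error = (...args: unknown[]) => {
    // Format specifiers such as "%s" (common in React warnings) are not expanded here.
    const message = args.map((a) => (a instanceof Error ? a.message : String(a))).join(" ");
    buffer.push({ at: Date.now(), message });
    if (buffer.length > limit) buffer.shift(); // keep only the newest `limit` entries
    original.apply(target, args); // still forward to the real console
  };

  return {
    // Return everything captured so far and empty the buffer.
    drain: (): CapturedError[] => buffer.splice(0, buffer.length),
    // Restore the original error function.
    uninstall: (): void => {
      target.error = original;
    },
  };
}

// In the browser this would be installErrorBuffer(console), plus something like
//   window.addEventListener("error", (ev) => { /* record ev.message */ });
// for uncaught exceptions.
```

Installing the wrapper once when the listener mounts is what makes errors from before the key press available, matching the "since the listener mounted" behavior described earlier.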

Limitations / known gaps

Watcher is cross-platform via Node fs.watchFile (the bundled voice-walkthrough-watch CLI). The scripts/walkthrough-watch.sh bash variant is kept around for shell-savvy macOS/Linux users but is no longer the primary entry point.
Next.js App Router only for the bundled server adapter. Other frameworks: 3 lines of glue around processWalkthroughRequest().
html-to-image limitations apply — cross-origin images, exotic CSS features (some backdrop-filter, complex SVG filters) may render imperfectly or be omitted from the screenshot. The step row still appears.
No production support, by design. This is dev-only tooling: the route refuses to run in production and the listener is dead-code-eliminated from production builds.
No auth. Anyone hitting /api/voice-walkthrough in dev can write to your repo. That's the point. Don't expose your dev port publicly.
No tests yet. This grew from in-project hacks; tests are a TODO.

License

MIT.
