voice-walkthrough
v0.1.0
Hold a key, click through your app, talk about a bug. Captures voice + screenshots + console errors and writes a Markdown report an AI agent can fix unattended.
voice-walkthrough
Hold a key, click through your app, talk about a bug. The tool transcribes what you said, captures every click + scroll + navigation as a numbered screenshot timeline, grabs any browser-runtime errors that were live, and writes a Markdown report — designed to be pasted (or fswatch'd) straight into Claude Code / Codex.
A 30-second QA bug report becomes a self-contained agent task.
You                                         Your AI agent
─────                                       ─────────────
hold right Shift on /finance/invoices       ./scripts/walkthrough-watch.sh wait
"the edit button is dead after I switch  →  fires
invoices, watch: *click click click*"       reads debug/walkthroughs/guide.md
release                                     reads entry #4
                                            walks the 6-step screenshot timeline
                                            diagnoses, edits, runs tests
                                            commits + cleans up the entry
~30s                                        ~3-5 min, unattended

What it captures
Each "walkthrough" recording produces:
- Voice transcript of what you said (Whisper).
- Workflow timeline: every click, scroll-stop, and URL change during the recording, with a viewport screenshot taken ~120 ms after each event.
- Console errors that fired anywhere on the page since the listener mounted, including React validation warnings that only ever reach `console.error` (e.g. `<Select.Item /> must have a value prop`).
- Page metadata: URL, viewport size, theme, locale, focused element.
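One common way to collect warnings that only ever reach `console.error` is to wrap the method and keep a bounded buffer. The sketch below illustrates that idea only; `installErrorBuffer`, `ConsoleLike`, and the 50-entry cap are hypothetical names and values, not the package's actual API.

```typescript
// Hypothetical sketch: these names are illustrative, not the package's API.
interface CapturedError {
  message: string;
  at: number; // ms timestamp when the error was logged
}

interface ConsoleLike {
  error: (...args: unknown[]) => void;
}

function installErrorBuffer(target: ConsoleLike, maxEntries = 50) {
  const buffer: CapturedError[] = [];
  const original = target.error;

  target.error = (...args: unknown[]) => {
    const message = args
      .map((a) => (a instanceof Error ? a.message : String(a)))
      .join(" ");
    buffer.push({ message, at: Date.now() });
    if (buffer.length > maxEntries) buffer.shift(); // keep only the newest entries
    original.apply(target, args); // still forward to the real console
  };

  return {
    snapshot: (): CapturedError[] => buffer.slice(),
    restore: (): void => {
      target.error = original;
    },
  };
}

// Usage with a stand-in console so the example stays quiet.
const forwarded: unknown[][] = [];
const fakeConsole: ConsoleLike = {
  error: (...args: unknown[]) => {
    forwarded.push(args);
  },
};
const errorBuffer = installErrorBuffer(fakeConsole);
fakeConsole.error("<Select.Item /> must have a value prop");
fakeConsole.error(new Error("INTERNAL_SERVER_ERROR"));
```

In a browser the same wrapper would be applied to the global `console`, and `snapshot()` would be read when a recording ends.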
All bundled into one Markdown entry like this:
## #4 — 2026-05-07 14:22:11
- **Path:** `/finance/invoices/abc-123`
- **Viewport:** 1512×828
- **Workflow:** 6 steps · [`screenshots/4/`](screenshots/4/)
- **Errors:** [`errors/4.md`](errors/4.md) (3 — e.g. "INTERNAL_SERVER_ERROR …")
### Note
The edit button is dead after I switch invoices…
### Workflow
1. **0.0 s · Start** — Recording started on /finance/invoices

2. **2.4 s · Click** — Clicked button "Edit"

…

The success toast offers a "Copy prompt" button that copies a ready-to-paste agent prompt referencing the entry id, transcript, and side-file paths. The prompt includes the cleanup commands the agent must run when it's done, so entries don't pile up.
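The numbered workflow lines in the sample entry follow a regular pattern, so they could be rendered along these lines. This is a sketch under assumptions: `WorkflowStep`, `renderStep`, and the `Scroll` / `Navigate` kinds are hypothetical, and only the `Start` and `Click` line shapes come from the sample.

```typescript
// Hypothetical types; the package's real step shape may differ.
type StepKind = "Start" | "Click" | "Scroll" | "Navigate";

interface WorkflowStep {
  offsetMs: number; // time since the recording started
  kind: StepKind;
  description: string;
}

const DASH = "\u2014"; // separator character used in the sample entry

function renderStep(step: WorkflowStep, index: number): string {
  const seconds = (step.offsetMs / 1000).toFixed(1);
  return `${index + 1}. **${seconds} s · ${step.kind}** ${DASH} ${step.description}`;
}

function renderWorkflow(steps: WorkflowStep[]): string {
  const lines = steps.map((step, index) => renderStep(step, index));
  return ["### Workflow"].concat(lines).join("\n");
}

const sampleSteps: WorkflowStep[] = [
  { offsetMs: 0, kind: "Start", description: "Recording started on /finance/invoices" },
  { offsetMs: 2400, kind: "Click", description: 'Clicked button "Edit"' },
];
const workflowMarkdown = renderWorkflow(sampleSteps);
```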
Why
Reproducing a bug from a Slack message ("the edit button doesn't work sometimes") is most of the cost of fixing it. With voice-walkthrough you record the bug while it's happening — voice + screen state + actual clicks/scrolls + the actual console errors at the actual moment. An AI agent can pick that up and fix without any back-and-forth, often unattended via the file-watcher loop.
It's a developer tool, not a customer-facing one. The client listener runs only when
NODE_ENV !== "production", and the server route refuses to run in production too.
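A dev-only guard of this kind can be very small. The sketch below shows the idea, not the package's code: `isWalkthroughAllowed`, `withDevOnlyGuard`, and the 404 response are assumptions; only the `NODE_ENV !== "production"` check comes from the text.

```typescript
// Hypothetical helper names; only the NODE_ENV comparison comes from the README.
function isWalkthroughAllowed(nodeEnv: string | undefined): boolean {
  return nodeEnv !== "production";
}

interface GuardResult {
  status: number;
  body: string;
}

// Wraps a handler so it refuses outright in production.
function withDevOnlyGuard(
  nodeEnv: string | undefined,
  handler: () => GuardResult,
): GuardResult {
  if (!isWalkthroughAllowed(nodeEnv)) {
    return { status: 404, body: "Not found" }; // assumed response; the real route may differ
  }
  return handler();
}

const devResult = withDevOnlyGuard("development", () => ({ status: 200, body: "saved" }));
const prodResult = withDevOnlyGuard("production", () => ({ status: 200, body: "saved" }));
```

Passing the environment value in as an argument keeps the check easy to test; a real route would read it from the process environment.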
Install
pnpm add voice-walkthrough sonner lucide-react html-to-image

Add OPENAI_API_KEY to .env.local (see .env.example).

Then follow docs/integration-nextjs.md for a working setup in about 5 minutes. Other frameworks: see the notes at the bottom of that file.
Hotkeys
| Key | Effect |
|---|---|
| Right Shift (hold) | Start recording. Works anywhere. Release to save. |
| Esc (during hold) | Cancel: discard recording, no save. |
| Right Shift + any other key | Treated as a Shift-modifier shortcut (e.g. capitalize, select); recording is silently discarded. |
Optional: switch to triggerKey: "hash" if you'd rather hold #. It only
fires outside editable elements (so you can still type # in inputs).
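The hold-to-record rules in the table can be read as a small state machine. The following is a sketch of that reading, not the package's implementation; `step`, `run`, and the outcome names are hypothetical. The key names are standard `KeyboardEvent.code` values.

```typescript
type HoldState = "idle" | "recording";
type Outcome = "none" | "start" | "save" | "cancel" | "discard";

interface KeyInput {
  type: "keydown" | "keyup";
  code: string; // KeyboardEvent.code, e.g. "ShiftRight", "Escape", "KeyA"
}

function step(state: HoldState, input: KeyInput): { state: HoldState; outcome: Outcome } {
  if (state === "idle") {
    if (input.type === "keydown" && input.code === "ShiftRight") {
      return { state: "recording", outcome: "start" };
    }
    return { state, outcome: "none" };
  }
  // From here on we are recording.
  if (input.type === "keyup" && input.code === "ShiftRight") {
    return { state: "idle", outcome: "save" };
  }
  if (input.type === "keydown" && input.code === "Escape") {
    return { state: "idle", outcome: "cancel" };
  }
  if (input.type === "keydown" && input.code !== "ShiftRight") {
    return { state: "idle", outcome: "discard" }; // Shift was used as a modifier
  }
  return { state, outcome: "none" }; // e.g. auto-repeated ShiftRight keydown
}

function run(inputs: KeyInput[]): Outcome[] {
  let state: HoldState = "idle";
  const outcomes: Outcome[] = [];
  for (const input of inputs) {
    const next = step(state, input);
    state = next.state;
    if (next.outcome !== "none") outcomes.push(next.outcome);
  }
  return outcomes;
}

const down = (code: string): KeyInput => ({ type: "keydown", code });
const up = (code: string): KeyInput => ({ type: "keyup", code });

const saved = run([down("ShiftRight"), up("ShiftRight")]);
const cancelled = run([down("ShiftRight"), down("Escape"), up("ShiftRight")]);
const discarded = run([down("ShiftRight"), down("KeyA"), up("KeyA"), up("ShiftRight")]);
```

Keeping the transition logic in a pure function means the three table rows can be checked without a browser.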
File layout in your project
debug/walkthroughs/
├── guide.md ← agent workflow doc (committed; copy from this package)
├── walkthrough-log.md ← entries appended here (gitignored)
├── screenshots/<id>/ ← per-entry screenshot folders (gitignored)
└── errors/<id>.md ← per-entry console error dumps (gitignored)

Configurable via the outputDir option (server) and matching
outputDirHint / guidePath (client).
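The layout above maps an entry id to a fixed set of paths. A minimal sketch of that mapping, assuming a hypothetical `entryPaths` helper (the file and folder names are the ones listed in the tree):

```typescript
interface EntryPaths {
  guide: string;
  log: string;
  screenshotsDir: string;
  errorsFile: string;
}

// Hypothetical helper: derives the per-entry paths from the output directory.
function entryPaths(outputDir: string, id: number): EntryPaths {
  const root = outputDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    guide: `${root}/guide.md`,
    log: `${root}/walkthrough-log.md`,
    screenshotsDir: `${root}/screenshots/${id}/`,
    errorsFile: `${root}/errors/${id}.md`,
  };
}

const paths = entryPaths("debug/walkthroughs/", 4);
```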
Auto-fix loop
A cross-platform voice-walkthrough-watch CLI + a Claude Code prompt let you
run a continuous "watch the log → dispatch a subagent → verify cleanup →
repeat" loop.
# One-time: install the slash command
mkdir -p .claude/commands
cp node_modules/voice-walkthrough/templates/.claude/commands/walkthrough-watch.md \
.claude/commands/walkthrough-watch.md
# edit it once to substitute your project's typecheck/test/migrate commands

Then in any claude session in your repo (auto mode), just type:
/walkthrough-watch

The watcher starts, sits on fswatch, and dispatches a subagent for each new
entry. Other invocation methods (headless one-liner, paste-the-prompt) are in
docs/claude-watcher-prompt.md.
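To dispatch one subagent per new entry, a watcher has to work out which entries it has not handled yet. The sketch below shows one way to do that, assuming entries start with a `## #<id>` heading as in the sample entry; `findEntryIds` and `newEntryIds` are hypothetical names, and the bundled CLI may work differently.

```typescript
function findEntryIds(log: string): number[] {
  const heading = /^## #(\d+)\b/gm; // fresh regex per call, so lastIndex starts at 0
  const ids: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = heading.exec(log)) !== null) {
    ids.push(Number(match[1]));
  }
  return ids;
}

// Returns ids present in the log that have not been dispatched yet.
function newEntryIds(log: string, seen: number[]): number[] {
  return findEntryIds(log).filter((id) => seen.indexOf(id) === -1);
}

// Sample log with simplified heading separators.
const sampleLog = [
  "## #3 - 2026-05-07 14:10:02",
  "### Note",
  "Older entry.",
  "## #4 - 2026-05-07 14:22:11",
  "### Note",
  "The edit button is dead after I switch invoices",
].join("\n");

const pending = newEntryIds(sampleLog, [3]);
```

A watcher would call `newEntryIds` each time the log file changes and add the returned ids to its seen list after dispatching them.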
Costs
- Whisper transcription: ~$0.006/min of audio. A typical walkthrough (10–30 s) is well under a cent.
- Disk: each PNG is ~200 KB at 1× ratio, capped at 1600 px on the long edge. A 10-step workflow ≈ 2 MB. Cleanup deletes everything per-entry.
- AI agent compute: paid by you, in your Claude Code / Codex subscription. Not metered by this package.
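The figures above reduce to simple arithmetic. The sketch below uses the quoted rates as constants; the function names are hypothetical, and the aspect-preserving downscale in `capLongEdge` is an assumption about how the 1600 px cap is applied.

```typescript
const WHISPER_USD_PER_MINUTE = 0.006; // rate quoted above
const APPROX_BYTES_PER_PNG = 200000; // ~200 KB per screenshot, quoted above
const MAX_LONG_EDGE_PX = 1600;

function estimateWalkthrough(audioSeconds: number, steps: number) {
  return {
    transcriptionUsd: (audioSeconds / 60) * WHISPER_USD_PER_MINUTE,
    diskBytes: steps * APPROX_BYTES_PER_PNG,
  };
}

// Scales a capture down so its longer side is at most MAX_LONG_EDGE_PX.
function capLongEdge(width: number, height: number): { width: number; height: number } {
  const longEdge = Math.max(width, height);
  if (longEdge <= MAX_LONG_EDGE_PX) return { width, height };
  const scale = MAX_LONG_EDGE_PX / longEdge;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

const estimate = estimateWalkthrough(30, 10); // 30 s of audio, 10 steps
const capped = capLongEdge(3200, 1800);
const untouched = capLongEdge(1512, 828); // the viewport from the sample entry
```

For a 30-second, 10-step recording this works out to roughly $0.003 of transcription and about 2 MB of screenshots.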
What's where in the source
src/
├── client/ # mounts in your React app
│ ├── walkthrough-listener.tsx # headless component, mount once
│ ├── use-walkthrough-note.tsx # main hook + prompt builder
│ ├── use-voice-recorder.ts # MediaRecorder + speech detection
│ ├── workflow-capture.ts # click/scroll/navigate listeners
│ ├── walkthrough-error-buffer.ts # window.error + console.error capture
│ └── screenshot.ts # html-to-image viewport capture
├── server/ # Next.js route + framework-agnostic core
│ ├── handler.ts # processWalkthroughRequest()
│ └── next.ts # createWalkthroughRoute()
└── types.ts # shared types between client + server
src/cli/
└── walkthrough-watch.ts # cross-platform watcher CLI (Node fs.watchFile)
# exposed as `voice-walkthrough-watch` bin
scripts/
└── walkthrough-watch.sh # legacy bash variant (fswatch); identical surface
docs/
├── guide.md # agent workflow doc (copy into your project)
├── integration-nextjs.md # 5-min Next.js setup
└── claude-watcher-prompt.md # the prompt to paste into Claude Code

Limitations / known gaps
- Watcher is cross-platform via Node `fs.watchFile` (the bundled `voice-walkthrough-watch` CLI). The `scripts/walkthrough-watch.sh` bash variant is kept around for shell-savvy macOS/Linux users but is no longer the primary entry point.
- Next.js App Router only for the bundled server adapter. Other frameworks: 3 lines of glue around `processWalkthroughRequest()`.
- html-to-image limitations apply: cross-origin images and exotic CSS features (some `backdrop-filter`, complex SVG filters) may render imperfectly or be omitted from the screenshot. The step row still appears.
- No production support, by design. This is dev-only tooling. The route refuses to run in prod and the listener is dead-code-eliminated.
- No auth. Anyone hitting `/api/voice-walkthrough` in dev can write to your repo. That's the point. Don't expose your dev port publicly.
- No tests yet. This grew from in-project hacks; tests are a TODO.
License
MIT.
