duck_talk

v0.1.4

Published

2 months ago

Voice interface for Claude Code

dhuynh95

Duck Talk

Talk to Claude Code. Hear it talk back. Approve, interrupt, or redirect — all by voice, from anywhere.

The core tech: a generic voice layer that can wrap any black-box agent, using live speech models (e.g. Gemini Live, OpenAI Realtime) for low-latency conversations. No modifications to the agent are needed.

             Duck Talk            Claude Code
              ┌──────┐          ╔══════════════╗
You ─speech─▶ │ STT  │ ─inst─▶  ║              ║
    ◀─audio── │ TTS  │ ◀─txt──  ║  (any agent) ║
              └──────┘          ╚══════════════╝

inst = instruction, e.g. "What is the latest PR?"
txt = raw stream of tokens
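
The diagram treats the agent as an opaque source of text. Here is a minimal TypeScript sketch of that contract, assuming the agent can be modeled as a function from one instruction to an async stream of text chunks. The names `Agent`, `Speak`, `relay`, and `echoAgent` are illustrative and are not taken from Duck Talk's source.

```typescript
// Hypothetical types for illustration only; not Duck Talk's actual API.
// An agent takes one instruction and streams text back, nothing more.
type Agent = (instruction: string) => AsyncIterable<string>;

// Anything that can turn text into audio (a TTS session, for example).
type Speak = (text: string) => Promise<void>;

// Forward each chunk to the speaker as it arrives and return the full reply.
async function relay(agent: Agent, instruction: string, speak: Speak): Promise<string> {
  let full = "";
  for await (const chunk of agent(instruction)) {
    full += chunk;
    await speak(chunk);
  }
  return full;
}

// Stand-in agent so the sketch runs without any external service.
async function* echoAgent(instruction: string): AsyncGenerator<string> {
  for (const word of `You asked: ${instruction}`.split(" ")) {
    yield word + " ";
  }
}
```

Because `relay` only depends on the `Agent` shape, swapping `echoAgent` for a wrapper around a real agent would not change the voice side, which is the point of keeping the agent a black box.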

Quick start

You will need:

Claude Code CLI on PATH
ANTHROPIC_API_KEY — for Claude Code
GEMINI_API_KEY — for Gemini voice (free tier works, no credit card needed)

Option 1 — npx (fastest)

ANTHROPIC_API_KEY=sk-ant-... GEMINI_API_KEY=AIza... npx duck_talk
# Opens http://localhost:8000

Or set them in a .env file in the current directory:

ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...

Option 2 — from source

git clone https://github.com/dhuynh95/duck_talk.git && cd duck_talk
npm install
cp .env.example .env   # then edit with your API keys
npm run dev

Why

I wanted a coding assistant I could talk to on a walk — check on a long-running task, brainstorm architecture, review a plan. Hands-free, conversational, no laptop required.

STT tools like SuperWhisper and Wispr Flow get you halfway — you can dictate, but the agent never talks back. You can bolt TTS onto Claude Code via MCP, but you're waiting for the full response before hearing anything.

Voice-native agents like ChatGPT and Gemini Live have the conversation part down, but they're not connected to your codebase. They can't run commands, edit files, or see your project. And if your accent trips up the STT — "Cloud Code" instead of "Claude Code" — there's no way to catch it before it's sent.

Nothing combines all of this:

| | Multi turn voice | Audio output | Low latency | No context bloat | Setup |
|---|---|---|---|---|---|
| STT dictation | ❌ Push-to-talk | ❌ | ❌ No response | ✅ | ✅ |
| MCP voice tool | ❌ Keyboard | ✅ | ❌ After completion | ❌ Extra MCP | ❌ Custom MCP |
| Duck Talk | ✅ | ✅ | ✅ | ✅ | ✅ |

Key features

Real-time voice — talk to Claude Code hands-free. Say "stop" to interrupt mid-response.
Streaming TTS — responses spoken sentence-by-sentence as they stream. ~1.5s to first audio, not after completion.
Review mode — hear your instruction read back before it's sent. Accept, edit, or reject by voice or buttons. No more "Cloud Code" when you said "Claude Code."
Correction learning — edit a misheard instruction, the diff is saved. Future transcriptions auto-correct.
Session management — browse, resume, and rewind conversations. Built on Claude Code's native JSONL format.
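
One way to picture correction learning is a small substitution table built from each edit. The sketch below is an assumption about how such a feature could work, not a description of Duck Talk's implementation, which may store diffs differently or feed them to the speech model instead. `Correction`, `diffWords`, and `applyCorrections` are hypothetical names.

```typescript
// Hypothetical sketch; Duck Talk's real storage format and matching are not shown in this README.
type Correction = { heard: string; meant: string };

// Find the changed span by trimming the words both versions share at the start and end.
function diffWords(heard: string, edited: string): Correction | null {
  const a = heard.split(/\s+/).filter(Boolean);
  const b = edited.split(/\s+/).filter(Boolean);
  let start = 0;
  while (start < a.length && start < b.length && a[start] === b[start]) start++;
  let endA = a.length;
  let endB = b.length;
  while (endA > start && endB > start && a[endA - 1] === b[endB - 1]) {
    endA--;
    endB--;
  }
  const h = a.slice(start, endA).join(" ");
  const m = b.slice(start, endB).join(" ");
  // Skip pure insertions, pure deletions, and unchanged text.
  if (h === "" || m === "" || h === m) return null;
  return { heard: h, meant: m };
}

// Apply every saved correction to a new transcription.
// Matching is a plain case-sensitive substring replace, so it can over-correct
// (for example "Cloud storage"); a real system would want more context.
function applyCorrections(text: string, corrections: Correction[]): string {
  return corrections.reduce((out, c) => out.split(c.heard).join(c.meant), text);
}

const saved = diffWords(
  "Open Cloud Code and run the tests",
  "Open Claude Code and run the tests",
);
const fixed = saved ? applyCorrections("Ask Cloud Code for the latest PR", [saved]) : "";
console.log(fixed); // "Ask Claude Code for the latest PR"
```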

Architecture

Two Gemini Live sessions — one listens, one speaks. Claude Code is the black box in between.

graph LR
    You((You))
    STT["Gemini Live #1<br/>STT · VAD · Tools"]
    API["Express Server<br/>+ Agent SDK"]
    TTS["Gemini Live #2<br/>Streaming TTS"]
    CC[["Claude Code<br/>(any agent)"]]

    You -->|speech| STT
    STT -->|instruction| API
    API <-->|text stream| CC
    API -->|sentences| TTS
    TTS -->|audio| You
    API -.->|context inject| STT

Flow of a single instruction:

sequenceDiagram
    actor You
    participant STT as Gemini Live<br/>(STT · VAD)
    participant API as Express Server
    participant CC as Claude Code
    participant TTS as TTS Session

    You->>STT: 🎤 speech
    Note over STT: VAD detects end of speech
    STT->>API: converse(instruction)
    Note over STT: ⏸ frozen (BLOCKING tool)
    STT-->>STT: tool response → unfreeze

    API->>CC: query(instruction)

    loop text streaming
        CC-->>API: text chunk (SSE)
        API-->>TTS: sentence buffer flush
        TTS-->>You: 🔊 audio
        API-->>STT: context inject
    end

    Note over TTS: audio drains
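
The "sentence buffer flush" step in the loop above can be sketched as a small accumulator that releases text to TTS only at sentence boundaries. This is a minimal illustration under stated assumptions: `SentenceBuffer` is a hypothetical helper, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is a simplification that would split on abbreviations such as "e.g. ". It is not Duck Talk's actual logic.

```typescript
// Hypothetical helper; the boundary rule is an assumption for illustration.
class SentenceBuffer {
  private pending = "";

  // Append a streamed chunk and return any sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]\s+/;
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + 1; // keep the punctuation, drop the whitespace
      sentences.push(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(match.index + match[0].length);
      match = boundary.exec(this.pending);
    }
    return sentences;
  }

  // Return whatever is left once the text stream ends.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}

const buffer = new SentenceBuffer();
const spokenSentences: string[] = [];
for (const chunk of ["The build pas", "sed. Two tests ", "failed! Want details"]) {
  spokenSentences.push(...buffer.push(chunk)); // each sentence would go to TTS here
}
const tail = buffer.flush();
if (tail !== "") spokenSentences.push(tail);
console.log(spokenSentences);
// ["The build passed.", "Two tests failed!", "Want details"]
```

Waiting for whitespace after the punctuation means a chunk that ends in "1." is held until the next chunk shows whether it continues as "1.5s", at the cost of a small delay on the final sentence until `flush` is called.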

License

MIT
