clawkie-talkie
v1.0.7
Published
Walkie-talkie voice surface for an OpenClaw session
Downloads
326
Readme
Clawkie Talkie
Push-to-talk voice for thinking out loud into your main OpenClaw session — most often from the car.
The main use case is the drive: a feature, an essay, an app idea, or a stretch of training material you have been working on at the desk, picked back up on the highway. You switch that same OpenClaw thread to voice, use toggle talk mode to speak through the line of thought all the way through without being interrupted, and let your main agent push the work forward with full session context. Then, hours later, you reopen the thread on the laptop and read what was said.
Clawkie Talkie is not a separate assistant and not a quick-chat app. It is a voice lane into an existing OpenClaw conversation: the same session, the same thread, the same context — built for long-form prompts and long-form replies, not short back-and-forth.
After install, your agent gives you a host dashboard URL like https://clawkietalkie.app/dashboard/#host=<daemon-peer-id>. Open that dashboard URL on your phone, bookmark it there or add it to your phone home screen, pick a session from Recent OpenClaw Sessions, press the big button to toggle talk mode, talk through the whole thought without interruption, and hear the agent answer back at length.
The original OpenClaw/Discord thread stays canonical. Your spoken direction is transcribed into that session, the agent responds in that session, and Clawkie Talkie plays the reply back to you. Everything you say and hear in the car is still there in the thread when you sit back down.
host dashboard ──▶ Recent OpenClaw Sessions ──▶ talk next step
▲ │
└──────────── same OpenClaw session keeps moving ◀───────┘Install
The normal install path is agent-run. Tell your agent:
Install Clawkie-Talkie for me by following https://github.com/davidguttman/clawkie-talkie/blob/v1.0.0/AGENT-INSTALL.md.
Before installing, download or clone the repo and inspect the source; stop and ask me if anything looks suspicious, harmful, or you are not confident.
The install should set up the persistent Clawkie-Talkie daemon and the OpenClaw clawkie-voice-handoff skill.
Verify the daemon/service status, logs, and skill configuration before reporting success. When finished, give me the dashboard URL in the form https://clawkietalkie.app/dashboard/#host=<daemon-peer-id>, tell me to bookmark the dashboard URL on my phone or add it to my phone home screen, explain that the dashboard shows Recent OpenClaw Sessions and that I select a session to start voice, and mention that I can ask for my Clawkie dashboard URL again later if I need it.The install does three important things:
- Runs the daemon as a persistent user service.
- Generates one stable
DAEMON_PEER_IDand keeps it in the daemon.env. - Installs the OpenClaw
clawkie-voice-handoffskill with the same host ID.
Manual daemon setup is documented here:
Custom frontend/signaling/TURN setup is documented here:
Agent install/upgrade/repair flow is documented here:
Why this exists
A lot of the most useful thinking happens away from the desk — and the most useful version of it is not a one-liner. It is a full thought, spoken straight through, while you are driving and nothing else is competing for the keyboard.
Push-to-talk is the point. You use toggle talk mode to hold the floor for as long as you need to: a paragraph, a few minutes, an entire framing for a feature or an essay. The agent does not interrupt, the phone does not interrupt, and you get to land the idea before anyone responds.
Typical sessions look like:
- drafting writing or training material out loud on a long drive, then reading the agent's response when you stop;
- exploring a new app or product idea with your main agent, with the OpenClaw session's existing context already loaded;
- thinking through a real feature or design decision in the session that already has the code, the thread, and the prior turns;
- working through review feedback or a product direction at length, instead of bouncing one-line prompts.
Clawkie Talkie is for that mode of work. It is not a quick voice chat. It is a voice lane into a real OpenClaw session for long-form steering, with the canonical written transcript waiting for you when you get back.
The user experience
After install, open the dashboard URL your agent reported:
https://clawkietalkie.app/dashboard/#host=<daemon-peer-id>Bookmark the dashboard URL on your phone or add it to your phone home screen.
Open the dashboard before a drive and choose from Recent OpenClaw Sessions. If no sessions appear, start or resume an OpenClaw conversation at the desk, then refresh the dashboard on your phone.
The voice surface opens for that OpenClaw thread/session.
Tap the main button to record.
Talk through the whole thought — a long prompt, a paragraph of direction, a full framing for a feature or essay. Push-to-talk means the floor stays yours until you tap again.
Watch the live transcript while talking.
Tap stop. The transcript is sent immediately.
OpenClaw answers in the original session, at length, with the session's full context.
Clawkie Talkie speaks the reply back through the browser.
Later, at the desk, open the original OpenClaw/Discord thread and read everything that was said.
The interface is intentionally walkie-talkie shaped: one obvious start/stop control, visible transcript while speaking, a thinking state while the agent works, and spoken playback when the reply is ready. It is built for long-form voice steering of an existing session, not for becoming a separate quick-chat app.
What the links mean
The dashboard URL is the normal phone entry point to bookmark or add to your phone home screen:
https://clawkietalkie.app/dashboard/#host=<daemon-peer-id>It connects to your local daemon host and asks it for Recent OpenClaw Sessions. Pick one to open the voice surface for that session. If the list is empty, start or resume an OpenClaw conversation at the desk, then refresh the dashboard on your phone.
For troubleshooting and compatibility, the skill can still generate a compatibility /voice URL when explicitly requested:
https://clawkietalkie.app/voice#host=<daemon>&session=<openclaw-session-id>&sessionKey=<openclaw-session-key>&channel=<channel>&target=<message-target>Those values are not a login token or a credential bundle. They are routing information:
host— the stable local daemon ID on the user's machine.session— the OpenClaw session to continue. Prefer the actual OpenClaw sessionId UUID; fall back to the exact current session key only if the UUID is unavailable.sessionKey— optional exact OpenClaw session key, included whensessionis the UUID. The daemon uses it to select the OpenClaw agent and, when possible, derive Discord reply/transcript routing.channel+target— explicit originating message route, included when visible. Assistant reply delivery uses this route mandatorily; transcript mirroring also uses it best-effort.
The values live in the URL hash so they are parsed by the browser locally instead of being sent to the web server as normal request parameters.
The privacy boundary
Clawkie Talkie has a hosted browser UI, but the private work happens locally.
- The browser captures microphone audio and plays reply audio.
- The browser stores only UI settings such as provider/model/voice IDs.
- The browser does not receive provider API keys.
- The browser does not receive OpenClaw credentials.
- The local daemon talks to the local
openclawCLI. - OpenClaw's existing config/auth remains the owner of LLM, STT, and TTS credentials.
The daemon is the trusted side. The browser is just the remote control and audio surface.
Architecture
Phone / browser User's machine
┌────────────────────────────┐ ┌────────────────────────────┐
│ Clawkie Talkie web UI │ │ Clawkie Talkie daemon │
│ │ │ │
│ - mic capture │ WebRTC │ - session rendezvous │
│ - tap-to-talk control │ ◀──────▶ │ - OpenClaw CLI calls │
│ - live transcript display │ │ - STT / TTS orchestration │
│ - reply playback │ │ - channel/thread delivery │
└────────────────────────────┘ └────────────────────────────┘
│ │
│ signaling only │ local process calls
▼ ▼
https://api.rambly.app openclawThe signaling service helps the browser and daemon find each other. By default this is the hosted rambly-compatible broker; custom signaling/ICE/TURN is documented in Custom Clawkie Talkie stack. The voice session itself runs over WebRTC. The daemon shells out to OpenClaw for transcription, the agent turn, reply delivery, and speech synthesis.
There is no inbound HTTP port for the daemon. It connects outbound to signaling and providers.
Session model
There is one durable daemon per machine. Its DAEMON_PEER_ID is the stable rendezvous identity used in host=.... Treat the daemon host ID and dashboard URL as bearer routing material: anyone who has it can connect to the host dashboard, enumerate recent sessions exposed by that daemon, and select one for voice. Do not post it in public or shared chats; if it is exposed, rotate DAEMON_PEER_ID and update the installed OpenClaw skill/config to match.
Each voice room is then scoped by OpenClaw session:
voice room = daemon host + OpenClaw sessionThat means the same daemon can support multiple OpenClaw sessions without mixing them together. A Discord thread, a webchat session, and another channel can all produce different voice rooms through the same local daemon.
The dashboard asks the daemon for recent OpenClaw sessions, then the browser opens a voice room from already-known OpenClaw context: daemon host ID and session ID. The session ID is preferably the actual OpenClaw UUID; colon-style session keys are fallback/legacy inputs and cannot be assumed to exist for every surface.
What must already work
Clawkie Talkie depends on a working OpenClaw install for the same OS user that runs the daemon.
At minimum, that user needs:
openclawavailable onPATH;- a working OpenClaw session/runtime;
- a configured audio transcription provider;
- a configured text-to-speech provider.
The daemon uses these commands at runtime:
openclaw infer audio transcribe --file <wav> --json
openclaw agent --session-id <session> ...
openclaw infer tts convert --text <reply> --output <file> --jsonProvider selection is per request. Clawkie Talkie should not mutate OpenClaw's global provider preferences just because a user changes the voice settings in the browser.
Local development
From the repo root:
npm install
npm run devThat starts the daemon and the Vite client.
Vite's dev dependency cache is configured outside the repo/worktree by default so Docker or root-owned dev runs do not leave client/.vite behind and block worktree cleanup. Set CT_VITE_CACHE_DIR to override the cache directory, or run npm run print-dev-cache to see the current path. Do not rely on postdev cleanup for this: killed containers do not reliably run npm lifecycle cleanup hooks.
Regenerate baked hold music
The browser plays hold music from plain media elements so it can survive mobile PWA backgrounding better than a WebAudio-only effects graph. Raw/master MP3s live outside Vite's public directory in assets/hold-music-raw; processed tracks with stable public filenames are generated into baked Low/Medium/High directories (client/public/music-low, client/public/music, client/public/music-high) with the AM-radio effects, hiss, and crackle already mixed into each processed file. Matching original directories (music-original*) stay no-effects/no-noise, also with baked Low/Medium/High levels. The runtime picks a discrete level by URL instead of changing media-element or WebAudio gain volume.
After changing a raw track or the processing chain, regenerate with:
npm run music:regenThe script requires ffmpeg on PATH and applies the AM-radio hold chain in scripts/regenerate-hold-music.mjs.
Common checks:
npm run version:check
npm test
npm run typecheck
npm run buildAndroid/web release automation is documented in Android/web release process.
The shared cross-platform no-drift fixtures live in shared/contract/ and are
read by both web Vitest and Android JUnit tests.
Run only the daemon:
npm run daemonA healthy daemon prints something like:
Peer ID: <daemon-peer-id>
Join URL: https://clawkietalkie.app/dashboard/#host=<daemon-peer-id>
Waiting for phone…That host-only URL opens the host-scoped session dashboard. Bookmark the dashboard URL on the user's phone or add it to the phone home screen; the dashboard shows Recent OpenClaw Sessions and lets the user select the session to continue by voice. The installed OpenClaw skill can also give the user this dashboard URL again later from the configured daemon host ID.
Troubleshooting
The browser never connects
Check that the daemon is running and that the host value in the link matches the daemon's DAEMON_PEER_ID.
The page says to update the daemon
The hosted browser client is current by definition. If the UI reports a daemon protocol/capability mismatch, update the installed daemon from the latest repo source, keep the same DAEMON_PEER_ID, restart the service, and verify the dashboard/voice flow again.
A compatibility /voice page says the session is bad
The compatibility voice URL is missing routing fields or was built for the wrong context. Voice URLs need host and session; prefer the actual OpenClaw sessionId UUID, include sessionKey, channel, and target when those exact runtime values are visible, and use an exact session key in session only as fallback. Delivered assistant replies require an explicit reply route, either from channel/target or a resolvable Discord session key.
Voice records but no reply comes back
Check daemon logs. Common causes are:
openclawis not on the service user'sPATH;- OpenClaw is not authenticated/configured for that user;
- STT provider config is missing;
- TTS provider config is missing;
- the session ID in the link does not identify a real OpenClaw session.
Links break after restart
DAEMON_PEER_ID changed. It should be generated once and kept stable. The daemon .env and the installed OpenClaw skill must use the same value.
It works manually but not as a service
This is usually an environment problem. The launchd/systemd service needs the same access to node, npm, and openclaw that your interactive shell has.
WebRTC will not establish
Corporate networks, VPNs, proxies, and blocked UDP/TURN traffic can prevent the browser and daemon from connecting.
Repo map
client/— phone/browser UI.daemon/— local WebRTC/OpenClaw bridge.openclaw/clawkie-voice-handoff/— skill source for returning the dashboard URL and compatibility voice URLs.docs/install-daemon.md— daemon install and service setup.docs/voice-handoff.md— deterministic rendezvous protocol.test/— client, daemon, protocol, and infer tests.
Current boundaries
- macOS and Linux daemon install paths are documented.
- Windows daemon install is not documented yet.
- Clawkie Talkie is tied to OpenClaw sessions; it is not a standalone voice assistant.
- The original OpenClaw/channel thread remains the transcript source of truth.
