# Fruit Stand
A minimal, unobtrusive, hands-free Claude Code experience based on Voxtral Transcribe Realtime and using fruit keywords.
Warning: do not use this to develop Fruit Ninja or other fruit-themed software!
After experimenting with different voice-only interfaces to Claude Code — something to let me use it hands-free while doing dishes — I settled on this approach: a tmux session where Claude Code takes up most of the vertical space, with a small indicator at the bottom that provides real-time feedback for what you're saying. Fruit-related keywords are used to end your commands and trigger actions in Claude Code.
## How it works
You speak naturally, and your words appear in real time in the indicator panel at the bottom of the screen. When you say one of the fruit keywords, it triggers the corresponding action:
| Keyword | Action |
|---------|--------|
| 🍌 banana | Send everything you've said so far as a message to Claude Code |
| 🥑 avocado | Interrupt Claude Code (Escape) |
| 🍈 papaya | Clear what you've said without sending |
| 🥝 kiwi | Send /clear to start a new conversation |
| 🥭 mango | Toggle Claude Code's mode (Shift+Tab) |
For example, saying "refactor the auth module to use JWT banana" sends "refactor the auth module to use JWT" to Claude Code.
Fruit keywords were chosen because they are common, distinct, easily recognized, and easy to pronounce, yet rarely used in programming outside of very specific circumstances.
If you happen to be developing Fruit Ninja or other fruit-themed software, overrides for the keywords are supported (run `fruitstand --help` for more).
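For instance, if "banana" and "avocado" are real vocabulary in your project, you might remap those two keywords using the flags documented below (the replacement words here are just illustrative):

```bash
fruitstand --send durian --interrupt lychee
```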
## Install

```bash
bun install -g fruitstand-cli
```

## Setup
You need a Mistral API key for voice transcription (Voxtral). Add it to `~/.fruitstand`:

```bash
echo "MISTRAL_API_KEY=your-key-here" > ~/.fruitstand
```

This file uses `KEY=VALUE` format, one per line. Lines starting with `#` are ignored. Environment variables already set in your shell take precedence.
## Usage

```bash
fruitstand
```

### CLI options
```
--continue            Continue the most recent conversation
--resume <id>         Resume a specific conversation by ID
--send <word>         Override "send" keyword (default: banana)
--interrupt <word>    Override "interrupt" keyword (default: avocado)
--clear <word>        Override "clear" keyword (default: papaya)
--slash-clear <word>  Override "/clear" keyword (default: kiwi)
--toggle-mode <word>  Override "toggle mode" keyword (default: mango)
```

### Subcommands
- `tmux` (default) — Launch Claude Code with voice control in a tmux session
- `voice-input` — Voice input panel (used internally by `tmux`)
- `listen` — Debug: raw voice transcription output
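A couple of typical invocations, using the flags above (the conversation ID placeholder is left unfilled on purpose):

```bash
# Start fresh
fruitstand

# Pick up the most recent conversation
fruitstand --continue

# Resume a specific conversation
fruitstand --resume <id>
```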
## Approaches that didn't work
### Real-time voice models (Gemini Live, GPT Realtime)
My first attempt was using a real-time conversational model as the interface. The problem: these models too frequently responded with audio when I didn't need them to, and interrupted me when I didn't want them to. It turns out that when you're dictating your thoughts to a coding agent, things get complicated — you need to take a breath, pause, and think for a moment before moving forward. That's a natural part of articulating complex technical thoughts, and a model that interprets every pause as a turn boundary makes the experience frustrating.
This led me to the keyword-based approach: simple streaming transcription where I trigger the send intentionally, rather than letting a model decide for me.
### Re-implementing the Claude Code UI
Another approach I considered was building a custom UI — either in a browser or an alternative terminal interface — with voice input built in. The problem is that the Claude Code terminal UI, for all the criticism it gets online, is actually pretty good. It handles a wide range of functionality: switching between modes, resuming sessions, configuration options, and more. Any approach that tried to bring this experience into a different UI would have required re-implementing too much of the core Claude Code functionality, which I didn't want to do. It's also a moving target — as Claude Code ships new features, a custom UI would constantly fall behind.
The tmux approach sidesteps all of this. Claude Code runs unmodified in its own pane, and the voice input panel just sends keystrokes to it. This preserves full Claude Code fidelity, and I can still intervene manually with the keyboard when needed.
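For illustration, the core mechanism can be sketched with plain tmux commands (a minimal sketch of the idea, not the actual implementation; the session name and dictated text are hypothetical):

```bash
# Claude Code runs unmodified in its own tmux session/pane
tmux new-session -d -s fruitstand 'claude'

# "banana": type the accumulated transcript into Claude Code's pane and submit it
tmux send-keys -t fruitstand 'refactor the auth module to use JWT' Enter

# "avocado": interrupt Claude Code
tmux send-keys -t fruitstand Escape
```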
### Tool-call-based approaches
There are a number of approaches that give Claude Code the ability to listen to or speak to the user via tool calls or skills. I found these too brittle — the model might just decide not to use the tool. It also wastes precious tool/skill context that should be reserved for something else.
## How I use it
I tend to use this in two different ways.
**Fully hands-free mode.** I'm doing a mindless activity like dishwashing or folding clothes. In this mode I use Opus and give long, detailed prompts where the agent iterates for a while. I come back to check on it after I've finished cleaning a dish. The interaction is asynchronous — I speak a thought, send it, and let the agent work while I do something else with my hands.
**Editor-alongside mode.** I have my text editor open and I'm reviewing a PR or making active changes to a codebase. As I go, I spot small isolated tasks that I can quickly offload to the agent just by speaking and saying the trigger word. This is a more pleasant, involved coding experience — I use the agent to stay present by quickly offloading tedious bits that might otherwise cause me to get frustrated or lose focus. In this mode I don't wait for the agent's output. I'll mention that some other bit in a different file needs to be changed, and come back to check on it later.
## Requirements

- Bun (to install and run the CLI)
- tmux
- A Mistral API key (for Voxtral transcription)
- A microphone
## License
MIT
