openclaw-voxsense
v0.1.2
Published
OpenClaw plugin that lets a multimodal model understand voice messages in context and hand them back to the main agent.
Maintainers
Readme
OpenClaw VoxSense
Context-aware voice understanding for OpenClaw.
面向 OpenClaw 的上下文语音理解插件。
Current channel support: Telegram only.
当前仅支持 Telegram。
Current model support: Gemini only for now.
当前模型支持暂时仅限 Gemini。
Demo / 演示
Prompt / 问题
我是个大帅哥能告诉我这句话里哪个字最长吗
ASR only (Groq / whisper-large-v3-turbo)

VoxSense (Gemini-3-Flash)

Summary / 简介
English
- Works with
OpenClaw >= 2026.3.11 - Currently supports Gemini multimodal models only
- Default mode is
handoff - A multimodal model listens to the raw voice message plus recent chat context
- The plugin returns structured understanding to the main OpenClaw agent
- Normal tool calling, memory, and TTS reply flow stay available
中文
- 适用于
OpenClaw >= 2026.3.11 - 当前暂时仅支持 Gemini 多模态模型
- 默认模式为
handoff - 使用多模态模型直接理解原始语音和最近会话上下文
- 插件把结构化理解结果交回主 OpenClaw agent
- 正常工具调用、记忆、多轮和 TTS 回复链路仍然可用
Comparison / 对比
| Item | OpenClaw built-in voice (current) | VoxSense |
| --- | --- | --- |
| Input path | tools.media.audio transcribes voice to text first | Multimodal model reads raw audio directly |
| What the agent receives | Transcript text | Transcript + intent + tone + notes + confidence |
| Context use during understanding | Mostly after transcription | During audio understanding itself |
| Tool calling | Available | Available in handoff mode |
| Multi-turn chat | Available | Available in handoff mode |
| Voice reply | Uses normal OpenClaw TTS | Uses normal OpenClaw TTS |
| Best for | Simple STT-first voice chat | Context-aware voice understanding |
| Tradeoff | Simpler and more predictable | Richer understanding, but more model-dependent |
Install / 安装
From npm / 从 npm 安装
openclaw plugins install openclaw-voxsenseFrom source / 从源码加载
{
"hooks": {
"internal": {
"enabled": true
}
},
"plugins": {
"load": {
"paths": [
"~/.openclaw/workspace/plugins/openclaw-voxsense"
]
},
"entries": {
"openclaw-voxsense": {
"enabled": true,
"hooks": {
"allowPromptInjection": true
},
"config": {
"provider": "haloai-gemini",
"model": "gemini-3-flash-preview",
"mode": "handoff",
"onlyWhenNoText": true,
"storeHeardTextInSession": true,
"debug": false
}
}
}
}
}Restart the gateway after config changes.
修改配置后需要重启网关。
TTS / 语音回复
VoxSense does not synthesize speech by itself.
VoxSense 本身不负责语音合成。
For voice replies, use normal OpenClaw TTS:
- configure
messages.tts - or enable it per session with
/tts always
如果你希望机器人“说话”,仍然需要使用 OpenClaw 自带的 TTS:
- 配置
messages.tts - 或者在会话里用
/tts always
Key Config / 关键配置
provider: provider id used for direct audio understandingmodel: model id used for direct audio understandingmode:handoff: recommended; understand voice, then hand the turn back to the main agentreply: legacy direct-reply mode
onlyWhenNoText: only intercept pure voice turnsstoreHeardTextInSession: persist understood voice content into session historydebug: verbose plugin logs
Runtime Command / 运行时命令
Primary command:
/voxsense status
/voxsense on
/voxsense off
/voxsense debug on
/voxsense debug offNotes / 说明
- VoxSense currently supports Gemini-family models exposed through the
google-generative-aigenerateContentshape - Other multimodal providers are not supported yet
- If you only want VoxSense, you can disable the built-in STT path with
tools.media.audio.enabled=false - If built-in STT stays enabled, both paths may run and cost more
Publish / 发布
Recommended release path:
- push source to GitHub
- publish package to npm
- let users install with
openclaw plugins install openclaw-voxsense
推荐发布方式:
- 源码放 GitHub
- npm 发布包
- 用户通过
openclaw plugins install openclaw-voxsense安装
