openclaw-voxsense

v0.1.2

OpenClaw plugin that lets a multimodal model understand voice messages in context and hand them back to the main agent.

OpenClaw VoxSense

Context-aware voice understanding plugin for OpenClaw.

Current channel support: Telegram only.

Current model support: Gemini only, for now.

Demo

Prompt

我是个大帅哥能告诉我这句话里哪个字最长吗
(English: "I'm a very handsome guy; can you tell me which character in this sentence is the longest?")

ASR only (Groq / whisper-large-v3-turbo)

[Image: ASR-only demo]

VoxSense (Gemini-3-Flash)

[Image: VoxSense demo]

Summary

  • Works with OpenClaw >= 2026.3.11
  • Currently supports Gemini multimodal models only
  • Default mode is handoff
  • A multimodal model listens to the raw voice message plus recent chat context
  • The plugin returns structured understanding to the main OpenClaw agent
  • Normal tool calling, memory, multi-turn chat, and TTS reply flow stay available

Comparison

| Item | OpenClaw built-in voice (current) | VoxSense |
| --- | --- | --- |
| Input path | tools.media.audio transcribes voice to text first | Multimodal model reads raw audio directly |
| What the agent receives | Transcript text | Transcript + intent + tone + notes + confidence |
| Context use during understanding | Mostly after transcription | During audio understanding itself |
| Tool calling | Available | Available in handoff mode |
| Multi-turn chat | Available | Available in handoff mode |
| Voice reply | Uses normal OpenClaw TTS | Uses normal OpenClaw TTS |
| Best for | Simple STT-first voice chat | Context-aware voice understanding |
| Tradeoff | Simpler and more predictable | Richer understanding, but more model-dependent |
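
As a rough illustration of the "What the agent receives" row, the structured result could be modeled as a small TypeScript type and rendered as text for the main agent. This is only a sketch: the `VoiceUnderstanding` interface, the `formatHandoff` helper, the output layout, and the 0..1 confidence range are assumptions made for this example and are not taken from the plugin's source.

```typescript
// Hypothetical shape: field names follow the comparison table,
// not the plugin's actual schema.
interface VoiceUnderstanding {
  transcript: string;
  intent: string;
  tone: string;
  notes: string;
  confidence: number; // assumed to lie in the range 0..1
}

// Render the understanding as a plain-text block the main agent could read.
function formatHandoff(u: VoiceUnderstanding): string {
  // Clamp so that an out-of-range model output does not leak through.
  const confidence = Math.min(1, Math.max(0, u.confidence));
  return [
    "[voice message]",
    `transcript: ${u.transcript}`,
    `intent: ${u.intent}`,
    `tone: ${u.tone}`,
    `notes: ${u.notes}`,
    `confidence: ${confidence.toFixed(2)}`,
  ].join("\n");
}

const sampleUnderstanding: VoiceUnderstanding = {
  transcript: "Can you remind me about the meeting?",
  intent: "set_reminder",
  tone: "hurried",
  notes: "Speaker stressed the word 'meeting'.",
  confidence: 0.87,
};

console.log(formatHandoff(sampleUnderstanding));
```

An STT-first path would hand the agent only the `transcript` field; the extra fields are where tone and emphasis from the raw audio could be carried along.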

Install

From npm

openclaw plugins install openclaw-voxsense

From source

{
  "hooks": {
    "internal": {
      "enabled": true
    }
  },
  "plugins": {
    "load": {
      "paths": [
        "~/.openclaw/workspace/plugins/openclaw-voxsense"
      ]
    },
    "entries": {
      "openclaw-voxsense": {
        "enabled": true,
        "hooks": {
          "allowPromptInjection": true
        },
        "config": {
          "provider": "haloai-gemini",
          "model": "gemini-3-flash-preview",
          "mode": "handoff",
          "onlyWhenNoText": true,
          "storeHeardTextInSession": true,
          "debug": false
        }
      }
    }
  }
}

Restart the gateway after config changes.

TTS

VoxSense does not synthesize speech by itself.

For voice replies, use the normal OpenClaw TTS:

  • configure messages.tts
  • or enable it per session with /tts always

Key Config

  • provider: provider id used for direct audio understanding
  • model: model id used for direct audio understanding
  • mode:
    • handoff: recommended; understand voice, then hand the turn back to the main agent
    • reply: legacy direct-reply mode
  • onlyWhenNoText: only intercept pure voice turns
  • storeHeardTextInSession: persist understood voice content into session history
  • debug: verbose plugin logs
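
The options above can be sketched as a TypeScript type with a small resolver. This is a hedged illustration, not the plugin's code: `resolveConfig`, `shouldIntercept`, `IncomingTurn`, and `FALLBACKS` are hypothetical names. Only `mode: "handoff"` is documented as a default; the other fallback values simply mirror the sample config and are assumptions.

```typescript
type VoxSenseMode = "handoff" | "reply";

interface VoxSenseConfig {
  provider: string;
  model: string;
  mode: VoxSenseMode;
  onlyWhenNoText: boolean;
  storeHeardTextInSession: boolean;
  debug: boolean;
}

// Assumed fallbacks. Only mode = "handoff" is stated as the default in the
// README; the rest copy the sample config and may differ from the plugin.
const FALLBACKS = {
  mode: "handoff" as VoxSenseMode,
  onlyWhenNoText: true,
  storeHeardTextInSession: true,
  debug: false,
};

// Fill in missing optional keys; provider and model have no sensible fallback.
function resolveConfig(raw: Partial<VoxSenseConfig>): VoxSenseConfig {
  if (!raw.provider || !raw.model) {
    throw new Error("provider and model are required");
  }
  return {
    provider: raw.provider,
    model: raw.model,
    mode: raw.mode ?? FALLBACKS.mode,
    onlyWhenNoText: raw.onlyWhenNoText ?? FALLBACKS.onlyWhenNoText,
    storeHeardTextInSession:
      raw.storeHeardTextInSession ?? FALLBACKS.storeHeardTextInSession,
    debug: raw.debug ?? FALLBACKS.debug,
  };
}

// Hypothetical view of an incoming turn: a voice attachment plus optional text.
interface IncomingTurn {
  hasVoice: boolean;
  text?: string;
}

// One reading of onlyWhenNoText: skip voice turns that also carry text.
function shouldIntercept(turn: IncomingTurn, cfg: VoxSenseConfig): boolean {
  if (!turn.hasVoice) return false;
  const hasText = (turn.text ?? "").trim().length > 0;
  return cfg.onlyWhenNoText ? !hasText : true;
}

const resolved = resolveConfig({
  provider: "haloai-gemini",
  model: "gemini-3-flash-preview",
});
console.log(resolved.mode, shouldIntercept({ hasVoice: true }, resolved));
```

Under this reading, a voice message sent with a caption is left to the normal text path when `onlyWhenNoText` is true.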

Runtime Commands

Primary commands:

/voxsense status
/voxsense on
/voxsense off
/voxsense debug on
/voxsense debug off
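
For readers wiring up similar commands, the five forms above map naturally onto a small discriminated union. The parser below is a minimal sketch under that assumption; `parseVoxSenseCommand` and the `VoxSenseCommand` type are hypothetical names and do not describe how the plugin itself parses input.

```typescript
// Hypothetical command model covering the five documented forms.
type VoxSenseCommand =
  | { kind: "status" }
  | { kind: "enable"; on: boolean }
  | { kind: "debug"; on: boolean };

// Returns null for anything that is not one of the documented commands.
function parseVoxSenseCommand(input: string): VoxSenseCommand | null {
  const parts = input.trim().split(/\s+/);
  if (parts[0] !== "/voxsense") return null;
  const sub = parts[1];
  const arg = parts[2];

  if (parts.length === 2 && sub === "status") {
    return { kind: "status" };
  }
  if (parts.length === 2 && (sub === "on" || sub === "off")) {
    return { kind: "enable", on: sub === "on" };
  }
  if (parts.length === 3 && sub === "debug" && (arg === "on" || arg === "off")) {
    return { kind: "debug", on: arg === "on" };
  }
  return null;
}

for (const line of ["/voxsense status", "/voxsense off", "/voxsense debug on"]) {
  console.log(line, "=>", JSON.stringify(parseVoxSenseCommand(line)));
}
```

Returning `null` for unknown input, rather than throwing, lets the caller fall through to normal message handling.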

Notes

  • VoxSense currently supports Gemini-family models exposed through the google-generative-ai generateContent shape
  • Other multimodal providers are not supported yet
  • If you only want VoxSense, you can disable the built-in STT path with tools.media.audio.enabled=false
  • If built-in STT stays enabled, both paths may run and cost more
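
To make the "generateContent shape" in the first note concrete, the sketch below builds a request body that carries recent chat context as text and the voice clip as inline base64 audio, following the publicly documented Gemini `generateContent` JSON layout. The `buildAudioRequest` helper, the prompt wording, and the `audio/ogg` MIME type are assumptions for illustration; the plugin's real prompt and request construction are not shown in this README. The code assumes a Node.js runtime for `Buffer`.

```typescript
// Subset of the Gemini generateContent request body used in this sketch.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GenerateContentRequest {
  contents: Array<{ role: "user"; parts: Part[] }>;
}

// Build a request body: text instructions plus context, then the raw audio.
function buildAudioRequest(
  audio: Uint8Array,
  mimeType: string,
  recentContext: string[],
): GenerateContentRequest {
  // Illustrative prompt only; the plugin's actual wording is unknown.
  const prompt = [
    "Listen to the attached voice message.",
    "Recent chat context:",
    ...recentContext.map((line) => `- ${line}`),
    "Return transcript, intent, tone, notes, and confidence.",
  ].join("\n");

  return {
    contents: [
      {
        role: "user",
        parts: [
          { text: prompt },
          // Audio bytes are sent inline as base64 (Node.js Buffer assumed).
          { inlineData: { mimeType, data: Buffer.from(audio).toString("base64") } },
        ],
      },
    ],
  };
}

// The MIME type here is an assumption about how a Telegram voice clip is labeled.
const body = buildAudioRequest(new Uint8Array([1, 2, 3]), "audio/ogg", [
  "user: are we still on for tomorrow?",
]);
console.log(JSON.stringify(body, null, 2));
```

A body like this would be POSTed to the provider's `models/{model}:generateContent` endpoint; authentication and response parsing are left out of the sketch.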

Publish

Recommended release path:

  1. Push the source to GitHub
  2. Publish the package to npm
  3. Let users install with openclaw plugins install openclaw-voxsense