openclaw-azure-speech

v1.0.0

Azure AI Speech Service integration for OpenClaw — TTS and STT with 400+ neural voices, full SSML support, and enterprise SLA.

Features

  • TTS (Text-to-Speech): 400+ neural voices across 140+ languages/locales
  • STT (Speech-to-Text): Short audio transcription via REST API + realtime streaming via WebSocket
  • Zero dependencies: Pure fetch for REST calls, native WebSocket for streaming
  • SSML support: Full SSML override for fine-grained voice control
  • Voice listing: Browse available voices filtered by locale
  • Channel-aware output: Automatic opus/ogg for Telegram/WhatsApp voice messages, MP3 for others
  • CJK auto-switch: Automatically switches from the default English voice to a Chinese voice when CJK characters dominate the text
  • Model directive support: Override voice with [[tts:voice=zh-CN-YunxiNeural]]

Prerequisites

  1. An Azure account with a Speech Service resource
  2. The subscription key and region for that resource, found in the Azure Portal under Speech resource → Keys and Endpoint

Installation

openclaw plugins install openclaw-azure-speech

Or link locally for development:

openclaw plugins install -l /path/to/openclaw-azure-speech

Configuration

Minimal setup (TTS only)

Add to your openclaw.json:

{
  plugins: {
    entries: {
      "azure-speech": {
        config: {
          subscriptionKey: "your-azure-speech-key",
          region: "eastasia",
        }
      }
    }
  },
  messages: {
    tts: {
      provider: "azure",  // use Azure as TTS provider
    }
  }
}

Full setup (TTS + STT)

STT requires an additional models.providers.azure entry for OpenClaw's media understanding auth pipeline:

{
  // Plugin config
  plugins: {
    entries: {
      "azure-speech": {
        config: {
          subscriptionKey: "your-azure-speech-key",
          region: "eastasia",
          voice: "zh-CN-XiaoxiaoNeural",  // optional; default is en-US-JennyNeural with CJK auto-switch
          sttLanguage: "zh-CN",            // optional, default zh-CN
        }
      }
    }
  },

  // TTS provider selection
  messages: {
    tts: {
      provider: "azure",
    }
  },

  // STT auth (required for audio transcription)
  models: {
    providers: {
      azure: {
        apiKey: "your-azure-speech-key",
        baseUrl: "https://eastasia.stt.speech.microsoft.com",
        models: []
      }
    }
  },

  // Audio transcription model entry
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "azure", model: "default", language: "zh-CN" }
        ]
      }
    }
  }
}
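The `baseUrl` above embeds the region, so it presumably has to match the `region` in the plugin config. As an illustration of what a short-audio transcription request against that host looks like, here is a minimal sketch using Azure's public short-audio REST endpoint. The helper names (`sttBaseUrl`, `shortAudioUrl`, `transcribeWav`) are hypothetical and are not taken from the plugin, which may build its requests differently.

```typescript
// Hypothetical helpers; only the endpoint path and headers come from
// Azure's Speech-to-Text REST API for short audio.
function sttBaseUrl(region: string): string {
  return `https://${region}.stt.speech.microsoft.com`;
}

function shortAudioUrl(region: string, language: string): string {
  const url = new URL(
    "/speech/recognition/conversation/cognitiveservices/v1",
    sttBaseUrl(region),
  );
  url.searchParams.set("language", language);
  return url.toString();
}

// Sends 16 kHz PCM WAV audio and returns the recognized text, or "" if
// the service recognized nothing.
async function transcribeWav(
  region: string,
  key: string,
  wav: ArrayBuffer,
  language: string = "zh-CN",
): Promise<string> {
  const res = await fetch(shortAudioUrl(region, language), {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
      Accept: "application/json",
    },
    body: wav,
  });
  if (!res.ok) throw new Error(`STT request failed: HTTP ${res.status}`);
  const result = (await res.json()) as { RecognitionStatus: string; DisplayText?: string };
  return result.DisplayText ?? "";
}
```

Deriving both the plugin `region` and the `baseUrl` from one value, as `sttBaseUrl` does, is one way to keep the two settings from drifting apart.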

Environment variables (alternative)

You can use environment variables instead of or alongside openclaw.json:

AZURE_SPEECH_KEY=your-key        # subscription key
AZURE_SPEECH_REGION=eastasia     # Azure region
AZURE_SPEECH_VOICE=zh-CN-XiaoxiaoNeural  # optional, default TTS voice
AZURE_SPEECH_STT_LANGUAGE=zh-CN          # optional, default STT language

Config resolution priority

The plugin resolves configuration from multiple sources (highest priority first):

  1. messages.tts.providers.azure (standard OpenClaw TTS provider config)
  2. plugins.entries.azure-speech.config (plugin config)
  3. Environment variables
  4. Built-in defaults
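
The priority order above can be read as a "first defined value wins" lookup per field. The following is a minimal sketch of that idea, not the plugin's actual code: `SpeechConfig`, `fromEnv`, and `resolveConfig` are hypothetical names, and the defaults shown are only the two this README mentions (the `en-US-JennyNeural` voice and the `zh-CN` STT language).

```typescript
// Hypothetical shape; field names mirror the config keys in this README.
interface SpeechConfig {
  subscriptionKey?: string;
  region?: string;
  voice?: string;
  sttLanguage?: string;
}

// Defaults mentioned in this README; other defaults are unknown.
const DEFAULTS: SpeechConfig = {
  voice: "en-US-JennyNeural",
  sttLanguage: "zh-CN",
};

// Map the documented environment variables onto the same shape.
function fromEnv(env: Record<string, string | undefined>): SpeechConfig {
  return {
    subscriptionKey: env.AZURE_SPEECH_KEY,
    region: env.AZURE_SPEECH_REGION,
    voice: env.AZURE_SPEECH_VOICE,
    sttLanguage: env.AZURE_SPEECH_STT_LANGUAGE,
  };
}

// Sources are passed highest priority first; for each field the first
// defined value is kept.
function resolveConfig(...sources: SpeechConfig[]): SpeechConfig {
  const keys: (keyof SpeechConfig)[] = ["subscriptionKey", "region", "voice", "sttLanguage"];
  const resolved: SpeechConfig = {};
  for (const key of keys) {
    for (const source of sources) {
      const value = source[key];
      if (value !== undefined) {
        resolved[key] = value;
        break;
      }
    }
  }
  return resolved;
}
```

Called as `resolveConfig(ttsProviderConfig, pluginConfig, fromEnv(process.env), DEFAULTS)`, a voice set under `messages.tts.providers.azure` would override one set in the plugin config, while a key supplied only through `AZURE_SPEECH_KEY` would still be picked up.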

TTS directives

When messages.tts.modelOverrides.enabled is true (default), the model can override TTS settings per-reply:

[[tts:voice=zh-CN-YunxiNeural]]
[[tts:voiceId=en-US-GuyNeural]]
[[tts:outputFormat=ogg-48khz-16bit-mono-opus]]
[[tts:lang=ja-JP]]
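
To make the directive syntax concrete, here is a small sketch that extracts `[[tts:key=value]]` pairs from a reply and strips them from the spoken text. It only illustrates the format: `parseTtsDirectives` is a hypothetical helper, and how OpenClaw or the plugin actually parses directives is not described in this README.

```typescript
// Matches [[tts:key=value]]; the value runs up to the closing "]]".
const DIRECTIVE = /\[\[tts:(\w+)=([^\]]+)\]\]/g;

interface ParsedReply {
  text: string;                      // reply with directives removed
  overrides: Record<string, string>; // e.g. { voice: "zh-CN-YunxiNeural" }
}

function parseTtsDirectives(reply: string): ParsedReply {
  const overrides: Record<string, string> = {};
  const text = reply
    .replace(DIRECTIVE, (_match: string, key: string, value: string) => {
      overrides[key] = value.trim();
      return ""; // drop the directive from the text to be spoken
    })
    .trim();
  return { text, overrides };
}
```

For a reply such as `[[tts:voice=zh-CN-YunxiNeural]] 你好`, the sketch returns the text `你好` with `overrides.voice` set to `zh-CN-YunxiNeural`.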

Supported output formats

| Format | Use case |
|--------|----------|
| audio-24khz-48kbitrate-mono-mp3 | Default, good for most channels |
| audio-24khz-96kbitrate-mono-mp3 | Higher quality MP3 |
| ogg-48khz-16bit-mono-opus | Voice messages (Telegram, WhatsApp, etc.) |
| riff-24khz-16bit-mono-pcm | WAV/PCM |
| audio-48khz-192kbitrate-mono-mp3 | High-fidelity MP3 |

See Microsoft docs for the full list.
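
The channel-aware output described under Features amounts to choosing one of these format strings per channel. A minimal sketch of that choice follows. The channel identifiers (`"telegram"`, `"whatsapp"`), the function name `pickOutputFormat`, and the rule that an explicit override wins are all assumptions made for illustration.

```typescript
// Channels the README names as receiving opus/ogg voice messages.
// The lowercase identifiers are illustrative assumptions.
const VOICE_MESSAGE_CHANNELS = new Set(["telegram", "whatsapp"]);

function pickOutputFormat(channel: string, override?: string): string {
  // Assumption: an explicit format (e.g. from a TTS directive) takes precedence.
  if (override) return override;
  return VOICE_MESSAGE_CHANNELS.has(channel.toLowerCase())
    ? "ogg-48khz-16bit-mono-opus"        // voice-message channels
    : "audio-24khz-48kbitrate-mono-mp3"; // documented default
}
```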

Popular voices

| Voice | Language | Gender | Styles |
|-------|----------|--------|--------|
| zh-CN-XiaoxiaoNeural | Chinese (Mandarin) | Female | cheerful, sad, angry, ... |
| zh-CN-YunxiNeural | Chinese (Mandarin) | Male | narration, cheerful, ... |
| en-US-JennyNeural | English (US) | Female | — |
| en-US-GuyNeural | English (US) | Male | — |
| ja-JP-NanamiNeural | Japanese | Female | — |

Use the listVoices API to browse all 400+ voices.
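
The signature of the plugin's `listVoices` API is not shown in this README, so the sketch below instead queries Azure's public `voices/list` REST endpoint directly and filters the result by locale. `AzureVoice` lists only a subset of the returned fields, and `voicesListUrl`, `filterByLocale`, and `fetchVoices` are hypothetical helper names.

```typescript
// Subset of the fields returned by Azure's voices/list endpoint.
interface AzureVoice {
  ShortName: string;    // e.g. "zh-CN-XiaoxiaoNeural"
  Locale: string;       // e.g. "zh-CN"
  Gender: string;
  StyleList?: string[]; // present only for voices with speaking styles
}

function voicesListUrl(region: string): string {
  return `https://${region}.tts.speech.microsoft.com/cognitiveservices/voices/list`;
}

// Keep voices whose locale starts with the given prefix ("zh" or "zh-CN").
function filterByLocale(voices: AzureVoice[], locale: string): AzureVoice[] {
  const prefix = locale.toLowerCase();
  return voices.filter((v) => v.Locale.toLowerCase().startsWith(prefix));
}

async function fetchVoices(region: string, key: string, locale?: string): Promise<AzureVoice[]> {
  const res = await fetch(voicesListUrl(region), {
    headers: { "Ocp-Apim-Subscription-Key": key },
  });
  if (!res.ok) throw new Error(`voices/list failed: HTTP ${res.status}`);
  const voices = (await res.json()) as AzureVoice[];
  return locale ? filterByLocale(voices, locale) : voices;
}
```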

CJK auto-switch

When using the default English voice (en-US-JennyNeural), the plugin automatically switches to zh-CN-XiaoxiaoNeural if CJK characters are dominant in the text. This means you don't need to configure a Chinese voice explicitly for Chinese-dominant usage.
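
One way to picture "CJK characters are dominant" is as a ratio test over the non-whitespace characters. The sketch below assumes a 50% threshold and a particular set of Unicode ranges. Both are illustrative guesses, and the plugin's real detection rule may differ.

```typescript
// Han ideographs, Hiragana, Katakana, and Hangul syllables.
const CJK_CHAR = /[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]/;

// Fraction of non-whitespace characters that fall in the CJK ranges above.
function cjkRatio(text: string): number {
  const chars = Array.from(text).filter((ch) => !/\s/.test(ch));
  if (chars.length === 0) return 0;
  const cjk = chars.filter((ch) => CJK_CHAR.test(ch)).length;
  return cjk / chars.length;
}

const DEFAULT_VOICE = "en-US-JennyNeural";
const CJK_VOICE = "zh-CN-XiaoxiaoNeural";

// Only the default voice is switched; an explicitly configured voice is kept.
// The 0.5 threshold is an assumption, not a documented value.
function pickVoice(
  text: string,
  configuredVoice: string = DEFAULT_VOICE,
  threshold: number = 0.5,
): string {
  if (configuredVoice !== DEFAULT_VOICE) return configuredVoice;
  return cjkRatio(text) > threshold ? CJK_VOICE : configuredVoice;
}
```

Under these assumptions, a mostly Chinese sentence is routed to `zh-CN-XiaoxiaoNeural`, while an English sentence containing one or two Chinese words stays on `en-US-JennyNeural`.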

Architecture

The plugin registers three OpenClaw capabilities:

| Capability | Registration | Purpose |
|-----------|-------------|---------|
| SpeechProvider | api.registerSpeechProvider() | TTS synthesis |
| RealtimeTranscriptionProvider | api.registerRealtimeTranscriptionProvider() | Streaming STT via WebSocket |
| MediaUnderstandingProvider | api.registerMediaUnderstandingProvider() | Audio file transcription (short audio ≤60s) |

All API calls use pure fetch (zero runtime dependencies). The WebSocket STT uses Node.js native WebSocket (requires Node ≥ 22).
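
As an illustration of what a dependency-free TTS call can look like, the sketch below posts SSML to Azure's public text-to-speech REST endpoint with plain `fetch`. It is not the plugin's source: `escapeXml`, `buildSsml`, and `synthesize` are hypothetical names, and the `User-Agent` value is a placeholder.

```typescript
// Escape characters that would otherwise break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap plain text in a minimal SSML document for the given voice.
function buildSsml(text: string, voice: string): string {
  // "zh-CN-XiaoxiaoNeural" -> "zh-CN"
  const lang = voice.split("-").slice(0, 2).join("-");
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice></speak>`
  );
}

// Returns the synthesized audio bytes in the requested output format.
async function synthesize(
  region: string,
  key: string,
  text: string,
  voice: string = "en-US-JennyNeural",
  outputFormat: string = "audio-24khz-48kbitrate-mono-mp3",
): Promise<ArrayBuffer> {
  const res = await fetch(`https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`, {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": key,
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": outputFormat,
      "User-Agent": "azure-speech-example", // placeholder value
    },
    body: buildSsml(text, voice),
  });
  if (!res.ok) throw new Error(`TTS request failed: HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

A full SSML override, as mentioned under Features, would presumably replace the output of `buildSsml` with the caller's own document.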

Development

git clone https://github.com/sawyer0x110/openclaw-azure-speech
cd openclaw-azure-speech
npm install
npm run build
npm test           # 63 unit tests
npm run typecheck  # TypeScript strict mode

License

MIT