@voxflow/openclaw-tts

v0.1.4

Published

9 days ago

VoxFlow TTS speech provider for OpenClaw — 94+ Chinese voices, multilingual, voice cloning

Downloads

482


gonghaoran

openclaw openclaw-plugin tts text-to-speech chinese voxflow

@voxflow/openclaw-tts

VoxFlow TTS speech provider for OpenClaw — bring 94+ Chinese voices and multilingual auto-detection to every channel your bot lives in, with zero API key setup.

What It Does

@voxflow/openclaw-tts registers VoxFlow as a speech provider inside OpenClaw. When a reply is ready, OpenClaw calls the plugin's synthesize() method, which sends the text to api.voxflow.studio, receives an MP3 buffer, and hands it back to OpenClaw. OpenClaw then re-encodes the audio per channel: SILK for WeChat voice messages, Opus for Telegram and Discord, and so on. Authentication uses the same JWT as the VoxFlow CLI: log in once via device code and all tools share the session.

Quick Start

Step 1 — Install the plugin

openclaw plugins install @voxflow/openclaw-tts
openclaw gateway restart

Step 2 — Set VoxFlow as your TTS provider in openclaw.json

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "voxflow"
    }
  }
}

Step 3 — Authenticate

Send /voxflow login in any chat channel, or run in a terminal:

npx voxflow login

A browser window opens. Sign in with email OTP or Google. The token is then cached at ~/.config/voxflow/token.json and shared with the CLI.

Full openclaw.json Example

{
  // TTS behaviour
  "messages": {
    "tts": {
      // When to auto-synthesize replies (see TTS Modes table below)
      "auto": "inbound",
      // Use VoxFlow as the speech provider
      "provider": "voxflow"
    }
  },

  // Plugin-level config for the VoxFlow provider
  "plugins": {
    "entries": {
      "openclaw-tts": {
        "config": {
          // Voice ID from the VoxFlow voice library
          "voice": "v-female-R2s4N9qJ",
          // Playback speed multiplier (0.5 – 2.0, default 1.0)
          "speed": 1.0,
          // API endpoint (change only if self-hosting)
          "apiBase": "https://api.voxflow.studio",
          // Request timeout in milliseconds (default 15000)
          "timeoutMs": 15000
        }
      }
    }
  }
}

TTS Modes

Set messages.tts.auto to one of:

| Value | Behaviour |
|-------|-----------|
| off | TTS never runs automatically (default). Use /voxflow command or LLM directive to trigger manually. |
| always | Every bot reply is synthesized to voice, regardless of how the user messaged. |
| inbound | Voice reply only when the incoming user message was itself a voice message. |
| tagged | The LLM decides per-reply by inserting a [[tts:…]] directive in its output. |
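The four modes above amount to a small decision function. The following is a minimal sketch, not OpenClaw's actual implementation: the `ReplyContext` type, the `shouldSynthesize` name, and the directive regex are all assumptions made for illustration, and it models only the automatic behaviour (manual triggers such as /voxflow are outside its scope).

```typescript
// Sketch only: names and types here are illustrative, not OpenClaw's real API.
type TtsAutoMode = "off" | "always" | "inbound" | "tagged";

interface ReplyContext {
  inboundWasVoice: boolean; // true if the user's message was a voice message
  replyText: string;        // raw LLM output for this reply
}

// Assumed shape of the directive: [[tts:...]] anywhere in the reply text.
const TTS_DIRECTIVE = /\[\[tts:[^\]]*\]\]/;

function shouldSynthesize(mode: TtsAutoMode, ctx: ReplyContext): boolean {
  switch (mode) {
    case "off":
      return false; // never automatic
    case "always":
      return true; // every reply
    case "inbound":
      return ctx.inboundWasVoice; // mirror the user's modality
    case "tagged":
      return TTS_DIRECTIVE.test(ctx.replyText); // the LLM opted in
  }
}
```

For example, `shouldSynthesize("inbound", { inboundWasVoice: true, replyText: "hi" })` is true, while the same call with `"off"` is false.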

You can also force TTS on a single reply using an LLM directive in your system prompt or message:

[[tts:provider=voxflow voiceId=v-female-R2s4N9qJ speed=1.2]]
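To make the directive format concrete, here is a hedged sketch of how such a directive could be parsed into key/value pairs. The grammar (space-separated `key=value` tokens, no quoting) is inferred from the single example above and is an assumption; `parseTtsDirective` is a hypothetical helper, not part of OpenClaw or the plugin.

```typescript
// Sketch only: the directive grammar is inferred from one example and may
// not match what OpenClaw actually accepts.
function parseTtsDirective(text: string): Record<string, string> | null {
  const match = /\[\[tts:([^\]]*)\]\]/.exec(text);
  if (!match) return null; // no directive present

  const params: Record<string, string> = {};
  for (const pair of (match[1] ?? "").trim().split(/\s+/)) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty tokens and tokens without a key
    params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return params;
}

const parsed = parseTtsDirective(
  "[[tts:provider=voxflow voiceId=v-female-R2s4N9qJ speed=1.2]] Hello!",
);
```

Values are kept as strings here; a caller would still need to convert and validate `speed` before using it.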

Voice Selection

Set plugins.entries["openclaw-tts"].config.voice to any voice ID from the VoxFlow library.

Browse all voices at voxflow.studio/app — filter by language, gender, and style.

Example Voices

| Voice ID | Description |
|----------|-------------|
| v-female-R2s4N9qJ | Default: warm Mandarin female, neutral style |
| v-male-Zh4kP1mX | Deep Mandarin male, broadcast style |
| v-female-En9wQ3jY | English female, clear and natural |
| v-female-Ja5rT8nK | Japanese female, polite style |

Multilingual auto-detection is on by default: VoxFlow detects zh/en/ja per segment and switches pronunciation accordingly — no extra config needed.

Commands

Send these slash commands in any OpenClaw channel:

`/voxflow`

Synthesizes the previous bot reply as voice and sends it to the channel.

User:  /voxflow
Bot:   🔊 [voice message — 3s]

`/voxflow login`

Starts the device-code authentication flow (see Authentication section).

User:  /voxflow login
Bot:   Open this URL to log in:
       https://voxflow.studio/cli-auth?state=abc123&callback_port=0
       (expires in 10 minutes)

`/voxflow status`

Shows authentication status and remaining quota.

User:  /voxflow status
Bot:   ✅ Authenticated as <your email>
       Quota remaining: 8,400 / 10,000 (free tier)
       Voice: v-female-R2s4N9qJ  Speed: 1.0x

Advanced Config

All options under plugins.entries["openclaw-tts"].config:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| voice | string | v-female-R2s4N9qJ | Voice ID. Browse at voxflow.studio/app. |
| speed | number | 1.0 | Playback speed multiplier. Range: 0.5 – 2.0. |
| apiBase | string | https://api.voxflow.studio | API base URL. Override only if self-hosting. |
| timeoutMs | number | 15000 | HTTP request timeout in milliseconds. |
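The defaults and ranges in this table can be expressed as a small config resolver. This is a sketch under assumptions: `VoxFlowConfig` and `resolveConfig` are hypothetical names, and whether the real plugin rejects or clamps an out-of-range speed is not documented here, so the sketch simply rejects it.

```typescript
// Sketch only: mirrors the option table; not the plugin's actual code.
interface VoxFlowConfig {
  voice: string;
  speed: number;
  apiBase: string;
  timeoutMs: number;
}

const DEFAULTS: VoxFlowConfig = {
  voice: "v-female-R2s4N9qJ",
  speed: 1.0,
  apiBase: "https://api.voxflow.studio",
  timeoutMs: 15000,
};

function resolveConfig(user: Partial<VoxFlowConfig> = {}): VoxFlowConfig {
  // User-supplied values override the documented defaults.
  const merged: VoxFlowConfig = { ...DEFAULTS, ...user };

  // Assumption: out-of-range values are rejected rather than clamped.
  if (!Number.isFinite(merged.speed) || merged.speed < 0.5 || merged.speed > 2.0) {
    throw new RangeError(`speed must be between 0.5 and 2.0, got ${merged.speed}`);
  }
  if (!Number.isInteger(merged.timeoutMs) || merged.timeoutMs <= 0) {
    throw new RangeError(`timeoutMs must be a positive integer, got ${merged.timeoutMs}`);
  }
  return merged;
}
```

Calling `resolveConfig({ speed: 1.5 })` keeps the default voice, apiBase, and timeout while overriding only the speed.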

Authentication

The plugin reads the VoxFlow JWT from the following sources, in order:

1. VOXFLOW_TOKEN environment variable
2. VOXFLOW_JWT environment variable
3. ~/.config/voxflow/token.json (written by voxflow login or /voxflow login)
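The lookup order above can be sketched as a pure function. This is illustrative only: `resolveToken` is a hypothetical name (the plugin's own function is getVoxFlowToken(), whose signature is not shown in this README), the cache reader is injected so that no assumption is made about the layout of token.json, and treating blank values as unset is also an assumption.

```typescript
// Sketch only: models the documented precedence, not the plugin's real code.
type Env = Record<string, string | undefined>;

function resolveToken(
  env: Env,
  readCachedToken: () => string | undefined, // hypothetical reader for token.json
): string | undefined {
  // 1 and 2: environment variables win, in the documented order.
  for (const value of [env["VOXFLOW_TOKEN"], env["VOXFLOW_JWT"]]) {
    if (value !== undefined && value.trim() !== "") return value.trim();
  }
  // 3: fall back to the cached login token; only read it when needed.
  const cached = readCachedToken();
  return cached !== undefined && cached.trim() !== "" ? cached.trim() : undefined;
}
```

In a Node process this could be called as `resolveToken(process.env, readFromDisk)`, where `readFromDisk` is whatever helper parses the cache file.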

Device Code Flow (step by step)

1. Send /voxflow login in any channel, or run npx voxflow login in a terminal.
2. The plugin starts a temporary local HTTP server on a random port and opens (or prints) a URL like: https://voxflow.studio/cli-auth?state=<nonce>&callback_port=<port>
3. Open the URL in a browser. Sign in with email OTP or Google OAuth.
4. The browser redirects to localhost:<port>/callback?token=<JWT>.
5. The plugin writes the token to ~/.config/voxflow/token.json.
6. All subsequent requests use this cached token.
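The login URL and the callback redirect described above can be sketched with the standard URL API. Both helpers are hypothetical (`buildLoginUrl`, `extractCallbackToken`); they only reproduce the URL shapes shown in this section and leave out the HTTP server, nonce generation, and file writing.

```typescript
// Sketch only: reproduces the documented URL shapes, nothing more.
function buildLoginUrl(nonce: string, port: number): string {
  const url = new URL("https://voxflow.studio/cli-auth");
  url.searchParams.set("state", nonce);
  url.searchParams.set("callback_port", String(port));
  return url.toString();
}

// Given the path and query of a request received by the temporary local
// server, return the token if this is the expected callback, else null.
function extractCallbackToken(requestUrl: string, port: number): string | null {
  const url = new URL(requestUrl, `http://localhost:${port}`);
  if (url.pathname !== "/callback") return null;
  const token = url.searchParams.get("token");
  return token !== null && token !== "" ? token : null;
}

const loginUrl = buildLoginUrl("abc123", 53124);
const token = extractCallbackToken("/callback?token=header.payload.sig", 53124);
```

Keeping these two steps as pure functions makes them easy to test without opening a browser or binding a port.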

Token Expiry

Supabase JWTs expire after approximately 1 hour. When the token expires, the plugin will respond with an authentication error. Re-run /voxflow login to refresh.
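A client can notice an expired token before sending a request by reading the standard `exp` claim from the JWT payload. The sketch below is an assumption about how one might do this, not something the plugin is documented to do; it decodes the payload without verifying the signature, so it is only suitable for deciding when to prompt for a new login.

```typescript
// Sketch only: decodes the JWT payload without verifying the signature.
// Returns the `exp` claim (seconds since the Unix epoch), or null if unreadable.
function jwtExpiry(token: string): number | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    // Convert base64url to base64 and restore padding before decoding.
    const b64 = (parts[1] ?? "").replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
    const payload = JSON.parse(atob(padded));
    return typeof payload.exp === "number" ? payload.exp : null;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// Treat unreadable tokens as expired; skewSeconds leaves a safety margin.
function isExpired(
  token: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  skewSeconds = 30,
): boolean {
  const exp = jwtExpiry(token);
  return exp === null || nowSeconds >= exp - skewSeconds;
}
```

With a lifetime of about one hour, a check like this lets a bot ask for /voxflow login proactively instead of surfacing a failed synthesis.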

CI / Headless Environments

Set the VOXFLOW_TOKEN or VOXFLOW_JWT environment variable to skip the browser flow entirely:

export VOXFLOW_TOKEN="eyJhbGciOiJFUzI1NiIs..."

Channel Support

VoxFlow outputs MP3. OpenClaw re-encodes per channel automatically:

| Channel | Encoding | Notes |
|---------|----------|-------|
| WeChat | SILK | Required by WeChat voice message protocol |
| Telegram | Opus (OGG) | Standard Telegram voice note format |
| Discord | Opus stream | Sent as audio attachment |
| Slack | MP3 | Uploaded as file attachment |
| WhatsApp | Opus (OGG) | Standard WhatsApp PTT format |
| Feishu / Lark | MP3 | Uploaded as audio message |
| DingTalk | MP3 | Uploaded as voice message |

No per-channel config is required — OpenClaw handles format conversion transparently.

Quota & Pricing

Each TTS synthesis call costs 100 quota regardless of text length (standard voices). Monthly quota resets every 30 days. Bonus quota from referrals never expires and is used after the monthly pool is exhausted.

| Tier | Monthly Quota | TTS Calls | Price |
|------|---------------|-----------|-------|
| Free | 10,000 | ~100 | $0 |
| Plus | 100,000 | ~1,000 | $9/mo |
| Pro | 250,000 | ~2,500 | $29/mo |
| Max | 600,000 | ~6,000 | $59/mo |
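The arithmetic behind the "TTS Calls" column, and the rule that bonus quota is spent only after the monthly pool, can be sketched as follows. `QuotaBalance`, `callsRemaining`, and `chargeCall` are hypothetical names, and letting one call draw from both pools when the monthly remainder is below 100 is an assumption; the actual accounting happens on the VoxFlow side.

```typescript
// Sketch only: models the documented pricing rules for standard voices.
const COST_PER_CALL = 100;

interface QuotaBalance {
  monthly: number; // resets every 30 days
  bonus: number;   // referral quota, never expires
}

// Number of standard-voice calls the combined balance still covers.
function callsRemaining(balance: QuotaBalance): number {
  return Math.floor((balance.monthly + balance.bonus) / COST_PER_CALL);
}

// Deduct one call: drain the monthly pool first, then the bonus pool.
function chargeCall(balance: QuotaBalance): QuotaBalance {
  if (balance.monthly + balance.bonus < COST_PER_CALL) {
    throw new Error("insufficient quota");
  }
  const fromMonthly = Math.min(balance.monthly, COST_PER_CALL);
  return {
    monthly: balance.monthly - fromMonthly,
    bonus: balance.bonus - (COST_PER_CALL - fromMonthly),
  };
}
```

For the free tier, `callsRemaining({ monthly: 10000, bonus: 0 })` gives 100, which matches the table.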

Upgrade or check usage at voxflow.studio.

Troubleshooting

`404 Not Found` on `/voxflow login`

This usually means apiBase is misconfigured, or the bot host cannot reach the VoxFlow API. Check that apiBase is correct and verify connectivity to https://api.voxflow.studio/health from the machine running OpenClaw.

`401 Unauthorized` / token expired

Your JWT has expired (Supabase tokens last ~1 hour). Re-authenticate:

npx voxflow login
# or in-chat:
/voxflow login

`429 Too Many Requests`

TTS requests are rate-limited to 20 per minute. Reduce call frequency or upgrade your tier.
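One way to stay under the limit is to throttle on the client side. The sketch below uses a sliding window of 20 requests per 60 seconds; this is an assumption about a reasonable client strategy, not a description of how the VoxFlow server counts requests, and `SlidingWindowLimiter` is a hypothetical class.

```typescript
// Sketch only: client-side throttle; the server's algorithm may differ.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly timestamps: number[] = [];

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the request if it may be sent now; otherwise
  // returns how many milliseconds to wait before trying again.
  tryAcquire(nowMs: number = Date.now()): number {
    // Drop requests that have left the window.
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0]! >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return 0;
    }
    return this.timestamps[0]! + this.windowMs - nowMs;
  }
}
```

A caller would check `tryAcquire()` before each synthesis and delay by the returned number of milliseconds when it is non-zero.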

`No token found` on startup

The plugin cannot find a JWT. Either run /voxflow login, or set the VOXFLOW_TOKEN environment variable.

Voice ID not recognized

Check the voice ID against the library at voxflow.studio/app. Voice IDs are case-sensitive.

Audio not playing in WeChat

Ensure OpenClaw has WeChat voice message permissions. SILK encoding requires the bot account to have voice send rights in the target group.

Development / Contributing

Local Setup

# Clone the repo
git clone https://github.com/VoxFlowStudio/FlowStudio.git
cd FlowStudio/openclaw-plugin

# Install dependencies
npm install

# Build TypeScript
npm run build

# Link plugin locally (live reload, no copy)
openclaw plugins install -l .

Project Structure

openclaw-plugin/
├── index.ts                  → Plugin entry: definePluginEntry + registerSpeechProvider
├── src/
│   ├── speech-provider.ts    → buildVoxFlowSpeechProvider(): SpeechProviderPlugin
│   └── auth.ts               → getVoxFlowToken() — reads CLI cache or env var
├── tsconfig.json
└── package.json

Implemented Interface

The plugin implements OpenClaw's SpeechProviderPlugin interface:

| Member | Description |
|--------|-------------|
| id | "voxflow": provider identifier used in openclaw.json |
| label | Display name shown in OpenClaw UI |
| isConfigured() | Returns true if a valid JWT is present |
| synthesize(req) | Calls VoxFlow API, returns { audioBuffer, outputFormat, fileExtension, voiceCompatible } |
| listVoices(req) | Fetches the full voice catalog from the VoxFlow API |
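To show how these members fit together, here is a rough skeleton of a provider object. It is a sketch under heavy assumptions: the interfaces below are local stand-ins written from the table, not OpenClaw's real type definitions; `buildProviderSketch`, the injected `getToken` and `fetchMp3` helpers, the label text, and the `voiceCompatible` value are all hypothetical; listVoices is omitted.

```typescript
// Sketch only: local stand-ins for OpenClaw's types, which may differ.
interface SynthesizeRequest {
  text: string;
}

interface SynthesizeResult {
  audioBuffer: Uint8Array;
  outputFormat: string;
  fileExtension: string;
  voiceCompatible: boolean;
}

interface SpeechProviderSketch {
  id: string;
  label: string;
  isConfigured(): boolean;
  synthesize(req: SynthesizeRequest): Promise<SynthesizeResult>;
}

function buildProviderSketch(
  getToken: () => string | undefined,                             // hypothetical
  fetchMp3: (text: string, token: string) => Promise<Uint8Array>, // hypothetical
): SpeechProviderSketch {
  return {
    id: "voxflow",
    label: "VoxFlow TTS", // placeholder display name
    isConfigured: () => getToken() !== undefined,
    async synthesize(req) {
      const token = getToken();
      if (token === undefined) throw new Error("No token found");
      const audioBuffer = await fetchMp3(req.text, token);
      // VoxFlow returns MP3; voiceCompatible is a placeholder value here.
      return { audioBuffer, outputFormat: "mp3", fileExtension: ".mp3", voiceCompatible: false };
    },
  };
}
```

Injecting the token lookup and the HTTP call keeps the skeleton testable without network access or a cached login.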

Running Tests

npm test

Contributing

1. Fork the repo and create a branch: feat/my-change
2. Make changes, run npm run build and npm test
3. Open a PR against main (use Squash and Merge)
