@dhee_ai/runner-tts

v0.1.1

Published

9 hours ago

Text-to-speech runner driving a ComfyUI TTS workflow (Qwen3-TTS / VibeVoice) on a local GPU.

A Dhee runner that exposes the tool comfy.tts. It is built against @dhee_ai/runner-sdk only (the runner firewall) and is discovered by the engine via the dhee-runner-* npm convention.

Build

pnpm install && pnpm build

ComfyUI workflow

This runner drives any ComfyUI text-to-speech workflow. Drop your exported API-format workflow JSON in workflows/. The package ships workflows/qwen3_narration.json, a single-voice Qwen3-TTS narrator built on FB_Qwen3TTSVoiceDesign → SaveAudio. The runner is workflow-agnostic, so you do not edit its source. Instead, the bundle node's config names which node and field receive each input:

textNodeId / textField — where the narration text is injected (required)
speakerNodeId / speakerField — optional speaker/voice name
voiceNodeId / voiceField — optional: an uploaded reference-voice file
fields — arbitrary static overrides keyed by node id then field (e.g. a Qwen3 voice instruct description, or a seed for a consistent narrator)
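To make the mapping concrete, here is a minimal sketch of how such a config could be applied to an API-format workflow (a JSON object keyed by node id, each node carrying class_type and inputs). It is an illustration under stated assumptions, not the runner's actual source: applyInputs, the TtsConfig subset, the order in which overrides are applied, and the two-node stand-in workflow are all hypothetical. The uploaded reference-voice case is left out because it involves a file upload.

```typescript
// Shape of a ComfyUI API-format workflow: node id -> { class_type, inputs }.
type ComfyNode = { class_type: string; inputs: Record<string, unknown> };
type Workflow = Record<string, ComfyNode>;

// Subset of the bundle node's config described above.
interface TtsConfig {
  textNodeId: string;
  textField: string;
  speakerNodeId?: string;
  speakerField?: string;
  fields?: Record<string, Record<string, unknown>>;
}

// Hypothetical helper (not the runner's code): returns a patched copy of the
// workflow with static overrides applied first, then the speaker, then the text.
function applyInputs(
  workflow: Workflow,
  config: TtsConfig,
  text: string,
  speaker?: string,
): Workflow {
  // Deep copy so the workflow loaded from disk is never mutated.
  const patched: Workflow = JSON.parse(JSON.stringify(workflow));
  const set = (nodeId: string, field: string, value: unknown): void => {
    const node = patched[nodeId];
    if (!node) throw new Error(`workflow has no node "${nodeId}"`);
    node.inputs[field] = value;
  };
  const overrides = config.fields ?? {};
  for (const nodeId of Object.keys(overrides)) {
    const nodeOverrides = overrides[nodeId] ?? {};
    for (const field of Object.keys(nodeOverrides)) {
      set(nodeId, field, nodeOverrides[field]);
    }
  }
  if (speaker !== undefined && config.speakerNodeId && config.speakerField) {
    set(config.speakerNodeId, config.speakerField, speaker);
  }
  set(config.textNodeId, config.textField, text);
  return patched;
}

// Stand-in workflow for illustration; a real export has more inputs per node.
const base: Workflow = {
  "1": { class_type: "FB_Qwen3TTSVoiceDesign", inputs: { text: "", instruct: "" } },
  "2": { class_type: "SaveAudio", inputs: { audio: ["1", 0] } },
};
const patched = applyInputs(
  base,
  {
    textNodeId: "1",
    textField: "text",
    fields: { "1": { instruct: "A warm documentary narrator", seed: 42 } },
  },
  "Welcome to the tour.",
);
```

Because the node ids and field names live in the config, pointing the same runner at a different workflow only requires changing that config.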

The workflow MUST end in a SaveAudio (or VHS audio save) node so that ComfyUI emits an audio output. The engine resolves the endpoint via resolveEndpointUrl; with COMFY_MODE=local (the default) it forces ENDPOINT_self_local / COMFYUI_BASE_URL.
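A pre-flight check for the SaveAudio requirement can catch a bad export before a job is queued. The sketch below is an assumption-laden illustration, not part of the runner: findAudioSaveNodes and assertEmitsAudio are hypothetical names, the check only looks for the presence of a matching node rather than verifying that it terminates the graph, and only the SaveAudio class name is listed by default (pass the class name of your VHS audio-save node if you use one).

```typescript
// Hypothetical pre-flight check: returns the ids of nodes whose class_type is
// one of the accepted audio-save node types.
function findAudioSaveNodes(
  workflow: Record<string, { class_type: string }>,
  saveTypes: readonly string[] = ["SaveAudio"],
): string[] {
  return Object.keys(workflow).filter((id) => {
    const node = workflow[id];
    return node !== undefined && saveTypes.indexOf(node.class_type) !== -1;
  });
}

// Throws when the workflow contains no audio-save node at all.
function assertEmitsAudio(workflow: Record<string, { class_type: string }>): void {
  if (findAudioSaveNodes(workflow).length === 0) {
    throw new Error("workflow has no SaveAudio node, so ComfyUI would emit no audio output");
  }
}
```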

Output contract

On success the runner writes the audio file and reports metadata.durationSeconds (parsed from the WAV header). Downstream nodes, notably the LTX director, read that duration to size the video to the narration.
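For readers who want to check a reported duration themselves, one common way to derive it from a WAV header is to divide the size of the data chunk by the byte rate stored in the fmt chunk. The sketch below does that. It is an assumption about the general technique, not a copy of the runner's parser, and wavDurationSeconds and fourCC are hypothetical names.

```typescript
// Reads a four-character chunk id at the given byte offset.
function fourCC(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

// Returns the duration in seconds of a RIFF/WAVE file, computed as
// (data chunk size) / (byte rate from the fmt chunk).
function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 12 || fourCC(view, 0) !== "RIFF" || fourCC(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = 0;
  let dataSize = -1;
  let offset = 12; // first chunk starts after "RIFF" + size + "WAVE"
  while (offset + 8 <= view.byteLength && (byteRate === 0 || dataSize < 0)) {
    const id = fourCC(view, offset);
    const size = view.getUint32(offset + 4, true); // RIFF sizes are little-endian
    if (id === "fmt " && offset + 20 <= view.byteLength) {
      // fmt payload: format(2) channels(2) sampleRate(4) byteRate(4) ...
      byteRate = view.getUint32(offset + 16, true);
    }
    if (id === "data") {
      // Clamp in case the writer left an oversized placeholder in the header.
      dataSize = Math.min(size, view.byteLength - offset - 8);
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (byteRate === 0 || dataSize < 0) throw new Error("missing fmt or data chunk");
  return dataSize / byteRate;
}
```

The function expects the bytes of the complete file, since it clamps the data size to what is actually present. Walking the chunk list rather than assuming a fixed 44-byte header keeps it working when extra chunks (such as LIST) sit between fmt and data.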

Use in a bundle

// bundle.json
"dependencies": {
  "runners":        { "comfy.tts": ">=0.1.0" },
  "runnerPackages": { "comfy.tts": "@dhee_ai/runner-tts" }
}

Then reference it from a node:

{
  "id": "segment_audio",
  "runner": {
    "tool": "comfy.tts",
    "config": {
      "workflowPath": "workflows/qwen3_narration.json",
      "outputPath": "audio/segment_1.wav",
      "endpoint": "self.local",
      "textNodeId": "1",
      "textField": "text",
      "textInput": "segment_narration",
      "fields": { "1": { "instruct": "A warm documentary narrator", "seed": 42 } }
    }
  }
}

Voice setup (optional, multi-speaker)

workflows/qwen3_narration.json needs no pre-saved speaker: the instruct field describes the voice inline. For multi-speaker dialogue, the bundle also ships qwen3_voice_design.json (design and save named speaker voices once) and qwen3_dialogue.json (FB_Qwen3TTSDialogueInference over a role bank). Pre-run the voice-design graph to populate the speaker store, then point textNodeId at the dialogue node (script field).
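As a rough illustration of that last step, the sketch below builds a bundle node for the dialogue workflow. Only the config keys and the script field come from the text above. The helper name dialogueConfig, the workflows/ location of qwen3_dialogue.json, the node id "3", the input name segment_dialogue, and the output path are all placeholders to replace with values from your own export.

```typescript
// Config keys are the ones documented for comfy.tts; values marked "assumed"
// are placeholders, not facts about the shipped workflow.
interface DialogueNodeConfig {
  workflowPath: string;
  outputPath: string;
  endpoint: string;
  textNodeId: string;
  textField: string;
  textInput: string;
}

// Hypothetical helper: builds the config for a dialogue segment.
function dialogueConfig(dialogueNodeId: string, outputPath: string): DialogueNodeConfig {
  return {
    workflowPath: "workflows/qwen3_dialogue.json", // assumed location
    outputPath,
    endpoint: "self.local",
    textNodeId: dialogueNodeId, // id of the FB_Qwen3TTSDialogueInference node
    textField: "script",
    textInput: "segment_dialogue", // assumed input name
  };
}

const dialogueNode = {
  id: "segment_dialogue_audio", // assumed node id
  runner: {
    tool: "comfy.tts",
    config: dialogueConfig("3", "audio/dialogue_1.wav"), // "3" is an assumed node id
  },
};
```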
