@dhee_ai/runner-tts
v0.1.1
Text-to-speech runner driving a ComfyUI TTS workflow (Qwen3-TTS / VibeVoice) on a local GPU.
A Dhee runner exposing the tool comfy.tts.
Built against @dhee_ai/runner-sdk only (the runner firewall) and discovered
by the engine via the dhee-runner-* npm convention.
Build
pnpm install && pnpm build
ComfyUI workflow
This runner drives any ComfyUI text-to-speech workflow. Drop your exported
API-format workflow JSON in workflows/ (the package ships
workflows/qwen3_narration.json — a single-voice Qwen3-TTS narrator built on
FB_Qwen3TTSVoiceDesign → SaveAudio). The runner is workflow-agnostic —
you do NOT edit source. Instead, the bundle node's config names which
node/field receives each input:
- textNodeId / textField: where the narration text is injected (required)
- speakerNodeId / speakerField: optional speaker/voice name
- voiceNodeId / voiceField: optional, an uploaded reference-voice file
- fields: arbitrary static overrides keyed by node id then field (e.g. a Qwen3 voice instruct description, or a seed for a consistent narrator)
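The mapping above can be pictured as a small patch step over the API-format workflow JSON (node id mapped to class_type and inputs). The following is a minimal sketch, not the runner's actual source: the names TtsInjectionConfig, TtsInputs, and injectInputs are hypothetical, the order in which overrides and dynamic inputs are applied is an assumption, and the two-node workflow is an illustrative stand-in for workflows/qwen3_narration.json.

```typescript
// ComfyUI API-format workflow: node id -> { class_type, inputs }.
type ComfyNode = { class_type: string; inputs: Record<string, unknown> };
type ComfyWorkflow = Record<string, ComfyNode>;

// Hypothetical config shape, built only from the field names listed above.
interface TtsInjectionConfig {
  textNodeId: string;
  textField: string;
  speakerNodeId?: string;
  speakerField?: string;
  voiceNodeId?: string;
  voiceField?: string;
  fields?: Record<string, Record<string, unknown>>;
}

// Hypothetical runtime inputs for one synthesis call.
interface TtsInputs {
  text: string;
  speaker?: string;
  voiceFile?: string; // name of an already-uploaded reference-voice file
}

// Sets one input on one node, failing loudly if the node id is wrong.
function setInput(wf: ComfyWorkflow, nodeId: string, field: string, value: unknown): void {
  const node = wf[nodeId];
  if (!node) throw new Error(`workflow has no node "${nodeId}"`);
  node.inputs[field] = value;
}

// Returns a patched deep copy; the workflow passed in is left untouched.
function injectInputs(workflow: ComfyWorkflow, cfg: TtsInjectionConfig, inputs: TtsInputs): ComfyWorkflow {
  const wf: ComfyWorkflow = JSON.parse(JSON.stringify(workflow));
  // Static overrides first, so dynamic inputs win on conflict (assumed order).
  const overrides = cfg.fields || {};
  for (const nodeId of Object.keys(overrides)) {
    const nodeOverrides = overrides[nodeId] || {};
    for (const field of Object.keys(nodeOverrides)) {
      setInput(wf, nodeId, field, nodeOverrides[field]);
    }
  }
  setInput(wf, cfg.textNodeId, cfg.textField, inputs.text);
  if (cfg.speakerNodeId && cfg.speakerField && inputs.speaker !== undefined) {
    setInput(wf, cfg.speakerNodeId, cfg.speakerField, inputs.speaker);
  }
  if (cfg.voiceNodeId && cfg.voiceField && inputs.voiceFile !== undefined) {
    setInput(wf, cfg.voiceNodeId, cfg.voiceField, inputs.voiceFile);
  }
  return wf;
}

// Usage with an illustrative two-node workflow.
const base: ComfyWorkflow = {
  "1": { class_type: "FB_Qwen3TTSVoiceDesign", inputs: { text: "", instruct: "", seed: 0 } },
  "2": { class_type: "SaveAudio", inputs: { audio: ["1", 0], filename_prefix: "tts" } },
};
const patched = injectInputs(
  base,
  { textNodeId: "1", textField: "text", fields: { "1": { instruct: "A warm documentary narrator", seed: 42 } } },
  { text: "Hello from the narrator." }
);
```

Patching a copy rather than the loaded workflow keeps the file on disk reusable across segments, which fits the "you do NOT edit source" intent.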
The workflow MUST end in a SaveAudio (or VHS audio save) node so Comfy
emits an audio output. The endpoint is resolved by the engine via
resolveEndpointUrl — COMFY_MODE=local (default) forces
ENDPOINT_self_local / COMFYUI_BASE_URL.
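A quick pre-flight check for the audio-save requirement might look like the sketch below. It is an illustration under assumptions, not part of the runner: findAudioSaveNodes and assertHasAudioSave are hypothetical names, and the check only confirms that a save node exists somewhere in the graph, which is weaker than proving the graph ends in one. The VHS save node's class_type is not stated here, so it is left as a caller-supplied value.

```typescript
// ComfyUI API-format workflow: node id -> { class_type, inputs }.
type ApiNode = { class_type: string; inputs: Record<string, unknown> };
type ApiWorkflow = Record<string, ApiNode>;

// Returns the ids of all nodes whose class_type is an accepted audio-save class.
// Only "SaveAudio" is accepted by default; pass extra class names explicitly.
function findAudioSaveNodes(workflow: ApiWorkflow, saveClasses: string[] = ["SaveAudio"]): string[] {
  return Object.keys(workflow).filter((id) => {
    const node = workflow[id];
    return node !== undefined && saveClasses.indexOf(node.class_type) !== -1;
  });
}

// Throws before submission if the workflow cannot emit an audio output.
function assertHasAudioSave(workflow: ApiWorkflow, saveClasses?: string[]): void {
  if (findAudioSaveNodes(workflow, saveClasses).length === 0) {
    throw new Error("workflow has no audio save node, so Comfy would emit no audio output");
  }
}

// Usage with an illustrative two-node workflow.
const narration: ApiWorkflow = {
  "1": { class_type: "FB_Qwen3TTSVoiceDesign", inputs: { text: "hi" } },
  "2": { class_type: "SaveAudio", inputs: { audio: ["1", 0], filename_prefix: "tts" } },
};
const saveNodes = findAudioSaveNodes(narration);
assertHasAudioSave(narration);
```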
Output contract
On success the runner writes the audio file and reports
metadata.durationSeconds (parsed from the WAV header). Downstream nodes —
notably the LTX director — read that duration to size video to the narration.
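For readers wondering how a duration falls out of a WAV header, here is a minimal sketch of one way to do it for uncompressed PCM: walk the RIFF chunks, take the byte rate from the fmt chunk and the payload size from the data chunk, and divide. The function name wavDurationSeconds is hypothetical, and the runner's own parser may differ. The sketch trusts the declared data size, so a file written with a placeholder size would give a wrong answer.

```typescript
// Reads a 4-character chunk tag at the given byte offset.
function chunkTag(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3)
  );
}

// Duration = data chunk size / byte rate, both read from the header.
function wavDurationSeconds(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 12 || chunkTag(view, 0) !== "RIFF" || chunkTag(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = 0;
  let dataSize = -1;
  let offset = 12; // first chunk after the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = chunkTag(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    if (id === "fmt ") {
      if (offset + 24 > view.byteLength) throw new Error("truncated fmt chunk");
      byteRate = view.getUint32(offset + 16, true); // byte 8 of the fmt body
    } else if (id === "data") {
      dataSize = size; // declared payload size; the payload itself is not read
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (byteRate <= 0 || dataSize < 0) throw new Error("missing fmt or data chunk");
  return dataSize / byteRate;
}

// Usage: a 44-byte header for 1 s of 44.1 kHz mono 16-bit PCM.
const header = new Uint8Array(44);
const hv = new DataView(header.buffer);
const putTag = (o: number, s: string): void => {
  for (let i = 0; i < 4; i++) header[o + i] = s.charCodeAt(i);
};
putTag(0, "RIFF"); hv.setUint32(4, 36 + 88200, true); putTag(8, "WAVE");
putTag(12, "fmt "); hv.setUint32(16, 16, true);
hv.setUint16(20, 1, true);      // PCM
hv.setUint16(22, 1, true);      // mono
hv.setUint32(24, 44100, true);  // sample rate
hv.setUint32(28, 88200, true);  // byte rate = 44100 * 1 channel * 2 bytes
hv.setUint16(32, 2, true);      // block align
hv.setUint16(34, 16, true);     // bits per sample
putTag(36, "data"); hv.setUint32(40, 88200, true);
const durationSeconds = wavDurationSeconds(header);
```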
Use in a bundle
// bundle.json
"dependencies": {
  "runners": { "comfy.tts": ">=0.1.0" },
  "runnerPackages": { "comfy.tts": "@dhee_ai/runner-tts" }
}
Then reference it from a node:
{
  "id": "segment_audio",
  "runner": {
    "tool": "comfy.tts",
    "config": {
      "workflowPath": "workflows/qwen3_narration.json",
      "outputPath": "audio/segment_1.wav",
      "endpoint": "self.local",
      "textNodeId": "1",
      "textField": "text",
      "textInput": "segment_narration",
      "fields": { "1": { "instruct": "A warm documentary narrator", "seed": 42 } }
    }
  }
}
Voice setup (optional, multi-speaker)
workflows/qwen3_narration.json needs no pre-saved speaker: instruct
describes the voice inline. For multi-speaker dialogue, the bundle also ships
qwen3_voice_design.json (designs and saves named speaker voices once) and
qwen3_dialogue.json (FB_Qwen3TTSDialogueInference over a role bank). Run the
voice-design graph first to populate the speaker store, then point textNodeId
at the dialogue node, with textField set to its script field.
