@deepdub/node
v2.0.0
Deepdub API SDK
Deepdub Node.js SDK
Install and use the Deepdub Node.js SDK for text-to-speech generation with streaming support.
Installation
npm install --save @deepdub/node
# or
yarn add @deepdub/node
Requirements: Node.js 18+ (uses native fetch)
Initialization
const { DeepdubClient } = require('@deepdub/node');
// Option 1: Pass API key directly
const deepdub = new DeepdubClient('dd-your-api-key');
// Option 2: Use the DEEPDUB_API_KEY environment variable
// (dotenv is a separate package that loads variables from a .env file)
require('dotenv').config();
const deepdubFromEnv = new DeepdubClient();
// HTTP protocol supports voiceReference and sampleRate with all formats
const deepdubHttp = new DeepdubClient('dd-your-api-key', { protocol: 'http' });
Constructor parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | process.env.DEEPDUB_API_KEY | Your Deepdub API key. Falls back to DEEPDUB_API_KEY if not provided. |
| options.protocol | 'websocket' \| 'http' | 'websocket' | Transport protocol: "websocket" for real-time streaming, or "http" for REST API. |
| options.baseUrl | string | US/EU REST URL | Base URL for the REST API. Falls back to DEEPDUB_BASE_URL. |
| options.baseWebsocketUrl | string | US/EU WebSocket URL | Base URL for the WebSocket API. Falls back to DEEPDUB_BASE_WEBSOCKET_URL. |
| options.baseWebsocketStreamingUrl | string | wss://wss.deepdub.ai/ws | Base URL for the WebSocket streaming API. Falls back to DEEPDUB_BASE_WEBSOCKET_STREAMING_URL. |
| options.eu | boolean | false | Use EU region endpoints. Falls back to DD_EU=1. |
Protocol comparison
| Feature | WebSocket | HTTP |
| --- | --- | --- |
| Streaming chunks (onChunk) | Yes | No |
| sampleRate option | mp3 only | All formats |
| voiceReference option | No | Yes |
| Concurrent generations | Yes | Yes |
Use WebSocket (default) for real-time streaming and low-latency playback. Use HTTP when you need voiceReference for instant voice cloning or sampleRate with non-mp3 formats.
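The guidance above can be condensed into a small decision helper. This is only a sketch: chooseProtocol and its input shape are hypothetical names introduced here, not part of the SDK, and the function encodes nothing beyond the rules in the comparison table.

```javascript
// Hypothetical helper (not part of @deepdub/node): picks a protocol from the
// rules in the comparison table.
function chooseProtocol({ voiceReference, sampleRate, format, onChunk } = {}) {
  // voiceReference, and sampleRate with a non-mp3 format, are HTTP-only.
  const needsHttp =
    voiceReference !== undefined ||
    (sampleRate !== undefined && format !== 'mp3');
  // Streaming chunks through onChunk are WebSocket-only.
  const needsWebsocket = typeof onChunk === 'function';

  if (needsHttp && needsWebsocket) {
    throw new Error('onChunk needs WebSocket, but the other options need HTTP');
  }
  return needsHttp ? 'http' : 'websocket';
}

console.log(chooseProtocol({ voiceReference: './reference_audio.mp3' })); // http
console.log(chooseProtocol({ sampleRate: 44100, format: 'mp3' })); // websocket
```

The returned string could then be passed as options.protocol when constructing the client.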
Region endpoints
| Region | REST API | WebSocket API |
| --- | --- | --- |
| US (default) | https://restapi.deepdub.ai/api/v1 | wss://wsapi.deepdub.ai/open |
| EU | https://restapi.eu.deepdub.ai/api/v1 | wss://wsapi.eu.deepdub.ai/open |
Enable EU with { eu: true } or DD_EU=1.
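To make the fallback order concrete, here is a minimal sketch of how the REST and WebSocket base URLs might be resolved. It assumes an explicit option wins over the environment variable, which in turn wins over the region default; resolveEndpoints and REGION_DEFAULTS are hypothetical names, and the SDK's actual resolution logic may differ.

```javascript
// Hypothetical sketch of the documented fallback order; not the SDK's code.
const REGION_DEFAULTS = {
  us: { rest: 'https://restapi.deepdub.ai/api/v1', ws: 'wss://wsapi.deepdub.ai/open' },
  eu: { rest: 'https://restapi.eu.deepdub.ai/api/v1', ws: 'wss://wsapi.eu.deepdub.ai/open' },
};

function resolveEndpoints(options = {}, env = process.env) {
  // options.eu takes priority; otherwise DD_EU=1 enables the EU region.
  const eu = options.eu ?? (env.DD_EU === '1');
  const region = eu ? REGION_DEFAULTS.eu : REGION_DEFAULTS.us;
  return {
    baseUrl: options.baseUrl ?? env.DEEPDUB_BASE_URL ?? region.rest,
    baseWebsocketUrl:
      options.baseWebsocketUrl ?? env.DEEPDUB_BASE_WEBSOCKET_URL ?? region.ws,
  };
}

// Passing an empty env object keeps the example independent of the shell.
console.log(resolveEndpoints({ eu: true }, {}));
```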
Connection
For WebSocket protocol, call connect() before using generateToBuffer(), generateToFile(), or generateTo(). For HTTP protocol and REST methods, no connection step is needed.
connect() — Open a WebSocket connection
const deepdub = new DeepdubClient('dd-your-api-key');
await deepdub.connect();
Returns: Promise<WebSocket> — the opened WebSocket connection.
Parameters
No parameters.
asyncConnect() — Alias for connect()
await deepdub.asyncConnect();
Returns: Promise<WebSocket> — the opened WebSocket connection.
Parameters
No parameters.
disconnect() — Close the WebSocket connection
await deepdub.connect();
// ...generate audio...
deepdub.disconnect();
Call disconnect() when you are done with the WebSocket. If you skip it, the open connection keeps the Node.js process alive.
Returns: void
Parameters
No parameters.
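Because a forgotten disconnect() keeps the process alive, it can help to pair connect() and disconnect() in one place. The wrapper below is a sketch under that assumption; withConnection is a hypothetical name, not an SDK method, and it relies only on the connect() and disconnect() calls documented above.

```javascript
// Hypothetical wrapper (not part of the SDK): guarantees disconnect() runs
// even if the work throws. `client` is any object exposing connect() and
// disconnect() as documented above.
async function withConnection(client, work) {
  await client.connect();
  try {
    return await work(client);
  } finally {
    client.disconnect();
  }
}

// Usage sketch, assuming `deepdub` is a DeepdubClient instance:
// const buffer = await withConnection(deepdub, (client) =>
//   client.generateToBuffer('Hello!', {
//     locale: 'en-US',
//     voicePromptId: 'your-voice-id',
//   })
// );
```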
Text-to-Speech
tts() — Synchronous generation
Generate speech and receive the complete audio as a Buffer.
const fs = require('fs');
const audio = await deepdub.tts('Hello, welcome to Deepdub!', {
  voicePromptId: 'your-voice-id',
  model: 'dd-etts-3.2',
  locale: 'en-US',
  format: 'mp3',
});
fs.writeFileSync('output.mp3', audio);
Returns: Promise<Buffer> — binary audio data in the specified format.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | Required | Text to convert to speech. |
| params.voicePromptId | string | — | Voice prompt ID to use. Either this or voiceReference must be provided. |
| params.voiceReference | string \| Buffer | — | Audio reference for instant voice cloning. Accepts a file path, raw Buffer, or base64-encoded string. Either this or voicePromptId must be provided. |
| params.model | string | dd-etts-3.2 | Model ID. |
| params.locale | string | en-US | Language locale code (e.g. en-US, fr-FR). |
| params.format | string | mp3 | Audio output format. REST supports mp3, opus, and mulaw. |
| params.temperature | number | — | Generation temperature (0.0-1.0). Higher values produce more varied output. |
| params.variance | number | — | Voice variation level (0.0-1.0). |
| params.duration | number | — | Target audio duration in seconds. Mutually exclusive with tempo. |
| params.tempo | number | — | Playback speed multiplier. Mutually exclusive with duration. |
| params.seed | number | — | Random seed for deterministic generation. |
| params.promptBoost | boolean | — | Enhance voice prompt characteristics. |
| params.sampleRate | number | — | Output sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000. |
| params.accentBaseLocale | string | — | Base accent locale. Must be provided together with accentLocale and accentRatio. |
| params.accentLocale | string | — | Target accent locale. Must be provided together with accentBaseLocale and accentRatio. |
| params.accentRatio | number | — | Accent blend ratio (0.0-1.0). Must be provided together with accentBaseLocale and accentLocale. |
| params.accentControl | object | — | Accent blending object: { accentBaseLocale, accentLocale, accentRatio }. |
| params.targetGender | string | — | Target gender for the output voice. |
| params.generationId | string | Auto-generated UUID | Optional UUID for request tracking. |
| params.superStretch | boolean | — | Enable super stretch for longer audio. |
| params.realtime | boolean | — | Enable real-time priority processing. |
| params.cleanAudio | boolean | — | Request audio cleanup when supported by the API. |
| params.autoGain | boolean | — | Request automatic gain control when supported by the API. |
| params.publish | boolean | — | Publish the generated asset when supported by the API. |
| params.performanceReferencePromptId | string | — | Voice prompt ID to use as a performance reference. |
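Several rows in the table carry constraints that are easy to violate by accident. The following sketch checks them before a request is sent. validateTtsParams is a hypothetical helper, not part of the SDK, and it mirrors only the constraints stated in the table; the API may enforce additional rules.

```javascript
// Hypothetical pre-flight check (not part of the SDK). It covers only the
// constraints stated in the parameter table.
const SUPPORTED_SAMPLE_RATES = [8000, 16000, 22050, 24000, 44100, 48000];

function validateTtsParams(params = {}) {
  const errors = [];
  const has = (key) => params[key] !== undefined;
  const inUnitRange = (value) =>
    typeof value === 'number' && value >= 0 && value <= 1;

  if (!has('voicePromptId') && !has('voiceReference')) {
    errors.push('voicePromptId or voiceReference is required');
  }
  if (has('duration') && has('tempo')) {
    errors.push('duration and tempo are mutually exclusive');
  }
  for (const key of ['temperature', 'variance', 'accentRatio']) {
    if (has(key) && !inUnitRange(params[key])) {
      errors.push(`${key} must be between 0.0 and 1.0`);
    }
  }
  if (has('sampleRate') && !SUPPORTED_SAMPLE_RATES.includes(params.sampleRate)) {
    errors.push(`unsupported sampleRate: ${params.sampleRate}`);
  }
  const accentKeys = ['accentBaseLocale', 'accentLocale', 'accentRatio'];
  const accentCount = accentKeys.filter(has).length;
  if (accentCount > 0 && accentCount < accentKeys.length) {
    errors.push('accentBaseLocale, accentLocale, and accentRatio must be provided together');
  }
  return errors; // an empty array means no documented constraint was violated
}

// Reports one problem: duration and tempo are both set.
console.log(validateTtsParams({ voicePromptId: 'your-voice-id', duration: 3, tempo: 1.1 }));
```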
Full example with all common TTS parameters
const audio = await deepdub.tts('This demonstrates common TTS parameters.', {
  voicePromptId: 'your-voice-id',
  model: 'dd-etts-3.2',
  locale: 'en-US',
  format: 'mp3',
  temperature: 0.7,
  variance: 0.6,
  tempo: 1.1,
  seed: 42,
  promptBoost: true,
  sampleRate: 44100,
  accentBaseLocale: 'en-US',
  accentLocale: 'fr-FR',
  accentRatio: 0.3,
});
require('fs').writeFileSync('output.mp3', audio);
Voice cloning from audio reference
const audio = await deepdub.tts('Cloning a voice from an audio sample.', {
  voiceReference: './reference_audio.mp3',
  model: 'dd-etts-3.2',
  locale: 'en-US',
});
require('fs').writeFileSync('cloned_output.mp3', audio);
ttsRetro() — Retroactive generation
Submit a TTS request and receive a URL for later retrieval.
const response = await deepdub.ttsRetro('Generate this audio for later retrieval.', {
  voicePromptId: 'your-voice-id',
  model: 'dd-etts-3.2',
  locale: 'en-US',
});
const audioUrl = response.url;
console.log(`Audio available at: ${audioUrl}`);
Fetch the audio later with the same x-api-key header.
Returns: Promise<{ url: string }> — an object with a url key pointing to the generated audio.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | Required | Text to convert to speech. |
| params.voicePromptId | string | — | Voice prompt ID to use. Either this or voiceReference must be provided. |
| params.voiceReference | string \| Buffer | — | Audio reference for instant voice cloning. Accepts a file path, raw Buffer, or base64-encoded string. Either this or voicePromptId must be provided. |
| params.model | string | dd-etts-3.2 | Model ID. |
| params.locale | string | en-US | Language locale code. |
| params | TtsParams | — | Supports all tts() parameter fields. Retroactive generation is most commonly used with voicePromptId, model, and locale. |
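Retrieving the audio from the returned URL might look like the sketch below. It uses the native fetch available in Node.js 18+ and sends the API key in the x-api-key header mentioned above. fetchRetroAudio and its fetchImpl parameter are hypothetical names introduced for illustration, and how long the audio takes to become available is not covered here.

```javascript
// Hypothetical helper (not part of the SDK): downloads the audio behind the
// URL returned by ttsRetro(), sending the API key in the x-api-key header.
// `fetchImpl` defaults to the global fetch and can be replaced in tests.
async function fetchRetroAudio(url, apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: { 'x-api-key': apiKey } });
  if (!response.ok) {
    throw new Error(`Audio not available (HTTP ${response.status})`);
  }
  // Convert the response body into a Node.js Buffer.
  return Buffer.from(await response.arrayBuffer());
}

// Usage sketch, assuming `response` came from deepdub.ttsRetro():
// const audio = await fetchRetroAudio(response.url, process.env.DEEPDUB_API_KEY);
// require('fs').writeFileSync('retro_output.mp3', audio);
```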
WebSocket TTS
generateToBuffer() — Generate to buffer
Generate audio and receive a Buffer of audio data. WebSocket generation returns WAV by default.
const deepdub = new DeepdubClient('dd-your-api-key');
await deepdub.connect();
const buffer = await deepdub.generateToBuffer('Hello, welcome to Deepdub!', {
  locale: 'en-US',
  voicePromptId: 'your-voice-id',
});
console.log(`Generated ${buffer.length} bytes of audio`);
deepdub.disconnect();
WebSocket generation defaults to format: 'wav'. To use voiceReference, or sampleRate with a non-mp3 format, create the client with { protocol: 'http' } so that generateToBuffer() runs over HTTP.
Returns: Promise<Buffer> — generated audio data.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | Required | Text to convert to speech. |
| params | TtsParams | { format: 'wav' } over WebSocket | Supports voicePromptId, voiceReference (HTTP only), model, locale, format, temperature, variance, duration, tempo, seed, promptBoost, sampleRate, accent options, targetGender, generationId, superStretch, realtime, cleanAudio, autoGain, publish, and performanceReferencePromptId. |
| params.onChunk | (chunk: Buffer) => void | — | Callback receiving each audio chunk as a Buffer. WebSocket protocol only. |
| params.headerless | boolean | false | When true, chunks passed to onChunk have WAV headers stripped. WebSocket protocol only. |
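One way to consume onChunk is to feed the chunks into a Node.js stream, which can then be piped to a file or an HTTP response. The sketch below uses PassThrough from the standard stream module. createChunkStream is a hypothetical helper, not part of the SDK, and it ignores backpressure for brevity.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical helper (not part of the SDK): returns a readable stream plus an
// onChunk callback that writes into it, and an end() function to close it.
function createChunkStream() {
  const stream = new PassThrough();
  return {
    stream,
    onChunk: (chunk) => stream.write(chunk),
    end: () => stream.end(),
  };
}

// Usage sketch, assuming `deepdub` is a connected DeepdubClient:
// const { stream, onChunk, end } = createChunkStream();
// stream.pipe(require('node:fs').createWriteStream('./streamed.pcm'));
// await deepdub.generateToBuffer('Hello!', {
//   locale: 'en-US',
//   voicePromptId: 'your-voice-id',
//   headerless: true,
//   onChunk,
// });
// end();
```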
generateToFile() — Generate to file
Generate audio and save directly to a file.
const deepdub = new DeepdubClient('dd-your-api-key');
await deepdub.connect();
await deepdub.generateToFile('./output.wav', 'Hello, welcome to Deepdub!', {
  locale: 'en-US',
  voicePromptId: 'your-voice-id',
});
deepdub.disconnect();
Returns: Promise<Buffer | void> — generated audio data for HTTP protocol, or resolves when the WebSocket file write completes.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| filePath | string | Required | Output file path. |
| text | string | Required | Text to convert to speech. |
| params | TtsParams | { format: 'wav' } over WebSocket | Same generation parameters as generateToBuffer(), including onChunk and headerless. |
generateTo() — Low-level generation helper
Generate audio to a selected output type. Most applications should use generateToBuffer() or generateToFile() instead.
await deepdub.connect();
const buffer = await deepdub.generateTo('buffer', 'Hello from the low-level API.', {
  locale: 'en-US',
  voicePromptId: 'your-voice-id',
});
deepdub.disconnect();
Returns: Promise<Buffer | void> — generated audio for buffer output, or resolves when file output completes.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| outputType | 'buffer' \| 'file' | Required | Output destination type. |
| text | string | Required | Text to convert to speech. |
| params | TtsParams | {} | Same generation parameters as generateToBuffer(), including onChunk and headerless. |
| filePath | string \| null | null | Output file path when outputType is 'file'. |
Streaming chunks
Receive audio data incrementally for real-time playback:
const buffer = await deepdub.generateToBuffer('Streaming audio in real time!', {
  locale: 'en-US',
  voicePromptId: 'your-voice-id',
  model: 'dd-etts-3.2',
  onChunk: (chunk) => {
    console.log(`Received ${chunk.length} bytes`);
    // Stream to an audio player, network response, etc.
  },
});
Headerless chunks
Strip WAV headers from each chunk for raw PCM data:
await deepdub.generateToBuffer('Raw PCM streaming.', {
  locale: 'en-US',
  voicePromptId: 'your-voice-id',
  headerless: true,
  onChunk: (chunk) => {
    // audioPlayer is a placeholder for your own writable audio sink
    audioPlayer.write(chunk);
  },
});
asyncTts() — Streaming generation
Stream audio chunks over WebSocket for low-latency playback. If no WebSocket is connected, asyncTts() opens one automatically; call disconnect() when finished.
const deepdub = new DeepdubClient('dd-your-api-key');
const audioChunks = [];
for await (const chunk of deepdub.asyncTts('Streaming audio in real time!', {
  voicePromptId: 'your-voice-id',
  model: 'dd-etts-3.2',
  locale: 'en-US',
  format: 'wav',
})) {
  audioChunks.push(chunk);
  console.log(`Received chunk: ${chunk.length} bytes`);
}
require('fs').writeFileSync('streamed.wav', Buffer.concat(audioChunks));
deepdub.disconnect();
Yields: Buffer — audio chunks as they are generated.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | Required | Text to convert to speech. |
| params | TtsParams | { format: 'wav' } over WebSocket | Supports voicePromptId, model, locale, format, temperature, variance, duration, tempo, seed, promptBoost, accent options, sampleRate with mp3, targetGender, and generationId. |
| params.generationId | string | Auto-generated UUID | Optional UUID for request tracking. |
| params.targetGender | string | — | Target gender for the output voice. |
| params.onChunk | (chunk: Buffer) => void | — | Internal chunk callback used by the iterator. You usually do not need to pass this directly. |
| params.headerless | boolean | false | When true, chunks have WAV headers stripped. |
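Since asyncTts() is aimed at low-latency playback, it can be useful to measure how quickly the first chunk arrives. The sketch below works on any async iterable of Buffers. measureStream is a hypothetical helper, not part of the SDK, and its timer starts when the helper is called, so the figure is only an approximation of end-to-end latency.

```javascript
// Hypothetical helper (not part of the SDK): consumes an async iterable of
// Buffers and reports time to first chunk, total bytes, and chunk count.
async function measureStream(chunks) {
  const startedAt = performance.now();
  let firstChunkMs = null;
  let totalBytes = 0;
  let chunkCount = 0;
  for await (const chunk of chunks) {
    if (firstChunkMs === null) {
      firstChunkMs = performance.now() - startedAt;
    }
    totalBytes += chunk.length;
    chunkCount += 1;
  }
  // firstChunkMs stays null when the stream yields nothing.
  return { firstChunkMs, totalBytes, chunkCount };
}

// Usage sketch, assuming `deepdub` is a DeepdubClient instance:
// const stats = await measureStream(
//   deepdub.asyncTts('Hello!', { voicePromptId: 'your-voice-id', locale: 'en-US' })
// );
// console.log(stats);
```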
Concurrent generations
Run multiple generations in parallel on the same WebSocket connection:
const deepdub = new DeepdubClient('dd-your-api-key');
await deepdub.connect();
const sentences = [
  'First sentence to generate.',
  'Second sentence in parallel.',
  'Third sentence simultaneously.',
];
await Promise.all(
  sentences.map((text, index) =>
    deepdub.generateToFile(`./output_${index}.wav`, text, {
      locale: 'en-US',
      voicePromptId: 'your-voice-id',
      model: 'dd-etts-3.2',
    })
  )
);
deepdub.disconnect();
Voice Management
listVoices() — List all voice prompts
const voices = await deepdub.listVoices();
for (const voice of voices.voicePrompts ?? []) {
  console.log(`${voice.id}: ${voice.name ?? voice.title ?? 'Untitled'}`);
}
Returns: Promise<{ voicePrompts: VoicePrompt[] }> — an object with a voicePrompts key containing voice prompt objects.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | number | — | Optional maximum number of voices to return. |
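A common follow-up is looking up a voice prompt ID by its display name so it can be passed as voicePromptId. The sketch below assumes the response shape shown above and that each entry carries an id plus a name or title. findVoiceId is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): returns the ID of the first voice
// prompt whose name (or title) matches, or null when none is found.
async function findVoiceId(client, displayName) {
  const { voicePrompts = [] } = (await client.listVoices()) ?? {};
  const match = voicePrompts.find(
    (voice) => (voice.name ?? voice.title) === displayName
  );
  return match ? match.id : null;
}

// Usage sketch, assuming `deepdub` is a DeepdubClient instance:
// const voicePromptId = await findVoiceId(deepdub, 'Professional Narrator');
```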
addVoice() — Upload a voice sample
const response = await deepdub.addVoice({
  data: './voice_sample.wav',
  name: 'Professional Narrator',
  gender: 'female',
  locale: 'en-US',
  publish: false,
  speakingStyle: 'Neutral',
  age: 30,
});
console.log(`Created voice: ${JSON.stringify(response)}`);
Returns: Promise<object> — created voice prompt information.
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data | string \| Buffer | Required | Audio data: file path, raw Buffer, or base64-encoded string. |
| name | string | Required | Display name for the voice prompt. |
| gender | string | Required | Speaker gender: "male" or "female". Sent to the API as uppercase. |
| locale | string | Required | Language locale code (e.g. en-US). |
| publish | boolean | false | Whether to make the voice publicly available. |
| speakingStyle | string | 'Neutral' | Speaking style descriptor. |
| age | number | 0 | Age of the speaker. |
| filename | string | Derived from data | File name to send with the voice sample. |
| text | string | — | Transcript or text associated with the voice sample. |
| speakerId | string | — | Speaker ID to associate with the voice prompt. |
CLI Reference
# List available voices
deepdub list-voices
# Upload a new voice
deepdub add-voice \
  --file path/to/audio.mp3 \
  --name "My Voice" \
  --gender male \
  --locale en-US
# Generate text-to-speech
deepdub tts \
  --text "Hello from the CLI!" \
  --voice-prompt-id your-voice-id \
  --out output.mp3
# Set API key via flag or environment
deepdub --api-key dd-your-key tts --text "Hello!" --voice-prompt-id your-id
export DEEPDUB_API_KEY=dd-your-key
Environment Variables
| Variable | Description | Default |
| --- | --- | --- |
| DEEPDUB_API_KEY | API key for authentication | — |
| DEEPDUB_BASE_URL | REST API base URL | US/EU production URL |
| DEEPDUB_BASE_WEBSOCKET_URL | WebSocket API base URL | US/EU production URL |
| DEEPDUB_BASE_WEBSOCKET_STREAMING_URL | Streaming WebSocket URL | wss://wss.deepdub.ai/ws |
| DD_EU | Use EU endpoints ("1" to enable) | "0" |
Error Handling
try {
  const audio = await deepdub.tts('Hello!', {
    voicePromptId: 'your-voice-id',
  });
} catch (error) {
  if (error.status === 401) {
    console.error('Invalid API key');
  } else if (error.status === 400) {
    console.error('Invalid request parameters:', error.message);
  } else {
    console.error('API error:', error.message);
  }
}
For WebSocket operations, server-side failures such as rate limits or insufficient credits are thrown as an Error carrying the server message.
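Building on the status checks above, a retry wrapper might skip retries for errors that cannot succeed on a second attempt. This is a sketch under an assumption that is not stated in the SDK documentation: that status 400 and 401 are permanent, while other failures may be transient. withRetry and its options are hypothetical names.

```javascript
// Hypothetical retry wrapper (not part of the SDK). Assumption: errors with
// status 400 or 401 will not succeed on retry, while other failures might.
async function withRetry(operation, { retries = 2, delayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      const permanent = error.status === 400 || error.status === 401;
      if (permanent || attempt >= retries) throw error;
      // Wait a fixed delay before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage sketch, assuming `deepdub` is a DeepdubClient instance:
// const audio = await withRetry(() =>
//   deepdub.tts('Hello!', { voicePromptId: 'your-voice-id' })
// );
```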
Available Models
| Model ID | Description |
| --- | --- |
| dd-etts-3.2 | Latest model (default) |
| dd-etts-3.0 | High-quality production model |
| dd-etts-2.5 | Stable production model |
Tests
Live API tests require DEEPDUB_API_KEY in .env:
npm test
Set DEEPDUB_VOICE_REFERENCE_FILE to a real reference audio file to enable the optional voice reference test.
Individual suites: node test/test-tts.js, node test/test-async-tts.js, node test/test-eu-region.js, etc.
