@wopr-network/wopr-plugin-voice-deepgram-stt
v1.0.0
Deepgram STT provider using nova-3 model
WOPR Voice Plugin: Deepgram STT
Cloud-based Speech-to-Text provider using Deepgram's nova-3 model.
Features
- Batch Transcription: Transcribe complete audio buffers via REST API
- Streaming Transcription: Real-time transcription via WebSocket
- Language Detection: Auto-detect language or specify explicitly
- High Accuracy: Uses Deepgram's latest nova-3 model
- Partial Transcripts: Get interim results during streaming
Requirements
- Deepgram API key (sign up at https://deepgram.com)
- Environment variable:
DEEPGRAM_API_KEY
Installation
cd /home/tsavo/wopr-project/plugins/wopr-plugin-voice-deepgram-stt
pnpm install
pnpm build
Configuration
Set your API key in the environment:
export DEEPGRAM_API_KEY="your-api-key-here"
Or configure via WOPR config:
{
  "plugins": {
    "voice-deepgram-stt": {
      "apiKey": "your-api-key-here",
      "model": "nova-3", // nova-3, nova-2, nova, enhanced, base
      "language": "en", // Language code or "auto"
      "wordTimestamps": false,
      "timeoutMs": 30000
    }
  }
}
Usage
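The API key can come from either source; a minimal sketch of the assumed resolution order (explicit config value first, then the DEEPGRAM_API_KEY environment variable; the `resolveApiKey` helper is hypothetical, not part of the plugin):

```typescript
// Hypothetical helper illustrating the assumed key-resolution order:
// config "apiKey" first, then the DEEPGRAM_API_KEY environment variable.
interface DeepgramSTTConfig {
  apiKey?: string;
  model?: string;
  language?: string;
  wordTimestamps?: boolean;
  timeoutMs?: number;
}

function resolveApiKey(
  config: DeepgramSTTConfig,
  env: Record<string, string | undefined> = process.env,
): string {
  const key = config.apiKey ?? env.DEEPGRAM_API_KEY;
  if (!key) {
    throw new Error(
      "Deepgram API key missing: set plugins.voice-deepgram-stt.apiKey or DEEPGRAM_API_KEY",
    );
  }
  return key;
}
```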
Batch Transcription
const stt = ctx.getSTT();
if (!stt) {
  throw new Error("No STT provider registered");
}

const audioBuffer = await fs.readFile("audio.wav");
const transcript = await stt.transcribe(audioBuffer, {
  language: "en",
});
console.log("Transcript:", transcript);
Streaming Transcription
const stt = ctx.getSTT();
const session = await stt.createSession({
  language: "en",
  vadEnabled: true,
  vadSilenceMs: 1000,
});

// Listen for partial results
session.onPartial((chunk) => {
  if (chunk.isFinal) {
    console.log("Final:", chunk.text);
  } else {
    console.log("Partial:", chunk.text);
  }
});

// Send audio chunks
for (const chunk of audioChunks) {
  session.sendAudio(chunk);
}

// Signal end of audio
session.endAudio();

// Wait for final transcript
const finalTranscript = await session.waitForTranscript();
console.log("Complete:", finalTranscript);

// Cleanup
await session.close();
API Reference
DeepgramProvider
Implements the STTProvider interface from wopr/voice.
Methods
- validateConfig(): Validates configuration (throws on error)
- createSession(options?: STTOptions): Create a streaming session
- transcribe(audio: Buffer, options?: STTOptions): Batch transcription
- healthCheck(): Check API connectivity (returns boolean)
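For example, healthCheck() can gate startup before any audio is accepted. A minimal sketch against a stand-in provider shape (the `ensureHealthy` helper and the narrowed interface below are illustrative, not the full STTProvider interface from wopr/voice):

```typescript
// Minimal stand-in for the provider, narrowed to the one method used here.
interface STTProviderLike {
  healthCheck(): Promise<boolean>;
}

// Hypothetical startup gate: fail fast if the API is unreachable.
async function ensureHealthy(provider: STTProviderLike): Promise<void> {
  const ok = await provider.healthCheck();
  if (!ok) {
    throw new Error("Deepgram STT provider failed health check");
  }
}
```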
DeepgramSession
Implements the STTSession interface for streaming.
Methods
- sendAudio(audio: Buffer): Send audio chunk for transcription
- endAudio(): Signal end of audio stream
- onPartial(callback): Register callback for partial transcripts
- waitForTranscript(timeoutMs?): Wait for final transcript
- close(): Close session and cleanup
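The chunk shape passed to onPartial (text plus isFinal, as in the streaming example above) lends itself to a simple accumulator. The helper below is a hypothetical sketch of assembling final chunks into one transcript, not part of the plugin:

```typescript
// Chunk shape follows the onPartial example in this README.
interface TranscriptChunk {
  text: string;
  isFinal: boolean;
}

// Hypothetical accumulator: interim (non-final) chunks are superseded by
// later results, so only final chunks contribute to the transcript.
function accumulateFinals(chunks: TranscriptChunk[]): string {
  return chunks
    .filter((c) => c.isFinal)
    .map((c) => c.text.trim())
    .filter((t) => t.length > 0)
    .join(" ");
}
```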
Supported Models
- nova-3 (default): Latest and most accurate
- nova-2: Previous generation
- nova: Original nova model
- enhanced: Enhanced accuracy
- base: Baseline model
Supported Languages
Deepgram supports 30+ languages. Common codes:
- en - English
- es - Spanish
- fr - French
- de - German
- zh - Chinese
- auto - Auto-detect
Full list: https://developers.deepgram.com/docs/languages
Error Handling
All methods throw descriptive errors:
try {
  const transcript = await stt.transcribe(audio);
} catch (err) {
  const message = err instanceof Error ? err.message : String(err);
  if (message.includes("HTTP 401")) {
    console.error("Invalid API key");
  } else if (message.includes("timeout")) {
    console.error("Request timed out");
  } else {
    console.error("Transcription failed:", err);
  }
}
Performance
- Batch: ~2-5 seconds for 1 minute of audio
- Streaming: Real-time with <500ms latency
- Accuracy: 90-95% on clean audio
- Rate Limits: Varies by plan (check Deepgram dashboard)
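Because rate limits vary by plan, transient failures are possible under load. A hedged sketch of retrying a call with exponential backoff (the `withRetry` helper and its policy are illustrative, not part of this plugin):

```typescript
// Illustrative retry-with-exponential-backoff helper; the name and retry
// policy are assumptions, not part of this plugin's API.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
      const delay = baseDelayMs * 2 ** i;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastErr;
}
```

Usage: `const transcript = await withRetry(() => stt.transcribe(audioBuffer));`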
License
MIT
