@derogab/stt-proxy
v0.3.1
A simple and lightweight proxy for seamless integration with multiple STT (Speech-to-Text) providers including Whisper.cpp
stt-proxy
A simple and lightweight proxy for seamless integration with multiple STT providers including Whisper.cpp and Cloudflare AI.
Features
- Multi-provider support: Switch between STT providers with environment variables.
- TypeScript support: Full TypeScript definitions included.
- Simple API: Single function interface for all providers.
- Automatic provider detection: Automatically selects the best available provider based on environment variables.
Installation
npm install @derogab/stt-proxy
Quick Start
import { transcribe } from '@derogab/stt-proxy';
const result = await transcribe('/path/to/audio.wav');
console.log(result.text);
Configuration
The package automatically detects which STT provider to use based on your environment variables. Configure one or more providers:
Provider Selection
STT_PROVIDER=cloudflare # Optional, force a specific provider (whisper.cpp, cloudflare)
When STT_PROVIDER is set, the specified provider is used, and an error is thrown if its credentials are not configured. When it is not set, providers are selected automatically based on priority.
Note: PROVIDER is supported as a fallback for backward compatibility when STT_PROVIDER is not set.
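The forced-provider rules above can be sketched as a small helper. This is only an illustration of the documented behavior, not the package's internal code: `resolveForcedProvider`, `ForcedEnv`, and `ForcedProvider` are hypothetical names, exact-match comparison of the provider name is an assumption, and so is the error raised for an unknown provider.

```typescript
// Hypothetical helper illustrating the documented rules; not part of @derogab/stt-proxy.
type ForcedEnv = Record<string, string | undefined>;
type ForcedProvider = "whisper.cpp" | "cloudflare";

function resolveForcedProvider(env: ForcedEnv): ForcedProvider | undefined {
  // STT_PROVIDER takes precedence; PROVIDER is only a backward-compatible fallback.
  const raw = env.STT_PROVIDER || env.PROVIDER;
  if (!raw) return undefined; // nothing forced: automatic selection applies

  if (raw === "whisper.cpp") {
    if (!env.WHISPER_CPP_MODEL_PATH) {
      throw new Error("whisper.cpp was requested but WHISPER_CPP_MODEL_PATH is not set");
    }
    return raw;
  }
  if (raw === "cloudflare") {
    if (!env.CLOUDFLARE_ACCOUNT_ID || !env.CLOUDFLARE_AUTH_KEY) {
      throw new Error("cloudflare was requested but its credentials are not set");
    }
    return raw;
  }
  // Assumption: unknown names are rejected rather than silently ignored.
  throw new Error(`Unknown STT provider: ${raw}`);
}
```

Calling `resolveForcedProvider(process.env)` at startup is one way to surface a misconfiguration before the first transcription is attempted.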
Whisper.cpp (Local)
WHISPER_CPP_MODEL_PATH=/path/to/ggml-base.bin # Required, path to your GGML model file
Download models from HuggingFace:
curl -L -o ggml-base.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
Cloudflare AI
CLOUDFLARE_ACCOUNT_ID=your-account-id # Required
CLOUDFLARE_AUTH_KEY=your-api-token # Required
Uses the @cf/openai/whisper-large-v3-turbo model.
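For orientation, the sketch below shows roughly what a direct request to this model could look like through Cloudflare's Workers AI REST endpoint. It is not taken from the package's source: `buildCloudflareRequest` and `CloudflareRequest` are hypothetical names, and the base64 `audio` field in the JSON body is an assumption about the model's input schema that should be checked against Cloudflare's documentation. The function only builds the request and does not send it.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sketch: builds (but does not send) a Workers AI REST request.
interface CloudflareRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildCloudflareRequest(
  accountId: string, // value of CLOUDFLARE_ACCOUNT_ID
  authKey: string,   // value of CLOUDFLARE_AUTH_KEY
  audio: Uint8Array, // raw audio bytes
): CloudflareRequest {
  const model = "@cf/openai/whisper-large-v3-turbo";
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${authKey}`,
        "Content-Type": "application/json",
      },
      // Assumption: the model accepts base64-encoded audio in an `audio` field.
      body: JSON.stringify({ audio: Buffer.from(audio).toString("base64") }),
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)`. In normal use `transcribe` handles this step, so the sketch is mainly useful for debugging credentials.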
API Reference
transcribe(audio: string | Buffer, options?): Promise<TranscribeOutput>
Transcribes audio to text using the configured STT provider. The package automatically manages provider initialization and cleanup.
Parameters:
- audio: Path to an audio file (string) or an audio Buffer
- options (optional): Transcription options
Returns:
- Promise that resolves to an object with a text property
Options Format:
type TranscribeOptions = {
language?: string; // Language code (e.g., 'en', 'es', 'fr')
translate?: boolean; // Translate to English
};
Output Format:
type TranscribeOutput = {
text: string;
};
Example:
import * as fs from 'node:fs';
import { transcribe } from '@derogab/stt-proxy';
// Transcribe from file path
const result1 = await transcribe('/path/to/audio.wav');
console.log(result1.text);
// Transcribe from Buffer
const audioBuffer = fs.readFileSync('/path/to/audio.wav');
const result2 = await transcribe(audioBuffer);
console.log(result2.text);
// With options
const result3 = await transcribe('/path/to/audio.wav', {
language: 'en',
translate: false
});
console.log(result3.text);
Provider Priority
When the STT_PROVIDER environment variable is set, that provider is used directly.
Otherwise, the package selects providers in the following order:
- Whisper.cpp (if WHISPER_CPP_MODEL_PATH is set and the file exists)
- Cloudflare AI (if CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_AUTH_KEY are set)
If no providers are configured, the function throws an error.
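The automatic order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual source: `autoSelectProvider`, `SelectionEnv`, and `AutoProvider` are hypothetical names, and the injectable `fileExists` parameter exists only to make the sketch easy to test.

```typescript
import { existsSync } from "node:fs";

// Hypothetical sketch of the documented priority order; not the package's actual source.
type SelectionEnv = Record<string, string | undefined>;
type AutoProvider = "whisper.cpp" | "cloudflare";

function autoSelectProvider(
  env: SelectionEnv,
  fileExists: (path: string) => boolean = existsSync,
): AutoProvider {
  // 1. Whisper.cpp wins when a model path is set and the file is present.
  const modelPath = env.WHISPER_CPP_MODEL_PATH;
  if (modelPath && fileExists(modelPath)) return "whisper.cpp";

  // 2. Cloudflare AI is next when both credentials are set.
  if (env.CLOUDFLARE_ACCOUNT_ID && env.CLOUDFLARE_AUTH_KEY) return "cloudflare";

  // 3. Nothing usable is configured.
  throw new Error("No STT provider is configured");
}
```

Note that a model path pointing at a missing file does not raise an error here; selection simply falls through to Cloudflare AI, which matches the "is set and file exists" condition in the list.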
Requirements
- FFmpeg: Required for audio conversion (Whisper.cpp only).
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows (with Chocolatey)
choco install ffmpeg
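Because the Whisper.cpp provider depends on FFmpeg, an application may want to check for it before transcribing. The sketch below is one possible pre-flight check built on Node's `child_process.spawnSync`. `isFfmpegAvailable` is a hypothetical helper, not something exported by this package, and the package itself may detect FFmpeg differently.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical pre-flight check; not part of @derogab/stt-proxy.
function isFfmpegAvailable(command: string = "ffmpeg"): boolean {
  // `ffmpeg -version` exits with status 0 when the binary can be found and run.
  const result = spawnSync(command, ["-version"], { stdio: "ignore" });
  // `error` is set when the process could not be spawned (e.g. not on PATH).
  return result.error === undefined && result.status === 0;
}

if (!isFfmpegAvailable()) {
  console.warn("FFmpeg not found: the Whisper.cpp provider may fail to convert audio.");
}
```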
Development
# Install dependencies
npm install
# Build the package
npm run build
# Run tests
npm test
Credits
STT Proxy is made with ♥ by derogab and released under the MIT license.
Tip
If you like this project or directly benefit from it, please consider buying me a coffee:
🔗 bc1qd0qatgz8h62uvnr74utwncc6j5ckfz2v2g4lef
⚡️ [email protected]
💶 Sponsor on GitHub
