krio-stt
v1.0.0
Speech-to-text for Krio and other African languages — powered by Whisper large-v3 via the Hugging Face Inference API.
Works with voice notes from WhatsApp, Telegram, local files, or any audio URL. No extra dependencies — pure Node.js.
Why Whisper large-v3?
Standard STT engines struggle with Krio (Sierra Leonean Creole), Nigerian Pidgin, and heavily accented speech. Whisper large-v3 is trained on a wide range of languages and accents and handles these significantly better than smaller models.
Installation
```sh
npm install krio-stt
```
Or use it locally in a monorepo:
```sh
npm install ../krio-stt
```
Quick Start
```js
const { transcribeUrl, transcribeBuffer, transcribeFile } = require('krio-stt');
```
Set your Hugging Face API key (free at huggingface.co/settings/tokens):
```sh
export HUGGINGFACE_API_KEY=hf_...
```
Transcribe from a URL
```js
// Run inside an async function (CommonJS modules have no top-level await)
// Public URL: no auth needed
const publicText = await transcribeUrl('https://example.com/audio.ogg');
// Protected URL: pass a Bearer token
const protectedText = await transcribeUrl(mediaUrl, {
  auth: { bearer: process.env.WHAPI_TOKEN },
});
```
Transcribe from a Buffer
```js
const text = await transcribeBuffer(audioBuffer, {
  contentType: 'audio/ogg',
});
```
Transcribe from a local file
```js
const text = await transcribeFile('/path/to/voice.ogg');
```
Auth Options
The auth option in transcribeUrl supports multiple formats:
```js
// Bearer token (WhatsApp / Whapi, most APIs)
{ auth: { bearer: 'TOKEN' } }
// Basic auth
{ auth: { basic: { user: 'username', pass: 'password' } } }
// Arbitrary header
{ auth: { header: { key: 'X-API-Key', value: 'TOKEN' } } }
// Raw Authorization header value
{ auth: 'Bearer TOKEN' }
```
Platform Examples
WhatsApp (via Whapi)
```js
const { transcribeUrl } = require('krio-stt');
// In your (async) webhook handler:
if (message.type === 'audio') {
  const text = await transcribeUrl(message.media.url, {
    auth: { bearer: process.env.WHAPI_TOKEN },
  });
  // text is now the Krio/English transcript: handle it like a typed message
}
```
Telegram
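The Telegram example starts from an already resolved file_path. If you only have the file_id of an incoming voice message (message.voice.file_id in the Bot API), the getFile lookup might look like the minimal sketch below. It assumes Node 18+ for the global fetch; resolveTelegramFileUrl is a hypothetical helper, not part of krio-stt.

```javascript
// Hypothetical helper (not a krio-stt export): turn a Telegram file_id into a
// download URL using the Bot API getFile method.
// fetchImpl defaults to the global fetch available in Node 18+.
async function resolveTelegramFileUrl(botToken, fileId, fetchImpl = globalThis.fetch) {
  const endpoint =
    `https://api.telegram.org/bot${botToken}/getFile?file_id=${encodeURIComponent(fileId)}`;
  const res = await fetchImpl(endpoint);
  const body = await res.json();
  if (!body.ok) {
    throw new Error(`getFile failed: ${body.description || 'unknown error'}`);
  }
  // file_path is relative; the download URL embeds the bot token,
  // so no separate auth header is needed later.
  return `https://api.telegram.org/file/bot${botToken}/${body.result.file_path}`;
}
```

The returned URL can then go straight to transcribeUrl with no auth option.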
```js
const { transcribeUrl } = require('krio-stt');
// After calling getFile to resolve file_path:
const url = `https://api.telegram.org/file/bot${BOT_TOKEN}/${file_path}`;
const text = await transcribeUrl(url); // no auth option needed: the bot token is already in the URL
```
Express file upload
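```js
const express = require('express');
const multer = require('multer');
const { transcribeBuffer } = require('krio-stt');
const app = express();
const upload = multer(); // no storage option: files stay in memory, so req.file.buffer is set
app.post('/transcribe', upload.single('audio'), async (req, res) => {
  try {
    const text = await transcribeBuffer(req.file.buffer, { contentType: req.file.mimetype });
    res.json({ transcript: text });
  } catch (err) {
    res.status(502).json({ error: err.message });
  }
});
```
API Reference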
transcribeUrl(url, [options]) → Promise<string>
| Option      | Type   | Default                     | Description                          |
|-------------|--------|-----------------------------|--------------------------------------|
| auth        | object | none                        | Auth config for downloading the file |
| contentType | string | auto-detected               | Override the MIME type               |
| apiKey      | string | HUGGINGFACE_API_KEY env var | Hugging Face API key                 |
| model       | string | openai/whisper-large-v3     | Whisper model ID                     |
| timeout     | number | 60000                       | Request timeout in milliseconds      |
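To make the table and the Auth Options formats concrete, here is a hypothetical sketch of how such options could be normalised before any request is sent. It is an illustration only: authToHeaders and resolveOptions are made-up names, and krio-stt's internals may handle these cases differently.

```javascript
// Hypothetical sketch, not krio-stt's source: normalise transcribeUrl options.
const DEFAULTS = {
  model: 'openai/whisper-large-v3',
  timeout: 60000, // milliseconds
};

// Map each documented auth shape onto HTTP headers for the download request.
function authToHeaders(auth) {
  if (auth == null) return {};
  if (typeof auth === 'string') return { Authorization: auth }; // raw header value
  if (auth.bearer) return { Authorization: `Bearer ${auth.bearer}` };
  if (auth.basic) {
    const { user, pass } = auth.basic;
    const encoded = Buffer.from(`${user}:${pass}`).toString('base64');
    return { Authorization: `Basic ${encoded}` };
  }
  if (auth.header) return { [auth.header.key]: auth.header.value };
  throw new TypeError('Unsupported auth option');
}

// Merge caller options with defaults; apiKey falls back to the environment.
function resolveOptions(options = {}, env = process.env) {
  const apiKey = options.apiKey || env.HUGGINGFACE_API_KEY;
  if (!apiKey) {
    throw new Error('Set HUGGINGFACE_API_KEY or pass options.apiKey');
  }
  return {
    apiKey,
    model: options.model || DEFAULTS.model,
    timeout: options.timeout ?? DEFAULTS.timeout, // ?? keeps an explicit 0
    contentType: options.contentType, // undefined means "auto-detect"
    downloadHeaders: authToHeaders(options.auth),
  };
}
```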
transcribeBuffer(buffer, [options]) → Promise<string>
Same options as above except auth (not applicable).
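Because transcribeBuffer never downloads anything, auth does not apply to it. When a download needs something the auth formats cannot express (for example, several custom headers at once), you could fetch the audio yourself and hand the bytes over. A minimal sketch, assuming Node 18+ for the global fetch; downloadAudio is a hypothetical helper, not a krio-stt export.

```javascript
// Hypothetical helper: download audio with arbitrary headers and return the
// bytes plus the server-reported MIME type, ready for transcribeBuffer.
async function downloadAudio(url, headers = {}, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, { headers });
  if (!res.ok) {
    throw new Error(`Download failed with HTTP ${res.status}`);
  }
  const buffer = Buffer.from(await res.arrayBuffer());
  // Drop parameters such as "; codecs=opus" from the Content-Type header.
  const raw = res.headers.get('content-type') || '';
  const contentType = raw.split(';')[0].trim() || undefined;
  return { buffer, contentType };
}

// Example (inside an async function):
//   const { buffer, contentType } = await downloadAudio(url, { 'X-Client': 'bot', Authorization: 'Bearer TOKEN' });
//   const text = await transcribeBuffer(buffer, { contentType });
```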
transcribeFile(filePath, [options]) → Promise<string>
Same options as transcribeBuffer. The content type is inferred from the file extension.
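Extension-based detection might work along these lines. The extension table is an assumption (krio-stt's actual list is not documented here) and inferContentType is a hypothetical name; in this sketch an explicit contentType always takes precedence.

```javascript
const path = require('node:path');

// Hypothetical extension-to-MIME table; krio-stt's real list may differ.
const AUDIO_TYPES = {
  '.ogg': 'audio/ogg',
  '.oga': 'audio/ogg',
  '.mp3': 'audio/mpeg',
  '.wav': 'audio/wav',
  '.m4a': 'audio/mp4',
  '.flac': 'audio/flac',
  '.webm': 'audio/webm',
};

// Infer a MIME type from the file extension; an explicit override wins.
function inferContentType(filePath, override) {
  if (override) return override;
  const ext = path.extname(filePath).toLowerCase();
  const type = AUDIO_TYPES[ext];
  if (!type) {
    throw new Error(`Cannot infer content type for "${ext || filePath}"; pass options.contentType`);
  }
  return type;
}
```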
Notes
- On the Hugging Face free tier, the model may need a cold start (~20–30s) if it hasn't been used recently, so the first request after a period of inactivity takes longer.
- Audio files must be under 25 MB.
- For production use, consider a paid HF Inference Endpoint to avoid cold starts.
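The first two notes can also be handled on the caller's side with a small wrapper: check the file size before uploading, then retry with exponential backoff so a cold-starting model has time to load. This is a sketch under stated assumptions: transcribeWithRetry is hypothetical, it retries on any error because the error shape krio-stt raises is not documented here, and it reads 25 MB as 25 * 1024 * 1024 bytes.

```javascript
const fs = require('node:fs/promises');

const MAX_BYTES = 25 * 1024 * 1024; // 25 MB, read as MiB (assumption)
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: reject oversized files up front, then retry the
// transcription so a cold-starting model has time to load.
// Example: await transcribeWithRetry('/path/to/voice.ogg', transcribeFile);
async function transcribeWithRetry(filePath, transcribe, { retries = 3, delayMs = 10000 } = {}) {
  const { size } = await fs.stat(filePath);
  if (size >= MAX_BYTES) {
    throw new RangeError(`${filePath} is ${size} bytes; files must be under 25 MB`);
  }
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await transcribe(filePath);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await sleep(delayMs * 2 ** attempt); // 10s, 20s, 40s with the defaults
      }
    }
  }
  throw lastError;
}
```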
License
MIT
