@soniox/langchain
v0.1.4
Soniox integration for LangChain.js
Soniox LangChain integration
Get started using the Soniox audio transcription loader in LangChain.
Setup
Install the package:
npm install @soniox/langchain
Credentials
Get your Soniox API key from the Soniox Console and set it as an environment variable:
export SONIOX_API_KEY=your_api_key
Usage
Basic transcription
Transcribe audio files using the SonioxAudioTranscriptLoader:
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";
// Fetch the file
const response = await fetch(
"https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3",
);
const audioBuffer = await response.bytes(); // Uint8Array
const loader = new SonioxAudioTranscriptLoader(
{
audio: audioBuffer, // Or you can pass in a URL string
},
{
language_hints: ["en"],
// Any other transcription parameters are listed at:
// https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
},
);
const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text
Two-way translation
Transcribe and translate between two languages simultaneously:
const loader = new SonioxAudioTranscriptLoader(
{
audio: audioBuffer,
},
{
translation: {
type: "two_way",
language_a: "en",
language_b: "es",
},
language_hints: ["en", "es"],
},
);
const docs = await loader.load();
One-way translation
Translate from any detected language to a target language:
const loader = new SonioxAudioTranscriptLoader(
{
audio: audioBuffer,
},
{
translation: {
type: "one_way",
target_language: "fr",
},
language_hints: ["en"],
},
);
const docs = await loader.load();
Advanced usage
Language hints
Provide language hints to improve transcription accuracy:
const loader = new SonioxAudioTranscriptLoader(
{
audio: audioBuffer,
},
{
language_hints: ["en", "es"],
},
);
Context for improved accuracy
Provide domain-specific context to improve transcription accuracy:
const loader = new SonioxAudioTranscriptLoader(
{
audio: audioBuffer,
},
{
context: {
general: [
{ key: "industry", value: "healthcare" },
{ key: "meeting_type", value: "consultation" },
],
terms: ["hypertension", "cardiology", "metformin"],
translation_terms: [
{ source: "blood pressure", target: "presión arterial" },
{ source: "medication", target: "medicamento" },
],
},
},
);
API reference
Constructor parameters
SonioxLoaderParams (required)
| Parameter | Type | Required | Description |
| ------------------- | ---------------------- | -------- | ------------------------------------------------------ |
| audio | Uint8Array \| string | Yes | Audio file as buffer or URL |
| audioFormat | SonioxAudioFormat | No | Audio file format |
| apiKey | string | No | Soniox API key (defaults to SONIOX_API_KEY env var) |
| apiBaseUrl | string | No | API base URL (defaults to https://api.soniox.com/v1) |
| pollingIntervalMs | number | No | Polling interval in ms (min: 1000, default: 1000) |
| pollingTimeoutMs | number | No | Polling timeout in ms (default: 180000) |
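To make the defaults in the table above concrete, here is a minimal sketch of how they could be applied before a request is made. The names `LoaderParamsSketch`, `ResolvedLoaderParams`, and `resolveLoaderParams` are hypothetical and are not exported by `@soniox/langchain`. Clamping the polling interval up to the 1000 ms minimum is also an assumption; the package may reject smaller values instead.

```typescript
// Hypothetical mirror of the SonioxLoaderParams table; not the package's own types.
interface LoaderParamsSketch {
  audio: Uint8Array | string;
  apiKey?: string;
  apiBaseUrl?: string;
  pollingIntervalMs?: number;
  pollingTimeoutMs?: number;
}

interface ResolvedLoaderParams {
  audio: Uint8Array | string;
  apiKey: string;
  apiBaseUrl: string;
  pollingIntervalMs: number;
  pollingTimeoutMs: number;
}

const MIN_POLLING_INTERVAL_MS = 1000;

// Fills in the documented defaults. The environment is passed in explicitly
// so the function stays easy to test.
function resolveLoaderParams(
  params: LoaderParamsSketch,
  env: Record<string, string | undefined>,
): ResolvedLoaderParams {
  const apiKey = params.apiKey ?? env.SONIOX_API_KEY;
  if (!apiKey) {
    throw new Error("Missing API key: pass apiKey or set SONIOX_API_KEY");
  }
  return {
    audio: params.audio,
    apiKey,
    apiBaseUrl: params.apiBaseUrl ?? "https://api.soniox.com/v1",
    // Assumption: values below the documented minimum are clamped, not rejected.
    pollingIntervalMs: Math.max(
      params.pollingIntervalMs ?? 1000,
      MIN_POLLING_INTERVAL_MS,
    ),
    pollingTimeoutMs: params.pollingTimeoutMs ?? 180000,
  };
}
```

In a Node.js script this could be called as `resolveLoaderParams({ audio: url }, process.env)` to check the effective settings before constructing the loader.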
SonioxLoaderOptions (optional)
| Parameter | Type | Description |
| -------------------------------- | ---------------------------- | ---------------------------------------- |
| model | SonioxTranscriptionModelId | Model to use (default: "stt-async-v4") |
| translation | object | Translation configuration |
| language_hints | string[] | Language hints for transcription |
| language_hints_strict | boolean | Enforce strict language hints |
| enable_speaker_diarization | boolean | Enable speaker identification |
| enable_language_identification | boolean | Enable language detection |
| context | object | Context for improved accuracy |
Browse the documentation for a full list of supported options.
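The `translation` option takes one of two shapes in the examples earlier in this README. A sketch of how those shapes could be modeled as a discriminated union follows. `TranslationSketch` and `describeTranslation` are hypothetical names for illustration, and the package's actual types may differ.

```typescript
// Hypothetical types mirroring the one-way and two-way translation examples.
type TranslationSketch =
  | { type: "one_way"; target_language: string }
  | { type: "two_way"; language_a: string; language_b: string };

// Returns a short human-readable summary of a translation configuration.
function describeTranslation(t: TranslationSketch): string {
  switch (t.type) {
    case "one_way":
      return `any detected language -> ${t.target_language}`;
    case "two_way":
      return `${t.language_a} <-> ${t.language_b}`;
  }
}
```

Modeling the option this way lets the compiler flag a `two_way` configuration that is missing `language_b`, or a `one_way` configuration that omits `target_language`.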
Supported audio formats
- aac - Advanced Audio Coding
- aiff - Audio Interchange File Format
- amr - Adaptive Multi-Rate
- asf - Advanced Systems Format
- flac - Free Lossless Audio Codec
- mp3 - MPEG Audio Layer III
- ogg - Ogg Vorbis
- wav - Waveform Audio File Format
- webm - WebM Audio
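When the audio comes from a file name or URL, the format can often be guessed from the extension. The helper below is a hypothetical sketch: `guessAudioFormat` is not part of the package, and it assumes that the values accepted by `audioFormat` match the identifiers in the list above.

```typescript
// The nine identifiers listed above. SonioxAudioFormat in the package may be defined differently.
const SUPPORTED_FORMATS = [
  "aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm",
] as const;
type AudioFormatSketch = (typeof SUPPORTED_FORMATS)[number];

// Returns the format implied by the file extension, or undefined if it is not in the list.
function guessAudioFormat(fileNameOrUrl: string): AudioFormatSketch | undefined {
  // Drop any query string or fragment, then take the text after the last dot.
  const path = fileNameOrUrl.split(/[?#]/)[0] ?? "";
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  return SUPPORTED_FORMATS.find((format) => format === ext);
}
```

An extension is only a hint about the container, so treat an `undefined` result as a signal to leave `audioFormat` unset rather than as an error.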
Return value
The load() method returns an array containing a single Document object:
Document {
pageContent: string, // The transcribed text
metadata: SonioxTranscriptResponse // Full transcript with metadata
}
The metadata includes the transcribed text, speaker information (if diarization is enabled), language information (if language identification is enabled), translation data (if translation is enabled), and timing information.
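Since `load()` returns a one-element array, downstream code usually unpacks the first entry. The sketch below shows one way to do that defensively. `TranscriptDocumentSketch` and `firstTranscript` are hypothetical names, and the metadata is treated as an opaque record because its fields are not listed here.

```typescript
// Minimal stand-in for the Document shape described above.
interface TranscriptDocumentSketch {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Unpacks the single Document and reports a few basic facts about it.
function firstTranscript(docs: TranscriptDocumentSketch[]): {
  text: string;
  wordCount: number;
  metadataKeys: string[];
} {
  const [doc] = docs;
  if (!doc) {
    throw new Error("Expected one Document from load()");
  }
  const text = doc.pageContent.trim();
  // Count whitespace-separated tokens; an empty transcript has zero words.
  const wordCount = text === "" ? 0 : text.split(/\s+/).length;
  return { text, wordCount, metadataKeys: Object.keys(doc.metadata) };
}
```

Listing `metadataKeys` is a quick way to see which parts of the response are present for a given set of options before relying on any of them.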
