audio-transcripter

v2.0.1

Published 7 months ago

Lightweight TypeScript library for transcribing audio files using Google Gemini 2.0 models. Supports local files, remote URLs, and Blobs.

shriansh

gemini audio transcriber transcription google-genai typescript blob url buffer speech-to-text

🎙️ audio-transcripter

A lightweight TypeScript library for transcribing audio files using Google Gemini 2.0 models.

Supports local files, remote URLs, and in-memory buffers/blobs.

Ideal for meetings, interviews, podcasts, technical content, and more.

🚀 Installation

npm install audio-transcripter

🌟 Features

🎧 Supports local files (.wav, .mp3, .aac, .flac, .ogg, .webm, etc.)
🌐 Supports remote URLs (HTTP/HTTPS)
📦 Supports Blobs / Buffers
✨ Multiple transcription styles:
- accurate
- clean
- structured
- technical
- conversational
🔍 Verbose console logging (enabled by default; disable with verbose: false)
⚙️ Written in TypeScript with full type safety

🧑‍💻 Usage

1️⃣ Transcribe Local File

import { runTranscription } from "audio-transcripter";

const result = await runTranscription({
	audioFile: "./assets/audio.webm",
	style: "structured", // optional, default: 'conversational'
	language: "english", // optional
});

if (result.success) {
	console.log("Transcription:", result.transcription);
} else {
	console.error("Error:", result.error);
}

2️⃣ Transcribe Remote URL

import { runTranscription } from "audio-transcripter";

const result = await runTranscription({
	audioFile: "https://example.com/audio.mp3",
	style: "clean",
	language: "english",
});
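Because the same audioFile option accepts both local paths and HTTP/HTTPS URLs, the input has to be classified before it is read. This README does not document how the package makes that decision, so the following is only a minimal sketch of one way to do it, using a hypothetical isRemoteUrl helper that is not part of audio-transcripter's API.

```typescript
// Hypothetical helper (not exported by audio-transcripter): decides whether
// an audioFile string should be fetched over HTTP(S) or read from disk.
function isRemoteUrl(audioFile: string): boolean {
  try {
    const url = new URL(audioFile);
    // Only http: and https: count as remote. Other schemes (file:, data:)
    // and Windows drive paths such as "C:\\audio.wav" are treated as local.
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // new URL() throws on relative paths like "./assets/audio.webm".
    return false;
  }
}

console.log(isRemoteUrl("https://example.com/audio.mp3")); // true
console.log(isRemoteUrl("./assets/audio.webm")); // false
```

Parsing with the WHATWG URL constructor avoids fragile prefix checks such as startsWith("http"), which would also match a local file named "http-recording.mp3".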

3️⃣ Transcribe Blob / Buffer (for browser or Node.js)

import { runTranscriptionWithBlob } from "audio-transcripter";

// Example with a Node.js Buffer
const fs = await import("fs/promises");
const audioBuffer = await fs.readFile("./assets/audio.wav");

const result = await runTranscriptionWithBlob(audioBuffer, {
	style: "technical",
	language: "english",
});

if (result.success) {
	console.log("Transcription:", result.transcription);
} else {
	console.error("Error:", result.error);
}

📥 Configuration Options

| Option    | Type    | Default          | Description                                       |
| --------- | ------- | ---------------- | ------------------------------------------------- |
| audioFile | string  | required         | Local file path or remote URL                     |
| style     | string  | 'conversational' | Transcription style (see below)                   |
| language  | string  | 'english'        | Language of the audio                             |
| verbose   | boolean | true             | Enable verbose console logs                       |
| timeout   | number  | 5000 (ms)        | Timeout for remote URL HEAD check (if applicable) |
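The defaults in the table can be made concrete with a small sketch. The package applies its defaults internally; the withDefaults helper below is hypothetical and only restates the documented values. The types are local copies of the ones listed under Type Definitions so the block runs on its own. Because TranscriptionConfig allows language to be null, the sketch passes an explicit null through unchanged instead of guessing what the package does with it.

```typescript
// Local copies of the documented types, so this sketch is self-contained.
type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";

interface TranscriptionConfig {
  audioFile: string;
  style?: TranscriptionStyle;
  language?: string | null;
  verbose?: boolean;
  timeout?: number;
}

// Hypothetical helper: fills in the defaults from the options table.
function withDefaults(config: TranscriptionConfig): Required<TranscriptionConfig> {
  return {
    audioFile: config.audioFile,
    style: config.style ?? "conversational",
    // Only undefined falls back to "english"; an explicit null is kept.
    language: config.language === undefined ? "english" : config.language,
    verbose: config.verbose ?? true,
    timeout: config.timeout ?? 5000, // milliseconds
  };
}

const resolved = withDefaults({ audioFile: "./assets/audio.wav", verbose: false });
console.log(resolved.style, resolved.verbose, resolved.timeout); // conversational false 5000
```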

🎨 Supported Transcription Styles

| Style          | Description                                                  |
| -------------- | ------------------------------------------------------------ |
| accurate       | High accuracy, raw transcription including filler words      |
| clean          | Edited for readability (filler words removed, grammar fixed) |
| structured     | Meeting/interview format with speakers and structure         |
| technical      | Technical content with jargon preserved                      |
| conversational | Casual, creative, natural conversation transcription         |
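When the style comes from user input (a CLI flag, a form field, a query parameter), it arrives as a plain string, not as the TranscriptionStyle union. A minimal sketch of validating it against the five documented styles follows; STYLES, isTranscriptionStyle, and parseStyle are hypothetical names, not exports of the package.

```typescript
// The five documented styles. Deriving the type from this tuple keeps the
// runtime list and the compile-time union in sync.
const STYLES = [
  "accurate",
  "clean",
  "structured",
  "technical",
  "conversational",
] as const;

type TranscriptionStyle = (typeof STYLES)[number];

// Type guard: narrows an arbitrary string to TranscriptionStyle.
function isTranscriptionStyle(value: string): value is TranscriptionStyle {
  return (STYLES as readonly string[]).includes(value);
}

// Hypothetical parser: trims and lowercases the input, uses the documented
// default for empty input, and rejects unknown styles with a clear message.
function parseStyle(input: string | undefined): TranscriptionStyle {
  const normalized = (input ?? "").trim().toLowerCase();
  if (normalized === "") return "conversational";
  if (isTranscriptionStyle(normalized)) return normalized;
  throw new Error(
    `Unknown style "${input}"; expected one of: ${STYLES.join(", ")}`,
  );
}
```

Rejecting unknown values, rather than silently falling back to the default, makes a typo such as "structed" visible instead of producing an unexpected transcript format.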

🗂️ Supported File Formats

.mp3
.wav
.aac
.flac
.ogg
.webm / .weba

Unknown formats fall back to the audio/octet-stream MIME type.
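Choosing a MIME type from the file extension, with a fallback for anything unrecognized, can be sketched as below. The MIME strings in the table are common values chosen for illustration and are assumptions, not the package's actual mapping; only the audio/octet-stream fallback comes from this README. mimeTypeFor is a hypothetical helper.

```typescript
// Hypothetical extension-to-MIME table for the formats listed above.
// The package's real table may use different strings (e.g. audio/x-wav).
const MIME_BY_EXTENSION: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".aac": "audio/aac",
  ".flac": "audio/flac",
  ".ogg": "audio/ogg",
  ".webm": "audio/webm",
  ".weba": "audio/webm",
};

// Documented fallback for unknown formats.
const FALLBACK_MIME = "audio/octet-stream";

function mimeTypeFor(fileOrUrl: string): string {
  // Strip a URL query string or fragment before reading the extension.
  const path = fileOrUrl.split(/[?#]/)[0];
  const dot = path.lastIndexOf(".");
  const slash = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  // No extension if there is no dot, or the last dot is in a directory name.
  if (dot === -1 || dot < slash) return FALLBACK_MIME;
  const ext = path.slice(dot).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? FALLBACK_MIME;
}

console.log(mimeTypeFor("./assets/audio.webm")); // audio/webm
console.log(mimeTypeFor("https://example.com/audio.MP3?token=1")); // audio/mpeg
```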

📚 API Reference

`runTranscription(config: TranscriptionConfig)`

Runs transcription on a local file path or a remote URL.

Returns: Promise<RunTranscriptionResult>

type RunTranscriptionResult = {
	success: boolean;
	transcription?: string;
	error?: string;
};
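Because failures are reported through success and error rather than by rejecting the promise, callers that prefer try/catch may want to convert the result into either a string or a thrown Error. A minimal sketch, assuming a hypothetical unwrapTranscription helper and a local copy of the documented result type:

```typescript
// Local copy of the documented result shape.
type RunTranscriptionResult = {
  success: boolean;
  transcription?: string;
  error?: string;
};

// Hypothetical helper: returns the transcript on success, otherwise throws,
// so the result composes with ordinary try/catch error handling.
function unwrapTranscription(result: RunTranscriptionResult): string {
  if (result.success && typeof result.transcription === "string") {
    return result.transcription;
  }
  // error is optional in the type, so supply a generic message if it is absent.
  throw new Error(result.error ?? "Transcription failed without an error message");
}
```

Used with the package, this would read as `const text = unwrapTranscription(await runTranscription({ audioFile: "./assets/audio.webm" }));`.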

`runTranscriptionWithBlob(audioBlob: Blob | Buffer, options?)`

Runs transcription on an in-memory Blob or Node.js Buffer.

Returns: Promise<RunTranscriptionResult>
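A function that accepts both a browser Blob and a Node.js Buffer has to normalize them at some point. The sketch below shows one way to do that; toBytes is a hypothetical helper, not the package's internal code. It relies on two standard facts: Buffer is a subclass of Uint8Array, and Blob exposes its contents through the asynchronous arrayBuffer() method (available in browsers and as a global in Node.js 18+).

```typescript
// Hypothetical helper: converts either accepted input type into raw bytes.
async function toBytes(input: Blob | Uint8Array): Promise<Uint8Array> {
  // A Node.js Buffer is a Uint8Array subclass, so it is already raw bytes.
  if (input instanceof Uint8Array) return input;
  // A Blob only exposes its contents asynchronously.
  return new Uint8Array(await input.arrayBuffer());
}
```

In a browser, a File taken from an `<input type="file">` element is itself a Blob, so it can be handed to a Blob-accepting function such as runTranscriptionWithBlob without conversion.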

🗂️ Type Definitions

export type TranscriptionStyle =
	| "accurate"
	| "clean"
	| "structured"
	| "technical"
	| "conversational";

export interface TranscriptionConfig {
	audioFile: string;
	style?: TranscriptionStyle;
	language?: string | null;
	verbose?: boolean;
	timeout?: number;
}

export interface RunTranscriptionResult {
	success: boolean;
	transcription?: string;
	error?: string;
}

🔐 Authentication

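This package requires a Gemini API key, which it reads from the TRANSCRIBER_KEY environment variable. Provide it in either of these ways:

1️⃣ Set TRANSCRIBER_KEY in your shell environment: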

export TRANSCRIBER_KEY=your-gemini-api-key-here

2️⃣ Or add it to a .env file:

TRANSCRIBER_KEY=your-gemini-api-key-here

Get your API key from Google AI Studio (formerly MakerSuite).
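A missing key is easier to diagnose at startup than from the first failed API call. The sketch below checks for TRANSCRIBER_KEY up front; requireTranscriberKey is a hypothetical helper, and it takes the environment as a parameter so it can be tested without touching process.env.

```typescript
// Hypothetical startup check: returns the trimmed key or throws a clear error.
function requireTranscriberKey(
  env: Record<string, string | undefined>,
): string {
  const key = env.TRANSCRIBER_KEY?.trim();
  if (!key) {
    throw new Error(
      "TRANSCRIBER_KEY is not set; export it in your shell or add it to a .env file",
    );
  }
  return key;
}
```

In a Node.js app this would be called as `requireTranscriberKey(process.env)`. This README does not say whether the package loads .env on its own; if it does not, a loader such as dotenv (`import "dotenv/config";` at the top of the entry file) can populate process.env before the check runs.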

🙋 FAQ

Q: Does this upload my file to third-party storage?

A: No. Files are uploaded only to Gemini's File API endpoint.

Q: Can I use this in the browser?

A: Yes, for in-memory audio: runTranscriptionWithBlob accepts a browser Blob as well as a Node.js Buffer.

Q: Which model is used?

A: The gemini-2.0-flash model, via the Google GenAI SDK.

Summary

✅ Lightweight
✅ Flexible API
✅ Multiple transcription styles
✅ Works with local files, URLs, and Blobs/Buffers
✅ Production-ready TypeScript types