transcribio
v1.0.3
AI-powered audio transcription using Gemini - CLI & Web UI
🎙️ Transcribio
AI-powered audio transcription using Google's Gemini API. Runs locally on your machine with a beautiful CLI and web interface.
Features
- 🎯 High Accuracy - Powered by Gemini 2.0 Flash/Pro
- 🗣️ Speaker Detection - Identifies different speakers
- ⏱️ Timestamps - Navigation-friendly time markers
- 🌍 50+ Languages - Auto-detection or manual selection
- 📤 Multiple Exports - TXT, SRT, VTT, JSON
- 💻 CLI & Web UI - Use your preferred interface
- 🔒 Privacy First - Runs locally, audio goes directly to Gemini
- 💸 Free - Uses Gemini's generous free tier
Installation
npm install -g transcribio
Quick Start
1. Get API Key
Get your free Gemini API key at Google AI Studio
2. Configure
transcribio config --set-key
3. Transcribe
# Using CLI
transcribio audio.mp3
# Or launch Web UI
transcribio ui
CLI Usage
Basic Transcription
# Simple transcription
transcribio interview.mp3
# With specific options
transcribio podcast.wav --speakers --timestamps --output srt
# Save to file
transcribio meeting.m4a -f transcript.txt
# Different model
transcribio audio.mp3 --model pro
Options
| Option | Description | Default |
| ----------------------- | ------------------------------ | ------- |
| -s, --speakers | Enable speaker detection | true |
| -t, --timestamps | Include timestamps | true |
| -l, --language <code> | Audio language (or 'auto') | auto |
| -o, --output <format> | Format: txt, srt, vtt, json | txt |
| -f, --file <path> | Save output to file | - |
| --model <name> | flash (fast) or pro (accurate) | flash |
| --translate <lang> | Translate to language | - |
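The defaults in the table can be pictured as a plain object that user-supplied options are merged over. The following is a minimal sketch under that assumption; `DEFAULTS`, `OUTPUT_FORMATS`, `MODELS`, and `resolveOptions` are hypothetical names for illustration and are not taken from Transcribio's source.

```javascript
// Hypothetical defaults mirroring the options table; not Transcribio's actual code.
const DEFAULTS = {
  speakers: true,
  timestamps: true,
  language: "auto",
  output: "txt",
  model: "flash",
};

// Allowed values as listed in the table.
const OUTPUT_FORMATS = ["txt", "srt", "vtt", "json"];
const MODELS = ["flash", "pro"];

// Merge user-supplied options over the defaults and reject unknown values.
function resolveOptions(userOptions = {}) {
  const options = { ...DEFAULTS, ...userOptions };
  if (!OUTPUT_FORMATS.includes(options.output)) {
    throw new Error(`Unsupported output format: ${options.output}`);
  }
  if (!MODELS.includes(options.model)) {
    throw new Error(`Unknown model: ${options.model}`);
  }
  return options;
}

console.log(resolveOptions({ output: "srt", model: "pro" }));
```

Options that are not passed keep the defaults from the table, so `resolveOptions({ output: "srt" })` would still use the `flash` model with speakers and timestamps enabled.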
Configuration Commands
# Set API key interactively
transcribio config --set-key
# Show current configuration
transcribio config --show
# Reset all settings
transcribio config --reset
Web Interface
Launch the web UI for a more visual experience:
transcribio ui
This opens a browser at http://localhost:3456 with:
- Drag & drop file upload
- Real-time progress
- Multiple export formats
- Beautiful formatted output
Custom port:
transcribio ui --port 8080
Supported Formats
Input Audio
- MP3 (.mp3)
- WAV (.wav)
- M4A (.m4a)
- OGG (.ogg)
- FLAC (.flac)
- AAC (.aac)
- WebM (.webm)
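When scripting around the CLI, it can help to check a file's extension before sending it. This is a small sketch only: the extension list comes from the formats above, while `SUPPORTED_EXTENSIONS` and `isSupportedAudio` are hypothetical names, and Transcribio's own validation may work differently.

```javascript
// Extensions taken from the list of supported input formats above.
const SUPPORTED_EXTENSIONS = new Set([
  ".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm",
]);

// Hypothetical helper: true when the file name ends in a supported extension.
function isSupportedAudio(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1) return false; // no extension at all
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

console.log(isSupportedAudio("interview.mp3"));
console.log(isSupportedAudio("clip.mp4"));
```

The comparison is case-insensitive, so `Interview.MP3` is accepted as well. Files that fail the check can be converted with ffmpeg, as described under Troubleshooting.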
Output Formats
TXT (Plain Text)
[00:00] Speaker 1: Hello, welcome to the podcast.
[00:05] Speaker 2: Thanks for having me!
SRT (SubRip Subtitle)
1
00:00:00,000 --> 00:00:05,000
[Speaker 1] Hello, welcome to the podcast.

2
00:00:05,000 --> 00:00:08,000
[Speaker 2] Thanks for having me!
VTT (WebVTT)
WEBVTT

1
00:00:00.000 --> 00:00:05.000
<v Speaker 1>Hello, welcome to the podcast.

2
00:00:05.000 --> 00:00:08.000
<v Speaker 2>Thanks for having me!
JSON
{
  "success": true,
  "language": "English",
  "languageCode": "en",
  "duration": "05:30",
  "segments": [
    {
      "timestamp": "00:00",
      "speaker": "Speaker 1",
      "text": "Hello, welcome to the podcast."
    }
  ],
  "fullText": "Complete transcript...",
  "summary": "Brief summary of the content"
}
Examples
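Convert Saved JSON to SRT
If a transcript was saved as JSON and subtitles are needed later, the segments can be turned into SRT cues without transcribing again. The sketch below assumes the JSON shape shown under Output Formats (`duration`, and `segments` with `timestamp`, `speaker`, `text`). It also assumes that each cue ends where the next one starts and that the last cue ends at `duration`; how Transcribio itself derives end times is not documented here. All function names are hypothetical.

```javascript
// Parse "MM:SS" or "HH:MM:SS" into whole seconds.
function toSeconds(stamp) {
  return stamp
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Format whole seconds as an SRT timestamp (HH:MM:SS,mmm).
function toSrtTime(totalSeconds) {
  const pad = (n) => String(n).padStart(2, "0");
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},000`;
}

// Assumption: a cue ends where the next one starts; the last cue ends at `duration`.
function segmentsToSrt(result) {
  const end = toSeconds(result.duration);
  return result.segments
    .map((segment, i) => {
      const start = toSeconds(segment.timestamp);
      const next = result.segments[i + 1];
      const stop = next ? toSeconds(next.timestamp) : end;
      return `${i + 1}\n${toSrtTime(start)} --> ${toSrtTime(stop)}\n[${segment.speaker}] ${segment.text}`;
    })
    .join("\n\n");
}

// Sample data shaped like the JSON output format.
const sample = {
  duration: "00:08",
  segments: [
    { timestamp: "00:00", speaker: "Speaker 1", text: "Hello, welcome to the podcast." },
    { timestamp: "00:05", speaker: "Speaker 2", text: "Thanks for having me!" },
  ],
};

console.log(segmentsToSrt(sample));
```

For this sample, the two printed cues match the SRT example under Output Formats. Because the JSON timestamps carry no milliseconds, every cue boundary falls on a whole second.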
Transcribe Interview
transcribio interview.mp3 --speakers --timestamps -f interview.txt
Create Subtitles
transcribio video-audio.wav --output srt -f subtitles.srt
Translate Content
transcribio spanish-audio.mp3 --translate english
High Accuracy Mode
transcribio important-meeting.m4a --model pro --output json -f meeting.json
API Usage (Programmatic)
Use Transcribio in your Node.js projects:
import { GeminiService, exportTranscript } from "transcribio";
// Initialize with API key
const gemini = new GeminiService("your-api-key");
// Transcribe audio
const result = await gemini.transcribe("audio.mp3", {
  speakers: true,
  timestamps: true,
  language: "auto",
  model: "flash",
});
// Export to different formats
const txt = exportTranscript(result, "txt");
const srt = exportTranscript(result, "srt");
const vtt = exportTranscript(result, "vtt");
const json = exportTranscript(result, "json");
console.log(result);
Free Tier Limits
Gemini's free tier is generous:
| Model | Daily Requests | Speed | Accuracy |
| ----- | -------------- | ------ | --------- |
| Flash | ~1,000/day | Fast | Good |
| Pro | ~50/day | Slower | Excellent |
Perfect for personal use, podcasts, interviews, and more!
Privacy & Security
- 🏠 Runs locally - No data stored on external servers
- 🔑 API key encrypted - Stored securely on your machine
- 🔒 Direct to Gemini - Audio sent only to Google's Gemini API
- 📝 No tracking - Zero analytics or telemetry
Troubleshooting
API Key Issues
# Check if key is configured
transcribio config --show
# Reset and reconfigure
transcribio config --reset
transcribio config --set-key
File Size Issues
- Files under 20MB: sent inline (faster)
- Files over 20MB: uploaded via the File API (slower, but handles larger files)
- Maximum file size: 100MB
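The size rules above amount to a three-way decision. Here is a minimal sketch of that decision, assuming "MB" means 1024 * 1024 bytes and that a file of exactly 20MB is sent inline; neither detail is stated above, and `chooseUploadStrategy` is a hypothetical name rather than part of Transcribio's API.

```javascript
const MB = 1024 * 1024; // assumption: 1MB = 1024 * 1024 bytes
const INLINE_LIMIT = 20 * MB;
const MAX_SIZE = 100 * MB;

// Hypothetical helper: decide how a file of `sizeInBytes` would be sent.
function chooseUploadStrategy(sizeInBytes) {
  if (sizeInBytes > MAX_SIZE) {
    throw new RangeError("File exceeds the 100MB maximum");
  }
  // Assumption: a file of exactly 20MB still counts as inline.
  return sizeInBytes <= INLINE_LIMIT ? "inline" : "file-api";
}

console.log(chooseUploadStrategy(5 * MB));
console.log(chooseUploadStrategy(50 * MB));
```

A file above the maximum would need to be split or re-encoded at a lower bitrate before transcription.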
Unsupported Format
Convert your audio file to a supported format:
# Using ffmpeg
ffmpeg -i input.mp4 -vn -acodec libmp3lame output.mp3
Development
Clone & Install
git clone https://github.com/junaidh-junu/transcribio.git
cd transcribio
npm install
Run Locally
# CLI
node bin/transcribio.js audio.mp3
# Web UI
node bin/transcribio.js ui
Run Tests
npm test
Lint Code
npm run lint
Project Structure
transcribio/
├── bin/
│   └── transcribio.js    # CLI entry point
├── src/
│   ├── cli/              # CLI implementation
│   ├── core/             # Gemini service
│   ├── exporters/        # Format exporters
│   ├── web/              # Web UI & server
│   └── config/           # Configuration management
├── tests/                # Test files
└── package.json
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Roadmap
v2.0.0
- [ ] Batch processing (transcribe multiple files)
- [ ] YouTube URL support
- [ ] Word-level timestamps
- [ ] Custom vocabulary support
v3.0.0
- [ ] Real-time transcription
- [ ] Desktop app (Electron)
- [ ] Offline mode with local Whisper
License
MIT © Junaidh Haneefa
Acknowledgments
- Built with Google Gemini AI
- Inspired by the need for free, privacy-focused transcription tools
Made with ❤️ by developers, for developers
