transcribio

v1.0.3

AI-powered audio transcription using Gemini - CLI & Web UI

🎙️ Transcribio

AI-powered audio transcription using Google's Gemini API. Runs locally on your machine with a beautiful CLI and web interface.

Features

  • 🎯 High Accuracy - Powered by Gemini 2.0 Flash/Pro
  • 🗣️ Speaker Detection - Identifies different speakers
  • ⏱️ Timestamps - Navigation-friendly time markers
  • 🌍 50+ Languages - Auto-detection or manual selection
  • 📤 Multiple Exports - TXT, SRT, VTT, JSON
  • 💻 CLI & Web UI - Use your preferred interface
  • 🔒 Privacy First - Runs locally, audio goes directly to Gemini
  • 💸 Free - Uses Gemini's generous free tier

Installation

npm install -g transcribio

Quick Start

1. Get API Key

Get a free Gemini API key from Google AI Studio.

2. Configure

transcribio config --set-key

3. Transcribe

# Using CLI
transcribio audio.mp3

# Or launch Web UI
transcribio ui

CLI Usage

Basic Transcription

# Simple transcription
transcribio interview.mp3

# With specific options
transcribio podcast.wav --speakers --timestamps --output srt

# Save to file
transcribio meeting.m4a -f transcript.txt

# Different model
transcribio audio.mp3 --model pro

Options

| Option                  | Description                    | Default |
| ----------------------- | ------------------------------ | ------- |
| -s, --speakers          | Enable speaker detection       | true    |
| -t, --timestamps        | Include timestamps             | true    |
| -l, --language <code>   | Audio language (or 'auto')     | auto    |
| -o, --output <format>   | Format: txt, srt, vtt, json    | txt     |
| -f, --file <path>       | Save output to file            | -       |
| --model <name>          | flash (fast) or pro (accurate) | flash   |
| --translate <lang>      | Translate to language          | -       |

Configuration Commands

# Set API key interactively
transcribio config --set-key

# Show current configuration
transcribio config --show

# Reset all settings
transcribio config --reset

Web Interface

Launch the web UI for a more visual experience:

transcribio ui

This opens your browser at http://localhost:3456, where the web UI offers:

  • Drag & drop file upload
  • Real-time progress
  • Multiple export formats
  • Beautiful formatted output

Custom port:

transcribio ui --port 8080

Supported Formats

Input Audio

  • MP3 (.mp3)
  • WAV (.wav)
  • M4A (.m4a)
  • OGG (.ogg)
  • FLAC (.flac)
  • AAC (.aac)
  • WebM (.webm)
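
To check a file against this list before starting a transcription, a small guard can help. The sketch below is not part of Transcribio: `isSupportedAudio` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the check looks only at the file extension, not at the actual audio encoding.

```javascript
// Extensions copied from the "Input Audio" list above.
const SUPPORTED_EXTENSIONS = [".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"];

// Hypothetical helper (not part of transcribio): returns true when the
// file name ends with one of the supported extensions, ignoring case.
function isSupportedAudio(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a name that starts with "."
  const extension = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(extension);
}
```

For example, `isSupportedAudio("interview.MP3")` returns `true`, while `isSupportedAudio("clip.mp4")` returns `false`, so that file would need converting first.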

Output Formats

TXT (Plain Text)

[00:00] Speaker 1: Hello, welcome to the podcast.
[00:05] Speaker 2: Thanks for having me!

SRT (SubRip Subtitle)

1
00:00:00,000 --> 00:00:05,000
[Speaker 1] Hello, welcome to the podcast.

2
00:00:05,000 --> 00:00:08,000
[Speaker 2] Thanks for having me!

VTT (WebVTT)

WEBVTT

1
00:00:00.000 --> 00:00:05.000
<v Speaker 1>Hello, welcome to the podcast.

2
00:00:05.000 --> 00:00:08.000
<v Speaker 2>Thanks for having me!

JSON

{
  "success": true,
  "language": "English",
  "languageCode": "en",
  "duration": "05:30",
  "segments": [
    {
      "timestamp": "00:00",
      "speaker": "Speaker 1",
      "text": "Hello, welcome to the podcast."
    }
  ],
  "fullText": "Complete transcript...",
  "summary": "Brief summary of the content"
}
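
The JSON output can also be post-processed in your own scripts. As a rough sketch, the code below turns the `segments` array into SRT cues shaped like the SRT example above. It is an illustration, not the package's own exporter: `toSeconds`, `toSrtTime` and `segmentsToSrt` are hypothetical helpers, and because each segment carries only a start timestamp, the sketch assumes a cue ends where the next one starts and the last cue ends at `duration`.

```javascript
// Parse "MM:SS" or "HH:MM:SS" into whole seconds.
function toSeconds(stamp) {
  return stamp
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Format whole seconds as an SRT timestamp (HH:MM:SS,mmm).
// The JSON timestamps have no sub-second part, so milliseconds are always 000.
function toSrtTime(seconds) {
  const pad = (n) => String(n).padStart(2, "0");
  const hours = Math.floor(seconds / 3600);
  const minutes = Math.floor((seconds % 3600) / 60);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds % 60)},000`;
}

// Hypothetical converter: one SRT cue per segment.
// Assumption: a cue ends when the next segment starts; the last cue ends at `duration`.
function segmentsToSrt(result) {
  const total = toSeconds(result.duration);
  return result.segments
    .map((segment, index) => {
      const start = toSeconds(segment.timestamp);
      const next = result.segments[index + 1];
      const end = next ? toSeconds(next.timestamp) : total;
      return [
        String(index + 1),
        `${toSrtTime(start)} --> ${toSrtTime(end)}`,
        `[${segment.speaker}] ${segment.text}`,
      ].join("\n");
    })
    .join("\n\n");
}
```

Calling `segmentsToSrt(result)` on a parsed JSON transcript returns a single string that can be written to a `.srt` file.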

Examples

Transcribe Interview

transcribio interview.mp3 --speakers --timestamps -f interview.txt

Create Subtitles

transcribio video-audio.wav --output srt -f subtitles.srt

Translate Content

transcribio spanish-audio.mp3 --translate english

High Accuracy Mode

transcribio important-meeting.m4a --model pro --output json -f meeting.json

API Usage (Programmatic)

Use Transcribio in your Node.js projects:

import { GeminiService, exportTranscript } from "transcribio";

// Initialize with API key
const gemini = new GeminiService("your-api-key");

// Transcribe audio
const result = await gemini.transcribe("audio.mp3", {
  speakers: true,
  timestamps: true,
  language: "auto",
  model: "flash",
});

// Export to different formats
const txt = exportTranscript(result, "txt");
const srt = exportTranscript(result, "srt");
const vtt = exportTranscript(result, "vtt");
const json = exportTranscript(result, "json");

console.log(result);

Free Tier Limits

Gemini's free tier is generous:

| Model | Daily Requests | Speed  | Accuracy  |
| ----- | -------------- | ------ | --------- |
| Flash | ~1,000/day     | Fast   | Good      |
| Pro   | ~50/day        | Slower | Excellent |

Perfect for personal use, podcasts, interviews, and more!

Privacy & Security

  • 🏠 Runs locally - No data stored on external servers
  • 🔑 API key encrypted - Stored securely on your machine
  • 🔒 Direct to Gemini - Audio sent only to Google's Gemini API
  • 📝 No tracking - Zero analytics or telemetry

Troubleshooting

API Key Issues

# Check if key is configured
transcribio config --show

# Reset and reconfigure
transcribio config --reset
transcribio config --set-key

File Size Issues

  • Files under 20MB: Sent inline with the request (faster)
  • Files over 20MB: Uploaded through the File API (slower, but handles larger files)
  • Maximum: 100MB
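
The size rules above amount to a three-way decision, which the following sketch spells out. It is an illustration only, not Transcribio's actual code: `chooseUploadStrategy` and the returned labels are hypothetical, MB is assumed to mean 1024 × 1024 bytes, and a file of exactly 20MB (which the list does not cover) is assumed to go through the File API.

```javascript
// Assumption: 1 MB = 1024 * 1024 bytes; the README does not say which definition it uses.
const MB = 1024 * 1024;
const INLINE_LIMIT = 20 * MB; // below this size, audio is sent inline
const MAX_SIZE = 100 * MB; // above this size, the file is rejected

// Hypothetical helper: map a file size in bytes to one of three labels.
function chooseUploadStrategy(sizeInBytes) {
  if (sizeInBytes > MAX_SIZE) return "too-large";
  if (sizeInBytes < INLINE_LIMIT) return "inline";
  return "file-api"; // 20MB up to and including 100MB
}
```

A check like this could run before a batch job to flag files that need splitting or compressing first.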

Unsupported Format

Convert your audio file to a supported format:

# Using ffmpeg
ffmpeg -i input.mp4 -vn -acodec libmp3lame output.mp3

Development

Clone & Install

git clone https://github.com/junaidh-junu/transcribio.git
cd transcribio
npm install

Run Locally

# CLI
node bin/transcribio.js audio.mp3

# Web UI
node bin/transcribio.js ui

Run Tests

npm test

Lint Code

npm run lint

Project Structure

transcribio/
├── bin/
│   └── transcribio.js           # CLI entry point
├── src/
│   ├── cli/                     # CLI implementation
│   ├── core/                    # Gemini service
│   ├── exporters/               # Format exporters
│   ├── web/                     # Web UI & server
│   └── config/                  # Configuration management
├── tests/                       # Test files
└── package.json

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Roadmap

v2.0.0

  • [ ] Batch processing (transcribe multiple files)
  • [ ] YouTube URL support
  • [ ] Word-level timestamps
  • [ ] Custom vocabulary support

v3.0.0

  • [ ] Real-time transcription
  • [ ] Desktop app (Electron)
  • [ ] Offline mode with local Whisper

License

MIT © Junaidh Haneefa

Acknowledgments

  • Built with Google Gemini AI
  • Inspired by the need for free, privacy-focused transcription tools

Made with ❤️ by developers, for developers