npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

@adamhancock/transcribe-cli

v1.0.4

Published

CLI tool for transcribing and summarizing MP4 recordings using Whisper and Ollama

Downloads

15

Readme

Transcribe CLI

A powerful TypeScript CLI tool for transcribing and summarizing MP4 recordings or VTT subtitle files using local models with synchronized audio and visual analysis.

Key Features

  • 🎙️ Timestamped Transcription: Full audio transcription with precise timestamps using Whisper
  • 📝 VTT File Support: Direct summarization of WebVTT subtitle files without transcription
  • 🎬 Flexible Input: Accepts MP4 videos, VTT subtitles, or both for combined analysis
  • 🧠 Intelligent Frame Selection: AI analyzes transcript to extract frames at meaningful moments with reasoning
  • 🖼️ Context-Aware Frame Analysis: Each frame is analyzed with awareness of previous frames
  • 🔄 Synchronized Analysis: Matches frame descriptions with relevant transcript segments
  • 📊 Map-Reduce Summarization: Uses map-reduce strategy for detailed, comprehensive summaries
  • 🎯 Smart Model Selection: Uses llava for visual analysis and llama3.2 for text summarization
  • Optimized Processing: Parallel audio/frame extraction, sequential frame analysis for context

Prerequisites

  • Node.js 18+
  • FFmpeg installed
    • macOS: brew install ffmpeg
    • Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
    • Windows: Download from ffmpeg.org
  • Python & pip (for Whisper)
    • macOS: brew install python (includes pip)
    • Ubuntu/Debian: sudo apt install python3-pip
    • Windows: Download from python.org (includes pip)
  • OpenAI Whisper - runs locally for transcription
    • Install: pip install openai-whisper
    • Note: First run may download model files (~140MB for base model)
  • Ollama - for summarization and frame analysis
    • Install: curl -fsSL https://ollama.com/install.sh | sh
    • Runs automatically as a background service (port 11434)
    • Default models (automatically pulled on first use):
      • llava - for visual frame analysis (multimodal)
      • llama3.2 - for text summarization (map-reduce strategy)

How it works

  1. Flexible Input Processing:

    • MP4 Files: Extracts audio, transcribes with Whisper, analyzes video frames
    • VTT Files: Parses WebVTT subtitles directly for immediate summarization
    • Combined: Use VTT for transcript + MP4 for visual frame analysis
  2. Audio Extraction & Transcription (for MP4):

    • Extracts audio and transcribes with Whisper
    • Produces timestamped segments (e.g., [1:30] Speaker says...)
    • Captures precise timing for synchronization with frames
  3. Intelligent Frame Selection (when video available):

    • AI analyzes transcript content to identify key moments
    • Selects frames at meaningful timestamps with reasoning
    • Shows why each moment was selected (e.g., "Introduction", "Key demonstration")
    • Validates timestamps against actual video duration
    • Falls back to content-based distribution if AI fails
  4. Context-Aware Frame Analysis:

    • Uses llava model for multimodal visual understanding
    • Analyzes frames sequentially to maintain narrative flow
    • Each frame analysis includes:
      • Visual description from the frame
      • Context from previous frame
      • Relevant transcript excerpt from that moment
  5. Map-Reduce Summarization:

    • Map Phase: Breaks transcript into chunks, extracts all details from each
    • Reduce Phase: Synthesizes comprehensive summary with multiple sections
    • Creates extremely detailed summaries including:
      • Meeting Overview: Purpose, participants, context
      • Topic Breakdown: Detailed analysis of each topic discussed
      • All Decisions: Every decision made with context
      • Action Items: Who, what, when for each task
      • Tools & Systems: All platforms and tools mentioned
      • Technical Details: Specifications and configurations
      • Problems & Solutions: Issues raised and resolutions
      • Next Steps: Future plans and follow-ups

Installation

Using npx (no installation required)

# Transcribe MP4 video
npx @adamhancock/transcribe-cli video.mp4

# Summarize VTT subtitle file
npx @adamhancock/transcribe-cli subtitles.vtt

# Combine VTT transcript with MP4 video analysis
npx @adamhancock/transcribe-cli subtitles.vtt video.mp4

Global installation

npm install -g @adamhancock/transcribe-cli

From source

git clone https://github.com/adamhancock/transcribe-cli.git
cd transcribe-cli
npm install
npm run build
npm link  # Makes 'transcribe' command available globally

Usage

Basic usage:

# Transcribe and summarize MP4 video
transcribe video.mp4

# Summarize VTT subtitle file
transcribe meeting.vtt

# Combine VTT transcript with MP4 video for frame analysis
transcribe meeting.vtt video.mp4

Options:

  • -o, --output <path> - Output file path (default: input_transcript.txt)
  • -a, --audio-only - Only extract audio without transcription
  • -t, --transcribe-only - Only transcribe without summarization
  • --host <host> - Ollama API host (default: http://localhost:11434)
  • --model <model> - Ollama model name for both frame and text analysis (overrides defaults)
  • --whisper-model <model> - Whisper model size (default: base)
  • --keep-audio - Keep extracted audio file after processing
  • --no-analyze-frames - Disable frame extraction and analysis (enabled by default)
  • --frame-interval <seconds> - Seconds between frame extraction (default: 3)
  • --max-frames <count> - Maximum number of frames to extract (default: 30)
  • --keep-frames - Keep extracted frames after processing
  • --save-timestamps - Save timestamped transcript and frame data to JSON file
  • --plain-transcript - Save transcript without timestamps in output file

Examples

# Basic MP4 transcription with intelligent frame analysis
transcribe recording.mp4

# Summarize a VTT subtitle file
transcribe meeting.vtt

# Combine VTT subtitles with video for visual analysis
transcribe meeting.vtt recording.mp4

# Quick transcription without analysis
transcribe recording.mp4 --transcribe-only

# Extract more frames for detailed videos
transcribe recording.mp4 --max-frames 50

# Use a faster Whisper model for quick results
transcribe recording.mp4 --whisper-model tiny

# Use a larger Whisper model for better accuracy
transcribe recording.mp4 --whisper-model large

# Save all artifacts for further processing
transcribe recording.mp4 --keep-audio --keep-frames --save-timestamps

# Get plain transcript without timestamps
transcribe recording.mp4 --plain-transcript

# Use different models for better summaries
transcribe meeting.vtt --model llama3.1  # More powerful model

# Process MP4 without frame analysis (audio only)
transcribe recording.mp4 --no-analyze-frames

# Custom output location
transcribe meeting.vtt -o ~/Documents/meeting-summary.txt

Performance Tips

  • Faster processing: Use --whisper-model tiny for quick drafts
  • Better accuracy: Use --whisper-model medium or large for important content
  • Reduce frames: Lower --max-frames for faster analysis of long videos
  • Vision models: llava is default; try bakllava or llava:13b for better accuracy

Output Format

The tool provides multiple outputs:

Console Output

  • Progress indicators for each processing stage
  • Frame extraction with reasoning (e.g., ✓ Frame 1 at 0:00 - Introduction)
  • Real-time analysis progress
  • Structured final summary

File Outputs

  1. Main transcript file (default: video_transcript.txt)

    • Timestamped transcript (e.g., [1:30] Speaker says...)
    • Comprehensive structured summary with sections
    • Use --plain-transcript for transcript without timestamps
  2. Timestamped data file (optional: video_timestamped.json)

    • Complete transcript with precise timestamps
    • Frame analyses with timestamps and descriptions
    • Frame extraction reasoning
    • Useful for creating subtitles, chapters, or navigating content

Example Output

Transcript of presentation.mp4:

[0:00] Welcome everyone to today's presentation on machine learning.
[0:05] I'll be covering three main topics today...
[0:12] First, let's understand what neural networks are...

---

Summary:

## Overview
This video presents an introduction to machine learning concepts...

## Main Topics Covered
1. **Neural Networks Basics**: The speaker explains...
2. **Training Process**: Demonstrates how models learn...
3. **Practical Applications**: Shows real-world examples...

## Key Points & Insights
- Neural networks mimic human brain structure
- Training requires large datasets and computational power
- Applications range from image recognition to natural language

## Visual Elements
- Slide presentations with diagrams
- Live coding demonstration
- Results visualization graphs

## Action Items or Next Steps
- Practice with the provided code examples
- Explore the recommended datasets
- Join the community forum for questions

Development

Run directly with tsx:

npm run dev -- video.mp4

Build:

npm run build

Publish to npm:

npm version patch  # or minor/major
npm publish