@krasnoperov/transcribe
v1.0.5
CLI tool for audio/video transcription with speaker diarization, AI summarization, and infographic generation
Transcribe
AI transcription skill for Claude Code - Transform audio/video recordings into transcripts with speaker diarization, AI-powered summaries, and visual infographics.
This skill provides a complete pipeline for processing recordings:
- Transcription - Convert audio/video to VTT format with speaker identification (OpenAI Whisper)
- Summarization - Generate structured markdown summaries (OpenAI GPT-5.1)
- Infographics - Create visual summaries from text (Google Gemini)
- All-in-one - Process video → transcript → summary → infographic in one command
See skills/transcribe/SKILL.md for complete usage guide.
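For readers who want to post-process the transcripts, speaker labels in the VTT output appear as WebVTT voice tags (for example `<v A>`), as the sample output later in this README shows. The following is a minimal sketch of reading such a file into structured cues. It assumes one timing line per cue followed by its text; `Cue`, `parseVtt`, and `textBySpeaker` are hypothetical names and not part of the package.

```typescript
// Hypothetical shape for one cue; not a type exported by the package.
interface Cue {
  start: string;
  end: string;
  speaker: string | null;
  text: string;
}

// Timestamps may omit the hours field, so it is optional here.
const TIMING =
  /^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})/;
// A voice tag at the start of a payload line, e.g. "<v A>".
const VOICE = /^<v\s+([^>]+)>/;

function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  let current: Cue | null = null;

  for (const raw of vtt.split(/\r?\n/)) {
    const line = raw.trim();

    const timing = TIMING.exec(line);
    if (timing) {
      // A timing line always starts a new cue.
      current = { start: timing[1] ?? "", end: timing[2] ?? "", speaker: null, text: "" };
      cues.push(current);
      continue;
    }
    if (line === "") {
      // A blank line ends the current cue.
      current = null;
      continue;
    }
    if (!current) continue; // header line or cue identifier

    const voice = VOICE.exec(line);
    if (voice) current.speaker = (voice[1] ?? "").trim();

    // Drop any tags and keep the spoken text.
    const text = line.replace(/<[^>]+>/g, "").trim();
    current.text = current.text ? `${current.text} ${text}` : text;
  }
  return cues;
}

// Collect the text of every cue under its speaker label.
function textBySpeaker(cues: Cue[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const cue of cues) {
    const key = cue.speaker ?? "unknown";
    const lines = grouped.get(key) ?? [];
    lines.push(cue.text);
    grouped.set(key, lines);
  }
  return grouped;
}
```

Calling `textBySpeaker(parseVtt(contents))` on a transcript's contents gives a map from speaker label to that speaker's lines, which can help when checking diarization by hand.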
Use in Claude Code
This is a Claude Code skill. Install it from the marketplace:
/plugin marketplace add krasnoperov/claude-plugins
/plugin install transcribe@krasnoperov-plugins
Once installed, use the /transcribe skill in your conversations:
/transcribe transcribe meeting.mp4 to VTT with speaker diarization
/transcribe summarize this transcript into key points
/transcribe create an infographic from this summary
Command Line Usage
You can also use this package directly via npx:
export OPENAI_API_KEY="your-openai-key"
export GOOGLE_AI_STUDIO_KEY="your-google-key"
# Transcribe audio/video
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 -o transcript.vtt
# Generate summary
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt -o summary.md
# Create infographic
npx -y @krasnoperov/transcribe@latest infographic summary.md -o visual.png
# All-in-one pipeline
npx -y @krasnoperov/transcribe@latest process recording.mp4 --output-dir ./output
Get your API keys from OpenAI (OPENAI_API_KEY) and Google AI Studio (GOOGLE_AI_STUDIO_KEY).
Core Operations
transcribe <input>    Audio/Video → VTT transcript with speakers
summarize <input>     Text/VTT → Markdown summary
infographic <input>   Text → Visual infographic image
process <input>       All-in-one: video → transcript → summary → infographic
These operations can be used individually or chained together.
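Chaining amounts to passing each step's output file as the next step's input. A minimal sketch in TypeScript, assuming only the subcommands and the `-o` flag shown above; `buildPipeline`, the `Step` type, and the derived file names are hypothetical and not part of the package:

```typescript
// Hypothetical helper: it only builds the command lines, it does not run them.
type Step = { name: string; argv: string[]; output: string };

const NPX_PREFIX = ["npx", "-y", "@krasnoperov/transcribe@latest"];

function buildPipeline(input: string, outDir: string): Step[] {
  // Derive a base name, e.g. "recordings/meeting.mp4" -> "meeting".
  const file = input.split("/").pop() ?? input;
  const dot = file.lastIndexOf(".");
  const base = dot > 0 ? file.slice(0, dot) : file;
  const dir = outDir.replace(/\/+$/, ""); // drop trailing slashes

  // Output names are an illustrative choice, not a package convention.
  const transcript = `${dir}/${base}.vtt`;
  const summary = `${dir}/${base}.md`;
  const image = `${dir}/${base}.png`;

  return [
    {
      name: "transcribe",
      argv: [...NPX_PREFIX, "transcribe", input, "-o", transcript],
      output: transcript,
    },
    {
      // The transcript written by the previous step is the input here.
      name: "summarize",
      argv: [...NPX_PREFIX, "summarize", transcript, "-o", summary],
      output: summary,
    },
    {
      // The summary written by the previous step is the input here.
      name: "infographic",
      argv: [...NPX_PREFIX, "infographic", summary, "-o", image],
      output: image,
    },
  ];
}
```

Each `argv` can be handed to a process runner such as Node's `child_process.spawnSync`, stopping at the first step that fails. The `process` subcommand covers the same sequence in a single call.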
Examples
See skills/transcribe/examples/ directory:
- 01-basic-workflow.sh - Step-by-step transcription pipeline
- 02-all-in-one.sh - Single command processing
Transcription with Speaker Diarization
npx -y @krasnoperov/transcribe@latest transcribe podcast.mp3 \
--language es \
--model gpt-4o-transcribe-diarize \
-o podcast.vtt
Output (VTT with speaker tags):
WEBVTT

00:00:00.000 --> 00:00:02.450
<v A>Welcome to the podcast...

00:00:02.850 --> 00:00:08.200
<v B>Thanks for having me...

Custom Summarization
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt \
--prompt "Focus on action items and decisions" \
-o summary.md
Styled Infographic
npx -y @krasnoperov/transcribe@latest infographic summary.md \
--style "modern minimal corporate" \
-o infographic.png
Options
Transcribe
--model <model>       gpt-4o-transcribe-diarize (default), gpt-4o-transcribe, whisper-1
--language <lang>     Language code (en, es, ru, de, etc.)
-o, --output <file>   Output VTT file
Summarize
--prompt <text>       Custom summarization instructions
-o, --output <file>   Output markdown file
Infographic
--style <text>        Style instructions for visual
--reference <image>   Reference image for style
-o, --output <file>   Output image file
Process (All-in-one)
--output-dir <dir>    Output directory for all files
--language <lang>     Language for transcription
--model <model>       Transcription model
--style <text>        Style for infographic
Requirements
- Node.js >= 18.0.0
- ffmpeg (for audio extraction)
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
Development
npm run build # Build TypeScript
npm run typecheck # Type checking
npm run test # Run tests
npm run dev # Dev mode with type stripping
License
MIT License - Copyright (c) 2025 Aleksei Krasnoperov
See LICENSE file for details.
