@barbapapazes/video-toolkit
v0.13.0
Published
A simple and efficient CLI toolkit for video automation. It automates the process of extracting audio, generating transcriptions using OpenAI Whisper, and creating video thumbnails with text overlays.
Features
- Audio Extraction: Automatically extracts audio from video files using ffmpeg.
- AI Transcription: Generates high-quality SRT subtitles using OpenAI's whisper-1 model (defaulted to French).
- Thumbnail Generation:
  - Extracts 5 frames from the video (Start, 25%, 50%, 75%, and End).
  - Overlays custom text using SVG templates with sharp for high-quality rendering.
  - Supports multiple SVG templates - choose different templates for different video series.
  - Generates 5 distinct PNG thumbnails for you to choose from.
- Template Management: Easily add and manage SVG templates for different use cases.
- Automatic Cleanup: Removes temporary audio and image files after processing.
- Global Configuration: Use a global configuration file to set up your preferences once and use the tool from anywhere.
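The five frame positions listed under Thumbnail Generation can be sketched as a small calculation over the video duration. This is an illustration only: `framePositions` is a hypothetical helper, not part of the toolkit's documented API, and the labels are borrowed from the output file names described later in this README. A real implementation may need to seek slightly before the end to land on a decodable last frame.

```typescript
// Hypothetical helper: maps a video duration to the five capture points
// (Start, 25%, 50%, 75%, End). Not taken from the toolkit's source.
type FramePosition = { label: string; seconds: number };

function framePositions(durationSeconds: number): FramePosition[] {
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) {
    throw new RangeError("durationSeconds must be a positive, finite number");
  }
  // Label and fraction of the total duration for each capture point.
  const fractions: Array<[string, number]> = [
    ["first", 0],
    ["25", 0.25],
    ["50", 0.5],
    ["75", 0.75],
    ["last", 1],
  ];
  return fractions.map(([label, fraction]) => ({
    label,
    seconds: durationSeconds * fraction,
  }));
}
```

For a two-minute video, `framePositions(120)` yields capture points at 0, 30, 60, 90, and 120 seconds.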
Prerequisites
- Node.js (v24 or later required)
- ffmpeg installed and available in your PATH
- OpenAI API Key
- macOS (this tool is designed for macOS only)
Font Configuration (macOS)
The toolkit uses fontconfig for font discovery when rendering SVG text. On macOS systems using Homebrew:
- The toolkit automatically sets PANGOCAIRO_BACKEND=fontconfig for proper font rendering
- System fonts are automatically available for use
- Custom fonts can be placed in the templates directory or installed system-wide
If you encounter font-related issues, ensure fontconfig is properly configured on your system.
Installation
pnpm install @barbapapazes/video-toolkit
Configuration
The toolkit uses c12 for configuration loading. You can configure the tool using:
- Global configuration: ~/.video-toolkitrc (applies to all projects)
- Local configuration: video-toolkit.config.{ts,js,mjs,json} in your project directory
Global Configuration
Create a configuration file at ~/.video-toolkitrc:
{
  "openaiApiKey": "your-api-key-here"
}
Local Configuration
Create a video-toolkit.config.ts file in your project directory for project-specific settings:
import { defineConfig } from '@barbapapazes/video-toolkit'
export default defineConfig({
  language: 'en',
  templatesDir: './templates'
})
Configuration Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| openaiApiKey | string | - | Required. Your OpenAI API key |
| language | string | 'fr' | Language code for transcription (e.g., 'en', 'fr', 'es', 'de') |
| model | string | 'whisper-1' | OpenAI Whisper model to use |
| templatesDir | string | undefined | Directory where SVG templates are stored. Can be absolute or relative to current working directory. If not set, uses ~/.config/video-toolkit/templates/. |
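As a rough sketch of how the defaults in the table could be applied, the snippet below fills in missing options and resolves `templatesDir`. The names `VideoToolkitConfig`, `ResolvedConfig`, and `resolveConfig` are hypothetical and chosen for illustration; the toolkit's actual loading goes through c12 and may behave differently.

```typescript
import { homedir } from "node:os";
import { join, resolve } from "node:path";

// Options as listed in the table above (interface names are illustrative).
interface VideoToolkitConfig {
  openaiApiKey: string;
  language?: string;
  model?: string;
  templatesDir?: string;
}

interface ResolvedConfig {
  openaiApiKey: string;
  language: string;
  model: string;
  templatesDir: string;
}

// Hypothetical helper: applies the documented defaults.
function resolveConfig(
  config: VideoToolkitConfig,
  cwd: string = process.cwd(),
  home: string = homedir(),
): ResolvedConfig {
  if (!config.openaiApiKey) {
    throw new Error("openaiApiKey is required");
  }
  // An absolute templatesDir is kept as-is; a relative one is resolved
  // against the current working directory. When unset, fall back to the
  // documented default under ~/.config/video-toolkit/templates/.
  const templatesDir =
    config.templatesDir === undefined
      ? join(home, ".config", "video-toolkit", "templates")
      : resolve(cwd, config.templatesDir);

  return {
    openaiApiKey: config.openaiApiKey,
    language: config.language ?? "fr",
    model: config.model ?? "whisper-1",
    templatesDir,
  };
}
```

Passing `cwd` and `home` as parameters keeps the function easy to test without touching the real file system.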
Usage
You can run the toolkit directly using npx or by installing it globally.
Process a specific video
video-toolkit path/to/your-video.mp4
Interactive selection
If you run the command without arguments, it will list all .mp4 files in the current directory and let you choose one:
video-toolkit
Add a template
Add an SVG template to your templates directory:
video-toolkit add-template path/to/your-template.svg
You'll be prompted to name the template (use lowercase letters, numbers, and hyphens, e.g., series-x, youtube-intro).
Workflow
- Select Video: Provide a path or choose from the list.
- Transcription: The tool extracts audio and sends it to OpenAI.
- Thumbnail Text: You will be prompted to enter text for the thumbnails.
  - If you provide text, you'll be asked to select a template.
  - 5 thumbnails will be generated with that text using the selected template.
  - If you leave it empty, thumbnail generation will be skipped.
- Results: SRT subtitles and PNG thumbnails are saved in the same directory as the video.
SVG Templates
The toolkit uses SVG templates to render text overlays on thumbnails using the sharp library.
Template Location
Templates are stored in ~/.config/video-toolkit/templates/. Each template is a separate SVG file.
Template Naming Convention
Template names should use lowercase letters, numbers, and hyphens only:
- ✅ series-x.svg
- ✅ youtube-intro.svg
- ✅ podcast-2024.svg
- ❌ Series X.svg
- ❌ MyTemplate.svg
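The naming rule above translates directly into a regular expression. The sketch below is one way to check a name before saving it; `isValidTemplateName` is a hypothetical helper that takes the name without the .svg extension, and the toolkit's own validation may be stricter or looser.

```typescript
// Lowercase letters, digits, and hyphens only, as the convention states.
const TEMPLATE_NAME_PATTERN = /^[a-z0-9-]+$/;

// Hypothetical helper: expects the template name without ".svg".
function isValidTemplateName(name: string): boolean {
  return TEMPLATE_NAME_PATTERN.test(name);
}
```

With this check, `series-x` and `podcast-2024` pass, while `Series X` and `MyTemplate` are rejected.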
Creating a Template
Create an SVG file with a {{text}} placeholder where you want the text to appear:
<svg width="1920" height="1080" viewBox="0 0 1920 1080" xmlns="http://www.w3.org/2000/svg">
<!-- Semi-transparent overlay -->
<rect width="1920" height="1080" fill="rgba(0, 0, 0, 0.5)" />
<!-- Text placeholder -->
<text x="960" y="540" font-family="Arial, Helvetica, sans-serif"
font-size="80" font-weight="bold" fill="#ffffff"
text-anchor="middle" dominant-baseline="middle">
{{text}}
</text>
</svg>
The {{text}} placeholder will be replaced with the actual text you provide during thumbnail generation. The SVG should match the dimensions of your video frames (typically 1920x1080 for HD video).
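The placeholder substitution can be pictured as a plain string replacement. The sketch below is a minimal illustration with hypothetical helpers (`escapeXml`, `renderTemplate`). Whether the toolkit escapes XML special characters is not stated in this README; escaping is assumed here because an unescaped `&` or `<` would make the SVG invalid.

```typescript
// Escape characters that have special meaning in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: replaces every {{text}} occurrence in the template.
// split/join is used so that "$" sequences in the text are not interpreted
// as replacement patterns.
function renderTemplate(svg: string, text: string): string {
  return svg.split("{{text}}").join(escapeXml(text));
}
```

The resulting string could then be turned into a buffer and composited over a frame with sharp.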
Using Custom Fonts
The toolkit uses fontconfig for font discovery, following Sharp's recommended approach. Fonts are stored as actual font files that fontconfig can discover, rather than being embedded in SVGs.
How It Works
When you first use a template, the toolkit:
- Checks if fonts referenced in the SVG are available
- Looks for font files in the template directory or system fonts
- If not found, downloads fonts from Google Fonts
- Saves downloaded fonts to ~/.config/video-toolkit/fonts/
- Configures fontconfig to discover fonts from this directory
- All future uses automatically find and use these fonts
Option 1: Local Font Files
Place font files (.ttf, .woff, or .woff2) in the same directory as your SVG template:
~/.config/video-toolkit/templates/
├── my-template.svg
├── SofiaSans-Light-300.ttf
├── SofiaSans-Medium-500.ttf
└── SofiaSans-Bold-700.ttf
Font file naming conventions:
- FontName.ttf → default weight (400)
- FontName-300.ttf or FontName-Light.ttf → weight 300
- FontName-500.ttf or FontName-Medium.ttf → weight 500
- FontName-700.ttf or FontName-Bold.ttf → weight 700
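One way to read these conventions is as a mapping from the last hyphen-separated part of the file name to a numeric weight. The sketch below uses a hypothetical helper, `fontWeightFromFileName`, and covers only the suffixes listed above; the toolkit's actual parsing is not shown in this README and may recognize more names.

```typescript
// Named weights from the conventions above (assumed to be case-insensitive).
const WEIGHT_NAMES = new Map<string, number>([
  ["light", 300],
  ["medium", 500],
  ["bold", 700],
]);

// Hypothetical helper: derives a font weight from a font file name.
function fontWeightFromFileName(fileName: string): number {
  // Drop the supported extensions (.ttf, .woff, .woff2).
  const base = fileName.replace(/\.(ttf|woff2?)$/i, "");
  const parts = base.split("-");
  // Only treat the last part as a weight when the name contains a hyphen.
  const suffix = parts.length > 1 ? parts[parts.length - 1] ?? "" : "";
  if (/^\d{3}$/.test(suffix)) {
    return Number(suffix);
  }
  // Fall back to the default weight when no known suffix is present.
  return WEIGHT_NAMES.get(suffix.toLowerCase()) ?? 400;
}
```

Under these assumptions, a name such as SofiaSans-Bold-700.ttf resolves through its numeric suffix.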
Option 2: Google Fonts (Automatic)
If fonts aren't found locally, they're automatically downloaded from Google Fonts:
<text font-family="Sofia Sans" font-size="128" font-weight="500">
{{text}}
</text>
What happens:
- First use: Downloads "Sofia Sans" weight 500 from Google Fonts
- Saves it to ~/.config/video-toolkit/fonts/SofiaSans-500.woff2
- Configures fontconfig to discover this directory
- Subsequent uses: fontconfig finds the font instantly (no download)
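The cached file name in the example above suggests a simple naming scheme: the family name without spaces, a hyphen, the weight, and a .woff2 extension. The sketch below infers that scheme from the single Sofia Sans example, so treat it as an assumption; `cachedFontFileName` is a hypothetical helper.

```typescript
// Hypothetical helper: builds the cache file name for a downloaded font,
// following the pattern seen in "SofiaSans-500.woff2".
function cachedFontFileName(family: string, weight: number): string {
  // Remove all whitespace from the family name ("Sofia Sans" -> "SofiaSans").
  const compactFamily = family.replace(/\s+/g, "");
  return `${compactFamily}-${weight}.woff2`;
}
```

A lookup before download could then check whether this file already exists in ~/.config/video-toolkit/fonts/.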
Benefits:
- ✅ One-time download - Fonts downloaded once, reused forever
- ✅ Proper font rendering - Uses fontconfig (Sharp's recommended approach)
- ✅ System-wide availability - Downloaded fonts work across all templates
- ✅ Works offline - After first download, no internet needed
- ✅ 1400+ font families from Google Fonts
- ✅ Automatic fontconfig setup - Creates configuration automatically
Technical Note:
The toolkit creates a fontconfig configuration at ~/.config/video-toolkit/fonts.conf that points to the fonts directory. On macOS, it sets PANGOCAIRO_BACKEND=fontconfig to ensure proper font discovery. This follows Sharp's documentation for optimal font rendering with SVGs.
Managing Multiple Templates
You can have multiple templates for different purposes:
- series-x.svg - For your X video series
- series-y.svg - For your Y video series
- youtube-intro.svg - For YouTube introductions
- podcast.svg - For podcast episodes
Use the add-template command to easily add new templates, and during thumbnail generation, you'll be prompted to select which template to use.
Output Files
For an input file my-video.mp4, the tool generates:
- my-video.srt: The generated subtitles.
- my-video_thumbnail_first.png: Thumbnail from the first frame.
- my-video_thumbnail_25.png: Thumbnail at 25% duration.
- my-video_thumbnail_50.png: Thumbnail at 50% duration.
- my-video_thumbnail_75.png: Thumbnail at 75% duration.
- my-video_thumbnail_last.png: Thumbnail from the last frame.
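The naming pattern above can be derived from the input path by stripping the extension and appending fixed suffixes. The sketch below is illustrative only; `outputFileNames` is a hypothetical helper, not a function exported by the toolkit.

```typescript
interface OutputFiles {
  subtitles: string;
  thumbnails: string[];
}

// Hypothetical helper: lists the files generated next to the input video.
function outputFileNames(videoPath: string): OutputFiles {
  // Strip the final extension, e.g. "my-video.mp4" -> "my-video".
  const base = videoPath.replace(/\.[^./]+$/, "");
  const suffixes = ["first", "25", "50", "75", "last"];
  return {
    subtitles: `${base}.srt`,
    thumbnails: suffixes.map((suffix) => `${base}_thumbnail_${suffix}.png`),
  };
}
```

Because only the extension is replaced, the results land in the same directory as the video, matching the behavior described in the Workflow section.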
License
MIT
