remotion-captioneer

v0.9.0

Published

3 months ago

Drop-in animated captions for Remotion. Audio to word-level synced subtitle components. Supports OpenAI, Groq, Deepgram, AssemblyAI.

sar333

remotion captions subtitles video whisper openai groq deepgram assemblyai speech-to-text animation programmatic-video react karaoke typewriter

🎬 remotion-captioneer

Drop-in animated captions for Remotion.

Feed it audio. Get word-level synced, beautifully animated captions. Fourteen styles. Zero hassle.

🌐 Live Demo →

🤝 Works With `@remotion/captions`

Our types are fully compatible with the official @remotion/captions package. You can convert freely between them:

import { createTikTokStyleCaptions } from "@remotion/captions";
import { toCaptionArray, fromCaptionArray } from "remotion-captioneer";

// Convert our CaptionData → flat Caption[] for @remotion/captions
const flatCaptions = toCaptionArray(myCaptionData);
const { pages } = createTikTokStyleCaptions({
  captions: flatCaptions,
  combineTokensWithinMilliseconds: 1200,
});

// Or go the other way: Caption[] → CaptionData
const captionData = fromCaptionArray(flatCaptions);
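
For intuition, the conversion amounts to flattening every segment's words into one list. The sketch below is not the package's implementation. It assumes the CaptionData shape documented under Caption Data Format and the five fields of the Caption type in @remotion/captions; the helper name flattenToCaptions is hypothetical.

```typescript
// Shapes assumed from the "Caption Data Format" section of this README.
type Word = { word: string; startMs: number; endMs: number; confidence: number };
type Segment = { text: string; startMs: number; endMs: number; words: Word[] };
type CaptionData = { segments: Segment[]; language: string; durationMs: number };

// Field layout of the Caption type in @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: one Caption per word, in playback order.
function flattenToCaptions(data: CaptionData): Caption[] {
  const captions: Caption[] = [];
  for (const segment of data.segments) {
    for (const w of segment.words) {
      captions.push({
        // Caption text is whitespace sensitive, so every word after the
        // first carries its own leading space.
        text: captions.length === 0 ? w.word : ` ${w.word}`,
        startMs: w.startMs,
        endMs: w.endMs,
        timestampMs: null, // no separate token timestamp in CaptionData
        confidence: w.confidence,
      });
    }
  }
  return captions;
}
```

Segment boundaries are lost in this direction, which is why a page-grouping step such as createTikTokStyleCaptions() is useful afterwards.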

| | @remotion/captions (official) | remotion-captioneer (this) |
|---|---|---|
| Caption types | ✅ Caption type | ✅ Compatible + CaptionData with segments |
| Page segmentation | ✅ createTikTokStyleCaptions() | ❌ Use official package |
| Animated components | ❌ Build yourself | ✅ 14 ready-to-use styles |
| STT/transcription | ❌ Separate package | ✅ 5 providers built-in |
| CLI tool | ❌ | ✅ npx captioneer process |

🎥 Caption Styles Preview

Word Highlight

Each word lights up as it's spoken with a scale animation.

"Hello world this is"
  dim  dim  GOLD  dim

Karaoke

Progressive color fill — left-to-right like karaoke.

"Hello world this is"
 RED   red  ░░░░  ░░░
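
The progressive fill can be thought of as a per-word fraction between 0 and 1. The following is a minimal sketch of that arithmetic, not the package's rendering code; the name fillFraction is hypothetical and the word shape is borrowed from the Caption Data Format section.

```typescript
type Word = { word: string; startMs: number; endMs: number };

// Fraction of the word that should be filled at timeMs, clamped to [0, 1].
function fillFraction(word: Word, timeMs: number): number {
  const span = word.endMs - word.startMs;
  // Guard against zero-length words to avoid dividing by zero.
  if (span <= 0) return timeMs >= word.endMs ? 1 : 0;
  return Math.min(1, Math.max(0, (timeMs - word.startMs) / span));
}
```

A renderer could feed this fraction into a gradient stop or a clipped overlay so that already-spoken words stay fully colored and upcoming words stay empty.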

Typewriter

Character-by-character reveal with blinking cursor.

┌─────────────────────┐
│ Hello world th|      │
└─────────────────────┘
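
A typewriter reveal boils down to choosing how many characters to show at a given time. This sketch assumes characters appear at a constant rate across the segment, which may differ from how the package paces them; typedText and cursorVisible are hypothetical names.

```typescript
// Text revealed so far, assuming a constant typing rate between
// the segment's startMs and endMs.
function typedText(text: string, startMs: number, endMs: number, timeMs: number): string {
  if (timeMs <= startMs) return "";
  if (timeMs >= endMs) return text;
  const progress = (timeMs - startMs) / (endMs - startMs);
  // Array.from splits by code point, so emoji are never cut in half.
  const chars = Array.from(text);
  return chars.slice(0, Math.floor(progress * chars.length)).join("");
}

// The cursor is visible during the first half of every blink period.
function cursorVisible(timeMs: number, periodMs = 1000): boolean {
  return timeMs % periodMs < periodMs / 2;
}
```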

Bounce

Active word bounces up with spring physics.

"Hello  world  this  is"
  ↓     ↑      ↓     ↓
       bounce!

👉 See them animated live at the demo page.

✨ Features

🎙️ 5 STT Providers — Local Whisper, OpenAI, Groq, Deepgram, AssemblyAI
🎨 14 Caption Styles — Word Highlight, Karaoke, Typewriter, Bounce, Wave, Glow, Erase, Pill, Flicker, Highlighter, Blur, Rainbow, Scale, Spotlight
🎭 Built-in Presets — TikTok, Instagram, YouTube, Podcast, Cinematic, Music, Tutorial, Minimal, Gaming, News, Education, Fun
🎵 Audio-Video Sync — Beat detection, volume-reactive animations, timeline keyframes
📦 Template System — Data-driven video generation from JSON config
🧱 Layout Primitives — Stack, Row, Columns, Grid, Center, FadeIn, SlideUp
📤 6 Export Formats — SRT, VTT, ASS, TXT, word-level SRT & VTT
⚡ Drop-in Components — <AnimatedCaptions> works out of the box
🔧 CLI Tool — process, batch, export, presets, providers, styles
📐 Zero Config — Works with sensible defaults, customizable everything
🔷 TypeScript — Full type definitions included
🐳 Docker Ready — Deploy rendering at scale

🚀 Quick Start

Option 1: Scaffold a Project

npx captioneer init my-video
cd my-video
npm install
npm start

This creates a ready-to-use Remotion project with captions.

Option 2: Add to Existing Project

1. Install

npm install remotion-captioneer

2. Generate Captions from Audio

npx captioneer process my-audio.mp4

This creates my-audio-captions.json with word-level timestamps.

3. Use in Your Remotion Project

import { AbsoluteFill } from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./my-audio-captions.json";

export const MyVideo = () => {
  return (
    <AbsoluteFill style={{ backgroundColor: "#0a0a0a" }}>
      <AnimatedCaptions
        captions={captions}
        style="word-highlight"
        position="bottom"
        highlightColor="#FFD700"
      />
    </AbsoluteFill>
  );
};

That's it. Render with npx remotion render as usual.
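
If you are curious what word-level sync involves, the core step is mapping the current frame to the word being spoken. Here is a minimal sketch of that lookup, written independently of the package; frameToMs and activeWordIndex are hypothetical helpers, and the word shape follows the Caption Data Format section.

```typescript
type Word = { word: string; startMs: number; endMs: number };

// Convert a frame number to milliseconds at the given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Index of the word spoken at timeMs, or -1 during a pause between words.
// A word is active from startMs (inclusive) to endMs (exclusive).
function activeWordIndex(words: Word[], timeMs: number): number {
  return words.findIndex((w) => timeMs >= w.startMs && timeMs < w.endMs);
}
```

Inside a Remotion component, the frame would come from useCurrentFrame() and the frame rate from useVideoConfig().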

🎨 Caption Styles

14 animated styles, each with a unique visual feel:

| Style | Effect | Best For |
|-------|--------|----------|
| word-highlight | Each word lights up with scale animation | Podcasts, interviews |
| karaoke | Progressive left-to-right color fill | Music, singing |
| typewriter | Character-by-character reveal + cursor | Tutorials, code demos |
| bounce | Active word bounces with spring physics | Social media, reels |
| wave | Words animate in a wave pattern | Music, rhythmic content |
| glow | Neon glow pulsing on active word | Cinematic, dramatic |
| typewriter-erase | Types then erases word-by-word | Transitions, reveals |
| pill | Active word in a colored pill/badge | Clean, modern look |
| flicker | Flickers in like a neon sign | Retro, neon aesthetic |
| highlighter | Yellow highlighter behind active word | Study, educational |
| blur | Future words blur, active word sharpens | Dramatic reveals |
| rainbow | Cycling rainbow colors on active word | Fun, playful content |
| scale | Words grow from small to full size | Energetic, bold |
| spotlight | Radial spotlight effect behind active word | Theatrical, stage |

<AnimatedCaptions captions={captions} style="word-highlight" />
<AnimatedCaptions captions={captions} style="karaoke" />
<AnimatedCaptions captions={captions} style="typewriter" />
<AnimatedCaptions captions={captions} style="bounce" />
<AnimatedCaptions captions={captions} style="wave" />
<AnimatedCaptions captions={captions} style="glow" />
<AnimatedCaptions captions={captions} style="typewriter-erase" />
<AnimatedCaptions captions={captions} style="pill" />
<AnimatedCaptions captions={captions} style="flicker" />
<AnimatedCaptions captions={captions} style="highlighter" />
<AnimatedCaptions captions={captions} style="blur" />
<AnimatedCaptions captions={captions} style="rainbow" />
<AnimatedCaptions captions={captions} style="scale" />
<AnimatedCaptions captions={captions} style="spotlight" />

📡 STT Providers

Choose your speech-to-text backend. Supports 5 providers out of the box:

| Provider | Env Variable | Speed | Offline | Best For |
|----------|-------------|-------|---------|----------|
| Local Whisper | none | ⭐⭐ | ✅ | Privacy, no API costs |
| OpenAI | OPENAI_API_KEY | ⭐⭐⭐ | ❌ | Best accuracy |
| Groq | GROQ_API_KEY | ⭐⭐⭐⭐⭐ | ❌ | Ultra-fast inference |
| Deepgram | DEEPGRAM_API_KEY | ⭐⭐⭐⭐ | ❌ | Real-time capable |
| AssemblyAI | ASSEMBLYAI_API_KEY | ⭐⭐⭐ | ❌ | Rich features |

🎭 Caption Presets

Apply a professional look instantly with one of the built-in presets:

import { AnimatedCaptions, applyPreset } from "remotion-captioneer";

// Use a preset
<AnimatedCaptions
  captions={captions}
  {...applyPreset("tiktok")}
/>

// Or store the preset's props and spread them
const cinematicStyle = applyPreset("cinematic-gold");
<AnimatedCaptions captions={captions} {...cinematicStyle} />

Available Presets

| Category | Presets |
|----------|---------|
| Social Media | tiktok, instagram-reels, youtube-shorts, twitter-clips |
| Podcast | podcast-clean, podcast-bold |
| Cinematic | cinematic-gold, cinematic-white, cinematic-neon |
| Music | music-karaoke, music-wave |
| Tutorial | tutorial-typewriter, tutorial-erase |
| Minimal | minimal-white, minimal-subtle |
| Gaming | gaming-neon, gaming-bold |
| News & Documentary | news-ticker, documentary |
| Education | education-highlighter, education-scale |
| Fun & Creative | fun-rainbow, retro-flicker |

# List presets from CLI
npx captioneer presets

📤 Export Formats

Export captions to standard subtitle formats:

import {
  toSRT,
  toVTT,
  toASS,
  toPlainText,
  toWordLevelSRT,
  toWordLevelVTT,
} from "remotion-captioneer";

const srt = toSRT(captionData);       // SubRip (.srt)
const vtt = toVTT(captionData);       // WebVTT (.vtt)
const ass = toASS(captionData);       // SubStation Alpha (.ass)
const txt = toPlainText(captionData); // Plain text

// Word-level exports (for custom timing)
const srtWords = toWordLevelSRT(captionData);
const vttWords = toWordLevelVTT(captionData);

# Export from CLI
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subtitles.vtt
npx captioneer export captions.json --format ass
npx captioneer export captions.json --format srt-words

Formats: srt, vtt, ass, txt, srt-words, vtt-words
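
To show what an SRT export has to produce, here is a small standalone sketch of the format itself: numbered cues, a timestamp line in HH:MM:SS,mmm form, then the text. It is not the package's toSRT(); the Cue type and both function names are hypothetical.

```typescript
type Cue = { text: string; startMs: number; endMs: number };

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const hours = Math.floor(total / 3_600_000);
  const minutes = Math.floor((total % 3_600_000) / 60_000);
  const seconds = Math.floor((total % 60_000) / 1000);
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(total % 1000, 3)}`;
}

// Build an SRT document: cues are numbered from 1 and separated by a blank line.
function cuesToSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${srtTimestamp(c.startMs)} --> ${srtTimestamp(c.endMs)}\n${c.text}\n`)
    .join("\n");
}
```

Passing segments as cues gives sentence-level subtitles, while passing individual words gives the word-level variant. WebVTT differs mainly in its WEBVTT header line and in using a period instead of a comma before the milliseconds.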

Auto-Detection

The CLI auto-detects available providers from environment variables:

# Groq is fastest — set this first if you have a key
export GROQ_API_KEY="gsk_..."

# Or OpenAI
export OPENAI_API_KEY="sk-..."

# Then just run — it picks the best available
npx captioneer process audio.mp4
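
Conceptually, auto-detection is a priority scan over environment variables with local Whisper as the fallback. The sketch below uses the variable names from the provider table, but the priority order is an assumption and not the CLI's documented behavior; pickProvider is a hypothetical name.

```typescript
type ProviderName = "groq" | "openai" | "deepgram" | "assemblyai" | "local";

// Env variable per cloud provider, as listed in the STT Providers table.
// ASSUMPTION: this ordering is illustrative only.
const ENV_KEYS: Array<[ProviderName, string]> = [
  ["groq", "GROQ_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["deepgram", "DEEPGRAM_API_KEY"],
  ["assemblyai", "ASSEMBLYAI_API_KEY"],
];

// Return the first provider whose key is set, else local Whisper,
// which needs no API key.
function pickProvider(env: Record<string, string | undefined>): ProviderName {
  for (const [name, key] of ENV_KEYS) {
    if (env[key]) return name;
  }
  return "local";
}
```

In Node, the argument would typically be process.env. Taking the environment as a parameter keeps the function easy to test.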

Explicit Provider

npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1
npx captioneer process audio.mp4 --provider deepgram --model nova-2
npx captioneer process audio.mp4 --provider assemblyai
npx captioneer process audio.mp4 --provider local --model base

Check Provider Status

npx captioneer providers

📡 Available STT Providers:

  local           ✅ ready
                  models: tiny, base, small, medium, large

  groq            ✅ ready
                  models: whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en

  openai          ⚪ not configured
                  models: whisper-1

Programmatic Usage

import {
  GroqProvider,
  OpenAIProvider,
  detectProvider,
} from "remotion-captioneer";

// Groq (ultra-fast)
const groq = new GroqProvider("gsk_...");
const groqCaptions = await groq.transcribe("audio.mp4", {
  model: "whisper-large-v3-turbo",
  language: "en",
});

// OpenAI
const openai = new OpenAIProvider("sk-...");
const openaiCaptions = await openai.transcribe("audio.mp4");

// Auto-detect from env
const detected = detectProvider();
if (detected) {
  const captions = await detected.provider.transcribe("audio.mp4");
}

🎵 Audio-Video Sync

Frame-perfect animations synchronized to audio. No more manually timing keyframes.

Pre-analyze Audio

import { analyzeAudio } from "remotion-captioneer";

const analysis = await analyzeAudio("my-audio.mp4");
// Returns: beats, volumeFrames, bpm, energy levels

Beat-Reactive Hooks

import {
  AudioSyncProvider,
  useBeatPulse,
  useVolume,
  useEnergy,
} from "remotion-captioneer";

// Wrap your composition (audioAnalysis is the result of analyzeAudio() shown above)
const MyVideo = () => (
  <AudioSyncProvider analysis={audioAnalysis}>
    <BeatReactiveContent />
  </AudioSyncProvider>
);

// Use in any child component
const BeatReactiveContent = () => {
  const pulse = useBeatPulse();       // 0→1 spring on each beat
  const volume = useVolume();          // Current volume 0-1
  const energy = useEnergy();          // Smoothed energy 0-1

  return (
    <div style={{
      transform: `scale(${1 + pulse * 0.2})`,
      opacity: 0.5 + volume * 0.5,
    }}>
      🎵 Synced to the beat!
    </div>
  );
};
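
To make the idea of a beat pulse concrete, the sketch below computes a value that jumps to 1 on each beat and then fades. It swaps the spring described above for a simple exponential decay and assumes beats are available as a list of millisecond timestamps; neither detail is confirmed by the package, and beatPulse is a hypothetical name.

```typescript
// Pulse value at timeMs: 1 at the moment of the most recent beat,
// halving every halfLifeMs after it. Returns 0 before the first beat.
function beatPulse(beatTimesMs: number[], timeMs: number, halfLifeMs = 120): number {
  let lastBeat = -Infinity;
  for (const beat of beatTimesMs) {
    if (beat <= timeMs && beat > lastBeat) lastBeat = beat;
  }
  if (lastBeat === -Infinity) return 0;
  return Math.pow(0.5, (timeMs - lastBeat) / halfLifeMs);
}
```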

Timeline Keyframes

import { useTimelineValue, fadeInOut } from "remotion-captioneer";

// Map animation to audio timestamps (in ms)
const opacity = useTimelineValue({
  keyframes: [
    { timeMs: 0, value: 0 },
    { timeMs: 1000, value: 1, easing: "easeOut" },
    { timeMs: 5000, value: 1 },
    { timeMs: 6000, value: 0, easing: "easeIn" },
  ],
  defaultValue: 0,
});

// Or use the helper
const fadeOpacity = useTimelineValue(
  fadeInOut(0, 1000, 5000, 6000)
);
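
For readers who want to see the interpolation spelled out, here is a standalone sketch of evaluating keyframes at a given time. It assumes the easing on a keyframe shapes the approach into that keyframe and uses quadratic curves for easeIn and easeOut; the package's actual curves may differ. valueAt is a hypothetical name.

```typescript
type Easing = "linear" | "easeIn" | "easeOut";
type Keyframe = { timeMs: number; value: number; easing?: Easing };

// Quadratic easing curves (an assumption, chosen for simplicity).
const EASE: Record<Easing, (t: number) => number> = {
  linear: (t) => t,
  easeIn: (t) => t * t,
  easeOut: (t) => 1 - (1 - t) * (1 - t),
};

// Value at timeMs. Keyframes must be sorted by timeMs.
function valueAt(keyframes: Keyframe[], timeMs: number, defaultValue = 0): number {
  if (keyframes.length === 0) return defaultValue;
  const first = keyframes[0];
  const last = keyframes[keyframes.length - 1];
  if (timeMs <= first.timeMs) return first.value;
  if (timeMs >= last.timeMs) return last.value;
  for (let i = 1; i < keyframes.length; i++) {
    const a = keyframes[i - 1];
    const b = keyframes[i];
    if (timeMs <= b.timeMs) {
      // Normalized position between the two surrounding keyframes.
      const t = (timeMs - a.timeMs) / (b.timeMs - a.timeMs);
      return a.value + (b.value - a.value) * EASE[b.easing ?? "linear"](t);
    }
  }
  return last.value;
}
```

In a Remotion component the time would be derived from the current frame, for example (frame / fps) * 1000.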

Available Hooks

| Hook | Returns | Use For |
|------|---------|---------|
| useVolume() | number (0-1) | Opacity, scale, size |
| useBeat() | BeatInfo \| null | Flash effects, pulses |
| useBeatPulse() | number (0-1 spring) | Bounce, scale on beat |
| useEnergy() | number (0-1) | Background intensity |
| useIsOnBeat() | boolean | Conditional rendering |
| useTimelineValue() | number | Keyframe animations |
| useTimelineProgress() | number (0-1) | Progress bars |

📦 Template System

Build videos from JSON config. No code needed for simple videos.

Quick Template

import { buildTemplate, TemplateComposition } from "remotion-captioneer";

const template = buildTemplate({
  name: "My Captioned Video",
  intro: {
    title: "Episode 1",
    subtitle: "Getting Started",
    logo: "/logo.png",
  },
  captions: [
    { captions: myCaptions, captionStyle: "word-highlight" },
  ],
  outro: {
    heading: "Thanks for watching!",
    cta: "Subscribe for more",
    logo: "/logo.png",
  },
});

// Use as Remotion composition
<TemplateComposition template={template} />

Preset Scenes

import {
  createIntroScene,
  createCaptionScene,
  createOutroScene,
  createDividerScene,
} from "remotion-captioneer";

const intro = createIntroScene({
  title: "My Video",
  subtitle: "A demo",
  durationSec: 3,
});

const content = createCaptionScene({
  captions: myCaptions,
  captionStyle: "karaoke",
  highlightColor: "#FF6B6B",
});

const outro = createOutroScene({
  heading: "The End",
  cta: "Like & Subscribe",
  logo: "/logo.png",
});

Design Tokens

Customize the entire look with a single config:

const template = buildTemplate({
  name: "Brand Video",
  tokens: {
    colors: {
      primary: "#6366F1",
      accent: "#FFD700",
      background: "#0a0a0a",
      text: "#FFFFFF",
    },
    typography: {
      headingFont: "Poppins, sans-serif",
      bodyFont: "Inter, sans-serif",
    },
  },
  // ...
});

🧱 Layout Primitives

Composable layout building blocks for any Remotion video:

import {
  Stack, Row, Columns, Grid,
  Center, FadeIn, SlideUp,
  GradientBg, Overlay, Positioned,
} from "remotion-captioneer";

// Vertical stack
<Stack gap={24}>
  <FadeIn delayMs={0}>Title</FadeIn>
  <FadeIn delayMs={200}>Subtitle</FadeIn>
</Stack>

// Horizontal columns
<Columns ratios={[2, 1]} gap={32}>
  <div>Main content</div>
  <div>Sidebar</div>
</Columns>

// Grid layout
<Grid columns={3} gap={16}>
  {items.map(item => <Card key={item.id} />)}
</Grid>

// Animated entrance
<SlideUp delayMs={500} durationMs={800}>
  <div>Slides up with delay</div>
</SlideUp>

// Gradient background
<GradientBg from="#0a0a0a" to="#1a1a2e">
  <Center>Content here</Center>
</GradientBg>

🎙️ CLI Reference

Process Audio

# Basic usage (auto-detects provider from env vars)
npx captioneer process audio.mp4

# Specify provider
npx captioneer process audio.mp4 --provider groq
npx captioneer process audio.mp4 --provider openai --model whisper-1

# With options
npx captioneer process audio.mp4 --provider groq --language en --output captions.json
npx captioneer process audio.mp4 --provider local --model base

# Pass API key directly
npx captioneer process audio.mp4 --provider groq --api-key gsk_...

Options:

-p, --provider <provider> — STT provider: local, openai, groq, deepgram, assemblyai
-m, --model <model> — Model name (provider-specific)
-k, --api-key <key> — API key (or use env vars)
-l, --language <lang> — Language code: en, es, fr, de, etc.
-o, --output <path> — Output JSON path
-v, --verbose — Verbose output

Other Commands

# Scaffold a new project
npx captioneer init my-video

# List available providers and their status
npx captioneer providers

# List available caption styles
npx captioneer styles

# List available presets
npx captioneer presets

# Export captions to SRT/VTT/ASS
npx captioneer export captions.json --format srt
npx captioneer export captions.json --format vtt --output subs.vtt

# Batch process a directory of audio files
npx captioneer batch ./audio-files/
npx captioneer batch ./audio-files/ --provider groq --output-dir ./captions/

# Start real-time preview server
npx captioneer preview

# Open Remotion Studio with demos
npx captioneer demo

📖 Caption Data Format

The generated JSON follows this structure:

interface CaptionData {
  segments: Array<{
    text: string;           // Full segment text
    startMs: number;        // Segment start time (ms)
    endMs: number;          // Segment end time (ms)
    words: Array<{
      word: string;         // Word text
      startMs: number;      // Word start time (ms)
      endMs: number;        // Word end time (ms)
      confidence: number;   // Recognition confidence (0-1)
    }>;
  }>;
  language: string;         // Detected language
  durationMs: number;       // Total duration (ms)
}

You can also create caption data manually or from other sources — just match this format.
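
As an illustration, here is a hand-written caption object in this format, together with a small sanity check for timing mistakes. The checks are reasonable expectations rather than rules enforced by the package, and findTimingProblems is a hypothetical helper.

```typescript
type Word = { word: string; startMs: number; endMs: number; confidence: number };
type Segment = { text: string; startMs: number; endMs: number; words: Word[] };
type CaptionData = { segments: Segment[]; language: string; durationMs: number };

// A hand-written caption file with one segment and two words.
const manual: CaptionData = {
  language: "en",
  durationMs: 1000,
  segments: [
    {
      text: "Hello world",
      startMs: 0,
      endMs: 900,
      words: [
        { word: "Hello", startMs: 0, endMs: 400, confidence: 1 },
        { word: "world", startMs: 450, endMs: 900, confidence: 1 },
      ],
    },
  ],
};

// Hypothetical checker: lists timing problems, empty when none are found.
function findTimingProblems(data: CaptionData): string[] {
  const problems: string[] = [];
  data.segments.forEach((seg, s) => {
    if (seg.endMs < seg.startMs) problems.push(`segment ${s}: ends before it starts`);
    if (seg.endMs > data.durationMs) problems.push(`segment ${s}: ends after durationMs`);
    seg.words.forEach((w, i) => {
      if (w.endMs < w.startMs) problems.push(`segment ${s}, word ${i}: ends before it starts`);
      if (w.startMs < seg.startMs || w.endMs > seg.endMs) {
        problems.push(`segment ${s}, word ${i}: outside its segment`);
      }
    });
  });
  return problems;
}
```

Running a check like this before rendering catches the most common mistakes in hand-edited files, such as timestamps entered in seconds instead of milliseconds.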

⚙️ Configuration

Create a .captioneerrc file in your project root:

{
  "whisperPath": "./whisper.cpp",
  "modelPath": "./whisper.cpp/models/ggml-base.bin",
  "defaultModel": "base",
  "defaultLanguage": "en",
  "defaultStyle": "word-highlight"
}

Or add to your package.json:

{
  "captioneer": {
    "defaultModel": "base",
    "defaultLanguage": "en"
  }
}

🎬 Full Example

import {
  AbsoluteFill,
  Audio,
  Composition,
  staticFile,
} from "remotion";
import { AnimatedCaptions } from "remotion-captioneer";
import captions from "./captions.json";

export const CaptionedVideo = () => (
  <AbsoluteFill
    style={{
      background: "linear-gradient(135deg, #0a0a0a 0%, #1a1a2e 100%)",
    }}
  >
    <Audio src={staticFile("my-audio.mp4")} />
    <AnimatedCaptions
      captions={captions}
      style="karaoke"
      position="bottom"
      highlightColor="#FF6B6B"
      fontSize={64}
      fontFamily="Inter, sans-serif"
    />
  </AbsoluteFill>
);

export const RemotionRoot = () => (
  <Composition
    id="CaptionedVideo"
    component={CaptionedVideo}
    durationInFrames={900} // 30s at 30fps
    fps={30}
    width={1920}
    height={1080}
  />
);
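
Rather than hard-coding durationInFrames, you can derive it from the durationMs field of the caption data. A minimal sketch, with a hypothetical helper name:

```typescript
// Frames needed to cover durationMs at the given frame rate.
// Rounds up so the last word is not cut off, and never returns less
// than 1 because Remotion requires a positive integer duration.
function framesForDuration(durationMs: number, fps: number): number {
  return Math.max(1, Math.ceil((durationMs / 1000) * fps));
}
```

For the composition above, framesForDuration(30000, 30) gives the same 900 frames.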

🐳 Docker

FROM node:20-slim

# Install whisper.cpp dependencies
RUN apt-get update && apt-get install -y git cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN npm install

# The CLI will auto-install whisper.cpp on first run
ENTRYPOINT ["npx", "captioneer"]

🛠️ Component Props

`<AnimatedCaptions>`

| Prop | Type | Default | Description |
|------|------|---------|-------------|
| captions | CaptionData | required | Caption data object |
| style | CaptionStyle | "word-highlight" | Caption animation style |
| fontFamily | string | "Inter, sans-serif" | Font family |
| fontSize | number | 56 | Font size in px |
| fontColor | string | "rgba(255,255,255,0.5)" | Inactive text color |
| highlightColor | string | "#FFD700" | Active/highlight color |
| position | "top" \| "center" \| "bottom" | "bottom" | Vertical position |

📚 Examples

See the examples/ directory for complete working examples:

| File | What it shows |
|------|---------------|
| 01-basic.tsx | Simplest captioned video |
| 02-presets.tsx | Using presets (TikTok, Cinematic, Gaming) |
| 03-audio-sync.tsx | Beat-reactive animations |
| 04-template.tsx | Multi-scene template (intro → content → outro) |
| 05-layouts.tsx | Custom layouts with primitives |
| 06-export.ts | Export to SRT, VTT, ASS formats |
| 07-emoji.tsx | Emoji reactions at word timestamps |

🗺️ Roadmap

✅ Completed

[x] 14 caption styles (word-highlight, karaoke, typewriter, bounce, wave, glow, typewriter-erase, pill, flicker, highlighter, blur, rainbow, scale, spotlight)
[x] Multi-line auto-wrapping with smart breaks (smartWrap())
[x] Word-level emoji reactions (EmojiReactions + autoGenerateReactions())
[x] Real-time preview server (npx captioneer preview)
[x] Batch processing mode (npx captioneer batch ./audio/)
[x] Multi-provider STT (OpenAI, Groq, Deepgram, AssemblyAI)
[x] @remotion/captions compatibility layer
[x] Audio-video sync (beat detection, volume hooks, timeline)
[x] Template system for data-driven videos
[x] Layout primitives (Stack, Row, Columns, Grid, etc.)
[x] 16 caption presets (TikTok, Instagram, Podcast, Cinematic, etc.)
[x] Export formats (SRT, VTT, ASS, TXT, word-level)

🔮 Future

[ ] Caption style marketplace (community-contributed styles)
[ ] AI-powered auto-emoji (LLM suggests emojis from context)
[ ] Multi-language caption support with RTL
[ ] Caption editor with visual timeline
[ ] Integration with video hosting APIs (YouTube, Vimeo)
[ ] Real-time caption rendering in browser (WebCodecs)
[ ] Caption translation utilities
[ ] Speaker diarization (multi-speaker support)

🤝 Contributing

Contributions welcome! Please open an issue first to discuss what you'd like to change.

1. Fork the repo
2. Create your feature branch (git checkout -b feature/amazing)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing)
5. Open a Pull Request

📄 License

💡 Why This Exists

Everyone using Remotion for captioned videos ends up rebuilding the same thing:

Get audio → run Whisper → parse output → sync to frames → animate words

This package handles steps 2-5 so you can focus on your content, not plumbing.

⭐ Star this repo if it helps you!
