@nadimtuhin/ytranscript

v1.2.5

Published

14 days ago

Fast YouTube transcript extraction with bulk processing, Google Takeout support, MCP server, and multiple output formats


nadimtuhin

youtube transcript captions subtitles bulk google-takeout cli mcp model-context-protocol ai bun

ytranscript

Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.

Read the blog post: "Automating My Second Brain with YouTube Transcripts"

Why ytranscript?

No API keys required - Uses YouTube's public innertube API directly
Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
Bulk processing - Process thousands of videos from Google Takeout exports
Resume-safe - Automatically skips already-processed videos
Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text

Quick Start

# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ

# Output: "We're no strangers to love, you know the rules..."

Installation

# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript

# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID

# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript

Runtimes supported: Node.js 18+ and Bun 1.0+

MCP Server (AI Assistant Integration)

ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.

Available Tools

| Tool | Description |
|------|-------------|
| get_transcript | Fetch transcript with format options (text, segments, srt, vtt) |
| get_transcript_languages | List available caption languages for a video |
| extract_video_id | Extract video ID from various YouTube URL formats |
| get_transcripts_bulk | Fetch transcripts for multiple videos at once |
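
To illustrate what extracting a video ID from "various YouTube URL formats" can involve, here is a minimal sketch. The `extractVideoId` helper and the set of URL shapes it handles are assumptions for illustration, not the package's `extract_video_id` implementation.

```typescript
// Hypothetical helper: NOT the package's extract_video_id implementation.
// YouTube video IDs are assumed to be 11 characters from [A-Za-z0-9_-].
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  // A bare 11-character ID is returned unchanged.
  if (ID_PATTERN.test(trimmed)) return trimmed;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // Neither an ID nor a parseable URL.
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate: string | null = null;

  if (host === 'youtu.be') {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split('/')[1] ?? null;
  } else if (host === 'youtube.com') {
    if (url.pathname === '/watch') {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get('v');
    } else {
      // Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
      const match = url.pathname.match(/^\/(embed|shorts|live)\/([^/]+)/);
      candidate = match?.[2] ?? null;
    }
  }

  return candidate !== null && ID_PATTERN.test(candidate) ? candidate : null;
}

console.log(extractVideoId('https://youtu.be/dQw4w9WgXcQ?t=42'));
```

Returning `null` instead of throwing keeps the helper easy to use when filtering a mixed list of IDs and URLs.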

Setup with Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}

Or if installed globally:

{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}

Example Prompts for Claude

Once configured, you can ask Claude:

"Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
"Summarize the key points from this video"
"What languages are available for this video's captions?"
"Get transcripts for these 5 videos and compare their content"

CLI Usage

Single Video

# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ

# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With specific language
ytranscript get dQw4w9WgXcQ --lang es

# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt

# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json

Check Available Languages

ytranscript info dQw4w9WgXcQ
# Output:
#   en     English (auto-generated)
#   es     Spanish
#   fr     French

Bulk Processing

# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv

# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"

# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt

# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume

Rate Limiting

YouTube may rate-limit requests. Use these flags to control pacing:

# --concurrency: max concurrent requests (default: 4, safe: 1-8)
# --pause-after: pause after N requests (default: 10)
# --pause-ms:    pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000

Recommended for large batches: --concurrency 2 --pause-after 10 --pause-ms 5000
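
To get a feel for how much waiting these settings add to a large batch, the arithmetic can be sketched as below. The `estimatePauseOverheadMs` helper is hypothetical, and it assumes one pause after every `--pause-after` requests with no pause after the final request; the CLI's actual scheduling may differ.

```typescript
// Hypothetical estimator; assumes one pause after every `pauseAfter` requests
// and no pause after the final request. The CLI's real scheduling may differ.
function estimatePauseOverheadMs(
  totalVideos: number,
  pauseAfter: number,
  pauseMs: number,
): number {
  if (totalVideos <= 0 || pauseAfter <= 0) return 0;
  // Number of full groups of `pauseAfter` requests that are followed by more work.
  const pauses = Math.floor((totalVideos - 1) / pauseAfter);
  return pauses * pauseMs;
}

// 1,000 videos with --pause-after 10 --pause-ms 5000:
// 99 pauses * 5,000 ms = 495,000 ms, a little over 8 minutes of waiting.
const overheadMs = estimatePauseOverheadMs(1000, 10, 5000);
console.log(`${overheadMs} ms spent pausing`);
```

This counts only the pauses, not the time spent on the requests themselves, so it is a lower bound on the total run time.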

Proxy Support

Route requests through an HTTP proxy to reduce rate limiting or to reach YouTube from a restricted network:

# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080

# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@host:8080

# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080

Programmatic usage:

import { fetchTranscript } from '@nadimtuhin/ytranscript';

const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});

Proxy support inspired by ytfetcher

Programmatic API

Fetch a Single Transcript

import { fetchTranscript } from '@nadimtuhin/ytranscript';

try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });

  console.log(transcript.text);           // Full transcript text
  console.log(transcript.segments);       // Array of { text, start, duration }
  console.log(transcript.language);       // 'en'
  console.log(transcript.isAutoGenerated); // true/false
} catch (error) {
  // See "Error Handling" section below
  console.error(error.message);
}

Bulk Processing

import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';

// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');

// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);

// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});

// Filter successful results
const transcripts = results.filter((r) => r.transcript);
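
The "merge and deduplicate" step above can be pictured with a small stand-alone sketch. `dedupeByVideoId` and the trimmed `VideoMeta` type are hypothetical, and keeping the first entry seen for each ID is an assumption; the package's `mergeVideoSources` may resolve duplicates differently.

```typescript
// Minimal stand-in for the WatchHistoryMeta type from the API reference.
interface VideoMeta {
  videoId: string;
  title?: string;
  source: 'history' | 'watch_later' | 'manual';
}

// Hypothetical helper, not the package's mergeVideoSources: walks the sources
// in order and keeps the first entry seen for each videoId.
function dedupeByVideoId(...sources: VideoMeta[][]): VideoMeta[] {
  const seen = new Map<string, VideoMeta>();
  for (const source of sources) {
    for (const video of source) {
      if (!seen.has(video.videoId)) seen.set(video.videoId, video);
    }
  }
  // Map preserves insertion order, so the output order is stable.
  return Array.from(seen.values());
}

const historyItems: VideoMeta[] = [
  { videoId: 'dQw4w9WgXcQ', title: 'From history', source: 'history' },
  { videoId: 'jNQXAC9IVRw', source: 'history' },
];
const watchLaterItems: VideoMeta[] = [
  { videoId: 'dQw4w9WgXcQ', title: 'From watch later', source: 'watch_later' },
  { videoId: '9bZkp7q19f0', source: 'watch_later' },
];

const merged = dedupeByVideoId(historyItems, watchLaterItems);
console.log(merged.map((v) => v.videoId));
```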

Streaming for Large Datasets

import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';

for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
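
One way to resume from a partially written JSONL file is to collect the IDs it already contains. The sketch below assumes each line is a serialized result with a `meta.videoId` field, matching the `TranscriptResult` shape; `collectProcessedIds` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper. Assumes each JSONL line is one serialized result with a
// `meta.videoId` field, matching the TranscriptResult shape.
function collectProcessedIds(jsonl: string): Set<string> {
  const ids = new Set<string>();
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const parsed = JSON.parse(line) as { meta?: { videoId?: unknown } } | null;
      const id = parsed?.meta?.videoId;
      if (typeof id === 'string') ids.add(id);
    } catch {
      // Skip unparseable lines, e.g. a truncated final line from an interrupted run.
    }
  }
  return ids;
}

const previousRun = [
  '{"meta":{"videoId":"dQw4w9WgXcQ","source":"history"},"transcript":null,"error":"HTTP 429"}',
  '{"meta":{"videoId":"jNQXAC9IVRw","source":"history"},"transcript":null}',
  '{"meta":{"videoId":"9bZkp7q1', // interrupted mid-write
].join('\n');

const processedIds = collectProcessedIds(previousRun);
console.log(processedIds.size);
```

The resulting set has the same type as the `skipIds` field of `BulkOptions`, so it could plausibly be passed there on the next run.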

Output Formatting

import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';

const transcript = await fetchTranscript('dQw4w9WgXcQ');

// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);

// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);

// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
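
For readers curious how `{ text, start, duration }` segments map onto SRT cues, here is a minimal sketch. `segmentsToSrt` and `srtTimestamp` are hypothetical helpers written for illustration; they are not the package's `formatSrt` and may differ from it in details such as rounding.

```typescript
interface Segment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

// Left-pads a non-negative integer with zeros to at least `width` digits.
function zeroPad(n: number, width: number): string {
  let out = String(n);
  while (out.length < width) out = '0' + out;
  return out;
}

// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(ms, 3)}`;
}

// Hypothetical converter: one numbered cue per segment, blank line between cues.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const end = seg.start + seg.duration;
      return `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(end)}\n${seg.text}\n`;
    })
    .join('\n');
}

console.log(segmentsToSrt([
  { text: 'Hello', start: 0, duration: 5 },
  { text: 'World', start: 5, duration: 2.5 },
]));
```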

Error Handling

The library throws an error whose message identifies the failure case:

| Error Message | Cause | Solution |
|---------------|-------|----------|
| No captions available for this video | Video has no captions/subtitles | Check with ytranscript info first |
| No suitable caption track found | Requested language not available | Use includeAutoGenerated: true or a different language |
| Caption track is empty | Captions exist but have no content | Rare; try a different language |
| HTTP 429 | Rate limited by YouTube | Reduce concurrency, add pauses |
| HTTP 403 | Video is private or region-locked | Cannot access this video |

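import { fetchTranscript } from '@nadimtuhin/ytranscript';

const videoId = 'dQw4w9WgXcQ';

try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  // Match on the message text to tell the failure cases apart
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (message.includes('429')) {
    console.log('Rate limited - slow down requests');
  }
}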
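
When processing many videos it can help to turn these messages into categories once, instead of repeating `includes` checks. The sketch below is a hypothetical classifier keyed on the message substrings from the table above; the category names and the retry policy are assumptions, not part of the package.

```typescript
type FailureKind =
  | 'no_captions'
  | 'no_matching_track'
  | 'empty_track'
  | 'rate_limited'
  | 'forbidden'
  | 'unknown';

// Hypothetical classifier based on the message substrings in the error table.
function classifyFailure(message: string): FailureKind {
  if (message.includes('No captions available')) return 'no_captions';
  if (message.includes('No suitable caption track')) return 'no_matching_track';
  if (message.includes('Caption track is empty')) return 'empty_track';
  if (message.includes('429')) return 'rate_limited';
  if (message.includes('403')) return 'forbidden';
  return 'unknown';
}

// Assumed policy: only rate limiting is worth retrying after a delay;
// the other known cases will not change on an immediate retry.
function isRetryable(kind: FailureKind): boolean {
  return kind === 'rate_limited';
}

const kind = classifyFailure('HTTP 429');
console.log(kind, isRetryable(kind));
```

Substring matching on messages is brittle if the wording changes between versions, so treat the `unknown` branch as a normal outcome.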

Limitations

| Scenario | Supported |
|----------|-----------|
| Public videos with captions | ✅ Yes |
| Auto-generated captions | ✅ Yes |
| Manual/community captions | ✅ Yes |
| Private videos | ❌ No |
| Age-restricted videos | ❌ No |
| Live streams (while live) | ❌ No |
| Premiere videos (before premiere) | ❌ No |
| Region-locked videos | ❌ No (unless you're in the allowed region) |

Google Takeout

To export your YouTube data:

1. Go to Google Takeout
2. Deselect all, then select only "YouTube and YouTube Music"
3. Click "All YouTube data included" and select:
   - History → Watch history
   - Playlists (includes Watch Later)
4. Export and download
5. Extract the archive

The relevant files are:

Takeout/YouTube and YouTube Music/history/watch-history.json
Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
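
As a rough picture of what reading `watch-history.json` involves, the sketch below pulls video IDs out of the export. The entry shape (`title`, `titleUrl`, `time`, `subtitles`) is an assumption about the Takeout format, which Google may change, and `parseWatchHistory` is a hypothetical helper rather than the package's `loadWatchHistory`.

```typescript
// Assumed shape of one Takeout watch-history.json entry; only the fields used
// here are listed, and Google may change the export format.
interface TakeoutEntry {
  title?: string;
  titleUrl?: string;
  time?: string;
  subtitles?: { name?: string; url?: string }[];
}

interface HistoryItem {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history';
}

// Hypothetical parser, not the package's loadWatchHistory.
function parseWatchHistory(json: string): HistoryItem[] {
  const entries = JSON.parse(json) as TakeoutEntry[];
  const items: HistoryItem[] = [];
  for (const entry of entries) {
    // Entries without a watch URL (e.g. removed videos) are skipped.
    const videoId = entry.titleUrl?.match(/[?&]v=([A-Za-z0-9_-]{11})/)?.[1];
    if (!videoId) continue;
    items.push({
      videoId,
      title: entry.title,
      url: entry.titleUrl,
      channel: entry.subtitles?.[0],
      watchedAt: entry.time,
      source: 'history',
    });
  }
  return items;
}

const sampleExport = JSON.stringify([
  {
    title: 'Watched Example video',
    titleUrl: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    time: '2024-01-01T12:00:00.000Z',
    subtitles: [{ name: 'Example channel', url: 'https://www.youtube.com/channel/UC123' }],
  },
  { title: 'Watched a video that has been removed' },
]);

const parsedItems = parseWatchHistory(sampleExport);
console.log(parsedItems.map((item) => item.videoId));
```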

API Reference

Types

interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}

interface TranscriptSegment {
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}

interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string;  // Present when transcript is null
}

interface FetchOptions {
  languages?: string[];          // Default: ['en']
  timeout?: number;              // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig;           // Optional proxy configuration
}

interface ProxyConfig {
  url: string;        // HTTP proxy URL (e.g., "http://user:pass@host:port")
}

interface BulkOptions extends FetchOptions {
  concurrency?: number;    // Default: 4
  pauseAfter?: number;     // Default: 10
  pauseDuration?: number;  // Default: 5000 (ms)
  skipIds?: Set<string>;   // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
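
Because `TranscriptResult.transcript` is nullable, a type guard can make downstream code cleaner. The following sketch uses trimmed copies of the types above; `hasTranscript` and `summarizeResults` are hypothetical helpers, not exports of the package.

```typescript
// Trimmed copies of the types above, so the sketch is self-contained.
interface Transcript {
  videoId: string;
  text: string;
}

interface TranscriptResult {
  meta: { videoId: string };
  transcript: Transcript | null;
  error?: string; // Present when transcript is null
}

// Hypothetical type guard: narrows `transcript` to non-null.
function hasTranscript(
  result: TranscriptResult,
): result is TranscriptResult & { transcript: Transcript } {
  return result.transcript !== null;
}

// Hypothetical summary: counts successes and groups failures by error message.
function summarizeResults(results: TranscriptResult[]) {
  const failures: Record<string, number> = {};
  let succeeded = 0;
  for (const result of results) {
    if (hasTranscript(result)) {
      succeeded += 1;
      continue;
    }
    const reason = result.error ?? 'unknown error';
    failures[reason] = (failures[reason] ?? 0) + 1;
  }
  return { succeeded, failed: results.length - succeeded, failures };
}

const summary = summarizeResults([
  { meta: { videoId: 'a' }, transcript: { videoId: 'a', text: 'hello' } },
  { meta: { videoId: 'b' }, transcript: null, error: 'HTTP 429' },
  { meta: { videoId: 'c' }, transcript: null, error: 'HTTP 429' },
  { meta: { videoId: 'd' }, transcript: null },
]);
console.log(summary);
```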

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Report bugs via GitHub Issues
Security issues: see SECURITY.md

License

MIT
