@nadimtuhin/ytranscript
v1.2.5
Fast YouTube transcript extraction with bulk processing, Google Takeout support, MCP server, and multiple output formats
ytranscript
Extract transcripts from your entire YouTube watch history in minutes. Build AI-powered video summaries, searchable archives, or feed transcripts directly to Claude, Cursor, and other AI assistants via the built-in MCP server.
Read the blog post: "Automating My Second Brain with YouTube Transcripts"
Why ytranscript?
- No API keys required - Uses YouTube's public innertube API directly
- Works with AI assistants - Built-in MCP server for Claude, Cursor, and others
- Bulk processing - Process thousands of videos from Google Takeout exports
- Resume-safe - Automatically skips already-processed videos
- Multiple formats - JSON, JSONL, CSV, SRT, VTT, plain text
Quick Start
# Get a transcript in 10 seconds
npx @nadimtuhin/ytranscript get dQw4w9WgXcQ
# Output: "We're no strangers to love, you know the rules..."
Installation
# Global install (recommended for CLI usage)
npm install -g @nadimtuhin/ytranscript
# Or use with npx (no install)
npx @nadimtuhin/ytranscript get VIDEO_ID
# Add to a project (for library usage)
npm add @nadimtuhin/ytranscript
Runtimes supported: Node.js 18+ and Bun 1.0+
MCP Server (AI Assistant Integration)
ytranscript includes an MCP (Model Context Protocol) server that lets Claude, Cursor, and other AI assistants fetch YouTube transcripts directly.
Available Tools
| Tool | Description |
|------|-------------|
| get_transcript | Fetch transcript with format options (text, segments, srt, vtt) |
| get_transcript_languages | List available caption languages for a video |
| extract_video_id | Extract video ID from various YouTube URL formats |
| get_transcripts_bulk | Fetch transcripts for multiple videos at once |
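To illustrate what a tool like extract_video_id has to deal with, here is a minimal sketch of pulling an ID out of a few common URL shapes. It is not the package's implementation: the function name `extractVideoIdSketch`, the list of supported paths, and the 11-character ID pattern are all assumptions made for this example.

```typescript
// Hypothetical helper (not part of @nadimtuhin/ytranscript).
// Assumes video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

function extractVideoIdSketch(input: string): string | null {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) return trimmed; // already a bare ID

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable absolute URL
  }

  // Treat www.youtube.com and m.youtube.com like youtube.com
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // https://youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // https://youtube.com/embed/<id>, /shorts/<id>, /live/<id>
      const match = url.pathname.match(/^\/(embed|shorts|live)\/([^/]+)/);
      candidate = match?.[2] ?? null;
    }
  }

  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Anything that does not match one of these shapes returns `null` rather than a guess, which keeps a bad input from turning into a request for the wrong video.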
Setup with Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "ytranscript": {
      "command": "npx",
      "args": ["-y", "@nadimtuhin/ytranscript", "mcp"]
    }
  }
}
Or if installed globally:
{
  "mcpServers": {
    "ytranscript": {
      "command": "ytranscript-mcp"
    }
  }
}
Example Prompts for Claude
Once configured, you can ask Claude:
- "Get the transcript for this YouTube video: https://youtube.com/watch?v=dQw4w9WgXcQ"
- "Summarize the key points from this video"
- "What languages are available for this video's captions?"
- "Get transcripts for these 5 videos and compare their content"
CLI Usage
Single Video
# Basic usage (outputs plain text)
ytranscript get dQw4w9WgXcQ
# From URL
ytranscript get "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# With specific language
ytranscript get dQw4w9WgXcQ --lang es
# Output as SRT subtitles
ytranscript get dQw4w9WgXcQ --format srt -o video.srt
# Output as JSON with timestamps
ytranscript get dQw4w9WgXcQ --format json
Check Available Languages
ytranscript info dQw4w9WgXcQ
# Output:
# en English (auto-generated)
# es Spanish
# fr French
Bulk Processing
# From Google Takeout exports
ytranscript bulk \
  --history "Takeout/YouTube and YouTube Music/history/watch-history.json" \
  --watch-later "Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv" \
  --out-jsonl transcripts.jsonl \
  --out-csv transcripts.csv
# From a list of video IDs
ytranscript bulk --videos "dQw4w9WgXcQ,jNQXAC9IVRw,9bZkp7q19f0"
# From a file (one ID or URL per line)
ytranscript bulk --file videos.txt
# Resume a previous run (skips already-processed videos)
ytranscript bulk --history watch-history.json --resume
Rate Limiting
YouTube may rate-limit requests. Use these flags to control pacing:
# --concurrency: max concurrent requests (default: 4, safe: 1-8)
# --pause-after: pause after N requests (default: 10)
# --pause-ms:    pause duration in ms (default: 5000)
ytranscript bulk \
  --history watch-history.json \
  --concurrency 4 \
  --pause-after 10 \
  --pause-ms 5000
Recommended for large batches: --concurrency 2 --pause-after 10 --pause-ms 5000
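To get a feel for what these flags cost on a large batch, the time spent purely in pauses can be estimated with a little arithmetic. The sketch below assumes a pause is inserted after every `pauseAfter` requests and that no pause follows the final request; the README does not spell out the exact semantics, so treat the result as a rough lower bound that ignores the time the requests themselves take. The function name is made up for this example.

```typescript
// Hypothetical estimator (not part of @nadimtuhin/ytranscript).
// Assumption: one pause after every `pauseAfter` requests, none after the last.
function estimatePauseOverheadMs(
  totalVideos: number,
  pauseAfter: number,
  pauseMs: number,
): number {
  if (totalVideos <= 0 || pauseAfter <= 0 || pauseMs <= 0) return 0;
  const pauses = Math.floor((totalVideos - 1) / pauseAfter);
  return pauses * pauseMs;
}

// 5000 videos with the defaults: 499 pauses of 5 s each
const overheadMs = estimatePauseOverheadMs(5000, 10, 5000);
console.log(`${(overheadMs / 60000).toFixed(1)} minutes spent pausing`);
```

Under these assumptions a 5,000-video history spends roughly 41.6 minutes paused with the default settings, regardless of concurrency.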
Proxy Support
Route requests through an HTTP proxy to avoid rate limiting or access from restricted networks:
# CLI with proxy
ytranscript get dQw4w9WgXcQ --proxy http://localhost:8080
# Bulk with proxy
ytranscript bulk --history watch-history.json --proxy http://user:pass@host:8080
# With authentication
ytranscript get dQw4w9WgXcQ --proxy http://username:password@proxy:8080
Programmatic usage:
import { fetchTranscript } from '@nadimtuhin/ytranscript';
const transcript = await fetchTranscript('dQw4w9WgXcQ', {
  proxy: {
    url: 'http://localhost:8080',
  },
});
Proxy support inspired by ytfetcher
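Because proxy URLs can embed credentials, it may be worth masking the password before a URL ends up in logs or progress output. The helper below is a sketch using the standard `URL` class; `redactProxyUrl` is a hypothetical name and is not part of the package.

```typescript
// Hypothetical helper (not part of @nadimtuhin/ytranscript).
// Masks the password portion of a proxy URL so it is safe to log.
function redactProxyUrl(proxyUrl: string): string {
  const parsed = new URL(proxyUrl); // throws a TypeError on malformed input
  if (parsed.password !== "") {
    parsed.password = "***";
  }
  return parsed.toString();
}

// Log the redacted form, pass the original to the library
const proxyUrl = "http://username:password@proxy:8080";
console.log(`Using proxy ${redactProxyUrl(proxyUrl)}`);
```

Parsing the URL up front also surfaces typos in the proxy address before any request is made.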
Programmatic API
Fetch a Single Transcript
import { fetchTranscript } from '@nadimtuhin/ytranscript';
try {
  const transcript = await fetchTranscript('dQw4w9WgXcQ', {
    languages: ['en', 'es'], // Preference order
    includeAutoGenerated: true,
  });
  console.log(transcript.text); // Full transcript text
  console.log(transcript.segments); // Array of { text, start, duration }
  console.log(transcript.language); // 'en'
  console.log(transcript.isAutoGenerated); // true/false
} catch (error) {
  // See "Error Handling" section below
  console.error(error instanceof Error ? error.message : error);
}
Bulk Processing
import {
  loadWatchHistory,
  loadWatchLater,
  mergeVideoSources,
  processVideos,
} from '@nadimtuhin/ytranscript';
// Load from Google Takeout
const history = await loadWatchHistory('./watch-history.json');
const watchLater = await loadWatchLater('./watch-later.csv');
// Merge and deduplicate
const videos = mergeVideoSources(history, watchLater);
// Process with progress callback
const results = await processVideos(videos, {
  concurrency: 4,
  pauseAfter: 10,
  pauseDuration: 5000,
  onProgress: (completed, total, result) => {
    const status = result.transcript ? 'OK' : 'FAIL';
    console.log(`[${completed}/${total}] ${result.meta.videoId}: ${status}`);
  },
});
// Filter successful results
const transcripts = results.filter((r) => r.transcript);
Streaming for Large Datasets
import { streamVideos, appendJsonl } from '@nadimtuhin/ytranscript';
for await (const result of streamVideos(videos, { concurrency: 4 })) {
  // Write each result immediately (resume-safe)
  await appendJsonl(result, 'output.jsonl');
}
Output Formatting
import { fetchTranscript, formatSrt, formatVtt, formatText } from '@nadimtuhin/ytranscript';
import { writeFile } from 'fs/promises';
const transcript = await fetchTranscript('dQw4w9WgXcQ');
// SRT subtitles
const srt = formatSrt(transcript);
await writeFile('video.srt', srt);
// VTT subtitles
const vtt = formatVtt(transcript);
await writeFile('video.vtt', vtt);
// Plain text with timestamps
const text = formatText(transcript, true);
// [0:00] First line of transcript
// [0:05] Second line...
Error Handling
The library throws errors for various failure cases:
| Error Message | Cause | Solution |
|---------------|-------|----------|
| No captions available for this video | Video has no captions/subtitles | Check with ytranscript info first |
| No suitable caption track found | Requested language not available | Use includeAutoGenerated: true or different language |
| Caption track is empty | Captions exist but have no content | Rare; try a different language |
| HTTP 429 | Rate limited by YouTube | Reduce concurrency, add pauses |
| HTTP 403 | Video is private or region-locked | Cannot access this video |
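In bulk runs, failures arrive as an `error` string on each result rather than as thrown exceptions (see `TranscriptResult` in the API Reference). The sketch below tallies those failures by cause using the messages from the table above. It assumes bulk results carry the same message text as thrown errors, which the README does not confirm; the `FailureKind` names and both functions are invented for this example.

```typescript
// Hypothetical helpers (not part of @nadimtuhin/ytranscript).
type FailureKind =
  | "no_captions"
  | "no_matching_track"
  | "empty_track"
  | "rate_limited"
  | "forbidden"
  | "unknown";

// Minimal shape needed here; mirrors the relevant TranscriptResult fields.
interface BulkResultSketch {
  transcript: object | null;
  error?: string;
}

// Substring matching on messages is brittle; it follows the table above.
function classifyFailure(message: string): FailureKind {
  if (message.includes("No captions available")) return "no_captions";
  if (message.includes("No suitable caption track")) return "no_matching_track";
  if (message.includes("Caption track is empty")) return "empty_track";
  if (message.includes("429")) return "rate_limited";
  if (message.includes("403")) return "forbidden";
  return "unknown";
}

function summarizeFailures(results: BulkResultSketch[]): Record<FailureKind, number> {
  const counts: Record<FailureKind, number> = {
    no_captions: 0,
    no_matching_track: 0,
    empty_track: 0,
    rate_limited: 0,
    forbidden: 0,
    unknown: 0,
  };
  for (const result of results) {
    if (result.transcript !== null) continue; // success, nothing to count
    counts[classifyFailure(result.error ?? "")] += 1;
  }
  return counts;
}
```

A high `rate_limited` count suggests rerunning with lower concurrency and longer pauses, while `no_captions` failures will not improve on retry.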
try {
  const transcript = await fetchTranscript(videoId);
} catch (error) {
  if (error.message.includes('No captions available')) {
    console.log('This video has no subtitles');
  } else if (error.message.includes('429')) {
    console.log('Rate limited - slow down requests');
  }
}
Limitations
| Scenario | Supported |
|----------|-----------|
| Public videos with captions | ✅ Yes |
| Auto-generated captions | ✅ Yes |
| Manual/community captions | ✅ Yes |
| Private videos | ❌ No |
| Age-restricted videos | ❌ No |
| Live streams (while live) | ❌ No |
| Premiere videos (before premiere) | ❌ No |
| Region-locked videos | ❌ No (unless you're in the allowed region) |
Google Takeout
To export your YouTube data:
- Go to Google Takeout
- Deselect all, then select only "YouTube and YouTube Music"
- Click "All YouTube data included" and select:
  - History → Watch history
  - Playlists (includes Watch Later)
- Export and download
- Extract the archive
The relevant files are:
Takeout/YouTube and YouTube Music/history/watch-history.json
Takeout/YouTube and YouTube Music/playlists/Watch later-videos.csv
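For readers curious what the watch history export contains, the sketch below maps its entries to the `WatchHistoryMeta` shape from the API Reference. It is not how `loadWatchHistory` is implemented. The entry fields (`title`, `titleUrl`, `time`, `subtitles`) are assumptions based on commonly seen exports, the schema may change without notice, and the "Watched " title prefix applies to English-language exports only.

```typescript
// Hypothetical parser (not part of @nadimtuhin/ytranscript).
// Assumed shape of one entry in watch-history.json.
interface TakeoutEntrySketch {
  title?: string; // e.g. "Watched <video title>"
  titleUrl?: string; // e.g. "https://www.youtube.com/watch?v=<id>"
  time?: string; // ISO 8601 timestamp
  subtitles?: { name?: string; url?: string }[]; // channel info
}

// Copied from the API Reference section.
interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: "history" | "watch_later" | "manual";
}

function parseWatchHistorySketch(json: string): WatchHistoryMeta[] {
  const entries = JSON.parse(json) as TakeoutEntrySketch[];
  const seen = new Set<string>();
  const videos: WatchHistoryMeta[] = [];

  for (const entry of entries) {
    if (!entry.titleUrl) continue; // removed videos have no URL

    let videoId: string | null = null;
    try {
      videoId = new URL(entry.titleUrl).searchParams.get("v");
    } catch {
      continue; // skip malformed URLs
    }
    if (!videoId || seen.has(videoId)) continue; // keep first occurrence only
    seen.add(videoId);

    videos.push({
      videoId,
      title: entry.title?.replace(/^Watched /, ""),
      url: entry.titleUrl,
      channel: entry.subtitles?.[0],
      watchedAt: entry.time,
      source: "history",
    });
  }
  return videos;
}
```

Deduplicating on `videoId` matters because rewatched videos appear once per viewing in the export.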
API Reference
Types
interface Transcript {
  videoId: string;
  text: string;
  segments: TranscriptSegment[];
  language: string;
  isAutoGenerated: boolean;
}
interface TranscriptSegment {
  text: string;
  start: number; // seconds
  duration: number; // seconds
}
interface WatchHistoryMeta {
  videoId: string;
  title?: string;
  url?: string;
  channel?: { name?: string; url?: string };
  watchedAt?: string;
  source: 'history' | 'watch_later' | 'manual';
}
interface TranscriptResult {
  meta: WatchHistoryMeta;
  transcript: Transcript | null;
  error?: string; // Present when transcript is null
}
interface FetchOptions {
  languages?: string[]; // Default: ['en']
  timeout?: number; // Default: 30000 (ms)
  includeAutoGenerated?: boolean; // Default: true
  proxy?: ProxyConfig; // Optional proxy configuration
}
interface ProxyConfig {
  url: string; // HTTP proxy URL (e.g., "http://user:pass@host:port")
}
interface BulkOptions extends FetchOptions {
  concurrency?: number; // Default: 4
  pauseAfter?: number; // Default: 10
  pauseDuration?: number; // Default: 5000 (ms)
  skipIds?: Set<string>; // Videos to skip
  onProgress?: (completed: number, total: number, result: TranscriptResult) => void;
}
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
- Report bugs via GitHub Issues
- Security issues: see SECURITY.md
License
MIT
