youtube-channel-video-scraper
v1.0.0
Published
A library to scrape video data from YouTube channels using Playwright.
Readme
YouTube Channel Video Scraper Library
A reusable TypeScript library for scraping latest video data and channel statistics from YouTube channels using Playwright.
Features
- Scrapes the latest ~10 non-Shorts videos from specified YouTube channel URLs.
- Extracts video details: Title, View Count, Upload Date (relative to absolute), Thumbnail URL, Video ID.
- Extracts channel details: Name, Subscriber Count, Total Video Count, Canonical Channel ID.
- Handles redirects from username URLs (e.g.,
/@username/videos) to canonical channel URLs (/channel/UC.../videos). - Uses Playwright for scraping, avoiding the need for official API keys.
- Configurable options:
- HTTP/S proxy support.
- Customizable timeouts for page operations.
- Adjustable concurrency level for scraping multiple channels.
- Returns data as a structured array of objects, including per-channel error details.
Disclaimers - Important!
- YouTube Terms of Service: Scraping YouTube programmatically might violate their Terms of Service. This library is provided for educational purposes only and is not intended for sustained use against YouTube's systems. Use this library responsibly and entirely at your own risk. The authors are not liable for any consequences, such as IP blocking or account suspension.
- Scraping Fragility: Web scraping relies on the target website's structure. YouTube frequently updates its layout and code. These changes will inevitably break this scraper without warning. This library requires ongoing maintenance to adapt. Do not use this in production environments.
Prerequisites
- Node.js
- pnpm
Usage (as a Library)
Install the package from npm:
pnpm add rebers/youtube-channel-video-scraper
# or npm install / yarn add
# Remember to install playwright browsers in your project too!
pnpm exec playwright install --with-deps chromiumThen, import and use the scrapeYouTubeChannels function:
import { scrapeYouTubeChannels, ChannelResult, ScraperOptions } from 'youtube-channel-video-scraper';
async function runScraper() {
const channelUrls = [
"https://www.youtube.com/@ChannelName1/videos",
"https://www.youtube.com/channel/UCxxxxxxxxxxxxxxxxxxxxxx/videos", // Example with channel ID
"https://www.youtube.com/@AnotherChannel/videos"
// Add more channel URLs
];
const options: ScraperOptions = {
// Optional: Specify proxy server
// proxyUrl: 'http://user:password@your-proxy-server:port',
// Optional: Set a general timeout for page navigation/waits (default 60000ms)
// timeout: 45000,
// Optional: Set concurrency level (how many channels to scrape in parallel, default 1)
concurrency: 3,
};
try {
console.log(`Starting scrape for ${channelUrls.length} channels...`);
const results: ChannelResult[] = await scrapeYouTubeChannels(channelUrls, options);
console.log("Scraping finished.");
results.forEach(result => {
if (result.error) {
// Handle channels that failed to scrape
console.error(`Error scraping ${result.inputUrl}: ${result.error}`);
} else {
// Process successful results
console.log(`Successfully scraped: ${result.name} (${result.channelId})`);
console.log(` Subscribers: ${result.subscriberCount}, Total Videos: ${result.totalVideos}`);
console.log(` Found ${result.videos.length} recent videos:`);
result.videos.forEach(video => {
console.log(` - [${video.videoId}] ${video.title} (${video.viewCount} views, Uploaded: ${video.uploadDate})`);
});
}
});
} catch (error) {
// Catch potential critical errors during the overall process setup
console.error("Critical scraping error:", error);
}
}
runScraper();
Configuration Options (ScraperOptions)
proxyUrl(string, optional): URL of an HTTP/S proxy server (e.g.,http://user:pass@host:port).timeout(number, optional): Maximum time in milliseconds for page navigations and waits (default: 60000).concurrency(number, optional): Number of channels to scrape in parallel (default: 1).
Returned Data Structure (ChannelResult[])
The function returns a Promise resolving to an array, where each element represents a channel processed:
interface VideoResult {
videoId: string; // YouTube video ID (e.g., "dQw4w9WgXcQ")
title: string; // Video title
viewCount: number | null;// Parsed view count (null if parsing fails)
uploadDate: string | null; // Estimated upload date as ISO string (null if parsing fails)
thumbnailUrl: string; // URL of the video thumbnail
}
interface ChannelResult {
inputUrl: string; // The original URL provided for this channel
channelId: string | null;// The canonical YouTube channel ID (UC...) (null if not found/error)
name: string | null; // Channel name (null if not found/error)
subscriberCount: number | null; // Parsed subscriber count (null if not found/parsing fails)
totalVideos: number | null; // Parsed total video count (null if not found/parsing fails)
videos: VideoResult[]; // Array of the latest non-Shorts video results
error?: string; // Error message if scraping failed for this specific channel
}Error Handling
- The main
scrapeYouTubeChannelsfunction should be wrapped intry...catchfor setup errors. - Individual channel scraping errors (e.g., network issues, parsing failures, blocked requests) are caught internally. The corresponding
ChannelResultobject for that channel will have theerrorfield populated with a message, and other fields might benull. Check theerrorfield for eachChannelResultin the returned array.
Development
- Build: Compile TypeScript to JavaScript in the
distdirectory:pnpm build - Dev Mode: Run the entry point (
src/index.ts) usingts-nodeand automatically restart on file changes usingnodemon:
(Note: The defaultpnpm devsrc/index.tsmight not perform any action when run directly unless modified for CLI testing).
Testing
Run the automated tests using Jest:
pnpm testTests are located in src/ alongside the code they test (e.g., index.test.ts). Ensure Playwright browsers are installed before running tests.
License
ISC
