amon-subtitles
v1.0.5
YouTube Subtitles Fetcher
Description
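This package provides a getSubtitles function that fetches the subtitles of a YouTube video. You can specify preferred subtitle languages (the default is ['vi', 'en', 'fr']), and the function returns the subtitles in one of those languages when they are available. It also provides getContentPage, which extracts clean text content from a web page.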
Installation
The package relies on the following libraries:
- Axios: Used to make HTTP requests.
- Lodash: Used to process arrays and objects.
- he: Used to decode HTML entities (such as `&amp;`) in subtitle text.
- Striptags: Used to strip HTML tags from subtitle text.
- Puppeteer: Used by getContentPage to load pages in a real browser.
Install the package with npm:

```bash
npm i amon-subtitles
```

Importing the Functions:

```javascript
import { getSubtitles, getContentPage } from 'amon-subtitles';
```

1. Fetching YouTube Subtitles
To fetch subtitles for a YouTube video, call getSubtitles with an options object containing videoID (the video's unique identifier) and preferredLangs (an array of the subtitle language codes you prefer). It returns a promise that resolves to the video's subtitles, using the preferred languages where available.
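The usage example passes a bare ID such as 'wLuZ0WMyr9U'. If you start from a full YouTube link, a small helper can pull the ID out first. The sketch below, extractVideoID, is hypothetical (it is not exported by amon-subtitles) and assumes the common youtube.com/watch, youtu.be, /embed/ and /shorts/ URL forms; it uses only the standard URL class built into Node.js.

```javascript
// Hypothetical helper, not part of amon-subtitles.
// Returns the video ID from a YouTube URL, the input itself if it is not a URL,
// or null for URLs it does not recognize.
function extractVideoID(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return input; // Not a URL: assume it is already a bare video ID
  }
  // Normalize "www." and "m." (mobile) host prefixes
  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  if (host === 'youtu.be') {
    // Short links carry the ID as the path: https://youtu.be/<id>
    return url.pathname.slice(1) || null;
  }
  if (host === 'youtube.com') {
    // Standard links carry the ID in the "v" query parameter
    if (url.pathname === '/watch') return url.searchParams.get('v');
    // Embed and Shorts links carry it as the second path segment
    const match = url.pathname.match(/^\/(embed|shorts)\/([^/?#]+)/);
    if (match) return match[2];
  }
  return null;
}
```

For example, `getSubtitles({ videoID: extractVideoID('https://youtu.be/wLuZ0WMyr9U'), preferredLangs: ['vi'] })`. Because the helper returns null for unrecognized URLs, check the result before calling getSubtitles.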
Usage
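```javascript
import { getSubtitles } from 'amon-subtitles';

// Top-level await works in ES modules; otherwise call this inside an async function
const data = await getSubtitles({
  videoID: 'wLuZ0WMyr9U', // YouTube video ID
  preferredLangs: ['vi'], // Preferred subtitle language(s)
});
```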
Example response:
```javascript
{
  lang: 'vi', // Language of the subtitles
  lines: [
    {
      start: '0:01',
      dur: '0:05',
      seconds: 1,
      text: 'Welcome to the video!'
    },
    {
      start: '0:06',
      dur: '0:10',
      seconds: 6,
      text: 'Today, we will learn JavaScript programming.'
    },
    // Additional subtitle lines...
  ]
}
```
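Because each entry in lines carries its start timestamp and a numeric seconds offset, the response is easy to post-process. The sketch below assumes exactly the response shape shown above; toTranscript and findMentions are hypothetical helpers, not part of the package. findMentions builds links with YouTube's t query parameter, which starts playback at a given second.

```javascript
// Sketch: assumes the response shape shown above,
// i.e. { lang, lines: [{ start, dur, seconds, text }] }.

// Render the subtitles as plain text, one "[start] text" entry per line.
function toTranscript(data) {
  return data.lines.map((line) => `[${line.start}] ${line.text}`).join('\n');
}

// Return every subtitle line containing `phrase` (case-insensitive),
// with a link that opens the video at that line's start time.
function findMentions(data, videoID, phrase) {
  const needle = phrase.toLowerCase();
  return data.lines
    .filter((line) => line.text.toLowerCase().includes(needle))
    .map((line) => ({
      text: line.text,
      // YouTube's `t` parameter starts playback at the given whole second
      url: `https://www.youtube.com/watch?v=${encodeURIComponent(videoID)}&t=${Math.floor(line.seconds)}s`,
    }));
}
```

With the example response above, `findMentions(data, 'wLuZ0WMyr9U', 'javascript')` returns the second line with a link ending in `&t=6s`.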
2. Fetching Content from a Website
The getContentPage function loads a web page and returns its cleaned text content. It is designed to work even on sites that use bot protection (such as CAPTCHAs) or that render their content with JavaScript. It removes HTML tags, scripts, and other non-content markup and returns the remaining text as an array of lines.
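To make "clean text" concrete, here is a simplified sketch of the kind of cleaning described above: it drops script and style blocks, turns block boundaries into line breaks, strips the remaining tags, and returns the non-empty lines as an array of strings. extractTextLines is a hypothetical name; this is not how getContentPage works internally (it renders pages with Puppeteer), and a regex pass like this is only suitable for simple, trusted HTML.

```javascript
// Simplified sketch of HTML-to-text cleaning; NOT the getContentPage implementation.
// Regexes are not a real HTML parser, and HTML entities (e.g. &amp;) are left as-is.
function extractTextLines(html) {
  const text = html
    // 1. Drop <script> and <style> elements together with their contents
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // 2. Turn <br> and the ends of common block elements into line breaks
    .replace(/<br\s*\/?>|<\/(p|div|h[1-6]|li|tr)>/gi, '\n')
    // 3. Remove every remaining tag
    .replace(/<[^>]+>/g, ' ');
  return text
    .split('\n')
    .map((line) => line.replace(/\s+/g, ' ').trim()) // collapse runs of whitespace
    .filter((line) => line.length > 0); // drop empty lines
}
```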
Usage
```javascript
// Pass the page URL as a plain string argument
const fetchDataWebsite = await getContentPage('https://www.base64decode.org/');
```

Example response:
```javascript
[
  "Welcome to Base64 Decode and Encode",
  "Our tool allows you to decode and encode data in Base64 format.",
  // Other lines of clean content...
]
```

Additional Information:
- getSubtitles returns subtitles in one of the preferred languages. If none of the preferred languages is available, it falls back to the first available language. If the video has no subtitles at all, it throws an error (the returned promise rejects).
- getContentPage uses Puppeteer to load the page in a real browser, simulating a normal user visit. This lets it retrieve content from pages that are rendered with JavaScript and from sites that block traditional HTTP-based scrapers, and it is intended to help with bot protection such as CAPTCHAs.
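Since getSubtitles may fall back to a non-preferred language and throws when no subtitles exist, callers often want to handle both cases explicitly. The wrapper below is a hypothetical sketch, not part of the package: it receives the fetch function as a parameter (pass getSubtitles in real code) so it can be exercised with a stub, mirrors the documented default languages, and adds an isPreferred flag of its own. Note that it maps every error to null, including network failures.

```javascript
// Hypothetical wrapper around getSubtitles; not part of amon-subtitles.
// `fetchSubtitles` should be getSubtitles in real code (injected for testability).
async function getSubtitlesOrNull(
  fetchSubtitles,
  videoID,
  preferredLangs = ['vi', 'en', 'fr'], // Same default as documented for getSubtitles
) {
  try {
    const data = await fetchSubtitles({ videoID, preferredLangs });
    // getSubtitles may fall back to the first available language,
    // so record whether the returned language is one we asked for.
    return { ...data, isPreferred: preferredLangs.includes(data.lang) };
  } catch {
    // Thrown when no subtitles are found, but also on network or parsing errors.
    return null;
  }
}
```

In an ES module this could be called as `await getSubtitlesOrNull(getSubtitles, 'wLuZ0WMyr9U', ['en'])`.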
