story-spider
v1.0.4
A TypeScript library for scraping stories from various Vietnamese websites
Story Spider
Story Spider is a TypeScript library for scraping stories from popular Vietnamese story websites such as truyenyy.vn and truyenfull.vn. Its modular, adapter-based architecture makes it easy to add support for additional websites.
Features
- Collect story information (title, description, author, genre, status, total chapters)
- Get chapter lists with details
- Retrieve chapter URLs by chapter number
- Get chapter content in HTML or text format
- Content cleaning using html-to-text
- Caching to reduce bandwidth and speed up repeated requests
- Rate limiting to avoid overloading the target servers
- Extensible adapter system for supporting multiple websites
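The README does not define the `rateLimiterOptions` passed to `StorySpider` in Basic Usage. The names `minTime` and `maxConcurrent` match the options of the Bottleneck rate-limiting library, where `minTime` is how long to wait (in milliseconds) after launching one job before launching the next, and `maxConcurrent` caps how many jobs run at once. Assuming story-spider gives them the same meaning, this minimal sketch shows what such a limiter does. `SimpleRateLimiter` and `schedule` are hypothetical names, not part of the story-spider API.

```typescript
// Assumed meaning of the options (borrowed from Bottleneck, not confirmed by story-spider).
interface RateLimiterOptions {
  minTime: number; // minimum ms between the starts of two consecutive tasks
  maxConcurrent: number; // maximum number of tasks running at the same time
}

// Hypothetical limiter: queues tasks and starts them no faster than the options allow.
class SimpleRateLimiter {
  private readonly opts: RateLimiterOptions;
  private running = 0;
  private lastStart = 0;
  private readonly queue: Array<() => void> = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(opts: RateLimiterOptions) {
    this.opts = opts;
  }

  // Enqueue a task; the returned promise settles with the task's own result.
  schedule<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push(() => {
        Promise.resolve()
          .then(task) // also turns a synchronous throw into a rejection
          .then(resolve, reject)
          .finally(() => {
            this.running--;
            this.drain(); // a slot is free again
          });
      });
      this.drain();
    });
  }

  // Start queued tasks while a concurrency slot is free and minTime has elapsed.
  private drain(): void {
    if (this.timer !== null) return; // a delayed start is already pending
    while (this.queue.length > 0 && this.running < this.opts.maxConcurrent) {
      const wait = this.lastStart + this.opts.minTime - Date.now();
      if (wait > 0) {
        // Too soon since the last start: try again once minTime has passed.
        this.timer = setTimeout(() => {
          this.timer = null;
          this.drain();
        }, wait);
        return;
      }
      const next = this.queue.shift()!;
      this.running++;
      this.lastStart = Date.now();
      next();
    }
  }
}
```

A scraper would wrap each network request, for example `limiter.schedule(() => axios.get(url))`, so that a burst of chapter downloads is spread out over time instead of hitting the site all at once.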
Installation
```bash
npm install story-spider
```
Usage
Basic Usage
```typescript
import { StorySpider, TruyenfullScraper } from 'story-spider';

async function main() {
  // Create a story spider with rate limiting
  // (minTime and maxConcurrent control request spacing and concurrency)
  const storySpider = new StorySpider({
    rateLimiterOptions: {
      minTime: 1000,
      maxConcurrent: 1,
    },
  });

  // Register the scraper for truyenfull.vn
  storySpider.registerScraper(new TruyenfullScraper());

  // Get story information (example story URL)
  const storyUrl = 'https://truyenfull.vn/example-story/';
  const storyInfo = await storySpider.scapeStoryInfo(storyUrl);
  console.log(storyInfo);

  // Get the chapter list
  const chapters = await storySpider.scapeChapterList(storyUrl);
  console.log(chapters);

  // Get chapter content (example chapter URL)
  const chapterUrl = 'https://truyenfull.vn/example-story/chuong-1/';
  const chapterContent = await storySpider.scapeChapterContent(chapterUrl);
  console.log(chapterContent);
}

main().catch(console.error);
```
Creating a Custom Scraper
You can create your own scraper for any website by extending the StoryScraper class:
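```typescript
import { StoryScraper, StoryInfo, ChapterInfo } from 'story-spider';

export class CustomScraper extends StoryScraper {
  // Unique identifier for this site (placeholder value)
  getSiteIdentifier(): string {
    return 'example';
  }

  // Domains handled by this scraper (placeholder value)
  getSupportedDomains(): string[] {
    return ['example.com'];
  }

  // Return true if the URL's host is a supported domain or one of its subdomains
  canHandle(url: string): boolean {
    try {
      const host = new URL(url).hostname;
      return this.getSupportedDomains().some((d) => host === d || host.endsWith(`.${d}`));
    } catch {
      return false; // not a valid absolute URL
    }
  }

  async scapeStoryInfo(storyUrl: string): Promise<StoryInfo> {
    // Fetch storyUrl and extract title, description, author, genre, status and total chapters
    throw new Error('Not implemented');
  }

  async scapeChapterList(storyUrl: string): Promise<ChapterInfo[]> {
    // Fetch storyUrl and extract the list of chapters
    throw new Error('Not implemented');
  }

  async scapeChapterContent(chapterUrl: string): Promise<string> {
    // Fetch chapterUrl and extract the chapter content
    throw new Error('Not implemented');
  }
}

// Register the scraper with an existing StorySpider instance (see Basic Usage)
storySpider.registerScraper(new CustomScraper());
```
Supported Websites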
The core library provides the infrastructure for scrapers, and the Basic Usage example above uses the exported TruyenfullScraper for truyenfull.vn. Scrapers for other websites can be implemented separately or contributed to this project.
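Because each scraper reports the domains it supports and whether it `canHandle` a URL, one plausible way for `StorySpider` to choose a scraper is to ask every registered scraper in turn and use the first that accepts the URL. The sketch below assumes that design; `SiteScraper`, `DomainScraper` and `ScraperRegistry` are hypothetical names and do not describe story-spider's actual internals.

```typescript
// Hypothetical reduced shape of a scraper: only what dispatch needs.
interface SiteScraper {
  getSiteIdentifier(): string;
  getSupportedDomains(): string[];
  canHandle(url: string): boolean;
}

// Hypothetical scraper that accepts URLs whose hostname is one of its
// domains or a subdomain of one (e.g. "www.truyenfull.vn").
class DomainScraper implements SiteScraper {
  private readonly id: string;
  private readonly domains: string[];

  constructor(id: string, domains: string[]) {
    this.id = id;
    this.domains = domains;
  }

  getSiteIdentifier(): string {
    return this.id;
  }

  getSupportedDomains(): string[] {
    return this.domains;
  }

  canHandle(url: string): boolean {
    let host: string;
    try {
      host = new URL(url).hostname;
    } catch {
      return false; // not a valid absolute URL
    }
    return this.domains.some((d) => host === d || host.endsWith(`.${d}`));
  }
}

// Hypothetical registry: keeps scrapers in registration order and
// returns the first one that claims the URL.
class ScraperRegistry {
  private readonly scrapers: SiteScraper[] = [];

  register(scraper: SiteScraper): void {
    this.scrapers.push(scraper);
  }

  resolve(url: string): SiteScraper {
    const scraper = this.scrapers.find((s) => s.canHandle(url));
    if (!scraper) {
      throw new Error(`No registered scraper can handle ${url}`);
    }
    return scraper;
  }
}
```

Matching on the parsed `hostname` rather than on a raw string prefix means `https://www.truyenfull.vn/...` is accepted while a look-alike host such as `eviltruyenfull.vn` is not.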
Dependencies
- html-to-text: For advanced HTML content cleaning
- axios: For network requests
- cheerio: For HTML parsing
- winston: For logging
- node-cache: For caching
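node-cache stores values under string keys with an optional time-to-live (its `stdTTL` option is given in seconds). The self-contained sketch below shows the same idea applied to fetched pages, so that repeated requests for a story or chapter URL within the TTL skip the network. `TtlCache` and `getOrLoad` are hypothetical names, not story-spider or node-cache APIs, and the clock is injectable only to make expiry easy to test.

```typescript
// Hypothetical stand-in for node-cache: each entry expires ttlMs after it is set.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value for key, or run load(), cache its result and return it.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}
```

Keying the cache by URL, e.g. `pageCache.getOrLoad(chapterUrl, () => fetchHtml(chapterUrl))` with a hypothetical `fetchHtml` helper, returns the stored HTML on a hit and calls the loader only on a miss or after expiry.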
License
ISC
