tgstat-parser
v1.0.0
A parser for extracting Telegram channel data from the TgStat website
TgStat Parser
A modular JavaScript parser for extracting Telegram channel data from the TgStat website.
Overview
This script extracts structured data from Telegram channel listings on the TgStat website (telega.in), including:
- Channel ID, name, and description
- Subscriber count and engagement metrics
- Price information with discounts
- Subject/category
- Avatar URL
- Rating and other metrics
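The price fields listed above can be sanity-checked against each other. The following is a minimal sketch, assuming that `discountPercent` is the percentage reduction from `original` to `discounted`; the sample record under Output Format is consistent with that reading, but the README does not state it. The helper names `applyDiscount` and `discountPercentOf` are hypothetical and not part of the package.

```javascript
// Hypothetical helpers, not part of tgstat-parser.
// Assumption: discountPercent is the percentage taken off the original price.

// Apply a percentage discount. Multiplying before dividing keeps
// whole-number inputs free of floating-point drift.
function applyDiscount(original, discountPercent) {
  return (original * (100 - discountPercent)) / 100;
}

// Recover the (rounded) discount percentage from two prices.
function discountPercentOf(original, discounted) {
  if (original <= 0) {
    throw new RangeError('original price must be positive');
  }
  return Math.round(((original - discounted) / original) * 100);
}

// Values copied from the sample record in the Output Format section.
const price = { original: 18000.0, discounted: 15300.0, discountPercent: 15 };

const isConsistent =
  applyDiscount(price.original, price.discountPercent) === price.discounted &&
  discountPercentOf(price.original, price.discounted) === price.discountPercent;

console.log(`Price fields consistent: ${isConsistent}`);
```

A check like this can flag records where the listing's discount label and its two prices disagree, which is usually a sign that a selector matched the wrong element.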
Usage
Browser Bookmarklet
- Create a new bookmark in your browser
- Name it "TgStat Parser"
- In the URL field, paste this code:
javascript:(function(){const script=document.createElement('script');script.src='https://raw.githubusercontent.com/yourusername/tgstat-parser/main/tgstat_parser/browser-url-fetcher.js';document.body.appendChild(script);script.onload=function(){window.tgStatParser.showUI()}})();
- Save the bookmark
- When viewing any webpage, click the bookmark to run the parser UI
- Enter a telega.in URL and click "Parse URL"
Command Line
Install the package:
npm install -g tgstat-parser
Parse a telega.in URL:
tgstat-parser https://telega.in/catalog/investments
Options:
-h, --help              Show help message
-s, --selector <sel>    CSS selector for channel elements (default: '.channels-item')
--no-json               Disable JSON export
--no-csv                Disable CSV export
--json-file <file>      JSON output filename (default: 'tgstat_channels.json')
--csv-file <file>       CSV output filename (default: 'tgstat_channels.csv')
Browser Console
Load the script directly in the browser console:
// Load the standalone script
const script = document.createElement('script');
script.src = 'https://raw.githubusercontent.com/yourusername/tgstat-parser/main/tgstat_parser/browser-url-fetcher.js';
document.body.appendChild(script);

// After script is loaded
script.onload = function() {
  // Show UI
  window.tgStatParser.showUI();

  // Or parse URL directly
  window.tgStatParser.parseUrl('https://telega.in/catalog/investments');
};
Node.js
const { extractFromUrl } = require('tgstat-parser/urlFetcher');
const { saveToJson } = require('tgstat-parser/exportUtils');

async function parseData() {
  const url = 'https://telega.in/catalog/investments';
  const channels = await extractFromUrl(url);

  // Save to file
  saveToJson(channels, 'channels.json');

  // Do something with the data
  console.log(`Parsed ${channels.length} channels`);
}

parseData().catch(console.error);
Features
- Extract data from telega.in URLs directly
- Works in both browser and Node.js environments
- Export data to JSON and CSV formats
- Handle CORS restrictions in browser with proxy
- Modular architecture for easy maintenance
- User-friendly UI in browser environment
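The CORS item above means that a page on another origin cannot fetch telega.in directly, so the request has to be routed through a proxy that adds the necessary headers. The README does not say which proxy the package uses, so the following is only a sketch of the general technique. The helper `buildProxiedUrl` and the `https://proxy.example/?url=` endpoint are placeholders invented for illustration.

```javascript
// Hypothetical helper, not part of tgstat-parser.
// `proxyBase` is an assumed placeholder; substitute a CORS proxy you control.
function buildProxiedUrl(targetUrl, proxyBase) {
  // Parse first so a malformed URL fails before any network call is made.
  const parsed = new URL(targetUrl);
  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Encode the whole target so its own query string cannot leak into the proxy's.
  return proxyBase + encodeURIComponent(parsed.href);
}

const proxied = buildProxiedUrl(
  'https://telega.in/catalog/investments',
  'https://proxy.example/?url='
);
console.log(proxied);
```

In a browser, the resulting string would be passed to `fetch` in place of the original URL. In Node.js no proxy is needed, because CORS is enforced only by browsers.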
File Structure
- main.js - Main entry point for processing HTML already in the browser
- urlFetcher.js - Module for fetching and parsing content from telega.in URLs
- parser.js - Core parsing logic to process multiple channel elements
- channelExtractor.js - Logic for extracting data from individual channel elements
- domUtils.js - Utility functions for working with DOM elements
- exportUtils.js - Functions for exporting data to JSON and CSV formats
- cli.js - Command line interface for Node.js usage
- browser-url-fetcher.js - Bundled version for browser usage with UI
- package.json - Project configuration and dependencies
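Because each channel record nests its metrics and price fields, a CSV export has to flatten the record into columns first. The README does not document the column layout that exportUtils.js produces, so the sketch below is one plausible approach rather than the package's implementation. The dotted column names and the helpers `flattenRecord`, `escapeCsvField`, and `toCsv` are assumptions made for illustration.

```javascript
// Hypothetical helpers, not the package's exportUtils.js.

// Flatten nested objects into dotted column names, e.g. metrics.subscribers.
function flattenRecord(record, prefix = '') {
  const flat = {};
  for (const [key, value] of Object.entries(record)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(flat, flattenRecord(value, column));
    } else {
      flat[column] = value;
    }
  }
  return flat;
}

// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: one header line, then one line per record.
function toCsv(records) {
  const rows = records.map((record) => flattenRecord(record));
  // Union of all columns, in first-seen order, so sparse records still align.
  const columns = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const lines = [columns, ...rows.map((row) => columns.map((c) => row[c]))];
  return lines.map((line) => line.map(escapeCsvField).join(',')).join('\n');
}

// A trimmed-down version of the sample record from the Output Format section.
const sample = [{
  channelId: '67387',
  subject: 'Инвестиции',
  metrics: { subscribers: 18057, cpv: 3.75 },
  price: { original: 18000, discounted: 15300 },
}];

console.log(toCsv(sample));
```

Taking the union of columns across all rows keeps the header stable when some channels lack a field, such as a listing with no discount.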
Output Format
The parser extracts data into this structure:
{
  channelId: "67387",
  channelName: "Павел Шумилов",
  description: "Авторский канал. Акции России, США, Китая...",
  subject: "Инвестиции",
  metrics: {
    subscribers: 18057,
    rating: 122.0,
    apr: 4082,
    err: 22.6,
    cpv: 3.75
  },
  price: {
    original: 18000.0,
    discounted: 15300.0,
    discountPercent: 15,
    discountType: "sale"
  },
  avatarUrl: "/system/channels/avatars/000/067/387/original/eviNUoDO3Hw.jpg",
  channelAdded: "catalog"
}
Dependencies
- Node.js >= 14.0.0
- jsdom (for Node.js DOM parsing)
- node-fetch (for Node.js HTTP requests)
License
MIT
