tgstat-parser

v1.0.0

Published a year ago

A parser for extracting Telegram channel data from the TgStat website

kora-reinolds

Keywords: telegram, parser, tgstat, telega.in, scraper

TgStat Parser

A modular JavaScript parser for extracting Telegram channel data from the TgStat website.

Overview

This script extracts structured data from Telegram channel listings on the TgStat website (telega.in), including:

Channel ID, name, and description
Subscriber count and engagement metrics
Price information with discounts
Subject/category
Avatar URL
Rating and other metrics

Usage

Browser Bookmarklet

Create a new bookmark in your browser
Name it "TgStat Parser"
In the URL field, paste this code:

javascript:(function(){const script=document.createElement('script');script.src='https://raw.githubusercontent.com/yourusername/tgstat-parser/main/tgstat_parser/browser-url-fetcher.js';script.onload=function(){window.tgStatParser.showUI()};document.body.appendChild(script)})();

Save the bookmark
When viewing any webpage, click the bookmark to run the parser UI
Enter a telega.in URL and click "Parse URL"

Command Line

Install the package:

npm install -g tgstat-parser

Parse a telega.in URL:

tgstat-parser https://telega.in/catalog/investments

Options:

-h, --help             Show help message
-s, --selector <sel>   CSS selector for channel elements (default: '.channels-item')
--no-json              Disable JSON export
--no-csv               Disable CSV export
--json-file <file>     JSON output filename (default: 'tgstat_channels.json')
--csv-file <file>      CSV output filename (default: 'tgstat_channels.csv')
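
To drive the CLI from another Node.js script, the flags listed above can be assembled programmatically. The following is a minimal sketch, not part of the package: `buildCliArgs` and its option names (`selector`, `json`, `csv`, `jsonFile`, `csvFile`) are hypothetical, and it assumes the CLI accepts options placed before the URL.

```javascript
// Hypothetical helper: maps an options object onto the documented CLI flags.
// Assumption: the CLI accepts options before the URL argument.
function buildCliArgs(url, options = {}) {
  const args = [];
  if (options.selector) args.push('--selector', options.selector);
  if (options.json === false) args.push('--no-json');
  if (options.csv === false) args.push('--no-csv');
  if (options.jsonFile) args.push('--json-file', options.jsonFile);
  if (options.csvFile) args.push('--csv-file', options.csvFile);
  args.push(url);
  return args;
}

const args = buildCliArgs('https://telega.in/catalog/investments', {
  csv: false,
  jsonFile: 'investments.json',
});

// Prints: tgstat-parser --no-csv --json-file investments.json https://telega.in/catalog/investments
console.log(['tgstat-parser', ...args].join(' '));
```

The resulting array can be passed as the argument list to `child_process.execFile('tgstat-parser', args, callback)`, which avoids shell quoting issues with selectors that contain spaces or quotes.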

Browser Console

Load the script directly in the browser console:

// Load the standalone script
const script = document.createElement('script');
script.src = 'https://raw.githubusercontent.com/yourusername/tgstat-parser/main/tgstat_parser/browser-url-fetcher.js';

// Register the handler before appending, so the load event cannot be missed
script.onload = function() {
  // Show the UI
  window.tgStatParser.showUI();

  // Or parse a URL directly instead:
  // window.tgStatParser.parseUrl('https://telega.in/catalog/investments');
};

document.body.appendChild(script);

Node.js

const { extractFromUrl } = require('tgstat-parser/urlFetcher');
const { saveToJson } = require('tgstat-parser/exportUtils');

async function parseData() {
  const url = 'https://telega.in/catalog/investments';
  const channels = await extractFromUrl(url);
  
  // Save to file
  saveToJson(channels, 'channels.json');
  
  // Do something with the data
  console.log(`Parsed ${channels.length} channels`);
}

parseData().catch(console.error);
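
As one idea for the "do something with the data" step, the parsed array can be filtered and ranked with plain array methods. This is a sketch only: `pickChannels` is a hypothetical helper, it assumes each record carries the `metrics.subscribers` and `metrics.cpv` fields described under Output Format, and the sample records are made up for illustration.

```javascript
// Hypothetical helper: keeps channels above a subscriber threshold and
// orders them by ascending CPV. Not part of tgstat-parser.
function pickChannels(channels, { minSubscribers = 0, maxCpv = Infinity } = {}) {
  return channels
    // filter() returns a new array, so the sort below does not mutate the input
    .filter((c) => c.metrics.subscribers >= minSubscribers && c.metrics.cpv <= maxCpv)
    .sort((a, b) => a.metrics.cpv - b.metrics.cpv);
}

// Made-up sample records in the documented shape (only the fields used here).
const sample = [
  { channelName: 'A', metrics: { subscribers: 18057, cpv: 3.75 } },
  { channelName: 'B', metrics: { subscribers: 5200, cpv: 1.9 } },
  { channelName: 'C', metrics: { subscribers: 42000, cpv: 2.4 } },
];

const picked = pickChannels(sample, { minSubscribers: 10000 });
console.log(picked.map((c) => c.channelName)); // [ 'C', 'A' ]
```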

Features

Extracts data directly from telega.in URLs
Works in both browser and Node.js environments
Exports data to JSON and CSV formats
Works around CORS restrictions in the browser by routing requests through a proxy
Modular architecture for easy maintenance
User-friendly UI in the browser
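
The proxy approach to CORS mentioned in the list above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the package's actual implementation: `PROXY_BASE` is a placeholder for a proxy you run or trust, and `toProxiedUrl` and `fetchListingHtml` are hypothetical names.

```javascript
// Assumption: a CORS proxy that takes the target URL in a `url` query parameter.
// The host below is a placeholder, not a real service.
const PROXY_BASE = 'https://cors-proxy.example/?url=';

// Hypothetical helper: validates the target and wraps it in the proxy URL.
function toProxiedUrl(targetUrl, proxyBase = PROXY_BASE) {
  const parsed = new URL(targetUrl); // throws on malformed input
  if (parsed.hostname !== 'telega.in' && !parsed.hostname.endsWith('.telega.in')) {
    throw new Error(`Unexpected host: ${parsed.hostname}`);
  }
  return proxyBase + encodeURIComponent(parsed.href);
}

// Hypothetical helper: fetches the listing HTML through the proxy.
// fetchImpl is injectable so the function can be exercised without network access.
async function fetchListingHtml(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(toProxiedUrl(targetUrl));
  if (!response.ok) {
    throw new Error(`Proxy request failed: ${response.status}`);
  }
  return response.text();
}

console.log(toProxiedUrl('https://telega.in/catalog/investments'));
// https://cors-proxy.example/?url=https%3A%2F%2Ftelega.in%2Fcatalog%2Finvestments
```

Restricting the host keeps the helper from becoming a general-purpose relay for arbitrary URLs, which matters if the proxy is shared.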

File Structure

main.js - Main entry point for processing HTML already in the browser
urlFetcher.js - Module for fetching and parsing content from telega.in URLs
parser.js - Core parsing logic to process multiple channel elements
channelExtractor.js - Logic for extracting data from individual channel elements
domUtils.js - Utility functions for working with DOM elements
exportUtils.js - Functions for exporting data to JSON and CSV formats
cli.js - Command line interface for Node.js usage
browser-url-fetcher.js - Bundled version for browser usage with UI
package.json - Project configuration and dependencies

Output Format

The parser extracts data into this structure:

{
  channelId: "67387",
  channelName: "Павел Шумилов",
  description: "Авторский канал. Акции России, США, Китая...",
  subject: "Инвестиции",
  metrics: {
    subscribers: 18057,
    rating: 122.0,
    apr: 4082,
    err: 22.6,
    cpv: 3.75
  },
  price: {
    original: 18000.0,
    discounted: 15300.0,
    discountPercent: 15,
    discountType: "sale"
  },
  avatarUrl: "/system/channels/avatars/000/067/387/original/eviNUoDO3Hw.jpg",
  channelAdded: "catalog"
}
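
Because the structure is nested, exporting it to CSV requires flattening `metrics` and `price` into columns. The sketch below shows one possible way to do that with dotted column names such as `metrics.subscribers`. It is an assumption about how such an export could work, not a description of what `exportUtils.js` does; `flatten`, `csvField`, and `toCsv` are hypothetical names.

```javascript
// Sample record copied from the structure above.
const channel = {
  channelId: '67387',
  channelName: 'Павел Шумилов',
  description: 'Авторский канал. Акции России, США, Китая...',
  subject: 'Инвестиции',
  metrics: { subscribers: 18057, rating: 122.0, apr: 4082, err: 22.6, cpv: 3.75 },
  price: { original: 18000.0, discounted: 15300.0, discountPercent: 15, discountType: 'sale' },
  avatarUrl: '/system/channels/avatars/000/067/387/original/eviNUoDO3Hw.jpg',
  channelAdded: 'catalog',
};

// Flatten nested objects into dotted column names, e.g. metrics.subscribers.
function flatten(record, prefix = '') {
  const row = {};
  for (const [key, value] of Object.entries(record)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(row, flatten(value, column));
    } else {
      row[column] = value;
    }
  }
  return row;
}

// Quote a field when it contains a comma, double quote, or line break.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a header row from the union of all columns, then one line per record.
function toCsv(records) {
  const rows = records.map((record) => flatten(record));
  const columns = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const lines = [columns, ...rows.map((row) => columns.map((c) => row[c]))];
  return lines.map((line) => line.map(csvField).join(',')).join('\n');
}

console.log(toCsv([channel]));
```

The description field contains commas, so it ends up quoted in the output, while numeric fields are written as-is.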

Dependencies

Node.js >= 14.0.0
jsdom (for Node.js DOM parsing)
node-fetch (for Node.js HTTP requests)

License

MIT