@ts-utilkit/webscraping
v0.2.0
Web scraping utilities for HTML parsing, data extraction, and web content processing
Web Scraping Functions - TypeScript utility functions for web scraping operations.
Installation
npm install @ts-utilkit/webscraping
Features
- 🚀 TypeScript-first with complete type definitions
- ✅ Comprehensive test coverage (>95%)
- 📦 Tree-shakeable ESM and CommonJS support
- 🔒 Type-safe with strict TypeScript configuration
- 📖 Extensive JSDoc documentation
Available Functions (16)
- extractComments: Extracts HTML comments from content
- extractEmails: Extracts email addresses from HTML
- extractHeaders: Extracts header tags (h1-h6) from HTML
- extractImages: Extracts all image sources from HTML
- extractLinks: Extracts all hyperlinks from HTML
- extractListItems: Extracts items from HTML lists (ul, ol)
- extractMetaTags: Extracts meta tags from HTML
- extractPhoneNumbers: Extracts phone numbers from HTML
- extractScripts: Extracts JavaScript from script tags
- extractStructuredData: Parses JSON-LD structured data
- extractStyles: Extracts CSS from style tags
- extractTables: Extracts data from HTML tables
- extractText: Extracts all text content from HTML
- fetchHTML: Fetches HTML content from a URL
- parseJSON: Parses JSON with error handling
- sanitizeHTML: Removes potentially dangerous HTML content
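The list above says what each helper does but not how it might work. As a rough illustration only, here is one simple, dependency-free way to implement three of the described behaviors (hyperlink extraction, email extraction, and error-tolerant JSON parsing). The names `extractLinksSketch`, `extractEmailsSketch`, and `parseJSONSketch` and their signatures are hypothetical and are not this package's API; the real `extractLinks`, `extractEmails`, and `parseJSON` may take different arguments, parse HTML differently, and return different shapes.

```typescript
// Hypothetical sketches of three behaviors listed above (extractLinks,
// extractEmails, parseJSON). These are NOT the package's implementations;
// the "Sketch" suffix marks them as illustrative stand-ins.

/** Collects quoted href values from <a> tags using a regex (no DOM parsing). */
function extractLinksSketch(html: string): string[] {
  const links: string[] = [];
  // Group 1 captures the opening quote so group 2 stops at the matching quote.
  const anchorHref = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = anchorHref.exec(html)) !== null) {
    links.push(match[2]);
  }
  return links;
}

/** Finds email-like strings, dropping duplicates but keeping first-seen order. */
function extractEmailsSketch(html: string): string[] {
  const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  return Array.from(new Set<string>(html.match(emailPattern) ?? []));
}

/** Parses JSON, returning the fallback instead of throwing on malformed input. */
function parseJSONSketch<T>(text: string, fallback: T): T {
  try {
    return JSON.parse(text) as T;
  } catch {
    return fallback;
  }
}

// Usage
const page = `<p>Contact <a href="mailto:info@example.com">info@example.com</a> or <a href='/about'>About</a></p>`;
console.log(extractLinksSketch(page)); // → ["mailto:info@example.com", "/about"]
console.log(extractEmailsSketch(page)); // → ["info@example.com"]
console.log(parseJSONSketch<{ ok: boolean } | null>('{"ok": true}', null)); // → { ok: true }
console.log(parseJSONSketch<{ ok: boolean } | null>("{not json", null)); // → null
```

Regular expressions keep this sketch self-contained, but they miss cases such as unquoted `href` attributes or HTML-entity-encoded addresses, which a DOM-based parser handles more reliably.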
Quick Example
import { extractText } from '@ts-utilkit/webscraping';
// Extract text content from HTML
const html = '<div><p>Hello</p><p>World</p></div>';
const text = extractText(html); // 'Hello World'
License
MIT © Mykyta Forofontov
Contributing
Contributions are welcome! Please see the main repository for contribution guidelines.
