# @kreisler/js-scraper

v1.0.0

A powerful TypeScript/JavaScript web scraping library with JSDOM HTML parsing and Cloudflare bypass support.
## Features
- 🚀 Easy to use - Simple API for web scraping
- 🛡️ Cloudflare bypass - Built-in support for Cloudflare protection
- 📄 HTML Parsing - JSDOM-based DOM manipulation with jQuery-like syntax
- 🔄 Automatic retries - Configurable retry logic with exponential backoff
- 📊 Metadata extraction - Get response metadata (status, headers, content type, size, response time)
- 🎯 TypeScript support - Full TypeScript type definitions
- ⚡ Lightweight - Minimal footprint with only essential dependencies
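The automatic-retry feature above follows the usual exponential-backoff pattern. As a rough illustration of that pattern (a standalone sketch, not the package's source; the `retryWithBackoff` name and its defaults are invented for this example):

```typescript
// Illustrative sketch of retry with exponential backoff, the pattern the
// feature list describes. NOT the package's actual implementation.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (attempt < retries) {
        // Wait 100 ms, 200 ms, 400 ms, ... before the next attempt
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      }
    }
  }
  throw lastError
}
```

With `retries: 3`, a request is attempted up to four times before the last error is rethrown.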
## Installation

```bash
npm install @kreisler/js-scraper
# or
yarn add @kreisler/js-scraper
# or
pnpm add @kreisler/js-scraper
```

## Quick Start
### Basic Scraping

```typescript
import { scrape } from '@kreisler/js-scraper'

const dom = await scrape('https://example.com')

// Extract elements
const title = dom.$('title')?.textContent
const paragraphs = dom.$$('p')
```

### With Metadata
```typescript
import { scrapeWithMetadata } from '@kreisler/js-scraper'

const response = await scrapeWithMetadata('https://example.com')
console.log(`Status: ${response.statusCode}`)
console.log(`Content-Type: ${response.contentType}`)
console.log(`Response Time: ${response.responseTime}ms`)
console.log(`Content Size: ${response.size} bytes`)
```

### Advanced Usage
```typescript
import { RequestService, jQuery } from '@kreisler/js-scraper'

// Custom options
const html = await RequestService.fetchData({
  url: 'https://example.com',
  method: 'GET',
  timeout: 30000,
  retries: 3,
  headers: {
    'Custom-Header': 'value'
  }
})

// Parse HTML
const dom = jQuery(html)

// DOM manipulation
const $ = dom.$   // querySelector
const $$ = dom.$$ // querySelectorAll
const title = dom.title
const document = dom.document
```

### Utility Functions
```typescript
import { scrape, getTexts, getAttrs, getText, getAttr } from '@kreisler/js-scraper'

const dom = await scrape('https://example.com')

// Get text from a single element
const heading = getText(dom.$('h1'))

// Get an attribute from a single element
const link = getAttr(dom.$('a'), 'href')

// Get text from multiple elements
const paragraphs = getTexts(dom.$$('p'))

// Get attributes from multiple elements
const links = getAttrs(dom.$$('a'), 'href')
```

## API Reference
### `scrape(url, options?)`
Fetch and parse HTML content from a URL.
**Parameters:**

- `url` (string) - The URL to scrape
- `options` (object, optional)
  - `timeout` (number) - Request timeout in ms (default: 30000)
  - `retries` (number) - Number of retry attempts (default: 3)
  - `headers` (object) - Custom headers

**Returns:** DOM object with jQuery-like syntax
### `scrapeWithMetadata(url, options?)`
Fetch content with metadata.
**Parameters:**

- `url` (string) - The URL to fetch
- `options` (object, optional)
  - `timeout` (number) - Request timeout in ms
  - `headers` (object) - Custom headers

**Returns:** `ParsedResponse` object with:

- `content` (string) - HTML content
- `statusCode` (number) - HTTP status code
- `headers` (object) - Response headers
- `contentType` (string) - Content type
- `size` (number) - Content size in bytes
- `responseTime` (number) - Response time in ms
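The `size` and `responseTime` fields combine naturally into derived metrics. A small illustration using mock data rather than a live request (the `ParsedResponse` interface below is transcribed from the field list above; the `throughputKBps` helper is hypothetical and not part of the package):

```typescript
// Shape transcribed from the ParsedResponse field list in this README.
interface ParsedResponse {
  content: string
  statusCode: number
  headers: Record<string, string>
  contentType: string
  size: number
  responseTime: number
}

// Estimate download throughput in KB/s from size (bytes) and responseTime (ms).
function throughputKBps(res: Pick<ParsedResponse, 'size' | 'responseTime'>): number {
  return (res.size / 1024) / (res.responseTime / 1000)
}
```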
### `ScraperService.fetchData(options)`
Low-level fetch with retry logic.
**Parameters:** `FetchOptions`

- `url` (string)
- `method` (string) - 'GET', 'POST', 'HEAD'
- `headers` (object, optional)
- `timeout` (number, optional)
- `retries` (number, optional)

**Returns:** `Promise<string>` - HTML content
### `ScraperService.fetchWithMetadata(options)`

Fetch with full metadata.

**Returns:** `Promise<ParsedResponse>`
### `jQuery(html)`
Parse HTML string and return DOM object.
**Returns:** DOM object with:

- `$(selector)` - querySelector
- `$$(selector)` - querySelectorAll
- `document` - JSDOM document
- `title` - Page title
### Utility Functions
- `getText(element)` - Get text content from element
- `getAttr(element, attr)` - Get attribute from element
- `getTexts(elements)` - Get text from multiple elements
- `getAttrs(elements, attr)` - Get attributes from multiple elements
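For readers curious how these helpers behave on missing elements, here is one plausible null-safe implementation (an illustration only; the package's actual source may differ, e.g. in its fallback values):

```typescript
// Minimal structural types standing in for DOM elements, so the sketch
// is self-contained. Each helper falls back to '' when input is missing.
type TextNode = { textContent: string | null }
type AttrNode = { getAttribute(name: string): string | null }

function getText(el: TextNode | null): string {
  return el?.textContent ?? ''
}

function getAttr(el: AttrNode | null, attr: string): string {
  return el?.getAttribute(attr) ?? ''
}

function getTexts(els: Iterable<TextNode>): string[] {
  return Array.from(els, el => el.textContent ?? '')
}

function getAttrs(els: Iterable<AttrNode>, attr: string): string[] {
  return Array.from(els, el => el.getAttribute(attr) ?? '')
}
```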
## Error Handling
```typescript
import { scrape, FetchError } from '@kreisler/js-scraper'

try {
  const dom = await scrape('https://example.com')
} catch (error) {
  if (error instanceof FetchError) {
    console.error(`Failed to fetch ${error.url}: ${error.message}`)
    console.error(`Status: ${error.statusCode}`)
  }
}
```

## Examples
### Scrape Product Information
```typescript
import { scrape } from '@kreisler/js-scraper'

async function scrapeProducts(url) {
  const dom = await scrape(url)
  const products = dom.$$('.product-item')

  return Array.from(products).map(product => ({
    name: product.querySelector('.name')?.textContent,
    price: product.querySelector('.price')?.textContent,
    link: product.querySelector('a')?.getAttribute('href')
  }))
}
```

### Scrape Data Table
```typescript
import { scrape } from '@kreisler/js-scraper'

async function scrapeTable(url) {
  const dom = await scrape(url)
  const rows = dom.$$('table tbody tr')

  return Array.from(rows).map(row => {
    const cells = row.querySelectorAll('td')
    return Array.from(cells).map(cell => cell.textContent)
  })
}
```

## Configuration
### Disable SSL Certificate Verification (Development Only)

```typescript
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0'
```

### Custom User-Agent
```typescript
import { scrape } from '@kreisler/js-scraper'

const dom = await scrape('https://example.com', {
  headers: {
    'User-Agent': 'Custom User Agent'
  }
})
```

## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
