scrapwave

v1.4.4

Published

8 months ago

A lightweight and powerful web scraping library for Node.js with TypeScript support.

0High
0Medium
0Low

brogrammer_uz

web scraping scraper crawler data extraction html parser typescript puppeteer

ScrapWave - Lightweight Web Scraping Library

ScrapWave is a lightweight and powerful web scraping library built on top of got and cheerio. It allows you to fetch, parse, and extract data from web pages with ease.

npm npm GitHub

🚀 Installation

npm install scrapwave

or using Yarn:

yarn add scrapwave

📖 Quick Start

import ScrapWave from "scrapwave";

(async () => {
  const scrapper = await ScrapWave.connect("https://example.com");
  console.log(scrapper.getTitle());
})();

🎯 Features

Fetch and parse web pages with ease.
Extract metadata, links, images, emails, and phone numbers.
Scrape table data and structured JSON-LD content.
Supports POST requests with form data.
Configurable request options (timeout, retries, headers, etc.).
Download images automatically.

📌 API Methods

`ScrapWave.connect(url: string): Promise<ScrapWave>`

Fetches the HTML content of the given URL and returns a ScrapWave instance.

`ScrapWave.post(url: string, data: Record<string, string>): Promise<ScrapWave>`

Sends a POST request with form data and returns a ScrapWave instance.

`scrapper.getTitle(): string`

Returns the title of the page.

`scrapper.getMetadata(): Metadata`

Extracts metadata (title, description, author, Open Graph & Twitter metadata).

`scrapper.getLinks(selector: string): string[]`

Extracts all links from the given selector.

`scrapper.imageSources(selector = "img"): string[]`

Extracts image URLs from the page.

`scrapper.extractEmails(): string[]`

Finds and returns email addresses from the page content.

`scrapper.extractPhones(): string[]`

Finds and returns phone numbers from the page content.

`scrapper.getJsonLD(): JsonLD`

Extracts JSON-LD structured data from the page.

`scrapper.getForms(): FormDetails[]`

Finds and extracts form details (action, method, input fields).

`scrapper.getTextList(selector: string): string[]`

Extracts a list of text content from elements like <li>.

`scrapper.outerHtml(selector: string): string`

Extracts the outer HTML of the given selector.

`scrapper.tableData(selector: string): TableData`

Extracts table data from the given selector.

`scrapper.text(selector: string): string`

Extracts the text content of the given selector.

`scrapper.html(selector: string): string`

Extracts the inner HTML of the given selector.

`scrapper.attr(selector: string, attribute: string): string | undefined`

Retrieves the value of a specific attribute from the given selector.

`scrapper.exists(selector: string): boolean`

Checks whether a specific selector exists on the page.

`scrapper.count(selector: string): number`

Counts the number of elements that match the given selector.

`scrapper.downloadImages(folder = "images"): Promise<void>`

Downloads all images from the page into the specified folder.

`scrapper.setRequestOptions(options: RequestOptions): void`

Allows customizing request settings such as timeout, retries, headers, etc.

⚙️ Customizing Request Options

ScrapWave.setRequestOptions({
  timeout: { request: 4000 }, // Set timeout to 4s
  retry: { limit: 3 }, // Allow up to 3 retries
});

🛠️ Usage Examples

Extracting Links

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getLinks("a"));

Scraping Table Data

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.tableData("table"));

Downloading Images

const scrapper = await ScrapWave.connect("https://example.com");
await scrapper.downloadImages("downloads");

Extracting Emails & Phone Numbers

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Emails:", scrapper.extractEmails());
console.log("Phones:", scrapper.extractPhones());

Extracting Text & HTML

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Text:", scrapper.text("p"));
console.log("HTML:", scrapper.html("div"));
console.log("Outer HTML:", scrapper.outerHtml("div"));

Extracting Metadata

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getMetadata());

Extracting List Items

const scrapper = await ScrapWave.connect("https://example.com");
console.log(scrapper.getTextList("ul"));

Checking for Element Existence & Counting Elements

const scrapper = await ScrapWave.connect("https://example.com");
console.log("Exists:", scrapper.exists("h1"));
console.log("Count:", scrapper.count("li"));

📌 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

📜 License

MIT License. Feel free to use and modify as needed.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

ScrapWave - Lightweight Web Scraping Library

🚀 Installation

📖 Quick Start

🎯 Features

📌 API Methods

ScrapWave.connect(url: string): Promise<ScrapWave>

ScrapWave.post(url: string, data: Record<string, string>): Promise<ScrapWave>

scrapper.getTitle(): string

scrapper.getMetadata(): Metadata

scrapper.getLinks(selector: string): string[]

scrapper.imageSources(selector = "img"): string[]

scrapper.extractEmails(): string[]

scrapper.extractPhones(): string[]

scrapper.getJsonLD(): JsonLD

scrapper.getForms(): FormDetails[]

scrapper.getTextList(selector: string): string[]

scrapper.outerHtml(selector: string): string

scrapper.tableData(selector: string): TableData

scrapper.text(selector: string): string

scrapper.html(selector: string): string

scrapper.attr(selector: string, attribute: string): string | undefined

scrapper.exists(selector: string): boolean

scrapper.count(selector: string): number

scrapper.downloadImages(folder = "images"): Promise<void>

scrapper.setRequestOptions(options: RequestOptions): void