clearscrape v1.1.0 · Official Node.js SDK for ClearScrape - Web Scraping API
# ClearScrape Node.js SDK
Official Node.js client for the ClearScrape web scraping API.
## Features

- Simple, promise-based API
- Full TypeScript support
- Automatic retries with exponential backoff
- Support for all ClearScrape features:
  - JavaScript rendering
  - Premium residential proxies
  - Antibot bypass
  - Screenshots
  - Domain-specific extractors (Amazon, Walmart, Google, etc.)
- Scraping Browser (Playwright/Puppeteer)
- Residential Proxy service
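The README doesn't spell out how the built-in retry behaves beyond "exponential backoff". As a rough illustration of the pattern (a hypothetical sketch, not the SDK's actual implementation), a retry wrapper doubles its delay after each failed attempt:

```typescript
// Generic retry helper illustrating exponential backoff.
// Hypothetical sketch -- not the SDK's internal implementation.
async function withRetries<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Delay doubles each attempt: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```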
## Installation

```bash
npm install clearscrape
# or
yarn add clearscrape
# or
pnpm add clearscrape
```

## Quick Start
```typescript
import { ClearScrape } from 'clearscrape';

const client = new ClearScrape({
  apiKey: process.env.CLEARSCRAPE_API_KEY
});

// Basic scrape
const result = await client.scrape({
  url: 'https://example.com'
});

console.log(result.data.html);
```

## Usage Examples
### Basic Scraping

```typescript
// Simple HTML fetch
const result = await client.scrape({
  url: 'https://example.com'
});

// Get just the HTML
const html = await client.getHtml('https://example.com');

// Get just the text content
const text = await client.getText('https://example.com');
```

### JavaScript Rendering
Enable JavaScript rendering for dynamic websites (SPAs, React, Vue, etc.):
```typescript
const result = await client.scrape({
  url: 'https://example.com/spa-page',
  jsRender: true,
  waitFor: '.product-list', // Wait for this element to appear
  wait: 3000                // Additional wait time (ms)
});
```

### Premium Proxies
Use residential proxies to avoid blocks and geo-target:
```typescript
const result = await client.scrape({
  url: 'https://example.com',
  premiumProxy: true,
  proxyCountry: 'us' // Target a specific country
});
```

### Antibot Bypass
Bypass Cloudflare, DataDome, PerimeterX and other bot protection:
```typescript
const result = await client.scrape({
  url: 'https://protected-site.com',
  antibot: true,
  premiumProxy: true
});
```

### Screenshots
Capture screenshots of web pages:
```typescript
import fs from 'fs';

// Get a base64 screenshot
const screenshot = await client.screenshot('https://example.com');

// Save to file
fs.writeFileSync('screenshot.png', Buffer.from(screenshot, 'base64'));

// Screenshot a specific element
const result = await client.scrape({
  url: 'https://example.com',
  jsRender: true,
  screenshotSelector: '.product-card'
});
```

### Domain Extractors
Extract structured data from supported websites:
```typescript
import { AmazonProduct, GoogleSerpResult } from 'clearscrape';

// Amazon product data
const product = await client.extract<AmazonProduct>(
  'https://www.amazon.com/dp/B09V3KXJPB',
  'amazon'
);

console.log(product.title);       // "Apple AirPods Pro..."
console.log(product.price);       // "$249.00"
console.log(product.rating);      // "4.7"
console.log(product.reviewCount); // "125,432"

// Google SERP data
const serp = await client.extract<GoogleSerpResult>(
  'https://www.google.com/search?q=best+laptops',
  'google'
);

console.log(serp.organicResults[0].title);
console.log(serp.featuredSnippet);
console.log(serp.relatedSearches);
```

Supported domains:

- `amazon` - Product pages
- `walmart` - Product pages
- `google` - Search results
- `google_shopping` - Shopping results
- `ebay` - Product pages
- `target` - Product pages
- `etsy` - Product pages
- `bestbuy` - Product pages
- `homedepot` - Product pages
- `zillow` - Property listings
- `yelp` - Business pages
- `indeed` - Job listings
- `linkedin_jobs` - Job listings
### Scraping Browser (Playwright/Puppeteer)
Connect to cloud browsers with built-in antibot bypass:
```typescript
// With Playwright
import { chromium } from 'playwright';

const browser = await chromium.connectOverCDP(
  client.getBrowserWsUrl()
);
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.title();
await browser.close();
```

```typescript
// With Puppeteer
import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: client.getBrowserWsUrl()
});
const page = await browser.newPage();
await page.goto('https://example.com');
await browser.close();
```

```typescript
// With country targeting
const wsUrl = client.getBrowserWsUrl({ proxyCountry: 'gb' });
```

### Residential Proxies
Use ClearScrape proxies with any HTTP client:
```typescript
// Get proxy configuration
const proxy = client.getProxyConfig();
// { host: 'proxy.clearscrape.io', port: 8000, username: '...', password: '...' }

// Get proxy URL string
const proxyUrl = client.getProxyUrl();
// 'http://apikey:...@proxy.clearscrape.io:8000'

// With country targeting
const usProxyUrl = client.getProxyUrl({ country: 'us' });

// With a sticky session IP
const sessionProxyUrl = client.getProxyUrl({ session: 'my-session-123' });

// Combined
const combinedProxyUrl = client.getProxyUrl({ country: 'us', session: 'abc' });
```

Use with axios:
```typescript
import axios from 'axios';
import { HttpsProxyAgent } from 'https-proxy-agent';

const agent = new HttpsProxyAgent(client.getProxyUrl({ country: 'us' }));

const response = await axios.get('https://httpbin.org/ip', {
  httpsAgent: agent
});
```

Use with node-fetch:
```typescript
import fetch from 'node-fetch';
import { HttpsProxyAgent } from 'https-proxy-agent';

const agent = new HttpsProxyAgent(client.getProxyUrl());
const response = await fetch('https://httpbin.org/ip', { agent });
```

## Configuration
```typescript
const client = new ClearScrape({
  // Required: your API key
  apiKey: 'your-api-key',

  // Optional: custom base URL (default: https://clearscrape.io/api)
  baseUrl: 'https://clearscrape.io/api',

  // Optional: request timeout in ms (default: 60000)
  timeout: 60000,

  // Optional: number of retries (default: 3)
  retries: 3
});
```

## Error Handling
```typescript
import {
  ClearScrape,
  ClearScrapeError,
  InsufficientCreditsError,
  RateLimitError
} from 'clearscrape';

try {
  const result = await client.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof InsufficientCreditsError) {
    console.log(`Need ${error.required} credits`);
  } else if (error instanceof RateLimitError) {
    console.log('Rate limited, try again later');
  } else if (error instanceof ClearScrapeError) {
    console.log(`Error ${error.statusCode}: ${error.message}`);
  }
}
```

## TypeScript
The SDK is written in TypeScript and includes full type definitions:
```typescript
import {
  ClearScrape,
  ScrapeOptions,
  ScrapeResponse,
  AmazonProduct,
  GoogleSerpResult,
  DomainType
} from 'clearscrape';

// Full type safety
const options: ScrapeOptions = {
  url: 'https://example.com',
  jsRender: true,
  premiumProxy: true
};

const result: ScrapeResponse = await client.scrape(options);
```

## API Reference
### ClearScrape

Main client class.

#### Constructor

```typescript
new ClearScrape(config: ClearScrapeConfig)
```

#### Methods
| Method | Description |
|--------|-------------|
| `scrape(options)` | Scrape a URL with full options |
| `getHtml(url, options?)` | Get HTML content only |
| `getText(url, options?)` | Get text content only |
| `screenshot(url, options?)` | Capture a screenshot |
| `extract(url, domain)` | Extract structured data |
| `getProxyConfig(options?)` | Get the proxy configuration object |
| `getProxyUrl(options?)` | Get the proxy URL string |
| `getBrowserWsUrl(options?)` | Get the Scraping Browser WebSocket URL |
#### ScrapeOptions

| Option | Type | Description |
|--------|------|-------------|
| `url` | string | Target URL (required) |
| `method` | string | HTTP method (default: `GET`) |
| `jsRender` | boolean | Enable JS rendering (+5 credits) |
| `premiumProxy` | boolean | Use a residential proxy (+10 credits) |
| `antibot` | boolean | Enable antibot bypass (+25 credits) |
| `proxyCountry` | string | Two-letter country code |
| `waitFor` | string | CSS selector to wait for |
| `wait` | number | Wait time in ms |
| `autoScroll` | boolean | Scroll to load lazy content |
| `screenshot` | boolean | Capture a screenshot |
| `screenshotSelector` | string | Screenshot a specific element |
| `headers` | object | Custom HTTP headers |
| `body` | string \| object | Request body |
| `domain` | DomainType | Domain extractor |
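The `method`, `headers`, and `body` options combine to scrape endpoints that expect a POST. A sketch of such a request (the target URL, header values, and payload here are illustrative placeholders, not a real endpoint):

```typescript
// Illustrative ScrapeOptions payload for a POST scrape.
// URL, headers, and body are placeholder values.
const postOptions = {
  url: 'https://example.com/api/search',
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: { query: 'laptops', page: 1 } // body may be a string or an object
};

// Then pass it to the client as usual:
// const result = await client.scrape(postOptions);
```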
## Credits

| Feature | Cost |
|---------|------|
| Base request | 1 credit |
| + JavaScript rendering | +5 credits |
| + Premium proxy | +10 credits |
| + Antibot bypass | +25 credits |
| Domain API extraction | 25 credits |
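Per the table, feature costs stack on top of the 1-credit base request, while domain extraction is billed at a flat 25 credits. A small helper to estimate cost before sending a request (hypothetical, not part of the SDK; the assumption that domain extraction replaces rather than adds to the other costs is ours, based on the table's wording):

```typescript
// Hypothetical cost estimator based on the pricing table above.
// Not part of the clearscrape SDK.
interface CostOptions {
  jsRender?: boolean;
  premiumProxy?: boolean;
  antibot?: boolean;
  domain?: string; // assumed: flat 25 credits, replacing other costs
}

function estimateCredits(options: CostOptions = {}): number {
  if (options.domain) return 25; // Domain API extraction
  let credits = 1;               // Base request
  if (options.jsRender) credits += 5;
  if (options.premiumProxy) credits += 10;
  if (options.antibot) credits += 25;
  return credits;
}
```

For example, a request with JS rendering, a premium proxy, and antibot bypass would cost 1 + 5 + 10 + 25 = 41 credits.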
## Support

## License

MIT
