@scrapeless-ai/sdk v1.11.0
Scrapeless Node SDK
The official Node.js SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.
📑 Table of Contents
- 🌟 Features
- 📦 Installation
- 🚀 Quick Start
- 📖 Usage Examples
- 🔧 API Reference
- 📚 Examples
- 🧪 Testing
- 🛠️ Contributing & Development Guide
- 📄 License
- 📞 Support
- 🏢 About Scrapeless
🌟 Features
- Browser: Advanced browser session management supporting Playwright and Puppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
- TypeScript Support: Full TypeScript definitions for a better development experience
📦 Installation
Install the SDK using npm:
```bash
npm install @scrapeless-ai/sdk
```
Or using yarn:
```bash
yarn add @scrapeless-ai/sdk
```
Or using pnpm:
```bash
pnpm add @scrapeless-ai/sdk
```
🚀 Quick Start
Prerequisite
Log in to the Scrapeless Dashboard and get your API key.
Basic Setup
```javascript
import { Scrapeless } from '@scrapeless-ai/sdk';

// Initialize the client
const client = new Scrapeless({
  apiKey: 'your-api-key' // Get your API key from https://scrapeless.com
});
```
Environment Variables
You can also configure the SDK using environment variables:
```bash
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
📖 Usage Examples
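The examples below construct the client as `new Scrapeless()` with no arguments, which relies on `SCRAPELESS_API_KEY` being set in the environment; by the usual SDK convention, an explicit `apiKey` option would take precedence. A minimal sketch of that precedence (the `resolveApiKey` helper is illustrative only, not part of the SDK):

```javascript
// Illustrative helper: an explicit `apiKey` option wins over the
// SCRAPELESS_API_KEY environment variable. Not part of the SDK itself.
function resolveApiKey(options = {}, env = process.env) {
  return options.apiKey ?? env.SCRAPELESS_API_KEY;
}

// Explicit option takes priority over the environment:
console.log(resolveApiKey({ apiKey: 'explicit-key' }, { SCRAPELESS_API_KEY: 'env-key' })); // 'explicit-key'
// With no option, the environment variable is used:
console.log(resolveApiKey({}, { SCRAPELESS_API_KEY: 'env-key' })); // 'env-key'
```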
Browser
Advanced browser session management supporting Playwright and Puppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```javascript
import { Scrapeless } from '@scrapeless-ai/sdk';
import puppeteer from 'puppeteer-core';

const client = new Scrapeless();

// Create a browser session
const { browserWSEndpoint } = await client.browser.create({
  sessionName: 'my-session',
  sessionTTL: 180,
  proxyCountry: 'US'
});

// Connect with Puppeteer
const browser = await puppeteer.connect({
  browserWSEndpoint
});

const page = await browser.newPage();
await page.goto('https://example.com');
console.log(await page.title());
await browser.close();
```
Crawl
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
```javascript
const result = await client.scrapingCrawl.scrapeUrl('https://example.com');
console.log(result);
```
Scraping API
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```javascript
const result = await client.scraping.scrape({
  actor: 'scraper.shopee',
  input: {
    url: 'https://shopee.tw/a-i.10228173.24803858474'
  }
});
console.log(result.data);
```
Deep SerpApi
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```javascript
const results = await client.deepserp.scrape({
  actor: 'scraper.google.search',
  input: {
    q: 'nike site:www.nike.com'
  }
});
console.log(results);
```
Actor
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```javascript
// Run an actor (assumes `actor.id` refers to an actor you created earlier)
const run = await client.actor.run(actor.id, {
  input: { url: 'https://example.com' },
  runOptions: {
    CPU: 2,
    memory: 2048,
    timeout: 3600,
    version: 'v1.0.0'
  }
});
console.log('Actor run result:', run);
```
Profiles
Manage browser profiles for persistent sessions.
```javascript
const createResponse = await client.profiles.create('My Profile');
console.log('Profile created:', createResponse);
```
🔧 API Reference
Client Configuration
```typescript
interface ScrapelessConfig {
  apiKey?: string; // Your API key
  timeout?: number; // Request timeout in milliseconds (default: 30000)
  baseApiUrl?: string; // Base API URL
  actorApiUrl?: string; // Actor service URL
  storageApiUrl?: string; // Storage service URL
  browserApiUrl?: string; // Browser service URL
  scrapingCrawlApiUrl?: string; // Crawl service URL
}
```
Available Services
The SDK provides the following services through the main client:
- `client.browser` - Browser automation with Playwright/Puppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export
- `client.scrapingCrawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links)
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews
- `client.deepserp` - Search engine results extraction
- `client.proxies` - Proxy management
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management
- `client.storage` - Data storage solutions
Error Handling
The SDK throws ScrapelessError for API-related errors:
```javascript
import { ScrapelessError } from '@scrapeless-ai/sdk';

try {
  const result = await client.scraping.scrape({ url: 'invalid-url' });
} catch (error) {
  if (error instanceof ScrapelessError) {
    console.error(`Scrapeless API Error: ${error.message}`);
    console.error(`Status Code: ${error.statusCode}`);
  }
}
```
📚 Examples
Check out the examples directory in the repository for comprehensive usage examples.
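One pattern worth borrowing in your own code is retrying transient failures such as HTTP 429 and 5xx responses. The wrapper below is a hypothetical sketch, assuming only that thrown errors expose the numeric `statusCode` field shown under Error Handling:

```javascript
// Retry an async operation when it fails with a retryable status code
// (429 or 5xx). Waits with a linear backoff between attempts and
// rethrows the last error once retries are exhausted.
async function withRetries(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = error.statusCode === 429 || error.statusCode >= 500;
      if (!retryable || attempt >= retries) throw error;
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * (attempt + 1)));
    }
  }
}
```

For example, `await withRetries(() => client.deepserp.scrape({ actor: 'scraper.google.search', input: { q: 'nike' } }))` would retry a rate-limited search a few times before giving up.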
🧪 Testing
Run the test suite:
```bash
npm test
```
The SDK includes comprehensive tests for all services and utilities.
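When testing your own code that calls the SDK, you can stub the client surface instead of hitting the network. A hypothetical sketch: the `fakeClient` shape mirrors the `scraping.scrape` example above, and `getTitle` is an invented function under test, not part of the SDK:

```javascript
// Code under test: accepts the client as a dependency so it can be stubbed.
async function getTitle(client, url) {
  const result = await client.scraping.scrape({ actor: 'scraper.example', input: { url } });
  return result.data.title;
}

// Hypothetical stub implementing just the surface `getTitle` touches.
const fakeClient = {
  scraping: {
    scrape: async () => ({ data: { title: 'stubbed title' } })
  }
};

console.log(await getTitle(fakeClient, 'https://example.com')); // 'stubbed title'
```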
🛠️ Contributing & Development Guide
We welcome all contributions! For details on how to report issues, submit pull requests, follow code style, and set up local development, please see our Contributing & Development Guide.
Quick Start:
```bash
git clone https://github.com/scrapeless-ai/sdk-node.git
cd sdk-node
pnpm install
pnpm test
pnpm lint
pnpm format
```
See CONTRIBUTING.md for full details on the contribution process, development workflow, code quality, project structure, best practices, and more.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📞 Support
- 📖 Documentation: https://docs.scrapeless.com
- 💬 Community: Join our Discord
- 🐛 Issues: GitHub Issues
- 📧 Email: [email protected]
🏢 About Scrapeless
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit scrapeless.com to learn more and get started.
Made with ❤️ by the Scrapeless team
