@scrapeless-ai/sdk v1.11.0
Scrapeless Node SDK
The official Node.js SDK for Scrapeless AI - End-to-End Data Infrastructure for AI Developers & Enterprises.
📑 Table of Contents
- 🌟 Features
- 📦 Installation
- 🚀 Quick Start
- 📖 Usage Examples
- 🔧 API Reference
- 📚 Examples
- 🧪 Testing
- 🛠️ Contributing & Development Guide
- 📄 License
- 📞 Support
- 🏢 About Scrapeless
🌟 Features
- Browser: Advanced browser session management supporting Playwright and Puppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows.
- Universal Scraping API: Web interaction and data extraction with full browser capabilities. Execute JavaScript rendering, simulate user interactions (clicks, scrolls), bypass anti-scraping measures, and export structured data in multiple formats.
- Crawl: Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
- Scraping API: Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors.
- Deep SerpApi: Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates.
- Proxies: Geo-targeted proxy network with 195+ countries. Optimize requests for better success rates and regional data access.
- Actor: Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management.
- Storage Solutions: Scalable data storage solutions for crawled content, supporting seamless integration with cloud services and databases.
- TypeScript Support: Full TypeScript definitions for a better development experience
📦 Installation
Install the SDK using npm:
```bash
npm install @scrapeless-ai/sdk
```
Or using yarn:
```bash
yarn add @scrapeless-ai/sdk
```
Or using pnpm:
```bash
pnpm add @scrapeless-ai/sdk
```
🚀 Quick Start
Prerequisite
Log in to the Scrapeless Dashboard and get your API key.
Basic Setup
```javascript
import { Scrapeless } from '@scrapeless-ai/sdk';

// Initialize the client
const client = new Scrapeless({
  apiKey: 'your-api-key' // Get your API key from https://scrapeless.com
});
```
Environment Variables
You can also configure the SDK using environment variables:
```bash
# Required
SCRAPELESS_API_KEY=your-api-key

# Optional - Custom API endpoints
SCRAPELESS_BASE_API_URL=https://api.scrapeless.com
SCRAPELESS_ACTOR_API_URL=https://actor.scrapeless.com
SCRAPELESS_STORAGE_API_URL=https://storage.scrapeless.com
SCRAPELESS_BROWSER_API_URL=https://browser.scrapeless.com
SCRAPELESS_CRAWL_API_URL=https://api.scrapeless.com
```
📖 Usage Examples
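The examples below construct the client as `new Scrapeless()` with no arguments, which relies on `SCRAPELESS_API_KEY` being set in the environment; by the usual SDK convention, an explicit `apiKey` option would take precedence. A minimal sketch of that precedence (the `resolveApiKey` helper is illustrative only, not part of the SDK):

```javascript
// Illustrative helper: an explicit `apiKey` option wins over the
// SCRAPELESS_API_KEY environment variable. Not part of the SDK itself.
function resolveApiKey(options = {}, env = process.env) {
  return options.apiKey ?? env.SCRAPELESS_API_KEY;
}

// Explicit option takes priority over the environment:
console.log(resolveApiKey({ apiKey: 'explicit-key' }, { SCRAPELESS_API_KEY: 'env-key' })); // 'explicit-key'
// With no option, the environment variable is used:
console.log(resolveApiKey({}, { SCRAPELESS_API_KEY: 'env-key' })); // 'env-key'
```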
Browser
Advanced browser session management supporting Playwright and Puppeteer frameworks, with configurable anti-detection capabilities (e.g., fingerprint spoofing, CAPTCHA solving) and extensible automation workflows:
```javascript
import { Scrapeless } from '@scrapeless-ai/sdk';
import puppeteer from 'puppeteer-core';

const client = new Scrapeless();

// Create a browser session
const { browserWSEndpoint } = await client.browser.create({
  sessionName: 'my-session',
  sessionTTL: 180,
  proxyCountry: 'US'
});

// Connect with Puppeteer
const browser = await puppeteer.connect({
  browserWSEndpoint
});

const page = await browser.newPage();
await page.goto('https://example.com');
console.log(await page.title());
await browser.close();
```
Crawl
Extract data from single pages or traverse entire domains, exporting in formats including Markdown, JSON, HTML, screenshots, and links.
```javascript
const result = await client.scrapingCrawl.scrapeUrl('https://example.com');
console.log(result);
```
Scraping API
Direct data extraction APIs for websites (e.g., e-commerce, travel platforms). Retrieve structured product information, pricing, and reviews with pre-built connectors:
```javascript
const result = await client.scraping.scrape({
  actor: 'scraper.shopee',
  input: {
    url: 'https://shopee.tw/a-i.10228173.24803858474'
  }
});
console.log(result.data);
```
Deep SerpApi
Google SERP data extraction API. Fetch organic results, news, images, and more with customizable parameters and real-time updates:
```javascript
const results = await client.deepserp.scrape({
  actor: 'scraper.google.search',
  input: {
    q: 'nike site:www.nike.com'
  }
});
console.log(results);
```
Actor
Deploy custom crawling and data processing workflows at scale with built-in scheduling and resource management:
```javascript
// Run an actor (assumes `actor.id` refers to an actor you created earlier)
const run = await client.actor.run(actor.id, {
  input: { url: 'https://example.com' },
  runOptions: {
    CPU: 2,
    memory: 2048,
    timeout: 3600,
    version: 'v1.0.0'
  }
});
console.log('Actor run result:', run);
```
Profiles
Manage browser profiles for persistent sessions.
```javascript
const createResponse = await client.profiles.create('My Profile');
console.log('Profile created:', createResponse);
```
🔧 API Reference
Client Configuration
```typescript
interface ScrapelessConfig {
  apiKey?: string; // Your API key
  timeout?: number; // Request timeout in milliseconds (default: 30000)
  baseApiUrl?: string; // Base API URL
  actorApiUrl?: string; // Actor service URL
  storageApiUrl?: string; // Storage service URL
  browserApiUrl?: string; // Browser service URL
  scrapingCrawlApiUrl?: string; // Crawl service URL
}
```
Available Services
The SDK provides the following services through the main client:
- `client.browser` - Browser automation with Playwright/Puppeteer support, anti-detection tools (fingerprinting, CAPTCHA solving), and extensible workflows
- `client.universal` - JS rendering, user simulation (clicks/scrolls), anti-block bypass, and structured data export
- `client.scrapingCrawl` - Recursive site crawling with multi-format export (Markdown, JSON, HTML, screenshots, links)
- `client.scraping` - Pre-built connectors for sites (e.g., e-commerce, travel) to extract product data, pricing, and reviews
- `client.deepserp` - Search engine results extraction
- `client.proxies` - Proxy management
- `client.actor` - Scalable workflow automation with built-in scheduling and resource management
- `client.storage` - Data storage solutions
Error Handling
The SDK throws ScrapelessError for API-related errors:
```javascript
import { ScrapelessError } from '@scrapeless-ai/sdk';

try {
  const result = await client.scraping.scrape({ url: 'invalid-url' });
} catch (error) {
  if (error instanceof ScrapelessError) {
    console.error(`Scrapeless API Error: ${error.message}`);
    console.error(`Status Code: ${error.statusCode}`);
  }
}
```
📚 Examples
Check out the examples directory in the repository for comprehensive usage examples.
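One pattern worth borrowing in your own code is retrying transient failures such as HTTP 429 and 5xx responses. The wrapper below is a hypothetical sketch, assuming only that thrown errors expose the numeric `statusCode` field shown under Error Handling:

```javascript
// Retry an async operation when it fails with a retryable status code
// (429 or 5xx). Waits with a linear backoff between attempts and
// rethrows the last error once retries are exhausted.
async function withRetries(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = error.statusCode === 429 || error.statusCode >= 500;
      if (!retryable || attempt >= retries) throw error;
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * (attempt + 1)));
    }
  }
}
```

For example, `await withRetries(() => client.deepserp.scrape({ actor: 'scraper.google.search', input: { q: 'nike' } }))` would retry a rate-limited search a few times before giving up.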
🧪 Testing
Run the test suite:
```bash
npm test
```
The SDK includes comprehensive tests for all services and utilities.
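When testing your own code that calls the SDK, you can stub the client surface instead of hitting the network. A hypothetical sketch: the `fakeClient` shape mirrors the `scraping.scrape` example above, and `getTitle` is an invented function under test, not part of the SDK:

```javascript
// Code under test: accepts the client as a dependency so it can be stubbed.
async function getTitle(client, url) {
  const result = await client.scraping.scrape({ actor: 'scraper.example', input: { url } });
  return result.data.title;
}

// Hypothetical stub implementing just the surface `getTitle` touches.
const fakeClient = {
  scraping: {
    scrape: async () => ({ data: { title: 'stubbed title' } })
  }
};

console.log(await getTitle(fakeClient, 'https://example.com')); // 'stubbed title'
```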
🛠️ Contributing & Development Guide
We welcome all contributions! For details on how to report issues, submit pull requests, follow code style, and set up local development, please see our Contributing & Development Guide.
Quick Start:
```bash
git clone https://github.com/scrapeless-ai/sdk-node.git
cd sdk-node
pnpm install
pnpm test
pnpm lint
pnpm format
```
See CONTRIBUTING.md for full details on the contribution process, development workflow, code quality, project structure, best practices, and more.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
📞 Support
- 📖 Documentation: https://docs.scrapeless.com
- 💬 Community: Join our Discord
- 🐛 Issues: GitHub Issues
- 📧 Email: [email protected]
🏢 About Scrapeless
Scrapeless is a powerful web scraping and browser automation platform that helps businesses extract data from any website at scale. Our platform provides:
- High-performance web scraping infrastructure
- Global proxy network
- Browser automation capabilities
- Enterprise-grade reliability and support
Visit scrapeless.com to learn more and get started.
Made with ❤️ by the Scrapeless team
