@xcrawl/xcrawl v1.0.0
Official Node.js SDK for XCrawl - A powerful web scraping API service
XCrawl Node SDK
XCrawl Node SDK provides an interface to the XCrawl API, including scraping, search, sitemap discovery, and site crawling.
Table of Contents
- Installation
- Quick Start
- Core APIs
- Configuration
- Output Formats
- Error Handling
- TypeScript Support
- Requirements
- License
Installation

```shell
npm install @xcrawl/xcrawl
```

Quick Start

- Get an API key from xcrawl.com.
- Set `XCRAWL_API_KEY` or pass `apiKey` directly.

```shell
export XCRAWL_API_KEY=your-api-key
```

```ts
import { XcrawlClient } from '@xcrawl/xcrawl';

const client = new XcrawlClient();

// Sync scrape
const syncResult = await client.scrape('https://example.com', {
  output: { formats: ['markdown'] }
});
console.log(syncResult.data.markdown);

// Async scrape + auto polling
const job = await client.scrape('https://example.com', {
  mode: 'async',
  output: { formats: ['markdown', 'summary'] }
});
const result = await client.waitForJob(job.scrape_id, { timeout: 60 });
console.log(result.status);
```

Core APIs
Scrape
```ts
const result = await client.scrape('https://example.com', {
  proxy: { location: 'US' },
  request: {
    device: 'desktop',
    only_main_content: true,
    block_ads: true,
  },
  output: {
    formats: ['html', 'markdown', 'links', 'screenshot'],
    screenshot: 'full_page'
  }
});
```

Async Status Check

```ts
const status = await client.getJobResult('scrape-id-123');
if (status.status === 'completed') {
  console.log(status.data.markdown);
} else if (status.status === 'failed') {
  console.error(status.message);
}
```

Structured Extraction (json format)
Both `output.json.prompt` and `output.json.json_schema` are optional.
```ts
const result = await client.scrape('https://example.com', {
  output: {
    formats: ['json'],
    json: {
      prompt: 'Extract product name and price',
      json_schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          price: { type: 'number' }
        }
      }
    }
  }
});
```

Search

```ts
const result = await client.search({
  query: 'web scraping',
  location: 'New York, NY',
  language: 'en',
  limit: 10
});
console.log(result.data.organic_results);
```

Map
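The `filter` option in the example below is a regular-expression string applied to discovered URLs. A quick sketch of how that pattern behaves, assuming ordinary JavaScript regex semantics (the actual matching is performed server-side):

```typescript
// The same pattern string passed as `filter` in the map example below.
// Note the double backslash: in a JS string literal, '\\.' encodes the
// regex escape \. (a literal dot).
const filter = '.*\\.html$';
const re = new RegExp(filter);

console.log(re.test('https://example.com/docs/page.html')); // true
console.log(re.test('https://example.com/docs/page.php'));  // false
```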
```ts
const result = await client.map('https://example.com', {
  filter: '.*\\.html$',
  limit: 1000,
  include_subdomains: true,
  ignore_query_parameters: true
});
console.log(result.data.links);
```

Crawl

```ts
const job = await client.crawl('https://example.com', {
  crawler: {
    limit: 100,
    max_depth: 3,
    include: ['.*\\.html$'],
    exclude: ['.*\\/admin\\/.*']
  },
  output: { formats: ['markdown'] }
});
const crawlStatus = await client.getCrawlStatus(job.crawl_id);
console.log(crawlStatus.status);
```

Webhook (Async)
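When a `webhook` is configured, XCrawl POSTs job events to your URL. A minimal receiver sketch follows; the payload field names (`event`, `scrape_id`, `data`, `message`) are assumptions for illustration, not the documented schema:

```typescript
// Hypothetical webhook payload shape -- check the XCrawl API docs for the
// real schema; these field names are assumptions.
interface WebhookEvent {
  event: 'completed' | 'failed';
  scrape_id: string;
  data?: { markdown?: string };
  message?: string;
}

// A pure handler is easy to unit-test; wire it into any HTTP framework.
function handleWebhookEvent(evt: WebhookEvent): string {
  if (evt.event === 'completed') {
    return `job ${evt.scrape_id}: ${evt.data?.markdown?.length ?? 0} chars of markdown`;
  }
  return `job ${evt.scrape_id} failed: ${evt.message ?? 'unknown error'}`;
}

console.log(handleWebhookEvent({ event: 'completed', scrape_id: 'abc', data: { markdown: '# Hi' } }));
// job abc: 4 chars of markdown
```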
```ts
await client.scrape('https://example.com', {
  mode: 'async',
  webhook: {
    url: 'https://your-server.com/webhook',
    events: ['completed', 'failed']
  }
});
```

Configuration

```ts
const client = new XcrawlClient({
  apiKey: 'your-api-key',
  apiUrl: 'https://run.xcrawl.com',
  timeout: 60,
  maxRetries: 3,
  backoffFactor: 0.5,
});
```

Retry behavior:
- Retries: HTTP 5xx and network failures
- No retry: HTTP 4xx and validation errors
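A common interpretation of `backoffFactor` is exponential backoff, where the wait before retry *n* is `backoffFactor * 2 ** n` seconds. Whether this SDK uses exactly that formula is an assumption; the sketch below only illustrates the shape of such a schedule:

```typescript
// Illustrative exponential-backoff schedule (assumed formula, not taken
// from the SDK source): delay before attempt n = backoffFactor * 2 ** n.
function backoffDelays(maxRetries: number, backoffFactor: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffFactor * 2 ** attempt);
}

// With the configuration shown above (maxRetries: 3, backoffFactor: 0.5):
console.log(backoffDelays(3, 0.5)); // [0.5, 1, 2]
```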
Output Formats
Supported values in `output.formats`:

`markdown`, `html`, `raw_html`, `links`, `summary`, `screenshot`, `json`

If `output.formats` is omitted or set to `[]`, the response returns metadata only.
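Each requested format surfaces as a field on `result.data`. The shape below is a sketch inferred from the examples in this README (`markdown`, `links`, and so on); the field types for `summary`, `screenshot`, and `json` are assumptions, not an authoritative type:

```typescript
// Sketch of the per-format result fields, inferred from the examples in
// this README; exact types are assumptions.
interface ScrapeData {
  markdown?: string;
  html?: string;
  raw_html?: string;
  links?: string[];
  summary?: string;
  screenshot?: string; // e.g. a URL or base64 payload (assumed)
  json?: unknown;      // structured-extraction output (assumed)
}

// Only the requested formats are populated.
const data: ScrapeData = { markdown: '# Title', links: ['https://example.com/a'] };
console.log(Object.keys(data)); // ['markdown', 'links']
```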
Error Handling
```ts
import { XcrawlClient, XcrawlError, JobTimeoutError } from '@xcrawl/xcrawl';

const client = new XcrawlClient({ apiKey: 'your-api-key' });

try {
  const result = await client.scrape('https://example.com', {
    output: { formats: ['markdown'] }
  });
  console.log(result.data.markdown);
} catch (error) {
  if (error instanceof JobTimeoutError) {
    console.error(`Job ${error.jobId} timed out after ${error.timeoutSeconds}s`);
  } else if (error instanceof XcrawlError) {
    console.error(error.code, error.message, error.status, error.request_id);
  }
}
```

TypeScript Support
The package ships with built-in TypeScript types.
```ts
import type {
  ScrapeOptions,
  SearchOptions,
  MapOptions,
  CrawlOptions,
  XcrawlError,
  JobTimeoutError,
} from '@xcrawl/xcrawl';
```

Requirements
- Node.js >= 14.0.0
License
MIT
