# @crawlgate/sdk
Official JavaScript/TypeScript SDK for CrawlGate Search Engine API.
## Installation

```bash
npm install @crawlgate/sdk
```

## Quick Start
```ts
import { CrawlGateClient } from '@crawlgate/sdk';

const client = new CrawlGateClient({
  apiKey: 'sk_live_...',
  apiUrl: 'https://api.crawlgate.io' // or your self-hosted URL
});

// Scrape a single page
const doc = await client.scrape('https://example.com');
console.log(doc.markdown);
```

## Configuration
```ts
const client = new CrawlGateClient({
  // Required: API key (or set the CRAWLGATE_API_KEY env var)
  apiKey: 'sk_live_...',

  // Optional: API URL (default: https://api.crawlgate.io)
  apiUrl: 'https://api.crawlgate.io',

  // Optional: request timeout in ms (default: 90000)
  timeoutMs: 90000,

  // Optional: max retries for failed requests (default: 3)
  maxRetries: 3,

  // Optional: backoff factor for retries, in seconds (default: 0.5)
  backoffFactor: 0.5
});
```

## Engines
CrawlGate supports three scraping engines:
| Engine | Description | Best For |
|--------|-------------|----------|
| `static` | Axios + Cheerio (no browser) | Fast, simple pages |
| `dynamic` | Playwright headless browser | JavaScript-heavy sites |
| `smart` | Auto-selects static or dynamic | Cost optimization |
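The `smart` engine's actual selection logic is internal to CrawlGate and not documented here. As an illustration of the idea only, a heuristic along these lines (plain TypeScript, no SDK required; the marker list and thresholds are assumptions) could decide whether a page needs a browser:

```ts
type Engine = 'static' | 'dynamic';

// Illustrative sketch: choose 'static' unless the page looks client-rendered.
function pickEngine(html: string): Engine {
  // Markers that commonly indicate a client-rendered (SPA) page.
  const spaMarkers = ['id="root"', 'id="app"', 'data-reactroot', 'ng-version', '__NEXT_DATA__'];
  const hasSpaMarker = spaMarkers.some((m) => html.includes(m));

  // A <body> with almost no text is likely filled in by JavaScript.
  const bodyMatch = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  const bodyText = bodyMatch ? bodyMatch[1].replace(/<[^>]+>/g, '').trim() : '';
  const looksEmpty = bodyText.length < 50;

  return hasSpaMarker || looksEmpty ? 'dynamic' : 'static';
}
```

In practice you simply pass `engine: 'smart'` and let the API make this decision server-side.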
## API Reference

### Scrape
Scrape a single URL.
```ts
const doc = await client.scrape('https://example.com', {
  engine: 'smart',                // 'static' | 'dynamic' | 'smart'
  formats: ['markdown', 'html'],
  onlyMainContent: true,
  excludeTags: ['nav', 'footer'],
  waitFor: 1000,                  // wait ms before scraping (dynamic only)
  timeout: 30000,
  proxy: 'stealth'                // 'iproyal' | 'stealth' | 'tor'
});

console.log(doc.markdown);
console.log(doc.metadata?.title);
```

### Scrape with LLM Extraction
Extract structured data using AI.
```ts
import { z } from 'zod';

const schema = z.object({
  productName: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  features: z.array(z.string())
});

const doc = await client.scrape('https://example.com/product', {
  engine: 'smart',
  extract: {
    schema,
    systemPrompt: 'Extract product information from the page',
    provider: 'openai',   // 'openai' | 'anthropic'
    enableFallback: true  // try the other provider if the primary fails
  }
});

console.log(doc.extract?.data);
// { productName: '...', price: 99.99, inStock: true, features: [...] }
```

### Batch Scrape
Scrape multiple URLs in a single job.
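The manual-polling loops in this and the following sections all share one pattern: fetch status, sleep, repeat until done or a deadline passes. A generic helper that factors this out might look like the following (an illustrative sketch, not part of the SDK):

```ts
// Poll `fetchStatus` until `isDone` returns true or `timeoutMs` elapses.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isDone: (status: T) => boolean,
  pollIntervalMs = 2000,
  timeoutMs = 300_000
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let status = await fetchStatus();
  while (!isDone(status)) {
    if (Date.now() >= deadline) {
      throw new Error(`Polling timed out after ${timeoutMs} ms`);
    }
    await new Promise((r) => setTimeout(r, pollIntervalMs));
    status = await fetchStatus();
  }
  return status;
}
```

With the SDK this could be used as, for example, `pollUntil(() => client.getBatchScrapeStatus(id), (s) => s.status !== 'scraping')`.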
```ts
// Method 1: wait for completion (recommended)
const job = await client.batchScrape(
  ['https://a.com', 'https://b.com', 'https://c.com'],
  {
    options: {
      formats: ['markdown'],
      engine: 'smart'
    },
    pollInterval: 2000, // poll every 2 seconds
    timeout: 300        // max wait time in seconds
  }
);

console.log(`Scraped ${job.completed} URLs`);
job.data.forEach(doc => {
  console.log(doc.url, doc.markdown?.length);
});

// Method 2: manual polling
const { id } = await client.startBatchScrape(
  ['https://a.com', 'https://b.com'],
  { options: { formats: ['markdown'] } }
);

let status = await client.getBatchScrapeStatus(id);
while (status.status === 'scraping') {
  console.log(`Progress: ${status.completed}/${status.total}`);
  await new Promise(r => setTimeout(r, 2000));
  status = await client.getBatchScrapeStatus(id);
}

// Cancel a batch job
await client.cancelBatchScrape(id);

// Get errors
const errors = await client.getBatchScrapeErrors(id);
console.log('Failed URLs:', errors.errors.map(e => e.url));
```

### Crawl
Crawl multiple pages from a website.
```ts
// Method 1: wait for completion (recommended)
const job = await client.crawl('https://example.com', {
  limit: 10,
  engine: 'dynamic',
  formats: ['markdown'],
  pollInterval: 2000, // poll every 2 seconds
  timeout: 300        // max wait time in seconds
});

console.log(`Crawled ${job.completed} pages`);
job.data.forEach(doc => {
  console.log(doc.url, doc.markdown?.length);
});

// Method 2: manual polling
const { id } = await client.startCrawl('https://example.com', { limit: 10 });

let status = await client.getCrawlStatus(id);
while (status.status === 'scraping') {
  console.log(`Progress: ${status.completed}/${status.total}`);
  await new Promise(r => setTimeout(r, 2000));
  status = await client.getCrawlStatus(id);
}

// Cancel a crawl job
await client.cancelCrawl(id);

// Get errors
const errors = await client.getCrawlErrors(id);
console.log('Failed URLs:', errors.errors.map(e => e.url));
console.log('Blocked by robots.txt:', errors.robotsBlocked);
```

### Extract (Standalone LLM Extraction)
Extract structured data from URLs using an LLM.
```ts
import { z } from 'zod';

// With a Zod schema
const result = await client.extract({
  urls: ['https://example.com/product'],
  schema: z.object({
    name: z.string(),
    price: z.number(),
    inStock: z.boolean(),
    features: z.array(z.string())
  }),
  systemPrompt: 'Extract product information from the page',
  provider: 'openai',
  timeout: 60
});
console.log(result.data);

// With a natural-language prompt
const promptResult = await client.extract({
  urls: ['https://example.com/about'],
  prompt: 'Extract the company name, founding year, and list of team members',
  enableWebSearch: true
});
console.log(promptResult.data);

// Manual polling
const { id } = await client.startExtract({
  urls: ['https://example.com'],
  schema: { name: 'string', price: 'number' }
});

let status = await client.getExtractStatus(id);
while (status.status === 'processing') {
  await new Promise(r => setTimeout(r, 2000));
  status = await client.getExtractStatus(id);
}
console.log(status.data);
```

### Map
Discover all URLs on a website.
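The map example below returns a flat list of links; before feeding them into a crawl or batch scrape you will often want to filter and deduplicate them. A plain-TypeScript sketch (no SDK needed; `links` stands in for the `result.links` array, and the skip-list of extensions is an assumption):

```ts
// Keep same-origin, non-asset URLs, normalize away fragments, and deduplicate.
function filterLinks(links: string[], origin: string): string[] {
  const skip = /\.(png|jpe?g|gif|svg|pdf|zip|css|js)$/i;
  const seen = new Set<string>();
  for (const link of links) {
    try {
      const url = new URL(link);
      if (url.origin !== origin) continue; // external link
      if (skip.test(url.pathname)) continue; // non-HTML asset
      url.hash = ''; // ignore #fragments when deduplicating
      seen.add(url.toString());
    } catch {
      // skip malformed URLs
    }
  }
  return [...seen];
}
```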
```ts
const result = await client.map('https://example.com', {
  engine: 'dynamic'
});

console.log(`Found ${result.count} URLs:`);
result.links.forEach(url => console.log(url));
```

### Search
Search the web with optional scraping.
```ts
import { z } from 'zod';

// Basic search
const results = await client.search('best restaurants in NYC', {
  limit: 10,
  lang: 'en',
  country: 'us'
});

results.data.forEach(r => {
  console.log(`${r.title}: ${r.url}`);
});

// Search + scrape each result
const withContent = await client.search('best laptops 2024', {
  limit: 5,
  scrapeOptions: {
    formats: ['markdown']
  },
  engine: 'smart'
});

withContent.data.forEach(r => {
  console.log(r.title);
  console.log(r.markdown?.substring(0, 500));
});

// Search + LLM extraction
const reviews = await client.search('iPhone reviews', {
  limit: 5,
  scrapeOptions: { formats: ['markdown'] },
  extract: {
    schema: z.object({
      sentiment: z.enum(['positive', 'negative', 'neutral']),
      rating: z.number().optional(),
      summary: z.string()
    }),
    systemPrompt: 'Analyze the review sentiment'
  }
});

console.log(reviews.extract?.data);
```

## Usage & Monitoring
Monitor your API usage.
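Because the API caps concurrent requests per account, it can help to gate requests client-side at the `maxConcurrency` value reported by the endpoint below. One way to do that is a small limiter like this (an illustrative sketch, not part of the SDK):

```ts
// Returns a `run` function that keeps at most `limit` tasks in flight;
// extra tasks queue up and start in FIFO order as earlier ones settle.
function createLimiter(limit: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  const next = () => {
    if (active < limit && queue.length > 0) {
      active++;
      queue.shift()!();
    }
  };

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        task().then(resolve, reject).finally(() => {
          active--;
          next();
        });
      });
      next();
    });
  };
}
```

Hypothetical usage with the SDK: `const run = createLimiter(maxConcurrency); await Promise.all(urls.map(u => run(() => client.scrape(u))));`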
```ts
// Get concurrency usage
const { concurrency, maxConcurrency } = await client.getConcurrency();
console.log(`Using ${concurrency}/${maxConcurrency} concurrent requests`);

// Get credit usage
const credits = await client.getCreditUsage();
console.log(`Remaining credits: ${credits.remainingCredits}`);

// Get token usage (for LLM extraction)
const tokens = await client.getTokenUsage();
console.log(`Remaining tokens: ${tokens.remainingTokens}`);

// Get queue status
const queue = await client.getQueueStatus();
console.log(`Jobs in queue: ${queue.jobsInQueue}`);
console.log(`Active: ${queue.activeJobsInQueue}, Waiting: ${queue.waitingJobsInQueue}`);
```

## Error Handling
```ts
import {
  CrawlGateClient,
  CrawlGateError,
  AuthenticationError,
  ValidationError,
  JobTimeoutError,
  RateLimitError
} from '@crawlgate/sdk';

try {
  const doc = await client.scrape('https://example.com');
} catch (error) {
  if (error instanceof AuthenticationError) {
    console.error('Invalid API key');
  } else if (error instanceof ValidationError) {
    console.error('Invalid request:', error.message);
  } else if (error instanceof JobTimeoutError) {
    console.error(`Job ${error.jobId} timed out after ${error.timeoutSeconds}s`);
  } else if (error instanceof RateLimitError) {
    console.error('Rate limited, retry after:', error.retryAfter);
  } else if (error instanceof CrawlGateError) {
    console.error('API error:', error.message, error.statusCode);
  }
}
```

## TypeScript Support
Full TypeScript support with exported types:
```ts
import type {
  CrawlGateClientOptions,
  ScrapeOptions,
  CrawlOptions,
  MapOptions,
  SearchOptions,
  Document,
  CrawlJob,
  SearchResponse,
  Engine,
  ExtractOptions,
  // Batch scrape
  BatchScrapeOptions,
  BatchScrapeJob,
  BatchScrapeResponse,
  // Extract
  ExtractRequestOptions,
  ExtractResponse,
  // Usage
  ConcurrencyInfo,
  CreditUsage,
  TokenUsage,
  QueueStatus,
  // Errors
  CrawlError,
  CrawlErrorsResponse
} from '@crawlgate/sdk';
```

## Environment Variables
```bash
# API key (used if not passed to the constructor)
CRAWLGATE_API_KEY=sk_live_...

# API URL (used if not passed to the constructor)
CRAWLGATE_API_URL=https://api.crawlgate.io
```

## Documentation
Full documentation is available at docs.crawlgate.io.
## Support
- Website: crawlgate.io
- Documentation: docs.crawlgate.io
