@firescraper/sdk
v1.0.0
FireScraper SDK
Official TypeScript SDK for the FireScraper web scraping API. Turn websites into clean, structured text for RAG pipelines and AI agents.
Install
npm install @firescraper/sdk
Quick start
import { FireScraper } from '@firescraper/sdk';
const client = new FireScraper('fsk_your_api_key');
// Start a crawl
const session = await client.scrape({
name: 'Docs crawl',
urls: ['https://docs.example.com/'],
maxDepth: 2,
});
// Wait for completion
const result = await client.waitForCompletion(session.id, {
onProgress: (s) => console.log(`${s.counts.success} pages scraped`),
});
// Download results as Markdown
const download = await client.getResults(session.id, 'markdown');
Authentication
Create an API key at Settings > API Keys in the FireScraper dashboard. Keys start with fsk_ and are shown only once.
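Because a key is shown only once, it usually makes sense to keep it out of source code. The following is a minimal sketch of one way to do that in Node.js, assuming the key is stored in an environment variable. The variable name FIRESCRAPER_API_KEY and the helper loadApiKey are hypothetical names chosen for this example; the SDK is not known to read any environment variable itself.

```typescript
// Reads the key from an environment-like object (process.env by default).
// FIRESCRAPER_API_KEY is a name chosen for this sketch, not an SDK convention.
function loadApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env['FIRESCRAPER_API_KEY']?.trim();
  if (!key) {
    throw new Error('FIRESCRAPER_API_KEY is not set');
  }
  // Keys are documented to start with "fsk_"; catch obvious paste mistakes early.
  if (!key.startsWith('fsk_')) {
    throw new Error('FIRESCRAPER_API_KEY does not look like a FireScraper key');
  }
  return key;
}
```

The returned string can then be passed wherever the examples use the literal 'fsk_your_api_key', for instance new FireScraper(loadApiKey()).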
// Minimal: just pass the key
const client = new FireScraper('fsk_your_api_key');
// Or pass an options object (named differently here so both forms can coexist in one file)
const clientWithOptions = new FireScraper({
apiKey: 'fsk_your_api_key',
baseUrl: 'https://firescraper.com', // default
timeout: 30_000, // default: 30s
});
API Reference
client.scrape(options)
Start a new scrape session.
const session = await client.scrape({
name: 'Product pages',
urls: ['https://example.com/products'],
ignoreUrls: ['https://example.com/products/archive'],
maxDepth: 3,
minTextLength: 80,
scraper: 'article', // 'article' | 'full'
uniqueTextDownloads: true,
respectRobotsTxt: true,
contentSelector: 'main article',
webhookUrl: 'https://your-app.com/webhook',
extractionSchema: {
type: 'object',
properties: {
title: { type: 'string' },
price: { type: 'number' },
},
},
});
console.log(session.id); // 'SESSION_ID'
console.log(session.status); // 'in-progress'
| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Project name |
| urls | string[] | Yes | Seed URLs to crawl |
| ignoreUrls | string[] | No | URLs to skip |
| maxDepth | number | No | Link-hop depth (0 = seed only) |
| minTextLength | number | No | Minimum word count per page |
| scraper | 'article' \| 'full' | No | Extraction mode (default: 'article') |
| uniqueTextDownloads | boolean | No | Deduplicate text content |
| respectRobotsTxt | boolean | No | Honour robots.txt rules |
| contentSelector | string | No | CSS selector to restrict extraction |
| webhookUrl | string | No | POST callback when crawl finishes |
| extractionSchema | object | No | JSON Schema for structured extraction |
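The table above can also be read as a type. The sketch below restates it as a TypeScript interface and adds a small pre-flight check. Both ScrapeOptionsSketch and checkScrapeOptions are hypothetical names for illustration; the type actually exported by the SDK may differ, and the checks (non-empty urls, parseable URLs, non-negative integer maxDepth) are assumptions about what the API accepts rather than documented rules.

```typescript
// Hypothetical type mirroring the options table; not the SDK's exported type.
interface ScrapeOptionsSketch {
  name: string;
  urls: string[];
  ignoreUrls?: string[];
  maxDepth?: number;
  minTextLength?: number;
  scraper?: 'article' | 'full';
  uniqueTextDownloads?: boolean;
  respectRobotsTxt?: boolean;
  contentSelector?: string;
  webhookUrl?: string;
  extractionSchema?: Record<string, unknown>;
}

// Returns human-readable problems; an empty array means nothing was flagged.
function checkScrapeOptions(options: ScrapeOptionsSketch): string[] {
  const problems: string[] = [];
  if (options.name.trim() === '') {
    problems.push('name is required');
  }
  if (options.urls.length === 0) {
    problems.push('urls must contain at least one seed URL');
  }
  for (const url of [...options.urls, ...(options.ignoreUrls ?? [])]) {
    try {
      new URL(url); // throws on strings that are not absolute URLs
    } catch {
      problems.push(`not a valid URL: ${url}`);
    }
  }
  const depth = options.maxDepth;
  if (depth !== undefined && (!Number.isInteger(depth) || depth < 0)) {
    problems.push('maxDepth must be a non-negative integer');
  }
  return problems;
}
```

Running a check like this before client.scrape(options) may surface mistakes locally that would otherwise come back as a BAD_REQUEST error.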
client.getSession(sessionId)
Get the current status, page counts, and processing state.
const status = await client.getSession(session.id);
console.log(status.session.status); // 'in-progress' | 'done' | ...
console.log(status.counts.success); // pages scraped successfully
console.log(status.processing.queueLength); // pages still in queue
client.waitForCompletion(sessionId, options?)
Poll until the session reaches a terminal status.
const final = await client.waitForCompletion(session.id, {
pollInterval: 5_000, // check every 5s (default)
timeout: 600_000, // give up after 10 min
onProgress: (s) => {
console.log(`${s.counts.success}/${s.counts.total} pages`);
},
});
Throws FireScraperError with code 'TIMEOUT' if the deadline is exceeded.
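To make the polling behaviour concrete, here is a minimal sketch of a poll-until-terminal loop with an interval, a deadline, and a progress callback. It is not the SDK's implementation: pollUntil, PollOptions, and the option names are hypothetical, and the caller supplies the rule for what counts as a terminal status because the full list of statuses is not given here.

```typescript
interface PollOptions<T> {
  pollIntervalMs?: number;
  timeoutMs?: number;
  onProgress?: (snapshot: T) => void;
}

// Repeatedly fetches a snapshot until isTerminal(snapshot) is true or the deadline passes.
async function pollUntil<T>(
  fetchSnapshot: () => Promise<T>,
  isTerminal: (snapshot: T) => boolean,
  options: PollOptions<T> = {},
): Promise<T> {
  const { pollIntervalMs = 5000, timeoutMs = 600000, onProgress } = options;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const snapshot = await fetchSnapshot();
    onProgress?.(snapshot);
    if (isTerminal(snapshot)) return snapshot;
    // Give up if the next poll would start after the deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error('TIMEOUT: no terminal status before the deadline');
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```

With the SDK, fetchSnapshot could be () => client.getSession(session.id) and isTerminal could be (s) => s.session.status !== 'in-progress', under the assumption that every status other than 'in-progress' is terminal. In most cases client.waitForCompletion is the simpler choice; a hand-written loop is mainly useful when custom stop conditions are needed.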
client.listResults(sessionId)
List the export files available after a crawl completes.
const { files } = await client.listResults(session.id);
// [{ format: 'csv', fileName: 'corpus-csv.csv' }, ...]
client.getResults(sessionId, format)
Download a result file. Returns an object whose data field is an ArrayBuffer you can write to disk or decode, along with the fileName of the export.
import { writeFileSync } from 'fs';
const result = await client.getResults(session.id, 'markdown');
writeFileSync(result.fileName, Buffer.from(result.data));
Supported formats: zip, csv, json, jsonl, markdown, structured, manifest, documents, chunks, extracted.
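For line-delimited output such as jsonl, decoding the buffer in memory can be more convenient than writing it to disk. The helper below is a small sketch of that step; parseJsonl is a hypothetical name, and it assumes the payload is UTF-8 text with one JSON value per line, which is what the JSON Lines format implies.

```typescript
// Decodes a downloaded buffer as UTF-8 and parses one JSON value per non-empty line.
function parseJsonl<T = unknown>(data: ArrayBuffer | Uint8Array): T[] {
  const text = new TextDecoder('utf-8').decode(data);
  const records: T[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === '') return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      // Report the 1-based line number so bad rows are easy to locate.
      throw new Error(`Invalid JSON on line ${index + 1}`);
    }
  });
  return records;
}
```

Given const result = await client.getResults(session.id, 'jsonl'), the rows would be available as parseJsonl(result.data). Binary formats such as zip should still be written to disk or handled with an archive library.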
client.getPartialResults(sessionId, format?)
Download pages scraped so far while the crawl is still running. Supports csv, json, jsonl.
const partial = await client.getPartialResults(session.id, 'json');
console.log(`${partial.rowCount} rows exported mid-crawl`);
Error handling
All errors are instances of FireScraperError with a typed code field.
import { FireScraper, FireScraperError } from '@firescraper/sdk';
try {
await client.scrape({ name: 'Test', urls: ['https://example.com'] });
} catch (error) {
if (error instanceof FireScraperError) {
switch (error.code) {
case 'UNAUTHORIZED': // invalid or revoked API key
case 'RATE_LIMITED': // too many requests
case 'NOT_FOUND': // session doesn't exist
case 'BAD_REQUEST': // invalid input
case 'TIMEOUT': // request or poll timed out
case 'SERVER_ERROR': // 5xx from the API
case 'NETWORK_ERROR': // fetch failed
console.error(error.code, error.message);
}
}
}
Common patterns
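Several of the error codes listed under Error handling describe conditions that may clear up on their own. One possible way to handle them is a retry wrapper with exponential backoff, sketched below. The helper withRetry and the choice of which codes to retry (RATE_LIMITED, SERVER_ERROR, NETWORK_ERROR) are assumptions made for this example, not behaviour provided by the SDK. The sketch only relies on the thrown error having a string code field.

```typescript
// Local stand-in for the error shape; only the `code` field is relied on.
interface CodedError {
  code: string;
}

function hasCode(error: unknown): error is CodedError {
  return (
    typeof error === 'object' &&
    error !== null &&
    typeof (error as { code?: unknown }).code === 'string'
  );
}

// Treating these codes as transient is an assumption, not an SDK guarantee.
const RETRYABLE_CODES = new Set(['RATE_LIMITED', 'SERVER_ERROR', 'NETWORK_ERROR']);

async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = hasCode(error) && RETRYABLE_CODES.has(error.code);
      // Rethrow non-transient errors immediately, and give up after maxAttempts.
      if (!retryable || attempt >= maxAttempts) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A call would look like withRetry(() => client.scrape({ name: 'Test', urls: ['https://example.com'] })). Errors such as UNAUTHORIZED or BAD_REQUEST are rethrown on the first attempt, since repeating the same request is unlikely to help.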
Feed a RAG pipeline
const session = await client.scrape({
name: 'Knowledge base',
urls: ['https://docs.example.com/'],
maxDepth: 4,
scraper: 'article',
respectRobotsTxt: true,
});
await client.waitForCompletion(session.id);
const docs = await client.getResults(session.id, 'documents');
const text = new TextDecoder().decode(docs.data);
// Each line is a JSON document ready for chunking + embedding
for (const line of text.split('\n').filter(Boolean)) {
const doc = JSON.parse(line);
await vectorStore.upsert(doc.document_id, doc.text);
}
Webhooks with HMAC signature verification
const session = await client.scrape({
name: 'Weekly pricing refresh',
urls: ['https://competitor.com/pricing'],
maxDepth: 1,
webhookUrl: 'https://your-app.com/firescraper/done',
});
// Store the signing secret securely; it is only returned once.
// Logging it here is for illustration only. Avoid printing secrets in production.
console.log('Webhook secret:', session.webhookSecret);
// In your webhook handler, verify the x-firescraper-signature header:
// t=<timestamp>,v1=<hmac_sha256_hex>
// Compute: HMAC-SHA256(secret, "<timestamp>.<raw_body>")
// Compare v1 with a constant-time comparison.
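The verification steps in the comments above can be sketched in Node.js as follows. This is an illustration rather than official handler code: verifyFireScraperSignature and VerifyOptions are hypothetical names, the header is assumed to contain exactly the t and v1 fields shown, and the optional freshness check assumes the timestamp is in Unix seconds, which is not stated here.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

interface VerifyOptions {
  toleranceSeconds?: number; // optional replay window; assumes t is in Unix seconds
  nowSeconds?: number; // injectable clock for testing
}

function verifyFireScraperSignature(
  header: string,
  rawBody: string,
  secret: string,
  options: VerifyOptions = {},
): boolean {
  // Parse "t=<timestamp>,v1=<hmac_sha256_hex>" into key/value pairs.
  const fields = new Map<string, string>();
  for (const part of header.split(',')) {
    const eq = part.indexOf('=');
    if (eq > 0) fields.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  const timestamp = fields.get('t');
  const signature = fields.get('v1');
  if (!timestamp || !signature) return false;
  // A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
  if (!/^[0-9a-f]{64}$/i.test(signature)) return false;

  if (options.toleranceSeconds !== undefined) {
    const now = options.nowSeconds ?? Math.floor(Date.now() / 1000);
    const ts = Number(timestamp);
    if (!Number.isFinite(ts) || Math.abs(now - ts) > options.toleranceSeconds) {
      return false;
    }
  }

  // HMAC-SHA256(secret, "<timestamp>.<raw_body>")
  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(signature, 'hex');
  // Both buffers are 32 bytes here, so the constant-time comparison cannot throw.
  return timingSafeEqual(received, expected);
}
```

In a handler, rawBody should be the request body exactly as received, before any JSON parsing, and header the value of x-firescraper-signature. When the function returns false, rejecting the request (for example with a 401 response) is the usual choice.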
Deliveries retry up to 3 times with exponential backoff.
Extract structured data
const session = await client.scrape({
name: 'Product catalog',
urls: ['https://shop.example.com/products'],
maxDepth: 2,
extractionSchema: {
type: 'object',
properties: {
product_name: { type: 'string' },
price: { type: 'number' },
in_stock: { type: 'boolean' },
},
},
});
await client.waitForCompletion(session.id);
const extracted = await client.getResults(session.id, 'extracted');
Requirements
- Node.js 18+ (uses native fetch)
- Works in any runtime with fetch: Node.js, Bun, Deno, Cloudflare Workers, browsers
License
MIT
