@firescraper/sdk

v1.0.0

Published

25 days ago

Official TypeScript SDK for the FireScraper web scraping API. Turn websites into clean, structured text for RAG pipelines and AI agents.

molokstech

web-scraping scraper rag ai llm crawl firescraper data-extraction structured-data sdk

FireScraper SDK

Install

npm install @firescraper/sdk

Quick start

import { FireScraper } from '@firescraper/sdk';

const client = new FireScraper('fsk_your_api_key');

// Start a crawl
const session = await client.scrape({
  name: 'Docs crawl',
  urls: ['https://docs.example.com/'],
  maxDepth: 2,
});

// Wait for completion
const result = await client.waitForCompletion(session.id, {
  onProgress: (s) => console.log(`${s.counts.success} pages scraped`),
});

// Download results as Markdown
const download = await client.getResults(session.id, 'markdown');

Authentication

Create an API key at Settings > API Keys in the FireScraper dashboard. Keys start with fsk_ and are shown only once.

// Minimal — just pass the key
const client = new FireScraper('fsk_your_api_key');

// With options
const client = new FireScraper({
  apiKey: 'fsk_your_api_key',
  baseUrl: 'https://firescraper.com', // default
  timeout: 30_000,                     // default: 30s
});
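Because a key is shown only once, it is commonly read from the environment rather than hard-coded. The following is a minimal sketch, not part of the SDK: the helper name `loadApiKey` and the variable name `FIRESCRAPER_API_KEY` are assumptions chosen for illustration. The only detail taken from this README is the `fsk_` prefix.

```typescript
// Hypothetical helper, not part of @firescraper/sdk.
// FIRESCRAPER_API_KEY is an assumed variable name; use whatever your deployment defines.
function loadApiKey(
  env: Record<string, string | undefined>,
  name = 'FIRESCRAPER_API_KEY',
): string {
  const key = env[name]?.trim();
  if (!key) {
    throw new Error(`${name} is not set`);
  }
  // Keys start with "fsk_", so anything else is probably a misconfiguration.
  if (!key.startsWith('fsk_')) {
    throw new Error(`${name} does not start with "fsk_"`);
  }
  return key;
}
```

In Node.js this could be used as `new FireScraper(loadApiKey(process.env))`.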

API Reference

`client.scrape(options)`

Start a new scrape session.

const session = await client.scrape({
  name: 'Product pages',
  urls: ['https://example.com/products'],
  ignoreUrls: ['https://example.com/products/archive'],
  maxDepth: 3,
  minTextLength: 80,
  scraper: 'article',           // 'article' | 'full'
  uniqueTextDownloads: true,
  respectRobotsTxt: true,
  contentSelector: 'main article',
  webhookUrl: 'https://your-app.com/webhook',
  extractionSchema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      price: { type: 'number' },
    },
  },
});

console.log(session.id);     // 'SESSION_ID'
console.log(session.status); // 'in-progress'

| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Project name |
| urls | string[] | Yes | Seed URLs to crawl |
| ignoreUrls | string[] | No | URLs to skip |
| maxDepth | number | No | Link-hop depth (0 = seed only) |
| minTextLength | number | No | Minimum word count per page |
| scraper | 'article' \| 'full' | No | Extraction mode (default: 'article') |
| uniqueTextDownloads | boolean | No | Deduplicate text content |
| respectRobotsTxt | boolean | No | Honour robots.txt rules |
| contentSelector | string | No | CSS selector to restrict extraction |
| webhookUrl | string | No | POST callback when crawl finishes |
| extractionSchema | object | No | JSON Schema for structured extraction |
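The options can also be read as a TypeScript shape. The sketch below is derived only from the option list in this section; the names `ScrapeOptionsSketch` and `checkScrapeOptions` are hypothetical, and the SDK's own exported types and server-side validation may differ. The rule that `maxDepth` is a non-negative integer is an assumption inferred from "0 = seed only".

```typescript
// Type sketch derived from the documented options; not the SDK's exported type.
interface ScrapeOptionsSketch {
  name: string;
  urls: string[];
  ignoreUrls?: string[];
  maxDepth?: number;
  minTextLength?: number;
  scraper?: 'article' | 'full';
  uniqueTextDownloads?: boolean;
  respectRobotsTxt?: boolean;
  contentSelector?: string;
  webhookUrl?: string;
  extractionSchema?: Record<string, unknown>;
}

// Hypothetical pre-flight check: returns a list of problems, empty if none were found.
function checkScrapeOptions(o: ScrapeOptionsSketch): string[] {
  const problems: string[] = [];
  if (o.name.trim() === '') problems.push('name is required');
  if (o.urls.length === 0) problems.push('urls needs at least one seed URL');
  for (const u of [...o.urls, ...(o.ignoreUrls ?? [])]) {
    try {
      new URL(u); // throws on strings that are not absolute URLs
    } catch {
      problems.push(`not a valid URL: ${u}`);
    }
  }
  // Assumption: depth counts link hops, so it should be an integer >= 0.
  if (o.maxDepth !== undefined && (!Number.isInteger(o.maxDepth) || o.maxDepth < 0)) {
    problems.push('maxDepth must be a non-negative integer');
  }
  return problems;
}
```

Running a check like this before calling `client.scrape` surfaces obvious mistakes locally instead of as a `BAD_REQUEST` from the API.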

`client.getSession(sessionId)`

Get the current status, page counts, and processing state.

const status = await client.getSession(session.id);

console.log(status.session.status);       // 'in-progress' | 'done' | ...
console.log(status.counts.success);       // pages scraped successfully
console.log(status.processing.queueLength); // pages still in queue

`client.waitForCompletion(sessionId, options?)`

Poll until the session reaches a terminal status.

const final = await client.waitForCompletion(session.id, {
  pollInterval: 5_000,   // check every 5s (default)
  timeout: 600_000,      // give up after 10 min
  onProgress: (s) => {
    console.log(`${s.counts.success}/${s.counts.total} pages`);
  },
});

Throws FireScraperError with code 'TIMEOUT' if the deadline is exceeded.
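To make the polling behaviour concrete, here is a generic poll-until-done loop with a deadline. It is an illustration of the pattern only, not the SDK's source: `pollUntil` and its parameters are hypothetical, the 10-minute timeout default is an assumption, and the thrown error is a plain `Error` carrying a `code` of `'TIMEOUT'` rather than a real `FireScraperError`.

```typescript
// Illustrative polling loop; not taken from @firescraper/sdk.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isTerminal: (status: T) => boolean,
  opts: { pollInterval?: number; timeout?: number; onProgress?: (status: T) => void } = {},
): Promise<T> {
  // 5s matches the documented default interval; the 10 min timeout is an assumed default.
  const { pollInterval = 5_000, timeout = 600_000, onProgress } = opts;
  const deadline = Date.now() + timeout;

  for (;;) {
    const status = await fetchStatus();
    onProgress?.(status);
    if (isTerminal(status)) return status;

    // Stop if the next poll would start after the deadline.
    if (Date.now() + pollInterval > deadline) {
      throw Object.assign(new Error('Polling deadline exceeded'), { code: 'TIMEOUT' });
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollInterval));
  }
}
```

Checking the deadline before sleeping, rather than after, avoids waiting a full interval only to time out immediately.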

`client.listResults(sessionId)`

List the export files available after a crawl completes.

const { files } = await client.listResults(session.id);
// [{ format: 'csv', fileName: 'corpus-csv.csv' }, ...]

`client.getResults(sessionId, format)`

Download a result file. Returns an object with the file name (fileName) and the file contents as an ArrayBuffer (data), which you can write to disk or decode.

import { writeFileSync } from 'fs';

const result = await client.getResults(session.id, 'markdown');
writeFileSync(result.fileName, Buffer.from(result.data));

Supported formats: zip, csv, json, jsonl, markdown, structured, manifest, documents, chunks, extracted.

`client.getPartialResults(sessionId, format?)`

Download pages scraped so far while the crawl is still running. Supports csv, json, jsonl.

const partial = await client.getPartialResults(session.id, 'json');
console.log(`${partial.rowCount} rows exported mid-crawl`);

Error handling

All errors are instances of FireScraperError with a typed code field.

import { FireScraper, FireScraperError } from '@firescraper/sdk';

try {
  await client.scrape({ name: 'Test', urls: ['https://example.com'] });
} catch (error) {
  if (error instanceof FireScraperError) {
    switch (error.code) {
      case 'UNAUTHORIZED':  // invalid or revoked API key
      case 'RATE_LIMITED':  // too many requests
      case 'NOT_FOUND':     // session doesn't exist
      case 'BAD_REQUEST':   // invalid input
      case 'TIMEOUT':       // request or poll timed out
      case 'SERVER_ERROR':  // 5xx from the API
      case 'NETWORK_ERROR': // fetch failed
        console.error(error.code, error.message);
    }
  }
}
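Some of these codes describe transient conditions, so a caller may want to retry them with exponential backoff. The sketch below is a hypothetical helper, not part of the SDK. Which codes are worth retrying is an assumption based on the descriptions above, and it inspects the `code` property directly (instead of using `instanceof FireScraperError`) only to keep the example self-contained.

```typescript
// Hypothetical retry helper; the retryable set is an assumption, not SDK behaviour.
const RETRYABLE_CODES = new Set(['RATE_LIMITED', 'SERVER_ERROR', 'NETWORK_ERROR']);

async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 4, baseDelayMs = 500 }: { attempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const code = (error as { code?: unknown } | null)?.code;
      const retryable = typeof code === 'string' && RETRYABLE_CODES.has(code);
      // Rethrow non-transient errors immediately, and transient ones once attempts run out.
      if (!retryable || attempt >= attempts) throw error;

      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A call such as `withRetry(() => client.getSession(id))` would then retry rate-limit, server, and network failures while surfacing `UNAUTHORIZED` or `BAD_REQUEST` right away.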

Common patterns

Feed a RAG pipeline

const session = await client.scrape({
  name: 'Knowledge base',
  urls: ['https://docs.example.com/'],
  maxDepth: 4,
  scraper: 'article',
  respectRobotsTxt: true,
});

await client.waitForCompletion(session.id);

const docs = await client.getResults(session.id, 'documents');
const text = new TextDecoder().decode(docs.data);

// Each line is a JSON document ready for chunking + embedding
for (const line of text.split('\n').filter(Boolean)) {
  const doc = JSON.parse(line);
  await vectorStore.upsert(doc.document_id, doc.text);
}
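The chunking step mentioned in the comment above could look like the following. This is a sketch under stated assumptions: `document_id` and `text` are the field names used in the loop above, while `parseJsonl`, `chunkDoc`, the character-based sizes, and the `id#index` chunk naming are hypothetical choices for illustration.

```typescript
// Hypothetical helpers for splitting downloaded documents before embedding.
interface Doc { document_id: string; text: string }
interface Chunk { id: string; text: string }

// Parse newline-delimited JSON, skipping blank lines.
function parseJsonl(text: string): Doc[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as Doc);
}

// Split one document into overlapping, fixed-size character windows.
function chunkDoc(doc: Doc, size = 1000, overlap = 100): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('require size > 0 and 0 <= overlap < size');
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0, i = 0; start < doc.text.length; start += step, i++) {
    chunks.push({ id: `${doc.document_id}#${i}`, text: doc.text.slice(start, start + size) });
    // Stop once a window has reached the end of the text.
    if (start + size >= doc.text.length) break;
  }
  return chunks;
}
```

Character windows are the simplest option. Splitting on headings, paragraphs, or tokens usually gives better retrieval quality, at the cost of more code.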

Webhooks with HMAC signature verification

const session = await client.scrape({
  name: 'Weekly pricing refresh',
  urls: ['https://competitor.com/pricing'],
  maxDepth: 1,
  webhookUrl: 'https://your-app.com/firescraper/done',
});

// Store the signing secret securely — it is only returned once
console.log('Webhook secret:', session.webhookSecret);

// In your webhook handler, verify the x-firescraper-signature header:
// t=<timestamp>,v1=<hmac_sha256_hex>
// Compute: HMAC-SHA256(secret, "<timestamp>.<raw_body>")
// Compare v1 with a constant-time comparison.
// Deliveries retry up to 3 times with exponential backoff.
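The verification steps in the comments above can be sketched with Node's built-in crypto module. The function name `verifySignature` is hypothetical, and the sketch assumes the handler has access to the raw, unparsed request body as a string. The header layout and the signed payload (`<timestamp>.<raw_body>`) follow the description above.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch of webhook signature verification; not part of @firescraper/sdk.
function verifySignature(header: string, rawBody: string, secret: string): boolean {
  // Parse "t=<timestamp>,v1=<hmac_sha256_hex>" into key/value pairs.
  const parts = new Map<string, string>();
  for (const item of header.split(',')) {
    const idx = item.indexOf('=');
    if (idx > 0) parts.set(item.slice(0, idx).trim(), item.slice(idx + 1).trim());
  }
  const timestamp = parts.get('t');
  const received = parts.get('v1');
  if (!timestamp || !received) return false;

  // Compute HMAC-SHA256(secret, "<timestamp>.<raw_body>").
  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const receivedBuf = Buffer.from(received, 'hex');

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return receivedBuf.length === expected.length && timingSafeEqual(receivedBuf, expected);
}
```

The body must be the exact bytes received. Re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.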

Extract structured data

const session = await client.scrape({
  name: 'Product catalog',
  urls: ['https://shop.example.com/products'],
  maxDepth: 2,
  extractionSchema: {
    type: 'object',
    properties: {
      product_name: { type: 'string' },
      price: { type: 'number' },
      in_stock: { type: 'boolean' },
    },
  },
});

await client.waitForCompletion(session.id);
const extracted = await client.getResults(session.id, 'extracted');
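Parsed JSON is untyped in TypeScript, so a type guard that mirrors the schema can be handy once the extracted file is decoded. The layout of the extracted file is not described here, so the sketch below only validates a single already-parsed record. `Product` and `isProduct` are hypothetical names, and requiring all three fields is an assumption that is stricter than the schema above, which marks no field as required.

```typescript
// Hypothetical record type mirroring the extractionSchema above.
interface Product {
  product_name: string;
  price: number;
  in_stock: boolean;
}

// Type guard: true only if every field is present with the expected type.
function isProduct(value: unknown): value is Product {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.product_name === 'string' &&
    typeof record.price === 'number' &&
    Number.isFinite(record.price) &&
    typeof record.in_stock === 'boolean'
  );
}
```

Records that fail the guard can be logged and skipped rather than passed downstream with missing or mistyped fields.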

Requirements

Node.js 18+ (uses native fetch)
Also works in other runtimes that provide fetch: Bun, Deno, Cloudflare Workers, and browsers

License

MIT
