@contexaworks/pdf-to-excel

v0.2.1, published 4 months ago

Convert PDF documents to Excel spreadsheets with AI-powered table extraction


Maintainer: strathausen

Keywords: contexa, pdf, excel, pdf-to-excel, spreadsheet, table-extraction, ocr, ai, rapidapi


Convert PDF documents to Excel spreadsheets (.xlsx) with AI-powered table extraction. Automatically detects tables, headers, column types, and multi-page layouts.

Install

```bash
npm install @contexaworks/pdf-to-excel
```

Quick Start

Get your API key from RapidAPI — subscribe to a plan and copy your X-RapidAPI-Key from any endpoint page.

```typescript
import { ContexaPdfToExcel } from "@contexaworks/pdf-to-excel";
import { readFileSync, writeFileSync } from "fs";

const client = new ContexaPdfToExcel({
  apiKey: "your-rapidapi-key",
});

// Convert a PDF to Excel; the result holds the .xlsx binary in `data`.
// Top-level await needs an ES module (e.g. "type": "module" in package.json).
const result = await client.pdfToExcel(readFileSync("report.pdf"));

writeFileSync("report.xlsx", Buffer.from(result.data));
console.log(`Done in ${result.processingTimeMs}ms`);
```
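Rather than hard-coding the key as in the snippet above, you may prefer to read it from the environment. The helper below is a small sketch: `requireEnv` and the variable name `RAPIDAPI_KEY` are hypothetical choices of ours, not something the SDK reads automatically.

```typescript
// Hypothetical helper (not part of the SDK): read a required setting from
// the environment and fail fast with a clear message if it is missing.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable ${name}`);
  }
  return value;
}

// Usage with the Quick Start client (RAPIDAPI_KEY is our own naming choice):
// const client = new ContexaPdfToExcel({ apiKey: requireEnv("RAPIDAPI_KEY") });
```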

Usage

File Input

The SDK accepts multiple file input formats:

```typescript
// Buffer / Uint8Array (Node.js)
await client.pdfToExcel(readFileSync("file.pdf"));

// File / Blob (browser): the input element or its file list may be empty
const input = document.querySelector<HTMLInputElement>("#file-input");
const selected = input?.files?.[0];
if (selected) {
  await client.pdfToExcel(selected);
}

// Base64 string
await client.pdfToExcel({
  base64: "JVBERi0xLjQg...",
  mimeType: "application/pdf",
  fileName: "report.pdf",
});
```
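If the PDF has to pass through a JSON channel first (for example, a job-queue message), you can build the base64 input object yourself. This sketch assumes the object shape shown in the example above; `toBase64Input`, `fileToBase64Input`, and the `Base64PdfInput` type are our own hypothetical names, not SDK exports.

```typescript
import { readFileSync } from "node:fs";
import { basename } from "node:path";

// Local type mirroring the base64 input shape shown above (not an SDK export).
interface Base64PdfInput {
  base64: string;
  mimeType: string;
  fileName: string;
}

// Encode raw PDF bytes into the base64 input object.
function toBase64Input(bytes: Uint8Array, fileName: string): Base64PdfInput {
  return {
    base64: Buffer.from(bytes).toString("base64"),
    mimeType: "application/pdf",
    fileName,
  };
}

// Convenience wrapper: read a PDF from disk and keep only its file name.
function fileToBase64Input(path: string): Base64PdfInput {
  return toBase64Input(readFileSync(path), basename(path));
}
```

The resulting object can be serialized with `JSON.stringify`, sent through the queue, and passed unchanged to `client.pdfToExcel` on the consuming side.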

Extract Specific Pages

```typescript
const result = await client.pdfToExcel(file, {
  pages: "1-5", // only extract pages 1 through 5
});
```
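For very long documents you might process the file in fixed-size page ranges. The sketch below generates range strings in the `"start-end"` form used above. It assumes you already know the page count, and that the API accepts a range whose start and end are equal (for example, 21 pages in chunks of 10 ends with `"21-21"`); `chunkPageRanges` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper: split 1..totalPages into "start-end" ranges of at
// most chunkSize pages, in the same format as the `pages` option above.
function chunkPageRanges(totalPages: number, chunkSize: number): string[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError("totalPages must be a positive integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, totalPages);
    ranges.push(`${start}-${end}`);
  }
  return ranges;
}
```

For example, `chunkPageRanges(25, 10)` yields `["1-10", "11-20", "21-25"]`, and each entry can be passed as the `pages` option of a separate `client.pdfToExcel` call. Note the trade-off: a table that crosses a chunk boundary will likely come back as two tables, since each job only sees its own pages.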

Custom Extraction Prompt

Guide the AI on what to extract:

```typescript
const result = await client.pdfToExcel(file, {
  prompt: "Only extract the revenue table, ignore footnotes",
});
```

Progress Tracking

For large PDFs, track extraction progress page by page:

```typescript
const result = await client.pdfToExcel(file, {
  onProgress: (job) => {
    const p = job.result?.progress;
    if (p) {
      console.log(`Page ${p.currentPage}/${p.totalPages} — ${p.tablesFound} tables found`);
    }
  },
});
```
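If the callback fires on every poll, the same page may be reported more than once. The sketch below formats a progress snapshot as a percentage and only logs when the page number changes. `ExtractionProgress` is a local type mirroring the three fields used above, and `formatProgress` and `makeProgressReporter` are hypothetical helpers; the SDK's own progress type may carry more fields.

```typescript
// Local type with the progress fields used above (not an SDK export).
interface ExtractionProgress {
  currentPage: number;
  totalPages: number;
  tablesFound: number;
}

// Render one progress snapshot as "NN% (page x/y, n tables)".
function formatProgress(p: ExtractionProgress): string {
  const pct = p.totalPages > 0 ? Math.round((p.currentPage / p.totalPages) * 100) : 0;
  return `${pct}% (page ${p.currentPage}/${p.totalPages}, ${p.tablesFound} tables)`;
}

// Return a callback that logs only when the current page changes,
// and ignores polls that carry no progress yet.
function makeProgressReporter(log: (line: string) => void) {
  let lastPage = -1;
  return (p: ExtractionProgress | undefined): void => {
    if (!p || p.currentPage === lastPage) return;
    lastPage = p.currentPage;
    log(formatProgress(p));
  };
}
```

Assuming the SDK's progress object has at least these fields, you might wire it up as `onProgress: (job) => report(job.result?.progress)`, with `const report = makeProgressReporter(console.log)`.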

Manual Polling

If you prefer to handle polling yourself (e.g. in a queue worker):

```typescript
// Submit the job without waiting
const { jobId } = await client.pdfToExcel(file, { poll: false });

// Check status later
const job = await client.getJob(jobId);
console.log(job.status); // "processing" | "completed" | "failed"

// Fetch the result only once the job has completed
if (job.status === "completed") {
  const response = await client.getJobResult(jobId);
  const xlsx = Buffer.from(await response.arrayBuffer());
  // ...write or upload `xlsx`
}
```
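In a queue worker you will usually want a bounded wait rather than a single status check. The sketch below polls until the job leaves the `"processing"` state or a deadline passes. It relies only on the `status` field shown above; `waitForJob` is a hypothetical helper, and the 10-minute timeout is an arbitrary placeholder (the 2000 ms interval matches the documented `pollInterval` default).

```typescript
// Only `status` is modelled; any job object with a string status will do.
interface JobLike {
  status: string;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job completes, fails, or the deadline passes.
// Statuses other than "completed" and "failed" are treated as still running.
async function waitForJob<J extends JobLike>(
  getJob: (id: string) => Promise<J>,
  jobId: string,
  { intervalMs = 2000, timeoutMs = 600000 } = {},
): Promise<J> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await getJob(jobId);
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error(`Job ${jobId} failed`);
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job ${jobId} not finished after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

With the SDK, this might be called as `await waitForJob((id) => client.getJob(id), jobId)`, followed by `client.getJobResult(jobId)` to download the workbook.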

Webhook Callback

Get notified when extraction completes instead of polling:

```typescript
const { jobId } = await client.pdfToExcel(file, {
  poll: false,
  callbackUrl: "https://your-server.com/webhook",
});
// Your webhook receives: { jobId, status, processingTimeMs }
```
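Since the webhook body arrives over the network, it is worth validating before use. The sketch below assumes the payload is sent as a JSON body with the three fields listed above; `parseWebhookPayload` is a hypothetical helper, and treating `processingTimeMs` as optional (for example, on failed jobs) is our assumption. This README does not describe any request signature, so a cautious handler might also re-check the job with `client.getJob(jobId)` before acting on it.

```typescript
// Payload fields as documented: { jobId, status, processingTimeMs }.
// processingTimeMs is treated as optional here (an assumption).
interface WebhookPayload {
  jobId: string;
  status: string;
  processingTimeMs?: number;
}

// Parse and validate a raw webhook body; returns null if it does not
// match the documented shape.
function parseWebhookPayload(body: string): WebhookPayload | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { jobId, status, processingTimeMs } = data as Record<string, unknown>;
  if (typeof jobId !== "string" || typeof status !== "string") return null;
  if (processingTimeMs !== undefined && typeof processingTimeMs !== "number") {
    return null;
  }
  return { jobId, status, processingTimeMs };
}
```

In a Node.js `http` or Express handler, you would pass the raw request body string to `parseWebhookPayload` and respond with HTTP 400 when it returns `null`.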

What It Extracts

The AI handles complex PDF layouts, including:

- Multi-page tables: tables that span multiple pages are merged automatically
- Multi-row headers: grouped and spanning column headers are preserved
- Column types: numbers, dates, currencies, and percentages are detected automatically
- Merged cells: vertically merged cells are reconstructed in the Excel output
- Multiple tables: each table gets its own worksheet

API Reference

`new ContexaPdfToExcel(options)`

| Option | Type | Required | Description |
|-----------|--------|----------|----------------------------------------|
| `apiKey` | string | Yes | Your RapidAPI key |
| `baseUrl` | string | No | API base URL (default: RapidAPI proxy) |

`client.pdfToExcel(file, options?)`

| Option | Type | Default | Description |
|----------------|----------------------|---------|--------------------------------|
| `prompt` | string | — | Custom extraction instructions |
| `pages` | string | — | Page range (e.g. `"1-5"`) |
| `poll` | boolean | `true` | Wait for completion |
| `onProgress` | `(job: Job) => void` | — | Progress callback |
| `pollInterval` | number | `2000` | Polling interval in ms |
| `callbackUrl` | string | — | Webhook URL for completion |

Returns an `ExcelResult` when `poll` is `true` (the default), or an `AsyncSubmitResult` containing the `jobId` when `poll` is `false`.
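When `poll` is only known at runtime, a type guard can narrow the result. The interfaces below are local stand-ins that model only the fields this README uses (`data`, `processingTimeMs`, `jobId`); they are not the SDK's exported types. We also assume that only the completed result carries `data`, and that it is an `ArrayBuffer` or `Uint8Array`, since the Quick Start passes it to `Buffer.from`.

```typescript
// Local stand-ins, not SDK exports: only fields used in this README are modelled.
interface ExcelResultLike {
  data: ArrayBuffer | Uint8Array;
  processingTimeMs: number;
}
interface AsyncSubmitResultLike {
  jobId: string;
}

// Assumption: only the completed (poll: true) result has a `data` field.
function isExcelResult(
  r: ExcelResultLike | AsyncSubmitResultLike,
): r is ExcelResultLike {
  return "data" in r;
}

// Example consumer: report size for a finished conversion, or the job id otherwise.
function describeResult(r: ExcelResultLike | AsyncSubmitResultLike): string {
  return isExcelResult(r)
    ? `xlsx ready: ${r.data.byteLength} bytes in ${r.processingTimeMs} ms`
    : `job submitted: ${r.jobId}`;
}
```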

`client.getJob(id)`

Returns the current state of a job, including its `status` (`"processing"`, `"completed"`, or `"failed"`).

`client.getJobResult(id)`

Returns the raw `Response` object. For Excel jobs, its body is the .xlsx binary, which you can read with `await response.arrayBuffer()`.
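An .xlsx file is an Office Open XML package, which is a ZIP archive, so a genuine workbook starts with the ZIP local-file-header signature `PK\x03\x04`. The sketch below uses that to catch cases where the body is an error message rather than a workbook. `looksLikeXlsx` and `readXlsxResponse` are hypothetical helpers; the check confirms the container format only, not that the workbook is valid.

```typescript
// True if the bytes start with the ZIP local-file-header signature "PK\x03\x04".
function looksLikeXlsx(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// Read a result Response, rejecting HTTP errors and non-ZIP bodies.
async function readXlsxResponse(response: Response): Promise<Uint8Array> {
  if (!response.ok) {
    throw new Error(`Result request failed with HTTP ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  if (!looksLikeXlsx(bytes)) {
    throw new Error("Response body is not a ZIP/.xlsx payload");
  }
  return bytes;
}
```

For example: `writeFileSync("report.xlsx", await readXlsxResponse(await client.getJobResult(jobId)))`.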

RapidAPI Setup

1. Go to Contexa PDF to Excel on RapidAPI
2. Subscribe to a plan (a free tier is available)
3. Copy your X-RapidAPI-Key from any endpoint page
4. Pass it as `apiKey` when creating the client

License

MIT
