@contexaworks/pdf-to-excel
v0.2.1
Convert PDF documents to Excel spreadsheets (.xlsx) with AI-powered table extraction. Automatically detects tables, headers, column types, and multi-page layouts.
Powered by the Contexa PDF to Excel API on RapidAPI.
Install
npm install @contexaworks/pdf-to-excel
Quick Start
Get your API key from RapidAPI — subscribe to a plan and copy your X-RapidAPI-Key from any endpoint page.
import { ContexaPdfToExcel } from "@contexaworks/pdf-to-excel";
import { readFileSync, writeFileSync } from "fs";
const client = new ContexaPdfToExcel({
apiKey: "your-rapidapi-key",
});
// Convert a PDF to Excel — returns the .xlsx binary
const result = await client.pdfToExcel(readFileSync("report.pdf"));
writeFileSync("report.xlsx", Buffer.from(result.data));
console.log(`Done in ${result.processingTimeMs}ms`);
Usage
File Input
The SDK accepts multiple file input formats:
// Buffer / Uint8Array (Node.js)
await client.pdfToExcel(readFileSync("file.pdf"));
// File / Blob (browser)
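const input = document.querySelector<HTMLInputElement>("#file-input");
const selected = input?.files?.[0]; // undefined if the element or the selection is missing
if (selected) await client.pdfToExcel(selected);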
// Base64 string
await client.pdfToExcel({
base64: "JVBERi0xLjQg...",
mimeType: "application/pdf",
fileName: "report.pdf",
});
Extract Specific Pages
const result = await client.pdfToExcel(file, {
pages: "1-5", // only extract pages 1 through 5
});
Custom Extraction Prompt
Guide the AI on what to extract:
const result = await client.pdfToExcel(file, {
prompt: "Only extract the revenue table, ignore footnotes",
});
Progress Tracking
For large PDFs, track extraction progress page by page:
const result = await client.pdfToExcel(file, {
onProgress: (job) => {
const p = job.result?.progress;
if (p) {
console.log(`Page ${p.currentPage}/${p.totalPages} — ${p.tablesFound} tables found`);
}
},
});
Manual Polling
If you prefer to handle polling yourself (e.g. in a queue worker):
// Submit the job without waiting
const { jobId } = await client.pdfToExcel(file, { poll: false });
// Check status later
const job = await client.getJob(jobId);
console.log(job.status); // "processing" | "completed" | "failed"
// Fetch the result when ready
const response = await client.getJobResult(jobId);
const xlsx = Buffer.from(await response.arrayBuffer());
Webhook Callback
Get notified when extraction completes instead of polling:
const { jobId } = await client.pdfToExcel(file, {
poll: false,
callbackUrl: "https://your-server.com/webhook",
});
// Your webhook receives: { jobId, status, processingTimeMs }
What It Extracts
The AI handles complex PDF layouts including:
- Multi-page tables — tables that span across pages are automatically merged
- Multi-row headers — grouped/spanning column headers are preserved
- Column types — numbers, dates, currencies, percentages detected automatically
- Merged cells — vertically merged cells are reconstructed in the Excel output
- Multiple tables — each table gets its own worksheet
API Reference
new ContexaPdfToExcel(options)
| Option | Type | Required | Description |
|-----------|----------|----------|--------------------------------------------------|
| apiKey | string | Yes | Your RapidAPI key |
| baseUrl | string | No | API base URL (default: RapidAPI proxy) |
client.pdfToExcel(file, options?)
| Option | Type | Default | Description |
|---------------|-------------------------|---------|------------------------------------|
| prompt | string | — | Custom extraction instructions |
| pages | string | — | Page range (e.g. "1-5") |
| poll | boolean | true | Wait for completion |
| onProgress | (job: Job) => void | — | Progress callback |
| pollInterval | number | 2000 | Polling interval in ms |
| callbackUrl | string | — | Webhook URL for completion |
Returns ExcelResult (when poll: true) or AsyncSubmitResult (when poll: false).
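The onProgress callback receives the job object, and a small formatter can turn its progress fields into a readable log line. The following is a minimal sketch, assuming the progress shape used in the Progress Tracking example (currentPage, totalPages, tablesFound). The ProgressInfo interface and the formatProgress helper are hypothetical names for illustration and are not part of the SDK.

```typescript
// Hypothetical shape, inferred from the fields used in the Progress Tracking example.
interface ProgressInfo {
  currentPage: number;
  totalPages: number;
  tablesFound: number;
}

// Hypothetical helper: turns a progress snapshot into a one-line summary.
function formatProgress(p: ProgressInfo): string {
  // Guard against a zero page count so the percentage is never NaN.
  const percent =
    p.totalPages > 0 ? Math.round((p.currentPage / p.totalPages) * 100) : 0;
  return `Page ${p.currentPage}/${p.totalPages} (${percent}%), ${p.tablesFound} tables found`;
}
```

Inside an onProgress callback this could be used as `if (job.result?.progress) console.log(formatProgress(job.result.progress));`.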
client.getJob(id)
Returns the current status of a job.
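When polling manually, getJob is typically called in a loop until the job leaves the "processing" state. The sketch below shows one way to do that with a bounded number of attempts. It assumes only the three status values listed in the Manual Polling example; JobSource, sleep, and waitForJob are hypothetical names, and the real Job object likely carries more fields than shown here.

```typescript
// Job statuses as listed in the Manual Polling example.
type JobStatus = "processing" | "completed" | "failed";

// Minimal structural type: any object with a compatible getJob method fits.
// This is an assumption about the client's shape, not the SDK's own typing.
interface JobSource {
  getJob(id: string): Promise<{ status: JobStatus }>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls until the job leaves "processing" or attempts run out.
async function waitForJob(
  source: JobSource,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await source.getJob(jobId);
    if (status !== "processing") return status; // "completed" or "failed"
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} still processing after ${maxAttempts} checks`);
}
```

The attempt cap keeps a queue worker from waiting forever on a stuck job; the caller decides what to do with a "failed" status.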
client.getJobResult(id)
Returns the raw Response — for Excel jobs this contains the .xlsx binary.
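Because getJobResult hands back the raw Response, it can be worth checking the HTTP status and the body before writing anything to disk. The sketch below is one hedged way to do this. It relies on the fact that .xlsx files are ZIP archives and therefore start with the bytes `PK\x03\x04`. BinaryResponse, looksLikeXlsx, and readXlsx are hypothetical names, and the sketch assumes the returned object behaves like a standard fetch Response.

```typescript
// Minimal structural view of a fetch Response; only what this sketch uses.
interface BinaryResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// .xlsx files are ZIP archives, which begin with the bytes "PK\x03\x04".
function looksLikeXlsx(bytes: Uint8Array): boolean {
  const magic = [0x50, 0x4b, 0x03, 0x04];
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: validates the response, then returns the workbook bytes.
async function readXlsx(response: BinaryResponse): Promise<Uint8Array> {
  if (!response.ok) {
    throw new Error(`Result request failed with HTTP ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  if (!looksLikeXlsx(bytes)) {
    throw new Error("Response body is not a ZIP-based .xlsx file");
  }
  return bytes;
}
```

A standard Response satisfies BinaryResponse structurally, so `await readXlsx(await client.getJobResult(jobId))` would be the intended call under these assumptions.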
RapidAPI Setup
- Go to Contexa PDF to Excel on RapidAPI
- Subscribe to a plan (free tier available)
- Copy your X-RapidAPI-Key from any endpoint page
- Pass it as apiKey when creating the client
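Rather than hard-coding the key, the constructor options can be built from environment variables. The following is a minimal sketch of that idea. The ClientOptions interface mirrors the two options in the API Reference table, while optionsFromEnv and the variable names RAPIDAPI_KEY and CONTEXA_BASE_URL are assumptions chosen for illustration, not names the SDK reads on its own.

```typescript
// Options accepted by the constructor, as listed in the API Reference table.
interface ClientOptions {
  apiKey: string;
  baseUrl?: string;
}

// Hypothetical helper: builds constructor options from an environment-like map.
// The variable names RAPIDAPI_KEY and CONTEXA_BASE_URL are assumptions.
function optionsFromEnv(env: Record<string, string | undefined>): ClientOptions {
  const apiKey = env["RAPIDAPI_KEY"]?.trim();
  if (!apiKey) throw new Error("RAPIDAPI_KEY is not set");
  const baseUrl = env["CONTEXA_BASE_URL"]?.trim();
  // Leave baseUrl out entirely when unset so the client's default applies.
  return baseUrl ? { apiKey, baseUrl } : { apiKey };
}
```

In Node.js this would be used as `new ContexaPdfToExcel(optionsFromEnv(process.env))`, which fails fast with a clear message when the key is missing.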
License
MIT
