oxpdf

v0.1.3

Published

3 months ago

TypeScript/JavaScript SDK for the 0xPdf PDF-to-JSON API



Works in Node.js 18+, Bun, Deno, and the browser. It has zero dependencies and uses the native fetch API.

Installation

npm install oxpdf

Quick Start

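import { readFile } from "node:fs/promises";
import { OxPDFClient } from "oxpdf";

const client = new OxPDFClient({ apiKey: "your_api_key" });

// Load the PDF into memory (Node.js); in the browser, pass a File or Blob instead
const pdfBuffer = await readFile("./invoice.pdf");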

// Parse with a built-in template
const result = await client.parse(pdfBuffer, "invoice.pdf", {
  schemaTemplate: "invoice",
});
console.log(result.data);

// Parse from a file path (Node.js only)
const result2 = await client.parseFile("./invoice.pdf", {
  schemaTemplate: "invoice",
});

// Queue async processing and wait for completion
const queued = await client.upload(pdfBuffer, "invoice.pdf", {
  schemaId: "your_schema_id",
});
const finalStatus = await client.waitForJob(queued.job_id, {
  intervalMs: 2000,
  timeoutMs: 180000,
});
console.log(finalStatus.status);

// Streaming parse with real-time progress
for await (const event of client.parseFileStream("./large.pdf", {
  schemaTemplate: "invoice",
})) {
  if (event.event === "page") console.log(event.data.message);
  if (event.event === "complete") console.log("Done!", event.data);
}
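The streaming loop above can be wrapped so that progress messages and the final payload are collected in one place. The following is a minimal sketch, not the SDK's own typing: the `ParseEvent` union is an assumption inferred only from the two event names used in the Quick Start (`page` and `complete`), and `fakeStream` is a hypothetical stand-in for `client.parseFileStream(...)` so the example runs without network access.

```typescript
// Assumed event shapes, inferred from the Quick Start loop; the real SDK may emit more.
type PageEvent = { event: "page"; data: { message: string } };
type CompleteEvent = { event: "complete"; data: unknown };
type ParseEvent = PageEvent | CompleteEvent;

// Drain a stream of events, keeping progress messages and the final payload.
async function collectProgress(
  stream: AsyncIterable<ParseEvent>,
): Promise<{ messages: string[]; result: unknown }> {
  const messages: string[] = [];
  let result: unknown = undefined;
  for await (const event of stream) {
    if (event.event === "page") {
      messages.push(event.data.message); // narrowed to PageEvent
    } else if (event.event === "complete") {
      result = event.data; // narrowed to CompleteEvent
    }
  }
  return { messages, result };
}

// Hypothetical stand-in for client.parseFileStream("./large.pdf", { ... }).
async function* fakeStream(): AsyncGenerator<ParseEvent> {
  yield { event: "page", data: { message: "Parsed page 1" } };
  yield { event: "page", data: { message: "Parsed page 2" } };
  yield { event: "complete", data: { total: 42 } };
}

collectProgress(fakeStream()).then(({ messages }) => {
  console.log(`${messages.length} progress messages received`);
});
```

If the SDK's events match these shapes, passing `client.parseFileStream(...)` in place of `fakeStream()` should work unchanged.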

Browser Usage

const input = document.querySelector<HTMLInputElement>("#pdf-upload")!;
input.addEventListener("change", async () => {
  // The file list is empty if the user cancels the picker
  const file = input.files?.[0];
  if (!file) return;
  const result = await client.parse(file, file.name, {
    schemaTemplate: "invoice",
  });
  console.log(result.data);
});

Error Handling

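import { OxPDFClient, OxPDFError } from "oxpdf";

const client = new OxPDFClient({ apiKey: "your_api_key" });

try {
  const result = await client.parseFile("doc.pdf", {
    schemaTemplate: "invoice",
  });
  console.log(result.data);
} catch (e) {
  if (e instanceof OxPDFError) {
    console.error(`API error: ${e.message} (status: ${e.statusCode})`);
  } else {
    throw e; // not an API error: let it propagate
  }
}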
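Because the error exposes a `statusCode`, callers can decide what to do next based on it. The sketch below is illustrative only: `ApiError` is a local stand-in for `OxPDFError` so the example runs on its own, `classifyError` and the `ErrorAction` labels are hypothetical names, and the mapping relies on standard HTTP status semantics. This README does not say which status codes the API actually returns.

```typescript
// Local stand-in for the SDK's OxPDFError; only `message` and `statusCode`
// are taken from the README.
class ApiError extends Error {
  statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

type ErrorAction = "fix-credentials" | "retry-later" | "fix-request" | "rethrow";

// Map an unknown thrown value to a suggested next step (assumed mapping).
function classifyError(e: unknown): ErrorAction {
  if (!(e instanceof ApiError)) return "rethrow"; // not an API error
  if (e.statusCode === 401 || e.statusCode === 403) return "fix-credentials";
  if (e.statusCode === 429 || e.statusCode >= 500) return "retry-later";
  return "fix-request"; // remaining 4xx: the request itself is likely wrong
}

console.log(classifyError(new ApiError("Too many requests", 429))); // retry-later
```

In real code, replace `ApiError` with the imported `OxPDFError` and keep the rest of the function as is.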

Full API Reference

Constructor

new OxPDFClient({ apiKey, baseUrl?, timeout?, retry? })

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | (required) | Your 0xPdf API key |
| baseUrl | string | https://api.0xpdf.io/api/v1 | API base URL |
| timeout | number | 120000 | Request timeout (ms) |
| retry | object | { maxRetries: 2, initialDelayMs: 500, backoffMultiplier: 2 } | Retry/backoff config |
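To make the default `retry` values concrete, here is a small sketch of the wait times they would produce under plain exponential backoff. The formula (retry n waits `initialDelayMs * backoffMultiplier^n`) is an assumption; the README does not state whether the SDK adds jitter, caps the delay, or limits retries to certain errors. `RetryConfig` and `retryDelays` are hypothetical names used only for illustration.

```typescript
// Shape of the `retry` option as listed in the constructor table.
interface RetryConfig {
  maxRetries: number;
  initialDelayMs: number;
  backoffMultiplier: number;
}

// Assumed schedule: retry n (0-based) waits initialDelayMs * backoffMultiplier^n.
// No jitter and no upper bound are applied in this sketch.
function retryDelays(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    delays.push(config.initialDelayMs * Math.pow(config.backoffMultiplier, attempt));
  }
  return delays;
}

const defaultDelays = retryDelays({
  maxRetries: 2,
  initialDelayMs: 500,
  backoffMultiplier: 2,
});
console.log(defaultDelays); // [ 500, 1000 ]
```

Under this assumption, the defaults add at most 1.5 seconds of waiting on top of the request time itself.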

PDF Parsing

| Method | Description |
|---|---|
| parse(file, filename, options?) | Sync parse from Buffer/Blob/File |
| parseFile(filePath, options?) | Sync parse from file path (Node.js) |
| parseStream(file, filename, options?) | Streaming SSE parse (async generator) |
| parseFileStream(filePath, options?) | Streaming parse from file (Node.js) |
| validate(file, filename, options?) | Dry-run validation |

Async Jobs

| Method | Description |
|---|---|
| upload(file, filename, options?) | Queue PDF for background processing |
| jobStatus(jobId) | Poll async job status |
| waitForJob(jobId, options?) | Poll until completed/failed (with timeout) |
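For readers who want to control polling themselves with `jobStatus`, the loop that `waitForJob` is described as performing can be sketched as follows. This is not the SDK's implementation: `pollJob` and `JobStatus` are hypothetical names, the terminal status strings `"completed"` and `"failed"` are assumed from the table's wording, and the defaults simply mirror the values used in the Quick Start.

```typescript
// Minimal assumed shape of a job status response.
type JobStatus = { status: string };

// Call getStatus repeatedly until the job is terminal or the timeout elapses.
async function pollJob(
  getStatus: () => Promise<JobStatus>,
  {
    intervalMs = 2000,
    timeoutMs = 180000,
  }: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const job = await getStatus();
    // Assumed terminal states, taken from the "completed/failed" wording above.
    if (job.status === "completed" || job.status === "failed") return job;
    if (Date.now() >= deadline) {
      throw new Error(`Job did not finish within ${timeoutMs} ms`);
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// With the SDK this would be called roughly as:
//   const final = await pollJob(() => client.jobStatus(queued.job_id));
```

Injecting `getStatus` as a function keeps the loop independent of the client, which also makes it easy to test with a fake status source.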

Image Extraction

| Method | Description |
|---|---|
| extractImages(file, filename, options?) | Extract images from a PDF |
| listImages(limit?, offset?) | List extracted images |
| getImageUrl(imageId, expirationSeconds?) | Get/refresh presigned URL |
| deleteImage(imageId) | Delete a specific image |
| deleteAllImages() | Delete all images |

File Management

| Method | Description |
|---|---|
| listFiles() | List uploaded PDFs |
| getFile(pdfId) | Get PDF metadata + download URL |
| deleteFile(pdfId) | Delete an uploaded PDF |

Schema CRUD

| Method | Description |
|---|---|
| listSchemas() | List saved schemas |
| getSchema(schemaId) | Get schema with full definition |
| createSchema(options) | Create a new schema |
| updateSchema(schemaId, options) | Update existing schema |
| deleteSchema(schemaId) | Delete a schema |
| setDefaultSchema(schemaId) | Set as default |
| generateSchema(options) | AI-generate a schema |

Templates

| Method | Description |
|---|---|
| listTemplates() | Parse templates (invoice, receipt, etc.) |
| listSchemaTemplates() | Schema editor templates |
| getSchemaTemplate(templateId) | Get template with full schema |

Analytics & Pricing

| Method | Description |
|---|---|
| getAnalytics() | Usage analytics |
| submitFeedback(feedback) | Submit feedback |
| getPricing(billingCycle?) | Get pricing tiers |
| getCurrentTier() | Current subscription & quota |

Parse Options

| Option | Type | Description |
| --- | --- | --- |
| schema | Record<string, any> | Custom JSON schema |
| schemaTemplate | string | Pre-built template name |
| schemaId | string | Saved schema ID |
| useOcr | boolean | Enable OCR (default: false) |
| ocrEngine | string | "surya" or "groq_vision" |
| pages | number[] | Specific pages to parse |
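The options in the table can be combined in a single object. The sketch below declares a local `ParseOptions` interface that mirrors the table, since the SDK's exported type name is not shown in this README, and builds two example option sets. Two points are assumptions: that page numbers are 1-based, and that only one of `schema`, `schemaTemplate`, and `schemaId` should be set per call. The hypothetical `schemaSources` helper simply reports which of the three are present so that an accidental combination is easy to spot.

```typescript
// Local mirror of the Parse Options table (the SDK's own type may differ).
interface ParseOptions {
  schema?: Record<string, unknown>;
  schemaTemplate?: string;
  schemaId?: string;
  useOcr?: boolean;
  ocrEngine?: "surya" | "groq_vision";
  pages?: number[];
}

// Report which schema-selecting options are set on an options object.
function schemaSources(options: ParseOptions): string[] {
  const sources: string[] = [];
  if (options.schema !== undefined) sources.push("schema");
  if (options.schemaTemplate !== undefined) sources.push("schemaTemplate");
  if (options.schemaId !== undefined) sources.push("schemaId");
  return sources;
}

// A scanned receipt: built-in template, OCR enabled, first two pages only
// (1-based page numbering is assumed here).
const scannedReceipt: ParseOptions = {
  schemaTemplate: "receipt",
  useOcr: true,
  ocrEngine: "surya",
  pages: [1, 2],
};

// A custom JSON schema supplied inline instead of a template.
const customInvoice: ParseOptions = {
  schema: {
    type: "object",
    properties: { total: { type: "number" }, vendor: { type: "string" } },
  },
};

console.log(schemaSources(scannedReceipt)); // [ 'schemaTemplate' ]
console.log(schemaSources(customInvoice)); // [ 'schema' ]
```

Either object would be passed as the last argument to `parse`, `parseFile`, or the streaming variants.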
