pdf-case-parser
v1.0.3
Node.js port of PDF extraction and case parser
pdf-case-parser-node
Parse case records from PDF files (local path or URL).
Runtime
This package supports:
- Node.js backend: full mode (text PDF + OCR with tesseract/pdftoppm)
- Frontend/browser: text extraction + OCR via tesseract.js
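
Code shared between the two runtimes may need to tell them apart before choosing a call style. The following is a minimal sketch, not part of the package: `isNodeRuntime` and `buildParseFileOptions` are hypothetical helpers, and the option names (`pdfPath`, `file`, `pageNumber`) are taken from the usage examples in this README.

```javascript
// Hypothetical helper (not exported by pdf-case-parser-node):
// returns true when running under Node.js rather than in a browser.
function isNodeRuntime() {
  return (
    typeof process !== "undefined" &&
    process.versions != null &&
    typeof process.versions.node === "string"
  );
}

// Hypothetical helper: build the options object for parsePdfFile.
// Assumes the backend takes a path in `pdfPath` and the browser takes
// a File object in `file`, as the README examples show.
function buildParseFileOptions(source, pageNumber) {
  return isNodeRuntime()
    ? { pdfPath: source, pageNumber }
    : { file: source, pageNumber };
}
```

The result could then be passed straight to `parsePdfFile(...)` in either environment.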
Install
npm install pdf-case-parser-node

API
const { parsePdfFile, parsePdfUrl, parsePdfBuffer } = require("pdf-case-parser-node");

1) Backend local file path
const records = await parsePdfFile({
  pdfPath: "C:\\data\\input.pdf",
  pageNumber: 3
});

2) Backend URL
const records = await parsePdfUrl({
  pdfUrl: "https://example.com/file.pdf",
  pageNumber: 3
});

3) Frontend file input (supports OCR files)
import { parsePdfFile } from "pdf-case-parser-node";

const file = event.target.files[0];
const records = await parsePdfFile({
  file,
  pageNumber: 2,
  forceOcr: true
});
console.log(records);

4) Backend uploaded buffer
const records = await parsePdfBuffer({
  buffer: req.file.buffer,
  pageNumber: 2
});

OCR executable paths
You can provide paths in 3 ways (priority order):
- Function options:
  await parsePdfUrl({
    pdfUrl: "https://example.com/file.pdf",
    pageNumber: 1,
    tesseractPath: "C:\\Program Files\\Tesseract-OCR\\tesseract.exe",
    pdftoppmPath: "C:\\Program Files\\poppler-24.08.0\\Library\\bin\\pdftoppm.exe"
  });
- Environment variables:
  $env:TESSERACT_PATH="C:\Program Files\Tesseract-OCR\tesseract.exe"
  $env:PDFTOPPM_PATH="C:\Program Files\poppler-24.08.0\Library\bin\pdftoppm.exe"
  node app.js
- Auto-detection:
  - Local project folders (no path passing needed):
    - ./tesseract.exe
    - ./pdftoppm.exe
    - ./bin/tesseract.exe
    - ./bin/pdftoppm.exe
    - ./vendor/tesseract.exe
    - ./vendor/pdftoppm.exe
  - Default Windows install locations:
    - C:\Program Files\Tesseract-OCR\tesseract.exe
    - C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
    - C:\Program Files\poppler-24.08.0\Library\bin\pdftoppm.exe
    - C:\Program Files\poppler\Library\bin\pdftoppm.exe
  - PATH lookup (tesseract, pdftoppm)
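
The priority order above can be sketched as a small resolver. This is only an illustration of the documented order, not the package's actual implementation: `resolveExecutable` and its parameter names are hypothetical, and the sketch assumes that an explicitly supplied path is used as-is without checking that it exists.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of pdf-case-parser-node) that mirrors the
// documented order: function option, then environment variable, then
// auto-detected locations, then the bare command name for PATH lookup.
function resolveExecutable({
  optionPath,
  envPath,
  candidates = [],
  fallbackName,
  exists = fs.existsSync, // injectable so the logic can be tested without real files
}) {
  if (optionPath) return optionPath; // 1. function option wins
  if (envPath) return envPath;       // 2. environment variable
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate; // 3. first known location that exists
  }
  return fallbackName; // 4. leave it to the operating system's PATH lookup
}

// Example: resolve tesseract using the locations listed in this README.
const tesseractPath = resolveExecutable({
  optionPath: undefined,
  envPath: process.env.TESSERACT_PATH,
  candidates: [
    "./tesseract.exe",
    "./bin/tesseract.exe",
    "./vendor/tesseract.exe",
    "C:\\Program Files\\Tesseract-OCR\\tesseract.exe",
    "C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe",
  ],
  fallbackName: "tesseract",
});
```

Passing the existence check in as a parameter keeps the ordering logic separate from the file system, which makes the precedence rules easy to verify in isolation.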
