pdf-case-parser

v1.0.3, published 2 months ago

Node.js port of PDF extraction and case parser


siva22

pdf-case-parser-node

Parse case records from PDF files (local path or URL).

Runtime

This package supports:

- Node.js backend: full mode (text PDF + OCR with tesseract/pdftoppm)
- Frontend/browser: text extraction + OCR via tesseract.js

Install

npm install pdf-case-parser-node

API

const { parsePdfFile, parsePdfUrl, parsePdfBuffer } = require("pdf-case-parser-node");

1) Backend local file path

const records = await parsePdfFile({
  pdfPath: "C:\\data\\input.pdf",
  pageNumber: 3
});

2) Backend URL

const records = await parsePdfUrl({
  pdfUrl: "https://example.com/file.pdf",
  pageNumber: 3
});
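
Because the package accepts either a local path or a URL, a caller may want a single entry point that picks the matching function. The following is a minimal sketch, not part of the package: `isHttpUrl` and `parseFromSource` are hypothetical helpers, and the parser functions are passed in as an argument so the sketch relies only on the option names (`pdfPath`, `pdfUrl`, `pageNumber`) shown above.

```javascript
// Hypothetical helper: true only when `source` parses as an http(s) URL.
function isHttpUrl(source) {
  try {
    const { protocol } = new URL(source);
    // A Windows path such as "C:\\data\\input.pdf" parses with protocol "c:",
    // so an explicit protocol check is needed rather than "did it parse".
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not an absolute URL, so treat it as a local path
  }
}

// Hypothetical dispatcher: `parsers` is expected to be
// { parsePdfFile, parsePdfUrl } as exported by the package.
async function parseFromSource(parsers, source, pageNumber) {
  return isHttpUrl(source)
    ? parsers.parsePdfUrl({ pdfUrl: source, pageNumber })
    : parsers.parsePdfFile({ pdfPath: source, pageNumber });
}
```

A call would then look like `await parseFromSource({ parsePdfFile, parsePdfUrl }, "https://example.com/file.pdf", 3)`.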

3) Frontend file input (supports scanned PDFs via OCR)

import { parsePdfFile } from "pdf-case-parser-node";

const file = event.target.files[0];
const records = await parsePdfFile({
  file,
  pageNumber: 2,
  forceOcr: true
});
console.log(records);

4) Backend uploaded buffer

const records = await parsePdfBuffer({
  buffer: req.file.buffer,
  pageNumber: 2
});

OCR executable paths

You can provide the tesseract and pdftoppm executable paths in three ways, listed in priority order:

Function options:

await parsePdfUrl({
  pdfUrl: "https://example.com/file.pdf",
  pageNumber: 1,
  tesseractPath: "C:\\Program Files\\Tesseract-OCR\\tesseract.exe",
  pdftoppmPath: "C:\\Program Files\\poppler-24.08.0\\Library\\bin\\pdftoppm.exe"
});

Environment variables (PowerShell syntax):

$env:TESSERACT_PATH="C:\Program Files\Tesseract-OCR\tesseract.exe"
$env:PDFTOPPM_PATH="C:\Program Files\poppler-24.08.0\Library\bin\pdftoppm.exe"
node app.js

Auto-detection:

Local project folders (no path passing needed):
- ./tesseract.exe
- ./pdftoppm.exe
- ./bin/tesseract.exe
- ./bin/pdftoppm.exe
- ./vendor/tesseract.exe
- ./vendor/pdftoppm.exe

Windows install locations:
- C:\Program Files\Tesseract-OCR\tesseract.exe
- C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
- C:\Program Files\poppler-24.08.0\Library\bin\pdftoppm.exe
- C:\Program Files\poppler\Library\bin\pdftoppm.exe

PATH lookup:
- tesseract
- pdftoppm
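
The priority order described in this section can be sketched as a small lookup function. This README does not show the package's internal implementation, so the code below is only an illustration: `resolveExecutable` and its parameters are hypothetical names, and falling back to the bare command name for the PATH step is an assumption.

```javascript
const fs = require("fs");

// Hypothetical sketch of the lookup order:
// function option, then environment variable, then known locations, then PATH.
function resolveExecutable({
  optionPath,
  envVar,
  candidates,
  fallbackName,
  env = process.env,
  exists = fs.existsSync,
}) {
  if (optionPath) return optionPath;      // 1. explicit function option
  if (env[envVar]) return env[envVar];    // 2. environment variable
  const found = candidates.find((p) => exists(p)); // 3. first candidate that exists on disk
  return found || fallbackName;           // 4. bare name, left for the OS to resolve via PATH
}

// Example lookup for tesseract, using locations listed in this section.
const tesseractPath = resolveExecutable({
  optionPath: undefined,
  envVar: "TESSERACT_PATH",
  candidates: [
    "./tesseract.exe",
    "./bin/tesseract.exe",
    "./vendor/tesseract.exe",
    "C:\\Program Files\\Tesseract-OCR\\tesseract.exe",
  ],
  fallbackName: "tesseract",
});
```

Passing `env` and `exists` as parameters keeps the function easy to test without touching the real environment or file system.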
