invoice-parser
v1.0.1
A PDF invoice parser using OpenAI
Invoice Parser
A robust Node.js library that extracts structured data from PDF invoices using OpenAI's GPT models.
It handles complex layouts, merged table columns, and varying invoice formats by converting the PDF to raw text and leveraging AI to parse it into a standardized JSON schema.
Features
- 📄 PDF Parsing: Efficiently extracts raw text layers from PDF documents using pdf2json.
- 🤖 AI-Powered Extraction: Uses OpenAI to intelligently identify, categorize, and normalize fields.
- 📦 Standardized JSON: Outputs a consistent data structure (vendor, taxes, products, etc.) regardless of the invoice's visual layout.
- ⚡ Simple API: Exposes a single asynchronous function for seamless integration.
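Because the fields are produced by a language model, it can be worth sanity-checking the parsed object before passing it downstream. The README does not list the exact output field names, only that the result covers the vendor, taxes, and products. The sketch below therefore assumes top-level keys literally named vendor, taxes, and products, and assumes products is a list of line items. Inspect a real result and adjust. checkInvoiceShape and EXPECTED_KEYS are hypothetical helpers, not part of invoice-parser.

```javascript
// ASSUMPTION: key names inferred from the feature list ("vendor, taxes, products").
// They are not confirmed by the package, so adjust them to match real output.
const EXPECTED_KEYS = ["vendor", "taxes", "products"];

// Returns a list of human-readable problems; an empty list means the
// object has the assumed top-level shape.
function checkInvoiceShape(data) {
  if (data === null || typeof data !== "object") {
    return ["result is not an object"];
  }
  const problems = [];
  for (const key of EXPECTED_KEYS) {
    if (!(key in data)) problems.push(`missing "${key}"`);
  }
  // ASSUMPTION: "products" holds an array of line items.
  if ("products" in data && !Array.isArray(data.products)) {
    problems.push('"products" is not an array');
  }
  return problems;
}

const sample = { vendor: { name: "Acme Ltd" }, taxes: [], products: [{ description: "Widget" }] };
console.log(checkInvoiceShape(sample)); // → []
console.log(checkInvoiceShape({ vendor: {} })); // reports that "taxes" and "products" are missing
```

The check only looks at top-level structure. It does not verify that amounts add up or that the model read the values correctly.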
Installation
Install the package via npm:
npm install invoice-parser
Usage
import { parseInvoice } from 'invoice-parser';

// 1. Configuration: read the API key from the environment instead of hard-coding it
const API_KEY = process.env.OPENAI_API_KEY;
if (!API_KEY) {
  throw new Error("Set the OPENAI_API_KEY environment variable before running this script.");
}
const filePath = "./path/to/your/invoice.pdf";

async function main() {
  try {
    console.log(`Processing ${filePath}...`);

    // 2. Call the parser with the PDF path and the OpenAI API key
    const data = await parseInvoice(filePath, API_KEY);

    // 3. Use the data
    console.log("Extraction complete!");
    console.log(JSON.stringify(data, null, 2));
  } catch (error) {
    console.error("Error parsing invoice:", error.message);
  }
}

main();

Returns: a Promise that resolves to an object containing the extracted invoice fields.
Throws:
- Error: If the file path is incorrect or the file doesn't exist.
- Error: If the PDF is image-only (scanned) and contains no selectable text.
- Error: If the OpenAI API quota is exceeded or the key is invalid.
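The first error case above (a wrong or missing path) can be caught before any OpenAI request is made, along with files that are not PDFs at all. The sketch below checks that the path is a regular file beginning with the "%PDF-" signature that PDF files start with. It then parses a list of invoices one at a time, collecting failures instead of stopping at the first. looksLikePdf and parseMany are hypothetical helpers, not part of invoice-parser. The parser is passed in as the parse argument, so in real use you would pass parseInvoice. The header check cannot detect image-only (scanned) PDFs; that case is still reported by the parser itself.

```javascript
import { statSync, openSync, readSync, closeSync } from "node:fs";

// True if filePath is an existing regular file whose first bytes are "%PDF-".
function looksLikePdf(filePath) {
  const stats = statSync(filePath, { throwIfNoEntry: false });
  if (!stats || !stats.isFile()) return false;
  const fd = openSync(filePath, "r");
  try {
    const header = Buffer.alloc(5);
    const bytesRead = readSync(fd, header, 0, 5, 0);
    return bytesRead === 5 && header.toString("latin1") === "%PDF-";
  } finally {
    closeSync(fd);
  }
}

// Parses invoices sequentially and collects per-file failures.
// `parse` is expected to have the same call shape as parseInvoice(filePath, apiKey).
async function parseMany(filePaths, apiKey, parse) {
  const results = [];
  const failures = [];
  for (const filePath of filePaths) {
    if (!looksLikePdf(filePath)) {
      failures.push({ filePath, message: "missing file or not a PDF" });
      continue;
    }
    try {
      results.push({ filePath, data: await parse(filePath, apiKey) });
    } catch (error) {
      failures.push({ filePath, message: error.message });
    }
  }
  return { results, failures };
}

// Usage (ES module, OPENAI_API_KEY set in the environment):
// import { parseInvoice } from 'invoice-parser';
// const { results, failures } = await parseMany(["./a.pdf", "./b.pdf"], process.env.OPENAI_API_KEY, parseInvoice);
```

Processing one file at a time is a deliberate choice in this sketch. It keeps request volume low, which matters given the quota error above. A bounded-concurrency pool would be faster, but it is more likely to hit rate limits.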
Contributing
- Fork the repository.
- Create a new branch (git checkout -b feature/improvement).
- Make your changes and commit.
- Push to your branch and submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Ishan Dhingra
- GitHub: IshanDhingraCodes
