llm-document-ocr

v1.2.0

Published

2 years ago

LLM Based OCR and Document Parsing for Node.js

0High
0Medium
0Low

llm gpt gpt4 ocr pdf image claude claude3

Sponsored by Mercoa, the API for BillPay and Invoicing. Everything you need to launch accounts payable in your product with a single API!

LLM Based OCR and Document Parsing for Node.js. Uses GPT4 and Claude3 for OCR and data extraction.

Converts PDFs (including multi page PDFs) into PNGs for use with GPT4
Automatically crops white-space to create smaller inputs
Cleans up JSON string returned by the LLM and converts it to an JSON object
Custom prompt support for capturing any data you need

Supports:

✅ PNG
✅ WEBP
✅ JPEG / JPG
✅ GIF
✅ PDF
✅ Multi-page PDF
❌ DOC
❌ DOCX

Installation

npm i --save llm-document-ocr

yarn add llm-document-ocr

Note: If you are deploying via Docker, see the Dockerfile for an example Alpine base image.

Usage

import { DocumentOcr, prompts } from "llm-document-ocr";

const documentOcr = new DocumentOcr({
  apiKey: 'YOUR-OpenAi/Anthropic-API-KEY' // required, defaults to process.env.OPENAI_API_KEY. OpenAI models need an OpenAI API key, Antrhopic models need an Anthropic API key.
  model: "gpt-4o", // optional, defaults to "gpt-4-turbo". Options are: "gpt-4-turbo", "gpt-4o", "claude-3-opus-20240229", "claude-3-sonnet-20240229", "claude-3-haiku-20240307"
  standardFontDataUrl: "https://unpkg.com/[email protected]/standard_fonts/", // optional, defaults to "https://unpkg.com/[email protected]/standard_fonts/". You can use the systems fonts or the fonts under ./node_modules/pdfjs-dist/standard_fonts/ as well.
});

const documentData = await documentOcr.process({
  model: "gpt-4o", // optional, defaults to model defined in constructor
  document: 'JVBERi0xLjMNCiXi48/TDQoNCjEgMCBvYmoNCjw8DQ...', // Base64 String, Base64 URI, or Buffer
  mimeType: 'application/pdf', // mime-type of the document or image
  prompt: 'invoiceStartDate, invoiceEndDate, amount', // system prompt for data extraction. See examples below.
  pageOptions: 'FIRST_AND_LAST' // optional, defaults to 'ALL'. Determines which page of the PDF will be processed. Available options are 'ALL', 'FIRST_AND_LAST', 'FIRST', 'LAST'.
})

Prompts

Prompts will be automatically prefixed to tell the LLM to return JSON. You will need to specify the data you wish to extract, and the LLM will return a JSON object with those keys.

For example, the prompt we use at Mercoa for invoice processing is the following:

`invoice number, invoice amount, currency (as ISO 4217 code), dueDate, invoiceDate, serviceStartDate, serviceEndDate,
  vendor's [name, email with @, website],
  line items [amnt, price, qty, des, name, cur (as ISO 4217 code)]`;

And this returns a JSON object that looks like:

{
  invoiceNumber?: string | number
  invoiceAmount?: string | number
  currency?: string
  dueDate?: string
  invoiceDate?: string
  serviceStartDate?: string
  serviceEndDate?: string
  vendor: {
    name?: string
    email?: string
    website?: string
  }
  lineItems: Array<{
    des?: string
    qty?: string | number
    price?: string | number
    amnt?: string | number
    name?: string
    cur?: string
  }>
}

Issues and Contributing

If you encounter a bug or want to see something added/changed, please go ahead and open an issue

If you wish to contribute to the library, thanks! Please see the CONTRIBUTING guide for more details.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

Installation

Usage

Prompts

Issues and Contributing

License