@rocknerve/pdftotext
v0.9.2
Another simple Node.js wrapper for the popular `pdftotext` library.
This one supports parse-until-time-limit and parse-until-maximum-text-size.
It also automatically installs pdftotext if it runs as the root user on a Debian/Ubuntu/Mint system, which is pretty nice.
No intermediate files are used.
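The limits and the no-intermediate-files behaviour described above can be illustrated with a small sketch built on Node's `child_process`. This is not the package's actual implementation: `runWithLimits` is a hypothetical helper, and resolving with whatever text has arrived when a limit is reached is an assumption about what "parse-until" means.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper (not part of @rocknerve/pdftotext): feed `input` to a
// child process over stdin and collect its stdout in memory, stopping early
// when either limit is reached. No temporary files are written.
function runWithLimits(command, args, input, { timelimit_ms, sizelimit_bytes } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["pipe", "pipe", "ignore"] });
    const chunks = [];
    let size = 0;
    let settled = false;
    let timer;

    // Resolve with the text collected so far and stop the child.
    const finish = () => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      child.kill(); // returns false without throwing if the child already exited
      let out = Buffer.concat(chunks);
      // Byte-level truncation; a multi-byte character may be cut at the boundary.
      if (sizelimit_bytes !== undefined) out = out.subarray(0, sizelimit_bytes);
      resolve(out.toString("utf8"));
    };

    if (timelimit_ms !== undefined) timer = setTimeout(finish, timelimit_ms);

    child.on("error", (err) => {
      // For example, the command is not installed.
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      reject(err);
    });
    child.stdout.on("data", (chunk) => {
      chunks.push(chunk);
      size += chunk.length;
      if (sizelimit_bytes !== undefined && size >= sizelimit_bytes) finish();
    });
    child.on("close", finish);
    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits early
    child.stdin.end(input);
  });
}
```

Assuming the installed `pdftotext` accepts `-` for both the input and the output file (Poppler's build does), a call such as `runWithLimits("pdftotext", ["-", "-"], pdfBuffer, { timelimit_ms: 10_000, sizelimit_bytes: 65535 })` would stream the PDF in and the text out without touching disk.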
Install:
npm i @rocknerve/pdftotext
Example usage:
const { readFile } = require("node:fs/promises");
const ConvertPDFToText = require("@rocknerve/pdftotext");

async function main() {
  // Load the PDF from disk ...
  let your_pdf_data_buffer = await readFile("example.pdf");
  // ... or download it instead:
  // your_pdf_data_buffer = await (await fetch("https://example.com/example.pdf")).arrayBuffer();
  const your_plain_text_string = await ConvertPDFToText({
    input: { body: your_pdf_data_buffer },
    timelimit_ms: 10_000, // optional; limit processing to 10 seconds
    sizelimit_bytes: 65535, // optional; limit text output to 64KB
    logger: (line) => console.log(`--- PDF parsing status: ${line}`), // optional; you can also pass `false` to avoid default logging to stdout
  });
  console.log(your_plain_text_string);
}
main().catch(console.error);
Potential future features:
- Allow streaming IO
- Allow PDF URLs to be passed and fetched automatically
- Work well under Deno 2
- Support DOCX or other types too
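Until URL input is supported directly, the second item in the list above can be approximated in calling code. The sketch below is only an illustration: `convertPDFFromURL` is a hypothetical name, `convert` stands in for the function exported by this package, and the options are assumed to be passed through unchanged.

```javascript
// Hypothetical wrapper, not part of @rocknerve/pdftotext.
// `convert` is the converter function (for example, the package's export);
// `fetchImpl` defaults to the global fetch available in Node 18 and later.
async function convertPDFFromURL(url, convert, options = {}, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }
  // Same shape as the README example: the PDF bytes go in `input.body`.
  const body = await response.arrayBuffer();
  return convert({ ...options, input: { body } });
}
```

Passing the converter and the fetch function as arguments keeps the wrapper easy to test and avoids tying it to a particular runtime.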
