pdfs2json
v1.0.2
Extract text from pdf file(s) in text or JSON format.
README
Introduction
pdfs2json extracts the text content of one or more PDF files, returning it either as plain text or as JSON objects.
Quick Start
The following steps give a minimal demonstration of how to use the package.
Create and cd into a new folder. For example:
mkdir test
cd test
Initialize a new npm project, accepting the default values:
npm init
Install dependencies:
npm install pdfs2json
The following short script checks that the package returns a JSON object for a known PDF.
Create the default entry file, index.js, with this content:
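// Wrap the call in an async function so that await can be used in a CommonJS script.
(async () => {
  const p2j = require('pdfs2json');
  const url = 'http://www.xmlpdf.com/manualfiles/hello-world.pdf';
  // Extract the text of the PDF at this URL as a JSON object.
  const json_result = await p2j.pdf2txt.fileJSONContent(url);
  console.log('json file : ', json_result);
})();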
Perform the test:
node .
Or:
node index.js
Here is the expected output:
json file : { url: 'http://www.xmlpdf.com/manualfiles/hello-world.pdf', json: [ 'Hello, world!' ] }
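The result object can be consumed like any other JavaScript object. As a minimal sketch, assuming the { url, json } shape shown in the expected output above (the helper name resultToText is hypothetical and not part of pdfs2json), the extracted lines could be joined into a single string:

```javascript
// Hypothetical helper, not part of pdfs2json: join the extracted lines
// of one result object into a single newline-separated string.
// Assumes the { url, json: [...] } shape shown in the expected output.
function resultToText(result) {
  const lines = Array.isArray(result.json) ? result.json : [];
  return lines.join('\n');
}

// Sample object copied from the expected output, so no network is needed.
const sample = {
  url: 'http://www.xmlpdf.com/manualfiles/hello-world.pdf',
  json: ['Hello, world!'],
};
console.log(resultToText(sample)); // prints "Hello, world!"
```

Falling back to an empty array keeps the helper from throwing if a result carries no json field.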
API
/**
* Public API:
* pdf2txt.fileTextContent(url)
* pdf2txt.filesTextContent(urls)
* pdf2txt.fileJSONContent(url)
* pdf2txt.filesJSONContent(urls)
*/
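The return shapes of the multi-file functions are not documented here, so the sketch below relies only on fileJSONContent(url), as used in the Quick Start. It processes several URLs one at a time and records failures instead of aborting. The helper name extractEach and the assumption that a failed extraction rejects its promise are illustrative, not confirmed behavior of the package.

```javascript
// Hypothetical helper, not part of pdfs2json: extract JSON content for
// several PDFs sequentially, recording an error entry for each failure.
// `extractor` is any object exposing fileJSONContent(url) -> Promise,
// for example require('pdfs2json').pdf2txt from the Quick Start.
async function extractEach(extractor, urls) {
  const results = [];
  for (const url of urls) {
    try {
      results.push(await extractor.fileJSONContent(url));
    } catch (err) {
      // Assumption: a failed download or parse rejects the promise.
      const message = err && err.message ? err.message : String(err);
      results.push({ url, error: message });
    }
  }
  return results;
}

// Usage with the real package (requires network access):
// const p2j = require('pdfs2json');
// extractEach(p2j.pdf2txt, ['http://www.xmlpdf.com/manualfiles/hello-world.pdf'])
//   .then((results) => console.log(results));
```

Passing the extractor in as an argument keeps the helper independent of the package, so it can be exercised with a stub object when no network is available.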