node-pdf-extractor
v1.0.0
A simple and powerful Node.js PDF text extractor
Installation
npm install node-pdf-extractor
Or install globally for CLI usage:
npm install -g node-pdf-extractor
Usage
As a Module
const { extractText, extractFromPath, PDFExtractor } = require('node-pdf-extractor');

// CommonJS modules do not support top-level await, so the calls live in an async function
async function main() {
  // Simple text extraction
  const text = await extractText('document.pdf');
  console.log(text);

  // Full extraction with metadata
  const result = await extractFromPath('document.pdf');
  console.log(result.text);     // Extracted text
  console.log(result.numPages); // Number of pages
  console.log(result.info);     // PDF info (title, author, etc.)

  // Using the class
  const extractor = new PDFExtractor();
  const data = await extractor.extract('document.pdf');
  console.log(data.text);
}

main().catch(console.error);
Extract from Buffer
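const fs = require('fs');
const { extractFromBuffer } = require('node-pdf-extractor');
// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  const buffer = fs.readFileSync('document.pdf');
  const result = await extractFromBuffer(buffer);
  console.log(result.text);
}
main().catch(console.error);
With Express/Multer (File Uploads)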
const express = require('express');
const multer = require('multer');
const { extractFromBuffer } = require('node-pdf-extractor');
const app = express();
const upload = multer({ storage: multer.memoryStorage() });
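app.post('/extract', upload.single('pdf'), async (req, res) => {
  // multer leaves req.file undefined when no 'pdf' field was uploaded
  if (!req.file) {
    return res.status(400).json({ error: 'No PDF file uploaded' });
  }
  try {
    const result = await extractFromBuffer(req.file.buffer);
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
app.listen(3000);
CLI Usage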
# Extract and print to console
pdf-extract document.pdf
# Extract and save to file
pdf-extract document.pdf output.txt
API Reference
extractText(input, options)
Returns just the text string from a PDF.
input - File path (string) or Buffer
options - Optional parsing options
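When extracting from many files, one unreadable PDF should not abort the whole batch. The sketch below shows one way to handle that. `extractAll` is a hypothetical helper, not part of the package; it takes the extraction function as an argument so it can be tried with a stub. It assumes only what is stated above: `extractText` accepts a path or Buffer and resolves to a string.

```javascript
// Hypothetical helper (not part of node-pdf-extractor).
// Runs `extractFn` over every input and reports failures per input
// instead of rejecting on the first error.
async function extractAll(inputs, extractFn) {
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    inputs.map(async (input) => extractFn(input))
  );
  return settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? { input: inputs[i], ok: true, text: outcome.value }
      : {
          input: inputs[i],
          ok: false,
          error: String(outcome.reason?.message ?? outcome.reason),
        }
  );
}
```

With the package installed, a call such as `extractAll(['a.pdf', 'b.pdf'], extractText)` inside an async function should yield one entry per file, with `ok: false` for any that failed.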
extractFromPath(filePath, options)
Extracts text and metadata from a PDF file.
filePath - Path to the PDF file
Returns: { text, numPages, info, metadata, version }
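As an illustration of consuming the returned object, here is a minimal sketch that derives a few statistics from `text` and `numPages`. `summarize` is a hypothetical helper, not part of the package, and the "likely scanned" flag is only a heuristic: a PDF with pages but no extractable text is often image-only.

```javascript
// Hypothetical helper (not part of node-pdf-extractor).
// `result` is assumed to have the documented shape { text, numPages, ... }.
function summarize(result) {
  const text = result.text || '';
  // Split on any whitespace run and drop empty fragments.
  const words = text.split(/\s+/).filter(Boolean).length;
  const pages = result.numPages;
  return {
    pages,
    words,
    wordsPerPage: pages > 0 ? Math.round(words / pages) : 0,
    // Heuristic only: pages exist but no text could be extracted.
    isLikelyScanned: pages > 0 && words === 0,
  };
}
```

Inside an async function this could be used as `summarize(await extractFromPath('document.pdf'))`.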
extractFromBuffer(buffer, options)
Extracts text and metadata from a PDF buffer.
buffer - PDF file as Buffer
Returns: { text, numPages, info, metadata, version }
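Buffers often come from untrusted sources such as uploads, so a quick sanity check before extraction can give a clearer error. The sketch below tests for the `%PDF-` signature that PDF files start with. `looksLikePdf` is a hypothetical helper, not part of the package, and it is deliberately strict: files with stray bytes before the header would be rejected.

```javascript
// Hypothetical helper (not part of node-pdf-extractor).
// Returns true when the buffer begins with the PDF signature "%PDF-".
function looksLikePdf(buffer) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= 5 &&
    buffer.subarray(0, 5).toString('latin1') === '%PDF-'
  );
}
```

A caller might reject the input early when `looksLikePdf(buffer)` is false and only then pass the buffer to `extractFromBuffer`.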
extractPages(input, startPage, endPage)
Extracts text from specific pages.
input - File path or Buffer
startPage - Starting page (1-indexed)
endPage - Ending page (1-indexed)
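The description above does not say how out-of-range pages are handled, so it may be worth validating the range first. The following sketch checks a 1-indexed range and clamps its end to the document length. `clampPageRange` is a hypothetical helper, not part of the package, and it assumes `endPage` is inclusive, which is not stated explicitly here.

```javascript
// Hypothetical helper (not part of node-pdf-extractor).
// Validates a 1-indexed page range and clamps its end to the page count.
// Assumption: endPage is inclusive.
function clampPageRange(startPage, endPage, numPages) {
  if (![startPage, endPage, numPages].every(Number.isInteger)) {
    throw new TypeError('startPage, endPage and numPages must be integers');
  }
  if (startPage < 1 || endPage < startPage) {
    throw new RangeError(`Invalid page range ${startPage}-${endPage}`);
  }
  if (startPage > numPages) {
    throw new RangeError(`startPage ${startPage} exceeds ${numPages} page(s)`);
  }
  return { startPage, endPage: Math.min(endPage, numPages) };
}
```

One possible flow is to read `numPages` from `extractFromPath`, pass it through `clampPageRange`, and then call `extractPages` with the returned `startPage` and `endPage`.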
saveToFile(text, outputPath)
Saves text to a file.
text - Text content to save
outputPath - Output file path
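When converting files in bulk, the output path usually has to be derived from the input path. A minimal sketch of that step follows. `textPathFor` is a hypothetical helper, not part of the package; it only uses Node's built-in `path` module.

```javascript
const path = require('path');

// Hypothetical helper (not part of node-pdf-extractor).
// Maps "dir/name.pdf" to "dir/name.txt", or to "<outDir>/name.txt"
// when an output directory is given.
function textPathFor(pdfPath, outDir) {
  const { dir, name } = path.parse(pdfPath);
  return path.join(outDir ?? dir, `${name}.txt`);
}
```

Combined with the functions above, a conversion step could look like `await saveToFile(await extractText(p), textPathFor(p))` inside an async function. Whether `saveToFile` returns a promise is not stated here; awaiting it is harmless either way.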
PDFExtractor Class
Object-oriented interface that exposes the same operations as methods:
extract(filePath) - Extract from path
extractBuffer(buffer) - Extract from buffer
getText(input) - Get text only
getPages(input, start, end) - Extract specific pages
save(text, path) - Save to file
License
MIT
