node-pdf-extractor

v1.0.0

A simple and powerful Node.js PDF text extractor

Downloads

105

Installation

npm install node-pdf-extractor

Or install globally for CLI usage:

npm install -g node-pdf-extractor

Usage

As a Module

const { extractText, extractFromPath, PDFExtractor } = require('node-pdf-extractor');

// CommonJS modules have no top-level await, so the calls run inside an async function
async function main() {
    // Simple text extraction
    const text = await extractText('document.pdf');
    console.log(text);

    // Full extraction with metadata
    const result = await extractFromPath('document.pdf');
    console.log(result.text);       // Extracted text
    console.log(result.numPages);   // Number of pages
    console.log(result.info);       // PDF info (title, author, etc.)

    // Using the class
    const extractor = new PDFExtractor();
    const data = await extractor.extract('document.pdf');
    console.log(data.text);
}

main().catch(console.error);

Extract from Buffer

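const fs = require('fs');
const { extractFromBuffer } = require('node-pdf-extractor');

async function main() {
    // Read the whole PDF into memory, then parse the buffer
    const buffer = fs.readFileSync('document.pdf');
    const result = await extractFromBuffer(buffer);
    console.log(result.text);
}

main().catch(console.error);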

With Express/Multer (File Uploads)

const express = require('express');
const multer = require('multer');
const { extractFromBuffer } = require('node-pdf-extractor');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });

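app.post('/extract', upload.single('pdf'), async (req, res) => {
    // multer leaves req.file undefined when the request has no 'pdf' field
    if (!req.file) {
        return res.status(400).json({ error: 'No PDF file uploaded' });
    }
    try {
        const result = await extractFromBuffer(req.file.buffer);
        res.json(result);
    } catch (error) {
        res.status(500).json({ error: error.message });
    }
});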

app.listen(3000);
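
A client has to send the file under the same form field name that upload.single('pdf') expects. The following is a minimal sketch of such a client, assuming Node.js 18 or later (for the global FormData, Blob, and fetch). The helper names buildUploadForm and uploadPdf are hypothetical and not part of node-pdf-extractor, and the default URL assumes the server above is running locally on port 3000.

```javascript
// Hypothetical client-side helpers; not part of node-pdf-extractor.
// Assumes Node.js 18+, where FormData, Blob and fetch are globals.

// Wrap a PDF buffer in multipart form data under the 'pdf' field,
// matching upload.single('pdf') on the server.
function buildUploadForm(buffer, filename = 'document.pdf') {
    const form = new FormData();
    const blob = new Blob([buffer], { type: 'application/pdf' });
    form.append('pdf', blob, filename);
    return form;
}

// POST the form to the /extract route and return the parsed JSON body.
async function uploadPdf(buffer, url = 'http://localhost:3000/extract') {
    const response = await fetch(url, {
        method: 'POST',
        body: buildUploadForm(buffer),
    });
    if (!response.ok) {
        throw new Error(`Upload failed with status ${response.status}`);
    }
    return response.json();
}
```

The Content-Type header is deliberately left unset so that fetch can generate the multipart boundary itself.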

CLI Usage

# Extract and print to console
pdf-extract document.pdf

# Extract and save to file
pdf-extract document.pdf output.txt

API Reference

extractText(input, options)

Returns just the text string from a PDF.

  • input - File path (string) or Buffer
  • options - Optional parsing options

extractFromPath(filePath, options)

Extracts text and metadata from a PDF file.

  • filePath - Path to the PDF file
  • Returns: { text, numPages, info, metadata, version }
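
To illustrate working with the returned object, here is a small sketch that condenses a result into a few counts. summarizeResult is a hypothetical helper, not part of the package. It relies only on the text, numPages, and info fields listed above and makes no assumption about what info contains.

```javascript
// Hypothetical helper; works on any object shaped like the documented
// return value { text, numPages, info, metadata, version }.
function summarizeResult(result) {
    const text = typeof result.text === 'string' ? result.text : '';
    const trimmed = text.trim();
    // Count whitespace-separated tokens; an empty document has zero words.
    const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
    return {
        numPages: result.numPages,
        characters: text.length,
        words,
        hasInfo: result.info != null,
    };
}

// Possible usage, inside an async function:
//   const result = await extractFromPath('document.pdf');
//   console.log(summarizeResult(result));
```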

extractFromBuffer(buffer, options)

Extracts text and metadata from a PDF buffer.

  • buffer - PDF file as Buffer
  • Returns: { text, numPages, info, metadata, version }

extractPages(input, startPage, endPage)

Extracts text from specific pages.

  • input - File path or Buffer
  • startPage - Starting page (1-indexed)
  • endPage - Ending page (1-indexed)
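
The README does not say how extractPages treats a range that runs past the end of the document, so a caller may want to check the range first. The sketch below is one way to do that. checkPageRange is a hypothetical helper, and it assumes that endPage is inclusive, which the description above does not confirm.

```javascript
// Hypothetical helper: validate a 1-indexed page range before calling
// extractPages. Assumes endPage is inclusive.
function checkPageRange(startPage, endPage, numPages) {
    if (!Number.isInteger(startPage) || !Number.isInteger(endPage)) {
        throw new TypeError('startPage and endPage must be integers');
    }
    if (startPage < 1 || endPage < startPage) {
        throw new RangeError(`Invalid page range ${startPage}-${endPage}`);
    }
    // Clamp the end to the last page when the document length is known.
    const lastPage = Number.isInteger(numPages) ? Math.min(endPage, numPages) : endPage;
    if (lastPage < startPage) {
        throw new RangeError(`Page ${startPage} is beyond the last page (${numPages})`);
    }
    return { startPage, endPage: lastPage };
}

// Possible usage, inside an async function:
//   const { numPages } = await extractFromPath('document.pdf');
//   const range = checkPageRange(2, 5, numPages);
//   const pages = await extractPages('document.pdf', range.startPage, range.endPage);
```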

saveToFile(text, outputPath)

Saves text to a file.

  • text - Text content to save
  • outputPath - Output file path
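
Combining extraction with saveToFile makes a simple batch converter possible. The following is a sketch under stated assumptions: textPathFor and convertAll are hypothetical helpers, and the extraction and save functions are passed in as arguments so that the sketch does not depend on how the package is loaded. In practice they would be extractText and saveToFile from node-pdf-extractor. Whether saveToFile returns a promise is not documented here; awaiting it is harmless either way.

```javascript
const path = require('path');

// Hypothetical helper: derive 'docs/report.txt' from 'docs/report.pdf'.
function textPathFor(pdfPath) {
    const { dir, name } = path.parse(pdfPath);
    return path.join(dir, `${name}.txt`);
}

// Extract each PDF in turn and save its text next to the source file.
// Returns the list of output paths that were written.
async function convertAll(pdfPaths, extractText, saveToFile) {
    const written = [];
    for (const pdfPath of pdfPaths) {
        const text = await extractText(pdfPath);
        const outputPath = textPathFor(pdfPath);
        await saveToFile(text, outputPath);
        written.push(outputPath);
    }
    return written;
}

// Possible usage, inside an async function:
//   const { extractText, saveToFile } = require('node-pdf-extractor');
//   await convertAll(['a.pdf', 'b.pdf'], extractText, saveToFile);
```

The files are processed one at a time to keep memory use predictable; running them in parallel with Promise.all would be a reasonable alternative for small inputs.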

PDFExtractor Class

Object-oriented interface that exposes the same functionality under shorter method names:

  • extract(filePath) - Extract from path
  • extractBuffer(buffer) - Extract from buffer
  • getText(input) - Get text only
  • getPages(input, start, end) - Extract specific pages
  • save(text, path) - Save to file

License

MIT