xpdf-wrapper
v0.1.0
Published
Node.js wrapper for Xpdf command-line tools
Maintainers
Readme
📄 xpdf-wrapper
A powerful Node.js wrapper for Xpdf command-line tools
Extract text, images, fonts, and metadata from PDF files with ease
Getting Started • API Reference • Examples • Configuration
🌟 Why xpdf-wrapper?
xpdf-wrapper brings the power of Xpdf's battle-tested PDF processing tools to Node.js. Whether you need to extract text for search indexing, convert PDFs to images, or analyze document metadata, this library provides a clean, modern API with full TypeScript support.
✨ Key Features
| Feature | Description |
|---------|-------------|
| 📄 Complete Xpdf Suite | All 9 tools included: pdftotext, pdftops, pdftoppm, pdftopng, pdftohtml, pdfinfo, pdfimages, pdffonts, pdfdetach |
| 🔄 Buffer Support | Process PDFs directly from memory - no need to save temporary files |
| 📝 Direct Text Output | pdftotext returns extracted text directly in result.text |
| 🎯 TypeScript First | Complete type definitions for all tools and options |
| ⚡ Zero Config | Xpdf binaries are automatically downloaded on install |
| 🔀 Flexible API | Choose between standalone functions or the unified Xpdf class |
| 🚀 Batch Processing | Process multiple PDFs or run multiple operations concurrently |
📦 Installation
# Using npm
npm install xpdf-wrapper
# Using yarn
yarn add xpdf-wrapper
# Using pnpm
pnpm add xpdf-wrapperNote: Xpdf binaries are automatically downloaded for your platform (Windows, macOS, Linux) during installation.
🚀 Quick Start
Basic Text Extraction
import { pdftotext } from "xpdf-wrapper";
// Extract text from a PDF file
const result = await pdftotext("./document.pdf");
console.log(result.text);Working with Buffers
import { pdftotext } from "xpdf-wrapper";
import { readFileSync } from "fs";
// Process PDF directly from a Buffer
const pdfBuffer = readFileSync("./document.pdf");
const result = await pdftotext(pdfBuffer);
console.log(result.text);Get PDF Metadata
import { pdfinfo } from "xpdf-wrapper";
const result = await pdfinfo("./document.pdf");
console.log(result.stdout);
// Output:
// Creator: Microsoft Word
// Producer: Adobe PDF Library
// CreationDate: Mon Dec 25 12:00:00 2024
// Pages: 5
// File size: 102400 bytes
// ...📚 API Reference
Available Tools
xpdf-wrapper provides wrappers for all 9 Xpdf command-line tools:
| Tool | Function | Description |
|------|----------|-------------|
| pdftotext | pdftotext() | Extract text content from PDF |
| pdftops | pdftops() | Convert PDF to PostScript |
| pdftoppm | pdftoppm() | Convert PDF pages to PPM images |
| pdftopng | pdftopng() | Convert PDF pages to PNG images |
| pdftohtml | pdftohtml() | Convert PDF to HTML |
| pdfinfo | pdfinfo() | Get PDF metadata and information |
| pdfimages | pdfimages() | Extract embedded images from PDF |
| pdffonts | pdffonts() | List fonts used in PDF |
| pdfdetach | pdfdetach() | Extract file attachments from PDF |
Standalone Functions
All tool wrappers accept either a file path (string) or a Buffer as input:
import {
pdftotext,
pdftops,
pdftoppm,
pdftopng,
pdftohtml,
pdfinfo,
pdfimages,
pdffonts,
pdfdetach
} from "xpdf-wrapper";
// Using file path
const text = await pdftotext("./document.pdf", undefined, { layout: true });
// Using Buffer
const buffer = readFileSync("./document.pdf");
const info = await pdfinfo(buffer, { rawDates: true });
// With options
const fonts = await pdffonts("./document.pdf");The Xpdf Class
For more structured results and batch operations, use the Xpdf class:
import { Xpdf } from "xpdf-wrapper";
import { readFileSync } from "fs";
const xpdf = new Xpdf();
// Extract text with parsed result
const textResult = await xpdf.pdfToText("./document.pdf");
console.log(textResult.text);
// Get PDF info with parsed metadata
const infoResult = await xpdf.pdfInfo("./document.pdf");
console.log(infoResult.info.Pages); // 5
console.log(infoResult.info.Creator); // "Microsoft Word"
// List fonts with parsed output
const fontsResult = await xpdf.pdfFonts("./document.pdf");
console.log(fontsResult.fonts); // Array of font objects
// Works with Buffers too
const buffer = readFileSync("./document.pdf");
const result = await xpdf.pdfInfo(buffer);Processing Multiple PDFs
Pass an array to process multiple PDF files:
const xpdf = new Xpdf();
// Process multiple PDFs
const results = await xpdf.pdfInfo([
"./document1.pdf",
"./document2.pdf",
"./document3.pdf"
]);
// Results is an array
results.forEach((result, index) => {
console.log(`Document ${index + 1}: ${result.info.Pages} pages`);
});
// Mix file paths and Buffers
const buffer = readFileSync("./document2.pdf");
const mixedResults = await xpdf.pdfToText([
"./document1.pdf",
buffer,
"./document3.pdf"
]);Batch Operations
Run multiple operations on the same PDF(s) concurrently:
const xpdf = new Xpdf();
// Run multiple operations on a single PDF
const results = await xpdf.batch("./document.pdf", [
"pdfInfo",
"pdfFonts",
"pdfToText"
]);
// Access results by operation name
console.log("Page count:", results.pdfInfo?.info.Pages);
console.log("Fonts used:", results.pdfFonts?.fonts);
console.log("Text content:", results.pdfToText?.text);⚙️ Configuration
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| NODE_XPDF_BIN_DIR | <package>/bin | Custom path to Xpdf binaries |
Custom Options
Configure the Xpdf class with custom options:
import { Xpdf } from "xpdf-wrapper";
const xpdf = new Xpdf({
// Custom binary directory
binDir: "/opt/xpdf/bin",
// Runtime options
run: {
timeoutMs: 30000, // 30 second timeout
}
});Tool-Specific Options
Each tool supports its own set of options matching the Xpdf CLI:
// pdftotext options
await pdftotext("./doc.pdf", undefined, {
firstPage: 1,
lastPage: 10,
layout: true, // Maintain original layout
table: true, // Table mode
lineEnd: "unix", // Line endings: "unix" | "dos" | "mac"
enc: "UTF-8", // Output encoding
ownerPassword: "secret",
userPassword: "secret"
});
// pdfinfo options
await pdfinfo("./doc.pdf", {
firstPage: 1,
lastPage: 5,
box: true, // Print page box info
meta: true, // Print metadata
rawDates: true, // Print dates in raw format
});
// pdftopng options
await pdftopng("./doc.pdf", "./output", {
firstPage: 1,
lastPage: 1,
resolution: 300, // DPI
mono: true, // Monochrome output
gray: true, // Grayscale output
});📁 Examples
The examples/ directory contains working examples:
| Example | Description |
|---------|-------------|
| buffer-example.ts | Working with PDF Buffers |
| pdftotext-example.ts | Text extraction examples |
| pdfinfo-example.ts | Getting PDF metadata |
| batch-example.ts | Batch processing examples |
Running Examples
# First, build the project
npm run build
# Then run an example
npx tsx examples/buffer-example.ts
npx tsx examples/pdftotext-example.ts
npx tsx examples/pdfinfo-example.ts
npx tsx examples/batch-example.ts�️ Development
# Clone the repository
git clone https://github.com/iqbal-rashed/xpdf-wrapper.git
cd xpdf-wrapper
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint the code
npm run lint
# Format the code
npm run format🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add some amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
📋 Requirements
- Node.js 18.0 or higher
- Platforms: Windows, macOS, Linux (binaries auto-downloaded)
🔗 Related Links
�📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by Rashed Iqbal
⭐ Star this repo if you find it helpful! ⭐
