afpp
v2.5.2
Async Fast PDF Parser for Node.js — dependency-light, TypeScript-first, production-ready.
afpp — A modern, dependency-light PDF parser for Node.js.
Built for performance, reliability, and developer sanity.
Overview
afpp (Another PDF Parser, Properly) is a Node.js library for extracting text and images from PDF files without manual native build steps, event-loop blocking, or fragile runtime assumptions.
The project was created to address recurring problems encountered with existing PDF tooling in the Node.js ecosystem:
- Excessive bundle sizes and transitive dependencies
- Native build steps (canvas, ImageMagick, Ghostscript)
- Browser-specific assumptions (window, DOM, canvas)
- Poor TypeScript support
- Unreliable handling of encrypted PDFs
- Performance and memory inefficiencies
afpp focuses on predictable behavior, explicit APIs, and production-ready defaults.
Key Features
- No manual build step required: prebuilt native binaries are bundled automatically via @napi-rs/canvas
- Fully asynchronous, non-blocking architecture
- First-class TypeScript support
- Supports local files, buffers, and remote URLs
- Handles encrypted PDFs
- Configurable concurrency and rendering scale
- Minimal and auditable dependency graph
Requirements
- Node.js >= 22.14.0
Installation
Install using your preferred package manager:
npm install afpp
# or
yarn add afpp
# or
pnpm add afpp
Quick Start
All parsing functions accept the same input types:
- string (file path)
- Buffer
- Uint8Array
- URL
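Since a plain string is listed above as a file path, a remote document presumably has to be passed as a URL object, which is what the image-rendering example in this README does. The sketch below shows one way to normalize caller input before handing it to a parsing function. `PdfInput` and `toPdfInput` are hypothetical names used for illustration and are not part of afpp.

```typescript
// The accepted input types listed above. Buffer is a Uint8Array subclass
// in Node.js, so it is covered by Uint8Array here.
type PdfInput = string | Uint8Array | URL;

/**
 * Hypothetical helper (not an afpp export): wrap http(s) strings in a URL
 * object and pass every other input through unchanged.
 */
function toPdfInput(source: PdfInput): PdfInput {
  if (typeof source === 'string' && /^https?:\/\//i.test(source)) {
    return new URL(source);
  }
  return source;
}
```

The result can then be passed to any of the parsing functions, for example `pdf2string(toPdfInput(userSuppliedLocation))`.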
Extract Text from a PDF
import { pdf2string } from 'afpp';
const pages = await pdf2string('./document.pdf');
console.log(pages); // ['Page 1 text', 'Page 2 text', ...]
Render PDF Pages as Images
import { pdf2image } from 'afpp';
(async () => {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const images = await pdf2image(url);
  console.log(images); // [Buffer, Buffer, ...]
})();
Streaming API (Large PDFs)
For large PDFs, use streaming functions to process pages incrementally without loading all results into memory:
import { writeFile } from 'fs/promises';
import { streamPdf2image, streamPdf2string } from 'afpp';
// Stream images - process each page as it's rendered
for await (const { pageNumber, pageCount, data } of streamPdf2image(
  './large.pdf',
)) {
  await writeFile(`page-${pageNumber}.png`, data);
  console.log(`Processed ${pageNumber}/${pageCount}`);
}
// Stream text - process each page as it's extracted
for await (const { pageNumber, data } of streamPdf2string('./large.pdf')) {
  console.log(`Page ${pageNumber}: ${data.substring(0, 100)}...`);
}
Benefits:
- Lower peak memory usage
- Faster time-to-first-result
- Built-in progress tracking via pageNumber and pageCount
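The progress-tracking benefit can be sketched as a small generic consumer. The `PageResult` shape is inferred from the destructuring in the streaming examples and is an assumption, as is 1-based page numbering. `forEachPage` is a hypothetical helper, not an afpp export.

```typescript
// Shape assumed from the streaming examples: { pageNumber, pageCount, data }.
interface PageResult<T> {
  pageNumber: number; // assumed to be 1-based
  pageCount: number;
  data: T;
}

/**
 * Hypothetical helper: handle pages one at a time and report progress as a
 * percentage after each page. Returns the number of pages handled.
 */
async function forEachPage<T>(
  pages: AsyncIterable<PageResult<T>>,
  handle: (page: PageResult<T>) => Promise<void> | void,
  onProgress: (percent: number) => void = () => {},
): Promise<number> {
  let handled = 0;
  for await (const page of pages) {
    // This helper keeps no reference to a page once it has been handled.
    await handle(page);
    handled += 1;
    onProgress(Math.round((page.pageNumber / page.pageCount) * 100));
  }
  return handled;
}
```

Under these assumptions, a call such as `forEachPage(streamPdf2image('./large.pdf'), (p) => writeFile(`page-${p.pageNumber}.png`, p.data), console.log)` would write each page to disk and log the completion percentage as it goes.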
Extract PDF Metadata
import { getPdfMetadata } from 'afpp';
const metadata = await getPdfMetadata('./document.pdf');
console.log(metadata.pageCount); // e.g. 9
console.log(metadata.isEncrypted); // false
console.log(metadata.title); // 'My Document' or undefined
console.log(metadata.creationDate); // Date object or undefined
// Encrypted PDF
const meta = await getPdfMetadata('./secure.pdf', { password: 'secret' });
console.log(meta.isEncrypted); // true
Low-Level Parsing API
For advanced use cases, parsePdf exposes page-level control and transformation.
import { parsePdf } from 'afpp';
(async () => {
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());
  const result = await parsePdf(buffer, {}, (pageContent) => pageContent);
  console.log(result);
})();
Configuration
All public APIs accept a shared options object.
const result = await parsePdf(buffer, {
  concurrency: 5,
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS',
  scale: 4,
});
AfppParseOptions
| Option | Type | Default | Description |
| --------------- | ------------------------------------- | ------- | ---------------------------------------------------------------------------------- |
| concurrency | number \| 'auto' | 1 | Number of pages processed in parallel. Use 'auto' for CPU-based scaling. |
| imageEncoding | 'png' \| 'jpeg' \| 'webp' \| 'avif' | 'png' | Output format for rendered images |
| password | string | — | Password for encrypted PDFs |
| scale | number | 1.0 | Rendering scale. Valid range: 0.1–10. (1.0 = 72 DPI, 2.0 = 144 DPI, 3.0 = 216 DPI) |
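The scale-to-DPI relationship in the table (1.0 = 72 DPI) implies a simple conversion when a target resolution is known. The helpers below are a sketch of that arithmetic. `scaleForDpi` and `dpiForScale` are hypothetical names, and clamping to the 0.1–10 range is a choice made here, not documented afpp behavior.

```typescript
// Base resolution and valid range taken from the options table above.
const BASE_DPI = 72;
const MIN_SCALE = 0.1;
const MAX_SCALE = 10;

/** Convert a target DPI into a scale value, clamped to the documented range. */
function scaleForDpi(dpi: number): number {
  if (!Number.isFinite(dpi) || dpi <= 0) {
    throw new RangeError(`dpi must be a positive finite number, got ${dpi}`);
  }
  return Math.min(MAX_SCALE, Math.max(MIN_SCALE, dpi / BASE_DPI));
}

/** Inverse conversion: the effective DPI for a given scale. */
function dpiForScale(scale: number): number {
  return scale * BASE_DPI;
}
```

For example, `{ scale: scaleForDpi(300) }` yields a scale of roughly 4.17 for print-quality output.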
PdfMetadata
Returned by getPdfMetadata. All fields except pageCount and isEncrypted are optional — absent metadata fields are undefined, never empty strings.
| Field | Type | Description |
| ------------------ | --------- | ------------------------------------------------ |
| pageCount | number | Total number of pages |
| isEncrypted | boolean | Whether the document required a password to open |
| title | string? | Document title |
| author | string? | Document author |
| subject | string? | Document subject |
| creator | string? | Application that created the document |
| producer | string? | PDF producer application |
| creationDate | Date? | Document creation date |
| modificationDate | Date? | Document last modification date |
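Because absent fields are undefined and never empty strings, nullish coalescing is enough to supply fallbacks. The interface below is reconstructed from the field table for illustration; the type afpp actually exports may differ. `summarizeMetadata` is a hypothetical helper.

```typescript
// Reconstructed from the field table above; not the library's own declaration.
interface PdfMetadataShape {
  pageCount: number;
  isEncrypted: boolean;
  title?: string;
  author?: string;
  subject?: string;
  creator?: string;
  producer?: string;
  creationDate?: Date;
  modificationDate?: Date;
}

/** Build a one-line summary, with explicit fallbacks for undefined fields. */
function summarizeMetadata(meta: PdfMetadataShape): string {
  const title = meta.title ?? '(untitled)';
  const author = meta.author ?? 'unknown author';
  // toISOString() is UTC; the first 10 characters are YYYY-MM-DD.
  const created = meta.creationDate?.toISOString().slice(0, 10) ?? 'unknown date';
  const lock = meta.isEncrypted ? ', encrypted' : '';
  return `${title} by ${author}, ${meta.pageCount} page(s), created ${created}${lock}`;
}
```

Under these assumptions, the result of `getPdfMetadata` could be passed straight to `summarizeMetadata` for logging.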
Design Principles
- Node-first: No browser globals or DOM assumptions
- Explicit over implicit: No magic configuration
- Fail fast: Clear errors instead of silent corruption
- Production-oriented: Optimized for long-running processes
Contributing
See CONTRIBUTING.md for development setup and pull request guidelines.
License
MIT © Richard Solár
