afpp

v2.5.2

Published

3 days ago

Async Fast PDF Parser for Node.js — dependency-light, TypeScript-first, production-ready.

Downloads

708

Vulnerabilities: 0 High, 0 Medium, 0 Low

l2ysho

parse-pdf parser-pdf pdf pdf-parser pdf-to-text pdf.js pdf2json pdf2text pdfreader

afpp

afpp — A modern, dependency-light PDF parser for Node.js.
Built for performance, reliability, and developer sanity.

Overview

afpp (Another PDF Parser, Properly) is a Node.js library for extracting text and images from PDF files without manual native build steps, event-loop blocking, or fragile runtime assumptions.

The project was created to address recurring problems encountered with existing PDF tooling in the Node.js ecosystem:

Excessive bundle sizes and transitive dependencies
Native build steps (canvas, ImageMagick, Ghostscript)
Browser-specific assumptions (window, DOM, canvas)
Poor TypeScript support
Unreliable handling of encrypted PDFs
Performance and memory inefficiencies

afpp focuses on predictable behavior, explicit APIs, and production-ready defaults.

Key Features

No manual build step required — prebuilt native binaries are bundled automatically via @napi-rs/canvas
Fully asynchronous, non-blocking architecture
First-class TypeScript support
Supports local files, buffers, and remote URLs
Handles encrypted PDFs
Configurable concurrency and rendering scale
Minimal and auditable dependency graph

Requirements

Node.js >= 22.14.0

Installation

Install using your preferred package manager:

npm install afpp
# or
yarn add afpp
# or
pnpm add afpp

Quick Start

All parsing functions accept the same input types:

string (file path)
Buffer
Uint8Array
URL
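Because a file path and a remote location can both arrive as plain strings, a caller may want to normalize user input before handing it to a parsing function. The following is a minimal sketch, assuming that only `http` and `https` strings should become `URL` objects; `toPdfSource` and `PdfSource` are hypothetical names and not part of afpp.

```typescript
// Hypothetical helper, not part of afpp: decide whether a user-supplied
// string should be passed on as a URL or as a local file path.
type PdfSource = string | Uint8Array | URL; // Buffer is a Uint8Array subclass

function toPdfSource(raw: string): PdfSource {
  // Only http(s) strings are treated as remote documents (an assumption).
  if (/^https?:\/\//i.test(raw)) {
    return new URL(raw);
  }
  return raw; // everything else is assumed to be a file path
}
```

The returned value could then be passed directly to any of the parsing functions shown below, for example `pdf2string(toPdfSource(userInput))`.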

Extract Text from a PDF

import { pdf2string } from 'afpp';

const pages = await pdf2string('./document.pdf');
console.log(pages); // ['Page 1 text', 'Page 2 text', ...]
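Since the result is one string per page, simple page-level searches can be done with ordinary array code. The sketch below assumes the array order matches page order, as the comment above suggests; `findPagesContaining` is a hypothetical helper, and `samplePages` stands in for a real `pdf2string` result.

```typescript
// Hypothetical helper, not part of afpp: return the 1-based page numbers
// whose text contains `needle` (case-insensitive).
function findPagesContaining(pages: string[], needle: string): number[] {
  const target = needle.toLowerCase();
  const matches: number[] = [];
  pages.forEach((text, index) => {
    if (text.toLowerCase().includes(target)) {
      matches.push(index + 1); // array index 0 is page 1
    }
  });
  return matches;
}

// Stand-in for the array returned by pdf2string('./document.pdf').
const samplePages = ['Invoice 2024', 'Terms and conditions', 'Invoice summary'];
console.log(findPagesContaining(samplePages, 'invoice')); // [ 1, 3 ]
```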

Render PDF Pages as Images

import { pdf2image } from 'afpp';

(async () => {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const images = await pdf2image(url);

  console.log(images); // [Buffer, Buffer, ...]
})();
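When the rendered buffers are written to disk, zero-padded file names keep pages in order in a directory listing. This is a small sketch of one naming scheme; `pageFileName` is a hypothetical helper, not something afpp provides.

```typescript
// Hypothetical helper, not part of afpp: build a zero-padded file name so
// that rendered pages sort correctly in a directory listing.
function pageFileName(index: number, total: number, extension = 'png'): string {
  const width = String(total).length; // e.g. 120 pages -> width 3
  const pageNumber = String(index + 1).padStart(width, '0');
  return `page-${pageNumber}.${extension}`;
}

console.log(pageFileName(0, 120)); // page-001.png
console.log(pageFileName(11, 120, 'jpeg')); // page-012.jpeg
```

With the `images` array from the example above, each buffer could be saved with `writeFile(pageFileName(i, images.length), images[i])` using `writeFile` from `fs/promises`.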

Streaming API (Large PDFs)

For large PDFs, use streaming functions to process pages incrementally without loading all results into memory:

import { writeFile } from 'fs/promises';

import { streamPdf2image, streamPdf2string } from 'afpp';

// Stream images - process each page as it's rendered
for await (const { pageNumber, pageCount, data } of streamPdf2image(
  './large.pdf',
)) {
  await writeFile(`page-${pageNumber}.png`, data);
  console.log(`Processed ${pageNumber}/${pageCount}`);
}

// Stream text - process each page as it's extracted
for await (const { pageNumber, data } of streamPdf2string('./large.pdf')) {
  console.log(`Page ${pageNumber}: ${data.substring(0, 100)}...`);
}

Benefits:

Lower peak memory usage
Faster time-to-first-result
Built-in progress tracking via pageNumber and pageCount

Extract PDF Metadata

import { getPdfMetadata } from 'afpp';

const metadata = await getPdfMetadata('./document.pdf');
console.log(metadata.pageCount); // e.g. 9
console.log(metadata.isEncrypted); // false
console.log(metadata.title); // 'My Document' or undefined
console.log(metadata.creationDate); // Date object or undefined

// Encrypted PDF
const meta = await getPdfMetadata('./secure.pdf', { password: 'secret' });
console.log(meta.isEncrypted); // true

Low-Level Parsing API

For advanced use cases, parsePdf takes a per-page callback as its third argument, so each page's content can be inspected or transformed before it is returned.

import { parsePdf } from 'afpp';

(async () => {
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());

  const result = await parsePdf(buffer, {}, (pageContent) => pageContent);
  console.log(result);
})();
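The identity callback above returns each page unchanged. A callback can also reduce each page to a smaller value. The type of `pageContent` is not stated in this README, so the sketch below assumes it is either extracted text (a string) or binary image data (a `Buffer`, which is a `Uint8Array`). `PageContent`, `PageSummary`, and `summarizePage` are hypothetical names.

```typescript
// Assumption: the callback receives either extracted text (string) or
// binary image data (Buffer, which is a Uint8Array). Not confirmed above.
type PageContent = string | Uint8Array;

interface PageSummary {
  kind: 'text' | 'image';
  size: number; // characters for text, bytes for images
}

// Hypothetical callback that could be passed as the third argument.
function summarizePage(pageContent: PageContent): PageSummary {
  if (typeof pageContent === 'string') {
    return { kind: 'text', size: pageContent.length };
  }
  return { kind: 'image', size: pageContent.byteLength };
}

console.log(summarizePage('Hello PDF')); // { kind: 'text', size: 9 }
console.log(summarizePage(new Uint8Array(4))); // { kind: 'image', size: 4 }
```

Under that assumption, `parsePdf(buffer, {}, summarizePage)` would yield one summary per page instead of the raw page content.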

Configuration

All public APIs accept a shared options object.

const result = await parsePdf(buffer, {
  concurrency: 5,
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS',
  scale: 4,
});

AfppParseOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| concurrency | number \| 'auto' | 1 | Number of pages processed in parallel. Use 'auto' for CPU-based scaling. |
| imageEncoding | 'png' \| 'jpeg' \| 'webp' \| 'avif' | 'png' | Output format for rendered images |
| password | string | (none) | Password for encrypted PDFs |
| scale | number | 1.0 | Rendering scale. Valid range: 0.1–10 (1.0 = 72 DPI, 2.0 = 144 DPI, 3.0 = 216 DPI) |
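The relationship between `scale` and DPI in the table is linear (scale 1.0 = 72 DPI), so a target DPI can be converted to a scale value with a single division. The sketch below also estimates output pixel dimensions, assuming the rendered size is simply the page size in PDF points multiplied by `scale`; the library's exact rounding may differ. `dpiToScale` and `renderedSize` are hypothetical helpers.

```typescript
const BASE_DPI = 72; // scale 1.0 corresponds to 72 DPI per the options table
const MIN_SCALE = 0.1;
const MAX_SCALE = 10;

// Hypothetical helper: convert a target DPI to a `scale` value,
// rejecting results outside the documented range of 0.1 to 10.
function dpiToScale(dpi: number): number {
  const scale = dpi / BASE_DPI;
  if (!Number.isFinite(scale) || scale < MIN_SCALE || scale > MAX_SCALE) {
    throw new RangeError(`scale ${scale} is outside ${MIN_SCALE}-${MAX_SCALE}`);
  }
  return scale;
}

// Estimated pixel size of a rendered page, given its size in PDF points
// (1 point = 1/72 inch). Assumes pixels = points * scale.
function renderedSize(widthPt: number, heightPt: number, scale: number) {
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

console.log(dpiToScale(144)); // 2
console.log(renderedSize(612, 792, 2)); // { width: 1224, height: 1584 } (US Letter)
```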

PdfMetadata

Returned by getPdfMetadata. All fields except pageCount and isEncrypted are optional — absent metadata fields are undefined, never empty strings.

| Field | Type | Description |
| --- | --- | --- |
| pageCount | number | Total number of pages |
| isEncrypted | boolean | Whether the document required a password to open |
| title | string? | Document title |
| author | string? | Document author |
| subject | string? | Document subject |
| creator | string? | Application that created the document |
| producer | string? | PDF producer application |
| creationDate | Date? | Document creation date |
| modificationDate | Date? | Document last modification date |
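Because absent fields are `undefined` rather than empty strings, the nullish coalescing operator (`??`) is a natural way to supply fallbacks. The sketch below writes the table out as a local interface for illustration only (the library ships its own type definitions); `PdfMetadataShape` and `describeMetadata` are hypothetical names.

```typescript
// Local mirror of the PdfMetadata table, written out for illustration;
// not the type exported by afpp.
interface PdfMetadataShape {
  pageCount: number;
  isEncrypted: boolean;
  title?: string;
  author?: string;
  subject?: string;
  creator?: string;
  producer?: string;
  creationDate?: Date;
  modificationDate?: Date;
}

// Hypothetical helper: one-line summary with fallbacks for absent fields.
function describeMetadata(meta: PdfMetadataShape): string {
  const title = meta.title ?? '(untitled)';
  const author = meta.author ?? 'unknown author';
  const lock = meta.isEncrypted ? ', encrypted' : '';
  return `${title} by ${author}, ${meta.pageCount} pages${lock}`;
}

console.log(describeMetadata({ pageCount: 9, isEncrypted: false }));
// (untitled) by unknown author, 9 pages
```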
