pdfium-native
v0.5.1
Published
Native Node.js bindings for PDFium
Maintainers
Readme
pdfium-native
Fast, native PDF rendering and text extraction for Node.js — powered by PDFium, the same engine used in Chromium. Built as a C++ addon with N-API for ABI stability across Node.js versions.
Designed for server-side workloads. Non-blocking, fast, and production-ready.
🚀 Quick Start
import { loadDocument } from 'pdfium-native';
const doc = await loadDocument('invoice.pdf');
const page = await doc.getPage(0);
const text = await page.getText();
const image = await page.render({ scale: 3, format: 'png' }); // high-resolution render
page.close();
doc.destroy();💡 Why pdfium-native?
⚡ Performance
- Native C++ — no WASM overhead, no JS parsing
- Non-blocking — all operations run off the main thread via libuv workers
🛠️ Developer experience
- Built-in JPEG/PNG rendering — no extra dependencies like sharp
- Prebuilt binaries for 10 platform/arch combinations — no compile step
- Full TypeScript support with types included
🔒 Reliability
- Built on PDFium — the PDF engine used in Chromium
- ABI-stable via N-API — works across Node.js 20–24 without recompilation
- Password-protected PDFs supported out of the box
🎯 Use cases
- 🖼️ Generate thumbnails and previews for uploaded PDFs
- 📄 Extract searchable text from documents at scale
- ⚙️ Build server-side PDF processing pipelines
- ✂️ Split PDFs by page range or merge multiple PDFs into one
- 🔗 Read annotations, bookmarks, links, and form fields from existing PDFs
📊 How it compares
| | pdfium-native | @hyzyla/pdfium | pdfjs-dist | | --------------- | -------------------- | ------------------- | ----------------- | | Engine | PDFium (C++ addon) | PDFium (WASM) | pdf.js (JS) | | Rendering | ✅ JPEG/PNG built-in | ⚠️ Raw bitmap (BYO) | ⚠️ Canvas/browser | | Text extraction | ✅ | ❌ | ✅ | | Search | ✅ With rects | ❌ | ⚠️ Manual | | Split / Merge | ✅ | ❌ | ❌ | | Annotations | ✅ | ❌ | ⚠️ Partial | | Bookmarks | ✅ | ❌ | ✅ | | Links | ✅ | ❌ | ✅ | | Form fields | ✅ | ❌ | ✅ | | Async I/O | ✅ libuv workers | ❌ Sync | ❌ Main thread | | Platforms | macOS/Linux/Windows | Any (WASM) | Any |
¹ Prebuilt binaries downloaded at install — no runtime dependencies. Falls back to source compilation if unavailable.
Table of Contents
📦 Install
npm install pdfium-nativePrebuilt binaries are available for all supported platforms — most installs require no compiler. If no prebuilt is available, the package falls back to compiling from source (requires a C++ toolchain: Xcode CLI tools on macOS, build-essential on Linux, Visual Studio on Windows).
🌍 Supported Platforms
| OS | Architectures | | --------------------- | ---------------------- | | macOS | arm64, x64 | | Linux (glibc) | x64, arm64, arm, ppc64 | | Linux (musl / Alpine) | x64, arm64 | | Windows | x64, arm64 |
📚 API
loadDocument(input, password?)
Opens a PDF from a Buffer or file path string. Returns Promise<PDFiumDocument>.
const doc = await loadDocument(buffer);
const doc = await loadDocument('/path/to/file.pdf');
const doc = await loadDocument(buffer, 'secret');splitDocument(input, splitAt, options?)
Splits a PDF into multiple documents at the given page indices. Each index in splitAt marks the first page of a new chunk. Returns Promise<Buffer[]>, or Promise<void> if outputs is set.
A 10-page PDF with splitAt: [3, 7] produces three documents: pages 0–2, 3–6, and 7–9.
import { splitDocument } from 'pdfium-native';
// split a two-page PDF into two single-page documents
const [part1, part2] = await splitDocument('report.pdf', [1]);
// split into three parts (no split points = single document containing all pages)
const [a, b, c] = await splitDocument(buffer, [3, 7]);
// write parts to files
await splitDocument('report.pdf', [5], {
outputs: ['first-half.pdf', 'second-half.pdf'],
});
// password-protected source
const parts = await splitDocument('encrypted.pdf', [3], { password: 'secret' });| Option | Type | Default | Description |
| ---------- | ---------- | ------- | --------------------------------------------------------------------------------------------------------- |
| outputs | string[] | — | Write each part to these file paths instead of returning Buffers. Must have splitAt.length + 1 entries. |
| password | string | — | Password for the source PDF. |
mergeDocuments(inputs, options?)
Combines multiple PDFs into a single document. Returns Promise<Buffer>, or Promise<void> if output is set.
Each element of inputs can be a Buffer, a file path string, or an object { input, password? } for password-protected PDFs.
import { mergeDocuments } from 'pdfium-native';
// merge two files
const buf = await mergeDocuments(['part1.pdf', 'part2.pdf']);
// mix buffers, paths, and password-protected PDFs
const buf = await mergeDocuments([
buffer1,
'part2.pdf',
{ input: 'encrypted.pdf', password: 'secret' },
]);
// write directly to a file
await mergeDocuments(['a.pdf', 'b.pdf'], { output: 'merged.pdf' });| Option | Type | Default | Description |
| -------- | -------- | ------- | ------------------------------------------------------ |
| output | string | — | Write to this file path instead of returning a Buffer. |
PDFiumDocument
| Property | Type | Description |
| ----------- | ------------------ | --------------------------------------- |
| pageCount | number | Total number of pages. |
| metadata | DocumentMetadata | Title, author, dates, PDF version, etc. |
getPage(index)
Loads a page by 0-based index. Returns Promise<PDFiumPage>.
pages()
Async generator that yields every page. Caller must close each page.
for await (const page of doc.pages()) {
console.log(await page.getText());
page.close();
}getBookmarks()
Returns the bookmark/outline tree. Returns Promise<Bookmark[]>.
interface Bookmark {
title: string;
pageIndex?: number;
open: boolean; // whether the node is initially expanded
actionType?: 'goto' | 'remoteGoto' | 'uri' | 'launch' | 'embeddedGoto';
url?: string; // external URL for URI bookmarks
destX?: number; // destination X coordinate
destY?: number; // destination Y coordinate
destZoom?: number; // destination zoom level
children?: Bookmark[];
}destroy()
Closes the document and frees all native resources. Must be called when done.
PDFiumPage
| Property | Type | Description |
| ----------------- | ------------------- | --------------------------------------------------- |
| width | number | Page width in points (1 pt = 1/72 inch). |
| height | number | Page height in points. |
| number | number | 0-based page index. |
| objectCount | number | Number of page objects (text, images, paths, etc.). |
| rotation | number | Page rotation: 0, 1 (90° CW), 2 (180°), 3 (270°). |
| hasTransparency | boolean | Whether the page has transparency. |
| label | string? | Page label (e.g. 'i', 'ii', '1'). |
| cropBox | PageObjectBounds? | Crop box (visible region), if set. |
| trimBox | PageObjectBounds? | Trim box (intended finished size), if set. |
getText()
Extracts all text from the page. Returns Promise<string>.
render(options?)
Renders the page to an encoded image. Returns Promise<Buffer>, or Promise<void> when output is specified.
interface PageRenderOptions {
scale?: number; // default: 1 (72 DPI). Use 3–4 for print quality.
width?: number; // override render width in pixels
height?: number; // override render height in pixels
format?: 'jpeg' | 'png'; // default: 'png'
quality?: number; // JPEG quality 1–100 (default: 100)
output?: string; // write to file instead of returning a Buffer
rotation?: 0 | 1 | 2 | 3; // 0=none, 1=90° CW, 2=180°, 3=270° CW
transparent?: boolean; // transparent background (PNG only, default: false)
renderAnnotations?: boolean; // render annotations (default: true)
grayscale?: boolean; // render in grayscale
lcdText?: boolean; // LCD-optimized sub-pixel text rendering
}getObject(index)
Returns the page object at the given index. Returns Promise<PageObject>. Objects are discriminated by type:
type PageObject = TextPageObject | ImagePageObject | OtherPageObject;
// all objects have:
// bounds: { left, bottom, right, top }
// fillColor: { r, g, b, a } | null
// strokeColor: { r, g, b, a } | null
// type: 'text' adds: text, fontSize, fontName, fontWeight?, italicAngle?,
// renderMode?, fontFamily?, isEmbedded?, fontFlags?
// type: 'image' adds: imageWidth, imageHeight, horizontalDpi?, verticalDpi?,
// bitsPerPixel?, colorspace?, filters?, render()
// type: 'path' | 'shading' | 'form' | 'unknown'Image objects have a render() method for extracting the embedded image:
const obj = await page.getObject(0);
if (obj.type === 'image') {
const png = await obj.render(); // PNG buffer (default)
const jpeg = await obj.render({ format: 'jpeg', quality: 80 });
const raw = await obj.render({ format: 'raw' }); // original encoded stream
await obj.render({ output: '/tmp/image.png' }); // write to file
await obj.render({ rendered: true }); // apply image mask and transformation matrix
}interface ImageRenderOptions {
format?: 'jpeg' | 'png' | 'raw'; // default: 'png'. 'raw' returns original stream bytes
quality?: number; // JPEG quality 1–100 (default: 100)
output?: string; // write to file instead of returning a Buffer
rendered?: boolean; // apply image mask and transformation matrix (default: false)
}objects()
Async generator that yields every page object. Convenience wrapper around getObject().
for await (const obj of page.objects()) {
if (obj.type === 'image') {
await obj.render({ output: `image-${obj.imageWidth}x${obj.imageHeight}.png` });
}
}getLinks()
Returns all links on the page. Returns Promise<Link[]>.
interface Link {
bounds?: { left; bottom; right; top };
url?: string; // external URL
pageIndex?: number; // internal link target
actionType?: 'goto' | 'remoteGoto' | 'uri' | 'launch' | 'embeddedGoto' | 'unknown';
destX?: number; // destination X coordinate
destY?: number; // destination Y coordinate
destZoom?: number; // destination zoom level
filePath?: string; // file path for remote goto / launch actions
}search(text, options?)
Searches for text on the page. Returns Promise<SearchMatch[]> with character positions and bounding rectangles.
const matches = await page.search('invoice', {
caseSensitive: true,
wholeWord: false,
consecutive: false,
});
// [{ charIndex: 42, length: 7, matchedText: 'invoice', rects: [{ left, top, right, bottom }] }]getAnnotations()
Returns all annotations on the page. Returns Promise<Annotation[]>.
interface Annotation {
type: 'text' | 'link' | 'highlight' | 'underline' | 'strikeout' | /* ... */ 'unknown';
bounds?: { left; bottom; right; top };
contents: string;
color: { r; g; b; a } | null;
interiorColor?: { r; g; b; a }; // fill color for markup annotations
author: string; // annotation author
subject: string; // annotation subject
creationDate: string; // PDF date string (e.g. "D:20250101120000Z")
modDate: string; // modification date
flags: number; // annotation flags bitmask (PDF spec Table 165)
border?: { horizontalRadius; verticalRadius; width };
quadPoints?: Array<{ x1; y1; x2; y2; x3; y3; x4; y4 }>;
}getFormFields()
Returns all form fields on the page. Returns Promise<FormField[]>.
interface FormField {
type:
| 'unknown'
| 'pushButton'
| 'checkbox'
| 'radioButton'
| 'comboBox'
| 'listBox'
| 'textField'
| 'signature';
name: string; // field name
value: string; // current value
alternateName?: string; // tooltip / alternate field name
exportValue?: string; // export value (checkboxes / radio buttons)
flags: number; // field flags bitmask
bounds?: { left; bottom; right; top };
isChecked: boolean; // whether checkbox / radio is checked
options?: FormFieldOption[]; // options for combo box / list box
}
interface FormFieldOption {
label: string;
isSelected: boolean;
}const fields = await page.getFormFields();
const textFields = fields.filter((f) => f.type === 'textField');
const checked = fields.filter((f) => f.isChecked);close()
Closes the page and frees resources. Must be called when done with the page.
DocumentMetadata
interface DocumentMetadata {
title: string;
author: string;
subject: string;
keywords: string;
creator: string;
producer: string;
creationDate: string;
modDate: string;
pdfVersion: number; // e.g. 17 for PDF 1.7
permissions: {
print: boolean;
modify: boolean;
copy: boolean;
annotate: boolean;
fillForms: boolean;
extractForAccessibility: boolean;
assemble: boolean;
printHighQuality: boolean;
};
isTagged: boolean; // whether the PDF is a tagged PDF
language: string; // document language (e.g. 'en-US')
signatureCount: number; // number of digital signatures
attachmentCount: number; // number of file attachments
permanentId?: string; // permanent file identifier (hex)
changingId?: string; // changing file identifier (hex)
}⚙️ Concurrency
concurrency(value?): number
Gets or sets the maximum number of concurrent native operations dispatched to the thread pool.
The default is the number of CPU cores (os.availableParallelism()). A value of 0 resets to the default.
PDFium is single-threaded internally — all operations are serialized through a global mutex. The concurrency limiter prevents excess libuv worker threads from being blocked waiting on that mutex.
import { loadDocument, concurrency } from 'pdfium-native';
concurrency(); // 8 (CPU cores)
concurrency(2); // limit to 2 concurrent operations
concurrency(0); // reset to default🙏 Acknowledgements
This project uses prebuilt PDFium binaries from bblanchon/pdfium-binaries, which provides automated builds of the PDFium library for multiple platforms. Thanks to @bblanchon for maintaining this invaluable resource.
🧹 Memory Management
Always call page.close() and doc.destroy() when done. While GC-triggered destructor hooks exist as a safety net, they should not be relied on — explicit cleanup ensures resources are freed promptly.
const doc = await loadDocument('file.pdf');
try {
const page = await doc.getPage(0);
try {
// use page
} finally {
page.close();
}
} finally {
doc.destroy();
}📄 License
MIT
