npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

pdfium-native

v0.5.1

Published

Native Node.js bindings for PDFium

Readme

pdfium-native

npm version Node.js License Platform

Fast, native PDF rendering and text extraction for Node.js — powered by PDFium, the same engine used in Chromium. Built as a C++ addon with N-API for ABI stability across Node.js versions.

Designed for server-side workloads. Non-blocking, fast, and production-ready.

🚀 Quick Start

import { loadDocument } from 'pdfium-native';

const doc = await loadDocument('invoice.pdf');
const page = await doc.getPage(0);

const text = await page.getText();
const image = await page.render({ scale: 3, format: 'png' }); // high-resolution render

page.close();
doc.destroy();

💡 Why pdfium-native?

⚡ Performance

  • Native C++ — no WASM overhead, no JS parsing
  • Non-blocking — all operations run off the main thread via libuv workers

🛠️ Developer experience

  • Built-in JPEG/PNG rendering — no extra dependencies like sharp
  • Prebuilt binaries for 10 platform/arch combinations — no compile step
  • Full TypeScript support with types included

🔒 Reliability

  • Built on PDFium — the PDF engine used in Chromium
  • ABI-stable via N-API — works across Node.js 20–24 without recompilation
  • Password-protected PDFs supported out of the box

🎯 Use cases

  • 🖼️ Generate thumbnails and previews for uploaded PDFs
  • 📄 Extract searchable text from documents at scale
  • ⚙️ Build server-side PDF processing pipelines
  • ✂️ Split PDFs by page range or merge multiple PDFs into one
  • 🔗 Read annotations, bookmarks, links, and form fields from existing PDFs

📊 How it compares

| | pdfium-native | @hyzyla/pdfium | pdfjs-dist | | --------------- | -------------------- | ------------------- | ----------------- | | Engine | PDFium (C++ addon) | PDFium (WASM) | pdf.js (JS) | | Rendering | ✅ JPEG/PNG built-in | ⚠️ Raw bitmap (BYO) | ⚠️ Canvas/browser | | Text extraction | ✅ | ❌ | ✅ | | Search | ✅ With rects | ❌ | ⚠️ Manual | | Split / Merge | ✅ | ❌ | ❌ | | Annotations | ✅ | ❌ | ⚠️ Partial | | Bookmarks | ✅ | ❌ | ✅ | | Links | ✅ | ❌ | ✅ | | Form fields | ✅ | ❌ | ✅ | | Async I/O | ✅ libuv workers | ❌ Sync | ❌ Main thread | | Platforms | macOS/Linux/Windows | Any (WASM) | Any |

¹ Prebuilt binaries downloaded at install — no runtime dependencies. Falls back to source compilation if unavailable.

Table of Contents

📦 Install

npm install pdfium-native

Prebuilt binaries are available for all supported platforms — most installs require no compiler. If no prebuilt is available, the package falls back to compiling from source (requires a C++ toolchain: Xcode CLI tools on macOS, build-essential on Linux, Visual Studio on Windows).

🌍 Supported Platforms

| OS | Architectures | | --------------------- | ---------------------- | | macOS | arm64, x64 | | Linux (glibc) | x64, arm64, arm, ppc64 | | Linux (musl / Alpine) | x64, arm64 | | Windows | x64, arm64 |

📚 API

loadDocument(input, password?)

Opens a PDF from a Buffer or file path string. Returns Promise<PDFiumDocument>.

const doc = await loadDocument(buffer);
const doc = await loadDocument('/path/to/file.pdf');
const doc = await loadDocument(buffer, 'secret');

splitDocument(input, splitAt, options?)

Splits a PDF into multiple documents at the given page indices. Each index in splitAt marks the first page of a new chunk. Returns Promise<Buffer[]>, or Promise<void> if outputs is set.

A 10-page PDF with splitAt: [3, 7] produces three documents: pages 0–2, 3–6, and 7–9.

import { splitDocument } from 'pdfium-native';

// split a two-page PDF into two single-page documents
const [part1, part2] = await splitDocument('report.pdf', [1]);

// split into three parts (no split points = single document containing all pages)
const [a, b, c] = await splitDocument(buffer, [3, 7]);

// write parts to files
await splitDocument('report.pdf', [5], {
  outputs: ['first-half.pdf', 'second-half.pdf'],
});

// password-protected source
const parts = await splitDocument('encrypted.pdf', [3], { password: 'secret' });

| Option | Type | Default | Description | | ---------- | ---------- | ------- | --------------------------------------------------------------------------------------------------------- | | outputs | string[] | — | Write each part to these file paths instead of returning Buffers. Must have splitAt.length + 1 entries. | | password | string | — | Password for the source PDF. |


mergeDocuments(inputs, options?)

Combines multiple PDFs into a single document. Returns Promise<Buffer>, or Promise<void> if output is set.

Each element of inputs can be a Buffer, a file path string, or an object { input, password? } for password-protected PDFs.

import { mergeDocuments } from 'pdfium-native';

// merge two files
const buf = await mergeDocuments(['part1.pdf', 'part2.pdf']);

// mix buffers, paths, and password-protected PDFs
const buf = await mergeDocuments([
  buffer1,
  'part2.pdf',
  { input: 'encrypted.pdf', password: 'secret' },
]);

// write directly to a file
await mergeDocuments(['a.pdf', 'b.pdf'], { output: 'merged.pdf' });

| Option | Type | Default | Description | | -------- | -------- | ------- | ------------------------------------------------------ | | output | string | — | Write to this file path instead of returning a Buffer. |


PDFiumDocument

| Property | Type | Description | | ----------- | ------------------ | --------------------------------------- | | pageCount | number | Total number of pages. | | metadata | DocumentMetadata | Title, author, dates, PDF version, etc. |

getPage(index)

Loads a page by 0-based index. Returns Promise<PDFiumPage>.

pages()

Async generator that yields every page. Caller must close each page.

for await (const page of doc.pages()) {
  console.log(await page.getText());
  page.close();
}

getBookmarks()

Returns the bookmark/outline tree. Returns Promise<Bookmark[]>.

interface Bookmark {
  title: string;
  pageIndex?: number;
  open: boolean; // whether the node is initially expanded
  actionType?: 'goto' | 'remoteGoto' | 'uri' | 'launch' | 'embeddedGoto';
  url?: string; // external URL for URI bookmarks
  destX?: number; // destination X coordinate
  destY?: number; // destination Y coordinate
  destZoom?: number; // destination zoom level
  children?: Bookmark[];
}

destroy()

Closes the document and frees all native resources. Must be called when done.


PDFiumPage

| Property | Type | Description | | ----------------- | ------------------- | --------------------------------------------------- | | width | number | Page width in points (1 pt = 1/72 inch). | | height | number | Page height in points. | | number | number | 0-based page index. | | objectCount | number | Number of page objects (text, images, paths, etc.). | | rotation | number | Page rotation: 0, 1 (90° CW), 2 (180°), 3 (270°). | | hasTransparency | boolean | Whether the page has transparency. | | label | string? | Page label (e.g. 'i', 'ii', '1'). | | cropBox | PageObjectBounds? | Crop box (visible region), if set. | | trimBox | PageObjectBounds? | Trim box (intended finished size), if set. |

getText()

Extracts all text from the page. Returns Promise<string>.

render(options?)

Renders the page to an encoded image. Returns Promise<Buffer>, or Promise<void> when output is specified.

interface PageRenderOptions {
  scale?: number; // default: 1 (72 DPI). Use 3–4 for print quality.
  width?: number; // override render width in pixels
  height?: number; // override render height in pixels
  format?: 'jpeg' | 'png'; // default: 'png'
  quality?: number; // JPEG quality 1–100 (default: 100)
  output?: string; // write to file instead of returning a Buffer
  rotation?: 0 | 1 | 2 | 3; // 0=none, 1=90° CW, 2=180°, 3=270° CW
  transparent?: boolean; // transparent background (PNG only, default: false)
  renderAnnotations?: boolean; // render annotations (default: true)
  grayscale?: boolean; // render in grayscale
  lcdText?: boolean; // LCD-optimized sub-pixel text rendering
}

getObject(index)

Returns the page object at the given index. Returns Promise<PageObject>. Objects are discriminated by type:

type PageObject = TextPageObject | ImagePageObject | OtherPageObject;

// all objects have:
//   bounds: { left, bottom, right, top }
//   fillColor: { r, g, b, a } | null
//   strokeColor: { r, g, b, a } | null

// type: 'text' adds: text, fontSize, fontName, fontWeight?, italicAngle?,
//   renderMode?, fontFamily?, isEmbedded?, fontFlags?
// type: 'image' adds: imageWidth, imageHeight, horizontalDpi?, verticalDpi?,
//   bitsPerPixel?, colorspace?, filters?, render()
// type: 'path' | 'shading' | 'form' | 'unknown'

Image objects have a render() method for extracting the embedded image:

const obj = await page.getObject(0);
if (obj.type === 'image') {
  const png = await obj.render(); // PNG buffer (default)
  const jpeg = await obj.render({ format: 'jpeg', quality: 80 });
  const raw = await obj.render({ format: 'raw' }); // original encoded stream
  await obj.render({ output: '/tmp/image.png' }); // write to file
  await obj.render({ rendered: true }); // apply image mask and transformation matrix
}
interface ImageRenderOptions {
  format?: 'jpeg' | 'png' | 'raw'; // default: 'png'. 'raw' returns original stream bytes
  quality?: number; // JPEG quality 1–100 (default: 100)
  output?: string; // write to file instead of returning a Buffer
  rendered?: boolean; // apply image mask and transformation matrix (default: false)
}

objects()

Async generator that yields every page object. Convenience wrapper around getObject().

for await (const obj of page.objects()) {
  if (obj.type === 'image') {
    await obj.render({ output: `image-${obj.imageWidth}x${obj.imageHeight}.png` });
  }
}

getLinks()

Returns all links on the page. Returns Promise<Link[]>.

interface Link {
  bounds?: { left; bottom; right; top };
  url?: string; // external URL
  pageIndex?: number; // internal link target
  actionType?: 'goto' | 'remoteGoto' | 'uri' | 'launch' | 'embeddedGoto' | 'unknown';
  destX?: number; // destination X coordinate
  destY?: number; // destination Y coordinate
  destZoom?: number; // destination zoom level
  filePath?: string; // file path for remote goto / launch actions
}

search(text, options?)

Searches for text on the page. Returns Promise<SearchMatch[]> with character positions and bounding rectangles.

const matches = await page.search('invoice', {
  caseSensitive: true,
  wholeWord: false,
  consecutive: false,
});
// [{ charIndex: 42, length: 7, matchedText: 'invoice', rects: [{ left, top, right, bottom }] }]

getAnnotations()

Returns all annotations on the page. Returns Promise<Annotation[]>.

interface Annotation {
  type: 'text' | 'link' | 'highlight' | 'underline' | 'strikeout' | /* ... */ 'unknown';
  bounds?: { left; bottom; right; top };
  contents: string;
  color: { r; g; b; a } | null;
  interiorColor?: { r; g; b; a }; // fill color for markup annotations
  author: string; // annotation author
  subject: string; // annotation subject
  creationDate: string; // PDF date string (e.g. "D:20250101120000Z")
  modDate: string; // modification date
  flags: number; // annotation flags bitmask (PDF spec Table 165)
  border?: { horizontalRadius; verticalRadius; width };
  quadPoints?: Array<{ x1; y1; x2; y2; x3; y3; x4; y4 }>;
}

getFormFields()

Returns all form fields on the page. Returns Promise<FormField[]>.

interface FormField {
  type:
    | 'unknown'
    | 'pushButton'
    | 'checkbox'
    | 'radioButton'
    | 'comboBox'
    | 'listBox'
    | 'textField'
    | 'signature';
  name: string; // field name
  value: string; // current value
  alternateName?: string; // tooltip / alternate field name
  exportValue?: string; // export value (checkboxes / radio buttons)
  flags: number; // field flags bitmask
  bounds?: { left; bottom; right; top };
  isChecked: boolean; // whether checkbox / radio is checked
  options?: FormFieldOption[]; // options for combo box / list box
}

interface FormFieldOption {
  label: string;
  isSelected: boolean;
}
const fields = await page.getFormFields();
const textFields = fields.filter((f) => f.type === 'textField');
const checked = fields.filter((f) => f.isChecked);

close()

Closes the page and frees resources. Must be called when done with the page.


DocumentMetadata

interface DocumentMetadata {
  title: string;
  author: string;
  subject: string;
  keywords: string;
  creator: string;
  producer: string;
  creationDate: string;
  modDate: string;
  pdfVersion: number; // e.g. 17 for PDF 1.7
  permissions: {
    print: boolean;
    modify: boolean;
    copy: boolean;
    annotate: boolean;
    fillForms: boolean;
    extractForAccessibility: boolean;
    assemble: boolean;
    printHighQuality: boolean;
  };
  isTagged: boolean; // whether the PDF is a tagged PDF
  language: string; // document language (e.g. 'en-US')
  signatureCount: number; // number of digital signatures
  attachmentCount: number; // number of file attachments
  permanentId?: string; // permanent file identifier (hex)
  changingId?: string; // changing file identifier (hex)
}

⚙️ Concurrency

concurrency(value?): number

Gets or sets the maximum number of concurrent native operations dispatched to the thread pool.

The default is the number of CPU cores (os.availableParallelism()). A value of 0 resets to the default.

PDFium is single-threaded internally — all operations are serialized through a global mutex. The concurrency limiter prevents excess libuv worker threads from being blocked waiting on that mutex.

import { loadDocument, concurrency } from 'pdfium-native';

concurrency(); // 8 (CPU cores)
concurrency(2); // limit to 2 concurrent operations
concurrency(0); // reset to default

🙏 Acknowledgements

This project uses prebuilt PDFium binaries from bblanchon/pdfium-binaries, which provides automated builds of the PDFium library for multiple platforms. Thanks to @bblanchon for maintaining this invaluable resource.

🧹 Memory Management

Always call page.close() and doc.destroy() when done. While GC-triggered destructor hooks exist as a safety net, they should not be relied on — explicit cleanup ensures resources are freed promptly.

const doc = await loadDocument('file.pdf');
try {
  const page = await doc.getPage(0);
  try {
    // use page
  } finally {
    page.close();
  }
} finally {
  doc.destroy();
}

📄 License

MIT