@abineshsolairaj/pdf-merge

v1.5.0

Published

4 days ago

Utility to merge PDFs from Base64 strings, Buffers, file paths, or URLs, with page selection, document metadata, watermarks, page numbers, AbortSignal cancellation, and dual ESM/CJS support.

Merge PDFs in Node.js or TypeScript from Base64, Buffers, file paths, or URLs, with per-input page selection, document metadata, watermarks, page numbers, AbortSignal cancellation, hardened URL fetching, typed errors, a CLI, and dual ESM/CommonJS support. Built on pdf-lib.

Why this library?

Most PDF-merge packages on npm cover the basics — take an array of buffers, concatenate them. @abineshsolairaj/pdf-merge adds the production-grade extras that real applications usually have to bolt on themselves:

Four input sources in one API — Base64, Buffer / Uint8Array, file paths, and URLs — so you don't have to pre-decode or pre-fetch.
Per-input page selection via array ([1, 3, 5]) or range string ("1-3,5,8-10") — pick exactly the pages you want from each source.
Document metadata — stamp title, author, subject, keywords, creator, and creation/modification dates on the merged output.
Watermarks and page numbers — overlay configurable text on every page, plus continuous page numbering with {current} / {total} templates.
AbortSignal cancellation — cancel mid-merge from React effects, request handlers, or batch jobs; in-flight URL fetches abort cleanly.
Hardened URL fetching by default: protocol allowlist, response size cap, per-request timeout, redirect-protocol checks, bounded concurrency, and URL sanitization in error messages so signed-URL tokens never reach your logs. Filtering of internal IP ranges is left to the caller (see Security model).
Typed errors with input-index correlation — instanceof PdfFetchError tells you exactly which URL failed and gives you the sanitized .url and .index.
Bundled pdf-merge CLI (see CLI for the recommended invocation forms).
Dual ESM and CommonJS build — works in modern bundlers, Deno, Bun, and legacy require() consumers from a single install.
Order-preserving — output pages always follow the input array order.
Zero runtime config — sensible defaults, fully configurable per call.

Install

npm install @abineshsolairaj/pdf-merge

Requires Node.js 18+ (uses the global fetch and AbortController).

Quick start

import {
  mergeBase64PDFs,
  mergePdfBuffers,
  mergePdfFiles,
  mergePdfUrls,
} from '@abineshsolairaj/pdf-merge';

// From Base64 strings → Base64 string
const b64Out = await mergeBase64PDFs([pdfA_b64, pdfB_b64]);

// From raw bytes → Uint8Array (no Base64 round-trip)
const bytesOut = await mergePdfBuffers([bufA, bufB]);

// From disk → Uint8Array
const fileOut = await mergePdfFiles(['./a.pdf', './b.pdf']);

// From URLs → Base64 string
const urlOut = await mergePdfUrls(['https://example.com/a.pdf', 'https://example.com/b.pdf']);

// With watermark, page numbers, metadata, and cancellation
const controller = new AbortController();
const annotated = await mergePdfFiles(['./cover.pdf', './body.pdf'], {
  metadata: { title: 'Annual Report 2026', author: 'Operations' },
  watermark: { text: 'CONFIDENTIAL', opacity: 0.15, rotate: 45 },
  pageNumbers: { format: 'Page {current} of {total}' },
  signal: controller.signal,
});

API

`mergeBase64PDFs(inputs, options?): Promise<string>`

Merges Base64-encoded PDFs and returns the merged document as Base64.

Accepts plain Base64 or data:application/pdf;base64,… prefixed strings.
inputs is (string | { data: string; pages?: PageSelector })[].
options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.

const merged = await mergeBase64PDFs([
  reportA_b64,                                // all pages of A
  { data: reportB_b64, pages: [1, 3, 5] },    // pages 1, 3, 5 of B
  { data: reportC_b64, pages: '1-3,7' },      // pages 1, 2, 3, 7 of C
]);
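
To check an input before handing it to mergeBase64PDFs, the two accepted string forms and the %PDF- header check listed under Errors can be approximated as follows. This is a minimal sketch, not the library's own validation: stripDataUriPrefix and looksLikePdf are hypothetical helper names, and the exact prefix matching the library performs is an assumption.

```typescript
// Hypothetical pre-flight helpers; not part of the package's public API.
const DATA_URI_PREFIX = /^data:application\/pdf;base64,/i;

// Remove an optional "data:application/pdf;base64," prefix.
function stripDataUriPrefix(input: string): string {
  return input.trim().replace(DATA_URI_PREFIX, '');
}

// True when the input decodes as Base64 and starts with the "%PDF-" header.
function looksLikePdf(input: string): boolean {
  try {
    // atob throws on malformed Base64, which is treated here as "not a PDF".
    return atob(stripDataUriPrefix(input)).startsWith('%PDF-');
  } catch {
    return false;
  }
}
```

A check like this only catches obviously wrong inputs early; a string that passes can still fail to parse as a PDF during the merge.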

`mergePdfBuffers(inputs, options?): Promise<Uint8Array>`

Merges raw PDF bytes (from fs.readFile, an S3 SDK, an HTTP body, multer, etc.) and returns the merged document as a Uint8Array. Skips the Base64 round-trip, which avoids the roughly 33 % size overhead that Base64 encoding adds in mergeBase64PDFs.

inputs is (Buffer | Uint8Array | { data: Buffer | Uint8Array; pages?: PageSelector })[].
options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.

const merged = await mergePdfBuffers([bufA, { data: bufB, pages: [4, 2] }]);
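
The ~33 % figure comes from Base64 itself, which encodes every 3 bytes as 4 characters. The arithmetic can be sketched as below; base64Length is a hypothetical helper, and the 30 MiB input size is an arbitrary example. Actual memory use also depends on how the runtime stores strings.

```typescript
// Length in characters of the padded Base64 encoding of `byteLength` bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const pdfBytes = 30 * 1024 * 1024;            // example: a 30 MiB PDF
const encodedChars = base64Length(pdfBytes);  // 40 MiB worth of characters
const overhead = encodedChars / pdfBytes - 1; // one third, i.e. about 33 %
```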

`mergePdfFiles(inputs, options?): Promise<Uint8Array>`

Reads PDFs from the given file paths in parallel and merges them. Returns a Uint8Array — write it straight to disk.

inputs is (string | { path: string; pages?: PageSelector })[].
options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.

import { promises as fs } from 'fs';

const merged = await mergePdfFiles([
  './cover.pdf',
  { path: './body.pdf', pages: '2-9' },
]);
await fs.writeFile('./out.pdf', merged);

`mergePdfUrls(urls, options?): Promise<string>`

Fetches PDFs from the given URLs (concurrently, with a bounded pool) and returns the merged document as Base64.

urls is (string | { url: string; headers?: Record<string,string>; pages?: PageSelector })[].

Options:

| Option | Default | Description |
| --- | --- | --- |
| timeoutMs | 5000 | Per-request timeout in milliseconds. |
| maxBytesPerUrl | 100 * 1024 * 1024 | Maximum bytes accepted from any single response. |
| allowedProtocols | ['http:', 'https:'] | URL protocols allowed. Pass ['https:'] to harden further. |
| concurrency | 8 | Maximum number of URL fetches running in parallel. |
| headers | — | Default headers applied to every fetch. Per-URL headers (object form) override on key conflict. |
| metadata | — | Document metadata to stamp on the merged output. See Document metadata. |
| watermark | — | Text watermark to draw on every page of the merged output. See Watermark. |
| pageNumbers | — | Stamp continuous page numbers. See Page numbers. |
| signal | — | AbortSignal to cancel the merge. See Cancellation. |
| ignoreEncryption | false | Allow merging source PDFs that declare encryption metadata. See Encrypted source PDFs. |

const merged = await mergePdfUrls(
  [
    'https://example.com/a.pdf',
    { url: 'https://example.com/b.pdf', headers: { Authorization: 'Bearer token-b' }, pages: '1-3' },
  ],
  {
    timeoutMs: 8000,
    maxBytesPerUrl: 20 * 1024 * 1024,
    allowedProtocols: ['https:'],
    concurrency: 4,
    headers: { 'User-Agent': 'my-app/1.0' },
  },
);
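
The header precedence described in the options table (per-URL headers win on key conflict) behaves like an object spread. The sketch below illustrates that rule only; resolveHeaders is a hypothetical name, and matching keys by exact spelling is an assumption, since HTTP header names are case-insensitive and the source does not say how differing case is handled.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical illustration: later spreads win, so per-URL values override defaults.
function resolveHeaders(defaults: HeaderMap = {}, perUrl: HeaderMap = {}): HeaderMap {
  return { ...defaults, ...perUrl };
}

const effective = resolveHeaders(
  { 'User-Agent': 'my-app/1.0', Authorization: 'Bearer default-token' },
  { Authorization: 'Bearer token-b' },
);
// effective keeps the default User-Agent and uses the per-URL Authorization value.
```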

Document metadata

Every merge function accepts an optional second options argument with a metadata field. Stamp the merged document with whatever properties your downstream system surfaces (Finder, Explorer, document-management systems, email clients, etc.):

await mergePdfFiles(
  ['./cover.pdf', './body.pdf'],
  {
    metadata: {
      title: 'Annual Report 2026',
      author: 'Operations',
      subject: 'Year-end summary',
      keywords: ['annual', 'operations', '2026'],
      creator: 'my-app/1.0',
      creationDate: new Date(),
      modificationDate: new Date(),
    },
  },
);

The MergePdfUrlsOptions interface extends the same shape, so all four functions accept { metadata } the same way. Every field is optional — omit what you don't want to set.

The Producer field is hard-coded by pdf-lib on save and cannot be customized at this layer.

Watermark

Every merge function accepts an optional watermark field on the same options argument. The watermark is stamped on every page of the merged output using Helvetica (no additional fonts are embedded). The feature is fully opt-in — when watermark is omitted, no extra drawing is performed and the original byte-identical fast path is preserved.

await mergePdfFiles(
  ['./report.pdf'],
  {
    watermark: {
      text: 'CONFIDENTIAL',
      opacity: 0.18,         // 0–1, default 0.2
      fontSize: 90,          // points, default 48
      color: { r: 0.7, g: 0.1, b: 0.1 }, // RGB 0–1, default mid-gray
      rotate: 45,            // degrees CCW, default 0
      position: 'center',    // see below
    },
  },
);

position accepts a named placement — 'center' (default), 'top-left', 'top-right', 'bottom-left', 'bottom-right' — or an explicit { x, y } in PDF points measured from the bottom-left of the page. Named placements are computed per page so they work on any page size.

Invalid input throws PdfMergeError: empty text, opacity outside 0–1, non-positive font size, or color channels outside 0–1.

Page numbers

Stamp continuous page numbering on every page of the merged output. Common use case: stitch several PDFs together and number 1..N across the result. Available on the same options argument as everything else.

await mergePdfFiles(
  ['./cover.pdf', './body.pdf', './appendix.pdf'],
  {
    pageNumbers: {
      format: 'Page {current} of {total}', // tokens: {current}, {total}
      startAt: 1,                          // first-page value; default 1
      position: 'bottom-center',           // see below
      fontSize: 10,                        // default 10
      color: { r: 0.4, g: 0.4, b: 0.4 },   // default mid-gray
    },
  },
);

position accepts a named placement — 'bottom-center' (default), 'top-left', 'top-center', 'top-right', 'bottom-left', 'bottom-right' — or an explicit { x, y } in PDF points from the bottom-left of the page.

Invalid input throws PdfMergeError (empty format, non-integer startAt, non-positive fontSize, color channels outside 0–1, unknown position).
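
The format template can be pictured as a plain token substitution. The sketch below is an illustration under assumptions, not the library's code: renderPageLabel is a hypothetical name, {current} is taken to be startAt plus the zero-based page index, and {total} is taken to be the page count of the merged output. Whether the library offsets {total} when startAt is not 1 is not stated in this README.

```typescript
// Hypothetical renderer for the {current} / {total} tokens.
function renderPageLabel(
  format: string,
  pageIndex: number,  // zero-based position in the merged output
  pageCount: number,  // total pages in the merged output
  startAt = 1,        // value shown on the first page
): string {
  const current = startAt + pageIndex;
  return format
    .replace(/\{current\}/g, String(current))
    .replace(/\{total\}/g, String(pageCount));
}

const firstLabel = renderPageLabel('Page {current} of {total}', 0, 12); // 'Page 1 of 12'
```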

Cancellation with `AbortSignal`

Every merge function accepts options.signal: AbortSignal. The merge aborts cleanly at the next source-iteration boundary, and in-flight HTTP fetches in mergePdfUrls are cancelled immediately.

const controller = new AbortController();

// Cancel after 30 seconds, or whenever the user clicks "stop".
setTimeout(() => controller.abort(new Error('took too long')), 30_000);

try {
  await mergePdfUrls(urls, { signal: controller.signal, timeoutMs: 60_000 });
} catch (err) {
  if (err instanceof Error && err.message === 'took too long') {
    // user-cancelled — clean up state and move on
  } else {
    throw err;
  }
}

If you call controller.abort(reason) with a reason, that exact value is thrown. Without a reason, the merge throws an AbortError-shaped DOMException. Either way, signal.aborted checks in caller code behave correctly (this matches the Node fetch convention).
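
Because the abort reason itself is what gets thrown, a caller can recognise its own cancellation by identity instead of matching on the error message. A minimal sketch, relying only on standard AbortController behavior; isCallerAbort is a hypothetical helper name.

```typescript
// True when `err` is exactly the value this signal was aborted with.
function isCallerAbort(err: unknown, signal: AbortSignal): boolean {
  return signal.aborted && err === signal.reason;
}

// With an explicit reason, signal.reason is that same object.
const withReason = new AbortController();
const reason = new Error('took too long');
withReason.abort(reason);

// Without a reason, the platform supplies a DOMException named "AbortError".
const bare = new AbortController();
bare.abort();
const defaultReasonName = (bare.signal.reason as { name: string }).name;
```

In a catch block around a merge call, isCallerAbort(err, controller.signal) then separates cancellation from genuine failures.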

Encrypted source PDFs

Some PDFs declare encryption metadata but contain readable content streams — a quirk of older generators. By default mergePdfBuffers and the other merge functions reject these (matching pdf-lib's behavior). Pass options.ignoreEncryption: true to opt in:

await mergePdfBuffers([legacyReport], { ignoreEncryption: true });

This skips pdf-lib's encryption check at load time. It does not decrypt the document — truly password-protected files will still fail because their content streams cannot be read without the key.

Page selection

Every merge function accepts an object form per input that lets you pick which pages to keep from that document. Pages are 1-indexed to match what you see in a PDF viewer.

PageSelector is either:

a number[] — explicit page numbers, e.g. [1, 3, 5]. Duplicates are kept (the page appears multiple times in the output).
a string — comma-separated ranges, e.g. "1-3,5,8-10". Descending ranges ("5-1") reverse the page order.

Out-of-range, zero, negative, or unparseable selectors throw PdfMergeError. Inputs without a pages field — including all plain-string / plain-Buffer inputs — behave exactly as they always have and include every page.
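
The selector rules above can be sketched as a small resolver that turns either form into a list of 1-indexed page numbers. This is an illustration, not the library's parser: resolvePages is a hypothetical name, and it throws RangeError where the library throws PdfMergeError.

```typescript
type PageSelector = number[] | string;

// Expand a selector into 1-indexed page numbers, validating against pageCount.
function resolvePages(selector: PageSelector, pageCount: number): number[] {
  const check = (p: number): number => {
    if (!Number.isInteger(p) || p < 1 || p > pageCount) {
      throw new RangeError(`Page ${p} is outside 1..${pageCount}`);
    }
    return p;
  };

  // Array form: keep order and duplicates as given.
  if (Array.isArray(selector)) return selector.map((p) => check(p));

  // String form: comma-separated single pages or "a-b" ranges.
  const pages: number[] = [];
  for (const part of selector.split(',')) {
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new RangeError(`Unparseable page selector: "${part}"`);
    const start = check(Number(match[1]));
    const end = match[2] === undefined ? start : check(Number(match[2]));
    const step = start <= end ? 1 : -1; // descending ranges reverse the order
    for (let p = start; p !== end + step; p += step) pages.push(p);
  }
  return pages;
}
```

For a 10-page document, resolvePages('1-3,5,8-10', 10) gives pages 1, 2, 3, 5, 8, 9, 10, and resolvePages('5-1', 10) gives 5, 4, 3, 2, 1.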

Errors

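All error classes extend PdfMergeError, so a single err instanceof PdfMergeError check inside a catch block covers every error type listed below. (TypeScript does not allow a type annotation such as catch (err: PdfMergeError), so narrow with instanceof instead.)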

| Error | When it's thrown | Notable fields |
| --- | --- | --- |
| PdfMergeError | Empty input, invalid input shape, bad URL, disallowed protocol, invalid page selector. | — |
| InvalidPdfFormatError | Input is not valid Base64, not a parseable PDF, or fails the %PDF- header check. | — |
| PdfFetchError | HTTP failure, timeout, oversize response, redirect to a disallowed protocol, or non-PDF response body. | .url (credentials/query stripped), .index (failing input position) |

import { PdfFetchError } from '@abineshsolairaj/pdf-merge';

try {
  await mergePdfUrls(urls);
} catch (err) {
  if (err instanceof PdfFetchError) {
    console.error(`URL #${err.index} failed: ${err.url}`);
  } else {
    throw err;
  }
}

Security model

mergePdfUrls is the higher-risk function — it dereferences caller-supplied URLs. Defaults are chosen to be safe out of the box:

Protocol allowlist. Only http: and https: are accepted; file:, data:, ftp:, etc. are rejected before any socket is opened.
Response size cap. maxBytesPerUrl is enforced both against the declared Content-Length and during streaming, so a hostile endpoint that serves an unbounded body cannot exhaust the process heap.
URL sanitization in errors. Credentials (user:pass@) and query strings are stripped from URLs before they appear in error messages or on PdfFetchError.url, so signed-URL tokens and HTTP Basic passwords don't leak into logs.
Redirect-protocol check. If a redirect lands on a disallowed protocol, the request is rejected.
Per-request timeout. Default 5000 ms via AbortController.
Bounded concurrency. concurrency (default 8) caps simultaneous in-flight fetches.
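
The URL sanitization item can be illustrated with the standard URL class. This is a sketch of the idea, assuming only what the list states (credentials and query string removed); sanitizeUrlForLogs is a hypothetical name and not the library's internal function.

```typescript
// Strip user:pass@ credentials and the query string before logging a URL.
function sanitizeUrlForLogs(raw: string): string {
  const url = new URL(raw); // throws TypeError on an unparseable URL
  url.username = '';
  url.password = '';
  url.search = '';
  return url.toString();
}

const safe = sanitizeUrlForLogs(
  'https://user:secret@example.com/a.pdf?X-Amz-Signature=abc123',
);
// safe === 'https://example.com/a.pdf'
```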

What this library does not do for you:

DNS / IP allowlisting. SSRF to internal hosts via http://10.0.0.1/… or cloud metadata endpoints is not blocked — do IP-range filtering at your network or application layer if you accept URLs from end users.
Authentication. Pass any required tokens via the headers option (or per-URL headers for signed requests).
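
If you accept URLs from end users, a first application-layer filter might reject literal private and link-local IPv4 hosts before calling mergePdfUrls. The sketch below is deliberately limited and every name in it is hypothetical: it inspects only the hostname text, so it does not cover IPv6, public hostnames that resolve to internal addresses, or DNS rebinding. Those need resolution-time or network-level controls.

```typescript
// True for loopback, RFC 1918 private, "this network", and link-local IPv4 literals.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // includes the 169.254.169.254 metadata address
  );
}

// Throws when the URL points at localhost or a private IPv4 literal.
function assertPublicUrl(raw: string): void {
  const { hostname } = new URL(raw);
  if (hostname === 'localhost' || isPrivateIPv4(hostname)) {
    throw new Error(`Refusing internal host: ${hostname}`);
  }
}
```

Running assertPublicUrl over each user-supplied URL before the merge keeps the obvious cases out; it complements the protocol allowlist and does not replace network-level egress rules.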

CLI

The package ships a small pdf-merge binary. There are two recommended ways to invoke it:

One-shot via `npx` (no install)

Always pass the full scoped package name, otherwise npx will try to resolve an unrelated pdf-merge package from the registry:

npx @abineshsolairaj/pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf

Installed (global or as a dev dependency)

Once installed, you can call the unscoped pdf-merge directly — npm puts the bin on your PATH:

# global install
npm install -g @abineshsolairaj/pdf-merge
pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf

# or, within a project
npm install --save-dev @abineshsolairaj/pdf-merge
npx pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf

Mix local files and URLs in one call, and attach a page selector to any input by appending a colon followed by the selector:

npx @abineshsolairaj/pdf-merge \
  cover.pdf \
  'body.pdf:2-9' \
  https://example.com/appendix.pdf \
  -o annual-report.pdf \
  --title 'Annual Report 2026' \
  --author 'Operations' \
  --keywords 'annual,operations,2026'

Run pdf-merge --help for the full option list. Metadata flags (--title, --author, --subject, --creator, --keywords) and URL options (--concurrency, --timeout, --https-only) are all supported.

Module format

The package ships both ESM and CommonJS builds via a conditional exports map, so all of the following work without any tooling tweaks:

import { mergePdfFiles } from '@abineshsolairaj/pdf-merge';     // ESM / bundlers

const { mergePdfFiles } = require('@abineshsolairaj/pdf-merge'); // CommonJS

TypeScript definitions are shipped from the CJS build and resolve automatically for both consumers.

Backward compatibility

Page selection, per-URL headers, and the buffer/file/concurrency options were added as additive unions — every previous call signature still type-checks and produces byte-identical output. If you don't pass a pages field, the merge runs through the original code path unchanged. A regression test pins this.

Development

npm install
npm run build       # tsc → dist/cjs + dist/esm
npm test            # unit + integration tests
npm run test:cli    # rebuilds and exercises the CLI binary + dual build
npm run test:e2e    # end-to-end harness against real generated PDFs

The unit suite spins up a local HTTP server and exercises ordering, 404 handling, invalid Base64, empty input, timeouts, non-PDF responses, the security defaults, page selection, header precedence, and concurrency capping. The e2e harness in test/e2e.ts generates real multi-page A4 PDFs and exercises every public method end-to-end, including round-tripping each merged output back through pdf-lib.

License

MIT
