@abineshsolairaj/pdf-merge
v1.5.0
Utility to merge PDFs from Base64 strings, Buffers, file paths, or URLs, with page selection, document metadata, watermarks, page numbers, AbortSignal cancellation, and dual ESM/CJS support.
Merge PDFs in Node.js or TypeScript (from Base64, Buffers, file paths, or URLs) with per-input page selection, document metadata, watermarks, page numbers,
AbortSignal cancellation, SSRF-safe URL fetching, typed errors, a CLI, and dual ESM/CommonJS support. Built on pdf-lib.
Why this library?
Most PDF-merge packages on npm cover the basics — take an array of
buffers, concatenate them. @abineshsolairaj/pdf-merge adds the
production-grade extras that real applications usually have to bolt on
themselves:
- Four input sources in one API: Base64, Buffer/Uint8Array, file paths, and URLs, so you don't have to pre-decode or pre-fetch.
- Per-input page selection via array ([1, 3, 5]) or range string ("1-3,5,8-10"): pick exactly the pages you want from each source.
- Document metadata: stamp title, author, subject, keywords, creator, and creation/modification dates on the merged output.
- Watermarks and page numbers: overlay configurable text on every page, plus continuous page numbering with {current}/{total} templates.
- AbortSignal cancellation: cancel mid-merge from React effects, request handlers, or batch jobs; in-flight URL fetches abort cleanly.
- SSRF-safe URL fetching by default: protocol allowlist, response size cap, per-request timeout, redirect-protocol checks, bounded concurrency, and URL sanitization in error messages so signed-URL tokens never reach your logs.
- Typed errors with input-index correlation: instanceof PdfFetchError tells you exactly which URL failed and gives you the sanitized .url and .index.
- Bundled pdf-merge CLI (see CLI for the recommended invocation forms).
- Dual ESM and CommonJS build: works in modern bundlers, Deno, Bun, and legacy require() consumers from a single install.
- Order-preserving: output pages always follow the input array order.
- Zero runtime config: sensible defaults, fully configurable per call.
Install
npm install @abineshsolairaj/pdf-merge
Requires Node.js 18+ (uses the global fetch and AbortController).
Quick start
import {
  mergeBase64PDFs,
  mergePdfBuffers,
  mergePdfFiles,
  mergePdfUrls,
} from '@abineshsolairaj/pdf-merge';

// From Base64 strings → Base64 string
const b64Out = await mergeBase64PDFs([pdfA_b64, pdfB_b64]);

// From raw bytes → Uint8Array (no Base64 round-trip)
const bytesOut = await mergePdfBuffers([bufA, bufB]);

// From disk → Uint8Array
const fileOut = await mergePdfFiles(['./a.pdf', './b.pdf']);

// From URLs → Base64 string
const urlOut = await mergePdfUrls(['https://example.com/a.pdf', 'https://example.com/b.pdf']);

// With watermark, page numbers, metadata, and cancellation
const controller = new AbortController();
const annotated = await mergePdfFiles(['./cover.pdf', './body.pdf'], {
  metadata: { title: 'Annual Report 2026', author: 'Operations' },
  watermark: { text: 'CONFIDENTIAL', opacity: 0.15, rotate: 45 },
  pageNumbers: { format: 'Page {current} of {total}' },
  signal: controller.signal,
});
API
mergeBase64PDFs(inputs, options?): Promise<string>
Merges Base64-encoded PDFs and returns the merged document as Base64.
- Accepts plain Base64 or data:application/pdf;base64,… prefixed strings.
- inputs is (string | { data: string; pages?: PageSelector })[].
- options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.
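The two accepted string shapes can be normalized before decoding. The following is a minimal sketch rather than the package's own code: stripDataUriPrefix and looksLikePdf are hypothetical helper names, and the sketch assumes the prefix is matched exactly as written above. The second helper illustrates the %PDF- header check listed under Errors.

```typescript
// Hypothetical helpers for illustration; they are not exported by the package.

/** Drop an optional `data:application/pdf;base64,` prefix from a Base64 string. */
function stripDataUriPrefix(input: string): string {
  const prefix = 'data:application/pdf;base64,';
  return input.startsWith(prefix) ? input.slice(prefix.length) : input;
}

/** Check that decoded bytes begin with the ASCII marker "%PDF-". */
function looksLikePdf(bytes: Uint8Array): boolean {
  const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%", "P", "D", "F", "-"
  return bytes.length >= marker.length && marker.every((b, i) => bytes[i] === b);
}

const raw = 'JVBERi0xLjQK'; // Base64 encoding of "%PDF-1.4\n"
const prefixed = `data:application/pdf;base64,${raw}`;

// Both forms reduce to the same Base64 payload.
const normalized = [raw, prefixed].map(stripDataUriPrefix);
```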
const merged = await mergeBase64PDFs([
  reportA_b64, // all pages of A
  { data: reportB_b64, pages: [1, 3, 5] }, // pages 1, 3, 5 of B
  { data: reportC_b64, pages: '1-3,7' }, // pages 1, 2, 3, 7 of C
]);
mergePdfBuffers(inputs, options?): Promise<Uint8Array>
Merges raw PDF bytes (from fs.readFile, an S3 SDK, an HTTP body, multer,
etc.) and returns the merged document as a Uint8Array. Skips the Base64
round-trip, saving ~33 % memory vs mergeBase64PDFs.
- inputs is (Buffer | Uint8Array | { data: Buffer | Uint8Array; pages?: PageSelector })[].
- options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.
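The ~33 % figure follows from how Base64 works: every 3 raw bytes become 4 characters. A small worked example, where base64Length is a hypothetical helper and the 30 MiB size is an arbitrary illustration (actual memory use also depends on how the runtime stores strings):

```typescript
/** Length of the padded Base64 text that encodes `byteLength` raw bytes. */
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

const rawBytes = 30 * 1024 * 1024; // a 30 MiB PDF: 31,457,280 bytes
const encodedChars = base64Length(rawBytes); // 41,943,040 characters
const overhead = encodedChars / rawBytes - 1; // one third extra
```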
const merged = await mergePdfBuffers([bufA, { data: bufB, pages: [4, 2] }]);
mergePdfFiles(inputs, options?): Promise<Uint8Array>
Reads PDFs from the given file paths in parallel and merges them. Returns a
Uint8Array — write it straight to disk.
- inputs is (string | { path: string; pages?: PageSelector })[].
- options accepts metadata, watermark, pageNumbers, ignoreEncryption, and signal. See Document metadata, Watermark, Page numbers, Encrypted source PDFs, and Cancellation.
import { promises as fs } from 'fs';
const merged = await mergePdfFiles([
  './cover.pdf',
  { path: './body.pdf', pages: '2-9' },
]);
await fs.writeFile('./out.pdf', merged);
mergePdfUrls(urls, options?): Promise<string>
Fetches PDFs from the given URLs (concurrently, with a bounded pool) and returns the merged document as Base64.
urls is (string | { url: string; headers?: Record<string,string>; pages?: PageSelector })[].
Options:
| Option | Default | Description |
| --- | --- | --- |
| timeoutMs | 5000 | Per-request timeout in milliseconds. |
| maxBytesPerUrl | 100 * 1024 * 1024 | Maximum bytes accepted from any single response. |
| allowedProtocols | ['http:', 'https:'] | URL protocols allowed. Pass ['https:'] to harden further. |
| concurrency | 8 | Maximum number of URL fetches running in parallel. |
| headers | — | Default headers applied to every fetch. Per-URL headers (object form) override on key conflict. |
| metadata | — | Document metadata to stamp on the merged output. See Document metadata. |
| watermark | — | Text watermark to draw on every page of the merged output. See Watermark. |
| pageNumbers | — | Stamp continuous page numbers. See Page numbers. |
| signal | — | AbortSignal to cancel the merge. See Cancellation. |
| ignoreEncryption | false | Allow merging source PDFs that declare encryption metadata. See Encrypted source PDFs. |
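To make the concurrency option concrete, here is one common way to cap in-flight work while keeping results in input order. This is a sketch of the general technique, not the package's internals; mapWithConcurrency is a hypothetical name.

```typescript
/**
 * Run `task` over `items` with at most `limit` tasks in flight,
 * returning results in input order. Hypothetical helper for illustration.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because each result is written to the slot of its input index, a slow first fetch cannot reorder the output, which is consistent with the order-preserving guarantee described earlier.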
const merged = await mergePdfUrls(
  [
    'https://example.com/a.pdf',
    { url: 'https://example.com/b.pdf', headers: { Authorization: 'Bearer token-b' }, pages: '1-3' },
  ],
  {
    timeoutMs: 8000,
    maxBytesPerUrl: 20 * 1024 * 1024,
    allowedProtocols: ['https:'],
    concurrency: 4,
    headers: { 'User-Agent': 'my-app/1.0' },
  },
);
Document metadata
Every merge function accepts an optional second options argument with a
metadata field. Stamp the merged document with whatever properties your
downstream system surfaces (Finder, Explorer, document-management systems,
email clients, etc.):
await mergePdfFiles(
  ['./cover.pdf', './body.pdf'],
  {
    metadata: {
      title: 'Annual Report 2026',
      author: 'Operations',
      subject: 'Year-end summary',
      keywords: ['annual', 'operations', '2026'],
      creator: 'my-app/1.0',
      creationDate: new Date(),
      modificationDate: new Date(),
    },
  },
);
The MergePdfUrlsOptions interface extends the same shape, so all four
functions accept { metadata } the same way. Every field is optional — omit
what you don't want to set.
The Producer field is hard-coded by pdf-lib on save and cannot be customized at this layer.
Watermark
Every merge function accepts an optional watermark field on the same
options argument. The watermark is stamped on every page of the merged
output using Helvetica (no additional fonts are embedded). The feature is
fully opt-in — when watermark is omitted, no extra drawing is performed
and the original byte-identical fast path is preserved.
await mergePdfFiles(
  ['./report.pdf'],
  {
    watermark: {
      text: 'CONFIDENTIAL',
      opacity: 0.18, // 0–1, default 0.2
      fontSize: 90, // points, default 48
      color: { r: 0.7, g: 0.1, b: 0.1 }, // RGB 0–1, default mid-gray
      rotate: 45, // degrees CCW, default 0
      position: 'center', // see below
    },
  },
);
position accepts a named placement ('center' by default, or 'top-left',
'top-right', 'bottom-left', 'bottom-right') or an explicit
{ x, y } in PDF points measured from the bottom-left of the page. Named
placements are computed per page so they work on any page size.
Invalid input throws PdfMergeError: empty text, opacity outside 0–1,
non-positive font size, or color channels outside 0–1.
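To illustrate what "computed per page" means, the sketch below resolves a placement to a point in PDF coordinates for a given page size. It is an assumption-laden illustration, not the package's implementation: resolvePosition is a hypothetical function, and the 36-point margin for corner placements is invented for the example.

```typescript
type NamedPosition = 'center' | 'top-left' | 'top-right' | 'bottom-left' | 'bottom-right';
type Position = NamedPosition | { x: number; y: number };

/**
 * Resolve a placement to a point in PDF coordinates (origin at bottom-left).
 * `margin` is an assumed inset for corner placements, not a documented default.
 */
function resolvePosition(
  pos: Position,
  pageWidth: number,
  pageHeight: number,
  margin = 36,
): { x: number; y: number } {
  if (typeof pos !== 'string') return pos; // explicit { x, y } passes through
  switch (pos) {
    case 'center':
      return { x: pageWidth / 2, y: pageHeight / 2 };
    case 'top-left':
      return { x: margin, y: pageHeight - margin };
    case 'top-right':
      return { x: pageWidth - margin, y: pageHeight - margin };
    case 'bottom-left':
      return { x: margin, y: margin };
    case 'bottom-right':
      return { x: pageWidth - margin, y: margin };
  }
}

// US Letter is 612 x 792 points, so 'center' resolves to (306, 396).
const letterCenter = resolvePosition('center', 612, 792);
```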
Page numbers
Stamp continuous page numbering on every page of the merged output.
Common use case: stitch several PDFs together and number 1..N across
the result. Available on the same options argument as everything else.
await mergePdfFiles(
  ['./cover.pdf', './body.pdf', './appendix.pdf'],
  {
    pageNumbers: {
      format: 'Page {current} of {total}', // tokens: {current}, {total}
      startAt: 1, // first-page value; default 1
      position: 'bottom-center', // see below
      fontSize: 10, // default 10
      color: { r: 0.4, g: 0.4, b: 0.4 }, // default mid-gray
    },
  },
);
position accepts a named placement ('bottom-center' by default, or
'top-left', 'top-center', 'top-right', 'bottom-left',
'bottom-right') or an explicit { x, y } in PDF points from the
bottom-left of the page.
Invalid input throws PdfMergeError (empty format, non-integer
startAt, non-positive fontSize, color channels outside 0–1,
unknown position).
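The {current} and {total} tokens behave like simple placeholders. A minimal sketch of the substitution, with formatPageLabel as a hypothetical helper; how {total} interacts with a startAt other than 1 is not specified above, so the sketch takes both values as explicit arguments:

```typescript
/** Replace every {current} and {total} token in `format`. Hypothetical helper. */
function formatPageLabel(format: string, current: number, total: number): string {
  return format
    .replace(/\{current\}/g, String(current))
    .replace(/\{total\}/g, String(total));
}

// Labels for a three-page merged document numbered from 1.
const labels = [1, 2, 3].map((n) => formatPageLabel('Page {current} of {total}', n, 3));
```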
Cancellation with AbortSignal
Every merge function accepts options.signal: AbortSignal. The merge
aborts cleanly at the next source-iteration boundary, and in-flight
HTTP fetches in mergePdfUrls are cancelled immediately.
const controller = new AbortController();

// Cancel after 30 seconds, or whenever the user clicks "stop".
setTimeout(() => controller.abort(new Error('took too long')), 30_000);

try {
  await mergePdfUrls(urls, { signal: controller.signal, timeoutMs: 60_000 });
} catch (err) {
  if (err instanceof Error && err.message === 'took too long') {
    // aborted by the timer above: clean up state and move on
  } else {
    throw err;
  }
}
If you call controller.abort(reason) with a reason, that exact value
is thrown. Without a reason, the merge throws an AbortError-shaped
DOMException. Either way, signal.aborted checks in caller code
behave correctly (this matches the Node fetch convention).
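The reason-passing behavior is standard AbortSignal semantics and can be observed without the package at all. In the sketch below, checkpoint is a hypothetical stand-in for the kind of check a merge could perform between sources:

```typescript
// Plain AbortController behavior, independent of this package.
function checkpoint(signal: AbortSignal): void {
  // Throws signal.reason if the signal is aborted; otherwise does nothing.
  signal.throwIfAborted();
}

// With an explicit reason, that exact object is what gets thrown.
const withReason = new AbortController();
const reason = new Error('took too long');
withReason.abort(reason);

let caught: unknown;
try {
  checkpoint(withReason.signal);
} catch (err) {
  caught = err; // same object as `reason`
}

// Without a reason, the platform supplies a DOMException named "AbortError".
const withoutReason = new AbortController();
withoutReason.abort();

let defaultError: unknown;
try {
  checkpoint(withoutReason.signal);
} catch (err) {
  defaultError = err;
}
```

Since the thrown value is the reason itself, comparing by identity (caught === reason) or checking signal.aborted tends to be more robust than matching on the message string.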
Encrypted source PDFs
Some PDFs declare encryption metadata but contain readable content
streams — a quirk of older generators. By default mergePdfBuffers and
the other merge functions reject these (matching pdf-lib's behavior).
Pass options.ignoreEncryption: true to opt in:
await mergePdfBuffers([legacyReport], { ignoreEncryption: true });
This skips pdf-lib's encryption check at load time. It does not decrypt the document; truly password-protected files will still fail because their content streams cannot be read without the key.
Page selection
Every merge function accepts an object form per input that lets you pick which pages to keep from that document. Pages are 1-indexed to match what you see in a PDF viewer.
PageSelector is either:
- a number[]: explicit page numbers, e.g. [1, 3, 5]. Duplicates are kept (the page appears multiple times in the output).
- a string: comma-separated ranges, e.g. "1-3,5,8-10". Descending ranges ("5-1") reverse the page order.
Out-of-range, zero, negative, or unparseable selectors throw PdfMergeError.
Inputs without a pages field — including all plain-string / plain-Buffer
inputs — behave exactly as they always have and include every page.
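The range-string rules above can be sketched as a small parser. This is an illustration of the described behavior, not the package's code: parsePageRanges is a hypothetical name, and it throws a plain Error where the library would throw PdfMergeError.

```typescript
/** Expand a selector such as "1-3,5,8-10" into 1-indexed page numbers. */
function parsePageRanges(selector: string, pageCount: number): number[] {
  const pages: number[] = [];
  for (const part of selector.split(',')) {
    // Accept "N" or "N-M", with optional surrounding whitespace.
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) throw new Error(`Unparseable page selector: "${part}"`);

    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    for (const n of [start, end]) {
      if (n < 1 || n > pageCount) {
        throw new Error(`Page ${n} is outside 1-${pageCount}`);
      }
    }

    // A descending range such as "5-1" walks backwards, reversing the order.
    const step = start <= end ? 1 : -1;
    for (let p = start; p !== end + step; p += step) pages.push(p);
  }
  return pages;
}

const picked = parsePageRanges('1-3,5,8-10', 10); // [1, 2, 3, 5, 8, 9, 10]
const reversed = parsePageRanges('5-1', 5); // [5, 4, 3, 2, 1]
```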
Errors
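All errors raised by the library itself extend PdfMergeError, so a single
err instanceof PdfMergeError check inside a catch block covers them. Abort
reasons are the exception: they are rethrown as-is (see Cancellation with AbortSignal).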
| Error | When it's thrown | Notable fields |
| --- | --- | --- |
| PdfMergeError | Empty input, invalid input shape, bad URL, disallowed protocol, invalid page selector. | — |
| InvalidPdfFormatError | Input is not valid Base64, not a parseable PDF, or fails the %PDF- header check. | — |
| PdfFetchError | HTTP failure, timeout, oversize response, redirect to a disallowed protocol, or non-PDF response body. | .url (credentials/query stripped), .index (failing input position) |
import { PdfFetchError } from '@abineshsolairaj/pdf-merge';
try {
  await mergePdfUrls(urls);
} catch (err) {
  if (err instanceof PdfFetchError) {
    console.error(`URL #${err.index} failed: ${err.url}`);
  } else {
    throw err;
  }
}
Security model
mergePdfUrls is the higher-risk function — it dereferences caller-supplied
URLs. Defaults are chosen to be safe out of the box:
- Protocol allowlist. Only http: and https: are accepted; file:, data:, ftp:, etc. are rejected before any socket is opened.
- Response size cap. maxBytesPerUrl is enforced both against the declared Content-Length and during streaming, so a hostile endpoint that serves an unbounded body cannot exhaust the process heap.
- URL sanitization in errors. Credentials (user:pass@) and query strings are stripped from URLs before they appear in error messages or on PdfFetchError.url, so signed-URL tokens and HTTP Basic passwords don't leak into logs.
- Redirect-protocol check. If a redirect lands on a disallowed protocol, the request is rejected.
- Per-request timeout. Default 5000 ms via AbortController.
- Bounded concurrency. concurrency (default 8) caps simultaneous in-flight fetches.
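Two of these defaults, the protocol allowlist and URL sanitization, are easy to picture with the standard URL class. The helpers below are hypothetical (isAllowedProtocol and sanitizeUrl are not the package's exports) and only sketch the idea:

```typescript
/** True when the URL's protocol (for example "https:") is in the allowlist. */
function isAllowedProtocol(raw: string, allowed: readonly string[]): boolean {
  return allowed.includes(new URL(raw).protocol);
}

/** Strip credentials and the query string so the URL is safe to log. */
function sanitizeUrl(raw: string): string {
  const url = new URL(raw);
  url.username = '';
  url.password = '';
  url.search = '';
  return url.toString();
}

const signed = 'https://user:pass@example.com/reports/a.pdf?X-Amz-Signature=abc123';
const safeToLog = sanitizeUrl(signed); // "https://example.com/reports/a.pdf"
```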
What this library does not do for you:
- DNS / IP allowlisting. SSRF to internal hosts via http://10.0.0.1/… or cloud metadata endpoints is not blocked; do IP-range filtering at your network or application layer if you accept URLs from end users.
- Authentication. Pass any required tokens via the headers option (or per-URL headers for signed requests).
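If you do accept URLs from end users, an application-layer pre-check along the following lines can run before the URLs are handed to mergePdfUrls. It is a deliberately partial sketch with hypothetical helper names: it inspects only literal hosts, so a DNS name that resolves to an internal address still passes, and IPv6 coverage is limited to the loopback literal. A production filter would also need to validate the resolved address.

```typescript
/** True when `host` is an IPv4 literal in a loopback, private, or link-local range. */
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // link-local, includes 169.254.169.254 metadata endpoints
  );
}

/** Throw when the URL's literal host is obviously internal. */
function assertPublicUrl(raw: string): void {
  const { hostname } = new URL(raw);
  if (hostname === 'localhost' || hostname === '[::1]' || isPrivateIPv4(hostname)) {
    throw new Error(`Refusing to fetch from internal host ${hostname}`);
  }
}
```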
CLI
The package ships a small pdf-merge binary. There are two recommended
ways to invoke it:
One-shot via npx (no install)
Always pass the full scoped package name, otherwise npx will try to
resolve an unrelated pdf-merge package from the registry:
npx @abineshsolairaj/pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf
Installed (global or as a dev dependency)
Once installed, you can call the unscoped pdf-merge directly — npm puts
the bin on your PATH:
# global install
npm install -g @abineshsolairaj/pdf-merge
pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf
# or, within a project
npm install --save-dev @abineshsolairaj/pdf-merge
npx pdf-merge cover.pdf body.pdf appendix.pdf -o out.pdf
Mix local files and URLs in one call, and attach a page selector to any input with a trailing colon:
npx @abineshsolairaj/pdf-merge \
  cover.pdf \
  'body.pdf:2-9' \
  https://example.com/appendix.pdf \
  -o annual-report.pdf \
  --title 'Annual Report 2026' \
  --author 'Operations' \
  --keywords 'annual,operations,2026'
Run pdf-merge --help for the full option list. Metadata flags
(--title, --author, --subject, --creator, --keywords) and URL
options (--concurrency, --timeout, --https-only) are all supported.
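The trailing-colon syntax has to coexist with the colon in URL schemes. One plausible way to separate the two, shown purely as an assumption about how such a parser could work (parseCliInput is hypothetical, and the real CLI may resolve edge cases differently), is to split only when the text after the last colon looks like a page selector:

```typescript
interface CliInput {
  source: string; // file path or URL
  pages?: string; // range string such as "2-9", when present
}

/** Split "body.pdf:2-9" into source and selector; leave other arguments intact. */
function parseCliInput(arg: string): CliInput {
  const colon = arg.lastIndexOf(':');
  if (colon > 0) {
    const suffix = arg.slice(colon + 1);
    // Only digits, commas, and hyphens in range form count as a selector.
    if (/^\d+(-\d+)?(,\d+(-\d+)?)*$/.test(suffix)) {
      return { source: arg.slice(0, colon), pages: suffix };
    }
  }
  return { source: arg };
}

const inputs = ['cover.pdf', 'body.pdf:2-9', 'https://example.com/appendix.pdf'].map(parseCliInput);
```

Under this rule a bare URL that ends in a port number with no path would be misread as carrying a selector, so a real parser would need an extra guard for that case.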
Module format
The package ships both ESM and CommonJS builds via a conditional exports
map, so all of the following work without any tooling tweaks:
import { mergePdfFiles } from '@abineshsolairaj/pdf-merge'; // ESM / bundlers
const { mergePdfFiles } = require('@abineshsolairaj/pdf-merge'); // CommonJS
TypeScript definitions are shipped from the CJS build and resolve automatically for both consumers.
Backward compatibility
Page selection, per-URL headers, and the buffer/file/concurrency options were
added as additive unions — every previous call signature still
type-checks and produces byte-identical output. If you don't pass a pages
field, the merge runs through the original code path unchanged. A regression
test pins this.
Development
npm install
npm run build # tsc → dist/cjs + dist/esm
npm test # unit + integration tests
npm run test:cli # rebuilds and exercises the CLI binary + dual build
npm run test:e2e # end-to-end harness against real generated PDFs
The unit suite spins up a local HTTP server and exercises ordering, 404
handling, invalid Base64, empty input, timeouts, non-PDF responses, the
security defaults, page selection, header precedence, and concurrency
capping. The e2e harness in test/e2e.ts generates real multi-page A4 PDFs
and exercises every public method end-to-end, including round-tripping each
merged output back through pdf-lib.
