@transkripid/pdf-text-replace

v1.0.0, published 6 months ago

Find and replace text in PDF files with preserved formatting


Maintainer: maman

Keywords: pdf, text, replace, find, search, modify, edit

pdf-text-replace

Find and replace text in PDF files while preserving formatting.

Features

Chainable API mimicking JavaScript's String.replace()
Supports string and RegExp search patterns
Preserves font styles, colors, and layout
Handles FlateDecode compressed streams
Graceful error handling (returns original buffer on failure)
Automatic Unicode transliteration (CJK, Cyrillic, accented characters → ASCII)
Pure TypeScript with minimal dependencies (pako for zlib, any-ascii for transliteration)
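FlateDecode is zlib compression, so a compressed content stream can be decoded with any zlib implementation. The sketch below uses Node's built-in `node:zlib` instead of pako (which the package itself uses) to inflate a stream body, edit it as Latin-1 text, and deflate it again. `editFlateStream` is a hypothetical helper for illustration, not part of this package's API.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Hypothetical helper (not part of pdf-text-replace): decode a FlateDecode
// stream body, apply an edit to the content-stream text, and re-encode it.
function editFlateStream(
  compressed: Uint8Array,
  edit: (content: string) => string,
): Buffer {
  // FlateDecode data is a zlib (RFC 1950) stream, so inflateSync can read it.
  const raw = inflateSync(compressed);
  // Latin-1 maps each byte to one code unit, so arbitrary bytes round-trip.
  const content = raw.toString("latin1");
  return deflateSync(Buffer.from(edit(content), "latin1"));
}

// Usage: compress a tiny content stream, then rewrite its Tj string.
const original = deflateSync(Buffer.from("BT /F1 12 Tf (John Doe) Tj ET", "latin1"));
const updated = editFlateStream(original, (s) => s.replace("(John Doe)", "(Jane Smith)"));
console.log(inflateSync(updated).toString("latin1"));
// prints "BT /F1 12 Tf (Jane Smith) Tj ET"
```

Because the re-encoded body usually has a different size, the stream dictionary's `/Length` entry has to be updated to `updated.length` when the PDF is rebuilt.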

Installation

```bash
# From npm
npm install @transkripid/pdf-text-replace

# From local path
npm install /path/to/pdf-text-replace
```

Usage

```typescript
import { PDF } from '@transkripid/pdf-text-replace';
import { readFileSync, writeFileSync } from 'fs';

const input = readFileSync('document.pdf');

const modified = new PDF(input)
  .replace('John Doe', 'Jane Smith')
  // placeholder addresses
  .replace('john.doe@example.com', 'jane.smith@example.com')
  .replace(/\d{4}-\d{4}-\d{4}/g, 'XXXX-XXXX-XXXX')
  .toBuffer();

writeFileSync('modified.pdf', modified);
```

API

`new PDF(input: Buffer | Uint8Array)`

Create a new PDF instance from a buffer.

`.replace(search: string | RegExp, replacement: string): this`

Queue a text replacement operation. Returns `this` for chaining.

search - String or RegExp pattern to find
replacement - Text to replace matches with
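The deferred, chainable pattern can be illustrated on a plain string. `ReplacementQueue` below is a hypothetical class, not the package's implementation: it only records operations and applies them in order when `apply()` is called, using `String.prototype.replace` semantics (a string pattern replaces its first occurrence only; a RegExp with the `g` flag replaces every match). Whether pdf-text-replace replaces every occurrence of a string pattern is not stated in this README.

```typescript
// Hypothetical illustration of a chainable, deferred replace queue.
type Replacement = { search: string | RegExp; replacement: string };

class ReplacementQueue {
  private readonly ops: Replacement[] = [];

  constructor(private readonly text: string) {}

  // Record the operation and return `this` so calls can be chained.
  replace(search: string | RegExp, replacement: string): this {
    this.ops.push({ search, replacement });
    return this;
  }

  // Apply queued operations in order, each one on the previous result.
  apply(): string {
    return this.ops.reduce(
      (acc, op) => acc.replace(op.search, op.replacement),
      this.text,
    );
  }
}

const result = new ReplacementQueue("a-a 1234-5678")
  .replace("a", "b")          // string: first occurrence only
  .replace(/\d{4}/g, "XXXX")  // global RegExp: every match
  .apply();
console.log(result); // prints "b-a XXXX-XXXX"
```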

`.toBuffer(): Buffer`

Apply all queued replacements and return the modified PDF as a Buffer.

Returns the original buffer unchanged if:

No matches are found
An error occurs during processing
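The fallback behaviour can be pictured as a try/catch around the whole transformation. In this sketch, `transform` stands in for the replacement pipeline and `safeApply` is a hypothetical function, not an export of the package: it returns the input when the transform throws or produces identical bytes.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sketch of "return the original buffer on failure or no match".
function safeApply(
  input: Buffer,
  transform: (pdf: Buffer) => Buffer,
): Buffer {
  try {
    const output = transform(input);
    // No effective change: hand back the caller's original buffer.
    return output.equals(input) ? input : output;
  } catch {
    // Any parsing or encoding error: fall back to the unmodified input.
    return input;
  }
}

const pdf = Buffer.from("%PDF-1.4");
const failed = safeApply(pdf, () => {
  throw new Error("bad xref");
});
console.log(failed === pdf); // prints true
```

On the caller's side, `modified.equals(input)` is one way to detect that nothing was replaced, since the README does not say whether the returned buffer is the same object as the input.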

How It Works

The library parses PDF content streams (both raw and FlateDecode compressed), finds text operators (Tj, TJ), and performs replacements while:

Preserving the original font and styling
Adjusting horizontal scaling (Tz operator) when the replacement text has a different width from the original
Rebuilding the PDF with updated stream lengths and a regenerated xref table
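To make the Tz adjustment concrete: if the replacement is wider than the original, scaling it horizontally to originalWidth / newWidth × 100 percent keeps it in the same space. The sketch below applies that idea to a single `(text) Tj` operator. It is an illustration under stated assumptions, not the package's code: the strings contain no parentheses or escapes, horizontal scaling is 100 before the operator, and `widthOf` is a hypothetical width function (a real implementation would sum glyph widths from the font's `/Widths` array).

```typescript
// Hypothetical width function type: a string's advance width in 1/1000 units.
type WidthFn = (s: string) => number;
// Illustrative assumption: every glyph is 500 units wide.
const fixedWidth: WidthFn = (s) => s.length * 500;

// Replace `(search) Tj` with a horizontally scaled `(replacement) Tj`.
// Assumes no parentheses/escapes in either string and an initial Tz of 100.
function replaceTj(
  content: string,
  search: string,
  replacement: string,
  widthOf: WidthFn,
): string {
  const target = `(${search}) Tj`;
  if (!content.includes(target)) return content;

  const oldWidth = widthOf(search);
  const newWidth = widthOf(replacement);
  // Tz takes a percentage: shrink (or stretch) the new text to the old width.
  const scale = newWidth > 0 ? Number(((oldWidth / newWidth) * 100).toFixed(2)) : 100;

  // Set Tz for this one string, then restore 100 so later text is unaffected.
  const patched = `${scale} Tz (${replacement}) Tj 100 Tz`;
  // A replacer function keeps any `$` in the text from being treated as a pattern.
  return content.replace(target, () => patched);
}

const out = replaceTj("BT /F1 12 Tf (John Doe) Tj ET", "John Doe", "Jane Smith", fixedWidth);
console.log(out); // prints "BT /F1 12 Tf 80 Tz (Jane Smith) Tj 100 Tz ET"
```

With the assumed 500-unit glyphs, "John Doe" is 4000 units and "Jane Smith" is 5000, so the replacement is drawn at 80% horizontal scaling.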

Unicode Support

Replacement text containing Unicode characters is automatically transliterated to ASCII for compatibility with standard PDF fonts (WinAnsiEncoding):

```typescript
// Chinese → Pinyin
.replace('Author', '银宵')        // Becomes "YinXiao"

// Korean → Romanized
.replace('Name', '스트레이')      // Becomes "seuteulei"

// Cyrillic → Latin
.replace('Hello', 'Привет')       // Becomes "Privet"

// Accented → Plain ASCII
.replace('Name', 'José García')   // Becomes "Jose Garcia"
```

This uses any-ascii for transliteration.
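any-ascii covers many scripts. For the accented-Latin case alone, a common dependency-free approximation is Unicode NFD decomposition followed by removing combining marks. This is a different and much narrower technique than any-ascii (it leaves CJK, Hangul, and Cyrillic untouched) and is shown only to illustrate what "accented → plain ASCII" means; `stripAccents` and `isPlainAscii` are hypothetical helpers.

```typescript
// Hypothetical helper: decompose accented letters (é → e + U+0301) and drop
// the combining marks. Narrower than any-ascii: other scripts are kept as-is.
function stripAccents(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// True when every character is printable ASCII (space through tilde), the
// range where WinAnsiEncoding codes match ASCII codes.
function isPlainAscii(s: string): boolean {
  return /^[\x20-\x7e]*$/.test(s);
}

console.log(stripAccents("José García")); // prints "Jose Garcia"
console.log(isPlainAscii(stripAccents("José García"))); // prints true
console.log(isPlainAscii(stripAccents("Привет"))); // prints false
```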

Limitations

Only works with PDFs using WinAnsiEncoding (standard Latin text)
Complex font encodings (CID, Identity-H) are not supported
Unicode replacement text is transliterated to ASCII (original Unicode cannot be preserved)
Text split across multiple operators may not be found
Scanned/image-based PDFs cannot be modified
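The split-text limitation arises because a PDF producer may emit one visible phrase as several string operands. The simplified extractor below (a hypothetical illustration that ignores escapes, hex strings, and text positioning) collects the strings shown by `Tj` and `TJ` operators: "John Doe" written as two `Tj` operators never appears in any single operand, although it does appear once the operands are joined.

```typescript
// Hypothetical simplified extractor: returns the literal-string operands of
// each Tj and TJ operator. Ignores escapes, hex strings, and text positioning.
function showStrings(content: string): string[] {
  const op = /\(([^()]*)\)\s*Tj|\[([^\]]*)\]\s*TJ/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = op.exec(content)) !== null) {
    if (m[1] !== undefined) {
      // `(text) Tj`: a single string operand.
      out.push(m[1]);
    } else {
      // `[(a) -250 (b)] TJ`: keep the strings, drop the kerning numbers.
      const strings = m[2].match(/\([^()]*\)/g) ?? [];
      out.push(strings.map((s) => s.slice(1, -1)).join(""));
    }
  }
  return out;
}

const stream = "BT (John) Tj ( Doe) Tj [(Jane) -250 ( Smith)] TJ ET";
const pieces = showStrings(stream);
console.log(pieces); // prints [ 'John', ' Doe', 'Jane Smith' ]
console.log(pieces.some((p) => p.includes("John Doe"))); // prints false
console.log(pieces.join("").includes("John Doe")); // prints true
```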

License

MIT
