@transkripid/pdf-text-replace
v1.0.0
Find and replace text in PDF files with preserved formatting
pdf-text-replace
Find and replace text in PDF files while preserving formatting.
Features
- Chainable API mimicking JavaScript's String.replace()
- Supports string and RegExp search patterns
- Preserves font styles, colors, and layout
- Handles FlateDecode compressed streams
- Graceful error handling (returns original buffer on failure)
- Automatic Unicode transliteration (CJK, Cyrillic, accented characters → ASCII)
- Pure TypeScript with minimal dependencies (pako for zlib, any-ascii for transliteration)
Installation
# From npm (when published)
npm install pdf-text-replace
# From local path
npm install /path/to/pdf-text-replace
Usage
import { PDF } from 'pdf-text-replace';
import { readFileSync, writeFileSync } from 'fs';
const input = readFileSync('document.pdf');
const modified = new PDF(input)
.replace('John Doe', 'Jane Smith')
.replace('[email protected]', '[email protected]')
.replace(/\d{4}-\d{4}-\d{4}/g, 'XXXX-XXXX-XXXX')
.toBuffer();
writeFileSync('modified.pdf', modified);
API
new PDF(input: Buffer | Uint8Array)
Create a new PDF instance from a buffer.
.replace(search: string | RegExp, replacement: string): this
Queue a text replacement operation. Returns this for chaining.
- search: String or RegExp pattern to find
- replacement: Text to replace matches with
.toBuffer(): Buffer
Apply all queued replacements and return the modified PDF as a Buffer.
Returns the original buffer unchanged if:
- No matches are found
- An error occurs during processing
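Because failures and non-matches both return the original bytes without raising an error, a caller may want to check whether anything actually changed. The following is a minimal sketch; `bytesEqual` is a hypothetical helper, not part of this library's API. It accepts `Uint8Array`, which also covers Node's `Buffer` since `Buffer` is a `Uint8Array` subclass.

```typescript
// Hypothetical helper (not part of pdf-text-replace): byte-wise comparison
// of two buffers, used to detect a silent no-op from .toBuffer().
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Intended use, with `input` and `modified` as in the Usage section:
//   if (bytesEqual(input, modified)) {
//     console.warn('No replacements were applied (no match, or an error occurred)');
//   }
```

In Node, `input.equals(modified)` on two `Buffer` values performs the same comparison.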
How It Works
The library parses PDF content streams (both raw and FlateDecode compressed), finds text operators (Tj, TJ), and performs replacements while:
- Preserving the original font and styling
- Adjusting horizontal scaling (Tz operator) when the replacement text has a different width
- Rebuilding the PDF with updated stream lengths and xref table
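The replacement step can be illustrated on an already-decompressed content stream. This is a simplified sketch, not the library's actual implementation: `estimateWidth`, `escapePdfString`, and `replaceInContentStream` are hypothetical names, the width estimate assumes every glyph is equally wide, only literal-string Tj operators are handled, and the scaling is reset to 100 on the assumption that no other Tz value was in effect.

```typescript
// Hypothetical width estimate: treats every character as equally wide.
// A real implementation would use per-glyph widths from the font.
function estimateWidth(text: string): number {
  return text.length;
}

// Escape the characters that are special inside a PDF literal string.
function escapePdfString(text: string): string {
  return text.replace(/([\\()])/g, '\\$1');
}

function replaceInContentStream(content: string, search: string, replacement: string): string {
  // Match simple literal-string operators such as "(Hello) Tj".
  const tj = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  return content.replace(tj, (match: string, literal: string) => {
    if (literal.indexOf(search) === -1) return match; // leave untouched
    const updated = literal.split(search).join(escapePdfString(replacement));
    if (updated.length === 0) return '() Tj'; // nothing left to scale
    // Squeeze or stretch the new text into the width of the old text.
    const scale = (estimateWidth(literal) / estimateWidth(updated)) * 100;
    return `${scale.toFixed(2)} Tz (${updated}) Tj 100 Tz`;
  });
}

const before = 'BT /F1 12 Tf (Hello John) Tj ET';
const after = replaceInContentStream(before, 'John', 'Jonathan');
// after === 'BT /F1 12 Tf 71.43 Tz (Hello Jonathan) Tj 100 Tz ET'
```

The Tz operand is a percentage, so 71.43 compresses the 14-character replacement into roughly the space of the original 10 characters.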
Unicode Support
Replacement text containing Unicode characters is automatically transliterated to ASCII for compatibility with standard PDF fonts (WinAnsiEncoding):
// Chinese → Pinyin
.replace('Author', '银宵') // Becomes "YinXiao"
// Korean → Romanized
.replace('Name', '스트레이') // Becomes "seuteulei"
// Cyrillic → Latin
.replace('Hello', 'Привет') // Becomes "Privet"
// Accented → Plain ASCII
.replace('Name', 'José García') // Becomes "Jose Garcia"
This uses any-ascii for transliteration.
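Since transliteration is lossy, it can be useful to know in advance whether a replacement string will be altered. The check below is a sketch under the assumption that any character outside the 7-bit ASCII range triggers transliteration; `needsTransliteration` is a hypothetical helper and not part of this library.

```typescript
// Hypothetical helper: true when the text contains any non-ASCII character
// and would therefore be transliterated before being written to the PDF.
function needsTransliteration(text: string): boolean {
  return !/^[\x00-\x7F]*$/.test(text);
}

const candidates = ['Jane Smith', 'José García', 'Привет'];
for (const text of candidates) {
  if (needsTransliteration(text)) {
    console.log('"' + text + '" will be transliterated to ASCII');
  }
}
```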
Limitations
- Only works with PDFs using WinAnsiEncoding (standard Latin text)
- Complex font encodings (CID, Identity-H) are not supported
- Unicode replacement text is transliterated to ASCII (original Unicode cannot be preserved)
- Text split across multiple operators may not be found
- Scanned/image-based PDFs cannot be modified
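The split-text limitation is easiest to see on a raw content stream. PDF producers often emit one visible phrase as several text operators, so no single operand contains the full search string. The sketch below uses hypothetical helpers (`tjStrings`, `isSplitAcrossOperators`) to show the situation; it only inspects literal-string Tj operators and is not how the library itself scans streams.

```typescript
// Collect the literal-string operand of every "(...) Tj" operator.
function tjStrings(content: string): string[] {
  const tj = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = tj.exec(content)) !== null) {
    found.push(m[1]);
  }
  return found;
}

// True when the phrase only appears after joining neighbouring operands,
// i.e. it is visible on the page but absent from every single operator.
function isSplitAcrossOperators(content: string, search: string): boolean {
  const pieces = tjStrings(content);
  const inSinglePiece = pieces.some((p) => p.indexOf(search) !== -1);
  return !inSinglePiece && pieces.join('').indexOf(search) !== -1;
}

const stream = 'BT /F1 12 Tf (John ) Tj (Doe) Tj ET';
// tjStrings(stream) yields ['John ', 'Doe'], so 'John Doe' is split.
const split = isSplitAcrossOperators(stream, 'John Doe');
```

In such a case, searching for a fragment that sits entirely inside one operand (here 'Doe') may still succeed.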
License
MIT
