@shubhu/pdfdiff

v1.6.0

Platform/framework agnostic PDF diffing library with CLI support

PDFDiff

A platform/framework agnostic PDF diffing library with CLI support. Compare PDF files and get detailed text-based differences using modern JavaScript.

Features

  • 🔍 Text-based PDF comparison using pdfjs-dist for accurate text extraction
  • 📊 Multiple diff modes - character, word, and line-level comparison
  • 🖥️ CLI interface with comprehensive options and colored output
  • 📦 ESM support for modern JavaScript environments
  • 🌐 Browser compatible with standalone builds for web applications
  • 🔧 Platform agnostic - works across Node.js and browser environments
  • 📝 JSDoc type annotations for better development experience
  • Fast comparison with detailed timing information
  • 🎯 Flexible ignore options for whitespace and case differences
  • 🎨 Visual diff support with positioned text overlay for PDF rendering

Installation

Global Installation (CLI usage)

npm install -g pdfdiff

Local Installation (Library usage)

npm install pdfdiff

CLI Usage

Basic Comparison

pdfdiff file1.pdf file2.pdf

Available Options

pdfdiff <pdf1> <pdf2> [options]

Arguments:
  pdf1                  First PDF file to compare
  pdf2                  Second PDF file to compare

Options:
  -i, --ignore-whitespace    Ignore whitespace differences
  -c, --ignore-case          Ignore case differences  
  -u, --show-unchanged       Show unchanged lines in output
  -m, --mode <mode>          Diff mode: char (default), word, or line
  --context <number>         Number of context lines around changes (default: 3)
  --no-color                 Disable color output
  -h, --help                 Show help message
  -v, --version              Show version number

Examples

# Basic comparison (character-level by default)
pdfdiff document1.pdf document2.pdf

# Word-level comparison
pdfdiff file1.pdf file2.pdf --mode word

# Line-level comparison  
pdfdiff file1.pdf file2.pdf --mode line

# Ignore whitespace differences
pdfdiff file1.pdf file2.pdf --ignore-whitespace

# Show unchanged lines with custom context
pdfdiff file1.pdf file2.pdf --show-unchanged --context 5

# Ignore case differences
pdfdiff file1.pdf file2.pdf --ignore-case

# Combined options
pdfdiff file1.pdf file2.pdf --mode word --ignore-case --no-color

Exit Codes

  • 0 - Files are identical
  • 1 - Files differ or an error occurred

Diff Modes

PDFDiff supports three comparison modes, each with a different level of granularity:

Character Mode (Default)

  • Most granular: Compares text character by character
  • Best for: Detecting small changes, typos, and precise modifications
  • Output: Shows exact character differences
  • Example: Hello World vs Hello Earth shows individual character changes

Word Mode

  • Moderate granularity: Compares text word by word
  • Best for: Content changes, word replacements, and readability
  • Output: Shows word-level additions and removals
  • Example: Hello World vs Hello Earth shows World removed, Earth added

Line Mode

  • Least granular: Compares text line by line
  • Best for: Structural changes, paragraph modifications
  • Output: Shows entire line differences
  • Example: Full lines shown as added or removed

Performance Considerations

  • Character mode: More detailed output, larger diffs for big changes
  • Word mode: Balanced detail and readability
  • Line mode: Fastest processing, most concise output for large documents
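
To make the three granularities above concrete, here is a minimal sketch of how a text could be split into the units each mode compares. The helper name splitIntoUnits is hypothetical and the splitting rules are a simplification; the tokenizer actually used by the diff dependency may behave differently, especially around whitespace and punctuation.

```javascript
// Hypothetical helper: split text into the units each mode compares.
// This is a simplification, not the tokenizer used by the `diff` package.
function splitIntoUnits(text, mode = 'char') {
  switch (mode) {
    case 'char':
      return Array.from(text); // one unit per Unicode code point
    case 'word':
      return text.split(/\s+/).filter(Boolean); // whitespace-separated words
    case 'line':
      return text.split('\n'); // one unit per line
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

const sample = 'Hello World\nHello Earth';
console.log(splitIntoUnits(sample, 'char').length); // 23
console.log(splitIntoUnits(sample, 'word').length); // 4
console.log(splitIntoUnits(sample, 'line').length); // 2
```

Fewer units per document is why line mode tends to produce the most concise output and character mode the most detailed.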

Library Usage

ESM Import

import { comparePdfs, extractPdfText, formatDiff } from 'pdfdiff';

Extract Text from PDF

import { readFile } from 'node:fs/promises';
import { extractPdfText } from 'pdfdiff';

// From file path
const text = await extractPdfText('./document.pdf');
console.log(text);

// From Buffer
const buffer = await readFile('./document.pdf');
const textFromBuffer = await extractPdfText(buffer);
console.log(textFromBuffer);

Compare PDFs

import { comparePdfs } from 'pdfdiff';

// Basic comparison (character mode by default)
const result = await comparePdfs('./file1.pdf', './file2.pdf');

// With options
const wordResult = await comparePdfs('./file1.pdf', './file2.pdf', {
  mode: 'word',              // 'char' (default), 'word', or 'line'
  ignoreWhitespace: false,
  ignoreCase: false
});

console.log(wordResult.summary);
console.log(wordResult.identical); // boolean
console.log(wordResult.changes);   // array of diff changes

Format Diff Output

import { comparePdfs, formatDiff } from 'pdfdiff';

const diffResult = await comparePdfs('./file1.pdf', './file2.pdf');
const formatted = formatDiff(diffResult, {
  showUnchanged: true,
  context: 3
});

console.log(formatted);

Visual Diff with Positioned Text

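import { extractPositionedPdfText, comparePdfs } from 'pdfdiff';

// Extract positioned text for visual overlays
const positions1 = await extractPositionedPdfText('./file1.pdf');
const positions2 = await extractPositionedPdfText('./file2.pdf');

// Get diff changes
const diffResult = await comparePdfs('./file1.pdf', './file2.pdf');

// Create visual overlays for PDF viewers.
// `side` is 'old' for the first PDF and 'new' for the second. Removed text exists
// only in the old document and added text only in the new one, so a segment that
// is absent from the given side must not advance that side's text offset.
// findTextInRange is a placeholder for a helper you supply; the documented API
// does not include it.
function mapDiffToPositions(diffChanges, positions, side) {
  const overlays = [];
  let textOffset = 0;

  for (const change of diffChanges) {
    // Skip segments that do not exist in this document
    const absent = side === 'old' ? change.added : change.removed;
    if (absent) continue;

    if (change.added || change.removed) {
      // Find text positions corresponding to this change
      const relevantItems = findTextInRange(positions, textOffset, change.value.length);
      overlays.push({
        type: change.added ? 'addition' : 'removal',
        text: change.value,
        positions: relevantItems
      });
    }
    textOffset += change.value.length;
  }

  return overlays;
}

const overlays1 = mapDiffToPositions(diffResult.changes, positions1, 'old');
const overlays2 = mapDiffToPositions(diffResult.changes, positions2, 'new');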
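
The example above calls findTextInRange without showing it. One way such a helper could look is sketched below. It is not part of the documented API, and it rests on a loud assumption: that the diffed text is the plain concatenation of every item's text, page by page, with no separators inserted. If the extraction adds spaces or newlines between items, the offsets would need adjusting.

```javascript
// Hypothetical helper, not part of the documented API.
// Assumes the diffed text is the plain concatenation of every item's `text`,
// page by page, with no separators inserted between items.
function findTextInRange(positions, start, length) {
  const end = start + length;
  const matches = [];
  let offset = 0; // running character offset of the current item

  for (const page of positions) {
    for (const item of page.items) {
      const itemStart = offset;
      const itemEnd = offset + item.text.length;
      // Keep the item when [itemStart, itemEnd) overlaps [start, end)
      if (itemStart < end && itemEnd > start) {
        matches.push(item);
      }
      offset = itemEnd;
    }
  }
  return matches;
}

// Sample data shaped like the positioned text structure; values are made up.
const positions = [
  {
    page: 1,
    items: [
      { text: 'Hello ', x: 72, y: 720, width: 36, height: 12, page: 1 },
      { text: 'World', x: 108, y: 720, width: 34, height: 12, page: 1 },
      { text: '!', x: 142, y: 720, width: 4, height: 12, page: 1 },
    ],
    viewport: { width: 612, height: 792 },
  },
];

// Characters 6..10 ("World") fall entirely inside the second item
console.log(findTextInRange(positions, 6, 5).map((item) => item.text)); // [ 'World' ]
```

An item is returned whole even when only part of it overlaps the range, so highlights are accurate to the text item rather than to the individual character.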

API Reference

extractPdfText(pdfPath)

Extract text content from a PDF file.

Parameters:

  • pdfPath (string|Buffer): Path to PDF file or Buffer containing PDF data

Returns: Promise<string> - Extracted text content

extractPositionedPdfText(pdfPath)

Extract text content with positioning information from a PDF file.

Parameters:

  • pdfPath (string|Buffer): Path to PDF file or Buffer containing PDF data

Returns: Promise<Array<PageTextContent>> - Array of pages with positioned text items

comparePdfs(pdf1, pdf2, options?)

Compare two PDF files and return differences.

Parameters:

  • pdf1 (string|Buffer): First PDF file path or Buffer
  • pdf2 (string|Buffer): Second PDF file path or Buffer
  • options (Object, optional):
    • mode (string): Diff mode - 'char' (default), 'word', or 'line'
    • ignoreWhitespace (boolean): Ignore whitespace differences (default: false)
    • ignoreCase (boolean): Ignore case differences (default: false)

Returns: Promise<DiffResult>

DiffResult:

{
  changes: Array<DiffChange>,  // Array of diff changes
  summary: string,             // Summary of changes
  identical: boolean           // Whether PDFs are identical
}
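
The ignoreWhitespace and ignoreCase options can be pictured as a normalization pass applied to both texts before they are diffed. The sketch below is an assumption about how such a pass might work; the function name normalizeForDiff is hypothetical and the library's actual rules are not documented here.

```javascript
// Hypothetical pre-processing step; the library's actual rules may differ.
function normalizeForDiff(text, { ignoreWhitespace = false, ignoreCase = false } = {}) {
  let normalized = text;
  if (ignoreWhitespace) {
    // Collapse every run of whitespace to one space and trim the ends
    normalized = normalized.replace(/\s+/g, ' ').trim();
  }
  if (ignoreCase) {
    normalized = normalized.toLowerCase();
  }
  return normalized;
}

const options = { ignoreWhitespace: true, ignoreCase: true };
const first = normalizeForDiff('Hello   World\n', options);
const second = normalizeForDiff('hello world', options);
console.log(first === second); // true
```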

formatDiff(diffResult, options?)

Format diff output for console display.

Parameters:

  • diffResult (DiffResult): Result from comparePdfs
  • options (Object, optional):
    • showUnchanged (boolean): Show unchanged lines (default: false)
    • context (number): Number of context lines around changes (default: 3)

Returns: string - Formatted diff output
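
The context option follows a convention common to diff tools: keep a fixed number of unchanged lines on each side of every change. The sketch below illustrates that convention only. Whether formatDiff works this way internally is an assumption, and both selectWithContext and the { text, changed } line shape are hypothetical.

```javascript
// Hypothetical sketch of context filtering; not taken from the library source.
// `lines` is an array of { text, changed } entries in document order.
function selectWithContext(lines, context = 3) {
  const keep = new Set();
  lines.forEach((line, index) => {
    if (!line.changed) return;
    const from = Math.max(0, index - context);
    const to = Math.min(lines.length - 1, index + context);
    for (let i = from; i <= to; i++) keep.add(i);
  });
  // Return the kept line indices in ascending order
  return [...keep].sort((x, y) => x - y);
}

const lines = [
  { text: 'a', changed: false },
  { text: 'b', changed: false },
  { text: 'c', changed: false },
  { text: 'd', changed: true },
  { text: 'e', changed: false },
  { text: 'f', changed: false },
];
console.log(selectWithContext(lines, 1)); // [ 2, 3, 4 ]
```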

Output Data Specification

DiffResult Object

The core output from comparePdfs() follows this structure:

{
  changes: Array<DiffChange>,  // Array of individual changes
  summary: string,             // Human-readable summary
  identical: boolean           // Whether files are identical
}

DiffChange Object

Each change in the changes array represents a segment of text with its status:

{
  value: string,        // The text content of this change
  added?: boolean,      // true if this text was added (undefined for unchanged)
  removed?: boolean,    // true if this text was removed (undefined for unchanged)
  count?: number        // Number of units (chars/words/lines) in this change
}

Change Types

  1. Unchanged segments: { value: "text", count: 5 }
  2. Added segments: { value: "new text", added: true, count: 2 }
  3. Removed segments: { value: "old text", removed: true, count: 2 }

Summary Format

The summary string format varies by diff mode:

  • Character mode: "724 characters added, 775 characters removed"
  • Word mode: "181 words added, 159 words removed"
  • Line mode: "2 lines added, 2 lines removed"
  • Identical files: "PDFs are identical" (all modes)
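
The summary strings above can be derived from the changes array by adding up the count field of added and removed segments. The following sketch mirrors the documented format for illustration; it is a hypothetical re-implementation, not the library's code, and the names summarize and UNIT_LABELS are made up here.

```javascript
// Unit labels per mode, following the summary strings listed above.
const UNIT_LABELS = { char: 'characters', word: 'words', line: 'lines' };

// Hypothetical re-implementation for illustration; not the library's code.
function summarize(changes, mode = 'char') {
  let added = 0;
  let removed = 0;
  for (const change of changes) {
    if (change.added) added += change.count ?? 0;
    else if (change.removed) removed += change.count ?? 0;
  }
  if (added === 0 && removed === 0) return 'PDFs are identical';
  const unit = UNIT_LABELS[mode];
  return `${added} ${unit} added, ${removed} ${unit} removed`;
}

const changes = [
  { value: 'Hello ', count: 6 },
  { value: 'World', removed: true, count: 5 },
  { value: 'Earth', added: true, count: 5 },
];
console.log(summarize(changes)); // prints: 5 characters added, 5 characters removed
```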

Example Output

{
  changes: [
    { value: "Hello ", count: 6 },                    // Unchanged
    { value: "World", removed: true, count: 5 },      // Removed
    { value: "Earth", added: true, count: 5 },        // Added
    { value: "!\nThis is a test.", count: 17 }        // Unchanged
  ],
  summary: "5 characters added, 5 characters removed",
  identical: false
}
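
Because every segment is either unchanged, added, or removed, both versions of the text can be rebuilt from a changes array like the one above. This is a small sketch with a hypothetical function name, reconstructTexts. It assumes no normalization was applied; with ignoreWhitespace or ignoreCase enabled, the rebuilt strings may not match the original extraction exactly.

```javascript
// Rebuild both versions of the text from a `changes` array:
// the old text skips added segments, the new text skips removed ones.
function reconstructTexts(changes) {
  let oldText = '';
  let newText = '';
  for (const change of changes) {
    if (!change.added) oldText += change.value;
    if (!change.removed) newText += change.value;
  }
  return { oldText, newText };
}

const { oldText, newText } = reconstructTexts([
  { value: 'Hello ', count: 6 },
  { value: 'World', removed: true, count: 5 },
  { value: 'Earth', added: true, count: 5 },
  { value: '!\nThis is a test.', count: 17 },
]);
console.log(JSON.stringify(oldText)); // "Hello World!\nThis is a test."
console.log(JSON.stringify(newText)); // "Hello Earth!\nThis is a test."
```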

Visual Diff Output Specification

For visual diff applications (like overlaying differences on PDF renderings), the library provides positioned text data that can be used to create visual overlays.

Positioned Text Extraction

import { extractPositionedPdfText } from 'pdfdiff';

const positionedText = await extractPositionedPdfText('./document.pdf');
console.log(positionedText);

PositionedTextData Structure

[
  {
    page: 1,                    // Page number (1-based)
    items: [                    // Array of positioned text items
      {
        text: "Hello World",    // Text content
        x: 72,                  // X coordinate (points)
        y: 720,                 // Y coordinate (points, top-down)
        width: 85.2,            // Text width (points)
        height: 12,             // Text height (points)
        transform: [12, 0, 0, 12, 72, 720], // Full transformation matrix
        fontName: "Arial-Bold", // Font name (if available)
        page: 1                 // Page reference
      }
    ],
    viewport: {
      width: 612,               // Page width (points)
      height: 792               // Page height (points)
    }
  }
]

Visual Diff Overlay Usage

The positioned text data can be combined with diff results to create visual overlays:

import { extractPositionedPdfText, comparePdfs } from 'pdfdiff';

// Extract positioned text from both PDFs
const positions1 = await extractPositionedPdfText('./file1.pdf');
const positions2 = await extractPositionedPdfText('./file2.pdf');

// Get text-based diff
const diffResult = await comparePdfs('./file1.pdf', './file2.pdf');

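// Create visual overlay data by mapping diff changes to text positions.
// `side` is 'old' for the first PDF and 'new' for the second. Removed text exists
// only in the old document and added text only in the new one, so a segment that
// is absent from the given side must not advance that side's text offset.
// findTextItemsInRange and calculateBounds are placeholders for helpers you
// supply; the documented API does not include them.
function createVisualDiff(positions, diffChanges, side) {
  const overlays = [];
  let textOffset = 0;

  for (const change of diffChanges) {
    // Skip segments that do not exist in this document
    const absent = side === 'old' ? change.added : change.removed;
    if (absent) continue;

    if (change.added || change.removed) {
      // Find corresponding positioned text items
      const matchingItems = findTextItemsInRange(positions, textOffset, change.value.length);

      overlays.push({
        type: change.added ? 'addition' : 'removal',
        items: matchingItems,
        bounds: calculateBounds(matchingItems)
      });
    }
    textOffset += change.value.length;
  }

  return overlays;
}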
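
The calculateBounds helper used above is not shown in this README. A minimal sketch of one possible version follows. It assumes each item's x and y give the top-left corner of the item in a top-down coordinate system, which is an assumption about the positioned text data rather than a documented guarantee.

```javascript
// Hypothetical helper: smallest rectangle enclosing every item.
// Assumes each item's x/y is its top-left corner in a top-down system.
function calculateBounds(items) {
  if (items.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const item of items) {
    minX = Math.min(minX, item.x);
    minY = Math.min(minY, item.y);
    maxX = Math.max(maxX, item.x + item.width);
    maxY = Math.max(maxY, item.y + item.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

const bounds = calculateBounds([
  { x: 72, y: 100, width: 30, height: 12 },
  { x: 102, y: 100, width: 40, height: 12 },
]);
console.log(bounds); // { x: 72, y: 100, width: 70, height: 12 }
```

A change that wraps across several lines would produce one large rectangle with this approach. Grouping items by line before computing bounds is one way to get tighter highlights.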

Coordinate System

  • Origin: Top-left corner of the page
  • Units: Points (1/72 inch)
  • Y-axis: Top-down (0 at top, increases downward)
  • Standard page: 612x792 points (8.5" x 11" at 72 DPI)
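
Since coordinates are expressed in points, an overlay usually has to be scaled to the pixel size at which the page is rendered, and some renderers expect the PDF-native bottom-up Y axis instead of the top-down one described above. The helpers below sketch both conversions; pointsToPixels and toBottomUp are hypothetical names, and the default of 96 DPI is an assumption about the target display.

```javascript
const POINTS_PER_INCH = 72;

// Convert a length in points to pixels at a given resolution.
function pointsToPixels(points, dpi = 96) {
  return (points * dpi) / POINTS_PER_INCH;
}

// Convert a top-down rectangle (y measured from the top edge of the page)
// to bottom-up coordinates (y measured from the bottom edge).
// The returned y is the rectangle's lower edge in the bottom-up system.
function toBottomUp(rect, pageHeight) {
  return { ...rect, y: pageHeight - rect.y - rect.height };
}

console.log(pointsToPixels(612)); // 816
console.log(pointsToPixels(792)); // 1056
console.log(toBottomUp({ x: 72, y: 60, width: 85.2, height: 12 }, 792).y); // 720
```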

Visual Overlay Applications

The positioned text data enables:

  1. SVG overlays: Create <rect> elements highlighting differences
  2. Canvas rendering: Draw colored rectangles over changed text areas
  3. HTML positioning: Absolutely position diff markers over PDF viewers
  4. Annotation layers: Add visual indicators for additions/removals

Example SVG Overlay

function createSVGOverlay(visualDiff) {
  const svg = document.createElementNS('http://www.w3.org/2000/svg', 'svg');
  
  visualDiff.forEach(overlay => {
    const rect = document.createElementNS('http://www.w3.org/2000/svg', 'rect');
    rect.setAttribute('x', overlay.bounds.x);
    rect.setAttribute('y', overlay.bounds.y);
    rect.setAttribute('width', overlay.bounds.width);
    rect.setAttribute('height', overlay.bounds.height);
    rect.setAttribute('fill', overlay.type === 'addition' ? 'rgba(0,255,0,0.3)' : 'rgba(255,0,0,0.3)');
    rect.setAttribute('stroke', overlay.type === 'addition' ? '#00aa00' : '#aa0000');
    svg.appendChild(rect);
  });
  
  return svg;
}
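
The function above needs a DOM. For server-side rendering or tests in Node.js, the same overlay can be produced as an SVG string. This is a sketch under the assumption that each overlay carries the type and bounds fields used above; createSVGOverlayString is a hypothetical name, and the viewBox is taken from the viewport object in the positioned text data.

```javascript
// Build an SVG overlay as a string, for environments without a DOM.
// `viewport` is a { width, height } object in points.
function createSVGOverlayString(visualDiff, viewport) {
  const rects = visualDiff.map((overlay) => {
    const isAddition = overlay.type === 'addition';
    const fill = isAddition ? 'rgba(0,255,0,0.3)' : 'rgba(255,0,0,0.3)';
    const stroke = isAddition ? '#00aa00' : '#aa0000';
    const { x, y, width, height } = overlay.bounds;
    return `<rect x="${x}" y="${y}" width="${width}" height="${height}" fill="${fill}" stroke="${stroke}"/>`;
  });
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${viewport.width} ${viewport.height}">` +
    rects.join('') +
    '</svg>'
  );
}

const svgMarkup = createSVGOverlayString(
  [{ type: 'addition', bounds: { x: 72, y: 100, width: 70, height: 12 } }],
  { width: 612, height: 792 }
);
console.log(svgMarkup);
```

Setting the viewBox to the page size in points lets the overlay scale with the rendered page when the SVG element is stretched to the same dimensions.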

Browser Usage

For browser environments, import the standalone build:

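<script src="/path/to/pdfdiff.standalone.js"></script>
<script type="module">
  // PDFDiff is available globally.
  // A module script is used because top-level await is a syntax error in classic scripts.
  // pdf1Data and pdf2Data are placeholders for PDF bytes you have already loaded
  // (for example as a Uint8Array).
  const result = await PDFDiff.comparePdfs(pdf1Data, pdf2Data, {
    mode: 'word',
    ignoreCase: true
  });
</script>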

Development

Scripts

# Type checking
npm run typecheck

# Run tests (placeholder)
npm test

Requirements

  • Node.js 16+ (ESM support)
  • Modern JavaScript environment

Dependencies

  • pdfjs-dist - PDF parsing and text extraction
  • diff - Text diffing algorithms (character, word, line)

Browser Compatibility

  • Modern browsers supporting ES2020+
  • PDF.js worker support for PDF processing
  • ArrayBuffer and Uint8Array support

License

ISC

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.