pdf-diff-core

v1.0.2

A core library for semantic PDF comparison without UI dependencies | High-precision semantic PDF comparison engine

pdf-diff-core

License: ISC



Part 1: English Documentation

A High-Precision Semantic PDF Comparison Engine (Headless).

pdf-diff-core is a lightweight, pure-logic library for comparing two PDF files. Unlike traditional pixel-based comparison, it extracts the text of each PDF and diffs the content itself, so it reports what changed in the wording rather than which pixels differ.

It separates calculation from rendering, which makes it well suited to React, Vue, Angular, or Node.js applications.

✨ Features

  • Headless & UI Agnostic: Pure logic. You control how to render the PDF and the highlights.
  • Pagination Reflow Support: Detects text that moves across a page boundary (e.g., from the bottom of page 1 to the top of page 2) and treats it as unchanged instead of marking it as a delete/add pair.
  • Semantic Diff: Based on Google's diff-match-patch algorithm.
  • Precise Coordinates: Returns exact (x, y, w, h) bounding boxes for easy highlighting on a Canvas.
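
The README does not document how reflow detection works internally. As a rough illustration only, one plausible approach is to join the text of all pages into a single stream before diffing, while remembering where each page starts. The helper names below (`buildTextStream`, `pageIndexAt`) are hypothetical and are not part of pdf-diff-core.

```javascript
// Hypothetical sketch, NOT the library's implementation:
// join per-page text into one stream and record each page's start offset.
function buildTextStream(pages) {
  const pageStarts = [];
  let text = "";
  for (const pageText of pages) {
    pageStarts.push(text.length);
    text += pageText;
  }
  return { text, pageStarts };
}

// Map a character offset in the stream back to a 0-based page index.
function pageIndexAt(pageStarts, offset) {
  let index = 0;
  for (let i = 0; i < pageStarts.length; i++) {
    if (pageStarts[i] <= offset) index = i;
  }
  return index;
}

// "Terms apply." sits on page 0 in the old file and on page 1 in the new one.
const oldDoc = buildTextStream(["Intro. Terms apply.", " Signature."]);
const newDoc = buildTextStream(["Intro.", " Terms apply. Signature."]);

// The joined streams are identical, so a text diff over them finds no change,
// while the offset of "Terms" still resolves to a different page in each file.
console.log(oldDoc.text === newDoc.text);
console.log(pageIndexAt(oldDoc.pageStarts, 7), pageIndexAt(newDoc.pageStarts, 7));
```

Diffing the joined stream rather than page by page is what lets moved text compare as equal; the recorded offsets are then only needed to place highlights on the right page.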

📦 Installation

npm install pdf-diff-core pdfjs-dist

Note: This library depends on pdfjs-dist for parsing.

🚀 Usage

1. Basic Setup

import { PdfDiff } from "pdf-diff-core";
import * as pdfjsLib from "pdfjs-dist";

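// 1. Configure the PDF.js worker (essential!)
// You can use a local file or a CDN URL.
// Note: pdfjs-dist v4 and later ship the worker as pdf.worker.min.mjs,
// so adjust the file name to match your installed version.
const workerSrc = `https://unpkg.com/pdfjs-dist@${pdfjsLib.version}/build/pdf.worker.min.js`;

// 2. Initialize
const differ = new PdfDiff({ workerSrc });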

// 3. Load files as ArrayBuffer
const oldFileBuffer = await fetch("old.pdf").then((res) => res.arrayBuffer());
const newFileBuffer = await fetch("new.pdf").then((res) => res.arrayBuffer());

// 4. Compare
const results = await differ.compare(oldFileBuffer, newFileBuffer);

console.log(results);
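
The setup above loads files with `fetch`, which suits the browser. In Node.js, file reads return a `Buffer` rather than an `ArrayBuffer`, so a conversion step may be needed before calling `compare`. The helper below is a minimal sketch; `toArrayBuffer` is a hypothetical name, not something the library exports.

```javascript
// Convert a Node.js Buffer into a standalone ArrayBuffer.
// A Buffer can be a view into a larger shared pool, so copy only its own bytes.
function toArrayBuffer(buffer) {
  return buffer.buffer.slice(
    buffer.byteOffset,
    buffer.byteOffset + buffer.byteLength,
  );
}

// Stand-in for file contents; a real script would read the PDF from disk.
const sampleBuffer = Buffer.from("%PDF-1.7");
const sampleArrayBuffer = toArrayBuffer(sampleBuffer);

console.log(sampleArrayBuffer.byteLength);
```

In a real script, the `Buffer` would come from something like `readFile` in `node:fs/promises`, and the converted value would be passed to `differ.compare`.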

2. Output Data Structure

The compare method returns an array of diff blocks:

[
  {
    pageIndex: 0, // Page number (0-based)
    type: "delete", // 'delete' (Red) or 'add' (Green)
    rects: [
      // Array of bounding boxes
      { x: 50.5, y: 100.2, w: 30.0, h: 12.0 },
      { x: 80.5, y: 100.2, w: 10.0, h: 12.0 },
    ],
  },
  // ... more results
];
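
Because the result is a flat array, a viewer that shows one page at a time may find it convenient to index the blocks by page first. The following is a small sketch that assumes only the shape shown above; `groupByPage` is a hypothetical helper, not a library function.

```javascript
// Group diff blocks by their 0-based pageIndex.
function groupByPage(results) {
  const byPage = new Map();
  for (const diff of results) {
    if (!byPage.has(diff.pageIndex)) byPage.set(diff.pageIndex, []);
    byPage.get(diff.pageIndex).push(diff);
  }
  return byPage;
}

// Sample data in the documented shape (values are made up for illustration).
const sampleResults = [
  { pageIndex: 0, type: "delete", rects: [{ x: 50.5, y: 100.2, w: 30.0, h: 12.0 }] },
  { pageIndex: 0, type: "add", rects: [{ x: 80.5, y: 100.2, w: 10.0, h: 12.0 }] },
  { pageIndex: 2, type: "add", rects: [{ x: 40.0, y: 300.0, w: 25.0, h: 12.0 }] },
];

const grouped = groupByPage(sampleResults);
console.log([...grouped.keys()]);
```

Pages without changes simply have no entry in the map, so `grouped.get(pageIndex) ?? []` gives the blocks to draw for any page.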

3. Rendering Highlights (Frontend Example)

The library only returns coordinates. You draw the highlights yourself on a Canvas layered over the rendered PDF page.

⚠️ Important: Coordinates are returned at scale = 1.0 (standard PDF points). If you render your PDF at a different scale, for example scale = 1.5 for better quality, you must multiply every coordinate and dimension (x, y, w, h) by that same scale.

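// Example: drawing on a 2D context
// Assumes `ctx` is the overlay canvas's 2D rendering context and
// `currentPageIndex` is the 0-based index of the page being displayed.
const renderScale = 1.5; // The scale you used to render the PDF page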

results.forEach((diff) => {
  if (diff.pageIndex === currentPageIndex) {
    // Set color: Red for Delete, Green for Add
    ctx.fillStyle =
      diff.type === "delete"
        ? "rgba(255, 65, 65, 0.3)"
        : "rgba(65, 255, 100, 0.3)";

    diff.rects.forEach((rect) => {
      // Multiply coordinates by your render scale
      ctx.fillRect(
        rect.x * renderScale,
        rect.y * renderScale,
        rect.w * renderScale,
        rect.h * renderScale,
      );
    });
  }
});
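
The scaling step can also be pulled out of the drawing loop into a pure function, which makes it easier to test without a canvas. This is a sketch under the same assumptions as the example above; `scaledHighlights` and `HIGHLIGHT_COLORS` are hypothetical names.

```javascript
// Colors taken from the example above: red for delete, green for add.
const HIGHLIGHT_COLORS = {
  delete: "rgba(255, 65, 65, 0.3)",
  add: "rgba(65, 255, 100, 0.3)",
};

// Return ready-to-draw rectangles for one page at the given render scale.
function scaledHighlights(results, pageIndex, scale) {
  return results
    .filter((diff) => diff.pageIndex === pageIndex)
    .flatMap((diff) =>
      diff.rects.map((rect) => ({
        color: HIGHLIGHT_COLORS[diff.type],
        x: rect.x * scale,
        y: rect.y * scale,
        w: rect.w * scale,
        h: rect.h * scale,
      })),
    );
}

const demoHighlights = scaledHighlights(
  [
    { pageIndex: 0, type: "add", rects: [{ x: 10, y: 20, w: 30, h: 4 }] },
    { pageIndex: 1, type: "delete", rects: [{ x: 5, y: 5, w: 5, h: 5 }] },
  ],
  0,
  1.5,
);
console.log(demoHighlights);
```

Each returned item can then be drawn by setting `ctx.fillStyle` to its `color` and calling `ctx.fillRect` with its `x`, `y`, `w`, and `h`.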

🔧 API

new PdfDiff(options)

  • options.workerSrc (string): Path or URL to pdf.worker.min.js. If not provided, you must set pdfjsLib.GlobalWorkerOptions.workerSrc manually in your project.

compare(buf1, buf2)

  • buf1 (ArrayBuffer): The "Old" (Base) PDF file.
  • buf2 (ArrayBuffer): The "New" (Target) PDF file.
  • Returns: Promise<Array<DiffResult>>
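
The README names `DiffResult` but does not define it. Judging only from the sample output in the Output Data Structure section, its shape could be described and checked roughly as follows. The typedefs and the `isDiffResult` helper are assumptions inferred from that sample, not definitions published by the library.

```javascript
/**
 * Shapes inferred from the README's sample output (assumed, not official).
 * @typedef {{ x: number, y: number, w: number, h: number }} Rect
 * @typedef {{ pageIndex: number, type: "delete" | "add", rects: Rect[] }} DiffResult
 */

// Runtime check that a value matches the inferred DiffResult shape.
function isDiffResult(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    Number.isInteger(value.pageIndex) &&
    (value.type === "delete" || value.type === "add") &&
    Array.isArray(value.rects) &&
    value.rects.every(
      (rect) =>
        rect !== null &&
        typeof rect === "object" &&
        ["x", "y", "w", "h"].every((key) => Number.isFinite(rect[key])),
    )
  );
}

const validBlock = {
  pageIndex: 0,
  type: "delete",
  rects: [{ x: 50.5, y: 100.2, w: 30.0, h: 12.0 }],
};
console.log(isDiffResult(validBlock));
```

A check like this can be useful at the boundary between the comparison step and the rendering code, for example when results are serialized and sent between a worker and the UI.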

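Part 2: Chinese Section

A High-Precision Semantic PDF Comparison Engine (Core Library)

pdf-diff-core is a lightweight, pure-logic PDF comparison library. Unlike traditional pixel-based comparison, it extracts the text semantics inside the PDF and compares those.

The library fully separates "computation" from "rendering", so it is well suited for integration into Vue, React, Angular, or Node.js projects.

✨ Features

  • UI Agnostic (Headless): A pure-logic library. You are free to decide how to render the PDF and the highlight boxes.
  • Pagination Reflow Support: Detects text that moves across pages (e.g., a passage that moves from the end of page 1 to the start of page 2). It is not wrongly marked as "delete + add" but is treated as equal content.
  • Semantic Diff: Based on Google's diff-match-patch algorithm.
  • Precise Coordinates: Returns exact (x, y, w, h) coordinates, making it easy to draw highlights on a Canvas.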

📦 Installation

npm install pdf-diff-core pdfjs-dist

Note: This library depends on pdfjs-dist for PDF parsing.

🚀 Usage

1. Basic Setup

import { PdfDiff } from "pdf-diff-core";
import * as pdfjsLib from "pdfjs-dist";

// 1. Configure the PDF.js worker (required!)
// A CDN is recommended, or use the path to a worker file in your local public directory
const workerSrc = `https://unpkg.com/pdfjs-dist@${pdfjsLib.version}/build/pdf.worker.min.js`;

// 2. Initialize
const differ = new PdfDiff({
  workerSrc: workerSrc,
});

// 3. Load files as ArrayBuffer
const oldFileBuffer = await fetch("old.pdf").then((res) => res.arrayBuffer());
const newFileBuffer = await fetch("new.pdf").then((res) => res.arrayBuffer());

// 4. Start the comparison
const results = await differ.compare(oldFileBuffer, newFileBuffer);

console.log(results);

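2. Output Data Structure

The compare method returns an array of diff blocks:

[
  {
    pageIndex: 0, // Page number (0-based)
    type: "delete", // 'delete' (removed / old version, red) or 'add' (added / new version, green)
    rects: [
      // Array of rectangle coordinates
      { x: 50.5, y: 100.2, w: 30.0, h: 12.0 },
      { x: 80.5, y: 100.2, w: 10.0, h: 12.0 },
    ],
  },
  // ... more results
];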

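3. Rendering Highlights (Frontend Example)

The library only provides coordinate data. You need to create your own Canvas overlaying the PDF to draw the highlights.

⚠️ Key point: The returned coordinates are in standard PDF points (scale = 1.0). If you render the PDF enlarged for sharpness (e.g., scale = 1.5), you must multiply the coordinates by that scale when drawing highlights.

// Example: drawing on a Canvas
const renderScale = 1.5; // Assume your PDF Canvas is rendered at a scale of 1.5

results.forEach((diff) => {
  // Only draw the diffs for the current page
  if (diff.pageIndex === currentPageIndex) {
    // Set color: red for delete, green for add
    ctx.fillStyle =
      diff.type === "delete"
        ? "rgba(255, 65, 65, 0.3)"
        : "rgba(65, 255, 100, 0.3)";

    diff.rects.forEach((rect) => {
      // Key step: coordinates * render scale
      ctx.fillRect(
        rect.x * renderScale,
        rect.y * renderScale,
        rect.w * renderScale,
        rect.h * renderScale,
      );
    });
  }
});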

🔧 API Reference

new PdfDiff(options)

  • options.workerSrc (string): Path or URL to pdf.worker.min.js. If omitted, you must set pdfjsLib.GlobalWorkerOptions.workerSrc manually elsewhere in your project.

compare(buf1, buf2)

  • buf1 (ArrayBuffer): The old (base) PDF file data.
  • buf2 (ArrayBuffer): The new (current) PDF file data.
  • Returns: Promise<Array<DiffResult>>