© 2026 – Pkg Stats / Ryan Hefner

pdf-normalize

v1.1.0

Normalize messy PDFs for fast web delivery and reliable document ingestion.

Downloads

195

Readme

pdf-normalize

Normalize messy PDFs for fast web delivery, reliable document ingestion, and AI/RAG pipelines.

What is normalization?

Normalization turns inconsistent or broken PDFs into clean, predictable files. The pipeline:

| Step | What it does | Benefit |
|------|--------------|---------|
| Repair | Fixes corrupt cross-reference tables, malformed objects, and broken trailers | Unreadable files become parseable |
| Linearize | Reorders objects so the first page can render before the whole file downloads (PDF "fast web view") | Faster perceived load in browsers |
| Compress | Re-encodes with Ghostscript (ebook quality) | Smaller file size, standard structure |
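The three steps above map naturally onto the qpdf and Ghostscript command-line tools. The sketch below only builds the command list; the helper name `buildPipeline`, the temp-file names, and every flag choice are assumptions, since this README does not document pdf-normalize's internals. One deliberate deviation: the table lists Linearize before Compress, but Ghostscript's `pdfwrite` device writes a brand-new file that is not linearized by default, so this sketch runs `qpdf --linearize` last to keep the fast-web-view layout.

```javascript
// Hypothetical helper: builds (but does not run) the commands a
// Repair -> Compress -> Linearize pipeline could use. Flags are assumptions,
// not pdf-normalize's documented internals.
function buildPipeline(input, output, tmpDir = ".") {
  const repaired = `${tmpDir}/repaired.pdf`;
  const compressed = `${tmpDir}/compressed.pdf`;
  return [
    // Repair: qpdf attempts to recover damaged cross-reference data while
    // reading, then writes a structurally clean copy.
    { step: "repair", cmd: "qpdf", args: [input, repaired] },
    // Compress: Ghostscript's pdfwrite device with the /ebook preset
    // re-encodes images and streams into a new, smaller file.
    {
      step: "compress",
      cmd: "gs",
      args: [
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        `-sOutputFile=${compressed}`,
        repaired,
      ],
    },
    // Linearize last: Ghostscript's output is not linearized by default,
    // so linearizing earlier would be undone by the compress step.
    { step: "linearize", cmd: "qpdf", args: ["--linearize", compressed, output] },
  ];
}
```

To execute a plan, you could pass each `cmd` and `args` pair to Node's `child_process.execFileSync`. Be aware that qpdf exits with status 3 when it succeeds with warnings (common right after repairing a damaged file), and `execFileSync` throws on any non-zero status, so a real runner would need to treat 3 as success for the repair step. On Windows the Ghostscript console binary is usually named `gswin64c` rather than `gs`.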

The result is a single, compact PDF that behaves consistently across viewers and tools, with fewer silent failures and random parser errors.

Why use it?

For AI and RAG pipelines: LLMs and retrieval systems depend on clean text extraction. Corrupt or non-standard PDFs cause extraction failures, empty chunks, or gibberish. pdf-normalize repairs and standardizes files so your ingestion pipeline sees a consistent format, fewer parse errors, and better-quality chunks.
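One way to spot extraction problems before chunking is to look for pages that yield no text. The sketch below assumes the text was produced by Poppler's `pdftotext`, which ends each page with a form-feed character (`\f`); `pageReport` is a hypothetical helper, not part of pdf-normalize.

```javascript
// Hypothetical helper: splits pdftotext output into pages and flags pages
// that produced no extractable text.
function pageReport(text) {
  // pdftotext ends every page with a form feed, which leaves an empty
  // string after the final separator; drop it.
  const pages = text.split("\f");
  if (pages.length > 1 && pages[pages.length - 1].trim() === "") pages.pop();
  return pages.map((page, i) => {
    const chars = page.trim().length;
    // An empty page is usually a scan without a text layer, or a failed extraction.
    return { page: i + 1, chars, empty: chars === 0 };
  });
}
```

Pages flagged `empty` are usually scanned images without a text layer (candidates for OCR) or pages whose extraction failed, and are worth excluding or rerouting before they become empty chunks.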

For web delivery: Linearized PDFs show the first page faster. Compressed files load quicker and cost less to store and serve.
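To confirm that a file was actually linearized, you can look for the linearization parameter dictionary, which the PDF specification (ISO 32000) requires to sit entirely within the first 1024 bytes of the file. This is a quick heuristic sketch (`looksLinearized` is a hypothetical name) that works on a file's bytes, such as the `pdf` buffer returned by `normalizePDF`:

```javascript
// Hypothetical heuristic, not a full validator: checks whether a
// /Linearized dictionary key appears in the first 1024 bytes.
function looksLinearized(bytes) {
  // latin1 maps each byte to one character, so binary data is safe to scan.
  const head = Buffer.from(bytes.subarray(0, 1024)).toString("latin1");
  return /\/Linearized\s/.test(head);
}
```

For an authoritative answer, `qpdf --check-linearization file.pdf` also validates the hint tables, which this byte scan does not.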

For document workflows: Batch-process scanned docs, emailed attachments, or legacy exports before archiving or OCR, using one tool and one pipeline.
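The CLI handles one file per run, so batch jobs need a small loop. This sketch runs an async normalizer over many files with a bounded number of jobs in flight, since Ghostscript compression is CPU-heavy. `normalizeAll` and its default limit of 2 are assumptions; the normalizer is passed in, so it can be `normalizePDF` from the Library section or any other async function.

```javascript
// Hypothetical batch helper: runs `normalize` over `files` with at most
// `limit` jobs in flight and returns results in input order.
async function normalizeAll(files, normalize, limit = 2) {
  const results = new Array(files.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  // Claiming is synchronous, so two workers never take the same file.
  async function worker() {
    while (next < files.length) {
      const i = next++;
      try {
        results[i] = { file: files[i], ok: true, value: await normalize(files[i]) };
      } catch (error) {
        // Record the failure and continue so one bad PDF does not stop the batch.
        results[i] = { file: files[i], ok: false, error };
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, files.length) }, () => worker()));
  return results;
}
```

For example: `normalizeAll(files, (f) => normalizePDF(f, { outputPath: f.replace(/\.pdf$/i, ".normalized.pdf") }))`. Failures are recorded per file instead of rejecting the whole batch.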

Install

```bash
npm install pdf-normalize
# or run without installing
npx pdf-normalize file.pdf
```

System dependencies

pdf-normalize relies on qpdf, Ghostscript, and Poppler (or MuPDF). On first run, it installs any missing tools through your package manager (Homebrew on macOS, Scoop on Windows, apt or dnf on Linux). This setup happens only once.

If auto-install fails:

  • macOS: brew install qpdf ghostscript poppler
  • Linux (apt): sudo apt-get update && sudo apt-get install -y qpdf ghostscript poppler-utils
  • Linux (dnf): sudo dnf install -y qpdf ghostscript poppler-utils
  • Windows (Scoop): scoop install qpdf ghostscript poppler
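If you want to check up front which tools are present (for example in CI, before relying on auto-install), a launch test is enough: spawning a missing binary fails with `ENOENT`. This is a sketch; `hasTool` is a hypothetical helper, and the binary names (`pdftotext` from Poppler, `gswin64c` for Ghostscript on Windows) are assumptions about how the tools are installed on your system.

```javascript
import { spawnSync } from "node:child_process";

// Hypothetical helper: true if `cmd` can be launched at all. A missing binary
// makes spawnSync report an ENOENT error; a non-zero exit code still means
// the program exists.
function hasTool(cmd, args = ["--version"]) {
  const result = spawnSync(cmd, args, { stdio: "ignore" });
  return !(result.error && result.error.code === "ENOENT");
}

// Assumed binary names: Ghostscript's console build is usually gswin64c on
// Windows, and pdftotext ships with Poppler.
const required = ["qpdf", process.platform === "win32" ? "gswin64c" : "gs", "pdftotext"];
const missing = required.filter((tool) => !hasTool(tool));

console.log(missing.length ? `Missing: ${missing.join(", ")}` : "All tools found");
```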

CLI

```bash
npx pdf-normalize path/to/file.pdf
```

Writes path/to/file.normalized.pdf and prints progress (repaired, linearized, compressed).

Exit codes:
  • 0: success
  • 1: error (file not found, bad path)
  • 2: unrecoverable PDF (a best-effort output is still written)
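When calling the CLI from a script, the documented exit codes tell you whether an output file exists. A sketch under these assumptions: `normalizedPath` reproduces the `file.pdf` to `file.normalized.pdf` naming described above (how the CLI names inputs without a `.pdf` extension is a guess), and `runCli` shells out through `npx`. Both helpers are hypothetical.

```javascript
import { spawnSync } from "node:child_process";

// Exit codes as listed in the README; the helpers below are hypothetical.
const EXIT_CODES = { 0: "success", 1: "error", 2: "unrecoverable" };

// path/to/file.pdf -> path/to/file.normalized.pdf. Names without a .pdf
// extension simply get the suffix appended (an assumption).
function normalizedPath(input) {
  return input.replace(/\.pdf$/i, "") + ".normalized.pdf";
}

function runCli(file) {
  const { status } = spawnSync("npx", ["pdf-normalize", file], { stdio: "inherit" });
  // status is null if the process was killed, which maps to "unknown".
  const outcome = EXIT_CODES[status] ?? "unknown";
  // Code 2 still writes a best-effort file, so both success and
  // unrecoverable results report an output path.
  const hasOutput = outcome === "success" || outcome === "unrecoverable";
  return { outcome, output: hasOutput ? normalizedPath(file) : null };
}
```

On Windows, spawning `npx` may require `shell: true`, because it is installed as a `.cmd` script.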

Library

```javascript
import { normalizePDF } from "pdf-normalize";

const { pdf, metadata } = await normalizePDF("file.pdf");
console.log(metadata);
// { status: "success", pages: 22, size_before: "18.0 MB", size_after: "5.0 MB", linearized: true, text_layer: true }
```
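The size fields in `metadata` are human-readable strings, so logging or thresholding the savings means parsing them back into numbers. This sketch assumes the `"18.0 MB"` format shown above and binary units (1 KB = 1024 bytes); the README does not state the base, but the percentage saved is the same either way as long as both values use the same base. `parseSize` and `savings` are hypothetical helpers.

```javascript
// Assumed binary multipliers; the README does not specify the unit base.
const UNITS = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

// Hypothetical helper: "18.0 MB" -> bytes.
function parseSize(text) {
  const match = /^([\d.]+)\s*(B|KB|MB|GB)$/i.exec(text.trim());
  if (!match) throw new Error(`Unrecognized size: ${text}`);
  return Number(match[1]) * UNITS[match[2].toUpperCase()];
}

// Hypothetical helper: byte counts and percentage saved from normalizePDF metadata.
function savings(metadata) {
  const before = parseSize(metadata.size_before);
  const after = parseSize(metadata.size_after);
  const percentSaved = before === 0 ? 0 : (1 - after / before) * 100;
  return { before, after, percentSaved };
}
```

With the sample metadata above (18.0 MB down to 5.0 MB), `savings(metadata).percentSaved` is about 72.2.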

Write to a file:

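```javascript
import { writeFile } from "node:fs/promises";
import { normalizePDF } from "pdf-normalize";

// Option 1: pass outputPath and let normalizePDF write the file
await normalizePDF("file.pdf", { outputPath: "out/normalized.pdf" });
// Option 2: write the returned buffer yourself
const { pdf } = await normalizePDF("file.pdf");
await writeFile("out/normalized.pdf", pdf);
```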
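Writing to `out/normalized.pdf` fails with `ENOENT` if the `out/` folder does not exist yet. A small wrapper that creates parent folders first avoids that; `writePdf` is a hypothetical name, and it assumes `pdf` is a Buffer or Uint8Array, as the file-writing example above implies.

```javascript
import { mkdir, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

// Hypothetical helper: writes PDF bytes (e.g. the `pdf` returned by
// normalizePDF), creating missing parent folders first. mkdir with
// recursive: true does nothing if the folder already exists.
async function writePdf(outputPath, bytes) {
  await mkdir(dirname(outputPath), { recursive: true });
  await writeFile(outputPath, bytes);
  return outputPath;
}
```

Usage: `await writePdf("out/normalized.pdf", pdf);`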

License

ISC