@noctuatech/pdf-cleaner
v0.2.3
A library for cleaning PDF files by filtering out specified operations. Remove text or other unwanted elements from PDFs.
PDF Cleaner
A module for easily removing text and other content from a PDF.
npm i @noctuatech/pdf-cleaner
API
This package exposes a small set of functions through the cleaner() initializer, which prepares the library and returns the available methods.
All methods operate on a PDF provided as a Uint8Array (or a Node.js Buffer, which is a Uint8Array subclass) and return a Uint8Array containing the modified PDF bytes.
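Because a Node.js Buffer is a Uint8Array subclass, the result of fs.readFile can be passed in as-is, and returned bytes can be viewed as a Buffer when another API needs one. The following is a minimal sketch of that interop using only Node built-ins. The sample bytes and the toBuffer helper are hypothetical and not part of this package.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical stand-in for file contents; fs.readFile() resolves to a Buffer like this one.
const input = Buffer.from("%PDF-1.7\n", "latin1");

// A Buffer is a Uint8Array subclass, so it satisfies any Uint8Array parameter.
const acceptedAsBytes: boolean = input instanceof Uint8Array;

// Hypothetical helper: view a Uint8Array as a Buffer without copying the bytes,
// for APIs that specifically expect a Buffer.
function toBuffer(bytes: Uint8Array): Buffer {
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

// Hypothetical stand-in for bytes returned by one of the cleaning methods.
const output = new Uint8Array([0x25, 0x50, 0x44, 0x46]);
const header: string = toBuffer(output).toString("latin1");
```

In most cases the conversion is unnecessary, since fs.writeFile accepts a Uint8Array directly.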
cleaner()
Initializes the library and returns the PDFDocument class.
import { cleaner } from "@noctuatech/pdf-cleaner";
const PDFCleaner = await cleaner();
PDFDocument.filterOperations
Filters content stream operators according to the provided list and mode (see Mode enum below).
import { cleaner, Mode } from "@noctuatech/pdf-cleaner";
import fs from "node:fs/promises";
const PDFCleaner = await cleaner();
const doc = await PDFCleaner.fromBytes(
  await fs.readFile("./test.pdf")
);
const embeddedImagesRemoved = await doc.filterOperations(
  ["BI", "ID", "EI"],
  Mode.Remove
);
Cleaner.removeText
Removes text drawing operations from the PDF and returns the cleaned PDF bytes.
import { cleaner } from "@noctuatech/pdf-cleaner";
import fs from "node:fs/promises";
const PDFCleaner = await cleaner();
const doc = await PDFCleaner.fromBytes(
  await fs.readFile("./test.pdf")
);
const documentWithNoText = await doc.removeText();
Cleaner.leaveOnlyText
Keeps only text drawing operators and removes other content.
import { cleaner } from "@noctuatech/pdf-cleaner";
import fs from "node:fs/promises";
const PDFCleaner = await cleaner();
const doc = await PDFCleaner.fromBytes(
  await fs.readFile("./test.pdf")
);
const documentWithOnlyText = await doc.leaveOnlyText();
Types / enums
The Mode enum has two values:
enum Mode {
  Keep = 0,
  Remove = 1,
}
Mode.Keep: when used with filterOperations, will keep the listed operators and remove others.
Mode.Remove: will remove the listed operators.
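To make the two modes concrete, here is a minimal sketch of how keep/remove filtering could behave on an already-tokenized content stream. It is an illustration only, not this library's implementation: the Operation type, the filterOps function, and the sample page are hypothetical, and Mode is mirrored locally as a const object so the snippet runs without the package installed. The operator names used (re, f, BT, Tf, Tj, ET) are standard PDF content stream operators.

```typescript
// Local mirror of the Mode values (assumption: same numeric values as the enum above).
const Mode = { Keep: 0, Remove: 1 } as const;
type Mode = (typeof Mode)[keyof typeof Mode];

// Hypothetical representation of one content stream operation.
interface Operation {
  operator: string;
  operands: (string | number)[];
}

// Hypothetical filter: in Keep mode only the listed operators survive,
// in Remove mode only the listed operators are dropped.
function filterOps(ops: Operation[], operators: string[], mode: Mode): Operation[] {
  const listed = new Set(operators);
  return ops.filter((op) =>
    mode === Mode.Keep ? listed.has(op.operator) : !listed.has(op.operator)
  );
}

// A toy page: a filled rectangle followed by one line of text.
const page: Operation[] = [
  { operator: "re", operands: [0, 0, 100, 50] },
  { operator: "f", operands: [] },
  { operator: "BT", operands: [] },
  { operator: "Tf", operands: ["/F1", 12] },
  { operator: "Tj", operands: ["(Hello)"] },
  { operator: "ET", operands: [] },
];

// Remove mode drops only the text-showing operator.
const withoutText = filterOps(page, ["Tj"], Mode.Remove).map((op) => op.operator);

// Keep mode retains only the listed text-related operators.
const onlyText = filterOps(page, ["BT", "Tf", "Tj", "ET"], Mode.Keep).map((op) => op.operator);
```

In this sketch, withoutText keeps the rectangle and the surrounding text-object operators but loses Tj, while onlyText keeps the four text-related operations and drops the rectangle.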
