smart-pdf-compressor
v0.1.1
Smart PDF compression CLI with Ghostscript, automatic analysis, and batch processing for macOS
Smart PDF Compressor is for people who sit on gigabytes of PDF scans and are tired of full disks and attachment size limits.
Instead of running gs by hand on one folder after another, or uploading confidential documents to random websites, you run a single command:
spdf ./documents ./compressed
The tool will:
- analyze each PDF and choose an appropriate compression strategy;
- save tens of percent of disk space on large directory trees (actual savings depend on the content);
- protect your original files: it never modifies, deletes, or overwrites them;
- skip already optimized documents, so you do not waste time recompressing them.
Smart PDF Compressor is a good fit for:
- large archives of scans (legal, medical, accounting, government documents);
- project and design directories that keep growing because of heavy PDFs;
- CI/CD pipelines and automation where artifact and log sizes matter.
Unlike online compressors, Smart PDF Compressor runs locally on your macOS machine, keeps documents on your disk, and scales from dozens to tens of thousands of files in a single run.
Smart PDF Compressor
Smart PDF Compressor is a production-oriented macOS CLI for rule-based PDF compression across large directory trees.
It does not compress PDF files blindly. Each PDF is analyzed first, classified, processed with an appropriate Ghostscript profile, and validated after compression. Original input files are never modified.
WARNING:
This package will NOT work without Ghostscript installed.
System Requirements
Smart PDF Compressor requires Ghostscript installed on the operating system.
The npm package is a high-level processing and orchestration tool.
Actual PDF compression is performed by Ghostscript.
Current support:
- macOS
Planned:
- Linux
- Windows
Runtime requirements:
- macOS
- Node.js 20 or newer
- Ghostscript available through the gs command
Installing Ghostscript
macOS installation
brew install ghostscript
Verify installation
gs --version
Expected result:
10.05.1
Any recent Ghostscript version should work, but the gs command must be available in your terminal through PATH.
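spdf --doctor performs this check for you. If you want the same check inside your own Node.js scripts, a minimal sketch runs gs --version and parses the result. The helper names parseGsVersion and findGhostscript are hypothetical and are not part of the package's API (the package's own checkGhostscript export may behave differently).

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: turn "10.05.1" into [10, 5, 1]; null if the text is not a version.
function parseGsVersion(output: string): number[] | null {
  const match = output.trim().match(/^(\d+)\.(\d+)(?:\.(\d+))?/);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] ?? "0")];
}

// Hypothetical helper: returns the parsed version, or null when gs cannot be run
// (for example ENOENT because the binary is not on PATH).
function findGhostscript(): number[] | null {
  try {
    const output = execFileSync("gs", ["--version"], { encoding: "utf8" });
    return parseGsVersion(output);
  } catch {
    return null;
  }
}

const version = findGhostscript();
console.log(version ? `Ghostscript found (major version ${version[0]})` : "ERROR: Ghostscript not found.");
```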
Installation
npm install -g smart-pdf-compressor
CLI commands installed by the package:
smart-pdf-compressor
spdf
spdf is the short alias.
First Run
- Install Ghostscript
brew install ghostscript
- Verify installation
gs --version
- Install npm package
npm install -g smart-pdf-compressor
- Initialize config
spdf init
- Run diagnostics
spdf --doctor
- First compression
spdf ./documents ./compressed
Usage
spdf <input-folder> <output-folder> [options]
Example:
spdf ./documents ./compressed
The input folder may contain many nested directories. The output folder mirrors the input structure and preserves Unicode filenames, including Cyrillic (for example, Ukrainian) paths.
Example input:
documents/
  scans/passport.pdf
  scans/photo.jpg
  notes/info.txt
Example output with --copy-all:
compressed/
  scans/passport.pdf
  scans/photo.jpg
  notes/info.txt
PDF files always go through the PDF processing pipeline. Non-PDF files are copied only when --copy-all is enabled.
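The mirroring described above comes down to re-rooting each file's path relative to the input folder. A minimal sketch, assuming a plain recursive walk; mirrorPath and listFiles are hypothetical names, not the package's internals. Node's path functions treat Unicode names as ordinary strings, so Cyrillic paths need no special handling.

```typescript
import { readdirSync } from "node:fs";
import * as path from "node:path";

// Map a file under inputRoot to the same relative location under outputRoot.
function mirrorPath(inputRoot: string, outputRoot: string, file: string): string {
  return path.join(outputRoot, path.relative(inputRoot, file));
}

// Recursively collect regular files; symlinks and special entries are skipped.
// Results go into one accumulator so huge trees do not blow up argument lists.
function listFiles(dir: string, out: string[] = []): string[] {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) listFiles(full, out);
    else if (entry.isFile()) out.push(full);
  }
  return out;
}

// Usage (hypothetical): route PDFs to compression, everything else to --copy-all.
// for (const file of listFiles("./documents")) {
//   const dest = mirrorPath("./documents", "./compressed", file);
//   console.log(file.toLowerCase().endsWith(".pdf") ? "pdf " : "copy", dest);
// }
```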
Commands
spdf init
Creates:
smart-pdf.config.json
spdf --doctor
Checks:
- Ghostscript installation
- PATH access
- filesystem permissions
- available disk space
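The disk-space check can be reproduced with Node's fs.statfs, which is available in Node.js 20. A minimal sketch; the helper names are hypothetical, and the 512 MB default mirrors the safety.lowDiskSpaceWarningMB value used in this README's Low Disk Space notification section. Because the output folder may not exist yet, the sketch measures the nearest existing ancestor.

```typescript
import { existsSync } from "node:fs";
import { statfs } from "node:fs/promises";
import * as path from "node:path";

// Walk up until an existing directory is found (the output folder may not exist yet).
function nearestExisting(target: string): string {
  let current = path.resolve(target);
  while (!existsSync(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return current;
}

// Free space available to unprivileged users on that volume, in MB.
async function availableMB(target: string): Promise<number> {
  const stats = await statfs(nearestExisting(target));
  return (stats.bavail * stats.bsize) / (1024 * 1024);
}

function isLowDisk(availableMb: number, thresholdMb = 512): boolean {
  return availableMb < thresholdMb;
}
```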
spdf config show
Prints the active configuration after loading defaults, local config, and CLI overrides.
spdf config reset
Resets smart-pdf.config.json to default values.
Runtime Flags
Runtime flags do not get saved to the config file.
- --dry-run analyzes files without writing output files.
- --verbose enables detailed logs.
- --debug enables debug logs.
- --workers=NUMBER overrides the worker count. The maximum supported value is 4.
- --copy-all copies all non-PDF files using stream processing.
- --on-conflict=MODE controls output conflicts. Supported modes: skip, overwrite, rename.
- --silent disables macOS notifications.
CLI flags override config values for the current run only.
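The flags above can be thought of as a small parser that turns argv into a one-run override object. This is a sketch, not the package's actual parser: the RuntimeFlags shape is hypothetical, and whether the real CLI rejects or clamps --workers values above 4 is an assumption (this sketch rejects them).

```typescript
type OnConflict = "skip" | "overwrite" | "rename";

interface RuntimeFlags {
  dryRun: boolean;
  verbose: boolean;
  debug: boolean;
  copyAll: boolean;
  silent: boolean;
  workers?: number;
  onConflict?: OnConflict;
}

const MAX_WORKERS = 4;
const CONFLICT_MODES: readonly string[] = ["skip", "overwrite", "rename"];

function isConflictMode(value: string): value is OnConflict {
  return CONFLICT_MODES.includes(value);
}

// Parses only the documented runtime flags; positional folders are ignored here.
function parseRuntimeFlags(argv: string[]): RuntimeFlags {
  const flags: RuntimeFlags = { dryRun: false, verbose: false, debug: false, copyAll: false, silent: false };
  for (const arg of argv) {
    if (arg === "--dry-run") flags.dryRun = true;
    else if (arg === "--verbose") flags.verbose = true;
    else if (arg === "--debug") flags.debug = true;
    else if (arg === "--copy-all") flags.copyAll = true;
    else if (arg === "--silent") flags.silent = true;
    else if (arg.startsWith("--workers=")) {
      const n = Number(arg.slice("--workers=".length));
      // Assumption: out-of-range values are rejected rather than clamped.
      if (!Number.isInteger(n) || n < 1 || n > MAX_WORKERS) {
        throw new Error(`--workers must be an integer between 1 and ${MAX_WORKERS}`);
      }
      flags.workers = n;
    } else if (arg.startsWith("--on-conflict=")) {
      const mode = arg.slice("--on-conflict=".length);
      if (!isConflictMode(mode)) throw new Error(`Unsupported --on-conflict mode: ${mode}`);
      flags.onConflict = mode;
    }
  }
  return flags;
}
```

In a CLI entry point this would be called as parseRuntimeFlags(process.argv.slice(2)), and the result layered over the loaded config without ever being written back to smart-pdf.config.json.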
Examples
Basic compression:
spdf ./docs ./out
Copy all files while processing PDFs:
spdf ./docs ./out --copy-all
Dry run:
spdf ./docs ./out --dry-run
Verbose mode:
spdf ./docs ./out --verbose
Debug mode:
spdf ./docs ./out --debug
Limit workers:
spdf ./docs ./out --workers=2
Rename output files when a conflict exists:
spdf ./docs ./out --copy-all --on-conflict=rename
Skip notifications:
spdf ./docs ./out --silent
Run diagnostics:
spdf --doctor
Show active config:
spdf config show
Safe Mode
Smart PDF Compressor is designed to protect source documents.
The input folder is treated as read-only.
The tool never:
- modifies original files
- deletes original files
- overwrites original files
- creates temporary files inside the input folder
Temporary operations happen in:
- memory
- the application temp directory
- the output folder
Safe mode also prevents dangerous path layouts:
- output-folder must not equal input-folder
- output-folder must not be inside input-folder
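Both rules can be checked with path resolution alone. A minimal sketch; validateFolders is a hypothetical name. It compares resolved paths only, so it does not follow symlinks or normalize letter case (macOS volumes are case-insensitive by default), which a production check would also want to handle.

```typescript
import * as path from "node:path";

// Returns the error message for an unsafe layout, or null if the folders are fine.
function validateFolders(input: string, output: string): string | null {
  const inputAbs = path.resolve(input);
  const outputAbs = path.resolve(output);
  if (inputAbs === outputAbs) return "Input and output folders must be different.";
  const rel = path.relative(inputAbs, outputAbs);
  // Output is outside input only if the relative path climbs up ("..") or is absolute
  // (a different drive on Windows). A child folder literally named "..foo" is still inside.
  const outside = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return outside ? null : "Output folder must not be inside input folder.";
}
```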
If violated, the process exits immediately:
ERROR:
Input and output folders must be different.
or:
ERROR:
Output folder must not be inside input folder.
PDF Analysis
Before compression, every PDF is analyzed for:
- image count
- embedded image dimensions
- approximate DPI
- JPEG and PNG-like streams
- embedded font size
- compressed stream ratio
- scan detection
- text percentage
- estimated reducibility
The analyzer uses pdfjs-dist for text-layer inspection and low-level PDF stream scanning for image/font signals.
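The low-level part can be illustrated by counting PDF dictionary keywords in a bounded prefix of the file. This is a rough heuristic sketch, not the package's analyzer: scanPdfSignals and the 8 MB cap are assumptions, and keyword counting misses objects stored inside compressed object streams (PDF 1.5+ /ObjStm).

```typescript
import { Buffer } from "node:buffer";

interface StreamSignals {
  imageCount: number;
  jpegStreams: number;
  flateStreams: number;
  fontFiles: number;
}

// Scans at most maxBytes of raw PDF bytes, keeping memory use bounded on huge files.
function scanPdfSignals(bytes: Buffer, maxBytes = 8 * 1024 * 1024): StreamSignals {
  // latin1 maps every byte to one character, so binary stream data cannot break decoding.
  const text = bytes.subarray(0, maxBytes).toString("latin1");
  const count = (pattern: RegExp): number => (text.match(pattern) ?? []).length;
  return {
    imageCount: count(/\/Subtype\s*\/Image\b/g), // image XObjects
    jpegStreams: count(/\/DCTDecode\b/g), // JPEG-compressed streams
    flateStreams: count(/\/FlateDecode\b/g), // zlib streams (PNG-like images, text, fonts)
    fontFiles: count(/\/FontFile[23]?\b/g), // embedded font programs
  };
}
```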
Compression Logic
Compression modes:
- aggressive for heavy scans
- medium for mixed documents
- light for mostly text PDFs
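Each mode ultimately becomes a Ghostscript pdfwrite invocation. A sketch, assuming the three modes map onto Ghostscript's standard -dPDFSETTINGS presets; that mapping and the helper names are assumptions, since this README does not document the actual profiles.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

type Mode = "aggressive" | "medium" | "light";

// Assumed mapping onto Ghostscript's built-in presets.
const PDF_SETTINGS: Record<Mode, string> = {
  aggressive: "/screen", // ~72 dpi images, smallest output
  medium: "/ebook", // ~150 dpi images
  light: "/printer", // ~300 dpi images, closest to the original
};

function buildGsArgs(mode: Mode, input: string, output: string): string[] {
  return [
    "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.4",
    `-dPDFSETTINGS=${PDF_SETTINGS[mode]}`,
    "-dNOPAUSE",
    "-dBATCH",
    "-dQUIET",
    // Ghostscript reads % in OutputFile as a page-number template, so escape it as %%.
    `-sOutputFile=${output.replace(/%/g, "%%")}`,
    input,
  ];
}

// Arguments go straight to the gs process (no shell), so spaces and Unicode need no quoting.
async function runGhostscript(mode: Mode, input: string, output: string): Promise<void> {
  await promisify(execFile)("gs", buildGsArgs(mode, input, output));
}
```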
Rules:
- Heavy scans use aggressive compression.
- Mostly text PDFs use light optimization.
- Mixed PDFs use medium compression.
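The rules above, together with the two size thresholds described next (compression.minEstimatedReductionPercent and compression.minSavingsPercent in the config), can be sketched as three small decisions. The AnalysisSummary shape and the 80% "mostly text" cut-off are hypothetical; the README does not publish the classifier's actual criteria.

```typescript
type Mode = "aggressive" | "medium" | "light";

interface AnalysisSummary {
  isScan: boolean; // result of scan detection
  textPercent: number; // share of real text content, 0-100
}

interface Thresholds {
  minEstimatedReductionPercent: number; // default 10
  minSavingsPercent: number; // default 5
}

// Hypothetical cut-off: 80% text or more counts as "mostly text".
function chooseMode(a: AnalysisSummary): Mode {
  if (a.isScan) return "aggressive";
  if (a.textPercent >= 80) return "light";
  return "medium";
}

// Before Ghostscript runs: skip files whose estimated reduction is too small.
function shouldAttempt(estimatedReductionPercent: number, t: Thresholds): boolean {
  return estimatedReductionPercent >= t.minEstimatedReductionPercent;
}

// After Ghostscript runs: keep the output only if the real saving is large enough.
function keepCompressed(originalBytes: number, compressedBytes: number, t: Thresholds): boolean {
  if (originalBytes <= 0) return false;
  const savedPercent = ((originalBytes - compressedBytes) / originalBytes) * 100;
  return savedPercent >= t.minSavingsPercent;
}
```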
If estimated reduction is below 10%, the file is skipped:
skip: already optimized
After compression, the output file is validated by size. If the actual reduction is below 5%, the compressed version is removed and the original remains untouched:
compression not effective
Copy-All Mode
--copy-all preserves full directory content, not just PDFs.
Copied file types include:
- images
- videos
- text files
- JSON
- archives
- any other non-PDF file
Important:
PDF files are never copied directly. PDF files always go through:
- analysis
- compression decision
- Ghostscript processing when useful
- post-compression validation
Non-PDF files are copied with stream processing using fs.createReadStream, fs.createWriteStream, and stream.pipeline.
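A minimal sketch of that copy step, including the three conflict modes and the copyAll.preserveTimestamps option. The rename scheme ("photo (1).jpg") and all function names are assumptions. Note that checking for an existing file and then writing it is not atomic; opening the write stream with flags "wx" would make a racing write fail instead of overwriting.

```typescript
import { createReadStream, createWriteStream, existsSync } from "node:fs";
import { mkdir, stat, utimes } from "node:fs/promises";
import * as path from "node:path";
import { pipeline } from "node:stream/promises";

type OnConflict = "skip" | "overwrite" | "rename";

// Hypothetical naming scheme for rename mode: "photo.jpg" -> "photo (1).jpg".
function renameCandidate(target: string, n: number): string {
  const { dir, name, ext } = path.parse(target);
  return path.join(dir, `${name} (${n})${ext}`);
}

// Returns the path to write to, or null when the file should be skipped.
function resolveTarget(
  target: string,
  mode: OnConflict,
  exists: (p: string) => boolean = existsSync,
): string | null {
  if (!exists(target) || mode === "overwrite") return target;
  if (mode === "skip") return null;
  for (let n = 1; ; n++) {
    const candidate = renameCandidate(target, n);
    if (!exists(candidate)) return candidate;
  }
}

async function copyNonPdf(
  source: string,
  target: string,
  mode: OnConflict,
  preserveTimestamps = true,
): Promise<string | null> {
  const dest = resolveTarget(target, mode);
  if (dest === null) return null;
  await mkdir(path.dirname(dest), { recursive: true });
  // Data flows in chunks; pipeline destroys both streams and rejects if either side fails.
  await pipeline(createReadStream(source), createWriteStream(dest));
  if (preserveTimestamps) {
    const { atime, mtime } = await stat(source);
    await utimes(dest, atime, mtime);
  }
  return dest;
}
```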
Conflict modes:
- skip
- overwrite
- rename
Example:
spdf ./docs ./out --copy-all --on-conflict=skip
Config System
Smart PDF Compressor automatically looks for smart-pdf.config.json in the current working directory.
spdf init creates a config file with a $schema field so editors such as VS Code can provide validation and autocomplete.
Loading order:
- built-in defaults
- smart-pdf.config.json
- CLI runtime flags
CLI flags override config values but are not persisted.
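The loading order amounts to a layered merge. A minimal sketch, assuming section-by-section deep merging (the README does not say whether a partial section in the file replaces or merges with the defaults); mergeConfig is a hypothetical name.

```typescript
type ConfigObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merges layers left to right: later layers win, nested sections merge key by key,
// unset (undefined) values do not override, and the inputs are never mutated.
function mergeConfig(...layers: ConfigObject[]): ConfigObject {
  const result: ConfigObject = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      if (value === undefined) continue;
      const current = result[key];
      result[key] = isPlainObject(value)
        ? mergeConfig(isPlainObject(current) ? current : {}, value)
        : value;
    }
  }
  return result;
}

// Usage: const active = mergeConfig(builtInDefaults, fileConfig, cliOverrides);
```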
Example config:
{
"$schema": "https://raw.githubusercontent.com/cobchenyuk/smart-pdf-compressor/main/schema/smart-pdf.config.schema.json",
"compression": {
"minEstimatedReductionPercent": 10,
"minSavingsPercent": 5,
"skipOptimized": true,
"aggressiveScanCompression": true
},
"workers": {
"maxWorkers": 4
},
"logging": {
"saveLogs": true,
"logLevel": "info"
},
"reports": {
"generateJsonReport": true
},
"copyAll": {
"enabled": false,
"onConflict": "skip",
"preserveTimestamps": true
},
"notifications": {
"enabled": true,
"success": true,
"errors": true,
"warnings": true
},
"performance": {
"largeFileThresholdMB": 100,
"maxConcurrentLargeFiles": 1
},
"safety": {
"safeMode": true
}
}
Config validation checks value types. Example error:
ERROR:
workers.maxWorkers must be a number
TypeScript Support
Smart PDF Compressor ships TypeScript declaration files for its public programmatic API.
Package metadata exposes:
{
"types": "./types/index.d.ts"
}
Example (top-level await requires an ES module context):
import {
analyzePdf,
checkGhostscript,
loadConfig
} from "smart-pdf-compressor";
const gs = await checkGhostscript();
const { config } = await loadConfig({ cwd: process.cwd() });
const analysis = await analyzePdf("./document.pdf", {
config,
logger: console
});
Config types are also available:
import type { SmartPdfConfig } from "smart-pdf-compressor";
The JSON Schema is published with the package and exported as:
import schema from "smart-pdf-compressor/config-schema";
For editor validation, use:
{
"$schema": "https://raw.githubusercontent.com/cobchenyuk/smart-pdf-compressor/main/schema/smart-pdf.config.schema.json"
}
Logging
Logger levels:
- info
- warn
- error
- debug
Logs are written in real time to the console and persisted under:
<output-folder>/.smart-pdf-compressor/logs/
The log stream is opened early so that as much of the log as possible survives a failure during long-running processing.
Realtime Dashboard
The dashboard shows:
- total files
- processed PDF files
- compressed PDF files
- skipped PDF files
- copied files
- failed files
- current file
- elapsed time
- ETA
- processing speed
- saved bytes
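ETA and processing speed can be derived from a few counters. A sketch assuming a simple files-completed average; the Progress shape is hypothetical, and the package may compute these differently.

```typescript
interface Progress {
  processedFiles: number;
  totalFiles: number;
  elapsedMs: number;
}

// Files per second so far; 0 before any time has passed.
function filesPerSecond(p: Progress): number {
  return p.elapsedMs > 0 ? p.processedFiles / (p.elapsedMs / 1000) : 0;
}

// Remaining time in ms, extrapolated from the average time per finished file.
function etaMs(p: Progress): number | null {
  if (p.processedFiles === 0 || p.elapsedMs <= 0) return null;
  const msPerFile = p.elapsedMs / p.processedFiles;
  return Math.round((p.totalFiles - p.processedFiles) * msPerFile);
}
```

A per-file average is easily skewed by a few multi-gigabyte scans; weighting the rate by bytes processed gives steadier estimates for mixed collections.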
Reports
After completion, a summary is printed:
- total files
- compressed files
- skipped files
- copied files
- failed files
- original total size
- final total size
- total saved space
- average compression ratio
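These figures can be computed from per-file results. A sketch with a hypothetical FileResult shape; in particular, "average compression ratio" is assumed here to mean the mean of final size divided by original size over compressed files, which the README does not define.

```typescript
type Status = "compressed" | "skipped" | "copied" | "failed";

interface FileResult {
  status: Status;
  originalBytes: number;
  finalBytes: number; // assumed equal to originalBytes when nothing smaller was kept
}

function summarize(results: FileResult[]) {
  const count = (s: Status) => results.filter((r) => r.status === s).length;
  const originalTotal = results.reduce((sum, r) => sum + r.originalBytes, 0);
  const finalTotal = results.reduce((sum, r) => sum + r.finalBytes, 0);
  const compressed = results.filter((r) => r.status === "compressed" && r.originalBytes > 0);
  // Assumed definition: mean of finalBytes / originalBytes over compressed files.
  const averageRatio =
    compressed.length === 0
      ? null
      : compressed.reduce((sum, r) => sum + r.finalBytes / r.originalBytes, 0) / compressed.length;
  return {
    totalFiles: results.length,
    compressedFiles: count("compressed"),
    skippedFiles: count("skipped"),
    copiedFiles: count("copied"),
    failedFiles: count("failed"),
    originalTotal,
    finalTotal,
    savedBytes: originalTotal - finalTotal,
    averageRatio,
  };
}
```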
JSON report path:
<output-folder>/.smart-pdf-compressor/report.json
Example:
{
"processedPdf": 120,
"compressedPdf": 87,
"skippedPdf": 30,
"copiedFiles": 42,
"failed": 7,
"savedBytes": 182736182
}
Notifications
Smart PDF Compressor supports native macOS desktop notifications for long-running processing tasks.
Notifications help track:
- completion status
- errors
- warnings
- large file events
without constantly watching the terminal.
Notifications currently supported:
- macOS
The current implementation uses native macOS notifications through osascript. The notification layer is isolated behind a small internal interface so future Linux and Windows support can be added with matching behavior.
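A macOS notification can be raised by passing an AppleScript display notification command to osascript. A minimal sketch of such a layer; the function names are hypothetical, and the escaping covers only backslashes and double quotes.

```typescript
import { execFile } from "node:child_process";

// Escape backslashes and double quotes for an AppleScript string literal.
function appleScriptString(text: string): string {
  return `"${text.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildNotificationScript(title: string, message: string): string {
  return `display notification ${appleScriptString(message)} with title ${appleScriptString(title)}`;
}

// Fire and forget: a failed notification must never interrupt processing.
function notify(title: string, message: string): void {
  if (process.platform !== "darwin") return;
  execFile("osascript", ["-e", buildNotificationScript(title, message)], () => {
    /* errors ignored on purpose */
  });
}
```

Passing the script as an argument array (no shell) keeps filenames with quotes or spaces from breaking the command.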
Notification Events
Processing Completed
Sent when processing finishes successfully.
Example:
Title:
Compression complete
Message:
37 files processed
541 MB saved
23 seconds
Processing Failed
Sent when a critical error occurs.
Example:
Title:
Processing failed
Message:
Check logs for details.
Large File Detected
Sent when a file larger than the configured large-file threshold is detected.
Example:
Title:
Large file detected
Message:
medical_scan.pdf
2.4 GB
The threshold is controlled by:
{
"performance": {
"largeFileThresholdMB": 100
}
}
Low Disk Space
Sent when available disk space near the output folder drops below the configured warning threshold.
Example:
Title:
Low disk space
Message:
Only 420 MB available near output folder.
The threshold is controlled by:
{
"safety": {
"lowDiskSpaceWarningMB": 512
}
}
Recovery Warning
Sent when the previous session appears to have ended before clean shutdown.
Example:
Title:
Recovery warning
Message:
Previous session did not finish cleanly. Check logs.
Smart PDF Compressor writes a small session marker under:
<output-folder>/.smart-pdf-compressor/session.lock
The marker is removed after a clean completion.
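The marker logic is small. A sketch, assuming the lock holds only the process ID and start time; startSession is a hypothetical name. Because the marker is removed only on clean completion, a run that finds an existing marker can raise the recovery warning. This sketch does not distinguish a crashed run from one that is still running concurrently.

```typescript
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs";
import * as path from "node:path";

interface Session {
  recovered: boolean; // true if a previous run left its marker behind
  finish: () => void; // call only after a clean completion
}

function startSession(outputFolder: string): Session {
  const dir = path.join(outputFolder, ".smart-pdf-compressor");
  const lockFile = path.join(dir, "session.lock");
  mkdirSync(dir, { recursive: true });
  const recovered = existsSync(lockFile);
  // Record who started this run; overwrites a stale marker from a crashed session.
  writeFileSync(lockFile, JSON.stringify({ pid: process.pid, startedAt: new Date().toISOString() }));
  return {
    recovered,
    finish: () => rmSync(lockFile, { force: true }),
  };
}
```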
Notification Config
{
"notifications": {
"enabled": true,
"success": true,
"errors": true,
"warnings": true
}
}
Config options:
- enabled enables notifications globally.
- success enables notifications for successful completion.
- errors enables notifications for failures.
- warnings enables notifications for warnings.
Runtime Flag
Use --silent to temporarily disable notifications for the current run.
Example:
spdf ./docs ./out --silent
--silent does not modify smart-pdf.config.json.
Anti-Spam Protection
The notification system is designed to prevent spam.
Repeated notifications are throttled by event type. If many files fail, Smart PDF Compressor uses an aggregated notification instead of sending one notification per file.
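Per-type throttling with an aggregate count can be sketched as follows. The time-window approach, the event type names, and the class itself are assumptions; the README states only that notifications are throttled by event type and aggregated. A caller would use drain("error") to build a single "N files failed." message.

```typescript
type EventType = "success" | "error" | "warning" | "largeFile" | "lowDisk";

class NotificationThrottle {
  private readonly windowMs: number;
  private readonly now: () => number;
  private readonly lastSent = new Map<EventType, number>();
  private readonly suppressed = new Map<EventType, number>();

  constructor(windowMs: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // True if a notification of this type may be shown now; otherwise it is counted.
  allow(type: EventType): boolean {
    const t = this.now();
    const last = this.lastSent.get(type);
    if (last !== undefined && t - last < this.windowMs) {
      this.suppressed.set(type, (this.suppressed.get(type) ?? 0) + 1);
      return false;
    }
    this.lastSent.set(type, t);
    return true;
  }

  // Number of held-back events of this type; resets the counter.
  drain(type: EventType): number {
    const n = this.suppressed.get(type) ?? 0;
    this.suppressed.delete(type);
    return n;
  }
}
```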
Example:
Title:
Multiple processing errors
Message:
12 files failed.
Background Usage
Notifications are especially useful for:
- huge directory processing
- long-running compression tasks
- background terminal sessions
The design goal is to improve the UX of long-running CLI operations without interrupting the workflow.
Large File Handling
Smart PDF Compressor is designed for large file collections.
It uses controlled worker concurrency and a separate limit for large files:
{
"performance": {
"largeFileThresholdMB": 100,
"maxConcurrentLargeFiles": 1
}
}
For non-PDF files, copying is stream-based and avoids loading large files into RAM.
For PDF analysis, the tool limits low-level structure reads to avoid memory spikes. Ghostscript performs the actual compression in a separate process.
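The two limits, a worker pool and a tighter gate for large files, can be combined with a small promise-based semaphore. A sketch under stated assumptions: Semaphore and processAll are hypothetical names, each task is an async function (in practice it would await a Ghostscript child process), and a real pipeline would catch errors per file so one failure does not reject the whole batch.

```typescript
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot straight to the next waiter
    else this.available++;
  }
}

interface Limits {
  maxWorkers: number;
  maxConcurrentLargeFiles: number;
  largeFileThresholdMB: number;
}

async function processAll<T extends { sizeBytes: number }>(
  items: T[],
  limits: Limits,
  task: (item: T) => Promise<void>,
): Promise<void> {
  const workers = new Semaphore(limits.maxWorkers);
  const large = new Semaphore(limits.maxConcurrentLargeFiles);
  const thresholdBytes = limits.largeFileThresholdMB * 1024 * 1024;
  await Promise.all(
    items.map(async (item) => {
      const isLarge = item.sizeBytes >= thresholdBytes;
      // Take the large-file slot first so a waiting large file does not hold a worker slot.
      if (isLarge) await large.acquire();
      await workers.acquire();
      try {
        await task(item);
      } finally {
        workers.release();
        if (isLarge) large.release();
      }
    }),
  );
}
```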
Architecture
The npm package orchestrates processing.
Ghostscript performs compression.
Node.js handles:
- analysis
- orchestration
- logging
- monitoring
- filesystem processing
- worker management
- config loading and validation
- reporting
- macOS notifications
Project structure:
src/
  analyzer/
  compressor/
  workers/
  streams/
  logger/
  reports/
  notifications/
  config/
  cli/
  utils/
Design Philosophy
Smart PDF Compressor is built around:
- safe filesystem operations
- protection of original files
- rule-based compression decisions
- stream-based processing
- large file support
- production reliability
- predictable logs and reports
- conservative defaults
The tool favors preserving data safety over forcing compression. If a compressed PDF is not meaningfully smaller, it is discarded.
Authorship
Smart PDF Compressor is authored and maintained by cobchenyuk.
Author profile: Andrey Sobchenyuk
Troubleshooting
Ghostscript not found
Error:
ERROR:
Ghostscript not found.
Solution:
brew install ghostscript
Then verify:
gs --version
gs command not available
Problem:
Ghostscript is installed but the gs command is not found.
Recommendations:
- restart the terminal
- check PATH
- run:
which gs
If which gs prints nothing, the Ghostscript binary is not on your shell's PATH.
Output folder rejected
The output folder must not be the same as the input folder and must not be nested inside it.
Use a separate sibling folder:
spdf ./documents ./compressed
Config validation error
Example:
ERROR:
workers.maxWorkers must be a number
Open smart-pdf.config.json and make sure the value type is correct:
{
"workers": {
"maxWorkers": 4
}
}
Disclaimer
This tool performs filesystem operations on large file collections.
Always keep backups of important documents.
