@pspdfkit/pdf-to-markdown

v0.2.2

Standalone CLI wrapper for Nutrient's PDF-to-Markdown extractor

Downloads

693

Nutrient PDF to Markdown

License: Proprietary

Stop wasting your context window on PDF extraction.

Fast, accurate Markdown from PDFs — locally, with no cleanup required. Built for Claude, Codex, RAG pipelines, and document-heavy automation where noisy extraction burns tokens and makes downstream results less reliable.

  • How fast is it? — 0.011s per page. 48x faster than docling, 29x faster than pymupdf4llm. (benchmarks)
  • How accurate is it? — 0.93 reading order (best in class), 0.89 overall extraction accuracy, 0.82 heading detection. (benchmarks)
  • NEW: Image export: --enable-image-export extracts images alongside Markdown for vision-capable LLMs. (usage)
  • Where do my PDFs go? — Nowhere. The CLI runs locally. Your documents are not uploaded to Nutrient. (trust & licensing)
  • What does it cost? — Free for up to 1,000 documents per calendar month. No license key, no signup, no API token. (license)

Install

Agent skill (recommended)

If you use Claude Code, Codex, Pi, Cursor, or Gemini CLI, install the Nutrient Skills plugin — the extraction runs automatically when your agent needs to read a PDF:

npx skills add pspdfkit-labs/nutrient-skills --skill pdf-to-markdown

Or with marketplace/plugin flows (Claude Code, Codex):

/plugin marketplace add pspdfkit-labs/nutrient-skills
/plugin install pdf-to-markdown@nutrient-skills

With Pi:

pi install git:github.com/PSPDFKit-labs/nutrient-skills

Once installed, just reference a PDF in your prompt — no extra commands needed:

"Extract the pricing table from proposal.pdf"

The skill invokes the CLI transparently and passes the resulting Markdown into your agent context.

Standalone CLI

For use outside an agent, install the published npm package:

npm install -g @pspdfkit/pdf-to-markdown

Or run it without a global install:

npx @pspdfkit/pdf-to-markdown --help

The package supports Node 18+ on macOS Apple Silicon, Linux x86_64, and Linux arm64.

If you prefer a shell installer, use the curl script instead:

curl -fsSL https://raw.githubusercontent.com/PSPDFKit/pdf-to-markdown/main/install.sh | sh

This installs pdf-to-markdown into ~/.local/bin by default.

You can also install from a clone:

git clone https://github.com/PSPDFKit/pdf-to-markdown.git
cd pdf-to-markdown
./install.sh            # or: npm install -g .

Quick Check

After install, verify the CLI is available:

pdf-to-markdown --help
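The same check can be scripted, which helps when the shell installer has placed the binary in ~/.local/bin and that directory is not on PATH. This is a minimal Python sketch: the helper names find_cli and dir_on_path are illustrative and not part of the package.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_cli(name: str = "pdf-to-markdown") -> Optional[str]:
    """Return the full path of the CLI if it is found on PATH, else None."""
    return shutil.which(name)


def dir_on_path(directory: Path, path_env: str) -> bool:
    """Check whether directory is one of the entries in a PATH-style string."""
    entries = [Path(entry) for entry in path_env.split(os.pathsep) if entry]
    return directory in entries


if __name__ == "__main__":
    local_bin = Path.home() / ".local" / "bin"
    print("CLI found at:", find_cli())
    print("~/.local/bin on PATH:", dir_on_path(local_bin, os.environ.get("PATH", "")))
```

If find_cli() returns None while the binary exists in ~/.local/bin, adding that directory to PATH is the likely fix.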

Usage

Single PDF

pdf-to-markdown input.pdf output.md

If output.md is omitted, Markdown is written to stdout.
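Because the Markdown goes to stdout when no output path is given, a script can capture it directly. The Python sketch below assumes pdf-to-markdown is on PATH and that the CLI exits with a non-zero status on failure (not confirmed by this README). The helpers build_argv and extract_markdown are hypothetical names.

```python
import subprocess
from typing import List, Optional


def build_argv(input_pdf: str, output_md: Optional[str] = None) -> List[str]:
    """Build the documented command line: pdf-to-markdown INPUT [OUTPUT]."""
    argv = ["pdf-to-markdown", input_pdf]
    if output_md is not None:
        argv.append(output_md)
    return argv


def extract_markdown(input_pdf: str) -> str:
    """Run the CLI without an output path and return what it prints to stdout."""
    result = subprocess.run(
        build_argv(input_pdf),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the exit status is non-zero
    )
    return result.stdout
```

A call such as extract_markdown("input.pdf") would then return the Markdown as a string, ready to pass into a prompt or an indexing step.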

Batch directory

pdf-to-markdown ./input-pdfs ./output-markdown

When both arguments are directories, the CLI converts every PDF in the input directory and writes matching Markdown files into the output directory.
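After a batch run it can be useful to confirm that every PDF produced an output file. The sketch below rests on two assumptions the README does not spell out: that "matching" means the same base name with a .md extension, and that only top-level files ending in .pdf are converted. Both helper names are illustrative.

```python
from pathlib import Path
from typing import List


def expected_output(pdf: Path, output_dir: Path) -> Path:
    """Assumed naming rule: same base name as the PDF, with a .md extension."""
    return output_dir / (pdf.stem + ".md")


def missing_outputs(input_dir: Path, output_dir: Path) -> List[Path]:
    """List the expected Markdown files that are absent after a batch run."""
    pdfs = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    expected = [expected_output(p, output_dir) for p in pdfs]
    return [md for md in expected if not md.exists()]
```

Calling missing_outputs(Path("./input-pdfs"), Path("./output-markdown")) after the conversion returns an empty list when, under these assumptions, nothing was skipped.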

Image export

pdf-to-markdown --enable-image-export input.pdf output.md

Extracts images from the PDF and saves them to output_resources/, referenced as standard Markdown image links in the output. Useful when feeding results to vision-capable LLMs or when image context improves downstream accuracy. Off by default because it increases processing time for image-heavy documents.
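To hand the exported images to a vision-capable model, a script first needs the image paths referenced in the Markdown. The sketch below pulls the targets out of standard inline image links. The regular expression covers the common `![alt](target "title")` form only, and the file names in the sample are made up, since the README does not document how files inside the resources directory are named.

```python
import re
from typing import List

# Inline Markdown image: ![alt text](target "optional title")
IMAGE_LINK = re.compile(r'!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')


def image_targets(markdown: str) -> List[str]:
    """Return the link targets of all inline Markdown images, in document order."""
    return IMAGE_LINK.findall(markdown)


if __name__ == "__main__":
    # Hypothetical output; real file names may differ.
    sample = "Intro\n\n![Figure 1](output_resources/image_1.png)\n"
    for target in image_targets(sample):
        print(target)
```

Ordinary links without the leading exclamation mark are ignored, so only image references are returned.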

Platform Support

  • macOS Apple Silicon (Darwin/arm64)
  • Linux x86_64
  • Linux arm64
  • Windows x64 (coming soon)
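A wrapper script can check the host against this list before it tries to run the binary. The sketch below is not how the package itself detects platforms. It uses Python's platform module and accepts "aarch64" as well as "arm64" for Linux, because that is what Linux arm64 hosts usually report.

```python
import platform
from typing import Optional

# (system, machine) pairs as reported by platform.system() / platform.machine().
SUPPORTED = {
    ("Darwin", "arm64"),
    ("Linux", "x86_64"),
    ("Linux", "arm64"),
    ("Linux", "aarch64"),  # usual spelling of arm64 on Linux hosts
}


def is_supported(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Check a (system, machine) pair, defaulting to the current host."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return (system, machine) in SUPPORTED
```

Windows x64 is deliberately left out of the set because the list marks it as coming soon.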

Benchmarks

Results come from 200 PDF documents with hand-annotated Markdown ground truth, evaluated with the NID (reading order), TEDS (table structure), and MHS (heading hierarchy) metrics. All competitor libraries were pinned to their latest versions as of 2026-04-23.

Visual Snapshot

[Charts: extraction accuracy, reading order, table structure, heading level, and extraction speed ("Faster with Nutrient")]

Accuracy

| Solution | Version | Overall | Reading Order (NID) | Table Structure (TEDS) | Heading Level (MHS) |
| --- | --- | ---: | ---: | ---: | ---: |
| Nutrient | 1.0.1 | 0.89 | 0.93 | 0.71 | 0.82 |
| docling | 2.91.0 | 0.88 | 0.90 | 0.89 | 0.82 |
| opendataloader-hybrid | 2.3.0 | 0.87 | 0.91 | 0.68 | 0.81 |
| pymupdf4llm | 1.27.2 | 0.83 | 0.89 | 0.54 | 0.77 |
| opendataloader | 2.3.0 | 0.83 | 0.90 | 0.48 | 0.74 |
| markitdown | 0.1.5 | 0.59 | 0.84 | 0.27 | 0.00 |
| pypdf | 6.10.2 | 0.58 | 0.87 | 0.00 | 0.00 |
| liteparse | 1.2.1 | 0.57 | 0.86 | 0.00 | 0.00 |
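To see which tool leads each column, the accuracy table can be loaded into a small script. The sketch below copies the scores as published. The metric keys and the leaders helper are illustrative names, and ties are reported together.

```python
from typing import List

METRICS = ("overall", "reading_order", "table_structure", "heading_level")

# Scores copied from the accuracy table, in METRICS order.
SCORES = {
    "Nutrient": (0.89, 0.93, 0.71, 0.82),
    "docling": (0.88, 0.90, 0.89, 0.82),
    "opendataloader-hybrid": (0.87, 0.91, 0.68, 0.81),
    "pymupdf4llm": (0.83, 0.89, 0.54, 0.77),
    "opendataloader": (0.83, 0.90, 0.48, 0.74),
    "markitdown": (0.59, 0.84, 0.27, 0.00),
    "pypdf": (0.58, 0.87, 0.00, 0.00),
    "liteparse": (0.57, 0.86, 0.00, 0.00),
}


def leaders(metric: str) -> List[str]:
    """Return every solution that holds the top score for one metric."""
    column = METRICS.index(metric)
    best = max(scores[column] for scores in SCORES.values())
    return sorted(name for name, scores in SCORES.items() if scores[column] == best)
```

On these numbers Nutrient leads overall and reading order, docling leads table structure, and the two tie on heading level.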

Speed

| Solution | Seconds per page |
| --- | ---: |
| Nutrient | 0.011 |
| pypdf | 0.019 |
| opendataloader | 0.023 |
| markitdown | 0.097 |
| pymupdf4llm | 0.319 |
| opendataloader-hybrid | 0.444 |
| docling | 0.527 |
| liteparse | 1.081 |

Faster with Nutrient

  • 98x faster than liteparse
  • 48x faster than docling
  • 40x faster than opendataloader-hybrid
  • 29x faster than pymupdf4llm
  • 9x faster than markitdown
  • 2x faster than opendataloader
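These multipliers follow from dividing each tool's seconds per page by Nutrient's 0.011. The sketch below reproduces the arithmetic from the published speed figures. The speedup helper is an illustrative name.

```python
# Seconds per page, copied from the Speed table.
SECONDS_PER_PAGE = {
    "Nutrient": 0.011,
    "pypdf": 0.019,
    "opendataloader": 0.023,
    "markitdown": 0.097,
    "pymupdf4llm": 0.319,
    "opendataloader-hybrid": 0.444,
    "docling": 0.527,
    "liteparse": 1.081,
}


def speedup(other: str, baseline: str = "Nutrient") -> float:
    """How many times longer `other` takes per page than `baseline`."""
    return SECONDS_PER_PAGE[other] / SECONDS_PER_PAGE[baseline]


if __name__ == "__main__":
    for name in sorted(SECONDS_PER_PAGE, key=speedup, reverse=True):
        if name != "Nutrient":
            print(f"{name}: {speedup(name):.1f}x")
```

Rounded to whole numbers, the ratios match the list. pypdf, which the list omits, works out to roughly 1.7x.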

For the full comparison table, see docs/benchmarks.md.

Trust and Licensing

  • Free for up to 1,000 documents per calendar month
  • PDFs stay local — your documents are not uploaded to Nutrient by this extractor
  • A commercial license is required for processing more than 1,000 documents per month
  • The extraction engine is delivered as a signed platform binary; the repo contains only the wrapper and documentation
  • The license is non-transferable — you may not redistribute the binary standalone or sublicense it to third parties; embedding it in your own application is permitted under the free tier terms

See LICENSE.md for the full terms and docs/distribution-model.md for details on what ships in this repo vs. the binary.

FAQ

What makes this different from other PDF extractors?

Speed and accuracy should not be a tradeoff. Most extractors are either fast but lose structure (markitdown, pymupdf4llm) or accurate but slow (docling). Nutrient extracts at 0.011s per page with the best reading order score (0.93) and strong heading and table preservation, which means less cleanup, fewer wasted tokens, and more reliable downstream results.

Do my documents leave my machine?

No. The CLI processes PDFs locally. Nothing is uploaded to Nutrient. Note that if you feed the extracted Markdown into Claude, Codex, or another model provider, their own data policies apply.

Do I need a license key or API token?

No. There is no signup, no license key, and no API token. Install the CLI and start converting. The free tier (up to 1,000 documents per calendar month) is enforced via the license terms, not a technical gate. If you need to process more than 1,000 documents per month, contact Nutrient for a commercial license.
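Since nothing in the tool enforces the limit, a team that wants to stay within the free tier has to count conversions itself. The sketch below is a hypothetical in-memory counter, not something the package provides. It assumes each converted PDF counts as one document and that the calendar month is taken from the date passed in. LICENSE.md remains the authority on both points.

```python
from collections import Counter
from datetime import date

FREE_TIER_LIMIT = 1000  # documents per calendar month, as stated in the README


class MonthlyUsage:
    """Count converted documents per (year, month). Hypothetical helper."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, day: date, documents: int = 1) -> None:
        """Add documents converted on the given day to that month's total."""
        self._counts[(day.year, day.month)] += documents

    def used(self, day: date) -> int:
        """Documents recorded so far in the month containing day."""
        return self._counts[(day.year, day.month)]

    def remaining(self, day: date) -> int:
        """Documents left under the free-tier limit for that month."""
        return max(0, FREE_TIER_LIMIT - self.used(day))
```

A real deployment would persist the counts (for example in a file or database) and share them across every machine that runs the CLI.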

Why is the extraction engine closed-source?

The repo is designed to be reviewable — you can read the wrapper, the installer, and the documentation. The extraction engine is distributed as a signed binary to protect the implementation while keeping the CLI surface fully transparent.