pdf2md-cli

v1.0.0

Convert PDFs to Markdown from the command line — supports local files and web URLs

Downloads: 131


Convert any PDF to clean Markdown — from the command line.
Smart heading detection, page separators, and full source metadata. Works with local files and remote URLs.


Install

npm install -g pdf2md-cli

Requires Node.js 18+.


Usage

Convert a local PDF

pdf2md local ./report.pdf

Fetch and convert a remote PDF

pdf2md web https://arxiv.org/pdf/2103.00020.pdf

Flags

| Flag | Short | Description |
|---|---|---|
| --output <file> | -o | Custom output filename (default: same name as PDF) |
| --clipboard | -c | Copy Markdown to clipboard after saving |

Examples

# Save to a custom path
pdf2md local ./research.pdf --output ./notes/research.md

# Fetch remote PDF and copy result to clipboard
pdf2md web https://example.com/paper.pdf --clipboard

# Full example with both flags
pdf2md local ./invoice.pdf -o ./invoices/invoice.md -c
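
The flags table says the output file defaults to the same name as the PDF. The sketch below shows one way such a default could be derived. It assumes the .pdf extension is swapped for .md and that remote inputs use the last segment of the URL path. defaultOutputName is a hypothetical helper for illustration, not a function confirmed to exist in pdf2md-cli.

```javascript
// Hypothetical helper: derive a default Markdown filename from the input.
// Assumption: "same name as PDF" means the base name with .pdf replaced by .md.
function defaultOutputName(input) {
  // Remote inputs: use only the URL path. Local inputs: use the string as-is.
  const isUrl = /^https?:\/\//i.test(input);
  const pathPart = isUrl ? new URL(input).pathname : input;

  // Take the last path segment, accepting both / and \ separators.
  const base = pathPart.split(/[\\/]/).pop() || "output";

  // Drop a trailing .pdf (case-insensitive) and append .md.
  return base.replace(/\.pdf$/i, "") + ".md";
}

console.log(defaultOutputName("./report.pdf"));
console.log(defaultOutputName("https://arxiv.org/pdf/2103.00020.pdf"));
```

Passing --output bypasses any such default, so this only matters when the flag is omitted.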

Output Format

Every conversion produces a structured Markdown file:

<!-- Source: report.pdf | Extracted: Apr 18, 2026, 06:00 PM -->

## Page 1

# Document Title

## Section Heading

Paragraph text flows here with proper line breaks
and paragraph spacing preserved from the original PDF.

---

## Page 2

### Subsection

Continued content...
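
The layout above (metadata comment, one labeled section per page, horizontal rules between pages) can be assembled with a few lines of string handling. This is a minimal sketch under stated assumptions: renderMarkdown is a hypothetical name, the timestamp is passed in already formatted, and each page's Markdown has been produced by an earlier step.

```javascript
// Hypothetical helper: join per-page Markdown into the layout shown above.
// source:      filename or URL of the PDF
// extractedAt: timestamp string, already formatted by the caller
// pages:       array of Markdown strings, one per PDF page
function renderMarkdown(source, extractedAt, pages) {
  const header = `<!-- Source: ${source} | Extracted: ${extractedAt} -->`;

  // Each page becomes a "## Page N" section, numbered from 1.
  const sections = pages.map(
    (body, index) => `## Page ${index + 1}\n\n${body.trim()}`
  );

  // A horizontal rule separates consecutive pages; the header sits on top.
  return [header, sections.join("\n\n---\n\n")].join("\n\n") + "\n";
}

console.log(
  renderMarkdown("report.pdf", "Apr 18, 2026, 06:00 PM", [
    "# Document Title",
    "### Subsection\n\nContinued content...",
  ])
);
```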

Heading detection reads each text item's font size from its transform matrix and maps that size to a heading level:

| Font size | Markdown heading |
|---|---|
| ≥ 22px | # H1 |
| ≥ 18px | ## H2 |
| ≥ 15px | ### H3 |
| > 14px (distinct size) | #### H4 |
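
The thresholds in the table can be read as a first-match cascade from the largest size down. The sketch below illustrates that reading and is not the package's actual source. It assumes a pdf.js-style six-element transform array [a, b, c, d, e, f], and it interprets "distinct size" as "differs from the body text size", which is a guess. Both function names and the bodySize default are hypothetical.

```javascript
// Hypothetical helper: font height from a pdf.js-style transform
// [a, b, c, d, e, f]. The vertical scale is the length of the (c, d) column,
// which equals d for unrotated text.
function fontSizeFromTransform(transform) {
  const [, , c, d] = transform;
  return Math.hypot(c, d);
}

// Map a font size to a Markdown heading prefix using the table's thresholds.
// Assumption: "distinct size" means the size differs from the body text size.
function headingPrefix(fontSize, bodySize = 12) {
  if (fontSize >= 22) return "# ";
  if (fontSize >= 18) return "## ";
  if (fontSize >= 15) return "### ";
  if (fontSize > 14 && fontSize !== bodySize) return "#### ";
  return ""; // regular paragraph text
}

const size = fontSizeFromTransform([24, 0, 0, 24, 72, 700]);
console.log(headingPrefix(size) + "Document Title");
```

Checking the largest threshold first matters: a 24px item also satisfies the ≥ 18px and ≥ 15px rows, so the order of the tests decides which heading level wins.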


Features

  • Smart heading detection — reads raw font-size data from the PDF transform matrix, not guesswork
  • Y-sorted text extraction — items sorted by visual position (top→bottom, left→right) before parsing
  • Page separators — each page is a clearly labeled ## Page N section with --- dividers
  • Source metadata — output header includes filename/URL and extraction timestamp
  • Colored terminal output — chalk-powered ✔ success, ✖ error, ⚠ warn messages
  • Spinner feedback — ora spinner shows progress during extraction
  • Graceful error handling — specific messages for missing files, encrypted PDFs, 404s, timeouts
  • Works offline — local mode requires no internet connection
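
The Y-sorted extraction mentioned in the list above can be sketched as follows. This is an illustration rather than the package's code: it assumes each text item has been reduced to an object with str, x, and y fields, that y grows upward as in standard PDF coordinates, and that items within a small vertical tolerance belong to the same line. groupIntoLines and yTolerance are hypothetical names.

```javascript
// Hypothetical helper: order text items top-to-bottom, then left-to-right,
// and join each visual line into a single string.
function groupIntoLines(items, yTolerance = 2) {
  // PDF origin is bottom-left, so a larger y is higher on the page.
  const sorted = [...items].sort((a, b) => b.y - a.y);

  const lines = [];
  for (const item of sorted) {
    const last = lines[lines.length - 1];
    // Items close to the current line's y are treated as the same line.
    if (last && Math.abs(last.y - item.y) <= yTolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  // Within each line, read left to right.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((item) => item.str)
      .join(" ")
  );
}

console.log(
  groupIntoLines([
    { str: "world", x: 60, y: 700 },
    { str: "Second", x: 10, y: 680 },
    { str: "Hello", x: 10, y: 701 },
  ])
);
```

Grouping into lines before sorting by x avoids a comparator that mixes a tolerance with ordering, which can give inconsistent results when baselines differ by fractions of a unit.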

Error Handling

| Scenario | Behavior |
|---|---|
| File not found | Clear error with resolved path |
| Not a .pdf file | Rejects before attempting extraction |
| Encrypted / password-protected PDF | Specific error message |
| Scanned / image-only PDF | Detects no text content, explains why |
| Unreachable URL | Distinguishes DNS failure vs. refused connection |
| HTTP 404 / 403 | Status-specific error messages |
| Request timeout (30s) | Clean timeout message |
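
The network rows of the table could be implemented by inspecting the error object from a failed request. The sketch below is an assumption about how that might look, not the package's code. It relies on axios attaching response.status to HTTP errors and on the error codes ENOTFOUND and ECONNREFUSED (from Node.js) and ECONNABORTED (axios's default timeout code). describeFetchError and all message texts are hypothetical.

```javascript
// Hypothetical classifier for errors raised while fetching a remote PDF.
// Takes an axios-style error object and returns a user-facing message.
function describeFetchError(err) {
  // HTTP errors: axios attaches the server response to the error.
  const status = err.response && err.response.status;
  if (status === 404) return "PDF not found at that URL (HTTP 404).";
  if (status === 403) return "Access to that URL is forbidden (HTTP 403).";
  if (status) return `Server responded with HTTP ${status}.`;

  // Network-level errors carry a code instead of a response.
  switch (err.code) {
    case "ENOTFOUND":
      return "DNS lookup failed: the host name could not be resolved.";
    case "ECONNREFUSED":
      return "Connection refused: the host is reachable but not accepting requests.";
    case "ECONNABORTED": // axios default code when the timeout elapses
    case "ETIMEDOUT":
      return "Request timed out.";
    default:
      return `Request failed: ${err.message}`;
  }
}

console.log(describeFetchError({ code: "ENOTFOUND", message: "getaddrinfo ENOTFOUND" }));
console.log(describeFetchError({ response: { status: 404 }, message: "Not Found" }));
```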


Tech Stack

| Package | Version | Role |
|---|---|---|
| commander | ^12 | CLI commands & flags |
| pdf-parse | ^1.1 | PDF text extraction (Node.js) |
| axios | ^1.7 | Fetch remote PDFs |
| chalk | ^5 | Colored terminal output |
| ora | ^8 | Loading spinner |


Why This Project

Most PDF-to-Markdown tools treat extraction as a text dump — they grab raw characters and call it done. The result is flat, unstructured Markdown that loses all the hierarchy the original document had.

I originally built two Tampermonkey userscripts — one for browser-tab PDFs, one for local files — that solved this by reading the PDF's raw transform matrix to detect font sizes, then mapping those sizes to proper heading levels. After using them daily and finding them more reliable than any existing tool I tried, I ported the core logic to a Node.js CLI so it can live in developer workflows, automation scripts, and CI pipelines.

The result: Markdown that actually reflects the document's structure — with real headings, proper paragraph breaks, and page markers — ready to paste into Obsidian, Notion, or any Markdown editor.


Local Development

git clone https://github.com/Kevork-Nexacrawl-dev/pdf2md-cli.git
cd pdf2md-cli
npm install
npm link          # makes pdf2md available globally

pdf2md local ./test.pdf

Contributing

PRs are welcome. Open an issue first for any significant changes.


License

MIT © Kevork