@rs-pdf/core

v0.1.25

High-performance PDF to HTML converter — MuPDF via Rust/napi-rs, Node.js bindings

Why

  • MuPDF is a fast, lightweight PDF rendering engine - page parsing and rendering happen in native code, not in JavaScript
  • Pixel-perfect SVG output - text rendered as vector paths, no rasterization artifacts
  • Optional SEO text layer - transparent HTML text overlay, crawlable by search engines and copy-pasteable by users
  • Zero runtime dependencies - MuPDF is statically linked into the .node binary
  • Non-blocking - all rendering runs on Tokio's blocking thread pool, never blocking the Node.js event loop

Installation

npm install @rs-pdf/core
# or
pnpm add @rs-pdf/core

The correct native binary for your platform is installed automatically via optionalDependencies.

Supported platforms: macOS (arm64, x64) · Linux (x64, arm64 glibc) · Windows (x64)

Usage

All functions accept a single input object with either path (local file) or url (remote file). When url is given, the PDF is downloaded to a temporary location and cleaned up automatically.

Convert entire PDF

import { pdfToHtml } from '@rs-pdf/core';

// from local file
const result = await pdfToHtml({ path: '/path/to/file.pdf' });

// from URL (separate variable name: `result` is already declared above)
const remoteResult = await pdfToHtml({ url: 'https://example.com/document.pdf' });

console.log(result.pageCount); // total pages
console.log(result.pagesConverted); // pages actually converted
console.log(result.html); // self-contained HTML document

Page range & DPI

const result = await pdfToHtml({
  path: '/path/to/file.pdf',
  startPage: 0, // 0-based, default: 0
  endPage: 9,   // 0-based inclusive, default: last page
  dpi: 200,     // render quality, default: 150
});
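
Because startPage and endPage are 0-based and endPage is inclusive, off-by-one mistakes are easy to make when the range comes from a person who counts pages from 1. The following is a minimal sketch of a conversion helper. The names toZeroBasedRange and pagesInRange are hypothetical and are not part of @rs-pdf/core, and clamping an oversized last page to the document length is a design choice of this sketch, not documented library behavior.

```typescript
// Hypothetical helper, not part of @rs-pdf/core: converts a 1-based,
// inclusive page range into 0-based inclusive startPage/endPage values.
interface PageRange {
  startPage: number; // 0-based
  endPage: number;   // 0-based, inclusive
}

function toZeroBasedRange(firstPage: number, lastPage: number, pageCount: number): PageRange {
  if (!Number.isInteger(firstPage) || !Number.isInteger(lastPage)) {
    throw new RangeError('page numbers must be integers');
  }
  if (firstPage < 1 || lastPage < firstPage) {
    throw new RangeError(`invalid page range ${firstPage}-${lastPage}`);
  }
  const startPage = firstPage - 1;
  // Assumption of this sketch: a lastPage past the end means "to the last page".
  const endPage = Math.min(lastPage, pageCount) - 1;
  if (startPage > endPage) {
    throw new RangeError(`page ${firstPage} is beyond the document (${pageCount} pages)`);
  }
  return { startPage, endPage };
}

// Number of pages covered by an inclusive 0-based range.
function pagesInRange(range: PageRange): number {
  return range.endPage - range.startPage + 1;
}

// "Pages 1 to 10" of a 42-page document.
const range = toZeroBasedRange(1, 10, 42);
console.log(range.startPage, range.endPage, pagesInRange(range)); // 0 9 10
```

The resulting object can be spread into the input, for example `pdfToHtml({ path, ...range })`, with pageCount taken from a prior pdfInfo call.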

SEO text layer

Adds a transparent HTML text overlay on top of the SVG. The text is invisible to users, but it is readable by search engine crawlers and can be selected and copied.

const result = await pdfToHtml({ path: '/path/to/file.pdf', seoTextLayer: true });
// result.html contains: SVG visual layer + <div class="tl"> text overlay

DRM-protected PDFs

const result = await pdfToHtml({ path: '/path/to/protected.pdf', password: 'secret' });

Stream page by page

Yields pages as they are converted - useful for large PDFs or when you want to process/save pages without waiting for the entire document.

import { pdfToHtmlStream } from '@rs-pdf/core';

for await (const page of pdfToHtmlStream({ path: '/large.pdf' })) {
  console.log(`Page ${page.pageIndex + 1}/${page.pageCount}`);
  await saveToDatabase(page.html);
}

Use concurrency to prefetch multiple pages in parallel:

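for await (const page of pdfToHtmlStream({ url: 'https://example.com/doc.pdf', concurrency: 4 })) {
  // handlePage is a placeholder for your own function
  // (`process` is already a Node.js global and is not callable)
  handlePage(page);
}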
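
When each streamed page is written to a database or object store, it is often cheaper to write a few pages at a time. Here is a small sketch of a generic batching wrapper for any async iterable. The names inBatches, fakePages and collectBatchSizes are hypothetical, and fakePages only stands in for pdfToHtmlStream so that the sketch runs without a PDF.

```typescript
// Hypothetical helper: groups items from an async iterable into arrays of
// up to `size` items, preserving order.
async function* inBatches<T>(source: AsyncIterable<T>, size: number): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final, possibly smaller, batch
}

// Stand-in for pdfToHtmlStream; the field names mirror the page results above.
async function* fakePages(count: number): AsyncGenerator<{ pageIndex: number; html: string }> {
  for (let i = 0; i < count; i++) {
    yield { pageIndex: i, html: `<div>page ${i + 1}</div>` };
  }
}

// Consumes the batches and records how many pages each one held.
async function collectBatchSizes(count: number, size: number): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of inBatches(fakePages(count), size)) {
    sizes.push(batch.length);
  }
  return sizes;
}
```

With the real stream, the wrapper would be used as `for await (const pages of inBatches(pdfToHtmlStream({ path }), 10)) { ... }`.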

Single page

import { pdfPageToHtml } from '@rs-pdf/core';

const page = await pdfPageToHtml({ path: '/path/to/file.pdf', pageIndex: 3 });
// page.html is a fragment - no DOCTYPE/html/head/body

Metadata only

import { pdfInfo } from '@rs-pdf/core';

const info = await pdfInfo({ path: '/path/to/file.pdf' });
// or: await pdfInfo({ url: 'https://example.com/doc.pdf' })
// { pageCount, isDrmProtected, title, author, subject, creator }
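
Since metadata can be read without rendering, one possible pattern is to inspect it first and decide whether a conversion is worth attempting. The sketch below assumes the field names from the comment above; the field types, the Plan type and the planConversion function are hypothetical and not part of the package.

```typescript
// Field names follow the pdfInfo result described above; the exact types
// are an assumption of this sketch.
interface PdfInfoLike {
  pageCount: number;
  isDrmProtected: boolean;
  title?: string | null;
  author?: string | null;
}

type Plan =
  | { action: 'convert'; password?: string }
  | { action: 'skip'; reason: string };

// Hypothetical pre-flight decision: skip empty documents and protected
// documents for which no password is available.
function planConversion(info: PdfInfoLike, password?: string): Plan {
  if (info.pageCount === 0) {
    return { action: 'skip', reason: 'document has no pages' };
  }
  if (!info.isDrmProtected) {
    return { action: 'convert' };
  }
  if (!password) {
    return { action: 'skip', reason: 'document is protected and no password was given' };
  }
  return { action: 'convert', password };
}

const plan = planConversion({ pageCount: 12, isDrmProtected: true });
console.log(plan.action); // skip
```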

Worker pool

Limit concurrent PDF conversions when processing large batches:

import { PdfWorkerPool } from '@rs-pdf/core';

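const pool = new PdfWorkerPool({ concurrency: 4 });

// pdfPaths is a placeholder for your own array of local PDF paths
const results = await Promise.all(
  pdfPaths.map((p) => pool.convert({ path: p, dpi: 150 }))
);

// stream via pool
for await (const page of pool.stream({ url: 'https://example.com/large.pdf' })) {
  // handlePage is a placeholder for your own function
  // (`process` is already a Node.js global and is not callable)
  handlePage(page);
}

pool.destroy();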
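
To illustrate what "concurrency-limited" means in general, here is a minimal sketch of a task limiter that lets at most a fixed number of async tasks run at once and queues the rest. It is not the implementation of PdfWorkerPool, which is not shown in this README; TaskLimiter and demo are hypothetical names, and the timer only stands in for a conversion.

```typescript
// Illustrative only: a generic limiter, not the PdfWorkerPool implementation.
class TaskLimiter {
  private readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError('limit must be a positive integer');
    }
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // All slots busy: wait until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot straight to the next waiter; `active` is unchanged
      } else {
        this.active--; // nobody is waiting, so free the slot
      }
    }
  }
}

// Runs five fake "conversions" with a limit of two and records the peak
// number that were in flight at the same time.
async function demo(): Promise<{ results: number[]; peak: number }> {
  const limiter = new TaskLimiter(2);
  let running = 0;
  let peak = 0;
  const results = await Promise.all(
    [1, 2, 3, 4, 5].map((n) =>
      limiter.run(async () => {
        running++;
        peak = Math.max(peak, running);
        await new Promise<void>((resolve) => setTimeout(resolve, 10));
        running--;
        return n * 2;
      })
    )
  );
  return { results, peak };
}
```

Passing the slot directly to the next waiter, instead of decrementing and re-incrementing the counter, keeps a newly arriving task from jumping ahead of the queue.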

API

All functions accept a single input object. Provide either path or url — not both.

pdfToHtml(input): Promise<PdfConvertResult>

Converts all (or a range of) pages to a self-contained HTML document.

pdfPageToHtml(input): Promise<PdfPageResult>

Converts a single page to an HTML fragment (no DOCTYPE/html/head/body).

pdfToHtmlStream(input): AsyncGenerator<PdfPageResult>

Yields pages one by one as they are converted.

pdfInfo(input): Promise<PdfInfo>

Returns document metadata without rendering. Safe to call on DRM-protected PDFs.

PdfWorkerPool

Concurrency-limited pool. See Worker pool above.

Input fields

| Field        | Type    | Default   | Applies to         | Description                                   |
| ------------ | ------- | --------- | ------------------ | --------------------------------------------- |
| path         | string  | -         | all                | Local file path (mutually exclusive with url) |
| url          | string  | -         | all                | Remote URL, downloaded automatically          |
| pageIndex    | number  | -         | pdfPageToHtml      | 0-based page index (required)                 |
| startPage    | number  | 0         | all except pdfInfo | First page to convert (0-based)               |
| endPage      | number  | last page | all except pdfInfo | Last page to convert (0-based, inclusive)     |
| password     | string  | -         | all                | Password for DRM-protected PDFs               |
| dpi          | number  | 150       | all except pdfInfo | Render quality (higher = larger output)       |
| seoTextLayer | boolean | false     | all except pdfInfo | Add transparent HTML text overlay for SEO     |
| concurrency  | number  | 1         | pdfToHtmlStream    | Pages to prefetch in parallel                 |
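
The field table can be transcribed into a TypeScript shape, together with a small pre-flight check for the "either path or url" rule. This is a sketch under assumptions: the package may export its own types, which fields are optional is inferred from the Default column, and PdfInput and checkInput are hypothetical names. How the library itself reports invalid input is not documented here.

```typescript
// Transcribed from the field table; optionality is an assumption of this sketch.
interface PdfInput {
  path?: string;
  url?: string;
  pageIndex?: number;
  startPage?: number;
  endPage?: number;
  password?: string;
  dpi?: number;
  seoTextLayer?: boolean;
  concurrency?: number;
}

// Hypothetical check that returns a list of problems (empty when the input looks fine).
function checkInput(input: PdfInput): string[] {
  const problems: string[] = [];
  const hasPath = typeof input.path === 'string' && input.path.length > 0;
  const hasUrl = typeof input.url === 'string' && input.url.length > 0;
  if (hasPath === hasUrl) {
    // Both set or both missing.
    problems.push('provide exactly one of path or url');
  }
  if (
    input.startPage !== undefined &&
    input.endPage !== undefined &&
    input.endPage < input.startPage
  ) {
    problems.push('endPage must not be smaller than startPage');
  }
  if (input.dpi !== undefined && !(input.dpi > 0)) {
    problems.push('dpi must be a positive number');
  }
  return problems;
}

console.log(checkInput({ path: '/path/to/file.pdf', dpi: 200 }).length); // 0
console.log(checkInput({ path: '/a.pdf', url: 'https://example.com/a.pdf' })); // one problem: path and url both set
```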

HTML output structure

<!-- Full document (pdfToHtml) -->
<!DOCTYPE html>
<html>
  <head>
    ...
  </head>
  <body>
    <div class="page" id="page-1" data-page="1" data-total="42">
      <div style="position:relative; width:...px; height:...px">
        <!-- Visual layer: pixel-perfect SVG (text as vector paths) -->
        <svg>...</svg>

        <!-- SEO text layer (only when seoTextLayer: true) -->
        <!-- Invisible to users, readable by crawlers, copy-pasteable -->
        <div class="tl" style="color:transparent; ...">
          <p><span>Actual text content from PDF</span></p>
        </div>
      </div>
    </div>
  </body>
</html>
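
Given this structure, the page wrappers can be located by their data-page and data-total attributes. The sketch below uses a regular expression and assumes the attribute order and quoting shown in the sample above; for anything beyond a quick check, a real HTML parser would be more robust. PageMarker and listPages are hypothetical names.

```typescript
interface PageMarker {
  id: string;
  page: number;  // 1-based, from data-page
  total: number; // from data-total
}

// Hypothetical helper: finds page wrapper divs, assuming the attribute
// order and quoting shown in the sample above.
function listPages(html: string): PageMarker[] {
  const pattern = /<div class="page" id="([^"]+)" data-page="(\d+)" data-total="(\d+)">/g;
  const markers: PageMarker[] = [];
  let match: RegExpExecArray | null = pattern.exec(html);
  while (match !== null) {
    markers.push({ id: match[1], page: Number(match[2]), total: Number(match[3]) });
    match = pattern.exec(html);
  }
  return markers;
}

// Tiny stand-in for converter output with two pages.
const sample =
  '<div class="page" id="page-1" data-page="1" data-total="2"><svg></svg></div>' +
  '<div class="page" id="page-2" data-page="2" data-total="2"><svg></svg></div>';

console.log(listPages(sample).map((m) => m.id)); // [ 'page-1', 'page-2' ]
```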

Development

# Install dependencies
pnpm install

# Build native addon (Rust → .node)
pnpm build:native

# Build TypeScript
pnpm build:ts

# Run tests
pnpm test

# Build everything
pnpm build

Requirements: Rust stable, Node.js 18+, pnpm

License

MIT