@chnaaam/ppdf

v0.1.3

A TypeScript PDF object extraction library inspired by pdfplumber.

ppdf

ppdf is a TypeScript PDF extraction library inspired by pdfplumber.

It is built for Node.js and focuses on:

  • page access
  • character-level extraction
  • text, word, and search helpers
  • lines, rects, curves, images, and annotations
  • top-left coordinate handling like pdfplumber
  • bbox-based page filtering

Install

yarn add @chnaaam/ppdf

Quick Start

import { PPDF } from "@chnaaam/ppdf";

const pdf = await PPDF.open("./sample.pdf");
const page = await pdf.getPage(1);

const text = await page.extractText();
const chars = await page.getChars();
const words = await page.extractWords();
const links = await page.getHyperlinks();

console.log({
  pageCount: pdf.pageCount,
  text,
  firstChar: chars[0],
  firstWord: words[0],
  links,
});

await pdf.close();

Open A PDF

You can open a PDF from a file path, Uint8Array, or ArrayBuffer.

import { PPDF } from "@chnaaam/ppdf";

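const fromPath = await PPDF.open("./document.pdf");
// `bytes` is a Uint8Array and `arrayBuffer` is an ArrayBuffer, each holding PDF data loaded elsewhere.
const fromBytes = await PPDF.open(bytes);
const fromBuffer = await PPDF.open(arrayBuffer);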

await fromPath.close();
await fromBytes.close();
await fromBuffer.close();

Optional open options:

const pdf = await PPDF.open("./protected.pdf", {
  password: "secret",
  stopAtErrors: false,
});

Read Pages

const pdf = await PPDF.open("./document.pdf");

console.log(pdf.pageCount);

const page1 = await pdf.getPage(1);
const pages = await pdf.getPages();

console.log(page1.width, page1.height);
console.log(pages.length);

await pdf.close();

Extract Text

Plain text

const text = await page.extractText();
console.log(text);

Characters

const chars = await page.getChars();

for (const char of chars.slice(0, 5)) {
  console.log(char.text, char.x0, char.top, char.x1, char.bottom);
}

Each char includes:

  • text
  • fontname
  • size
  • matrix
  • x0, top, x1, bottom
  • width, height
  • page_number, doctop

Words

const words = await page.extractWords();

for (const word of words.slice(0, 5)) {
  console.log(word.text, word.x0, word.top, word.x1, word.bottom);
}
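
For intuition about how characters become words, here is a minimal sketch of gap-based grouping in the style of pdfplumber, whose default x_tolerance is 3. It is not ppdf's implementation: `groupIntoWords`, the `Char` and `Word` shapes, and the single-line, pre-sorted input are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; ppdf's real types may differ.
interface Char { text: string; x0: number; x1: number; top: number; bottom: number }
interface Word { text: string; x0: number; x1: number; top: number; bottom: number }

// Merge chars (assumed sorted left to right on a single line) into words.
// A new word starts after whitespace or when the horizontal gap exceeds xTolerance.
function groupIntoWords(chars: Char[], xTolerance = 3): Word[] {
  const words: Word[] = [];
  let current: Word | null = null;
  for (const ch of chars) {
    if (ch.text.trim() === "") {
      current = null; // whitespace ends the current word
      continue;
    }
    if (current !== null && ch.x0 - current.x1 <= xTolerance) {
      // Close enough to the previous char: extend the current word.
      current.text += ch.text;
      current.x1 = ch.x1;
      current.top = Math.min(current.top, ch.top);
      current.bottom = Math.max(current.bottom, ch.bottom);
    } else {
      current = { ...ch };
      words.push(current);
    }
  }
  return words;
}

const line: Char[] = [
  { text: "H", x0: 10, x1: 15, top: 100, bottom: 112 },
  { text: "i", x0: 15, x1: 18, top: 100, bottom: 112 },
  { text: " ", x0: 18, x1: 21, top: 100, bottom: 112 },
  { text: "y", x0: 21, x1: 26, top: 100, bottom: 112 },
  { text: "o", x0: 26, x1: 31, top: 100, bottom: 112 },
  { text: "!", x0: 60, x1: 63, top: 100, bottom: 112 }, // large gap: starts a new word
];
console.log(groupIntoWords(line).map((w) => w.text)); // [ 'Hi', 'yo', '!' ]
```

A real extractor also has to split lines vertically and handle rotated text, which this sketch leaves out.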

Search

Literal search:

const matches = await page.search("invoice", { regex: false });

Regex search:

const matches = await page.search(/total:\s+\$?\d+(?:\.\d+)?/i);
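
To see what the regex above accepts before running it against a page, it can be tried on plain strings. This sketch uses only the standard JavaScript RegExp API; `firstMatch` is a hypothetical helper, and how ppdf applies the pattern to page text (for example across line breaks) is not shown here.

```typescript
const totalPattern = /total:\s+\$?\d+(?:\.\d+)?/i;

// Return the first matched substring, or null when nothing matches.
function firstMatch(text: string): string | null {
  const m = totalPattern.exec(text);
  return m ? m[0] : null;
}

console.log(firstMatch("Total: $1250.50"));  // Total: $1250.50
console.log(firstMatch("TOTAL:   42"));      // TOTAL:   42
console.log(firstMatch("total:$10"));        // null (\s+ needs at least one whitespace character)
console.log(firstMatch("Subtotal: 5"));      // total: 5 (the pattern is not anchored)
console.log(firstMatch("Total: $1,250.50")); // Total: $1 (thousands separators are not covered)
```

The last two cases are worth noting: add `\b` before `total` to skip "Subtotal", and extend the digit group if amounts may contain commas.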

Extract Shapes And Other Objects

const lines = await page.getLines();
const rects = await page.getRects();
const curves = await page.getCurves();
const images = await page.getImages();
const annotations = await page.getAnnotations();
const hyperlinks = await page.getHyperlinks();

You can also collect everything in one call:

const objects = await page.getObjects();

Or across the whole document:

const allObjects = await pdf.getObjects();

Crop And Filter By Bounding Box

Bounding boxes use pdfplumber-style top-left coordinates:

type BBox = [x0: number, top: number, x1: number, bottom: number];

Crop to a region

const region = page.crop([50, 100, 300, 220]);
const regionChars = await region.getChars();

Keep only objects fully within a region

const inner = page.withinBBox([50, 100, 300, 220]);

Exclude a region

const outer = page.outsideBBox([50, 100, 300, 220]);
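
The difference between the two filters can be sketched as plain predicates. This is an illustration, not ppdf's code: `isWithin`, `isOutside`, and `Positioned` are hypothetical names, and the "no overlap at all" reading of the outside filter is assumed from pdfplumber's outside_bbox. ppdf's edge handling may differ.

```typescript
type BBox = [x0: number, top: number, x1: number, bottom: number];

// Hypothetical minimal shape of a positioned page object.
interface Positioned { x0: number; top: number; x1: number; bottom: number }

// True when the object lies entirely inside the box.
function isWithin(obj: Positioned, [x0, top, x1, bottom]: BBox): boolean {
  return obj.x0 >= x0 && obj.top >= top && obj.x1 <= x1 && obj.bottom <= bottom;
}

// True when the object does not overlap the box at all.
function isOutside(obj: Positioned, [x0, top, x1, bottom]: BBox): boolean {
  return obj.x1 <= x0 || obj.x0 >= x1 || obj.bottom <= top || obj.top >= bottom;
}

const region: BBox = [50, 100, 300, 220];
const inside = { x0: 60, top: 110, x1: 70, bottom: 122 };
const straddling = { x0: 290, top: 110, x1: 310, bottom: 122 };
const away = { x0: 400, top: 110, x1: 410, bottom: 122 };

console.log(isWithin(inside, region), isOutside(inside, region));         // true false
console.log(isWithin(straddling, region), isOutside(straddling, region)); // false false
console.log(isWithin(away, region), isOutside(away, region));             // false true
```

An object that straddles the boundary satisfies neither predicate, so under these assumptions it would be dropped by both filters.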

Filter with a predicate

const boldishChars = page.filter(
  (obj) => obj.object_type === "char" && "fontname" in obj && obj.fontname.includes("Bold"),
);
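
The `"fontname" in obj` check in the predicate above is there because not every object type carries a font name. With a discriminated union, a type guard can do the same narrowing. The shapes below (`CharObj`, `RectObj`, `PageObj`) are hypothetical stand-ins, not ppdf's exported types, and the sketch filters a plain array rather than a page.

```typescript
// Hypothetical object shapes; ppdf's real types may differ.
interface CharObj { object_type: "char"; text: string; fontname: string }
interface RectObj { object_type: "rect"; x0: number; top: number; x1: number; bottom: number }
type PageObj = CharObj | RectObj;

// Type guard: narrows PageObj to CharObj, so `fontname` is safe to read.
function isBoldChar(obj: PageObj): obj is CharObj {
  return obj.object_type === "char" && obj.fontname.includes("Bold");
}

const sample: PageObj[] = [
  { object_type: "char", text: "T", fontname: "Helvetica-Bold" },
  { object_type: "char", text: "o", fontname: "Helvetica" },
  { object_type: "rect", x0: 0, top: 0, x1: 10, bottom: 10 },
];

const boldChars = sample.filter(isBoldChar); // typed as CharObj[]
console.log(boldChars.map((c) => c.text)); // [ 'T' ]
```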

Coordinate System

ppdf normalizes coordinates to a top-left origin, matching pdfplumber.

That means:

  • x0 / x1 grow from left to right
  • top / bottom grow from top to bottom
  • doctop is the top offset in document space across pages
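
To make the convention concrete, here is a minimal sketch of how a native PDF box (bottom-left origin, y growing upward) maps to top-left coordinates, and how doctop can be accumulated across pages as pdfplumber does. This is not ppdf's internal code: `toTopLeft`, `docTop`, and the `PdfBox` shape are hypothetical names, and page rotation and crop boxes are ignored.

```typescript
// Hypothetical shapes for illustration; not part of ppdf's API.
interface PdfBox { x0: number; y0: number; x1: number; y1: number } // bottom-left origin
interface TopLeftBox { x0: number; top: number; x1: number; bottom: number }

// Flip the y axis: distance from the top edge = page height - y.
function toTopLeft(box: PdfBox, pageHeight: number): TopLeftBox {
  return {
    x0: box.x0,
    top: pageHeight - box.y1,
    x1: box.x1,
    bottom: pageHeight - box.y0,
  };
}

// doctop: the page-local top plus the heights of all preceding pages.
// pageNumber is 1-based; pageHeights[0] is the height of page 1.
function docTop(top: number, pageNumber: number, pageHeights: number[]): number {
  const offset = pageHeights
    .slice(0, pageNumber - 1)
    .reduce((sum, h) => sum + h, 0);
  return offset + top;
}

const box = toTopLeft({ x0: 72, y0: 700, x1: 144, y1: 712 }, 792);
console.log(box); // { x0: 72, top: 80, x1: 144, bottom: 92 }
console.log(docTop(box.top, 2, [792, 792])); // 872
```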

End-To-End Example

import { PPDF } from "@chnaaam/ppdf";

const pdf = await PPDF.open("./report.pdf");

for (const page of await pdf.getPages()) {
  const words = await page.extractWords();
  const links = await page.getHyperlinks();

  console.log(`page ${page.pageNumber}`);
  console.log(`words: ${words.length}`);
  console.log(`links: ${links.length}`);
}

await pdf.close();
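
If extraction throws partway through, the final `pdf.close()` above is never reached. One way to guard against that is a small wrapper that closes the document in a `finally` block. `withDocument` and `Closable` are hypothetical names, not part of ppdf; the only assumption about the document is that it has an async `close()`, as shown throughout this README.

```typescript
// Minimal shape needed by the helper; any object with an async close() fits.
interface Closable { close(): Promise<void> }

// Open a document, run `use` on it, and always close it afterwards.
async function withDocument<D extends Closable, R>(
  open: () => Promise<D>,
  use: (doc: D) => Promise<R>,
): Promise<R> {
  const doc = await open();
  try {
    return await use(doc);
  } finally {
    await doc.close(); // runs on success and on error
  }
}

// Stand-in document so the sketch runs without a PDF on disk.
const fakeDoc = {
  pageCount: 3,
  closed: false,
  async close() { this.closed = true; },
};

withDocument(async () => fakeDoc, async (doc) => doc.pageCount).then((count) => {
  console.log(count, fakeDoc.closed); // 3 true
});
```

With ppdf, the first argument would presumably be `() => PPDF.open("./report.pdf")` and the second the per-page loop from the example above.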

Local Development

Install dependencies:

yarn

Type-check:

yarn run check

Build:

yarn build

Run tests:

yarn test

Run the character-accuracy comparison test:

yarn vitest run test/char-accuracy.test.ts

Render character bounding boxes over page images:

node --import tsx ./test/render-char-bboxes.ts ./reference_pdf/ref_pdf1.pdf ./tmp/ref1-compare 2 compare

Notes

  • ppdf is currently aimed at machine-generated PDFs.
  • Character geometry is designed to be close to pdfplumber's, but full feature parity has not been reached yet.
  • Some PDFs may still differ in font fallback behavior or CID text decoding because ppdf uses PDF.js internally.