npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

pdftables-ts

v0.2.1

Published

TypeScript library to extract tables from PDFs using glyph coordinates, preserving empty cells.

Readme

pdftables-ts

TypeScript library to extract table structures from PDFs using glyph coordinates, preserving empty cells and headers.

Features

  • Infers rows/columns automatically from PDF glyph positions (using pdfjs-dist)
  • Returns structured tables with bounding boxes and cell text
  • Convenience helper to map tables into header/value objects with numeric inference and fallback column names
  • Works in Node environments without PDF workers

Installation

npm install pdftables-ts

pdfjs-dist is included as a dependency.

Quick start

import { readFileSync } from "fs";
import { PdfTableExtractor } from "pdftables-ts";

const buf = readFileSync("path/to/file.pdf");
const arrayBuffer = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

const extractor = new PdfTableExtractor();
const tables = await extractor.extractTables(arrayBuffer, {
  xTolerance: 4,
  yTolerance: 4,
  // Optional: anchor detection to explicit headers (newlines are collapsed to spaces).
  columnHeaders: [
    "PERIODE",
    "BELPEX",
    "BELPEX RLP\n/\nBELPEX S21",
    "ENDEX20D101",
    "ENDEX101",
    "EPEXDAMREK",
    "EPEXDAMRLP",
  ],
  // Optional: stop when a larger vertical gap is found below the last row (PDF units).
  endOfTableWhitespace: 12,
});

for (const table of tables) {
  console.log(`Page ${table.pageIndex} - rows: ${table.rows.length}`);
  console.log(table.rows.map(row => row.cells.map(c => c.text.trim())).join("\n"));
}

If you prefer ready-to-use objects keyed by column headers (empty headers become column1, column2, ... and numeric columns are parsed):

const tables = await extractor.extractTablesAsObjects(arrayBuffer, { decimalSeparator: "," });
console.log(tables[0]?.headers); // ['Header A', 'Header B', ...]
console.log(tables[0]?.rows[0]);  // { Header A: '...', Header B: 12.34 }

API

  • extractTables(buffer, options?) => Promise<ParsedTable[]>
    • Returns rows/cells with bounding boxes for each detected table.
  • extractTablesAsObjects(buffer, options?) => Promise<TableObjects[]>
    • Uses the first row as headers and maps subsequent rows into plain objects.

TableExtractionOptions:

  • xTolerance (default 3): tolerance in PDF units when grouping glyphs into columns.
  • yTolerance (default 3): tolerance in PDF units when grouping glyphs into rows.
  • decimalSeparator (default "."): decimal separator to use when parsing numeric columns in extractTablesAsObjects.
  • columnHeaders (optional string[]): when provided, the detector first tries to locate these headers (newlines inside labels are treated as spaces) and anchor columns to them, falling back to auto-detection if they cannot be matched.
  • endOfTableWhitespace (optional number, default Infinity): when guiding by headers, stop parsing if a vertical gap larger than this is found below the last parsed row.
  • minColumnCount (default 2): lower bound when inferring column count.
  • maxColumnCount (default 15): upper bound when inferring column count.

Manual test runner

npm run build
node dist/test/manual-run.js path/to/file.pdf

This prints detected tables to stdout for quick inspection.

Development

  • Install deps: npm install
  • Run tests: npm test
  • Build typed output: npm run build
  • Manual PDF check: npm run run:pdf -- path/to/file.pdf

Publishing checklist

  • Replace your-username in package.json repository/bugs/homepage URLs with your GitHub org/user (or set a scoped name you own).
  • Commit all changes and push to a public GitHub repo.
  • Bump the version (npm version patch|minor|major), then publish: npm publish --access public.
  • Verify in a fresh project: npm install pdftables-ts and import it.

License

MIT © Steven Decock