@krakz999/tabula-node

v1.0.6

Published

4 months ago

A package for extracting tables from PDF files written in Typescript, build on top of tabula-java.

Downloads

0High
0Medium
0Low

krakz

pdf csv pdf to csv tabula table extractor

@krakz999/tabula-node

A package for extracting tables from PDF files. The package is a wrapper completely written in Typescript, built on top of the popular tabula-java library.

Installation

To use this package, install it via your favourite package manager:

pnpm i @krakz999/tabula-node

Usage

import { extractTables } from '@krakz999/tabula-node';

const results = await extractTables("./test.pdf", {
    pages: "all",
    guess: true
});

console.log(results);

Options

The extractTables function accepts an options object with the following properties:

area: Portion of the page to analyze. Example: "269.875,12.75,790.5,561". Accepts top,left,bottom,right i.e. y1,x1,y2,x2 where all values are in points relative to the top left corner. If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual height or width of the page. Example: "%0,0,100,50". To specify multiple areas, pass an array. Default is entire page.
columns: X coordinates of column boundaries. Example "10.1,20.2,30.3". If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual width of the page. Example: "%25,50,80.6"
format: Output format: (CSV,TSV,JSON). Default: CSV
guess: Guess the portion of the page to analyze per page.
lattice: Force PDF to be extracted using lattice-mode extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
noSpreadsheet: [Deprecated in favor of -t/--stream] Force PDF not to be extracted using spreadsheet-style extraction (if there are no ruling lines separating each cell)
pages: Comma separated list of ranges, or all. Examples: "1-3,5-7", --pages 3 or "all". Default is "1"
password: Password to decrypt document. Default is empty
stream: Force PDF to be extracted using stream-mode extraction (if there are no ruling lines separating each cell)
useLineReturns: Use embedded line returns in cells. (Only in spreadsheet mode.)

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@krakz999/tabula-node

Installation

Usage

Options