@dreamyoungs/trex
v0.1.0
Node.js wrapper for TREX PDF table extraction CLI
@dreamyoungs/trex
Node.js wrapper around the TREX CLI.
Prerequisites
- Node.js 18+
- Recommended: internet access during install (the postinstall script downloads a prebuilt TREX binary)
Install
npm install @dreamyoungs/trex
By default, installing the package tries to download a matching TREX binary from GitHub Releases. If the download is unavailable, set either:
- TREX_BIN=/path/to/trex
- options.binPath in API calls
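As a minimal sketch of the options.binPath route: the helper below only adds binPath to an options object when the candidate file exists, and otherwise leaves binary resolution to the package. The helper name withBinPath and the ./vendor/trex location are assumptions for illustration; neither is part of @dreamyoungs/trex.

```javascript
const fs = require("node:fs");

// Hypothetical helper (not part of @dreamyoungs/trex): adds `binPath` to an
// options object only when the candidate file actually exists on disk.
function withBinPath(candidatePath, baseOptions = {}) {
  if (candidatePath && fs.existsSync(candidatePath)) {
    return { ...baseOptions, binPath: candidatePath };
  }
  // No usable candidate: return a copy without binPath and let the
  // package resolve the binary on its own.
  return { ...baseOptions };
}

// Example: prefer a vendored binary at an assumed location.
const options = withBinPath("./vendor/trex", { mode: "auto" });
```

The resulting options object can then be passed as the second argument to extract() or extractCsv().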
To skip binary download explicitly:
TREX_SKIP_DOWNLOAD=1 npm install @dreamyoungs/trex
Maintainer: publish release binaries
When bumping the package version, upload the matching release assets first:
scripts/release/publish_assets.sh --version 0.1.0 --upload
See scripts/release/README.md for details.
Usage
const { extract, extractCsv } = require("@dreamyoungs/trex");
async function run() {
  const tables = await extract("./invoice.pdf", {
    mode: "auto",
    pages: [1, 2],
  });
  const csv = await extractCsv("./invoice.pdf", {
    mode: "lattice",
  });
  console.log(tables.length, csv.length);
}
run().catch(console.error);
Options
- pages: number[] | string (e.g. [1,2,3] or "1-3,7")
- mode: "auto" | "lattice" | "stream" | "dl"
- dlModel, dlMinConfidence, dlFallback
- binPath: override TREX binary path
- timeoutMs: command timeout (default: 120000)
- eventLog, eventDocumentKey, eventTenantId, eventRequestId, eventFeedbackTag, eventTrainingOptIn
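To illustrate how the two forms of pages relate, here is a rough sketch that expands a string such as "1-3,7" into the equivalent array. It assumes ranges are inclusive and 1-based, which the example above suggests but the README does not state. expandPages is a hypothetical helper, not an export of @dreamyoungs/trex, and the package may parse the string differently.

```javascript
// Hypothetical helper, not part of @dreamyoungs/trex. Assumes ranges are
// inclusive and 1-based.
function expandPages(spec) {
  // Arrays are already in the explicit form; return a copy.
  if (Array.isArray(spec)) return [...spec];

  const pages = [];
  for (const part of spec.split(",")) {
    const piece = part.trim();
    if (piece === "") continue;

    // Accept either a single page ("7") or a range ("1-3").
    const match = /^(\d+)(?:-(\d+))?$/.exec(piece);
    if (!match) throw new Error(`Invalid page spec: "${piece}"`);

    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (end < start) throw new Error(`Descending range: "${piece}"`);

    for (let p = start; p <= end; p++) pages.push(p);
  }
  return pages;
}

console.log(expandPages("1-3,7")); // [ 1, 2, 3, 7 ]
```

Under that assumption, pages: "1-3,7" and pages: [1, 2, 3, 7] would select the same pages.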
Buffer APIs
- extractFromBuffer(buffer, options)
- extractCsvFromBuffer(buffer, options)
These methods write a temporary PDF file, run TREX, then clean up automatically.
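The write, run, clean-up sequence described above can be sketched with Node's standard library. This is an illustration of the general pattern, not the package's actual implementation: withTempPdf and its run callback are hypothetical names, and the callback stands in for whatever invokes TREX on the temporary path.

```javascript
const fsp = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Sketch of the write / run / clean-up pattern. `run` is a hypothetical
// callback that receives the temporary file path.
async function withTempPdf(buffer, run) {
  // A fresh directory per call avoids collisions between concurrent calls.
  const dir = await fsp.mkdtemp(path.join(os.tmpdir(), "trex-"));
  const file = path.join(dir, "input.pdf");
  try {
    await fsp.writeFile(file, buffer);
    return await run(file);
  } finally {
    // Runs on success and on failure, so the temporary directory is removed
    // even when the callback throws.
    await fsp.rm(dir, { recursive: true, force: true });
  }
}
```

Placing the removal in a finally block is the key design choice: the temporary PDF is deleted whether extraction succeeds or fails.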
