@markitdownjs/pdf
v0.2.0
PDF converter for MarkItDownJS
PDF to AST converter for MarkItDownJS, powered by pdfjs-dist. Extracts text content, document structure, metadata, and page information.
Install
```bash
npm install @markitdownjs/pdf
```

`pdfjs-dist` is a peer dependency. npm 7 and later install peer dependencies automatically.
Usage
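```typescript
import { readFile } from "node:fs/promises";
import { MarkItDown } from "@markitdownjs/core";
import { PdfConverter } from "@markitdownjs/pdf";

// Load the PDF into memory ("document.pdf" is a placeholder path).
const pdfBuffer = await readFile("document.pdf");

// Register the PDF converter, then convert by MIME type.
const parser = new MarkItDown();
parser.registerConverter(new PdfConverter());
const result = await parser.convert({ source: pdfBuffer, mimeType: "application/pdf" });
console.log(result.markdown);
```

Key Exports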
| Export | Description |
|---|---|
| `PdfConverter` | Converter plugin; register it with a `MarkItDown` instance |
What Gets Extracted
- Text content with paragraph and heading detection
- Page boundaries (preserved as `pageNumber` in AST nodes)
- Document metadata: title, author, creation date, subject
- Multi-column layout heuristics
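Because page boundaries are kept as `pageNumber` on AST nodes, consumers can regroup converted content page by page. The sketch below assumes a hypothetical, simplified node shape (`PdfAstNode` with `type`, `text`, and an optional `pageNumber`) and a hypothetical helper `groupByPage`; the real AST types produced by MarkItDownJS may differ.

```typescript
// Hypothetical, simplified node shape used only for this sketch;
// the real MarkItDownJS AST types may differ.
interface PdfAstNode {
  type: string;
  text: string;
  pageNumber?: number;
}

// Group nodes by pageNumber, keeping document order within each page.
// Nodes without a pageNumber are collected under page 0 (an assumption of this sketch).
function groupByPage(nodes: PdfAstNode[]): Map<number, PdfAstNode[]> {
  const pages = new Map<number, PdfAstNode[]>();
  for (const node of nodes) {
    const page = node.pageNumber ?? 0;
    const bucket = pages.get(page);
    if (bucket) {
      bucket.push(node);
    } else {
      pages.set(page, [node]);
    }
  }
  return pages;
}

// Example usage with hand-written nodes.
const nodes: PdfAstNode[] = [
  { type: "heading", text: "Introduction", pageNumber: 1 },
  { type: "paragraph", text: "First page body.", pageNumber: 1 },
  { type: "paragraph", text: "Second page body.", pageNumber: 2 },
];

// A Map iterates in insertion order, so pages print in the order first seen.
for (const [page, pageNodes] of groupByPage(nodes)) {
  console.log(`Page ${page}: ${pageNodes.map((n) => n.text).join(" | ")}`);
}
```

Using a `Map` rather than an array indexed by page keeps the grouping correct even if page numbers are sparse or start at a value other than 1.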
Options
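```typescript
import { MarkItDown } from "@markitdownjs/core";
import { PdfConverter } from "@markitdownjs/pdf";

const parser = new MarkItDown();
parser.registerConverter(new PdfConverter({
  extractMetadata: true,
  preservePageBreaks: true,
}));
```

Accepted MIME Types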
application/pdf
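`convert` is called with an explicit `mimeType`. When the type is not known up front (for example, an upload without a reliable `Content-Type` header), one option is to sniff the file header: PDF files begin with the bytes `%PDF-`. The helper below, `detectPdfMimeType`, is hypothetical and not part of `@markitdownjs/pdf`; it only checks the leading signature and does not validate the rest of the file. Some real-world PDFs carry stray bytes before the header, which this strict check would reject.

```typescript
// Hypothetical helper, not part of @markitdownjs/pdf: returns "application/pdf"
// when the data starts with the PDF header signature "%PDF-", otherwise undefined.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-" in ASCII

function detectPdfMimeType(data: Uint8Array): "application/pdf" | undefined {
  if (data.length < PDF_SIGNATURE.length) {
    return undefined;
  }
  // Compare each leading byte against the signature.
  const matches = PDF_SIGNATURE.every((byte, i) => data[i] === byte);
  return matches ? "application/pdf" : undefined;
}

// Example usage with in-memory bytes (a Node.js Buffer is also a Uint8Array).
const pdfHeader = new TextEncoder().encode("%PDF-1.7\n");
const notPdf = new TextEncoder().encode("<html></html>");
console.log(detectPdfMimeType(pdfHeader)); // prints application/pdf
console.log(detectPdfMimeType(notPdf)); // prints undefined
```

When the helper returns a value, it can be passed as `mimeType` to `parser.convert`; otherwise the input can be routed to a different converter or rejected.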
Part of the MarkItDownJS monorepo.
