pi-docparser
v3.0.1
Pi package that adds document_parse, document_search, document_screenshot, and a companion skill for local document understanding with LiteParse v2.
pi-docparser
A standalone pi package that adds local document-understanding tools plus a companion parse-document skill for AI agents.
It wraps @llamaindex/liteparse v2, a Rust/PDFium-based local parser. Documents stay on the local machine: no cloud calls, no LLM parsing, no API keys.
What this package provides
Extension tools
This package registers three tools:
| Tool | Purpose |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| document_parse | Parse a local document to text or json, save the full result to a temp file, and optionally render screenshots. |
| document_search | Search a local document for a phrase and return page numbers plus bounding boxes for each hit. |
| document_screenshot | Render document pages as PNG images, return image blocks for direct model inspection, and save PNG files to a temp folder. |
Use document_parse for extraction, document_search for citations/source locations, and document_screenshot when visual layout, charts, signatures, dense tables, or page appearance matter.
Skill
Ships a parse-document skill that teaches agents to:
- prefer document_parse over raw lit CLI commands
- choose text vs JSON output deliberately
- search before screenshotting when looking for known text
- use screenshots only when visual inspection is useful
- keep large parsed outputs out of context until needed
LiteParse v2 behavior
LiteParse v2 is a Rust rewrite using PDFium for text extraction/rendering and native Tesseract for OCR. Compared with v1, it is substantially faster and exposes a simpler Node API:
const result = await parser.parse("document.pdf");
// result = { pages: ParsedPage[], text: string }
const screenshots = await parser.screenshot("document.pdf", [1, 2]);
JSON output from document_parse is the full LiteParse v2 parse result:
{
"pages": [
{
"pageNum": 1,
"width": 612,
"height": 792,
"text": "...",
"textItems": [{ "text": "Revenue", "x": 72, "y": 120, "width": 48, "height": 12 }]
}
],
"text": "..."
}
Removed LiteParse v1 options are not supported:
- preciseBoundingBox
- preserveLayoutAlignmentAcrossPages
Alternatives for agents: use JSON textItems bounding boxes, document_search, document_screenshot, or narrower targetPages.
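As a rough illustration of the first alternative, the sketch below reads textItems from a parse result shaped like the JSON example above and converts matching boxes to page-relative fractions. The helper locateText and the NormalizedBox type are hypothetical names, not part of pi-docparser or LiteParse. It also assumes that x, y, width, and height use the same units as the page width and height, and that y grows downward from the top edge.

```typescript
// Shapes mirror the JSON example above; only the fields shown there are modeled.
interface TextItem {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ParsedPage {
  pageNum: number;
  width: number;
  height: number;
  text: string;
  textItems: TextItem[];
}

interface ParseResult {
  pages: ParsedPage[];
  text: string;
}

// Hypothetical result type: a box expressed as fractions of the page size.
interface NormalizedBox {
  pageNum: number;
  text: string;
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Hypothetical helper: case-insensitive substring match over every text item.
function locateText(result: ParseResult, needle: string): NormalizedBox[] {
  const wanted = needle.toLowerCase();
  const boxes: NormalizedBox[] = [];
  for (const page of result.pages) {
    for (const item of page.textItems) {
      if (item.text.toLowerCase().indexOf(wanted) === -1) continue;
      boxes.push({
        pageNum: page.pageNum,
        text: item.text,
        left: item.x / page.width,
        top: item.y / page.height,
        right: (item.x + item.width) / page.width,
        bottom: (item.y + item.height) / page.height,
      });
    }
  }
  return boxes;
}

// Sample data copied from the JSON example above.
const sample: ParseResult = {
  pages: [
    {
      pageNum: 1,
      width: 612,
      height: 792,
      text: "...",
      textItems: [{ text: "Revenue", x: 72, y: 120, width: 48, height: 12 }],
    },
  ],
  text: "...",
};

console.log(locateText(sample, "revenue"));
```

Page-relative fractions stay valid when the same page is later rendered at a different DPI, which is why the sketch normalizes instead of returning raw coordinates.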
Supported inputs
This package uses LiteParse and supports the formats LiteParse supports locally, including:
- DOC / DOCX / DOCM / ODT / RTF / Pages
- PPT / PPTX / PPTM / ODP / Keynote
- XLS / XLSX / XLSM / ODS / CSV / TSV / Numbers
- PNG / JPG / JPEG / GIF / BMP / TIFF / WebP / SVG
Support for non-PDF formats may depend on host tools such as LibreOffice or ImageMagick. See Host dependencies.
Requirements
- pi installed and working
- Node.js 20.6+
- local machine access to the files you want to parse
Installation
pi install npm:pi-docparser
Or from GitHub:
pi install git:github.com/maxedapps/pi-docparser
Example model tool calls
These are representative tool calls pi may make internally.
Extract plain text
document_parse({
path: "./docs/contract.pdf"
})
Useful for summarizing, quoting, reviewing, or answering questions where layout coordinates are not needed.
Extract JSON with bounding boxes
document_parse({
path: "./reports/financial-report.pdf",
format: "json",
targetPages: "1-3"
})
Useful when an agent needs page structure, text coordinates, or bounding boxes.
Search for a phrase and get source locations
document_search({
path: "./reports/financial-report.pdf",
phrase: "Revenue grew",
targetPages: "1-10"
})
Returns each hit with page number and bounding box, useful for citations and deciding which pages to screenshot.
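To illustrate the "deciding which pages to screenshot" step, here is a minimal sketch that collapses search hits into a compact page list. The SearchHit shape follows the hit fields this README lists (pageNum, text, x, y, width, height). The helpers pagesWithHits and toPageSpec are hypothetical. The README only shows page strings such as "4" and "1-3", so the comma-separated form produced here is an assumption about the accepted syntax.

```typescript
// Hit fields as listed in this README; optional confidence/font data is omitted.
interface SearchHit {
  pageNum: number;
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: unique page numbers in ascending order.
function pagesWithHits(hits: SearchHit[]): number[] {
  const unique = Array.from(new Set(hits.map((hit) => hit.pageNum)));
  return unique.sort((a, b) => a - b);
}

// Hypothetical helper: collapse consecutive pages into "a-b" ranges.
// Assumption: comma-separated ranges are accepted; only "4" and "1-3" appear in the README.
function toPageSpec(pages: number[]): string {
  const parts: string[] = [];
  let start = 0;
  while (start < pages.length) {
    let end = start;
    while (end + 1 < pages.length && pages[end + 1] === pages[end] + 1) {
      end++;
    }
    parts.push(start === end ? `${pages[start]}` : `${pages[start]}-${pages[end]}`);
    start = end + 1;
  }
  return parts.join(",");
}

// Made-up hits for illustration only.
const hits: SearchHit[] = [
  { pageNum: 4, text: "Revenue grew", x: 72, y: 120, width: 80, height: 12 },
  { pageNum: 2, text: "Revenue grew", x: 72, y: 300, width: 80, height: 12 },
  { pageNum: 3, text: "Revenue grew", x: 90, y: 410, width: 80, height: 12 },
  { pageNum: 4, text: "Revenue grew", x: 72, y: 500, width: 80, height: 12 },
  { pageNum: 9, text: "Revenue grew", x: 72, y: 88, width: 80, height: 12 },
];

console.log(toPageSpec(pagesWithHits(hits)));
```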
Render pages for visual inspection
document_screenshot({
path: "./reports/financial-report.pdf",
pages: "4",
dpi: 150
})
Useful for charts, figures, signatures, dense tables, and cases where extracted text is insufficient.
Parse a password-protected document
document_parse({
path: "./docs/protected.pdf",
password: "user-provided-password"
})
Use offline/custom OCR data
document_parse({
path: "./scans/report.pdf",
ocr: "auto",
ocrLanguage: "eng",
tessdataPath: "/path/to/tessdata"
})
tessdataPath points LiteParse/Tesseract at local .traineddata files. Most users do not need it; it is useful for air-gapped environments or custom language packs.
Tool behavior notes
document_parse
- Saves full parsed output to a temporary .txt or .json file.
- Returns a short preview to avoid flooding model context.
- Supports targetPages, OCR options, password, tessdataPath, and optional screenshotPages.
- Defaults maxPages to LiteParse v2's default: 1000.
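The save-then-preview behavior can be pictured with the sketch below. It is not the extension's actual code: saveWithPreview, the docparse- directory prefix, the result.* file names, and the default preview length are all illustrative assumptions. Only the Node.js standard library is used.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface SavedOutput {
  outputPath: string;
  preview: string;
  truncated: boolean;
}

// Hypothetical helper: write the full output to a fresh temp directory and
// hand back only the first previewChars characters.
function saveWithPreview(
  content: string,
  format: "text" | "json",
  previewChars = 2000, // illustrative default, not the extension's real limit
): SavedOutput {
  const dir = mkdtempSync(join(tmpdir(), "docparse-"));
  const outputPath = join(dir, format === "json" ? "result.json" : "result.txt");
  writeFileSync(outputPath, content, "utf8");
  const truncated = content.length > previewChars;
  return {
    outputPath,
    preview: truncated ? content.slice(0, previewChars) : content,
    truncated,
  };
}

// Usage: a 10,000-character stand-in for parsed text, previewed at 80 characters.
const saved = saveWithPreview("page text ".repeat(1000), "text", 80);
console.log(saved.outputPath, saved.preview.length, saved.truncated);

// The full output stays on disk and can be read later, only when needed.
const fullText = readFileSync(saved.outputPath, "utf8");
console.log(fullText.length);
```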
document_search
- Parses the document and searches page textItems with LiteParse's searchItems helper.
- Returns structured hits with pageNum, text, x, y, width, height, and optional confidence/font data.
- Use before screenshotting when searching for known text.
document_screenshot
- Renders pages as PNG screenshots.
- Returns image content blocks the model can inspect directly.
- Also saves screenshots to temporary files and returns their paths.
- Can render supported non-PDF documents when required host conversion tools are installed.
OCR notes
LiteParse v2 uses built-in native Tesseract OCR by default when OCR is enabled and no ocrServerUrl is provided.
Important details:
- OCR is selective: LiteParse OCRs text-sparse pages or image regions rather than blindly OCRing everything.
- Built-in Tesseract typically uses ISO 639-3 language codes such as eng, deu, fra, jpn.
- Many HTTP OCR servers instead expect ISO 639-1 codes such as en, de, fr, ja.
- ocrLanguages is joined into a multilingual language string for built-in Tesseract.
- When ocrServerUrl is used, only the first entry from ocrLanguages is forwarded.
- For offline/custom OCR data, use tessdataPath or set TESSDATA_PREFIX.
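The two forwarding rules above can be sketched as follows. resolveOcrLanguage is a hypothetical helper, and the server URL in the usage lines is a placeholder. The "+" separator is Tesseract's own convention for combining languages (for example eng+deu); whether the package joins with exactly that character is an assumption.

```typescript
interface OcrOptions {
  ocrLanguages: string[];
  ocrServerUrl?: string;
}

// Hypothetical helper illustrating the rules described in the list above.
function resolveOcrLanguage({ ocrLanguages, ocrServerUrl }: OcrOptions): string {
  if (ocrLanguages.length === 0) {
    throw new Error("ocrLanguages must not be empty");
  }
  // An external OCR server receives only the first entry.
  if (ocrServerUrl) {
    return ocrLanguages[0];
  }
  // Built-in Tesseract: assumed to use Tesseract's "+" multi-language syntax.
  return ocrLanguages.join("+");
}

// Built-in Tesseract with ISO 639-3 style codes.
console.log(resolveOcrLanguage({ ocrLanguages: ["eng", "deu"] }));

// External server (placeholder URL) with ISO 639-1 codes.
console.log(
  resolveOcrLanguage({
    ocrLanguages: ["en", "de"],
    ocrServerUrl: "http://localhost:8080/ocr",
  }),
);
```

Because the two backends typically expect different code families, callers that switch between them need to change the language codes as well as the URL.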
Host dependencies
This package relies on LiteParse for local parsing and conversion. Depending on the input format, you may need additional host tools installed.
The tools perform lightweight preflight checks for the most common host dependencies and also forward LiteParse's original error messages when conversion fails.
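A preflight check of this kind might look roughly like the sketch below, which maps a file extension to the host tool it probably needs. This is an illustration, not the package's actual logic: requiredHostTool is a hypothetical name, and the extension-to-tool mapping is an assumption derived from the "Supported inputs" list and the LibreOffice and ImageMagick descriptions in this README.

```typescript
type HostTool = "libreoffice" | "imagemagick";

// Assumed mapping: office-style formats go through LibreOffice.
const OFFICE_EXTENSIONS = [
  "doc", "docx", "docm", "odt", "rtf", "pages",
  "ppt", "pptx", "pptm", "odp", "key",
  "xls", "xlsx", "xlsm", "ods", "csv", "tsv", "numbers",
];

// Assumed mapping: image formats go through ImageMagick.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp", "svg"];

// Hypothetical helper: returns the tool a file likely depends on, or null
// when no extra host tool is expected (for example, PDF input).
function requiredHostTool(filePath: string): HostTool | null {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return null;
  const ext = filePath.slice(dot + 1).toLowerCase();
  if (OFFICE_EXTENSIONS.indexOf(ext) !== -1) return "libreoffice";
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) return "imagemagick";
  return null;
}

console.log(requiredHostTool("./slides.pptx"));
console.log(requiredHostTool("./scans/page.PNG"));
console.log(requiredHostTool("./docs/contract.pdf"));
```

A check like this only predicts which tool is needed; confirming that the tool is actually installed would require probing the host, which is left out here.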
LibreOffice
Needed for many Office document, presentation, and spreadsheet conversion paths.
# macOS
brew install --cask libreoffice
# Ubuntu / Debian
apt-get install libreoffice
# Windows
choco install libreoffice-fresh
ImageMagick
Needed for image-to-PDF conversion paths.
# macOS
brew install imagemagick
# Ubuntu / Debian
apt-get install imagemagick
# Windows
choco install imagemagick.app
Doctor command
If parsing fails because a host dependency is missing, the extension points users to:
/docparser:doctor
Run it inside pi to:
- detect the current operating system
- check whether LibreOffice and ImageMagick are available
- optionally focus the check on a specific file path
- suggest install commands for the current machine
- optionally attempt those install commands after user confirmation when that looks safe to automate
Examples:
/docparser:doctor
/docparser:doctor @./slides.pptx
Known limitations
- OCR quality depends on scan quality, page layout, and the chosen OCR language.
- Some conversion paths depend on external host tools.
- Full parse and screenshot outputs are written to temporary files by default, not directly into your repository.
- Native LiteParse v2 npm packages are platform-specific; unsupported platforms may need upstream LiteParse support first.
Third-party dependency: LiteParse
This package depends on:
- @llamaindex/liteparse
  - license: Apache-2.0
  - purpose: local document parsing, OCR, screenshots, search, and conversion support
LiteParse itself documents its own upstream dependencies and platform requirements. See:
- repository: https://github.com/run-llama/liteparse
- npm package: https://www.npmjs.com/package/@llamaindex/liteparse
- docs: https://developers.llamaindex.ai/liteparse/
Additional attribution details are listed in THIRD_PARTY_NOTICES.md.
Changelog
See CHANGELOG.md.
License
This package is licensed under the MIT License. See LICENSE.
