office-md
v0.1.6
Native Node/Bun bindings for OfficeMD document extraction
Fast Office document extraction for LLMs and agents. Native Node/Bun bindings for converting DOCX, XLSX, CSV, PPTX, and PDF into clean markdown, structured JSON IR, and Docling output.
Install
npm install office-md
# or
bun add office-md
For a one-shot run without installing:
npx office-md markdown report.docx
bunx office-md markdown report.docx
CLI
office-md markdown report.docx
office-md markdown budget.xlsx --sheets "Summary,Q1"
office-md render report.docx
office-md diff old.docx new.docx
office-md diff old.docx new.docx --html
SDK
import { readFileSync } from "node:fs";
import { markdownFromBytes, extractIrJson, doclingFromBytes } from "office-md";
const content = readFileSync("report.docx");
// Markdown
console.log(markdownFromBytes(content, "docx"));
// Structured JSON IR
console.log(extractIrJson(content, "docx"));
// Docling JSON
console.log(doclingFromBytes(content, "docx"));
API
- detectFormat(content) - detect document format from bytes
- extractIrJson(content, format?) - extract intermediate representation as JSON
- markdownFromBytes(content, format?, options...) - render as markdown
- markdownFromBytesBatch(contents, format?, workers?, options...) - parallel markdown rendering
- extractSheetNames(content) - list XLSX sheet names
- extractTablesIrJson(content, options...) - extract XLSX table data as JSON
- extractCsvTablesIrJson(content, options...) - extract CSV table data as JSON
- inspectPdfJson(content) - PDF diagnostics as JSON
- inspectPdfFontsJson(content) - PDF font information as JSON
- doclingFromBytes(content, format?) - convert to Docling JSON
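
Several of the functions above take an optional format argument. When the file name is known, one option is to derive that hint from the extension instead of relying on byte detection. The following is a minimal sketch under stated assumptions, not part of office-md: formatHintFromPath, renderFile, and the MarkdownRenderer type are hypothetical names, and only the "docx" format string is confirmed by the SDK example; the other strings are assumed to mirror the file extensions.

```typescript
import { extname } from "node:path";

// Extension-to-format map. Only "docx" appears in the SDK example; the other
// format strings are ASSUMED to match the file extension.
const FORMAT_BY_EXTENSION: Record<string, string> = {
  ".docx": "docx",
  ".xlsx": "xlsx",
  ".csv": "csv",
  ".pptx": "pptx",
  ".pdf": "pdf",
};

// Hypothetical helper: returns a format hint for a path, or undefined so the
// caller can leave the optional format argument unset.
function formatHintFromPath(path: string): string | undefined {
  return FORMAT_BY_EXTENSION[extname(path).toLowerCase()];
}

// Assumed shape of a renderer such as markdownFromBytes(content, format?).
type MarkdownRenderer = (content: Uint8Array, format?: string) => string;

// Hypothetical wrapper: the renderer is passed in, so this file does not
// import office-md and can be exercised with a stub.
function renderFile(
  path: string,
  content: Uint8Array,
  render: MarkdownRenderer,
): string {
  return render(content, formatHintFromPath(path));
}
```

In an application, the renderer passed to renderFile would presumably be markdownFromBytes and the content would come from readFileSync, as in the SDK example. Whether the real signature is assignable to the MarkdownRenderer type above depends on the package's type definitions.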
Supported Formats
| Format     | Extension | Markdown | JSON IR | Docling |
| ---------- | --------- | -------- | ------- | ------- |
| Word       | .docx     | yes      | yes     | yes     |
| Excel      | .xlsx     | yes      | yes     | yes     |
| CSV        | .csv      | yes      | yes     | -       |
| PowerPoint | .pptx     | yes      | yes     | yes     |
| PDF        | .pdf      | yes      | yes     | -       |
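
Because Docling output is not listed for CSV and PDF, calling code may want to check support before choosing an output. The sketch below encodes the table as a lookup. The names Output, SUPPORTED_OUTPUTS, and supportsOutput are hypothetical and not part of office-md, and the format keys are assumed to be the extensions without the leading dot.

```typescript
// Output kinds corresponding to the Markdown, JSON IR, and Docling columns.
type Output = "markdown" | "ir" | "docling";

// Transcribed from the Supported Formats table; keys are ASSUMED format names.
const SUPPORTED_OUTPUTS: Record<string, readonly Output[]> = {
  docx: ["markdown", "ir", "docling"],
  xlsx: ["markdown", "ir", "docling"],
  csv: ["markdown", "ir"],
  pptx: ["markdown", "ir", "docling"],
  pdf: ["markdown", "ir"],
};

// Returns true only when the table lists the output for the given format.
// Unknown formats are reported as unsupported.
function supportsOutput(format: string, output: Output): boolean {
  const outputs = SUPPORTED_OUTPUTS[format];
  return outputs !== undefined && outputs.includes(output);
}
```

A caller could use supportsOutput(format, "docling") to decide whether to request Docling JSON or to use markdown or the JSON IR instead.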
License
MIT
