@akrym1582/office-to-md-mcp

v0.0.2

MCP server to convert Office/PDF documents into images and Markdown

office-to-md-mcp

A TypeScript Model Context Protocol (MCP) server that converts Excel, Word, and PDF documents into PNG page images, structured text, and Markdown — optimised for LLM consumption.


Features

| Tool | Input | Output |
|---|---|---|
| convert_excel_to_images | .xlsx / .xls | PNG images per page |
| convert_word_to_images | .docx / .doc | PNG images per page |
| convert_pdf_to_images | .pdf | PNG images per page |
| extract_excel_text | .xlsx / .xls | Markdown (via image-based conversion) |
| extract_word_text | .docx | Plain text or Markdown |
| get_capabilities | (none) | Runtime dependency status |

extract_excel_text Conversion Pipeline

extract_excel_text converts Excel files to Markdown through the following image-based pipeline:

Excel (.xlsx/.xls)
  → Adjust print area and convert to PDF (Python UNO / LibreOffice)
    → Render PDF pages as PNG images (pdftoppm / ImageMagick)
      → Convert images to Markdown (GitHub Copilot SDK — gpt-5.4-mini)

This approach preserves not only cell data but also shapes, embedded images, and complex layouts with high fidelity.
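The three pipeline stages above can be pictured as a simple composition. The following is a minimal sketch only, not the server's actual code: the `PipelineStages` interface and its method names are hypothetical, and the sketch assumes one model call per page and a `## Page N` heading in front of each page's Markdown.

```typescript
// Hypothetical stage signatures; the server's real internals are not shown in this README.
interface PipelineStages {
  excelToPdf(xlsxPath: string): Promise<string>; // resolves to the PDF path
  pdfToPngs(pdfPath: string, dpi: number): Promise<string[]>; // resolves to PNG paths, one per page
  pngToMarkdown(pngPath: string): Promise<string>; // resolves to Markdown for one page
}

interface ExtractResult {
  content: string;
  images: string[];
  pageCount: number;
}

// Runs the stages in order: Excel -> PDF -> PNG pages -> Markdown.
async function extractExcelMarkdown(
  xlsxPath: string,
  dpi: number,
  stages: PipelineStages,
): Promise<ExtractResult> {
  const pdfPath = await stages.excelToPdf(xlsxPath);
  const images = await stages.pdfToPngs(pdfPath, dpi);

  const pages: string[] = [];
  let pageNumber = 0;
  for (const image of images) {
    pageNumber += 1;
    // Assumed: each page image is converted separately, so cost grows with page count.
    const markdown = await stages.pngToMarkdown(image);
    pages.push(`## Page ${pageNumber}\n\n${markdown}`);
  }

  return { content: pages.join("\n\n"), images, pageCount: images.length };
}
```

Passing the stages in as a parameter keeps the sketch testable without LibreOffice, pdftoppm, or a Copilot token.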

⚠️ GitHub Copilot Premium Requests

extract_excel_text uses GitHub Copilot SDK's gpt-5.4-mini model for image-to-Markdown conversion. Each tool invocation consumes GitHub Copilot Premium Requests. The number of requests increases with the number of pages in the workbook.


Prerequisites

| Dependency | Purpose | Required |
|---|---|---|
| Node.js ≥ 18 | Runtime | ✅ |
| LibreOffice (soffice) | Excel/Word → PDF | For image conversion |
| poppler-utils (pdftoppm) | PDF → PNG | For image conversion |
| Python 3 | Excel UNO helper | For best Excel rendering |
| GITHUB_TOKEN env var | Copilot SDK auth | Required for extract_excel_text |

Install system dependencies (Ubuntu/Debian)

sudo apt-get install -y libreoffice poppler-utils python3

Install system dependencies (macOS)

brew install libreoffice poppler python3

Installation

npm install
npm run build

Running the server

npm start

The server communicates over stdio using the Model Context Protocol.
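Most users will let an MCP client handle the wire format, but it can help to see what a tool call looks like on stdio. MCP messages are JSON-RPC 2.0, and the stdio transport sends one JSON message per line. The sketch below only builds such a line; the helper name `encodeToolCall` is made up for illustration, and a real client must complete the MCP `initialize` handshake before sending any `tools/call` request.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes an MCP tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Builds one newline-terminated message suitable for writing to the server's stdin.
function encodeToolCall(id: number, toolName: string, args: Record<string, unknown>): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  // JSON.stringify without an indent argument escapes newlines inside strings,
  // so the whole message stays on a single line.
  return JSON.stringify(request) + "\n";
}

const encodedCall = encodeToolCall(1, "convert_pdf_to_images", {
  filePath: "/path/to/file.pdf",
  outputDir: "/tmp/output",
  dpi: 150,
});
```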

Environment variables

| Variable | Description |
|---|---|
| GITHUB_TOKEN | GitHub personal access token for Copilot SDK Markdown conversion |
| COPILOT_MODEL | Copilot model to use for image-to-Markdown conversion (default: gpt-5.4-mini) |
| LOG_LEVEL | Log verbosity: debug, info (default), warn, or error |
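As a rough illustration of how these variables and their documented defaults fit together, here is a small sketch. It is not taken from the server's source: `readConfig` and `ServerConfig` are hypothetical names, and treating an unrecognised LOG_LEVEL as `info` is an assumption.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface ServerConfig {
  githubToken: string | undefined;
  copilotModel: string;
  logLevel: LogLevel;
}

// Maps a raw string to a known level; anything else falls back to "info" (assumed behaviour).
function parseLogLevel(value: string | undefined): LogLevel {
  switch (value) {
    case "debug":
      return "debug";
    case "warn":
      return "warn";
    case "error":
      return "error";
    default:
      return "info";
  }
}

// Takes the environment as a plain object so it can be exercised without touching process.env.
function readConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    githubToken: env["GITHUB_TOKEN"] || undefined,
    copilotModel: env["COPILOT_MODEL"] || "gpt-5.4-mini",
    logLevel: parseLogLevel(env["LOG_LEVEL"]),
  };
}
```

In a Node.js process this would typically be called as `readConfig(process.env)`.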


MCP Tool Reference

convert_excel_to_images

Converts an Excel workbook to PNG page images.
Uses the Python UNO helper (python/excel_to_pdf_uno.py) for accurate print-area handling when Python is available; falls back to LibreOffice CLI otherwise.

{
  "filePath": "/path/to/file.xlsx",
  "outputDir": "/tmp/output",
  "dpi": 150,
  "sheetNames": ["Sheet1"],
  "keepPdf": false
}

Response:

{
  "sourceType": "excel",
  "images": ["/tmp/output/page-1.png"],
  "pageCount": 1,
  "renderStrategy": "libreoffice-uno-print-area"
}
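The UNO-versus-CLI fallback described for this tool can be sketched as a small decision function. This is an illustration under assumptions, not the server's implementation: the `libreoffice-uno-print-area` value comes from the response above, while `libreoffice-cli` is a hypothetical label for the fallback path and `chooseExcelStrategy` is a made-up name.

```typescript
// Subset of the get_capabilities fields relevant to Excel -> PDF conversion.
interface ExcelCapabilities {
  libreOffice: boolean;
  python: boolean;
  unoHelper: boolean;
}

// "libreoffice-cli" is a hypothetical name; only the UNO value appears in the README.
type RenderStrategy = "libreoffice-uno-print-area" | "libreoffice-cli";

function chooseExcelStrategy(caps: ExcelCapabilities): RenderStrategy {
  if (!caps.libreOffice) {
    // Both paths need LibreOffice, so neither strategy is possible without it.
    throw new Error("LIBREOFFICE_NOT_FOUND");
  }
  // Prefer the UNO helper when Python and the helper script are both available.
  return caps.python && caps.unoHelper ? "libreoffice-uno-print-area" : "libreoffice-cli";
}
```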

convert_word_to_images

Converts a Word document to PNG page images via LibreOffice.

{
  "filePath": "/path/to/file.docx",
  "outputDir": "/tmp/output",
  "dpi": 150,
  "keepPdf": false
}

convert_pdf_to_images

Renders each PDF page as a PNG image.

{
  "filePath": "/path/to/file.pdf",
  "outputDir": "/tmp/output",
  "dpi": 150
}
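For readers curious how the `dpi` and `outputDir` parameters might reach pdftoppm, the sketch below builds a plausible argument list. The server's actual invocation is not documented here, so treat this as an assumption; `pdftoppmArgs` is a hypothetical helper, and the `page` prefix is chosen only because it is consistent with the `page-1.png` file name shown in the Excel response.

```typescript
// Builds arguments for: pdftoppm -png -r <dpi> <pdf> <outputDir>/page
function pdftoppmArgs(pdfPath: string, outputDir: string, dpi: number): string[] {
  // Drop trailing slashes so the output prefix joins cleanly.
  const prefix = outputDir.replace(/\/+$/, "") + "/page";
  // -png selects PNG output; -r sets the resolution in DPI.
  return ["-png", "-r", String(dpi), pdfPath, prefix];
}
```

The resulting array could be passed to `execFile("pdftoppm", args)` from `node:child_process`.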

extract_excel_text

Converts an Excel workbook to Markdown via an image-based pipeline (Excel → print-area adjustment → PDF → images → Markdown). Handles shapes, embedded images, and complex layouts. Requires GITHUB_TOKEN.

{
  "filePath": "/path/to/file.xlsx",
  "dpi": 150,
  "sheetNames": ["Sheet1"]
}

Response:

{
  "sourceType": "excel",
  "textFormat": "markdown",
  "content": "## Page 1\n\n| Name | Age |\n| --- | --- |\n| Alice | 30 |",
  "images": ["/tmp/excel-images-xxx/page-1.png"],
  "pageCount": 1
}

Image-to-Markdown conversion uses GitHub Copilot SDK (default model: gpt-5.4-mini) and consumes Premium Requests.
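A client that wants per-page Markdown can split the `content` field on its page headings. The sketch below assumes every page starts with a `## Page N` heading, as in the example response; that format is inferred from the single example and is not a documented guarantee. `splitPages` and `PageSection` are hypothetical names.

```typescript
interface PageSection {
  page: number;
  markdown: string;
}

// Splits content into sections at lines of the form "## Page N".
function splitPages(content: string): PageSection[] {
  const sections: PageSection[] = [];
  let page = 0; // 0 means no page heading has been seen yet
  let body: string[] = [];

  // Stores the section collected so far, if any.
  const flush = (): void => {
    if (page > 0) {
      sections.push({ page, markdown: body.join("\n").trim() });
    }
  };

  for (const line of content.split("\n")) {
    const match = /^## Page (\d+)\s*$/.exec(line);
    if (match) {
      flush();
      page = Number(match[1]);
      body = [];
    } else if (page > 0) {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```

Text that appears before the first page heading is ignored by this sketch.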


extract_word_text

Extracts text from a .docx file using mammoth.

{
  "filePath": "/path/to/file.docx",
  "format": "markdown"
}

get_capabilities

Returns the runtime status of all system dependencies.

{}

Example response:

{
  "libreOffice": true,
  "libreOfficePath": "/usr/bin/soffice",
  "python": true,
  "pythonPath": "/usr/bin/python3",
  "pythonVersion": "Python 3.12.3",
  "unoHelper": true,
  "pdfRenderer": true,
  "pdfRendererTool": "pdftoppm",
  "githubToken": false
}
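A client could use this response to decide which tools are worth calling. The mapping below is a sketch inferred from the Prerequisites table, not logic confirmed by the server: `availableTools` and `DependencyStatus` are hypothetical names, and it assumes extract_word_text needs no system dependencies because it relies on mammoth.

```typescript
// Subset of the get_capabilities response used for the decision.
interface DependencyStatus {
  libreOffice: boolean;
  pdfRenderer: boolean;
  githubToken: boolean;
}

function availableTools(status: DependencyStatus): string[] {
  // Assumed to work without any system dependency.
  const tools = ["get_capabilities", "extract_word_text"];

  if (status.pdfRenderer) {
    tools.push("convert_pdf_to_images");
  }
  if (status.libreOffice && status.pdfRenderer) {
    // Office files go through LibreOffice to PDF, then through the PDF renderer.
    tools.push("convert_excel_to_images", "convert_word_to_images");
    if (status.githubToken) {
      // The Markdown step additionally needs Copilot SDK authentication.
      tools.push("extract_excel_text");
    }
  }
  return tools;
}
```

With the example response above (`githubToken` is false), every tool except extract_excel_text would be reported as usable.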

Project Structure

.
├── src/
│   ├── server.ts                      # MCP server entry point
│   ├── tools/                         # MCP tool implementations
│   │   ├── convertExcelToImages.ts
│   │   ├── convertWordToImages.ts
│   │   ├── convertPdfToImages.ts
│   │   └── extractExcelText.ts
│   ├── services/                      # Business logic / external integrations
│   │   ├── capabilityDetector.ts
│   │   ├── copilotCli.ts
│   │   ├── excelExtractor.ts
│   │   ├── fileType.ts
│   │   ├── libreOfficeCli.ts
│   │   ├── officePythonBridge.ts
│   │   ├── pdfRenderer.ts
│   │   ├── tempFiles.ts
│   │   └── wordExtractor.ts
│   ├── types/
│   │   ├── errors.ts                  # AppError + ErrorCode enum
│   │   └── toolSchemas.ts             # Zod schemas for all tools
│   └── utils/
│       ├── exec.ts                    # Subprocess wrapper with timeouts
│       ├── fs.ts                      # File system helpers
│       └── logger.ts                  # Stderr logger
├── python/
│   └── excel_to_pdf_uno.py            # LibreOffice UNO helper for Excel→PDF
├── test/
│   ├── fixtures/                      # Sample .xlsx, .docx, .pdf files
│   └── unit/                          # Unit tests
├── package.json
├── tsconfig.json
└── jest.config.js

Development

# Type-check without emitting
npm run typecheck

# Build
npm run build

# Run tests
npm test

# Lint
npm run lint

Error Codes

| Code | Meaning |
|---|---|
| FILE_NOT_FOUND | Input file does not exist |
| UNSUPPORTED_FORMAT | File extension not supported |
| LIBREOFFICE_NOT_FOUND | soffice not on PATH |
| PYTHON_NOT_FOUND | Python interpreter not found |
| LIBREOFFICE_UNO_CONVERSION_FAILED | Python UNO helper failed |
| LIBREOFFICE_CLI_CONVERSION_FAILED | LibreOffice CLI conversion failed |
| PDF_RENDER_TOOL_NOT_FOUND | pdftoppm/convert not on PATH |
| PDF_RENDER_FAILED | PDF rendering failed |
| EXCEL_TEXT_EXTRACTION_FAILED | ExcelJS read failure |
| WORD_TEXT_EXTRACTION_FAILED | mammoth extraction failure |
| GITHUB_TOKEN_MISSING | GITHUB_TOKEN env var not set |
| COPILOT_MARKDOWN_FAILED | Copilot CLI returned an error |
| INVALID_TOOL_INPUT | Zod schema validation failed |


Troubleshooting

LibreOffice not found
Install LibreOffice and ensure soffice is on your PATH.

pdftoppm not found
Install poppler-utils (apt-get install poppler-utils or brew install poppler).

Copilot SDK unavailable
Set GITHUB_TOKEN in your environment. The model used can be customised via the COPILOT_MODEL environment variable (default: gpt-5.4-mini).

Excel conversion uses LibreOffice CLI instead of UNO
Python 3 must be on PATH and python/excel_to_pdf_uno.py must exist alongside the server. Run get_capabilities to confirm.
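On the client side, the error codes and the fixes above could be tied together with a small lookup. This is only a sketch: `remediationFor` is a hypothetical helper, how error codes surface in MCP responses is not documented here, and linking PYTHON_NOT_FOUND to the PATH advice is an inference from the last troubleshooting entry.

```typescript
// Returns a suggested fix for a known error code, or null when none is documented.
function remediationFor(code: string): string | null {
  switch (code) {
    case "LIBREOFFICE_NOT_FOUND":
      return "Install LibreOffice and ensure soffice is on your PATH.";
    case "PDF_RENDER_TOOL_NOT_FOUND":
      return "Install poppler-utils (apt-get install poppler-utils or brew install poppler).";
    case "GITHUB_TOKEN_MISSING":
      return "Set GITHUB_TOKEN in your environment.";
    case "PYTHON_NOT_FOUND":
      // Inferred from the UNO troubleshooting entry, not stated for this code directly.
      return "Ensure Python 3 is on your PATH, then run get_capabilities to confirm.";
    default:
      return null;
  }
}
```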


License

MIT