© 2026 – Pkg Stats / Ryan Hefner

@aidalinfo/office-to-markdown

v1.0.2

Modern TypeScript library for converting Office documents (DOCX) to Markdown format, optimized for Bun runtime with enhanced table support and math equation conversion.

📄 @aidalinfo/office-to-markdown

A modern TypeScript library for converting Office documents (DOCX) to Markdown format, optimized for the Bun ecosystem with advanced support for mathematical equations and tables.

🔬 Created by reverse engineering Microsoft's MarkItDown: a TypeScript reimplementation that brings the Python library's document conversion capabilities to the JavaScript/Bun ecosystem, with a focus on performance and type safety.

🚀 Features

  • DOCX to Markdown conversion with structure preservation
  • Mathematical equation support (OMML → LaTeX)
  • Table handling with automatic formatting
  • Style preservation (bold, italic, headings)
  • Image processing with alt text
  • Simple and advanced API for different use cases
  • Robust error handling with specific error codes
  • Optimized performance with Bun runtime
  • Complete TypeScript types for better DX

📦 Installation

With Bun (recommended)

bun add @aidalinfo/office-to-markdown

With npm/yarn/pnpm

npm install @aidalinfo/office-to-markdown
# or
yarn add @aidalinfo/office-to-markdown
# or  
pnpm add @aidalinfo/office-to-markdown

Required Dependencies

The following dependencies are automatically installed:

  • mammoth - DOCX to HTML conversion
  • turndown - HTML to Markdown conversion
  • jszip - ZIP archive manipulation (DOCX)

🛠️ Conversion Workflow

The conversion process follows these steps:

  1. File Detection - MIME type and extension verification
  2. Preprocessing - DOCX content extraction and modification
  3. Math Processing - OMML → LaTeX conversion
  4. Main Conversion - DOCX → HTML via mammoth
  5. Post-processing - HTML → Markdown with custom rules
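Step 1 (File Detection) can be illustrated with a small standalone check. A DOCX file is a ZIP archive, so its first four bytes are the ZIP local-file-header signature `PK\x03\x04`. The sketch below combines that with an extension check. `hasZipSignature` and `looksLikeDocx` are hypothetical helpers, not part of the package's API, and a complete detector would also read `[Content_Types].xml`, because XLSX and PPTX files share the same ZIP signature.

```typescript
// Hypothetical pre-check, not part of @aidalinfo/office-to-markdown.
// DOCX files are ZIP archives, which start with "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

function hasZipSignature(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_SIGNATURE.length) return false;
  return ZIP_SIGNATURE.every((b, i) => bytes[i] === b);
}

function looksLikeDocx(fileName: string, bytes: Uint8Array): boolean {
  // Case-insensitive extension check: "Report.DOCX" passes.
  const hasDocxExtension = fileName.toLowerCase().endsWith(".docx");
  // The signature alone cannot distinguish DOCX from XLSX or PPTX;
  // that requires reading [Content_Types].xml inside the archive.
  return hasDocxExtension && hasZipSignature(bytes);
}
```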

🎯 Simple Usage

Basic Conversion

import { docxToMarkdown } from '@aidalinfo/office-to-markdown';

// Simple file conversion
const markdown = await docxToMarkdown('./document.docx');
console.log(markdown);

Advanced API

import { OfficeToMarkdown } from '@aidalinfo/office-to-markdown';

const converter = new OfficeToMarkdown({
  headingStyle: 'atx',           // ATX headings (# Title, ## Section)
  preserveTables: true,          // Preserve tables
  convertMath: true,             // Convert equations to LaTeX
});

// Conversion with options
const result = await converter.convertDocx('./document.docx');
console.log('Title:', result.title);
console.log('Content:', result.markdown);

Conversion from Different Sources

import { OfficeToMarkdown } from '@aidalinfo/office-to-markdown';

const converter = new OfficeToMarkdown();

// From file path
const result1 = await converter.convert('./document.docx');

// From an ArrayBuffer (Bun.file().arrayBuffer() returns an ArrayBuffer, not a Node.js Buffer)
const buffer = await Bun.file('./document.docx').arrayBuffer();
const result2 = await converter.convert(buffer);

// From Bun file
const file = Bun.file('./document.docx');
const result3 = await converter.convert(file);

// Batch processing
const results = await converter.convertMultiple([
  './doc1.docx',
  './doc2.docx',
  buffer
]);
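`convert` accepts a file path, binary data, or a Bun file, and `convertMultiple` takes a mix of them. A common way to support that is to normalize every input to bytes before converting. The sketch below is an assumption about that approach, not the package's code: `ConvertInput`, `toBytes` and `toBytesMany` are hypothetical names, and Bun's `BunFile` is left out so the example also runs on Node.js.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical input type: a file path or raw binary data.
type ConvertInput = string | ArrayBuffer | Uint8Array;

async function toBytes(input: ConvertInput): Promise<Uint8Array> {
  if (typeof input === "string") {
    // Treat strings as file paths; readFile resolves to a Buffer,
    // which is a Uint8Array subclass.
    return readFile(input);
  }
  if (input instanceof Uint8Array) return input; // Buffer or Uint8Array
  return new Uint8Array(input); // ArrayBuffer: wrap it without copying
}

// Batch sketch: normalize all inputs concurrently. allSettled keeps one
// unreadable file from rejecting the whole batch.
async function toBytesMany(inputs: ConvertInput[]) {
  return Promise.allSettled(inputs.map(toBytes));
}
```

Using `Promise.allSettled` is a design choice: a missing or unreadable file shows up as a `rejected` entry instead of failing the whole batch, which may differ from how `convertMultiple` handles a failing entry.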

⚙️ Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| headingStyle | 'atx' \| 'setext' | 'atx' | Markdown heading style |
| preserveTables | boolean | true | Preserve tables |
| convertMath | boolean | true | Convert mathematical equations |
| styleMap | string | - | Custom style mapping passed to mammoth |
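To make the `headingStyle` option concrete: ATX headings prefix the text with one `#` per level, while setext headings underline it with `=` (level 1) or `-` (level 2). Setext has no syntax for levels 3 to 6, so converters such as turndown fall back to ATX there. The function below is a standalone illustration of the two styles; `renderHeading` is a hypothetical name, not the package's implementation.

```typescript
type HeadingStyle = "atx" | "setext";

// Render one heading in the chosen style (illustration only).
function renderHeading(text: string, level: number, style: HeadingStyle): string {
  if (level < 1 || level > 6) throw new RangeError(`invalid heading level ${level}`);
  if (style === "setext" && level <= 2) {
    // Setext: underline as long as the heading text.
    const underline = (level === 1 ? "=" : "-").repeat(text.length);
    return `${text}\n${underline}`;
  }
  // ATX, also used as the fallback for setext levels 3 to 6.
  return `${"#".repeat(level)} ${text}`;
}
```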

🔧 Technical Architecture

Module Structure

src/
├── converters/              # Document converters
│   ├── base-converter.ts    # Abstract base class
│   └── docx-converter.ts    # Specialized DOCX converter
├── preprocessing/           # Preliminary processing
│   └── docx-preprocessor.ts # DOCX preprocessing (math)
├── math/                    # Mathematical processing
│   └── omml-processor.ts    # OMML → LaTeX converter
├── utils/                   # Utilities
│   ├── html-to-markdown.ts  # HTML → Markdown conversion
│   ├── file-detector.ts     # File type detection
│   └── error-handler.ts     # Error handling
└── types/                   # TypeScript definitions
    ├── converter.ts         # Converter types
    ├── result.ts            # Result types
    └── stream-info.ts       # File info types
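The `converters/` layout (an abstract `base-converter.ts` plus a format-specific `docx-converter.ts`) suggests a registry of converters that each declare which files they accept, a structure that would also leave room for the PPTX and XLSX support listed in the Roadmap. The following is a hedged sketch of that pattern; `StreamInfo`, `BaseConverter`, `accepts` and `pickConverter` are hypothetical names here, and the real classes may differ.

```typescript
// Minimal file description passed to converters (hypothetical shape).
interface StreamInfo {
  extension: string; // e.g. ".docx"
  mimetype?: string;
}

abstract class BaseConverter {
  // Return true if this converter can handle the given file.
  abstract accepts(info: StreamInfo): boolean;
  // Convert raw bytes to Markdown.
  abstract convert(bytes: Uint8Array): Promise<string>;
}

class DocxConverter extends BaseConverter {
  accepts(info: StreamInfo): boolean {
    return info.extension.toLowerCase() === ".docx";
  }
  async convert(bytes: Uint8Array): Promise<string> {
    // Placeholder: a real converter would run preprocessing,
    // mammoth and turndown here.
    return `<!-- ${bytes.length} bytes of DOCX -->`;
  }
}

// Pick the first registered converter that accepts the file.
function pickConverter(
  converters: BaseConverter[],
  info: StreamInfo,
): BaseConverter | undefined {
  return converters.find((c) => c.accepts(info));
}
```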

Mathematical Equation Handling

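The equation conversion follows this process (the OMML is simplified: in a real document, text such as 1 is wrapped in a run, <m:r><m:t>1</m:t></m:r>):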

// OMML (Office Math Markup Language)
<m:f>
  <m:num>1</m:num>
  <m:den>2</m:den>
</m:f>

// ↓ Preprocessing

<w:r><w:t>$\frac{1}{2}$</w:t></w:r>

// ↓ Mammoth (HTML)

<p>$\frac{1}{2}$</p>

// ↓ Turndown (Markdown)

$\frac{1}{2}$
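One detail the preprocessing step has to get right: the LaTeX string is written back into WordprocessingML, so characters that are special in XML (`&`, `<`, `>`) must be escaped, or the modified `document.xml` stops being well-formed. LaTeX uses `&` for alignment and `<` in inequalities, so this comes up in practice. The helpers below are a hypothetical sketch of building the replacement run shown in the Preprocessing step; `escapeXmlText` and `latexToWordRun` are not names from the package.

```typescript
// Escape the characters that are significant in XML text content.
function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first, before other entities are added
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap inline LaTeX in a Word run, delimited by $...$.
function latexToWordRun(latex: string): string {
  return `<w:r><w:t>$${escapeXmlText(latex)}$</w:t></w:r>`;
}
```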

Supported Mathematical Elements

| OMML | LaTeX | Description |
|------|-------|-------------|
| <m:f> | \frac{}{} | Fractions |
| <m:sSup> | ^{} | Exponents |
| <m:sSub> | _{} | Subscripts |
| <m:rad> | \sqrt{} | Square roots |
| <m:rad><m:deg> | \sqrt[]{} | Nth roots |
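The mapping in the table is recursive: each OMML element wraps the converted LaTeX of its argument elements. The sketch below covers exactly the five listed elements, using OMML's argument elements `m:num`/`m:den`, `m:e` (base), `m:sup`/`m:sub` and `m:deg`. It works on an already-parsed, simplified tree rather than raw XML; XML parsing is omitted, and `OmmlNode`, `child`, `inner` and `toLatex` are hypothetical names, not the package's `omml-processor.ts`.

```typescript
// Simplified OMML tree: tag name without the "m:" prefix, plus either
// child nodes or literal text (for m:t).
interface OmmlNode {
  tag: string;
  children?: OmmlNode[];
  text?: string;
}

// Find a direct child by tag, e.g. "num" inside "f".
function child(node: OmmlNode, tag: string): OmmlNode | undefined {
  return node.children?.find((c) => c.tag === tag);
}

// Convert all children of a node and concatenate the results.
function inner(node: OmmlNode | undefined): string {
  if (!node) return "";
  return (node.children ?? []).map((c) => toLatex(c)).join("");
}

function toLatex(node: OmmlNode): string {
  switch (node.tag) {
    case "t": // text content of a run
      return node.text ?? "";
    case "f": // fraction
      return `\\frac{${inner(child(node, "num"))}}{${inner(child(node, "den"))}}`;
    case "sSup": // exponent; the base is braced to keep multi-token bases grouped
      return `{${inner(child(node, "e"))}}^{${inner(child(node, "sup"))}}`;
    case "sSub": // subscript
      return `{${inner(child(node, "e"))}}_{${inner(child(node, "sub"))}}`;
    case "rad": { // root; an empty or missing m:deg means a square root
      const degree = inner(child(node, "deg"));
      const radicand = inner(child(node, "e"));
      return degree ? `\\sqrt[${degree}]{${radicand}}` : `\\sqrt{${radicand}}`;
    }
    default: // containers such as m:oMath or m:r: recurse into children
      return inner(node);
  }
}
```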

🎨 Advanced Usage Examples

Error Handling

import { 
  OfficeToMarkdown, 
  FileConversionException, 
  UnsupportedFormatException 
} from '@aidalinfo/office-to-markdown';

async function convertSafely(filePath: string) {
  try {
    const converter = new OfficeToMarkdown();
    const result = await converter.convertDocx(filePath);
    return result.markdown;
  } catch (error) {
    if (error instanceof UnsupportedFormatException) {
      console.error('Unsupported format:', error.message);
    } else if (error instanceof FileConversionException) {
      console.error('Conversion error:', error.message);
    } else {
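      // In strict TypeScript the catch variable is `unknown`, so narrow it first
      console.error('Unexpected error:', error instanceof Error ? error.message : error);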
    }
    throw error;
  }
}
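If you define your own error types around the converter (for example, in an application that wraps it), two TypeScript details help: a `code` field gives callers a stable value to switch on, and a helper that accepts `unknown` avoids the unsafe `error.message` access on non-`Error` values. The classes below are hypothetical and are not the package's exported `FileConversionException` or `UnsupportedFormatException`.

```typescript
// Hypothetical error hierarchy with machine-readable codes.
class ConversionError extends Error {
  constructor(message: string, readonly code: string) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name; // "ConversionError" or the subclass name
  }
}

class UnsupportedFormatError extends ConversionError {
  constructor(extension: string) {
    super(`Unsupported format: ${extension}`, "UNSUPPORTED_FORMAT");
  }
}

// Map any caught value to a stable code for logging or HTTP responses.
function errorCode(error: unknown): string {
  if (error instanceof ConversionError) return error.code; // covers subclasses
  if (error instanceof Error) return "UNEXPECTED";
  return "NON_ERROR_THROWN";
}
```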

Capability Checking

import { OfficeToMarkdown } from '@aidalinfo/office-to-markdown';

const converter = new OfficeToMarkdown();

// Check supported types
const info = converter.getSupportedTypes();
console.log('Extensions:', info.extensions); // ['.docx']
console.log('MIME types:', info.mimeTypes);

// Check if a file is supported
const isSupported = await converter.isSupported('./document.pdf');
console.log('PDF supported:', isSupported); // false

// Get file information
const fileInfo = await converter.getFileInfo('./document.docx');
console.log('MIME type:', fileInfo.mimetype);
console.log('Supported:', fileInfo.supported);
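`getFileInfo` reports a MIME type and a supported flag. For DOCX the registered MIME type is `application/vnd.openxmlformats-officedocument.wordprocessingml.document`. A minimal extension-based lookup, shown only to illustrate the idea (`KNOWN_TYPES` and `describeFile` are hypothetical names, not the package's implementation), could look like this:

```typescript
interface FileDescription {
  mimetype: string;
  supported: boolean;
}

// Illustrative table; only .docx is marked convertible, matching the
// supported extensions reported above.
const KNOWN_TYPES: Record<string, FileDescription> = {
  ".docx": {
    mimetype: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    supported: true,
  },
  ".pdf": { mimetype: "application/pdf", supported: false },
};

function describeFile(path: string): FileDescription {
  // Look only at the last path segment so dots in folder names are ignored.
  const name = path.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  // dot > 0: dotfiles such as ".gitignore" are treated as having no extension.
  const ext = dot > 0 ? name.slice(dot).toLowerCase() : "";
  // Unknown extensions fall back to the generic binary MIME type.
  return KNOWN_TYPES[ext] ?? { mimetype: "application/octet-stream", supported: false };
}
```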

Usage with Node.js

import { readFile } from 'fs/promises';
import { OfficeToMarkdown } from '@aidalinfo/office-to-markdown';

// From Node.js Buffer
const buffer = await readFile('./document.docx');
const converter = new OfficeToMarkdown();
const result = await converter.convert(buffer);

console.log(result.markdown);
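The example passes a Node.js `Buffer` straight to `convert`. If some other step in your pipeline needs a plain `ArrayBuffer` instead (the kind `Bun.file(...).arrayBuffer()` produces), avoid passing `buffer.buffer` as-is: small Buffers can be views into a larger shared pool, so the underlying `ArrayBuffer` may contain unrelated bytes. The hypothetical helper below copies exactly the viewed range.

```typescript
// Copy exactly the bytes a Uint8Array (or Node.js Buffer) views into a
// fresh ArrayBuffer. view.buffer alone may be a larger, shared allocation.
function toExactArrayBuffer(view: Uint8Array): ArrayBuffer {
  const out = new ArrayBuffer(view.byteLength);
  // set() copies only the range byteOffset .. byteOffset + byteLength.
  new Uint8Array(out).set(view);
  return out;
}
```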

🧪 Testing and Validation

Test Results

  • ✅ HTML → Markdown conversion with tables
  • ✅ File type detection (DOCX vs others)
  • ✅ OMML → LaTeX mathematical conversion
  • ✅ Error handling with specific codes
  • ✅ Complete pipeline tested with real documents

Performance

  • Speed: ~80 ms for a typical 7 KB document (timings vary with hardware, runtime and document content)
  • Fidelity: Complete preservation of structure and content
  • Robustness: Graceful error handling with fallbacks
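To check the speed figure against your own documents, a small timing harness helps. The sketch below reports the median of several runs, which is less sensitive than the mean to a slow first run (module loading, JIT warm-up). `measureMedianMs` is a hypothetical helper; the task you pass in is up to you.

```typescript
// Run an async task `runs` times and return the median duration in ms.
async function measureMedianMs(task: () => Promise<unknown>, runs = 10): Promise<number> {
  if (runs < 1) throw new RangeError("runs must be >= 1");
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await task();
    durations.push(performance.now() - start);
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  // Even number of runs: average the two middle values.
  return durations.length % 2 === 0
    ? (durations[mid - 1] + durations[mid]) / 2
    : durations[mid];
}
```

For example: `await measureMedianMs(() => docxToMarkdown('./document.docx'))`.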

🔧 Development

Prerequisites

  • Bun >= 1.2.0 (recommended) or Node.js >= 20.0.0
  • TypeScript >= 4.5.0

Development Installation

git clone https://github.com/aidalinfo/extract-kit.git
cd extract-kit/packages/office-to-markdown
bun install

Available Scripts

bun run build          # Complete build (ESM + types)
bun run dev            # Development mode with watch
bun run clean          # Clean dist/ folder

Testing

# Basic functionality test
bun run src/test.ts

# Test with real DOCX file
bun run test-docx.ts "your-file.docx"

🚀 Roadmap

  • [ ] PPT/PPTX format support - Presentation conversion
  • [ ] XLS/XLSX format support - Spreadsheet conversion
  • [ ] Streaming API - Large file streaming processing
  • [ ] Plugin system - Support for custom converters
  • [ ] Web interface - Optional user interface
  • [ ] Embedded image support - Image extraction and conversion
  • [ ] CLI batch mode - Command-line interface

🤝 Contributing

Contributions are welcome! Please see our contribution guide.

Contribution Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Inspired by Microsoft's MarkItDown project
  • Uses mammoth.js for DOCX → HTML conversion
  • Uses turndown for HTML → Markdown conversion
  • Optimized for Bun runtime

📞 Support


@aidalinfo/office-to-markdown

Simple, fast, and reliable DOCX to Markdown conversion