@origints/mammoth

v0.1.1, published 17 days ago

DOCX to HTML conversion for Origins using mammoth.js


fponticelli

DOCX to HTML/text conversion for Origins using mammoth.js.

Why

Word documents are everywhere in enterprise workflows, but extracting their content programmatically is challenging. You need to convert them to a usable format while preserving semantic structure.

This package wraps mammoth.js and exposes it as Origins transforms. Convert DOCX files to clean HTML or plain text, with control over style mapping, image handling, and other conversion options.

Features

Convert DOCX to semantic HTML
Convert DOCX to plain text
Custom style mapping for headings, lists, and more
Configurable image handling
Conversion warnings and messages
Integrates with Origins transform registry

Quick Start

npm install @origints/mammoth @origints/core

import { Planner, loadFile, run, globalRegistry } from "@origints/core";
import { docxToHtml, registerMammothTransforms } from "@origints/mammoth";

registerMammothTransforms(globalRegistry);

const plan = Planner.in(loadFile("document.docx"))
  .mapIn(docxToHtml())
  .emit((out, $) => out.add("html", $.get("html").asString()))
  .compile();

const result = await run(plan, {}, globalRegistry);

if (result.ok) {
  console.log(result.value.html);
}

Expected output:

<h1>Document Title</h1><p>Content here...</p>
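
The expected output above is an HTML fragment rather than a complete page. If you need a standalone file, one option is to wrap the fragment yourself. The sketch below is a minimal illustration; `escapeHtml` and `wrapHtmlFragment` are hypothetical helpers and are not part of @origints/mammoth or mammoth.js.

```typescript
// Hypothetical helpers (not part of @origints/mammoth or mammoth.js).

// Escape characters that are unsafe inside HTML text such as <title>.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap a converted HTML fragment in a minimal standalone document.
// The fragment is inserted as-is; only the title is escaped.
function wrapHtmlFragment(fragment: string, title: string): string {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    '<meta charset="utf-8">',
    `<title>${escapeHtml(title)}</title>`,
    "</head>",
    `<body>${fragment}</body>`,
    "</html>",
  ].join("\n");
}

const page = wrapHtmlFragment(
  "<h1>Document Title</h1><p>Content here...</p>",
  "Q3 & Q4 Report"
);
console.log(page);
```

In a real pipeline the first argument would be the `html` value emitted by the plan.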

Installation

Supported platforms:
- macOS / Linux / Windows
Runtime requirements:
- Node.js >= 18
Package managers:
- npm, pnpm, yarn
Peer dependencies:
- @origints/core ^0.1.0

npm install @origints/mammoth @origints/core
# or
pnpm add @origints/mammoth @origints/core
# or
yarn add @origints/mammoth @origints/core

Usage

Basic HTML conversion

import { Planner, loadFile, globalRegistry } from "@origints/core";
import { docxToHtml, registerMammothTransforms } from "@origints/mammoth";

registerMammothTransforms(globalRegistry);

const plan = Planner.in(loadFile("report.docx"))
  .mapIn(docxToHtml())
  .emit((out, $) => {
    out.add("html", $.get("html").asString()); // converted HTML
    out.add("messages", $.get("messages").asArray()); // conversion warnings and messages
  })
  .compile();
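
Once `messages` has been emitted, it is usually worth checking before trusting the HTML. The sketch below groups messages by severity. It assumes each entry has the mammoth.js message shape (a `type` such as "warning" or "error", plus a `message` string); whether this package passes that shape through unchanged is an assumption. `ConversionMessage`, `MessageSummary`, and `summarizeMessages` are hypothetical names, and the sample data is invented for illustration.

```typescript
// Assumed shape, modeled on mammoth.js messages. Not confirmed for @origints/mammoth.
interface ConversionMessage {
  type: string; // e.g. "warning" or "error"
  message: string;
}

interface MessageSummary {
  warnings: string[];
  errors: string[];
  other: string[];
}

// Hypothetical helper: group message texts by their type.
function summarizeMessages(messages: ConversionMessage[]): MessageSummary {
  const summary: MessageSummary = { warnings: [], errors: [], other: [] };
  for (const m of messages) {
    if (m.type === "warning") {
      summary.warnings.push(m.message);
    } else if (m.type === "error") {
      summary.errors.push(m.message);
    } else {
      summary.other.push(m.message);
    }
  }
  return summary;
}

// Invented sample data, only to show the grouping.
const sampleMessages: ConversionMessage[] = [
  { type: "warning", message: "sample warning A" },
  { type: "warning", message: "sample warning B" },
  { type: "error", message: "sample error" },
];

const messageSummary = summarizeMessages(sampleMessages);
console.log(
  `${messageSummary.warnings.length} warning(s), ${messageSummary.errors.length} error(s)`
);
```

A caller might treat any entry in `errors` as a failed conversion and log `warnings` for later style-map tuning.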

Custom style mapping

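import { Planner, loadFile } from "@origints/core";
import { docxToHtml } from "@origints/mammoth";

// Assumes registerMammothTransforms(globalRegistry) has already been called,
// as in the basic example.
const plan = Planner.in(loadFile("document.docx"))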
  .mapIn(
    docxToHtml({
      styleMap: [
        "p[style-name='Title'] => h1.document-title",
        "p[style-name='Heading 1'] => h1",
        "p[style-name='Heading 2'] => h2",
        "p[style-name='Quote'] => blockquote",
      ],
    })
  )
  .emit((out, $) => out.add("html", $.get("html").asString()))
  .compile();
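
Each style map entry above follows the same pattern: a paragraph matcher on the Word style name, then `=>`, then the target HTML element. When the list grows, it can be easier to generate the entries from a lookup table. The sketch below is one way to do that; `buildParagraphStyleMap` is a hypothetical helper, not part of this package or mammoth.js, and it rejects style names containing a single quote rather than trying to escape them.

```typescript
// Hypothetical helper (not part of @origints/mammoth or mammoth.js).
// Maps Word paragraph style names to target HTML selectors.
type ParagraphStyleTargets = Record<string, string>;

function buildParagraphStyleMap(targets: ParagraphStyleTargets): string[] {
  return Object.keys(targets).map((styleName) => {
    // Keep the sketch simple: refuse names that would break the quoting.
    if (styleName.indexOf("'") !== -1) {
      throw new Error(`Style name must not contain a single quote: ${styleName}`);
    }
    return `p[style-name='${styleName}'] => ${targets[styleName]}`;
  });
}

const generatedStyleMap = buildParagraphStyleMap({
  Title: "h1.document-title",
  "Heading 1": "h1",
  "Heading 2": "h2",
  Quote: "blockquote",
});

console.log(generatedStyleMap.join("\n"));
```

The resulting array has the same form as the hand-written `styleMap` above, so it could be passed as `docxToHtml({ styleMap: generatedStyleMap })`.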

Convert to plain text

import { Planner, loadFile } from "@origints/core";
import { docxToText } from "@origints/mammoth";

const plan = Planner.in(loadFile("document.docx"))
  .mapIn(docxToText())
  .emit((out, $) => out.add("text", $.get("text").asString()))
  .compile();
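
Plain text output is often post-processed into paragraphs for indexing or search. mammoth.js raw text extraction follows each paragraph with two newlines; the sketch below assumes `docxToText` preserves that convention, which is not confirmed here. `splitParagraphs` is a hypothetical helper.

```typescript
// Hypothetical helper (not part of @origints/mammoth or mammoth.js).
// Splits text on blank lines and drops empty or whitespace-only chunks.
function splitParagraphs(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Sample input using the assumed "paragraph + two newlines" layout.
const sampleText = "Document Title\n\nContent here...\n\n";
const paragraphs = splitParagraphs(sampleText);

console.log(paragraphs.length); // 2
console.log(paragraphs[0]); // Document Title
```

In practice the input would be the `text` value emitted by the plan above.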

Image handling options

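import { Planner, loadFile } from "@origints/core";
import { docxToHtml } from "@origints/mammoth";

const plan = Planner.in(loadFile("document.docx"))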
  .mapIn(
    docxToHtml({
      imageHandling: "omit", // or 'base64'
    })
  )
  .emit((out, $) => out.add("html", $.get("html").asString()))
  .compile();
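
Choosing between the two modes is mostly a question of output size. The sketch below inspects converted HTML for inline images and reports their decoded size. It assumes the 'base64' mode embeds images as `data:` URIs in `<img src="...">` attributes, as mammoth.js does by default; that behavior is an assumption for this package. `findInlineImages` and `InlineImage` are hypothetical names, and the sample payload is a placeholder rather than real image data.

```typescript
// Hypothetical helper (not part of @origints/mammoth or mammoth.js).
interface InlineImage {
  mimeType: string;
  decodedBytes: number;
}

// Find <img> tags whose src is a base64 data URI and compute payload size.
function findInlineImages(html: string): InlineImage[] {
  const pattern = /<img\b[^>]*\bsrc="data:([^;"]+);base64,([^"]*)"/g;
  const images: InlineImage[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const payload = match[2];
    // Base64 encodes 3 bytes per 4 characters; "=" padding marks missing bytes.
    const padding = /==$/.test(payload) ? 2 : /=$/.test(payload) ? 1 : 0;
    images.push({
      mimeType: match[1],
      decodedBytes: Math.floor((payload.length * 3) / 4) - padding,
    });
  }
  return images;
}

// Placeholder payload ("aGVsbG8=" decodes to 5 bytes); not a real PNG.
const sampleHtml =
  '<p>Logo:</p><img src="data:image/png;base64,aGVsbG8=" alt="logo" />';
const inlineImages = findInlineImages(sampleHtml);

for (const image of inlineImages) {
  console.log(`${image.mimeType}: ${image.decodedBytes} bytes`);
}
```

If the totals are large, `imageHandling: "omit"` keeps the HTML small at the cost of dropping the images.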

Project Status

Experimental - APIs may change

Non-Goals

Not a DOCX writer/generator
Not a full Word document parser (no access to raw styles, comments, etc.)
Not a PDF converter

Documentation

See @origints/core for Origins concepts
See mammoth.js for conversion details

Contributing

Open an issue before large changes
Keep PRs focused
Tests required for new features

License

MIT
