open-xml-powertools-js
v0.1.2
JavaScript (ESM) port of the **DOCX-focused** parts of Open XML PowerTools, built to run with **zero npm dependencies** in both **browsers** and **Node.js**.
Open-Xml-PowerTools-JS
This project is not affiliated with, endorsed by, or sponsored by Microsoft or the original Open-Xml-PowerTools authors.
This repo currently focuses on:
- DOCX → HTML conversion (structure-first, expanding fidelity over time)
- A small set of DOCX transforms used by the converter (revisions, markup simplification, text replace)
- A minimal HTML (XHTML) → DOCX path for simple content
Status
- The API is usable, but still evolving.
- The implementation is dependency-free by design: ZIP/OPC handling and XML parsing/serialization are implemented in ./src/.
Install / Use
Install from npm:
npm i open-xml-powertools-js
Node.js
import { readFile, writeFile } from "node:fs/promises";
import { WmlDocument, WmlToHtmlConverter } from "open-xml-powertools-js";
const bytes = new Uint8Array(await readFile("input.docx"));
const doc = WmlDocument.fromBytes(bytes, { fileName: "input.docx" });
const { html, warnings } = await WmlToHtmlConverter.convertToHtml(doc, {
  additionalCss: "body { margin: 1cm auto; max-width: 20cm; }",
});
console.log(warnings);
// Optional: save the result. writeFile is used here because a bare `Bun` reference throws in Node.js.
await writeFile("output.html", html);
Node helper entry (optional):
import { readWmlDocument } from "open-xml-powertools-js/node";
import { WmlToHtmlConverter } from "open-xml-powertools-js";
const doc = await readWmlDocument("input.docx");
const { html } = await WmlToHtmlConverter.convertToHtml(doc);
Run tests:
npm test
Browser (ES Modules)
Browsers usually don’t allow ESM imports from file://. Use a local HTTP server:
python3 -m http.server
Then open:
http://localhost:8000/playground.html
Minimal browser example:
<input id="file" type="file" accept=".docx" />
<script type="module">
  import { WmlDocument, WmlToHtmlConverter } from "./node_modules/open-xml-powertools-js/src/index.js";
  document.getElementById("file").addEventListener("change", async (e) => {
    const f = e.target.files[0];
    if (!f) return; // no file selected
    const bytes = new Uint8Array(await f.arrayBuffer());
    const doc = WmlDocument.fromBytes(bytes, { fileName: f.name });
    const { html } = await WmlToHtmlConverter.convertToHtml(doc);
    // Replace the current page with the converted HTML.
    document.open(); document.write(html); document.close();
  });
</script>
API Overview
All public exports come from ./src/index.js:
Documents
- OpenXmlPowerToolsDocument: byte container + type detection
- WmlDocument: DOCX wrapper with convenience methods
import { WmlDocument } from "./src/index.js";
const doc = WmlDocument.fromBytes(uint8Array, { fileName: "input.docx" });
DOCX → HTML
import { WmlToHtmlConverter } from "./src/index.js";
const result = await WmlToHtmlConverter.convertToHtml(doc, {
  pageTitle: "My Document",
  additionalCss: "body { max-width: 20cm; margin: 1cm auto; }",
  // Optional: include comments.xml (renders references + appended section)
  includeComments: false,
  // Optional: render tracked changes (w:ins/w:del) instead of accepting revisions
  // Note: this implies `preprocess.acceptRevisions: false`.
  includeTrackedChanges: false,
  // Optional: customize list markers
  listItemImplementations: {
    default: (_lvlText, levelNumber, _numFmt) => `#${levelNumber}`,
  },
  // Optional: control image output
  // If not set, images are embedded as data URLs.
  imageHandler: null,
  // Optional: include a lightweight XML object for the produced HTML
  output: { format: "xml" }, // also returns `htmlElement`
});
console.log(result.html);
console.log(result.warnings);
Current converter coverage (high-level):
- paragraphs/runs, basic formatting (b/i/u)
- paragraph spacing (margins + basic line-height) and tabs
- basic RTL via w:bidi/w:rtl
- headings via Heading1..Heading6 styles
- hyperlinks (external)
- lists via numbering.xml (basic)
- tables (including gridSpan/vMerge, basic borders, shading/padding, and header rows)
- images (data URLs by default, with basic size hints)
- headers/footers (per-section)
- footnotes/endnotes (references + appended section, rendered as blocks)
- comments (optional, when includeComments: true)
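The listItemImplementations option in the example above takes a function of three arguments that returns the marker text. The sketch below shows one way such a function might look. It assumes that levelNumber is the 1-based counter of the item at its level and that numFmt carries a WordprocessingML w:numFmt value such as "decimal" or "upperRoman"; neither is confirmed by this README. The names toRoman, toLetters, and formatListMarker are hypothetical helpers, not part of the library.

```javascript
// Sketch only: builds a list marker from the three arguments that the
// README's `listItemImplementations.default` example receives.
// Assumption: `levelNumber` is a 1-based item counter and `numFmt` is a
// WordprocessingML w:numFmt value ("decimal", "lowerLetter", ...).

/** Convert a positive integer to an upper-case Roman numeral. */
function toRoman(n) {
  const table = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let out = "";
  for (const [value, glyph] of table) {
    while (n >= value) {
      out += glyph;
      n -= value;
    }
  }
  return out;
}

/** Convert 1 -> "a", 26 -> "z", 27 -> "aa" (bijective base 26). */
function toLetters(n) {
  let out = "";
  while (n > 0) {
    n -= 1;
    out = String.fromCharCode(97 + (n % 26)) + out;
    n = Math.floor(n / 26);
  }
  return out;
}

/** Hypothetical marker function with the same parameter order as the README example. */
function formatListMarker(_lvlText, levelNumber, numFmt) {
  switch (numFmt) {
    case "bullet": return "•";
    case "lowerLetter": return `${toLetters(levelNumber)}.`;
    case "upperLetter": return `${toLetters(levelNumber).toUpperCase()}.`;
    case "lowerRoman": return `${toRoman(levelNumber).toLowerCase()}.`;
    case "upperRoman": return `${toRoman(levelNumber)}.`;
    default: return `${levelNumber}.`; // "decimal" and anything unrecognized
  }
}

// Shape that could be passed as the `listItemImplementations` option.
const listItemImplementations = { default: formatListMarker };
```

Falling back to a plain number for unrecognized formats keeps the output readable even when a document uses a numbering format the function does not know about.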
Transforms (DOCX mutation)
import { MarkupSimplifier, RevisionAccepter, TextReplacer } from "./src/index.js";
const noRevs = await RevisionAccepter.acceptRevisions(doc);
const simplified = await MarkupSimplifier.simplifyMarkup(noRevs, {
  removeComments: true,
  removeContentControls: true,
  removeRsidInfo: true,
  removeGoBackBookmark: true,
});
const replaced = await TextReplacer.searchAndReplace(simplified, "Hello", "Hi", { matchCase: false });
Notes:
RevisionAccepter, MarkupSimplifier, and TextReplacer operate on the main document and common WordprocessingML parts (headers/footers, footnotes/endnotes, comments) when present.
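When several replacements are needed, the three transforms can be chained in one helper. The following is a minimal sketch that only uses the calls shown in the example above (acceptRevisions, simplifyMarkup, searchAndReplace with matchCase). The function name cleanDocument and the replacements array are hypothetical, and the transform objects are passed in as a parameter so the helper has no import of its own.

```javascript
// Sketch: run the transforms in the same order as the README example,
// then apply a list of text replacements one after another.
// `cleanDocument` and the `replacements` shape are hypothetical.
async function cleanDocument(doc, transforms, replacements = []) {
  const { RevisionAccepter, MarkupSimplifier, TextReplacer } = transforms;

  // Accept tracked changes first so later steps work on the final text.
  let current = await RevisionAccepter.acceptRevisions(doc);

  // Same simplification options as in the README example.
  current = await MarkupSimplifier.simplifyMarkup(current, {
    removeComments: true,
    removeContentControls: true,
    removeRsidInfo: true,
    removeGoBackBookmark: true,
  });

  // Each replacement receives the document produced by the previous one.
  for (const { search, replace, matchCase = false } of replacements) {
    current = await TextReplacer.searchAndReplace(current, search, replace, { matchCase });
  }
  return current;
}

// Possible usage, assuming the package is installed:
//   import { MarkupSimplifier, RevisionAccepter, TextReplacer } from "open-xml-powertools-js";
//   const cleaned = await cleanDocument(
//     doc,
//     { MarkupSimplifier, RevisionAccepter, TextReplacer },
//     [{ search: "Hello", replace: "Hi" }],
//   );
```

Passing the transforms in as an argument also makes the helper easy to exercise with stand-in objects in tests.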
HTML (XHTML) → DOCX
This is intentionally minimal and currently expects well-formed XML (XHTML-like).
import { HtmlToWmlConverter } from "./src/index.js";
const xhtml = `<?xml version="1.0"?>
<html><body>
<h1>Title</h1>
<p>Hello <strong>World</strong><br/>Line2</p>
</body></html>`;
const newDoc = await HtmlToWmlConverter.convertHtmlToWml("", "", "", xhtml, {});
Current HTML→DOCX coverage (high-level):
- headings/paragraphs, basic inline formatting (b/i/u/s, sup/sub)
- hyperlinks (external and internal #anchor), bookmark targets via <a id|name>
- lists (ol/ul/li, including ol start)
- tables (including thead/th header rows and basic colspan/rowspan)
- images (<img src="data:...">, optional width/height)
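Because this path expects well-formed XML, text that comes from users or other systems has to be escaped before it is placed into the XHTML string. The sketch below shows one way to assemble such input from plain text. escapeXml and buildXhtml are hypothetical helpers written for this illustration, not library functions.

```javascript
// Sketch: assemble a small well-formed XHTML string from plain text.
// `escapeXml` and `buildXhtml` are hypothetical helpers, not library functions.

/** Escape the characters that are significant in XML text and attribute values. */
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Build an XHTML document with one <h1> and one <p> per paragraph. */
function buildXhtml(title, paragraphs) {
  const body = paragraphs
    .map((p) => {
      // Newlines become self-closed <br/> elements, as XML requires.
      const lines = String(p).split("\n").map(escapeXml).join("<br/>");
      return `<p>${lines}</p>`;
    })
    .join("\n");
  return `<?xml version="1.0"?>\n<html><body>\n<h1>${escapeXml(title)}</h1>\n${body}\n</body></html>`;
}

const xhtml = buildXhtml("Q&A", ["Is 1 < 2?", "Line1\nLine2"]);
```

A string built this way could then be passed as the fourth argument of convertHtmlToWml, in the same position as the xhtml variable in the example above.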
Runtime ZIP support (no deps)
DOCX is a ZIP/OPC package and requires deflate/inflate:
- Node.js: uses node:zlib automatically (lazy import)
- Browsers:
  - Reading DOCX: uses a built-in pure-JS raw DEFLATE inflater (and will use DecompressionStream("deflate-raw") if a browser supports it).
  - Writing DOCX (repacking after transforms): uses CompressionStream when available; otherwise provide a zipAdapter with deflateRaw.
- Other runtimes: pass a zipAdapter when constructing documents / saving modified documents:
const doc = WmlDocument.fromBytes(bytes, {
  zipAdapter: { inflateRaw: async (u8) => ..., deflateRaw: async (u8) => ... }
});
Development
- Tests: node --test (run via npm test)
- CI: GitHub Actions workflow runs tests on push/PR (.github/workflows/test.yml)
Implementation layout:
- src/internal/zip*.js: ZIP reader/writer + adapters
- src/internal/xml.js: minimal XML parse/serialize
- src/internal/opc.js: OPC package access
- src/wml-to-html-converter.js: DOCX → HTML
- src/html-to-wml-converter.js: XHTML → DOCX (minimal)
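The adapters mentioned for src/internal/zip*.js relate to the zipAdapter option described under Runtime ZIP support: an object with async inflateRaw and deflateRaw functions that take a Uint8Array. As a rough illustration of that shape, the sketch below backs both functions with node:zlib. It assumes each function should resolve to a Uint8Array, which this README does not state. Since Node.js already uses node:zlib automatically, an explicit adapter like this is illustrative only; nodeZipAdapter and roundTrip are hypothetical names.

```javascript
// Sketch of an object with the `zipAdapter` shape shown in the README.
// Assumption: both functions resolve to a Uint8Array.
const nodeZipAdapter = {
  // Decompress raw DEFLATE data (no zlib or gzip header).
  inflateRaw: async (u8) => {
    const { inflateRawSync } = await import("node:zlib"); // loaded lazily
    return new Uint8Array(inflateRawSync(u8));
  },
  // Compress to raw DEFLATE data.
  deflateRaw: async (u8) => {
    const { deflateRawSync } = await import("node:zlib"); // loaded lazily
    return new Uint8Array(deflateRawSync(u8));
  },
};

/** Compress and decompress a string to check that the two functions are inverses. */
async function roundTrip(text) {
  const input = new TextEncoder().encode(text);
  const packed = await nodeZipAdapter.deflateRaw(input);
  const unpacked = await nodeZipAdapter.inflateRaw(packed);
  return new TextDecoder().decode(unpacked);
}
```

An object of this shape would be passed as the zipAdapter option of WmlDocument.fromBytes; in other runtimes the two function bodies would be swapped for whatever raw DEFLATE implementation is available there.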
Attribution
This project is a JavaScript port/derivative of Open-Xml-PowerTools (C#), including familiar type and module naming (e.g., WmlDocument, WmlToHtmlConverter, MarkupSimplifier).
- Upstream project: https://github.com/OpenXmlDev/Open-Xml-PowerTools
- This repo contains a JavaScript translation with additional modifications and re-architecture (ZIP/OPC + XML implemented in JS for browser compatibility).
- Required upstream notices and the full upstream MIT license text are included in NOTICE.md.
This repo does not use the Open XML SDK; instead it re-implements the needed ZIP/OPC and XML manipulation in JavaScript to remain dependency-free and browser-compatible.
Port credit
This JavaScript port was converted/bootstrapped by Romain de Wolff (Whisperit.ai) with AI assistance using Codex 5.2.
Trademarks / endorsement
“Microsoft” and related marks are trademarks of their respective owners. Use of names is for identification only and does not imply endorsement.
Dependencies & sample assets
This repository does not intentionally ship upstream sample DOCX documents or other external assets. The test fixtures in test/fixtures/ are minimal generated files intended to be MIT-compatible.
License
- This repository is licensed under MIT: see LICENSE.
- Portions derived from Open-Xml-PowerTools remain under the upstream MIT license and required notices: see NOTICE.md.
