@file-viewer/docx
v0.3.14
Pure HTML DOCX renderer fork for browser-based Word previews.
@file-viewer/docx
Pure HTML DOCX renderer for browser-based Word previews. This package preserves the final non-canvas docx-viewer line as a stable maintenance track, converting WordprocessingML into HTML while keeping semantic structure, page layout, headers/footers, numbering, fields, tables, images and common DrawingML content as closely as browser layout allows.
This fork includes production hardening for large Word documents: Worker-based parsing, asynchronous rendering yields, Word-saved pagination support, dynamic overflow pagination and layout telemetry.
Installation
npm install @file-viewer/docx
For the standalone browser build, ship these files together and keep the Worker URL same-origin:
dist/docx-preview.js
dist/docx-preview.worker.js
dist/jszip.min.js
Basic usage
<script src="dist/jszip.min.js"></script>
<script src="dist/docx-preview.js"></script>
<div id="container"></div>
<script>
const options = {
useWorker: true,
workerUrl: "dist/docx-preview.worker.js",
workerJsZipUrl: "dist/jszip.min.js",
awaitLayout: true,
strictWordCompatibility: true,
ignoreLastRenderedPageBreak: false,
preserveComplexFieldResults: true,
updatePageReferences: false,
hideWebHiddenContent: false,
progress: ev => console.log(ev.phase, ev.current, ev.total, ev.message)
};
docx.renderAsync(fileOrArrayBuffer, document.getElementById("container"), null, options)
.then(() => console.log("docx: finished"));
</script>
ES module usage:
import { renderAsync } from "@file-viewer/docx";
await renderAsync(fileOrArrayBuffer, container, null, {
useWorker: true,
workerUrl: new URL("@file-viewer/docx/dist/docx-preview.worker.js", import.meta.url).toString(),
workerJsZipUrl: new URL("@file-viewer/docx/dist/jszip.min.js", import.meta.url).toString(),
awaitLayout: true
});
API
renderAsync(
document: Blob | ArrayBuffer | Uint8Array,
bodyContainer: HTMLElement,
styleContainer?: HTMLElement,
options?: Partial<Options>
): Promise<WordDocument>
parseAsync(
document: Blob | ArrayBuffer | Uint8Array,
options?: Partial<Options>
): Promise<WordDocument>
parseAsyncInWorker(
document: Blob | ArrayBuffer | Uint8Array,
options?: Partial<Options>
): Promise<WordDocument>
renderDocument(
wordDocument: WordDocument,
options?: Partial<Options>
): Promise<Node[]>
awaitRenderedLayout(
container: HTMLElement,
options?: Partial<Options>
): Promise<LayoutSnapshot>
collectLayoutSnapshot(
container: HTMLElement,
options?: Partial<Options>
): LayoutSnapshot
Important options
{
className: "docx",
inWrapper: true,
hideWrapperOnPrint: false,
ignoreWidth: false,
ignoreHeight: false,
ignoreFonts: false,
breakPages: true,
// Word fidelity / pagination
ignoreLastRenderedPageBreak: false,
strictWordCompatibility: true,
paginationTolerance: 2,
maxDynamicPaginationPasses: 1000,
awaitLayout: true,
// Worker and responsiveness
useWorker: true,
workerUrl: "dist/docx-preview.worker.js",
workerJsZipUrl: "dist/jszip.min.js",
workerFallback: true,
workerTimeout: 120000,
renderPageBatchSize: 2,
renderYieldEveryMs: 16,
progress: ev => {}, // receives { phase, current, total, message }
// Field and print-layout compatibility
preserveComplexFieldResults: true,
updatePageReferences: false,
hideWebHiddenContent: false,
// Content switches
renderHeaders: true,
renderFooters: true,
renderFootnotes: true,
renderEndnotes: true,
renderComments: false,
renderAltChunks: true,
renderChanges: false,
experimental: false,
trimXmlDeclaration: true,
useBase64URL: false,
debug: false
}
Large-document rendering path
The production path is intentionally split into stages:
- The main thread starts a Worker using workerUrl.
- The Worker loads JSZip using workerJsZipUrl, unzips the package, parses XML parts, resolves relationships and serializes a compact document snapshot.
- The main thread restores the snapshot into WordDocument and renders HTML pages in batches, yielding to requestIdleCallback, requestAnimationFrame or setTimeout between page batches.
- awaitRenderedLayout waits for images and fonts, runs dynamic overflow pagination and returns a LayoutSnapshot.
- collectLayoutSnapshot can be used in CI or telemetry to detect overflow pages, unresolved media, page count, aggregate text length, fields and floating objects.
This avoids the former demo-page freeze caused by synchronous ZIP/XML parsing and very large DOM construction on the UI thread.
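The batch-and-yield step of this path can be sketched roughly as follows. This is a minimal illustration, not the package's internal code: chunkPages, yieldToHost and renderInBatches are hypothetical names, and renderPage stands in for whatever builds one page of DOM.

```javascript
// Hypothetical helpers for illustration; not part of @file-viewer/docx.

// Split a list of pages into batches of `batchSize` (compare renderPageBatchSize).
function chunkPages(pages, batchSize) {
  const size = Number.isFinite(batchSize) && batchSize >= 1 ? Math.floor(batchSize) : 1;
  const batches = [];
  for (let i = 0; i < pages.length; i += size) {
    batches.push(pages.slice(i, i + size));
  }
  return batches;
}

// Hand control back to the host, using the first scheduling API that exists.
function yieldToHost() {
  return new Promise(resolve => {
    if (typeof requestIdleCallback === "function") {
      requestIdleCallback(() => resolve());
    } else if (typeof requestAnimationFrame === "function") {
      requestAnimationFrame(() => resolve());
    } else {
      setTimeout(resolve, 0);
    }
  });
}

// Render batch by batch, yielding after each batch so input and paint can run.
async function renderInBatches(pages, batchSize, renderPage, onProgress = () => {}) {
  let done = 0;
  for (const batch of chunkPages(pages, batchSize)) {
    for (const page of batch) {
      renderPage(page);
      done += 1;
    }
    onProgress({ phase: "render", current: done, total: pages.length });
    await yieldToHost();
  }
  return done;
}
```

A smaller batch size yields more often and keeps the UI more responsive, at the cost of a longer total render time.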
Pagination, headers/footers and TOC
The renderer supports:
- explicit page breaks (w:br w:type="page");
- Word-saved page break positions (w:lastRenderedPageBreak), enabled by default;
- paragraph pageBreakBefore, keepNext, keepLines and widowControl hints;
- section page size, orientation, columns, margins, headers and footers;
- table row splitting with w:tblHeader repeat-header preservation;
- post-render dynamic pagination for content that still overflows after structural page splitting;
- layout-time updates for truly page-local fields such as PAGE, NUMPAGES, SECTIONPAGES and SECTION;
- preservation of Word's stored complex-field results for TOC, PAGEREF, REF, SEQ, IF, MERGEFIELD and related fields by default, so a Word-authored table of contents keeps its cached page numbers and tab leaders instead of being recalculated incorrectly by the browser.
For Word-authored documents, keep ignoreLastRenderedPageBreak: false. Those markers preserve the positions calculated by the Word-compatible editor that last saved the document and are especially important for long tables and large Chinese technical documents. Keep preserveComplexFieldResults: true unless you intentionally want to recompute cross-reference fields; the default matches Word print-layout behavior for existing TOC/PAGEREF results.
w:webHidden is not hidden by default in this renderer because it only applies to Word's Web Layout view. In Print Layout, Word still displays the TOC tab before the page number and the cached PAGEREF result even when those runs are marked w:webHidden. Set hideWebHiddenContent: true only for an explicit Web Layout style preview.
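As a rough sketch, the two preview styles described above could be captured as option presets. The option names are the ones documented in this README; the previewOptions helper and its "print"/"web" mode names are hypothetical and only serve the example.

```javascript
// Hypothetical preset helper; only the option names come from this README.
function previewOptions(mode) {
  const printLayout = {
    ignoreLastRenderedPageBreak: false, // keep Word-saved page break positions
    preserveComplexFieldResults: true,  // keep cached TOC/PAGEREF results
    updatePageReferences: false,
    hideWebHiddenContent: false         // Print Layout still shows w:webHidden runs
  };
  if (mode === "web") {
    // Web Layout style preview: hide runs marked w:webHidden.
    return { ...printLayout, hideWebHiddenContent: true };
  }
  return printLayout;
}
```

The result can be spread into the options passed to renderAsync, for example { useWorker: true, ...previewOptions("print") }.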
Header/footer selection follows the WordprocessingML print-layout rules: first header/footer references are used only when the section has w:titlePg, and even references are used only when document settings contain w:evenAndOddHeaders; otherwise the default/odd header is used. This prevents even-page empty headers from hiding the normal header in documents that merely contain unused even header references.
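The selection rule above can be written as a small pure function. This is a sketch of the rule as described here, not the renderer's actual code; selectHeaderFooterType and its parameter names are hypothetical.

```javascript
// Hypothetical helper: which header/footer reference type applies to a page.
//   titlePg            - the section has w:titlePg
//   evenAndOddHeaders  - document settings contain w:evenAndOddHeaders
function selectHeaderFooterType({ isFirstPageOfSection, pageNumber, titlePg, evenAndOddHeaders }) {
  if (titlePg && isFirstPageOfSection) return "first";
  if (evenAndOddHeaders && pageNumber % 2 === 0) return "even";
  return "default"; // the default reference doubles as the odd-page header
}

// An even page in a document without w:evenAndOddHeaders still gets the default header,
// even if the section happens to carry an unused even header reference.
selectHeaderFooterType({ isFirstPageOfSection: false, pageNumber: 4, titlePg: false, evenAndOddHeaders: false });
```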
Complex-script formatting is kept script-aware. w:iCs and w:bCs now affect RTL/complex-script spans, but they are not applied to East Asian TOC text; this avoids turning Chinese TOC level 3 entries italic when the DOCX only specified complex-script italics.
Regression fixture
tests/regression/database-design.docx is a large Chinese database-design document used to validate production behavior. The fixture has a large word/document.xml, thousands of paragraphs, many tables, TOC/PAGEREF fields, saved page breaks and multiple header/footer references.
Open this page in a local server to run the browser regression:
tests/regression/database-design-render.html
The Node regression can be run with:
node tests/regression/check-database-design.cjs
It renders tests/regression/database-design.docx and asserts that TOC entries keep Word's cached page numbers such as 引言 5, 各系统与数据库对应关系 6, 表间关系 16 and 数据恢复策略 103 instead of collapsing to page 1. It also verifies that w:tab right-tab leaders are emitted as measurable docx-tab-stop elements with dotted leader styling, so TOC and figure-list page-number dot leaders remain visible. The regression also checks print-layout TOC hyperlink styling: TOC hyperlinks keep their anchors, but nested Hyperlink character-style runs are forced to inherit the paragraph color and text decoration, matching Word's black print-layout TOC rather than a browser-blue link. A successful browser run updates the fixed status banner with page count, overflow pages, unresolved media and text length.
Build
This repository includes a lightweight build script that emits CommonJS, ESM and Worker bundles without requiring a full Rollup install in constrained environments:
npm run build
The emitted .min.* files are functionally synchronized builds. Use a production minifier if you need compressed bundle size.
Notes on fidelity
The renderer follows OOXML semantics and uses browser-native layout. It is designed for production preview and regression testing, not for claiming byte-for-byte or pixel-perfect identity with Microsoft Word's private layout engine. Use LayoutSnapshot and the supplied regression fixture to monitor the remaining differences that depend on browser fonts, font fallback and shaping engines.
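One way to monitor those differences is to compare each snapshot against a stored baseline. The sketch below is illustrative only: the field names pageCount, overflowPages, unresolvedMedia and textLength are assumptions, since this README does not list the LayoutSnapshot fields, and findLayoutRegressions is a hypothetical helper.

```javascript
// ASSUMPTION: pageCount, overflowPages, unresolvedMedia and textLength are placeholder
// field names. Check the real LayoutSnapshot shape before relying on this.
function findLayoutRegressions(snapshot, baseline) {
  const problems = [];
  if (snapshot.overflowPages > 0) {
    problems.push(`overflow pages: ${snapshot.overflowPages}`);
  }
  if (snapshot.unresolvedMedia > 0) {
    problems.push(`unresolved media: ${snapshot.unresolvedMedia}`);
  }
  if (snapshot.pageCount !== baseline.pageCount) {
    problems.push(`page count ${snapshot.pageCount}, expected ${baseline.pageCount}`);
  }
  if (snapshot.textLength < baseline.minTextLength) {
    problems.push(`text length ${snapshot.textLength} below ${baseline.minTextLength}`);
  }
  return problems; // an empty array means no regression was detected
}
```

A CI job could fail the build whenever the returned array is non-empty and print its entries as the failure message.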
Numbering suffix and heading/list numbering
OOXML separates numbering text from the suffix after the numbering symbol. w:lvlText contains the numbering pattern, w:numFmt defines the format used to expand %1, %2, etc., and w:suff defines the content inserted between the generated number and paragraph text. When w:suff is omitted, OOXML treats it as tab. The renderer emits this suffix as a valid CSS unicode separator token, not as escaped visible text, so headings/lists no longer show literal \9 or \a0 after the numbering.
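A minimal sketch of that expansion, assuming decimal numbering only and returning raw characters rather than CSS escapes. suffixSeparator and formatNumbering are hypothetical names, and mapping w:suff="space" to a no-break space is an assumption inferred from the \a0 escape mentioned above.

```javascript
// Map w:suff to the separator placed after the number; an omitted w:suff means "tab".
// ASSUMPTION: "space" is rendered as U+00A0 (inferred from the \a0 escape).
function suffixSeparator(suff) {
  switch (suff ?? "tab") {
    case "space": return "\u00a0";
    case "nothing": return "";
    case "tab":
    default: return "\t";
  }
}

// Expand %1, %2, ... in w:lvlText from per-level counters.
// Only decimal formatting is handled; other w:numFmt values are out of scope here.
function formatNumbering(lvlText, counters, suff) {
  const label = lvlText.replace(/%(\d)/g, (match, level) => {
    const value = counters[Number(level) - 1];
    return value === undefined ? "" : String(value);
  });
  return label + suffixSeparator(suff);
}

// A level-2 heading with counters [3, 2] and no w:suff: "3.2" followed by a tab.
formatNumbering("%1.%2", [3, 2]);
```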
Run the focused regression with:
node tests/regression/check-numbering-suffix.cjs
The large database-design regression uses the cached rendered snapshot by default to avoid spending minutes rebuilding the full document in constrained CI. Set DOCXJS_RERENDER_REGRESSION=1 when you need to regenerate it from the DOCX.
