@packback/html-to-docx
v1.4.13
A library-agnostic service for converting HTML content to Microsoft Word DocX documents. Works in both Angular frontend applications and Node.js backend environments.
A library-agnostic service for converting HTML content with all its oddities to Microsoft Word DocX documents. Works in both browser and Node.js environments.
Key Features
- Library Agnostic: Accepts any DOM Document object, not tied to specific HTML parsers
- Node.js Compatible: Works in server environments using JSDOM
- Browser Compatible: Works in frontend applications using native DOMParser
- Comprehensive HTML Support: Handles formatting, lists, images, headers, and more
- Document Styling: Configurable fonts, sizes, and citation formats (APA, MLA, Chicago)
- Self-Contained: All dependencies are local to avoid circular imports
Installation
npm install @packback/html-to-docx
# or
yarn add @packback/html-to-docx
Local Development
For detailed instructions on testing local changes in both frontend and backend environments, see LOCAL_DEVELOPMENT.md (this file is not published to NPM and is only available in the source repository).
Quick start for frontend:
- Uncomment the path mapping in frontend/questions-frontend/src/tsconfig.app.dev.json
- Restart your dev server
Quick start for backend:
cd /questions/backend/app-api
npm run link-local-html-to-docx
See the full guide for rebuild workflows, Docker setup, cleanup steps, and troubleshooting.
Usage Examples
Frontend (Browser)
import { HtmlToDocxService } from '@packback/html-to-docx';

// Convert HTML string
const htmlContent = '<p>Hello <strong>world</strong>!</p>';
const docxFromString = await HtmlToDocxService.convertHtmlToDocument({
  htmlContent,
  documentSettings: {
    font_family: 'arial',
    font_size: 12,
    format_style: 'apa'
  },
  includeHeader: true,
  includeFooter: false
});

// Convert pre-parsed document (library agnostic)
const parser = new DOMParser();
const parsedDocument = parser.parseFromString(htmlContent, 'text/html');
const docxFromDocument = await HtmlToDocxService.convertHtmlToDocument({
  htmlContent: '', // Not used when document is provided
  document: parsedDocument,
  documentSettings: { font_family: 'open-sans', font_size: 12 }
});

// With references/bibliography page
const sources = [
  {
    citation: [
      { resolved: true, text: 'Smith, J.' },
      { resolved: true, text: ' (2023). ' },
      { resolved: true, text: 'Book Title', format: 'italic' },
      { resolved: true, text: '. Publisher.' }
    ],
    citation_format: 'apa'
  }
];
const docxWithReferences = await HtmlToDocxService.convertHtmlToDocument({
  htmlContent,
  documentSettings: { format_style: 'apa' },
  sources
});
Backend (Node.js)
import { HtmlToDocxService } from '@packback/html-to-docx';
import { JSDOM } from 'jsdom';
import { Packer } from 'docx';
import fs from 'fs/promises';

// Make Node constants available globally
const jsdom = new JSDOM('');
global.Node = jsdom.window.Node;

async function convertHtml(htmlContent, outputPath) {
  // Parse HTML using JSDOM (named dom to avoid shadowing the instance above)
  const dom = new JSDOM(htmlContent);
  const document = dom.window.document;

  // Convert to DocX
  const docxDocument = await HtmlToDocxService.convertHtmlToDocument({
    htmlContent: '', // Not used when document is provided
    document,
    documentSettings: {
      font_family: 'times-new-roman',
      font_size: 11,
      format_style: 'mla'
    }
  });

  // Save to file
  const buffer = await Packer.toBuffer(docxDocument);
  await fs.writeFile(outputPath, buffer);
}

Document Metadata and Custom Properties
Every generated document includes metadata in its custom properties. This can be helpful for troubleshooting or tracking document generation parameters:
- Document title
- Font family and size
- Format style (APA, MLA, Chicago)
- Header/footer settings
- Preview mode status
- Title page presence
To view in Microsoft Word: File > Info > Properties > Advanced Properties > Custom tab
const doc = await HtmlToDocxService.convertHtmlToDocument({
  htmlContent: '<p>Content</p>',
  documentSettings: { font_family: 'arial', font_size: 12 }
});
// doc.CustomProperties contains: fontFamily='arial', fontSize='12', etc.

Document Settings
Font Families
- arial - Arial
- open-sans - Open Sans (default)
- times-new-roman - Times New Roman
Font Sizes
- 10 - 10 point
- 11 - 11 point
- 12 - 12 point (default)
Format Styles
- apa - APA formatting
- chicago - Chicago style
- mla - MLA formatting
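The settings above can be checked before conversion. The following is a minimal sketch, assuming you want to reject unsupported values up front; `normalizeDocumentSettings` is a hypothetical helper and not part of this package. It applies the documented defaults (open-sans, 12) and leaves format_style unset when none is given, since no default is documented for it.

```javascript
// Hypothetical helper (not part of @packback/html-to-docx).
// Allowed values are taken from the Document Settings lists above.
const FONT_FAMILIES = ['arial', 'open-sans', 'times-new-roman'];
const FONT_SIZES = [10, 11, 12];
const FORMAT_STYLES = ['apa', 'chicago', 'mla'];

function normalizeDocumentSettings(settings = {}) {
  const normalized = {
    font_family: settings.font_family ?? 'open-sans', // documented default
    font_size: settings.font_size ?? 12, // documented default
  };

  if (!FONT_FAMILIES.includes(normalized.font_family)) {
    throw new RangeError(`Unsupported font_family: ${normalized.font_family}`);
  }
  if (!FONT_SIZES.includes(normalized.font_size)) {
    throw new RangeError(`Unsupported font_size: ${normalized.font_size}`);
  }

  // No default is documented for format_style, so it is only validated when present.
  if (settings.format_style !== undefined) {
    if (!FORMAT_STYLES.includes(settings.format_style)) {
      throw new RangeError(`Unsupported format_style: ${settings.format_style}`);
    }
    normalized.format_style = settings.format_style;
  }

  return normalized;
}
```

The returned object can be passed as `documentSettings` to `HtmlToDocxService.convertHtmlToDocument`.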
References and Bibliography
The package supports automatic generation of properly formatted references/bibliography pages based on citation data. When sources are provided, a references page is automatically appended to the document with appropriate formatting for the selected citation style.
Features
- Automatic Page Break: A page break is inserted before the references section
- Style-Specific Formatting:
  - APA: "References" title (bold), double-spaced entries
  - MLA: "Works Cited" title, double-spaced entries
  - Chicago: "Bibliography" title, single-spaced within entries, double-spaced between
- Hanging Indentation: All entries use 0.5-inch hanging indentation
- Alphabetical Sorting: Sources are automatically sorted by first author/text
- Format Preservation: Italics and other formatting from citations are preserved
- Filtering: Only resolved citation pieces are included; placeholder text is omitted
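The style-specific rules above can be summarized as a lookup table. This is an illustrative sketch only: `REFERENCES_LAYOUT` and `referencesLayout` are hypothetical names, and the package's internal representation may differ. The indent is expressed in twips (twentieths of a point, 1440 per inch), the unit WordprocessingML uses for indentation, so 0.5 inch is 720 twips.

```javascript
// Hypothetical lookup table (not the package's internals).
// Titles and spacing descriptions are copied from the feature list above.
const TWIPS_PER_INCH = 1440; // 1 inch = 72 points = 1440 twentieths of a point

const REFERENCES_LAYOUT = {
  apa: { title: 'References', boldTitle: true, spacing: 'double-spaced entries' },
  mla: { title: 'Works Cited', spacing: 'double-spaced entries' },
  chicago: {
    title: 'Bibliography',
    spacing: 'single-spaced within entries, double-spaced between',
  },
};

function referencesLayout(formatStyle) {
  const layout = REFERENCES_LAYOUT[formatStyle];
  if (!layout) {
    throw new RangeError(`Unsupported format style: ${formatStyle}`);
  }
  // All styles share the 0.5-inch hanging indent.
  return { ...layout, hangingIndentTwips: 0.5 * TWIPS_PER_INCH };
}
```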
Source Data Format
Sources should be provided as an array of objects with:
- citation: Array of citation pieces (text, resolved status, optional format)
- citation_format: The citation style ('apa', 'mla', or 'chicago')
Only sources matching the document's format_style will be included in the references page.
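To preview which entries would reach the references page, the selection rules can be approximated as below. This is a sketch under assumptions, not the package's implementation: `selectReferenceEntries` and `plainText` are hypothetical helpers, and sorting by the concatenated text of each entry is only an approximation of "sorted by first author/text".

```javascript
// Hypothetical helpers that mirror the documented rules:
// 1. keep sources whose citation_format matches the document's format_style,
// 2. drop unresolved citation pieces,
// 3. sort the remaining entries alphabetically.

// Join the text of all pieces in one entry.
function plainText(pieces) {
  return pieces.map((piece) => piece.text).join('');
}

function selectReferenceEntries(sources, formatStyle) {
  return sources
    .filter((source) => source.citation_format === formatStyle)
    .map((source) => source.citation.filter((piece) => piece.resolved))
    .filter((pieces) => pieces.length > 0) // skip entries with nothing resolved
    .sort((a, b) =>
      plainText(a).localeCompare(plainText(b), 'en', { sensitivity: 'base' })
    );
}

const exampleSources = [
  {
    citation: [
      { resolved: true, text: 'Smith, J.' },
      { resolved: true, text: ' (2023). ' },
      { resolved: true, text: 'Book Title', format: 'italic' },
    ],
    citation_format: 'apa',
  },
  {
    citation: [
      { resolved: true, text: 'Adams, R.' },
      { resolved: false, text: '[year]' }, // placeholder, omitted from output
    ],
    citation_format: 'apa',
  },
  { citation: [{ resolved: true, text: 'Jones, K.' }], citation_format: 'mla' },
];

const apaEntries = selectReferenceEntries(exampleSources, 'apa');
console.log(apaEntries.map(plainText));
```

Each returned entry keeps its pieces intact, so per-piece formatting such as `format: 'italic'` is still available to whatever renders the page.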
Supported HTML Features
Text Formatting
- Bold: <strong>, <b>
- Italic: <em>, <i>
- Underline: <u>
- Subscript: <sub>
- Superscript: <sup>
Structure
- Paragraphs: <p>
- Headers: Custom Quill header classes (dd-title-header, dd-h1-header, etc.)
- Lists: <ul>, <ol> with data-list attributes
- Links: <a href=""> with proper hyperlink styling
Layout
- Alignment: .ql-align-center, .ql-align-right, .ql-align-justify
- Indentation: .ql-indent-1 through .ql-indent-9
- Line Height: .ql-line-height-1, .ql-line-height-1-5, .ql-line-height-2
- Page Breaks: .page-break class
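As an illustration of how the layout classes above might be interpreted, the sketch below reads them from a class attribute string. `readQuillLayout` is a hypothetical function, and treating `ql-line-height-1-5` as a line height of 1.5 is an assumption based on the class name; the package's own parsing may differ.

```javascript
// Hypothetical parser for the documented layout classes.
function readQuillLayout(className) {
  const layout = {};

  for (const cls of className.split(/\s+/).filter(Boolean)) {
    let match;
    if ((match = /^ql-align-(center|right|justify)$/.exec(cls))) {
      layout.alignment = match[1];
    } else if ((match = /^ql-indent-([1-9])$/.exec(cls))) {
      layout.indentLevel = Number(match[1]);
    } else if ((match = /^ql-line-height-(1|1-5|2)$/.exec(cls))) {
      // Assumption: "1-5" encodes 1.5 because class names cannot contain a dot.
      layout.lineHeight = Number(match[1].replace('-', '.'));
    } else if (cls === 'page-break') {
      layout.pageBreak = true;
    }
    // Any other class is ignored.
  }

  return layout;
}

console.log(readQuillLayout('ql-align-center ql-indent-3 ql-line-height-1-5'));
```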
Media
- Images: <img> elements with URL support
- Alt Text: Proper fallback handling for failed image loads
Command Line Interface
The package includes a CLI tool for converting HTML files to DOCX from the command line, useful for testing and integration with other systems (e.g., PHP applications).
⚠️ Security Warning
The CLI reads files without validation. Never pass user-controlled input as file paths, as attackers could read sensitive files. Always validate and sanitize paths before use (restrict directories, validate extensions, block path traversal).
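The checks named in the warning can be sketched as follows. This is a minimal example, assuming input files live under a single trusted directory; `resolveSafePath` is a hypothetical helper and is not provided by the package. It does not resolve symbolic links, so a link inside the trusted directory can still point elsewhere unless you also compare real paths.

```javascript
const path = require('node:path');

// Hypothetical helper: resolve a requested path inside baseDir and reject
// anything that escapes it or has an unexpected extension.
function resolveSafePath(baseDir, requestedPath, allowedExtensions) {
  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, requestedPath);
  const relative = path.relative(base, resolved);

  // An empty result is the directory itself; ".." segments or an absolute
  // result (for example another drive on Windows) mean the path left baseDir.
  const escapes =
    relative === '' ||
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path is outside the allowed directory: ${requestedPath}`);
  }

  const extension = path.extname(resolved).toLowerCase();
  if (!allowedExtensions.includes(extension)) {
    throw new Error(`Extension not allowed: ${extension || '(none)'}`);
  }

  return resolved;
}
```

A caller would validate the input with `['.html']`, the output with `['.docx']`, and only then hand both resolved paths to the CLI.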
Installation
# Install globally
npm install -g @packback/html-to-docx
# Or build locally and use node directly (recommended for development)
cd packages/html-to-docx
yarn install && yarn build
Usage
# Using node directly (preserves quotes properly)
node dist/cli.js input.html output.docx
# With custom font and size
node dist/cli.js input.html output.docx --font arial --size 11
# With formatting style
node dist/cli.js input.html output.docx --style apa
# If installed globally
html-to-docx input.html output.docx --style apa
CLI Options
- --font <name> - Font family: arial, open-sans, times-new-roman (default: open-sans)
- --size <number> - Font size: 10, 11, 12 (default: 12)
- --style <name> - Format style: apa, mla, chicago
- --header-title <text> - Page header title
- --header-last-name <text> - Page header last name
- --header-page-numbers - Include page numbers in header
- --footer <text> - Footer text
- --sources <path> - Path to JSON file containing sources for references/bibliography page
- -h, --help - Show help message
Note: All documents include metadata in custom properties (font, size, style, etc.), accessible via File > Info > Properties > Advanced Properties in Microsoft Word.
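When calling the CLI from another Node.js program, the options above can be assembled as an argument array instead of a shell string. The sketch below uses only the documented flags; `buildCliArgs` and its option names (`headerTitle`, `headerLastName`, and so on) are hypothetical and chosen for illustration.

```javascript
// Hypothetical helper: map an options object onto the documented CLI flags.
function buildCliArgs(input, output, options = {}) {
  const args = [input, output];

  if (options.font) args.push('--font', options.font);
  if (options.size) args.push('--size', String(options.size));
  if (options.style) args.push('--style', options.style);
  if (options.headerTitle) args.push('--header-title', options.headerTitle);
  if (options.headerLastName) args.push('--header-last-name', options.headerLastName);
  if (options.headerPageNumbers) args.push('--header-page-numbers'); // boolean flag, no value
  if (options.footer) args.push('--footer', options.footer);
  if (options.sources) args.push('--sources', options.sources);

  return args;
}

console.log(
  buildCliArgs('examples/sample-quill.html', 'output.docx', {
    style: 'apa',
    headerTitle: 'The Baroque Period',
    headerPageNumbers: true,
  })
);
```

The array can be passed to `execFile('node', ['dist/cli.js', ...args])` from `node:child_process`. Because `execFile` does not spawn a shell by default, multi-word values such as the header title need no extra quoting.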
Examples
Sample HTML files are provided in the examples/ directory:
# Simple example with basic formatting
node dist/cli.js examples/simple-example.html output.docx
# Full Quill document with title page and MLA style
node dist/cli.js examples/sample-quill.html output.docx --font times-new-roman --size 12 --style mla
# With page header and numbers (APA style) - use quotes for multi-word values
node dist/cli.js examples/sample-quill.html output.docx \
  --style apa \
  --header-title 'The Baroque Period' \
  --header-last-name Koves \
  --header-page-numbers
# With custom footer
node dist/cli.js examples/simple-example.html output.docx \
  --footer 'Copyright 2025 - All Rights Reserved'
# With references/bibliography page from sources.json
node dist/cli.js examples/sample-quill.html output.docx \
  --style apa \
  --sources examples/sources.json
The sources.json file should contain an array of sources with citation data:
[
  {
    "citation": [
      { "resolved": true, "text": "Smith, J." },
      { "resolved": true, "text": " (2023). " },
      { "resolved": true, "text": "Book Title", "format": "italic" },
      { "resolved": true, "text": ". Publisher." }
    ],
    "citation_format": "apa"
  }
]
With these sources, an APA-formatted references page should appear as the last page of the output document.

Node.js Compatibility
When running in Node.js environments:
- Set global.Node = jsdom.window.Node to provide DOM constants
- Use the document parameter instead of htmlContent
- Import JSDOM for HTML parsing
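The first step above can be wrapped so that it runs only when needed. This is a small sketch; `ensureNodeGlobal` is a hypothetical helper that accepts any window-like object exposing a `Node` property, such as `new JSDOM('').window`.

```javascript
// Hypothetical helper: expose DOM Node constants (ELEMENT_NODE, TEXT_NODE, ...)
// globally, without overwriting an existing global such as the browser's.
function ensureNodeGlobal(windowLike) {
  if (typeof globalThis.Node === 'undefined') {
    globalThis.Node = windowLike.Node;
  }
  return globalThis.Node;
}

// Stand-in for jsdom.window, used here so the sketch runs without JSDOM installed.
const fakeWindow = { Node: { ELEMENT_NODE: 1, TEXT_NODE: 3 } };
ensureNodeGlobal(fakeWindow);
console.log(globalThis.Node.TEXT_NODE);
```

Calling it once at startup, before any conversion, keeps the global assignment in one place.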
Dependencies
- docx: DocX document generation
- Local utilities: Self-contained formatting and styling utilities
- DOM API: Browser DOMParser or Node.js JSDOM
Development
Setup
# Install dependencies
yarn install
# Build the package
yarn build
# Run tests
yarn test
# Run tests in watch mode
yarn test:watch
# Lint code
yarn lint
Testing
The package includes comprehensive tests that run in both browser and Node.js environments using Jest with JSDOM.
Run tests:
yarn test # Run all tests
yarn test:watch # Run tests in watch mode
yarn test:coverage # Run tests with coverage report
yarn test -- --testPathPattern=filename # Run a specific test file
Code Coverage
After running yarn test:coverage, open coverage/index.html in your browser for a detailed interactive coverage report.
Building
The TypeScript source is compiled to CommonJS format in the dist/ directory with type definitions.
License
MIT - see LICENSE file for details.
