@kanaka-prabhath/html-to-docx
v1.0.3
Convert HTML with styles to DOCX using OOXML
HTML to DOCX Converter
A powerful Node.js library for converting HTML with inline styles to Microsoft Word DOCX files using OOXML (Office Open XML). Perfect for generating professional documents from web content, rich text editors, or any HTML source.
✨ Features
📝 Text Formatting
- Text Alignment: Left, center, right, and justified alignment
- Font Styling: Bold, italic, underline, strikethrough
- Colors: Text color and background color support
- Typography: Font family, font size, and line height
- Headings: H1-H6 with automatic styling
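In the generated document.xml, these text-formatting features become properties on a run (w:r), stored in its w:rPr element. The following is a minimal sketch of that mapping. buildRunXml and its flag names (bold, italic, fontSizePt, color, and so on) are hypothetical. Colors are assumed to be six-digit hex without #, and the package's own implementation may differ.

```javascript
// Hypothetical helper (not this package's API): formatting flags -> OOXML run.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function buildRunXml(text, fmt = {}) {
  const props = [];
  // Children are emitted in the order the w:rPr schema requires:
  // rFonts, b, i, strike, color, sz, u, shd.
  if (fmt.fontFamily) {
    const font = escapeXml(fmt.fontFamily);
    props.push(`<w:rFonts w:ascii="${font}" w:hAnsi="${font}"/>`);
  }
  if (fmt.bold) props.push('<w:b/>');
  if (fmt.italic) props.push('<w:i/>');
  if (fmt.strike) props.push('<w:strike/>');
  if (fmt.color) props.push(`<w:color w:val="${fmt.color}"/>`);
  // w:sz is measured in half-points, so 11pt is written as 22.
  if (fmt.fontSizePt) props.push(`<w:sz w:val="${Math.round(fmt.fontSizePt * 2)}"/>`);
  if (fmt.underline) props.push('<w:u w:val="single"/>');
  // Arbitrary background colors are expressed as run shading.
  if (fmt.backgroundColor) {
    props.push(`<w:shd w:val="clear" w:color="auto" w:fill="${fmt.backgroundColor}"/>`);
  }
  const rPr = props.length ? `<w:rPr>${props.join('')}</w:rPr>` : '';
  // xml:space="preserve" stops Word from trimming leading/trailing spaces.
  return `<w:r>${rPr}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r>`;
}
```

For example, buildRunXml('Hi', { bold: true, fontSizePt: 11 }) produces a run whose w:rPr holds <w:b/> and <w:sz w:val="22"/>.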
📋 Lists & Structure
- Unordered Lists: Bulleted lists with proper indentation
- Ordered Lists: Numbered lists with automatic numbering
- Nested Lists: Support for nested list structures
- Custom Indentation: Configurable list indentation levels
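In a DOCX file, nested lists are defined in word/numbering.xml. An abstract numbering definition holds up to nine levels (w:ilvl 0 to 8), and each list paragraph refers to one of those levels. The sketch below generates level definitions for bulleted or numbered lists. The function names are hypothetical. The 720-twip (0.5 inch) step per level and the 360-twip hanging indent are assumptions chosen to resemble Word's defaults; they are not values taken from this package.

```javascript
// Hypothetical sketch: builds <w:lvl> entries for an abstract numbering definition.
// Indentation is in twips (1/20 pt; 1440 twips = 1 inch). Step sizes are assumptions.
const ORDERED_FORMATS = ['decimal', 'lowerLetter', 'lowerRoman'];
const BULLETS = ['\u2022', 'o', '\u25AA']; // bullet, "o", small square, cycled by level

function buildLevel(ilvl, ordered, indentStep = 720, hanging = 360) {
  const numFmt = ordered ? ORDERED_FORMATS[ilvl % ORDERED_FORMATS.length] : 'bullet';
  // In lvlText, %N is the counter of the level whose w:ilvl is N-1.
  const lvlText = ordered ? `%${ilvl + 1}.` : BULLETS[ilvl % BULLETS.length];
  const left = indentStep * (ilvl + 1);
  return (
    `<w:lvl w:ilvl="${ilvl}">` +
    '<w:start w:val="1"/>' +
    `<w:numFmt w:val="${numFmt}"/>` +
    `<w:lvlText w:val="${lvlText}"/>` +
    '<w:lvlJc w:val="left"/>' +
    `<w:pPr><w:ind w:left="${left}" w:hanging="${hanging}"/></w:pPr>` +
    '</w:lvl>'
  );
}

function buildAbstractNum(abstractNumId, ordered, levels = 9) {
  const lvls = [];
  for (let i = 0; i < levels; i++) lvls.push(buildLevel(i, ordered));
  return `<w:abstractNum w:abstractNumId="${abstractNumId}">${lvls.join('')}</w:abstractNum>`;
}
```

A w:num element then points at this definition through w:abstractNumId. Each list paragraph selects its numbering and depth with w:numPr (w:numId plus w:ilvl).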
🖼️ Media & Images
- URL Images: Automatic download and embedding of web images
- Base64 Images: Direct embedding of base64 encoded images
- Image Sizing: Width and height control via attributes or CSS
- Alt Text: Accessibility support with alt text preservation
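Embedding an image requires two conversions. First, the raw bytes must be obtained; for a base64 image they come straight from the data URL in src. Second, the displayed size must be expressed in EMUs (English Metric Units, 914,400 per inch), which is the unit DrawingML's wp:extent uses. The following is a minimal sketch that assumes 96 CSS pixels per inch. parseBase64DataUrl and pxToExtent are hypothetical names, and downloading URL images is not shown.

```javascript
// Hypothetical helpers, not this package's API.
const EMU_PER_INCH = 914400;
const CSS_PX_PER_INCH = 96; // CSS reference pixel
const EMU_PER_PX = EMU_PER_INCH / CSS_PX_PER_INCH; // 9525

// Splits "data:image/png;base64,...." into a MIME type and a Node.js Buffer.
function parseBase64DataUrl(src) {
  const match = /^data:([\w.+-]+\/[\w.+-]+);base64,(.*)$/s.exec(src.trim());
  if (!match) return null; // not a base64 data URL (e.g. an http(s) URL to download instead)
  return { mimeType: match[1], data: Buffer.from(match[2], 'base64') };
}

// Converts a pixel size to the EMU pair used by <wp:extent cx="..." cy="..."/>.
function pxToExtent(widthPx, heightPx) {
  return {
    cx: Math.round(widthPx * EMU_PER_PX),
    cy: Math.round(heightPx * EMU_PER_PX),
  };
}
```

For example, pxToExtent(300, 200) gives { cx: 2857500, cy: 1905000 }.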
📄 Document Layout
- Headers & Footers: HTML content or full-width images
- Page Numbers: Automatic page numbering with alignment options
- Page Breaks: Manual page breaks, CSS page breaks, and section breaks
- Page Borders: Customizable border styles, colors, sizes, and radius
- Margins: Customizable page margins (top, bottom, left, right)
- Page Size: Support for A4, Letter, Legal, and custom sizes
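Two of these layout features map onto small OOXML fragments. A manual page break is a run containing <w:br w:type="page"/>; alternatively, a CSS page-break-before can set <w:pageBreakBefore/> on the paragraph itself. A page number is a PAGE field in a footer paragraph whose w:jc sets the alignment. The sketch below builds both kinds of fragment. The function names are hypothetical, and the package does not necessarily assemble its footer this way.

```javascript
// Hypothetical sketch of the OOXML behind page breaks and page numbers.
const JC_VALUES = { left: 'left', center: 'center', right: 'right', justify: 'both' };
const PAGE_NUMBER_ALIGNMENTS = ['left', 'center', 'right'];

// <page-break></page-break> can become an otherwise empty paragraph holding a break run.
function pageBreakParagraph() {
  return '<w:p><w:r><w:br w:type="page"/></w:r></w:p>';
}

// Paragraph properties; pageBreakBefore must precede jc in the w:pPr schema.
function paragraphProps({ pageBreakBefore = false, align } = {}) {
  const parts = [];
  if (pageBreakBefore) parts.push('<w:pageBreakBefore/>');
  if (align && Object.prototype.hasOwnProperty.call(JC_VALUES, align)) {
    parts.push(`<w:jc w:val="${JC_VALUES[align]}"/>`);
  }
  return parts.length ? `<w:pPr>${parts.join('')}</w:pPr>` : '';
}

// Footer paragraph with a PAGE field; Word replaces the cached "1" with the real number.
// Defaults to 'right', matching the documented pageNumberAlignment default.
function pageNumberParagraph(alignment = 'right') {
  if (!PAGE_NUMBER_ALIGNMENTS.includes(alignment)) {
    throw new RangeError(`alignment must be 'left', 'center', or 'right', got '${alignment}'`);
  }
  return (
    `<w:p>${paragraphProps({ align: alignment })}` +
    '<w:fldSimple w:instr=" PAGE "><w:r><w:t>1</w:t></w:r></w:fldSimple>' +
    '</w:p>'
  );
}
```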
🎨 Advanced Styling
- Text Boxes: Div elements with background colors, borders, padding, width, and border-radius
- Section Header Images: Images with top/bottom text wrapping using the data-section-header attribute
- Tables: Full table support with headers and cells
- Spacing: Custom paragraph spacing and line height
- Borders: Border styling for text boxes and tables
- Background Colors: Element background color support
🔧 Technical Features
- OOXML Generation: Creates valid Office Open XML structure
- HTML Sanitization: Automatic cleaning of unsafe HTML elements
- CSS Parsing: Support for safe CSS properties
- Media Management: Efficient handling of embedded images
- Buffer Output: Direct buffer output for memory-based processing
- Edge Case Handling: Robust processing of malformed HTML and special characters
- Performance Optimized: Efficient processing for large documents
- ZIP Packaging: Proper DOCX file generation using JSZip
📦 Installation
npm install @kanaka-prabhath/html-to-docx
🚀 Quick Start
Basic Usage
Using Direct Functions (For single conversions)
import { convertHtmlToDocx } from '@kanaka-prabhath/html-to-docx';
import fs from 'fs';
const htmlContent = `
<h1>Welcome to DOCX Export</h1>
<p>This is a <strong>bold</strong> paragraph with <em>italic</em> text.</p>
`;
const options = {
pageSize: 'A4',
marginTop: 1,
marginRight: 1,
marginBottom: 1,
marginLeft: 1,
enablePageNumbers: true,
pageNumberAlignment: 'center'
};
// Top-level await requires an ES module (for example, a .mjs file)
const docxBuffer = await convertHtmlToDocx(htmlContent, options);
fs.writeFileSync('demo-output.docx', docxBuffer);
Advanced Configuration
const options = {
pageSize: 'A4', // Default page size (A4, Letter, Legal, or custom {width: number, height: number} in inches)
marginTop: 1, // 1 inch top margin
marginRight: 1, // 1 inch right margin
marginBottom: 1, // 1 inch bottom margin
marginLeft: 1, // 1 inch left margin
marginHeader: 0.5, // 0.5 inch header margin
marginFooter: 0.5, // 0.5 inch footer margin
headerHeight: 1, // 1 inch header height
footerHeight: 1, // 1 inch footer height
enableHeader: true, // Enable/disable header (default: true if header content provided)
enableFooter: true, // Enable/disable footer (default: true if footer content provided)
enablePageNumbers: true, // Enable/disable page numbers
pageNumberAlignment: 'center', // Page number alignment: 'left', 'center', or 'right'
header: '', // HTML or a base64 image data URL; an image header is placed at the top-left (0,0), full width
footer: '', // HTML or a base64 image data URL; an image footer is placed at the bottom-left (0,bottom), full width
headingReplacements: [
`<div data-h1 class="textbox" style="border: 1px solid #000000ff; border-radius: 5px; padding: 0px 5px 0px 5px; background-color: #000000ff; width:100%;">
<p data-no-spacing style="color: #ffffffff; margin: 0; font-size: 21px; font-weight: bold;">HEADING_TEXT</p>
</div>`,
`<div data-h2 class="textbox" style="border: 1px solid #00b118ff; border-radius: 5px; padding: 0px; background-color: #a10101ff; width:100%;">
<p data-no-spacing style="color: #ffffffff; margin: 0; font-size: 19px; font-weight: bold;">HEADING_TEXT</p>
</div>`,
`<div data-h3 class="textbox" style="border: 1px solid #00b118ff; border-radius: 5px; padding: 0px; background-color: #a10101ff; width:100%;">
<p data-no-spacing style="color: #ffffffff; margin: 0; font-size: 16px; font-weight: bold;">HEADING_TEXT</p>
</div>`
]
};
const converter = new HtmlToDocx(options);
📖 API Reference
Class: HtmlToDocx
Constructor
new HtmlToDocx(options?)
Parameters:
- options (Object, optional): Configuration options
  - fontSize (number): Default font size in points (default: 11)
  - fontFamily (string): Default font family (default: 'Calibri')
  - lineHeight (number): Default line height multiplier (default: 1.15)
  - pageSize (string|object): Page size: 'A4', 'Letter', 'Legal', or custom {width: number, height: number} in inches (default: 'A4')
  - marginTop (number): Top margin in inches (default: 1)
  - marginRight (number): Right margin in inches (default: 1)
  - marginBottom (number): Bottom margin in inches (default: 1)
  - marginLeft (number): Left margin in inches (default: 1)
  - marginHeader (number): Header margin in inches (default: 0.5)
  - marginFooter (number): Footer margin in inches (default: 0.5)
  - headerHeight (number): Header height in inches (default: undefined)
  - footerHeight (number): Footer height in inches (default: undefined)
  - marginGutter (number): Gutter margin in inches (default: 0)
  - enablePageBorder (boolean): Enable page borders aligned with margins (default: false)
  - pageBorder (object): Page border configuration {style: 'single', color: '000000', size: 4, radius: 0} (optional)
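Options like these end up in the section properties (w:sectPr) of document.xml. There, OOXML measures page size and margins in twips (twentieths of a point, so 1 inch = 1440). The following is a minimal sketch of that conversion, using hypothetical helper names. It rests on two assumptions, both labeled in the code. First, pageBorder.size is passed straight through as w:sz, which OOXML measures in eighths of a point, so the default of 4 would be a half-point line. Second, radius is ignored, because w:pgBorders has no corner-radius attribute.

```javascript
// Hypothetical sketch (not this package's code): inch-based options -> <w:sectPr>.
const TWIPS_PER_INCH = 1440; // OOXML page geometry is in twentieths of a point
const PAGE_SIZES_TWIPS = new Map([
  ['A4', { w: 11906, h: 16838 }],     // 210 x 297 mm
  ['Letter', { w: 12240, h: 15840 }], // 8.5 x 11 in
  ['Legal', { w: 12240, h: 20160 }],  // 8.5 x 14 in
]);

const inchesToTwips = (inches) => Math.round(inches * TWIPS_PER_INCH);

function resolvePageSize(pageSize = 'A4') {
  if (typeof pageSize === 'string') {
    const size = PAGE_SIZES_TWIPS.get(pageSize);
    if (!size) throw new RangeError(`Unknown page size: ${pageSize}`);
    return size;
  }
  // Custom size given as { width, height } in inches.
  return { w: inchesToTwips(pageSize.width), h: inchesToTwips(pageSize.height) };
}

function buildPageBorders({ style = 'single', color = '000000', size = 4 } = {}) {
  // Assumption: size is already in eighths of a point (w:sz units), clamped to 2..96.
  const sz = Math.min(96, Math.max(2, Math.round(size)));
  const edge = (side) =>
    `<w:${side} w:val="${style}" w:sz="${sz}" w:space="0" w:color="${color}"/>`;
  // offsetFrom="text" positions the border relative to the text margins.
  // radius is ignored: w:pgBorders has no corner-radius attribute.
  return `<w:pgBorders w:offsetFrom="text">${['top', 'left', 'bottom', 'right'].map(edge).join('')}</w:pgBorders>`;
}

function buildSectPr(options = {}) {
  const {
    pageSize = 'A4',
    marginTop = 1, marginRight = 1, marginBottom = 1, marginLeft = 1,
    marginHeader = 0.5, marginFooter = 0.5, marginGutter = 0,
    enablePageBorder = false, pageBorder,
  } = options;
  const { w, h } = resolvePageSize(pageSize);
  const t = inchesToTwips;
  const pgMar =
    `<w:pgMar w:top="${t(marginTop)}" w:right="${t(marginRight)}" w:bottom="${t(marginBottom)}" ` +
    `w:left="${t(marginLeft)}" w:header="${t(marginHeader)}" w:footer="${t(marginFooter)}" ` +
    `w:gutter="${t(marginGutter)}"/>`;
  const borders = enablePageBorder ? buildPageBorders(pageBorder) : '';
  // Child order follows the schema: pgSz, pgMar, then pgBorders.
  return `<w:sectPr><w:pgSz w:w="${w}" w:h="${h}"/>${pgMar}${borders}</w:sectPr>`;
}
```

For example, a 1.25-inch left margin becomes w:left="1800", and the 0.5-inch header margin becomes w:header="720".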
Methods
convertHtmlToDocx(html, options?)
Convert HTML string to DOCX buffer.
Parameters:
- html (string): HTML content to convert
- options (Object, optional): Conversion options (same as constructor plus additional runtime options)
Returns: Promise resolving to the DOCX file buffer
convertHtmlToDocxFile(html, outputPath, options?)
Convert HTML string and save to DOCX file.
Parameters:
- html (string): HTML content to convert
- outputPath (string): Path where the DOCX file is saved
- options (Object, optional): Conversion options
Returns: Promise
Runtime Options
Additional options that can be passed to conversion methods:
- header (string): HTML content or base64 image data URL for the document header
- footer (string): HTML content or base64 image data URL for the document footer
- enablePageNumbers (boolean): Enable/disable page numbers in the footer (default: false)
- pageNumberAlignment (string): Page number alignment: 'left', 'center', or 'right' (default: 'right')
- enablePageBorder (boolean): Enable page borders aligned with margins (default: false)
- pageBorder (object): Page border configuration {style: 'single', color: '000000', size: 4, radius: 0} (optional)
- headingReplacements (Array): Custom HTML templates for headings (H1, H2, H3, etc.)
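This README describes headingReplacements only briefly. Judging from its examples, entry 0 applies to <h1>, entry 1 to <h2>, and so on, with the placeholder HEADING_TEXT standing in for the heading's content. The following is a minimal string-based sketch of that substitution under those assumptions. applyHeadingReplacements is a hypothetical name, and the package itself parses HTML with JSDOM rather than a regular expression.

```javascript
// Hypothetical sketch: swaps <h1>...<h6> for the matching template in `templates`.
// templates[0] applies to <h1>, templates[1] to <h2>, etc.; missing entries leave the heading as-is.
const HEADING_RE = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1\s*>/gi;

function applyHeadingReplacements(html, templates = []) {
  return html.replace(HEADING_RE, (whole, level, inner) => {
    const template = templates[Number(level) - 1];
    if (typeof template !== 'string') return whole;
    // split/join replaces every placeholder and, unlike a string replacement,
    // never interprets "$&"-style patterns in the heading text. Inner text is trimmed.
    return template.split('HEADING_TEXT').join(inner.trim());
  });
}
```

The substitution is a single pass. A template that itself contains <h1>HEADING_TEXT</h1>, like the H1 template in the headingReplacements example later in this README, is therefore not expanded again.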
🎯 Supported HTML Elements
Text Elements
<p style="text-align: center;">Centered paragraph</p>
<strong>Bold text</strong>
<em>Italic text</em>
<u>Underlined text</u>
<strike>Strikethrough text</strike>
<span style="color: #FF0000;">Red text</span>
Headings
<h1>Main Title</h1>
<h2>Section Header</h2>
<h3>Subsection</h3>
Lists
<ul>
<li>Unordered item</li>
<li>Another item</li>
</ul>
<ol>
<li>Ordered item 1</li>
<li>Ordered item 2</li>
</ol>
Images
<!-- URL images -->
<img src="https://example.com/image.png" alt="Description" width="300" height="200">
<!-- Base64 images -->
<img src="data:image/png;base64,..." alt="Base64 Image">
Tables
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>Data 1</td>
<td>Data 2</td>
</tr>
</table>
Text Boxes
<div style="background-color: #FFFF00; padding: 10px; border: 1px solid #000;">
<p>Content in a text box</p>
</div>
Page Breaks
<p>Content before break</p>
<page-break></page-break>
<p>Content after break</p>
<!-- CSS page breaks -->
<p style="page-break-before: always;">Content with page break before</p>
<p style="page-break-after: always;">Content with page break after</p>
Section Header Images
<!-- Regular section header image with text wrapping -->
<img data-section-header src="image.png" alt="Section Header" style="width: 400px; height: 150px; margin: 20px; display: block;"/>
<p>Text that wraps above and below the section header image.</p>
<!-- Cover image that spans full page width -->
<img data-section-header data-cover src="cover-image.png" alt="Cover Image" style="height: 200px; display: block;"/>
<p>Text that appears below the full-width cover image.</p>
🎨 CSS Property Support
Supported Properties
- text-align: left, center, right, justify
- font-weight: normal, bold, or numeric values ≥ 600
- font-style: normal, italic
- text-decoration: underline, line-through
- color: Hex colors (#RGB, #RRGGBB), named colors
- background-color: Hex colors and named colors
- font-size: px, pt units
- font-family: Font family names
- margin: All margin properties for spacing
- padding: Padding for text boxes
- border: Border styling for text boxes (width, style, color)
- border-radius: Border radius for text boxes
- width/height: Dimensions for images and text boxes
- page-break-before/page-break-after: always
- float: left, right (for images)
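Several of these properties need unit conversion on the way to OOXML:

- Font sizes become half-points (w:sz), so 16px (12pt at 96 px per inch) becomes 24.
- Colors become six-digit hex without #.
- text-align: justify becomes w:jc="both".
- font-weight counts as bold from 600 up.

The following sketch parses a style attribute into those values. It assumes only the units listed above. parseInlineStyle, toOoxmlProps, and the output property names are hypothetical, and only a small subset of CSS named colors is included.

```javascript
// Hypothetical sketch; not this package's parser.
const NAMED_COLORS = { black: '000000', white: 'FFFFFF', red: 'FF0000', green: '008000', blue: '0000FF' };
const JC = { left: 'left', center: 'center', right: 'right', justify: 'both' };
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// "color: red; font-size: 16px" -> Map { 'color' => 'red', 'font-size' => '16px' }
function parseInlineStyle(style = '') {
  const decls = new Map();
  for (const part of style.split(';')) {
    const idx = part.indexOf(':');
    if (idx === -1) continue;
    const prop = part.slice(0, idx).trim().toLowerCase();
    const value = part.slice(idx + 1).trim();
    if (prop && value) decls.set(prop, value);
  }
  return decls;
}

// Returns "RRGGBB" (upper case) or null for unsupported values.
function toHexColor(value) {
  const v = value.trim().toLowerCase();
  if (has(NAMED_COLORS, v)) return NAMED_COLORS[v];
  let m = /^#([0-9a-f]{3})$/.exec(v);
  if (m) return m[1].split('').map((c) => c + c).join('').toUpperCase();
  m = /^#([0-9a-f]{6})$/.exec(v);
  return m ? m[1].toUpperCase() : null;
}

// Font size in half-points: 1pt = 2 half-points, 1px = 0.75pt.
function toHalfPoints(value) {
  const m = /^([\d.]+)\s*(px|pt)$/i.exec(value.trim());
  if (!m) return null;
  const pt = m[2].toLowerCase() === 'px' ? parseFloat(m[1]) * 0.75 : parseFloat(m[1]);
  return Math.round(pt * 2);
}

function toOoxmlProps(style) {
  const d = parseInlineStyle(style);
  const props = {};
  const weight = d.get('font-weight');
  if (weight) props.bold = weight === 'bold' || Number(weight) >= 600;
  if (d.get('font-style') === 'italic') props.italic = true;
  const deco = d.get('text-decoration') || '';
  if (deco.includes('underline')) props.underline = true;
  if (deco.includes('line-through')) props.strike = true;
  if (d.has('color')) props.color = toHexColor(d.get('color'));
  if (d.has('background-color')) props.fill = toHexColor(d.get('background-color'));
  if (d.has('font-size')) props.sz = toHalfPoints(d.get('font-size'));
  const align = d.get('text-align');
  if (align && has(JC, align)) props.jc = JC[align];
  return props;
}
```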
Special Attributes
- data-section-header: Positions an image at the top of a content section with text wrapping
- data-cover: Makes a section header image span the full page width
- data-no-spacing: Removes default spacing from paragraphs
Automatic Sanitization
The library automatically removes or ignores:
- Dangerous elements: <script>, <iframe>, <object>, etc.
- Unsafe CSS: position: absolute, complex layouts
- Invalid Unicode characters
- Malformed HTML structures
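The "invalid Unicode characters" item matters because every part of a DOCX is XML 1.0. XML 1.0 forbids most control characters (anything below U+0020 other than tab, line feed, and carriage return), as well as lone surrogates and U+FFFE/U+FFFF. A single such character typically makes Word report the document as corrupt. The following is a sketch of the text-level cleanup. It assumes the cleanup runs on text before that text is written into w:t elements, and the helper names are hypothetical.

```javascript
// Hypothetical sketch of text cleanup before writing into <w:t>.
// Allowed by XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The u flag makes the class match whole code points, so lone surrogates are removed too.
const INVALID_XML_CHARS = /[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

function stripInvalidXmlChars(text) {
  return text.replace(INVALID_XML_CHARS, '');
}

// Escapes the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function toXmlText(text) {
  return escapeXml(stripInvalidXmlChars(text));
}
```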
📄 Headers & Footers
HTML Headers/Footers
const options = {
header: '<p style="text-align: center; font-size: 10pt;">Company Header</p>',
footer: '<p style="text-align: center; font-size: 10pt;">Page Footer</p>'
};
Image Headers/Footers
const options = {
header: 'data:image/png;base64,...', // Full-width header image
footer: 'data:image/png;base64,...' // Full-width footer image
};
Page Numbers
const options = {
enablePageNumbers: true,
pageNumberAlignment: 'center' // 'left', 'center', or 'right'
};
Page Borders
Page borders support configurable styles, colors, sizes, and corner radius.
const options = {
enablePageBorder: true,
pageBorder: {
style: 'double', // 'single', 'double', 'thick', 'dotted', 'dashed'
color: 'FF0000', // Hex color without #
size: 8, // Border thickness in points
radius: 5 // Border radius in points (rounded corners)
}
};
Heading Replacements
const options = {
headingReplacements: [
// H1 replacement
'<div style="background-color: #E6E6E6; padding: 10px;"><h1>HEADING_TEXT</h1></div>',
// H2 replacement
'<div style="border-left: 4px solid #0066CC; padding-left: 10px;"><h2>HEADING_TEXT</h2></div>',
// H3 replacement
'<h3 style="color: #0066CC;">HEADING_TEXT</h3>'
]
};
Electron Integration
// In Electron main process
const { ipcMain } = require('electron');
const HtmlToDocx = require('@kanaka-prabhath/html-to-docx');
ipcMain.handle('export-to-docx', async (event, { html, outputPath, options }) => {
try {
const converter = new HtmlToDocx(options);
await converter.convertHtmlToDocxFile(html, outputPath);
return { success: true };
} catch (error) {
return { success: false, error: error.message };
}
});
Batch Processing
// Assumes `converter` was created with new HtmlToDocx(options), as in the examples above
const documents = [
{ html: '<h1>Doc 1</h1>', name: 'document1' },
{ html: '<h1>Doc 2</h1>', name: 'document2' }
];
for (const doc of documents) {
await converter.convertHtmlToDocxFile(doc.html, `${doc.name}.docx`);
}
Advanced Features Examples
Section Header Images and Cover Images
const html = `
<h1>Document with Section Headers</h1>
<img data-section-header src="..." alt="Section Header" style="width: 400px; height: 150px; margin: 20px;"/>
<p>Content that wraps above and below the section header image.</p>
<page-break></page-break>
<img data-section-header data-cover src="..." alt="Cover Image" style="height: 200px;"/>
<p>Content below the full-width cover image.</p>
`;
const options = {
enablePageBorder: true,
pageBorder: {
style: 'double',
color: '000000',
size: 6,
radius: 8
}
};
await converter.convertHtmlToDocxFile(html, 'advanced-document.docx', options);
Enhanced Page Breaks and Text Boxes
const html = `
<div style="background-color: #F0F0F0; padding: 15px; border: 2px solid #333; border-radius: 5px;">
<h2>Text Box with Rounded Borders</h2>
<p>This content appears in a styled text box.</p>
</div>
<p style="page-break-after: always;">This paragraph forces a page break after it.</p>
<h2>New Page Content</h2>
<p>This appears on a new page due to the CSS page break.</p>
`;
await converter.convertHtmlToDocxFile(html, 'enhanced-layout.docx');
🏗️ Architecture
The library processes HTML through several stages:
- HTML Parsing: Uses JSDOM to parse and clean HTML
- Style Extraction: Parses inline CSS properties
- Element Processing: Converts HTML elements to OOXML
- Media Handling: Downloads and embeds images
- OOXML Generation: Creates Office Open XML structure
- ZIP Packaging: Packages everything into a .docx file
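The packaging stage has one easily missed requirement: the ZIP must contain [Content_Types].xml at its root, declaring a content type for every part, or Word will typically report the file as corrupt. The following is a sketch that builds this file for the kinds of parts listed in the file structure that follows. buildContentTypes is a hypothetical name, and the fixed set of base parts is an assumption.

```javascript
// Hypothetical sketch: builds [Content_Types].xml for the parts of a typical DOCX.
const WML = 'application/vnd.openxmlformats-officedocument.wordprocessingml';
// Assumes these parts are always present in the package.
const BASE_PARTS = [
  ['/word/document.xml', `${WML}.document.main+xml`],
  ['/word/styles.xml', `${WML}.styles+xml`],
  ['/word/numbering.xml', `${WML}.numbering+xml`],
  ['/docProps/core.xml', 'application/vnd.openxmlformats-package.core-properties+xml'],
  ['/docProps/app.xml', 'application/vnd.openxmlformats-officedocument.extended-properties+xml'],
];
const IMAGE_TYPES = new Map([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['gif', 'image/gif'],
]);

function buildContentTypes({ headers = 0, footers = 0, imageExtensions = [] } = {}) {
  // Defaults apply by file extension; Overrides apply to one specific part.
  const defaults = [
    ['rels', 'application/vnd.openxmlformats-package.relationships+xml'],
    ['xml', 'application/xml'],
  ];
  for (const ext of new Set(imageExtensions.map((e) => e.toLowerCase()))) {
    if (IMAGE_TYPES.has(ext)) defaults.push([ext, IMAGE_TYPES.get(ext)]);
  }
  const overrides = [...BASE_PARTS];
  for (let i = 1; i <= headers; i++) overrides.push([`/word/header${i}.xml`, `${WML}.header+xml`]);
  for (let i = 1; i <= footers; i++) overrides.push([`/word/footer${i}.xml`, `${WML}.footer+xml`]);
  return (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">' +
    defaults.map(([ext, type]) => `<Default Extension="${ext}" ContentType="${type}"/>`).join('') +
    overrides.map(([part, type]) => `<Override PartName="${part}" ContentType="${type}"/>`).join('') +
    '</Types>'
  );
}
```

With JSZip, the result would be stored at the archive root, for example with zip.file('[Content_Types].xml', buildContentTypes({ headers: 1, footers: 1, imageExtensions: ['png'] })).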
File Structure Inside DOCX
document.docx/
├── [Content_Types].xml
├── _rels/.rels
├── word/
│ ├── document.xml # Main document content
│ ├── styles.xml # Document styles
│ ├── numbering.xml # List numbering definitions
│ ├── header1.xml # Header content (if used)
│ ├── footer1.xml # Footer content (if used)
│ ├── _rels/ # Relationships
│ └── media/ # Embedded images
└── docProps/
├── app.xml # Application properties
    └── core.xml          # Core properties
🧪 Testing
# Run the demo
cd demo
npm install
npm test
# This creates test-output.docx with various formatting examples
📋 Requirements
- Node.js 14+
- Dependencies: jsdom, jszip
🤝 Contributing
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass
- Submit a pull request
📄 License
MIT License - see LICENSE file for details
👥 Author
Kanaka Prabhath
🙏 Acknowledgments
Built with JSDOM for HTML parsing and JSZip for ZIP file generation.
