docgen-utils
v1.0.9
Utilities for converting HTML into DOCX and PPTX.
DocGen
Converts HTML to DOCX and PPTX, and imports DOCX and PPTX back to HTML.
Key Components
Build & Distribution
| File | Description |
| ---------- | -------------------------------------------------------------- |
| build.sh | Builds the library |
| dist/ | Output directory containing production-ready minified JS files |
Usage
Build
npm run build
CLI
The CLI is used in the agent sandbox to transform artifacts.
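A script that drives the CLI can assemble its arguments programmatically. The sketch below is a minimal illustration that uses only the subcommands and flags documented in this section; `buildCliArgs` is a hypothetical helper and is not part of docgen-utils.

```javascript
// Hypothetical helper (not part of docgen-utils): builds the argument list for
// `node dist/cli.js ...` using only the subcommands and flags listed in this README.
function buildCliArgs(action, kind, inputs, outDir) {
  const files = Array.isArray(inputs) ? inputs : [inputs];
  if (files.length === 0) {
    throw new Error("at least one input file is required");
  }
  // The README shows --files (comma-separated) for `export slides` and --file otherwise.
  const isSlideExport = action === "export" && kind === "slides";
  const inputFlag = isSlideExport
    ? `--files=${files.join(",")}`
    : `--file=${files[0]}`;
  return ["dist/cli.js", action, kind, inputFlag, `--out-dir=${outDir}`];
}

const importArgs = buildCliArgs("import", "docx", "file.docx", "./output");
const slideArgs = buildCliArgs(
  "export",
  "slides",
  ["slide-1.html", "slide-2.html"],
  "./output"
);

console.log(["node", ...importArgs].join(" "));
console.log(["node", ...slideArgs].join(" "));
```

The resulting array can be passed to `execFileSync("node", args)` from `node:child_process`, assuming the library has already been built so that `dist/cli.js` exists.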
node dist/cli.js import docx --file=file.docx --out-dir=./output
node dist/cli.js import pptx --file=file.pptx --out-dir=./output
node dist/cli.js export docs --file=file.html --out-dir=./output
node dist/cli.js export slides --files=slide-1.html,slide-2.html --out-dir=./output
Deploy
az login
npm run deploy
Visual Comparison
The output directory contains renders of each file in both its source and target formats (for example, DOCX or PPTX next to HTML) so the two can be compared visually.
- Files in test-data/docs/ → converted to DOCX → docx-render.jpg
- Files in test-data/slides/ → converted to PPTX → pptx-render.jpg
- Files in test-data/pptx/ → imported to HTML → html-render.jpg
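The three mappings above can be written down as a small lookup table. This is only a sketch: `PIPELINES` and `describePipeline` are hypothetical names, and the real script may choose the conversion by file extension rather than by directory.

```javascript
// Hypothetical lookup (not part of docgen-utils) mirroring the three mappings above.
const PIPELINES = [
  { dir: "test-data/docs/", conversion: "HTML -> DOCX", render: "docx-render.jpg" },
  { dir: "test-data/slides/", conversion: "HTML -> PPTX", render: "pptx-render.jpg" },
  { dir: "test-data/pptx/", conversion: "PPTX -> HTML", render: "html-render.jpg" },
];

function describePipeline(inputPath) {
  // Normalize Windows separators so the prefix check works on any platform.
  const normalized = inputPath.replace(/\\/g, "/");
  const match = PIPELINES.find((p) => normalized.startsWith(p.dir));
  if (!match) {
    throw new Error(`no known pipeline for ${inputPath}`);
  }
  return { conversion: match.conversion, render: match.render };
}

console.log(describePipeline("test-data/slides/slide-1.html"));
```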
Prerequisites
Install the required system dependencies:
macOS:
# LibreOffice (for DOCX/PPTX → PDF conversion)
brew install --cask libreoffice
# Poppler (for PDF → PNG conversion)
brew install poppler
# Chromium for Playwright
npx playwright install chromium
Usage
Process a specific document (HTML -> DOCX):
npm run generate-output -- test-data/docs/doc-1.html
Process a specific slide (HTML -> PPTX):
npm run generate-output -- test-data/slides/slide-1.html
Import a PPTX file (PPTX -> HTML):
npm run generate-output -- test-data/pptx/presentation.pptx
Process multiple files:
npm run generate-all-docs-output # All docs
npm run generate-all-slides-output # All slides
Output
The script generates the following structure:
output/
├── doc-1.html/
│ ├── html-render.jpg # Screenshot of HTML (via Playwright)
│ ├── docx-render.jpg # DOCX rendered via LibreOffice
│ ├── diff.jpg # Visual diff highlighting differences
│ ├── output.docx # Generated DOCX file
│ └── report.json # Comparison metrics
├── slide-1.html/
│ ├── html-render.jpg # Screenshot of HTML (via Playwright)
│ ├── pptx-render.jpg # PPTX rendered via LibreOffice
│ ├── diff.jpg # Visual diff highlighting differences
│ ├── output.pptx # Generated PPTX file
│ └── report.json # Comparison metrics
├── presentation.pptx/
│ ├── pptx-render.jpg # Original PPTX rendered via LibreOffice
│ ├── html-render.jpg # Imported HTML rendered via Playwright
│ ├── diff.jpg # Visual diff highlighting differences
│ ├── output.html # Generated HTML (all slides concatenated)
│ └── report.json # Comparison metrics
└── ...
Metrics explained:
- pixelDiff.percentDiff - Percentage of pixels that differ between the two images (lower is better).
- ssim.mssim - Structural Similarity Index (0 to 1, higher is better). Values above 0.9 indicate very similar images.
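As a rough illustration of how these two metrics might be checked in a script, the sketch below evaluates a parsed report.json. It assumes the file nests the values as `pixelDiff.percentDiff` and `ssim.mssim`, as the metric names suggest. `evaluateReport` is a hypothetical helper, the 0.9 default comes from the note above, and any limit on `percentDiff` is left to the caller because this README does not state one.

```javascript
// Hypothetical check (not part of docgen-utils) over a parsed report.json.
// Assumes the report nests the two metrics as pixelDiff.percentDiff and ssim.mssim.
function evaluateReport(report, { minMssim = 0.9, maxPercentDiff = Infinity } = {}) {
  const percentDiff = report?.pixelDiff?.percentDiff;
  const mssim = report?.ssim?.mssim;
  if (typeof percentDiff !== "number" || typeof mssim !== "number") {
    throw new Error("report is missing pixelDiff.percentDiff or ssim.mssim");
  }
  return {
    percentDiff,
    mssim,
    // Both conditions must hold: structurally similar and few differing pixels.
    similar: mssim > minMssim && percentDiff <= maxPercentDiff,
  };
}

// Made-up values for illustration only; they are not real measurements.
const sample = JSON.parse('{"pixelDiff":{"percentDiff":1.8},"ssim":{"mssim":0.95}}');
console.log(evaluateReport(sample));
console.log(evaluateReport(sample, { maxPercentDiff: 1 }));
```

In practice the object would come from reading a file such as output/doc-1.html/report.json and passing its contents through `JSON.parse`.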
