docx-to-builder

v1.0.0

Published

22 days ago

Parse any .docx template and generate a ready-to-run JavaScript builder that reproduces it exactly using the docx npm package.


jermorrison22

docx word document template document-generation ai-documents word-template docx-generator word-automation


You have a branded Word template. You want to generate documents from it programmatically — with AI-written content, dynamic data, or automation pipelines. But Pandoc can't faithfully reproduce your layout, and docxtemplater requires manually adding {placeholders} to every field.

docx-to-builder takes a different approach: it reads your .docx template's raw XML, understands every formatting decision, and writes a JavaScript file that recreates that exact layout using the docx npm package.

How it works

your-template.docx  →  [docx-to-builder]  →  your-template-builder.js
                                                      ↓
                                             node your-template-builder.js
                                                      ↓
                                             branded-output.docx  ✓

Parse — reads the .docx XML directly (no third-party Python libraries needed)
Extract — captures colors, fonts, spacing, borders, tables, images, headers, footers
Infer — detects [bracketed placeholders] and maps them to data.fieldName references
Generate — writes a complete ES module with buildDocument(data, outputPath)
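
To make the Parse and Extract steps concrete, here is a minimal sketch of reading formatting out of WordprocessingML. It is not the tool's parser (that runs in Python); `extractRuns` and the sample XML are hypothetical, and a regex is used only to keep the example short. It relies on two facts about the format: `w:color` holds a hex RGB value and `w:sz` stores the font size in half-points.

```javascript
// Illustrative only: a tiny regex-based reader for WordprocessingML runs.
// This is NOT the tool's implementation; names and sample XML are made up.
const sampleXml =
  '<w:p>' +
  '<w:r><w:rPr><w:b/><w:color w:val="2C4A6E"/><w:sz w:val="28"/></w:rPr>' +
  '<w:t>Proposal for </w:t></w:r>' +
  '<w:r><w:rPr><w:sz w:val="22"/></w:rPr><w:t>[Client Name]</w:t></w:r>' +
  '</w:p>';

function extractRuns(xml) {
  const runs = [];
  // Each <w:r> element is one run: a span of text sharing the same formatting.
  const runPattern = /<w:r>([\s\S]*?)<\/w:r>/g;
  for (const match of xml.matchAll(runPattern)) {
    const body = match[1];
    const text = (body.match(/<w:t[^>]*>([\s\S]*?)<\/w:t>/) || [])[1] || '';
    const color = (body.match(/<w:color w:val="([0-9A-Fa-f]{6})"/) || [])[1] || null;
    const halfPoints = (body.match(/<w:sz w:val="(\d+)"/) || [])[1];
    runs.push({
      text,
      bold: /<w:b\/>/.test(body),
      color,
      // w:sz is expressed in half-points, so divide by 2 to get points.
      sizePt: halfPoints ? Number(halfPoints) / 2 : null,
    });
  }
  return runs;
}

const runs = extractRuns(sampleXml);
console.log(runs);
```

A production parser would use a real XML library and also resolve inherited styles, which this sketch ignores.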

Quick start

Requirements: Python 3.8+ · Node.js · docx npm package

# 1. Generate a builder from your template
python3 docx-to-builder.py my-template.docx

# 2. Install docx in your project (if you haven't already)
npm install docx

# 3. Run the builder
node my-template-builder.js

# Output: my-template-output.docx

Or run it straight from npm with npx:

npx docx-to-builder my-template.docx

Template conventions

Your template can use any Word formatting. To make fields dynamic, use bracketed placeholders:

| In your template | Generated JS |
|---|---|
| [Client Name] | data.clientName |
| [Proposal Title] | data.title |
| [Month Day, Year] | data.date |
| [Draft / Final] | data.status |
| [Your custom field] | data.yourCustomField |

Any text not in brackets is treated as a static label and kept as-is.
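
The simplest form of this mapping can be sketched as a bracket scan followed by camel-casing. The helpers `findPlaceholders` and `toFieldName` are hypothetical, not the tool's own functions. Note that the table above shows friendlier names for some fields (for example [Month Day, Year] becomes data.date), so the real generator evidently applies naming rules beyond what this sketch does.

```javascript
// Hypothetical helpers; the tool's real inference rules may differ.
function findPlaceholders(text) {
  // Capture the text inside each pair of square brackets.
  return Array.from(text.matchAll(/\[([^\[\]]+)\]/g), (m) => m[1].trim());
}

function toFieldName(label) {
  // Split on anything that is not a letter or digit, then camel-case the words.
  const words = label.split(/[^A-Za-z0-9]+/).filter(Boolean);
  return words
    .map((word, i) => {
      const lower = word.toLowerCase();
      return i === 0 ? lower : lower[0].toUpperCase() + lower.slice(1);
    })
    .join('');
}

const line = 'Prepared for [Client Name] by [Your custom field]';
const fields = findPlaceholders(line).map((label) => ({
  label,
  reference: 'data.' + toFieldName(label),
}));
console.log(fields);
```

Text outside the brackets ("Prepared for", "by") is left alone, which matches the static-label behavior described above.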

What gets extracted

| Template element | Extracted |
|---|---|
| Body paragraphs | Text, font, size, color, bold, italic, all-caps, spacing, alignment |
| Section headings | With red rule lines, borders, spacing |
| Bullet lists | Level, indent, font |
| Tables | Cell widths, shading, borders, margins, content |
| Inline images | Size (EMU → pt), relationship to file, auto-copied to assets/ |
| Headers & footers | Text, tab stops, border rules, page number fields |
| Page margins | Per-section |
| Brand colors | Auto-named constants (COLOR.accent, COLOR.body, etc.) |
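
The "EMU → pt" note refers to the units Word stores internally. As a small sketch of the arithmetic involved (the function names are hypothetical, the constants come from the Office Open XML format): image sizes are in EMU at 914400 per inch, spacing and margins are in twentieths of a point, and font sizes are in half-points.

```javascript
// Unit conversions for values found in .docx XML. Function names are illustrative.
const EMU_PER_POINT = 12700;  // 914400 EMU per inch / 72 points per inch
const TWIPS_PER_POINT = 20;   // spacing, indents, and margins: twentieths of a point
const HALF_POINTS_PER_POINT = 2; // w:sz font sizes: half-points

function emuToPt(emu) {
  return emu / EMU_PER_POINT;
}

function twipsToPt(twips) {
  return twips / TWIPS_PER_POINT;
}

function halfPointsToPt(halfPoints) {
  return halfPoints / HALF_POINTS_PER_POINT;
}

console.log(emuToPt(914400));    // a 1-inch-wide image is 72 pt
console.log(twipsToPt(1440));    // a 1-inch margin is 72 pt
console.log(halfPointsToPt(22)); // w:sz="22" is 11 pt
```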

Generated file structure

// Brand colors extracted from template
const COLOR = { accent: '2C4A6E', body: '444444', ... };
const FONT = 'Calibri';
const LOGO_PATH = join(__dirname, 'assets/logo.png');

// Header and footer — exact replica of template
function buildHeader(data) { ... }
function buildFooter() { ... }

// Document body — all layout hardcoded, bracketed fields wired to data object
function buildContent(data) { ... }

// Main entry point — call this from your code or pipeline
export async function buildDocument(data, outputPath) { ... }

// Data object — auto-populated from inferred placeholders
const data = {
  title: 'Proposal Title',
  clientName: 'Client Name',
  date: '...',
};

Wiring in your own content

After generation, the data object at the bottom of the file maps directly to the bracketed fields the generator found. Replace the example values with your real data — or import buildDocument and pass a data object from your own code:

import { buildDocument } from './my-template-builder.js';

await buildDocument({
  title: 'Cloud Migration Proposal',
  clientName: 'Acme Corp',
  date: 'April 1, 2026',
  status: 'Final',
}, 'output/acme-proposal.docx');
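
When the data object comes from an automated pipeline, it can help to check it before calling buildDocument. The sketch below is one way to do that; `findMissingFields` and `REQUIRED_FIELDS` are hypothetical names that are not part of the generated builder, and the field list simply mirrors the example call above.

```javascript
// Hypothetical pre-flight check; not part of the generated builder.
const REQUIRED_FIELDS = ['title', 'clientName', 'date', 'status'];

function findMissingFields(data, requiredFields) {
  // A field counts as missing if it is absent, not a string, or blank.
  return requiredFields.filter((name) => {
    const value = data[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

const draft = {
  title: 'Cloud Migration Proposal',
  clientName: 'Acme Corp',
  date: '',
};

const missing = findMissingFields(draft, REQUIRED_FIELDS);
if (missing.length > 0) {
  console.log('Missing fields: ' + missing.join(', '));
}
```

Running the check first means an incomplete record is reported by name instead of surfacing later as a blank spot in the finished document.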

Example

The examples/ folder contains:

sample-template.docx — a generic proposal template (Meridian Consulting)
sample-builder.js — the builder generated from that template

Run the example:

cd examples
npm install docx
node sample-builder.js
# → output/sample-output.docx

Limitations

Inline images — fully extracted. The image file is automatically copied to assets/ next to the generated builder — no manual step needed
Complex graphics — SmartArt, shapes, charts, and WordArt use a different XML format and are skipped; the surrounding paragraph text is preserved
Multi-section documents — section breaks are noted as comments; the generated builder uses a single section (the last one's margins). Multi-section support is on the roadmap
Dynamic content — the generator wires up bracketed placeholders automatically, but long static template paragraphs become literal strings. Replace them with data.fieldName references for full dynamic control
Table of contents — TOC fields are not regenerated

CLI options

python3 docx-to-builder.py <template.docx> [--output <builder.js>]

# Examples:
python3 docx-to-builder.py proposal.docx
python3 docx-to-builder.py proposal.docx --output src/builders/proposal-builder.js
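
For scripting around the CLI, the file names it produces by default can be predicted from the template name. This sketch assumes the pattern shown in Quick start (my-template.docx gives my-template-builder.js and my-template-output.docx) holds for any template; `defaultNames` is a hypothetical helper, and whether the tool keeps the template's directory is an assumption.

```javascript
// Hypothetical helper that mirrors the naming pattern seen in Quick start.
// Assumption: the stem of the template file is reused for both outputs.
function defaultNames(templatePath) {
  const stem = templatePath.replace(/\.docx$/i, '');
  return {
    builder: stem + '-builder.js',
    output: stem + '-output.docx',
  };
}

const names = defaultNames('proposal.docx');
console.log(names.builder);
console.log(names.output);
```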

Contributing

Issues and PRs welcome. If your template produces unexpected output, open an issue and attach the template (or a sanitized version of it).

License

MIT — see LICENSE
