@joelouf/doc-template

v2.0.0, published 3 months ago

A modular, zero-dependency .docx template engine that scans for placeholder tokens, populates with data, and produces ready-to-use documents from any Word template.


Maintainer: joelouf

Keywords: docx, docx-template, document-template, mail-merge, placeholder, token-replacement, zero-dependencies, word-document, lease-template, document-generation

Users author templates in Word or Google Docs using {{TOKEN}} placeholders and {{#BLOCK}}...{{/BLOCK}} repeating sections. The engine handles the hard part: reassembling text fragments that Word splits across multiple XML runs, expanding repeating blocks with variable-length data, then replacing every token with real data. One function to scan, one function to populate. The output is a valid .docx that opens cleanly in any word processor.

Features

Zero runtime dependencies - ZIP handling uses Node.js built-in zlib; XML parsing and run merging are hand-written
Split-run resolution - merges adjacent XML runs that Word fragments due to spell-check, revision tracking, or cursor placement, so tokens like {{TENANT_NAME}} are always found regardless of how they were typed
Repeating blocks - {{#BLOCK}}...{{/BLOCK}} sections expand once per item in an array, with full inner token replacement per iteration — works with paragraphs and table rows
Full document coverage - scans and replaces tokens in the document body, headers, and footers
Three missing-data behaviors - preserve the raw token, remove it, or insert a configurable placeholder (e.g., ___________)
Pure functions - synchronous, no I/O, no side effects; takes a Buffer in and returns a Buffer out
TypeScript declarations - hand-written .d.ts files for full type safety
Backward compatible - templates without blocks work identically to v1; the block syntax is purely additive

Architecture

core/
  types.js              # Constants: token/block patterns, content file list, defaults
  zip.js                # ZIP archive reader/writer (zlib only, no dependencies)
  xml.js                # Lightweight XML DOM parser/serializer for OOXML
  merge.js              # Adjacent run merger (solves the split-run problem)
  blocks.js             # Block expansion engine (repeating sections)
  tokens.js             # Token scanner and replacer for w:t text nodes
index.js                # Public API: scan() and populate()

The pipeline runs through the core modules in the order listed above (types.js only supplies shared constants): zip unpacks the .docx archive, xml parses each XML file into a tree, merge consolidates fragmented runs, blocks expands repeating sections, and tokens finds and replaces {{PLACEHOLDERS}}. The public API orchestrates all five steps in a single call.

Install

npm install @joelouf/doc-template

Quick Start

Scan a Template

import { scan } from '@joelouf/doc-template';
import { readFileSync } from 'fs';

const template = readFileSync('lease-template.docx');
const result = scan(template);

// Scalar tokens
for (const { token, locations } of result.tokens) {
    console.log(`{{${token}}} found in: ${locations.join(', ')}`);
}
// {{RENT_AMOUNT}} found in: word/document.xml, word/header1.xml
// {{PROPERTY_ADDRESS}} found in: word/document.xml

// Block sections
for (const { name, innerTokens, locations } of result.blocks) {
    console.log(`{{#${name}}} block in: ${locations.join(', ')}`);
    console.log(`  Inner tokens: ${innerTokens.join(', ')}`);
}
// {{#TENANTS}} block in: word/document.xml
//   Inner tokens: NAME, PHONE, EMAIL

Populate a Template

import { populate } from '@joelouf/doc-template';
import { readFileSync, writeFileSync } from 'fs';

const template = readFileSync('lease-template.docx');

const output = populate(template, {
    // Scalar tokens
    PROPERTY_ADDRESS: '1234 Desert Rose Dr, Henderson, NV 89052',
    RENT_AMOUNT: '$1,850.00',
    LEASE_START_DATE: 'January 1, 2026',
    LEASE_END_DATE: 'December 31, 2026',

    // Block data — each array item produces one copy of the block content
    TENANTS: [
        { NAME: 'John Smith', PHONE: '(702) 555-1234', EMAIL: '[email protected]' },
        { NAME: 'Jane Smith', PHONE: '(702) 555-5678', EMAIL: '[email protected]' },
    ],
});

writeFileSync('lease-john-jane-smith.docx', output);

Handle Missing Tokens

// Assumes `template` (a template Buffer) and `data` (a data map) are already defined.
// Leave unresolved tokens in place (default)
populate(template, data);
populate(template, data, { missingTokenBehavior: 'preserve' });

// Replace missing tokens with a fill-in-the-blank line
populate(template, data, {
    missingTokenBehavior: 'placeholder',
    placeholderText: '___________',
});

// Remove missing tokens entirely
populate(template, data, { missingTokenBehavior: 'remove' });

Token Format

Tokens use double curly braces: {{TOKEN_NAME}}. The token name must start with an uppercase letter and contain only uppercase letters, digits, and underscores.

Valid tokens:

{{TENANT_NAME}}
{{RENT_AMOUNT}}
{{ADDRESS_LINE1}}
{{FIELD2A}}

Not matched (by design):

{{lowercase}} - must be uppercase
{{MixedCase}} - must be fully uppercase
{{_LEADING_UNDERSCORE}} - must start with a letter
{{123}} - must start with a letter
{{ SPACES }} - no spaces allowed

This convention prevents false matches on natural language like {{see attached}} that might appear in legal documents, while keeping tokens readable and obvious at a glance in any word processor.
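Those rules map onto a short regular expression. The pattern below is an assumption reconstructed from the prose (the package exports its own `TOKEN_PATTERN` from `./core`, whose exact form may differ), and `findTokenNames` is a hypothetical helper used to check the valid and rejected examples above.

```javascript
// Assumed pattern reconstructed from the rules above; the package's own
// TOKEN_PATTERN constant (exported from ./core) may be written differently.
const TOKEN_RE = /\{\{([A-Z][A-Z0-9_]*)\}\}/g;

function findTokenNames(text) {
  // matchAll iterates over a copy of the regex, so TOKEN_RE's lastIndex
  // is never advanced and repeated calls behave the same.
  return [...text.matchAll(TOKEN_RE)].map((m) => m[1]);
}

const sample = [
  '{{TENANT_NAME}}', '{{ADDRESS_LINE1}}', '{{FIELD2A}}',
  '{{lowercase}}', '{{MixedCase}}', '{{_LEADING_UNDERSCORE}}',
  '{{123}}', '{{ SPACES }}', '{{see attached}}',
].join(' ');

console.log(JSON.stringify(findTokenNames(sample)));
// ["TENANT_NAME","ADDRESS_LINE1","FIELD2A"]
```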

Repeating Blocks

Blocks use {{#BLOCK_NAME}} to open and {{/BLOCK_NAME}} to close a repeating section. The content between the markers is cloned once per item in the corresponding data array.

Paragraph Blocks

The most common use case — each item produces a copy of the enclosed paragraphs:

Template:

The following individuals are tenants under this agreement:

{{#TENANTS}}
Tenant: {{NAME}}, Phone: {{PHONE}}, Email: {{EMAIL}}
{{/TENANTS}}

Data:

{
    TENANTS: [
        { NAME: 'John Smith', PHONE: '(702) 555-1234', EMAIL: '[email protected]' },
        { NAME: 'Jane Smith', PHONE: '(702) 555-5678', EMAIL: '[email protected]' },
        { NAME: 'Bob Jones',  PHONE: '(702) 555-9999', EMAIL: '[email protected]' },
    ]
}

Output:

The following individuals are tenants under this agreement:

Tenant: John Smith, Phone: (702) 555-1234, Email: [email protected]
Tenant: Jane Smith, Phone: (702) 555-5678, Email: [email protected]
Tenant: Bob Jones, Phone: (702) 555-9999, Email: [email protected]

Table Row Blocks

When block markers are inside table rows, the engine clones at the row level — perfect for tenant rosters, payment schedules, and similar tabular data:

Template (Word table):

| Name | Phone | Email |
|---|---|---|
| {{#TENANTS}} | | |
| {{NAME}} | {{PHONE}} | {{EMAIL}} |
| {{/TENANTS}} | | |

Output (3 tenants):

| Name | Phone | Email |
|---|---|---|
| John Smith | (702) 555-1234 | [email protected] |
| Jane Smith | (702) 555-5678 | [email protected] |
| Bob Jones | (702) 555-9999 | [email protected] |

Block Rules

Opening ({{#NAME}}) and closing ({{/NAME}}) markers must each be in their own paragraph or table row — they cannot share a paragraph with other text.
Block names follow the same naming rules as scalar tokens (uppercase letters, digits, underscores).
Inner tokens within blocks are independent of top-level tokens — you can use {{NAME}} inside a {{#TENANTS}} block without conflicting with a top-level {{TENANT_NAME}} token.
If the data map does not contain the block name, or the value is not an array, the entire block (markers + content) is removed from the output.
The missingTokenBehavior option applies to unresolved inner tokens within blocks, using the same rules as top-level tokens.

Multi-Paragraph Blocks

Blocks can contain multiple paragraphs — all content between the markers is cloned per item:

Template:

{{#TENANTS}}
Name: {{NAME}}
Contact: {{PHONE}} | {{EMAIL}}
Emergency Contact: {{EMERGENCY_CONTACT}}

{{/TENANTS}}

Each item produces all three paragraphs (plus the blank line), fully populated with that item's data.
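The block rules can be sketched on a simplified model in which each string is one already-merged paragraph or table row. This is a hypothetical illustration, not the package's `expandBlocks`: `expandParagraphBlocks` and `fillItem` are invented names, nested blocks are ignored, and unresolved inner tokens simply follow the 'preserve' behavior.

```javascript
// Hypothetical sketch: each string stands in for one paragraph (or table row)
// whose runs have already been merged. Nested blocks are not handled.
const OPEN_RE = /^\{\{#([A-Z][A-Z0-9_]*)\}\}$/;
const CLOSE_RE = /^\{\{\/([A-Z][A-Z0-9_]*)\}\}$/;
const TOKEN_RE = /\{\{([A-Z][A-Z0-9_]*)\}\}/g;

// Fill inner tokens from one array item; unresolved ones are preserved.
function fillItem(text, item) {
  return text.replace(TOKEN_RE, (match, name) =>
    typeof item[name] === 'string' ? item[name] : match);
}

function expandParagraphBlocks(paragraphs, data) {
  const out = [];
  for (let i = 0; i < paragraphs.length; i++) {
    // Markers only count when they are the paragraph's entire text.
    const open = paragraphs[i].trim().match(OPEN_RE);
    if (!open) {
      out.push(paragraphs[i]);
      continue;
    }
    const name = open[1];
    let end = i + 1;
    while (end < paragraphs.length &&
           paragraphs[end].trim().match(CLOSE_RE)?.[1] !== name) {
      end++;
    }
    if (end === paragraphs.length) throw new Error(`Unclosed block {{#${name}}}`);

    const items = data[name];
    // Missing or non-array data drops the whole block, markers included.
    if (Array.isArray(items)) {
      const body = paragraphs.slice(i + 1, end);
      for (const item of items) {
        for (const p of body) out.push(fillItem(p, item));
      }
    }
    i = end; // continue after the closing marker
  }
  return out;
}

const expanded = expandParagraphBlocks(
  [
    'The following individuals are tenants under this agreement:',
    '{{#TENANTS}}',
    'Tenant: {{NAME}}, Phone: {{PHONE}}',
    '{{/TENANTS}}',
    '{{#PETS}}',
    'Pet: {{NAME}}',
    '{{/PETS}}',
  ],
  {
    TENANTS: [
      { NAME: 'John Smith', PHONE: '(702) 555-1234' },
      { NAME: 'Jane Smith', PHONE: '(702) 555-5678' },
    ],
  },
);

console.log(expanded.join('\n'));
// The following individuals are tenants under this agreement:
// Tenant: John Smith, Phone: (702) 555-1234
// Tenant: Jane Smith, Phone: (702) 555-5678
```

Because the cloning unit is a whole paragraph (or row), a multi-paragraph body is copied as a group for each item, in order, and the PETS block vanishes because the data map has no PETS array.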

The Split-Run Problem

When a user types {{TENANT_NAME}} in Word, the underlying XML often looks like this:

<w:r><w:t>{{TENANT</w:t></w:r>
<w:r><w:t>_NAME</w:t></w:r>
<w:r><w:t>}}</w:t></w:r>

Word splits text into separate runs when spell-check flags part of a token, when the cursor was placed mid-token during editing, or when Word's revision tracking tags text typed in different editing sessions with different revision-save IDs (rsid attributes). The token looks fine in the Word UI, but no simple regex on a single <w:t> node will find it.

The run merger solves this by walking each paragraph and combining adjacent runs that share identical formatting properties (<w:rPr>) into a single run. After merging, a token whose fragments all carry the same formatting ends up in one text node:

<w:r><w:t>{{TENANT_NAME}}</w:t></w:r>

The merger is conservative - it only touches runs whose sole content children are <w:rPr> and <w:t>. Runs containing tabs, page breaks, drawings, field codes, or any other structural elements are left untouched.
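A minimal sketch of that merging rule follows, assuming a simplified run model in which formatting is compared as a serialized `<w:rPr>` string and every run holding anything other than plain text is marked `kind: 'other'`. The model and `mergeAdjacentRuns` are hypothetical; they illustrate the technique, not the package's merge.js.

```javascript
// Hypothetical model of a paragraph's runs after parsing:
//   { kind: 'text', rPr: '<serialized w:rPr, or "" if none>', text: '...' }
//   { kind: 'other', xml: '...' }  // tabs, breaks, drawings, field codes, etc.
// Comparing rPr as serialized strings is an assumption made for brevity.
function mergeAdjacentRuns(runs) {
  const merged = [];
  for (const run of runs) {
    const prev = merged[merged.length - 1];
    const canMerge =
      run.kind === 'text' && prev !== undefined &&
      prev.kind === 'text' && prev.rPr === run.rPr;
    if (canMerge) {
      // Same formatting: fold the text into the previous run (input is not mutated).
      merged[merged.length - 1] = { ...prev, text: prev.text + run.text };
    } else {
      merged.push(run); // anything else is kept untouched
    }
  }
  return merged;
}

const runs = [
  { kind: 'text', rPr: '', text: 'Tenant: ' },
  { kind: 'text', rPr: '', text: '{{TENANT' },
  { kind: 'text', rPr: '', text: '_NAME' },
  { kind: 'text', rPr: '', text: '}}' },
  { kind: 'other', xml: '<w:r><w:tab/></w:r>' },
  { kind: 'text', rPr: '<w:rPr><w:b/></w:rPr>', text: 'Signed' },
];

console.log(JSON.stringify(mergeAdjacentRuns(runs).map((r) => r.text ?? r.xml)));
// ["Tenant: {{TENANT_NAME}}","<w:r><w:tab/></w:r>","Signed"]
```

The tab run acts as a barrier and the bold run keeps its own formatting, which is the conservative behavior described above: only plain text runs with matching properties are combined.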

API Reference

`scan(buffer)`

Scan a .docx template for all {{TOKEN}} placeholders and {{#BLOCK}}...{{/BLOCK}} sections.

| Parameter | Type | Description |
|---|---|---|
| buffer | Buffer \| Uint8Array | The .docx file bytes |

Returns: { tokens: TokenLocation[], blocks: BlockDefinition[] }

Each scalar token entry includes the token name and an array of XML file paths where it was found. Block definitions include the block name, the inner token names found within the block, and the XML file paths. Inner tokens are excluded from the top-level tokens array to avoid double-reporting.
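The double-reporting rule can be illustrated with a small hypothetical scanner over plain paragraph strings. `scanParagraphs` is an invented name, the regexes are assumptions based on the Token Format and Block Rules sections, and unlike the real `scan()` it ignores file locations.

```javascript
// Hypothetical sketch of the reporting rule: each string stands in for one
// merged paragraph; the real scan() also records XML file locations.
const TOKEN_RE = /\{\{([A-Z][A-Z0-9_]*)\}\}/g;
const OPEN_RE = /^\{\{#([A-Z][A-Z0-9_]*)\}\}$/;
const CLOSE_RE = /^\{\{\/([A-Z][A-Z0-9_]*)\}\}$/;

function scanParagraphs(paragraphs) {
  const tokens = new Set();
  const blocks = [];
  let current = null; // the block being read, if any

  for (const raw of paragraphs) {
    const text = raw.trim();
    const open = text.match(OPEN_RE);
    const close = text.match(CLOSE_RE);
    if (open && !current) {
      current = { name: open[1], inner: new Set() };
    } else if (close && current && close[1] === current.name) {
      blocks.push({ name: current.name, innerTokens: [...current.inner] });
      current = null;
    } else {
      // Tokens inside a block count only as that block's inner tokens.
      const target = current ? current.inner : tokens;
      for (const m of text.matchAll(TOKEN_RE)) target.add(m[1]);
    }
  }
  return { tokens: [...tokens], blocks };
}

const result = scanParagraphs([
  'Lease for {{PROPERTY_ADDRESS}}',
  '{{#TENANTS}}',
  'Tenant: {{NAME}}, Phone: {{PHONE}}',
  '{{/TENANTS}}',
  'Rent: {{RENT_AMOUNT}}',
]);

console.log(JSON.stringify(result));
// {"tokens":["PROPERTY_ADDRESS","RENT_AMOUNT"],"blocks":[{"name":"TENANTS","innerTokens":["NAME","PHONE"]}]}
```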

`populate(buffer, data, options?)`

Replace {{TOKEN}} placeholders and expand {{#BLOCK}}...{{/BLOCK}} sections with values from the data map.

| Parameter | Type | Description |
|---|---|---|
| buffer | Buffer \| Uint8Array | The .docx template file bytes |
| data | PopulateData | Token names mapped to replacement values or block data arrays |
| options | PopulateOptions | Optional behavior configuration |

PopulateData: Record<string, string | Record<string, string>[]>

Scalar tokens map to strings (e.g., { RENT_AMOUNT: '$1,850.00' })
Block tokens map to arrays of token maps (e.g., { TENANTS: [{ NAME: '...', ... }, ...] })

Options:

| Property | Type | Default | Description |
|---|---|---|---|
| missingTokenBehavior | 'remove' \| 'preserve' \| 'placeholder' | 'preserve' | What to do with tokens not in the data map |
| placeholderText | string | '___________' | Text to insert when behavior is 'placeholder' |

Returns: Buffer - a valid .docx file.

Advanced Usage

The ./core sub-path export exposes all internal modules for direct manipulation:

import {
    readZip, writeZip,
    parseXml, serializeXml, findAll, firstChild, childElements, textContent, cloneNode,
    mergeRuns,
    expandBlocks, scanBlocks,
    scanTokens, replaceTokens,
    TOKEN_PATTERN, BLOCK_OPEN_PATTERN, BLOCK_CLOSE_PATTERN, CONTENT_XML_FILES,
} from '@joelouf/doc-template/core';

This allows you to build custom pipelines - for example, scanning only specific XML files, applying your own transformations to the XML tree between merging and token replacement, or processing non-OOXML ZIP-based formats using the generic readZip/writeZip utilities.

ZIP Utilities

import { readZip, writeZip } from '@joelouf/doc-template/core';

const entries = readZip(docxBuffer);
// entries: Map<string, Buffer> - filename to uncompressed content

const newZip = writeZip(entries);
// newZip: Buffer - valid ZIP archive

The ZIP reader parses the central directory, then follows each local file header to find the correct data offset (the extra-field length in a local header can differ from the one recorded in the central directory, depending on the authoring tool), decompresses with zlib.inflateRawSync, and verifies the CRC-32 of every entry. The writer compresses with deflate and falls back to stored (method 0) when deflation doesn't reduce the size.
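For reference, the CRC-32 used by ZIP is the reflected IEEE polynomial 0xEDB88320 with an initial value and final XOR of 0xFFFFFFFF. The table-driven version below is a generic sketch, not the package's zip.js; `verifyEntry` is a hypothetical helper showing where the check would sit after decompression.

```javascript
// Generic table-driven CRC-32 (reflected polynomial 0xEDB88320), the
// checksum ZIP records for each entry. Not the package's own zip.js.
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes) {
  let crc = 0xffffffff; // initial value
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0; // final XOR, as an unsigned 32-bit number
}

// Hypothetical helper: compare decompressed bytes with the stored CRC.
function verifyEntry(name, data, expectedCrc) {
  const actual = crc32(data);
  if (actual !== expectedCrc) {
    throw new Error(
      `CRC-32 mismatch in ${name}: expected ${expectedCrc.toString(16)}, got ${actual.toString(16)}`,
    );
  }
}

const check = crc32(new TextEncoder().encode('123456789'));
console.log(check.toString(16)); // cbf43926
verifyEntry('check.txt', new TextEncoder().encode('123456789'), 0xcbf43926);
```

The value 0xCBF43926 for the ASCII string "123456789" is the standard CRC-32 check value, which makes it a convenient self-test. A Node.js Buffer works as input too, since iterating it yields bytes.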

XML Utilities

import { parseXml, serializeXml, findAll, firstChild, cloneNode } from '@joelouf/doc-template/core';

const doc = parseXml(xmlString);
// doc: { declaration: string | null, root: XmlElement }

const paragraphs = findAll(doc.root, 'w:p');
const pPr = firstChild(paragraphs[0], 'w:pPr');

// Deep clone a subtree (no shared references with original)
const copy = cloneNode(paragraphs[0]);

const xmlOut = serializeXml(doc);

The XML parser handles namespaced elements and attributes, self-closing tags, entity encoding/decoding, and xml:space="preserve". It does not support CDATA, comments, or DTDs - only the subset of XML that .docx files produce.
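Entity handling matters because replacement values such as "Smith & Jones" must be escaped before they are written back into a `<w:t>` node. A minimal sketch of encoding and decoding the five predefined XML entities plus numeric character references follows; `encodeXmlText` and `decodeXmlText` are hypothetical helpers, not exports of the package.

```javascript
// Hypothetical helpers for the five predefined XML entities plus numeric
// character references; not the package's xml.js.
const ENCODE_MAP = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const DECODE_MAP = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

function encodeXmlText(text) {
  return text.replace(/[&<>"']/g, (ch) => ENCODE_MAP[ch]);
}

function decodeXmlText(text) {
  // Unknown named entities do not match and are left untouched here.
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|amp|lt|gt|quot|apos);/g, (match, ref) => {
    if (ref.startsWith('#x')) return String.fromCodePoint(parseInt(ref.slice(2), 16));
    if (ref.startsWith('#')) return String.fromCodePoint(parseInt(ref.slice(1), 10));
    return DECODE_MAP[ref];
  });
}

const raw = 'Smith & Jones <"Tenants">';
const encoded = encodeXmlText(raw);
console.log(encoded); // Smith &amp; Jones &lt;&quot;Tenants&quot;&gt;
console.log(decodeXmlText(encoded) === raw); // true
console.log(decodeXmlText('caf&#233; &#x2713;')); // café ✓
```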

Compatibility

Tested against .docx files generated by:

Microsoft Word (multiple versions)
docx-js (npm docx package)
LibreOffice Writer
Google Docs (exported as .docx)

License

MIT
