@joelouf/doc-template
v2.0.0
A modular, zero-dependency .docx template engine that scans for placeholder tokens, populates with data, and produces ready-to-use documents from any Word template.
Users author templates in Word or Google Docs using {{TOKEN}} placeholders and {{#BLOCK}}...{{/BLOCK}} repeating sections. The engine handles the hard part: reassembling text fragments that Word splits across multiple XML runs, expanding repeating blocks with variable-length data, then replacing every token with real data. One function to scan, one function to populate. The output is a valid .docx that opens cleanly in any word processor.
Features
- Zero runtime dependencies - ZIP handling uses Node.js built-in zlib; XML parsing and run merging are hand-written
- Split-run resolution - merges adjacent XML runs that Word fragments due to spell-check, revision tracking, or cursor placement, so tokens like {{TENANT_NAME}} are always found regardless of how they were typed
- Repeating blocks - {{#BLOCK}}...{{/BLOCK}} sections expand once per item in an array, with full inner token replacement per iteration; works with paragraphs and table rows
- Full document coverage - scans and replaces tokens in the document body, headers, and footers
- Three missing-data behaviors - preserve the raw token, remove it, or insert a configurable placeholder (e.g., ___________)
- Pure functions - synchronous, no I/O, no side effects; takes a Buffer in and returns a Buffer out
- TypeScript declarations - hand-written .d.ts files for full type safety
- Backward compatible - templates without blocks work identically to v1; the block syntax is purely additive
Architecture
core/
types.js # Constants: token/block patterns, content file list, defaults
zip.js # ZIP archive reader/writer (zlib only, no dependencies)
xml.js # Lightweight XML DOM parser/serializer for OOXML
merge.js # Adjacent run merger (solves the split-run problem)
blocks.js # Block expansion engine (repeating sections)
tokens.js # Token scanner and replacer for w:t text nodes
index.js # Public API: scan() and populate()
The pipeline flows bottom-up: zip unpacks the .docx archive, xml parses each XML file into a tree, merge consolidates fragmented runs, blocks expands repeating sections, and tokens finds and replaces {{PLACEHOLDERS}}. The public API orchestrates all five steps in a single call.
Install
npm install @joelouf/doc-template
Quick Start
Scan a Template
import { scan } from '@joelouf/doc-template';
import { readFileSync } from 'fs';
const template = readFileSync('lease-template.docx');
const result = scan(template);
// Scalar tokens
for (const { token, locations } of result.tokens) {
  console.log(`{{${token}}} found in: ${locations.join(', ')}`);
}
// {{RENT_AMOUNT}} found in: word/document.xml, word/header1.xml
// {{PROPERTY_ADDRESS}} found in: word/document.xml
// Block sections
for (const { name, innerTokens, locations } of result.blocks) {
  console.log(`{{#${name}}} block in: ${locations.join(', ')}`);
  console.log(` Inner tokens: ${innerTokens.join(', ')}`);
}
// {{#TENANTS}} block in: word/document.xml
// Inner tokens: NAME, PHONE, EMAIL
Populate a Template
import { populate } from '@joelouf/doc-template';
import { readFileSync, writeFileSync } from 'fs';
const template = readFileSync('lease-template.docx');
const output = populate(template, {
  // Scalar tokens
  PROPERTY_ADDRESS: '1234 Desert Rose Dr, Henderson, NV 89052',
  RENT_AMOUNT: '$1,850.00',
  LEASE_START_DATE: 'January 1, 2026',
  LEASE_END_DATE: 'December 31, 2026',
  // Block data: each array item produces one copy of the block content
  TENANTS: [
    { NAME: 'John Smith', PHONE: '(702) 555-1234', EMAIL: '[email protected]' },
    { NAME: 'Jane Smith', PHONE: '(702) 555-5678', EMAIL: '[email protected]' },
  ],
});
writeFileSync('lease-john-jane-smith.docx', output);
Handle Missing Tokens
// Leave unresolved tokens in place (default)
populate(template, data);
populate(template, data, { missingTokenBehavior: 'preserve' });
// Replace missing tokens with a fill-in-the-blank line
populate(template, data, {
  missingTokenBehavior: 'placeholder',
  placeholderText: '___________',
});
// Remove missing tokens entirely
populate(template, data, { missingTokenBehavior: 'remove' });
Token Format
Tokens use double curly braces: {{TOKEN_NAME}}. The token name must start with an uppercase letter and contain only uppercase letters, digits, and underscores.
Valid tokens:
- {{TENANT_NAME}}
- {{RENT_AMOUNT}}
- {{ADDRESS_LINE1}}
- {{FIELD2A}}
Not matched (by design):
- {{lowercase}} - must be uppercase
- {{MixedCase}} - must be fully uppercase
- {{_LEADING_UNDERSCORE}} - must start with a letter
- {{123}} - must start with a letter
- {{ SPACES }} - no spaces allowed
This convention prevents false matches on natural language like {{see attached}} that might appear in legal documents, while keeping tokens readable and obvious at a glance in any word processor.
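The naming rule above can be written as a regular expression. What follows is a minimal sketch, assuming a pattern equivalent to the stated rule; the package's exported TOKEN_PATTERN may be written differently, and findTokens is a hypothetical helper name used only for illustration.

```javascript
// Assumed pattern: an uppercase letter followed by uppercase letters,
// digits, or underscores, wrapped in double curly braces.
const TOKEN_RE = /\{\{([A-Z][A-Z0-9_]*)\}\}/g;

// Hypothetical helper: returns the unique token names found in a string,
// in order of first appearance.
function findTokens(text) {
  const names = new Set();
  for (const match of text.matchAll(TOKEN_RE)) {
    names.add(match[1]);
  }
  return [...names];
}

const sampleText =
  'Rent {{RENT_AMOUNT}} due from {{TENANT_NAME}}; {{see attached}} ' +
  '{{_X}} {{123}} {{ SPACES }} {{MixedCase}}';
console.log(findTokens(sampleText)); // [ 'RENT_AMOUNT', 'TENANT_NAME' ]
```

Anchoring the first character to an uppercase letter is what rejects {{_X}} and {{123}}, and leaving whitespace out of the character class rejects {{ SPACES }} and natural-language phrases.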
Repeating Blocks
Blocks use {{#BLOCK_NAME}} to open and {{/BLOCK_NAME}} to close a repeating section. The content between the markers is cloned once per item in the corresponding data array.
Paragraph Blocks
The most common use case — each item produces a copy of the enclosed paragraphs:
Template:
The following individuals are tenants under this agreement:
{{#TENANTS}}
Tenant: {{NAME}}, Phone: {{PHONE}}, Email: {{EMAIL}}
{{/TENANTS}}
Data:
{
  TENANTS: [
    { NAME: 'John Smith', PHONE: '(702) 555-1234', EMAIL: '[email protected]' },
    { NAME: 'Jane Smith', PHONE: '(702) 555-5678', EMAIL: '[email protected]' },
    { NAME: 'Bob Jones', PHONE: '(702) 555-9999', EMAIL: '[email protected]' },
  ]
}
Output:
The following individuals are tenants under this agreement:
Tenant: John Smith, Phone: (702) 555-1234, Email: [email protected]
Tenant: Jane Smith, Phone: (702) 555-5678, Email: [email protected]
Tenant: Bob Jones, Phone: (702) 555-9999, Email: [email protected]
Table Row Blocks
When block markers are inside table rows, the engine clones at the row level — perfect for tenant rosters, payment schedules, and similar tabular data:
Template (Word table):
| Name | Phone | Email |
|---|---|---|
| {{#TENANTS}} | | |
| {{NAME}} | {{PHONE}} | {{EMAIL}} |
| {{/TENANTS}} | | |
Output (3 tenants):
| Name | Phone | Email |
|---|---|---|
| John Smith | (702) 555-1234 | [email protected] |
| Jane Smith | (702) 555-5678 | [email protected] |
| Bob Jones | (702) 555-9999 | [email protected] |
Block Rules
- Opening ({{#NAME}}) and closing ({{/NAME}}) markers must each be in their own paragraph or table row; they cannot share a paragraph with other text.
- Block names follow the same naming rules as scalar tokens (uppercase letters, digits, underscores).
- Inner tokens within blocks are independent of top-level tokens: you can use {{NAME}} inside a {{#TENANTS}} block without conflicting with a top-level {{TENANT_NAME}} token.
- If the data map does not contain the block name, or the value is not an array, the entire block (markers + content) is removed from the output.
- The missingTokenBehavior option applies to unresolved inner tokens within blocks, using the same rules as top-level tokens.
Multi-Paragraph Blocks
Blocks can contain multiple paragraphs — all content between the markers is cloned per item:
Template:
{{#TENANTS}}
Name: {{NAME}}
Contact: {{PHONE}} | {{EMAIL}}
Emergency Contact: {{EMERGENCY_CONTACT}}
{{/TENANTS}}
Each item produces all three paragraphs (plus the blank line), fully populated with that item's data.
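The expansion rules can be illustrated on a simplified model. The sketch below treats a document as an array of paragraph strings rather than an XML tree, so it is not the package's implementation; expandBlocksSketch and fillTokens are hypothetical names, nested blocks are not handled, and leaving an unmatched opening marker untouched is an assumption.

```javascript
// Simplified model: a document is an array of paragraph strings, and block
// markers occupy their own paragraph (as the block rules require).
const OPEN_RE = /^\{\{#([A-Z][A-Z0-9_]*)\}\}$/;

// Hypothetical helper: fills {{TOKEN}} placeholders from one data item,
// leaving tokens without a string value in place.
function fillTokens(text, item) {
  return text.replace(/\{\{([A-Z][A-Z0-9_]*)\}\}/g, (raw, name) =>
    typeof item[name] === 'string' ? item[name] : raw
  );
}

// Hypothetical helper: clones the paragraphs between {{#NAME}} and {{/NAME}}
// once per array item; drops the whole block when the value is not an array.
function expandBlocksSketch(paragraphs, data) {
  const out = [];
  for (let i = 0; i < paragraphs.length; i++) {
    const open = OPEN_RE.exec(paragraphs[i].trim());
    if (!open) {
      out.push(paragraphs[i]);
      continue;
    }
    const name = open[1];
    const close = paragraphs.findIndex(
      (p, j) => j > i && p.trim() === `{{/${name}}}`
    );
    if (close === -1) {
      out.push(paragraphs[i]); // assumption: unmatched marker is left as-is
      continue;
    }
    const body = paragraphs.slice(i + 1, close);
    const items = Array.isArray(data[name]) ? data[name] : [];
    for (const item of items) {
      for (const p of body) out.push(fillTokens(p, item));
    }
    i = close; // continue after the closing marker
  }
  return out;
}

const expandedDoc = expandBlocksSketch(
  ['Tenants:', '{{#TENANTS}}', 'Name: {{NAME}}', 'Contact: {{PHONE}}', '{{/TENANTS}}'],
  { TENANTS: [
    { NAME: 'John Smith', PHONE: '(702) 555-1234' },
    { NAME: 'Jane Smith', PHONE: '(702) 555-5678' },
  ] }
);
console.log(expandedDoc.join('\n'));
```

Each item contributes every paragraph of the block body, which is why a two-paragraph block with two items yields four output paragraphs after the heading line.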
The Split-Run Problem
When a user types {{TENANT_NAME}} in Word, the underlying XML often looks like this:
<w:r><w:t>{{TENANT</w:t></w:r>
<w:r><w:t>_NAME</w:t></w:r>
<w:r><w:t>}}</w:t></w:r>
Word splits text into separate runs when spell-check flags part of a token, when the cursor was placed mid-token during editing, or when revision tracking assigns different session IDs to different keystrokes. The token looks fine in the Word UI, but no simple regex on a single <w:t> node will find it.
The run merger solves this by walking each paragraph and combining adjacent runs that share identical formatting properties (<w:rPr>) into a single run. After merging, the token is guaranteed to be in one text node:
<w:r><w:t>{{TENANT_NAME}}</w:t></w:r>
The merger is conservative - it only touches runs whose sole content children are <w:rPr> and <w:t>. Runs containing tabs, page breaks, drawings, field codes, or any other structural elements are left untouched.
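The merge rule can be sketched on plain objects instead of an XML tree. This is an illustration under assumptions, not the package's mergeRuns: each run is modeled as { rPr, text, other }, where rPr is the serialized formatting ('' when absent) and other is a hypothetical flag marking any child besides w:rPr and w:t.

```javascript
// Merges adjacent runs that share identical formatting and contain nothing
// but formatting properties and text. Runs flagged `other` act as barriers.
function mergeRunsSketch(runs) {
  const merged = [];
  for (const run of runs) {
    const prev = merged[merged.length - 1];
    const canMerge =
      prev !== undefined && !prev.other && !run.other && prev.rPr === run.rPr;
    if (canMerge) {
      prev.text += run.text; // fold this run's text into the previous run
    } else {
      merged.push({ ...run }); // copy so the input array is not mutated
    }
  }
  return merged;
}

const splitRuns = [
  { rPr: '', text: '{{TENANT', other: false },
  { rPr: '', text: '_NAME', other: false },
  { rPr: '', text: '}}', other: false },
  { rPr: '<w:b/>', text: ' (bold)', other: false },
];
console.log(mergeRunsSketch(splitRuns).map((r) => r.text));
// [ '{{TENANT_NAME}}', ' (bold)' ]
```

Comparing formatting by equality keeps the merge safe: the bold run stays separate, so merging never changes how the text renders.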
API Reference
scan(buffer)
Scan a .docx template for all {{TOKEN}} placeholders and {{#BLOCK}}...{{/BLOCK}} sections.
| Parameter | Type | Description |
|---|---|---|
| buffer | Buffer \| Uint8Array | The .docx file bytes |
Returns: { tokens: TokenLocation[], blocks: BlockDefinition[] }
Each scalar token entry includes the token name and an array of XML file paths where it was found. Block definitions include the block name, the inner token names found within the block, and the XML file paths. Inner tokens are excluded from the top-level tokens array to avoid double-reporting.
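One use for the scan result is checking a data map before populating. The helper below is a hypothetical sketch that relies only on the result shape described above (token and locations on token entries; name, innerTokens, and locations on block entries); findUncovered is not part of the package.

```javascript
// Hypothetical helper: lists scanned names that `data` does not cover.
function findUncovered(scanResult, data) {
  const missingTokens = scanResult.tokens
    .map((entry) => entry.token)
    .filter((token) => typeof data[token] !== 'string');

  const missingBlocks = [];
  const missingInner = [];
  for (const block of scanResult.blocks) {
    const items = data[block.name];
    if (!Array.isArray(items)) {
      missingBlocks.push(block.name);
      continue;
    }
    items.forEach((item, index) => {
      for (const inner of block.innerTokens) {
        if (typeof item[inner] !== 'string') {
          missingInner.push(`${block.name}[${index}].${inner}`);
        }
      }
    });
  }
  return { missingTokens, missingBlocks, missingInner };
}

// A literal stand-in for what scan() might return for a small template.
const exampleScan = {
  tokens: [
    { token: 'RENT_AMOUNT', locations: ['word/document.xml'] },
    { token: 'LEASE_START_DATE', locations: ['word/document.xml'] },
  ],
  blocks: [
    { name: 'TENANTS', innerTokens: ['NAME', 'PHONE'], locations: ['word/document.xml'] },
  ],
};
console.log(
  findUncovered(exampleScan, {
    RENT_AMOUNT: '$1,850.00',
    TENANTS: [{ NAME: 'John Smith' }],
  })
);
```

Running a check like this first makes it possible to report gaps to the user instead of relying on missingTokenBehavior to paper over them.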
populate(buffer, data, options?)
Replace {{TOKEN}} placeholders and expand {{#BLOCK}}...{{/BLOCK}} sections with values from the data map.
| Parameter | Type | Description |
|---|---|---|
| buffer | Buffer \| Uint8Array | The .docx template file bytes |
| data | PopulateData | Token names mapped to replacement values or block data arrays |
| options | PopulateOptions | Optional behavior configuration |
PopulateData: Record<string, string | Record<string, string>[]>
- Scalar tokens map to strings (e.g., { RENT_AMOUNT: '$1,850.00' })
- Block tokens map to arrays of token maps (e.g., { TENANTS: [{ NAME: '...', ... }, ...] })
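Application data often contains numbers or null values, while the type above allows only strings. How the engine treats non-string values is not stated here, so the sketch below converts a loose record up front; toPopulateData is a hypothetical helper, and dropping null or undefined entries (so that they count as missing) is a design assumption.

```javascript
// Hypothetical helper: coerces a loosely typed record into the documented
// PopulateData shape (strings, or arrays of string-valued records).
function toPopulateData(raw) {
  const clean = (record) =>
    Object.fromEntries(
      Object.entries(record)
        .filter(([, value]) => value !== null && value !== undefined)
        .map(([key, value]) => [key, String(value)])
    );
  const out = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value === null || value === undefined) continue; // treat as missing
    out[key] = Array.isArray(value) ? value.map(clean) : String(value);
  }
  return out;
}

console.log(
  toPopulateData({
    RENT_AMOUNT: 1850,
    NOTES: null,
    TENANTS: [{ NAME: 'John Smith', UNIT: 4 }],
  })
);
// { RENT_AMOUNT: '1850', TENANTS: [ { NAME: 'John Smith', UNIT: '4' } ] }
```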
Options:
| Property | Type | Default | Description |
|---|---|---|---|
| missingTokenBehavior | 'remove' \| 'preserve' \| 'placeholder' | 'preserve' | What to do with tokens not in the data map |
| placeholderText | string | '___________' | Text to insert when behavior is 'placeholder' |
Returns: Buffer - a valid .docx file.
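The three missingTokenBehavior values can be shown on a plain string. This is a minimal sketch that assumes the token naming rule from the Token Format section and mirrors the documented defaults; replaceTokensSketch is a hypothetical name, and the real replacer operates on w:t text nodes rather than raw strings.

```javascript
// Applies the documented behaviors to tokens that have no string value.
function replaceTokensSketch(text, data, options = {}) {
  const {
    missingTokenBehavior = 'preserve',
    placeholderText = '___________',
  } = options;
  // A replacer function is used so that "$" in values such as '$1,850.00'
  // is inserted literally instead of being read as a replacement pattern.
  return text.replace(/\{\{([A-Z][A-Z0-9_]*)\}\}/g, (raw, name) => {
    if (typeof data[name] === 'string') return data[name];
    if (missingTokenBehavior === 'remove') return '';
    if (missingTokenBehavior === 'placeholder') return placeholderText;
    return raw; // 'preserve': leave the token as written
  });
}

const leaseLine = 'Rent: {{RENT_AMOUNT}}, Due: {{DUE_DATE}}';
const leaseData = { RENT_AMOUNT: '$1,850.00' };
console.log(replaceTokensSketch(leaseLine, leaseData));
// Rent: $1,850.00, Due: {{DUE_DATE}}
console.log(replaceTokensSketch(leaseLine, leaseData, { missingTokenBehavior: 'remove' }));
// Rent: $1,850.00, Due: 
```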
Advanced Usage
The ./core sub-path export exposes all internal modules for direct manipulation:
import {
  readZip, writeZip,
  parseXml, serializeXml, findAll, firstChild, childElements, textContent, cloneNode,
  mergeRuns,
  expandBlocks, scanBlocks,
  scanTokens, replaceTokens,
  TOKEN_PATTERN, BLOCK_OPEN_PATTERN, BLOCK_CLOSE_PATTERN, CONTENT_XML_FILES,
} from '@joelouf/doc-template/core';
This allows you to build custom pipelines - for example, scanning only specific XML files, applying your own transformations to the XML tree between merging and token replacement, or processing non-OOXML ZIP-based formats using the generic readZip/writeZip utilities.
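As one example of a custom step, the sketch below narrows a Map of archive entries to the parts that usually hold body, header, and footer text. The path pattern reflects the common OOXML layout and is an assumption; the contents of the package's CONTENT_XML_FILES are not shown here and may differ, and selectContentEntries is a hypothetical helper.

```javascript
// Assumed paths for the document body, headers, and footers.
const CONTENT_FILE_RE = /^word\/(document|header\d*|footer\d*)\.xml$/;

// Hypothetical helper: picks matching entries from a Map of filename to
// content, the shape readZip is described as returning.
function selectContentEntries(entries) {
  const selected = new Map();
  for (const [name, content] of entries) {
    if (CONTENT_FILE_RE.test(name)) selected.set(name, content);
  }
  return selected;
}

// Stand-in entries; a real Map would come from reading a .docx archive.
const fakeEntries = new Map([
  ['[Content_Types].xml', Buffer.from('<Types/>')],
  ['word/document.xml', Buffer.from('<w:document/>')],
  ['word/header1.xml', Buffer.from('<w:hdr/>')],
  ['word/styles.xml', Buffer.from('<w:styles/>')],
]);
console.log([...selectContentEntries(fakeEntries).keys()]);
// [ 'word/document.xml', 'word/header1.xml' ]
```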
ZIP Utilities
import { readZip, writeZip } from '@joelouf/doc-template/core';
const entries = readZip(docxBuffer);
// entries: Map<string, Buffer> - filename to uncompressed content
const newZip = writeZip(entries);
// newZip: Buffer - valid ZIP archive
The ZIP reader parses the central directory, follows local file headers for correct data offsets (handling extra field length mismatches between authoring tools), decompresses with zlib.inflateRawSync, and verifies CRC-32 on every entry. The writer compresses with deflate and falls back to stored (method 0) when deflation doesn't reduce size.
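The CRC-32 check mentioned above can be sketched with a table-driven routine. This is the standard ZIP checksum (reflected polynomial 0xEDB88320) written for illustration; it is not taken from the package's zip.js, and verifyEntry is a hypothetical name.

```javascript
// Precompute the 256-entry lookup table for the reflected polynomial.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Computes the CRC-32 of a Buffer or Uint8Array as an unsigned 32-bit value.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical check: compare decompressed content against the checksum
// recorded for the entry in the archive.
function verifyEntry(content, expectedCrc) {
  return crc32(content) === expectedCrc >>> 0;
}

console.log(crc32(Buffer.from('123456789')).toString(16)); // cbf43926
```

The value cbf43926 is the widely published check value for the ASCII string "123456789", which makes it a convenient self-test for any CRC-32 implementation.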
XML Utilities
import { parseXml, serializeXml, findAll, firstChild, cloneNode } from '@joelouf/doc-template/core';
const doc = parseXml(xmlString);
// doc: { declaration: string | null, root: XmlElement }
const paragraphs = findAll(doc.root, 'w:p');
const pPr = firstChild(paragraphs[0], 'w:pPr');
// Deep clone a subtree (no shared references with original)
const copy = cloneNode(paragraphs[0]);
const xmlOut = serializeXml(doc);
The XML parser handles namespaced elements and attributes, self-closing tags, entity encoding/decoding, and xml:space="preserve". It does not support CDATA, comments, or DTDs - only the subset of XML that .docx files produce.
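Entity handling matters because replacement values such as "Smith & Sons" must not break the XML. The sketch below covers only XML's five predefined entities and leaves out numeric character references; it illustrates the idea and is not the package's parser code.

```javascript
const ENCODE = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const DECODE = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Escapes characters that are unsafe in XML text or attribute values.
function encodeEntities(text) {
  return text.replace(/[&<>"']/g, (ch) => ENCODE[ch]);
}

// Decodes in a single pass, so input like "&amp;lt;" becomes "&lt;"
// rather than being decoded twice into "<".
function decodeEntities(text) {
  return text.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => DECODE[name]);
}

const rawValue = 'Smith & Sons <LLC>';
const encodedValue = encodeEntities(rawValue);
console.log(encodedValue); // Smith &amp; Sons &lt;LLC&gt;
console.log(decodeEntities(encodedValue) === rawValue); // true
```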
Compatibility
Tested against .docx files generated by:
- Microsoft Word (multiple versions)
- docx-js (npm docx package)
- LibreOffice Writer
- Google Docs (exported as .docx)
License
MIT
