docx-edit

v0.3.1

Published

21 days ago

A JS library that parses DOCX into a virtual component tree and writes paragraph-level text changes back to split OOXML runs.

onion126

docx word ooxml virtual-dom docx-editor document-processing

docx-edit

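A JavaScript library for parsing and modifying .docx files. Compared with traditional .docx parsers, it is designed for programmatic edits to Word documents and supports precise full-text matching and replacement.

The current version implements:

Virtual document tree diff / patch
Style-level modeling of paragraphs and runs
Adding, modifying, and clearing styles
Style migration between components
The legacy controller API alongside the new virtual tree API

Every write operation is ultimately funneled into a virtual tree patch and then synced back to the underlying OOXML.

Features

Parses the body, headers, footers, comments, footnotes, and endnotes
Recognizes paragraph, run, text, table, table-row, table-cell, hyperlink, text-box, image, math, footnoteReference, and endnoteReference
Reads and writes whole-paragraph text that spans multiple w:t elements (footnote references and math placeholders are preserved)
Supports true virtual tree diff / patch
Supports modeling, modifying, and migrating paragraph styles and run styles
Stays compatible with the legacy controller API; legacy write methods are converted to virtual tree patches internally
Reads and writes superscript and subscript styles
Reads, writes, and creates footnote references (footnoteReference)
Reads and writes endnote references (endnoteReference)
Saves the modified .docx
Extracts the full document content as HTML (heading levels, math, and tables supported)
Resolves heading levels (Chinese and English style IDs supported)

Installation

npm install docx-edit

The library currently uses CommonJS exports. Node.js >= 18 is recommended.

Quick start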

const { loadDocx } = require("docx-edit");

async function main() {
  const doc = await loadDocx("./sample.docx");

  // Replace every occurrence in the document, then save a copy.
  doc.replaceAll("old term", "new term");
  await doc.saveAs("./sample.modified.docx");
}

main().catch(console.error);

Virtual tree model

The document is parsed into a virtual tree. A typical structure looks like this:

document
  body
    paragraph
      run
        text
    table
      table-row
        table-cell
          paragraph
  header
    paragraph
  footer
    paragraph
  comments
    comment
      paragraph

Currently supported node types:

document
body
header
footer
footnotes
endnotes
comments
paragraph
run
text
table
table-row
table-cell
hyperlink
tab
break
text-box
comment
footnote
endnote
footnoteReference
endnoteReference
footnoteRef
endnoteRef
image
math

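Internal write flow

Whether you call the legacy controller methods or use doc.patch(nextTree) directly, the internal flow is the same:

Generate a new copy of the virtual tree from the current document
Modify the target nodes on the copy
Call doc.patch(nextTree)
The patch engine executes INSERT / REMOVE / REPLACE / MOVE / PROPS/TEXT_UPDATE
Sync the result back to the underlying OOXML
Rebuild the tree and the indexes from the XML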
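
The flow above can be pictured with a toy model. The following is a minimal sketch, not the library's patch engine: it assumes plain objects shaped like { type, props, children }, handles only text updates on two trees of identical shape, and the diffText helper and the op fields (path, from, to) are hypothetical.

```javascript
// Hypothetical sketch: walk two trees of the same shape and record text changes.
function diffText(prev, next, path = []) {
  const ops = [];
  const prevText = prev.props ? prev.props.text : undefined;
  const nextText = next.props ? next.props.text : undefined;
  if (prevText !== nextText) {
    ops.push({ type: "TEXT_UPDATE", path, from: prevText, to: nextText });
  }
  const prevChildren = prev.children || [];
  const nextChildren = next.children || [];
  // Assumption: structural changes (INSERT / REMOVE / MOVE) are out of scope here,
  // so only the children present in both trees are compared.
  const shared = Math.min(prevChildren.length, nextChildren.length);
  for (let i = 0; i < shared; i += 1) {
    ops.push(...diffText(prevChildren[i], nextChildren[i], [...path, i]));
  }
  return ops;
}

const current = {
  type: "document",
  props: {},
  children: [
    {
      type: "body",
      props: {},
      children: [
        { type: "paragraph", props: { text: "old first paragraph" }, children: [] },
        { type: "paragraph", props: { text: "second paragraph" }, children: [] },
      ],
    },
  ],
};

// Steps 1-2: copy the tree (a JSON round-trip is enough for plain data), then edit the copy.
const nextTree = JSON.parse(JSON.stringify(current));
nextTree.children[0].children[0].props.text = "new first paragraph";

// Step 4, simplified: compute the operations a patch would have to apply.
const ops = diffText(current, nextTree);
console.log(ops);
```

Because the edit happens on a copy, the current tree stays untouched until the operations are applied, which is what makes the diff possible in the first place.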

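Paragraph text edits still follow the current ParagraphTextModel strategy:

Keep the original w:r / w:t elements where possible
Keep tab / break elements where possible
Only redistribute the new text across the existing text nodes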
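
As a rough illustration of the redistribution idea, the sketch below spreads new text across the existing text slots by their original lengths, and lets the last slot absorb the remainder. This is an assumption about how such a strategy could work, not the actual ParagraphTextModel code; redistributeText is a hypothetical helper.

```javascript
// Hypothetical sketch: reuse the existing text slots instead of rebuilding runs.
// `slots` holds the current text of each w:t element, in document order.
function redistributeText(slots, newText) {
  if (slots.length === 0) return [newText];
  const result = [];
  let offset = 0;
  slots.forEach((slot, index) => {
    const isLast = index === slots.length - 1;
    // Each slot keeps its original length; the last one takes whatever is left.
    const end = isLast ? newText.length : Math.min(offset + slot.length, newText.length);
    result.push(newText.slice(offset, end));
    offset = end;
  });
  return result;
}

const redistributed = redistributeText(["Hello ", "bold", " world"], "Hello brave new world");
console.log(redistributed);
```

The number of slots never changes, so run-level formatting attached to each slot stays where it was even when the text grows or shrinks.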

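Special placeholders in paragraph text

When paragraph text is read, footnote references and math formulas appear in the text as placeholders:

[[FOOTNOTE_REF:id]]: footnote reference
[[ENDNOTE_REF:id]]: endnote reference
[[MATH:text]]: math formula

When paragraph text is modified, these placeholders are preserved automatically and are not overwritten.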
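
If you need to inspect these placeholders in your own code, a small tokenizer is enough. This sketch assumes the three placeholder forms listed above and that a placeholder body never contains ]]; tokenizeParagraphText is a hypothetical helper, not part of docx-edit.

```javascript
// Matches [[FOOTNOTE_REF:id]], [[ENDNOTE_REF:id]] and [[MATH:text]].
const PLACEHOLDER = /\[\[(FOOTNOTE_REF|ENDNOTE_REF|MATH):(.*?)\]\]/g;

// Splits paragraph text into plain-text tokens and placeholder tokens.
function tokenizeParagraphText(text) {
  const tokens = [];
  let last = 0;
  for (const match of text.matchAll(PLACEHOLDER)) {
    if (match.index > last) {
      tokens.push({ kind: "text", value: text.slice(last, match.index) });
    }
    tokens.push({ kind: match[1], value: match[2] });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    tokens.push({ kind: "text", value: text.slice(last) });
  }
  return tokens;
}

const tokens = tokenizeParagraphText("See note[[FOOTNOTE_REF:3]] and [[MATH:x^2]].");
console.log(tokens);
```

A tokenizer like this makes it easy to check that an edited string still carries every placeholder it started with before it is written back.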

Style model

Styles are currently modeled at two levels:

paragraph.props.style corresponds to w:pPr
run.props.style corresponds to w:rPr

Supported paragraph style fields

{
  styleId: "BodyText",
  alignment: "center",
  keepNext: true,
  keepLines: true,
  pageBreakBefore: false,
  spacing: {
    before: "120",
    after: "240",
    line: "360",
    lineRule: "auto",
  },
  indent: {
    left: "240",
    right: "120",
    firstLine: "240",
    hanging: "240",
  },
}

Supported run style fields

{
  styleId: "Emphasis",
  bold: true,
  italic: true,
  underline: "single",
  color: "FF0000",
  highlight: "yellow",
  fontSize: "28",
  vertAlign: "superscript",  // "superscript" | "subscript" | null
  fontFamily: {
    ascii: "Calibri",
    asciiTheme: "minorHAnsi",
    hAnsi: "Calibri",
    hAnsiTheme: "minorHAnsi",
    eastAsia: "宋体",
    eastAsiaTheme: "minorEastAsia",
    cs: "Arial",
    cstheme: "minorBidi",
  },
}
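
The string values in both style objects look like raw OOXML units: w:sz is measured in half-points, spacing and indent values in twentieths of a point, and line in 240ths of a line when lineRule is "auto". Assuming the library passes these strings through unchanged (an assumption based on the sample values above, not confirmed by the source), helpers like the following convert from more familiar units. The helper names are hypothetical.

```javascript
// Points to half-points, for fontSize (14 pt becomes "28").
const ptToHalfPoints = (pt) => String(Math.round(pt * 2));

// Points to twentieths of a point, for spacing.before / spacing.after / indent.*.
const ptToTwips = (pt) => String(Math.round(pt * 20));

// Line-height multiple to 240ths of a line, for spacing.line with lineRule "auto".
const lineMultipleToLine = (multiple) => String(Math.round(multiple * 240));

const exampleRunStyle = { fontSize: ptToHalfPoints(14) };
const exampleParagraphStyle = {
  spacing: {
    before: ptToTwips(6),
    after: ptToTwips(12),
    line: lineMultipleToLine(1.5),
    lineRule: "auto",
  },
};
console.log(exampleRunStyle, exampleParagraphStyle);
```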

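Theme font attributes

fontFamily captures and writes back theme font references:

asciiTheme / hAnsiTheme / eastAsiaTheme / cstheme

These attributes are parsed and synced correctly in both word/styles.xml and document.xml.

Exported API

The entry point is defined in src/index.js.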

const {
  loadDocx,
  VirtualWordDocument,
  VNode,
  createVNode,
  cloneVNode,
  DocumentPartController,
  ParagraphController,
  RunController,
  TableController,
  TableRowController,
  TableCellController,
  TextBoxController,
  StructuredEntryController,
} = require("docx-edit");

Document API

`loadDocx(input)`

Loads a .docx file.

input: string | Buffer
Returns: Promise<VirtualWordDocument>

const doc = await loadDocx("./sample.docx");

`doc.toComponentTree()`

Returns a copy of the current document's virtual tree. You can modify this tree and then pass it to doc.patch().

const tree = doc.toComponentTree();
console.log(tree.type); // document

`doc.patch(nextTree)`

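Patches the complete virtual tree and syncs the result to the underlying XML.

The root node type must be document
Supports text updates and structural insertion, removal, replacement, and reordering
Supports paragraph style and run style changes
Returns the patch result, which includes the list of executed operations

const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");
body.children[0].props.text = "New first paragraph";

// await is safe whether patch() returns a value or a Promise
const result = await doc.patch(tree);
console.log(result.operations);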

`doc.toBuffer()`

Returns the binary content of the modified .docx.

const buffer = await doc.toBuffer();

`doc.saveAs(outputPath)`

Saves the document to the given path.

await doc.saveAs("./sample.modified.docx");

`doc.addFootnote(text)`

Creates a new footnote and returns its ID, which can then be used to insert a footnote reference into a paragraph.

text: string, the footnote body text
Returns: number, the footnote ID

If the document has no footnotes.xml, one is created automatically.

const { createVNode, loadDocx } = require("docx-edit");

const doc = await loadDocx("./sample.docx");

// Create the footnote and get its ID
const fnId = doc.addFootnote("This is the footnote text");

// Reference the footnote from a paragraph
const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children.push(
  createVNode({
    type: "paragraph",
    props: { text: "Body text" },
    children: [
      createVNode({
        type: "run",
        props: { text: "Body text" },
        children: [],
      }),
      createVNode({
        type: "run",
        props: {
          style: { vertAlign: "superscript" },
        },
        children: [
          createVNode({
            type: "footnoteReference",
            props: { id: String(fnId) },
            children: [],
          }),
        ],
      }),
    ],
  }),
);

await doc.patch(tree);
await doc.saveAs("./sample.modified.docx");

The document also exposes accessors for its parts and their controllers:

doc.getParts();
doc.getBody();
doc.getHeaders();
doc.getFooters();
doc.getParagraphs();
doc.getParagraph(0);
doc.getTables();
doc.getTextBoxes();
doc.getFootnotes();
doc.getEndnotes();
doc.getComments();

`doc.replaceAll(searchValue, replacement, options?)`

Replaces paragraph text across the whole document.

searchValue: string | RegExp
replacement: string | Function
options.partTypes?: string[]

doc.replaceAll("event", "themed event");
doc.replaceAll(/2025/g, "2026");
doc.replaceAll("Header", "New header", { partTypes: ["header"] });

`doc.extractHtml(options?)`

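Extracts the full document content as simple HTML.

options.partTypes?: string[], the parts to extract; defaults to ["body"]; accepted values are "body", "headers", and "footers"

Output format:

Headings: <h1> ~ <h6> (detected automatically via resolveHeadingLevel)
Body text: <p>
Bold / italic / underline / strikethrough: <strong> / <em> / <u> / <s>
Inline math: <span class="math">...</span>
Block math: <div class="math-block">...</div>
Tables: <table> / <tr> / <td> (colspan supported)
Images: <img alt="..." />
Line breaks: <br>

const fs = require("fs");

const html = doc.extractHtml();
fs.writeFileSync("output.html", html);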
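
To make the output format more concrete, here is a minimal sketch of how runs and heading levels could map to the tags listed above. It is not the library's renderer: renderRun and renderParagraph are hypothetical, the strike field is an illustrative name that does not appear in the documented run style fields, and clamping levels above 6 to <h6> is an assumption.

```javascript
// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the escaped text in formatting tags, innermost first.
function renderRun(text, style = {}) {
  let html = escapeHtml(text);
  if (style.strike) html = `<s>${html}</s>`; // hypothetical field name
  if (style.underline) html = `<u>${html}</u>`;
  if (style.italic) html = `<em>${html}</em>`;
  if (style.bold) html = `<strong>${html}</strong>`;
  return html;
}

// Headings become <h1>..<h6>; everything else becomes <p>.
function renderParagraph(runs, headingLevel = null) {
  const tag = headingLevel ? `h${Math.min(headingLevel, 6)}` : "p";
  const inner = runs.map((run) => renderRun(run.text, run.style)).join("");
  return `<${tag}>${inner}</${tag}>`;
}

const sampleHtml = renderParagraph(
  [
    { text: "Q&A ", style: { bold: true } },
    { text: "notes", style: { italic: true, underline: "single" } },
  ],
  2,
);
console.log(sampleHtml);
```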

`doc.resolveHeadingLevel(styleId)`

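Resolves the heading level for a styleId.

styleId: string, the styleId of the paragraph style
Returns: 1~9 (the heading level) or null

Resolution order:

outlineLevel in the style inheritance chain (most reliable)
Style name match (supports "标题 1"~"标题 9" and "heading 1"~"heading 9")
Direct styleId match (Heading1~Heading9)

doc.resolveHeadingLevel("Heading1"); // 1
doc.resolveHeadingLevel("1");        // 1 (Chinese-language Word)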
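
The three-step resolution order can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's code: the styles map and its outlineLevel field are hypothetical, and outlineLevel is assumed to be zero-based like OOXML's w:outlineLvl (0 means level 1).

```javascript
// Hypothetical sketch of the resolution order described above.
function resolveHeadingLevelSketch(styles, styleId) {
  // 1. Walk the basedOn chain and use the first outline level found.
  const seen = new Set();
  for (let id = styleId; id && styles[id] && !seen.has(id); id = styles[id].basedOn) {
    seen.add(id);
    const level = styles[id].outlineLevel;
    if (Number.isInteger(level) && level >= 0 && level <= 8) return level + 1;
  }
  // 2. Fall back to the style name ("heading 1" or "标题 1").
  const name = styles[styleId] ? styles[styleId].name || "" : "";
  const byName = /^(?:heading|标题)\s*([1-9])$/i.exec(name);
  if (byName) return Number(byName[1]);
  // 3. Fall back to the styleId itself (Heading1 to Heading9).
  const byId = /^Heading([1-9])$/.exec(String(styleId));
  return byId ? Number(byId[1]) : null;
}

const sampleStyles = {
  a: { name: "Normal", basedOn: null },
  1: { name: "heading 1", basedOn: "a" },
  H2: { name: "Section title", basedOn: "a", outlineLevel: 1 },
  Sub: { name: "Custom section title", basedOn: "H2" },
};

console.log(resolveHeadingLevelSketch(sampleStyles, "1"));
console.log(resolveHeadingLevelSketch(sampleStyles, "Sub"));
console.log(resolveHeadingLevelSketch(sampleStyles, "Heading3"));
console.log(resolveHeadingLevelSketch(sampleStyles, "a"));
```

The outline level comes first because it survives renamed or localized styles, whereas name and styleId matching depend on naming conventions.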

`paragraph.getHeadingLevel()`

Convenience method that returns the paragraph's heading level directly.

Returns: 1~9 or null

for (const p of doc.getBody().getParagraphs()) {
  const level = p.getHeadingLevel();
  if (level) {
    console.log(`Heading ${level}: ${p.getText()}`);
  } else {
    console.log(`Body text: ${p.getText()}`);
  }
}

Controller API

The legacy controller API is still available, but internally it has been migrated to virtual tree patches.

`DocumentPartController`

Common ways to obtain one:

const body = doc.getBody();
const header = doc.getHeaders()[0];
const footer = doc.getFooters()[0];

Available methods:

body.toComponentTree();
body.getParagraphs();
body.getParagraph(0);
body.getTables();
body.getTable(0);
body.getTextBoxes();
body.replaceAll("old term", "new term");

For comments / footnotes / endnotes parts, the following are also available:

const commentsPart = doc.getParts().find((part) => part.type === "comments");
commentsPart.getEntries();
commentsPart.getEntries({ includeSpecial: true });

`ParagraphController`

const paragraph = doc.getBody().getParagraph(0);

paragraph.getText();
paragraph.setText("New paragraph content");
paragraph.replace("old term", "new term");
paragraph.replaceAll("youth", "young students");
paragraph.getStyle();
paragraph.setStyle({ alignment: "center" });
paragraph.patchStyle({ spacing: { after: "240" } });
paragraph.getRuns();
paragraph.getRun(0);

`RunController`

const run = doc.getBody().getParagraph(0).getRun(0);

run.getText();
run.getStyle();
run.setStyle({
  bold: true,
  color: "FF0000",
  fontSize: "28",
});
run.patchStyle({
  italic: true,
  underline: "single",
});

Style migration

const paragraphA = doc.getBody().getParagraph(0);
const paragraphB = doc.getBody().getParagraph(1);

paragraphB.copyStyleFrom(paragraphA);

const runA = paragraphA.getRun(0);
const runB = paragraphB.getRun(0);
runB.copyStyleFrom(runA);

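Style profile

A style profile extracts all named style definitions from a document (from word/styles.xml) as JSON. The same JSON format can be written back to a document to modify its style definitions.

Extracting a style profile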

const profile = doc.getStyleProfile();

console.log(profile.defaults);
// { paragraphStyle: { spacing: { after: "160", line: "278", lineRule: "auto" } }, runStyle: { fontSize: "22" } }

console.log(profile.styles["1"]);
// { name: "heading 1", type: "paragraph", basedOn: "a", paragraphStyle: {...}, runStyle: { fontSize: "48", color: "2F5496" } }

JSON format

{
  "defaults": {
    "paragraphStyle": { "spacing": { "after": "160", "line": "278", "lineRule": "auto" } },
    "runStyle": { "fontSize": "22", "fontFamily": { "asciiTheme": "minorHAnsi" } }
  },
  "styles": {
    "a": {
      "name": "Normal",
      "type": "paragraph",
      "basedOn": null,
      "paragraphStyle": {},
      "runStyle": {}
    },
    "1": {
      "name": "heading 1",
      "type": "paragraph",
      "basedOn": "a",
      "paragraphStyle": { "keepNext": true, "spacing": { "before": "480", "after": "80" } },
      "runStyle": { "fontSize": "48", "color": "2F5496" }
    }
  }
}

Applying a style profile

// Extract from document A
const profileA = docA.getStyleProfile();

// Apply to document B
docB.applyStyleProfile(profileA);
await docB.saveAs("output.docx");

You can also modify only some of the styles:

doc.applyStyleProfile({
  styles: {
    "1": {
      name: "heading 1",
      type: "paragraph",
      basedOn: "a",
      runStyle: { fontSize: "56", color: "FF0000" },
    },
  },
});

Cross-document format migration

styleId values can differ between documents. For example, heading 1 may be "1" in document A and "2" in document B. In that case, match styles by name rather than applying them directly by styleId:

const srcProfile = docA.getStyleProfile();
const dstProfile = docB.getStyleProfile();

const mappedProfile = { defaults: srcProfile.defaults, styles: {} };

for (const [srcId, srcStyle] of Object.entries(srcProfile.styles)) {
  // Find the style with the same name and type in the target document
  for (const [dstId, dstStyle] of Object.entries(dstProfile.styles)) {
    if (dstStyle.name === srcStyle.name && dstStyle.type === srcStyle.type) {
      mappedProfile.styles[dstId] = {
        name: srcStyle.name,
        type: srcStyle.type,
        basedOn: dstStyle.basedOn,   // keep the target document's inheritance
        paragraphStyle: srcStyle.paragraphStyle,
        runStyle: srcStyle.runStyle,
      };
      break;
    }
  }
}

docB.applyStyleProfile(mappedProfile);
await docB.saveAs("output.docx");

Key points:

basedOn takes the target document's value, which preserves the target's own inheritance chain
paragraphStyle and runStyle take the source document's values, which is what migrates the formatting
Styles that exist in the source document but not in the target are ignored (they are not created automatically)

Resolving the effective style

Resolves the final effect of a named style (merging docDefaults → basedOn chain → the style's own properties):

const effective = doc.resolveEffectiveStyle("1");
// { paragraphStyle: { keepNext: true, spacing: { before: "480", after: "80", ... } },
//   runStyle: { fontSize: "48", color: "2F5496", ... } }
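
The merge order can be reproduced on a plain style profile object. The sketch below is an illustration, not the library's implementation: resolveEffectiveStyleSketch and deepMerge are hypothetical helpers, and merging nested objects such as spacing field by field is an assumption.

```javascript
const isPlainObject = (value) =>
  value !== null && typeof value === "object" && !Array.isArray(value);

// Returns a new object; values in `override` win, nested objects merge field by field.
function deepMerge(base, override) {
  const out = { ...base };
  for (const [key, value] of Object.entries(override || {})) {
    out[key] = isPlainObject(value) && isPlainObject(out[key])
      ? deepMerge(out[key], value)
      : value;
  }
  return out;
}

// Merge order: defaults, then the basedOn chain from root ancestor down to the style itself.
function resolveEffectiveStyleSketch(profile, styleId) {
  const chain = [];
  const seen = new Set();
  for (let id = styleId; id && profile.styles[id] && !seen.has(id); id = profile.styles[id].basedOn) {
    seen.add(id);
    chain.unshift(profile.styles[id]);
  }
  const defaults = profile.defaults || {};
  let paragraphStyle = deepMerge({}, defaults.paragraphStyle);
  let runStyle = deepMerge({}, defaults.runStyle);
  for (const style of chain) {
    paragraphStyle = deepMerge(paragraphStyle, style.paragraphStyle);
    runStyle = deepMerge(runStyle, style.runStyle);
  }
  return { paragraphStyle, runStyle };
}

const sampleProfile = {
  defaults: {
    paragraphStyle: { spacing: { after: "160", line: "278", lineRule: "auto" } },
    runStyle: { fontSize: "22", fontFamily: { asciiTheme: "minorHAnsi" } },
  },
  styles: {
    a: { name: "Normal", type: "paragraph", basedOn: null, paragraphStyle: {}, runStyle: {} },
    1: {
      name: "heading 1",
      type: "paragraph",
      basedOn: "a",
      paragraphStyle: { keepNext: true, spacing: { before: "480", after: "80" } },
      runStyle: { fontSize: "48", color: "2F5496" },
    },
  },
};

const effectiveSketch = resolveEffectiveStyleSketch(sampleProfile, "1");
console.log(JSON.stringify(effectiveSketch, null, 2));
```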

Listing all named styles

const named = doc.getNamedStyles();
// [ { styleId: "a", name: "Normal", type: "paragraph", basedOn: null }, ... ]

`TableController`

const table = doc.getTables()[0];

table.getRows();
table.getRow(0);
table.getCell(1, 2);

table.fill(
  [
    ["Event", "Date", "Owner", "Notes"],
    ["Sharing session", "2026-03-24", "Zhang San", "Confirmed"],
  ],
  { startRow: 0 },
);

`TableRowController`

const row = doc.getTables()[0].getRow(0);

row.getCells();
row.getCell(0);

`TableCellController`

const cell = doc.getTables()[0].getCell(1, 0);

cell.getParagraphs();
cell.getParagraph(0);
cell.getText();
cell.setText("New cell content");

`TextBoxController`

const textBox = doc.getTextBoxes()[0];

textBox.getParagraphs();
textBox.getText();

`StructuredEntryController`

Used for comment / footnote / endnote.

const comment = doc.getComments()[0];

comment.getParagraphs();
comment.getText();
comment.replaceAll("original text", "new text");

Virtual tree API usage

1. Modify the text of an existing paragraph

const { loadDocx } = require("docx-edit");

const doc = await loadDocx("./sample.docx");
const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children[0].props.text = "This is the updated first paragraph";

await doc.patch(tree);
await doc.saveAs("./sample.modified.docx");

2. Insert a new paragraph

const { createVNode, loadDocx } = require("docx-edit");

const doc = await loadDocx("./sample.docx");
const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children.splice(
  1,
  0,
  createVNode({
    type: "paragraph",
    props: { text: "This is the newly inserted paragraph" },
    children: [],
  }),
);

await doc.patch(tree);
await doc.saveAs("./sample.modified.docx");

3. Delete a paragraph

const doc = await loadDocx("./sample.docx");
const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children.splice(0, 1);

await doc.patch(tree);

4. Use `key` for stable reordering

If you reorder sibling nodes frequently, set a key on each of them.

const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children[0].key = "first";
body.children[1].key = "second";
body.children[2].key = "third";

await doc.patch(tree);

const nextTree = doc.toComponentTree();
const nextBody = nextTree.children.find((node) => node.type === "body");
nextBody.children = [nextBody.children[2], nextBody.children[0], nextBody.children[1]];

await doc.patch(nextTree);
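
To see why keys help, consider how a keyed diff can tell a move apart from a remove-and-insert pair. The following is a minimal sketch of one common approach (a "last placed index" scan), not the library's algorithm; diffKeyedChildren and the op shape are hypothetical.

```javascript
// Hypothetical sketch: derive REMOVE / INSERT / MOVE ops from two key lists.
function diffKeyedChildren(prevKeys, nextKeys) {
  const prevIndex = new Map(prevKeys.map((key, index) => [key, index]));
  const nextSet = new Set(nextKeys);
  const ops = [];

  // Keys that disappeared are removals.
  for (const key of prevKeys) {
    if (!nextSet.has(key)) ops.push({ type: "REMOVE", key });
  }

  // Scan the new order. A node that used to sit before an already placed
  // node has to move; a node already in relative order stays put.
  let lastPlaced = -1;
  nextKeys.forEach((key, to) => {
    if (!prevIndex.has(key)) {
      ops.push({ type: "INSERT", key, to });
      return;
    }
    const from = prevIndex.get(key);
    if (from < lastPlaced) {
      ops.push({ type: "MOVE", key, to });
    } else {
      lastPlaced = from;
    }
  });
  return ops;
}

const reorderOps = diffKeyedChildren(
  ["first", "second", "third"],
  ["third", "first", "second"],
);
console.log(reorderOps);
```

Without keys, the same reorder would look like three unrelated text changes, and the identity of each paragraph (including its runs and styles) would be lost.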

5. Modify a paragraph style

const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children[0].props.style = {
  styleId: "BodyText",
  alignment: "center",
  spacing: {
    before: "120",
    after: "240",
  },
};

await doc.patch(tree);

6. Modify a run style

const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");
const firstRun = body.children[0].children[0];

firstRun.props.style = {
  bold: true,
  italic: true,
  color: "FF0000",
  underline: "single",
};

await doc.patch(tree);

7. Migrate styles between components

const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

const sourceParagraphStyle = body.children[0].props.style;
body.children[1].props.style = sourceParagraphStyle;

const sourceRunStyle = body.children[0].children[0].props.style;
body.children[1].children[0].props.style = sourceRunStyle;

await doc.patch(tree);

8. Modify headers, footers, comments, and text boxes

const tree = doc.toComponentTree();

const header = tree.children.find((node) => node.type === "header");
const footer = tree.children.find((node) => node.type === "footer");
const comments = tree.children.find((node) => node.type === "comments");

header.children[0].props.text = "New header";
footer.children[0].props.text = "New footer";
comments.children[0].children[0].props.text = "New comment text";

await doc.patch(tree);

9. Create a paragraph with a footnote reference

const { createVNode, loadDocx } = require("docx-edit");

const doc = await loadDocx("./sample.docx");

// Create the footnote
const fnId = doc.addFootnote("Footnote explanation text");

// Build a paragraph that references the footnote
const tree = doc.toComponentTree();
const body = tree.children.find((node) => node.type === "body");

body.children.push(
  createVNode({
    type: "paragraph",
    props: { text: "This sentence has a footnote[[FOOTNOTE_REF:" + fnId + "]]." },
    children: [
      createVNode({
        type: "run",
        props: { text: "This sentence has a footnote" },
        children: [],
      }),
      createVNode({
        type: "run",
        props: { style: { vertAlign: "superscript" } },
        children: [
          createVNode({
            type: "footnoteReference",
            props: { id: String(fnId) },
            children: [],
          }),
        ],
      }),
      createVNode({
        type: "run",
        props: { text: "." },
        children: [],
      }),
    ],
  }),
);

await doc.patch(tree);
await doc.saveAs("./sample.modified.docx");

Note: paragraph.props.text contains the placeholder so that text matching (such as replaceAll) works, but the actual XML structure is determined by the run nodes in children. A footnoteReference node must be a child of a run.

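`createVNode()` notes

createVNode() creates a new node manually.

const node = createVNode({
  type: "paragraph",
  key: "intro",
  props: { text: "Introductory paragraph" },
  children: [],
});

Parameters:

type: the node type
key: optional; recommended when sibling nodes need stable reordering
props: the node's properties
children: array of child nodes

Notes:

The root node must be document
When patching, keep the existing parts in place; do not remove part roots such as body / header / footer / comments
New nodes must respect the currently supported parent-child relationships
For style changes, write directly to paragraph.props.style or run.props.style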
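
The first two rules are easy to check before calling doc.patch(). The sketch below is a hypothetical pre-flight check on plain tree objects, not something the library provides; checkPatchInput and the PART_TYPES list (taken from the part node types listed earlier) are assumptions.

```javascript
// Part roots that are assumed to be non-removable.
const PART_TYPES = new Set(["body", "header", "footer", "footnotes", "endnotes", "comments"]);

// Count how many part roots of each type sit directly under the document node.
function countParts(tree) {
  const counts = {};
  for (const child of tree.children || []) {
    if (PART_TYPES.has(child.type)) {
      counts[child.type] = (counts[child.type] || 0) + 1;
    }
  }
  return counts;
}

// Returns a list of problems; an empty list means the basic rules hold.
function checkPatchInput(currentTree, nextTree) {
  if (!nextTree || nextTree.type !== "document") {
    return ["root node must be of type document"];
  }
  const errors = [];
  const before = countParts(currentTree);
  const after = countParts(nextTree);
  for (const [type, expected] of Object.entries(before)) {
    const found = after[type] || 0;
    if (found < expected) {
      errors.push(`part "${type}" was removed (${expected} expected, ${found} found)`);
    }
  }
  return errors;
}

const currentDoc = {
  type: "document",
  children: [
    { type: "body", children: [] },
    { type: "header", children: [] },
  ],
};
const missingHeader = { type: "document", children: [{ type: "body", children: [] }] };

console.log(checkPatchInput(currentDoc, currentDoc));
console.log(checkPatchInput(currentDoc, missingHeader));
```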

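Testing

Run the example script:

npm run example

Run the tests:

npm test

Current test coverage:

Whole-paragraph read and write-back
tab / break preservation
Virtual tree text patch
Virtual tree structural insertion, removal, and reordering
Mixing table patch with fill()
Persistence of header / footer / comment / text-box
Paragraph style and run style parsing
Adding, modifying, and clearing styles
Style migration between components
word/styles.xml parsing (docDefaults, named styles, theme fonts)
Style inheritance chain resolution (basedOn chain merged with docDefaults)
Style profile JSON export / import / save and write-back
Regression tests against real sample documents
Full-text HTML extraction (heading levels, math, tables, images)
Heading level resolution (Chinese and English style IDs)
Superscript / subscript style read and write
Footnote reference (footnoteReference) read, write, and round-trip
Endnote reference (endnoteReference) read and write
Math formula (math) read, write, and round-trip
Creating footnotes (addFootnote), for documents with and without existing footnotes

Known limitations

Only common text-related OOXML nodes are covered; this is not a complete Word OOXML implementation
Style modeling mainly covers the commonly used paragraph and run properties
Unknown nodes are preserved where possible rather than understood and edited at a fine-grained level
doc.patch(nextTree) expects the target tree to be derived from the current tree; arbitrary invalid structures are not guaranteed to work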
