redlinefy
v0.1.0
Tracked-changes toolkit for Word. Generate granular redlines from any text diff — word-level, sentence-level, or full-block. Works as an Office.js add-in for live editing, as a standalone diff engine, or as a Node.js library for .docx file manipulation.
Install
npm install redlinefy
Overview
redlinefy ships three independent entry points. Import only what you need:
| Import | Runtime | What it does |
|---|---|---|
| redlinefy | Office.js add-in | Apply diffs as tracked changes in a live Word document |
| redlinefy/diff | Any JS/TS | Standalone word-level and sentence-level diff engine |
| redlinefy/docx | Node.js / browser | Read and write .docx files with tracked-change markup |
Each entry point is tree-shakeable and ships ESM + CJS bundles with full TypeScript declarations.
Quick start
Word add-in: apply an LLM rewrite as tracked changes
import { applyTextDiff } from 'redlinefy'
await Word.run(async (context) => {
const range = context.document.getSelection()
range.load('text')
await context.sync()
const original = range.text
const modified = await askLLM(original) // your LLM call
const result = await applyTextDiff(context, range, original, modified)
// result.changes = number of tracked edits applied
// result.strategy = 'token' (granular) or 'block' (fallback)
})
Each word-level change appears as an individual tracked revision — the reviewer sees exactly what moved, not a monolithic block replace.
Auto-select granularity
import { applyDiff } from 'redlinefy'
// Word-level for small edits, sentence-level for heavy rewrites
await applyDiff(context, range, original, modified)
// Or force a specific granularity
await applyDiff(context, range, original, modified, { granularity: 'sentence' })
await applyDiff(context, range, original, modified, { granularity: 'block' })
In auto mode, redlinefy computes a word-level diff and checks the deletion ratio. If more than half the original words are deleted, it switches to sentence-level diffing for cleaner results.
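The auto-selection rule can be sketched in a few lines. This is a sketch of the described heuristic, not the library's internal code, and the exact formula is an assumption; `stats` has the shape returned by `getDiffStats`:

```typescript
// Sketch of the auto-granularity heuristic: the deletion ratio over the
// original's words decides between word- and sentence-level diffing.
type AutoGranularity = 'word' | 'sentence'

function pickGranularity(stats: { deletions: number; unchanged: number }): AutoGranularity {
  const originalWords = stats.deletions + stats.unchanged
  if (originalWords === 0) return 'word'
  // <= 50% deleted: keep the granular word-level diff; otherwise fall
  // back to sentence-level for a cleaner redline.
  return stats.deletions / originalWords <= 0.5 ? 'word' : 'sentence'
}
```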
File-based: redline a .docx on disk
import { redlineFile } from 'redlinefy/docx'
await redlineFile('contract.docx', 'contract-redlined.docx', (text, index) => {
return text.replace(/Acme Corp/g, 'NewCo Inc.')
})
// Open contract-redlined.docx in Word — changes appear as tracked revisions
Or work with buffers directly:
import { redline } from 'redlinefy/docx'
const buffer = new Uint8Array(/* .docx bytes */)
const output = await redline(buffer, (text, index) => {
if (index === 0) return 'Revised first paragraph.'
return null // leave other paragraphs unchanged
}, { author: 'Contract Bot', date: '2026-02-15T00:00:00Z' })
Diff two texts against a .docx
import { redlineDiff } from 'redlinefy/docx'
const output = await redlineDiff(
docxBuffer,
'The quick brown fox jumps over the lazy dog.',
'The fast brown fox leaps over the lazy cat.',
)
// Output .docx has "quick→fast" and "jumps→leaps" and "dog→cat" as tracked changes
Both texts are split by newlines and matched against the document's paragraphs. Only paragraphs within the document's count are modified; extra lines are ignored.
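The matching rule amounts to pairing lines with paragraphs by index. A small illustrative helper (not part of the API) makes the behavior concrete:

```typescript
// Pair newline-split lines with document paragraphs by index; lines
// beyond the paragraph count (or beyond either text) are simply dropped.
function pairLines(paragraphCount: number, original: string, modified: string) {
  const a = original.split('\n')
  const b = modified.split('\n')
  const n = Math.min(paragraphCount, a.length, b.length)
  const changed: Array<{ index: number; from: string; to: string }> = []
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) changed.push({ index: i, from: a[i], to: b[i] })
  }
  return changed
}
```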
Standalone diff (no Word dependency)
import { computeDiff, getDiffStats } from 'redlinefy/diff'
const ops = computeDiff(
'The court held that the standard applies.',
'The court found that the higher standard applies.',
)
// [equal "The court ", delete "held", insert "found", equal " that the ",
// insert "higher ", equal "standard applies."]
const stats = getDiffStats(
'The court held that the standard applies.',
'The court found that the higher standard applies.',
)
// { insertions: 2, deletions: 1, unchanged: 5 }
Markdown formatting
When the modified text contains inline markdown (**bold**, *italic*, ~~strikethrough~~, ***bold+italic***), redlinefy strips the delimiters before diffing and applies the formatting to the resulting tracked changes.
This is designed for LLM output — models often return markdown-formatted text. redlinefy converts it to proper Word formatting automatically.
import { applyTextDiff } from 'redlinefy'
const original = 'The standard of review is not specified.'
const modified = 'The standard of review is **de novo**.'
// "not specified" is deleted, "de novo" is inserted as a bold tracked change
await applyTextDiff(context, range, original, modified)
Works in the .docx module too:
import { redline } from 'redlinefy/docx'
const output = await redline(buffer, (text) => {
return text.replace('shall', '**must**')
})
// "shall" → bold "must" as a tracked change in the output .docxMarkdown is also available standalone — all exported from redlinefy:
parseInlineMarkdown(text: string): FormattedSegment[]
Parse inline markdown into formatted segments. Each segment has text and optional bold, italic, strikethrough flags.
stripMarkdown(text: string): string
Strip all inline markdown delimiters, returning plain text.
hasMarkdown(text: string): boolean
Check if text contains any inline markdown formatting.
stripMarkdownPreserveFormats(text: string): StrippedMarkdown
Strip markdown and return both the plain text and an array of FormatRange objects with character positions in the stripped text.
getFormattedSegments(text: string, offset: number, formats: FormatRange[]): FormattedSegment[]
Given a substring and its offset in the full plain text, overlay format ranges to produce correctly split FormattedSegment[]. Used internally to map diff insert ops back to formatting.
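The stripped-text/format-range split can be illustrated with a toy bold-only version. This is illustrative only, not the library's implementation, which also handles italic and strikethrough:

```typescript
// Toy bold-only strip: remove ** delimiters and record where the bold
// text lands in the stripped output.
interface FormatRangeSketch { start: number; end: number; bold?: boolean }

function stripBold(text: string): { plain: string; formats: FormatRangeSketch[] } {
  const formats: FormatRangeSketch[] = []
  let plain = ''
  let last = 0
  // Require non-whitespace flanking characters, as the parser described above does.
  for (const m of text.matchAll(/\*\*(\S(?:[^*]*\S)?)\*\*/g)) {
    const i = m.index ?? 0
    plain += text.slice(last, i)
    formats.push({ start: plain.length, end: plain.length + m[1].length, bold: true })
    plain += m[1]
    last = i + m[0].length
  }
  plain += text.slice(last)
  return { plain, formats }
}
```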
Unicode tokenization
The default tokenizer uses \w+ (ASCII [a-zA-Z0-9_]), which splits accented words like café and treats CJK characters as punctuation. For proper Unicode word boundaries, pass an Intl.Segmenter:
import { applyTextDiff } from 'redlinefy'
const segmenter = new Intl.Segmenter('en', { granularity: 'word' })
await applyTextDiff(context, range, original, modified, { segmenter })
Works with the standalone diff engine too:
import { computeDiff, tokenizeIntl } from 'redlinefy/diff'
const segmenter = new Intl.Segmenter('zh', { granularity: 'word' })
// Direct tokenization
const tokens = tokenizeIntl('中文测试', segmenter)
// Unicode-aware diff
const ops = computeDiff('café latte', 'cafe latte', segmenter)
// Single delete "café" + single insert "cafe" (not fragmented)
And with .docx files:
import { redline } from 'redlinefy/docx'
const segmenter = new Intl.Segmenter('en', { granularity: 'word' })
const output = await redline(buffer, (text) => {
return text.replace('café', 'cafe')
}, { segmenter })
Intl.Segmenter is a zero-dependency built-in available in all modern browsers and Node.js 16+. The regex tokenizer remains the default for backwards compatibility.
API
redlinefy — Office.js tracked changes
Requires the Office.js runtime (Word add-in). Install @types/office-js as a dev dependency for TypeScript support.
applyTextDiff(context, range, original, modified, options?)
Word-level diff applied as granular tracked changes. Falls back to block replace if token mapping fails. Handles markdown formatting in the modified text.
- context — Word.RequestContext
- range — Word.Range to modify
- original — original plain text
- modified — new text (may contain inline markdown)
- options — TextDiffOptions
- returns — Promise<RedlineResult>
applySentenceDiff(context, range, original, modified, options?)
Sentence-level diff. Splits on .!? boundaries before diffing. Better for heavily rewritten text where word-level produces too many small changes.
Same signature as applyTextDiff.
applyDiff(context, range, original, modified, options?)
Unified entry point that dispatches to the right strategy.
- options.granularity — 'word' | 'sentence' | 'block' | 'auto' (default 'auto')
- In auto mode: uses word-level when the deletion ratio is <= 50%, sentence-level otherwise.
findAndReplace(context, searchText, replaceText)
Find all occurrences of searchText in the document body and replace under Track Changes.
- returns — Promise<RedlineResult> where changes is the number of replacements
applyFormat(context, range, format)
Apply formatting changes as tracked revisions. Only sets properties explicitly provided.
- format — FormatOptions
insertParagraph(context, range, text, location)
Insert a paragraph before or after a range, tracked as a change.
- location — 'Before' | 'After'
deleteParagraph(context, range)
Delete a paragraph's content, tracked as a deletion.
withTracking(context, fn)
Run any callback with Track Changes enabled. Reads the current changeTrackingMode, sets it to TrackAll, runs fn, then restores the original mode — even if fn throws.
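The save/set/restore pattern described for withTracking can be sketched generically. The names here are illustrative stand-ins, not the library's API (the real function works against Office.js's changeTrackingMode):

```typescript
// Read the current mode, force tracking on, run the callback, and
// restore the original mode in finally so it survives a throw.
function withSetting<T>(
  get: () => string,
  set: (mode: string) => void,
  fn: () => T,
): T {
  const previous = get()
  set('TrackAll')
  try {
    return fn()
  } finally {
    set(previous) // runs even if fn throws
  }
}
```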
redlinefy/diff — standalone diff engine
No runtime dependencies beyond diff-match-patch. Works in any JavaScript environment.
computeDiff(original, modified, segmenter?)
Word-level diff. Tokenizes both texts into words, punctuation, and whitespace, maps each token to a single Unicode character, diffs the encoded strings with diff-match-patch, then decodes back to token sequences. Applies diff_cleanupSemantic for human-readable results. Pass an Intl.Segmenter to use Unicode-aware tokenization.
- returns — DiffOp[]
computeSentenceDiff(original, modified)
Same pipeline as computeDiff but tokenizes at sentence boundaries (.!? followed by whitespace). Does not accept a segmenter — sentence splitting is independent of word boundary rules.
getDiffStats(original, modified, segmenter?)
Returns word-count statistics: { insertions: number, deletions: number, unchanged: number }. Pass a segmenter for Unicode-aware word counting.
tokenize(text)
Split text into Token[] — each token is { text, offset, type } where type is 'word' | 'punctuation' | 'whitespace'. Words are matched by \w+ (ASCII alphanumeric plus underscore).
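An illustrative reimplementation of that rule (not the library source) shows the token shape:

```typescript
// \w+ runs are words, \s+ runs whitespace, everything else punctuation;
// offset records each token's position in the input.
interface TokenSketch {
  text: string
  offset: number
  type: 'word' | 'punctuation' | 'whitespace'
}

function tokenizeSketch(text: string): TokenSketch[] {
  const out: TokenSketch[] = []
  for (const m of text.matchAll(/(\w+)|(\s+)|([^\w\s]+)/g)) {
    out.push({
      text: m[0],
      offset: m.index ?? 0,
      type: m[1] ? 'word' : m[2] ? 'whitespace' : 'punctuation',
    })
  }
  return out
}
```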
tokenizeIntl(text, segmenter?)
Unicode-aware tokenizer using Intl.Segmenter. Handles accented characters (café, naïve) and CJK text as proper word tokens. Pass an optional Intl.Segmenter instance for locale-specific segmentation; defaults to English.
tokenizeSentences(text)
Split text into sentence-level tokens. Splits on .!? followed by whitespace or end of string.
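The splitting rule can be sketched with a lookbehind (illustrative only; note the known limitation below on abbreviations like "Dr."):

```typescript
// Split after ., !, or ? when followed by whitespace; the end of the
// text also closes the final sentence.
function splitSentencesSketch(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0)
}
```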
encodeTokens(originalTokens, modifiedTokens)
Encode token arrays into single-character strings for diff-match-patch. Returns a WordEncoding object with encodedOriginal, encodedModified, originalTokens, modifiedTokens, and charToToken map.
encodeTexts(original, modified)
Convenience: tokenize + encode in one call. Returns WordEncoding.
redlinefy/docx — .docx file manipulation
Uses JSZip to read and write .docx archives. The redlineFile function uses a dynamic import('node:fs/promises') for file I/O; all other functions work with Uint8Array buffers and are browser-compatible.
redline(buffer, transformFn, options?)
Transform paragraphs in a .docx buffer. The transformFn receives each paragraph's text and index — return a new string to create a tracked change, or null to leave it unchanged.
- buffer — Uint8Array | ArrayBuffer
- transformFn — (text: string, index: number) => string | null
- options — DocxOptions
- returns — Promise<Uint8Array> (the modified .docx)
redlineFile(inputPath, outputPath, transformFn, options?)
File convenience wrapper. Reads the .docx, applies transforms, writes the result. Node.js only.
redlineDiff(buffer, originalText, modifiedText, options?)
Paragraph-by-paragraph diff. Splits both texts by newlines, matches each line to the corresponding paragraph in the .docx, and writes tracked changes for any differences. Lines beyond the document's paragraph count are ignored.
readDocx(buffer)
Parse a .docx buffer into a DocxDocument. Extracts word/document.xml and word/settings.xml, parses <w:p> elements, and returns paragraphs with their text, XML, character offset, and run properties.
- returns — Promise<DocxDocument>
applyTrackedChanges(doc, transforms, options?)
Low-level: apply an array of ParagraphTransform objects to a parsed DocxDocument. Handles positional XML replacement, revision IDs, and settings updates. Revision IDs reset per call.
ensureTrackRevisions(settingsXml)
Ensure <w:trackRevisions/> is present in the settings XML. Idempotent.
Types
interface Token {
text: string
offset: number
type: 'word' | 'punctuation' | 'whitespace'
}
interface DiffOp {
type: 'equal' | 'insert' | 'delete'
text: string
tokens: Token[]
}
interface RedlineResult {
success: boolean
strategy?: 'token' | 'block'
changes: number
}
interface FormatOptions {
bold?: boolean
italic?: boolean
underline?: boolean
strikeThrough?: boolean
fontSize?: number
fontName?: string
color?: string
highlightColor?: string
}
interface TextDiffOptions {
trackChanges?: boolean // default: true
segmenter?: Intl.Segmenter // Unicode-aware word boundaries
}
interface ApplyDiffOptions extends TextDiffOptions {
granularity?: 'word' | 'sentence' | 'block' | 'auto'
}
interface DocxOptions {
author?: string // default: "Redlinefy"
date?: string // ISO 8601, default: current time
granularity?: 'word' | 'sentence' | 'block' // default: "word"
segmenter?: Intl.Segmenter // Unicode-aware word boundaries
}
interface DocxDocument {
zip: JSZip
paragraphs: DocxParagraph[]
documentXml: string
settingsXml: string
}
interface DocxParagraph {
index: number
text: string
xml: string
xmlOffset: number
rPr?: string
}
interface ParagraphTransform {
index: number
newText: string
}
interface FormattedSegment {
text: string
bold?: boolean
italic?: boolean
strikethrough?: boolean
}
type DiffGranularity = 'word' | 'sentence' | 'block' | 'auto'
interface StrippedMarkdown {
plain: string
formats: FormatRange[]
}
interface FormatRange {
start: number
end: number
bold?: boolean
italic?: boolean
strikethrough?: boolean
}
interface WordEncoding {
encodedOriginal: string
encodedModified: string
originalTokens: Token[]
modifiedTokens: Token[]
charToToken: Map<number, string>
}
How it works
Diff engine
- Tokenize — split text into words, punctuation, and whitespace (or sentences)
- Encode — map each unique token to a single Unicode character starting at U+0100
- Diff — run diff-match-patch on the encoded strings
- Decode — map the character-level diff back to token-level operations
- Clean up — apply diff_cleanupSemantic for human-readable results
This produces diffs at word boundaries rather than character boundaries, which maps cleanly to Word's tracked-change model.
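The encode step can be sketched like this (an assumed detail: a single table shared across both texts, assigning characters from U+0100 upward):

```typescript
// Map each distinct token to one character so a character-level diff
// library effectively diffs at token level.
function encodeSketch(tokens: string[], table: Map<string, string>): string {
  let out = ''
  for (const t of tokens) {
    if (!table.has(t)) table.set(t, String.fromCharCode(0x100 + table.size))
    out += table.get(t)!
  }
  return out
}

const table = new Map<string, string>()
const a = encodeSketch(['The', ' ', 'quick', ' ', 'fox'], table)
const b = encodeSketch(['The', ' ', 'fast', ' ', 'fox'], table)
// a and b differ in exactly one character position, so any character
// diff between them (e.g. diff-match-patch) is automatically word-granular.
```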
Office.js integration
The token strategy splits the Word range into sub-ranges using getTextRanges(), then walks the diff ops: equal ops advance the range pointer, delete ops mark ranges for deletion, insert ops record text to insert at anchor points. Deletions are applied in reverse order to preserve indices, then insertions are applied with formatting. If token mapping fails, it falls back to a single block replace — the strategy field in RedlineResult indicates which path was taken.
.docx file manipulation
Opens the .docx archive with JSZip, extracts word/document.xml, parses <w:p> elements with regex, diffs each paragraph's text against the new text, and rebuilds the paragraph XML with Open XML tracked-change elements (<w:ins>, <w:del>). Paragraph replacements are applied from the end of the document forward so character offsets remain valid. Existing <w:pPr> and <w:rPr> are preserved. When markdown is present, formatted segments get separate <w:r> runs with the appropriate <w:rPr> tags (<w:b/>, <w:i/>, <w:strike/>).
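The emitted markup looks roughly like this hand-rolled sketch, following the WordprocessingML elements named above (the library's exact run structure, ID allocation, and escaping may differ):

```typescript
// Build a <w:del>/<w:ins> pair for replacing one run of text; deleted
// text lives in <w:delText>, inserted text in a plain <w:t>.
function trackedReplaceXml(
  oldText: string,
  newText: string,
  author: string,
  date: string,
): string {
  return (
    `<w:del w:id="1" w:author="${author}" w:date="${date}">` +
    `<w:r><w:delText xml:space="preserve">${oldText}</w:delText></w:r>` +
    `</w:del>` +
    `<w:ins w:id="2" w:author="${author}" w:date="${date}">` +
    `<w:r><w:t xml:space="preserve">${newText}</w:t></w:r>` +
    `</w:ins>`
  )
}
```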
Known limitations
- Default tokenizer uses ASCII word boundaries. The \w regex matches [a-zA-Z0-9_]. Accented characters (é, ñ) and CJK characters are classified as punctuation tokens. Diffs still produce correct results — the tokens are just finer-grained. Pass a segmenter option (an Intl.Segmenter instance) for proper Unicode word boundaries.
- Sentence tokenizer splits on all .!? characters. Abbreviations like "Dr." or "U.S.A." are treated as sentence boundaries. This affects computeSentenceDiff and applySentenceDiff.
- Nested markdown is not supported. **bold *and italic*** recognizes only the outermost delimiter. Use ***bold+italic*** for combined formatting.
- No markdown escape sequences. Literal ** in text that happens to match the pattern will be parsed as formatting. The parser requires non-whitespace flanking characters to minimize false positives (2 * 3 * 4 is left alone).
- Paragraph-level operations only in .docx. The docx module operates on <w:p> elements (including those inside table cells). It does not handle headers, footers, footnotes, comments, or other non-body content.
- Regex-based XML parsing. The docx module uses regex to extract and replace <w:p> elements rather than a full XML parser. This is reliable for round-tripping paragraph content but does not support arbitrary XML transformations.
Development
npm install
npm test # 243 tests
npm run typecheck
npm run build # ESM + CJS + .d.ts
License
Apache-2.0
