redlinefy
v0.1.0
Tracked-changes toolkit for Word. Generate granular redlines from any text diff — word-level, sentence-level, or full-block. Works as an Office.js add-in for live editing, as a standalone diff engine, or as a Node.js library for .docx file manipulation.
Install
npm install redlinefy
Overview
redlinefy ships three independent entry points. Import only what you need:
| Import | Runtime | What it does |
|---|---|---|
| redlinefy | Office.js add-in | Apply diffs as tracked changes in a live Word document |
| redlinefy/diff | Any JS/TS | Standalone word-level and sentence-level diff engine |
| redlinefy/docx | Node.js / browser | Read and write .docx files with tracked-change markup |
Each entry point is tree-shakeable and ships ESM + CJS bundles with full TypeScript declarations.
Quick start
Word add-in: apply an LLM rewrite as tracked changes
import { applyTextDiff } from 'redlinefy'
await Word.run(async (context) => {
const range = context.document.getSelection()
range.load('text')
await context.sync()
const original = range.text
const modified = await askLLM(original) // your LLM call
const result = await applyTextDiff(context, range, original, modified)
// result.changes = number of tracked edits applied
// result.strategy = 'token' (granular) or 'block' (fallback)
})
Each word-level change appears as an individual tracked revision — the reviewer sees exactly what moved, not a monolithic block replace.
Auto-select granularity
import { applyDiff } from 'redlinefy'
// Word-level for small edits, sentence-level for heavy rewrites
await applyDiff(context, range, original, modified)
// Or force a specific granularity
await applyDiff(context, range, original, modified, { granularity: 'sentence' })
await applyDiff(context, range, original, modified, { granularity: 'block' })
In auto mode, redlinefy computes a word-level diff and checks the deletion ratio. If more than half the original words are deleted, it switches to sentence-level diffing for cleaner results.
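The auto-selection rule can be sketched in a few lines. This is a sketch of the described heuristic, not the library's internal code, and the exact formula is an assumption; `stats` has the shape returned by `getDiffStats`:

```typescript
// Sketch of the auto-granularity heuristic: the deletion ratio over the
// original's words decides between word- and sentence-level diffing.
type AutoGranularity = 'word' | 'sentence'

function pickGranularity(stats: { deletions: number; unchanged: number }): AutoGranularity {
  const originalWords = stats.deletions + stats.unchanged
  if (originalWords === 0) return 'word'
  // <= 50% deleted: keep the granular word-level diff; otherwise fall
  // back to sentence-level for a cleaner redline.
  return stats.deletions / originalWords <= 0.5 ? 'word' : 'sentence'
}
```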
File-based: redline a .docx on disk
import { redlineFile } from 'redlinefy/docx'
await redlineFile('contract.docx', 'contract-redlined.docx', (text, index) => {
return text.replace(/Acme Corp/g, 'NewCo Inc.')
})
// Open contract-redlined.docx in Word — changes appear as tracked revisions
Or work with buffers directly:
import { redline } from 'redlinefy/docx'
const buffer = new Uint8Array(/* .docx bytes */)
const output = await redline(buffer, (text, index) => {
if (index === 0) return 'Revised first paragraph.'
return null // leave other paragraphs unchanged
}, { author: 'Contract Bot', date: '2026-02-15T00:00:00Z' })
Diff two texts against a .docx
import { redlineDiff } from 'redlinefy/docx'
const output = await redlineDiff(
docxBuffer,
'The quick brown fox jumps over the lazy dog.',
'The fast brown fox leaps over the lazy cat.',
)
// Output .docx has "quick→fast" and "jumps→leaps" and "dog→cat" as tracked changes
Both texts are split by newlines and matched against the document's paragraphs. Only paragraphs within the document's count are modified; extra lines are ignored.
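The matching rule amounts to pairing lines with paragraphs by index. A small illustrative helper (not part of the API) makes the behavior concrete:

```typescript
// Pair newline-split lines with document paragraphs by index; lines
// beyond the paragraph count (or beyond either text) are simply dropped.
function pairLines(paragraphCount: number, original: string, modified: string) {
  const a = original.split('\n')
  const b = modified.split('\n')
  const n = Math.min(paragraphCount, a.length, b.length)
  const changed: Array<{ index: number; from: string; to: string }> = []
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) changed.push({ index: i, from: a[i], to: b[i] })
  }
  return changed
}
```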
Standalone diff (no Word dependency)
import { computeDiff, getDiffStats } from 'redlinefy/diff'
const ops = computeDiff(
'The court held that the standard applies.',
'The court found that the higher standard applies.',
)
// [equal "The court ", delete "held", insert "found", equal " that the ",
// insert "higher ", equal "standard applies."]
const stats = getDiffStats(
'The court held that the standard applies.',
'The court found that the higher standard applies.',
)
// { insertions: 2, deletions: 1, unchanged: 5 }
Markdown formatting
When the modified text contains inline markdown (**bold**, *italic*, ~~strikethrough~~, ***bold+italic***), redlinefy strips the delimiters before diffing and applies the formatting to the resulting tracked changes.
This is designed for LLM output — models often return markdown-formatted text. redlinefy converts it to proper Word formatting automatically.
import { applyTextDiff } from 'redlinefy'
const original = 'The standard of review is not specified.'
const modified = 'The standard of review is **de novo**.'
// "not specified" is deleted, "de novo" is inserted as a bold tracked change
await applyTextDiff(context, range, original, modified)
Works in the .docx module too:
import { redline } from 'redlinefy/docx'
const output = await redline(buffer, (text) => {
return text.replace('shall', '**must**')
})
// "shall" → bold "must" as a tracked change in the output .docxMarkdown is also available standalone — all exported from redlinefy:
parseInlineMarkdown(text: string): FormattedSegment[]
Parse inline markdown into formatted segments. Each segment has text and optional bold, italic, strikethrough flags.
stripMarkdown(text: string): string
Strip all inline markdown delimiters, returning plain text.
hasMarkdown(text: string): boolean
Check if text contains any inline markdown formatting.
stripMarkdownPreserveFormats(text: string): StrippedMarkdown
Strip markdown and return both the plain text and an array of FormatRange objects with character positions in the stripped text.
getFormattedSegments(text: string, offset: number, formats: FormatRange[]): FormattedSegment[]
Given a substring and its offset in the full plain text, overlay format ranges to produce correctly split FormattedSegment[]. Used internally to map diff insert ops back to formatting.
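The stripped-text/format-range split can be illustrated with a toy bold-only version. This is illustrative only, not the library's implementation, which also handles italic and strikethrough:

```typescript
// Toy bold-only strip: remove ** delimiters and record where the bold
// text lands in the stripped output.
interface FormatRangeSketch { start: number; end: number; bold?: boolean }

function stripBold(text: string): { plain: string; formats: FormatRangeSketch[] } {
  const formats: FormatRangeSketch[] = []
  let plain = ''
  let last = 0
  // Require non-whitespace flanking characters, as the parser described above does.
  for (const m of text.matchAll(/\*\*(\S(?:[^*]*\S)?)\*\*/g)) {
    const i = m.index ?? 0
    plain += text.slice(last, i)
    formats.push({ start: plain.length, end: plain.length + m[1].length, bold: true })
    plain += m[1]
    last = i + m[0].length
  }
  plain += text.slice(last)
  return { plain, formats }
}
```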
Unicode tokenization
The default tokenizer uses \w+ (ASCII [a-zA-Z0-9_]), which splits accented words like café and treats CJK characters as punctuation. For proper Unicode word boundaries, pass an Intl.Segmenter:
import { applyTextDiff } from 'redlinefy'
const segmenter = new Intl.Segmenter('en', { granularity: 'word' })
await applyTextDiff(context, range, original, modified, { segmenter })
Works with the standalone diff engine too:
import { computeDiff, tokenizeIntl } from 'redlinefy/diff'
const segmenter = new Intl.Segmenter('zh', { granularity: 'word' })
// Direct tokenization
const tokens = tokenizeIntl('中文测试', segmenter)
// Unicode-aware diff
const ops = computeDiff('café latte', 'cafe latte', segmenter)
// Single delete "café" + single insert "cafe" (not fragmented)
And with .docx files:
import { redline } from 'redlinefy/docx'
const segmenter = new Intl.Segmenter('en', { granularity: 'word' })
const output = await redline(buffer, (text) => {
return text.replace('café', 'cafe')
}, { segmenter })
Intl.Segmenter is a zero-dependency built-in available in all modern browsers and Node.js 16+. The regex tokenizer remains the default for backwards compatibility.
API
redlinefy — Office.js tracked changes
Requires the Office.js runtime (Word add-in). Install @types/office-js as a dev dependency for TypeScript support.
applyTextDiff(context, range, original, modified, options?)
Word-level diff applied as granular tracked changes. Falls back to block replace if token mapping fails. Handles markdown formatting in the modified text.
- context — Word.RequestContext
- range — Word.Range to modify
- original — original plain text
- modified — new text (may contain inline markdown)
- options — TextDiffOptions
- returns — Promise<RedlineResult>
applySentenceDiff(context, range, original, modified, options?)
Sentence-level diff. Splits on .!? boundaries before diffing. Better for heavily rewritten text where word-level produces too many small changes.
Same signature as applyTextDiff.
applyDiff(context, range, original, modified, options?)
Unified entry point that dispatches to the right strategy.
- options.granularity — 'word' | 'sentence' | 'block' | 'auto' (default 'auto')
- In auto mode: uses word-level when the deletion ratio is <= 50%, sentence-level otherwise.
findAndReplace(context, searchText, replaceText)
Find all occurrences of searchText in the document body and replace under Track Changes.
- returns — Promise<RedlineResult> where changes is the number of replacements
applyFormat(context, range, format)
Apply formatting changes as tracked revisions. Only sets properties explicitly provided.
- format — FormatOptions
insertParagraph(context, range, text, location)
Insert a paragraph before or after a range, tracked as a change.
- location — 'Before' | 'After'
deleteParagraph(context, range)
Delete a paragraph's content, tracked as a deletion.
withTracking(context, fn)
Run any callback with Track Changes enabled. Reads the current changeTrackingMode, sets it to TrackAll, runs fn, then restores the original mode — even if fn throws.
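The save/set/restore pattern described for withTracking can be sketched generically. The names here are illustrative stand-ins, not the library's API (the real function works against Office.js's changeTrackingMode):

```typescript
// Read the current mode, force tracking on, run the callback, and
// restore the original mode in finally so it survives a throw.
function withSetting<T>(
  get: () => string,
  set: (mode: string) => void,
  fn: () => T,
): T {
  const previous = get()
  set('TrackAll')
  try {
    return fn()
  } finally {
    set(previous) // runs even if fn throws
  }
}
```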
redlinefy/diff — standalone diff engine
No runtime dependencies beyond diff-match-patch. Works in any JavaScript environment.
computeDiff(original, modified, segmenter?)
Word-level diff. Tokenizes both texts into words, punctuation, and whitespace, maps each token to a single Unicode character, diffs the encoded strings with diff-match-patch, then decodes back to token sequences. Applies diff_cleanupSemantic for human-readable results. Pass an Intl.Segmenter to use Unicode-aware tokenization.
- returns — DiffOp[]
computeSentenceDiff(original, modified)
Same pipeline as computeDiff but tokenizes at sentence boundaries (.!? followed by whitespace). Does not accept a segmenter — sentence splitting is independent of word boundary rules.
getDiffStats(original, modified, segmenter?)
Returns word-count statistics: { insertions: number, deletions: number, unchanged: number }. Pass a segmenter for Unicode-aware word counting.
tokenize(text)
Split text into Token[] — each token is { text, offset, type } where type is 'word' | 'punctuation' | 'whitespace'. Words are matched by \w+ (ASCII alphanumeric plus underscore).
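An illustrative reimplementation of that rule (not the library source) shows the token shape:

```typescript
// \w+ runs are words, \s+ runs whitespace, everything else punctuation;
// offset records each token's position in the input.
interface TokenSketch {
  text: string
  offset: number
  type: 'word' | 'punctuation' | 'whitespace'
}

function tokenizeSketch(text: string): TokenSketch[] {
  const out: TokenSketch[] = []
  for (const m of text.matchAll(/(\w+)|(\s+)|([^\w\s]+)/g)) {
    out.push({
      text: m[0],
      offset: m.index ?? 0,
      type: m[1] ? 'word' : m[2] ? 'whitespace' : 'punctuation',
    })
  }
  return out
}
```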
tokenizeIntl(text, segmenter?)
Unicode-aware tokenizer using Intl.Segmenter. Handles accented characters (café, naïve) and CJK text as proper word tokens. Pass an optional Intl.Segmenter instance for locale-specific segmentation; defaults to English.
tokenizeSentences(text)
Split text into sentence-level tokens. Splits on .!? followed by whitespace or end of string.
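The splitting rule can be sketched with a lookbehind (illustrative only; note the known limitation below on abbreviations like "Dr."):

```typescript
// Split after ., !, or ? when followed by whitespace; the end of the
// text also closes the final sentence.
function splitSentencesSketch(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0)
}
```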
encodeTokens(originalTokens, modifiedTokens)
Encode token arrays into single-character strings for diff-match-patch. Returns a WordEncoding object with encodedOriginal, encodedModified, originalTokens, modifiedTokens, and charToToken map.
encodeTexts(original, modified)
Convenience: tokenize + encode in one call. Returns WordEncoding.
redlinefy/docx — .docx file manipulation
Uses JSZip to read and write .docx archives. The redlineFile function uses a dynamic import('node:fs/promises') for file I/O; all other functions work with Uint8Array buffers and are browser-compatible.
redline(buffer, transformFn, options?)
Transform paragraphs in a .docx buffer. The transformFn receives each paragraph's text and index — return a new string to create a tracked change, or null to leave it unchanged.
- buffer — Uint8Array | ArrayBuffer
- transformFn — (text: string, index: number) => string | null
- options — DocxOptions
- returns — Promise<Uint8Array> (the modified .docx)
redlineFile(inputPath, outputPath, transformFn, options?)
File convenience wrapper. Reads the .docx, applies transforms, writes the result. Node.js only.
redlineDiff(buffer, originalText, modifiedText, options?)
Paragraph-by-paragraph diff. Splits both texts by newlines, matches each line to the corresponding paragraph in the .docx, and writes tracked changes for any differences. Lines beyond the document's paragraph count are ignored.
readDocx(buffer)
Parse a .docx buffer into a DocxDocument. Extracts word/document.xml and word/settings.xml, parses <w:p> elements, and returns paragraphs with their text, XML, character offset, and run properties.
- returns — Promise<DocxDocument>
applyTrackedChanges(doc, transforms, options?)
Low-level: apply an array of ParagraphTransform objects to a parsed DocxDocument. Handles positional XML replacement, revision IDs, and settings updates. Revision IDs reset per call.
ensureTrackRevisions(settingsXml)
Ensure <w:trackRevisions/> is present in the settings XML. Idempotent.
Types
interface Token {
text: string
offset: number
type: 'word' | 'punctuation' | 'whitespace'
}
interface DiffOp {
type: 'equal' | 'insert' | 'delete'
text: string
tokens: Token[]
}
interface RedlineResult {
success: boolean
strategy?: 'token' | 'block'
changes: number
}
interface FormatOptions {
bold?: boolean
italic?: boolean
underline?: boolean
strikeThrough?: boolean
fontSize?: number
fontName?: string
color?: string
highlightColor?: string
}
interface TextDiffOptions {
trackChanges?: boolean // default: true
segmenter?: Intl.Segmenter // Unicode-aware word boundaries
}
interface ApplyDiffOptions extends TextDiffOptions {
granularity?: 'word' | 'sentence' | 'block' | 'auto'
}
interface DocxOptions {
author?: string // default: "Redlinefy"
date?: string // ISO 8601, default: current time
granularity?: 'word' | 'sentence' | 'block' // default: "word"
segmenter?: Intl.Segmenter // Unicode-aware word boundaries
}
interface DocxDocument {
zip: JSZip
paragraphs: DocxParagraph[]
documentXml: string
settingsXml: string
}
interface DocxParagraph {
index: number
text: string
xml: string
xmlOffset: number
rPr?: string
}
interface ParagraphTransform {
index: number
newText: string
}
interface FormattedSegment {
text: string
bold?: boolean
italic?: boolean
strikethrough?: boolean
}
type DiffGranularity = 'word' | 'sentence' | 'block' | 'auto'
interface StrippedMarkdown {
plain: string
formats: FormatRange[]
}
interface FormatRange {
start: number
end: number
bold?: boolean
italic?: boolean
strikethrough?: boolean
}
interface WordEncoding {
encodedOriginal: string
encodedModified: string
originalTokens: Token[]
modifiedTokens: Token[]
charToToken: Map<number, string>
}
How it works
Diff engine
- Tokenize — split text into words, punctuation, and whitespace (or sentences)
- Encode — map each unique token to a single Unicode character starting at U+0100
- Diff — run diff-match-patch on the encoded strings
- Decode — map the character-level diff back to token-level operations
- Clean up — apply diff_cleanupSemantic for human-readable results
This produces diffs at word boundaries rather than character boundaries, which maps cleanly to Word's tracked-change model.
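The encode step can be sketched like this (an assumed detail: a single table shared across both texts, assigning characters from U+0100 upward):

```typescript
// Map each distinct token to one character so a character-level diff
// library effectively diffs at token level.
function encodeSketch(tokens: string[], table: Map<string, string>): string {
  let out = ''
  for (const t of tokens) {
    if (!table.has(t)) table.set(t, String.fromCharCode(0x100 + table.size))
    out += table.get(t)!
  }
  return out
}

const table = new Map<string, string>()
const a = encodeSketch(['The', ' ', 'quick', ' ', 'fox'], table)
const b = encodeSketch(['The', ' ', 'fast', ' ', 'fox'], table)
// a and b differ in exactly one character position, so any character
// diff between them (e.g. diff-match-patch) is automatically word-granular.
```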
Office.js integration
The token strategy splits the Word range into sub-ranges using getTextRanges(), then walks the diff ops: equal ops advance the range pointer, delete ops mark ranges for deletion, insert ops record text to insert at anchor points. Deletions are applied in reverse order to preserve indices, then insertions are applied with formatting. If token mapping fails, it falls back to a single block replace — the strategy field in RedlineResult indicates which path was taken.
.docx file manipulation
Opens the .docx archive with JSZip, extracts word/document.xml, parses <w:p> elements with regex, diffs each paragraph's text against the new text, and rebuilds the paragraph XML with Open XML tracked-change elements (<w:ins>, <w:del>). Paragraph replacements are applied from the end of the document forward so character offsets remain valid. Existing <w:pPr> and <w:rPr> are preserved. When markdown is present, formatted segments get separate <w:r> runs with the appropriate <w:rPr> tags (<w:b/>, <w:i/>, <w:strike/>).
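The emitted markup looks roughly like this hand-rolled sketch, following the WordprocessingML elements named above (the library's exact run structure, ID allocation, and escaping may differ):

```typescript
// Build a <w:del>/<w:ins> pair for replacing one run of text; deleted
// text lives in <w:delText>, inserted text in a plain <w:t>.
function trackedReplaceXml(
  oldText: string,
  newText: string,
  author: string,
  date: string,
): string {
  return (
    `<w:del w:id="1" w:author="${author}" w:date="${date}">` +
    `<w:r><w:delText xml:space="preserve">${oldText}</w:delText></w:r>` +
    `</w:del>` +
    `<w:ins w:id="2" w:author="${author}" w:date="${date}">` +
    `<w:r><w:t xml:space="preserve">${newText}</w:t></w:r>` +
    `</w:ins>`
  )
}
```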
Known limitations
- Default tokenizer uses ASCII word boundaries. The \w regex matches [a-zA-Z0-9_]. Accented characters (é, ñ) and CJK characters are classified as punctuation tokens. Diffs still produce correct results — the tokens are just finer-grained. Pass a segmenter option (an Intl.Segmenter instance) for proper Unicode word boundaries.
- Sentence tokenizer splits on all .!? characters. Abbreviations like "Dr." or "U.S.A." are treated as sentence boundaries. This affects computeSentenceDiff and applySentenceDiff.
- Nested markdown is not supported. **bold *and italic*** recognizes only the outermost delimiter. Use ***bold+italic*** for combined formatting.
- No markdown escape sequences. Literal ** in text that happens to match the pattern will be parsed as formatting. The parser requires non-whitespace flanking characters to minimize false positives (2 * 3 * 4 is left alone).
- Paragraph-level operations only in .docx. The docx module operates on <w:p> elements (including those inside table cells). It does not handle headers, footers, footnotes, comments, or other non-body content.
- Regex-based XML parsing. The docx module uses regex to extract and replace <w:p> elements rather than a full XML parser. This is reliable for round-tripping paragraph content but does not support arbitrary XML transformations.
Development
npm install
npm test # 243 tests
npm run typecheck
npm run build # ESM + CJS + .d.ts
License
Apache-2.0
