@jonatabolzanloss/rtfjs
v0.0.1
Zero-dependency JavaScript library that renders RTF (Rich Text Format) documents to HTML canvas. Pixel-accurate against Microsoft Word for the common 95% of real-world documents.
rtfjs
Zero-dependency JavaScript library that renders Rich Text Format (.rtf) documents directly onto HTML <canvas> elements.
Built to render .rtf files the way Microsoft Word renders them — same pagination, same line breaks, same table layout, same images — without taking a single runtime dependency. Drop it in a browser, in Node + headless Chromium, or anywhere a <canvas> is available.
+------------+     +-------------+     +--------+     +----------+
| RTF string | --> |   parser    | --> | layout | --> | canvases |
+------------+     +-------------+     +--------+     +----------+
                     (zero runtime dependencies)
Highlights
- Zero runtime dependencies. The whole library is vanilla ES2020. Only devDependencies (Playwright, pixelmatch, pngjs) are needed to run the test harness.
- Full RTF 1.9.1 spec coverage at the dispatcher level (1730 control words recognized). The 95% of control words that have visual effects are rendered; the remaining metadata flags are no-ops that don't pollute the output.
- Pixel-accurate against Word. A 31-file test corpus comparing rtfjs renders against Word-produced PDF references scores 27/31 ≥ 95% similarity, with most files in the 97–99% range.
- Rich rendering engines:
  - Knuth-Liang hyphenation (\hyphpar, \hyphauto) with embedded English patterns + Word-empirical line-pitch metrics for common fonts (Times New Roman, Verdana, Monaco, Menlo)
  - Widow / orphan control (\widowctrl) with proper multi-line repagination
  - Office Open Math (\moMath, \mfrac, \msup, \msub, \mrad) flattened to readable inline text
  - Paragraph borders / shading with all border styles (single, double, triple, dashed, dotted, wavy, etc.)
  - Drawing primitives (\shp rectangles, ellipses, text boxes) with OfficeArt color decoding
- Browser-friendly: works from a plain <script> tag with no build step.
- Standalone preview: preview.html opens in any browser and renders any local .rtf file via drag-and-drop.
Installation
npm
npm install @jonatabolzanloss/rtfjs
Plain <script> (no build step)
<div id="host"></div>
<script src="https://unpkg.com/@jonatabolzanloss/rtfjs/dist/rtfjs.global.js"></script>
<script>
RTFRenderer.render(myRtfString, document.getElementById('host'));
</script>
Quick start
import { RTFRenderer } from '@jonatabolzanloss/rtfjs';
const rtfString = await (await fetch('document.rtf')).text();
const host = document.getElementById('host');
const result = RTFRenderer.render(rtfString, host);
console.log(`Rendered ${result.pageCount} pages`);
// Optionally wait for embedded images to finish decoding:
await result.imagesReady;
The host element is cleared and one <canvas> is appended per page, stacked vertically.
API
RTFRenderer.render(rtfString, hostElement, options?)
| Argument | Type | Description |
|---------------------|-----------------|-------------|
| rtfString | string | Full RTF document. |
| hostElement | HTMLElement | Container — cleared, then one <canvas> per page is appended. |
| options.devicePixelRatio | number | Override window.devicePixelRatio. Default 1. |
| options.background | string | CSS color for empty page background. Default white. |
| options.dpi | number | Logical DPI used for twip → pixel conversion. Default 96. |
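To make the twip → pixel conversion behind options.dpi concrete, here is a minimal sketch. The 1440-twips-per-inch ratio is fixed by the RTF format; the function names (twipsToPixels, canvasSize) and the way devicePixelRatio scales the canvas backing store are assumptions for illustration, not the library's confirmed internals.

```javascript
// 1 twip = 1/20 point = 1/1440 inch (fixed by the RTF format).
const TWIPS_PER_INCH = 1440;

// Convert a twip measurement to CSS pixels at a given logical DPI.
function twipsToPixels(twips, dpi = 96) {
  return (twips / TWIPS_PER_INCH) * dpi;
}

// Assumption: the canvas backing store is the CSS size times devicePixelRatio,
// which is the usual way to keep canvas output sharp on high-density screens.
function canvasSize(paperW, paperH, { dpi = 96, devicePixelRatio = 1 } = {}) {
  const cssWidth = twipsToPixels(paperW, dpi);
  const cssHeight = twipsToPixels(paperH, dpi);
  return {
    cssWidth,
    cssHeight,
    width: Math.round(cssWidth * devicePixelRatio),
    height: Math.round(cssHeight * devicePixelRatio),
  };
}

// US Letter is 12240 x 15840 twips (8.5 x 11 in).
const letter = canvasSize(12240, 15840, { devicePixelRatio: 2 });
console.log(letter); // cssWidth 816, cssHeight 1056, width 1632, height 2112
```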
Returns:
{
pageCount: number,
canvases: HTMLCanvasElement[],
document: object, // parsed AST
pages: Array<Page>, // drawing primitives per page
imagesReady: Promise<void>,
}
imagesReady resolves once every embedded image has been decoded and painted to its canvas, which is useful for screenshot pipelines.
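For a screenshot pipeline, the idea is to wait on imagesReady before reading pixels back. A minimal sketch, assuming only the documented result fields (canvases, imagesReady) and the standard canvas.toDataURL() method; the helper name pagesToDataUrls is hypothetical:

```javascript
// Wait for embedded images, then serialize every page canvas to a PNG data URL.
// `result` is the object returned by RTFRenderer.render().
async function pagesToDataUrls(result) {
  await result.imagesReady; // without this, late-decoding images may be missing
  return result.canvases.map((canvas) => canvas.toDataURL('image/png'));
}
```

Calling it as `const urls = await pagesToDataUrls(RTFRenderer.render(rtfString, host));` would yield one data URL per page, in page order.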
Lower-level helpers
const doc = RTFRenderer.parse(rtfString); // → document AST
const pages = RTFRenderer.layout(doc); // → drawing primitives
pages is an array of { width, height, items: [{ type, ...primitives }] }. Useful if you want to ship the output somewhere other than canvas (SVG, PDF, server-side raster, etc.).
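As an illustration of retargeting the layout output, here is a rough sketch that turns one page into an SVG string. Only the page shape { width, height, items } comes from the description above; the item types ('rect', 'text') and their fields (x, y, width, height, fill, text, fontSize) are hypothetical stand-ins, so check the real primitives before relying on this.

```javascript
// Escape text for safe inclusion in SVG markup.
function escapeXml(s) {
  return String(s).replace(/[&<>"]/g, (c) => (
    { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' }[c]
  ));
}

// Hypothetical primitive shapes (NOT confirmed by the library):
//   { type: 'rect', x, y, width, height, fill }
//   { type: 'text', x, y, text, fontSize, fill }
function pageToSvg(page) {
  const body = page.items.map((item) => {
    switch (item.type) {
      case 'rect':
        return `<rect x="${item.x}" y="${item.y}" width="${item.width}" ` +
          `height="${item.height}" fill="${escapeXml(item.fill)}"/>`;
      case 'text':
        return `<text x="${item.x}" y="${item.y}" font-size="${item.fontSize}" ` +
          `fill="${escapeXml(item.fill)}">${escapeXml(item.text)}</text>`;
      default:
        return ''; // unknown primitive: skip rather than guess
    }
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${page.width}" ` +
    `height="${page.height}">${body.join('')}</svg>`;
}

const demoPage = {
  width: 816,
  height: 1056,
  items: [{ type: 'text', x: 96, y: 120, text: 'a < b', fontSize: 16, fill: '#000' }],
};
console.log(pageToSvg(demoPage));
```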
Module formats
The dist/ folder ships:
| File | Format | How to load |
|---------------------|--------------|-------------|
| rtfjs.global.js | UMD-style IIFE exposing window.RTFRenderer | <script src="dist/rtfjs.global.js"> |
| rtfjs.esm.js | ES module re-export of src/index.js | import { RTFRenderer } from '@jonatabolzanloss/rtfjs'; |
preview.html inlines rtfjs.global.js so it works from a data: URL or file:// with no server.
Feature coverage
Implemented (visually rendered):
- Character formatting: \b, \i, \ul (single / double / dotted / dashed / wavy / thick), \strike, \super, \sub, \fs, \f, \cf, \highlight, \caps, \scaps, \expnd, \plain, \kerning, \outl, \shad, \embo, \impr
- Paragraph formatting: \pard, \ql / \qr / \qc / \qj, \li, \ri, \fi (incl. hanging indent), \sb, \sa, \sl (with and without \slmult), tab stops (\tx, \tqr, \tqc, \tldot), \keepn, \keep, \pagebb, \rtlpar / \ltrpar, \widowctrl, \nowidctlpar
- Borders / shading: paragraph \brdrt/b/l/r with all styles (single, double, triple, dashed, dotted, dashdot, dashdotdot, wavy, hair, thick); \cbpat shading; per-side colors via \brdrcf; \brsp spacing
- Sections / pages: \paperw, \paperh, \margl/r/t/b, \sect, \sectd, \titlepg, \landscape, multi-column (\cols, \colsx), headers/footers (\header, \headerr, \headerl, \headerf, \footer variants), page numbers (PAGE, NUMPAGES)
- Tables: \trowd, \cellx, \cell, \row, \trleft, padding (\trpaddl/r/t/b), borders (\clbrdr...), shading (\clcbpat, \clcfpat, \clshdng), per-cell vertical alignment (\clvertalt/c/b), merged cells (\clmgf, \clmrg, \clvmgf, \clvmrg), nested tables
- Lists: bulleted / numbered / multi-level (\listtable, \listoverridetable, \ls, \ilvl), bullet symbol mapping (PUA → Unicode)
- Images: PNG (\pngblip), JPEG (\jpegblip), BMP pass-through; scale via \picscalex/y or \picwgoal / \pichgoal
- Drawing primitives: \shp \shpinst rectangles, ellipses (shapeType 3), text boxes (shapeType 202); OfficeArt fill / line color decoding; text-box content layout
- Fields: PAGE, NUMPAGES, DATE, TIME, AUTHOR, TITLE, HYPERLINK, MERGEFIELD, SEQ, bookmarks
- Math: Office Open Math (\moMath family) flattened to inline strings: (a)/(b) for fractions, x^(2) for superscripts, √(x+1) for radicals
- Hyphenation: Knuth-Liang algorithm (src/hyphenation.js) with embedded English patterns; setPatterns() to swap dictionaries for other languages
- Special characters: \\, \{, \}, \~ (NBSP), \- (soft hyphen), \_ (non-breaking hyphen), \tab, \line, \page, \u<N> Unicode (with \uc skip), \'XX hex (via active codepage), all 30+ glyph-substitution words (\emdash, \bullet, etc.)
- Header tables: \fonttbl, \colortbl, \stylesheet (with \sbasedon, \snext, \slink, \sautoupd, \spriority), \info (metadata captured but not rendered to body), \listtable, \listoverridetable
- RTL / Bidi: \rtlpar, \ltrpar, \rtlch, \ltrch, \rtlrow, \ltrrow, RTL Unicode rendering
- East Asian: \dbch, \loch, \hich, \langfe for CJK text; multi-codepage support
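For readers unfamiliar with the Liang pattern technique named in the hyphenation entry above, here is a self-contained sketch. It is not the code in src/hyphenation.js: the function names are made up for illustration, and the nine patterns are just the small subset needed to break the word "hyphenation", not the embedded English dictionary.

```javascript
// Parse a Liang pattern such as "hy3ph" into its letters and the weights that
// sit before each letter and after the last one.
function parsePattern(pattern) {
  const letters = [];
  const weights = [0];
  for (const ch of pattern) {
    if (ch >= '0' && ch <= '9') {
      weights[weights.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      weights.push(0);
    }
  }
  return { key: letters.join(''), weights };
}

function buildPatternMap(patterns) {
  const map = new Map();
  for (const p of patterns) {
    const { key, weights } = parsePattern(p);
    map.set(key, weights);
  }
  return map;
}

// Split a word at every inter-letter position whose maximum weight is odd.
function hyphenate(word, patternMap, minLeft = 2, minRight = 3) {
  const padded = `.${word.toLowerCase()}.`; // "." marks the word boundaries
  const points = new Array(padded.length + 1).fill(0);
  for (let start = 0; start < padded.length; start++) {
    for (let end = start + 1; end <= padded.length; end++) {
      const weights = patternMap.get(padded.slice(start, end));
      if (!weights) continue;
      for (let j = 0; j < weights.length; j++) {
        points[start + j] = Math.max(points[start + j], weights[j]);
      }
    }
  }
  const parts = [];
  let last = 0;
  // points[i + 1] is the gap before word[i] (offset by the leading ".").
  for (let i = minLeft; i <= word.length - minRight; i++) {
    if (points[i + 1] % 2 === 1) {
      parts.push(word.slice(last, i));
      last = i;
    }
  }
  parts.push(word.slice(last));
  return parts;
}

// Illustrative subset of patterns only.
const demoPatterns = buildPatternMap(
  ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'],
);
console.log(hyphenate('hyphenation', demoPatterns).join('-')); // hy-phen-ation
```

Odd weights allow a break and even weights forbid one; where several patterns cover the same gap, the largest weight wins, which is how longer, more specific patterns override shorter ones.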
Recognized but no-op (1000+ control words):
Metadata flags, RSID stamps, theme font roles, latent styles, view-only toggles, mail-merge field configurations, comments, revision tracking, custom XML, and so on — all parsed correctly and silently dropped, so the document body stays clean.
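The "recognized but no-op" idea can be sketched as a dispatcher with two tables. This is a simplified illustration, not the library's parser: the handler table, the four-word no-op sample, the state fields, and the regex tokenizer (which ignores groups and escapes) are all assumptions.

```javascript
// Control words with a visual effect mutate the formatting state.
// A missing parameter means "on" (\b), an explicit 0 means "off" (\b0).
const handlers = new Map([
  ['b', (state, param) => { state.bold = param !== 0; }],
  ['i', (state, param) => { state.italic = param !== 0; }],
  ['fs', (state, param) => { state.fontSizeHalfPt = param; }],
]);

// Recognized but intentionally ignored (a tiny sample of such words).
const noOps = new Set(['rsid', 'insrsid', 'lsdlocked', 'viewkind']);

function dispatch(word, param, state, unknown) {
  const handler = handlers.get(word);
  if (handler) handler(state, param);
  else if (!noOps.has(word)) unknown.push(word); // neither rendered nor known
}

// Simplified tokenizer: a backslash, letters, then an optional signed number.
const CONTROL_WORD = /\\([a-z]+)(-?\d+)? ?/g;

function run(rtf) {
  const state = { bold: false, italic: false, fontSizeHalfPt: 24 };
  const unknown = [];
  for (const [, word, digits] of rtf.matchAll(CONTROL_WORD)) {
    dispatch(word, digits === undefined ? undefined : Number(digits), state, unknown);
  }
  return { state, unknown };
}

const demo = run(String.raw`\b\insrsid123456\fs28\viewkind4\foo text`);
console.log(demo.state, demo.unknown);
```

Keeping no-ops in an explicit set, rather than ignoring anything unhandled, is what lets a parser tell "known and deliberately dropped" apart from "never seen before".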
Known limitations:
- WMF / EMF images render as a placeholder block (no in-browser decoder).
- Drawing primitives beyond rectangle / ellipse / text-box (lines, arrows, complex shapes) aren't rendered.
- Floating-frame positioning (\absw / \absh with text wrap-around) flows inline; full wrap-exclusion zones are not yet implemented.
- Multi-pass paragraph optimization (Knuth-Plass justification) isn't implemented: line-breaking is single-pass greedy with optional Liang hyphenation.
- Pixel-perfect parity with Word depends on which fonts are installed. Headless Chromium uses FreeType + Skia rasterization; Word uses GDI. Identical RTF can produce 1–2 px differences in line breaks and glyph metrics.
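To show what single-pass greedy line breaking means in practice, here is a minimal first-fit sketch. It is an illustration of the general technique, not the code in layout.js; the function name and the one-unit-per-character measure function (standing in for real text measurement) are assumptions.

```javascript
// Greedy (first-fit) line breaking: keep adding words to the current line
// until the next one would overflow, then start a new line. Earlier lines are
// never revisited, which is what separates this from Knuth-Plass.
function breakLinesGreedy(text, maxWidth, measure) {
  const lines = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current && measure(candidate) > maxWidth) {
      lines.push(current);
      current = word; // an over-long single word still gets its own line
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Stand-in for real text measurement: every character is 1 unit wide.
const measure = (s) => s.length;
console.log(breakLinesGreedy('the quick brown fox jumps over the lazy dog', 10, measure));
// [ 'the quick', 'brown fox', 'jumps over', 'the lazy', 'dog' ]
```

In a scheme like this, hyphenation would hook in at the overflow branch: instead of moving the whole word to the next line, try to fit its leading fragment first.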
Testing
The library ships with a Playwright-driven pixel-comparison test harness. For each RTF in samples/, the harness loads the document into Chromium, captures each page as PNG, scales to the reference resolution, and computes a similarity score using pixelmatch.
npm install # one-time
npx playwright install chromium # downloads ~150 MB
npm test # all samples in samples/ (default threshold: 95%)
node tests/run.mjs sample_03 # filter to one
node tests/run.mjs --threshold=98
Bring your own samples: see samples/README.md for the directory layout the harness expects.
How comparison works
Word's GDI text rasterizer and Chromium's FreeType + Skia rasterizer never produce byte-identical pixels — and our pagination occasionally drifts by a page against Word's because we don't have Word's exact font shaping pipeline. The comparison is built to score "looks the same to a person" rather than "is byte-identical":
- Bilinear downscale the rendered canvas to the reference's resolution.
- Best-page-match search, so that a small early pagination drift doesn't cascade into 0% for every page after it.
- Relaxed color threshold (0.25) to absorb GDI/Skia color disagreement that's invisible to the eye.
- Bidirectional neighborhood forgiveness — a pixel still flagged different gets a second look against nearby reference pixels (and vice versa) to forgive slightly shifted glyphs.
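The last two steps can be sketched on tiny grayscale grids. This is a simplified model, not the harness in tests/: it uses plain absolute differences on values in [0, 1] rather than pixelmatch's color metric, assumes both images have the same dimensions, and interprets "bidirectional" as requiring a nearby match in both directions. The names near and similarity, and the default radius of 1, are assumptions.

```javascript
// True if any pixel of `img` within `radius` of (x, y) is close to `value`.
function near(img, x, y, value, threshold, radius) {
  for (let dy = -radius; dy <= radius; dy++) {
    for (let dx = -radius; dx <= radius; dx++) {
      const row = img[y + dy];
      if (row === undefined) continue;
      const v = row[x + dx];
      if (v !== undefined && Math.abs(v - value) <= threshold) return true;
    }
  }
  return false;
}

// Fraction of pixels considered "the same" between rendered `a` and reference `b`.
function similarity(a, b, { threshold = 0.25, radius = 1 } = {}) {
  let same = 0;
  let total = 0;
  for (let y = 0; y < a.length; y++) {
    for (let x = 0; x < a[y].length; x++) {
      total++;
      if (Math.abs(a[y][x] - b[y][x]) <= threshold) {
        same++; // within the relaxed color threshold
        continue;
      }
      // Second look: forgive the pixel if each side finds its value nearby
      // in the other image (a slightly shifted glyph passes, a missing one fails).
      const aFoundInB = near(b, x, y, a[y][x], threshold, radius);
      const bFoundInA = near(a, x, y, b[y][x], threshold, radius);
      if (aFoundInB && bFoundInA) same++;
    }
  }
  return same / total;
}

const rendered = [[0, 1, 0, 0]];
const shifted = [[0, 0, 1, 0]]; // same mark, one pixel to the right
const missing = [[0, 0, 0, 0]]; // mark absent entirely
console.log(similarity(rendered, shifted)); // 1
console.log(similarity(rendered, shifted, { radius: 0 })); // 0.5
console.log(similarity(rendered, missing)); // 0.75
```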
Project layout
src/                  Library source (zero-dep; vanilla ES2020 modules)
  parser.js           Tokenizer + event parser (groups, control words, hex/Unicode escapes)
  document.js         Document builder (sections, paragraphs, runs, tables, images, lists, math)
  layout.js           Line breaking, pagination, drawing-primitive emission
  renderer.js         Draws primitives onto <canvas> elements
  util.js             Twips / half-pt conversions, codepage tables, helpers
  hyphenation.js      Knuth-Liang hyphenation engine + embedded English patterns
  font-metrics.json   OS/2 metrics database for spec-compliant line pitch
  word-metrics.json   Empirical Word-PDF line pitch for common (font, size) pairs
  index.js            Public entry point; exposes RTFRenderer
dist/                 Built bundles (UMD-style global + ESM)
samples/              Drop your own RTFs + PNG references here (gitignored)
tests/                Playwright + pixelmatch harness
scripts/              Zero-dep bundler, font-metrics extractor, position-diff tool
preview.html          Standalone interactive preview (drag-and-drop RTF)
index.html            Sample-cycling demo page
Building
npm run build
The bundler is a tiny zero-dependency script (scripts/build.mjs) that concatenates src/*.js in dependency order, strips ES-module syntax, and writes:
- dist/rtfjs.global.js: UMD-style IIFE
- dist/rtfjs.esm.js: ESM re-export wrapper
It also inlines the bundle into preview.html so the preview works from file:// with no server.
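For a feel of what "concatenate and strip ES-module syntax" can look like, here is a rough sketch. It is not scripts/build.mjs: the function names and the regexes are assumptions, and this version only handles `import ... ;` statements and `export` in front of a declaration (no `export default`, no `export { ... }` lists).

```javascript
// Remove import statements and the `export` keyword before declarations.
function stripModuleSyntax(source) {
  return source
    .replace(/^\s*import\s[^;]*;\s*$/gm, '')
    .replace(/^(\s*)export\s+(?=(const|let|var|function|class)\b)/gm, '$1');
}

// Concatenate already-ordered sources and wrap them in an IIFE that publishes
// one name on the global object.
function bundleAsIife(sources, globalName, exportedName) {
  const body = sources.map(stripModuleSyntax).join('\n');
  return `(function (root) {\n${body}\nroot.${globalName} = ${exportedName};\n})` +
    `(typeof window !== 'undefined' ? window : globalThis);\n`;
}

// Two toy "modules", listed in dependency order.
const demoSources = [
  'export function halfPointsToPoints(n) { return n / 2; }',
  "import { halfPointsToPoints } from './util.js';\n" +
    'export const Demo = { points: (n) => halfPointsToPoints(n) };',
];
const demoBundle = bundleAsIife(demoSources, 'DemoGlobal', 'Demo');
console.log(demoBundle);
```

Because every module ends up in one function scope, this approach only works when top-level names are unique across files, which is one reason a real bundler step may need more care than this sketch shows.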
Contributing
Contributions welcome — especially:
- Sample RTFs from real-world sources. Drop your .rtf + Word-saved reference PNGs into samples/ and open an issue if scores look off.
- Hyphenation patterns for non-English languages.
- Drawing primitive coverage (\shp line, polyline, arc, etc.).
- Body-image text-wrap (the long-standing \shpwr2 story).
Please run npm test before opening a PR.
License
MIT — see LICENSE.
