@publiwrite/html-to-epub
v0.0.23
A library to generate EPUB from HTML. Inspired by epub-gen.
Generate EPUB books from HTML with a simple API in Node.js. Originally inspired by cyrilis/epub-gen, hard-forked by PubliWrite to target Node 22+ exclusively, add EPUB 3.3 compliance, parallel I/O, and a permanent W3C epubcheck gate.
Why this fork
The original epub-gen is unmaintained (last commit 2022) and was
written against Node 14 conventions: serial image downloads, sync
fs ops, no security guards, no validation gate. PubliWrite needed an
EPUB generator that:
- Produces output that passes W3C epubcheck 5.3.0 cleanly as a blocking gate, not a "best effort" target.
- Generates EPUBs that render correctly in epub.js (the engine PubliWrite's reader app uses), Kindle Previewer 3 (Amazon's official tool), Calibre, and Apple Books.
- Handles the throughput of a Lambda render pipeline: 30+ chapters, 100+ images per book, sub-30s wall-clock budgets.
- Refuses to ship unsafe content silently (`file://` reads, inline `<script>`, oversized image bodies, etc.).
The fork started as a perf rewrite and grew an EPUB 3.3 spec-compliance pass + a security pass + a JAR-gated test matrix.
What changed vs upstream
EPUB 3.3 spec compliance fixes (PR #3)
| Bug | epubcheck rule | Fix |
|---|---|---|
| `cover.xhtml` declared twice in OPF manifest | OPF-074 | Added `isCover` marker on the cover entry, skipped in the content `forEach` |
| Inline SVG cover missing manifest property | OPF-014 | `<item id="cover" ... properties="svg"/>` in epub3 template |
| Broken cover URL in warn mode left dangling manifest reference | RSC-001 | `makeCover` catch nulls cover state + filters the cover entry from `this.content` |
| EPUB 2 cover used HTML5 `<meta charset>`, `epub:type` without `xmlns`, SVG 2 bare `href` | RSC-005, RSC-016 | Three tags now conditional on `version === 3` in the cover template |
| Empty chapter title produced `<title></title>` in chapter head | RSC-005 | Fall back to `bookTitle`, then to `"Untitled"` |
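
The fallback in the last row can be pictured with a minimal sketch. The function name `resolveChapterTitle` is hypothetical, and treating whitespace-only titles as empty is an assumption; the fork's actual implementation may differ.

```typescript
// Sketch of the documented fallback: chapter title -> bookTitle -> "Untitled".
// Assumption: whitespace-only strings count as empty.
function resolveChapterTitle(
  chapterTitle: string | undefined,
  bookTitle: string | undefined,
): string {
  for (const candidate of [chapterTitle, bookTitle]) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed; // first non-empty candidate wins
  }
  return "Untitled"; // last resort, so <title> is never emitted empty
}

console.log(resolveChapterTitle("", "My Book")); // "My Book"
console.log(resolveChapterTitle(undefined, "")); // "Untitled"
```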
Perf + security
- Parallel image download via `p-limit` semaphore (was a serial loop)
- HTTP keep-alive socket pool (was per-request socket)
- ZIP compression default `9 -> 6` (10x faster, 2-5% larger output)
- `stream/promises.pipeline` for cover + image writes (was racy `pipe()`)
- Optional `assetFailureMode: 'throw'` (default `'warn'` for back-compat)
- Strip 53 `on*` event-handler attributes from sanitiser allowlist
- Strip `<script>`, `<iframe>`, `<object>`, `<embed>`, `<applet>`
- Reject `file://` image / cover URLs by default (`allowFileUrls: true` to opt in)
- Hard 30s fetch timeout, 100 MB per-image size cap
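
To illustrate the bounded-concurrency idea behind the parallel image download, here is a hand-rolled semaphore sketch. It is a stand-in for the `p-limit` package, not the fork's code: `createLimiter` and `stats` are hypothetical names, and the tasks are placeholders rather than real fetches.

```typescript
// Hand-rolled stand-in for a p-limit style semaphore (assumption: the fork
// uses the real p-limit package, whose API and internals differ from this).
type Task<T> = () => Promise<T>;

function createLimiter(concurrency: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Called when a task settles: free the slot and start the next queued task.
  const next = (): void => {
    active--;
    const start = queue.shift();
    if (start) start();
  };

  const run = <T>(task: Task<T>): Promise<T> =>
    new Promise<T>((resolve, reject) => {
      const start = (): void => {
        active++;
        // Promise.resolve().then(task) also captures synchronous throws.
        Promise.resolve().then(task).then(resolve, reject).finally(next);
      };
      if (active < concurrency) start();
      else queue.push(start);
    });

  return { run, stats: () => ({ active, queued: queue.length }) };
}

// Placeholder tasks instead of real image fetches.
const limiter = createLimiter(2);
const jobs = [1, 2, 3, 4, 5].map((n) => limiter.run(() => Promise.resolve(n * 2)));
console.log(limiter.stats()); // { active: 2, queued: 3 }
Promise.all(jobs).then((results) => console.log(results)); // [ 2, 4, 6, 8, 10 ]
```

At most `concurrency` tasks hold a slot at once; the rest wait in FIFO order, which is the property that keeps a 100+ image book from opening 100+ sockets simultaneously.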
Stack
- Node 22+ required (was Node 14+); ESM only
- `slugify` replaced `uslug + diacritics`
- `image-size` v1 -> v2 (sync Buffer API)
- `p-limit` v6 -> v7
- `vitest` replaces `mocha + ts-node`
- `yarn.lock` now committed (was gitignored)
Usage
```bash
npm install @publiwrite/html-to-epub
```
```javascript
import { EPub } from "@publiwrite/html-to-epub";
const epub = new EPub(options, outputPath);
await epub.render();
```
Options
The legacy fields (title, author, cover, content, etc.) work
identically to upstream. New fields:
- `assetFailureMode`: `'warn'` (default, back-compat) or `'throw'`. Production callers should use `'throw'` so a dead cover URL or missing inline image rejects `render()` instead of silently shipping a broken manifest reference.
- `allowFileUrls`: default `false`. Rejects `file://` URLs (an arbitrary-file-read vector when chapter HTML is user-controlled). Set to `true` for trusted offline tooling and tests.
- `imageDownloadConcurrency`: default `8`. Max concurrent image fetches across the whole book.
- `zipCompressionLevel`: default `6`, range `0`-`9`. Lower is faster; EPUB payloads are mostly fonts and already-compressed images, so the marginal saving from `9` is tiny.
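
As a rough sketch of how the documented defaults combine, the snippet below mirrors the four new fields in a local interface and applies the defaults listed above. `NewForkOptions`, `withDefaults`, and `isAssetUrlAllowed` are hypothetical names for illustration; the real `EpubOptions` interface lives in `src/index.ts` and may be shaped differently.

```typescript
// Local mirror of the four new options (assumption: not the library's type).
interface NewForkOptions {
  assetFailureMode?: "warn" | "throw";
  allowFileUrls?: boolean;
  imageDownloadConcurrency?: number;
  zipCompressionLevel?: number;
}

// Apply the defaults documented in the list above.
function withDefaults(options: NewForkOptions = {}): Required<NewForkOptions> {
  return {
    assetFailureMode: options.assetFailureMode ?? "warn",
    allowFileUrls: options.allowFileUrls ?? false,
    imageDownloadConcurrency: options.imageDownloadConcurrency ?? 8,
    zipCompressionLevel: options.zipCompressionLevel ?? 6,
  };
}

// Hypothetical guard illustrating the allowFileUrls rule: file: URLs are
// rejected unless the caller opted in.
function isAssetUrlAllowed(url: string, options: NewForkOptions = {}): boolean {
  const isFileUrl = url.trim().toLowerCase().startsWith("file:");
  return !isFileUrl || withDefaults(options).allowFileUrls;
}

const production = withDefaults({ assetFailureMode: "throw" });
console.log(production.assetFailureMode); // "throw"
console.log(production.zipCompressionLevel); // 6
console.log(isAssetUrlAllowed("file:///etc/passwd")); // false
console.log(isAssetUrlAllowed("file:///tmp/cover.png", { allowFileUrls: true })); // true
```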
See src/index.ts for the full EpubOptions interface and inline
comments on every option.
Validation toolchain
The fork ships a real W3C epubcheck gate in the test suite. CI and
local `yarn test` both refuse to pass unless epubcheck accepts every
fixture.
How the gate works
tests/epub.spec.ts contains 50+ vitest cases. The last describe block
(epubcheck compliance) spawns the canonical W3C epubcheck 5.3.0
JAR via java -jar on a matrix of production-shaped fixtures:
- minimal EPUB 3
- multi-chapter
- with cover
- with data URI image
- mixed data URI + HTTP image
- broken cover (warn mode) — graceful degradation
- EPUB 2 minimal
- EPUB 2 with cover
- chapter with `ornamentalBreakElement` inline SVG (via custom template)
- `beforeToc` / `excludeFromToc` spine ordering
- realistic Plate-shaped chapter (headings + lists + blockquote + footnote)
- empty chapter title fallback
Each fixture must emit Messages: 0 fatals / 0 errors from epubcheck
or the test fails.
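
The pass/fail decision can be sketched as a small parser over epubcheck's summary line. This is an illustration, not the code in `tests/epub.spec.ts`: `parseEpubcheckSummary` and `passesGate` are hypothetical names, and the trailing warnings/infos fields in the sample strings are an assumption about the output format.

```typescript
interface EpubcheckSummary {
  fatals: number;
  errors: number;
}

// Extract the counts from a line such as "Messages: 0 fatals / 0 errors".
function parseEpubcheckSummary(output: string): EpubcheckSummary | null {
  const match = /Messages:\s*(\d+)\s+fatals?\s*\/\s*(\d+)\s+errors?/.exec(output);
  if (!match) return null;
  return { fatals: Number(match[1]), errors: Number(match[2]) };
}

function passesGate(output: string): boolean {
  const summary = parseEpubcheckSummary(output);
  // A missing summary line counts as a failure, so the gate cannot pass
  // silently when epubcheck never ran.
  return summary !== null && summary.fatals === 0 && summary.errors === 0;
}

console.log(passesGate("Messages: 0 fatals / 0 errors / 2 warnings / 0 infos")); // true
console.log(passesGate("Messages: 0 fatals / 3 errors / 0 warnings / 0 infos")); // false
console.log(passesGate("java: command not found")); // false
```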
Running the gate locally
```bash
yarn test               # runs the install script first, then vitest
yarn test:no-jar        # skip the JAR-gated tests (no Java required)
yarn install:epubcheck  # one-shot JAR install only
```
Requires Java on PATH. The JAR is downloaded once into
`.epubcheck/epubcheck-5.3.0/epubcheck.jar` (gitignored). Re-runs are
near-zero cost because the install script short-circuits when the JAR
already exists.
Why this gate is non-skippable
An earlier iteration of the gate skipped silently when Java or the JAR
was missing — turning the test suite into a structural rubber stamp. The
fix made the install part of yarn test, so a CI runner without epubcheck
fails loud instead of green-by-accident. Cross-engine review (Codex +
Sonnet) consensus-flagged this as the highest-priority gap on the first
landing of the matrix.
Beyond the library: PubliWrite-side validation
PubliWrite's manuscript-render lambda ships its own validation harness that takes real prod Yjs manuscripts through the lambda's full pipeline (Plate render + bundled fonts + plate.css + cover) and runs five reader engines against each output:
- W3C epubcheck 5.3.0 (same JAR as this fork)
- epub.js (futurepress) — the engine the FE reader app + the location-extraction lambda both use
- Calibre EPUB→AZW3 (Kindle binary compat proxy)
- Kindle Previewer 3 (Amazon's canonical KDP validator)
- Apple Books (`open -a Books`)
That harness lives in `pw-lambda-functions/lambdas/manuscript-render/scripts/`. See:
- `setup-validators.sh`: idempotent installer (brew + cask)
- `run-all-validators.sh`: one-shot N-manuscript sweep
- `validate-prod-epubcheck.ts`: epubcheck + epub.js gate
- `cross-reader-check.ts`: Calibre + KP3 batch + Books gate
Verified result on N=100 prod manuscripts (latest sweep):
- 100/100 pass W3C epubcheck (EPUB 3.3)
- 100/100 pass epub.js (non-empty CFI locations)
- 100/100 pass Calibre EPUB→AZW3
- 100/100 pass Kindle Previewer 3 (Enhanced Typesetting supported)
(Lambda repo private; cross-reference for PubliWrite contributors.)
Local development
```bash
git clone https://github.com/publiwrite/html-to-epub
cd html-to-epub
yarn install
yarn install:epubcheck  # one-time
yarn test               # ~90s on M-series Mac
yarn build              # tsc to lib/
```
Engines: Node 20 or 22 only. Node 18 is dropped (was upstream's
target; we use `Promise.any`, top-level await, and `image-size` v2,
which all require ≥ 20).
For ad-hoc smokes against a single Yjs blob, see the lambda repo's
scripts/build-one.mts helper.
Original API reference
Everything below is the original epub-gen API documentation. All
fields work as documented; the new fields are the ones listed in the
"Options" section above.
Options
- `title`: Title of the book
- `author`: Name of the author for the book, string or array, e.g. `"Alice"` or `["Alice", "Bob"]`
- `publisher`: Publisher name (optional)
- `cover`: Book cover image (optional). File path (absolute path) or web URL, e.g. `"http://abc.com/book-cover.jpg"` or `"/User/Alice/images/book-cover.jpg"`
- `output`: Output path (absolute path). You can also pass the output path as the second argument when using `new`, e.g. `new EPub(options, output)`
- `version`: Version of the generated EPUB: `3` for the latest version (http://idpf.org/epub/30) or `2` for the previous version (http://idpf.org/epub/201, for better compatibility with older readers). If not specified, falls back to `3`.
- `css`: If you really hate our CSS, you can pass a CSS string to replace our default style, e.g. `"body{background: #000}"`
- `fonts`: Array of (absolute) paths to custom fonts to include in the book so they can be used in custom CSS. For example, if you configure the array as `fonts: ['/path/to/Merriweather.ttf']`, you can use the following in the custom CSS: `@font-face { font-family: "Merriweather"; font-style: normal; font-weight: normal; src : url("./fonts/Merriweather.ttf"); }`
- `lang`: Language of the book as a 2-letter code (optional). If not specified, falls back to `en`.
- `tocTitle`: Title of the table of contents. If not specified, falls back to `Table Of Contents`.
- `includeToc`: Generate and include a table of contents page in the book. If set to `false`, the TOC page will not appear in the generated EPUB. Default: `true`.
- `appendChapterTitles`: Automatically append the chapter title at the beginning of each chapter's content. You can disable this by specifying `false`.
- `customOpfTemplatePath`: Optional. For advanced customizations: absolute path to an OPF template.
- `customNcxTocTemplatePath`: Optional. For advanced customizations: absolute path to an NCX TOC template.
- `customHtmlTocTemplatePath`: Optional. For advanced customizations: absolute path to an HTML TOC template.
- `content`: Book chapters content. It should be an array of objects, e.g. `[{title: "Chapter 1",data: "<div>..."}, {data: ""},...]`. Within each chapter object:
  - `title`: optional, chapter title
  - `author`: optional, if each chapter has a different author, you can fill it in.
  - `data`: required, HTML string of the chapter content. Image paths should be absolute URLs (starting with "http" or "https") so that they can be downloaded. Local images are also possible; for these the path must start with `file://`.
  - `excludeFromToc`: optional, if `true` the chapter is not shown in the table of contents. Default: `false`.
  - `beforeToc`: optional, if `true` the chapter is shown before the table of contents, such as copyright pages. Default: `false`.
  - `filename`: optional, specify the filename for each chapter. Default: `undefined`.
- `verbose`: specify whether or not to `console.log` progress messages. Default: `false`.
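
The documented meaning of `beforeToc` and `excludeFromToc` can be sketched as a simple partition of the `content` array. This reflects only the semantics described above, not the library's internal code: the `Chapter` interface and `planSpine` are hypothetical, and listing `beforeToc` chapters in the TOC unless they also set `excludeFromToc` is an assumption.

```typescript
// Minimal chapter shape using the documented field names.
interface Chapter {
  title?: string;
  data: string;
  excludeFromToc?: boolean;
  beforeToc?: boolean;
}

// Split chapters into those placed before the TOC page, those placed after
// it, and those that get an entry in the TOC itself.
function planSpine(chapters: Chapter[]) {
  return {
    beforeToc: chapters.filter((c) => c.beforeToc === true),
    afterToc: chapters.filter((c) => c.beforeToc !== true),
    tocEntries: chapters.filter((c) => c.excludeFromToc !== true),
  };
}

const plan = planSpine([
  { title: "Copyright", data: "<p>(c) Alice</p>", beforeToc: true, excludeFromToc: true },
  { title: "Chapter 1", data: "<div>...</div>" },
  { title: "Chapter 2", data: "<div>...</div>" },
]);

console.log(plan.beforeToc.map((c) => c.title)); // [ 'Copyright' ]
console.log(plan.tocEntries.map((c) => c.title)); // [ 'Chapter 1', 'Chapter 2' ]
```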
Output
If you don't want to pass the output path as the second argument, specify it as `option.output` instead.
