unquotemail
v0.2.2
Parse a given HTML/text email and return only the new text, without the quoted part.
UnquoteMail
A TypeScript port of unquotemail. It parses HTML/text emails and extracts only the new message content, removing quoted replies. This fork adds HTML cleaning, Markdown output support, a getQuote() method, and performance optimizations.
Installation
npm install unquotemail
Usage
import { Unquote } from 'unquotemail';
// Create instance - parsing is lazy (happens on first getter call)
const unquote = new Unquote(htmlContent, textContent);
// Primary content (reply history stripped)
unquote.getHtml(); // Cleaned HTML (default)
unquote.getHtml({ raw: true }); // Raw HTML (original structure)
unquote.getText(); // Plain text
unquote.getMarkdown(); // Markdown
// The stripped quote block
unquote.getQuote(); // Quote HTML (null if none)
Standalone Converters
import { htmlToText, htmlToMarkdown } from 'unquotemail';
const text = htmlToText(html);
const markdown = htmlToMarkdown(html);
Markdown Converter Features
The markdown converter (used by both getMarkdown() and htmlToMarkdown()) is optimized for email HTML:
- Flattens layout tables (common in email templates) instead of rendering them as markdown tables
- Ignores data URI images, scripts, and styles
- Handles non-standard HTML from various email clients
- Preserves nested blockquotes with proper > syntax
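To illustrate what "nested blockquotes with proper > syntax" means in the output, here is a minimal sketch. It is not the library's code (the actual conversion is delegated to node-html-markdown); the `Block` type and `renderBlocks` function are hypothetical names made up for this example, and real email HTML is far messier than this tiny tree.

```typescript
// Hypothetical model: a block is either a line of text or a quote
// containing further blocks. This is NOT how unquotemail represents HTML.
type Block = string | { quote: Block[] };

// Render blocks as Markdown lines, adding one ">" per nesting level.
function renderBlocks(blocks: Block[], depth = 0): string[] {
  const prefix = depth > 0 ? '>'.repeat(depth) + ' ' : '';
  const lines: string[] = [];
  for (const block of blocks) {
    if (typeof block === 'string') {
      lines.push(prefix + block);
    } else {
      // A nested quote is rendered one level deeper.
      lines.push(...renderBlocks(block.quote, depth + 1));
    }
  }
  return lines;
}

const nestedMarkdown = renderBlocks([
  'Thanks, see below.',
  { quote: ['Can you confirm?', { quote: ['Original question'] }] },
]).join('\n');
```

With this input, the reply line is emitted as-is, the first quoted line is prefixed with `> `, and the doubly quoted line with `>> `.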
How it works
- First tries to identify and remove known quote markup (.gmail_quote, .protonmail_quote, etc.)
- Falls back to regex patterns to identify "On DATE, NAME wrote:" style headers
- Removes everything from the quote marker onwards
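The regex fallback and the "remove everything from the marker onwards" step can be sketched for plain text as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the `QUOTE_HEADER` pattern and the `stripQuotedText` function are hypothetical, the pattern only covers one English header style (the patterns borrowed from Talon and Email Reply Parser handle many more), and the markup-based first step is omitted because it needs an HTML parser.

```typescript
// Hypothetical, simplified pattern for "On DATE, NAME wrote:" headers.
// The "m" flag lets ^ and $ match at line boundaries inside the message.
const QUOTE_HEADER = /^On .+ wrote:[ \t]*\r?$/m;

interface SplitResult {
  reply: string;        // the new message content
  quote: string | null; // the stripped quote block, or null if none was found
}

// Split a plain-text email at the first quote header line.
function stripQuotedText(text: string): SplitResult {
  const match = QUOTE_HEADER.exec(text);
  if (match === null) {
    // No header found: treat the whole message as new content.
    return { reply: text.trim(), quote: null };
  }
  return {
    reply: text.slice(0, match.index).trim(),
    quote: text.slice(match.index).trim(),
  };
}

const sample =
  'Sounds good, thanks!\n\n' +
  'On Mon, Jan 1, 2024 at 9:00 AM, Alice <alice@example.com> wrote:\n' +
  '> Can we meet tomorrow?';
const split = stripQuotedText(sample);
```

Here `split.reply` holds only the new text, and `split.quote` holds the header line plus everything after it, which loosely mirrors the split between getText() and getQuote() described above.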
Credits
- Original Python implementation by Cyril Nicodeme
- Regex patterns from Talon (Mailgun) and Email Reply Parser (Crisp)
- Markdown conversion by node-html-markdown
License
MIT
