email-body-parser
v1.0.0
Parse and clean email content - removes quotes, auto-signatures, and mailing list footers while preserving human signatures.
Maintained by Pinenlime
Why Another Email Parser?
Unlike many email parsing libraries, which aggressively remove every signature, email-body-parser takes a conservative approach: it removes only text it is confident is not part of the message.
| Feature | Other Libraries | email-body-parser |
|---------|-----------------|-------------------|
| Human signatures ("Best, John") | ❌ Removes | ✅ Keeps |
| Mobile auto-signatures | ✅ Removes | ✅ Removes |
| Quote headers | ✅ Removes | ✅ Removes |
| Mailing list footers | ❌ Not handled | ✅ Removes |
| Legal disclaimers | ❌ Not handled | ✅ Removes |
| Compressed Outlook headers | ❌ Basic | ✅ Comprehensive |
Installation
```bash
npm install email-body-parser
```

Usage
Simple API
For most use cases, the cleanEmailContent() function is all you need:
```typescript
import { cleanEmailContent } from 'email-body-parser';

const rawEmail = `Thanks for the update!

Best regards,
John Smith
Product Manager

On Mon, Mar 17, 2025 at 1:29 PM Jane Doe <[email protected]> wrote:
> Here's the latest report...

Sent from my iPhone`;

const cleaned = cleanEmailContent(rawEmail);
console.log(cleaned);
// Output:
// Thanks for the update!
//
// Best regards,
// John Smith
// Product Manager
```

Advanced API
For more control, use the EmailBodyParser class:
```typescript
import EmailBodyParser from 'email-body-parser';

const parser = new EmailBodyParser();
const email = parser.parse(rawEmail); // rawEmail from the example above

// Get visible content (excludes quotes, auto-signatures)
console.log(email.getVisibleText());

// Get just the quoted portions
console.log(email.getQuotedText());

// Iterate over all fragments
for (const fragment of email.getFragments()) {
  console.log({
    content: fragment.content,
    isHidden: fragment.isHidden,
    isQuoted: fragment.isQuoted,
    isSignature: fragment.isSignature,
  });
}
```

Convenience Methods
```typescript
import EmailBodyParser from 'email-body-parser';

const parser = new EmailBodyParser();

// Get visible text directly
const visibleText = parser.parseReply(rawEmail);

// Get quoted text directly
const quotedText = parser.parseReplied(rawEmail);
```

What Gets Removed
Quote Headers
- Gmail style: `On Mon, Mar 17, 2025 at 1:29 PM John <[email protected]> wrote:`
- Outlook style: `-----Original Message-----`
- Forward headers: `From: ... Sent: ... To: ... Subject: ...`
- Standard quote markers: `> quoted text`
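To make the quote-header rules concrete, here is a minimal, self-contained sketch of cutting a reply at the first line that looks like the start of quoted content. The regular expressions and the `truncateAtQuote` name are hypothetical simplifications for illustration; they are not the patterns the library exports as `QUOTE_PATTERNS`.

```typescript
// Hypothetical, simplified quote markers (NOT the library's QUOTE_PATTERNS).
const QUOTE_MARKERS: RegExp[] = [
  /^On .+ wrote:\s*$/, // Gmail style: "On <date> <name> wrote:" on a single line
  /^-{2,}\s*Original Message\s*-{2,}$/i, // Outlook style
  /^From:\s.+$/, // start of a From:/Sent:/To:/Subject: block (simplified; a real parser would check the following lines too)
  /^>/, // standard quote marker
];

// Returns the text before the first line that matches a quote marker.
function truncateAtQuote(text: string): string {
  const lines = text.split("\n");
  const cut = lines.findIndex((line) =>
    QUOTE_MARKERS.some((re) => re.test(line.trim())),
  );
  const kept = cut === -1 ? lines : lines.slice(0, cut);
  return kept.join("\n").trimEnd();
}

const reply = [
  "Thanks for the update!",
  "",
  "On Mon, Mar 17, 2025 at 1:29 PM Jane Doe <jane@example.com> wrote:",
  "> Here's the latest report...",
].join("\n");

console.log(truncateAtQuote(reply)); // "Thanks for the update!"
```

This sketch does not handle Gmail headers that wrap across two lines, which is one reason a production parser needs a larger pattern set.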
Auto-Generated Signatures
- Mobile: `Sent from my iPhone`, `Sent from my Android`
- Apps: `Sent via Superhuman`, `Get Outlook for iOS`
- Meeting links: `BOOK A MEETING...`
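A conservative treatment of auto-signatures can be sketched as removing only trailing lines that match a short allowlist of known app-generated phrases, so anything else, including a human sign-off such as "Best, John", survives. The pattern list and the `stripAutoSignatures` helper are hypothetical and are not the library's `AUTO_SIGNATURE_PATTERNS`.

```typescript
// Hypothetical allowlist of machine-generated sign-offs (NOT AUTO_SIGNATURE_PATTERNS).
const AUTO_SIGNATURES: RegExp[] = [
  /^Sent from my (iPhone|iPad|Android)\b/i,
  /^Sent via Superhuman\b/i,
  /^Get Outlook for (iOS|Android)\b/i,
];

// Pops matching lines (and blank lines) from the END of the message only,
// so the same phrase appearing mid-message is left alone.
function stripAutoSignatures(text: string): string {
  const lines = text.split("\n");
  while (lines.length > 0) {
    const last = (lines[lines.length - 1] ?? "").trim();
    if (last === "" || AUTO_SIGNATURES.some((re) => re.test(last))) {
      lines.pop();
    } else {
      break; // first non-matching line from the bottom: stop, keep everything above
    }
  }
  return lines.join("\n");
}

const message = "Looks good to me.\n\nBest,\nJohn\n\nSent from my iPhone";
console.log(stripAutoSignatures(message));
// Looks good to me.
//
// Best,
// John
```

Because the loop stops at the first line that is not on the allowlist, the human signature is never examined for removal, which mirrors the "only remove what is clearly not content" philosophy described earlier.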
Mailing List Footers
- Google Groups: `You received this message because...`
- Unsubscribe links: `Click here to unsubscribe`
- Marketing footers: `This email was sent to...`
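Mailing list footers usually sit at the very end of a message, so one simple strategy is to cut everything from the first line that matches a known footer opener, looking only near the bottom. The sketch below assumes that strategy; `FOOTER_OPENERS`, `cutMailingListFooter`, and the 10-line window are illustrative choices, and the library's `MAILING_LIST_PATTERNS` may work differently.

```typescript
// Illustrative footer openers based on the examples above (NOT MAILING_LIST_PATTERNS).
const FOOTER_OPENERS: RegExp[] = [
  /^You received this message because\b/i, // Google Groups
  /\bunsubscribe\b/i, // unsubscribe links
  /^This email was sent to\b/i, // marketing footers
];

// Cuts the message at the first footer opener, searching only the last
// `maxFooterLines` lines so an earlier mention of "unsubscribe" is kept.
function cutMailingListFooter(text: string, maxFooterLines = 10): string {
  const lines = text.split("\n");
  const start = Math.max(0, lines.length - maxFooterLines);
  for (let i = start; i < lines.length; i++) {
    const line = (lines[i] ?? "").trim();
    if (FOOTER_OPENERS.some((re) => re.test(line))) {
      return lines.slice(0, i).join("\n").trimEnd();
    }
  }
  return text;
}

const post = [
  "Meeting moved to 3pm.",
  "",
  'You received this message because you are subscribed to the "team" group.',
  "To unsubscribe from this group, send an email to the list owner.",
].join("\n");

console.log(cutMailingListFooter(post)); // "Meeting moved to 3pm."
```

Restricting the search window trades recall for safety: a very long footer could slip through, but a question like "How do I unsubscribe a user?" at the top of a long message is not mistaken for a footer.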
Legal Disclaimers
- `CONFIDENTIAL: This message contains...`
- `DISCLAIMER: This email and any files...`
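Disclaimers tend to arrive as self-contained paragraphs, so a paragraph-level filter is a natural way to illustrate this rule. This is a minimal sketch, assuming disclaimers open with an uppercase `CONFIDENTIAL:` or `DISCLAIMER:` as in the examples above; `DISCLAIMER_OPENER` and `dropDisclaimerParagraphs` are hypothetical names, not part of the library.

```typescript
// Uppercase-only openers, to stay conservative: "Confidential: the budget..."
// written by a human in sentence case is NOT matched.
const DISCLAIMER_OPENER = /^(CONFIDENTIAL|DISCLAIMER)\s*:/;

// Splits on blank lines and drops whole paragraphs that open like a legal
// disclaimer. Runs of blank lines are normalized to a single blank line.
function dropDisclaimerParagraphs(text: string): string {
  return text
    .split(/\n\s*\n/)
    .filter((paragraph) => !DISCLAIMER_OPENER.test(paragraph.trim()))
    .join("\n\n");
}

const withDisclaimer =
  "See you Friday.\n\nCONFIDENTIAL: This message contains privileged information.";

console.log(dropDisclaimerParagraphs(withDisclaimer)); // "See you Friday."
```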
What Gets Preserved
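Human signatures are kept because:
- They carry contact information for follow-ups
- Job titles help the reader judge who is writing and how urgent the request is
- Leaving them in avoids false positives, where real message content is mistaken for a signature and removed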
```typescript
import { cleanEmailContent } from 'email-body-parser';

const email = `Please review the attached document.

Best regards,
Sarah Williams
Senior Financial Analyst
Direct: (555) 234-5678
[email protected]`;

cleanEmailContent(email);
// Returns the ENTIRE email - signature is preserved!
```

API Reference
cleanEmailContent(content: string): string
Cleans email content by removing quotes, auto-signatures, and mailing list footers.
Parameters:
- `content` - The raw email content to clean

Returns: The cleaned email content.
EmailBodyParser
constructor(options?: ParserOptions)
Options:
- `keepSignatures` (default: `true`) - Keep human signatures
- `removeDisclaimers` (default: `true`) - Remove legal disclaimers
- `removeMailingListFooters` (default: `true`) - Remove mailing list footers
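The documented defaults can be expressed as a fully resolved options object. This sketch assumes `ParserOptions` has exactly the three optional boolean fields listed above (the real exported type may contain more), so it uses a separate hypothetical `ParserOptionsSketch` type and a hypothetical `resolveOptions` helper.

```typescript
// Assumed shape, based only on the options documented above.
interface ParserOptionsSketch {
  keepSignatures?: boolean;
  removeDisclaimers?: boolean;
  removeMailingListFooters?: boolean;
}

const DEFAULT_OPTIONS: Required<ParserOptionsSketch> = {
  keepSignatures: true,
  removeDisclaimers: true,
  removeMailingListFooters: true,
};

// Hypothetical helper: omitted or undefined fields fall back to the documented defaults.
function resolveOptions(
  options: ParserOptionsSketch = {},
): Required<ParserOptionsSketch> {
  return {
    keepSignatures: options.keepSignatures ?? DEFAULT_OPTIONS.keepSignatures,
    removeDisclaimers: options.removeDisclaimers ?? DEFAULT_OPTIONS.removeDisclaimers,
    removeMailingListFooters:
      options.removeMailingListFooters ?? DEFAULT_OPTIONS.removeMailingListFooters,
  };
}

// Keep legal disclaimers, leave the other defaults alone.
console.log(resolveOptions({ removeDisclaimers: false }));
// resolves to keepSignatures: true, removeDisclaimers: false, removeMailingListFooters: true
```

Using `??` per field, rather than spreading `{ ...DEFAULT_OPTIONS, ...options }`, means an explicitly passed `undefined` still gets the default. With the real class, the equivalent request is `new EmailBodyParser({ removeDisclaimers: false })`.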
parse(text: string): ParsedEmail
Parses email content into fragments.
parseReply(text: string): string
Convenience method that returns visible text directly.
parseReplied(text: string): string
Convenience method that returns quoted text directly.
ParsedEmail
getFragments(): EmailFragment[]
Returns all email fragments.
getVisibleText(): string
Returns content that is not hidden (excludes quotes, auto-signatures).
getQuotedText(): string
Returns only the quoted portions of the email.
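One plausible way the `ParsedEmail` accessors relate to the fragment list is sketched below: visible text joins the fragments not marked hidden, and quoted text joins the fragments marked quoted. The `FragmentSketch` type mirrors the `EmailFragment` fields documented in this reference; the two functions and the newline joining are assumptions for illustration, not the library's implementation.

```typescript
// Mirrors the documented EmailFragment fields.
interface FragmentSketch {
  content: string;
  isHidden: boolean;
  isSignature: boolean;
  isQuoted: boolean;
}

// Assumed behaviour of getVisibleText(): keep every fragment not marked hidden.
function visibleTextOf(fragments: FragmentSketch[]): string {
  return fragments
    .filter((f) => !f.isHidden)
    .map((f) => f.content)
    .join("\n")
    .trim();
}

// Assumed behaviour of getQuotedText(): keep only fragments marked as quoted.
function quotedTextOf(fragments: FragmentSketch[]): string {
  return fragments
    .filter((f) => f.isQuoted)
    .map((f) => f.content)
    .join("\n")
    .trim();
}

const fragments: FragmentSketch[] = [
  { content: "Thanks for the update!", isHidden: false, isSignature: false, isQuoted: false },
  { content: "> Here's the latest report...", isHidden: true, isSignature: false, isQuoted: true },
  { content: "Sent from my iPhone", isHidden: true, isSignature: true, isQuoted: false },
];

console.log(visibleTextOf(fragments)); // "Thanks for the update!"
console.log(quotedTextOf(fragments)); // "> Here's the latest report..."
```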
EmailFragment
```typescript
interface EmailFragment {
  content: string;      // The fragment text
  isHidden: boolean;    // True if this should be hidden from display
  isSignature: boolean; // True if this is an auto-signature
  isQuoted: boolean;    // True if this is quoted content
}
```

Customizing Patterns
For advanced users, the pattern arrays are exported:
```typescript
import {
  QUOTE_PATTERNS,
  AUTO_SIGNATURE_PATTERNS,
  MAILING_LIST_PATTERNS,
} from 'email-body-parser';

// Each pattern has metadata for debugging
QUOTE_PATTERNS.forEach(({ pattern, description, example }) => {
  console.log(description, example);
});
```

TypeScript Support
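The fields of the exported `PatternDefinition` type are not listed in this README. Assuming, from the `{ pattern, description, example }` destructuring in the example above, that `pattern` is a `RegExp` and `example` is a sample string, the metadata can double as a self-test: every pattern should match its own example. The `PatternDefinitionSketch` interface and `findBrokenPatterns` helper below are hypothetical.

```typescript
// Hypothetical shape inferred from the destructuring above; the real type may differ.
interface PatternDefinitionSketch {
  pattern: RegExp;
  description: string;
  example: string;
}

// Returns the descriptions of patterns that fail to match their own example.
function findBrokenPatterns(defs: PatternDefinitionSketch[]): string[] {
  return defs
    .filter(({ pattern, example }) => {
      pattern.lastIndex = 0; // reset state in case a pattern uses the g or y flag
      return !pattern.test(example);
    })
    .map(({ description }) => description);
}

const customPatterns: PatternDefinitionSketch[] = [
  {
    pattern: /^Sent from my \w+/m,
    description: "Mobile auto-signature",
    example: "Sent from my iPhone",
  },
  {
    pattern: /^-{5}Original Message-{5}$/m,
    description: "Outlook quote header",
    example: "-----Original Message-----",
  },
];

console.log(findBrokenPatterns(customPatterns)); // []
```

If the exported arrays do have this shape, the same check could be run over `QUOTE_PATTERNS`, `AUTO_SIGNATURE_PATTERNS`, and `MAILING_LIST_PATTERNS` before adding custom entries.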
Full TypeScript support with exported types:
```typescript
import type {
  EmailFragment,
  ParsedEmail,
  PatternDefinition,
  ParserOptions,
} from 'email-body-parser';
```

RE2 Support (Optional)
For predictable matching time and ReDoS protection (RE2 runs in time linear in the input length and never backtracks), install RE2 as an optional peer dependency:
```bash
npm install re2
```

The library will automatically use RE2 when it is installed.
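The "use RE2 when installed" behaviour can be illustrated with a try-and-fall-back sketch. To stay independent of Node's module system, the module loader is passed in as a parameter; `pickRegexEngine`, `RegexCtor`, and the loader signature are hypothetical and only show the general pattern, not how the library actually detects RE2.

```typescript
// Structural type covering both RegExp and an RE2-style constructor.
type RegexLike = { test(input: string): boolean };
type RegexCtor = new (source: string, flags?: string) => RegexLike;

// Hypothetical helper: returns the RE2 constructor when the loader provides one,
// otherwise falls back to the built-in RegExp.
function pickRegexEngine(load: (moduleName: string) => unknown): RegexCtor {
  try {
    const mod = load("re2");
    if (typeof mod === "function") {
      return mod as RegexCtor;
    }
  } catch {
    // Not installed or failed to load: fall through to RegExp.
  }
  return RegExp;
}

// Simulate an environment where re2 is not installed.
const Engine = pickRegexEngine(() => {
  throw new Error("Cannot find module 're2'");
});
const sentFrom = new Engine("^Sent from my", "m");
console.log(Engine === RegExp, sentFrom.test("Hi\nSent from my iPhone")); // true true
```

In a CommonJS module the built-in `require` could be passed as the loader, for example `pickRegexEngine(require)`. Note that RE2 does not support backreferences or lookaround, so patterns meant to run on either engine should avoid them.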
License
MIT License - see LICENSE for details.
Author
Created and maintained by Pinenlime.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
