email-origin-chain
v1.0.16
Uncover the full audit trail of your email threads. Recursively reconstructs the entire conversation history with instant access to the original sender and true source message.
Uncover the full audit trail of your email threads. Recursively deep-dives into forwards and replies to reconstruct the entire conversation history. Combines MIME traversal with multi-language text detection to rebuild the message chain, giving you instant access to the original sender's details and the true source message.
Architecture & Refactor
The library recently underwent a major refactor to a plugin-based architecture, which improved compatibility and fixed recursion bugs.
Detailed documentation can be found in the docs/architecture/ directory:
- Phase 1: Cc: Fix
- Phase 2: Plugin Architecture
- Phase 3: Full Compatibility (100%)
- Deep Forward Fix Walkthrough
- Confidence Scoring System
- Detector Usage & Priorities
✅ Test Coverage: The library has been validated against 239 fixtures from the email-forward-parser-recursive library with a 100% success rate (239/239). This includes validating message bodies and ensuring non-message snippets are correctly identified. See Test Coverage Report for details.
Features
- Hybrid Strategy: Combines MIME recursion (message/rfc822) and inline text parsing.
- Reply & Forward Support: Detects both traditional "Forwarded message" blocks and "On ... wrote:" reply headers in 15+ languages.
- Robust Parsing: Uses mailparser and email-forward-parser with custom detectors for Outlook Live, French headers, and more.
- Type-Safe: Full TypeScript support.
- Normalized Output: Consistent result format with diagnostics.
Installation
npm install email-origin-chain
extractDeepestHybrid(raw, options)
Analyzes an email to extract the deepest (original) message in the chain along with its full history.
- raw: string | Buffer | Readable - The full raw email source (passing a Buffer or Stream is recommended to preserve encoding).
- options: Options (optional) - Configuration for the extraction.
Example
const { extractDeepestHybrid } = require('email-origin-chain');
const fs = require('fs');
// CommonJS has no top-level await, so wrap the calls in an async function
async function main() {
  // Recommendation: pass the raw Buffer or Stream directly to preserve encoding
  const rawEml = fs.readFileSync('email.eml');
  const result = await extractDeepestHybrid(rawEml);
  // Streams are supported as well
  const stream = fs.createReadStream('heavy-thread.eml');
  const streamResult = await extractDeepestHybrid(stream);
}
main().catch(console.error);
CLI Utilities
You can test any email file directly using the included extraction tool:
npx tsx bin/extract.ts tests/fixtures/complex-forward.eml
import { extractDeepestHybrid } from 'email-origin-chain';
// Top-level await requires an ES module; rawEmailString and rawText are your own inputs
// Process a full EML with hybrid strategy
const result = await extractDeepestHybrid(rawEmailString);
// Process ONLY the text/inline forwards (ignore MIME layer)
const textOnlyResult = await extractDeepestHybrid(rawText, { skipMimeLayer: true });
console.log(result.text); // The deepest original message
console.log(result.history); // Full conversation chain
Options
| Option | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| skipMimeLayer | boolean | false | If true, ignores MIME parsing (rfc822) and processes the input as raw text only. Ideal for inputs that are already stripped of headers. |
| maxDepth | number | 5 | Maximum number of recursion levels for MIME parsing. |
| timeoutMs | number | 5000 | Timeout for MIME processing to prevent blocking on huge files. |
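The table's defaults can be pictured as a shallow merge of your options over fixed values. This is a minimal sketch, not the library's code: ExtractOptionsSketch and resolveOptions are hypothetical names, and only the three options from the table are modeled.

```typescript
// Hypothetical mirror of the options table; not the library's own type.
interface ExtractOptionsSketch {
  skipMimeLayer: boolean; // treat the input as raw text, ignoring message/rfc822 parts
  maxDepth: number;       // maximum MIME recursion levels
  timeoutMs: number;      // timeout for MIME processing, in milliseconds
}

// Default values as documented in the table.
const DEFAULTS: ExtractOptionsSketch = {
  skipMimeLayer: false,
  maxDepth: 5,
  timeoutMs: 5000,
};

// Fill in every option the caller did not provide with its documented default.
function resolveOptions(partial: Partial<ExtractOptionsSketch> = {}): ExtractOptionsSketch {
  return { ...DEFAULTS, ...partial };
}

const opts = resolveOptions({ skipMimeLayer: true });
console.log(opts); // { skipMimeLayer: true, maxDepth: 5, timeoutMs: 5000 }
```

A shallow spread is enough here because every option is a scalar; note that an explicitly passed undefined would still override the default in this sketch.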
Response Format
The library returns a ResultObject with the following structure:
| Field | Type | Description |
| :--- | :--- | :--- |
| from | object \| null | { name?: string, address?: string }. |
| to | array | List of primary recipients. |
| cc | array | List of CC recipients. |
| subject | string \| null | The original subject line of the deepest message. |
| date_raw | string \| null | The original date string found in the email headers. |
| date_iso | string \| null | ISO 8601 UTC representation (normalized via any-date-parser). |
| text | string \| null | Cleaned body content of the deepest message. |
| full_body | string | The full decoded text body before chain splitting. |
| attachments | array | Metadata for MIME attachments found at the deepest level. |
| history | array | Conversation Chaining: Full audit trail of the discussion (see below). |
| confidence_score | number | Reliability score (0-100) based on signal analysis. |
| confidence_description | string | Human-readable explanation of the score. |
| confidence_signals | object | Key-value breakdown of triggered bonuses and penalties. |
| confidence_reasons | array | Detailed list of triggered scoring rules. |
| diagnostics | object | Metadata about the parsing process. |
Diagnostics Detail
- method: Strategy used to find the deepest message.
  - rfc822: Found via recursive MIME attachments (highest reliability).
  - inline: Found via text pattern detection (forwarded blocks).
  - fallback: No forward found; returns the current message's info or a best-effort extraction.
- depth: Number of forward levels traversed (0 for the original email).
- parsedOk: true if at least a sender (from) and subject were successfully extracted.
- warnings: Array of non-fatal issues (e.g., date normalization failure).
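Consumers often branch on these fields. The sketch below turns a diagnostics object into a one-line summary; DiagnosticsSketch and describeDiagnostics are hypothetical helpers, not library exports, and the reliability labels are this sketch's own, following the ordering implied above (rfc822, then inline, then fallback).

```typescript
// Hypothetical shape mirroring the documented diagnostics fields.
type Method = "rfc822" | "inline" | "fallback";

interface DiagnosticsSketch {
  method: Method;
  depth: number;
  parsedOk: boolean;
  warnings: string[];
}

// Illustrative labels only; the library does not expose these strings.
const RELIABILITY: Record<Method, string> = {
  rfc822: "high (MIME attachment)",
  inline: "medium (text patterns)",
  fallback: "low (no forward found)",
};

// Build a one-line, human-readable summary of how the result was obtained.
function describeDiagnostics(d: DiagnosticsSketch): string {
  const parts = [
    `method=${d.method}`,
    `reliability=${RELIABILITY[d.method]}`,
    `depth=${d.depth}`,
    d.parsedOk ? "parsed" : "not parsed (missing from/subject)",
  ];
  if (d.warnings.length > 0) parts.push(`${d.warnings.length} warning(s)`);
  return parts.join(", ");
}
```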
Conversation Chain Reconstruction (Full History)
Rather than just finding the "original" source, the library reconstructs the entire Conversation Chain (sometimes called Email Threading or Message Chaining). This allows you to audit every step of a transfer:
- history[0]: The deepest (oldest) message in the chain. Its fields match the top-level result object.
- history[1...n-1]: Intermediate forwards/messages.
- history[n]: The root (most recent) message you actually received.
Each history entry contains its own from, to, cc, subject, date_iso, text, and flags (array of strings). The contact fields (from, to, cc) are structured as objects containing:
- name: The display name (e.g., "John Doe").
- address: The email address (e.g., "[email protected]").
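Because history runs from the deepest message (index 0) to the received email (last index), printing the thread oldest-first is a plain in-order walk. Contact, HistoryEntry, formatContact, and describeChain below are hypothetical, minimal stand-ins written for illustration, not the library's exported types.

```typescript
// Minimal hypothetical types following the fields described above.
interface Contact {
  name?: string;
  address?: string;
}

interface HistoryEntry {
  from: Contact | null;
  subject?: string | null;
  text?: string | null;
  flags: string[];
}

// Render a contact as "Name <address>", falling back to whichever part exists.
function formatContact(c: Contact | null): string {
  if (!c) return "(unknown sender)";
  if (c.name && c.address) return `${c.name} <${c.address}>`;
  return c.name || c.address || "(unknown sender)";
}

// history[0] is the original message and the last entry is the received email,
// so mapping in order lists the thread from oldest to newest.
function describeChain(history: HistoryEntry[]): string[] {
  return history.map((entry, i) => {
    const role = i === 0 ? "original" : i === history.length - 1 ? "received" : "forward";
    return `${i}. [${role}] ${formatContact(entry.from)}`;
  });
}
```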
Possible Flags:
- level:deepest: The original source of the thread.
- level:root: The entry representing the received email itself.
- trust:high_mime: Metadata from a real .eml attachment (100% reliable).
- trust:medium_inline: Metadata extracted from text patterns (best effort).
- method:crisp_engine: Detected via standard international patterns (Crisp).
- method:outlook_fr: Detected via standard rules (French, Outlook).
- method:outlook_reverse_fr: Detected via reversed rules (Envoyé before De).
- method:outlook_empty_header: Detected via permissive rules (no date/email).
- method:new_outlook: Detected via modern localized headers (handles bolding and mailto: tags).
- method:reply: Detected via international reply patterns (On ... wrote:).
- method:crisp: Detected via standard international patterns (Crisp/Fallback).
- content:silent_forward: The user forwarded the message without adding any text.
- date:unparseable: A date string was found but could not be normalized to ISO.
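Since every flag has the form prefix:value, a small helper can group them by prefix. groupFlags and isMimeTrusted are hypothetical utilities, not part of the library; they assume the first colon separates prefix from value, which holds for every flag listed above.

```typescript
// Group flags such as "trust:high_mime" by their prefix ("trust").
function groupFlags(flags: string[]): { [prefix: string]: string[] } {
  // Null-prototype object so prefixes like "constructor" cannot collide.
  const groups: { [prefix: string]: string[] } = Object.create(null);
  for (let i = 0; i < flags.length; i++) {
    const flag = flags[i];
    const idx = flag.indexOf(":");
    // Flags without a colon are kept under an empty-string prefix.
    const prefix = idx === -1 ? "" : flag.slice(0, idx);
    const value = idx === -1 ? flag : flag.slice(idx + 1);
    if (!groups[prefix]) groups[prefix] = [];
    groups[prefix].push(value);
  }
  return groups;
}

// True when an entry's metadata came from a real .eml attachment.
function isMimeTrusted(flags: string[]): boolean {
  const trust = groupFlags(flags)["trust"] || [];
  return trust.indexOf("high_mime") !== -1;
}
```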
Confidence Scoring System
To ensure high-quality extraction from text-based forwards, the library uses a Signal-Based Confidence Score. It analyzes metrics like email address density, sender count consistency, and quote levels to detect "Garbage" or incomplete chains.
Scoring Logic:
- Baseline: 100% confidence for standard formatting (~2 emails per level).
- Penalties:
  - Sender Mismatch: More senders found than levels detected (-75%).
  - Quote Mismatch: Quote nesting deeper than detected levels (-75%).
  - Partial Chain: Only 1 email detected per level (-50%).
  - Ghost Forward: No emails found in text (-100%).
- Bonuses:
  - Validated Density: High email density corroborated by context headers (+75%).
Check the Confidence Scoring Documentation for full details.
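The adjustments above can be read as points added to or subtracted from a 100-point baseline, with the result clamped to 0-100. This sketch makes that arithmetic concrete under explicit assumptions: the ChainStats inputs, the signal names, the ratio cut-offs (at most 1 email per level counts as a partial chain, more than 2 as high density), and the clamping are illustrative choices, not the library's actual rules.

```typescript
// Hypothetical inputs; the real library derives its own signals from the text.
interface ChainStats {
  levels: number;            // forward/reply levels detected
  emailCount: number;        // email addresses found in the text
  senderCount: number;       // distinct senders found
  quoteDepth: number;        // deepest ">" nesting
  densityConfirmed: boolean; // high density corroborated by context headers
}

// Apply the documented adjustments as points on a 0-100 scale.
function scoreChain(s: ChainStats): { score: number; signals: { [name: string]: number } } {
  const signals: { [name: string]: number } = {};
  if (s.emailCount === 0) {
    signals.ghostForward = -100; // no emails at all
  } else {
    const ratio = s.emailCount / Math.max(s.levels, 1);
    if (s.senderCount > s.levels) signals.senderMismatch = -75;
    if (s.quoteDepth > s.levels) signals.quoteMismatch = -75;
    if (ratio <= 1) signals.partialChain = -50; // assumed cut-off
    if (ratio > 2 && s.densityConfirmed) signals.validatedDensity = 75; // assumed cut-off
  }
  let total = 100;
  for (const key in signals) total += signals[key];
  // Clamp to the documented 0-100 range.
  return { score: Math.min(100, Math.max(0, total)), signals };
}
```

With these assumptions the validated-density bonus can only offset penalties, because the score is capped at 100.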
Typical Output Example
{
"from": { "name": "Original Sender Name", "address": "[email protected]" },
"subject": "Initial Topic",
"text": "The very first message content.",
"full_body": "Check this thread below!\n\n---------- Forwarded message ---------\nFrom: Intermediate Person <[email protected]>...",
"history": [
{
"depth": 2,
"from": { "name": "Original Sender Name", "address": "[email protected]" },
"text": "The very first message content.",
"flags": ["method:outlook_fr", "trust:medium_inline", "level:deepest"]
},
{
"depth": 1,
"from": { "name": "Intermediate Person", "address": "[email protected]" },
"text": "",
"flags": ["method:crisp", "trust:medium_inline", "content:silent_forward"]
},
{
"depth": 0,
"from": { "name": "Me", "address": "[email protected]" },
"text": "Check this thread below!",
"flags": ["trust:high_mime", "level:root"]
}
],
"diagnostics": {
"method": "inline",
"depth": 2,
"parsedOk": true,
"warnings": []
},
"confidence_score": 100,
"confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
"confidence_signals": {},
"confidence_reasons": [
"Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
]
}
Examples
1. Simple Email (No Forward)
When no forward is detected, the library returns the metadata of the email itself.
const email = `From: [email protected]
Subject: Meeting Update
Date: Mon, 26 Jan 2026 15:00:00 +0100
Hey, the meeting is moved to 4 PM.`;
const result = await extractDeepestHybrid(email);
console.log(result.diagnostics.depth); // 0
console.log(result.from.address); // "[email protected]"
2. Double Inline Forward (Deep Extraction)
The library recursively follows "Forwarded message" blocks to find the original sender.
const doubleForward = `
---------- Forwarded message ---------
From: Flo R. <[email protected]>
Date: Mon, 26 Jan 2026 at 15:01
Subject: Fwd: original topic
---------- Forwarded message ---------
From: Original Sender <[email protected]>
Date: Mon, 26 Jan 2026 at 10:00
Subject: original topic
This is the very first message content.`;
const result = await extractDeepestHybrid(doubleForward);
console.log(result.diagnostics.depth); // 2
console.log(result.from.address); // "[email protected]"
console.log(result.text); // "This is the very first message content."
3. Extreme Conversation Chain (5 Levels)
This example covers complex corporate threads where a message is forwarded multiple times across different regional offices (e.g., mixing English and French headers).
const extremeChain = `From: [email protected]
Date: Tue, 27 Jan 2026 02:35:18 +0100
Subject: FW: Final Review
Check the bottom of this long thread.
---------- Forwarded message ---------
From: "Intermediate Manager" <[email protected]>
Date: mardi 27 janvier 2026 à 00:30
Subject: Tr: Final Review
But it is quite normal!
De : "Employee" <[email protected]>
Envoyé : mardi 27 janvier 2026 à 00:30
À : "Recip" <[email protected]>
Objet : Fwd: Final Review
Great Yodjii, thank you
---------- Forwarded message ---------
From: <[email protected]>
Date: Tue, 27 Jan 2026 at 00:29
Subject: Fwd: original request
Ok noted, I am forwarding it back to you.
---------- Forwarded message ---------
From: <[email protected]>
Date: mardi 27 janvier 2026 à 00:28
Subject: original request
Hello, please forward this back to me.`;
const result = await extractDeepestHybrid(extremeChain);
console.log(result.diagnostics.depth); // 4 (5 messages total)
JSON Output Example (Extreme Case):
{
"from": { "address": "[email protected]" },
"subject": "original request",
"text": "Hello, please forward this back to me.",
"full_body": "Check the bottom of this long thread.\n\n---------- Forwarded message ---------\nFrom: \"Intermediate Manager\" <[email protected]>...",
"history": [
{
"depth": 4,
"from": { "address": "[email protected]" },
"text": "Hello, please forward this back to me.",
"flags": ["method:crisp", "trust:medium_inline", "level:deepest"]
},
{
"depth": 3,
"from": { "address": "[email protected]" },
"text": "Ok noted, I am forwarding it back to you.",
"flags": ["method:crisp", "trust:medium_inline"]
},
{
"depth": 2,
"from": { "name": "Employee", "address": "[email protected]" },
"text": "Great Yodjii, thank you",
"flags": ["method:outlook_empty_header", "trust:medium_inline"]
},
{
"depth": 1,
"from": { "name": "Intermediate Manager", "address": "[email protected]" },
"text": "But it is quite normal!",
"flags": ["method:crisp", "trust:medium_inline"]
},
{
"depth": 0,
"from": { "address": "[email protected]" },
"text": "Check the bottom of this long thread.",
"flags": ["trust:high_mime", "level:root"]
}
],
"diagnostics": {
"method": "inline",
"depth": 4,
"parsedOk": true,
"warnings": []
},
"confidence_score": 100,
"confidence_description": "High Confidence: Standard Density: Ratio 2.00 is optimal (~2 emails per level)",
"confidence_signals": {},
"confidence_reasons": [
"Standard Density: Ratio 2.00 is optimal (~2 emails per level)"
]
}
4. International Support (e.g., French)
The library automatically handles international headers like "De:", "Objet:", "Message transféré".
const frenchEmail = `
---------- Message transféré ---------
De : Expert Auto <[email protected]>
Date : lun. 10 févr. 2025 à 11:39
Objet : Dossier #12345
Hello, here is your expertise report.`;
const result = await extractDeepestHybrid(frenchEmail);
console.log(result.from.name); // "Expert Auto"
console.log(result.date_iso); // "2025-02-10T10:39:00.000Z"
Extensions & Plugins (Custom Detectors)
The library allows you to inject custom forward detectors to handle specific corporate headers, regional formats, or proprietary email formats that the default detectors do not cover.
This system is built on Dependency Injection: your custom logic lives in your application code rather than inside node_modules.
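One way to picture the injection mechanism: detectors are sorted by priority (lower numbers first, as in the interface example that follows) and the first one reporting found: true wins. That first-match rule and the DetectorSketch, runDetectors, and markerDetector names are assumptions made for illustration; the real ForwardDetector interface carries more fields than this simplified stand-in.

```typescript
// Simplified, hypothetical stand-ins for ForwardDetector / DetectionResult.
interface DetectionSketch {
  found: boolean;
  detector?: string;
}

interface DetectorSketch {
  name: string;
  priority: number; // lower number = tried earlier
  detect(text: string): DetectionSketch;
}

// Try detectors from lowest to highest priority; assume the first hit wins.
function runDetectors(detectors: DetectorSketch[], text: string): DetectionSketch {
  const ordered = detectors.slice().sort((a, b) => a.priority - b.priority);
  for (let i = 0; i < ordered.length; i++) {
    const result = ordered[i].detect(text);
    if (result.found) return result;
  }
  return { found: false };
}

// Hypothetical detector factory that matches a fixed marker string.
function markerDetector(name: string, priority: number, marker: string): DetectorSketch {
  return {
    name,
    priority,
    detect: (text) =>
      text.indexOf(marker) !== -1 ? { found: true, detector: name } : { found: false },
  };
}
```

Under this rule, a custom detector at priority -100 shadows the built-in engines whenever both match the same text.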
How to create a Plugin
Implement the ForwardDetector interface:
import { extractDeepestHybrid, ForwardDetector, DetectionResult } from 'email-origin-chain';
class MyCustomDetector implements ForwardDetector {
// Unique name for your detector (will appear in 'diagnostics.method')
name = 'my-custom-detector';
// Priority: Lower number = Higher priority.
// -100 = Override Everything (Expert Plugins)
// -40 to -20 = Specific Built-in Detectors (Outlook, FR, etc.)
// 100 = Crisp (Default International Engine)
// 150 = Reply (Fallback)
priority = -100;
detect(text: string): DetectionResult {
// Example: Detects '--- START FORWARD ---'
const marker = '--- START FORWARD ---';
const idx = text.indexOf(marker);
if (idx !== -1) {
// Extracted body (text AFTER the marker)
const body = text.substring(idx + marker.length).trim();
// Text BEFORE the marker (the message from the forwarder)
const message = text.substring(0, idx).trim();
return {
found: true,
detector: this.name,
confidence: 'high',
message: message, // Important for history reconstruction
email: {
from: { name: 'Detected Sender', address: '[email protected]' },
subject: 'Extracted Subject',
date: new Date().toISOString(),
body: body
}
};
}
return { found: false, confidence: 'low' };
}
}
How to use it
Pass your detector instance in the options.customDetectors array:
const result = await extractDeepestHybrid(emailContent, {
customDetectors: [ new MyCustomDetector() ]
});
console.log(result.diagnostics.method); // "method:my-custom-detector"
Malformed Inputs
If you pass a string that isn't an email (e.g., a simple welcome message), the library returns the text but sets parsedOk to false.
const result = await extractDeepestHybrid("Welcome to our platform!");
console.log(result.from); // null
console.log(result.full_body); // "Welcome to our platform!"
console.log(result.diagnostics.parsedOk); // false
console.log(result.text); // "Welcome to our platform!"
Missing or Unparseable Dates
If a date cannot be normalized to ISO format, date_iso will be null and a warning will be added. You can still access the original string via date_raw.
const result = await extractDeepestHybrid(emailWithBadDate);
if (!result.date_iso) {
console.warn(result.diagnostics.warnings[0]); // "Could not normalize date: ..."
console.log("Raw date was:", result.date_raw);
}
Non-String Input
The library requires a string, Buffer, or Readable stream as input and throws an Error for anything else (such as null).
try {
  await extractDeepestHybrid(null as any);
} catch (e) {
  // In TypeScript the caught value is untyped, so narrow it before reading .message
  console.error((e as Error).message); // "Input must be a string"
}
The Expert Cleaner Utility
All built-in detectors use the Cleaner utility to ensure consistent text normalization across recursion levels.
Key Features:
- Normalization: Unifies line breaks (\r\n -> \n), removes the BOM, and handles non-breaking spaces.
- Memoization: Cache layer to prevent re-processing the same text multiple times.
- Quote Stripping: Removes > prefixes while preserving body structure.
- Boundary Detection: Uses the "Double Newline" rule found in professional parsers.
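The normalization, quote-stripping, memoization, and boundary rules listed above can be approximated in a few lines. These are hypothetical standalone helpers written for illustration, not the Cleaner API shown next; converting non-breaking spaces to plain spaces, removing exactly one quote level per call, and returning an empty body when no blank line exists are assumptions of this sketch.

```typescript
const BOM = "\uFEFF";
const NBSP = "\u00A0";

// Unify line endings (CRLF and lone CR to LF), drop a leading BOM,
// and replace non-breaking spaces with regular spaces.
function normalizeSketch(text: string): string {
  let out = text.replace(/\r\n?/g, "\n");
  if (out.charAt(0) === BOM) out = out.slice(1);
  return out.split(NBSP).join(" ");
}

// Remove one level of ">" quoting (plus one optional space) from each line.
function stripQuotesSketch(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/^> ?/, ""))
    .join("\n");
}

// "Double newline" rule: the body starts after the first blank line.
function bodyAfterHeaders(text: string): string {
  const idx = text.indexOf("\n\n");
  return idx === -1 ? "" : text.slice(idx + 2);
}

// Memoization: reuse the normalized result for text that was already seen.
const normalizeCache: { [key: string]: string } = Object.create(null);
function normalizeCached(text: string): string {
  if (!(text in normalizeCache)) normalizeCache[text] = normalizeSketch(text);
  return normalizeCache[text];
}
```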
import { Cleaner } from 'email-origin-chain/utils/cleaner';
// rawText, lines and lastHeaderIndex are placeholders for your own values
const normalized = Cleaner.normalize(rawText);
const bodyOnly = Cleaner.extractBody(lines, lastHeaderIndex);
const quoteFree = Cleaner.stripQuotes(bodyOnly);
Strategy
- MIME Layer: Recursively descends through message/rfc822 attachments using mailparser.
- Inline Layer: Iteratively scans the body for forwarded blocks using email-forward-parser patterns (supports multiple languages).
- Date Normalization: Uses any-date-parser and luxon for resilient international date parsing.
- Fallback: Manual regex extraction if no structured headers are found.
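In the French example, "lun. 10 févr. 2025 à 11:39" becomes "2025-02-10T10:39:00.000Z". The header carries no time zone, so a UTC offset has to be assumed; the documented output is consistent with UTC+1 (Paris winter time). The library itself relies on any-date-parser and luxon; the parser below uses neither and is a hypothetical, hand-rolled sketch that only handles the French "day month year à HH:MM" shape.

```typescript
// Hypothetical month table: full French names plus common abbreviations.
const FR_MONTHS: { [name: string]: number } = {
  janvier: 0, janv: 0, "février": 1, "févr": 1, mars: 2, avril: 3, avr: 3,
  mai: 4, juin: 5, juillet: 6, juil: 6, "août": 7, septembre: 8, sept: 8,
  octobre: 9, oct: 9, novembre: 10, nov: 10, "décembre": 11, "déc": 11,
};

// Parse "lun. 10 févr. 2025 à 11:39" or "mardi 27 janvier 2026 à 00:30".
// The caller supplies the assumed UTC offset in minutes (60 for UTC+1).
function parseFrenchDate(raw: string, utcOffsetMinutes: number): string | null {
  const m = /(\d{1,2})\s+([^\s\d]+)\s+(\d{4})\s+à\s+(\d{1,2}):(\d{2})/.exec(raw);
  if (!m) return null;
  const key = m[2].replace(/\.$/, "").toLowerCase(); // "févr." -> "févr"
  if (!Object.prototype.hasOwnProperty.call(FR_MONTHS, key)) return null;
  // Treat the wall-clock time as UTC, then shift back by the assumed offset.
  const wallClockMs = Date.UTC(+m[3], FR_MONTHS[key], +m[1], +m[4], +m[5]);
  return new Date(wallClockMs - utcOffsetMinutes * 60000).toISOString();
}
```

Returning null for anything unrecognized mirrors the documented behavior of leaving date_iso empty when normalization fails.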
License
MIT - See LICENSE for details.
