vaultmail
v2.0.0
Published
Email archive extraction library and CLI tool. Extracts emails, attachments, and metadata from PST, OST, MBOX, and OLM files with support for large archives and multiple output formats.
Maintainers
Readme
VaultMail
A Node.js library and CLI tool for extracting emails and attachments from multiple email archive formats including PST, OST, MBOX, and OLM files. Supports large files (40GB+) and exports emails in standardized EML format.
Features
- Multi-format Support: PST, OST, MBOX, and OLM email archives
- Library & CLI: Use as a Node.js library or command-line tool
- Large File Handling: Processes files over 40GB with hybrid approach
- EML Export: Standardized email format output with original formatting preserved
- Attachment Extraction: Saves all attachments with organized naming (PST/OST)
- Comprehensive Data Types: Emails, contacts, appointments, tasks, and notes (OLM)
- Recursive Processing: Maintains original folder structure from archives
- Robust Error Handling: Continues processing even with corrupted emails
- NPM Package: Easy installation and integration into your projects
- Extensive Testing: Comprehensive test suite for reliability
Installation
NPM Installation (Recommended)
# Global installation for CLI usage
npm install -g vaultmail
# Local installation for library usage
npm install vaultmailDevelopment Installation
git clone https://github.com/Mikej81/vaultmail.git
cd vaultmail
npm installOptional: Install External Tools for Large Files (>2GB)
The tool works without external tools, but for PST/OST files larger than 2GB, installing pst-utils provides enhanced processing and better performance:
Linux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install pst-utilsmacOS:
brew install libpstWindows: Download libpst from https://www.five-ten-sg.com/libpst/
Usage
CLI Usage
# If installed globally
vaultmail -i <input-file> -o <output-directory> -a <attachments-directory>
# If installed locally
npx vaultmail -i <input-file> -o <output-directory> -a <attachments-directory>Library Usage
const { EmailExtractor } = require('vaultmail');
const extractor = new EmailExtractor({
verbose: true,
format: 'eml'
});
// Extract PST/OST files
await extractor.extract('./archive.pst', './output', './attachments');
// Extract MBOX files
await extractor.extract('./mailbox.mbox', './output');
// Extract OLM files (Outlook for Mac)
await extractor.extract('./archive.olm', './output');Examples
Extract PST file:
vaultmail -i archive.pst -o ./emails -a ./attachmentsExtract large OST file with verbose output:
vaultmail -i large_archive.ost -o ./emails -a ./attachments --verboseExtract MBOX file in text format:
vaultmail -i mailbox.mbox -f txt --verboseExtract OLM file (Outlook for Mac):
vaultmail -i archive.olm -o ./emailsProcess with depth limit:
vaultmail -i archive.pst -o ./emails -a ./attachments --max-depth 3Include empty folders (disabled by default):
vaultmail -i archive.ost -o ./emails -a ./attachments --skip-empty falseQuiet mode for minimal output:
vaultmail -i archive.ost -o ./emails -a ./attachments --quietCommand Line Options
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| --input | -i | Input email archive file (PST, OST, MBOX, OLM) | Required |
| --output | -o | Output directory for extracted emails | ./output |
| --attachments | -a | Directory for extracted attachments | ./attachments |
| --recursive | -r | Process folders recursively | true |
| --format | -f | Output format: eml or txt | eml |
| --verbose | -v | Enable verbose logging | false |
| --max-depth | -d | Maximum folder depth (-1 for unlimited) | -1 |
| --skip-empty | -s | Skip creating folders that contain no emails | true |
| --quiet | -q | Suppress progress updates (less output) | false |
| --help | -h | Show help information | |
File Format Support
PST Files
- Microsoft Outlook Personal Storage Table files
- Maintains folder hierarchy and metadata
- Extracts embedded attachments
- Supports files up to 40GB+ with external tools
OST Files
- Microsoft Outlook Offline Storage Table files
- Same capabilities as PST files
- Handles Exchange server synchronized data
MBOX Files
- Unix mailbox format
- Processes files of any size efficiently with line-by-line streaming
- Extracts individual emails while preserving original MIME structure
- Attachments and images remain embedded in EML files for perfect mail client compatibility
- Supports Gmail exports and other MBOX sources
- Handles problematic headers gracefully
OLM Files
- Microsoft Outlook for Mac archive format
- Comprehensive extraction of emails, contacts, appointments, tasks, and notes
- Automatic folder organization of extracted data
- Support for multi-disk OLM archives
- Conversion to standard formats (EML, VCF, ICS, TXT)
Output Structure
The tool creates organized output with preserved folder structures and specialized formats for different content types:
PST/OST Output Structure
emails/
├── archive_name/
│ ├── Inbox/
│ │ ├── 1-email-subject.eml
│ │ ├── 2-another-email.eml
│ │ ├── contacts/
│ │ │ ├── contact1.vcf
│ │ │ └── contact2.vcf
│ │ └── calendar/
│ │ ├── meeting1.ics
│ │ └── appointment1.ics
│ ├── Sent Items/
│ └── Deleted Items/
└── ...
attachments/
├── 1-document.pdf
├── 2-image.jpg
└── ...MBOX Output Structure
emails/
├── 0001-email.eml
├── 0002-email.eml
├── 0003-email.eml
└── ...Note: MBOX emails preserve all attachments and images embedded within each EML file
OLM Output Structure
emails/
├── emails/
│ ├── email-1.eml
│ ├── email-2.eml
│ └── ...
├── contacts/
│ ├── contact-1.vcf
│ ├── contact-2.vcf
│ └── ...
├── appointments/
│ ├── meeting-1.ics
│ ├── appointment-1.ics
│ └── ...
├── tasks/
│ ├── task-1.txt
│ └── ...
└── notes/
├── note-1.txt
└── ...File Formats:
- Emails:
.emlformat (RFC822 standard) - Contacts:
.vcfformat (VCard 3.0 standard) - PST/OST/OLM - Calendar/Tasks:
.icsformat (iCalendar standard) - PST/OST/OLM - Tasks:
.txtformat - OLM - Notes:
.txtformat - OLM - Attachments:
- PST/OST: Extracted as separate files in attachments directory
- MBOX: Preserved embedded within each EML file
- OLM: Integrated within email content
Large File Handling
The tool automatically detects file sizes and uses appropriate processing methods:
- Files ≤ 2GB: Uses built-in JavaScript PST extractor for fast processing
- Files > 2GB: Automatically switches to external
readpsttool if available - Fallback: Uses built-in extractor if external tools are missing (works but may be slower for very large files)
Enhanced readpst Integration:
When using the external readpst tool for large files, the tool is configured with optimized settings:
- EML format:
-eflag ensures proper email format with extensions - VCard contacts:
-cvflag exports contacts in VCard format - All content types:
-t eajcextracts emails, attachments, journals, and contacts - UTF-8 encoding:
-8flag ensures proper character encoding - Includes deleted items:
-Dflag for comprehensive extraction
Progress Monitoring
The tool provides real-time feedback during extraction:
- Live progress updates: Shows folder count, emails extracted, and elapsed time every 5 seconds
- Folder-by-folder status: Displays current folder being processed
- Heartbeat monitoring: Shows "Still working..." message if no updates for 15+ seconds
- Final summary: Complete statistics when extraction finishes
- Quiet mode: Use
--quietto suppress progress updates for minimal output
Example progress output:
Processing: Inbox
Progress: 25 folders, 150 emails, 45 attachments | 2m 30s elapsed
Processing: Sent Items
Still working... 5m 15s elapsedError Handling
The tool is designed to be robust:
- Continues processing if individual emails are corrupted
- Saves problematic emails in raw format when parsing fails
- Handles Node.js memory limits for very large messages
- Provides detailed error messages and recovery suggestions
Troubleshooting
Large File Errors
If you get errors with files over 2GB:
- Install pst-utils if not available (see installation instructions above)
- For very large files, ensure sufficient disk space for temporary files
Memory Issues
For extremely large archives:
- Use the
--max-depthoption to limit processing depth - Process in smaller chunks if needed
- Monitor system memory usage during processing
MBOX Processing
- Individual email extraction: Each email is saved as a separate numbered EML file (0001-email.eml, 0002-email.eml, etc.)
- MIME structure preservation: Original email structure is maintained for perfect mail client compatibility
- Embedded attachments: All attachments and images remain within each EML file (no separate extraction)
- Large file support: Uses line-by-line streaming to handle MBOX files of any size
- Memory efficient: No file size limits due to streaming approach
- Malformed header handling: Automatically handles problematic headers gracefully
- Check verbose output for detailed processing information
OLM Processing
- Comprehensive extraction: Extracts emails, contacts, appointments, tasks, and notes
- Automatic organization: Content is automatically organized into appropriate subdirectories
- Standard formats: Outputs in widely supported formats (EML, VCF, ICS, TXT)
- Multi-disk support: Handles OLM archives that span multiple disks
- Error handling: Continues processing even if individual items are corrupted
API Reference
EmailExtractor Class
const { EmailExtractor } = require('vaultmail');
const extractor = new EmailExtractor(options);Options
verbose(boolean): Enable verbose logging (default: false)format(string): Output format - 'eml' or 'txt' (default: 'eml')maxDepth(number): Maximum folder depth to process (default: -1, unlimited)skipEmpty(boolean): Skip empty folders (default: true)
Methods
extract(filePath, outputDir, attachmentDir, options)
Extract emails from any supported format.
filePath(string): Path to the email archive fileoutputDir(string): Directory for extracted emailsattachmentDir(string): Directory for attachments (required for PST/OST)options(object): Override default options for this extraction
Returns: Promise resolving to extraction statistics
detectFileType(filePath)
Detect the file type of an email archive.
filePath(string): Path to the file
Returns: String ('pst', 'ost', 'mbox', or 'olm')
Individual Extractors
const { PSTExtractor, MboxExtractor, OLMExtractor } = require('vaultmail');
// Use specific extractors directly
const pstExtractor = new PSTExtractor(options);
const mboxExtractor = new MboxExtractor(options);
const olmExtractor = new OLMExtractor(options);Testing
# Run tests
npm test
# Run tests with coverage
npm run test:coverage
# Run tests in watch mode
npm run test:watchContributing
I welcome contributions to VaultMail! Whether you're fixing bugs, adding features, or improving documentation, your help makes the project better for everyone.
Development Setup
Fork and Clone
git clone https://github.com/your-username/vaultmail.git cd vaultmail npm installRun Tests
npm test # Run all tests npm run test:watch # Run tests in watch mode npm run test:coverage # Generate coverage reportCode Quality
npm run lint # Check code style npm run lint:fix # Auto-fix style issues npm run validate # Run both linting and tests
Contribution Guidelines
- Code Style: Follow the existing ESLint configuration
- Testing: Add tests for new features and bug fixes
- Documentation: Update README and JSDoc comments for public APIs
- Commits: Use clear, descriptive commit messages
- Pull Requests: Include a description of changes and link any related issues
Areas for Contribution
- Performance: Optimize extraction for very large files
- Formats: Add support for additional email archive formats
- Features: Enhance filtering, search, and export capabilities
- Documentation: Improve examples and troubleshooting guides
- Testing: Increase test coverage and add integration tests
Reporting Issues
When reporting bugs, please include:
- Operating system and Node.js version
- VaultMail version
- Input file format and approximate size
- Complete error message and stack trace
- Steps to reproduce the issue
License
Apache-2.0 License - see LICENSE file for details
Support
For issues and feature requests, please use the GitHub Issues page.
