mscompress
v1.0.11
Multi-threaded lossless/lossy compression library for Mass Spectrometry data with Node.js/TypeScript bindings.
Features
- 🚀 Fast: Multi-threaded compression/decompression using ZSTD
- 🎯 Random Access: Read individual spectra without full decompression
- 🔧 Flexible: Supports both lossless and lossy compression
- 📊 Format Conversion: mzML ↔ MSZ with filtering options
- 🗂️ MSZX Archives: Bundle MSZ files with annotations and metadata
- 🔒 Type-Safe: Full TypeScript support with type definitions
Installation
npm install mscompress
Pre-built binaries are available for:
- macOS (x64, ARM64)
- Linux (x64, ARM64)
- Windows (x64)
Quick Start
import { read, MZMLFile, MSZFile } from 'mscompress';
// Auto-detect and open file; the cast is needed because read() returns MZMLFile | MSZFile
const file = read('sample.mzML') as MZMLFile;
// Compress mzML to MSZ
const msz: MSZFile = file.compress('output.msz');
// Access spectra
for (const spectrum of file.spectra) {
  console.log(`Scan ${spectrum.scan}: ${spectrum.size} peaks`);
  console.log(`m/z range: ${spectrum.mz[0]} - ${spectrum.mz[spectrum.mz.length - 1]}`);
  console.log(`Retention time: ${spectrum.retentionTime}s`);
}
// Decompress MSZ back to mzML
const mzml = msz.decompress('output.mzML');
// Extract filtered data (each call returns a handle to the new file, closed here)
msz.extract('ms2-only.mzML', { msLevel: 2 }).close();
msz.extract('first-10.mzML', { indices: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] }).close();
msz.extract('scans.mzML', { scanNumbers: [100, 200, 300] }).close();
// Always close files
mzml.close();
msz.close();
file.close();
API Reference
read(path: string): MZMLFile | MSZFile
Auto-detects file type and returns the appropriate file handle.
const file = read('sample.mzML'); // Returns MZMLFile
const msz = read('sample.msz'); // Returns MSZFile
MZMLFile
Handles uncompressed mzML files.
Methods
- compress(outputPath: string): MSZFile - Compress to MSZ format
- extract(output: string, options?: ExtractOptions): MZMLFile | MSZFile - Extract with filters
- getMzBinary(index: number): Float32Array | Float64Array - Get m/z array for spectrum
- getIntenBinary(index: number): Float32Array | Float64Array - Get intensity array for spectrum
- getXml(index: number): string - Get spectrum XML
- close(): void - Close file and free resources
- describe(): object - Get file metadata
Properties
- path: string - File path
- filesize: number - File size in bytes
- format: DataFormat - Data format information
- positions: Division - Spectrum positions
- spectra: Spectra - Iterable spectrum collection
- arguments: RuntimeArguments - Compression settings
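Since getMzBinary and getIntenBinary return plain typed arrays, simple per-spectrum statistics need no XML parsing. The following is a minimal sketch that assumes only the two signatures listed above; BinarySource and summarizeSpectrum are hypothetical names, and the structural type lets the same function accept either an MZMLFile or an MSZFile handle.

```typescript
type NumericArray = Float32Array | Float64Array;

// Hypothetical structural type covering the two binary accessors
// that both MZMLFile and MSZFile list.
interface BinarySource {
  getMzBinary(index: number): NumericArray;
  getIntenBinary(index: number): NumericArray;
}

interface SpectrumSummary {
  peakCount: number;
  mzMin: number; // Infinity for an empty spectrum
  mzMax: number; // -Infinity for an empty spectrum
  totalIonCurrent: number; // sum of all intensities
}

function summarizeSpectrum(source: BinarySource, index: number): SpectrumSummary {
  const mz = source.getMzBinary(index);
  const intensity = source.getIntenBinary(index);
  if (mz.length !== intensity.length) {
    throw new Error(`spectrum ${index}: m/z and intensity arrays differ in length`);
  }
  let mzMin = Infinity;
  let mzMax = -Infinity;
  let totalIonCurrent = 0;
  // Track min/max explicitly instead of assuming the m/z array is sorted.
  for (let i = 0; i < mz.length; i++) {
    mzMin = Math.min(mzMin, mz[i]);
    mzMax = Math.max(mzMax, mz[i]);
    totalIonCurrent += intensity[i];
  }
  return { peakCount: mz.length, mzMin, mzMax, totalIonCurrent };
}
```

For example, summarizeSpectrum(file, 0) on a handle returned by read() should report the peak count, m/z range, and total ion current of the first spectrum. Callers may want to special-case the infinite bounds reported for an empty spectrum.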
MSZFile
Handles compressed MSZ files.
Methods
- decompress(outputPath: string): MZMLFile - Decompress to mzML format
- extract(output: string, options?: ExtractOptions): MZMLFile | MSZFile - Extract with filters
- getMzBinary(index: number): Float32Array | Float64Array - Get m/z array for spectrum
- getIntenBinary(index: number): Float32Array | Float64Array - Get intensity array for spectrum
- getXml(index: number): string - Get spectrum XML
- close(): void - Close file and free resources
- describe(): object - Get file metadata
Properties
Same as MZMLFile.
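Because read() returns MZMLFile | MSZFile, the examples in this README cast with as when the file type is known. When it is not known in advance, the union can be narrowed on the methods that differ. This is a sketch that assumes the method lists above are complete (only MZMLFile has compress(), only MSZFile has decompress()); convertFile and the three interfaces are hypothetical.

```typescript
// Anything that can be closed; both MZMLFile and MSZFile list close().
interface Closable {
  close(): void;
}

// Hypothetical minimal shapes mirroring the method lists above.
interface CompressibleFile {
  compress(outputPath: string): Closable;
}
interface DecompressibleFile {
  decompress(outputPath: string): Closable;
}

// Convert whichever kind of handle read() returned into the other format.
// The `in` check narrows the union, so no `as` cast is needed.
function convertFile(file: CompressibleFile | DecompressibleFile, outputPath: string): Closable {
  if ('compress' in file) {
    return file.compress(outputPath); // mzML -> MSZ
  }
  return file.decompress(outputPath); // MSZ -> mzML
}
```

With the real package, const input = read(path); convertFile(input, out).close(); input.close(); should type-check without a cast, since both classes match these shapes structurally.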
MSZXFile
Handles bundled MSZX archives containing MSZ files with annotations and metadata.
Static Methods
- open(filePath: string): MSZXFile - Open an MSZX archive
Methods
- close(): void - Close archive and clean up temporary files
- getAnnotationFile(filename: string): Buffer - Get raw annotation file content
- getAnnotationFilesByFormat(format: string): string[] - Get annotation files by format
- extractMSZX(output: string, options?: ExtractOptions): Promise<MSZXFile> - Extract subset to new MSZX
All MSZ file methods are also available (getMzBinary, getIntenBinary, getXml, decompress, extract).
Properties
- archive_path: string - Path to the MSZX archive
- manifest: MSZXManifest - Archive manifest with metadata
- annotation_files: AnnotationEntry[] - List of annotation files
- All MSZ file properties (path, filesize, format, positions, spectra, arguments)
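getAnnotationFile returns a Node.js Buffer, which is a Uint8Array subclass, so text-based annotations can be decoded directly. The sketch below (hypothetical parseTsv) assumes a UTF-8, tab-separated file with a header row and no quoted fields. Fields beyond the header's width are dropped, which may matter for formats such as Percolator .pin, where the trailing protein column can spill across several tabs.

```typescript
// Parse a tab-separated annotation file into row objects keyed by the header.
// Assumptions: UTF-8 text, first non-empty line is the header, no quoted fields.
function parseTsv(bytes: Uint8Array): Record<string, string>[] {
  const text = new TextDecoder('utf-8').decode(bytes);
  // Accept both \n and \r\n line endings; skip blank lines.
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const header = lines[0].split('\t');
  return lines.slice(1).map((line) => {
    const fields = line.split('\t');
    const row: Record<string, string> = {};
    header.forEach((name, i) => {
      row[name] = fields[i] ?? ''; // missing trailing fields become ''
    });
    return row;
  });
}
```

For example: parseTsv(mszx.getAnnotationFile('results.pin')).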
MSZXBuilder
Builder for creating MSZX archives.
import { read, MSZXBuilder, MSZFile } from 'mscompress';
const msz = read('sample.msz') as MSZFile;
const builder = new MSZXBuilder(msz);
builder
  .setDescription('Proteomics dataset')
  .addAnnotations('results.pin', { format: 'percolator_tsv' })
  .setExtra('experiment_id', 'EXP001');
await builder.save('sample.mszx');
msz.close();
Methods
- addAnnotations(filePath: string, options?): this - Add annotation file
- setDescription(description: string): this - Set archive description
- setJoinKey(joinKey: string): this - Set join key (default: 'scan_number')
- setExtra(key: string, value: any): this - Add custom metadata
- save(outputPath: string): Promise<string> - Save archive
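Because every setter returns this, archive settings can also be driven from a plain configuration object, for instance one loaded from JSON. This sketch assumes only the chainable methods listed above. AnnotationSpec, ArchiveConfig, ChainableBuilder, and applyConfig are hypothetical, and the annotation options are limited to the format and description keys used in this README's examples.

```typescript
// Hypothetical config shapes; field names are illustrative, not part of mscompress.
interface AnnotationSpec {
  path: string;
  format?: string;
  description?: string;
}

interface ArchiveConfig {
  description?: string;
  joinKey?: string;
  annotations?: AnnotationSpec[];
  extra?: Record<string, unknown>;
}

// Minimal structural view of the chainable MSZXBuilder methods.
interface ChainableBuilder {
  setDescription(description: string): this;
  setJoinKey(joinKey: string): this;
  addAnnotations(filePath: string, options?: { format?: string; description?: string }): this;
  setExtra(key: string, value: unknown): this;
}

// Apply a config to a builder; returns the same builder so save() stays reachable.
function applyConfig<B extends ChainableBuilder>(builder: B, config: ArchiveConfig): B {
  if (config.description !== undefined) builder.setDescription(config.description);
  if (config.joinKey !== undefined) builder.setJoinKey(config.joinKey);
  for (const { path, ...options } of config.annotations ?? []) {
    builder.addAnnotations(path, options);
  }
  const extra = config.extra ?? {};
  for (const key of Object.keys(extra)) {
    builder.setExtra(key, extra[key]);
  }
  return builder;
}
```

With the real builder this would read await applyConfig(new MSZXBuilder(msz), config).save('sample.mszx'). The generic parameter keeps the concrete builder type, so save() remains available on the result.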
MSZXManifest
Manifest describing MSZX archive contents.
Properties
- version: string - Manifest version
- created_at: string - Creation timestamp (ISO 8601)
- spectra_file: string - Name of the MSZ file
- num_spectra: number - Number of spectra
- annotations: AnnotationEntry[] - List of annotation files
- join_key: string - Key for joining spectra with annotations
- description?: string - Archive description
- source_file?: string - Original source file name
- extra: Record<string, any> - Custom metadata
Methods
- toString(indent?: number): string - Serialize to JSON string
- toJSON(): object - Convert to plain object
- static parse(json: string): MSZXManifest - Parse from JSON string
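The join_key field (default 'scan_number', per setJoinKey) records how annotation rows relate to spectra. This is a sketch of performing that join in memory. It assumes the annotation rows are already parsed and that the caller knows which column holds the scan number (for example ScanNr in a Percolator .pin file). joinOnScan is hypothetical, and the collection parameter uses only the documented length and get() members of Spectra.

```typescript
// Minimal view of the Spectra collection: the documented length and get(index).
interface IndexedCollection<S> {
  length: number;
  get(index: number): S;
}

interface JoinedSpectrum<S, R> {
  spectrum: S;
  annotations: R[];
}

// Attach annotation rows to spectra by scan number. scanOf extracts the scan
// number from a row; which column holds it depends on the annotation format.
function joinOnScan<S extends { scan: number }, R>(
  spectra: IndexedCollection<S>,
  rows: R[],
  scanOf: (row: R) => number,
): JoinedSpectrum<S, R>[] {
  // Index rows by scan number once, so the join is linear rather than quadratic.
  const byScan = new Map<number, R[]>();
  for (const row of rows) {
    const scan = scanOf(row);
    const bucket = byScan.get(scan);
    if (bucket) {
      bucket.push(row);
    } else {
      byScan.set(scan, [row]);
    }
  }
  const joined: JoinedSpectrum<S, R>[] = [];
  for (let i = 0; i < spectra.length; i++) {
    const spectrum = spectra.get(i);
    joined.push({ spectrum, annotations: byScan.get(spectrum.scan) ?? [] });
  }
  return joined;
}
```

Spectra without matching rows receive an empty annotations array. Rows whose scan number matches no spectrum are ignored.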
Spectrum
Represents a single mass spectrum.
Properties
- index: number - Spectrum index in file
- scan: number - Scan number from instrument
- msLevel: number - MS level (1 for MS1, 2 for MS/MS, etc.)
- retentionTime: number | null - Retention time in seconds
- size: number - Number of m/z-intensity pairs
- mz: Float32Array | Float64Array - m/z values (lazy-loaded)
- intensity: Float32Array | Float64Array - Intensity values (lazy-loaded)
- peaks: Float64Array - Interleaved [m/z, intensity] pairs
- xml: string - Spectrum XML (lazy-loaded)
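The peaks property stores both arrays in one buffer as [m/z0, intensity0, m/z1, intensity1, ...], so peak i sits at offsets 2i and 2i + 1. The sketch below (hypothetical basePeak) finds the most intense peak directly from that layout, assuming peaks.length is 2 × size.

```typescript
interface Peak {
  mz: number;
  intensity: number;
}

// Return the most intense peak from an interleaved peaks array, or null if empty.
function basePeak(peaks: Float64Array): Peak | null {
  if (peaks.length % 2 !== 0) {
    throw new Error('interleaved peaks array must have an even length');
  }
  let best: Peak | null = null;
  // Even offsets hold m/z; the following odd offset holds the matching intensity.
  for (let i = 0; i < peaks.length; i += 2) {
    const intensity = peaks[i + 1];
    if (best === null || intensity > best.intensity) {
      best = { mz: peaks[i], intensity };
    }
  }
  return best;
}
```

basePeak(spectrum.peaks) is an alternative to the separate-array loop in the Process Spectra example under Examples.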
Spectra
Iterable collection of spectra.
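Besides for...of, the collection exposes length and get(index), which are enough to select spectra by metadata. The sketch below (hypothetical indicesInWindow) collects the indices of spectra at a given MS level within a retention-time window. It reads only metadata, so given the lazy-loaded mz and intensity arrays it presumably avoids decoding peak data.

```typescript
// Minimal view of a Spectra collection: the documented length and get(index).
interface SpectrumCollection<S> {
  length: number;
  get(index: number): S;
}

interface SpectrumMeta {
  index: number;
  msLevel: number;
  retentionTime: number | null;
}

// Indices of spectra at msLevel whose retention time (seconds) lies in
// [rtStart, rtEnd]. Spectra without a retention time are skipped.
function indicesInWindow<S extends SpectrumMeta>(
  spectra: SpectrumCollection<S>,
  msLevel: number,
  rtStart: number,
  rtEnd: number,
): number[] {
  const result: number[] = [];
  for (let i = 0; i < spectra.length; i++) {
    const spectrum = spectra.get(i);
    const rt = spectrum.retentionTime;
    if (spectrum.msLevel === msLevel && rt !== null && rt >= rtStart && rt <= rtEnd) {
      result.push(spectrum.index);
    }
  }
  return result;
}
```

The result can be passed to extract, e.g. msz.extract('window.mzML', { indices: indicesInWindow(msz.spectra, 2, 600, 1200) }).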
const spectra = file.spectra;
// Get length
console.log(spectra.length);
// Access by index
const spectrum = spectra.get(0);
// Iterate
for (const spectrum of spectra) {
// Process spectrum
}
RuntimeArguments
Configure compression settings.
import { read, MZMLFile } from 'mscompress';
const file = read('sample.mzML') as MZMLFile;
file.arguments.threads = 8;
file.arguments.zstdCompressionLevel = 5;
const msz = file.compress('output.msz');
msz.close();
file.close();
Properties
- threads: number - Number of threads (default: CPU count)
- blocksize: number - Size of the blocks the input is divided into (default: 100 MB)
- zstdCompressionLevel: number - ZSTD compression level, 1-22 (default: 3)
- targetXmlFormat: number - XML compression format
- targetMzFormat: number - m/z compression format
- targetIntenFormat: number - Intensity compression format
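The settings are plain numeric fields, so out-of-range values are easy to set by accident. The sketch below (hypothetical applySettings) clamps requested values before assigning them. The 1-22 ZSTD range comes from the list above; capping threads at the CPU count is an assumption of this sketch, not a documented library rule.

```typescript
// Minimal view of the RuntimeArguments fields this sketch touches.
interface CompressionSettings {
  threads: number;
  blocksize: number; // bytes, as in the 50_000_000 example
  zstdCompressionLevel: number;
}

const ZSTD_MIN_LEVEL = 1;
const ZSTD_MAX_LEVEL = 22; // documented range for zstdCompressionLevel

// Copy requested values onto args after clamping. Capping threads at
// maxThreads (e.g. getNumThreads()) is this sketch's choice.
function applySettings(
  args: CompressionSettings,
  requested: Partial<CompressionSettings>,
  maxThreads: number,
): CompressionSettings {
  if (requested.threads !== undefined) {
    args.threads = Math.max(1, Math.min(Math.floor(requested.threads), maxThreads));
  }
  if (requested.zstdCompressionLevel !== undefined) {
    const level = Math.round(requested.zstdCompressionLevel);
    args.zstdCompressionLevel = Math.min(ZSTD_MAX_LEVEL, Math.max(ZSTD_MIN_LEVEL, level));
  }
  if (requested.blocksize !== undefined) {
    if (!(requested.blocksize > 0)) {
      throw new Error('blocksize must be a positive number of bytes');
    }
    args.blocksize = requested.blocksize;
  }
  return args;
}
```

For example, applySettings(file.arguments, { threads: 32, zstdCompressionLevel: 19 }, getNumThreads()) before calling compress().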
ExtractOptions
Options for filtering during extraction.
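The indices filter takes 0-based spectrum indices (the Quick Start's first-10 example lists 0 through 9), so spot-checking a large file means generating that array. The sketch below (hypothetical evenlySpacedIndices) picks count indices spread evenly from the first spectrum to the last.

```typescript
// Pick `count` distinct 0-based indices spread evenly over `total` spectra,
// including the first and (when count > 1) the last.
function evenlySpacedIndices(total: number, count: number): number[] {
  if (total <= 0 || count <= 0) return [];
  if (count >= total) {
    // Asking for at least as many as exist: return every index.
    const all: number[] = [];
    for (let i = 0; i < total; i++) all.push(i);
    return all;
  }
  if (count === 1) return [0];
  // step > 1 here, so rounded positions are strictly increasing (no duplicates).
  const step = (total - 1) / (count - 1);
  const indices: number[] = [];
  for (let k = 0; k < count; k++) {
    indices.push(Math.round(k * step));
  }
  return indices;
}
```

For example, msz.extract('sample-50.mzML', { indices: evenlySpacedIndices(msz.spectra.length, 50) }).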
interface ExtractOptions {
indices?: number[]; // Extract specific spectrum indices
scanNumbers?: number[]; // Extract specific scan numbers
msLevel?: number; // Extract only spectra at this MS level
}
Utility Functions
- getNumThreads(): number - Get system CPU count
- getFilesize(path: string): number - Get file size in bytes
- getVersion(): string - Get mscompress version
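getFilesize returns byte counts, which makes it straightforward to report how much a compression run saved. The sketch below (hypothetical compressionSummary) formats the compression ratio and percentage reduction from two sizes.

```typescript
// Format the compression ratio and space saved from two byte counts.
function compressionSummary(originalBytes: number, compressedBytes: number): string {
  if (!(originalBytes > 0) || !(compressedBytes > 0)) {
    throw new Error('file sizes must be positive');
  }
  const ratio = originalBytes / compressedBytes;
  const savedPercent = (1 - compressedBytes / originalBytes) * 100;
  return `${ratio.toFixed(2)}x (${savedPercent.toFixed(1)}% smaller)`;
}
```

For example, compressionSummary(getFilesize('sample.mzML'), getFilesize('output.msz')).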
Examples
Compress with Custom Settings
import { read, MZMLFile } from 'mscompress';
const mzml = read('sample.mzML') as MZMLFile;
mzml.arguments.threads = 16;
mzml.arguments.zstdCompressionLevel = 9;
mzml.arguments.blocksize = 50_000_000; // 50 MB blocks
const msz = mzml.compress('output.msz');
console.log(`Compression ratio: ${(mzml.filesize / msz.filesize).toFixed(2)}x`);
msz.close();
mzml.close();
Extract MS2 Spectra Only
import { read } from 'mscompress';
const msz = read('sample.msz');
const ms2File = msz.extract('ms2-only.mzML', { msLevel: 2 });
console.log(`Extracted ${ms2File.spectra.length} MS2 spectra`);
ms2File.close();
msz.close();
Process Spectra
import { read } from 'mscompress';
const file = read('sample.msz');
for (const spectrum of file.spectra) {
  if (spectrum.msLevel === 1) {
    // Find base peak; read the lazy-loaded arrays once rather than on every iteration
    const { mz, intensity } = spectrum;
    let maxIntensity = 0;
    let basePeakMz = 0;
    for (let i = 0; i < spectrum.size; i++) {
      if (intensity[i] > maxIntensity) {
        maxIntensity = intensity[i];
        basePeakMz = mz[i];
      }
    }
    console.log(`Scan ${spectrum.scan}: Base peak at ${basePeakMz.toFixed(4)} m/z`);
  }
}
file.close();
Batch Conversion
import { read, MZMLFile } from 'mscompress';
import { mkdirSync, readdirSync } from 'fs';
import { basename, join } from 'path';
const inputDir = './mzml-files';
const outputDir = './msz-files';
mkdirSync(outputDir, { recursive: true }); // make sure the output directory exists
for (const filename of readdirSync(inputDir)) {
  if (!filename.endsWith('.mzML')) continue;
  const inputPath = join(inputDir, filename);
  // Strip only the trailing extension, e.g. run.mzML -> run.msz
  const outputPath = join(outputDir, `${basename(filename, '.mzML')}.msz`);
  const mzml = read(inputPath) as MZMLFile;
  console.log(`Compressing ${filename}...`);
  mzml.compress(outputPath).close();
  mzml.close();
}
Working with MSZX Archives
Create an MSZX Archive
import { read, createMSZX, MSZFile } from 'mscompress';
const msz = read('sample.msz') as MSZFile;
// Using convenience function
await createMSZX(msz, 'sample.mszx', {
description: 'Proteomics dataset with PSM annotations',
annotations: ['results.pin', 'results.pepXML'],
extra: {
experiment_id: 'EXP001',
instrument: 'Orbitrap Fusion',
},
});
msz.close();
Using MSZXBuilder
import { read, MSZXBuilder, MSZFile } from 'mscompress';
const msz = read('sample.msz') as MSZFile;
const builder = new MSZXBuilder(msz);
builder
.setDescription('Annotated proteomics dataset')
.setJoinKey('scan_number')
.addAnnotations('percolator.pin', {
format: 'percolator_tsv',
description: 'Percolator PSM results',
})
.addAnnotations('results.pepXML', {
format: 'pepxml',
description: 'X!Tandem search results',
})
.setExtra('experiment', 'EXP001')
.setExtra('date', '2024-01-15');
await builder.save('annotated.mszx');
msz.close();
Read an MSZX Archive
import { MSZXFile } from 'mscompress';
const mszx = MSZXFile.open('sample.mszx');
// Access manifest
console.log(mszx.manifest.description);
console.log(`Spectra: ${mszx.manifest.num_spectra}`);
console.log(`Annotations: ${mszx.annotation_files.length}`);
// Access spectra (same as MSZFile)
const limit = Math.min(mszx.spectra.length, 10);
for (let i = 0; i < limit; i++) {
const spectrum = mszx.spectra.get(i);
console.log(`Scan ${spectrum.scan}: ${spectrum.size} peaks`);
}
// Access annotation files
for (const entry of mszx.annotation_files) {
console.log(`${entry.filename}: ${entry.format} (${entry.num_records} records)`);
const content = mszx.getAnnotationFile(entry.filename);
// Process annotation content...
}
mszx.close();
Extract Subset to New MSZX
import { MSZXFile } from 'mscompress';
const mszx = MSZXFile.open('sample.mszx');
// Extract MS2 spectra with annotations to new archive
const ms2Archive = await mszx.extractMSZX('ms2-only.mszx', {
msLevel: 2,
});
console.log(`Extracted ${ms2Archive.spectra.length} MS2 spectra`);
ms2Archive.close();
mszx.close();
Platform Support
The mscompress package uses native bindings and pre-built binaries for optimal performance. The main package automatically installs the correct platform-specific binary:
- @mscompress/darwin-x64 - macOS Intel
- @mscompress/darwin-arm64 - macOS Apple Silicon
- @mscompress/linux-x64 - Linux x64
- @mscompress/linux-arm64 - Linux ARM64
- @mscompress/win32-x64 - Windows x64
If your platform is not supported, the package will attempt to build from source, which requires a C++ compiler and CMake.
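The package names above follow an @mscompress/<platform>-<arch> pattern that matches Node.js's process.platform and process.arch values. The sketch below (hypothetical prebuiltPackage) mirrors that list to check whether a machine has a prebuilt binary. It illustrates the naming only and is not the installer's actual resolution logic.

```typescript
// Platform -> architectures with published binaries, copied from the list above.
const PREBUILT: Record<string, string[]> = {
  darwin: ['x64', 'arm64'],
  linux: ['x64', 'arm64'],
  win32: ['x64'],
};

// Return the prebuilt package name for a platform/arch pair, or null if none is listed.
function prebuiltPackage(platform: string, arch: string): string | null {
  const arches = PREBUILT[platform];
  if (arches === undefined || arches.indexOf(arch) === -1) return null;
  return `@mscompress/${platform}-${arch}`;
}
```

prebuiltPackage(process.platform, process.arch) returns null for combinations not in the list, which per the note above are the ones that would fall back to a source build.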
Building from Source
git clone https://github.com/chrisagrams/mscompress.git
cd mscompress/node-ts
npm install
npm run build
npm test
Requirements
- Node.js >= 16
- C++17 compiler (GCC, Clang, or MSVC)
- CMake >= 3.15
- Python 3 (for node-gyp)
Related Projects
- mscompress Python - Python bindings for the same C library
- mscompress CLI - Command-line tool
License
MIT © Chris Grams
Citation
If you use mscompress in your research, please cite:
@software{mscompress2024,
author = {Grams, Chris},
title = {mscompress: Multi-threaded compression for mass spectrometry data},
year = {2024},
url = {https://github.com/chrisagrams/mscompress}
}