neuro-mono

v1.0.1

Published

3 months ago

unityaisolutions

audio stereo mono neural-network audio-processing dsp sound

NeuroMono

Neural network-based intelligent stereo to mono audio conversion that preserves quality, wideness, volume, and richness.

Features

🧠 Neural Network Processing: Intelligently analyzes stereo characteristics and optimizes downmix weights
🎵 Quality Preservation: Maintains audio richness through harmonic analysis and spectral processing
📊 Stereo Width Preservation: Captures and retains stereo imaging information in the mono mix
🔊 Volume Compensation: Automatically adjusts volume to maintain perceived loudness
⚡ Fast & Efficient: Optimized algorithms suitable for real-time processing
🎛️ Flexible API: Fluent builder pattern for easy configuration

Installation

npm install neuro-mono

Quick Start

Simple Conversion

import { convert } from 'neuro-mono';

// Your stereo audio buffer
const stereoBuffer = {
  left: new Float32Array([/* ... */]),  // Left channel samples
  right: new Float32Array([/* ... */]), // Right channel samples
  sampleRate: 44100
};

// Convert to mono
const monoAudio = convert(stereoBuffer);
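Decoded PCM often arrives interleaved (L0, R0, L1, R1, ...) rather than as separate channel arrays. The sketch below shows one way to build the buffer shape used above. The `deinterleave` helper and the `StereoBuffer` type name are hypothetical and not part of neuro-mono.

```typescript
// Same shape as the stereo buffer in the example above. Named StereoBuffer
// here (an assumption) to avoid clashing with the Web Audio AudioBuffer global.
interface StereoBuffer {
  left: Float32Array;
  right: Float32Array;
  sampleRate: number;
}

// Hypothetical helper: split interleaved [L0, R0, L1, R1, ...] samples
// into separate left and right channel arrays.
function deinterleave(interleaved: Float32Array, sampleRate: number): StereoBuffer {
  if (interleaved.length % 2 !== 0) {
    throw new RangeError('Interleaved stereo data must contain an even number of samples');
  }
  const frames = interleaved.length / 2;
  const left = new Float32Array(frames);
  const right = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    left[i] = interleaved[2 * i];
    right[i] = interleaved[2 * i + 1];
  }
  return { left, right, sampleRate };
}

// Two frames of interleaved audio become two samples per channel.
const demoBuffer = deinterleave(new Float32Array([0.1, -0.1, 0.2, -0.2]), 44100);
```

The resulting object can then be passed to `convert` in place of `stereoBuffer`.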

With Custom Options

import { createConverter } from 'neuro-mono';

const converter = createConverter()
  .preserveWidth(0.8)        // High stereo width preservation
  .preserveRichness(0.9)     // Very high richness preservation
  .volumeCompensation(1.15)  // 15% volume boost
  .quality(0.9)              // High quality processing
  .spectralAnalysis(true);   // Enable spectral analysis

const monoAudio = converter.convert(stereoBuffer);
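The chaining above works because each setter returns the converter itself (the API reference types them as returning `this`). A minimal sketch of that pattern follows. `SketchConverter` is a hypothetical stand-in for illustration, not the library's class, and it models only two of the options.

```typescript
// Illustrative only: a tiny fluent builder whose setters return `this`
// so that calls can be chained. Not neuro-mono's actual implementation.
class SketchConverter {
  private width = 0.7;    // documented default for preserveWidth
  private richness = 0.8; // documented default for preserveRichness

  preserveWidth(level: number): this {
    this.width = level;
    return this;
  }

  preserveRichness(level: number): this {
    this.richness = level;
    return this;
  }

  // Hypothetical accessor so the sketch has something to inspect.
  settings(): { width: number; richness: number } {
    return { width: this.width, richness: this.richness };
  }
}

const sketchSettings = new SketchConverter()
  .preserveWidth(0.8)
  .preserveRichness(0.9)
  .settings();
```

One consequence of this pattern is that setters modify the instance they are called on, so a converter shared between call sites carries its latest settings with it.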

Analyze Stereo Characteristics

import { analyze } from 'neuro-mono';

const analysis = analyze(stereoBuffer);

console.log('Stereo Width:', analysis.width);
console.log('Spectral Richness:', analysis.richness);
console.log('RMS Level:', analysis.rmsLevel);
console.log('Phase Correlation:', analysis.phaseCorrelation);
console.log('Frequency Distribution:', analysis.frequencyDistribution);

API Reference

`createConverter(options?: ConversionOptions): NeuroMono`

Creates a new converter instance with optional initial configuration.

`convert(buffer: AudioBuffer, options?: ConversionOptions): Float32Array`

Quick conversion function. Returns mono audio as Float32Array.

`analyze(buffer: AudioBuffer): StereoAnalysis`

Analyzes stereo audio characteristics without converting.

Configuration Options

`preserveWidth(level: number): this`

Range: 0.0 - 1.0
Default: 0.7
Description: Controls how much stereo width information is preserved in the mono mix. Higher values retain more spatial information.

`preserveRichness(level: number): this`

Range: 0.0 - 1.0
Default: 0.8
Description: Controls preservation of harmonic content and spectral detail. Higher values maintain more audio richness.

`volumeCompensation(factor: number): this`

Range: 0.5 - 2.0
Default: 1.1
Description: Volume compensation factor. Values > 1.0 boost volume, < 1.0 reduce it. Compensates for perceived loudness loss during stereo-to-mono conversion.
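For readers who think in decibels, a linear factor maps to dB as 20 * log10(factor), so the default of 1.1 is roughly +0.83 dB and the documented range of 0.5 to 2.0 spans about -6 dB to +6 dB. The helpers below are a sketch of that arithmetic plus a peak check. They are hypothetical and not part of neuro-mono, and whether the library itself guards against clipping is not stated here.

```typescript
// Convert a linear gain factor to decibels.
function factorToDb(factor: number): number {
  return 20 * Math.log10(factor);
}

// Convert decibels back to a linear gain factor.
function dbToFactor(db: number): number {
  return Math.pow(10, db / 20);
}

// Hypothetical helper: the largest factor that keeps the peak sample
// at or below 1.0. Returns Infinity for silent input.
function maxSafeFactor(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return peak === 0 ? Infinity : 1 / peak;
}
```

Comparing the chosen factor against `maxSafeFactor` of the source channels gives a rough indication of whether a boost could push samples outside the -1.0 to 1.0 range.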

`quality(level: number): this`

Range: 0.0 - 1.0
Default: 0.8
Description: Processing quality level. Higher quality provides better results but may be slower.

`spectralAnalysis(enabled: boolean): this`

Default: true
Description: Enable advanced spectral analysis for better harmonic preservation.

`sampleRate(rate: number | null): this`

Default: null (uses source sample rate)
Description: Target sample rate for processing. When set, the audio is resampled to this rate during conversion.
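When option values come from user input such as sliders or config files, it can help to bring them into the documented ranges before passing them on. The sketch below does that for the four numeric options. The `SketchOptions` shape and the `normalizeOptions` helper are assumptions for illustration; the actual fields of `ConversionOptions` and the library's behavior on out-of-range values are not documented here.

```typescript
// Assumed option shape, mirroring the documented setter names.
interface SketchOptions {
  preserveWidth?: number;
  preserveRichness?: number;
  volumeCompensation?: number;
  quality?: number;
}

type NumericOption = keyof SketchOptions;

// Ranges and defaults copied from the option descriptions above.
const DOCUMENTED: Record<NumericOption, { min: number; max: number; fallback: number }> = {
  preserveWidth: { min: 0.0, max: 1.0, fallback: 0.7 },
  preserveRichness: { min: 0.0, max: 1.0, fallback: 0.8 },
  volumeCompensation: { min: 0.5, max: 2.0, fallback: 1.1 },
  quality: { min: 0.0, max: 1.0, fallback: 0.8 },
};

// Missing or NaN values fall back to the default; others are clamped.
function resolveOption(key: NumericOption, value: number | undefined): number {
  const { min, max, fallback } = DOCUMENTED[key];
  if (value === undefined || Number.isNaN(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: returns a fully populated, in-range option set.
function normalizeOptions(options: SketchOptions = {}): Required<SketchOptions> {
  return {
    preserveWidth: resolveOption('preserveWidth', options.preserveWidth),
    preserveRichness: resolveOption('preserveRichness', options.preserveRichness),
    volumeCompensation: resolveOption('volumeCompensation', options.volumeCompensation),
    quality: resolveOption('quality', options.quality),
  };
}
```

Clamping is only one possible policy. Rejecting out-of-range values with an error may be preferable when silent correction would hide a bug.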

Audio Buffer Format

interface AudioBuffer {
  left: Float32Array;    // Left channel samples (-1.0 to 1.0)
  right: Float32Array;   // Right channel samples (-1.0 to 1.0)
  sampleRate: number;    // Sample rate in Hz (e.g., 44100, 48000)
}
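A quick sanity check before conversion can catch malformed input early. The sketch below reports mismatched channel lengths, a non-positive sample rate, and samples outside the -1.0 to 1.0 range given in the interface comments. `findBufferProblems` and the `StereoInput` type name are hypothetical, and how the library treats such input is not specified here.

```typescript
// Same fields as the AudioBuffer interface above, renamed (an assumption)
// to avoid clashing with the Web Audio AudioBuffer global.
interface StereoInput {
  left: Float32Array;
  right: Float32Array;
  sampleRate: number;
}

// Hypothetical helper: returns a list of problems; an empty list means
// the buffer matches the documented format.
function findBufferProblems(buffer: StereoInput): string[] {
  const problems: string[] = [];
  if (buffer.left.length !== buffer.right.length) {
    problems.push('left and right channels differ in length');
  }
  // The negated comparison also catches NaN.
  if (!(buffer.sampleRate > 0)) {
    problems.push('sampleRate must be a positive number');
  }
  const channels: Array<[string, Float32Array]> = [
    ['left', buffer.left],
    ['right', buffer.right],
  ];
  for (const [label, channel] of channels) {
    for (let i = 0; i < channel.length; i++) {
      const sample = channel[i];
      if (!Number.isFinite(sample) || sample < -1 || sample > 1) {
        // Report only the first bad sample per channel.
        problems.push(`${label}[${i}] is outside -1.0 to 1.0`);
        break;
      }
    }
  }
  return problems;
}
```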

Analysis Results

interface StereoAnalysis {
  width: number;                    // Stereo width (0.0 = mono, 1.0 = max separation)
  richness: number;                 // Spectral richness (harmonic density)
  rmsLevel: number;                 // RMS volume level
  peakLevel: number;                // Peak volume level
  phaseCorrelation: number;         // Phase correlation (-1 to 1)
  frequencyDistribution: {
    low: number;                    // Low frequency energy ratio
    mid: number;                    // Mid frequency energy ratio
    high: number;                   // High frequency energy ratio
  };
}
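To make these fields more concrete, the sketch below computes three of them using common textbook definitions: RMS level, zero-lag phase correlation, and width as the share of side-channel energy. This is an assumption about how such metrics are typically defined, not a description of how neuro-mono computes them, and all function names are hypothetical.

```typescript
// Root-mean-square level of one channel.
function rmsOf(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Normalized cross-correlation at zero lag:
// +1 for identical channels, -1 for polarity-inverted channels.
function correlationOf(left: Float32Array, right: Float32Array): number {
  const n = Math.min(left.length, right.length);
  let lr = 0;
  let ll = 0;
  let rr = 0;
  for (let i = 0; i < n; i++) {
    lr += left[i] * right[i];
    ll += left[i] * left[i];
    rr += right[i] * right[i];
  }
  const denom = Math.sqrt(ll * rr);
  return denom === 0 ? 0 : lr / denom;
}

// Side energy as a fraction of total mid + side energy:
// 0 when L equals R (mono), 1 when L equals -R.
function widthOf(left: Float32Array, right: Float32Array): number {
  const n = Math.min(left.length, right.length);
  let midEnergy = 0;
  let sideEnergy = 0;
  for (let i = 0; i < n; i++) {
    const mid = (left[i] + right[i]) / 2;
    const side = (left[i] - right[i]) / 2;
    midEnergy += mid * mid;
    sideEnergy += side * side;
  }
  const total = midEnergy + sideEnergy;
  return total === 0 ? 0 : sideEnergy / total;
}
```

Under these definitions a hard-panned signal (one channel silent) has a width of 0.5, which shows that the scale depends on the convention chosen.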

Advanced Examples

Processing with Different Presets

import { createConverter } from 'neuro-mono';

// Podcast/Voice preset - focus on clarity
const voiceConverter = createConverter()
  .preserveWidth(0.3)
  .preserveRichness(0.6)
  .volumeCompensation(1.2);

// Music preset - maximum quality
const musicConverter = createConverter()
  .preserveWidth(0.9)
  .preserveRichness(0.95)
  .volumeCompensation(1.1)
  .quality(1.0);

// Fast preset - for real-time processing
const fastConverter = createConverter()
  .preserveWidth(0.5)
  .preserveRichness(0.5)
  .quality(0.5)
  .spectralAnalysis(false);

Batch Processing

import { createConverter } from 'neuro-mono';

const converter = createConverter()
  .preserveWidth(0.8)
  .preserveRichness(0.8);

const stereoBuffers = [/* ... */]; // Array of stereo buffers

const monoBuffers = stereoBuffers.map(buffer => 
  converter.convert(buffer)
);

Integration with Web Audio API

import { createConverter } from 'neuro-mono';

async function convertWebAudioBuffer(audioBuffer: AudioBuffer) {
  // The converter needs two channels, so fail early on mono input
  if (audioBuffer.numberOfChannels < 2) {
    throw new Error('Expected a stereo AudioBuffer with at least 2 channels');
  }

  // Extract channels
  const left = audioBuffer.getChannelData(0);
  const right = audioBuffer.getChannelData(1);

  // Convert
  const converter = createConverter()
    .preserveWidth(0.8)
    .preserveRichness(0.85);

  const mono = converter.convert({
    left: new Float32Array(left),
    right: new Float32Array(right),
    sampleRate: audioBuffer.sampleRate
  });

  // Create the mono AudioBuffer directly; this avoids opening a new
  // AudioContext on every call
  const monoBuffer = new AudioBuffer({
    numberOfChannels: 1, // mono
    length: mono.length,
    sampleRate: audioBuffer.sampleRate
  });

  monoBuffer.copyToChannel(mono, 0);
  return monoBuffer;
}

Dynamic Quality Adjustment

import { analyze, createConverter } from 'neuro-mono';

const converter = createConverter();

// Analyze first, using the standalone analyze() from the API reference
const analysis = analyze(stereoBuffer);

// Adjust settings based on analysis
if (analysis.width > 0.7) {
  // Wide stereo image - preserve more width
  converter.preserveWidth(0.9);
}

if (analysis.richness > 0.8) {
  // Rich harmonic content - preserve more richness
  converter.preserveRichness(0.95);
}

const mono = converter.convert(stereoBuffer);

How It Works

NeuroMono uses a multi-stage intelligent processing pipeline:

Stereo Analysis: Analyzes stereo width, phase correlation, spectral richness, and frequency distribution
Neural Network Processing: A pre-trained neural network determines optimal mixing weights based on audio characteristics
Intelligent Downmix: Combines left and right channels using neural network weights while preserving side information
Harmonic Preservation: Extracts and reintroduces harmonic content to maintain audio richness
Frequency Enhancement: Preserves high-frequency clarity that's often lost in traditional downmixing
Volume Compensation: Adjusts volume to maintain perceived loudness, accounting for stereo width loss
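The downmix stage can be pictured with a much simpler stand-in. The sketch below replaces the neural network with a fixed heuristic: it mixes side content back into the mid signal only when the channels are anti-correlated, scaled by a width setting. This is an illustration of the general idea under that assumption, not the library's algorithm, and both function names are hypothetical.

```typescript
// Heuristic stand-in for learned weights: only anti-correlated material
// (correlation below 0) gets side content mixed back in.
// `correlation` is expected in -1..1 and `preserveWidth` in 0..1.
function sideGainFor(correlation: number, preserveWidth: number): number {
  return preserveWidth * Math.max(0, -correlation);
}

// mono = mid + sideGain * side, where mid = (L + R) / 2 and side = (L - R) / 2.
// sideGain = 0 gives the plain average; sideGain = 1 gives the left channel.
function downmixWithSide(
  left: Float32Array,
  right: Float32Array,
  sideGain: number
): Float32Array {
  const n = Math.min(left.length, right.length);
  const mono = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const mid = (left[i] + right[i]) / 2;
    const side = (left[i] - right[i]) / 2;
    mono[i] = mid + sideGain * side;
  }
  return mono;
}
```

A value such as `analysis.phaseCorrelation` could serve as the `correlation` input. For fully inverted channels the plain average is silent, while this sketch falls back to the left channel.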

Why Not Just Average L+R?

Traditional stereo-to-mono conversion simply averages left and right channels: mono = (L + R) / 2. This approach loses:

Stereo Width Information: Spatial positioning is completely discarded
Harmonic Richness: Side channel harmonics are lost
Perceived Volume: Wide stereo mixes lose perceived loudness
Frequency Balance: High-frequency content in out-of-phase signals cancels out

NeuroMono preserves these characteristics through intelligent analysis and processing.
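The cancellation problem is easy to reproduce. Writing mid as M = (L + R) / 2 and side as S = (L - R) / 2, the plain average keeps M and drops S entirely, so anything that lives only in the side channel vanishes. The sketch below demonstrates this with a tone whose right channel is polarity-inverted. The function and variable names are illustrative only.

```typescript
// Plain average downmix: mono = (L + R) / 2.
function averageDownmix(left: Float32Array, right: Float32Array): Float32Array {
  const n = Math.min(left.length, right.length);
  const mono = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    mono[i] = (left[i] + right[i]) / 2;
  }
  return mono;
}

// Largest absolute sample value in a channel.
function peakOf(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return peak;
}

// 10 ms of a 1 kHz tone at 44.1 kHz, with the right channel inverted.
const sampleRateHz = 44100;
const tone = new Float32Array(441);
const inverted = new Float32Array(441);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.sin((2 * Math.PI * 1000 * i) / sampleRateHz);
  inverted[i] = -tone[i];
}

// Each channel is clearly audible on its own, but the average is silent.
const cancelled = averageDownmix(tone, inverted);
console.log('channel peak:', peakOf(tone), 'downmix peak:', peakOf(cancelled));
```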

Performance

Typical processing speed on modern hardware:

44.1 kHz stereo audio: ~50-100x real-time (quality=0.8)
48 kHz stereo audio: ~40-80x real-time (quality=0.8)
Fast preset: ~200-400x real-time
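Reading "Nx real-time" as seconds of audio processed per second of wall-clock time, the figure can be measured on your own hardware with a small harness. The sketch below assumes that reading. `realTimeFactor` and `measureRealTimeFactor` are hypothetical helpers, and `convertFn` stands in for whatever conversion call is being timed.

```typescript
// Seconds of audio processed per second of wall-clock time.
function realTimeFactor(sampleCount: number, sampleRate: number, elapsedMs: number): number {
  const audioSeconds = sampleCount / sampleRate;
  return audioSeconds / (elapsedMs / 1000);
}

// Hypothetical timing harness. Runs the conversion several times so that
// the coarse millisecond resolution of Date.now() matters less.
function measureRealTimeFactor(
  convertFn: (left: Float32Array, right: Float32Array) => Float32Array,
  left: Float32Array,
  right: Float32Array,
  sampleRate: number,
  runs: number = 10
): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    convertFn(left, right);
  }
  // Clamp to 1 ms to avoid dividing by zero on very fast runs.
  const elapsedMs = Math.max(1, Date.now() - start);
  return realTimeFactor(left.length * runs, sampleRate, elapsedMs);
}
```

For example, 10 seconds of 44.1 kHz audio (441,000 samples per channel) processed in 100 ms corresponds to a factor of 100.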

Browser Support

NeuroMono runs in the following JavaScript environments:

✅ Node.js 16+
✅ Modern browsers (Chrome, Firefox, Safari, Edge)
✅ Electron
✅ React Native (with appropriate audio buffer handling)

HTML Demo

Try NeuroMono in your browser with the interactive demo! Just open demo.html to:

Upload your own audio files (MP3, WAV, OGG, FLAC)
Configure conversion settings with intuitive sliders
Analyze stereo characteristics (width, richness, phase correlation)
Compare original and converted waveforms
Play and download the converted audio

The demo includes presets for different use cases:

🎤 Voice/Podcast - Optimized for speech
🎵 Music - Maximum quality preservation
⚡ Fast - Real-time processing
⚖️ Balanced - Good quality/speed tradeoff
🔝 Maximum - Best possible output

To use the demo, simply open demo.html in a modern web browser.

TypeScript

NeuroMono is written in TypeScript and includes full type definitions.

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Developed with ❤️ by Alex Scott