odyssey-spatial-comms
v2.0.7
Published
Odyssey Spatial Communications SDK - A powerful and flexible WebRTC-based SDK for spatial audio and video communication, built on top of mediasoup.
Readme
odyssey-spatial-comms
WebRTC spatial audio & video SDK — built on mediasoup, Web Audio API, and TensorFlow noise suppression.
Installation
npm
npm install odyssey-spatial-comms
yarn
yarn add odyssey-spatial-comms
pnpm
pnpm add odyssey-spatial-comms
Quick Start
React / Next.js
import { SpatialCommsSDK } from 'odyssey-spatial-comms';
// Create once when the user enters a space — type is inferred automatically
const client = SpatialCommsSDK.create('https://your-mediasoup-server.com');
// Resume audio context on first user gesture (required by all browsers)
await client.resumeAudio();
// Listen for events
client.on('room-joined', (data) => console.log('joined', data));
client.on('participant-joined', (p) => console.log('user joined', p.userId));
// Join a room
await client.joinRoom({
roomId: 'my-room',
userId: 'user-123',
deviceId: 'device-123',
position: { x: 0, y: 0, z: 0 },
direction: { x: 0, y: 0, z: 1 },
userName: 'Alice', // optional — shown to other participants
userEmail: '[email protected]', // optional
});
// Publish microphone
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
await client.produceTrack(stream.getAudioTracks()[0]);
// Update 3D position every time the user moves (~10Hz)
client.updatePosition(
{ x: 1.0, y: 0.0, z: 5.0 }, // world position in meters
{ x: 0, y: 0, z: 1 }, // forward direction (unit vector)
{ rot: { x: 0, y: 45, z: 0 } } // rotation: pitch, yaw, roll (degrees)
);
// Leave
client.leaveRoom();
Vue 3 (Composition API)
import { SpatialCommsSDK } from 'odyssey-spatial-comms';
import type { OdysseySpatialCommsHandle } from 'odyssey-spatial-comms';
import { onMounted, onUnmounted, ref } from 'vue';
const client = ref<OdysseySpatialCommsHandle | null>(null);
onMounted(() => {
client.value = SpatialCommsSDK.create('https://your-mediasoup-server.com');
client.value.on('room-joined', () => console.log('connected!'));
client.value.on('participant-joined', (p) => console.log('user joined', p.userId));
client.value.joinRoom({
roomId: 'my-room',
userId: 'user-123',
deviceId: 'device-123',
position: { x: 0, y: 0, z: 0 },
direction: { x: 0, y: 0, z: 1 },
});
});
onUnmounted(() => client.value?.leaveRoom());
Plain TypeScript / Vanilla JS
import { SpatialCommsSDK } from 'odyssey-spatial-comms';
const client = SpatialCommsSDK.create('https://your-mediasoup-server.com');
client.on('room-joined', () => console.log('connected!'));
await client.joinRoom({
roomId: 'lobby',
userId: 'user-1',
deviceId: 'device-1',
position: { x: 0, y: 0, z: 0 },
direction: { x: 0, y: 0, z: 1 },
});
Next.js — dynamic import (browser-only SDK)
// pages/space.tsx or app/space/page.tsx
import dynamic from 'next/dynamic';
// Must be dynamic — SDK uses Web Audio API (browser only)
const SpaceRoom = dynamic(() => import('../components/SpaceRoom'), { ssr: false });
// components/SpaceRoom.tsx
import { useEffect, useRef } from 'react';
import { SpatialCommsSDK } from 'odyssey-spatial-comms';
import type { OdysseySpatialCommsHandle } from 'odyssey-spatial-comms';
export default function SpaceRoom() {
const clientRef = useRef<OdysseySpatialCommsHandle | null>(null);
useEffect(() => {
const client = SpatialCommsSDK.create('https://your-server.com');
clientRef.current = client;
client.on('room-joined', async () => {
await client.resumeAudio();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
await client.produceTrack(stream.getAudioTracks()[0]);
});
client.joinRoom({
roomId: 'space-1',
userId: 'user-1',
deviceId: 'dev-1',
position: { x: 0, y: 0, z: 0 },
direction: { x: 0, y: 0, z: 1 },
});
return () => { client.leaveRoom(); };
}, []);
return <div>Space loaded</div>;
}
TypeScript — if you need the type explicitly
import { SpatialCommsSDK } from 'odyssey-spatial-comms';
import type { OdysseySpatialCommsHandle } from 'odyssey-spatial-comms';
// Option A: let TypeScript infer it (recommended — no import needed)
const client = SpatialCommsSDK.create('https://your-server.com');
// Option B: explicit type annotation (only needed for refs/stores)
const clientRef: OdysseySpatialCommsHandle | null = null;
// Option C: derive the type from the factory (no import needed)
type OdysseyClient = ReturnType<typeof SpatialCommsSDK.create>;
Peer Dependencies
These are not bundled — install them in your own project:
npm install socket.io-client mediasoup-client @tensorflow/tfjs webrtc-adapter
Key API
| Method | Description |
|---|---|
| SpatialCommsSDK.create(serverUrl) | Create a new SDK client |
| client.joinRoom({ roomId, userId, deviceId, position, direction }) | Join a room |
| client.leaveRoom() | Leave the current room |
| client.updatePosition(position, direction) | Update 3D position in space |
| client.produceTrack(track, appData?) | Publish a local audio/video track |
| client.setMasterMuted(muted) | Mute/unmute all incoming audio |
| client.initializeMLNoiseSuppression(modelPath) | Enable TF.js noise suppression |
| client.setListenerPosition(pos, orientation) | Set spatial audio listener |
| client.on(event, handler) | Subscribe to SDK events |
| client.off(event, handler) | Unsubscribe from SDK events |
Events
client.on('room-joined', (data: RoomJoinedData) => { ... });
client.on('participant-joined', (p: Participant) => { ... });
client.on('participant-left', (p: Participant) => { ... });
client.on('participant-updated', (p: Participant) => { ... });
client.on('track-added', ({ participantId, track, kind }) => { ... });
client.on('huddle-invite', ({ from }) => { ... });
client.on('space-live-started', (data) => { ... });
client.on('space-live-stopped', () => { ... });
Complete Flow: Frontend → Server → SDK
[1] UNREAL ENGINE
Sends: pos=(4130, 220, 700) cm (X=forward, Y=right, Z=up)
│
▼
[2] FRONTEND (Vue / React)
Transform Unreal coords → Standard + cm → meters:
position = {
x: unrealPos.y / 100, // UE Y (right) → X (right) = 2.2m
y: unrealPos.z / 100, // UE Z (up) → Y (up) = 7.0m
z: unrealPos.x / 100 // UE X (forward) → Z (forward) = 41.3m
}
Calls:
sdk.updatePosition(position, direction, { rot })
sdk.setListenerFromLSD(position, cameraPos, lookAtPos, rot)
│
▼
[3] SDK → SERVER (socket.emit "update-position")
Sends: { participantId, position, direction, rot }
│
▼
[4] SERVER (pass-through mode)
1. Receive position from client
2. Auto-detect units: if maxAxis > 50 → divide by 100 (cm→m safety net)
3. No smoothing: pass real-time position straight through
4. Broadcast normalized position (meters) to all other clients
│
▼
[5] SDK — receiving remote participant positions
socket.on("participant-position-updated") triggers:
1. normalizePositionUnits() — backup unit check (maxAxis > 50 → /100)
2. snapPosition() — ignore movements < 15cm (anti-jitter)
3. computeHeadPosition() — add +1.6m Y for head height
4. calculateLogarithmicGain()— cubic falloff 100%→0% over 0.5m→15m
5. calculatePanning() — sin(atan2) projection onto listener right-ear axis
6. Web Audio nodes updated — GainNode + StereoPannerNode ramped smoothly
What Happens on a Sudden 5m Position Jump
SCENARIO: Person teleports from 2m away → 7m away instantly
Step 1 — SERVER detects jump > 5m
Lerps 30% toward new position each frame:
Frame 1: 2.0m → 3.5m (30% of 5m gap)
Frame 2: 3.5m → 4.55m
Frame 3: 4.55m → 5.29m
Frame 4: 5.29m → 5.80m
Frame N: ... converges to 7m
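The frame-by-frame numbers above fall out of a plain 30% lerp. A minimal sketch (lerpToward is an illustrative name, not the server's actual function):

```typescript
// Illustrative sketch of the server's 30%-per-frame lerp smoothing.
// `lerpToward` is our name for it; the server's internals may differ.
function lerpToward(current: number, target: number, factor = 0.3): number {
  return current + (target - current) * factor;
}

let d = 2.0;        // distance before the teleport (m)
const target = 7.0; // distance after the teleport (m)
const frames: number[] = [];
for (let i = 0; i < 4; i++) {
  d = lerpToward(d, target);
  frames.push(d);
}
// frames ≈ [3.5, 4.55, 5.29, 5.80], matching the frames listed above
```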
Step 2 — SDK receives the smoothed intermediate positions
Recalculates cubic gain for each step:
3.5m → ~51% gain
4.55m → ~38% gain
5.29m → ~29% gain
Step 3 — Web Audio smooths each gain change
gainNode.gain.setTargetAtTime(newGain, now, 0.05)
≈ 150ms smooth ramp per step → zero clicks or pops
Coordinate System (World Space)
All spatial calculations are performed relative to a world origin (datum) at (0, 0, 0):
+Z (Forward/North)
↑
10 |
| B (15, 8) ← Speaker
8 | /
| / 5.83m distance
6 | /
| /
5 | A (10, 5) ← YOU (Listener, facing 0°)
| ↑
3 | | Your right ear →
| |
1 |
|
0 +--+--+--+--+--+--+--+--+--→ +X (Right/East)
0 2 4 6 8 10 12 14 16
↙ (into page)
+Y (Up/Height)
Key Points:
- Datum (0,0,0): World origin - all positions measured from here
- X-axis: Right/Left (positive = right, negative = left)
- Y-axis: Up/Down (height above ground)
- Z-axis: Forward/Back (positive = forward/north, negative = back/south)
- Distance: 3D Euclidean distance = √(Δx² + Δy² + Δz²)
- Panning: Calculated from X-Z plane position relative to listener rotation
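For instance, the listener A and speaker B from the diagram above:

```typescript
// 3D Euclidean distance between two positions (x = right, y = up, z = forward).
type Vec3 = { x: number; y: number; z: number };

function distance3D(a: Vec3, b: Vec3): number {
  const dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Listener A at (x=10, z=5), speaker B at (x=15, z=8), same height:
const d = distance3D({ x: 10, y: 0, z: 5 }, { x: 15, y: 0, z: 8 });
// d = √(5² + 0² + 3²) = √34 ≈ 5.83 m, the annotation in the diagram
```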
Coordinate Transform (Unreal → Standard):
// Unreal: X=forward, Y=right, Z=up
// Standard SDK: X=right, Y=up, Z=forward
position = {
x: unrealPos.y / 100, // UE Y (right) → X (right)
y: unrealPos.z / 100, // UE Z (up) → Y (up)
z: unrealPos.x / 100 // UE X (forward) → Z (forward)
}
Feature Highlights
- 🔌 One factory to rule it all – SpatialCommsSDK.create(serverUrl) wires transports, producers, consumers, and room state.
- 🧭 Accurate pose propagation – updatePosition() streams listener pose to the SFU while participant-position-updated keeps the local store in sync.
- 🎧 Studio-grade spatial audio – each remote participant gets a dedicated Web Audio graph: ML denoiser (ScriptProcessorNode) → limiter → high-pass → low-pass → stereo panner → adaptive gain → master compressor. The ML denoiser is a trained 3-layer GRU model (872K params, val_loss=0.1636) running fully client-side via TensorFlow.js.
- 🎚️ Crystal-clear audio processing – a finely tuned audio pipeline featuring a gentle compressor, multi-stage filtering, and a smart denoiser prevents audio dropouts and echo. The result is a more natural, continuous voice without distracting artifacts.
- 🧭 Position-based spatial panning – updatePosition forwards positions to Web Audio, which calculates panning based on WHERE the speaker is relative to the listener (not which way they face). Uses the listener's right-vector projection with a 5m pan radius for natural left/right placement.
- 🤖 ML Noise Suppression (active) – a TensorFlow.js GRU model (odyssey_adaptive_denoiser) runs as a ScriptProcessorNode wired as the first node in every participant's audio chain. It loads non-blocking in the background and operates in pass-through mode until the model is ready, then switches to ML denoising automatically. No fallback — if it fails, the error is logged to console.
- 🔄 ICE Connection Stability – automatic ICE restart on transport disconnect for robust connections. The SDK requests an ICE restart from the server when a transport enters the disconnected state, enabling faster recovery from network issues without a full reconnection.
Audio Flow (Server ↔ Browser)
┌──────────────┐ update-position ┌──────────────┐ pose + tracks ┌──────────────────┐
│ Browser LSD │ ──────────────────▶ │ MediaSoup SFU│ ────────────────▶ │ SDK Event Bus │
│ (Unreal data)│ │ + Socket.IO │ │ (EventManager) │
└──────┬───────┘ └──────┬───────┘ └──────────┬────────┘
│ │ track + pose
│ │ ▼
│ ┌────────▼────────┐ ┌──────────────────┐
│ audio RTP │ consumer-created│ │ SpatialAudioMgr │
└──────────────────────────▶│ setup per-user │◀──────────────────────│ (Web Audio API) │
└────────┬────────┘ │ - Denoiser │
│ │ - HP / LP │
│ │ - StereoPanner │
▼ │ - Gain + Comp │
Web Audio Graph └──────────┬───────┘
│ │
▼ ▼
Listener ears (Left/Right)                    System Output
Video Flow (Capture ↔ Rendering)
┌──────────────┐ produceTrack ┌──────────────┐ RTP ┌──────────────┐
│ getUserMedia │ ───────────────▶ │ MediaSoup SDK│ ──────▶ │ MediaSoup SFU│
└──────┬───────┘ │ (Odyssey) │ └──────┬───────┘
│ └──────┬───────┘ │
│ consumer-created │ track │
▼ ▼ │
┌──────────────┐ ┌──────────────┐ │
│ Vue/React UI │ ◀─────────────── │ SDK Event Bus │ ◀──────────────┘
│ (muted video │ │ exposes media │
│ elements) │ │ tracks │
└──────────────┘ └──────────────┘
Video Track Flow:
- Capture: getUserMedia() captures video from camera or screen
- Produce: sdk.produceTrack(track, { isScreenshare: true }) sends it to the SFU
- Route: MediaSoup SFU routes video RTP to other participants
- Consume: SDK receives a consumer-created event with the video track
- Render: UI attaches the track to a muted <video> element (audio handled separately)
Web Audio Algorithms
Coordinate normalization – Unreal sends centimeters; SpatialAudioManager auto-detects large values and converts to meters once.
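A minimal sketch of both steps, assuming the >50 heuristic described in the flow above (function names are illustrative, not the SDK's exports):

```typescript
type Vec3 = { x: number; y: number; z: number };

// Unreal (X=forward, Y=right, Z=up, in cm) → SDK (X=right, Y=up, Z=forward, in m)
function unrealToSdk(ue: Vec3): Vec3 {
  return { x: ue.y / 100, y: ue.z / 100, z: ue.x / 100 };
}

// Safety-net conversion: if any axis exceeds 50, the value is assumed
// to still be in centimeters and is divided by 100 once.
function normalizePositionUnits(p: Vec3): Vec3 {
  const maxAxis = Math.max(Math.abs(p.x), Math.abs(p.y), Math.abs(p.z));
  return maxAxis > 50 ? { x: p.x / 100, y: p.y / 100, z: p.z / 100 } : p;
}

const pos = unrealToSdk({ x: 4130, y: 220, z: 700 });
// pos = { x: 2.2, y: 7, z: 41.3 }, the example values from the flow above
```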
360° angle-based stereo panning – setListenerFromLSD() calculates the listener's right-ear vector from their yaw (rot.y). When updateSpatialAudio() runs, it uses atan2 to calculate the angle from listener to speaker, then applies sin(angle) for natural panning. This gives full left/right separation at ±90° angles. Speaker's rotation is ignored – only their position relative to listener matters.
Dynamic distance gain – updateSpatialAudio() measures distance from listener → source and applies a CUBIC EXPONENTIAL falloff (0.5m-15m range). Voices gradually fade from 100% (0.5m) to complete silence at 15m+ (hard cutoff). The cubic (1-normalized)³ formula creates clearly noticeable volume changes as you move. Distance calculated from listener's HEAD position to participant's HEAD position (body + 1.6m height). Master compressor is DISABLED to ensure gain changes are audible.
Noise handling – a TensorFlow.js GRU model (odyssey_adaptive_denoiser, 872K params, val_loss=0.1636) runs in a ScriptProcessorNode as the FIRST node in every participant's chain, applying a learned spectral mask before the high/low-pass filters. Audio passes through unchanged until the model finishes loading, then ML denoising becomes active automatically with no user action required.
Spatial Audio System (CLOCKWISE Rotation)
Core Algorithm (Full 360° Support)
The panning calculation uses position-based projection onto the listener's right-ear axis:
// Step 1: Calculate listener's right vector from yaw (CLOCKWISE rotation)
const yawRad = (rot.y * Math.PI) / 180;
listenerRight = {
x: Math.cos(yawRad),
z: -Math.sin(yawRad) // NEGATIVE sine for CLOCKWISE rotation
};
// Step 2: Vector from listener to speaker
vecToSource = {
x: speakerPos.x - listenerPos.x,
z: speakerPos.z - listenerPos.z
};
// Step 3: Calculate forward vector (90° CW from right)
listenerForward = { x: -listenerRight.z, z: listenerRight.x };
// Step 4: Project onto both axes
dxLocal = vecToSource.x * listenerRight.x + vecToSource.z * listenerRight.z; // Right/Left
dzLocal = vecToSource.x * listenerForward.x + vecToSource.z * listenerForward.z; // Front/Back
// Step 5: Calculate angle using atan2 (gives -π to +π radians)
angleToSource = Math.atan2(dxLocal, dzLocal);
// Step 6: Convert to pan value using sine (-1 to +1)
// 90° (right) = +1.0, 270° (left) = -1.0, 0°/180° (front/back) = 0.0
rawPan = Math.sin(angleToSource);
// Step 7: Apply smoothing to prevent jitter
smoothedPan = smoothPanValue(participantId, rawPan);
Key Principles
| Principle | Description |
|----------------------------|-------------------------------------------------------------|
| Position-based | Panning based on WHERE speaker is, NOT where they're looking |
| Listener yaw matters | Your rot.y determines which direction is "right" |
| Speaker rotation ignored | Their facing direction does NOT affect panning |
| Full 360° support | cos/sin trigonometry handles any angle automatically |
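Steps 1–6 above can be collapsed into one standalone function for experimentation (computePan is our name, not an SDK export):

```typescript
// Standalone version of the panning algorithm's Steps 1-6.
function computePan(
  listener: { x: number; z: number },
  speaker: { x: number; z: number },
  yawDeg: number
): number {
  const yawRad = (yawDeg * Math.PI) / 180;
  const right = { x: Math.cos(yawRad), z: -Math.sin(yawRad) }; // clockwise yaw
  const forward = { x: -right.z, z: right.x };                 // 90° CW from right
  const v = { x: speaker.x - listener.x, z: speaker.z - listener.z };
  const dxLocal = v.x * right.x + v.z * right.z;     // right/left component
  const dzLocal = v.x * forward.x + v.z * forward.z; // front/back component
  return Math.sin(Math.atan2(dxLocal, dzLocal));     // -1 (left) to +1 (right)
}

computePan({ x: 0, z: 0 }, { x: 5, z: 0 }, 0);   // → 1: speaker directly right
computePan({ x: 0, z: 0 }, { x: 0, z: 5 }, 0);   // → 0: speaker directly ahead
computePan({ x: 0, z: 0 }, { x: 5, z: 0 }, 180); // → ≈ -1: listener turned around
```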
Listener Right Vector by Yaw (CLOCKWISE Rotation)
| Yaw   | Facing     | listenerRight (x, z) | Right Ear Faces | Left Ear Faces |
|-------|------------|----------------------|-----------------|----------------|
| 0°    | +Z (fwd)   | (1.0, 0.0)           | +X              | -X             |
| 90°   | +X (right) | (0.0, -1.0)          | -Z              | +Z             |
| 180°  | -Z (back)  | (-1.0, 0.0)          | -X              | +X             |
| 270°  | -X (left)  | (0.0, 1.0)           | +Z              | -Z             |
Pan Value to Left/Right Gain
| panValue | Left Ear | Right Ear | Angle        | Description    |
|----------|----------|-----------|--------------|----------------|
| -1.0     | 100%     | 0%        | 270° (left)  | Full LEFT      |
| -0.71    | 85%      | 15%       | 315°/225°    | Diagonal LEFT  |
| 0.0      | 50%      | 50%       | 0°/180°      | CENTER         |
| +0.71    | 15%      | 85%       | 45°/135°     | Diagonal RIGHT |
| +1.0     | 0%       | 100%      | 90° (right)  | Full RIGHT     |
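The percentages in this table correspond to a linear crossfade between ears. A sketch reproducing the table's mapping (note: the actual output stage is a Web Audio StereoPannerNode, which applies its own pan law; this only illustrates the numbers above):

```typescript
// Linear ear-level mapping matching the table above (illustrative only).
function earLevels(pan: number): { left: number; right: number } {
  return { left: ((1 - pan) / 2) * 100, right: ((1 + pan) / 2) * 100 };
}

earLevels(-1); // { left: 100, right: 0 }   full LEFT
earLevels(0);  // { left: 50,  right: 50 }  CENTER
earLevels(1);  // { left: 0,   right: 100 } full RIGHT
```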
Anti-Jitter Smoothing (3 Layers)
Layer 1: Gain Change Threshold Filter (2.5%)
const GAIN_CHANGE_THRESHOLD = 0.025; // 2.5%
if (Math.abs(newGain - currentGain) / 100 < GAIN_CHANGE_THRESHOLD) {
return currentGain; // Ignore micro-jitter (movements ≤40cm)
}
Layer 2: Adaptive EMA for Pan
// Normal: 70% smoothing for stability
smoothedPan = previousPan * 0.7 + newPan * 0.3;
// Near center: 50% smoothing for moderate response
if (bothNearCenter) {
smoothedPan = previousPan * 0.5 + newPan * 0.5;
}
// Full flip (likely jitter): 85% HEAVY smoothing
if (signFlipped && panChange > 1.0) {
smoothedPan = previousPan * 0.85 + newPan * 0.15;
}
Layer 3: Audio API Ramp Time
stereoPanner.pan.setTargetAtTime(panValue, currentTime, 0.08); // 80ms pan
gainNode.gain.setTargetAtTime(gainValue, currentTime, 0.05); // 50ms gain
Distance-Based Gain: CUBIC EXPONENTIAL Falloff (HARD CUTOFF at 15m)
| Distance | Gain | Description |
|:---------------|:------:|:---------------------------------|
| 0.0 - 0.5m | 100% | Full volume (intimate) |
| 1.0m | ~90% | Very close - still loud |
| 2.0m | ~72% | Normal talking - NOTICEABLE |
| 3.0m | ~57% | Across table - CLEARLY QUIETER |
| 5.0m | ~33% | Across room - MUCH QUIETER |
| 7.0m | ~17% | Far end of room - very faint |
| 10.0m | ~4% | Barely audible |
| ≥15.0m | 0% | Silent (HARD CUTOFF) |
CUBIC EXPONENTIAL falloff formula:
private calculateLogarithmicGain(distance: number): number {
const minDistance = 0.5; // Full volume at 0.5m or closer
const maxDistance = 15.0; // Silent at 15m or farther - HARD CUTOFF
if (distance <= minDistance) return 100; // Full volume
if (distance >= maxDistance) return 0; // Silent - HARD CUTOFF
// CUBIC: (1 - normalized)³ for NOTICEABLE volume changes
const range = maxDistance - minDistance; // 14.5m
const normalizedDistance = (distance - minDistance) / range;
const remainingRatio = 1 - normalizedDistance;
return 100 * remainingRatio * remainingRatio * remainingRatio;
}
Why Cubic (not Linear or Quadratic)?
- Linear: Too gradual - hard to notice volume changes
- Quadratic: Not steep enough for 15m range
- Cubic: Perfect balance - clearly noticeable with proper 15m silence
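You can sanity-check the falloff table by running the same formula standalone:

```typescript
// Standalone copy of calculateLogarithmicGain above, used to verify
// the distance → gain table.
function cubicGain(distance: number): number {
  const minDistance = 0.5;  // full volume at 0.5m or closer
  const maxDistance = 15.0; // silent at 15m or farther (hard cutoff)
  if (distance <= minDistance) return 100;
  if (distance >= maxDistance) return 0;
  const normalized = (distance - minDistance) / (maxDistance - minDistance);
  const remaining = 1 - normalized;
  return 100 * remaining * remaining * remaining;
}

cubicGain(0.5); // 100  (full volume)
cubicGain(2);   // ≈ 72 (normal talking distance)
cubicGain(5);   // ≈ 33 (across the room)
cubicGain(10);  // ≈ 4  (barely audible)
cubicGain(15);  // 0    (hard cutoff)
```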
Smoothing: Web Audio's setTargetAtTime() handles all smoothing:
// Time constant 0.05 = ~150ms smooth transition (no clicks)
gainNode.gain.setTargetAtTime(gainValue, currentTime, 0.05);
Note: Master compressor is DISABLED to ensure gain changes are clearly audible.
Audio Stability System
Layer 1: Gain Change Threshold Filter (2.5%)
const GAIN_CHANGE_THRESHOLD = 0.025; // 2.5%
if (Math.abs(newGain - currentGain) / 100 < GAIN_CHANGE_THRESHOLD) {
return currentGain; // Ignore micro-jitter (movements ≤40cm)
}
Layer 2: SDK Position Snapping
positionSnapThreshold = 40cm
If movement < 40cm → use cached position (ignores pixel streaming jitter)
Layer 3: Web Audio Smoothing
// Gain changes are smoothed by Web Audio API directly
// 50ms time constant for smooth transitions
gainNode.gain.setTargetAtTime(gainValue, currentTime, 0.05); // 150ms smooth
stereoPanner.pan.setTargetAtTime(panValue, currentTime, 0.08); // 240ms smooth
Why simplified? The previous rate-limiting caused gain to get stuck at low values. Web Audio's built-in smoothing is sufficient and more reliable.
Enterprise-Grade Gain Smoothing
Problem Solved: With 40+ people in a room, rapid position updates (60+ Hz) caused instantaneous gain changes that created audible clicks, pops, and "pit pit" crackling noise. This was caused by setValueAtTime() creating waveform discontinuities.
Throttling = "Wait a bit before doing the same thing again" to prevent overload! 🎯
Person walking 10 meters:
Without Throttling:
||||||||||||||||||||||||||||||||||||| 600 updates
↑ Every single tiny movement = update
Result: Clicks, pops, CPU overload 😖
With Throttling (16ms):
| | | | | | | | | | 60 updates
↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
16ms gaps between updates
Result: Smooth, efficient, perfect 😎
The Solution: Intelligent Throttling + Adaptive Ramping
// OLD (Causes Clicks):
nodes.gain.gain.setValueAtTime(gainValue, currentTime); // ❌ Instant jump
// NEW (Butter Smooth):
nodes.gain.gain.cancelScheduledValues(currentTime);
nodes.gain.gain.setValueAtTime(lastGain, currentTime);
nodes.gain.gain.linearRampToValueAtTime(gainValue, currentTime + rampTime); // ✅ Smooth transition
Performance Characteristics
| Participant Count | Position Updates/sec | Throttled Updates/sec | CPU Impact | Audio Quality |
|:-----------------:|:--------------------:|:---------------------:|:----------:|:-------------:|
| 2-5               | ~300                 | ~60                   | Low        | Perfect ✅    |
| 10                | ~600                 | ~120                  | Low        | Perfect ✅    |
| 20                | ~1,200               | ~240                  | Medium     | Perfect ✅    |
| 40                | ~2,400               | ~480                  | Medium     | Perfect ✅    |
| 100               | ~6,000               | ~600                  | Medium     | Perfect ✅    |
Intelligent Throttling Logic
// Throttle: Skip update if too recent AND gain change is small
const isSignificantChange = gainDelta > 0.1; // >10% change
if (timeSinceLastUpdate < 16 && !isSignificantChange) { // 16ms ≈ one 60Hz frame
return; // Skip this update, wait for next frame
}
Key Features:
- ✅ Time-based throttling: Maximum 60Hz per participant (16ms interval)
- ✅ Significance bypass: Large changes (>10%) bypass throttle immediately
- ✅ Per-participant tracking: Each person has independent throttle state
- ✅ Standing participants: Minimal updates when not moving (saves CPU)
Adaptive Ramp Time
The system automatically adjusts ramp time based on gain change magnitude:
| Gain Change | Ramp Time | User Experience                        |
|:-----------:|:---------:|:---------------------------------------|
| < 5%        | 15ms      | Instant feel, imperceptible smoothing  |
| 5-20%       | 15-35ms   | Smooth transition, natural             |
| 20-30%      | 35-45ms   | Very smooth, no artifacts              |
| > 30%       | 50ms      | Ultra smooth, prevents any clicking    |
Formula:
rampTime = Math.min(
0.015 + (gainDelta * 0.1), // Base 15ms + scaled by change
0.050 // Max 50ms cap
);
Real-World Scenarios
| Scenario                | Behavior                            | Result                     |
|-------------------------|-------------------------------------|----------------------------|
| Person walking nearby   | Small gain changes → 15-25ms ramps  | Feels instant, zero clicks |
| Person runs past you    | Large gain changes → 40-50ms ramps  | Smooth volume sweep        |
| 40 people, 20 moving    | ~1200 updates → throttled to ~240   | Perfect audio, low CPU     |
| Person stands still     | Updates skipped entirely            | Zero CPU usage             |
| Person teleports close  | >10% change bypasses throttle       | Immediate volume update    |
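The throttle and the adaptive ramp combine as sketched below; the per-participant state handling here is our illustration, not the SDK's internals:

```typescript
// Sketch combining the intelligent throttle with adaptive ramp time.
const lastUpdateAt = new Map<string, number>();

function shouldApplyUpdate(id: string, gainDelta: number, nowMs: number): boolean {
  const last = lastUpdateAt.get(id) ?? -Infinity;
  const significant = gainDelta > 0.1;                 // >10% bypasses the throttle
  if (nowMs - last < 16 && !significant) return false; // inside the 60Hz window
  lastUpdateAt.set(id, nowMs);
  return true;
}

// Base 15ms ramp, scaled by the size of the gain change, capped at 50ms.
function rampTime(gainDelta: number): number {
  return Math.min(0.015 + gainDelta * 0.1, 0.05);
}

shouldApplyUpdate('alice', 0.02, 0);  // true: first update always applies
shouldApplyUpdate('alice', 0.02, 10); // false: 10ms later, small change
shouldApplyUpdate('alice', 0.30, 12); // true: large change bypasses throttle
rampTime(0.02); // ≈ 0.017 (17ms ramp)
rampTime(0.50); // 0.05 (capped)
```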
Error Handling & Fallback
try {
// Smooth ramping
nodes.gain.gain.linearRampToValueAtTime(gainValue, currentTime + rampTime);
} catch (err) {
// Fallback: Direct value setting (rare edge case)
console.warn(`Gain scheduling failed, using instant set:`, err);
nodes.gain.gain.value = gainValue;
}
Why This Works
Root Cause: Instantaneous gain changes create waveform discontinuities:
Old Method: New Method:
Volume Volume
↑ ↑
│ ╱╲ ╱╲ │ ╱╲ ╱╲
│ ╱ ╲ ╱ ╲ │ ╱ ╲ ╱ ╲
│ ╱ ╲╱ ╲ │ ╱ ╲╱ ╲
│ ╱ ╲ │ ╱ ╲
│ ╱ ╲ │ ╱ ╲
──┼────────────────────→ Time ──┼────────────────────→ Time
0 ← JUMP! Click here!              0 ← Smooth ramp here!
Technical Details:
- Rapid gain jumps = discontinuous waveform = audible click
- With 60Hz position updates × 40 people = 2400 potential clicks/sec
- Linear ramping = continuous waveform = zero artifacts
- Throttling reduces update frequency by ~60% (saves CPU + audio thread)
Network Resilience
Server-Side:
// Opus codec with Forward Error Correction
useinbandfec: 1 // Automatically recovers lost packets
ptime: 20 // 20ms frames for low latency
Why Non-Spatial Audio Worked Fine:
- Non-spatial audio: Single static gain value, rarely changes
- Spatial audio: Per-frame position updates = rapid gain changes
- The issue wasn't network - it was rapid gain value changes in Web Audio API
🎛️ Audio Processing Settings
Design Goal: Crystal clear voice with no echo, pumping, or bathroom effect.
🔊 Master Compressor
| Setting | Value | Purpose |
|:--------------|:---------:|:-------------------------------------|
| Threshold | -18 dB | Only compress loud peaks |
| Knee | 40 dB | Soft knee for natural sound |
| Ratio | 3:1 | Gentle compression, no pumping |
| Attack | 10 ms | Fast enough to catch peaks |
| Release | 150 ms | Fast release prevents echo tail |
| Master Gain | 1.0 | Unity gain for clean signal |
🎚️ Filter Chain
| Filter | Frequency | Q Value | Purpose |
|:----------------|:-----------:|:-------:|:--------------------------------|
| Highpass | 100 Hz | 0.5 | Remove room boom/rumble |
| Lowpass | 10 kHz | 0.5 | Open sound, no ringing |
| Voice Boost | 180 Hz | 0.5 | ❌ Disabled (prevents echo) |
| Dynamic Lowpass | 12 kHz | 0.5 | Natural treble preservation |
🛡️ Per-Participant Limiter
| Setting | Value | Purpose |
|:-----------|:---------:|:--------------------------------------|
| Threshold | -6 dB | Only activate near clipping |
| Knee | 3 dB | Hard knee = true limiter |
| Ratio | 20:1 | High ratio catches peaks cleanly |
| Attack | 1 ms | Ultra-fast peak catching |
| Release | 50 ms | Fast release = no pumping |
🎤 Denoiser (ML — GRU ScriptProcessorNode)
| Parameter | Value | Purpose |
|:------------------|:-------------------------:|:--------------------------------------------------|
| Model | odyssey_adaptive_denoiser | 3-layer GRU, UINT8 quantized TF.js |
| Params | 872,448 | Trained 100 epochs, val_loss=0.1636 |
| Buffer size | 4096 samples (~85ms) | ScriptProcessorNode synchronous processing |
| Backend | WebGL (GPU) | Reported at load: [MLNoiseSuppressor] TF.js backend ready: webgl |
| Pass-through | Yes (while loading) | Audio unaffected until model is ready |
| Normalization | mean=0.3953, std=0.1442 | Stats loaded from normalization_stats.json |
🔗 Audio Chain
┌──────────────────────────────────────────────────────────────────────────────────────────┐
│ AUDIO PROCESSING CHAIN │
├──────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ MediaStream ML Denoiser Per-Participant Spatial Master │
│ Source → (GRU model) → Limiter → Filters → Panner → Compressor │
│ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ [WebRTC] [ScriptProcessor [Peak Catch] [HP 100Hz] [Stereo [3:1 Ratio] │
│ Track 872K GRU model] [-6dB] LP 10kHz] L/R Pan] Output │
│ (pass-through │
│ while loading) │
└──────────────────────────────────────────────────────────────────────────────────────────┘
Detailed Chain:
Source → MLScriptProcessor (GRU denoiser) → Limiter → HighPass(100Hz) → VoiceBand → LowPass(10kHz) →
DynamicLP(12kHz) → MonoDownmix → StereoUpmix → StereoPanner → Gain → MasterCompressor → Output
Spatial Audio Flowchart
┌─────────────────────────────────────────────────────────────────────────────┐
│ SPATIAL AUDIO PIPELINE │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────┐ ┌──────────────────┐
│ LISTENER DATA │ │ SPEAKER DATA │
│ pos, rot (yaw) │ │ pos (x, y, z) │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ │
┌──────────────────┐ │
│ Calculate Right │ │
│ Vector │ │
│ cos(yaw), -sin() │ │
└────────┬─────────┘ │
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ VECTOR TO SPEAKER │
│ vecToSource = speakerPos - listenerPos │
└──────────────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ DOT PRODUCT (Projection) │
│ dxLocal = vecToSource · listenerRight │
│ (positive = RIGHT, negative = LEFT) │
└──────────────────────┬──────────────────────┘
│
┌────────────┴────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ NORMALIZE PAN │ │ CALCULATE DIST │
│ sin(atan2(dx,dz))│ │ dist = |vecTo| │
│ range: -1 to +1 │ │ 0.5m → 15m range │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ ANTI-JITTER │ │ EXPONENTIAL GAIN │
│ - Threshold 2.5% │ │ gain = (1-norm)³ │
│ - EMA 70% │ │ × 100% │
│ - Ramp 80ms │ │ 0% at 15m │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ StereoPanner │ │ GainNode │
│ L/R balance │ │ volume │
└────────┬─────────┘ └────────┬─────────┘
│ │
└──────────┬──────────────┘
▼
┌──────────────────┐
│ AUDIO OUTPUT │
│ (headphones) │
└──────────────────┘
360° Spatial Audio Diagram (Top View)
0° (Front)
L100% R100%
↑
│
315° │ 45°
L100% R58% │ L58% R100%
↖ │ ↗
↖ │ ↗
↖ │ ↗
↖ │ ↗
270° ←─────────────── 🎧 ───────────────→ 90°
(Left) Listener (Right)
L100% R40% yaw=0° L40% R100%
↙ │ ↘
↙ │ ↘
↙ │ ↘
↙ │ ↘
L100% R58% │ L58% R100%
225° │ 135°
│
↓
L100% R100%
180° (Behind)
Legend:
- 🎧 = Listener at origin, facing 0° (forward)
- Angles = Speaker position around listener
- L/R % = Left/Right ear volume for speaker at that position
Configuration
| Parameter | Value | Description |
|------------------------|--------|-----------------------------|
| positionPanRadius | 5.0m | Distance for full L/R pan |
| nearDistance | 0.5m | Full gain threshold |
| farDistance | 10.0m | Silence threshold |
| panSmoothingFactor | 0.5 | Normal smoothing |
| panChangeThreshold | 0.02 | Jitter ignore threshold |
| panRampTime | 0.15s | Audio transition time |
| headHeight | 1.6m | Added to body Y |
Console Logs Reference
// Mediasoup server URL being used
[Odyssey] Connecting to MediaSoup server: https://...
// ML model loading
[MLNoiseSuppressor] Initializing TF.js backend: webgl
[MLNoiseSuppressor] Model loaded — 872,448 params | backend: webgl
[Odyssey] ML Noise Suppression loaded and active
// ML model failure (audio still works — pass-through mode)
[Odyssey] ML Noise Suppression failed to load: <error>
// ML active per participant
[SpatialAudioChannel] ML noise suppression ACTIVE — model loaded from <url>
// Listener position update
📍 [SDK Listener] pos=(x, y, z) rot=(pitch, yaw, roll)
// Speaker position received
🎧 [SDK Rx] <id> bodyPos=(x, y, z) rot=(pitch, yaw, roll)
// Spatial audio calculation
🎧 SPATIAL AUDIO [<id>] dist=Xm dxLocal=Xm rawPan=X smoothPan=X pan(L=X%,R=X%) gain=X% listenerRight=(x,z) vecToSrc=(x,z)
Server Contract (Socket.IO Events)
| Event | Direction | Payload |
|----------------------------------|------------------|-----------------------------------------------------------------------------|
| join-room | client → server | {roomId, userId, deviceId, position, direction} |
| room-joined | server → client | RoomJoinedData (router caps, participants snapshot) |
| update-position | client → server | {participantId, conferenceId, position, direction, rot, cameraDistance} |
| participant-position-updated | server → client | {participantId, position, direction, rot, mediaState, pan} |
| consumer-created | server → client | {participantId, track(kind), position, direction, appData} |
| participant-media-state-updated| server → client | {participantId, mediaState} |
| all-participants-update | server → client | {roomId, participants[]} |
| new-participant | server → client | {participantId, userId, position, direction} |
| participant-left | server → client | {participantId} |
Position Data Types (Critical for Spatial Audio)
The SDK sends three separate data types to the server for accurate spatial audio:
| Data Type | Structure | Description |
|--------------|----------------------------------|---------------------------------------------------------------|
| position | {x, y, z} in meters | World coordinates - WHERE the player is located |
| direction | {x, y, z} normalized vector | Forward direction - which way the player is LOOKING (unit vector) |
| rot | {x, y, z} in degrees | Euler rotation angles - pitch(x), yaw(y), roll(z) |
IMPORTANT: rot.y (yaw) is critical for spatial audio left/right ear calculation:
- The listener's yaw determines their ear orientation
listenerRight = { x: cos(yaw), z: -sin(yaw) }
- Speakers are panned based on their position projected onto the listener's right axis
// Frontend sends all 3 data types:
sdk.updatePosition(position, direction, {
rot, // Rotation angles (pitch, yaw, roll) in degrees
cameraDistance,
screenPos,
});
// Server broadcasts to other clients:
socket.emit("participant-position-updated", {
position, // World coordinates
direction, // Forward vector
rot, // Rotation angles - yaw used for L/R audio
...
});
Noise-Cancellation Stack (What's Included)
| Layer | Purpose |
|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| Adaptive denoiser worklet | Learns each participant's noise floor in real time, applies multi-band downward expander plus dynamic low/high-pass shaping |
| speechBoost | Lifts the low/mid band only when speech confidence is high, keeping consonants bright without reintroducing floor noise |
| highBandGate | Clamps constant fan hiss in the 4–12 kHz band whenever speechPresence is low |
| Silence gate | If energy stays below silenceFloor for a configurable hold window, the track ramps to true silence and wakes instantly when voice returns |
| Classic filters | Fixed high-pass (80 Hz) and low-pass (8 kHz) filters shave off rumble and hiss before signals reach the panner |
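The silence gate's hold-then-release behavior can be sketched like this. The class and its logic are assumptions for illustration (constants mirror the configuration example below), not the SDK's actual worklet code:

```typescript
// Hypothetical sketch of the silence gate described above: gate opens (gain 1)
// while voice is present, and closes (gain 0) only after energy has stayed
// below silenceFloor for the full hold window.
class SilenceGate {
  private belowMs = 0;

  constructor(
    private silenceFloor = 0.00075, // matches the config example's silenceFloor
    private holdMs = 520,           // matches the config example's silenceHoldMs
  ) {}

  // Returns a gain multiplier for the current audio block.
  process(rmsEnergy: number, blockMs: number): number {
    if (rmsEnergy < this.silenceFloor) {
      this.belowMs += blockMs;
      // Past the hold window: ramp to true silence
      return this.belowMs >= this.holdMs ? 0 : 1;
    }
    this.belowMs = 0; // voice returned — wake instantly
    return 1;
  }
}
```

A real implementation would ramp the gain over silenceReleaseMs rather than switching hard; the hard switch here just keeps the sketch short.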
Configuration example:
```javascript
const sdk = SpatialCommsSDK.create(serverUrl, {
  denoiser: {
    threshold: 0.008,
    maxReduction: 0.88,
    hissCut: 0.52,
    holdMs: 260,
    voiceBoost: 0.65,
    voiceSensitivity: 0.33,
    voiceEnhancement: true,
    silenceFloor: 0.00075,
    silenceHoldMs: 520,
    silenceReleaseMs: 160,
    speechBoost: 0.35,
    highBandGate: 0.7,
    highBandAttack: 0.25,
    highBandRelease: 0.12,
  },
});
```
How Spatial Audio Is Built
| Step | Description |
|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. Telemetry ingestion | Each LSD packet is passed through setListenerFromLSD(listenerPos, cameraPos, lookAtPos, rot) so the Web Audio listener matches the player's real head/camera pose |
| 2. Per-participant graph | When consumer-created yields a remote audio track, setupSpatialAudioForParticipant() spins up: Source → Compressor → Denoiser → HP → LP → StereoPanner → Gain |
| 3. Position updates | Every participant-position-updated event calls updateSpatialAudio(participantId, position, rot). Position feeds panning, rot provides listener's yaw |
| 4. Distance-aware gain | The manager computes Euclidean distance to each remote participant and applies inverse distance law with exponential falloff (0.5m–15m range) for more perceptible volume changes |
| 5. Anti-jitter smoothing | 3-layer system: threshold filter (0.02), EMA smoothing (0.5), SNAP behavior (0.2 for direction changes) |
| 6. Left/right rendering | StereoPannerNode outputs processed signal with accurate L/R separation based on position projection |
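The distance-aware gain in step 4 can be sketched as an inverse distance law with an exponential falloff over the 0.5 m–15 m range. The function name and exact curve constants below are assumptions, not the SDK's actual values:

```typescript
// Illustrative sketch of step 4: gain is 1.0 at the minimum range and
// decays with distance via inverse-distance * exponential falloff.
function distanceGain(distance: number, min = 0.5, max = 15): number {
  const d = Math.min(Math.max(distance, min), max); // clamp to the audible range
  const inv = min / d;                              // inverse distance law
  const falloff = Math.exp(-(d - min) / max);       // extra exponential falloff
  return inv * falloff;
}
```

The extra exponential term is what makes volume changes "more perceptible" than a pure inverse law, which flattens out quickly at mid distances.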
Integration Checklist
- [ ] Instantiate once per page/tab and keep it in a store (Vuex, Redux, Zustand, etc.)
- [ ] Pipe LSD/Lap data from your rendering engine into `updatePosition()` + `setListenerFromLSD()` at ~10 Hz
- [ ] Render videos muted – never attach remote audio tracks straight to the DOM; let `SpatialAudioManager` own playback
- [ ] Resume audio context – call `sdk.resumeAudio()` on first user interaction (required by browsers)
- [ ] Handle `consumer-created` – attach video tracks to the UI; audio is handled automatically by spatial audio
- [ ] Monitor logs – the browser console shows `🎧 SDK`, `📍 SDK`, and `🎚️ [Spatial Audio]` statements for every critical hop
- [ ] Push avatar telemetry back to Unreal so `remoteSpatialData` can render minimaps/circles
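To keep position updates near the ~10 Hz the checklist suggests, a small throttle helper works well. This is a generic utility sketch, not something the SDK ships:

```typescript
// Generic throttle: invokes fn at most once per intervalMs, dropping
// intermediate calls. Useful for rate-limiting updatePosition() to ~10 Hz.
function throttle<T extends unknown[]>(
  fn: (...args: T) => void,
  intervalMs: number,
): (...args: T) => void {
  let last = 0;
  return (...args: T) => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      fn(...args);
    }
  };
}

// Hypothetical usage with the SDK:
// const sendPos = throttle(sdk.updatePosition.bind(sdk), 100); // ~10 Hz
// renderLoop(() => sendPos(position, direction, { rot }));
```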
Core Modules
| File | Purpose |
|-----------------------------|--------------------------------------------------------------------------------------|
| src/index.ts | SpatialCommsSDK.create() – socket lifecycle, producers/consumers, event surface |
| src/core/MediasoupManager.ts | Transport helpers for produce/consume/resume |
| src/channels/spatial/SpatialAudioChannel.ts | Web Audio orchestration (listener transforms, per-participant chains, ML denoiser node) |
| src/audio/MLNoiseSuppressor.ts | TensorFlow.js GRU denoiser — odyssey_adaptive_denoiser model, 872K params, val_loss=0.1636 |
| src/core/EventManager.ts | Lightweight EventEmitter used by the entire SDK |
| src/types/index.ts | TypeScript interfaces for Position, Direction, Participant, MediaState, etc. |
Development Tips
- Run `npm install && npm run build` inside `odyssey-mediasoup-sdk` to publish a fresh build
- Use `npm run dev` while iterating so TypeScript outputs live under `dist/`
- The SDK targets evergreen browsers; Safari < 16.4 needs WebGL support for TF.js (all modern Safari versions have this)
- Have questions or want to extend the SDK? Start with `SpatialAudioManager` – that's where most of the "real-world" behavior (distance feel, stereo cues, denoiser) lives
- ML Noise Suppression: Initialized automatically at SDK startup using the `odyssey_adaptive_denoiser` model from `public/odyssey_adaptive_denoiser/model.json`. No manual call needed. Watch the browser console for `[Odyssey] ML Noise Suppression loaded and active`
Development
```bash
# Install dependencies
npm install

# Build
npm run build

# Watch mode
npm run dev

# Type check only
npm run typecheck
```
Related Documentation
- HEAD_POSITION_DATA_FLOW.md – Detailed panning algorithm with 360° tables
- SPATIAL_AUDIO_IMPLEMENTATION.md – Implementation summary with examples
- audio-position-wisecalulation.md – Position-wise calculation reference
