Odyssey Audio/Video SDK (MediaSoup + Web Audio)
This package exposes `OdysseySpatialComms`, a thin TypeScript client that glues together:
- MediaSoup SFU for ultra-low-latency audio/video routing
- Web Audio API for Apple-like spatial mixing via `SpatialAudioManager`
- Socket telemetry (position + direction) so every browser hears/sees everyone exactly where they are in the 3D world
It mirrors the production SDK used by Odyssey V2 and ships ready-to-drop into any Web UI (Vue, React, plain JS).
Feature Highlights
- 🔌 One class to rule it all – `OdysseySpatialComms` wires transports, producers, consumers, and room state.
- 🧭 Accurate pose propagation – `updatePosition()` streams listener pose to the SFU while `participant-position-updated` keeps the local store in sync.
- 🎧 Studio-grade spatial audio – each remote participant gets a dedicated Web Audio graph: denoiser → high-pass → low-pass → HRTF `PannerNode` → adaptive gain → master compressor. Uses the Web Audio API's HRTF panning model for accurate left/right/front/back positioning based on distance and direction, with custom AudioWorklet processors for noise cancellation and voice tuning.
- 🎥 Camera-ready streams – video tracks are exposed separately so UI layers can render muted `<video>` tags while audio stays inside Web Audio.
- 🔁 EventEmitter contract – subscribe to `room-joined`, `consumer-created`, `participant-position-updated`, etc., without touching Socket.IO directly.
Quick Start
```ts
import {
  OdysseySpatialComms,
  Direction,
  Position,
} from "@newgameplusinc/odyssey-official-audio-video-sdk";

const sdk = new OdysseySpatialComms("https://mediasoup-server.example.com");

// 1) Join a room
await sdk.joinRoom({
  roomId: "demo-room",
  userId: "user-123",
  deviceId: "device-123",
  position: { x: 0, y: 0, z: 0 },
  direction: { x: 0, y: 1, z: 0 },
});

// 2) Produce local media
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
for (const track of stream.getTracks()) {
  await sdk.produceTrack(track);
}

// 3) Handle remote tracks
sdk.on("consumer-created", async ({ participant, track }) => {
  if (track.kind === "video") {
    attachVideo(track, participant.participantId);
  }
});

// 4) Keep spatial audio honest
sdk.updatePosition(currentPos, currentDir);
sdk.setListenerFromLSD(listenerPos, cameraPos, lookAtPos);
```

Audio Flow (Server ↔ Browser)
```
┌──────────────┐  update-position   ┌──────────────┐  pose + tracks  ┌──────────────────┐
│ Browser LSD  │ ─────────────────▶ │ MediaSoup SFU│ ──────────────▶ │  SDK Event Bus   │
│ (Unreal data)│                    │ + Socket.IO  │                 │  (EventManager)  │
└──────┬───────┘                    └──────┬───────┘                 └─────────┬────────┘
       │                                   │                          track + pose
       │                                   │                                  ▼
       │                          ┌────────▼────────┐             ┌──────────────────┐
       │ audio RTP                │ consumer-created│             │ SpatialAudioMgr  │
       └─────────────────────────▶│ setup per-user  │◀────────────│ (Web Audio API)  │
                                  └────────┬────────┘             │  - Denoiser      │
                                           │                      │  - HP / LP       │
                                           │                      │  - HRTF Panner   │
                                           ▼                      │  - Gain + Comp   │
                                   Web Audio Graph                └─────────┬────────┘
                                           │                                │
                                           ▼                                ▼
                              Listener ears (Left/Right)              System Output
```

Web Audio Algorithms
- Coordinate normalization – Unreal sends centimeters; `SpatialAudioManager` auto-detects large values and converts to meters once.
- Orientation math – `setListenerFromLSD()` builds forward/right/up vectors from camera/LookAt to keep the listener aligned with head movement.
- Dynamic distance gain – `updateSpatialAudio()` measures distance from listener → source and applies a smooth rolloff curve, so distant avatars fade to silence (see the sketch after this list).
- Noise handling – the AudioWorklet denoiser now runs an adaptive multi-band gate (per W3C AudioWorklet guidance) before the high/low-pass filters, stripping constant HVAC/fan noise even when the speaker is close. A newly added silence gate mutes tracks entirely after ~250 ms of sub-noise-floor energy, eliminating hiss during dead air without touching spatial cues.
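The exact curve lives in `SpatialAudioManager`; as a rough sketch of the normalization + rolloff idea (the threshold and radii below are illustrative constants, not the SDK's real values):

```ts
// Illustrative sketch only – CM_THRESHOLD, REF_DISTANCE, and MAX_DISTANCE
// are hypothetical names; the shipped rolloff lives in SpatialAudioManager.
const CM_THRESHOLD = 100; // coordinates larger than this are assumed to be centimeters
const REF_DISTANCE = 1;   // meters: full gain inside this radius
const MAX_DISTANCE = 30;  // meters: fades to silence beyond this radius

type Vec3 = { x: number; y: number; z: number };

function toMeters(p: Vec3): Vec3 {
  // Unreal sends centimeters; auto-detect large values and convert once.
  const scale =
    Math.max(Math.abs(p.x), Math.abs(p.y), Math.abs(p.z)) > CM_THRESHOLD ? 0.01 : 1;
  return { x: p.x * scale, y: p.y * scale, z: p.z * scale };
}

function rolloffGain(listener: Vec3, source: Vec3): number {
  const a = toMeters(listener);
  const b = toMeters(source);
  const d = Math.hypot(b.x - a.x, b.y - a.y, b.z - a.z);
  if (d <= REF_DISTANCE) return 1;
  if (d >= MAX_DISTANCE) return 0;
  // Smoothstep between the reference and max radii for a gentle fade.
  const t = (d - REF_DISTANCE) / (MAX_DISTANCE - REF_DISTANCE);
  return 1 - t * t * (3 - 2 * t);
}
```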
Noise-Cancellation Stack (What’s Included)
- Adaptive denoiser worklet – learns each participant's noise floor in real time, then applies a multi-band downward expander plus dynamic low/high-pass shaping. `speechBoost` lifts the low/mid band only when speech confidence is high, keeping consonants bright without reintroducing floor noise; `highBandGate` + `highBandAttack`/`highBandRelease` clamp constant fan hiss in the 4–12 kHz band whenever `speechPresence` is low, so background whoosh never leaks through live mics.
- Optional voice enhancement – autocorrelation-derived confidence (inspired by the tuner article) can raise the reduction floor when speech is present to keep vocals bright.
- Silence gate – if energy stays below `silenceFloor` for a configurable hold window, the track ramps to true silence, then wakes instantly once voice energy returns.
- Classic filters – fixed high-pass/low-pass filters shave off rumble and hiss before signals reach the HRTF panner.
These layers run entirely in Web Audio, so you can ship “AirPods-style” background rejection in any browser without native code.
```ts
const sdk = new OdysseySpatialComms(serverUrl, {
  denoiser: {
    threshold: 0.008,
    maxReduction: 0.88,
    hissCut: 0.52,
    holdMs: 260,
    voiceBoost: 0.65,
    voiceSensitivity: 0.33,
    voiceEnhancement: true,
    silenceFloor: 0.00075,
    silenceHoldMs: 520,
    silenceReleaseMs: 160,
    speechBoost: 0.35,
    highBandGate: 0.7,
    highBandAttack: 0.25,
    highBandRelease: 0.12,
  },
});
```
Voice enhancement (autocorrelation-based speech detection) is off by default to keep the gate extra quiet; enable it when you want brighter close-talk voicing. Tweak `silenceFloor` / `silenceHoldMs` if you need either more aggressive hiss removal or softer gating.
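For intuition, here is a minimal `AudioWorkletProcessor` implementing just the silence-gate layer; the shipped worklet also does the multi-band expansion and speech detection, and every constant below is illustrative:

```ts
// silence-gate-processor.ts – a minimal sketch of the silence-gate layer only.
// The shipped worklet also does multi-band expansion and speech detection;
// all constants here are illustrative, not the SDK's real defaults.
class SilenceGateProcessor extends AudioWorkletProcessor {
  private belowFloorSamples = 0;
  private gain = 1;

  process(inputs: Float32Array[][], outputs: Float32Array[][]): boolean {
    const input = inputs[0]?.[0];
    const output = outputs[0]?.[0];
    if (!input || !output) return true;

    // RMS energy of this 128-sample render quantum.
    let sum = 0;
    for (let i = 0; i < input.length; i++) sum += input[i] * input[i];
    const rms = Math.sqrt(sum / input.length);

    const silenceFloor = 0.00075;                  // cf. silenceFloor above
    const holdSamples = (520 / 1000) * sampleRate; // cf. silenceHoldMs above
    this.belowFloorSamples =
      rms < silenceFloor ? this.belowFloorSamples + input.length : 0;

    if (this.belowFloorSamples < holdSamples) {
      this.gain = 1;        // instant wake: pass speech through untouched
      output.set(input);
    } else {
      for (let i = 0; i < input.length; i++) {
        this.gain *= 0.995; // smooth ramp toward true silence
        output[i] = input[i] * this.gain;
      }
    }
    return true;
  }
}

registerProcessor("silence-gate", SilenceGateProcessor);
```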
How Spatial Audio Is Built
- Telemetry ingestion – each LSD packet is passed through `setListenerFromLSD(listenerPos, cameraPos, lookAtPos)` so the Web Audio listener matches the player's real head/camera pose.
- Per-participant node graph – when `consumer-created` yields a remote audio track, `setupSpatialAudioForParticipant()` spins up an isolated graph: `MediaStreamSource → (optional) Denoiser Worklet → High-Pass → Low-Pass → Panner (HRTF) → Gain → Master Compressor` (see the sketch after this list).
- Position + direction updates – every `participant-position-updated` event calls `updateSpatialAudio(participantId, position, direction)`. The position feeds the panner's XYZ, while the direction vector sets the source orientation so voices project forward relative to avatar facing.
- Distance-aware gain – the manager stores the latest listener pose and computes the Euclidean distance to each remote participant on every update. A custom rolloff curve adjusts gain before the compressor, giving the "someone on my left / far away" perception without blowing out master levels.
- Left/right rendering – because the panner uses `panningModel = "HRTF"`, browsers feed the processed signal into the user's audio hardware with head-related transfer functions, producing natural interaural time/intensity differences.
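In plain Web Audio terms, the per-participant chain looks roughly like this (filter cutoffs and distance settings are illustrative; the real graph also splices in the denoiser worklet):

```ts
// Sketch of the per-participant chain using standard Web Audio nodes.
// The real setupSpatialAudioForParticipant() also inserts the denoiser worklet,
// and the 80 Hz / 12 kHz cutoffs here are assumptions, not the SDK's values.
function buildSpatialChain(ctx: AudioContext, stream: MediaStream): PannerNode {
  const source = ctx.createMediaStreamSource(stream);

  const highPass = new BiquadFilterNode(ctx, { type: "highpass", frequency: 80 });  // shave rumble
  const lowPass = new BiquadFilterNode(ctx, { type: "lowpass", frequency: 12000 }); // shave hiss

  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",     // head-related transfer functions for 3D cues
    distanceModel: "inverse",
    refDistance: 1,
  });

  const gain = new GainNode(ctx, { gain: 1 });        // driven by the custom rolloff
  const compressor = new DynamicsCompressorNode(ctx); // master level safety

  source
    .connect(highPass)
    .connect(lowPass)
    .connect(panner)
    .connect(gain)
    .connect(compressor)
    .connect(ctx.destination);

  return panner; // positionX/Y/Z + orientationX/Y/Z updated on each pose packet
}
```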
Video Flow (Capture ↔ Rendering)
```
┌──────────────┐  produceTrack   ┌──────────────┐   RTP   ┌──────────────┐
│ getUserMedia │ ──────────────▶ │ MediaSoup SDK│ ──────▶ │ MediaSoup SFU│
└──────┬───────┘                 │  (Odyssey)   │         └──────┬───────┘
       │                         └──────┬───────┘                │
       │     consumer-created           │ track                  │
       ▼                                ▼                        │
┌──────────────┐                 ┌───────────────┐               │
│ Vue/React UI │ ◀────────────── │ SDK Event Bus │ ◀─────────────┘
│ (muted video │                 │ exposes media │
│  elements)   │                 │    tracks     │
└──────────────┘                 └───────────────┘
```

Core Classes
- `src/index.ts` – `OdysseySpatialComms` (socket lifecycle, producers/consumers, event surface).
- `src/MediasoupManager.ts` – transport helpers for produce/consume/resume.
- `src/SpatialAudioManager.ts` – Web Audio orchestration (listener transforms, per-participant chains, denoiser, distance math).
- `src/EventManager.ts` – lightweight EventEmitter used by the entire SDK.
Integration Checklist
- Instantiate once per page/tab and keep it in a store (Vuex, Redux, Zustand, etc.).
- Pipe LSD/Lap data from your rendering engine into `updatePosition()` + `setListenerFromLSD()` at ~10 Hz (see the sketch after this list).
- Render videos muted – never attach remote audio tracks straight to the DOM; let `SpatialAudioManager` own playback.
- Push avatar telemetry back to Unreal so `remoteSpatialData` can render minimaps/circles (see Odyssey V2 `sendMediaSoupParticipantsToUnreal`).
- Monitor logs – the browser console shows `🎧 SDK`, `📍 SDK`, and `🎚️ [Spatial Audio]` statements for every critical hop.
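A minimal way to satisfy the second checklist item, assuming a hypothetical `getEnginePose()` helper that surfaces your renderer's latest LSD data:

```ts
// getEnginePose() is hypothetical – replace with however your engine exposes LSD packets.
declare function getEnginePose(): {
  position: Position;
  direction: Direction;
  cameraPos: Position;
  lookAtPos: Position;
};

// 100 ms interval ≈ the ~10 Hz cadence recommended above.
const poseTimer = setInterval(() => {
  const pose = getEnginePose();
  sdk.updatePosition(pose.position, pose.direction);
  sdk.setListenerFromLSD(pose.position, pose.cameraPos, pose.lookAtPos);
}, 100);

// Remember to clearInterval(poseTimer) when the user leaves the room.
```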
Server Contract (Socket.IO events)
| Event | Direction | Payload |
|-------|-----------|---------|
| `join-room` | client → server | `{ roomId, userId, deviceId, position, direction }` |
| `room-joined` | server → client | `RoomJoinedData` (router caps, participants snapshot) |
| `update-position` | client → server | `{ participantId, conferenceId, position, direction }` |
| `participant-position-updated` | server → client | `{ participantId, position, direction, mediaState }` |
| `consumer-created` | server → client | `{ participantId, track (kind), position, direction }` |
| `participant-media-state-updated` | server → client | `{ participantId, mediaState }` |
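For reference, the payloads written out as TypeScript, inferred from the table and the Quick Start (the `mediaState` shape is an assumption, not the SDK's published typings):

```ts
// Inferred from the event table – not the SDK's published .d.ts.
interface Position { x: number; y: number; z: number }
interface Direction { x: number; y: number; z: number }

interface JoinRoomPayload {
  roomId: string;
  userId: string;
  deviceId: string;
  position: Position;
  direction: Direction;
}

interface ParticipantPositionUpdated {
  participantId: string;
  position: Position;
  direction: Direction;
  mediaState?: { audio: boolean; video: boolean }; // assumed shape
}
```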
Development Tips
- Run `pnpm install && pnpm build` inside `mediasoup-sdk-test` to publish a fresh build.
- Use `pnpm watch` while iterating so the TypeScript output under `dist/` refreshes live.
- The SDK targets evergreen browsers; for Safari < 16.4 you may need to polyfill AudioWorklet or disable the denoiser via `new SpatialAudioManager({ denoiser: { enabled: false } })`.
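A sketch of that Safari fallback using plain feature detection (the constructor call mirrors the tip above; the detection wiring itself is an assumption about your setup):

```ts
// Feature-detect AudioWorklet before enabling the denoiser; per the tip
// above, Safari < 16.4 may need a polyfill or this disabled fallback.
const hasAudioWorklet =
  typeof AudioContext !== "undefined" && "audioWorklet" in AudioContext.prototype;

const spatialAudio = new SpatialAudioManager({
  denoiser: { enabled: hasAudioWorklet },
});
```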
Have questions or want to extend the SDK? Start with `SpatialAudioManager` – that’s where most of the “real-world” behavior (distance feel, stereo cues, denoiser) lives.
