browser-voice
v0.1.0
Browser-first voice capture, playback, and WebSocket transport for realtime apps.
This package is a small TypeScript library for:
- capturing microphone audio in the browser
- sending audio over plain WebSocket in configurable formats
- receiving remote audio over plain WebSocket in configurable formats
- playing remote audio with Web Audio controls
- exchanging custom JSON/control events alongside audio
It is intentionally simpler than a full realtime media SDK:
- no room / participant / publication model
- no WebRTC signaling
- no SFU assumptions
- no vendor-specific backend contract
Status
Experimental, but usable.
The API is designed to stay small and explicit. Expect additive evolution while the transport and browser-compatibility edges continue to harden.
Inspiration
This library is inspired by, and honestly vibe-coded from, LiveKit's WebSocket / voice implementation patterns, especially the browser audio, transport, and recovery ideas in livekit-client.
It is not a port of LiveKit, and it does not depend on LiveKit at runtime.
Attribution and license notes
This package is licensed under Apache-2.0.
Because the implementation is heavily inspired by, and in a few places adapted from, upstream open-source work, the package also includes:
- NOTICE
- CREDITS.md
- THIRD_PARTY_NOTICES.md
These files document upstream attributions and third-party notices, including:
- LiveKit client SDK inspiration / Apache-2.0 notice context
- ts-debounce MIT attribution for the adapted debounce helper
Why this package exists
Use browser-voice when you want browser voice behavior inspired by larger realtime SDKs without adopting a full room/signaling model.
Typical use cases:
- browser client -> custom .NET / Node / Python voice backend over WebSocket
- raw PCM16 or JSON audio payloads instead of WebRTC
- custom control events mixed with audio on the same socket
- Azure / OpenAI / bespoke realtime backends
Features
- microphone capture defaults tuned for voice
- autoplay-safe AudioContext handling
- explicit startAudio() playback unlock flow
- optional pre-connect audio buffering
- automatic capture recovery for ended / missing microphone tracks
- debounced media-device observation
- configurable incoming and outgoing audio formats
- reconnect backoff for plain WebSocket sessions
- backpressure-aware frame queueing
- optional capture-side noise suppression processor
- voice activity tracking
- remote playback analyser for visualizers
- remote playback gain / EQ / limiter hooks
- custom JSON event send / receive support
Install
npm install browser-voice tslib
or
pnpm add browser-voice tslib
Runtime requirements
- modern browser with:
  - MediaDevices.getUserMedia
  - WebSocket
  - AudioContext
- Node.js >= 20.19.0 for package development, docs generation, and demo tooling
This is a browser-first library. It is not intended to capture or play audio directly in Node.js.
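A capability check before constructing any objects can make an unsupported environment fail early. The sketch below is not part of browser-voice: `missingVoiceFeatures` and `GlobalLike` are hypothetical names, and the helper takes a global-like object so it can also be exercised outside a browser.

```typescript
// Hypothetical helper, not a browser-voice export: reports which of the
// browser APIs listed above are missing from a global-like object.
interface GlobalLike {
  WebSocket?: unknown;
  AudioContext?: unknown;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

function missingVoiceFeatures(scope: GlobalLike): string[] {
  const missing: string[] = [];
  if (typeof scope.navigator?.mediaDevices?.getUserMedia !== 'function') {
    missing.push('MediaDevices.getUserMedia');
  }
  if (typeof scope.WebSocket !== 'function') {
    missing.push('WebSocket');
  }
  if (typeof scope.AudioContext !== 'function') {
    missing.push('AudioContext');
  }
  return missing;
}
```

In a page, passing `window as unknown as GlobalLike` and checking for an empty result before creating a capture or player is one way to use it.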
Quick start
import {
  NoiseSuppressionProcessor,
  PcmAudioPlayer,
  VoiceCapture,
  VoiceWebSocket,
} from 'browser-voice';

const capture = new VoiceCapture({
  autoRecover: true,
  preConnectBufferMs: 1500,
  processor: new NoiseSuppressionProcessor(),
  targetSampleRate: 24000,
  targetChannelCount: 1,
});

const player = new PcmAudioPlayer({
  initialBufferMs: 120,
});

const voiceSocket = new VoiceWebSocket({
  url: 'wss://example.com/voice',
  capture,
  player,
  autoReconnect: true,
  incomingAudioFormat: 'raw-pcm16',
  incomingAudioFormatOptions: {
    sampleRate: 24000,
    channels: 1,
  },
  outgoingAudioFormat: 'raw-pcm16',
  onJsonEvent: ({ parsed }) => {
    console.log('control event', parsed);
  },
});

await voiceSocket.connect();
await player.startAudio();
await capture.start();

voiceSocket.sendJsonEvent({
  type: 'ping',
  timestamp: Date.now(),
});

Transport formats
Incoming audio formats
VoiceWebSocket supports:
- framed-pcm16
- raw-pcm16
- json
- auto
Outgoing audio formats
VoiceWebSocket supports:
- framed-pcm16
- raw-pcm16
- json
Default framed PCM layout
The default framed-pcm16 binary format is:
- 4 bytes: sample rate (uint32, little-endian)
- 2 bytes: channel count (uint16, little-endian)
- 2 bytes: reserved flags (uint16, little-endian)
- remaining bytes: signed PCM16 payload
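For backends that need to produce or parse this layout, the header can be read and written with a `DataView`. The following is a minimal sketch of the layout above, not the library's own encoder: the function and type names are illustrative, and it assumes the PCM16 payload samples are little-endian like the header fields, which the layout description does not state.

```typescript
// Illustrative names, not browser-voice exports.
const FRAME_HEADER_BYTES = 8;

interface Pcm16Frame {
  sampleRate: number;
  channels: number;
  flags: number;
  samples: Int16Array;
}

function encodeFramedPcm16(frame: Pcm16Frame): ArrayBuffer {
  const buffer = new ArrayBuffer(FRAME_HEADER_BYTES + frame.samples.length * 2);
  const view = new DataView(buffer);
  view.setUint32(0, frame.sampleRate, true); // bytes 0-3: sample rate
  view.setUint16(4, frame.channels, true); // bytes 4-5: channel count
  view.setUint16(6, frame.flags, true); // bytes 6-7: reserved flags
  for (let i = 0; i < frame.samples.length; i++) {
    // Assumption: payload samples are little-endian.
    view.setInt16(FRAME_HEADER_BYTES + i * 2, frame.samples[i], true);
  }
  return buffer;
}

function decodeFramedPcm16(buffer: ArrayBuffer): Pcm16Frame {
  if (buffer.byteLength < FRAME_HEADER_BYTES) {
    throw new Error('frame is shorter than the 8-byte header');
  }
  const view = new DataView(buffer);
  const sampleCount = Math.floor((buffer.byteLength - FRAME_HEADER_BYTES) / 2);
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(FRAME_HEADER_BYTES + i * 2, true);
  }
  return {
    sampleRate: view.getUint32(0, true),
    channels: view.getUint16(4, true),
    flags: view.getUint16(6, true),
    samples,
  };
}
```

Going through `DataView` with an explicit little-endian flag keeps the byte order independent of the host platform, which reading the buffer directly as an `Int16Array` would not guarantee.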
raw-pcm16
Use raw-pcm16 when the backend expects plain binary PCM16 with no custom library header.
Important:
- outgoing browser audio is sent as raw PCM16 bytes only
- incoming binary audio is interpreted using incomingAudioFormatOptions
- text / JSON messages are treated as non-audio and routed to onJsonEvent
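Web Audio works with 32-bit float samples in the range -1 to 1, so sending raw PCM16 implies a conversion step somewhere in the capture path. The sketch below shows the usual clamp-and-scale conversion and its inverse. It is a generic illustration with hypothetical function names, assuming little-endian byte order, and is not taken from the library's internals.

```typescript
// Generic float <-> PCM16 conversion; names are illustrative only.
function floatToPcm16Bytes(input: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(input.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples saturate instead of wrapping.
    const clamped = Math.max(-1, Math.min(1, input[i]));
    // Negative samples scale by 32768, positive by 32767 (the int16 range).
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true);
  }
  return buffer;
}

function pcm16BytesToFloat(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const output = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < output.length; i++) {
    output[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return output;
}
```

Because raw PCM16 carries no header, both sides have to agree on sample rate and channel count out of band, which is the role `incomingAudioFormatOptions` plays for incoming audio.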
json
Expected audio JSON examples:
{
  "sampleRate": 24000,
  "channels": 1,
  "pcm16": [100, -200, 300]
}
or
{
  "sampleRate": 24000,
  "channels": 1,
  "pcm16Base64": "..."
}
Non-audio JSON is routed to onJsonEvent.
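A backend or test harness that consumes these payloads needs to turn either shape into PCM16 samples. The sketch below is one way to do that. The type and function names are hypothetical, and it assumes `pcm16Base64` decodes to little-endian PCM16 bytes, which the examples above do not spell out.

```typescript
// Hypothetical types and helpers, not browser-voice exports.
interface JsonAudioMessage {
  sampleRate: number;
  channels: number;
  pcm16?: number[];
  pcm16Base64?: string;
}

// Distinguishes audio payloads from other JSON such as control events.
function isJsonAudio(value: unknown): value is JsonAudioMessage {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.sampleRate === 'number' &&
    typeof candidate.channels === 'number' &&
    (Array.isArray(candidate.pcm16) || typeof candidate.pcm16Base64 === 'string')
  );
}

function samplesFromJsonAudio(message: JsonAudioMessage): Int16Array {
  if (message.pcm16) {
    return Int16Array.from(message.pcm16);
  }
  // atob yields a binary string with one character per byte.
  const binary = atob(message.pcm16Base64 ?? '');
  const samples = new Int16Array(Math.floor(binary.length / 2));
  for (let i = 0; i < samples.length; i++) {
    const low = binary.charCodeAt(i * 2);
    const high = binary.charCodeAt(i * 2 + 1);
    // Assumption: little-endian bytes. Int16Array wraps values to signed.
    samples[i] = (high << 8) | low;
  }
  return samples;
}
```

The array form is easier to inspect while debugging, while the base64 form is considerably more compact on the wire.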
auto
auto is for mixed transports. It will:
- try framed PCM first for binary audio
- accept JSON-wrapped audio
- ignore or route non-audio JSON to onJsonEvent
- fall back to raw PCM16 when incomingAudioFormatOptions are configured
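The ordering above can be pictured as a small classifier. This is only a sketch of the documented order, not the library's detection code: `classifyIncoming` is a hypothetical function, and the plausibility check used to recognize a framed header (sample rate between 8000 and 192000, 1 to 8 channels) is an assumption made for illustration.

```typescript
// Hypothetical classifier illustrating the documented order of checks.
type IncomingKind =
  | 'framed-pcm16'
  | 'raw-pcm16'
  | 'json-audio'
  | 'json-event'
  | 'text'
  | 'unknown';

interface RawPcmOptions {
  sampleRate: number;
  channels: number;
}

function classifyIncoming(data: string | ArrayBuffer, rawOptions?: RawPcmOptions): IncomingKind {
  if (typeof data === 'string') {
    let parsed: unknown;
    try {
      parsed = JSON.parse(data);
    } catch {
      return 'text'; // not JSON at all
    }
    if (typeof parsed !== 'object' || parsed === null) return 'json-event';
    const candidate = parsed as Record<string, unknown>;
    const hasAudio = Array.isArray(candidate.pcm16) || typeof candidate.pcm16Base64 === 'string';
    return hasAudio ? 'json-audio' : 'json-event';
  }
  if (data.byteLength >= 8) {
    const view = new DataView(data);
    const sampleRate = view.getUint32(0, true);
    const channels = view.getUint16(4, true);
    // Assumed heuristic: treat the first 8 bytes as a header only if plausible.
    const plausible = sampleRate >= 8000 && sampleRate <= 192000 && channels >= 1 && channels <= 8;
    if (plausible) return 'framed-pcm16';
  }
  // Fall back to raw PCM16 only when the caller has described its format.
  return rawOptions ? 'raw-pcm16' : 'unknown';
}
```

Any heuristic of this kind can misread raw PCM whose first bytes happen to look like a header, so a fixed format is the safer choice when the backend's format is known.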
Custom control / data events
You can send custom JSON to the backend independently of the audio format:
voiceSocket.sendJsonEvent({
  type: 'assistant.reset',
  correlationId: '123',
});
You can also send plain text:
voiceSocket.sendTextMessage('ping');
On inbound messages, use:
const socket = new VoiceWebSocket({
  // ...
  onJsonEvent: ({ format, parsed, rawText }) => {
    console.log(format, parsed, rawText);
  },
});

Audio behavior notes
Silence and server VAD
If your backend depends on server-side VAD, do not drop silent outgoing frames unless you are also manually committing turns.
skipSilentFrames can reduce bandwidth, but it can also prevent backends like Azure server VAD from detecting end-of-speech.
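To see why this matters, consider what a silence check typically looks like. The sketch below is a generic RMS energy test. It does not describe how `skipSilentFrames` decides internally: `frameRms`, `isSilentFrame`, and the 0.01 threshold are assumptions chosen for illustration.

```typescript
// Generic energy-based silence check; names and threshold are illustrative.
function frameRms(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const normalized = samples[i] / 0x8000; // scale to roughly [-1, 1)
    sumOfSquares += normalized * normalized;
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

function isSilentFrame(samples: Int16Array, threshold = 0.01): boolean {
  return frameRms(samples) < threshold;
}
```

If every frame that passes a check like this is dropped on the client, the server stops receiving audio the moment the speaker pauses. A server-side VAD generally needs to observe that trailing silence to decide the turn has ended, so it may keep waiting instead.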
Noise suppression
The library uses two layers:
- browser-native constraints such as echoCancellation, noiseSuppression, autoGainControl, and voiceIsolation
- an optional NoiseSuppressionProcessor that applies lightweight browser-side filtering / gating / compression
The built-in processor is not a full acoustic echo canceller. It complements browser voice processing; it does not replace it.
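The first layer corresponds to standard audio track constraints passed to `getUserMedia`. The sketch below builds such a constraint object. `buildVoiceConstraints` and `VoiceConstraintFlags` are hypothetical names, and how VoiceCapture applies constraints internally is not described in this README.

```typescript
// Hypothetical helper showing the browser-native constraint layer.
interface VoiceConstraintFlags {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  // Newer constraint; not every browser supports it.
  voiceIsolation: boolean;
}

function buildVoiceConstraints(
  overrides: Partial<VoiceConstraintFlags> = {},
): VoiceConstraintFlags {
  return {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
    voiceIsolation: true,
    ...overrides,
  };
}

// In a browser, the result would be passed as the audio constraints:
//   navigator.mediaDevices.getUserMedia({ audio: buildVoiceConstraints() });
```

Browsers that do not recognize a constraint such as `voiceIsolation` are expected to ignore it when it is given as a plain value, so requesting it this way should be harmless where it is unsupported.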
Playback effects
PcmAudioPlayer supports:
- setGain() / setVolume()
- setEqualizer() for low / mid / high EQ shelves
- setLimiter() for a built-in limiter / dynamics-compressor setup
- setProcessorChain() for custom AudioNode[]
- getAnalyser() for visualizers
Example:
const player = new PcmAudioPlayer();

player.setEqualizer({
  lowDb: 2,
  midDb: -1,
  highDb: 1,
});

player.setLimiter({
  enabled: true,
  threshold: -8,
  ratio: 12,
});

const analyser = player.getAnalyser();
const volume = analyser.getVolume();
const bars = analyser.getFrequencyBands(16);

Demo
Run the local demo server:
pnpm demo
Then open the printed URL in your browser.
The demo includes:
- transport format selection
- sample rate / channel configuration
- mic capture
- playback unlock
- playback visualizer
- gain / EQ / limiter controls
- custom JSON event sender
- mixed text + binary relay through the demo server
API overview
Main exports:
- VoiceCapture
- VoiceWebSocket
- PcmAudioPlayer
- AudioPlaybackManager
- PlaybackAnalyser
- NoiseSuppressionProcessor
- VoiceActivityDetector
- observeMediaDevices
- BackoffStrategy
- debounce
Development
pnpm build
pnpm run docs:api
pnpm lint
pnpm test
pnpm coverage
Generated API docs are written to docs/api/.
Coverage reports are written to coverage/.
If demo behavior changes, also run:
node --check packages/browser-voice/demo/server.mjs
node --check packages/browser-voice/demo/public/app.js
License
Apache-2.0
