warp-voice-protocol v0.0.16
Warp Voice Protocol
A WebSocket-based conversation protocol that makes it easy to create client-server voice and text conversations with real-time streaming, transcription, and type-safe message handling.
Motivation
Building real-time conversational interfaces typically requires:
- Complex WebSocket connection management
- Message chunking and reassembly for streaming data
- Protocol negotiation and error handling
- Type-safe message routing for text and voice
- Cross-platform binary data handling
Warp simplifies all of this into a clean, event-driven API that handles the low-level protocol details so you can focus on your conversation logic.
Features
- 🎯 Simple API: Event-driven interface for both client and server
- 🎤 Voice & Text: Built-in support for streaming voice and text messages
- 📝 Transcription: Real-time voice transcription acknowledgments
- 🔄 Streaming: Efficient chunked data transmission for large audio files
- 🛡️ Type Safe: Full TypeScript support with discriminated message types
- 🌐 Cross-Platform: Works in Node.js and browsers with automatic data normalization
- ⚡ Real-Time: Low-latency streaming with immediate chunk processing
Installation
npm install warp-voice-protocol
Quick Start
Server Example
import { createServer } from 'http';
import { ConversationServer } from 'warp-voice-protocol';
import { transcribeAudio } from './your-transcription-service';
import { generateSpeech } from './your-tts-service';

const httpServer = createServer();
const server = new ConversationServer(httpServer, '/chat', (conversation) => {
  conversation.on('message', (message) => {
    if (message.dataType === 'text') {
      // Handle incoming text message
      message.on('end', async (text: string) => {
        console.log('Received text:', text);
        // Echo the text back
        conversation.sendMessage(`Echo: ${text}`);
        // Send voice response
        const audioStream = await generateSpeech(text);
        const voiceStream = conversation.newVoiceStream();
        for await (const chunk of audioStream) {
          voiceStream.send(chunk);
        }
        voiceStream.end();
      });
    } else if (message.dataType === 'voice') {
      // Handle incoming voice message
      message.on('end', async (audioData: Uint8Array) => {
        // Transcribe the audio
        const transcription = await transcribeAudio(audioData);
        // Send transcription acknowledgment
        message.sendTranscription(transcription);
        message.finalize();
        // Process the transcription...
        console.log('Voice transcription:', transcription);
      });
    }
  });
  conversation.on('end', () => {
    console.log('Conversation ended');
  });
});

httpServer.listen(8080);
Client Example (React Hook)
import { useWarpVoiceChat } from './hooks/useWarpVoiceChat';

function VoiceChatComponent() {
  const {
    sendText,
    startRecording,
    stopRecording,
    isConnected,
    isRecording
  } = useWarpVoiceChat({
    onTextMessage: (text, sender) => {
      console.log(`${sender}: ${text}`);
    },
    onVoiceMessage: (audioBlob) => {
      const audio = new Audio(URL.createObjectURL(audioBlob));
      audio.play();
    },
    onTranscription: (text) => {
      console.log('Transcribed:', text);
    }
  });

  return (
    <div>
      <button
        onClick={() => sendText('Hello!')}
        disabled={!isConnected}
      >
        Send Text
      </button>
      <button
        onClick={isRecording ? stopRecording : startRecording}
        disabled={!isConnected}
      >
        {isRecording ? 'Stop Recording' : 'Start Recording'}
      </button>
    </div>
  );
}
Client Hook Implementation
import { useState, useEffect, useRef } from 'react';
import { ConversationClient } from 'warp-voice-protocol';

export function useWarpVoiceChat(config = {}) {
  const [isConnected, setIsConnected] = useState(false);
  const [isRecording, setIsRecording] = useState(false);
  const sessionRef = useRef(null);
  const recorderRef = useRef(null);

  useEffect(() => {
    // The client opens the WebSocket and invokes onConnection once ready.
    const client = new ConversationClient({
      url: 'ws://localhost:8080/chat',
      onConnection: (session) => {
        sessionRef.current = session;
        setIsConnected(true);
        // Handle incoming messages
        session.on('message', (message) => {
          if (message.dataType === 'text') {
            message.on('end', (text) => {
              config.onTextMessage?.(text, 'ai');
            });
            message.finalize();
          } else if (message.dataType === 'voice') {
            const audioChunks = [];
            message.on('chunk', (chunk) => audioChunks.push(chunk));
            message.on('end', () => {
              const audioBlob = new Blob(audioChunks, { type: 'audio/mpeg' });
              config.onVoiceMessage?.(audioBlob);
            });
            message.finalize();
          }
        });
      }
    });
    return () => {
      if (sessionRef.current) {
        sessionRef.current.end();
      }
      setIsConnected(false);
    };
  }, []);

  const sendText = (text) => {
    if (sessionRef.current) {
      sessionRef.current.sendMessage(text);
      config.onTextMessage?.(text, 'user');
    }
  };

  const startRecording = async () => {
    if (!sessionRef.current || isRecording) return;
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const mediaRecorder = new MediaRecorder(stream);
    recorderRef.current = mediaRecorder;
    const voiceStream = sessionRef.current.newVoiceStream();

    // Handle transcription
    voiceStream.on('transcription', (transcription) => {
      config.onTranscription?.(transcription);
    });

    // Stream audio chunks in real time
    mediaRecorder.ondataavailable = async (event) => {
      if (event.data.size > 0) {
        const arrayBuffer = await event.data.arrayBuffer();
        voiceStream.send(new Uint8Array(arrayBuffer));
      }
    };

    mediaRecorder.onstop = () => {
      voiceStream.end();
      setIsRecording(false);
      recorderRef.current = null;
      stream.getTracks().forEach((track) => track.stop());
    };

    mediaRecorder.start(100); // 100ms chunks for real-time streaming
    setIsRecording(true);
  };

  const stopRecording = () => {
    // Stopping the recorder fires onstop above, which ends the voice
    // stream, clears the recording flag, and releases the microphone.
    if (recorderRef.current && recorderRef.current.state !== 'inactive') {
      recorderRef.current.stop();
    }
  };

  return { sendText, startRecording, stopRecording, isConnected, isRecording };
}
Core Concepts
Message Types
The protocol supports two message types with different behaviors:
- Text Messages: Simple string data with completion acknowledgment
- Voice Messages: Binary audio data with optional transcription acknowledgment
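Because `dataType` discriminates the two variants, TypeScript can narrow a message to the correct payload type. A minimal, self-contained sketch of that pattern (the `TextMessage`/`VoiceMessage` shapes here are illustrative, not the library's actual exported types):

```typescript
// Illustrative discriminated union mirroring the two protocol message kinds.
// These type and field names are assumptions for the example, not library exports.
type TextMessage = { dataType: 'text'; body: string };
type VoiceMessage = { dataType: 'voice'; body: Uint8Array };
type ProtocolMessage = TextMessage | VoiceMessage;

// Checking the `dataType` discriminant narrows `body` to the right type.
function describe(message: ProtocolMessage): string {
  if (message.dataType === 'text') {
    return `text(${message.body.length} chars)`;    // body is a string here
  }
  return `voice(${message.body.byteLength} bytes)`; // body is a Uint8Array here
}
```

For example, `describe({ dataType: 'text', body: 'hi' })` evaluates to `'text(2 chars)'`.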
Streaming
Both text and voice messages support chunked streaming:
// Send text in chunks
const textStream = conversation.sendMessage();
textStream.send('Hello ');
textStream.send('World!');
textStream.end();

// Send voice in chunks
const voiceStream = conversation.newVoiceStream();
for (const audioChunk of audioChunks) {
  voiceStream.send(audioChunk); // Accepts Uint8Array, Buffer, ArrayBuffer, etc.
}
voiceStream.end();
Cross-Platform Data Handling
The protocol automatically normalizes voice data across environments:
// All of these work seamlessly:
voiceStream.send(new Uint8Array(data)); // Universal
voiceStream.send(Buffer.from(data)); // Node.js
voiceStream.send(arrayBuffer); // Browser ArrayBuffer
voiceStream.send([1, 2, 3, 4]); // Number array
Architecture
Warp uses a clean layered architecture:
- Connection Layer: WebSocket management and JSON/Base64 encoding
- Conversation Layer: Message routing and lifecycle management
- Message Layer: Type-safe message handling and streaming
- Application Layer: Your conversation logic
For detailed architecture documentation, see docs/layered-architecture.md.
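As a rough illustration of what the Connection Layer's JSON/Base64 framing could look like, here is a self-contained sketch. The `WireFrame` envelope and its field names are hypothetical, not the protocol's actual wire format; the point is only that binary chunks are base64-encoded so they can ride inside a JSON text frame:

```typescript
// Hypothetical wire frame: binary chunks are base64-encoded so they can
// travel inside a JSON text envelope. Field names are illustrative only.
interface WireFrame {
  kind: 'chunk';
  messageId: string;
  data: string; // base64-encoded bytes
}

// Encode a binary chunk into a JSON frame (uses Node's global Buffer).
function encodeChunk(messageId: string, bytes: Uint8Array): string {
  const frame: WireFrame = {
    kind: 'chunk',
    messageId,
    data: Buffer.from(bytes).toString('base64'),
  };
  return JSON.stringify(frame);
}

// Decode a JSON frame back into the original bytes.
function decodeChunk(frame: string): { messageId: string; bytes: Uint8Array } {
  const parsed = JSON.parse(frame) as WireFrame;
  return {
    messageId: parsed.messageId,
    bytes: new Uint8Array(Buffer.from(parsed.data, 'base64')),
  };
}
```

Base64 framing trades roughly 33% size overhead for the simplicity of a single text-based WebSocket channel; the real Connection Layer's format may differ.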
Error Handling
conversation.on('end', (reason, error) => {
  if (reason === 'error') {
    console.error('Conversation failed:', error);
  }
});

message.on('error', (error) => {
  console.error('Message error:', error);
});
API Reference
ConversationSession
- sendMessage(text: string): Send a text message
- newVoiceStream(): Create a new voice message stream
- end(): End the conversation
Message Events
- message.on('chunk', callback): Receive data chunks
- message.on('end', callback): Message completed
- message.on('error', callback): Handle errors
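The chunk/end events above reassemble a streamed message on the receiving side. A self-contained sketch of that accumulation logic, with a plain class standing in for the library's event-driven message object:

```typescript
// Accumulates streamed chunks and concatenates them on completion,
// mirroring the 'chunk'/'end' event pattern. Standalone sketch only;
// ChunkAssembler is not a library export.
class ChunkAssembler {
  private chunks: Uint8Array[] = [];

  // Called once per incoming 'chunk' event.
  onChunk(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Called on 'end': concatenate all chunks into one contiguous buffer.
  onEnd(): Uint8Array {
    const total = this.chunks.reduce((n, c) => n + c.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }
}
```

Buffering chunks and concatenating once at the end avoids repeated reallocation while the stream is still in flight.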
Voice Message Acknowledgments
- message.sendTranscription(text): Send transcription acknowledgment
- message.finalize(): Complete message processing
License
MIT
