warp-voice-protocol v0.0.16
Warp Voice Protocol
A WebSocket-based conversation protocol that makes it easy to create client-server voice and text conversations with real-time streaming, transcription, and type-safe message handling.
Motivation
Building real-time conversational interfaces typically requires:
- Complex WebSocket connection management
- Message chunking and reassembly for streaming data
- Protocol negotiation and error handling
- Type-safe message routing for text and voice
- Cross-platform binary data handling
Warp simplifies all of this into a clean, event-driven API that handles the low-level protocol details so you can focus on your conversation logic.
Features
- 🎯 Simple API: Event-driven interface for both client and server
- 🎤 Voice & Text: Built-in support for streaming voice and text messages
- 📝 Transcription: Real-time voice transcription acknowledgments
- 🔄 Streaming: Efficient chunked data transmission for large audio files
- 🛡️ Type Safe: Full TypeScript support with discriminated message types
- 🌐 Cross-Platform: Works in Node.js and browsers with automatic data normalization
- ⚡ Real-Time: Low-latency streaming with immediate chunk processing
Installation
npm install warp-voice-protocol
Quick Start
Server Example
import { createServer } from 'http';
import { ConversationServer } from 'warp-voice-protocol';
import { transcribeAudio } from './your-transcription-service';
import { generateSpeech } from './your-tts-service';

const httpServer = createServer();
const server = new ConversationServer(httpServer, '/chat', (conversation) => {
  conversation.on('message', (message) => {
    if (message.dataType === 'text') {
      // Handle incoming text message
      message.on('end', async (text: string) => {
        console.log('Received text:', text);
        // Echo the text back
        conversation.sendMessage(`Echo: ${text}`);
        // Send voice response
        const audioStream = await generateSpeech(text);
        const voiceStream = conversation.newVoiceStream();
        for await (const chunk of audioStream) {
          voiceStream.send(chunk);
        }
        voiceStream.end();
      });
    } else if (message.dataType === 'voice') {
      // Handle incoming voice message
      message.on('end', async (audioData: Uint8Array) => {
        // Transcribe the audio
        const transcription = await transcribeAudio(audioData);
        // Send transcription acknowledgment
        message.sendTranscription(transcription);
        message.finalize();
        // Process the transcription...
        console.log('Voice transcription:', transcription);
      });
    }
  });
  conversation.on('end', () => {
    console.log('Conversation ended');
  });
});

httpServer.listen(8080);
Client Example (React Hook)
import { useWarpVoiceChat } from './hooks/useWarpVoiceChat';

function VoiceChatComponent() {
  const {
    sendText,
    startRecording,
    stopRecording,
    isConnected,
    isRecording
  } = useWarpVoiceChat({
    onTextMessage: (text, sender) => {
      console.log(`${sender}: ${text}`);
    },
    onVoiceMessage: (audioBlob) => {
      const audio = new Audio(URL.createObjectURL(audioBlob));
      audio.play();
    },
    onTranscription: (text) => {
      console.log('Transcribed:', text);
    }
  });

  return (
    <div>
      <button
        onClick={() => sendText('Hello!')}
        disabled={!isConnected}
      >
        Send Text
      </button>
      <button
        onClick={isRecording ? stopRecording : startRecording}
        disabled={!isConnected}
      >
        {isRecording ? 'Stop Recording' : 'Start Recording'}
      </button>
    </div>
  );
}
Client Hook Implementation
import { useState, useEffect, useRef } from 'react';
import { ConversationClient } from 'warp-voice-protocol';

export function useWarpVoiceChat(config = {}) {
  const [isConnected, setIsConnected] = useState(false);
  const [isRecording, setIsRecording] = useState(false);
  const sessionRef = useRef(null);
  const recorderRef = useRef(null);

  useEffect(() => {
    // The client opens the WebSocket and invokes onConnection once ready.
    const client = new ConversationClient({
      url: 'ws://localhost:8080/chat',
      onConnection: (session) => {
        sessionRef.current = session;
        setIsConnected(true);
        // Handle incoming messages
        session.on('message', (message) => {
          if (message.dataType === 'text') {
            message.on('end', (text) => {
              config.onTextMessage?.(text, 'ai');
            });
            message.finalize();
          } else if (message.dataType === 'voice') {
            const audioChunks = [];
            message.on('chunk', (chunk) => audioChunks.push(chunk));
            message.on('end', () => {
              const audioBlob = new Blob(audioChunks, { type: 'audio/mpeg' });
              config.onVoiceMessage?.(audioBlob);
            });
            message.finalize();
          }
        });
      }
    });
    return () => {
      if (sessionRef.current) {
        sessionRef.current.end();
      }
      setIsConnected(false);
    };
  }, []);

  const sendText = (text) => {
    if (sessionRef.current) {
      sessionRef.current.sendMessage(text);
      config.onTextMessage?.(text, 'user');
    }
  };

  const startRecording = async () => {
    if (!sessionRef.current || isRecording) return;
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const mediaRecorder = new MediaRecorder(stream);
    recorderRef.current = mediaRecorder;
    const voiceStream = sessionRef.current.newVoiceStream();

    // Handle transcription
    voiceStream.on('transcription', (transcription) => {
      config.onTranscription?.(transcription);
    });

    // Stream audio chunks in real time
    mediaRecorder.ondataavailable = async (event) => {
      if (event.data.size > 0) {
        const arrayBuffer = await event.data.arrayBuffer();
        voiceStream.send(new Uint8Array(arrayBuffer));
      }
    };

    mediaRecorder.onstop = () => {
      voiceStream.end();
      setIsRecording(false);
      recorderRef.current = null;
      stream.getTracks().forEach((track) => track.stop());
    };

    mediaRecorder.start(100); // 100ms chunks for real-time streaming
    setIsRecording(true);
  };

  const stopRecording = () => {
    // Stopping the recorder fires onstop above, which ends the voice
    // stream, clears the recording flag, and releases the microphone.
    if (recorderRef.current && recorderRef.current.state !== 'inactive') {
      recorderRef.current.stop();
    }
  };

  return { sendText, startRecording, stopRecording, isConnected, isRecording };
}
Core Concepts
Message Types
The protocol supports two message types with different behaviors:
- Text Messages: Simple string data with completion acknowledgment
- Voice Messages: Binary audio data with optional transcription acknowledgment
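Because `dataType` discriminates the two variants, TypeScript can narrow a message to the correct payload type. A minimal, self-contained sketch of that pattern (the `TextMessage`/`VoiceMessage` shapes here are illustrative, not the library's actual exported types):

```typescript
// Illustrative discriminated union mirroring the two protocol message kinds.
// These type and field names are assumptions for the example, not library exports.
type TextMessage = { dataType: 'text'; body: string };
type VoiceMessage = { dataType: 'voice'; body: Uint8Array };
type ProtocolMessage = TextMessage | VoiceMessage;

// Checking the `dataType` discriminant narrows `body` to the right type.
function describe(message: ProtocolMessage): string {
  if (message.dataType === 'text') {
    return `text(${message.body.length} chars)`;    // body is a string here
  }
  return `voice(${message.body.byteLength} bytes)`; // body is a Uint8Array here
}
```

For example, `describe({ dataType: 'text', body: 'hi' })` evaluates to `'text(2 chars)'`.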
Streaming
Both text and voice messages support chunked streaming:
// Send text in chunks
const textStream = conversation.sendMessage();
textStream.send('Hello ');
textStream.send('World!');
textStream.end();

// Send voice in chunks
const voiceStream = conversation.newVoiceStream();
for (const audioChunk of audioChunks) {
  voiceStream.send(audioChunk); // Accepts Uint8Array, Buffer, ArrayBuffer, etc.
}
voiceStream.end();
Cross-Platform Data Handling
The protocol automatically normalizes voice data across environments:
// All of these work seamlessly:
voiceStream.send(new Uint8Array(data)); // Universal
voiceStream.send(Buffer.from(data)); // Node.js
voiceStream.send(arrayBuffer); // Browser ArrayBuffer
voiceStream.send([1, 2, 3, 4]); // Number array
Architecture
Warp uses a clean layered architecture:
- Connection Layer: WebSocket management and JSON/Base64 encoding
- Conversation Layer: Message routing and lifecycle management
- Message Layer: Type-safe message handling and streaming
- Application Layer: Your conversation logic
For detailed architecture documentation, see docs/layered-architecture.md.
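As a rough illustration of what the Connection Layer's JSON/Base64 framing could look like, here is a self-contained sketch. The `WireFrame` envelope and its field names are hypothetical, not the protocol's actual wire format; the point is only that binary chunks are base64-encoded so they can ride inside a JSON text frame:

```typescript
// Hypothetical wire frame: binary chunks are base64-encoded so they can
// travel inside a JSON text envelope. Field names are illustrative only.
interface WireFrame {
  kind: 'chunk';
  messageId: string;
  data: string; // base64-encoded bytes
}

// Encode a binary chunk into a JSON frame (uses Node's global Buffer).
function encodeChunk(messageId: string, bytes: Uint8Array): string {
  const frame: WireFrame = {
    kind: 'chunk',
    messageId,
    data: Buffer.from(bytes).toString('base64'),
  };
  return JSON.stringify(frame);
}

// Decode a JSON frame back into the original bytes.
function decodeChunk(frame: string): { messageId: string; bytes: Uint8Array } {
  const parsed = JSON.parse(frame) as WireFrame;
  return {
    messageId: parsed.messageId,
    bytes: new Uint8Array(Buffer.from(parsed.data, 'base64')),
  };
}
```

Base64 framing trades roughly 33% size overhead for the simplicity of a single text-based WebSocket channel; the real Connection Layer's format may differ.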
Error Handling
conversation.on('end', (reason, error) => {
  if (reason === 'error') {
    console.error('Conversation failed:', error);
  }
});

message.on('error', (error) => {
  console.error('Message error:', error);
});
API Reference
ConversationSession
- sendMessage(text: string): Send a text message
- newVoiceStream(): Create a new voice message stream
- end(): End the conversation
Message Events
- message.on('chunk', callback): Receive data chunks
- message.on('end', callback): Message completed
- message.on('error', callback): Handle errors
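The chunk/end events above reassemble a streamed message on the receiving side. A self-contained sketch of that accumulation logic, with a plain class standing in for the library's event-driven message object:

```typescript
// Accumulates streamed chunks and concatenates them on completion,
// mirroring the 'chunk'/'end' event pattern. Standalone sketch only;
// ChunkAssembler is not a library export.
class ChunkAssembler {
  private chunks: Uint8Array[] = [];

  // Called once per incoming 'chunk' event.
  onChunk(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Called on 'end': concatenate all chunks into one contiguous buffer.
  onEnd(): Uint8Array {
    const total = this.chunks.reduce((n, c) => n + c.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }
}
```

Buffering chunks and concatenating once at the end avoids repeated reallocation while the stream is still in flight.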
Voice Message Acknowledgments
- message.sendTranscription(text): Send transcription acknowledgment
- message.finalize(): Complete message processing
License
MIT
