@codmir/realtime

v0.1.1

Published

6 days ago

Low-latency voice infrastructure with Hive Mode DSP

jobrayan

codmir realtime voice tts dsp hive-mode audio streaming webrtc ai

@codmir/realtime

Low-latency voice infrastructure with Hive Mode DSP - the collective consciousness voice effect.

Features

Contract-driven AI execution - Modal.com (now) → AWS (future)
Event streaming - SSE for Vercel Edge
Streaming TTS - ElevenLabs, Local (Coqui, Piper)
Hive Mode DSP - Multi-layered collective voice effect
Low-latency transport - WebRTC, WebSocket

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    @codmir/realtime Pipeline                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  User Input                                                         │
│      │                                                              │
│      ▼                                                              │
│  ┌─────────────┐                                                    │
│  │   Runner    │  ← Modal (now) / AWS (future) / Local              │
│  └─────────────┘                                                    │
│      │ Streaming text                                               │
│      ▼                                                              │
│  ┌─────────────┐                                                    │
│  │  Chunker    │  ← Semantic text segmentation                      │
│  └─────────────┘                                                    │
│      │ Text chunks                                                  │
│      ▼                                                              │
│  ┌─────────────┐                                                    │
│  │    TTS      │  ← ElevenLabs / Local                              │
│  └─────────────┘                                                    │
│      │ Audio chunks                                                 │
│      ▼                                                              │
│  ┌─────────────┐                                                    │
│  │  Hive DSP   │  ← Multi-layer voice effect                        │
│  └─────────────┘                                                    │
│      │ Processed audio                                              │
│      ▼                                                              │
│  ┌─────────────┐                                                    │
│  │  Transport  │  ← WebRTC / WebSocket                              │
│  └─────────────┘                                                    │
│      │                                                              │
│      ▼                                                              │
│  Client                                                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
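The stages in the diagram can be pictured as a chain of async generators, each consuming the previous stage's stream. The sketch below is only an illustration of that shape: every function name here (`fakeRunner`, `chunkBySentence`, `fakeTTS`, `applyGain`, `runPipeline`) is a hypothetical stand-in and not part of the `@codmir/realtime` API.

```typescript
// Hypothetical stage types; none of these names come from @codmir/realtime.
type TextStream = AsyncIterable<string>;
type AudioStream = AsyncIterable<Float32Array>;

// Stand-in runner: yields tokens as if they were streamed from a model.
async function* fakeRunner(prompt: string): TextStream {
  for (const token of prompt.split(' ')) yield token + ' ';
}

// Stand-in chunker: groups tokens until a sentence terminator appears.
async function* chunkBySentence(tokens: TextStream): TextStream {
  let buffer = '';
  for await (const token of tokens) {
    buffer += token;
    if (/[.!?]\s*$/.test(buffer)) {
      yield buffer.trim();
      buffer = '';
    }
  }
  if (buffer.trim()) yield buffer.trim();
}

// Stand-in TTS: produces one silent sample per character of text.
async function* fakeTTS(chunks: TextStream): AudioStream {
  for await (const chunk of chunks) yield new Float32Array(chunk.length);
}

// Stand-in DSP: a plain gain stage in place of the hive effect.
async function* applyGain(audio: AudioStream, gain: number): AudioStream {
  for await (const frame of audio) yield frame.map((s) => s * gain);
}

// Runs the whole chain and records frame lengths where a transport would send them.
async function runPipeline(prompt: string): Promise<number[]> {
  const frameLengths: number[] = [];
  const audio = applyGain(fakeTTS(chunkBySentence(fakeRunner(prompt))), 0.6);
  for await (const frame of audio) frameLengths.push(frame.length);
  return frameLengths;
}
```

Because each stage pulls from the one before it, audio for the first sentence can be produced while later tokens are still arriving.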

Installation

pnpm add @codmir/realtime

Quick Start

import {
  createRealtimeSession,
  ModalRunner,
  ElevenLabsTTS,
} from '@codmir/realtime';

// Create session
const session = createRealtimeSession({
  runner: new ModalRunner({
    apiKey: process.env.MODAL_API_KEY,
  }),
  tts: new ElevenLabsTTS({
    apiKey: process.env.ELEVENLABS_API_KEY,
  }),
  voiceMode: 'hive',
});

// Start session
await session.start();

// Execute contract with voice output
await session.execute({
  id: 'task_1',
  version: '1.0',
  task: 'Explain quantum computing',
});

// Or speak directly
await session.speak('Welcome to the Council');

Hive Mode

Create the "collective consciousness" voice effect used in films:

import { HiveModeProcessor, HIVE_PRESETS } from '@codmir/realtime';

const processor = new HiveModeProcessor();

// Apply preset
processor.applyPreset('collective'); // subtle | collective | terrifying | ancient

// Or configure manually (separate variable, since `processor` is already declared above)
const customProcessor = new HiveModeProcessor({
  layers: 5,
  offsets: [0, 12, 25, 38, 50],      // ms delays
  pitchShifts: [0, -3, 2, -5, 4],    // semitones
  volumes: [1, 0.5, 0.5, 0.3, 0.3],
  reverb: 0.4,
  chorus: 0.3,
  intensity: 0.6,
});

// Process audio
const processed = processor.process(audioBuffer);
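To make the `offsets` and `volumes` parameters concrete, here is a minimal sketch of the layering idea: delayed, attenuated copies of one signal summed together. It is not the library's implementation. Pitch shifting, reverb, and chorus are left out, and `mixDelayedLayers` and `LayerConfig` are hypothetical names.

```typescript
// Hypothetical config shape mirroring the `offsets` and `volumes` options above.
interface LayerConfig {
  offsetsMs: number[]; // per-layer delay in milliseconds
  volumes: number[];   // per-layer linear gain
}

function mixDelayedLayers(
  input: Float32Array,
  sampleRate: number,
  cfg: LayerConfig,
): Float32Array {
  // Extend the output so the most delayed layer is not cut off.
  const maxOffsetMs = Math.max(0, ...cfg.offsetsMs);
  const tail = Math.round((maxOffsetMs / 1000) * sampleRate);
  const out = new Float32Array(input.length + tail);

  cfg.offsetsMs.forEach((ms, layer) => {
    const delay = Math.round((ms / 1000) * sampleRate);
    const gain = cfg.volumes[layer] ?? 0;
    for (let i = 0; i < input.length; i++) {
      out[i + delay] += input[i] * gain;
    }
  });

  // Normalize by the total gain so inputs in [-1, 1] cannot clip after summing.
  const total = cfg.volumes.reduce((a, b) => a + b, 0) || 1;
  return out.map((s) => s / total);
}
```

With the manual configuration shown above, the call would look like `mixDelayedLayers(samples, 48000, { offsetsMs: [0, 12, 25, 38, 50], volumes: [1, 0.5, 0.5, 0.3, 0.3] })`, where the 48 kHz sample rate is an assumption.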

Hive Presets

| Preset | Description | Use Case |
|--------|-------------|----------|
| subtle | Light layering | Subtle otherworldly feel |
| collective | Multiple voices | AI council, collective |
| terrifying | Heavy effect | Dramatic moments |
| ancient | Deep, reverberant | Ancient entities |

Browser (Web Audio API)

import { HiveModeWebAudio } from '@codmir/realtime';

const audioContext = new AudioContext();
const hive = new HiveModeWebAudio(audioContext);

// Connect to audio source (e.g., from TTS)
const source = audioContext.createBufferSource();
// Set source.buffer to a decoded AudioBuffer and call source.start() to begin playback
source.connect(hive.getInput());
hive.connectToDestination();

// Adjust in real-time
hive.setIntensity(0.8);
hive.applyPreset('terrifying');

Contract Runners

Modal Runner (Serverless)

import { ModalRunner } from '@codmir/realtime';

const runner = new ModalRunner({
  apiKey: process.env.MODAL_API_KEY,
  appName: 'codmir-realtime',
  functionName: 'execute_contract',
  gpu: 'T4', // Optional GPU
  scaledownWindow: 300, // Keep warm for 5 min
});

const run = await runner.start({
  id: 'contract_1',
  version: '1.0',
  task: 'Generate a poem about AI',
});

// Stream events
for await (const event of runner.stream(run.id)) {
  console.log(event.type, event.data);
}

Local Runner (Privacy)

import { LocalRunner } from '@codmir/realtime';

const runner = new LocalRunner({
  provider: 'ollama',
  model: 'llama3.2',
  localUrl: 'http://localhost:11434',
});

Event Streaming (SSE)

Server (Vercel/Next.js)

// app/api/realtime/stream/route.ts
import { createSSEResponse, ModalRunner } from '@codmir/realtime';

export async function POST(req: Request) {
  const { contract } = await req.json();
  
  const runner = new ModalRunner({ apiKey: process.env.MODAL_API_KEY });
  const run = await runner.start(contract);
  
  return createSSEResponse(runner.stream(run.id));
}

Client

import { SSEClient } from '@codmir/realtime';

const client = new SSEClient({
  url: '/api/realtime/stream?runId=run_123',
});

client.connect();

client.on('step.output.delta', (event) => {
  console.log('Token:', event.data.delta);
});

client.on('voice.chunk', (event) => {
  // Play audio chunk
});
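For readers unfamiliar with the wire format behind these events, the sketch below encodes and decodes Server-Sent Events frames by hand. It is a simplified illustration that handles only the `event:` and `data:` fields and trims whitespace more loosely than the specification does. `encodeSSE`, `decodeSSE`, and `RealtimeEvent` are hypothetical names, not the internals of `createSSEResponse` or `SSEClient`.

```typescript
// Hypothetical event shape: a type such as 'step.output.delta' plus a JSON payload.
interface RealtimeEvent {
  type: string;
  data: unknown;
}

// One SSE frame: an `event:` line, a `data:` line, then a blank line.
function encodeSSE(event: RealtimeEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Splits a received text stream into frames and parses each one.
function decodeSSE(stream: string): RealtimeEvent[] {
  const events: RealtimeEvent[] = [];
  for (const block of stream.split('\n\n')) {
    let type = 'message'; // default event type when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) type = line.slice(6).trim();
      else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) {
      events.push({ type, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return events;
}
```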

TTS Providers

ElevenLabs (Streaming)

import { ElevenLabsTTS } from '@codmir/realtime';

const tts = new ElevenLabsTTS({
  apiKey: process.env.ELEVENLABS_API_KEY,
  modelId: 'eleven_turbo_v2_5',
  optimizeStreamingLatency: 3,
});

// Stream synthesis
for await (const chunk of tts.streamSynthesize('Hello world')) {
  // Process audio chunk
}

Local TTS (Coqui/Piper)

import { LocalTTS } from '@codmir/realtime';

const tts = new LocalTTS({
  provider: 'coqui',
  localUrl: 'http://localhost:5002',
});

Text Chunking

Optimize latency with smart text segmentation:

import { TextChunker } from '@codmir/realtime';

const chunker = new TextChunker({
  strategy: 'adaptive', // sentence | clause | word | time | adaptive
  minChars: 20,
  maxChars: 200,
  maxTimeMs: 500,
});

// Add streaming text
for (const token of tokens) {
  const chunks = chunker.add(token);
  for (const chunk of chunks) {
    await tts.synthesize(chunk.text);
  }
}

// Flush remaining
const final = chunker.flush();
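The trade-off behind `minChars` and `maxChars` can be seen in a small sentence-based chunker. This is a sketch only: `SketchChunker` is a hypothetical class, it ignores the `time` and `adaptive` strategies, and it makes no claim about how `TextChunker` actually decides where to split.

```typescript
interface Chunk {
  text: string;
}

class SketchChunker {
  private buffer = '';
  private minChars: number;
  private maxChars: number;

  constructor(minChars: number, maxChars: number) {
    this.minChars = minChars;
    this.maxChars = maxChars;
  }

  // Accepts one streamed token and returns any chunks that became ready.
  add(token: string): Chunk[] {
    this.buffer += token;
    const chunks: Chunk[] = [];

    // Emit at a sentence boundary, but only once the buffer is long enough
    // to be worth a separate TTS request.
    const endsSentence = /[.!?]\s*$/.test(this.buffer);
    if (endsSentence && this.buffer.trim().length >= this.minChars) {
      chunks.push({ text: this.buffer.trim() });
      this.buffer = '';
    }

    // Force a split when no boundary arrives before maxChars.
    while (this.buffer.length > this.maxChars) {
      chunks.push({ text: this.buffer.slice(0, this.maxChars).trim() });
      this.buffer = this.buffer.slice(this.maxChars);
    }
    return chunks;
  }

  // Returns whatever is left once the stream has ended.
  flush(): Chunk[] {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest ? [{ text: rest }] : [];
  }
}
```

A lower `minChars` gets audio started sooner at the cost of more, shorter synthesis requests; a higher value does the opposite.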

Voice Modes

import { VoiceModeProcessor } from '@codmir/realtime';

const processor = new VoiceModeProcessor('hive');

// Available modes:
// - normal   : Single voice
// - hive     : Multi-layered collective
// - whisper  : Quiet, intimate
// - entity   : Deep, reverberant
// - oracle   : Ethereal, wise
// - swarm    : Many distinct voices

processor.setMode('entity');
processor.setIntensity(0.8);

Transport

WebRTC (Lowest Latency)

import { WebRTCTransport } from '@codmir/realtime';

const transport = new WebRTCTransport({
  signalingUrl: '/api/rtc/signal',
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

await transport.connect();

transport.onAudio((frame) => {
  // Play received audio
});

WebSocket (Fallback)

import { WebSocketTransport } from '@codmir/realtime';

const transport = new WebSocketTransport({
  url: 'wss://api.example.com/realtime',
  autoReconnect: true,
});
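The README does not say how `autoReconnect` schedules its retries. One common approach is capped exponential backoff, sketched below as an assumption rather than a description of `WebSocketTransport`. The function name and the example base and cap values are hypothetical.

```typescript
// Delay before reconnect attempt `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** safeAttempt);
}

// Example schedule for the first five attempts with a 500 ms base and an 8 s cap.
const schedule = [0, 1, 2, 3, 4].map((n) => backoffDelayMs(n, 500, 8000));
```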

Latency Optimization

Aggressive chunking - Start speaking before sentence completes
Streaming TTS - Use ElevenLabs WebSocket API
Client-side DSP - Do hive processing in browser
WebRTC transport - Native low-latency audio
Warm containers - Keep Modal containers warm

Target latency:

Time-to-first-audio: 300-700ms
Conversational feel: <1s round-trip

Multi-AI Conference (7 AI Assistants)

Create real-time voice conferences with multiple AI participants using PersonaPlex:

import {
  ConferenceRunner,
  createCouncilConference,
  DEFAULT_COUNCIL,
} from '@codmir/realtime';

// Quick start: Council of 7 AI experts
const session = await createCouncilConference(
  'wss://personaplex.example.com:8998',
  'The future of artificial intelligence'
);

// Or create custom conference
const runner = new ConferenceRunner({
  serverUrl: 'wss://personaplex.example.com:8998',
  turnMode: 'free',       // free | round_robin | moderated | priority | reactive
  mixingMode: 'stereo',   // stereo | mono | separate
  onParticipantAudio: (id, opusData) => playAudio(id, opusData),
  onParticipantText: (id, text) => updateTranscript(id, text),
});

await runner.startConference({
  participants: [
    {
      id: 'sage',
      name: 'Sage',
      voice: 'NATF0',
      persona: 'A wise philosopher who considers multiple perspectives',
      pan: -0.5,
      role: 'expert',
    },
    {
      id: 'analyst',
      name: 'Analyst',
      voice: 'NATM1',
      persona: 'A data scientist focused on facts and evidence',
      pan: 0.5,
      role: 'expert',
    },
    // Up to 7 participants
  ],
  topic: 'Climate change solutions',
});

// User speaks - audio goes to all participants
runner.sendAudio(userOpusFrame);

// Control turns
runner.requestSpeaker('sage');
runner.muteParticipant('analyst');

// Get transcripts
const transcripts = runner.getTranscripts();
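The `pan` values in the participant config suggest stereo placement. As an illustration, the sketch below applies the standard equal-power pan law to already-decoded mono PCM. How the library interprets `pan`, and where Opus decoding happens, are not stated in this README, so `panGains` and `mixToStereo` are hypothetical helpers.

```typescript
// Equal-power pan law: pan -1 is hard left, 0 is center, +1 is hard right.
function panGains(pan: number): { left: number; right: number } {
  const clamped = Math.max(-1, Math.min(1, pan));
  const angle = ((clamped + 1) * Math.PI) / 4; // 0 .. PI/2
  return { left: Math.cos(angle), right: Math.sin(angle) };
}

// Mixes mono sources into a stereo pair, placing each one by its pan value.
function mixToStereo(
  sources: { pan: number; samples: Float32Array }[],
  length: number,
): { left: Float32Array; right: Float32Array } {
  const left = new Float32Array(length);
  const right = new Float32Array(length);
  for (const { pan, samples } of sources) {
    const gains = panGains(pan);
    const count = Math.min(length, samples.length);
    for (let i = 0; i < count; i++) {
      left[i] += samples[i] * gains.left;
      right[i] += samples[i] * gains.right;
    }
  }
  return { left, right };
}
```

With `pan: -0.5` for one speaker and `pan: 0.5` for another, the two voices sit on opposite sides of the stereo field while keeping roughly equal perceived loudness.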

Default Council of 7

Pre-configured diverse AI personas:

| ID | Name | Voice | Role | Personality |
|----|------|-------|------|-------------|
| sage | Sage | NATF0 | Expert | Wise philosopher |
| nova | Nova | NATF2 | Participant | Energetic innovator |
| atlas | Atlas | NATM0 | Expert | Pragmatic analyst |
| oracle | Oracle | VARF1 | Host | Mysterious seer |
| ember | Ember | NATF3 | Participant | Ethics advocate |
| cipher | Cipher | NATM2 | Expert | Technical specialist |
| echo | Echo | VARM1 | Participant | Consensus builder |

React Hook

import { useConference, useCouncilConference } from '@codmir/realtime';

function ConferenceRoom() {
  const {
    session,
    state,
    participants,
    speakers,
    start,
    stop,
    sendAudio,
    transcripts,
    isActive,
  } = useCouncilConference('wss://personaplex.example.com:8998');

  return (
    <div>
      <button onClick={() => start({ topic: 'AI Ethics' })}>
        Start Conference
      </button>
      
      {participants.map(p => (
        <div key={p.config.id} className={speakers.includes(p.config.id) ? 'speaking' : ''}>
          {p.config.name}: {transcripts.get(p.config.id)}
        </div>
      ))}
    </div>
  );
}

Turn Modes

| Mode | Description |
|------|-------------|
| free | All participants can speak anytime |
| round_robin | Cycle through participants in order |
| moderated | Host controls who speaks |
| priority | Higher priority participants speak first |
| reactive | Respond based on what was said |
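To illustrate what a mode such as `round_robin` implies, here is a minimal turn scheduler that cycles through participant IDs and skips muted ones. `RoundRobinTurns` is a hypothetical class written for this example; it does not describe how `ConferenceRunner` selects speakers.

```typescript
class RoundRobinTurns {
  private ids: string[];
  private index = 0;
  private muted = new Set<string>();

  constructor(ids: string[]) {
    this.ids = [...ids];
  }

  mute(id: string): void {
    this.muted.add(id);
  }

  unmute(id: string): void {
    this.muted.delete(id);
  }

  // Returns the next unmuted participant, or null if nobody can speak.
  next(): string | null {
    for (let step = 0; step < this.ids.length; step++) {
      const id = this.ids[this.index];
      this.index = (this.index + 1) % this.ids.length;
      if (!this.muted.has(id)) return id;
    }
    return null;
  }
}
```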

Future: AWS Migration

// Phase 1: Modal Runner (now)
const runner = new ModalRunner({ ... });

// Phase 2: AWS Runner (future)
import { AWSRunner } from '@codmir/realtime';
const runner = new AWSRunner({
  region: 'us-east-1',
  functionArn: 'arn:aws:lambda:...',
  provisionedConcurrency: 5, // Warm containers
});

// Same interface - swap without client changes
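The "same interface" idea can be sketched as a shared TypeScript interface that calling code depends on, so a different backend can be substituted without touching the caller. The `start` and `stream` method names follow the runner examples earlier in this README; the interface name `ContractRunner`, the `StubRunner` class, and the `run.started` and `run.completed` event types are hypothetical.

```typescript
interface Contract {
  id: string;
  version: string;
  task: string;
}

interface RunEvent {
  type: string;
  data: unknown;
}

// Hypothetical shared interface that both a Modal-backed and an AWS-backed runner could satisfy.
interface ContractRunner {
  start(contract: Contract): Promise<{ id: string }>;
  stream(runId: string): AsyncIterable<RunEvent>;
}

// Stand-in backend used only to show that callers do not care which runner they get.
class StubRunner implements ContractRunner {
  private label: string;

  constructor(label: string) {
    this.label = label;
  }

  async start(contract: Contract): Promise<{ id: string }> {
    return { id: `${this.label}:${contract.id}` };
  }

  async *stream(runId: string): AsyncIterable<RunEvent> {
    yield { type: 'run.started', data: { runId } };
    yield { type: 'run.completed', data: { runId } };
  }
}

// Caller code written against the interface only.
async function collectEventTypes(
  runner: ContractRunner,
  contract: Contract,
): Promise<string[]> {
  const run = await runner.start(contract);
  const types: string[] = [];
  for await (const event of runner.stream(run.id)) types.push(event.type);
  return types;
}
```

Passing `new StubRunner('modal')` or `new StubRunner('aws')` to `collectEventTypes` yields the same sequence of event types, which is the property the migration plan relies on.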

License

MIT
