@enegix/chatwoot-voice-sdk

v0.2.0

Published

8 days ago

React Native SDK for Chatwoot WebRTC voice calls

abdelrahman_essawy

ahmed.agha

chatwoot webrtc voice react-native sdk

WebRTC Mobile SDK Integration Guide

Target audience: Engineers building a React Native (or any non-browser client) SDK that initiates voice calls to Chatwoot agents via the WebRTC voice provider.
Scope: Everything the SDK must know to interoperate with the existing server + agent-side WebRTC implementation. The SDK talks to the same HTTP endpoints and ActionCable signaling channel that the agent dashboard uses — it is a peer, not a separate system.
Status: Phase 1 (pilot). Inbound calls (customer → agent) are production-ready. Outbound calls (agent → customer) are deferred pending push-notification infrastructure.

1. Architecture at a Glance

┌──────────────────┐                                   ┌──────────────────┐
│  Mobile App      │                                   │   Agent Browser  │
│ (Your RN SDK)    │                                   │  (Dashboard SPA) │
└────────┬─────────┘                                   └─────────┬────────┘
         │                                                       │
         │ ①  POST /public/api/v1/webrtc/inboxes/:id/voice_token │
         │    → { token, source_id, ice_servers }                │
         │                                                       │
         │ ②  POST /public/api/v1/webrtc/calls                   │
         │    (Bearer token)                                     │
         │    → { call_sid, signaling_channel, ice_servers }     │
         │                                                       │
         │                                                       │ ③ message.created
         │                                                       │   (content_type:
         │                                                       │    'voice_call',
         │                                                       │    status: ringing)
         │                                                       │   → ring UI
         │                                                       │
         │                                                       │ ④ POST /api/v1/accounts
         │                                                       │    /:id/webrtc/calls
         │                                                       │    /:call_sid/accept
         │                                                       │    → { token }
         │                                                       │
         │                                                       │
         │  ⑤   ActionCable stream: voice_call_{call_sid}        │
         │◄─────────────────────────────────────────────────────►│
         │        {type: offer,  sdp, from: contact}             │
         │        {type: answer, sdp, from: agent}               │
         │        {type: ice-candidate, candidate, from: ...}    │
         │                                                       │
         │  ⑥   WebRTC media (P2P, relayed via TURN if needed)   │
         │◄═════════════════════════════════════════════════════►│
         │        Opus audio, DTLS-SRTP encrypted                │
         │                                                       │
         │  ⑦   {type: hangup, from: contact|agent}              │
         │◄─────────────────────────────────────────────────────►│

Key properties:

Pure peer-to-peer media. Chatwoot's server routes signaling only — audio never touches the backend. TURN relay kicks in only if direct P2P fails (NAT traversal).
ActionCable wire protocol over raw WebSocket. The SDK intentionally avoids @rails/actioncable and sends ActionCable frames (subscribe / message) over a plain WebSocket to avoid mobile reconnection races.
The call_sid is the conversation identifier. When the mobile app creates a call, a Conversation is created in Chatwoot with identifier = call_sid. A Message with content_type: 'voice_call' fires message.created, which is how agents are notified.
First-accept-wins. Multiple agents may be online. The first POST /webrtc/calls/:sid/accept that succeeds wins; concurrent accepts get 409 Conflict. The server also broadcasts a call_unavailable message on the shared signaling stream — losing agent clients tear down; the contact SDK must ignore it (see §9.2).
Time-scoped TURN credentials. The server issues short-lived (about 10 min) TURN credentials per call. Mid-call credential refresh is not currently exposed, so very long calls are a known limitation (see §9.7).

2. Prerequisites

2.1 Feature flag

The provider is gated behind the channel_voice feature flag. Before integration works against a Chatwoot deployment, an admin must enable it for the account.

Reference: plans/voice-webrtc-provider.md (implementation plan, §12 "Pilot rollout gates").

2.2 WebRTC Voice Inbox

An admin must create a Voice inbox with provider: 'webrtc' via the dashboard UI (Settings → Inboxes → New → Voice → WebRTC). This requires choosing a TURN provider:

| turn_provider | Required provider_config keys | Credentials |
|---------------|-------------------------------|-------------|
| coturn | turn_urls[], turn_realm, turn_shared_secret | Time-scoped HMAC-SHA1 (REST-auth) |
| cloudflare | cloudflare_turn_app_id, cloudflare_turn_app_secret | Issued by Cloudflare Calls API |
| custom | turn_urls[], turn_username, turn_credential | Static pass-through |

Reference: enterprise/app/models/channel/voice.rb:123–168 (validation) and enterprise/app/services/voice/provider/webrtc/turn/ (credential issuance per provider).

What the SDK receives: it never sees the TURN secrets. The server returns a ready-to-use ice_servers array conforming to the browser WebRTC RTCIceServer dictionary.

2.3 Inbox identifier

Each inbox has a numeric database ID. Your SDK must be configured with this inbox_id (and the Chatwoot base URL) at startup. Treat it as configuration, not a secret — it appears in public URLs.

2.4 React Native runtime dependencies

The SDK will need three categories of dependencies:

| Concern | Recommended package |
|---------|---------------------|
| WebRTC peer connection + media | react-native-webrtc |
| Signaling transport | Built-in WebSocket (send/receive ActionCable frame format directly) |
| HTTP + JWT | fetch (built-in) or axios; no JWT decode is needed client-side since the SDK only passes the token back to the server |

Native permissions:

iOS: NSMicrophoneUsageDescription in Info.plist
Android: android.permission.RECORD_AUDIO in AndroidManifest.xml, plus runtime permission request

Native linking: react-native-webrtc requires platform-specific configuration. Follow its installation guide; autolinking covers most cases on RN ≥ 0.71.

3. Authentication Model

The SDK uses two credentials that travel together on every call session:

3.1 Pubsub token (long-lived, per contact)

Issued on first registration (no source_id body).
Returned in the registration response as pubsub_token.
Persists across app launches. Store it securely (Keychain / EncryptedSharedPreferences).
Sent on subsequent voice-token requests as the HTTP header X-Chatwoot-Pubsub-Token: <stored token>. The server reads it only from this header (voice_tokens_controller.rb:64). A value placed in the request body is silently ignored, so the request fails with 404. This is a common mistake on a first integration.

3.2 Voice JWT (short-lived, per call)

HS256-signed JWT with scope: 'voice.call', 10-minute expiry.
Minted fresh on each call attempt via POST /public/api/v1/webrtc/inboxes/:inbox_id/voice_token.
Used as:
- Bearer token on POST /public/api/v1/webrtc/calls (initiate)
- Subscription param on the ActionCable VoiceSignalingChannel ({ call_sid, token, role: 'contact' })
Single-use for the ActionCable subscription: the server atomically consumes the token's jti claim on subscribe to prevent replay.

JWT claims the SDK receives (informational — do not rely on fields the server hasn't documented as stable):

{
  "sub": "contact",
  "scope": "voice.call",
  "account_id": 1,
  "inbox_id": 9,
  "contact_inbox_id": 1234,
  "source_id": "abc-def-123",
  "iat": 1712345678,
  "exp": 1712346278,
  "jti": "uuid"
}

Reference: enterprise/app/services/voice/provider/webrtc/token_issuer.rb:10-22.

3.3 Anonymous vs. known contact

On the first registration, omit source_id. The server will:

Create a new Contact and ContactInbox record.
Return source_id + pubsub_token — the SDK must persist both.

On subsequent registrations (same device, same inbox), include source_id in the body and pubsub_token in the X-Chatwoot-Pubsub-Token header. The server validates the pair and re-issues a fresh voice token for the existing contact. Rules: always send them together or not at all. Sending source_id without the header returns 404; sending the header without source_id has no effect.

This lets you maintain conversation continuity: if the user calls three times over a week, all three calls become conversations attached to the same Contact in the agent inbox.

3.4 Contact identity (optional)

The voice-token request body also accepts four optional identity fields: name, email, phone_number (E.164 preferred), and identifier (the host app's stable user ID). The SDK surfaces these as a user block on registerDevice:

await client.registerDevice({
  devicePlatform: 'ios',
  deviceId: await getDeviceId(),
  user: currentUser && {
    name: currentUser.displayName,
    email: currentUser.email,
    phoneNumber: currentUser.phoneNumber,
    identifier: currentUser.id,
  },
});

Server behaviour:

First registration: any identity fields you send seed the new Contact. Without them the Contact falls back to "ios" / "android" / "web" as its display name — useful for anonymous sessions, confusing in an agent inbox once you know who the user is.
Subsequent calls: the server progressively fills blanks only — it will not overwrite a field an agent has edited in the dashboard. Calling registerDevice again after sign-in is the intended enrichment path.

Trust level: none. The server accepts whatever the client sends. If you need the server to verify the user is who the client claims, that is the HMAC-verified identifier_hash flow (mirrors Channel::WebWidget) — tracked separately and not implemented by this SDK today.

Token frugality: registerDevice is lazy. Supplying a user block does not mint a fresh voice token on its own — voice tokens are single-use and rate-limited. The identity is captured on the client and forwarded on the next startCall's token mint. Sign-in flows therefore do not spend a token.

Reference: app/controllers/public/api/v1/webrtc/voice_tokens_controller.rb:1-89.

4. Call Lifecycle (Happy Path)

Step │ Actor    │ Action
─────┼──────────┼────────────────────────────────────────────────────────────────────────
 1   │ Mobile   │ Mint voice token  (POST /voice_token)
 2   │ Mobile   │ Create call       (POST /calls)
 3   │ Mobile   │ Create RTCPeerConnection with returned ice_servers + local audio track
 4   │ Mobile   │ Subscribe to VoiceSignalingChannel (role: contact) via raw WebSocket
 5   │ Mobile   │ Create SDP offer, setLocalDescription(offer), cache offer SDP
 6   │ Mobile   │ Send {type: 'offer', sdp} and enter `ringing`
 7   │ Mobile   │ While `ringing`, replay cached offer every 2s until answer arrives
 8   │ Server   │ Creates Conversation + fires message.created (ringing)
 9   │ Agent    │ Ring UI opens in dashboard
 10  │ Agent    │ POST /webrtc/calls/:sid/accept (first-wins)
 11  │ Server   │ Broadcasts `call_unavailable` on shared stream
 12  │ Mobile   │ Ignores `call_unavailable`, but re-sends cached offer once immediately
 13  │ Agent    │ Receives offer → creates answer → sends {type:'answer', sdp}
 14  │ Mobile   │ Applies answer, flushes buffered outbound ICE, arms 15s connect budget
 15  │ Both     │ Exchange {type: 'ice-candidate'} messages (trickle ICE)
 16  │ Both     │ connectionState = 'connected' → audio flows
 17  │ Mobile   │ POST /calls/:sid/status {status: 'connected'} (fire-and-forget)
 18  │ Either   │ Sends {type: 'hangup'} and reports terminal status
 19  │ Both     │ Tear down peer connection + signaling

The "unhappy" branches (rate limit, token reuse, timeout, no agent picks up) are covered in §9 Error Handling.

5. REST API Reference

All public endpoints are under /public/api/v1/webrtc/. All responses are JSON. Errors follow standard { error: "...", errors: {...} } shapes.

5.1 Mint voice token

POST /public/api/v1/webrtc/inboxes/:inbox_id/voice_token
Content-Type: application/json

Body (first-time registration — anonymous):
{
  "device_platform": "ios",           // "ios" | "android" | "web"
  "device_id": "A3F6...-opaque-uuid"  // your persistent device identifier
}

Headers (returning contact):
X-Chatwoot-Pubsub-Token: <stored pubsub_token>   // from earlier response

Body (returning contact):
{
  "source_id":       "abc-def-123",      // from earlier response — body only
  "device_platform": "ios",
  "device_id":       "A3F6...-opaque-uuid"
}

Optional identity fields (any call — first-time or returning):
{
  "name":         "Ada Lovelace",
  "email":        "ada@example.com",
  "phone_number": "+14155550123",        // E.164 preferred
  "identifier":   "app-user-42"          // host app's stable user id
}

Identity fields are top-level on the body — a nested user object is silently ignored by the server. On first registration they seed the new Contact; on subsequent calls they progressively fill blanks without clobbering agent-edited fields. Trust level: none — see §3.4.

Critical: on returning-contact calls the pubsub_token goes in the HTTP header, not the body. The server reads only the header; a body value is silently ignored.
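The placement rules above (source_id in the body, pubsub_token in the header, identity fields flattened to the top level) can be captured in one pure request builder. This is a minimal sketch, not the SDK's actual code: the function name and the StoredContact / UserIdentity / VoiceTokenRequest shapes are hypothetical, and only the wire field names and the URL path come from this guide.

```typescript
// Hypothetical client-side shapes; only the wire field names come from the guide.
interface StoredContact { sourceId: string; pubsubToken: string }
interface UserIdentity { name?: string; email?: string; phoneNumber?: string; identifier?: string }
interface VoiceTokenRequest {
  path: string;
  headers: Record<string, string>;
  body: Record<string, string>;
}

function buildVoiceTokenRequest(
  inboxId: number,
  device: { platform: 'ios' | 'android' | 'web'; id: string },
  stored?: StoredContact,
  user?: UserIdentity,
): VoiceTokenRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  const body: Record<string, string> = {
    device_platform: device.platform,
    device_id: device.id,
  };

  // Returning contact: source_id goes in the body, pubsub_token in the header only.
  if (stored) {
    body['source_id'] = stored.sourceId;
    headers['X-Chatwoot-Pubsub-Token'] = stored.pubsubToken;
  }

  // Identity fields are flattened to the top level; a nested `user` object would be ignored.
  if (user?.name) body['name'] = user.name;
  if (user?.email) body['email'] = user.email;
  if (user?.phoneNumber) body['phone_number'] = user.phoneNumber;
  if (user?.identifier) body['identifier'] = user.identifier;

  return { path: `/public/api/v1/webrtc/inboxes/${inboxId}/voice_token`, headers, body };
}
```

Keeping the builder free of I/O makes the header-versus-body rule easy to unit test before any network code exists.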

Response 200 OK:

{
  "token": "eyJhbGciOi...",
  "source_id": "abc-def-123",
  "pubsub_token": "...",
  "identity": "contact-1234",
  "provider": "webrtc",
  "ice_servers": [
    { "urls": "stun:stun.cloudflare.com:3478" },
    { "urls": "turn:turn.example.com:3478?transport=udp",
      "username": "1712345678:user",
      "credential": "..." }
  ],
  "ice_servers_expires_at": 1712346278
}

Errors:

404 {"error":"not_found"} — covers several cases which look identical to the client on purpose (anti-enumeration): inbox missing, inbox not a WebRTC voice inbox, source_id sent with no matching ContactInbox, OR source_id present but the X-Chatwoot-Pubsub-Token header is missing / wrong. If you get 404 on a returning-contact call, your first suspect is a missing or mismatched header.
429 — rate limited (see §9.4)

Reference: app/controllers/public/api/v1/webrtc/voice_tokens_controller.rb:1-89.
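Because the TURN credentials in ice_servers are time-scoped, it can help to check their remaining lifetime before building a peer connection. The sketch below is illustrative only: the IceConfig shape, the function names, and the 30-second safety margin are assumptions, and it assumes ice_servers_expires_at is Unix time in seconds, as the sample value suggests.

```typescript
interface IceServer { urls: string | string[]; username?: string; credential?: string }
interface IceConfig { iceServers: IceServer[]; expiresAtMs: number }

// Convert the wire fields into a client-side config with a millisecond expiry.
function toIceConfig(res: { ice_servers: IceServer[]; ice_servers_expires_at: number }): IceConfig {
  // Assumption: ice_servers_expires_at is expressed in Unix seconds.
  return { iceServers: res.ice_servers, expiresAtMs: res.ice_servers_expires_at * 1000 };
}

// True when the servers are present and will not expire within the safety margin.
function isIceConfigUsable(cfg: IceConfig, nowMs: number, safetyMarginMs = 30000): boolean {
  return cfg.iceServers.length > 0 && nowMs + safetyMarginMs < cfg.expiresAtMs;
}
```

A caller would pass Date.now() as nowMs and mint a fresh voice token when the check fails, rather than starting a call with credentials that are about to lapse.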

5.2 Initiate call

POST /public/api/v1/webrtc/calls
Authorization: Bearer <token from 5.1>

No request body is required. The server derives contact + inbox identity from the JWT.

Response 200 OK:

{
  "call_sid": "wrtc_550e8400-e29b-41d4-a716-446655440000",
  "conversation_id": "42",
  "conference_sid": "conf_...",
  "signaling_channel": "voice_call_wrtc_550e8400-...",
  "ice_servers": [...],
  "ice_servers_expires_at": 1712346278
}

The SDK uses call_sid for the ActionCable subscription param and for the status endpoint.

Errors:

401 — invalid / expired / replayed token
429 — rate limited
422 — account not permitted for this provider

Reference: app/controllers/public/api/v1/webrtc/calls_controller.rb (see §11).

5.3 Report call status

POST /public/api/v1/webrtc/calls/:call_sid/status
Authorization: Bearer <voice token>
Content-Type: application/json

Body:
{ "status": "completed" }  // "failed" | "no-answer" | "canceled" | "connected"

Called by the SDK for two distinct things:

Terminal states (completed, failed, no-answer, canceled) — complementary to the {type: 'hangup'} signaling message: the status endpoint is the durable record, the signaling message is the live notification to the other peer.
connected (required, fire-and-forget) — the SDK must POST { "status": "connected" } as soon as peerConnection.connectionState === 'connected'. This records connected_at on the server. Without it, the server's Voice::CallTimeoutJob force-ends the call ~20 seconds after agent accept, even with media flowing happily. See §9.5 Connect timeout for the failure mode.
connected does not transition call_status — it only stamps a timestamp. It is idempotent and safely no-ops if:
- the call has already moved to a terminal status (completed, failed, canceled, etc.), or
- connected_at is already set.
Because it is advisory, always call it fire-and-forget with a .catch(() => {}): a failed POST must not tear the call down.

Response 200 OK: { "ok": true, "call_status": "<current>" }

Reference: app/controllers/public/api/v1/webrtc/calls_controller.rb (see §11).
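The fire-and-forget rule for connected, as opposed to the terminal statuses, can be sketched as follows. The PostFn transport signature and the reportCallStatus name are hypothetical; inject fetch, axios, or a test double. Only the endpoint path, the Bearer header, and the status values come from this guide.

```typescript
type CallStatus = 'connected' | 'completed' | 'failed' | 'no-answer' | 'canceled';

// Hypothetical transport signature so the sketch stays independent of fetch/axios.
type PostFn = (
  path: string,
  init: { headers: Record<string, string>; body: string },
) => Promise<unknown>;

function reportCallStatus(
  post: PostFn,
  callSid: string,
  voiceToken: string,
  status: CallStatus,
): Promise<void> {
  const request = post(`/public/api/v1/webrtc/calls/${encodeURIComponent(callSid)}/status`, {
    headers: { Authorization: `Bearer ${voiceToken}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ status }),
  }).then(() => undefined);

  // `connected` only stamps connected_at, so a failed POST is swallowed and
  // can never tear down a live call. Terminal statuses still surface errors.
  return status === 'connected' ? request.catch(() => undefined) : request;
}
```

Terminal statuses keep their rejection so the caller can decide whether to retry the durable record.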

6. ActionCable Wire Protocol (Raw WebSocket)

6.1 Connection

ActionCable here is a wire protocol, not a required JS client library. The SDK uses a plain WebSocket and sends ActionCable-compatible frames. Connect to:

wss://<chatwoot-host>/cable

Use ws:// instead of wss:// for non-HTTPS development hosts. On initial connect, the contact client sends a subscribe command:

{
  "command": "subscribe",
  "identifier": "{\"channel\":\"VoiceSignalingChannel\",\"call_sid\":\"wrtc_...\",\"token\":\"eyJ...\",\"role\":\"contact\"}"
}

The identifier is a string-encoded JSON object. This is standard ActionCable behavior. The server will respond:

{ type: "welcome" } — transport established
{ identifier, type: "confirm_subscription" } — subscription accepted
{ identifier, type: "reject_subscription" } — auth failed (token replay, scope mismatch, mismatched inbox/contact, role unknown). Treat this as fatal for this call attempt.
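The connection details above reduce to three small pure helpers: deriving the cable URL, building the double-encoded subscribe frame, and classifying server frames. This is a sketch under stated assumptions; the helper names and the FrameKind labels are hypothetical, while the channel name, identifier fields, and frame types come from this guide (ping is the standard ActionCable heartbeat frame).

```typescript
// https:// becomes wss://, http:// becomes ws://; the endpoint is always /cable.
function deriveCableUrl(baseUrl: string): string {
  return baseUrl.replace(/^http/, 'ws').replace(/\/+$/, '') + '/cable';
}

// The identifier is itself a JSON string, embedded inside the JSON frame.
function buildSubscribeFrame(callSid: string, token: string): { identifier: string; frame: string } {
  const identifier = JSON.stringify({
    channel: 'VoiceSignalingChannel',
    call_sid: callSid,
    token,
    role: 'contact',
  });
  return { identifier, frame: JSON.stringify({ command: 'subscribe', identifier }) };
}

type FrameKind = 'welcome' | 'ping' | 'confirmed' | 'rejected' | 'message' | 'unknown';

function classifyFrame(raw: string): FrameKind {
  const f = JSON.parse(raw) as { type?: string; message?: unknown };
  // Check `type` first: ping frames also carry a `message` field (a timestamp).
  switch (f.type) {
    case 'welcome': return 'welcome';
    case 'ping': return 'ping';
    case 'confirm_subscription': return 'confirmed';
    case 'reject_subscription': return 'rejected';
  }
  return f.message !== undefined ? 'message' : 'unknown';
}
```

Reusing the exact identifier string returned by buildSubscribeFrame for later message frames avoids key-order mismatches between subscribe and send.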

6.2 Sending signaling messages

After confirm_subscription, the SDK sends messages with:

{
  "command": "message",
  "identifier": "{...same identifier as subscribe...}",
  "data": "{\"action\":\"receive\",\"type\":\"offer\",\"sdp\":\"v=0...\"}"
}

The inner data.action must always be "receive" (that's the channel's dispatch method name on the server).

Supported outbound message types (contact SDK):

| type | Additional payload fields | Purpose |
|------|---------------------------|---------|
| offer | sdp (string) | Send SDP offer to agent (initial + replay while ringing) |
| ice-candidate | candidate (object, a serialized RTCIceCandidate) | Trickle ICE candidate to agent |
| hangup | reason (string, optional) | Gracefully tear down |

Any other type will be logged and dropped.

Rate limit: 50 messages / 10 seconds per subscription (Redis token bucket). With offer replay enabled (every 2s while ringing), normal calls stay far below this limit.

Reference: app/channels/voice_signaling_channel.rb:13-46.
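A client-side guard can keep a misbehaving SDK (for example, a runaway replay timer) from ever reaching the server's 50 messages / 10 seconds limit. The sketch below uses a sliding-window counter, which is an approximation chosen for simplicity and not the server's Redis token bucket; the OutboundBudget class name and its API are hypothetical.

```typescript
// Sliding-window send budget with an injected clock so it is easy to test.
class OutboundBudget {
  private sentAt: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 50, windowMs = 10000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the send if the budget allows it, false otherwise.
  tryAcquire(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Forget sends that have aged out of the window.
    this.sentAt = this.sentAt.filter((t) => t > cutoff);
    if (this.sentAt.length >= this.limit) return false;
    this.sentAt.push(nowMs);
    return true;
  }
}
```

A signaling client could call tryAcquire(Date.now()) before each frame and drop or defer ICE candidates when it returns false; offer replay at one message every 2 seconds consumes only 5 of the 50 slots per window.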

6.3 Receiving broadcasts

Server broadcasts arrive as:

{
  "identifier": "{...}",
  "message": {
    "type": "offer|answer|ice-candidate|hangup|call_unavailable",
    "payload": { ... },
    "from": { "kind": "contact|agent|server", "id": <user_or_contact_id> },
    "call_sid": "wrtc_..."
  }
}

Echo suppression: the server broadcasts all messages including the sender's own. The SDK must check from.kind and ignore messages where the sender is itself. For the mobile SDK (role contact), that means:

if (msg.from?.kind === 'contact') return; // own echo

The agent client applies the mirror rule (from.kind === 'agent').

Reference: app/javascript/dashboard/api/channel/voice/webrtcVoiceClient.js:96-140.

Inbound payload shapes:

| type | payload shape | Action |
|------|---------------|--------|
| offer | { sdp } | setRemoteDescription, then create and send answer |
| answer | { sdp } | setRemoteDescription, then flush any queued ICE (see §6.5) |
| ice-candidate | { candidate } | addIceCandidate if remoteDescription is set; else queue (see §6.5) |
| hangup | { reason? } | Tear down peer connection |
| call_unavailable | { accepted_by_agent_id } | Contact: do not end call; optionally re-send cached offer if still ringing. Agent: losing peers tear down |
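For the contact role, echo suppression and the inbound payload rules can be combined into a single routing function that returns the action to take. This is a sketch: the InboundMessage type and the action labels are hypothetical, and dropping an inbound offer is an assumption based on the contact always being the offerer in the lifecycle described in this guide.

```typescript
type PeerKind = 'contact' | 'agent' | 'server';

interface InboundMessage {
  type: string;
  payload?: Record<string, unknown>;
  from?: { kind: PeerKind; id?: number | string };
  call_sid?: string;
}

type ContactAction =
  | 'ignore-echo'
  | 'apply-answer'
  | 'add-or-queue-ice'
  | 'teardown'
  | 'resend-offer-if-ringing'
  | 'drop';

function routeForContact(msg: InboundMessage): ContactAction {
  // The server broadcasts to every subscriber, including the sender.
  if (msg.from?.kind === 'contact') return 'ignore-echo';

  switch (msg.type) {
    case 'answer': return 'apply-answer';
    case 'ice-candidate': return 'add-or-queue-ice';
    case 'hangup': return 'teardown';
    // Meant for losing agents; the contact must not end the call on it.
    case 'call_unavailable': return 'resend-offer-if-ringing';
    // Assumption: the contact is the offerer, so an inbound offer is unexpected.
    default: return 'drop';
  }
}
```

Separating the routing decision from the side effects keeps the peer-connection code free of protocol branching.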

6.4 Unsubscribe semantics

On the server side, a contact unsubscribing while the call is still ringing transitions the conversation to canceled. For mobile apps this matters because if the user kills the app mid-ring, the agent-side UI will reflect a canceled call (not a phantom ring forever).

Reference: spec/channels/voice_signaling_channel_spec.rb:84-90.

6.5 ICE candidate ordering (required SDK pattern)

Trickle ICE is asynchronous by design. The agent can start emitting ICE candidates the instant its setLocalDescription(answer) resolves, while on the contact side setRemoteDescription(answer) is itself asynchronous and may not have resolved when the first ice-candidate message is handled. If the SDK calls addIceCandidate before the answer has been applied, RTCPeerConnection throws InvalidStateError: The remote description was null.

Under good network conditions the answer is usually applied before any ICE candidate arrives, which can hide the bug for dozens of calls. A slightly slower answer then flips the ordering and calls start failing at the same point.

The SDK must buffer ICE candidates that arrive while remoteDescription is null, and flush them immediately after setRemoteDescription(answer) resolves:

private pendingIceCandidates: RTCIceCandidateInit[] = [];

async onIceCandidateFromSignaling(candidate: RTCIceCandidateInit) {
  if (this.pc.remoteDescription) {
    await this.pc.addIceCandidate(candidate);
  } else {
    this.pendingIceCandidates.push(candidate);
  }
}

async onAnswerFromSignaling(sdp: string) {
  await this.pc.setRemoteDescription({ type: 'answer', sdp });
  for (const c of this.pendingIceCandidates.splice(0)) {
    try { await this.pc.addIceCandidate(c); } catch (_) { /* drop bad candidates */ }
  }
}

Also reset the queue on teardown so it does not leak into the next call. The agent-side reference implementation at app/javascript/dashboard/api/channel/voice/webrtcVoiceClient.js follows exactly this pattern (pendingIceCandidates + flushPendingIceCandidates); diff against it if your SDK behavior diverges.

7. Call State Machine

The server tracks call state on Conversation#additional_attributes['call_status']. Legal transitions:

               ┌─────────┐
 (created) ───►│ ringing │
               └────┬────┘
                    │
       ┌────────────┼────────────┬────────────┐
       ▼            ▼            ▼            ▼
  in-progress   no-answer     canceled      failed
       │                                        
       │                                        
  completed / failed

ringing — call created, awaiting accept
in-progress — agent accepted; peer connection alive. The server records connected_at on this state only when a client reports status: 'connected' (see §5.3). Without that report, the server's connect_timeout job force-ends the call at 20 s — see §9.5.
completed — normal termination
no-answer — 30-second ring timeout (server enforced via Voice::CallTimeoutJob)
canceled — contact hung up or unsubscribed while ringing
failed — ICE failure, connect timeout (20 s without connected_at), or unexpected error
missed — agent-oriented label (shown when direction is inbound and status ended without accept)

The SDK should drive its own local UI state from a subset: idle → dialing → ringing → connecting → connected → ended(reason). Map the server status into local UI as needed, but do not rely on server status alone — the SDK also observes peerConnection.connectionState.

Reference: enterprise/app/services/voice/call_state_machine.rb, enterprise/app/jobs/voice/call_timeout_job.rb.
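The local UI state subset can be enforced with a small transition table. The sketch below is one possible encoding: the table itself, the ended → idle reset, and the mapping of the server's in-progress to the local connecting state are assumptions, chosen so that connected is only ever reached by observing peerConnection.connectionState.

```typescript
type LocalState = 'idle' | 'dialing' | 'ringing' | 'connecting' | 'connected' | 'ended';
type ServerStatus = 'ringing' | 'in-progress' | 'completed' | 'no-answer' | 'canceled' | 'failed';

// Legal local transitions; every active state may jump straight to `ended`.
const NEXT_STATES: Record<LocalState, LocalState[]> = {
  idle: ['dialing'],
  dialing: ['ringing', 'ended'],
  ringing: ['connecting', 'ended'],
  connecting: ['connected', 'ended'],
  connected: ['ended'],
  ended: ['idle'], // assumption: the UI resets to idle after showing the end reason
};

function canTransition(from: LocalState, to: LocalState): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}

// Map a server call_status onto the local subset.
function localFromServer(status: ServerStatus): LocalState {
  switch (status) {
    case 'ringing': return 'ringing';
    // Agent accepted, but media is not confirmed until connectionState says so.
    case 'in-progress': return 'connecting';
    default: return 'ended';
  }
}
```

Rejecting illegal transitions (for example ringing → connected) makes out-of-order signaling events visible in logs instead of silently corrupting the UI.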

8. React Native Implementation Guide

This section is a cookbook, not a complete SDK. Use it to structure your own. All examples use TypeScript and react-native-webrtc v124+.

8.1 Suggested module layout

chatwoot-voice-sdk/
├── src/
│   ├── ChatwootVoiceClient.ts       # Public API (start, hangup, events)
│   ├── auth/
│   │   ├── RegistrationManager.ts   # source_id + pubsub_token persistence
│   │   └── TokenService.ts          # voice-token minting
│   ├── call/
│   │   ├── CallSession.ts           # Per-call orchestration
│   │   ├── SignalingClient.ts       # Raw WebSocket ActionCable protocol client
│   │   └── PeerConnectionManager.ts # RTCPeerConnection lifecycle
│   ├── http/
│   │   └── ChatwootClient.ts        # fetch wrapper, base URL, headers
│   └── types.ts                     # shared types
└── index.ts

8.2 Public API surface (reference design)

import { ChatwootVoiceClient } from '@your-org/chatwoot-voice-sdk';

const client = new ChatwootVoiceClient({
  baseUrl: 'https://chatwoot.example.com',
  inboxId: 9,
  storage: asyncStorageAdapter,   // for persisting source_id / pubsub_token
});

// Register once per install (idempotent)
await client.registerDevice({
  devicePlatform: 'ios',
  deviceId: await getDeviceId(),
});

// Start a call
const call = await client.startCall();

call.on('ringing',    ()    => setUi('ringing'));
call.on('connected',  ()    => setUi('in-call'));
call.on('remoteStream', (s) => attachStreamToView(s));
call.on('ended',      (reason) => setUi('idle'));
call.on('error',      (err) => showToast(err.message));

// User taps hangup
await call.hangup();

8.3 Registration & token minting

// src/auth/TokenService.ts
import { Platform } from 'react-native'; // needed for Platform.OS below

export class TokenService {
  constructor(
    private http: ChatwootClient,
    private inboxId: number,
    private storage: Storage,
  ) {}

  async mintVoiceToken(): Promise<VoiceTokenResponse> {
    const existing = await this.storage.read('chatwoot.contact');
    const body: Record<string, unknown> = {
      device_platform: Platform.OS,
      device_id: await this.deviceId(),
    };
    const headers: Record<string, string> = {};

    if (existing) {
      // source_id → body, pubsub_token → header. Never both in body; the
      // server reads pubsub_token ONLY from the X-Chatwoot-Pubsub-Token
      // header (voice_tokens_controller.rb:64). Body values are ignored and
      // produce 404 {"error":"not_found"} on every repeat call.
      body.source_id = existing.sourceId;
      headers['X-Chatwoot-Pubsub-Token'] = existing.pubsubToken;
    }

    const res = await this.http.post(
      `/public/api/v1/webrtc/inboxes/${this.inboxId}/voice_token`,
      body,
      { headers },
    );

    // Persist on every successful response — the server may rotate the
    // pubsub_token, so treat the response as the source of truth.
    await this.storage.write('chatwoot.contact', {
      sourceId:    res.source_id,
      pubsubToken: res.pubsub_token,
    });

    return res;
  }
}

8.4 Peer connection & media acquisition

// src/call/PeerConnectionManager.ts
import {
  RTCPeerConnection,
  RTCSessionDescription,
  RTCIceCandidate,
  mediaDevices,
  MediaStream,
} from 'react-native-webrtc';

export class PeerConnectionManager extends EventTarget {
  private pc: RTCPeerConnection | null = null;
  private localStream: MediaStream | null = null;

  async initialize(iceServers: RTCIceServer[]): Promise<void> {
    this.localStream = await mediaDevices.getUserMedia({ audio: true });

    this.pc = new RTCPeerConnection({ iceServers });

    // Add local tracks
    this.localStream.getTracks().forEach(t => {
      this.pc!.addTrack(t, this.localStream!);
    });

    this.pc.ontrack = (e) => {
      this.dispatchEvent(new CustomEvent('remote-stream', { detail: e.streams[0] }));
    };
    this.pc.onicecandidate = (e) => {
      if (e.candidate) {
        this.dispatchEvent(new CustomEvent('ice-candidate', { detail: e.candidate }));
      }
    };
    this.pc.onconnectionstatechange = () => {
      const s = this.pc!.connectionState;
      if (s === 'connected') this.dispatchEvent(new Event('connected'));
      if (s === 'failed') this.dispatchEvent(new Event('connection-failed'));
      if (s === 'closed') this.dispatchEvent(new Event('connection-closed'));
      // `disconnected` is transient; do not end call immediately.
    };
    this.pc.oniceconnectionstatechange = () => {
      if (this.pc!.iceConnectionState === 'failed') {
        this.dispatchEvent(new Event('connection-failed'));
      }
    };
  }

  async createOffer(): Promise<RTCSessionDescriptionInit> {
    const offer = await this.pc!.createOffer({ offerToReceiveAudio: true });
    await this.pc!.setLocalDescription(offer);
    return offer;
  }

  // Required: flush any ICE candidates that arrived while remoteDescription
  // was null. See §6.5 for why — skipping this produces intermittent
  // InvalidStateError teardowns that look like random drops.
  private pendingIce: RTCIceCandidateInit[] = [];

  async acceptAnswer(sdp: string): Promise<void> {
    await this.pc!.setRemoteDescription(new RTCSessionDescription({ type: 'answer', sdp }));
    for (const c of this.pendingIce.splice(0)) {
      try { await this.pc!.addIceCandidate(new RTCIceCandidate(c)); } catch (_) {}
    }
  }

  async addRemoteIceCandidate(candidate: RTCIceCandidateInit): Promise<void> {
    if (this.pc!.remoteDescription) {
      await this.pc!.addIceCandidate(new RTCIceCandidate(candidate));
    } else {
      this.pendingIce.push(candidate);
    }
  }

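  teardown(): void {
    this.pc?.close();
    this.localStream?.getTracks().forEach(t => t.stop());
    this.pc = null;
    this.localStream = null;
    // Reset the ICE queue so stale candidates do not leak into the next call (§6.5).
    this.pendingIce = [];
  }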
}

This mirrors the agent-side client almost 1:1. The browser client is at app/javascript/dashboard/api/channel/voice/webrtcVoiceClient.js:29-140 — read it alongside your SDK code to verify behavior parity.

8.5 Signaling client (raw WebSocket)

Use a plain WebSocket and speak ActionCable frame format directly.

// src/call/SignalingClient.ts
export class SignalingClient {
  private ws: WebSocket | null = null;
  private identifier = '';
  private confirmed = false;

  subscribe({ cableUrl, callSid, token }: {
    cableUrl: string;
    callSid: string;
    token: string;
  }): Promise<void> {
    this.identifier = JSON.stringify({
      channel: 'VoiceSignalingChannel',
      call_sid: callSid,
      token,
      role: 'contact',
    });

    this.ws = new WebSocket(cableUrl);

    return new Promise((resolve, reject) => {
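      this.ws!.onopen = () => {
        this.wsSend({ command: 'subscribe', identifier: this.identifier });
      };

      // Without this, a socket that fails before confirm_subscription
      // would leave the returned promise pending forever.
      this.ws!.onerror = () => reject(new Error('Signaling socket error'));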

      this.ws!.onmessage = (event) => {
        const frame = JSON.parse(String(event.data));

        if (frame.type === 'welcome' || frame.type === 'ping') return;
        if (frame.type === 'confirm_subscription') {
          this.confirmed = true;
          resolve();
          return;
        }
        if (frame.type === 'reject_subscription') {
          reject(new Error('Signaling subscription rejected'));
          return;
        }

        const msg = frame.message as SignalingMessage | undefined;
        if (!msg) return;
        if (msg.from?.kind === 'contact') return; // echo suppression
        this.emit('signal', msg);
      };
    });
  }

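  send(type: SignalingMessageType, body: Record<string, unknown> = {}): void {
    if (!this.ws || !this.confirmed) return;
    this.wsSend({
      command: 'message',
      identifier: this.identifier,
      data: JSON.stringify({ action: 'receive', type, ...body }),
    });
  }

  // Serialize one ActionCable frame onto the socket.
  private wsSend(frame: Record<string, unknown>): void {
    this.ws?.send(JSON.stringify(frame));
  }

  // Minimal listener registry backing the emit('signal', msg) call in subscribe().
  private listeners: Array<(msg: SignalingMessage) => void> = [];

  onSignal(listener: (msg: SignalingMessage) => void): void {
    this.listeners.push(listener);
  }

  private emit(_event: 'signal', msg: SignalingMessage): void {
    this.listeners.forEach((listener) => listener(msg));
  }
}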

8.6 Stitching it together — CallSession (current reliable flow)

The call orchestration that avoids the "agent answered but contact stayed ringing" bug:

1. Mint voice token.
2. Create call.
3. Initialize peer connection + local mic track.
4. Subscribe signaling first (wait for confirm_subscription).
5. Create offer, set local description, cache offerSdp, send offer, transition to ringing.
6. While ringing, replay cached offer every 2s (stop replay as soon as answer is applied).
7. On call_unavailable: ignore as non-terminal, but re-send cached offer immediately.
8. On answer: transition to connecting, apply answer, flush buffered outbound ICE, arm 15s connection timeout.
9. On connected: emit connected + fire-and-forget POST /status {status:'connected'}.

Minimal sketch of the two critical pieces:

// Offer replay while waiting for answer
private offerSdp: string | null = null;
private offerReplayHandle: ReturnType<typeof setInterval> | null = null;

private armOfferReplay(): void {
  if (this.offerReplayHandle) return;
  this.offerReplayHandle = setInterval(() => {
    if (this.state !== 'ringing' || this.answerApplied || !this.offerSdp) return;
    this.signaling.send('offer', { sdp: this.offerSdp });
  }, 2000);
}

// call_unavailable handling
case 'call_unavailable': {
  // Do not end the call; this event is intended for losing agents.
  if (this.state === 'ringing' && !this.answerApplied && this.offerSdp) {
    this.signaling.send('offer', { sdp: this.offerSdp });
  }
  break;
}

This replay behavior is the mitigation for the observed issue in which the initial offer is missed around the moment the agent accepts (see §9.10).
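
The fragments above arm the replay but do not show how it is stopped once the answer is applied. The following is a minimal sketch of the full lifecycle, not the SDK's actual implementation: `OfferReplayer`, `Scheduler`, and `SendOffer` are hypothetical names introduced here, and the scheduler is injected only so the timing can be exercised without real timers.

```typescript
// Hypothetical helper; names are illustrative and not part of the SDK's API.
type SendOffer = (sdp: string) => void;

interface Scheduler {
  setInterval(fn: () => void, ms: number): unknown;
  clearInterval(handle: unknown): void;
}

// Default scheduler backed by the platform timers.
const realScheduler: Scheduler = {
  setInterval: (fn, ms) => setInterval(fn, ms),
  clearInterval: (handle) =>
    clearInterval(handle as ReturnType<typeof setInterval>),
};

class OfferReplayer {
  private sdp: string | null = null;
  private handle: unknown = null;
  private readonly send: SendOffer;
  private readonly scheduler: Scheduler;
  private readonly intervalMs: number;

  constructor(send: SendOffer, scheduler: Scheduler = realScheduler, intervalMs = 2000) {
    this.send = send;
    this.scheduler = scheduler;
    this.intervalMs = intervalMs;
  }

  /** Cache the offer, send it once, then replay it on every interval tick. */
  start(sdp: string): void {
    this.stop();
    this.sdp = sdp;
    this.send(sdp);
    this.handle = this.scheduler.setInterval(() => this.resend(), this.intervalMs);
  }

  /** Immediate re-send, e.g. on call_unavailable while still ringing. */
  resend(): void {
    if (this.sdp !== null) this.send(this.sdp);
  }

  /** Call as soon as the answer is applied, and again on teardown. */
  stop(): void {
    if (this.handle !== null) this.scheduler.clearInterval(this.handle);
    this.handle = null;
    this.sdp = null;
  }
}
```

Clearing the cached SDP in `stop()` means a late timer tick or a late `call_unavailable` cannot emit a stale offer after the answer has been applied.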

8.7 Attaching remote audio

In React Native with react-native-webrtc, remote audio plays automatically through the OS audio pipeline once a track is attached — there is no view to mount and no <audio> element. What you do need is a correctly configured audio session, otherwise the mic may capture silence and/or remote audio may route to the wrong endpoint:

iOS: category playAndRecord, options like allowBluetooth / defaultToSpeaker.
Android: request RECORD_AUDIO at runtime; the audio mode defaults are usually fine for voice calls.

The community-standard way to handle both is react-native-incall-manager — add it as a peer dependency and call InCallManager.start({ media: 'audio' }) inside initialize() and InCallManager.stop() inside teardown(). This also manages proximity sensor / screen-off and earpiece routing automatically.

In a browser (your reference for the agent dashboard, or if you build a web caller), the rules are different: remote audio does not play automatically. You must create an <audio> element and set srcObject = stream, then call .play(). The dashboard reference implementation at app/javascript/dashboard/api/channel/voice/webrtcVoiceClient.js (attachRemoteStream / detachRemoteStream) does exactly this — copy the pattern if you are building a web SDK.

9. Error Handling & Edge Cases

9.1 Token replay / scope mismatch

Symptom: ActionCable subscribe returns reject_subscription immediately.

Cause: the jti has already been consumed (SDK retried a subscribe with the same token), or the role doesn't match the token's scope.

Fix: always mint a fresh voice token for each call attempt. Never cache a voice token across calls. If the subscribe is rejected, bail out and surface an error; do not retry with the same token.

Reference: enterprise/app/services/voice/provider/webrtc/token_decoder.rb:18-64.
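
One way to make the "never subscribe twice with the same token" rule hard to violate is to wrap the token in a small guard object. This is a sketch under assumptions: `VoiceTokenLease` is a hypothetical name, and it assumes (from the symptom above and the session in §12.4) that the jti is consumed by the subscribe while the same token remains usable as a Bearer credential for the REST calls of that attempt.

```typescript
// Hypothetical guard; one instance per call attempt, created from a freshly minted token.
class VoiceTokenLease {
  private subscribed = false;
  private readonly token: string;

  constructor(token: string) {
    this.token = token;
  }

  /** Authorization header value for the REST calls of this attempt. */
  authorizationHeader(): string {
    return `Bearer ${this.token}`;
  }

  /** Returns the token for the single allowed subscribe; throws on reuse. */
  takeForSubscribe(): string {
    if (this.subscribed) {
      throw new Error('Voice token already used to subscribe; mint a fresh token');
    }
    this.subscribed = true;
    return this.token;
  }
}
```

A retry path that tries to re-subscribe with the old lease then fails loudly on the client instead of producing a silent reject_subscription from the server.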

9.2 Call unavailable (contact must ignore)

Symptom: signaling broadcast { type: 'call_unavailable', payload: { accepted_by_agent_id: N }, from: { kind: 'server' } } arrives on the contact's subscription right after the agent accepts.

Meaning (server intent): the message exists to tell other agents on the ring fan-out that one of them already won the race; losing agents are expected to dismiss their ring widgets locally. The contact and the agents share the same signaling stream, so the contact receives it too — that's a stream-design artifact, not a message meant for the contact.

Required SDK behavior: do not treat it as a terminal event (do not end the call); if still ringing with no answer yet, re-send the cached offer once immediately. If the contact instead treats call_unavailable as a hard failure and tears down, the sequence is:

Contact sends offer
Agent accepts → server broadcasts call_unavailable
Contact tears down (buggy)
Agent's answer arrives on an unsubscribed channel — no peer left
Mobile shows "couldn't connect" for a call that was working

The practical fix: never terminate on call_unavailable, and replay the cached offer immediately while still ringing (see §8.6).

References: enterprise/app/services/voice/provider/webrtc/signaling_service.rb, enterprise/app/controllers/api/v1/accounts/webrtc/calls_controller.rb#accept.
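
The rule can be expressed as a pure decision function that the signaling handler consults before touching the peer connection. This is a sketch, not the SDK's code: `decideOnSignal`, `CallContext`, and the action names are hypothetical, while the message types and `from.kind` values are the ones used in this guide. Ignoring a second answer after one has been applied is an assumption made here because replayed offers could plausibly produce duplicate answers.

```typescript
// Hypothetical names; only the message types and from.kind values come from this guide.
type CallState = 'ringing' | 'connecting' | 'connected' | 'ended';
type SignalAction =
  | 'ignore'
  | 'resend-offer'
  | 'apply-answer'
  | 'add-ice-candidate'
  | 'end-call';

interface InboundSignal {
  type: string;
  from?: { kind: string };
}

interface CallContext {
  state: CallState;
  answerApplied: boolean;
  hasCachedOffer: boolean;
}

function decideOnSignal(ctx: CallContext, msg: InboundSignal): SignalAction {
  if (ctx.state === 'ended') return 'ignore';
  // Echo suppression: the contact also receives its own broadcasts.
  if (msg.from?.kind === 'contact') return 'ignore';

  switch (msg.type) {
    case 'call_unavailable':
      // Never terminal for the contact; at most a cue to replay the offer.
      return ctx.state === 'ringing' && !ctx.answerApplied && ctx.hasCachedOffer
        ? 'resend-offer'
        : 'ignore';
    case 'answer':
      // Assumption: duplicate answers are dropped once one has been applied.
      return ctx.answerApplied ? 'ignore' : 'apply-answer';
    case 'ice-candidate':
      return 'add-ice-candidate';
    case 'hangup':
      return 'end-call';
    default:
      return 'ignore';
  }
}
```

Keeping the decision separate from the side effects makes the "call_unavailable never ends the call" invariant easy to pin in a unit test.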

9.3 No agent answers

Symptom: ringing → no-answer after 30 seconds. A hangup signaling message is broadcast with reason: 'no-answer'.

SDK behavior: treat as normal termination, show an appropriate UI ("Nobody answered — try again").

Reference: enterprise/app/jobs/voice/call_timeout_job.rb.

9.4 Rate limits

Two layers:

| Layer | Scope | Limit |
|-------|-------|-------|
| rack-attack on POST /voice_token | Per IP | Check config/initializers/rack_attack.rb for current thresholds |
| rack-attack on POST /calls | Per IP | Same |
| ActionCable message rate | Per subscription | 50 msgs / 10s |

Response on REST rate limit: 429 Too Many Requests. SDK should back off (honor Retry-After if present) and surface a friendly error. Do not auto-retry aggressively — a misbehaving SDK can flood the agent inbox with ring events.
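
A conservative back-off can be computed from the 429 response alone. The sketch below is illustrative: `retryDelayMs` is a hypothetical helper, and the fallback schedule (1s doubling up to 30s) is an assumption chosen here, not a server requirement. Retry-After is handled in both forms allowed by HTTP, delta-seconds and an HTTP-date.

```typescript
// Hypothetical helper: how long to wait before the next attempt after a 429.
function retryDelayMs(
  retryAfter: string | null, // raw Retry-After header value, if any
  attempt: number,           // 0-based count of retries already made
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: delta-seconds, e.g. "5".
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
    // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    const at = Date.parse(trimmed);
    if (!Number.isNaN(at)) return Math.max(0, at - nowMs);
  }
  // No usable header: exponential back-off (assumed values), capped at 30s.
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(1000 * 2 ** exponent, 30000);
}
```

Pair this with a small cap on total attempts so a persistent 429 surfaces as a user-facing error rather than a retry loop.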

9.5 Connect timeout (20-second server-side kill)

Symptom: the call connects and both sides hear audio, but ~20 seconds after the agent accepts, the call is force-ended with {type: 'hangup', reason: 'connect_timeout', from: { kind: 'server' }}. The mobile alert reads "call disconnected" even though nothing failed.

Cause: on agent accept, the server schedules Voice::CallTimeoutJob with a 20-second wait. When the job fires, it checks conversation.additional_attributes['connected_at'] — if that field is still missing, it assumes media never came up and ends the call as failed. The field is set when either peer reports media-connected.

Required SDK behavior: as soon as peerConnection.connectionState === 'connected', POST { "status": "connected" } to /public/api/v1/webrtc/calls/:call_sid/status (see §5.3). This records connected_at and the timeout job no-ops.

Fire-and-forget. Wrap in .catch(() => {}). A failed POST must not tear the call down — worst case is the server still times it out, which is no worse than not reporting at all.
Send exactly once per call (PCM connected fires once). The server is idempotent, so duplicates are safe but pointless.
The agent-side dashboard reports this too via its own endpoint (POST /api/v1/accounts/:account_id/webrtc/calls/:call_sid/connected); either peer winning the race is sufficient, but a well-behaved contact SDK should still report.

Reference: enterprise/app/jobs/voice/call_timeout_job.rb:54-66 (the check), enterprise/app/services/voice/call_status/manager.rb#record_connected (the write).
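
The once-only, fire-and-forget report can be isolated in a small helper so the rules above are enforced in one place. This is a sketch with hypothetical names (`ConnectedReporter`, `PostStatus`); the `post` function is assumed to wrap the POST to /public/api/v1/webrtc/calls/:call_sid/status described in §5.3.

```typescript
// Hypothetical wrapper around the status POST; resolves or rejects with the HTTP outcome.
type PostStatus = (status: 'connected') => Promise<unknown>;

class ConnectedReporter {
  private reported = false;
  private readonly post: PostStatus;

  constructor(post: PostStatus) {
    this.post = post;
  }

  /** Call from the peer connection's connectionstatechange handler. */
  onConnectionState(state: string): void {
    if (state !== 'connected' || this.reported) return;
    this.reported = true;
    try {
      // Fire-and-forget: a failed POST must never tear the call down.
      this.post('connected').catch(() => {});
    } catch {
      // A synchronous throw from the transport is swallowed for the same reason.
    }
  }
}
```

Setting the flag before the POST is issued keeps the report at exactly one attempt even if the connection state flaps between 'connected' and 'disconnected'.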

9.6 ICE failure

Symptom: peerConnection.connectionState === 'failed' or stays 'connecting' past a reasonable timeout (15s is a good SDK-side limit).

Diagnosis: likely a TURN issue. The SDK cannot fix this — it's a deployment problem. Surface an error, report status: 'failed', emit ended('ice-failed').

If this happens repeatedly for a specific TURN provider config, the Chatwoot admin needs to revisit TURN credentials.
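
The SDK-side limit can be evaluated with a pure function that a timer or the connectionstatechange handler calls. This is a minimal sketch: `judgeConnection` and `IceVerdict` are hypothetical names, and the 15-second default is the suggested limit from this section rather than a server-enforced value.

```typescript
// Hypothetical helper: classify the peer connection relative to the SDK-side limit.
type IceVerdict = 'wait' | 'connected' | 'ice-failed';

function judgeConnection(
  connectionState: string, // RTCPeerConnection.connectionState
  msSinceAnswer: number,   // time elapsed since the answer was applied
  limitMs = 15000,
): IceVerdict {
  if (connectionState === 'connected') return 'connected';
  if (connectionState === 'failed') return 'ice-failed';
  // 'new', 'connecting', 'disconnected': keep waiting until the limit is reached.
  return msSinceAnswer >= limitMs ? 'ice-failed' : 'wait';
}
```

On 'ice-failed' the caller would report status: 'failed' and emit ended('ice-failed') as described above.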

9.7 ICE server credential expiry (long calls)

ice_servers_expires_at is ~10 minutes in the future. If a call lasts longer and the connection drops mid-call (ICE restart needed), you would need fresh credentials. The current server does not expose a "refresh credentials for existing call" endpoint. For v1, accept the limitation: calls that exceed the TURN credential lifetime and then lose their connection cannot recover. Document it to users as "up to 10-minute calls." Typical voice support calls end well within this window.
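
Before attempting any mid-call recovery, the SDK can check whether the credentials it holds are still valid. The sketch below assumes that ice_servers_expires_at is a Unix timestamp in seconds; this guide does not state the format, so confirm it against the server response. `iceCredentialsUsable` and the 30-second safety margin are hypothetical.

```typescript
// Hypothetical helper. ASSUMPTION: expiresAtEpochSec is a Unix timestamp in seconds.
function iceCredentialsUsable(
  expiresAtEpochSec: number,
  nowMs: number = Date.now(),
  safetyMarginSec = 30, // leave headroom for the TURN allocation round-trip
): boolean {
  return nowMs / 1000 + safetyMarginSec < expiresAtEpochSec;
}
```

When this returns false there is no refresh endpoint to fall back on, so the SDK should end the call with a clear error rather than attempt an ICE restart with expired credentials.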

9.8 Network change (Wi-Fi → cellular)

react-native-webrtc can do an ICE restart to recover, but this requires a new offer/answer exchange. If you implement ICE restart, send a new offer (with iceRestart: true) through the same signaling channel. The agent client will produce a new answer. This is not yet tested in the Chatwoot implementation — validate carefully before shipping.

9.9 App backgrounded

When the user backgrounds the app during a call:

iOS: Request audio background mode and configure VoIP audio session. Consider CallKit integration for a native call UI.
Android: Keep a foreground service alive (and show a persistent notification). Consider ConnectionService.

Without CallKit/ConnectionService, the OS will eventually suspend the app and audio will cut out. This is an SDK concern, not a Chatwoot concern.

9.10 Agent answered but contact stays ringing (missed first offer)

Symptom: contact logs show call_unavailable, but no answer follows; call remains ringing until server/agent hangup.

Why this happens: in some deployments the first offer can be missed around the accept handoff. call_unavailable confirms an agent accepted, but does not guarantee that the winning agent already consumed the first contact offer.

Required SDK mitigation (implemented in this SDK):

Cache the generated offer SDP.
While ringing, replay cached offer every ~2 seconds.
On call_unavailable while still ringing and unanswered, re-send cached offer once immediately.
Stop replay as soon as answer is applied.

This mitigation is idempotent and stays below signaling rate limits in normal usage.
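
As a rough check on the rate-limit claim: a 2-second replay interval produces at most 5 offers in any 10-second window, plus one immediate re-send on call_unavailable, which leaves about 44 of the 50 messages from §9.4 for ICE candidates and hangup. If the SDK wants to enforce this locally, a sliding-window counter is enough. The sketch assumes the server counts messages sent by this subscription; `SignalingBudget` is a hypothetical name.

```typescript
// Hypothetical client-side guard mirroring the 50 msgs / 10s limit from §9.4.
class SignalingBudget {
  private stamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 50, windowMs = 10000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true and records the send if the window still has room. */
  tryAcquire(nowMs: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.stamps = this.stamps.filter((t) => nowMs - t < this.windowMs);
    if (this.stamps.length >= this.limit) return false;
    this.stamps.push(nowMs);
    return true;
  }
}
```

A rejected `tryAcquire` is a reasonable point to skip a replayed offer rather than an ICE candidate, since the offer will be replayed again on the next tick.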

10. Known Limitations & Roadmap

Phase 1 (current) — supported

Inbound calls: mobile app → Chatwoot agent (browser dashboard)
Three TURN providers (Coturn, Cloudflare, custom)
Per-inbox TURN config
Anonymous + persistent contact identity (via source_id)
Rate limiting, replay protection, timeout handling

Phase 1 — not supported

Outbound calls (agent → mobile) — raises NotImplementedError on the server. Requires push infrastructure (APNs/FCM) to wake a backgrounded app. Reference: enterprise/app/services/voice/provider/webrtc/adapter.rb:11-16.
Call recording / playback
Mute, hold, transfer
IVR / routing trees
Voicemail
Skill-based routing
Multi-party calls
SDK-side ICE restart (untested; may work with careful implementation)

Current tech debt (from `CUSTOMIZATIONS.md` WebRTC entry)

The channel_voice.phone_number column stores a synthetic value ("webrtc-acct-{id}-{hex}") for WebRTC inboxes — a compatibility shim due to a NOT NULL constraint shared with Twilio. Likely to be refactored (split table per provider) in a later phase.
Strong-params array-of-scalars behavior for turn_urls relies on a Rails 7.1 quirk. There is a regression spec that will fail loudly if a future Rails version changes this.

11. Code References

The authoritative source is the code. These are the files to read when your SDK's behavior diverges from expectations:

Backend — Server

| File | Purpose |
|------|---------|
| enterprise/app/models/channel/voice.rb | Voice channel model, provider validation, initiate_call dispatch |
| enterprise/app/services/voice/provider/webrtc/adapter.rb | WebRTC provider adapter (inbound-only) |
| enterprise/app/services/voice/provider/webrtc/token_issuer.rb | JWT minting for contact + agent |
| enterprise/app/services/voice/provider/webrtc/token_decoder.rb | JWT verification + jti replay prevention |
| enterprise/app/services/voice/provider/webrtc/signaling_service.rb | Broadcast helpers for all signaling message types |
| enterprise/app/services/voice/provider/webrtc/turn/coturn.rb | Coturn REST-auth credential generator |
| enterprise/app/services/voice/provider/webrtc/turn/cloudflare.rb | Cloudflare Calls API integration |
| enterprise/app/services/voice/provider/webrtc/turn/custom.rb | Static pass-through |
| enterprise/app/services/voice/inbound_call_builder.rb | Creates Conversation + initial voice_call Message on call start |
| enterprise/app/services/voice/call_state_machine.rb | Allowed transitions |
| enterprise/app/services/voice/call_status/manager.rb | Processes status updates |
| enterprise/app/jobs/voice/call_timeout_job.rb | Ring timeout (30s) + connect timeout (20s) |

Backend — Controllers & Channels

| File | Purpose |
|------|---------|
| app/controllers/public/api/v1/webrtc/voice_tokens_controller.rb | POST /voice_token |
| app/controllers/public/api/v1/webrtc/calls_controller.rb | POST /calls, POST /calls/:sid/status |
| enterprise/app/controllers/api/v1/accounts/webrtc/calls_controller.rb | POST /calls/:sid/accept (agent side) |
| app/channels/voice_signaling_channel.rb | VoiceSignalingChannel — single signaling endpoint for both roles |
| enterprise/app/controllers/enterprise/api/v1/accounts/inboxes_controller.rb | Whitelists voice channel type + provider: webrtc strong params |
| config/routes.rb | WebRTC namespace (see lines with namespace :webrtc) |

Frontend — Agent Dashboard (reference implementation)

| File | Purpose |
|------|---------|
| app/javascript/dashboard/api/channel/voice/webrtcVoiceClient.js | Canonical reference for SDK behavior — peer connection, signaling, state handling |
| app/javascript/dashboard/api/channel/voice/voiceClientFactory.js | Dispatches to Twilio or WebRTC client based on inbox |
| app/javascript/dashboard/api/channel/voice/specs/webrtcVoiceClient.spec.js | Vitest unit tests — implicitly define the wire contract |
| app/javascript/dashboard/composables/useCallSession.js | Call-session composable (ties client to Vuex) |
| app/javascript/dashboard/stores/calls.js | Pinia store for active calls |
| app/javascript/dashboard/helper/voice.js | Handlers for message.created / message.updated events with content_type: 'voice_call' |
| app/javascript/dashboard/routes/dashboard/settings/inbox/channels/Voice.vue | Inbox creation UI — shows the exact payload shape admins submit |

Tests — authoritative contracts

| File | What it pins |
|------|--------------|
| spec/channels/voice_signaling_channel_spec.rb | Auth rules, scope enforcement, replay, rate limit, unsubscribe-on-ringing |
| spec/enterprise/controllers/api/v1/accounts/inboxes_controller_webrtc_spec.rb | Strong params for inbox creation — shows every provider_config shape |
| spec/enterprise/services/voice/provider/webrtc/token_issuer_spec.rb | JWT claims structure |
| spec/enterprise/services/voice/provider/webrtc/token_decoder_spec.rb | JWT verification contract |
| spec/enterprise/services/voice/provider/webrtc/signaling_service_spec.rb | Broadcast payload shapes |
| spec/controllers/public/api/v1/webrtc/calls_controller_spec.rb | REST call lifecycle (including status transitions) |
| spec/enterprise/controllers/api/v1/accounts/webrtc/calls_controller_spec.rb | Agent accept — first-accept-wins, 409 Conflict shape |
| spec/enterprise/jobs/voice/call_timeout_job_spec.rb | Ring + connect timeout behavior |

Implementation plan & customization log

| File | Purpose |
|------|---------|
| plans/voice-webrtc-provider.md | Full 532-line locked-decision plan — the "why" behind every design choice |
| CUSTOMIZATIONS.md | WebRTC entry — merge-risk notes, upstream-compatibility flags |

12. Appendix

12.1 Voice JWT payload (contact)

{
  "sub": "contact",
  "scope": "voice.call",
  "account_id": 1,
  "inbox_id": 9,
  "contact_inbox_id": 1234,
  "source_id": "abc-def-123",
  "iat": 1712345678,
  "exp": 1712346278,
  "jti": "550e8400-e29b-41d4-a716-446655440000"
}
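
If the SDK inspects the decoded payload at all (for example, to avoid starting a call with a token that is about to expire), a typed view plus a runtime guard keeps that code honest. This is a sketch: `ContactVoiceClaims`, `isContactVoiceClaims`, and `secondsRemaining` are hypothetical names, the input is assumed to be the already base64url-decoded and JSON-parsed payload, and the client does not verify the signature (the server does that).

```typescript
// Field names and literal values mirror the contact payload shown above.
interface ContactVoiceClaims {
  sub: 'contact';
  scope: 'voice.call';
  account_id: number;
  inbox_id: number;
  contact_inbox_id: number;
  source_id: string;
  iat: number; // issued-at, Unix seconds
  exp: number; // expiry, Unix seconds
  jti: string;
}

/** Runtime shape check for a decoded payload; does NOT verify the signature. */
function isContactVoiceClaims(value: unknown): value is ContactVoiceClaims {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const numeric = ['account_id', 'inbox_id', 'contact_inbox_id', 'iat', 'exp'];
  return (
    v['sub'] === 'contact' &&
    v['scope'] === 'voice.call' &&
    numeric.every((key) => typeof v[key] === 'number') &&
    typeof v['source_id'] === 'string' &&
    typeof v['jti'] === 'string'
  );
}

/** Seconds left before the token expires, never negative. */
function secondsRemaining(claims: ContactVoiceClaims, nowSec: number): number {
  return Math.max(0, claims.exp - nowSec);
}
```

For the sample payload, exp minus iat is 600 seconds, which matches the ~10-minute credential lifetime discussed in §9.7.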

12.2 Voice JWT payload (agent) — informational only; SDK never sees this

{
  "sub": "agent",
  "scope": "voice.session",
  "account_id": 1,
  "inbox_id": 9,
  "user_id": 42,
  "identity": "agent-42-account-1",
  "iat": 1712345678,
  "exp": 1712346578,
  "jti": "..."
}

12.3 `ice_servers` example

[
  { "urls": "stun:stun.cloudflare.com:3478" },
  {
    "urls": [
      "turn:one.example.com:3478?transport=udp",
      "turn:one.example.com:3478?transport=tcp"
    ],
    "username": "1712346278:abc-def-123",
    "credential": "HMAC-SHA1-computed-secret-base64"
  }
]

Pass this array directly to new RTCPeerConnection({ iceServers }) — react-native-webrtc accepts the same shape as the browser spec.
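
Because the array arrives as untrusted JSON, a light shape check before handing it to the peer connection can turn a confusing ICE failure into an early, readable error. This is an optional sketch, not something the server requires; `IceServer` and `parseIceServers` are hypothetical names, and only the three fields shown in the example are considered.

```typescript
// Mirrors the fields used in the example above.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

/** Validates the ice_servers JSON and returns a typed copy; throws on bad shape. */
function parseIceServers(value: unknown): IceServer[] {
  if (!Array.isArray(value)) throw new Error('ice_servers must be an array');
  return value.map((entry: unknown, index: number) => {
    if (typeof entry !== 'object' || entry === null) {
      throw new Error(`ice_servers[${index}] must be an object`);
    }
    const raw = entry as Record<string, unknown>;
    const urls = raw['urls'];
    const urlsOk =
      typeof urls === 'string' ||
      (Array.isArray(urls) && urls.every((u: unknown) => typeof u === 'string'));
    if (!urlsOk) throw new Error(`ice_servers[${index}].urls must be a string or string[]`);

    const server: IceServer = { urls: urls as string | string[] };
    const username = raw['username'];
    const credential = raw['credential'];
    if (typeof username === 'string') server.username = username;
    if (typeof credential === 'string') server.credential = credential;
    return server;
  });
}
```

The returned array has the same shape as the input, so it can be passed as `iceServers` unchanged.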

12.4 Full example session (pseudo-wire)

→ POST /public/api/v1/webrtc/inboxes/9/voice_token
  { "device_platform": "ios", "device_id": "..." }
← 200 { "token": "eyJ...", "source_id": "...", "pubsub_token": "...",
       "ice_servers": [...], "ice_servers_expires_at": ... }

→ POST /public/api/v1/webrtc/calls
  Authorization: Bearer eyJ...
← 200 { "call_sid": "wrtc_abc", "signaling_channel": "voice_call_wrtc_abc",
       "ice_servers": [...], ... }

→ WebSocket connect wss://chatwoot.example.com/cable
← { "type": "welcome" }

→ { "command": "subscribe",
    "identifier": "{\"channel\":\"VoiceSignalingChannel\",
                     \"call_sid\":\"wrtc_abc\",
                     \"token\":\"eyJ...\",\"role\":\"contact\"}" }
← { "identifier": "...", "type": "confirm_subscription" }

→ { "command": "message", "identifier": "...",
    "data": "{\"action\":\"receive\",\"type\":\"offer\",\"sdp\":\"v=0\\r\\n...\"}" }

                                              [agent accepts via dashboard]
                                              [agent subscribes with role: agent]

← { "identifier": "...", "message": {
      "type": "call_unavailable",
      "payload": { "accepted_by_agent_id": 42 },
      "from": { "kind": "server", "id": 0 },
      "call_sid": "wrtc_abc"
  }}

[contact does not terminate; re-sends cached offer once if still ringing]

→ { "command": "message", "identifier": "...",
    "data": "{\"action\":\"receive\",\"type\":\"offer\",\"sdp\":\"v=0\\r\\n...\"}" }

← { "identifier": "...", "message": {
      "type": "answer",
      "payload": { "sdp": "v=0\r\n..." },
      "from": { "kind": "agent", "id": 42 },
      "call_sid": "wrtc_abc"
  }}

→ { "command": "message", ..., "data": "{\"action\":\"receive\",
                                          \"type\":\"ice-candidate\",
                                          \"candidate\": {...}}" }
← { ..., "message": { "type": "ice-candidate", "payload": {...},
                       "from": { "kind": "agent", ... }, ... }}

[peer connection connects — audio flows]

→ POST /public/api/v1/webrtc/calls/wrtc_abc/status
  Authorization: Bearer eyJ...
  { "status": "connected" }
← 200 { "ok": true, "call_status": "in-progress" }
[server records connected_at; the 20s connect_timeout no longer fires]

[user taps hangup]

→ { "command": "message", ..., "data": "{\"action\":\"receive\",
                                          \"type\":\"hangup\",
                                          \"reason\":\"user-ended\"}" }

→ POST /public/api/v1/webrtc/calls/wrtc_abc/status
  Authorization: Bearer eyJ...
  { "status": "completed" }
← 200 { "ok": true, "call_status": "completed" }

→ WebSocket close
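
The session above relies on ActionCable's double encoding: the identifier is a JSON string embedded in the frame, and data is a second JSON string inside a message frame. A minimal sketch of frame builders that reproduce those shapes follows; the function names are hypothetical, while the field names and values are taken from the session above.

```typescript
/** Identifier for the contact's subscription: a JSON string, not an object. */
function buildIdentifier(callSid: string, token: string): string {
  return JSON.stringify({
    channel: 'VoiceSignalingChannel',
    call_sid: callSid,
    token,
    role: 'contact',
  });
}

/** Frame sent once the socket is open. */
function subscribeFrame(identifier: string): string {
  return JSON.stringify({ command: 'subscribe', identifier });
}

/** Frame for offer / ice-candidate / hangup; data is JSON-encoded a second time. */
function messageFrame(
  identifier: string,
  type: string,
  body: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    command: 'message',
    identifier,
    data: JSON.stringify({ action: 'receive', type, ...body }),
  });
}
```

Usage: `ws.send(subscribeFrame(id))` on open, then `ws.send(messageFrame(id, 'offer', { sdp }))` after confirm_subscription. The same identifier string must be reused byte-for-byte in every frame, so build it once per call.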

Feedback & Questions

This document should track reality. If you find the SDK behaving differently than described here, treat it as either an SDK bug or a docs bug — read the referenced code, determine which, and fix the smaller of the two.

For changes that affect the wire protocol (e.g. a new signaling message type, changed claim structure, new endpoint), update this guide in the same PR.