@navai/voice-mobile

v0.1.6, published 3 months ago

Mobile helpers to integrate OpenAI Realtime voice with Navai backend routes


luxisoft


Mobile package to run Navai voice agents in React Native applications.

It provides a complete mobile stack for:

Backend client_secret retrieval.
WebRTC transport negotiation for Realtime.
Route-aware and function-aware mobile tools.
Local dynamic function loading.
Realtime tool call parsing and result event emission.
React hook lifecycle for microphone, transport, and session.
Optional local speech playback for hybrid ElevenLabs output.

Installation

```bash
npm install @navai/voice-mobile
npm install react react-native react-native-webrtc
```

react-native-webrtc is a peer dependency, so the consuming app must install it.

Architecture Overview

This package is organized in layers:

Runtime/config layer

src/runtime.ts
resolves env values, API URL, routes file, function folder filters, agent folders, and model override.

Function layer

src/functions.ts
loads local modules and converts exports into callable function definitions.

Agent runtime layer

src/agent.ts
builds mobile instructions and tool schemas.
executes navigate_to and execute_app_function.
parses tool calls from Realtime events.
creates response events for function_call outputs.

Backend bridge layer

src/backend.ts
API client for Navai backend routes.

Session orchestration layer

src/session.ts
coordinates the backend client and the transport.
handles start/stop, function preloading, and event forwarding.

Transport layer

src/transport.ts
interface contract for custom transports.
src/react-native-webrtc.ts
implementation for React Native WebRTC.

React integration layer

src/useMobileVoiceAgent.ts
hook that combines runtime, local functions, backend tools, permissions, and session state.

End-to-End Flow

Typical hook-driven flow:

The app resolves runtime config with generated module loaders.
The hook dynamically loads react-native-webrtc.
The hook loads the local function registry from the module loaders.
On start():

validates runtime state.
requests Android microphone permission when needed.
creates the backend client and WebRTC transport.
starts the mobile voice session (client secret + transport connect).
builds the mobile agent runtime (instructions + tool schemas).
sends a session.update event with tools and instructions.
when speech.provider is elevenlabs, configures the session for text output so the hook can synthesize and play audio locally.

During conversation:

incoming Realtime events are parsed for tool calls.
tool call outputs are emitted back via conversation.item.create and response.create.

On stop():

transport disconnects.
local refs and pending tool maps are cleared.
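The session.update step can be pictured with the sketch below. It assumes the GA Realtime event shape (`session.type`, `instructions`, `tools`, `output_modalities`). The names `buildSessionUpdate` and `RealtimeFunctionTool` are hypothetical, as are the parameter schemas for the two tools, including the `payload` property. The exact payload the package sends is not documented here.

```typescript
// Hypothetical shape of a function tool as declared to the Realtime API.
interface RealtimeFunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema object
}

type SpeechProvider = "openai" | "elevenlabs";

// Builds a session.update event. Output is text-only when ElevenLabs renders audio locally.
function buildSessionUpdate(
  instructions: string,
  tools: RealtimeFunctionTool[],
  speechProvider: SpeechProvider
) {
  return {
    type: "session.update",
    session: {
      type: "realtime",
      instructions,
      tools,
      output_modalities: speechProvider === "elevenlabs" ? ["text"] : ["audio"]
    }
  };
}

// The two stable mobile tools (parameter schemas are illustrative assumptions).
const tools: RealtimeFunctionTool[] = [
  {
    type: "function",
    name: "navigate_to",
    description: "Navigate to an app route.",
    parameters: {
      type: "object",
      properties: { target: { type: "string" } },
      required: ["target"]
    }
  },
  {
    type: "function",
    name: "execute_app_function",
    description: "Run a local or backend app function.",
    parameters: {
      type: "object",
      properties: {
        function_name: { type: "string" },
        payload: { type: "object" } // hypothetical argument name
      },
      required: ["function_name"]
    }
  }
];

const update = buildSessionUpdate("You are the app's voice assistant.", tools, "elevenlabs");
```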

Public API

Main exports:

resolveNavaiMobileEnv(...)
resolveNavaiMobileRuntimeConfig(...)
resolveNavaiMobileApplicationRuntimeConfig(...)
loadNavaiFunctions(...)
createNavaiMobileAgentRuntime(...)
extractNavaiRealtimeToolCalls(...)
buildNavaiRealtimeToolResultEvents(...)
createNavaiMobileBackendClient(...)
createNavaiMobileVoiceSession(...)
createReactNativeWebRtcTransport(...)
useMobileVoiceAgent(...)

Important types:

NavaiRoute
NavaiFunctionDefinition
NavaiRealtimeTransport
NavaiMobileVoiceSession
NavaiBackendSpeechConfig
NavaiMobileSpeechPlayer
ResolveNavaiMobileApplicationRuntimeConfigResult
UseMobileVoiceAgentTransportOptions

Tool Runtime Design

The mobile tool surface is intentionally stable:

navigate_to
execute_app_function

Execution behavior:

navigate_to

validates target.
resolves the route using the route matcher.
calls navigate(path).

execute_app_function

validates function_name.
tries local function first.
falls back to the backend function if no local function is found.

Graceful compatibility fallback:

if the model calls a function name directly as a tool, the runtime routes it through execute_app_function.
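The dispatch rules above can be sketched as a pure planning function that decides what to do before anything executes. This is not the package's implementation. `planToolCall`, `chooseExecutor`, the `Route` shape (only `name` and `path`), and the `payload` argument are hypothetical, and the case-insensitive matcher is a simplified stand-in for the real route matcher.

```typescript
// Hypothetical result of deciding how to handle one Realtime tool call.
type ToolPlan =
  | { kind: "navigate"; path: string }
  | { kind: "function"; functionName: string; payload: unknown }
  | { kind: "error"; message: string };

interface Route {
  name: string;
  path: string;
}

// Simplified matcher: compares the target with route names and paths, ignoring case.
function matchRoute(routes: Route[], target: string): Route | undefined {
  const needle = target.trim().toLowerCase();
  return routes.find(
    (route) => route.name.toLowerCase() === needle || route.path.toLowerCase() === needle
  );
}

function planToolCall(routes: Route[], toolName: string, args: Record<string, unknown>): ToolPlan {
  if (toolName === "navigate_to") {
    const target = args.target;
    if (typeof target !== "string" || target.trim() === "") {
      return { kind: "error", message: "navigate_to requires a target" };
    }
    const route = matchRoute(routes, target);
    return route
      ? { kind: "navigate", path: route.path }
      : { kind: "error", message: `Unknown route: ${target}` };
  }
  if (toolName === "execute_app_function") {
    const functionName = args.function_name;
    if (typeof functionName !== "string" || functionName.trim() === "") {
      return { kind: "error", message: "execute_app_function requires function_name" };
    }
    return { kind: "function", functionName, payload: args.payload };
  }
  // Compatibility fallback: a direct function name is treated as execute_app_function.
  return { kind: "function", functionName: toolName, payload: args };
}

// Local functions win; anything else goes to POST /navai/functions/execute.
function chooseExecutor(functionName: string, localFunctions: Set<string>): "local" | "backend" {
  return localFunctions.has(functionName) ? "local" : "backend";
}
```

Keeping the decision separate from execution makes the fallback rules easy to test without a device, a backend, or a navigator.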

Realtime Tool Event Handling

extractNavaiRealtimeToolCalls understands multiple event families:

response.function_call_arguments.done
response.output_item.done
response.output_item.added
conversation.item.created
conversation.item.added
conversation.item.done
conversation.item.retrieved
response.done

Partial tool calls are ignored until a completed status is available.
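The extraction rule can be sketched as follows. This is a simplified illustration, not the signature of extractNavaiRealtimeToolCalls. `extractToolCalls` and `ToolCall` are hypothetical names. It assumes function calls arrive as items shaped `{ type: "function_call", status, call_id, name, arguments }`, and that `response.function_call_arguments.done` is only usable when it carries a `name`.

```typescript
interface ToolCall {
  callId: string;
  name: string;
  arguments: string; // raw JSON string produced by the model
}

type RealtimeEvent = { type: string; [key: string]: unknown };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts a function_call item only once its status is "completed".
function toolCallFromItem(item: unknown): ToolCall | undefined {
  if (!isRecord(item) || item.type !== "function_call" || item.status !== "completed") {
    return undefined;
  }
  const { call_id, name, arguments: args } = item;
  if (typeof call_id !== "string" || typeof name !== "string") return undefined;
  return { callId: call_id, name, arguments: typeof args === "string" ? args : "{}" };
}

function extractToolCalls(event: RealtimeEvent): ToolCall[] {
  switch (event.type) {
    case "response.function_call_arguments.done": {
      // This event is complete by definition; skip it if no name is present.
      const { call_id, name, arguments: args } = event;
      return typeof call_id === "string" && typeof name === "string" && typeof args === "string"
        ? [{ callId: call_id, name, arguments: args }]
        : [];
    }
    case "response.output_item.done":
    case "response.output_item.added":
    case "conversation.item.created":
    case "conversation.item.added":
    case "conversation.item.done":
    case "conversation.item.retrieved": {
      const call = toolCallFromItem(event.item);
      return call ? [call] : [];
    }
    case "response.done": {
      // A finished response lists all of its output items.
      const output = isRecord(event.response) ? event.response.output : undefined;
      return Array.isArray(output)
        ? output.map(toolCallFromItem).filter((call): call is ToolCall => call !== undefined)
        : [];
    }
    default:
      return [];
  }
}
```

Because the same call can surface through several of these event types, callers still need to deduplicate by call id.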

buildNavaiRealtimeToolResultEvents emits two events:

conversation.item.create with function_call_output.
response.create to resume model generation.
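A minimal sketch of the two result events follows. It assumes the Realtime API's `function_call_output` item, which carries the tool result as a string, so the result is JSON-encoded here. `buildToolResultEvents` is a hypothetical name and does not reflect the signature of buildNavaiRealtimeToolResultEvents.

```typescript
type FunctionCallOutputEvent = {
  type: "conversation.item.create";
  item: { type: "function_call_output"; call_id: string; output: string };
};
type ResponseCreateEvent = { type: "response.create" };

// Returns the events in the order they must be sent over the data channel.
function buildToolResultEvents(
  callId: string,
  result: unknown
): [FunctionCallOutputEvent, ResponseCreateEvent] {
  return [
    {
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        // undefined is encoded as "null" so output is always a JSON string
        output: JSON.stringify(result ?? null)
      }
    },
    // Asks the model to continue now that the tool result is in the conversation.
    { type: "response.create" }
  ];
}

const [itemEvent, resumeEvent] = buildToolResultEvents("c1", { ok: true, path: "/profile" });
```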

Runtime Config and Env Resolution

resolveNavaiMobileRuntimeConfig applies values in this priority order:

1. Explicit options.
2. Env object values.
3. Defaults.

Env keys:

NAVAI_FUNCTIONS_FOLDERS
NAVAI_AGENTS_FOLDERS
NAVAI_ROUTES_FILE
NAVAI_REALTIME_MODEL

Defaults:

routes file: src/ai/routes.ts
functions folder: src/ai/functions-modules

Multi-agent layout:

NAVAI_FUNCTIONS_FOLDERS=src/ai
NAVAI_AGENTS_FOLDERS=main,support,sales
local modules live in src/ai/<agent>/...
optional per-agent config file: src/ai/<agent>/agent.config.ts

Matcher formats:

folder
recursive folder (/...)
wildcard (*)
explicit file
CSV list

Fallback behavior:

configured folders with no matches emit a warning.
the resolver falls back to the default folder.

Current limitation:

mobile resolves agent folders and primary agent metadata, but the mobile runtime still exposes one active tool surface (navigate_to, execute_app_function).
official Realtime handoffs are implemented on web first; mobile is prepared for the same structure but has not yet switched to SDK-native handoffs.

resolveNavaiMobileApplicationRuntimeConfig also resolves:

apiBaseUrl from:
1. explicit apiBaseUrl
2. env.NAVAI_API_URL
3. explicit defaultApiBaseUrl
4. default http://localhost:3000
a warning when the generated module loader map is empty.
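The apiBaseUrl priority can be sketched as a "first usable value wins" lookup. `resolveApiBaseUrl` and `ApiBaseUrlOptions` are hypothetical names. Two behaviors are assumptions and may differ from the package: blank strings are skipped, and trailing slashes are trimmed.

```typescript
interface ApiBaseUrlOptions {
  apiBaseUrl?: string;
  env?: Record<string, string | undefined>;
  defaultApiBaseUrl?: string;
}

const FALLBACK_API_BASE_URL = "http://localhost:3000";

// Walks the documented priority order and returns the first non-blank value.
function resolveApiBaseUrl({ apiBaseUrl, env, defaultApiBaseUrl }: ApiBaseUrlOptions): string {
  const candidates = [apiBaseUrl, env?.NAVAI_API_URL, defaultApiBaseUrl, FALLBACK_API_BASE_URL];
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    // Assumption: trailing slashes are dropped so routes can be appended safely.
    if (trimmed) return trimmed.replace(/\/+$/, "");
  }
  return FALLBACK_API_BASE_URL;
}
```

On an Android emulator or a physical device, localhost refers to the device itself. For that reason, an explicit apiBaseUrl or NAVAI_API_URL is usually needed outside the iOS simulator.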

resolveNavaiMobileEnv lets you merge multiple env-like sources (for example, Expo extra, process.env, or a custom config object).

Backend Client Contract

createNavaiMobileBackendClient calls:

POST /navai/realtime/client-secret
POST /navai/speech/synthesize
GET /navai/functions
POST /navai/functions/execute

Base URL priority:

1. apiBaseUrl option.
2. env.NAVAI_API_URL.
3. fallback http://localhost:3000.

listFunctions returns warnings instead of throwing on most parse/network failures.

createClientSecret and executeFunction throw on request failures or invalid responses.

createClientSecret() returns { value, expires_at, speech }, where speech.provider is openai or elevenlabs.
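The "throw on invalid responses" rule for the client secret response can be sketched as a parser that checks each field before use. `parseClientSecretResponse` is a hypothetical name, and the numeric `expires_at` (a Unix timestamp) is an assumption. The real `speech` object (NavaiBackendSpeechConfig) may carry more fields than `provider`.

```typescript
type SpeechProvider = "openai" | "elevenlabs";

interface ClientSecretResponse {
  value: string;
  expires_at: number; // assumption: Unix timestamp in seconds
  speech: { provider: SpeechProvider };
}

function isSpeechProvider(value: unknown): value is SpeechProvider {
  return value === "openai" || value === "elevenlabs";
}

// Throws instead of returning a partially valid secret.
function parseClientSecretResponse(body: unknown): ClientSecretResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("Invalid client secret response");
  }
  const { value, expires_at, speech } = body as Record<string, unknown>;
  if (typeof value !== "string" || value === "") throw new Error("Missing client secret value");
  if (typeof expires_at !== "number") throw new Error("Missing expires_at");
  const provider =
    typeof speech === "object" && speech !== null
      ? (speech as Record<string, unknown>).provider
      : undefined;
  if (!isSpeechProvider(provider)) throw new Error("Unknown speech provider");
  return { value, expires_at, speech: { provider } };
}
```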

Session Orchestrator Details

createNavaiMobileVoiceSession responsibilities:

Function list cache.
Session state transitions (idle, connecting, connected, error).
Start flow:

optional backend function preload.
client secret request.
transport connect with clientSecret and optional model.

Stop flow:

transport disconnect.

Realtime event send helper (requires the transport to implement sendEvent).
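The session states can be modeled as a small transition function. Only the four state names come from the list above. `nextState` and the action names are hypothetical. The specific rules are assumptions: start is ignored while connecting or connected, a failure moves to error, and stop always returns to idle.

```typescript
type SessionState = "idle" | "connecting" | "connected" | "error";
type SessionAction = "start" | "connected" | "failed" | "stop";

function nextState(state: SessionState, action: SessionAction): SessionState {
  switch (action) {
    case "start":
      // A new start is only meaningful from idle or after an error.
      return state === "idle" || state === "error" ? "connecting" : state;
    case "connected":
      return state === "connecting" ? "connected" : state;
    case "failed":
      return state === "connecting" || state === "connected" ? "error" : state;
    case "stop":
      // Stop disconnects the transport and returns to idle from any state.
      return "idle";
  }
}
```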

React Native WebRTC Transport Details

createReactNativeWebRtcTransport default behavior:

realtime endpoint: https://api.openai.com/v1/realtime/calls
model default: gpt-realtime
creates RTCPeerConnection
opens data channel oai-events
captures microphone via mediaDevices.getUserMedia
negotiates SDP with OpenAI
waits for data channel open before resolving connect
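The SDP negotiation step boils down to one HTTP request. The sketch below only builds that request, which keeps it testable off-device. It assumes the OpenAI WebRTC flow: the local offer SDP is POSTed as `application/sdp` with the client secret as a Bearer token, and the response body is the answer SDP. `buildSdpRequest` is a hypothetical name, and sending the model as a `model` query parameter is an assumption.

```typescript
interface SdpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSdpRequest(
  offerSdp: string,
  clientSecret: string,
  model = "gpt-realtime",
  endpoint = "https://api.openai.com/v1/realtime/calls"
): SdpRequest {
  return {
    // Assumption: the model is passed as a query parameter.
    url: `${endpoint}?model=${encodeURIComponent(model)}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${clientSecret}`, // ephemeral secret from the backend
        "Content-Type": "application/sdp"
      },
      body: offerSdp
    }
  };
}
```

On a device, the result would be passed to `fetch(req.url, req.init)`. The response text then goes into `peerConnection.setRemoteDescription({ type: "answer", sdp })` before waiting for the oai-events channel to open.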

Resilience behavior:

tracks transport state (idle, connecting, connected, error, closed)
propagates connection/data channel errors via callbacks
cleans tracks, channel, and connection on disconnect
supports a configurable remote audio track volume through the track's private _setVolume method, when available

React Hook Internals

useMobileVoiceAgent adds app-level behavior:

Android microphone permission request.
dynamic require("react-native-webrtc").
pending tool call queue while runtime/session is initializing.
dedup of handled tool call ids.
automatic session.update after session starts.
optional transportOptions passthrough for rtcConfiguration, audioConstraints, and remoteAudioTrackVolume.
optional speechPlayer for local playback when backend uses ElevenLabs hybrid TTS.
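The pending queue and call-id deduplication can be sketched as a small gate. `createToolCallGate` and its methods are hypothetical names, not part of the hook's API. `reset` mirrors the "pending tool maps are cleared" step on stop().

```typescript
interface PendingToolCall {
  callId: string;
  name: string;
  arguments: string;
}

// Buffers tool calls until the runtime is ready and drops repeated call ids.
function createToolCallGate(handle: (call: PendingToolCall) => void) {
  const handledIds = new Set<string>();
  const pending: PendingToolCall[] = [];
  let ready = false;

  function dispatch(call: PendingToolCall) {
    // The same call can arrive through several Realtime event types.
    if (handledIds.has(call.callId)) return;
    handledIds.add(call.callId);
    handle(call);
  }

  return {
    push(call: PendingToolCall) {
      if (ready) dispatch(call);
      else pending.push(call);
    },
    markReady() {
      ready = true;
      // Drain everything queued while the runtime/session was initializing.
      for (const call of pending.splice(0)) dispatch(call);
    },
    reset() {
      ready = false;
      pending.length = 0;
      handledIds.clear();
    }
  };
}
```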

Hook states:

idle
connecting
connected
error

Agent voice state exposed by the hook:

agentVoiceState: idle | speaking
isAgentSpeaking: boolean

agentVoiceState is inferred from realtime audio events (response.output_audio.delta, response.output_audio.done, output_audio_buffer.started, output_audio_buffer.stopped, response.done).
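The voice-state inference can be sketched as a reducer over event types. `nextVoiceState` is a hypothetical name, and the exact mapping is an assumption. In this sketch the two start events set speaking, and the three end events set idle. An implementation might rely mainly on output_audio_buffer.stopped, because response.output_audio.done marks the end of generation while playback can still be running.

```typescript
type AgentVoiceState = "idle" | "speaking";

const SPEAKING_EVENTS = new Set(["response.output_audio.delta", "output_audio_buffer.started"]);
const IDLE_EVENTS = new Set([
  "response.output_audio.done",
  "output_audio_buffer.stopped",
  "response.done"
]);

function nextVoiceState(current: AgentVoiceState, eventType: string): AgentVoiceState {
  if (SPEAKING_EVENTS.has(eventType)) return "speaking";
  if (IDLE_EVENTS.has(eventType)) return "idle";
  return current; // unrelated events keep the current state
}

// isAgentSpeaking is then just a derived boolean.
const isAgentSpeaking = (state: AgentVoiceState): boolean => state === "speaking";
```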

Hybrid Speech Mode

When backend returns speech.provider: "elevenlabs":

useMobileVoiceAgent updates the Realtime session to request output_modalities: ["text"].
assistant final text is sent to backendClient.synthesizeSpeech(...).
the synthesized audio is played through the app-provided speechPlayer.
if no speechPlayer is provided, the hook logs a warning and skips local playback.
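The hybrid playback path can be sketched as one async handler. `speakFinalText` is a hypothetical name, and so are the `SpeechSynthesizer` and `SpeechPlayer` shapes (an `ArrayBuffer` result and a `play` method); the package's NavaiMobileSpeechPlayer type may differ. Checking for a player before calling the backend is a design choice that avoids a wasted synthesis request.

```typescript
interface SpeechSynthesizer {
  synthesizeSpeech(text: string): Promise<ArrayBuffer>; // hypothetical return type
}

interface SpeechPlayer {
  play(audio: ArrayBuffer): Promise<void>; // hypothetical method name
}

async function speakFinalText(
  text: string,
  provider: "openai" | "elevenlabs",
  backend: SpeechSynthesizer,
  player: SpeechPlayer | undefined,
  warn: (message: string) => void = console.warn
): Promise<"skipped" | "played"> {
  // With the OpenAI provider, Realtime already streams audio over WebRTC.
  if (provider !== "elevenlabs" || text.trim() === "") return "skipped";
  if (!player) {
    warn("No speechPlayer provided; skipping local playback.");
    return "skipped";
  }
  const audio = await backend.synthesizeSpeech(text);
  await player.play(audio);
  return "played";
}
```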

Generated Loader CLI

This package ships:

navai-generate-mobile-loaders

Default behavior:

Read NAVAI_FUNCTIONS_FOLDERS and NAVAI_ROUTES_FILE from the process environment or .env.
Read NAVAI_AGENTS_FOLDERS when present.
Scan src/ for source files.
Select only modules matching configured function folders.
If agents are configured, keep only files inside src/ai/<agent>/....
Include route module.
Include files referenced by string literals in the route module, such as src/... paths (for screen modules).
Write src/ai/generated-module-loaders.ts.

Useful flags:

--project-root <path>
--src-root <path>
--output-file <path>
--env-file <path>
--default-functions-folder <path>
--default-routes-file <path>
--type-import <module>
--export-name <identifier>

Auto Setup on npm Install

The postinstall step can automatically add these scripts when they are missing:

generate:ai-modules -> navai-generate-mobile-loaders
predev -> npm run generate:ai-modules
preandroid -> npm run generate:ai-modules
preios -> npm run generate:ai-modules
pretypecheck -> npm run generate:ai-modules

Rules:

only missing scripts are added.
existing scripts are never overwritten.

Disable auto setup:

NAVAI_SKIP_AUTO_SETUP=1
or NAVAI_SKIP_MOBILE_AUTO_SETUP=1

Manual setup runner:

npx navai-setup-voice-mobile

Integration Examples

Low-level integration:

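```typescript
import { mediaDevices, RTCPeerConnection } from "react-native-webrtc";
import {
  createNavaiMobileBackendClient,
  createNavaiMobileVoiceSession,
  createReactNativeWebRtcTransport
} from "@navai/voice-mobile";

// Client for the Navai backend routes. On Android emulators and physical
// devices, localhost is the device itself, so use an address it can reach.
const backend = createNavaiMobileBackendClient({
  apiBaseUrl: "http://localhost:3000"
});

// Pass the react-native-webrtc globals explicitly to the transport.
const transport = createReactNativeWebRtcTransport({
  globals: { mediaDevices, RTCPeerConnection }
});

const session = createNavaiMobileVoiceSession({
  backendClient: backend,
  transport,
  onRealtimeEvent: (event) => console.log(event),
  onRealtimeError: (error) => console.error(error)
});

// Requests a client secret, then connects the transport.
// Call this from an async context (for example, a button handler).
await session.start();
```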

Hook integration:

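```typescript
import { useMobileVoiceAgent } from "@navai/voice-mobile";

// Call inside a React component. runtime, runtimeLoading, and runtimeError come
// from resolving the runtime config in your app; navigation comes from your
// navigation library.
const voice = useMobileVoiceAgent({
  runtime,
  runtimeLoading,
  runtimeError,
  navigate: (path) => navigation.navigate(path as never)
});
```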

React Native CLI Android (opt-in transport config):

Leave transportOptions undefined in Expo if the current defaults already work for your app. In bare React Native CLI on Android, you can opt in to an explicit WebRTC transport configuration without changing Expo behavior.

```typescript
import { Platform } from "react-native";
import {
  useMobileVoiceAgent,
  type UseMobileVoiceAgentTransportOptions
} from "@navai/voice-mobile";

// Android-only overrides; other platforms keep the transport defaults.
const androidBareTransportOptions: UseMobileVoiceAgentTransportOptions | undefined =
  Platform.OS === "android"
    ? {
        rtcConfiguration: {
          iceServers: [{ urls: ["stun:stun.l.google.com:19302"] }]
        },
        audioConstraints: {
          audio: {
            echoCancellation: true,
            noiseSuppression: true,
            autoGainControl: true
          },
          video: false
        },
        // Applied through the remote track's private _setVolume when available.
        remoteAudioTrackVolume: 10
      }
    : undefined;

// Inside a React component:
const voice = useMobileVoiceAgent({
  runtime,
  runtimeLoading,
  runtimeError,
  navigate: (path) => navigation.navigate(path as never),
  transportOptions: androidBareTransportOptions
});
```

Expected Backend Routes

POST /navai/realtime/client-secret
POST /navai/speech/synthesize
GET /navai/functions
POST /navai/functions/execute

These can be provided by registerNavaiExpressRoutes from @navai/voice-backend.

Related Docs

Spanish version: README.es.md
English version: README.en.md
Backend package: ../voice-backend/README.md
Frontend package: ../voice-frontend/README.md
Playground Mobile: ../../apps/playground-mobile/README.md
Playground API: ../../apps/playground-api/README.md
