openclaw-sip-voice

v0.1.18

Published

9 days ago

SIP Voice plugin for OpenClaw - Answer SIP calls with AI-powered conversations

chgrimm

openclaw openclaw-plugin sip voice voip telephony ai assistant

OpenClaw SIP Voice Plugin

Answer SIP/VoIP calls with AI-powered conversations using your OpenClaw agent.

Prerequisites

OpenClaw 2026.0.0 or higher
Python 3.9 or higher
ffmpeg (required for audio format conversion)

Installation

System Dependencies

Ubuntu/Debian:

sudo apt install python3-dev ffmpeg
# Optional: portaudio19-dev (only needed for local microphone access, not SIP)

macOS:

brew install ffmpeg
# Optional: portaudio (only needed for local microphone access, not SIP)

Note: The plugin checks for ffmpeg on startup and will exit with a clear error if not found.
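
The startup check mentioned in the note can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: the function name `require_binary` and the error wording are assumptions made for illustration; only `shutil.which` is standard-library behaviour.

```python
import shutil


def require_binary(name: str = "ffmpeg") -> str:
    """Return the absolute path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        # Hypothetical error text; the plugin's real message may differ.
        raise RuntimeError(
            f"{name} not found on PATH; install it and restart the gateway"
        )
    return path
```

Calling `require_binary()` before restarting the gateway is a quick way to confirm that ffmpeg is visible to the same user and PATH the gateway runs under.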

Install Plugin

openclaw plugins install openclaw-sip-voice
openclaw gateway restart

Python dependencies install automatically on first startup (30-60 seconds).

What gets downloaded:

Python packages: pyaudio, numpy, scipy, aiohttp, etc. (~200MB total)
Whisper models (when first used): tiny=39MB, base=142MB, small=466MB, medium=1.5GB, large=2.9GB
Default model is tiny for minimal disk usage. Configure larger models for better accuracy.

Configuration

Basic Configuration

Add to ~/.openclaw/config.json:

{
  "plugins": {
    "entries": {
      "openclaw-sip-voice": {
        "enabled": true,
        "config": {
          "sip": {
            "port": 5060,
            "codec": "PCMA"
          },
          "greeting": "Hello! How can I help you?"
        }
      }
    }
  }
}
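
If you would rather script the change than edit the file by hand, the structure above can be merged into an existing configuration along these lines. This is a sketch under stated assumptions: the helper name `enable_sip_plugin` is hypothetical, and it only manipulates a dictionary shaped like the JSON shown above; it does not use any OpenClaw API.

```python
from copy import deepcopy

PLUGIN_ID = "openclaw-sip-voice"


def enable_sip_plugin(config: dict, port: int = 5060, codec: str = "PCMA",
                      greeting: str = "Hello! How can I help you?") -> dict:
    """Return a copy of `config` with the SIP Voice plugin entry added or updated."""
    if codec not in ("PCMA", "PCMU"):
        raise ValueError(f"unsupported codec: {codec}")
    updated = deepcopy(config)  # leave the caller's dictionary untouched
    entries = updated.setdefault("plugins", {}).setdefault("entries", {})
    entry = entries.setdefault(PLUGIN_ID, {})
    entry["enabled"] = True
    plugin_cfg = entry.setdefault("config", {})
    plugin_cfg.setdefault("sip", {}).update({"port": port, "codec": codec})
    plugin_cfg["greeting"] = greeting
    return updated
```

Load `~/.openclaw/config.json` with `json.load`, pass the result through `enable_sip_plugin`, and write it back with `json.dump`. Entries for other plugins are preserved because the helper only touches its own key.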

Speech-to-Text (STT) Configuration

Local Whisper (default):

{
  "config": {
    "stt": {
      "type": "whisper",
      "model": "base",
      "language": "en"
    }
  }
}

Uses the local Whisper CLI. The example above selects base; if stt.model is omitted, the default is tiny. Models download on first use:

tiny: 39MB (fastest, lower accuracy)
base: 142MB (good balance)
small: 466MB (better accuracy)
medium: 1.5GB (high accuracy)
large: 2.9GB (best accuracy, slowest)

Requires the openai-whisper package, which provides the whisper CLI. Install it with pip install openai-whisper.
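
To choose a model for a machine with limited disk space, the sizes listed above can be turned into a small lookup. This is an illustrative sketch: the helper name `largest_model_within` is hypothetical, and the sizes are the approximate download figures from this README, not values queried from Whisper.

```python
from typing import Optional

# Approximate download sizes in MB, taken from the list above.
WHISPER_MODEL_MB = {"tiny": 39, "base": 142, "small": 466, "medium": 1500, "large": 2900}


def largest_model_within(budget_mb: float) -> Optional[str]:
    """Return the largest model whose download fits in `budget_mb`, or None."""
    fitting = [name for name, size in WHISPER_MODEL_MB.items() if size <= budget_mb]
    return max(fitting, key=WHISPER_MODEL_MB.__getitem__) if fitting else None
```

For example, a 500 MB budget selects small, which can then be set as stt.model.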

OpenAI Whisper API:

{
  "config": {
    "stt": {
      "type": "openai",
      "model": "whisper-1"
    }
  }
}

Set the OPENAI_API_KEY environment variable. Transcription runs in the cloud and is typically more accurate than the small local models, but it is billed per minute of audio.

Text-to-Speech (TTS) Configuration

OpenClaw uses its configured TTS provider by default. For standalone mode:

ElevenLabs (recommended for Linux):

{
  "config": {
    "tts": {
      "type": "elevenlabs",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "model_id": "eleven_turbo_v2_5"
    }
  }
}

Set ELEVENLABS_API_KEY environment variable. High-quality voice synthesis.

Note: Older models (eleven_monolingual_v1, eleven_multilingual_v1) are deprecated and removed from the free tier. Use eleven_turbo_v2_5 or eleven_multilingual_v2 instead.

OpenAI TTS:

{
  "config": {
    "tts": {
      "type": "openai",
      "voice": "nova",
      "model": "tts-1"
    }
  }
}

Set OPENAI_API_KEY environment variable. Voices: alloy, echo, fable, onyx, nova, shimmer.

macOS System Voice:

{
  "config": {
    "tts": {
      "type": "macos"
    }
  }
}

Uses macOS say command. Free, works offline, macOS only.

Linux Users: The macOS say command is not available on Linux. Set ELEVENLABS_API_KEY as an environment variable for the OpenClaw gateway service, or configure OpenAI TTS. The plugin will automatically detect and use ElevenLabs when the API key is present.

For systemd-managed OpenClaw, add to your service override:

mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
echo '[Service]
Environment="ELEVENLABS_API_KEY=your-key-here"' > ~/.config/systemd/user/openclaw-gateway.service.d/tts.conf
systemctl --user daemon-reload
systemctl --user restart openclaw-gateway
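
The auto-detection described above (ElevenLabs when its API key is present, otherwise the macOS say command on Darwin) can be modelled as below. This is a sketch of the documented behaviour only: the function name `detect_tts_backend` is hypothetical, and the plugin's real detection logic may consider more cases.

```python
import os
import platform
from typing import Mapping, Optional


def detect_tts_backend(env: Mapping[str, str] = os.environ,
                       system: Optional[str] = None) -> Optional[str]:
    """Mirror the documented order: ElevenLabs key first, then macOS `say`."""
    system = system or platform.system()
    if env.get("ELEVENLABS_API_KEY"):
        return "elevenlabs"
    if system == "Darwin":
        return "macos"
    # Assumption: nothing can be auto-detected here, so tts.type must be set explicitly.
    return None
```

A `None` result on Linux corresponds to the situation this section warns about: no key is visible to the gateway process, so either the systemd override is missing or tts.type needs to be set to openai.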

Testing

Use any SIP client to call:

sip:test@YOUR_IP_ADDRESS:5060

The agent will answer and respond to your voice.

Example SIP clients:

Desktop: MicroSIP, Linphone, Zoiper
Mobile: Linphone (iOS/Android), Zoiper (iOS/Android)
Web: JsSIP
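
Before placing a call, it can help to check that something is listening on the SIP port at all. The sketch below sends a bare SIP OPTIONS request over UDP using only the standard library. It rests on assumptions: the function names are hypothetical, and this README does not state whether the plugin replies to OPTIONS, so a missing reply does not prove the plugin is down.

```python
import socket
import uuid
from typing import Optional


def build_options_request(host: str, port: int, local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261) addressed to host:port."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # branch must start with this magic cookie
    lines = [
        f"OPTIONS sip:test@{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:test@{host}:{port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_sip(host: str, port: int = 5060, timeout: float = 2.0) -> Optional[str]:
    """Send OPTIONS over UDP and return the status line of any reply, else None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # for UDP this only fixes the peer and local address
        local_ip, local_port = sock.getsockname()
        sock.send(build_options_request(host, port, local_ip, local_port))
        try:
            reply = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP "port unreachable"
            return None
    return reply.decode("ascii", errors="replace").split("\r\n", 1)[0]
```

Calling `probe_sip("YOUR_IP_ADDRESS")` returns the first line of the response if the server answers, or `None` if nothing comes back within the timeout.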

Troubleshooting

Python dependency installation fails

The plugin auto-installs Python packages on first run. If it fails:

# Install system build dependencies
sudo apt install python3-dev portaudio19-dev

# Manual install if needed
cd ~/.openclaw/extensions/openclaw-sip-voice/python
pip3 install --break-system-packages -e .

# Restart gateway
openclaw gateway restart

No audio or one-way audio

Open the firewall for SIP signaling (UDP 5060) and the RTP media ports (UDP 20000-20100):

# Ubuntu/Debian
sudo ufw allow 5060/udp
sudo ufw allow 20000:20100/udp

If behind NAT, forward UDP ports 5060 and 20000-20100 to your machine.

SIP port already in use

Another service is using port 5060. Change the port in config:

{
  "plugins": {
    "entries": {
      "openclaw-sip-voice": {
        "config": {
          "sip": {
            "port": 5061
          }
        }
      }
    }
  }
}
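
To find out whether a candidate port is actually free before editing the config, you can try to bind a UDP socket to it. This is a minimal sketch with a hypothetical helper name, `udp_port_free`. Run it while the gateway is stopped, otherwise the plugin's own listener will make the port look taken.

```python
import socket


def udp_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a UDP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # address already in use, or no permission to bind
            return False
    return True
```

For example, `[p for p in (5060, 5061, 5062) if udp_port_free(p)]` lists the free candidates. The check is inherently racy, since another process can claim the port between the test and the gateway restart.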

Whisper not found

# Install Whisper CLI
pip install openai-whisper

# Verify
which whisper

Alternatively, use OpenAI Whisper API (set OPENAI_API_KEY and configure stt.type: "openai").

Check plugin status

openclaw plugins list | grep sip
openclaw logs | grep SIP

Configuration Reference

SIP Settings

| Option | Default | Description |
|--------|---------|-------------|
| sip.port | 5060 | SIP listen port (UDP) |
| sip.rtpPortBase | 20000 | Base port for RTP media streams |
| sip.codec | PCMA | Audio codec (PCMA=G.711 A-law for AU/EU, PCMU=G.711 μ-law for US) |

Speech Recognition (STT)

| Option | Default | Description |
|--------|---------|-------------|
| stt.type | whisper | STT backend (whisper, openai) |
| stt.model | tiny | Whisper model size (tiny/base/small/medium/large) |
| stt.language | en | Transcription language (ISO 639-1 code) |

Voice Synthesis (TTS)

| Option | Default | Description |
|--------|---------|-------------|
| tts.type | auto | TTS backend (elevenlabs, openai, macos). Auto-detects: ElevenLabs if API key present, else macOS say on Darwin |
| tts.voice_id | - | Voice ID (ElevenLabs) |
| tts.voice | alloy | Voice name (OpenAI: alloy, echo, fable, onyx, nova, shimmer) |
| tts.model_id | eleven_turbo_v2_5 | Model ID (ElevenLabs) |
| tts.model | tts-1 | Model name (OpenAI) |

General

| Option | Default | Description |
|--------|---------|-------------|
| greeting | "Hello! How can I help you today?" | Initial message when call connects |
| pythonPath | python3 | Python interpreter to use |
| sipServerPath | auto | Path to Python SIP server script |

Environment Variables

| Variable | Required For | Description |
|----------|--------------|-------------|
| ELEVENLABS_API_KEY | ElevenLabs TTS | API key from elevenlabs.io. Needed on Linux unless OpenAI TTS is configured. |
| OPENAI_API_KEY | OpenAI STT/TTS | API key from OpenAI. For cloud Whisper or OpenAI TTS. |
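
A quick way to catch a missing key before restarting the gateway is to compare the configured backends against the environment. The sketch below encodes only the requirements listed in this reference: the helper name `missing_env_vars` is hypothetical, and it expects the plugin's `config` object (the part containing `stt` and `tts`).

```python
from typing import Mapping, Set


def missing_env_vars(plugin_config: Mapping, env: Mapping[str, str]) -> Set[str]:
    """Return the API-key variables the stt/tts settings need but `env` lacks."""
    needed = set()
    # Defaults follow the tables above: stt.type=whisper, tts.type=auto.
    stt_type = plugin_config.get("stt", {}).get("type", "whisper")
    tts_type = plugin_config.get("tts", {}).get("type", "auto")
    if stt_type == "openai" or tts_type == "openai":
        needed.add("OPENAI_API_KEY")
    if tts_type == "elevenlabs":
        needed.add("ELEVENLABS_API_KEY")
    return {name for name in needed if not env.get(name)}
```

Pass `os.environ` as `env` when running it in the same environment as the gateway. For a systemd-managed gateway, the variables from the service override are not visible in an interactive shell, so check them there instead.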

Advanced Setup

Using with SIP Trunk Providers

Most SIP trunk providers (Twilio, Voip.ms, etc.) can route calls to your server:

1. Get a phone number from your provider
2. Configure routing to: sip:YOUR_PUBLIC_IP:5060
3. Ensure port 5060 UDP is open to the internet
4. Calls to that number will reach your agent

NAT/Firewall Setup

If your OpenClaw instance is behind NAT or firewall:

Forward UDP ports: 5060 and 20000-20100
Plugin detects external IP automatically
Configure firewall to allow incoming UDP on those ports

Platform Support

macOS: Fully supported (native TTS available via say command)
Linux: Fully supported (use cloud TTS: ElevenLabs or OpenAI)
Windows: Untested (may require WSL)

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.