openclaw-sip-voice
v0.1.18
SIP Voice plugin for OpenClaw - Answer SIP calls with AI-powered conversations
OpenClaw SIP Voice Plugin
Answer SIP/VoIP calls with AI-powered conversations using your OpenClaw agent.
Prerequisites
- OpenClaw 2026.0.0 or higher
- Python 3.9 or higher
- ffmpeg (required for audio format conversion)
Installation
System Dependencies
Ubuntu/Debian:
sudo apt install python3-dev ffmpeg
# Optional: portaudio19-dev (only needed for local microphone access, not SIP)
macOS:
brew install ffmpeg
# Optional: portaudio (only needed for local microphone access, not SIP)
Note: The plugin checks for ffmpeg on startup and will exit with a clear error if not found.
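To confirm ffmpeg is visible before restarting the gateway, a check along these lines can be run by hand. This is a minimal sketch, not the plugin's own startup check; the function name `find_ffmpeg` and the wording of the error are assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional


def find_ffmpeg(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path to ffmpeg, or raise with an actionable message.

    `which` is injectable so the lookup can be exercised without ffmpeg installed.
    """
    path = which("ffmpeg")
    if path is None:
        raise RuntimeError(
            "ffmpeg not found on PATH; install it with "
            "'sudo apt install ffmpeg' or 'brew install ffmpeg'"
        )
    return path
```

Calling `find_ffmpeg()` with no arguments looks the binary up on the PATH of the current shell, which may differ from the PATH seen by a gateway started as a service.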
Install Plugin
openclaw plugins install openclaw-sip-voice
openclaw gateway restart
Python dependencies install automatically on first startup (30-60 seconds).
What gets downloaded:
- Python packages: pyaudio, numpy, scipy, aiohttp, etc. (~200MB total)
- Whisper models (when first used): tiny=39MB, base=142MB, small=466MB, medium=1.5GB, large=2.9GB
- Default model is tiny for minimal disk usage. Configure larger models for better accuracy.
Configuration
Basic Configuration
Add to ~/.openclaw/config.json:
{
"plugins": {
"entries": {
"openclaw-sip-voice": {
"enabled": true,
"config": {
"sip": {
"port": 5060,
"codec": "PCMA"
},
"greeting": "Hello! How can I help you?"
}
}
}
}
}
Speech-to-Text (STT) Configuration
Local Whisper (default):
{
"config": {
"stt": {
"type": "whisper",
"model": "base",
"language": "en"
}
}
}
Uses local Whisper CLI. Models download on first use:
- tiny: 39MB (fastest, lower accuracy)
- base: 142MB (good balance)
- small: 466MB (better accuracy)
- medium: 1.5GB (high accuracy)
- large: 2.9GB (best accuracy, slowest)
Requires openai-whisper installed via pip install openai-whisper.
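When disk space is tight, the size list above can be turned into a small selection helper. The sketch below picks the largest listed model that fits a budget in megabytes. The helper names are hypothetical, and 1.5GB and 2.9GB are rounded to 1500 and 2900 MB.

```python
import shutil
from typing import Optional

# Approximate download sizes in MB, copied from the list above.
WHISPER_MODEL_MB = {
    "tiny": 39,
    "base": 142,
    "small": 466,
    "medium": 1500,
    "large": 2900,
}


def largest_model_within(budget_mb: float) -> Optional[str]:
    """Return the largest listed model whose download fits the budget, or None."""
    fitting = [(size, name) for name, size in WHISPER_MODEL_MB.items() if size <= budget_mb]
    if not fitting:
        return None
    # Sizes are unique, so max() on (size, name) pairs selects by size.
    return max(fitting)[1]


def free_mb(path: str) -> float:
    """Free space on the filesystem holding `path`, in MB."""
    return shutil.disk_usage(path).free / 1_000_000
```

For example, `largest_model_within(free_mb("/") * 0.1)` would cap the model at a tenth of the free space. The returned name can then be set as `stt.model`.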
OpenAI Whisper API:
{
"config": {
"stt": {
"type": "openai",
"model": "whisper-1"
}
}
}
Set OPENAI_API_KEY environment variable. Cloud-based, more accurate but costs per minute.
Text-to-Speech (TTS) Configuration
OpenClaw uses its configured TTS provider by default. For standalone mode:
ElevenLabs (recommended for Linux):
{
"config": {
"tts": {
"type": "elevenlabs",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"model_id": "eleven_turbo_v2_5"
}
}
}
Set ELEVENLABS_API_KEY environment variable. High-quality voice synthesis.
Note: Older models (eleven_monolingual_v1, eleven_multilingual_v1) are deprecated and removed from the free tier. Use eleven_turbo_v2_5 or eleven_multilingual_v2 instead.
OpenAI TTS:
{
"config": {
"tts": {
"type": "openai",
"voice": "nova",
"model": "tts-1"
}
}
}
Set OPENAI_API_KEY environment variable. Voices: alloy, echo, fable, onyx, nova, shimmer.
macOS System Voice:
{
"config": {
"tts": {
"type": "macos"
}
}
}
Uses macOS say command. Free, works offline, macOS only.
Linux Users: The macOS say command is not available on Linux. Set ELEVENLABS_API_KEY as an environment variable for the OpenClaw gateway service, or configure OpenAI TTS. The plugin will automatically detect and use ElevenLabs when the API key is present.
For systemd-managed OpenClaw, add to your service override:
mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
echo '[Service]
Environment="ELEVENLABS_API_KEY=your-key-here"' > ~/.config/systemd/user/openclaw-gateway.service.d/tts.conf
systemctl --user daemon-reload
systemctl --user restart openclaw-gateway
Testing
Use any SIP client to call:
sip:test@YOUR_IP_ADDRESS:5060
The agent will answer and respond to your voice.
Example SIP clients:
- Desktop: MicroSIP, Linphone, Zoiper
- Mobile: Linphone (iOS/Android), Zoiper (iOS/Android)
- Web: JsSIP
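Before reaching for a softphone, a scripted probe can show whether anything answers on the SIP port. The sketch below sends a standard SIP OPTIONS request over UDP and returns the first line of any reply. Whether this plugin answers OPTIONS at all is an assumption, so a timeout does not prove the listener is down. The `probe` user and the function names are made up for illustration.

```python
import socket
import uuid
from typing import Optional


def build_options(host: str, port: int, local_ip: str, local_port: int, token: str) -> bytes:
    """Build a minimal SIP OPTIONS request (RFC 3261 header set, empty body)."""
    lines = [
        f"OPTIONS sip:test@{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch=z9hG4bK{token};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={token}",
        f"To: <sip:test@{host}:{port}>",
        f"Call-ID: {token}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",
    ]
    # SIP uses CRLF line endings and a blank line to end the headers.
    return "\r\n".join(lines).encode("ascii")


def probe(host: str, port: int = 5060, timeout: float = 2.0) -> Optional[str]:
    """Send OPTIONS and return the reply's status line, or None on timeout or refusal."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # for UDP this only fixes the peer and local address
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_ip, local_port, uuid.uuid4().hex))
        try:
            reply = sock.recv(4096)
        except OSError:  # covers timeouts and ICMP port-unreachable errors
            return None
    return reply.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

Calling `probe("YOUR_IP_ADDRESS")` returns a status line such as a `SIP/2.0 ...` response if the server replies, and `None` otherwise.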
Troubleshooting
Python dependency installation fails
The plugin auto-installs Python packages on first run. If it fails:
# Verify system dependencies
sudo apt install python3-dev portaudio19-dev
# Manual install if needed
cd ~/.openclaw/extensions/openclaw-sip-voice/python
pip3 install --break-system-packages -e .
# Restart gateway
openclaw gateway restart
No audio or one-way audio
Open firewall ports for SIP signaling (5060) and RTP media (20000-20100):
# Ubuntu/Debian
sudo ufw allow 5060/udp
sudo ufw allow 20000:20100/udp
If behind NAT, forward UDP ports 5060 and 20000-20100 to your machine.
SIP port already in use
Another service is using port 5060. Change the port in config:
{
"plugins": {
"entries": {
"openclaw-sip-voice": {
"config": {
"sip": {
"port": 5061
}
}
}
}
}
}
Whisper not found
# Install Whisper CLI
pip install openai-whisper
# Verify
which whisper
Alternatively, use OpenAI Whisper API (set OPENAI_API_KEY and configure stt.type: "openai").
Check plugin status
openclaw plugins list | grep sip
openclaw logs | grep SIP
Configuration Reference
SIP Settings
| Option | Default | Description |
|--------|---------|-------------|
| sip.port | 5060 | SIP listen port (UDP) |
| sip.rtpPortBase | 20000 | Base port for RTP media streams |
| sip.codec | PCMA | Audio codec (PCMA=G.711 A-law for AU/EU, PCMU=G.711 μ-law for US) |
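The codec choice determines which RTP payload type is negotiated in SDP. RFC 3551 assigns static payload type 0 to PCMU and 8 to PCMA, both at 8000 Hz. The sketch below shows the media lines an endpoint would typically advertise for each setting. How the plugin actually builds its SDP is not documented here, so the function is illustrative only.

```python
from typing import List

# Static RTP payload types for G.711, as assigned by RFC 3551.
PAYLOAD_TYPES = {"PCMU": 0, "PCMA": 8}


def sdp_audio_lines(codec: str = "PCMA", rtp_port: int = 20000) -> List[str]:
    """Return the SDP media lines advertising a single G.711 codec."""
    name = codec.upper()
    if name not in PAYLOAD_TYPES:
        raise ValueError(f"unsupported codec {codec!r}; expected PCMA or PCMU")
    pt = PAYLOAD_TYPES[name]
    return [
        f"m=audio {rtp_port} RTP/AVP {pt}",  # media port and offered payload type
        f"a=rtpmap:{pt} {name}/8000",        # G.711 always runs at 8 kHz
    ]
```

With the defaults from the table, `sdp_audio_lines()` yields `m=audio 20000 RTP/AVP 8` and `a=rtpmap:8 PCMA/8000`. If a trunk provider only offers payload type 0, setting `sip.codec` to PCMU is the matching choice.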
Speech Recognition (STT)
| Option | Default | Description |
|--------|---------|-------------|
| stt.type | whisper | STT backend (whisper, openai) |
| stt.model | tiny | Whisper model size (tiny/base/small/medium/large) |
| stt.language | en | Transcription language (ISO 639-1 code) |
Voice Synthesis (TTS)
| Option | Default | Description |
|--------|---------|-------------|
| tts.type | auto | TTS backend (elevenlabs, openai, macos). Auto-detects: ElevenLabs if API key present, else macOS say on Darwin |
| tts.voice_id | - | Voice ID (ElevenLabs) |
| tts.voice | alloy | Voice name (OpenAI: alloy, echo, fable, onyx, nova, shimmer) |
| tts.model_id | eleven_turbo_v2_5 | Model ID (ElevenLabs) |
| tts.model | tts-1 | Model name (OpenAI) |
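The auto-detection rule for tts.type can be read as a two-step fallback. The sketch below restates that rule as described in the table. It is not the plugin's source, and the `None` result for Linux without a key is an assumption about what "no backend detected" looks like.

```python
import os
import platform
from typing import Mapping, Optional


def pick_tts_backend(env: Mapping[str, str], system: str) -> Optional[str]:
    """Mirror the documented rule: ElevenLabs if a key is set, else macOS say on Darwin."""
    if env.get("ELEVENLABS_API_KEY"):
        return "elevenlabs"
    if system == "Darwin":
        return "macos"
    return None  # assumption: nothing is auto-detected, so tts.type must be set explicitly


def pick_for_this_machine() -> Optional[str]:
    """Apply the rule to the current process environment and OS."""
    return pick_tts_backend(os.environ, platform.system())
```

Running `pick_for_this_machine()` in the same environment as the gateway service shows which backend the rule would select there. An empty `ELEVENLABS_API_KEY` counts as unset in this sketch.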
General
| Option | Default | Description |
|--------|---------|-------------|
| greeting | "Hello! How can I help you today?" | Initial message when call connects |
| pythonPath | python3 | Python interpreter to use |
| sipServerPath | auto | Path to Python SIP server script |
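A typo in the plugin's config block is easier to catch before a gateway restart than after. The sketch below checks a `config` dictionary against the options and defaults listed in the tables above. The accepted values are taken from those tables, treating "auto" as a valid tts.type is an assumption, and the plugin may accept or reject more than this.

```python
from typing import Any, Dict, List

CODECS = {"PCMA", "PCMU"}
STT_TYPES = {"whisper", "openai"}
WHISPER_MODELS = {"tiny", "base", "small", "medium", "large"}
TTS_TYPES = {"auto", "elevenlabs", "openai", "macos"}  # "auto" assumed valid


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing looked wrong."""
    problems: List[str] = []
    sip = config.get("sip", {})
    stt = config.get("stt", {})
    tts = config.get("tts", {})

    port = sip.get("port", 5060)
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"sip.port must be an integer in 1-65535, got {port!r}")
    if sip.get("codec", "PCMA") not in CODECS:
        problems.append(f"sip.codec must be PCMA or PCMU, got {sip.get('codec')!r}")

    stt_type = stt.get("type", "whisper")
    if stt_type not in STT_TYPES:
        problems.append(f"stt.type must be whisper or openai, got {stt_type!r}")
    elif stt_type == "whisper" and stt.get("model", "tiny") not in WHISPER_MODELS:
        problems.append(f"stt.model is not a listed Whisper size: {stt.get('model')!r}")

    if tts.get("type", "auto") not in TTS_TYPES:
        problems.append(f"tts.type is not a listed backend: {tts.get('type')!r}")
    return problems
```

The input is the inner `config` object of the `openclaw-sip-voice` entry, for example the result of `json.load` on the config file followed by indexing into `plugins.entries`.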
Environment Variables
| Variable | Required For | Description |
|----------|--------------|-------------|
| ELEVENLABS_API_KEY | ElevenLabs TTS | API key from elevenlabs.io. Needed on Linux unless OpenAI TTS is configured. |
| OPENAI_API_KEY | OpenAI STT/TTS | API key from OpenAI. For cloud Whisper or OpenAI TTS. |
Advanced Setup
Using with SIP Trunk Providers
Most SIP trunk providers (Twilio, Voip.ms, etc.) can route calls to your server:
- Get a phone number from your provider
- Configure routing to: sip:YOUR_PUBLIC_IP:5060
- Ensure port 5060 UDP is open to the internet
- Calls to that number will reach your agent
NAT/Firewall Setup
If your OpenClaw instance is behind NAT or firewall:
- Forward UDP ports: 5060 and 20000-20100
- Plugin detects external IP automatically
- Configure firewall to allow incoming UDP on those ports
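Port forwarding only helps if the ports are free on the host itself. The sketch below tries to bind each UDP port locally and reports the ones already taken, which is one way to spot a conflicting service before starting the gateway. The port list assumes the default sip.port and the 20000-20100 range mentioned above, and the function name is hypothetical. It does not test reachability from outside the NAT.

```python
import socket
from typing import Iterable, List

# Default SIP port plus the RTP range named in this section.
PORTS_TO_FORWARD = [5060, *range(20000, 20101)]


def udp_ports_in_use(ports: Iterable[int], host: str = "0.0.0.0") -> List[int]:
    """Return the ports from `ports` that cannot be bound for UDP on this host."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:  # typically "address already in use" or "permission denied"
                busy.append(port)
    return busy
```

Run `udp_ports_in_use(PORTS_TO_FORWARD)` with the gateway stopped: any port it returns is held by another process. With the gateway running, the SIP port is expected to show up as busy.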
Platform Support
- macOS: Fully supported (native TTS available via say command)
- Linux: Fully supported (use cloud TTS: ElevenLabs or OpenAI)
- Windows: Untested (may require WSL)
License
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
