@tschmz/imp-phone
v0.1.2
Turn-based SIP phone conversation frontend for imp
imp Phone
imp-phone is a turn-based SIP phone frontend for imp.
It is intentionally modeled after imp-voice: companion processes exchange JSON files with an imp plugin endpoint. The difference is the audio boundary. Instead of a local microphone and speaker, imp-phone expects the SIP client audio to be routed through configurable capture and playback commands.
Runtime Flow
phone_call tool
-> call request JSON
-> imp-phone controller
-> SIP command, for example baresip
-> wait for registration, ringing, and answered state
-> capture caller audio
-> STT
-> plugin inbox JSON
-> imp agent
-> plugin outbox JSON
-> TTS
-> playback into the SIP audio input
The first implementation is turn-based, not full duplex. The caller speaks, silence ends the turn, the agent answers, and the next turn starts.
The controller writes phone-status.json with state, phase, and can_speak fields modeled after imp-voice runtime status. Important phases include:
- calling
- ringing
- answered
- recording_command
- transcribing_command
- waiting_for_speaker
- speaking
- conversation_closed
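A consumer of phone-status.json could group these phases roughly as sketched below. The phase names come from the list above; the grouping, the helper name describeStatus, and the assumption that can_speak is a boolean are illustrative only and not taken from the controller source.

```javascript
// Phase names are from the README; how they are grouped here is an assumption.
const CONNECTING_PHASES = new Set(["calling", "ringing"]);
const CLOSED_PHASES = new Set(["conversation_closed"]);

// Hypothetical helper: summarizes an already parsed phone-status.json object.
function describeStatus(status) {
  if (CLOSED_PHASES.has(status.phase)) return "closed";
  if (CONNECTING_PHASES.has(status.phase)) return "connecting";
  // Assumes can_speak is true while the caller is allowed to talk.
  return status.can_speak ? "caller may speak" : "caller should wait";
}

// The status object is a made-up sample, not real controller output.
console.log(describeStatus({ phase: "recording_command", can_speak: true }));
```

A status display or watchdog script could poll the file and call a helper like this instead of comparing phase strings in several places.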
Install
From npm:
imp plugin install @tschmz/imp-phone --config ~/.config/imp/config.json
For local development from the imp repository:
imp plugin install imp-phone --root plugins --config ~/.config/imp/config.json
Package installs are stored below the active config's paths.dataRoot at plugins/npm.
The install command adds:
- top-level plugin imp-phone
- endpoint phone-ingress
- outbox response routing with replyChannel.kind = "phone"
- MCP server imp-phone, which exposes phone_call and phone_hangup as Imp tools
- auto-started imp-phone-controller service
The install command does not add phone contacts to an agent. Contacts are allowlisted per agent and must be configured explicitly in agents[].tools.phone, and the agent must opt into the imp-phone MCP server.
Call Requests
The controller watches requestsDir for request files. The packaged MCP server writes those files directly and waits until the controller reports whether the call was answered, timed out, or failed. The controller still owns the call timing through call.registerTimeoutMs and call.answerTimeoutMs; the tool only waits for the controller result.
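The split between the two timeouts can be illustrated with a small pure function. This is a sketch under stated assumptions: the event field names (startedAt, registeredAt, answeredAt, error), the result labels, and the choice to measure the answer timeout from the moment of registration are all hypothetical. Only call.registerTimeoutMs and call.answerTimeoutMs are names from this README.

```javascript
// Hypothetical classification of a call attempt from millisecond timestamps.
// `call` mirrors the call.registerTimeoutMs / call.answerTimeoutMs settings.
function classifyCall(events, call) {
  if (events.error) return "failed";

  // Registration must complete within registerTimeoutMs of the start.
  const registered =
    events.registeredAt != null &&
    events.registeredAt - events.startedAt <= call.registerTimeoutMs;
  if (!registered) return "timed_out";

  // Assumption: the answer timeout starts counting at registration.
  const answered =
    events.answeredAt != null &&
    events.answeredAt - events.registeredAt <= call.answerTimeoutMs;
  return answered ? "answered" : "timed_out";
}

// Example timeout values are placeholders, not the packaged defaults.
const call = { registerTimeoutMs: 10000, answerTimeoutMs: 30000 };
console.log(classifyCall({ startedAt: 0, registeredAt: 1000, answeredAt: 5000 }, call));
```

Keeping this decision in the controller means the tool side needs no timers of its own beyond waiting for the reported result.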
The MCP server receives the calling agent id from Imp through IMP_PHONE_AGENT_ID, so phone call sessions stay attached to the agent that initiated the call. Optional contact comments and call purposes are written into the call request and become detached phone session metadata.
Agents can also use the packaged phone_hangup MCP tool. It writes a control command to controlDir, and the controller ends the active call after the current agent reply has been played.
When an answered call ends, the controller writes one final call_closed event into the same detached phone session with "response": { "type": "none" }. This gives the agent one internal turn to update contact notes without producing another phone reply or leaving an outbox message.
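The effect of "response": { "type": "none" } can be sketched as a guard that decides whether an agent turn should produce a phone reply. The surrounding event shape and the helper name expectsPhoneReply are hypothetical. Only the call_closed event name and the response type value come from the text above.

```javascript
// Hypothetical guard: true when a session event should lead to a spoken reply.
// Assumes events without a response field expect a normal reply.
function expectsPhoneReply(event) {
  const type = event.response ? event.response.type : undefined;
  return type !== "none";
}

// Sample events; every field other than response.type is illustrative.
const closed = { event: "call_closed", response: { type: "none" } };
const turn = { event: "caller_turn", text: "Hello?" };

console.log(expectsPhoneReply(closed), expectsPhoneReply(turn));
```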
Example agent tool config:
{
"mcp": {
"servers": ["imp-phone"]
},
"phone": {
"contacts": [
{
"id": "thomas",
"name": "Thomas",
"uri": "+10000000000",
"comment": "work colleague"
}
]
}
}
Imp prefixes MCP tool names with the server id, so the model sees imp-phone__phone_call and imp-phone__phone_hangup. The plugin install provides the default request and control directories. Use phone.requestsDir or phone.controlDir only when you need to override those paths.
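The naming rule and the per-agent allowlist can be sketched as follows. The config object repeats the example agent tool config. The helper names modelToolName and findContact are hypothetical and only illustrate the described behavior; they are not part of the plugin's API.

```javascript
// Same shape as the example agent tool config.
const agentTools = {
  mcp: { servers: ["imp-phone"] },
  phone: {
    contacts: [
      { id: "thomas", name: "Thomas", uri: "+10000000000", comment: "work colleague" },
    ],
  },
};

// Tool names are prefixed with the MCP server id and a double underscore.
function modelToolName(serverId, toolName) {
  return `${serverId}__${toolName}`;
}

// Hypothetical allowlist lookup: returns the contact or null if not listed.
function findContact(tools, contactId) {
  const contacts = (tools.phone && tools.phone.contacts) || [];
  return contacts.find((contact) => contact.id === contactId) || null;
}

console.log(modelToolName("imp-phone", "phone_call"));
console.log(findContact(agentTools, "thomas"));
```

An agent without a phone.contacts entry gets an empty allowlist in this sketch, which matches the statement that the install command does not add contacts on its own.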
Audio Bridge
The default controller config uses:
- baresip, waits for SIP registration, then sends /dial {uri} over stdin
- SIP progress output to wait for ringing and answered before recording caller audio
- arecord -D imp_phone_remote_capture ... -t raw to capture caller audio
- aplay -D imp_phone_agent_playback -q {path} to play TTS audio
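The {uri} and {path} placeholders in these commands suggest simple template substitution. A minimal sketch of such a substitution follows. The function fillTemplate is hypothetical, and the controller's actual replacement and quoting rules may differ.

```javascript
// Hypothetical placeholder substitution for command templates.
// Unknown placeholders are left untouched so mistakes stay visible.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// The URI is the README's example number; the file path is a placeholder.
console.log(fillTemplate("/dial {uri}", { uri: "+10000000000" }));
console.log(
  fillTemplate("aplay -D imp_phone_agent_playback -q {path}", { path: "/tmp/reply.wav" })
);
```

If a filled command is passed to a shell, values that contain spaces or shell metacharacters need quoting. Passing arguments as an array to a process spawner avoids that class of problem.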
TTS providers:
- openai uses OPENAI_API_KEY by default and sends audio requests to OpenAI's speech API.
- elevenlabs uses ELEVENLABS_API_KEY by default and sends audio requests to ElevenLabs' text-to-speech API.
Example ElevenLabs controller config:
{
"tts": {
"provider": "elevenlabs",
"voice": "your-elevenlabs-voice-id",
"model": "eleven_multilingual_v2",
"format": "wav_16000"
}
}
When using ElevenLabs, make sure the endpoint response.speech.voice value is also an ElevenLabs voice ID or omit the endpoint voice override so the local fallback is used.
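The override-then-fallback rule can be sketched as below. The function resolveVoice and the exact object shapes are assumptions based on the setting names response.speech.voice and tts.voice. This is not the controller's actual code.

```javascript
// Hypothetical voice resolution: the endpoint override wins when present,
// otherwise the controller's local tts.voice is used.
function resolveVoice(endpoint, tts) {
  const override =
    endpoint && endpoint.response && endpoint.response.speech
      ? endpoint.response.speech.voice
      : undefined;
  return override || tts.voice;
}

const tts = { provider: "elevenlabs", voice: "your-elevenlabs-voice-id" };

// No override configured: the local fallback is used.
console.log(resolveVoice({ response: {} }, tts));
// An override from another provider would be sent to ElevenLabs unchanged,
// which is why it has to be a valid ElevenLabs voice ID.
console.log(resolveVoice({ response: { speech: { voice: "some-other-voice" } } }, tts));
```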
While imp is working on a response, the controller can play a configurable hold message after conversation.holdMessageAfterSeconds and then every conversation.holdMessageIntervalSeconds.
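The timing of hold messages can be worked through with a small function. It assumes the interval is measured from the start of the previous hold message and that no message is played at the exact moment the reply arrives. The function name holdMessageOffsets and the sample values are illustrative.

```javascript
// Hypothetical schedule: seconds (since the wait began) at which a hold
// message would start, for a response that takes waitSeconds to arrive.
function holdMessageOffsets(conversation, waitSeconds) {
  const first = conversation.holdMessageAfterSeconds;
  const interval = conversation.holdMessageIntervalSeconds;
  const offsets = [];
  for (let t = first; t < waitSeconds; t += interval) {
    offsets.push(t);
    // Without a positive interval, play the message at most once.
    if (!(interval > 0)) break;
  }
  return offsets;
}

// Sample values, not the packaged defaults.
const conversation = { holdMessageAfterSeconds: 5, holdMessageIntervalSeconds: 10 };
console.log(holdMessageOffsets(conversation, 30));
```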
Short feedback tones are available for captured, accepted, error, and closed. They are played through the same phone playback command and can be disabled with feedbackTones.enabled = false.
For real phone conversations, configure baresip, arecord, and aplay to use an ALSA/Pulse/PipeWire bridge where:
- SIP remote audio is readable by capture.command
- playback.command writes to the audio device that baresip sends as microphone input
One ALSA loopback setup is:
baresip audio_player -> imp_phone_remote_playback
imp capture <- imp_phone_remote_capture
imp playback -> imp_phone_agent_playback
baresip audio_source <- imp_phone_agent_capture
With snd-aloop, those named PCMs can be defined in ~/.asoundrc:
pcm.imp_phone_remote_playback {
type plug
slave.pcm "hw:Loopback,0,0"
}
pcm.imp_phone_remote_capture {
type plug
slave.pcm "hw:Loopback,1,0"
}
pcm.imp_phone_agent_playback {
type plug
slave.pcm "hw:Loopback,0,1"
}
pcm.imp_phone_agent_capture {
type plug
slave.pcm "hw:Loopback,1,1"
}
Then set baresip:
audio_player alsa,imp_phone_remote_playback
audio_source alsa,imp_phone_agent_capture
Manual Request
node bin/request-call.mjs \
--requests-dir /home/thomas/.imp/runtime/plugins/imp-phone/requests \
--contact-id thomas \
--contact-name Thomas \
--uri +10000000000 \
--comment "work colleague" \
--agent-id imp.telebot \
--wait
Development
Run one controller pass:
OPENAI_API_KEY=... node bin/controller.mjs --config config/default.json --once