thothtalk
v0.1.0
Self-hosted ThothTalk cadence API server with tunnel and shell helpers.
𓁟ThothTalk
thothtalk is the internal cadence-engine protocol and nerd-facing base repo.
It is the underlying live-voice spine that later product shells wrapped in different ways.
Naming boundary:
- 𓁟ThothTalk is the internal protocol, cadence engine, and adapter stack
- Remora🦈 is a historical shell name still present in this repo's routes and UI
- WHALETale🐳 and blowhole are the current product-facing shells
This repo still contains Remora🦈 route names and shell docs because that was the last full wrapper built directly on top of the protocol. The underlying package, env prefix, logger namespace, and repo identity remain thothtalk.
Current canonical architecture checkpoint:
This README still contains older exploration context deeper down. For the current product direction, start with the lucid-start and cadence-training checkpoints above.
Remora🦈 is now a Telegram-first voice wrapper for OpenClaw.
Canonical ship path:
- Telegram Mini App for the native-feeling surface
- OpenClaw for model authority, pairing, and sanctioned credentials
- OpenAI speech API as the only realtime wire
- Remora🦈 cadence brain between the model and the person
There is older exploration code in the repo, but the product direction is no longer ambiguous.
Core layers:
- Cadence core: turn-taking, interruption, pacing, and voice-behavior logic
- Meaning core: OpenClaw-managed frontier model
- Flagship shell: Telegram Mini App
- Wire: the OpenAI speech API
OpenClaw is now the preferred authority for sanctioned credentials and the final ship path.
Why this stack
This keeps the voice path deterministic and tight:
- Telegram opens Remora🦈 inside the native app.
- Remora🦈 captures mic input and forwards speech into the cadence brain.
- Voice activity detection and waveform analysis shape the handoff before language is even considered.
- OpenAI STT turns speech into text for the model lane.
- OpenClaw runs the model and returns the semantic reply.
- OpenAI TTS turns the reply back into audio.
- Remora🦈 streams that audio back through the Mini App with the cadence engine sitting between the model and the person.
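The order of the steps above can be sketched as a single turn function. This is a minimal illustration, not the repo's runtime: the `VoiceTurn` class and its four stage callables are hypothetical names, and the stages are stubbed so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceTurn:
    """One pass through the voice path; every stage is injected (hypothetical interface)."""
    is_speech: Callable[[bytes], bool]   # VAD / waveform gate
    transcribe: Callable[[bytes], str]   # STT lane
    reply: Callable[[str], str]          # model lane
    synthesize: Callable[[str], bytes]   # TTS lane

    def run(self, mic_audio: bytes) -> Optional[bytes]:
        # The signal-level gate runs before any language work happens.
        if not self.is_speech(mic_audio):
            return None
        text = self.transcribe(mic_audio)
        if not text.strip():
            return None
        # Model reply first, then speech synthesis of that reply.
        return self.synthesize(self.reply(text))


# Stub stages standing in for the real STT, model, and TTS services.
turn = VoiceTurn(
    is_speech=lambda audio: len(audio) > 0,
    transcribe=lambda audio: audio.decode("utf-8"),
    reply=lambda text: f"heard: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Keeping each stage behind a plain callable is one way to swap lanes without touching the turn logic.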
npm install
thothtalk is now meant to become the self-hosted install surface as well as the protocol repo.
Intended shape:
npm install -g thothtalk
thothtalk install
thothtalk stack up
On first run, the npm CLI:
- creates .venv if needed
- installs Python requirements
- generates THOTHTALK_INTERFACE_SOCKET_TOKEN if one does not already exist
- prints the local bootstrap URL and, once the tunnel is live, the public bootstrap/socket URLs
That token is the shell-facing API key.
- WHALETale🐳/blowhole should connect to the user's own 𓁟ThothTalk server
- Telegram still goes through the bot, but it uses the same server/token model
- each user is expected to run their own 𓁟ThothTalk instance to power the interface shells
The canonical protocol bootstrap is:
- GET /thothtalk/bootstrap
- WS /thothtalk/ws
Legacy Remora🦈 routes remain available for compatibility:
- GET /remora/bootstrap
- WS /remora/ws
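A client needs both the bootstrap URL and the socket URL for whichever route family it targets. The helper below is a small sketch of deriving them from a server base URL; `protocol_urls` is a hypothetical name, and mapping `https` to `wss` is an assumption about how the server is exposed.

```python
from urllib.parse import urlsplit, urlunsplit


def protocol_urls(base_url: str, legacy: bool = False) -> dict:
    """Build bootstrap and socket URLs for the canonical or legacy route prefix."""
    parts = urlsplit(base_url.rstrip("/"))
    prefix = "/remora" if legacy else "/thothtalk"
    # Assumption: a TLS-terminated base URL implies a secure WebSocket.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    return {
        "bootstrap": urlunsplit(
            (parts.scheme, parts.netloc, parts.path + prefix + "/bootstrap", "", "")
        ),
        "socket": urlunsplit(
            (ws_scheme, parts.netloc, parts.path + prefix + "/ws", "", "")
        ),
    }
```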
Quick start
- Install deps:
cd /root/thothtalk
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
- Set credentials:
cp credentials/openai.txt.template credentials/openai.txt
# or better: onboard OpenAI through OpenClaw and let Remora🦈 inherit it
- Set your public callback URL:
cp .env.example .env
# optional if using stack.sh; otherwise set THOTHTALK_PUBLIC_BASE_URL
- Start service:
./run.sh
# health: http://127.0.0.1:8010/health
Or start full resilient stack (app + tunnel auto-recovery):
./scripts/stack.sh up
./scripts/stack.sh status
./scripts/stack.sh logs
That resilient stack is the canonical always-on path. It keeps:
- the app supervisor alive
- the tunnel supervisor alive
- machine-readable runtime state in .runtime/stack_state.json
- OpenClaw-facing handoff state in .runtime/openclaw_gateway_sync.json
Cadence Self-Play
Remora🦈 now has an offline dual-speaker self-play trainer so we can tune the cadence layer without human live tests.
It does this:
- picks a language + conversational archetype
- spawns two synthetic speaker states
- renders gibberish turns through the same cadence planner the live runtime uses
- synthesizes those turns through the current TTS lane
- compares the resulting waveform against multilingual conversational reference profiles
- writes distilled bias back into .runtime/cadence_lab_bias.json
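The last two steps (compare against a reference, write bias back) could look roughly like the sketch below. Everything here is an assumption made for illustration: the feature name `pause_ms`, the proportional update rule, and the flat JSON layout are not taken from the trainer, whose actual bias schema is not documented in this README.

```python
import json
from pathlib import Path


def nudge_bias(bias: dict, measured: dict, reference: dict, rate: float = 0.2) -> dict:
    """Move each bias term a fraction of the way toward closing the measured gap."""
    updated = dict(bias)
    for key, target in reference.items():
        if key not in measured:
            continue  # no measurement for this feature in this episode
        gap = target - measured[key]
        updated[key] = round(updated.get(key, 0.0) + rate * gap, 6)
    return updated


def write_bias(path: Path, bias: dict) -> None:
    """Write the bias file via a temp file so a reader never sees a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(bias, indent=2, sort_keys=True))
    tmp.replace(path)
```

A small `rate` keeps any single synthetic episode from swinging the live bias too far.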
Run it:
cd /root/thothtalk
. .venv/bin/activate
python scripts/cadence_sim_loop.py --episodes 8 --turns 8 --update-live-bias
Interface sockets
Remora🦈 now exposes an authenticated socket plane for future adapters, so Telegram, Discord, OpenClaw, or any custom client can subscribe to the live runtime without changing the cadence core.
Endpoints:
- GET /interfaces -> interface manifest + credential readiness + live connection counts
- GET /interfaces/{name} -> one interface definition + strict auth template
- GET /interfaces/{name}/auth-template -> plain-text template to copy into credentials/interfaces/{name}.txt
- GET /interfaces/{name}/auth-status -> whether the strict credential file is present and which keys are still missing/placeholders
- WS /interfaces/ws/{name} -> authenticated realtime event socket
Auth pattern:
- Set THOTHTALK_INTERFACE_SOCKET_TOKEN in .env
- Or let it inherit OPENCLAW_GATEWAY_TOKEN automatically when paired with an OpenClaw gateway
- Then connect with either:
  - header x-remora-token: ...
  - legacy header x-thothtalk-token: ...
  - query ?token=...
  - header Authorization: Bearer ...
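On the server side, accepting all four token locations comes down to a lookup followed by a comparison. The sketch below assumes plain dict-like headers and query parameters; the function names and the lookup order are hypothetical, and the constant-time comparison is a general hardening choice rather than something this README states about the server.

```python
import hmac
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Return the first token found in the supported locations, or None."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in ("x-remora-token", "x-thothtalk-token"):
        if lowered.get(name):
            return lowered[name].strip()
    if query.get("token"):
        return query["token"].strip()
    scheme, _, value = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    return None


def is_authorized(headers: Mapping[str, str], query: Mapping[str, str], expected: str) -> bool:
    """Compare the presented token with the configured one in constant time."""
    presented = extract_token(headers, query)
    if not presented or not expected:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())
```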
Strict credential templates live in credentials/interfaces/:
- openclaw.txt.template
- telegram.txt.template
- discord.txt.template
- twilio.txt.template
- generic.txt.template
Workflow:
- Copy the template you need to credentials/interfaces/<name>.txt
- Fill the exact KEY=value lines
- Check GET /interfaces/<name>/auth-status
- Connect your worker or bridge to WS /interfaces/ws/<name>
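To make the readiness check concrete, here is a rough sketch of what an auth-status style report over a `KEY=value` file might compute. The function names, the returned field names, and the placeholder markers are assumptions for illustration; the server's real response shape is not documented here.

```python
# Assumed conventions for "still a placeholder"; the real templates may differ.
PLACEHOLDER_MARKERS = ("...", "changeme", "your_")


def parse_credentials(text: str) -> dict:
    """Parse strict KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def auth_status(text: str, required: list) -> dict:
    """Report which required keys are absent or still hold placeholder values."""
    values = parse_credentials(text)
    missing = [k for k in required if k not in values]
    placeholders = [
        k for k in required
        if k in values
        and (not values[k] or any(m in values[k].lower() for m in PLACEHOLDER_MARKERS))
    ]
    return {
        "ready": not missing and not placeholders,
        "missing": missing,
        "placeholders": placeholders,
    }
```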
Operator shortcut:
cd /root/thothtalk
. .venv/bin/activate
python scripts/interface_doctor.py
python scripts/interface_doctor.py --interface telegram
python scripts/interface_doctor.py --json
This is the same Mail Caduceus style: strict credential drop-in, clear readiness surface, then the adapter is in business.
Telegram bridge
The first thin adapter worker is now shipped for Telegram.
Purpose:
- authenticate once with a bot token
- subscribe to the Remora🦈 runtime socket
- expose a lightweight operator console without touching the voice core
- launch Remora🦈, a mobile-friendly semantic voice surface, because Telegram bots cannot place native Telegram calls
Files:
Setup:
- Copy credentials/interfaces/telegram.txt.template to credentials/interfaces/telegram.txt
- Fill at least:
  - TELEGRAM_BOT_TOKEN
  - THOTHTALK_INTERFACE_SOCKET_TOKEN
- Optional hardening:
  - TELEGRAM_ALLOWED_CHAT_IDS=...
  - TELEGRAM_DEFAULT_CHAT_IDS=...
  - TELEGRAM_EVENT_KINDS=session_started,session_ended,cadence_training
- Start it:
cd /root/thothtalk
. .venv/bin/activate
python scripts/telegram_bridge.py --log-level info
Or install it as a companion service:
sudo cp deploy/thothtalk-telegram.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now thothtalk-telegram
Telegram commands:
- /help
- /health
- /state
- /interfaces
- /calls
- /remora
- /voice
- /subscribe
- /unsubscribe
- /events
Remora
Remora🦈 is the thin Telegram-friendly call surface.
Why it exists:
- Telegram bots cannot initiate or receive native Telegram calls
- Remora🦈 already has a semantic edge-call lane for fast Mini App sessions
- so the clean move is: Telegram launches Remora🦈, Remora🦈 latches onto the semantic lane, and the voice core stays unchanged
- the Telegram bridge now sends a native web_app Mini App button so Remora opens inside Telegram instead of as a plain external link
Files:
Endpoints:
- GET /remora
- GET /remora/bootstrap
Runtime behavior:
- mobile-first voice UI
- bootstrap-aware Mini App handshake with sanctioned OpenClaw state
- Telegram /remora and /voice commands send a one-tap Mini App button to the public Remora URL
- In the Twilio phone number's voice webhook, set:
  - Method: POST
  - URL: https://YOUR_PUBLIC_URL/twilio/incoming
WebRTC lab
When the app is running, open:
http://127.0.0.1:8010/webrtc
That page opens a direct browser WebRTC session into the same cadence engine. It also shows live input/output signal meters so you can tune cadence against the raw wave, not just the transcript.
For a CLI validation path:
cd /root/thothtalk
. .venv/bin/activate
python scripts/webrtc_loopback_probe.py
Voice compatibility
- OpenAI API on this key does not expose ember or jenny.
- Default OpenAI lane is now THOTHTALK_TTS_MODEL=tts-1 with THOTHTALK_TTS_MODEL_CANDIDATES=tts-1,gpt-4o-mini-tts,tts-1-hd.
- /health reports the active provider, model, and voice so you can confirm which lane won.
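The candidate list suggests an ordered fallback: try each model in turn and keep the first that produces audio. The sketch below shows that pattern with an injected `synthesize` callable; both function names and the probe-based selection are assumptions for illustration, not the repo's actual selection code.

```python
from typing import Callable, List, Sequence, Tuple


def parse_candidates(raw: str) -> List[str]:
    """Split a comma-separated candidate list, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def first_working_model(
    candidates: Sequence[str],
    synthesize: Callable[[str, str], bytes],
    probe_text: str = "ok",
) -> Tuple[str, bytes]:
    """Return (model, audio) for the first candidate that yields non-empty audio."""
    errors = []
    for model in candidates:
        try:
            audio = synthesize(model, probe_text)
        except Exception as exc:  # any provider failure moves on to the next lane
            errors.append(f"{model}: {exc}")
            continue
        if audio:
            return model, audio
        errors.append(f"{model}: empty audio")
    raise RuntimeError("no TTS candidate succeeded: " + "; ".join(errors))
```

Collecting the per-model errors makes it easier to see why the earlier lanes lost when the winner shows up in /health.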
Live behavior controls (voice)
During the call, the caller can say:
- thinking on or reasoning on
- thinking off or reasoning off
Remora🦈 will toggle whether <think> streams are allowed from model output.
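A minimal sketch of that toggle, assuming the command is detected in the caller's transcript and the filter runs on a complete model reply. The function names are hypothetical, and a real streaming implementation would need to handle a `<think>` tag split across chunks, which this sketch does not attempt.

```python
import re
from typing import Optional

_TOGGLE = re.compile(r"\b(thinking|reasoning)\s+(on|off)\b", re.IGNORECASE)
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def parse_toggle(utterance: str) -> Optional[bool]:
    """Return True for 'on', False for 'off', None if no command was spoken."""
    match = _TOGGLE.search(utterance)
    if match is None:
        return None
    return match.group(2).lower() == "on"


def apply_think_policy(model_text: str, allow_think: bool) -> str:
    """Strip <think>...</think> blocks unless they are currently allowed."""
    if allow_think:
        return model_text
    return _THINK_BLOCK.sub("", model_text).strip()
```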
Smoke test
./scripts/smoke.sh
This validates:
- service boot
- /health
- /state
- /interfaces
- /interfaces/generic/auth-status
- TwiML generation for /twilio/incoming
- WebRTC lab page at /webrtc
OpenAI TTS benchmark:
./scripts/benchmark_openai_tts.py
Optional outbound call trigger
If Twilio REST creds are configured, place an outbound call:
curl -X POST http://127.0.0.1:8010/twilio/make-call \
-d 'to_number=+1XXXXXXXXXX'
You can also force caller ID explicitly:
curl -X POST http://127.0.0.1:8010/twilio/make-call \
-d 'to_number=+1XXXXXXXXXX' \
-d 'from_number=+1YYYYYYYYYY'
Ollama notes
Default model is silk/eidolon:latest. Override in .env:
THOTHTALK_OLLAMA_MODEL=silk/eidolon:latest
THOTHTALK_STT_MODEL=gpt-4o-mini-transcribe
THOTHTALK_TTS_PROVIDER=openai
THOTHTALK_TTS_MODEL=tts-1
THOTHTALK_TTS_MODEL_CANDIDATES=tts-1,gpt-4o-mini-tts,tts-1-hd
THOTHTALK_TTS_VOICE=alloy
THOTHTALK_TTS_FALLBACK_VOICE=nova
THOTHTALK_CALL_MEMORY_DIR=/root/thothtalk/.runtime/call_memory
THOTHTALK_TRANSCRIPT_EXCERPT_EVENTS=10
THOTHTALK_TRANSCRIPT_EXCERPT_CHARS=1200
THOTHTALK_DELIVERY_MIN_WORDS=1
THOTHTALK_DELIVERY_MAX_WORDS=4
THOTHTALK_DEFER_BARGE_IN_UNTIL_CLAUSE_END=1
Make sure the model is pulled and warm:
ollama pull silk/eidolon:latest
Endpoints
- POST /twilio/incoming -> returns TwiML that starts media stream
- WS /twilio/stream -> bi-directional audio bridge
- POST /twilio/make-call -> optional outbound call helper
- GET /webrtc -> browser WebRTC lab client
- POST /webrtc/offer -> WebRTC signaling endpoint
- DELETE /webrtc/session/{session_id} -> close a live WebRTC peer session
- GET /health
- GET /state -> runtime summary: transport status, cadence bias, public URLs, active sessions
- GET /interfaces -> adapter manifest + auth readiness
- GET /interfaces/{name} -> adapter detail + strict auth template
- GET /interfaces/{name}/auth-template
- GET /interfaces/{name}/auth-status
- WS /interfaces/ws/{name} -> authenticated realtime adapter socket
- scripts/tunnel_supervisor.py -> auto-recovers quick tunnel + updates active URL file (+ optional Twilio webhook sync)
- scripts/stack.sh -> one-command local process control for app/tunnel
- scripts/stack_supervisor.py -> long-running parent process for always-on service mode
Production service (systemd)
sudo cp deploy/thothtalk.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now thothtalk
sudo systemctl status thothtalk
The shipped unit now runs the full supervised stack, not just the raw app process. That means inbound calling survives:
- app crashes
- dead quick tunnels
- Cloudflare quick-tunnel rate limits by falling through to localhost.run
- public URL rotation
- Twilio webhook drift after tunnel restarts
If you are pairing this with OpenClaw later, Remora🦈 also emits:
.runtime/openclaw_gateway_sync.json
That file carries the active public URL, Twilio ingress URLs, and detected OpenClaw gateway settings so another runtime can discover the voice plane without guessing.
It is an interim bridge artifact, not the final OpenClaw integration contract.
The clean final path is an OpenClaw remora channel/plugin that reports through gateway channels.status.
Latency profile (default now: snappy)
Default timing is tuned for lower turn delay and less false barge-in:
- THOTHTALK_SILENCE_MS=300
- THOTHTALK_MIN_SPEECH_MS=220
- THOTHTALK_BARGE_IN_MS=160
- THOTHTALK_BARGE_IN_PROBE_MS=420 (requires spoken-word evidence before cut-off)
- THOTHTALK_PARTIAL_TRANSCRIBE_ENABLED=1 (incremental partial transcripts while caller speaks)
- THOTHTALK_FAST_ENDPOINT_ENABLED=1 (commits sooner once partial text stabilizes)
- THOTHTALK_FAST_ENDPOINT_SILENCE_MS=140
- THOTHTALK_FAST_ENDPOINT_STABLE_MS=180
- THOTHTALK_MIN_CLAUSE_CHARS=8
- THOTHTALK_MAX_CLAUSE_CHARS=36
- THOTHTALK_FIRST_CLAUSE_MIN_CHARS=4
- THOTHTALK_FIRST_CLAUSE_MAX_CHARS=18
- THOTHTALK_DELIVERY_MIN_WORDS=1
- THOTHTALK_DELIVERY_MAX_WORDS=4 (small overlap landing window before yielding the floor)
- THOTHTALK_BRIDGE_DELAY_MS=520 (plays a cached human-style bridge if the model needs an extra beat)
- THOTHTALK_BRIDGE_COOLDOWN_MS=4500
- THOTHTALK_CALL_MEMORY_DIR=/root/thothtalk/.runtime/call_memory (full-fidelity raw call ledger lives outside model context)
- THOTHTALK_GATEWAY_SYNC_FILE=/root/thothtalk/.runtime/openclaw_gateway_sync.json
- THOTHTALK_TRANSCRIPT_EXCERPT_EVENTS=10
- THOTHTALK_TRANSCRIPT_EXCERPT_CHARS=1200
- THOTHTALK_DEFER_BARGE_IN_UNTIL_CLAUSE_END=1 (finish the current clause, then pivot)
- THOTHTALK_INITIATE_ON_ANSWER=0
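For readers wiring these values into their own tooling, a small loader sketch follows. The variable names and defaults are the ones listed above; the `LatencyProfile` class, the `_env_int` helper, and the fallback-on-bad-value behavior are hypothetical. The frame conversion assumes the 20 ms VAD frames mentioned in the Architecture section.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_int(name: str, default: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer setting, falling back to the default if unset or malformed."""
    source = os.environ if env is None else env
    try:
        return int(source.get(name, ""))
    except (TypeError, ValueError):
        return default


@dataclass(frozen=True)
class LatencyProfile:
    silence_ms: int
    min_speech_ms: int
    barge_in_ms: int
    barge_in_probe_ms: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "LatencyProfile":
        return cls(
            silence_ms=_env_int("THOTHTALK_SILENCE_MS", 300, env),
            min_speech_ms=_env_int("THOTHTALK_MIN_SPEECH_MS", 220, env),
            barge_in_ms=_env_int("THOTHTALK_BARGE_IN_MS", 160, env),
            barge_in_probe_ms=_env_int("THOTHTALK_BARGE_IN_PROBE_MS", 420, env),
        )

    @staticmethod
    def frames(ms: int, frame_ms: int = 20) -> int:
        """Number of whole VAD frames needed to cover a duration (rounded up)."""
        return -(-ms // frame_ms)
```

With the defaults, 300 ms of silence corresponds to 15 frames of 20 ms each.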
Default opening behavior is now more natural for phone calls:
- THOTHTALK_GREETING=Hello?
- THOTHTALK_INITIATE_ON_ANSWER=0
- greet briefly, then wait for the caller to respond before saying more
Architecture
- VAD: webrtcvad (20ms frames, 8kHz)
- STT: OpenAI audio.transcriptions.create (default gpt-4o-mini-transcribe)
- LLM: Ollama /api/chat stream mode
- TTS: OpenAI TTS by default, optional edge-tts (JennyNeural) support -> PCM16 -> µ-law -> Twilio media frames
- Memory: every call writes a local JSONL ledger; only a compact recent meaning excerpt is injected back into the model
- Barge-in: caller speech is tracked word-by-word, but handoff can wait until the current spoken clause lands
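The PCM16 -> µ-law step in the TTS lane is standard G.711 companding. The pure-Python encoder below is a sketch of that one conversion, not the repo's code: the function names are hypothetical, input is assumed to be little-endian signed 16-bit mono, and resampling to 8 kHz and Twilio frame packaging are left out.

```python
import struct

_BIAS = 0x84    # standard G.711 mu-law bias
_CLIP = 32635   # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), _CLIP) + _BIAS
    # Find the segment (exponent): position of the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Convert little-endian PCM16 audio to mu-law, ignoring a trailing odd byte."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```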
Security
- Do not commit credentials/openai.txt or credentials/twilio.txt
- Use environment variables in production secret managers
- Optionally add Twilio signature validation if exposing publicly
- If OpenClaw is installed locally, gateway URL/token are auto-discovered from /root/.openclaw/openclaw.json unless explicitly overridden
