thothtalk

v0.1.0

Published

3 months ago

Self-hosted ThothTalk cadence API server with tunnel and shell helpers.

lmtlssss

voice cadence telephony telegram whaletale thothtalk

𓁟ThothTalk

thothtalk is the internal cadence-engine protocol and nerd-facing base repo. It is the underlying live-voice spine that later product shells wrapped in different ways.

Naming boundary:

𓁟ThothTalk is the internal protocol, cadence engine, and adapter stack
Remora🦈 is a historical shell name still present in this repo's routes and UI
WHALETale🐳 and blowhole are the current product-facing shells

This repo still contains Remora🦈 route names and shell docs because that was the last full wrapper built directly on top of the protocol. The underlying package, env prefix, logger namespace, and repo identity remain thothtalk.

Current canonical architecture checkpoint:

This README still contains older exploration context deeper down. For the current product direction, start with the lucid-start and cadence-training checkpoints above.

Remora🦈 is now a Telegram-first voice wrapper for OpenClaw.

Canonical ship path:

Telegram Mini App for the native-feeling surface
OpenClaw for model authority, pairing, and sanctioned credentials
OpenAI speech API as the only realtime wire
Remora🦈 cadence brain between the model and the person

There is older exploration code in the repo, but the product direction is no longer ambiguous.

Core layers:

Cadence core: turn-taking, interruption, pacing, and voice-behavior logic
Meaning core: OpenClaw-managed frontier model
Flagship shell: Telegram Mini App
Wire: the OpenAI speech API

OpenClaw is now the preferred authority for sanctioned credentials and the final ship path.

Why this stack

This keeps the voice path deterministic and tight:

Telegram opens Remora🦈 inside the native app.
Remora🦈 captures mic input and forwards speech into the cadence brain.
Voice activity detection and waveform analysis shape the handoff before language is even considered.
OpenAI STT turns speech into text for the model lane.
OpenClaw runs the model and returns the semantic reply.
OpenAI TTS turns the reply back into audio.
Remora🦈 streams that audio back through the Mini App with the cadence engine sitting between the model and the person.

npm install

thothtalk is now meant to become the self-hosted install surface as well as the protocol repo.

Intended shape:

npm install -g thothtalk
thothtalk install
thothtalk stack up

On first run, the npm CLI:

creates .venv if needed
installs Python requirements
generates THOTHTALK_INTERFACE_SOCKET_TOKEN if one does not already exist
prints the local bootstrap URL and, once the tunnel is live, the public bootstrap/socket URLs

THOTHTALK_INTERFACE_SOCKET_TOKEN is the API key here: it is the shell-facing token.

WHALETale🐳 / blowhole should connect to the user’s own 𓁟ThothTalk server
Telegram still goes through the bot, but it uses the same server/token model
each user is expected to run their own 𓁟ThothTalk instance to power the interface shells
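
The token step of the first run can be sketched in Python as follows. This is only an illustration: the real CLI ships through npm, and the helper name ensure_socket_token and the choice of a .env file as the storage location are assumptions, not the package's actual code.

```python
import secrets
from pathlib import Path

TOKEN_KEY = "THOTHTALK_INTERFACE_SOCKET_TOKEN"


def ensure_socket_token(env_path: Path) -> str:
    """Return the token already stored in env_path, or generate and append one.

    Hypothetical helper: assumes the token lives in a KEY=value .env file.
    """
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == TOKEN_KEY and value.strip():
            return value.strip()  # a non-empty token exists, so keep it
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    lines.append(f"{TOKEN_KEY}={token}")
    env_path.write_text("\n".join(lines) + "\n")
    return token
```

Calling it twice on the same file returns the same token, which matches the "if one does not already exist" behavior described above.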

The canonical protocol bootstrap is:

GET /thothtalk/bootstrap
WS /thothtalk/ws

Legacy Remora🦈 routes remain available for compatibility:

GET /remora/bootstrap
WS /remora/ws
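
A client that only knows the server's public base URL can derive both route families. A minimal sketch, assuming the socket lives on the same host as the bootstrap route and that https maps to wss (protocol_urls is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit


def protocol_urls(public_base_url: str, legacy: bool = False) -> dict:
    """Build bootstrap and socket URLs for the canonical or legacy routes."""
    prefix = "remora" if legacy else "thothtalk"
    parts = urlsplit(public_base_url)
    # Assumption: an https base implies a wss socket, http implies ws.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    base_path = parts.path.rstrip("/")
    return {
        "bootstrap": urlunsplit(
            (parts.scheme, parts.netloc, f"{base_path}/{prefix}/bootstrap", "", "")
        ),
        "socket": urlunsplit(
            (ws_scheme, parts.netloc, f"{base_path}/{prefix}/ws", "", "")
        ),
    }
```

For example, protocol_urls("http://127.0.0.1:8010", legacy=True) yields the /remora/bootstrap and /remora/ws pair on the local service.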

Quick start

Install deps:

cd /root/thothtalk
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

Set credentials:

cp credentials/openai.txt.template credentials/openai.txt
# or better: onboard OpenAI through OpenClaw and let Remora🦈 inherit it

Set your public callback URL:

cp .env.example .env
# optional if using stack.sh; otherwise set THOTHTALK_PUBLIC_BASE_URL

Start service:

./run.sh
# health: http://127.0.0.1:8010/health

Or start full resilient stack (app + tunnel auto-recovery):

./scripts/stack.sh up
./scripts/stack.sh status
./scripts/stack.sh logs

That resilient stack is the canonical always-on path. It keeps:

the app supervisor alive
the tunnel supervisor alive
machine-readable runtime state in .runtime/stack_state.json
OpenClaw-facing handoff state in .runtime/openclaw_gateway_sync.json
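
Because both state files are plain JSON, another process can poll them. The sketch below makes no assumption about their schema, which this README does not document. It only guards against the file being absent or caught mid-write. load_runtime_state is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_runtime_state(path: Path) -> Optional[Any]:
    """Return the parsed JSON document, or None if it is missing or unreadable."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # The stack may not be up yet, or the file may be partially written.
        return None
```

A caller would pass Path(".runtime/stack_state.json") or Path(".runtime/openclaw_gateway_sync.json") and retry on None.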

Cadence Self-Play

Remora🦈 now has an offline dual-speaker self-play trainer so we can tune the cadence layer without human live tests.

It does this:

picks a language + conversational archetype
spawns two synthetic speaker states
renders gibberish turns through the same cadence planner the live runtime uses
synthesizes those turns through the current TTS lane
compares the resulting waveform against multilingual conversational reference profiles
writes distilled bias back into .runtime/cadence_lab_bias.json

Run it:

cd /root/thothtalk
. .venv/bin/activate
python scripts/cadence_sim_loop.py --episodes 8 --turns 8 --update-live-bias

Interface sockets

Remora🦈 now exposes an authenticated socket plane for future adapters, so Telegram, Discord, OpenClaw, or any custom client can subscribe to the live runtime without changing the cadence core.

Endpoints:

GET /interfaces -> interface manifest + credential readiness + live connection counts
GET /interfaces/{name} -> one interface definition + strict auth template
GET /interfaces/{name}/auth-template -> plain-text template to copy into credentials/interfaces/{name}.txt
GET /interfaces/{name}/auth-status -> whether the strict credential file is present and which keys are still missing/placeholders
WS /interfaces/ws/{name} -> authenticated realtime event socket

Auth pattern:

Set THOTHTALK_INTERFACE_SOCKET_TOKEN in .env
Or let it inherit OPENCLAW_GATEWAY_TOKEN automatically when paired with an OpenClaw gateway
Then connect with either:
- header x-remora-token: ...
- legacy header x-thothtalk-token: ...
- query ?token=...
- header Authorization: Bearer ...
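
The four ways of presenting the token can be laid out side by side. This sketch only builds the URL and header set for each variant and leaves the actual WebSocket client to the adapter. socket_auth_variants is a hypothetical name, and the base URL is whatever your server exposes.

```python
from urllib.parse import urlencode


def socket_auth_variants(base_ws_url: str, name: str, token: str) -> dict:
    """Map each auth style to a (url, headers) pair for WS /interfaces/ws/{name}."""
    url = f"{base_ws_url.rstrip('/')}/interfaces/ws/{name}"
    return {
        "x-remora-token": (url, {"x-remora-token": token}),
        "x-thothtalk-token": (url, {"x-thothtalk-token": token}),
        "bearer": (url, {"Authorization": f"Bearer {token}"}),
        # The query form carries the token in the URL, so it may end up in logs.
        "query": (f"{url}?{urlencode({'token': token})}", {}),
    }
```

Header-based variants are usually preferable when the client library allows custom headers, since query strings tend to be recorded by proxies.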

Strict credential templates live in credentials/interfaces/:

openclaw.txt.template
telegram.txt.template
discord.txt.template
twilio.txt.template
generic.txt.template

Workflow:

Copy the template you need to credentials/interfaces/<name>.txt
Fill the exact KEY=value lines
Check GET /interfaces/<name>/auth-status
Connect your worker or bridge to WS /interfaces/ws/<name>

Operator shortcut:

cd /root/thothtalk
. .venv/bin/activate
python scripts/interface_doctor.py
python scripts/interface_doctor.py --interface telegram
python scripts/interface_doctor.py --json

This is the same Mail Caduceus style: strict credential drop-in, clear readiness surface, then the adapter is in business.

Telegram bridge

The first thin adapter worker is now shipped for Telegram.

Purpose:

authenticate once with a bot token
subscribe to the Remora🦈 runtime socket
expose a lightweight operator console without touching the voice core
launch Remora🦈, a mobile-friendly semantic voice surface, because Telegram bots cannot place native Telegram calls

Files:

Setup:

Copy credentials/interfaces/telegram.txt.template to credentials/interfaces/telegram.txt
Fill at least:
- TELEGRAM_BOT_TOKEN
- THOTHTALK_INTERFACE_SOCKET_TOKEN
Optional hardening:
- TELEGRAM_ALLOWED_CHAT_IDS=...
- TELEGRAM_DEFAULT_CHAT_IDS=...
- TELEGRAM_EVENT_KINDS=session_started,session_ended,cadence_training
Start it:

cd /root/thothtalk
. .venv/bin/activate
python scripts/telegram_bridge.py --log-level info

Or install it as a companion service:

sudo cp deploy/thothtalk-telegram.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now thothtalk-telegram
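
The optional hardening keys from the setup step are comma-separated lists. One plausible way a bridge could apply them is sketched below. Treating an empty list as "no restriction" is an assumption, and should_forward is a hypothetical helper rather than a function from scripts/telegram_bridge.py.

```python
def parse_csv(value: str) -> set:
    """Split a comma-separated setting into a set of trimmed, non-empty items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def should_forward(event_kind: str, chat_id: int, settings: dict) -> bool:
    """Decide whether an event may be sent to a chat.

    Assumption: an unset or empty list means the filter is disabled.
    """
    kinds = parse_csv(settings.get("TELEGRAM_EVENT_KINDS", ""))
    allowed = parse_csv(settings.get("TELEGRAM_ALLOWED_CHAT_IDS", ""))
    if kinds and event_kind not in kinds:
        return False
    if allowed and str(chat_id) not in allowed:
        return False
    return True
```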

Telegram commands:

/help
/health
/state
/interfaces
/calls
/remora
/voice
/subscribe
/unsubscribe
/events

Remora

Remora🦈 is the thin Telegram-friendly call surface.

Why it exists:

Telegram bots cannot initiate or receive native Telegram calls
Remora🦈 already has a semantic edge-call lane for fast Mini App sessions
so the clean move is: Telegram launches Remora🦈, Remora🦈 latches onto the semantic lane, and the voice core stays unchanged
the Telegram bridge now sends a native web_app Mini App button so Remora opens inside Telegram instead of as a plain external link

Files:

remora.html

Endpoints:

GET /remora
GET /remora/bootstrap

Runtime behavior:

mobile-first voice UI
bootstrap-aware Mini App handshake with sanctioned OpenClaw state
Telegram /remora and /voice commands send a one-tap Mini App button to the public Remora URL

In the Twilio phone number's voice webhook settings, set:

Method: POST
URL: https://YOUR_PUBLIC_URL/twilio/incoming
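
When Twilio posts to that URL, the service answers with TwiML that opens a media stream. A minimal sketch of such a response, assuming the bidirectional Connect/Stream form and the /twilio/stream socket listed under Endpoints (incoming_twiml is a hypothetical name, and the real handler may add more verbs or attributes):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def incoming_twiml(public_base_url: str) -> str:
    """Build TwiML that connects the call to the WS /twilio/stream socket."""
    host = urlsplit(public_base_url).netloc
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio media streams require a secure wss:// URL.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/twilio/stream")
    return ET.tostring(response, encoding="unicode")
```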

WebRTC lab

When the app is running, open:

http://127.0.0.1:8010/webrtc

That page opens a direct browser WebRTC session into the same cadence engine. It also shows live input/output signal meters so you can tune cadence against the raw wave, not just the transcript.

For a CLI validation path:

cd /root/thothtalk
. .venv/bin/activate
python scripts/webrtc_loopback_probe.py

Voice compatibility

The OpenAI API does not expose the ember or jenny voices on this key.
Default OpenAI lane is now THOTHTALK_TTS_MODEL=tts-1 with THOTHTALK_TTS_MODEL_CANDIDATES=tts-1,gpt-4o-mini-tts,tts-1-hd.
/health reports the active provider, model, and voice so you can confirm which lane won.
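
One way to read the candidate list is "try each model in order and keep the first that synthesizes". The sketch below shows that pattern with the synthesis call injected as a function, so it stays independent of any SDK. Whether the runtime selects its lane exactly this way is an assumption.

```python
from typing import Callable, Tuple


def pick_tts_model(
    candidates_csv: str, synthesize: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Return (model, audio) for the first candidate that does not raise."""
    errors = {}
    for model in (m.strip() for m in candidates_csv.split(",")):
        if not model:
            continue
        try:
            return model, synthesize(model)
        except Exception as exc:  # any failure falls through to the next model
            errors[model] = str(exc)
    raise RuntimeError(f"no TTS candidate succeeded: {errors}")
```

With THOTHTALK_TTS_MODEL_CANDIDATES=tts-1,gpt-4o-mini-tts,tts-1-hd, a failure on tts-1 would fall through to gpt-4o-mini-tts.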

Live behavior controls (voice)

During the call, the caller can say:

thinking on or reasoning on
thinking off or reasoning off

Remora🦈 will toggle whether <think> streams are allowed from model output.
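
The toggle can be pictured as two small pieces: recognizing the spoken phrase, and dropping <think> blocks when thinking is off. This sketch works on complete strings for clarity. The live runtime handles streams, so its logic will differ, and both function names are hypothetical.

```python
import re
from typing import Optional

TOGGLE = re.compile(r"\b(thinking|reasoning)\s+(on|off)\b", re.IGNORECASE)
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_toggle(utterance: str) -> Optional[bool]:
    """Return True for 'on', False for 'off', None if no toggle phrase is present."""
    match = TOGGLE.search(utterance)
    if match is None:
        return None
    return match.group(2).lower() == "on"


def filter_reply(text: str, thinking_allowed: bool) -> str:
    """Remove <think>...</think> spans unless thinking is allowed."""
    if thinking_allowed:
        return text
    return THINK_BLOCK.sub("", text).strip()
```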

Smoke test

./scripts/smoke.sh

This validates:

service boot
/health
/state
/interfaces
/interfaces/generic/auth-status
TwiML generation for /twilio/incoming
WebRTC lab page at /webrtc

OpenAI TTS benchmark:

./scripts/benchmark_openai_tts.py

Optional outbound call trigger

If Twilio REST credentials are configured, you can place an outbound call:

# --data-urlencode keeps the leading + from being decoded as a space
curl -X POST http://127.0.0.1:8010/twilio/make-call \
  --data-urlencode 'to_number=+1XXXXXXXXXX'

You can also force the caller ID explicitly:

curl -X POST http://127.0.0.1:8010/twilio/make-call \
  --data-urlencode 'to_number=+1XXXXXXXXXX' \
  --data-urlencode 'from_number=+1YYYYYYYYYY'
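
The same request can be prepared from Python with the standard library. The sketch builds the form-encoded POST without sending it. The field names come from the curl calls above, and build_make_call_request is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_make_call_request(
    to_number: str,
    from_number: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8010",
) -> Request:
    """Prepare the POST /twilio/make-call request as a form-encoded body."""
    fields = {"to_number": to_number}
    if from_number:
        fields["from_number"] = from_number
    # urlencode percent-encodes the leading "+" of an E.164 number as %2B.
    body = urlencode(fields).encode("ascii")
    return Request(f"{base_url}/twilio/make-call", data=body, method="POST")
```

Pass the result to urllib.request.urlopen once the service is running.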

Ollama notes

Default model is silk/eidolon:latest. Override in .env:

THOTHTALK_OLLAMA_MODEL=silk/eidolon:latest
THOTHTALK_STT_MODEL=gpt-4o-mini-transcribe
THOTHTALK_TTS_PROVIDER=openai
THOTHTALK_TTS_MODEL=tts-1
THOTHTALK_TTS_MODEL_CANDIDATES=tts-1,gpt-4o-mini-tts,tts-1-hd
THOTHTALK_TTS_VOICE=alloy
THOTHTALK_TTS_FALLBACK_VOICE=nova
THOTHTALK_CALL_MEMORY_DIR=/root/thothtalk/.runtime/call_memory
THOTHTALK_TRANSCRIPT_EXCERPT_EVENTS=10
THOTHTALK_TRANSCRIPT_EXCERPT_CHARS=1200
THOTHTALK_DELIVERY_MIN_WORDS=1
THOTHTALK_DELIVERY_MAX_WORDS=4
THOTHTALK_DEFER_BARGE_IN_UNTIL_CLAUSE_END=1

Make sure the model is pulled and warm:

ollama pull silk/eidolon:latest
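
The call-memory settings listed above (THOTHTALK_CALL_MEMORY_DIR, THOTHTALK_TRANSCRIPT_EXCERPT_EVENTS, THOTHTALK_TRANSCRIPT_EXCERPT_CHARS) suggest a ledger-plus-excerpt pattern: keep everything on disk and hand the model only a bounded tail. A sketch under a loud assumption: the ledger's record fields are not documented here, so the speaker and text keys are hypothetical.

```python
import json
from typing import Iterable


def recent_excerpt(
    ledger_lines: Iterable[str], max_events: int = 10, max_chars: int = 1200
) -> str:
    """Build a compact excerpt from the tail of a JSONL call ledger.

    Assumes each line is a JSON object with hypothetical 'speaker'/'text' keys.
    """
    events = [json.loads(line) for line in ledger_lines if line.strip()]
    recent = events[-max_events:] if max_events > 0 else []
    text = "\n".join(
        f"{event.get('speaker', '?')}: {event.get('text', '')}" for event in recent
    )
    return text[-max_chars:]  # keep the most recent characters
```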

Endpoints

POST /twilio/incoming -> returns TwiML that starts media stream
WS /twilio/stream -> bi-directional audio bridge
POST /twilio/make-call -> optional outbound call helper
GET /webrtc -> browser WebRTC lab client
POST /webrtc/offer -> WebRTC signaling endpoint
DELETE /webrtc/session/{session_id} -> close a live WebRTC peer session
GET /health
GET /state -> runtime summary: transport status, cadence bias, public URLs, active sessions
GET /interfaces -> adapter manifest + auth readiness
GET /interfaces/{name} -> adapter detail + strict auth template
GET /interfaces/{name}/auth-template
GET /interfaces/{name}/auth-status
WS /interfaces/ws/{name} -> authenticated realtime adapter socket

Supporting scripts:

scripts/tunnel_supervisor.py -> auto-recovers the quick tunnel and updates the active URL file (plus optional Twilio webhook sync)
scripts/stack.sh -> one-command local process control for app/tunnel
scripts/stack_supervisor.py -> long-running parent process for always-on service mode

Production service (systemd)

sudo cp deploy/thothtalk.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now thothtalk
sudo systemctl status thothtalk

The shipped unit now runs the full supervised stack, not just the raw app process. That means inbound calling survives:

app crashes
dead quick tunnels
Cloudflare quick-tunnel rate limits by falling through to localhost.run
public URL rotation
Twilio webhook drift after tunnel restarts

If you are pairing this with OpenClaw later, Remora🦈 also emits:

.runtime/openclaw_gateway_sync.json

That file carries the active public URL, Twilio ingress URLs, and detected OpenClaw gateway settings so another runtime can discover the voice plane without guessing. It is an interim bridge artifact, not the final OpenClaw integration contract. The clean final path is an OpenClaw remora channel/plugin that reports through gateway channels.status.

Latency profile (default now: snappy)

Default timing is tuned for lower turn delay and less false barge-in:

THOTHTALK_SILENCE_MS=300
THOTHTALK_MIN_SPEECH_MS=220
THOTHTALK_BARGE_IN_MS=160
THOTHTALK_BARGE_IN_PROBE_MS=420 (requires spoken-word evidence before cut-off)
THOTHTALK_PARTIAL_TRANSCRIBE_ENABLED=1 (incremental partial transcripts while caller speaks)
THOTHTALK_FAST_ENDPOINT_ENABLED=1 (commits sooner once partial text stabilizes)
THOTHTALK_FAST_ENDPOINT_SILENCE_MS=140
THOTHTALK_FAST_ENDPOINT_STABLE_MS=180
THOTHTALK_MIN_CLAUSE_CHARS=8
THOTHTALK_MAX_CLAUSE_CHARS=36
THOTHTALK_FIRST_CLAUSE_MIN_CHARS=4
THOTHTALK_FIRST_CLAUSE_MAX_CHARS=18
THOTHTALK_DELIVERY_MIN_WORDS=1
THOTHTALK_DELIVERY_MAX_WORDS=4 (small overlap landing window before yielding the floor)
THOTHTALK_BRIDGE_DELAY_MS=520 (plays a cached human-style bridge if the model needs an extra beat)
THOTHTALK_BRIDGE_COOLDOWN_MS=4500
THOTHTALK_CALL_MEMORY_DIR=/root/thothtalk/.runtime/call_memory (full-fidelity raw call ledger lives outside model context)
THOTHTALK_GATEWAY_SYNC_FILE=/root/thothtalk/.runtime/openclaw_gateway_sync.json
THOTHTALK_TRANSCRIPT_EXCERPT_EVENTS=10
THOTHTALK_TRANSCRIPT_EXCERPT_CHARS=1200
THOTHTALK_DEFER_BARGE_IN_UNTIL_CLAUSE_END=1 (finish the current clause, then pivot)
THOTHTALK_INITIATE_ON_ANSWER=0
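
To make the interplay of the silence and fast-endpoint settings concrete, here is one possible reading of them as a turn-commit rule. It is inferred from the variable names and inline notes only. The actual cadence engine may combine these signals differently, and should_commit_turn is a hypothetical function.

```python
from dataclasses import dataclass


@dataclass
class EndpointConfig:
    silence_ms: int = 300                # THOTHTALK_SILENCE_MS
    min_speech_ms: int = 220             # THOTHTALK_MIN_SPEECH_MS
    fast_endpoint_enabled: bool = True   # THOTHTALK_FAST_ENDPOINT_ENABLED
    fast_endpoint_silence_ms: int = 140  # THOTHTALK_FAST_ENDPOINT_SILENCE_MS
    fast_endpoint_stable_ms: int = 180   # THOTHTALK_FAST_ENDPOINT_STABLE_MS


def should_commit_turn(
    cfg: EndpointConfig,
    speech_ms: int,
    trailing_silence_ms: int,
    partial_stable_ms: int,
) -> bool:
    """Assumed rule: commit on full silence, or earlier once partial text is stable."""
    if speech_ms < cfg.min_speech_ms:
        return False  # too short to count as a real utterance
    if trailing_silence_ms >= cfg.silence_ms:
        return True   # regular endpoint
    return (
        cfg.fast_endpoint_enabled
        and trailing_silence_ms >= cfg.fast_endpoint_silence_ms
        and partial_stable_ms >= cfg.fast_endpoint_stable_ms
    )
```

Under this reading, a caller who pauses for 150 ms with a partial transcript that has been stable for 200 ms is committed about 150 ms sooner than the regular 300 ms endpoint would allow.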

Default opening behavior is now more natural for phone calls:

THOTHTALK_GREETING=Hello?
THOTHTALK_INITIATE_ON_ANSWER=0
greet briefly, then wait for the caller to respond before saying more

Architecture

VAD: webrtcvad (20 ms frames, 8 kHz)
STT: OpenAI audio.transcriptions.create (default gpt-4o-mini-transcribe)
LLM: Ollama /api/chat stream mode
TTS: OpenAI TTS by default, optional edge-tts (JennyNeural) support -> PCM16 -> µ-law -> Twilio media frames
Memory: every call writes a local JSONL ledger; only a compact recent meaning excerpt is injected back into the model
Barge-in: caller speech is tracked word-by-word, but handoff can wait until the current spoken clause lands

Security

Do not commit credentials/openai.txt or credentials/twilio.txt
In production, supply secrets through environment variables or a secret manager
Optionally add Twilio signature validation if exposing the service publicly
If OpenClaw is installed locally, gateway URL/token are auto-discovered from /root/.openclaw/openclaw.json unless explicitly overridden