pi-speak-pk

v0.2.5

Voice, wake-word, Telegram, and mobile web remote extensions for Pi.

pi-speak-pk

Voice, wake-word, and remote-control extensions for Pi / pi-mono.

This package turns Pi into a usable voice workstation, not just a text assistant with TTS bolted on. It gives you:

  • spoken assistant replies with multiple TTS backends
  • the always-listening PK wake phrase flow
  • Telegram text and voice turns from your phone
  • a local HTTP control API
  • a built-in mobile web app at /app/
  • a Unified Remote control surface

What To Use

If you just want the shortest path:

  1. Local desktop voice: use /speak on
  2. Hands-free on the same machine: use /mono on
  3. Remote from your phone with the least friction: use /phone on
  4. Remote from your phone with QR setup: use /pk-remote, then scan the QR from the Android phone
  5. Remote button grid on Android: use the bundled Pi Speak remote for Unified Remote

Install

Install the extension:

pi npm i pi-speak-pk

Reload Pi after install.

Quick Start

1. Make Pi Speak Locally

/speak on
/speak test
/speak status

If you do nothing else, auto provider selection will try available backends in this order:

  1. legacy via speak11
  2. elevenlabs
  3. openai
  4. edge

If an earlier auto-selected backend fails at synthesis time, Pi now falls through to the next available provider instead of stopping on the first failure.
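
The fall-through behavior can be pictured with a small sketch. This is not the extension's actual code: the `Synthesizer` type, the `available` map, and the function name are hypothetical, and only the provider order comes from the list above.

```typescript
// Hypothetical types: the real provider interface in tts.ts is not shown in this README.
type ProviderName = "legacy" | "elevenlabs" | "openai" | "edge";
type Synthesizer = (text: string) => Promise<Uint8Array>;

// Order documented above for auto selection.
const AUTO_ORDER: ProviderName[] = ["legacy", "elevenlabs", "openai", "edge"];

async function synthesizeWithFallthrough(
  text: string,
  available: Partial<Record<ProviderName, Synthesizer>>,
): Promise<{ provider: ProviderName; audio: Uint8Array }> {
  const errors: string[] = [];
  for (const provider of AUTO_ORDER) {
    const synth = available[provider];
    if (!synth) continue; // backend not configured on this machine
    try {
      return { provider, audio: await synth(text) };
    } catch (err) {
      // Record the failure and fall through to the next backend.
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No TTS provider succeeded (${errors.join("; ") || "none available"})`);
}
```

The point of the design is that a synthesis-time failure in one backend is treated the same as that backend being absent, so a single bad API key does not silence replies.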

2. Enable The Always-Listening Wake Phrase

/mono on

Say:

PK

Pi will open a short voice-input window, play a short listening cue, and update the mono status so you can tell it is actively listening. Say PK again within the timeout to keep it alive. Default keep-alive is 15 seconds.

Wake matching now has a sensitivity preset. Use PI_SPEAK_WAKE_SENSITIVITY=low|medium|high to make activation stricter or more forgiving. medium is the default.
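
One way to think about the presets is as a tolerance on how far the heard word may drift from the wake phrase. The sketch below is an illustration only: the mapping from preset to edit distance is an assumption (the package's real thresholds are not documented here), and `matchesWake` and `editDistance` are hypothetical names.

```typescript
type Sensitivity = "low" | "medium" | "high";

// Assumed mapping from preset to allowed edit distance; not taken from the package.
const MAX_DISTANCE: Record<Sensitivity, number> = { low: 0, medium: 1, high: 2 };

// Classic Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(row[j] + 1, row[j - 1] + 1, prevDiag + cost);
      prevDiag = temp;
    }
  }
  return row[b.length];
}

// Compare only the first transcribed word against the wake phrase.
function matchesWake(heard: string, sensitivity: Sensitivity = "medium", wakePhrase = "PK"): boolean {
  const firstWord = heard.trim().split(/\s+/)[0] ?? "";
  return editDistance(firstWord.toLowerCase(), wakePhrase.toLowerCase()) <= MAX_DISTANCE[sensitivity];
}
```

Under these assumed thresholds, a transcript of "PC" would activate at medium but not at low, which is the kind of trade-off the preset is meant to expose.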

3. Remote In From Your Phone With Telegram

/phone setup
/phone token <bot-token>
/phone on
/phone code

Then in Telegram:

  1. Open your bot
  2. Send /link <code>
  3. Send text or voice notes to Pi

This is the easiest remote path. It works well when you want reliability more than low latency.

/phone setup prints the running-session setup steps. If the token is not already in the environment, paste it with /phone token <bot-token>; the extension saves it, starts the bridge, and prints the /link code. PI_SPEAK_TELEGRAM_BOT_TOKEN can still point to an existing bot you already control.

4. Remote In From Your Phone With The Built-In Web App

/pk-remote
/remote setup
/remote setup bluetooth

/pk-remote is the shortest path. It starts the remote API if needed, chooses a setup URL in this order, and prints a QR code for the phone setup page:

  1. PI_SPEAK_PUBLIC_BASE_URL
  2. detected Tailscale IPv4 address
  3. detected local LAN IPv4 address
  4. configured fallback
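
The selection order above amounts to a first-match lookup. A minimal sketch, assuming the detected addresses have already been gathered; the `SetupUrlCandidates` shape and `chooseSetupBaseUrl` name are hypothetical, and the plain `http://` scheme for detected IPs is an assumption:

```typescript
interface SetupUrlCandidates {
  publicBaseUrl?: string; // value of PI_SPEAK_PUBLIC_BASE_URL, if set
  tailscaleIpv4?: string; // detected Tailscale IPv4 address, if any
  lanIpv4?: string;       // detected local LAN IPv4 address, if any
  fallback: string;       // configured fallback base URL
}

// Returns the first candidate that is present, in the documented order.
function chooseSetupBaseUrl(c: SetupUrlCandidates, port = 8767): string {
  if (c.publicBaseUrl) return c.publicBaseUrl.replace(/\/+$/, "");
  if (c.tailscaleIpv4) return `http://${c.tailscaleIpv4}:${port}`;
  if (c.lanIpv4) return `http://${c.lanIpv4}:${port}`;
  return c.fallback.replace(/\/+$/, "");
}
```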

Scan the QR from the Android phone to open the setup page. From there you can download the bundled APK, open the native pi-speak://setup link, and save the machine URL, token, profile name, connection mode, and Codex route metadata. If you want the browser app instead, open one of the printed browser URLs:

http://localhost:8767/app/
https://<tailnet-host>/app/
https://<tunnel-domain>/app/

The web app:

  • records your microphone in the browser
  • sends audio to /v1/turn/voice
  • shows the transcript
  • plays the returned reply audio
  • stores the remote token in the current browser session by default
  • can explicitly remember the token on that device if you enable it in Settings

/remote setup prints the same QR and links as /pk-remote. Use /remote setup bluetooth or /pk-remote bluetooth when the phone is paired over Bluetooth networking/PAN.

For real phone use, prefer an HTTPS URL through Tailscale Serve or a tunnel. If the phone is paired over Bluetooth networking/PAN instead, use /remote setup bluetooth; the Android app treats that as a Bluetooth local-link profile and does not require Tailscale.

Optional Windows tray:

/remote tray on

Right-click the tray icon and choose Show setup QR code. Scanning the QR opens the Android app with this computer's Tailscale endpoint, token, and saved machine profile metadata. Set PI_SPEAK_TRAY=1 to start the tray automatically with /remote on.

NPM-installed tray/service path:

npx -p pi-speak-pk pi-speak-tray

Or, after global install:

pi-speak-tray --install-startup

The tray keeps the headless gateway running in the background, restarts it if it exits, and exposes setup, APK download, status, settings, restart, and web remote actions from the tray menu.

Gemini Live Smoke Test

Use this before wiring a real-time session into the phone UI:

set PI_SPEAK_GEMINI_BACKEND=vertex
set PI_SPEAK_VERTEX_API_KEY=<optional-vertex-api-key>
set GOOGLE_CLOUD_PROJECT=<your-gcloud-project>
set GOOGLE_CLOUD_LOCATION=us-central1
gcloud auth application-default login
pi-speak-gemini-live-smoke --model gemini-2.5-flash-native-audio-preview-12-2025 --modality audio

To run the tray/headless gateway through ElevenLabs voice, backed by Vertex AI Gemini text reasoning:

set ELEVENLABS_API_KEY=<your-elevenlabs-key>
set PI_SPEAK_GEMINI_BACKEND=vertex
set PI_SPEAK_VERTEX_API_KEY=<optional-vertex-api-key>
set GOOGLE_CLOUD_PROJECT=<your-gcloud-project>
set GOOGLE_CLOUD_LOCATION=us-central1
gcloud auth application-default login
set AGENT_PROVIDER=elevenlabs
pi-speak-gateway

This is the recommended high-quality voice stack. It uses ElevenLabs for reply audio and Vertex AI for Gemini reasoning so Google Cloud billing/credits apply through your Cloud project. Set PI_SPEAK_ELEVENLABS_MODEL_ID=eleven_multilingual_v2 when quality matters more than credit use.

To run the tray/headless gateway through Gemini Live instead:

set AGENT_PROVIDER=gemini-live
set PI_SPEAK_GEMINI_BACKEND=vertex
set GOOGLE_CLOUD_PROJECT=<your-gcloud-project>
set GOOGLE_CLOUD_LOCATION=us-central1
set PI_SPEAK_GEMINI_LIVE_MODEL=gemini-2.5-flash-native-audio-preview-12-2025
pi-speak-gateway

Optional environment:

  • PI_SPEAK_GEMINI_BACKEND=vertex|developer-api selects Vertex AI or direct Gemini Developer API
  • PI_SPEAK_VERTEX_API_KEY uses a Vertex AI API key instead of Application Default Credentials
  • GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION configure Vertex AI
  • PI_SPEAK_GEMINI_LIVE_MODEL selects the Live model
  • PI_SPEAK_GEMINI_LIVE_MODALITY=audio|text selects response mode
  • PI_SPEAK_GEMINI_API_VERSION=v1beta|v1alpha selects the Gemini API version
  • PI_SPEAK_ELEVENLABS_MODEL_ID selects the ElevenLabs speech model
  • PI_SPEAK_ELEVENLABS_VOICE_ID selects the ElevenLabs voice

Keep Gemini, Vertex, and ElevenLabs credentials server-side. Do not put them in the Android app or browser app.

Main Commands

/speak

Turns spoken replies on, off, or changes the backend.

Common examples:

/speak on
/speak off
/speak stop
/speak status
/speak test
/speak providers
/speak provider edge
/speak provider openai
/speak provider elevenlabs
/speak rewrite on
/speak rewrite off

Behavior:

  • Pi still keeps the full on-screen response
  • the spoken version can optionally be rewritten for audio clarity
  • /speak stop interrupts playback without disabling speech mode

/mono

Controls the wake-word listener.

/mono on
/mono off
/mono status

Behavior:

  • waits for the wake phrase PK by default
  • activates voice input for a short window
  • keeps the existing /mono flow intact with a faster-whisper wake detector
  • supports PK <session-name> to route into a named session when the target name is spoken clearly
  • keeps short numeric routes deterministic:
    • PK one, PK 1, and PK1 belong to the same 1 family
    • PK two, PK 2, and PK2 belong to the same 2 family
    • 1 stays distinct from 2
    • multi-word names like PK to Google stay literal and are not coerced into 2
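
The numeric-route rules can be sketched as a small normalizer. This is a hedged illustration of the behavior described above, not the extension's parser: `resolveRoute` and `NUMBER_WORDS` are hypothetical names, and only the one/two families from the list are modeled.

```typescript
// Spoken number words that collapse into a compact numeric route.
const NUMBER_WORDS = new Map<string, string>([["one", "1"], ["two", "2"]]);

// Returns the route target for a wake utterance, or null when there is none.
function resolveRoute(utterance: string, wakePhrase = "PK"): string | null {
  const words = utterance.trim().split(/\s+/);
  const head = (words[0] ?? "").toLowerCase();
  const wake = wakePhrase.toLowerCase();
  if (head.slice(0, wake.length) !== wake) return null; // not a wake utterance

  const glued = head.slice(wake.length); // "1" in "PK1", "" in "PK"
  const rest = words.slice(1).map((w) => w.toLowerCase());
  if (glued !== "") {
    // Only pure digits may be glued onto the wake phrase.
    return rest.length === 0 && /^\d+$/.test(glued) ? glued : null;
  }
  if (rest.length === 0) return null; // bare wake phrase, no target
  if (rest.length === 1) {
    const only = rest[0];
    if (/^\d+$/.test(only)) return only;
    const mapped = NUMBER_WORDS.get(only);
    if (mapped !== undefined) return mapped;
  }
  // Multi-word or non-numeric names stay literal (lower-cased here for comparison).
  return rest.join(" ");
}
```

Because number words are only mapped when they are the single word after the wake phrase, "PK to Google" is never mistaken for route 2.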

/phone

Controls the Telegram bridge.

/phone on
/phone off
/phone status
/phone setup
/phone token <bot-token>
/phone code
/phone unpair

Behavior:

  • text messages become Pi turns
  • voice notes are transcribed, then sent to Pi
  • replies can be delivered as text plus generated audio

/remote

Controls the HTTP API and mobile web app.

/remote on
/remote off
/remote status
/remote token
/remote setup
/remote setup bluetooth
/remote tray on
/remote tray off
/remote tray status

Behavior:

  • starts the HTTP server
  • serves the mobile app from /app/
  • serves the phone setup page from /setup
  • serves the bundled Android APK from /download/pi-speak.apk
  • exposes remote-control endpoints
  • generates a token if one is not already configured
  • prints one-step setup URLs for the browser app and native Android app

/sess

Named sessions, wake aliases, and routing summaries for voice control.

/sess
/sess new bugfix
/sess switch bugfix
/sess name active-work
/sess rename bugfix voice-bugfix
/sess wake one
/sess wake clear one
/sess alias add bugfix one
/sess alias remove one
/sess edit bugfix
/sess remove bugfix
/sess confirm remove bugfix
/sess slots
/sess export
/sess ui
/sess ui open

This matters because PK bugfix can route voice input to that named session, while compact routes like PK one / PK1 and PK two / PK2 can stay stable and distinct.

/sess with no args shows the current session, ready sessions, aliases, store path, a compact 1 vs 2 lane summary, and inline state for known sessions.

Use /sess slots when you want the explicit compact-route view for PK one / PK1 and PK two / PK2.

Use /sess ui for inline guidance without opening another terminal. Use /sess ui open only when you explicitly want the older terminal pane; repeat launches reuse the existing pane instead of creating more terminal windows. The pane mirrors the /sess dashboard, refreshes within one second of external mutations, supports focus movement with arrow keys, tab, or j / k, shows the compact PK1/PK2 route lanes plus a focused-session footer, and adds keybindings [r] rename, [a] alias, [x] remove, and [q] quit.

For operator details, see:

  • docs/VOICE_SESSION_BRIDGE.md
  • docs/SESSION_OPERATIONS.md
  • docs/CODEBASE_MAP.md

Architecture

There are six main subsystems:

  1. index.ts: The extension entrypoint. Registers commands, persists state, owns wake-word routing, and coordinates TTS, STT, Telegram, and HTTP control.

  2. tts.ts: Multi-provider speech synthesis. Supports legacy, edge, openai, elevenlabs, and auto.

  3. stt.ts and listener/stt_worker.py: Remote voice transcription for uploaded audio. auto prefers OpenAI when an API key is present and otherwise falls back to a warm local faster-whisper worker process.

  4. listener/listener.py: The always-on two-tier listener:

    • Tier 1: faster-whisper tiny for wake-phrase detection
    • Tier 2: faster-whisper for actual speech transcription

  5. phone-bridge.ts: Telegram transport for remote text and voice notes.

  6. control-server.ts: Local HTTP API, audio artifact serving, and the built-in mobile app host.
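
As a rough illustration of the auto rule for remote transcription described in item 3, the sketch below resolves a provider from environment values. `resolveSttProvider` is a hypothetical name, and which key variables the real code checks for STT is an assumption; the two OpenAI key names are the ones this README documents elsewhere.

```typescript
type SttProvider = "openai" | "local";
type Env = Record<string, string | undefined>;

function resolveSttProvider(env: Env): SttProvider {
  const mode = (env.PI_SPEAK_REMOTE_STT_PROVIDER ?? "auto").toLowerCase();
  // An explicit choice wins over auto detection.
  if (mode === "openai" || mode === "local") return mode;
  // auto: prefer OpenAI when a key is present (key names assumed), else local faster-whisper.
  const hasKey = Boolean(env.PI_SPEAK_OPENAI_KEY || env.VOICE_TOOLS_OPENAI_KEY);
  return hasKey ? "openai" : "local";
}
```

Passing the environment in as a plain object keeps the rule easy to test without touching the real process environment.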

Remote Paths

Best Overall: Built-In Mobile Web App

Use this when you want:

  • browser mic capture
  • browser audio playback
  • one-tap remote use from Android
  • compatibility with Tailscale or an HTTPS tunnel

Start it:

/remote on

Open:

https://<your-url>/app/

Best Zero-Friction Fallback: Telegram

Use this when you want:

  • the least setup
  • reliable remote turns
  • simple text plus voice note interaction

Start it:

/phone on

Best Button Grid: Unified Remote

Use this when you want:

  • fast buttons for mono, speak, provider changes, and phone pairing
  • a control surface on the phone

Do not use this as your main audio path. It is a controller, not a real voice transport.

Mobile Web App

The mobile app is built into the extension and served from:

/app/

Capabilities:

  • record a voice turn with the browser microphone
  • send typed fallback text
  • request spoken replies on each turn
  • autoplay returned audio when the browser allows it
  • keep the token in session storage by default
  • optionally remember the token on that device
  • install as a PWA on Android

Token onboarding options:

  1. Paste the token in the Settings panel
  2. Open the app once with:

     /app/?token=YOUR_TOKEN

The app will save the token into the current browser session and clean the URL immediately.
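
The bootstrap step can be sketched with the standard URL API. This is a minimal illustration, not the app's actual code; `extractBootstrapToken` is a hypothetical helper.

```typescript
// Pulls the token out of a bootstrap URL and returns the URL with the token removed.
function extractBootstrapToken(href: string): { token: string | null; cleanedHref: string } {
  const url = new URL(href);
  const token = url.searchParams.get("token");
  url.searchParams.delete("token"); // other query parameters are preserved
  return { token, cleanedHref: url.toString() };
}
```

In a browser, the result would typically be paired with sessionStorage.setItem to keep the token for the session and history.replaceState to swap in the cleaned URL, so the token does not linger in the address bar or history. The storage key is up to the app and is not documented here.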

Secure-origin rules:

  • localhost works
  • HTTPS works
  • plain HTTP on any other hostname or IP address is not a secure context, so browsers usually block microphone access

That is why Tailscale Serve or an HTTPS tunnel is the right remote path.
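
A quick way to predict whether a given URL will get microphone access is to apply the same rules by hand. The helper below is a simplified sketch with a hypothetical name; in a real page, the browser's own window.isSecureContext is the authoritative answer.

```typescript
// Approximates the browser's secure-context rule for a page URL.
function isLikelySecureOrigin(href: string): boolean {
  const url = new URL(href);
  if (url.protocol === "https:") return true;
  // Loopback hosts are treated as secure even over plain HTTP.
  const host = url.hostname;
  return host === "localhost" || host === "127.0.0.1" || host === "[::1]";
}
```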

Native Android can also use Bluetooth networking/PAN. Pair the phone with the desktop, start /remote on, run /remote setup bluetooth, then open the native setup link or select the built-in Bluetooth / local link profile and adjust the base URL to the desktop Bluetooth adapter IP if needed. Set PI_SPEAK_BLUETOOTH_BASE_URL before launching Pi Speak if you want /remote setup bluetooth to print a known adapter URL instead of the default http://192.168.44.1:8767/.

HTTP API

Start it with:

/remote on

Default bind:

host: 0.0.0.0
port: 8767

Public Routes

These are available before auth because they serve the built-in app:

GET /
GET /app/
GET /app/index.html
GET /app/app.webmanifest
GET /app/sw.js
GET /app/icon.svg

Control Routes

GET  /v1/health
GET  /v1/status
GET  /v1/diagnostics
GET  /v1/route
POST /v1/route

GET  /v1/mono/status
POST /v1/mono/on
POST /v1/mono/off

GET  /v1/speak/status
GET  /v1/speak/providers
POST /v1/speak/on
POST /v1/speak/off
POST /v1/speak/stop
POST /v1/speak/test
POST /v1/speak/provider/:provider
POST /v1/speak/rewrite/:onOrOff

GET  /v1/phone/status
POST /v1/phone/on
POST /v1/phone/off
POST /v1/phone/code
POST /v1/phone/unpair

GET  /v1/turn/text?text=hello&audio=1
POST /v1/turn/text
POST /v1/turn/voice

GET  /v1/audio/:id

Auth

Local bypass applies only to true localhost requests:

  • localhost
  • 127.0.0.1
  • ::1

Remote clients must send one of:

  • Authorization: Bearer <token>
  • X-Pi-Speak-Token: <token>

Query-string token auth is reserved for:

  • /app/?token=... bootstrap onboarding
  • /v1/audio/:id?token=... reply-audio playback in the browser

Remote control and turn endpoints should use headers, not query-string auth.
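
The header rules above can be sketched as a pure check. This is an illustration under stated assumptions, not the code in control-server.ts: the `IncomingRequestInfo` shape and function names are hypothetical, and header names are assumed to be lower-cased the way Node's HTTP server delivers them.

```typescript
interface IncomingRequestInfo {
  remoteAddress: string;                       // socket peer address
  headers: Record<string, string | undefined>; // lower-cased header names
}

// Addresses that "localhost" resolves to; requests from these skip token auth.
const LOOPBACK = new Set(["127.0.0.1", "::1"]);

function extractToken(headers: Record<string, string | undefined>): string | null {
  const auth = headers["authorization"];
  if (auth) {
    const match = /^Bearer\s+(.+)$/i.exec(auth.trim());
    if (match) return match[1];
  }
  return headers["x-pi-speak-token"] ?? null;
}

function isAuthorized(req: IncomingRequestInfo, expectedToken: string): boolean {
  if (LOOPBACK.has(req.remoteAddress)) return true; // local bypass
  const token = extractToken(req.headers);
  return token !== null && token === expectedToken;
}
```

A production check would normally compare tokens in constant time; the plain equality here keeps the sketch dependency-free.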

Hardening Defaults

The production-oriented defaults are:

  • same-origin CORS unless PI_SPEAK_HTTP_ALLOWED_ORIGINS is set
  • request body limit for text turns: 64 KB
  • request body limit for voice turns: 25 MB
  • lightweight in-memory rate limits for non-local traffic
  • background cleanup of expired reply-audio artifacts
  • authenticated diagnostics at /v1/diagnostics, including a compact summary block for queue state, phone linkage, mono state, current session/target, and active error sources
  • queue/backpressure for remote turns so Pi returns a deterministic busy response instead of piling up unlimited work
  • synchronous remote turns fail fast when the current Pi session is already mid-turn, instead of hanging the HTTP request against the same active session
  • mutating control routes require POST, leaving GET read-only for fetch-safe status endpoints
  • outbound provider calls share a default 30s timeout via PI_SPEAK_OUTBOUND_TIMEOUT_MS
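
To make the "lightweight in-memory rate limits" item concrete, here is a minimal fixed-window limiter. It is a sketch of the general technique, not the extension's implementation; the class name and the choice of a fixed window (rather than, say, a sliding window) are assumptions.

```typescript
class FixedWindowRateLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request may proceed, false when the client is over its limit.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(clientKey);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      this.hits.set(clientKey, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

With the documented defaults, a control-route limiter would be constructed as new FixedWindowRateLimiter(20, 60000) and a voice-route limiter as new FixedWindowRateLimiter(6, 60000), keyed by client address and skipped for local traffic.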

Inspect the active token with:

/remote token

Example Requests

Text turn:

curl -X POST http://127.0.0.1:8767/v1/turn/text ^
  -H "Content-Type: application/json" ^
  -d "{\"text\":\"Summarize the repo\",\"audio\":true}"

Voice turn:

curl -X POST "https://<your-host>/v1/turn/voice?audio=1" ^
  -H "Authorization: Bearer <token>" ^
  -H "Content-Type: audio/webm" ^
  --data-binary "@voice.webm"
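
The same text turn can be issued from TypeScript. The helper below only builds the request so it can be inspected or tested without a running server; `buildTextTurnRequest` and `TurnRequest` are hypothetical names, and the shape of the response body is not documented here, so the sketch does not parse it.

```typescript
interface TurnRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a POST /v1/turn/text request; the token is optional for localhost callers.
function buildTextTurnRequest(baseUrl: string, text: string, token?: string, audio = true): TurnRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/turn/text`,
    init: { method: "POST", headers, body: JSON.stringify({ text, audio }) },
  };
}
```

Pass the result to fetch(url, init) in Node 18+ or a browser. Remote callers should always supply the token through the header, in line with the auth rules above.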

Unified Remote

Bundled remote source:

unified-remote/Pi Speak

Install path:

C:\ProgramData\Unified Remote\Remotes\Custom\Pi Speak

What it is good at:

  • toggling mono
  • toggling speak
  • switching providers
  • requesting the Telegram pair code
  • sending short text turns

What it is not good at:

  • full remote voice capture
  • browser-style audio playback
  • low-latency conversational audio

Environment Variables

Core

AGENT_PROVIDER=pi|codex|elevenlabs|gemini|gemini-live
CODEX_BIN=codex
PI_BIN=pi
AGENT_MODEL=
PI_SPEAK_EXECUTION_ROUTER_MODE=auto|pi|codex
AGENT_CWD=
AGENT_WORKSPACE=
PI_SPEAK_TTS_PROVIDER=auto|legacy|edge|openai|elevenlabs
PI_SPEAK_REWRITE_ENABLED=true|false
PI_SPEAK_WAKE_PHRASE=PK
PI_SPEAK_MONO_ACTIVITY_TIMEOUT=15
PI_SPEAK_WAKE_SENSITIVITY=low|medium|high
PI_SPEAK_WAKE_FUZZY_ENABLED=true|false              # optional override
PI_SPEAK_WAKE_FUZZY_MAX_DISTANCE=0|1|2              # optional override
PI_SPEAK_WAKE_COMPACT_PREFIX_ENABLED=true|false     # optional override

If PI_SPEAK_EXECUTION_ROUTER_MODE is unset, explicit AGENT_PROVIDER=pi or AGENT_PROVIDER=codex controls which backend remote turns dispatch to. Set the router mode to auto when you want the conversation router to choose Pi or Codex from the reduced task.

Rewrite

OPENROUTER_API_KEY=...
PI_SPEAK_REWRITE_MODEL=openai/gpt-oss-20b:nitro
PI_SPEAK_OPENROUTER_URL=https://openrouter.ai/api/v1/chat/completions

OpenAI

# Dedicated key for audio TTS (avoids consuming the general LLM key)
PI_SPEAK_OPENAI_KEY=...
# Legacy fallback
VOICE_TOOLS_OPENAI_KEY=...
PI_SPEAK_OPENAI_TTS_MODEL=gpt-4o-mini-tts
PI_SPEAK_OPENAI_VOICE=alloy
PI_SPEAK_REMOTE_OPENAI_STT_MODEL=whisper-1
PI_SPEAK_OPENAI_BASE_URL=https://api.openai.com/v1

ElevenLabs

ELEVENLABS_API_KEY=...
PI_SPEAK_ELEVENLABS_VOICE_ID=pNInz6obpgDQGcFmaJgB
PI_SPEAK_ELEVENLABS_MODEL_ID=eleven_multilingual_v2

Vertex AI Gemini

PI_SPEAK_GEMINI_BACKEND=vertex
PI_SPEAK_VERTEX_API_KEY=<optional-vertex-api-key>
GOOGLE_CLOUD_PROJECT=<your-gcloud-project>
GOOGLE_CLOUD_LOCATION=us-central1
PI_SPEAK_GEMINI_TEXT_MODEL=gemini-2.5-flash
PI_SPEAK_GEMINI_LIVE_MODEL=gemini-2.5-flash-native-audio-preview-12-2025

Run gcloud auth application-default login on the machine hosting the tray/gateway, or set PI_SPEAK_VERTEX_API_KEY to a Vertex AI API key. Enable the Vertex AI API on the Cloud project.

Edge TTS

PI_SPEAK_EDGE_VOICE=en-US-AriaNeural
PI_SPEAK_EDGE_LANG=en-US
PI_SPEAK_EDGE_RATE=1
PI_SPEAK_EDGE_TIMEOUT_MS=15000

Legacy / Local Python

PI_SPEAK_SPEAK11_PATH=...
PI_SPEAK_PYTHON=...
WHISPER_MODEL=tiny
WHISPER_DEVICE=cpu
WHISPER_COMPUTE=int8
PI_SPEAK_REMOTE_WHISPER_MODEL=base
PI_SPEAK_REMOTE_STT_PROVIDER=auto|local|openai

PI_SPEAK_PYTHON and PI_SPEAK_SPEAK11_PATH are now the first-class override path for local Python audio setups. When they are unset, Pi scans the normal Windows user-site Python*/Scripts locations before falling back to PATH resolution.

Telegram

PI_SPEAK_TELEGRAM_BOT_TOKEN=...
TELEGRAM_BOT_TOKEN=...
PI_SPEAK_PHONE_WAIT_TIMEOUT_MS=180000

HTTP Remote

PI_SPEAK_HTTP_HOST=0.0.0.0
PI_SPEAK_HTTP_PORT=8767
PI_SPEAK_HTTP_TOKEN=...
PI_SPEAK_HTTP_AUDIO_TTL_MS=600000
PI_SPEAK_HTTP_AUDIO_CLEANUP_MS=30000
PI_SPEAK_HTTP_ALLOWED_ORIGINS=https://your-tailnet-host,https://your-tunnel-host
PI_SPEAK_HTTP_TIMEOUT_MS=180000
PI_SPEAK_HTTP_TEXT_BODY_LIMIT_BYTES=65536
PI_SPEAK_HTTP_VOICE_BODY_LIMIT_BYTES=26214400
PI_SPEAK_HTTP_RATE_LIMIT_WINDOW_MS=60000
PI_SPEAK_HTTP_RATE_LIMIT_CONTROL=20
PI_SPEAK_HTTP_RATE_LIMIT_VOICE=6
PI_SPEAK_OUTBOUND_TIMEOUT_MS=30000

Troubleshooting

The phone web app opens, but the mic does not work

You are probably not on a secure origin.

Use one of:

  • http://localhost:8767/app/
  • Tailscale Serve over HTTPS
  • Cloudflare Tunnel over HTTPS

/mono on starts, but voice transcription fails

You likely do not have the Python audio stack installed. The local listener depends on:

  • numpy
  • sounddevice
  • faster_whisper

Remote voice turns fail

Check these in order:

  1. /remote status
  2. /v1/diagnostics
  3. /remote token
  4. PI_SPEAK_REMOTE_STT_PROVIDER
  5. OpenAI key or local whisper setup

Speech is using the wrong provider

Check:

/speak status
/speak providers
/speak provider edge

Telegram pairing is stuck

Use:

/phone code
/phone unpair

Then link again with the fresh code.

Testing

Run the automated production-readiness checks with:

npm test

Current automated coverage includes:

  • non-local auth enforcement
  • localhost auth bypass
  • body-size rejection
  • voice content-type rejection
  • rate limiting
  • audio artifact expiry
  • Telegram link + text-turn handling
  • PWA token persistence rules
  • remote queue backpressure behavior
  • runtime path resolution for local Python / speak11
  • explicit listener shutdown signaling with force-kill fallback

Manual Smoke Checklist

Before treating a machine as production-ready, verify:

  1. /mono on
  2. local wake phrase: say PK
  3. /phone on then /phone code, then complete a Telegram text turn and voice-note turn
  4. /remote on, open /app/, complete a text turn and voice turn, and confirm reply audio playback
  5. over Tailscale or your HTTPS tunnel, confirm non-local requests fail without the token and succeed with it

For a full phone-focused run sheet with pass/fail capture fields, use docs/REMOTE_VALIDATION_CHECKLIST.md. For a compact operator worksheet, use docs/REMOTE_VALIDATION_RUN_SHEET.md.

Files You Will Care About

Release Notes

See CHANGELOG.md.