openvoiceui
v1.0.0
Voice-powered AI assistant platform — connect any LLM, any TTS, with a live web canvas, music generation, and agent orchestration
OpenVoiceUI
A plug-and-play browser-based voice agent platform. Connect any LLM, any TTS provider, and any AI framework — with a built-in music player, AI music generation, and a live web canvas display system.
Hosting notice: OpenVoiceUI is designed to run on a dedicated VPS (see Hetzner setup below). Running it on a local machine is possible but not recommended — microphone access, SSL, and persistent uptime all work significantly better on a hosted server. For the best experience, deploy to a VPS before using it seriously.
What It Is
OpenVoiceUI is a modular voice UI shell. You bring the intelligence (LLM + TTS), it handles everything else:
- Voice I/O — browser-based STT with push-to-talk, wake words, or continuous mode
- Animated Faces — multiple face modes (eye-face avatar, halo smoke orb) with mood states, thinking animations, and audio-reactive waveform mouth
- Web Canvas — fullscreen iframe display system for AI-generated HTML pages, dashboards, and reports with interactive links, page versioning, and external URL display
- Desktop OS Interface — full desktop-like canvas experience with right-click context menus, wallpaper upload, trash, shortcuts, and folder creation (auto-seeded as default pages)
- Music Player — background music with crossfade, AI ducking, and AI trigger commands
- Music Generation — AI-generated track support via Suno or fal.ai integrations
- AI Image Generation — HuggingFace-powered image generation with FLUX.1 and SD3.5 models, quality presets, and aspect ratio control
- Voice Cloning — clone and generate speech with custom voice embeddings via fal.ai Qwen3-TTS
- Soundboard — configurable sound effects with text-trigger detection
- Agent Profiles — switch personas/providers without restart via JSON config
- Agent Activity Chip — live action ticker showing what the agent is doing in real-time
- Live Instruction Editor — hot-reload system prompt from the admin panel
- Admin Dashboard — session control, playlist editor, face picker, theme editor
- Issue Reporter — in-app bug/feedback reporting modal with session context (development tool)
- Server-Side Settings — voice, face, and TTS preferences persist across devices via server (no localStorage)
- Document Upload Extraction — PDF and document text extraction from uploads
- Empty Response Auto-Recovery — auto-retry on empty LLM responses with Z.AI direct fallback and session auto-recovery
Open Framework Philosophy
OpenVoiceUI is built as an open voice UI shell — it doesn't lock you into any specific LLM, TTS engine, STT provider, or AI framework. Every layer is a pluggable slot. Drop in a gateway plugin, a TTS provider, or a custom adapter and it just works. The built-in providers are defaults, not requirements.
LLM / Gateway Providers
Connect to any LLM via a gateway plugin — OpenClaw is built-in, others are drop-in:
| Provider | Status |
|----------|--------|
| Any OpenClaw-compatible gateway | Built-in |
| Z.AI (GLM models) | Built-in |
| OpenAI-compatible APIs | Via adapter |
| Ollama (local) | Via adapter |
| Hume EVI | Built-in adapter |
| LangChain, AutoGen, custom agent framework | Via gateway plugin |
| Any LLM or framework you build a plugin for | Drop a folder in plugins/ |
TTS Providers
| Provider | Type | Cost |
|----------|------|------|
| Supertonic | Local ONNX | Free |
| Groq Orpheus | Cloud, fast | ~$0.05/min |
| Qwen3-TTS | Cloud, expressive | ~$0.003/min |
| Hume EVI | Cloud, emotion-aware | ~$0.032/min |
| Any TTS engine you implement | Local or cloud | Your choice |
STT Providers
| Provider | Type | Cost | Notes |
|----------|------|------|-------|
| Web Speech API | Browser-native | Free | No API key needed, Chrome/Edge only |
| Deepgram Nova-2 | Cloud streaming | Pay-per-use | Reliable paid alternative, real-time WebSocket streaming |
| Groq Whisper | Cloud batch | Free tier available | Fast batch transcription via Groq API |
| Whisper | Local | Free | Self-hosted Whisper model |
| Hume EVI | Cloud, full-duplex | ~$0.032/min | Emotion-aware, bidirectional |
| Any STT provider | Via custom adapter | Your choice | Implement the STT adapter interface |
Features
Voice Modes
- Continuous — always listening, silence timeout triggers send
- Push-to-Talk — hold button or configurable hotkey (keyboard/mouse)
- Listen — passive monitoring mode
- Sleep — goodbye detection pauses the agent, wake word reactivates
- Agent-to-Agent — A2A communication panel
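The Sleep transitions above amount to a tiny state machine: a goodbye phrase puts the agent to sleep, and only a wake word brings it back. A minimal sketch — the phrases here are placeholders, not the app's actual trigger lists, which live in the STT layer:

```python
# Illustrative sketch of the sleep/wake transitions described above.
# "hey assistant" / "goodbye" are placeholder triggers, not the app's real lists.
WAKE_WORDS = ("hey assistant",)
GOODBYE_PHRASES = ("goodbye", "bye for now")

def next_mode(mode: str, transcript: str) -> str:
    """Return the new voice mode after hearing a transcript."""
    text = transcript.lower()
    if mode == "sleep":
        # Only a wake word can leave sleep mode.
        return "continuous" if any(w in text for w in WAKE_WORDS) else "sleep"
    if any(p in text for p in GOODBYE_PHRASES):
        return "sleep"
    return mode
```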
Canvas System
- AI can open and display any HTML page in a fullscreen overlay
- Manifest-based page discovery with search, categories, and starred pages
- Triggered via [CANVAS:page-id] tags in AI responses
- Real-time SSE updates from server
- Interactive links — canvas pages communicate with the app via postMessage bridge (navigate, speak, open URLs)
- Page versioning — automatic .versions/ backup on every change with restore API
- External URL display — load any URL in the canvas iframe via [CANVAS_URL:https://...]
- Default pages — desktop OS and file explorer pages auto-seeded on startup
- Admin lock/URL columns — admin panel shows lock state and copyable URLs for each page
- Padded mode — configurable edge padding on canvas pages
- Error auto-injection — canvas pages get an error bridge for debugging in the ActionConsole
- Content Security Policy — restrictive CSP on canvas pages to prevent XSS
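Server-side, picking the canvas triggers out of an AI reply can be a single regex pass. A sketch using the two tag formats listed above — the app's actual parser may be stricter:

```python
import re

# Matches the [CANVAS:page-id] and [CANVAS_URL:https://...] triggers
# described above; OpenVoiceUI's real parser may differ in detail.
TAG = re.compile(r"\[CANVAS(_URL)?:([^\]]+)\]")

def extract_canvas_actions(reply: str) -> list[tuple[str, str]]:
    """Return ('page'|'url', target) pairs found in an AI reply, in order."""
    return [("url" if m.group(1) else "page", m.group(2))
            for m in TAG.finditer(reply)]
```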
STT Improvements
- Hallucination filter — rejects ghost transcripts from silence
- Noise rejection — sustained speech detection prevents spurious triggers
- VAD tuning — configurable voice activity detection thresholds
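The sustained-speech idea behind the noise rejection can be pictured as an RMS energy gate that only fires after several consecutive loud frames. This is a conceptual sketch, not the app's actual VAD (which runs in the browser); the threshold and frame count are illustrative:

```python
import math

def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def sustained_speech(frames: list[list[float]],
                     threshold: float = 0.05,
                     min_frames: int = 3) -> bool:
    """True once min_frames consecutive frames exceed the energy threshold."""
    run = 0
    for frame in frames:
        run = run + 1 if rms(frame) > threshold else 0
        if run >= min_frames:
            return True
    return False
```

A one-frame spike (a door slam) resets the counter, so it never triggers, while real speech sustains energy across frames.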
Music Player
- Background playlist with crossfade (1.5s smooth transitions)
- Auto-ducking during TTS (volume drops, restores after)
- AI voice commands: play, stop, skip, volume up/down
- Generated tracks (AI-composed) + custom playlists
- Track history (back button, 20-track buffer)
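The 1.5-second crossfade pairs a fade-out on the old track with a fade-in on the new one. The exact curve OpenVoiceUI uses isn't documented; an equal-power crossfade, a common choice for music, would look like this sketch:

```python
import math

CROSSFADE_SECONDS = 1.5  # matches the transition length noted above

def crossfade_gains(t: float, duration: float = CROSSFADE_SECONDS) -> tuple[float, float]:
    """Equal-power (gain_out, gain_in) at time t seconds into the crossfade."""
    x = min(max(t / duration, 0.0), 1.0)
    # cos^2 + sin^2 = 1, so total perceived power stays constant mid-fade.
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)
```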
Profile System
Define agents in JSON — each profile configures:
- LLM provider, model, parameters
- TTS voice, speed, parallel sentence mode
- STT silence timeout, PTT mode, wake words
- UI theme, face mood, enabled features
- Session key strategy
Security
- Content Security Policy — restrictive CSP headers on canvas pages to prevent XSS
- SSRF protection — all external fetch endpoints validate and block internal network requests
- Path traversal protection — file access endpoints sanitize paths
- WebSocket authentication — gateway WebSocket connections require valid auth tokens
Project Structure
├── server.py Entry point
├── app.py Flask app factory
├── docker-compose.yml Multi-service Docker setup
├── docker-compose.pinokio.yml Pinokio one-click installer compose
├── pinokio.js Pinokio app manifest
├── install.js Pinokio install script
├── start.js Pinokio start script
├── stop.js Pinokio stop script
├── update.js Pinokio update script
├── .devcontainer/
│ ├── devcontainer.json VS Code dev container config
│ └── docker-compose.devcontainer.yml
├── routes/
│ ├── conversation.py Voice + parallel TTS streaming (with abort + heartbeats)
│ ├── canvas.py Canvas display system + CDN stripping
│ ├── instructions.py Live system prompt editor
│ ├── music.py Music control
│ ├── suno.py Suno AI music generation + webhooks
│ ├── profiles.py Agent profile management
│ ├── admin.py Admin + server stats
│ ├── transcripts.py Conversation transcript auto-save
│ ├── vision.py Screenshot / image analysis (Gemini)
│ ├── greetings.py Greeting management
│ ├── theme.py Theme management
│ ├── elevenlabs_hybrid.py ElevenLabs TTS adapter
│ ├── pi.py Pi coding agent
│ ├── static_files.py Static asset serving
│ ├── image_gen.py HuggingFace image generation (FLUX.1, SD3.5)
│ ├── workspace.py Agent workspace file management
│ ├── report_issue.py In-app issue reporter
│ ├── icons.py Icon generation
│ └── onboarding.py Onboarding flow
├── services/
│ ├── auth.py Clerk JWT authentication middleware
│ ├── canvas_versioning.py Automatic page version history + restore
│ ├── db_pool.py SQLite WAL connection pool
│ ├── health.py Liveness + readiness health probes
│ ├── paths.py Canonical path constants (all dirs)
│ ├── speech_normalizer.py Speech text normalization
│ ├── gateway_manager.py Gateway registry + plugin loader + router
│ ├── gateways/
│ │ ├── base.py GatewayBase abstract class
│ │ └── openclaw.py OpenClaw gateway implementation
│ └── tts.py TTS service wrapper (retry + provider fallback)
├── tts_providers/ TTS provider implementations
│ ├── groq_provider.py Groq Orpheus
│ ├── supertonic_provider.py Supertonic (local ONNX)
│ ├── qwen3_provider.py Qwen3-TTS via fal.ai
│ └── hume_provider.py Hume EVI
├── providers/ LLM/STT provider implementations
├── plugins/ Gateway plugins (gitignored, drop-in)
│ ├── README.md Plugin authoring guide
│ └── example-gateway/ Reference implementation
├── profiles/ Agent profile JSON files
│ └── default.json Base agent (edit to personalize)
├── prompts/
│ └── voice-system-prompt.md Hot-reload system prompt
├── config/
│ ├── default.yaml Server configuration
│ └── speech_normalization.yaml
├── deploy/
│ ├── openclaw/Dockerfile OpenClaw container build
│ ├── supertonic/ Supertonic TTS container (Dockerfile + server.py)
│ ├── skill-runner/ Shared skill execution service (Dockerfile + server.py)
│ ├── setup-sudo.sh VPS setup (nginx, SSL, systemd)
│ └── openvoiceui.service Systemd unit file
├── default-pages/ Auto-seeded default canvas pages
│ ├── desktop.html Desktop OS interface
│ └── file-explorer.html File explorer page
├── src/
│ ├── app.js Frontend core
│ ├── adapters/ Adapter implementations
│ │ ├── ClawdBotAdapter.js
│ │ ├── hume-evi.js
│ │ ├── elevenlabs-classic.js
│ │ ├── elevenlabs-hybrid.js
│ │ └── _template.js Build your own adapter
│ ├── core/ EventBus, VoiceSession, EmotionEngine, Config
│ ├── face/ Animated face implementations
│ │ ├── EyeFace.js Classic eye-face avatar
│ │ ├── HaloSmokeFace.js Halo smoke orb with thinking mode
│ │ ├── BaseFace.js Base class for face types
│ │ └── manifest.json Face registry + previews
│ ├── features/ MusicPlayer, Soundboard
│ ├── shell/ Orchestrator, bridges, profile discovery
│ ├── ui/
│ │ ├── AppShell.js Main app layout
│ │ ├── face/ FacePicker, FaceRenderer
│ │ ├── settings/ SettingsPanel, PlaylistEditor, TTSVoicePreview
│ │ ├── themes/ ThemeManager
│ │ └── visualizers/ PartyFXVisualizer, BaseVisualizer
│ └── providers/
│ ├── WebSpeechSTT.js Browser speech recognition + wake word detection
│ ├── DeepgramSTT.js Deepgram Nova-2 streaming STT
│ ├── GroqSTT.js Groq Whisper batch STT
│ ├── TTSPlayer.js TTS audio playback
│ └── tts/ TTS provider JS modules
├── sounds/ Soundboard audio files
└── runtime/ Runtime data (gitignored, docker-mounted)
├── uploads/ User-uploaded files
├── canvas-pages/ Canvas HTML pages
│ └── .versions/ Automatic page version backups
├── known_faces/ Face recognition photos
├── music/ Music playlist folder
├── generated_music/ AI-generated tracks
├── transcripts/ Conversation transcripts (auto-saved)
└── canvas-manifest.json Canvas page registry
Prerequisites
- OpenClaw gateway 2026.3.13 — openclaw.ai · version requirements
- Groq API key for TTS — console.groq.com (free tier available)
- Optional: Suno API key (music generation), Clerk (auth for multi-user deployments)
OpenVoiceUI is tested with [email protected]. The Docker setup installs this version automatically. If you're using an existing OpenClaw install, see OpenClaw Requirements — other versions may have breaking changes that prevent voice conversations from working.
Installation
Option 1: Pinokio One-Click Install
The easiest way to get started. Pinokio is a free app manager that handles installation, startup, and updates automatically.
- Install Pinokio if you don't have it
- Search for "OpenVoiceUI" in the Pinokio app store, or add this repo URL directly
- Click Install — Pinokio will clone the repo, build Docker images, and run onboarding
- Click Start to launch all services
- Open the URL shown in Pinokio to access the UI
Pinokio handles Docker Compose orchestration, environment configuration, and service lifecycle. Use the Stop button to shut down, and Update to pull the latest changes.
Option 2: Deployment (Recommended: VPS)
The recommended way to run OpenVoiceUI is on a dedicated VPS — microphone access, SSL, and always-on uptime all work significantly better hosted than on a local machine.
A setup script handles nginx, Let's Encrypt SSL, and systemd automatically:
git clone https://github.com/MCERQUA/OpenVoiceUI
cd OpenVoiceUI
cp .env.example .env
# Edit .env — set CLAWDBOT_AUTH_TOKEN and GROQ_API_KEY at minimum
# Edit deploy/setup-sudo.sh — set DOMAIN, PORT, EMAIL, INSTALL_DIR at the top
sudo bash deploy/setup-sudo.sh
The script is idempotent — safe to re-run. It skips SSL if a cert already exists.
sudo systemctl status openvoiceui
sudo journalctl -u openvoiceui -f
Option 3: Local Install (Docker)
Docker is the easiest path for local development — it runs OpenClaw, Supertonic TTS, and OpenVoiceUI together. Note that browser microphone access requires HTTPS — on localhost Chrome/Edge will still allow it, but other devices on your network won't work without a cert.
git clone https://github.com/MCERQUA/OpenVoiceUI
cd OpenVoiceUI
cp .env.example .env
Step 1: Onboard OpenClaw (one-time)
Run the interactive onboarding wizard to configure your LLM provider and generate an auth token:
docker compose build openclaw
docker compose run --rm openclaw openclaw onboard
This will prompt you to choose an LLM provider (Anthropic, OpenAI, etc.), enter your API key, and generate a gateway auth token.
Step 2: Configure .env
Set the auth token from onboarding:
PORT=5001
CLAWDBOT_AUTH_TOKEN=<token-from-onboarding>
CLAWDBOT_GATEWAY_URL does not need to be set — Docker Compose automatically routes to the OpenClaw container via loopback networking. TTS works out of the box with Supertonic (local, free). Optionally add GROQ_API_KEY for Groq Orpheus TTS.
Step 3: Start
docker compose up --build
Open http://localhost:5001 in your browser.
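Once the stack is up, you can sanity-check the streaming voice endpoint from Python. This sketch uses only the request shape documented in the API Reference; the structure of the streamed NDJSON events is an assumption, so inspect what your server actually emits:

```python
import json
import urllib.request

BASE = "http://localhost:5001"  # default Docker port

def parse_ndjson(lines):
    """Yield decoded events from an NDJSON stream, skipping blank heartbeat lines."""
    for raw in lines:
        line = raw.strip()
        if line:
            yield json.loads(line)

def stream_conversation(message: str):
    """POST to the voice endpoint and yield streamed events as dicts."""
    req = urllib.request.Request(
        f"{BASE}/api/conversation?stream=1",
        data=json.dumps({"message": message, "tts_provider": "groq",
                         "voice": "tara"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        yield from parse_ndjson(resp)
```

Usage: `for event in stream_conversation("Hello"): print(event)`.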
How it works
The docker-compose.yml runs three services:
| Service | Description |
|---------|-------------|
| openclaw | OpenClaw gateway (Node.js) — handles LLM routing, tool use, and agent sessions on port 18791 |
| supertonic | Local TTS engine (ONNX) — provides free text-to-speech without external API keys |
| openvoiceui | OpenVoiceUI server (Python/Flask) — serves the frontend and connects to OpenClaw and Supertonic |
OpenClaw config is persisted in a Docker volume (openclaw-data), so onboarding only needs to run once.
Option 4: VS Code Dev Container
For contributors and developers, OpenVoiceUI includes a VS Code dev container configuration that sets up the full development environment automatically.
- Install the Dev Containers extension in VS Code
- Open the repo folder in VS Code
- When prompted, click Reopen in Container (or run the "Dev Containers: Reopen in Container" command)
- VS Code will build and start all services using .devcontainer/docker-compose.devcontainer.yml
- The development server starts automatically with hot-reload
The dev container includes all dependencies pre-installed and is configured for the full Docker Compose stack.
TTS setup
Supertonic (local, free) is included and works out of the box — select "supertonic" as the TTS provider in the Settings panel.
To use Groq Orpheus TTS instead, you must first accept the model terms at console.groq.com/playground?model=canopylabs%2Forpheus-v1-english, then set GROQ_API_KEY in .env.
Authentication
Auth is opt-in. By default, OpenVoiceUI runs with no authentication — all endpoints are accessible. This is the right setting for self-hosted single-user deployments.
To enable Clerk JWT auth (for multi-user or public-facing deployments):
- Create a Clerk app at clerk.com
- Add CLERK_PUBLISHABLE_KEY=pk_live_... to .env
- Set CANVAS_REQUIRE_AUTH=true in .env
- Set ALLOWED_USER_IDS=user_yourclerkid — find your user ID in server logs after first login
OpenClaw Integration
OpenVoiceUI connects to an OpenClaw gateway via persistent WebSocket. OpenClaw handles LLM routing, tool use, and agent sessions.
OpenClaw >= 2026.2.24: Requires Ed25519 device identity signing. OpenVoiceUI handles this automatically — a .device-identity.json file is generated on first run (never committed to git). The gateway auto-approves local loopback clients on first connect.
Without a configured gateway: The frontend will load but /api/conversation calls will fail. OpenClaw is the default — or drop in any gateway plugin as a replacement.
Version compatibility: OpenVoiceUI is tested against [email protected] and performs a compatibility check on startup. A protocol compatibility layer handles differences between versions automatically. See OpenClaw Requirements for details on supported versions and known breaking changes.
Configuration
Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| PORT | Yes | Server port (default: 5001) |
| DOMAIN | Yes | Your domain (used for callbacks) |
| SECRET_KEY | Recommended | Flask session secret — random per restart if unset |
| CLAWDBOT_GATEWAY_URL | Yes | OpenClaw WebSocket URL (default: ws://127.0.0.1:18791) |
| CLAWDBOT_AUTH_TOKEN | Yes | OpenClaw gateway auth token |
| GATEWAY_SESSION_KEY | No | Session key prefix (default: voice-main-1) |
| GROQ_API_KEY | No | Groq Orpheus TTS and Groq Whisper STT (console.groq.com) |
| FAL_KEY | No | Qwen3-TTS and voice cloning via fal.ai (fal.ai) |
| SUPERTONIC_API_URL | No | Override Supertonic TTS URL (Docker sets this automatically) |
| HUME_API_KEY | No | Hume EVI — emotion-aware voice (platform.hume.ai) |
| HUME_SECRET_KEY | No | Hume EVI secret key |
| CLERK_PUBLISHABLE_KEY | No | Clerk auth — enables login (clerk.com) |
| CANVAS_REQUIRE_AUTH | No | Set true to require auth for canvas endpoints |
| ALLOWED_USER_IDS | No | Comma-separated Clerk user IDs for access control |
| GEMINI_API_KEY | No | Vision/image analysis (aistudio.google.com) |
| SUNO_API_KEY | No | Suno AI music generation |
| SUNO_CALLBACK_URL | No | Auto-derived from DOMAIN if unset |
| SUNO_WEBHOOK_SECRET | No | Optional HMAC verification for Suno webhooks |
| BRAVE_API_KEY | No | Brave Search for agent web_search tool (brave.com/search/api) |
| CANVAS_PAGES_DIR | No | Override canvas pages path (VPS installs) |
| CODING_CLI | No | Coding agent in openclaw: codex, claude, opencode, pi, or none |
| RATELIMIT_DEFAULT | No | Custom rate limit (e.g. "200 per day;50 per hour") |
| HUGGINGFACE_API_KEY | No | HuggingFace image generation — FLUX.1, SD3.5 models (huggingface.co) |
| DEEPGRAM_API_KEY | No | Deepgram Nova-2 streaming STT (deepgram.com) |
| AGENT_API_KEY | No | Internal agent-to-Flask API authentication token |
See .env.example for full documentation and comments.
Personalizing Your Agent
Edit profiles/default.json to configure your agent:
{
"name": "My Assistant",
"system_prompt": "You are a helpful voice assistant...",
"llm": { "provider": "gateway", "model": "glm-4.7" },
"voice": { "tts_provider": "groq", "voice_id": "tara" },
"features": { "canvas": true, "music": true, "tools": true }
}
Edit prompts/voice-system-prompt.md to change the system prompt — changes are hot-reloaded with no restart.
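If you script profile management, a quick structural check before dropping a file into profiles/ can save a restart loop. A sketch validating just the keys shown in the example above — the full profile schema isn't published here, so extend REQUIRED_KEYS to match your setup:

```python
import json

REQUIRED_KEYS = {"name", "llm", "voice"}  # keys from the example profile above

def check_profile(text: str) -> list[str]:
    """Return a list of problems found in a profile JSON string (empty = OK)."""
    try:
        profile = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - profile.keys())]
    if "llm" in profile and "provider" not in profile["llm"]:
        problems.append("llm.provider not set")
    return problems
```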
API Reference
# Health
GET /health/live
GET /health/ready
# Voice (streaming NDJSON with heartbeats)
POST /api/conversation?stream=1
{"message": "Hello", "tts_provider": "groq", "voice": "tara"}
POST /api/conversation/abort # Cancel in-progress response
# Profiles
GET /api/profiles
POST /api/profiles/activate {"profile_id": "default"}
# Canvas
GET /api/canvas/manifest
GET /api/canvas/versions/<page_id> # List page version history
POST /api/canvas/versions/<page_id>/restore {"timestamp": "..."}
# Transcripts
GET /api/transcripts # List saved transcripts
GET /api/transcripts/<session_id> # Get transcript by session
# Upload
POST /api/upload # File upload (multipart)
# Session
POST /api/session/reset {"type": "hard"}
# TTS
GET /api/tts/providers
POST /api/tts/generate {"text": "Hello", "provider": "groq", "voice": "tara"}
# Voice Cloning (fal.ai Qwen3-TTS)
POST /api/tts/clone # Clone voice from audio sample
POST /api/tts/generate # Generate speech with cloned voice
{"text": "Hello", "provider": "qwen3", "voice_id": "clone-xxx"}
# Vision
POST /api/vision/analyze # Image/screenshot analysis
# Image Generation (HuggingFace)
POST /api/image-gen/generate # Generate image (FLUX.1, SD3.5)
{"prompt": "...", "model": "flux", "quality": "high", "aspect_ratio": "16:9"}
# AI Image Enhancement
POST /api/image-gen/enhance # Server-side image editing with aspect ratio
# Workspace
GET /api/workspace/files # List workspace files
GET /api/workspace/files/<path> # Read workspace file
POST /api/workspace/files/<path> # Write workspace file
# Settings (server-side persistence)
GET /api/settings # Get all persisted settings
POST /api/settings # Save settings to server
# Suno Music Generation
POST /api/suno/generate # Generate AI music
POST /api/suno/callback # Webhook callback endpoint
# Issue Reporter
POST /api/report-issue # Submit bug report with session context
# Icons
POST /api/icons/generate # Generate icons
# Onboarding
GET /api/onboarding/status # Onboarding flow status
POST /api/onboarding/complete # Mark onboarding step complete
Building an Adapter
To connect a new LLM or voice framework, use src/adapters/_template.js as a starting point. Built-in adapters include ClawdBot (OpenClaw), Hume EVI, ElevenLabs Classic, and ElevenLabs Hybrid. Adapters implement a simple interface:
export class MyAdapter {
async init(bridge, config) { ... }
async start() { ... }
async stop() { ... }
async destroy() { ... }
}
Register it in src/shell/adapter-registry.js and reference it in your profile JSON.
Gateway Plugins
The backend uses a plugin system for LLM gateways. Drop a folder into plugins/, restart — it's live.
plugins/
my-gateway/
plugin.json <- manifest (id, provides, requires_env)
gateway.py <- class Gateway(GatewayBase)
plugin.json:
{
"id": "my-gateway",
"provides": "gateway",
"gateway_class": "Gateway",
"requires_env": ["MY_API_KEY"]
}
gateway.py subclasses services.gateways.base.GatewayBase and implements stream_to_queue().
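A gateway only needs to push response chunks onto the queue the server hands it. The sketch below stubs GatewayBase so it runs standalone — the real base class in services/gateways/base.py defines the actual constructor and method signature, so treat this as shape, not contract:

```python
import queue

class GatewayBase:  # stand-in for services.gateways.base.GatewayBase
    pass

class Gateway(GatewayBase):
    """Minimal gateway sketch: streams a canned reply chunk by chunk."""

    def stream_to_queue(self, message: str, out: "queue.Queue[str | None]") -> None:
        for word in f"Echo: {message}".split():
            out.put(word + " ")  # each chunk becomes speakable text
        out.put(None)            # sentinel: stream finished

if __name__ == "__main__":
    q: "queue.Queue[str | None]" = queue.Queue()
    Gateway().stream_to_queue("hello world", q)
    print("".join(iter(q.get, None)))
```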
To route a profile to your gateway, add gateway_id to its adapter_config:
"adapter_config": { "gateway_id": "my-gateway", "sessionKey": "my-1" }Gateways can also call each other for inter-agent delegation:
from services.gateway_manager import gateway_manager
result = gateway_manager.ask("openclaw", "Summarise this: " + text, session_key)
Full guide: plugins/README.md
Skill Runner Service
The deploy/skill-runner/ directory contains a shared skill execution service. This is a lightweight Python server that can execute agent skills in an isolated environment, providing a common runtime for skill definitions that need server-side execution (file I/O, API calls, data processing).
Build and run alongside the main stack:
docker compose build skill-runner
docker compose up -d skill-runner
Hosting Multiple Users (Hetzner VPS)
OpenVoiceUI is designed so you can host a single VPS and serve multiple clients, each with their own voice agent instance.
Recommended workflow:
- Set up your base account — install OpenVoiceUI on a Hetzner VPS under a base Linux user. Configure all API keys in .env. Verify everything works.
- For each new client, create a new Linux user on the same VPS:
adduser clientname
cp -r /home/base/OpenVoiceUI /home/clientname/OpenVoiceUI
chown -R clientname:clientname /home/clientname/OpenVoiceUI
- Edit their .env with their API keys and a unique port:
PORT=15004 # different port per user
CLAWDBOT_AUTH_TOKEN=their-openclaw-token
GROQ_API_KEY=their-groq-key
- Run setup-sudo.sh for their domain — creates the systemd service, nginx vhost, and SSL cert automatically.
Each client gets their own domain, their own agent session, and their own canvas/music library.
Quick server requirements:
- Ubuntu 22.04+
- Nginx + Certbot (Let's Encrypt)
- Python 3.10+, venv per user
Development Notes
Issue Reporter (temporary): The in-app issue reporting button in the toolbar is a temporary development tool included during the active development phase to help capture bugs with session context. It will be removed or made optional before a stable release.
Tech Stack
| Layer | Technology |
|-------|------------|
| Backend | Python / Flask (blueprint architecture) |
| Frontend | Vanilla JS ES modules (no framework) |
| STT | Web Speech API / Deepgram Nova-2 / Groq Whisper / Whisper / Hume |
| TTS | Supertonic / Groq Orpheus / Qwen3 / Hume EVI |
| LLM | Any via gateway adapter |
| Image Gen | HuggingFace (FLUX.1, SD3.5) |
| Canvas | Fullscreen iframe + SSE manifest system |
| Music Gen | Suno API / fal.ai |
| Auth | Clerk (optional) |
| Installer | Pinokio / Docker Compose / VPS deploy script |
License
MIT
