swictation

v0.7.27

Published

3 months ago

Cross-platform voice-to-text dictation for Linux and macOS with GPU acceleration (NVIDIA CUDA/CoreML), Secretary Mode (60+ natural language commands), Context-Aware Meta-Learning, and pure Rust performance. Meta-package that automatically installs platfor

Swictation CLI

Command-line interface for managing the Swictation voice dictation daemon.

✨ Wayland Support - Fully Automated! ✨

NEW in v0.4.0: Swictation now fully supports Wayland-based Linux desktops with automated installation:

✅ GNOME Wayland (Ubuntu 25.10+) - Hotkeys auto-configured
✅ Sway - Text injection auto-setup
✅ GPU Acceleration - CUDA libraries auto-downloaded
✅ Service Management - Auto-enabled systemd service

One command installation:

npm install -g swictation --foreground-scripts
# Everything is configured automatically!

→ Complete Wayland Support Guide

Features

🎤 Real-time voice transcription using Parakeet-TDT-1.1B (NVIDIA)
📝 Secretary Mode - 60+ natural language commands for dictation
- Say "comma" → Get ","
- Say "mr smith said quote hello quote" → Get "Mr. Smith said 'Hello'"
- Automatic capitalization, punctuation, numbers, quotes, formatting
- → Full Secretary Mode Guide
🔄 Smart text transformation - MidStream Rust library (~5µs latency)
⚡ Low latency - Pure Rust implementation with CUDA acceleration
🖥️ Wayland & X11 Support - Full dual display server support
🎯 Hotkey support - Auto-configured on GNOME, manual setup on Sway
📊 Real-time metrics - WPM, latency, GPU/CPU usage
🦀 Pure Rust daemon - Zero Python runtime dependencies

System Requirements

Required

Operating System: Ubuntu 24.04 LTS or later (x64)
Node.js: >= 18.0.0
GLIBC: >= 2.39 (Ubuntu 24.04+)

libasound2: Audio library for ALSA

# Ubuntu 24.04+
sudo apt-get install libasound2t64

# Ubuntu 22.04 and older (package name only; these releases do not meet the GLIBC 2.39 requirement)
sudo apt-get install libasound2
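The GLIBC and Node.js minimums above can be checked before installing. The following is a minimal sketch, not part of Swictation itself: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Detect installed versions; an empty value means "could not detect".
glibc_version="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}' || true)"
node_version="$(node --version 2>/dev/null | sed 's/^v//' || true)"

if version_ge "${glibc_version:-0}" "2.39"; then
  echo "GLIBC ${glibc_version}: ok"
else
  echo "GLIBC ${glibc_version:-not detected}: 2.39 or newer required"
fi

if version_ge "${node_version:-0}" "18.0.0"; then
  echo "Node.js ${node_version}: ok"
else
  echo "Node.js ${node_version:-not detected}: 18.0.0 or newer required"
fi
```

On Ubuntu 22.04 the first check is expected to fail, since that release ships GLIBC 2.35.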

Optional (for GPU Acceleration)

NVIDIA GPU: Any CUDA-capable GPU
CUDA: 12.x or later
VRAM: 4GB minimum (6GB+ recommended for 1.1B model)
Driver: NVIDIA proprietary drivers (not nouveau)

Note: GPU acceleration libraries (~2.3GB CUDA + cuDNN) are automatically downloaded during installation. These include CUDA runtime and cuDNN libraries optimized for your GPU architecture.

If GPU is not available, Swictation will run in CPU-only mode (slower transcription).
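To see whether a machine meets the VRAM figures above, the total memory reported by `nvidia-smi` can be compared against them. This is a rough sketch: `vram_tier` is a hypothetical helper, and the 4096 / 6144 MiB cut-offs are an assumed reading of "4GB" and "6GB" (some drivers report slightly less than the marketed size, so treat borderline results with care).

```shell
#!/usr/bin/env bash
# vram_tier MIB: classify total VRAM (in MiB) against the documented figures.
vram_tier() {
  local mib="$1"
  if [ "$mib" -ge 6144 ]; then
    echo "recommended"
  elif [ "$mib" -ge 4096 ]; then
    echo "minimum"
  else
    echo "insufficient"
  fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU: total memory in MiB, without header or units.
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r mib; do
      echo "GPU with ${mib} MiB VRAM: $(vram_tier "$mib")"
    done || true
else
  echo "nvidia-smi not found: expect CPU-only mode"
fi
```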

Installation

One-Command Installation

Recommended: User-Local Installation (No sudo required)

If using nvm (Node Version Manager)

nvm already handles user-local installations automatically! No additional setup needed:

# Just install (nvm manages the prefix)
npm install -g swictation --foreground-scripts

The installation will automatically:

Download GPU acceleration libraries (~2.3GB, if NVIDIA GPU detected)
Install and configure ydotool (Wayland text injection)
Configure GNOME keyboard shortcuts (if on GNOME Wayland)
Set up and enable systemd service
Start the daemon service

Important: If you previously set a custom npm prefix, remove it:

npm config delete prefix

If NOT using nvm

Configure npm to install packages to your home directory:

# One-time setup: Configure npm to use user-local directory
mkdir -p ~/.npm-global
npm config set prefix ~/.npm-global

# Add to PATH (add this line to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.npm-global/bin:$PATH"

# Source your shell config to apply changes
source ~/.bashrc  # or source ~/.zshrc

# Now install without sudo
npm install -g swictation --foreground-scripts
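If you script the PATH step, it helps to make it idempotent so that repeated runs do not pile up duplicate lines in your shell config. A minimal sketch follows; `ensure_line` is a hypothetical helper name, and the demo writes to a temporary file rather than to `~/.bashrc`.

```shell
#!/usr/bin/env bash
# ensure_line FILE LINE: append LINE to FILE unless an identical line is already there.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file: calling it twice still leaves a single line.
demo_rc="$(mktemp)"
ensure_line "$demo_rc" 'export PATH="$HOME/.npm-global/bin:$PATH"'
ensure_line "$demo_rc" 'export PATH="$HOME/.npm-global/bin:$PATH"'
echo "lines written: $(grep -c . "$demo_rc")"
rm -f "$demo_rc"
```

To apply it for real, call `ensure_line "$HOME/.bashrc" 'export PATH="$HOME/.npm-global/bin:$PATH"'` (or use `~/.zshrc`).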

Alternative: System-Wide Installation (Requires sudo)

# Install with visible output
sudo npm install -g swictation --foreground-scripts

# Or standard install (output hidden by npm)
sudo npm install -g swictation

Why user-local installation is recommended:

✅ No sudo required
✅ No permission issues
✅ Each user has their own installation
✅ Cleaner and more secure
✅ Works with nvm seamlessly

Note: The --foreground-scripts flag makes the installation progress visible. The postinstall script will:

Download GPU acceleration libraries (~2.3GB CUDA + cuDNN, if NVIDIA GPU detected)
Install and configure ydotool (Wayland text injection)
Set up uinput kernel module and permissions
Configure GNOME keyboard shortcuts (if on GNOME Wayland)
Generate systemd service files with GPU library paths
Enable and start the daemon service
Test GPU model loading (if GPU detected)

Installation Options

You can customize the installation with environment variables:

# Default: Standard install with GPU model testing (if GPU detected)
npm install -g swictation --foreground-scripts

# Skip model testing for CI/headless environments (faster install)
SKIP_MODEL_TEST=1 npm install -g swictation --foreground-scripts

Installation behavior:

Default: Model test-loading runs automatically when GPU is detected (~30s to verify)
SKIP_MODEL_TEST=1: Skips model testing entirely (useful for CI/automated environments)

Note: Model testing is recommended to ensure your GPU setup works correctly. It verifies that the daemon can load models with your CUDA installation.
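The documented behavior boils down to a two-condition check. The sketch below only illustrates that decision; it is not the actual postinstall code. `should_test_model` is a hypothetical name, and using the presence of `nvidia-smi` as the GPU-detection signal is an assumption.

```shell
#!/usr/bin/env bash
# should_test_model GPU_DETECTED: succeeds only when a GPU was detected ("yes")
# and SKIP_MODEL_TEST is not set to 1. Hypothetical helper, for illustration.
should_test_model() {
  local gpu_detected="$1"
  [ "${SKIP_MODEL_TEST:-0}" != "1" ] && [ "$gpu_detected" = "yes" ]
}

# Assumption: treat an available nvidia-smi as "GPU detected".
if command -v nvidia-smi >/dev/null 2>&1; then gpu="yes"; else gpu="no"; fi

if should_test_model "$gpu"; then
  echo "model test would run"
else
  echo "model test would be skipped"
fi
```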

Quick Start

GNOME Wayland (Ubuntu 25.10+): Everything is auto-configured! Just start using it:

# Installation already configured everything, just start using:
swictation status         # Check daemon is running
# Press Super+Shift+D to toggle recording (already configured!)

Sway/Other Compositors: Add hotkey binding to your config (see Hotkey Configuration section), then:

swictation status         # Check daemon is running
# Use your configured hotkey to toggle recording

Manual Setup (if needed):

swictation setup          # Configure systemd service
swictation start          # Start the daemon
swictation toggle         # Toggle recording via command

Optional UI:

swictation start --ui     # Launch with visual interface

Commands

swictation start [--ui] - Start the daemon (and optionally the UI)
swictation stop - Stop the daemon
swictation status - Show service status
swictation toggle - Toggle recording on/off
swictation setup - Configure systemd and hotkeys
swictation help - Show help message

Display Server Support

Fully Supported (Automated Setup)

✅ GNOME Wayland (Ubuntu 25.10+) - Keyboard shortcuts auto-configured
✅ Sway - Text injection auto-setup, manual hotkey configuration
✅ X11 - Traditional X11 support with xdotool

System Requirements

OS: Linux x64
Distribution: Ubuntu 24.04 LTS or newer (GLIBC 2.39+)
- ✅ Ubuntu 24.04 LTS (Noble Numbat)
- ✅ Ubuntu 25.10+ (Questing Quokka) - Recommended for Wayland
- ✅ Debian 13+ (Trixie)
- ✅ Fedora 39+
- ❌ Ubuntu 22.04 LTS - NOT supported (GLIBC 2.35 too old)
Node.js: 18.0.0 or higher
Storage: ~12 GB total (AI models + GPU libraries)
GPU: NVIDIA with 4GB+ VRAM (CUDA 12.x) or CPU-only mode

GPU Requirements (1.1B Model - Recommended)

For optimal performance with the 1.1B model (62.8x realtime speed):

GPU: NVIDIA GPU with 4GB+ VRAM
CUDA: 11.8+ or 12.x
Compute Capability: 7.0+ (Volta architecture or newer)
ONNX Runtime: onnxruntime-gpu 1.16.0+

GPU Acceleration

GPU acceleration libraries (CUDA + cuDNN) are automatically downloaded during npm installation. No manual CUDA installation required!

Runtime Dependencies

Automatically Installed by npm postinstall:

ydotool (Wayland text injection) - Auto-installed on Wayland systems
netcat (IPC socket communication) - Auto-installed

Pre-installed on most systems:

GLIBC 2.39+ (Ubuntu 24.04+)
libasound2 (ALSA sound library)
systemd (service management)
PipeWire or PulseAudio (audio capture)

For X11 systems (optional):

sudo apt install xdotool  # Ubuntu/Debian
sudo pacman -S xdotool    # Arch Linux
sudo dnf install xdotool  # Fedora

Configuration

The configuration file is located at ~/.config/swictation/config.toml.

Hotkey Configuration

GNOME Wayland: ✅ Automatically configured during installation!

Default hotkey: Super+Shift+D
Visible in: Settings → Keyboard → View and Customize Shortcuts
No manual configuration needed
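For reference, a GNOME custom shortcut of this kind can also be created by hand with `gsettings`. The sketch below prints the commands instead of running them; it is an assumption about what an equivalent manual setup looks like, not a copy of the installer's script. The `swictation/` slot name in `KEY_PATH` is hypothetical.

```shell
#!/usr/bin/env bash
# Print (not run) the gsettings commands for a custom GNOME shortcut.
SCHEMA="org.gnome.settings-daemon.plugins.media-keys"
# Hypothetical slot name; GNOME itself usually uses custom0, custom1, ...
KEY_PATH="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/swictation/"

gnome_shortcut_commands() {
  local binding="${1:-<Super><Shift>d}"
  # NOTE: the first command replaces the whole custom-keybindings list;
  # merge by hand if you already have other custom shortcuts.
  printf '%s\n' \
    "gsettings set $SCHEMA custom-keybindings \"['$KEY_PATH']\"" \
    "gsettings set $SCHEMA.custom-keybinding:$KEY_PATH name 'Swictation Toggle'" \
    "gsettings set $SCHEMA.custom-keybinding:$KEY_PATH command 'swictation toggle'" \
    "gsettings set $SCHEMA.custom-keybinding:$KEY_PATH binding '$binding'"
}

gnome_shortcut_commands
```

Review the printed commands first, then pipe them to `sh` if you want to apply them. Pass a different accelerator, for example `gnome_shortcut_commands '<Control><Alt>r'`, to change the key combination.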

Sway - Add to ~/.config/sway/config:

# Swictation toggle
bindsym $mod+Shift+d exec swictation toggle

# Optional: Push-to-talk (hold to record)
bindsym $mod+Space exec swictation ptt-press
bindsym --release $mod+Space exec swictation ptt-release

Then reload: swaymsg reload

X11 (i3, etc.) - Add to config:

bindsym Mod4+Shift+d exec swictation toggle

→ See Complete Wayland Support Guide for detailed setup and troubleshooting.

Architecture

Runtime Components

Swictation consists of three main components:

Daemon (swictation-daemon) - Rust service handling audio capture and transcription
UI (swictation-ui) - Tauri application for metrics and control
CLI (swictation) - Node.js command-line interface

Package Architecture

Swictation uses a platform-specific package architecture similar to esbuild and swc:

swictation (main package)
├── @agidreams/linux-x64 (Linux x86_64 binaries)
└── @agidreams/darwin-arm64 (macOS Apple Silicon binaries)

How it works:

Automatic Platform Detection: When you run npm install -g swictation, npm automatically detects your platform (Linux x64 or macOS ARM64)
Platform Package Installation: npm installs the main swictation package PLUS the correct platform package for your system via optionalDependencies
Binary Resolution: The CLI wrapper automatically finds and uses the binaries from your platform package

Example installation flow on Linux:

npm install -g swictation
# npm automatically installs:
#   - swictation (main package with CLI)
#   - @agidreams/linux-x64 (platform binaries)
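To predict which platform package npm should pull in on a given machine, the mapping can be sketched in shell. This is an illustration only: the real CLI wrapper is Node.js and presumably keys off Node's own platform and architecture values, while this hypothetical `platform_package` helper takes `uname` output instead.

```shell
#!/usr/bin/env bash
# platform_package OS ARCH: map `uname -s` / `uname -m` values to the
# platform package name listed above. Fails for anything else.
platform_package() {
  case "$1-$2" in
    Linux-x86_64) echo "@agidreams/linux-x64" ;;
    Darwin-arm64) echo "@agidreams/darwin-arm64" ;;
    *)
      echo "unsupported platform: $1-$2" >&2
      return 1
      ;;
  esac
}

pkg="$(platform_package "$(uname -s)" "$(uname -m)")" &&
  echo "expected platform package: $pkg"
```

Comparing the result with `npm list -g <package>` shows whether the optional dependency was actually installed.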

What's in each package:

| Package | Contents |
|---------|----------|
| swictation | CLI wrapper, postinstall scripts, configuration |
| @agidreams/linux-x64 | Linux ELF binaries, ONNX Runtime base library |
| @agidreams/darwin-arm64 | macOS Mach-O ARM64 binaries, ONNX Runtime with CoreML |

Benefits:

✅ Smaller downloads - only your platform's binaries are installed
✅ Cleaner architecture - binaries grouped by platform
✅ Better CI/CD - each platform built independently
✅ Same simple install command - npm install -g swictation just works

Versioning Strategy

Swictation uses semantic versioning with a two-level approach:

Distribution Version (user-facing): 0.7.9
- The version you see in npm install swictation@0.7.9
- Synchronized across all three packages (main + both platform packages)
- Single version number for the entire distribution
Component Versions (internal):
- Daemon: 0.7.5 (Rust binary version)
- UI: 0.1.0 (Tauri application version)
- ONNX Runtime: 1.22.0 (macOS), 1.23.2 (Linux GPU)

Version Synchronization:

All package versions are automatically synchronized via npm-package/versions.json:

{
  "distribution": "0.7.9",
  "components": {
    "daemon": { "version": "0.7.5" },
    "ui": { "version": "0.1.0" }
  }
}
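When debugging a version mismatch, it can be handy to read the distribution version straight out of such a file. The sketch below uses a `sed` one-liner, which is enough for this flat layout; a real JSON parser such as `jq` is the safer choice for anything more complex. `distribution_version` is a hypothetical helper, and the demo runs against a temporary copy of the example rather than the real `npm-package/versions.json`.

```shell
#!/usr/bin/env bash
# distribution_version FILE: print the "distribution" field of a
# versions.json-style file (first match only).
distribution_version() {
  sed -n 's/.*"distribution"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Demo against a temporary copy of the example above.
demo_json="$(mktemp)"
cat > "$demo_json" <<'EOF'
{
  "distribution": "0.7.9",
  "components": {
    "daemon": { "version": "0.7.5" },
    "ui": { "version": "0.1.0" }
  }
}
EOF
echo "distribution version: $(distribution_version "$demo_json")"
rm -f "$demo_json"
```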

Why separate versions?

Components evolve at different rates (UI changes rarely, daemon changes often)
Users see ONE simple version number for the distribution
Developers track component versions internally for debugging
Platform packages use distribution version for npm publishing

Checking versions:

swictation --version
# Shows distribution version: 0.7.9

swictation-daemon --version
# Shows daemon component version: 0.7.5

Verification

After installation, verify everything is working:

# Run comprehensive verification script (path for a global install)
"$(npm root -g)/swictation/scripts/verify-installation.sh"

# Or check manually
swictation status        # Check daemon status
systemctl --user status swictation-daemon.service
journalctl --user -u swictation-daemon -n 50  # View logs

Expected on Wayland:

✓ Wayland detected
✓ GNOME/Sway detected (as applicable)
✓ GPU libraries installed (if NVIDIA GPU present)
✓ ydotool installed
✓ uinput module loaded
✓ User in 'input' group
✓ GNOME keyboard shortcut configured (if GNOME)
✓ Service enabled and running
✓ IPC socket responding
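If the verification script is unavailable, a few of these items can be spot-checked by hand. The following is a minimal sketch under stated assumptions: `check` is a hypothetical helper, and the individual probes (for example testing for `/dev/uinput` rather than a loaded module) are this example's own choices, not the logic of the bundled script.

```shell
#!/usr/bin/env bash
# check LABEL COMMAND...: run COMMAND quietly and print a pass/fail line.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "✓ $label"
  else
    echo "✗ $label"
  fi
}

check "Wayland session"        test "${XDG_SESSION_TYPE:-}" = "wayland"
check "ydotool installed"      command -v ydotool
check "uinput device present"  test -e /dev/uinput
check "User in 'input' group"  sh -c 'id -nG | grep -qw input'
check "Service enabled"        systemctl --user is-enabled swictation-daemon.service
check "Service running"        systemctl --user is-active swictation-daemon.service
```

A failing line only tells you where to look next; the troubleshooting steps below cover the likely fixes.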

Troubleshooting

Wayland-Specific Issues

Text not typing after transcription:

# 1. Verify ydotool permissions
sg input -c 'ydotool type "test"'

# 2. Check group membership (may need logout/login)
groups | grep input

# 3. Verify uinput permissions
ls -l /dev/uinput
# Should show: crw-rw---- 1 root input

# 4. Re-run setup if needed (path for a global install)
"$(npm root -g)/swictation/scripts/setup-ydotool.sh"

GNOME: Hotkey not working (types "D" instead):

# Verify shortcut configuration
gsettings get org.gnome.settings-daemon.plugins.media-keys custom-keybindings

# Re-configure if needed (path for a global install)
"$(npm root -g)/swictation/scripts/setup-gnome-shortcuts.sh"

# Check in Settings → Keyboard → View and Customize Shortcuts
# Look for "Swictation Toggle"

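Slow transcription (20-30 seconds instead of <1 second): This usually means the daemon is running in CPU mode instead of on the GPU. Always start the daemon via systemctl rather than launching the binary manually, because the systemd service sets the LD_LIBRARY_PATH needed to find the GPU libraries.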

# Restart via systemctl (has correct LD_LIBRARY_PATH)
systemctl --user restart swictation-daemon.service

# Check logs for GPU detection
journalctl --user -u swictation-daemon.service | grep CUDA
# Expected: "Successfully registered `CUDAExecutionProvider`"
# Expected: "cuDNN version: 91501"
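The log check can be wrapped in a small filter that reports which mode the daemon ended up in. This is a sketch: `gpu_mode_from_log` is a hypothetical helper, and it assumes the registration message quoted above is the only signal needed.

```shell
#!/usr/bin/env bash
# gpu_mode_from_log: read daemon log text on stdin and print "gpu" if the
# CUDA execution provider was registered, otherwise "cpu".
gpu_mode_from_log() {
  if grep -q 'Successfully registered.*CUDAExecutionProvider'; then
    echo "gpu"
  else
    echo "cpu"
  fi
}

# Demo with a sample line taken from the expected output above.
printf '%s\n' 'Successfully registered `CUDAExecutionProvider`' | gpu_mode_from_log

# Real usage:
#   journalctl --user -u swictation-daemon.service --no-pager | gpu_mode_from_log
```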

Platform Package Issues

"Platform package @agidreams/linux-x64 not found"

Cause: npm's optionalDependencies failed to install the platform package.

Solution:

# Force a reinstall (--force re-fetches packages even if a local copy exists)
npm install -g swictation --force

# Or manually install platform package
npm install -g @agidreams/linux-x64
npm install -g swictation

Verify installation:

# Check that platform package is installed
npm list -g @agidreams/linux-x64    # Linux
npm list -g @agidreams/darwin-arm64  # macOS

# Check binary locations
swictation --version
# Should show platform package location

"Unsupported platform: win32-x64"

Cause: Swictation currently supports Linux x64 and macOS ARM64 only.

Supported platforms:

✅ Linux x86_64 (Ubuntu 24.04+, Debian 13+, Fedora 39+)
✅ macOS Apple Silicon (ARM64)
❌ Windows (planned for future release)
❌ macOS Intel (planned for future release)
❌ Linux ARM64 (planned for future release)

"Platform package installed but binaries are missing"

Cause: Platform package build was incomplete or corrupted.

Solution:

# Reinstall platform package
npm uninstall -g @agidreams/linux-x64
npm install -g @agidreams/linux-x64 --force

# Verify binaries exist
ls -lh "$(npm root -g)/@agidreams/linux-x64/bin/"
# Should show: swictation-daemon, swictation-ui

"Wrong architecture - ELF 64-bit instead of Mach-O"

Cause: Wrong platform package installed (Linux package on macOS or vice versa).

Solution:

# Remove wrong platform package
npm uninstall -g @agidreams/linux-x64
npm uninstall -g @agidreams/darwin-arm64

# Clean install with correct platform detection
npm install -g swictation --force

General Installation Issues

"Cannot see postinstall output"

Solution: Use the --foreground-scripts flag:

npm install -g swictation --foreground-scripts  # prefix with sudo only for a system-wide install

"GPU libraries download failed (404)"

Cause: GitHub release for this version doesn't exist yet.

Solution: This only happens with unreleased versions. Published npm versions will work.

"libasound.so.2: cannot open shared object file"

Cause: Missing ALSA library.

Solution:

# Ubuntu 24.04+
sudo apt-get install libasound2t64

# Ubuntu 22.04 and older (package name only; these releases do not meet the GLIBC 2.39 requirement)
sudo apt-get install libasound2

"Model test-loading failed during installation"

Cause: GPU not detected or CUDA libraries missing.

Solution: This is non-fatal. Swictation will fall back to CPU mode. To verify GPU setup after installation:

# Check if CUDA is available
nvidia-smi

# Check CUDA libraries
ls /usr/local/cuda/lib64/

Service won't start

# Check status
swictation status

# View logs
journalctl --user -u swictation-daemon -f

No audio input

# List audio devices
arecord -l

# Test microphone
arecord -d 5 test.wav && aplay test.wav

Text not being typed

Wayland: See "Wayland-Specific Issues" section above
X11: Ensure xdotool is installed
Check logs: journalctl --user -u swictation-daemon -f

Audio device issues

Only 1 audio device detected (should see 4+): The systemd service should automatically import PulseAudio/PipeWire environment variables. Verify the service is running:

systemctl --user status swictation-daemon.service

Real-time transcription not working (only transcribes at end)

Cause: VAD threshold too low - background noise exceeds threshold.

Solution: Adjust in ~/.config/swictation/config.toml:

vad_threshold = 0.25      # Default optimized for real-time
vad_min_silence = 0.8     # Seconds of silence before flush

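A much lower threshold (for example 0.003) sits below typical background noise, so speech is detected continuously and text is only flushed at the end. 0.25 is the recommended value for balanced sensitivity.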
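The effect of the threshold can be illustrated with a toy calculation. The sketch below assumes the VAD assigns each audio frame a speech probability and counts a frame as speech when that value exceeds `vad_threshold`; that model, the `speech_frames` helper, and the sample numbers are all illustrative assumptions, not Swictation's actual implementation.

```shell
#!/usr/bin/env bash
# speech_frames THRESHOLD PROB...: count frames whose (hypothetical) speech
# probability is above THRESHOLD.
speech_frames() {
  local threshold="$1"
  shift
  printf '%s\n' "$@" | awk -v t="$threshold" '$1 > t { n++ } END { print n + 0 }'
}

# Made-up frames: background noise, three frames of speech, background noise.
frames="0.02 0.05 0.81 0.93 0.77 0.01"

# Word splitting of $frames is intentional here.
echo "threshold 0.25:  $(speech_frames 0.25 $frames) of 6 frames count as speech"
echo "threshold 0.003: $(speech_frames 0.003 $frames) of 6 frames count as speech"
```

With 0.25 only the three loud frames pass, so the silence around them can trigger a flush. With 0.003 every frame passes, which matches the "only transcribes at end" symptom.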

After fixing environment variables

Always reload systemd and restart the daemon:

systemctl --user daemon-reload
systemctl --user restart swictation-daemon

# Verify it started correctly
systemctl --user status swictation-daemon
journalctl --user -u swictation-daemon -n 50

Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| Linux Wayland (GNOME) | ✅ Fully Supported | Auto-configured hotkeys, ydotool setup |
| Linux Wayland (Sway) | ✅ Fully Supported | Auto ydotool setup, manual hotkeys |
| Linux X11 | ✅ Supported | Traditional X11 support |
| NVIDIA GPU Acceleration | ✅ Supported | Auto-downloaded CUDA + cuDNN |
| Linux AMD GPU | 🚧 Planned | ROCm support |
| macOS (Apple Silicon) | ✅ Supported | @agidreams/darwin-arm64, ONNX Runtime with CoreML |
| macOS (Intel) | 🚧 Planned | CoreML/Metal execution providers |
| Windows + NVIDIA | 🚧 Planned | CUDA support |
| Windows + AMD/Intel | 🚧 Planned | DirectML execution provider |

Development

Source code: https://github.com/robertelee78/swictation

Building from source

# Clone repository
git clone https://github.com/robertelee78/swictation
cd swictation

# Build Rust components
cd rust-crates
cargo build --release

# Build Tauri UI
cd ../tauri-ui
npm install
npm run build

License

MIT

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

Additional Resources

Wayland Support Guide - Complete Wayland setup and troubleshooting
Secretary Mode Guide - Natural language commands reference
Testing & Validation Guide - Display server testing procedures

Support

Issues: https://github.com/robertelee78/swictation/issues
Discussions: https://github.com/robertelee78/swictation/discussions
Source Code: https://github.com/robertelee78/swictation
