midnight-ai v0.1.1
A powerful local and API-driven AI system assistant
🌙 Midnight
Local AI-powered voice assistant for OS automation through natural language.
Midnight combines speech recognition (STT), a small language model (1.5B parameters), and a secure terminal execution environment, all running locally on your machine.
Recommended Model: We recommend Qwen2.5-Coder-1.5B-Instruct (GGUF), a "Small Language Model" (SLM) optimized for code and terminal logic. At 1.5 billion parameters it strikes a practical balance between intelligence (understanding complex commands) and efficiency (runs on standard hardware with ~1.2 GB RAM usage).
Quick Start • Architecture • Safety • Contributing
Why Midnight?
- Privacy First: No cloud APIs required. Your voice and data stay on your machine.
- Secure by Design: Commands are validated against safety rules before execution.
- Developer Friendly: Easily extensible with custom tools and logic.
- Cross-Platform: Built to work on Windows, Linux, and macOS.
Architecture
Voice/Text Input
│
▼
┌─────────────┐
│ STT Engine │ faster-whisper (int8)
│ (optional) │
└──────┬──────┘
│ text
▼
┌─────────────┐
│ Router │ (classify: terminal, chat, search)
└──────┬──────┘
│
▼
┌─────────────┐
│ LLM Core │ Qwen 1.5B / Cloud API
└──────┬──────┘
│
├─────────────────────┐
▼ ▼
┌─────────────┐ ┌──────────────────┐
│ Chat / TTS │ │ Safety Layer │
│ Response │ │ (Guard + Rules) │
└──────────────┘ └────────┬─────────┘
│
▼
┌──────────────────┐
│ Executor │
│ subprocess │
└──────────────────┘
Project Structure
midnight/
├── core/ # Config, engine orchestrator, session management
├── safety/ # Command validation, execution, rule engine
├── stt/ # Speech-to-text (faster-whisper + audio capture + VAD)
├── tts/ # Text-to-speech (Piper ONNX + playback)
├── llm/ # Model loading, LoRA adapter manager, router, prompts
└── ui/ # Rich terminal interface
training/ # QLoRA fine-tuning scripts for Colab/Kaggle
tests/          # Unit tests
Quick Start
# Install core dependencies
pip install -r requirements.txt
# Or install with specific components
pip install -e ".[stt]" # + speech recognition
pip install -e ".[tts]" # + voice synthesis
pip install -e ".[llm]" # + language model
pip install -e ".[all]" # everything
# Run
python main.py
Safety
Commands are parsed with shlex and validated before execution:
| Level | Commands | Behavior |
|-------|----------|----------|
| 0 (Safe) | ls, pwd, whoami, echo | Instant execution |
| 1 (Normal) | mkdir, git, pip, cp | Execute with notice |
| 2 (Dangerous) | rm, sudo, mv, chmod | Requires confirmation |
| Blacklist | mkfs, dd, rm -rf / | Blocked |
Additional protections: pipe chain detection, redirect blocking, command substitution prevention.
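The tiers above can be sketched with nothing but the standard library. This is a minimal illustration of the idea, not Midnight's actual rule engine (the command sets and return values here are illustrative; the real logic lives in `midnight/safety/`):

```python
import shlex

# Illustrative tiers mirroring the table above (hypothetical rule sets)
SAFE = {"ls", "pwd", "whoami", "echo"}       # level 0: run immediately
NORMAL = {"mkdir", "git", "pip", "cp"}       # level 1: run, but notify the user
DANGEROUS = {"rm", "sudo", "mv", "chmod"}    # level 2: require confirmation
BLACKLIST = {"mkfs", "dd"}                   # never run

def classify(command: str) -> str:
    """Return a safety verdict for a single shell command string."""
    # Reject shell metacharacters up front: pipes, redirects, command
    # substitution and chaining would bypass per-command validation.
    if any(tok in command for tok in ("|", ">", "<", "$(", "`", ";", "&&")):
        return "blocked"
    try:
        tokens = shlex.split(command)
    except ValueError:           # unbalanced quotes, etc.
        return "blocked"
    if not tokens:
        return "blocked"
    prog = tokens[0]
    if prog in BLACKLIST:
        return "blocked"
    if prog == "rm" and "-rf" in tokens and "/" in tokens:
        return "blocked"         # the classic footgun
    if prog in DANGEROUS:
        return "confirm"
    if prog in NORMAL:
        return "notify"
    if prog in SAFE:
        return "safe"
    return "confirm"             # unknown commands default to confirmation
```

Defaulting unknown commands to "confirm" rather than "safe" keeps the system fail-closed.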
Models
- Model: Qwen2.5-Coder-1.5B-Instruct (GGUF, ~1.2 GB in 4-bit) or Cloud API (Gemini/OpenAI)
- Inference Engine: llama.cpp (CPU/GPU) or API Client
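Because both backends answer the same kind of request, the engine can treat them interchangeably. A stdlib-only sketch of that dual-backend dispatch, with hypothetical class names and a placeholder model path (the real interfaces in `midnight/llm/` may differ, and actual GGUF loading would go through llama.cpp bindings):

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Common interface for local and cloud inference (hypothetical)."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class LocalLlama(Backend):
    def __init__(self, model_path: str):
        # A real implementation would load the GGUF file via llama.cpp
        # bindings here; we only record the path to keep the sketch runnable.
        self.model_path = model_path

    def generate(self, prompt: str) -> str:
        return f"[local:{self.model_path}] {prompt}"

class CloudAPI(Backend):
    def __init__(self, provider: str):
        self.provider = provider  # e.g. "gemini" or "openai"

    def generate(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"

def make_backend(use_cloud: bool) -> Backend:
    """Pick the inference backend based on configuration."""
    if use_cloud:
        return CloudAPI("gemini")
    return LocalLlama("models/qwen2.5-coder-1.5b-instruct.gguf")
```

Keeping the interface this small means the router and safety layer never need to know which backend produced a response.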
Training
See training/README.md for fine-tuning instructions using QLoRA on Google Colab.
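QLoRA freezes the base model in 4-bit quantization and trains small low-rank adapters on top. A typical peft/bitsandbytes configuration might look like the following; the hyperparameter values are illustrative, not the project's actual training recipe (for that, see training/README.md):

```python
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="bfloat16",
)

# Low-rank adapters trained on top of the quantized weights
lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative)
    lora_alpha=32,             # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```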
Requirements
- Python 3.10+
- RAM: 8 GB minimum
- GPU: Optional (2+ GB VRAM recommended for STT)
- OS: Windows, Linux, macOS
License
MIT
