regaos

v0.2.2

Published

a day ago

Local AI agent daemon — voice control, screen vision, any-provider LLM routing, with a permission gateway on every action


shahrryyar

ai agent voice llm on-device local-ai screen-capture permission-gateway openai litellm

Rega

A local AI agent that can hear you, see your screen, and act on your computer — with every action gated behind your explicit permission.

What is this?

Rega is an on-device agent daemon written in Node/TypeScript. You speak a wake-word, give a command, and the agent plans and executes work on your machine. A Next.js dashboard (connected over WebSocket) gives you a live window into what the agent is doing and lets you approve or reject actions before they run.

Nothing phones home to a vendor by default. Model calls route through a LiteLLM proxy you configure, so the models can be local (Ollama) or cloud — swapped without touching code. The agent never executes a side-effecting action without first passing it through the PermissionGateway.

Features

Voice control — Continuous wake-word detection; speech is transcribed by Whisper through an endpoint you configure, which can run on-device. The agent responds by voice.
Screen vision — Captures and interprets your screen; specialist agents can act on what they see.
Mouse & keyboard control — Agent can drive your UI when permitted.
Any-provider model routing — All LLM calls go through a configurable LiteLLM proxy in OpenAI-compatible format. Point it at Ollama, Anthropic, OpenAI, or any other backend without code changes.
Permission Gateway — Every side-effecting action (file write, shell exec, network call, mouse/keyboard, payment) is gated and logged. Three policy levels: paranoid, balanced, yolo.
Skills system — Composable agent capabilities loaded at runtime.
Commander / Specialist layer — A CEO agent receives your command, deliberates, selects the best specialist (vision, coding, CMO, etc.), and delegates. Specialists operate within the same permission boundary.
Integrations — Telegram, WhatsApp, social, and Composio toolkit (GitHub, Slack, and more) via an interceptor that also routes through the Gateway.
Docker sandbox — Shell execution runs in an isolated Alpine container when Docker is available.
Audit log — Every executed action is written to an append-only file.
Dry-run mode — Agent describes what it would do without executing.
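As an illustration of the Docker sandbox feature above, here is a minimal sketch of how a shell command might be wrapped in a throwaway Alpine container. The helper name `sandboxArgs`, the default image tag, and the choice of `--network none` are assumptions made for this example; the flags Rega actually passes are not documented in this README.

```typescript
// Hypothetical helper: builds the argv for `docker run`.
// The image tag and flag choices are assumptions of this sketch.
function sandboxArgs(command: string, image: string = "alpine"): string[] {
  return [
    "run",
    "--rm",               // remove the container once the command exits
    "--network", "none",  // no network access inside the sandbox
    image,
    "sh", "-c", command,  // the command is a single argv entry, so the host shell never parses it
  ];
}

// Example: the argv that would be handed to the docker binary.
const exampleArgs = sandboxArgs("ls -la /tmp");
console.log(["docker", ...exampleArgs].join(" "));
```

The array could then be passed to `spawn("docker", args)` from `node:child_process`. Building an argv array rather than a single command string keeps the user-supplied command out of the host shell.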

Safety

Rega controls your screen and can run shell commands. Trust is established structurally, not by convention.

PermissionGateway

Every action with side effects calls gateway.check(action) before execution. There is no code path that bypasses this. The gateway enforces one of three policy levels:

| Level | Behavior |
|---|---|
| paranoid | Prompts you to approve every action individually, even read operations with side effects. |
| balanced | Prompts for destructive or network-mutating actions; allows safe reads. |
| yolo | Auto-approves all actions. Use only in a sandboxed or development environment. |

Set PERMISSION_POLICY in your .env. Default is balanced.
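To make the three levels concrete, the following is a minimal sketch of how a policy decision could be computed. It is not Rega's implementation: the `Action` shape, the `decide` and `parsePolicy` functions, and the set of action kinds treated as safe reads are all hypothetical.

```typescript
// Hypothetical types; the real PermissionGateway API is not shown in this README.
type Policy = "paranoid" | "balanced" | "yolo";
type ActionKind =
  | "file_read" | "file_write" | "shell_exec"
  | "network_call" | "input_control" | "payment";

interface Action {
  kind: ActionKind;
  description: string;
}

type Decision = "allow" | "prompt";

// Kinds treated as safe reads under `balanced` (an assumption of this sketch).
const SAFE_READS: ReadonlySet<ActionKind> = new Set<ActionKind>(["file_read"]);

// Read PERMISSION_POLICY; unset falls back to `balanced`, as the README states.
// Rejecting unknown values is a choice made here, not documented behavior.
function parsePolicy(raw: string | undefined): Policy {
  if (raw === undefined || raw === "") return "balanced";
  if (raw === "paranoid" || raw === "balanced" || raw === "yolo") return raw;
  throw new Error(`Unknown PERMISSION_POLICY: ${raw}`);
}

// Decide whether an action runs directly or needs explicit user approval.
function decide(policy: Policy, action: Action): Decision {
  switch (policy) {
    case "yolo":
      return "allow";   // auto-approve everything
    case "paranoid":
      return "prompt";  // ask about every action
    case "balanced":
      return SAFE_READS.has(action.kind) ? "allow" : "prompt";
  }
}

const write: Action = { kind: "file_write", description: "save notes.txt" };
console.log(decide(parsePolicy(undefined), write));
```

Failing loudly on an unrecognized policy value avoids silently running with a looser policy than the user intended.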

Kill switch

A global hotkey and a mouse-to-corner gesture both abort all in-flight agent actions immediately. Both are configured in .env.
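One common way to build this kind of abort is a shared `AbortController` that every running action checks. The sketch below assumes that design; the `KillSwitch` class and `runSteps` function are hypothetical names, and the hotkey and mouse-gesture listeners that would call `trigger()` are not shown.

```typescript
// Hypothetical sketch: one shared abort signal observed by all in-flight work.
class KillSwitch {
  private controller = new AbortController();

  // Signal that running actions check before each step.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Would be called by the hotkey or mouse-corner handler.
  trigger(): void {
    this.controller.abort();
  }

  // Re-arm after an abort so later commands can run.
  reset(): void {
    this.controller = new AbortController();
  }
}

// Runs steps in order and stops as soon as the signal is aborted.
// Returns the number of steps that actually ran.
function runSteps(steps: Array<() => void>, signal: AbortSignal): number {
  let completed = 0;
  for (const step of steps) {
    if (signal.aborted) break;
    step();
    completed++;
  }
  return completed;
}

const killSwitch = new KillSwitch();
const ran = runSteps(
  [() => console.log("step 1"), () => killSwitch.trigger(), () => console.log("never runs")],
  killSwitch.signal,
);
console.log(`completed ${ran} steps`);
```

In this sketch cancellation is cooperative: a step that is already running finishes, and nothing after it starts. Long-running steps would need to pass the same signal down to whatever they call.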

Audit trail

Every executed action is appended to an audit log. Nothing secret (API keys, passwords) is written to the log.
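As a rough illustration of keeping secrets out of the log, the sketch below masks values whose key names look sensitive before an entry is serialized. The key pattern, the `[REDACTED]` marker, the JSON-lines layout, and the function names are assumptions of this example, not Rega's actual log format.

```typescript
// Key names whose values get masked (an assumed list for this sketch).
const SECRET_KEY_PATTERN = /(api[_-]?key|token|password|secret)/i;

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively replaces values stored under secret-looking keys.
function redact(value: Json, key: string = ""): Json {
  if (SECRET_KEY_PATTERN.test(key)) return "[REDACTED]";
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const k of Object.keys(value)) out[k] = redact(value[k], k);
    return out;
  }
  return value;
}

// Formats one audit entry as a single JSON line, ready to append to a file.
function auditLine(timestamp: string, action: string, details: Json): string {
  return JSON.stringify({ timestamp, action, details: redact(details) }) + "\n";
}

console.log(
  auditLine(new Date().toISOString(), "shell_exec", {
    command: "ls",
    env: { LITELLM_API_KEY: "sk-example" },
  }),
);
```

A line produced this way could be written with `fs.appendFileSync(path, line)`, which adds to the end of the file without rewriting earlier entries. Key-name matching only catches secrets stored under recognizable keys; a secret embedded in a command string would need separate handling.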

Install (npm) — recommended

Rega ships as a global CLI. Works on Windows, macOS, and Linux.

npm install -g regaos
rega

rega launches the agent daemon and the dashboard. On first run it creates a .env from defaults in the current directory and prompts for a wake phrase. (npx regaos runs it without a global install.)

Requires Node.js ≥ 20. The rega command itself works out of the box on all three OSes — npm generates native cmd/PowerShell shims on Windows.

System dependencies (for voice & vision)

Voice and vision features shell out to a few system binaries. rega checks for them on startup and prints install hints for any that are missing. Only ffmpeg is required; the rest are optional.

| Binary | Required | Purpose | Install |
|---|---|---|---|
| ffmpeg | yes | audio + screen capture | Debian/Ubuntu: sudo apt install ffmpeg · Arch: sudo pacman -S ffmpeg · macOS: brew install ffmpeg · Windows: winget install ffmpeg |
| espeak-ng | optional | local text-to-speech (Linux/Windows) | Debian/Ubuntu: sudo apt install espeak-ng · Arch: sudo pacman -S espeak-ng · macOS: brew install espeak-ng · Windows: choco install espeak-ng |
| say | optional | text-to-speech (macOS, built-in) | ships with macOS |
| powershell | optional | text-to-speech (Windows, built-in) | ships with Windows |
| Docker | optional | sandboxed shell execution | docker.com |
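A startup check of this kind can be approximated by searching each PATH directory for the binary. The sketch below shows one possible approach; `findOnPath`, `missingBinaries`, the `BinarySpec` shape, and the list of Windows extensions are hypothetical and do not describe how rega itself performs the check.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical description of one system dependency.
interface BinarySpec {
  name: string;
  required: boolean;
  hint: string; // install hint to print when the binary is missing
}

// Returns the first matching file on PATH, or null if none is found.
// `exists` is injectable so the lookup can be exercised without touching the disk.
function findOnPath(
  name: string,
  pathEnv: string,
  exists: (candidate: string) => boolean = existsSync,
): string | null {
  // On Windows, executables usually carry an extension (assumed list).
  const extensions = process.platform === "win32" ? ["", ".exe", ".cmd"] : [""];
  for (const dir of pathEnv.split(path.delimiter)) {
    if (dir === "") continue;
    for (const ext of extensions) {
      const candidate = path.join(dir, name + ext);
      if (exists(candidate)) return candidate;
    }
  }
  return null;
}

// Filters the dependency list down to the binaries that could not be located.
function missingBinaries(
  specs: BinarySpec[],
  pathEnv: string,
  exists: (candidate: string) => boolean = existsSync,
): BinarySpec[] {
  return specs.filter((spec) => findOnPath(spec.name, pathEnv, exists) === null);
}

const specs: BinarySpec[] = [
  { name: "ffmpeg", required: true, hint: "brew install ffmpeg" },
  { name: "espeak-ng", required: false, hint: "brew install espeak-ng" },
];
for (const spec of missingBinaries(specs, process.env.PATH ?? "")) {
  console.log(`${spec.required ? "Missing (required)" : "Missing (optional)"}: ${spec.name}. Try: ${spec.hint}`);
}
```

This only confirms that a file with the right name exists; it does not verify that the file is executable or that its version is suitable.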

Speech-to-text is not a system binary — it runs through an OpenAI-compatible endpoint set via WHISPER_BASE_URL / WHISPER_MODEL in .env (point it at a local or hosted Whisper service).
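For orientation, a request to an OpenAI-compatible transcription endpoint generally looks like the sketch below: a multipart POST to `/audio/transcriptions` carrying `file` and `model` fields, with a JSON response containing `text`. The function names, the file name `command.wav`, and the assumption that WHISPER_BASE_URL already includes any `/v1` prefix are illustrative and not taken from Rega's source.

```typescript
// Joins the configured base URL with the transcription path.
// Assumes the base URL already carries any version prefix such as /v1.
function transcriptionUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/audio/transcriptions";
}

// Hypothetical helper: sends recorded audio and returns the recognized text.
async function transcribe(
  audio: Blob,
  baseUrl: string,
  model: string,
  apiKey?: string,
): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "command.wav"); // file name is arbitrary here
  form.append("model", model);

  const headers: Record<string, string> = {};
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  const response = await fetch(transcriptionUrl(baseUrl), {
    method: "POST",
    headers,
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Transcription request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { text?: string };
  return data.text ?? "";
}

// The host and port below are placeholders.
console.log(transcriptionUrl("http://localhost:8000/v1/"));
```

`fetch`, `FormData`, and `Blob` are available as globals in the Node.js versions this README requires, so no extra HTTP client is needed for a sketch like this.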

Once the dependencies are installed, configure your model routing in .env (see Configuration) and run rega again.

Quickstart (from source)

For development or building from a clone.

Prerequisites

Node.js ≥ 20 — nodejs.org
pnpm ≥ 9 — npm install -g pnpm
Docker (optional) — needed only for sandboxed shell execution

1. Clone

git clone https://github.com/shahrryyar/RegaOS.git
cd RegaOS

2. Install dependencies

pnpm install

3. Configure

cp .env.example .env

Open .env and fill in at minimum:

LITELLM_BASE_URL — your LiteLLM proxy URL (e.g. http://localhost:4000 or https://api.openai.com/v1)
LITELLM_API_KEY — API key for your proxy or provider
PERMISSION_POLICY — paranoid, balanced, or yolo

See Configuration for all variables.

4. Run

pnpm start:os

This starts the agent daemon and the web dashboard together. If no .env is found it copies .env.example automatically.

Dashboard → http://localhost:7000
Agent WebSocket → ws://127.0.0.1:3001

5. Speak

Say the configured wake-word, then your command. Watch the dashboard for the deliberation, gateway prompts, and execution log.

Configuration

All configuration lives in .env. The committed .env.example documents every variable with descriptions. Nothing secret is logged.

Key variables:

# Model routing (OpenAI-compatible, via LiteLLM)
LITELLM_BASE_URL=http://localhost:4000   # proxy or provider base URL
LITELLM_API_KEY=                         # API key for the proxy/provider
LITELLM_MODEL=                           # default model for the fallback role

# Agent roles — define one var per role: ROLE_<NAME>_MODEL (NAME uppercased)
COMMANDER_ROLE=commander                 # which role is the commander
ROLE_COMMANDER_MODEL=claude-opus-4-8
ROLE_VISION_MODEL=gpt-4o
ROLE_VISION_BASE_URL=http://localhost:4000   # optional per-role override
ROLE_CODER_MODEL=claude-sonnet-4-6
# Optional per-role: ROLE_<NAME>_API_KEY, ROLE_<NAME>_DESC, ROLE_<NAME>_SYSTEM_PROMPT
# Or supply everything as one JSON blob in ROLES_CONFIG.

# Safety
PERMISSION_POLICY=balanced   # paranoid | balanced | yolo
KILL_HOTKEY=escape           # global abort hotkey
AUDIT_LOG=audit.log          # append-only action log path

# Voice
VOICE_PROVIDER=local         # local (Whisper) | realtime (cloud)
WAKE_PHRASE=                 # trigger phrase (prompted on first run)
WHISPER_BASE_URL=            # STT endpoint (local voice)
WHISPER_MODEL=

# Ports
DASHBOARD_PORT=7000          # web dashboard
WS_PORT=3001                 # agent WebSocket
WS_HOST=                     # bind host (defaults to local)

# Integrations (optional)
COMPOSIO_API_KEY=

See .env.example for the full list.
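The ROLE_<NAME>_MODEL convention above can be read as "one role per matching variable". The following sketch shows how such variables might be collected into role definitions. The `RoleConfig` shape and `parseRoles` function are hypothetical, and falling back to LITELLM_BASE_URL and LITELLM_API_KEY when a role has no override is an assumption based on the word "override" in the comments above.

```typescript
// Hypothetical shape of one resolved role.
interface RoleConfig {
  name: string;                 // lower-cased role name, e.g. "vision"
  model: string;
  baseUrl: string | undefined;  // per-role override, else the global proxy URL
  apiKey: string | undefined;   // per-role override, else the global key
}

// Scans an environment map for ROLE_<NAME>_MODEL entries.
function parseRoles(env: Record<string, string | undefined>): RoleConfig[] {
  const roles: RoleConfig[] = [];
  for (const key of Object.keys(env)) {
    const match = /^ROLE_([A-Z0-9_]+)_MODEL$/.exec(key);
    const model = env[key];
    if (!match || !model) continue; // skip unrelated or empty variables
    const upperName = match[1];
    roles.push({
      name: upperName.toLowerCase(),
      model,
      // `||` rather than `??` so that an empty value also falls back.
      baseUrl: env[`ROLE_${upperName}_BASE_URL`] || env["LITELLM_BASE_URL"],
      apiKey: env[`ROLE_${upperName}_API_KEY`] || env["LITELLM_API_KEY"],
    });
  }
  return roles;
}

for (const role of parseRoles(process.env)) {
  console.log(`${role.name} -> ${role.model} via ${role.baseUrl ?? "(no base URL set)"}`);
}
```

The ROLES_CONFIG JSON alternative is not handled here, since its schema is not described in this README.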

Project structure

A pnpm monorepo with three packages. agent is the brain (runs on your machine), web is the face (a dashboard you look at), and shared is the contract between them.

regaos/
├── package.json          # root: the `rega` CLI bin + publish scripts
├── publish.sh            # release helper (reads NPM_TOKEN from .env)
├── config/               # default role definitions
├── skills/               # bundled agent skills (each has a SKILL.md)
│   ├── calculator/
│   └── shell-agent/
├── scripts/
│   ├── rega              # the `rega` launcher (Node, cross-platform)
│   ├── build.sh          # builds all packages
│   └── postinstall.js    # "type rega to run" note after install
└── packages/
    ├── shared/           # types & contracts shared by agent + web (no logic)
    │   └── src/          # gateway, roles, voice, messages, integrations, ...
    ├── web/              # Next.js dashboard — WebSocket client, approval UI
    │   └── src/app/      # the dashboard page
    └── agent/            # the on-device daemon — everything below is its src/
        └── src/
            ├── index.ts        # entry point — wires everything together
            ├── commander/      # CEO agent + specialists (routes & delegates)
            ├── gateway/        # PermissionGateway, kill-switch, Composio interceptor
            ├── llm/            # LLM client, role config, trace streaming
            ├── voice/          # wake-word detection + STT (local / realtime)
            ├── screen/         # screen capture, vision, mouse/keyboard control
            ├── platform/       # per-OS audio + screen backends (mac/linux/win)
            ├── integrations/   # Telegram, WhatsApp, social, omnichannel bus
            ├── skills/         # skill discovery, scanning, execution
            ├── memory/         # persistent agent memory
            ├── sandbox/        # Docker-isolated shell execution + secret masking
            ├── init/           # first-run setup, OAuth, knowledge loading
            └── ws/             # WebSocket server (talks to the web dashboard)

The one rule that ties it together: every side-effecting action — file write, shell, network, mouse/keyboard, payment — passes through gateway/ before it runs. No exceptions. See ARCHITECTURE.md for the full request lifecycle.

Contributing

This project is built in the open, and every contribution genuinely means a lot — whether it's a typo fix, a bug report, a new skill, or a whole platform backend. If you take the time to improve Rega, I'll be deeply grateful for it. 🙏

Good first steps:

Found a bug or have an idea? Open an issue.
Want to write code? Read CONTRIBUTING.md first — it covers local setup, the development workflow, and the one hard rule that keeps Rega safe (every side-effecting action routes through the PermissionGateway).
Not sure where to start? Skills are the easiest entry point — each is a self-contained folder with a SKILL.md.

No contribution is too small. Thank you for being here.

License

MIT