@askalf/agent
v0.3.7
Open source computer-use agent — control your computer with natural language or voice
Your Claude subscription now controls your entire computer.
One npm install. Uses your existing Claude Pro/Max subscription — zero extra API costs. PowerShell-first. Voice control. Interactive sessions. Full computer control.
Install
npm i -g @askalf/agent
Requires Node.js 20+ and Claude CLI.
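If you want to confirm the Node.js requirement from a script before installing, a minimal sketch follows. The helper name `meetsNodeRequirement` is hypothetical and is not part of @askalf/agent; it only parses a version string such as the one Node exposes as `process.version`.

```typescript
// Hypothetical helper, not part of @askalf/agent.
// Returns true when a Node.js version string such as "v20.11.1" or "20.11.1"
// has a major version of at least `minMajor` (20 per the requirement above).
function meetsNodeRequirement(version: string, minMajor: number = 20): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) {
    // Unparseable input is treated as not meeting the requirement.
    return false;
  }
  return Number(match[1]) >= minMajor;
}
```

In a Node script this could be called as `meetsNodeRequirement(process.version)`.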
Quick Start
# 1. Install Claude CLI (if you don't have it)
npm i -g @anthropic-ai/claude-code
claude auth login
# 2. Authenticate
askalf-agent auth
# Select "Claude Login" (recommended)
# 3. Run
askalf-agent run "open notepad and type hello world"
# 4. Voice mode — talk to your computer
askalf-agent voice-setup # one-time: downloads whisper.cpp
askalf-agent run "open notepad" --voice
That's it. Claude opens Notepad, types "Hello World", then asks "What next?". Type or speak your next command.
How It Works
$ askalf-agent run "open chrome and go to amazon.com"
✔ AskAlf Agent — Computer Control
ℹ Using Claude subscription (no per-token costs)
ℹ Type "exit" or Ctrl+C to quit
ℹ → open chrome and go to amazon.com
✔ Chrome is open with Amazon loaded.
ℹ (6 turns)
❯ What next? open notepad and type hello world
✔ Notepad now has "Hello World" in it.
ℹ (14 turns)
❯ What next?
🎙 Listening... (press Enter to stop)
Heard: "minimize everything and open spotify"
✔ Desktop minimized and Spotify is now open.
ℹ (4 turns)
❯ What next? exit
ℹ Session ended.
PowerShell-first: Claude runs PowerShell commands directly to open apps, browse the web, manage files, and automate tasks. No slow screenshot loops. A screenshot MCP tool is available when Claude needs to visually verify what's on screen, but most tasks complete entirely through PowerShell.
Voice control: Add --voice to speak commands instead of typing. Uses local whisper.cpp for transcription, which is free, private, and completely offline. No cloud APIs, no data leaves your machine.
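The configuration file shown later in this README includes `silenceThresholdDb` and `silenceDurationMs` under `voice`. How the agent applies them is not documented here, so the following is only one plausible reading: stop recording once the trailing audio stays below the threshold for long enough. The function names, the frame-based input, and the assumption that samples are normalized to the range -1 to 1 are all illustrative.

```typescript
// Illustrative sketch only; not the agent's actual voice pipeline.

// RMS level of one audio frame in dBFS (0 dB = full scale).
// Assumes samples are normalized to the range [-1, 1].
function frameLevelDb(samples: number[]): number {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms === 0 ? -Infinity : (20 * Math.log(rms)) / Math.LN10;
}

// True once the most recent frames have all been quieter than `thresholdDb`
// for at least `silenceDurationMs`. `frameMs` is the duration of each frame.
function shouldStopRecording(
  frameLevelsDb: number[],
  frameMs: number,
  thresholdDb: number = -40,
  silenceDurationMs: number = 1500
): boolean {
  let quietMs = 0;
  // Walk backwards from the newest frame until a loud frame is found.
  for (const level of frameLevelsDb.slice().reverse()) {
    if (level >= thresholdDb) break;
    quietMs += frameMs;
  }
  return quietMs >= silenceDurationMs;
}
```

The defaults mirror the values in the sample config (-40 dB and 1500 ms).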
Authentication
Claude Login (Recommended)
Uses your existing Claude Pro/Max subscription. Zero extra API costs. This is the default.
npm i -g @anthropic-ai/claude-code
claude auth login
askalf-agent auth
# Select "Claude Login"
API Key (Fallback)
Paste your Anthropic API key. Pay per token. Uses the Anthropic SDK directly with the computer_20251124 tool.
askalf-agent auth
# Select "API Key" → paste your sk-ant-... key
Note: SDK mode uses computer-use API calls, which cost per token. A simple task like "open notepad" can cost several dollars. Claude Login mode is strongly recommended.
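To make the per-token cost concrete, here is a rough sketch of how a spending cap like the `--budget` option could be enforced in SDK mode. It is not the agent's implementation. The `Usage` and `Pricing` shapes and the function names are hypothetical, and any prices you pass in must come from Anthropic's current pricing page, not from this example.

```typescript
// Illustrative sketch only; types and names are assumptions.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface Pricing {
  inputUsdPerMTok: number;  // USD per million input tokens (caller-supplied)
  outputUsdPerMTok: number; // USD per million output tokens (caller-supplied)
}

// Cost of a single API turn in USD.
function turnCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = usage.inputTokens * pricing.inputUsdPerMTok;
  const outputCost = usage.outputTokens * pricing.outputUsdPerMTok;
  return (inputCost + outputCost) / 1e6;
}

// True once the accumulated cost of all turns reaches the budget.
function isOverBudget(turns: Usage[], pricing: Pricing, maxBudgetUsd: number): boolean {
  let spentUsd = 0;
  for (const turn of turns) spentUsd += turnCostUsd(turn, pricing);
  return spentUsd >= maxBudgetUsd;
}
```

A loop built on this would check `isOverBudget` after each turn and stop before issuing the next request.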
Commands
askalf-agent run "<prompt>"
Start an interactive computer control session.
askalf-agent run "resize all images in ./assets to 800px wide"
askalf-agent run "open VS Code and create a Flask hello world app"
askalf-agent run "go to github.com and star the SprayberryLabs/agent repo"
Each task completes and prompts "What next?" for follow-up commands. Type exit or hit Ctrl+C to end the session.
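The session behavior described above can be modeled as a small loop. This is a simplified, synchronous sketch under stated assumptions: `runSession` and `TaskRunner` are hypothetical names, follow-up input is passed in as an array instead of being read from the keyboard or microphone, and trimming whitespace and skipping blank lines are guesses about how input is handled.

```typescript
// Illustrative model of the "What next?" loop; not the agent's real code.
type TaskRunner = (prompt: string) => string;

// Runs the first prompt, then each follow-up until "exit" is entered.
// Returns the result of every task that was executed.
function runSession(firstPrompt: string, followUps: string[], runTask: TaskRunner): string[] {
  const results: string[] = [runTask(firstPrompt)];
  for (const line of followUps) {
    const command = line.trim();
    if (command === "exit") break;  // end the session
    if (command === "") continue;   // assumption: blank input re-prompts
    results.push(runTask(command));
  }
  return results;
}
```

For example, `runSession("open chrome", ["open notepad", "exit"], run)` would call `run` twice and then stop.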
Options:
-v, --voice: Use voice input (microphone → whisper transcription)
-m, --model <model>: Model to use (default: claude-sonnet-4-6)
-b, --budget <amount>: Max budget in USD for SDK mode (default: 5.00)
-t, --turns <count>: Max turns per task (default: 50)
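One way to think about how these options combine with saved configuration is a layered merge. The sketch below assumes command-line flags override the config file, which in turn overrides the documented defaults. That precedence, the `RunOptions` shape, and the validation are assumptions for illustration and are not confirmed by this README.

```typescript
// Illustrative sketch; precedence order and names are assumptions.
interface RunOptions {
  voice: boolean;
  model: string;
  budgetUsd: number;
  maxTurns: number;
}

// Defaults as listed in the options above.
const DEFAULT_RUN_OPTIONS: RunOptions = {
  voice: false,
  model: "claude-sonnet-4-6",
  budgetUsd: 5.0,
  maxTurns: 50,
};

// Later sources win: defaults < config file < command-line flags.
function resolveRunOptions(
  fileConfig: Partial<RunOptions>,
  cliFlags: Partial<RunOptions>
): RunOptions {
  const merged: RunOptions = { ...DEFAULT_RUN_OPTIONS, ...fileConfig, ...cliFlags };
  if (merged.maxTurns < 1 || Math.floor(merged.maxTurns) !== merged.maxTurns) {
    throw new Error("maxTurns must be a positive integer");
  }
  if (!(merged.budgetUsd > 0)) {
    throw new Error("budgetUsd must be greater than zero");
  }
  return merged;
}
```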
askalf-agent auth
Configure authentication interactively.
askalf-agent auth --status: Show current auth status
askalf-agent voice-setup
Download whisper.cpp binary and speech model for voice control. One-time setup.
askalf-agent voice-setup # default: base.en model (~148MB)
askalf-agent voice-setup --model tiny # smaller/faster (~75MB)
askalf-agent voice-setup --model small # more accurate (~466MB)
askalf-agent check
Verify platform dependencies are installed (including voice/whisper status).
askalf-agent config
View or update configuration.
askalf-agent config --model claude-opus-4-6 --turns 100
What It Can Do
| Capability | How |
|---|---|
| Open apps | Start-Process chrome, Start-Process notepad |
| Browse the web | Opens Chrome, navigates sites, fills forms |
| Manage files | Create, move, read, edit files anywhere on your system |
| Run commands | Git, npm, Docker, Python — any CLI tool |
| See your screen | Screenshot tool for visual verification when needed |
| Voice control | Speak commands via local whisper.cpp — offline, private |
| Chain tasks | Interactive loop — complete a task, ask "What next?" |
Platform Support
| OS | Status | Computer Control |
|----|--------|-----------------|
| Windows | Full support | PowerShell (pre-installed) |
| macOS | Full support | cliclick (brew install cliclick) |
| Linux (X11) | Full support | xdotool + scrot (apt install xdotool scrot) |
| Linux (Wayland) | Full support | ydotool + grim (apt install ydotool grim) |
Voice control requires SoX (Windows/macOS) or arecord (Linux, pre-installed). Whisper binary is downloaded automatically by voice-setup.
Run askalf-agent check to verify your setup.
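As a rough illustration of the table above, the mapping from platform to required tools could be expressed as follows. The platform strings are the values Node.js reports in `process.platform`. Using a session type such as the `XDG_SESSION_TYPE` environment variable to tell Wayland from X11 is an assumption, and the function names are hypothetical. This is not how `askalf-agent check` is known to work.

```typescript
// Illustrative sketch of the platform table; not the agent's real check logic.

// Tools needed for computer control on a given platform.
// `sessionType` is only consulted on Linux (for example "wayland" or "x11").
function requiredControlTools(platform: string, sessionType?: string): string[] {
  switch (platform) {
    case "win32":
      return ["powershell"];
    case "darwin":
      return ["cliclick"];
    case "linux":
      return sessionType === "wayland" ? ["ydotool", "grim"] : ["xdotool", "scrot"];
    default:
      return []; // platform not listed in the table
  }
}

// Audio recorder needed for voice control, per the note under the table.
function requiredVoiceRecorder(platform: string): string | undefined {
  if (platform === "win32" || platform === "darwin") return "sox";
  if (platform === "linux") return "arecord";
  return undefined;
}
```

In a Node script these could be called with `process.platform` and `process.env.XDG_SESSION_TYPE`.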
Architecture
askalf-agent run "open chrome" --voice
│
├── Input ─────────────────────────────
│   │
│   ├── --voice OFF: readline (keyboard)
│   └── --voice ON: mic → whisper.cpp → text
│
├── Claude Login (default)
│   │
│   ├── Spawns claude CLI
│   ├── --append-system-prompt (computer control agent)
│   ├── --mcp-config (screenshot tool)
│   ├── Claude uses built-in bash → PowerShell
│   └── Interactive loop: task → "What next?" → repeat
│
└── API Key (fallback)
    │
    ├── Anthropic SDK direct
    ├── computer_20251124 + bash + text_editor tools
    └── Single-run with cost summary
The MCP server exposes a single screenshot tool. All other computer control happens through Claude's built-in bash tool running PowerShell commands, which is dramatically faster than screenshot-based control loops.
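For the Claude Login path, the diagram shows the agent spawning the claude CLI with `--append-system-prompt` and `--mcp-config`. The sketch below assembles an argument list along those lines. Only those two flags come from this README. Adding `-p`, `--model`, and `--max-turns` is an assumption about how the options might be forwarded, and `ClaudeInvocation` and `buildClaudeArgs` are hypothetical names.

```typescript
// Illustrative sketch; the agent's real invocation may differ.
interface ClaudeInvocation {
  prompt: string;        // the user's task, e.g. "open chrome"
  systemPrompt: string;  // computer-control instructions appended to Claude's system prompt
  mcpConfigPath: string; // path to an MCP config that registers the screenshot tool
  model: string;
  maxTurns: number;
}

// Builds the argument array for a non-interactive `claude` run.
function buildClaudeArgs(inv: ClaudeInvocation): string[] {
  return [
    "-p", inv.prompt,                               // assumption: print (non-interactive) mode
    "--append-system-prompt", inv.systemPrompt,     // from the diagram above
    "--mcp-config", inv.mcpConfigPath,              // from the diagram above
    "--model", inv.model,                           // assumption
    "--max-turns", String(inv.maxTurns),            // assumption
  ];
}
```

The resulting array is suitable for passing to `spawn("claude", args)` from `node:child_process`. Passing arguments as an array avoids shell quoting problems with prompts that contain spaces or quotes.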
Configuration
Config stored at ~/.askalf/config.json:
{
  "authMode": "oauth",
  "model": "claude-sonnet-4-6",
  "maxBudgetUsd": 5.00,
  "maxTurns": 50,
  "voice": {
    "whisperModel": "base",
    "silenceThresholdDb": -40,
    "silenceDurationMs": 1500
  }
}
Full Platform
This CLI is a standalone agent for individual use. For multi-agent orchestration, scheduling, cost controls, 24 built-in tools, and team collaboration, check out the full AskAlf platform.
License
MIT
