@askalf/agent

v0.3.7

Published

9 days ago

Open source computer-use agent — control your computer with natural language or voice

askalf

ai agent computer-use claude anthropic automation voice-control whisper speech-to-text

Your Claude subscription now controls your entire computer.

One npm install. Uses your existing Claude Pro/Max subscription — zero extra API costs. PowerShell-first. Voice control. Interactive sessions. Full computer control.

Install

npm i -g @askalf/agent

Requires Node.js 20+ and Claude CLI.

Quick Start

# 1. Install Claude CLI (if you don't have it)
npm i -g @anthropic-ai/claude-code
claude auth login

# 2. Authenticate
askalf-agent auth
# Select "Claude Login" (recommended)

# 3. Run
askalf-agent run "open notepad and type hello world"

# 4. Voice mode — talk to your computer
askalf-agent voice-setup          # one-time: downloads whisper.cpp
askalf-agent run "open notepad" --voice

That's it. Claude opens Notepad, types "Hello World", then asks "What next?" — type or speak your next command.

How It Works

$ askalf-agent run "open chrome and go to amazon.com"

✔ AskAlf Agent — Computer Control
ℹ Using Claude subscription (no per-token costs)
ℹ Type "exit" or Ctrl+C to quit

ℹ → open chrome and go to amazon.com

✔ Chrome is open with Amazon loaded.
ℹ (6 turns)

❯ What next? open notepad and type hello world

✔ Notepad now has "Hello World" in it.
ℹ (14 turns)

❯ What next?
🎙 Listening... (press Enter to stop)
Heard: "minimize everything and open spotify"

✔ Desktop minimized and Spotify is now open.
ℹ (4 turns)

❯ What next? exit
ℹ Session ended.

PowerShell-first — Claude runs PowerShell commands directly to open apps, browse the web, manage files, and automate tasks. No slow screenshot loops. A screenshot MCP tool is available when Claude needs to visually verify what's on screen, but most tasks complete entirely through PowerShell.

Voice control — Add --voice to speak commands instead of typing. Uses local whisper.cpp for transcription — free, private, completely offline. No cloud APIs, no data leaves your machine.

Authentication

Claude Login (Recommended)

Uses your existing Claude Pro/Max subscription. Zero extra API costs. This is the default.

npm i -g @anthropic-ai/claude-code
claude auth login
askalf-agent auth
# Select "Claude Login"

API Key (Fallback)

Paste your Anthropic API key. Pay per token. Uses the Anthropic SDK directly with the computer_20251124 tool.

askalf-agent auth
# Select "API Key" → paste your sk-ant-... key

Note: SDK mode makes computer-use API calls, which are billed per token. Even a simple task like "open notepad" can cost several dollars, so Claude Login mode is strongly recommended.

Commands

`askalf-agent run "<prompt>"`

Start an interactive computer control session.

askalf-agent run "resize all images in ./assets to 800px wide"
askalf-agent run "open VS Code and create a Flask hello world app"
askalf-agent run "go to github.com and star the SprayberryLabs/agent repo"

Each task completes and prompts "What next?" for follow-up commands. Type exit or hit Ctrl+C to end the session.

Options:

-v, --voice — Use voice input (microphone → whisper transcription)
-m, --model <model> — Model to use (default: claude-sonnet-4-6)
-b, --budget <amount> — Max budget in USD for SDK mode (default: 5.00)
-t, --turns <count> — Max turns per task (default: 50)
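The --turns and --budget limits can be pictured as a guard that is evaluated after every turn. The sketch below is a hypothetical illustration, not the package's code: the names `SessionLimits`, `SessionUsage` and `checkLimits` are invented here, `"api_key"` is an assumed label for SDK mode, and it assumes the budget is only enforced in SDK mode, as the option description suggests.

```typescript
// Hypothetical sketch: none of these names come from @askalf/agent.
type AuthMode = "oauth" | "api_key"; // "api_key" is an assumed label for SDK mode

interface SessionLimits {
  maxTurns: number;     // --turns (default: 50)
  maxBudgetUsd: number; // --budget (default: 5.00), SDK mode only
}

interface SessionUsage {
  turns: number;    // turns consumed by the current task
  spentUsd: number; // cost so far; stays 0 in subscription mode
}

type LimitCheck = { ok: true } | { ok: false; reason: string };

function checkLimits(mode: AuthMode, limits: SessionLimits, usage: SessionUsage): LimitCheck {
  if (usage.turns >= limits.maxTurns) {
    return { ok: false, reason: `turn limit reached (${limits.maxTurns})` };
  }
  // Assumption: the budget only matters when paying per token.
  if (mode === "api_key" && usage.spentUsd >= limits.maxBudgetUsd) {
    return { ok: false, reason: `budget reached ($${limits.maxBudgetUsd.toFixed(2)})` };
  }
  return { ok: true };
}

const defaultLimits: SessionLimits = { maxTurns: 50, maxBudgetUsd: 5.0 };
const underLimit = checkLimits("oauth", defaultLimits, { turns: 14, spentUsd: 0 });
const overBudget = checkLimits("api_key", defaultLimits, { turns: 3, spentUsd: 5.25 });
// underLimit.ok is true; overBudget.ok is false with reason "budget reached ($5.00)"
```

Returning a reason string rather than a bare boolean lets the caller tell the user which limit ended the task.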

`askalf-agent auth`

Configure authentication interactively.

askalf-agent auth --status — Show current auth status

`askalf-agent voice-setup`

Downloads the whisper.cpp binary and a speech model for voice control. One-time setup.

askalf-agent voice-setup                # default: base.en model (~148MB)
askalf-agent voice-setup --model tiny   # smaller/faster (~75MB)
askalf-agent voice-setup --model small  # more accurate (~466MB)

`askalf-agent check`

Verify platform dependencies are installed (including voice/whisper status).

`askalf-agent config`

View or update configuration.

askalf-agent config --model claude-opus-4-6 --turns 100
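Conceptually, a command like the one above overlays the given flags on the stored settings and leaves everything else unchanged. Here is a minimal sketch of that merge, assuming the field names shown in ~/.askalf/config.json; `AgentConfig`, `ConfigFlags` and `applyOverrides` are hypothetical names, not the package's API.

```typescript
// Hypothetical sketch; field names mirror ~/.askalf/config.json.
interface AgentConfig {
  authMode: string;
  model: string;
  maxBudgetUsd: number;
  maxTurns: number;
}

// The two flags used in the `askalf-agent config` example.
interface ConfigFlags {
  model?: string;
  turns?: number;
}

function applyOverrides(stored: AgentConfig, flags: ConfigFlags): AgentConfig {
  if (flags.turns !== undefined && (!Number.isInteger(flags.turns) || flags.turns < 1)) {
    throw new RangeError("turns must be a positive integer");
  }
  // Return a new object so the stored config is never mutated in place.
  return {
    ...stored,
    model: flags.model ?? stored.model,
    maxTurns: flags.turns ?? stored.maxTurns,
  };
}

const storedConfig: AgentConfig = {
  authMode: "oauth",
  model: "claude-sonnet-4-6",
  maxBudgetUsd: 5.0,
  maxTurns: 50,
};
const updatedConfig = applyOverrides(storedConfig, { model: "claude-opus-4-6", turns: 100 });
// updatedConfig.model is "claude-opus-4-6" and updatedConfig.maxTurns is 100;
// authMode and maxBudgetUsd carry over unchanged.
```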

What It Can Do

| Capability | How |
|---|---|
| Open apps | `Start-Process chrome`, `Start-Process notepad` |
| Browse the web | Opens Chrome, navigates sites, fills forms |
| Manage files | Create, move, read, edit files anywhere on your system |
| Run commands | Git, npm, Docker, Python, or any other CLI tool |
| See your screen | Screenshot tool for visual verification when needed |
| Voice control | Speak commands via local whisper.cpp (offline, private) |
| Chain tasks | Interactive loop: complete a task, then answer "What next?" |

Platform Support

| OS | Status | Computer Control |
|----|--------|-----------------|
| Windows | Full support | PowerShell (pre-installed) |
| macOS | Full support | cliclick (`brew install cliclick`) |
| Linux (X11) | Full support | xdotool + scrot (`apt install xdotool scrot`) |
| Linux (Wayland) | Full support | ydotool + grim (`apt install ydotool grim`) |

Voice control requires SoX (Windows/macOS) or arecord (Linux, usually pre-installed as part of alsa-utils). The whisper.cpp binary is downloaded automatically by askalf-agent voice-setup.

Run askalf-agent check to verify your setup.
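The platform table amounts to a lookup from platform to required tools. The sketch below encodes it as data so that a check can report what is missing. It only illustrates the table and is not how `askalf-agent check` is implemented; `Platform`, `requiredTools` and `missingTools` are names invented here.

```typescript
// Hypothetical sketch of the platform table as a lookup.
type Platform = "windows" | "macos" | "linux-x11" | "linux-wayland";

interface ToolRequirements {
  control: string[]; // tools used for computer control
  voice: string[];   // recorder needed only with --voice
}

const REQUIREMENTS: Record<Platform, ToolRequirements> = {
  "windows":       { control: ["powershell"],       voice: ["sox"] },
  "macos":         { control: ["cliclick"],         voice: ["sox"] },
  "linux-x11":     { control: ["xdotool", "scrot"], voice: ["arecord"] },
  "linux-wayland": { control: ["ydotool", "grim"],  voice: ["arecord"] },
};

function requiredTools(platform: Platform, withVoice: boolean): string[] {
  const req = REQUIREMENTS[platform];
  return withVoice ? [...req.control, ...req.voice] : [...req.control];
}

// `installed` is whatever the caller found on PATH; discovery is out of scope here.
function missingTools(platform: Platform, withVoice: boolean, installed: ReadonlySet<string>): string[] {
  return requiredTools(platform, withVoice).filter((tool) => !installed.has(tool));
}

const missingOnX11 = missingTools("linux-x11", true, new Set(["xdotool", "arecord"]));
// missingOnX11 is ["scrot"]
```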

Architecture

askalf-agent run "open chrome" --voice
        │
        ├── Input ─────────────────────────────
        │       │
        │       ├── --voice OFF: readline (keyboard)
        │       └── --voice ON:  mic → whisper.cpp → text
        │
        ├── Claude Login (default)
        │       │
        │       ├── Spawns claude CLI
        │       ├── --append-system-prompt (computer control agent)
        │       ├── --mcp-config (screenshot tool)
        │       ├── Claude uses built-in bash → PowerShell
        │       └── Interactive loop: task → "What next?" → repeat
        │
        └── API Key (fallback)
                │
                ├── Anthropic SDK direct
                ├── computer_20251124 + bash + text_editor tools
                └── Single-run with cost summary

The MCP server exposes a single screenshot tool. All other computer control happens through Claude's built-in bash tool running PowerShell commands — this is dramatically faster than screenshot-based control loops.
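To make the Claude Login branch of the diagram concrete, here is a sketch of how the argument list for the spawned `claude` process might be assembled. Only `--append-system-prompt` and `--mcp-config` are taken from the diagram. Passing the task with `-p` and the model with `--model`, the example prompt text and file path, and the names `ClaudeLaunchOptions` and `buildClaudeArgs` are assumptions made for illustration, not the package's actual code.

```typescript
// Hypothetical sketch: shows the shape of the call, not the real implementation.
interface ClaudeLaunchOptions {
  task: string;          // the user's request, e.g. "open chrome"
  model: string;         // e.g. "claude-sonnet-4-6"
  systemPrompt: string;  // computer-control instructions appended to Claude's own prompt
  mcpConfigPath: string; // JSON file that registers the screenshot MCP server
}

function buildClaudeArgs(opts: ClaudeLaunchOptions): string[] {
  if (opts.task.trim() === "") {
    throw new Error("task must not be empty");
  }
  return [
    "-p", opts.task,                             // assumption: non-interactive print mode
    "--model", opts.model,                       // assumption
    "--append-system-prompt", opts.systemPrompt, // from the diagram
    "--mcp-config", opts.mcpConfigPath,          // from the diagram
  ];
}

const exampleArgs = buildClaudeArgs({
  task: "open chrome",
  model: "claude-sonnet-4-6",
  systemPrompt: "You control this computer through PowerShell.", // illustrative text
  mcpConfigPath: "/tmp/askalf-mcp.json",                         // hypothetical path
});
```

Passing the arguments as an array to a process-spawning API such as Node's `child_process.spawn`, rather than concatenating one shell string, keeps a task containing quotes or spaces from being re-split or interpreted by a shell.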

Configuration

Config stored at ~/.askalf/config.json:

{
  "authMode": "oauth",
  "model": "claude-sonnet-4-6",
  "maxBudgetUsd": 5.00,
  "maxTurns": 50,
  "voice": {
    "whisperModel": "base",
    "silenceThresholdDb": -40,
    "silenceDurationMs": 1500
  }
}
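The two voice settings suggest a simple end-of-speech rule: stop recording once the input level stays below `silenceThresholdDb` for `silenceDurationMs`. How the package (or the recorder it invokes) actually applies them is not documented here, so the following is a generic sketch of that rule over 16-bit PCM frames. The names `rmsDbfs` and `detectEndOfSpeech`, the 100 ms frame length, and the 16 kHz sample rate are all assumptions.

```typescript
// Level of one frame of 16-bit PCM samples in dBFS (0 dB = full scale).
function rmsDbfs(frame: Int16Array): number {
  if (frame.length === 0) return -Infinity;
  const sumSquares = frame.reduce((acc, s) => acc + (s / 32768) * (s / 32768), 0);
  const rms = Math.sqrt(sumSquares / frame.length);
  return rms === 0 ? -Infinity : 20 * Math.log10(rms);
}

// Returns the index of the frame at which recording should stop, or -1 if
// the silence after speech never lasts long enough.
function detectEndOfSpeech(
  frames: Int16Array[],
  frameMs: number,
  silenceThresholdDb: number,
  silenceDurationMs: number,
): number {
  let quietMs = 0;
  let heardSpeech = false;
  for (let i = 0; i < frames.length; i++) {
    if (rmsDbfs(frames[i]) >= silenceThresholdDb) {
      heardSpeech = true;
      quietMs = 0; // any loud frame restarts the silence timer
    } else if (heardSpeech) {
      quietMs += frameMs;
      if (quietMs >= silenceDurationMs) return i;
    }
  }
  return -1;
}

// Synthetic input: 100 ms frames at an assumed 16 kHz (1600 samples each).
const loudFrame = new Int16Array(1600).fill(8000); // about -12 dBFS
const quietFrame = new Int16Array(1600).fill(100); // about -50 dBFS
const demoFrames = [
  quietFrame, quietFrame,      // silence before speaking is ignored
  loudFrame, loudFrame, loudFrame,
  ...Array.from({ length: 20 }, () => quietFrame),
];
const stopAt = detectEndOfSpeech(demoFrames, 100, -40, 1500);
// stopAt is 19: the 15th consecutive quiet frame after speech (15 x 100 ms = 1500 ms)
```

Ignoring silence until the first loud frame means a pause before the user starts speaking does not end the recording early.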

Full Platform

This CLI is a standalone agent for individual use. For multi-agent orchestration, scheduling, cost controls, 24 built-in tools, and team collaboration, check out the full AskAlf platform.

License

MIT
