OSCTRL - Desktop Automation CLI
Lightweight CLI for desktop automation, designed for AI coding agents.
Table of Contents
- Overview
- Installation
- Quick Start
- Command Reference
- JSON Output Format
- System Requirements
- Known Limitations
- Troubleshooting
- AI Agent Integration
- Development
- Documentation
- License
Overview
OSCTRL provides programmatic control of desktop operations through simple shell commands with JSON output. It is designed to be context-efficient for AI agents: no SDK integration is required, just shell commands with minimal token overhead.
Built on @nut-tree-fork/nut-js, it enables mouse control, keyboard input, window management, screen capture, clipboard operations, and process management.
Platform Support
| Platform      | Status     | Notes                        |
| ------------- | ---------- | ---------------------------- |
| Windows 10/11 | ✅ Full    | Primary development platform |
| macOS         | ✅ Full    | Intel and Apple Silicon      |
| Linux         | 🔮 Planned | Future release               |
Installation
Prerequisites
- Operating System: Windows 10/11 or macOS (Intel/Apple Silicon)
- Node.js: >= 18.0.0 (LTS recommended)
- Permissions: Standard user (no admin/sudo required)
Install Dependencies
npm install
Global Installation (Optional)
npm link
After linking, you can use osctrl directly instead of node osctrl.mjs.
Quick Start
# Get desktop context (ALWAYS do this first for mouse operations)
node osctrl.mjs context
# Move mouse cursor
node osctrl.mjs mouse move 500 300
# Type text
node osctrl.mjs keyboard type "Hello World"
# Take screenshot
node osctrl.mjs screen capture --output screenshot.png
# Get active window
node osctrl.mjs window active
# Read clipboard
node osctrl.mjs clipboard read
# List processes
node osctrl.mjs process list
Command Reference
# Get complete desktop context (screen + mouse + active window)
osctrl context
# Get screen dimensions only
osctrl context-screen
# Get mouse position only
osctrl context-mouse
# Get active window info only
osctrl context-window
Example output for osctrl context:
{
"success": true,
"command": "context",
"data": {
"screen": { "width": 2560, "height": 1440 },
"mouse": { "x": 500, "y": 300 },
"activeWindow": {
"title": "Visual Studio Code",
"position": { "x": 100, "y": 50 },
"size": { "width": 1200, "height": 800 }
}
}
}
# Run health check
osctrl health
Output Example:
{
"success": true,
"command": "health",
"data": {
"platform": "darwin",
"platformName": "macOS",
"nodeVersion": "v20.10.0",
"osctrlVersion": "0.1.0-alpha",
"checks": {
"platform": true,
"mouse": true,
"screen": true,
"node": true
},
"status": "healthy"
}
}
# Move cursor to absolute position
osctrl mouse move <x> <y>
osctrl mouse move <x> <y> --from-resolution <width>x<height> # Scale coordinates
# Move and click in one command
osctrl mouse click-at <x> <y> [--button left|right|middle]
# Click at current position
osctrl mouse click [--button left|right|middle]
# Double-click
osctrl mouse doubleclick [--button left|right|middle]
# Scroll wheel (positive=down, negative=up)
osctrl mouse scroll <amount>
# Drag from current position to target
osctrl mouse drag <x> <y> [--button left|right|middle]
# Get current cursor position
osctrl mouse position
Examples:
osctrl mouse move 1920 1080
osctrl mouse move 960 540 --from-resolution 1920x1080 # Scale from different resolution
osctrl mouse click-at 500 300 # Move and click
osctrl mouse click --button right
osctrl mouse scroll 5
osctrl mouse drag 100 200
osctrl mouse position
# Type text string
osctrl keyboard type <text>
# Press and release single key
osctrl keyboard press <key>
# Execute key combination (hotkey)
osctrl keyboard hotkey <keys...>
# Press a key multiple times
osctrl keyboard repeat <key> <count> [--delay <ms>]
# Hold key down
osctrl keyboard hold <key>
# Release held key
osctrl keyboard release <key>
# Emergency reset - release all modifier keys
osctrl keyboard releaseall
Examples:
osctrl keyboard type "Hello World"
osctrl keyboard press enter
osctrl keyboard hotkey ctrl c # Copy
osctrl keyboard hotkey ctrl v # Paste
osctrl keyboard hotkey win r # Run dialog (Windows)
osctrl keyboard hotkey ctrl shift esc # Task Manager
osctrl keyboard repeat down 10 # Press down arrow 10 times
osctrl keyboard repeat tab 5 --delay 100 # Tab 5 times with 100ms delay
osctrl keyboard releaseall # Fix stuck keys
Supported Key Names:
- Modifiers: ctrl, alt, shift, win (aliases: control, super, cmd, meta)
- Function: f1 through f24
- Navigation: up, down, left, right, home, end, pageup, pagedown
- Editing: enter, tab, space, backspace, delete, insert, esc
- Letters: a through z (case-insensitive)
- Numbers: 0 through 9
- See lib/keys.mjs for complete key mappings
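As a rough illustration of the key groups listed above, the following sketch checks key names before they are passed to osctrl keyboard hotkey. The isSupportedKey and buildHotkeyArgs helpers are hypothetical and built only from the list in this README; lib/keys.mjs remains the authoritative mapping and may accept names not covered here. Lowercasing every key name is also an assumption, since case-insensitivity is documented only for letters.

```javascript
// Hypothetical helpers: key names are copied from the README list,
// not from lib/keys.mjs, so treat this as an approximation.
const NAMED_KEYS = new Set([
  "ctrl", "alt", "shift", "win", "control", "super", "cmd", "meta",
  "up", "down", "left", "right", "home", "end", "pageup", "pagedown",
  "enter", "tab", "space", "backspace", "delete", "insert", "esc",
]);

function isSupportedKey(name) {
  const key = String(name).toLowerCase();
  if (NAMED_KEYS.has(key)) return true;
  if (/^[a-z0-9]$/.test(key)) return true; // single letter or digit
  const fn = /^f([1-9]\d?)$/.exec(key); // f1 through f24
  return fn !== null && Number(fn[1]) <= 24;
}

// Builds the argument list for `osctrl keyboard hotkey <keys...>`,
// failing early instead of waiting for the CLI to report INVALID_KEY.
function buildHotkeyArgs(keys) {
  const bad = keys.filter((k) => !isSupportedKey(k));
  if (bad.length > 0) {
    throw new Error(`Unsupported key name(s): ${bad.join(", ")}`);
  }
  return ["keyboard", "hotkey", ...keys.map((k) => String(k).toLowerCase())];
}
```

The returned array can be passed as arguments to the CLI, for example through child_process.execFileSync, which avoids shell quoting issues.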
# Get active window information
osctrl window active
# List all windows with titles
osctrl window list
# Focus window by title
osctrl window focus <title> [--exact]
# Wait for window to appear
osctrl window wait <title> [--timeout <ms>] [--exact]
# Get window position and size
osctrl window bounds <title> [--exact]
# Check if window exists (error if not found)
osctrl window exists <title> [--exact]
# Get window state (minimized, maximized, normal)
osctrl window state <title> [--exact]
# Move window to position
osctrl window move <title> <x> <y> [--exact]
# Resize window
osctrl window resize <title> <width> <height> [--exact]
# Minimize window
osctrl window minimize <title> [--exact]
# Maximize window
osctrl window maximize <title> [--exact]
Window Matching:
- Default: Partial match (case-insensitive) - "Notepad" matches "Untitled - Notepad"
- With --exact: Exact match (case-sensitive)
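The two matching modes can be mirrored on the client side, for example to predict which titles a command would hit. This is a minimal sketch that assumes partial matching is a case-insensitive substring test; matchesTitle is a hypothetical helper, not part of OSCTRL, and the real implementation may differ in details.

```javascript
// Hypothetical helper mirroring the documented matching rules.
function matchesTitle(windowTitle, query, exact = false) {
  if (exact) return windowTitle === query; // case-sensitive, whole title
  return windowTitle.toLowerCase().includes(query.toLowerCase());
}

const titles = ["Untitled - Notepad", "Visual Studio Code"];

// Partial match: "notepad" is found inside "Untitled - Notepad".
const partial = titles.filter((t) => matchesTitle(t, "notepad"));

// Exact match: no title is exactly "Notepad", so nothing matches.
const exact = titles.filter((t) => matchesTitle(t, "Notepad", true));

console.log(partial, exact);
```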
Examples:
osctrl window active
osctrl window list
osctrl window focus "Notepad"
osctrl window wait "Save As" --timeout 5000 # Wait up to 5 seconds
osctrl window bounds "Chrome"
osctrl window exists "Calculator"
osctrl window state "Visual Studio Code"
osctrl window resize "Chrome" 1024 768
osctrl window maximize "Visual Studio Code" --exact
# Capture screenshot to file
osctrl screen capture [--output path] [--format png|jpg]
# Capture as base64 (for embedding)
osctrl screen capture --base64 [--format png|jpg]
# Capture specific region (x,y,width,height)
osctrl screen capture --region <x,y,w,h> [--output path]
# Get screen dimensions
osctrl screen size
# List all monitors
osctrl screen monitors
# Get pixel color at position (returns hex #RRGGBB)
osctrl screen pixel <x> <y>
# Wait until pixel matches color
osctrl screen wait-color <x> <y> <hex> [--timeout <ms>] [--tolerance <n>]
# Assert pixel matches color (fails if not)
osctrl screen assert-color <x> <y> <hex> [--tolerance <n>]
# Find first pixel matching color in region
osctrl screen find-color <hex> [--region <x,y,w,h>] [--tolerance <n>]
# Wait for screen content to change
osctrl screen wait-change [--timeout <ms>] [--region <x,y,w,h>]
# Compare current screen to baseline
osctrl screen diff [--baseline <path>] [--threshold <n>]
Examples:
osctrl screen capture --output screenshot.png
osctrl screen capture --base64 --format jpg
osctrl screen capture --region 100,100,800,600 --output region.png
osctrl screen size
osctrl screen monitors
osctrl screen pixel 500 300
osctrl screen wait-color 100 100 "#FF0000" --timeout 5000
osctrl screen assert-color 100 100 "#00FF00" --tolerance 10
osctrl screen find-color "#0000FF" --region 0,0,500,500
osctrl screen wait-change --timeout 3000
# Read clipboard text
osctrl clipboard read
# Write text to clipboard
osctrl clipboard write <text>
Examples:
osctrl clipboard read
osctrl clipboard write "Hello from CLI"
# Launch application (detached process)
osctrl process launch <command> [args...]
# List running processes
osctrl process list [--filter name]
# Kill process by PID
osctrl process kill <pid> [--force]
Examples:
osctrl process launch notepad.exe
osctrl process launch cmd.exe /k echo "Hello"
osctrl process list
osctrl process list --filter chrome
osctrl process kill 12345
osctrl process kill 12345 --force
# Pause execution for specified milliseconds
osctrl wait <ms>
# Execute multiple commands from JSON file
osctrl batch run <file> [--timeout <ms>]
Examples:
osctrl wait 1000 # Wait 1 second
osctrl wait 500 # Wait 500ms
osctrl batch run commands.json # Run commands from file
osctrl batch run commands.json --timeout 30000 # With 30s timeout
Batch file format:
{
"commands": [
["mouse", "move", "500", "300"],
["wait", "100"],
["mouse", "click"],
["keyboard", "type", "Hello"]
]
}
JSON Output Format
All commands return JSON for easy parsing by AI agents and scripts.
Success Response
{
"success": true,
"timestamp": "2026-01-13T12:00:00.000Z",
"command": "mouse move",
"data": {
"x": 500,
"y": 300
}
}
Error Response
{
"success": false,
"timestamp": "2026-01-13T12:00:00.000Z",
"command": "window focus",
"error": "Window not found: Notepad",
"errorCode": "WINDOW_NOT_FOUND"
}
Error Codes
- WINDOW_NOT_FOUND - Specified window title not found
- INVALID_COORDINATES - Coordinates out of range or invalid format
- INVALID_KEY - Unrecognized key name
- FILE_WRITE_ERROR - Cannot write to specified output path
- TIMEOUT - Operation exceeded time limit
- PROCESS_LAUNCH_FAILED - Failed to launch process
- CLIPBOARD_ERROR - Clipboard operation failed
- UNKNOWN_ERROR - Unexpected error occurred
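One possible way to consume these responses from a script is to turn failures into exceptions that carry the error code. The sketch below relies only on the success, data, error, errorCode, and command fields shown above; the OsctrlError class, the unwrapResponse and isRetryable helpers, and the choice of which codes count as retryable are all assumptions made for illustration.

```javascript
// Hypothetical error type carrying the errorCode field from the response.
class OsctrlError extends Error {
  constructor(response) {
    super(response.error || "Unknown osctrl error");
    this.name = "OsctrlError";
    this.code = response.errorCode || "UNKNOWN_ERROR";
    this.command = response.command;
  }
}

// Parses one JSON response string and returns its data payload,
// or throws OsctrlError when the success field is false.
function unwrapResponse(stdout) {
  const response = JSON.parse(stdout);
  if (!response.success) throw new OsctrlError(response);
  return response.data;
}

// Example policy (an assumption, not OSCTRL behavior): a window may still
// be opening and a timeout may be transient, so these two are retried.
const RETRYABLE = new Set(["TIMEOUT", "WINDOW_NOT_FOUND"]);

function isRetryable(err) {
  return err instanceof OsctrlError && RETRYABLE.has(err.code);
}
```

Branching on the code rather than on the human-readable error string keeps callers stable if the message wording changes.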
System Requirements
Supported Platforms
- Windows 10 (64-bit)
- Windows 11 (64-bit)
- macOS (Intel and Apple Silicon)
Node.js Versions
Tested with Node.js 18.x and 20.x LTS releases.
Known Limitations
Multi-Monitor Support
Status: Partial support
- osctrl screen monitors lists available displays
- Screen commands operate on primary monitor by default
- Region capture with absolute coordinates may work across monitors
Window Operations
- Minimize/Maximize: Uses platform-specific keyboard shortcuts
  - Windows: Win+Down / Win+Up
  - macOS: Cmd+M / Cmd+Ctrl+F (fullscreen)
- Window List: Accesses internal provider registry to get window titles
Platform-Specific Implementations
- Process listing: PowerShell (Windows) / ps aux (macOS)
- Window shortcuts: Platform-aware keyboard combinations
Troubleshooting
Installation Issues
Problem: npm install fails with nut.js errors
Solutions:
- Verify Node.js version: node --version (must be >= 18.0.0)
- Try npm install --force
- Check for Visual C++ Redistributables
- See nut.js fork documentation
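The Node.js version check can also be scripted, for instance as a preflight step before npm install. This is a small sketch; meetsNodeRequirement is a hypothetical helper, and because the stated minimum is 18.0.0 it compares only the major version.

```javascript
// Returns true when a Node.js version string such as "v20.10.0"
// satisfies the >= 18.0.0 prerequisite (major version check only).
function meetsNodeRequirement(version, minMajor = 18) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false; // not a recognizable version string
  return Number(match[1]) >= minMajor;
}

// process.version is the running interpreter's version, e.g. "v20.10.0".
if (!meetsNodeRequirement(process.version)) {
  console.error(`Node.js ${process.version} is too old; osctrl needs >= 18.0.0`);
}
```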
Runtime Issues
Problem: "Command not found" errors
Solution: Run with node osctrl.mjs prefix or use npm link for global installation
Problem: Mouse/keyboard commands not working
Solution: Ensure no other automation tools are interfering. Close conflicting applications.
Problem: Window commands can't find windows
Solution:
- Use osctrl window list to see exact window titles
- Try partial match without --exact flag
- Some system windows may be inaccessible
AI Agent Integration
Why a CLI?
AI coding agents operate under context constraints. OSCTRL is designed for efficiency:
- ~15 tokens per command (osctrl mouse click)
- ~50 tokens per response (compact JSON)
- No schema overhead - uses existing Bash/shell tools
- Stateless - no connection management
AI-Specific Instructions
Ready-to-use instruction files for different AI assistants:
| AI Assistant | Instructions File                   |
| ------------ | ----------------------------------- |
| Claude Code  | docs/ai-instructions/CLAUDE.md      |
| Cursor       | docs/ai-instructions/cursorrules.md |
| OpenAI Codex | docs/ai-instructions/codex.md       |
| Gemini CLI   | docs/ai-instructions/gemini.md      |
Copy the relevant instructions into your AI assistant's configuration file.
Usage from AI Coding Agents
// Execute command via shell
const { execSync } = require("child_process");
const result = execSync("node osctrl.mjs mouse move 500 300", {
encoding: "utf8",
});
const response = JSON.parse(result);
if (response.success) {
console.log("Mouse moved to:", response.data);
} else {
console.error("Error:", response.error);
}
Recommended Practices
- Always get context first - Run osctrl context before mouse operations
- Parse JSON output - All responses are valid JSON
- Check success field - Don't assume operations succeeded
- Handle error codes - Use errorCode for programmatic error handling
- Validate coordinates - Ensure they're within screen bounds
- Use exact match - Add --exact for window commands when title is known
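Several of these practices can be combined in one small wrapper. The sketch below is illustrative only: runOsctrl and safeClickAt are hypothetical names, the script is assumed to be invoked as node osctrl.mjs from the working directory, and the bounds check assumes a single screen whose origin is (0, 0), which may not hold on multi-monitor setups. Note that execFileSync throws if the process exits with a non-zero status; whether osctrl does so on failure is not stated here, so callers may want a try/catch around it.

```javascript
const { execFileSync } = require("child_process");

// Default runner: invokes the CLI and returns the parsed JSON response.
function runOsctrl(args) {
  const stdout = execFileSync("node", ["osctrl.mjs", ...args], {
    encoding: "utf8",
  });
  return JSON.parse(stdout);
}

// Gets context first, validates the target against the reported screen
// size, then clicks. `run` is injectable so the logic can be exercised
// with a fake runner instead of the real desktop.
function safeClickAt(x, y, run = runOsctrl) {
  const context = run(["context"]);
  if (!context.success) {
    throw new Error(`context failed: ${context.errorCode}`);
  }
  const { width, height } = context.data.screen;
  const inBounds =
    Number.isInteger(x) && Number.isInteger(y) &&
    x >= 0 && y >= 0 && x < width && y < height;
  if (!inBounds) {
    throw new RangeError(`(${x}, ${y}) is outside the ${width}x${height} screen`);
  }
  const result = run(["mouse", "click-at", String(x), String(y)]);
  if (!result.success) {
    throw new Error(`click-at failed: ${result.errorCode}`);
  }
  return result.data;
}
```

Passing arguments as an array to execFileSync, rather than building a command string, sidesteps shell quoting problems with window titles or typed text.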
Development
Running Tests
npm test
Code Formatting
npm run format
Documentation
- README.md - Main documentation and command reference
- CHANGELOG.md - Version history and changes
- docs/skills/SKILL.md - Claude Code skill definition
- docs/ai-instructions/ - AI assistant instructions
License
Apache-2.0 - See LICENSE for details.
Credits
Built with:
- @nut-tree-fork/nut-js - Desktop automation library (community fork)
- Commander.js - CLI framework
Version: 0.1.0-alpha
Platforms: Windows, macOS
Target Users: AI Coding Agents, Developers, Test Automation
