macos-computer-use
v0.1.1
Published
macOS computer automation via Swift CoreGraphics — screenshot, click, type, key press, scroll, drag
Downloads
16
Maintainers
Readme
macos-computer-use
macOS computer automation via Swift CoreGraphics. Screenshot, click, type, key press, scroll, drag, hover, zoom — all from the command line or as a library.
No Python, no pyobjc, no third-party dependencies. Uses Swift and screencapture, which are built into every Mac.
Install
npm install -g macos-computer-useOr run directly:
npx macos-computer-use screenshotCLI
# Screenshot (captures all displays, returns JPEG base64)
macos-computer-use screenshot
# Click at model coordinates (max 1280px on longest axis)
macos-computer-use click 640 416
macos-computer-use click 640 416 --button right
# Double-click
macos-computer-use double-click 640 416
# Type text (uses pbcopy + Cmd+V)
macos-computer-use type "Hello world"
# Key press with optional modifiers
macos-computer-use key-press return
macos-computer-use key-press space --modifiers command
macos-computer-use key-press a --modifiers command,shift
# Scroll
macos-computer-use scroll --dx 0 --dy -3
# Mouse move
macos-computer-use mouse-move 640 416
# Hover (move + wait)
macos-computer-use hover 640 416 --duration 600
# Drag
macos-computer-use drag 100 100 500 500
macos-computer-use drag 100 100 500 500 --button right --duration 400
# Zoom (crop a region from a screenshot)
macos-computer-use zoom 200 200 600 600
# Screen size (model coordinate space)
macos-computer-use screen-size
# Display info
macos-computer-use displaysAll coordinates are in model space (the longest screen axis is scaled to 1280px). Output is JSON.
Library
import { click, keyPress, typeText, captureScreenshot, getScreenSize } from 'macos-computer-use';
const screenshot = captureScreenshot();
// { image: 'base64...', format: 'jpeg', imagePath: '/tmp/...', thumbnailPath: '/tmp/...' }
const result = click(640, 416);
// { clicked: true, x: 756, y: 492, image: '...', ... }
const size = getScreenSize();
// { width: 1280, height: 831 }How it works
- Screenshots:
screencapture -x(captures all displays into one image) - Input actions: Swift CoreGraphics via
swift -(CGEvent API for mouse, keyboard, scroll) - Image processing:
sips(resize, format conversion, thumbnails) - Display detection: Swift AppKit
NSScreen.screens - Coordinate mapping: Model space (max 1280px) mapped to logical screen coordinates
Requirements
- macOS
- Node.js 18+
- Accessibility permission for the terminal app (System Settings > Privacy & Security > Accessibility)
License
MIT
