android-agent-cli
v0.1.1
Android automation CLI for AI agents. Control Android devices via ADB with OCR support.
agent-android
Android automation CLI for AI agents. Control Android devices via ADB with OCR support for element detection.
Installation
Global Installation (recommended)
npm install -g agent-android
Quick Start (npx)
npx agent-android devices
npx agent-android snapshot
Prerequisites
- ADB installed - brew install android-platform-tools on macOS
- Tesseract installed - brew install tesseract for OCR
- USB Debugging enabled - Enable in Android Developer Options
- Device connected - Via USB or WiFi (adb connect <ip>:5555)
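A quick way to confirm the first two prerequisites is to check that adb and tesseract are on PATH. This is a minimal sketch, assuming a POSIX-style shell; missing_tools is a hypothetical helper, not part of agent-android, built on the standard command -v builtin.

```shell
# missing_tools: hypothetical helper, not part of agent-android.
# Prints every argument that is not an available command on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example pre-flight check (tools taken from the prerequisites above):
#   missing=$(missing_tools adb tesseract)
#   [ -z "$missing" ] || echo "Install first: $missing" >&2
#   adb devices   # the device should be listed with state "device"
```

The PATH check cannot tell whether USB debugging is enabled, which is why the adb devices call is kept as the last step: an unapproved device shows up as "unauthorized" rather than "device".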
Quick Start
agent-android devices # List connected devices
agent-android wake # Wake up device
agent-android snapshot # Get screen elements with refs
agent-android tap @e2 # Tap by ref from snapshot
agent-android tap "Settings" # Tap by text (OCR search)
agent-android fill @e3 "[email protected]" # Fill input by ref
agent-android screenshot page.png # Take screenshot
agent-android home # Press home button
Commands
Device Management
agent-android devices # List connected devices
agent-android info # Get device info (screen size, battery, etc.)
agent-android wake # Wake up device
agent-android sleep # Put device to sleep
agent-android reboot # Reboot device
agent-android reboot --bootloader # Reboot to bootloader
agent-android reboot --recovery # Reboot to recovery
Screen Capture & OCR
agent-android screenshot [path] # Take screenshot (saves to a temp file if no path is given)
agent-android snapshot # OCR scan - get text elements with refs
agent-android text # Get all text on screen
agent-android find "Search" # Find text and get coordinates
Input Actions
agent-android tap <target> # Tap (ref @e1, "text", or x,y)
agent-android longpress <target> # Long press
agent-android swipe <dir> [--amount <n>] # Swipe (up/down/left/right)
agent-android type "hello world" # Type text
agent-android fill <target> <text> # Tap and type
agent-android key ENTER # Press key
agent-android back # Press back button
agent-android home # Press home button
App Management
agent-android open com.facebook.katana # Open app by package
agent-android close com.facebook.katana # Force stop app
agent-android packages # List installed packages
agent-android packages -f facebook # Filter packages
agent-android install app.apk # Install APK
agent-android uninstall com.example # Uninstall app
File Operations
agent-android push local.txt /sdcard/ # Push file to device
agent-android pull /sdcard/file.txt . # Pull file from device
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
# 1. Get snapshot with refs
agent-android snapshot
# Output:
# - text "Settings" [ref=@e1] [pos=164,1211] [size=118x55] [conf=97%]
# - text "Search" [ref=@e2] [pos=302,1210] [size=179x57] [conf=96%]
# 2. Use refs to interact
agent-android tap @e1 # Tap Settings
agent-android fill @e2 "query" # Fill search field
Text Search
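Plain-text snapshot output can also be searched directly, so an agent that already holds one snapshot can resolve several visible labels to refs from it. This is a minimal sketch, assuming every element line has the shape shown in the Refs example (text "Settings" [ref=@e1] ...); ref_for_text is a hypothetical helper that matches the label literally and case-sensitively.

```shell
# ref_for_text: hypothetical helper, not part of agent-android.
# Reads plain-text snapshot output on stdin and prints the ref of the first
# line containing: text "<LABEL>" [ref=...]
# Assumes the line format shown in the Refs example. Backslashes in LABEL
# are not supported because awk -v interprets escape sequences.
ref_for_text() {
  awk -v want="text \"$1\" [ref=" '
    index($0, want) {
      rest = substr($0, index($0, want) + length(want))  # e.g. "@e1] [pos=..."
      print substr(rest, 1, index(rest, "]") - 1)        # text before the "]"
      found = 1
      exit
    }
    END { exit !found }'
}
```

For example, agent-android tap "$(agent-android snapshot | ref_for_text Settings)" taps the Settings element, assuming snapshot prints its element lines to stdout. For a one-off tap, the built-in text target below is simpler.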
Find and tap by visible text:
agent-android tap "Settings" # OCR search + tap
agent-android find "Login" # Just find coordinates
Coordinates
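Whatever the target type, a tap lands on a single point. The README's own samples hint at how refs resolve: the Refs example lists Settings at [pos=164,1211] [size=118x55], and the Agent Mode sample for tap @e1 --json reports x=223, y=1238. That is the center of the box if pos is its top-left corner (164 + 118/2 = 223, and 1211 + 55/2 = 1238 after truncation). This reading is an inference from two samples, not documented behavior. The hypothetical tap_center helper below reproduces it and prints an x,y string in the coordinate form accepted by tap; it assumes a shell that supports local (bash, dash, zsh).

```shell
# tap_center: hypothetical helper, not part of agent-android.
# Given pos "X,Y" and size "WxH" from a snapshot line, prints the box center
# as "x,y", assuming pos is the top-left corner (inferred from README samples).
tap_center() {
  local x="${1%,*}" y="${1#*,}"   # split "X,Y"
  local w="${2%x*}" h="${2#*x}"   # split "WxH"
  echo "$(( x + w / 2 )),$(( y + h / 2 ))"   # integer division truncates
}

# Example: agent-android tap "$(tap_center 164,1211 118x55)"   # taps 223,1238
```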
Direct coordinate input:
agent-android tap 500,1200 # Tap at x=500, y=1200
Options
| Option | Description |
| ----------------------- | --------------------------- |
| -s, --serial <serial> | Target specific device |
| --json | Output as JSON (for agents) |
| -V, --version | Show version |
| -h, --help | Show help |
Environment Variables
| Variable | Description |
| ---------------------- | --------------------- |
| AGENT_ANDROID_SERIAL | Default device serial |
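With several devices attached, the value for AGENT_ANDROID_SERIAL (or -s) is the serial in the first column of adb devices. This is a minimal sketch, assuming the standard adb devices layout of one "serial<TAB>state" line per device; first_adb_serial is a hypothetical helper that picks the first device in the ready "device" state and skips entries such as "unauthorized" or "offline".

```shell
# first_adb_serial: hypothetical helper, not part of agent-android.
# Reads `adb devices` output on stdin; prints the first serial whose state is
# "device" (connected and authorized) and exits non-zero if there is none.
# Header and daemon-status lines never have "device" as their second field.
first_adb_serial() {
  awk '$2 == "device" { print $1; found = 1; exit }
       END { exit !found }'
}

# Example:
#   serial=$(adb devices | first_adb_serial) && export AGENT_ANDROID_SERIAL="$serial"
#   agent-android info
```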
Agent Mode
Use --json for machine-readable output:
agent-android snapshot --json
# Returns: {"success":true,"data":{"elements":[...],"refs":{...},"fullText":"..."}}
agent-android tap @e1 --json
# Returns: {"success":true,"data":{"x":223,"y":1238}}
Optimal AI Workflow
# 1. Wake and get snapshot
agent-android wake
agent-android snapshot --json # AI parses elements and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-android tap @e2
agent-android fill @e3 "input text"
# 4. Get new snapshot if page changed
agent-android snapshot --json
Common Patterns
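The patterns below run each command without checking whether it worked. Both Agent Mode samples carry a top-level success field, so a script can stop at the first failed step. This is a minimal sketch, assuming compact JSON exactly as in those samples, that a failed command does not report "success":true, and that --json may follow the other arguments as in the tap @e1 --json sample; json_ok and run_step are hypothetical helpers. A JSON parser such as jq would be more robust than grep.

```shell
# json_ok: hypothetical helper. Succeeds only if the JSON on stdin contains
# "success":true (compact form, as in the Agent Mode samples).
json_ok() {
  grep -q '"success":true'
}

# run_step: hypothetical helper. Runs one agent-android command with --json
# appended and reports the command on stderr if it did not succeed.
run_step() {
  agent-android "$@" --json | json_ok || {
    echo "step failed: agent-android $*" >&2
    return 1
  }
}

# Example:
#   run_step open com.example.app && run_step snapshot && run_step tap @e3
```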
Open app and interact
agent-android open com.facebook.katana
sleep 2
agent-android snapshot
agent-android tap "What's on your mind?"
agent-android type "Hello from agent-android!"
agent-android tap "Post"
Login flow
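Opening an app may return before its first screen is drawn: the pattern above covers this with a fixed sleep 2, and the login flow below snapshots straight away. This is a minimal sketch of a polling alternative, assuming agent-android text prints the recognized screen text to stdout; wait_for_text is a hypothetical helper that retries a bounded number of times, and it needs a shell that supports local.

```shell
# wait_for_text: hypothetical helper, not part of agent-android.
# Usage: wait_for_text LABEL [TRIES] [DELAY_SECONDS]
# Polls `agent-android text` until LABEL appears (literal, case-sensitive)
# or TRIES attempts have failed; returns 0 if found, 1 otherwise.
wait_for_text() {
  local label="$1" tries="${2:-10}" delay="${3:-1}"
  while [ "$tries" -gt 0 ]; do
    if agent-android text 2>/dev/null | grep -qF -- "$label"; then
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep "$delay"   # no pause after the last attempt
  done
  return 1
}

# Example:
#   agent-android open com.facebook.katana
#   wait_for_text "What's on your mind?" 15 && agent-android tap "What's on your mind?"
```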
agent-android open com.example.app
agent-android snapshot
agent-android fill @e1 "[email protected]"
agent-android fill @e2 "password123"
agent-android tap @e3 # Login button
Scroll and find
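The steps below perform a single swipe before searching. When the target may be several screens away, a bounded loop keeps the agent from scrolling forever. This is a minimal sketch, assuming agent-android text prints the visible text to stdout and that the swipe from the example (up, --amount 800) suits the device; scroll_until_visible is a hypothetical helper.

```shell
# scroll_until_visible: hypothetical helper, not part of agent-android.
# Usage: scroll_until_visible LABEL [MAX_SWIPES]
# Checks the screen for LABEL, swiping up between checks; returns 1 after
# MAX_SWIPES unsuccessful swipes or if a swipe command fails.
scroll_until_visible() {
  local label="$1" max="${2:-5}" swipes=0
  while :; do
    if agent-android text 2>/dev/null | grep -qF -- "$label"; then
      return 0
    fi
    [ "$swipes" -ge "$max" ] && return 1
    agent-android swipe up --amount 800 || return 1
    swipes=$((swipes + 1))
  done
}

# Example:
#   scroll_until_visible "Target Item" 10 && agent-android tap "Target Item"
```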
agent-android swipe up --amount 800
agent-android snapshot
agent-android find "Target Item"
agent-android tap @e5
Usage with AI Agents
Just ask the agent
Use agent-android to open Instagram and post a photo.
Run agent-android --help to see available commands.
Claude Code / OpenCode
Install the skill:
npx skills add agent-android
AGENTS.md / CLAUDE.md
Add to your instructions:
## Android Automation
Use `agent-android` for Android device control. Run `agent-android --help` for commands.
Core workflow:
1. `agent-android wake` - Wake device
2. `agent-android snapshot` - Get elements with refs (@e1, @e2)
3. `agent-android tap @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
License
MIT
