app-screen-mcp
v1.0.0
MCP server for iOS Simulator automation — accessibility tree, screenshot, tap/swipe/type, and AI screen perception tools
Why app-screen-mcp
Most mobile AI automation fails for one reason: it acts blind.
app-screen-mcp solves that by combining:
- Structured accessibility data (idb ui describe-all)
- Real simulator screenshots (xcrun simctl io ... screenshot)
- Direct simulator actions (tap, type, swipe, hardware buttons)
Result: agents that can understand screen state before acting, then execute deterministic interactions.
What You Can Do
- Build autonomous QA flows for iOS simulators
- Run AI-driven smoke tests without brittle selectors
- Automate onboarding/login/payment demos from natural language
- Create self-healing UI scripts that use labels instead of fixed coordinates
- Feed accessibility tree + screenshot to multimodal models for stronger reasoning
How It Works
AI Agent / MCP Client
|
v
app-screen-mcp
|
+--> idb (UI tree + gestures + text + buttons)
|
+--> xcrun simctl (device lifecycle + screenshots + app launch)
|
v
iOS Simulator
Feature Highlights
- Full simulator discovery and boot control
- App launch by bundle ID
- Accessibility-first perception via normalized UI elements
- Screenshot capture with resize and JPEG quality controls
- Hash-based unchanged-image suppression to save tokens
- tap_text for semantic interaction by visible label
- tap_relative for resolution-independent tapping (for example, 0.5, 0.5 = center)
- get_screen_summary for one-call AI context (tree + screenshot)
- Safe text input escaping in shell execution path
- Tooling designed for Claude Desktop, Cursor, and any MCP-compatible client
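The hash-based suppression listed above can be pictured with a minimal TypeScript sketch. It assumes a dependency-free FNV-1a hash over the image bytes. The hash choice, the ScreenshotResult shape, and the screenshotIfChanged helper are illustrative assumptions, not the server's actual implementation.

```typescript
// Hypothetical sketch: skip resending a screenshot whose bytes have not changed.
// FNV-1a is used only to keep the example dependency-free; the real server
// may hash images differently.

type ScreenshotResult =
  | { changed: false; imageHash: string }
  | { changed: true; imageHash: string; image: Uint8Array };

// 32-bit FNV-1a over raw bytes, returned as 8 hex characters.
function fnv1a(bytes: Uint8Array): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Return the image only when its hash differs from the caller's previous hash.
function screenshotIfChanged(image: Uint8Array, previousImageHash?: string): ScreenshotResult {
  const imageHash = fnv1a(image);
  if (previousImageHash !== undefined && previousImageHash === imageHash) {
    return { changed: false, imageHash };
  }
  return { changed: true, imageHash, image };
}
```

A caller would keep the returned imageHash and pass it back on the next request, so an unchanged screen costs a short hash string instead of a full image payload.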
Tool Catalog
| Tool | Purpose |
|---|---|
| list_simulators | List available simulators and current boot state |
| boot_simulator | Boot a simulator by UDID |
| launch_app | Launch an installed app by bundle_id |
| get_ui_tree | Return full normalized accessibility tree |
| take_screenshot | Return JPEG image with max_dim, quality, and unchanged-image suppression |
| get_screen_summary | Return UI tree plus optional screenshot (include_image, compact_tree, image hash metadata) |
| tap | Tap exact (x, y) coordinates |
| tap_relative | Tap relative (rx, ry) in [0,1] (0.5, 0.5 is center) |
| type_text | Type into currently focused field |
| swipe | Swipe between two points with optional duration |
| press_button | Press HOME, LOCK, SIDE_BUTTON, or SIRI |
| find_elements | Search UI elements by label/value/hint text |
| tap_text | Find first matching element by text and tap its center |
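The coordinate math behind tap_relative, find_elements, and tap_text can be sketched roughly as follows. The Frame and UiElement shapes, the helper names, and the case-insensitive substring matching rule are assumptions made for illustration; the server's normalized element format and matching logic may differ.

```typescript
// Hypothetical shapes: the server's normalized UI element format may differ.
interface Frame { x: number; y: number; width: number; height: number; }
interface UiElement { label?: string; value?: string; hint?: string; frame: Frame; }
interface Point { x: number; y: number; }

// tap_relative: map (rx, ry) in [0, 1] to absolute screen coordinates.
function relativeToAbsolute(rx: number, ry: number, screenWidth: number, screenHeight: number): Point {
  if (rx < 0 || rx > 1 || ry < 0 || ry > 1) {
    throw new RangeError("rx and ry must be within [0, 1]");
  }
  return { x: Math.round(rx * screenWidth), y: Math.round(ry * screenHeight) };
}

// find_elements: case-insensitive substring match on label, value, or hint
// (an assumed matching rule).
function findElements(elements: UiElement[], query: string): UiElement[] {
  const needle = query.toLowerCase();
  return elements.filter((el) =>
    [el.label, el.value, el.hint].some(
      (text) => text !== undefined && text.toLowerCase().indexOf(needle) !== -1
    )
  );
}

// tap_text: center of the first matching element, or undefined when nothing matches.
function tapPointForText(elements: UiElement[], query: string): Point | undefined {
  const first = findElements(elements, query)[0];
  if (first === undefined) return undefined;
  const { x, y, width, height } = first.frame;
  return { x: x + width / 2, y: y + height / 2 };
}
```

Tapping the center of a matched frame is what lets a script survive layout changes: the label stays the same even when the element moves.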
Token-Efficient Usage
Start with tree-only context, then request an image only when needed:
{
"name": "get_screen_summary",
"arguments": {
"include_image": false,
"compact_tree": true
}
}
When an image is needed, compress it:
{
"name": "get_screen_summary",
"arguments": {
"include_image": true,
"max_dim": 720,
"quality": 55
}
}
Skip resending unchanged screenshots:
{
"name": "get_screen_summary",
"arguments": {
"include_image": true,
"only_if_changed": true,
"previous_image_hash": "<last_hash>"
}
}
Use relative taps when acting from image coordinates:
{
"name": "tap_relative",
"arguments": {
"rx": 0.5,
"ry": 0.5
}
}
Prerequisites
- macOS with Xcode + iOS Simulator
- Node.js 18+
- idb tooling
brew tap facebook/fb
brew install idb-companion
pip3 install fb-idb
Installation
git clone https://github.com/xmuweili/app-screen-mcp.git
cd app-screen-mcp
npm install
npm run build
Configure Your MCP Client
Claude Desktop
~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"ios-simulator": {
"command": "node",
"args": ["/absolute/path/to/app-screen-mcp/dist/index.js"]
}
}
}
Cursor / VS Code MCP
{
"mcp.servers": {
"ios-simulator": {
"command": "node",
"args": ["/absolute/path/to/app-screen-mcp/dist/index.js"]
}
}
}
Restart your MCP client after updating config.
Avoid Repeated Permission Prompts
Prompt behavior is controlled by the MCP client, not this server.
Most GUI MCP clients (Claude Desktop, Cursor, Windsurf, Zed, Continue.dev) treat adding the server to their config as a grant of trust, so you should not see repeated tool approvals.
Claude Code (CLI)
Allow this server's tools in ~/.claude/settings.json:
{
"permissions": {
"allow": [
"mcp__ios-simulator__*"
]
}
}
ios-simulator must match the server name in your MCP config.
Use .claude/settings.json in the project root if you want this scoped per-repo.
Codex CLI
Codex uses command-level approval. To avoid repeated prompts:
- Approve once with "always allow" when Codex asks.
- Save reusable prefix rules for common commands.
- Typical prefix: ["xcrun", "simctl", "list", "devices", "--json"]
- Typical prefix: ["idb", "list-targets"]
- Typical prefix: ["idb", "list-apps", "--udid", "<SIMULATOR_UDID>"]
Codex may still prompt for new or higher-risk command patterns.
Quick Agent Workflow
1) get_screen_summary()
2) find_elements("Sign In")
3) tap_text("Email")
4) type_text("[email protected]")
5) tap_text("Password")
6) type_text("••••••••")
7) tap_text("Sign In")
8) get_screen_summary()
This keeps actions grounded in visible state, not assumptions.
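As a rough sketch, the same flow could be expressed in TypeScript as a plan of tool calls that any MCP client executes in order. The CallTool type is a hypothetical stand-in for whatever call method your client exposes, and the argument names query and text are assumptions about the tool schemas, not documented parameters.

```typescript
// One step of the flow: a tool name plus its arguments.
interface ToolCall { name: string; arguments: Record<string, unknown>; }

// Hypothetical client hook: replace with your MCP client's tool-call method.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// Build the eight-step sign-in flow. The argument names "query" and "text"
// are assumptions; check the server's tool schemas before relying on them.
function signInPlan(email: string, password: string): ToolCall[] {
  const treeOnly = { include_image: false, compact_tree: true };
  return [
    { name: "get_screen_summary", arguments: treeOnly },
    { name: "find_elements", arguments: { query: "Sign In" } },
    { name: "tap_text", arguments: { text: "Email" } },
    { name: "type_text", arguments: { text: email } },
    { name: "tap_text", arguments: { text: "Password" } },
    { name: "type_text", arguments: { text: password } },
    { name: "tap_text", arguments: { text: "Sign In" } },
    // Re-read the screen to confirm the result instead of assuming success.
    { name: "get_screen_summary", arguments: treeOnly },
  ];
}

// Execute the steps strictly in order, collecting each tool result.
async function runPlan(plan: ToolCall[], callTool: CallTool): Promise<unknown[]> {
  const results: unknown[] = [];
  for (let i = 0; i < plan.length; i++) {
    results.push(await callTool(plan[i].name, plan[i].arguments));
  }
  return results;
}
```

Separating the plan from its execution keeps the flow easy to inspect or log before anything touches the simulator.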
Local Development
npm run build
npm start
Main implementation lives in:
src/index.ts
Reliability Notes
- If udid is omitted, tools default to the currently booted simulator.
- tap_text and find_elements rely on accessibility labels/values/hints.
- Better accessibility metadata in your app means better AI performance.
- If no simulator is booted, the server returns a clear MCP error.
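The default-to-booted behavior could be implemented roughly as below, working from the parsed output of xcrun simctl list devices --json. The resolveUdid helper and the error wording are hypothetical; only the JSON shape (a devices map from runtime identifier to device entries with udid, name, and state) reflects what simctl emits.

```typescript
// Subset of the fields in `xcrun simctl list devices --json` output.
interface SimDevice { udid: string; name: string; state: string; }
interface SimctlDeviceList { devices: Record<string, SimDevice[]>; }

// Hypothetical helper: prefer an explicit udid, otherwise use the first booted device.
function resolveUdid(list: SimctlDeviceList, udid?: string): string {
  if (udid !== undefined && udid !== "") return udid;
  const runtimes = Object.keys(list.devices);
  for (let i = 0; i < runtimes.length; i++) {
    const devices = list.devices[runtimes[i]] || [];
    for (let j = 0; j < devices.length; j++) {
      if (devices[j].state === "Booted") return devices[j].udid;
    }
  }
  // The real server reports this as an MCP error; a plain Error stands in here.
  throw new Error("No iOS simulator is currently running");
}
```

Failing loudly when nothing is booted gives the agent a clear cue to call boot_simulator rather than retrying taps against a device that is not there.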
Troubleshooting
- No iOS simulator is currently running: boot one via Simulator or call boot_simulator.
- idb command failures: verify idb/idb-companion installation and PATH.
- Empty or weak element matches: improve app accessibility labels/semantics.
License
MIT
