app-screen-mcp

v1.0.0

Published

3 months ago

MCP server for iOS Simulator automation — accessibility tree, screenshot, tap/swipe/type, and AI screen perception tools

xmuweili

mcp ios simulator screenshot accessibility automation xcrun idb ai

Why app-screen-mcp

Most mobile AI automation fails for one reason: it acts blind.

app-screen-mcp solves that by combining:

Structured accessibility data (idb ui describe-all)
Real simulator screenshots (xcrun simctl io ... screenshot)
Direct simulator actions (tap, type, swipe, hardware buttons)

Result: agents that can understand screen state before acting, then execute deterministic interactions.

What You Can Do

Build autonomous QA flows for iOS simulators
Run AI-driven smoke tests without brittle selectors
Automate onboarding/login/payment demos from natural language
Create self-healing UI scripts that use labels instead of fixed coordinates
Feed accessibility tree + screenshot to multimodal models for stronger reasoning

How It Works

AI Agent / MCP Client
        |
        v
   app-screen-mcp
        |
        +--> idb (UI tree + gestures + text + buttons)
        |
        +--> xcrun simctl (device lifecycle + screenshots + app launch)
        |
        v
   iOS Simulator

Feature Highlights

Full simulator discovery and boot control
App launch by bundle ID
Accessibility-first perception via normalized UI elements
Screenshot capture with resize and JPEG quality controls
Hash-based unchanged-image suppression to save tokens
tap_text for semantic interaction by visible label
tap_relative for resolution-independent tapping (for example 0.5, 0.5 = center)
get_screen_summary for one-call AI context (tree + screenshot)
Safe text input escaping in shell execution path
Tooling designed for Claude Desktop, Cursor, and any MCP-compatible client
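
The "safe text input escaping" item can be illustrated with a minimal sketch. It is not taken from the project's source: `shellQuote` and `buildTypeCommand` are hypothetical helper names, and the sketch assumes a POSIX shell, where wrapping a value in single quotes makes every character literal except the single quote itself.

```typescript
// Hypothetical helper (not from the project's source): quote arbitrary text
// so a POSIX shell treats it as one literal argument.
function shellQuote(text: string): string {
  // Close the quote, emit an escaped single quote, then reopen the quote.
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build a command line that types text via `idb ui text`.
function buildTypeCommand(text: string, udid: string): string {
  return `idb ui text ${shellQuote(text)} --udid ${shellQuote(udid)}`;
}

console.log(buildTypeCommand("it's $(whoami)", "<SIMULATOR_UDID>"));
```

An alternative that avoids quoting entirely is to pass arguments as an array to Node's `child_process.execFile`, which does not spawn a shell by default.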

Tool Catalog

| Tool | Purpose |
|---|---|
| list_simulators | List available simulators and current boot state |
| boot_simulator | Boot a simulator by UDID |
| launch_app | Launch an installed app by bundle_id |
| get_ui_tree | Return full normalized accessibility tree |
| take_screenshot | Return JPEG image with max_dim, quality, and unchanged-image suppression |
| get_screen_summary | Return UI tree plus optional screenshot (include_image, compact_tree, image hash metadata) |
| tap | Tap exact (x, y) coordinates |
| tap_relative | Tap relative (rx, ry) in [0,1] (0.5, 0.5 is center) |
| type_text | Type into currently focused field |
| swipe | Swipe between two points with optional duration |
| press_button | Press HOME, LOCK, SIDE_BUTTON, or SIRI |
| find_elements | Search UI elements by label/value/hint text |
| tap_text | Find first matching element by text and tap its center |
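
To make the find_elements and tap_text rows concrete, here is a rough sketch of "find the first match and tap its center". The `UiElement` shape and the case-insensitive substring matching are assumptions made for illustration; the server's actual normalized element format and matching rules may differ.

```typescript
// Assumed element shape; the server's real normalized format may differ.
interface Frame { x: number; y: number; width: number; height: number; }
interface UiElement { label?: string; value?: string; hint?: string; frame: Frame; }

// Assumed matching rule: case-insensitive substring over label, value, and hint.
function findElements(elements: UiElement[], query: string): UiElement[] {
  const needle = query.toLowerCase();
  return elements.filter((el) =>
    [el.label, el.value, el.hint].some(
      (text) => text !== undefined && text.toLowerCase().includes(needle),
    ),
  );
}

// Center point of an element's frame, which is where the tap would land.
function centerOf(frame: Frame): { x: number; y: number } {
  return { x: frame.x + frame.width / 2, y: frame.y + frame.height / 2 };
}

// Returns the tap point for the first match, or undefined when nothing matches.
function resolveTapText(elements: UiElement[], query: string): { x: number; y: number } | undefined {
  const [first] = findElements(elements, query);
  return first ? centerOf(first.frame) : undefined;
}
```

Because the tap point is derived from the element's frame at call time, the same query keeps working when the layout shifts, as long as the label stays the same.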

Token-Efficient Usage

Start with tree-only context, then request an image only when needed:

{
  "name": "get_screen_summary",
  "arguments": {
    "include_image": false,
    "compact_tree": true
  }
}

When an image is needed, compress it:

{
  "name": "get_screen_summary",
  "arguments": {
    "include_image": true,
    "max_dim": 720,
    "quality": 55
  }
}
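
As a sketch of what a max_dim limit implies for image size, the function below scales a screenshot so its longer side does not exceed the limit while keeping the aspect ratio. This assumes max_dim caps the longer side and never upscales; `fitWithin` is a hypothetical name, and the server's actual resize logic may differ.

```typescript
// Hypothetical helper: compute output dimensions for a given max_dim,
// assuming the limit applies to the longer side and images are never upscaled.
function fitWithin(
  width: number,
  height: number,
  maxDim: number,
): { width: number; height: number } {
  const longest = Math.max(width, height);
  if (longest <= maxDim) {
    return { width, height }; // Already small enough; leave untouched.
  }
  const scale = maxDim / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// A 1170x2532 screenshot limited to 720 on its longer side.
console.log(fitWithin(1170, 2532, 720));
```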

Skip resending unchanged screenshots:

{
  "name": "get_screen_summary",
  "arguments": {
    "include_image": true,
    "only_if_changed": true,
    "previous_image_hash": "<last_hash>"
  }
}
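
The unchanged-image suppression can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's implementation: it assumes a SHA-256 digest of the JPEG bytes, and `ImageResult` and `buildImageResult` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical result shape; the real response fields may be named differently.
interface ImageResult {
  imageHash: string;
  changed: boolean;
  imageBase64?: string; // Omitted when the image is suppressed.
}

// Hash the image bytes and omit the payload when the caller already has this image.
function buildImageResult(
  jpeg: Uint8Array,
  previousHash?: string,
  onlyIfChanged = false,
): ImageResult {
  // SHA-256 is an assumption; any stable content hash would work.
  const imageHash = createHash("sha256").update(jpeg).digest("hex");
  const changed = imageHash !== previousHash;
  if (onlyIfChanged && !changed) {
    return { imageHash, changed }; // Only metadata is returned.
  }
  return { imageHash, changed, imageBase64: Buffer.from(jpeg).toString("base64") };
}
```

The client keeps the last hash it received and sends it back as previous_image_hash, so an idle screen costs a short hash string rather than a full base64 image.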

Use relative taps when acting from image coordinates:

{
  "name": "tap_relative",
  "arguments": {
    "rx": 0.5,
    "ry": 0.5
  }
}
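
The arithmetic behind a relative tap is simple enough to sketch. The code below assumes the screen size is known in the same coordinate space that the tap tool uses; `ScreenSize` and `relativeToAbsolute` are hypothetical names, not part of the server's API.

```typescript
// Hypothetical type: screen size in the coordinate space used by absolute taps.
interface ScreenSize { width: number; height: number; }

// Convert relative coordinates in [0, 1] to absolute coordinates.
function relativeToAbsolute(
  rx: number,
  ry: number,
  screen: ScreenSize,
): { x: number; y: number } {
  const inRange = (v: number): boolean => Number.isFinite(v) && v >= 0 && v <= 1;
  if (!inRange(rx) || !inRange(ry)) {
    throw new RangeError("rx and ry must be within [0, 1]");
  }
  return {
    x: Math.round(rx * screen.width),
    y: Math.round(ry * screen.height),
  };
}

// The center of a 390x844 screen.
console.log(relativeToAbsolute(0.5, 0.5, { width: 390, height: 844 }));
```

Relative coordinates are useful when a model reasons over a resized screenshot: a position expressed as a fraction of the image maps to the same fraction of the screen, whatever max_dim was used.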

Prerequisites

macOS with Xcode + iOS Simulator
Node.js 18+
idb tooling

brew tap facebook/fb
brew install idb-companion
pip3 install fb-idb

Installation

git clone https://github.com/xmuweili/app-screen-mcp.git
cd app-screen-mcp
npm install
npm run build

Configure Your MCP Client

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "ios-simulator": {
      "command": "node",
      "args": ["/absolute/path/to/app-screen-mcp/dist/index.js"]
    }
  }
}

Cursor / VS Code MCP

{
  "mcp.servers": {
    "ios-simulator": {
      "command": "node",
      "args": ["/absolute/path/to/app-screen-mcp/dist/index.js"]
    }
  }
}

Restart your MCP client after updating config.

Avoid Repeated Permission Prompts

Prompt behavior is controlled by the MCP client, not this server.

Most GUI MCP clients (Claude Desktop, Cursor, Windsurf, Zed, Continue.dev) usually treat adding the server to the config as a trust grant, so you should not see repeated tool approvals.

Claude Code (CLI)

Allow this server's tools in ~/.claude/settings.json:

{
  "permissions": {
    "allow": [
      "mcp__ios-simulator__*"
    ]
  }
}

ios-simulator must match the server name in your MCP config.

Use .claude/settings.json in the project root if you want this scoped per repo.

Codex CLI

Codex uses command-level approval. To avoid repeated prompts:

Approve once with "always allow" when Codex asks.
Save reusable prefix rules for common commands.
Typical prefix: ["xcrun", "simctl", "list", "devices", "--json"]
Typical prefix: ["idb", "list-targets"]
Typical prefix: ["idb", "list-apps", "--udid", "<SIMULATOR_UDID>"]

Codex may still prompt for new or higher-risk command patterns.

Quick Agent Workflow

1) get_screen_summary()
2) find_elements("Sign In")
3) tap_text("Email")
4) type_text("<email>")
5) tap_text("Password")
6) type_text("••••••••")
7) tap_text("Sign In")
8) get_screen_summary()

This keeps actions grounded in visible state, not assumptions.
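
For readers wiring this up by hand, the workflow above can be sketched as a list of MCP `tools/call` requests. The JSON-RPC envelope follows the MCP specification, but the argument keys `query` and `text` are assumptions (check the schemas the server advertises via `tools/list`), and the credential values are placeholders.

```typescript
interface ToolCall { name: string; arguments: Record<string, unknown>; }

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// The sign-in workflow as data. The "query" and "text" keys are assumed names.
const signInPlan: ToolCall[] = [
  { name: "get_screen_summary", arguments: { include_image: false, compact_tree: true } },
  { name: "find_elements", arguments: { query: "Sign In" } },
  { name: "tap_text", arguments: { text: "Email" } },
  { name: "type_text", arguments: { text: "<email>" } },
  { name: "tap_text", arguments: { text: "Password" } },
  { name: "type_text", arguments: { text: "<password>" } },
  { name: "tap_text", arguments: { text: "Sign In" } },
  { name: "get_screen_summary", arguments: { include_image: false, compact_tree: true } },
];

// Wrap each step in a JSON-RPC 2.0 envelope with sequential ids.
function toRequests(plan: ToolCall[], firstId = 1): JsonRpcRequest[] {
  return plan.map((call, index): JsonRpcRequest => ({
    jsonrpc: "2.0",
    id: firstId + index,
    method: "tools/call",
    params: call,
  }));
}

console.log(JSON.stringify(toRequests(signInPlan)[0], null, 2));
```

In practice an agent would send one request at a time and inspect each result before deciding on the next step, rather than replaying a fixed plan.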

Local Development

npm run build
npm start

Main implementation lives in:

src/index.ts

Reliability Notes

If udid is omitted, tools default to the currently booted simulator.
tap_text and find_elements rely on accessibility labels/values/hints.
Better accessibility metadata in your app means better AI performance.
If no simulator is booted, the server returns a clear MCP error.
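
The default-to-booted behavior could be implemented along these lines. This is a sketch, not the project's code: it assumes the input is the parsed output of `xcrun simctl list devices --json`, which groups devices by runtime identifier, and `resolveUdid` is a hypothetical name.

```typescript
// Subset of the fields in `xcrun simctl list devices --json` output.
interface SimDevice { udid: string; name: string; state: string; }
interface SimctlList { devices: Record<string, SimDevice[]>; }

// Return the explicit udid if given, otherwise the first booted simulator.
function resolveUdid(list: SimctlList, udid?: string): string {
  if (udid) {
    return udid;
  }
  for (const runtime of Object.keys(list.devices)) {
    for (const device of list.devices[runtime]) {
      if (device.state === "Booted") {
        return device.udid;
      }
    }
  }
  // Mirrors the error message documented under Troubleshooting.
  throw new Error("No iOS simulator is currently running");
}
```

If several simulators are booted, this sketch picks whichever appears first in the listing, so passing udid explicitly is the safer choice in multi-device setups.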

Troubleshooting

No iOS simulator is currently running: boot one via Simulator or call boot_simulator.
idb command failures: verify idb/idb-companion installation and PATH.
Empty or weak element matches: improve app accessibility labels/semantics.

License

MIT