native-devtools-mcp

v0.7.1

MCP server for native app testing — screenshot, OCR, click, type, find_text, template matching. macOS, Windows & Android.

native-devtools-mcp is a Model Context Protocol (MCP) server for computer use on macOS, Windows, and Android. It gives AI agents and MCP clients direct control over native desktop apps, Chrome/Electron browsers, and Android devices through screenshots, OCR, accessibility-based text lookup, input simulation, window management, Chrome DevTools Protocol (CDP), and ADB.

Use it when browser-only automation is not enough: Electron apps (Signal, Discord, VS Code), Chrome browser automation, system dialogs, desktop tools, native app testing, and Android device workflows. It works with Claude Desktop, Claude Code, Cursor, and other MCP-compatible clients.

Useful for MCP-based computer use, desktop automation, browser automation, UI automation, native app testing, e2e testing, RPA, screen reading, mouse and keyboard control, Chrome DevTools Protocol automation, and Android device automation.

npx -y native-devtools-mcp

Core capabilities

  • Screenshots, OCR, and accessibility-first find_text
  • click, type_text, scroll, launch_app, quit_app, and window management
  • element_at_point for inspecting accessible UI elements at screen coordinates
  • load_image + find_image for non-text UI elements such as icons and custom controls
  • Chrome/Electron automation via CDP: snapshots, click, fill, navigate, type, and tab management
  • Android screenshots, text lookup, input, and app control over ADB
  • Local execution: screenshots and input stay on the machine

For AI agents: Read AGENTS.md for tool definitions, workflow patterns, and machine-readable usage guidance.

Features | Installation | Getting Started | Recipes | Security & Trust | For AI Agents | Chrome/Electron (CDP) | Android


🚀 Features

  • 👀 Computer Vision: Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
  • 🖱️ Input Simulation: Click, drag, scroll, and type text naturally. Supports global coordinates and window-relative actions.
  • 🪟 Window Management: List open windows, find applications, and bring them to focus.
  • 🧩 Template Matching: Find non-text UI elements (icons, shapes) using load_image + find_image, returning precise click coordinates.
  • 🔒 Local & Private: 100% local execution. No screenshots or data are ever sent to external servers.
  • 📱 Android Support: Connect to Android devices over ADB for screenshots, input simulation, UI element search, and app management — all from the same MCP server.
  • 🔍 Hover Tracking: Track cursor hover transitions across UI elements in real-time. Configurable dwell threshold filters pass-through noise — designed for LLMs observing user navigation patterns.
  • 🌐 Browser Automation (CDP): Connect to Chrome/Electron apps via Chrome DevTools Protocol. Take accessibility tree snapshots, click elements by UID, evaluate JavaScript, and manage tabs — all without a separate Node.js server.
  • 🔌 Three Interaction Modes:
    1. Visual/Native: Works with any app via screenshots & coordinates (Universal).
    2. AppDebugKit: Deep integration for supported apps to inspect the UI tree (DOM-like structure).
    3. CDP: Connect to Chrome/Electron via --remote-debugging-port for DOM-level element targeting and JS evaluation.

🤖 For AI Agents (LLMs)

This MCP server is designed to be highly discoverable and usable by AI models (Claude, Gemini, GPT).

  • 📄 Read AGENTS.md: A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.

Core Capabilities for System Prompts:

  1. take_screenshot: The "eyes". Returns images + layout metadata + text locations (OCR).
  2. click / type_text: The "hands". Interacts with the system based on visual feedback.
  3. find_text: A shortcut to find text on screen and get its coordinates immediately. Uses the platform accessibility API (macOS Accessibility / Windows UI Automation) for precise element-level matching, with OCR fallback.
  4. element_at_point: Inspect the accessibility element at given screen coordinates — returns name, role, label, value, bounds, pid, and app_name. Note: privacy-focused Electron apps (e.g. Signal) may restrict their AX tree, returning only a container — use take_screenshot with OCR as a fallback.
  5. load_image / find_image: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.
  6. start_hover_tracking / get_hover_events / stop_hover_tracking: Track cursor hover transitions across UI elements. Configurable dwell threshold filters pass-throughs.
  7. start_recording / stop_recording: Record the frontmost app's window at ~5fps as timestamped JPEG frames. Automatically follows app switches.
  8. launch_app / quit_app: Launch apps with optional CLI args, or gracefully/forcefully quit them.
  9. cdp_connect / cdp_take_snapshot / cdp_click / cdp_fill / cdp_navigate: Connect to Chrome or Electron apps via CDP for DOM-level automation — snapshots, clicking, typing, navigation, and tab management without a separate Node.js server.
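
The dwell threshold mentioned for start_hover_tracking can be pictured with a small filter. This is only a sketch of the idea, not the server's implementation: the sample format (timestamp in milliseconds plus element name) and the function name filter_dwells are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical sample format: (timestamp_ms, element_name).
Sample = Tuple[int, str]


def filter_dwells(samples: Iterable[Sample], dwell_ms: int) -> List[Tuple[str, int]]:
    """Keep only elements the cursor stayed on for at least dwell_ms."""
    results: List[Tuple[str, int]] = []
    current: Optional[str] = None
    start = last = 0
    for ts, element in samples:
        if element != current:
            # The cursor moved to a new element: close out the previous one.
            if current is not None and ts - start >= dwell_ms:
                results.append((current, ts - start))
            current, start = element, ts
        last = ts
    # Close out the final element using the last sample seen on it.
    if current is not None and last - start >= dwell_ms:
        results.append((current, last - start))
    return results


samples = [(0, "File"), (50, "Edit"), (100, "View"), (700, "View"), (900, "Help")]
print(filter_dwells(samples, dwell_ms=300))  # [('View', 800)]
```

Short pass-throughs ("File", "Edit") are dropped, while the element the cursor rested on is kept together with its dwell time.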

📦 Installation

The install steps are identical on macOS and Windows.

Option 1: Run with npx (no install needed)

npx -y native-devtools-mcp

Option 2: Global install

npm install -g native-devtools-mcp

Option 3: Build from source (Rust)

Using the build script (clones, builds, and runs setup):

curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash

Or manually:

git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release
# Binary: ./target/release/native-devtools-mcp

🏁 Getting Started

After installing, run the setup wizard:

npx native-devtools-mcp setup

This will:

  1. Check permissions (macOS) — verifies Accessibility and Screen Recording, opens System Settings if needed
  2. Detect your MCP clients — finds Claude Desktop, Claude Code, Cursor
  3. Write the configuration — generates the correct JSON config and offers to write it for you

Then restart your MCP client and you're ready to go.

Claude Desktop on macOS requires the signed app bundle (Gatekeeper blocks npx). Download NativeDevtools-X.X.X.dmg from GitHub Releases, drag to /Applications, then run setup — it will detect the app and configure Claude Desktop to use it.

VS Code, Windsurf, and other clients: setup doesn't auto-detect these yet. Run setup for the permission checks, then see the manual configuration below for the JSON config snippet.

Claude Code tip: To avoid approving every tool call (clicks, screenshots), add this to .claude/settings.local.json:

{ "permissions": { "allow": ["mcp__native-devtools__*"] } }

📚 Recipes and Examples

macOS — Claude Desktop

Config file: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "native-devtools": {
      "command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
    }
  }
}

Windows — Claude Desktop

Config file: %APPDATA%\Claude\claude_desktop_config.json

Claude Code, Cursor, and other MCP clients

{
  "mcpServers": {
    "native-devtools": {
      "command": "npx",
      "args": ["-y", "native-devtools-mcp"]
    }
  }
}

Requires Node.js 18+.
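
If you edit the config by hand, the main risk is overwriting other servers that are already listed. The sketch below shows one way to merge the native-devtools entry into an existing mcpServers object. The helper name merge_server and the "other" entry are hypothetical; the setup wizard may do this differently.

```python
import json


def merge_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of config with one MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input stays untouched
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# "other" stands in for whatever servers are already configured.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
updated = merge_server(existing, "native-devtools", "npx", ["-y", "native-devtools-mcp"])
print(json.dumps(updated, indent=2))
```

Write the result back to the config file for your client, then restart the client.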

🔐 Security & Trust

This tool requires Accessibility and Screen Recording permissions — that's a lot of trust. Here's how to verify it deserves it.

Verify your binary

native-devtools-mcp verify

Computes the SHA-256 hash of the running binary and checks it against the official checksums published on the GitHub Releases page. If the hash matches, you're running an unmodified official build.
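
The same kind of check can be done by hand. The sketch below hashes a file with SHA-256 and compares it to a checksum you copied from the releases page. It is an independent illustration, not the code behind the verify subcommand; the function names are assumptions.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare the file's hash with a published hex checksum, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Call matches_checksum with the path to the binary and the published checksum string; a False result means the file differs from the build that checksum describes.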

Build from source

Don't trust pre-built binaries? Build it yourself:

curl -fsSL https://raw.githubusercontent.com/sh3ll3x3c/native-devtools-mcp/master/scripts/build-from-source.sh | bash

The script clones the repo, optionally opens it for review before building, compiles the release binary, and runs setup. See scripts/build-from-source.sh.

Audit the code

SECURITY_AUDIT.md documents exactly which permissions are used, where in the source code, and includes an LLM audit prompt you can paste into any AI model to perform an independent security review.

What this server does NOT do

  • No unsolicited network access — the server never phones home. Network is only used when the MCP client explicitly invokes app_connect (WebSocket to a local debug server) or when you run the verify subcommand (fetches checksums from GitHub)
  • No file scanning — does not read or index your files. The only file reads are load_image (reads a path the MCP client explicitly provides) and short-lived temp files for screenshots (deleted immediately after capture)
  • No background persistence — exits when the MCP client disconnects
  • No data exfiltration — screenshots are returned to the MCP client via stdout, never stored or transmitted elsewhere

🔍 Two Approaches to Interaction

We provide two ways for agents to interact, allowing them to choose the best tool for the job.

1. The "Visual" Approach (Universal)

Best for: almost any app (Electron, Qt, games, browsers).

  • How it works: The agent takes a screenshot, analyzes it visually (or uses OCR), and clicks at coordinates.
  • Tools: take_screenshot, find_text, click, type_text (plus load_image / find_image for icons and shapes).
  • Example: "Click the button that looks like a gear icon." → use find_image with a gear template.

2. The "Structural" Approach (AppDebugKit)

Best for: Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).

  • How it works: The agent connects to a debug port and queries the UI tree (like HTML DOM).
  • Tools: app_connect, app_query, app_click.
  • Example: app_click(element_id="submit-button").

🧩 Template Matching (find_image)

Use find_image when the target is not text (icons, toggles, custom controls) and OCR or find_text cannot identify it.

Typical flow:

  1. take_screenshot(app_name="MyApp") → screenshot_id
  2. load_image(path="/path/to/icon.png") → template_id
  3. find_image(screenshot_id="...", template_id="...") → matches with screen_x/screen_y
  4. click(x=..., y=...)

Fast vs Accurate:

  • fast (default): uses downscaling and early-exit for speed.
  • accurate: uses full-resolution, wider scale search, and smaller stride for thorough matching.

Optional inputs like mask_id, search_region, scales, and rotations can improve precision and performance.
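
To give a feel for what template matching with early exit means, here is a deliberately naive sketch: it slides a small grayscale template over an image and abandons a candidate window as soon as the accumulated difference exceeds a budget. This is not the algorithm find_image uses; the in-memory image format and the function name find_template are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical format: a grayscale image as a list of rows of pixel values.
Image = List[List[int]]


def find_template(image: Image, template: Image, max_diff: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first window whose sum of absolute differences is <= max_diff."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            diff = 0
            for ty in range(th):
                for tx in range(tw):
                    diff += abs(image[y + ty][x + tx] - template[ty][tx])
                if diff > max_diff:
                    break  # early exit: this window is already over budget
            else:
                return (x, y)  # every row stayed within the budget
    return None


image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
]
template = [[9, 8], [7, 6]]
print(find_template(image, template))  # (1, 2)
```

The returned position is the top-left corner of the match; a click target would normally be the center of the matched region.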

🌐 Browser Automation (CDP)

Connect to Chrome or Electron apps via the Chrome DevTools Protocol for DOM-level automation — more reliable than coordinate-based clicking for web content.

# Launch Chrome with remote debugging
launch_app(app_name="Google Chrome", args=["--remote-debugging-port=9222", "--user-data-dir=/tmp/chrome-profile"])

# Connect and automate
cdp_connect(port=9222)
cdp_navigate(url="https://example.com")
cdp_take_snapshot()           # accessibility tree with element UIDs
cdp_fill(uid="10", value="search query")
cdp_press_key(key="Enter")
cdp_wait_for(text=["Results"])

16 CDP tools — click, hover, fill, type, press key, navigate, handle dialogs, manage tabs, evaluate JS, and more. Works with Chrome 136+, Chromium, and Electron apps (Signal, Discord, VS Code, Slack). See AGENTS.md for full tool reference.

Chrome 136+ note: Requires --user-data-dir=<path> alongside --remote-debugging-port (Chrome silently ignores the debug port with the default profile). Electron apps only need --remote-debugging-port.
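
Because Chrome fails silently when the profile flag is missing, it can help to build the launch arguments in one place and fail loudly instead. A minimal sketch, assuming a hypothetical helper named debug_launch_args whose result you would pass as args to launch_app:

```python
from typing import List, Optional


def debug_launch_args(port: int, user_data_dir: Optional[str] = None,
                      is_chrome: bool = True) -> List[str]:
    """Build remote-debugging flags, enforcing the Chrome 136+ profile requirement."""
    if is_chrome and not user_data_dir:
        raise ValueError("Chrome 136+ needs --user-data-dir with --remote-debugging-port")
    args = [f"--remote-debugging-port={port}"]
    if user_data_dir:
        args.append(f"--user-data-dir={user_data_dir}")
    return args


print(debug_launch_args(9222, "/tmp/chrome-profile"))
print(debug_launch_args(9223, is_chrome=False))  # Electron: the port flag alone is enough
```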

📱 Android Support

Android support is built-in. The MCP server communicates with Android devices over ADB (USB or Wi-Fi), providing screenshots, input simulation, UI element search, and app management.

Prerequisites

  1. ADB installed on the host machine (brew install android-platform-tools on macOS, or install via Android SDK)
  2. USB debugging enabled on the Android device (Settings > Developer options > USB debugging)
  3. ADB server running — starts automatically when you run adb devices

Android tools

All Android tools are prefixed with android_ and appear dynamically after connecting to a device:

| Tool | Description |
|------|-------------|
| android_list_devices | List all ADB-connected devices (always available) |
| android_connect | Connect to a device by serial number |
| android_disconnect | Disconnect from the current device |
| android_screenshot | Capture the device screen |
| android_find_text | Find UI elements by text (via uiautomator) |
| android_click | Tap at screen coordinates |
| android_swipe | Swipe between two points |
| android_type_text | Type text on the device |
| android_press_key | Press a key (e.g., KEYCODE_HOME, KEYCODE_BACK) |
| android_launch_app | Launch an app by package name |
| android_list_apps | List installed packages |
| android_get_display_info | Get screen resolution and density |
| android_get_current_activity | Get the current foreground activity |

Typical workflow

android_list_devices          → find your device serial
android_connect(serial="...")  → connect (unlocks android_* tools)
android_screenshot            → see what's on screen
android_find_text(text="OK")  → locate a button
android_click(x=..., y=...)   → tap it

Known issues

MIUI / HyperOS (Xiaomi, Redmi, POCO devices): Input injection (android_click, android_type_text, android_press_key, android_swipe) and android_find_text (via uiautomator) require an additional security toggle:

Settings > Developer options > USB debugging (Security settings) — enable this toggle. MIUI may require you to sign in with a Mi account to enable it.

Without this, you'll see "INJECT_EVENTS" permission errors for input tools and "could not get idle state" errors for android_find_text. Screenshot and device info tools work without this toggle.

Wireless ADB: To connect without a USB cable, first connect via USB and run:

adb tcpip 5555
adb connect <phone-ip>:5555

Then use the <phone-ip>:5555 serial in android_connect.
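
If you script this step, it is worth validating the address before handing it to adb. The sketch below only builds the two commands shown above and the serial string; it does not run anything. The function name wireless_adb_plan and the example address are assumptions.

```python
import ipaddress
from typing import List, Tuple


def wireless_adb_plan(phone_ip: str, port: int = 5555) -> Tuple[List[List[str]], str]:
    """Return the adb commands to run (as argv lists) and the serial for android_connect."""
    ipaddress.IPv4Address(phone_ip)  # raises ValueError for a malformed IPv4 address
    serial = f"{phone_ip}:{port}"
    commands = [["adb", "tcpip", str(port)], ["adb", "connect", serial]]
    return commands, serial


# 192.168.1.50 is a placeholder address, not a real device.
commands, serial = wireless_adb_plan("192.168.1.50")
for argv in commands:
    print(" ".join(argv))
print(serial)
```

Each argv list can be passed to subprocess.run while the phone is still connected over USB for the first command.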

Smoke tests

Smoke tests verify all Android tools against a real connected device. They are #[ignore]d by default and must be run explicitly:

cargo test --test android_smoke_tests -- --ignored --test-threads=1

Tests must run sequentially (--test-threads=1) since they share a single physical device. The device must be unlocked and awake.

🏗️ Architecture

graph TD
    Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
    Server -->|Direct API| Sys[System APIs]
    Server -->|WebSocket| Debug[AppDebugKit]
    Server -->|ADB Protocol| Android[Android Device]

    subgraph "Your Machine"
        Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
        Sys -->|Input| Win[Win32 / SendInput]
        Sys -->|Text Search| UIA[UI Automation]
        Debug -.->|Inspect| App[Target App]
    end

    subgraph "Android Device (USB/Wi-Fi)"
        Android -->|screencap| Screen[Screenshots]
        Android -->|input| Input[Tap / Swipe / Type]
        Android -->|uiautomator| UITree[UI Hierarchy]
    end
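
On the client side of the diagram, each tool invocation travels as a JSON-RPC 2.0 request using the MCP tools/call method. The sketch below builds such a message for the click tool; the helper name tool_call_request is an assumption, and MCP clients normally construct these messages for you.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


message = tool_call_request(1, "click", {"x": 250, "y": 275})
print(message)
```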

| OS | Feature | API Used |
|----|---------|----------|
| macOS | Screenshots | screencapture (CLI) |
| | Input | CGEvent (CoreGraphics) |
| | Text Search (find_text) | Accessibility API (primary), Vision OCR (fallback) |
| | Element Inspection (element_at_point) | AXUIElementCopyElementAtPosition + AX tree walk fallback (Accessibility API) |
| | Hover Tracking (start_hover_tracking) | CGEvent cursor + Accessibility API polling |
| | Screen Recording (start_recording) | CGWindowListCreateImage at configurable fps |
| | OCR | VNRecognizeTextRequest (Vision Framework) |
| Windows | Screenshots | BitBlt (GDI) |
| | Input | SendInput (Win32) |
| | Text Search (find_text) | UI Automation (primary), WinRT OCR (fallback) |
| | Element Inspection (element_at_point) | IUIAutomation::ElementFromPoint (UI Automation) |
| | Hover Tracking (start_hover_tracking) | GetCursorPos + UI Automation polling |
| | Screen Recording (start_recording) | BitBlt (GDI) at configurable fps |
| | OCR | Windows.Media.Ocr (WinRT) |
| Android | Screenshots | screencap / ADB framebuffer |
| | Input | adb shell input (tap, swipe, text, keyevent) |
| | Text Search (find_text) | uiautomator dump (accessibility tree) |
| | Device Communication | adb_client crate (native Rust ADB protocol) |

Screenshot Coordinate Precision

Screenshots include metadata for accurate coordinate conversion:

  • screenshot_origin_x/y: Screen-space origin of the captured area (in points)
  • screenshot_scale: Display scale factor (e.g., 2.0 for Retina displays)
  • screenshot_pixel_width/height: Actual pixel dimensions of the image
  • screenshot_window_id: Window ID (for window captures)

Coordinate conversion:

screen_x = screenshot_origin_x + (pixel_x / screenshot_scale)
screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)
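
As a worked example of the conversion above, the sketch below turns a pixel position inside a screenshot into screen coordinates for click. The field names follow the metadata listed above; the ScreenshotMeta class and the pixel_to_screen helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScreenshotMeta:
    screenshot_origin_x: float  # screen-space origin, in points
    screenshot_origin_y: float
    screenshot_scale: float     # e.g. 2.0 on a Retina display


def pixel_to_screen(meta: ScreenshotMeta, pixel_x: float, pixel_y: float) -> Tuple[float, float]:
    """Convert image pixel coordinates to screen coordinates (points)."""
    return (
        meta.screenshot_origin_x + pixel_x / meta.screenshot_scale,
        meta.screenshot_origin_y + pixel_y / meta.screenshot_scale,
    )


meta = ScreenshotMeta(screenshot_origin_x=100, screenshot_origin_y=200, screenshot_scale=2.0)
print(pixel_to_screen(meta, 300, 150))  # (250.0, 275.0)
```

On a 2x display, a button found at pixel (300, 150) in a window captured at origin (100, 200) sits at screen point (250, 275).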

Implementation notes:

  • Window captures (macOS): Uses screencapture -o which excludes window shadow. The captured image dimensions match kCGWindowBounds × scale exactly, ensuring click coordinates derived from screenshots land on intended UI elements.
  • Region captures: Origin coordinates are aligned to integers to match the actual captured area.

⚠️ Operational Safety

  • Hands Off: When the agent is "driving" (clicking/typing), do not move your mouse or type.
    • Why? Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
  • Focus Matters: Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.

🪟 Windows Notes

Works out of the box on Windows 10/11.

  • Uses standard Win32 APIs (GDI, SendInput).
  • find_text uses UI Automation (UIA) as the primary search mechanism, querying the accessibility tree for element names. This is the same accessibility-first approach used on macOS (with the Accessibility API). Falls back to OCR automatically when UIA finds no matches.
  • OCR uses the built-in Windows Media OCR engine (offline).
  • Note: Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.
  • Screen Recording Performance: Screen recording uses GDI/BitBlt at configurable fps (default 5). For higher fps requirements or game capture scenarios, DXGI Desktop Duplication API would provide hardware-accelerated capture — this is a planned future upgrade.
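
The accessibility-first lookup with OCR fallback that find_text uses on both platforms follows a simple pattern: try the precise source first and move on only when it returns nothing. The sketch below shows that pattern with stand-in finder functions; the names and the match format are assumptions, not the server's API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical match format: (text, screen_x, screen_y).
Match = Tuple[str, int, int]
Finder = Callable[[str], List[Match]]


def find_with_fallback(query: str, finders: Sequence[Tuple[str, Finder]]) -> Tuple[str, List[Match]]:
    """Try each finder in order and return the first non-empty result with its source name."""
    for name, finder in finders:
        matches = finder(query)
        if matches:
            return name, matches
    return "none", []


# Stand-ins for the real backends: the accessibility tree finds nothing, OCR finds one match.
def fake_accessibility(query: str) -> List[Match]:
    return []


def fake_ocr(query: str) -> List[Match]:
    return [(query, 120, 48)]


source, matches = find_with_fallback("Submit", [("accessibility", fake_accessibility), ("ocr", fake_ocr)])
print(source, matches)  # ocr [('Submit', 120, 48)]
```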

📜 License

MIT © sh3ll3x3c