@cool-mcp/desktop-automation

v1.0.2

MCP server for desktop automation - mouse, keyboard, screenshot, window management

Desktop Automation MCP

A cross-platform (macOS/Windows) desktop automation MCP server that provides mouse, keyboard, screenshot, and window management capabilities.

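Features

  • 🖱️ Mouse operations: click, double-click, right-click, drag, scroll, move
  • ⌨️ Keyboard operations: text input, hotkeys, key control
  • 📸 Screenshots: capture a window by process name, or capture the full screen
  • 🪟 Window management: activate a window, get window information, list all windows
  • 🖥️ Multi-monitor support: automatically handles multiple monitors and DPI scaling
  • 📐 Coordinate conversion: automatic conversion of normalized bbox (0-1000) coordinates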

Quick Start

Using npx (recommended)

No installation is required; reference the package directly in your MCP configuration:

{
  "mcpServers": {
    "desktop-automation": {
      "command": "npx",
      "args": ["-y", "@cool-mcp/desktop-automation"],
      "disabled": false
    }
  }
}

Global installation

npm install -g @cool-mcp/desktop-automation

Then configure:

{
  "mcpServers": {
    "desktop-automation": {
      "command": "desktop-automation-mcp",
      "disabled": false
    }
  }
}

Local development

git clone <repo-url>
cd desktop-automation-mcp
npm install
npm run build

MCP configuration locations

  • Kiro: ~/.kiro/settings/mcp.json, or .kiro/settings/mcp.json in the project
  • Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
  • Cursor: .cursor/mcp.json in the project
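
If one of these files already lists other servers, the new entry should be merged in rather than overwriting the file. The sketch below shows one way to do that on an already-parsed config object. The helper name `withDesktopAutomation` and its `mode` parameter are hypothetical, not part of the package; only the two entry shapes come from the configuration snippets above.

```javascript
// Hypothetical helper (not part of the package): returns a copy of an MCP
// config object with the desktop-automation entry added, leaving any other
// configured servers untouched.
function withDesktopAutomation(config, mode = "npx") {
  const entry =
    mode === "global"
      ? { command: "desktop-automation-mcp", disabled: false }
      : {
          command: "npx",
          args: ["-y", "@cool-mcp/desktop-automation"],
          disabled: false,
        };

  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "desktop-automation": entry,
    },
  };
}

// Example: an existing config that already defines another server.
const existing = { mcpServers: { other: { command: "other-server" } } };
const merged = withDesktopAutomation(existing);
console.log(JSON.stringify(merged, null, 2));
```

Reading and writing the file itself is left out on purpose, since the path depends on the client.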

Tools

Screenshots

| Tool | Description |
|------|------|
| screenshot | Capture a screenshot of the specified process's window |
| screenshot_fullscreen | Capture the entire screen |

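Mouse operations

| Tool | Description |
|------|------|
| click | Mouse click (accepts bbox or pixel coordinates) |
| move_mouse | Move the mouse |
| drag | Drag with the mouse |
| scroll | Scroll the mouse wheel |
| get_mouse_position | Get the current mouse position |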

Keyboard operations

| Tool | Description |
|------|------|
| type_text | Type text |
| hotkey | Press a hotkey combination |
| press_key | Press a key down (without releasing it) |
| release_key | Release a key |

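Window management

| Tool | Description |
|------|------|
| activate_window | Activate a window (bring it to the front) |
| get_window_info | Get window information for the specified process |
| get_active_window | Get information about the currently active window |
| list_windows | List all open windows |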

Other

| Tool | Description |
|------|------|
| get_displays | Get information about all displays |
| convert_bbox_to_screen | Convert bbox coordinates to screen coordinates |
| wait | Wait for the specified duration |
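
An MCP client normally invokes these tools for you, but it can help to see what goes over the wire. In the MCP protocol a tool invocation is a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name and its arguments. The sketch below builds such a request; the helper `buildToolCall` and its id counter are illustrative, and the `processName` argument is taken from the examples in this README rather than from a published schema.

```javascript
// Illustrative request builder; a real MCP client library does this for you.
let nextId = 1;

function buildToolCall(name, args = {}) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument name `processName` follows the README examples (assumption).
const request = buildToolCall("screenshot", { processName: "Chrome" });
console.log(JSON.stringify(request, null, 2));
```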

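Coordinate system

bbox coordinates (normalized to 0-1000)

The coordinates output by the model are normalized to the 0-1000 range:

bbox = [x1, y1, x2, y2]
  • x1, y1: top-left corner
  • x2, y2: bottom-right corner
  • Range: 0-1000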
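
Because these values come from a model, it may be worth validating them before acting on them. The following is a minimal sketch, assuming the space-separated string form (`"x1 y1 x2 y2"`) used in the `click` examples in this README; `parseBbox` is a hypothetical helper, and the ordering check (top-left before bottom-right) is my own assumption, not documented behavior of the server.

```javascript
// Hypothetical validator for a normalized bbox, given as "x1 y1 x2 y2"
// or as an array of four numbers.
function parseBbox(input) {
  const parts = Array.isArray(input)
    ? input
    : String(input).trim().split(/\s+/).map(Number);

  if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n))) {
    throw new TypeError("bbox must contain exactly four numbers");
  }
  if (parts.some((n) => n < 0 || n > 1000)) {
    throw new RangeError("bbox values must be within 0-1000");
  }

  const [x1, y1, x2, y2] = parts;
  // Assumption: (x1, y1) is the top-left corner, so it must not exceed (x2, y2).
  if (x1 > x2 || y1 > y2) {
    throw new RangeError("bbox corners are out of order");
  }
  return { x1, y1, x2, y2 };
}

console.log(parseBbox("450 300 550 350"));
```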

Coordinate conversion

When using bbox, you must also pass windowBounds (taken from the screenshot return value):

// Return value of screenshot
{
  "base64": "...",
  "width": 1920,
  "height": 1080,
  "windowBounds": {
    "x": 100,    // X position of the window on the screen
    "y": 50,     // Y position of the window on the screen
    "width": 1200,
    "height": 800
  }
}

// Pass windowBounds when clicking
click({
  bbox: "450 300 550 350",
  windowBounds: { x: 100, y: 50, width: 1200, height: 800 }
})

Conversion formula:

centerX = (x1 + x2) / 2 / 1000 * windowWidth + windowX
centerY = (y1 + y2) / 2 / 1000 * windowHeight + windowY
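
As a worked example, the formula can be written as a small function and applied to the `click` call shown earlier. This is a sketch of the arithmetic only, not the server's actual implementation; the name `bboxToScreen` is hypothetical, and rounding to whole pixels is an assumption the formula itself does not state.

```javascript
// Sketch of the conversion formula: center of a normalized bbox, scaled to
// the window size and offset by the window's position on screen.
function bboxToScreen([x1, y1, x2, y2], windowBounds) {
  const { x, y, width, height } = windowBounds;
  return {
    // Math.round is an assumption: screen coordinates are treated as whole pixels.
    x: Math.round(((x1 + x2) / 2 / 1000) * width + x),
    y: Math.round(((y1 + y2) / 2 / 1000) * height + y),
  };
}

const point = bboxToScreen([450, 300, 550, 350], {
  x: 100,
  y: 50,
  width: 1200,
  height: 800,
});
console.log(point); // { x: 700, y: 310 }
```

Step by step: the bbox center is (500, 325) in normalized units, which scales to (600, 260) inside a 1200×800 window, and the window offset (100, 50) moves it to (700, 310) on screen.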

Examples

Capture the Chrome window and click

// 1. Activate the window
await activate_window({ processName: 'Chrome' })

// 2. Take a screenshot
const result = await screenshot({ processName: 'Chrome' })
const { windowBounds } = result

// 3. Click (using bbox coordinates)
await click({
  bbox: '500 100 600 130',
  windowBounds,
  button: 'left'
})

Type text and press Enter

// Click the input field
await click({ x: 500, y: 300 })

// Type the text, followed by Enter
await type_text({ text: 'Hello World\n' })

Using hotkeys

// Copy
await hotkey({ keys: 'ctrl c' })  // Windows
await hotkey({ keys: 'cmd c' })   // macOS

// Save
await hotkey({ keys: 'ctrl s' })

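Platform differences

macOS

  • Uses cmd instead of ctrl as the primary modifier key
  • Screenshots use the screencapture command
  • Window management uses AppleScript

Windows

  • Uses ctrl as the primary modifier key
  • Text input goes through the clipboard (more reliable)
  • Window management uses PowerShell + Win32 API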
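
Scripts that need to run on both platforms can pick the modifier at runtime instead of hard-coding it. Below is a minimal sketch using Node's `process.platform` (which reports `darwin` on macOS and `win32` on Windows); the helpers `primaryModifier` and `shortcut` are hypothetical, and the space-separated key string follows the `hotkey` examples in this README.

```javascript
// Hypothetical helpers: choose the primary modifier for the current platform.
function primaryModifier(platform = process.platform) {
  return platform === "darwin" ? "cmd" : "ctrl";
}

// Builds a key string in the "modifier key" form used by the hotkey examples.
function shortcut(key, platform = process.platform) {
  return `${primaryModifier(platform)} ${key}`;
}

console.log(shortcut("c", "darwin")); // cmd c
console.log(shortcut("c", "win32")); // ctrl c
```

The result can then be passed as the `keys` argument, for example `hotkey({ keys: shortcut('s') })` for Save.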

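Dependencies

  • @modelcontextprotocol/sdk: MCP SDK
  • @nut-tree/nut-js: cross-platform mouse and keyboard control
  • jimp: image processing
  • active-win: active window information
  • node-screenshots: screenshot capture

System requirements

  • Node.js >= 18
  • macOS or Windows
  • Accessibility permission must be granted (macOS)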
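
A quick preflight check for the first two requirements might look like the sketch below. Both helper names are hypothetical, and the macOS accessibility permission is not checked here because it has to be granted in the system settings.

```javascript
// Hypothetical preflight helpers for the requirements listed above.
function meetsNodeRequirement(version, minMajor = 18) {
  // Accepts "18.19.0" as well as "v18.19.0".
  const major = Number.parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

function isSupportedPlatform(platform) {
  return platform === "darwin" || platform === "win32";
}

console.log({
  node: meetsNodeRequirement(process.versions.node),
  platform: isSupportedPlatform(process.platform),
});
```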

License

MIT