spectrai-claw

v0.4.0

Desktop automation MCP Server for ClaudeOps

SpectrAI Claw

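Cross-platform desktop automation MCP server. On macOS, the Node.js server now keeps a persistent connection to a Swift daemon, and v0.4 introduces three detection paths: AX as the primary path, Vision OCR as the fallback, and CDP as a dedicated browser path.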

Architecture

AI → MCP Server (Node.js) → [zod] → DaemonClient (Unix socket)
                                              ↓
                                        Swift daemon
                                              ↓
        ┌───────────────────┬─────────────────┴──┬────────────────────┐
        ▼                   ▼                    ▼                    ▼
    AXorcist        ScreenCaptureKit     Vision (OCR/Rect)     CDP (WebSocket)
 Enhanced web AX       Screenshots      Fallback supplement     Browser path

Three-path detection modes (describe_screen.mode)

Allowed values: auto / ax_only / ax_plus_vision / ax_plus_cdp / cdp_only / vision_only

| mode | Best for | Recognition rate | Speed | Requires |
|---|---|---|---|---|
| ax_only | Native AppKit / Tauri (Electron) | 80-95% | Very fast (~80 ms) | AX permission |
| ax_plus_vision | Web page content / backfill | 95%+ | Medium (~1 s) | + Screen Recording |
| ax_plus_cdp | Chrome / Edge | 99% | Very fast (~30 ms) | + Chrome --remote-debugging-port |
| cdp_only | Chrome-specific tasks | 99% | Very fast (~30 ms) | + debug port |
| vision_only | Games / Java apps that do not expose AX | 70-85% | Medium (~1 s) | + Screen Recording |
| auto | Default | Adaptive | Adaptive | - |
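
A client that forwards a user-supplied mode may want to validate it before calling describe_screen. The following is a minimal sketch, assuming only the mode names listed above and that auto is the default. The `parseMode` helper is hypothetical and hand-rolled; the architecture diagram suggests the server validates input with zod, but its actual schema is not shown here.

```typescript
// Mode names come from the table above; the helper itself is illustrative.
const DESCRIBE_MODES = [
  "auto",
  "ax_only",
  "ax_plus_vision",
  "ax_plus_cdp",
  "cdp_only",
  "vision_only",
] as const;

type DescribeMode = (typeof DESCRIBE_MODES)[number];

// Hypothetical helper: returns a valid mode or throws on anything else.
function parseMode(value: unknown): DescribeMode {
  // The table lists "auto" as the default, so a missing value maps to it.
  if (value === undefined) return "auto";
  if (
    typeof value === "string" &&
    (DESCRIBE_MODES as readonly string[]).indexOf(value) !== -1
  ) {
    return value as DescribeMode;
  }
  throw new Error(`Unknown describe_screen mode: ${String(value)}`);
}
```

Rejecting unknown strings early keeps typos such as "ax-only" from reaching the server as an unexpected value.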

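Measured performance (main session)

| Scenario | mode | Elements detected | Notes |
|---|---|---|---|
| SpectrAI (Tauri) | ax_only | 124 (vs 8) | Web AX tree becomes available after waking it with AXManualAccessibility |
| Chrome home page | ax_only | 48 (vs 41) | Chrome mainline has disabled the wake-up API, so the gain is limited |
| Chrome home page | ax_plus_vision | 140 | Vision adds 92 web page elements (including the main CTA button) |

These figures match the v0.4 integration verification: Tauri/Electron improves markedly, while Chrome relies on Vision/CDP to fill the gap when the AX path is restricted.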

Enabling CDP for the browser

# 1. Quit Chrome
# 2. Relaunch it with the debugging flag
open -a "Google Chrome" --args --remote-debugging-port=9222

# 3. Verify
curl http://localhost:9222/json | head -20

Once enabled, ax_plus_cdp mode detects the port automatically.
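
The same check can be done from code before choosing a CDP mode. This is a minimal sketch, assuming the debug port from the commands above (9222) and Chrome's standard `/json/version` endpoint, whose response includes a `webSocketDebuggerUrl` field. The `isCdpAvailable` helper and the injected `GetJson` type are hypothetical and not part of this package.

```typescript
// Minimal JSON getter, injected so the sketch does not depend on an HTTP library.
type GetJson = (url: string) => Promise<unknown>;

// Hypothetical helper: resolves true when a Chrome debug endpoint answers on the port.
async function isCdpAvailable(getJson: GetJson, port = 9222): Promise<boolean> {
  try {
    const info = await getJson(`http://localhost:${port}/json/version`);
    // Chrome replies with an object that carries webSocketDebuggerUrl.
    return (
      typeof info === "object" &&
      info !== null &&
      "webSocketDebuggerUrl" in info
    );
  } catch {
    // Connection refused, timeout, or a non-JSON reply all mean "not available".
    return false;
  }
}
```

On Node.js 18+, `getJson` could be `(url) => fetch(url).then((r) => r.json())`.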

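Installation

Install from npm (recommended)

This package is a standard stdio MCP server. The entry point is the server itself, so no subcommand is needed and npx can launch it directly:

npx spectrai-claw

Configure it in an MCP client (Claude Desktop, Claude Code, and so on):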

{
  "mcpServers": {
    "spectrai-claw": {
      "command": "npx",
      "args": ["-y", "spectrai-claw"]
    }
  }
}

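On Windows the server is a pure Node implementation: npx spectrai-claw runs as-is, with no native compilation. On macOS the first run automatically builds and launches the Swift daemon (requires Xcode Command Line Tools).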

Build from source

cd mcps/spectrai-claw
npm install
npm run build:all

Requirements

  • macOS 14+
  • Node.js 18+
  • Xcode Command Line Tools (or full Xcode)
  • The first run triggers system permission prompts for:
    • Screen Recording
    • Accessibility

MCP tools (macOS daemon path)

| Tool | Purpose | Key parameters |
|---|---|---|
| describe_screen | Screenshot + UI element detection + SoM annotation | target?, annotated?, allow_web_focus?, mode? |
| click | Click an element or coordinates | element_id (preferred), snapshot_id?, x?, y? |
| type_text | Unicode input (works for Chinese) | text, element_id?, snapshot_id? |
| hotkey | Send a key combination | keys[], hold_ms? |
| scroll | Scroll the mouse wheel | direction, amount, x?, y? |
| list_apps | List running applications | - |
| activate_app | Bring an application to the foreground | bundle_id / name |
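
MCP clients invoke these tools through the protocol's `tools/call` JSON-RPC method. The sketch below builds two such requests, assuming the tool and parameter names from the table; the parameter value types (for example, `annotated` as a boolean) and the placeholder ids are assumptions, and the `toolCall` helper is hypothetical.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper that wraps a tool name and its arguments in a request.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// 1. Capture the screen and detect elements.
const describeRequest = toolCall("describe_screen", {
  mode: "auto",
  annotated: true, // assumed to be a boolean
});

// 2. Click a detected element. The ids are placeholders for values that
//    would be taken from the describe_screen result.
const clickRequest = toolCall("click", {
  snapshot_id: "<snapshot_id>",
  element_id: "<element_id>",
});
```

Passing snapshot_id together with element_id, rather than raw coordinates, matches the table's note that element_id is the preferred way to target a click.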

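v0.4+: native AX actions first

When snapshot_id + element_id are passed, click / type_text prefer native macOS Accessibility actions:

  • For a left single click with no modifier keys, AXUIElementPerformAction(kAXPressAction) is tried first; on success the real mouse cursor does not move.
  • When text input targets an input element such as AXTextField / AXTextArea / AXSearchField, AXFocused + AXSetValue(kAXValueAttribute) is tried first. clear_existing=true replaces the value outright; by default the text is appended to the existing value.
  • If the AX element cannot be re-located, is stale, or its action/value is unavailable or fails, the call falls back automatically to the original CGEvent coordinate click / keyboard input.

This is also the key to approaching the "background / no cursor movement" experience of Codex App Computer Use: detection still relies on the AX tree and screenshots, while execution prefers programmatic UI actions.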
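
The click rules above amount to a try-native-then-fall-back pattern. The following is a minimal sketch of that control flow only, not the daemon's Swift implementation: the `ClickBackends` interface, the request shape, and the function names are all hypothetical stand-ins for the AX press and CGEvent paths.

```typescript
// Hypothetical backends standing in for the native AX and CGEvent paths.
interface ClickBackends {
  // Resolves true when the native AX press succeeded.
  axPress(snapshotId: string, elementId: string): Promise<boolean>;
  // Synthesizes a click at screen coordinates (moves the real cursor).
  coordinateClick(x: number, y: number): Promise<void>;
}

interface ClickRequest {
  snapshotId?: string;
  elementId?: string;
  x: number;
  y: number;
  button?: "left" | "right";
  modifiers?: string[];
}

async function clickWithFallback(
  req: ClickRequest,
  backends: ClickBackends
): Promise<{ via: "ax_press" | "coordinates" }> {
  // The native path only applies to a plain left click on a known element.
  const plainLeftClick =
    (req.button ?? "left") === "left" && (req.modifiers ?? []).length === 0;
  if (plainLeftClick && req.snapshotId && req.elementId) {
    try {
      if (await backends.axPress(req.snapshotId, req.elementId)) {
        return { via: "ax_press" };
      }
    } catch {
      // Stale element or unavailable action: fall through to coordinates.
    }
  }
  await backends.coordinateClick(req.x, req.y);
  return { via: "coordinates" };
}
```

Treating both a thrown error and a `false` result as "try the next path" keeps the fallback uniform, at the cost of hiding why the native action failed.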

daemon management

  • Auto-launch: DaemonLifecycle.ensure() first tries to connect and spawns the daemon automatically if that fails.
  • Manual start:
./src/swift-helper/.build/release/spectrai-claw-helper daemon run --socket <path>
  • Default socket: ~/Library/Application Support/spectrai-claw/claw.sock
  • Health check (recommended):
node scripts/smoke-daemon.mjs
  • Low-level protocol: see docs/ipc-protocol.md
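
The connect-then-spawn behavior described for auto-launch can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the actual DaemonLifecycle.ensure() code: the `DaemonDeps` interface, the retry count, and the delay are hypothetical.

```typescript
// Injected dependencies keep the sketch independent of net / child_process.
interface DaemonDeps<C> {
  connect(socketPath: string): Promise<C>; // rejects when nothing is listening
  spawnDaemon(socketPath: string): void;   // starts the daemon process
  sleep(ms: number): Promise<void>;
}

async function ensureDaemon<C>(
  socketPath: string,
  deps: DaemonDeps<C>,
  retries = 10,  // assumed value
  delayMs = 100  // assumed value
): Promise<C> {
  // Fast path: a daemon is already running.
  try {
    return await deps.connect(socketPath);
  } catch {
    // Not reachable yet: spawn it below.
  }

  deps.spawnDaemon(socketPath);

  // Poll until the freshly spawned daemon accepts connections.
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    await deps.sleep(delayMs);
    try {
      return await deps.connect(socketPath);
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(
    `daemon did not accept connections on ${socketPath}: ${String(lastError)}`
  );
}
```

In a real client, `connect` would typically wrap `net.createConnection` on the Unix socket and `spawnDaemon` would wrap `child_process.spawn` with the manual start command shown above.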

Testing

# Unit tests (mock socket)
npm run test:daemon-client

# E2E (spawns a real daemon)
npm run test:e2e

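Known limitations

  • AXManualAccessibility is restricted in Chrome mainline (issue 37465); it still works for Tauri/Electron
  • Vision OCR has a cold start of about 500 ms, after which results come from a cache; it also requires the Screen Recording permission
  • CDP requires the debug port to be enabled manually; an already running Chrome cannot be woken up dynamically
  • macOS 14+ only (depends on ScreenCaptureKit)
  • The Windows side still uses the legacy path (not migrated to the Swift daemon architecture), but click_element / keyboard_type now prefer native UIA actions: Invoke/Toggle/Selection/ExpandCollapse/Focus and ValuePattern.SetValue are tried first, with automatic fallback to HID mouse events / SendKeys on failure.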

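Windows action accuracy improvements (screenshot annotation pipeline)

Element detection for screenshot(annotate=true) / zoom_screenshot has been semantically enhanced:

  • UIA candidate filtering and scoring: enumeration additionally collects IsEnabled / IsOffscreen and control patterns (Invoke/Toggle/SelectionItem/ExpandCollapse/Value), filters out off-screen elements and pattern-less pure Image noise, and ranks natively actionable elements toward the top of the annotation list by "operability" (badge numbers stay stable, so click_element(number) is unaffected).
  • OCR anchored to UIA: if OCR fallback text falls within the bounds of a UIA element that has a pattern, that native element replaces the bare OCR coordinates (Src=OCR_UIA), so the action goes through a UIA pattern rather than a blind coordinate click; the OCR coordinates are kept as a fallback only when there is no hit. The fallback chain is UIA pattern → UIA coordinates → OCR coordinates → vision.
  • Post-action verification: after a native UIA action, the target element's state is read back (ToggleState / ExpandCollapseState / IsSelected / Value / focus), and the response includes verify=verified | state_not_changed | needs_resnapshot | uncertain, so the caller can judge whether the action actually took effect.

This pipeline depends on PowerShell + UIA in a local Windows desktop session, so real-machine behavior must be verified by connecting to the MCP server directly on the local machine.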
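
The OCR anchoring step can be pictured as a point-in-rectangle lookup. This is a minimal sketch under stated assumptions: it tests the center of the OCR box against element bounds and prefers the smallest enclosing element, neither of which is confirmed by the description above. All type and function names are hypothetical, and only the `OCR_UIA` label comes from the text.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface UiaElement { id: number; bounds: Rect; patterns: string[] }
interface OcrHit { text: string; bounds: Rect }

type Target =
  | { src: "OCR_UIA"; element: UiaElement; text: string }
  | { src: "OCR"; x: number; y: number; text: string };

function contains(r: Rect, px: number, py: number): boolean {
  return px >= r.x && px <= r.x + r.width && py >= r.y && py <= r.y + r.height;
}

function anchorOcrHit(hit: OcrHit, elements: UiaElement[]): Target {
  // Assumption: the center of the OCR box decides which element it belongs to.
  const cx = hit.bounds.x + hit.bounds.width / 2;
  const cy = hit.bounds.y + hit.bounds.height / 2;

  // Only elements that expose at least one pattern can be acted on natively.
  const candidates = elements.filter(
    (e) => e.patterns.length > 0 && contains(e.bounds, cx, cy)
  );
  if (candidates.length === 0) {
    // No hit: keep the bare OCR coordinates as the fallback target.
    return { src: "OCR", x: cx, y: cy, text: hit.text };
  }

  // Assumption: the smallest enclosing element is the most specific match.
  candidates.sort(
    (a, b) =>
      a.bounds.width * a.bounds.height - b.bounds.width * b.bounds.height
  );
  return { src: "OCR_UIA", element: candidates[0], text: hit.text };
}
```

Preferring the smallest enclosing element avoids anchoring a button label to the window or panel that contains the button.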
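
A caller can branch on the verify value returned after an action. The sketch below shows one possible policy; the four status strings come from the list above, while the `NextStep` names and the mapping itself are hypothetical choices rather than documented behavior.

```typescript
// Status values as listed for the verify field.
type VerifyStatus =
  | "verified"
  | "state_not_changed"
  | "needs_resnapshot"
  | "uncertain";

// Hypothetical follow-up actions for the calling agent.
type NextStep = "continue" | "retry_with_fallback" | "resnapshot" | "inspect_screen";

function nextStepFor(status: VerifyStatus): NextStep {
  switch (status) {
    case "verified":
      return "continue"; // the state read-back confirmed the action
    case "state_not_changed":
      return "retry_with_fallback"; // e.g. retry through a lower link of the chain
    case "needs_resnapshot":
      return "resnapshot"; // element references are likely out of date
    case "uncertain":
      return "inspect_screen"; // take a fresh screenshot before deciding
  }
}
```

Because the switch covers every member of `VerifyStatus`, the TypeScript compiler flags the function if a new status is added to the type without a matching case.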