npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

claude-screen-mcp

v0.4.0

Published

MCP server that lets Claude see your screen — fills the Anthropic computer-use macOS-only gap for Windows + Linux. OCR + smart vision-diff included.

Readme

claude-screen-mcp

Let Claude see your screen. A cross-platform MCP server for Windows + macOS + Linux with OCR and smart vision-diff. Zero native runtime deps.

License: MIT Node MCP CI

Anthropic's official computer-use MCP for Claude Code is macOS-only today. This server fills the gap for Windows + Linux — and adds two things the official one doesn't have:

  • 🔍 OCR so Claude can read screen text without spending vision tokens
  • 📊 Smart vision-diff so 24/7 monitoring stays economical (skip frames that didn't change)

Quick start

# from source (until npm publish)
git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build

# register with Claude Code
claude mcp add screen -- node "$(pwd)/dist/index.js"

# restart Claude Code, then ask:
# "Take a screenshot and tell me what's on my screen."
# "OCR my screen and tell me if there's an error message anywhere."
# "Watch my screen and ping me when the build finishes."

Tools (10 total)

| Tool | Since | What it does | |---|---|---| | screenshot | v0.1 | Capture full display, auto-resize for vision-token efficiency | | screenshot_region | v0.1 | Capture an (x, y, w, h) region — way cheaper than full | | list_displays | v0.1 | Enumerate connected monitors | | list_windows | v0.1 | List visible windows with optional title filter | | read_screen_text | v0.2 | OCR full screen or region (10-100× cheaper than vision) | | find_text_on_screen | v0.2 | Search OCR'd text, return matching lines + bboxes | | screenshot_if_changed | v0.3 | Capture only when perceptual hash distance ≥ threshold | | get_screen_diff | v0.3 | Distance-only diff — no image returned | | wait_for_change | v0.4 | Long-poll until the screen changes, then return one keyframe | | record_screen | v0.4 | Capture N seconds at low fps and return deduplicated keyframes |

All 8 tools work the same way on Windows (PowerShell + System.Drawing), macOS (screencapture + osascript), and Linux (grim / scrot / import + wmctrl).


Use cases

1. Debug what you see"Why is my React app not rendering? Look at the screen."screenshot → Claude sees the error overlay → suggests fix.

2. Find something specific without burning vision tokens"Is there an error message anywhere on my screen?"find_text_on_screen("error") returns matching line + bbox → Claude calls screenshot_region on just that bbox.

3. Watch-while-task"Ping me when this build finishes."wait_for_change(timeoutMs=300000, threshold=12) — server blocks until the screen actually changes (or 5 min elapses), so the model only spends a turn when something happens. For longer watches, loop screenshot_if_changed(threshold=12) every 30s.

4. Show me what just happened"I saw something flash by, replay the last 15 seconds."record_screen(durationMs=15000, targetFps=2, maxFrames=6) returns up to 6 deduplicated keyframes covering that period in a single tool result — like rewinding a clip without storing video.

5. Read what's on screen, not look at it"What does the current GitHub PR description say?"read_screen_text returns plain text → 10-100× fewer tokens than vision.


Why this exists

Anthropic's official Claude Code computer-use MCP server (v2.1.85+) is macOS-only as of May 2026. Windows and Linux users have no first-party way to give Claude vision into their desktop.

This project fills the gap with three deliberate constraints:

  1. Zero native runtime deps — uses each OS's built-in screenshot tooling (PowerShell + System.Drawing on Win, screencapture on Mac, grim/scrot/import on Linux). No node-gyp, no postinstall flakiness, no platform-specific binaries to bundle.
  2. Single responsibility — only screen capture (read-only). Keyboard / mouse control belongs in a separate server (different threat model). This means it can be safely autostarted in any Claude session without granting input control.
  3. Token-aware by design — auto-resize to maxEdge=1600, JPEG/WebP support, region capture, OCR (skip vision entirely for text), and perceptual-hash diff (skip frames that didn't change).

Quality bar

Every release was reviewed by 3 specialized agents (code quality + silent-failure-hunter + security auditor) before tagging. Across v0.1 → v0.3, the audits caught 16 P0 issues that were fixed before any tag was pushed:

  • v0.1: PowerShell -EncodedCommand BOM / Mac+Linux list_displays returning fake data / tool errors swallowing stderr / displayId argument injection / region OOM / output byte caps
  • v0.2: SCREEN_MCP_OCR_LANGS supply-chain injection (allowlist enforcement) / OCR worker timeout (was unbounded) / no-match token bomb / structured OCR diagnostics / SIGTERM handler
  • v0.3: cache size cap + LRU + 24h stale TTL / dHash channel assert (silent monitoring failure prevention) / cross-tool cache pollution fix / CompareResult.reason to distinguish first-call from real change
  • v0.4: Windows window-title mojibake (PowerShell OEM codepage → UTF-8) / Tesseract v6+ output schema (blocks: true required for line bboxes; without it find_text_on_screen silently returned 0 matches) / get_screen_diff misleading above_threshold reason / two new tools (wait_for_change, record_screen) for real-time-ish workflows

See the commit log for the full audit trail.


Configuration

Environment variables:

| Var | Default | Purpose | |---|---|---| | SCREEN_MCP_LOG_LEVEL | info | debug / info / warn / error. Logs go to stderr. | | SCREEN_MCP_OCR_LANGS | eng+chi_sim | Plus-separated tesseract codes. Allowlist enforced to prevent supply-chain attacks. Allowed: eng, chi_sim, chi_tra, jpn, kor, fra, deu, spa, rus, ita, por, ara, nld, tur, vie, tha, hin, ben, ukr. |

First OCR call downloads ~40 MB of language models from cdn.jsdelivr.net. Subsequent calls reuse the cached worker.


Platform support

| Platform | Capture | Region | Displays | Windows | OCR | Vision-diff | |---|---|---|---|---|---|---| | Windows ≥ 10 | ✅ tested | ✅ | ✅ multi-display | ✅ | ✅ | ✅ | | macOS ≥ 11 | ✅ code | ✅ | 🟡 stub (single only) | ✅ | ✅ | ✅ | | Linux (X11 + Wayland) | ✅ code | ✅ | 🟡 stub (single only) | 🟡 needs wmctrl | ✅ | ✅ |

Windows is the maintainer's primary platform and has end-to-end test coverage. macOS / Linux paths are written and CI-built but not yet end-to-end tested by the maintainer — PRs and issue reports very welcome.


Security & privacy

  • The server runs entirely locally. No screenshot data leaves your machine via this server. (Whatever LLM client connects controls where the image goes — that's the API call you authorized when registering the connector.)
  • OCR text is untrusted input. Anything visible on your screen — notifications, web pages, chat windows, ads — gets passed to the LLM as a tool result. A malicious actor controlling something on your screen could embed prompt-injection content. Tool descriptions and output delimiters (<screen_ocr>...</screen_ocr>) flag this clearly so downstream models can be guided to distrust.
  • Use screenshot_region when you don't need the whole screen.
  • Use read_screen_text instead of screenshot when you only need text — vastly fewer tokens and you're not exposing other windows that happen to be open.

Development

git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp
npm install
npm run build
node tests/e2e-wire.mjs    # spawn server + drive JSON-RPC + verify all 8 tools

Roadmap

  • v0.5screenshot_window(title) precisely scoped to a window's bounds; macOS multi-display enumeration via system_profiler; Linux multi-display via xrandr / wlr-randr; optional vendored tesseract models (SCREEN_MCP_OCR_LANG_PATH) for offline / air-gapped use
  • v1.0 — first-class MCPB bundle for one-click install via Claude Desktop

Why "real-time video" isn't a tool

MCP is request-response and each tool call costs an LLM turn (~1–3 s end-to-end). 24 fps streaming is physically impossible at that latency. Three substitutes cover the real use cases:

  • wait_for_change — like a human watching the screen and only saying something when it changes
  • record_screen — like rewinding a short clip with the boring frames cut out
  • screenshot_if_changed in a loop — for sustained polling under your own pacing

Contributing

PRs especially welcome for:

  • macOS multi-display enumeration (system_profiler SPDisplaysDataType -json parsing)
  • Linux per-output capture (grim -o, scrot --screen)
  • screenshot_window for v0.4
  • Performance regressions if you find any

See CONTRIBUTING.md (TODO).


License

MIT — see LICENSE.


中文 TL;DR

让 Claude 看到你的屏幕。MCP server,跨 Win/Mac/Linux,零原生依赖。

填补 Anthropic 官方 computer-use MCP 仅 macOS 的空白,外加 OCR(省 vision token 10-100x)和智能 vision-diff(让 24/7 监测在 token 经济上可行)。

8 个 tool(截屏 / 区域 / 列显示器 / 列窗口 / OCR / 找文字 / 智能截屏 / 看变化),跨平台一致。每个 release 都过了 3 agent 联合审核(代码质量 + silent failure + security),共修了 16 个 P0 才发出去。

git clone https://github.com/lfzds4399-cpu/claude-screen-mcp
cd claude-screen-mcp && npm install && npm run build
claude mcp add screen -- node "$(pwd)/dist/index.js"
# 重启 Claude Code,然后说"截一张屏幕给我看"

中文 OCR 默认开启(eng+chi_sim),无需额外配置。