npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

@agent-sh/computer-use-linux

v0.2.1

Published

Linux desktop-control MCP server: AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.

Readme

computer-use-linux

Linux desktop control for any MCP host — AT-SPI accessibility trees, portal screenshots, Wayland/X11 input, and multi-compositor window targeting for GNOME, KDE/KWin, Hyprland, i3, and COSMIC.

CI crates.io License: MIT

Current release: v0.2.1. The Rust crate is published as computer-use-linux, and the npm wrapper is published as @agent-sh/computer-use-linux.

What this is

computer-use-linux is a Rust MCP server and CLI for Linux desktop control. The crate ships the main computer-use-linux binary plus a small computer-use-linux-cosmic helper used only for COSMIC Wayland window management. Any MCP host — Codex Desktop's Linux build, Claude Desktop, Hermes Agent, or your own client — can spawn it and gain full control of the local Linux desktop: read accessibility trees, list and focus windows, take screenshots, click, drag, scroll, type, and invoke semantic accessibility actions.

Most computer-use MCP servers are macOS-only (they rely on AppKit, AXUIElement, CGEvent). The few that target Linux either drive xdotool against an X11 root window or shell out to OCR over screenshots. This crate is different on four points worth caring about:

  • Wayland actually works. Pointer actions can use the org.freedesktop.portal.RemoteDesktop interface on Wayland, with ydotool / ydotoold (uinput) as the deterministic fallback and keyboard/text path. Screenshots use the GNOME Shell DBus screenshot method when present and org.freedesktop.portal.Screenshot otherwise.
  • Window targeting is compositor-aware. The window registry tries GNOME Shell extension, GNOME Shell Introspect, COSMIC Wayland helper, KWin DBus scripting, Hyprland hyprctl, and i3 IPC in order, then reports exactly which backend won or why each backend failed.
  • Semantic selectors, not pixel coordinates. Tools like click, perform_action, and set_value accept role / name / text / states selectors backed by AT-SPI. Pixel coordinates remain available as a fallback for rendering-only surfaces (canvas, games, X clients without ATK).
  • One JSON readiness report. computer-use-linux doctor returns a structured document covering platform, portals, AT-SPI, windowing, input, and a readiness summary with explicit blockers and a recommended next step. MCP hosts can render or surface that to the user without parsing prose.

The crate was extracted from codex-desktop-linux (the Linux distribution of Codex Desktop), which still bundles this binary as a built-in plugin. This standalone repo is the upstream.

Features

15 MCP tools exposed by the current v0.2.1 server:

Diagnostics

  • doctor — single-shot JSON readiness report (platform, portals, accessibility, windowing, input, readiness summary)
  • setup_accessibility — enables GNOME's org.gnome.desktop.interface toolkit-accessibility setting so toolkit apps expose AT-SPI trees
  • setup_window_targeting — installs and enables the bundled GNOME Shell extension when org.gnome.Shell.Introspect is locked down

Discovery

  • list_apps — running desktop apps visible to the AT-SPI registry
  • list_windows — compositor windows with title, app id, wm_class, focus state, client type (Wayland/X11), and bounds
  • focused_window — the window currently holding keyboard focus
  • get_app_state — combined screenshot + accessibility tree for a chosen app, with element indices that the input tools accept

Input

  • click — by element index, semantic selector, or pixel coordinates
  • drag — pixel-coordinate drag (start / end)
  • scroll — page-based scroll on an element or at a pixel location
  • press_key — keys / chords; can focus a window or terminal first
  • type_text — literal text input, optionally targeted at a window or terminal

Semantic actions

  • perform_action — invoke any AT-SPI action exposed by an element (Press, Activate, Toggle, …); defaults to the primary action
  • set_value — write to a settable accessibility element (text fields, sliders, spinners)

Navigation

  • activate_window — focus a window by window_id, pid, app_id, wm_class, title, or terminal selectors

MCP safety contract

computer-use-linux is not a read-only data source. It can observe the local desktop and, when a mutating tool is called, can change real application state. The tools/list response includes MCP ToolAnnotations so hosts can surface this distinction before invocation:

| Class | Tools | Contract | | --- | --- | --- | | Read-only observation | doctor, list_apps, list_windows, focused_window, get_app_state | readOnlyHint=true; may reveal app, window, accessibility, and screenshot contents. get_app_state may trigger the desktop screenshot portal prompt. | | Local setup mutators | setup_accessibility, setup_window_targeting | readOnlyHint=false, destructiveHint=false, idempotentHint=true; modifies user desktop configuration by enabling accessibility or installing/enabling the GNOME window-targeting extension. | | UI state mutators | activate_window, scroll | readOnlyHint=false, destructiveHint=false; changes focus or scroll position in the live desktop. | | Desktop action mutators | click, drag, press_key, type_text, perform_action, set_value | readOnlyHint=false, destructiveHint=true, openWorldHint=true; can trigger arbitrary actions in whatever local application is targeted. |

Annotations are safety hints, not an authorization system. MCP hosts should still ask the user before calls that could submit, delete, send, purchase, overwrite, or otherwise commit state.

The binary also exposes the same capabilities from the CLI for scripting and debugging:

computer-use-linux mcp                                  # stdio MCP server
computer-use-linux doctor                               # JSON readiness report
computer-use-linux setup                                # enable AT-SPI
computer-use-linux setup-window-targeting               # install GNOME Shell extension
computer-use-linux apps
computer-use-linux state [APP_NAME]
computer-use-linux screenshot                           # JSON screenshot summary
computer-use-linux windows

Support matrix

Validated manually on Ubuntu 25.10 (GNOME Shell 50.1, Wayland). Other compositor backends are implemented and covered by parser / contract tests, but real desktop behavior still depends on each session exposing its expected control API.

| Desktop/session | Window backend | Notes | | --- | --- | --- | | GNOME Wayland | GNOME Shell extension first, org.gnome.Shell.Introspect fallback | Full target. The extension provides exact window activation when GNOME blocks native introspection; Introspect can list windows and focus apps by app_id when allowed. | | GNOME X11 | org.gnome.Shell.Introspect when allowed | AT-SPI and ydotool work; the bundled GNOME Shell extension is only needed for GNOME Wayland. Exact per-window focus may be unavailable without the extension backend. | | KDE Plasma / KWin | temporary KWin DBus scripting | Lists and focuses windows through org.kde.KWin scripting when the session bus exposes it. | | Hyprland | hyprctl clients -j and hyprctl dispatch focuswindow | Requires hyprctl in the desktop session. | | i3 | i3-msg; optional xprop for PID hydration | Lists and focuses i3 windows over the active i3 IPC socket. | | COSMIC Wayland | computer-use-linux-cosmic helper | Installed automatically by ./install.sh, cargo install, and npm. For custom/manual layouts, put the helper next to the main binary, on PATH, or point COMPUTER_USE_LINUX_COSMIC_HELPER at it. | | Sway / generic wlroots | no dedicated backend yet | AT-SPI, screenshots, and global ydotool input can still work; exact window list/focus is currently unavailable unless another backend applies. | | Generic X11 / XFCE / other WMs | no dedicated backend yet | AT-SPI plus ydotool global input only, unless running under i3. |

If you run on a desktop not covered above, or a covered backend does not come up cleanly, please open an issue with the output of computer-use-linux doctor so we can extend the matrix honestly.

Install

COSMIC users do not need a second package or a separate helper install when using ./install.sh, cargo install, or the npm wrapper. Those paths install computer-use-linux-cosmic alongside the main binary automatically. Only manual prebuilt-binary installs need you to copy both release assets.

Option A — ./install.sh from a clone

Installs system packages on Debian/Ubuntu, Fedora/RHEL-like, or Arch-like distros; installs Rust if needed; builds both release binaries; installs them to ~/.local/bin; enables ydotoold as a user service; enables GNOME AT-SPI settings when running under GNOME; and installs the bundled GNOME Shell extension on GNOME Wayland.

git clone https://github.com/avifenesh/computer-use-linux
cd computer-use-linux
./install.sh
# log out and back in if the GNOME extension was newly installed
computer-use-linux doctor | jq .readiness

Option B — cargo install (Rust binaries, no system setup)

Installs the Rust binaries from crates.io. You still handle the system-level pieces yourself: ydotoold, AT-SPI, desktop portals, and the GNOME extension if you need the GNOME Wayland exact-focus backend.

cargo install computer-use-linux --version 0.2.1
computer-use-linux doctor

For unreleased changes from main, install directly from Git:

cargo install --git https://github.com/avifenesh/computer-use-linux

Then, as needed:

sudo apt install ydotool at-spi2-core         # or your distro's equivalent
systemctl --user enable --now ydotoold
computer-use-linux setup                      # gsettings AT-SPI bridge
computer-use-linux setup-window-targeting     # GNOME Shell extension

Option C — npm wrapper (binary download)

Good for users who already have Node.js and want a no-Rust install. The npm package downloads and verifies the matching main and COSMIC helper binaries during install, then the wrapper sets COMPUTER_USE_LINUX_COSMIC_HELPER to the bundled helper automatically.

npm install -g @agent-sh/[email protected]
computer-use-linux doctor

You will still need ydotoold running and AT-SPI enabled (run computer-use-linux setup and the systemd commands above).

Option D — prebuilt binaries

Linux x86_64 / aarch64 builds are published with each tag. Each binary ships a .sha256 next to it.

target=x86_64-unknown-linux-gnu
version=v0.2.1
for binary in computer-use-linux computer-use-linux-cosmic; do
  asset="$binary-$target"
  curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset"
  curl -L -O "https://github.com/avifenesh/computer-use-linux/releases/download/$version/$asset.sha256"
  sha256sum -c "$asset.sha256"
  install -m 0755 "$asset" "$HOME/.local/bin/$binary"
done

You will still need ydotoold running and AT-SPI enabled (run computer-use-linux setup and the systemd commands above).

Wire it into your MCP host

The binary speaks the rmcp 2024-11-05 stdio protocol. Pass mcp as the only argument; everything else is configured through MCP tool calls.

Codex Desktop (Linux build)

The Linux build of Codex Desktop already bundles this binary as a plugin. You don't need to wire it up manually — the plugin definition lives in codex-desktop-linux under its plugins/ directory and is enabled by default. To upgrade the plugin in place, replace the binary it ships with the one from this repo's release assets.

Claude Desktop

Edit ~/.config/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "computer-use-linux": {
      "command": "computer-use-linux",
      "args": ["mcp"]
    }
  }
}

Restart Claude Desktop. The 15 tools should appear in the tools list.

Hermes Agent

If computer-use-linux is on your PATH, let Hermes discover it:

hermes mcp add computer-use-linux --command computer-use-linux --args mcp
hermes mcp test computer-use-linux

Press Enter at the "Enable all tools?" prompt to expose all 15 tools. Hermes registers them as mcp_computer_use_linux_<tool> and creates the mcp-computer-use-linux runtime toolset.

If you installed the binary somewhere that is not on PATH, pass the absolute path as --command.

You can also edit ~/.hermes/config.yaml directly:

mcp_servers:
  computer-use-linux:
    command: computer-use-linux
    args: ["mcp"]

# Optional: expose the tools to subagents as well.
inherit_mcp_toolsets: true

Generic MCP client

Spawn the binary with ["mcp"] as the argv tail. It speaks JSON-RPC over stdio per the rmcp 2024-11-05 protocol; capability discovery happens through tools/list and the doctor tool. The server normally needs no MCP-specific configuration, but desktop runtime environment still matters (DBUS_SESSION_BUS_ADDRESS, XDG_RUNTIME_DIR, portals, AT-SPI, ydotoold, and optionally COMPUTER_USE_LINUX_COSMIC_HELPER).

First-run checklist

  1. Run doctor.

    computer-use-linux doctor | jq .readiness

    Aim for can_register_mcp_tools, can_build_accessibility_tree, can_send_development_input, and can_query_windows all true. The blockers array should be empty.

  2. If accessibility.at_spi_bus.ok = false — run computer-use-linux setup (or call the setup_accessibility MCP tool). This sets:

    • org.gnome.desktop.interface toolkit-accessibility true

    You may need to restart toolkit-using apps for the change to take effect.

  3. If windowing.can_list_windows = false — inspect doctor.windowing.backends. On GNOME Wayland, run computer-use-linux setup-window-targeting (or call setup_window_targeting) to install the bundled [email protected] Shell extension, then log out and back in so GNOME Shell loads it. On KDE, Hyprland, i3, or COSMIC, install or expose the matching compositor tool/helper shown in the backend details.

  4. Grant the screencast portal on first screenshot. The first time get_app_state or any screenshot subcommand runs, GNOME will pop a portal dialog asking to share the screen. Accept once and tick "remember" to make it sticky for the session.

  5. Confirm ydotoold is running.

    systemctl --user status ydotoold

    Its socket should appear at /run/user/$UID/.ydotool_socket.

Architecture

  • Accessibility treeatspi crate (tokio backend) talks to the AT-SPI registry on the user session bus. The tree is flattened to (role, name, text, states, bounds) tuples and indexed; element indices are stable for the duration of a get_app_state snapshot.
  • DBus where desktops expose itzbus for portal calls (org.freedesktop.portal.Screenshot, …RemoteDesktop, …ScreenCast), GNOME Shell screenshots (org.gnome.Shell.Screenshot), the bundled GNOME extension's dev.avifenesh.ComputerUseLinux.WindowControl service, and temporary KWin scripting.
  • MCP transportrmcp with the transport-io feature; stdio framing, no network.
  • Input fallback — when the remote-desktop portal isn't available or the host wants deterministic injection, the binary writes to ydotoold's socket, which writes to /dev/uinput. install.sh can configure ydotoold; the setup command only enables the GNOME AT-SPI bridge.
  • Window registrylist_windows, focused_window, activate_window, press_key, and type_text share a backend registry. It tries GNOME extension, GNOME Introspect, COSMIC helper, KWin scripting, Hyprland hyprctl, and i3 IPC in that order, skipping empty or failed backends so another compositor backend can answer.
  • GNOME extension fallback — recent GNOME builds deny org.gnome.Shell.Introspect.GetWindows to non-blessed clients. The bundled Shell extension exposes window data and exact activation under dev.avifenesh.ComputerUseLinux.WindowControl.
  • COSMIC helpercomputer-use-linux-cosmic talks to COSMIC toplevel protocols and is resolved from COMPUTER_USE_LINUX_COSMIC_HELPER, next to the running binary, or from PATH.
  • Terminal enrichmentlist_windows cross-references each terminal window with its controlling TTY and the foreground process on that TTY, so type_text / press_key can target "the terminal where pytest is running" without the host ever knowing the window id.

Security

Computer-use tooling is, by definition, a privilege-escalation surface. The threat model:

  • ydotoold runs as a per-user systemd service with read/write access to /dev/uinput. Any process that can connect to its socket (/run/user/$UID/.ydotool_socket, mode 0600 by default) can synthesize arbitrary input — keypresses, clicks, anything. Keep the socket in the user runtime dir (the default), not in /tmp or any world-readable location. Do not run ydotoold as a system service.
  • The screencast portal asks for permission once per session. Granting it lets the calling MCP host capture the screen for the rest of the session. If you don't want that, decline the portal dialog and use get_app_state with include_screenshot: false.
  • AT-SPI exposes window contents to any client on your session bus. Enabling the AT-SPI bridge (setup_accessibility) is a prerequisite for this binary; it's also what screen readers use, and it shares the same trust boundary.
  • The GNOME Shell extension is loaded only into your user's GNOME Shell, runs in the Shell's JS sandbox, and exposes a single DBus interface on the user session bus. It does not request any extra permissions.
  • No network. This binary opens no TCP/UDP listener, makes no outbound Internet connections, and ships no telemetry. It does use local session transports such as DBus and the per-user ydotoold Unix socket.
  • Mutating tools are explicit. The MCP tool list annotates read-only versus mutating tools, and CI fails if the published tool annotations drift from the table above. Treat those annotations as hints; the host is still responsible for user approval and policy.

If you're running this on a shared workstation, set ydotoold's socket permissions to 0600 (the default) and audit which processes on your user can connect() to it.

Troubleshooting

computer-use-linux doctor is the source of truth. Common failure modes and fixes:

  • accessibility.at_spi_bus.ok = false — AT-SPI registry isn't running or the toolkit bridge is off. Fix: computer-use-linux setup (or call the setup_accessibility MCP tool). Restart the apps you want to drive.
  • windowing.gnome_shell_introspect.ok = false and gnome_shell_extension_dbus.ok = false — GNOME blocks introspection and the extension isn't installed. Fix: computer-use-linux setup-window-targeting, then log out and log back in.
  • input.ydotool_socket.ok = false — daemon isn't running. Fix: systemctl --user enable --now ydotoold. If the unit doesn't exist, install the ydotool package and rerun ./install.sh (or copy the unit from systemd/ydotoold.service in this repo).
  • input.uinput.ok = false/dev/uinput isn't accessible to your user. Fix: add yourself to the input group (sudo usermod -aG input $USER) and re-login. On distros that ship uinput as a kernel module without auto-loading it, add uinput to /etc/modules-load.d/.
  • Portal calls hang or time outxdg-desktop-portal or its backend (-gnome, -gtk, -kde, -wlr) crashed. Fix: check journalctl --user -u xdg-desktop-portal -u xdg-desktop-portal-gnome --since '5 min ago' and restart the relevant unit.
  • KWin / Hyprland / i3 / COSMIC windowing is unavailable — check doctor.windowing.backends. KWin needs session-bus scripting; Hyprland needs hyprctl; i3 needs i3-msg and its IPC socket. COSMIC needs computer-use-linux-cosmic, which the standard installers provide automatically; if you copied binaries by hand, copy the helper too or set COMPUTER_USE_LINUX_COSMIC_HELPER.
  • Screenshots return black frames on multi-monitor setups — known portal / compositor edge case. Use get_app_state with include_screenshot: false and rely on AT-SPI until the portal backend is healthy.
  • type_text types into the wrong window — pass an explicit target (window_id, pid, wm_class, title, or for terminals tty / terminal_pid / terminal_command / terminal_cwd). Without a target, input goes to whatever window currently has compositor focus.

If doctor is green and a specific tool still misbehaves, file an issue with the JSON output of doctor and the failing tool's request payload.

Credits

Extracted from codex-desktop-linux, the Linux distribution of Codex Desktop, which continues to ship this same binary as a bundled plugin. Maintained by Avi Fenesh.

Built on top of:

Publishing

Publishing is tag-driven from GitHub Actions. The repository needs these Actions secrets:

gh secret set CARGO_REGISTRY_TOKEN -R avifenesh/computer-use-linux
gh secret set NPM_TOKEN -R avifenesh/computer-use-linux

Then bump Cargo.toml and package.json together, update CHANGELOG.md, and push a vX.Y.Z tag. CI runs the full Rust and MCP safety gates, builds release assets for both architectures, publishes computer-use-linux to crates.io, and publishes the npm wrapper after the GitHub release binaries are available.

License

MIT — see LICENSE.