axiclick

v0.1.4

AXI-compliant macOS mouse, keyboard, and screen automation for AI agents

Why axiclick?

Most desktop automation tools are built for humans scripting GUIs. axiclick is built for AI agents that need to control macOS apps — with token-efficient output, structured perception, and zero guesswork.

  • SoM (Set-of-Mark) perception — detects every UI element on screen, labels them with @id tags, and lets you click by ID instead of fragile pixel coordinates
  • Token-efficient — outputs TOON-formatted data designed for LLM context windows, not humans reading terminals
  • Full input control — mouse clicks, drags, keyboard shortcuts, text typing, scrolling — everything an agent needs
  • Accessibility tree access — query native macOS AXUIElement trees for apps that expose them
  • Agent session hooks — self-installs into Claude Code and Codex so agents start with axiclick context automatically

Quick Start

brew install cliclick        # required dependency
npm install -g axiclick      # install globally

Verify the install:

axiclick                     # shows mouse position, active window, display info

Your first SoM workflow

The core loop: screenshot → detect elements → click by ID → verify.

axiclick som-setup                    # one-time: download OmniParser V2 models (~2GB)
axiclick som-start                    # start the detection daemon

axiclick focus Safari                 # bring target app to front
axiclick wait 500
axiclick som /tmp/page.png            # detect all UI elements → labeled image
axiclick som-click @12                # click element #12 by ID
axiclick screenshot /tmp/verify.png   # confirm the result

Tip: If a nested surface like iPhone Mirroring is visible but not active, click inside that window first, then run som.
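
The core loop above can be split into two reusable steps, since the element ID is only known after inspecting the labeled image. This is a minimal sketch, not part of axiclick: the function names `som_look` and `som_act`, the `AXICLICK_BIN` override, and the default file paths are assumptions for illustration. Only the subcommands come from the workflow above, and the daemon is assumed to be running already (som-start).

```shell
# Hypothetical helpers around the documented SoM loop.
# AXICLICK_BIN is an assumed override (useful for dry runs); it is not an axiclick feature.
AXICLICK_BIN="${AXICLICK_BIN:-axiclick}"

# Usage: som_look <app> [labeled.png]
# Focus the app, let the window settle, then detect and label UI elements.
som_look() {
  local app="$1" labeled="${2:-/tmp/page.png}"
  "$AXICLICK_BIN" focus "$app" || return 1
  "$AXICLICK_BIN" wait 500 || return 1
  "$AXICLICK_BIN" som "$labeled"
}

# Usage: som_act <element-id> [verify.png]
# Click an element from the last SoM pass, then capture the result for review.
som_act() {
  local id="$1" after="${2:-/tmp/verify.png}"
  "$AXICLICK_BIN" som-click "@$id" || return 1
  "$AXICLICK_BIN" screenshot "$after"
}
```

A typical use would be `som_look Safari`, a look at the labeled image to pick an ID, then `som_act 12`. Setting `AXICLICK_BIN=echo` prints the commands instead of running them.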

Install session hooks

Auto-inject axiclick context into every Claude Code and Codex session:

axiclick install

Commands

| Command | Description | Example |
|---------|-------------|---------|
| som-setup | Install OmniParser V2 models and venv | axiclick som-setup |
| som-start | Start the warm SoM daemon | axiclick som-start |
| som-stop | Stop the warm SoM daemon | axiclick som-stop |
| som <path> | Capture and annotate visible UI elements | axiclick som /tmp/screen.png --no-caption |
| som-click @<id> | Click a marked element from the last SoM pass | axiclick som-click @3 |

| Command | Description | Example |
|---------|-------------|---------|
| click <x>,<y> | Left-click | axiclick click 100,200 |
| rclick <x>,<y> | Right-click | axiclick rclick 100,200 |
| dclick <x>,<y> | Double-click | axiclick dclick 100,200 |
| tclick <x>,<y> | Triple-click | axiclick tclick 100,200 |
| move <x>,<y> | Move cursor | axiclick move 500,300 |
| drag <from> <to> | Drag between points | axiclick drag 100,200 300,400 |
| type <text> | Type text | axiclick type "Hello" |
| key <key> | Press a key | axiclick key return |
| keydown <mods> | Hold modifier keys | axiclick keydown cmd |
| keyup <mods> | Release modifier keys | axiclick keyup cmd |
| combo <mod+key> | Keyboard shortcut | axiclick combo cmd+c |
| submit [--at x,y] | Submit input after dismissing suggestions | axiclick submit --at 500,467 |
| scroll <dir> [n] | Scroll | axiclick scroll down 5 |
| wait <ms> | Wait | axiclick wait 500 |
| run <raw> | Raw cliclick passthrough | axiclick run "c:1,2 t:hi" |

Coordinates support absolute (100,200), relative (+50,+0), and current position (.).
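
As a sketch of how the three coordinate forms might combine, the hypothetical helper below jumps to an absolute point, nudges the cursor by a relative offset, and clicks at the current position. The name `click_with_offset` and the `AXICLICK_BIN` override are assumptions. The coordinate syntax comes from the line above, and only positive offsets are used because those are the only relative form shown there.

```shell
# AXICLICK_BIN is an assumed override (useful for dry runs); it is not an axiclick feature.
AXICLICK_BIN="${AXICLICK_BIN:-axiclick}"

# Usage: click_with_offset <x> <y> <dx> <dy>   (dx and dy are non-negative integers)
click_with_offset() {
  local x="$1" y="$2" dx="$3" dy="$4"
  "$AXICLICK_BIN" move "$x,$y" || return 1       # absolute: jump to a known anchor point
  "$AXICLICK_BIN" move "+$dx,+$dy" || return 1   # relative: offset from the cursor's position
  "$AXICLICK_BIN" click .                        # current position: no coordinates repeated
}
```

For example, `click_with_offset 100 200 50 0` should end with a click at 150,200, assuming move accepts relative coordinates the same way click does.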

key <key> uses macOS System Events for web-relevant special keys like return, tab, and arrow keys so browsers receive real DOM key events reliably.
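
The keydown and keyup rows in the table above suggest a hold, act, release pattern, for instance a cmd-click. The sketch below is a hypothetical helper, not an axiclick command. It assumes held modifiers persist between separate axiclick invocations, which the separate commands imply but this README does not spell out. `click_with_modifier` and `AXICLICK_BIN` are names made up for illustration.

```shell
# AXICLICK_BIN is an assumed override (useful for dry runs); it is not an axiclick feature.
AXICLICK_BIN="${AXICLICK_BIN:-axiclick}"

# Usage: click_with_modifier <mods> <x>,<y>
click_with_modifier() {
  local mods="$1" point="$2" status=0
  "$AXICLICK_BIN" keydown "$mods" || return 1
  "$AXICLICK_BIN" click "$point" || status=$?
  # Always release the modifier, even if the click failed, so no key stays held.
  "$AXICLICK_BIN" keyup "$mods" || status=$?
  return "$status"
}
```

The release step runs unconditionally because a modifier left held would affect every later click and keystroke in the session.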

| Command | Description | Example |
|---------|-------------|---------|
| screenshot <path> | Capture screen to file | axiclick screenshot /tmp/s.png |
| info <image> | Show image dimensions and mapping metadata | axiclick info /tmp/s.png |
| probe <image> <x>,<y> | Mark an image pixel and resolve screen coords | axiclick probe /tmp/s.png 940,644 |
| windows | List visible windows | axiclick windows |
| active | Show focused app/window | axiclick active |
| screen | Display info | axiclick screen |
| position | Mouse coordinates | axiclick position |
| color <x>,<y> | Sample pixel color | axiclick color 100,200 |
| focused | Show the currently focused UI element | axiclick focused |

screenshot supports --region <x>,<y>,<w>,<h> and --display <n>. It writes a sidecar metadata file at <path>.json so info and probe can convert image pixels back into screen coordinates.

probe writes an annotated PNG with a crosshair. Add --click to click the resolved screen point.
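
A region capture followed by a probe could look like the sketch below. The helper `capture_region` and the `AXICLICK_BIN` override are hypothetical, and placing --region after the path is an assumption about argument order. The sidecar check relies only on the `<path>.json` naming described above.

```shell
# AXICLICK_BIN is an assumed override (useful for dry runs); it is not an axiclick feature.
AXICLICK_BIN="${AXICLICK_BIN:-axiclick}"

# Usage: capture_region <path> <x> <y> <w> <h>
capture_region() {
  local path="$1" region="$2,$3,$4,$5"
  "$AXICLICK_BIN" screenshot "$path" --region "$region" || return 1
  # info and probe need the sidecar to map image pixels back to screen coordinates.
  [ -f "$path.json" ] || { echo "missing sidecar: $path.json" >&2; return 1; }
}
```

After `capture_region /tmp/s.png 0 0 800 600`, running `axiclick probe /tmp/s.png 40,30` writes the crosshair image for inspection, and repeating the probe with --click clicks the resolved point.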

windows supports --app <name> to filter.

| Command | Description | Example |
|---------|-------------|---------|
| snapshot | Accessibility tree with UIDs | axiclick snapshot |
| ax-click @<uid> | Click element by UID | axiclick ax-click @5 |
| ax-fill @<uid> <text> | Set text field value | axiclick ax-fill @7 "query" |

snapshot supports --depth <n> to limit tree depth.

Note: Accessibility works best with native macOS apps (Finder, Safari, Xcode). Cross-platform apps (Electron, WeChat) may expose minimal trees — fall back to coordinate-based automation with screenshot + click.
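
One way to script the fallback described in the note is to try the accessibility tree first and switch to a screenshot when the tree looks too small to be useful. The sketch below is a heuristic under stated assumptions: the output format of snapshot is not documented here, so counting non-empty output lines and the threshold of 5 are guesses to tune per app. `perceive`, `MIN_TREE_LINES`, and `AXICLICK_BIN` are hypothetical names.

```shell
# AXICLICK_BIN is an assumed override (useful for dry runs); it is not an axiclick feature.
AXICLICK_BIN="${AXICLICK_BIN:-axiclick}"
# Assumed threshold: fewer non-empty output lines than this counts as a "minimal" tree.
MIN_TREE_LINES="${MIN_TREE_LINES:-5}"

# Usage: perceive <screenshot-path>
# Prints "ax" if the accessibility tree looks usable; otherwise captures a
# screenshot for coordinate-based automation and prints "screenshot".
perceive() {
  local shot="$1" tree lines
  tree="$("$AXICLICK_BIN" snapshot --depth 3 2>/dev/null)" || tree=""
  # grep -c exits non-zero on zero matches but still prints 0, hence the || true.
  lines="$(printf '%s\n' "$tree" | grep -c . || true)"
  if [ "$lines" -ge "$MIN_TREE_LINES" ]; then
    echo "ax"
  else
    "$AXICLICK_BIN" screenshot "$shot" >/dev/null || return 1
    echo "screenshot"
  fi
}
```

A caller can branch on the printed word: use ax-click and ax-fill after "ax", or screenshot plus click (or the SoM loop) after "screenshot".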

| Command | Description |
|---------|-------------|
| focus <app> | Bring app to foreground |
| install | Install Claude Code / Codex session hooks |

When to Use axiclick

| Scenario | Why axiclick |
|----------|-------------|
| Automate macOS apps with no CLI/API | Finder, WeChat, Xcode, System Settings |
| Sites that block headless browsers | Cloudflare, reCAPTCHA: real mouse/keyboard via actual display |
| iPhone Mirroring automation | Control iOS apps through the macOS mirroring window |
| QA test any GUI application | Screenshot → verify visual state programmatically |

Agent Integration

Add to your CLAUDE.md or AGENTS.md:

Use `axiclick` for macOS desktop automation.

Requirements

| Requirement | Details |
|-------------|---------|
| OS | macOS 10.15+ |
| Runtime | Node.js 18+ |
| Dependencies | cliclick (brew install cliclick) |
| SoM models | Python 3, ~2GB disk (for som-setup) |
| Build tools | Xcode Command Line Tools (compiles Swift helpers on first run) |
| Permissions | Accessibility for terminal app; Automation for System Events on first key use |
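
Most of these requirements can be checked before installing. The script below is a sketch of such a preflight check, not something axiclick ships. The function names are made up for illustration, while `sw_vers`, `node --version`, `command -v`, and `xcode-select -p` are standard tools. Disk space and the Accessibility/Automation permissions are not checked here.

```shell
# Print field N (1-based) of a dotted version such as "v18.19.0"; absent fields print 0.
version_field() {
  printf '%s\n' "${1#v}" | awk -F. -v n="$2" '{ v = $n; if (v == "") v = 0; print v + 0 }'
}

# Succeed when version $1 is >= version $2, comparing major then minor.
version_ge() {
  local have_major have_minor want_major want_minor
  have_major="$(version_field "$1" 1)"; have_minor="$(version_field "$1" 2)"
  want_major="$(version_field "$2" 1)"; want_minor="$(version_field "$2" 2)"
  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Report every unmet requirement on stderr; return non-zero if any check fails.
preflight() {
  local failed=0
  version_ge "$(sw_vers -productVersion 2>/dev/null)" 10.15 \
    || { echo "need macOS 10.15+" >&2; failed=1; }
  version_ge "$(node --version 2>/dev/null)" 18 \
    || { echo "need Node.js 18+" >&2; failed=1; }
  command -v cliclick >/dev/null 2>&1 \
    || { echo "missing cliclick (brew install cliclick)" >&2; failed=1; }
  command -v python3 >/dev/null 2>&1 \
    || { echo "missing Python 3 (needed for som-setup)" >&2; failed=1; }
  xcode-select -p >/dev/null 2>&1 \
    || { echo "missing Xcode Command Line Tools" >&2; failed=1; }
  return "$failed"
}
```

Running `preflight && npm install -g axiclick` would install only when every check passes. A missing tool yields an empty version string, which compares as 0 and fails the check.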

Acknowledgments

  • cliclick by Carsten Blum — the macOS mouse/keyboard engine axiclick wraps. BSD 3-Clause licensed.
  • AXI — the agent ergonomic interface standard this tool follows.
  • OmniParser V2 by Microsoft — the vision model powering Set-of-Mark detection.

License

MIT — see LICENSE for details, including third-party notices.