
superbased

v2.0.9


Screenshot capture, AI analysis, OCR, visual regression testing, token compression — headless server + MCP tools for AI coding CLIs

Downloads

813


SuperBased

Agent eyes, ears, and hands: cross-platform, with most tools working offline. 72 MCP tools cover full GUI automation (click / type / hotkey / scroll / drag / hover / AX-invoke / form-fill / dialog-handle / find-image / tab-management / find-in-page / virtual-desktop / tray-click) and orientation tooling (workspace_sync / project / accessibility_tree / window_bounds / undo_last / progressive-disclosure dispatcher / stt_status). These sit on top of screen capture / OCR / gallery / recording / monitor. Built-in humanization v2 adds Bezier-curved cursor approaches with a sin-shaped velocity envelope, gaussian click-target jitter, gamma-distributed inter-key timing, per-key and click hold variation, optional typo+correct sequences, and a per-process cross-session salt. Sign in for cloud AI.

Three-line quickstart

npm install -g superbased       # install
superbased mcp                  # start MCP server
# add this under "mcpServers" in your AI editor's MCP config:
# { "superbased": { "command": "superbased", "args": ["mcp"] } }

That's it. Your AI editor can now see your screen, read text from it, capture specific windows, record sessions, diff them visually, manage a capture gallery, and compress long text into token-efficient images — all without an account.

Sign in to add cloud AI vision and dictation on top.


What you get without an account

Everything you need to give an AI eyes on your desktop, fully local:

| Capability | No account | Signed in |
|---|:---:|:---:|
| Screen capture (fullscreen / region / window targeting) | ✓ | ✓ |
| OCR via local Tesseract (no data leaves the machine) | ✓ | ✓ |
| Capture gallery with full-text search, tags, notes | ✓ | ✓ |
| Token compression (text → optimized images) | ✓ | ✓ |
| Recording sessions (smart change-detect, periodic, monitor) | ✓ | ✓ |
| Visual diff & baseline regression testing | ✓ | ✓ |
| Window list & specific-window capture (even minimized) | ✓ | ✓ |
| Clipboard read/write (text + images) | ✓ | ✓ |
| Annotate / redact captures | ✓ | ✓ |
| Settings & instruction presets | ✓ | ✓ |
| AI vision analysis (Claude / GPT / Gemini via backend proxy) | — | ✓ |
| Voice dictation & cloud transcription | — | ✓ |
| Frame-by-frame AI description + prose narration | — | ✓ |
| Daily AI quota tracking | — | ✓ |

Local dictation (sherpa-onnx + native Windows/macOS engines) is on the near-term roadmap — see SuperBased issues.
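The table lists smart change-detect recording and visual diff. The README does not describe how SuperBased decides that a frame has changed. A common technique is per-pixel differencing with a noise tolerance plus a minimum changed-area threshold, and the sketch below shows that idea. It assumes same-size grayscale frames stored one byte per pixel. `changedFraction`, `shouldKeepFrame`, `pixelTolerance`, and `minChangedFraction` are hypothetical names, not SuperBased settings.

```typescript
// Hypothetical sketch of frame differencing for change detection / visual diff.
// Assumes grayscale frames with identical dimensions, one byte per pixel.

interface Frame {
  width: number;
  height: number;
  pixels: Uint8Array; // length === width * height, row-major
}

/** Fraction of pixels whose absolute difference exceeds `pixelTolerance`. */
function changedFraction(a: Frame, b: Frame, pixelTolerance = 8): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("frames must have identical dimensions");
  }
  const n = a.width * a.height;
  let changed = 0;
  for (let i = 0; i < n; i++) {
    if (Math.abs(a.pixels[i] - b.pixels[i]) > pixelTolerance) changed++;
  }
  return n === 0 ? 0 : changed / n;
}

/** Keep a new frame only when enough of the screen changed (hypothetical policy). */
function shouldKeepFrame(prev: Frame | null, next: Frame, minChangedFraction = 0.01): boolean {
  if (prev === null) return true; // always keep the first frame of a session
  return changedFraction(prev, next) >= minChangedFraction;
}

const blank: Frame = { width: 4, height: 4, pixels: new Uint8Array(16) };
const topRow: Frame = { width: 4, height: 4, pixels: new Uint8Array(16).fill(200, 0, 4) };
console.log(changedFraction(blank, topRow)); // 0.25
```

The same fraction can double as a crude regression score against a stored baseline. Real visual-diff tools usually add anti-aliasing handling and per-region masks on top.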


"I want to..."

| Goal | Tool |
|---|---|
| See what's on the user's screen | superbased_capture_image |
| Capture a specific window (even minimized) | superbased_capture_image with window="Slack" |
| List all open windows | superbased_window_list |
| Read text from a screenshot | superbased_ocr (local) or superbased_ai with instruction="/extract" |
| Watch the screen for errors during a long agent run | superbased_recording with action=start mode=monitor |
| Compare two recording sessions for visual regressions | superbased_diff |
| Compress 50k tokens of logs into a single image | superbased_compress_text |
| See an image the user just copied | superbased_clipboard with action=readImage |
| Dictate via microphone (requires sign-in today) | superbased_dictate with mic=true |
| Hide secrets/PII before sharing a capture | superbased_redact |
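Each row becomes an MCP `tools/call` request, and your editor's MCP client normally builds that request for you. The sketch below is for orientation only. It constructs the JSON-RPC 2.0 message shape that the Model Context Protocol specifies for `tools/call`, filled in with argument names from the table (`window`, `action`, `mode`, `mic`). The argument types and the `toolCall` helper are assumptions. SUPERBASED_SKILL.md has the real parameter schemas.

```typescript
// Builds MCP `tools/call` requests (JSON-RPC 2.0 shape from the MCP specification).
// Argument names come from the goal table; their exact types are assumed.

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: Json } };
}

let nextId = 1; // JSON-RPC ids must be unique per request within a session

function toolCall(name: string, args: { [key: string]: Json } = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// A few rows from the table, expressed as requests:
const examples: ToolCallRequest[] = [
  toolCall("superbased_capture_image", { window: "Slack" }),
  toolCall("superbased_window_list"),
  toolCall("superbased_recording", { action: "start", mode: "monitor" }),
  toolCall("superbased_clipboard", { action: "readImage" }),
  toolCall("superbased_dictate", { mic: true }), // requires sign-in
];

for (const req of examples) console.log(JSON.stringify(req));
```

A real client must complete the MCP `initialize` handshake before it sends requests like these.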


How it compares

| | SuperBased | Peekaboo | Playwright MCP | computer-use-mcp |
|---|---|---|---|---|
| Screen capture | ✓ Win + macOS + Linux | ✓ macOS only | — | ✓ |
| OCR (local) | ✓ Tesseract | — | — | — |
| Window targeting | ✓ (incl. minimized) | partial | — | partial |
| Recording sessions | ✓ smart / periodic / monitor | — | — | — |
| Visual regression diff | ✓ | — | partial | — |
| Token-efficient text compression | ✓ | — | — | — |
| AI provider | model-agnostic (Claude/GPT/Gemini) | own keys | own keys | Anthropic only |
| Works offline | ✓ for most tools (cloud AI optional) | partial | — | — |
| Browser automation | — (use Playwright MCP) | — | ✓ | — |

SuperBased is the desktop layer. For browser-only flows, run Playwright MCP alongside it.


MCP setup

Stdio (Claude Code, Cursor, Windsurf, Cline, OpenCode, Zed)

{
  "mcpServers": {
    "superbased": {
      "command": "superbased",
      "args": ["mcp"]
    }
  }
}
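If you provision editor configs from a script, you can merge the entry above into an existing `mcpServers` object without overwriting other servers. This is a minimal sketch that works on the already-parsed JSON. Reading and writing the config file is left out because the file's path differs per editor. `addSuperBased` is a hypothetical helper name.

```typescript
// Merges the SuperBased stdio entry into an MCP config object.
// Only the "mcpServers" shape shown above is assumed; all other keys are preserved.

interface StdioServer {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, StdioServer>;
  [other: string]: unknown;
}

function addSuperBased(config: McpConfig): McpConfig {
  // Returns a new object; the input config is not mutated.
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      superbased: { command: "superbased", args: ["mcp"] },
    },
  };
}

const existing: McpConfig = {
  mcpServers: { other: { command: "other-server", args: [] } },
};
console.log(JSON.stringify(addSuperBased(existing), null, 2));
```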

HTTP (OpenAI Codex)

Add to ~/.codex/config.toml:

[mcp_servers.superbased]
enabled = true
url = "http://127.0.0.1:47592/mcp"

Then run superbased mcp in a terminal.

See the full setup guide for plugin install badges and 8-editor configuration.


Sign in

Sign-in unlocks cloud AI features (vision analysis, transcription, dictation, frame narration). Every capability marked ✓ in the "No account" column of the table above works without an account.

superbased auth login    # opens browser
superbased auth status   # shows current state
superbased auth logout   # clears stored token

CLI commands

superbased              Start headless server (default)
superbased serve        Start server with options (--port N)
superbased mcp          Start MCP stdio bridge (for AI editors)
superbased auth login   Sign in via browser
superbased auth status  Show authentication state
superbased auth logout  Clear stored token
superbased capture      Take a fullscreen screenshot
superbased --version    Show version
superbased --help       Show all commands

The API server runs on http://127.0.0.1:47592 and exposes 72 MCP tools + 13 resources. See SUPERBASED_SKILL.md for the full per-tool reference (parameters, return shapes, examples, error codes, decision guide, plus a CAPTCHA-solving Common Workflows section).


All 72 tools

  • superbased_screenshot — preferred screenshot (window targeting + resolution control in one call)
  • superbased_capture_image — advanced screenshot (region / explicit mode)
  • superbased_capture — screenshot (metadata-only response)
  • superbased_gallery_image — retrieve a saved image inline
  • superbased_window_list — list all open windows with active/minimized state
  • superbased_display_list — enumerate all connected displays (virtual-screen bounds, DPI, color space)
  • superbased_find_image — visual template match (locate a small PNG on screen)
  • superbased_capture_template — capture a region as a reusable template (with DPI + theme metadata sidecar)
  • superbased_ai — AI vision analysis (Claude/GPT/Gemini via backend; slash commands /extract, /summarize, /code, /explain, /translate, /table, /edit, /reformat)
  • superbased_ocr — local Tesseract OCR (no data leaves the machine)
  • superbased_compress_text — text → token-efficient images
  • superbased_describe_frames — AI description per recording frame (sign-in)
  • superbased_narrate — prose narrative summary of a session (sign-in)
  • superbased_recording — start / stop / pause / monitor (4 modes: interaction / periodic / smart change-detect / monitor with AI alerts)
  • superbased_sessions — list past sessions, get frames
  • superbased_export — ZIP / Markdown / PDF / HTML / GIF
  • superbased_diff — visual regression between two sessions
  • superbased_baseline — set / get / history of workflow baselines
  • superbased_dictate — transcribe with cleanup (file, base64, or live mic; sign-in)
  • superbased_transcribe — raw Whisper transcription (sign-in)
  • superbased_dictation_history — past transcriptions
  • superbased_stt_status — local STT engine probe (sherpa binary/model state, cloud signed-in, native availability, auto-resolved-to)
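The README does not say which matching algorithm superbased_find_image uses. To illustrate what "visual template match" means, here is the simplest version: slide the template over the screenshot and keep the position with the lowest sum of squared differences. It assumes grayscale, row-major buffers at the same scale. The DPI and theme sidecar that superbased_capture_template writes suggests the real tool also handles scaling and color variation, which this sketch ignores.

```typescript
// Brute-force template matching by sum of squared differences (SSD).
// Grayscale buffers, one byte per pixel, row-major. Illustrative only.

interface Gray {
  width: number;
  height: number;
  pixels: Uint8Array;
}

interface Match {
  x: number; // top-left corner of the best match in the haystack
  y: number;
  score: number; // mean squared difference per pixel (0 = exact match)
}

function findTemplate(haystack: Gray, needle: Gray): Match | null {
  const area = needle.width * needle.height;
  if (area === 0 || needle.width > haystack.width || needle.height > haystack.height) return null;
  let best: Match | null = null;
  for (let y = 0; y <= haystack.height - needle.height; y++) {
    for (let x = 0; x <= haystack.width - needle.width; x++) {
      let ssd = 0;
      for (let ty = 0; ty < needle.height; ty++) {
        const hRow = (y + ty) * haystack.width + x;
        const nRow = ty * needle.width;
        for (let tx = 0; tx < needle.width; tx++) {
          const d = haystack.pixels[hRow + tx] - needle.pixels[nRow + tx];
          ssd += d * d;
        }
      }
      const score = ssd / area;
      if (best === null || score < best.score) best = { x, y, score };
    }
  }
  return best;
}

const shot: Gray = { width: 6, height: 5, pixels: new Uint8Array(30) };
const icon: Gray = { width: 2, height: 2, pixels: Uint8Array.from([10, 200, 200, 10]) };
// Paint the icon into the screenshot at (3, 2).
for (let ty = 0; ty < 2; ty++) {
  for (let tx = 0; tx < 2; tx++) shot.pixels[(2 + ty) * 6 + (3 + tx)] = icon.pixels[ty * 2 + tx];
}
console.log(findTemplate(shot, icon)); // { x: 3, y: 2, score: 0 }
```

Brute force costs O(W·H·w·h). Production matchers typically use normalized cross-correlation, image pyramids, or FFT-based correlation instead.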

The agent's hands. Every write tool accepts a per-call humanize: 'off' | 'light' | 'human' | 'paranoid' (default 'light', set via the humanInputDefault setting). All of them are gated behind guiAutomation.enabled (master toggle, default OFF), per-action toggles, and a confirm: true flag.
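These gates combine as a plain AND, and the per-call humanize value falls back to humanInputDefault. The minimal model below uses a hypothetical settings shape. Only `guiAutomation.enabled`, `humanInputDefault`, `confirm: true`, and the four humanize levels come from the README. `actionToggles`, `authorize`, and the rule that an unset toggle counts as off are assumptions.

```typescript
// Hypothetical model of the documented gates for a GUI write tool.

type Humanize = "off" | "light" | "human" | "paranoid";

interface Settings {
  guiAutomation: { enabled: boolean }; // master toggle, default off
  actionToggles: Record<string, boolean>; // hypothetical per-action toggles
  humanInputDefault: Humanize; // documented default: "light"
}

interface WriteCall {
  action: string; // e.g. "click", "type"
  confirm?: boolean; // must be true for the call to proceed
  humanize?: Humanize; // optional per-call override
}

type Decision = { allowed: true; humanize: Humanize } | { allowed: false; reason: string };

function authorize(settings: Settings, call: WriteCall): Decision {
  if (!settings.guiAutomation.enabled) return { allowed: false, reason: "guiAutomation disabled" };
  // Assumption: an action with no toggle entry is treated as disabled.
  if (settings.actionToggles[call.action] !== true) return { allowed: false, reason: `${call.action} disabled` };
  if (call.confirm !== true) return { allowed: false, reason: "confirm: true required" };
  return { allowed: true, humanize: call.humanize ?? settings.humanInputDefault };
}

const settings: Settings = {
  guiAutomation: { enabled: true },
  actionToggles: { click: true },
  humanInputDefault: "light",
};
console.log(authorize(settings, { action: "click", confirm: true })); // allowed, humanize "light"
```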

  • superbased_click — click at coords OR resolved element (label / automationId / role+name); modifiers + clickCount (1/2/3) + button (left/right/middle); per-call humanization
  • superbased_type — type text into focused field (clearFirst, delayMs, ime for CJK input, label-resolved targeting)
  • superbased_hotkey — keyboard shortcut (auto-classifies PostMessage vs SendInput delivery)
  • superbased_scroll — wheel / keys / auto; vertical or horizontal axis; per-page or per-tick
  • superbased_drag — press → interpolate → release with modifier wrap; multi-waypoint paths
  • superbased_hover — move + dwell for tooltips and hover-reveal UIs
  • superbased_pixel_color — read RGBA at coords with optional expected+tolerance match
  • superbased_mouse_position — current cursor coords (no synthesis)
  • superbased_wait — sleep N ms server-side (no per-iteration approval cost)
  • superbased_wait_for — server-side polling for window appearance, pixel match, AX state
  • superbased_locate — resolve label/AX/template to screen coords without acting
  • superbased_ui_dump — screenshot + OCR text elements + AX tree merged (every visible element with center coords) — flat element list, optimized for "find me a clickable thing"
  • superbased_accessibility_tree — raw native AX/UIA tree (hierarchical) for tree traversal, parent/child queries, full attribute access. Requires optional native binding (@superbased/macos-ax / @superbased/win-uia)
  • superbased_sequence — preferred for any multi-step flow. ONE approval, ONE activation, N steps. Step types: click, type, hotkey, scroll, drag, hover, screenshot, wait, wait_for, assert, locate, ax_invoke, find_image, form_fill, dialog_handle, find_in_page, tab_management, virtual_desktop, tray_click. Always end with a screenshot step.
  • superbased_scroll_to — scroll-until-found loop (one call; up to maxPages PageDown iterations)
  • superbased_scroll_capture — multi-page scroll + frame stitching with calibration (Windows; the MCP equivalent of Cmd+Shift+A)
  • superbased_find_title_bar_drag_region — pickup-safety helper for window-drag scenarios
  • superbased_ax_invoke — UIA pattern invocation (Invoke / Toggle / SelectionItem.Select / Value.SetValue / ExpandCollapse). Top of the AX reliability pyramid; bypasses synthesized clicks.
  • superbased_form_fill — one-call macro over AX-resolve + ValuePattern.SetValue (fast path) or click + clearFirst + type (fallback). Optional Tab-between, optional Submit. Shared sessionId audit trail.
  • superbased_dialog_handle — auto-detect modal foreground; confirm / dismiss / fill_path
  • superbased_context_menu_select — right-click + popup-detect + click matched menu item (one approval for the 4-call dance)
  • superbased_drag_file — Windows scaffold (full OLE DoDragDrop deferred)
  • superbased_window_state — minimize / maximize / restore / close / always_on_top_on/off/toggle / move_to_display
  • superbased_window_bounds — read window rect + minimized/maximized state (PURE QUERY — no activation, no focus change). For "is this window in the top-left yet?" reasoning without disturbing foreground.
  • superbased_resize_window — device presets (iphone-se / ipad-pro-13 / desktop-hd / etc.) + custom W×H + optional X/Y reposition; contentArea: true for browser device-emulation
  • superbased_focus_window — standalone focus (rare — never as setup before another tool; per-tool window targeting handles this in one approval)
  • superbased_launch_app — spawn process with launchAllowlist gate (substring or sha256:hex pin)
  • superbased_open_url — open URL in default or specific browser; optional waitForLoad
  • superbased_tab_management — new / close / next / prev / switch_to (browser tab control via Ctrl+T/W/Tab/Shift+Tab/N)
  • superbased_find_in_page — Ctrl+F + query + Enter / Shift+Enter + optional Escape (close-after)
  • superbased_tray_click — click a Windows system-tray icon by tooltip match (walks Shell_TrayWnd cross-process via VirtualAllocEx + ReadProcessMemory; optional includeOverflow)
  • superbased_virtual_desktop — switch desktops (Win+Ctrl+Left/Right on Windows; Ctrl+Left/Right on macOS Spaces)
  • superbased_doctor_gui_automation — hermetic probe (launches Notepad/TextEdit and runs every primitive, reports per-tool {ok, durationMs, errorCode})
  • superbased_dry_run — cross-tool preview wrapper ({tool, args} returns {errorCode: 'DRYRUN'} without dispatching)
  • superbased_replay — replay a range of ~/.superbased/audit.log entries via superbased_sequence (default-safe dryRun: true)
  • superbased_undo_last — reverse the last action on this process's in-memory undo stack. A type action is undone by sending the matching number of Backspaces; click / hotkey / scroll / focus_window entries are popped but return not_reversible (call again to walk back further).
  • superbased_project — "where am I" snapshot for agents: cwd, git branch + status + ahead/behind, detected framework(s), recent captures, active recording (if any), last dictation preview. Recommended first call for many agent sessions — gives enough context to pick the right tool without extra round-trips.
  • superbased_workspace_sync — one-call bundled snapshot: fullscreen capture (base64 PNG) + OCR text (truncated to 4000 chars) + active window metadata + timestamp + project dir. Replaces the capture → ocr → window_list chain with a single round-trip. Read-only; does NOT persist to gallery.
  • superbased_tools — progressive-disclosure dispatcher. When progressiveDisclosure.enabled = true in settings, only 15 core tools appear in tools/list; this tool reveals the hidden tools by category (capture / record / dictate / interact / window / layout / ax / ai / edit / clipboard / system). For trimming agent token cost without restricting access.
  • superbased_gallery — list / get / search / stats
  • superbased_gallery_update — update tags or notes
  • superbased_settings — view or change settings
  • superbased_presets — manage AI instruction presets
  • superbased_health — server status + capabilities
  • superbased_auth — sign-in state
  • superbased_license — plan limits + device info
  • superbased_ai_usage — daily AI quota status
  • superbased_redact — auto-redact secrets / PII
  • superbased_annotate — rectangles, highlights, blur, text labels, arrows
  • superbased_clipboard — read or write the system clipboard (text + images)
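The superbased_sequence entry lists the allowed step types and says to always end with a screenshot step. A client could check a plan against those two rules before spending an approval on it. In this sketch the step names come from the README. The `{ type, ...params }` step shape and `validateSequence` are assumptions.

```typescript
// Client-side sanity check for a superbased_sequence plan.
// Step type names come from the README; the step object shape is assumed.

const STEP_TYPES = new Set<string>([
  "click", "type", "hotkey", "scroll", "drag", "hover", "screenshot", "wait",
  "wait_for", "assert", "locate", "ax_invoke", "find_image", "form_fill",
  "dialog_handle", "find_in_page", "tab_management", "virtual_desktop", "tray_click",
]);

interface Step {
  type: string;
  [param: string]: unknown;
}

/** Returns a list of problems; an empty list means the plan passes both checks. */
function validateSequence(steps: Step[]): string[] {
  if (steps.length === 0) return ["sequence is empty"];
  const problems: string[] = [];
  steps.forEach((step, i) => {
    if (!STEP_TYPES.has(step.type)) problems.push(`step ${i}: unknown type "${step.type}"`);
  });
  if (steps[steps.length - 1].type !== "screenshot") {
    problems.push("last step should be a screenshot");
  }
  return problems;
}

const plan: Step[] = [
  { type: "click", x: 120, y: 48 }, // hypothetical params
  { type: "type", text: "hello" },
  { type: "screenshot" },
];
console.log(validateSequence(plan)); // []
```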

Plus 13 MCP resources for live status and subscribable event streams: superbased://status, settings, presets, captures/recent, auth, license, recording, recording/monitor, analytics/summary, ai/usage, plus subscribable watcher streams (clipboard, active-window, file-watcher, screen-region).

Humanization v2 (default 'light', opt out via humanInputDefault: 'off') covers Bezier-curved cursor approaches with a sin-shaped velocity envelope, gaussian click-target jitter, gamma-distributed inter-key timing, click + key hold variation (50–110ms vs ~5ms atomic), a pre-click settle dwell + tremor, optional typo+correct sequences (paranoid only), a per-process cross-session salt mixed into seeds, and an inter-action catch-up pause. It is designed to defeat CAPTCHA-style trajectory and timing classifiers. Idle cursor drift is opt-in via the humanInputIdleDrift setting. CAPTCHA-solving workflow patterns are documented in the Common Workflows section of SUPERBASED_SKILL.md.
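The humanization features above correspond to standard techniques, shown here in isolation. The first is a cubic Bezier path whose progress follows (1 − cos πt)/2, so speed rises and falls like sin πt. The second is Box–Muller gaussian jitter for click targets. The third is Marsaglia–Tsang gamma sampling for inter-key delays. All three are driven by a seeded PRNG (mulberry32) so runs are reproducible. This is not SuperBased's implementation. The bow size, jitter σ, and gamma parameters are invented for illustration.

```typescript
// Illustrative humanized-input primitives (not SuperBased's code).
// All tuning constants below are hypothetical.

interface Point {
  x: number;
  y: number;
}

/** mulberry32: small seeded PRNG returning floats in [0, 1). */
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/** Normal sample via Box–Muller, scaled to mean/sd. */
function gaussian(rng: () => number, mean = 0, sd = 1): number {
  const u1 = 1 - rng(); // in (0, 1], so log(u1) is finite
  const u2 = rng();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

/** Gamma(shape, scale) via Marsaglia–Tsang; this sketch requires shape >= 1. */
function gamma(rng: () => number, shape: number, scale: number): number {
  if (shape < 1) throw new Error("this sketch only handles shape >= 1");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = gaussian(rng);
    const v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v * scale;
  }
}

function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, u: number): Point {
  const a = (1 - u) ** 3;
  const b = 3 * (1 - u) ** 2 * u;
  const c = 3 * (1 - u) * u ** 2;
  const d = u ** 3;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

/** Cursor path that bows off the straight line and eases in and out. */
function humanPath(from: Point, to: Point, steps: number, rng: () => number): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const len = Math.hypot(dx, dy) || 1;
  const nx = -dy / len; // unit normal to the straight line
  const ny = dx / len;
  const bow = (rng() - 0.5) * 0.3 * len; // hypothetical: up to ±15% of the distance
  const p1 = { x: from.x + dx / 3 + nx * bow, y: from.y + dy / 3 + ny * bow };
  const p2 = { x: from.x + (2 * dx) / 3 + nx * bow, y: from.y + (2 * dy) / 3 + ny * bow };
  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps; // normalized time
    const u = (1 - Math.cos(Math.PI * t)) / 2; // du/dt is proportional to sin(pi * t)
    points.push(cubicBezier(from, p1, p2, to, u));
  }
  return points;
}

const rng = mulberry32(0x5eed);
const path = humanPath({ x: 0, y: 0 }, { x: 300, y: 100 }, 20, rng);
const click = { x: gaussian(rng, 300, 2), y: gaussian(rng, 100, 2) }; // jittered target
const keyDelaysMs = Array.from({ length: 5 }, () => gamma(rng, 4, 30)); // mean 4 * 30 = 120 ms
console.log(path.length, click, keyDelaysMs.map(Math.round));
```

Easing the Bezier parameter rather than arc length is a simplification. The cursor's actual speed also depends on the curve's shape, but the start-slow, fast-middle, end-slow profile survives.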


Desktop app

Want the full GUI? Download the desktop app from superbased.app/download. Everything carries over automatically — captures, settings, auth, presets, recordings. Zero migration. Same data directory (~/.superbased/).

