superbased
v2.0.9
Screenshot capture, AI analysis, OCR, visual regression testing, token compression — headless server + MCP tools for AI coding CLIs
SuperBased
Agent eyes, ears, and hands — cross-platform, works offline. 72 MCP tools — full GUI automation (click / type / hotkey / scroll / drag / hover / AX-invoke / form-fill / dialog-handle / find-image / tab-management / find-in-page / virtual-desktop / tray-click) plus orientation tooling (workspace_sync / project / accessibility_tree / window_bounds / undo_last / progressive-disclosure dispatcher / stt_status), on top of screen capture / OCR / gallery / recording / monitor — with built-in humanization v2 (Bezier-curved cursor approaches with sin-shaped velocity envelope, gaussian click-target jitter, gamma-distributed inter-key timing, per-key + click hold variation, optional typo+correct sequences, per-process cross-session salt). Sign in for cloud AI.
Three-line quickstart
npm install -g superbased # install
superbased mcp # start MCP server
# add this to your AI editor's MCP config:
# { "superbased": { "command": "superbased", "args": ["mcp"] } }
That's it. Your AI editor can now see your screen, read text from it, capture specific windows, record sessions, diff them visually, manage a capture gallery, and compress long text into token-efficient images, all without an account.
Sign in for cloud AI vision and dictation on top.
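
If your editor already has an MCP config with other servers in it, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, assuming the config is a JSON object with a top-level `mcpServers` key (as in the stdio example in MCP setup). The `other-server` entry is hypothetical and only stands in for whatever is already configured.

```python
import json


def add_superbased(config: dict) -> dict:
    """Return a copy of an MCP config with the superbased entry added.

    Existing servers under "mcpServers" are kept; the input is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["superbased"] = {"command": "superbased", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
print(json.dumps(add_superbased(existing), indent=2))
```

Where the config file lives depends on the editor, so the sketch works on an in-memory dict and leaves reading and writing the file to you.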
What you get without an account
Everything you need to give an AI eyes on your desktop, fully local:
| Capability | No account | Signed in |
|---|:---:|:---:|
| Screen capture (fullscreen / region / window targeting) | ✓ | ✓ |
| OCR via local Tesseract (no data leaves the machine) | ✓ | ✓ |
| Capture gallery with full-text search, tags, notes | ✓ | ✓ |
| Token compression (text → optimized images) | ✓ | ✓ |
| Recording sessions (smart change-detect, periodic, monitor) | ✓ | ✓ |
| Visual diff & baseline regression testing | ✓ | ✓ |
| Window list & specific-window capture (even minimized) | ✓ | ✓ |
| Clipboard read/write (text + images) | ✓ | ✓ |
| Annotate / redact captures | ✓ | ✓ |
| Settings & instruction presets | ✓ | ✓ |
| AI vision analysis (Claude / GPT / Gemini via backend proxy) | — | ✓ |
| Voice dictation & cloud transcription | — | ✓ |
| Frame-by-frame AI description + prose narration | — | ✓ |
| Daily AI quota tracking | — | ✓ |
Local dictation (sherpa-onnx + native Windows/macOS engines) is on the near-term roadmap — see SuperBased issues.
"I want to..."
| Goal | Tool |
|---|---|
| See what's on the user's screen | superbased_capture_image |
| Capture a specific window (even minimized) | superbased_capture_image with window="Slack" |
| List all open windows | superbased_window_list |
| Read text from a screenshot | superbased_ocr (local) or superbased_ai with instruction="/extract" |
| Watch the screen for errors during a long agent run | superbased_recording with action=start mode=monitor |
| Compare two recording sessions for visual regressions | superbased_diff |
| Compress 50k tokens of logs into a single image | superbased_compress_text |
| See an image the user just copied | superbased_clipboard with action=readImage |
| Dictate via microphone (requires sign-in today) | superbased_dictate with mic=true |
| Hide secrets/PII before sharing a capture | superbased_redact |
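
Your editor normally sends these calls for you, but it can help to see what one looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds two such requests using only the argument names that appear in the table above (`window`, `action`, `mode`). Treat the argument shapes as illustrative; the full parameter schema is in SUPERBASED_SKILL.md.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Capture a specific window, then start a monitor-mode recording.
print(tool_call(1, "superbased_capture_image", {"window": "Slack"}))
print(tool_call(2, "superbased_recording", {"action": "start", "mode": "monitor"}))
```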
How it compares
| | SuperBased | Peekaboo | Playwright MCP | computer-use-mcp |
|---|---|---|---|---|
| Screen capture | ✓ Win + macOS + Linux | ✓ macOS only | — | ✓ |
| OCR (local) | ✓ Tesseract | — | — | — |
| Window targeting | ✓ (incl. minimized) | partial | — | partial |
| Recording sessions | ✓ smart / periodic / monitor | — | — | — |
| Visual regression diff | ✓ | — | partial | — |
| Token-efficient text compression | ✓ | — | — | — |
| AI provider | model-agnostic (Claude/GPT/Gemini) | own keys | own keys | Anthropic only |
| Works offline | ✓ for most tools (cloud AI optional) | partial | — | — |
| Browser automation | — (use Playwright MCP) | — | ✓ | — |
SuperBased is the desktop layer. For browser-only flows, run Playwright MCP alongside it.
MCP setup
Stdio (Claude Code, Cursor, Windsurf, Cline, OpenCode, Zed)
{
"mcpServers": {
"superbased": {
"command": "superbased",
"args": ["mcp"]
}
}
}
HTTP (OpenAI Codex)
Add to ~/.codex/config.toml:
[mcp_servers.superbased]
enabled = true
url = "http://127.0.0.1:47592/mcp"
Then run superbased mcp in a terminal.
See the full setup guide for plugin install badges and 8-editor configuration.
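
With the HTTP transport the client only works while the server is listening, so a quick reachability check can save some confusion. This is a rough sketch that tests whether anything accepts TCP connections on the address from the config above (`127.0.0.1:47592`). It does not speak MCP and cannot tell whether the listener is actually SuperBased; the function name `port_is_open` is illustrative.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 47592,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:47592")
    else:
        print("Nothing is listening on 127.0.0.1:47592; is the server running?")
```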
Sign in
Sign-in unlocks cloud AI features (vision analysis, transcription, dictation, frame narration). Every capability marked ✓ in the "No account" column of the table above works without an account.
superbased auth login # opens browser
superbased auth status # shows current state
superbased auth logout # clears stored token
CLI commands
superbased                Start headless server (default)
superbased serve          Start server with options (--port N)
superbased mcp            Start MCP stdio bridge (for AI editors)
superbased auth login     Sign in via browser
superbased auth status    Show authentication state
superbased auth logout    Clear stored token
superbased capture        Take a fullscreen screenshot
superbased --version      Show version
superbased --help         Show all commands
The API server runs on http://127.0.0.1:47592 and exposes 72 MCP tools + 13 resources. See SUPERBASED_SKILL.md for the full per-tool reference (parameters, return shapes, examples, error codes, decision guide, plus a CAPTCHA-solving Common Workflows section).
All 72 tools
- superbased_screenshot: preferred screenshot (window targeting + resolution control in one call)
- superbased_capture_image: advanced screenshot (region / explicit mode)
- superbased_capture: screenshot (metadata-only response)
- superbased_gallery_image: retrieve a saved image inline
- superbased_window_list: list all open windows with active/minimized state
- superbased_display_list: enumerate all connected displays (virtual-screen bounds, DPI, color space)
- superbased_find_image: visual template match (locate a small PNG on screen)
- superbased_capture_template: capture a region as a reusable template (with DPI + theme metadata sidecar)
- superbased_ai: AI vision analysis (Claude/GPT/Gemini via backend; slash commands /extract, /summarize, /code, /explain, /translate, /table, /edit, /reformat)
- superbased_ocr: local Tesseract OCR (no data leaves the machine)
- superbased_compress_text: text → token-efficient images
- superbased_describe_frames: AI description per recording frame (sign-in)
- superbased_narrate: prose narrative summary of a session (sign-in)
- superbased_recording: start / stop / pause / monitor (4 modes: interaction / periodic / smart change-detect / monitor with AI alerts)
- superbased_sessions: list past sessions, get frames
- superbased_export: ZIP / Markdown / PDF / HTML / GIF
- superbased_diff: visual regression between two sessions
- superbased_baseline: set / get / history of workflow baselines
- superbased_dictate: transcribe with cleanup (file, base64, or live mic; sign-in)
- superbased_transcribe: raw Whisper transcription (sign-in)
- superbased_dictation_history: past transcriptions
- superbased_stt_status: local STT engine probe (sherpa binary/model state, cloud signed-in, native availability, auto-resolved-to)
The agent's hands. All write tools accept humanize: 'off' | 'light' | 'human' | 'paranoid' per call (default 'light' via the humanInputDefault setting). All are gated behind guiAutomation.enabled (master toggle, default OFF), per-action toggles, and a confirm: true flag.
- superbased_click: click at coords OR resolved element (label / automationId / role+name); modifiers + clickCount (1/2/3) + button (left/right/middle); per-call humanization
- superbased_type: type text into focused field (clearFirst, delayMs, ime for CJK, label-resolved targeting)
- superbased_hotkey: keyboard shortcut (auto-classifies PostMessage vs SendInput delivery)
- superbased_scroll: wheel / keys / auto; vertical or horizontal axis; per-page or per-tick
- superbased_drag: press → interpolate → release with modifier wrap; multi-waypoint paths
- superbased_hover: move + dwell for tooltips and hover-reveal UIs
- superbased_pixel_color: read RGBA at coords with optional expected+tolerance match
- superbased_mouse_position: current cursor coords (no synthesis)
- superbased_wait: sleep N ms server-side (no per-iteration approval cost)
- superbased_wait_for: server-side polling for window appearance, pixel match, AX state
- superbased_locate: resolve label/AX/template to screen coords without acting
- superbased_ui_dump: screenshot + OCR text elements + AX tree merged (every visible element with center coords); a flat element list, optimized for "find me a clickable thing"
- superbased_accessibility_tree: raw native AX/UIA tree (hierarchical) for tree traversal, parent/child queries, full attribute access. Requires optional native binding (@superbased/macos-ax / @superbased/win-uia)
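
To make the per-call options concrete, here is a rough sketch of assembling arguments for a superbased_click call. The `humanize`, `confirm`, `clickCount`, and `button` names and their allowed values come from the descriptions above. The coordinate keys `x` and `y` are an assumption made for illustration; check SUPERBASED_SKILL.md for the real parameter names.

```python
HUMANIZE_LEVELS = ("off", "light", "human", "paranoid")
BUTTONS = ("left", "right", "middle")


def click_arguments(x: int, y: int, humanize: str = "light",
                    button: str = "left", click_count: int = 1) -> dict:
    """Build an arguments dict for a click, rejecting values the docs do not list."""
    if humanize not in HUMANIZE_LEVELS:
        raise ValueError(f"humanize must be one of {HUMANIZE_LEVELS}")
    if button not in BUTTONS:
        raise ValueError(f"button must be one of {BUTTONS}")
    if click_count not in (1, 2, 3):
        raise ValueError("click_count must be 1, 2, or 3")
    return {
        "x": x,                     # assumed key name
        "y": y,                     # assumed key name
        "button": button,
        "clickCount": click_count,
        "humanize": humanize,
        "confirm": True,            # write tools require the confirm flag
    }


print(click_arguments(640, 360, humanize="human", click_count=2))
```

Validating on the client side is optional, but it surfaces a mistyped level before the call uses up an approval.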
- superbased_sequence: preferred for any multi-step flow. ONE approval, ONE activation, N steps. Step types: click, type, hotkey, scroll, drag, hover, screenshot, wait, wait_for, assert, locate, ax_invoke, find_image, form_fill, dialog_handle, find_in_page, tab_management, virtual_desktop, tray_click. Always end with a screenshot step.
- superbased_scroll_to: scroll-until-found loop (one call; up to maxPages PageDown iterations)
- superbased_scroll_capture: multi-page scroll + frame stitching with calibration (Windows; the MCP equivalent of Cmd+Shift+A)
- superbased_find_title_bar_drag_region: pickup-safety helper for window-drag scenarios
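
The "always end with a screenshot step" rule is easy to enforce when building a sequence programmatically. The sketch below checks each step against the step types listed above and appends a final screenshot if one is missing. The payload shape (a `steps` list of objects with a `type` field) and the per-step fields in the example are assumptions for illustration, not the documented schema.

```python
STEP_TYPES = {
    "click", "type", "hotkey", "scroll", "drag", "hover", "screenshot",
    "wait", "wait_for", "assert", "locate", "ax_invoke", "find_image",
    "form_fill", "dialog_handle", "find_in_page", "tab_management",
    "virtual_desktop", "tray_click",
}


def build_sequence(steps: list) -> dict:
    """Validate step types and make sure the sequence ends with a screenshot."""
    checked = []
    for step in steps:
        if step.get("type") not in STEP_TYPES:
            raise ValueError(f"unknown step type: {step.get('type')!r}")
        checked.append(dict(step))
    if not checked or checked[-1]["type"] != "screenshot":
        checked.append({"type": "screenshot"})
    return {"steps": checked}


# Hypothetical two-step flow; "keys" and "text" are illustrative field names.
payload = build_sequence([
    {"type": "hotkey", "keys": "ctrl+l"},
    {"type": "type", "text": "example.com"},
])
print([step["type"] for step in payload["steps"]])
```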
- superbased_ax_invoke: UIA pattern invocation (Invoke / Toggle / SelectionItem.Select / Value.SetValue / ExpandCollapse). Top of the AX reliability pyramid; bypasses synthesized clicks.
- superbased_form_fill: one-call macro over AX-resolve + ValuePattern.SetValue (fast path) or click + clearFirst + type (fallback). Optional Tab-between, optional Submit. Shared sessionId audit trail.
- superbased_dialog_handle: auto-detect modal foreground; confirm / dismiss / fill_path
- superbased_context_menu_select: right-click + popup-detect + click matched menu item (one approval for the 4-call dance)
- superbased_drag_file: Windows scaffold (full OLE DoDragDrop deferred)
- superbased_window_state: minimize / maximize / restore / close / always_on_top_on/off/toggle / move_to_display
- superbased_window_bounds: read window rect + minimized/maximized state (PURE QUERY: no activation, no focus change). For "is this window in the top-left yet?" reasoning without disturbing foreground.
- superbased_resize_window: device presets (iphone-se / ipad-pro-13 / desktop-hd / etc.) + custom W×H + optional X/Y reposition; contentArea: true for browser device-emulation
- superbased_focus_window: standalone focus (rare; never as setup before another tool, since per-tool window targeting handles this in one approval)
- superbased_launch_app: spawn process with launchAllowlist gate (substring or sha256:hex pin)
- superbased_open_url: open URL in default or specific browser; optional waitForLoad
- superbased_tab_management: new / close / next / prev / switch_to (browser tab control via Ctrl+T/W/Tab/Shift+Tab/N)
- superbased_find_in_page: Ctrl+F + query + Enter / Shift+Enter + optional Escape (close-after)
- superbased_tray_click: click a Windows system-tray icon by tooltip match (walks Shell_TrayWnd cross-process via VirtualAllocEx + ReadProcessMemory; optional includeOverflow)
- superbased_virtual_desktop: switch desktops (Win+Ctrl+Left/Right on Windows; Ctrl+Left/Right on macOS Spaces)
- superbased_doctor_gui_automation: hermetic probe (launches Notepad/TextEdit and runs every primitive, reports per-tool {ok, durationMs, errorCode})
- superbased_dry_run: cross-tool preview wrapper ({tool, args} returns {errorCode: 'DRYRUN'} without dispatching)
- superbased_replay: replay a range of ~/.superbased/audit.log entries via superbased_sequence (default-safe dryRun: true)
- superbased_undo_last: reverse the last action from this process's in-memory stack. type undoes via Backspace count; click / hotkey / scroll / focus_window are popped but return not_reversible (call again to walk back further).
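
The dry-run wrapper is a convenient way to preview a write call before committing to it. As a small sketch, the helpers below wrap another tool's call in the `{tool, args}` shape described above and recognise the `DRYRUN` error code in a result. The helper names and the example click arguments are illustrative, and the result is assumed to be an already-parsed dict.

```python
def dry_run_arguments(tool: str, args: dict) -> dict:
    """Wrap a tool call in the {tool, args} shape taken by superbased_dry_run."""
    return {"tool": tool, "args": dict(args)}


def was_dry_run(result: dict) -> bool:
    """True if a parsed tool result reports that nothing was dispatched."""
    return result.get("errorCode") == "DRYRUN"


# Preview a click without performing it (argument names are illustrative).
preview = dry_run_arguments("superbased_click", {"x": 100, "y": 200, "confirm": True})
print(preview)
print(was_dry_run({"errorCode": "DRYRUN"}))
```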
- superbased_project: "where am I" snapshot for agents: cwd, git branch + status + ahead/behind, detected framework(s), recent captures, active recording (if any), last dictation preview. Recommended first call for many agent sessions, as it gives enough context to pick the right tool without extra round-trips.
- superbased_workspace_sync: one-call bundled snapshot: fullscreen capture (base64 PNG) + OCR text (truncated 4000 chars) + active window metadata + timestamp + project dir. Replaces the capture → ocr → window_list chain with a single round-trip. Read-only; does NOT persist to gallery.
- superbased_tools: progressive-disclosure dispatcher. When progressiveDisclosure.enabled = true in settings, only 15 core tools appear in tools/list; this tool reveals the hidden tools by category (capture / record / dictate / interact / window / layout / ax / ai / edit / clipboard / system). For trimming agent token cost without restricting access.
- superbased_gallery: list / get / search / stats
- superbased_gallery_update: update tags or notes
- superbased_settings: view or change settings
- superbased_presets: manage AI instruction presets
- superbased_health: server status + capabilities
- superbased_auth: sign-in state
- superbased_license: plan limits + device info
- superbased_ai_usage: daily AI quota status
- superbased_redact: auto-redact secrets / PII
- superbased_annotate: rectangles, highlights, blur, text labels, arrows
- superbased_clipboard: read or write the system clipboard (text + images)
Plus 13 MCP resources for live status + subscribable event streams: superbased://status, settings, presets, captures/recent, auth, license, recording, recording/monitor, analytics/summary, ai/usage, plus subscribable watcher streams (clipboard, active-window, file-watcher, screen-region).
Humanization v2 (default 'light', opt out via humanInputDefault: 'off') — Bezier-curved cursor approaches with sin-shaped velocity envelope, gaussian click-target jitter, gamma-distributed inter-key timing, click + key hold variation (50–110ms vs ~5ms atomic), pre-click settle dwell + tremor, optional typo+correct sequences (paranoid only), per-process cross-session salt mixed into seeds, inter-action catch-up pause. Defeats CAPTCHA-style trajectory and timing classifiers. Opt-in idle cursor drift via humanInputIdleDrift setting. CAPTCHA-solving workflow patterns documented in SUPERBASED_SKILL.md Common Workflows section.
Desktop app
Want the full GUI? Download the desktop app from superbased.app/download. Everything carries over automatically — captures, settings, auth, presets, recordings. Zero migration. Same data directory (~/.superbased/).
