@siteed/capture-helper
v0.2.1
Published
Generic window-capture CLI for window discovery, capture, recording, and agent evidence streams. macOS (ScreenCaptureKit) and Linux/X11 (XComposite + ffmpeg).
Maintainers
Readme
@siteed/capture-helper
Generic window capture helper for agents, automation tools, and evidence pipelines — macOS (ScreenCaptureKit) and Linux/X11 (XComposite + ffmpeg).
On macOS, capture-helper is a small Swift CLI built on ScreenCaptureKit: it discovers windows, captures exact window targets, streams H.264 Annex B bytes, and records MP4 evidence with a native AVFoundation writer (no ffmpeg). On Linux it reproduces the same CLI/protocol over X11 using an XComposite grabber plus ffmpeg — see Linux support.
Status
Early standalone extraction from Farmslot's internal tools/capture-helper.
The CLI is intentionally generic. Product-specific concepts such as Farmslot slots, iOS simulator aliases, MetaMask runners, or recipe semantics belong in the caller.
Requirements
- macOS 13.0+
- Xcode command-line tools / Swift toolchain
- Screen Recording permission for the terminal or parent app
- Optional:
ffmpegfor external workflows that still want it; macOSrecordandrecord --framedsession snapshots are native.
Linux support
On Linux the same CLI/protocol is implemented by a Node backend that drives a small
C grabber (XComposite per-window capture) and ffmpeg for H.264 encoding. The
capture-helper command auto-dispatches to it on Linux; commands, flags, the framed
stream protocol, and JSON events are identical to macOS (see docs/protocol.md).
Requirements:
X11 (Xorg) session. GNOME-on-Wayland per-window capture is portal-gated and not automatable — log into an Xorg session (e.g. "Ubuntu on Xorg", or set
WaylandEnable=falsein/etc/gdm3/custom.conf). A graphical desktop must be logged in (the GDM greeter has no app windows; use autologin for headless nodes).Captured apps must be X11 clients (Wayland-native surfaces are invisible to the X path). Force X11 where needed:
GDK_BACKEND=x11,QT_QPA_PLATFORM=xcb, Electron--ozone-platform=x11.Build/runtime deps (
ffmpegis required on Linux):sudo apt install -y gcc ffmpeg libx11-dev libxcomposite-dev libxdamage-dev libxfixes-dev libxext-dev # or run the helper: bash scripts/setup-linux-node.sh
Hardware H.264 encoding is opt-in via --encoder h264_nvenc (or
CAPTURE_HELPER_ENCODER=h264_nvenc); the default libx264 works everywhere.
Verify readiness with capture-helper doctor --json. On Linux, id values are X11
window ids (XIDs).
Install
Use with npx without installing globally:
npx -y @siteed/capture-helper@latest doctor --json
npx -y @siteed/capture-helper@latest list --jsonInstall globally with npm:
npm install -g @siteed/capture-helper
capture-helper doctor --jsonDownload the native release binary directly:
curl -L https://github.com/deeeed/capture-helper/releases/latest/download/capture-helper-darwin-arm64 \
-o /usr/local/bin/capture-helper
chmod +x /usr/local/bin/capture-helperBuild from source
swift build -c release
# or
npm run build:nativeThe npm build script copies the release binary to:
native/capture-helperWhen installed as an npm package, postinstall builds the native backend for the current platform if it's missing: on macOS it builds the Swift binary (native/capture-helper); on Linux it compiles the X11 grabber (native/x11-grabber) via gcc. If the toolchain or dev headers are absent it prints the install command and skips (the install never hard-fails). Set SITEED_CAPTURE_HELPER_SKIP_POSTINSTALL=1 to skip that step.
Commands
# Human default: list likely capturable windows
capture-helper
capture-helper -l
# Version / provenance
capture-helper version
capture-helper --version
# Environment readiness and permissions diagnostics
capture-helper doctor --json
# includes stable codes like screen_recording_denied, window_server_unavailable, ffmpeg_missing
# Request/open macOS Screen Recording permissions where possible
capture-helper permissions
capture-helper doctor --open-permissions --json
# List windows as a machine-readable JSON object
capture-helper list --json
# Human-readable table
capture-helper list --human
capture-helper -l
capture-helper list -h
capture-helper list --on-screen --capturable --human
capture-helper list --all --human
# Legacy JSON-lines listing
capture-helper --list-windows
capture-helper list --json-lines
# Capture a specific target as raw H.264 Annex B
capture-helper capture --window-id 12345 > /tmp/capture.h264
capture-helper capture --pid 12345 > /tmp/capture.h264
capture-helper capture --app-name Simulator --window-name "mm-1" > /tmp/capture.h264
# Legacy capture syntax remains supported
capture-helper --window-name "Simulator" > /tmp/capture.h264
# Framed multi-window stream with stdin control
capture-helper stream --framed --window-id 12345 > /tmp/windows.h264
# Record MP4 evidence (macOS: native AVFoundation, no ffmpeg; Linux: ffmpeg)
capture-helper record --window-id 12345 --duration 5 --output evidence.mp4 --open
# Record MP4 plus PNG proof frames from the same macOS native recording stream
{ sleep 1; echo "snapshot step-1.png"; echo stop; } | \
capture-helper record --framed --window-id 12345 --output evidence.mp4Resolve and snapshot
resolve lets agents debug target selection before starting video capture:
capture-helper resolve --app-name "Google Chrome" --window-name "MetaMask"It returns the selected window, selector type, and all candidates considered for that selector.
snapshot captures a one-frame PNG using the same target selectors:
capture-helper snapshot --window-id 12345 --output screenshot.pngTarget selectors
Prefer selectors in this order:
--window-idfromcapture-helper list --jsonfor exact capture.--pidwhen the caller owns the process tree and wants the largest suitable window.--app-name+--window-namefor human-friendly fallback matching.--window-namealone only for ad hoc use.
npm wrapper
This package exposes a Node wrapper so JavaScript-based agents can call the native binary through a normal bin entry:
node bin/capture-helper.js doctor --jsonThe wrapper resolves the binary in this order:
SITEED_CAPTURE_HELPER_BINnative/capture-helper.build/release/capture-helper/opt/homebrew/bin/capture-helper/usr/local/bin/capture-helper
Output contract
- raw capture stdout: H.264 Annex B byte stream
- 4-byte start codes:
00 00 00 01 - SPS/PPS emitted before keyframes
- baseline profile, no B-frames
- 4-byte start codes:
list/doctor/version: JSON on stdout by default- streaming/capture diagnostics: JSON lines on stderr
- command failures: JSON error lines on stderr with stable
codevalues such astarget_requiredandwindow_not_found - signal handling:
SIGINT/SIGTERMperform cleanup for direct capture;record --durationstops automatically
See docs/protocol.md for the framed stream and event contract.
Permissions
Grant Screen Recording permission to the terminal app, IDE, or agent host that launches the helper:
System Settings → Privacy & Security → Screen & System Audio Recording
After granting permission, restart the launching app. Use this to check readiness:
capture-helper doctor --jsonIf you see Code=-3801 / screen_recording_denied, especially over SSH, see docs/troubleshooting.md.
Integration principle
Keep this tool generic:
- good: window IDs, PIDs, app names, window titles, capture formats, diagnostics
- bad: Farmslot slots, project resources, simulator naming conventions, MetaMask-specific selectors
Higher-level tools should resolve their domain objects to a concrete macOS window target, then call capture-helper.
Development
swift build -c release
swift test # includes subprocess CLI/error-shape tests
npm run build:native
npm run doctor
npm pack --dry-runRelease
See docs/release.md.
License
MIT
