@tmhs/screencast-mcp
v0.8.12
MCP server for Windows screen recording, frame sampling, and minimal ffmpeg edits
Screencast MCP
An MCP server that lets an agent record the screen, watch the footage, and make minimal ffmpeg edits — over stdio, on Windows.
Capture the desktop, a monitor, a window, or a region; sample a recording into frames the agent can actually look at; trim, crop, scale, overlay, compress, redact, and convert. No cloud, no streaming: ffmpeg and ffprobe wrapped behind twenty-five typed tools.
[!NOTE] Screen capture uses `gdigrab` and is Windows-only; the watch, edit, and produce tools work anywhere ffmpeg runs. The full pipeline is in place: capture, watch, the edit surface, `redact_region` safety redaction, system-audio capture, and the produce layer (transitions, title cards, music bed, aspect variants, platform export). Cross-platform capture backends are next; see ROADMAP.md.
Overview
Screencast MCP gives an agent a screen recorder it can drive and reason about. It speaks Model Context Protocol over stdio and exposes ffmpeg as a small, typed tool surface rather than a flag soup. The defining capability is the watch loop: an agent records footage, samples it into PNG frames, and views those frames to confirm what actually happened on screen.
The design choices are deliberate rather than incidental:
- Capture is always explicit. A recording or screenshot happens only on a tool call — nothing auto-fires and there is no background or scheduled capture.
- Footage is made viewable. `sample_frames` turns a video into images an agent can open, so "watch what happened" is a first-class operation, not an afterthought.
- Presets over raw flags. Quality is `draft`/`standard`/`high`; the agent never reasons about codecs, CRF, or pixel formats.
- Safe by default. Output lands under `SCREENCAST_HOME`, never inside a project checkout, and the public repo's `.gitignore` blocks captured media from being committed.
- Crash-safe sessions. A recording interrupted by a crash is reconciled on the next start (orphan reaping), so no ffmpeg child silently outlives the server.
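The crash-safe behaviour described in the last bullet can be pictured as a reconciliation pass at startup. The following is only a sketch under stated assumptions: the `SessionRecord` shape, the status names, and the injected `isAlive` and `kill` callbacks are hypothetical and are not taken from the package's source.

```typescript
// Hypothetical session record; field and status names are assumptions for illustration.
interface SessionRecord {
  id: string;
  pid: number;
  status: "recording" | "finished" | "interrupted";
}

// On startup, any session still marked "recording" belongs to a server that no
// longer exists. Stop its ffmpeg child if it is still alive and mark it interrupted.
function reconcileSessions(
  sessions: SessionRecord[],
  isAlive: (pid: number) => boolean,
  kill: (pid: number) => void,
): SessionRecord[] {
  return sessions.map((s): SessionRecord => {
    if (s.status !== "recording") return s;
    if (isAlive(s.pid)) kill(s.pid);
    return { ...s, status: "interrupted" };
  });
}
```

Injecting `isAlive` and `kill` keeps the pass testable without spawning real processes.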
Tools
Twenty-five tools across four concerns. The manifest in mcp-tools.json is the canonical surface and is kept in sync with src/tools/.
Capture
| Tool | Purpose |
| --- | --- |
| start_recording | Start a background recording. target is one of full, monitor:<index>, window:<title>, or region:<x>,<y>,<w>,<h>; optional fps, quality, and audio (set audio.source = system to also capture loopback audio). Returns a session id and output path. |
| stop_recording | Stop a session by id. Sends ffmpeg a graceful quit so the file is finalized, not truncated. Returns the final path and duration. |
| list_sessions | List active and finished sessions. |
| get_session | Inspect a single session by id. |
| screenshot | Capture a single PNG of a target. |
| list_audio_devices | List the DirectShow audio devices ffmpeg can see and flag a likely system-audio loopback device for start_recording. |
Watch
| Tool | Purpose |
| --- | --- |
| sample_frames | Extract frames from a video — at a fixed fps or at explicit timestamps — so the agent can view what happened. Returns the frame paths. |
| get_media_info | ffprobe wrapper: duration, resolution, fps, codecs, container format, and size. |
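To make the two sampling modes concrete, here is a rough sketch of the ffmpeg argument lists they might translate to. It uses standard ffmpeg options (`-ss`, `-i`, `-vf fps=…`, `-frames:v`), but the function name, the request type, and the exact commands are assumptions rather than the server's actual implementation. The output file naming follows the `frame_000_0.5s.png` pattern shown under Usage.

```typescript
// Hypothetical request type: sample at a fixed rate or at explicit timestamps.
type SampleRequest =
  | { mode: "fps"; fps: number }
  | { mode: "timestamps"; timestamps: number[] };

// Returns one argument list per ffmpeg invocation.
function buildSampleArgs(input: string, outDir: string, req: SampleRequest): string[][] {
  if (req.mode === "fps") {
    if (!(req.fps > 0)) throw new Error("fps must be positive");
    // A single run: the fps filter emits frames at a fixed rate.
    return [["-y", "-i", input, "-vf", `fps=${req.fps}`, `${outDir}/frame_%03d.png`]];
  }
  // One run per timestamp: seek to it, then write exactly one frame.
  return req.timestamps.map((t, i) => {
    if (!(t >= 0)) throw new Error(`Invalid timestamp: ${t}`);
    const name = `frame_${String(i).padStart(3, "0")}_${t}s.png`;
    return ["-y", "-ss", String(t), "-i", input, "-frames:v", "1", `${outDir}/${name}`];
  });
}
```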
Minimal edit
| Tool | Purpose |
| --- | --- |
| trim | Cut a sub-clip by start + (end or duration). Stream-copy for speed. |
| concat | Join two or more videos into one. |
| convert | Convert between mp4, gif, and webm. |
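As an illustration of "start + (end or duration)" with stream copy, the sketch below resolves the two input forms to a single duration and builds an ffmpeg argument list with `-c copy`. The `TrimRequest` type and `buildTrimArgs` name are hypothetical, and the real tool may order or choose its flags differently.

```typescript
// Hypothetical request shape: start plus exactly one of end or duration (seconds).
interface TrimRequest {
  start: number;
  end?: number;
  duration?: number;
}

function buildTrimArgs(input: string, output: string, req: TrimRequest): string[] {
  const hasEnd = req.end !== undefined;
  const hasDuration = req.duration !== undefined;
  if (hasEnd === hasDuration) throw new Error("Provide exactly one of end or duration");
  if (req.start < 0) throw new Error("start must not be negative");

  const duration = hasEnd ? (req.end as number) - req.start : (req.duration as number);
  if (!(duration > 0)) throw new Error("Resulting clip must have a positive duration");

  // -c copy skips re-encoding, which is fast but means cuts snap to keyframes.
  return ["-y", "-ss", String(req.start), "-i", input, "-t", String(duration), "-c", "copy", output];
}
```

Because no pixels are rewritten, this is quick, but the cut points are only as precise as the source's keyframe spacing.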
Edit surface
These tools re-encode (a filter rewrites pixels, so stream copy does not apply). They reuse the same draft/standard/high presets as capture.
| Tool | Purpose |
| --- | --- |
| crop | Crop to a pixel rectangle (x, y, width, height). A rectangle that runs off the frame is rejected, not silently clamped. |
| scale | Resize to a width and/or height. One side keeps the aspect ratio; both set an exact size. |
| speed | Change playback speed by a factor (>1 faster, <1 slower). Audio is retempo'd when present. |
| overlay | Composite a logo, watermark, or picture-in-picture onto a base video at a position, optionally scaled and time-limited. |
| compress | Re-encode smaller with a light/medium/heavy CRF ladder and an optional maxWidth that only downscales. |
| extract_audio | Write the audio track to its own file (mp3, aac, wav, or copy). |
| clip | Extract one or more frame-accurate sub-segments to separate files. Unlike trim, it re-encodes so cuts land exactly on the given times. |
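The "rejected, not silently clamped" rule for `crop` amounts to a bounds check before any filter is built. A minimal sketch, assuming the frame size is already known (for example from `get_media_info`); the type and function names are illustrative, while `crop=w:h:x:y` is ffmpeg's standard crop filter syntax.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface FrameSize { width: number; height: number }

// Validates the rectangle against the frame and returns an ffmpeg crop filter string.
function buildCropFilter(rect: Rect, frame: FrameSize): string {
  const values = [rect.x, rect.y, rect.width, rect.height];
  if (!values.every((v) => Number.isInteger(v))) throw new Error("Crop values must be integers");
  if (rect.width <= 0 || rect.height <= 0) throw new Error("Crop size must be positive");

  const fits =
    rect.x >= 0 &&
    rect.y >= 0 &&
    rect.x + rect.width <= frame.width &&
    rect.y + rect.height <= frame.height;
  // Reject instead of clamping, so the caller never gets a different crop than requested.
  if (!fits) throw new Error("Crop rectangle runs off the frame");

  return `crop=${rect.width}:${rect.height}:${rect.x}:${rect.y}`;
}
```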
Redact
| Tool | Purpose |
| --- | --- |
| redact_region | Cover declared rectangles in a video. style is box (default, a solid irreversible fill), blur, or pixelate; each region may be limited to a start/end window and expanded with pad. |
[!IMPORTANT]
`redact_region` covers the regions you declare. It is not automatic secret detection, so it cannot find a secret you did not point it at. The default `box` style is a solid fill, which is irreversible; `blur` and `pixelate` are softer but can be partially recovered, so prefer `box` for real secrets. A region that falls outside the frame is rejected rather than silently doing nothing.
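For the default `box` style, one plausible mapping is ffmpeg's `drawbox` filter with `t=fill`, gated by a timeline `enable` expression when a start/end window is given. The sketch below is an assumption about how such a filter string could be built, not the tool's actual code. In particular, clamping the padded rectangle to the frame is a choice made here for illustration.

```typescript
// Hypothetical region shape mirroring the options described above.
interface RedactRegion {
  x: number; y: number; width: number; height: number;
  pad?: number;   // pixels added on every side
  start?: number; // seconds
  end?: number;   // seconds
}

function buildBoxFilter(region: RedactRegion, frame: { width: number; height: number }): string {
  const inside =
    region.x >= 0 && region.y >= 0 && region.width > 0 && region.height > 0 &&
    region.x + region.width <= frame.width &&
    region.y + region.height <= frame.height;
  if (!inside) throw new Error("Redaction region falls outside the frame");

  // Assumption: padding may push past the edge, so the padded box is clamped.
  const pad = region.pad ?? 0;
  const x0 = Math.max(0, region.x - pad);
  const y0 = Math.max(0, region.y - pad);
  const x1 = Math.min(frame.width, region.x + region.width + pad);
  const y1 = Math.min(frame.height, region.y + region.height + pad);

  let filter = `drawbox=x=${x0}:y=${y0}:w=${x1 - x0}:h=${y1 - y0}:color=black:t=fill`;
  if (region.start !== undefined || region.end !== undefined) {
    const from = region.start ?? 0;
    const expr = region.end === undefined ? `gte(t,${from})` : `between(t,${from},${region.end})`;
    filter += `:enable='${expr}'`;
  }
  return filter;
}
```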
Produce
The production layer turns raw clips into a finished piece. Tools that combine clips auto-normalize each input to a common resolution, fps, and audio rate first, so heterogeneous sources compose cleanly.
| Tool | Purpose |
| --- | --- |
| xfade_transition | Crossfade two videos into one with an xfade transition (fade, wipeleft, slideup, ...). Audio is crossfaded when both clips have a track. |
| assemble_highlights | Stitch two or more clips into one, with hard cuts (transition: "cut") or an xfade transition between each. |
| title_card | Generate a standalone title card (centered text on a solid background, with a silent track so it composes). Text uses a bundled font, so no system font is needed. |
| music_bed | Lay a music track under a video: looped/trimmed to length, faded in and out, leveled, and mixed with any existing audio (optionally ducked). |
| reframe | Re-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox, no content lost) or crop (fill, trims overflow). |
| export_preset | Encode a platform-ready file (youtube, instagram_reel, tiktok, x, square): reframes, caps fps, and encodes H.264 at the platform bitrate with faststart. |
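The crossfade in `xfade_transition` depends on one piece of arithmetic: the transition has to start `fadeDuration` seconds before the first clip ends. The sketch below builds a filtergraph string using ffmpeg's `xfade` and `acrossfade` filters. The function name and parameters are hypothetical, and it assumes both inputs were already normalized to the same resolution and frame rate.

```typescript
// Builds a filtergraph that crossfades input 0 into input 1.
function buildXfadeGraph(
  firstDuration: number, // length of the first clip in seconds
  transition: string,    // e.g. "fade", "wipeleft", "slideup"
  fadeDuration: number,  // length of the overlap in seconds
  hasAudio: boolean,     // true only when both clips carry an audio track
): string {
  if (!(fadeDuration > 0) || fadeDuration >= firstDuration) {
    throw new Error("Fade must be positive and shorter than the first clip");
  }
  // The overlap begins this many seconds into the first clip.
  const offset = firstDuration - fadeDuration;
  const video = `[0:v][1:v]xfade=transition=${transition}:duration=${fadeDuration}:offset=${offset}[v]`;
  if (!hasAudio) return video;
  return `${video};[0:a][1:a]acrossfade=d=${fadeDuration}[a]`;
}
```

The resulting clip is `firstDuration + secondDuration - fadeDuration` seconds long, since the two clips overlap for the length of the fade.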
Targets
Every capture tool takes a single target string, so an agent never has to juggle quoting:
| Target | Captures |
| --- | --- |
| full | The whole virtual desktop |
| monitor:<index> | One display; 0 is always primary |
| window:<title> | The on-screen rectangle a window occupies (case-insensitive exact title, else substring; topmost wins) |
| region:<x>,<y>,<w>,<h> | An absolute pixel rectangle |
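The target grammar in the table is small enough to parse by hand. This is a minimal sketch of such a parser; the `Target` type and `parseTarget` function are illustrative names, not the package's exported API, and the validation rules (for example allowing negative region offsets) are assumptions.

```typescript
type Target =
  | { kind: "full" }
  | { kind: "monitor"; index: number }
  | { kind: "window"; title: string }
  | { kind: "region"; x: number; y: number; w: number; h: number };

function parseTarget(input: string): Target {
  if (input === "full") return { kind: "full" };

  const sep = input.indexOf(":");
  if (sep === -1) throw new Error(`Unrecognized target: ${input}`);
  const kind = input.slice(0, sep);
  const rest = input.slice(sep + 1);

  switch (kind) {
    case "monitor": {
      if (!/^\d+$/.test(rest)) throw new Error(`Invalid monitor index: ${rest}`);
      return { kind: "monitor", index: Number(rest) };
    }
    case "window": {
      // Everything after the first colon is the title, so titles may contain colons.
      if (rest.length === 0) throw new Error("Window title must not be empty");
      return { kind: "window", title: rest };
    }
    case "region": {
      const parts = rest.split(",").map((p) => p.trim());
      if (parts.length !== 4 || !parts.every((p) => /^-?\d+$/.test(p))) {
        throw new Error(`Invalid region: ${rest}`);
      }
      const [x, y, w, h] = parts.map(Number) as [number, number, number, number];
      if (w <= 0 || h <= 0) throw new Error("Region width and height must be positive");
      return { kind: "region", x, y, w, h };
    }
    default:
      throw new Error(`Unrecognized target kind: ${kind}`);
  }
}
```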
Output is written under SCREENCAST_HOME (default <homedir>/.screencast-mcp) into recordings/, frames/, screenshots/, and edits/. Any tool also accepts an explicit output path.
Prerequisites
ffmpeg and ffprobe are external dependencies and must be on PATH (or pointed at via the FFMPEG_PATH / FFPROBE_PATH environment variables). The server detects them per call and returns a clear error with an install hint if either is missing.
| Platform | Install |
| --- | --- |
| Windows | winget install Gyan.FFmpeg or choco install ffmpeg |
| macOS | brew install ffmpeg |
| Linux | apt install ffmpeg |
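The lookup order described above (environment override first, then `PATH`) and the per-platform install hint could look roughly like this. It is a sketch only: the function names are hypothetical, the environment is passed in as a plain object to keep the example self-contained, and the message wording is invented for illustration.

```typescript
type Env = Record<string, string | undefined>;

// Prefer the explicit override; otherwise fall back to the bare name and let PATH resolve it.
function resolveBinary(name: "ffmpeg" | "ffprobe", env: Env): string {
  const override = env[name === "ffmpeg" ? "FFMPEG_PATH" : "FFPROBE_PATH"];
  return override !== undefined && override.trim() !== "" ? override : name;
}

// Builds an error message with the install command from the table above.
function missingBinaryMessage(name: "ffmpeg" | "ffprobe", platform: string): string {
  const hints: Record<string, string> = {
    win32: "winget install Gyan.FFmpeg",
    darwin: "brew install ffmpeg",
    linux: "apt install ffmpeg",
  };
  const hint = hints[platform] ?? "install ffmpeg for your platform";
  const variable = name === "ffmpeg" ? "FFMPEG_PATH" : "FFPROBE_PATH";
  return `${name} was not found on PATH. Try: ${hint}, or set ${variable}.`;
}
```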
Installation
```bash
npm install -g @tmhs/screencast-mcp
```
Or run it from a clone:
```bash
git clone https://github.com/TMHSDigital/screencast-mcp.git
cd screencast-mcp
npm install
npm run build   # produces dist/index.js, the server entry point
```
MCP client configuration
```json
{
  "mcpServers": {
    "screencast": {
      "command": "npx",
      "args": ["-y", "@tmhs/screencast-mcp"]
    }
  }
}
```
[!TIP] Running from a clone instead of the published package? Point the client straight at the build: `"command": "node"`, `"args": ["C:/path/to/screencast-mcp/dist/index.js"]`.
Usage
A typical watch loop — record, sample, look:
```
// 1. record a region for a few seconds
start_recording { "target": "region:0,0,1280,720", "quality": "draft" }
// -> { "sessionId": "rec-…", "outputPath": "…/recordings/rec-….mp4" }

// 2. finalize the file
stop_recording { "sessionId": "rec-…" }
// -> { "durationSec": 4.2, "finalizedGracefully": true }

// 3. turn it into frames the agent can view
sample_frames { "input": "…/recordings/rec-….mp4", "timestamps": [0.5, 2, 3.5] }
// -> { "frames": ["…/frames/…/frame_000_0.5s.png", …] }
```
`screenshot { "target": "window:My App" }` is the one-shot equivalent for a still.
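The record, sample, look sequence can also be driven from client code. The sketch below deliberately avoids any particular MCP client library: `callTool` is a hypothetical function you would supply that sends a tool call and resolves with the parsed result. The field names (`sessionId`, `outputPath`, `frames`) are taken from the example responses above.

```typescript
// Hypothetical transport: sends one tool call and resolves with its parsed result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<Record<string, unknown>>;

async function recordAndSample(
  callTool: CallTool,
  target: string,
  seconds: number,
  timestamps: number[],
): Promise<string[]> {
  const started = await callTool("start_recording", { target, quality: "draft" });
  const sessionId = String(started["sessionId"]);
  const outputPath = String(started["outputPath"]);

  try {
    // Let the recording run for the requested time.
    await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  } finally {
    // Always stop, so the file is finalized even if the wait is interrupted.
    await callTool("stop_recording", { sessionId });
  }

  const sampled = await callTool("sample_frames", { input: outputPath, timestamps });
  const frames = sampled["frames"];
  return Array.isArray(frames) ? frames.map((f) => String(f)) : [];
}
```

The returned paths are what an agent would then open to check what actually appeared on screen.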
Windows notes
- Multi-monitor offsets. `gdigrab` has no "capture monitor N" selector, so a monitor target captures the whole virtual desktop and crops to that display's pixel bounds (from `System.Windows.Forms.Screen.AllScreens`). `monitor:1` grabs the second display at its real offset; `monitor:0` is always primary.
- Window capture resolves the window to the on-screen rectangle it occupies and captures that, rather than the window's own surface, because `gdigrab`'s native `title=` grab returns a blank frame for GPU- or DirectComposition-composited windows (Chrome, Electron, UWP). The window must be visible, on top, and not minimized (a minimized window is rejected with a clear error), the capture includes anything drawn over that rectangle, and for `start_recording` the rectangle is fixed once at start. True per-window background capture (Windows Graphics Capture API) is a future phase.
- Fullscreen-exclusive apps often produce black frames under `gdigrab`; run the source in borderless-windowed mode for reliable capture.
- System audio needs a loopback device. `gdigrab` is video-only, so `start_recording` with `audio.source=system` captures from a separate dshow input. Windows has no native loopback, so this needs a virtual-audio device (enable Stereo Mix, or install a driver such as screen-capture-recorder's `virtual-audio-capturer` or VB-CABLE). Run `list_audio_devices` to find it. Microphone capture is intentionally not supported.
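The multi-monitor note boils down to a coordinate shift: display bounds are reported in virtual-desktop coordinates, which can be negative when a display sits left of or above the primary, while a crop is measured from the top-left of the captured image. A small sketch of that arithmetic follows, with hypothetical names and the assumption that the monitor list is ordered with the primary display first.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

// Bounding box of all displays, in virtual-desktop coordinates.
function virtualDesktop(monitors: Bounds[]): Bounds {
  if (monitors.length === 0) throw new Error("No monitors reported");
  const left = Math.min(...monitors.map((m) => m.x));
  const top = Math.min(...monitors.map((m) => m.y));
  const right = Math.max(...monitors.map((m) => m.x + m.width));
  const bottom = Math.max(...monitors.map((m) => m.y + m.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Crop rectangle for one display, relative to the top-left of the full-desktop capture.
function cropForMonitor(monitors: Bounds[], index: number): Bounds {
  const monitor = monitors[index];
  if (monitor === undefined) throw new Error(`No monitor at index ${index}`);
  const desktop = virtualDesktop(monitors);
  return {
    x: monitor.x - desktop.x,
    y: monitor.y - desktop.y,
    width: monitor.width,
    height: monitor.height,
  };
}
```

With a secondary display to the left of the primary, the primary's crop starts at a positive x offset even though its own bounds start at 0.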
Threat model
[!WARNING] Screen capture can record anything on screen at the moment of capture: passwords, tokens, private messages, and other secrets. `window:` captures the screen rectangle a window occupies, so overlays, notifications, or another window drawn over it are captured too. Treat recordings, screenshots, and sampled frames as sensitive by default.
- Capture is always explicit. Nothing auto-fires; capture is gated behind an explicit tool call.
- Output stays local. Files are written to the local filesystem only — never uploaded, streamed, or transmitted anywhere.
- Public repo, private captures. The `.gitignore` blocks recordings, frames, screenshots, and common video/image output so test media cannot be committed by accident.
- Review before sharing. Sample frames or inspect a recording before handing a file to another tool or person, so you know what it contains.
- Redaction is declared, not detected. `redact_region` covers only the rectangles you specify, so it depends on you (or the agent) having found the secret first. Use the default `box` style for a solid irreversible fill, and still review the output before sharing.
- System audio is sensitive too. When `audio.source=system`, the recording captures everything playing on the machine (call audio, notifications, media). Treat audio-bearing recordings with the same care as the video.
Project structure
```
.
├── src/
│   ├── index.ts        # MCP server entry (stdio); registers every tool, reaps orphans
│   ├── context.ts      # shared session-store singleton
│   ├── tools/          # one file per tool (capture, watch, edit)
│   ├── utils/          # ffmpeg, monitors, windows, sessions, paths, targets
│   └── __tests__/      # vitest unit tests + guarded local-capture harness
├── docs/               # GitHub Pages documentation site
├── mcp-tools.json      # canonical tool manifest (kept in sync with src/tools)
├── .github/workflows/  # CI, release, npm publish, Pages, ecosystem drift check
├── ROADMAP.md · CONTRIBUTING.md · SECURITY.md · LICENSE
└── package.json
```
Development
```bash
npm install
npm run build   # tsc -> dist/
npm test        # vitest (pure unit tests; no ffmpeg or display required)
npm run dev     # tsx watch
```
The capture path can't be exercised on CI's headless Linux runners, so an end-to-end harness lives behind a flag and is skipped by default:
```bash
RUN_LOCAL_CAPTURE_TESTS=1 npm test   # Windows + ffmpeg + a real display
```
Contributing
Issues and pull requests are welcome — see CONTRIBUTING.md and the Code of Conduct. Security reports go through SECURITY.md.
License
Released under the CC-BY-NC-ND-4.0 license.
Built by TMHSDigital
