@abyssbug/vision-mcp
v0.1.0
Local, API-free Vision MCP server for image and video analysis. Works with any MCP-compatible client (OpenCode, Claude, etc.) without requiring external API keys.
Features
- Image Analysis: Process images with optional resizing/compression
- Video Analysis: Extract frames using ffmpeg with uniform or scene-based sampling
- No API Keys: The server runs locally and calls no external API; your chosen model performs the analysis
- Provider Agnostic: Compatible with GLM 4.6/4.5, Claude, and other vision-capable models
Installation
For OpenCode/Claude Desktop
Add to your MCP configuration:
{
  "mcpServers": {
    "vision-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@abyssbug/vision-mcp"
      ]
    }
  }
}
Prerequisites
- Node.js >= 22.0.0
- ffmpeg and ffprobe (for video analysis)
Install ffmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
Usage
Image Analysis
Call the image_analysis tool with:
{
  "path": "./image.png",
  "maxWidth": 1024
}
Video Analysis
Call the video_analysis tool with:
{
  "path": "./video.mp4",
  "maxFrames": 12,
  "width": 1024,
  "strategy": "uniform"
}
Configuration
Set environment variables for limits (optional):
MAX_BYTES=52428800 # Max file size (default: 50MB)
FRAME_LIMIT=24 # Max frames per video (default: 24)
DEFAULT_WIDTH=1024 # Default resize width (default: 1024)
TEMP_DIR=/tmp # Temp directory (default: system temp)
How It Works
- No Inference: This MCP only preprocesses media (resize, extract frames)
- Model Agnostic: Your chosen model performs the actual vision understanding
- Local Processing: All operations happen locally with ffmpeg and sharp
- Base64 Output: Returns processed media as base64-encoded content
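To make the base64 output step concrete, here is a minimal sketch of how processed bytes could be wrapped for an MCP client. The helper name `toImageContent` and the size guard are assumptions for illustration and are not taken from the package source. The `{ type, data, mimeType }` shape follows the image content block defined by the MCP specification, and the default limit mirrors the documented MAX_BYTES value.

```typescript
// Hypothetical helper, not the package's actual implementation.
interface ImageContent {
  type: "image";
  data: string; // base64-encoded bytes
  mimeType: string;
}

// 50 * 1024 * 1024, the documented MAX_BYTES default.
const DEFAULT_MAX_BYTES = 52_428_800;

function toImageContent(
  bytes: Uint8Array,
  mimeType: string,
  maxBytes: number = DEFAULT_MAX_BYTES,
): ImageContent {
  // Assumed behavior: reject media larger than the configured limit.
  if (bytes.byteLength > maxBytes) {
    throw new RangeError(
      `media is ${bytes.byteLength} bytes, limit is ${maxBytes}`,
    );
  }
  // Buffer is a Node.js global; encode the raw bytes as base64 text.
  return {
    type: "image",
    data: Buffer.from(bytes).toString("base64"),
    mimeType,
  };
}

// The first four bytes of the PNG signature, used as stand-in data.
const content = toImageContent(
  new Uint8Array([0x89, 0x50, 0x4e, 0x47]),
  "image/png",
);
console.log(content.data); // prints "iVBORw=="
```

The model receives only this encoded content, which is why the server needs no inference code or API key of its own.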
Tools
image_analysis
- Validates and optionally resizes images
- Returns base64-encoded image content
- Supports local paths and URLs
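As an illustration of the optional resize, the sketch below computes the target dimensions for a given maxWidth while preserving aspect ratio. The function name `fitToMaxWidth` is hypothetical, and the choice to leave smaller images untouched (no upscaling) is an assumption; the package may behave differently.

```typescript
// Hypothetical helper, not the package's actual implementation.
interface Size {
  width: number;
  height: number;
}

function fitToMaxWidth(original: Size, maxWidth: number): Size {
  if (maxWidth <= 0) {
    throw new RangeError("maxWidth must be positive");
  }
  // Assumption: images already within the limit are returned unchanged.
  if (original.width <= maxWidth) {
    return { ...original };
  }
  // Scale the height by the same factor as the width.
  const scale = maxWidth / original.width;
  return {
    width: maxWidth,
    height: Math.max(1, Math.round(original.height * scale)),
  };
}

const resized = fitToMaxWidth({ width: 4000, height: 3000 }, 1024);
console.log(resized); // { width: 1024, height: 768 }
```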
video_analysis
- Extracts frames using ffmpeg
- Supports uniform and scene-based sampling
- Returns multiple base64-encoded frame images
- Includes timestamps for each frame
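Uniform sampling can be pictured as dividing the video into equal segments and taking one frame from each. The sketch below is one way to compute those timestamps; the function name `uniformTimestamps`, the choice of segment midpoints, and clamping maxFrames to the FRAME_LIMIT default of 24 are assumptions, not confirmed details of the package.

```typescript
// Hypothetical helper, not the package's actual implementation.
function uniformTimestamps(
  durationSec: number,
  maxFrames: number,
  frameLimit: number = 24, // documented FRAME_LIMIT default
): number[] {
  if (!(durationSec > 0)) {
    throw new RangeError("durationSec must be positive");
  }
  // Assumption: the requested count is clamped to the range [1, frameLimit].
  const count = Math.min(Math.max(1, Math.floor(maxFrames)), frameLimit);
  const step = durationSec / count;
  // Take the midpoint of each equal-length segment.
  return Array.from({ length: count }, (_, i) => (i + 0.5) * step);
}

const stamps = uniformTimestamps(60, 12);
console.log(stamps.length, stamps[0], stamps[11]); // prints 12 2.5 57.5
```

Each timestamp could then be passed to ffmpeg to extract a single frame, and returned alongside that frame's base64 data.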
License
MIT
