visual-observer
v0.2.0
MCP server that translates browser visual state into structured text for Claude
Visual Observer
MCP server that connects to a running web app via Playwright and translates browser state into structured text — enabling Claude to "see" and reason about what's happening on screen.
What It Does
- Launches a browser or connects to an existing one via the Chrome DevTools Protocol (CDP)
- Extracts Redux state directly from the running app
- Parses screenshots with OmniParser v2 (YOLOv8 + Florence-2) into structured text descriptions
- Detects anomalies by comparing what is rendered on screen (visual state) against the Redux data (data state)
- Captures raw screenshots as PNG images
- Exposes everything as MCP tools that any Claude Code session can call
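The README does not specify how anomalies are detected. As a rough illustration only, one way to compare visual state against data state is to check whether each expected value from the Redux state appears somewhere in the text OmniParser extracted from the screenshot. The ParsedElement shape, the findAnomalies function, and the matching rule below are all hypothetical, not visual-observer's actual implementation.

```ts
// Hypothetical shape of one element in OmniParser's structured output.
interface ParsedElement {
  type: string; // e.g. "text" or "icon"
  content: string; // text or caption recognised for the element
}

interface Anomaly {
  key: string;
  expected: string;
  reason: string;
}

// Flags every data value that does not appear in any parsed element's content.
// Matching is a naive case-insensitive substring check: a real comparison would
// need to handle number formatting (e.g. "1,200" vs 1200), partial matches
// (12 matches "120"), truncated labels, and off-screen elements.
function findAnomalies(
  elements: ParsedElement[],
  expected: Record<string, string | number>,
): Anomaly[] {
  const visibleText = elements.map((e) => e.content.toLowerCase());
  const anomalies: Anomaly[] = [];
  for (const [key, value] of Object.entries(expected)) {
    const needle = String(value).toLowerCase();
    if (!visibleText.some((text) => text.includes(needle))) {
      anomalies.push({ key, expected: String(value), reason: "value not found on screen" });
    }
  }
  return anomalies;
}

// Usage: the screen shows 45 wood, but the (hypothetical) store says 40.
const elements: ParsedElement[] = [
  { type: "text", content: "Gold: 120" },
  { type: "text", content: "Wood: 45" },
];
console.log(findAnomalies(elements, { gold: 120, wood: 40 }).map((a) => a.key)); // [ 'wood' ]
```

The substring rule keeps the sketch short; it trades precision for simplicity and would produce false positives and negatives on real UIs.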
Setup
Install from npm
```bash
npm install -g visual-observer
```
Or run directly with npx:
```bash
npx -y visual-observer
```
Add to Claude Code
Register the server with the Claude Code CLI (the --scope user flag makes it available in all of your projects):
```bash
claude mcp add --transport stdio --scope user visual-observer -- npx -y visual-observer
```
Or add it manually to your project's .mcp.json:
```json
{
  "mcpServers": {
    "visual-observer": {
      "command": "npx",
      "args": ["-y", "visual-observer"]
    }
  }
}
```
Expose your Redux store
In your app's store initialization, add:
```ts
// Expose the store on window so visual-observer can read it from the page.
if (typeof window !== "undefined") {
  (window as any).__REDUX_STORE__ = store;
}
```
OmniParser Setup (optional, for visual parsing)
The get_visual_state tool requires OmniParser v2 running as a Docker service, and get_full_report needs it for its visual section:
```bash
cd /path/to/visual-observer
docker compose up -d
```
Requirements:
- Docker with nvidia-container-toolkit
- NVIDIA GPU with 12GB+ VRAM
- The first build downloads ~300 MB of model weights
The MCP server works without OmniParser — get_page_state, get_screenshot, connect_browser, disconnect_browser, and list_pages all function independently. The get_full_report tool degrades gracefully to data-only mode when OmniParser is unavailable.
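The graceful fallback of get_full_report can be sketched as follows. This is a minimal illustration assuming OmniParser is reached through an async call that rejects when the service is down; buildFullReport, parseScreenshot, and the FullReport shape are hypothetical names, not the package's API.

```ts
// Hypothetical report shape: the visual section is null in data-only mode.
interface FullReport {
  mode: "full" | "data-only";
  visual: string | null;
  data: unknown;
  note?: string;
}

// Builds a report from Redux data plus, when available, OmniParser output.
// `parseScreenshot` stands in for whatever call reaches the OmniParser service.
async function buildFullReport(
  data: unknown,
  parseScreenshot: () => Promise<string>,
): Promise<FullReport> {
  try {
    const visual = await parseScreenshot();
    return { mode: "full", visual, data };
  } catch (err) {
    // OmniParser unreachable (container stopped, no GPU, etc.): keep the data section.
    const reason = err instanceof Error ? err.message : String(err);
    return { mode: "data-only", visual: null, data, note: `OmniParser unavailable: ${reason}` };
  }
}

// Usage: simulate a stopped OmniParser container.
buildFullReport({ gold: 120 }, async () => {
  throw new Error("connect ECONNREFUSED");
}).then((report) => console.log(report.mode)); // prints "data-only"
```

Catching the failure around the visual step only means a missing GPU service never hides the Redux data, which is the part that works without Docker.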
MCP Tools
| Tool | Parameters | Description |
|------|-----------|-------------|
| connect_browser | mode (launch/cdp), url?, cdpEndpoint?, headless? | Launch a browser or connect to an existing one via CDP |
| disconnect_browser | — | Close the browser session |
| get_page_state | path?, slice?, maxDepth? | Extract Redux state (full or specific slice) |
| get_screenshot | fullPage? | Capture a PNG screenshot |
| get_visual_state | — | Screenshot + OmniParser = structured text description of all UI elements |
| get_full_report | slice?, path?, maxDepth? | Visual state + Redux data + anomaly detection in one report |
| list_pages | — | List all open pages with URLs and titles |
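
To make the slice, path, and maxDepth parameters of get_page_state and get_full_report more concrete, here is a minimal sketch of how such a selector could behave. The semantics are assumptions, not documented behavior: path is taken as dot-separated and applied inside the chosen slice, and maxDepth replaces anything nested deeper with a placeholder string. selectState and truncate are hypothetical helpers.

```ts
interface SelectOptions {
  slice?: string; // top-level key of the Redux state, e.g. "resources"
  path?: string; // dot-separated path, e.g. "buildings.farm.level" (assumed syntax)
  maxDepth?: number; // levels of nesting to keep below the selected node
}

// Walks `state` to the requested node, then truncates nesting beyond `maxDepth`.
function selectState(state: unknown, opts: SelectOptions = {}): unknown {
  const keys = [
    ...(opts.slice ? [opts.slice] : []),
    ...(opts.path ? opts.path.split(".") : []),
  ];
  let node: unknown = state;
  for (const key of keys) {
    if (node === null || typeof node !== "object" || !(key in node)) {
      return undefined; // the requested slice or path does not exist
    }
    node = (node as Record<string, unknown>)[key];
  }
  return opts.maxDepth === undefined ? node : truncate(node, opts.maxDepth);
}

// Replaces objects and arrays nested deeper than `depth` with a short placeholder.
function truncate(value: unknown, depth: number): unknown {
  if (value === null || typeof value !== "object") return value;
  if (depth <= 0) return Array.isArray(value) ? `[Array(${value.length})]` : "[Object]";
  if (Array.isArray(value)) return value.map((v) => truncate(v, depth - 1));
  return Object.fromEntries(
    Object.entries(value).map(([k, v]) => [k, truncate(v, depth - 1)]),
  );
}

// Usage with a hypothetical game state.
const state = { resources: { gold: 120, buildings: { farm: { level: 2 } } } };
console.log(JSON.stringify(selectState(state, { slice: "resources", maxDepth: 1 })));
// → {"gold":120,"buildings":"[Object]"}
console.log(selectState(state, { slice: "resources", path: "buildings.farm.level" })); // 2
```

Limiting depth like this keeps large stores from flooding the model's context while still showing which keys exist.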
Usage Example
1. connect_browser → mode: "launch", url: "http://localhost:5173", headless: false
2. get_page_state → slice: "resources"
3. get_visual_state (requires OmniParser)
4. get_full_report → slice: "progression" (requires OmniParser for full output)
5. disconnect_browser
Architecture
```
Game (browser)
    |
    v
Playwright (browser session)
    |
    ├── Screenshot capture
    |       |
    |       v
    |   OmniParser v2 (Docker, GPU)
    |     - YOLOv8: UI element detection
    |     - Florence-2: semantic descriptions
    |     - Output: structured element list
    |
    ├── Redux state extraction
    |     page.evaluate(() => window.__REDUX_STORE__.getState())
    |
    ├── Anomaly detection
    |     Visual state vs data state comparison
    |
    v
MCP Server (stdio transport)
  - 7 tools Claude can call
```
Development
```bash
git clone https://github.com/attilakiss9000/visual-observer.git
cd visual-observer
npm install
npm run dev     # Run in dev mode
npm run build   # Compile TypeScript
npm run test    # Run tests
```
Tech Stack
- Node.js + TypeScript (MCP server)
- Playwright (browser automation)
- @modelcontextprotocol/sdk (MCP protocol)
- Zod (schema validation)
- Python + FastAPI (OmniParser service)
- YOLOv8 + Florence-2 (visual parsing)
- Docker (OmniParser containerization)
License
MIT
