n8n-nodes-omniparser
v0.1.0
Published
n8n node for OmniParser - AI-powered UI screenshot analysis for desktop automation
Downloads
15
Maintainers
Readme
n8n-nodes-omniparser
This is an n8n community node that integrates Microsoft's OmniParser for AI-powered UI screenshot analysis and desktop automation.
What is OmniParser?
OmniParser is a comprehensive method for parsing user interface screenshots into structured, interpretable elements. It uses computer vision and AI to:
- Detect interactive UI elements (buttons, text fields, icons, etc.)
- Generate functional descriptions for each element
- Provide precise coordinates for automation
- Enable vision-based GUI automation
Features
- 🎯 Accurate Element Detection: Identifies clickable and interactive regions with high precision
- 📊 Structured Output: Returns indexed elements with coordinates, descriptions, and metadata
- 🖼️ Annotated Images: Optionally returns screenshot with bounding boxes drawn
- ⚙️ Configurable Thresholds: Adjust detection sensitivity and box merging
- 🔄 Multiple Input Types: Supports binary data and base64 encoded images
Installation
In n8n
- Go to Settings > Community Nodes
- Click Install a community node
- Enter
n8n-nodes-omniparser - Click Install
Manual Installation
npm install n8n-nodes-omniparserPrerequisites
You need a running OmniParser API instance. Two options:
Option 1: Docker (Recommended)
# Clone the omniparser-api repository
git clone https://github.com/addy999/omniparser-api.git
cd omniparser-api
# Build the Docker image
docker build -t omni-parser-app .
# Run with GPU support
docker run --gpus all -p 7860:7860 omni-parser-appRequirements:
- NVIDIA GPU with CUDA support
- 16GB RAM minimum
- Docker with nvidia-docker2
Option 2: Docker Compose with n8n
Add to your docker-compose.yml:
services:
n8n:
image: n8nio/n8n
ports:
- "5678:5678"
networks:
- n8n-network
omniparser:
image: omni-parser-app
ports:
- "7860:7860"
networks:
- n8n-network
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
networks:
n8n-network:
driver: bridgeThen in n8n credentials, use: http://omniparser:7860
Credentials
Configure the OmniParser API credentials:
- API Base URL: Your OmniParser API endpoint (e.g.,
http://omniparser:7860for Docker, orhttp://localhost:7860for local)
Operations
Parse Screenshot
Analyzes a UI screenshot to detect interactive elements.
Input Parameters:
Input Type: How to provide the screenshot
- Binary Data: Use output from previous node
- Base64 String: Paste base64 encoded image
Box Threshold (0.0-1.0): Detection confidence threshold. Lower values detect more elements.
IOU Threshold (0.0-1.0): Controls merging of overlapping bounding boxes
Output Format:
- Structured: Array of parsed elements with coordinates
- Raw: Original API response
Include Annotated Image: Whether to include base64 annotated image with bounding boxes
Output (Structured Format):
{
"elementsCount": 3,
"elements": [
{
"index": 0,
"description": "Username text field",
"coordinates": [100, 200, 300, 250],
"centerX": 200,
"centerY": 225,
"width": 200,
"height": 50
},
{
"index": 1,
"description": "Password text field",
"coordinates": [100, 300, 300, 350],
"centerX": 200,
"centerY": 325,
"width": 200,
"height": 50
},
{
"index": 2,
"description": "Login button",
"coordinates": [150, 400, 250, 450],
"centerX": 200,
"centerY": 425,
"width": 100,
"height": 50
}
],
"annotatedImageBase64": "iVBORw0KGgoAAAANSUhEUg..."
}Example Workflow
Desktop Automation with OmniParser
- Screenshot Node → Captures desktop screenshot
- OmniParser → Analyzes screenshot
- Filter → Find element by description (e.g., "Login button")
- Desktop Control Node → Click at coordinates
- Desktop Control Node → Type text
Screenshot → OmniParser → Filter ("Username field") → Click → Type "myusername"Use Cases
- 🖥️ Desktop Automation: Automate any GUI application
- 🤖 RPA (Robotic Process Automation): Automate repetitive desktop tasks
- 🧪 UI Testing: Automatically test application interfaces
- 📸 Screen Analysis: Extract structured data from screenshots
- 🎮 Game Automation: Detect and interact with game UI elements
- 📊 Data Entry: Fill forms across any application
Companion Nodes
For complete desktop automation, combine with:
- n8n-nodes-desktop-control (coming soon): Mouse/keyboard control with PyAutoGUI
- n8n-nodes-screenshot (coming soon): Cross-platform screenshot capture
Troubleshooting
"Failed to connect to OmniParser API"
- Check that OmniParser container is running:
docker ps - Verify the API URL in credentials matches your setup
- Test API manually:
curl http://localhost:7860/docs
"Detection not finding elements"
- Lower the Box Threshold (try 0.03 or 0.02)
- Ensure screenshot is clear and high resolution
- Check that UI elements are visible and not obscured
"Out of memory errors"
- OmniParser requires 16GB RAM minimum
- Enable swap if needed
- Close other GPU-intensive applications
Resources
License
MIT
Author
Alf-David Heermann ([email protected])
