
n8n-nodes-omniparser

v0.1.0


n8n node for OmniParser - AI-powered UI screenshot analysis for desktop automation

Downloads

15


This is an n8n community node that integrates Microsoft's OmniParser for AI-powered UI screenshot analysis and desktop automation.

What is OmniParser?

OmniParser is a comprehensive method for parsing user interface screenshots into structured, interpretable elements. It pairs a detection model for interactable regions with a captioning model that describes them, in order to:

  • Detect interactive UI elements (buttons, text fields, icons, etc.)
  • Generate functional descriptions for each element
  • Provide precise coordinates for automation
  • Enable vision-based GUI automation

Features

  • 🎯 Accurate Element Detection: Identifies clickable and interactive regions with high precision
  • 📊 Structured Output: Returns indexed elements with coordinates, descriptions, and metadata
  • 🖼️ Annotated Images: Optionally returns the screenshot with bounding boxes drawn
  • ⚙️ Configurable Thresholds: Adjust detection sensitivity and box merging
  • 🔄 Multiple Input Types: Supports binary data and base64-encoded images

Installation

In n8n

  1. Go to Settings > Community Nodes
  2. Click Install a community node
  3. Enter n8n-nodes-omniparser
  4. Click Install

Manual Installation

Run the install from your n8n custom nodes directory (by default ~/.n8n/nodes), then restart n8n:

npm install n8n-nodes-omniparser

Prerequisites

You need a running OmniParser API instance. Two options:

Option 1: Docker (Recommended)

# Clone the omniparser-api repository
git clone https://github.com/addy999/omniparser-api.git
cd omniparser-api

# Build the Docker image
docker build -t omni-parser-app .

# Run with GPU support
docker run --gpus all -p 7860:7860 omni-parser-app

Requirements:

  • NVIDIA GPU with CUDA support
  • 16 GB RAM minimum
  • Docker with the NVIDIA Container Toolkit (the successor to nvidia-docker2)

Option 2: Docker Compose with n8n

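The Compose file below references the local omni-parser-app image, so build it first as in Option 1. Then add to your docker-compose.yml: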

services:
  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    networks:
      - n8n-network

  omniparser:
    image: omni-parser-app
    ports:
      - "7860:7860"
    networks:
      - n8n-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

networks:
  n8n-network:
    driver: bridge

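Then, in the n8n credentials, set the API Base URL to http://omniparser:7860. The hostname omniparser is the Compose service name, which resolves because both containers share n8n-network.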

Credentials

Configure the OmniParser API credentials:

  • API Base URL: Your OmniParser API endpoint (e.g., http://omniparser:7860 with Docker Compose, or http://localhost:7860 when n8n runs directly on the host that serves the API)

Operations

Parse Screenshot

Analyzes a UI screenshot to detect interactive elements.

Input Parameters:

  • Input Type: How to provide the screenshot

    • Binary Data: Use the binary output of a previous node
    • Base64 String: Paste a base64-encoded image
  • Box Threshold (0.0-1.0): Detection confidence threshold. Lower values detect more elements.

  • IoU Threshold (0.0-1.0): Intersection over Union threshold that controls how overlapping bounding boxes are merged

  • Output Format:

    • Structured: Array of parsed elements with coordinates
    • Raw: Original API response
  • Include Annotated Image: Whether to include the screenshot with bounding boxes drawn, as a base64 string
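The two thresholds are easier to reason about with a concrete model. The sketch below is a generic illustration of confidence filtering followed by IoU-based de-duplication; it is not OmniParser's actual implementation. The [x1, y1, x2, y2] box layout, the hypothetical `Detection` type, and the rule "keep the higher-confidence box when two overlap by more than the IoU threshold" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical detection record: [x1, y1, x2, y2] box plus a confidence score.
    box: tuple[float, float, float, float]
    score: float


def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, box_threshold, iou_threshold):
    # Step 1: drop detections whose confidence is below the Box Threshold.
    kept = [d for d in dets if d.score >= box_threshold]
    # Step 2: greedy de-duplication, highest confidence first. A box is
    # discarded if it overlaps an already kept box by more than iou_threshold.
    kept.sort(key=lambda d: d.score, reverse=True)
    result = []
    for d in kept:
        if all(iou(d.box, r.box) <= iou_threshold for r in result):
            result.append(d)
    return result
```

Under this model, lowering the Box Threshold from 0.05 to 0.03 lets a faint element scored 0.04 survive, while a higher IoU Threshold tolerates more overlap before two boxes are treated as duplicates.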

Output (Structured Format):

{
  "elementsCount": 3,
  "elements": [
    {
      "index": 0,
      "description": "Username text field",
      "coordinates": [100, 200, 300, 250],
      "centerX": 200,
      "centerY": 225,
      "width": 200,
      "height": 50
    },
    {
      "index": 1,
      "description": "Password text field",
      "coordinates": [100, 300, 300, 350],
      "centerX": 200,
      "centerY": 325,
      "width": 200,
      "height": 50
    },
    {
      "index": 2,
      "description": "Login button",
      "coordinates": [150, 400, 250, 450],
      "centerX": 200,
      "centerY": 425,
      "width": 100,
      "height": 50
    }
  ],
  "annotatedImageBase64": "iVBORw0KGgoAAAANSUhEUg..."
}
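The derived fields in this example are consistent with `coordinates` being pixel corners in [x1, y1, x2, y2] order, with the center taken as the midpoint. The sketch below reproduces that arithmetic under this assumption. `to_element` and `to_structured` are hypothetical helpers, not the node's own code, and the node may round odd sizes differently.

```python
def to_element(index, description, coordinates):
    """Build one structured element from an [x1, y1, x2, y2] box (assumed layout)."""
    x1, y1, x2, y2 = coordinates
    return {
        "index": index,
        "description": description,
        "coordinates": [x1, y1, x2, y2],
        "centerX": (x1 + x2) / 2,  # midpoint of the left and right edges
        "centerY": (y1 + y2) / 2,  # midpoint of the top and bottom edges
        "width": x2 - x1,
        "height": y2 - y1,
    }


def to_structured(parsed):
    """parsed: list of (description, [x1, y1, x2, y2]) pairs in detection order."""
    elements = [to_element(i, desc, box) for i, (desc, box) in enumerate(parsed)]
    return {"elementsCount": len(elements), "elements": elements}
```

Feeding it the three boxes above yields the same centers, widths, and heights as the sample output.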

Example Workflow

Desktop Automation with OmniParser

  1. Screenshot Node → Captures desktop screenshot
  2. OmniParser → Analyzes screenshot
  3. Filter → Find element by description (e.g., "Login button")
  4. Desktop Control Node → Click at coordinates
  5. Desktop Control Node → Type text

For example, to fill in a username:

Screenshot → OmniParser → Filter ("Username field") → Click → Type "myusername"
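The Filter step of this workflow boils down to matching on `description` and reading the element's center point. The sketch below shows that logic in plain Python. It assumes the structured output arrives as one object with an `elements` array (in n8n you might instead split the array into separate items before a Filter node, or run the lookup in a Code node). `find_click_point` is a hypothetical helper. It matches on whole words rather than a raw substring so that a query such as "Username field" still finds "Username text field".

```python
def find_click_point(output, query):
    """Return (centerX, centerY) of the first element whose description
    contains every word of `query` (case-insensitive), or None."""
    words = query.casefold().split()
    if not words:
        return None  # an empty query would otherwise match every element
    for element in output.get("elements", []):
        description = element.get("description", "").casefold().split()
        if all(word in description for word in words):
            return element["centerX"], element["centerY"]
    return None  # no match: try a lower Box Threshold or a different query
```

A desktop-control step could then act on the returned point; with PyAutoGUI, for instance, `pyautogui.click(x, y)` followed by `pyautogui.write("myusername")`.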

Use Cases

  • 🖥️ Desktop Automation: Automate any GUI application
  • 🤖 RPA (Robotic Process Automation): Automate repetitive desktop tasks
  • 🧪 UI Testing: Automatically test application interfaces
  • 📸 Screen Analysis: Extract structured data from screenshots
  • 🎮 Game Automation: Detect and interact with game UI elements
  • 📊 Data Entry: Fill forms across any application

Companion Nodes

For complete desktop automation, combine with:

  • n8n-nodes-desktop-control (coming soon): Mouse/keyboard control with PyAutoGUI
  • n8n-nodes-screenshot (coming soon): Cross-platform screenshot capture

Troubleshooting

"Failed to connect to OmniParser API"

  • Check that the OmniParser container is running: docker ps
  • Verify the API URL in credentials matches your setup
  • Test API manually: curl http://localhost:7860/docs
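The manual curl check can also be scripted, which is handy right after docker run because the models may take a while to load. This is a minimal sketch using only the standard library. It assumes the API answers on `/docs`, as the curl command above implies; `api_is_up` and `wait_for_api` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def api_is_up(base_url, timeout=5.0):
    """Return True if GET {base_url}/docs answers with an HTTP 2xx status."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def wait_for_api(base_url, attempts=30, delay=2.0):
    """Poll the API until it responds or the attempts run out."""
    for _ in range(attempts):
        if api_is_up(base_url):
            return True
        time.sleep(delay)
    return False
```

From the host, call `wait_for_api("http://localhost:7860")`; from inside the Compose network, use `http://omniparser:7860` instead.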

"Detection not finding elements"

  • Lower the Box Threshold (try 0.03 or 0.02)
  • Make sure the screenshot is sharp and high-resolution
  • Check that UI elements are visible and not obscured

"Out of memory errors"

  • OmniParser needs at least 16 GB of RAM
  • Enable swap if needed
  • Close other GPU-intensive applications


License

MIT

Author

Alf-David Heermann