@aspiresys/visor
v1.4.6
Desktop visual automation framework using OpenCV, OCR, and desktop interaction APIs.
Visor
Desktop Visual Automation Framework for Node.js and TypeScript.
Visor is a visual desktop automation framework that combines:
- OpenCV image matching
- OCR text recognition
- Mouse & keyboard automation
- Desktop application automation
Visor is designed for automating desktop workflows using visual interactions instead of traditional DOM/browser automation.
Features
- OpenCV-based image matching
- Multi-scale image matching
- OCR automation using Tesseract
- OCR occurrence indexing (beta)
- Region OCR support
- Automatic display scaling detection
- Mouse automation
- Region-based mouse automation
- Target offset support
- Keyboard automation
- Drag & drop support
- Screenshot capture
- Desktop application automation
- OCR text searching
- Wait APIs
- Multi-image matching
- Config-driven initialization
- High-DPI display scaling support
What's New in 1.4.x
- Automatic resolution-aware template matching
- Template metadata support (.properties.json)
- visor.version()
- Region.capture()
- Region.waitAnyImg()
- Region.clickAny()
- Improved DPI-aware matching
- Faster image matching using predicted scaling
Installation
npm install @aspiresys/visor
Requirements
- Windows
- Node.js 18+
- TypeScript
Visor Inspector
Visor includes an optional desktop Inspector tool for:
- Capturing templates
- Testing image matches
- Measuring screen coordinates
- Validating confidence thresholds
Run:
npx visor-inspector
Template Metadata
Visor Inspector automatically creates a .properties.json file alongside captured templates.
Example:
save.png
save.properties.json
Metadata includes:
- Captured resolution
- Display scaling factor
- Capture environment
Visor uses this metadata to predict the correct image scale during automation, significantly improving matching speed and reliability across machines.
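One way to picture the scale prediction described above is as a ratio between the display scaling of the machine that captured the template and the machine running the automation. The following is a minimal sketch under that assumption, not Visor's implementation: the `TemplateMetadata` shape, its `scaleFactor` field, and both function names are hypothetical, and the real `.properties.json` schema may differ.

```typescript
// Hypothetical metadata shape; the actual .properties.json fields may differ.
interface TemplateMetadata {
  scaleFactor: number; // display scaling when the template was captured, e.g. 1.25
}

// Assumed relationship: resize the template by current scaling / captured scaling.
function predictTemplateScale(meta: TemplateMetadata, currentScaleFactor: number): number {
  if (meta.scaleFactor <= 0 || currentScaleFactor <= 0) {
    throw new RangeError('scale factors must be positive');
  }
  return currentScaleFactor / meta.scaleFactor;
}

// Try the candidate scales closest to the prediction first, so a good match
// is likely to be found early and the remaining scales can be skipped.
function orderCandidateScales(predicted: number, candidates: readonly number[]): number[] {
  return [...candidates].sort(
    (a, b) => Math.abs(a - predicted) - Math.abs(b - predicted),
  );
}
```

For a template captured at 125% and replayed at 150%, the predicted resize factor is 1.2, so the 1.25 candidate would be tried first.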
Quick Start
import { visor, Region } from '@aspiresys/visor';
async function main() {
  // Load settings before calling any automation API
  visor.loadConfig({
    imagePath: './images',
    debug: true,
  });
  // Launch Notepad, wait for its template to appear, then click it and type
  await visor.openApp('notepad');
  await visor.wait('notepad.png');
  await visor.click('notepad.png');
  await visor.type('Hello from Visor');
}
main();
Configuration
visor.loadConfig({
  imagePath: './images',
  debug: true,
});
Configuration Options
| Option | Description |
| ------------ | ---------------------------------------- |
| scaleFactor | Optional manual display scaling override |
| imagePath | Default image directory |
| outputPath | Screenshot output directory |
| debug | Enable debug logging |
Display Scaling
Visor automatically detects Windows display scaling and adjusts mouse coordinates accordingly.
Common scaling values:
| Scaling | Value |
| ------- | ----- |
| 100% | 1.0 |
| 125% | 1.25 |
| 150% | 1.5 |
| 175% | 1.75 |
| 200% | 2.0 |
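To make the table concrete, the sketch below converts a Windows scaling percentage into a factor and maps a point measured in screenshot pixels to a mouse coordinate. It is an illustration only: both helper names are hypothetical, and dividing by the scale factor is an assumption about the direction of the adjustment, not a statement of how Visor performs it internally.

```typescript
interface Point {
  x: number;
  y: number;
}

// Convert a Windows scaling percentage (for example 150) into a factor (1.5).
function percentToScaleFactor(percent: number): number {
  if (!Number.isFinite(percent) || percent <= 0) {
    throw new RangeError('scaling percentage must be a positive number');
  }
  return percent / 100;
}

// Assumption: screenshots are in physical pixels and the mouse API expects
// logical pixels, so physical coordinates are divided by the scale factor.
function toLogicalPoint(physical: Point, scaleFactor: number): Point {
  return {
    x: Math.round(physical.x / scaleFactor),
    y: Math.round(physical.y / scaleFactor),
  };
}
```

Under this assumption, a match found at (300, 150) in a screenshot taken at 150% scaling corresponds to the mouse position (200, 100).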
Manual override is still supported:
visor.loadConfig({
  scaleFactor: 1.5,
});
Multi-Scale Image Matching
Visor automatically performs multi-scale template matching to support:
- Different Windows scaling settings
- Different screen resolutions
- High-DPI displays
- Cross-machine execution
By default Visor evaluates templates across multiple scale levels and automatically selects the best match.
Supported environments include:
- 50% scaling
- 75% scaling
- 100% scaling
- 125% scaling
- 150% scaling
- 175% scaling
- 200% scaling
This significantly improves image matching reliability when automation is executed across different machines.
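The selection step described above (score the template at several scales, keep the best) can be sketched as follows. This is a simplified illustration rather than Visor's code: `findBestScale` and `DEFAULT_SCALES` are hypothetical names, and the `matchAtScale` callback stands in for the OpenCV template match that would produce a confidence score at each scale.

```typescript
interface ScaleResult {
  scale: number;
  confidence: number;
}

// Scale levels mirroring the list above (hypothetical constant name).
const DEFAULT_SCALES: readonly number[] = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0];

// Score the template at every scale and keep the highest-confidence result.
// Returns null when even the best scale falls below the required confidence.
function findBestScale(
  scales: readonly number[],
  matchAtScale: (scale: number) => number,
  minConfidence = 0.8,
): ScaleResult | null {
  let best: ScaleResult | null = null;
  for (const scale of scales) {
    const confidence = matchAtScale(scale);
    if (best === null || confidence > best.confidence) {
      best = { scale, confidence };
    }
  }
  return best !== null && best.confidence >= minConfidence ? best : null;
}
```

Injecting the scoring function keeps the selection logic testable without a screen or an OpenCV build.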
Visual Automation APIs
Click Image
await visor.click('save.png');
Find Image
const region = await visor.find('icon.png');
Region-Based Automation
Regions can be obtained from:
- visor.find()
- visor.findAll()
- visor.findText()
- Visor Inspector
Move To Region
const region = await visor.find('save.png');
await visor.moveToRegion(region);
Click Region
const region = await visor.find('save.png');
await visor.clickRegion(region);
Double Click Region
await visor.doubleClickRegion(new Region(100, 200, 150, 50));
Right Click Region
await visor.rightClickRegion(new Region(100, 200, 150, 50));
Display scaling is automatically applied when using region-based APIs.
Region Object API
Regions returned by Visor are first-class objects that provide built-in automation methods.
Regions can be obtained from:
- visor.find()
- visor.findAll()
- visor.findText()
- Visor Inspector
Example:
const dialog = await visor.find('dialog.png');
const save = await dialog.find('save.png');
await save.click();
## Region.find()
Search for an image within the current region.
const dialog = await visor.find('dialog.png');
const save = await dialog.find('save.png');
## Region.findAll()
Find all image matches within the current region.
const dialog = await visor.find('dialog.png');
const buttons = await dialog.findAll('button.png');
## Region.exists()
Check whether an image exists within the current region.
const dialog = await visor.find('dialog.png');
const exists = await dialog.exists('save.png');
## Region.findText()
Search for text within the current region.
const dialog = await visor.find('dialog.png');
const submit = await dialog.findText('Submit');
## Region.existsText()
Check whether text exists within the current region.
const dialog = await visor.find('dialog.png');
const exists = await dialog.existsText('Success');
## Region.readText()
Extract OCR text from the current region.
const dialog = await visor.find('dialog.png');
const result = await dialog.readText();
console.log(result.text);
## Region.click()
const save = await visor.find('save.png');
await save.click();
## Region.doubleClick()
await save.doubleClick();
## Region.rightClick()
await save.rightClick();
## Region.move()
await save.move();
Check Image Exists
const exists = await visor.exists('login.png');
Wait For Image
await visor.wait('save.png');
await visor.wait('save.png', {
  confidence: 0.9,
  timeout: 10000,
});
Wait For Multiple Images
await visor.waitAny(['light-theme.png', 'dark-theme.png']);
Click Multiple Theme Variants
await visor.clickAny(['send-light.png', 'send-dark.png']);
Drag & Drop
await visor.dragDrop('source.png', 'target.png');
Hover
await visor.hover('menu.png');
Target Offsets
Target offsets allow mouse actions to be performed relative to the center of a matched image.
Useful for:
- Dropdown arrows
- Adjacent controls
- Dynamic layouts
- Composite UI elements
Click With Offset
await visor.click('search.png', 0.8, {
  x: 50,
  y: 0,
});
Hover With Offset
await visor.hover('menu.png', 0.8, {
  x: -20,
  y: 10,
});
Offsets are applied relative to the center of the matched region before display scaling adjustments are performed.
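The order of operations stated above (center of the match, then offset, then display scaling) can be written out as a small calculation. The sketch below is illustrative only: `resolveTarget` is a hypothetical helper, not part of the Visor API, and dividing by the scale factor in the last step is an assumption about how the scaling adjustment works.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Offset {
  x: number;
  y: number;
}

// 1. Take the center of the matched rectangle.
// 2. Apply the offset in screenshot pixels.
// 3. Apply display scaling last (assumed here to be a division).
function resolveTarget(
  match: Rect,
  offset: Offset = { x: 0, y: 0 },
  scaleFactor = 1,
): Offset {
  const centerX = match.x + match.width / 2;
  const centerY = match.y + match.height / 2;
  const shiftedX = centerX + offset.x;
  const shiftedY = centerY + offset.y;
  return {
    x: Math.round(shiftedX / scaleFactor),
    y: Math.round(shiftedY / scaleFactor),
  };
}
```

For a match at (100, 200) sized 150 by 50, the center is (175, 225); an offset of `{ x: 50, y: 0 }` moves the target to (225, 225), which becomes (150, 150) at a scale factor of 1.5 under the assumption above.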
OCR Automation
Visor includes OCR automation powered by Tesseract.js.
OCR supports:
- Full-screen OCR
- Region OCR
- Text search
- Text clicking
- Text waiting
- OCR occurrence indexing
Read Screen
const result = await visor.readScreen();
console.log(result.text);
Read Region
const result = await visor.readRegion(new Region(100, 100, 500, 300));
console.log(result.text);
Find Text
const region = await visor.findText('Submit');
Click Text
await visor.clickText('Login');
Wait For Text
await visor.waitText('Success');
OCR Occurrence Indexing
When the same text appears multiple times on screen, Visor allows selecting a specific occurrence.
await visor.clickText('Inbox', 0);
await visor.clickText('Inbox', 1);
await visor.clickText('Inbox', 2);
OCR elements are processed from:
Top → Bottom
Left → Right
This improves automation stability when multiple matching text elements exist on screen.
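The top-to-bottom, left-to-right ordering described above can be sketched as a sort over OCR bounding boxes followed by a zero-based lookup. This is an illustration under stated assumptions, not Visor's implementation: the `TextBox` shape, the `rowTolerance` parameter (how many pixels of vertical drift still count as the same row), and both function names are hypothetical.

```typescript
interface TextBox {
  text: string;
  x: number; // left edge
  y: number; // top edge
}

// Group boxes into rows by their top edge, then order each row left to right.
function sortReadingOrder(boxes: readonly TextBox[], rowTolerance = 10): TextBox[] {
  const byTop = [...boxes].sort((a, b) => a.y - b.y);
  const rows: TextBox[][] = [];
  let current: TextBox[] = [];
  let currentTop = 0;
  for (const box of byTop) {
    if (current.length > 0 && box.y - currentTop <= rowTolerance) {
      current.push(box); // same row as the previous box
    } else {
      current = [box]; // start a new row
      currentTop = box.y;
      rows.push(current);
    }
  }
  const ordered: TextBox[] = [];
  for (const row of rows) {
    row.sort((a, b) => a.x - b.x);
    ordered.push(...row);
  }
  return ordered;
}

// Return the zero-based nth box whose text matches exactly, or null if absent.
function nthOccurrence(boxes: readonly TextBox[], text: string, index: number): TextBox | null {
  const matches = sortReadingOrder(boxes.filter((b) => b.text === text));
  return matches[index] ?? null;
}
```

Grouping into rows before sorting by `x` avoids a comparator that mixes both axes, which can order boxes inconsistently when OCR reports slightly different top edges for words on the same line.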
OCR Optimizations
Visor includes:
- Shared OCR worker reuse
- OCR preprocessing
- Grayscale normalization
- Image sharpening
- Confidence filtering
- OCR occurrence indexing
Benefits:
- Faster OCR execution
- Improved OCR accuracy
- Lower memory usage
- Improved framework stability
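As an illustration of what grayscale normalization can look like, the sketch below converts raw RGBA pixels to grayscale and stretches the result to the full 0 to 255 range. It is a generic stand-in written for this README, not Visor's preprocessing pipeline; the function name is hypothetical, and the luma weights are the common ITU-R BT.601 coefficients.

```typescript
// Convert RGBA pixels (4 bytes per pixel) to one grayscale byte per pixel,
// then stretch contrast so the darkest pixel becomes 0 and the brightest 255.
function toNormalizedGrayscale(rgba: Uint8Array | Uint8ClampedArray): Uint8Array {
  if (rgba.length % 4 !== 0) {
    throw new RangeError('expected 4 bytes (RGBA) per pixel');
  }
  const gray = new Uint8Array(rgba.length / 4);
  let min = 255;
  let max = 0;
  for (let i = 0, p = 0; i < rgba.length; i += 4, p++) {
    // BT.601 luma weights; the alpha channel is ignored.
    const value = Math.round(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
    gray[p] = value;
    if (value < min) min = value;
    if (value > max) max = value;
  }
  const range = max - min;
  if (range === 0) return gray; // flat image: nothing to stretch
  for (let p = 0; p < gray.length; p++) {
    gray[p] = Math.round(((gray[p] - min) * 255) / range);
  }
  return gray;
}
```

Stretching contrast this way tends to help OCR on low-contrast text, which is one of the failure causes listed under Troubleshooting.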
Mouse Automation
Move Mouse
await visor.moveMouse(500, 300);
Move To Inspector Region
await visor.moveToRegion(new Region(90, 61, 138, 69));
Region coordinates can be copied directly from Visor Inspector match results.
Scroll Down
await visor.scrollDown(1000);
Scroll Up
await visor.scrollUp(1000);
Mouse Position
const pos = await visor.getMousePosition();
Keyboard Automation
Type Text
await visor.type('Hello World');
Press Keys
await visor.press(visor.Key.LeftControl, visor.Key.S);
Screenshot Automation
await visor.captureScreenshot('./screenshots/home.png');
Desktop Application Automation
Open Application
await visor.openApp('notepad');
Close Application
await visor.closeApp('notepad.exe');
Confidence Thresholds
Supported range:
0.0 - 1.0
Recommended values:
| Confidence | Usage |
| ---------- | --------------- |
| 0.7 | Dynamic UI |
| 0.8 | General usage |
| 0.9 | Strict matching |
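The effect of the threshold can be shown with a small filter over match scores. This is a minimal sketch, not Visor's code: `assertConfidence`, `acceptMatches`, and the `Match` shape are hypothetical, and the default of 0.8 simply mirrors the "General usage" row above.

```typescript
interface Match {
  name: string;
  score: number; // similarity reported by the matcher, 0.0 to 1.0
}

// Reject thresholds outside the supported 0.0 to 1.0 range.
function assertConfidence(value: number): number {
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`confidence must be between 0.0 and 1.0, got ${value}`);
  }
  return value;
}

// Keep only matches whose score meets the threshold.
function acceptMatches(matches: readonly Match[], confidence = 0.8): Match[] {
  const threshold = assertConfidence(confidence);
  return matches.filter((m) => m.score >= threshold);
}
```

Raising the threshold trades missed matches for fewer false positives: a candidate scoring 0.85 is accepted at 0.8 but rejected at 0.9.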
Performance Improvements
Visor includes:
- Shared OCR worker reuse
- Multi-scale image matching
- OCR preprocessing pipeline
- Automatic display scaling detection
These improvements increase reliability across varying display configurations and reduce OCR initialization overhead.
Troubleshooting
Image Not Found
Possible causes:
- Incorrect image path
- Confidence threshold set too high
- Theme mismatch
- Poor template quality
OCR Not Detecting Text
Possible causes:
- Small fonts
- Low contrast text
- Blurry UI elements
Mouse Clicking Incorrect Position
Visor automatically detects Windows display scaling.
If required, manually override:
visor.loadConfig({
  scaleFactor: 1.5,
});
Roadmap
- Match visualization overlay
- Inspector coordinate picker
- Multi-monitor support improvements
- Parallel image matching
- Advanced OCR tuning
- Electron recorder
- AI-assisted automation
Tech Stack
- OpenCV
- Tesseract.js
- screenshot-desktop
- sharp
- nut.js
Why Visor?
Unlike Selenium or Playwright, Visor automates desktop applications using image recognition and OCR.
Works with:
- Native Windows applications
- Citrix environments
- Remote desktops
- Thick-client applications
- Legacy systems
