stelo

v1.0.8

Published

a day ago

Give your AI hands. Computer-use SDK for building agents that see, click, type and hear — native Rust performance, zero compromise.

Stelo

The Most Advanced Universal Desktop Automation Framework

Mouse · Keyboard · Screen · Window · Recorder · Safety

Sub-millisecond native performance powered by Rust + napi-rs

Why Stelo?

| Feature | RobotJS | nutjs | Stelo | |---|---|---|---| | Active maintenance | ❌ | ⚠️ | ✅ | | Cross-platform | ✅ | ✅ | ✅ | | Native performance | ✅ | ❌ (screen slow) | ✅ Rust | | Smooth mouse movement | ❌ | ✅ basic | ✅ Bezier + Wind-Mouse | | Async non-blocking movement | ❌ | ❌ | ✅ | | Real-time cursor streaming | ❌ | ❌ | ✅ | | Parallel action execution | ❌ | ❌ | ✅ | | Humanized input | ❌ | ❌ | ✅ | | Window management | ❌ | ✅ | ✅ | | Screen capture | ✅ slow | ✅ | ✅ fast | | Automation recorder | ❌ | ❌ | ✅ | | Safety / failsafe | ❌ | ❌ | ✅ | | TypeScript-first | ❌ | ✅ | ✅ | | Prebuilt binaries | ❌ (rebuild) | ✅ | ✅ napi-rs | | Enterprise telemetry | ❌ | ❌ | ✅ | | Circuit breakers | ❌ | ❌ | ✅ | | Audit logging | ❌ | ❌ | ✅ | | Structured errors | ❌ | ❌ | ✅ | | Screen diffing | ❌ | ❌ | ✅ native Rust | | Change detection | ❌ | ❌ | ✅ | | Action verification | ❌ | ❌ | ✅ | | Grid analysis | ❌ | ❌ | ✅ | | Perceptual hashing | ❌ | ❌ | ✅ | | Action batching | ❌ | ❌ | ✅ | | Real-time agent control | ❌ | ❌ | ✅ | | Streaming cursor control | ❌ | ❌ | ✅ | | Gesture recording | ❌ | ❌ | ✅ |

Quick Start

npm install stelo

import { mouse, keyboard, screen } from 'stelo';

// Move mouse smoothly with bezier curve
await mouse.moveSmoothly(500, 400, { duration: 800, curve: 'bezier' });

// Click, type, hotkey
mouse.click();
keyboard.type('Hello from Stelo!');
keyboard.hotkey('ctrl', 's');

// Screen analysis
const color = screen.getPixelColor(100, 200);
console.log(color.hex); // '#3B82F6'

Node.js ≥ 18 required. Prebuilt binaries available — no compiler needed.

Built for Computer-Use Agents

Stelo is the premier SDK for building autonomous desktop agents. Every design decision prioritizes the needs of vision-language models and computer-use systems:

🎯 Native Resolution Everywhere

Forces Per-Monitor DPI Awareness V2 on Windows
Screenshot coordinates = mouse coordinates = vision model coordinates
Zero drift, zero scaling math, single coordinate space end-to-end

👁 Vision Primitives (Rust-native)

Screen diffing: Detect what changed between frames
Action verification: Confirm clicks/keypresses actually worked
Wait for stable: Pause until animations complete
Grid analysis: Efficiently identify regions of interest for vision models
Perceptual hashing: Fast similarity checks without full pixel comparison
Color clustering: Find UI elements like buttons by color

⚡ Subsecond Feedback Loop

Native Rust screen capture (10-50ms full screen)
BGRA capture mode (skip color conversion for vision pipelines)
Humanized input that evades bot detection

import { vision, mouse, screen } from 'stelo';

// Agent workflow: observe → act → verify → repeat
const before = vision.captureReference();
await mouse.click(button.x, button.y);

const result = await vision.verifyAction(
  async () => {}, // Already clicked
  0.5,            // Require 0.5% visual change
  2000
);

if (!result.verified) {
  // No visual feedback - click may have failed
  // Agent can retry or try alternative approach
}

// Wait for any animations to complete before next action
await vision.waitForStable(0.1, 200, 5000);

Real-Time Agent Control

For AI agents that need to control mouse and keyboard in real-time with minimal latency, Stelo provides a dedicated agent module optimized for continuous control loops:

🚀 Non-Blocking Real-Time Movement

Cursor moves visually on-screen in real-time — no teleporting
Event loop stays free during movement — do other things in parallel
Cancel movements mid-flight, redirect cursor seamlessly
All movement is cancellable via MovementHandle

⚡ Parallel Action Execution

Run multiple operations simultaneously via agent.parallel()
Move mouse while monitoring screen; type while waiting for UI changes
Zero blocking — everything is async/non-blocking

🔄 Streaming Cursor Control

Create persistent cursor streams with agent.createCursorStream()
Send continuous target updates — cursor smoothly follows in real-time
Velocity-based control for joystick/AI-style steering
Perfect for vision-language models that output continuous coordinates

🎯 Verification-First Design

Every action can verify it had the expected visual effect
Know immediately if a click "worked" or needs retry
Essential for robust autonomous agents

import { mouse, keyboard, agent, screen } from 'stelo';

// ── Non-Blocking Smooth Movement ────────────────────────────────────
// The cursor visually moves on-screen. No teleporting. No blocking.

await mouse.moveSmoothAsync(500, 400, { duration: 600 });

// Cancel mid-flight
const handle = mouse.moveSmoothAsync(800, 600);
setTimeout(() => handle.cancel(), 200);
await handle.promise;

// Humanized movement with jitter and overshoot — also non-blocking
await mouse.moveHumanizedAsync(500, 300, {
  duration: 800,
  jitter: 0.5,
}).promise;

// ── Parallel Operations ─────────────────────────────────────────────
// Move mouse while monitoring screen — simultaneously

const result = await agent.parallel({
  move: async () => {
    await mouse.moveSmoothAsync(500, 300, { duration: 600 }).promise;
  },
  monitor: async () => {
    await new Promise(r => setTimeout(r, 300));
    return screen.capture();
  },
});

// ── Streaming Cursor Control ────────────────────────────────────────
// For continuous real-time control by AI models

const cursor = agent.createCursorStream(120);

cursor.moveTo(500, 300);                     // Smooth movement to target
setTimeout(() => cursor.moveTo(800, 600), 200); // Redirect mid-flight
cursor.setVelocity(200, -100);               // Velocity-based steering
cursor.stop();                               // Stop all movement
cursor.destroy();                            // Release resources

// ── Smooth Click & Type ─────────────────────────────────────────────
// Visible cursor movement + click + type — all in one async call

await agent.smoothClickAndType(500, 300, 'search query', {
  duration: 400,
});

// ── Non-Blocking Typing ─────────────────────────────────────────────
// Type while doing other things simultaneously

await Promise.all([
  keyboard.typeHumanizedAsync('Hello World').promise,
  (async () => {
    await new Promise(r => setTimeout(r, 100));
    return screen.capture(); // Capture while typing
  })(),
]);

// ── Action Batching ─────────────────────────────────────────────────
// Execute multiple actions atomically in one native call

const batch = agent.executeBatch([
  { type: 'mouseClickAt', x: 500, y: 300 },
  { type: 'waitForStable', stableMs: 200 },
  { type: 'type', text: 'search query' },
  { type: 'keyPress', key: 'Enter' },
  { type: 'waitForChange', timeout: 3000 }
]);

// ── Fluent Sequence Builder ─────────────────────────────────────────
const workflow = agent.sequence()
  .clickAt(500, 300)
  .waitForStable()
  .type('Hello World')
  .press('Enter')
  .waitForChange()
  .executeAsync(); // Non-blocking version

// ── Verification ────────────────────────────────────────────────────
const verified = await agent.moveClickVerify(500, 300, {
  moveDuration: 400,
  minChangePercent: 0.5,
});
if (!verified.verified) {
  console.log('Click had no visual effect - retry needed');
}

Supported Batch Actions: | Type | Description | Parameters | |------|-------------|------------| | mouseMove | Instant move | x, y | | mouseMoveRel | Relative move | dx, dy | | mouseMoveSmooth | Animated move | x, y, duration? | | mouseClick | Click at current pos | button? | | mouseClickAt | Move and click | x, y, button? | | mouseDoubleClick | Double click | button? | | mouseDown/Up | Press/release | button? | | mouseScroll | Scroll wheel | amount, horizontal? | | mouseDrag | Drag to position | toX, toY, button? | | type | Type text instantly | text | | typeHumanized | Type naturally | text, minDelay?, maxDelay? | | keyPress | Press and release | key | | keyDown/Up | Hold/release key | key | | hotkey | Key combination | keys[] | | delay | Wait duration | ms | | waitForChange | Wait for visual change | threshold?, timeout?, region? | | waitForStable | Wait for stability | threshold?, stableMs?, timeout?, region? |

Installation

# npm
npm install stelo

# yarn
yarn add stelo

# pnpm
pnpm add stelo

Supported Platforms

| OS | Arch | Package | |---|---|---| | Windows | x64 | stelo-win32-x64-msvc | | Windows | arm64 | stelo-win32-arm64-msvc | | macOS | x64 (Intel) | stelo-darwin-x64 | | macOS | arm64 (Apple Silicon) | stelo-darwin-arm64 | | Linux | x64 (glibc) | stelo-linux-x64-gnu | | Linux | x64 (musl) | stelo-linux-x64-musl | | Linux | arm64 (glibc) | stelo-linux-arm64-gnu | | Linux | arm64 (musl) | stelo-linux-arm64-musl |

Platform packages are auto-installed via optionalDependencies.

API Overview

🖱 Mouse

import { mouse } from 'stelo';

// Position
mouse.move(x, y);                               // instant teleport
const { x, y } = mouse.getPosition();           // current position
console.log(mouse.x, mouse.y);                  // shorthand getters

// Smooth movement
await mouse.moveSmoothly(x, y, {
  duration: 500,                                 // ms
  curve: 'bezier' | 'easeInOut' | 'linear',
});
await mouse.moveHumanized(x, y);                // wind-mouse algorithm

// Clicks
mouse.click();                                   // left click
mouse.click('right');                            // right click
mouse.click('middle');                           // middle click
mouse.doubleClick();
mouse.tripleClick();
mouse.moveAndClick(x, y);                       // move + click in one

// Press / release
mouse.down('left');
mouse.up('left');

// Scroll
mouse.scroll(3, 'up');                           // 3 clicks up
mouse.scroll(5, 'down');
mouse.scroll(2, 'left');                         // horizontal

// Drag
mouse.drag(fromX, fromY, toX, toY);
mouse.drag(fromX, fromY, toX, toY, 'right');    // right-button drag

⌨️ Keyboard

import { keyboard } from 'stelo';

// Typing
keyboard.type('Hello World');                    // instant
await keyboard.typeHumanized('Hello', {          // natural speed
  minDelay: 40,
  maxDelay: 120,
});

// Key operations
keyboard.press('enter');                         // tap a key
keyboard.down('shift');                          // hold
keyboard.up('shift');                            // release
keyboard.tap('backspace', 5);                    // repeat N times

// Hotkeys
keyboard.hotkey('ctrl', 'c');                    // copy
keyboard.hotkey('ctrl', 'shift', 'p');           // multi-key combo

// Convenience shortcuts
keyboard.enter();
keyboard.tab();
keyboard.escape();
keyboard.backspace();
keyboard.selectAll();                            // ctrl+a / cmd+a
keyboard.copy();                                 // ctrl+c / cmd+c
keyboard.paste();                                // ctrl+v / cmd+v
keyboard.cut();                                  // ctrl+x / cmd+x
keyboard.undo();                                 // ctrl+z / cmd+z
keyboard.redo();                                 // ctrl+y / cmd+shift+z
keyboard.save();                                 // ctrl+s / cmd+s

// Query
const keys = keyboard.allKeys();                 // all supported key names
const pressed = keyboard.isPressed('shift');     // is key currently held?

🖥 Screen

import { screen } from 'stelo';

// Dimensions
const { width, height } = screen.getSize();

// Pixel colour
const color = screen.getPixelColor(x, y);
// → { r: 59, g: 130, b: 246, hex: '#3B82F6' }

// Capture
const img = screen.capture();                    // full screen
const region = screen.capture({                  // specific region
  x: 100, y: 100, width: 400, height: 300,
});
// img.data is a Buffer of RGBA pixels

// All displays
const displays = screen.getAllDisplays();
// → [{ id, x, y, width, height, isPrimary, scaleFactor }]

// Colour search
const point = screen.findColor('#FF0000', { tolerance: 10 });
const matches = screen.pixelMatches(x, y, '#FF0000', 5);

// Wait for colour
await screen.waitForColor(x, y, '#00FF00', {
  tolerance: 10,
  timeoutMs: 5000,
  intervalMs: 100,
});

🪟 Window

import { appWindow } from 'stelo';

// Query
const active = appWindow.getActive();
const all    = appWindow.getAll();
const found  = appWindow.find('Notepad');

// Focus
appWindow.focus(windowId);
appWindow.focusByTitle('Calculator');

// Geometry
appWindow.resize(id, 800, 600);
appWindow.moveTo(id, 100, 100);
appWindow.setBounds(id, { x: 0, y: 0, width: 1024, height: 768 });

// State
appWindow.minimize(id);
appWindow.maximize(id);
appWindow.restore(id);
appWindow.close(id);

// Wait for window to appear
const win = await appWindow.waitFor('MyApp', 10000);

🎬 Recorder

import { recorder } from 'stelo';

// Build a sequence
const steps = recorder.sequence()
  .moveTo(100, 200)
  .click()
  .delay(200)
  .type('hello')
  .press('enter')
  .hotkey('ctrl', 's')
  .build();

// Play it back
await recorder.play(steps);
await recorder.play(steps, { speed: 2.0 });     // 2× speed

// Serialize / restore
const json = recorder.toJSON(steps);
const restored = recorder.fromJSON(json);

🛡 Safety

import { safety } from 'stelo';

// Failsafe: move mouse to any screen corner → emergency stop
safety.enableFailsafe();
safety.enableFailsafe(10);           // 10px threshold

// Rate limiting
safety.setRateLimit(200);            // max 200 actions/sec

// Emergency stop / reset
safety.emergencyStop();
console.log(safety.isStopped());     // true
safety.reset();

// Configure all at once
safety.configure({
  failsafe: true,
  failsafeThreshold: 5,
  rateLimit: 500,
});

🏢 Enterprise Features

Stelo includes enterprise-grade utilities for production deployments.

Telemetry & Observability

import { telemetry } from 'stelo';

// Enable telemetry
telemetry.enable();

// Subscribe to events
telemetry.onEvent((event) => {
  console.log(`[${event.severity}] ${event.operation}: ${event.message}`);
});

// Track operations with timing
await telemetry.trackAsync('my-operation', async () => {
  await mouse.moveSmoothly(500, 400);
});

// Get metrics
const metrics = telemetry.getMetrics();
// → { totalOps, successRate, avgLatencyMs, p95LatencyMs, ... }

// Health check
const health = await telemetry.healthCheck();
// → { status: 'healthy', checks: [...], timestamp: '...' }

Circuit Breaker (Fault Tolerance)

import { CircuitBreaker } from 'stelo';

const breaker = new CircuitBreaker('external-service', {
  failureThreshold: 5,
  resetTimeoutMs: 30000,
});

try {
  await breaker.execute(async () => {
    return await riskyOperation();
  });
} catch (e) {
  if (breaker.getState() === 'open') {
    console.log('Circuit is open, service unavailable');
  }
}

Validation & Input Checking

import { validate, ValidationError } from 'stelo';

// Validate coordinates
validate.coordinate(x, 'x');
validate.coordinate(y, 'y');

// Validate within screen bounds
const screenSize = screen.getSize();
validate.inBounds({ x, y }, screenSize);

// Safe validation (returns Result instead of throwing)
const result = validate.safe.coordinate(x, 'x');
if (!result.ok) {
  console.error(result.error.message);
}

// Validate regions
validate.region({ x: 0, y: 0, width: 100, height: 100 });

Audit Logging & Security

import { security } from 'stelo';

// Enable audit logging
security.enableAudit();

// Subscribe to audit events
security.onAudit((entry) => {
  console.log(`[AUDIT] ${entry.operation}: ${entry.success ? 'OK' : 'FAIL'}`);
});

// Get audit log
const log = security.getAuditLog(100); // last 100 entries

// Sanitize user input before typing
const safeText = security.sanitizeText(userInput);
keyboard.type(safeText);

// Detect sandboxed environments
const { sandboxed, indicators } = security.detectSandbox();

Enhanced Error Handling

import { SteloError, ErrorCode, retryWithBackoff } from 'stelo';

try {
  await mouse.move(x, y);
} catch (e) {
  if (e instanceof SteloError) {
    console.log(`Error ${e.code}: ${e.message}`);
    console.log('Recovery hints:', e.hints);
    console.log('Diagnostics:', e.diagnostics);
    
    // Generate detailed error report
    console.log(e.toReport());
  }
}

// Retry with exponential backoff
const result = await retryWithBackoff(
  () => riskyOperation(),
  { maxAttempts: 5, initialDelayMs: 100 }
);

Rate Limiting

import { RateLimiter } from 'stelo';

// Token bucket rate limiter
const limiter = new RateLimiter(100, 10); // 100 max, 10/sec refill

if (limiter.tryConsume()) {
  await doOperation();
} else {
  console.log('Rate limited, try again later');
}

� Vision & Agent Primitives

Advanced screen analysis primitives designed for vision-based desktop automation.

Screen Diffing & Change Detection

import { vision, mouse } from 'stelo';

// Take reference screenshot
const before = vision.captureReference();

// Perform action
await mouse.click(500, 400);

// Compare: did the screen change?
const diff = vision.diff(before);
console.log(`${diff.changePercentage}% changed`);
if (diff.changedBounds) {
  console.log('Change occurred at:', diff.changedBounds);
}

Action Verification

// Verify an action caused visual change
const result = await vision.verifyAction(
  async () => { await mouse.click(100, 200); },
  0.5,  // require at least 0.5% change
  2000  // timeout
);

if (!result.verified) {
  console.log('Button click had no effect - retry?');
}

Wait for Screen Stability

// Wait for animations/transitions to complete
mouse.click();
await vision.waitForStable(
  0.1,   // max 0.1% change considered "stable"
  200,   // must be stable for 200ms
  5000   // timeout
);
// Screen is now stable - safe to continue

Grid Analysis for Vision Models

// Analyze screen as a grid (efficient for vision model region selection)
const grid = vision.analyzeGrid(16, 9); // 16x9 grid

// Find cells likely containing text or UI
const textCells = grid.cells.filter(c => c.likelyText);
const uiCells = grid.cells.filter(c => c.likelyUI);

// Click the center of cell [3, 2]
const center = vision.gridCellCenter(grid, 3, 2);
if (center) await mouse.click(center.x, center.y);

Perceptual Hashing

// Fast similarity check using perceptual hashes
const hash1 = vision.perceptualHash();
await performAction();
const hash2 = vision.perceptualHash();

const distance = vision.hashDistance(hash1, hash2);
if (distance < 5) {
  console.log('Screen looks mostly the same');
} else if (distance > 20) {
  console.log('Screen changed significantly');
}

Color Cluster Detection

// Find UI elements by color (e.g., buttons)
const blueClusters = vision.findColorClusters(
  { r: 0, g: 120, b: 215 }, // Windows accent blue
  40,   // tolerance
  50    // min 50 pixels
);

if (blueClusters.length > 0) {
  // Click center of first blue region
  const btn = blueClusters[0];
  mouse.click(btn.x + btn.width / 2, btn.y + btn.height / 2);
}

Agent (Real-Time Control)

Low-latency primitives for controlling mouse and keyboard in real-time.

Action Batching

import { agent } from 'stelo';

// Execute multiple actions atomically - minimal latency
const result = agent.executeBatch([
  { type: 'mouseClickAt', x: 500, y: 300 },
  { type: 'waitForStable', stableMs: 200 },
  { type: 'type', text: 'Hello World' },
  { type: 'keyPress', key: 'Enter' }
], { stopOnError: true });

// Inspect results
console.log(`${result.successCount}/${result.results.length} succeeded`);
console.log(`Total time: ${result.totalDurationMs}ms`);

Click with Verification

// Click and confirm the screen changed
const result = agent.clickAndVerify(500, 300, {
  button: 'left',
  minChangePercent: 0.5,
  timeoutMs: 2000,
  region: { x: 400, y: 200, width: 200, height: 200 }
});

if (!result.verified) {
  console.log('Click had no visual effect');
}

Fluent Sequence Builder

// Chain actions with a fluent API
const result = agent.sequence()
  .clickAt(500, 300)
  .waitForStable()
  .type('search query')
  .press('Enter')
  .waitForChange({ timeout: 3000 })
  .hotkey('ctrl', 'a')
  .execute();

Gesture Recording & Playback

// Record mouse movements
const trail = agent.recordMouseTrail(3000, 60); // 3 sec at 60Hz

// Later, replay the gesture
agent.replayMouseTrail(trail, 0.5); // Half speed

Error Recovery

// Release all held modifier keys (error recovery)
agent.releaseAllModifiers();

// Type and wait for autocomplete/validation to finish
const stable = agent.typeAndWaitStable('[email protected]', {
  stabilityThreshold: 0.1,
  stableDurationMs: 200,
  timeoutMs: 3000
});

🛠 Utilities

import { delay, retry, waitUntil, batch, bezier } from 'stelo';

await delay(500);                                // sleep

const result = await retry(() => riskyOp(), {    // auto-retry
  maxAttempts: 5,
  delayMs: 200,
  backoffMultiplier: 1.5,
});

await waitUntil(() => isReady(), {               // poll
  timeoutMs: 10000,
  intervalMs: 100,
});

await batch(tasks, { concurrency: 3 });          // parallel with limit

const pt = bezier.cubic(p0, p1, p2, p3, 0.5);   // bezier evaluation

FAQ

Does Stelo require a C++ compiler?

No. Prebuilt binaries are published for all supported platforms. If you're building from source, you need Rust (not C++).

How does humanized movement work?

Stelo implements the Wind Mouse algorithm — a physics-inspired model that simulates human hand tremor, acceleration, and overshoot. Combined with cubic bezier curves and arc-length parameterization, mouse paths look indistinguishable from real user input.

Is Stelo safe to use in production?

Yes. The safety module provides failsafe (mouse-to-corner abort), rate limiting, and emergency stop. Always enable failsafe in automation scripts.

Does Stelo work in headless / CI environments?

On Linux, use Xvfb (X Virtual Framebuffer) to run Stelo without a physical display. Windows and macOS require a desktop session.

What about Wayland on Linux?

Stelo currently uses X11 / XTest. It works under XWayland on most Wayland compositors. Native Wayland input injection is planned.