stelo
v1.0.8
Published
Give your AI hands. Computer-use SDK for building agents that see, click, type and hear — native Rust performance, zero compromise.
Maintainers
Readme
Stelo
The Most Advanced Universal Desktop Automation Framework
Mouse · Keyboard · Screen · Window · Recorder · Safety
Sub-millisecond native performance powered by Rust + napi-rs
Why Stelo?
| Feature | RobotJS | nutjs | Stelo | |---|---|---|---| | Active maintenance | ❌ | ⚠️ | ✅ | | Cross-platform | ✅ | ✅ | ✅ | | Native performance | ✅ | ❌ (screen slow) | ✅ Rust | | Smooth mouse movement | ❌ | ✅ basic | ✅ Bezier + Wind-Mouse | | Async non-blocking movement | ❌ | ❌ | ✅ | | Real-time cursor streaming | ❌ | ❌ | ✅ | | Parallel action execution | ❌ | ❌ | ✅ | | Humanized input | ❌ | ❌ | ✅ | | Window management | ❌ | ✅ | ✅ | | Screen capture | ✅ slow | ✅ | ✅ fast | | Automation recorder | ❌ | ❌ | ✅ | | Safety / failsafe | ❌ | ❌ | ✅ | | TypeScript-first | ❌ | ✅ | ✅ | | Prebuilt binaries | ❌ (rebuild) | ✅ | ✅ napi-rs | | Enterprise telemetry | ❌ | ❌ | ✅ | | Circuit breakers | ❌ | ❌ | ✅ | | Audit logging | ❌ | ❌ | ✅ | | Structured errors | ❌ | ❌ | ✅ | | Screen diffing | ❌ | ❌ | ✅ native Rust | | Change detection | ❌ | ❌ | ✅ | | Action verification | ❌ | ❌ | ✅ | | Grid analysis | ❌ | ❌ | ✅ | | Perceptual hashing | ❌ | ❌ | ✅ | | Action batching | ❌ | ❌ | ✅ | | Real-time agent control | ❌ | ❌ | ✅ | | Streaming cursor control | ❌ | ❌ | ✅ | | Gesture recording | ❌ | ❌ | ✅ |
Quick Start
npm install steloimport { mouse, keyboard, screen } from 'stelo';
// Move mouse smoothly with bezier curve
await mouse.moveSmoothly(500, 400, { duration: 800, curve: 'bezier' });
// Click, type, hotkey
mouse.click();
keyboard.type('Hello from Stelo!');
keyboard.hotkey('ctrl', 's');
// Screen analysis
const color = screen.getPixelColor(100, 200);
console.log(color.hex); // '#3B82F6'Node.js ≥ 18 required. Prebuilt binaries available — no compiler needed.
Built for Computer-Use Agents
Stelo is the premier SDK for building autonomous desktop agents. Every design decision prioritizes the needs of vision-language models and computer-use systems:
🎯 Native Resolution Everywhere
- Forces Per-Monitor DPI Awareness V2 on Windows
- Screenshot coordinates = mouse coordinates = vision model coordinates
- Zero drift, zero scaling math, single coordinate space end-to-end
👁 Vision Primitives (Rust-native)
- Screen diffing: Detect what changed between frames
- Action verification: Confirm clicks/keypresses actually worked
- Wait for stable: Pause until animations complete
- Grid analysis: Efficiently identify regions of interest for vision models
- Perceptual hashing: Fast similarity checks without full pixel comparison
- Color clustering: Find UI elements like buttons by color
⚡ Subsecond Feedback Loop
- Native Rust screen capture (10-50ms full screen)
- BGRA capture mode (skip color conversion for vision pipelines)
- Humanized input that evades bot detection
import { vision, mouse, screen } from 'stelo';
// Agent workflow: observe → act → verify → repeat
const before = vision.captureReference();
await mouse.click(button.x, button.y);
const result = await vision.verifyAction(
async () => {}, // Already clicked
0.5, // Require 0.5% visual change
2000
);
if (!result.verified) {
// No visual feedback - click may have failed
// Agent can retry or try alternative approach
}
// Wait for any animations to complete before next action
await vision.waitForStable(0.1, 200, 5000);Real-Time Agent Control
For AI agents that need to control mouse and keyboard in real-time with minimal latency, Stelo provides a dedicated agent module optimized for continuous control loops:
🚀 Non-Blocking Real-Time Movement
- Cursor moves visually on-screen in real-time — no teleporting
- Event loop stays free during movement — do other things in parallel
- Cancel movements mid-flight, redirect cursor seamlessly
- All movement is cancellable via
MovementHandle
⚡ Parallel Action Execution
- Run multiple operations simultaneously via
agent.parallel() - Move mouse while monitoring screen; type while waiting for UI changes
- Zero blocking — everything is async/non-blocking
🔄 Streaming Cursor Control
- Create persistent cursor streams with
agent.createCursorStream() - Send continuous target updates — cursor smoothly follows in real-time
- Velocity-based control for joystick/AI-style steering
- Perfect for vision-language models that output continuous coordinates
🎯 Verification-First Design
- Every action can verify it had the expected visual effect
- Know immediately if a click "worked" or needs retry
- Essential for robust autonomous agents
import { mouse, keyboard, agent, screen } from 'stelo';
// ── Non-Blocking Smooth Movement ────────────────────────────────────
// The cursor visually moves on-screen. No teleporting. No blocking.
await mouse.moveSmoothAsync(500, 400, { duration: 600 });
// Cancel mid-flight
const handle = mouse.moveSmoothAsync(800, 600);
setTimeout(() => handle.cancel(), 200);
await handle.promise;
// Humanized movement with jitter and overshoot — also non-blocking
await mouse.moveHumanizedAsync(500, 300, {
duration: 800,
jitter: 0.5,
}).promise;
// ── Parallel Operations ─────────────────────────────────────────────
// Move mouse while monitoring screen — simultaneously
const result = await agent.parallel({
move: async () => {
await mouse.moveSmoothAsync(500, 300, { duration: 600 }).promise;
},
monitor: async () => {
await new Promise(r => setTimeout(r, 300));
return screen.capture();
},
});
// ── Streaming Cursor Control ────────────────────────────────────────
// For continuous real-time control by AI models
const cursor = agent.createCursorStream(120);
cursor.moveTo(500, 300); // Smooth movement to target
setTimeout(() => cursor.moveTo(800, 600), 200); // Redirect mid-flight
cursor.setVelocity(200, -100); // Velocity-based steering
cursor.stop(); // Stop all movement
cursor.destroy(); // Release resources
// ── Smooth Click & Type ─────────────────────────────────────────────
// Visible cursor movement + click + type — all in one async call
await agent.smoothClickAndType(500, 300, 'search query', {
duration: 400,
});
// ── Non-Blocking Typing ─────────────────────────────────────────────
// Type while doing other things simultaneously
await Promise.all([
keyboard.typeHumanizedAsync('Hello World').promise,
(async () => {
await new Promise(r => setTimeout(r, 100));
return screen.capture(); // Capture while typing
})(),
]);
// ── Action Batching ─────────────────────────────────────────────────
// Execute multiple actions atomically in one native call
const batch = agent.executeBatch([
{ type: 'mouseClickAt', x: 500, y: 300 },
{ type: 'waitForStable', stableMs: 200 },
{ type: 'type', text: 'search query' },
{ type: 'keyPress', key: 'Enter' },
{ type: 'waitForChange', timeout: 3000 }
]);
// ── Fluent Sequence Builder ─────────────────────────────────────────
const workflow = agent.sequence()
.clickAt(500, 300)
.waitForStable()
.type('Hello World')
.press('Enter')
.waitForChange()
.executeAsync(); // Non-blocking version
// ── Verification ────────────────────────────────────────────────────
const verified = await agent.moveClickVerify(500, 300, {
moveDuration: 400,
minChangePercent: 0.5,
});
if (!verified.verified) {
console.log('Click had no visual effect - retry needed');
}Supported Batch Actions:
| Type | Description | Parameters |
|------|-------------|------------|
| mouseMove | Instant move | x, y |
| mouseMoveRel | Relative move | dx, dy |
| mouseMoveSmooth | Animated move | x, y, duration? |
| mouseClick | Click at current pos | button? |
| mouseClickAt | Move and click | x, y, button? |
| mouseDoubleClick | Double click | button? |
| mouseDown/Up | Press/release | button? |
| mouseScroll | Scroll wheel | amount, horizontal? |
| mouseDrag | Drag to position | toX, toY, button? |
| type | Type text instantly | text |
| typeHumanized | Type naturally | text, minDelay?, maxDelay? |
| keyPress | Press and release | key |
| keyDown/Up | Hold/release key | key |
| hotkey | Key combination | keys[] |
| delay | Wait duration | ms |
| waitForChange | Wait for visual change | threshold?, timeout?, region? |
| waitForStable | Wait for stability | threshold?, stableMs?, timeout?, region? |
Installation
# npm
npm install stelo
# yarn
yarn add stelo
# pnpm
pnpm add steloSupported Platforms
| OS | Arch | Package |
|---|---|---|
| Windows | x64 | stelo-win32-x64-msvc |
| Windows | arm64 | stelo-win32-arm64-msvc |
| macOS | x64 (Intel) | stelo-darwin-x64 |
| macOS | arm64 (Apple Silicon) | stelo-darwin-arm64 |
| Linux | x64 (glibc) | stelo-linux-x64-gnu |
| Linux | x64 (musl) | stelo-linux-x64-musl |
| Linux | arm64 (glibc) | stelo-linux-arm64-gnu |
| Linux | arm64 (musl) | stelo-linux-arm64-musl |
Platform packages are auto-installed via optionalDependencies.
API Overview
🖱 Mouse
import { mouse } from 'stelo';
// Position
mouse.move(x, y); // instant teleport
const { x, y } = mouse.getPosition(); // current position
console.log(mouse.x, mouse.y); // shorthand getters
// Smooth movement
await mouse.moveSmoothly(x, y, {
duration: 500, // ms
curve: 'bezier' | 'easeInOut' | 'linear',
});
await mouse.moveHumanized(x, y); // wind-mouse algorithm
// Clicks
mouse.click(); // left click
mouse.click('right'); // right click
mouse.click('middle'); // middle click
mouse.doubleClick();
mouse.tripleClick();
mouse.moveAndClick(x, y); // move + click in one
// Press / release
mouse.down('left');
mouse.up('left');
// Scroll
mouse.scroll(3, 'up'); // 3 clicks up
mouse.scroll(5, 'down');
mouse.scroll(2, 'left'); // horizontal
// Drag
mouse.drag(fromX, fromY, toX, toY);
mouse.drag(fromX, fromY, toX, toY, 'right'); // right-button drag⌨️ Keyboard
import { keyboard } from 'stelo';
// Typing
keyboard.type('Hello World'); // instant
await keyboard.typeHumanized('Hello', { // natural speed
minDelay: 40,
maxDelay: 120,
});
// Key operations
keyboard.press('enter'); // tap a key
keyboard.down('shift'); // hold
keyboard.up('shift'); // release
keyboard.tap('backspace', 5); // repeat N times
// Hotkeys
keyboard.hotkey('ctrl', 'c'); // copy
keyboard.hotkey('ctrl', 'shift', 'p'); // multi-key combo
// Convenience shortcuts
keyboard.enter();
keyboard.tab();
keyboard.escape();
keyboard.backspace();
keyboard.selectAll(); // ctrl+a / cmd+a
keyboard.copy(); // ctrl+c / cmd+c
keyboard.paste(); // ctrl+v / cmd+v
keyboard.cut(); // ctrl+x / cmd+x
keyboard.undo(); // ctrl+z / cmd+z
keyboard.redo(); // ctrl+y / cmd+shift+z
keyboard.save(); // ctrl+s / cmd+s
// Query
const keys = keyboard.allKeys(); // all supported key names
const pressed = keyboard.isPressed('shift'); // is key currently held?🖥 Screen
import { screen } from 'stelo';
// Dimensions
const { width, height } = screen.getSize();
// Pixel colour
const color = screen.getPixelColor(x, y);
// → { r: 59, g: 130, b: 246, hex: '#3B82F6' }
// Capture
const img = screen.capture(); // full screen
const region = screen.capture({ // specific region
x: 100, y: 100, width: 400, height: 300,
});
// img.data is a Buffer of RGBA pixels
// All displays
const displays = screen.getAllDisplays();
// → [{ id, x, y, width, height, isPrimary, scaleFactor }]
// Colour search
const point = screen.findColor('#FF0000', { tolerance: 10 });
const matches = screen.pixelMatches(x, y, '#FF0000', 5);
// Wait for colour
await screen.waitForColor(x, y, '#00FF00', {
tolerance: 10,
timeoutMs: 5000,
intervalMs: 100,
});🪟 Window
import { appWindow } from 'stelo';
// Query
const active = appWindow.getActive();
const all = appWindow.getAll();
const found = appWindow.find('Notepad');
// Focus
appWindow.focus(windowId);
appWindow.focusByTitle('Calculator');
// Geometry
appWindow.resize(id, 800, 600);
appWindow.moveTo(id, 100, 100);
appWindow.setBounds(id, { x: 0, y: 0, width: 1024, height: 768 });
// State
appWindow.minimize(id);
appWindow.maximize(id);
appWindow.restore(id);
appWindow.close(id);
// Wait for window to appear
const win = await appWindow.waitFor('MyApp', 10000);🎬 Recorder
import { recorder } from 'stelo';
// Build a sequence
const steps = recorder.sequence()
.moveTo(100, 200)
.click()
.delay(200)
.type('hello')
.press('enter')
.hotkey('ctrl', 's')
.build();
// Play it back
await recorder.play(steps);
await recorder.play(steps, { speed: 2.0 }); // 2× speed
// Serialize / restore
const json = recorder.toJSON(steps);
const restored = recorder.fromJSON(json);🛡 Safety
import { safety } from 'stelo';
// Failsafe: move mouse to any screen corner → emergency stop
safety.enableFailsafe();
safety.enableFailsafe(10); // 10px threshold
// Rate limiting
safety.setRateLimit(200); // max 200 actions/sec
// Emergency stop / reset
safety.emergencyStop();
console.log(safety.isStopped()); // true
safety.reset();
// Configure all at once
safety.configure({
failsafe: true,
failsafeThreshold: 5,
rateLimit: 500,
});🏢 Enterprise Features
Stelo includes enterprise-grade utilities for production deployments.
Telemetry & Observability
import { telemetry } from 'stelo';
// Enable telemetry
telemetry.enable();
// Subscribe to events
telemetry.onEvent((event) => {
console.log(`[${event.severity}] ${event.operation}: ${event.message}`);
});
// Track operations with timing
await telemetry.trackAsync('my-operation', async () => {
await mouse.moveSmoothly(500, 400);
});
// Get metrics
const metrics = telemetry.getMetrics();
// → { totalOps, successRate, avgLatencyMs, p95LatencyMs, ... }
// Health check
const health = await telemetry.healthCheck();
// → { status: 'healthy', checks: [...], timestamp: '...' }Circuit Breaker (Fault Tolerance)
import { CircuitBreaker } from 'stelo';
const breaker = new CircuitBreaker('external-service', {
failureThreshold: 5,
resetTimeoutMs: 30000,
});
try {
await breaker.execute(async () => {
return await riskyOperation();
});
} catch (e) {
if (breaker.getState() === 'open') {
console.log('Circuit is open, service unavailable');
}
}Validation & Input Checking
import { validate, ValidationError } from 'stelo';
// Validate coordinates
validate.coordinate(x, 'x');
validate.coordinate(y, 'y');
// Validate within screen bounds
const screenSize = screen.getSize();
validate.inBounds({ x, y }, screenSize);
// Safe validation (returns Result instead of throwing)
const result = validate.safe.coordinate(x, 'x');
if (!result.ok) {
console.error(result.error.message);
}
// Validate regions
validate.region({ x: 0, y: 0, width: 100, height: 100 });Audit Logging & Security
import { security } from 'stelo';
// Enable audit logging
security.enableAudit();
// Subscribe to audit events
security.onAudit((entry) => {
console.log(`[AUDIT] ${entry.operation}: ${entry.success ? 'OK' : 'FAIL'}`);
});
// Get audit log
const log = security.getAuditLog(100); // last 100 entries
// Sanitize user input before typing
const safeText = security.sanitizeText(userInput);
keyboard.type(safeText);
// Detect sandboxed environments
const { sandboxed, indicators } = security.detectSandbox();Enhanced Error Handling
import { SteloError, ErrorCode, retryWithBackoff } from 'stelo';
try {
await mouse.move(x, y);
} catch (e) {
if (e instanceof SteloError) {
console.log(`Error ${e.code}: ${e.message}`);
console.log('Recovery hints:', e.hints);
console.log('Diagnostics:', e.diagnostics);
// Generate detailed error report
console.log(e.toReport());
}
}
// Retry with exponential backoff
const result = await retryWithBackoff(
() => riskyOperation(),
{ maxAttempts: 5, initialDelayMs: 100 }
);Rate Limiting
import { RateLimiter } from 'stelo';
// Token bucket rate limiter
const limiter = new RateLimiter(100, 10); // 100 max, 10/sec refill
if (limiter.tryConsume()) {
await doOperation();
} else {
console.log('Rate limited, try again later');
}� Vision & Agent Primitives
Advanced screen analysis primitives designed for vision-based desktop automation.
Screen Diffing & Change Detection
import { vision, mouse } from 'stelo';
// Take reference screenshot
const before = vision.captureReference();
// Perform action
await mouse.click(500, 400);
// Compare: did the screen change?
const diff = vision.diff(before);
console.log(`${diff.changePercentage}% changed`);
if (diff.changedBounds) {
console.log('Change occurred at:', diff.changedBounds);
}Action Verification
// Verify an action caused visual change
const result = await vision.verifyAction(
async () => { await mouse.click(100, 200); },
0.5, // require at least 0.5% change
2000 // timeout
);
if (!result.verified) {
console.log('Button click had no effect - retry?');
}Wait for Screen Stability
// Wait for animations/transitions to complete
mouse.click();
await vision.waitForStable(
0.1, // max 0.1% change considered "stable"
200, // must be stable for 200ms
5000 // timeout
);
// Screen is now stable - safe to continueGrid Analysis for Vision Models
// Analyze screen as a grid (efficient for vision model region selection)
const grid = vision.analyzeGrid(16, 9); // 16x9 grid
// Find cells likely containing text or UI
const textCells = grid.cells.filter(c => c.likelyText);
const uiCells = grid.cells.filter(c => c.likelyUI);
// Click the center of cell [3, 2]
const center = vision.gridCellCenter(grid, 3, 2);
if (center) await mouse.click(center.x, center.y);Perceptual Hashing
// Fast similarity check using perceptual hashes
const hash1 = vision.perceptualHash();
await performAction();
const hash2 = vision.perceptualHash();
const distance = vision.hashDistance(hash1, hash2);
if (distance < 5) {
console.log('Screen looks mostly the same');
} else if (distance > 20) {
console.log('Screen changed significantly');
}Color Cluster Detection
// Find UI elements by color (e.g., buttons)
const blueClusters = vision.findColorClusters(
{ r: 0, g: 120, b: 215 }, // Windows accent blue
40, // tolerance
50 // min 50 pixels
);
if (blueClusters.length > 0) {
// Click center of first blue region
const btn = blueClusters[0];
mouse.click(btn.x + btn.width / 2, btn.y + btn.height / 2);
}Agent (Real-Time Control)
Low-latency primitives for controlling mouse and keyboard in real-time.
Action Batching
import { agent } from 'stelo';
// Execute multiple actions atomically - minimal latency
const result = agent.executeBatch([
{ type: 'mouseClickAt', x: 500, y: 300 },
{ type: 'waitForStable', stableMs: 200 },
{ type: 'type', text: 'Hello World' },
{ type: 'keyPress', key: 'Enter' }
], { stopOnError: true });
// Inspect results
console.log(`${result.successCount}/${result.results.length} succeeded`);
console.log(`Total time: ${result.totalDurationMs}ms`);Click with Verification
// Click and confirm the screen changed
const result = agent.clickAndVerify(500, 300, {
button: 'left',
minChangePercent: 0.5,
timeoutMs: 2000,
region: { x: 400, y: 200, width: 200, height: 200 }
});
if (!result.verified) {
console.log('Click had no visual effect');
}Fluent Sequence Builder
// Chain actions with a fluent API
const result = agent.sequence()
.clickAt(500, 300)
.waitForStable()
.type('search query')
.press('Enter')
.waitForChange({ timeout: 3000 })
.hotkey('ctrl', 'a')
.execute();Gesture Recording & Playback
// Record mouse movements
const trail = agent.recordMouseTrail(3000, 60); // 3 sec at 60Hz
// Later, replay the gesture
agent.replayMouseTrail(trail, 0.5); // Half speedError Recovery
// Release all held modifier keys (error recovery)
agent.releaseAllModifiers();
// Type and wait for autocomplete/validation to finish
const stable = agent.typeAndWaitStable('[email protected]', {
stabilityThreshold: 0.1,
stableDurationMs: 200,
timeoutMs: 3000
});🛠 Utilities
import { delay, retry, waitUntil, batch, bezier } from 'stelo';
await delay(500); // sleep
const result = await retry(() => riskyOp(), { // auto-retry
maxAttempts: 5,
delayMs: 200,
backoffMultiplier: 1.5,
});
await waitUntil(() => isReady(), { // poll
timeoutMs: 10000,
intervalMs: 100,
});
await batch(tasks, { concurrency: 3 }); // parallel with limit
const pt = bezier.cubic(p0, p1, p2, p3, 0.5); // bezier evaluationFAQ
Does Stelo require a C++ compiler?
No. Prebuilt binaries are published for all supported platforms. If you're building from source, you need Rust (not C++).
How does humanized movement work?
Stelo implements the Wind Mouse algorithm — a physics-inspired model that simulates human hand tremor, acceleration, and overshoot. Combined with cubic bezier curves and arc-length parameterization, mouse paths look indistinguishable from real user input.
Is Stelo safe to use in production?
Yes. The safety module provides failsafe (mouse-to-corner abort), rate limiting, and emergency stop. Always enable failsafe in automation scripts.
Does Stelo work in headless / CI environments?
On Linux, use Xvfb (X Virtual Framebuffer) to run Stelo without a physical display. Windows and macOS require a desktop session.
What about Wayland on Linux?
Stelo currently uses X11 / XTest. It works under XWayland on most Wayland compositors. Native Wayland input injection is planned.
License
Apache-2.0 © Stelo Contributors
