npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2026 – Pkg Stats / Ryan Hefner

stelo

v12.0.4

Published

The computer-use engine. Give any AI full autonomous control of any desktop — mouse, keyboard, screen, audio, vision, OCR, RTC — all native Rust, zero npm runtime dependencies, sub-millisecond.

Readme

Stelo

The Computer-Use Engine

npm License: Apache-2.0 Windows macOS Linux

Give any AI full autonomous control of any desktop.

Mouse . Keyboard . Screen . Vision . OCR . Audio . Mic . Speaker . Voice Isolation . RTC . Clipboard . Window . Process . Recording . Hooks . Template Matching . Gestures . Workflows

Every operation is native Rust. Sub-millisecond. Zero dependencies. Zero compromise.


What is Stelo?

Stelo is a complete computer-use engine. It gives AI agents -- or any program -- the ability to see, click, type, hear, speak, and stream an entire desktop, all through a single npm install with no native compiler required.

Everything runs in Rust through napi-rs. There is no Python subprocess, no Electron wrapper, no puppeteer shim. When you call screen.capture(), you get raw pixels from the GPU in under 10ms. When you call mouse.moveSmoothly(), a physics-modeled cursor path executes on a native thread while your event loop stays free. When you call ocr.recognize(), the OS-native OCR engine returns word-level bounding boxes with no external model.

v12.0.0 — Wayland-native Linux, zero runtime npm dependencies, hardened supply chain. 200+ native functions. 20 modules.

npm install stelo
import {
  mouse, keyboard, screen, vision, ocr, agent, browser,
  appWindow, recorder, safety, clipboard, process,
  stream, audioStream, micStream, speaker, voiceIsolation,
  hooks, template, rtc, workflow
} from 'stelo';

Node.js >= 20 required. Prebuilt binaries for Windows, macOS, and Linux -- no compiler needed.


150+ Native Functions Across 19 Modules

| Module | What It Does | Key Functions | | ------------------ | ---------------------------------------------------- | ---------------------------------------------------------------------------------------------- | | mouse | Full cursor control with physics-modeled movement | move, moveSmoothly, moveHumanized, click, drag, scroll, gesture* | | keyboard | Typing, hotkeys, sequences, batch input | type, typeHumanized, hotkey, sequence, typeBatch, waitForKey | | screen | GPU-accelerated capture, pixel analysis | capture, capturePng, captureJpeg, getPixelColor, findColor, getAllDisplays | | vision | Screen diffing, change detection, perceptual hashing | diff, waitForChange, waitForStable, analyzeGrid, perceptualHash, findColorClusters | | ocr | Native OS OCR with word-level positions | recognize, findText, clickText, waitForText, waitForTextAndClick | | agent | Batch execution, verification, parallel control | executeBatch, clickAndVerify, sequence(), parallel(), createCursorStream | | appWindow | Window discovery, focus, geometry, capture | getActive, find, focus, resize, capture, captureJpeg, sendClick, sendKey | | recorder | Record and replay input sequences | start, stop, replay, sequence().build() | | safety | Failsafe, rate limiting, emergency stop | enableFailsafe, setRateLimit, emergencyStop, reset | | clipboard | Text, HTML, and image clipboard access | readText, writeText, readHtml, readImage, formats | | process | Process listing, launching, killing | list, find, launch, kill, isRunning, foregroundPid | | browser | Cross-platform URL navigation | openUrl, normalizeBrowserUrl, findBrowserWindow | | stream | Real-time screen capture at 1-120 FPS | start, getLatestFrame, waitForFrame, getStats | | audioStream | System audio loopback capture (WASAPI) | start, getFormat, getLatestChunk, drainChunks | | micStream | Microphone capture | start, getFormat, getLatestChunk, drainChunks | | speaker | Audio playback (PCM16 mono) | start, write, clear, getStats | | voiceIsolation | Neural voice isolation with enrollment | enroll, enable, setAggression, getStats | | hooks | Global keyboard/mouse event listeners | start, drainEvents, eventCount | | template | Template matching via NCC | findOnScreen, findAllOnScreen, waitFor | | rtc | QUIC-based real-time communication | create, connect, sendScreenStart, sendAudioStart, channelOpen | | workflow | Declarative automation with retry/branching | create, step, condition, loop, run |


Quick Start

Open a URL in the Browser

import { openUrl } from 'stelo';

// Windows, macOS, Linux — CDP → keyboard → OS launch (start / open / xdg-open)
await openUrl('https://www.youtube.com');

On linuxserver Chromium + labwc, bake --remote-debugging-port=9222 so CDP opens tabs visible in Selkies. Do not use STELO_SANDBOX=1 on Wayland desktops.

See the Screen

import { screen, ocr } from 'stelo';

// Capture as PNG bytes (ready to send to any vision model)
const png = screen.capturePng();

// Or capture a region as JPEG with quality control
const jpeg = screen.captureJpeg({ quality: 80, x: 0, y: 0, width: 800, height: 600 });

// Read all text on screen with word-level bounding boxes
const result = ocr.recognize();
console.log(result.text);
result.lines.forEach(line => {
  line.words.forEach(word => {
    console.log(`"${word.text}" at (${word.x}, ${word.y}) confidence=${word.confidence}`);
  });
});

// Find and click text directly
ocr.clickText('Submit');

Control the Mouse

import { mouse } from 'stelo';

// Instant teleport
mouse.move(500, 400);

// Physics-modeled smooth movement (Bezier + arc-length parameterization)
await mouse.moveSmoothly(800, 600, { duration: 500, curve: 'bezier' });

// Human-indistinguishable movement (Wind Mouse algorithm)
await mouse.moveHumanized(300, 200, { jitter: 0.5, overshoot: true });

// Click, drag, scroll
mouse.click();
mouse.drag(100, 100, 500, 500);
mouse.scroll(3, 'down');

// Complex gestures
mouse.gestureCircle(400, 400, 100, 2000);
mouse.gestureSwipe(0, 500, 1920, 500, 300);
mouse.gestureSpiral(960, 540, 200, 10, 3, 3000);

Type and Use Shortcuts

import { keyboard } from 'stelo';

// Instant typing (single SendInput call, 10-100x faster than per-key)
keyboard.typeBatch('Hello World');

// Human-like typing with randomized delays
await keyboard.typeHumanized('Natural typing speed', { minDelay: 40, maxDelay: 120 });

// Hotkeys and sequences
keyboard.hotkey('ctrl', 'shift', 'p');
keyboard.sequence('ctrl+a | ctrl+c | alt+tab | ctrl+v', 50);

// Convenience shortcuts
keyboard.selectAll();
keyboard.copy();
keyboard.paste();
keyboard.save();

See What Changed

import { vision, screen, mouse } from 'stelo';

// Take reference, act, verify
const before = screen.capture();
mouse.click(500, 300);

const diff = vision.diff(before);
console.log(`${diff.changePercentage}% of screen changed`);

// Wait for animations to finish
await vision.waitForStable(0.1, 200, 5000);

// Perceptual hash for fast similarity checks
const hash1 = vision.perceptualHash();
// ... do something ...
const hash2 = vision.perceptualHash();
const distance = vision.hashDistance(hash1, hash2);
// distance < 5 = nearly identical, > 20 = major change

AI Agent Loop (Observe -> Act -> Verify -> Repeat)

import { agent, screen, ocr, vision } from 'stelo';

// The core agent loop
while (true) {
  // 1. Observe
  const screenshot = screen.captureJpeg({ quality: 70 });
  const text = ocr.recognize();

  // 2. Decide (send to your LLM -- Gemini, Claude, GPT, local model)
  const action = await yourLLM.decide(screenshot, text);

  // 3. Act with verification
  const result = agent.clickAndVerify(action.x, action.y, {
    minChangePercent: 0.5,
    timeoutMs: 2000,
  });

  // 4. Verify the action worked
  if (!result.verified) continue;

  // 5. Wait for stability before next action
  await vision.waitForStable(0.1, 200, 5000);
}

Core Capabilities

Native Resolution Everywhere

Stelo forces Per-Monitor DPI Awareness V2 on Windows automatically. Screenshot coordinates = mouse coordinates = vision model coordinates. Zero drift, zero scaling math, single coordinate space end-to-end.

import { screen, mouse } from 'stelo';

const { width, height } = screen.getSize(); // True native resolution
const pixel = screen.getPixelColor(960, 540); // Exact pixel, no DPI scaling
mouse.move(960, 540); // Lands exactly where expected

Parallel Action Execution

True OS-level parallelism on native threads. Not async pretend.

import { agent } from 'stelo';

const result = agent.parallelExecute([
  {
    groupId: 'mouse-work',
    actions: [
      { type: 'mouseMoveSmooth', x: 500, y: 300, duration: 600 },
      { type: 'mouseClick' },
    ]
  },
  {
    groupId: 'keyboard-work',
    startDelayMs: 300,
    actions: [
      { type: 'type', text: 'search query' },
      { type: 'keyPress', key: 'Enter' },
    ]
  }
]);

console.log(`All done in ${result.totalDurationMs}ms`);

Action Batching

Execute complex sequences atomically with minimal latency:

const result = agent.executeBatch([
  { type: 'mouseClickAt', x: 500, y: 300 },
  { type: 'waitForStable', stableMs: 200 },
  { type: 'type', text: 'hello world' },
  { type: 'keyPress', key: 'Enter' },
  { type: 'waitForChange', timeout: 3000 },
], { stopOnError: true });

// Or use the fluent builder
agent.sequence()
  .clickAt(500, 300)
  .waitForStable()
  .type('hello world')
  .press('Enter')
  .waitForChange({ timeout: 3000 })
  .execute();

Real-Time Screen Streaming

Continuous screen capture at up to 120 FPS with cursor tracking:

import { stream } from 'stelo';

stream.start({ fps: 60, scale: 0.5 });

const frame = stream.getLatestFrame();
// frame.data = raw RGBA pixels
// frame.cursorX, frame.cursorY = cursor position at capture time
// frame.measuredFps = actual FPS being achieved

const next = stream.waitForFrame(100); // Block until next frame
stream.stop();

Audio Pipeline

Full audio I/O: system loopback capture, microphone capture, speaker playback, and neural voice isolation.

import { audioStream, micStream, speaker, voiceIsolation } from 'stelo';

// Capture system audio
audioStream.start();
const format = audioStream.getFormat();
const chunks = audioStream.drainChunks();

// Capture microphone
micStream.start();
const micChunk = micStream.getLatestChunk();

// Voice isolation -- enroll your voice, suppress everything else
voiceIsolation.enroll();
voiceIsolation.enrollFinish();
voiceIsolation.enable();
voiceIsolation.setAggression(0.7);

// Play audio
speaker.start({ inputSampleRate: 24000 });
speaker.write(pcm16Buffer);

Window Control

Complete window management including background window capture and input:

import { appWindow } from 'stelo';

const wins = appWindow.find('Chrome');
appWindow.focus(wins[0].id);

// Capture background window (no focus required)
const capture = appWindow.capture(wins[0].id);
const jpeg = appWindow.captureJpeg(wins[0].id, 80);

// Send input to background window
appWindow.sendClick(wins[0].id, 100, 200);
appWindow.sendKey(wins[0].id, 0x0D);

// Window geometry
appWindow.resize(wins[0].id, 1920, 1080);
appWindow.center(wins[0].id);
appWindow.setAlwaysOnTop(wins[0].id, true);
appWindow.setOpacity(wins[0].id, 0.8);

const win = await appWindow.waitFor('MyApp', 10000);

Template Matching

Find images on screen using normalized cross-correlation:

import { template, mouse } from 'stelo';

const match = template.findOnScreen(buttonRgbaData, buttonWidth, buttonHeight, {
  threshold: 0.85,
});

if (match) {
  mouse.click(match.x + match.width / 2, match.y + match.height / 2);
}

const result = await template.waitFor(iconData, iconWidth, iconHeight, {
  threshold: 0.8,
  timeoutMs: 10000,
});

Real-Time Communication (RTC)

QUIC-based peer-to-peer streaming with <50ms latency:

import { rtc } from 'stelo';

// Host
rtc.create({ port: 9000 });
const offer = rtc.getOffer();

// Remote
rtc.create();
rtc.connect(offer, 5000);

// Stream screen, audio, mic
rtc.sendScreenStart({ fps: 60, scale: 0.75 });
rtc.sendAudioStart();
rtc.sendMicStart();

// Data channels
const ch = rtc.channelOpen('control');
rtc.channelSend(ch, Buffer.from(JSON.stringify({ action: 'click', x: 100, y: 200 })));

Global Input Hooks

Listen to all keyboard and mouse events system-wide:

import { hooks } from 'stelo';

hooks.start({
  captureKeyboard: true,
  captureMouseClicks: true,
  captureMouseMove: false,
  captureMouseWheel: true,
  bufferSize: 1000,
});

setInterval(() => {
  const events = hooks.drainEvents();
  for (const e of events) {
    if (e.eventType === 'key_down') {
      console.log(`Key pressed: ${e.keyName}`);
    }
  }
}, 100);

Clipboard

Full clipboard access -- text, HTML, and raw images:

import { clipboard } from 'stelo';

clipboard.writeText('Hello');
const text = clipboard.readText();

clipboard.writeHtml('<b>Bold</b>');
const html = clipboard.readHtml();

const image = clipboard.readImage();
const formats = clipboard.formats();
clipboard.clear();

Process Management

import { process } from 'stelo';

const all = process.list();
const chromes = process.find('chrome');
const running = process.isRunning(1234);

const pid = process.launch('notepad.exe');
const exitCode = process.launchAndWait('cmd.exe', ['/c', 'dir']);
process.kill(pid);

Safety and Failsafe

import { safety } from 'stelo';

safety.enableFailsafe(10);
safety.setRateLimit(200);

safety.emergencyStop();
console.log(safety.isStopped()); // true
safety.reset();

Workflow Engine

Declarative automation with built-in retry, conditional branching, and error recovery:

import { workflow, ocr, keyboard, vision, safety, screen } from 'stelo';

const loginFlow = workflow.create('login')
  .step('wait-for-page', async () => {
    await ocr.waitForText('Sign In', 10000);
  })
  .step('enter-credentials', async (ctx) => {
    ocr.clickText('Email');
    await keyboard.typeHumanized(ctx.get('email'));
    keyboard.press('tab');
    await keyboard.typeHumanized(ctx.get('password'));
  })
  .step('submit', async () => {
    ocr.clickText('Sign In');
    await vision.waitForStable(0.1, 500, 10000);
  })
  .retry({ maxAttempts: 3, delayMs: 1000 })
  .onError(async (error, ctx) => {
    screen.saveScreenshot(`error-${Date.now()}.png`);
    safety.emergencyStop();
  });

await loginFlow.run({ email: '[email protected]', password: '***' });

Architecture

+------------------------------------------------------+
|                    Your Application                   |
|         (AI Agent / Automation Script / RPA)          |
+------------------------------------------------------+
|                  Stelo TypeScript API                 |
|   mouse . keyboard . screen . vision . ocr . agent   |
|   window . recorder . safety . clipboard . process    |
|   stream . audio . mic . speaker . voice . hooks     |
|   template . rtc . workflow                          |
+------------------------------------------------------+
|                  Stelo Rust Core (napi-rs)            |
| +----------+ +----------+ +----------+ +----------+  |
| |  Input   | |  Screen  | |  Audio   | |  Network |  |
| | Mouse    | | Capture  | | WASAPI   | | QUIC/RTC |  |
| | Keyboard | | Vision   | | Mic      | | LZ4+Delta|  |
| | Hooks    | | OCR      | | Speaker  | | Channels |  |
| | Gestures | | Template | | Voice    | | H.264    |  |
| +----------+ +----------+ +----------+ +----------+  |
| +----------+ +----------+ +----------+ +----------+  |
| |  Window  | | Process  | |Clipboard | | Recorder |  |
| | Manage   | | Control  | | Text/HTML| | Record   |  |
| | Capture  | | Launch   | | Image    | | Replay   |  |
| | Send I/O | | Kill     | | Formats  | | Sequence |  |
| +----------+ +----------+ +----------+ +----------+  |
+------------------------------------------------------+
|              Platform Abstraction Layer               |
|  Win32 API  .  Core Graphics  .  X11/XCB/XTest       |
|  WASAPI     .  Core Audio     .  ALSA/PulseAudio     |
|  WinRT OCR  .  Vision.framework . Tesseract           |
+------------------------------------------------------+

Supported Platforms

| OS | Arch | Package | | ------- | --------------------- | ------------------------ | | Windows | x64 | @yuta123133/win32-x64-msvc | | Windows | arm64 | @yuta123133/win32-arm64-msvc| | macOS | x64 (Intel) | @yuta123133/darwin-x64 | | macOS | arm64 (Apple Silicon) | @yuta123133/darwin-arm64 | | Linux | x64 (glibc) | @yuta123133/linux-x64-gnu | | Linux | x64 (musl) | @yuta123133/linux-x64-musl | | Linux | arm64 (glibc) | @yuta123133/linux-arm64-gnu | | Linux | arm64 (musl) | @yuta123133/linux-arm64-musl|

Platform packages are auto-installed via optionalDependencies as @yuta123133/* (your npm user scope — no org required; avoids squatted unscoped names).

Linux sandboxes: npm run rebuild:linux (Docker + Ubuntu 22.04) builds both @yuta123133/linux-x64-gnu and @yuta123133/linux-x64-musl into npm/linux-x64-*.


Performance

| Operation | Latency | Notes | | --------------------------- | ------------ | --------------------------------------- | | Screen capture (full 1080p) | 5-10ms | DXGI/DWM on Windows, IOSurface on macOS | | Screen capture (full 4K) | 10-25ms | Native GPU path | | Mouse move (instant) | <0.1ms | Single SendInput call | | Mouse move (smooth, 500ms) | ~500ms | Non-blocking, event loop free | | Keyboard type batch | <0.5ms | Entire string in one SendInput | | OCR (full screen) | 50-200ms | WinRT OCR engine | | Vision diff (1080p) | 2-5ms | SIMD-optimized in Rust | | Template match (1080p) | 10-50ms | NCC with grayscale optimization | | Perceptual hash | 1-3ms | 64-bit DCT hash | | Action batch (10 actions) | 5-20ms | Sequential in native thread | | RTC frame transfer | <10ms | QUIC + LZ4 delta compression | | Audio chunk capture | ~10ms | WASAPI exclusive mode |


FAQ

Does Stelo require a C++ compiler?

No. Prebuilt binaries are published for all supported platforms. Building from source requires Rust (not C++).

How does humanized movement work?

Stelo implements the Wind Mouse algorithm -- a physics-inspired model that simulates human hand tremor, acceleration, and overshoot. Combined with cubic Bezier curves and arc-length parameterization, mouse paths are indistinguishable from real user input.

What about screen scaling / HiDPI?

Stelo forces Per-Monitor DPI Awareness V2 on Windows. All coordinates and captures use true native resolution -- no DPI virtualization.

Is it safe for production?

Yes. The safety module provides failsafe (mouse-to-corner abort), rate limiting, and emergency stop. Always enable failsafe.

Does it work headless / in CI?

On Linux, use a headless Wayland compositor (e.g. Weston with --backend=headless). Windows and macOS require a desktop session.

What about Wayland?

Linux builds are Wayland-native (wlr-screencopy, uinput, wlroots window APIs). X11/Xvfb is not supported.

Is the npm package safe to install?

Stelo has no runtime npm dependencies — only dev tools to build from source. Install with npm ci --ignore-scripts (recommended). See SECURITY.md for lockfile verification, audit gates, and provenance.

Stelo Sandbox (Selkies viewport for companies)

Give customers a browser desktop plus Stelo automation without building WebRTC yourself. Default stack: Ubuntu 26.04 + GNOME 50 + Selkies viewport.

npm run sandbox:build   # first time: build Ubuntu 26 / GNOME 50 image
npm run sandbox:up      # http://localhost:8080 (GNOME in browser)
npm run sandbox:url
import { sandbox } from 'stelo/sandbox';
const { viewportUrl } = sandbox.start({ password: 'change-me' });

Full setup, profiles, firewall/TURN, and security notes: docs/SELKIES-SANDBOX.md

How is this different from cloud computer-use services?

Cloud services charge monthly fees to remote-control a browser. Stelo is a local SDK that controls your entire desktop at native speed. You own it. You run it. No API key required.


License

Apache-2.0 (c) Stelo Contributors