@cuylabs/computer
v0.1.1
Computer automation toolkit for AI agents - unified interface for Docker, Browser, VNC, and QEMU backends
A unified TypeScript interface for desktop and browser automation, designed for AI-powered computer use applications.
Features
- 🖥️ Multiple Backends - Docker containers, headless browsers, VNC, QEMU VMs
- ⌨️ Full Input Control - Keyboard, mouse, scroll, drag operations
- 📸 Screen Capture - Screenshots in PNG/JPEG/WebP, video recording
- 🎯 Session Management - State tracking, lifecycle hooks, capture integration
- 🔌 VNC Protocol - Native RFB 3.8 client for remote desktop capture
- 📦 Zero Config - Sensible defaults, works out of the box
Installation
```bash
npm install @cuylabs/computer
```
Backend-Specific Setup
Docker Backend (default, no extra setup):
```bash
# Requires Docker installed and the Docker daemon running
docker --version   # confirms the Docker CLI is installed
```
Browser Backend (requires Playwright):
```bash
# Install Playwright
npm install playwright
# Download browser binaries (~300MB)
npm run setup:browser        # Chromium only (recommended)
# OR
npm run setup:browser:all    # All browsers (Chromium, Firefox, WebKit)
```
Video Conversion (optional, for MP4/MOV/GIF output):
```bash
# Install bundled ffmpeg (~70MB, downloaded automatically for your platform)
npm run setup:ffmpeg
# OR use a system-wide ffmpeg (skip this step if it is already installed)
brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Ubuntu/Debian
```
Quick Start
Docker Backend
```typescript
import { docker } from "@cuylabs/computer";

// Create a Docker-based computer
const computer = docker({
  image: "ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest",
  display: { width: 1024, height: 768 },
});

await computer.start();
await computer.waitUntilReady();

// Interact with the desktop
await computer.type("Hello, world!");
await computer.click("left", { x: 100, y: 200 });
await computer.key("Return");

// Take a screenshot (returned as a base64 string)
const screenshot = await computer.screenshot();

// Run shell commands inside the container
const result = await computer.execute("ls -la");
console.log(result.stdout);

await computer.stop();
```
Browser Backend
```typescript
import { browser } from "@cuylabs/computer";

// Headless browser automation
const computer = browser({
  mode: "headless",
  browserType: "chromium",
});

await computer.start();

// Navigate and interact (coordinates are illustrative and page-specific)
await computer.goto("https://example.com");
await computer.click("left", { x: 500, y: 300 }); // click to focus a field before typing
await computer.type("search query");

// Execute JavaScript in the page context; `?.` avoids a TypeError if no <button> exists
await computer.execute('document.querySelector("button")?.click()');

const screenshot = await computer.screenshot();
await computer.stop();
```
Factory Functions
```typescript
import { docker, browser, createRuntime } from "@cuylabs/computer";

// Convenience factory functions
const dockerComputer = docker({ image: "ubuntu:latest" });
const browserComputer = browser({ mode: "headed" });

// Or use the generic factory
const computer = createRuntime({
  backend: "docker",
  image: "my-image:latest",
  display: { width: 1920, height: 1080 },
});
```
API Reference
Computer Class
The main class for desktop automation.
```typescript
import { Computer } from "@cuylabs/computer";

const computer = new Computer({
  backend: "docker",
  image: "ubuntu:latest",
  display: { width: 1024, height: 768 },
});
```
Lifecycle Methods
| Method | Description |
|--------|-------------|
| start() | Start the computer environment |
| stop() | Stop and cleanup resources |
| waitUntilReady(timeout?) | Wait for the environment to be ready |
| isReady() | Check if the environment is ready |
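
The pairing of `waitUntilReady(timeout?)` with `isReady()` suggests a polling pattern. The sketch below is a generic, hypothetical `waitUntil` helper (not exported by this package) showing how a bounded wait can be built on top of a readiness check. It assumes timeouts are in milliseconds and that the check returns a boolean or a promise of one.

```typescript
// Hypothetical helper (not part of @cuylabs/computer): polls `check` until it
// reports ready or `timeoutMs` elapses. All times are in milliseconds (assumed).
async function waitUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 30000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    const remaining = deadline - Date.now();
    if (remaining <= 0) {
      throw new Error(`Environment not ready after ${timeoutMs} ms`);
    }
    // Sleep between attempts without overshooting the deadline.
    await new Promise<void>((resolve) =>
      setTimeout(resolve, Math.min(intervalMs, remaining)),
    );
  }
}
```

Under those assumptions, `await waitUntil(() => computer.isReady(), 60000)` would behave like a bounded `waitUntilReady`.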
Input Methods
| Method | Description |
|--------|-------------|
| type(text) | Type text with realistic delays |
| key(key) | Press a key or key combination |
| click(button, coords?) | Click mouse button |
| doubleClick(x?, y?) | Double-click |
| rightClick(x?, y?) | Right-click |
| moveMouse(x, y) | Move mouse to coordinates |
| drag(from, to) | Drag from one point to another |
| scroll(direction, amount) | Scroll in a direction |
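
Agents often emit input steps as data (for example, tool calls) rather than calling methods directly. A minimal sketch of dispatching such steps in order follows, assuming the signatures in the table above. The `Action` union, the `InputTarget` interface, and the `"middle"` button and scroll-direction values are hypothetical; check them against the exported `MouseButton`, `ScrollDirection`, and `Coordinate` types.

```typescript
// Assumed shapes (hypothetical), modeled on the Input Methods table.
type Point = { x: number; y: number };
type Button = "left" | "right" | "middle";
type Direction = "up" | "down" | "left" | "right";

type Action =
  | { kind: "click"; button: Button; at?: Point }
  | { kind: "type"; text: string }
  | { kind: "key"; key: string }
  | { kind: "drag"; from: Point; to: Point }
  | { kind: "scroll"; direction: Direction; amount: number };

interface InputTarget {
  click(button: Button, coords?: Point): Promise<void>;
  type(text: string): Promise<void>;
  key(key: string): Promise<void>;
  drag(from: Point, to: Point): Promise<void>;
  scroll(direction: Direction, amount: number): Promise<void>;
}

// Execute actions strictly in sequence: each input finishes before the next
// starts, so a click that focuses a field lands before the text is typed.
async function runActions(target: InputTarget, actions: Action[]): Promise<void> {
  for (const action of actions) {
    switch (action.kind) {
      case "click":
        await target.click(action.button, action.at);
        break;
      case "type":
        await target.type(action.text);
        break;
      case "key":
        await target.key(action.key);
        break;
      case "drag":
        await target.drag(action.from, action.to);
        break;
      case "scroll":
        await target.scroll(action.direction, action.amount);
        break;
    }
  }
}
```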
Screen Methods
| Method | Description |
|--------|-------------|
| screenshot(options?) | Capture screenshot as base64 |
| getCursorPosition() | Get current cursor position |
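
Because `screenshot()` returns base64, saving it to disk means decoding it and choosing a file extension. The sketch below assumes the result is either a plain base64 string or a `data:` URL. It detects PNG, JPEG, or WebP from their magic bytes rather than trusting the requested format. `decodeScreenshot`, `detectFormat`, and `saveScreenshot` are hypothetical names.

```typescript
import { writeFileSync } from "node:fs";

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Accept plain base64 or a data URL such as "data:image/png;base64,....".
function decodeScreenshot(encoded: string): Buffer {
  const payload = encoded.startsWith("data:")
    ? encoded.slice(encoded.indexOf(",") + 1)
    : encoded;
  return Buffer.from(payload, "base64");
}

// Identify the image format from its leading "magic" bytes.
function detectFormat(bytes: Buffer): "png" | "jpeg" | "webp" | "unknown" {
  if (bytes.length >= 8 && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return "png";
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "jpeg";
  if (
    bytes.length >= 12 &&
    bytes.toString("ascii", 0, 4) === "RIFF" &&
    bytes.toString("ascii", 8, 12) === "WEBP"
  ) return "webp";
  return "unknown";
}

// Decode, pick an extension, write the file, and return its path.
function saveScreenshot(encoded: string, basePath: string): string {
  const bytes = decodeScreenshot(encoded);
  const format = detectFormat(bytes);
  const path = `${basePath}.${format === "unknown" ? "bin" : format}`;
  writeFileSync(path, bytes);
  return path;
}
```

If `screenshot()` resolves to a string, usage would be `saveScreenshot(await computer.screenshot(), "./shot")`.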
Command Execution
| Method | Description |
|--------|-------------|
| execute(command) | Run a shell command |
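
`execute(command)` takes a single command string, so arguments containing spaces, quotes, or `$` must be quoted before they are interpolated. The sketch below uses standard POSIX single-quote escaping. It assumes the Docker backend passes the string to a POSIX shell such as `sh -c`; the browser backend evaluates JavaScript instead, so this does not apply there. `shellQuote` and `buildCommand` are hypothetical helpers.

```typescript
// POSIX single-quote escaping: wrap the argument in '...' and turn each
// embedded ' into '\'' (close quote, escaped quote, reopen quote). Inside
// single quotes the shell performs no expansion, so $VAR, `cmd`, * and ;
// all stay literal.
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// The program name is left unquoted and must come from trusted code;
// only the arguments are escaped.
function buildCommand(program: string, args: string[]): string {
  return [program, ...args.map(shellQuote)].join(" ");
}
```

For example: `await computer.execute(buildCommand("ls", ["-la", userDir]))`.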
Display Configuration
```typescript
import { DISPLAY } from "@cuylabs/computer";

// Preset display sizes
DISPLAY.XGA  // 1024x768
DISPLAY.WXGA // 1280x800
DISPLAY.HD   // 1366x768
DISPLAY.FHD  // 1920x1080
DISPLAY.QHD  // 2560x1440
```
Session State
Track the current state of the computer session:
```typescript
import { SessionState } from "@cuylabs/computer";

// States: idle | starting | ready | stopping | stopped | error
computer.state; // Current state
```
Capture Module
Built-in support for capturing screenshots and recording sessions.
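The `screenshotOnActions` option implies a hook that fires after selected input methods. How `ComputerCapture` implements this internally is not documented here. The sketch below shows one generic way to get that behavior with a JavaScript `Proxy`, with the `after` callback standing in for a screenshot call. `withAfterHooks` is a hypothetical name.

```typescript
// Wrap an object so that, after any method named in `actions` resolves,
// `after(methodName)` runs before the result is returned to the caller.
function withAfterHooks<T extends object>(
  target: T,
  actions: ReadonlySet<string>,
  after: (action: string) => unknown,
): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      if (typeof prop !== "string" || typeof value !== "function" || !actions.has(prop)) {
        return value; // untouched: non-hooked members pass straight through
      }
      return async (...args: unknown[]) => {
        // Call the original method with the unwrapped object as `this`.
        const result = await value.apply(obj, args);
        await after(prop);
        return result;
      };
    },
  });
}
```

For instance, `withAfterHooks(computer, new Set(["click", "type"]), () => computer.screenshot())` returns an object whose `click` and `type` calls are each followed by a screenshot.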
```typescript
import { docker, ComputerCapture } from "@cuylabs/computer";

const computer = docker({ image: "ubuntu:latest" });
await computer.start();

// Create a capture instance
const capture = new ComputerCapture(computer, {
  outputDir: "./recordings",
  screenshotOnActions: ["click", "type"],
  captureVideo: true,
});
await capture.start();

// Interactions are captured automatically
await computer.click("left", { x: 100, y: 100 });
await computer.type("Hello");

// Manual screenshot
const meta = await capture.takeScreenshot("custom-moment");

await capture.stop();
await computer.stop();
```
VNC Module
Native VNC client for remote desktop capture and recording.
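"RFB 3.8" in the feature list refers to the protocol version that client and server negotiate when a connection opens (RFC 6143, ProtocolVersion handshake). The sketch below covers only that first step and is independent of how `VNCClient` implements it. `parseProtocolVersion` and `clientReply` are hypothetical names, and mapping any 3.x minor of 8 or above to 3.8 is an assumption.

```typescript
// RFC 6143: the ProtocolVersion message is exactly 12 bytes, "RFB xxx.yyy\n",
// with zero-padded three-digit major and minor numbers.
function parseProtocolVersion(msg: Uint8Array): { major: number; minor: number } {
  if (msg.length !== 12) {
    throw new Error(`ProtocolVersion must be 12 bytes, got ${msg.length}`);
  }
  const text = Array.from(msg, (b) => String.fromCharCode(b)).join("");
  const match = /^RFB (\d{3})\.(\d{3})\n$/.exec(text);
  if (!match) {
    throw new Error(`Not an RFB ProtocolVersion message: ${JSON.stringify(text)}`);
  }
  return { major: Number(match[1]), minor: Number(match[2]) };
}

// The client replies with the version it will actually use, which must not be
// higher than the server's offer. 3.3, 3.7 and 3.8 are the published versions.
function clientReply(serverMsg: Uint8Array): string {
  const { major, minor } = parseProtocolVersion(serverMsg);
  if (major !== 3 || minor < 3) {
    throw new Error(`Unsupported RFB version ${major}.${minor}`);
  }
  const chosen = minor >= 8 ? 8 : minor === 7 ? 7 : 3;
  return `RFB 003.${String(chosen).padStart(3, "0")}\n`;
}
```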
```typescript
import { VNCClient, RFBRecorder } from "@cuylabs/computer";

// Connect to a VNC server
const client = new VNCClient({
  host: "localhost",
  port: 5900,
  password: "secret",
});
await client.connect();

// Stream frames; `break` out of the loop once you have what you need
for await (const frame of client.frames()) {
  console.log(`Frame: ${frame.width}x${frame.height}`);
}

// Or record to a file
const recorder = new RFBRecorder(client, {
  outputPath: "./recording.rfb",
});
await recorder.start();
// ... do stuff ...
await recorder.stop();
```
Backend Options
Docker
```typescript
interface DockerRuntimeOptions {
  image: string;                  // Docker image
  display?: DisplayConfig;        // Screen resolution
  memory?: string;                // Memory limit (e.g., "4g")
  cpus?: number;                  // CPU limit
  volumes?: string[];             // Volume mounts
  env?: Record<string, string>;   // Environment variables
  ports?: Record<number, number>; // Port mappings
}
```
Browser
```typescript
interface BrowserRuntimeOptions {
  mode: "headless" | "headed" | "viewer";
  browserType?: "chromium" | "firefox" | "webkit";
  display?: DisplayConfig;
  viewport?: { width: number; height: number };
  userAgent?: string;
  timeout?: number;
}
```
QEMU (Experimental)
```typescript
interface QemuRuntimeOptions {
  image: string;           // QCOW2 disk image path
  display?: DisplayConfig;
  memory?: string;
  cpus?: number;
  vncPort?: number;
}
```
Error Handling
```typescript
import {
  ComputerError,
  ExecutionError,
  InputError,
  RuntimeNotReadyError,
  ScreenCaptureError,
  TimeoutError,
} from "@cuylabs/computer";

try {
  await computer.execute("invalid-command");
} catch (err) {
  if (err instanceof ExecutionError) {
    console.log("Command failed:", err.message);
  } else if (err instanceof RuntimeNotReadyError) {
    console.log("Computer not ready yet");
  } else {
    throw err; // let unexpected errors propagate
  }
}
```
Logging
```typescript
import { setLogLevel } from "@cuylabs/computer";

// Set log level: "trace" | "debug" | "info" | "warn" | "error" | "silent"
setLogLevel("debug");
```
TypeScript Support
Full TypeScript support with exported types:
```typescript
import type {
  Backend,
  Coordinate,
  DisplayResolution,
  ExecutionResult,
  MouseButton,
  ScrollDirection,
  Key,
  Runtime,
} from "@cuylabs/computer";
```
Related Packages
- @cuylabs/computer-agent - AI SDK tools for computer use with Anthropic Claude
License
Apache-2.0
