@ui-tars-test/operator-nutjs
v0.3.5
Computer operator based on NutJS for GUI Agent
NutJS Operator
Overview
NutJS Operator is a computer operator based on NutJS for GUI Agent. It provides a set of APIs to interact with the desktop environment, including taking screenshots, mouse operations, keyboard operations, and more.
Installation
npm install @ui-tars-test/operator-nutjs
Or with yarn:
yarn add @ui-tars-test/operator-nutjs
Or with pnpm:
pnpm add @ui-tars-test/operator-nutjs
Features
- Screenshot: Capture the screen with proper scaling for high DPI displays
- Mouse Operations: Move, click, double-click, right-click, drag, etc.
- Keyboard Operations: Type text, press hotkeys, etc.
- Scroll: Scroll up and down
- Wait: Wait for a specified time
Usage
import { NutJSOperator } from '@ui-tars-test/operator-nutjs';
import { ConsoleLogger, LogLevel } from '@agent-infra/logger';
// Create a logger
const logger = new ConsoleLogger(undefined, LogLevel.DEBUG);
// Create an operator instance
const operator = new NutJSOperator(logger);
// Take a screenshot
const screenshot = await operator.screenshot();
console.log('Screenshot taken:', screenshot.status);
// Execute actions
const result = await operator.execute({
  actions: [
    {
      type: 'click',
      inputs: {
        point: {
          normalized: { x: 0.5, y: 0.5 } // Click at the center of the screen
        }
      }
    },
    {
      type: 'type',
      inputs: {
        content: 'Hello, World!'
      }
    }
  ]
});
API Reference
NutJSOperator
The main class that provides methods to interact with the desktop environment.
Constructor
constructor(logger: ConsoleLogger = defaultLogger)
- logger: A ConsoleLogger instance for logging. Defaults to a ConsoleLogger with LogLevel.DEBUG.
Methods
screenshot(): Promise<ScreenshotOutput>
Takes a screenshot of the screen.
- Returns: A promise that resolves to a ScreenshotOutput object containing:
  - base64: The base64-encoded image data
  - contentType: The content type of the image (e.g., 'image/jpeg')
  - status: The status of the operation ('success' or 'error')
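As a minimal sketch of consuming the result described above, the helper below checks status and builds a data URL from base64 and contentType. The `ScreenshotOutputLike` interface and the `screenshotToDataUrl` helper are hypothetical names introduced here for illustration; they mirror the documented fields and are not exported by the package.

```typescript
// Local mirror of the documented ScreenshotOutput fields (assumed shape,
// not imported from the package).
interface ScreenshotOutputLike {
  base64: string;
  contentType: string;
  status: 'success' | 'error';
}

// Hypothetical helper: turn a successful screenshot into a data URL.
// Throws if the operator reported an error status.
function screenshotToDataUrl(output: ScreenshotOutputLike): string {
  if (output.status !== 'success') {
    throw new Error(`Screenshot failed with status: ${output.status}`);
  }
  return `data:${output.contentType};base64,${output.base64}`;
}

// Hand-written stand-in for a real screenshot result.
const sample: ScreenshotOutputLike = {
  base64: 'aGVsbG8=',
  contentType: 'image/jpeg',
  status: 'success',
};
const dataUrl = screenshotToDataUrl(sample);
```

In real use, the value returned by `await operator.screenshot()` would presumably be passed in place of `sample`.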
execute(params: ExecuteParams): Promise<ExecuteOutput>
Executes a list of actions.
- params: An object containing:
  - actions: An array of action objects
- Returns: A promise that resolves to an ExecuteOutput object containing:
  - status: The status of the operation ('success' or 'error')
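The params object can be assembled with small builder functions, sketched below. The action shapes are copied from the Usage example; the type names, the `clickAt` and `typeText` helpers, and the assumption that normalized coordinates lie in [0, 1] are illustrative and may not match the package's actual types.

```typescript
// Assumed action shapes, taken from the Usage example; the package's
// real types may differ.
interface ClickAction {
  type: 'click';
  inputs: { point: { normalized: { x: number; y: number } } };
}
interface TypeAction {
  type: 'type';
  inputs: { content: string };
}
type SketchAction = ClickAction | TypeAction;

// Hypothetical builder: click at a normalized position.
// Assumes coordinates are fractions of the screen in [0, 1].
function clickAt(x: number, y: number): ClickAction {
  for (const v of [x, y]) {
    if (!isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`normalized coordinate out of range: ${v}`);
    }
  }
  return { type: 'click', inputs: { point: { normalized: { x, y } } } };
}

// Hypothetical builder: type a string of text.
function typeText(content: string): TypeAction {
  return { type: 'type', inputs: { content } };
}

// Same action list as the Usage example, built with the helpers.
const params: { actions: SketchAction[] } = {
  actions: [clickAt(0.5, 0.5), typeText('Hello, World!')],
};
```

An object like `params` could then be handed to `operator.execute(params)`, and the returned status checked for 'success' or 'error'.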
Supported Actions
Mouse Actions
- move, move_to, mouse_move, hover: Move the mouse to a specified position
- click, left_click, left_single: Perform a left mouse click
- left_double, double_click: Perform a double left mouse click
- right_click, right_single: Perform a right mouse click
- middle_click: Perform a middle mouse click
- left_click_drag, drag, select: Drag the mouse from one position to another
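The mouse action names above fall into six groups of aliases. One way to picture this is a lookup table from each accepted name to a single gesture, sketched below. The canonical gesture names, `MOUSE_ALIASES`, and `resolveMouseGesture` are chosen here for illustration only; the operator's internal dispatch may be organized differently.

```typescript
// Canonical gesture names picked for this sketch, not taken from the package.
type MouseGesture =
  | 'move'
  | 'click'
  | 'double_click'
  | 'right_click'
  | 'middle_click'
  | 'drag';

// Every documented mouse action type, mapped to the gesture it performs.
const MOUSE_ALIASES: Record<string, MouseGesture> = {
  move: 'move',
  move_to: 'move',
  mouse_move: 'move',
  hover: 'move',
  click: 'click',
  left_click: 'click',
  left_single: 'click',
  left_double: 'double_click',
  double_click: 'double_click',
  right_click: 'right_click',
  right_single: 'right_click',
  middle_click: 'middle_click',
  left_click_drag: 'drag',
  drag: 'drag',
  select: 'drag',
};

// Returns the gesture for a mouse action type, or undefined for anything
// that is not a documented mouse action (e.g. 'scroll' or 'type').
function resolveMouseGesture(actionType: string): MouseGesture | undefined {
  return Object.prototype.hasOwnProperty.call(MOUSE_ALIASES, actionType)
    ? MOUSE_ALIASES[actionType]
    : undefined;
}
```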
Keyboard Actions
- type: Type text
- hotkey: Press a hotkey combination
- press: Press a key
- release: Release a key
Other Actions
- scroll: Scroll up or down
- wait: Wait for a specified time
- finished: Do nothing (used to indicate the end of actions)
License
Apache-2.0
