@zegocloud/auto-web
v1.0.0
Multi-tab web automation CLI based on Midscene.js
Manage multiple browser tabs with named identifiers, making it easy to test multi-user scenarios like chat applications.
Features
- Named tabs - Connect multiple tabs with memorable names, even to the same URL
- AI-driven actions - Use natural language to interact with web pages via Midscene
- Stateless CLI - Tab state persists across invocations while the browser keeps running in the background
- Full action set - tap, input, scroll, drag, hover, screenshot, and more
Prerequisites
AI Model Configuration
Midscene requires a vision-capable AI model. Set these environment variables:
# Example: Doubao Seed 2.0 Lite
export MIDSCENE_MODEL_API_KEY="your-api-key"
export MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
export MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export MIDSCENE_MODEL_FAMILY="doubao-seed"
Or create a .env file in your working directory. See Midscene Model Configuration for all supported providers.
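As a minimal sketch of the .env option, the helper below writes a starter file with the same four variables as the exports above. The function name `write_env_template` is hypothetical and not part of auto-web, the values are the placeholders from the Doubao example, and the plain `KEY="value"` layout is an assumption about the usual dotenv format.

```shell
#!/bin/sh
# Hypothetical helper (not part of auto-web): write a starter .env file.
# Values are placeholders from the Doubao example; replace them with your own.
write_env_template() {
  target="${1:-.env}"
  # Refuse to overwrite an existing file so real credentials are not lost.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"
EOF
  # The file holds an API key, so restrict it to the current user.
  chmod 600 "$target"
}
```

Calling `write_env_template` with no argument targets `.env` in the current directory. Keep the resulting file out of version control.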
Chrome
Google Chrome or Chromium must be installed. Set MIDSCENE_MCP_CHROME_PATH to override the default path.
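If Chrome is installed under a non-default name, one way to set the override is to search PATH for a few likely binary names. This is only a sketch: `find_first_binary` is a hypothetical helper, and the candidate names below are assumptions about common Linux installs, not a list that auto-web itself checks.

```shell
#!/bin/sh
# Hypothetical helper: print the location of the first candidate found on PATH.
find_first_binary() {
  for name in "$@"; do
    if found=$(command -v "$name" 2>/dev/null); then
      printf '%s\n' "$found"
      return 0
    fi
  done
  return 1
}

# Only set the override when the user has not already chosen a path.
# The candidate names are assumptions; adjust them for your system.
if [ -z "${MIDSCENE_MCP_CHROME_PATH:-}" ]; then
  if chrome=$(find_first_binary google-chrome google-chrome-stable chromium chromium-browser); then
    export MIDSCENE_MCP_CHROME_PATH="$chrome"
  fi
fi
```

On macOS or Windows the browser usually lives outside PATH, so setting MIDSCENE_MCP_CHROME_PATH by hand is likely simpler there.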
Installation
npx -y @zegocloud/auto-web <command>
Or install globally:
npm install -g @zegocloud/auto-web
auto-web <command>
Quick Start
Multi-User Chat Test
# Create two tabs pointing to the same URL
auto-web connect --url "http://localhost:3000" --tab user1
auto-web connect --url "http://localhost:3000" --tab user2
# Login as user1
auto-web act --tab user1 --prompt "Login with username 'alice' and password 'pass123'"
# Login as user2
auto-web act --tab user2 --prompt "Login with username 'bob' and password 'pass456'"
# user1 sends a message
auto-web act --tab user1 --prompt "Type 'Hello Bob!' in the chat and send it"
# user2 verifies and replies
auto-web act --tab user2 --prompt "Verify 'Hello Bob!' is visible, then reply 'Hi Alice!'"
# Take a screenshot of user2's tab
auto-web take_screenshot --tab user2
# Clean up
auto-web close-tab --tab user1
auto-web close-tab --tab user2
Single Tab Usage
# Connect a tab (becomes the default)
auto-web connect --url "https://example.com" --tab main
# --tab is optional; it defaults to the last connected tab
auto-web act --prompt "Click the login button"
# Explicitly target a tab
auto-web act --tab main --prompt "Search for 'automation'"
Commands
Tab Management
| Command | Description |
|---------|-------------|
| connect --url <url> --tab <name> | Create a named tab connected to a URL |
| close-tab --tab <name> | Close a specific tab |
| list-tabs | List all connected tabs |
| disconnect | Disconnect all tabs (browser keeps running) |
| close | Close the browser completely |
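For multi-user tests, the connect and close-tab commands above can be wrapped in two small loops. The sketch below assumes `auto-web` is on PATH. `AUTO_WEB`, `connect_users`, and `close_users` are hypothetical names introduced here for illustration, and `AUTO_WEB` must name a single executable.

```shell
#!/bin/sh
# Hypothetical wrapper variable; override it to point at another executable.
AUTO_WEB="${AUTO_WEB:-auto-web}"

# Connect one named tab per user, all pointing at the same URL.
# Stops at the first connect that fails.
connect_users() {
  url="$1"
  shift
  for tab in "$@"; do
    "$AUTO_WEB" connect --url "$url" --tab "$tab" || return 1
  done
}

# Close each named tab; keep going even if one tab is already gone.
close_users() {
  for tab in "$@"; do
    "$AUTO_WEB" close-tab --tab "$tab" || true
  done
}
```

A test script might call `connect_users "http://localhost:3000" user1 user2` at the start and register `trap 'close_users user1 user2' EXIT` so the tabs are closed even when a step fails.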
Actions (all support --tab <name>)
| Command | Description |
|---------|-------------|
| act --prompt <text> | Execute a natural language action (multi-step) |
| tap --locate <json> | Click an element |
| input --value <text> --locate <json> | Type text into an element |
| scroll --direction <dir> | Scroll the page |
| hover --locate <json> | Hover over an element |
| take_screenshot | Capture a screenshot |
| keyboardpress --keyName <key> | Press a key or key combination |
| rightclick --locate <json> | Right-click an element |
| draganddrop --from <json> --to <json> | Drag and drop |
| swipe --direction <dir> --distance <px> | Swipe gesture |
Run auto-web <command> --help for detailed options.
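Longer scenarios can be kept as data and replayed through the act command. The sketch below reads `tab|prompt` lines from standard input and runs one act invocation per line. The `run_steps` function, the `AUTO_WEB` variable, and the line format are all assumptions made for this example and are not features of auto-web.

```shell
#!/bin/sh
# Hypothetical wrapper variable; must name a single executable.
AUTO_WEB="${AUTO_WEB:-auto-web}"

# Read "tab|prompt" lines from stdin and run each one as an act command.
# Blank lines and lines starting with # are skipped.
# Returns non-zero as soon as one step fails.
run_steps() {
  while IFS='|' read -r tab prompt; do
    case "$tab" in
      '' | '#'*) continue ;;
    esac
    # Redirect stdin so the command cannot consume the remaining steps.
    "$AUTO_WEB" act --tab "$tab" --prompt "$prompt" </dev/null || return 1
  done
}
```

With a steps file containing lines such as `user1|Type 'Hello Bob!' in the chat and send it`, the scenario runs as `run_steps < steps.txt`.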
How It Works
┌─────────────────────────────────────────────────┐
│  Chrome (headless)                              │
│  ┌──────────┐  ┌──────────┐                     │
│  │  Tab A   │  │  Tab B   │  ...                │
│  │ (user1)  │  │ (user2)  │                     │
│  └──────────┘  └──────────┘                     │
│       ▲             ▲                           │
│       │  CDP (WebSocket)                        │
├───────┴─────────────┴───────────────────────────┤
│  State File: /tmp/zegocloud-auto-web-tabs.json  │
│  Endpoint:   /tmp/zegocloud-auto-web-endpoint   │
└─────────────────────────────────────────────────┘
- Browser persists as a detached Chrome process between CLI invocations
- Tab state (name, target ID, URL) is saved to a JSON file
- Each CLI invocation reconnects to the browser and resolves tabs by target ID
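When a tab cannot be resolved, it can help to look at the two files named in the diagram. The sketch below only reports whether each file exists and prints its contents. It makes no assumption about the JSON field names. `STATE_FILE`, `ENDPOINT_FILE`, and `show_state` are hypothetical names used by this snippet only; auto-web does not read them.

```shell
#!/bin/sh
# Default locations are the ones shown in the diagram above.
# These variables are local to this sketch; auto-web does not read them.
STATE_FILE="${STATE_FILE:-/tmp/zegocloud-auto-web-tabs.json}"
ENDPOINT_FILE="${ENDPOINT_FILE:-/tmp/zegocloud-auto-web-endpoint}"

# Report whether each file exists and print its contents if it does.
show_state() {
  for f in "$STATE_FILE" "$ENDPOINT_FILE"; do
    if [ -f "$f" ]; then
      echo "== $f"
      cat "$f"
      echo
    else
      echo "== $f (not found)"
    fi
  done
}
```

Running `show_state` before and after a connect command gives a rough view of what persists between invocations.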
Environment Variables
| Variable | Description |
|----------|-------------|
| MIDSCENE_MODEL_API_KEY | AI model API key (required) |
| MIDSCENE_MODEL_NAME | Model name (required) |
| MIDSCENE_MODEL_BASE_URL | Model API base URL (required) |
| MIDSCENE_MODEL_FAMILY | Model family identifier |
| MIDSCENE_MCP_CHROME_PATH | Custom Chrome binary path |
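A short pre-flight check can catch a missing required variable before any browser work starts. The sketch below tests the three variables marked required in the table. `check_required_env` is a hypothetical helper, and it inspects only the current shell environment, so values supplied through a .env file would not be seen by it.

```shell
#!/bin/sh
# Hypothetical helper: report required variables that are unset or empty.
# Returns non-zero if at least one is missing.
check_required_env() {
  missing=0
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME MIDSCENE_MODEL_BASE_URL; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could start with `check_required_env || exit 1` so that it stops with a clear message instead of failing partway through a scenario.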
License
MIT
