gemini-cua
v1.0.5
Published
Model Context Protocol server for Gemini computer use functionality with browser automation using Playwright
Downloads
18
Gemini CUA - Computer Use Automation
A Model Context Protocol (MCP) server that provides computer use and browser automation capabilities based on Google's Gemini computer use preview functionality. This server enables AI assistants to control web browsers and interact with web pages through natural language commands.
Features
This MCP server provides tools for browser automation and computer interaction with support for both local and remote environments:
Supported Environments
- Playwright (Local): Run browser automation locally with Chromium
- Browserbase (Remote): Use Browserbase's cloud browser infrastructure
Capabilities
- Browser Control: Open web browser, navigate to URLs, go back/forward
- Mouse Interactions: Click, hover, drag and drop at specific coordinates
- Keyboard Actions: Type text, execute key combinations
- Page Navigation: Scroll, search, navigate to specific URLs
- State Capture: Take screenshots and get current page state with base64 encoding
- Session Management: Automatic session creation and cleanup (Browserbase)
- Waiting: Built-in delays for page loading
Installation
From npm
npm install -g gemini-cua
From source
git clone https://github.com/snakecased/gemini-cua-mcp.git
cd gemini-cua-mcp
npm install
npm run build
Usage
As MCP Server (Recommended)
Add to your MCP client configuration (e.g., Claude Desktop, Cline):
{
  "mcpServers": {
    "gemini-computer-use": {
      "command": "gemini-cua"
    }
  }
}
Or if installed locally:
{
  "mcpServers": {
    "gemini-computer-use": {
      "command": "npx",
      "args": ["gemini-cua"]
    }
  }
}
Direct Usage
# If installed globally
gemini-cua
# If installed locally
npx gemini-cua
# From source
npm run start
Requirements
- Node.js 18+
- Chromium browser (automatically installed via Playwright)
- For Browserbase: API key and Project ID from browserbase.com
Environment Configuration
Environment Variables
Set the following environment variables to choose and configure your browser environment:
# Use local Playwright (default)
export COMPUTER_USE_ENV=playwright
# Use Browserbase remote browsers
export COMPUTER_USE_ENV=browserbase
export BROWSERBASE_API_KEY=your_api_key_here
export BROWSERBASE_PROJECT_ID=your_project_id_here
Browserbase Setup
- Sign up at browserbase.com
- Create a project and get your API key and Project ID
- Set the environment variables as shown above
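Instead of exporting the variables in a shell, some MCP clients (Claude Desktop, for example) accept an `env` map on each server entry. The sketch below builds such an entry from the variables documented above. `buildServerEntry` is a hypothetical helper, not part of gemini-cua, and the validation rules are an assumption based on the variable list in this README, not the server's actual startup logic.

```typescript
// Hypothetical helper, not part of gemini-cua: builds an MCP client entry
// that forwards the documented environment variables to the server process.
type Env = Record<string, string | undefined>;

interface ServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildServerEntry(env: Env): ServerEntry {
  // The README names playwright as the default environment.
  const mode = env["COMPUTER_USE_ENV"] ?? "playwright";
  if (mode !== "playwright" && mode !== "browserbase") {
    throw new Error(`Unsupported COMPUTER_USE_ENV: ${mode}`);
  }
  const forwarded: Record<string, string> = { COMPUTER_USE_ENV: mode };
  if (mode === "browserbase") {
    // Assumption: both Browserbase variables are mandatory in this mode.
    for (const name of ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]) {
      const value = env[name];
      if (!value) {
        throw new Error(`${name} is required when COMPUTER_USE_ENV=browserbase`);
      }
      forwarded[name] = value;
    }
  }
  return { command: "npx", args: ["gemini-cua"], env: forwarded };
}

// Placeholder credentials, same as in the export lines above.
const config = {
  mcpServers: {
    "gemini-computer-use": buildServerEntry({
      COMPUTER_USE_ENV: "browserbase",
      BROWSERBASE_API_KEY: "your_api_key_here",
      BROWSERBASE_PROJECT_ID: "your_project_id_here",
    }),
  },
};
console.log(JSON.stringify(config, null, 2));
```

The printed JSON can be pasted into a client configuration file. Failing early on a missing key is a design choice of this sketch; it surfaces a misconfiguration before the client ever launches the server.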
Available Tools
Browser Management
- open_web_browser: Launch browser and navigate to URL
- navigate: Go to specific URL
- go_back: Navigate to previous page
- go_forward: Navigate to next page
- search: Open search engine with optional query
Mouse Interactions
- click_at(x, y): Click at coordinates
- hover_at(x, y): Hover at coordinates
- drag_and_drop(from_x, from_y, to_x, to_y): Drag and drop
Keyboard Actions
- type_text(text): Type text at cursor
- key_combination(keys): Execute keyboard shortcuts
Page Control
- scroll_document(direction, amount): Scroll up/down
- screen_size(): Get viewport dimensions
- current_state(): Get screenshot and URL
- wait_5_seconds(): Wait for page loading
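An MCP client invokes these tools by sending JSON-RPC 2.0 requests with the method `tools/call`. The sketch below shows what such requests might look like for three of the tools listed above. The argument names follow the signatures in this README, but the exact input schemas the server accepts, and whether coordinates are pixels or normalized values, are assumptions. `toolCall` is a hypothetical helper for illustration only.

```typescript
// Argument shapes for a few tools, using the parameter names listed above.
// Assumption: the server's real input schemas match these shapes.
interface ToolArgs {
  click_at: { x: number; y: number };
  type_text: { text: string };
  current_state: Record<string, never>;
}

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest<N extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: N; arguments: ToolArgs[N] };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request
// envelope and assigns a fresh request id.
function toolCall<N extends keyof ToolArgs>(
  name: N,
  args: ToolArgs[N],
): ToolCallRequest<N> {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Click a field, type into it, then capture the resulting page state.
const requests = [
  toolCall("click_at", { x: 320, y: 240 }),
  toolCall("type_text", { text: "hello" }),
  toolCall("current_state", {}),
];
for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

In practice an MCP client library builds these envelopes and handles the transport, so the sketch is mainly useful for reading logs or debugging traffic between the client and the server.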
Dependencies
- @modelcontextprotocol/sdk: MCP protocol implementation
- playwright: Browser automation framework
Development
npm run dev # Run in development mode
npm run watch # Watch for changes
npm run build # Build for production
License
Apache-2.0 License (matching original Google Gemini computer use preview)
