@doksi/mcp-device
v0.1.6
MCP server for cloud Android device control
Doksi Device MCP
MCP server for controlling cloud-hosted Android devices. No local Android SDK, emulator, or ADB required — the device runs in the cloud and is fully accessible through natural language tool calls.
Get an API Key
Create an API key at app.doksi.ai/devices/api-keys and paste it as the DOKSI_API_KEY environment variable in your MCP config.
How it works
- Call start_session to provision a cloud Android device (~90s startup)
- Use any of the tools below to interact with it: take screenshots, tap, swipe, install apps, read logs, change settings, etc.
- Call end_session when done to stop billing
The device persists across your conversation so you can build multi-step test flows, debugging sessions, or exploratory QA.
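On the wire, each step of this lifecycle is an MCP `tools/call` request. The sketch below builds those JSON-RPC messages in Python purely for illustration. The tool names and the `origin` argument come from this README; the helper name `tool_call` and the request IDs are made up for the example. MCP clients such as Claude Code or Cursor normally build these messages for you.

```python
import json

def tool_call(request_id, name, arguments=None):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }

# Lifecycle: provision a device, interact with it, then stop billing.
lifecycle = [
    tool_call(1, "start_session", {"origin": "claude-code"}),
    tool_call(2, "get_visual_state"),
    tool_call(3, "end_session"),
]

for message in lifecycle:
    print(json.dumps(message))
```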
Quick Start
Claude Code
claude mcp add doksi-device --env DOKSI_API_KEY=your-api-key -- npx -y @doksi/mcp-device
Cursor
cursor --add-mcp '{"name":"doksi-device","command":"npx","args":["-y","@doksi/mcp-device"],"env":{"DOKSI_API_KEY":"your-api-key"}}'
VS Code Copilot
code --add-mcp '{"name":"doksi-device","command":"npx","args":["-y","@doksi/mcp-device"],"env":{"DOKSI_API_KEY":"your-api-key"}}'
Codex
codex mcp add doksi-device --env DOKSI_API_KEY=your-api-key -- npx -y @doksi/mcp-device
Windsurf
Add to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"doksi-device": {
"command": "npx",
"args": ["-y", "@doksi/mcp-device"],
"env": { "DOKSI_API_KEY": "your-api-key" }
}
}
}
OpenCode
Add to ~/.opencode/config.json:
{
"mcp": {
"doksi_device": {
"type": "local",
"command": [
"npx",
"-y",
"@doksi/mcp-device"
],
"environment": {
"DOKSI_API_KEY": "your-api-key"
}
}
}
}
Other MCP clients
{
"mcpServers": {
"doksi-device": {
"command": "npx",
"args": ["-y", "@doksi/mcp-device"],
"env": { "DOKSI_API_KEY": "your-api-key" }
}
}
}
Common config locations:
- Cline: cline_mcp_settings.json
- Continue.dev: ~/.continue/config.yaml or .continue/mcpServers/doksi.json
- Zed: ~/.config/zed/settings.json under "context_servers"
Global install (alternative)
npm install -g @doksi/mcp-device
Then use doksi-device as the command instead of npx -y @doksi/mcp-device.
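If you generate client configs from a script, the two launch styles differ only in `command` and `args`. Here is a minimal sketch, assuming your client reads the `mcpServers` shape shown under "Other MCP clients"; the function name `server_entry` is hypothetical.

```python
import json

def server_entry(api_key, use_global=False):
    """Return the `doksi-device` entry for an `mcpServers` config block."""
    if use_global:
        # Globally installed binary: no npx wrapper needed.
        command, args = "doksi-device", []
    else:
        command, args = "npx", ["-y", "@doksi/mcp-device"]
    return {"command": command, "args": args, "env": {"DOKSI_API_KEY": api_key}}

config = {"mcpServers": {"doksi-device": server_entry("your-api-key", use_global=True)}}
print(json.dumps(config, indent=2))
```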
Configuration
| Variable | Required | Description |
|---|---|---|
| DOKSI_API_KEY | Yes | Your Doksi API key |
| DOKSI_API_URL | No | API base URL (defaults to https://api.doksi.ai) |
Session ID
All tools except start_session and get_sessions accept an optional sessionId parameter.
- If omitted: the MCP server uses the session it started in the current conversation (if any). You don't need to pass it when working within a single chat.
- If provided: the tool targets that specific session. Use this to connect to a session started in a different conversation or by another client. Call get_sessions to list active session IDs.
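The fallback rule can be modelled in a few lines. This is only a sketch of the behaviour described above, not the server's actual code. The function name is hypothetical, and raising an error when no session exists is an assumption, since the README does not say what happens in that case.

```python
def resolve_session_id(explicit_id=None, conversation_session_id=None):
    """Pick the session a tool call should target.

    An explicit sessionId wins; otherwise fall back to the session
    started in the current conversation.
    """
    if explicit_id is not None:
        return explicit_id
    if conversation_session_id is not None:
        return conversation_session_id
    # Assumed behaviour: nothing to target, so fail loudly.
    raise LookupError("no sessionId given and no session started in this conversation")

print(resolve_session_id(None, "abc123XYZ"))
```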
Tools Reference
Session
start_session
Provision a new cloud Android device. Takes ~90 seconds to start. Only one session can be active at a time.
| Parameter | Type | Required | Description |
|---|---|---|---|
| origin | string | Yes | Identify the calling agent (e.g. claude-code, cursor, codex, windsurf, gemini) |
| description | string | No | Optional label for this session |
Session abc123XYZ is starting. Wait 90 seconds before calling getSession —
do not poll repeatedly unless required. Remember to call end_session when done.
get_sessions
List all active sessions for your API key.
Session abc123XYZ (running, 12 min)
Description: "Login flow test"
Device: Pixel 9, 1080x2400 @432dpi, portrait
Battery: 100%
App: com.example.myapp (foreground)
end_session
Stop and deprovision a session. Stops billing.
| Parameter | Type | Required | Description |
|---|---|---|---|
| sessionId | string | No | ID of the session to end. If omitted, ends the session started in this conversation. |
The session has been ended.
start_tunnel / stop_tunnel
Route all device traffic through a Tailscale exit node. Useful for testing against staging environments or region-specific backends.
| Parameter | Type | Description |
|---|---|---|
| authKey | string | Tailscale auth key |
| exitNodeIp | string | Exit node IP |
| sessionId | string | Optional. Session to target (defaults to this conversation's session). |
Observation
get_visual_state
Take a screenshot. Returns a compressed base64 PNG image. Use this to visually inspect the current screen state.
Returns an image — no text output.
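If you handle the image yourself, it can help to confirm its pixel size before mapping what you see to tap coordinates. The sketch below reads the width and height from the PNG header. It assumes you already have the base64 payload as a string; how you extract it from the tool result depends on your MCP client. The function name `png_size` is hypothetical.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(b64_data):
    """Return (width, height) read from a base64-encoded PNG's IHDR chunk."""
    raw = base64.b64decode(b64_data)
    if raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    # IHDR data starts at byte 16: width, then height, as big-endian uint32.
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```

If the reported size differs from the device resolution, scale coordinates accordingly before calling tap or swipe.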
get_textual_state
Get a structured text representation of all UI elements currently on screen — element type, text, bounds, and interactability. Faster and cheaper than screenshots for understanding layout.
Window: com.example.myapp
LinearLayout [0,0][1080,2400]
TextView "Welcome back" [120,300][960,380] clickable=false
EditText "Email" [120,420][960,520] clickable=true focused=false
EditText "Password" [120,560][960,660] clickable=true focused=false
Button "Log In" [120,700][960,800] clickable=true
TextView "Forgot password?" [340,840][740,900] clickable=true
get_general_state
Get the full device state snapshot in a single call — use this for orientation, system settings, connectivity, and display context without taking a screenshot.
My App (com.example.myapp) is visible. Device is a phone, in portrait mode at
1080×2400 @432dpi. The keyboard is hidden. Battery is at 87% and not charging.
WiFi on, Bluetooth off, Location on, Airplane mode off, Auto-rotate on.
Volume 5/15, silent mode off. GPS location is at 37.774929, -122.419416.
The clipboard is empty. No active call. All panels are collapsed.
System dark mode is inactive. Animations are enabled. System font scale is at 1x.
get_what_changed
Diff the current screen against the state before the last action. Lightweight alternative to taking a full screenshot after every tap — use it to confirm an action had the expected effect.
Elements added:
Toast "Login successful" [120,2100][960,2180]
Elements removed:
Button "Log In" [120,700][960,800] (was clickable)
Elements changed:
TextView "Welcome back" → "Welcome, Alice"
Interaction
All interaction tools require a reason parameter — a first-person description of what you're doing and why (e.g. "I will tap the login button to submit credentials and navigate to the home screen."). This is used for logging and audit trails.
tap
Tap at a specific screen coordinate.
| Parameter | Type | Description |
|---|---|---|
| x | number | X coordinate |
| y | number | Y coordinate |
| reason | string | What you're doing and why |
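Tap coordinates usually come from element bounds reported by get_textual_state. The sketch below takes one line of that output and builds tap arguments aimed at the element's centre. It assumes bounds are printed as `[left,top][right,bottom]`, as in the sample output in this README; the function name `tap_args_for` is hypothetical.

```python
import re

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

def tap_args_for(line, reason):
    """Build `tap` arguments targeting the centre of an element's bounds."""
    match = BOUNDS_RE.search(line)
    if match is None:
        raise ValueError("no bounds found in line")
    left, top, right, bottom = map(int, match.groups())
    return {"x": (left + right) // 2, "y": (top + bottom) // 2, "reason": reason}

line = 'Button "Log In" [120,700][960,800] clickable=true'
print(tap_args_for(line, "I will tap Log In to submit the credentials."))
```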
Tapped at (540, 750).
long_press
Long press at a coordinate, optionally for a custom duration.
| Parameter | Type | Description |
|---|---|---|
| x | number | X coordinate |
| y | number | Y coordinate |
| duration | number | Duration in ms (default: 1000) |
| reason | string | What you're doing and why |
Long pressed at (540, 400) for 1000ms.
swipe
Swipe between two points. Use for scrolling, swiping cards, pulling down notifications, etc.
| Parameter | Type | Description |
|---|---|---|
| startX | number | Start X |
| startY | number | Start Y |
| endX | number | End X |
| endY | number | End Y |
| duration | number | Duration in ms (default: 300) |
| reason | string | What you're doing and why |
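For scrolling, it is often easier to derive swipe coordinates from the screen size than to hard-code them. Here is a minimal sketch, assuming a vertical swipe along the screen's centre line; the function name and the fraction parameters are hypothetical, while the returned keys match the parameter table above.

```python
def scroll_args(width, height, reason, start_frac=0.75, end_frac=0.25, duration=300):
    """Build `swipe` arguments for a vertical swipe along the centre line.

    With start_frac > end_frac the finger moves upward, which scrolls
    the content down.
    """
    x = width // 2
    return {
        "startX": x,
        "startY": round(height * start_frac),
        "endX": x,
        "endY": round(height * end_frac),
        "duration": duration,
        "reason": reason,
    }

print(scroll_args(1080, 2400, "I will scroll down to reveal more list items."))
```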
Swiped from (540, 1800) to (540, 400).
type_text
Type text into the currently focused input field.
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to type |
| reason | string | What you're doing and why |
Typed "[email protected]".
press_key
Press a system key. Use named keys or a numeric keyCode.
| Parameter | Type | Description |
|---|---|---|
| key | string | back, home, enter, recents, or a numeric keyCode |
| reason | string | What you're doing and why |
Pressed key: back.
App Management
install_app
Install an APK from a local file path.
| Parameter | Type | Description |
|---|---|---|
| apkPath | string | Absolute path to the .apk file |
Installed com.example.myapp (version 2.4.1).
uninstall_app
Uninstalled com.example.myapp.
launch_app
Launched com.example.myapp.
terminate_app
Force-stop an app without clearing its data.
Terminated com.example.myapp.
clear_app_data
Wipe all app data and cache — equivalent to "Clear Storage" in Android settings. Resets the app to a fresh install state.
Cleared data for com.example.myapp.
get_installed_packages
List all installed, enabled packages on the device.
Installed packages (142 total):
com.example.myapp — My App
com.google.android.apps.maps — Maps
com.android.chrome — Chrome
...
grant_permission / revoke_permission
Grant or revoke Android runtime permissions without going through the system dialog.
| Parameter | Type | Description |
|---|---|---|
| packageName | string | App package name |
| permission | string | Full permission string (e.g. android.permission.CAMERA) |
Common permissions: CAMERA, RECORD_AUDIO, ACCESS_FINE_LOCATION, READ_CONTACTS, POST_NOTIFICATIONS, READ_MEDIA_IMAGES
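The short names in this list need the `android.permission.` prefix before they are passed as the `permission` parameter. The sketch below is one way to expand them; the helper names are hypothetical. Names that already contain a dot, such as custom app-defined permissions, are passed through unchanged.

```python
PREFIX = "android.permission."

def full_permission(name):
    """Expand a short name like CAMERA to android.permission.CAMERA."""
    # Anything containing a dot is treated as already fully qualified.
    return name if "." in name else PREFIX + name

def permission_args(package_name, name):
    """Build arguments for grant_permission or revoke_permission."""
    return {"packageName": package_name, "permission": full_permission(name)}

print(permission_args("com.example.myapp", "POST_NOTIFICATIONS"))
```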
Granted android.permission.CAMERA to com.example.myapp.
Logs & Data
get_application_logs
Read filtered logcat output from the device.
| Parameter | Type | Description |
|---|---|---|
| filter | string | Logcat filter expression (e.g. MyTag:D *:S) |
| limit | number | Max number of log lines |
| since | number | Return logs after this line number |
| sinceMinutes | number | Return logs from the last N minutes |
| query | string | Text search or /regex/flags pattern |
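When the server-side filters are not enough, the returned text can be post-processed on the client. Here is a minimal sketch, assuming each line follows the `[HH:MM:SS] LEVEL/Tag: message` shape of the sample output in this section; the function names are hypothetical and real output may contain lines in other shapes, which the parser skips.

```python
import re

LOG_LINE = re.compile(
    r"^\[(?P<time>[\d:]+)\] (?P<level>[A-Z])/(?P<tag>[^:]+): (?P<message>.*)$"
)

def parse_log_line(line):
    """Split one log line into time, level, tag and message, or return None."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

def errors_only(lines):
    """Keep only the parsed entries logged at error level (E)."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "E"]

lines = [
    "[11:42:03] D/AuthManager: Token refreshed successfully",
    "[11:42:04] E/ImageLoader: Failed to decode bitmap: out of memory",
]
for entry in errors_only(lines):
    print(entry["tag"], "-", entry["message"])
```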
[11:42:03] D/AuthManager: Token refreshed successfully
[11:42:03] I/NetworkClient: POST /api/v2/login → 200 OK (312ms)
[11:42:04] D/MainActivity: onResume called
[11:42:04] E/ImageLoader: Failed to decode bitmap: out of memory
get_network_logs
HTTP request/response log with headers, bodies, and timing. Captured at the network layer — works without any app-side instrumentation.
Same parameters as get_application_logs.
[11:42:03] POST https://api.example.com/v2/login
Request: {"email":"[email protected]","password":"[REDACTED]"}
Response: 200 OK (312ms)
{"token":"eyJ...","userId":"u_abc123","expiresIn":3600}
[11:42:05] GET https://api.example.com/v2/profile
Response: 401 Unauthorized (88ms)
{"error":"token_expired"}
get_application_data
Read internal app storage: SharedPreferences XML files, SQLite database contents, and files in the app's data directory. Useful for verifying that state is persisted correctly.
| Parameter | Type | Description |
|---|---|---|
| packageName | string | App package name |
SharedPreferences (prefs.xml):
user_id = "u_abc123"
is_logged_in = true
onboarding_complete = true
last_sync_ts = 1741182923
Database (app.db — users table, 1 row):
id=1, [email protected], created_at=2026-01-15
Files:
cache/avatar_u_abc123.jpg (42 KB)
Device Settings
set_orientation
Lock the device to portrait or landscape.
| Parameter | Type | Values |
|---|---|---|
| orientation | string | portrait, landscape |
Orientation set to landscape.
set_device_ability
Toggle core device capabilities.
| Parameter | Type | Values |
|---|---|---|
| ability | string | wifi, bluetooth, location, airplaneMode, autoRotate |
| enabled | boolean | true / false |
WiFi has been disabled.
set_battery_level
Set the simulated battery level (0–100).
Battery level set to 15%.
set_charging_state
Simulate plugging in or unplugging the charger.
Charging state set to: not charging.
set_gps
Override GPS coordinates. Optional: altitude, speed, bearing, satellite count.
GPS location set to 37.774929, -122.419416.
set_dark_mode
Enable or disable system dark mode.
Dark mode has been enabled.
set_font_scale
Change the system-wide font size. Allowed values: 0.85, 1.0, 1.15, 1.3, 1.5.
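Because only these five values are accepted, a caller working with arbitrary scales may want to snap to the nearest one first. The sketch below does that; the function name is hypothetical, and this README does not say how the server handles other values.

```python
ALLOWED_FONT_SCALES = (0.85, 1.0, 1.15, 1.3, 1.5)

def nearest_font_scale(requested):
    """Return the allowed font scale closest to the requested value."""
    return min(ALLOWED_FONT_SCALES, key=lambda scale: abs(scale - requested))

print(nearest_font_scale(1.2))
```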
System font scale set to 1.3x.
set_network_speed
Throttle network speed to simulate real-world conditions.
| Preset | Description |
|---|---|
| full | No throttle |
| lte | 4G LTE |
| hsdpa | 3G |
| edge | 2.5G |
| gprs | 2G |
| gsm | 2G (slowest) |
Network speed set to EDGE (2.5G).
set_animations
Control system animation speed. Set to 0 to disable animations entirely — recommended for automated test flows to avoid timing issues.
| Parameter | Type | Description |
|---|---|---|
| scale | number | 0 = off, 1 = normal, 2+ = slow motion (max 10) |
Animation scale set to 0 (animations disabled).
Performance & Memory
get_performance_metrics
Get rendering performance data for an app — frame counts, janky frame rate, and render time percentiles. Use this to catch jank and UI performance regressions.
| Parameter | Type | Description |
|---|---|---|
| packageName | string | App package name |
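To turn the tool's text output into a regression check, you can parse the frame counts and recompute the jank rate. Here is a minimal sketch, assuming the `Frames: N total, M janky` wording of the sample output in this section; the function name is hypothetical.

```python
import re

FRAMES_RE = re.compile(r"Frames: (\d+) total, (\d+) janky")

def jank_percent(output):
    """Return the share of janky frames, in percent, from the text output."""
    match = FRAMES_RE.search(output)
    if match is None:
        raise ValueError("no frame counts found")
    total, janky = map(int, match.groups())
    return 100.0 * janky / total if total else 0.0

sample = "Performance for com.example.myapp:\nFrames: 342 total, 12 janky (3.5%)"
print(round(jank_percent(sample), 1))  # prints 3.5
```

A test flow could compare this number against a budget of its own choosing and fail the run when it is exceeded.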
Performance for com.example.myapp:
Frames: 342 total, 12 janky (3.5%)
Render time: 50th=8ms, 90th=18ms, 95th=24ms, 99th=67ms
get_memory_info
Get current memory usage for an app. Useful for catching memory leaks and baseline comparisons.
| Parameter | Type | Description |
|---|---|---|
| packageName | string | App package name |
Memory for com.example.myapp:
Native heap: 18.3 MB
Dalvik heap: 9.7 MB
Total PSS: 61.2 MB
Navigation
open_settings
Open the Android Settings app.
Settings app opened.
open_chrome
Open Chrome, optionally navigating to a URL.
| Parameter | Type | Description |
|---|---|---|
| url | string | URL to open (optional) |
Chrome opened at https://example.com.
open_deep_link
Launch any URI on the device — custom scheme deep links, HTTPS URLs, or intent URIs.
| Parameter | Type | Description |
|---|---|---|
| uri | string | URI to launch |
Deep link launched: myapp://checkout/cart/123
Simulate Events
send_sms
Simulate an incoming SMS message.
| Parameter | Type | Description |
|---|---|---|
| srcAddress | string | Sender phone number |
| text | string | Message body |
SMS sent from +14155552671: "Your verification code is 847291"
send_phone_call
Simulate phone call events. Chain operations to simulate a full call lifecycle.
| Operation | Value | Description |
|---|---|---|
| Init | 0 | Trigger an incoming call |
| Accept | 1 | Answer the call |
| Reject | 2 | Decline the call |
| Busy | 3 | Return a busy signal |
| Disconnect | 4 | End the call |
| Hold | 5 | Put the call on hold |
| Unhold | 6 | Resume the call |
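Naming the operation values makes a chained call lifecycle easier to read. The sketch below mirrors the table as an enum and lists one possible sequence. The enum and list names are hypothetical. This README does not name the parameter that carries the operation value, so the example stops at producing the numeric sequence.

```python
from enum import IntEnum

class CallOperation(IntEnum):
    INIT = 0
    ACCEPT = 1
    REJECT = 2
    BUSY = 3
    DISCONNECT = 4
    HOLD = 5
    UNHOLD = 6

# One possible lifecycle: ring, answer, hold, resume, hang up.
ANSWERED_CALL = [
    CallOperation.INIT,
    CallOperation.ACCEPT,
    CallOperation.HOLD,
    CallOperation.UNHOLD,
    CallOperation.DISCONNECT,
]

print([int(op) for op in ANSWERED_CALL])  # prints [0, 1, 5, 6, 4]
```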
Phone call initiated from +14155552671.
send_fingerprint
Simulate a fingerprint sensor touch — use to test biometric authentication flows.
| Parameter | Type | Description |
|---|---|---|
| isTouching | boolean | Whether a finger is on the sensor |
| touchId | number | Fingerprint ID (optional) |
Fingerprint touch event sent (isTouching: true, touchId: 1).
Clipboard & Files
get_clipboard / set_clipboard
Read or write the device clipboard. Useful for injecting test data or verifying copy behaviour.
Clipboard contains: "[email protected]"
Clipboard set to: "[email protected]"
push_file
Upload a local file to the device filesystem.
| Parameter | Type | Description |
|---|---|---|
| localPath | string | Absolute local path |
| devicePath | string | Destination path on device |
File uploaded to /sdcard/Download/test-data.json (4.2 KB).
pull_file
Download a file from the device. Returns it as base64.
| Parameter | Type | Description |
|---|---|---|
| devicePath | string | Path on device |
Returns base64-encoded file contents.
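To keep a pulled file, decode the base64 string and write the bytes to disk. Here is a minimal sketch, assuming you already have the encoded contents as a string; how you extract it from the tool result depends on your MCP client, and the function name is hypothetical.

```python
import base64
from pathlib import Path

def save_pulled_file(b64_contents, destination):
    """Decode base64 file contents, write them to `destination`, return the byte count."""
    data = base64.b64decode(b64_contents)
    Path(destination).write_bytes(data)
    return len(data)
```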
System Panels & Volume
expand_settings / expand_notifications
Open the quick settings or notification shade.
Settings panel expanded.
collapse_panels
Close all open system panels.
All panels collapsed.
set_volume
Press volume up or down once.
| Parameter | Type | Values |
|---|---|---|
| direction | string | up, down |
Volume increased to 8/15.
set_silent_mode
Toggle DND / silent mode on or off.
Silent mode enabled.
Example Flows
Install and smoke-test an app
"Start a session, install the APK at /build/app-debug.apk, launch it, take a screenshot, and tell me what's on screen."
Test a login flow end-to-end
"Open the app, type [email protected] into the email field and password123 into the password field, tap Login, and check the network logs for the auth response."
Reproduce a bug under poor network
"Set network speed to GPRS, launch the app, navigate to the product listing screen, and tell me if the images load or if there are any errors in the logs."
Verify deep link routing
"Open the deep link myapp://product/detail/98765 and take a screenshot to confirm the right screen opened."
Check for memory leaks
"Get the memory usage of com.example.myapp before and after navigating through the checkout flow 5 times, and compare the totals."
Test dark mode UI
"Enable dark mode, take a screenshot, then disable it and take another — compare both."
Simulate an OTP verification
"Tap 'Send verification code', then simulate an incoming SMS from +1234567890 with the text 'Your code is 928471', enter that code in the app, and confirm it accepted it."
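If you script the OTP flow rather than prompting for it, the code has to be pulled out of the SMS body before it is passed to type_text. Here is a minimal sketch, assuming the code is a standalone run of six digits; the function name is hypothetical.

```python
import re

def extract_otp(sms_text, digits=6):
    """Return the first standalone run of `digits` digits in an SMS body, or None."""
    # Lookarounds reject longer digit runs such as phone numbers.
    match = re.search(rf"(?<!\d)\d{{{digits}}}(?!\d)", sms_text)
    return match.group(0) if match else None

print(extract_otp("Your code is 928471"))  # prints 928471
```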
Requirements
- Node.js >= 18.0.0
- A Doksi API key (doksi.ai)
License
MIT
