openpocket

v0.2.3

Published

4 months ago

OpenPocket Node.js TypeScript runtime

sergiochan

OpenPocket

An Intelligent Phone That Never Sleeps.
OpenPocket runs an always-on agent phone locally, with privacy first.

Current Capability Snapshot

Status snapshot (February 2026):

Implemented and usable now

Local target runtime (via adb) driven by CLI + Telegram gateway + dashboard.
Deployment target abstraction with emulator and physical-phone ready today (android-tv and cloud are in progress).
Interactive setup (openpocket onboard) for consent, model/API key, Telegram, target selection, and human-auth mode.
Template-driven prompt system with runtime mode control (full|minimal|none) and workspace context budgets.
Bootstrap-driven chat onboarding (BOOTSTRAP.md, PROFILE_ONBOARDING.json) with persisted workspace onboarding state.
Model-driven progress and outcome narration (TASK_PROGRESS_REPORTER.md, TASK_OUTCOME_REPORTER.md) with anti-noise suppression.
Coding toolchain in task loop (read, write, edit, apply_patch, exec, process) with workspace/safety constraints.
Memory tools in task loop (memory_search, memory_get) for recall-oriented interactions.
Prompt observability via Telegram /context [list|detail|json].
Human-authorization relay (manual /auth, one-time web link, optional ngrok) with dynamic template pages and agentic delegation artifacts.
In-emulator permission dialogs auto-handled locally (no remote auth escalation for Android runtime permission popups).
Telegram bot display-name sync from profile identity changes.
Automatic reusable artifact generation after successful tasks (skills/auto, scripts/auto) with behavior fingerprint dedupe and semantic ui_target traces.
Auditable persistence for sessions, daily memory, screenshots, relay state, and script run artifacts.

Active improvement focus

Long-horizon memory quality (ranking/compaction/freshness).
Prompt evaluation and regression coverage for phone-use scenarios.
Cross-platform runtime hardening and operational reliability.

Latest Merged Updates (main)

1) PR #77: Multi-target deployment + agentic Human Auth + capability probe

Merged PR: #77

Highlights:

multi-target framework (emulator, physical-phone, android-tv, cloud)
USB/Wi-Fi ADB discovery and interactive selection in target flows
openpocket target pair for Wireless Debugging pairing
capability probe utility (phone-use-util) for camera/microphone/location/photos/payment signals
dynamic Human Auth portal templates (uiTemplate / templatePath)
secure payment path support via UI tree field extraction and delegated form collection
step-level timing observability and expanded no-browser human-auth E2E tests

2) Auto-Skill Experience Engine (latest commits on main)

Key commits:

032fa03 feat(skills): experience engine (active skill injection, UI semantic traces, replay relevance)
c02a870 fix: harden auto-skill prompt and ui_target escaping

Highlights:

runtime now injects active skill content, not only compact skill list metadata
auto-generated skills include stronger step semantics (ui_target) and reusable behavior fingerprints
skill loader gating/triggers and escaping hardening improve safety and prompt robustness
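The text does not say how a behavior fingerprint is computed. As a rough illustration of the dedupe idea only, the sketch below assumes a skill is a list of steps with an action and a `ui_target`, and hashes a normalized form of those steps with FNV-1a. The `SkillStep` shape, the function names, and the choice of hash are all assumptions, not OpenPocket's actual scheme.

```typescript
// Hypothetical step shape; only the name `ui_target` comes from the notes above.
interface SkillStep {
  action: string;
  ui_target: string;
}

// 32-bit FNV-1a hash, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Normalize case and whitespace so cosmetically different traces collide.
function behaviorFingerprint(steps: SkillStep[]): string {
  const normalized = steps
    .map((s) => `${s.action.trim().toLowerCase()}:${s.ui_target.trim().toLowerCase()}`)
    .join("|");
  return fnv1a(normalized);
}

// Keep only the first skill seen for each fingerprint.
function dedupeSkills(skills: SkillStep[][]): SkillStep[][] {
  const seen = new Set<string>();
  return skills.filter((steps) => {
    const fp = behaviorFingerprint(steps);
    if (seen.has(fp)) return false;
    seen.add(fp);
    return true;
  });
}
```

A 32-bit hash is enough to show the idea; a real implementation would more likely use a cryptographic digest to make accidental collisions negligible.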

Quick Start

1. Prerequisites

Node.js 20+
Android platform-tools (adb) for all target types
For default emulator target: Android SDK Emulator + at least one Android AVD
For physical-phone target: one Android phone with Developer options + USB debugging enabled
API key for your selected model profile (make sure you have credit with your selected model provider)
Telegram bot token (for gateway mode), obtained from Telegram's @BotFather
(Optional, recommended) ngrok authtoken for remote approval (free to obtain)

NOTE: If gpt-5.3-codex is unavailable in your account/provider route, use gpt-5.2-codex.

2. Install

Option A: npm package (recommended for end users)

npm install -g openpocket
openpocket onboard
openpocket gateway start

Option B: source clone (recommended for contributors)

git clone git@github.com:SergioChan/openpocket.git
cd openpocket
npm install
npm run build
./openpocket onboard
./openpocket gateway start

3. What `onboard` configures

The onboarding wizard is interactive and persists progress to:

~/.openpocket/state/onboarding.json

It walks through:

User consent for local runtime and data boundaries.
Deployment target selection (emulator, physical-phone, android-tv, cloud).
Model profile selection and API key source (env or local config).
Channel setup — choose which messaging channels to enable:
- Telegram (bot token + chat allowlist policy)
- Discord (bot token + DM access policy + guild config)
- WhatsApp (Baileys QR session linking + DM access policy + chunking mode)
Emulator startup and manual Play Store/Gmail verification (emulator target only).
Human-auth bridge mode:
- disabled
- local LAN relay
- local relay + ngrok tunnel (remote approval link)

4. Run your first task

openpocket agent --model gpt-5.2-codex "Open Chrome and search weather"

Or send plain text directly to your configured messaging channel (Telegram, Discord, or WhatsApp) after gateway start.

5. Use a physical Android phone as Agent Phone

Enable Developer options on your phone:
- Settings -> About phone -> Build number
- tap Build number 7 times
- go back to Settings -> System -> Developer options
- enable USB debugging
Connect the phone via USB and approve the Allow USB debugging prompt.
Set deployment target:

adb devices -l
openpocket target set --type physical-phone
openpocket target show

When multiple devices are online, target set shows an arrow-key selector with explicit transport labels (USB ADB / WiFi ADB) so you can choose the exact device. You can also use aliases: openpocket target set-target ... or openpocket target config ....
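To make the transport labels concrete, here is a minimal sketch of how `adb devices -l` output could be turned into labeled entries. The heuristic (a serial ending in `:port` or containing `_adb-tls-connect` means Wi-Fi, an `emulator-` prefix means emulator, anything else USB) and all type and function names are assumptions for illustration; OpenPocket's own discovery logic may differ.

```typescript
type Transport = "USB ADB" | "WiFi ADB" | "Emulator";

interface AdbDevice {
  serial: string;
  state: string; // e.g. "device", "offline", "unauthorized"
  transport: Transport;
}

// Guess the transport from the serial format alone (a heuristic, not an adb guarantee).
function classifyTransport(serial: string): Transport {
  if (serial.indexOf("emulator-") === 0) return "Emulator";
  if (/:\d+$/.test(serial) || serial.indexOf("_adb-tls-connect") !== -1) return "WiFi ADB";
  return "USB ADB";
}

// Parse the text printed by `adb devices -l`, skipping the header and daemon notices.
function parseAdbDevices(output: string): AdbDevice[] {
  const devices: AdbDevice[] = [];
  for (const line of output.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.indexOf("List of devices") === 0 || trimmed.charAt(0) === "*") continue;
    const parts = trimmed.split(/\s+/);
    const serial = parts[0];
    const state = parts[1];
    if (!serial || !state) continue;
    devices.push({ serial, state, transport: classifyTransport(serial) });
  }
  return devices;
}

// Only entries in the "device" state are online and usable as targets.
function onlineDevices(output: string): AdbDevice[] {
  return parseAdbDevices(output).filter((d) => d.state === "device");
}
```

Filtering on the `device` state keeps phones that are still waiting on the Allow USB debugging prompt (`unauthorized`) out of the selector.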

Start runtime:

openpocket gateway start

Optional Wi-Fi ADB:

adb tcpip 5555
adb connect <phone-ip>:5555
openpocket target set --type physical-phone --adb-endpoint <phone-ip>:5555

Or use the built-in pairing wrapper (no manual adb commands):

openpocket target pair --host <device-ip> --pair-port <pair-port> --code <pairing-code> --type physical-phone

Notes:

Keep phone unlocked during first pairing/authorization.
android-tv and cloud targets already exist in config/CLI, and full deployment guides are still in progress.

6. Persistence and storage locations

For a full persistence map (OpenPocket runtime files + Android AVD/image storage and deletion/reset flow), see:

Deployment Playbook by OS

This section focuses on production-style runtime deployment for:

gateway start
emulator setup/startup
ngrok + human-auth relay configuration

Support Matrix

| Environment | Recommended | Notes |
| --- | --- | --- |
| macOS (Apple Silicon / Intel) | Yes | Best local developer experience. |
| Windows (native host) | Yes | Use Android Emulator on Windows host (Hyper-V/WHPX). |
| Linux Server (x86_64 + KVM) | Yes | Recommended for headless server runtime. |
| Docker on ARM host running linux/amd64 emulator | Not recommended | Works unpredictably due to nested software emulation. |

Common Config Baseline

Run onboarding once:

openpocket onboard

Or configure manually in ~/.openpocket/config.json:

{
  "target": {
    "type": "emulator",
    "adbEndpoint": "",
    "cloudProvider": ""
  },
  "emulator": {
    "avdName": "OpenPocket_AVD",
    "androidSdkRoot": "",
    "headless": true,
    "bootTimeoutSec": 180
  },
  "telegram": {
    "botTokenEnv": "TELEGRAM_BOT_TOKEN",
    "allowedChatIds": []
  },
  "humanAuth": {
    "enabled": true,
    "useLocalRelay": true,
    "localRelayHost": "127.0.0.1",
    "localRelayPort": 8787,
    "apiKeyEnv": "OPENPOCKET_HUMAN_AUTH_KEY",
    "tunnel": {
      "provider": "ngrok",
      "ngrok": {
        "enabled": true,
        "authtokenEnv": "NGROK_AUTHTOKEN"
      }
    }
  }
}

Required env vars for gateway + remote approval:

export TELEGRAM_BOT_TOKEN="<your_telegram_bot_token>"
export OPENPOCKET_HUMAN_AUTH_KEY="<your_human_auth_key>"
export NGROK_AUTHTOKEN="<your_ngrok_token>"
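Because the config stores the names of environment variables rather than the secrets themselves, a pre-flight check can report which variables are still unset. The sketch below is illustrative only: the field names follow the config baseline above, while `missingEnvVars`, the `EnvRefConfig` type, and the rule that the human-auth key is required whenever `humanAuth.enabled` is true are assumptions.

```typescript
// Subset of the config baseline above that references environment variables.
interface EnvRefConfig {
  telegram: { botTokenEnv: string };
  humanAuth: {
    enabled: boolean;
    apiKeyEnv: string;
    tunnel: { provider: string; ngrok: { enabled: boolean; authtokenEnv: string } };
  };
}

// Return the names of the referenced variables that are unset or empty.
function missingEnvVars(
  config: EnvRefConfig,
  env: Record<string, string | undefined>
): string[] {
  const required: string[] = [config.telegram.botTokenEnv];
  if (config.humanAuth.enabled) {
    // Assumption: the relay key is needed whenever human-auth is enabled.
    required.push(config.humanAuth.apiKeyEnv);
    const tunnel = config.humanAuth.tunnel;
    if (tunnel.provider === "ngrok" && tunnel.ngrok.enabled) {
      required.push(tunnel.ngrok.authtokenEnv);
    }
  }
  return required.filter((name) => !env[name]);
}
```

In a Node.js script this could be called as `missingEnvVars(config, process.env)` before starting the gateway.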

macOS (Local Runtime)

Install Android SDK + Emulator + at least one AVD (Android Studio preferred).
Verify toolchain:

adb version
~/Library/Android/sdk/emulator/emulator -list-avds

Start emulator:

openpocket emulator start
openpocket emulator status

Start gateway (dashboard auto-starts):

openpocket gateway start

If human-auth uses ngrok, gateway will auto-start local relay + tunnel and send approval URLs in Telegram.

Windows (Native Host Runtime)

Windows does not require WSL for OpenPocket runtime. Recommended setup is: Android Emulator + adb on Windows host, OpenPocket CLI also on Windows host.

Install Android Studio + SDK tools, create an AVD.
Ensure virtualization acceleration is enabled (WHPX/Hyper-V for emulator).
In PowerShell:

$env:ANDROID_SDK_ROOT="$env:LOCALAPPDATA\Android\Sdk"
adb version
& "$env:ANDROID_SDK_ROOT\emulator\emulator.exe" -list-avds
openpocket emulator start
openpocket gateway start

Set tokens in user env (or config file):

setx TELEGRAM_BOT_TOKEN "<token>"
setx OPENPOCKET_HUMAN_AUTH_KEY "<key>"
setx NGROK_AUTHTOKEN "<ngrok_token>"

WSL can still be used for development tooling, but running Android Emulator inside WSL/Linux guest is not the preferred path.

Linux Server (`x86_64` Headless)

This is the recommended server deployment target.

Validate architecture and KVM:

uname -m                # expect x86_64
ls -l /dev/kvm          # must exist

Install Android SDK cmdline tools, platform-tools, emulator, and create AVD.
Use headless mode in config:

"emulator": {
  "headless": true,
  "extraArgs": ["-no-window", "-no-audio", "-no-boot-anim", "-no-snapshot"]
}
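How these settings become emulator command-line arguments is not shown here. One plausible mapping, sketched below, passes the AVD name via `-avd`, adds `-no-window` when `headless` is true, and appends `extraArgs` without duplicates. The flags are standard Android Emulator options; `buildEmulatorArgs` and the mapping itself are assumptions for illustration.

```typescript
// Fields taken from the "emulator" config section; extraArgs is optional.
interface EmulatorConfig {
  avdName: string;
  headless: boolean;
  extraArgs?: string[];
}

// Build the argument list for the `emulator` binary.
function buildEmulatorArgs(config: EmulatorConfig): string[] {
  const args: string[] = ["-avd", config.avdName];
  if (config.headless) args.push("-no-window");
  for (const extra of config.extraArgs || []) {
    // Skip flags already present so "-no-window" is not passed twice.
    if (args.indexOf(extra) === -1) args.push(extra);
  }
  return args;
}
```

With the headless config above and `avdName` set to `OpenPocket_AVD`, this yields `-avd OpenPocket_AVD -no-window -no-audio -no-boot-anim -no-snapshot`.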

Start runtime:

openpocket emulator start
openpocket gateway start

For service mode, run with systemd or tmux, and keep ngrok token configured for remote human-auth links.

Current End-to-End Tests

OpenPocket currently has three end-to-end test paths:

test/integration/docker-agent-e2e.mjs
- Automated agent E2E.
- Simulates natural-language task -> planning -> emulator actions -> session assertions.
- Can run locally (direct host execution) and in Docker wrapper (npm run test:e2e:docker).
openpocket test permission-app run --case <scenario> --chat <chat_id>
- PermissionLab human-auth E2E.
- Validates agent + Telegram + relay/ngrok + approval handoff.
- Scenarios: camera, microphone, location, contacts, sms, calendar, photos, notification, 2fa.
scripts/smoke/dual-side-smoke.sh
- Dual-side smoke gate for coding + Android event lineage.
- Covers:
  - Telegram coding instruction -> local file write verification.
  - Android build/install/run/logcat-style tool chain events.
  - Unified session trace lineage checks across tool events.
- Runs fast with deterministic mocks and no real device dependency.

Run it locally:

bash scripts/smoke/dual-side-smoke.sh

Key Capabilities

Local device-first runtime: execution stays on your machine via adb, not a hosted cloud phone.
Always-on agent loop: model-driven planning + one-step action execution over Android UI primitives.
Prompt system aligned for agent behavior:
- prompt modes (full|minimal|none)
- workspace template injection with explicit char budgets
- task progress/outcome narrators driven by prompt templates
Remote authorization proxy (human-auth relay):
- agent emits request_human_auth only for real-device/sensitive checkpoints
- gateway sends one-time approval link and manual fallback commands
- local relay can auto-start with optional ngrok tunnel
In-emulator permission handling: Android runtime permission dialogs are auto-approved locally when detected.
Coding and memory tools inside task loop:
- coding: read, write, edit, apply_patch, exec, process
- memory: memory_search, memory_get
Dual control modes: direct user control and agent control on the same target runtime.
Production-style gateway operations: Telegram command menu bootstrap, heartbeat, cron jobs, restart loop, safe stop.
Script and coding safety controls: allowlist + deny patterns + timeout + output caps + run artifacts.
Prompt observability: /context command reports actual injected prompt context and budgets.
Auditable persistence: task sessions, daily memory, screenshots, script archives, and relay/auth state.
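The "explicit char budgets" item above can be pictured as a truncation pass over workspace files before they are injected into the prompt. The sketch below is a simplified model with a per-file cap and a total cap; the type names, the function name, and the two-level budget are assumptions, and the real budgeting rules may be different.

```typescript
interface WorkspaceFile {
  name: string;
  content: string;
}

interface InjectedFile {
  name: string;
  text: string;
  truncated: boolean;
}

// Cut each file to perFileMax characters and stop once totalMax is spent.
function applyCharBudgets(
  files: WorkspaceFile[],
  perFileMax: number,
  totalMax: number
): InjectedFile[] {
  const result: InjectedFile[] = [];
  let remaining = totalMax;
  for (const file of files) {
    if (remaining <= 0) break; // later files are dropped entirely
    const limit = Math.min(perFileMax, remaining);
    const text = file.content.slice(0, limit);
    result.push({ name: file.name, text, truncated: text.length < file.content.length });
    remaining -= text.length;
  }
  return result;
}
```

Recording the `truncated` flag per file is what lets an observability command such as `/context` report how much of each template actually reached the model.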

Roadmap

R1. Memory System (Core Intelligence)

Build a robust memory layer for long-horizon tasks:

semantic retrieval and episodic memory
memory compaction/summarization
conflict resolution and freshness policies
memory-aware planning loops

R2. Prompt Engineering for Phone-Use

Establish a production prompt stack tailored to mobile workflows:

phone-specific action planning prompts
app-state-aware prompting templates
failure-recovery prompting
prompt eval suite and regression benchmarks

R3. Multi-OS Runtime and Control Surface

Expand from macOS-first to full platform support:

Linux (Ubuntu and headless server scenarios)
Windows support
dashboard portability strategy:
- primary target: local/remote Web UI
- fallback: native OS-specific control apps only when needed

R4. Real Device Authorization + Permission Isolation

Strengthen system-level authorization architecture:

iOS and Android real-device compatibility
cross-device authorization where real phone and emulator differ
secure remote port authorization flow
strict permission boundary between real phone and emulator runtime

R5. Skill System Maturity

Evolve from static skills to dynamic capability generation:

agent-authored skills/code generation
safe execution sandbox and policy gates
skill validation, caching, and reuse

R6. Multi-Channel Control Integrations

Go beyond Telegram and support more communication entry points:

international platforms: Discord, WhatsApp, iMessage, Messenger
China-focused platforms: WeChat, QQ
unified channel abstraction for message, auth, and task control

R7. Account Login UX and Session Authorization

Improve real-world login workflows after app installation:

one-time session authorization links
2FA and SMS code handoff UX
low-friction human-in-the-loop checkpoints

R8. Reliability, Security, and Release Quality (Added)

Additional engineering tracks needed for production readiness:

end-to-end integration test matrix (including headless CI scenarios)
threat model and security hardening for relay/auth artifacts
observability improvements (structured logs, replay/debug traces)
packaging/release automation and upgrade safety

Contributor Task Board

The project is actively seeking contributors. If you want to help, pick one task area below and open a PR with the task ID in the title (for example: R2-T3).

Memory System

R1-T1: design memory schema v2 (episodic + semantic + working memory)
R1-T2: implement memory retrieval ranking and relevance filters
R1-T3: implement memory compaction/summarization jobs
R1-T4: add memory quality tests for multi-step phone tasks

Prompt Engineering

R2-T1: draft phone-use prompt templates per task category (shopping/social/entertainment)
R2-T2: add prompt fallback strategies for app-state ambiguity
R2-T3: build prompt regression suite with golden trajectories
R2-T4: add failure taxonomy and prompt tuning playbook

Cross-Platform Runtime + Dashboard

R3-T1: Linux runtime parity audit (CLI/emulator/gateway)
R3-T2: Windows runtime bring-up and compatibility fixes
R3-T3: define and implement Web UI dashboard MVP
R3-T4: headless server operator workflow (no GUI) documentation + scripts

Real Device Auth + Isolation

R4-T1: iOS real-device auth bridge prototype
R4-T2: Android real-device auth bridge hardening
R4-T3: permission isolation policy and enforcement checks
R4-T4: secure tunnel and one-time token lifecycle review

Skill System

R5-T1: agent-authored skill generation interface
R5-T2: skill static checks + runtime policy gate
R5-T3: skill test harness and reproducibility tools
R5-T4: skill marketplace-style metadata/index format

Multi-Channel Integrations

R6-T1: channel abstraction layer for inbound/outbound control
R6-T2: Discord connector
R6-T3: WhatsApp connector
R6-T4: WeChat/QQ connector research and adapter design

Login UX + Human-in-the-Loop

R7-T1: one-time account authorization session protocol
R7-T2: 2FA/SMS remote approval UX flow and timeout handling
R7-T3: user-facing auth status model and recovery paths
R7-T4: mobile-first approval page UX improvements

Reliability and Security

R8-T1: integration test matrix for onboarding + gateway + auth relay
R8-T2: security review for relay APIs and artifact storage
R8-T3: observability dashboard/log schema improvements
R8-T4: release pipeline hardening and rollback-safe packaging

Product Scenarios

OpenPocket is built for both developers and everyday users.

Typical scenarios include:

shopping flows across mobile apps
entertainment routines and repetitive app navigation
social task assistance with human-in-the-loop approvals
recurring mobile actions that benefit from automation and traceability

Runtime Flow

Telegram / CLI -> Gateway -> Agent Runtime -> Model Client -> adb -> Agent Phone Target

Architecture

flowchart LR
  U["Local User / Telegram"] --> G["OpenPocket Gateway"]
  G --> A["Agent Runtime"]
  A --> M["Model Client"]
  A --> D["ADB Runtime"]
  A --> S["Script Executor"]
  A --> C["Coding Executor"]
  A --> R["Memory Executor"]
  D --> E["Agent Phone Target (Local)"]
  A --> W["Workspace Store"]
  W --> SS["sessions/*.md"]
  W --> MM["memory/YYYY-MM-DD.md"]
  W --> RR["scripts/runs/*"]
  W --> AS["skills/auto/* + scripts/auto/*"]
  RP["User Phone (Human Auth Link)"] -.-> G

Configuration

Primary config file:

~/.openpocket/config.json (or OPENPOCKET_HOME/config.json)

Example config template:

openpocket.config.example.json

Skill compatibility mode:

agent.skillsSpecMode = "legacy" | "mixed" | "strict"
default is mixed (legacy + strict-compatible loading)
use strict to enforce directory-based SKILL.md validation

Skill validation command:

openpocket skills validate --strict

Skill workspace commands:

openpocket skills list        # show loaded workspace skills only
openpocket skills load        # interactive select bundled skills to copy into workspace
openpocket skills load --all  # copy all bundled skills missing from workspace

Coding runtime migration note:

agent.legacyCodingExecutor is now off by default.
agent.legacyCodingExecutor=true remains available as a temporary compatibility toggle, but it is deprecated and will be removed.
When fallback is disabled and a coding action is unsupported by pi coding tools, runtime errors point to this key explicitly.

Supported Model Providers

OpenPocket supports multiple AI model providers through OpenAI-compatible APIs:

OpenAI - Direct access to GPT models (gpt-5.2-codex, gpt-5.3-codex)

OpenRouter - Multi-provider routing for Claude models (claude-sonnet-4.6, claude-opus-4.6)

BlockRun - Pay-per-request micropayments with no subscriptions

Ideal for always-on agents with cost-effective pricing
Access to 30+ models: GPT-4o, Claude Sonnet 4, Gemini 2.0 Flash, DeepSeek
Model IDs: blockrun/gpt-4o, blockrun/claude-sonnet-4, blockrun/gemini-2.0-flash, blockrun/deepseek-chat
Get started at docs.blockrun.ai

AutoGLM - Phone-optimized multilingual model (autoglm-phone)

Common environment variables:

export OPENAI_API_KEY="<your_openai_key>"
export OPENROUTER_API_KEY="<your_openrouter_key>"
export BLOCKRUN_API_KEY="<your_blockrun_key>"
export AUTOGLM_API_KEY="<your_autoglm_key>"
export TELEGRAM_BOT_TOKEN="<your_telegram_bot_token>"
export OPENPOCKET_HUMAN_AUTH_KEY="<your_human_auth_relay_key>"
export NGROK_AUTHTOKEN="<your_ngrok_token>"
export ANDROID_SDK_ROOT="$HOME/Library/Android/sdk"
export OPENPOCKET_HOME="$HOME/.openpocket"

For Codex subscription auth (no OPENAI_API_KEY), OpenPocket can reuse Codex CLI credentials for codex models:

login once with the codex CLI
OpenPocket reads $CODEX_HOME/auth.json (or ~/.codex/auth.json)
on macOS, it also checks the Codex Auth keychain entry first

CLI Surface

Command prefix by install mode:

npm package install: use openpocket ...
local source clone: use ./openpocket ... (or openpocket ... after install-cli)

./openpocket --help
./openpocket install-cli
./openpocket onboard
./openpocket target show
./openpocket target set --type physical-phone
./openpocket target set --type physical-phone --adb-endpoint 192.168.1.25:5555
./openpocket config-show
./openpocket emulator start
./openpocket emulator status
./openpocket agent --model gpt-5.2-codex "Open Chrome and search weather"
./openpocket script run --text "echo hello"
./openpocket telegram setup
./openpocket telegram whoami
./openpocket skills list
./openpocket skills load
./openpocket skills load --all
./openpocket skills validate --strict
./openpocket gateway start
./openpocket dashboard start
./openpocket test permission-app deploy
./openpocket test permission-app task
./openpocket human-auth-relay start

human-auth-relay start is mainly a standalone debug mode. In normal gateway usage, local relay/tunnel startup is handled automatically from config.

gateway start now auto-starts the local Web dashboard (default http://127.0.0.1:51888, configurable in config.dashboard).
Use dashboard start when you want to run only the dashboard process.

Legacy aliases still work (deprecated): openpocket init, openpocket setup.

The legacy native macOS panel has been removed from the repository. Use openpocket dashboard start (or openpocket gateway start, which auto-starts dashboard).

Web Dashboard

The local Web dashboard is now the primary control surface.

Startup behavior

openpocket gateway start auto-starts dashboard and prints dashboard URL.
openpocket dashboard start starts dashboard only (no Telegram gateway).

Default dashboard config:

"dashboard": {
  "enabled": true,
  "host": "127.0.0.1",
  "port": 51888,
  "autoOpenBrowser": false
}

Runtime page layout

Left column: Gateway status, emulator controls, and core path config.
Right column: large emulator preview pane for tap/text control.

Auto refresh behavior

Preview auto-refresh updates image/metadata silently.
It does not spam status text with repeated "Refreshing emulator preview..." messages.

Human Authorization Modes

OpenPocket supports three human-auth configurations:

Disabled: no relay, no remote approval.
LAN relay: local relay exposed on LAN for phone access in the same network.
Relay + ngrok: gateway auto-starts local relay and ngrok, then issues public approval links.
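As a sketch of how these three modes might map onto the `humanAuth` config block shown earlier: the field names come from the config baseline, while the mode labels, `resolveHumanAuthMode`, and the exact decision rule are assumptions for illustration.

```typescript
type HumanAuthMode = "disabled" | "lan-relay" | "relay-ngrok";

// Subset of the humanAuth config section relevant to mode selection.
interface HumanAuthConfig {
  enabled: boolean;
  tunnel: { provider: string; ngrok: { enabled: boolean } };
}

// Assumed rule: ngrok must be both the selected provider and enabled;
// any other enabled configuration is treated as LAN relay.
function resolveHumanAuthMode(cfg: HumanAuthConfig): HumanAuthMode {
  if (!cfg.enabled) return "disabled";
  if (cfg.tunnel.provider === "ngrok" && cfg.tunnel.ngrok.enabled) return "relay-ngrok";
  return "lan-relay";
}
```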

When the agent emits request_human_auth, Telegram users can:

tap the web approval link
or run fallback commands:
- /auth approve <request-id> [note]
- /auth reject <request-id> [note]
for any auth wall, use the request-specific Human Auth page generated from uiTemplate (optional live remote takeover is still available), then approve/reject
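The fallback command syntax above is simple enough to parse with one regular expression. The following is a minimal sketch; `parseAuthCommand` and the `AuthCommand` type are hypothetical names, and the gateway's real parser may accept more forms.

```typescript
interface AuthCommand {
  decision: "approve" | "reject";
  requestId: string;
  note?: string;
}

// Parse "/auth approve <request-id> [note]" or "/auth reject <request-id> [note]".
// Returns null for anything that does not match that shape.
function parseAuthCommand(text: string): AuthCommand | null {
  const match = /^\/auth\s+(approve|reject)\s+(\S+)(?:\s+([\s\S]+))?$/.exec(text.trim());
  if (!match) return null;
  const decision = match[1] === "approve" ? "approve" : "reject";
  const requestId = match[2] || "";
  const note = (match[3] || "").trim();
  if (!requestId) return null;
  return note ? { decision, requestId, note } : { decision, requestId };
}
```

For example, `parseAuthCommand("/auth approve req-123 looks fine")` returns an approve decision for `req-123` with the note `looks fine`, where `req-123` is a made-up request ID.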

Dynamic Human Auth Portal Templates

request_human_auth now supports an optional uiTemplate payload so each authorization page can be customized per request instead of using one fixed form.

Supported template controls include:

title/summary/capability hint text
theme style (brandColor, backgroundCss, fontFamily)
structured form fields (text, textarea, email, password, otp, card-number, expiry, cvc, select, ...)
agent-generated middle-section code (middleHtml, middleCss, middleScript)
agent-generated approval logic (approveScript)
reusable template file path from Agent Loop coding tools (templatePath, JSON in workspace)
delegation toggles (text/location/photo/audio/file attachments)
artifact policy (artifactKind, requireArtifactOnApprove)

Portal shell invariants (always present, not generated by template):

remote connection section (live takeover controls)
full context section (Show Full Context)
top title area
middle input/approve area (this part is generated/customized by uiTemplate)

This enables capability-specific flows such as:

OAuth login (credentials)
payment card confirmation (payment_card)
camera/photo delegation
microphone/audio delegation
location delegation
album/file selection delegation

High-level runtime behavior:

agent emits request_human_auth with capability and optional uiTemplate (or templatePath generated via coding tools in the same Agent Loop)
relay renders a fixed secure shell (remote connection, context, title) plus request-specific middle/approve content from sanitized uiTemplate
human approves/rejects and optionally uploads/enters delegated artifact
bridge returns decision/artifact to runtime and task continues
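The artifact policy fields (`artifactKind`, `requireArtifactOnApprove`) suggest a check before the bridge hands a decision back to the runtime. The sketch below shows one way that check could look. The `HumanDecision` shape, the function name, and the error strings are assumptions and are not taken from the OpenPocket source.

```typescript
// Policy fields named in the template controls list above.
interface ArtifactPolicy {
  artifactKind?: string;
  requireArtifactOnApprove: boolean;
}

// Hypothetical shape of what the human submits on the approval page.
interface HumanDecision {
  approved: boolean;
  artifact?: { kind: string; value: string };
}

// Return an error message when the decision violates the policy, or null when it is acceptable.
function validateDecision(policy: ArtifactPolicy, decision: HumanDecision): string | null {
  if (!decision.approved) return null; // a rejection never needs an artifact
  if (policy.requireArtifactOnApprove && !decision.artifact) {
    return "approval requires a delegated artifact";
  }
  if (decision.artifact && policy.artifactKind && decision.artifact.kind !== policy.artifactKind) {
    return "expected artifact of kind " + policy.artifactKind;
  }
  return null;
}
```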

Important: current implementation is delegation-based (explicit artifact handoff after approve), not direct remote hardware passthrough from human phone sensors into Agent Phone OS APIs.

Credential security notes:

relay server and request state are hosted on the user machine
approval artifacts are persisted locally (state/human-auth-artifacts/)
no centralized OpenPocket credential relay service is used
use LAN mode (humanAuth.tunnel.provider=none) for zero third-party network hop

To inspect current chat allow policy and discover recent chat IDs for your bot:

openpocket telegram whoami

When a running task enters Android system permission UI (permissioncontroller / packageinstaller), OpenPocket handles it locally in the emulator (auto-approve policy) instead of escalating to remote human-auth.

PermissionLab E2E Test

Use the built-in Android test app to verify remote authorization flow end-to-end.

1) Start gateway + ngrok human-auth mode

openpocket gateway start

2) Build/install/launch test app on emulator

openpocket test permission-app deploy

Optional commands:

openpocket test permission-app launch
openpocket test permission-app reset
openpocket test permission-app uninstall
openpocket test permission-app cases
openpocket test permission-app task
openpocket test permission-app run --case camera --chat <your_chat_id>
openpocket test permission-app task --case camera --send --chat <your_chat_id>

3) Trigger scenario run (agent auto-clicks button)

Recommended command:

openpocket test permission-app run --case camera --chat <your_chat_id>

Or use task --send (same execution path, keeps backward compatibility):

openpocket test permission-app task --case camera --send --chat <your_chat_id>

Available scenario IDs: camera, microphone, location, contacts, sms, calendar, photos, notification, 2fa.

In this mode, OpenPocket will:

build/install/reset/launch PermissionLab
run the agent with a scenario-specific task
ask the agent to tap the exact scenario button
send Telegram only when human authorization is actually required

4) Approve from phone

When Telegram receives the human-auth message:

Open the URL button (ngrok public link).
Approve/reject on the web page.
Agent resumes and reports task result in Telegram.

Dockerized Agent E2E (Headless Linux)

OpenPocket can run on Linux/headless servers when Android SDK + emulator dependencies are present. The current auto-installer is macOS-only, but runtime execution is cross-platform.

For repeatable integration tests, use the Docker E2E harness:

npm run test:e2e:docker

What this flow does:

Build a Linux Docker image with Android SDK, emulator, and an AVD.
Start a headless emulator in the container.
Run OpenPocket agent with a natural-language task.
Verify session artifacts and action execution results.

Important notes:

On Linux with /dev/kvm, emulator boot is much faster.
On macOS Docker Desktop (no KVM passthrough), emulator usually works with software acceleration but can be significantly slower.
You can override the task text:

OPENPOCKET_E2E_TASK="Open Android Settings and then go home" npm run test:e2e:docker

Documentation

Where the frontend is

The documentation frontend is implemented in this repository:

Site source: /frontend
VitePress config: /frontend/.vitepress/config.mjs
Custom homepage: /frontend/index.md
Custom theme styles: /frontend/.vitepress/theme/custom.css

Documentation Website

Start local docs server:

npm run docs:dev

Build static docs:

npm run docs:build

Build for Vercel (root base path):

npm run docs:build:vercel

Preview built docs:

npm run docs:preview

Deployment options

Vercel config: vercel.json
Deployment guide: /frontend/get-started/deploy-docs.md

Repository Structure

/src: runtime source code (agent, gateway, device, tools, onboarding, dashboard)
/frontend: standalone frontend site (homepage + docs)
/test: runtime contract and integration tests
/dist: build output

Development

Run checks:

npm run check
npm test

Contributing

Prefer behavior-driven changes with matching tests.
Document new runtime capabilities under /frontend in the relevant hub.

Security and Safety Notes

run_script execution is guarded by an allowlist and deny patterns.
exec/process coding tools are guarded by allowlist, deny patterns, workspace boundaries, timeout, and output caps.
Timeout and output truncation are enforced for script/coding execution.
memory tools are read-scoped to MEMORY.md and memory/*.md.
Local paths are sanitized/redacted in Telegram-facing outputs.
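To illustrate how an allowlist, deny patterns, and an output cap can combine, here is a small sketch. It is not OpenPocket's actual guard: `ExecPolicy`, `checkCommand`, and `capOutput` are hypothetical names, timeouts and workspace boundaries are left out, and real policies would need to handle shell quoting more carefully.

```typescript
interface ExecPolicy {
  allowlist: string[]; // permitted program names
  denyPatterns: RegExp[]; // patterns that always block (avoid the "g" flag: test() is stateful with it)
}

interface ExecVerdict {
  allowed: boolean;
  reason?: string;
}

// Deny patterns are checked first so they win over the allowlist.
function checkCommand(policy: ExecPolicy, commandLine: string): ExecVerdict {
  const trimmed = commandLine.trim();
  const program = trimmed.split(/\s+/)[0] || "";
  for (const pattern of policy.denyPatterns) {
    if (pattern.test(trimmed)) {
      return { allowed: false, reason: "matches deny pattern " + pattern.source };
    }
  }
  if (policy.allowlist.indexOf(program) === -1) {
    return { allowed: false, reason: '"' + program + '" is not on the allowlist' };
  }
  return { allowed: true };
}

// Truncate captured output and say how much was dropped.
function capOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  return output.slice(0, maxChars) + "\n[truncated " + (output.length - maxChars) + " chars]";
}
```

Checking deny patterns against the whole command line, not just the program name, is what catches an allowed command chained with a blocked one.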

License

MIT