openpocket
v0.2.3
OpenPocket Node.js TypeScript runtime
OpenPocket
An Intelligent Phone That Never Sleeps.
OpenPocket runs an always-on agent phone locally, with privacy first.
Current Capability Snapshot
Status snapshot (February 2026):
Implemented and usable now
- Local target runtime (via adb) driven by CLI + Telegram gateway + dashboard.
- Deployment target abstraction with emulator and physical-phone ready today (android-tv and cloud are in progress).
- Interactive setup (openpocket onboard) for consent, model/API key, Telegram, target selection, and human-auth mode.
- Template-driven prompt system with runtime mode control (full|minimal|none) and workspace context budgets.
- Bootstrap-driven chat onboarding (BOOTSTRAP.md, PROFILE_ONBOARDING.json) with persisted workspace onboarding state.
- Model-driven progress and outcome narration (TASK_PROGRESS_REPORTER.md, TASK_OUTCOME_REPORTER.md) with anti-noise suppression.
- Coding toolchain in task loop (read, write, edit, apply_patch, exec, process) with workspace/safety constraints.
- Memory tools in task loop (memory_search, memory_get) for recall-oriented interactions.
- Prompt observability via Telegram /context [list|detail|json].
- Human-authorization relay (manual /auth, one-time web link, optional ngrok) with dynamic template pages and agentic delegation artifacts.
- In-emulator permission dialogs auto-handled locally (no remote auth escalation for Android runtime permission popups).
- Telegram bot display-name sync from profile identity changes.
- Automatic reusable artifact generation after successful tasks (skills/auto, scripts/auto) with behavior fingerprint dedupe and semantic ui_target traces.
- Auditable persistence for sessions, daily memory, screenshots, relay state, and script run artifacts.
Active improvement focus
- Long-horizon memory quality (ranking/compaction/freshness).
- Prompt evaluation and regression coverage for phone-use scenarios.
- Cross-platform runtime hardening and operational reliability.
Latest Merged Updates (main)
1) PR #77: Multi-target deployment + agentic Human Auth + capability probe
Merged PR: #77
Highlights:
- multi-target framework (emulator, physical-phone, android-tv, cloud)
- USB/Wi-Fi ADB discovery and interactive selection in target flows
- openpocket target pair for Wireless Debugging pairing
- capability probe utility (phone-use-util) for camera/microphone/location/photos/payment signals
- dynamic Human Auth portal templates (uiTemplate / templatePath)
- secure payment path support via UI tree field extraction and delegated form collection
- step-level timing observability and expanded no-browser human-auth E2E tests
2) Auto-Skill Experience Engine (latest commits on main)
Key commits:
- 032fa03 feat(skills): experience engine (active skill injection, UI semantic traces, replay relevance)
- c02a870 fix: harden auto-skill prompt and ui_target escaping
Highlights:
- runtime now injects active skill content, not only compact skill list metadata
- auto-generated skills include stronger step semantics (ui_target) and reusable behavior fingerprints
- skill loader gating/triggers and escaping hardening improve safety and prompt robustness
Quick Start
1. Prerequisites
- Node.js 20+
- Android platform-tools (adb) for all target types
- For default emulator target: Android SDK Emulator + at least one Android AVD
- For physical-phone target: one Android phone with Developer options + USB debugging enabled
- API key for your selected model profile (make sure you have credit with your selected model provider)
- Telegram bot token (for gateway mode) follow this instruction
- (Optional, recommended) ngrok authtoken for remote approval (free to obtain)
NOTE: If gpt-5.3-codex is unavailable in your account/provider route, use gpt-5.2-codex.
2. Install
Option A: npm package (recommended for end users)
npm install -g openpocket
openpocket onboard
openpocket gateway start
Option B: source clone (recommended for contributors)
git clone [email protected]:SergioChan/openpocket.git
cd openpocket
npm install
npm run build
./openpocket onboard
./openpocket gateway start
3. What onboard configures
The onboarding wizard is interactive and persists progress to:
~/.openpocket/state/onboarding.json
It walks through:
- User consent for local runtime and data boundaries.
- Deployment target selection (emulator, physical-phone, android-tv, cloud).
- Model profile selection and API key source (env or local config).
- Channel setup (choose which messaging channels to enable):
  - Telegram (bot token + chat allowlist policy)
  - Discord (bot token + DM access policy + guild config)
  - WhatsApp (Baileys QR session linking + DM access policy + chunking mode)
- Emulator startup and manual Play Store/Gmail verification (emulator target only).
- Human-auth bridge mode:
  - disabled
  - local LAN relay
  - local relay + ngrok tunnel (remote approval link)
4. Run your first task
openpocket agent --model gpt-5.2-codex "Open Chrome and search weather"
Or send plain text directly to your configured messaging channel (Telegram, Discord, or WhatsApp) after gateway start.
5. Use a physical Android phone as Agent Phone
- Enable Developer options on your phone:
  - go to Settings -> About phone -> Build number
  - tap Build number 7 times
  - go back to Settings -> System -> Developer options
  - enable USB debugging
- Connect the phone via USB and approve the Allow USB debugging prompt.
- Set deployment target:
adb devices -l
openpocket target set --type physical-phone
openpocket target show
When multiple devices are online, target set shows an arrow-key selector with explicit transport labels (USB ADB / WiFi ADB) so you can choose the exact device.
You can also use aliases: openpocket target set-target ... or openpocket target config ....
- Start runtime:
openpocket gateway start
Optional Wi-Fi ADB:
adb tcpip 5555
adb connect <phone-ip>:5555
openpocket target set --type physical-phone --adb-endpoint <phone-ip>:5555
Or use the built-in pairing wrapper (no manual adb commands):
openpocket target pair --host <device-ip> --pair-port <pair-port> --code <pairing-code> --type physical-phone
Notes:
- Keep phone unlocked during first pairing/authorization.
- android-tv and cloud targets already exist in config/CLI, and full deployment guides are still in progress.
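The --adb-endpoint value above takes the form <phone-ip>:5555. As a minimal sketch of how such a value could be checked before it is handed to the CLI, the helper below splits it into host and port. The name parseAdbEndpoint and the AdbEndpoint type are hypothetical and are not part of OpenPocket.

```typescript
// Hypothetical helper for illustration only; not an OpenPocket API.
interface AdbEndpoint {
  host: string;
  port: number;
}

// Parse "<host>:<port>"; returns null when the value is malformed
// or the port falls outside the valid TCP range.
function parseAdbEndpoint(value: string): AdbEndpoint | null {
  const match = /^([^:\s]+):(\d{1,5})$/.exec(value.trim());
  if (match === null) {
    return null;
  }
  const port = Number(match[2]);
  if (port < 1 || port > 65535) {
    return null;
  }
  return { host: match[1], port: port };
}
```

A script that wraps openpocket target set could call this first and refuse to continue on null, which catches a missing port early.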
6. Persistence and storage locations
For a full persistence map (OpenPocket runtime files + Android AVD/image storage and deletion/reset flow), see:
Deployment Playbook by OS
This section focuses on production-style runtime deployment for:
- gateway start
- emulator setup/startup
- ngrok + human-auth relay configuration
Support Matrix
| Environment | Recommended | Notes |
| --- | --- | --- |
| macOS (Apple Silicon / Intel) | Yes | Best local developer experience. |
| Windows (native host) | Yes | Use Android Emulator on Windows host (Hyper-V/WHPX). |
| Linux Server (x86_64 + KVM) | Yes | Recommended for headless server runtime. |
| Docker on ARM host running linux/amd64 emulator | Not recommended | Works unpredictably due to nested software emulation. |
Common Config Baseline
Run onboarding once:
openpocket onboard
Or configure manually in ~/.openpocket/config.json:
{
  "target": {
    "type": "emulator",
    "adbEndpoint": "",
    "cloudProvider": ""
  },
  "emulator": {
    "avdName": "OpenPocket_AVD",
    "androidSdkRoot": "",
    "headless": true,
    "bootTimeoutSec": 180
  },
  "telegram": {
    "botTokenEnv": "TELEGRAM_BOT_TOKEN",
    "allowedChatIds": []
  },
  "humanAuth": {
    "enabled": true,
    "useLocalRelay": true,
    "localRelayHost": "127.0.0.1",
    "localRelayPort": 8787,
    "apiKeyEnv": "OPENPOCKET_HUMAN_AUTH_KEY",
    "tunnel": {
      "provider": "ngrok",
      "ngrok": {
        "enabled": true,
        "authtokenEnv": "NGROK_AUTHTOKEN"
      }
    }
  }
}
Required env vars for gateway + remote approval:
export TELEGRAM_BOT_TOKEN="<your_telegram_bot_token>"
export OPENPOCKET_HUMAN_AUTH_KEY="<your_human_auth_key>"
export NGROK_AUTHTOKEN="<your_ngrok_token>"
macOS (Local Runtime)
- Install Android SDK + Emulator + at least one AVD (Android Studio preferred).
- Verify toolchain:
adb version
~/Library/Android/sdk/emulator/emulator -list-avds
- Start emulator:
openpocket emulator start
openpocket emulator status
- Start gateway (dashboard auto-starts):
openpocket gateway start
- If human-auth uses ngrok, gateway will auto-start local relay + tunnel and send approval URLs in Telegram.
Windows (Native Host Runtime)
Windows does not require WSL for OpenPocket runtime. Recommended setup is: Android Emulator + adb on Windows host, OpenPocket CLI also on Windows host.
- Install Android Studio + SDK tools, create an AVD.
- Ensure virtualization acceleration is enabled (WHPX/Hyper-V for emulator).
- In PowerShell:
$env:ANDROID_SDK_ROOT="$env:LOCALAPPDATA\Android\Sdk"
adb version
& "$env:ANDROID_SDK_ROOT\emulator\emulator.exe" -list-avds
openpocket emulator start
openpocket gateway start
- Set tokens in user env (or config file):
setx TELEGRAM_BOT_TOKEN "<token>"
setx OPENPOCKET_HUMAN_AUTH_KEY "<key>"
setx NGROK_AUTHTOKEN "<ngrok_token>"
WSL can still be used for development tooling, but running the Android Emulator inside a WSL/Linux guest is not the preferred path.
Linux Server (x86_64 Headless)
This is the recommended server deployment target.
- Validate architecture and KVM:
uname -m # expect x86_64
ls -l /dev/kvm   # must exist
- Install Android SDK cmdline tools, platform-tools, emulator, and create AVD.
- Use headless mode in config:
"emulator": {
  "headless": true,
  "extraArgs": ["-no-window", "-no-audio", "-no-boot-anim", "-no-snapshot"]
}
- Start runtime:
openpocket emulator start
openpocket gateway start
- For service mode, run with systemd or tmux, and keep ngrok token configured for remote human-auth links.
Current End-to-End Tests
OpenPocket currently has two E2E paths and one smoke gate:
- test/integration/docker-agent-e2e.mjs
  - Automated agent E2E.
  - Simulates natural-language task -> planning -> emulator actions -> session assertions.
  - Can run locally (direct host execution) and in Docker wrapper (npm run test:e2e:docker).
- openpocket test permission-app run --case <scenario> --chat <chat_id>
  - PermissionLab human-auth E2E.
  - Validates agent + Telegram + relay/ngrok + approval handoff.
  - Scenarios: camera, microphone, location, contacts, sms, calendar, photos, notification, 2fa.
- scripts/smoke/dual-side-smoke.sh
  - Dual-side smoke gate for coding + Android event lineage.
  - Covers:
    - Telegram coding instruction -> local file write verification.
    - Android build/install/run/logcat-style tool chain events.
    - Unified session trace lineage checks across tool events.
  - Runs fast with deterministic mocks and no real device dependency.
Run it locally:
bash scripts/smoke/dual-side-smoke.sh
Key Capabilities
- Local device-first runtime: execution stays on your machine via adb, not a hosted cloud phone.
- Always-on agent loop: model-driven planning + one-step action execution over Android UI primitives.
- Prompt system aligned for agent behavior:
  - prompt modes (full|minimal|none)
  - workspace template injection with explicit char budgets
  - task progress/outcome narrators driven by prompt templates
- Remote authorization proxy (human-auth relay):
  - agent emits request_human_auth only for real-device/sensitive checkpoints
  - gateway sends one-time approval link and manual fallback commands
  - local relay can auto-start with optional ngrok tunnel
- In-emulator permission handling: Android runtime permission dialogs are auto-approved locally when detected.
- Coding and memory tools inside task loop:
  - coding: read, write, edit, apply_patch, exec, process
  - memory: memory_search, memory_get
- Dual control modes: direct user control and agent control on the same target runtime.
- Production-style gateway operations: Telegram command menu bootstrap, heartbeat, cron jobs, restart loop, safe stop.
- Script and coding safety controls: allowlist + deny patterns + timeout + output caps + run artifacts.
- Prompt observability: /context command reports actual injected prompt context and budgets.
- Auditable persistence: task sessions, daily memory, screenshots, script archives, and relay/auth state.
Roadmap
R1. Memory System (Core Intelligence)
Build a robust memory layer for long-horizon tasks:
- semantic retrieval and episodic memory
- memory compaction/summarization
- conflict resolution and freshness policies
- memory-aware planning loops
R2. Prompt Engineering for Phone-Use
Establish a production prompt stack tailored to mobile workflows:
- phone-specific action planning prompts
- app-state-aware prompting templates
- failure-recovery prompting
- prompt eval suite and regression benchmarks
R3. Multi-OS Runtime and Control Surface
Expand from macOS-first to full platform support:
- Linux (Ubuntu and headless server scenarios)
- Windows support
- dashboard portability strategy:
- primary target: local/remote Web UI
- fallback: native OS-specific control apps only when needed
R4. Real Device Authorization + Permission Isolation
Strengthen system-level authorization architecture:
- iOS and Android real-device compatibility
- cross-device authorization where real phone and emulator differ
- secure remote port authorization flow
- strict permission boundary between real phone and emulator runtime
R5. Skill System Maturity
Evolve from static skills to dynamic capability generation:
- agent-authored skills/code generation
- safe execution sandbox and policy gates
- skill validation, caching, and reuse
R6. Multi-Channel Control Integrations
Go beyond Telegram and support more communication entry points:
- international platforms: Discord, WhatsApp, iMessage, Messenger
- China-focused platforms: WeChat, QQ
- unified channel abstraction for message, auth, and task control
R7. Account Login UX and Session Authorization
Improve real-world login workflows after app installation:
- one-time session authorization links
- 2FA and SMS code handoff UX
- low-friction human-in-the-loop checkpoints
R8. Reliability, Security, and Release Quality (Added)
Additional engineering tracks needed for production readiness:
- end-to-end integration test matrix (including headless CI scenarios)
- threat model and security hardening for relay/auth artifacts
- observability improvements (structured logs, replay/debug traces)
- packaging/release automation and upgrade safety
Contributor Task Board
The project is actively seeking contributors. If you want to help, pick one task area below and open a PR with the task ID in the title (for example: R2-T3).
Memory System
- R1-T1: design memory schema v2 (episodic + semantic + working memory)
- R1-T2: implement memory retrieval ranking and relevance filters
- R1-T3: implement memory compaction/summarization jobs
- R1-T4: add memory quality tests for multi-step phone tasks
Prompt Engineering
- R2-T1: draft phone-use prompt templates per task category (shopping/social/entertainment)
- R2-T2: add prompt fallback strategies for app-state ambiguity
- R2-T3: build prompt regression suite with golden trajectories
- R2-T4: add failure taxonomy and prompt tuning playbook
Cross-Platform Runtime + Dashboard
- R3-T1: Linux runtime parity audit (CLI/emulator/gateway)
- R3-T2: Windows runtime bring-up and compatibility fixes
- R3-T3: define and implement Web UI dashboard MVP
- R3-T4: headless server operator workflow (no GUI) documentation + scripts
Real Device Auth + Isolation
- R4-T1: iOS real-device auth bridge prototype
- R4-T2: Android real-device auth bridge hardening
- R4-T3: permission isolation policy and enforcement checks
- R4-T4: secure tunnel and one-time token lifecycle review
Skill System
- R5-T1: agent-authored skill generation interface
- R5-T2: skill static checks + runtime policy gate
- R5-T3: skill test harness and reproducibility tools
- R5-T4: skill marketplace-style metadata/index format
Multi-Channel Integrations
- R6-T1: channel abstraction layer for inbound/outbound control
- R6-T2: Discord connector
- R6-T3: WhatsApp connector
- R6-T4: WeChat/QQ connector research and adapter design
Login UX + Human-in-the-Loop
- R7-T1: one-time account authorization session protocol
- R7-T2: 2FA/SMS remote approval UX flow and timeout handling
- R7-T3: user-facing auth status model and recovery paths
- R7-T4: mobile-first approval page UX improvements
Reliability and Security
- R8-T1: integration test matrix for onboarding + gateway + auth relay
- R8-T2: security review for relay APIs and artifact storage
- R8-T3: observability dashboard/log schema improvements
- R8-T4: release pipeline hardening and rollback-safe packaging
Product Scenarios
OpenPocket is built for both developers and everyday users.
Typical scenarios include:
- shopping flows across mobile apps
- entertainment routines and repetitive app navigation
- social task assistance with human-in-the-loop approvals
- recurring mobile actions that benefit from automation and traceability
Runtime Flow
Telegram / CLI -> Gateway -> Agent Runtime -> Model Client -> adb -> Agent Phone Target
Architecture
flowchart LR
  U["Local User / Telegram"] --> G["OpenPocket Gateway"]
  G --> A["Agent Runtime"]
  A --> M["Model Client"]
  A --> D["ADB Runtime"]
  A --> S["Script Executor"]
  A --> C["Coding Executor"]
  A --> R["Memory Executor"]
  D --> E["Agent Phone Target (Local)"]
  A --> W["Workspace Store"]
  W --> SS["sessions/*.md"]
  W --> MM["memory/YYYY-MM-DD.md"]
  W --> RR["scripts/runs/*"]
  W --> AS["skills/auto/* + scripts/auto/*"]
  RP["User Phone (Human Auth Link)"] -.-> G
Configuration
Primary config file:
~/.openpocket/config.json (or OPENPOCKET_HOME/config.json)
Example config template:
Skill compatibility mode:
- agent.skillsSpecMode = "legacy" | "mixed" | "strict"
- default is mixed (legacy + strict-compatible loading)
- use strict to enforce directory-based SKILL.md validation
Skill validation command:
openpocket skills validate --strict
Skill workspace commands:
openpocket skills list         # show loaded workspace skills only
openpocket skills load         # interactively select bundled skills to copy into workspace
openpocket skills load --all   # copy all bundled skills missing from workspace
Coding runtime migration note:
- agent.legacyCodingExecutor is now off by default.
- agent.legacyCodingExecutor=true remains available as a temporary compatibility toggle, but it is deprecated and will be removed.
- When fallback is disabled and a coding action is unsupported by pi coding tools, runtime errors point to this key explicitly.
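The three agent.skillsSpecMode values and the mixed default can be modeled as a small union type. The sketch below is illustrative only: resolveSkillsSpecMode is a hypothetical name, and falling back to mixed for unrecognized values is an assumption, since the README only states that mixed is the default.

```typescript
// The three documented values of agent.skillsSpecMode.
type SkillsSpecMode = "legacy" | "mixed" | "strict";

const SKILLS_SPEC_MODES: SkillsSpecMode[] = ["legacy", "mixed", "strict"];

// Hypothetical resolver: returns the configured mode when it is one of the
// documented values. Falling back to "mixed" for unset or unrecognized
// input is an assumption of this sketch.
function resolveSkillsSpecMode(raw: unknown): SkillsSpecMode {
  for (const mode of SKILLS_SPEC_MODES) {
    if (raw === mode) {
      return mode;
    }
  }
  return "mixed";
}
```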
Supported Model Providers
OpenPocket supports multiple AI model providers through OpenAI-compatible APIs:
OpenAI - Direct access to GPT models (gpt-5.2-codex, gpt-5.3-codex)
OpenRouter - Multi-provider routing for Claude models (claude-sonnet-4.6, claude-opus-4.6)
BlockRun - Pay-per-request micropayments with no subscriptions
- Ideal for always-on agents with cost-effective pricing
- Access to 30+ models: GPT-4o, Claude Sonnet 4, Gemini 2.0 Flash, DeepSeek
- Model IDs: blockrun/gpt-4o, blockrun/claude-sonnet-4, blockrun/gemini-2.0-flash, blockrun/deepseek-chat
- Get started at docs.blockrun.ai
AutoGLM - Phone-optimized multilingual model (autoglm-phone)
Common environment variables:
export OPENAI_API_KEY="<your_openai_key>"
export OPENROUTER_API_KEY="<your_openrouter_key>"
export BLOCKRUN_API_KEY="<your_blockrun_key>"
export AUTOGLM_API_KEY="<your_autoglm_key>"
export TELEGRAM_BOT_TOKEN="<your_telegram_bot_token>"
export OPENPOCKET_HUMAN_AUTH_KEY="<your_human_auth_relay_key>"
export NGROK_AUTHTOKEN="<your_ngrok_token>"
export ANDROID_SDK_ROOT="$HOME/Library/Android/sdk"
export OPENPOCKET_HOME="$HOME/.openpocket"
For Codex subscription auth (no OPENAI_API_KEY), OpenPocket can reuse Codex CLI credentials for codex models:
- login once with the codex CLI
- OpenPocket reads $CODEX_HOME/auth.json (or ~/.codex/auth.json)
- on macOS, it also checks the Codex Auth keychain entry first
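The two file locations named above suggest a simple lookup order. The following sketch assumes that $CODEX_HOME, when set, takes precedence over ~/.codex; that precedence is an assumption, and the macOS keychain check is left out. The function name codexAuthPath is hypothetical, and the environment is passed in as a plain object so the example has no Node.js dependency.

```typescript
// Hypothetical helper: picks the auth.json path from the two locations the
// README names. Precedence of CODEX_HOME over HOME is assumed here.
function codexAuthPath(env: { CODEX_HOME?: string; HOME?: string }): string | null {
  if (env.CODEX_HOME) {
    return env.CODEX_HOME + "/auth.json";
  }
  if (env.HOME) {
    return env.HOME + "/.codex/auth.json";
  }
  // Neither variable is available, so no candidate path can be built.
  return null;
}
```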
CLI Surface
Command prefix by install mode:
- npm package install: use openpocket ...
- local source clone: use ./openpocket ... (or openpocket ... after install-cli)
./openpocket --help
./openpocket install-cli
./openpocket onboard
./openpocket target show
./openpocket target set --type physical-phone
./openpocket target set --type physical-phone --adb-endpoint 192.168.1.25:5555
./openpocket config-show
./openpocket emulator start
./openpocket emulator status
./openpocket agent --model gpt-5.2-codex "Open Chrome and search weather"
./openpocket script run --text "echo hello"
./openpocket telegram setup
./openpocket telegram whoami
./openpocket skills list
./openpocket skills load
./openpocket skills load --all
./openpocket skills validate --strict
./openpocket gateway start
./openpocket dashboard start
./openpocket test permission-app deploy
./openpocket test permission-app task
./openpocket human-auth-relay start
human-auth-relay start is mainly a standalone debug mode. In normal gateway usage, local relay/tunnel startup is handled automatically from config.
gateway start now auto-starts the local Web dashboard (default http://127.0.0.1:51888, configurable in config.dashboard).
Use dashboard start when you want to run only the dashboard process.
Legacy aliases still work (deprecated): openpocket init, openpocket setup.
The legacy native macOS panel has been removed from the repository.
Use openpocket dashboard start instead (or openpocket gateway start, which auto-starts the dashboard).
Web Dashboard
The local Web dashboard is now the primary control surface.
Startup behavior
- openpocket gateway start auto-starts dashboard and prints dashboard URL.
- openpocket dashboard start starts dashboard only (no Telegram gateway).
Default dashboard config:
"dashboard": {
  "enabled": true,
  "host": "127.0.0.1",
  "port": 51888,
  "autoOpenBrowser": false
}
Runtime page layout
- Left column: Gateway status, emulator controls, and core path config.
- Right column: large emulator preview pane for tap/text control.
Auto refresh behavior
- Preview auto-refresh updates image/metadata silently.
- It does not spam status text with repeated "Refreshing emulator preview..." messages.
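As a rough illustration of how the default dashboard config maps to the URL that gateway start prints, the sketch below merges user overrides onto the documented defaults. The function dashboardUrl and the merge behavior are assumptions for illustration; only the field names and default values come from the config shown earlier.

```typescript
interface DashboardConfig {
  enabled: boolean;
  host: string;
  port: number;
  autoOpenBrowser: boolean;
}

// Defaults copied from the documented dashboard config.
const DEFAULT_DASHBOARD: DashboardConfig = {
  enabled: true,
  host: "127.0.0.1",
  port: 51888,
  autoOpenBrowser: false,
};

// Hypothetical helper: applies overrides field by field and returns the
// dashboard URL, or null when the dashboard is disabled.
function dashboardUrl(overrides: Partial<DashboardConfig>): string | null {
  const enabled = overrides.enabled !== undefined ? overrides.enabled : DEFAULT_DASHBOARD.enabled;
  const host = overrides.host !== undefined ? overrides.host : DEFAULT_DASHBOARD.host;
  const port = overrides.port !== undefined ? overrides.port : DEFAULT_DASHBOARD.port;
  if (!enabled) {
    return null;
  }
  return "http://" + host + ":" + port;
}
```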
Human Authorization Modes
OpenPocket supports three human-auth configurations:
- Disabled: no relay, no remote approval.
- LAN relay: local relay exposed on LAN for phone access in the same network.
- Relay + ngrok: gateway auto-starts local relay and ngrok, then issues public approval links.
When the agent emits request_human_auth, Telegram users can:
- tap the web approval link
- or run fallback commands:
  - /auth approve <request-id> [note]
  - /auth reject <request-id> [note]
- for any auth wall, use the request-specific Human Auth page generated from uiTemplate (optional live remote takeover is still available), then approve/reject
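The fallback command grammar above (/auth approve or /auth reject, a request ID, and an optional note) can be parsed with a single regular expression. This is a minimal sketch and not the gateway's actual parser; parseAuthCommand and AuthCommand are hypothetical names.

```typescript
interface AuthCommand {
  decision: "approve" | "reject";
  requestId: string;
  note: string;
}

// Hypothetical parser for "/auth approve <request-id> [note]" and
// "/auth reject <request-id> [note]". Returns null for anything else.
function parseAuthCommand(text: string): AuthCommand | null {
  const match = /^\/auth\s+(approve|reject)\s+(\S+)(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (match === null) {
    return null;
  }
  return {
    decision: match[1] === "approve" ? "approve" : "reject",
    requestId: match[2],
    // The note is optional; an absent note becomes an empty string.
    note: match[3] === undefined ? "" : match[3].trim(),
  };
}
```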
Dynamic Human Auth Portal Templates
request_human_auth now supports an optional uiTemplate payload so each authorization page can be customized per request instead of using one fixed form.
Supported template controls include:
- title/summary/capability hint text
- theme style (brandColor, backgroundCss, fontFamily)
- structured form fields (text, textarea, email, password, otp, card-number, expiry, cvc, select, ...)
- agent-generated middle-section code (middleHtml, middleCss, middleScript)
- agent-generated approval logic (approveScript)
- reusable template file path from Agent Loop coding tools (templatePath, JSON in workspace)
- delegation toggles (text/location/photo/audio/file attachments)
- artifact policy (artifactKind, requireArtifactOnApprove)
Portal shell invariants (always present, not generated by template):
- remote connection section (live takeover controls)
- full context section (Show Full Context)
- top title area
- middle input/approve area (this part is generated/customized by uiTemplate)
This enables capability-specific flows such as:
- OAuth login (credentials)
- payment card confirmation (payment_card)
- camera/photo delegation
- microphone/audio delegation
- location delegation
- album/file selection delegation
High-level runtime behavior:
- agent emits request_human_auth with capability and optional uiTemplate (or templatePath generated via coding tools in the same Agent Loop)
- relay renders a fixed secure shell (remote connection, context, title) plus request-specific middle/approve content from sanitized uiTemplate
- human approves/rejects and optionally uploads/enters delegated artifact
- bridge returns decision/artifact to runtime and task continues
Important: current implementation is delegation-based (explicit artifact handoff after approve), not direct remote hardware passthrough from human phone sensors into Agent Phone OS APIs.
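To make the template controls more concrete, here is a rough sketch of what a payment_card template might look like. The field names (brandColor, fontFamily, artifactKind, requireArtifactOnApprove) come from the list of supported controls, but the nesting, the theme and fields groupings, the sample values, and the canApprove check are all assumptions of this sketch and may not match the real uiTemplate schema.

```typescript
// Hypothetical shape: property names are taken from the README,
// the structure around them is assumed for illustration.
interface UiTemplateSketch {
  title: string;
  summary: string;
  theme: { brandColor: string; fontFamily: string };
  fields: { name: string; type: "text" | "otp" | "card-number" | "expiry" | "cvc" }[];
  artifactKind: string;
  requireArtifactOnApprove: boolean;
}

const paymentTemplate: UiTemplateSketch = {
  title: "Confirm payment card",
  summary: "The agent needs card details to finish checkout.",
  theme: { brandColor: "#1f6feb", fontFamily: "sans-serif" },
  fields: [
    { name: "number", type: "card-number" },
    { name: "expiry", type: "expiry" },
    { name: "cvc", type: "cvc" },
  ],
  artifactKind: "payment_card",
  requireArtifactOnApprove: true,
};

// Assumed semantics: approval is blocked when the template requires an
// artifact and the human has not provided one.
function canApprove(template: UiTemplateSketch, artifactProvided: boolean): boolean {
  return !template.requireArtifactOnApprove || artifactProvided;
}
```

This mirrors the delegation-based flow described above: the human approves and hands over an explicit artifact, rather than the agent reading the human's device directly.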
Credential security notes:
- relay server and request state are hosted on the user machine
- approval artifacts are persisted locally (state/human-auth-artifacts/)
- no centralized OpenPocket credential relay service is used
- use LAN mode (humanAuth.tunnel.provider=none) for zero third-party network hop
To inspect current chat allow policy and discover recent chat IDs for your bot:
openpocket telegram whoami
When a running task enters Android system permission UI (permissioncontroller / packageinstaller), OpenPocket handles it locally in the emulator (auto-approve policy) instead of escalating to remote human-auth.
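The local-handling rule above hinges on recognizing the two system packages. How OpenPocket detects them is not documented here, so the sketch below simply assumes a substring match on the foreground package name; isSystemPermissionUi is a hypothetical name.

```typescript
// Package name fragments named in the README for Android system permission UI.
const SYSTEM_PERMISSION_UI_MARKERS: string[] = ["permissioncontroller", "packageinstaller"];

// Hypothetical check: true when the foreground package looks like one of the
// system permission UIs, so the dialog can be handled locally instead of
// being escalated to remote human-auth. Substring matching is an assumption.
function isSystemPermissionUi(foregroundPackage: string): boolean {
  const name = foregroundPackage.toLowerCase();
  for (const marker of SYSTEM_PERMISSION_UI_MARKERS) {
    if (name.indexOf(marker) !== -1) {
      return true;
    }
  }
  return false;
}
```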
PermissionLab E2E Test
Use the built-in Android test app to verify remote authorization flow end-to-end.
1) Start gateway + ngrok human-auth mode
openpocket gateway start
2) Build/install/launch test app on emulator
openpocket test permission-app deploy
Optional commands:
openpocket test permission-app launch
openpocket test permission-app reset
openpocket test permission-app uninstall
openpocket test permission-app cases
openpocket test permission-app task
openpocket test permission-app run --case camera --chat <your_chat_id>
openpocket test permission-app task --case camera --send --chat <your_chat_id>
3) Trigger scenario run (agent auto-clicks button)
Recommended command:
openpocket test permission-app run --case camera --chat <your_chat_id>
Or use task --send (same execution path, keeps backward compatibility):
openpocket test permission-app task --case camera --send --chat <your_chat_id>
Available scenario IDs: camera, microphone, location, contacts, sms, calendar, photos, notification, 2fa.
In this mode, OpenPocket will:
- build/install/reset/launch PermissionLab
- run the agent with a scenario-specific task
- ask the agent to tap the exact scenario button
- send Telegram only when human authorization is actually required
4) Approve from phone
When Telegram receives the human-auth message:
- Open the URL button (ngrok public link).
- Approve/reject on the web page.
- Agent resumes and reports task result in Telegram.
Dockerized Agent E2E (Headless Linux)
OpenPocket can run on Linux/headless servers when Android SDK + emulator dependencies are present. The current auto-installer is macOS-only, but runtime execution is cross-platform.
For repeatable integration tests, use the Docker E2E harness:
npm run test:e2e:docker
What this flow does:
- Build a Linux Docker image with Android SDK, emulator, and an AVD.
- Start a headless emulator in the container.
- Run OpenPocket agent with a natural-language task.
- Verify session artifacts and action execution results.
Important notes:
- On Linux with /dev/kvm, emulator boot is much faster.
- On macOS Docker Desktop (no KVM passthrough), emulator usually works with software acceleration but can be significantly slower.
- You can override the task text:
OPENPOCKET_E2E_TASK="Open Android Settings and then go home" npm run test:e2e:docker
Documentation
Where the frontend is
The documentation frontend is implemented in this repository:
- Site source: /frontend
- VitePress config: /frontend/.vitepress/config.mjs
- Custom homepage: /frontend/index.md
- Custom theme styles: /frontend/.vitepress/theme/custom.css
Documentation Website
- Start local docs server:
npm run docs:dev
- Build static docs:
npm run docs:build
- Build for Vercel (root base path):
npm run docs:build:vercel
- Preview built docs:
npm run docs:preview
Deployment options
- Vercel config: vercel.json
- Deployment guide: /frontend/get-started/deploy-docs.md
Docs entry points
Repository Structure
- /src: runtime source code (agent, gateway, device, tools, onboarding, dashboard)
- /frontend: standalone frontend site (homepage + docs)
- /test: runtime contract and integration tests
- /dist: build output
Development
Run checks:
npm run check
npm test
Contributing
- Prefer behavior-driven changes with matching tests.
- Document new runtime capabilities under /frontend in the relevant hub.
Security and Safety Notes
- run_script execution is guarded by an allowlist and deny patterns.
- exec/process coding tools are guarded by allowlist, deny patterns, workspace boundaries, timeout, and output caps.
- Timeout and output truncation are enforced for script/coding execution.
- memory tools are read-scoped to MEMORY.md and memory/*.md.
- Local paths are sanitized/redacted in Telegram-facing outputs.
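The allowlist, deny-pattern, and output-cap guards described above can be sketched as two small functions. This is an illustration of the general technique under assumed names (ScriptPolicy, checkCommand, capOutput) and does not reflect OpenPocket's actual policy format; timeout enforcement and workspace boundaries are left out.

```typescript
// Hypothetical policy shape for illustration only.
interface ScriptPolicy {
  allowedCommands: string[];
  denyPatterns: RegExp[];
}

interface CommandVerdict {
  allowed: boolean;
  reason: string;
}

// The executable (first token) must be allowlisted, and the full command
// line must not match any deny pattern.
function checkCommand(commandLine: string, policy: ScriptPolicy): CommandVerdict {
  const trimmed = commandLine.trim();
  const executable = trimmed.split(/\s+/)[0];
  if (policy.allowedCommands.indexOf(executable) === -1) {
    return { allowed: false, reason: "command not in allowlist: " + executable };
  }
  for (const pattern of policy.denyPatterns) {
    if (pattern.test(trimmed)) {
      return { allowed: false, reason: "matches deny pattern: " + pattern.source };
    }
  }
  return { allowed: true, reason: "ok" };
}

// Truncate captured output to a fixed number of characters.
function capOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) {
    return output;
  }
  return output.slice(0, maxChars) + "\n[truncated]";
}
```

Checking the allowlist before the deny patterns keeps the default closed: a command that is not explicitly allowed never reaches pattern matching at all.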
