@octoparse-cli/octoparse-cli
v0.1.26
Standalone Octoparse engine CLI that runs local extraction without the Electron client.
octoparse-cli
Command-line runner for Octoparse extraction tasks.
octoparse can list cloud tasks, run tasks locally, control active local
runs, and export collected data.
Requirements
- Node.js 20 or newer
- A valid Octoparse account or API key
Quick start
1. Install
Install the CLI globally:
npm install -g @octoparse-cli/octoparse-cli
The installed command is:
octoparse
Check the installation:
octoparse --version
octoparse doctor
2. Log in
Most commands require Octoparse credentials. Run:
octoparse auth login
auth login lets you choose between OAuth and API key login. OAuth opens the
browser and saves the token locally once you have logged in.
To force OAuth login:
octoparse auth login --oauth
API key login is still supported. Create the key here:
https://www.octoparse.com/console/account-center/api-keys
If you have already copied the key, pass it directly:
octoparse auth login XXXXX
For CI or scripts, set an environment variable instead:
OCTO_ENGINE_API_KEY=xxx octoparse task list --json
OCTO_ENGINE_ACCESS_TOKEN=xxx octoparse task list --json
3. Use the CLI
Query the task list:
octoparse task list
octoparse task list --page 2 --page-size 20
Query a single task:
octoparse task inspect <taskId>
Run a task locally:
octoparse run <taskId>
Local Chrome execution is supported on macOS x64/arm64, Windows x64, and Linux x64. Linux arm64 is not supported by the local CLI runtime because Chrome for Testing does not currently provide a Linux arm64 browser package; on Linux arm64, use cloud extraction or switch to a supported local platform.
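Scripts that run on several machines can check this support matrix before choosing between a local run and cloud extraction. The sketch below is a hypothetical TypeScript helper (isLocalRunSupported is not part of octoparse-cli) that compares Node's process.platform and process.arch against the platforms listed above. As an assumption, any combination the README does not list, such as Windows arm64, is treated as unsupported.

```typescript
// Hypothetical helper, not part of octoparse-cli: encodes the local-run
// support matrix stated in this README using Node's platform/arch names.
// Assumption: combinations not listed (e.g. linux/arm64, win32/arm64) are unsupported.
const SUPPORTED_LOCAL_TARGETS: ReadonlyArray<readonly [string, string]> = [
  ["darwin", "x64"],
  ["darwin", "arm64"],
  ["win32", "x64"],
  ["linux", "x64"],
];

function isLocalRunSupported(
  platform: string = process.platform,
  arch: string = process.arch,
): boolean {
  return SUPPORTED_LOCAL_TARGETS.some(([p, a]) => p === platform && a === arch);
}

// Usage: suggest a local run where supported, cloud extraction elsewhere.
const suggested = isLocalRunSupported()
  ? "octoparse run <taskId>"
  : "octoparse cloud start <taskId>";
console.log(`Suggested command: ${suggested}`);
```

Keeping the matrix in one table makes it easy to update if Chrome for Testing later adds a Linux arm64 package.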
Create a local task from a URL directly with CLI-only selection:
octoparse detect 'https://example.com/list' --auto --output task.json
octoparse detect 'https://example.com/search' --manual --query keyword --save-session --output task.json
detect uses the protected SmartProxy detector by default and requires
configured credentials. Manual mode can save a cookies-only browser session for
later local runs. Agent mode is available through --agent together with
--agent-command. The --agent-command value is executed as a local shell
command, so it should only point to a trusted agent runner.
If an LLM or agent is helping a user create a task with Octoparse CLI, it should
run octoparse capabilities --json first and follow
machineContract.recipes.createTaskFromUrlWithAgent, instead of asking the user
to explain internal detect flags, defaulting to --auto, or hand-writing JSON.
That recipe tells the agent to use detect --agent with a trusted agent runner
for the shortest create-task path, adding --run-sample <n> when sample rows are
needed immediately. The lower-level prepare/plan/preview/apply workflow remains
available for audit and repair.
Agent workflows generate a full-page screenshot, an annotated screenshot, and
cropped screenshots of the top candidates when their bounding boxes are
available. These paths are exposed through context.screenshot,
context.visualArtifacts, and context.decisionSummary. Pass the user's
natural-language request with --goal so the agent can judge candidates against
both the page's appearance and the stated intent. The context also includes
resultValidationPolicy: agents should treat isolated missing fields in ads,
topic cards, sponsored items, or heterogeneous rows as normal partial data
instead of repeatedly recreating the task.
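One way to apply that policy in your own checks is to measure how often each field is empty across all rows instead of reacting to a single gap. The sketch below reads a rows.jsonl run artifact (its location is described under Machine-readable output), assumes each line is one flat JSON object per row, and flags a field only when it is missing from more than a chosen fraction of rows. The 0.5 threshold and all function names are illustrative assumptions, not values taken from resultValidationPolicy.

```typescript
import { readFileSync } from "node:fs";

// Assumption: each rows.jsonl line is one flat JSON object keyed by field name.
type Row = Record<string, unknown>;

// Parse JSON Lines text: one JSON value per non-empty line.
function parseJsonl(text: string): Row[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Row);
}

// Read a rows.jsonl artifact, e.g. <output>/<runId>/rows.jsonl.
function readJsonl(path: string): Row[] {
  return parseJsonl(readFileSync(path, "utf8"));
}

// Treat undefined, null, and blank strings as missing values.
function isMissing(value: unknown): boolean {
  return (
    value === undefined ||
    value === null ||
    (typeof value === "string" && value.trim() === "")
  );
}

// Return the fields missing from more than `threshold` of all rows.
// Isolated gaps (e.g. a sponsored row without a price) stay under the
// threshold and are accepted as partial data rather than flagged.
function systematicallyMissingFields(rows: Row[], threshold = 0.5): string[] {
  if (rows.length === 0) return [];
  const fields = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) fields.add(key);
  }
  return Array.from(fields).filter((field) => {
    const missing = rows.filter((row) => isMissing(row[field])).length;
    return missing / rows.length > threshold;
  });
}
```

For example, an empty result from systematicallyMissingFields(readJsonl(path)) suggests the gaps are isolated; only a non-empty result points to a field worth repairing in the task.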
Run in the background:
octoparse run <taskId> --detach
Query the local run status, or stop the local process running a task:
octoparse local status <taskId>
octoparse local stop <taskId>
Note: local run status is tracked by this CLI only and is not synchronized with the Octoparse desktop client status.
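A script that starts a run with --detach can poll octoparse local status <taskId> --json until the run ends. This README does not document the shape of that JSON, so the sketch below takes the "has it finished?" test as a caller-supplied predicate, and the status fetcher is injectable so the loop can be exercised without the CLI. waitForLocalRun, WaitOptions, and the default interval and attempt limits are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

// The JSON shape of `local status --json` is not documented in this README,
// so status values stay `unknown` and the caller decides when a run is done.
type StatusFetcher = (taskId: string) => unknown;

interface WaitOptions {
  intervalMs?: number; // delay between status checks
  maxAttempts?: number; // give up after this many checks
  fetchStatus?: StatusFetcher; // injectable for testing
}

// Default fetcher: calls `octoparse local status <taskId> --json`.
const cliStatus: StatusFetcher = (taskId) =>
  JSON.parse(
    execFileSync("octoparse", ["local", "status", taskId, "--json"], { encoding: "utf8" }),
  );

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until isFinished(status) is true; reject after maxAttempts checks.
async function waitForLocalRun(
  taskId: string,
  isFinished: (status: unknown) => boolean,
  options: WaitOptions = {},
): Promise<unknown> {
  const { intervalMs = 5000, maxAttempts = 120, fetchStatus = cliStatus } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = fetchStatus(taskId);
    if (isFinished(status)) return status;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Local run for ${taskId} not finished after ${maxAttempts} checks`);
}
```

Write the isFinished predicate against the status JSON your CLI version actually prints; because this status is tracked only by the CLI, it will not reflect runs started from the desktop client.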
Export data:
octoparse data export <taskId> --source local --format xlsx
octoparse data export <taskId> --source cloud --format csv
Common commands
# Help and diagnostics
octoparse --help
octoparse doctor
octoparse browser doctor
# Authentication
octoparse auth login
octoparse auth login --oauth
octoparse auth login XXXXX
octoparse auth status
octoparse auth logout
# Task discovery
octoparse task list
octoparse task list --page 2 --page-size 20
octoparse task list --keyword news --page 2 --page-size 10
octoparse task inspect <taskId>
# Task creation
octoparse detect 'https://example.com/list' --auto --output task.json
octoparse detect 'https://example.com/search' --manual --query keyword --save-session --output task.json
# Local extraction
octoparse run <taskId>
octoparse run <taskId> --jsonl
octoparse run <taskId> --detach
octoparse local status <taskId>
octoparse local pause <taskId>
octoparse local resume <taskId>
octoparse local stop <taskId>
# Cloud extraction
octoparse cloud start <taskId>
octoparse cloud stop <taskId>
octoparse cloud status <taskId>
octoparse cloud history <taskId>
# Data
octoparse data history <taskId> --source local
octoparse data history <taskId> --source cloud
octoparse data export <taskId> --source local --format xlsx
octoparse data export <taskId> --source cloud --format csv
By default, local run artifacts are stored in ~/.octoparse/runs. If you
customize the run artifact directory with --output, use the same --output
again when reading local history or exporting local data:
octoparse run <taskId> --output ./runs
octoparse data history <taskId> --source local --output ./runs
octoparse data export <taskId> --source local --output ./runs --format xlsx
Authentication
Most commands require OAuth or API key credentials. Only setup and diagnostic
commands, such as --help, --version, doctor, browser doctor, capabilities, and
auth, can run before login.
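Because most commands fail without credentials, a CI script can check which credential source the CLI should pick up before calling it. The sketch below mirrors the credential precedence documented in this section (OCTO_ENGINE_API_KEY, then OCTO_ENGINE_ACCESS_TOKEN, then ~/.octoparse/credentials.json). It only checks that the file exists and does not parse it, since the file's format is not documented here, and it assumes an empty environment variable counts as unset. resolveCredentialSource is a hypothetical helper, not a CLI API.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

type CredentialSource =
  | "OCTO_ENGINE_API_KEY"
  | "OCTO_ENGINE_ACCESS_TOKEN"
  | "credentials.json"
  | "none";

// Hypothetical helper mirroring the documented precedence:
// 1. OCTO_ENGINE_API_KEY, 2. OCTO_ENGINE_ACCESS_TOKEN, 3. ~/.octoparse/credentials.json.
// Assumption: an empty environment variable counts as unset.
// The credentials file is only checked for existence, not parsed or validated.
function resolveCredentialSource(
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): CredentialSource {
  if (env.OCTO_ENGINE_API_KEY) return "OCTO_ENGINE_API_KEY";
  if (env.OCTO_ENGINE_ACCESS_TOKEN) return "OCTO_ENGINE_ACCESS_TOKEN";
  if (existsSync(join(home, ".octoparse", "credentials.json"))) return "credentials.json";
  return "none";
}

// Usage: warn early in CI instead of waiting for the first CLI call to fail.
if (resolveCredentialSource() === "none") {
  console.warn("No Octoparse credentials found; run `octoparse auth login` or set OCTO_ENGINE_API_KEY.");
}
```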
For interactive OAuth login:
octoparse auth login
octoparse auth login --oauth
Create API keys in the Octoparse console:
https://www.octoparse.com/console/account-center/api-keys
If you have already copied the API key, pass it directly:
octoparse auth login XXXXX
Use --no-open if you want to copy the login URL manually:
octoparse auth login --no-open
For CI or scripts:
OCTO_ENGINE_API_KEY=xxx octoparse task list --json
OCTO_ENGINE_ACCESS_TOKEN=xxx octoparse task list --json
Credential precedence:
1. OCTO_ENGINE_API_KEY
2. OCTO_ENGINE_ACCESS_TOKEN
3. ~/.octoparse/credentials.json
Local task files
You can run or validate a local task definition file:
octoparse task validate <taskId> --task-file ./task.json
octoparse run <taskId> --task-file ./task.json
octoparse run sample --task-file ./sample.otd
Supported local task file types:
- .json
- .xml
- .otd
Kernel browser tasks are not supported in this CLI.
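Before handing a file to --task-file, a wrapper can reject extensions the CLI does not accept and build the argument list for octoparse run. The sketch below checks only the file extension against the supported types listed above (.json, .xml, .otd, matched exactly as written); it cannot detect Kernel browser tasks, which the CLI itself has to reject. buildRunArgs is a hypothetical helper name.

```typescript
import { extname } from "node:path";

// Local task file types listed in this README.
const SUPPORTED_TASK_FILE_EXTENSIONS = new Set([".json", ".xml", ".otd"]);

// Hypothetical helper: validate the extension, then build the argv for
// `octoparse run <taskId> --task-file <file>`. It cannot tell whether the file
// describes a Kernel browser task; the CLI itself has to reject those.
function buildRunArgs(taskId: string, taskFile: string): string[] {
  const ext = extname(taskFile);
  if (!SUPPORTED_TASK_FILE_EXTENSIONS.has(ext)) {
    throw new Error(
      `Unsupported task file type "${ext || "(none)"}": expected .json, .xml, or .otd`,
    );
  }
  return ["run", taskId, "--task-file", taskFile];
}
```

The returned array can be passed to execFileSync("octoparse", args) from node:child_process, which avoids shell quoting issues with paths that contain spaces.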
Machine-readable output
Use --json for one JSON response:
octoparse task list --json
octoparse local status <taskId> --json
Use --jsonl for local run event streams:
octoparse run <taskId> --jsonl
The stream includes captcha and proxy events when the runtime asks the CLI
to resolve CAPTCHA or proxy resources automatically.
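A Node script can consume the --jsonl stream line by line while the run progresses. The sketch below spawns a command, parses each stdout line as JSON, and hands every event to a callback without assuming any event fields. The command is a parameter so the same function can run octoparse run <taskId> --jsonl or be tested with another process; streamJsonl is a hypothetical name.

```typescript
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

// Spawn `command ...args`, parse each non-empty stdout line as JSON, and pass
// it to onEvent. Lines that are not valid JSON go to onInvalidLine instead.
// Resolves with the child's exit code once the process and its stdio close.
function streamJsonl(
  command: string,
  args: string[],
  onEvent: (event: unknown) => void,
  onInvalidLine: (line: string) => void = (line) => console.error(`non-JSON line: ${line}`),
): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "inherit"] });
    const lines = createInterface({ input: child.stdout });
    lines.on("line", (line) => {
      if (line.trim() === "") return;
      let event: unknown;
      try {
        event = JSON.parse(line);
      } catch {
        onInvalidLine(line);
        return;
      }
      onEvent(event);
    });
    child.on("error", reject);
    child.on("close", (code) => resolve(code));
  });
}
```

For a real run, call streamJsonl("octoparse", ["run", taskId, "--jsonl"], (event) => console.log(event)). This README does not document event field names, including how captcha and proxy events are labeled, so inspect a real stream before filtering on them. On Windows, npm installs octoparse as a .cmd shim, so spawn may need { shell: true }.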
Local run artifacts are written under ~/.octoparse/runs by default, or under
the selected --output directory when configured:
<output>/<runId>/
  meta.json
  events.jsonl
  logs.jsonl
  rows.jsonl
Troubleshooting
Check the local environment:
octoparse doctor
octoparse browser doctor
If the browser is not detected automatically, pass its path:
octoparse run <taskId> --chrome-path "/path/to/chrome"
Linux arm64 local execution is not supported, even with --chrome-path,
because the bundled local runtime depends on Chrome for Testing platform
support.
Clean stale local control state:
octoparse local cleanup
octoparse runs cleanup