@openlibing/ds-cli

v1.4.0

Published

4 days ago

DolphinScheduler CLI with workflow diagnosis (false-success detection), log filtering, and AI agent skill integration (.agents/skills default).


caoxiaolin

dolphinscheduler openlibing workflow scheduler cli ai mcp false-success-detection task-diagnosis

ds-cli — DolphinScheduler CLI

A command-line interface for DolphinScheduler with data query, analysis, workflow management, one-click instance diagnosis, and full MCP (Model Context Protocol) server support for AI agent integration.

Published by openlibing on npm.

Features

Data Query: Query datasources, view table structures, execute SQL
Data Analysis: Statistics on workflows, tasks, queues, and system health
Workflow Management: Create, run, schedule, and monitor workflows
Cross-Environment Workflow Sync (ds workflow export / ds workflow import): Recursively export a workflow definition (with all SUB_WORKFLOW children) to JSON and import into another environment. Topologically ordered (leaves-first) with automatic OFFLINE→edit→ONLINE state-machine handling, DS-assigned code remapping, and SUB_WORKFLOW reference rewriting via codeMap. Default dry-run; requires --confirm to write.
Run Sub-Workflow Tasks (ds run start): Start any workflow or sub-workflow instance with full DS API parameters (tenantCode, scheduleTime JSON, execType, warningGroupId). Pairs with a search-and-run flow (ds workflow list -s <keyword> → ds run start) and a failure-diagnosis flow (ds inspect).
Instance Diagnosis (ds diagnose): One-click false-success detection, sub-workflow drill-down, and log structure analysis, designed for the DS v3.x "false success" (假成功) pattern
One-Click Failure Analysis (ds analyze failure): Automatically locates instance, lists tasks, drills down SUB_WORKFLOW to leaf tasks, identifies failures (including false-success), summarizes error logs, and provides root cause — no need to manually chain env → project → instance → tasks → diagnose → log
End-to-End Investigation (ds investigate): Process-oriented 4-in-1 commands covering workflow overview, single-task deep-dive, cluster health check, and Gantt timeline — designed for "ask AI to investigate" scenarios
L3 Process Tools (4 new one-click tools):
- ds analyze slowness — find why a workflow instance is slow
- ds investigate delay — diagnose data delay issues
- ds investigate resource — check cluster Master/Worker/queue status
- ds investigate task-troubleshoot — quick diagnostic for a single task
Smart Log View (ds log view): Tail/grep/exec-only filters, full-log pagination, no temp files
Log Throughput Analysis (ds log throughput): Extract SeaTunnel Read/Write trends, delta rates, and average throughput from task logs
Bottleneck Detection (ds stats bottleneck): Identify top-N workflows by total duration, failure rate, and avg/max/min breakdown
Instance Top-N (ds instance top-n): Quickly find the longest-running workflow instances with auto-pagination
Gantt Timeline (ds instance gantt): Visualize task execution timeline for a workflow instance
Multi-Environment: Manage multiple DolphinScheduler instances (HTTP/HTTPS) with different tokens
AI Skills: Built-in YAML Skills for AI Agent integration (default .agents/skills/) — including multi-step Playbook Skills for incident response, data delay, and cluster health
4-Layer MCP Architecture (55 tools):
- L1 Atomic (~38): single-point read/write — list/get/create/update/delete
- L2 Analysis (~7): analyze_* — failure, slowness, throughput, bottleneck
- L3 Process (10): investigate_* + analyze failure/slowness — multi-step diagnostic
- L4 Skill (1): ds_skill_scenario — free-text → Playbook
- Multi-Environment Support: All tools accept an optional env parameter
- 18 Write Operations clearly marked with ⚠️ in description for AI safety
- Chinese Trigger Phrases: 5 L3 tools include natural Chinese triggers, e.g. "工作流 X 失败的原因是什么" ("why did workflow X fail?") and "为什么这么慢" ("why is it so slow?")
- Data-Driven Recommendation Graph (skills/recommendation-graph.yaml): 20+ edges drive the → recommended_next: suggestions, with heuristic fallback
- Scenario Matching: ds_skill_scenario maps free-text user descriptions to the right Playbook
NPM Install: Installable via npm install -g

Installation

Global Install (recommended)

npm install -g @openlibing/ds-cli

Local Install

npm install @openlibing/ds-cli
npx @openlibing/ds-cli <command>

From Source

git clone https://github.com/openlibing/ds-cli.git
cd ds-cli
npm install
node bin/ds <command>

Quick Start

# 1. Initialize with interactive wizard (requires IP, Protocol and Token)
ds init

# 2. Or manually add an environment
ds env add -n dev -i localhost:12345 --protocol http -t <your-token>

# 3. Add another environment (e.g. production)
ds env add -n prod -i <prod-ip>:12345 -t <prod-token>

# 4. List all environments
ds env list

# 5. Switch environment
ds env use prod

# 6. Test the connection
ds env test

# 7. Try a command
ds project list

Note: No username/password/login required. Just provide IP, Protocol and Token, and you're ready to go.

For HTTPS endpoints:

ds env add -n beta -i <ip>:12345 -t <token> --protocol https

Commands

Environment Management

ds env list - List all environments
ds env current - Show current environment
ds env use <name> - Switch environment (positional arg, not -n)
ds env add -n <name> -i <ip:port> -t <token> - Add environment (only IP + Token)
- --protocol <protocol> - Protocol (http|https, default: http)
ds env remove -n <name> - Remove environment
ds env info [name] - Show environment details
ds env test [name] - Test environment connection
ds env set-token -n <name> -t <token> - Update token

Authentication

ds login -t <token> - Login with token (optional, for refreshing token)
ds logout - Logout

Note: ds-cli uses token-based authentication. No username/password login is required. Just provide the IP and Token when adding an environment.

Project

ds project list - List projects (auto-falls-back v3 → v2)
- --created-only - Only show projects created by current user
ds project create -n <name> - Create project
ds project delete <code> - Delete project
ds project info <code> - Show project details

Workflow

ds workflow list -p <project> - List workflows (with taskCount column)
- -s, --search <val> - Search by name
ds workflow create -p <project> -n <name> - Create workflow
ds workflow delete -p <project> -c <code> - Delete workflow
ds workflow info -p <project> -c <code> - Show workflow details
ds workflow release -p <project> -c <code> - Publish workflow
ds workflow export -p <project> -c <code> [-o <file>] [--no-subs] [-e <env>] - Export a workflow definition (recursively including all SUB_WORKFLOW children) to a JSON file
- -o, --output <file> - Output JSON path (default: ./workflow-<code>.json)
- --no-subs - Only export the root, skip child sub-workflows
ds workflow import -f <file> -p <project> [--confirm] [--no-release] [-e <env>] - Import workflow definitions from a JSON file into the target environment
- Default is dry-run: prints what would be UPDATED / CREATED / failed; nothing is written. Add --confirm to actually apply changes.
- --no-release - After import, leave workflows OFFLINE (default releases ONLINE in topological order).
- The CLI automatically handles: ONLINE→OFFLINE→edit→ONLINE state machine (DS code 50008), DS-assigned new code capture on CREATE, parent-workflow SUB_WORKFLOW reference rewriting (codeMap), and friendly hints for common DS error codes (50003 / 50008 / 10168).

Cross-Environment Workflow Sync Example

Sync the [workflow][raw->dm][codearts]流水线&构建任务数据获取 workflow (and all its sub-workflows) from prod to beta:

# 1. Export from prod (recursively pulls main workflow + all SUB_WORKFLOW children)
ds workflow export -p 169400948446944 -c 169579299839168 -o ./wf.json -e prod

# 2. Preview the diff against beta (dry-run, no writes)
ds workflow import -f ./wf.json -p 169400948446944 -e beta

# 3. Actually apply the changes (writes + releases ONLINE in topological order)
ds workflow import -f ./wf.json -p 169400948446944 -e beta --confirm

# Optional: leave imported workflows OFFLINE (skip ONLINE release)
ds workflow import -f ./wf.json -p 169400948446944 -e beta --confirm --no-release

How it works under the hood:

Topological sort by SUB_WORKFLOW reference — leaves first, root last. Ensures every child code exists in codeMap before the parent is updated.
Per-definition decision:
- If the source code already exists on the target → UPDATE via PUT /workflow-definition/{code} (after OFFLINE if needed).
- Otherwise → CREATE via POST /workflow-definition. DS will assign a new code; the CLI captures it into codeMap (sourceCode → newCode).
Reference rewriting: Before each UPDATE, the CLI rewrites all taskParams.workflowDefinitionCode fields in the task definition list using codeMap, so parents always point at the correct target-side child code.
State machine: Workflows that are ONLINE on the target are automatically taken OFFLINE before editing. This works around DS error code 50008, which is returned when an ONLINE workflow is edited.
Final release: After all writes succeed, workflows are released ONLINE in the same topological order (leaves first). Pass --no-release to skip.
Error hints: DS business error codes 50003 / 50008 / 10168 are translated into actionable suggestions.
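
The ordering and reference-rewriting steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the CLI's actual code: the function names and the `children` input shape are hypothetical, and it assumes taskParams has already been parsed into a dict.

```python
from typing import Dict, List


def leaves_first_order(children: Dict[int, List[int]]) -> List[int]:
    """Order workflow codes so every child comes before its parent."""
    order: List[int] = []
    state: Dict[int, str] = {}  # code -> "visiting" or "done"

    def visit(code: int) -> None:
        if state.get(code) == "done":
            return
        if state.get(code) == "visiting":
            raise ValueError(f"cyclic SUB_WORKFLOW reference at {code}")
        state[code] = "visiting"
        for child in children.get(code, []):
            visit(child)
        state[code] = "done"
        order.append(code)  # appended only after all children

    for code in children:
        visit(code)
    return order


def rewrite_sub_workflow_refs(tasks: List[dict], code_map: Dict[int, int]) -> List[dict]:
    """Return copies of tasks whose workflowDefinitionCode uses target-side codes."""
    rewritten = []
    for task in tasks:
        params = task.get("taskParams")
        if isinstance(params, dict) and params.get("workflowDefinitionCode") in code_map:
            new_code = code_map[params["workflowDefinitionCode"]]
            task = {**task, "taskParams": {**params, "workflowDefinitionCode": new_code}}
        rewritten.append(task)
    return rewritten
```

With `children = {1: [2, 3], 2: [4], 3: [], 4: []}` the order is `[4, 2, 3, 1]`, so by the time workflow 1 is written, every child it references already has an entry in the code map.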

Task

ds task list -p <project> -w <workflow> - List tasks in a workflow
ds task create -p <project> -w <workflow> -n <name> -t <type> - Create task
ds task delete -p <project> -c <code> - Delete task

Schedule

ds schedule list -p <project> - List schedules
ds schedule create -p <project> -w <code> --cron <expr> - Create schedule
ds schedule online -p <project> -i <id> - Bring schedule online
ds schedule offline -p <project> -i <id> - Take schedule offline

Run / Execution

ds run start -p <project> -c <code> - Start workflow
ds run stop -p <project> -i <instance> - Stop workflow
ds run pause -p <project> -i <instance> - Pause workflow
ds run resume -p <project> -i <instance> - Resume workflow

Instance

ds instance list -p <project> - List workflow instances
- -s, --state <state> - Filter by state (SUCCESS/FAILURE/RUNNING_EXECUTION/etc.)
- -w, --workflow <code> - Filter by workflow definition code
- --search <val> - Search by instance name
- --start-date <date> - Filter start date
- --end-date <date> - Filter end date
- --sort <field> - Sort by: duration|startTime (client-side)
- --order <dir> - Sort order: asc|desc
- -e, --env <name> - Environment name
ds instance info -p <project> -i <id> - Show instance details
ds instance tasks -p <project> -i <id> - List task instances (with duration, pid, sub-workflow drill-down)
ds instance top-n -p <project> - Show top-N longest-running workflow instances
- -n, --size <n> - Number of instances (default 10)
- -s, --state <state> - Filter by state
- -w, --workflow <code> - Filter by workflow definition code
- --start-date <date> / --end-date <date> - Date range filter
- -e, --env <name> - Environment name
ds instance gantt -p <project> -i <id> - Show Gantt-style task timeline
- -e, --env <name> - Environment name
ds instance delete -p <project> -i <id> - Delete instance

Datasource

ds datasource list - List datasources
ds datasource test -i <id> - Test connection
ds datasource tables -i <id> - List tables
ds datasource columns -i <id> -t <table> - List columns
ds datasource create -n <name> -t <type> - Create datasource
ds datasource delete -i <id> - Delete datasource

Query

ds query run -i <id> -s <sql> - Run SQL query
ds query preview -i <id> -t <table> - Preview table data (10 rows)

Statistics

ds stats workflow -p <project> - Workflow statistics (total/success/fail/avg/max/min duration)
- --days <n> - Days to look back (default 7)
- -w, --workflow <code> - Filter by workflow definition code
- -e, --env <name> - Environment name
ds stats task -p <project> - Task statistics per workflow instance
- --days <n> - Days to look back (default 7)
- -e, --env <name> - Environment name
ds stats bottleneck -p <project> - Identify bottleneck workflows by total/avg/max duration and failure rate
- --days <n> - Days to look back (default 7)
- --top <n> - Number of top workflows to show (default 10)
- -e, --env <name> - Environment name
ds stats queue - Queue statistics
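
As a rough picture of what a bottleneck ranking like ds stats bottleneck computes, the sketch below groups instances by workflow and ranks them by total duration. The input field names (`workflow`, `state`, `duration_seconds`) are hypothetical, and the real command may rank or weight differently.

```python
from collections import defaultdict
from typing import Dict, List


def bottlenecks(instances: List[dict], top: int = 10) -> List[dict]:
    """Aggregate per-workflow duration stats and return the top-N by total duration."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for inst in instances:
        groups[inst["workflow"]].append(inst)

    rows = []
    for name, runs in groups.items():
        durations = [r["duration_seconds"] for r in runs]
        failures = sum(1 for r in runs if r["state"] == "FAILURE")
        rows.append({
            "workflow": name,
            "runs": len(runs),
            "total": sum(durations),
            "avg": sum(durations) / len(durations),
            "max": max(durations),
            "min": min(durations),
            "failure_rate": failures / len(runs),
        })
    # Longest cumulative runtime first
    rows.sort(key=lambda r: r["total"], reverse=True)
    return rows[:top]
```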

Monitor

ds monitor master - View master nodes
ds monitor worker - View worker nodes
ds monitor database - Check database health

Log (Enhanced, no temp files)

ds log view -t <task-instance> - View task log with smart filters
- --tail <n> - Show only the last N lines
- --grep <pattern> - Filter lines matching the regex
- --exec-only - Show only execution output (skip DS metadata & script source)
- --full - Auto-paginate to fetch the complete log (handles truncation)
- --no-truncate - Do not truncate long lines (default: cap at 500 chars with note)
- --meta - Show task metadata (state, host, pid, duration) before log
- --limit <n> - Limit number of lines (default 100)
- --skip-lines <n> - Skip first N lines
- -e, --env <name> - Environment name
ds log throughput -t <task-instance> - Analyze SeaTunnel throughput trends from task log
- Auto-paginates the full log, extracts Read/Write row snapshots
- Computes delta trends and average rates (rows/second)
- -e, --env <name> - Environment name

All log output is streamed to stdout — no temporary files are generated.
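
The delta and average-rate arithmetic behind ds log throughput can be illustrated as follows. The sketch assumes the Read/Write snapshots have already been parsed out of the log into `(elapsed_seconds, read_rows, write_rows)` tuples; that tuple shape and the function name are assumptions, not the CLI's internal format.

```python
from typing import List, Optional, Tuple

Snapshot = Tuple[float, int, int]  # (elapsed_seconds, read_rows, write_rows)


def throughput(snapshots: List[Snapshot]) -> Tuple[List[dict], Optional[dict]]:
    """Return per-interval rates and the overall average rate in rows/second."""
    deltas = []
    for (t0, r0, w0), (t1, r1, w1) in zip(snapshots, snapshots[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order snapshots
        deltas.append({
            "seconds": dt,
            "read_rate": (r1 - r0) / dt,
            "write_rate": (w1 - w0) / dt,
        })

    if len(snapshots) < 2 or snapshots[-1][0] <= snapshots[0][0]:
        return deltas, None  # not enough data for an average

    total = snapshots[-1][0] - snapshots[0][0]
    average = {
        "read_rate": (snapshots[-1][1] - snapshots[0][1]) / total,
        "write_rate": (snapshots[-1][2] - snapshots[0][2]) / total,
    }
    return deltas, average
```

A falling `read_rate` across successive intervals while the job is still running is the kind of trend this view is meant to surface.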

Diagnose — One-click False-Success Detection

Encapsulates the workflow-instance analysis flow discovered during the openlibing GitCode issue-processing investigation.

ds diagnose instance -p <project> -i <instanceId> - Diagnose a workflow instance
- Lists state, tasks, false-success scan, SUB_WORKFLOW drill-down suggestions
- --limit <n> - Max leaf tasks to inspect (default 20)
- -e, --env <name> - Environment name
ds diagnose task -t <task-instance> - Diagnose a single task instance
- Shows metadata, false-success check, optional log dump
- --show-log - Show last 30 lines
- --show-exec - Show only execution output
- --show-meta - Show full task metadata JSON
- -e, --env <name> - Environment name
ds diagnose status -p <project> - Spot anomalies
- --state <state> - Filter by state (KILL/STOP/SUCCESS/FAILURE)
- --workflow <code> - Filter by workflow definition code
- --limit <n> - Max instances to show (default 30)
- -e, --env <name> - Environment name
- Marks "⚡快速成功(可疑)" (fast success, suspicious) when a SUCCESS instance finished in under 5 minutes (possible false success), and "🛑被中断" (interrupted) for STOP/KILL

False-success criteria (DS v3.x false-success pattern):

DS reports SUCCESS AND
pid=0 (DS plugin didn't fork a real Python process) AND
No execution output in the log
Optionally: duration < 5s (inconsistent with claimed SUCCESS)
Excludes: SUB_WORKFLOW/SUB_PROCESS tasks (pid=0 is normal for these)
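
Expressed as a predicate, the criteria above look roughly like this. The parameter names are hypothetical, and how the CLI decides that a log has "no execution output" is not shown here; the sketch simply takes a line count.

```python
from typing import Optional

SUB_WORKFLOW_TYPES = {"SUB_WORKFLOW", "SUB_PROCESS"}


def is_false_success(
    state: str,
    task_type: str,
    pid: int,
    exec_output_lines: int,
    duration_seconds: Optional[float] = None,
    require_short_duration: bool = False,
) -> bool:
    """True when a task reported SUCCESS without evidence that it actually ran."""
    if task_type in SUB_WORKFLOW_TYPES:
        return False  # pid=0 is normal for sub-workflow tasks
    if state != "SUCCESS" or pid != 0 or exec_output_lines > 0:
        return False
    if require_short_duration:
        # Optional extra signal: the task finished in under 5 seconds
        return duration_seconds is not None and duration_seconds < 5
    return True
```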

Analyze — One-Click Failure Analysis

Use this when you need to answer "why did workflow X fail?" — it automatically chains env → project → instance → tasks → diagnose → log summary in a single call.

ds analyze failure -n <name> - One-click failure analysis
- Smart Instance Search: Uses DolphinScheduler API searchVal for server-side search first, then keyword fuzzy matching, then full pagination fallback. Auto-adds [workflow]/[job] prefix if missing from the name
- SUB_WORKFLOW Drill-Down: Recursively resolves SUB_WORKFLOW/SUB_PROCESS tasks to their leaf tasks. When subWorkflowInstanceId is unavailable (returns -), locates the sub-instance via name keyword matching + time proximity scoring
- False-Success Detection: Identifies tasks with SUCCESS state but pid=0 or duration<5s (excluding SUB_WORKFLOW tasks where pid=0 is normal)
- Log Analysis: Fetches and summarizes error/exception lines from failed and false-success tasks, with connection error handling for unreachable Worker log storage
- Root Cause Summary: Provides categorized root cause, sub-workflow drill-down map, and actionable next steps
- --show-log - Show last 50 lines of failed task log
- --throughput - Analyze SeaTunnel throughput trends for failed tasks
- -e, --env <name> - Environment name

Example:

# Analyze why a specific workflow instance failed in prod
ds analyze failure -n "获取构建和构建机对应关系-20260610021000015" -e prod

# Name without [workflow] prefix also works (auto-detected)
ds analyze failure -n "my-workflow-20260610" -e prod

# With log details and SeaTunnel throughput analysis
ds analyze failure -n "my-workflow-20260610" -e prod --show-log --throughput

Search Strategy (3-layer, fast → comprehensive):

searchVal exact search: Uses the full name plus [workflow]/[job] prefix variants via the API searchVal parameter (fastest; still works when a project has more than 1,000 instances)
searchVal keyword search: Extracts keywords from the name, searches each via API, then fuzzy-matches results (handles partial names)
Full pagination fallback: Iterates all instances page-by-page (covers edge cases)
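
The three layers can be pictured as a simple fallback chain. In this sketch `search` stands in for a server-side searchVal call and `list_all` for full pagination; both are injected callables, and the keyword splitting and "all keywords present" match are simplified stand-ins for whatever fuzzy matching the CLI really uses.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

PREFIXES = ("[workflow]", "[job]")


def name_variants(name: str) -> List[str]:
    """The name itself, plus prefixed variants when no prefix is present."""
    variants = [name]
    if not name.startswith(PREFIXES):
        variants += [prefix + name for prefix in PREFIXES]
    return variants


def keywords(name: str) -> List[str]:
    """Split a name into search keywords on whitespace, dashes, underscores, brackets."""
    return [k for k in re.split(r"[\s\-_\[\]]+", name) if len(k) >= 2]


def find_instance(
    name: str,
    search: Callable[[str], List[dict]],
    list_all: Callable[[], Iterable[dict]],
) -> Tuple[Optional[dict], str]:
    # Layer 1: exact match on the full name or a prefixed variant
    for variant in name_variants(name):
        for inst in search(variant):
            if inst["name"] == variant:
                return inst, "exact"
    # Layer 2: search per keyword, then keep results containing every keyword
    words = keywords(name)
    for word in words:
        for inst in search(word):
            if all(w in inst["name"] for w in words):
                return inst, "keyword"
    # Layer 3: walk every instance
    for inst in list_all():
        if name in inst["name"]:
            return inst, "pagination"
    return None, "not-found"
```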

SUB_WORKFLOW Drill-Down details:

Recursively resolves SUB_WORKFLOW/SUB_PROCESS tasks up to depth 5
Cycle prevention via visitedSubIds set
When subWorkflowInstanceId is missing, uses name keyword matching + start-time proximity to locate the sub-instance
Deduplicates leaf tasks by task ID
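
A minimal sketch of the drill-down, assuming a hypothetical `list_tasks(instance_id)` callable and task dicts with `id`, `taskType`, and `subWorkflowInstanceId` keys. It treats the root instance as depth 0, which is one possible reading of "depth 5", and it leaves out the name/time-proximity fallback for missing sub-instance IDs.

```python
from typing import Callable, Dict, List, Set

MAX_DEPTH = 5
SUB_TYPES = ("SUB_WORKFLOW", "SUB_PROCESS")


def collect_leaf_tasks(
    instance_id: int,
    list_tasks: Callable[[int], List[dict]],
    max_depth: int = MAX_DEPTH,
) -> List[dict]:
    """Recursively expand sub-workflow tasks into a deduplicated list of leaf tasks."""
    leaves: Dict[int, dict] = {}     # task id -> task, for deduplication
    visited_sub_ids: Set[int] = set()  # instance ids already expanded

    def walk(inst_id: int, depth: int) -> None:
        if inst_id in visited_sub_ids or depth > max_depth:
            return  # cycle or depth limit reached
        visited_sub_ids.add(inst_id)
        for task in list_tasks(inst_id):
            if task["taskType"] in SUB_TYPES:
                sub_id = task.get("subWorkflowInstanceId")
                if sub_id is not None:
                    walk(sub_id, depth + 1)
            else:
                leaves.setdefault(task["id"], task)

    walk(instance_id, 0)
    return list(leaves.values())
```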

Investigate — End-to-End Process Tools

Designed for "let AI investigate this" scenarios. Each sub-command chains multiple atomic tools so a single MCP call returns a complete 360° view.

ds investigate workflow -p <project> -w <workflow> — Single workflow 360° overview
- Combines definition + 7-day statistics + top-N longest instances + recent failures
- --days <n> - Look-back window in days (default 7)
- --top-size <n> - Number of top-N instances to show (default 5)
- -e, --env <name> - Environment name
ds investigate task -t <task-instance> — Single task end-to-end deep-dive
- Shows metadata, false-success check, error/exception log dump, throughput trend
- --show-log - Include last 30 log lines
- --show-exec - Execution output only
- --throughput - SeaTunnel throughput analysis
- --show-meta - Full task metadata JSON
- -e, --env <name> - Environment name
ds investigate health-check [-e <env>] — One-shot cluster health probe
- Master nodes + Worker nodes + DB health + queue stats + recent failures
- Output as a single report ready for AI summarization
ds investigate workflow-timeline -p <project> -i <instance> — Gantt timeline + critical path
- Resolves SUB_WORKFLOW tasks, marks the longest path, surfaces bottlenecks
- -e, --env <name> - Environment name

When to use which:

| User intent | Command |
|---|---|
| "Why is workflow X failing?" | ds analyze failure -n X (one-shot root-cause) |
| "Give me a 360° view of workflow X" | ds investigate workflow -p P -w X (overview) |
| "Drill into this task's log" | ds investigate task -t T (deep-dive) |
| "Is the cluster healthy?" | ds investigate health-check (probe) |
| "Show the timeline of instance Y" | ds investigate workflow-timeline -p P -i Y (timeline) |

Skill Management

ds skill list - List all available skills
ds skill show <name> - Show full skill content
ds skill playbooks - List all multi-step Playbook skills (if any)
ds skill scenario "<text>" - Match a free-text scenario to a Playbook (returns matches if any)
ds skill install - Install skills into agent directories (interactive, in CWD)
- --all - Install to the default agent (.agents/skills) without prompting
- -a, --agents <list> - Agent key to install for (default: agents)
- --target <dir:agentName> - Custom directory (e.g. ./.trae/skills:trae)
- --simulate - Preview without writing files
ds skill uninstall - Remove installed skills
- Same options as install
ds skill where - Show resolved directories based on current working directory
ds skill summary - Show compact summary of all skills

Global Options

-h, --help - Display help
-V, --version - Display version
-e, --env <name> - Specify environment
--json - Output in JSON format
-v, --verbose - Enable verbose mode
--no-color - Disable colored output
-c, --config <path> - Path to config file
--dry-run - Simulate execution

MCP Server (Model Context Protocol)

ds-cli doubles as a full MCP server, exposing all commands as tools for AI agents (Claude Desktop, Trae, Cursor, VS Code, Codex).

Logging: MCP server logs are automatically saved to ~/.ds/log/mcp-<timestamp>.log. Each startup creates a new log file containing server startup info and runtime errors.

Available MCP Tools (50)

Single-point tools (atomic operations) and process-oriented tools (multi-step orchestrations) are mixed below. Process-oriented tools are marked with ★/★★.

| Category | Tool | Description |
|----------|------|-------------|
| Environment | ds_env_list | List all configured environments (MCP env → ds CLI env fallback) |
| Environment | ds_env_current | Show current environment |
| Environment | ds_env_use | Switch environment |
| Environment | ds_env_add | Add MCP env (stored in ~/.config/ds-cli/mcp-envs.json) |
| Project | ds_project_list | List projects (optional env parameter) |
| Project | ds_project_info | Get project metadata (optional env parameter) |
| Project | ds_project_create | Create project (optional env parameter) |
| Project | ds_project_delete | Delete project (optional env parameter) |
| Workflow | ds_workflow_list | List workflow definitions (optional env parameter) |
| Workflow | ds_workflow_info | Get workflow details (optional env parameter) |
| Workflow | ds_workflow_create | Create workflow definition (optional env parameter) |
| Workflow | ds_workflow_release | Publish workflow (optional env parameter) |
| Workflow | ds_workflow_delete | Delete workflow definition (optional env parameter) |
| Task | ds_task_list | List tasks in a workflow (optional env parameter) |
| Task | ds_task_info | Get task definition details (optional env parameter) |
| Task | ds_task_create | Create task in a workflow (optional env parameter) |
| Task | ds_task_delete | Delete task definition (optional env parameter) |
| Schedule | ds_schedule_list | List schedules (optional env parameter) |
| Schedule | ds_schedule_create | Create schedule (optional env parameter) |
| Schedule | ds_schedule_online | Bring schedule online (optional env parameter) |
| Schedule | ds_schedule_offline | Take schedule offline (optional env parameter) |
| Run | ds_run_start | Start a workflow instance (optional env parameter) |
| Run | ds_run_stop | Stop a running instance (optional env parameter) |
| Run | ds_run_pause | Pause a running instance (optional env parameter) |
| Run | ds_run_resume | Resume a paused instance (optional env parameter) |
| Monitor | ds_monitor_master | View master node status (optional env parameter) |
| Monitor | ds_monitor_worker | View worker node status (optional env parameter) |
| Monitor | ds_monitor_database | Check database health (optional env parameter) |
| Instance | ds_instance_list | List workflow instances with sort/order/date filters (optional env parameter) |
| Instance | ds_instance_info | Get instance details (optional env parameter) |
| Instance | ds_instance_tasks | List tasks with duration/pid/sub-workflow (optional env parameter) |
| Instance | ds_instance_topn | ★ Top-N longest-running instances (optional env parameter) |
| Instance | ds_instance_gantt | Gantt-style task timeline (optional env parameter) |
| Instance | ds_instance_delete | Delete workflow instance (optional env parameter) |
| Log | ds_log_view | View task log with tail/grep/meta/full/exec-only (optional env parameter) |
| Log | ds_log_throughput | ★ SeaTunnel throughput trend analysis (optional env parameter) |
| Diagnose | ds_diagnose_instance | ★ One-click false-success detection (optional env parameter) |
| Diagnose | ds_diagnose_task | Single task diagnosis (optional env parameter) |
| Diagnose | ds_diagnose_status | Recent diagnose history with state/workflow filters (optional env parameter) |
| Analyze | ds_analyze_failure | ★★ One-click failure analysis: auto-locates instance (searchVal + keyword + pagination), drills down SUB_WORKFLOW to leaf tasks, identifies failed & false-success tasks (excluding SUB_WORKFLOW), summarizes error logs, provides root cause (optional env parameter) |
| Investigate | ds_investigate_workflow | ★★ Workflow 360°: definition + 7d stats + top-N + recent failures (optional env parameter) |
| Investigate | ds_investigate_task | ★★ Task deep-dive: meta + false-success + log dump + throughput (optional env parameter) |
| Investigate | ds_health_check | ★★ Cluster health probe: master + worker + DB + queue + recent failures (optional env parameter) |
| Investigate | ds_workflow_timeline | ★★ Gantt timeline + critical path with SUB_WORKFLOW resolution (optional env parameter) |
| Datasource | ds_datasource_list | List datasources (optional env parameter) |
| Query | ds_query_run | Execute SQL query (optional env parameter) |
| Stats | ds_stats_workflow | Workflow statistics with avg/max/min duration (optional env parameter) |
| Stats | ds_stats_task | Task statistics per workflow instance (optional env parameter) |
| Stats | ds_stats_bottleneck | ★ Bottleneck workflow detection (optional env parameter) |
| Skill | ds_skill_list | List available AI Agent skills |
| Skill | ds_skill_show | Show skill full content |
| Skill | ds_skill_scenario | Match a free-text scenario to a Playbook (returns matches if any) |

Dynamic Recommendations (`recommended_next`)

Every MCP tool response auto-appends a → recommended_next: block so AI agents can chain follow-up tools without losing context. Two sources contribute:

Explicit declaration — defineTool(..., nextSteps) adds curated suggestions to the tool description, visible in tools/list (lets the agent plan ahead).
Heuristic scan — at runtime, the server scans the tool's stdout for keywords (FAIL, False-Success, SeaTunnel, SUB_WORKFLOW, Gantt, etc.) and injects up to 4 relevant follow-up tool calls.

Example (truncated response from ds_analyze_failure):

Root cause: SQL syntax error in task `dwd_user_active` (pid=0, no execution output)
False-Success scan: 1 task(s) flagged
Sub-Workflow drill-down: 1 sub-instance, 3 leaf task(s) inspected

→ recommended_next:
  1. ds_investigate_task (because false-success detected; drill into log + throughput)
  2. ds_log_view (because FAIL detected; fetch last 50 lines of error log)
  3. ds_instance_gantt (because SUB_WORKFLOW drill-down; visualize the timeline)
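
The heuristic scan can be approximated as a keyword lookup capped at four suggestions. The keyword-to-tool table below is purely illustrative (the real edges are described as living in skills/recommendation-graph.yaml), and the matching here is a plain case-sensitive substring test, which may not be how the server matches.

```python
from typing import List, Tuple

# Hypothetical mapping, for illustration only
KEYWORD_TO_TOOL: List[Tuple[str, str]] = [
    ("False-Success", "ds_investigate_task"),
    ("FAIL", "ds_log_view"),
    ("SUB_WORKFLOW", "ds_instance_gantt"),
    ("SeaTunnel", "ds_log_throughput"),
    ("Gantt", "ds_workflow_timeline"),
]
MAX_SUGGESTIONS = 4


def recommend_next(stdout: str) -> List[str]:
    """Pick follow-up tools whose trigger keyword appears in the tool output."""
    suggestions: List[str] = []
    for keyword, tool in KEYWORD_TO_TOOL:
        if keyword in stdout and tool not in suggestions:
            suggestions.append(tool)
    return suggestions[:MAX_SUGGESTIONS]


def render(tools: List[str]) -> str:
    """Format suggestions as the block appended to a tool response."""
    if not tools:
        return ""
    lines = ["→ recommended_next:"]
    lines += [f"  {i}. {tool}" for i, tool in enumerate(tools, 1)]
    return "\n".join(lines)
```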

Multi-Environment Usage in MCP

All business tools (Project/Workflow/Instance/Log/Diagnose/Datasource/Query/Stats) accept an optional env parameter to specify which environment to use:

No env parameter: Uses the default current environment
With env parameter: Uses the specified environment (must exist in your config)

Example usage (Chat interface):

List projects in the beta environment using ds_project_list with env=beta

Example with tool call:

{
  "name": "ds_project_list",
  "arguments": {
    "env": "beta"
  }
}

If you specify an environment that doesn't exist, you'll get an error message listing all available environments. Use ds_env_list to see all configured environments first.

MCP Configuration for AI Clients

All clients support an optional env field to inject MCP env credentials directly, eliminating the need for ~/.ds/config.json.

Claude Desktop

%APPDATA%\Claude\claude_desktop_config.json (Windows)
~/Library/Application Support/Claude/claude_desktop_config.json (macOS)

Basic (no env — falls back to ds CLI ~/.ds/config.json):

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]
    }
  }
}

With MCP env config (credentials inlined, no ~/.ds/config.json needed):

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"],
      "env": {
        "DS_MCP_ENVS": "{\"current\":\"beta\",\"envs\":{\"beta\":{\"url\":\"<BETA_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"},\"dev\":{\"url\":\"<DEV_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"}}}"
      }
    }
  }
}

Trae IDE

~/.trae/mcp.json

Basic:

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]
    }
  }
}

With MCP env config:

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"],
      "env": {
        "DS_MCP_ENVS": "{\"current\":\"beta\",\"envs\":{\"beta\":{\"url\":\"<BETA_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"},\"dev\":{\"url\":\"<DEV_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"}}}"
      }
    }
  }
}

Cursor

~/.cursor/mcp.json

Basic:

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]
    }
  }
}

With MCP env config:

{
  "mcpServers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"],
      "env": {
        "DS_MCP_ENVS": "{\"current\":\"beta\",\"envs\":{\"beta\":{\"url\":\"<BETA_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"},\"dev\":{\"url\":\"<DEV_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"}}}"
      }
    }
  }
}

VS Code (Copilot Chat)

%APPDATA%\Code\User\mcp.json (note: uses servers key)

Basic:

{
  "servers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]
    }
  }
}

With MCP env config:

{
  "servers": {
    "ds-cli": {
      "command": "npx",
      "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"],
      "env": {
        "DS_MCP_ENVS": "{\"current\":\"beta\",\"envs\":{\"beta\":{\"url\":\"http://<BETA_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\",\"path\":\"\"},\"dev\":{\"url\":\"http://<DEV_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\",\"path\":\"\"}}}"
      }
    }
  }
}

Codex CLI

~/.codex/config.toml (TOML format)

Basic:

[mcp_servers.ds-cli]
command = "npx"
args = ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]

With MCP env config:

[mcp_servers.ds-cli]
command = "npx"
args = ["-y", "@openlibing/ds-cli@latest", "mcp", "start"]
env.DS_MCP_ENVS = "{\"current\":\"beta\",\"envs\":{\"beta\":{\"url\":\"<BETA_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"},\"dev\":{\"url\":\"<DEV_IP>:12345\",\"token\":\"<YOUR_TOKEN>\",\"protocol\":\"http\"}}}"

Tip: Generate these configs automatically with ds mcp config (basic) or ds mcp config --save --client <id> (auto-merge), then manually add the env field with your credentials.
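
Hand-escaping the DS_MCP_ENVS value is error-prone, because it is a JSON document embedded as a string inside another JSON document. One way to produce it is to let a JSON library do the escaping, as in this sketch. The helper name `build_client_config` is hypothetical; the keys mirror the examples above.

```python
import json


def build_client_config(envs: dict, current: str, servers_key: str = "mcpServers") -> dict:
    """Build a client config whose DS_MCP_ENVS value is a compact JSON string."""
    payload = json.dumps({"current": current, "envs": envs}, separators=(",", ":"))
    return {
        servers_key: {
            "ds-cli": {
                "command": "npx",
                "args": ["-y", "@openlibing/ds-cli@latest", "mcp", "start"],
                "env": {"DS_MCP_ENVS": payload},
            }
        }
    }


envs = {
    "beta": {"url": "<BETA_IP>:12345", "token": "<YOUR_TOKEN>", "protocol": "http"},
}
# The outer dumps call escapes the inner quotes automatically
print(json.dumps(build_client_config(envs, "beta"), indent=2))
```

Pass `servers_key="servers"` for the VS Code layout shown above.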

MCP Env Config (separate from ds CLI env)

The MCP server can manage its own environment config independently of ~/.ds/config.json. This suits CI/CD, shared teams, or setups where you don't want credentials on disk.

Config priority (high → low):

DS_MCP_ENVS env var (JSON string)
<cwd>/.ds-mcp.json (project-level)
~/.config/ds-cli/mcp-envs.json (user-level)
ds CLI ~/.ds/config.json (fallback)

Config file format:

{
  "current": "beta",
  "envs": {
    "beta": { "url": "<BETA_IP>:12345", "token": "<YOUR_TOKEN>", "protocol": "http" },
    "dev":  { "url": "<DEV_IP>:12345",   "token": "<YOUR_TOKEN>", "protocol": "http" },
    "prod": { "url": "<PROD_HOST>:12345", "token": "<YOUR_TOKEN>", "protocol": "https" }
  }
}
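
The priority order above amounts to "first source that exists wins". A minimal sketch of that lookup follows. The function name is hypothetical, and it assumes for simplicity that every source, including the ds CLI fallback file, can be read as JSON in the same way, which may not hold for the real ~/.ds/config.json.

```python
import json
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_mcp_config(
    environ: Mapping[str, str], cwd: str, home: str
) -> Tuple[Optional[str], Optional[dict]]:
    """Return (source, config) for the highest-priority config that exists."""
    raw = environ.get("DS_MCP_ENVS")
    if raw:
        return "DS_MCP_ENVS", json.loads(raw)

    candidates = [
        Path(cwd) / ".ds-mcp.json",                           # project-level
        Path(home) / ".config" / "ds-cli" / "mcp-envs.json",  # user-level
        Path(home) / ".ds" / "config.json",                   # ds CLI fallback
    ]
    for path in candidates:
        if path.is_file():
            return str(path), json.loads(path.read_text(encoding="utf-8"))
    return None, None
```

Calling it with `os.environ`, `os.getcwd()`, and `Path.home()` would mimic a real lookup; ds mcp env where reports the source the server actually resolved.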

ds mcp env subcommands (full management):

# List all MCP envs
ds mcp env list

# Show current MCP env
ds mcp env current

# Switch MCP env (in-memory)
ds mcp env use beta

# Add or REPLACE an MCP env (upsert — no duplicate warning)
ds mcp env add beta -u <BETA_IP>:12345 -t <YOUR_TOKEN> -p http

# Remove an MCP env
ds mcp env remove beta

# Show raw config JSON
ds mcp env show

# Show config source priority
ds mcp env where

ds mcp config — print / auto-save client configs:

# Print all client configs
ds mcp config

# Print specific client
ds mcp config --client claude-desktop

# Use global `ds` command instead of npx
ds mcp config --client trae --use-global

# Auto-merge into config file
ds mcp config --client claude-desktop --save

MCP CLI Commands

# Start MCP server (stdio mode)
ds mcp start
ds mcp              # same as start

# List all MCP tools
ds mcp tools

# Show MCP env config
ds mcp env list
ds mcp env show
ds mcp env where
ds mcp env add <name> -u <url> -t <token>
ds mcp env use <name>
ds mcp env remove <name>

# Generate client configs
ds mcp config
ds mcp config --client <id>

AI Skills

ds-cli ships with built-in YAML Skills that can be installed into AI agent directories. The default install location is .agents/skills/. For other agent layouts, pass --target <dir>:<agentName>.

Single-Point Skills (atomic commands)

| Skill | Description |
|-------|-------------|
| ds-env | Environment management |
| ds-project | Project management |
| ds-workflow | Workflow management |
| ds-workflow-sync | Cross-environment workflow definition sync (export + import with topo order, codeMap rewriting, ONLINE/OFFLINE state-machine, friendly error hints) |
| ds-task | Task management |
| ds-schedule | Schedule management |
| ds-execute | Execution control |
| ds-instance | Instance queries |
| ds-datasource | Datasource management |
| ds-query | SQL queries |
| ds-stats | Statistics |
| ds-inspect | Interactive instance analysis (env picker → instance lookup → branching by state) |

Installing Skills

Skills are installed into the current working directory (CWD) by default. The default target is <CWD>/.agents/skills/. Run ds skill install from your project directory.

# 1. Show resolved directories
ds skill where

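# 2. Interactive install: enter a single base directory (relative to the current project).
#    The script creates a skills/ subdirectory under it and generates every SKILL.md.
#    Examples: enter .agents   → .agents/skills/
#              enter my-agent  → my-agent/skills/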
ds skill install

# 3. Install to the default agent (.agents/skills) without prompting
ds skill install --all

# 4. Install for the default agent explicitly
ds skill install -a agents

# 5. Custom directory (any path + agent name) — use this for trae/opencode/codex
ds skill install --target './.trae/skills:trae'
ds skill install --target './.opencode/skills:opencode'
ds skill install --target './.codex/skills:codex'   # also enables --user for ~/.codex/skills

# 6. Preview only (no files written)
ds skill install --simulate

# 7. Uninstall
ds skill uninstall --all

Programmatic Loading (Node.js)

const { loadAllSkills, getSkillsSummary, loadSkill } = require('@openlibing/ds-cli/src/utils/skill-loader');

// Get all skills as summary text for AI
console.log(getSkillsSummary());

// Load individual skill
const diagnoseSkill = loadSkill('ds-diagnose');
console.log(diagnoseSkill.commands);

Diagnosing False-Success Issues

The false-success pattern is a DS v3.x anomaly where a task reports SUCCESS but the Python/SQL script was never actually executed. Symptoms:

pid=0 (DS plugin failed before forking)
Log contains only the script source code (the second echo of the script body)
duration < 5s (contradicting the claimed success)
Downstream tables not updated
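The first three symptoms can be checked mechanically once the task metadata and log are in hand. The sketch below is a hypothetical heuristic built only from the list above; ds diagnose may apply different or additional rules, and the field names (state, pid, durationSec, scriptLines, logLines) are assumptions made for illustration.

```javascript
// Returns the list of false-success signals found on a task record.
// An empty list means none of the listed symptoms matched.
function falseSuccessSignals(task) {
  // Only tasks that claim success can be false successes.
  if (task.state !== 'SUCCESS') return [];

  const signals = [];
  if (task.pid === 0) signals.push('pid=0');
  if (task.durationSec < 5) signals.push('duration<5s');

  // The log is suspicious when every non-empty line is just a line of the script.
  const script = new Set(task.scriptLines.map((line) => line.trim()).filter(Boolean));
  const logBody = task.logLines.map((line) => line.trim()).filter(Boolean);
  if (logBody.length > 0 && logBody.every((line) => script.has(line))) {
    signals.push('log-only-echoes-script');
  }
  return signals;
}

const signals = falseSuccessSignals({
  state: 'SUCCESS',
  pid: 0,
  durationSec: 2,
  scriptLines: ['import sys', 'print("loading table")'],
  logLines: ['import sys', 'print("loading table")'],
});
console.log(signals);
```

The fourth symptom (downstream tables not updated) needs a data-side check and is deliberately left out of this sketch.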

Quick diagnosis flow:

# 1. Find the longest-running instances
ds instance top-n -p <project> -n 10

# 2. Identify bottleneck workflows
ds stats bottleneck -p <project> --days 7 --top 10

# 3. One-click diagnose
ds diagnose instance -p <project> -i 2291

# 4. Drill into a suspicious task
ds diagnose task -t 4877

# 5. Inspect the raw log (with full-log pagination)
ds log view -t 4877 --full --tail 50

# 6. Filter to just execution output
ds log view -t 4877 --exec-only

# 7. Analyze SeaTunnel throughput trends
ds log throughput -t 4877

Common root causes (in order of frequency):

Worker resource contention — multiple parallel tasks saturate a single worker
Python interpreter missing or decrypt-tool.jar failed to load
Working directory permission denied — /tmp/dolphinscheduler/exec/process/<taskId>/
Worker group misconfiguration — workerGroup doesn't exist or has no workers
Task timeout too short

DolphinScheduler v3.x Compatibility

Several API paths changed in DS 3.x. ds-cli calls the v3 path first and falls back to the v2 path automatically:

| Resource | v2 path | v3 path (used by ds-cli) |
|----------|---------|--------------------------|
| Projects | /projects/list-paging | /projects/created-and-authed |
| Workflow defs | /workflow-definition/list-paging | /workflow-definition/list |
| Workflow tasks | /task-definition/list-by-workflow | /workflow-definition/{code}/tasks |
| Workflow instances | /workflow-instances/list-paging | /workflow-instances (no suffix) |
| Log query | pageNum=... | pageNo=... (we use pageNo) |

If your DS version is 2.x, ds-cli will still work because the call paths are usually backwards-compatible, but you may see some fields default to - when a v3-only field is missing.
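A v3-first, v2-fallback call can be sketched roughly as follows. This is an illustration, not ds-cli's implementation: the function name `getWithFallback`, the injected `request` callback, and the choice to fall back only on an HTTP 404 are all assumptions. The two paths are the Projects row from the table above.

```javascript
// Path pair taken from the "Projects" row of the compatibility table.
const PATHS = {
  projects: { v3: '/projects/created-and-authed', v2: '/projects/list-paging' },
};

// `request(path)` is a caller-supplied async function. It is assumed to
// resolve with the response data and to reject with an error that carries
// a numeric `status` property.
async function getWithFallback(request, resource) {
  const { v3, v2 } = PATHS[resource];
  try {
    return await request(v3);
  } catch (err) {
    // Assumed policy: only a missing endpoint justifies trying the older path.
    if (err.status !== 404) throw err;
    return request(v2);
  }
}
```

Injecting `request` keeps the sketch independent of any particular HTTP client and makes the fallback path easy to exercise with a stub.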

Configuration

Configuration is stored in ~/.ds/config.json (mode 0o600):

{
  "default": "production",
  "envs": {
    "dev": {
      "url": "127.0.0.1:12345",
      "token": "your-token-here",
      "protocol": "http",
      "timeout": 30000
    },
    "beta": {
      "url": "<beta-ip>:12345",
      "token": "beta-token-here",
      "protocol": "https",
      "timeout": 30000
    },
    "production": {
      "url": "<prod-ip>:12345",
      "token": "prod-token-here",
      "protocol": "http",
      "timeout": 30000
    }
  }
}

Field semantics:

url — IP:PORT only, no http:// or https:// prefix (v0.4.0+). The CLI strips any prefix it finds, so values that still include one are accepted.
protocol — http or https (lowercased on write, defaults to http).
host — not stored on disk; built at runtime as ${protocol}://${url}. This eliminates the historic inconsistency where host: http://... and protocol: https could coexist in the same file.
timeout — request timeout in ms (default 30000).
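The field rules above amount to a small normalization step on write and a derived value on read. The sketch below shows one way to express them; `toStoredEnv` and `runtimeHost` are hypothetical names, and the real logic in src/utils/config.js may differ.

```javascript
// Shape an env entry the way the field semantics describe it on disk:
// bare IP:PORT url, lowercased protocol, default timeout, no host field.
function toStoredEnv({ url, token, protocol = 'http', timeout = 30000 }) {
  return {
    url: url.replace(/^https?:\/\//i, ''), // strip any scheme prefix
    token,
    protocol: protocol.toLowerCase(),
    timeout,
  };
}

// host is derived at runtime and never persisted.
function runtimeHost(env) {
  return `${env.protocol}://${env.url}`;
}

const stored = toStoredEnv({
  url: 'http://127.0.0.1:12345',
  token: 'your-token-here',
  protocol: 'HTTPS',
});
console.log(stored.url);          // 127.0.0.1:12345
console.log(runtimeHost(stored)); // https://127.0.0.1:12345
```

Deriving host from a single protocol field is what rules out a file in which the scheme in the URL and the protocol value disagree.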

Simplified usage: Only url and token are required for each environment. Add --protocol https when adding the env if you need TLS.

Legacy TOML: ds-cli auto-detects a pre-existing ~/.ds/config.toml on first load, converts it to JSON, and renames the original to config.toml.bak (one-time migration, irreversible). After that the JSON file is the only source of truth.

Architecture

ds-cli is organized as follows:

src/services/api.js — Axios-based HTTP client with token auth, retry/back-off, error mapping (DS100–DS599 codes), and timeout enforcement.
src/services/auth.js — Token-based auth (no username/password).
src/utils/config.js — ~/.ds/config.json I/O with mode 0o600, multi-env management, legacy TOML auto-migration.
src/utils/skill-loader.js — YAML skill + playbook parser with playbook: block awareness, getAllPlaybooks() / matchPlaybook(text).
src/commands/ — One file per command domain (analyze, diagnose, investigate, instance, skill, …). Each command is a commander sub-command module.
src/mcp-server.js — Exposes CLI commands as MCP tools via @modelcontextprotocol/sdk. Supports nextSteps declaration + runtime recommended_next injection.
bin/ds — CLI entry, registers all command modules.
skills/*.yaml — Skill + Playbook definitions consumed by AI agents (installed to .agents/skills/ by default).
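For readers unfamiliar with the retry/back-off behaviour mentioned for src/services/api.js, the general technique looks roughly like this. It is a generic exponential back-off helper, not code from ds-cli: the name `withRetry`, the retry count, the base delay, and the "retry on any error" policy are illustrative assumptions.

```javascript
// Run `fn` until it succeeds or the retry budget is exhausted.
// The wait doubles after each failed attempt: base, 2x base, 4x base, ...
async function withRetry(fn, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // no wait after the final attempt
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

A production client would usually retry only on transient failures such as timeouts or 5xx responses and cap the maximum delay; both are omitted here for brevity.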

License

Apache-2.0
