bailian-ai-toolkit v1.0.1 (published 16 days ago)

Claude Code Skill - Aliyun Bailian (百炼) AI Toolkit. One-line commands for image/video generation/editing, TTS, ASR, vision, chat, web search, and file upload via DashScope.

🚀 Bailian AI Toolkit

Aliyun Model Studio (DashScope) CLI Toolkit — One-line commands for all AI generation tasks.

bailian-ai-toolkit is a Claude Code / OpenClaw skill that wraps the Aliyun Bailian (百炼) / DashScope AI platform into a single, unified CLI. Generate images, edit videos, synthesize speech, recognize voice, run vision understanding, chat with LLMs, search the web, and upload files — all with one-line commands.

✨ Features

| Category | Capability | Models |
|----------|------------|--------|
| 🎨 Image | Text-to-image, image editing, style transfer | qwen-image-2.0 |
| 🎬 Video | Text-to-video, image-to-video, video editing | happyhorse-1.0-t2v, happyhorse-1.0-video-edit |
| 🗣️ TTS | Text-to-speech with 50+ voices | cosyvoice-v3-flash |
| 👂 ASR | Speech recognition, speaker diarization | fun-asr |
| 👁️ Vision | Image/video understanding, OCR | qwen-vl-max |
| 💬 Chat | Multi-model text generation | qwen3.6-plus, qwen3.6-max, deepseek-v3 |
| 🌐 Search | Web search with AI summarization | DashScope MCP |
| 📤 Upload | File upload to OSS (48h expiry) | OSS |

📦 Installation

1. Install the CLI

npm install -g bailian-cli

2. Configure API Key

# Get your API key from: https://bailian.console.aliyun.com/
bl auth login --api-key sk-xxxxxxxxxxxx

# Verify
bl auth status --output json
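
Before scripting against the CLI, it can help to confirm that both installation steps took effect. The function below is a minimal sketch, not part of the toolkit: `bl_preflight` is a hypothetical helper name, and it assumes that `bl auth status` exits with a non-zero status when no valid key is configured.

```shell
# Preflight check: is the bl CLI installed, and is an API key configured?
# Assumption: `bl auth status` exits non-zero when authentication is missing.
bl_preflight() {
  if ! command -v bl >/dev/null 2>&1; then
    echo "bl not found; run: npm install -g bailian-cli" >&2
    return 1
  fi
  if ! bl auth status --output json >/dev/null 2>&1; then
    echo "bl is installed but not authenticated; run: bl auth login --api-key <key>" >&2
    return 2
  fi
  echo "bl is ready"
}
```

Calling `bl_preflight || exit 1` at the top of a script stops early with a readable message instead of failing midway through a generation job.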

3. Install the Skill (for Claude Code / OpenClaw)

Copy SKILL.md to your skills directory, or use the skill manager.

🚀 Quick Start

# 🎨 Generate an image
bl image generate --prompt "A cyberpunk city at night" --n 4 --out-dir ./out/

# 🎬 Create a video
bl video generate --prompt "Ocean waves at sunset" --download sunset.mp4

# 🗣️ Synthesize speech
bl speech synthesize --text "你好，世界" --voice <voice_id> --out hello.mp3

# 👂 Transcribe audio
bl speech recognize --url ./meeting.mp3 --language zh

# 👁️ Understand an image
bl vision describe --image ./photo.jpg

# 💬 Chat with LLM
bl text chat --message "Explain quantum computing in simple terms"

# 🌐 Search the web
bl search web --query "Latest AI trends 2026"

# 📤 Upload a file
bl file upload ./document.pdf

📖 Detailed Usage

🎨 Image Generation

# Basic generation (4 images, 1:1 aspect ratio)
bl image generate --prompt "A cat in a spacesuit strolling on Mars" --n 4 --out-dir ./out/

# Custom size
bl image generate --prompt "..." --size 16:9 --n 2 --out-dir ./wallpapers/

# Custom output prefix
bl image generate --prompt "..." --n 6 --out-dir ./images/ --out-prefix my-image

# Disable automatic prompt expansion
bl image generate --prompt "..." --no-prompt-extend
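
When there are many prompts, the flags above can be driven from a text file. This is a rough sketch using only the flags shown in this section; `generate_from_file`, the one-prompt-per-line file layout, and the `img-N` prefix scheme are illustrative choices rather than anything the CLI prescribes.

```shell
# Generate one image per non-empty line of a prompts file.
# The prompts file is read on fd 3 so that bl's own stdin is left alone.
generate_from_file() {
  prompts_file=$1
  out_dir=$2
  i=0
  while IFS= read -r prompt <&3 || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue          # skip blank lines
    i=$((i + 1))
    bl image generate --prompt "$prompt" --n 1 \
      --out-dir "$out_dir" --out-prefix "img-$i" || return 1
  done 3<"$prompts_file"
  echo "generated $i prompt(s)"
}
```

For example, `generate_from_file prompts.txt ./out/` issues one `bl image generate` call per prompt and stops at the first failure.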

✂️ Image Editing

# Background replacement
bl image edit --image ./photo.png --prompt "Replace the background with a beach sunset"

# Multi-image compositing
bl image edit --image ./a.png --image ./b.png --prompt "Combine the two images into a single collage"

# Object removal
bl image edit --image ./photo.png --prompt "Remove the people in the background"

# Style transfer
bl image edit --image ./photo.png --prompt "Convert to watercolor style"

🎬 Video Generation

# Text-to-video
bl video generate --prompt "Time-lapse of a sunset by the sea" --download sunset.mp4

# Image-to-video
bl video generate --image ./photo.jpg --prompt "Bring the scene to life with motion" --download animated.mp4

# Video editing (style transfer)
bl video edit --video ./input.mp4 --prompt "Convert to anime style" --download anime.mp4

# Video editing (object replacement)
bl video edit --video ./input.mp4 --prompt "Replace the clothing" --ref-image ./clothes.png

# Async mode (don't wait for completion)
bl video generate --prompt "..." --no-wait

🗣️ Speech Synthesis (TTS)

# List available voices
bl speech synthesize --list-voices --model cosyvoice-v3-flash

# Chinese TTS
bl speech synthesize --text "你好，欢迎使用阿里云百炼" --voice <voice_id> --out output.mp3

# English TTS
bl speech synthesize --text "Hello, welcome to Alibaba Cloud" --voice <voice_id> --language en --out output.mp3

👂 Speech Recognition (ASR)

# Basic recognition (local file)
bl speech recognize --url ./meeting.mp3 --language zh

# Speaker diarization
bl speech recognize --url ./meeting.wav --diarization --speaker-count 3

# URL-based recognition
bl speech recognize --url https://example.com/audio.mp3
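
To transcribe a folder of recordings, the basic command can be wrapped in a loop. The sketch below is illustrative only: `transcribe_dir` is a hypothetical helper, and it assumes that with `--output json` the result is written to stdout, which this README does not state explicitly.

```shell
# Transcribe every .mp3 in a directory, saving one JSON file per recording.
# Assumption: `--output json` makes bl print the result to stdout.
transcribe_dir() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  count=0
  for f in "$in_dir"/*.mp3; do
    [ -e "$f" ] || continue               # no matches: the glob stays literal
    name=$(basename "$f" .mp3)
    bl speech recognize --url "$f" --language zh --output json \
      > "$out_dir/$name.json" </dev/null || return 1
    count=$((count + 1))
  done
  echo "transcribed $count file(s)"
}
```

Running `transcribe_dir ./recordings ./transcripts` would leave `./transcripts/meeting.json` next to each `meeting.mp3` it found; change `--language` to match the recordings.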

👁️ Vision Understanding

# Describe an image
bl vision describe --image ./photo.jpg

# Ask a specific question about an image
bl vision describe --image ./screenshot.png --prompt "Which buttons appear in this screenshot?"

# Video understanding
bl vision describe --video ./clip.mp4 --prompt "Summarize the video"

💬 Text Chat

# Basic chat
bl text chat --message "Write a poem in Chinese"

# Multi-turn conversation
bl text chat --message "..." --conversation-id <id>

# Specify model
bl text chat --message "..." --model qwen3.6-max

🌐 Web Search

bl search web --query "AI development trends in 2026" --count 10

📤 File Upload

bl file upload ./document.pdf
# Returns an OSS URL valid for 48 hours

🛒 E-Commerce Product Image Template

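# Front view - white background
# (the trailing backtick is PowerShell line continuation; use \ in bash or zsh)
bl image generate `
  --prompt "Plain black men's short-sleeve T-shirt, Amazon main listing image, white background, professional product photography, front view laid flat" `
  --n 2 --size 1:1 --out-dir ./product/ --out-prefix front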

# Back view
bl image generate `
  --prompt "Plain black men's short-sleeve T-shirt, Amazon main listing image, white background, back view laid flat" `
  --n 2 --size 1:1 --out-dir ./product/ --out-prefix back

# Model wearing
bl image generate `
  --prompt "Young male model wearing a plain black T-shirt, Amazon main listing image, white background, professional fashion photography" `
  --n 2 --size 1:1 --out-dir ./product/ --out-prefix model

# Detail close-up
bl image generate `
  --prompt "Close-up of the collar of a plain black men's T-shirt, clear fabric texture, macro photography" `
  --n 2 --size 1:1 --out-dir ./product/ --out-prefix detail

# Lifestyle scene
bl image generate `
  --prompt "Young man wearing a plain black T-shirt at an outdoor café, natural light, lifestyle photography" `
  --n 2 --size 1:1 --out-dir ./product/ --out-prefix lifestyle
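
The five commands in this template differ only in the prompt and the output prefix, so they can also be driven from a small table. The following is a sketch for POSIX shells under that observation; `product_shots` and the `prefix|prompt` layout are hypothetical conventions, and the English prompts are stand-ins to be replaced with your own product description.

```shell
# Data-driven version of the template: one "prefix|prompt" pair per line.
# The pair list is fed on fd 3 so that bl's own stdin is left alone.
product_shots() {
  out_dir=$1
  while IFS='|' read -r prefix prompt <&3; do
    [ -z "$prefix" ] && continue
    bl image generate --prompt "$prompt" \
      --n 2 --size 1:1 --out-dir "$out_dir" --out-prefix "$prefix" || return 1
  done 3<<'EOF'
front|Plain black men's short-sleeve T-shirt, white background, front view laid flat
back|Plain black men's short-sleeve T-shirt, white background, back view laid flat
model|Young male model wearing a plain black T-shirt, white background, fashion photography
detail|Close-up of the collar of a plain black T-shirt, clear fabric texture, macro photography
lifestyle|Young man wearing a plain black T-shirt at an outdoor cafe, natural light
EOF
}
```

Adding a new shot type then means adding one line to the list rather than copying a whole command; `product_shots ./product/` runs all five.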

⚙️ Global Options

| Flag | Description |
|------|-------------|
| --api-key sk-xxx | Override API Key for this command |
| --region cn/us/intl | Region (default: cn) |
| --output json/text | Output format |
| --quiet | Minimal output |
| --non-interactive | Disable interactive prompts |
| --help | Show command help |
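
For scripts and CI jobs, several of these flags are usually wanted together. The wrapper below is a small sketch: `bl_script` and the `BL_REGION` variable are hypothetical names, and it assumes global flags may be appended after a subcommand's own flags, as in `bl auth status --output json`.

```shell
# Wrapper for unattended use: JSON output, no interactive prompts,
# and a region taken from BL_REGION (hypothetical variable), defaulting to cn.
bl_script() {
  bl "$@" --region "${BL_REGION:-cn}" --output json --non-interactive
}
```

With this in place, `bl_script search web --query "..."` behaves like the plain command but never blocks waiting for input.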

🔑 Authentication

# Login with API Key
bl auth login --api-key sk-xxxxxxxxxxxx

# Check status
bl auth status --output json

# Update CLI
bl update

🔑 Get your API key from the Aliyun Bailian Console: https://bailian.console.aliyun.com/

📋 Key Rules

1. Local files work directly: pass local paths and the CLI uploads them for you.
2. Never hardcode API keys: pass --api-key at run time or use environment variables.
3. Check --help first: every command has comprehensive built-in help.
4. Image generation expands prompts: use --no-prompt-extend to disable this.
5. Video generation is async: the CLI waits by default; use --no-wait for fire-and-forget.
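
One way to follow the rule about API keys in a script is to read the key from the environment and pass it through `--api-key`. This is only a sketch: `BAILIAN_API_KEY` is a variable name made up for this example, not one the CLI is documented to read, and `bl_with_key` is a hypothetical helper.

```shell
# Pass the key from the environment instead of writing it into the script.
# BAILIAN_API_KEY is a hypothetical name chosen for this sketch.
bl_with_key() {
  if [ -z "${BAILIAN_API_KEY:-}" ]; then
    echo "BAILIAN_API_KEY is not set" >&2
    return 1
  fi
  bl "$@" --api-key "$BAILIAN_API_KEY"
}
```

Note that a key passed on the command line can be visible to other users in the process list, so on shared machines `bl auth login` may be the safer choice.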

📁 File Structure

bailian-ai-toolkit/
├── SKILL.md              # Claude Code / OpenClaw skill definition
├── README.md             # This file
└── reference/            # Full command reference (in bailian-cli skill)

🤝 Contributing

Issues and PRs welcome! This is a skill wrapper — the core CLI is at bailian-cli.

📄 License

Built with ❤️ for the AI developer community