doubao-ai-toolkit

v1.0.1

Published

16 days ago

Claude Code Skill - ByteDance Doubao (豆包) AI Toolkit. One-line commands for image/video generation, TTS, ASR, chat, search, and embeddings via Volcengine Ark.

🚀 Doubao AI Toolkit

ByteDance Volcengine Ark AI Toolkit — One-line commands for image generation, video creation, TTS, ASR, chat, search, and embeddings.

doubao-ai-toolkit is a Claude Code / OpenClaw skill that wraps ByteDance's Volcengine Ark (火山方舟) AI platform into a simple, unified CLI. Powered by Doubao (豆包), Seedream, and Seedance models — generate images, create videos, synthesize speech, recognize voice, chat with LLMs, search the web, and compute embeddings, all with one-line commands.

✨ Features

| Category | Capability | Models |
|----------|-----------|--------|
| 🎨 Image | Text-to-image, image-to-image, group generation | doubao-seedream-4-0 |
| 🎬 Video | Text-to-video, image-to-video | doubao-seedance-1-0-pro, -lite, -fast |
| 🗣️ TTS | Text-to-speech with 4+ Chinese voices | zh_female_xiaohe, zh_male_xiaoqiu |
| 👂 ASR | Speech recognition (URL + local files) | fun-asr |
| 💬 Chat | Multi-model text generation (streaming) | doubao-seed-1-8, doubao-pro-32k |
| 🌐 Search | Web + image search | - |
| 🧮 Embedding | Text, image, and video embeddings | - |

📦 Installation

1. Install the CLIs

# Core CLI (image, video, TTS, ASR, chat, search, embedding)
npm install -g coze-coding-dev-sdk

# Image generation specialist (Seedream 4.0 — more control)
npm install -g seedream-ark

# Video generation specialist (Seedance 2.0)
npm install -g seedance
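
After a global install it can help to confirm that the binaries actually landed on PATH. The sketch below assumes the command names used in the examples of this README (coze-coding-ai and seedream); missing_clis is a hypothetical helper name and is not part of any of the packages.

```shell
# Hypothetical helper: print each given command name that is not on PATH.
missing_clis() {
  for cmd in "$@"; do
    # command -v exits non-zero when the command cannot be found
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Usage (command names taken from the examples in this README):
# missing_clis coze-coding-ai seedream
```

An empty result means every listed command was found; any name printed still needs installing or a PATH fix.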

2. Configure API Key

# Windows PowerShell
$env:ARK_API_KEY = "your-api-key-here"

# Linux / macOS
export ARK_API_KEY="your-api-key-here"

🔑 Get your API Key: Volcengine Ark Console
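
A small preflight check can catch a missing key before any request is sent. This is a minimal sketch for Linux / macOS shells; require_ark_key is a hypothetical helper name, and the only thing it relies on from this README is the ARK_API_KEY variable.

```shell
# Hypothetical preflight check: fail early when ARK_API_KEY is unset or empty.
require_ark_key() {
  if [ -z "${ARK_API_KEY:-}" ]; then
    echo "ARK_API_KEY is not set; export it before calling the CLIs" >&2
    return 1
  fi
}

# Usage: only run the command when the key is present.
# require_ark_key && coze-coding-ai chat -p "Explain quantum computing"
```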

3. Install the Skill (for Claude Code / OpenClaw)

Copy SKILL.md to your skills directory, or use the skill manager.

🚀 Quick Start

# 🎨 Generate an image
coze-coding-ai image -p "A cat astronaut on Mars" -o cat.png

# 🎬 Create a video
coze-coding-ai video -p "Ocean waves in slow motion" -d 5 -o wave.json

# 🗣️ Synthesize speech
coze-coding-ai tts -t "你好，欢迎使用豆包"

# 👂 Transcribe audio
coze-coding-ai asr -f ./meeting.mp3

# 💬 Chat with LLM
coze-coding-ai chat -p "Explain quantum computing"

# 🌐 Search the web
coze-coding-ai search -q "Latest AI trends 2026" --count 10

# 🧮 Compute embeddings
coze-coding-ai embedding -t "AI is transforming the world" -d 1024

📖 Detailed Usage

🎨 Image Generation

coze-coding-ai (simplest)

# Basic text-to-image
coze-coding-ai image -p "一只穿太空服的猫在火星上漫步" -o cat.png

# 2K high-res output
coze-coding-ai image \
  -p "纯黑色男士短袖T恤，亚马逊电商主图，纯白背景" \
  --size 2K \
  -o tshirt.png

# 4K ultra-high-res
coze-coding-ai image \
  -p "山水风景画，中国水墨风格，云雾缭绕" \
  --size 4K \
  -o landscape.png

seedream (advanced control)

# Single image
seedream generate \
  --prompt "未来城市天际线，赛博朋克风格" \
  --size 4K \
  --output ./generated/

# Group generation (1–15 images)
seedream generate \
  --prompt "同一只白色猫咪的9种不同表情和姿态" \
  --group \
  --max-images 9 \
  --size 2K \
  --output ./cats/

# Image-to-image (reference image)
seedream generate \
  --prompt "将这张照片转换为油画风格" \
  --image ./photo.jpg \
  --size 2K \
  --output ./styled/

# Custom dimensions
seedream generate \
  --prompt "手机壁纸，极简风格" \
  --size 1080x1920 \
  --output ./wallpapers/

# Dry run (preview only)
seedream generate --prompt "test" --dry-run
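
The dry run can be combined with a loop to preview one prompt at several sizes before spending quota. This sketch assumes that --dry-run may be combined with --size (the examples above only show them separately, so check seedream generate --help); preview_sizes and SEEDREAM_BIN are hypothetical names introduced here.

```shell
SEEDREAM_BIN="${SEEDREAM_BIN:-seedream}"   # overridable, e.g. for testing

# Hypothetical helper: preview the same prompt at several sizes without generating.
# Assumption: --dry-run and --size can be used together.
preview_sizes() {
  prompt="$1"
  shift
  for size in "$@"; do
    "$SEEDREAM_BIN" generate --prompt "$prompt" --size "$size" --dry-run
  done
}

# Usage:
# preview_sizes "手机壁纸，极简风格" 2K 4K 1080x1920
```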

🎬 Video Generation

# Text-to-video (5 seconds)
coze-coding-ai video \
  -p "海浪拍打礁石，慢动作，电影质感" \
  -d 5 \
  -o wave.json

# Text-to-video (10 seconds, custom resolution)
coze-coding-ai video \
  -p "城市夜景延时摄影，车流光轨，4K" \
  -d 10 \
  -s 1920x1080 \
  -o city_night.json

# Image-to-video
coze-coding-ai video \
  -p "让画面中的人物微笑并眨眼" \
  -i https://example.com/portrait.jpg \
  -d 5 \
  -o animate.json

# Fixed camera + no watermark
coze-coding-ai video \
  -p "产品360度旋转展示" \
  --camerafixed \
  --no-watermark \
  -d 5

# Custom model
coze-coding-ai video \
  -p "科幻场景" \
  --model doubao-seedance-1-0-pro-fast-251015 \
  -d 5

# Async callback mode
coze-coding-ai video \
  -p "..." \
  --callback-url https://your-server.com/callback \
  -d 10

Video Model Selection:

| Model ID | Type |
|----------|------|
| doubao-seedance-1-0-pro-fast-251015 | Fast (default) |
| doubao-seedance-1-0-pro-251015 | High quality |
| doubao-seedance-1-0-lite-t2v-250428 | Lite text-to-video |
| doubao-seedance-1-0-lite-i2v-250428 | Lite image-to-video |
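
Since the model IDs are long and easy to mistype, a lookup function can map short aliases to the IDs in the table above. The aliases (fast, quality, lite-t2v, lite-i2v) and the function name video_model are made up for this sketch; only the model IDs come from the table.

```shell
# Hypothetical helper: map a short alias to a model ID from the table above.
video_model() {
  case "$1" in
    fast)     echo "doubao-seedance-1-0-pro-fast-251015" ;;
    quality)  echo "doubao-seedance-1-0-pro-251015" ;;
    lite-t2v) echo "doubao-seedance-1-0-lite-t2v-250428" ;;
    lite-i2v) echo "doubao-seedance-1-0-lite-i2v-250428" ;;
    *)        echo "unknown video model alias: $1" >&2; return 1 ;;
  esac
}

# Usage (flags as shown in the examples above):
# coze-coding-ai video -p "科幻场景" --model "$(video_model quality)" -d 5
```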

🗣️ Speech Synthesis (TTS)

# Basic Chinese TTS
coze-coding-ai tts -t "你好，欢迎使用豆包语音合成"

# Specify speaker
coze-coding-ai tts \
  -t "今天天气真不错，适合出去走走" \
  --speaker zh_female_xiaohe_uranus_bigtts

# Long text
coze-coding-ai tts \
  -t "春眠不觉晓，处处闻啼鸟。夜来风雨声，花落知多少。"

Available Speakers:

| Speaker ID | Description |
|-----------|-------------|
| zh_female_xiaohe_uranus_bigtts | Female - Xiaohe (default) |
| zh_male_xiaoqiu_uranus_bigtts | Male - Xiaoqiu |
| zh_female_qingxin_uranus_bigtts | Female - Qingxin |
| zh_female_shuangkuai_uranus_bigtts | Female - Shuangkuai |

👂 Speech Recognition (ASR)

# URL-based
coze-coding-ai asr -u https://example.com/audio.mp3

# Local file
coze-coding-ai asr -f ./meeting.mp3

# Long audio
coze-coding-ai asr -f ./lecture.wav

# Verbose logging
coze-coding-ai asr -f ./audio.mp3 --verbose
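
For a folder of recordings, the local-file form can be wrapped in a loop. This is a sketch under one loud assumption: that coze-coding-ai asr writes the transcript to stdout, which this README does not confirm (check coze-coding-ai asr --help). transcribe_dir and COZE_BIN are hypothetical names introduced here.

```shell
COZE_BIN="${COZE_BIN:-coze-coding-ai}"   # overridable, e.g. for testing

# Hypothetical helper: transcribe every .mp3 in a directory to <name>.txt.
# Assumption: the transcript is printed on stdout.
transcribe_dir() {
  dir="$1"
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue          # the glob matched nothing
    "$COZE_BIN" asr -f "$f" > "${f%.mp3}.txt" || echo "failed: $f" >&2
  done
}

# Usage:
# transcribe_dir ./recordings
```

A failed file is reported on stderr and the loop moves on, so one bad recording does not stop the batch.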

💬 Text Chat

# Basic chat
coze-coding-ai chat -p "用中文写一首关于春天的诗"

# With system prompt
coze-coding-ai chat \
  -s "你是一个专业的技术文档撰写助手" \
  -p "帮我写一段 REST API 文档"

# Custom model + temperature
coze-coding-ai chat \
  -p "解释量子计算的基本原理" \
  --model doubao-seed-1-8-251228 \
  --temperature 0.3

# Streaming output
coze-coding-ai chat \
  -p "讲一个关于AI的短故事" \
  --stream

Available Chat Models:

doubao-seed-1-8-251228 (default) — Doubao Seed 1.8
doubao-pro-32k-241215 — Doubao Pro 32K
doubao-lite-32k-241215 — Doubao Lite 32K

🌐 Web Search

# Web search
coze-coding-ai search -q "2026年最新AI技术趋势" --count 10

# Image search
coze-coding-ai search \
  -q "埃菲尔铁塔" \
  --type image \
  --count 5

# Custom count
coze-coding-ai search -q "今天天气" --type web --count 3

🧮 Embeddings

# Text embedding
coze-coding-ai embedding -t "人工智能正在改变世界" -d 1024

# Multiple texts
coze-coding-ai embedding \
  -t "第一段文字" \
  -t "第二段文字" \
  -d 1024 \
  -o embeddings.json

# Image embedding
coze-coding-ai embedding --image-url https://example.com/photo.jpg -d 1024

# Video embedding
coze-coding-ai embedding --video-url https://example.com/video.mp4 -d 1024

🛒 E-Commerce Product Image Template

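# Set API Key (bash/zsh, matching the backslash line continuations below; PowerShell: $env:ARK_API_KEY = "your-api-key")
export ARK_API_KEY="your-api-key"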

# White background - front flat lay
coze-coding-ai image \
  -p "纯黑色男士短袖T恤，亚马逊电商主图，纯白背景，正面平铺展示，圆领设计，高级面料质感，专业商业产品摄影，影棚布光" \
  --size 2K \
  -o ./tshirt-front.png

# White background - back flat lay
coze-coding-ai image \
  -p "纯黑色男士短袖T恤，亚马逊电商主图，纯白背景，背面平铺展示，圆领后领设计，高级面料质感，专业产品摄影" \
  --size 2K \
  -o ./tshirt-back.png

# Model wearing
coze-coding-ai image \
  -p "年轻亚洲男模穿着纯黑色圆领短袖T恤，亚马逊电商主图，纯白背景，正面全身展示，专业时尚摄影，自然站姿" \
  --size 2K \
  -o ./tshirt-model.png

# Detail close-up
coze-coding-ai image \
  -p "纯黑色男士T恤领口细节特写，面料纹理清晰可见，亚马逊电商产品图，微距摄影，专业商业摄影" \
  --size 2K \
  -o ./tshirt-detail.png

# Lifestyle scene
coze-coding-ai image \
  -p "年轻男士穿着纯黑色T恤在户外咖啡馆，自然光线，生活方式摄影，亚马逊电商场景图，休闲时尚" \
  --size 2K \
  -o ./tshirt-lifestyle.png

# Batch group generation (seedream)
seedream generate \
  --prompt "纯黑色男士短袖T恤的6种不同角度产品展示，亚马逊电商主图，纯白背景，专业摄影" \
  --group \
  --max-images 6 \
  --size 2K \
  --output ./product-shots/
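
The per-view commands above differ only in the prompt and the output file name, so they can be folded into one function. This is a minimal sketch using only the flags shown in this template (-p, --size, -o); product_shot, COZE_BIN, and the STYLE variable are hypothetical names introduced for illustration.

```shell
COZE_BIN="${COZE_BIN:-coze-coding-ai}"   # overridable, e.g. for testing

# Hypothetical helper: render one product view at 2K into ./tshirt-<view>.png.
product_shot() {
  view="$1"
  prompt="$2"
  "$COZE_BIN" image -p "$prompt" --size 2K -o "./tshirt-$view.png"
}

# Usage: share one style suffix so every view uses the same background terms.
# STYLE="亚马逊电商主图，纯白背景，专业商业产品摄影"
# product_shot front "纯黑色男士短袖T恤，正面平铺展示，$STYLE"
# product_shot back  "纯黑色男士短袖T恤，背面平铺展示，$STYLE"
```

Keeping the shared style terms in one variable makes it less likely that the views drift apart in background or lighting wording.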

🔑 Authentication

Three methods, listed by priority:

# Method 1: Environment variable (recommended)
export ARK_API_KEY="your-api-key-here"

# Method 2: Command-line flag (seedream)
seedream generate --api-key "your-key" --prompt "..."

# Method 3: HTTP Header (coze-coding-ai)
coze-coding-ai image -p "..." -H "Authorization: Bearer your-key"

| Variable | Purpose | Applies To |
|----------|---------|------------|
| ARK_API_KEY | Volcengine Ark API Key | seedream / coze-coding-ai |

📌 coze-coding-ai also supports -H "Authorization: Bearer <key>" without relying on environment variables.

📋 Key Rules

API Key security — Use environment variables, never hardcode in scripts
Image generation — Default model is doubao-seedream-4-0, supports 2K/4K output
Video generation is synchronous by default — coze-coding-ai video waits for completion; --callback-url enables the async callback mode
TTS needs no output path — Audio is returned via API response
ASR supports local files — -f flag auto-encodes to base64 for upload
Search has web/image modes — Switch with --type
Use --help — Every subcommand has built-in documentation

📁 File Structure

doubao-ai-toolkit/
├── SKILL.md              # Claude Code / OpenClaw skill definition
└── README.md             # This file

🆚 Comparison: doubao-ai-toolkit vs bailian-ai-toolkit

| Feature | doubao-ai-toolkit (豆包) | bailian-ai-toolkit (百炼) |
|---------|--------------------------|---------------------------|
| Platform | ByteDance Volcengine Ark | Alibaba Cloud DashScope |
| Image Gen | ✅ Seedream 4.0 (2K/4K) | ✅ Qwen Image 2.0 |
| Video Gen | ✅ Seedance 1.0 Pro | ✅ HappyHorse 1.0 |
| TTS | ✅ 4+ Chinese voices | ✅ 50+ voices, multi-language |
| ASR | ✅ fun-asr | ✅ fun-asr + diarization |
| Vision | ❌ | ✅ Qwen-VL-Max |
| Chat | ✅ Doubao Seed 1.8 | ✅ Qwen 3.6 + DeepSeek |
| Search | ✅ Web + Image | ✅ Web |
| Embedding | ✅ Text + Image + Video | ❌ |
| File Upload | ❌ | ✅ OSS (48h) |
| Auth | ARK_API_KEY env var | bl auth login --api-key |

🤝 Contributing

Issues and PRs welcome! This is a skill wrapper; the core CLIs are the three npm packages listed under Installation: coze-coding-dev-sdk, seedream-ark, and seedance.

📄 License

Built with ❤️ for the AI developer community