@sogni-ai/sogni-creative-agent-skill
v2.1.3
Sogni Creative Agent Skill: agent skill and CLI for Sogni AI image and video generation.
Sogni Creative Agent Skill: Image & Video Generation for Agents
Sogni Creative Agent Skill gives AI agent runtimes such as Claude Code, OpenClaw, Hermes Agent, and Manus AI access to image generation, video generation, and creative-media tools powered by Sogni AI's decentralized GPU network.
Drop it into the setup you already have:
- as a standalone Node.js CLI
- as a skill source for Hermes Agent, Manus AI, and other agent frameworks
- as an OpenClaw plugin
For install requests, use the CLI plus skill setup by default.
With Sogni Creative Agent Skill, an agent can:
- generate images from prompts
- edit and restyle existing images
- create videos from text, images, audio, or reference video
- save personas, preferences, and last-render state across sessions
- check balances, list models, and refine previous results
Quick Start
- Create Sogni credentials once. See Setup.
- Install the command-line tool:
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version
- Point your agent/runtime at this repository's SKILL.md.
Then ask your agent to do something simple, for example:
- "Generate an image of a sunset over mountains"
- "Edit this image to add a rainbow"
- "Make a video of a cat playing piano"
- "Turn my selfie into James Bond using photobooth"
- "Refine the last image at higher quality"
Installation
Default install behavior: when someone asks to install Sogni Creative Agent Skill, install the command-line tool and skill source.
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version
Then point the agent/runtime at this repository's SKILL.md.
Agent-Safe Upgrade
When upgrading from inside an agent runtime, prefer direct package-manager or existing-checkout commands. Avoid asking the agent to build a clone-or-pull shell bootstrap script with set -e, bash -c, sh -c, or an inline repository URL; some sandboxes correctly route those through approval.
For the CLI:
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
sogni-agent --version
For an existing local checkout:
DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
git -C "$DEST" pull --ff-only
npm --prefix "$DEST" install
If the checkout is missing, use the npm install path above or explicitly approve a clone.
OpenClaw Plugin
For the published plugin:
openclaw plugins install sogni-creative-agent-skill
The installed plugin loads its behavior from SKILL.md via openclaw.plugin.json.
For a local checkout that you want to update continuously, link the minimal OpenClaw surface instead of the repository root:
cd /path/to/sogni-creative-agent-skill
npm install
npm link
npm run openclaw:sync
openclaw plugins install -l "$PWD/.openclaw-link"
openclaw gateway restart
To update that linked install later:
cd /path/to/sogni-creative-agent-skill
git pull --ff-only
npm install
npm link
npm run openclaw:sync
openclaw gateway restart
Do not run openclaw plugins install -l "$PWD" from the repository root. The root contains development tests that use child_process, and OpenClaw correctly blocks those during plugin safety scanning. The generated .openclaw-link/ directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root SKILL.md.
Hermes Agent / Manus / Other Frameworks
Point the agent to this repository's SKILL.md for behavior guidance and llm.txt for install/setup help. By default, the agent should invoke the globally installed sogni-agent CLI.
Manual Installation
gh repo clone Sogni-AI/sogni-creative-agent-skill
cd sogni-creative-agent-skill
npm install
Maintainer Runtime Sync
This public skill keeps CLI/runtime glue here, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private sogni-creative-agent repo. With both repos checked out as siblings, refresh the generated runtime before publishing:
npm run sync:creative-agent-runtime
npm test runs npm run check:creative-agent-runtime first, which regenerates the runtime file and fails if it differs from the committed copy.
The generated file is committed at generated/creative-agent-runtime.mjs so public installs do not need access to the private repo.
Advanced OpenClaw Config
When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
The supported config shape is defined in openclaw.plugin.json. Common overrides include default models, video workflow models, token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set OPENCLAW_CONFIG_PATH.
Setup
- Create a Sogni account at https://app.sogni.ai/
- Create credentials file:
mkdir -p ~/.config/sogni
cat > ~/.config/sogni/credentials << 'EOF'
SOGNI_API_KEY=your_api_key
# or:
# SOGNI_USERNAME=your_username
# SOGNI_PASSWORD=your_password
EOF
chmod 600 ~/.config/sogni/credentials
You can also skip the file and set SOGNI_API_KEY, or SOGNI_USERNAME + SOGNI_PASSWORD, in your environment.
Filesystem Paths and Overrides
Defaults live under ~/.config/sogni/ for credentials, last-render metadata, personas, memories, and personality. Advanced path overrides are available through SOGNI_CREDENTIALS_PATH, SOGNI_LAST_RENDER_PATH, SOGNI_MEDIA_INBOUND_DIR, and OPENCLAW_CONFIG_PATH.
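As a concrete sketch (the sandbox paths below are illustrative, not defaults), an agent host can redirect credentials and inbound media before invoking the CLI:

```shell
# Relocate Sogni state for a sandboxed agent; paths are illustrative examples.
export SOGNI_CREDENTIALS_PATH="$HOME/sandbox/sogni-credentials"
export SOGNI_MEDIA_INBOUND_DIR="$HOME/sandbox/media-in"
mkdir -p "$SOGNI_MEDIA_INBOUND_DIR"
# Later sogni-agent invocations in this shell pick up these overrides.
```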
Usage
# Image generation
sogni-agent -Q hq -o dragon.png "a dragon eating tacos"
# Edit an image
sogni-agent -c subject.jpg "add a neon cyberpunk glow"
# Photobooth face transfer
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
# Text-to-video (t2v)
sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
# Short-side targeting preserves the current shape without forcing landscape
sogni-agent --video --target-resolution 768 \
"A calm cinematic shot of lanterns drifting across a night lake"
# Seedance 2.0 explicit aliases (4-15s vendor video path)
sogni-agent --video -m seedance2 --duration 8 \
"A polished product reveal with native ambient sound"
# Seedance multimodal context with public HTTPS references
sogni-agent --video -m seedance2 --workflow t2v \
--ref https://cdn.example.com/product.png \
--ref-video https://cdn.example.com/motion.mp4 \
--ref-audio https://cdn.example.com/music.m4a \
"Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
# Image-to-video (i2v)
sogni-agent --video --ref cat.jpg "gentle camera pan"
# Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
"music video with synchronized motion"
# Persona or voice identity with LTX native audio
sogni-agent --video --reference-audio-identity voice.webm \
"NARRATOR: \"This is my voice.\""
# Segment a source video, then stitch clips locally with an external soundtrack
sogni-agent --video --workflow v2v --ref-video dance.mp4 \
--video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
"robot dancing"
sogni-agent --concat-videos /tmp/final.mp4 /tmp/clip-1.mp4 /tmp/clip-2.mp4 \
--concat-audio song.mp3 --concat-audio-start 0
# Balances and help
sogni-agent --balance
sogni-agent --help
For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. --video-start, --audio-start, and --audio-duration let you generate focused segments, while --concat-videos can stitch them and optionally mux a single soundtrack with --concat-audio.
V2V defaults mirror the Sogni Chat workflow tuning: canny, pose, and depth use ControlNet strength 0.85 with detailer assist, while detailer uses strength 1.0. Use -m seedance2-v2v for Seedance V2V without ControlNet. Seedance also accepts public HTTPS image, video, and audio references; audio references must be paired with an image or video reference.
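A minimal Seedance V2V sketch using the alias above (the HTTPS reference URL is a placeholder, and the command is guarded with command -v so shells without the CLI skip it cleanly):

```shell
# Seedance V2V without ControlNet; the reference URL is a placeholder.
PROMPT="repaint the source footage as loose watercolor"
if command -v sogni-agent >/dev/null 2>&1; then
  sogni-agent --video -m seedance2-v2v \
    --ref-video https://cdn.example.com/source.mp4 \
    "$PROMPT"
fi
```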
LTX-2.3 Prompting Guide
When you use ltx23-22b-fp8_t2v_distilled, do not feed it short tag prompts like "cinematic drone shot over tropical cliffs". LTX-2.3 renders more reliably from a dense natural-language scene description.
- Write one unbroken paragraph with no line breaks, bullets, headers, or tag blocks.
- Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
- Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
- Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
- If the user wants dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
- Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
- Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
- Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
Example rewrite:
User ask: "make a 4k video of a woman in a neon alley"
LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
Photobooth (Face Transfer)
Generate stylized portraits from a face photo using InstantID ControlNet:
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
Uses SDXL Turbo (coreml-sogniXLturbo_alpha1_ad) at 1024x1024 by default. The face image is passed via --ref and styled according to the prompt. Cannot be combined with --video or -c/--context.
Multi-angle mode auto-builds the <sks> prompt and applies the multiple_angles LoRA.
--angles-360-video generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
--balance / --balances does not require a prompt and exits after printing current SPARK and SOGNI balances.
Video Sizing Rules (Aspect Ratios)
- WAN models use dimensions divisible by 16, min 480px, max 1536px.
- LTX family models (ltx2-*, ltx23-*) use dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048px on the long side.
- Seedance runs at fixed 24fps and supports 4-15s durations. Other default/WAN video paths support up to 10s; LTX and WAN animate workflows support up to 20s.
- The script auto-normalizes video sizes to satisfy those constraints.
- Use --target-resolution <px> for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
- For i2v (and any workflow using --ref/--ref-end), the client wrapper resizes the reference image with a strict aspect-fit (fit: inside) and then uses the resized reference dimensions as the final video size. Because that resize uses rounding, a "valid" requested size can still produce an invalid final size (example: 1024x1536 requested, but the ref becomes 1024x1535). sogni-agent detects this for local refs and will auto-adjust the requested size to a nearby safe size so the resized reference matches the model divisor.
- If you want the script to fail instead of auto-adjusting, pass --strict-size and it will print a suggested size.
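For example, a strict-size i2v run under these rules might look like the sketch below. It assumes -w/-h set the requested video dimensions the same way they set image dimensions (verify with --help), and it is guarded so environments without the CLI skip it:

```shell
# Ask for an exact portrait size; with --strict-size the script fails and
# prints a suggested size instead of auto-adjusting a near-miss reference.
SIZE_W=1024
SIZE_H=1536
if command -v sogni-agent >/dev/null 2>&1; then
  sogni-agent --video --ref cat.jpg -w "$SIZE_W" -h "$SIZE_H" --strict-size \
    "gentle camera pan"
fi
```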
Error Reporting
Failures use a non-zero exit code and human-readable stderr. Add --json when an agent needs structured success/error output.
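One way an agent-side wrapper can consume this contract (a sketch: the run_sogni helper is hypothetical, and the JSON payload is passed through untouched since its field names are not specified here):

```shell
# Hypothetical helper: run the CLI with --json and surface errors on failure.
run_sogni() {
  local out
  if out=$(sogni-agent --json "$@" 2>&1); then
    printf '%s\n' "$out"   # structured success payload for the agent
  else
    printf 'sogni-agent failed (exit %s): %s\n' "$?" "$out" >&2
    return 1
  fi
}
```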
Options
Run sogni-agent --help for the complete CLI. These are the options most agents should reach for first:
| Option | Use |
|--------|-----|
| -Q fast|hq|pro | Pick image quality without memorizing model IDs |
| -o <path> | Save output locally |
| -c <path> | Provide image context for edits |
| --video | Generate video instead of image |
| --ref, --ref-audio, --ref-video | Provide image/audio/video references; Seedance HTTPS references are forwarded as URL context |
| --target-resolution <px> | Target the short side while preserving aspect ratio |
| --workflow <type> | Force t2v, i2v, s2v, ia2v, a2v, v2v, or animate workflows |
| --persona <name> | Use a saved persona reference |
| --concat-videos <out> <clips...> | Stitch clips locally with FFmpeg |
| --json | Return structured output for agents |
Quality Presets
Instead of remembering model IDs, use --quality / -Q to auto-select the right model, steps, and dimensions:
| Preset | Model | Steps | Size | Speed |
|--------|-------|-------|------|-------|
| fast | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
| hq | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
| pro | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
Explicit --model overrides the quality preset's model. Explicit -w/-h overrides dimensions.
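For instance (a sketch of combining a preset with overrides, guarded so shells without the CLI skip it cleanly):

```shell
# -Q pro selects flux2_dev_fp8; explicit -w/-h replace the preset's 1024x1024.
PROMPT="art deco travel poster of a coastal city"
if command -v sogni-agent >/dev/null 2>&1; then
  sogni-agent -Q pro -w 1536 -h 1024 -o poster.png "$PROMPT"
fi
```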
Dynamic Prompt Variations
Generate diverse images in a single call using {option1|option2|option3} syntax:
# Generates 3 images: "a red car", "a blue car", "a green car"
sogni-agent -n 3 "a {red|blue|green} car"
# Multiple variation groups cycle independently
sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
# → "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
Options cycle sequentially per image. Without {...} syntax, -n generates multiple images with the same prompt as before.
Token Auto-Fallback
Use --token-type auto to automatically retry with SOGNI tokens if SPARK balance is insufficient:
sogni-agent --token-type auto "a dragon eating tacos"
This tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
Personas
Named people with saved reference photos and optional voice clips for identity-preserving generation:
# Add a persona
sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --description "30s male, brown hair"
# Add with voice clip for video voice cloning
sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip voice.webm
# Generate an image using a persona (auto-injects photo as context)
sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
# Generate video using a persona photo plus saved voice identity
sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
# List / remove
sogni-agent --persona-list
sogni-agent --persona-remove "Mark"
Personas are stored at ~/.config/sogni/personas/. Pronouns like "me"/"myself" auto-resolve to the self persona. "my wife" resolves to partner, etc.
Memory (Persistent Preferences)
Save preferences that agents respect across sessions:
sogni-agent --memory-set preferred_style "watercolor and soft lighting"
sogni-agent --memory-set aspect_ratio "16:9"
sogni-agent --memory-list
sogni-agent --memory-remove preferred_style
Stored at ~/.config/sogni/memories.json.
Personality (Custom Agent Instructions)
Set how the agent should behave:
sogni-agent --personality-set "Be concise, always use cinematic lighting"
sogni-agent --personality-get
sogni-agent --personality-clear
Stored at ~/.config/sogni/personality.txt.
Models
Prefer -Q fast|hq|pro for images and automatic workflow routing for video. Only pass -m when you need a specific model family.
| Need | Recommended model or alias |
|------|----------------------------|
| Default images | z_image_turbo_bf16 |
| Highest quality images | flux2_dev_fp8 or -Q pro |
| Image editing | qwen_image_edit_2511_fp8_lightning |
| Photobooth face transfer | coreml-sogniXLturbo_alpha1_ad |
| Text-to-video with native dialogue/audio | ltx23-22b-fp8_t2v_distilled |
| Image+audio-to-video | ltx23-22b-fp8_ia2v_distilled |
| Audio-to-video | ltx23-22b-fp8_a2v_distilled |
| Video-to-video with ControlNet | ltx23-22b-fp8_v2v_distilled |
| Seedance text-to-video | seedance2 or seedance2-fast |
| Seedance video-to-video without ControlNet | seedance2-v2v |
| Face lip-sync with uploaded audio | wan_v2.2-14b-fp8_s2v_lightx2v |
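As an example of reaching for a specific family from the table (a sketch: the input image path is illustrative, and the command is guarded so shells without the CLI skip it):

```shell
# Explicit -m picks the image-editing model from the table above;
# -c supplies the image to edit, as in the Usage section.
PROMPT="replace the sky with a green aurora"
if command -v sogni-agent >/dev/null 2>&1; then
  sogni-agent -m qwen_image_edit_2511_fp8_lightning -c photo.jpg "$PROMPT"
fi
```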
License
MIT
