storycanvas
v0.2.0
StoryCanvas
Transform books and text into multimedia content using Google Gemini AI.
StoryCanvas is an interactive CLI tool that converts text files (TXT, PDF, EPUB) into illustrated videos with AI-generated images, narration, and background music. It leverages the Google Gemini ecosystem including Imagen for image generation, Gemini TTS for narration, and Veo for AI video generation.
Features
- Multiple Input Formats: Support for TXT, PDF, EPUB, and Markdown files
- Project Gutenberg Integration: Search and download classic books directly
- AI Image Generation: Create character and scene illustrations using Imagen 4 or Nano Banana
- TTS Narration: Generate spoken narration with 30+ voice options
- Video Creation: Choose between image slideshow or Veo AI video generation
- Background Music: Mix in royalty-free background music
- YouTube Metadata: Auto-generate titles, descriptions, and tags
Installation
npm install -g storycanvas
Or run directly with npx:
npx storycanvas
Requirements
- Node.js 22 or later
- Google Gemini API key
Quick Start
Run the setup wizard:
storycanvas onboard
This will guide you through:
- API key configuration
- Model selection
- Output directory setup
Create multimedia from a book:
storycanvas create --file my-book.epub
Or download from Project Gutenberg:
storycanvas create --gutenberg 74 # Tom Sawyer
Commands
storycanvas onboard
Interactive setup wizard for first-time configuration. Sets up your API key, preferred models, and output directories.
storycanvas create
Create multimedia content from text.
# Interactive mode
storycanvas create
# From local file
storycanvas create --file book.epub
storycanvas create --file article.pdf
storycanvas create --file story.txt
# From Project Gutenberg
storycanvas create --gutenberg 74
# Specify stages
storycanvas create --file book.txt --stages illustrations,video,music
# Use Veo AI video instead of slideshow
storycanvas create --file book.txt --mode veo
Options:
- -f, --file <path>: Path to input file
- -g, --gutenberg <id>: Project Gutenberg book ID
- -s, --stages <stages>: Comma-separated stages (illustrations, narration, video, music, metadata)
- -m, --mode <mode>: Video mode (slideshow or veo)
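The `--stages` value is a comma-separated list drawn from the five stage names above. The following is a minimal sketch of how such a value could be parsed and validated. It is not StoryCanvas's own code: `parseStages` and `isStage` are hypothetical names, and the lowercasing, duplicate removal, and tolerance for empty entries are assumptions about reasonable behavior.

```typescript
// Stage names as documented for the --stages option.
const KNOWN_STAGES = ["illustrations", "narration", "video", "music", "metadata"] as const;
type Stage = (typeof KNOWN_STAGES)[number];

// Type guard: narrows an arbitrary string to a documented stage name.
function isStage(name: string): name is Stage {
  return (KNOWN_STAGES as readonly string[]).includes(name);
}

// Hypothetical helper: split a --stages value, trim and lowercase each entry,
// skip empty entries, drop duplicates, and reject unknown names.
function parseStages(value: string): Stage[] {
  const stages: Stage[] = [];
  for (const raw of value.split(",")) {
    const name = raw.trim().toLowerCase();
    if (name === "") continue; // assumption: tolerate "a,,b" and trailing commas
    if (!isStage(name)) {
      throw new Error(`Unknown stage "${name}". Expected one of: ${KNOWN_STAGES.join(", ")}`);
    }
    if (!stages.includes(name)) stages.push(name);
  }
  return stages;
}

console.log(parseStages("illustrations, video,music")); // ["illustrations", "video", "music"]
```

Rejecting unknown names early gives a clear error before any API calls are made; the real CLI may handle this differently.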
storycanvas books
Browse and download books from Project Gutenberg.
# Interactive mode
storycanvas books
# Search for books
storycanvas books --search "alice wonderland"
# Download by ID
storycanvas books --download 11
# List downloaded books
storycanvas books --list
storycanvas doctor
Run diagnostics to check your setup.
storycanvas doctor
Checks:
- Node.js version
- FFmpeg availability
- API key validity
- Configuration status
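Two of these checks are easy to reproduce by hand when debugging an environment. The sketch below is not the tool's own diagnostic code: `meetsNodeRequirement` and `hasFfmpeg` are hypothetical names. It compares a Node.js version string against the documented minimum of 22 and tries to run `ffmpeg -version`. API key validity and configuration status are not covered.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: true when a version string such as "22.3.0" or "v22.3.0"
// has a major version of at least `minMajor`.
function meetsNodeRequirement(version: string, minMajor: number = 22): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Hypothetical helper: true when `ffmpeg -version` can be spawned and exits with code 0.
// If the binary is missing from PATH, spawnSync reports an error and status is null.
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

console.log("Node.js >= 22:", meetsNodeRequirement(process.versions.node));
console.log("FFmpeg found:", hasFfmpeg());
```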
storycanvas config
View and manage configuration.
# Show current config
storycanvas config --show
# Edit interactively
storycanvas config --edit
# Reset to defaults
storycanvas config --reset
# Show config file path
storycanvas config --path
Configuration
Configuration is stored in ~/.storycanvasrc. You can edit it manually or use storycanvas config --edit.
{
  "apiKey": "your-gemini-api-key",
  "models": {
    "text": "gemini-2.5-flash",
    "image": "imagen-4.0-fast-generate-001",
    "tts": "gemini-2.5-flash-preview-tts",
    "video": "veo-3.1-fast"
  },
  "image": {
    "maxCharacterImages": 30,
    "maxSceneImages": 50,
    "aspectRatio": "9:16",
    "personGeneration": "allow_adult"
  },
  "video": {
    "mode": "slideshow",
    "fps": 0.5,
    "resolution": "1080p"
  },
  "tts": {
    "enabled": true,
    "voice": "Kore"
  },
  "audio": {
    "musicVolume": 0.3,
    "narrationVolume": 1.0
  },
  "directories": {
    "output": "./storycanvas-output",
    "music": "./music",
    "books": "./books"
  }
}
Available Models
Text/Chat
- gemini-2.5-flash (default, fast)
- gemini-2.5-pro (enhanced reasoning)
Image Generation
- imagen-4.0-fast-generate-001 (default, fast)
- imagen-4.0-ultra-generate-001 (highest quality)
- imagen-4.0-generate-001 (standard)
- gemini-2.5-flash-image (Nano Banana, native Gemini)
- gemini-3-pro-image-preview (Nano Banana Pro)
Text-to-Speech
- gemini-2.5-flash-preview-tts (default)
- gemini-2.5-pro-preview-tts
Video
- veo-3.1-fast (default, faster)
- veo-3.1 (higher quality)
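When editing the configuration file by hand, a typo in a model name is easy to make. As a rough sketch, the `models` section of a config could be checked against the lists above like this. `AVAILABLE_MODELS` and `findUnknownModels` are hypothetical names for illustration, and StoryCanvas may accept model identifiers beyond those listed here.

```typescript
// Model identifiers copied from the lists above, keyed by the config's "models" fields.
const AVAILABLE_MODELS: Record<string, string[]> = {
  text: ["gemini-2.5-flash", "gemini-2.5-pro"],
  image: [
    "imagen-4.0-fast-generate-001",
    "imagen-4.0-ultra-generate-001",
    "imagen-4.0-generate-001",
    "gemini-2.5-flash-image",
    "gemini-3-pro-image-preview",
  ],
  tts: ["gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts"],
  video: ["veo-3.1-fast", "veo-3.1"],
};

// Hypothetical helper: return one message per "models" entry that is not in the lists.
function findUnknownModels(models: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const kind of Object.keys(models)) {
    const allowed = AVAILABLE_MODELS[kind];
    if (allowed === undefined) {
      problems.push(`unknown model category "${kind}"`);
    } else if (!allowed.includes(models[kind])) {
      problems.push(`"${models[kind]}" is not a listed ${kind} model`);
    }
  }
  return problems;
}

// One problem is reported: "veo-3" is not in the video list.
console.log(findUnknownModels({ text: "gemini-2.5-flash", video: "veo-3" }));
```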
Pipeline Stages
- Input Processing: Extract text from TXT/PDF/EPUB or download from Gutenberg
- Illustration Generation: Create character and scene images with AI
- Narration: Generate TTS audio from the text
- Video Creation: Combine images into slideshow or generate with Veo
- Background Music: Mix in audio tracks
- Metadata Generation: Create YouTube-ready title, description, and tags
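The stages above read as a linear pipeline in which each step builds on what earlier steps produced. The sketch below illustrates that shape with placeholder stage functions. Everything in it is hypothetical (`PipelineContext`, `stub`, `runPipeline`); real stages would call the Gemini APIs and FFmpeg. It also assumes that selected stages always run in the listed order regardless of the order in which they were requested.

```typescript
// Hypothetical shared state passed from stage to stage.
interface PipelineContext {
  text: string; // output of input processing
  artifacts: string[]; // names of the stages that have produced output so far
}

type StageFn = (ctx: PipelineContext) => Promise<PipelineContext>;

// Placeholder stage factory: records that the stage ran instead of doing real work.
const stub = (name: string): StageFn => async (ctx) => ({
  ...ctx,
  artifacts: [...ctx.artifacts, name],
});

// Stages in the order listed above; input processing is assumed to have produced `text`.
const PIPELINE: Array<[string, StageFn]> = [
  ["illustrations", stub("illustrations")],
  ["narration", stub("narration")],
  ["video", stub("video")],
  ["music", stub("music")],
  ["metadata", stub("metadata")],
];

// Run only the selected stages, sequentially and in pipeline order.
async function runPipeline(text: string, selected: string[]): Promise<PipelineContext> {
  let ctx: PipelineContext = { text, artifacts: [] };
  for (const [name, run] of PIPELINE) {
    if (selected.includes(name)) ctx = await run(ctx);
  }
  return ctx;
}

runPipeline("Once upon a time...", ["music", "illustrations"]).then((ctx) => {
  console.log(ctx.artifacts); // ["illustrations", "music"]
});
```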
Background Music
Place your royalty-free music files in the ./music directory (or configure a different path). Supported formats: MP3, M4A, WAV, AAC, OGG.
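To check which files in a folder would qualify, the format list above can be turned into a simple extension filter. This is an illustrative sketch only: `filterMusicFiles` is a hypothetical helper, and matching by file extension (case-insensitively) is an assumption about how the tool selects tracks.

```typescript
// Extensions from the supported-format list above, lowercase with a leading dot.
const MUSIC_EXTENSIONS = [".mp3", ".m4a", ".wav", ".aac", ".ogg"];

// Hypothetical helper: keep only file names whose extension is a supported audio format.
function filterMusicFiles(fileNames: string[]): string[] {
  return fileNames.filter((name) => {
    const dot = name.lastIndexOf(".");
    // dot <= 0 skips names without an extension and dotfiles such as ".mp3".
    return dot > 0 && MUSIC_EXTENSIONS.includes(name.slice(dot).toLowerCase());
  });
}

console.log(filterMusicFiles(["calm.mp3", "Theme.WAV", "notes.txt", "cover.png"]));
// ["calm.mp3", "Theme.WAV"]
```

In practice the file names could come from Node's `fs.readdirSync("./music")`.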
License
MIT
Credits
Built with:
- Google Gemini API
- @clack/prompts for terminal UI
- fluent-ffmpeg for video processing
