storycanvas

v0.2.0

Published

17 days ago

Transform books into multimedia content using Google Gemini AI

happyjamespanda

gemini ai book video illustration tts text-to-speech cli

StoryCanvas

Transform books and text into multimedia content using Google Gemini AI.

StoryCanvas is an interactive CLI tool that converts text files (TXT, PDF, EPUB, Markdown) into illustrated videos with AI-generated images, narration, and background music. It builds on the Google Gemini ecosystem: Imagen for image generation, Gemini TTS for narration, and Veo for AI video generation.

Features

Multiple Input Formats: Support for TXT, PDF, EPUB, and Markdown files
Project Gutenberg Integration: Search and download classic books directly
AI Image Generation: Create character and scene illustrations using Imagen 4 or Nano Banana
TTS Narration: Generate spoken narration with 30+ voice options
Video Creation: Choose between image slideshow or Veo AI video generation
Background Music: Mix in royalty-free background music
YouTube Metadata: Auto-generate titles, descriptions, and tags

Installation

npm install -g storycanvas

Or run directly with npx:

npx storycanvas

Requirements

Node.js 22 or later
Google Gemini API key (available from Google AI Studio)

Quick Start

Run the setup wizard:
```
storycanvas onboard
```

This will guide you through:

- API key configuration
- Model selection
- Output directory setup

Create multimedia from a book:

```
storycanvas create --file my-book.epub
```

Or download from Project Gutenberg:

storycanvas create --gutenberg 74  # Tom Sawyer

Commands

`storycanvas onboard`

Interactive setup wizard for first-time configuration. Sets up your API key, preferred models, and output directories.

`storycanvas create`

Create multimedia content from text.

# Interactive mode
storycanvas create

# From local file
storycanvas create --file book.epub
storycanvas create --file article.pdf
storycanvas create --file story.txt

# From Project Gutenberg
storycanvas create --gutenberg 74

# Specify stages
storycanvas create --file book.txt --stages illustrations,video,music

# Use Veo AI video instead of slideshow
storycanvas create --file book.txt --mode veo

Options:

-f, --file <path>: Path to input file
-g, --gutenberg <id>: Project Gutenberg book ID
-s, --stages <stages>: Comma-separated stages (illustrations, narration, video, music, metadata)
-m, --mode <mode>: Video mode (slideshow or veo)
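
To illustrate how a comma-separated `--stages` value could be interpreted, here is a minimal sketch. The stage names come from the option description above; `parseStages` and `isStage` are hypothetical helpers, not part of StoryCanvas, and the trimming, lowercasing, and de-duplication are assumptions about reasonable behavior rather than documented behavior.

```typescript
// Stage names accepted by --stages, as listed in the options above.
const KNOWN_STAGES = ["illustrations", "narration", "video", "music", "metadata"] as const;
type Stage = (typeof KNOWN_STAGES)[number];

// Type guard: true when the given name is one of the known stages.
function isStage(name: string): name is Stage {
  return (KNOWN_STAGES as readonly string[]).includes(name);
}

// Hypothetical helper: split the flag value, validate each name,
// and drop duplicates while keeping the order the user wrote.
function parseStages(value: string): Stage[] {
  const names = value
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name) => name !== "");

  const stages: Stage[] = [];
  for (const name of names) {
    if (!isStage(name)) {
      throw new Error(`Unknown stage "${name}". Expected one of: ${KNOWN_STAGES.join(", ")}`);
    }
    if (!stages.includes(name)) {
      stages.push(name);
    }
  }
  return stages;
}

console.log(parseStages("illustrations,video,music"));
```

Rejecting unknown names up front means a typo such as `ilustrations` fails before any API calls are made.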

`storycanvas books`

Browse and download books from Project Gutenberg.

# Interactive mode
storycanvas books

# Search for books
storycanvas books --search "alice wonderland"

# Download by ID
storycanvas books --download 11

# List downloaded books
storycanvas books --list
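
The numeric ID passed to `--download` (and to `create --gutenberg`) is the same number that appears in Project Gutenberg book URLs. The sketch below shows how such an ID maps to the public book page and its plain-text download link. `gutenbergUrls` is a hypothetical helper, and whether StoryCanvas fetches books through these URLs is an assumption.

```typescript
// Hypothetical helper: map a Project Gutenberg book ID to its public URLs.
// Assumption: IDs are the positive integers shown in gutenberg.org book links.
function gutenbergUrls(id: number): { page: string; plainText: string } {
  if (!Number.isInteger(id) || id <= 0) {
    throw new Error(`Invalid Project Gutenberg ID: ${id}`);
  }
  const page = `https://www.gutenberg.org/ebooks/${id}`;
  // The ".txt.utf-8" suffix is the "Plain Text UTF-8" link on the book page.
  return { page, plainText: `${page}.txt.utf-8` };
}

console.log(gutenbergUrls(11).page);
```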

`storycanvas doctor`

Run diagnostics to check your setup.

storycanvas doctor

Checks:

Node.js version
FFmpeg availability
API key validity
Configuration status
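
For readers who want to reproduce the first two checks by hand, the following is a rough sketch of what they might look like. It is not the code `storycanvas doctor` runs. `meetsNodeRequirement` and `ffmpegAvailable` are hypothetical names, and the FFmpeg check assumes the binary is called `ffmpeg` and is on your `PATH`.

```typescript
import { spawnSync } from "node:child_process";

// True when a Node.js version string such as "22.3.0" (or "v22.3.0")
// has a major version of at least minMajor (22 per the requirements).
function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  const [majorText = ""] = version.replace(/^v/, "").split(".");
  const major = Number.parseInt(majorText, 10);
  return Number.isInteger(major) && major >= minMajor;
}

// True when running "ffmpeg -version" succeeds.
// Assumption: FFmpeg is installed under the name "ffmpeg" on PATH.
function ffmpegAvailable(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

console.log("Node.js 22 or later:", meetsNodeRequirement(process.versions.node));
console.log("FFmpeg found:", ffmpegAvailable());
```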

`storycanvas config`

View and manage configuration.

# Show current config
storycanvas config --show

# Edit interactively
storycanvas config --edit

# Reset to defaults
storycanvas config --reset

# Show config file path
storycanvas config --path

Configuration

Configuration is stored in ~/.storycanvasrc. You can edit it manually or use storycanvas config --edit.

{
  "apiKey": "your-gemini-api-key",
  "models": {
    "text": "gemini-2.5-flash",
    "image": "imagen-4.0-fast-generate-001",
    "tts": "gemini-2.5-flash-preview-tts",
    "video": "veo-3.1-fast"
  },
  "image": {
    "maxCharacterImages": 30,
    "maxSceneImages": 50,
    "aspectRatio": "9:16",
    "personGeneration": "allow_adult"
  },
  "video": {
    "mode": "slideshow",
    "fps": 0.5,
    "resolution": "1080p"
  },
  "tts": {
    "enabled": true,
    "voice": "Kore"
  },
  "audio": {
    "musicVolume": 0.3,
    "narrationVolume": 1.0
  },
  "directories": {
    "output": "./storycanvas-output",
    "music": "./music",
    "books": "./books"
  }
}
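
If you edit the file by hand, a quick sanity check can catch mistakes before a long run. The sketch below types a subset of the fields shown above and validates a few of them. It assumes `~/.storycanvasrc` is plain JSON exactly as in the example. `StoryCanvasConfig`, `parseConfig`, and `loadConfig` are hypothetical names, and the specific validation rules are illustrative assumptions, not the checks StoryCanvas itself performs.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Subset of the configuration shown above; field names match the example file.
interface StoryCanvasConfig {
  apiKey: string;
  models: { text: string; image: string; tts: string; video: string };
  video: { mode: "slideshow" | "veo"; fps: number; resolution: string };
  audio: { musicVolume: number; narrationVolume: number };
}

// Parse the JSON text and apply a few illustrative checks.
function parseConfig(json: string): StoryCanvasConfig {
  const config = JSON.parse(json) as StoryCanvasConfig;
  if (typeof config.apiKey !== "string" || config.apiKey.length === 0) {
    throw new Error("apiKey must be a non-empty string");
  }
  const mode = config.video?.mode;
  if (mode !== "slideshow" && mode !== "veo") {
    throw new Error('video.mode must be "slideshow" or "veo"');
  }
  if (!(config.video.fps > 0)) {
    throw new Error("video.fps must be greater than 0");
  }
  return config;
}

// Read and validate the config file; defaults to ~/.storycanvasrc.
function loadConfig(path: string = join(homedir(), ".storycanvasrc")): StoryCanvasConfig {
  return parseConfig(readFileSync(path, "utf8"));
}
```

Keeping `parseConfig` separate from file access makes it easy to check a config string without touching the real file.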

Available Models

Text/Chat

gemini-2.5-flash (default, fast)
gemini-2.5-pro (enhanced reasoning)

Image Generation

imagen-4.0-fast-generate-001 (default, fast)
imagen-4.0-ultra-generate-001 (highest quality)
imagen-4.0-generate-001 (standard)
gemini-2.5-flash-image (Nano Banana, native Gemini)
gemini-3-pro-image-preview (Nano Banana Pro)

Text-to-Speech

gemini-2.5-flash-preview-tts (default)
gemini-2.5-pro-preview-tts

Video

veo-3.1-fast (default, faster)
veo-3.1 (higher quality)
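
The lists above can be captured as a small lookup table, which is handy for checking a `models` entry before saving the configuration. This is only a sketch: `AVAILABLE_MODELS`, `isListedModel`, and `defaultModel` are hypothetical names, and StoryCanvas may accept model identifiers beyond the ones listed here.

```typescript
// Model identifiers copied from the lists above; the first entry in
// each category is the one marked as the default.
const AVAILABLE_MODELS = {
  text: ["gemini-2.5-flash", "gemini-2.5-pro"],
  image: [
    "imagen-4.0-fast-generate-001",
    "imagen-4.0-ultra-generate-001",
    "imagen-4.0-generate-001",
    "gemini-2.5-flash-image",
    "gemini-3-pro-image-preview",
  ],
  tts: ["gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts"],
  video: ["veo-3.1-fast", "veo-3.1"],
} as const;

type ModelKind = keyof typeof AVAILABLE_MODELS;

// True when the name appears in the list for the given category.
function isListedModel(kind: ModelKind, name: string): boolean {
  return (AVAILABLE_MODELS[kind] as readonly string[]).includes(name);
}

// The documented default for a category (first entry of its list).
function defaultModel(kind: ModelKind): string {
  return AVAILABLE_MODELS[kind][0];
}

console.log(defaultModel("image"), isListedModel("video", "veo-3.1"));
```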

Pipeline Stages

Input Processing: Extract text from TXT, PDF, EPUB, or Markdown files, or download from Project Gutenberg
Illustration Generation: Create character and scene images with AI
Narration: Generate TTS audio from the text
Video Creation: Combine images into slideshow or generate with Veo
Background Music: Mix in audio tracks
Metadata Generation: Create YouTube-ready title, description, and tags
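
One way to picture the stages above is as a fixed sequence in which only the selected stages run. The sketch below assumes the stages execute in the order listed and one at a time. Input Processing is left out because it is not one of the `--stages` values. `runPipeline`, `StageHandler`, and the placeholder handlers are hypothetical and stand in for the real work of each stage.

```typescript
// Order taken from the stage list above (assumed to be the execution order).
const PIPELINE_ORDER = ["illustrations", "narration", "video", "music", "metadata"] as const;
type Stage = (typeof PIPELINE_ORDER)[number];
type StageHandler = () => Promise<void>;

// Run the selected stages sequentially in pipeline order and
// return the names of the stages that completed.
async function runPipeline(
  selected: readonly Stage[],
  handlers: Record<Stage, StageHandler>,
): Promise<Stage[]> {
  const completed: Stage[] = [];
  for (const stage of PIPELINE_ORDER) {
    if (!selected.includes(stage)) continue;
    await handlers[stage]();
    completed.push(stage);
  }
  return completed;
}

// Placeholder handlers that only record that they ran.
const trace: string[] = [];
const placeholder = (name: Stage): StageHandler => async () => {
  trace.push(name);
};
const handlers: Record<Stage, StageHandler> = {
  illustrations: placeholder("illustrations"),
  narration: placeholder("narration"),
  video: placeholder("video"),
  music: placeholder("music"),
  metadata: placeholder("metadata"),
};

runPipeline(["video", "illustrations"], handlers).then((done) => {
  console.log("completed:", done.join(", "));
});
```

Running stages in a fixed order, regardless of how they were requested, ensures that the video stage always sees the images produced by the illustration stage.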

Background Music

Place your royalty-free music files in the ./music directory, or set directories.music in the configuration to a different path. Supported formats: MP3, M4A, WAV, AAC, OGG.
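
As a rough illustration of how music files and the `audio` volume settings could fit together, the sketch below filters file names by the supported extensions and builds an FFmpeg filter graph that mixes narration and music. The `volume` and `amix` filters are standard FFmpeg filters, but `isSupportedMusicFile`, `mixFilter`, the input ordering, and the choice of `duration=first` are assumptions. How StoryCanvas actually mixes audio is not documented here.

```typescript
// Extensions for the supported formats listed above.
const MUSIC_EXTENSIONS = [".mp3", ".m4a", ".wav", ".aac", ".ogg"];

// True when the file name ends in a supported extension (case-insensitive).
function isSupportedMusicFile(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return MUSIC_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Build an FFmpeg filter graph string.
// Assumption: input 0 is the narration track and input 1 is the music track.
// duration=first ends the mix when the narration ends.
function mixFilter(narrationVolume: number, musicVolume: number): string {
  return (
    `[0:a]volume=${narrationVolume}[narration];` +
    `[1:a]volume=${musicVolume}[music];` +
    `[narration][music]amix=inputs=2:duration=first[mixed]`
  );
}

console.log(["theme.mp3", "notes.txt", "Ambient.OGG"].filter(isSupportedMusicFile));
console.log(mixFilter(1.0, 0.3));
```

A string like this could be passed to FFmpeg's `-filter_complex` option, with `[mixed]` mapped as the output audio stream.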

License

MIT

Credits

Built with:

Google Gemini API
@clack/prompts for terminal UI
fluent-ffmpeg for video processing
