@guanghechen/kit-video
v0.3.4
AI-powered video generation from scenario files. This package provides a complete pipeline for generating presentation videos from text scenarios.
Installation
pnpm add @guanghechen/kit-video
Usage
Full Pipeline (autogen)
# Generate complete video from scenario
kit-video autogen -s /path/to/scenario -o /path/to/output --video
# With PDF and PPTX export
kit-video autogen -s /path/to/scenario -o /path/to/output --video --pdf --pptx
# Dry run (show what would be executed)
kit-video autogen -s /path/to/scenario --dry-run
Note: The preference, outline, and transcript stages currently validate existing files in the workspace. They do not generate these files automatically.
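Because those three stages only validate, it can help to confirm the files exist before starting a long run. The following is a minimal sketch, not part of kit-video: the function name `missing_validated_files` is hypothetical, and it assumes the files sit at the top level of the workspace under the names shown in this README.

```shell
# Hypothetical helper (not part of kit-video).
# Prints each validated file that is missing from the workspace,
# one name per line, and returns 1 if any are missing.
missing_validated_files() {
  ws=$1
  rc=0
  for f in preference.json outline.json transcript.md; do
    if [ ! -f "$ws/$f" ]; then
      echo "$f"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not executed here):
#   missing_validated_files /path/to/workspace || echo "add the files listed above first"
```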
Individual Commands
# Prepare workspace
kit-video prepare -s /path/to/scenario -w /path/to/output
# Validate preference
kit-video preference -w /path/to/workspace
# Validate outline
kit-video outline -w /path/to/workspace
# Validate transcript
kit-video transcript -w /path/to/workspace
# Generate images
kit-video image -w /path/to/workspace
# Run OCR
kit-video ocr -w /path/to/workspace
# Generate PDF (full edition)
kit-video pdf -w /path/to/workspace
# Generate PPTX (full edition)
kit-video pptx -w /path/to/workspace
# Generate TTS audio
kit-video tts -w /path/to/workspace
# Align TTS with slides
kit-video tts-align -w /path/to/workspace
# Compose video
kit-video video -w /path/to/workspace --srt --transition fade
Pipeline Stages
The video generation pipeline consists of the following stages:
| Stage | Description | Output |
| ------------ | ------------------------------------------------ | -------------------------- |
| prepare | Prepare workspace from scenario directory | workspace structure |
| preference | Validate existing presentation preferences | preference.json |
| outline | Validate existing slide outline | outline.json |
| transcript | Validate existing transcript for slides | transcript.md |
| image | Generate slide images | img/slide_N_V.png |
| ocr | Extract text from slide images | ocr_data.json |
| pdf | Generate PDF from images (full edition) | presentation.pdf |
| pptx | Generate PPTX from images (full edition) | presentation.pptx |
| tts | Generate TTS audio from transcript | voice.mp3 |
| tts-align | Align TTS audio with slides | slide_durations.json |
| video | Compose final video from images and audio | video.mp4, video.done |
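To run the stages one at a time in table order, the commands can be generated rather than typed by hand. This is a sketch under assumptions: `print_stage_commands` is a hypothetical helper, it only prints commands and executes nothing, and it covers the stages that take a workspace via `-w` (prepare is left out because it also needs `-s`).

```shell
# Hypothetical helper (not part of kit-video).
# Prints one kit-video command per workspace stage, in the order the
# stages table lists them. Nothing is executed.
print_stage_commands() {
  ws=$1
  for stage in preference outline transcript image ocr pdf pptx tts tts-align video; do
    printf 'kit-video %s -w %s\n' "$stage" "$ws"
  done
}

# Example (not executed here):
#   print_stage_commands /path/to/workspace
```

Printing first makes it easy to review the sequence, or to drop the optional pdf, pptx, and ocr lines, before running anything.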
Scenario Directory Structure
scenario/
├── query.md # Topic/query for generation
├── material/ # (Optional) Reference materials
│ ├── doc1.pdf
│ └── ...
├── preference.json # (Optional) Pre-defined preferences
├── outline.json # (Optional) Pre-generated outline
└── transcript.md # (Optional) Pre-generated transcript
Workspace Structure
After running the pipeline, the workspace will contain:
workspace/
├── material.md # (Optional) Copied material file
├── material/ # (Optional) Copied material directory
├── query.md # (Optional) Copied query
├── preference.json # Presentation preferences
├── outline.json # Slide outline
├── transcript.md # Narration transcript
├── img/ # Generated images
│ ├── slide_0_1.png
│ ├── slide_1_1.png
│ └── ...
├── ocr_data.json # OCR results
├── voice.mp3 # TTS audio
├── slide_durations.json # Timing information
├── subtitles.srt # (Optional) Subtitles
├── presentation.pdf # (Optional) PDF export
├── presentation.pptx # (Optional) PPTX export
├── video.done # Completion marker
└── video.mp4 # Final video
Autogen Options
| Option | Type | Default | Description |
| ------------------------ | ------- | ------------ | --------------------------------------- |
| -s, --scenario | string | (required) | Path to scenario directory |
| -o, --output | string | auto | Output directory |
| --pdf | boolean | false | Enable PDF generation |
| --pptx | boolean | false | Enable PPTX generation |
| --tts | boolean | false | Enable TTS voice generation |
| --video | boolean | false | Enable video generation (implies --tts) |
| --ocr | boolean | false | Enable OCR text detection |
| --dry-run | boolean | false | Show what stages would run |
| --debug | boolean | false | Enable debug output |
| --image-source | string | gpt-image-1-5 | Image source |
| --image-quality | string | high | Image quality (low/medium/high) |
| --tts-source | string | speech | TTS source (speech/llmapi) |
| --speech-voice | string | - | Azure Speech voice name |
| --transition | string | wipeleft | Transition effect |
| --transition-duration | string | 0.8 | Transition duration in seconds |
| --stage-image-parallel | string | 1 | Image stage parallelism |
| --stage-ocr-parallel | string | 2 | OCR stage parallelism |
| --srt | boolean | false | Generate and burn subtitles |
| --stt | boolean | false | Enable STT for precise timing |
| --query | string | - | Inline query text |
| --query-path | string | - | Query file path |
| --force-<stage> | boolean | false | Force re-run specific stage |
| --only-<stage> | boolean | false | Only run specific stage |
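The `--force-<stage>` row is a pattern rather than a literal flag. The sketch below assumes that `<stage>` is replaced by a stage name from the pipeline stages table (for example `--force-image`); that substitution is an assumption, and `force_flags` is a hypothetical helper, so confirm the actual flag names with the CLI's help output before relying on it.

```shell
# Hypothetical helper (not part of kit-video).
# Turns stage names into --force-<stage> flags on a single line.
# Assumes <stage> is a stage name such as image or tts.
force_flags() {
  flags=""
  for stage in "$@"; do
    flags="${flags:+$flags }--force-$stage"
  done
  printf '%s\n' "$flags"
}

# Example (not executed here):
#   kit-video autogen -s /path/to/scenario --video $(force_flags image tts)
```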
Video Options
| Option | Short | Type | Default | Description |
| ------------------------ | ----- | ------- | -------- | ---------------------------------- |
| --workspace-dir | -w | string | required | Workspace directory |
| --srt | | boolean | false | Burn subtitles into video |
| --transition | | string | wipeleft | Transition type between slides |
| --transition-duration | | string | 0.8 | Transition duration in seconds |
| --force | | boolean | false | Force regenerate video |
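A wrapper script may want to know whether a workspace already holds a finished video before deciding to pass `--force`. The sketch below assumes that `video.done` and `video.mp4` in the workspace root together indicate a completed run, as the workspace listing suggests. Whether the CLI itself consults the marker is not stated here, and `video_is_complete` is a hypothetical name.

```shell
# Hypothetical helper (not part of kit-video).
# Succeeds only when both the completion marker and the video file exist.
video_is_complete() {
  ws=$1
  [ -f "$ws/video.done" ] && [ -f "$ws/video.mp4" ]
}

# Example (not executed here):
#   video_is_complete /path/to/workspace || kit-video video -w /path/to/workspace --srt
```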
Transition Types
Supported transition types:
- none - No transition
- fade, fadeblack, fadewhite
- wipeleft, wiperight, wipeup, wipedown
- slideleft, slideright, slideup, slidedown
- circleopen, circleclose
- dissolve, pixelize, radial
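A script that accepts a transition name from user input can check it against this list before starting a render. The following is a minimal sketch: `is_supported_transition` is a hypothetical helper, and the names are copied from the list above rather than queried from the tool.

```shell
# Hypothetical helper (not part of kit-video).
# Succeeds if the argument is one of the transition names listed in this README.
is_supported_transition() {
  case "$1" in
    none|fade|fadeblack|fadewhite|\
    wipeleft|wiperight|wipeup|wipedown|\
    slideleft|slideright|slideup|slidedown|\
    circleopen|circleclose|dissolve|pixelize|radial) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not executed here):
#   is_supported_transition "$name" && kit-video video -w /path/to/workspace --transition "$name"
```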
Requirements
- Node.js >= 24.0.0
- ffmpeg and ffprobe (must be installed and available in PATH)
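These requirements can be verified up front with a short preflight check. The sketch below is not part of kit-video; `node_major_ok` and `missing_tools` are hypothetical helpers, and the version check assumes the `vMAJOR.MINOR.PATCH` format that `node --version` prints.

```shell
# Hypothetical helpers (not part of kit-video).

# Succeeds if a version string such as "v24.1.0" has a major version >= 24.
node_major_ok() {
  major=${1#v}          # strip the leading "v"
  major=${major%%.*}    # keep only the part before the first dot
  [ "$major" -ge 24 ]
}

# Prints each named tool that is not found in PATH; returns 1 if any is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not executed here):
#   node_major_ok "$(node --version)" && missing_tools ffmpeg ffprobe
```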
