@vindepemarte/vee

v0.1.3

Published

9 days ago

Local-first AI video editing engine with OpenRouter/local/CLI model routing

vindepemarte

video ai editing local-first openrouter ollama

vee — Video Editing Engine

A local-first, AI-driven video production suite for macOS (Apple Silicon). Drop a raw screen recording (plus optional camera video and microphone track) into a folder — get back a polished, publish-ready YouTube video, vertical short-form clips, and a timestamped markdown transcript. The default LLM path is fully local; .env can route editorial reasoning through OpenRouter or the logged-in Codex/Claude CLIs when you want stronger models.

projects/my-tutorial_undone/        ──►  projects/my-tutorial_done/
├── screen.mp4   (required)              └── output/
├── camera.mp4   (optional)                  ├── final_youtube.mp4
├── voice.mp3    (optional)                  ├── shorts/clip_01.mp4 …
└── notes.md     (optional)                  ├── edited_transcribed_video.md
                                             ├── metadata.md  (title/desc/tags/chapters)
                                             ├── edit_plan.md
                                             └── qc_report.md

What it does

Ingest & sync — classifies your files, cross-correlates audio to lock screen/camera/voice in sync even if recordings started at different times.
Voice enhancement — DeepFilterNet denoise → EQ/compression ("body") → two-pass EBU R128 loudness normalization to −14 LUFS.
Transcription — WhisperX large-v3 with word-level timestamps.
Cutting — silences, filler words (um/uh), repeated takes and false starts. The configured LLM adjudicates ambiguous retakes; cuts are applied frame-accurately to screen AND camera together.
Scene understanding — PySceneDetect + configured vision LLM labels what's on screen (code editor, terminal, browser, slides…).
AI direction — the configured direction model decides camera layouts (fullscreen / square PiP / hidden), title & chapter cards, text callouts, transition SFX, YouTube chapters, title, description and tags. Constrained to your brand kit's template catalog — it fills slots, it never free-forms design.
Asset generation — HyperFrames renders brand-kit HTML templates to video (ProRes alpha for overlays); ffmpeg fallback if node is unavailable.
Render — segment-accurate cuts, rounded square camera bubble, burned brand-styled captions, SFX ducked under voice, loudness normalized on the edited timeline. Hardware encoding (VideoToolbox).
Shorts — 1–3 self-contained 9:16 highlights with big captions and a hook title.
QC — duration/loudness/mid-word-cut/black-frame checks + vision LLM spot-checks frames; report in qc_report.md.
Deliverables — edited transcript markdown (new timestamps, layout log, removed-segment list), YouTube metadata, chapters.
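The sync step above is described only as cross-correlating audio. As a rough illustration of that idea, and not vee's actual code, the sketch below brute-forces the lag at which two short sample lists line up best. The function names, the toy signals, and the `max_lag` search window are all assumptions made for the example; a real pipeline would more likely use an FFT-based correlation on downsampled audio.

```python
def estimate_lag(reference, other, max_lag):
    """Return the lag (in samples) where `other` best matches `reference`.

    A positive lag means the shared content appears `lag` samples later
    in `other` than in `reference`. Hypothetical helper, brute force.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]  # dot product of the overlapping part
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_seconds(lag, sample_rate):
    """Convert a lag in samples to an offset in seconds."""
    return lag / sample_rate


# Toy signals: the same 4-sample burst starts at index 5 in the
# reference track and at index 8 in the other track.
burst = [1.0, -2.0, 3.0, -1.0]
reference = [0.0] * 5 + burst + [0.0] * 11
other = [0.0] * 8 + burst + [0.0] * 8

lag = estimate_lag(reference, other, max_lag=6)
print(lag, lag_to_seconds(lag, sample_rate=4))  # prints: 3 0.75
```

If the reference is the master audio, a positive result corresponds to a positive offset for the other source, which is why recordings started at different times can still be aligned.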

Install (global)

npm install -g @vindepemarte/vee
vee setup
vee

The global command creates an isolated user workspace:

~/.vee/
├── .env                 # provider, API keys, per-task model choices
├── config.yaml          # thresholds, web port, projects_dir
├── brandkit/            # brand.yaml + editable HTML templates
├── projects/            # default watched project folder
└── .venv/               # Python runtime managed by uv

On startup vee checks npm for a newer @vindepemarte/vee release. If one exists, it asks before running npm install -g @vindepemarte/vee@latest. Use vee update to check manually, vee --no-update-check for one run, or VEE_NO_UPDATE_CHECK=1 for unattended launches.

vee setup asks for provider mode (local, openrouter, hybrid, codex, or claude), OpenRouter API key when needed, model slugs/names, and basic brand kit values. If a configured local Ollama model is missing, setup offers to pull it.

Source/dev setup

# 1. system deps
brew install ollama ffmpeg          # ollama ≥ 0.30 (or the Ollama.app)
ollama pull gemma4:12b

# 2. full ffmpeg (homebrew's ffmpeg 8.x lacks libass → no caption burning)
#    put a full static build in ~/.vee/bin (e.g. from ffmpeg.martin-riedl.de):
mkdir -p ~/.vee/bin   # engine prefers ~/.vee/bin/ffmpeg automatically
# (or: export VEE_FFMPEG=/path/to/full/ffmpeg)

# 3. python env
uv sync --extra asr --extra enhance --extra dev
uv pip install -e .

# 4. optional stronger LLM routing
cp .env.example .env
# edit .env: VEE_LLM_PROVIDER=local|openrouter|hybrid|codex|claude
# for OpenRouter: set VEE_OPENROUTER_API_KEY and per-task model slugs

# 5. verify everything
.venv/bin/vee doctor

Usage

vee daemon                  # watch projects dir + dashboard at localhost:8765
vee run <folder>            # process one folder now (terminal progress bar)
vee resume <folder>         # continue after a failure, from the last checkpoint
vee review <folder>         # approve a paused edit plan and continue
vee stats                   # learned per-stage speed (drives the ETAs)
vee doctor                  # environment check

Folder tags (the contract)

| tag | meaning |
|---|---|
| _undone | queued; the daemon picks it up when file sizes settle |
| _undone_review | queued, but pause after the edit plan for approval |
| _processing | currently running |
| _review | review-gated: queued before the edit plan, waiting for approval after it |
| _done | finished; everything is in output/ |
| _failed | failed; see output/error_report.md, fix, then vee resume |
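Because the tag is just a suffix on the folder name, a script can read or compute it without touching vee itself. The following is a minimal sketch under that assumption; `split_tag` and `retag` are hypothetical helpers, and how vee matches tags internally is not documented here. Tags are checked longest first so that `_undone_review` is not mistaken for `_review`.

```python
from pathlib import Path

# Tags from the table above, longest first so "_undone_review"
# is matched before the shorter "_review".
TAGS = sorted(
    ("_undone", "_undone_review", "_processing", "_review", "_done", "_failed"),
    key=len,
    reverse=True,
)


def split_tag(folder):
    """Return (base_name, tag); tag is None for an untagged folder."""
    name = Path(folder).name
    for tag in TAGS:
        if name.endswith(tag):
            return name[: -len(tag)], tag
    return name, None


def retag(folder, new_tag):
    """Compute the path the folder would have under a new tag (no rename on disk)."""
    path = Path(folder)
    base, _ = split_tag(path)
    return path.with_name(base + new_tag)


print(split_tag("projects/my-tutorial_undone"))
print(retag("projects/my-tutorial_undone", "_done"))
```

The sketch only computes names; it deliberately leaves the actual rename to the daemon.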

The dashboard (http://localhost:8765)

Live progress bar with the current stage in plain words, %, and an ETA learned from your machine's actual throughput. Also: queue, output previews (final + shorts players), edit-plan review/approve, brand-kit and config editors.

Brand kit

Edit ~/.vee/brandkit/brand.yaml for the global install, or brandkit/brand.yaml in a source checkout. Every title card, chapter card, callout and caption follows it. Templates in brandkit/templates/*.html are plain HTML+GSAP (HyperFrames compositions); tweak them freely.

Configuration

Pipeline settings live in ~/.vee/config.yaml for the global install, or in the repo-local config.yaml in a source checkout. Every threshold is documented inline: silence gap, cut padding, PiP size/corner, loudness target, shorts length, review gate, direction on/off, QC strictness.

Local secrets and model routing live in ~/.vee/.env for the global install, or in the repo-local .env in a source checkout. Key controls:

VEE_LLM_PROVIDER=local|openrouter|hybrid|codex|claude
VEE_OPENROUTER_API_KEY=...
VEE_LLM_MODEL_DIRECTION, VEE_LLM_MODEL_RETAKES, VEE_LLM_MODEL_REASONING, VEE_LLM_MODEL_VISION, VEE_LLM_MODEL_QC
VEE_LLM_PROVIDER_DIRECTION, VEE_LLM_PROVIDER_VISION, etc. for per-task provider overrides.

When using VEE_LLM_PROVIDER=claude, route the VISION and QC tasks to local, openrouter, or codex with the per-task provider overrides (for example VEE_LLM_PROVIDER_VISION); the Claude CLI backend here is text-only.
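One plausible reading of the variables above is that a per-task override wins over the global provider. The sketch below encodes that reading; it is an assumption about precedence, not vee's actual loader. `resolve_provider` and `text_only_conflicts` are hypothetical names, the `local` fallback reflects the stated local-first default, and `VEE_LLM_PROVIDER_QC` is inferred from the naming pattern rather than listed explicitly.

```python
import os


def resolve_provider(task, env=None):
    """Pick the provider for one task: per-task override first, then global.

    Assumed precedence; falls back to "local" when nothing is set.
    """
    env = os.environ if env is None else env
    override = env.get(f"VEE_LLM_PROVIDER_{task.upper()}")
    if override:
        return override
    return env.get("VEE_LLM_PROVIDER", "local")


def text_only_conflicts(env=None):
    """List image-dependent tasks that would still be routed to the text-only Claude CLI."""
    return [t for t in ("vision", "qc") if resolve_provider(t, env) == "claude"]


# Example: Claude for text tasks, vision moved to a local model, QC forgotten.
env = {"VEE_LLM_PROVIDER": "claude", "VEE_LLM_PROVIDER_VISION": "local"}
print(resolve_provider("direction", env))  # claude
print(resolve_provider("vision", env))     # local
print(text_only_conflicts(env))            # ['qc']
```

A check like `text_only_conflicts` is the kind of thing worth running before a long job, since a misrouted vision task would otherwise only surface mid-pipeline.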

Architecture notes

Timebases. Everything (transcript, EDL, layouts) lives in the master audio timebase; SyncOffsets maps it into each source video (source_t = master_t + offset). The model reasons in the edited timeline; fullscreen cards shift downstream times via RenderPlan.to_final.
Local-first. Whisper/pyannote telemetry is disabled; GSAP is vendored; SFX are synthesized with ffmpeg. In local mode, the LLM path stays on your Mac. In openrouter/hybrid modes, only configured LLM calls leave the machine.
Memory budget (32 GB). Local LLMs and WhisperX never need to stay loaded together — keep_alive: 0 unloads Ollama models between stages.
Crash-safe. Every stage checkpoints to work/state.json; vee resume (or the daemon) continues from the failed stage. Media paths are stored relative so folder retagging never breaks resume.
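The timebase rule above (source_t = master_t + offset) can be illustrated with a few lines of code. This is a sketch only: `SyncOffsetsSketch` is a hypothetical stand-in, since the interface of the real SyncOffsets class is not shown here, and the offset values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncOffsetsSketch:
    """Per-source offsets in seconds relative to the master audio (illustrative)."""
    screen: float = 0.0
    camera: float = 0.0

    def to_source(self, source, master_t):
        # source_t = master_t + offset, as stated in the timebase note
        return master_t + getattr(self, source)

    def cut_in_source(self, source, cut):
        """Map a (start, end) cut from master time into one source's time."""
        start, end = cut
        return (self.to_source(source, start), self.to_source(source, end))


offsets = SyncOffsetsSketch(screen=0.0, camera=1.25)
cut = (10.0, 12.5)  # a removed segment, expressed in master time

print(offsets.cut_in_source("screen", cut))  # (10.0, 12.5)
print(offsets.cut_in_source("camera", cut))  # (11.25, 13.75)
```

Keeping every cut in master time and converting only at the edge is what lets one edit decision apply to screen and camera together.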

Tests

.venv/bin/python -m pytest -q            # unit tests (fast)
.venv/bin/python -m pytest tests/test_e2e.py  # full pipeline on a synthetic
                                              # TTS fixture (~1 min, needs `say`)
npm pack --dry-run                       # verify global npm package contents

Maintainer release

npm pack --dry-run
npm publish --access public
npm view @vindepemarte/vee version