@vindepemarte/vee
v0.1.3
Local-first AI video editing engine with OpenRouter/local/CLI model routing
vee — Video Editing Engine
A local-first, AI-driven video production suite for macOS (Apple Silicon).
Drop a raw screen recording (plus optional camera video and microphone track)
into a folder — get back a polished, publish-ready YouTube video, vertical
short-form clips, and a timestamped markdown transcript. The default LLM path
is fully local; .env can route editorial reasoning through OpenRouter or the
logged-in Codex/Claude CLIs when you want stronger models.
projects/my-tutorial_undone/      ──►   projects/my-tutorial_done/
├── screen.mp4  (required)              └── output/
├── camera.mp4  (optional)                  ├── final_youtube.mp4
├── voice.mp3   (optional)                  ├── shorts/clip_01.mp4 …
└── notes.md    (optional)                  ├── edited_transcribed_video.md
                                            ├── metadata.md (title/desc/tags/chapters)
                                            ├── edit_plan.md
                                            └── qc_report.md
What it does
- Ingest & sync — classifies your files, cross-correlates audio to lock screen/camera/voice in sync even if recordings started at different times.
- Voice enhancement — DeepFilterNet denoise → EQ/compression ("body") → two-pass EBU R128 loudness normalization to −14 LUFS.
- Transcription — WhisperX large-v3 with word-level timestamps.
- Cutting — silences, filler words (um/uh), repeated takes and false starts. The configured LLM adjudicates ambiguous retakes; cuts are applied frame-accurately to screen AND camera together.
- Scene understanding — PySceneDetect + configured vision LLM labels what's on screen (code editor, terminal, browser, slides…).
- AI direction — the configured direction model decides camera layouts (fullscreen / square PiP / hidden), title & chapter cards, text callouts, transition SFX, YouTube chapters, title, description and tags. Constrained to your brand kit's template catalog — it fills slots, it never free-forms design.
- Asset generation — HyperFrames renders brand-kit HTML templates to video (ProRes alpha for overlays); ffmpeg fallback if node is unavailable.
- Render — segment-accurate cuts, rounded square camera bubble, burned brand-styled captions, SFX ducked under voice, loudness normalized on the edited timeline. Hardware encoding (VideoToolbox).
- Shorts — 1–3 self-contained 9:16 highlights with big captions and a hook title.
- QC — duration/loudness/mid-word-cut/black-frame checks + vision LLM spot-checks frames; report in qc_report.md.
- Deliverables — edited transcript markdown (new timestamps, layout log, removed-segment list), YouTube metadata, chapters.
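The cross-correlation idea behind the ingest & sync step can be sketched in a few lines. This is a brute-force, standard-library illustration only, not vee's actual implementation: the function name `estimate_offset`, the `max_lag` parameter, and the toy sample lists are all hypothetical, and a real pipeline would more likely run an FFT-based correlation over downsampled audio.

```python
import math


def estimate_offset(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive lag means the shared content appears `lag` samples later in
    `other` than in `reference`. (Hypothetical helper for illustration.)
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlation score: sum of products over the overlapping samples.
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: `other` is `reference` delayed by 3 samples.
reference = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0, 0, 0]
other = [0, 0, 0] + reference[:-3]

lag = estimate_offset(reference, other, max_lag=5)
sample_rate = 16_000  # assumed sample rate, only used to convert to seconds
offset_seconds = lag / sample_rate
```

Dividing the winning lag by the sample rate gives an offset in seconds, which is the kind of per-source value a sync step would store for later use.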
Install (global)
npm install -g @vindepemarte/vee
vee setup
vee
The global command creates an isolated user workspace:
~/.vee/
├── .env # provider, API keys, per-task model choices
├── config.yaml # thresholds, web port, projects_dir
├── brandkit/ # brand.yaml + editable HTML templates
├── projects/ # default watched project folder
└── .venv/ # Python runtime managed by uv
On startup vee checks npm for a newer @vindepemarte/vee release. If one
exists, it asks before running npm install -g @vindepemarte/vee@latest.
Use vee update to check manually, vee --no-update-check for one run, or
VEE_NO_UPDATE_CHECK=1 for unattended launches.
vee setup asks for provider mode (local, openrouter, hybrid, codex,
or claude), OpenRouter API key when needed, model slugs/names, and basic
brand kit values. If a configured local Ollama model is missing, setup offers
to pull it.
Source/dev setup
# 1. system deps
brew install ollama ffmpeg # ollama ≥ 0.30 (or the Ollama.app)
ollama pull gemma4:12b
# 2. full ffmpeg (homebrew's ffmpeg 8.x lacks libass → no caption burning)
# put a full static build in ~/.vee/bin (e.g. from ffmpeg.martin-riedl.de):
mkdir -p ~/.vee/bin # engine prefers ~/.vee/bin/ffmpeg automatically
# (or: export VEE_FFMPEG=/path/to/full/ffmpeg)
# 3. python env
uv sync --extra asr --extra enhance --extra dev
uv pip install -e .
# 4. optional stronger LLM routing
cp .env.example .env
# edit .env: VEE_LLM_PROVIDER=local|openrouter|hybrid|codex|claude
# for OpenRouter: set VEE_OPENROUTER_API_KEY and per-task model slugs
# 5. verify everything
.venv/bin/vee doctor
Usage
vee daemon # watch projects dir + dashboard at localhost:8765
vee run <folder> # process one folder now (terminal progress bar)
vee resume <folder> # continue after a failure, from the last checkpoint
vee review <folder> # approve a paused edit plan and continue
vee stats # learned per-stage speed (drives the ETAs)
vee doctor # environment check
Folder tags (the contract)
| tag | meaning |
|---|---|
| _undone | queued — daemon picks it up when file sizes settle |
| _undone_review | queued, but pause after the edit plan for approval |
| _processing | currently running |
| _review | review-gated: queued before the edit plan, waiting for approval after it |
| _done | finished — everything in output/ |
| _failed | failed — see output/error_report.md, fix, vee resume |
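Because the tag is just a folder-name suffix, the contract above can be read mechanically. A minimal sketch, assuming the tag is always the final suffix of the folder name; `TAGS`, `split_tag`, and `retag` are hypothetical names, not part of vee's API:

```python
# Longest suffixes first, so "_undone_review" is not mistaken for "_review".
TAGS = ("_undone_review", "_processing", "_undone", "_review", "_failed", "_done")


def split_tag(folder_name):
    """Split 'my-tutorial_undone' into ('my-tutorial', '_undone')."""
    for tag in TAGS:
        if folder_name.endswith(tag):
            return folder_name[: -len(tag)], tag
    return folder_name, None  # untagged folder


def retag(folder_name, new_tag):
    """Return the folder name with its current tag (if any) replaced."""
    base, _ = split_tag(folder_name)
    return base + new_tag


base, tag = split_tag("my-tutorial_undone_review")
finished = retag("my-tutorial_processing", "_done")
```

Checking the longer suffix first matters: a plain `endswith("_review")` test would misclassify a folder tagged `_undone_review`.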
The dashboard (http://localhost:8765)
Live progress bar with the current stage in plain words, %, and an ETA learned from your machine's actual throughput. Also: queue, output previews (final + shorts players), edit-plan review/approve, brand-kit and config editors.
Brand kit
Edit ~/.vee/brandkit/brand.yaml for the global install, or
brandkit/brand.yaml in a source checkout. Every title card, chapter card,
callout and caption follows it. Templates in brandkit/templates/*.html are
plain HTML+GSAP (HyperFrames compositions); tweak them freely.
Configuration
~/.vee/config.yaml for the global install, or repo-local config.yaml in a
source checkout — every threshold is documented inline: silence gap, cut
padding, PiP size/corner, loudness target, shorts length, review gate,
direction on/off, QC strictness.
~/.vee/.env for the global install, or repo-local .env in a source
checkout — local secrets and model routing. Key controls:
- VEE_LLM_PROVIDER=local|openrouter|hybrid|codex|claude
- VEE_OPENROUTER_API_KEY=...
- VEE_LLM_MODEL_DIRECTION, VEE_LLM_MODEL_RETAKES, VEE_LLM_MODEL_REASONING, VEE_LLM_MODEL_VISION, VEE_LLM_MODEL_QC
- VEE_LLM_PROVIDER_DIRECTION, VEE_LLM_PROVIDER_VISION, etc. for per-task provider overrides.
When using VEE_LLM_PROVIDER=claude, route VISION/QC to local,
openrouter, or codex; the Claude CLI backend here is text-only.
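One way to picture how the per-task overrides interact with the global provider is the sketch below. It assumes a per-task variable wins over VEE_LLM_PROVIDER and that an unset provider means local; the function `resolve_provider` and the choice to raise an error for an image task routed to claude are hypothetical, and vee may handle that case differently.

```python
def resolve_provider(task, env):
    """Pick the provider for one task (e.g. "VISION"), falling back to the global one.

    `env` is any mapping of environment variables, such as os.environ.
    """
    default = env.get("VEE_LLM_PROVIDER", "local")  # assumed default: local
    provider = env.get(f"VEE_LLM_PROVIDER_{task}", default)
    # The Claude CLI backend is described as text-only, so image tasks
    # need a different provider. Raising here is an illustrative choice.
    if provider == "claude" and task in ("VISION", "QC"):
        raise ValueError(f"{task} needs an image-capable provider, not claude")
    return provider


env = {
    "VEE_LLM_PROVIDER": "claude",
    "VEE_LLM_PROVIDER_VISION": "openrouter",
    "VEE_LLM_PROVIDER_QC": "local",
}
direction = resolve_provider("DIRECTION", env)  # falls back to the global provider
vision = resolve_provider("VISION", env)        # uses the per-task override
```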
Architecture notes
- Timebases. Everything (transcript, EDL, layouts) lives in the master audio timebase; SyncOffsets maps it into each source video (source_t = master_t + offset). The model reasons in the edited timeline; fullscreen cards shift downstream times via RenderPlan.to_final.
- Local-first. Whisper/pyannote telemetry is disabled; GSAP is vendored; SFX are synthesized with ffmpeg. In local mode, the LLM path stays on your Mac. In openrouter/hybrid modes, only configured LLM calls leave the machine.
- Memory budget (32 GB). Local LLMs and WhisperX never need to stay loaded together: keep_alive: 0 unloads Ollama models between stages.
- Crash-safe. Every stage checkpoints to work/state.json; vee resume (or the daemon) continues from the failed stage. Media paths are stored relative so folder retagging never breaks resume.
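The timebase rule in the first note (source_t = master_t + offset) can be made concrete with a small sketch. Only the name SyncOffsets and the formula come from the notes above; the field names, the `to_source` method, and the `master_to_edited` helper are assumptions for illustration and may not match the real classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncOffsets:
    # Hypothetical fields: per-source offsets in seconds, relative to master audio.
    screen: float = 0.0
    camera: float = 0.0

    def to_source(self, master_t, source):
        """Apply source_t = master_t + offset for the named source."""
        return master_t + getattr(self, source)


def master_to_edited(master_t, kept_segments):
    """Map a master time into the edited timeline.

    `kept_segments` is a sorted list of (start, end) master-time spans that
    survive cutting. Returns None if master_t falls inside a removed span.
    """
    elapsed = 0.0
    for start, end in kept_segments:
        if start <= master_t < end:
            return elapsed + (master_t - start)
        elapsed += end - start
    return None


offsets = SyncOffsets(screen=0.0, camera=1.25)
camera_t = offsets.to_source(10.0, "camera")

kept = [(0.0, 4.0), (6.0, 10.0)]         # the span 4.0-6.0 was cut
edited_t = master_to_edited(7.0, kept)   # 4.0 s kept before it, plus 1.0 s into the span
```

Keeping every decision in one master timebase means a cut is expressed once and then projected into each source, which is what lets screen and camera stay aligned after editing.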
Tests
.venv/bin/python -m pytest -q # unit tests (fast)
.venv/bin/python -m pytest tests/test_e2e.py # full pipeline on a synthetic
# TTS fixture (~1 min, needs `say`)
npm pack --dry-run # verify global npm package contents
Maintainer release
npm pack --dry-run
npm publish --access public
npm view @vindepemarte/vee version