@notedit/vtake

v1.0.1

Published

21 days ago

Local video takeaway generator (skill-driven)


leeoxiang

VTake

VTake turns a local video into a transcript, metadata, and an AI-composed, card-based takeaway video. The CLI handles the narrow, deterministic steps (audio extraction, transcription); a Claude Code skill drives the creative work (storyboard planning, per-card HTML, GSAP timeline); and hyperframes renders the final MP4.

The mainline product is local-first and skill-driven:

vtake npm CLI — extraction, transcription, and a doctor healthcheck
skills/vtake/SKILL.md — agent workflow that designs cards, writes HTML, and calls hyperframes to render
web/ — static landing page + thin /api/transcribe proxy on Cloudflare Workers

Install

npm install -g @notedit/vtake

From this repository:

npm install
npm run build
npm run dev:cli -- --help

CLI

vtake doctor                                  # check local prerequisites
vtake extract <video> --out-dir <dir>         # → metadata.json + audio.mp3
vtake transcribe <audio> --out-dir <dir>      # → transcript.json
                       [--asr elevenlabs]

Rendering and storyboard authoring are deliberately left out of the CLI: the skill writes the per-card HTML and an index.html composition in chat, then calls npx hyperframes render <dir> to produce the MP4. Run vtake doctor before starting to confirm that ffmpeg, ffprobe, the staged fonts, and the gsap.min.js vendor file are present.
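
As a rough illustration, the two CLI steps can be chained behind a doctor check. This sketch is not part of VTake: run_vtake_pipeline and the VTAKE_BIN override are hypothetical names, and it assumes vtake extract writes audio.mp3 directly into the --out-dir it is given, as the command summary above suggests.

```shell
# Hypothetical wrapper: doctor, then extract, then transcribe, stopping at the first failure.
# VTAKE_BIN is an illustrative override (useful for dry runs); vtake itself does not read it.
run_vtake_pipeline() {
  video=$1
  out_dir=$2
  vtake_bin=${VTAKE_BIN:-vtake}

  # Abort early if the local prerequisites are not in place.
  "$vtake_bin" doctor || return 1

  # Creating the directory up front is a precaution; it may not be required.
  mkdir -p "$out_dir" || return 1

  # Assumed outputs: metadata.json and audio.mp3 inside $out_dir.
  "$vtake_bin" extract "$video" --out-dir "$out_dir" || return 1

  # Assumed output: transcript.json inside $out_dir.
  "$vtake_bin" transcribe "$out_dir/audio.mp3" --out-dir "$out_dir"
}
```

Calling run_vtake_pipeline talk.mp4 ./work would leave all three files in ./work if those assumptions hold. Setting VTAKE_BIN=echo prints the commands instead of running them.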

Environment

ASR (vtake transcribe):

ELEVEN_API_KEY — direct ElevenLabs Scribe (no rate limit). When unset, the CLI falls back to https://vtake.app/api/transcribe, rate-limited to 3 requests per minute per IP.
VTAKE_TRANSCRIBE_ENDPOINT — override the proxy URL (e.g. for local Wrangler dev: http://localhost:8787/api/transcribe).
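
The selection between the two ASR paths can be pictured with a small shell function. This only mirrors the rules described above and is not the CLI's actual code. transcribe_route is a hypothetical name, and it is an assumption that a set ELEVEN_API_KEY takes precedence over VTAKE_TRANSCRIBE_ENDPOINT.

```shell
# Hypothetical helper: print which transcription path the environment selects.
transcribe_route() {
  if [ -n "${ELEVEN_API_KEY:-}" ]; then
    # Key present: call ElevenLabs Scribe directly (assumed to win over the proxy override).
    echo "direct"
  else
    # No key: use the proxy, honoring the override when it is set.
    echo "proxy ${VTAKE_TRANSCRIBE_ENDPOINT:-https://vtake.app/api/transcribe}"
  fi
}
```

Running transcribe_route before a long session is a quick way to see whether the 3-requests-per-minute proxy limit is likely to apply.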

Rendering (hyperframes render, run by the skill):

PRODUCER_BROWSER_GPU_MODE=hardware — strongly recommended on macOS.

The CLI loads .env from the repository root when running from this project.
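
A .env file at the repository root might look like the following. The key value is a placeholder, and it is an assumption that the CLI reads these variables from .env in the usual KEY=value form.

```shell
# .env at the repository root (placeholder values).
# Remove this line to fall back to the rate-limited proxy.
ELEVEN_API_KEY=your-elevenlabs-api-key

# Uncomment only when pointing the CLI at a non-default proxy, e.g. local Wrangler dev.
# VTAKE_TRANSCRIBE_ENDPOINT=http://localhost:8787/api/transcribe
```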

The mainline does not include Supabase, Redis, BullMQ, R2, auth, credits, upload queues, or rendering workers.

Skill Workflow

Use skills/vtake/SKILL.md when you want an agent to drive the card-based composition:

vtake extract — audio + metadata
vtake transcribe — word-level transcript
Agent corrects ASR errors in chat
Agent drafts a lightweight storyboard.json outline in chat (no CLI)
Agent writes one HTML fragment per card
Agent assembles public/index.html with the GSAP master timeline
npx hyperframes render public -o output.mp4

The storyboard, cards, and composition all live in the user's working directory; there is no central server, upload, or queue.
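
The final render step could be wrapped as below. This is a sketch, not the skill's own code: render_takeaway and NPX_BIN are hypothetical names, the index.html check reflects the workflow described above, and setting PRODUCER_BROWSER_GPU_MODE only on macOS is one reading of the "strongly recommended on macOS" note.

```shell
# Hypothetical wrapper around the render command from the workflow.
# NPX_BIN is an illustrative override for dry runs; hyperframes does not read it.
render_takeaway() {
  project_dir=${1:-public}
  output=${2:-output.mp4}
  npx_bin=${NPX_BIN:-npx}

  # The composition is expected at <project_dir>/index.html.
  if [ ! -f "$project_dir/index.html" ]; then
    echo "missing $project_dir/index.html" >&2
    return 1
  fi

  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS: request hardware GPU mode for this one command only.
    PRODUCER_BROWSER_GPU_MODE=hardware "$npx_bin" hyperframes render "$project_dir" -o "$output"
  else
    "$npx_bin" hyperframes render "$project_dir" -o "$output"
  fi
}
```

With no arguments, render_takeaway runs the same command as the last workflow step, npx hyperframes render public -o output.mp4.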

Cloudflare Workers Deployment

The hosted surface is intentionally tiny: web/out is served via Workers Static Assets and web/worker.ts exposes:

GET /api, GET /api/health, GET /api/version, GET /api/capabilities
POST /api/transcribe — authenticated thin relay to ElevenLabs Scribe with per-IP (3/min) and global (60/min) rate limiting
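
When several files go through the proxy, spacing the calls out on the client side helps stay under the per-IP limit. The sketch below assumes that each vtake transcribe call maps to one proxy request. transcribe_paced, VTAKE_BIN, SLEEP_BIN, and VTAKE_PROXY_PER_MINUTE are hypothetical names that vtake does not read.

```shell
# Hypothetical helper: transcribe several audio files, pausing between calls
# so that roughly 3 requests per minute reach the proxy.
transcribe_paced() {
  out_dir=$1
  shift
  vtake_bin=${VTAKE_BIN:-vtake}
  per_minute=${VTAKE_PROXY_PER_MINUTE:-3}

  # Seconds between calls: ceil(60 / per_minute), i.e. 20 for the default of 3.
  interval=$(( (60 + per_minute - 1) / per_minute ))

  first=1
  for audio in "$@"; do
    # Pause before every call except the first.
    if [ "$first" -eq 0 ]; then
      ${SLEEP_BIN:-sleep} "$interval"
    fi
    first=0

    # One output directory per input so transcripts do not overwrite each other.
    name=$(basename "$audio")
    "$vtake_bin" transcribe "$audio" --out-dir "$out_dir/${name%.*}" || return 1
  done
}
```

For example, transcribe_paced transcripts a.mp3 b.mp3 would write into transcripts/a and transcripts/b, with a 20-second pause between the two calls. With ELEVEN_API_KEY set, the proxy limit does not apply and the pause is unnecessary.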

Use Node.js 22+ for Wrangler 4.

cd web
npm install
npm run build
npm run dev:worker
npm run deploy

web/wrangler.jsonc maps the Worker to the Custom Domain vtake.app. See docs/cloudflare-workers.md for the deployment plan.

Development

npm run build
npm test

cd web
npm install
npm run typecheck
npm run build
