@mylesiyabor/agent-loom
v0.1.4
agent-loom
AI-generated narrated screen recordings. Point it at a URL or topic — get a video narrated in your cloned voice.
Like Loom, but the AI does the talking.
Install
npm install -g agent-loom
Quick Start
# Interactive setup — installs voice engine, clones your voice
agent-loom init
# Record a website walkthrough
agent-loom record https://example.com -o demo.mp4
# Record a research briefing
agent-loom research "AI agent market in 2026" -o briefing.mp4
How It Works
- Capture — Headless Chrome takes screenshots via CDP as it scrolls through the page
- Narrate — LLM writes casual, conversational narration based on page content
- Speak — LuxTTS generates speech in your cloned voice (~0.8s on Apple Silicon)
- Composite — ffmpeg stitches frames + audio into an MP4
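The Composite step can be pictured as pairing each captured frame with the length of its narration clip. The sketch below is illustrative and not taken from recorder.mjs: it assumes one PNG per section and a known audio duration per section, and builds an input list for ffmpeg's concat demuxer. The `buildConcatList` name and the shape of `segments` are hypothetical.

```javascript
// Hypothetical helper: build an ffmpeg concat-demuxer list from frames and
// narration durations. Assumes file paths contain no single quotes.
function buildConcatList(segments) {
  if (segments.length === 0) {
    throw new Error("at least one segment is required");
  }
  const lines = ["ffconcat version 1.0"];
  for (const { frame, seconds } of segments) {
    lines.push(`file '${frame}'`);
    lines.push(`duration ${seconds.toFixed(3)}`);
  }
  // The final entry's duration is commonly ignored by the concat demuxer
  // unless the file is listed once more, so repeat the last frame.
  lines.push(`file '${segments[segments.length - 1].frame}'`);
  return lines.join("\n") + "\n";
}

const list = buildConcatList([
  { frame: "frames/0001.png", seconds: 4.2 },
  { frame: "frames/0002.png", seconds: 6.75 },
]);
console.log(list);
```

A list like this could be written to a file and combined with the narration track along the lines of `ffmpeg -f concat -safe 0 -i list.txt -i narration.wav -c:v libx264 -pix_fmt yuv420p -c:a aac -shortest demo.mp4`. The flags agent-loom actually passes are not documented here.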
Voice Cloning
agent-loom init
Opens a browser page where you read a prompt for ~8 seconds. That audio becomes your voice reference. LuxTTS clones your voice from that single sample.
No voice data leaves your machine. Everything runs locally.
To re-record:
agent-loom voice
Modes
record — Website Walkthrough
Points Chrome at a URL, scrolls through sections, captures screenshots, generates natural narration about what's visible.
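One way to picture "scrolls through sections" is as a list of scroll offsets, one per screenshot. The sketch below is an assumption about how such offsets could be computed: one viewport-sized step at a time, with the last step clamped to the bottom of the page. agent-loom may instead split on page sections; `scrollOffsets` is a hypothetical name, and 1080 is the default --height.

```javascript
// Hypothetical helper: scroll offsets (in px) for viewport-sized steps.
function scrollOffsets(pageHeight, viewportHeight) {
  if (viewportHeight <= 0) throw new Error("viewportHeight must be positive");
  const maxOffset = Math.max(0, pageHeight - viewportHeight);
  const offsets = [];
  for (let y = 0; y < maxOffset; y += viewportHeight) {
    offsets.push(y);
  }
  // Always finish at the bottom of the page (also covers short pages).
  offsets.push(maxOffset);
  return offsets;
}

console.log(scrollOffsets(3000, 1080)); // [ 0, 1080, 1920 ]
```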
agent-loom record https://betterbotagent.com -o walkthrough.mp4
agent-loom record https://github.com/trending -o github.mp4
research — Topic Briefing
Takes a topic, researches it via LLM, creates presentation slides, narrates each section.
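Researching a topic through an LLM amounts to a chat-completion request against whichever provider is selected. The sketch below builds such a request for the two providers named in this README, using their public chat-completions endpoints. The prompt wording, the `buildChatRequest` name, and the model id in the usage are illustrative assumptions, not agent-loom internals.

```javascript
// Public chat-completions endpoints for the two supported providers.
const ENDPOINTS = {
  openrouter: "https://openrouter.ai/api/v1/chat/completions",
  openai: "https://api.openai.com/v1/chat/completions",
};

// Hypothetical helper: describe a request that asks for a slide outline.
function buildChatRequest({ provider = "openrouter", model, apiKey, topic }) {
  const url = ENDPOINTS[provider];
  if (!url) throw new Error(`unknown provider: ${provider}`);
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model,
        messages: [
          // Illustrative prompt; the real prompt is not documented here.
          { role: "system", content: "Outline a short briefing as titled slides." },
          { role: "user", content: topic },
        ],
      }),
    },
  };
}

const req = buildChatRequest({
  provider: "openai",
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY ?? "sk-placeholder",
  topic: "state of AI agents",
});
console.log(req.url);
// Sending it would be: const res = await fetch(req.url, req.init);
```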
agent-loom research "comparison of React vs Svelte in 2026" -o frameworks.mp4
agent-loom research "state of AI agents" -o agents.mp4
Options
--output, -o <file> Output filename (default: loom.mp4)
--width <px> Video width (default: 1920)
--height <px> Video height (default: 1080)
--model <model> LLM model override
--provider <name> openrouter (default) or openai
--no-voice Skip voice cloning, use default TTS
Environment Variables
# Pick one:
export OPENROUTER_API_KEY=sk-or-... # OpenRouter (default, 200+ models)
export OPENAI_API_KEY=sk-... # OpenAI
# Optional:
export VOICE_REF=/path/to/voice.wav # Custom voice reference
export LLM_MODEL=gpt-4o # Override model
export LLM_PROVIDER=openai # Override provider
Requirements
- Node.js 18+
- Python 3.9+ (for LuxTTS voice engine)
- ffmpeg: brew install ffmpeg
- Chrome or Chromium
agent-loom init handles Python dependencies automatically.
How Voice Cloning Works
Agent Loom uses LuxTTS, a fast voice cloning model that runs locally on your machine. From an 8-second voice sample, it generates speech that sounds like you.
On Apple Silicon (M1+), generation takes ~0.8 seconds per sentence. On CPU it's slower but still works.
The voice server runs on localhost:8777 and accepts POST /tts with { "text": "..." }.
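A client call to the voice server might look like the sketch below. Only the port, the path, and the `text` field come from the description above. The JSON content-type header is an assumption, and the response format is not documented here, so the usage comment assumes raw audio bytes. `buildTtsRequest` is a hypothetical name.

```javascript
// Hypothetical helper: describe the POST /tts request for one sentence.
function buildTtsRequest(text, baseUrl = "http://localhost:8777") {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  return {
    url: new URL("/tts", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    },
  };
}

const ttsRequest = buildTtsRequest("Welcome to the demo.");
console.log(ttsRequest.url, ttsRequest.init.body);
// With the server running (assumption: it replies with audio bytes):
//   const res = await fetch(ttsRequest.url, ttsRequest.init);
//   const audio = Buffer.from(await res.arrayBuffer());
```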
Architecture
agent-loom/
├── bin/cli.mjs # CLI entry point
├── lib/
│ ├── init.mjs # Interactive setup
│ ├── recorder.mjs # Core recording engine (CDP + LLM + TTS + ffmpeg)
│ └── voice-recorder.mjs # Browser-based voice clone UI
├── package.json
├── LICENSE # MIT
└── README.md
License
MIT
