@mylesiyabor/agent-loom

v0.1.4

Published

2 months ago

mylesiyabor

ai loom screen-recording video voice-cloning tts narration agent cdp chrome

agent-loom

AI-generated narrated screen recordings. Point it at a URL or topic — get a video narrated in your cloned voice.

Like Loom, but the AI does the talking.

Install

npm install -g agent-loom

Quick Start

# Interactive setup — installs voice engine, clones your voice
agent-loom init

# Record a website walkthrough
agent-loom record https://example.com -o demo.mp4

# Record a research briefing
agent-loom research "AI agent market in 2026" -o briefing.mp4

How It Works

Capture — Headless Chrome takes screenshots via CDP as it scrolls through the page
Narrate — LLM writes casual, conversational narration based on page content
Speak — LuxTTS generates speech in your cloned voice (~0.8s per sentence on Apple Silicon)
Composite — ffmpeg stitches frames + audio into an MP4
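
As a rough illustration of how the four stages above could fit together, here is a minimal sketch. Every name in it (`makeVideo`, `stages.capture`, `stages.narrate`, `stages.speak`, `stages.composite`) is hypothetical and chosen for illustration; the real logic lives in lib/recorder.mjs and may be organized differently.

```javascript
// Hypothetical orchestration of the four stages.
// None of these function names come from agent-loom itself.
async function makeVideo(url, stages) {
  // 1. Capture: one screenshot per scroll position
  const frames = await stages.capture(url);

  const segments = [];
  for (const frame of frames) {
    // 2. Narrate: text describing what is visible in this frame
    const text = await stages.narrate(frame);
    // 3. Speak: audio for that narration
    const audio = await stages.speak(text);
    segments.push({ frame, text, audio });
  }

  // 4. Composite: stitch frames and audio into a single output
  return stages.composite(segments);
}
```

Passing the stages in as an object keeps the sketch runnable with stubs, which is handy for trying the flow without Chrome, an LLM key, or ffmpeg installed.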

Voice Cloning

agent-loom init

Opens a browser page where you read a prompt for ~8 seconds. That audio becomes your voice reference. LuxTTS clones your voice from that single sample.

No voice data leaves your machine. Everything runs locally.

To re-record:

agent-loom voice

Modes

`record` — Website Walkthrough

Points Chrome at a URL, scrolls through its sections, captures screenshots, and generates natural narration about what's visible.

agent-loom record https://betterbotagent.com -o walkthrough.mp4
agent-loom record https://github.com/trending -o github.mp4
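
The scrolling step in `record` can be pictured as choosing a list of scroll positions and taking one screenshot at each. The helper below is a sketch under that assumption; `scrollStops` is a hypothetical name, and agent-loom may pick its stops differently (for example, by page section rather than by viewport height).

```javascript
// Hypothetical helper: scroll positions (in px) that cover a page of
// `pageHeight` using a viewport of `viewportHeight`, one screenshot per stop.
function scrollStops(pageHeight, viewportHeight) {
  if (viewportHeight <= 0) {
    throw new RangeError("viewportHeight must be positive");
  }
  // Furthest position we can scroll to; 0 when the page fits in one viewport
  const lastStop = Math.max(0, pageHeight - viewportHeight);
  const stops = [];
  for (let y = 0; y < lastStop; y += viewportHeight) {
    stops.push(y);
  }
  stops.push(lastStop); // final stop aligned with the bottom of the page
  return stops;
}
```

With the default 1080px viewport height, a 3000px page would give stops at 0, 1080, and 1920.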

`research` — Topic Briefing

Takes a topic, researches it via LLM, creates presentation slides, and narrates each section.

agent-loom research "comparison of React vs Svelte in 2026" -o frameworks.mp4
agent-loom research "state of AI agents" -o agents.mp4

Options

--output, -o <file>     Output filename (default: loom.mp4)
--width <px>            Video width (default: 1920)
--height <px>           Video height (default: 1080)
--model <model>         LLM model override
--provider <name>       openrouter (default) or openai
--no-voice              Skip voice cloning, use default TTS

Environment Variables

# Pick one:
export OPENROUTER_API_KEY=sk-or-...    # OpenRouter (default, 200+ models)
export OPENAI_API_KEY=sk-...           # OpenAI

# Optional:
export VOICE_REF=/path/to/voice.wav    # Custom voice reference
export LLM_MODEL=gpt-4o               # Override model
export LLM_PROVIDER=openai            # Override provider
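
To make the interplay of these variables concrete, here is a small sketch of how a tool might resolve them. `resolveLlmConfig` is a hypothetical function, and the precedence it applies (provider from `LLM_PROVIDER`, falling back to `openrouter`, then the matching API key) is an assumption based on the defaults listed above, not agent-loom's confirmed behavior.

```javascript
// Hypothetical resolver; the precedence shown is an assumption,
// not documented agent-loom behavior.
function resolveLlmConfig(env) {
  // The README lists openrouter as the default provider
  const provider = env.LLM_PROVIDER || "openrouter";
  if (provider !== "openrouter" && provider !== "openai") {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const keyName = provider === "openai" ? "OPENAI_API_KEY" : "OPENROUTER_API_KEY";
  const apiKey = env[keyName];
  if (!apiKey) {
    throw new Error(`${keyName} is not set`);
  }
  // `model` stays undefined when LLM_MODEL is not set
  return { provider, apiKey, model: env.LLM_MODEL };
}
```

Calling `resolveLlmConfig(process.env)` at startup would fail early with a readable message when the key for the selected provider is missing.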

Requirements

Node.js 18+
Python 3.9+ (for LuxTTS voice engine)
ffmpeg (with Homebrew: brew install ffmpeg)
Chrome or Chromium

agent-loom init handles Python dependencies automatically.

How Voice Cloning Works

Agent Loom uses LuxTTS, a fast voice cloning model that runs locally on your machine. From an 8-second voice sample, it generates speech that sounds like you.

On Apple Silicon (M1+), generation takes ~0.8 seconds per sentence. On CPU it's slower but still works.

The voice server runs on localhost:8777 and accepts POST /tts with { "text": "..." }.
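
For illustration, a request to the voice server might look like the sketch below. The URL and the `{ "text": "..." }` body come from the description above; the `Content-Type` header, the helper names `buildTtsRequest` and `speak`, and the assumption that the response body is raw audio bytes are not confirmed by the README. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Endpoint and body shape are from the README.
// The header and the response format are assumptions.
const TTS_URL = "http://localhost:8777/tts";

// Build the options object for fetch()
function buildTtsRequest(text) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}

// Send one sentence to the local voice server
async function speak(text) {
  const res = await fetch(TTS_URL, buildTtsRequest(text));
  if (!res.ok) {
    throw new Error(`TTS server responded with ${res.status}`);
  }
  // Assumption: the server answers with audio bytes
  return Buffer.from(await res.arrayBuffer());
}
```

The voice server has to be running for `speak` to succeed; `buildTtsRequest` on its own can be inspected without it.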

Architecture

agent-loom/
├── bin/cli.mjs           # CLI entry point
├── lib/
│   ├── init.mjs          # Interactive setup
│   ├── recorder.mjs      # Core recording engine (CDP + LLM + TTS + ffmpeg)
│   └── voice-recorder.mjs # Browser-based voice clone UI
├── package.json
├── LICENSE               # MIT
└── README.md

License

MIT
