# videostil

v0.3.5 — Convert videos into LLM-friendly input by extracting and deduplicating frames.
Videos are large. LLM context windows are small. You need a way to "compress" videos so they fit. We've got you.
videostil extracts and deduplicates video frames, converting videos into LLM-friendly formats that fit within context windows.
## Features
- 🎬 Extract frames at configurable FPS (default: 25)
- 🔍 Smart deduplication (3 algorithms: greedy, dynamic programming, sliding window)
- 📦 Simple API and CLI
- ⚡ Fast processing with caching
- 🎯 Absolute frame indexing preserves video timeline
- 🖼️ Automatic scaling and padding to 1280x720
- 🌐 Built-in analysis viewer server
- 📊 Frame similarity analysis and visualization
- 🤖 Optional LLM-powered video analysis (supports Anthropic, Google, OpenAI)
## Installation

```shell
npm install videostil
```

Requirements: `ffmpeg` and `ffprobe` must be installed on your system.

```shell
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html
```

## Quick Start
### CLI Usage

```shell
# Extract frames from a video (automatically opens viewer)
npx videostil https://example.com/video.mp4

# With custom parameters
npx videostil video.mp4 --fps 20 --threshold 0.01 --algo dp

# Extract without opening viewer
npx videostil video.mp4 --no-serve

# Start the viewer server for existing analyses
npx videostil serve
npx videostil serve --port 8080 --no-open
```

### API Usage
```typescript
import { extractUniqueFrames, startAnalysisServer, analyseFrames } from 'videostil';

// Extract and deduplicate frames
const result = await extractUniqueFrames({
  videoUrl: 'https://example.com/video.mp4',
  fps: 25,
  threshold: 0.001,
  algo: 'gd' // greedy (default), or 'dp', 'sw'
});

console.log(`Extracted ${result.uniqueFrames.length} unique frames`);
console.log(`Video duration: ${result.videoDurationSeconds}s`);

// Access frame data
for (const frame of result.uniqueFrames) {
  console.log(`Frame ${frame.index} at ${frame.timestamp}`);
  console.log(`Path: ${frame.path}`);
  console.log(`Base64: ${frame.base64.substring(0, 50)}...`);
}

// Optional: Analyze frames with an LLM
const analysisResult = await analyseFrames({
  selectedModel: 'claude-sonnet-4-6',
  workingDirectory: result.uniqueFramesDir,
  frameBatch: result.uniqueFrames.map(frame => ({
    name: frame.fileName,
    contentType: 'image/png',
    base64Data: frame.base64
  })),
  systemPrompt: 'You are a video analysis assistant.',
  initialUserPrompt: 'Analyze these video frames and provide insights.',
  apiKeys: {
    ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY,
    GOOGLE_API_KEY: process.env.GOOGLE_API_KEY,
    OPENAI_API_KEY: process.env.OPENAI_API_KEY
  }
});

console.log(analysisResult.analysis);

// Start the analysis viewer server
const server = await startAnalysisServer({
  port: 63745,
  openBrowser: true
});

console.log(`Server running at ${server.url}`);
```

## API Reference
### extractUniqueFrames(options)

Extract and deduplicate frames from a video.

Options:

- `videoUrl` (string, required): Video URL or local file path
- `fps` (number, default: 25): Frames per second to extract
- `threshold` (number, default: 0.001): Deduplication similarity threshold (0.0-1.0)
- `startTime` (string, optional): Start extraction at the specified time in MM:SS format (e.g., "1:30")
- `duration` (number, optional): Extract only the specified duration in seconds
- `algo` (string, default: "gd"): Deduplication algorithm
  - `"gd"` - Greedy (fastest, compares with the previous frame only)
  - `"dp"` - Dynamic Programming (looks back at N frames, default 5)
  - `"sw"` - Sliding Window (maintains a rolling window, default 3 frames)
- `workingDir` (string, optional): Absolute path for the output directory (default: `~/.videostil/{hash}`)

Returns: `ExtractResult`

- `totalFramesCount` (number): Total frames before deduplication
- `uniqueFrames` (`FrameInfo[]`): Array of unique frames
- `videoDurationSeconds` (number): Video duration in seconds
- `uniqueFramesDir` (string): Path to the unique frames directory
### startAnalysisServer(options)

Start the analysis viewer server to browse and visualize extracted frames.

Options:

- `port` (number, default: 63745): Server port
- `host` (string, default: "127.0.0.1"): Server host
- `openBrowser` (boolean, default: true): Automatically open the browser
- `workingDir` (string, optional): Custom directory to serve analyses from (default: `~/.videostil/`)

Returns: `ServerHandle`

- `url` (string): Server URL
- `port` (number): Server port
- `host` (string): Server host
- `server` (`http.Server`): Node.js HTTP server instance
- `close` (`() => Promise`): Function to close the server
### analyseFrames(options)

Analyze video frames using AI models from Anthropic, Google, or OpenAI.

Options:

- `selectedModel` (string, required): Model to use (e.g., "claude-sonnet-4-6", "gpt-5-2025-08-07", "gemini-2.5-pro")
- `workingDirectory` (string, required): Directory containing the frames
- `frameBatch` (`Attachment[]`, required): Array of frame attachments with base64 data
- `systemPrompt` (string, required): System prompt for the AI
- `initialUserPrompt` (string, required): User prompt for the AI
- `apiKeys` (object, optional): API keys object with `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, and/or `OPENAI_API_KEY`

Returns: `AnalyseFramesResult`

- `analysis` (string): Raw analysis text from the AI
- `allMessages` (`CanonicalMessage[]`): All messages in the conversation
- `parsedXml` (`VideoAnalysisSection[]`): Parsed XML sections from the analysis
- `keyFrameImagesMap` (Map): Map of key frame images

Note: At least one API key must be provided, either in the `apiKeys` parameter or as environment variables.
### Frame Information

Each frame in `uniqueFrames` contains:

- `index` (number): Absolute frame position in the original video
- `path` (string): Local file path to the frame image
- `fileName` (string): Normalized filename (e.g., "frame_000123.png")
- `base64` (string): Base64-encoded PNG image data
- `timestamp` (string): Human-readable timestamp (e.g., "1m23s")
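Since `index` is an absolute frame position, you can recover a frame's time offset from it and the extraction fps. A minimal sketch (the `frameTimestamp` helper is hypothetical, not part of the videostil API):

```typescript
// Derive a human-readable "1m23s"-style timestamp from an absolute
// frame index and the fps used for extraction (hypothetical helper).
function frameTimestamp(index: number, fps: number): string {
  const totalSeconds = Math.floor(index / fps);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m${seconds}s` : `${seconds}s`;
}

console.log(frameTimestamp(2075, 25)); // frame 2075 at 25 fps -> "1m23s"
```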
## CLI Reference

### Extract Command

```shell
videostil <video-url> [options]
```

Options:

```
--fps <number>         Frames per second (default: 25)
--threshold <number>   Deduplication threshold (default: 0.01)
--algo <string>        Algorithm: gd|dp|sw (default: gd)
--start <MM:SS>        Start time in video (format: MM:SS, e.g., 1:30)
--duration <seconds>   Duration to extract in seconds
--output <dir>         Output directory
--no-serve             Don't start the viewer after extraction
--model <string>       LLM model for analysis (e.g., claude-sonnet-4-6)
--system-prompt <str>  System prompt for LLM analysis
--user-prompt <str>    User prompt for LLM analysis
--help                 Show help
--version              Show version
```

Note: LLM analysis requires API keys. Set the ANTHROPIC_API_KEY, GOOGLE_API_KEY, or OPENAI_API_KEY environment variable to enable automatic frame analysis.
### Serve Command

```shell
videostil serve [options]
```

Options:

```
--port <number>  Server port (default: 63745)
--host <string>  Server host (default: 127.0.0.1)
--no-open        Don't open the browser automatically
--help           Show help
```

## How It Works
1. Download/Copy Video: Fetches the video from a URL or copies a local file
2. Extract Frames: Uses ffmpeg to extract frames at the specified FPS
3. Scale & Pad: Normalizes all frames to 1280x720 (maintaining aspect ratio)
4. Deduplicate: Removes duplicate frames using the selected algorithm
5. Store Results: Saves unique frames and metadata
6. Analyze (Optional): If API keys are available, analyzes frames using an LLM
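The extract and scale-and-pad steps map onto a standard ffmpeg filter chain. This sketch shows how such an invocation could be assembled; it is illustrative only, and the exact flags videostil uses internally may differ:

```typescript
// Build an ffmpeg argument list that extracts frames at a given fps and
// normalizes them to 1280x720 (illustrative sketch, not videostil's code).
function buildExtractArgs(input: string, fps: number, outDir: string): string[] {
  const filter = [
    `fps=${fps}`,
    // Fit inside 1280x720 while preserving the aspect ratio...
    'scale=1280:720:force_original_aspect_ratio=decrease',
    // ...then pad the remainder with centered borders.
    'pad=1280:720:(ow-iw)/2:(oh-ih)/2',
  ].join(',');
  return ['-i', input, '-vf', filter, `${outDir}/frame_%06d.png`];
}

console.log(buildExtractArgs('video.mp4', 25, 'frames').join(' '));
```

The `%06d` pattern matches the zero-padded `frame_000000.png` filenames described below.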
## Deduplication Algorithms

### Greedy (gd)

- Compares each frame only with the previous unique frame
- Fastest, lowest memory usage
- Best for videos with gradual scene changes

### Dynamic Programming (dp)

- Compares each frame with the previous N unique frames (lookback window)
- Default lookback: 5 frames
- Better for detecting duplicates across brief scene interruptions

### Sliding Window (sw)

- Maintains a rolling window of recent unique frames
- Default window size: 3 frames
- Good balance between accuracy and performance
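To make the greedy strategy concrete: keep a frame only when it differs from the last kept frame by more than the threshold. The similarity metric below (normalized mean absolute pixel difference) is an assumption for illustration; videostil's internal metric may differ:

```typescript
type Pixels = Uint8Array; // raw grayscale or flattened RGB pixel data

// Normalized mean absolute difference between two equally-sized frames,
// in [0, 1]. 0 means identical; 1 means maximally different.
function frameDiff(a: Pixels, b: Pixels): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / (a.length * 255);
}

// Greedy ("gd") deduplication sketch: returns the indices of kept frames.
function greedyDedup(frames: Pixels[], threshold: number): number[] {
  const kept: number[] = [];
  let last: Pixels | null = null;
  for (let i = 0; i < frames.length; i++) {
    if (last === null || frameDiff(frames[i], last) > threshold) {
      kept.push(i);
      last = frames[i];
    }
  }
  return kept;
}
```

The `dp` and `sw` variants differ only in what they compare against: a lookback list of the last N kept frames, or a rolling window, instead of the single `last` frame.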
## Output Structure

```
~/.videostil/{hash}/
├── video_{hash}.webm        # Downloaded video
├── frames/                  # All extracted frames
│   └── frame_000000.png
├── unique_frames/           # Deduplicated frames only
│   └── frame_000000.png
├── analysis-result.json     # Analysis metadata
└── frame-diff-data.json     # Cached similarity data
```

## Examples
### Extract Specific Time Range

```typescript
import { extractUniqueFrames } from 'videostil';

// Extract 30 seconds starting at 1 minute
const result = await extractUniqueFrames({
  videoUrl: 'video.mp4',
  startTime: '1:00',
  duration: 30,
  fps: 30
});
```

### Use Different Algorithms
```typescript
import { extractUniqueFrames } from 'videostil';

// Try each algorithm and compare results
const algorithms = ['gd', 'dp', 'sw'] as const;

for (const algo of algorithms) {
  const result = await extractUniqueFrames({
    videoUrl: 'video.mp4',
    algo,
    threshold: 0.01
  });
  console.log(`${algo}: ${result.uniqueFrames.length} unique frames`);
}
```

### Custom Working Directory
```typescript
import { extractUniqueFrames } from 'videostil';
import path from 'path';

const result = await extractUniqueFrames({
  videoUrl: 'video.mp4',
  workingDir: path.join(process.cwd(), 'my-frames')
});
```

### Analysis Viewer Server
```typescript
import { startAnalysisServer } from 'videostil';

// Start the server with custom options
const server = await startAnalysisServer({
  port: 8080,
  host: '0.0.0.0',
  openBrowser: false
});

console.log(`Analysis viewer: ${server.url}`);

// Close the server when done
await server.close();
```

### LLM Frame Analysis
```typescript
import { extractUniqueFrames, analyseFrames } from 'videostil';

// First extract frames
const result = await extractUniqueFrames({
  videoUrl: 'video.mp4',
  fps: 25,
  threshold: 0.01
});

// Analyze with Claude
const analysis = await analyseFrames({
  selectedModel: 'claude-sonnet-4-6',
  workingDirectory: result.uniqueFramesDir,
  frameBatch: result.uniqueFrames.map(frame => ({
    name: frame.fileName,
    contentType: 'image/png',
    base64Data: frame.base64
  })),
  systemPrompt: 'Analyze these video frames for key events and transitions.',
  initialUserPrompt: 'What are the main scenes in this video?'
});

console.log(analysis.analysis);
```

## Contributing
Contributions are welcome! Please see the GitHub repository for guidelines.
## License
MIT
