Semantic Video
Intelligent video analysis using OpenAI vision models. Extract frames, generate descriptions, and search video content with natural language.
https://github.com/user-attachments/assets/c813edec-72f4-486e-b3c2-b4fa975ba844
Overview
Semantic Video provides concurrent video processing with AI-powered frame analysis:
- Concurrent batch processing with configurable parallelism
- Frame extraction and analysis using OpenAI vision models
- Natural language descriptions of video content
- Simple keyword search across analyzed videos
- Token usage tracking and cost estimation
Use Cases
- Content moderation and safety filtering
- Accessibility (automated alt-text and audio descriptions)
- Video search engines and content discovery
- Educational content analysis and indexing
- Security and surveillance monitoring
- Media asset management and tagging
- Sports and entertainment analytics
- Medical and scientific video analysis
- Quality assurance and compliance checking
- Automated video summarization and highlights
Important Notes
- Token estimates may vary by ±10-20% from actual usage
- Not all models have been extensively tested; start with gpt-5-nano or gpt-5-mini
- Search uses simple keyword matching; descriptions are provided as a foundation for building advanced semantic search
- Logger at normal/verbose levels shows detailed breakdowns of estimates, token usage, and costs
Installation
Prerequisites
- Node.js 16+
- FFmpeg installed (macOS: brew install ffmpeg, Ubuntu: sudo apt-get install ffmpeg)
- OpenAI API key
npm install semantic-video
Quick Start
import SemanticVideoClient from 'semantic-video';
const client = new SemanticVideoClient(process.env.OPENAI_API_KEY);
const frames = await client.analyzeVideo('video.mp4');
const results = client.searchAllVideos('person');
Usage Examples
Basic Analysis
import SemanticVideoClient from 'semantic-video';
const client = new SemanticVideoClient(apiKey);
// With defaults (10 frames, quality 10, 720p, gpt-5-nano)
const frames = await client.analyzeVideo('video.mp4');
// Custom parameters
await client.analyzeVideo(
'video.mp4',
20, // frames
'Describe the scene in detail', // prompt
10, // quality (2-31)
720, // scale
'gpt-5-nano' // model
);
Token Estimation with Logger
Enable the logger to see detailed breakdowns of estimates, usage, and costs. The client processes multiple videos concurrently:
const client = new SemanticVideoClient(apiKey, {
enabled: true,
level: 'normal' // Shows progress, estimates table, and summaries
}, 3); // Process up to 3 videos concurrently
const estimate = await client.estimateMultipleVideosTokens([
{ videoPath: 'video1.mp4', numPartitions: 10 },
{ videoPath: 'video2.mp4', numPartitions: 15 }
]);
// Logger displays detailed estimate table automatically
// Access programmatically:
if (estimate.grandTotal.estimatedCost < 1.0) {
await client.analyzeMultipleVideos([
{ videoPath: 'video1.mp4', numPartitions: 10 },
{ videoPath: 'video2.mp4', numPartitions: 15 }
]);
}
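Because estimates may vary by ±10-20% from actual usage (see Important Notes), it is safer to pad the estimate before comparing it to a budget. A minimal sketch; the 20% margin and the $1.00 budget here are arbitrary choices, not part of the library:
const SAFETY_MARGIN = 1.2; // estimates may run up to ~20% over actual usage
if (estimate.grandTotal.estimatedCost * SAFETY_MARGIN < 1.0) {
  // Even the padded worst case fits the budget
  await client.analyzeMultipleVideos([
    { videoPath: 'video1.mp4', numPartitions: 10 },
    { videoPath: 'video2.mp4', numPartitions: 15 }
  ]);
}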
Custom Prompts
// Content moderation
await client.analyzeVideo(
'upload.mp4',
20,
'Identify inappropriate, violent, or harmful content'
);
// Accessibility
await client.analyzeVideo(
'tutorial.mp4',
50,
'Describe all visual elements, text, and interactions for audio description',
5 // Higher quality for detail
);
Preventing Rate Limits
// Estimate before processing to avoid rate limits
const estimate = await client.estimateVideoTokens('large-video.mp4', 50);
// Check against your rate limit (e.g., 200k tokens/min for tier 1)
const RATE_LIMIT = 200000;
if (estimate.total.totalTokens > RATE_LIMIT) {
// Adjust parameters to stay under limit
const reducedFrames = Math.floor(RATE_LIMIT / estimate.perFrame.totalTokens);
console.log(`Reducing from 50 to ${reducedFrames} frames to avoid rate limit`);
await client.analyzeVideo('large-video.mp4', reducedFrames);
} else {
await client.analyzeVideo('large-video.mp4', 50);
}
Search
await client.analyzeVideo('video.mp4');
const results = client.searchAllVideos('person');
// Returns: [{ videoPath, frames: FrameData[] }]
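Since searchAllVideos only does keyword matching, the generated frame descriptions can serve as the foundation for real semantic search, as the Important Notes suggest. A sketch using OpenAI embeddings; it assumes each analyzed frame exposes a description string (the exact FrameData fields may differ), and the cosineSimilarity helper and text-embedding-3-small model choice are ours, not part of this library:
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Helper defined here for illustration: cosine similarity of two vectors
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
// Embed the query plus every frame description, then rank frames by similarity.
// Assumes each frame has a `description` field (not guaranteed by this README).
async function semanticSearch(query, frames) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: [query, ...frames.map((f) => f.description)]
  });
  const [queryVec, ...frameVecs] = res.data.map((d) => d.embedding);
  return frames
    .map((frame, i) => ({ frame, score: cosineSimilarity(queryVec, frameVecs[i]) }))
    .sort((a, b) => b.score - a.score);
}
// Hypothetical usage, given an array of analyzed frames:
// const ranked = await semanticSearch('person walking a dog', frames);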
Direct Class Usage
import { SemanticVideo } from 'semantic-video';
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const video = new SemanticVideo('video.mp4', apiKey, openai);
await video.analyze(15, 'Describe the scene', 10, 720, 'gpt-5-nano');
const frame5 = video.getFrame(5);
const matches = video.searchFrames('person');
const tokens = video.getTokensUsed(); // { inputTokens, outputTokens, totalTokens, model }
video.saveFrame(5, 'frame-5.jpg');
await video.cleanup();
Supported Models
Pricing: input/output per 1M tokens
GPT-5 Family
- gpt-5-nano (default): $0.05 / $0.40
- gpt-5-mini: $0.25 / $2.00
- gpt-5: $1.25 / $10.00
- gpt-5.1: $1.25 / $10.00
- gpt-5.2: $1.75 / $14.00
- gpt-5-pro: $15.00 / $120.00
- gpt-5.2-pro: $21.00 / $168.00
GPT-4.1 Family
- gpt-4.1-nano: $0.10 / $0.40
- gpt-4.1-mini: $0.40 / $1.60
- gpt-4.1: $2.00 / $8.00
O-Series
- o4-mini: $1.10 / $4.40
- o3: $2.00 / $8.00
- o3-pro: $20.00 / $80.00
- o1: $15.00 / $60.00
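To turn actual usage into dollars, combine getTokensUsed() from Direct Class Usage above with the per-1M-token prices listed here. A minimal sketch; the PRICING map hard-codes a few of the rates from this section:
// Prices in USD per 1M tokens: [input, output]
const PRICING = {
  'gpt-5-nano': [0.05, 0.40],
  'gpt-5-mini': [0.25, 2.00],
  'gpt-5': [1.25, 10.00]
};
// Compute cost from a usage object; throws if the model is not in PRICING
function costOf(usage) {
  const [inPrice, outPrice] = PRICING[usage.model];
  return (usage.inputTokens * inPrice + usage.outputTokens * outPrice) / 1e6;
}
const usage = video.getTokensUsed(); // { inputTokens, outputTokens, totalTokens, model }
console.log(`Analysis cost: $${costOf(usage).toFixed(4)}`);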
Configuration
Quality vs Cost
Quality parameter affects token usage and file size:
| Quality | Tokens/Frame | Use Case |
|---------|--------------|----------|
| 2-5 | ~500-800 | Critical detail |
| 10 (default) | ~300-400 | Balanced |
| 15-20 | ~150-250 | Cost-effective |
| 25-31 | ~100-150 | Maximum savings |
Scale (Resolution)
- -1: Original resolution (may be large, uses more tokens)
- 1080: High detail, more tokens
- 720 (default): Balanced
- 480: Fewer tokens, faster
Reducing Token Usage
- Reduce numPartitions (fewer frames)
- Increase the quality value (15-20)
- Decrease scale (480)
- Use gpt-5-nano
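Applying all four levers at once, using the parameter order documented under Basic Analysis:
await client.analyzeVideo(
  'video.mp4',
  5,                            // fewer frames than the default 10
  'Briefly describe the scene', // short prompt keeps output tokens down
  20,                           // higher quality value = more compression, fewer tokens
  480,                          // lower resolution
  'gpt-5-nano'                  // cheapest supported model
);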
Troubleshooting
FFmpeg not found: Install FFmpeg (brew install ffmpeg on macOS, sudo apt-get install ffmpeg on Ubuntu)
High token usage: Use token estimation before analysis, adjust quality/scale/partitions
Memory issues: Reduce maxConcurrency, process in smaller batches, call await video.cleanup()
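For the memory case, one option is to drop the client's concurrency (the third constructor argument shown under Token Estimation with Logger) to 1; another is to use the SemanticVideo class directly and release frames between videos. A sketch of the latter, with a hypothetical file list:
import { SemanticVideo } from 'semantic-video';
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const paths = ['a.mp4', 'b.mp4', 'c.mp4']; // hypothetical inputs
// Process one video at a time and free its extracted frames immediately
for (const path of paths) {
  const video = new SemanticVideo(path, process.env.OPENAI_API_KEY, openai);
  await video.analyze(10, 'Describe the scene', 10, 720, 'gpt-5-nano');
  const matches = video.searchFrames('person'); // use results before cleanup
  console.log(path, matches.length);
  await video.cleanup(); // release temp frames before the next video
}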
License
ISC
