video-query
v1.0.0
A video surveillance screenshot query service using AI vision models. Extract frames from videos, create labeled grid mosaics, and use vision AI models to locate specific scenes or objects.
Features
- Extract frames from videos using FFmpeg WASM (no local FFmpeg installation required)
- Compose multiple images into labeled grid mosaics with row/column annotations
- Analyze images using OpenAI multimodal models and other vision AI providers
- Automatically parse AI responses into structured results
- Support for custom system prompts
- Extensible custom model adapters
Installation
npm install video-query
To use OpenAI models, also install:
npm install openai
Quick Start
Basic Usage
import { QueryVideo } from 'video-query';
import OpenAI from 'openai';
import fs from 'fs';
// Initialize OpenAI client
const openai = new OpenAI({ apiKey: 'your-api-key' });
// Create query instance
const query = new QueryVideo({
  mosaic: {
    columns: 4, // 4 columns per row
    size: { width: 2048, height: 2048 }, // Total mosaic size (rows auto-calculated)
    // cellAspectRatio: 1, // Optional: cell aspect ratio, default 1 (square)
  },
  model: {
    sdk: openai, // SDK type auto-detected
    defaultModel: 'gpt-4o',
  },
});
// Extract frames from video
await query.addVideo({
  source: fs.readFileSync('video.mp4'), // Or pass file path directly
  interval: 5, // Extract one frame every 5 seconds
  startTime: 0, // Start from 0 seconds
});
// Execute query
const result = await query.query('Find all frames where someone is wearing red clothes');
if (result.success) {
  console.log('Matches found:', result.matches.length);
  for (const match of result.matches) {
    console.log(`- ${match.description}`);
    console.log(` Time: ${match.item.metadata?.videoTime}s`);
  }
}
Using Images
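import { QueryVideo } from 'video-query';
import OpenAI from 'openai';
import fs from 'fs';
// Initialize OpenAI client (same as in Basic Usage)
const openai = new OpenAI({ apiKey: 'your-api-key' });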
const query = new QueryVideo({
  mosaic: {
    columns: 3,
    size: { width: 1280, height: 720 }, // Total mosaic size
  },
  model: {
    sdk: openai,
  },
});
// Add single image (Buffer)
query.addImage(fs.readFileSync('image1.png'), { source: 'camera-1' });
// Add Base64 image
const base64Image = '...';
query.addImage(base64Image, { source: 'camera-2' });
// Add multiple images
query.addImages([
  { data: fs.readFileSync('image2.png'), metadata: { source: 'camera-3' } },
  { data: 'iVBORw0KGgo...', metadata: { source: 'camera-4' } }, // Pure Base64 (without prefix)
]);
const result = await query.query('Are there any animals in the images?');
Generate Mosaics (for debugging)
import { QueryVideo } from 'video-query';
import fs from 'fs';
const query = new QueryVideo({ /* ... */ });
// Add images
query.addImage(/* ... */);
// Generate mosaics for debugging or saving
const mosaics = await query.generateMosaics();
for (const m of mosaics) {
  fs.writeFileSync(`mosaic-${m.index}.png`, m.buffer);
  console.log(`Saved mosaic ${m.index}: ${m.width}x${m.height}, contains ${m.imageCount} images`);
}
API
QueryVideo
Main class that combines all modules to provide query functionality.
Constructor
new QueryVideo(config: QueryVideoConfig)
QueryVideoConfig:
| Parameter | Type | Description |
|-----------|------|-------------|
| mosaic | MosaicConfig | Mosaic configuration |
| model | ModelAdapterConfig | Model adapter configuration |
| systemPromptTemplate | string? | Custom system prompt |
| debug | boolean? | Debug mode |
Methods
| Method | Description |
|--------|-------------|
| addImage(data, metadata?) | Add single image (supports Buffer or Base64 string) |
| addImages(items) | Add multiple images (supports Buffer or Base64 string) |
| addVideo(config) | Extract frames from video and add them |
| query(prompt) | Execute query |
| generateMosaics() | Generate mosaics (for debugging or saving) |
| getImages() | Get all added images |
| getImageCount() | Get image count |
| clearImages() | Clear all images |
| preloadFFmpeg() | Preload FFmpeg |
MosaicConfig
Mosaic configuration.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| columns | number | - | Number of columns per row |
| cellAspectRatio | number? | 1 | Cell aspect ratio (width/height), 1 for square |
| size | SizeConfig | - | Size configuration |
| labelFontSize | number? | 14 | Label font size (label area auto-calculated as fontSize * 1.5) |
| backgroundColor | string? | #000000 | Background color |
| labelColor | string? | #FFFFFF | Label color |
| gridColor | string? | #333333 | Grid line color |
| gridWidth | number? | 1 | Grid line width |
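The table states that the label area is derived as fontSize * 1.5 and that the total size drives the rest of the layout. The sketch below shows one way cell dimensions and row count could be derived from those parameters. The function name `estimateLayout`, the assumption of a single label strip along the top and the left edge, and the decision to ignore grid line width are all illustrative; the library's actual layout logic may differ.

```typescript
// Hypothetical input shape, mirroring a subset of MosaicConfig.
interface LayoutInput {
  columns: number;
  width: number;            // total mosaic width in pixels
  height: number;           // total mosaic height in pixels
  cellAspectRatio?: number; // width / height, default 1
  labelFontSize?: number;   // default 14
}

interface LayoutEstimate {
  labelArea: number;
  cellWidth: number;
  cellHeight: number;
  rows: number;
  capacity: number; // images that fit in one mosaic
}

// Assumption: one label strip on the top edge and one on the left edge,
// grid line width ignored for simplicity.
function estimateLayout(input: LayoutInput): LayoutEstimate {
  const ratio = input.cellAspectRatio ?? 1;
  const fontSize = input.labelFontSize ?? 14;
  const labelArea = fontSize * 1.5;

  const cellWidth = Math.floor((input.width - labelArea) / input.columns);
  const cellHeight = Math.max(1, Math.floor(cellWidth / ratio));
  const rows = Math.max(1, Math.floor((input.height - labelArea) / cellHeight));

  return { labelArea, cellWidth, cellHeight, rows, capacity: rows * input.columns };
}

const layout = estimateLayout({ columns: 4, width: 2048, height: 2048 });
console.log(layout);
// { labelArea: 21, cellWidth: 506, cellHeight: 506, rows: 4, capacity: 16 }
```

Under these assumptions, a 2048x2048 mosaic with 4 columns holds 16 square cells, so adding more than 16 images would produce more than one mosaic.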
SizeConfig:
// Specify total mosaic size (including label area and grid lines)
// Cell size and row count are auto-calculated based on total size
// Empty slots display as solid black placeholders
{ width: 2048, height: 2048 }
VideoConfig
Video frame extraction configuration.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| source | Buffer \| string | - | Video data or file path |
| startTime | number? | 0 | Start time for extraction (seconds) |
| interval | number? | 1 | Extraction interval (seconds) |
| duration | number? | - | Extraction duration (seconds) |
| sourceId | string? | - | Video source identifier |
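To make the interplay of startTime, interval, and duration concrete, the sketch below lists the timestamps at which frames would be sampled. The helper `frameTimestamps` is hypothetical and not part of the library; it assumes the video length is known in advance and that the end of the extraction window is exclusive, which may not match the library's behavior.

```typescript
// Hypothetical helper: compute sample timestamps (in seconds).
// Assumption: the window is [startTime, end), with end capped at the video length.
function frameTimestamps(
  videoLength: number,
  startTime = 0,
  interval = 1,
  duration?: number,
): number[] {
  if (interval <= 0) throw new RangeError('interval must be positive');

  const end =
    duration === undefined ? videoLength : Math.min(videoLength, startTime + duration);

  const times: number[] = [];
  // Multiply instead of accumulating to avoid floating-point drift.
  for (let i = 0; startTime + i * interval < end; i++) {
    times.push(startTime + i * interval);
  }
  return times;
}

console.log(frameTimestamps(20, 0, 5));      // [ 0, 5, 10, 15 ]
console.log(frameTimestamps(60, 10, 5, 12)); // [ 10, 15, 20 ]
```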
QueryResult
Query result.
interface QueryResult {
  success: boolean; // Whether successful
  matches: MatchedImage[]; // List of matched images
  error?: string; // Error message
  rawResponse?: string; // Raw model response
  duration?: number; // Query duration (milliseconds)
}
interface MatchedImage {
  item: IImageItem; // Original image item
  confidence?: number; // Match confidence
  description?: string; // Description from model
}
ImageData
Data parameter type for addImage() and addImages() methods.
type ImageData = Buffer | string | null;
- Buffer: Binary image data
- string: Base64 encoded image data, supports two formats:
  - With prefix: ...
  - Pure Base64: iVBORw0KGgo...
- null: Solid black placeholder image
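The three accepted input forms can be normalized to a single representation before further processing. The sketch below is one possible approach, not the library's implementation. The name `toBuffer` and the local type `ImageInput` (a stand-in for ImageData) are hypothetical, and the prefixed form is assumed to be a data URL such as `data:image/png;base64,`, since the exact prefix is not shown above.

```typescript
// Local stand-in for the library's ImageData type.
type ImageInput = Buffer | string | null;

// Hypothetical helper: normalize any accepted input to a Buffer.
// Returns null for placeholders so the caller can render a blank cell.
function toBuffer(data: ImageInput): Buffer | null {
  if (data === null) return null;
  if (Buffer.isBuffer(data)) return data;

  // Assumption: the "with prefix" form is a data URL ("data:<mime>;base64,<payload>").
  const comma = data.startsWith('data:') ? data.indexOf(',') : -1;
  const payload = comma >= 0 ? data.slice(comma + 1) : data;
  return Buffer.from(payload, 'base64');
}

// "iVBORw0KGgo=" is the Base64 encoding of the 8-byte PNG signature.
const pure = toBuffer('iVBORw0KGgo=');
const prefixed = toBuffer('data:image/png;base64,iVBORw0KGgo=');
console.log(pure?.length, prefixed?.length); // 8 8
```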
MosaicBuffer
Return type for generateMosaics() method.
interface MosaicBuffer {
  index: number; // Mosaic index
  buffer: Buffer; // Image Buffer
  base64: string; // Base64 encoded image data
  width: number; // Width
  height: number; // Height
  imageCount: number; // Number of images contained
}
Extending Model Adapters
The library includes a built-in OpenAI adapter. You can also write custom adapters to support other AI providers:
import { QueryVideo, BaseModelAdapter, createModelAdapter } from 'video-query';
import type { VisionRequest, VisionResponse, IModelAdapter } from 'video-query';
// Method 1: Extend BaseModelAdapter
class MyCustomAdapter extends BaseModelAdapter {
  async callVision(request: VisionRequest): Promise<VisionResponse> {
    // Implement your AI call logic
    // (myAIClient is a placeholder for your provider's client)
    const response = await myAIClient.chat({
      systemPrompt: request.systemPrompt,
      userPrompt: request.userPrompt,
      images: request.images,
    });
    return { content: response.text };
  }
}
// Method 2: Implement IModelAdapter interface
const myAdapter: IModelAdapter = {
  validate: () => true,
  callVision: async (request) => {
    // ...
    return { content: '...' };
  },
};
// Use in QueryVideo
const query = new QueryVideo({
  mosaic: { /* ... */ },
  model: {
    sdk: openai, // SDK type auto-detected
  },
});
How It Works
- Frame Extraction: Use FFmpeg WASM to extract frames from video at specified intervals
- Mosaic Generation: Compose multiple images into labeled grid mosaics with row/column annotations (A, B, C... / 1, 2, 3...)
- AI Analysis: Send mosaics to vision model, which returns matching positions based on coordinate labels
- Result Parsing: Parse AI responses into structured results, mapping back to original images
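The last step, mapping a coordinate label back to an original image, can be sketched as follows. This is an illustration under stated assumptions rather than the library's parser: the function name `labelToIndex` is hypothetical, rows are assumed to be lettered (A, B, C...) and columns numbered (1, 2, 3...), labels are assumed to look like "B3", and images are assumed to fill each mosaic in row-major order.

```typescript
// Hypothetical helper: convert a cell label such as "B3" into the index
// of the image in the order it was added.
// Assumptions: letter = row, number = column, row-major fill,
// and every mosaic holds rows * columns cells.
function labelToIndex(
  label: string,
  columns: number,
  rows: number,
  mosaicIndex = 0,
): number {
  const match = /^([A-Z])(\d+)$/.exec(label.trim().toUpperCase());
  if (!match) throw new Error(`Unrecognized label: ${label}`);

  const row = match[1].charCodeAt(0) - 'A'.charCodeAt(0);
  const col = Number(match[2]) - 1;
  if (row >= rows || col < 0 || col >= columns) {
    throw new RangeError(`Label out of range: ${label}`);
  }

  return mosaicIndex * rows * columns + row * columns + col;
}

console.log(labelToIndex('A1', 4, 4));    // 0
console.log(labelToIndex('B3', 4, 4));    // 6
console.log(labelToIndex('b3', 4, 4, 1)); // 22
```

Validating the label against the grid bounds matters here because a vision model can return coordinates that do not exist in the mosaic; rejecting them early avoids mapping a match to the wrong frame.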
Requirements
- Node.js >= 18.0.0
- ES Module support
License
MIT
