video-query

v1.0.0

Published

10 days ago

Video surveillance screenshot query service using AI vision models

liaokaime

video surveillance query ai vision openai image-processing ffmpeg

video-query

A video surveillance screenshot query service using AI vision models. Extract frames from videos, create labeled grid mosaics, and use vision AI models to locate specific scenes or objects.

Chinese documentation

Features

Extract frames from videos using FFmpeg WASM (no local FFmpeg installation required)
Compose multiple images into labeled grid mosaics with row/column annotations
Analyze images using OpenAI multimodal models and other vision AI providers
Automatically parse AI responses into structured results
Support for custom system prompts
Extensible custom model adapters

Installation

npm install video-query

To use OpenAI models, also install:

npm install openai

Quick Start

Basic Usage

import { QueryVideo } from 'video-query';
import OpenAI from 'openai';
import fs from 'fs';

// Initialize OpenAI client
const openai = new OpenAI({ apiKey: 'your-api-key' });

// Create query instance
const query = new QueryVideo({
  mosaic: {
    columns: 4,           // 4 columns per row
    size: { width: 2048, height: 2048 },  // Total mosaic size (rows auto-calculated)
    // cellAspectRatio: 1,  // Optional: cell aspect ratio, default 1 (square)
  },
  model: {
    sdk: openai,          // SDK type auto-detected
    defaultModel: 'gpt-4o',
  },
});

// Extract frames from video
await query.addVideo({
  source: fs.readFileSync('video.mp4'),  // Or pass file path directly
  interval: 5,                            // Extract one frame every 5 seconds
  startTime: 0,                           // Start from 0 seconds
});

// Execute query
const result = await query.query('Find all frames where someone is wearing red clothes');

if (result.success) {
  console.log('Matches found:', result.matches.length);
  for (const match of result.matches) {
    console.log(`- ${match.description}`);
    console.log(`  Time: ${match.item.metadata?.videoTime}s`);
  }
}

Using Images

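import { QueryVideo } from 'video-query';
import OpenAI from 'openai';
import fs from 'fs';

// Initialize OpenAI client (passed as the model SDK below)
const openai = new OpenAI({ apiKey: 'your-api-key' });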

const query = new QueryVideo({
  mosaic: {
    columns: 3,
    size: { width: 1280, height: 720 },  // Total mosaic size
  },
  model: {
    sdk: openai,
  },
});

// Add single image (Buffer)
query.addImage(fs.readFileSync('image1.png'), { source: 'camera-1' });

// Add Base64 image
const base64Image = 'data:image/png;base64,iVBORw0KGgo...';
query.addImage(base64Image, { source: 'camera-2' });

// Add multiple images
query.addImages([
  { data: fs.readFileSync('image2.png'), metadata: { source: 'camera-3' } },
  { data: 'iVBORw0KGgo...', metadata: { source: 'camera-4' } },  // Pure Base64 (without prefix)
]);

const result = await query.query('Are there any animals in the images?');

Generate Mosaics (for debugging)

import { QueryVideo } from 'video-query';
import fs from 'fs';

const query = new QueryVideo({ /* ... */ });

// Add images
query.addImage(/* ... */);

// Generate mosaics for debugging or saving
const mosaics = await query.generateMosaics();

for (const m of mosaics) {
  fs.writeFileSync(`mosaic-${m.index}.png`, m.buffer);
  console.log(`Saved mosaic ${m.index}: ${m.width}x${m.height}, contains ${m.imageCount} images`);
}

API

QueryVideo

Main class that combines all modules to provide query functionality.

Constructor

new QueryVideo(config: QueryVideoConfig)

QueryVideoConfig:

| Parameter | Type | Description |
|-----------|------|-------------|
| mosaic | MosaicConfig | Mosaic configuration |
| model | ModelAdapterConfig | Model adapter configuration |
| systemPromptTemplate | string? | Custom system prompt |
| debug | boolean? | Debug mode |

Methods

| Method | Description |
|--------|-------------|
| addImage(data, metadata?) | Add single image (supports Buffer or Base64 string) |
| addImages(items) | Add multiple images (supports Buffer or Base64 string) |
| addVideo(config) | Extract frames from video and add them |
| query(prompt) | Execute query |
| generateMosaics() | Generate mosaics (for debugging or saving) |
| getImages() | Get all added images |
| getImageCount() | Get image count |
| clearImages() | Clear all images |
| preloadFFmpeg() | Preload FFmpeg |

MosaicConfig

Mosaic configuration.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| columns | number | - | Number of columns per row |
| cellAspectRatio | number? | 1 | Cell aspect ratio (width/height), 1 for square |
| size | SizeConfig | - | Size configuration |
| labelFontSize | number? | 14 | Label font size (label area auto-calculated as fontSize * 1.5) |
| backgroundColor | string? | #000000 | Background color |
| labelColor | string? | #FFFFFF | Label color |
| gridColor | string? | #333333 | Grid line color |
| gridWidth | number? | 1 | Grid line width |

SizeConfig:

// Specify total mosaic size (including label area and grid lines)
// Cell size and row count are auto-calculated based on total size
// Empty slots display as solid black placeholders
{ width: 2048, height: 2048 }
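
As a rough illustration of how a total size could translate into cell size and row count, the sketch below works through the arithmetic. It is an estimate under stated assumptions, not the library's actual algorithm: it assumes one label strip of labelFontSize * 1.5 pixels on each axis, one grid line between and around cells, and rounding down to whole pixels. The name `estimateLayout` is hypothetical.

```typescript
// Hypothetical input shape, mirroring the MosaicConfig fields used here.
interface LayoutInput {
  width: number;
  height: number;
  columns: number;
  cellAspectRatio?: number; // width / height, defaults to 1
  labelFontSize?: number;   // defaults to 14
  gridWidth?: number;       // defaults to 1
}

// Hypothetical helper: estimate cell size, row count and images per mosaic.
function estimateLayout(cfg: LayoutInput) {
  const aspect = cfg.cellAspectRatio ?? 1;
  const labelArea = (cfg.labelFontSize ?? 14) * 1.5; // assumed label strip size
  const grid = cfg.gridWidth ?? 1;

  // Width left for cells after the label strip and (columns + 1) grid lines.
  const usableWidth = cfg.width - labelArea - grid * (cfg.columns + 1);
  const cellWidth = Math.floor(usableWidth / cfg.columns);
  const cellHeight = Math.floor(cellWidth / aspect);

  // Each row costs one cell height plus one grid line; one more line closes the grid.
  const usableHeight = cfg.height - labelArea - grid;
  const rows = Math.max(1, Math.floor(usableHeight / (cellHeight + grid)));

  return { cellWidth, cellHeight, rows, capacity: rows * cfg.columns };
}
```

Under these assumptions, a 2048x2048 mosaic with 4 columns and default settings yields 505-pixel square cells in 4 rows, so 16 images per mosaic.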

VideoConfig

Video frame extraction configuration.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| source | Buffer \| string | - | Video data or file path |
| startTime | number? | 0 | Start time for extraction (seconds) |
| interval | number? | 1 | Extraction interval (seconds) |
| duration | number? | - | Extraction duration (seconds) |
| sourceId | string? | - | Video source identifier |
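
To estimate how many frames a given configuration produces, the sampling times can be listed ahead of extraction. The sketch below assumes frames are taken at startTime, startTime + interval, and so on, stopping before the end of the extraction window; whether the library includes the final boundary frame is not stated, so treat the count as approximate. The name `frameTimestamps` and the `videoLength` parameter are hypothetical.

```typescript
// Hypothetical helper: list the timestamps (in seconds) at which frames
// would be sampled, using VideoConfig's startTime, interval and duration.
function frameTimestamps(
  videoLength: number,
  startTime = 0,
  interval = 1,
  duration?: number,
): number[] {
  if (interval <= 0) throw new RangeError('interval must be positive');

  // Without a duration, sample until the end of the video.
  const end = duration === undefined
    ? videoLength
    : Math.min(videoLength, startTime + duration);

  const times: number[] = [];
  // Multiplying by the index avoids accumulating floating-point error.
  for (let i = 0; startTime + i * interval < end; i++) {
    times.push(startTime + i * interval);
  }
  return times;
}
```

For a 20-second video, `frameTimestamps(20, 0, 5)` lists four sampling times (0, 5, 10 and 15 seconds), which would fill one row of a 4-column mosaic.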

QueryResult

Query result.

interface QueryResult {
  success: boolean;          // Whether successful
  matches: MatchedImage[];   // List of matched images
  error?: string;            // Error message
  rawResponse?: string;      // Raw model response
  duration?: number;         // Query duration (milliseconds)
}

interface MatchedImage {
  item: IImageItem;          // Original image item
  confidence?: number;       // Match confidence
  description?: string;      // Description from model
}
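
A common follow-up is to rank matches by confidence before displaying them. The sketch below is one way to do that, using local stand-in types rather than the library's exports. The confidence scale is not documented above, so the threshold is an assumption, and matches without a confidence value are kept but sorted last. The name `rankMatches` is hypothetical.

```typescript
// Local stand-ins for the interfaces above; the item type is reduced to
// the one field a caller typically reads (an assumption about its shape).
interface ImageItemLike { metadata?: { videoTime?: number } }
interface MatchLike { item: ImageItemLike; confidence?: number; description?: string }
interface ResultLike { success: boolean; matches: MatchLike[]; error?: string }

// Hypothetical helper: fail loudly on an unsuccessful query, otherwise
// return matches at or above minConfidence, highest confidence first.
function rankMatches(result: ResultLike, minConfidence = 0): MatchLike[] {
  if (!result.success) {
    throw new Error(result.error ?? 'query failed');
  }
  return result.matches
    // A missing confidence is treated as "unknown" and kept.
    .filter((m) => m.confidence === undefined || m.confidence >= minConfidence)
    // filter() returned a new array, so sorting does not mutate the input.
    .sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0));
}
```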

ImageData

Data parameter type for addImage() and addImages() methods.

type ImageData = Buffer | string | null;

Buffer: Binary image data
string: Base64 encoded image data, supports two formats:
- With prefix: data:image/png;base64,iVBORw0KGgo...
- Pure Base64: iVBORw0KGgo...
null: Solid black placeholder image
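
The two string formats differ only in the data URL prefix. As an illustration of how they can be told apart, the sketch below splits a string into its MIME type (if present) and the raw Base64 payload. It is not the library's parser, and the name `splitBase64Image` is hypothetical.

```typescript
// Hypothetical helper: separate an optional "data:<mime>;base64," prefix
// from the Base64 payload. Pure Base64 input is returned unchanged.
function splitBase64Image(input: string): { mime: string | null; base64: string } {
  const prefix = /^data:([^;,]+);base64,/.exec(input);
  if (prefix) {
    return { mime: prefix[1], base64: input.slice(prefix[0].length) };
  }
  return { mime: null, base64: input };
}
```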

MosaicBuffer

Return type for generateMosaics() method.

interface MosaicBuffer {
  index: number;       // Mosaic index
  buffer: Buffer;      // Image Buffer
  base64: string;      // Base64 encoded image data
  width: number;       // Width
  height: number;      // Height
  imageCount: number;  // Number of images contained
}

Extending Model Adapters

The library includes a built-in OpenAI adapter. To support other AI providers, you can write a custom adapter:

import { BaseModelAdapter, createModelAdapter, QueryVideo } from 'video-query';
import type { VisionRequest, VisionResponse, IModelAdapter } from 'video-query';

// Method 1: Extend BaseModelAdapter
class MyCustomAdapter extends BaseModelAdapter {
  async callVision(request: VisionRequest): Promise<VisionResponse> {
    // Implement your AI call logic (myAIClient is a placeholder for your provider's client)
    const response = await myAIClient.chat({
      systemPrompt: request.systemPrompt,
      userPrompt: request.userPrompt,
      images: request.images,
    });

    return { content: response.text };
  }
}

// Method 2: Implement IModelAdapter interface
const myAdapter: IModelAdapter = {
  validate: () => true,
  callVision: async (request) => {
    // ...
    return { content: '...' };
  },
};

// Use in QueryVideo
const query = new QueryVideo({
  mosaic: { /* ... */ },
  model: {
    sdk: openai,  // SDK type auto-detected
  },
});

How It Works

1. Frame Extraction: Use FFmpeg WASM to extract frames from video at specified intervals
2. Mosaic Generation: Compose multiple images into labeled grid mosaics with row/column annotations (A, B, C... / 1, 2, 3...)
3. AI Analysis: Send mosaics to vision model, which returns matching positions based on coordinate labels
4. Result Parsing: Parse AI responses into structured results, mapping back to original images
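
The mapping from a coordinate label back to an original image can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes rows are lettered (A, B, C...), columns are numbered from 1, and cells are filled left to right, top to bottom. The name `labelToIndex` is hypothetical.

```typescript
// Hypothetical helper: convert a grid label such as "B3" into a zero-based
// image index within one mosaic. Handles single-letter rows only (A to Z).
function labelToIndex(label: string, columns: number): number | null {
  const match = /^([A-Z])(\d+)$/.exec(label.trim().toUpperCase());
  if (!match) return null; // not a recognizable label

  const row = match[1].charCodeAt(0) - 'A'.charCodeAt(0); // A -> 0, B -> 1
  const col = Number(match[2]) - 1;                        // "1" -> 0

  if (col < 0 || col >= columns) return null; // column outside the grid
  return row * columns + col;                 // row-major position
}
```

With 4 columns, label B3 maps to index 6, the seventh image added to that mosaic. Rejecting out-of-range labels matters because a vision model can return coordinates that do not exist in the grid.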

Requirements

Node.js >= 18.0.0
ES Module support

License

MIT
