chat-about-video

v5.12.8

Published

a month ago

Chat about a video clip using ChatGPT hosted in OpenAI or Azure, or Gemini provided by Google

james-hu

ChatGPT OpenAI Gemini LLM GPT Video chat AI AWS Azure

chat-about-video

Chat about zero, one, or more video clips or audio files using OpenAI ChatGPT (hosted in OpenAI or Microsoft Azure) or Google Gemini (hosted in Google Cloud).

chat-about-video is a powerful Unified Abstraction Layer designed to accelerate the development of conversational AI applications. It provides a standardized interface for interacting with OpenAI ChatGPT (OpenAI or Azure) and Google Gemini, allowing you to switch between providers with zero or minimal changes to your application logic.

Why use chat-about-video?

Provider Agnostic: Write your code once and swap between ChatGPT and Gemini via configuration. This future-proofs your application against model changes or pricing shifts.
Unified Video Handling: Seamlessly handles the complexities of frame extraction and cloud storage uploading (for ChatGPT) or direct ingestion (for Gemini) through a single API.
Simplified Tool Calling: A standardized way to define and handle tool/function calls across different model providers.
Production Ready: Built-in retries for throttling, server errors, and connectivity issues.

Key features

Switch providers effortlessly: Change from ChatGPT to Gemini (or vice versa) without rewriting your conversation logic.
Multi-Cloud Support: Supports models hosted in Azure OpenAI, OpenAI, NVIDIA NIM (OpenAI compatible), and Google Cloud.
Flexible Media Input: Extract frames automatically via FFmpeg, supply your own images, or provide audio files.
Rich Conversations: Supports multiple videos, image groups, and audio files in a single chat.
Mandated Output: Force JSON responses with or without schemas.
Resilient: Automatic backoff and retries for 429, 5xx, and network errors.
Usage Tracking: Built-in token usage metadata collection.
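
The retry behaviour listed above can be pictured with a minimal sketch. This is not the package's implementation: `withRetries`, `isRetryable`, the attempt count, and the delay values are hypothetical, and the sketch assumes that errors expose an HTTP `status` or a network error `code`.

```typescript
// Hypothetical sketch of retry with exponential backoff; not the package's actual code.
interface HttpLikeError {
  status?: number; // HTTP status code, if the failure came from a response
  code?: string; // network-level error code such as 'ECONNRESET'
}

// Retry on throttling (429) and server errors (5xx); errors without a status
// but with a code are treated as network failures.
function isRetryable(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const { status, code } = error as HttpLikeError;
  if (typeof status === 'number') return status === 429 || (status >= 500 && status <= 599);
  return typeof code === 'string';
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetries<T>(operation: () => Promise<T>, maxAttempts = 4, baseDelayMs = 200): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-retryable errors or when the attempt budget is used up.
      if (!isRetryable(error) || attempt >= maxAttempts) throw error;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

With retries built into the package, application code normally does not need a wrapper like this; the sketch only shows which failures are worth repeating.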

Usage

Installation (quick start)

To use chat-about-video in your Node.js application, add it as a dependency along with other necessary packages based on your usage scenario. Below are examples for typical setups:

# ChatGPT on OpenAI or Azure with Azure Blob Storage
npm i chat-about-video openai @ffmpeg-installer/ffmpeg @azure/storage-blob
# Gemini in Google Cloud
npm i chat-about-video @google/generative-ai @ffmpeg-installer/ffmpeg
# ChatGPT on OpenAI or Azure with AWS S3
npm i chat-about-video openai @ffmpeg-installer/ffmpeg @handy-common-utils/aws-utils @aws-sdk/s3-request-presigner @aws-sdk/client-s3

If the ffmpeg binary is already available, you don't need to add the @ffmpeg-installer/ffmpeg dependency.

Optional dependencies

ChatGPT

To use ChatGPT hosted on OpenAI or Azure:

npm i openai

Gemini

To use Gemini hosted on Google Cloud:

npm i @google/generative-ai

ffmpeg

If you need ffmpeg for extracting video frame images, ensure it is installed. You can use a system package manager or an NPM package:

sudo apt install ffmpeg
# or
npm i @ffmpeg-installer/ffmpeg
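
To decide between the two options, you can check whether an ffmpeg binary is already reachable. The helper below is a hypothetical sketch (`isFfmpegOnPath` is not part of this package), and it assumes that being reachable means that `ffmpeg -version` succeeds on the PATH; how the package itself locates ffmpeg may differ.

```typescript
import { spawnSync } from 'node:child_process';

// Returns true when an `ffmpeg` executable on the PATH answers `ffmpeg -version`.
function isFfmpegOnPath(): boolean {
  const result = spawnSync('ffmpeg', ['-version'], { stdio: 'ignore' });
  // `error` is set when the binary cannot be spawned (for example ENOENT).
  return !result.error && result.status === 0;
}

console.log(isFfmpegOnPath() ? 'ffmpeg found on PATH' : 'ffmpeg not found; consider installing @ffmpeg-installer/ffmpeg');
```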

Azure Blob Storage

To use Azure Blob Storage for frame images (not needed for Gemini):

npm i @azure/storage-blob

AWS S3

To use AWS S3 for frame images (not needed for Gemini):

npm i @handy-common-utils/aws-utils @aws-sdk/s3-request-presigner @aws-sdk/client-s3

How the video is provided to ChatGPT or Gemini

ChatGPT

chat-about-video supports uploading video frames into cloud storage and making them available to ChatGPT.

Integrate ChatGPT from Microsoft Azure or OpenAI effortlessly.
Utilize ffmpeg integration provided by this package for frame image extraction or opt for a DIY approach.
Store frame images with ease, supporting Azure Blob Storage and AWS S3.
Models hosted in Azure seem to allow fewer images per request than models hosted in OpenAI.

Gemini

chat-about-video supports sending video frames directly to Google's API without requiring cloud storage.

Utilize ffmpeg integration provided by this package for frame image extraction or opt for a DIY approach.
The number of frame images is only limited by the Gemini API in Google Cloud.
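
The difference between the two delivery paths can be sketched at the level of request parts. The two interfaces follow the public request formats of the OpenAI chat completions API and the Gemini API; the helper names and the example URL are hypothetical, and the package's own request construction may differ from this illustration.

```typescript
// Image part of an OpenAI chat completion message: the model fetches the image from a URL.
interface OpenAiImagePart {
  type: 'image_url';
  image_url: { url: string };
}

// Inline image part of a Gemini request: the image bytes travel inside the request.
interface GeminiInlinePart {
  inlineData: { mimeType: string; data: string };
}

// For ChatGPT, a frame uploaded to cloud storage is referenced by its presigned download URL.
function toOpenAiImagePart(presignedUrl: string): OpenAiImagePart {
  return { type: 'image_url', image_url: { url: presignedUrl } };
}

// For Gemini, a frame is embedded as base64-encoded data, so no cloud storage is involved.
function toGeminiInlinePart(base64Data: string, mimeType = 'image/jpeg'): GeminiInlinePart {
  return { inlineData: { mimeType, data: base64Data } };
}

const chatGptPart = toOpenAiImagePart('https://example.com/video-frames/frame-1.jpg?signature=placeholder');
const geminiPart = toGeminiInlinePart('aGVsbG8=');
console.log(chatGptPart, geminiPart);
```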

Concrete types and low level clients

ChatAboutVideo and Conversation are generic classes. Use them without concrete generic type parameters when you want the flexibility to easily switch between ChatGPT and Gemini.

Otherwise, you may want to use a concrete type. Below are some examples:

// cast to a concrete type
const castToChatGpt = chat as ChatAboutVideoWithChatGpt;

// you can also just leave the ChatAboutVideo instance generic, but narrow down the conversation type
const conversationWithGemini = (await chat.startConversation(...)) as ConversationWithGemini;
const conversationWithChatGpt = await (chat as ChatAboutVideoWithChatGpt).startConversation(...);

To access the underlying API wrapper, use the getApi() function on the ChatAboutVideo instance. To get the raw API client, use the getClient() function on the awaited object returned from getApi().

Cleaning up

Intermediate files, such as extracted frame images, can be saved locally or in the cloud. To remove these files when they are no longer needed, remember to call the end() function on the Conversation instance when the conversation finishes.
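
One way to make sure end() is always called is a try/finally block. The sketch below uses a minimal structural type, `EndableConversation`, that is a stand-in covering only say() and end(); both it and the helper `askOnce` are hypothetical and not part of this package.

```typescript
// Minimal structural type covering only the two methods this sketch needs.
interface EndableConversation {
  say(message: string): Promise<unknown>;
  end(): Promise<void>;
}

// Ask a single question and clean up afterwards, whether or not the question succeeds.
async function askOnce(conversation: EndableConversation, question: string): Promise<unknown> {
  try {
    return await conversation.say(question);
  } finally {
    // Runs on success and on failure, so intermediate files are removed in both cases.
    await conversation.end();
  }
}
```

For a multi-turn conversation the same pattern applies: wrap the whole loop in try/finally rather than each call to say().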

Switching between configurations

You can define multiple configurations and switch between them using the activeSupportedChatApiOptions function. This is useful when you want to easily switch between different environments (e.g. dev, prod) or different models. Note that nested objects are deeply merged, while arrays are replaced rather than concatenated.

import { activeSupportedChatApiOptions, ChatAboutVideo } from 'chat-about-video';

const options = {
  active: process.env.ACTIVE_CONFIG || 'dev',
  base: {
    storage: {
      azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
    },
  },
  dev: {
    credential: { key: process.env.DEV_KEY! },
    completionOptions: { model: 'gpt-4o' },
  },
  prod: {
    credential: { key: process.env.PROD_KEY! },
    completionOptions: { model: 'gpt-4' },
  },
};

const chat = new ChatAboutVideo(activeSupportedChatApiOptions(options));
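
The merge rule (nested objects merged deeply, arrays replaced) can be illustrated with a small standalone function. `mergeConfig` is a hypothetical sketch rather than the package's implementation, and it assumes that the selected configuration is layered on top of `base`.

```typescript
type PlainObject = { [key: string]: unknown };

const isPlainObject = (value: unknown): value is PlainObject =>
  typeof value === 'object' && value !== null && !Array.isArray(value);

// Deeply merge `override` into `base`: nested objects are merged key by key,
// while arrays and primitive values from `override` replace those in `base`.
function mergeConfig(base: PlainObject, override: PlainObject): PlainObject {
  const merged: PlainObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = merged[key];
    merged[key] = isPlainObject(existing) && isPlainObject(value) ? mergeConfig(existing, value) : value;
  }
  return merged;
}

const effective = mergeConfig(
  { storage: { storageContainerName: 'frames', storagePathPrefix: 'base/' }, completionOptions: { stop: ['END', 'STOP'] } },
  { storage: { storagePathPrefix: 'dev/' }, completionOptions: { stop: ['DONE'] } },
);
// storage keeps storageContainerName and takes the overriding storagePathPrefix;
// the stop array is replaced as a whole rather than concatenated.
console.log(JSON.stringify(effective, null, 2));
```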

Mandating JSON response

A JSON response can be mandated either with or without a JSON Schema. The example code below works for both ChatGPT and Gemini:

// Without specifying a JSON schema
const explanation = await conversation.say(
  'Explain your answer. The response should be in JSON like this: {"referencedFrames": [1, 5], "why": "Reason for giving this response."}',
  { jsonResponse: true },
);
console.log(chalk.grey("\nAI's Explanation: " + JSON.stringify(JSON.parse(explanation!), null, 2)));

// With a JSON schema
const detailedExplanation = await conversation.say('Explain your answer in detail. The response should be in JSON.', {
  jsonResponse: {
    name: 'DetailedExplanation',
    schema: {
      type: 'object',
      properties: {
        referencedFrames: {
          type: 'array',
          items: { type: 'integer' },
        },
        understandingOfTheQuestion: { type: 'string' },
        reasoningSteps: { type: 'array', items: { type: 'string' } },
      },
      required: ['referencedFrames', 'understandingOfTheQuestion', 'reasoningSteps'],
    },
  },
});
console.log(chalk.grey("\nAI's detailed explanation: " + JSON.stringify(JSON.parse(detailedExplanation!), null, 2)));
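
When no schema is supplied, the response is JSON but its shape is only described in the prompt, so checking the parsed value before using it is a reasonable defensive step. The sketch below is one way to do that; the `Explanation` type mirrors the shape requested in the first prompt above, and `parseExplanation` is a hypothetical helper, not part of this package.

```typescript
// Shape requested in the prompt: {"referencedFrames": [1, 5], "why": "..."}
interface Explanation {
  referencedFrames: number[];
  why: string;
}

// Parse the raw response text and verify that it has the expected shape.
function parseExplanation(raw: string | undefined): Explanation {
  if (!raw) throw new Error('Empty response');
  const value: unknown = JSON.parse(raw);
  if (typeof value !== 'object' || value === null) throw new Error('Response is not a JSON object');
  const { referencedFrames, why } = value as Record<string, unknown>;
  if (!Array.isArray(referencedFrames) || !referencedFrames.every((n) => Number.isInteger(n))) {
    throw new Error('referencedFrames must be an array of integers');
  }
  if (typeof why !== 'string') throw new Error('why must be a string');
  return { referencedFrames: referencedFrames as number[], why };
}

const parsed = parseExplanation('{"referencedFrames": [1, 5], "why": "Reason for giving this response."}');
console.log(parsed.referencedFrames.length, parsed.why);
```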

Tool Calling (Function Calling)

chat-about-video supports tool calling for both ChatGPT and Gemini. This allows the AI to request information by calling functions you've defined.

1. Define Tools

Pass your tool definitions in the completion options. The structure follows the underlying API (OpenAI or Gemini). You can also use the ChatGPT style structure for Gemini providers, as the package will automatically convert it for Gemini if needed:

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' },
        },
        required: ['location'],
      },
    },
  },
];

const answer = await conversation.say<ConversationResponse>("What's the weather like in Melbourne?", { tools });

2. Handle Tool Calls

The say and submitToolCallResults methods will return an object containing toolCalls if the AI wants to call tools. You are responsible for executing the tools and submitting the results back.

import { ConversationResponse, ToolCallResult } from 'chat-about-video';

let response = await conversation.say<ConversationResponse>('What is the weather in Melbourne?', { tools });

// Loop to handle potential multiple rounds of tool calling
while (typeof response !== 'string' && response?.toolCalls) {
  if (response.responseText) {
    console.log(`AI: ${response.responseText}`);
  }
  const toolResults: ToolCallResult[] = [];
  for (const call of response.toolCalls) {
    console.log(`AI requests tool: ${call.name}(${JSON.stringify(call.arguments)})`);

    // Execute your tool logic
    const result = await myWeatherFunction(call.arguments.location);

    toolResults.push({
      name: call.name,
      result: { temperature: result.temp, unit: 'C' },
      toolCallId: call.id, // Required for OpenAI
    });
  }
  // Submit results back to the AI
  response = await conversation.submitToolCallResults<ConversationResponse>(toolResults);
}

// Final text response
console.log('AI Answer:', response);
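
When several tools are defined, the body of the loop above can be factored into a dispatcher that maps tool names to handlers. The sketch below is hypothetical: `ToolCallLike` and `ToolResultLike` are local stand-ins that mirror only the fields used in the example above (name, arguments, id, result, toolCallId), and `runToolCalls` is not part of this package.

```typescript
// Local stand-ins mirroring only the fields used in the example above.
interface ToolCallLike {
  id?: string;
  name: string;
  arguments: Record<string, unknown>;
}
interface ToolResultLike {
  name: string;
  result: unknown;
  toolCallId?: string;
}
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown;

// Execute each requested tool call with the handler registered under its name.
async function runToolCalls(calls: ToolCallLike[], handlers: Record<string, ToolHandler | undefined>): Promise<ToolResultLike[]> {
  const results: ToolResultLike[] = [];
  for (const call of calls) {
    const handler = handlers[call.name];
    // Report unknown tools back to the model instead of throwing, so it can recover.
    const result = handler ? await handler(call.arguments) : { error: `Unknown tool: ${call.name}` };
    results.push({ name: call.name, result, toolCallId: call.id });
  }
  return results;
}

// Example registry with one hypothetical handler returning a fixed value.
const handlers: Record<string, ToolHandler | undefined> = {
  get_weather: (args) => ({ location: args.location, temperature: 21, unit: 'C' }),
};
```

The returned array has the same fields as the toolResults array in the loop above, so it could be passed to submitToolCallResults in its place, assuming the real ToolCallResult type accepts this shape.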

Customisation

Frame extraction

If you would like to customise how frame images are extracted and stored, consider these:

In the options object passed to the constructor of ChatAboutVideo, there's a property extractVideoFrames. This property allows you to customise how frame images are extracted.
- format, interval, limit, width, height - These allow you to specify your expectations for the extraction.
- deleteFilesWhenConversationEnds - This flag allows you to specify whether you want extracted frame images to be deleted from the local file system when the conversation ends, or not.
- framesDirectoryResolver - You can supply a function for determining where extracted frame image files should be stored locally.
- extractor - You can supply a function for doing the extraction.
In the options object passed to the constructor of ChatAboutVideo, there's a property storage. For ChatGPT, storing frame images in the cloud is recommended. You can use this property to customise how frame images are stored in the cloud.
- azureStorageConnectionString - If you would like to use Azure Blob Storage, you need to put the connection string in this property. If this property does not have a value, ChatAboutVideo would assume that you'd like to use AWS S3, and default AWS identity/credential will be picked up from the OS.
- storageContainerName, storagePathPrefix - They allow you to specify where those images should be stored.
- downloadUrlExpirationSeconds - For images stored in the cloud, presigned download URLs with expiration are generated for ChatGPT to access. This property allows you to control the expiration time.
- deleteFilesWhenConversationEnds - This flag allows you to specify whether you want extracted frame images to be deleted from the cloud when the conversation ends, or not.
- uploader - You can supply a function for uploading images into the cloud.
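
Put together, the two properties might look like the fragment below. The property names come from the lists above; every value is illustrative, the value types (numbers, booleans, a format string) are assumptions, and credential and model settings are omitted.

```typescript
// Illustrative fragment of the options object passed to the ChatAboutVideo constructor.
// Values are placeholders; value types are assumptions based on the property names.
const customisedOptions = {
  extractVideoFrames: {
    format: 'jpg',
    interval: 2,
    limit: 50,
    width: 640,
    // Remove extracted frame images from the local file system when the conversation ends.
    deleteFilesWhenConversationEnds: true,
  },
  storage: {
    // Leaving this unset would select AWS S3 instead of Azure Blob Storage.
    azureStorageConnectionString: '<your Azure Storage connection string>',
    storageContainerName: 'vision-experiment-input',
    storagePathPrefix: 'video-frames/',
    // Lifetime of the presigned download URLs handed to ChatGPT.
    downloadUrlExpirationSeconds: 3600,
    // Remove uploaded frame images from the cloud when the conversation ends.
    deleteFilesWhenConversationEnds: true,
  },
};
```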

Settings of the underlying model

In the options object passed to the constructor of ChatAboutVideo, there's a property clientSettings, and there's another property completionOptions. Settings of the underlying model can be configured through those two properties.

You can also override settings using the last parameter of startConversation(...) function on ChatAboutVideo, or the last parameter of say(...) function on Conversation.

Code examples

The following integration test files demonstrate various features and providers:

Example 1: Using ChatGPT hosted in OpenAI with Azure Blob Storage

Source: test/integration/chatgpt-openai-azure-storage.ts

// This is a demo utilising ChatGPT hosted in OpenAI.
// Video frame images are uploaded to Azure Blob Storage and then made available to GPT from there.
//
// This script can be executed with a command line like this from the project root directory:
// export OPENAI_API_KEY=...
// export AZURE_STORAGE_CONNECTION_STRING=...
// export OPENAI_MODEL_NAME=...
// export AZURE_STORAGE_CONTAINER_NAME=...
// ENABLE_DEBUG=true DEMO_VIDEO=~/Downloads/test1.mp4 npx ts-node test/integration/chatgpt-openai-azure-storage.ts
//

import { consoleWithColour } from '@handy-common-utils/misc-utils';
import chalk from 'chalk';
import readline from 'node:readline';

import { ChatAboutVideo, ConversationWithChatGpt } from '../src';

async function demo() {
  const chat = new ChatAboutVideo(
    {
      credential: {
        key: process.env.OPENAI_API_KEY!,
      },
      storage: {
        azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
        storageContainerName: process.env.AZURE_STORAGE_CONTAINER_NAME || 'vision-experiment-input',
        storagePathPrefix: 'video-frames/',
      },
      completionOptions: {
        // model is required by OpenAI
        model: process.env.OPENAI_MODEL_NAME || 'gpt-4o', // 'gpt-4-vision-preview', // or gpt-4o
      },
      extractVideoFrames: {
        limit: 100,
        interval: 2,
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation(process.env.DEMO_VIDEO!)) as ConversationWithChatGpt;

  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));
  while (true) {
    const question = await prompt(chalk.red('\nUser: '));
    if (!question) {
      continue;
    }
    if (['exit', 'quit', 'q', 'end'].includes(question)) {
      await conversation.end();
      break;
    }
    const answer = await conversation.say(question, { max_tokens: 2000 });
    console.log(chalk.blue('\nAI:' + answer));
  }
  console.log('Demo finished');
  rl.close();
}

demo().catch((error) => console.log(chalk.red(JSON.stringify(error, null, 2))));

Example 2: Multiple videos using ChatGPT hosted in OpenAI with Azure Blob Storage

Source: test/integration/chatgpt-openai-azure-storage-multi-video.ts

async function demo() {

  ...

  const conversation = (await chat.startConversation([
    { videoFile: process.env.DEMO_VIDEO_1!, promptText: 'This is the first video:' },
    { videoFile: process.env.DEMO_VIDEO_2!, promptText: 'This is the second video:' },
    { videoFile: process.env.DEMO_VIDEO_1!, promptText: 'This is the third video:' },
  ])) as ConversationWithChatGpt;

  ...

}

Example 3: Using ChatGPT hosted in Azure with Azure Blob Storage

Source: test/integration/chatgpt-azure-azure-storage-json.ts

// This is a demo utilising ChatGPT hosted in Azure.
// Video frame images are uploaded to Azure Blob Storage and then made available to GPT from there.
//
// This script can be executed with a command line like this from the project root directory:
// export AZURE_OPENAI_API_ENDPOINT=..
// export AZURE_OPENAI_API_KEY=...
// export AZURE_OPENAI_DEPLOYMENT_NAME=...
// export AZURE_STORAGE_CONNECTION_STRING=...
// export AZURE_STORAGE_CONTAINER_NAME=...
// ENABLE_DEBUG=true DEMO_VIDEO=~/Downloads/test1.mp4 npx ts-node test/integration/chatgpt-azure-azure-storage-json.ts

import { consoleWithColour } from '@handy-common-utils/misc-utils';
import chalk from 'chalk';
import readline from 'node:readline';

import { ChatAboutVideo, ConversationWithChatGpt } from '../src';

async function demo() {
  const chat = new ChatAboutVideo(
    {
      endpoint: process.env.AZURE_OPENAI_API_ENDPOINT!,
      credential: {
        key: process.env.AZURE_OPENAI_API_KEY!,
      },
      storage: {
        azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
        storageContainerName: process.env.AZURE_STORAGE_CONTAINER_NAME || 'vision-experiment-input',
        storagePathPrefix: 'video-frames/',
      },
      clientSettings: {
        // deployment is required by Azure
        deployment: process.env.AZURE_OPENAI_DEPLOYMENT_NAME || 'gpt4vision',
        // apiVersion is required by Azure
        apiVersion: '2024-10-21',
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation(process.env.DEMO_VIDEO!)) as ConversationWithChatGpt;

  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));
  while (true) {
    const question = await prompt(chalk.red('\nUser: '));
    if (!question) {
      continue;
    }
    if (['exit', 'quit', 'q', 'end'].includes(question)) {
      await conversation.end();
      break;
    }
    const answer = await conversation.say(question, { max_tokens: 2000 });
    console.log(chalk.blue('\nAI:' + answer));
  }
  console.log('Demo finished');
  rl.close();
}

demo().catch((error) => console.log(chalk.red(JSON.stringify(error, null, 2))));

Example 4: Using Gemini hosted in Google Cloud

Source: test/integration/gemini-json.ts

// This is a demo utilising Google Gemini through Google Generative Language API.
// Google Gemini allows many frame images to be supplied because of its huge context length.
// Video frame images are sent through Google Generative Language API directly.
//
// This script can be executed with a command line like this from the project root directory:
// export GEMINI_API_KEY=...
// ENABLE_DEBUG=true DEMO_VIDEO=~/Downloads/test1.mp4 npx ts-node test/integration/gemini-json.ts

import { consoleWithColour } from '@handy-common-utils/misc-utils';
import chalk from 'chalk';
import readline from 'node:readline';

import { HarmBlockThreshold, HarmCategory } from '@google/generative-ai';

import { ChatAboutVideo, ConversationWithGemini } from '../src';

async function demo() {
  const chat = new ChatAboutVideo(
    {
      credential: {
        key: process.env.GEMINI_API_KEY!,
      },
      clientSettings: {
        modelParams: {
          model: 'gemini-2.5-flash',
        },
      },
      extractVideoFrames: {
        limit: 100,
        interval: 0.5,
      },
      completionOptions: {
        safetySettings: [
          {
            category: 'HARM_CATEGORY_HATE_SPEECH' as any,
            threshold: 'BLOCK_NONE' as any,
          },
        ],
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation(process.env.DEMO_VIDEO!)) as ConversationWithGemini;

  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));
  while (true) {
    const question = await prompt(chalk.red('\nUser: '));
    if (!question) {
      continue;
    }
    if (['exit', 'quit', 'q', 'end'].includes(question)) {
      await conversation.end();
      break;
    }
    const answer = await conversation.say(question, {
      safetySettings: [{ category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT, threshold: HarmBlockThreshold.BLOCK_NONE }],
    });
    console.log(chalk.blue('\nAI:' + answer));
  }
  console.log('Demo finished');
  rl.close();
}

demo().catch((error) => console.log(chalk.red(JSON.stringify(error, null, 2)), error));

Example 5: Multiple groups of extracted frame images using ChatGPT hosted in OpenAI with Azure Blob Storage

Source: test/integration/chatgpt-manual-frames.ts

async function demo() {
  const tmpDir = os.tmpdir();
  const video1 = process.env.DEMO_VIDEO_1!;
  const video2 = process.env.DEMO_VIDEO_2!;
  const outputDir1 = path.join(tmpDir, 'video1-frames');
  const outputDir2 = path.join(tmpDir, 'video2-frames');

  console.log(chalk.green('Extracting frames from the first video...'));
  const { relativePaths: frames1, cleanup: cleanupFrames1 } = await extractVideoFramesWithFfmpeg(video1, outputDir1, 1, 'jpg', 200);

  console.log(chalk.green('Extracting frames from the second video...'));
  const { relativePaths: frames2, cleanup: cleanupFrames2 } = await extractVideoFramesWithFfmpeg(video2, outputDir2, 3, 'jpg', 200);

  const chat = new ChatAboutVideo(
    {
      credential: {
        key: process.env.OPENAI_API_KEY!,
      },
      storage: {
        azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
        storageContainerName: process.env.AZURE_STORAGE_CONTAINER_NAME || 'vision-experiment-input',
        storagePathPrefix: 'video-frames/',
      },
      completionOptions: {
        model: process.env.OPENAI_MODEL_NAME || 'gpt-4o',
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation([
    {
      promptText: 'Frame images from sample 1:',
      images: frames1.map((frame, i) => ({ imageFile: path.join(outputDir1, frame), promptText: `Frame CodeRed-${i + 1}` })),
    },
    {
      promptText: 'Frame images from sample 2, also known as the "good example":',
      images: frames2.map((frame) => ({ imageFile: path.join(outputDir2, frame) })),
    },
  ])) as ConversationWithChatGpt;

  ...

}

Example 6: Using NVIDIA NIM (OpenAI-compatible)

Source: test/integration/nvidia-nim-tools.ts

// This is a demo utilizing NVIDIA NIM via its OpenAI-compatible API.
//
// This script can be executed with a command line like this from the project root directory:
// export NVIDIA_NIM_API_KEY=...
// ENABLE_DEBUG=true npx ts-node test/integration/nvidia-nim-tools.ts

import { consoleWithColour, consoleWithoutColour } from '@handy-common-utils/misc-utils';
import chalk from 'chalk';
import readline from 'node:readline';

import { ChatAboutVideo, ConversationWithChatGpt, ToolCallResult } from '../src';

async function demo() {
  const chat = new ChatAboutVideo(
    {
      endpoint: process.env.NVIDIA_NIM_API_ENDPOINT || 'https://integrate.api.nvidia.com/v1',
      credential: {
        key: process.env.NVIDIA_NIM_API_KEY!,
      },
      completionOptions: {
        model: process.env.NVIDIA_NIM_MODEL || 'qwen/qwen3.5-397b-a17b',
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation(consoleWithoutColour({ debug: false, quiet: false }))) as ConversationWithChatGpt;

  const tools: any[] = [
    {
      type: 'function',
      function: {
        name: 'get_current_time',
        description: 'Get the current local time',
        parameters: {
          type: 'object',
          properties: {},
        },
      },
    },
  ];

  // ... handling tool calls as shown in other examples ...
}

Example 7: Using audio files with Gemini

Source: test/integration/gemini-audio.ts

import { consoleWithColour } from '@handy-common-utils/misc-utils';
import chalk from 'chalk';
import path from 'node:path';
import readline from 'node:readline';

import { ChatAboutVideo, ConversationWithGemini } from '../src';

const sampleAudioFile = path.resolve(__dirname, '../sample-media-files/engine-start.h264.aac.mp4'); // Or a real audio file like an mp3

async function demo() {
  const chat = new ChatAboutVideo(
    {
      credential: {
        key: process.env.GEMINI_API_KEY!,
      },
      clientSettings: {
        modelParams: {
          model: 'gemini-2.5-flash',
        },
      },
    },
    consoleWithColour({ debug: process.env.ENABLE_DEBUG === 'true' }, chalk),
  );

  const conversation = (await chat.startConversation([{ audioFile: sampleAudioFile }])) as ConversationWithGemini;

  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));

  while (true) {
    const question = await prompt(chalk.red('\nUser: '));
    if (!question) continue;
    if (['exit', 'quit', 'q', 'end'].includes(question)) {
      await conversation.end();
      break;
    }
    const answer = await conversation.say(question);
    console.log(chalk.blue('\nAI: ' + answer));
  }
  rl.close();
}

demo().catch((error) => console.log(chalk.red(JSON.stringify(error, null, 2))));

API

chat-about-video

Modules

Classes

Class: ChatAboutVideo<CLIENT, OPTIONS, PROMPT, RESPONSE>

chat.ChatAboutVideo

Type parameters

| Name | Type |
| :--------- | :--------------------------------------------- |
| CLIENT | any |
| OPTIONS | extends AdditionalCompletionOptions = any |
| PROMPT | any |
| RESPONSE | any |

Constructors

constructor

• new ChatAboutVideo<CLIENT, OPTIONS, PROMPT, RESPONSE>(options, log?)

Type parameters

Parameters

| Name | Type |
| :-------- | :--- |
| options | SupportedChatApiOptions |
| log | undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> |

Properties

| Property | Description |
| --- | --- |
| Protected apiPromise: Promise<ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>> | |
| Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | |
| Protected options: SupportedChatApiOptions | |

Methods

getApi

▸ getApi(): Promise<ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Get the underlying API instance.

Returns

Promise<ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The underlying API instance.

startConversation

▸ startConversation(log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation without a video

Parameters

| Name | Type | Description |
| :----- | :--- | :--- |
| log? | LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | Optional logger for this conversation, if not provided, the logger of ChatAboutVideo instance will be used. |

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.

▸ startConversation(options?, log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation without a video

Parameters

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.

▸ startConversation(videoFile, log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation about a video.

Parameters

| Name | Type | Description |
| :---------- | :--- | :--- |
| videoFile | string | Path to a video file in local file system. |
| log? | LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | Optional logger for this conversation, if not provided, the logger of ChatAboutVideo instance will be used. |

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.

▸ startConversation(videoFile, options?, log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation about a video.

Parameters

| Name | Type | Description |
| :---------- | :--- | :--- |
| videoFile | string | Path to a video file in local file system. |
| options? | OPTIONS | Overriding options for this conversation |
| log? | LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | Optional logger for this conversation, if not provided, the logger of ChatAboutVideo instance will be used. |

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.

▸ startConversation(videos, log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation about one or more videos, groups of images, or audio files.

Parameters

| Name | Type | Description |
| :------- | :--- | :--- |
| videos | (VideoInput \| ImagesInput \| AudioInput)[] | Array of videos, images, or audios to be used in the conversation. For each video/audio, the file path and the prompt before it should be provided. For each group of images, the image file paths and the prompt before the image group should be provided. |
| log? | LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | Optional logger for this conversation, if not provided, the logger of ChatAboutVideo instance will be used. |

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.

▸ startConversation(videos, options?, log?): Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

Start a conversation about one or more videos, groups of images, or audio files.

Parameters

| Name | Type | Description |
| :--------- | :--- | :--- |
| videos | (VideoInput \| ImagesInput \| AudioInput)[] | Array of videos, images, or audios to be used in the conversation. For each video/audio, the file path and the prompt before it should be provided. For each group of images, the image file paths and the prompt before the image group should be provided. |
| options? | OPTIONS | Overriding options for this conversation |
| log? | LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | Optional logger for this conversation, if not provided, the logger of ChatAboutVideo instance will be used. |

Returns

Promise<Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>>

The conversation.
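As a rough illustration of the `videos` argument described above, the sketch below builds one input per clip, each preceded by its own prompt. The `DemoVideoInput` interface is a local stand-in, and its field names (`videoFile`, `promptText`) are assumptions that should be checked against the package's exported `VideoInput` type; `toVideoInputs` is a hypothetical helper, not part of the package.

```typescript
// Local stand-in for the package's VideoInput type.
// The field names below are assumptions made for this sketch.
interface DemoVideoInput {
  videoFile: string; // path to the video file
  promptText?: string; // prompt placed before the video
}

// Hypothetical helper: build one input per clip, numbering each clip in its prompt.
function toVideoInputs(videoFiles: string[]): DemoVideoInput[] {
  return videoFiles.map((videoFile, index) => ({
    videoFile,
    promptText: `This is video ${index + 1} of ${videoFiles.length}:`,
  }));
}

const demoInputs = toVideoInputs(['intro.mp4', 'demo.mp4']);
console.log(demoInputs);
```

An array shaped like `demoInputs` would then be passed as the first argument of `startConversation`, optionally followed by overriding options and a logger.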

Class: Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>

chat.Conversation

Type parameters

Constructors

constructor

• new Conversation<CLIENT, OPTIONS, PROMPT, RESPONSE>(conversationId, api, prompt, options, cleanup?, log?)

Type parameters

Parameters

Properties

| Property | Description |
| -------- | ----------- |
| `Protected api: ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>` | |
| `Protected Optional cleanup: () => Promise<any>` | |
| `Protected conversationId: string` | |
| `Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void>` | |
| `Protected options: OPTIONS` | |
| `Protected prompt: undefined \| PROMPT` | |
| `Protected usage: undefined \| UsageMetadata` | |

Methods

end

▸ end(): Promise<void>

Returns

Promise<void>

getApi

▸ getApi(): ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>

Get the underlying API instance.

Returns

ChatApi<CLIENT, OPTIONS, PROMPT, RESPONSE>

The underlying API instance.

getPrompt

▸ getPrompt(): undefined | PROMPT

Get the prompt for the current conversation. The prompt is the accumulated messages in the conversation so far.

Returns

undefined | PROMPT

The prompt, which is the accumulated messages in the conversation so far.

getUsage

▸ getUsage(): undefined | UsageMetadata

Get usage statistics of the conversation. The usage statistics are undefined before the first say call, and may also be undefined if the underlying API does not support usage statistics. They may not cover requests that failed due to content filtering or other reasons, so the reported usage can be lower than the billable usage.

Returns

undefined | UsageMetadata

The usage statistics of the conversation, or undefined if not available.
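Because `getUsage()` can return undefined, callers may want to guard before reporting. The following is a minimal sketch that works against any object exposing a `getUsage()` method. The `UsageSource` interface, the `describeUsage` helper, and the `totalTokens` field are hypothetical names introduced for illustration; the actual fields of `UsageMetadata` are not shown in this section.

```typescript
// Structural stand-in for a Conversation: anything with a getUsage() method.
interface UsageSource<U> {
  getUsage(): U | undefined;
}

// Hypothetical helper: report usage without assuming it is available.
function describeUsage<U>(conversation: UsageSource<U>): string {
  const usage = conversation.getUsage();
  if (usage === undefined) {
    // Before the first say call, or when the provider reports no usage.
    return 'usage not available';
  }
  return `usage so far: ${JSON.stringify(usage)}`;
}

// Fake conversations for demonstration; totalTokens is an assumed field name.
const beforeFirstSay: UsageSource<{ totalTokens: number }> = { getUsage: () => undefined };
const afterFirstSay = { getUsage: () => ({ totalTokens: 1234 }) };

console.log(describeUsage(beforeFirstSay));
console.log(describeUsage(afterFirstSay));
```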

progressConversation

▸ Protected progressConversation(updatedPrompt, effectiveOptions): Promise<undefined | string | ConversationResponse>

Parameters

Returns

Promise<undefined | string | ConversationResponse>

say

▸ say<RT>(message, options?): Promise<RT>

Say something in the conversation, and get the response from the AI.

Type parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| `RT` | `extends string \| ConversationResponse = string` | The type of the response. It can be `string \| undefined`, or `ConversationResponse`, or a combination of them. Choose the type based on whether a tool call could be returned. |

Parameters

Returns

Promise<RT>

The response text if there is no tool call, or a ConversationResponse object if there is a tool call.
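Since the result of `say` may be plain text or a structured object, calling code usually needs to branch on which one it received. The sketch below shows one way to narrow the value with `typeof`, without relying on any particular field of `ConversationResponse`. `SayResult` and `summarizeResult` are hypothetical names, and the `toolCalls` property in the demo object is only a placeholder.

```typescript
// The possible shapes of a say() result, with R standing in for ConversationResponse.
type SayResult<R extends object> = string | undefined | R;

// Hypothetical helper: narrow the result before using it.
function summarizeResult<R extends object>(result: SayResult<R>): string {
  if (result === undefined) {
    return 'no response';
  }
  if (typeof result === 'string') {
    // Plain text: no tool call was requested.
    return `text: ${result}`;
  }
  // Anything else is treated as the structured response object.
  return 'structured response (may contain tool calls)';
}

console.log(summarizeResult('The video shows a cat.'));
console.log(summarizeResult({ toolCalls: [] })); // placeholder object shape
console.log(summarizeResult(undefined));
```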

submitToolCallResults

▸ submitToolCallResults<RT>(toolResults, options?): Promise<RT>

Submit tool call results to the conversation, and get the response from AI.

Type parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| `RT` | `extends string \| ConversationResponse = string` | The type of the response. It can be `string` or `ConversationResponse`, or a combination of them. Choose the type based on whether a tool call could be returned. |

Parameters

Returns

Promise<RT>

The response text if there is no further tool call, or a ConversationResponse object if there is a further tool call.

▸ submitToolCallResults<RT>(toolResults, additionalMessage?, options?): Promise<RT>

Submit tool call results to the conversation, and get the response from AI.

Type parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| `RT` | `extends string \| ConversationResponse = string` | The type of the response. It can be `string` or `ConversationResponse`, or a combination of them. Choose the type based on whether a tool call could be returned. |

Parameters

Returns

Promise<RT>

The response text if there is no further tool call, or a ConversationResponse object if there is a further tool call.

Class: ChatGptApi

chat-gpt.ChatGptApi

Implements

ChatApi<ChatGptClient, ChatGptCompletionOptions, any[], ChatGptResponse>

Constructors

constructor

• new ChatGptApi(options)

Parameters

Properties

| Property | Description |
| -------- | ----------- |
| `Protected client: ChatGptClient` | |
| `Protected Optional extractVideoFrames: EffectiveExtractVideoFramesOptions` | |
| `Protected options: ChatGptOptions` | |
| `Protected Optional storage: Required<Pick<StorageOptions, "uploader">> & StorageOptions` | |
| `Protected tmpDir: string` | |

Methods

appendToPrompt

▸ appendToPrompt(newPromptOrResponse, prompt?): Promise<ChatCompletionMessageParam[]>

Append a new prompt or response to form a full prompt. This function is useful for building a prompt that contains the conversation history.

Parameters

| Name | Type | Description |
| :--- | :--- | :---------- |
| `newPromptOrResponse` | `ChatCompletionMessageParam[] \| ChatCompletion` | A new prompt to be appended, or previous response to be appended. |
| `prompt?` | `ChatCompletionMessageParam[]` | The conversation history which is a prompt containing previous prompts and responses. If it is not provided, the conversation history returned will contain only what is in newPromptOrResponse. |

Returns

Promise<ChatCompletionMessageParam[]>

The full prompt which is effectively the conversation history.
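The accumulation behaviour described above can be sketched with simplified local types. This is an illustration of the idea, not the package's implementation: `Message` and `CompletionLike` are reduced stand-ins for `ChatCompletionMessageParam` and `ChatCompletion`, `appendToHistory` is a hypothetical synchronous helper, and taking only the first choice of a response is an assumption.

```typescript
// Simplified stand-ins for the OpenAI message and completion shapes.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}
interface CompletionLike {
  choices: { message: Message }[];
}

// Hypothetical helper: append new messages, or the first choice of a response,
// to an optional history. The input history is not mutated.
function appendToHistory(
  newPromptOrResponse: Message[] | CompletionLike,
  history: Message[] = [],
): Message[] {
  const additions = Array.isArray(newPromptOrResponse)
    ? newPromptOrResponse
    : newPromptOrResponse.choices.slice(0, 1).map((choice) => choice.message);
  return [...history, ...additions];
}

// Without a history, the result contains only the new prompt.
const afterQuestion = appendToHistory([{ role: 'user', content: 'What happens in the clip?' }]);

// With a history, the response message is appended to it.
const afterAnswer = appendToHistory(
  { choices: [{ message: { role: 'assistant', content: 'A cat jumps onto a table.' } }] },
  afterQuestion,
);

console.log(afterAnswer);
```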

Implementation of

ChatApi.appendToPrompt

buildAudioPrompt

▸ buildAudioPrompt(audioFile, _conversationId?): Promise<BuildPromptOutput<ChatCompletionMessageParam[], ChatGptCompletionOptions>>

Build a prompt for sending audio content to the AI. Sometimes, to include audio in the conversation, additional options and/or cleanup are needed. In such cases, options to be passed to the generateContent function and/or a cleanup callback function can be returned from this function.

Parameters

Returns

Promise<BuildPromptOutput<ChatCompletionMessageParam[], ChatGptCompletionOptions>>

An object containing the prompt, optional options, and an optional cleanup function.

Implementation of

ChatApi.buildAudioPrompt

buildImagesPrompt

▸ buildImagesPrompt(imageInputs, conversationId?): Promise<BuildPromptOutput<ChatCompletionMessageParam[], ChatGptCompletionOptions>>

Build a prompt for sending image content to the AI. Sometimes, to include images in the conversation, additional options and/or cleanup are needed. In such cases, options to be passed to the generateContent function and/or a cleanup callback function can be returned from this function.

Parameters
