clipify-web-transcoder
A TypeScript library for streaming video transcoding with ONNX tensor processing. Built on top of mediabunny for high-performance WebCodecs-based video processing.
Features
- Streaming transcoder: Decode → process → encode loop with constant memory usage
- ONNX tensor integration: Input frames are converted to [1, 3, H, W] RGB tensors for direct use with ONNX models
- Transparent video output: Outputs WebM with the VP9 codec and alpha channel support
- Memory efficient: Online processing keeps memory usage constant regardless of video length
- TypeScript first: Full type definitions included
Installation
npm install clipify-web-transcoder

Quick Start
import { VideoTranscoder } from 'clipify-web-transcoder';
// Open a video file
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
// Access video metadata
const { width, height, frameRate, frameCount } = video.metadata;
console.log(`Video: ${width}x${height} @ ${frameRate}fps, ${frameCount} frames`);
// Set up frame processing
video.processFrame(async (frame) => {
// frame.tensor: [1, 3, H, W] RGB tensor (normalized 0-1)
// frame.index: current frame number (0-based)
// frame.time: presentation time in seconds
// frame.width, frame.height: frame dimensions
// Your processing here - e.g., run through an ONNX model
const results = await session.run({ src: frame.tensor });
// Return foreground and alpha tensors
return { foreground: results.fgr, alpha: results.pha };
});
// Optional: track progress
video.onProgress((progress, stage) => {
console.log(`${stage}: ${Math.round(progress * 100)}%`);
});
// Run the transcoding pipeline
const outputBuffer = await video.run();
// Create downloadable blob
const blob = new Blob([outputBuffer], { type: 'video/webm' });
// Release resources
video.close();

Why WebM with Alpha?
This library outputs WebM with the VP9 codec because it is currently the only broadly web-compatible format that supports transparent video. This enables use cases like the following (a short playback sketch appears after the list):
- Background removal with AI models (like RobustVideoMatting)
- Video compositing in web applications
- Green screen replacement without pre-keying
- Overlays for video editing
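For instance, the transparent output can be layered directly over other page content with a plain video element. The snippet below is a minimal playback sketch using only standard web APIs; outputBuffer is the ArrayBuffer returned by video.run() in the Quick Start above.
// Minimal sketch: composite the transparent WebM over existing page content.
const blob = new Blob([outputBuffer], { type: 'video/webm' });
const url = URL.createObjectURL(blob);

const player = document.createElement('video');
player.src = url;
player.muted = true;                  // muted playback is allowed to autoplay
player.autoplay = true;
player.loop = true;
player.style.position = 'absolute';   // layer it over the content behind it
document.body.appendChild(player);
// Call URL.revokeObjectURL(url) once the player is no longer needed.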
API Reference
VideoTranscoder.open(source, options?)
Static factory method to create a transcoder instance.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| source | VideoSourceData | Video source (File, Blob, ArrayBuffer, or URL string) |
| options | TranscoderOptions | Optional configuration |
| options.dtype | TensorDataType | Tensor data type: 'float32' (default) or 'float16' |
| options.outputBitrate | number | Output video bitrate in bps (default: 10000000) |
Returns: Promise<VideoTranscoder>
// From file input
const video = await VideoTranscoder.open(fileInput.files[0]);
// From URL with options
const video = await VideoTranscoder.open('video.mp4', {
dtype: 'float32',
outputBitrate: 20_000_000
});

VideoTranscoder Instance
metadata
Read-only property containing video metadata. Available immediately after opening.
Type: VideoMetadata
const { width, height, frameRate, duration, frameCount, hasAudio } = video.metadata;

processFrame(handler)
Set the frame processing callback. Must be called before run().
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| handler | FrameHandler | Function called for each frame |
Returns: VideoTranscoder (for chaining)
video.processFrame(async (frame) => {
// Process the frame...
return { foreground: fgrTensor, alpha: alphaTensor };
});

onProgress(callback)
Set the callback for progress updates during processing.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| callback | ProgressCallback | Function called with progress updates |
Returns: VideoTranscoder (for chaining)
video.onProgress((progress, stage) => {
// progress: 0-1
// stage: 'decoding' or 'encoding'
console.log(`${stage}: ${Math.round(progress * 100)}%`);
});

run(signal?)
Start the transcode pipeline and process all frames.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| signal | AbortSignal | Optional signal for cancellation |
Returns: Promise<ArrayBuffer> - The processed video (WebM format with VP9 codec)
// Basic usage
const output = await video.run();
// With abort support
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000); // 30 second timeout
const output = await video.run(controller.signal);

abort()
Abort the current processing operation.
video.abort();

close()
Release all resources. Call this when done with the transcoder.
video.close();

Types
TensorDataType
type TensorDataType = 'float32' | 'float16';

| Type | Description |
|------|-------------|
| 'float32' | 32-bit float tensors (wider compatibility) |
| 'float16' | 16-bit float tensors (requires browser support, better performance) |
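A common pattern is to choose the tensor type at runtime based on environment support, using the isFloat16Supported() helper documented under Utility Functions below:
import { VideoTranscoder, isFloat16Supported } from 'clipify-web-transcoder';

// Prefer float16 when the environment supports it, otherwise fall back to float32.
const dtype = isFloat16Supported() ? 'float16' : 'float32';
const video = await VideoTranscoder.open(videoFile, { dtype });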
VideoFrame
The object passed to the frame handler.
interface VideoFrame {
/** Frame index (0-based) */
index: number;
/** Presentation time in seconds */
time: number;
/** Input tensor with shape [1, 3, height, width] (RGB, normalized 0-1) */
tensor: Tensor;
/** Frame width in pixels */
width: number;
/** Frame height in pixels */
height: number;
}

FrameResult
The object returned from the frame handler.
interface FrameResult {
/** Foreground tensor with shape [1, 3, height, width] (RGB, 0-1) */
foreground: Tensor;
/** Alpha tensor with shape [1, 1, height, width] (0-1) */
alpha: Tensor;
}

FrameHandler
type FrameHandler = (frame: VideoFrame) => FrameResult | Promise<FrameResult>;

ProgressCallback
type ProgressCallback = (progress: number, stage: 'decoding' | 'encoding') => void;

VideoMetadata
interface VideoMetadata {
readonly width: number;
readonly height: number;
readonly frameRate: number;
readonly duration: number;
readonly hasAudio: boolean;
readonly frameCount: number;
}

TranscoderOptions
interface TranscoderOptions {
/** Tensor data type (default: 'float32') */
dtype?: TensorDataType;
/** Output video bitrate in bps (default: 10000000) */
outputBitrate?: number;
}

VideoSourceData
type VideoSourceData = string | ArrayBuffer | Blob | File;

Utility Functions
These are exported for advanced use cases:
rgbaToTensor(rgba, width, height, dataType, reuseBuffer?)
Convert RGBA Uint8ClampedArray to an ONNX tensor with shape [1, 3, H, W].
import { rgbaToTensor } from 'clipify-web-transcoder';
// Basic usage
const tensor = rgbaToTensor(rgbaData, 1920, 1080, 'float32');
// With buffer reuse for better performance
const buffer = new Float32Array(3 * 1920 * 1080);
const tensor = rgbaToTensor(rgbaData, 1920, 1080, 'float32', buffer);

tensorsToRgba(foreground, alpha, width, height, reuseBuffer?)
Convert foreground [1, 3, H, W] and alpha [1, 1, H, W] tensors to RGBA Uint8ClampedArray.
import { tensorsToRgba } from 'clipify-web-transcoder';
// Basic usage
const rgba = tensorsToRgba(fgrTensor, alphaTensor, 1920, 1080);
// With buffer reuse for better performance
const buffer = new Uint8ClampedArray(1920 * 1080 * 4);
const rgba = tensorsToRgba(fgrTensor, alphaTensor, 1920, 1080, buffer);

isFloat16Supported()
Check if Float16Array is supported in the current environment.
import { isFloat16Supported } from 'clipify-web-transcoder';
if (isFloat16Supported()) {
// Safe to use float16 tensors
}

VideoProcessor Class
For advanced usage, you can use the VideoProcessor class directly:
import { VideoProcessor } from 'clipify-web-transcoder';
const processor = new VideoProcessor('float32');
await processor.load(videoBlob);
const metadata = processor.getMetadata();
processor.setFrameHandler(async (frame) => {
const results = await session.run({ src: frame.tensor });
return { foreground: results.fgr, alpha: results.pha };
});
const output = await processor.run();
processor.close();

Browser Support
This library requires browser support for the following (a minimal feature-detection sketch follows this list):
- WebCodecs API (Chrome 94+, Edge 94+, Opera 80+)
- WebAssembly (for onnxruntime-web)
- Float16Array (Chrome 118+, Firefox 129+) - only if using the 'float16' data type
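To fail gracefully on unsupported browsers, you can feature-detect before opening a video. The check below is a minimal sketch using only standard globals (VideoDecoder, VideoEncoder, WebAssembly); it is not an exhaustive capability probe.
// Minimal sketch: check for the required APIs before starting a transcode.
function canTranscode(): boolean {
  const hasWebCodecs = 'VideoDecoder' in globalThis && 'VideoEncoder' in globalThis;
  const hasWasm = typeof WebAssembly === 'object';
  return hasWebCodecs && hasWasm;
}

if (!canTranscode()) {
  console.warn('This browser lacks the APIs required by clipify-web-transcoder.');
}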
Examples
Background Removal with RobustVideoMatting
import { VideoTranscoder } from 'clipify-web-transcoder';
import * as ort from 'onnxruntime-web';
// Load the RVM model
const session = await ort.InferenceSession.create('rvm_mobilenetv3_fp32.onnx');
// Initialize recurrent states (r1-r4) as zeros
let r1 = new ort.Tensor('float32', new Float32Array(1 * 16 * 1 * 1), [1, 16, 1, 1]);
let r2 = new ort.Tensor('float32', new Float32Array(1 * 20 * 1 * 1), [1, 20, 1, 1]);
let r3 = new ort.Tensor('float32', new Float32Array(1 * 40 * 1 * 1), [1, 40, 1, 1]);
let r4 = new ort.Tensor('float32', new Float32Array(1 * 64 * 1 * 1), [1, 64, 1, 1]);
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
video.processFrame(async (frame) => {
// Run the model with recurrent states
const results = await session.run({
src: frame.tensor,
r1i: r1, r2i: r2, r3i: r3, r4i: r4,
downsample_ratio: new ort.Tensor('float32', [0.25], [1])
});
// Update recurrent states for next frame
r1 = results.r1o;
r2 = results.r2o;
r3 = results.r3o;
r4 = results.r4o;
return { foreground: results.fgr, alpha: results.pha };
});
const outputBuffer = await video.run();
const blob = new Blob([outputBuffer], { type: 'video/webm' });
// Download or use the video with transparent background...
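// (Hypothetical continuation using standard DOM APIs, not part of this library.)
// For example, trigger a download of the transparent WebM:
const url = URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = url;
link.download = 'output.webm';
link.click();
// Optionally revoke the object URL later with URL.revokeObjectURL(url).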
video.close();

Simple Pass-through (No Model)
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Pre-allocate alpha buffer ONCE (optimization)
const alphaData = new Float32Array(height * width).fill(1.0);
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
// Pass through the input as foreground, reuse pre-built alpha
return { foreground: frame.tensor, alpha };
});
const output = await video.run();
video.close();

Gradient Alpha Mask
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Pre-compute gradient alpha ONCE outside the callback (optimization)
const alphaData = new Float32Array(height * width);
for (let y = 0; y < height; y++) {
for (let x = 0; x < width; x++) {
alphaData[y * width + x] = x / width; // 0 on left, 1 on right
}
}
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
return { foreground: frame.tensor, alpha };
});
const output = await video.run();
video.close();

With Cancellation Support
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Fully opaque alpha, allocated once and reused for every frame
const solidAlpha = new Tensor('float32', new Float32Array(height * width).fill(1.0), [1, 1, height, width]);
video.processFrame((frame) => {
  return { foreground: frame.tensor, alpha: solidAlpha };
});
// Cancel after 10 seconds
const controller = new AbortController();
setTimeout(() => controller.abort(), 10000);
try {
const output = await video.run(controller.signal);
// Processing completed
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
console.log('Processing was cancelled');
}
} finally {
video.close();
}

Performance Tips
For optimal performance, especially with long videos:
1. Pre-allocate buffers outside the callback
// ❌ Bad: Allocates new arrays every frame
video.processFrame((frame) => {
const alphaData = new Float32Array(height * width); // Allocates ~8MB per frame!
// ...
return { foreground, alpha };
});
// ✅ Good: Pre-allocate once, reuse every frame
const alphaData = new Float32Array(height * width);
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
// Reuse pre-built tensor
return { foreground: frame.tensor, alpha };
});

2. Reuse input tensor when possible
// If your foreground output equals the input (pass-through), don't copy it:
video.processFrame((frame) => {
return { foreground: frame.tensor, alpha }; // Zero-copy reuse
});

3. Use TypedArray methods
// ❌ Slow: Nested loops
for (let y = 0; y < height; y++) {
for (let x = 0; x < width; x++) {
alphaData[y * width + x] = 1.0;
}
}
// ✅ Fast: Use fill()
alphaData.fill(1.0);

Memory Usage
The library uses a streaming architecture that processes and encodes frames immediately, keeping memory usage constant regardless of video length:
| Video Length | Memory Usage |
|--------------|--------------|
| 20 seconds | ~50 MB |
| 5 minutes | ~50 MB |
| 1 hour | ~50 MB |
This is achieved by the following (sketched after this list):
- Online processing: Frames are decoded, processed, and encoded in a streaming loop
- Buffer pooling: Internal buffers are reused across frames
- Immediate cleanup: Input samples are closed after processing
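Conceptually, the pipeline behaves like the loop below. This is a purely illustrative sketch, not the library's actual implementation: the callbacks passed in are hypothetical. It only shows why memory stays flat: each frame is fully handled and released before the next one is decoded.
// Purely illustrative: an online decode → process → encode loop.
// The callbacks are hypothetical; the library wires these steps up internally.
async function streamingTranscode(
  frames: AsyncIterable<VideoFrame>,                      // decoded frames, one at a time
  processAndEncode: (frame: VideoFrame) => Promise<void>, // frame handler + encoder step
  finalize: () => Promise<ArrayBuffer>                    // flush the encoder, assemble the WebM
): Promise<ArrayBuffer> {
  for await (const frame of frames) {
    await processAndEncode(frame); // each frame is fully handled...
    // ...and becomes collectable before the next one is decoded
  }
  return finalize();
}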
License
MIT
