clipify-web-transcoder
A TypeScript library for streaming video transcoding with ONNX tensor processing. Built on top of mediabunny for high-performance WebCodecs-based video processing.
Features
- Streaming transcoder: Decode → process → encode loop with constant memory usage
- ONNX tensor integration: Input frames are converted to [1, 3, H, W] RGB tensors for direct use with ONNX models
- Transparent video output: Outputs WebM with the VP9 codec and alpha channel support
- Memory efficient: Online processing keeps memory usage constant regardless of video length
- TypeScript first: Full type definitions included
Installation
npm install clipify-web-transcoder

Quick Start
import { VideoTranscoder } from 'clipify-web-transcoder';
// Open a video file
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
// Access video metadata
const { width, height, frameRate, frameCount } = video.metadata;
console.log(`Video: ${width}x${height} @ ${frameRate}fps, ${frameCount} frames`);
// Set up frame processing
video.processFrame(async (frame) => {
// frame.tensor: [1, 3, H, W] RGB tensor (normalized 0-1)
// frame.index: current frame number (0-based)
// frame.time: presentation time in seconds
// frame.width, frame.height: frame dimensions
// Your processing here - e.g., run through an ONNX model
const results = await session.run({ src: frame.tensor });
// Return foreground and alpha tensors
return { foreground: results.fgr, alpha: results.pha };
});
// Optional: track progress
video.onProgress((progress, stage) => {
console.log(`${stage}: ${Math.round(progress * 100)}%`);
});
// Run the transcoding pipeline
const outputBuffer = await video.run();
// Create downloadable blob
const blob = new Blob([outputBuffer], { type: 'video/webm' });
// Release resources
video.close();

Why WebM with Alpha?
This library outputs WebM with the VP9 codec because it is currently the only broadly web-compatible format that supports transparent video. This enables use cases like the following (a short playback sketch appears after the list):
- Background removal with AI models (like RobustVideoMatting)
- Video compositing in web applications
- Green screen replacement without pre-keying
- Overlays for video editing
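For instance, the transparent output can be layered directly over other page content with a plain video element. The snippet below is a minimal playback sketch using only standard web APIs; outputBuffer is the ArrayBuffer returned by video.run() in the Quick Start above.
// Minimal sketch: composite the transparent WebM over existing page content.
const blob = new Blob([outputBuffer], { type: 'video/webm' });
const url = URL.createObjectURL(blob);

const player = document.createElement('video');
player.src = url;
player.muted = true;                  // muted playback is allowed to autoplay
player.autoplay = true;
player.loop = true;
player.style.position = 'absolute';   // layer it over the content behind it
document.body.appendChild(player);
// Call URL.revokeObjectURL(url) once the player is no longer needed.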
API Reference
VideoTranscoder.open(source, options?)
Static factory method to create a transcoder instance.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| source | VideoSourceData | Video source (File, Blob, ArrayBuffer, or URL string) |
| options | TranscoderOptions | Optional configuration |
| options.dtype | TensorDataType | Tensor data type: 'float32' (default) or 'float16' |
| options.outputBitrate | number | Output video bitrate in bps (default: 10000000) |
Returns: Promise<VideoTranscoder>
// From file input
const video = await VideoTranscoder.open(fileInput.files[0]);
// From URL with options
const video = await VideoTranscoder.open('video.mp4', {
dtype: 'float32',
outputBitrate: 20_000_000
});

VideoTranscoder Instance
metadata
Read-only property containing video metadata. Available immediately after opening.
Type: VideoMetadata
const { width, height, frameRate, duration, frameCount, hasAudio } = video.metadata;

processFrame(handler)
Set the frame processing callback. Must be called before run().
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| handler | FrameHandler | Function called for each frame |
Returns: VideoTranscoder (for chaining)
video.processFrame(async (frame) => {
// Process the frame...
return { foreground: fgrTensor, alpha: alphaTensor };
});

onProgress(callback)
Set the callback for progress updates during processing.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| callback | ProgressCallback | Function called with progress updates |
Returns: VideoTranscoder (for chaining)
video.onProgress((progress, stage) => {
// progress: 0-1
// stage: 'decoding' or 'encoding'
console.log(`${stage}: ${Math.round(progress * 100)}%`);
});

run(signal?)
Start the transcode pipeline and process all frames.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
| signal | AbortSignal | Optional signal for cancellation |
Returns: Promise<ArrayBuffer> - The processed video (WebM format with VP9 codec)
// Basic usage
const output = await video.run();
// With abort support
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000); // 30 second timeout
const output = await video.run(controller.signal);

abort()
Abort the current processing operation.
video.abort();

close()
Release all resources. Call this when done with the transcoder.
video.close();

Types
TensorDataType
type TensorDataType = 'float32' | 'float16';

| Type | Description |
|------|-------------|
| 'float32' | 32-bit float tensors (wider compatibility) |
| 'float16' | 16-bit float tensors (requires browser support, better performance) |
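A common pattern is to choose the tensor type at runtime based on environment support, using the isFloat16Supported() helper documented under Utility Functions below:
import { VideoTranscoder, isFloat16Supported } from 'clipify-web-transcoder';

// Prefer float16 when the environment supports it, otherwise fall back to float32.
const dtype = isFloat16Supported() ? 'float16' : 'float32';
const video = await VideoTranscoder.open(videoFile, { dtype });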
VideoFrame
The object passed to the frame handler.
interface VideoFrame {
/** Frame index (0-based) */
index: number;
/** Presentation time in seconds */
time: number;
/** Input tensor with shape [1, 3, height, width] (RGB, normalized 0-1) */
tensor: Tensor;
/** Frame width in pixels */
width: number;
/** Frame height in pixels */
height: number;
}

FrameResult
The object returned from the frame handler.
interface FrameResult {
/** Foreground tensor with shape [1, 3, height, width] (RGB, 0-1) */
foreground: Tensor;
/** Alpha tensor with shape [1, 1, height, width] (0-1) */
alpha: Tensor;
}

FrameHandler
type FrameHandler = (frame: VideoFrame) => FrameResult | Promise<FrameResult>;

ProgressCallback
type ProgressCallback = (progress: number, stage: 'decoding' | 'encoding') => void;

VideoMetadata
interface VideoMetadata {
readonly width: number;
readonly height: number;
readonly frameRate: number;
readonly duration: number;
readonly hasAudio: boolean;
readonly frameCount: number;
}

TranscoderOptions
interface TranscoderOptions {
/** Tensor data type (default: 'float32') */
dtype?: TensorDataType;
/** Output video bitrate in bps (default: 10000000) */
outputBitrate?: number;
}

VideoSourceData
type VideoSourceData = string | ArrayBuffer | Blob | File;

Utility Functions
These are exported for advanced use cases:
rgbaToTensor(rgba, width, height, dataType, reuseBuffer?)
Convert RGBA Uint8ClampedArray to an ONNX tensor with shape [1, 3, H, W].
import { rgbaToTensor } from 'clipify-web-transcoder';
// Basic usage
const tensor = rgbaToTensor(rgbaData, 1920, 1080, 'float32');
// With buffer reuse for better performance
const buffer = new Float32Array(3 * 1920 * 1080);
const tensor = rgbaToTensor(rgbaData, 1920, 1080, 'float32', buffer);

tensorsToRgba(foreground, alpha, width, height, reuseBuffer?)
Convert foreground [1, 3, H, W] and alpha [1, 1, H, W] tensors to RGBA Uint8ClampedArray.
import { tensorsToRgba } from 'clipify-web-transcoder';
// Basic usage
const rgba = tensorsToRgba(fgrTensor, alphaTensor, 1920, 1080);
// With buffer reuse for better performance
const buffer = new Uint8ClampedArray(1920 * 1080 * 4);
const rgba = tensorsToRgba(fgrTensor, alphaTensor, 1920, 1080, buffer);

isFloat16Supported()
Check if Float16Array is supported in the current environment.
import { isFloat16Supported } from 'clipify-web-transcoder';
if (isFloat16Supported()) {
// Safe to use float16 tensors
}

VideoProcessor Class
For advanced usage, you can use the VideoProcessor class directly:
import { VideoProcessor } from 'clipify-web-transcoder';
const processor = new VideoProcessor('float32');
await processor.load(videoBlob);
const metadata = processor.getMetadata();
processor.setFrameHandler(async (frame) => {
const results = await session.run({ src: frame.tensor });
return { foreground: results.fgr, alpha: results.pha };
});
const output = await processor.run();
processor.close();

Browser Support
This library requires browser support for the following (a minimal feature-detection sketch follows this list):
- WebCodecs API (Chrome 94+, Edge 94+, Opera 80+)
- WebAssembly (for onnxruntime-web)
- Float16Array (Chrome 118+, Firefox 129+) - only if using the 'float16' data type
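To fail gracefully on unsupported browsers, you can feature-detect before opening a video. The check below is a minimal sketch using only standard globals (VideoDecoder, VideoEncoder, WebAssembly); it is not an exhaustive capability probe.
// Minimal sketch: check for the required APIs before starting a transcode.
function canTranscode(): boolean {
  const hasWebCodecs = 'VideoDecoder' in globalThis && 'VideoEncoder' in globalThis;
  const hasWasm = typeof WebAssembly === 'object';
  return hasWebCodecs && hasWasm;
}

if (!canTranscode()) {
  console.warn('This browser lacks the APIs required by clipify-web-transcoder.');
}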
Examples
Background Removal with RobustVideoMatting
import { VideoTranscoder } from 'clipify-web-transcoder';
import * as ort from 'onnxruntime-web';
// Load the RVM model
const session = await ort.InferenceSession.create('rvm_mobilenetv3_fp32.onnx');
// Initialize recurrent states (r1-r4) as zeros
let r1 = new ort.Tensor('float32', new Float32Array(1 * 16 * 1 * 1), [1, 16, 1, 1]);
let r2 = new ort.Tensor('float32', new Float32Array(1 * 20 * 1 * 1), [1, 20, 1, 1]);
let r3 = new ort.Tensor('float32', new Float32Array(1 * 40 * 1 * 1), [1, 40, 1, 1]);
let r4 = new ort.Tensor('float32', new Float32Array(1 * 64 * 1 * 1), [1, 64, 1, 1]);
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
video.processFrame(async (frame) => {
// Run the model with recurrent states
const results = await session.run({
src: frame.tensor,
r1i: r1, r2i: r2, r3i: r3, r4i: r4,
downsample_ratio: new ort.Tensor('float32', [0.25], [1])
});
// Update recurrent states for next frame
r1 = results.r1o;
r2 = results.r2o;
r3 = results.r3o;
r4 = results.r4o;
return { foreground: results.fgr, alpha: results.pha };
});
const outputBuffer = await video.run();
const blob = new Blob([outputBuffer], { type: 'video/webm' });
// Download or use the video with transparent background...
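// (Hypothetical continuation using standard DOM APIs, not part of this library.)
// For example, trigger a download of the transparent WebM:
const url = URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = url;
link.download = 'output.webm';
link.click();
// Optionally revoke the object URL later with URL.revokeObjectURL(url).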
video.close();

Simple Pass-through (No Model)
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Pre-allocate alpha buffer ONCE (optimization)
const alphaData = new Float32Array(height * width).fill(1.0);
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
// Pass through the input as foreground, reuse pre-built alpha
return { foreground: frame.tensor, alpha };
});
const output = await video.run();
video.close();

Gradient Alpha Mask
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Pre-compute gradient alpha ONCE outside the callback (optimization)
const alphaData = new Float32Array(height * width);
for (let y = 0; y < height; y++) {
for (let x = 0; x < width; x++) {
alphaData[y * width + x] = x / width; // 0 on left, 1 on right
}
}
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
return { foreground: frame.tensor, alpha };
});
const output = await video.run();
video.close();

With Cancellation Support
import { VideoTranscoder } from 'clipify-web-transcoder';
import { Tensor } from 'onnxruntime-web';
const video = await VideoTranscoder.open(videoFile, { dtype: 'float32' });
const { width, height } = video.metadata;
// Fully opaque alpha, allocated once and reused for every frame
const solidAlpha = new Tensor('float32', new Float32Array(height * width).fill(1.0), [1, 1, height, width]);
video.processFrame((frame) => {
  return { foreground: frame.tensor, alpha: solidAlpha };
});
// Cancel after 10 seconds
const controller = new AbortController();
setTimeout(() => controller.abort(), 10000);
try {
const output = await video.run(controller.signal);
// Processing completed
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
console.log('Processing was cancelled');
}
} finally {
video.close();
}

Performance Tips
For optimal performance, especially with long videos:
1. Pre-allocate buffers outside the callback
// ❌ Bad: Allocates new arrays every frame
video.processFrame((frame) => {
const alphaData = new Float32Array(height * width); // Allocates ~8MB per frame!
// ...
return { foreground, alpha };
});
// ✅ Good: Pre-allocate once, reuse every frame
const alphaData = new Float32Array(height * width);
const alpha = new Tensor('float32', alphaData, [1, 1, height, width]);
video.processFrame((frame) => {
// Reuse pre-built tensor
return { foreground: frame.tensor, alpha };
});

2. Reuse input tensor when possible
// If your foreground output equals the input (pass-through), don't copy it:
video.processFrame((frame) => {
return { foreground: frame.tensor, alpha }; // Zero-copy reuse
});

3. Use TypedArray methods
// ❌ Slow: Nested loops
for (let y = 0; y < height; y++) {
for (let x = 0; x < width; x++) {
alphaData[y * width + x] = 1.0;
}
}
// ✅ Fast: Use fill()
alphaData.fill(1.0);

Memory Usage
The library uses a streaming architecture that processes and encodes frames immediately, keeping memory usage constant regardless of video length:
| Video Length | Memory Usage |
|--------------|--------------|
| 20 seconds | ~50 MB |
| 5 minutes | ~50 MB |
| 1 hour | ~50 MB |
This is achieved by the following (sketched after this list):
- Online processing: Frames are decoded, processed, and encoded in a streaming loop
- Buffer pooling: Internal buffers are reused across frames
- Immediate cleanup: Input samples are closed after processing
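Conceptually, the pipeline behaves like the loop below. This is a purely illustrative sketch, not the library's actual implementation: the callbacks passed in are hypothetical. It only shows why memory stays flat: each frame is fully handled and released before the next one is decoded.
// Purely illustrative: an online decode → process → encode loop.
// The callbacks are hypothetical; the library wires these steps up internally.
async function streamingTranscode(
  frames: AsyncIterable<VideoFrame>,                      // decoded frames, one at a time
  processAndEncode: (frame: VideoFrame) => Promise<void>, // frame handler + encoder step
  finalize: () => Promise<ArrayBuffer>                    // flush the encoder, assemble the WebM
): Promise<ArrayBuffer> {
  for await (const frame of frames) {
    await processAndEncode(frame); // each frame is fully handled...
    // ...and becomes collectable before the next one is decoded
  }
  return finalize();
}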
License
MIT
