react-native-vision-utils
v1.2.2
High-performance React Native SDK for image pre-processing in ML/AI pipelines. Native implementations for pixel extraction, tensor conversion, quantization, augmentation, and model-specific preprocessing (YOLO, MobileNet, etc.).
Provides comprehensive tools for pixel data extraction, tensor manipulation, image augmentation, and model-specific preprocessing with native implementations in Swift (iOS) and Kotlin (Android).
📋 Table of Contents
- Features
- Installation
- Quick Start
- API Reference
- Types Reference
- Error Handling
- Platform Considerations
- Performance Tips
- Contributing
- License
✨ Features
- 🚀 High Performance: Native implementations in Swift (iOS) and Kotlin (Android)
- 🔢 Raw Pixel Data: Extract pixel values as typed arrays (Float32, Int32, Uint8) ready for ML inference
- 🎨 Multiple Color Formats: RGB, RGBA, BGR, BGRA, Grayscale, HSV, HSL, LAB, YUV, YCbCr
- 📐 Flexible Resizing: Cover, contain, stretch, and letterbox strategies
- 🔢 ML-Ready Normalization: ImageNet, TensorFlow, custom presets
- 📊 Multiple Data Layouts: HWC, CHW, NHWC, NCHW (PyTorch/TensorFlow compatible)
- 📦 Batch Processing: Process multiple images with concurrency control
- 🖼️ Multiple Sources: URL, file, base64, assets, photo library
- 🤖 Model Presets: Pre-configured settings for YOLO, MobileNet, EfficientNet, ResNet, ViT, CLIP, SAM, DINO, DETR
- 🔄 Image Augmentation: Rotation, flip, brightness, contrast, saturation, blur
- 🎨 Color Jitter: Granular brightness/contrast/saturation/hue control with range support and seeded randomness
- ✂️ Cutout/Random Erasing: Mask random regions with constant/noise fill for robustness training
- 📈 Image Analysis: Statistics, metadata, validation, blur detection
- 🧮 Tensor Operations: Channel extraction, patch extraction, permutation, batch concatenation
- 🔙 Tensor to Image: Convert processed tensors back to images
- 🎯 Native Quantization: Float→Int8/Uint8/Int16 with per-tensor and per-channel support (TFLite compatible)
- 🏷️ Label Database: Built-in labels for COCO, ImageNet, VOC, CIFAR, Places365, ADE20K
- 📹 Camera Frame Utils: Direct YUV/NV12/BGRA→tensor conversion for vision-camera integration
- 📦 Bounding Box Utilities: Format conversion (xyxy/xywh/cxcywh), scaling, clipping, IoU, NMS
- 🖼️ Letterbox Padding: YOLO-style letterbox preprocessing with reverse coordinate transform
- 🎨 Drawing/Visualization: Draw boxes, keypoints, masks, and heatmaps for debugging
- 🎬 Video Frame Extraction: Extract frames from videos at timestamps, intervals, or evenly-spaced for temporal ML models
- 🔲 Grid/Patch Extraction: Extract image patches in grid patterns for sliding window inference
- 🎲 Random Crop with Seed: Reproducible random crops for data augmentation pipelines
- ✅ Tensor Validation: Validate tensor shapes, dtypes, and value ranges before inference
- 📦 Batch Assembly: Combine multiple images into NCHW/NHWC batch tensors
📦 Installation

```sh
npm install react-native-vision-utils
```

For iOS, run:

```sh
cd ios && pod install
```

🚀 Quick Start
Basic Usage
```ts
import { getPixelData } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'url', value: 'https://example.com/image.jpg' },
});

console.log(result.data);   // Float array of pixel values
console.log(result.width);  // Image width
console.log(result.height); // Image height
console.log(result.shape);  // [height, width, channels]
```

Using Model Presets
```ts
import { getPixelData, MODEL_PRESETS } from 'react-native-vision-utils';

// Use pre-configured YOLO settings
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.yolov8,
});
// Automatically configured: 640x640, letterbox resize, RGB, scale normalization, NCHW layout

// Or MobileNet
const mobileNetResult = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.mobilenet,
});
// Configured: 224x224, cover resize, RGB, ImageNet normalization, NHWC layout
```

Available Model Presets
| Preset | Size | Resize | Normalization | Layout |
| ----------------- | --------- | --------- | ------------- | ------ |
| yolo / yolov8 | 640×640 | letterbox | scale | NCHW |
| mobilenet | 224×224 | cover | ImageNet | NHWC |
| mobilenet_v2 | 224×224 | cover | ImageNet | NHWC |
| mobilenet_v3 | 224×224 | cover | ImageNet | NHWC |
| efficientnet | 224×224 | cover | ImageNet | NHWC |
| resnet | 224×224 | cover | ImageNet | NCHW |
| resnet50 | 224×224 | cover | ImageNet | NCHW |
| vit | 224×224 | cover | ImageNet | NCHW |
| clip | 224×224 | cover | CLIP-specific | NCHW |
| sam | 1024×1024 | contain | ImageNet | NCHW |
| dino | 224×224 | cover | ImageNet | NCHW |
| detr | 800×800 | contain | ImageNet | NCHW |
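Because presets are plain option objects, you can spread one and then override individual fields; later properties win. The sketch below uses a stand-in object with the same option names (`resize`, `normalization`, `dataLayout`) rather than the real `MODEL_PRESETS`, purely to illustrate the composition pattern:

```typescript
// A stand-in for a preset: plain option objects compose by spreading.
const preset = {
  resize: { width: 224, height: 224, strategy: 'cover' },
  normalization: { preset: 'imagenet' },
  dataLayout: 'nhwc',
};

// Later properties win, so this keeps the preset's normalization and
// layout but replaces the 224x224 resize with 256x256.
const options = {
  ...preset,
  resize: { width: 256, height: 256, strategy: 'cover' },
};

console.log(options.resize.width);         // 256
console.log(options.normalization.preset); // 'imagenet'
```

The same pattern applies to any `getPixelData` call that spreads a preset: put the spread first and your overrides after it.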
📖 API Reference
🔧 Core Functions
getPixelData(options)
Extract pixel data from a single image.
```ts
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  resize: { width: 224, height: 224, strategy: 'cover' },
  colorFormat: 'rgb',
  normalization: { preset: 'imagenet' },
  dataLayout: 'nchw',
  outputFormat: 'float32Array',
});
```

Options
| Property | Type | Default | Description |
| --------------- | --------------- | --------------------- | ----------------------------- |
| source | ImageSource | required | Image source specification |
| colorFormat | ColorFormat | 'rgb' | Output color format |
| resize | ResizeOptions | - | Resize options |
| roi | Roi | - | Region of interest to extract |
| normalization | Normalization | { preset: 'scale' } | Normalization settings |
| dataLayout | DataLayout | 'hwc' | Data layout format |
| outputFormat | OutputFormat | 'array' | Output format |
Result
```ts
interface PixelDataResult {
  data: number[] | Float32Array | Uint8Array;
  width: number;
  height: number;
  channels: number;
  colorFormat: ColorFormat;
  dataLayout: DataLayout;
  shape: number[];
  processingTimeMs: number;
}
```

batchGetPixelData(optionsArray, batchOptions)
Process multiple images with concurrency control.
```ts
const results = await batchGetPixelData(
  [
    {
      source: { type: 'url', value: 'https://example.com/1.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'url', value: 'https://example.com/2.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'url', value: 'https://example.com/3.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
  ],
  { concurrency: 4 }
);

console.log(results.totalTimeMs);
results.results.forEach((result, index) => {
  if ('error' in result) {
    console.log(`Image ${index} failed: ${result.message}`);
  } else {
    console.log(`Image ${index}: ${result.width}x${result.height}`);
  }
});
```

🔍 Image Analysis
getImageStatistics(source)
Calculate image statistics for analysis and preprocessing decisions.
```ts
import { getImageStatistics } from 'react-native-vision-utils';

const stats = await getImageStatistics({
  type: 'file',
  value: '/path/to/image.jpg',
});

console.log(stats.mean);      // [r, g, b] mean values (0-1)
console.log(stats.std);       // [r, g, b] standard deviations
console.log(stats.min);       // [r, g, b] minimum values
console.log(stats.max);       // [r, g, b] maximum values
console.log(stats.histogram); // { r: number[], g: number[], b: number[] }
```

getImageMetadata(source)
Get image metadata without loading full pixel data.
```ts
import { getImageMetadata } from 'react-native-vision-utils';

const metadata = await getImageMetadata({
  type: 'file',
  value: '/path/to/image.jpg',
});

console.log(metadata.width);       // Image width
console.log(metadata.height);      // Image height
console.log(metadata.channels);    // Number of channels
console.log(metadata.colorSpace);  // Color space (sRGB, etc.)
console.log(metadata.hasAlpha);    // Has alpha channel
console.log(metadata.aspectRatio); // Width / height ratio
```

validateImage(source, options)
Validate an image against specified criteria.
```ts
import { validateImage } from 'react-native-vision-utils';

const validation = await validateImage(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    minWidth: 224,
    minHeight: 224,
    maxWidth: 4096,
    maxHeight: 4096,
    requiredAspectRatio: 1.0,
    aspectRatioTolerance: 0.1,
  }
);

if (validation.isValid) {
  console.log('Image meets all requirements');
} else {
  console.log('Issues:', validation.issues);
}
```

detectBlur(source, options?)
Detect if an image is blurry using Laplacian variance analysis.
```ts
import { detectBlur } from 'react-native-vision-utils';

const result = await detectBlur(
  { type: 'file', value: '/path/to/photo.jpg' },
  {
    threshold: 100,      // Higher = stricter blur detection
    downsampleSize: 500, // Downsample for faster processing
  }
);

console.log(result.isBlurry);         // true if blurry
console.log(result.score);            // Laplacian variance score
console.log(result.threshold);        // Threshold used
console.log(result.processingTimeMs); // Processing time
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| threshold | number | 100 | Variance threshold - images with score below this are considered blurry |
| downsampleSize | number | 500 | Max dimension for downsampling (improves performance on large images) |
Result:
| Property | Type | Description |
|----------|------|-------------|
| isBlurry | boolean | Whether the image is considered blurry |
| score | number | Laplacian variance score (higher = sharper) |
| threshold | number | The threshold that was used |
| processingTimeMs | number | Processing time in milliseconds |
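The score is the variance of the image's Laplacian response: sharp edges produce large second-derivative values, so low variance means few strong edges, i.e. blur. A minimal grayscale sketch of the metric (illustrative only, not the native implementation):

```typescript
// Variance of the 4-neighbor Laplacian over a grayscale image,
// kernel [[0,1,0],[1,-4,1],[0,1,0]], computed on interior pixels.
function laplacianVariance(gray: number[][], width: number, height: number): number {
  const responses: number[] = [];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const lap =
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] -
        4 * gray[y][x];
      responses.push(lap);
    }
  }
  const mean = responses.reduce((a, b) => a + b, 0) / responses.length;
  return responses.reduce((a, v) => a + (v - mean) ** 2, 0) / responses.length;
}

// A flat image has zero Laplacian variance; a hard edge does not.
const flat = [[5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5], [5, 5, 5, 5]];
const edge = [[0, 0, 255, 255], [0, 0, 255, 255], [0, 0, 255, 255], [0, 0, 255, 255]];
console.log(laplacianVariance(flat, 4, 4));     // 0
console.log(laplacianVariance(edge, 4, 4) > 0); // true
```

This is why `threshold` trades off sensitivity: raising it flags more images as blurry, since more scores fall below it.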
extractVideoFrames(source, options?)
Extract frames from video files for video analysis, action recognition, and temporal ML models.
```ts
import { extractVideoFrames } from 'react-native-vision-utils';

// Extract 10 evenly-spaced frames
const result = await extractVideoFrames(
  { type: 'file', value: '/path/to/video.mp4' },
  {
    count: 10,              // Extract 10 evenly-spaced frames
    resize: { width: 224, height: 224 },
    outputFormat: 'base64', // or 'pixelData' for ML-ready arrays
  }
);

console.log(result.frameCount);    // Number of frames extracted
console.log(result.videoDuration); // Video duration in seconds
console.log(result.videoWidth);    // Video width
console.log(result.videoHeight);   // Video height
console.log(result.frames);        // Array of extracted frames
```

Extraction Modes:
```ts
// Mode 1: Specific timestamps
const byTimestamps = await extractVideoFrames(source, {
  timestamps: [0.5, 1.0, 2.5, 5.0], // Extract at these exact times
});

// Mode 2: Regular intervals
const byInterval = await extractVideoFrames(source, {
  interval: 1.0,  // Extract every 1 second
  startTime: 5.0, // Start at 5 seconds
  endTime: 30.0,  // End at 30 seconds
  maxFrames: 25,  // Limit to 25 frames
});

// Mode 3: Count-based (evenly spaced)
const byCount = await extractVideoFrames(source, {
  count: 16, // Extract exactly 16 evenly-spaced frames
});
```

ML-Ready Output:
```ts
// Get pixel arrays ready for inference
const result = await extractVideoFrames(
  { type: 'file', value: '/path/to/video.mp4' },
  {
    count: 16,
    resize: { width: 224, height: 224 },
    outputFormat: 'pixelData',
    colorFormat: 'rgb',
    normalization: { preset: 'imagenet' },
  }
);

// Each frame has normalized pixel data ready for models
result.frames.forEach((frame) => {
  const tensor = frame.data; // Float32 array
  // Use with your ML model
});
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| timestamps | number[] | - | Specific timestamps in seconds to extract frames |
| interval | number | - | Extract frames at regular intervals (seconds) |
| count | number | - | Number of evenly-spaced frames to extract |
| startTime | number | 0 | Start time in seconds (for interval/count modes) |
| endTime | number | duration | End time in seconds (for interval/count modes) |
| maxFrames | number | 100 | Maximum frames for interval mode |
| resize | object | - | Resize frames to { width, height } |
| outputFormat | string | 'base64' | 'base64' or 'pixelData' for ML arrays |
| quality | number | 90 | JPEG quality for base64 output (0-100) |
| colorFormat | string | 'rgb' | Color format for pixelData ('rgb', 'rgba', 'bgr', 'grayscale') |
| normalization | object | scale | Normalization preset for pixelData ('scale', 'imagenet', 'tensorflow') |
Note: If no extraction mode is specified (no `timestamps`, `interval`, or `count`), a single frame at t=0 is extracted by default.

Platform Note: The `asset` source type is only supported on iOS. Android supports `file` and `url` sources.

PixelData Note: `colorFormat` and `normalization` are applied only when `outputFormat === 'pixelData'`.
Result:
| Property | Type | Description |
|----------|------|-------------|
| frames | ExtractedFrame[] | Array of extracted frames |
| frameCount | number | Number of frames extracted |
| videoDuration | number | Total video duration in seconds |
| videoWidth | number | Video width in pixels |
| videoHeight | number | Video height in pixels |
| frameRate | number | Video frame rate (fps) |
| processingTimeMs | number | Processing time in milliseconds |
ExtractedFrame:
| Property | Type | Description |
|----------|------|-------------|
| timestamp | number | Actual frame timestamp in seconds |
| requestedTimestamp | number | Originally requested timestamp |
| width | number | Frame width |
| height | number | Frame height |
| data | string \| number[] | Base64 string (when outputFormat='base64') or pixel array (when outputFormat='pixelData') |
| channels | number | Number of channels (only present when outputFormat='pixelData') |
| error | string | Error message if frame extraction failed (frame may still be present with partial data) |
🎨 Image Augmentation
applyAugmentations(source, augmentations)
Apply image augmentations for data augmentation pipelines.
```ts
import { applyAugmentations } from 'react-native-vision-utils';

const augmented = await applyAugmentations(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    rotation: 15,    // Degrees
    horizontalFlip: true,
    verticalFlip: false,
    brightness: 1.2, // 1.0 = no change
    contrast: 1.1,   // 1.0 = no change
    saturation: 0.9, // 1.0 = no change
    blur: { type: 'gaussian', radius: 2 }, // Blur options
  }
);

console.log(augmented.base64); // Base64 encoded result
console.log(augmented.width);
console.log(augmented.height);
```

colorJitter(source, options)
Apply color jitter augmentation with granular control over brightness, contrast, saturation, and hue. More flexible than applyAugmentations: each property accepts a range with random sampling, and an optional seed makes results reproducible.
```ts
import { colorJitter } from 'react-native-vision-utils';

// Basic usage with symmetric ranges
const result = await colorJitter(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    brightness: 0.2, // Random in [-0.2, +0.2]
    contrast: 0.2,   // Random in [0.8, 1.2]
    saturation: 0.3, // Random in [0.7, 1.3]
    hue: 0.1,        // Random in [-0.1, +0.1] (fraction of 360°)
  }
);

console.log(result.base64);            // Augmented image
console.log(result.appliedBrightness); // Actual value applied
console.log(result.appliedContrast);
console.log(result.appliedSaturation);
console.log(result.appliedHue);

// Asymmetric ranges with seed for reproducibility
const reproducible = await colorJitter(
  { type: 'url', value: 'https://example.com/image.jpg' },
  {
    brightness: [-0.1, 0.3], // More brightening than darkening
    contrast: [0.8, 1.5],    // Allow up to 50% more contrast
    seed: 42,                // Same seed = same random values
  }
);
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| brightness | number \| [min, max] | 0 | Additive brightness adjustment. Single value creates symmetric range [-v, +v] |
| contrast | number \| [min, max] | 1 | Contrast multiplier. Single value creates range [max(0, 1-v), 1+v] |
| saturation | number \| [min, max] | 1 | Saturation multiplier. Single value creates range [max(0, 1-v), 1+v] |
| hue | number \| [min, max] | 0 | Hue shift as fraction of color wheel (0-1 = 0-360°). Single value creates symmetric range [-v, +v] |
| seed | number | random | Seed for reproducible random sampling |
Result:
| Property | Type | Description |
|----------|------|-------------|
| base64 | string | Augmented image as base64 PNG |
| width | number | Output image width |
| height | number | Output image height |
| appliedBrightness | number | Actual brightness value applied |
| appliedContrast | number | Actual contrast value applied |
| appliedSaturation | number | Actual saturation value applied |
| appliedHue | number | Actual hue shift value applied |
| seed | number | Seed used for random generation |
| processingTimeMs | number | Processing time in milliseconds |
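The single-value shorthands in the options table expand to ranges as documented above. A sketch of that mapping (illustrative only; the library's internal sampling code is not shown here):

```typescript
// Expand a colorJitter shorthand into an explicit [min, max] range.
// 'additive' covers brightness/hue (symmetric range [-v, +v]);
// 'multiplicative' covers contrast/saturation (range [max(0, 1-v), 1+v]).
function toRange(
  value: number | [number, number],
  kind: 'additive' | 'multiplicative'
): [number, number] {
  if (Array.isArray(value)) return value; // Already an explicit range
  return kind === 'additive'
    ? [-value, value]
    : [Math.max(0, 1 - value), 1 + value];
}

console.log(toRange(0.2, 'additive'));              // [-0.2, 0.2]
console.log(toRange(0.2, 'multiplicative'));        // [0.8, 1.2]
console.log(toRange([0.8, 1.5], 'multiplicative')); // [0.8, 1.5]
```

The `max(0, …)` clamp keeps multiplicative factors non-negative, which is why large single values for contrast/saturation bottom out at 0 rather than going negative.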
cutout(source, options)
Apply cutout (random erasing) augmentation to mask random rectangular regions. Improves model robustness by forcing it to rely on diverse features.
```ts
import { cutout } from 'react-native-vision-utils';

// Basic cutout with black fill
const result = await cutout(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    numCutouts: 1,
    minSize: 0.02, // 2% of image area min
    maxSize: 0.33, // 33% of image area max
  }
);

console.log(result.base64);     // Augmented image
console.log(result.numCutouts); // Number of cutouts applied
console.log(result.regions);    // Details of each cutout

// Multiple cutouts with noise fill
const noisy = await cutout(
  { type: 'url', value: 'https://example.com/image.jpg' },
  {
    numCutouts: 3,
    minSize: 0.01,
    maxSize: 0.1,
    fillMode: 'noise',
    seed: 42, // Reproducible
  }
);

// Cutout with custom gray fill and probability
const probabilistic = await cutout(
  { type: 'base64', value: base64Data },
  {
    numCutouts: 2,
    fillMode: 'constant',
    fillValue: [128, 128, 128], // Gray
    probability: 0.5,           // 50% chance of applying
  }
);
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| numCutouts | number | 1 | Number of cutout regions to apply |
| minSize | number | 0.02 | Minimum cutout size as fraction of image area (0-1) |
| maxSize | number | 0.33 | Maximum cutout size as fraction of image area (0-1) |
| minAspect | number | 0.3 | Minimum aspect ratio (width/height) |
| maxAspect | number | 3.3 | Maximum aspect ratio (width/height) |
| fillMode | string | 'constant' | Fill mode: 'constant', 'noise', or 'random' |
| fillValue | [R,G,B] | [0,0,0] | Fill color for 'constant' mode (0-255) |
| probability | number | 1.0 | Probability of applying cutout (0-1) |
| seed | number | random | Seed for reproducible cutouts |
Result:
| Property | Type | Description |
|----------|------|-------------|
| base64 | string | Augmented image as base64 PNG |
| width | number | Output image width |
| height | number | Output image height |
| applied | boolean | Whether cutout was applied (based on probability) |
| numCutouts | number | Number of cutout regions applied |
| regions | array | Details of each cutout: {x, y, width, height, fill} |
| seed | number | Seed used for random generation |
| processingTimeMs | number | Processing time in milliseconds |
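The size and aspect options interact the way standard random-erasing implementations do: a target area is sampled as a fraction of the image, an aspect ratio is sampled, and the rectangle's sides follow from those two numbers. A sketch of that geometry (illustrative; the native sampler may round or clamp differently):

```typescript
// Derive a cutout rectangle's dimensions from an area fraction and aspect ratio.
function cutoutDims(
  imageWidth: number,
  imageHeight: number,
  areaFraction: number, // e.g. sampled from [minSize, maxSize]
  aspect: number        // width / height, e.g. sampled from [minAspect, maxAspect]
): { width: number; height: number } {
  const targetArea = areaFraction * imageWidth * imageHeight;
  // width / height = aspect and width * height = targetArea, so:
  const width = Math.round(Math.sqrt(targetArea * aspect));
  const height = Math.round(Math.sqrt(targetArea / aspect));
  return { width, height };
}

// 10% of a 640x480 image at square aspect: sqrt(30720) ≈ 175 px per side.
console.log(cutoutDims(640, 480, 0.1, 1.0)); // { width: 175, height: 175 }
```

Wider aspect values stretch the same area horizontally, which is why `minAspect`/`maxAspect` control the shape without changing how much of the image is erased.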
✂️ Multi-Crop Operations
fiveCrop(source, options, pixelOptions)
Extract 5 crops: 4 corners + center. Useful for test-time augmentation.
```ts
import { fiveCrop, MODEL_PRESETS } from 'react-native-vision-utils';

const crops = await fiveCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  { width: 224, height: 224 },
  MODEL_PRESETS.mobilenet
);

console.log(crops.cropCount); // 5
crops.results.forEach((result, i) => {
  console.log(`Crop ${i}: ${result.width}x${result.height}`);
});
```

tenCrop(source, options, pixelOptions)
Extract 10 crops: 5 crops + their horizontal flips.
```ts
import { tenCrop, MODEL_PRESETS } from 'react-native-vision-utils';

const crops = await tenCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  { width: 224, height: 224 },
  MODEL_PRESETS.mobilenet
);

console.log(crops.cropCount); // 10
```

extractGrid(source, gridOptions, pixelOptions?)
Extract image patches in a grid pattern. Useful for sliding window detection, image tiling, and patch-based models.
```ts
import { extractGrid } from 'react-native-vision-utils';

// Extract 3x3 grid of patches
const result = await extractGrid(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    columns: 3,            // Number of columns
    rows: 3,               // Number of rows
    overlap: 0,            // Overlap in pixels (optional)
    overlapPercent: 0.1,   // Or overlap as percentage (optional)
    includePartial: false, // Include edge patches smaller than full size
  },
  {
    colorFormat: 'rgb',
    normalization: { preset: 'scale' },
  }
);

console.log(result.patchCount);  // 9
console.log(result.patchWidth);  // Width of each patch
console.log(result.patchHeight); // Height of each patch
result.patches.forEach((patch) => {
  console.log(
    `Patch [${patch.row},${patch.column}] at (${patch.x},${patch.y}): ${patch.width}x${patch.height}`
  );
  console.log(patch.data); // Pixel data for this patch
});
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| columns | number | required | Number of columns in the grid |
| rows | number | required | Number of rows in the grid |
| overlap | number | 0 | Overlap between patches in pixels |
| overlapPercent | number | - | Overlap as percentage (0-1), alternative to overlap |
| includePartial | boolean | false | Include edge patches that are smaller than full size |
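One plausible way to derive patch geometry from these options, assuming overlap shrinks the stride between neighboring patch origins (a sketch only — the native tiling may round edge patches differently):

```typescript
// Compute patch origins along one axis for a grid with pixel overlap.
function gridOrigins(
  imageSize: number, // width (for columns) or height (for rows)
  cells: number,     // columns or rows
  overlap: number    // overlap between neighboring patches, in pixels
): { patchSize: number; origins: number[] } {
  // Each patch covers 1/cells of the image plus its share of the overlap.
  const patchSize = Math.floor((imageSize + (cells - 1) * overlap) / cells);
  const stride = patchSize - overlap; // adjacent origins are one stride apart
  const origins = Array.from({ length: cells }, (_, i) => i * stride);
  return { patchSize, origins };
}

// 300 px wide, 3 columns, no overlap: 100 px patches at x = 0, 100, 200.
console.log(gridOrigins(300, 3, 0)); // { patchSize: 100, origins: [ 0, 100, 200 ] }
```

With a non-zero overlap the patches grow and the stride shrinks, so neighboring patches share a band of pixels — the situation sliding-window inference relies on to avoid cutting objects at tile borders.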
randomCrop(source, cropOptions, pixelOptions?)
Extract random crops from an image with optional seed for reproducibility. Essential for data augmentation in training pipelines.
```ts
import { randomCrop } from 'react-native-vision-utils';

// Extract 5 random 64x64 crops with a fixed seed
const result = await randomCrop(
  { type: 'file', value: '/path/to/image.jpg' },
  {
    width: 64,
    height: 64,
    count: 5,
    seed: 42, // For reproducibility (optional)
  },
  {
    colorFormat: 'rgb',
    normalization: { preset: 'imagenet' },
  }
);

console.log(result.cropCount); // 5
result.crops.forEach((crop, i) => {
  console.log(`Crop ${i}: position (${crop.x},${crop.y}), size ${crop.width}x${crop.height}`);
  console.log(`Seed: ${crop.seed}`); // Seed used for this specific crop
  console.log(crop.data);            // Pixel data for this crop
});
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| width | number | required | Width of each crop |
| height | number | required | Height of each crop |
| count | number | 1 | Number of random crops to extract |
| seed | number | random | Seed for reproducible random positions |
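Seeded reproducibility means crop origins come from a deterministic PRNG: the same seed always yields the same (x, y) sequence. A sketch using a simple LCG (the library's actual PRNG is unspecified; this only illustrates the idea):

```typescript
// Deterministic crop origins from a seed, using a small LCG
// (Numerical Recipes constants). Same seed -> same positions.
function seededCropOrigins(
  imageW: number, imageH: number,
  cropW: number, cropH: number,
  count: number, seed: number
): Array<{ x: number; y: number }> {
  let state = seed >>> 0;
  const next = () => {
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state / 4294967296; // uniform in [0, 1)
  };
  return Array.from({ length: count }, () => ({
    // Origins stay in range so each crop fits inside the image.
    x: Math.floor(next() * (imageW - cropW + 1)),
    y: Math.floor(next() * (imageH - cropH + 1)),
  }));
}

const a = seededCropOrigins(640, 480, 64, 64, 3, 42);
const b = seededCropOrigins(640, 480, 64, 64, 3, 42);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true: same seed, same crops
```

This is the property that makes augmentation pipelines debuggable: re-running a training batch with the same seed reproduces the exact crops.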
validateTensor(data, shape, spec?)
Validate tensor data against a specification. Pure TypeScript function for checking tensor shapes, dtypes, and value ranges before inference.
```ts
import { validateTensor } from 'react-native-vision-utils';

const tensorData = new Float32Array(1 * 3 * 224 * 224);
// ... fill with data

const validation = validateTensor(
  tensorData,
  [1, 3, 224, 224], // Actual shape of your data
  {
    shape: [1, 3, 224, 224], // Expected shape (-1 as wildcard)
    dtype: 'float32',
    minValue: 0,
    maxValue: 1,
  }
);

if (validation.isValid) {
  console.log('Tensor is valid for inference');
} else {
  console.log('Validation issues:', validation.issues);
}

// Also provides statistics
console.log(validation.actualMin);
console.log(validation.actualMax);
console.log(validation.actualMean);
console.log(validation.actualShape);
```

TensorSpec Options:
| Parameter | Type | Description |
|-----------|------|-------------|
| shape | number[] | Expected shape (use -1 as wildcard for any dimension) |
| dtype | string | Expected dtype ('float32', 'int32', 'uint8', etc.) |
| minValue | number | Minimum allowed value (optional) |
| maxValue | number | Maximum allowed value (optional) |
| channels | number | Expected number of channels (optional) |
Result:
| Property | Type | Description |
|----------|------|-------------|
| isValid | boolean | Whether tensor passes all validations |
| issues | string[] | List of validation issues found |
| actualShape | number[] | Actual shape of the tensor |
| actualMin | number | Minimum value in tensor |
| actualMax | number | Maximum value in tensor |
| actualMean | number | Mean value of tensor |
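The -1 wildcard in a shape spec matches any size in that position. The matching rule is simple enough to sketch directly (illustrative; validateTensor itself also checks dtype and value ranges):

```typescript
// Does an actual shape satisfy an expected shape where -1 matches any size?
function shapeMatches(actual: number[], expected: number[]): boolean {
  if (actual.length !== expected.length) return false; // rank must agree
  return expected.every((dim, i) => dim === -1 || dim === actual[i]);
}

console.log(shapeMatches([1, 3, 224, 224], [-1, 3, 224, 224])); // true: batch wildcard
console.log(shapeMatches([1, 3, 224, 224], [1, 1, 224, 224]));  // false: channel mismatch
console.log(shapeMatches([3, 224, 224], [1, 3, 224, 224]));     // false: rank mismatch
```

Using -1 for the batch dimension is the common pattern: the same spec then validates both single images and batches of any size.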
assembleBatch(pixelResults, options?)
Assemble multiple PixelDataResults into a single batch tensor. Pure TypeScript function for preparing batched inference.
```ts
import { getPixelData, assembleBatch, MODEL_PRESETS } from 'react-native-vision-utils';

// Process multiple images
const images = ['/path/to/1.jpg', '/path/to/2.jpg', '/path/to/3.jpg'];
const results = await Promise.all(
  images.map((path) =>
    getPixelData({
      source: { type: 'file', value: path },
      ...MODEL_PRESETS.mobilenet,
    })
  )
);

// Assemble into batch
const batch = assembleBatch(results, {
  layout: 'nchw', // or 'nhwc'
  padToSize: 4,   // Pad batch to size 4 (optional)
});

console.log(batch.shape);     // [3, 3, 224, 224] for NCHW
console.log(batch.batchSize); // 3 (or 4 if padded)
console.log(batch.data);      // Combined Float32Array
```

Options:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| layout | 'nchw' \| 'nhwc' | 'nchw' | Output batch layout |
| padToSize | number | - | Pad batch to this size with zeros (optional) |
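Batch assembly is essentially concatenation into one contiguous buffer, with optional zero-padding up to a fixed batch size. A simplified sketch assuming every image is already a same-shaped Float32Array (assembleBatch additionally handles layout conversion):

```typescript
// Concatenate same-sized image tensors into one batch buffer,
// zero-padding up to padToSize slots if requested.
function packBatch(
  images: Float32Array[],
  padToSize?: number
): { data: Float32Array; batchSize: number } {
  const perImage = images[0].length;
  const slots = Math.max(images.length, padToSize ?? 0);
  const data = new Float32Array(slots * perImage); // zero-initialized
  images.forEach((img, i) => data.set(img, i * perImage));
  return { data, batchSize: slots };
}

// Three 3x224x224 images padded to a batch of 4; the 4th slot stays zeros.
const imgs = [1, 2, 3].map(() => new Float32Array(3 * 224 * 224).fill(1));
const batch = packBatch(imgs, 4);
console.log(batch.batchSize);   // 4
console.log(batch.data.length); // 602112 = 4 * 3 * 224 * 224
```

Fixed batch sizes matter for runtimes with static input shapes (e.g. many TFLite models), where the zero-padded slots are simply ignored when reading outputs.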
🧮 Tensor Operations
extractChannel(data, width, height, channels, channelIndex, dataLayout)
Extract a single channel from pixel data.
```ts
import { getPixelData, extractChannel } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  colorFormat: 'rgb',
});

// Extract red channel
const redChannel = await extractChannel(
  result.data,
  result.width,
  result.height,
  result.channels,
  0, // channel index (0=R, 1=G, 2=B)
  result.dataLayout
);
```

extractPatch(data, width, height, channels, patchOptions, dataLayout)
Extract a rectangular patch from pixel data.
```ts
import { getPixelData, extractPatch } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
});

const patch = await extractPatch(
  result.data,
  result.width,
  result.height,
  result.channels,
  { x: 100, y: 100, width: 64, height: 64 },
  result.dataLayout
);
```

concatenateToBatch(results)
Combine multiple results into a batch tensor.
```ts
import {
  batchGetPixelData,
  concatenateToBatch,
  MODEL_PRESETS,
} from 'react-native-vision-utils';

const batch = await batchGetPixelData(
  [
    {
      source: { type: 'file', value: '/path/to/1.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
    {
      source: { type: 'file', value: '/path/to/2.jpg' },
      ...MODEL_PRESETS.mobilenet,
    },
  ],
  { concurrency: 2 }
);

const batchTensor = await concatenateToBatch(batch.results);
console.log(batchTensor.shape);     // [2, 3, 224, 224]
console.log(batchTensor.batchSize); // 2
```

permute(data, shape, order)
Transpose/permute tensor dimensions.
```ts
import { getPixelData, permute } from 'react-native-vision-utils';

const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  dataLayout: 'hwc', // [H, W, C]
});

// Convert HWC to CHW
const permuted = await permute(
  result.data,
  result.shape,
  [2, 0, 1] // new order: C, H, W
);

console.log(permuted.shape); // [channels, height, width]
```

🖼️ Tensor to Image
tensorToImage(data, width, height, options)
Convert tensor data back to an image.
```ts
import {
  getPixelData,
  tensorToImage,
  MODEL_PRESETS,
} from 'react-native-vision-utils';

// Process image
const result = await getPixelData({
  source: { type: 'file', value: '/path/to/image.jpg' },
  ...MODEL_PRESETS.mobilenet,
});

// Convert back to image (with denormalization)
const image = await tensorToImage(result.data, result.width, result.height, {
  channels: 3,
  dataLayout: 'chw',
  format: 'png',
  denormalize: true,
  mean: [0.485, 0.456, 0.406],
  std: [0.229, 0.224, 0.225],
});

console.log(image.base64); // Base64 encoded PNG
```

💾 Cache Management
clearCache()
Clear the internal image cache.
```ts
import { clearCache } from 'react-native-vision-utils';

await clearCache();
```

getCacheStats()
Get cache statistics.
```ts
import { getCacheStats } from 'react-native-vision-utils';

const stats = await getCacheStats();
console.log(stats.hitCount);
console.log(stats.missCount);
console.log(stats.size);
console.log(stats.maxSize);
```

🏷️ Label Database
Built-in label databases for common ML classification and detection models. No external files needed.
getLabel(index, options)
Get a label by its class index.
```ts
import { getLabel } from 'react-native-vision-utils';

// Simple lookup
const label = await getLabel(0); // Returns 'person' (COCO default)

// With metadata
const labelInfo = await getLabel(0, {
  dataset: 'coco',
  includeMetadata: true,
});

console.log(labelInfo.name);          // 'person'
console.log(labelInfo.displayName);   // 'Person'
console.log(labelInfo.supercategory); // 'person'
```

getTopLabels(scores, options)
Get top-K labels from prediction scores (e.g., from softmax output).
```ts
import { getTopLabels } from 'react-native-vision-utils';

// After running inference, get your output scores
const scores = modelOutput; // Array of 80 confidence scores for COCO

const topLabels = await getTopLabels(scores, {
  dataset: 'coco',
  k: 5,
  minConfidence: 0.1,
  includeMetadata: true,
});

topLabels.forEach((result) => {
  console.log(`${result.label}: ${(result.confidence * 100).toFixed(1)}%`);
});
// Output:
// person: 92.3%
// car: 45.2%
// dog: 23.1%
```

getAllLabels(dataset)
Get all labels for a dataset.
```ts
import { getAllLabels } from 'react-native-vision-utils';

const cocoLabels = await getAllLabels('coco');
console.log(cocoLabels.length); // 80
console.log(cocoLabels[0]);     // 'person'
```

getDatasetInfo(dataset)
Get information about a dataset.
```ts
import { getDatasetInfo } from 'react-native-vision-utils';

const info = await getDatasetInfo('imagenet');
console.log(info.name);        // 'imagenet'
console.log(info.numClasses);  // 1000
console.log(info.description); // 'ImageNet ILSVRC 2012 classification labels'
```

getAvailableDatasets()
List all available label datasets.
```ts
import { getAvailableDatasets } from 'react-native-vision-utils';

const datasets = await getAvailableDatasets();
// ['coco', 'coco91', 'imagenet', 'voc', 'cifar10', 'cifar100', 'places365', 'ade20k']
```

Available Datasets
| Dataset | Classes | Description |
| ------------- | ------- | ------------------------------------ |
| coco | 80 | COCO 2017 object detection labels |
| coco91 | 91 | COCO original labels with background |
| imagenet | 1000 | ImageNet ILSVRC 2012 classification |
| imagenet21k | 21841 | ImageNet-21K full classification |
| voc | 21 | PASCAL VOC with background |
| cifar10 | 10 | CIFAR-10 classification |
| cifar100 | 100 | CIFAR-100 classification |
| places365 | 365 | Places365 scene recognition |
| ade20k | 150 | ADE20K semantic segmentation |
📹 Camera Frame Utilities
High-performance utilities for processing camera frames directly, optimized for integration with react-native-vision-camera.
processCameraFrame(source, options)
Convert camera frame buffer to ML-ready tensor.
```ts
import { processCameraFrame } from 'react-native-vision-utils';

// In a vision-camera frame processor
const result = await processCameraFrame(
  {
    width: 1920,
    height: 1080,
    pixelFormat: 'yuv420',       // Camera format
    bytesPerRow: 1920,
    dataBase64: frameDataBase64, // Base64-encoded frame data
    orientation: 'right',        // Device orientation
    timestamp: Date.now(),
  },
  {
    outputWidth: 224, // Resize for model
    outputHeight: 224,
    normalize: true,
    outputFormat: 'rgb',
    mean: [0.485, 0.456, 0.406], // ImageNet normalization
    std: [0.229, 0.224, 0.225],
  }
);

console.log(result.tensor);           // Normalized float array
console.log(result.shape);            // [224, 224, 3]
console.log(result.processingTimeMs); // Processing time
```

convertYUVToRGB(options)
Direct YUV to RGB conversion for camera frames.
```ts
import { convertYUVToRGB } from 'react-native-vision-utils';

const rgbData = await convertYUVToRGB({
  width: 640,
  height: 480,
  pixelFormat: 'yuv420',
  yPlaneBase64: yPlaneData,
  uPlaneBase64: uPlaneData,
  vPlaneBase64: vPlaneData,
  outputFormat: 'rgb', // or 'base64'
});

console.log(rgbData.data);     // RGB pixel values
console.log(rgbData.channels); // 3
```

Supported Pixel Formats
| Format | Description |
| -------- | ------------------------------- |
| yuv420 | YUV 4:2:0 planar |
| yuv422 | YUV 4:2:2 planar |
| nv12 | NV12 (Y plane + interleaved UV) |
| nv21 | NV21 (Y plane + interleaved VU) |
| bgra | BGRA 8-bit |
| rgba | RGBA 8-bit |
| rgb | RGB 8-bit |
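The YUV-family formats all reduce to the same per-pixel conversion once the planes are unpacked. For reference, the common full-range BT.601 equations sketched in TypeScript (shown purely to illustrate what the conversion computes; the native implementation may use fixed-point variants):

```typescript
// BT.601 YUV -> RGB for one pixel (full-range, U/V centered at 128).
function yuvToRgb(y: number, u: number, v: number): [number, number, number] {
  const clamp = (x: number) => Math.min(255, Math.max(0, Math.round(x)));
  const r = clamp(y + 1.402 * (v - 128));
  const g = clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128));
  const b = clamp(y + 1.772 * (u - 128));
  return [r, g, b];
}

console.log(yuvToRgb(128, 128, 128)); // [128, 128, 128] — neutral gray
console.log(yuvToRgb(255, 128, 128)); // [255, 255, 255] — white
console.log(yuvToRgb(0, 128, 128));   // [0, 0, 0] — black
```

The planar/semi-planar formats in the table differ only in how U and V are stored (separate planes vs. interleaved UV/VU), not in this math.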
Frame Orientations
| Orientation | Description |
| ----------- | --------------------- |
| up | No rotation (default) |
| down | 180° rotation |
| left | 90° counter-clockwise |
| right | 90° clockwise |
📊 Quantization (for TFLite and Other Quantized Models)
Native, high-performance quantization for deploying models in quantized formats such as TFLite int8.
quantize(data, options)
Quantize float data to int8/uint8/int16 format.
import {
getPixelData,
quantize,
MODEL_PRESETS,
} from 'react-native-vision-utils';
// Get float pixel data
const result = await getPixelData({
source: { type: 'file', value: '/path/to/image.jpg' },
...MODEL_PRESETS.mobilenet,
});
// Per-tensor quantization (single scale/zeroPoint)
const quantized = await quantize(result.data, {
mode: 'per-tensor',
dtype: 'uint8',
scale: 0.0078125, // From TFLite model
zeroPoint: 128, // From TFLite model
});
console.log(quantized.data); // Uint8Array
console.log(quantized.scale); // 0.0078125
console.log(quantized.zeroPoint); // 128
console.log(quantized.dtype); // 'uint8'
console.log(quantized.mode); // 'per-tensor'
Per-Channel Quantization
For models with per-channel quantization (common in TFLite):
// Per-channel quantization (different scale/zeroPoint per channel)
const quantized = await quantize(result.data, {
mode: 'per-channel',
dtype: 'int8',
scale: [0.0123, 0.0156, 0.0189], // Per-channel scales
zeroPoint: [0, 0, 0], // Per-channel zero points
channels: 3,
dataLayout: 'chw', // Specify layout for per-channel
});
Automatic Parameter Calculation
Calculate optimal quantization parameters from your data:
import {
calculateQuantizationParams,
quantize,
} from 'react-native-vision-utils';
// Calculate optimal params for your data range
const params = await calculateQuantizationParams(result.data, {
mode: 'per-tensor',
dtype: 'uint8',
});
console.log(params.scale); // Calculated scale
console.log(params.zeroPoint); // Calculated zero point
console.log(params.min); // Data min value
console.log(params.max); // Data max value
// Use calculated params for quantization
const quantized = await quantize(result.data, {
mode: 'per-tensor',
dtype: 'uint8',
scale: params.scale as number,
zeroPoint: params.zeroPoint as number,
});
dequantize(data, options)
Convert quantized data back to float32.
import { dequantize } from 'react-native-vision-utils';
// Dequantize int8 data back to float
const dequantized = await dequantize(quantized.data, {
mode: 'per-tensor',
dtype: 'int8',
scale: 0.0078125,
zeroPoint: 0,
});
console.log(dequantized.data); // Float32Array
Quantization Options
| Property | Type | Required | Description |
| ------------ | ------------------------------- | ----------- | --------------------------------------------- |
| mode | 'per-tensor' \| 'per-channel' | Yes | Quantization mode |
| dtype | 'int8' \| 'uint8' \| 'int16' | Yes | Output data type |
| scale | number \| number[] | Yes | Scale factor(s) - array for per-channel |
| zeroPoint | number \| number[] | Yes | Zero point(s) - array for per-channel |
| channels | number | Per-channel | Number of channels (required for per-channel) |
| dataLayout | 'hwc' \| 'chw' | Per-channel | Data layout (default: 'hwc') |
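The affine scheme behind these options can be sketched in plain TypeScript, including the asymmetric parameter calculation that calculateQuantizationParams performs natively. This is a sketch of the documented formulas, not the library's vectorized native code:

```typescript
// Per-tensor affine quantization following the documented formulas:
//   quantize:   q = round(value / scale + zeroPoint), clamped to the dtype range
//   dequantize: value = (q - zeroPoint) * scale
function quantizeUint8(data: number[], scale: number, zeroPoint: number): Uint8Array {
  const out = new Uint8Array(data.length);
  for (let i = 0; i < data.length; i++) {
    const q = Math.round(data[i] / scale + zeroPoint);
    out[i] = Math.min(255, Math.max(0, q)); // clamp to uint8 range
  }
  return out;
}

function dequantizeUint8(q: Uint8Array, scale: number, zeroPoint: number): Float32Array {
  return Float32Array.from(q, (v) => (v - zeroPoint) * scale);
}

// Asymmetric uint8 parameters from a data range (the usual TFLite-style
// calculation; helper name is illustrative):
function uint8Params(min: number, max: number) {
  const scale = (max - min) / 255;
  const zeroPoint = Math.min(255, Math.max(0, Math.round(-min / scale)));
  return { scale, zeroPoint };
}
```

Note that quantize followed by dequantize is lossy in general; values on the grid (like 0.5 with scale 2^-7) round-trip exactly.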
Quantization Formulas
Quantize: q = round(value / scale + zeroPoint)
Dequantize: value = (q - zeroPoint) * scale
📦 Bounding Box Utilities
High-performance utilities for working with object detection outputs.
convertBoxFormat(boxes, options)
Convert between bounding box formats (xyxy, xywh, cxcywh).
import { convertBoxFormat } from 'react-native-vision-utils';
// Convert YOLO format (center x, center y, width, height) to corners
const result = await convertBoxFormat(
[[320, 240, 100, 80]], // [cx, cy, w, h]
{ fromFormat: 'cxcywh', toFormat: 'xyxy' }
);
console.log(result.boxes); // [[270, 200, 370, 280]] - [x1, y1, x2, y2]
Box Formats
| Format | Description | Values |
| -------- | -------------------------------- | ----------------------- |
| xyxy | Two corners | [x1, y1, x2, y2] |
| xywh | Top-left corner + dimensions | [x, y, width, height] |
| cxcywh | Center + dimensions (YOLO style) | [cx, cy, width, height] |
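The cxcywh to xyxy conversion is simple arithmetic; a single-box TypeScript sketch for intuition (the library performs this natively over batches):

```typescript
// Convert one YOLO-style box [cx, cy, w, h] to corner form [x1, y1, x2, y2].
function cxcywhToXyxy([cx, cy, w, h]: number[]): number[] {
  return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2];
}

const corners = cxcywhToXyxy([320, 240, 100, 80]); // [270, 200, 370, 280]
```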
scaleBoxes(boxes, options)
Scale bounding boxes between different image dimensions.
import { scaleBoxes } from 'react-native-vision-utils';
// Scale boxes from 640x640 model input back to 1920x1080 original
const result = await scaleBoxes(
[[100, 100, 200, 200]], // detections in 640x640 space
{
fromWidth: 640,
fromHeight: 640,
toWidth: 1920,
toHeight: 1080,
format: 'xyxy',
}
);
// Boxes are now in original image coordinates
clipBoxes(boxes, options)
Clip bounding boxes to image boundaries.
import { clipBoxes } from 'react-native-vision-utils';
const result = await clipBoxes(
[[-10, 50, 700, 500]], // box extends beyond image
{ width: 640, height: 480, format: 'xyxy' }
);
console.log(result.boxes); // [[0, 50, 640, 480]]
calculateIoU(box1, box2, format)
Calculate Intersection over Union between two boxes.
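The IoU math for xyxy boxes is compact enough to sketch directly in TypeScript (illustrative only):

```typescript
// Intersection over Union for two [x1, y1, x2, y2] boxes.
function iou(a: number[], b: number[]): number {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

iou([100, 100, 200, 200], [150, 150, 250, 250]); // ~0.1429
```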
import { calculateIoU } from 'react-native-vision-utils';
const result = await calculateIoU(
[100, 100, 200, 200],
[150, 150, 250, 250],
'xyxy'
);
console.log(result.iou); // ~0.142 (14% overlap)
console.log(result.intersection); // Area of overlap
console.log(result.union); // Total covered area
nonMaxSuppression(detections, options)
Apply Non-Maximum Suppression to filter overlapping detections.
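Greedy NMS is a short algorithm: sort by score, keep the best, drop anything of the same class that overlaps it above the IoU threshold, repeat. A self-contained TypeScript sketch (illustrative; the native implementation is faster and may differ in per-class behavior or tie-breaking):

```typescript
interface Detection { box: number[]; score: number; classIndex: number; }

// Greedy, per-class NMS over [x1, y1, x2, y2] boxes.
function nms(dets: Detection[], iouThreshold: number, scoreThreshold: number): Detection[] {
  const iou = (a: number[], b: number[]): number => {
    const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
    const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
    const inter = ix * iy;
    const areaA = (a[2] - a[0]) * (a[3] - a[1]);
    const areaB = (b[2] - b[0]) * (b[3] - b[1]);
    return inter / (areaA + areaB - inter);
  };
  const sorted = dets
    .filter((d) => d.score >= scoreThreshold)
    .sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    const suppressed = kept.some(
      (k) => k.classIndex === d.classIndex && iou(k.box, d.box) > iouThreshold
    );
    if (!suppressed) kept.push(d);
  }
  return kept;
}
```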
import { nonMaxSuppression } from 'react-native-vision-utils';
const detections = [
{ box: [100, 100, 200, 200], score: 0.9, classIndex: 0 },
{ box: [110, 110, 210, 210], score: 0.8, classIndex: 0 }, // overlaps
{ box: [300, 300, 400, 400], score: 0.7, classIndex: 1 },
];
const result = await nonMaxSuppression(detections, {
iouThreshold: 0.5,
scoreThreshold: 0.3,
maxDetections: 100,
});
// Keeps first and third detection; second is suppressed due to overlap
📐 Letterbox Padding
YOLO-style letterbox preprocessing for preserving aspect ratio.
letterbox(source, options)
Apply letterbox padding to an image.
import { letterbox, getPixelData } from 'react-native-vision-utils';
// Letterbox a 1920x1080 image to 640x640 for YOLO
const result = await letterbox(
{ type: 'file', value: '/path/to/image.jpg' },
{
targetWidth: 640,
targetHeight: 640,
fillColor: [114, 114, 114], // YOLO gray
}
);
console.log(result.imageBase64); // Letterboxed image
console.log(result.letterboxInfo); // Transform info for reverse mapping
// letterboxInfo: { scale, offset, originalSize, letterboxedSize }
// Get pixel data from letterboxed image
const pixels = await getPixelData({
source: { type: 'base64', value: result.imageBase64 },
colorFormat: 'rgb',
normalization: { preset: 'scale' },
dataLayout: 'nchw',
});
reverseLetterbox(boxes, options)
Transform detection boxes back to original image coordinates.
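A letterbox transform is fully described by a scale and an (x, y) offset, so reversing a box just undoes the offset and then the scale. A plain TypeScript sketch (illustrative; the field names mirror letterboxInfo but are assumptions here):

```typescript
// Letterbox math: scale = min(targetW / w, targetH / h); the scaled image is
// centered, leaving symmetric padding on the short side.
function computeLetterbox(w: number, h: number, targetW: number, targetH: number) {
  const scale = Math.min(targetW / w, targetH / h);
  return {
    scale,
    offsetX: (targetW - w * scale) / 2,
    offsetY: (targetH - h * scale) / 2,
  };
}

// Map an [x1, y1, x2, y2] box from letterboxed space back to the original image.
function reverseBox(
  [x1, y1, x2, y2]: number[],
  lb: { scale: number; offsetX: number; offsetY: number }
): number[] {
  return [
    (x1 - lb.offsetX) / lb.scale,
    (y1 - lb.offsetY) / lb.scale,
    (x2 - lb.offsetX) / lb.scale,
    (y2 - lb.offsetY) / lb.scale,
  ];
}

// 1920x1080 -> 640x640 gives scale = 1/3 and a 140px top/bottom offset.
const lb = computeLetterbox(1920, 1080, 640, 640);
const original = reverseBox([100, 150, 200, 250], lb); // ~[300, 30, 600, 330]
```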
import { letterbox, reverseLetterbox } from 'react-native-vision-utils';
// First letterbox the image
const lb = await letterbox(source, { targetWidth: 640, targetHeight: 640 });
// Run detection on letterboxed image...
const detections = [
{ box: [100, 150, 200, 250], score: 0.9 }, // in 640x640 space
];
// Reverse transform to original coordinates
const result = await reverseLetterbox(
detections.map((d) => d.box),
{
scale: lb.letterboxInfo.scale,
offset: lb.letterboxInfo.offset,
originalSize: lb.letterboxInfo.originalSize,
format: 'xyxy',
}
);
// Boxes are now in original 1920x1080 coordinates
🎨 Drawing & Visualization
Utilities for visualizing detection and segmentation results.
drawBoxes(source, boxes, options)
Draw bounding boxes with labels on an image.
import { drawBoxes } from 'react-native-vision-utils';
const result = await drawBoxes(
{ type: 'base64', value: imageBase64 },
[
{ box: [100, 100, 200, 200], label: 'person', score: 0.95, classIndex: 0 },
{ box: [300, 150, 400, 350], label: 'dog', score: 0.87, classIndex: 16 },
{ box: [500, 200, 600, 300], color: [255, 0, 0] }, // custom red color
],
{
lineWidth: 3,
fontSize: 14,
drawLabels: true,
labelBackgroundAlpha: 0.7,
}
);
// Display result.imageBase64
console.log(result.width, result.height);
console.log(result.processingTimeMs);
drawKeypoints(source, keypoints, options)
Draw pose keypoints with skeleton connections.
import { drawKeypoints } from 'react-native-vision-utils';
// COCO pose keypoints
const keypoints = [
{ x: 320, y: 100, confidence: 0.9 }, // nose
{ x: 310, y: 90, confidence: 0.85 }, // left_eye
{ x: 330, y: 90, confidence: 0.87 }, // right_eye
// ... 17 keypoints total
];
const result = await drawKeypoints(
{ type: 'base64', value: imageBase64 },
keypoints,
{
pointRadius: 5,
minConfidence: 0.5,
lineWidth: 2,
skeleton: [
{ from: 0, to: 1 }, // nose to left_eye
{ from: 0, to: 2 }, // nose to right_eye
{ from: 5, to: 6 }, // left_shoulder to right_shoulder
// ... more connections
],
}
);
overlayMask(source, mask, options)
Overlay a segmentation mask on an image.
import { overlayMask } from 'react-native-vision-utils';
// Segmentation output (class indices for each pixel)
const segmentationMask = new Array(160 * 160).fill(0); // 160x160 mask
segmentationMask.fill(1, 0, 5000); // Some pixels class 1
segmentationMask.fill(15, 5000, 10000); // Some pixels class 15
const result = await overlayMask(
{ type: 'base64', value: imageBase64 },
segmentationMask,
{
maskWidth: 160,
maskHeight: 160,
alpha: 0.5,
isClassMask: true, // Each value is a class index
}
);
overlayHeatmap(source, heatmap, options)
Overlay an attention/saliency heatmap on an image.
import { overlayHeatmap } from 'react-native-vision-utils';
// Attention weights from a ViT model (14x14 patches)
const attentionWeights = new Array(14 * 14).fill(0).map(() => Math.random());
const result = await overlayHeatmap(
{ type: 'base64', value: imageBase64 },
attentionWeights,
{
heatmapWidth: 14,
heatmapHeight: 14,
alpha: 0.6,
colorScheme: 'jet', // 'jet', 'hot', or 'viridis'
}
);
Heatmap Color Schemes
| Scheme | Description |
| --------- | -------------------------------------------- |
| jet | Blue → Cyan → Green → Yellow → Red (default) |
| hot | Black → Red → Yellow → White |
| viridis | Purple → Blue → Green → Yellow |
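Colormaps like these map a normalized scalar to RGB via piecewise channel ramps. A plain TypeScript approximation of a jet-style mapping (illustrative; the native scheme may use different exact stops):

```typescript
// Approximate 'jet' mapping: blue-dominant at 0, green at 0.5, red-dominant at 1.
// Each channel is a clamped triangular ramp over the [0, 1] input.
function jet(value: number): [number, number, number] {
  const t = Math.min(1, Math.max(0, value));
  const ramp = (x: number) => Math.min(1, Math.max(0, 1.5 - Math.abs(x)));
  return [ramp(4 * t - 3), ramp(4 * t - 2), ramp(4 * t - 1)];
}
```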
📝 Type Reference
Image Source Types
type ImageSourceType = 'url' | 'file' | 'base64' | 'asset' | 'photoLibrary';
interface ImageSource {
type: ImageSourceType;
value: string;
}
// Examples:
{ type: 'url', value: 'https://example.com/image.jpg' }
{ type: 'file', value: '/path/to/image.jpg' }
{ type: 'base64', value: 'data:image/png;base64,...' }
{ type: 'asset', value: 'image_name' }
{ type: 'photoLibrary', value: 'identifier' }
Color Formats
type ColorFormat =
| 'rgb' // 3 channels
| 'rgba' // 4 channels
| 'bgr' // 3 channels (OpenCV style)
| 'bgra' // 4 channels
| 'grayscale' // 1 channel
| 'hsv' // 3 channels (Hue, Saturation, Value)
| 'hsl' // 3 channels (Hue, Saturation, Lightness)
| 'lab' // 3 channels (CIE LAB)
| 'yuv' // 3 channels
| 'ycbcr'; // 3 channels
Resize Strategies
type ResizeStrategy =
| 'cover' // Fill target, crop excess (default)
| 'contain' // Fit within target, padColor padding
| 'stretch' // Stretch to fill (may distort)
| 'letterbox'; // Fit within target, letterboxColor padding (YOLO-style)
interface ResizeOptions {
width: number;
height: number;
strategy?: ResizeStrategy;
padColor?: [number, number, number, number]; // RGBA for 'contain' (default: black)
letterboxColor?: [number, number, number]; // RGB for 'letterbox' (default: [114, 114, 114])
}
Normalization
type NormalizationPreset =
| 'scale' // [0, 1] range
| 'imagenet' // ImageNet mean/std
| 'tensorflow' // [-1, 1] range
| 'raw' // No normalization (0-255)
| 'custom'; // Custom mean/std
interface Normalization {
preset?: NormalizationPreset;
mean?: number[]; // Per-channel mean
std?: number[]; // Per-channel std
}
Data Layouts
type DataLayout =
| 'hwc' // Height × Width × Channels (default)
| 'chw' // Channels × Height × Width (PyTorch)
| 'nhwc' // Batch × Height × Width × Channels (TensorFlow)
| 'nchw'; // Batch × Channels × Height × Width (PyTorch batched)
Utility Functions
// Get channel count for a color format
getChannelCount('rgb'); // 3
getChannelCount('rgba'); // 4
getChannelCount('grayscale'); // 1
// Check if error is a VisionUtilsError
isVisionUtilsError(error); // boolean
⚠️ Error Handling
All functions throw VisionUtilsError on failure:
import {
getPixelData,
isVisionUtilsError,
VisionUtilsErrorCode,
} from 'react-native-vision-utils';
try {
const result = await getPixelData({
source: { type: 'file', value: '/nonexistent.jpg' },
});
} catch (error) {
if (isVisionUtilsError(error)) {
console.log(error.code); // 'LOAD_FAILED'
console.log(error.message); // Human-readable message
}
}
Error Codes
| Code | Description |
|:-----|:------------|
| INVALID_SOURCE | Invalid image source |
| LOAD_FAILED | Failed to load image |
| INVALID_ROI | Invalid region of interest |
| PROCESSING_FAILED | Processing error |
| INVALID_OPTIONS | Invalid options provided |
| INVALID_CHANNEL | Invalid channel index |
| INVALID_PATCH | Invalid patch dimensions |
| DIMENSION_MISMATCH | Tensor dimension mismatch |
| EMPTY_BATCH | Empty batch provided |
| UNKNOWN | Unknown error |
🔄 Platform Considerations
⚠️ Cross-Platform Variance
When processing identical images on iOS and Android, you may observe slight differences in pixel values (typically 5-15%). This is expected behavior due to:
| Source | Description |
|:-------|:------------|
| 🖼️ Image Decoders | iOS uses ImageIO, Android uses Skia - JPEG/PNG decoding differs slightly |
| 📐 Resize Algorithms | Bilinear/bicubic interpolation implementations vary between platforms |
| 🎨 Color Space Handling | sRGB gamma curves may be applied differently |
| 🔢 Floating-Point Precision | Minor rounding differences in native math operations |
Impact on ML Models:
- ✅ Relative comparisons work identically across platforms
- ✅ Classification/detection results are consistent
- ✅ Threshold-based logic (value > 0, value < 1) works the same
- ⚠️ Exact numerical equality between platforms is not guaranteed
Best Practices:
// ✅ Good - threshold-based comparison
const isValid = result.actualMin >= 0 && result.actualMax <= 1;
// ✅ Good - relative comparison
const isBlurry = blurScore < 100;
// ⚠️ Avoid - exact value comparison across platforms
const isExact = result.actualMin === 0.0431; // May differ on iOS
This variance is standard across cross-platform ML libraries (TensorFlow Lite, ONNX Runtime, PyTorch Mobile).
⚡ Performance Tips
💡 Pro Tips for optimal performance
| Tip | Description |
|:----|:------------|
| 🎯 Resize Strategies | Use letterbox for YOLO, cover for classification models |
| 📦 Batch Processing | Use batchGetPixelData with appropriate concurrency for multiple images |
| 💾 Cache Management | Call clearCache() when memory is constrained |
| ⚙️ Model Presets | Pre-configured settings are optimized for each model |
| 🔄 Data Layout | Choose the right dataLayout upfront to avoid unnecessary conversions |
🤝 Contributing
We welcome contributions! Please see our guides:
| Resource | Description |
|:---------|:------------|
| 📋 Development workflow | How to set up your dev environment |
| 🔀 Sending a pull request | Guidelines for submitting PRs |
| 📜 Code of conduct | Community guidelines |
📄 License
MIT
