wasm-cld-toolkit

v0.10.4

Published

3 months ago

A lightweight, blazingly fast 2D image descriptor, matcher and retriever, based on the MPEG-7 Color Layout Descriptor (CLD)

toshiaki-b

browser cbir cld color-layout descriptor image mpeg7 node retrieval search wasm

Calculating descriptors for 949 images...
Calculated descriptors for 949 images in 341ms (0.359ms per image)

Retrieving image by comparing coefficients...
Scored 949 images in 2ms (0.00211ms per image) + scores sorted in 0ms = 2ms total

Matching multiple images...
Matched 633 images (50% hit, 50% miss) against 632 images in 8ms (<400056 searches, >0.0000200ms per image)

You can read the in-depth documentation from here.

You can check the latest changelog from here.

⚠️ Version 0.9.x is bugged! Do not use wasm-cld-toolkit@0.9.x or its descriptors!

Installation

npm i wasm-cld-toolkit

yarn add wasm-cld-toolkit

pnpm add wasm-cld-toolkit

Quick examples

Calculate an image descriptor

// env: browser
import { imageDescriptor } from "wasm-cld-toolkit";

const htmlImageBitmap = await window.createImageBitmap(htmlImageElement);
const htmlImageDescriptor: string = imageDescriptor(htmlImageBitmap);

// This will work just as well
const blobImageBitmap = await window.createImageBitmap(imageBlob);
const blobImageDescriptor: string = imageDescriptor(blobImageBitmap);

// env: node
import { join } from "node:path";
import sharp from "sharp";

// Add "/node" as a suffix to use the NodeJS version
import { imageDescriptor } from "wasm-cld-toolkit/node";

// // Other functions are reexported as well
// import { descriptorDistance } from "wasm-cld-toolkit/node";

const sharpImage = sharp(join(__dirname, "images/007.jpg"));

// NOTE: Functions that read data out of `sharp` are always async
const sharpImageDescriptor: string = await imageDescriptor(sharpImage);

Fetch an image, then calculate its descriptor

// env: browser
import { imageDescriptor } from "wasm-cld-toolkit";
import { fetchImageBitmap } from "wasm-cld-toolkit/extra";

// Creating `ImageBitmap`s from remote resources requires some boilerplate.
// So, we implemented it for you.
const remoteImageBitmap = await fetchImageBitmap(
  "https://picsum.photos/id/0/100/100",
);

const remoteImageDescriptor: string = imageDescriptor(remoteImageBitmap);

// You don't have to free the resource by yourself;
// Just leave it to the garbage collector.
// // remoteImageBitmap.close();

Customize the length of the descriptor

import { imageDescriptor, descriptorDistance } from "wasm-cld-toolkit";
import {
  longDescriptorPreset,
  shortDescriptorPreset,
} from "wasm-cld-toolkit/presets";

// You can use any number between 0 and 64 (inclusive)
const customImageDescriptor: string = imageDescriptor(imageBitmap, {
  // Luma (grayscale) precision
  yLength: 12, // 👈

  // Chroma (color) precision
  cLength: 8, // 👈
});

customImageDescriptor.length; // => 43

// ...or you may use a preset that suits your needs
const longImageDescriptor: string = imageDescriptor(
  imageBitmap,
  longDescriptorPreset,
);

longImageDescriptor.length; // => 78

const shortImageDescriptor: string = imageDescriptor(
  imageBitmap,
  shortDescriptorPreset,
);

shortImageDescriptor.length; // => 38

// When comparing descriptors of different sizes, only the overlapping
// coefficients will be measured. This ensures that the distance between
// the descriptors created from the same image will always be zero.
descriptorDistance(longImageDescriptor, shortImageDescriptor); // => 0

// If colors are not important, set `cLength` to `0` to ignore them and save
// some space. Keep in mind that this descriptor will ignore ALL differences
// in color. Mixing it with descriptors that have a non-zero `cLength` is
// not recommended, as this will likely lead to unexpected results.
const colorlessImageDescriptor: string = imageDescriptor(imageBitmap, {
  cLength: 0, // 👈
});

Work with multiple descriptors using `transpose()`

import { mapDescriptorDistance, mapImageDescriptor } from "wasm-cld-toolkit";
import { detranspose, transpose } from "wasm-cld-toolkit/extra";

// Don't do this
{
  const uselessImageDataset = {
    src: imageSources,
    descriptor: mapImageDescriptor(imageBitmaps),
  };

  // Messy and sluggish
  const firstImageData: ImageData = {
    src: uselessImageDataset.src[0],
    descriptor: uselessImageDataset.descriptor[0],
  };
}

// Do this instead
{
  // `transpose()` will convert `Object<Array>` into `Array<Object>`
  const usefulImageDataset = transpose({
    src: imageSources,
    descriptor: mapImageDescriptor(imageBitmaps),
  });

  // The layout is clean and sweet
  const firstImageData = usefulImageDataset[0] satisfies {
    src: string;
    descriptor: string;
  };

  // There's also a reverse version of this
  const detransposedImageDataset = detranspose(usefulImageDataset);

  // ...that may help you add a property
  const imageDatasetWithDistance = transpose({
    ...detransposedImageDataset,

    distanceFromRef: mapDescriptorDistance(
      refDescriptor,
      detransposedImageDataset.descriptor,
    ),
  });

  // The layout is still nice and wise
  const firstImageDataWithDistance = imageDatasetWithDistance[0] satisfies {
    src: string;
    descriptor: string;
    distanceFromRef: number;
  };

  // Thanks to `transpose()`, sorting is easier than ever
  const sortedImageDatasetWithDistance = imageDatasetWithDistance.toSorted(
    (left, right) => left.distanceFromRef - right.distanceFromRef,
  );
}
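
For readers who want to see what the `Object<Array>` to `Array<Object>` conversion amounts to, here is a minimal sketch. `transposeSketch` and `detransposeSketch` are hypothetical names written for this illustration; they are not the helpers exported from "wasm-cld-toolkit/extra", whose typing and edge-case handling may differ. Stopping at the shortest column is an assumption.

```typescript
// Hypothetical stand-ins for `transpose()` / `detranspose()`, for illustration only.
type Columns = Record<string, readonly unknown[]>;
type Row<T extends Columns> = { [K in keyof T]: T[K][number] };

// Object<Array> -> Array<Object>
function transposeSketch<T extends Columns>(columns: T): Row<T>[] {
  const entries = Object.entries(columns as Columns);
  // Assumption: stop at the shortest column so that every row is complete.
  const length =
    entries.length === 0
      ? 0
      : Math.min(...entries.map(([, values]) => values.length));

  const rows: Row<T>[] = [];
  for (let index = 0; index < length; index++) {
    const row: Record<string, unknown> = {};
    for (const [key, values] of entries) {
      row[key] = values[index];
    }
    rows.push(row as unknown as Row<T>);
  }
  return rows;
}

// Array<Object> -> Object<Array>
function detransposeSketch<R extends object>(
  rows: readonly R[],
): { [K in keyof R]: R[K][] } {
  const columns: Record<string, unknown[]> = {};
  for (const row of rows) {
    for (const [key, value] of Object.entries(row)) {
      (columns[key] ??= []).push(value);
    }
  }
  return columns as unknown as { [K in keyof R]: R[K][] };
}

const sketchRows = transposeSketch({
  src: ["a.jpg", "b.jpg"],
  distanceFromRef: [2, 1],
});

// Each row carries its own distance, so sorting is a one-liner
const sketchSorted = [...sketchRows].sort(
  (left, right) => left.distanceFromRef - right.distanceFromRef,
);
```

Keeping each record in one object is what makes the sort above trivial; with parallel arrays you would have to sort an index array and reorder every column by hand.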

Sort images by similarity

import { mapDescriptorDistance, mapImageDescriptor } from "wasm-cld-toolkit";
import {
  fetchImageBitmap,
  detranspose,
  transpose,
} from "wasm-cld-toolkit/extra";

// Descriptors are universal across platforms
const refDescriptor = "!#,^W7KWbh!:g*d!<`<#r;cKm!!2usR-jnO!!)fn!_<e%!<<*'!WN";

const remoteImageSources = [
  "https://picsum.photos/id/0/100/100",
  "https://picsum.photos/id/1/100/100",
  // ...
];

const remoteImageBitmaps = await Promise.all(
  remoteImageSources.map((src) => fetchImageBitmap(src)),
);

const remoteImageDescriptors = mapImageDescriptor(remoteImageBitmaps);

const remoteImageDistancesFromRef = mapDescriptorDistance(
  refDescriptor,
  remoteImageDescriptors,
);

const remoteImageSimilarityDataset = transpose({
  src: remoteImageSources,
  distanceFromRef: remoteImageDistancesFromRef,
}).toSorted((left, right) => left.distanceFromRef - right.distanceFromRef);

Filter out specific images

import { mapContainsDescriptor, mapImageDescriptor } from "wasm-cld-toolkit";
import { transpose } from "wasm-cld-toolkit/extra";

// Descriptors are universal across platforms
const unwelcomeImageDescriptors = [
  "!#,^W7KWbh!:g*d!<`<#r;cKm!!2usR-jnO!!)fn!_<e%!<<*'!WN",
  "!#,]F'a=^<!W2ru!!!!!!WW3#!!!!!2`Us?!<E'#!:'Xj!rN,s!!*",
  // ...
];

const localImageElements = [
  // The type argument tells TypeScript these are <img> elements
  ...document.querySelectorAll<HTMLImageElement>(
    ".gallery > .image-container > img",
  ),
];

const localImageBitmaps = await Promise.all(
  localImageElements.map((imageElement) =>
    window.createImageBitmap(imageElement),
  ),
);

const localImageDescriptors = mapImageDescriptor(localImageBitmaps);

const localImageMatchResults = mapContainsDescriptor(
  localImageDescriptors,
  unwelcomeImageDescriptors,
);

transpose({
  imageElement: localImageElements,
  match: localImageMatchResults,
})
  .filter((image) => image.match)
  .forEach((image) => {
    // `parentElement` can be null, so guard before touching the class list
    const container = image.imageElement.parentElement;
    container?.classList.add("hidden");
  });

Goals

Be blazingly fast
Keep dependencies to a minimum
Make the binary as small as possible

Non-goals

Support more expensive descriptors
- More expensive image descriptors, such as the Joint Composite Descriptor (JCD), aren't worth the cost. Some people have tried to replace the good old CLD with them, but after years of testing, they got nowhere.
Make the package work without WASM
- This would ruin the whole point. I didn't choose WASM just because it sounds cool, or because I found some handy library, but because ~~Rust is blazingly fast~~ WASM provides greater control over low-level operations.
Go multithreaded
- I have two issues with this. First, multithreaded WASM has several caveats. Second, this library is already fast enough for both small and large inputs. Introducing an extra pile of code and overhead just to reduce processing time from 1.1ms to 0.9ms is pointless.

Differences from the MPEG-7 Color Layout Descriptor

We said "based on". But what does that mean?

The infamous paywall makes it difficult to find reliable information about MPEG-7 standards. To save you the trouble, we have listed the key differences here:

Our version uses larger integers to store coefficients; we no longer use potatoes for storage
- DC coefficients: stored using 16-bit integers instead of 6-bit integers
- AC coefficients: stored using 8-bit integers instead of 5-bit integers
Our version quantizes each coefficient using its largest possible value as the scaling factor
Our version stores coefficients in their natural order [Y'(DC), Y'(AC).., Cb(DC), Cb(AC).., Cr(DC), Cr(AC)..] instead of the standardized [Y'(AC).., Cb(AC).., Cr(AC).., Y'(DC), Cb(DC), Cr(DC)]
Our version stores data length at the beginning of the data instead of the end for easier parsing
Our version includes a "version" byte at the start to ensure future compatibility
Our version uses Base85 encoding for ASCII representation instead of inefficient XML
Our version is strictly focused on coefficients; no space is wasted on unrelated metadata

Under the hood

Calculating a descriptor

The calculation is done in five stages.

Stage 1

In the first stage, 64 (8x8) representative colors will be selected from the image. After fitting the input ImageBitmap onto a 128x128 OffscreenCanvas (with a fallback to <canvas>), the canvas will be split into an 8x8 grid of blocks. Each block's representative color will be calculated by averaging its pixels, and the resulting colors will be converted into 32-bit floating-point Y'CbCr colors.

[Formula: ITU-R BT.601 Y'CbCr conversion]
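
The block averaging and color conversion described above can be sketched as follows. This is a rough illustration, not the library's code: it assumes the pixels arrive as a flat 128x128 RGBA buffer (the shape of `getImageData().data`), that channels are normalized to [0, 1], and that the full-range BT.601 matrix without offsets is used. `representativeColors` and `YCbCrGrid` are hypothetical names.

```typescript
const CANVAS_SIZE = 128;
const GRID_SIZE = 8;
const BLOCK_SIZE = CANVAS_SIZE / GRID_SIZE; // 16x16 pixels per block

interface YCbCrGrid {
  y: Float32Array; // 64 values, row-major
  cb: Float32Array;
  cr: Float32Array;
}

// Hypothetical helper: averages each block, then converts RGB -> Y'CbCr.
function representativeColors(rgba: Uint8ClampedArray): YCbCrGrid {
  const cells = GRID_SIZE * GRID_SIZE;
  const grid: YCbCrGrid = {
    y: new Float32Array(cells),
    cb: new Float32Array(cells),
    cr: new Float32Array(cells),
  };
  // Assumption: normalize 8-bit channels to [0, 1] while averaging
  const scale = 1 / (BLOCK_SIZE * BLOCK_SIZE * 255);

  for (let blockRow = 0; blockRow < GRID_SIZE; blockRow++) {
    for (let blockCol = 0; blockCol < GRID_SIZE; blockCol++) {
      let r = 0;
      let g = 0;
      let b = 0;
      for (let dy = 0; dy < BLOCK_SIZE; dy++) {
        for (let dx = 0; dx < BLOCK_SIZE; dx++) {
          const row = blockRow * BLOCK_SIZE + dy;
          const col = blockCol * BLOCK_SIZE + dx;
          const offset = (row * CANVAS_SIZE + col) * 4; // RGBA stride
          r += rgba[offset];
          g += rgba[offset + 1];
          b += rgba[offset + 2];
        }
      }
      r *= scale;
      g *= scale;
      b *= scale;

      // ITU-R BT.601 luma/chroma weights (full range, no offset assumed)
      const index = blockRow * GRID_SIZE + blockCol;
      grid.y[index] = 0.299 * r + 0.587 * g + 0.114 * b;
      grid.cb[index] = -0.168736 * r - 0.331264 * g + 0.5 * b;
      grid.cr[index] = 0.5 * r - 0.418688 * g - 0.081312 * b;
    }
  }
  return grid;
}
```

Taking a plain `Uint8ClampedArray` instead of an `ImageBitmap` keeps the sketch runnable outside the browser; the canvas drawing step is left out on purpose.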

Stage 2

In the second stage, a 2-Dimensional Type-II Discrete Cosine Transform (2D DCT-II) will be applied to the 8x8 grid of each Y'CbCr channel, by applying a 1D DCT-II to every row and then to every column.

[Formula: 1-Dimensional Type-II Discrete Cosine Transform]
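
As a reference point, here is a naive row-then-column 2D DCT-II. It is only a sketch: it assumes the orthonormal scaling (which is the scaling under which a grid of all 1.0 values produces a DC coefficient of 8.0), it uses 64-bit floats for simplicity, and the names `dct1d` and `dct2d` are hypothetical. The library itself uses a fast algorithm rather than this O(n²) loop.

```typescript
// Orthonormal 1D DCT-II (assumed scaling), straight from the definition.
function dct1d(input: ArrayLike<number>): Float64Array {
  const n = input.length;
  const output = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += input[i] * Math.cos((Math.PI * (2 * i + 1) * k) / (2 * n));
    }
    output[k] = sum * Math.sqrt((k === 0 ? 1 : 2) / n);
  }
  return output;
}

// 2D DCT-II of a row-major size x size grid: rows first, then columns.
function dct2d(block: ArrayLike<number>, size = 8): Float64Array {
  const afterRows = new Float64Array(size * size);
  for (let row = 0; row < size; row++) {
    const values = Array.from(
      { length: size },
      (_, col) => block[row * size + col],
    );
    afterRows.set(dct1d(values), row * size);
  }

  const output = new Float64Array(size * size);
  for (let col = 0; col < size; col++) {
    const values = Array.from(
      { length: size },
      (_, row) => afterRows[row * size + col],
    );
    const transformed = dct1d(values);
    for (let row = 0; row < size; row++) {
      output[row * size + col] = transformed[row];
    }
  }
  return output;
}

// A perfectly flat channel puts all of its energy into the DC coefficient
const flatChannel = dct2d(new Float64Array(64).fill(1));
```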

Stage 3

In the third stage, the result of 2D DCT-II will be converted into a 1D array using the zig-zag scanning pattern from the original MPEG standard:

[Figure: Zig-Zag Scanning Pattern]

This pattern will arrange the important coefficients at the start and the less important coefficients at the end, allowing safe truncation of the array at any arbitrary length.
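
To make the scan order concrete, the sketch below generates the conventional zig-zag order for an 8x8 grid (the same traversal JPEG uses) and reads only the first `count` coefficients. Whether the library's table matches this traversal entry for entry is an assumption; `zigZagOrder` and `zigZagScan` are hypothetical names.

```typescript
// Returns row-major indices of a size x size grid in zig-zag order.
function zigZagOrder(size = 8): number[] {
  const order: number[] = [];
  for (let diagonal = 0; diagonal < 2 * size - 1; diagonal++) {
    // Cells on one anti-diagonal satisfy row + col === diagonal
    const firstRow = Math.max(0, diagonal - size + 1);
    const lastRow = Math.min(diagonal, size - 1);
    for (let step = 0; step <= lastRow - firstRow; step++) {
      // Even diagonals are walked upwards, odd diagonals downwards
      const row = diagonal % 2 === 0 ? lastRow - step : firstRow + step;
      const col = diagonal - row;
      order.push(row * size + col);
    }
  }
  return order;
}

// Picks the first `count` coefficients, so nothing has to be truncated later.
function zigZagScan(block: ArrayLike<number>, count: number): number[] {
  return zigZagOrder(8)
    .slice(0, count)
    .map((index) => block[index]);
}

const firstTen = zigZagOrder().slice(0, 10);
// firstTen is [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Low-frequency coefficients sit near the top-left corner of the DCT grid, which is why walking the anti-diagonals from that corner puts them first.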

Stage 4

After applying stages 1-3 to each channel, each coefficient will be quantized and the results will be packed into a fixed-size array of unsigned 8-bit integers. The first coefficient of each channel is the direct current (DC) coefficient, and the rest are the alternating current (AC) coefficients. Since the DC coefficients have a maximum value of 8.0 (ignoring rounding error), which is more than twice that of most AC coefficients, they will be quantized into two bytes by converting them to signed 16-bit integers and then storing them as two unsigned 8-bit integers in little-endian byte order. The AC coefficients will be quantized into signed 8-bit integers and cast directly into unsigned 8-bit integers. By default, the number of coefficients is 20 for the Y' channel and 8 for each of the Cb and Cr channels. Each coefficient will be uniformly quantized using its largest possible value as the scaling factor.
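
A possible shape for the per-channel quantization is sketched below. Several details are assumptions rather than facts about the library: the symmetric targets (±32767 for DC, ±127 for AC), the clamping, and the `maxValues` parameter, which stands in for the per-coefficient "largest possible value" that this README does not list. `quantizeChannel` is a hypothetical name.

```typescript
// Quantizes one channel: [DC as i16 little-endian][AC as i8 stored in u8]...
// `maxValues[i]` is the assumed largest magnitude of coefficient i.
function quantizeChannel(
  coeffs: readonly number[],
  maxValues: readonly number[],
): Uint8Array {
  if (coeffs.length === 0) {
    return new Uint8Array(0); // assumption: an empty channel takes no bytes
  }

  const toUnit = (value: number, max: number): number =>
    Math.max(-1, Math.min(1, value / max));

  // One extra byte, because the DC coefficient occupies two bytes
  const bytes = new Uint8Array(coeffs.length + 1);
  const view = new DataView(bytes.buffer);

  // DC: signed 16-bit, little-endian
  view.setInt16(0, Math.round(toUnit(coeffs[0], maxValues[0]) * 32767), true);

  // AC: signed 8-bit, stored in the unsigned byte as two's complement
  for (let i = 1; i < coeffs.length; i++) {
    view.setInt8(i + 1, Math.round(toUnit(coeffs[i], maxValues[i]) * 127));
  }
  return bytes;
}

// DC at its maximum of 8.0, followed by three AC coefficients
const quantized = quantizeChannel([8, 4, -4, 0], [8, 4, 4, 4]);
// quantized holds the bytes ff 7f 7f 81 00
```

Writing through a `DataView` keeps the byte order explicit, so the result does not depend on the endianness of the machine running the code.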

The length of the resulting bytes can be calculated using the function:

[Formula: the len(n) function]
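
Since the formula image is not reproduced here, the following sketch reconstructs the sizes from the numbers given elsewhere in this README: a channel with n coefficients takes n + 1 bytes (two for DC, one per AC), the header takes 3 bytes, and Base85 needs ceil(bytes * 5 / 4) characters. Treating a channel with n = 0 as zero bytes is an assumption, and the function names are hypothetical.

```typescript
const HEADER_BYTES = 3; // version + Y' count + Cb/Cr count

// Bytes used by one channel with `n` coefficients.
function channelBytes(n: number): number {
  // Assumption: a disabled channel (n = 0) stores nothing, not even DC
  return n === 0 ? 0 : n + 1;
}

// Bytes of the whole serialized descriptor (Cb and Cr share `cLength`).
function descriptorBytes(yLength: number, cLength: number): number {
  return HEADER_BYTES + channelBytes(yLength) + 2 * channelBytes(cLength);
}

// Characters of the Base85 string for that many bytes.
function descriptorChars(yLength: number, cLength: number): number {
  return Math.ceil((descriptorBytes(yLength, cLength) * 5) / 4);
}

descriptorBytes(20, 8); // => 42, the size of the hexdump in stage 5
descriptorChars(20, 8); // => 53
descriptorChars(12, 8); // => 43, as in the custom-length example
```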

Stage 5

In the fifth stage, the quantized coefficients from stage 4 will be combined and serialized.

They will be serialized using the following byte layout:

[[Version: 0_u8], [Y' coeffs count as u8], [Cb/Cr coeffs count as u8], [Coeffs<Y' coeffs>], [Coeffs<Cb coeffs>], [Coeffs<Cr coeffs>]]

Where Coeffs<T> is:

[[DC coeff: i16_le as [u8; 2]], [AC coeffs: i8 as u8]..]

After combining the quantized coefficients from stage 4, a hexdump of a CLD descriptor will be similar to the following example:

00000000  00 14 08 e7 42 fd 02 00 fb 02 fc fc 04 fe 01 00  |...çBý..û.üü.þ..|
00000010  f5 fd fb 02 01 01 01 fd dc f0 fc 00 00 01 03 fd  |õýû....ýÜðü....ý|
00000020  ff 43 0d 03 ff 00 ff fe 02 01                    |ÿC..ÿ.ÿþ..|
0000002a
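
The layout can be checked against the hexdump above by reading the header and the three DC coefficients back out. This is only a sketch with a hypothetical `readLayout` helper; it assumes both counts are non-zero and does no validation.

```typescript
// The 42 bytes from the hexdump above
const exampleBytes = Uint8Array.from([
  0x00, 0x14, 0x08, 0xe7, 0x42, 0xfd, 0x02, 0x00, 0xfb, 0x02, 0xfc, 0xfc,
  0x04, 0xfe, 0x01, 0x00, 0xf5, 0xfd, 0xfb, 0x02, 0x01, 0x01, 0x01, 0xfd,
  0xdc, 0xf0, 0xfc, 0x00, 0x00, 0x01, 0x03, 0xfd, 0xff, 0x43, 0x0d, 0x03,
  0xff, 0x00, 0xff, 0xfe, 0x02, 0x01,
]);

// Hypothetical reader for the header and the quantized DC values.
function readLayout(bytes: Uint8Array) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = view.getUint8(0);
  const yCount = view.getUint8(1);
  const cCount = view.getUint8(2);

  // Each channel occupies count + 1 bytes (DC takes two)
  const yOffset = 3;
  const cbOffset = yOffset + yCount + 1;
  const crOffset = cbOffset + cCount + 1;

  return {
    version,
    yCount,
    cCount,
    yDc: view.getInt16(yOffset, true), // i16, little-endian
    cbDc: view.getInt16(cbOffset, true),
    crDc: view.getInt16(crOffset, true),
    firstYAc: view.getInt8(yOffset + 2), // u8 reinterpreted as i8
    totalBytes: crOffset + cCount + 1,
  };
}

const layout = readLayout(exampleBytes);
// layout.version === 0, layout.yCount === 20, layout.cCount === 8
// layout.yDc === 17127, layout.cbDc === -3876, layout.crDc === 3395
```

The values are still quantized integers at this point; turning them back into DCT coefficients would require the scaling factors from stage 4.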

For Base85 encoding, a non-padded variant of the atob/btoa Ascii85 alphabet (! through u) will be used. The later-added z-exception (=0x00000000) and y-exception (=0x20202020) will be ignored to ensure the output has a fixed length. The encoded string will be ceil(len * 5 / 4) characters long.

After Base85 encoding the CLD descriptor shown above, the final result will be:

!#,_%6N$rcqZ?]n"TAE%p&4ah!<E6"h!k4A!!*3#s$I4trrE)u!W`
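
The encoding step can be reproduced with a few lines of TypeScript. The sketch below is an independent re-implementation (`encodeBase85` is a hypothetical name, not a library export): it reads big-endian 4-byte groups, emits five characters starting at `!`, skips the z/y shortcuts, and keeps only n + 1 characters for a trailing group of n bytes. Fed with the bytes of the hexdump above, it yields the descriptor string shown above.

```typescript
// Unpadded Ascii85 without the "z" and "y" shortcuts.
function encodeBase85(bytes: Uint8Array): string {
  let output = "";
  for (let offset = 0; offset < bytes.length; offset += 4) {
    const chunkLength = Math.min(4, bytes.length - offset);

    // Read one big-endian 32-bit group, zero-filling a short final chunk
    let value = 0;
    for (let i = 0; i < 4; i++) {
      value = value * 256 + (i < chunkLength ? bytes[offset + i] : 0);
    }

    // Split the group into five base-85 digits, most significant first
    const digits = new Array<number>(5).fill(0);
    for (let i = 4; i >= 0; i--) {
      digits[i] = value % 85;
      value = Math.floor(value / 85);
    }

    // A chunk of n bytes only needs n + 1 characters
    for (let i = 0; i <= chunkLength; i++) {
      output += String.fromCharCode(33 + digits[i]);
    }
  }
  return output;
}

const descriptorBytesExample = Uint8Array.from([
  0x00, 0x14, 0x08, 0xe7, 0x42, 0xfd, 0x02, 0x00, 0xfb, 0x02, 0xfc, 0xfc,
  0x04, 0xfe, 0x01, 0x00, 0xf5, 0xfd, 0xfb, 0x02, 0x01, 0x01, 0x01, 0xfd,
  0xdc, 0xf0, 0xfc, 0x00, 0x00, 0x01, 0x03, 0xfd, 0xff, 0x43, 0x0d, 0x03,
  0xff, 0x00, 0xff, 0xfe, 0x02, 0x01,
]);

const encodedExample = encodeBase85(descriptorBytesExample);
// encodedExample.length === 53, i.e. ceil(42 * 5 / 4)
```

Plain number arithmetic is used instead of bitwise operators because JavaScript's bitwise operators work on signed 32-bit values and would turn large groups negative.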

Implementation notes

Stage 1 may introduce minor noise caused by differences between browsers, but this can be ignored as long as you use the provided functions to compare descriptors.
In practice, the FastDCT algorithm replaces the plain 8-point DCT-II, and in stages 3 and 4 only the required number of coefficients is allocated, scanned and quantized, so no truncation is performed.
To dequantize a Base85-encoded descriptor, perform stages 4 and 5 in reverse.

Calculating the distance between descriptors

The formula is straightforward:

[Formula: distance between two descriptors]

...where $w_{Y'}$ and $w_C$ are weights for the coefficients Y' and Cb/Cr, respectively.
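
The formula image is not reproduced here, so the sketch below falls back on the conventional Color Layout Descriptor distance: for each channel, the square root of the weighted sum of squared coefficient differences, with the three channel terms added together. Treat this as an assumption about the general shape only; the exact formula and weights behind `descriptorDistance` may differ. Only overlapping coefficients are compared, matching the behavior described in the quick examples. `Coefficients` and `distanceSketch` are hypothetical names.

```typescript
interface Coefficients {
  y: readonly number[];
  cb: readonly number[];
  cr: readonly number[];
}

// sqrt(sum(w * (a_i - b_i)^2)) over the coefficients both sides have
function channelTerm(
  left: readonly number[],
  right: readonly number[],
  weight: number,
): number {
  const overlap = Math.min(left.length, right.length);
  let sum = 0;
  for (let i = 0; i < overlap; i++) {
    const difference = left[i] - right[i];
    sum += weight * difference * difference;
  }
  return Math.sqrt(sum);
}

// Assumed form: one term per channel, scalar weights wY and wC.
function distanceSketch(
  left: Coefficients,
  right: Coefficients,
  wY = 1,
  wC = 1,
): number {
  return (
    channelTerm(left.y, right.y, wY) +
    channelTerm(left.cb, right.cb, wC) +
    channelTerm(left.cr, right.cr, wC)
  );
}

const longer: Coefficients = { y: [3, 0, 7], cb: [1, 2], cr: [0, 5] };
const shorter: Coefficients = { y: [3, 0], cb: [1], cr: [0] };

distanceSketch(longer, shorter); // => 0, only the overlap is measured
```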

Roadmap to 1.0.0

[ ] Finalize the API
[x] Fall back to <canvas> when OffscreenCanvas is not available
[x] Support Sharp as an input
[x] Add benchmarks
[x] Add documentation
[x] Add examples
[x] Add tests

License

MIT

Third-party licenses

Third-party licenses for the released package:

rlsf: MIT – yvt
once_cell: MIT – Aleksey Kladov
The Rust Language: MIT – The Rust Project Contributors
js-sys, wasm-bindgen: MIT – Alex Crichton
unchecked-std: 0BSD – lincot

Refer to the LICENSE file for the full license text.
