wasm-cld-toolkit
v0.10.4
A lightweight, blazingly fast 2D image descriptor, matcher and retriever, based on the MPEG-7 Color Layout Descriptor (CLD)
Calculating descriptors for 949 images...
Calculated descriptors for 949 images in 341ms (0.359ms per image)
Retrieving image by comparing coefficients...
Scored 949 images in 2ms (0.00211ms per image) + scores sorted in 0ms = 2ms total
Matching multiple images...
Matched 633 images (50% hit, 50% miss) against 632 images in 8ms (<400056 searches, >0.0000200ms per image)
You can read the in-depth documentation from here.
You can check the latest changelog from here.
⚠️ Version 0.9.x is bugged! Do not use it or its descriptors!
Installation
npm i wasm-cld-toolkit
yarn add wasm-cld-toolkit
pnpm add wasm-cld-toolkit
Quick examples
Calculate an image descriptor
// env: browser
import { imageDescriptor } from "wasm-cld-toolkit";
const htmlImageBitmap = await window.createImageBitmap(htmlImageElement);
const htmlImageDescriptor: string = imageDescriptor(htmlImageBitmap);
// This will work just as well
const blobImageBitmap = await window.createImageBitmap(imageBlob);
const blobImageDescriptor: string = imageDescriptor(blobImageBitmap);
// env: node
// `join` lives in "node:path", not "node:fs"
import { join } from "node:path";
import sharp from "sharp";
// Add "/node" as a suffix to use the NodeJS version
import { imageDescriptor } from "wasm-cld-toolkit/node";
// // Other functions are reexported as well
// import { descriptorDistance } from "wasm-cld-toolkit/node";
const sharpImage = sharp(join(__dirname, "images/007.jpg"));
// NOTE: Functions that read data from `sharp` are always async
const sharpImageDescriptor: string = await imageDescriptor(sharpImage);
Fetch an image, then calculate its descriptor
// env: browser
import { imageDescriptor } from "wasm-cld-toolkit";
import { fetchImageBitmap } from "wasm-cld-toolkit/extra";
// Creating `ImageBitmap`s from remote resources requires some boilerplate.
// So, we implemented it for you.
const remoteImageBitmap = await fetchImageBitmap(
"https://picsum.photos/id/0/100/100",
);
const remoteImageDescriptor: string = imageDescriptor(remoteImageBitmap);
// You don't have to free the resource by yourself;
// Just leave it to the garbage collector.
// // remoteImageBitmap.close();
Customize the length of the descriptor
import { imageDescriptor, descriptorDistance } from "wasm-cld-toolkit";
import {
longDescriptorPreset,
shortDescriptorPreset,
} from "wasm-cld-toolkit/presets";
// You can use any number between 0 and 64 (inclusive)
const customImageDescriptor: string = imageDescriptor(imageBitmap, {
// Luma (grayscale) precision
yLength: 12, // 👈
// Chroma (color) precision
cLength: 8, // 👈
});
customImageDescriptor.length; // => 43
// ...or you may use a preset that suits your needs
const longImageDescriptor: string = imageDescriptor(
imageBitmap,
longDescriptorPreset,
);
longImageDescriptor.length; // => 78
const shortImageDescriptor: string = imageDescriptor(
imageBitmap,
shortDescriptorPreset,
);
shortImageDescriptor.length; // => 38
// When comparing descriptors of different sizes, only the overlapping
// coefficients will be measured. This ensures that the distance between
// the descriptors created from the same image will always be zero.
descriptorDistance(longImageDescriptor, shortImageDescriptor); // => 0
// If colors are not important, set `cLength` to `0` to ignore them and save
// some space. Keep in mind that this descriptor will ignore ALL differences
// in color. Mixing it with descriptors that have a non-zero `cLength` is
// not recommended, as this will likely lead to unexpected results.
const colorlessImageDescriptor: string = imageDescriptor(imageBitmap, {
cLength: 0, // 👈
});
Work with multiple descriptors using transpose()
import { mapDescriptorDistance, mapImageDescriptor } from "wasm-cld-toolkit";
import { detranspose, transpose } from "wasm-cld-toolkit/extra";
// Don't do this
{
const uselessImageDataset = {
src: imageSources,
descriptor: mapImageDescriptor(imageBitmaps),
};
// Messy and sluggish
const firstImageData = {
src: uselessImageDataset.src[0],
descriptor: uselessImageDataset.descriptor[0],
};
}
// Do this instead
{
// `transpose()` will convert `Object<Array>` into `Array<Object>`
const usefulImageDataset = transpose({
src: imageSources,
descriptor: mapImageDescriptor(imageBitmaps),
});
// The layout is clean and sweet
const firstImageData = usefulImageDataset[0] satisfies {
src: string;
descriptor: string;
};
// There's also a reverse version of this
const detransposedImageDataset = detranspose(usefulImageDataset);
// ...that may help you add a property
const imageDatasetWithDistance = transpose({
...detransposedImageDataset,
distanceFromRef: mapDescriptorDistance(
refDescriptor,
detransposedImageDataset.descriptor,
),
});
// The layout is still nice and wise
const firstImageDataWithDistance = imageDatasetWithDistance[0] satisfies {
src: string;
descriptor: string;
distanceFromRef: number;
};
// Thanks to `transpose()`, sorting is easier than ever
const sortedImageDatasetWithDistance = imageDatasetWithDistance.toSorted(
(left, right) => left.distanceFromRef - right.distanceFromRef,
);
}
Sort images by similarity
import { mapDescriptorDistance, mapImageDescriptor } from "wasm-cld-toolkit";
import {
fetchImageBitmap,
detranspose,
transpose,
} from "wasm-cld-toolkit/extra";
// Descriptors are universal across platforms
const refDescriptor = "!#,^W7KWbh!:g*d!<`<#r;cKm!!2usR-jnO!!)fn!_<e%!<<*'!WN";
const remoteImageSources = [
"https://picsum.photos/id/0/100/100",
"https://picsum.photos/id/1/100/100",
// ...
];
const remoteImageBitmaps = await Promise.all(
remoteImageSources.map((src) => fetchImageBitmap(src)),
);
const remoteImageDescriptors = mapImageDescriptor(remoteImageBitmaps);
const remoteImageDistancesFromRef = mapDescriptorDistance(
refDescriptor,
remoteImageDescriptors,
);
const remoteImageSimilarityDataset = transpose({
src: remoteImageSources,
distanceFromRef: remoteImageDistancesFromRef,
}).toSorted((left, right) => left.distanceFromRef - right.distanceFromRef);
Filter out specific images
import { mapContainsDescriptor, mapImageDescriptor } from "wasm-cld-toolkit";
import { transpose } from "wasm-cld-toolkit/extra";
// Descriptors are universal across platforms
const unwelcomeImageDescriptors = [
"!#,^W7KWbh!:g*d!<`<#r;cKm!!2usR-jnO!!)fn!_<e%!<<*'!WN",
"!#,]F'a=^<!W2ru!!!!!!WW3#!!!!!2`Us?!<E'#!:'Xj!rN,s!!*",
// ...
];
const localImageElements = [
...document.querySelectorAll<HTMLImageElement>(".gallery > .image-container > img"),
];
const localImageBitmaps = await Promise.all(
localImageElements.map((imageElement) =>
window.createImageBitmap(imageElement),
),
);
const localImageDescriptors = mapImageDescriptor(localImageBitmaps);
const localImageMatchResults = mapContainsDescriptor(
localImageDescriptors,
unwelcomeImageDescriptors,
);
transpose({
imageElement: localImageElements,
match: localImageMatchResults,
})
.filter((image) => image.match)
.forEach((image) => {
const container = image.imageElement.parentElement;
container.classList.add("hidden");
});
Goals
- Be blazingly fast
- Keep dependencies to a minimum
- Make the binary as small as possible
Non-goals
- Support more expensive descriptors
- Expensive image descriptors, such as the Joint Composite Descriptor (JCD), aren't worth the cost. Some people have tried to replace the good old CLD with them, but after years of testing, they got nowhere.
- Make the package work without WASM
- This would ruin the whole point. I didn't choose WASM just because it sounds cool, or because I found some handy library, but because ~~Rust is blazingly fast~~ WASM provides greater control over low-level operations.
- Go multithreaded
- I have two issues with this. First, multithreaded WASM has several caveats. Second, this library is already fast enough for small and large inputs. Introducing an extra pile of code and overhead just to reduce processing time from 1.1ms to 0.9ms is pointless.
Differences from the MPEG-7 Color Layout Descriptor
We said "based on". But what does that mean?
The infamous paywall makes it difficult to find reliable information about MPEG-7 standards. To save you the trouble, we have listed the key differences here so you don't have to look them up:
- Our version uses larger integers to store coefficients; we no longer use potatoes for storage
- DC coefficients: stored using 16-bit integers instead of 6-bit integers
- AC coefficients: stored using 8-bit integers instead of 5-bit integers
- Our version quantizes each coefficient using its largest possible value as the scaling factor
- Our version stores coefficients in their natural order [Y'(DC), Y'(AC).., Cb(DC), Cb(AC).., Cr(DC), Cr(AC)..] instead of the standardized [Y'(AC).., Cb(AC).., Cr(AC).., Y'(DC), Cb(DC), Cr(DC)]
- Our version stores data length at the beginning of the data instead of the end for easier parsing
- Our version includes a "version" byte at the start to ensure future compatibility
- Our version uses Base85 encoding for ASCII representation instead of inefficient XML
- Our version is strictly focused on coefficients; no space is wasted on unrelated metadata
Under the hood
Calculating a descriptor
The calculation is done in five stages.
Stage 1
In the first stage, 64 (8x8) representative colors will be selected from the image. After fitting the input ImageBitmap onto a 128x128 OffscreenCanvas (with a fallback to <canvas>), the canvas will be split into an 8x8 grid of blocks. Each block's representative color will be calculated by averaging its pixels, and the resulting colors will be converted into 32-bit floating-point Y'CbCr colors.
ITU-R BT.601 Y'CbCr
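As a rough illustration of this conversion, the sketch below assumes the full-range (JPEG-style) BT.601 matrix, with 8-bit RGB input normalized so that Y' falls in [0, 1] and Cb/Cr in [-0.5, 0.5]. The helper name `rgbToYCbCr` and the normalization are assumptions made for illustration; they are not taken from the library's source.

```typescript
// Hypothetical helper: full-range ITU-R BT.601 RGB -> Y'CbCr.
// Input: 8-bit RGB components (0..255). Output: Y' in [0, 1], Cb/Cr in [-0.5, 0.5].
type YCbCr = { y: number; cb: number; cr: number };

function rgbToYCbCr(r8: number, g8: number, b8: number): YCbCr {
  // Normalize to [0, 1] first (an assumption, see the lead-in)
  const r = r8 / 255;
  const g = g8 / 255;
  const b = b8 / 255;
  return {
    y: 0.299 * r + 0.587 * g + 0.114 * b,
    cb: -0.168736 * r - 0.331264 * g + 0.5 * b,
    cr: 0.5 * r - 0.418688 * g - 0.081312 * b,
  };
}
```

With this normalization, a uniformly white image gives a Y' of 1 in every block, which is consistent with the DC maximum of 8.0 mentioned in stage 4.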

Stage 2
In the second stage, a 2-dimensional Type-II Discrete Cosine Transform (2D DCT-II) will be applied to the 8x8 grid of each Y'CbCr channel, by applying a 1D DCT-II to each row and then to each column.
1-Dimensional Type-II Discrete Cosine Transform:
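For reference, a common orthonormal form of the 1D DCT-II over $N$ samples is $X_k = s_k \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right]$, with $s_0 = \sqrt{1/N}$ and $s_k = \sqrt{2/N}$ for $k > 0$. The scaling convention is an assumption; it is used here because it yields a DC value of 8.0 for an 8x8 grid filled with 1.0, matching the maximum quoted in stage 4. A naive (non-fast) sketch with hypothetical function names:

```typescript
// Naive O(N^2) orthonormal 1D DCT-II; for illustration only.
function dct1d(input: number[]): number[] {
  const n = input.length;
  const out: number[] = [];
  for (let k = 0; k < n; k++) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += input[i] * Math.cos((Math.PI / n) * (i + 0.5) * k);
    }
    // s_0 = sqrt(1/N), s_k = sqrt(2/N) otherwise
    out.push(sum * Math.sqrt((k === 0 ? 1 : 2) / n));
  }
  return out;
}

// 2D DCT-II: transform every row, then every column of the row result.
function dct2d(grid: number[][]): number[][] {
  const rows = grid.map((row) => dct1d(row));
  const out = rows.map((row) => row.slice());
  for (let c = 0; c < rows[0].length; c++) {
    const column = dct1d(rows.map((row) => row[c]));
    for (let r = 0; r < rows.length; r++) {
      out[r][c] = column[r];
    }
  }
  return out;
}
```

For a flat grid, all of the energy ends up in the top-left (DC) coefficient and every AC coefficient is zero.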

Stage 3
In the third stage, the result of 2D DCT-II will be converted into a 1D array using the zig-zag scanning pattern from the original MPEG standard:
Zig-Zag Scanning Pattern:

This pattern will arrange the important coefficients at the start and the less important coefficients at the end, allowing safe truncation of the array at any arbitrary length.
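The scan can be sketched as follows, assuming the conventional JPEG/MPEG zig-zag order over anti-diagonals. The function names are hypothetical, and the library may use a precomputed table instead.

```typescript
// Returns flat indices (row * size + col) in conventional zig-zag order.
function zigZagOrder(size: number): number[] {
  const order: number[] = [];
  for (let d = 0; d <= 2 * (size - 1); d++) {
    const rowMin = Math.max(0, d - (size - 1));
    const rowMax = Math.min(d, size - 1);
    if (d % 2 === 0) {
      // Even anti-diagonals are walked from bottom-left to top-right
      for (let row = rowMax; row >= rowMin; row--) order.push(row * size + (d - row));
    } else {
      // Odd anti-diagonals are walked from top-right to bottom-left
      for (let row = rowMin; row <= rowMax; row++) order.push(row * size + (d - row));
    }
  }
  return order;
}

// Reads only the first `length` coefficients, so nothing has to be truncated later.
function zigZagScan(grid: number[][], length: number): number[] {
  const size = grid.length;
  return zigZagOrder(size)
    .slice(0, length)
    .map((index) => grid[Math.floor(index / size)][index % size]);
}

// zigZagOrder(8) starts with 0, 1, 8, 16, 9, 2, 3, 10, 17, 24
```

Because low-frequency coefficients come first, taking a shorter prefix of the scan gives a coarser but still comparable descriptor.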
Stage 4
After applying stages 1-3 to each channel, each coefficient will be quantized and the results will be packed into a statically sized array of unsigned 8-bit integers. The first coefficient of each channel is the direct current (DC) coefficient, and the rest are the alternating current (AC) coefficients. Since the DC coefficients have a maximum value of 8.0 (ignoring rounding error), which is more than twice that of most AC coefficients, they will be quantized into two bytes by converting them to signed 16-bit integers and then storing each as two unsigned 8-bit integers in little-endian byte order. The AC coefficients will be quantized into signed 8-bit integers and cast directly to unsigned 8-bit integers. By default, the number of coefficients is 20 for the Y' channel and 8 for each of the Cb and Cr channels. Each coefficient will be uniformly quantized using its largest possible value as the scaling factor.
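The quantization and byte packing described above might look roughly like this. The helper names, the scale targets (32767 for i16, 127 for i8) and the rounding mode are assumptions for illustration; `maxAbs` stands for the coefficient's largest possible magnitude.

```typescript
// Hypothetical: quantize a DC coefficient to i16, stored as two u8 in little-endian order.
function quantizeDc(value: number, maxAbs = 8.0): [number, number] {
  const scaled = Math.round((value / maxAbs) * 32767);
  const clamped = Math.max(-32768, Math.min(32767, scaled));
  const unsigned = clamped & 0xffff; // two's-complement bit pattern
  return [unsigned & 0xff, unsigned >> 8]; // low byte first
}

// Hypothetical: quantize an AC coefficient to i8, then reinterpret the bits as u8.
function quantizeAc(value: number, maxAbs: number): number {
  const scaled = Math.round((value / maxAbs) * 127);
  const clamped = Math.max(-128, Math.min(127, scaled));
  return clamped & 0xff;
}

// quantizeDc(8.0)     -> [0xff, 0x7f] (32767 as i16, little-endian)
// quantizeAc(-1.0, 1) -> 0x81        (-127 as i8)
```

Negative values keep their two's-complement bit pattern, which is why a small negative AC coefficient shows up as a byte close to 0xff in a hexdump.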
The length of resulting bytes can be calculated using the function:
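One way to express such a length function is sketched below. It is inferred from the byte layout in stage 5 and from the lengths shown in the quick examples: 2 bytes for the DC coefficient plus 1 byte per AC coefficient in each channel, plus a 3-byte header. The function names are hypothetical, and the sketch assumes both lengths are at least 1; how a zero-length channel is stored is not covered here.

```typescript
const HEADER_BYTES = 3; // version, Y' count, Cb/Cr count

// One i16 DC coefficient (2 bytes) + (count - 1) i8 AC coefficients.
function channelBytes(coeffCount: number): number {
  return 2 + (coeffCount - 1);
}

// Total serialized size: header + Y' + Cb + Cr.
function descriptorBytes(yLength: number, cLength: number): number {
  return HEADER_BYTES + channelBytes(yLength) + 2 * channelBytes(cLength);
}

// Length of the non-padded Base85 string for a given byte count.
function encodedLength(byteLength: number): number {
  return Math.ceil((byteLength * 5) / 4);
}

// descriptorBytes(20, 8)                -> 42 (0x2a, as in the hexdump of stage 5)
// encodedLength(descriptorBytes(12, 8)) -> 43 (as in the quick example)
```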

Stage 5
In the fifth stage, the coefficients will be combined and serialized.
The quantized coefficients will be serialized using the following byte layout:
[[Version: 0_u8], [Y' coeffs count as u8], [Cb/Cr coeffs count as u8], [Coeffs<Y' coeffs>], [Coeffs<Cb coeffs>], [Coeffs<Cr coeffs>]]
Where Coeffs<T> is:
[[DC coeff: i16_le as [u8; 2]], [AC coeffs: i8 as u8]..]
After combining the quantized coefficients from stage 4, a hexdump of a CLD descriptor will be similar to the following example:
00000000 00 14 08 e7 42 fd 02 00 fb 02 fc fc 04 fe 01 00 |...çBý..û.üü.þ..|
00000010 f5 fd fb 02 01 01 01 fd dc f0 fc 00 00 01 03 fd |õýû....ýÜðü....ý|
00000020 ff 43 0d 03 ff 00 ff fe 02 01 |ÿC..ÿ.ÿþ..|
0000002a
For Base85 encoding, a non-padded variant of atob/btoa Ascii85 (!-u) will be used. The later-added z-exception (=0x00000000) and y-exception (=0x20202020) will be ignored so that the output length depends only on the input length. The encoded string will be ceil(len * 5 / 4) characters long, where len is the number of bytes.
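A minimal sketch of such an encoder, assuming standard big-endian Ascii85 grouping: 4 bytes become 5 characters, and a trailing group of n bytes is zero-padded, encoded, and cut to n + 1 characters. The function name `encodeBase85` is hypothetical.

```typescript
// Non-padded Ascii85 without the z/y shortcuts; alphabet is '!' (33) to 'u' (117).
function encodeBase85(bytes: Uint8Array): string {
  let out = "";
  for (let offset = 0; offset < bytes.length; offset += 4) {
    const chunkLength = Math.min(4, bytes.length - offset);
    // Build a big-endian 32-bit value, zero-padding a short final chunk
    let value = 0;
    for (let i = 0; i < 4; i++) {
      value = value * 256 + (i < chunkLength ? bytes[offset + i] : 0);
    }
    // Split into five base-85 digits, most significant first
    const digits = [0, 0, 0, 0, 0];
    for (let i = 4; i >= 0; i--) {
      digits[i] = value % 85;
      value = Math.floor(value / 85);
    }
    // A chunk of n bytes contributes n + 1 characters
    out += digits
      .slice(0, chunkLength + 1)
      .map((digit) => String.fromCharCode(33 + digit))
      .join("");
  }
  return out;
}

// The first four bytes of the hexdump above:
// encodeBase85(new Uint8Array([0x00, 0x14, 0x08, 0xe7])) -> "!#,_%"
```

Skipping the z and y shortcuts means an all-zero group is written as `!!!!!` rather than `z`, so the string length can always be predicted from the byte length.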
After Base85 encoding the CLD descriptor shown above, the final result will be:
!#,_%6N$rcqZ?]n"TAE%p&4ah!<E6"h!k4A!!*3#s$I4trrE)u!W`
Implementation notes
- Stage 1 may introduce a minor "noise" caused by differences in browsers, but this can be ignored as long as you use the provided functions to compare descriptors.
- In stages 3 and 4, the FastDCT algorithm will replace the 8-point DCT-II, and only the required number of coefficients will be allocated, scanned and quantized, with no truncation performed.
- To dequantize a Base85 encoded descriptor, perform steps 4 and 5 in reverse.
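Reversing stage 5 can be sketched as follows: decode the non-padded Ascii85 string back to bytes, then read the 3-byte header and the little-endian DC value. All function names are hypothetical, and the final step of scaling the integers back to floating-point coefficients is left out because it depends on the per-coefficient scaling factors.

```typescript
// Decodes non-padded Ascii85 ('!'..'u', no z/y shortcuts) into bytes.
function decodeBase85(text: string): Uint8Array {
  const out: number[] = [];
  for (let offset = 0; offset < text.length; offset += 5) {
    const chunk = text.slice(offset, offset + 5);
    // A short final chunk is padded with the highest digit ('u' = 84)
    let value = 0;
    for (let i = 0; i < 5; i++) {
      value = value * 85 + (i < chunk.length ? chunk.charCodeAt(i) - 33 : 84);
    }
    const bytes = [
      Math.floor(value / 0x1000000) % 256,
      Math.floor(value / 0x10000) % 256,
      Math.floor(value / 0x100) % 256,
      value % 256,
    ];
    // A chunk of n characters carries n - 1 bytes
    for (const byte of bytes.slice(0, chunk.length - 1)) out.push(byte);
  }
  return new Uint8Array(out);
}

// Header layout: [version, Y' coefficient count, Cb/Cr coefficient count]
function parseHeader(bytes: Uint8Array): { version: number; yCount: number; cCount: number } {
  return { version: bytes[0], yCount: bytes[1], cCount: bytes[2] };
}

// Reads a signed 16-bit little-endian integer (used for DC coefficients).
function readI16LE(bytes: Uint8Array, offset: number): number {
  const unsigned = bytes[offset] | (bytes[offset + 1] << 8);
  return unsigned >= 0x8000 ? unsigned - 0x10000 : unsigned;
}
```

For the example descriptor shown in stage 5, this yields 42 bytes with a header of version 0, 20 Y' coefficients and 8 Cb/Cr coefficients.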
Calculating the distance between descriptors
The formula is straightforward:

...where $w_{Y'}$ and $w_C$ are weights for the coefficients Y' and Cb/Cr, respectively.
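As an illustration, the sketch below follows the conventional MPEG-7 CLD matching form: one weighted Euclidean distance per channel, summed over Y', Cb and Cr. Whether the library uses exactly this form is an assumption, as are the names `Coeffs` and `descriptorDistanceSketch` and the default weights of 1. Only overlapping coefficients are compared, as described in the quick examples.

```typescript
// Hypothetical container for dequantized coefficients of one descriptor.
type Coeffs = { y: number[]; cb: number[]; cr: number[] };

// Weighted Euclidean distance over the coefficients both channels share.
function channelDistance(a: number[], b: number[], weight: number): number {
  const overlap = Math.min(a.length, b.length);
  let sum = 0;
  for (let i = 0; i < overlap; i++) {
    const diff = a[i] - b[i];
    sum += weight * diff * diff;
  }
  return Math.sqrt(sum);
}

// wY weights the Y' channel; wC weights both Cb and Cr.
function descriptorDistanceSketch(a: Coeffs, b: Coeffs, wY = 1, wC = 1): number {
  return (
    channelDistance(a.y, b.y, wY) +
    channelDistance(a.cb, b.cb, wC) +
    channelDistance(a.cr, b.cr, wC)
  );
}
```

Because extra coefficients on the longer side are skipped, a long and a short descriptor of the same image compare as identical in this sketch.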
Roadmap to 1.0.0
- [ ] Finalize the API
- [x] Fall back to <canvas> when OffscreenCanvas is not available
- [x] Support Sharp as an input
- [x] Add benchmarks
- [x] Add documentation
- [x] Add examples
- [x] Add tests
License
MIT
Third-party licenses
Third-party licenses for the released package:
- rlsf: MIT – yvt
- once_cell: MIT – Aleksey Kladov
- The Rust Language: MIT – The Rust Project Contributors
- js-sys, wasm-bindgen: MIT – Alex Crichton
- unchecked-std: 0BSD – lincot
Refer to the LICENSE file for the full license text.
