extremely
v0.0.3
Perceptual, shift-aware UI diffing that scores screenshots the way a human would judge them.
Compare two UI screenshots and score how similar they look to a person.
extremely takes a reference screenshot and an output screenshot. It ignores the differences a human would never notice (anti-aliasing, sub-perceptual color shifts, content that only moved) and surfaces the ones that matter. Use it to grade generated UIs: it returns one score from 0 to 1, plus a list of what changed and what merely moved.
What it does:
- compares color in CIEDE2000, so only perceptible color differences count
- ignores anti-aliasing from font and edge rendering
- groups changes into regions and detects when a region only moved
- adds multiscale SSIM and GMSD to check overall structure and layout
The headline perceptualScore tracks human judgment. An imperceptible tint scores 1.0, a small layout shift scores near 1.0, and a real color or content change pulls it down.
Install
Install from npm:
npm install extremely

extremely ships prebuilt binaries for macOS, Linux, and Windows, so nothing compiles on install.
Usage
Diff two image files, read the score, then walk the regions:
import { diff } from "extremely";
const result = diff("reference.png", "output.png");
console.log(result.perceptualScore); // 0 to 1; higher is more alike
for (const region of result.regions) {
  if (region.kind === "shifted") {
    console.log("moved by", region.shift.x, region.shift.y);
  } else {
    console.log("changed, severity", region.severity);
  }
}

Set diffImage to also write an annotated PNG (changed regions in red, shifted in blue, anti-aliasing in yellow, over a dimmed reference):

diff("reference.png", "output.png", { diffImage: "diff.png" });

extremely ships with TypeScript types.
Options
diff takes an optional third argument. Each option maps to a CLI flag where one exists:
| Option | CLI flag | Default | Description |
| --- | --- | --- | --- |
| diffImage | --diff | | Write an annotated diff PNG to this path |
| colorThreshold | --threshold | 2.3 | CIEDE2000 Delta-E below which two colors count as the same |
| includeAntialiasing | --include-aa | false | Count anti-aliasing edges as real changes |
| maxShiftDistance | --max-shift | 24 | Largest translation, in pixels, searched when testing if a region moved |
| minShiftMatchRatio | | 0.9 | Share of a region one translation must explain to count as a shift |
| minRegionPixelCount | | 8 | Regions smaller than this are never treated as shifts |
| regionClusterGap | | 2 | Gap, in pixels, bridged when clustering mismatched pixels into regions |
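The option-to-flag mapping in the table can be sketched as a small helper that builds a CLI argument list from an options object. This is an illustration only: `DiffOptions` and `toCliArgs` are hypothetical names, not part of extremely, and the sketch assumes `--include-aa` is a bare switch that takes no value.

```typescript
// Hypothetical helper, not part of extremely: it only mirrors the options table.
interface DiffOptions {
  diffImage?: string;
  colorThreshold?: number;
  includeAntialiasing?: boolean;
  maxShiftDistance?: number;
  // The table lists no CLI flag for these three, so they are not forwarded.
  minShiftMatchRatio?: number;
  minRegionPixelCount?: number;
  regionClusterGap?: number;
}

function toCliArgs(reference: string, output: string, options: DiffOptions = {}): string[] {
  const args: string[] = [reference, output];
  if (options.diffImage !== undefined) args.push("--diff", options.diffImage);
  if (options.colorThreshold !== undefined) args.push("--threshold", String(options.colorThreshold));
  // Assumption: --include-aa is a switch with no value.
  if (options.includeAntialiasing) args.push("--include-aa");
  if (options.maxShiftDistance !== undefined) args.push("--max-shift", String(options.maxShiftDistance));
  return args;
}

const args = toCliArgs("reference.png", "output.png", { diffImage: "diff.png", maxShiftDistance: 12 });
console.log(["npx", "extremely", ...args].join(" "));
```

Options left undefined are skipped, so the CLI falls back to the defaults in the table.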
CLI
npx extremely <reference> <output> prints a report. It accepts the option flags listed above, plus two flags that control output rather than scoring:
| Flag | Description |
| --- | --- |
| --json | Print the result as JSON instead of a report |
| --fail-under <score> | Exit non-zero if perceptualScore is below score |
Use --fail-under to gate CI when a generated UI drifts too far from the target:
npx extremely target.png generated.png --diff diff.png --fail-under 0.95

perceptual score : 0.9783
similarity : 1.0000
ssim : 0.7728
gmsd : 0.2371
size : 80x80 (6400 px)
pixels : 0 changed, 430 shifted, 430 mismatched
regions : 0 changed, 1 shifted
[0] shifted 29x29 @ (20,20) by (5,5) match 1.00
Result
diff returns:
| Field | Type | Description |
| --- | --- | --- |
| perceptualScore | number | Headline score from 0 to 1; 1 means identical to a person. Gate on this |
| similarity | number | Share of pixels not counted as changed, 0 to 1; pixels explained by a shift do not count against it |
| ssim | number | Multiscale SSIM over CIELAB, 0 to 1 (1 is structurally identical) |
| gmsd | number | Gradient magnitude similarity deviation, 0 or higher (0 is identical edges) |
| width, height | number | Image size in pixels |
| pixels | object | Pixel counts: total, mismatched, changed, shifted |
| regions | array | One entry per differing region, described below |
Each region:
| Field | Type | Description |
| --- | --- | --- |
| kind | string | "changed" or "shifted" |
| bounds | object | Bounding box { x, y, width, height } in pixels |
| pixelCount | number | Pixels in the region |
| shift | object | { x, y, matchRatio }, present only when kind is "shifted" |
| severity | number | Perceptual severity from 0 to 1, present only when kind is "changed" |
Pure layout shifts barely move perceptualScore. Real color or content changes and broken structure pull it down, and sub-perceptual color differences are ignored entirely.
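The result shape described in the tables can be written down as types and consumed like this. It is a sketch under assumptions: the interface names (`Bounds`, `Region`, `DiffResult`) and the helpers `passesGate` and `describeRegion` are hypothetical, and the types the package ships may be named or structured differently. The sample values are copied from the CLI report above, with pixelCount assumed to equal the 430 shifted pixels because the report lists a single region.

```typescript
// Hypothetical type names; field names and meanings follow the Result tables.
interface Bounds { x: number; y: number; width: number; height: number }

interface Region {
  kind: "changed" | "shifted";
  bounds: Bounds;
  pixelCount: number;
  shift?: { x: number; y: number; matchRatio: number }; // only when kind is "shifted"
  severity?: number;                                    // only when kind is "changed"
}

interface DiffResult {
  perceptualScore: number;
  similarity: number;
  ssim: number;
  gmsd: number;
  width: number;
  height: number;
  pixels: { total: number; mismatched: number; changed: number; shifted: number };
  regions: Region[];
}

// Same rule as --fail-under: fail when the headline score is below the threshold.
function passesGate(result: DiffResult, failUnder: number): boolean {
  return result.perceptualScore >= failUnder;
}

// One line per region, in the spirit of the CLI report.
function describeRegion(region: Region): string {
  const { x, y, width, height } = region.bounds;
  const where = `${width}x${height} @ (${x},${y})`;
  if (region.kind === "shifted" && region.shift) {
    return `shifted ${where} by (${region.shift.x},${region.shift.y})`;
  }
  return `changed ${where} severity ${region.severity ?? "n/a"}`;
}

// Values from the sample report; pixelCount is an assumption (see above).
const sample: DiffResult = {
  perceptualScore: 0.9783,
  similarity: 1.0,
  ssim: 0.7728,
  gmsd: 0.2371,
  width: 80,
  height: 80,
  pixels: { total: 6400, mismatched: 430, changed: 0, shifted: 430 },
  regions: [
    {
      kind: "shifted",
      bounds: { x: 20, y: 20, width: 29, height: 29 },
      pixelCount: 430,
      shift: { x: 5, y: 5, matchRatio: 1.0 },
    },
  ],
};

console.log(passesGate(sample, 0.95)); // true: 0.9783 is not below 0.95
for (const region of sample.regions) console.log(describeRegion(region));
```

Gating on perceptualScore alone keeps a pure layout shift like this one from failing a build, while the regions list still records what moved.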
How it works
extremely scores a pair of images in six steps:
- Convert both images to CIELAB and compare pixels with CIEDE2000, so only perceptible color differences are flagged.
- Drop flagged pixels that look like anti-aliasing, treating them as rendering noise.
- Cluster the remaining mismatches into connected regions, bridging small gaps so one UI element stays one region.
- Test each region against candidate translations. If one explains the mismatch in both directions, report the region as shifted with its vector instead of changed.
- Score overall structure with multiscale SSIM over CIELAB and edge layout with GMSD.
- Combine everything into perceptualScore, forgiving the structural penalty for content already explained as a shift.
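The translation test in the fourth step can be illustrated with a simplified sketch. It is not extremely's implementation: it compares grayscale values for exact equality instead of using CIEDE2000, treats the whole mismatch as one region, and reads "both directions" as "the reference pixel reappears at the offset in the output, and the output pixel came from the opposite offset in the reference". `findShift` and the other names are hypothetical. The default values for maxShiftDistance and minShiftMatchRatio are taken from the options table.

```typescript
type Gray = number[][]; // rows of grayscale values, a stand-in for CIELAB pixels

interface Point { x: number; y: number }
interface Shift { x: number; y: number; matchRatio: number }

// Bounds-safe read: undefined outside the image.
function at(img: Gray, x: number, y: number): number | undefined {
  const row = img[y];
  return row === undefined ? undefined : row[x];
}

// A pixel is explained by (dx, dy) when both directions agree:
// reference[p] reappears at p + d in the output, and output[p] came from p - d.
function explained(ref: Gray, out: Gray, p: Point, dx: number, dy: number): boolean {
  const forward = at(out, p.x + dx, p.y + dy);
  const backward = at(ref, p.x - dx, p.y - dy);
  return forward !== undefined && backward !== undefined &&
    forward === at(ref, p.x, p.y) && backward === at(out, p.x, p.y);
}

// Try every translation within maxShiftDistance and keep the best one that
// explains at least minShiftMatchRatio of the region's pixels.
function findShift(ref: Gray, out: Gray, region: Point[],
                   maxShiftDistance = 24, minShiftMatchRatio = 0.9): Shift | null {
  let best: Shift | null = null;
  for (let dy = -maxShiftDistance; dy <= maxShiftDistance; dy++) {
    for (let dx = -maxShiftDistance; dx <= maxShiftDistance; dx++) {
      if (dx === 0 && dy === 0) continue; // a zero shift cannot explain a mismatch
      let hits = 0;
      for (const p of region) if (explained(ref, out, p, dx, dy)) hits++;
      const matchRatio = hits / region.length;
      if (matchRatio >= minShiftMatchRatio && (best === null || matchRatio > best.matchRatio)) {
        best = { x: dx, y: dy, matchRatio };
      }
    }
  }
  return best;
}

// A 24x24 square on an 80x80 canvas, moved by (5,5).
function canvas(left: number, top: number): Gray {
  return Array.from({ length: 80 }, (_, y) =>
    Array.from({ length: 80 }, (_, x) =>
      x >= left && x < left + 24 && y >= top && y < top + 24 ? 255 : 0));
}
const reference = canvas(20, 20);
const output = canvas(25, 25);

// Collect every pixel that differs between the two canvases.
const mismatched: Point[] = [];
for (let y = 0; y < 80; y++) {
  for (let x = 0; x < 80; x++) {
    if (at(reference, x, y) !== at(output, x, y)) mismatched.push({ x, y });
  }
}

// Expect 430 mismatched pixels and a shift of (5,5) with matchRatio 1.
console.log(mismatched.length, findShift(reference, output, mismatched));
```

The toy input yields 430 mismatched pixels inside a 29x29 bounding box at (20,20), the same numbers as the sample CLI report. The exhaustive search costs one pass over the region per candidate translation, which is one reason to keep maxShiftDistance small.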
License
MIT
