extremely

v0.0.3

Published

6 days ago

Perceptual, shift-aware UI diffing that scores screenshots the way a human would judge them.

abai

visual-regression image diff perceptual screenshot

extremely

Compare two UI screenshots and score how similar they look to a person.

extremely compares two screenshots, a reference and an output, and scores how alike they look to a person. It ignores the differences a human would never notice (anti-aliasing, sub-perceptual color shifts, content that only moved) and surfaces the ones that matter. Use it to grade generated UIs: one score from 0 to 1, plus a list of what changed and what merely moved.

What it does:

- compares color in CIEDE2000, so only perceptible color differences count
- ignores anti-aliasing from font and edge rendering
- groups changes into regions and detects when a region only moved
- adds multiscale SSIM and GMSD to check overall structure and layout

The headline perceptualScore tracks human judgment. An imperceptible tint scores 1.0, a small layout shift scores near 1.0, and a real color or content change pulls it down.
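
To make the idea of a perceptual color threshold concrete, here is a minimal sketch. It converts sRGB to CIELAB and measures distance with the simpler CIE76 Delta-E (plain Euclidean distance in Lab) as a stand-in; extremely itself compares in CIEDE2000, which is a longer formula. The helper names `rgbToLab`, `deltaE76`, and `sameColor` are hypothetical and not part of the package.

```typescript
type Rgb = [number, number, number];
type Lab = [number, number, number];

// Undo the sRGB transfer curve: channel in 0..255 to linear light in 0..1.
function toLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// CIELAB companding function.
function labF(t: number): number {
  return t > 216 / 24389 ? Math.cbrt(t) : (t * (24389 / 27) + 16) / 116;
}

// sRGB to CIELAB, normalized to the D65 white point.
function rgbToLab(rgb: Rgb): Lab {
  const r = toLinear(rgb[0]);
  const g = toLinear(rgb[1]);
  const b = toLinear(rgb[2]);
  const fx = labF((0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047);
  const fy = labF(0.2126729 * r + 0.7151522 * g + 0.072175 * b);
  const fz = labF((0.0193339 * r + 0.119192 * g + 0.9503041 * b) / 1.08883);
  return [116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)];
}

// CIE76 Delta-E: a simplified stand-in for CIEDE2000.
function deltaE76(a: Rgb, b: Rgb): number {
  const la = rgbToLab(a);
  const lb = rgbToLab(b);
  return Math.hypot(la[0] - lb[0], la[1] - lb[1], la[2] - lb[2]);
}

// Two colors count as the same when their distance is below the threshold.
function sameColor(a: Rgb, b: Rgb, threshold = 2.3): boolean {
  return deltaE76(a, b) < threshold;
}
```

Under this sketch a one-step tint such as `[255, 255, 255]` against `[254, 254, 254]` falls below the threshold, while red against blue does not.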

Install

Install from npm:

npm install extremely

extremely ships prebuilt binaries for macOS, Linux, and Windows, so nothing compiles on install.

Usage

Diff two image files, read the score, then walk the regions:

import { diff } from "extremely";

const result = diff("reference.png", "output.png");

console.log(result.perceptualScore); // 0 to 1; higher is more alike

for (const region of result.regions) {
  if (region.kind === "shifted") {
    console.log("moved by", region.shift.x, region.shift.y);
  } else {
    console.log("changed, severity", region.severity);
  }
}
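
When a report has many regions, it can help to surface the single worst change first. The sketch below assumes the region shape described in the Result section and declares it locally so it runs without the package; `worstChange` is a hypothetical helper, not something extremely exports.

```typescript
// Local stand-ins for the region shape described in the Result section.
interface ShiftedRegion {
  kind: "shifted";
  pixelCount: number;
  shift: { x: number; y: number; matchRatio: number };
}
interface ChangedRegion {
  kind: "changed";
  pixelCount: number;
  severity: number;
}
type Region = ShiftedRegion | ChangedRegion;

// Return the changed region with the highest severity, ignoring shifts.
// Returns undefined when nothing changed (for example, a pure layout shift).
function worstChange(regions: Region[]): ChangedRegion | undefined {
  let worst: ChangedRegion | undefined;
  for (const region of regions) {
    if (region.kind !== "changed") continue;
    if (worst === undefined || region.severity > worst.severity) {
      worst = region;
    }
  }
  return worst;
}
```

You could pass `result.regions` to it after a diff and log the returned region, or treat `undefined` as "nothing but movement".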

Set diffImage to also write an annotated PNG (changed regions in red, shifted in blue, anti-aliasing in yellow, over a dimmed reference):

diff("reference.png", "output.png", { diffImage: "diff.png" });

extremely ships with TypeScript types.

Options

diff takes an optional third argument. Each option maps to a CLI flag where one exists:

| Option | CLI flag | Default | Description |
| --- | --- | --- | --- |
| diffImage | --diff | | Write an annotated diff PNG to this path |
| colorThreshold | --threshold | 2.3 | CIEDE2000 Delta-E below which two colors count as the same |
| includeAntialiasing | --include-aa | false | Count anti-aliasing edges as real changes |
| maxShiftDistance | --max-shift | 24 | Largest translation, in pixels, searched when testing if a region moved |
| minShiftMatchRatio | | 0.9 | Share of a region one translation must explain to count as a shift |
| minRegionPixelCount | | 8 | Regions smaller than this are never treated as shifts |
| regionClusterGap | | 2 | Gap, in pixels, bridged when clustering mismatched pixels into regions |
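
If you keep one options object and want the same settings on the command line, the option-to-flag mapping in the table can be applied mechanically. This is a sketch under assumptions: `DiffOptions` and `toCliArgs` are hypothetical names, and `--include-aa` is assumed to be a bare switch that takes no value.

```typescript
// Hypothetical local type listing the options from the table above.
interface DiffOptions {
  diffImage?: string;
  colorThreshold?: number;
  includeAntialiasing?: boolean;
  maxShiftDistance?: number;
  minShiftMatchRatio?: number;
  minRegionPixelCount?: number;
  regionClusterGap?: number;
}

// Translate options into CLI arguments using the table's mapping.
// Options with no listed flag (minShiftMatchRatio, minRegionPixelCount,
// regionClusterGap) are skipped.
function toCliArgs(options: DiffOptions): string[] {
  const args: string[] = [];
  if (options.diffImage !== undefined) {
    args.push("--diff", options.diffImage);
  }
  if (options.colorThreshold !== undefined) {
    args.push("--threshold", String(options.colorThreshold));
  }
  // Assumption: --include-aa is a boolean switch with no value.
  if (options.includeAntialiasing) {
    args.push("--include-aa");
  }
  if (options.maxShiftDistance !== undefined) {
    args.push("--max-shift", String(options.maxShiftDistance));
  }
  return args;
}
```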

CLI

npx extremely <reference> <output> prints a report and accepts every option above that has a CLI flag, plus two flags that control output rather than scoring:

| Flag | Description |
| --- | --- |
| --json | Print the result as JSON instead of a report |
| --fail-under <score> | Exit non-zero if perceptualScore is below score |

Use --fail-under to gate CI when a generated UI drifts too far from the target:

npx extremely target.png generated.png --diff diff.png --fail-under 0.95

perceptual score : 0.9783
similarity       : 1.0000
ssim             : 0.7728
gmsd             : 0.2371
size             : 80x80 (6400 px)
pixels           : 0 changed, 430 shifted, 430 mismatched
regions          : 0 changed, 1 shifted
  [0] shifted  29x29 @ (20,20)  by (5,5) match 1.00
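
The same gate can be applied in a script that consumes the `--json` output. The sketch below assumes the JSON report is an object with a numeric `perceptualScore` field, matching the Result section; `exitCodeFor` is a hypothetical helper, not part of extremely.

```typescript
// Decide an exit code from a JSON report and a minimum acceptable score.
// Assumption: the --json report carries a numeric perceptualScore field.
function exitCodeFor(jsonReport: string, failUnder: number): 0 | 1 {
  const parsed: unknown = JSON.parse(jsonReport);
  const score = (parsed as { perceptualScore?: unknown } | null)?.perceptualScore;
  if (typeof score !== "number") {
    throw new Error("report has no numeric perceptualScore");
  }
  // Mirrors --fail-under: fail only when the score is strictly below the bar.
  return score < failUnder ? 1 : 0;
}
```

With the report above (score 0.9783), a bar of 0.95 passes and a bar of 0.99 fails.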

Result

diff returns:

| Field | Type | Description |
| --- | --- | --- |
| perceptualScore | number | Headline score from 0 to 1; 1 means a person would see the two images as identical. Gate on this |
| similarity | number | Share of pixels left unchanged, shifts excluded, 0 to 1 |
| ssim | number | Multiscale SSIM over CIELAB, 0 to 1 (1 is structurally identical) |
| gmsd | number | Gradient magnitude similarity deviation, 0 or higher (0 is identical edges) |
| width, height | number | Image size in pixels |
| pixels | object | Pixel counts: total, mismatched, changed, shifted |
| regions | array | One entry per differing region, described below |

Each region:

| Field | Type | Description |
| --- | --- | --- |
| kind | string | "changed" or "shifted" |
| bounds | object | Bounding box { x, y, width, height } in pixels |
| pixelCount | number | Pixels in the region |
| shift | object | { x, y, matchRatio }, present only when kind is "shifted" |
| severity | number | Perceptual severity from 0 to 1, present only when kind is "changed" |

Pure layout shifts barely move perceptualScore. Real color or content changes and broken structure pull it down, and sub-perceptual color differences are ignored entirely.
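
The pixel counts alone are often enough for a coarse verdict. The sketch below assumes `pixels.mismatched` counts only differences that survived the color and anti-aliasing filters, and that `pixels.changed` excludes shifted pixels, as the sample CLI report suggests; `classify` and its labels are hypothetical.

```typescript
// Local stand-in for the `pixels` field described in the Result table.
interface PixelCounts {
  total: number;
  mismatched: number;
  changed: number;
  shifted: number;
}

type Verdict = "no visible difference" | "only moved" | "changed";

// Coarse verdict from pixel counts (hypothetical helper).
function classify(pixels: PixelCounts): Verdict {
  if (pixels.mismatched === 0) return "no visible difference";
  if (pixels.changed === 0) return "only moved";
  return "changed";
}
```

For the counts in the CLI report above (0 changed, 430 shifted, 430 mismatched), this returns "only moved".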

How it works

extremely scores a pair of images in six steps:

1. Convert both images to CIELAB and compare pixels with CIEDE2000, so only perceptible color differences are flagged.
2. Drop flagged pixels that look like anti-aliasing, treating them as rendering noise.
3. Cluster the remaining mismatches into connected regions, bridging small gaps so one UI element stays one region.
4. Test each region against candidate translations. If one explains the mismatch in both directions, report the region as shifted with its vector instead of changed.
5. Score overall structure with multiscale SSIM over CIELAB and edge layout with GMSD.
6. Combine everything into perceptualScore, forgiving the structural penalty for content already explained as a shift.
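
The shift test is the least obvious step, so here is a toy sketch of the idea. It is not extremely's implementation: it works on small grids of plain numbers instead of CIELAB pixels, treats the whole image as one region, and searches every translation exhaustively. A translation "explains" a mismatched pixel in both directions when the output value there came from the reference one step back, and the reference value there reappears in the output one step forward. `findShift` and its parameters are hypothetical.

```typescript
type Grid = number[][];
interface Shift { x: number; y: number; matchRatio: number }

// Safe lookup: undefined when (x, y) is outside the grid.
const at = (g: Grid, x: number, y: number): number | undefined => g[y]?.[x];

// Search translations within maxShift and return the best one that explains
// at least minMatchRatio of the mismatched pixels, or undefined if none does.
// Assumes both grids have the same dimensions.
function findShift(
  reference: Grid,
  output: Grid,
  maxShift: number,
  minMatchRatio = 0.9,
): Shift | undefined {
  const mismatched: Array<[number, number]> = [];
  for (let y = 0; y < reference.length; y++) {
    for (let x = 0; x < reference[y].length; x++) {
      if (reference[y][x] !== output[y][x]) mismatched.push([x, y]);
    }
  }
  if (mismatched.length === 0) return undefined;

  let best: Shift | undefined;
  for (let dy = -maxShift; dy <= maxShift; dy++) {
    for (let dx = -maxShift; dx <= maxShift; dx++) {
      if (dx === 0 && dy === 0) continue;
      let explained = 0;
      for (const [x, y] of mismatched) {
        // Output content here arrived from the reference at (x - dx, y - dy).
        const arrived = at(output, x, y) === at(reference, x - dx, y - dy);
        // Reference content here departed to the output at (x + dx, y + dy).
        const departed = at(reference, x, y) === at(output, x + dx, y + dy);
        if (arrived && departed) explained++;
      }
      const matchRatio = explained / mismatched.length;
      if (matchRatio >= minMatchRatio && (best === undefined || matchRatio > best.matchRatio)) {
        best = { x: dx, y: dy, matchRatio };
      }
    }
  }
  return best;
}
```

A block that moves one pixel to the right is reported as a shift, while a pixel that changes value in place matches no translation and would stay a changed region.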

License

MIT
