@mikeyfrilot/websketch-ir

v0.1.0, published 14 days ago

Grammar-based web UI representation for LLM consumption. Stop treating webpages like pictures.


mcptoolshop

websketch ui dom llm ai browser-automation web-agent accessibility visual-regression mcp

@websketch/ir

"Stop treating webpages like pictures."

A grammar-based representation of web UI for LLM consumption. Instead of screenshots, WebSketch IR captures the intent and structure of a webpage in a compact, deterministic format that LLMs can reason about directly.

Why?

Current AI browser agents rely on:

Screenshots - expensive, lossy, and dependent on vision models
Raw DOM - verbose, full of framework noise, and hard to reason about

WebSketch IR compiles the DOM into a small vocabulary of UI primitives (NAV, FORM, BUTTON, INPUT, etc.) with normalized geometry. The result is:

Invariant - stable across CSS changes, framework differences, and viewport sizes
Compact - roughly 100x smaller than a screenshot
Readable - an ASCII rendering that an LLM can understand without a vision model
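To make "compiling into a small vocabulary" concrete, here is a minimal sketch of what a classification step could look like. The rules, the `ElementInfo` shape, and the `classify` function are hypothetical illustrations, not the package's compiler. They map an element's ARIA role, tag name, or input type onto a primitive, with explicit ARIA roles taking precedence because they state intent directly.

```typescript
// Hypothetical classification rules; the package's real compiler may differ.
type Primitive =
  | "NAV" | "HEADER" | "FOOTER" | "SECTION" | "FORM" | "LIST" | "TABLE"
  | "MODAL" | "INPUT" | "BUTTON" | "LINK" | "DROPDOWN" | "CHECKBOX"
  | "RADIO" | "IMAGE" | "TEXT";

interface ElementInfo {
  tag: string;        // lowercase tag name, e.g. "button"
  ariaRole?: string;  // value of the element's role attribute, if present
  inputType?: string; // type attribute, only meaningful for <input>
}

type RuleTable = { [key: string]: Primitive };

// Explicit ARIA roles state intent directly, so they are checked first.
const ARIA_RULES: RuleTable = {
  navigation: "NAV", banner: "HEADER", contentinfo: "FOOTER", form: "FORM",
  list: "LIST", table: "TABLE", dialog: "MODAL", textbox: "INPUT",
  button: "BUTTON", link: "LINK", combobox: "DROPDOWN", checkbox: "CHECKBOX",
  radio: "RADIO", img: "IMAGE",
};

// Fallback rules keyed on the tag name alone.
const TAG_RULES: RuleTable = {
  nav: "NAV", header: "HEADER", footer: "FOOTER", form: "FORM", ul: "LIST",
  ol: "LIST", table: "TABLE", dialog: "MODAL", textarea: "INPUT",
  button: "BUTTON", a: "LINK", select: "DROPDOWN", img: "IMAGE", p: "TEXT",
};

// Own-property lookup, so keys like "constructor" are not read from the prototype.
function lookup(table: RuleTable, key: string): Primitive | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

function classify(el: ElementInfo): Primitive {
  const byRole = el.ariaRole !== undefined ? lookup(ARIA_RULES, el.ariaRole) : undefined;
  if (byRole !== undefined) return byRole;
  if (el.tag === "input") {
    switch (el.inputType) {
      case "checkbox": return "CHECKBOX";
      case "radio": return "RADIO";
      case "submit":
      case "button": return "BUTTON";
      default: return "INPUT";
    }
  }
  // Unrecognized elements (e.g. a bare <div>) become a generic SECTION.
  const byTag = lookup(TAG_RULES, el.tag);
  return byTag !== undefined ? byTag : "SECTION";
}
```

Falling back to a generic SECTION keeps wrapper elements in the tree, so a later pass can decide whether to collapse them.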

Installation

npm install @websketch/ir

Quick Example

import * as fs from 'node:fs';
import { renderAscii, diff, fingerprintCapture } from '@websketch/ir';

// Load a capture (from browser extension or other source)
const capture = JSON.parse(fs.readFileSync('capture.json', 'utf-8'));

// Render to ASCII (LLM-readable)
console.log(renderAscii(capture));

// Output:
// ┌──────────────────────────────────────────────────────────────────────────────┐
// │[NAV:primary_nav]                                                             │
// ├──────────────────────────────────────────────────────────────────────────────┤
// │                    ┌────────────────────────────────────────┐                │
// │                    │[FRM:login]                             │                │
// │                    │  [INP:email]                           │                │
// │                    │  [INP:password]                        │                │
// │                    │  [BTN:primary_cta]                     │                │
// │                    └────────────────────────────────────────┘                │
// └──────────────────────────────────────────────────────────────────────────────┘

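// Compare two captures (file names are illustrative)
const captureA = JSON.parse(fs.readFileSync('before.json', 'utf-8'));
const captureB = JSON.parse(fs.readFileSync('after.json', 'utf-8'));
const result = diff(captureA, captureB);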
console.log(result.summary);
// { added: 2, removed: 0, moved: 3, resized: 1, text_changed: 1 }

// Get structural fingerprint
console.log(fingerprintCapture(capture));
// "e33442b6"

API

Rendering

// ASCII rendering (default 80x24)
renderAscii(capture, options?)

// LLM-optimized format with metadata and legend
renderForLLM(capture)

// Minimal structure-only view
renderStructure(capture, width?, height?)
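`renderAscii` is documented as defaulting to an 80x24 grid. As a rough illustration of how normalized geometry could land on such a grid, the hypothetical `toCells` helper below rounds each bbox edge to the nearest character cell. It is an assumption for explanation only, not the package's renderer.

```typescript
// Hypothetical sketch: map a 0-1 normalized bbox onto a width x height text grid.
type BBox = [x: number, y: number, w: number, h: number];

interface CellRect {
  col: number;  // leftmost column
  row: number;  // top row
  cols: number; // width in characters (at least 1)
  rows: number; // height in lines (at least 1)
}

function toCells(bbox: BBox, width = 80, height = 24): CellRect {
  const [x, y, w, h] = bbox;
  const col = Math.round(x * width);
  const row = Math.round(y * height);
  // Round the far edges separately so boxes that touch in normalized space
  // also touch on the grid.
  const right = Math.round((x + w) * width);
  const bottom = Math.round((y + h) * height);
  return {
    col,
    row,
    cols: Math.max(1, right - col),
    rows: Math.max(1, bottom - row),
  };
}
```

Tiny elements are kept at a minimum of one cell so they stay visible in the rendering.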

Fingerprinting

// Full structural fingerprint (includes text)
fingerprintCapture(capture)

// Layout-only fingerprint (ignores text changes)
fingerprintLayout(capture)
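The difference between the two fingerprints can be pictured as hashing the same canonical tree with or without text. The sketch below is an assumption, not the package's scheme. It serializes each node's role and quantized bbox, optionally appends a per-node text hash (the hypothetical `textHash` field stands in for the text signal), and hashes the result with 32-bit FNV-1a to get an 8-hex-digit string of the same shape as the fingerprint in the Quick Example.

```typescript
// Hypothetical fingerprint sketch; the package's real hashing scheme is not documented.
interface FpNode {
  role: string;
  bbox: [number, number, number, number]; // normalized 0-1
  textHash?: string;                      // stand-in for the text signal (assumption)
  children?: FpNode[];
}

// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII).
function fnv1a32(s: string): string {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    // Multiply by the FNV prime 0x01000193 modulo 2^32 using shifts.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

function serialize(n: FpNode, includeText: boolean): string {
  // Quantize geometry to 2 decimals so sub-pixel jitter does not change the hash.
  const box = n.bbox.map((v) => v.toFixed(2)).join(",");
  const text = includeText && n.textHash ? ":" + n.textHash : "";
  const kids = (n.children || []).map((c) => serialize(c, includeText)).join(";");
  return `${n.role}[${box}]${text}{${kids}}`;
}

// Full fingerprint: structure, geometry, and text.
const fingerprintFull = (root: FpNode) => fnv1a32(serialize(root, true));
// Layout-only fingerprint: identical trees with different text hash the same.
const fingerprintLayoutOnly = (root: FpNode) => fnv1a32(serialize(root, false));
```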

Diffing

// Compare two captures
const result = diff(captureA, captureB, {
  includeText: true,      // Include text changes (default: true)
  matchThreshold: 0.5,    // Node matching threshold (default: 0.5)
  topChangesLimit: 10,    // Max top changes to return (default: 10)
});

// Human-readable diff report
formatDiff(result)

// JSON output
formatDiffJson(result)
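The `matchThreshold` option suggests that nodes are paired across captures by a similarity score before changes are counted. The sketch below assumes that score is the intersection-over-union (IoU) of two same-role bboxes and pairs nodes greedily. The `FlatNode` shape and the `summarize` function are hypothetical, and they only count `added`, `removed`, `moved`, and `resized`.

```typescript
// Hypothetical matching sketch; the package's real algorithm is not documented.
type Box = [x: number, y: number, w: number, h: number];

interface FlatNode {
  id: string;
  role: string;
  bbox: Box; // normalized 0-1, as in the capture format
}

// Intersection-over-union: 1 for identical boxes, 0 for disjoint ones.
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const union = a[2] * a[3] + b[2] * b[3] - inter;
  return union > 0 ? inter / union : 0;
}

function summarize(before: FlatNode[], after: FlatNode[], matchThreshold = 0.5) {
  const remaining = after.slice(); // nodes in `after` not yet paired
  let removed = 0;
  let moved = 0;
  let resized = 0;
  for (const a of before) {
    // Greedily pick the best same-role candidate whose IoU clears the threshold.
    let bestIdx = -1;
    let bestScore = -1;
    for (let i = 0; i < remaining.length; i++) {
      const b = remaining[i];
      if (b.role !== a.role) continue;
      const score = iou(a.bbox, b.bbox);
      if (score >= matchThreshold && score > bestScore) {
        bestIdx = i;
        bestScore = score;
      }
    }
    if (bestIdx === -1) {
      removed++; // no counterpart in the new capture
      continue;
    }
    const match = remaining.splice(bestIdx, 1)[0];
    if (match.bbox[0] !== a.bbox[0] || match.bbox[1] !== a.bbox[1]) moved++;
    if (match.bbox[2] !== a.bbox[2] || match.bbox[3] !== a.bbox[3]) resized++;
  }
  // Whatever is left in `after` had no counterpart in `before`.
  return { added: remaining.length, removed, moved, resized };
}
```

A higher threshold makes matching stricter. With this kind of scheme, a node that moves far enough is then reported as one removal plus one addition rather than as a move.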

Grammar

WebSketch IR uses a small vocabulary of UI primitives:

| Role | Description |
|------|-------------|
| PAGE | Root container |
| NAV | Navigation (sidebar, navbar, menu) |
| HEADER | Page/section header |
| FOOTER | Page/section footer |
| SECTION | Generic content section |
| FORM | Form container |
| LIST | Repeated items |
| CARD | Content card/tile |
| TABLE | Tabular data |
| MODAL | Modal dialog |
| TOAST | Notification |
| INPUT | Text input |
| BUTTON | Action trigger |
| LINK | Navigation trigger |
| DROPDOWN | Select/menu |
| CHECKBOX | Boolean toggle |
| RADIO | Single-select option |
| IMAGE | Visual content |
| TEXT | Text block |
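`UIRole` is referenced in the capture format but not defined in this README. Assuming the role strings are exactly the names in the table above, one way to express the vocabulary in TypeScript is a const tuple, a derived union type, and a type guard for validating roles read from JSON. This is a sketch, not the package's own definition.

```typescript
// Assumed definition: the role names from the Grammar table, verbatim.
const UI_ROLES = [
  "PAGE", "NAV", "HEADER", "FOOTER", "SECTION", "FORM", "LIST", "CARD",
  "TABLE", "MODAL", "TOAST", "INPUT", "BUTTON", "LINK", "DROPDOWN",
  "CHECKBOX", "RADIO", "IMAGE", "TEXT",
] as const;

// Union of the literal strings above: "PAGE" | "NAV" | ... | "TEXT".
type UIRole = (typeof UI_ROLES)[number];

// Narrow an arbitrary string (e.g. parsed from a capture file) to a known role.
function isUIRole(value: string): value is UIRole {
  return (UI_ROLES as readonly string[]).indexOf(value) !== -1;
}
```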

Capture Format (v0.1)

interface WebSketchCapture {
  version: "0.1";
  url: string;
  timestamp_ms: number;
  viewport: {
    w_px: number;
    h_px: number;
    aspect: number;
  };
  compiler: {
    name: string;
    version: string;
    options_hash: string;
  };
  root: UINode;
}

interface UINode {
  id: string;
  role: UIRole;
  bbox: [x: number, y: number, w: number, h: number]; // 0-1 normalized
  visible: boolean;
  interactive: boolean;
  semantic?: string;      // e.g., "login", "search", "primary_cta"
  text?: TextSignal;      // hash + length, not content
  children?: UINode[];
}
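The format only states that `bbox` is normalized to 0-1. The sketch below assumes each value is the `getBoundingClientRect()` pixel measurement divided by the viewport width or height, and that off-screen portions are clamped away. `RectLike` and `normalizeBBox` are hypothetical names.

```typescript
// Assumption: bbox = pixel rect / viewport size, clamped to the visible area.
interface RectLike {
  x: number;      // same fields as a DOMRect from getBoundingClientRect()
  y: number;
  width: number;
  height: number;
}

type NormBBox = [x: number, y: number, w: number, h: number];

const clamp01 = (v: number) => Math.min(1, Math.max(0, v));

function normalizeBBox(rect: RectLike, viewport: { w_px: number; h_px: number }): NormBBox {
  // Normalize both corners, clamp them, then derive width and height,
  // so a partly off-screen element keeps only its visible extent.
  const x0 = clamp01(rect.x / viewport.w_px);
  const y0 = clamp01(rect.y / viewport.h_px);
  const x1 = clamp01((rect.x + rect.width) / viewport.w_px);
  const y1 = clamp01((rect.y + rect.height) / viewport.h_px);
  return [x0, y0, x1 - x0, y1 - y0];
}
```

Because every value is relative to the viewport, the same layout captured at different window sizes yields comparable boxes, which is what the "Invariant" property relies on.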

Related Packages

websketch - CLI tool
websketch-extension - Chrome extension for capture
websketch-mcp - MCP tools for AI agents

Design Principles

Rendered layout is truth - use getBoundingClientRect, not DOM positions
UI intent > DOM structure - compile to small primitive vocabulary
Normalize aggressively - remove stylistic noise; keep geometry + interactivity
Stable under "div soup" - framework wrappers collapse automatically
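The "div soup" principle can be illustrated with a small bottom-up tree rewrite. In this hypothetical sketch (the package's actual collapsing rules are not documented), a `SECTION` that is not interactive, has no semantic label, and wraps exactly one child is replaced by that child. `SketchNode` and `collapseWrappers` are illustrative names.

```typescript
// Hypothetical collapsing rule; the real compiler's criteria may differ.
interface SketchNode {
  role: string;
  interactive: boolean;
  semantic?: string;
  children?: SketchNode[];
}

function collapseWrappers(node: SketchNode): SketchNode {
  // Collapse children first, so nested wrappers disappear in one pass.
  const children = (node.children || []).map(collapseWrappers);
  const isWrapper =
    node.role === "SECTION" &&
    !node.interactive &&
    !node.semantic &&
    children.length === 1;
  return isWrapper ? children[0] : { ...node, children };
}
```

Keeping labeled or interactive sections intact means only wrappers that contribute no intent are removed.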

License

MIT

Author

MCP Tool Shop