@mikeyfrilot/websketch-ir
v0.1.0
Grammar-based web UI representation for LLM consumption. Stop treating webpages like pictures.
@websketch/ir
"Stop treating webpages like pictures."
A grammar-based representation of web UI for LLM consumption. Instead of screenshots, WebSketch IR captures the intent and structure of a webpage in a compact, deterministic format that LLMs can reason about directly.
Why?
Current AI browser agents rely on:
- Screenshots - expensive, lossy, requires vision models
- Raw DOM - verbose, framework noise, hard to reason about
WebSketch IR compiles the DOM into a small vocabulary of UI primitives (NAV, FORM, BUTTON, INPUT, etc.) with normalized geometry. The result is:
- Invariant - stable across CSS changes, framework differences, viewport sizes
- Compact - ~100x smaller than screenshots
- Readable - ASCII rendering an LLM can understand without vision
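To illustrate what "normalized geometry" could mean in practice, here is a minimal sketch that scales a pixel rectangle into 0-1 viewport-relative coordinates. The names `PixelRect`, `Viewport`, and `normalizeRect` are hypothetical and not part of the package, and the rounding step is an assumption made for this example.

```typescript
// Hypothetical helper types; PixelRect mirrors the fields a browser reports for an element.
type PixelRect = { x: number; y: number; width: number; height: number };
type Viewport = { w_px: number; h_px: number };
type BBox = [x: number, y: number, w: number, h: number];

// Scale a pixel rectangle into 0-1 viewport-relative coordinates.
// Rounding to 3 decimals is an assumption made here to absorb sub-pixel jitter.
function normalizeRect(rect: PixelRect, viewport: Viewport): BBox {
  const round = (n: number): number => Math.round(n * 1000) / 1000;
  return [
    round(rect.x / viewport.w_px),
    round(rect.y / viewport.h_px),
    round(rect.width / viewport.w_px),
    round(rect.height / viewport.h_px),
  ];
}

// The same proportional layout measured at two viewport sizes.
const desktop = normalizeRect(
  { x: 480, y: 270, width: 960, height: 540 },
  { w_px: 1920, h_px: 1080 },
);
const laptop = normalizeRect(
  { x: 320, y: 180, width: 640, height: 360 },
  { w_px: 1280, h_px: 720 },
);

console.log(JSON.stringify(desktop)); // [0.25,0.25,0.5,0.5]
console.log(JSON.stringify(laptop));  // [0.25,0.25,0.5,0.5]
```

Both rectangles collapse to the same normalized box because the layout scales proportionally; a layout that reflows at a different viewport size would still produce different boxes.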
Installation
npm install @websketch/ir
Quick Example
import fs from 'node:fs';
import { renderAscii, diff, fingerprintCapture } from '@websketch/ir';
// Load a capture (from browser extension or other source)
const capture = JSON.parse(fs.readFileSync('capture.json', 'utf-8'));
// Render to ASCII (LLM-readable)
console.log(renderAscii(capture));
// Output:
// ┌──────────────────────────────────────────────────────────────────────────────┐
// │[NAV:primary_nav] │
// ├──────────────────────────────────────────────────────────────────────────────┤
// │ ┌────────────────────────────────────────┐ │
// │ │[FRM:login] │ │
// │ │ [INP:email] │ │
// │ │ [INP:password] │ │
// │ │ [BTN:primary_cta] │ │
// │ └────────────────────────────────────────┘ │
// └──────────────────────────────────────────────────────────────────────────────┘
// Compare two captures (captureA and captureB are loaded the same way as capture)
const result = diff(captureA, captureB);
console.log(result.summary);
// { added: 2, removed: 0, moved: 3, resized: 1, text_changed: 1 }
// Get structural fingerprint
console.log(fingerprintCapture(capture));
// "e33442b6"
API
Rendering
// ASCII rendering (default 80x24)
renderAscii(capture, options?)
// LLM-optimized format with metadata and legend
renderForLLM(capture)
// Minimal structure-only view
renderStructure(capture, width?, height?)
Fingerprinting
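A fingerprint condenses a capture into a short string, so two captures can be compared without walking both trees. The package's hashing scheme is not described in this README, so the sketch below is only an illustration: it assumes a depth-first serialization and a 32-bit FNV-1a hash (applied to UTF-16 code units rather than bytes). The names `SketchNode`, `serialize`, `structuralFingerprint`, and `layoutFingerprint` are hypothetical, as is the `{ hash, len }` shape of the text signal. It shows why a text-inclusive fingerprint and a layout-only one can disagree.

```typescript
// Minimal node shape for this sketch; the fields of `text` are assumed.
type SketchNode = {
  role: string;
  bbox: [number, number, number, number];
  text?: { hash: string; len: number };
  children?: SketchNode[];
};

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex characters.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Serialize the tree depth-first; text signals are included only on request.
function serialize(node: SketchNode, withText: boolean): string {
  const box = node.bbox.map((n) => n.toFixed(2)).join(",");
  const text = withText && node.text ? `#${node.text.hash}:${node.text.len}` : "";
  const kids = (node.children ?? []).map((c) => serialize(c, withText)).join("");
  return `(${node.role}@${box}${text}${kids})`;
}

const structuralFingerprint = (root: SketchNode): string => fnv1a(serialize(root, true));
const layoutFingerprint = (root: SketchNode): string => fnv1a(serialize(root, false));

const before: SketchNode = {
  role: "FORM",
  bbox: [0.25, 0.2, 0.5, 0.6],
  children: [{ role: "BUTTON", bbox: [0.3, 0.7, 0.4, 0.05], text: { hash: "a1b2", len: 7 } }],
};
// Same layout, but the button label (and therefore its text hash) changed.
const after: SketchNode = {
  ...before,
  children: [{ role: "BUTTON", bbox: [0.3, 0.7, 0.4, 0.05], text: { hash: "a1b3", len: 7 } }],
};

console.log(layoutFingerprint(before) === layoutFingerprint(after));         // true
console.log(structuralFingerprint(before) === structuralFingerprint(after)); // false
```

In this sketch, a copy change alters only the text-inclusive fingerprint, while moving or resizing a node would alter both.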
// Full structural fingerprint (includes text)
fingerprintCapture(capture)
// Layout-only fingerprint (ignores text changes)
fingerprintLayout(capture)
Diffing
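Diffing two captures requires pairing nodes from one tree with nodes from the other before changes can be classified. How the package does this is not documented here; the sketch below shows one plausible reading of the `matchThreshold` option listed in this section, assuming nodes are paired by role and by bounding-box overlap (intersection over union). `FlatNode`, `iou`, and `matchNodes` are hypothetical names, and the greedy strategy is an assumption for illustration only.

```typescript
type Box = [x: number, y: number, w: number, h: number];
type FlatNode = { id: string; role: string; bbox: Box };

// Intersection-over-union of two normalized boxes (0 = disjoint, 1 = identical).
function iou(a: Box, b: Box): number {
  const overlapW = Math.max(0, Math.min(a[0] + a[2], b[0] + b[2]) - Math.max(a[0], b[0]));
  const overlapH = Math.max(0, Math.min(a[1] + a[3], b[1] + b[3]) - Math.max(a[1], b[1]));
  const intersection = overlapW * overlapH;
  const union = a[2] * a[3] + b[2] * b[3] - intersection;
  return union > 0 ? intersection / union : 0;
}

// Greedy matching: each old node takes the best unclaimed new node of the same role,
// provided the overlap score reaches the threshold.
function matchNodes(oldNodes: FlatNode[], newNodes: FlatNode[], threshold = 0.5) {
  const claimed = new Set<string>();
  const matched: Array<[string, string]> = [];
  const removed: string[] = [];
  for (const o of oldNodes) {
    let best: FlatNode | undefined;
    let bestScore = threshold;
    for (const n of newNodes) {
      if (claimed.has(n.id) || n.role !== o.role) continue;
      const score = iou(o.bbox, n.bbox);
      if (score >= bestScore) {
        best = n;
        bestScore = score;
      }
    }
    if (best) {
      claimed.add(best.id);
      matched.push([o.id, best.id]);
    } else {
      removed.push(o.id);
    }
  }
  const added = newNodes.filter((n) => !claimed.has(n.id)).map((n) => n.id);
  return { matched, added, removed };
}

const oldNodes: FlatNode[] = [
  { id: "btn1", role: "BUTTON", bbox: [0.3, 0.7, 0.4, 0.05] },
  { id: "link1", role: "LINK", bbox: [0.1, 0.9, 0.2, 0.05] },
];
const newNodes: FlatNode[] = [
  { id: "btnA", role: "BUTTON", bbox: [0.3, 0.71, 0.4, 0.05] }, // nudged down slightly
  { id: "input1", role: "INPUT", bbox: [0.3, 0.5, 0.4, 0.05] },
];

const matchResult = matchNodes(oldNodes, newNodes);
console.log(JSON.stringify(matchResult));
// {"matched":[["btn1","btnA"]],"added":["input1"],"removed":["link1"]}
```

Under this reading, raising the threshold makes matching stricter: at 0.9 the nudged button would no longer pair up and would be reported as one removal plus one addition.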
// Compare two captures
const result = diff(captureA, captureB, {
  includeText: true,    // Include text changes (default: true)
  matchThreshold: 0.5,  // Node matching threshold (default: 0.5)
  topChangesLimit: 10,  // Max top changes to return (default: 10)
});
// Human-readable diff report
formatDiff(result)
// JSON output
formatDiffJson(result)
Grammar
WebSketch IR uses a small vocabulary of UI primitives:
| Role | Description |
|------|-------------|
| PAGE | Root container |
| NAV | Navigation (sidebar, navbar, menu) |
| HEADER | Page/section header |
| FOOTER | Page/section footer |
| SECTION | Generic content section |
| FORM | Form container |
| LIST | Repeated items |
| CARD | Content card/tile |
| TABLE | Tabular data |
| MODAL | Modal dialog |
| TOAST | Notification |
| INPUT | Text input |
| BUTTON | Action trigger |
| LINK | Navigation trigger |
| DROPDOWN | Select/menu |
| CHECKBOX | Boolean toggle |
| RADIO | Single-select option |
| IMAGE | Visual content |
| TEXT | Text block |
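The role table can be expressed as a TypeScript union, which is handy for validating captures from untrusted sources. This is a sketch built from the table alone: `UI_ROLES` and `isUIRole` are hypothetical names, and the package may define or export its role type differently.

```typescript
// The 19 roles from the table, as a readonly tuple.
const UI_ROLES = [
  "PAGE", "NAV", "HEADER", "FOOTER", "SECTION", "FORM", "LIST", "CARD", "TABLE",
  "MODAL", "TOAST", "INPUT", "BUTTON", "LINK", "DROPDOWN", "CHECKBOX", "RADIO",
  "IMAGE", "TEXT",
] as const;

// Union of the string literals above: "PAGE" | "NAV" | ... | "TEXT".
type UIRole = (typeof UI_ROLES)[number];

// Type guard for narrowing arbitrary strings, e.g. values read from parsed JSON.
function isUIRole(value: string): value is UIRole {
  return (UI_ROLES as readonly string[]).indexOf(value) !== -1;
}

console.log(isUIRole("BUTTON")); // true
console.log(isUIRole("DIV"));    // false
```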
Capture Format (v0.1)
interface WebSketchCapture {
  version: "0.1";
  url: string;
  timestamp_ms: number;
  viewport: {
    w_px: number;
    h_px: number;
    aspect: number;
  };
  compiler: {
    name: string;
    version: string;
    options_hash: string;
  };
  root: UINode;
}
interface UINode {
  id: string;
  role: UIRole;
  bbox: [x: number, y: number, w: number, h: number]; // 0-1 normalized
  visible: boolean;
  interactive: boolean;
  semantic?: string; // e.g., "login", "search", "primary_cta"
  text?: TextSignal; // hash + length, not content
  children?: UINode[];
}
Related Packages
- websketch - CLI tool
- websketch-extension - Chrome extension for capture
- websketch-mcp - MCP tools for AI agents
Design Principles
- Rendered layout is truth - use getBoundingClientRect, not DOM positions
- UI intent > DOM structure - compile to small primitive vocabulary
- Normalize aggressively - remove stylistic noise; keep geometry + interactivity
- Stable under "div soup" - framework wrappers collapse automatically
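The "div soup" principle can be pictured as a tree transform: wrappers that carry no UI meaning are dropped and their descendants are hoisted to the nearest meaningful ancestor. The sketch below assumes roles have already been assigned upstream (with `null` marking a meaningless wrapper). `RawNode`, `CompiledNode`, `collapse`, and `el` are hypothetical names, and the package's actual collapse rules are not described here.

```typescript
// Hypothetical pre-compilation node: `role` is null for elements with no UI meaning.
type RawNode = { tag: string; role: string | null; children: RawNode[] };
type CompiledNode = { role: string; children: CompiledNode[] };

// Drop role-less wrappers and hoist their descendants to the nearest meaningful ancestor.
function collapse(node: RawNode): CompiledNode[] {
  const kids: CompiledNode[] = [];
  for (const child of node.children) {
    kids.push(...collapse(child));
  }
  return node.role === null ? kids : [{ role: node.role, children: kids }];
}

// Small builder to keep the example trees readable.
const el = (tag: string, role: string | null, children: RawNode[] = []): RawNode => ({
  tag,
  role,
  children,
});

// A login form buried in framework wrappers...
const wrapped = el("div", null, [
  el("div", null, [
    el("form", "FORM", [
      el("div", null, [el("input", "INPUT")]),
      el("button", "BUTTON"),
    ]),
  ]),
]);
// ...and the same form written without them.
const bare = el("form", "FORM", [el("input", "INPUT"), el("button", "BUTTON")]);

console.log(JSON.stringify(collapse(wrapped)) === JSON.stringify(collapse(bare))); // true
```

Both inputs compile to a single FORM node containing an INPUT and a BUTTON, which is the sense in which the output stays stable when wrappers are added or removed.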
License
MIT
