@rastaweb/domoscope
v1.0.0
Domoscope is an HTML diff engine with intelligent DOM comparison, configurable tracking, and comprehensive statistics
🔍 Domoscope
Advanced HTML diff engine with intelligent DOM comparison and comprehensive change statistics.
Domoscope is a TypeScript library for comparing HTML content with intelligent element matching, word-level text diffing, and detailed change tracking.
📦 Installation
npm install @rastaweb/domoscope
yarn add @rastaweb/domoscope
pnpm add @rastaweb/domoscope
Requirements: Node.js ≥16.0.0, TypeScript ≥4.5.0 (optional)
🚀 Quick Start
Basic Usage
import { getCustomDiffStats } from '@rastaweb/domoscope';
const oldHTML = '<div><p>Original content</p></div>';
const newHTML = '<div><p>Modified content</p><span>Added content</span></div>';
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML);
console.log(`Added ${stats.totalAddedTags} elements`);
console.log(`Removed ${stats.totalRemovedTags} elements`);
console.log(`Changed ${stats.totalChangedTags} elements`);
TypeScript Usage
import {
getCustomDiffStats,
compareElements,
formatTagStatsSummary,
type DiffStats,
type ExtendedCompareOptions,
} from '@rastaweb/domoscope';
const options: ExtendedCompareOptions = {
addedClass: 'highlight-added',
removedClass: 'highlight-removed',
watchedTags: ['img', 'a', 'button'],
minSimilarityThreshold: 0.3,
};
const result = getCustomDiffStats(oldHTML, newHTML, options);
const summary = formatTagStatsSummary(result.stats);
📚 API Reference
Core Functions
getCustomDiffStats(oldHTML, newHTML, options?)
High-level function that parses HTML, performs diff, and returns both modified DOM and statistics.
function getCustomDiffStats(
oldHTML: string,
newHTML: string,
options?: ExtendedCompareOptions
): DiffResultWithStats;
Parameters:
- oldHTML (string): Original HTML content
- newHTML (string): Modified HTML content
- options (ExtendedCompareOptions, optional): Configuration options
Returns: Object with diffResult and stats properties
Example:
const result = getCustomDiffStats('<div>Old</div>', '<div>New</div>', {
addedClass: 'added',
removedClass: 'removed',
});
compareElements(oldElements, newElements, options?)
Compare arrays of DOM elements with intelligent pairing and recursive diffing.
function compareElements(
oldElements: Element[],
newElements: Element[],
options?: ExtendedCompareOptions
): void;
Parameters:
- oldElements (Element[]): Array of original elements
- newElements (Element[]): Array of modified elements
- options (ExtendedCompareOptions, optional): Configuration options
Side Effects: Modifies the DOM in-place with diff annotations
Example:
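import { compareElements, stringToFlatTree } from '@rastaweb/domoscope';
const oldTree = stringToFlatTree('<div><p>Content</p></div>');
const newTree = stringToFlatTree('<div><p>New content</p></div>');
// Annotates the parsed elements in place; nothing is returned
compareElements(oldTree.rootElements, newTree.rootElements);
formatTagStatsSummary(stats)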
Generate human-readable summary of diff statistics.
function formatTagStatsSummary(stats: DiffStats): string;
Parameters:
- stats (DiffStats): Statistics object from diff operation
Returns: Multi-line string with formatted statistics
Example:
const summary = formatTagStatsSummary(stats);
console.log(summary); // "DOMOSCOPE DIFF STATISTICS\n Added: 2 elements..."
getChangedTagsList(stats)
Extract list of changed tags with their attributes.
function getChangedTagsList(stats: DiffStats): Array<{
tagName: string;
count: number;
changedAttributes: string[];
}>;
Utility Functions
stringToFlatTree(html)
Parse HTML string into structured tree representation.
function stringToFlatTree(html: string): {
rootElements: Element[];
allElements: Element[];
};
Time Complexity: O(n) where n is the number of DOM nodes
Space Complexity: O(n)
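The shape of that return value can be illustrated without a DOM. The sketch below is not the library's parser: `SketchNode` and `flattenTree` are hypothetical names, and the input is an already-parsed plain-object tree. It only shows how one depth-first pass can produce both lists in O(n) time and space.

```typescript
// Hypothetical stand-in for a parsed element; not a Domoscope type.
interface SketchNode {
  tag: string;
  children: SketchNode[];
}

// Collect the roots and every node in document order with one DFS pass.
function flattenTree(roots: SketchNode[]): {
  rootElements: SketchNode[];
  allElements: SketchNode[];
} {
  const allElements: SketchNode[] = [];
  // An explicit stack avoids recursion limits on deep trees.
  const stack: SketchNode[] = [...roots].reverse();
  while (stack.length > 0) {
    const node = stack.pop() as SketchNode;
    allElements.push(node);
    // Push children in reverse so they are visited left to right.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return { rootElements: roots, allElements };
}

const tree: SketchNode[] = [
  {
    tag: 'div',
    children: [
      { tag: 'p', children: [] },
      { tag: 'span', children: [] },
    ],
  },
];
const flat = flattenTree(tree);
console.log(flat.allElements.map((n) => n.tag)); // div, then p, then span
```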
validateHTML(html)
Validate HTML string and return parsing information.
function validateHTML(html: string): {
isValid: boolean;
errors: string[];
};
Algorithm Functions
computeLCS(a, b)
Compute Longest Common Subsequence using dynamic programming.
function computeLCS(a: string[], b: string[]): Array<[number, number]>;
Time Complexity: O(a × b)
Space Complexity: O(min(a, b))
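For readers unfamiliar with the technique, here is a minimal textbook sketch of an LCS that returns index pairs in the same shape as the signature above. It is an illustration, not Domoscope's implementation: `lcsPairs` is a hypothetical name, and this version keeps the full table, so it uses O(a × b) space rather than the O(min(a, b)) quoted above.

```typescript
// Hypothetical helper: returns [indexInA, indexInB] pairs of one LCS.
function lcsPairs(a: string[], b: string[]): Array<[number, number]> {
  // dp[i][j] = LCS length of the suffixes a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] =
        a[i] === b[j]
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  // Walk the table from the top-left to recover the matched indices.
  const pairs: Array<[number, number]> = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

console.log(lcsPairs(['a', 'b', 'c', 'd'], ['b', 'd', 'e'])); // [[1, 0], [3, 1]]
```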
elementSimilarity(elementA, elementB, enableMemoization?)
Calculate similarity score between two DOM elements.
function elementSimilarity(
elementA: Element,
elementB: Element,
enableMemoization?: boolean
): number;
Returns: Similarity score (higher = more similar)
Scoring Algorithm:
- ID exact match: +10 points
- Tag name match: +5 points
- Class overlap: +N points (N = shared classes)
- Attribute similarity: +0.5 × N points
- Text content overlap: +0.3 × N points
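The weights above can be applied to plain objects to see how a score adds up. This is a sketch under assumptions, not the library's `elementSimilarity`: `ElementLike` and `similaritySketch` are hypothetical, N is taken to mean "attributes with the same name and value" and "distinct shared words", and tag comparison is assumed case-insensitive.

```typescript
// Hypothetical plain-object stand-in for a DOM element.
interface ElementLike {
  id?: string;
  tag: string;
  classes: string[];
  attrs: Record<string, string>;
  text: string;
}

// Number of distinct values present in both arrays.
function countShared(a: string[], b: string[]): number {
  const inB = new Set(b);
  return new Set(a.filter((x) => inB.has(x))).size;
}

function similaritySketch(a: ElementLike, b: ElementLike): number {
  let score = 0;
  // ID exact match: +10 (only when an id is actually present)
  if (a.id !== undefined && a.id !== '' && a.id === b.id) score += 10;
  // Tag name match: +5
  if (a.tag.toLowerCase() === b.tag.toLowerCase()) score += 5;
  // Class overlap: +1 per shared class
  score += countShared(a.classes, b.classes);
  // Attribute similarity: +0.5 per attribute with identical name and value
  const sameAttrs = Object.keys(a.attrs).filter(
    (name) => b.attrs[name] === a.attrs[name]
  ).length;
  score += 0.5 * sameAttrs;
  // Text overlap: +0.3 per distinct shared word
  const words = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter(Boolean);
  score += 0.3 * countShared(words(a.text), words(b.text));
  return score;
}

const demoScore = similaritySketch(
  { id: 'hero', tag: 'div', classes: ['card', 'big'], attrs: { role: 'banner' }, text: 'Hello world' },
  { id: 'hero', tag: 'DIV', classes: ['card'], attrs: { role: 'banner', lang: 'en' }, text: 'Hello there' }
);
console.log(demoScore.toFixed(1)); // "16.8" = 10 + 5 + 1 + 0.5 + 0.3
```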
computeWordDiff(oldText, newText)
Perform word-level diff on text content.
function computeWordDiff(oldText: string, newText: string): Token[];
Returns: Array of tokens with change types (equal, added, removed)
Performance Functions
clearCaches()
Clear all memoization caches to free memory.
function clearCaches(): void;
getCacheStats()
Get cache performance statistics.
function getCacheStats(): {
lcsCache: { size: number; hits: number; misses: number };
similarityCache: { size: number; hits: number; misses: number };
};
getPerformanceMetrics()
Get detailed performance metrics.
function getPerformanceMetrics(): PerformanceMetrics;
⚙️ Configuration Options
ExtendedCompareOptions
Complete configuration interface combining style, tracking, and performance options.
interface ExtendedCompareOptions {
// Style Configuration
addedClass?: string; // Default: "diff-added"
removedClass?: string; // Default: "diff-removed"
elementChangeClass?: string; // Default: "diff-elem-changed"
attributeChangeClass?: string; // Default: "diff-attr-changed"
wrapperTag?: string; // Default: "span"
textWrapperTag?: string; // Default: same as wrapperTag
addedWrapperTag?: string; // Default: same as wrapperTag
removedWrapperTag?: string; // Default: same as wrapperTag
changedWrapperTag?: string; // Default: same as wrapperTag
// Tracking Configuration
watchedTags?: string[]; // Tags to track for special handling
trackedTags?: string[] | Record<string, string[]>; // Tags and attributes to track
trackedAttributes?: string[]; // Global attribute filter
// Performance Configuration
maxTextLength?: number; // Default: 10000
minSimilarityThreshold?: number; // Default: 0
enableMemoization?: boolean; // Default: true
ignoreWhitespaceTexts?: boolean; // Default: false
// Custom Handlers
onElementChange?: ElementChangeHandler;
}
Common Configuration Examples
// Basic styling
const styleConfig = {
addedClass: 'highlight-green',
removedClass: 'highlight-red',
wrapperTag: 'mark',
};
// Performance optimization
const performanceConfig = {
minSimilarityThreshold: 0.3,
maxTextLength: 5000,
enableMemoization: true,
};
// Selective tracking
const trackingConfig = {
watchedTags: ['img', 'a', 'button'],
trackedAttributes: ['href', 'src', 'class', 'id'],
};
// Combined configuration
const fullConfig = {
...styleConfig,
...performanceConfig,
...trackingConfig,
};
🔄 Algorithm Flow & Implementation
System Overview
flowchart TD
A[HTML Input] --> B[HTML Parsing]
B --> C[Element Arrays]
C --> D[Element Matching]
D --> E[Recursive Comparison]
E --> F[Text Diffing]
F --> G[Statistics Collection]
G --> H[Annotated DOM + Stats]
subgraph "Element Matching Algorithm"
D1[Similarity Matrix] --> D2[Best Match Selection]
D2 --> D3[Pairing Results]
end
subgraph "Text Diffing Process"
F1[Tokenization] --> F2[LCS Computation]
F2 --> F3[Token Classification]
F3 --> F4[DOM Annotation]
end
D --> D1
F --> F1
Core Diff Algorithm Steps
- HTML Parsing: Parse input strings into DOM element trees using stringToFlatTree()
- Element Pool Creation: Create sets of old and new elements for matching
- Similarity Computation: Calculate similarity scores using multi-factor algorithm
- Element Pairing: Find optimal element matches using similarity thresholds
- Recursive Processing: For paired elements, recursively compare child nodes
- LCS Alignment: Align child nodes using Longest Common Subsequence algorithm
- Text Diffing: Perform word-level diff on text content with Unicode support
- DOM Annotation: Apply CSS classes and wrapper elements to indicate changes
- Statistics Collection: Gather comprehensive metrics about detected changes
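The pairing step in the list above can be sketched independently of the DOM. The version below is a simplification and an assumption: it pairs greedily by descending score rather than searching for a provably optimal assignment, it treats the threshold as a strict lower bound, and `pairBySimilarity` and `sharedWords` are hypothetical names.

```typescript
interface Pairing<T> {
  pairs: Array<[T, T]>;
  unmatchedOld: T[]; // would be marked as removed
  unmatchedNew: T[]; // would be marked as added
}

// Greedy pairing: score every old/new combination, then take the best
// remaining candidate until none is left above the threshold.
function pairBySimilarity<T>(
  oldItems: T[],
  newItems: T[],
  score: (a: T, b: T) => number,
  minScore: number
): Pairing<T> {
  const candidates: Array<{ i: number; j: number; s: number }> = [];
  oldItems.forEach((a, i) => {
    newItems.forEach((b, j) => {
      const s = score(a, b);
      if (s > minScore) candidates.push({ i, j, s });
    });
  });
  candidates.sort((x, y) => y.s - x.s);

  const usedOld = new Set<number>();
  const usedNew = new Set<number>();
  const pairs: Array<[T, T]> = [];
  for (const c of candidates) {
    if (usedOld.has(c.i) || usedNew.has(c.j)) continue;
    usedOld.add(c.i);
    usedNew.add(c.j);
    pairs.push([oldItems[c.i], newItems[c.j]]);
  }
  return {
    pairs,
    unmatchedOld: oldItems.filter((_, i) => !usedOld.has(i)),
    unmatchedNew: newItems.filter((_, j) => !usedNew.has(j)),
  };
}

// Toy score: number of words of a that also occur in b.
const sharedWords = (a: string, b: string): number => {
  const wordsB = new Set(b.split(' '));
  return a.split(' ').filter((w) => wordsB.has(w)).length;
};

const pairing = pairBySimilarity(
  ['intro text here', 'footer links'],
  ['footer links updated', 'brand new block', 'intro text changed'],
  sharedWords,
  0
);
console.log(pairing.pairs.length, pairing.unmatchedNew); // 2 [ 'brand new block' ]
```

Unmatched old items correspond to the "removed" case and unmatched new items to the "added" case in the flow described above.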
Element Similarity Algorithm
flowchart LR
A[Element A] --> C[Similarity Calculator]
B[Element B] --> C
C --> D[ID Match: +10]
C --> E[Tag Match: +5]
C --> F[Class Overlap: +N]
C --> G[Attribute Similarity: +0.5N]
C --> H[Text Overlap: +0.3N]
C --> I[Structure Score: +1]
D --> J[Total Score]
E --> J
F --> J
G --> J
H --> J
I --> J
Text Diffing Process
- Tokenization: Split text into words and punctuation using Unicode-aware regex
- LCS Computation: Find longest common subsequence of tokens
- Classification: Mark tokens as equal, added, or removed
- Merging: Combine consecutive tokens of same type
- DOM Generation: Create document fragments with appropriate wrapper elements
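The first four steps above can be sketched in a few dozen lines. This is an illustrative reconstruction, not the library's `computeWordDiff`: the regex, the `SketchToken` type, and the choice to rejoin merged words with a single space (which discards the original whitespace) are all assumptions, and DOM generation is left out.

```typescript
type ChangeType = 'equal' | 'added' | 'removed';
interface SketchToken {
  type: ChangeType;
  text: string;
}

// Step 1: words (letters, marks, digits) and single punctuation marks.
function tokenize(text: string): string[] {
  return text.match(/[\p{L}\p{M}\p{N}]+|[^\s\p{L}\p{M}\p{N}]/gu) ?? [];
}

function wordDiffSketch(oldText: string, newText: string): SketchToken[] {
  const a = tokenize(oldText);
  const b = tokenize(newText);

  // Step 2: dp[i][j] = LCS length of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] =
        a[i] === b[j]
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Steps 3 and 4: classify each token and merge runs of the same type.
  const out: SketchToken[] = [];
  const push = (type: ChangeType, word: string): void => {
    const last = out[out.length - 1];
    if (last !== undefined && last.type === type) last.text += ' ' + word;
    else out.push({ type, text: word });
  };

  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      push('equal', a[i]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      push('removed', a[i]);
      i++;
    } else {
      push('added', b[j]);
      j++;
    }
  }
  while (i < a.length) push('removed', a[i++]);
  while (j < b.length) push('added', b[j++]);
  return out;
}

console.log(wordDiffSketch('This product is good', 'This product is excellent'));
// equal "This product is", removed "good", added "excellent"
```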
📊 Data Types
DiffStats
Comprehensive statistics about detected changes.
interface DiffStats {
totalChangedTags: number; // Elements with tag/attribute changes
totalAddedTexts: number; // Added text spans/nodes
totalRemovedTexts: number; // Removed text spans/nodes
totalAddedTags: number; // Newly added elements
totalRemovedTags: number; // Removed elements
totalAddedWords: number; // Total words added
totalRemovedWords: number; // Total words removed
addedTags?: Record<string, number>; // Per-tag addition counts
removedTags?: Record<string, number>; // Per-tag removal counts
changedTags?: Record<
string,
{
// Per-tag change details
count: number;
changedAttributes: string[];
}
>;
}
DiffResultWithStats
Complete result including both DOM modifications and statistics.
interface DiffResultWithStats {
diffResult: {
oldRootElements: Element[]; // Root elements from old content
newRootElements: Element[]; // Root elements from new content
rootElements: Element[]; // All root elements (compatibility)
allElements: Element[]; // All elements from both trees
};
stats: DiffStats;
}
Token
Individual unit in word-level diff.
interface Token {
type: 'equal' | 'added' | 'removed';
text: string;
}
Error Handling & Common Issues
HTML Parsing Errors
// Validate HTML before processing
const validation = validateHTML(htmlString);
if (!validation.isValid) {
console.error('HTML validation failed:', validation.errors);
}
Performance Issues
// For large documents, adjust performance settings
const options = {
maxTextLength: 1000, // Limit text diff size
minSimilarityThreshold: 0.5, // Raise threshold for faster matching
enableMemoization: true, // Enable caching
};
Memory Management
// Clear caches periodically for long-running applications
import { clearCaches } from '@rastaweb/domoscope';
clearCaches(); // Frees all memoization memory
Common Pitfalls
- Large Text Blocks: Word-level diffing becomes slow with very large text. Use the maxTextLength option.
- Memory Leaks: Clear caches in long-running applications to prevent memory growth.
- Invalid HTML: Always validate HTML input, especially from user sources.
- Case Sensitivity: Element tag names are case-insensitive, but attributes are case-sensitive.
🎯 Best Practices & Performance Tips
Optimal Configuration
// For content management systems
const cmsConfig = {
watchedTags: ['img', 'a', 'video', 'iframe'],
trackedAttributes: ['src', 'href', 'class'],
minSimilarityThreshold: 0.3,
maxTextLength: 5000,
};
// For code diffs (strict matching via a high similarity threshold)
const codeConfig = {
minSimilarityThreshold: 0.8,
enableMemoization: true,
wrapperTag: 'mark',
};
// For large documents (performance focused)
const performanceConfig = {
minSimilarityThreshold: 0.5,
maxTextLength: 2000,
enableMemoization: true,
ignoreWhitespaceTexts: true,
};
Memory Optimization
// Monitor cache performance
const cacheStats = getCacheStats();
if (cacheStats.lcsCache.size > 1000) {
clearCaches();
}
// Disable memoization for one-time operations
getCustomDiffStats(oldHTML, newHTML, { enableMemoization: false });
DOM Structure Recommendations
- Use semantic HTML for better element matching
- Include stable id attributes for important elements
- Use consistent class naming for similar content types
- Avoid deeply nested structures when possible
🔧 Runtime Behavior & Lifecycle
Initialization
// No global initialization is required (the only shared state is the optional memoization caches)
import { getCustomDiffStats } from '@rastaweb/domoscope';
// Each function call is independent
const result = getCustomDiffStats(html1, html2);
Memory Management
- Caches: LCS and similarity computations are memoized by default
- Cleanup: Caches auto-expire after 5 minutes
- Size Limits: Caches are limited to 1000 entries each
- Manual Control: Use clearCaches() for explicit cleanup
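To make the expiry and size-limit behaviour concrete, here is a small sketch of a cache with a time-to-live and an entry cap. It is not Domoscope's cache: `TtlCache` is a hypothetical class, the eviction policy (drop the oldest inserted entry) is an assumption, and the defaults simply mirror the 1000-entry and 5-minute figures listed above.

```typescript
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;
  hits = 0;
  misses = 0;

  constructor(
    maxEntries = 1000,
    ttlMs = 5 * 60 * 1000,
    now: () => number = () => Date.now() // injectable clock for testing
  ) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) {
      // Expired entries are removed lazily, on lookup.
      if (entry !== undefined) this.entries.delete(key);
      this.misses++;
      return undefined;
    }
    this.hits++;
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key); // re-inserting refreshes insertion order
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }

  clear(): void {
    this.entries.clear();
    this.hits = 0;
    this.misses = 0;
  }
}

let fakeTime = 0;
const cache = new TtlCache<string, number>(2, 100, () => fakeTime);
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // cap of 2 reached, so 'a' is evicted
const evicted = cache.get('a'); // undefined
const fresh = cache.get('b'); // 2
fakeTime = 101;
const expired = cache.get('b'); // undefined, the entry outlived its TTL
console.log(evicted, fresh, expired, cache.hits, cache.misses);
```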
Concurrency Model
- Synchronous: All operations are synchronous - no async/await needed
- Thread Safe: Functions share no mutable state other than the memoization caches
- Browser Compatible: Works in both Node.js and browser environments
🧪 Examples & Use Cases
Content Management System
// Track content changes in CMS
const { stats } = getCustomDiffStats(originalArticle, editedArticle, {
watchedTags: ['img', 'a', 'blockquote'],
trackedAttributes: ['src', 'href', 'alt'],
});
console.log(
`Article edited: ${stats.totalAddedWords} words added, ${stats.totalRemovedWords} removed`
);
Version Control Interface
// Show file differences in version control UI
const { diffResult } = getCustomDiffStats(oldVersion, newVersion, {
addedClass: 'git-added',
removedClass: 'git-removed',
wrapperTag: 'mark',
});
// Render diffResult.rootElements in UI
Automated Testing
// Assert content changes in tests
const { stats } = getCustomDiffStats(beforeHTML, afterHTML);
expect(stats.totalAddedTags).toBe(1);
expect(stats.addedTags?.button).toBe(1);
Email Template Comparison
// Compare email template versions
const { stats } = getCustomDiffStats(template1, template2, {
watchedTags: ['img', 'a', 'table'],
trackedAttributes: ['src', 'href', 'style', 'width', 'height'],
});
const report = formatTagStatsSummary(stats);
📈 Performance Characteristics
Algorithm Complexity
| Operation | Time Complexity | Space Complexity | Notes |
| --------------------- | --------------- | ---------------- | -------------------------- |
| HTML Parsing | O(n) | O(n) | n = DOM nodes |
| Element Matching | O(n×m×k) | O(n+m) | k = similarity computation |
| LCS Computation | O(a×b) | O(min(a,b)) | a,b = token arrays |
| Text Tokenization | O(t) | O(tokens) | t = text length |
| Statistics Collection | O(elements) | O(tags) | Linear scan |
Memory Usage
- Base Library: ~50KB minified
- Cache Memory: ~1MB max (auto-managed)
- DOM Overhead: Proportional to input size
- Peak Usage: 3-5x input HTML size during processing
Performance Benchmarks
- Small Documents (<1KB): <1ms
- Medium Documents (10KB): 10-50ms
- Large Documents (100KB): 100-500ms
- Very Large Documents (1MB+): Use performance settings
⚙️ Compatibility & Requirements
Environment Support
- Node.js: ≥16.0.0
- TypeScript: ≥4.5.0 (optional)
- Browsers: Modern browsers with ES2022 support
- Module Formats: ESM only (use type: "module")
Dependencies
- Runtime: None (zero dependencies)
- Peer Dependencies: TypeScript ≥4.5.0 (optional)
- Dev Dependencies: Jest, TypeScript, ESLint, Prettier
Browser Compatibility
- Chrome: ≥91
- Firefox: ≥90
- Safari: ≥14
- Edge: ≥91
🧪 Testing & Examples
Running Tests
npm test # Run test suite
npm run test:watch # Watch mode
npm run test:coverage # Coverage report
Example Projects
- examples/comprehensive-examples.mjs - Complete usage examples
- playground/playground.html - Interactive browser demo
- tests/simple.test.js - Basic functionality tests
Test Output Example
PASS tests/simple.test.js
✓ should perform basic diff operation (5ms)
✓ should generate statistics summary (3ms)
✓ should handle empty content (2ms)
Test Suites: 1 passed, 1 total
Tests: 3 passed, 3 total
📋 API Surface Summary
| Export | Type | Description |
| ----------------------- | -------- | ----------------------------- |
| getCustomDiffStats | Function | Main high-level diff function |
| compareElements | Function | Element array comparison |
| formatTagStatsSummary | Function | Statistics formatting |
| getChangedTagsList | Function | Extract changed tag info |
| stringToFlatTree | Function | HTML parsing utility |
| validateHTML | Function | HTML validation |
| computeLCS | Function | LCS algorithm |
| elementSimilarity | Function | Element similarity scoring |
| computeWordDiff | Function | Word-level text diff |
| clearCaches | Function | Memory management |
| getCacheStats | Function | Cache performance metrics |
| DiffEngine | Class | Core diff engine |
| StatsCollector | Class | Statistics collection |
| ConfigBuilder | Class | Fluent configuration |
| ConfigPresets | Object | Predefined configurations |
📄 License
MIT License - see LICENSE file for details.
Repository: https://github.com/rastaweb/domoscope
Issues: https://github.com/rastaweb/domoscope/issues
Author: kamran taghinejad
- LCS Algorithm: Optimized Longest Common Subsequence implementation with dynamic programming
- Text-Level Diffing: Word-by-word and character-level comparison with tokenization
- Element Similarity Scoring: Multi-factor scoring including tag names, attributes, and content
🎨 Configuration & Customization
- Fluent Builder API: ConfigBuilder with method chaining for easy configuration
- Configuration Presets: Pre-built configurations for common scenarios (CMS, forms, navigation, performance)
- Flexible Tracking: Configurable tag and attribute tracking with wildcard support
- Custom CSS Classes: Configurable styling for added, removed, and changed content
- Wrapper Element Control: Customizable HTML wrapper tags for different change types
- Element Change Handlers: Custom callbacks for handling specific element changes
📊 Advanced Statistics & Analytics
- Comprehensive Change Metrics: Detailed statistics with per-tag breakdown
- Performance Monitoring: Built-in timing and cache performance metrics
- Accurate Change Counting: Precise statistics that count element changes once (not per DOM tree)
- Changed Tags Analysis: Detailed tracking of which tags and attributes changed
- Statistics Formatting: Human-readable summary formatting for debugging and reporting
⚡ Performance & Optimization
- Memoization & Caching: Advanced caching with configurable TTL and size limits
- Dynamic Programming: Space-optimized algorithms for large content comparison
- Cache Management: Manual cache control with statistics and configuration
- Performance Metrics: Detailed timing breakdown for pairing, LCS, and text diffing
- Configurable Thresholds: Similarity thresholds and text length limits for optimization
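As an illustration of what "space-optimized" dynamic programming can mean here, the sketch below computes only the LCS length while keeping two rows of the table, so memory is proportional to the shorter input. `lcsLength` is a hypothetical helper and is not claimed to be how the library stores its table; recovering the matched indices as well needs extra work beyond this two-row scheme.

```typescript
// Two-row LCS length: O(a × b) time, O(min(a, b)) space.
function lcsLength(a: string[], b: string[]): number {
  // Use the shorter array for the row dimension to minimise memory.
  const [longer, shorter] = a.length >= b.length ? [a, b] : [b, a];
  let prev = new Array<number>(shorter.length + 1).fill(0);
  let curr = new Array<number>(shorter.length + 1).fill(0);

  for (let i = 1; i <= longer.length; i++) {
    for (let j = 1; j <= shorter.length; j++) {
      curr[j] =
        longer[i - 1] === shorter[j - 1]
          ? prev[j - 1] + 1
          : Math.max(prev[j], curr[j - 1]);
    }
    // Reuse the two buffers instead of allocating a new row.
    [prev, curr] = [curr, prev];
  }
  return prev[shorter.length];
}

console.log(lcsLength('ABCBDAB'.split(''), 'BDCABA'.split(''))); // 4
```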
🧩 Architecture & Engineering
- Modular Architecture: SOLID principles with dependency inversion and clean interfaces
- TypeScript First: Complete type safety with branded types and strict null checking
- ES Modules: Modern module system with proper exports and imports
- Universal Compatibility: Browser & Node.js support with ES modules and CommonJS
- Extensible Design: Plugin-friendly architecture for custom extensions
🌍 Text & Internationalization
- Unicode Support: Enhanced tokenization for international text and complex scripts
- Multi-language Text Processing: Persian, Arabic, Chinese, and complex script handling
- Smart Tokenization: Context-aware text splitting with punctuation and whitespace handling
- HTML Validation: Built-in HTML parsing and validation utilities
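Why Unicode awareness matters for tokenization can be shown with two regular expressions. This is a sketch, not the library's tokenizer: both function names are hypothetical, and the pattern is one plausible way to split words from punctuation using Unicode property escapes.

```typescript
// Unicode-aware: letters, combining marks, and digits from any script
// form words; every other non-space character is its own token.
function tokenizeUnicode(text: string): string[] {
  return text.match(/[\p{L}\p{M}\p{N}]+|[^\s\p{L}\p{M}\p{N}]/gu) ?? [];
}

// ASCII-only baseline: \w covers just [A-Za-z0-9_].
function tokenizeAscii(text: string): string[] {
  return text.match(/\w+/g) ?? [];
}

const persian = 'سلام دنیا!';
console.log(tokenizeUnicode(persian).length); // 3: two words and "!"
console.log(tokenizeAscii(persian).length); // 0: no ASCII word characters

console.log(tokenizeUnicode('naïve café, 2024')); // [ 'naïve', 'café', ',', '2024' ]
```

Scripts written without spaces between words, such as Chinese, come out of this pattern as one long token per run of characters, so they would need a different segmentation strategy.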
🔧 Developer Experience
- Interactive Playground: Built-in HTML playground for testing and experimentation
- Algorithm Transparency: Detailed flow documentation with visual algorithm diagrams
- Comprehensive API: Multiple levels of API from high-level to low-level utilities
- Error Handling: Robust error handling with detailed error messages
- Configuration Validation: Built-in validation for configuration options
🔬 Algorithm Flow Diagram
The core diff algorithm follows a sophisticated multi-stage process:
flowchart TD
subgraph Input["🔄 Input Processing"]
A1[HTML String 1] --> B1[Parse & Validate]
A2[HTML String 2] --> B2[Parse & Validate]
B1 --> C1[stringToFlatTree]
B2 --> C2[stringToFlatTree]
C1 --> D1[Element Arrays]
C2 --> D2[Element Arrays]
end
subgraph Matching["🎯 Element Matching Phase"]
D1 --> E[Element Pool Creation]
D2 --> E
E --> F[Similarity Matrix Computation]
F --> G{elementSimilarity}
G --> H[Best Match Selection]
H --> I[Pairing Results]
subgraph SimilarityAlgo["📏 Similarity Algorithm"]
G1[ID Exact Match: +10]
G2[Tag Name Match: +5]
G3[Class Overlap: +N]
G4[Attribute Similarity: +0.5*N]
G5[Text Token Overlap: +0.3*N]
G6[Structure Similarity: +1]
end
G --> SimilarityAlgo
end
subgraph Processing["⚙️ Diff Processing Phase"]
I --> J[Paired Elements]
I --> K[Unmatched Old]
I --> L[Unmatched New]
J --> M{compareNode}
K --> N[Mark as Removed]
L --> O[Mark as Added]
subgraph NodeComparison["🔍 Node Comparison"]
M1[Element Change Detection]
M2[Child Node Alignment]
M3[LCS Algorithm]
M4[Text Content Diffing]
M5[Recursive Processing]
end
M --> NodeComparison
end
subgraph LCS["📐 LCS Algorithm Detail"]
P1[Build Node Keys]
P2[Dynamic Programming Matrix]
P3[Optimal Path Backtracking]
P4[Match Sequence Generation]
P1 --> P2
P2 --> P3
P3 --> P4
end
subgraph TextDiff["📝 Text Diffing Algorithm"]
T1[tokenize Text]
T2[LCS on Tokens]
T3[Build Diff Tokens]
T4[Merge Consecutive]
T5[fragmentFromTokens]
T1 --> T2
T2 --> T3
T3 --> T4
T4 --> T5
end
subgraph Output["📊 Output Generation"]
Q1[DOM with Annotations]
Q2[Statistics Collection]
Q3[Performance Metrics]
Q4[Formatted Results]
end
NodeComparison --> LCS
NodeComparison --> TextDiff
N --> Q1
O --> Q1
NodeComparison --> Q1
Q1 --> Q2
Q2 --> Q3
Q3 --> Q4
style Input fill:#e1f5fe
style Matching fill:#f3e5f5
style Processing fill:#e8f5e8
style LCS fill:#fff3e0
style TextDiff fill:#fce4ec
style Output fill:#f1f8e9
📦 Installation
# npm
npm install @rastaweb/domoscope
# yarn
yarn add @rastaweb/domoscope
# pnpm
pnpm add @rastaweb/domoscope
# bun
bun add @rastaweb/domoscope
Browser Usage
<!-- ES Modules (Recommended) -->
<script type="module">
import { getCustomDiffStats } from './node_modules/@rastaweb/domoscope/dist/index.js';
window.domoscope = { getCustomDiffStats };
</script>
<!-- Legacy Browser Support -->
<script type="module" src="./node_modules/@rastaweb/domoscope/browser-bundle.js"></script>
CDN Usage
<script type="module">
import { getCustomDiffStats } from 'https://unpkg.com/@rastaweb/domoscope/dist/index.js';
</script>
🚀 Quick Start
Basic Usage
import { getCustomDiffStats, formatTagStatsSummary } from '@rastaweb/domoscope';
const oldHTML = '<div><p>Original content</p></div>';
const newHTML = '<div><p>Modified content</p><img src="new.jpg" alt="New image"></div>';
// Generate diff with comprehensive statistics
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML);
// Display the annotated results
document.body.appendChild(diffResult.rootElements[0]); // Old version with diff highlights
document.body.appendChild(diffResult.rootElements[1]); // New version with diff highlights
// Show detailed statistics
console.log(formatTagStatsSummary(stats));
// Output:
// ═════════════════════════════════════
// DOMOSCOPE DIFF STATISTICS
// ═════════════════════════════════════
// Total Changed Tags: 2
// Total Elements: 4
// Performance: 1.23ms
// ═════════════════════════════════════
Advanced Configuration
import { ConfigBuilder, getCustomDiffStats, getPerformanceMetrics } from '@rastaweb/domoscope';
// Use fluent configuration API
const config = new ConfigBuilder()
.watchTags('div', 'p', 'span')
.trackAttributes('class', 'id', 'data-value')
.withPerformance({
minSimilarityThreshold: 0.7,
enableMemoization: true,
maxTextLength: 10000,
})
.build();
const result = getCustomDiffStats(oldHTML, newHTML, config);
// Access performance metrics
const metrics = getPerformanceMetrics();
console.log(`LCS computation: ${metrics.lcsTime}ms`);
console.log(`Element pairing: ${metrics.pairingTime}ms`);
console.log(`Cache efficiency: ${metrics.cacheHits}/${metrics.cacheMisses}`);
Preset Configurations
import { ConfigPresets, getCustomDiffStats } from '@rastaweb/domoscope';
// Basic configuration with minimal tracking
const basicResult = getCustomDiffStats(oldHTML, newHTML, ConfigPresets.basic());
// Content Management System optimized
const cmsResult = getCustomDiffStats(oldHTML, newHTML, ConfigPresets.cms());
// Form elements comparison
const formsResult = getCustomDiffStats(oldHTML, newHTML, ConfigPresets.forms());
// Navigation elements comparison
const navResult = getCustomDiffStats(oldHTML, newHTML, ConfigPresets.navigation());
// Performance-focused (minimal tracking)
const fastResult = getCustomDiffStats(oldHTML, newHTML, ConfigPresets.performance());
Configuration Presets
import { getCustomDiffStats, ConfigPresets } from '@rastaweb/domoscope';
// Use preset configurations for common scenarios
const cmsConfig = ConfigPresets.cms(); // Content management optimized
const formConfig = ConfigPresets.forms(); // Form diffing optimized
const navConfig = ConfigPresets.navigation(); // Navigation diffing
const perfConfig = ConfigPresets.performance(); // High performance
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, cmsConfig);
Custom Configuration
import { getCustomDiffStats, ConfigBuilder } from '@rastaweb/domoscope';
const customConfig = new ConfigBuilder()
.withStyles({
addedClass: 'my-added',
removedClass: 'my-removed',
elementChangeClass: 'my-changed',
})
.trackTags(['p', 'div', 'span'])
.trackAttributes('class', 'id', 'data-value')
.watchTags('img', 'video', 'iframe')
.withPerformance({
maxTextLength: 5000,
enableMemoization: true,
})
.build();
const result = getCustomDiffStats(oldHTML, newHTML, customConfig);
📝 Comprehensive Examples
Example 1: Added and Removed Tags
import { getCustomDiffStats, formatTagStatsSummary } from '@rastaweb/domoscope';
const oldHTML = `
<div class="content">
<h1>Article Title</h1>
<p>Original paragraph content.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
</div>
`;
const newHTML = `
<div class="content">
<h1>Article Title</h1>
<p>Modified paragraph content with more details.</p>
<blockquote>This is a new quote that was added.</blockquote>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<img src="diagram.png" alt="New diagram" />
</div>
`;
// Generate diff with comprehensive tracking
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, {
addedClass: 'highlight-added',
removedClass: 'highlight-removed',
watchedTags: ['blockquote', 'img', 'li'], // Watch for these tag additions/removals
});
// Display results
document.getElementById('old-version').appendChild(diffResult.rootElements[0]);
document.getElementById('new-version').appendChild(diffResult.rootElements[1]);
console.log(formatTagStatsSummary(stats));
// Output shows:
// - Added 1 blockquote element
// - Added 1 img element
// - Added 1 li element
// - Text changes in 1 p element
Example 2: Text and Word-Level Changes
import { getCustomDiffStats } from '@rastaweb/domoscope';
const oldHTML = `
<article>
<h2>Product Review</h2>
<p>This product is good and works well for basic needs.</p>
<p>The price is reasonable at $50.</p>
</article>
`;
const newHTML = `
<article>
<h2>Product Review</h2>
<p>This product is excellent and works perfectly for advanced needs.</p>
<p>The price is very reasonable at $45 with discount.</p>
</article>
`;
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, {
addedClass: 'word-added',
removedClass: 'word-removed',
wrapperTag: 'mark', // Use <mark> tags for highlighting
});
// The result will show:
// - "good" → "excellent" (removed/added words)
// - "well" → "perfectly" (removed/added words)
// - "basic" → "advanced" (removed/added words)
// - "$50" → "$45 with discount" (removed/added words)
console.log(`Changed words: +${stats.totalAddedWords} -${stats.totalRemovedWords}`);
console.log(`Elements changed: ${stats.totalChangedTags}`);
Example 3: Attribute Changes
import { getCustomDiffStats, getChangedTagsList } from '@rastaweb/domoscope';
const oldHTML = `
<div class="container">
<img src="old-image.jpg" alt="Old description" width="300" />
<a href="/old-link" title="Old title">Click here</a>
<button type="button" disabled>Submit</button>
</div>
`;
const newHTML = `
<div class="container updated">
<img src="new-image.jpg" alt="Updated description" width="400" height="300" />
<a href="/new-link" title="Updated title" target="_blank">Click here</a>
<button type="submit">Submit</button>
</div>
`;
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, {
attributeChangeClass: 'attr-changed',
elementChangeClass: 'element-modified',
trackedTags: {
img: ['src', 'alt', 'width', 'height'],
a: ['href', 'title', 'target'],
button: ['type', 'disabled'],
div: ['class'],
},
});
// Get detailed list of changes
const changes = getChangedTagsList(stats);
changes.forEach(({ tagName, count, changedAttributes }) => {
console.log(`${tagName}: ${count} elements changed`);
console.log(` Attributes: ${changedAttributes.join(', ')}`);
});
// Expected output:
// div: 1 elements changed
// Attributes: class
// img: 1 elements changed
// Attributes: src, alt, width, height
// a: 1 elements changed
// Attributes: href, title, target
// button: 1 elements changed
// Attributes: type, disabled
Example 4: Complex Mixed Changes
import { getCustomDiffStats, ConfigBuilder } from '@rastaweb/domoscope';
const oldHTML = `
<section class="blog-post">
<header>
<h1>How to Use APIs</h1>
<p class="meta">Published on 2024-01-15</p>
</header>
<main>
<p>APIs are powerful tools for developers.</p>
<code>fetch('/api/data')</code>
<p>They allow seamless data exchange.</p>
</main>
</section>
`;
const newHTML = `
<section class="blog-post featured">
<header>
<h1>How to Use REST APIs</h1>
<p class="meta updated">Published on 2024-01-15, Updated on 2024-10-21</p>
<div class="tags">
<span class="tag">API</span>
<span class="tag">Tutorial</span>
</div>
</header>
<main>
<p>REST APIs are powerful tools for modern developers.</p>
<pre><code>fetch('/api/v2/data')</code></pre>
<p>They allow seamless and efficient data exchange.</p>
<p>Here's an example of error handling:</p>
<code>try { ... } catch (error) { ... }</code>
</main>
</section>
`;
const config = new ConfigBuilder()
.withStyles({
addedClass: 'diff-added',
removedClass: 'diff-removed',
elementChangeClass: 'diff-changed',
attributeChangeClass: 'diff-attr-changed',
})
.trackTags(['section', 'h1', 'p', 'code', 'pre', 'div', 'span'])
.trackAttributes('class')
.watchTags('div', 'span', 'pre') // Watch for structural additions
.build();
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, config);
// Detailed analysis
console.log('=== CHANGE SUMMARY ===');
console.log(`Total elements changed: ${stats.totalChangedTags}`);
console.log(`Elements added: ${stats.totalAddedTags}`);
console.log(`Words added: ${stats.totalAddedWords}`);
console.log(`Words removed: ${stats.totalRemovedWords}`);
// Per-tag breakdown
if (stats.addedTags) {
console.log('\n=== ADDED ELEMENTS ===');
Object.entries(stats.addedTags).forEach(([tag, count]) => {
console.log(`+${count} ${tag} element(s)`);
});
}
if (stats.changedTags) {
console.log('\n=== CHANGED ELEMENTS ===');
Object.entries(stats.changedTags).forEach(([tag, data]) => {
console.log(`~${data.count} ${tag} element(s) modified`);
if (data.changedAttributes.length > 0) {
console.log(` Attributes: ${data.changedAttributes.join(', ')}`);
}
});
}
// Expected output:
// === CHANGE SUMMARY ===
// Total elements changed: 4
// Elements added: 5
// Words added: 12
// Words removed: 4
//
// === ADDED ELEMENTS ===
// +1 div element(s)
// +2 span element(s)
// +1 pre element(s)
// +1 p element(s)
//
// === CHANGED ELEMENTS ===
// ~1 section element(s) modified
// Attributes: class
// ~1 h1 element(s) modified
// ~1 p element(s) modified
// Attributes: class
// ~1 code element(s) modified
Example 5: CSS Styling for Visual Diff
Add this CSS to visualize the changes:
/* Added content styling */
.diff-added {
background-color: #d4edda;
color: #155724;
padding: 2px 4px;
border-radius: 3px;
border-left: 3px solid #28a745;
}
.highlight-added {
background-color: #28a745;
color: white;
font-weight: bold;
padding: 1px 3px;
border-radius: 2px;
}
/* Removed content styling */
.diff-removed {
background-color: #f8d7da;
color: #721c24;
padding: 2px 4px;
border-radius: 3px;
border-left: 3px solid #dc3545;
text-decoration: line-through;
}
.highlight-removed {
background-color: #dc3545;
color: white;
font-weight: bold;
padding: 1px 3px;
border-radius: 2px;
text-decoration: line-through;
}
/* Changed elements styling */
.diff-changed {
border: 2px dashed #ffc107;
padding: 4px;
border-radius: 4px;
background-color: #fff3cd;
}
/* Attribute changes styling */
.diff-attr-changed {
outline: 2px dotted #17a2b8;
outline-offset: 2px;
background-color: #d1ecf1;
}
/* Word-level changes */
.word-added {
background-color: #90ee90;
font-weight: bold;
}
.word-removed {
background-color: #ffb6c1;
text-decoration: line-through;
}
/* Element modifications */
.element-modified {
box-shadow: 0 0 5px rgba(255, 193, 7, 0.5);
}
.attr-changed {
border-bottom: 2px wavy #007bff;
}
Modular Imports
Domoscope supports modular imports for tree-shaking and reduced bundle size:
// Import only what you need
import { getCustomDiffStats } from '@rastaweb/domoscope';
import { ConfigBuilder } from '@rastaweb/domoscope/config';
import { computeLCS, elementSimilarity } from '@rastaweb/domoscope/algorithms';
import { stringToFlatTree, validateHTML } from '@rastaweb/domoscope/utils';
import { DiffEngine, StatsCollector } from '@rastaweb/domoscope/core';
// Or import specific types
import type { DiffStats, ExtendedCompareOptions } from '@rastaweb/domoscope/types';
Available Module Paths:
- @rastaweb/domoscope - Main entry point with all functionality
- @rastaweb/domoscope/config - Configuration builders and presets
- @rastaweb/domoscope/algorithms - Core algorithms and performance utilities
- @rastaweb/domoscope/utils - DOM manipulation and utility functions
- @rastaweb/domoscope/core - Core diff engine and statistics collector
- @rastaweb/domoscope/types - TypeScript type definitions
🎛️ API Reference
Core Functions
getCustomDiffStats(oldHTML, newHTML, options?)
High-level function that parses HTML, performs diffing, and collects statistics.
function getCustomDiffStats(
oldHTML: string,
newHTML: string,
options?: ExtendedCompareOptions
): DiffResultWithStats;
Returns:
- diffResult.rootElements: Array of root elements from both trees
- diffResult.allElements: Array of all elements
- stats: Comprehensive statistics object
compareElements(oldElements, newElements, options?)
Compare two arrays of DOM elements directly.
function compareElements(
oldElements: Element[],
newElements: Element[],
options?: ExtendedCompareOptions
): void;
collectDiffStats(rootElements, options?)
Analyze diffed DOM elements and extract statistics.
function collectDiffStats(rootElements: Element[], options?: ExtendedCompareOptions): DiffStats;
formatTagStatsSummary(stats)
Create a formatted summary of diff statistics for debugging and reporting.
function formatTagStatsSummary(stats: DiffStats): string;
getChangedTagsList(stats)
Get a simple list of which tags were changed and what attributes changed.
function getChangedTagsList(stats: DiffStats): Array<{
tagName: string;
count: number;
changedAttributes: string[];
}>;
Algorithm Functions
computeLCS(a, b, config?)
Compute Longest Common Subsequence with memoization.
function computeLCS(a: string[], b: string[], config?: LCSConfig): LCSMatch[];
elementSimilarity(a, b)
Calculate similarity score between two elements.
function elementSimilarity(a: Element, b: Element): SimilarityScore;
tokenize(text)
Tokenize text for word-level diffing with enhanced Unicode support.
function tokenize(text: string): Token[];
computeWordDiff(oldText, newText, maxLength?)
Compute word-level differences between two text strings.
function computeWordDiff(
oldText: string,
newText: string,
maxLength?: number
): Array<{ type: 'equal' | 'added' | 'removed'; text: string }>;
Utility Functions
stringToFlatTree(html)
Parse HTML string into a flat tree structure.
function stringToFlatTree(html: string): ParsedTree;
validateHTML(html)
Validate HTML string and return parsing information.
function validateHTML(html: string): {
isValid: boolean;
errors: string[];
warnings: string[];
};
nodeKey(node)
Generate a unique key for DOM node identification.
function nodeKey(node: Node): string;
wrapElement(element, className, wrapperTag?)
Wrap an element with a wrapper containing the specified class.
function wrapElement(element: Element, className: string | undefined, wrapperTag?: string): void;
Performance & Cache Management
clearCaches()
Clear all internal memoization caches.
function clearCaches(): void;
getCacheStats()
Get current cache performance statistics.
function getCacheStats(): {
lcsCache: { size: number; hits: number; misses: number };
similarityCache: { size: number; hits: number; misses: number };
};
getPerformanceMetrics()
Get detailed performance metrics from the last operations.
function getPerformanceMetrics(): PerformanceMetrics;
resetPerformanceMetrics()
Reset performance metrics counters.
function resetPerformanceMetrics(): void;
configureCaching(options)
Configure cache behavior and limits.
function configureCaching(options: { ttl?: number; maxSize?: number; enabled?: boolean }): void;
Core Classes
DiffEngine
Main diff engine for advanced usage.
class DiffEngine {
constructor(options: ExtendedCompareOptions);
compareElements(oldElements: Element[], newElements: Element[]): void;
}
StatsCollector
Statistics collection and analysis.
class StatsCollector {
constructor(config: ExtendedCompareOptions);
collectStats(rootElements: Element[]): DiffStats;
}
Configuration
ConfigBuilder
Fluent interface for building configurations:
const config = new ConfigBuilder()
.withStyles({ addedClass: 'added', removedClass: 'removed' })
.withTracking({ trackedTags: ['p', 'div'], trackedAttributes: ['class', 'id'] })
.trackTags({ img: ['src', 'alt'], a: ['href'] })
.trackAttributes('class', 'id')
.watchTags('img', 'video')
.withPerformance({ maxTextLength: 10000, enableMemoization: true })
.withElementChangeHandler((oldEl, newEl, changeType, changedAttrs) => {
// Custom element change handling
})
.build();
ConfigBuilder Methods:
- withStyles(styleConfig): Set CSS classes and wrapper tags
- withTracking(trackingConfig): Configure tag and attribute tracking
- withPerformance(performanceConfig): Set performance optimization options
- withElementChangeHandler(handler): Set custom element change handler
- trackTags(...tags): Configure specific tags to track for changes
- trackAttributes(...attributes): Set attributes to track globally
- watchTags(...tags): Configure tags to watch for additions/removals
ConfigPresets
Pre-built configurations for common use cases:
// Basic configuration with minimal tracking
const basicConfig = ConfigPresets.basic();
// Content management system optimized
const cmsConfig = ConfigPresets.cms();
// Form elements optimized
const formsConfig = ConfigPresets.forms();
// Navigation elements optimized
const navConfig = ConfigPresets.navigation();
// High-performance optimized
const perfConfig = ConfigPresets.performance();
Available Presets:
- ConfigPresets.basic(): Minimal configuration with default settings
- ConfigPresets.cms(): Optimized for content management (p, h1-h6, div, span tracking)
- ConfigPresets.forms(): Optimized for form elements (input, select, textarea, button)
- ConfigPresets.navigation(): Optimized for navigation (a, nav, ul, li elements)
- ConfigPresets.performance(): High-performance with reduced processing
validateConfig(config)
Validate configuration options and get detailed error information.
function validateConfig(config: ExtendedCompareOptions): {
isValid: boolean;
errors: string[];
};
Advanced Usage
Custom Element Change Handler
const config = new ConfigBuilder()
.withElementChangeHandler((oldEl, newEl, changeType, changedAttrs) => {
if (changeType === 'attribute' && newEl?.tagName === 'IMG') {
// Custom handling for image changes
const wrapper = document.createElement('div');
wrapper.className = 'image-change-indicator';
if (changedAttrs?.includes('src')) {
const badge = document.createElement('span');
badge.textContent = 'Image Updated';
wrapper.appendChild(badge);
}
return wrapper; // Custom wrapper element
}
return undefined; // Use default handling
})
.build();
Performance Monitoring
import {
getCustomDiffStats,
getPerformanceMetrics,
resetPerformanceMetrics,
} from '@rastaweb/domoscope';
resetPerformanceMetrics();
// Perform diff operations...
getCustomDiffStats(oldHTML, newHTML);
const metrics = getPerformanceMetrics();
console.log(`Pairing time: ${metrics.pairingTime}ms`);
console.log(`LCS time: ${metrics.lcsTime}ms`);
console.log(`Cache hits: ${metrics.cacheHits}`);
🎨 CSS Styling
Add these CSS classes to style the diff results:
/* Added content */
.diff-added {
background-color: #e6ffe6;
color: #006600;
text-decoration: none;
}
/* Removed content */
.diff-removed {
background-color: #ffe6e6;
color: #660000;
text-decoration: line-through;
}
/* Changed elements */
.diff-elem-changed {
border: 2px solid #ffa500;
border-radius: 3px;
}
/* Changed attributes */
.diff-attr-changed {
outline: 2px dotted #0066cc;
outline-offset: 2px;
}
📊 Statistics Object
The DiffStats object provides comprehensive change metrics:
interface DiffStats {
/** Number of elements with tag or attribute changes */
totalChangedTags: number;
/** Number of added text spans/nodes */
totalAddedTexts: number;
/** Number of removed text spans/nodes */
totalRemovedTexts: number;
/** Number of newly added elements */
totalAddedTags: number;
/** Number of removed elements */
totalRemovedTags: number;
/** Total number of words added across all text content */
totalAddedWords: number;
/** Total number of words removed across all text content */
totalRemovedWords: number;
/** Per-tag statistics for added elements (e.g., { a: 5, img: 2 }) */
addedTags?: Record<string, number>;
/** Per-tag statistics for removed elements (e.g., { a: 2, span: 10 }) */
removedTags?: Record<string, number>;
/** Per-tag statistics for changed elements with detailed attribute info */
changedTags?: Record<
string,
{
count: number;
changedAttributes: string[];
}
>;
}
Usage Example:
const { stats } = getCustomDiffStats(oldHTML, newHTML);
console.log(`Total changes: ${stats.totalChangedTags}`);
console.log(`Added elements: ${stats.totalAddedTags}`);
console.log(`Removed elements: ${stats.totalRemovedTags}`);
console.log(`Added words: ${stats.totalAddedWords}`);
console.log(`Removed words: ${stats.totalRemovedWords}`);
// Per-tag breakdown
if (stats.addedTags) {
Object.entries(stats.addedTags).forEach(([tag, count]) => {
console.log(`Added ${count} ${tag} elements`);
});
}
if (stats.changedTags) {
Object.entries(stats.changedTags).forEach(([tag, data]) => {
console.log(`Changed ${data.count} ${tag} elements:`);
console.log(` Attributes: ${data.changedAttributes.join(', ')}`);
});
}
🏗️ Architecture
Domoscope follows SOLID principles with a clean, modular architecture:
src/
├── types/ # TypeScript type definitions
├── config/ # Configuration management
├── algorithms/ # Core algorithms with memoization
├── utils/ # DOM manipulation utilities
├── core/ # Main diff engine and statistics
└── index.ts # Public API exports
Key Components
- DiffEngine: Core comparison algorithm
- StatsCollector: Statistics gathering and analysis
- ConfigBuilder: Fluent configuration interface
- Algorithm modules: LCS, similarity, and word diffing with optimization
📝 TypeScript Types
Domoscope exports comprehensive TypeScript types for full type safety:
Core Types
// Token types for text diffing
type TokenType = 'equal' | 'added' | 'removed';
type Token = { type: TokenType; text: string };
// Result types
interface DiffResult {
rootElements: Element[];
allElements: Element[];
}
interface DiffResultWithStats {
diffResult: DiffResult;
stats: DiffStats;
}
Configuration Types
// Style configuration
interface StyleConfig {
addedClass?: string;
removedClass?: string;
elementChangeClass?: string;
attributeChangeClass?: string;
wrapperTag?: string;
textWrapperTag?: string;
addedWrapperTag?: string;
removedWrapperTag?: string;
changedWrapperTag?: string;
}
// Tracking configuration
interface TrackingConfig {
watchedTags?: string[];
trackedTags?: string[] | Record<string, string[]>;
trackedAttributes?: string[];
}
// Performance configuration
interface PerformanceConfig {
maxTextLength?: number;
minSimilarityThreshold?: number;
enableMemoization?: boolean;
ignoreWhitespaceTexts?: boolean;
}
// Complete configuration
interface ExtendedCompareOptions extends StyleConfig, TrackingConfig, PerformanceConfig {
onElementChange?: ElementChangeHandler;
}
Handler Types
type ElementChangeHandler = (
oldEl: Element | null,
newEl: Element | null,
changeType: 'tag' | 'attribute' | 'tag-added' | 'tag-removed',
changedAttrs?: string[]
) => void | Element | null;
Algorithm Types
// Internal algorithm types for advanced usage
interface LCSMatch {
oldIndex: number;
newIndex: number;
length: number;
}
interface SimilarityScore {
score: number;
factors: {
tagMatch: number;
attributeMatch: number;
contentMatch: number;
structureMatch: number;
};
}
interface PerformanceMetrics {
pairingTime: number;
lcsTime: number;
textDiffTime: number;
elementsProcessed: number;
cacheHits: number;
cacheMisses: number;
}
🔧 Configuration Options
Style Configuration
interface StyleConfig {
addedClass?: string; // CSS class for added content
removedClass?: string; // CSS class for removed content
elementChangeClass?: string; // CSS class for changed elements
attributeChangeClass?: string; // CSS class for attribute changes
wrapperTag?: string; // HTML tag for wrappers
}
Tracking Configuration
interface TrackingConfig {
watchedTags?: string[]; // Tags for special handling. Use ['*'] to watch all tags
trackedTags?: string[] | Record<string, string[]>; // Tags to track
trackedAttributes?: string[]; // Attributes to track
}
Performance Configuration
interface PerformanceConfig {
maxTextLength?: number; // Max text length for word diffing
minSimilarityThreshold?: number; // Min similarity for element pairing
enableMemoization?: boolean; // Enable caching
}
📈 Performance
Domoscope is optimized for performance with several strategies:
- Dynamic Programming: LCS algorithm with memoization
- Intelligent Caching: Similarity scores and computation results
- Efficient Algorithms: O(n*m) complexity with space optimization
- Configurable Thresholds: Skip expensive operations when appropriate
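The caching strategy listed above can be illustrated with a small memoization wrapper. This is a minimal sketch and not Domoscope's internal cache: `memoize`, `CacheStats`, and the `maxSize` eviction rule are hypothetical names chosen here, and the counters only mirror the `{ size, hits, misses }` shape that `getCacheStats()` is documented to return.

```typescript
// Hypothetical helper for illustration; not part of the Domoscope API.
interface CacheStats {
  size: number;
  hits: number;
  misses: number;
}

function memoize<A, R>(fn: (arg: A) => R, keyOf: (arg: A) => string, maxSize = 1000) {
  const cache = new Map<string, R>();
  let hits = 0;
  let misses = 0;

  const memoized = (arg: A): R => {
    const key = keyOf(arg);
    if (cache.has(key)) {
      hits++;
      return cache.get(key) as R;
    }
    misses++;
    const value = fn(arg);
    if (cache.size >= maxSize) {
      // Evict the oldest entry; a Map iterates in insertion order.
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    cache.set(key, value);
    return value;
  };

  const stats = (): CacheStats => ({ size: cache.size, hits, misses });
  return { memoized, stats };
}

// Example: cache an (artificially simple) token count keyed by the text itself.
const tokenCount = memoize(
  (text: string) => text.split(/\s+/).filter(Boolean).length,
  (text) => text,
);
tokenCount.memoized('intelligent DOM comparison'); // miss: computed and stored
tokenCount.memoized('intelligent DOM comparison'); // hit: served from the cache
tokenCount.memoized('word-level diffing'); // miss
console.log(tokenCount.stats()); // { size: 2, hits: 1, misses: 2 }
```

Keying by a string keeps lookups cheap, but the key function has to capture everything the result depends on, otherwise stale values are returned.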
Benchmarks
| Elements | Time (ms) | Memory (MB) |
| -------- | --------- | ----------- |
| 100 | ~5 | ~2 |
| 1,000 | ~45 | ~15 |
| 10,000 | ~450 | ~120 |
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Inspired by modern diff algorithms and DOM manipulation techniques
- Built with TypeScript for maximum developer experience
- Optimized using dynamic programming patterns
Domoscope - Advanced HTML diffing for the modern web. 🔍
Public API Functions
stringToFlatTree(html: string)
export function stringToFlatTree(html: string): {
rootElements: Element[];
allElements: Element[];
};
Purpose: Parses an HTML string into a structured DOM representation for processing.
Algorithm:
- Creates temporary container element
- Sets innerHTML to parse the HTML
- Recursively traverses all descendants to build flat element list
- Returns both root-level elements and complete element inventory
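The traversal step can be sketched without a browser by standing in a plain object for each DOM element. `TreeNode` and `flattenTree` are hypothetical names used only for this illustration; the real function works on `Element` instances produced by the HTML parser.

```typescript
// Hypothetical stand-in for a DOM Element so the sketch runs outside a browser.
interface TreeNode {
  tag: string;
  children: TreeNode[];
}

function flattenTree(roots: TreeNode[]): { rootElements: TreeNode[]; allElements: TreeNode[] } {
  const allElements: TreeNode[] = [];
  const visit = (node: TreeNode): void => {
    allElements.push(node); // pre-order: a parent is recorded before its children
    for (const child of node.children) visit(child);
  };
  for (const root of roots) visit(root);
  return { rootElements: roots, allElements };
}

// Equivalent of '<div><p>Hello</p></div>'
const sampleTree: TreeNode[] = [{ tag: 'div', children: [{ tag: 'p', children: [] }] }];
const flat = flattenTree(sampleTree);
console.log(flat.rootElements.length); // 1
console.log(flat.allElements.map((node) => node.tag)); // [ 'div', 'p' ]
```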
Usage Example:
const { rootElements, allElements } = stringToFlatTree('<div><p>Hello</p></div>');
console.log(rootElements.length); // 1 (the div)
console.log(allElements.length); // 2 (div + p)
Performance Notes: Uses native browser HTML parsing for optimal speed. The flat traversal enables efficient similarity comparisons later.
flowchart TD
A["HTML String"] --> B["Create temp container"]
B --> C["Set innerHTML"]
C --> D["Extract root elements"]
D --> E["Recursive traverse"]
E --> F["Build allElements array"]
F --> G["Return rootElements + allElements"]
📚 Detailed Algorithm Documentation
Core Algorithm Flow
sequenceDiagram
participant Input as HTML Input
participant Parser as HTML Parser
participant Matcher as Element Matcher
participant LCS as LCS Engine
participant Differ as Text Differ
participant Output as Annotated DOM
Input->>Parser: Parse HTML strings
Parser->>Parser: Validate & sanitize
Parser->>Matcher: Element arrays
Matcher->>Matcher: Compute similarity matrix
Note over Matcher: O(n×m×k) complexity
Matcher->>Matcher: Find optimal pairings
Matcher->>LCS: Aligned element pairs
LCS->>LCS: Child node alignment
Note over LCS: Dynamic programming O(a×b)
LCS->>Differ: Text content pairs
Differ->>Differ: Tokenize & compute word diff
Note over Differ: Enhanced Unicode tokenization
Differ->>Output: Annotated fragments
LCS->>Output: Structure with diff markers
Matcher->>Output: Element change annotations
Output->>Output: Collect statistics
1. Element Similarity Algorithm
The core matching algorithm uses a multi-factor scoring system:
function elementSimilarity(a: Element, b: Element): number {
let score = 0;
// 🎯 ID exact match (highest priority)
if (a.id && b.id && a.id === b.id) {
score += 10; // Strong identity signal
}
// 🏷️ Tag name compatibility
if (a.tagName === b.tagName) {
score += 5; // Structural similarity
}
// 🎨 Class overlap analysis
const classIntersection = getClassIntersection(a, b);
score += classIntersection.length; // +1 per shared class
// 📋 Attribute similarity
const attrSimilarity = computeAttributeSimilarity(a, b);
score += attrSimilarity * 0.5; // Weighted attribute score
// 📝 Text content analysis
const textSimilarity = computeTextSimilarity(a.textContent, b.textContent);
score += textSimilarity * 0.3; // Content relevance
// 🏗️ Structural compatibility
const structSimilarity = computeStructuralSimilarity(a, b);
score += structSimilarity; // Child count & nesting
return score;
}
Similarity Scoring Breakdown
| Factor | Weight | Description | Example Impact |
| ------------------------ | ------------- | ---------------------- | ------------------------------------ |
| ID Match | 10.0 | Exact ID equality | <div id="header"> matches strongly |
| Tag Match | 5.0 | Same HTML tag | <p> prefers <p> over <div> |
| Class Overlap | 1.0 per class | Shared CSS classes | .nav.active vs .nav.hidden = 1.0 |
| Attribute Similarity | 0.5 × count | Similar attributes | data-*, aria-* attributes |
| Text Similarity | 0.3 × tokens | Shared text tokens | Common words/phrases |
| Structure Match | 0.5-1.0 | Child count similarity | Similar nesting patterns |
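To make the weights concrete, here is a worked example that applies only the first three rows of the table (ID, tag, class overlap) to plain objects. `ElementLike` and `similaritySketch` are hypothetical names for this illustration; the attribute, text, and structure factors are left out because their helper functions are not shown in this document.

```typescript
// Hypothetical plain-object stand-in for a DOM Element.
interface ElementLike {
  tag: string;
  id?: string;
  classes: string[]; // assumed to contain no duplicates
}

function similaritySketch(a: ElementLike, b: ElementLike): number {
  let score = 0;
  if (a.id && b.id && a.id === b.id) score += 10; // ID match
  if (a.tag === b.tag) score += 5; // tag match
  const bClasses = new Set(b.classes);
  score += a.classes.filter((c) => bClasses.has(c)).length; // +1 per shared class
  return score;
}

const header: ElementLike = { tag: 'div', id: 'header', classes: ['nav', 'active'] };
const sameIdCandidate: ElementLike = { tag: 'div', id: 'header', classes: ['nav', 'hidden'] };
const sameClassCandidate: ElementLike = { tag: 'p', classes: ['nav', 'active'] };

console.log(similaritySketch(header, sameIdCandidate)); // 16 = 10 (id) + 5 (tag) + 1 (class)
console.log(similaritySketch(header, sameClassCandidate)); // 2 = 2 shared classes
```

The example shows why an ID match dominates: two shared classes cannot outweigh a matching ID and tag.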
2. LCS (Longest Common Subsequence) Engine
Algorithm Selection Strategy
flowchart LR
A["Input Arrays"] --> B{"Size Check"}
B -->|"Small Arrays n,m < 1000"| C["Standard DP O(n×m) space"]
B -->|"Large Arrays n,m ≥ 1000"| D["Space-Optimized O(min(n,m)) space"]
C --> E["Memoization Check"]
D --> F["Direct Computation"]
E -->|"Cache Hit"| G["Return Cached"]
E -->|"Cache Miss"| H["Compute & Cache"]
G --> I["LCS Matches"]
H --> I
F --> I
Standard Dynamic Programming Approach
function computeLCS(a: string[], b: string[]): LCSMatch[] {
const n = a.length,
m = b.length;
const dp = Array.from({ length: n + 1 }, () => Array(m + 1).fill(0));
// Fill DP table (bottom-up)
for (let i = n - 1; i >= 0; i--) {
for (let j = m - 1; j >= 0; j--) {
if (a[i] === b[j]) {
dp[i][j] = 1 + dp[i + 1][j + 1]; // Match found
} else {
dp[i][j] = Math.max(dp[i + 1][j], dp[i][j + 1]); // Take best
}
}
}
// Backtrack to find actual matches
return backtrackMatches(dp, a, b);
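}
Space-Optimized Version
For large inputs, the engine switches to a two-row table that needs only O(min(n,m)) space. The sketch below computes the LCS length only:
function computeLCSSpaceOptimized(a: string[], b: string[]): number {
// Ensure 'a' is the shorter array so each row has min(n,m) + 1 cells.
// The LCS length is symmetric, so swapping the arguments is safe.
if (a.length > b.length) {
return computeLCSSpaceOptimized(b, a);
}
let prev: number[] = Array(a.length + 1).fill(0);
let curr: number[] = Array(a.length + 1).fill(0);
// Process row by row, keeping only the current and previous rows
for (let j = b.length - 1; j >= 0; j--) {
for (let i = a.length - 1; i >= 0; i--) {
if (a[i] === b[j]) {
curr[i] = 1 + prev[i + 1];
} else {
curr[i] = Math.max(prev[i], curr[i + 1]);
}
}
[prev, curr] = [curr, prev]; // Swap arrays
}
// After the final swap, prev[0] holds the LCS length.
// Two rows are not enough to backtrack, so recovering LCSMatch[] needs an
// extra step such as Hirschberg's divide-and-conquer algorithm.
return prev[0];
}
3. Text Diffing Algorithm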
Enhanced Tokenization
Supports complex Unicode and international text:
function tokenize(text: string): string[] {
// Unicode-aware tokenization with category support
return text.match(/\p{L}+\p{M}*|\d+|[^\s\p{L}\p{N}]+/gu) || [];
}
Word-Level Diff Generation
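The word-level pipeline in this section (tokenize both texts, run LCS over the tokens, then walk the table to emit equal, added, and removed tokens) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the library's `computeWordDiff`: `wordDiffSketch`, `DiffToken`, and the whitespace-only `splitWords` tokenizer are names introduced here, and consecutive tokens are not merged.

```typescript
type DiffToken = { type: 'equal' | 'added' | 'removed'; text: string };

// Simplified whitespace tokenizer (assumption; the documented tokenizer is Unicode-aware).
const splitWords = (text: string): string[] => text.split(/\s+/).filter(Boolean);

function wordDiffSketch(oldText: string, newText: string): DiffToken[] {
  const a = splitWords(oldText);
  const b = splitWords(newText);

  // dp[i][j] = LCS length of a[i..] and b[j..], filled bottom-up.
  const dp: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? 1 + dp[i + 1][j + 1] : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Walk the table from the top-left corner to build the diff sequence.
  const out: DiffToken[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ type: 'equal', text: a[i] });
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      out.push({ type: 'removed', text: a[i] });
      i++;
    } else {
      out.push({ type: 'added', text: b[j] });
      j++;
    }
  }
  while (i < a.length) out.push({ type: 'removed', text: a[i++] });
  while (j < b.length) out.push({ type: 'added', text: b[j++] });
  return out;
}

const wordDiff = wordDiffSketch('Original content', 'Modified content');
console.log(wordDiff.map((t) => `${t.type}:${t.text}`).join(' '));
// removed:Original added:Modified equal:content
```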
flowchart LR
A[Old Text] --> B[tokenize]
C[New Text] --> D[tokenize]
B --> E[Token Arrays]
D --> E
E --> F[LCS on Tokens]
F --> G[Build Diff Sequence]
G --> H[Merge Consecutive]
H --> I[Fragment Generation]
subgraph "Token Types"
T1[equal: unchanged]
T2[added: new content]
T3[removed: deleted content]
end
Consecutive Token Merging
function mergeConsecutiveTokens(tokens: Token[]): Token[] {
const merged: Token[] = [];
let current: Token | null = null;
for (const token of tokens) {
if (current && current.type === token.type) {
// Merge with previous token of same type
current.text += ' ' + token.text;
} else {
if (current) merged.push(current);
current = { ...token };
}
}
if (current) merged.push(current);
return merged;
}
compareElements(oldEls: Element[], newEls: Element[], options: CompareOptions)
Purpose: The core diff engine that compares two element arrays and applies visual change indicators.
Algorithm Overview:
flowchart TD
A[Old Elements] --> B[Similarity Matching]
C[New Elements] --> B
B --> D[Paired Elements]
B --> E[Unmatched Old]
B --> F[Unmatched New]
D --> G[compareNode recursion]
E --> H[Mark as removed]
F --> I[Mark as added]
G --> J[DOM with diff annotations]
H --> J
I --> J
Detailed Steps:
Similarity-Based Pairing:
- Uses elementSimilarity() to score potential matches
- Prefers same-tag matches but allows cross-tag pairing for high similarity
- Maintains a pool of unmatched elements
Special Handling for Watched Tags:
- Elements in watchedTags get wrapped when added/removed
- Use the '*' wildcard to watch all HTML tags: watchedTags: ['*']
- The wildcard can be combined with specific tags, but this is redundant: watchedTags: ['*'] already watches everything
- Triggers the onElementChange callback for custom handling
Recursive Processing:
- Paired elements go through compareNode() for deep comparison
- Unmatched elements get marked as added/removed with appropriate CSS classes
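The pairing step can be pictured as a greedy match against a shrinking pool of candidates. The sketch below is one plausible way to do this and is not taken from the library source: `pairBySimilarity`, `Pairing`, and the `minScore` cut-off are hypothetical, and the scoring function is passed in so that the example runs on plain strings instead of DOM elements.

```typescript
interface Pairing<T> {
  pairs: Array<[T, T]>;
  unmatchedOld: T[];
  unmatchedNew: T[];
}

function pairBySimilarity<T>(
  oldItems: T[],
  newItems: T[],
  score: (a: T, b: T) => number,
  minScore: number,
): Pairing<T> {
  const pool = newItems.slice(); // candidates that have not been claimed yet
  const pairs: Array<[T, T]> = [];
  const unmatchedOld: T[] = [];

  for (const oldItem of oldItems) {
    let bestIndex = -1;
    let bestScore = -Infinity;
    for (let k = 0; k < pool.length; k++) {
      const s = score(oldItem, pool[k]);
      if (s > bestScore) {
        bestScore = s;
        bestIndex = k;
      }
    }
    if (bestIndex >= 0 && bestScore >= minScore) {
      pairs.push([oldItem, pool.splice(bestIndex, 1)[0]]); // claim the best candidate
    } else {
      unmatchedOld.push(oldItem); // would be marked as removed
    }
  }
  return { pairs, unmatchedOld, unmatchedNew: pool }; // leftovers would be marked as added
}

// Tags stand in for elements; only a same-tag match scores above the threshold.
const sameTag = (a: string, b: string): number => (a === b ? 5 : 0);
const pairing = pairBySimilarity(['div', 'p', 'span'], ['p', 'div', 'img'], sameTag, 1);
console.log(pairing.pairs); // [ [ 'div', 'div' ], [ 'p', 'p' ] ]
console.log(pairing.unmatchedOld); // [ 'span' ]
console.log(pairing.unmatchedNew); // [ 'img' ]
```

A greedy pass is simple and fast but order-dependent; an earlier element can claim a candidate that would have suited a later one better.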
Usage Example:
const oldTree = stringToFlatTree('<div><p>Old text</p></div>');
const newTree = stringToFlatTree('<div><p>New text</p></div>');
compareElements(oldTree.rootElements, newTree.rootElements, {
addedClass: 'highlight-added',
removedClass: 'highlight-removed',
watchedTags: ['img', 'a'], // Watch specific tags
// watchedTags: ['*'], // Watch ALL tags (wildcard)
// watchedTags: ['*', 'div'], // Watch all tags (redundant example)
onElementChange: (oldEl, newEl, changeType) => {
console.log(`${changeType} detected`);
return null; // use default wrapping
},
});
collectDiffStats(rootElements: Element[], options: CompareOptions)
Purpose: Analyzes a diffed DOM tree to extract comprehensive change statistics.
Algorithm:
flowchart TD
A[Diffed DOM Elements] --> B[Recursive Traversal]
B --> C[Check CSS Classes]
C --> D[Count Text Changes]
C --> E[Count Element Changes]
C --> F[Read data-* attributes]
F --> G[Extract changed attributes]
F --> H[Extract tag types]
D --> I[Update totalAddedTexts/Removed]
E --> J[Update totalChangedTags]
G --> K[Update changedTags]
H --> L[Update addedTags/removedTags]
I --> M[DiffStats Object]
J --> M
K --> M
L --> M
Statistical Categories:
- Text-level: Counts wrapped text spans indicating additions/removals
- Element-level: Counts structural changes (new/removed tags)
- Attribute-level: Tracks which attributes changed on which tag types
- Per-tag breakdown: Aggregates all changes by HTML tag type
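The per-tag aggregation can be sketched independently of the DOM by feeding it a flat list of change records. `ChangeRecord`, `TagStatsSketch`, and `aggregateChanges` are hypothetical names for this illustration; the real collector reads CSS classes and data-* attributes from the diffed tree rather than a prepared list.

```typescript
// Hypothetical change record; the library derives this information from the annotated DOM.
type ChangeRecord =
  | { kind: 'added' | 'removed'; tag: string }
  | { kind: 'changed'; tag: string; attributes: string[] };

interface TagStatsSketch {
  totalAddedTags: number;
  totalRemovedTags: number;
  totalChangedTags: number;
  addedTags: Record<string, number>;
  removedTags: Record<string, number>;
  changedTags: Record<string, { count: number; changedAttributes: string[] }>;
}

function aggregateChanges(records: ChangeRecord[]): TagStatsSketch {
  const stats: TagStatsSketch = {
    totalAddedTags: 0,
    totalRemovedTags: 0,
    totalChangedTags: 0,
    addedTags: {},
    removedTags: {},
    changedTags: {},
  };
  for (const record of records) {
    if (record.kind === 'changed') {
      const entry = stats.changedTags[record.tag] ?? { count: 0, changedAttributes: [] };
      entry.count++;
      for (const attr of record.attributes) {
        // Each attribute name is listed once per tag, however often it changed.
        if (entry.changedAttributes.indexOf(attr) === -1) entry.changedAttributes.push(attr);
      }
      stats.changedTags[record.tag] = entry;
      stats.totalChangedTags++;
    } else if (record.kind === 'added') {
      stats.addedTags[record.tag] = (stats.addedTags[record.tag] ?? 0) + 1;
      stats.totalAddedTags++;
    } else {
      stats.removedTags[record.tag] = (stats.removedTags[record.tag] ?? 0) + 1;
      stats.totalRemovedTags++;
    }
  }
  return stats;
}

const tagStats = aggregateChanges([
  { kind: 'added', tag: 'img' },
  { kind: 'added', tag: 'img' },
  { kind: 'added', tag: 'p' },
  { kind: 'removed', tag: 'span' },
  { kind: 'changed', tag: 'a', attributes: ['href'] },
  { kind: 'changed', tag: 'a', attributes: ['href', 'class'] },
]);
console.log(tagStats.addedTags); // { img: 2, p: 1 }
console.log(tagStats.changedTags); // { a: { count: 2, changedAttributes: [ 'href', 'class' ] } }
```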
Usage Example:
// After running compareElements...
const stats = collectDiffStats(diffedElements, options);
console.log(stats);
// Output:
// {
// totalChangedTags: 3,
// totalAddedTexts: 5,
// totalRemovedTexts: 2,
// addedTags: { img: 2, p: 1 },
// removedTags: { span: 3 },
// changedTags: {
// a: { count: 2, changedAttributes: ['href', 'class'] }
// }
// }
getCustomDiffStats(oldHTML: string, newHTML: string, options: CompareOptions)
Purpose: High-level convenience function that combines parsing, diffing, and statistics collection.
Workflow:
flowchart LR
A[Old HTML] --> B[stringToFlatTree]
C[New HTML] --> D[stringToFlatTree]
B --> E[compareElements]
D --> E
E --> F[collectDiffStats]
F --> G["diffResult + stats"]
Return Value:
{
diffResult: {
rootElements: Element[], // All root elements from both trees
allElements: Element[] // All elements from both trees
},
stats: DiffStats // Comprehensive statistics
}
Usage Example:
const oldHTML = '<div><p>Original content</p></div>';
const newHTML = '<div><p>Modified content</p><img src="new.jpg"></div>';
const { diffResult, stats } = getCustomDiffStats(oldHTML, newHTML, {
trackedTags: { img: ['src'], p: ['class'] },
trackedAttributes: ['src', 'class', 'href'],
});
// DOM is now annotated with diff classes
document.body.appendChild(diffResult.rootElements[0]); // old version
document.body.appendChild(diffResult.rootElements[1]); // new version
// Stats show exactly what changed
console.log(`Added ${stats.addedTags?.img || 0} images`);
formatTagStatsSummary(stats: DiffStats)
Purpose: Creates human-readable summary of per-tag statistics for debugging and reporting.
Output Format:
=== PER-TAG DIFF STATISTICS ===
🟢 Added Tags:
- <img>: 2 element(s)
- <p>: 1 element(s)
🔴 Removed Tags:
- <span>: 3 element(s)
🟡 Changed Tags:
- <a>: 2 element(s)
Changed attributes: href, class
- <img>: 1 element(s)
Changed attributes: src
📊 Totals: 3 added, 3 removed, 3 changed
📝 Text changes: 5 added, 2 removed
Internal Algorithm Functions
compareNode(oldEl: Element, newEl: Element, options: CompareOptions)
Purpose: Recursively compares two matched DOM elements and their children.
Algorithm Steps:
- Element-level Change Detection: Calls detectAndWrapElementChange() first
- Child Alignment: Uses the LCS algorithm to align child nodes optimally
- Recursive Processing: Processes matched pairs recursively
- Text Diffing: For text nodes, performs word-level diffing
**LCS
