mirror-web-cli
v1.1.3
Professional website mirroring tool with intelligent framework preservation, AI-powered analysis, and comprehensive asset optimization
🪞 Mirror Web CLI v1.1.3
Professional Website Mirroring with Intelligent Framework Preservation & Enhanced Asset Processing
A powerful, universal website mirroring tool that intelligently detects and preserves framework structures while creating offline-ready websites. Works seamlessly with React, Next.js, Vue, Angular, Svelte, WordPress, and static sites.
✨ Key Features
🧠 Intelligent Framework Detection
- Automatically detects 14+ frameworks (React, Vue, Angular, Next.js, Nuxt, Gatsby, Svelte, etc.)
- Comprehensive pattern matching with confidence scoring
- Framework-specific optimization strategies
🎨 Beautiful Terminal Experience
- Modern UI with gradient effects and smooth animations
- Professional progress tracking with step-by-step indicators
- Color-coded status messages and comprehensive feedback
⚡ Advanced Asset Processing
- Complete asset extraction and optimization (images, CSS, JS, fonts, icons, videos)
- Smart URL rewriting for offline functionality
- Framework-preserving structure generation
- Comprehensive video support with 14+ video formats (.mp4, .webm, .ogg, etc.)
🧹 Clean Code Generation
- Optional tracking script removal (analytics, GTM, Facebook Pixel)
- Professional project structure ready for development
- Offline-ready websites with localized resources
- Next.js/React error handling for graceful offline operation
🆕 Auto-Differentiated Output Directories
- Standard mirroring: Creates `./domain-standard/` directories
- AI-enhanced mirroring: Creates `./domain-ai-enhanced/` directories
- Easy comparison: Side-by-side analysis of different approaches
- Organized workflow: Never overwrite previous results
🛠️ Recent Improvements (v1.1.3)
✅ Enhanced Environment Variable System
- Priority-based .env loading with shell environment preservation
- Improved OpenAI API key handling with multiple configuration sources
- Better development workflow with .env.local support
✅ Next.js Image Optimizer Support
- Robust handling of `/_next/image` endpoints with HTTP 402 avoidance
- Original image extraction from optimizer URLs
- Runtime asset rewriting with DOM mutation observer
- Enhanced offline compatibility for Next.js applications
✅ Advanced Asset Processing
- Microlink integration for screenshot services
- Comprehensive hover/popover content capture
- Responsive image support with `srcset` rewriting
- Enhanced video and audio processing with extended timeouts
✅ Smart Output Organization
- Auto-differentiated directories prevent accidental overwrites
- Easy comparison between standard and AI-enhanced results
- Professional project organization
🚀 Quick Start
Installation
# Global installation (recommended)
npm install -g mirror-web-cli
# Or run directly with npx (no installation required)
npx mirror-web-cli https://example.com
OpenAI API Setup (Optional)
For AI-powered website analysis, you'll need an OpenAI API key:
Option 1: Environment Variable (Recommended)
Windows PowerShell:
$env:OPENAI_API_KEY="sk-proj-your-openai-key-here"
Windows Command Prompt:
set OPENAI_API_KEY=sk-proj-your-openai-key-here
macOS/Linux (Bash/Zsh):
export OPENAI_API_KEY="sk-proj-your-openai-key-here"
Permanent Setup (recommended for regular use):
Windows (PowerShell as Administrator):
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-proj-your-openai-key-here', 'User')
macOS/Linux (add to ~/.bashrc or ~/.zshrc):
echo 'export OPENAI_API_KEY="sk-proj-your-openai-key-here"' >> ~/.bashrc
source ~/.bashrc
Option 2: Command Line Parameter
mirror-web-cli https://example.com --ai --openai-key "sk-proj-your-key-here"
Requirements:
- Only OpenAI API keys are supported (must start with `sk-`)
- Uses OpenAI GPT-4o model for intelligent analysis
- Get your API key: OpenAI Platform
Basic Usage
# Standard mirroring (outputs to example.com-standard)
mirror-web-cli https://example.com
# AI-enhanced mirroring (outputs to example.com-ai-enhanced)
mirror-web-cli https://example.com --ai
# Clean mirror without tracking scripts
mirror-web-cli https://react-site.com --clean
# Custom output directory (overrides automatic naming)
mirror-web-cli https://vue-app.com -o ./my-project
# Debug mode with detailed logging
mirror-web-cli https://complex-site.com --debug
📁 Auto-Differentiated Output Directories
Mirror Web CLI automatically creates different output directories based on the analysis method:
- Standard: `./domain-standard` (e.g., `./example.com-standard`)
- AI-Enhanced: `./domain-ai-enhanced` (e.g., `./example.com-ai-enhanced`)
- Custom: Uses your specified path with the `-o` flag
This allows easy comparison between different analysis approaches and organized project management.
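The naming rule above can be sketched in a few lines. This is only an illustration: `resolveOutputDir` and its option names are hypothetical and are not taken from the tool's source, which may handle ports, subpaths, or invalid URLs differently.

```javascript
// Hypothetical helper illustrating the naming rule; not the CLI's actual code.
function resolveOutputDir(targetUrl, { ai = false, output } = {}) {
  // A path given with -o overrides the automatic naming.
  if (output) return output;

  // Use the hostname of the target, e.g. "example.com".
  const { hostname } = new URL(targetUrl);

  // Suffix depends on whether AI-enhanced analysis was requested.
  return `./${hostname}-${ai ? 'ai-enhanced' : 'standard'}`;
}

console.log(resolveOutputDir('https://example.com'));
// ./example.com-standard
console.log(resolveOutputDir('https://example.com', { ai: true }));
// ./example.com-ai-enhanced
console.log(resolveOutputDir('https://vue-app.com', { output: './my-project' }));
// ./my-project
```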
Serving the Output
# The tool generates a complete project structure
cd ./example.com-standard # or ./example.com-ai-enhanced
# Use any static server to serve the mirrored site
python -m http.server 8000
# Open http://localhost:8000
# Or use Node.js static server
npx serve .
🎯 How It Works
1. Intelligent Page Loading
- Launches headless browser with optimized settings
- Waits for framework-specific elements (#__next, #root, #app)
- Performs scroll-to-bottom for lazy-loaded content
- Waits for images and network idle state
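The loading steps above could look roughly like the following with Puppeteer, which the tool lists as its browser engine. This is a sketch under assumptions: `settlePage`, the scroll step size, the 10-second selector cap, and the idle time are all hypothetical choices, not values read from the tool's code.

```javascript
// Sketch only: `settlePage` and its option names are hypothetical.
// `page` is expected to be a Puppeteer Page instance.
async function settlePage(page, { timeout = 120000 } = {}) {
  // 1. Wait for a common framework root. Static pages have none of these,
  //    so a timeout here is swallowed rather than treated as an error.
  await page
    .waitForSelector('#__next, #root, #app', { timeout: Math.min(timeout, 10000) })
    .catch(() => null);

  // 2. Scroll down in steps so lazy-loaded content gets requested.
  //    The 50000px cap guards against infinitely growing pages.
  await page.evaluate(async () => {
    const step = 600;
    for (let y = 0; y < document.body.scrollHeight && y < 50000; y += step) {
      window.scrollTo(0, y);
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
    window.scrollTo(0, 0);
  });

  // 3. Wait until every <img> has either loaded or failed.
  await page.evaluate(() =>
    Promise.all(
      Array.from(document.images)
        .filter((img) => !img.complete)
        .map((img) => new Promise((resolve) => {
          img.onload = img.onerror = resolve;
        }))
    )
  );

  // 4. Wait for the network to go quiet; tolerate pages that never do.
  await page.waitForNetworkIdle({ idleTime: 500, timeout }).catch(() => null);
}
```

Swallowing the timeouts in steps 1 and 4 is a design choice for this sketch: a mirror of a partially loaded page is usually more useful than no mirror at all.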
2. Framework Analysis Engine
📊 Detection Methods:
├── Script Source Analysis → Framework bundles & runtime files
├── DOM Element Inspection → Framework-specific containers
├── Meta Tag Analysis → Generator tags & signatures
├── Content Pattern Matching → Component structures
├── CSS Class Analysis → Framework styling patterns
├── JSON Data Detection → State management structures
└── Link Href Analysis → Framework asset paths
3. Comprehensive Asset Extraction
🎯 Asset Categories:
├── 🖼️ Images → src, srcset, lazy attributes, backgrounds
├── 🎨 Stylesheets → External CSS + inline styles with url() rewriting
├── ⚙️ Scripts → External JS + inline scripts (with optional cleaning)
├── 🔠 Fonts → Web fonts and icon fonts
├── 🎭 Icons → Favicons and app icons
└── 🎥 Media → Videos (.mp4, .webm, .ogg, .avi, .mov, etc.), audio files
4. Smart URL Rewriting
- Converts all absolute URLs to relative paths
- Creates organized asset directory structure
- Generates short, stable, hashed filenames
- Maintains proper file extensions and MIME types
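One way to get short, stable filenames is to hash the asset's URL and keep its extension. The sketch below assumes an `asset_` prefix and a 10-character hex hash, inferred from the sample debug log line elsewhere in this README; the choice of SHA-1 and the helper names `localAssetName` and `localAssetPath` are assumptions for illustration.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hypothetical helper: derive a short, stable local name from an asset URL.
// The hash algorithm (SHA-1) and fallback extension are assumptions.
function localAssetName(assetUrl, fallbackExt = '.bin') {
  const { pathname } = new URL(assetUrl);
  // Keep the original extension (lowercased) so servers pick the right MIME type.
  const ext = path.posix.extname(pathname).toLowerCase() || fallbackExt;
  // Same URL always hashes to the same name, so re-runs stay stable.
  const hash = crypto.createHash('sha1').update(assetUrl).digest('hex').slice(0, 10);
  return `asset_${hash}${ext}`;
}

// Hypothetical helper: place the file under the matching assets/ subfolder.
function localAssetPath(assetUrl, category) {
  return `./assets/${category}/${localAssetName(assetUrl)}`;
}

console.log(localAssetPath('https://example.com/img/Logo.PNG?v=3', 'images'));
// Prints a path of the form ./assets/images/asset_<10 hex chars>.png
```

Hashing the full URL, query string included, keeps two variants of the same path from colliding on disk.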
5. Framework-Preserving Output
📁 Output Structure:
website.com/
├── index.html        # Main page with framework intact
├── package.json      # Project metadata & serve scripts
├── README.md         # Usage instructions
├── server.js         # Optional Node.js static server
└── assets/
    ├── images/       # All images with optimized names
    ├── css/          # Stylesheets with localized assets
    ├── js/           # JavaScript files (cleaned if --clean)
    ├── fonts/        # Web fonts and typography
    ├── icons/        # Favicons and app icons
    └── media/        # Videos (.mp4, .webm, .ogg), audio files, and other media
Next.js + Microlink offline support (v1.0.2)
Modern sites often use:
- Next.js Image Optimizer: `/_next/image?url=<original>&w=<size>&q=<quality>`
- Microlink-based previews: `https://api.microlink.io/?url=...` returning either JSON or direct images
This tool:
- Skips downloading `/_next/image` directly (avoids 402s)
- Extracts the original image URL from the `url=` param and downloads that
- Aliases `/_next/image?...` to the same local file as the original
- Injects a runtime MutationObserver rewriter that:
  - Rewrites `src`, `href`, `poster`, inline `style` background-image
  - Rewrites `srcset` and `imagesrcset` (browsers prefer srcset over src)
  - Handles dynamically added DOM (hover cards, popovers, etc.)
- Captures Microlink responses; if JSON, follows to the actual screenshot URL and downloads bytes
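To make the optimizer handling concrete, here is a minimal sketch of two of the steps: recovering the original image from a `/_next/image` URL and rewriting a `srcset` value. Both function names are hypothetical and the code is not taken from the tool. The naive comma split also assumes candidate URLs contain no unencoded commas, which holds for typical optimizer URLs but not for every `srcset`.

```javascript
// Hypothetical helpers illustrating the idea; not the tool's actual code.

// Return the absolute original image URL behind a Next.js optimizer URL,
// or null when the URL is not an optimizer URL.
function originalFromNextImage(rawUrl, pageUrl) {
  const u = new URL(rawUrl, pageUrl);
  if (!u.pathname.endsWith('/_next/image')) return null;
  const original = u.searchParams.get('url'); // already percent-decoded
  return original ? new URL(original, pageUrl).href : null;
}

// Rewrite every candidate URL in a srcset while keeping its descriptor
// (for example "1x" or "640w"). `mapUrl` decides the replacement URL.
function rewriteSrcset(srcset, mapUrl) {
  return srcset
    .split(',')
    .map((candidate) => candidate.trim())
    .filter(Boolean)
    .map((candidate) => {
      const [url, ...descriptor] = candidate.split(/\s+/);
      return [mapUrl(url), ...descriptor].join(' ');
    })
    .join(', ');
}

const page = 'https://example.com/blog';
console.log(originalFromNextImage('/_next/image?url=%2Fimages%2Fhero.png&w=640&q=75', page));
// https://example.com/images/hero.png

const srcset =
  '/_next/image?url=%2Fa.png&w=640&q=75 1x, /_next/image?url=%2Fa.png&w=1080&q=75 2x';
console.log(rewriteSrcset(srcset, () => './assets/images/a.png'));
// ./assets/images/a.png 1x, ./assets/images/a.png 2x
```

Because every width variant of an optimizer URL points at the same original, mapping them all to one local file is what lets the mirror work without the optimizer endpoint.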
Verification
- Run with `--debug` and open DevTools Console
- Interact with the page (e.g., hover "Preview" links)
- Look for lines like:
[MW rewrite] imagesrcset: /_next/image?url=... -> ./assets/images/asset_dc814d3448.png 1x, ...
- Open the local asset path (e.g., http://localhost:8000/assets/images/asset_dc814d3448.png)
Troubleshooting (quick)
Blank hover/popover preview
- Serve over HTTP (not file://)
- Ensure `srcset`/`imagesrcset` are being rewritten (use `--debug`)
- Open the local asset URL from logs; if 404, rebuild the mirror
HTTP 402 from Next.js `/_next/image`
- Expected; the tool avoids these endpoints and downloads the original target from `url=`
Helpful snippet to locate candidates:
document.querySelectorAll('img, [style]').forEach(n => {
  const src = n.currentSrc || n.getAttribute('src') || '';
  const styleAttr = n.getAttribute('style') || '';
  const bg = getComputedStyle(n).backgroundImage || '';
  const hay = [src, styleAttr, bg].join(' ');
  if (/(microlink|_next\/image|og|twitter|card)/i.test(hay)) {
    console.log('el:', n, { src, styleAttr, bg });
  }
});
🔧 CLI Reference
Usage: mirror-web-cli <url> [options]
Arguments:
url Target website URL to mirror
Options:
-o, --output <dir> Custom output directory (default: domain name)
--clean Remove tracking scripts and analytics
--ai Enable AI-powered analysis (requires OpenAI API key)
--openai-key <key> OpenAI API key for AI features (or set OPENAI_API_KEY env var)
--debug Enable detailed debug logging
--timeout <ms> Page load timeout in milliseconds (default: 120000)
--headless <bool> Run browser in headless mode (default: true)
-h, --help Show help information
-V, --version Show version number
OpenAI API Key Priority
The tool checks for OpenAI API keys in this order:
1. `--openai-key` command line parameter
2. `OPENAI_API_KEY` environment variable
- If neither is found, AI features are disabled with a helpful message
- Keys must start with `sk-` (validated automatically)
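The lookup order above amounts to a small fallback chain. The following sketch shows one way to express it; `resolveOpenAIKey` and the shape of its return value are hypothetical, and the real CLI's messages will differ.

```javascript
// Hypothetical helper mirroring the documented priority order.
function resolveOpenAIKey(cliKey, env = process.env) {
  // 1. --openai-key wins; 2. otherwise fall back to OPENAI_API_KEY.
  const key = cliKey || env.OPENAI_API_KEY || null;

  if (!key) {
    return { key: null, reason: 'no key provided; AI features disabled' };
  }
  // Only keys with the "sk-" prefix are accepted.
  if (!key.startsWith('sk-')) {
    return { key: null, reason: 'key must start with "sk-"' };
  }
  return { key, reason: null };
}

console.log(resolveOpenAIKey('sk-from-flag', { OPENAI_API_KEY: 'sk-from-env' }).key);
// sk-from-flag
console.log(resolveOpenAIKey(undefined, { OPENAI_API_KEY: 'sk-from-env' }).key);
// sk-from-env
console.log(resolveOpenAIKey(undefined, {}).reason);
// no key provided; AI features disabled
```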
🏗️ Framework Support
| Framework | Detection | Preservation | Output Quality |
|-----------|-----------|--------------|----------------|
| React | ✅ High confidence | ✅ Component structure | ⭐⭐⭐⭐⭐ |
| Next.js | ✅ Advanced patterns | ✅ SSR/SSG structure | ⭐⭐⭐⭐⭐ |
| Vue.js | ✅ Reactive patterns | ✅ Template structure | ⭐⭐⭐⭐⭐ |
| Nuxt | ✅ SSR detection | ✅ Module organization | ⭐⭐⭐⭐⭐ |
| Angular | ✅ Component analysis | ✅ Module structure | ⭐⭐⭐⭐⭐ |
| Svelte | ✅ Store patterns | ✅ Component logic | ⭐⭐⭐⭐⭐ |
| Gatsby | ✅ GraphQL detection | ✅ Static generation | ⭐⭐⭐⭐⭐ |
| WordPress | ✅ Theme detection | ✅ Content structure | ⭐⭐⭐⭐ |
| Static Sites | ✅ Always works | ✅ Clean HTML/CSS/JS | ⭐⭐⭐⭐⭐ |
🧪 Usage Examples
Basic Website Mirroring
# Simple static site
mirror-web-cli https://example.com
# → Creates: ./example.com-standard/ with complete offline functionality
React Application
# React SPA with complex routing
mirror-web-cli https://react-app.com --clean
# → Creates: ./react-app.com-standard/ preserves React structure, removes tracking, offline-ready
Next.js Website
# Next.js with image optimization and error handling
mirror-web-cli https://nextjs-site.com --clean
# → Creates: ./nextjs-site.com-standard/ with enhanced Next.js compatibility
# → Handles /_next/image URLs, fixes hydration issues, preserves SSR structure
E-commerce Site
# Complex site with lots of assets
mirror-web-cli https://shop.example.com --debug --clean
# → Creates: ./shop.example.com-standard/ with detailed logging, removes analytics
AI-Powered Analysis (OpenAI)
Windows PowerShell:
# Set environment variable first
$env:OPENAI_API_KEY="sk-proj-your-openai-key-here"
mirror-web-cli https://complex-app.com --ai --clean
# → Creates: ./complex-app.com-ai-enhanced/ with OpenAI GPT-4o framework analysis
macOS/Linux:
# Set environment variable first
export OPENAI_API_KEY="sk-proj-your-openai-key-here"
mirror-web-cli https://complex-app.com --ai --clean
# → Creates: ./complex-app.com-ai-enhanced/ with OpenAI GPT-4o framework analysis
Cross-platform (using CLI parameter):
# Compare standard vs AI-enhanced outputs
mirror-web-cli https://react-app.com --clean # → ./react-app.com-standard/
mirror-web-cli https://react-app.com --ai --clean # → ./react-app.com-ai-enhanced/
Development Workflow
# Mirror for development reference
mirror-web-cli https://design-system.com -o ./reference
cd ./reference
npm start # Built-in development server
Video-Rich Websites
# Websites with hero videos (like VS Code, Apple, etc.)
mirror-web-cli https://code.visualstudio.com --clean
# → Downloads all video formats (.mp4, .webm), preserves video posters
# → Handles responsive video sources with media queries
# → Supports autoplay, muted, and poster attributes
# Complex video embedding
mirror-web-cli https://video-heavy-site.com --timeout 180000
# → Extended timeout for large video downloads
# → Maintains video element structure and JavaScript controls
🎨 Terminal UI Showcase
════════════════════════════════════════════════════════════════════════════════
🪞 Mirror Web CLI v1.1.3
Professional Website Mirroring
════════════════════════════════════════════════════════════════════════════════
✨ Features:
• Intelligent framework detection (React, Vue, Angular, Next.js, etc.)
• Framework-preserving output with professional structure
• Comprehensive asset extraction and optimization
• Clean code generation with tracking script removal
🚀 Quick Start:
mirror-web-cli https://example.com
mirror-web-cli https://react-app.com --clean -o ./my-project
Progress Tracking
╭──────────────────────────────────────────────────────────────────────────────╮
● Step 3/7 • Framework Analysis
Detecting technology stack and framework patterns...
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────╮
📦 Framework Analysis
Framework: Next.js
Confidence: 95% ████████████████████░
Complexity: HIGH
Strategy: Preserve DOM; localize assets for exact Next.js look
╰──────────────────────────────────────────────────────────────────────────────╯
🛡️ Privacy & Security
Tracking Removal (--clean flag)
- Google Analytics (gtag, ga, analytics.js)
- Google Tag Manager (gtm, dataLayer)
- Facebook Pixel (fbevents, facebook.com/tr)
- Service Workers (registration scripts)
- Third-party trackers (extensive database)
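As a rough illustration of how tracker removal can work, a cleaner might test each script's `src` and inline content against a pattern list. The patterns below cover only the categories named above and are illustrative; `isTrackingScript` is a hypothetical name, and the tool's own database is described as far more extensive.

```javascript
// Illustrative patterns only, derived from the categories listed above.
const TRACKER_PATTERNS = [
  /google-analytics\.com\/analytics\.js/i,       // Google Analytics (analytics.js)
  /googletagmanager\.com\/(gtag|gtm)/i,          // gtag.js and GTM loaders
  /\bgtag\s*\(/,                                 // inline gtag(...) calls
  /\bdataLayer\b/,                               // GTM data layer
  /connect\.facebook\.net\/.*fbevents\.js/i,     // Facebook Pixel loader
  /facebook\.com\/tr\b/i,                        // Facebook Pixel beacon
  /serviceWorker\.register\s*\(/,                // service worker registration
];

// Hypothetical helper: true when a script looks like tracking code.
// `src` is the script URL (if any); `content` is its inline body (if any).
function isTrackingScript({ src = '', content = '' }) {
  const haystack = `${src}\n${content}`;
  return TRACKER_PATTERNS.some((re) => re.test(haystack));
}

console.log(isTrackingScript({ src: 'https://www.googletagmanager.com/gtag/js?id=G-XXXX' }));
// true
console.log(isTrackingScript({ content: "navigator.serviceWorker.register('/sw.js')" }));
// true
console.log(isTrackingScript({ src: '/_next/static/chunks/main.js' }));
// false
```

Pattern matching of this kind can produce false positives, for example an application script that happens to reference `dataLayer`, which is one reason to keep cleaning optional.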
Safety Considerations
- Always respect robots.txt and terms of service
- Ensure you have permission to mirror content
- Use responsibly and ethically
- Consider rate limiting for large sites
🏗️ Architecture Overview
src/
├── cli.js # Command-line interface & argument parsing
├── core/ # Core functionality modules
│ ├── mirror-cloner.js # Main orchestrator class
│ ├── browser-engine.js # Puppeteer browser management
│ ├── framework-analyzer.js # Intelligent framework detection
│ ├── asset-manager.js # Comprehensive asset extraction
│ ├── framework-writer.js # Output generation & structure
│ ├── display.js # Beautiful terminal UI system
│ ├── logger.js # Logging & warning management
│ ├── file-writer.js # File system operations
│ ├── filename-utils.js # Smart filename generation
│ └── server.js # Optional static server
└── ai/ # AI-powered analysis (optional)
    └── ai-analyzer.js # OpenAI integration for analysis
🧩 Extending the Tool
Adding New Framework Detection
// In src/core/framework-analyzer.js
this.frameworks.myframework = {
name: 'My Framework',
patterns: [
{ type: 'script', pattern: /myframework\.js/ },
{ type: 'element', selector: '#my-app' },
{ type: 'meta', name: 'generator', pattern: /myframework/i }
]
};
Custom Asset Processing
// In src/core/asset-manager.js
async extractCustomAssets() {
// Add your custom asset extraction logic
}
🤝 Contributing
We welcome contributions! Here's how to get started:
# Development setup
git clone https://github.com/SanjeevSaniel/mirror-web-cli.git
cd mirror-web-cli
npm install
# Run tests
npm test
# Development with debugging
npm run dev -- https://example.com --debug
Key Areas for Contribution
- Framework Detection: Add support for new frameworks
- Asset Processing: Improve extraction algorithms
- Output Optimization: Enhance generated code quality
- Terminal UI: Improve user experience
- Documentation: Help others understand the tool
🐛 Troubleshooting
Common Issues
"Cannot read properties of undefined" Error
- Fixed in v1.0 - update to latest version
- Use `--debug` flag for detailed error information
Incomplete Asset Loading
- Increase timeout: `--timeout 180000` (3 minutes)
- Check network connectivity
- Some dynamic content may require JavaScript enabled
Framework Not Detected
- Use `--debug` to see detection process
- Framework patterns may need updating for newer versions
- Manual inspection may be needed for custom frameworks
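When working out why a framework was missed, it can help to picture detection as matching a pattern list against signals collected from the page. The sketch below reuses the pattern shape from the "Adding New Framework Detection" example in this README. `scoreFramework`, the `signals` object, and the "fraction of matched patterns" confidence are all assumptions for illustration and do not describe how the tool computes its score.

```javascript
// Sketch only: scoring rule and `signals` shape are hypothetical.
function scoreFramework(framework, signals) {
  const hits = framework.patterns.filter((p) => {
    if (p.type === 'script') {
      return signals.scriptSrcs.some((src) => p.pattern.test(src));
    }
    if (p.type === 'element') {
      return signals.hasSelector(p.selector);
    }
    if (p.type === 'meta') {
      return p.pattern.test(signals.meta[p.name] || '');
    }
    return false; // unknown pattern types never match
  });

  const total = framework.patterns.length;
  return {
    name: framework.name,
    confidence: total ? hits.length / total : 0, // assumed scoring rule
    hits,
  };
}

// Example patterns for a Next.js-like site (illustrative, not the tool's own).
const nextLike = {
  name: 'Next.js',
  patterns: [
    { type: 'script', pattern: /\/_next\/static\// },
    { type: 'element', selector: '#__next' },
    { type: 'meta', name: 'generator', pattern: /next\.js/i },
  ],
};

// Signals as they might be gathered from a loaded page.
const signals = {
  scriptSrcs: ['/_next/static/chunks/main.js'],
  hasSelector: (selector) => selector === '#__next',
  meta: {}, // no generator tag on this page
};

const result = scoreFramework(nextLike, signals);
console.log(result.name, `${result.hits.length}/${nextLike.patterns.length} patterns matched`);
// Next.js 2/3 patterns matched
```

Under this model, a miss usually means the site's bundle paths or root element no longer match any pattern, which is the kind of mismatch the `--debug` output can help pinpoint.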
Environment Variable Issues
Windows PowerShell "export command not found":
# ❌ Wrong (Bash syntax)
export OPENAI_API_KEY="sk-..."
# ✅ Correct (PowerShell syntax)
$env:OPENAI_API_KEY="sk-..."
Windows Command Prompt:
# ✅ Correct (CMD syntax)
set OPENAI_API_KEY=sk-your-key-here
Verify environment variable is set:
# PowerShell
echo $env:OPENAI_API_KEY
# Command Prompt
echo %OPENAI_API_KEY%
# Bash/Zsh
echo $OPENAI_API_KEY
AI Features Not Working
- Verify OpenAI API key is set correctly (see above)
- Check API key format: Must start with `sk-`
- Ensure sufficient OpenAI credits/quota
- Use `--debug` to see AI analysis process
Blank Screen or Empty Content
Iframe-based sites (like hitesh.ai):
Some sites are just iframe wrappers pointing to external URLs
Example: `hitesh.ai` loads `hiteshchoudhary.com` in an iframe
Solution: Mirror the actual content site directly:
# Instead of the wrapper
mirror-web-cli https://hitesh.ai
# Mirror the actual content
mirror-web-cli https://hiteshchoudhary.com --clean
Sites with heavy JavaScript dependencies:
Some React/Next.js sites may need additional processing
Try AI-enhanced mode for better framework handling:
mirror-web-cli https://your-site.com --ai --clean
Getting Help
- Check the GitHub Issues
- Use `--debug` flag for detailed logging
- Include error output when reporting bugs
📊 Performance Stats
- Average Processing Time: 15-45 seconds per site
- Asset Extraction Rate: 95%+ success rate
- Framework Detection Accuracy: 90%+ for supported frameworks
- Memory Usage: Optimized for large sites (>1000 assets)
🙏 Acknowledgments
Special thanks to the amazing open-source community:
- Puppeteer - Headless browser automation
- Cheerio - Server-side HTML parsing
- Chalk - Terminal styling
- Commander - CLI framework
- Sharp - Image processing
📄 License
MIT License - see LICENSE file for details.
Made with ❤️ by Sanjeev Saniel Kujur
Convert any website to universal HTML/CSS/JS with intelligent framework preservation!
