website-source-extractor
v1.0.0
Website Source Extractor
⭐ If you find this tool helpful, please consider giving it a star on GitHub! ⭐
A command-line tool that extracts source code and assets from websites for analysis and offline viewing.
Features
- Extract HTML, JavaScript, CSS, and media assets from websites
- Process source maps to recover original source code
- Support for iframe processing with configurable depth
- Save all content to a configurable output directory
- Detect npm dependencies used in JavaScript files
- Verbose logging options for detailed output
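As a rough illustration of the source-map feature above: a Source Map v3 file lists original file paths in `sources` and may embed their text in `sourcesContent`. The sketch below shows one way that embedded text could be recovered. The `recoverSources` helper and the path-cleaning rules are assumptions made for this example, not the tool's actual implementation.

```typescript
// Minimal shape of a Source Map v3 document; only the fields used here.
interface SourceMapV3 {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
}

// Hypothetical helper (not part of the tool's API): maps each original
// source path to its embedded content, skipping sources without content.
function recoverSources(rawMap: string): Map<string, string> {
  const map = JSON.parse(rawMap) as SourceMapV3;
  const contents = map.sourcesContent ?? [];
  const recovered = new Map<string, string>();

  map.sources.forEach((source, i) => {
    const content = contents[i];
    if (typeof content !== "string") return; // content was not embedded

    // Drop scheme prefixes such as "webpack://" and any "." or ".."
    // segments so the path cannot escape the output directory.
    const safePath = source
      .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "")
      .split("/")
      .filter((part) => part !== "" && part !== "." && part !== "..")
      .join("/");

    recovered.set(safePath, content);
  });

  return recovered;
}

// Example input: the second source has no embedded content.
const exampleMap = JSON.stringify({
  version: 3,
  sources: ["webpack://app/./src/index.ts", "webpack://app/./src/missing.ts"],
  sourcesContent: ["export const answer = 42;\n", null],
  mappings: "",
});

const recoveredFiles = recoverSources(exampleMap);
console.log(Array.from(recoveredFiles.keys()));
```

Source maps that omit `sourcesContent` only reference the original files, so nothing can be recovered from them with this approach.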
Installation
Global Installation (Recommended)
npm install -g website-source-extractor
Local Installation
npm install website-source-extractor
Usage
Basic Usage
website-source-extractor https://example.com
With Options
website-source-extractor https://example.com --output ./my-extracted-site --verbose --max-depth 2
Command Options
| Option | Alias | Description | Default |
| --------------- | ----- | ------------------------------------------------------ | ------------------------ |
| --output | -o | Output directory for extracted content | ./extracted-{hostname} |
| --verbose | -v | Enable verbose logging | false |
| --save-assets | -a | Save all assets (images, CSS, JS, and process iframes) | true |
| --max-depth | -d | Maximum depth for processing iframes | 2 |
| --help | | Show help | |
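The defaults in the table can be read as a small options object. The following sketch shows how such defaults might be combined with user overrides, including deriving the `./extracted-{hostname}` directory from the target URL. The `ExtractOptions` type and `resolveOptions` function are hypothetical names chosen for illustration; the tool's internal option handling may differ.

```typescript
// Hypothetical option shape mirroring the table above.
interface ExtractOptions {
  output: string;
  verbose: boolean;
  saveAssets: boolean;
  maxDepth: number;
}

// Hypothetical helper: applies the documented defaults, then any overrides.
function resolveOptions(
  targetUrl: string,
  overrides: Partial<ExtractOptions> = {}
): ExtractOptions {
  // new URL() throws a TypeError if targetUrl is not a valid absolute URL.
  const { hostname } = new URL(targetUrl);
  return {
    output: `./extracted-${hostname}`,
    verbose: false,
    saveAssets: true,
    maxDepth: 2,
    ...overrides,
  };
}

console.log(resolveOptions("https://example.com/docs?page=1"));
console.log(resolveOptions("https://example.com", { saveAssets: false, maxDepth: 1 }));
```

Under this assumption, running the tool against `https://example.com` without `--output` would write to `./extracted-example.com`.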
Examples
Extract a Single-Page Application with Iframes
website-source-extractor https://my-react-app.com --save-assets --max-depth 2
Extract Only HTML and JavaScript (No Assets)
website-source-extractor https://example.com --no-save-assets
Troubleshooting
CORS Issues
If you encounter CORS errors when extracting from certain websites, this is expected behavior: the tool respects web security policies, so some assets may not be accessible.
Large Websites
For large websites with many assets, the extraction process may take some time. Use the --verbose flag to see detailed progress.
Memory Issues
If you encounter memory issues with very large websites, try extracting specific sections or limiting the iframe depth with --max-depth.
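To see why a lower `--max-depth` reduces the amount of work, consider a depth-limited walk over nested iframes. The sketch below is an assumption about how such a limit could behave, not the tool's actual code: it treats the top-level page as depth 0, uses a naive regular expression instead of an HTML parser, and takes a hypothetical `fetchPage` function so it can run against in-memory pages.

```typescript
type FetchPage = (url: string) => Promise<string>;

// Naive iframe finder for illustration; a real tool would parse the HTML.
function findIframeSrcs(html: string, baseUrl: string): string[] {
  const srcs: string[] = [];
  const pattern = /<iframe\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const src = match[1];
    if (src) srcs.push(new URL(src, baseUrl).href); // resolve relative URLs
  }
  return srcs;
}

// Visits a page and its iframes, stopping once depth exceeds maxDepth.
async function crawlFrames(
  url: string,
  fetchPage: FetchPage,
  maxDepth: number,
  depth: number = 0,
  seen: Set<string> = new Set<string>()
): Promise<string[]> {
  if (depth > maxDepth || seen.has(url)) return [];
  seen.add(url); // avoid loops when frames reference each other

  const html = await fetchPage(url);
  const visited: string[] = [url];
  for (const src of findIframeSrcs(html, url)) {
    const nested = await crawlFrames(src, fetchPage, maxDepth, depth + 1, seen);
    visited.push(...nested);
  }
  return visited;
}

// In-memory pages standing in for real HTTP responses.
const fakePages: Record<string, string> = {
  "https://a.test/": '<iframe src="/one"></iframe>',
  "https://a.test/one": '<iframe src="/two"></iframe>',
  "https://a.test/two": '<iframe src="/three"></iframe>',
  "https://a.test/three": "<p>never fetched at max depth 2</p>",
};
const fakeFetch: FetchPage = async (url) => fakePages[url] ?? "";

crawlFrames("https://a.test/", fakeFetch, 2).then((urls) => console.log(urls));
```

With a limit of 2, the walk fetches the page and two levels of nested frames and never requests the fourth page; each extra level can multiply the number of documents held in memory.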
Support this Project
If you find this tool valuable for your work, please consider supporting its development:
⭐ Star the Repository
The simplest way to show your support is to star the project on GitHub.
💖 Sponsor
You can financially support this project through:
🤝 Contribute
Contributions are welcome! Check out the contribution guidelines.
Development
Building from Source
git clone https://github.com/kabonkoda/website-source-extractor.git
cd website-source-extractor
npm install
npm run build
Running in Development Mode
npm run dev -- https://example.com
License
MIT
Author
Adeyeye Oluwatobiloba (@kabonkoda)
