sourcethecode (stc)
A powerful CLI tool and library to download and parse source code from any website, extracting TypeScript, JavaScript, CSS, and other source files in a readable format similar to Chrome DevTools.
Features
- Universal compatibility: Works with any website, not just specific platforms
- Multi-source extraction: Captures files from network requests, webpack bundles, source maps, and inline scripts
- TypeScript/TSX support: Properly identifies and saves TypeScript and React TypeScript files
- Source map parsing: Extracts original source files from source maps
- Directory structure: Maintains original file hierarchy
- Comprehensive reporting: Generates detailed download reports
- Duplicate detection: Avoids downloading duplicate files
- Error handling: Robust error handling with detailed logging
- CLI & Library: Use as a command-line tool or import as a library
Installation
Global Installation (CLI)
npm install -g sourcethecode
Local Installation (Library)
npm install sourcethecode
Development Installation
git clone <repository-url>
cd sourcethecode
npm install
npx playwright install chromium
CLI Usage
Basic Usage
# Using full command name
sourcethecode https://example.com
# Using short alias
stc https://example.com
# With custom output directory
sourcethecode https://react.dev -o ./react-source
stc https://vuejs.org --output ./vue-source
CLI Options
sourcethecode <url> [options]
Options:
  -o, --output <dir>   Output directory (default: ./downloaded_source)
  -h, --help           Show this help message
  -v, --version        Show version
Examples
# Download from any website
sourcethecode https://github.com
# Save to specific directory
stc https://stackoverflow.com -o ./stackoverflow-source
# Download from documentation sites
sourcethecode https://developer.mozilla.org -o ./mdn-docs
Library Usage
Basic Usage
import { SourceDownloaderAndParser } from 'sourcethecode';

async function downloadSource() {
  const downloader = new SourceDownloaderAndParser(
    'https://example.com',
    './my-download-folder'
  );
  try {
    const sources = await downloader.downloadAndParse();
    console.log(`Downloaded ${sources.length} files`);
  } catch (error) {
    console.error('Download failed:', error);
  }
}

downloadSource();
Advanced Usage
import { SourceDownloaderAndParser } from 'sourcethecode';

async function advancedDownload() {
  const downloader = new SourceDownloaderAndParser('https://example.com');
  // Custom file filtering: wrap saveFiles so only TypeScript sources are written
  const originalSaveFiles = downloader.saveFiles;
  downloader.saveFiles = async (files) => {
    const tsFiles = files.filter(f =>
      f.fileType === 'typescript' || f.fileType === 'typescript-react'
    );
    return originalSaveFiles.call(downloader, tsFiles);
  };
  const sources = await downloader.downloadAndParse();
  return sources;
}
How It Works
1. Browser Initialization
- Launches Chromium browser with appropriate settings
- Sets up request interception for capturing network traffic
- Configures optimal viewport and user agent
2. Source Extraction Methods
Network Files
- Downloads all external JavaScript and CSS files
- Preserves original filenames and directory structure
- Handles relative and absolute URLs
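As an illustration of how relative and absolute URLs can be mapped to a local path that keeps the original directory structure, here is a minimal sketch. The helper name urlToLocalPath and its path layout are assumptions made for this example and are not part of the sourcethecode API.

```typescript
// Hypothetical helper, not part of the sourcethecode API.
// Resolves a resource URL against the page URL and maps it under outputDir.
function urlToLocalPath(resourceUrl: string, pageUrl: string, outputDir: string): string {
  // new URL(input, base) resolves relative references and leaves absolute URLs unchanged.
  const resolved = new URL(resourceUrl, pageUrl);
  // Drop empty, "." and ".." segments so the result cannot escape outputDir.
  const segments = resolved.pathname
    .split('/')
    .filter((segment) => segment !== '' && segment !== '.' && segment !== '..');
  // A bare origin such as https://example.com/ has no file name; assume index.html.
  if (segments.length === 0) {
    segments.push('index.html');
  }
  return [outputDir.replace(/\/+$/, ''), ...segments].join('/');
}

console.log(urlToLocalPath('/static/js/main.js', 'https://example.com/docs/', './downloaded_source'));
console.log(urlToLocalPath('../css/app.css', 'https://example.com/docs/guide/', './downloaded_source'));
```

The query string and fragment are ignored here; a real implementation would need to decide whether two URLs that differ only in their query string are the same file.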
Webpack Sources
- Extracts modules from webpack bundles
- Identifies source files through webpack internals
- Recovers original module structure
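Module paths emitted by webpack commonly look like webpack://<namespace>/./src/index.ts, sometimes with a loader query string appended. The sketch below shows one way such a path could be turned into a relative file path. The function normalizeWebpackPath is hypothetical; the rules sourcethecode actually applies are not documented here.

```typescript
// Hypothetical helper; the exact rules used by sourcethecode may differ.
function normalizeWebpackPath(source: string): string {
  return source
    .replace(/^webpack:\/\/[^/]*\//, '') // strip "webpack://<namespace>/"
    .replace(/\?.*$/, '')                // drop a trailing query string such as "?1a2b"
    .split('/')
    .filter((part) => part !== '' && part !== '.' && part !== '..')
    .join('/');
}

console.log(normalizeWebpackPath('webpack://my-app/./src/components/Header.tsx'));
console.log(normalizeWebpackPath('webpack:///src/index.ts?1a2b'));
```

Dropping ".." segments keeps every recovered module inside the output directory, at the cost of flattening paths that pointed above the project root.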
Source Maps
- Parses source map files to extract original TypeScript/TSX sources
- Recovers original file structure, comments, and formatting
- Handles both inline and external source maps
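A version 3 source map lists original file paths in sources and, optionally, their full text in sourcesContent. The following sketch shows how original files could be recovered from those two fields, and how an inline base64 data URL could be decoded first. The names RawSourceMap, RecoveredFile, decodeInlineSourceMap, and extractSources are assumptions for this example, not the library's API.

```typescript
// Only the source map fields used by this sketch.
interface RawSourceMap {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
}

interface RecoveredFile {
  path: string;
  content: string;
}

// Hypothetical helper: decode "data:application/json;base64,...." into a source map object.
function decodeInlineSourceMap(dataUrl: string): RawSourceMap {
  const marker = 'base64,';
  const start = dataUrl.indexOf(marker);
  if (dataUrl.indexOf('data:') !== 0 || start === -1) {
    throw new Error('Not a base64 data URL');
  }
  const binary = atob(dataUrl.slice(start + marker.length));
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Decode as UTF-8 so non-ASCII characters in the sources survive.
  return JSON.parse(new TextDecoder().decode(bytes)) as RawSourceMap;
}

// Hypothetical helper: pair each source path with its embedded content.
function extractSources(map: RawSourceMap): RecoveredFile[] {
  const contents = map.sourcesContent ?? [];
  const files: RecoveredFile[] = [];
  map.sources.forEach((path, i) => {
    const content = contents[i];
    // Entries without embedded content cannot be recovered from the map alone.
    if (typeof content === 'string') {
      files.push({ path, content });
    }
  });
  return files;
}

const demoMap: RawSourceMap = {
  version: 3,
  sources: ['src/index.ts', 'src/missing.ts'],
  sourcesContent: ['export const x = 1;', null],
};
console.log(extractSources(demoMap));
```

Files whose sourcesContent entry is missing or null are skipped, which is one reason a site with source maps may still yield incomplete results.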
Inline Scripts
- Captures inline JavaScript and CSS
- Saves as separate files with descriptive names
- Preserves execution order
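One simple way to give inline blocks descriptive names while keeping their order is a zero-padded counter per kind. This is a sketch only: the InlineBlock shape, the nameInlineBlocks helper, and the inline-script-001.js naming pattern are assumptions, not the names the tool actually writes.

```typescript
// Hypothetical input shape for an inline <script> or <style> block.
interface InlineBlock {
  kind: 'script' | 'style';
  content: string;
}

// Assigns names such as inline-script-001.js in document order.
function nameInlineBlocks(blocks: InlineBlock[]): { fileName: string; content: string }[] {
  const counters: Record<'script' | 'style', number> = { script: 0, style: 0 };
  return blocks.map((block) => {
    counters[block.kind] += 1;
    // Zero-pad so that alphabetical order matches document order.
    const index = ('000' + counters[block.kind]).slice(-3);
    const ext = block.kind === 'script' ? 'js' : 'css';
    return { fileName: `inline-${block.kind}-${index}.${ext}`, content: block.content };
  });
}

console.log(
  nameInlineBlocks([
    { kind: 'script', content: 'window.a = 1;' },
    { kind: 'style', content: 'body { margin: 0; }' },
    { kind: 'script', content: 'window.b = 2;' },
  ]).map((file) => file.fileName)
);
```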
3. File Organization
- Creates organized directory structure
- Maintains original file hierarchy
- Sanitizes filenames for filesystem compatibility
- Generates comprehensive download report
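Filename sanitization could look roughly like the sketch below, which replaces characters that are invalid or awkward on common filesystems. The sanitizeFileName helper, the replacement character, and the 200-character cap are assumptions for illustration; the tool's real rules are not specified here.

```typescript
// Hypothetical helper; applies to a single path segment, not a full path.
function sanitizeFileName(raw: string): string {
  const cleaned = raw
    .replace(/[<>:"\/\\|?*\x00-\x1f]/g, '_') // reserved on Windows, or path separators
    .replace(/\s+/g, '_')                     // avoid whitespace in file names
    .replace(/^\.+/, '')                      // no hidden files, no "." or ".." names
    .slice(0, 200);                           // stay below common 255-byte name limits
  return cleaned === '' ? 'unnamed' : cleaned;
}

console.log(sanitizeFileName('main.js?v=1.2'));
console.log(sanitizeFileName('my file<1>.ts'));
```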
Output Structure
downloaded_source/
├── src/
│ ├── components/
│ │ ├── Header.tsx
│ │ └── Footer.tsx
│ ├── utils/
│ │ └── helpers.ts
│ └── index.ts
├── styles/
│ ├── main.css
│ └── components/
│ └── button.css
├── static/
│ ├── js/
│ │ └── vendor/
│ └── css/
└── download_report.json
Configuration
Environment Variables
- HEADLESS=false: Run the browser in visible mode (default: headless)
- DEBUG=pw:api: Enable Playwright debug logging
Browser Options
The library uses sensible defaults, but you can extend the class to customize:
- Browser viewport size
- User agent string
- Network timeout settings
- Additional browser arguments
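The class's extension points are not documented here, so the sketch below only models the option set listed above as a plain object merged over defaults. Every name and default value in it (BrowserOptions, DEFAULT_OPTIONS, resolveBrowserOptions, the 1280x720 viewport, the 30000 ms timeout) is an assumption for illustration.

```typescript
// Hypothetical option set mirroring the customizable items listed above.
interface BrowserOptions {
  headless: boolean;
  viewport: { width: number; height: number };
  userAgent?: string;
  timeoutMs: number;
  args: string[];
}

// Assumed defaults; the library's real values are not documented here.
const DEFAULT_OPTIONS: BrowserOptions = {
  headless: true,
  viewport: { width: 1280, height: 720 },
  timeoutMs: 30000,
  args: [],
};

function resolveBrowserOptions(
  overrides: Partial<BrowserOptions>,
  env: Record<string, string | undefined>
): BrowserOptions {
  const merged: BrowserOptions = { ...DEFAULT_OPTIONS, ...overrides };
  // HEADLESS=false switches to a visible browser and wins over code-level overrides.
  if (env['HEADLESS'] === 'false') {
    merged.headless = false;
  }
  return merged;
}

console.log(resolveBrowserOptions({ timeoutMs: 60000 }, { HEADLESS: 'false' }));
```

Letting the environment variable override code-level settings is a design choice made for this sketch; it makes it easy to watch a run without editing code.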
Supported File Types
| Extension | Type | Description |
| --------- | ---------------- | ------------------------ |
| .ts | TypeScript | TypeScript source files |
| .tsx | TypeScript React | React TypeScript files |
| .js | JavaScript | JavaScript source files |
| .jsx | JavaScript React | React JavaScript files |
| .css | CSS | Stylesheet files |
| .scss | SCSS | Sass stylesheets (SCSS syntax) |
| .sass | Sass | Sass stylesheets (indented syntax) |
| .html | HTML | HTML template files |
| .json | JSON | Configuration/data files |
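The table above can be read as an extension lookup. In the sketch below, the fileType strings 'typescript' and 'typescript-react' match the values used in the Advanced Usage example; all other strings, the FILE_TYPES map, and the detectFileType helper are assumptions.

```typescript
// Extension to fileType lookup. Only 'typescript' and 'typescript-react' are
// confirmed by this README; the remaining values are assumed for illustration.
const FILE_TYPES: Record<string, string> = {
  '.ts': 'typescript',
  '.tsx': 'typescript-react',
  '.js': 'javascript',
  '.jsx': 'javascript-react',
  '.css': 'css',
  '.scss': 'scss',
  '.sass': 'sass',
  '.html': 'html',
  '.json': 'json',
};

// Hypothetical helper: classify a file name or URL path by its extension.
function detectFileType(fileName: string): string {
  // Ignore query strings and fragments such as "main.css?v=3".
  const clean = fileName.split(/[?#]/)[0] ?? '';
  const dot = clean.lastIndexOf('.');
  if (dot === -1) {
    return 'unknown';
  }
  return FILE_TYPES[clean.slice(dot).toLowerCase()] ?? 'unknown';
}

console.log(detectFileType('Header.tsx'));
console.log(detectFileType('main.CSS?v=3'));
```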
Troubleshooting
Common Issues
"Cannot find module 'playwright'"
npm install
npx playwright install chromium
Permission errors on Linux/macOS
sudo npx playwright install-deps chromium
Network timeouts
- Check internet connection
- Verify the target URL is accessible
- Try increasing timeout in browser options
Missing source files
- Some files may be dynamically loaded
- Check browser console for errors during manual inspection
- Try increasing wait times
Debug Mode
Enable debug logging:
DEBUG=pw:api sourcethecode https://example.com
API Reference
SourceDownloaderAndParser
Constructor
new SourceDownloaderAndParser(url: string, outputDir?: string)
Methods
- downloadAndParse(): Promise<ParsedSource[]> - Main method to download and parse all sources
- initialize(): Promise<void> - Initialize browser and page
- navigateToSite(): Promise<void> - Navigate to target URL
- close(): Promise<void> - Clean up browser resources
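If the lifecycle methods are called directly, a try/finally wrapper helps make sure close() runs even when a step fails. This README does not say whether downloadAndParse() already calls initialize() and close() internally, so treat the following as a sketch under that uncertainty. DownloaderLifecycle, withDownloader, and stubDownloader are hypothetical names, and the stub stands in for a real instance so the example runs without a browser.

```typescript
// Structural type covering only the lifecycle methods listed above.
interface DownloaderLifecycle {
  initialize(): Promise<void>;
  navigateToSite(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical wrapper: runs `work` between navigateToSite() and close().
async function withDownloader<T>(
  downloader: DownloaderLifecycle,
  work: () => Promise<T>
): Promise<T> {
  await downloader.initialize();
  try {
    await downloader.navigateToSite();
    return await work();
  } finally {
    // Runs on success and on failure, but only after initialize() succeeded.
    await downloader.close();
  }
}

// Stand-in object so the sketch runs without launching a browser.
const stubDownloader: DownloaderLifecycle = {
  initialize: async () => console.log('initialize'),
  navigateToSite: async () => console.log('navigateToSite'),
  close: async () => console.log('close'),
};

withDownloader(stubDownloader, async () => 'done').then((value) => console.log(value));
```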
Interfaces
interface SourceFile {
  url: string;
  content: string;
  fileName: string;
  fileType: string;
}

interface ParsedSource {
  originalUrl: string;
  localPath: string;
  content: string;
  fileType: string;
}
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development
# Install dependencies
npm install
# Install Playwright browsers
npx playwright install chromium
# Run in development mode
npm run dev
# Test CLI locally
npm link
sourcethecode https://example.com
License
MIT License - feel free to use this tool for any purpose.
Changelog
v1.0.0
- Initial release
- CLI support with sourcethecode and stc commands
- Library support for programmatic usage
- Support for all major file types
- Source map parsing
- Comprehensive reporting
- Cross-platform compatibility
