openscrape-cli
v1.0.0
OpenScrape CLI
Interactive CLI tool for web scraping with Puppeteer. Extract titles, descriptions, links, headings, paragraphs, and full text from any website.
Installation
Global Installation
npm install -g openscrape-cli
Local Installation
npm install openscrape-cli
Usage
Interactive Mode
openscrape
Or, if installed locally:
npx openscrape-cli
Using the MCP Server
To use the MCP server capabilities, run:
node mcp-server.js
The MCP server provides two tools:
- scrape_page: Scrape a single web page
- crawl_pages: Crawl multiple pages starting from a URL
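An MCP client invokes tools like these with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for scrape_page. The `url` argument name is an assumption, since the tool's input schema is not documented here, and `buildToolCall` is a hypothetical helper, not part of this package.

```javascript
// Hypothetical helper: builds a JSON-RPC 2.0 request for MCP's tools/call method.
function buildToolCall(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Assumption: scrape_page accepts a `url` argument.
const request = buildToolCall(1, "scrape_page", { url: "https://example.com" });

// Serialize as a single line of JSON, the framing used by MCP's stdio transport.
console.log(JSON.stringify(request));
```

A real client would complete the MCP initialize handshake before sending this request; an MCP client SDK or an AI assistant's MCP integration normally handles that step.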
Features
- Interactive CLI: User-friendly terminal interface with colorful prompts
- Screenshot Support: Capture full page or viewport screenshots
- Multiple Output Formats: Save results as Markdown, XML, or HTML
- Comprehensive Extraction: Extract titles, descriptions, links, headings, paragraphs, and full text
- MCP Server: Model Context Protocol server for integration with AI assistants
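To illustrate what a Markdown export of the extracted fields could look like, here is a minimal sketch. The result object's shape (`title`, `description`, `headings`, `links`), the sample values, and the `toMarkdown` helper are all hypothetical; the tool's actual output layout may differ.

```javascript
// Hypothetical shape of a scrape result; the real tool's field names may differ.
const result = {
  title: "Example Domain",
  description: "An illustrative page.",
  headings: ["Example Domain"],
  links: [{ text: "More information", href: "https://www.iana.org/domains/example" }],
};

// Hypothetical helper: renders the result as a small Markdown document.
function toMarkdown(page) {
  const lines = [`# ${page.title}`, "", page.description, "", "## Headings"];
  for (const heading of page.headings) lines.push(`- ${heading}`);
  lines.push("", "## Links");
  for (const link of page.links) lines.push(`- [${link.text}](${link.href})`);
  return lines.join("\n");
}

console.log(toMarkdown(result));
```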
Example
 ██████╗ ██████╗ ███████╗███╗   ██╗███████╗ ██████╗██████╗  █████╗ ██████╗ ███████╗
██╔═══██╗██╔══██╗██╔════╝████╗  ██║██╔════╝██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔════╝
██║   ██║██████╔╝█████╗  ██╔██╗ ██║███████╗██║     ██████╔╝███████║██████╔╝█████╗
██║   ██║██╔═══╝ ██╔══╝  ██║╚██╗██║╚════██║██║     ██╔══██╗██╔══██║██╔═══╝ ██╔══╝
╚██████╔╝██║     ███████╗██║ ╚████║███████║╚██████╗██║  ██║██║  ██║██║     ███████╗
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝ ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝     ╚══════╝
Current directory: /your/project/directory
────────────────────────────────────────────────────────────────────────────────
┌─ > Enter the URL to scrape:─┐
Configuration
Create a config/default.yaml file to configure scraping options:
urls:
  - https://example.com
  - https://example.org
screenshots:
  enabled: true
  dir: ./output/screenshots
output:
  format: md
  path: ./output/scraped.md
delay: 1000
Requirements
- Node.js 18+
- npm
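A quick way to confirm the Node.js requirement before installing is to compare the running version against the minimum. This is a minimal sketch; `meetsNodeRequirement` is a hypothetical helper and is not shipped with the package.

```javascript
// Hypothetical helper: true when a Node.js version string meets the minimum major version.
function meetsNodeRequirement(version, minMajor = 18) {
  // Accept both "18.19.0" and "v18.19.0" style strings.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running Node.js version without a "v" prefix.
console.log(
  meetsNodeRequirement(process.versions.node)
    ? "Node.js version is supported"
    : `Node.js 18+ is required, found ${process.versions.node}`
);
```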
License
ISC
