just-scrape (v1.1.0)
ScrapeGraph AI CLI tool — made with love by the ScrapeGraphAI team 💜

Command-line interface for ScrapeGraph AI — AI-powered web scraping, data extraction, search, crawling, and page-change monitoring.
v1.0.0 — SDK v2 migration. This release migrates the CLI to the scrapegraph-js v2 SDK. The v1 endpoints (smart-scraper, search-scraper, markdownify, sitemap, agentic-scraper, generate-schema) have been removed. Use scrape --format … for multi-format output, extract for structured data, and the new monitor command for page-change tracking.
Project Structure
just-scrape/
├── src/
│ ├── cli.ts # Entry point, citty main command + subcommands
│ ├── lib/
│ │ ├── env.ts # Env config (API key, JUST_SCRAPE_* → SGAI_* bridge)
│ │ ├── folders.ts # API key resolution + interactive prompt
│ │ └── log.ts # Logger factory + syntax-highlighted JSON output
│ ├── commands/
│ │ ├── scrape.ts # scrape — multi-format (markdown/html/screenshot/json/…)
│ │ ├── extract.ts # extract — structured extraction with prompt + schema
│ │ ├── search.ts # search — web search + extraction
│ │ ├── crawl.ts # crawl — multi-page crawling (polls until done)
│ │ ├── monitor.ts # monitor — create/list/get/update/delete/pause/resume/activity
│ │ ├── history.ts # history — paginated browser for past requests
│ │ ├── credits.ts # credits — balance + job quotas
│ │ └── validate.ts # validate — API key health check
│ └── utils/
│ └── banner.ts # ASCII banner + version from package.json
├── dist/ # Build output (git-ignored)
│ └── cli.mjs # Bundled ESM with shebang
├── tests/
│ └── smoke.test.ts # SDK export smoke test
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── biome.json
└── .gitignore
Installation
npm install -g just-scrape # npm (recommended)
pnpm add -g just-scrape # pnpm
yarn global add just-scrape # yarn
bun add -g just-scrape # bun
npx just-scrape --help # or run without installing
bunx just-scrape --help # bun equivalent
Package: just-scrape on npm.
Coding Agent Skill
You can use just-scrape as a skill for AI coding agents via Vercel's skills.sh with this tutorial.
Or you can manually install it:
bunx skills add https://github.com/ScrapeGraphAI/just-scrape
Browse the skill: skills.sh/scrapegraphai/just-scrape/just-scrape
Configuration
The CLI needs a ScrapeGraph API key. Get one at dashboard.scrapegraphai.com.
Four ways to provide it (checked in order):
- Environment variable: export SGAI_API_KEY="sgai-..."
- .env file: SGAI_API_KEY=sgai-... in the project root
- Config file: ~/.scrapegraphai/config.json
- Interactive prompt: the CLI asks and saves to the config file
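For the config-file option, a minimal sketch of what to write is below. The key name `apiKey` is an assumption on my part; letting the interactive prompt create the file guarantees the shape the CLI actually reads.

```shell
# Create the config directory and drop in the key.
# NOTE: the JSON key name "apiKey" is an assumption — if in doubt,
# run any just-scrape command once and let the prompt save it for you.
mkdir -p "$HOME/.scrapegraphai"
cat > "$HOME/.scrapegraphai/config.json" <<'EOF'
{
  "apiKey": "sgai-..."
}
EOF
```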
Environment Variables
| Variable | Description | Default |
|---|---|---|
| SGAI_API_KEY | ScrapeGraph API key | — |
| SGAI_API_URL | Override API base URL (also JUST_SCRAPE_API_URL) | https://api.scrapegraphai.com/api/v2 |
| SGAI_TIMEOUT | Request timeout in seconds (also legacy SGAI_TIMEOUT_S / JUST_SCRAPE_TIMEOUT_S) | 120 |
| SGAI_DEBUG | Set to 1 to enable debug logging to stderr (also JUST_SCRAPE_DEBUG) | 0 |
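Putting the table together, a typical setup for a debugging session might look like this (the override URL is a placeholder, not a real deployment):

```shell
export SGAI_API_KEY="sgai-..."                            # required
export SGAI_API_URL="https://staging.example.com/api/v2"  # optional override (placeholder URL)
export SGAI_TIMEOUT=300                                   # seconds; default is 120
export SGAI_DEBUG=1                                       # debug logging to stderr
```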
JSON Mode (--json)
All commands support --json for machine-readable output. When set, banner, spinners, and interactive prompts are suppressed — only minified JSON on stdout (saves tokens when piped to AI agents).
just-scrape credits --json | jq '.remaining'
just-scrape scrape https://example.com --json > result.json
just-scrape history scrape --json | jq '.[].id'
Scrape
Multi-format scraping from a URL. Supports markdown, html, screenshot, branding, links, images, summary, json — comma-separated for multi-format output. docs
Usage
just-scrape scrape <url> # markdown (default)
just-scrape scrape <url> -f html # raw HTML
just-scrape scrape <url> -f markdown,screenshot,links # multi-format
just-scrape scrape <url> -f json -p "<prompt>" # AI extraction
just-scrape scrape <url> -f json -p "..." --schema '<json>' # enforce schema
just-scrape scrape <url> --html-mode reader # readable-view cleanup
just-scrape scrape <url> -m js --stealth # JS rendering + anti-bot
just-scrape scrape <url> --country us --scrolls 3 # geo + infinite scroll
Examples
# Replace legacy markdownify
just-scrape scrape https://blog.example.com/post
# Replace legacy smart-scraper (AI extraction)
just-scrape scrape https://store.example.com -f json \
-p "Extract product name, price, and rating"
# Multi-format: markdown + screenshot in one request
just-scrape scrape https://example.com -f markdown,screenshot
# Structured extraction with schema
just-scrape scrape https://news.example.com -f json \
-p "All articles" \
--schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}'
Extract
Structured data extraction from a URL using a prompt (and optional schema). docs
Usage
just-scrape extract <url> -p <prompt> # AI extraction
just-scrape extract <url> -p <prompt> --schema <json> # enforce schema
just-scrape extract <url> -p <prompt> --html-mode reader # content-only mode
just-scrape extract <url> -p <prompt> --stealth --scrolls 5
just-scrape extract <url> -p <prompt> --cookies <json> --headers <json>
Examples
# Extract product data
just-scrape extract https://store.example.com/shoes \
-p "Extract all product names, prices, and ratings"
# Behind a login (pass cookies)
just-scrape extract https://app.example.com/dashboard \
-p "Extract user stats" \
--cookies '{"session":"abc123"}'
Search
Web search + AI extraction across multiple result pages. docs
Usage
just-scrape search "<query>" # default 3 results, markdown
just-scrape search "<query>" --num-results 10 # 1-20 results
just-scrape search "<query>" -p "<extract prompt>" # extract across results
just-scrape search "<query>" -p "..." --schema <json> # structured output
just-scrape search "<query>" --country us --time-range past_week
just-scrape search "<query>" --format html # raw HTML per result
Examples
# Research a topic across multiple sources
just-scrape search "best Python web frameworks 2025" --num-results 10
# Structured comparison
just-scrape search "Top 5 cloud providers pricing" \
-p "Extract provider name, free tier, and starting price" \
--schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"},"price":{"type":"string"}}}}}}'
Crawl
Multi-page crawl starting from a URL. Polls until the job reaches a terminal state. docs
Usage
just-scrape crawl <url> # markdown, defaults (50 pages, depth 2)
just-scrape crawl <url> -f markdown,links # multi-format per page
just-scrape crawl <url> --max-pages 200 --max-depth 4
just-scrape crawl <url> --max-links-per-page 25
just-scrape crawl <url> --allow-external # follow cross-domain links
just-scrape crawl <url> --include-patterns '["/blog/.*"]' --exclude-patterns '["/tag/.*"]'
just-scrape crawl <url> -m js --stealth
Examples
# Crawl a docs site and collect all markdown
just-scrape crawl https://docs.example.com --max-pages 100
# Only crawl blog posts
just-scrape crawl https://example.com \
--include-patterns '["/blog/.*"]' --max-pages 50
Monitor
Create and manage page-change monitors. Each monitor re-scrapes a URL on a schedule and diffs the result against the previous tick. docs
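The --interval value is a standard five-field cron expression. A few common schedules for reference (cron syntax is universal; nothing below is specific to just-scrape, and the example invocation is illustrative):

```shell
# field order: minute  hour  day-of-month  month  day-of-week
EVERY_15_MIN="*/15 * * * *"   # every 15 minutes
HOURLY="0 * * * *"            # hourly, on the hour
WEEKDAY_9AM="0 9 * * 1-5"     # 09:00 Monday through Friday
WEEKLY="0 0 * * 0"            # midnight every Sunday

# e.g. (requires a valid API key):
# just-scrape monitor create --url https://example.com --interval "$HOURLY"
```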
Actions
just-scrape monitor create --url <url> --interval <cron> [-f <formats>] [--webhook-url <url>]
just-scrape monitor list # all monitors
just-scrape monitor get --id <cronId>
just-scrape monitor update --id <cronId> [--interval ...] [--name ...] [-f <formats>] [--webhook-url ...]
just-scrape monitor delete --id <cronId>
just-scrape monitor pause --id <cronId>
just-scrape monitor resume --id <cronId>
just-scrape monitor activity --id <cronId> [--limit <n>] [--cursor <c>]
Examples
# Monitor a price page hourly, webhook on change
just-scrape monitor create \
--url https://store.example.com/product/42 \
--interval "0 * * * *" \
-f markdown,screenshot \
--webhook-url https://hooks.example.com/price-change
# List active monitors
just-scrape monitor list
# Pause a monitor temporarily
just-scrape monitor pause --id <cronId>
# Paginate tick history
just-scrape monitor activity --id <cronId> --limit 20
History
Browse request history across all services. Interactive by default — arrow keys to navigate, select to view details, "Load more" for pagination. docs
Usage
just-scrape history # all services, interactive
just-scrape history <service> # scrape | extract | search | monitor | crawl
just-scrape history <service> <request-id> # fetch specific request
just-scrape history <service> --page <n> --page-size <n>
just-scrape history <service> --json # raw JSON (pipeable)
Examples
# Browse scrape history interactively
just-scrape history scrape
# Fetch a specific request by id
just-scrape history scrape 550e8400-e29b-41d4-a716-446655440000
# Export crawl history
just-scrape history crawl --json --page-size 100 | jq '.[].id'
Credits
Check your credit balance and per-job quotas (crawl, monitor).
just-scrape credits
just-scrape credits --json | jq '.remaining'
just-scrape credits --json | jq '.jobs.monitor'
Validate
Validate your API key (calls the SDK's checkHealth / /health endpoint).
just-scrape validate
Migration from v0.x (v1 API) to v1.0 (v2 API)
The v2 API consolidates and renames endpoints. The CLI now reflects that:
| v0.x command | v1.0 equivalent |
|---|---|
| smart-scraper <url> -p "..." | scrape <url> -f json -p "..." (or extract <url> -p "...") |
| markdownify <url> | scrape <url> (markdown is the default format) |
| search-scraper "..." | search "..." (query now positional, -p for extraction) |
| scrape <url> (raw HTML) | scrape <url> -f html |
| crawl <url> -p "..." | crawl <url> -f markdown (crawl no longer runs AI extraction; run extract/scrape on the pages) |
| sitemap <url> | Removed (fetch sitemap XML directly, or use crawl with --include-patterns) |
| agentic-scraper | Removed |
| generate-schema | Removed |
credits and validate work the same; response shapes changed (see SDK v2 types).
Contributing
From Source
Requires Bun and Node.js 22+.
git clone https://github.com/ScrapeGraphAI/just-scrape.git
cd just-scrape
bun install
bun run dev --help
Tech Stack
| Concern | Tool |
|---|---|
| Language | TypeScript 5.8 |
| Dev Runtime | Bun |
| Build | tsup (esbuild) |
| CLI Framework | citty (unjs) |
| Prompts | @clack/prompts |
| Styling | chalk v5 (ESM) |
| SDK | scrapegraph-js v2 |
| Env | dotenv |
| Lint / Format | Biome |
| Target | Node.js 22+, ESM-only |
Scripts
bun run dev # Run CLI from TS source
bun run build # Bundle ESM to dist/cli.mjs
bun run lint # Lint + format check
bun run format # Auto-format
bun run check # Type-check + lint
bun test # Run unit tests
License
ISC
Made with love by the ScrapeGraphAI team 💜
