# AnyCrawl CLI
Command-line interface for AnyCrawl. Scrape, crawl, search, and map websites from your terminal.
## Installation
```shell
npm install -g anycrawl-cli
```

Or use the one-command setup:

```shell
npx -y anycrawl-cli init
```

## Quick Start
Authenticate with your API key (get one at anycrawl.dev/dashboard):

```shell
anycrawl login --api-key <your-api-key>
```

Or set the environment variable:

```shell
export ANYCRAWL_API_KEY=<your-api-key>
```

Then try:
```shell
# Scrape a URL (shorthand)
anycrawl https://example.com

# Search the web
anycrawl search "web scraping tools"

# Discover URLs on a site
anycrawl map https://example.com

# Crawl a website
anycrawl crawl https://example.com --wait -o results.json
```

## Skills

Install the AnyCrawl skill for Cursor, Codex, and other AI coding agents. It lets your agent use scrape, search, map, and crawl directly.
Steps:

1. Install the CLI and authenticate:

   ```shell
   npm install -g anycrawl-cli
   anycrawl login --api-key <your-api-key>
   ```

2. Install the skill:

   ```shell
   anycrawl setup skills
   ```

   By default this installs to all detected agents (Cursor, Codex, etc.). For a specific agent:

   ```shell
   anycrawl setup skills --agent cursor
   ```

3. Restart your AI coding assistant after installation to load the new skill.
## Commands
| Command                 | Description                                           |
| ----------------------- | ----------------------------------------------------- |
| `scrape [urls...]`      | Scrape URL(s) and extract content (default: markdown) |
| `crawl [url-or-job-id]` | Crawl a website or check crawl status                 |
| `search <query>`        | Search the web with optional result scraping          |
| `map [url]`             | Discover URLs on a website                            |
| `login`                 | Authenticate with API key                             |
| `logout`                | Clear stored credentials                              |
| `config`                | View or update configuration                          |
| `setup skills`          | Install AnyCrawl skill for AI coding agents           |
| `setup mcp`             | Get MCP configuration for AnyCrawl                    |
## Options

- Default engine: `auto` (automatically selects the best engine for each page)
- Output: use `-o` or `--output` to save to a file. Recommended: the `.anycrawl/` directory
- Global: `-k`/`--api-key` and `--api-url` work with any command
## Scrape

```shell
anycrawl scrape https://example.com -o page.md
anycrawl scrape https://example.com --format html,links --json -o data.json
```

Options: `--engine`, `--format`, `--wait-for`, `--proxy`, `--output`, `--json`, `--pretty`
## Crawl

```shell
anycrawl crawl https://example.com
anycrawl crawl <job-id>          # Check status
anycrawl crawl https://example.com --wait --progress -o crawl.json
anycrawl crawl cancel <job-id>   # Cancel a job
```

Options: `--wait`, `--progress`, `--limit`, `--max-depth`, `--include-paths`, `--exclude-paths`
## Search

```shell
anycrawl search "machine learning" -o .anycrawl/search.json
anycrawl search "tutorials" --scrape --limit 5
```

Options: `--limit`, `--pages`, `--lang`, `--country`, `--scrape`, `--scrape-formats`
## Map

```shell
anycrawl map https://example.com -o urls.txt
anycrawl map https://example.com --limit 500 --json
```

Options: `--limit`, `--include-subdomains`, `--ignore-sitemap`
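`map` and `scrape` compose naturally: discover URLs first, then fetch each one. A hedged sketch, assuming `map` prints one URL per line to stdout when no `-o` file is given; `map_then_scrape` and the filename scheme are inventions of this example.

```shell
# map_then_scrape: discover up to 20 URLs on a site, then scrape each
# one into .anycrawl/, deriving the filename from the URL.
# Hypothetical helper; assumes `map` writes one URL per line to stdout.
map_then_scrape() {
  mkdir -p .anycrawl
  anycrawl map "$1" --limit 20 |
    while IFS= read -r url; do
      name=$(printf '%s' "$url" | tr -c 'a-zA-Z0-9' '_')
      anycrawl scrape "$url" -o ".anycrawl/$name.md"
    done
}

# Usage: map_then_scrape https://example.com
```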
## MCP

Use AnyCrawl via MCP integration:

```shell
anycrawl setup mcp
```

This prints your MCP URL; add it to your Cursor (or other tool's) MCP configuration.
## Self-hosted

For self-hosted AnyCrawl instances:

```shell
export ANYCRAWL_API_URL=https://your-api.example.com
anycrawl scrape https://example.com
```

Or use `--api-url` with any command.
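If you switch between the hosted API and a self-hosted instance, a per-invocation wrapper avoids exporting the variable into your whole shell session. `ac_local` and its URL are placeholders in this sketch:

```shell
# ac_local: run any anycrawl command against a self-hosted instance,
# setting ANYCRAWL_API_URL only for that single invocation.
# Hypothetical helper; replace the URL with your own deployment.
ac_local() {
  ANYCRAWL_API_URL="https://your-api.example.com" anycrawl "$@"
}

# Usage: ac_local scrape https://example.com
```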
