@udx/mcurl
v1.2.7
curl for markdown — fetch any URL as clean Markdown, search the web, and map domains. Pipe to mq for jq-like querying.
mCurl
Like curl, but for Markdown: mCurl fetches content from a URL and converts it to Markdown.
By default, mCurl extracts the main content (the body or article) of an HTML page and converts it to valid Markdown. If the response is JSON, mCurl converts that to Markdown as well, in a readable format.
Features
- Converts HTML to Markdown, extracting main content by default
- Converts JSON to Markdown in a structured, readable format
- Pluggable handlers for different content types (extensible architecture)
- Selector support for extracting specific parts of HTML
- Global config file support (~/.udx/mcurl.yml) for domain-specific settings
- Options to include a site's menu, header, or footer
- Designed for AI tool integration and chaining with SERP API responses
- Customizable headers, user agents, and authentication
Installation
# Install globally
npm install -g @udx/mcurl
# After installation, the command is available as 'mcurl'
After installation, you can use the command mcurl directly.
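To confirm that the global install actually put the command on your PATH, a quick check along these lines may help. This is a minimal sketch; `have_cmd` is a hypothetical helper name, not part of mCurl, and it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of mCurl): succeed if a command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd mcurl; then
  echo "mcurl found at: $(command -v mcurl)"
else
  # A common cause is npm's global bin directory missing from PATH.
  echo "mcurl not found; check that npm's global bin directory is on PATH" >&2
fi
```

The same helper can be reused for optional companions such as mq or jq before building a pipeline around them.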
Usage
# Basic usage - fetch a URL and convert to markdown
mcurl https://udx.io
# Extract specific content using CSS selector
mcurl --selector "article.main-content" https://udx.io/about
# Include header and footer
mcurl --include header,footer https://udx.io
# Use custom headers
mcurl --header "Authorization: Bearer token" https://api.example.com
# Save output to file
mcurl --output result.md https://udx.io
# Use custom user agent
mcurl --user-agent "MyBot/1.0" https://udx.io
# Verbose output
mcurl --verbose https://udx.io
# Use like curl with JSON response converted to markdown
mcurl -X POST "https://api.example.com/endpoint" -H "Content-Type: application/json" -d '{"query": "example"}'
# WARNING: Never use internal development ports (like localhost:3456) in production or share them externally
# Always use public API endpoints for production use
# Example with Elasticsearch-like query (for development environments)
mcurl -X POST "http://localhost:3456/harvest-v1/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
      ]
    }
  },
  "aggs": {
    "by_billable": {
      "terms": {
        "field": "billable"
      },
      "aggs": {
        "hours_sum": {"sum": {"field": "hours"}},
        "cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
        "revenue": {"sum": {"field": "billableAmount"}}
      }
    }
  }
}'
# Complex Elasticsearch-like queries (for production)
mcurl -X POST "https://api.example.com/search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
      ]
    }
  },
  "aggs": {
    "by_billable": {
      "terms": {
        "field": "billable"
      },
      "aggs": {
        "hours_sum": {"sum": {"field": "hours"}},
        "cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
        "revenue": {"sum": {"field": "billableAmount"}}
      }
    }
  }
}'
Configuration File
mCurl supports a global configuration file at ~/.udx/mcurl.yml that allows you to specify default options and domain-specific settings:
# Global defaults
user-agent: "mCurl/1.0"
timeout: 30000
format: markdown
# Domain-specific settings
domains:
  "udx.io":
    headers:
      - "Cookie: session=abc123"
    selector: "main.content"
  "api.example.com":
    headers:
      - "Authorization: Bearer YOUR_TOKEN"
    user-agent: "CustomBot/2.0"
Examples
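As a first example, the configuration file described above can be created from the shell. The sketch below is illustrative only: `init_mcurl_config` is a hypothetical helper, not an mCurl command, and the nesting of keys under `domains` is assumed from the sample configuration. Because the file may hold tokens or cookies, the sketch restricts it to the owner and refuses to overwrite an existing file.

```shell
# Hypothetical helper (not part of mCurl): write a starter mcurl.yml into the
# directory given as $1. Pass "$HOME/.udx" for the documented location.
init_mcurl_config() {
  if [ -e "$1/mcurl.yml" ]; then
    echo "$1/mcurl.yml already exists; leaving it untouched" >&2
    return 1
  fi
  mkdir -p "$1" || return 1
  # Quoted heredoc delimiter: the YAML is written literally, with no shell expansion.
  cat > "$1/mcurl.yml" <<'YAML'
# Global defaults
user-agent: "mCurl/1.0"
timeout: 30000

# Domain-specific settings (nesting assumed from the sample configuration)
domains:
  "api.example.com":
    headers:
      - "Authorization: Bearer YOUR_TOKEN"
YAML
  # The file can hold tokens and cookies, so make it readable by the owner only.
  chmod 600 "$1/mcurl.yml"
}
```

A typical call would be `init_mcurl_config "$HOME/.udx"`, followed by replacing YOUR_TOKEN with a real value.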
HTML Example
# Fetch and convert HTML to markdown
mcurl https://udx.io
Output:
# UDX - Digital Experience Agency
We build digital experiences that transform businesses and drive growth.
## Our Services
- Web Development
- Digital Marketing
- UX Design
- Cloud Solutions
JSON Example
# Fetch and convert JSON to markdown
# The URL is quoted so the shell does not interpret ? and &
mcurl "https://udx.io/wp-json/udx/v2/works/search?query=&page=0"
Output:
# JSON Response
## results
| id | title | excerpt | ... |
| --- | --- | --- | --- |
| 123 | Project A | This is project A | ... |
| 456 | Project B | This is project B | ... |
...
## pagination
```json
{
  "currentPage": 0,
  "totalPages": 5,
  "totalItems": 48
}
```
Web Search Integration
mCurl integrates with [SearchAPI](https://searchapi.io) for web research directly from the command line. Requires the `SEARCH_API_KEY` environment variable. If the key is not set, search features return empty results with a stderr warning.
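Because a missing key yields empty results rather than a hard failure, scripts may prefer to stop early. The following is a minimal sketch of such a guard; `search_or_fail` is a hypothetical wrapper name, and it only uses the `--search` flag and the `SEARCH_API_KEY` variable described above.

```shell
# Hypothetical wrapper (not part of mCurl): refuse to search when
# SEARCH_API_KEY is unset or empty, instead of getting empty results back.
search_or_fail() {
  if [ -z "${SEARCH_API_KEY:-}" ]; then
    echo "SEARCH_API_KEY is not set; export it before running a search" >&2
    return 1
  fi
  # First argument is the query; any remaining arguments are passed through.
  mcurl --search "$@"
}
```

For example, `search_or_fail "federal government website redesign" --limit 5` runs the search only when the key is present, and returns a non-zero status otherwise.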
Search Mode
# Basic web search - returns titles, URLs, and snippets
mcurl --search "federal government website redesign"
# Deep search - fetches and converts each result page to markdown
mcurl --search "enterprise WordPress architecture" --deep
# Limit results and max characters per page
mcurl --search "Drupal vs WordPress government" --limit 5 --max-chars 2000
# Restrict search to a specific domain
mcurl --search "contract vehicles" --site coforma.io
# Filter by freshness: d(ay), w(eek), m(onth), y(ear)
mcurl --search "federal website award 2024" --freshness y
Domain Mapping
Map a domain's indexed pages (uses Google's site: operator):
# List top pages indexed by Google for a domain
mcurl --map example.com
# Get more results
mcurl --map example.com --limit 15
# Deep mode fetches each indexed page
mcurl --map example.com --deep --max-chars 1000
Search Options
| Option | Description |
|--------|-------------|
| --search <query> | Perform a web search and return results |
| --map <domain> | List top indexed pages for a domain |
| --deep | Fetch and convert each result page to markdown |
| --limit <n> | Number of results to return (default: 20) |
| --max-chars <n> | Max characters per fetched page in deep mode (default: 5000) |
| --site <domain> | Restrict search to a specific domain |
| --freshness <period> | Filter by recency: d, w, m, or y |
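The options in the table above can be combined in a small wrapper that validates its input before calling mcurl. This is a sketch under stated assumptions: `fresh_search` is a hypothetical helper, the accepted periods (d, w, m, y) and the default limit of 20 are taken from the table, and how mcurl itself reacts to an invalid period is not documented here.

```shell
# Hypothetical helper (not part of mCurl).
# Usage: fresh_search <query> <period> [limit]
fresh_search() {
  # Accept only the documented freshness periods.
  case "$2" in
    d|w|m|y) ;;
    *)
      echo "invalid freshness '$2' (expected d, w, m, or y)" >&2
      return 2
      ;;
  esac
  # Fall back to the documented default limit of 20 when none is given.
  mcurl --search "$1" --freshness "$2" --limit "${3:-20}"
}
```

For example, `fresh_search "federal website award" w 5` searches the past week and returns at most five results.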
Piping Search Results to mq
Combine mcurl --search with @udx/mq for powerful analysis of search results:
# Analyze search results structure
mcurl --search "cloud infrastructure" --limit 3 | mq --analyze
# Extract headings from search results
mcurl --search "cloud infrastructure" --limit 3 | mq '.headings[]'
# Get document structure of search results
mcurl --search "udx cloud" --site udx.io --limit 3 | mq --structure
# Clean narrative content from search results (strip code blocks)
mcurl --search "udx devops" --site udx.io --limit 3 | mq --clean-content
# Deep search + word count via mq
mcurl --search "WordPress hosting" --limit 3 --deep --max-chars 500 | mq --count
# Map a domain then analyze its page structure
mcurl --map udx.io --limit 5 | mq --analyze
Integration with AI Tools
mCurl is designed to be easily integrated with AI tools and command-line workflows:
# Fetch content and pipe to other tools
mcurl https://udx.io | head -20
# Chain with jq for JSON processing
mcurl https://api.github.com/repos/torvalds/linux --format json | jq '.stargazers_count'
License
MIT
