@jenslys/curldown
v1.0.5
Published
Fetch URL content and convert it to markdown.
Readme
curldown
Fetch a webpage and return clean Markdown for AI workflows.
curldown is CLI-first:
- Static mode:
fetchHTML -> Cheerio cleanup -> Turndown markdown. - Dynamic mode: headless Chromium (Playwright) -> HTML -> markdown.
--autotries static first and falls back to dynamic when static output is thin.- Direct markdown responses are passed through (including
.mdURLs served astext/plain). --format jsonemits markdown plus metadata for agent pipelines.
Install
npm install -g @jenslys/curldownQuick Start
# Print markdown to stdout
curldown https://example.com
# JS-heavy pages
curldown https://example.com --dynamic
# Auto fallback to dynamic when static output looks incomplete
curldown https://example.com --auto
# JSON output for AI pipelines
curldown https://example.com --format json
# Write output to a file
curldown https://example.com --output page.mdCLI
curldown <url> [options]Options
--autoTry static first and fallback to dynamic when static output is thin.--dynamicUse Playwright Chromium to render before extraction.--format <type>Output format:markdown(default) orjson.-o, --output <path>Write output to file instead of stdout.--timeout-ms <number>Request/render timeout in milliseconds.--header <key:value>Custom request header (repeatable).--helpShow help.--versionShow version.
JSON Output Shape
--format json returns:
urlfinal_urltitlemarkdowncontent_typestatusfetched_atword_countsha256used_dynamic
Local Development
bun install
bun run build
bun run test
node dist/cli.js https://example.comAGENTS.md Snippet (Optional)
Paste this into your AGENTS.md if you want agents to always use curldown for website content retrieval:
## Website Content Retrieval
- Use `curldown` for website/article page retrieval in agent workflows.
- Do not use `curldown` for raw code files or repository file blobs (for those, fetch the file directly).
- Default command: `curldown <url>`.
- Prefer `curldown <url> --auto` when page rendering might be uncertain.
- Use `curldown <url> --format json` when downstream steps need structured metadata.
- Prefer stdout output unless a task explicitly requires a file (`--output <path>`).
- Do not use ad-hoc HTML scraping or direct browser automation when `curldown` can handle it.