super-curl

v0.1.3

Published

2 months ago

scurl: super-curl powered by Lightpanda (local or Docker)

Vulnerabilities: 0 high, 0 medium, 0 low

godsendgeoff

curl cli browser lightpanda scraping

super-curl (`scurl`)

A curl-style CLI that uses a headless browser runtime for JS-heavy pages.

scurl launches Lightpanda (a local install or a Docker container), lets the page's JavaScript run, performs page actions such as clicks, and then prints the extracted output to stdout.

Why "super-curl"?

Unlike plain curl, this command can:

execute the page's own JavaScript
wait for rendered DOM nodes
click elements before extracting
evaluate custom JavaScript in the page context (--eval)
persist cookies/localStorage/sessionStorage between runs

Requirements

Node.js 18+
One Lightpanda runtime:
- local Lightpanda binary (either lightpanda in PATH, or cached at ~/.cache/lightpanda-node/lightpanda from @lightpanda/browser), or
- Docker daemon running

Install (npm)

Install globally from npm:

npm install -g super-curl
scurl --help

Or run without installing:

npx super-curl --help

Install from source (development)

npm install
node ./bin/scurl --help

Optionally, link the package so you can run it by name (scurl) instead of node ./bin/scurl:

npm link

Telemetry

Telemetry is disabled by default in both runtimes (local and Docker).

scurl sets the following environment variable automatically when it launches Lightpanda:

LIGHTPANDA_DISABLE_TELEMETRY=true

Usage

scurl [options] <url>

Options

--select <css>: extraction scope
--wait-for <css>: wait for selector (repeatable, ordered)
--click <css>: click selector (repeatable, ordered)
--eval <js>: evaluate JS in page context
--format <text|html|markdown|links|json>
--session <file>: load + save browser state
--timeout <ms>: default 10000
--runtime <auto|local|docker>: default auto
--docker: shorthand for --runtime docker
--lightpanda-bin <path>: local binary path (default: lightpanda)
--container-image <img>: Docker image (default: lightpanda/browser:nightly)
--keep-container: reuse one persistent Lightpanda Docker container between runs (Docker runtime)
--keep-container-name <name>: persistent container name (default: scurl-lightpanda-keep)
--drop-keep-container: remove the persistent keep container and exit
-h, --help
-V, --version

Runtime behavior

scurl supports three runtime modes:

--runtime auto (default): try local lightpanda first, fall back to Docker if local startup fails
--runtime local: require local lightpanda binary
--runtime docker (or --docker): force Docker

If you pass Docker-specific flags like --keep-container or --container-image, auto mode will use Docker.
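The fallback described above can also be made explicit in a small wrapper, which has the side effect of showing the local error instead of hiding it. This is only a sketch: the function name scurl_local_then_docker is hypothetical, and it retries on any failure of the local run (not only startup failures), which is broader than what auto mode is documented to do.

```shell
# Hypothetical wrapper: try the local runtime first, then Docker.
# Unlike --runtime auto, the local error output stays visible on stderr.
scurl_local_then_docker() {
  # First attempt: local runtime only.
  if scurl --runtime local "$@"; then
    return 0
  fi
  # Any non-zero exit lands here, including navigation errors,
  # so this retries more eagerly than auto mode is documented to.
  echo "local runtime failed; retrying with Docker" >&2
  scurl --runtime docker "$@"
}
```

Usage would look like scurl_local_then_docker https://example.com --select "h1".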

Local runtime troubleshooting

If auto mode unexpectedly falls back to Docker, force local mode to see the local error:

scurl --runtime local https://example.com --select "h1"

If you installed @lightpanda/browser, scurl automatically tries the cached binary path:

~/.cache/lightpanda-node/lightpanda

You can also set it explicitly:

scurl --runtime local --lightpanda-bin "$HOME/.cache/lightpanda-node/lightpanda" https://example.com --select "h1"
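To avoid hard-coding the path, a helper along these lines could locate the binary before calling scurl. It is a sketch under assumptions: the name find_lightpanda_bin is hypothetical, and checking PATH before the cache directory is a guess at a sensible order, not necessarily the order scurl itself uses.

```shell
# Hypothetical helper: print the path of a usable Lightpanda binary.
# Checks PATH first, then the cache location used by @lightpanda/browser.
# Returns 1 (and prints nothing) if neither is found.
find_lightpanda_bin() {
  cached="$HOME/.cache/lightpanda-node/lightpanda"
  if command -v lightpanda >/dev/null 2>&1; then
    command -v lightpanda
  elif [ -x "$cached" ]; then
    printf '%s\n' "$cached"
  else
    return 1
  fi
}
```

It could then be combined with the flag shown above, for example: bin="$(find_lightpanda_bin)" && scurl --runtime local --lightpanda-bin "$bin" https://example.com --select "h1"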

Selector syntax (`--select`, `--wait-for`, `--click`)

scurl passes selectors directly to Playwright (page.click / page.waitForSelector), so selector behavior follows Playwright selector rules.

Standard CSS selectors are supported (#id, .class, a[href], main article:nth-of-type(2), etc.)
Playwright selector extensions are also supported (:visible, :has-text(), :nth-match(), etc.)
Playwright extensions are not part of the W3C CSS spec, so they do not work in standard DOM APIs such as document.querySelectorAll inside --eval

References:

Playwright locator docs: https://playwright.dev/docs/locators
Playwright other locators / selector extensions: https://playwright.dev/docs/other-locators
W3C Selectors Level 4 (standard CSS): https://www.w3.org/TR/selectors-4/
MDN CSS selectors reference: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_selectors

Example (click first link, wait for next page content, click third link):

scurl "https://example.com" \
  --click ':nth-match(a, 1)' \
  --wait-for ':nth-match(a, 3)' \
  --click ':nth-match(a, 3)' \
  --eval 'window.location.href'

Examples

Extract rendered text (auto runtime):

scurl https://news.ycombinator.com

Force local runtime:

scurl --runtime local https://example.com --select "h1"

Force Docker runtime:

scurl --docker https://example.com --select "h1"

Wait for SPA content then extract markdown:

scurl https://example.com/docs --wait-for "main" --format markdown

Click then wait then extract links:

scurl https://example.com --click "#open-menu" --wait-for ".menu a" --format links

Evaluate browser JS and pretty-print JSON:

scurl https://example.com --eval "window.location.href" --format json

Pipe scurl output to jq (extract page title + link list):

scurl https://example.com \
  --eval '({
    title: document.title,
    links: Array.from(document.querySelectorAll("a")).map(a => ({
      text: (a.textContent || "").trim(),
      href: a.href
    }))
  })' \
  --format json | jq -r '.title, (.links[].href)'

Pipe scurl text output to grep/sed to find matching lines:

scurl https://react.dev/reference/react/useEffect --format text \
  | grep -in "cleanup" \
  | sed -n '1,10p'

Pipe link output to xargs and fetch each linked page title:

scurl https://news.ycombinator.com --format links \
  | grep -E '^https?://' \
  | head -n 5 \
  | xargs -I{} sh -c 'printf "%s -> " "$1"; scurl "$1" --eval "document.title"' _ {}

Persist session state:

scurl https://app.local/login --click "button[type=submit]" --session ./.scurl/session.json
scurl https://app.local/dashboard --session ./.scurl/session.json --wait-for "#ready"
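Because a session file holds cookies and storage data, it may be worth creating its directory with restrictive permissions before the first run. The sketch below assumes nothing about whether scurl creates missing directories itself; the function name scurl_with_session is hypothetical.

```shell
# Hypothetical wrapper: make sure the session directory exists
# (owner-only permissions), then run scurl with --session.
# Usage: scurl_with_session <session-file> <scurl args...>
scurl_with_session() {
  session_file="$1"
  shift
  # The subshell keeps the umask change from leaking into the caller.
  ( umask 077 && mkdir -p "$(dirname "$session_file")" ) || return 1
  scurl "$@" --session "$session_file"
}
```

For example: scurl_with_session ./.scurl/session.json https://app.local/dashboard --wait-for "#ready"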

Use local files:

scurl file:///Users/me/site/index.html --format text

Reuse one persistent Lightpanda Docker container across multiple calls (faster). Before reusing the kept container, scurl verifies that it has LIGHTPANDA_DISABLE_TELEMETRY=true:

scurl --docker --keep-container https://example.com --wait-for "main"
scurl --docker --keep-container https://example.com/about --format markdown
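The telemetry setting of a kept container can also be inspected by hand with the Docker CLI. The following is a sketch, not scurl's own check: the function name is hypothetical, and the default container name is taken from the Options list.

```shell
# Hypothetical check: succeed only if the named container was created
# with LIGHTPANDA_DISABLE_TELEMETRY=true in its environment.
# Assumes the Docker CLI is installed and the container exists.
kept_container_telemetry_disabled() {
  name="${1:-scurl-lightpanda-keep}"
  # Print one environment entry per line, then look for an exact match.
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$name" \
    | grep -qx 'LIGHTPANDA_DISABLE_TELEMETRY=true'
}
```

If the container does not exist, docker inspect prints nothing to stdout and the function returns non-zero.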

When you are done with keep mode, remove the persistent container:

scurl --drop-keep-container
# or with a custom name:
scurl --drop-keep-container --keep-container-name my-scurl-lpd
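For batch jobs, the keep and drop steps can be paired so the container is always removed afterwards. This sketch assumes that --keep-container and --keep-container-name can be combined on the same call, which the option list suggests but does not state outright; the function name is hypothetical.

```shell
# Hypothetical batch helper: fetch several URLs through one kept
# container, then remove the container even if a fetch failed.
# Usage: scrape_with_kept_container <container-name> <url>...
scrape_with_kept_container() {
  name="$1"
  shift
  status=0
  for url in "$@"; do
    # Remember the last failing exit code but keep going.
    scurl --docker --keep-container --keep-container-name "$name" \
      "$url" --format markdown || status=$?
  done
  # Clean up the persistent container.
  scurl --drop-keep-container --keep-container-name "$name"
  return "$status"
}
```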

Exit codes

0: success
2: usage/argument error
3: runtime error (Docker/CDP/navigation/extraction)
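In scripts, these codes make it possible to tell a bad invocation apart from a transient failure. The sketch below retries once only on exit code 3; both function names are hypothetical, and whether a retry actually helps depends on the cause of the runtime error.

```shell
# Map scurl's documented exit codes to a short label.
describe_scurl_exit() {
  case "$1" in
    0) echo "success" ;;
    2) echo "usage error" ;;
    3) echo "runtime error" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Run scurl; retry once on a runtime error (3), never on a usage error (2).
scurl_retry_once() {
  status=0
  scurl "$@" || status=$?
  if [ "$status" -eq 3 ]; then
    echo "scurl: $(describe_scurl_exit "$status"), retrying once" >&2
    status=0
    scurl "$@" || status=$?
  fi
  return "$status"
}
```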

Development

Run tests:

npm test
