
Crawl4AI Cloud SDK for Node.js

One-line web scraping, screenshots, and structured extraction — powered by Crawl4AI Cloud.

What's new in 0.7.0

  • scrape() + scrapeMany() are the canonical methods (replacing markdown / markdownMany). Same shape, same response — they hit the new /v1/scrape(/async) endpoints.
  • markdown() / markdownMany() are kept as deprecated aliases that route to /v1/scrape. They will be removed in 0.8.0.
  • extractMany() signature fixed: it now takes extractMany(url, { extraUrls: […], … }); the old extractMany(urls[], …) form was rejected by the server (see the sketch after this list). method: 'auto' is now allowed for batch.
  • sources field added to scan() and map() ('primary' | 'extended'). Replaces mode, which is kept as a deprecated alias.
  • crawlSite() and deepCrawl() target deprecated endpoints. They still work and emit a DeprecationWarning. Migrate to the composable scan + scrape/extract chain — see below.
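
A minimal before/after sketch of that extractMany() change (c is the client instance created in Quick Start below; the URLs are placeholders):

// 0.6.x array form (now rejected by the server):
// const job = await c.extractMany(['https://a.com', 'https://b.com'], { query: 'name and price' });

// 0.7.0 form: base URL + extraUrls, and method: 'auto' is allowed for batch:
const job = await c.extractMany('https://a.com', {
  extraUrls: ['https://b.com'],
  method: 'auto',
  query: 'name and price',
  wait: true,
});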

Install

npm install crawl4ai-cloud

Quick Start

import { AsyncWebCrawler } from 'crawl4ai-cloud';

const c = new AsyncWebCrawler({ apiKey: 'sk_live_...' });

// Scrape a page → clean markdown (+ optional links/media/metadata/tables)
const page = await c.scrape('https://example.com');
console.log(page.markdown);

// Take a full-page screenshot
const shot = await c.screenshot('https://example.com');
// shot.screenshot is base64 PNG

// Extract structured data — AUTO picks css_schema vs llm.
// wait: true auto-hydrates job.results from per-URL S3.
const job = await c.extractMany('https://news.ycombinator.com', {
  method: 'auto',
  query: 'get each story title, URL, and points',
  wait: true,
});
for (const r of job.results ?? []) console.log(r.extractedContent);

// Discover all URLs on a domain
const sitemap = await c.map('https://docs.python.org', { sources: 'primary' });
sitemap.urls.slice(0, 10).forEach(u => console.log(u.url));

Crawl a whole site — composable two-step

There's no single bundled "crawl this site" method anymore. Compose it:

// Step 1 — discover URLs
const scan = await c.scan('https://books.toscrape.com', {
  criteria: 'all book detail pages',
  maxUrls: 20,
});
const urls = scan.urls.map(u => u.url);

// Step 2A — pipe to scrapeMany for markdown
const mdJob = await c.scrapeMany(urls, { strategy: 'http', wait: true });

// Step 2B — OR pipe to extractMany for structured fields. The base url is
// the schema TEMPLATE — schema is generated once, then re-applied across
// extraUrls for free in css_schema mode (10-100× cheaper than per-page LLM).
const [base, ...rest] = urls;
const exJob = await c.extractMany(base, {
  extraUrls: rest,
  method: 'auto',
  query: 'book title, price, rating',
  wait: true,
});
for (const r of exJob.results ?? []) console.log(r.extractedContent);

Multi-URL fan-out — what happens under the hood

scrapeMany(urls), screenshotMany(urls), and extractMany(url, { extraUrls }) all decompose into N independent child jobs that distribute across the worker pool. Throughput scales with pool size, not with how many URLs you submitted.

const job = await c.scrapeMany(urls, { wait: false });             // returns immediately
// Poll to inspect per-URL state without downloading the data:
const status = await c.getScrapeJob(job.jobId);                    // WrapperJob
for (const u of status.urlStatuses ?? []) {
  console.log(u.index, u.url, u.status, u.durationMs, u.error);
}

// Fetch one URL's full result on demand (recipe-agnostic — works for any wrapper)
const result = await c.getPerUrlResult(job.jobId, 0);              // CrawlResult
console.log(result.markdown);

wait: true does this for you: poll → wait until terminal → call getPerUrlResult for each URL in parallel → populate job.results. Failed URLs become CrawlResult stubs (success=false + errorMessage) so job.results.length === job.urlStatuses.length.
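
For reference, a rough manual equivalent of that waiter (a sketch; the terminal status names are assumptions, and the real waiter also substitutes stubs for failed URLs):

const job = await c.scrapeMany(urls, { wait: false });
let status = await c.getScrapeJob(job.jobId);
while (!['completed', 'failed', 'cancelled'].includes(status.status)) {
  await new Promise(r => setTimeout(r, 2000));   // pollInterval
  status = await c.getScrapeJob(job.jobId);
}
// Hydrate per-URL results in parallel, preserving order:
const results = await Promise.all(
  (status.urlStatuses ?? []).map(u => c.getPerUrlResult(job.jobId, u.index)),
);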

Wrapper API Reference

| Method | What it does | Endpoint |
|---|---|---|
| scrape(url) | Page → clean markdown + optional include: ['links'\|'media'\|'metadata'\|'tables'] | POST /v1/scrape |
| scrapeMany(urls) | Async batch scrape (≤100 URLs) | POST /v1/scrape/async |
| screenshot(url) | Base64 PNG (and optional PDF) | POST /v1/screenshot |
| screenshotMany(urls) | Async batch screenshot | POST /v1/screenshot/async |
| extract(url, { query }) | Sync extract — single URL | POST /v1/extract |
| extractMany(url, { extraUrls }) | Async extract — base + followers, schema reused for free in css_schema mode | POST /v1/extract/async |
| scan(url, { sources, criteria }) | URL discovery — sources=primary/extended, optional AI criteria | POST /v1/scan |
| map(url, { sources }) | Simpler URL discovery (no criteria) | POST /v1/map |
| enrich({ query }) | Multi-entity table from a brief, list of entities, or list of URLs | POST /v1/enrich/async |

Deprecated (still work, emit warnings)

| Method | Migration |
|---|---|
| markdown(url) / markdownMany(urls) | Use scrape / scrapeMany. |
| crawlSite(url, { criteria, extract }) | Use scan + extractMany. See "Crawl a whole site" above. |
| deepCrawl(url, …) | Use scan(url, { scan: { mode: 'deep' } }) + the chain above. |
| scan(url, { mode }) / map(url, { mode }) | Use sources: 'primary' or sources: 'extended'. |
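
As a sketch, the non-structural migrations above are one-line swaps (URLs are placeholders; this assumes scrape() accepts the same options as markdown(), as the changelog states):

// markdown() → scrape(): same options, same response shape
const page = await c.scrape('https://example.com', { fit: true });

// scan()/map() mode → sources
const sitemap = await c.map('https://example.com', { sources: 'primary' });

// deepCrawl() → scan() with nested deep mode, then the scan + scrape/extract chain above
const deep = await c.scan('https://docs.example.com', {
  scan: { mode: 'deep', maxDepth: 3 },
  wait: true,
});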

Enrich v2 (build a data table)

Multi-phase enrichment. Give a brief, a list of entities, or a list of URLs. The job walks through planning → resolving_urls → extracting → merging, with optional human-review pauses at plan_ready and urls_ready.

// 1. Agent one-shot — give a brief, get a table back
const result = await crawler.enrich({
  query: 'licensed nurseries in North York Toronto with extended hours',
  country: 'ca',
  topKPerEntity: 3,
});
for (const row of result.phaseData.rows ?? []) {
  console.log(row.inputKey, row.fields);
  for (const [f, c] of Object.entries(row.certainty)) {
    console.log(`  ${f}: ${c.toFixed(2)}  (from ${row.sources[f].url})`);
  }
}
console.log(`crawls=${result.usage.crawls}, llm=${JSON.stringify(result.usage.llmTotals)}`);

// 2. Pre-resolved URLs — skip planning + URL resolution
const r2 = await crawler.enrich({
  urls: ['https://example.com/a', 'https://example.com/b'],
  features: ['price', 'hours'],   // string shortcut for [{name: 'price'}, ...]
});

// 3. Human review flow — pause for editing the plan
const job = await crawler.enrich({
  query: 'best Italian restaurants in Brooklyn',
  country: 'us',
  autoConfirmPlan: false,
  autoConfirmUrls: false,
  wait: false,
});

let paused = await crawler.waitEnrichJob(job.jobId, { until: 'plan_ready' });
console.log(paused.phaseData.plan?.entities, paused.phaseData.plan?.features);

await crawler.resumeEnrichJob(job.jobId, {
  entities: [{ name: "Lucali" }, { name: "Roberta's" }],
  features: [{ name: 'address' }, { name: 'hours' }],
});
paused = await crawler.waitEnrichJob(job.jobId, { until: 'urls_ready' });
await crawler.resumeEnrichJob(job.jobId);   // accept server's URL picks
const final = await crawler.waitEnrichJob(job.jobId);

// 4. Live progress via SSE
for await (const event of crawler.streamEnrichJob(job.jobId)) {
  if (event.type === 'phase')   console.log('→', event.status);
  if (event.type === 'row')     console.log('✓', event.row?.inputKey);
  if (event.type === 'complete') break;
}

// Job management
const jobs = await crawler.listEnrichJobs({ limit: 5 });
await crawler.cancelEnrichJob(job.jobId);

Vocabulary: entities are row identifiers; criteria are search-side filters; features are extraction columns. Per-purpose usage is reported in result.usage.llmTokensByPurpose with five buckets: plan_intent, url_plan, paywall_classify, extract, merge_tiebreak (only buckets that ran appear).
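
For example, to inspect which buckets actually ran on a finished job (field names as documented above; the token object inside each bucket is simply logged as-is here):

console.log('crawls:', result.usage.crawls);
for (const [purpose, tokens] of Object.entries(result.usage.llmTokensByPurpose ?? {})) {
  console.log(`  ${purpose}:`, tokens);   // e.g. 'url_plan', 'extract' (only buckets that ran appear)
}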

Wrapper API Reference (return types)

| Method | Endpoint | Returns | Description |
|--------|----------|---------|-------------|
| markdown(url) (deprecated) | POST /v1/markdown | MarkdownResponse | Clean markdown from any page |
| screenshot(url) | POST /v1/screenshot | ScreenshotResponse | Full-page screenshot (PNG) or PDF |
| extract(url) | POST /v1/extract | ExtractResponse | LLM or schema-based structured extraction |
| map(url) | POST /v1/map | MapResponse | Simple URL discovery (always sync) |
| scan(url, {criteria}) | POST /v1/scan | ScanResult | AI-assisted URL discovery with plain-English criteria |
| crawlSite(url, {criteria, extract}) (deprecated) | POST /v1/crawl/site | SiteCrawlResponse | AI-assisted whole-site crawl (always async) |
| enrich(opts) | POST /v1/enrich/async | EnrichJobStatus | Multi-phase enrichment from query / entities / URLs |

Markdown Options

const res = await c.markdown('https://example.com', {
  strategy: 'browser',   // 'browser' (default) or 'http'
  fit: true,              // return cleaned/fitted markdown (default true)
  include: ['article'],   // CSS selectors to include
  proxy: { mode: 'residential', country: 'US' },
  bypassCache: true,
});
// res.markdown, res.fitMarkdown, res.links, res.media, res.metadata

Screenshot Options

const res = await c.screenshot('https://example.com', {
  fullPage: true,         // full page vs viewport (default true)
  pdf: true,              // also generate PDF
  waitFor: 'css:.loaded', // wait for element before capture
});
// res.screenshot (base64 PNG), res.pdf (base64 PDF)

Extract Options

// Auto mode -- LLM picks the best approach
const res = await c.extract('https://example.com/products', {
  query: 'Extract product name, price, and rating',
  method: 'auto', // 'auto' (default), 'llm', or 'schema'
});

// Schema mode -- CSS/XPath selectors, no LLM cost
const res = await c.extract('https://example.com/products', {
  method: 'schema',
  schema: {
    name: 'Products',
    baseSelector: '.product-card',
    fields: [
      { name: 'title', selector: 'h2', type: 'text' },
      { name: 'price', selector: '.price', type: 'text' },
    ],
  },
});

Map Options (legacy, sync-only)

const res = await c.map('https://example.com', {
  mode: 'deep',            // 'default' or 'deep' (DomainMapper source depth)
  maxUrls: 500,
  includeSubdomains: true,
  query: 'blog posts',     // relevance scoring
  scoreThreshold: 0.5,
});

Scan Options (AI-assisted)

// AI picks everything from criteria
const result = await c.scan('https://docs.crawl4ai.com', {
  criteria: 'API reference pages',
  maxUrls: 50,
});

// Explicit overrides on top of LLM output
const result = await c.scan('https://example.com', {
  criteria: 'product pages',
  scan: {
    mode: 'auto',          // 'auto' | 'map' | 'deep'
    patterns: ['*/p/*'],
    includeSubdomains: false,
  },
});

// Async deep scan (waits for completion)
const result = await c.scan('https://directory.example.com', {
  criteria: 'company profile pages',
  scan: { mode: 'deep', maxDepth: 3 },
  wait: true,              // poll until done
  pollInterval: 3,         // seconds
});

Site Crawl Options (AI-assisted, deprecated)

const job = await c.crawlSite('https://books.toscrape.com', {
  // AI-assisted fields (new in v0.4.0)
  criteria: 'all book listing pages',
  scan: { mode: 'auto' },                        // optional explicit override
  extract: {
    query: 'book title, price, rating',
    jsonExample: { title: '...', price: '£0.00', rating: 0 },
    method: 'auto',                              // 'auto' | 'llm' | 'schema'
  },
  include: ['markdown', 'links'],                // drop 'markdown' to strip it
  // Standard fields
  maxPages: 50,
  strategy: 'http',                              // 'browser' or 'http'
  fit: true,
  // Legacy fields (still supported, prefer the above)
  discovery: 'bfs',                              // 'map' (default) / 'bfs' / 'dfs' / 'best_first'
  pattern: '/docs/*',
  maxDepth: 3,
  // Waiting
  wait: true,                                    // poll until done
  pollInterval: 5,                               // seconds
  timeout: 300,                                  // seconds
});

// job.generatedConfig — LLM's scan + extract decisions
// job.extractionMethodUsed — "css_schema" | "llm"
// job.schemaUsed — generated CSS schema (reusable!)

Drop markdown with include: pass include: ['links', 'media'] (without 'markdown') and the worker strips markdown from every result — saves bandwidth for extract-only crawls.
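
For example, an extract-only variant of the crawlSite() call above (a sketch; same placeholder site):

const job = await c.crawlSite('https://books.toscrape.com', {
  criteria: 'all book detail pages',
  extract: { query: 'book title, price, rating', method: 'auto' },
  include: ['links'],          // no 'markdown', so the worker strips it from every result
  maxPages: 50,
  wait: true,
});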

Async / Batch Operations

For multiple URLs, use the batch variants. They return a WrapperJob you can poll or wait on.

markdownMany (deprecated alias for scrapeMany)

const job = await c.markdownMany(
  ['https://a.com', 'https://b.com', 'https://c.com'],
  { wait: true, timeout: 120 },
);
console.log(`Job ${job.jobId}: ${job.status}`);

screenshotMany

const job = await c.screenshotMany(
  ['https://a.com', 'https://b.com'],
  { fullPage: true, wait: true },
);

extractMany

const job = await c.extractMany(
  'https://a.com/products',
  {
    extraUrls: ['https://b.com/products'],
    method: 'auto', // 'auto', 'llm', or 'schema' ('auto' allowed for batch since 0.7.0)
    query: 'Extract product name and price',
    wait: true,
  },
);

Batch Options (shared)

All batch methods accept these additional options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| wait | boolean | false | Block until job completes |
| pollInterval | number | 2 | Seconds between status checks (when wait: true) |
| timeout | number | -- | Max seconds to wait |
| webhookUrl | string | -- | URL to POST results to on completion |
| priority | number | 5 | Job priority (1=high, 5=normal, 10=low) |
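
For example, a fire-and-forget batch that pushes results to a webhook instead of blocking (the webhook URL is a placeholder):

const job = await c.scrapeMany(urls, {
  wait: false,
  priority: 1,                                       // 1 = high
  webhookUrl: 'https://example.com/hooks/crawl4ai',  // results POSTed here on completion
});
console.log(job.jobId);   // or poll later with getScrapeJob(job.jobId)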

Job Management

Wrapper Jobs (markdown, screenshot, extract)

// Get job status
const job = await c.getMarkdownJob(jobId);
const job = await c.getScreenshotJob(jobId);
const job = await c.getExtractJob(jobId);

// List jobs
const jobs = await c.listMarkdownJobs({ status: 'completed', limit: 10 });
const jobs = await c.listScreenshotJobs({ limit: 5 });
const jobs = await c.listExtractJobs();

// Cancel
await c.cancelMarkdownJob(jobId);
await c.cancelScreenshotJob(jobId);
await c.cancelExtractJob(jobId);

Crawl Jobs (from runMany / deepCrawl)

const jobs = await c.listJobs({ status: 'completed', limit: 10 });
const job = await c.getJob(jobId);
const job = await c.waitJob(jobId, { pollInterval: 2, timeout: 300 });
await c.cancelJob(jobId);

// Download results as ZIP
const url = await c.downloadUrl(jobId);

Scan Jobs (AI-assisted deep scans)

// Poll status + URLs discovered so far
const job = await c.getScanJob(jobId);
console.log(`Status: ${job.status}, found: ${job.totalUrls}`);

// Cancel a running deep scan (preserves partial results)
await c.cancelScanJob(jobId);

Site Crawl Jobs (unified scan + crawl polling)

const status = await c.getSiteCrawlJob(jobId);
console.log(`${status.phase}: ${status.progress.urlsCrawled}/${status.progress.total}`);
// status.phase walks: "scan" -> "crawl" -> "done"
// status.downloadUrl is set when phase === "done" and status === "completed"
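
A small polling sketch built on those fields (the sleep interval is arbitrary; phase and status values are as documented above):

let status = await c.getSiteCrawlJob(jobId);
while (status.phase !== 'done') {
  console.log(`${status.phase}: ${status.progress.urlsCrawled}/${status.progress.total}`);
  await new Promise(r => setTimeout(r, 5000));
  status = await c.getSiteCrawlJob(jobId);
}
if (status.status === 'completed') console.log('Download:', status.downloadUrl);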

Power User: Config Passthrough

Every wrapper method accepts optional crawlerConfig and browserConfig for full control over the underlying crawl engine.

const res = await c.markdown('https://spa-app.com', {
  crawlerConfig: {
    waitFor: 'css:.content-loaded',
    jsCode: 'window.scrollTo(0, document.body.scrollHeight)',
    scanFullPage: true,
    pageTimeout: 60000,
  },
  browserConfig: {
    viewportWidth: 1920,
    viewportHeight: 1080,
    headers: { 'Accept-Language': 'en-US' },
  },
  proxy: { mode: 'residential', country: 'US' },
});

This works on markdown, screenshot, extract, and their batch variants. The map method accepts proxy only (no browser needed).
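
For example, map() with only a proxy (a sketch; the proxy shape matches the one used elsewhere in this README):

const res = await c.map('https://example.com', {
  sources: 'primary',
  proxy: { mode: 'residential', country: 'US' },   // map() takes proxy, but no browserConfig/crawlerConfig
});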

Full Power Mode

For advanced use cases -- custom extraction strategies, full HTML access, media/link parsing, session management -- use the core run() / runMany() / deepCrawl() methods directly.

Single Crawl

const result = await c.run('https://example.com', {
  config: { screenshot: true, excludeExternalLinks: true },
  browserConfig: { viewportWidth: 1920 },
  proxy: 'datacenter',
});
console.log(result.markdown?.rawMarkdown);
console.log(result.screenshot); // base64
console.log(result.links);
console.log(result.media);

Batch Crawl

const job = await c.runMany(
  ['https://a.com', 'https://b.com'],
  { wait: true, priority: 1 },
);

Deep Crawl

const result = await c.deepCrawl('https://docs.example.com', {
  strategy: 'bfs',
  maxDepth: 3,
  maxUrls: 200,
  wait: true,
});

Scan (URL Discovery Only)

const scan = await c.scan('https://example.com', { maxUrls: 100 });
console.log(`Found ${scan.totalUrls} URLs across ${scan.hostsFound} hosts`);

Error Handling

import {
  CloudError,
  AuthenticationError,
  RateLimitError,
  QuotaExceededError,
  NotFoundError,
  ValidationError,
  TimeoutError,
} from 'crawl4ai-cloud';

try {
  const res = await c.markdown('https://example.com');
} catch (err) {
  if (err instanceof AuthenticationError) {
    console.log('Bad API key');
  } else if (err instanceof RateLimitError) {
    console.log(`Rate limited, retry after ${err.retryAfter}s`);
  } else if (err instanceof QuotaExceededError) {
    console.log('Credits exhausted');
  } else if (err instanceof TimeoutError) {
    console.log('Request timed out');
  } else if (err instanceof CloudError) {
    console.log(`API error ${err.statusCode}: ${err.message}`);
  }
}

Environment Variables

export CRAWL4AI_API_KEY=sk_live_...
// API key auto-loaded from CRAWL4AI_API_KEY
const c = new AsyncWebCrawler();

License

Apache 2.0