@webclaw/sdk

v0.3.0

TypeScript SDK for the Webclaw web extraction API

Installation

npm install @webclaw/sdk
pnpm add @webclaw/sdk
yarn add @webclaw/sdk
bun add @webclaw/sdk

Quick Start

import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({ apiKey: "wc-YOUR_API_KEY" });

const result = await client.scrape({ url: "https://example.com", formats: ["markdown"] });
console.log(result.markdown);

Endpoints

Scrape

Extract content from a single URL. Supports multiple output formats, CSS selectors for targeting specific elements, and cache control.

const result = await client.scrape({
  url: "https://example.com",
  formats: ["markdown", "text", "llm", "json"],
  include_selectors: ["article", ".content"],
  exclude_selectors: ["nav", "footer"],
  only_main_content: true,
  no_cache: true,
});

result.url       // string
result.markdown  // string | undefined
result.text      // string | undefined
result.llm       // string | undefined
result.json      // unknown | undefined
result.metadata  // { title?, description?, language?, ... }
result.cache     // { status: "hit" | "miss" | "bypass" }
result.warning   // string | undefined

Vertical extractors

28 site-specific extractors that return typed JSON (GitHub, Reddit, Amazon, YouTube, PyPI, HuggingFace, Trustpilot, etc.) instead of generic markdown. See the catalog for the full list.

// Discover available extractors
const catalog = await client.listExtractors();
catalog.extractors.forEach((e) => console.log(e.name, "-", e.label));

// Run a specific extractor
const pr = await client.scrapeVertical(
  "github_pr",
  "https://github.com/rust-lang/rust/pull/123456",
);
console.log(pr.data); // { title, state, author, commits, reviews, ... }

// Amazon product as typed JSON
const product = await client.scrapeVertical(
  "amazon_product",
  "https://www.amazon.com/dp/B0C6KKQ7ND",
);
console.log(product.data.price, product.data.rating);

The data field is extractor-specific; call listExtractors() to discover what each returns.
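If data is not typed precisely for a given extractor, it can help to narrow fields at runtime before using them. This is a minimal sketch that assumes data arrives as a plain JSON object. The helper readNumber and the price/rating field names are illustrative and not part of the SDK:

```typescript
// Hypothetical helper: safely read a numeric field from extractor output
// whose shape is not known at compile time.
function readNumber(data: unknown, key: string): number | undefined {
  if (typeof data !== "object" || data === null) return undefined;
  const value = (data as Record<string, unknown>)[key];
  // Accept real numbers, and plain numeric strings such as "19.99".
  if (typeof value === "number" && Number.isFinite(value)) return value;
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : undefined;
  }
  return undefined;
}

// Usage with the Amazon example above (field names are assumptions):
// const price = readNumber(product.data, "price");
// const rating = readNumber(product.data, "rating");
```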

Search

Web search with optional parallel scraping of each result page.

const result = await client.search({
  query: "web scraping tools 2026",
  num_results: 10,
  scrape: true,
  formats: ["markdown"],
  country: "us",
  lang: "en",
  topic: "technology",
});

for (const r of result.results) {
  console.log(r.title, r.url, r.snippet);
  console.log(r.markdown); // present when scrape: true
}
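Search results from one site can crowd out the rest. Below is a minimal sketch for keeping at most N results per host while preserving ranking order. The SearchHit type is an assumption inferred from the fields used above, and capPerHost is a hypothetical helper:

```typescript
// Assumed minimal shape of one search result, based on the fields used above.
interface SearchHit {
  title: string;
  url: string;
  snippet?: string;
  markdown?: string;
}

// Keep at most `maxPerHost` results from each hostname, in original order.
// Results whose URL cannot be parsed are dropped.
function capPerHost(results: SearchHit[], maxPerHost: number): SearchHit[] {
  const counts = new Map<string, number>();
  const kept: SearchHit[] = [];
  for (const hit of results) {
    let host: string;
    try {
      host = new URL(hit.url).hostname;
    } catch {
      continue; // not an absolute URL
    }
    const seen = counts.get(host) ?? 0;
    if (seen < maxPerHost) {
      kept.push(hit);
      counts.set(host, seen + 1);
    }
  }
  return kept;
}
```

For example, capPerHost(result.results, 2) would keep the two highest-ranked pages from each site.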

Map

Discover URLs from a site's sitemap.

const result = await client.map({ url: "https://example.com" });
console.log(`Found ${result.count} URLs`);
result.urls.forEach((url) => console.log(url));
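The URLs returned by map can be narrowed before you pass them to batch. This sketch assumes result.urls is an array of absolute URL strings; filterByPath is a hypothetical helper:

```typescript
// Hypothetical helper: keep only http(s) URLs on `host` whose path starts
// with `prefix`, dropping fragments and duplicates.
function filterByPath(urls: string[], host: string, prefix: string): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of urls) {
    let u: URL;
    try {
      u = new URL(raw);
    } catch {
      continue; // skip anything that is not an absolute URL
    }
    if (u.protocol !== "https:" && u.protocol !== "http:") continue;
    if (u.hostname !== host || !u.pathname.startsWith(prefix)) continue;
    u.hash = ""; // fragments point at the same document
    const key = u.toString();
    if (!seen.has(key)) {
      seen.add(key);
      out.push(key);
    }
  }
  return out;
}
```

For example, filterByPath(result.urls, "example.com", "/blog/") keeps only blog posts, which you could then pass as urls to client.batch.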

Endpoints

Discover API endpoints embedded in a page's JavaScript — scans inline <script> bodies plus <script src> bundles for request paths, absolute URLs, GraphQL, and WebSocket endpoints. This surfaces the request layer that map (sitemap-based) can't see.

const result = await client.endpoints({
  url: "https://example.com",
  include_third_party: false, // default; set true to include other hosts
  max_bundles: 20,            // default & max; bundles fetched on top of inline JS
});

console.log(`${result.endpoint_count} endpoints across ${result.bundles_scanned} bundles`);
for (const ep of result.endpoints) {
  console.log(ep.kind, ep.value, ep.first_party ? "(1st-party)" : "(3rd-party)");
}
result.hosts      // distinct hosts seen, e.g. ["api.example.com"]
result.truncated  // true if results were capped by max_bundles

Security: endpoints, hosts, and their fields are extracted from page content (inline scripts and fetched bundles), which is attacker-influenced. The SDK does not sanitize them. Never feed a returned value or source into another request, shell command, eval, or SQL query without your own validation.
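One way to apply that validation is to resolve each value against the scanned page and accept it only if it is HTTPS on a host you explicitly trust. This is a minimal sketch. toAllowedUrl and the allowlist are hypothetical, and the check deliberately rejects ws:/wss: endpoints, so handle WebSocket values separately if you need them:

```typescript
// Hypothetical allowlist check before acting on an extracted endpoint value.
// `value` may be a relative path ("/api/v1/items") or an absolute URL;
// relative paths are resolved against the page that was scanned.
function toAllowedUrl(value: string, pageUrl: string, allowedHosts: Set<string>): URL | null {
  let u: URL;
  try {
    u = new URL(value, pageUrl);
  } catch {
    return null; // not a parseable URL or path
  }
  if (u.protocol !== "https:") return null; // refuses http:, javascript:, data:, ws:, wss:
  if (u.username || u.password) return null; // no embedded credentials
  if (!allowedHosts.has(u.hostname)) return null; // only hosts you trust
  return u;
}
```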

Batch

Scrape multiple URLs in parallel with configurable concurrency.

const result = await client.batch({
  urls: ["https://a.com", "https://b.com", "https://c.com"],
  formats: ["markdown"],
  concurrency: 5,
});

for (const item of result.results) {
  if ("error" in item) console.error(item.url, item.error);
  else console.log(item.url, item.markdown?.length);
}
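To retry only what failed, you can split the results using the same "error" in item check. In this sketch, the BatchItem shapes are assumptions inferred from the loop above rather than the SDK's exported types, and partitionBatch is a hypothetical helper:

```typescript
// Assumed item shapes, inferred from the `"error" in item` check above.
type BatchSuccess = { url: string; markdown?: string };
type BatchFailure = { url: string; error: string };
type BatchItem = BatchSuccess | BatchFailure;

// Split batch results into successes and failures; the `in` check narrows
// each item to the matching type.
function partitionBatch(items: BatchItem[]): { ok: BatchSuccess[]; failed: BatchFailure[] } {
  const ok: BatchSuccess[] = [];
  const failed: BatchFailure[] = [];
  for (const item of items) {
    if ("error" in item) failed.push(item);
    else ok.push(item);
  }
  return { ok, failed };
}
```

You could then send the failed URLs, failed.map((f) => f.url), to a second client.batch call.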

Extract

LLM-powered structured data extraction. Provide a JSON schema for typed output, or a natural-language prompt for flexible extraction.

// Schema-based extraction
const result = await client.extract({
  url: "https://example.com/pricing",
  schema: {
    type: "object",
    properties: {
      plans: { type: "array", items: { type: "object" } },
    },
  },
});
console.log(result.data);

// Prompt-based extraction
const result2 = await client.extract({
  url: "https://example.com",
  prompt: "Extract all pricing tiers with names and prices",
});
console.log(result2.data);
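The schema above accepts any object inside plans. Below is a sketch of a stricter schema plus a runtime type guard. The name and price fields are illustrative assumptions. The guard is a precaution in case the returned data does not match the schema:

```typescript
// A stricter JSON schema than the loose `items: { type: "object" }` above.
// Field names are illustrative.
const planSchema = {
  type: "object",
  properties: {
    plans: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
        },
        required: ["name", "price"],
      },
    },
  },
  required: ["plans"],
};

interface Plan {
  name: string;
  price: number;
}

// Check the extracted data at runtime before trusting its shape.
function isPlanList(data: unknown): data is { plans: Plan[] } {
  if (typeof data !== "object" || data === null) return false;
  const plans = (data as { plans?: unknown }).plans;
  return (
    Array.isArray(plans) &&
    plans.every(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        typeof (p as Plan).name === "string" &&
        typeof (p as Plan).price === "number",
    )
  );
}

// Usage: const r = await client.extract({ url, schema: planSchema });
//        if (isPlanList(r.data)) r.data.plans.forEach((p) => console.log(p.name, p.price));
```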

Summarize

Generate a concise summary of a page's content.

const result = await client.summarize({
  url: "https://example.com/blog/long-article",
  max_sentences: 3,
});
console.log(result.summary);

Diff

Detect content changes on a page. Optionally provide a previous state to diff against.

const result = await client.diff({
  url: "https://example.com",
  previous: { title: "Old Title", body: "Old content..." },
});
console.log(result.changes);

Brand

Extract brand identity information (name, colors, fonts, logos) from a URL.

const result = await client.brand({ url: "https://example.com" });
console.log(result); // { name, colors, fonts, logos, ... }

Research

Start an async deep research job. The SDK automatically polls until the job completes.

const result = await client.research(
  {
    query: "How do modern web crawlers handle JavaScript rendering?",
    max_sources: 15,
    deep: true,
  },
  { interval: 3_000, maxWait: 600_000 },
);

console.log(result.report);
console.log("Sources:", result.sources?.length);
console.log("Findings:", result.findings?.length);

You can also look up a research job's status directly with getResearchStatus:

const job = await client.research({ query: "AI trends 2026" });
// ... or check status independently:
const status = await client.getResearchStatus(job.id);
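The { interval, maxWait } options passed to research suggest a standard poll loop. Here is a generic sketch of that pattern, which may differ from the SDK's actual implementation. pollUntil is hypothetical, and the status values in the usage comment are assumptions:

```typescript
// Hypothetical generic poller: calls `check` every `interval` ms until
// `isDone` returns true, or throws once `maxWait` ms would be exceeded.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (value: T) => boolean,
  { interval = 3_000, maxWait = 600_000 }: { interval?: number; maxWait?: number } = {},
): Promise<T> {
  const deadline = Date.now() + maxWait;
  for (;;) {
    const value = await check();
    if (isDone(value)) return value;
    if (Date.now() + interval > deadline) {
      throw new Error(`Gave up after ${maxWait} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Usage (the status values are assumptions):
// const final = await pollUntil(
//   () => client.getResearchStatus(job.id),
//   (s) => s.status === "completed" || s.status === "failed",
//   { interval: 3_000, maxWait: 600_000 },
// );
```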

Crawl

Start an async crawl job that discovers and scrapes pages from a root URL.

const job = await client.crawl({
  url: "https://example.com",
  max_depth: 3,
  max_pages: 100,
  use_sitemap: true,
});

console.log("Job ID:", job.id);

Poll with waitForCompletion, which resolves when the crawl finishes or fails:

const result = await job.waitForCompletion({
  interval: 2_000,   // polling interval in ms
  maxWait: 300_000,  // max wait time in ms (5 min)
});

console.log(`Status: ${result.status}`);
console.log(`${result.completed}/${result.total} pages`);
for (const page of result.pages) {
  console.log(page.url, page.markdown?.length);
}

Or check status manually at any time:

const status = await job.getStatus();
// or: const status = await client.getCrawlStatus(job.id);

Watch

Monitor URLs for content changes. Create watchers, check them on demand, and receive webhook notifications when content changes.

Create a watch

const watch = await client.watchCreate({
  url: "https://example.com/pricing",
  name: "Pricing page",
  interval_minutes: 60,
  webhook_url: "https://your-server.com/webhooks/webclaw",
});
console.log("Watch ID:", watch.id);

List all watches

const watches = await client.watchList(10, 0); // limit, offset
for (const w of watches) {
  console.log(w.id, w.url, w.active);
}
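watchList takes a limit and an offset, so listing every watch means walking pages until a short one comes back. This sketch assumes watchList resolves to an array and honors the requested limit. If the server caps page size below pageSize, the loop would stop early. collectAll is a hypothetical helper:

```typescript
// Hypothetical helper: collect every item from an offset-paginated list by
// calling `fetchPage(limit, offset)` until a page comes back short.
async function collectAll<T>(
  fetchPage: (limit: number, offset: number) => Promise<T[]>,
  pageSize = 50,
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(pageSize, offset);
    all.push(...page);
    if (page.length < pageSize) return all; // last page reached
  }
}

// Usage: const allWatches = await collectAll((limit, offset) => client.watchList(limit, offset));
```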

Get a single watch

const watch = await client.watchGet("watch_abc123");
console.log(watch.last_checked_at, watch.last_changed_at);

Trigger an immediate check

const updated = await client.watchCheck("watch_abc123");
console.log(updated.last_checked_at);

Delete a watch

await client.watchDelete("watch_abc123");

Firecrawl v2 compatibility

The API also exposes a Firecrawl-compatible surface at /v2/scrape, /v2/crawl, and /v2/search. These endpoints are not yet wrapped by this SDK (future work), so call them directly if you need Firecrawl drop-in compatibility today.

Error Handling

All errors extend WebclawError, so you can catch broadly or handle specific cases.

import {
  WebclawError,
  AuthenticationError,
  NotFoundError,
  RateLimitError,
  TimeoutError,
} from "@webclaw/sdk";

try {
  await client.scrape({ url: "https://example.com" });
} catch (err) {
  if (err instanceof RateLimitError) {
    console.error("Rate limited, retry after:", err.retryAfter, "s");
  } else if (err instanceof AuthenticationError) {
    console.error("Bad API key");
  } else if (err instanceof NotFoundError) {
    console.error("Resource not found");
  } else if (err instanceof TimeoutError) {
    console.error("Request timed out");
  } else if (err instanceof WebclawError) {
    // Any other API error: message, HTTP status, and raw response body
    console.error("API error:", err.message, err.status, err.body);
  } else {
    throw err; // not an SDK error (e.g. a bug in your own code), so don't swallow it
  }
}
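Because RateLimitError exposes retryAfter, a small retry wrapper is straightforward to build. This is a sketch: withRetry is hypothetical, and it assumes retryAfter is a number of seconds, as the log message above suggests:

```typescript
// Hypothetical retry wrapper. `retryAfterSeconds` returns a delay for errors
// worth retrying, or undefined to rethrow immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  retryAfterSeconds: (err: unknown) => number | undefined,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const delay = retryAfterSeconds(err);
      if (delay === undefined || attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delay * 1000));
    }
  }
}

// Usage (retry only on rate limits, falling back to 1 s if retryAfter is missing):
// const result = await withRetry(
//   () => client.scrape({ url: "https://example.com" }),
//   (err) => (err instanceof RateLimitError ? err.retryAfter ?? 1 : undefined),
// );
```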

Configuration

const client = new Webclaw({
  apiKey: process.env.WEBCLAW_API_KEY!,
  baseUrl: "https://api.webclaw.io", // default
  timeout: 60_000,                    // ms, default 30_000
});

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| apiKey | string | required | Your Webclaw API key |
| baseUrl | string | https://api.webclaw.io | API base URL |
| timeout | number | 30000 | Request timeout in milliseconds |
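In Node, you might build these options from environment variables instead of using the ! non-null assertion. This is a minimal sketch. loadWebclawOptions and the local WebclawOptions interface are hypothetical. WEBCLAW_BASE_URL and WEBCLAW_TIMEOUT_MS are assumed variable names; only WEBCLAW_API_KEY appears above:

```typescript
// Local mirror of the options in the table above (not the SDK's own type).
interface WebclawOptions {
  apiKey: string;
  baseUrl?: string;
  timeout?: number;
}

// Hypothetical loader: fails fast with a clear message when the key is missing.
// Pass `process.env` in Node.
function loadWebclawOptions(env: Record<string, string | undefined>): WebclawOptions {
  const apiKey = env.WEBCLAW_API_KEY?.trim();
  if (!apiKey) throw new Error("WEBCLAW_API_KEY is not set");
  const options: WebclawOptions = { apiKey };
  if (env.WEBCLAW_BASE_URL) options.baseUrl = env.WEBCLAW_BASE_URL; // assumed name
  if (env.WEBCLAW_TIMEOUT_MS) {
    // assumed name; value in milliseconds
    const timeout = Number(env.WEBCLAW_TIMEOUT_MS);
    if (!Number.isInteger(timeout) || timeout <= 0) {
      throw new Error("WEBCLAW_TIMEOUT_MS must be a positive integer");
    }
    options.timeout = timeout;
  }
  return options;
}

// Usage: const client = new Webclaw(loadWebclawOptions(process.env));
```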

TypeScript

Full type definitions are included for every request and response. All types are exported from the package root:

import type {
  ScrapeRequest,
  ScrapeResponse,
  CrawlRequest,
  CrawlStatusResponse,
  EndpointsRequest,
  EndpointsResponse,
  SearchRequest,
  SearchResponse,
  ExtractRequest,
  ExtractResponse,
  ResearchRequest,
  ResearchResponse,
  WatchCreateRequest,
  WatchResponse,
  // ... and more
} from "@webclaw/sdk";

Highlights

  • Zero runtime dependencies. Uses native fetch.
  • ESM + CJS dual output via tsup.
  • Full TypeScript types for every request and response.
  • Automatic polling for async jobs (crawl, research).
  • Node.js 18+.

License

MIT