@upcrawl/sdk

Official Node.js/Browser SDK for the Upcrawl API. Extract data from any website with a single API call.

Installation

npm install @upcrawl/sdk

Or with yarn:

yarn add @upcrawl/sdk

Quick Start

import Upcrawl from '@upcrawl/sdk';

// Set your API key (get one at https://upcrawl.dev)
Upcrawl.setApiKey('uc-your-api-key');

// Scrape a webpage
const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown'
});

console.log(result.markdown);

Usage

Setting API Key

The API key must be set before making any requests:

import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');

Or using named imports:

import { setApiKey } from '@upcrawl/sdk';

setApiKey('uc-your-api-key');

Scraping a Single URL

import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');

const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown',          // 'markdown' or 'html'
  onlyMainContent: true,     // Remove nav, ads, footers
  extractMetadata: true      // Get title, description, etc.
});

console.log(result.markdown);
console.log(result.metadata?.title);

Batch Scraping

Scrape multiple URLs in a single request:

const result = await Upcrawl.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    // You can also pass detailed options per URL:
    { url: 'https://example.com/page3', type: 'html' }
  ],
  type: 'markdown'
});

console.log(`Scraped ${result.successful} of ${result.total} pages`);

result.results.forEach(page => {
  if (page.success) {
    console.log(`${page.url}: ${page.markdown?.length} chars`);
  } else {
    console.log(`${page.url}: Failed - ${page.error}`);
  }
});

Web Search

Search the web and get structured results:

const result = await Upcrawl.search({
  queries: ['latest AI news 2025'],
  limit: 10,
  location: 'US'
});

result.results.forEach(queryResult => {
  console.log(`Query: ${queryResult.query}`);
  queryResult.results.forEach(item => {
    console.log(`- ${item.title}`);
    console.log(`  ${item.url}`);
  });
});

Generate PDF from HTML

Generate a PDF from HTML content:

const result = await Upcrawl.generatePdf({
  html: '<html><body><h1>Invoice</h1><p>Total: $500</p></body></html>',
  title: 'invoice-123',
  pageSize: 'A4',
  printBackground: true,
  margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' }
});

console.log(result.url); // Download URL for the PDF

Generate PDF from URL

Generate a PDF from any webpage:

const result = await Upcrawl.generatePdfFromUrl({
  url: 'https://example.com/report',
  title: 'report',
  pageSize: 'Letter',
  landscape: true
});

console.log(result.url); // Download URL for the PDF

Execute Code

Run code in an isolated sandbox environment:

const result = await Upcrawl.executeCode({
  code: 'print("Hello, World!")',
  language: 'python'
});

console.log(result.stdout);          // "Hello, World!\n"
console.log(result.exitCode);        // 0
console.log(result.executionTimeMs); // 95.23
console.log(result.memoryUsageMb);   // 8.45

Each execution runs in its own isolated subprocess inside a Kata micro-VM with no network access. Code is cleaned up immediately after execution.

// Multi-line code with imports
const result = await Upcrawl.executeCode({
  code: `
import json
data = {"name": "Upcrawl", "version": 1}
print(json.dumps(data, indent=2))
  `
});

console.log(result.stdout);
// {
//   "name": "Upcrawl",
//   "version": 1
// }

Domain Filtering

Filter search results by domain:

// Only include specific domains
const result = await Upcrawl.search({
  queries: ['machine learning tutorials'],
  includeDomains: ['medium.com', 'towardsdatascience.com']
});

// Or exclude domains
const result2 = await Upcrawl.search({
  queries: ['javascript frameworks'],
  excludeDomains: ['pinterest.com', 'quora.com']
});

LLM Summarization

Ask the API to summarize scraped content:

const result = await Upcrawl.scrape({
  url: 'https://example.com/product',
  type: 'markdown',
  summary: {
    query: 'Extract the product name, price, and key features in JSON format'
  }
});

console.log(result.content); // Summarized content

LLM Tool Definitions

The SDK includes pre-built tool definitions that you can pass directly to LLMs. Available in two formats:

  • Vercel AI SDK — works with generateText, streamText from the ai package
  • OpenAI — works with chat.completions.create from the openai package

Install the peer dependency for whichever format you need:

npm install @upcrawl/sdk ai          # for Vercel AI SDK
npm install @upcrawl/sdk openai      # for OpenAI

Vercel AI SDK

import Upcrawl from '@upcrawl/sdk';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

Upcrawl.setApiKey('uc-your-api-key');

// Pass all tools
const { text } = await generateText({
  model: openai('gpt-4.1'),
  tools: Upcrawl.tools.aiSdk.all,
  prompt: 'Find top AI startups and analyze their pricing',
});

// Or pick specific ones
const { text: text2 } = await generateText({
  model: openai('gpt-4.1'),
  tools: {
    webSearch: Upcrawl.tools.aiSdk.webSearch,
    scrape: Upcrawl.tools.aiSdk.scrape,
  },
  prompt: 'Search for and scrape the top 3 results',
});

OpenAI

import Upcrawl from '@upcrawl/sdk';
import OpenAI from 'openai';

Upcrawl.setApiKey('uc-your-api-key');
const client = new OpenAI();

// Pass all tool definitions
const response = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Search for AI trends' }],
  tools: Upcrawl.tools.openai.all,
});

// Handle tool calls
for (const toolCall of response.choices[0].message.tool_calls ?? []) {
  const result = await Upcrawl.tools.openai.execute(
    toolCall.function.name,
    JSON.parse(toolCall.function.arguments),
  );
  console.log(result);
}

Available Tools

| Tool | Description |
|------|-------------|
| webSearch | Search the web and get results with URLs, titles, descriptions |
| scrape | Scrape a URL and get clean markdown or HTML content |
| executeCode | Execute Python code in an isolated sandbox |

Configuration

Custom Base URL

For self-hosted instances or testing:

Upcrawl.setBaseUrl('https://your-instance.com/v1');

Request Timeout

Set a custom timeout (in milliseconds):

Upcrawl.setTimeout(60000); // 60 seconds

Configure Multiple Options

Upcrawl.configure({
  apiKey: 'uc-your-api-key',
  baseUrl: 'https://api.upcrawl.dev/v1',
  timeout: 120000
});

Error Handling

The SDK throws UpcrawlError for API errors:

import Upcrawl, { UpcrawlError } from '@upcrawl/sdk';

try {
  const result = await Upcrawl.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof UpcrawlError) {
    console.error(`Error ${error.status}: ${error.message}`);
    console.error(`Code: ${error.code}`);
  }
}

Common error codes:

| Status | Code | Description |
|--------|------|-------------|
| 401 | UNAUTHORIZED | Invalid or missing API key |
| 403 | FORBIDDEN | Access forbidden |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |
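
If a request fails with 429 RATE_LIMIT_EXCEEDED, retrying with backoff is usually enough. A minimal sketch (the backoff schedule here is arbitrary, not something the SDK provides):

import Upcrawl, { UpcrawlError } from '@upcrawl/sdk';

// Retry a scrape with exponential backoff when the API reports rate limiting
async function scrapeWithRetry(url: string, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await Upcrawl.scrape({ url, type: 'markdown' });
    } catch (error) {
      if (
        error instanceof UpcrawlError &&
        error.code === 'RATE_LIMIT_EXCEEDED' &&
        attempt < maxRetries
      ) {
        // Wait 1s, 2s, 4s, ... before the next attempt
        await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** attempt));
        continue;
      }
      throw error; // rethrow anything that isn't a retryable rate limit
    }
  }
}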

TypeScript Support

The SDK includes full TypeScript definitions:

import Upcrawl, {
  ScrapeOptions,
  ScrapeResponse,
  SearchOptions,
  SearchResponse,
  BatchScrapeOptions,
  BatchScrapeResponse,
  GeneratePdfOptions,
  PdfResponse,
  ExecuteCodeOptions,
  ExecuteCodeResponse
} from '@upcrawl/sdk';

const options: ScrapeOptions = {
  url: 'https://example.com',
  type: 'markdown'
};

const result: ScrapeResponse = await Upcrawl.scrape(options);

API Reference

Methods

| Method | Description |
|--------|-------------|
| Upcrawl.setApiKey(key) | Set the API key globally |
| Upcrawl.setBaseUrl(url) | Set custom base URL |
| Upcrawl.setTimeout(ms) | Set request timeout |
| Upcrawl.configure(config) | Configure multiple options |
| Upcrawl.scrape(options) | Scrape a single URL |
| Upcrawl.batchScrape(options) | Scrape multiple URLs |
| Upcrawl.search(options) | Search the web |
| Upcrawl.generatePdf(options) | Generate PDF from HTML |
| Upcrawl.generatePdfFromUrl(options) | Generate PDF from a URL |
| Upcrawl.executeCode(options) | Execute code in an isolated sandbox |

UpcrawlConfig

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| apiKey | string | No | Your Upcrawl API key |
| baseUrl | string | No | Custom API base URL |
| timeout | number | No | Request timeout in milliseconds |

SummaryQuery

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query/instruction for content summarization |

ScrapeOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to scrape |
| type | "html" \| "markdown" | No | Output format. Defaults to "html" |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers). Defaults to true |
| extractMetadata | boolean | No | Whether to extract page metadata |
| summary | object | No | Summary query for LLM summarization |
| timeoutMs | number | No | Custom timeout in milliseconds (1000-120000) |
| waitUntil | "load" \| "domcontentloaded" \| "networkidle" | No | Wait strategy for page load |
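
The timeoutMs and waitUntil options are worth combining for slow, script-heavy pages. A small sketch using only the options above:

const result = await Upcrawl.scrape({
  url: 'https://example.com/dashboard',
  type: 'markdown',
  waitUntil: 'networkidle',  // wait for network activity to settle before capturing
  timeoutMs: 60000           // allow up to 60 seconds for the page to load
});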

ScrapeMetadata

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| title | string | No | Page title |
| description | string | No | Page meta description |
| canonicalUrl | string | No | Canonical URL of the page |
| finalUrl | string | No | Final URL after redirects |
| contentType | string | No | Content type of the response |
| contentLength | number | No | Content length of the response |

ScrapeResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | Original URL that was scraped |
| html | string \| null | No | Rendered HTML content (when type is html) |
| markdown | string \| null | No | Content converted to Markdown (when type is markdown) |
| statusCode | number \| null | Yes | HTTP status code |
| success | boolean | Yes | Whether scraping was successful |
| error | string | No | Error message if scraping failed |
| timestamp | string | Yes | ISO timestamp when scraping completed |
| loadTimeMs | number | Yes | Time taken to load and render the page in milliseconds |
| metadata | object | No | Additional page metadata |
| retryCount | number | Yes | Number of retry attempts made |
| cost | number | No | Cost in USD for this scrape operation |
| content | string \| null | No | Content after summarization (when a summary query is provided) |

BatchScrapeOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| urls | (string \| object)[] | Yes | Array of URLs to scrape (strings or detailed request objects) |
| type | "html" \| "markdown" | No | Output format: html or markdown |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers) |
| summary | object | No | Summary query for LLM summarization |
| batchTimeoutMs | number | No | Global timeout for entire batch operation in milliseconds (10000-600000) |
| failFast | boolean | No | Whether to stop on first error |
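
A sketch showing the batch-level controls from the table above (values are illustrative):

const result = await Upcrawl.batchScrape({
  urls: ['https://example.com/a', 'https://example.com/b'],
  type: 'markdown',
  failFast: true,          // stop as soon as one URL fails
  batchTimeoutMs: 120000   // abort the whole batch after 2 minutes
});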

BatchScrapeResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of scrape results |
| total | number | Yes | Total number of URLs processed |
| successful | number | Yes | Number of successful scrapes |
| failed | number | Yes | Number of failed scrapes |
| totalTimeMs | number | Yes | Total time taken for batch operation in milliseconds |
| timestamp | string | Yes | Timestamp when batch operation completed |
| cost | number | No | Total cost in USD for all scrape operations |

SearchOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| queries | string[] | Yes | Array of search queries to execute (1-20) |
| limit | number | No | Number of results per query (1-100). Defaults to 10 |
| location | string | No | Location for search (e.g., "IN", "US") |
| includeDomains | string[] | No | Domains to include (will add site: to query) |
| excludeDomains | string[] | No | Domains to exclude (will add -site: to query) |
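
Since queries accepts up to 20 entries, several searches can share a single request. A sketch using only the options above:

const result = await Upcrawl.search({
  queries: ['serverless databases', 'edge caching strategies'],
  limit: 5,        // 5 results per query
  location: 'US'   // bias results toward the US
});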

SearchResultWeb

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL of the search result |
| title | string | Yes | Title of the search result |
| description | string | Yes | Description/snippet of the search result |

SearchResultItem

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The search query |
| success | boolean | Yes | Whether the search was successful |
| results | object[] | Yes | Parsed search result links |
| error | string | No | Error message if failed |
| loadTimeMs | number | No | Time taken in milliseconds |
| cost | number | No | Cost in USD for this query |

SearchResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of search results per query |
| total | number | Yes | Total number of queries |
| successful | number | Yes | Number of successful searches |
| failed | number | Yes | Number of failed searches |
| totalTimeMs | number | Yes | Total time in milliseconds |
| timestamp | string | Yes | ISO timestamp |
| cost | number | No | Total cost in USD |

PdfMargin

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| top | string | No | Top margin (e.g., "20mm") |
| right | string | No | Right margin |
| bottom | string | No | Bottom margin |
| left | string | No | Left margin |

GeneratePdfOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| html | string | Yes | Complete HTML content to convert to PDF |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins (e.g., { top: "20mm", right: "20mm", bottom: "20mm", left: "20mm" }) |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| skipChartWait | boolean | No | Skip waiting for chart rendering signal. Defaults to false |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
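
For documents without charts, skipChartWait avoids waiting on a rendering signal that will never arrive. A sketch using only the options above:

const result = await Upcrawl.generatePdf({
  html: '<html><body><h1>Plain report</h1><p>No charts here.</p></body></html>',
  skipChartWait: true,   // don't wait for a chart rendering signal
  timeoutMs: 15000       // plain HTML should render quickly
});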

GeneratePdfFromUrlOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to navigate to and convert to PDF |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |

PdfResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| success | boolean | Yes | Whether PDF generation succeeded |
| url | string | No | Public URL of the generated PDF |
| filename | string | No | Generated filename |
| blobName | string | No | Blob storage path |
| error | string | No | Error message on failure |
| durationMs | number | Yes | Total time taken in milliseconds |

ExecuteCodeOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| code | string | Yes | Code to execute |
| language | "python" | No | Language runtime. Defaults to "python" |

ExecuteCodeResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| stdout | string | Yes | Standard output from the executed code |
| stderr | string | Yes | Standard error from the executed code |
| exitCode | number | Yes | Process exit code (0 = success, 124 = timeout) |
| executionTimeMs | number | Yes | Execution time in milliseconds |
| timedOut | boolean | Yes | Whether execution was killed due to timeout |
| memoryUsageMb | number | No | Peak memory usage in megabytes |
| error | string | No | Error message if execution infrastructure failed |
| cost | number | No | Cost in USD for this execution |
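
The timedOut flag and the 124 exit code distinguish timeouts from ordinary failures. A sketch based on the fields above:

const result = await Upcrawl.executeCode({
  code: 'import time\ntime.sleep(10_000)'
});

if (result.timedOut || result.exitCode === 124) {
  console.error(`Timed out after ${result.executionTimeMs} ms`);
} else if (result.exitCode !== 0) {
  console.error(`Exit code ${result.exitCode}:`, result.stderr);
} else {
  console.log(result.stdout);
}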

UpcrawlErrorResponse

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| error | object | Yes | Error details |
| statusCode | number | No | HTTP status code |

CreateBrowserSessionOptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| width | number | No | Browser viewport width (800-3840). Defaults to 1280 |
| height | number | No | Browser viewport height (600-2160). Defaults to 720 |
| headless | boolean | No | Run browser in headless mode. Defaults to true |

BrowserSession

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| sessionId | string | Yes | Unique session identifier |
| wsEndpoint | string | Yes | WebSocket URL for connecting with Playwright/Puppeteer |
| vncUrl | string \| null | Yes | VNC URL for viewing the browser (if available) |
| affinityCookie | string | No | Affinity cookie for sticky session routing (format: SCRAPER_AFFINITY=xxx), extracted from response headers |
| createdAt | Date | Yes | Session creation timestamp |
| width | number | Yes | Browser viewport width |
| height | number | Yes | Browser viewport height |
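
This README documents the session types but not the call that creates a session, so obtaining a BrowserSession is assumed in the sketch below; once you have one, wsEndpoint plugs straight into Playwright:

import { chromium } from 'playwright';

// `session` is assumed to be a BrowserSession obtained from the Upcrawl API;
// the creation call itself is not documented here
declare const session: { wsEndpoint: string };

const browser = await chromium.connect(session.wsEndpoint);
const page = await browser.newPage();
await page.goto('https://example.com');
console.log(await page.title());
await browser.close();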

License

MIT