@upcrawl/sdk
v1.5.0
@upcrawl/sdk
Official Node.js/Browser SDK for the Upcrawl API. Extract data from any website with a single API call.
Installation
```bash
npm install @upcrawl/sdk
```

Or with yarn:

```bash
yarn add @upcrawl/sdk
```

Quick Start
```js
import Upcrawl from '@upcrawl/sdk';

// Set your API key (get one at https://upcrawl.dev)
Upcrawl.setApiKey('uc-your-api-key');

// Scrape a webpage
const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown'
});

console.log(result.markdown);
```

Usage
Setting API Key
The API key must be set before making any requests:
```js
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');
```

Or using named imports:

```js
import { setApiKey } from '@upcrawl/sdk';

setApiKey('uc-your-api-key');
```

Scraping a Single URL
```js
import Upcrawl from '@upcrawl/sdk';

Upcrawl.setApiKey('uc-your-api-key');

const result = await Upcrawl.scrape({
  url: 'https://example.com',
  type: 'markdown',      // 'markdown' or 'html'
  onlyMainContent: true, // Remove nav, ads, footers
  extractMetadata: true  // Get title, description, etc.
});

console.log(result.markdown);
console.log(result.metadata?.title);
```

Batch Scraping
Scrape multiple URLs in a single request:
```js
const result = await Upcrawl.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    // You can also pass detailed options per URL:
    { url: 'https://example.com/page3', type: 'html' }
  ],
  type: 'markdown'
});

console.log(`Scraped ${result.successful} of ${result.total} pages`);

result.results.forEach(page => {
  if (page.success) {
    console.log(`${page.url}: ${page.markdown?.length} chars`);
  } else {
    console.log(`${page.url}: Failed - ${page.error}`);
  }
});
```

Web Search
Search the web and get structured results:
```js
const result = await Upcrawl.search({
  queries: ['latest AI news 2025'],
  limit: 10,
  location: 'US'
});

result.results.forEach(queryResult => {
  console.log(`Query: ${queryResult.query}`);
  queryResult.results.forEach(item => {
    console.log(`- ${item.title}`);
    console.log(`  ${item.url}`);
  });
});
```

Generate PDF from HTML
Generate a PDF from HTML content:
```js
const result = await Upcrawl.generatePdf({
  html: '<html><body><h1>Invoice</h1><p>Total: $500</p></body></html>',
  title: 'invoice-123',
  pageSize: 'A4',
  printBackground: true,
  margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' }
});

console.log(result.url); // Download URL for the PDF
```

Generate PDF from URL
Generate a PDF from any webpage:
```js
const result = await Upcrawl.generatePdfFromUrl({
  url: 'https://example.com/report',
  title: 'report',
  pageSize: 'Letter',
  landscape: true
});

console.log(result.url); // Download URL for the PDF
```

Execute Code
Run code in an isolated sandbox environment:
```js
const result = await Upcrawl.executeCode({
  code: 'print("Hello, World!")',
  language: 'python'
});

console.log(result.stdout);          // "Hello, World!\n"
console.log(result.exitCode);        // 0
console.log(result.executionTimeMs); // 95.23
console.log(result.memoryUsageMb);   // 8.45
```

Each execution runs in its own isolated subprocess inside a Kata micro-VM with no network access. Code is cleaned up immediately after execution.
```js
// Multi-line code with imports
const result = await Upcrawl.executeCode({
  code: `
import json
data = {"name": "Upcrawl", "version": 1}
print(json.dumps(data, indent=2))
`
});

console.log(result.stdout);
// {
//   "name": "Upcrawl",
//   "version": 1
// }
```

Domain Filtering
Filter search results by domain:
```js
// Only include specific domains
const result = await Upcrawl.search({
  queries: ['machine learning tutorials'],
  includeDomains: ['medium.com', 'towardsdatascience.com']
});

// Or exclude domains
const result2 = await Upcrawl.search({
  queries: ['javascript frameworks'],
  excludeDomains: ['pinterest.com', 'quora.com']
});
```

LLM Summarization
Ask the API to summarize scraped content:
```js
const result = await Upcrawl.scrape({
  url: 'https://example.com/product',
  type: 'markdown',
  summary: {
    query: 'Extract the product name, price, and key features in JSON format'
  }
});

console.log(result.content); // Summarized content
```

LLM Tool Definitions
The SDK includes pre-built tool definitions that you can pass directly to LLMs. Available in two formats:
- Vercel AI SDK: works with `generateText` and `streamText` from the `ai` package
- OpenAI: works with `chat.completions.create` from the `openai` package
Install the peer dependency for whichever format you need:
```bash
npm install @upcrawl/sdk ai      # for Vercel AI SDK
npm install @upcrawl/sdk openai  # for OpenAI
```

Vercel AI SDK
```js
import Upcrawl from '@upcrawl/sdk';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

Upcrawl.setApiKey('uc-your-api-key');

// Pass all tools
const { text } = await generateText({
  model: openai('gpt-4.1'),
  tools: Upcrawl.tools.aiSdk.all,
  prompt: 'Find top AI startups and analyze their pricing',
});

// Or pick specific ones
const { text: text2 } = await generateText({
  model: openai('gpt-4.1'),
  tools: {
    webSearch: Upcrawl.tools.aiSdk.webSearch,
    scrape: Upcrawl.tools.aiSdk.scrape,
  },
  prompt: 'Search for and scrape the top 3 results',
});
```

OpenAI
```js
import Upcrawl from '@upcrawl/sdk';
import OpenAI from 'openai';

Upcrawl.setApiKey('uc-your-api-key');
const client = new OpenAI();

// Pass all tool definitions
const response = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Search for AI trends' }],
  tools: Upcrawl.tools.openai.all,
});

// Handle tool calls
for (const toolCall of response.choices[0].message.tool_calls ?? []) {
  const result = await Upcrawl.tools.openai.execute(
    toolCall.function.name,
    JSON.parse(toolCall.function.arguments),
  );
  console.log(result);
}
```

Available Tools
| Tool | Description |
|------|-------------|
| webSearch | Search the web and get results with URLs, titles, descriptions |
| scrape | Scrape a URL and get clean markdown or HTML content |
| executeCode | Execute Python code in an isolated sandbox |
Configuration
Custom Base URL
For self-hosted instances or testing:
```js
Upcrawl.setBaseUrl('https://your-instance.com/v1');
```

Request Timeout
Set a custom timeout (in milliseconds):
```js
Upcrawl.setTimeout(60000); // 60 seconds
```

Configure Multiple Options
```js
Upcrawl.configure({
  apiKey: 'uc-your-api-key',
  baseUrl: 'https://api.upcrawl.dev/v1',
  timeout: 120000
});
```

Error Handling
The SDK throws UpcrawlError for API errors:
```js
import Upcrawl, { UpcrawlError } from '@upcrawl/sdk';

try {
  const result = await Upcrawl.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof UpcrawlError) {
    console.error(`Error ${error.status}: ${error.message}`);
    console.error(`Code: ${error.code}`);
  }
}
```

Common error codes:
| Status | Code | Description |
|--------|------|-------------|
| 401 | UNAUTHORIZED | Invalid or missing API key |
| 403 | FORBIDDEN | Access forbidden |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |
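A 401 or 403 will not succeed on retry, but a 429 (and often a 500) is transient, so wrapping calls in a retry with exponential backoff is a reasonable pattern. A minimal sketch; `backoffDelayMs`, `isRetryable`, and `withRetry` are illustrative helpers, not SDK exports:

```javascript
// Exponential backoff: delay doubles per attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry rate limits and server errors; never retry auth failures.
function isRetryable(status) {
  return status === 429 || status >= 500;
}

// Run fn, retrying retryable UpcrawlError-style failures with backoff.
async function withRetry(fn, maxAttempts = 4) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt + 1 >= maxAttempts || !isRetryable(error.status)) throw error;
      await new Promise(resolve => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Any SDK call can be wrapped, e.g. `withRetry(() => Upcrawl.scrape({ url: 'https://example.com' }))`.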
TypeScript Support
The SDK includes full TypeScript definitions:
```ts
import Upcrawl, {
  ScrapeOptions,
  ScrapeResponse,
  SearchOptions,
  SearchResponse,
  BatchScrapeOptions,
  BatchScrapeResponse,
  GeneratePdfOptions,
  PdfResponse,
  ExecuteCodeOptions,
  ExecuteCodeResponse
} from '@upcrawl/sdk';

const options: ScrapeOptions = {
  url: 'https://example.com',
  type: 'markdown'
};

const result: ScrapeResponse = await Upcrawl.scrape(options);
```

API Reference
Methods
| Method | Description |
|--------|-------------|
| Upcrawl.setApiKey(key) | Set the API key globally |
| Upcrawl.setBaseUrl(url) | Set custom base URL |
| Upcrawl.setTimeout(ms) | Set request timeout |
| Upcrawl.configure(config) | Configure multiple options |
| Upcrawl.scrape(options) | Scrape a single URL |
| Upcrawl.batchScrape(options) | Scrape multiple URLs |
| Upcrawl.search(options) | Search the web |
| Upcrawl.generatePdf(options) | Generate PDF from HTML |
| Upcrawl.generatePdfFromUrl(options) | Generate PDF from a URL |
| Upcrawl.executeCode(options) | Execute code in an isolated sandbox |
UpcrawlConfig
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| apiKey | string | No | Your Upcrawl API key |
| baseUrl | string | No | Custom API base URL |
| timeout | number | No | Request timeout in milliseconds |
SummaryQuery
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | Query/instruction for content summarization |
ScrapeOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to scrape (required) |
| type | "html" \| "markdown" | No | Output format: html or markdown. Defaults to "html" |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers). Defaults to true |
| extractMetadata | boolean | No | Whether to extract page metadata |
| summary | object | No | Summary query for LLM summarization |
| timeoutMs | number | No | Custom timeout in milliseconds (1000-120000) |
| waitUntil | "load" \| "domcontentloaded" \| "networkidle" | No | Wait strategy for page load |
ScrapeMetadata
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| title | string | No | |
| description | string | No | |
| canonicalUrl | string | No | |
| finalUrl | string | No | |
| contentType | string | No | |
| contentLength | number | No | |
ScrapeResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | Original URL that was scraped |
| html | string \| null | No | Rendered HTML content (when type is html) |
| markdown | string \| null | No | Content converted to Markdown (when type is markdown) |
| statusCode | number \| null | Yes | HTTP status code |
| success | boolean | Yes | Whether scraping was successful |
| error | string | No | Error message if scraping failed |
| timestamp | string | Yes | ISO timestamp when scraping completed |
| loadTimeMs | number | Yes | Time taken to load and render the page in milliseconds |
| metadata | object | No | Additional page metadata |
| retryCount | number | Yes | Number of retry attempts made |
| cost | number | No | Cost in USD for this scrape operation |
| content | string \| null | No | Content after summarization (when summary query provided) |
BatchScrapeOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| urls | (string \| object)[] | Yes | Array of URLs to scrape (strings or detailed request objects) |
| type | "html" \| "markdown" | No | Output format: html or markdown |
| onlyMainContent | boolean | No | Extract only main content (removes nav, ads, footers) |
| summary | object | No | Summary query for LLM summarization |
| batchTimeoutMs | number | No | Global timeout for entire batch operation in milliseconds (10000-600000) |
| failFast | boolean | No | Whether to stop on first error |
BatchScrapeResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of scrape results |
| total | number | Yes | Total number of URLs processed |
| successful | number | Yes | Number of successful scrapes |
| failed | number | Yes | Number of failed scrapes |
| totalTimeMs | number | Yes | Total time taken for batch operation in milliseconds |
| timestamp | string | Yes | Timestamp when batch operation completed |
| cost | number | No | Total cost in USD for all scrape operations |
SearchOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| queries | string[] | Yes | Array of search queries to execute (1-20) |
| limit | number | No | Number of results per query (1-100). Defaults to 10 |
| location | string | No | Location for search (e.g., "IN", "US") |
| includeDomains | string[] | No | Domains to include (will add site: to query) |
| excludeDomains | string[] | No | Domains to exclude (will add -site: to query) |
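The `includeDomains` and `excludeDomains` rows above describe query rewriting with `site:` operators. Conceptually the rewrite looks something like this sketch; `applyDomainFilters` illustrates the described behavior and is not an SDK export, and the server's exact operator syntax may differ:

```javascript
// Append site:/-site: operators to a search query, mirroring how
// includeDomains and excludeDomains are described in SearchOptions.
function applyDomainFilters(query, includeDomains = [], excludeDomains = []) {
  const include = includeDomains.map(d => `site:${d}`).join(' OR ');
  const exclude = excludeDomains.map(d => `-site:${d}`).join(' ');
  return [query, include && `(${include})`, exclude].filter(Boolean).join(' ');
}
```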
SearchResultWeb
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL of the search result |
| title | string | Yes | Title of the search result |
| description | string | Yes | Description/snippet of the search result |
SearchResultItem
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The search query |
| success | boolean | Yes | Whether the search was successful |
| results | object[] | Yes | Parsed search result links |
| error | string | No | Error message if failed |
| loadTimeMs | number | No | Time taken in milliseconds |
| cost | number | No | Cost in USD for this query |
SearchResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| results | object[] | Yes | Array of search results per query |
| total | number | Yes | Total number of queries |
| successful | number | Yes | Number of successful searches |
| failed | number | Yes | Number of failed searches |
| totalTimeMs | number | Yes | Total time in milliseconds |
| timestamp | string | Yes | ISO timestamp |
| cost | number | No | Total cost in USD |
PdfMargin
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| top | string | No | |
| right | string | No | |
| bottom | string | No | |
| left | string | No | |
GeneratePdfOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| html | string | Yes | Complete HTML content to convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins (e.g., { top: "20mm", right: "20mm", bottom: "20mm", left: "20mm" }) |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| skipChartWait | boolean | No | Skip waiting for chart rendering signal. Defaults to false |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
GeneratePdfFromUrlOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string | Yes | URL to navigate to and convert to PDF (required) |
| title | string | No | Title used for the exported filename |
| pageSize | "A4" \| "Letter" \| "Legal" | No | Page size. Defaults to "A4" |
| landscape | boolean | No | Landscape orientation. Defaults to false |
| margin | object | No | Page margins |
| printBackground | boolean | No | Print background graphics and colors. Defaults to true |
| timeoutMs | number | No | Timeout in milliseconds (5000-120000). Defaults to 30000 |
PdfResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| success | boolean | Yes | Whether PDF generation succeeded |
| url | string | No | Public URL of the generated PDF |
| filename | string | No | Generated filename |
| blobName | string | No | Blob storage path |
| error | string | No | Error message on failure |
| durationMs | number | Yes | Total time taken in milliseconds |
ExecuteCodeOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| code | string | Yes | Code to execute (required) |
| language | "python" | No | Language runtime. Defaults to "python" |
ExecuteCodeResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| stdout | string | Yes | Standard output from the executed code |
| stderr | string | Yes | Standard error from the executed code |
| exitCode | number | Yes | Process exit code (0 = success, 124 = timeout) |
| executionTimeMs | number | Yes | Execution time in milliseconds |
| timedOut | boolean | Yes | Whether execution was killed due to timeout |
| memoryUsageMb | number | No | Peak memory usage in megabytes |
| error | string | No | Error message if execution infrastructure failed |
| cost | number | No | Cost in USD for this execution |
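Given the exit-code semantics above (0 = success, 124 = timeout), a caller can classify a response with a small helper; `summarizeExecution` is an illustrative name, not part of the SDK:

```javascript
// Map an ExecuteCodeResponse-shaped object to a short outcome string
// using the documented semantics: 0 = success, 124 = timeout.
function summarizeExecution(result) {
  if (result.error) return `infrastructure error: ${result.error}`;
  if (result.timedOut || result.exitCode === 124) return 'timed out';
  if (result.exitCode === 0) return 'ok';
  return `failed with exit code ${result.exitCode}`;
}
```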
UpcrawlErrorResponse
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| error | object | Yes | |
| statusCode | number | No | |
CreateBrowserSessionOptions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| width | number | No | Browser viewport width (800-3840). Defaults to 1280 |
| height | number | No | Browser viewport height (600-2160). Defaults to 720 |
| headless | boolean | No | Run browser in headless mode. Defaults to true |
BrowserSession
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| sessionId | string | Yes | Unique session identifier |
| wsEndpoint | string | Yes | WebSocket URL for connecting with Playwright/Puppeteer |
| vncUrl | string \| null | Yes | VNC URL for viewing the browser (if available) |
| affinityCookie | string | No | Affinity cookie for sticky session routing (format: SCRAPER_AFFINITY=xxx) - extracted from response headers |
| createdAt | Date | Yes | Session creation timestamp |
| width | number | Yes | Browser viewport width |
| height | number | Yes | Browser viewport height |
License
MIT
