# Deepcrawl SDK
TypeScript SDK for the Deepcrawl API: web scraping and crawling with comprehensive error handling.
## ⚡ Why Deepcrawl SDK?
- 🏗️ oRPC-Powered: Built on oRPC framework for type-safe RPC
- 🔒 Type-Safe: End-to-end TypeScript with error handling
- 🖥️ Server-Side Only: Designed for Node.js, Cloudflare Workers, and Next.js Server Actions
- 🪶 Lightweight: Minimal bundle size with tree-shaking support
- 🛡️ Error Handling: Comprehensive, typed errors with context
- 🔄 Retry Logic: Built-in exponential backoff for transient failures
- ⚡ Connection Pooling: Automatic HTTP connection reuse (Node.js)
## 📦 Installation
```bash
npm install deepcrawl
# or
yarn add deepcrawl
# or
pnpm add deepcrawl
```
Zod v4 ships with the SDK as a runtime dependency and is mirrored as a peer dependency. If your app already provides Zod ≥4.1, your package manager will dedupe it; otherwise, the bundled copy means no extra install step.
## 🚀 Quick Start
```ts
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY
});

const result = await deepcrawl.readUrl('https://example.com');
console.log(result.markdown);
```
## 📦 Package Exports
The SDK uses dedicated export paths for better tree-shaking and organization:
### Main Export (SDK Client)
```ts
import { DeepcrawlApp } from 'deepcrawl';
```
### Types Export
```ts
import type {
  // Configuration
  DeepcrawlConfig,
  // API Types
  ReadUrlOptions,
  ReadUrlResponse,
  GetMarkdownOptions,
  GetMarkdownResponse,
  ExtractLinksOptions,
  ExtractLinksResponse,
  GetLinksOptions,
  GetLinksResponse,
  // Activity Logs
  ActivityLogEntry,
  ListLogsOptions,
  ListLogsResponse,
  GetOneLogOptions,
  // Metadata & Metrics
  Metadata,
  MetricsOptions,
  Metrics,
  // Links
  LinksTree,
  LinkItem,
  SocialMediaLink,
  // Errors
  DeepcrawlError,
  DeepcrawlReadError,
  DeepcrawlLinksError,
  DeepcrawlRateLimitError,
  DeepcrawlAuthError,
  DeepcrawlValidationError,
  DeepcrawlNotFoundError,
  DeepcrawlServerError,
  DeepcrawlNetworkError,
} from 'deepcrawl/types';
```
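These exports are types only (erased at compile time) and can annotate your own values; for example, using fields that appear in the API examples below:
```ts
import type { ReadUrlOptions } from 'deepcrawl/types';

// Checked against the SDK's option types at compile time
const options: ReadUrlOptions = {
  metadata: true,
  markdown: true,
};
```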
### Schemas Export
```ts
import {
  z,
  // Request Schemas
  ReadUrlOptionsSchema,
  GetMarkdownOptionsSchema,
  ExtractLinksOptionsSchema,
  GetLinksOptionsSchema,
  ListLogsOptionsSchema,
  GetOneLogOptionsSchema,
  // Response Schemas
  ReadUrlResponseSchema,
  GetMarkdownResponseSchema,
  ExtractLinksResponseSchema,
  GetLinksResponseSchema,
  ListLogsResponseSchema,
  // Metadata & Metrics
  MetadataSchema,
  MetricsOptionsSchema,
  MetricsSchema,
  // Links
  LinksTreeSchema,
  // Services
  CacheOptionsSchema
} from 'deepcrawl/schemas';
```
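Because these are plain Zod schemas, they can also validate untrusted input at runtime; a small sketch using standard Zod parsing:
```ts
import { ReadUrlOptionsSchema } from 'deepcrawl/schemas';

// Hypothetical untrusted input, e.g. parsed from a request body
const rawInput: unknown = { metadata: true, markdown: true };

const parsed = ReadUrlOptionsSchema.safeParse(rawInput);
if (parsed.success) {
  console.log(parsed.data);
} else {
  console.error(parsed.error.issues);
}
```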
Importing `z` from `deepcrawl/zod/v4` (or from `deepcrawl/schemas`) reuses the SDK's Zod runtime, so schema composition works even if your app already has its own Zod installation.
### Zod Helper
```ts
import { z } from 'deepcrawl/zod/v4';
import { ReadUrlOptionsSchema } from 'deepcrawl/schemas';

const CustomSchema = ReadUrlOptionsSchema.extend({
  customFlag: z.boolean().default(false),
});
```
Use this helper when you compose Zod schemas with the SDK's public schemas or utilities; it avoids instance-mismatch issues in projects that install multiple copies of Zod.
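The extended schema from the snippet above behaves like any other Zod schema, so parsing works as usual:
```ts
const parsed = CustomSchema.safeParse({ customFlag: true });
if (parsed.success) {
  console.log(parsed.data.customFlag); // true
} else {
  console.log(parsed.error.issues); // e.g. fields required by the base schema
}
```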
### Utilities Export
```ts
import {
  // Zod schema helper
  OptionalBoolWithDefault,
  // Pagination normalization
  normalizeListLogsPagination
} from 'deepcrawl/types/utils';

// Example: Create optional boolean schema with default
const schema = OptionalBoolWithDefault(true);

// Example: Normalize pagination input
const normalized = normalizeListLogsPagination({ limit: 150, offset: -5 });
// Returns: { limit: 100, offset: 0 } (clamped to valid ranges)
```
## 📖 API Methods
### `readUrl(url, options?)`
Extract clean content and metadata from any URL.
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.readUrl('https://example.com', {
  metadata: true,
  markdown: true,
  cleanedHtml: true,
  metricsOptions: { enabled: true }
});

console.log(result.markdown);
console.log(result.metadata?.title);
console.log(result.metrics?.readableDuration);
```
### `getMarkdown(url, options?)`
Simplified method to get just markdown content.
```ts
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.getMarkdown('https://example.com', {
  metricsOptions: { enabled: true }
});

console.log(result.markdown);
```
### `extractLinks(url, options?)`
Extract all links from a page with powerful filtering options.
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { ExtractLinksOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.extractLinks('https://example.com', {
  includeInternal: true,
  includeExternal: false,
  includeEmails: false,
  includePhoneNumbers: false,
  includeSocialMedia: false,
  metricsOptions: { enabled: true }
});

console.log(result.tree.internal);
console.log(result.tree.socialMedia);
```
### `listLogs(options?)`
Retrieve activity logs with paginated results and filtering.
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { ListLogsOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const result = await deepcrawl.listLogs({
  limit: 50,
  offset: 0,
  path: 'read-getMarkdown',
  success: true,
  startDate: '2025-01-01T00:00:00Z',
  endDate: '2025-12-31T23:59:59Z',
  orderBy: 'requestTimestamp',
  orderDir: 'desc'
});

console.log(result.logs);
console.log(result.meta.hasMore);
```
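Because `meta.hasMore` indicates whether further pages exist, you can page through the full history with an offset loop. A sketch, reusing the client from above:
```ts
import type { ActivityLogEntry } from 'deepcrawl/types';

async function fetchAllLogs(): Promise<ActivityLogEntry[]> {
  const all: ActivityLogEntry[] = [];
  const limit = 100; // maximum page size (see the clamping rules above)
  let offset = 0;
  while (true) {
    const page = await deepcrawl.listLogs({ limit, offset });
    all.push(...page.logs);
    if (!page.meta.hasMore) break;
    offset += limit;
  }
  return all;
}
```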
### `getOneLog(options)`
Get a single activity log entry by ID.
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { GetOneLogOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

const log = await deepcrawl.getOneLog({ id: 'request-id-123' });

console.log(log.path);
console.log(log.response);
```
## 🌟 Real-World Usage Examples
### E-commerce Product Monitoring
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlOptions } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

async function monitorProduct(productUrl: string) {
  try {
    const result = await deepcrawl.readUrl(productUrl, {
      metadata: true,
      cleanedHtml: true
    });
    return {
      title: result.metadata?.title,
      lastChecked: new Date().toISOString()
    };
  } catch (error) {
    if (error.isRateLimit?.()) {
      console.log(`Rate limited. Retry after ${error.retryAfter}s`);
      await new Promise(r => setTimeout(r, error.retryAfter * 1000));
      return monitorProduct(productUrl);
    }
    throw error;
  }
}
```
### Content Aggregation Pipeline
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlResponse } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

async function aggregateArticles(urls: string[]) {
  const results = await Promise.allSettled(
    urls.map(url => deepcrawl.readUrl(url, {
      metadata: true,
      markdown: true
    }))
  );
  return results.map((result, index) => ({
    url: urls[index],
    success: result.status === 'fulfilled',
    data: result.status === 'fulfilled' ? result.value : null,
    error: result.status === 'rejected' ? result.reason.message : null
  }));
}
```
### Next.js Server Actions
```ts
// app/actions/scrape.ts
'use server';

import { DeepcrawlApp } from 'deepcrawl';
import { headers } from 'next/headers';
import { revalidatePath } from 'next/cache';

export async function scrapeUrlAction(url: string) {
  const deepcrawl = new DeepcrawlApp({
    apiKey: process.env.DEEPCRAWL_API_KEY,
    headers: await headers(),
  });
  try {
    const result = await deepcrawl.readUrl(url, {
      metadata: true,
      markdown: true,
    });
    revalidatePath('/dashboard');
    return {
      success: true,
      data: {
        title: result.metadata?.title,
        content: result.markdown,
      }
    };
  } catch (error) {
    return {
      success: false,
      error: {
        message: error.message,
        retryable: error.isRateLimit?.() || error.isNetwork?.(),
      }
    };
  }
}
```
### React Hook with Error Handling
Note: this hook creates the SDK client in the browser, which exposes your API key; for client-facing apps, prefer the Server Actions pattern covered under Security Best Practices below.
```tsx
import { useState, useCallback, useMemo } from 'react';
import { DeepcrawlApp } from 'deepcrawl';
import type { ReadUrlResponse } from 'deepcrawl/types';

export function useScraping(apiKey: string) {
  const [data, setData] = useState<ReadUrlResponse | null>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  // Memoize the client so it isn't recreated on every render
  const deepcrawl = useMemo(() => new DeepcrawlApp({ apiKey }), [apiKey]);

  const scrape = useCallback(async (url: string) => {
    setLoading(true);
    setError(null);
    try {
      const result = await deepcrawl.readUrl(url, { metadata: true });
      setData(result);
    } catch (err) {
      setError(err.message);
    } finally {
      setLoading(false);
    }
  }, [deepcrawl]);

  return { data, loading, error, scrape };
}
```
### Activity Logging with Server Actions
```ts
// app/actions/logs.ts
'use server';

import { DeepcrawlApp } from 'deepcrawl';
import type { ListLogsResponse } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });

export async function getActivityLogs() {
  try {
    const logs = await deepcrawl.listLogs({
      limit: 50,
      offset: 0
    });
    return { success: true, data: logs };
  } catch (error) {
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Failed to fetch logs'
    };
  }
}
```
## 🛡️ Error Handling
### Error Classes
```ts
import type {
  DeepcrawlError,
  DeepcrawlReadError,
  DeepcrawlLinksError,
  DeepcrawlRateLimitError,
  DeepcrawlAuthError,
  DeepcrawlValidationError,
  DeepcrawlNotFoundError,
  DeepcrawlServerError,
  DeepcrawlNetworkError,
} from 'deepcrawl/types';
```
### Try/Catch Pattern
```ts
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });
const url = 'https://example.com';

try {
  const result = await deepcrawl.readUrl(url);
} catch (error) {
  if (error.isRateLimit?.()) {
    console.log(`Retry after ${error.retryAfter}s`);
  } else if (error.isRead?.()) {
    console.log(`Failed to read: ${error.message}`);
  }
}
```
### Instance Type Checking
```ts
import { DeepcrawlApp } from 'deepcrawl';

const deepcrawl = new DeepcrawlApp({ apiKey: process.env.DEEPCRAWL_API_KEY });
const url = 'https://example.com';

try {
  const result = await deepcrawl.readUrl(url);
} catch (error) {
  // Check the error type using instance methods
  if (error.isAuth?.()) {
    console.log('Authentication failed');
  } else if (error.isValidation?.()) {
    console.log('Invalid request parameters');
  }
}
```
### Error Properties
All errors include:
- `code: string` - oRPC error code
- `status: number` - HTTP status
- `message: string` - User-friendly error message
- `data: any` - Raw error data from the API

Rate limit errors also include:
- `retryAfter: number` - Seconds to wait
- `operation: string` - Which operation was rate limited

Read/Links errors also include:
- `targetUrl: string` - The URL that failed
- `success: false` - Always false for errors
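A quick illustration of these properties, using the instance-method checks shown earlier:
```ts
try {
  await deepcrawl.readUrl('https://example.com');
} catch (error) {
  if (error.isRateLimit?.()) {
    // Rate limit errors carry retryAfter and operation
    console.log(`${error.operation} rate limited, retry in ${error.retryAfter}s`);
  } else if (error.isRead?.()) {
    // Read errors carry targetUrl, plus the common code/status/message
    console.log(`Read failed for ${error.targetUrl} (${error.code}, HTTP ${error.status})`);
  }
}
```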
## 🔧 Configuration
```ts
import { DeepcrawlApp } from 'deepcrawl';
import type { DeepcrawlConfig } from 'deepcrawl/types';

const deepcrawl = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY,
  baseUrl: 'https://api.deepcrawl.dev',
  headers: {
    'User-Agent': 'MyApp/1.0'
  },
  fetch: customFetch,
  fetchOptions: {
    timeout: 30000
  }
});
```
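As one example, the `customFetch` above could be a thin logging wrapper (a sketch; anything matching the standard `fetch` signature should work):
```ts
// Hypothetical wrapper passed via the `fetch` config option
const customFetch: typeof fetch = async (input, init) => {
  const url =
    typeof input === 'string' ? input : input instanceof URL ? input.href : input.url;
  console.log('[deepcrawl] request:', url);
  return fetch(input, init);
};
```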
### Connection Pooling (Node.js)
Automatic HTTP connection pooling in Node.js:
```ts
// Automatic configuration
{
  keepAlive: true,
  maxSockets: 10,
  maxFreeSockets: 5,
  timeout: 60000,
  keepAliveMsecs: 30000
}
```
Benefits:
- ⚡ Faster for concurrent requests (see the sketch below)
- 🔄 Connection reuse reduces handshake overhead
- 🎯 Auto-cleanup of idle connections
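Pooling pays off when requests run concurrently rather than sequentially. A minimal sketch, reusing a configured client:
```ts
const urls = [
  'https://example.com/a',
  'https://example.com/b',
  'https://example.com/c',
];

// Concurrent reads reuse pooled connections in Node.js
const pages = await Promise.all(urls.map((url) => deepcrawl.readUrl(url)));
console.log(pages.map((page) => page.markdown?.length));
```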
## 🔒 Security Best Practices
### Next.js Server Actions (Recommended)
```ts
// ✅ SECURE: lib/deepcrawl.ts
// ('use server' files may only export async functions, so the shared
// client lives in a server-only module instead.)
import 'server-only';
import { DeepcrawlApp } from 'deepcrawl';

export const deepcrawlClient = new DeepcrawlApp({
  apiKey: process.env.DEEPCRAWL_API_KEY
});
```
```ts
// ✅ SECURE: app/actions/scrape.ts
'use server';

import { deepcrawlClient } from '@/lib/deepcrawl';

export async function scrapeAction(url: string) {
  return deepcrawlClient.readUrl(url);
}
```
```tsx
// ✅ SECURE: Client component
'use client';

import { scrapeAction } from '@/app/actions/scrape';

export function ScrapeButton() {
  const handleClick = async () => {
    const result = await scrapeAction('https://example.com');
    console.log(result);
  };
  return <button onClick={handleClick}>Scrape</button>;
}
```
### What NOT to Do
```tsx
// ❌ INSECURE: Direct SDK usage in client components
'use client';

import { DeepcrawlApp } from 'deepcrawl';

export function BadComponent() {
  const deepcrawl = new DeepcrawlApp({
    apiKey: process.env.DEEPCRAWL_API_KEY // ❌ Exposes API key!
  });
}
```
## 🌍 Environment Support
⚠️ Server-Side Only: The Deepcrawl SDK is designed for server-side use:
- ✅ Node.js (18+) with connection pooling
- ✅ Cloudflare Workers
- ✅ Vercel Edge Runtime
- ✅ Next.js Server Actions (recommended)
- ✅ Deno, Bun, and other modern runtimes
- ❌ Browser environments (use Server Actions instead)
## 📄 License
MIT - see LICENSE for details.
## 🤝 Support
Built with ❤️ by @felixLu.
