olostep
v1.0.0
Node.js SDK for the Olostep web data platform.
Olostep Node SDK
This package is the official Node.js SDK for the Olostep web data platform.
Getting started
npm install olostep
import Olostep from 'olostep';
const client = new Olostep({apiKey: process.env.OLOSTEP_API_KEY});
// Minimal scrape example
const result = await client.scrapes.create('https://example.com');
console.log(result.id, result.html_content);
Usage
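The Getting started client reads its key from process.env.OLOSTEP_API_KEY. If that variable is unset, the client is constructed with undefined and the problem only shows up later. A small guard makes the failure explicit at startup. requireApiKey is a hypothetical helper, not part of the olostep SDK; it takes any environment-like record so it can be tested without touching the real environment:

```typescript
// Environment-like record, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Hypothetical helper (not an SDK export): return the trimmed API key,
// or throw a descriptive error if it is missing or blank.
function requireApiKey(env: Env, name = "OLOSTEP_API_KEY"): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set; add it to your environment or .env file`);
  }
  return value;
}
```

In application code this would be used as `new Olostep({apiKey: requireApiKey(process.env)})`.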
Scraping
Scrape a single URL with various options:
import Olostep, {Format} from 'olostep';
const client = new Olostep({apiKey: 'your_api_key'});
// Simple scrape
const simpleScrape = await client.scrapes.create('https://example.com');
// With multiple formats
const scrape = await client.scrapes.create({
url: 'https://example.com',
formats: [Format.HTML, Format.MARKDOWN, Format.TEXT],
waitBeforeScraping: 1000,
removeImages: true
});
// Access the content
console.log(scrape.html_content);
console.log(scrape.markdown_content);
// Get scrape by ID
const fetched = await client.scrapes.get(scrape.id);
Batch Processing
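A batch accepts either plain URL strings, in which case custom IDs are auto-generated, or {url, customId} objects, as the example below shows. When you already have a URL list and want stable, readable IDs, you can build the objects yourself. toBatchItems is a hypothetical local helper, not an SDK export; the BatchItem interface mirrors the object shape in the example and is an assumption, not the SDK's own type. It drops blank and duplicate URLs and numbers the rest site-1, site-2, and so on:

```typescript
// Local mirror of the {url, customId} entries in the batch example (assumed shape).
interface BatchItem {
  url: string;
  customId: string;
}

// Hypothetical helper: de-duplicate URLs (first occurrence wins) and assign
// sequential custom IDs such as "site-1", "site-2", ...
function toBatchItems(urls: readonly string[], prefix = "site"): BatchItem[] {
  const seen = new Set<string>();
  const items: BatchItem[] = [];
  for (const raw of urls) {
    const url = raw.trim();
    if (url === "" || seen.has(url)) continue; // skip blanks and repeats
    seen.add(url);
    items.push({ url, customId: `${prefix}-${items.length + 1}` });
  }
  return items;
}
```

The result can be passed straight to `client.batches.create(...)`. Numbering after de-duplication keeps the IDs gap-free.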
Process multiple URLs in a single batch:
// Using URL strings (custom IDs auto-generated)
const urlBatch = await client.batches.create([
'https://example.com',
'https://example.org',
'https://example.net'
]);
// Or with explicit custom IDs
const batch = await client.batches.create([
{url: 'https://example.com', customId: 'site-1'},
{url: 'https://example.org', customId: 'site-2'}
]);
console.log(`Batch ${batch.id} created with ${batch.total_urls} URLs`);
// Wait for completion
await batch.waitTillDone({
checkEveryNSecs: 5,
timeoutSeconds: 120
});
// Get batch info
const info = await batch.info();
console.log(info);
// Stream individual results
for await (const item of batch.items()) {
console.log(item.custom_id);
}
Crawling
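A crawl can be scoped with includeUrls and excludeUrls patterns such as '*/blog/*' and '*/admin/*', as in the example below. This README doesn't specify the server-side matching rules. Assuming `*` matches any run of characters and exclusions take precedence over inclusions, a local filter using the same patterns might look like this. globToRegExp and isUrlAllowed are hypothetical helpers, useful for previewing which URLs from your own list a given pattern set would keep:

```typescript
// Convert a simple glob ("*" = any run of characters) into an anchored RegExp.
// Every other regex metacharacter is escaped so it matches literally.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumed semantics (not confirmed by the SDK docs): a URL is allowed if it
// matches no exclude pattern and matches at least one include pattern
// (or no include patterns were given).
function isUrlAllowed(
  url: string,
  includeUrls: readonly string[] = [],
  excludeUrls: readonly string[] = [],
): boolean {
  const matchesAny = (patterns: readonly string[]) =>
    patterns.some((pattern) => globToRegExp(pattern).test(url));
  if (matchesAny(excludeUrls)) return false;
  return includeUrls.length === 0 || matchesAny(includeUrls);
}
```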
Crawl an entire website:
const crawl = await client.crawls.create({
url: 'https://example.com',
maxPages: 100,
maxDepth: 3,
includeUrls: ['*/blog/*'],
excludeUrls: ['*/admin/*']
});
console.log(`Crawl ${crawl.id} started`);
// Wait for completion
await crawl.waitTillDone({
checkEveryNSecs: 10,
timeoutSeconds: 300
});
// Get crawl info
const info = await crawl.info();
console.log(`Crawled ${info.pages_crawled} pages`);
// Stream crawled pages
for await (const page of crawl.pages()) {
console.log(page.url, page.status_code);
}
Site Mapping
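With includeSubdomain: true, as in the example below, a map can presumably return URLs from subdomains (for example blog.example.com) alongside the main host. A hypothetical post-processing step, not part of the SDK, can bucket the collected URLs by hostname. It assumes map.urls() yields URL strings, which you would first collect into an array with `for await`:

```typescript
// Hypothetical helper: group URLs by hostname. Strings that are not absolute
// URLs are skipped instead of throwing, since map output is treated here as
// untrusted input.
function groupByHost(urls: Iterable<string>): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const raw of urls) {
    let host: string;
    try {
      host = new URL(raw).hostname;
    } catch {
      continue; // not parseable as an absolute URL
    }
    const bucket = groups.get(host) ?? [];
    bucket.push(raw);
    groups.set(host, bucket);
  }
  return groups;
}
```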
Generate a sitemap of URLs from a website:
const map = await client.maps.create({
url: 'https://example.com',
topN: 100,
includeSubdomain: true,
searchQuery: 'blog posts'
});
console.log(`Map ${map.id} created`);
// Stream URLs
for await (const url of map.urls()) {
console.log(url);
}
// Get map info
const info = await map.info();
Content Retrieval
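client.retrieve accepts either a single Format or an array of them; both forms appear in the example below. If you wrap retrieve in your own code, normalizing that "one or many" argument once keeps the rest of the wrapper simple. toArray is a hypothetical generic helper; it doesn't rely on the SDK's actual Format values:

```typescript
// Hypothetical helper: normalize a single value or an array of values into a
// de-duplicated array, preserving first-seen order.
function toArray<T>(value: T | T[]): T[] {
  const list: T[] = Array.isArray(value) ? value : [value];
  return Array.from(new Set(list));
}
```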
Retrieve previously scraped content:
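import {Format} from 'olostep';
// retrieveId identifies previously scraped content (examples/retrieve.ts takes it as <retrieve_id>)
// Get content in a specific format
const content = await client.retrieve(retrieveId, Format.MARKDOWN);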
console.log(content.markdown_content);
// Multiple formats
const contents = await client.retrieve(retrieveId, [
Format.HTML,
Format.MARKDOWN
]);
Advanced Options
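The options in this section are extra fields on the object passed to client.scrapes.create. The actions array, for instance, holds objects whose remaining fields depend on type. Modeling the four shapes used in the Custom Actions example as a discriminated union lets the compiler reject, say, a click without a selector. These local types are an assumption drawn from that example, not the SDK's exported action types, and describeAction is a hypothetical helper:

```typescript
// Local model of the action objects shown in the Custom Actions example.
type Action =
  | { type: "wait"; milliseconds: number }
  | { type: "click"; selector: string }
  | { type: "scroll"; distance: number }
  | { type: "fill_input"; selector: string; value: string };

// Render an action as a readable log line. The switch is exhaustive: adding a
// member to Action without handling it here fails to compile at the `never` line.
function describeAction(action: Action): string {
  switch (action.type) {
    case "wait":
      return `wait ${action.milliseconds} ms`;
    case "click":
      return `click ${action.selector}`;
    case "scroll":
      return `scroll by ${action.distance}`;
    case "fill_input":
      return `type "${action.value}" into ${action.selector}`;
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```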
Custom Actions
Perform browser actions before scraping:
const scrape = await client.scrapes.create({
url: 'https://example.com',
actions: [
{type: 'wait', milliseconds: 2000},
{type: 'click', selector: '#load-more'},
{type: 'scroll', distance: 1000},
{type: 'fill_input', selector: '#search', value: 'query'}
]
});
Geographic Location
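country takes either a Country enum value (US, DE, FR, GB, SG) or a country code string such as 'jp'. This README doesn't say which code standard or letter case the API expects. Assuming two-letter ISO 3166-1 alpha-2 codes compared case-insensitively, the hypothetical local check below catches typos like 'japan' before a request is sent:

```typescript
// Hypothetical helper (not an SDK export): trim, upper-case, and require exactly
// two ASCII letters, the shape of an ISO 3166-1 alpha-2 code. It only checks
// that the code is well-formed, not that it is actually assigned to a country.
function normalizeCountryCode(input: string): string {
  const code = input.trim().toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) {
    throw new Error(`Expected a two-letter country code, got "${input}"`);
  }
  return code;
}
```

The example below passes 'jp' in lowercase. If you would rather forward the caller's spelling unchanged, call the function only for validation and discard its return value.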
Scrape from different countries using predefined country codes or any valid country code string:
import Olostep, {Country} from 'olostep';
const client = new Olostep({apiKey: 'your_api_key'});
// Using predefined enum values (US, DE, FR, GB, SG)
const scrape = await client.scrapes.create({
url: 'https://example.com',
country: Country.DE // Germany
});
// Or use any valid country code as a string
const scrape2 = await client.scrapes.create({
url: 'https://example.com',
country: 'jp' // Japan
});
LLM Extraction
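The llmExtract.schema in the example below maps field names to type names ('string', 'number'). LLM output can omit fields or return the wrong type, so it may be worth checking the extracted object before you use it. Where the extracted data appears on the scrape result isn't shown in this README, so schemaErrors is a hypothetical helper that takes an already-parsed object. It only understands the primitive type names used in the example:

```typescript
// Flat schema in the style of the llmExtract example: field name -> type name.
type SimpleSchema = Record<string, "string" | "number" | "boolean">;

// Hypothetical helper: list fields that are missing or have the wrong
// primitive type. An empty array means the object conforms to the schema.
function schemaErrors(data: unknown, schema: SimpleSchema): string[] {
  if (typeof data !== "object" || data === null) return ["data is not an object"];
  const record = data as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!(field in record)) {
      errors.push(`missing ${field}`);
      continue;
    }
    const actual = typeof record[field];
    if (actual !== expected) errors.push(`${field}: expected ${expected}, got ${actual}`);
  }
  return errors;
}
```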
Extract structured data using LLMs:
const scrape = await client.scrapes.create({
url: 'https://example.com',
llmExtract: {
schema: {
title: 'string',
price: 'number',
description: 'string'
},
// Optionally provide a prompt to guide extraction
prompt: 'Extract product information from this page'
}
});
Client Configuration
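The retry block in the configuration below takes maxRetries and initialDelayMs. This README doesn't document how the SDK spaces its retries. Assuming a common exponential policy in which each delay doubles from initialDelayMs, with an optional cap and no jitter, the wait schedule can be worked out ahead of time. backoffDelays and totalBackoffMs are hypothetical helpers for reasoning about worst-case waiting, not SDK functions:

```typescript
// Hypothetical helper: the delay before retry n (0-based) is
// initialDelayMs * 2^n, capped at maxDelayMs. No jitter is applied.
function backoffDelays(maxRetries: number, initialDelayMs: number, maxDelayMs = Infinity): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(initialDelayMs * 2 ** attempt, maxDelayMs));
  }
  return delays;
}

// Total time spent waiting between attempts; excludes time spent in the requests themselves.
function totalBackoffMs(maxRetries: number, initialDelayMs: number, maxDelayMs = Infinity): number {
  return backoffDelays(maxRetries, initialDelayMs, maxDelayMs).reduce((sum, delay) => sum + delay, 0);
}
```

Under that assumption, maxRetries: 3 with initialDelayMs: 1000 gives delays of 1000, 2000, and 4000 ms, or 7 seconds of waiting if every retry is used.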
import Olostep from 'olostep';
const client = new Olostep({
apiKey: 'your_api_key',
apiBaseUrl: 'https://api.olostep.com/v1', // optional
timeoutMs: 150000, // 150 seconds (optional)
retry: {
maxRetries: 3,
initialDelayMs: 1000
},
userAgent: 'MyApp/1.0' // optional
});
Feature highlights
- Async-first client with full TypeScript support.
- Type-safe inputs using TypeScript enums and interfaces (Formats, Countries, Actions, etc.).
- Rich resource namespaces with both shorthand calls (client.scrapes.create()) and explicit methods (client.scrapes.get()).
- Shared transport layer with retries, timeouts, and JSON decoding.
- Comprehensive error hierarchy aligned with the Python SDK.
Project structure
olostep/
├─ src/
│ ├─ client.ts # Client + facade wiring
│ ├─ config.ts # Option resolution & defaults
│ ├─ errors.ts # Exception hierarchy
│ ├─ http/transport.ts # Fetch-based HTTP transport with retries
│ ├─ resources/ # Namespaces (scrape, batch, crawl, map, retrieve)
│ └─ types.ts # Shared enums and DTOs
├─ package.json # NPM metadata + scripts
├─ tsconfig*.json # TypeScript build configs
└─ README.md # You are here
Scripts
- npm run build – emit ESM to dist/.
- npm run lint – lint the TypeScript sources with ESLint.
- npm run check:types – type-check without emitting files.
- npm run clean – remove the build output.
Examples
Sample scripts live in examples/. Copy .env.example to .env and set your OLOSTEP_API_KEY:
cp .env.example .env
# Edit .env and add your API key
Then run the examples:
npx tsx examples/scrape.ts
npx tsx examples/batch.ts
npx tsx examples/crawl.ts
npx tsx examples/map.ts
npx tsx examples/retrieve.ts <retrieve_id>
They exercise each namespace using the current SDK surface and are a quick way to verify changes manually.
