@olostep/inngest
v1.0.0
Official Olostep integration for Inngest - Build reliable, fault-tolerant web scraping workflows with durable functions.
Olostep is a web search, scraping, and crawling API that extracts structured web data from any website in real time. Perfect for automating research workflows, monitoring competitors, and collecting data at scale.
Inngest is a platform for building reliable background jobs and workflows with automatic retries, scheduling, and observability.
Installation
Quick Start
Operations
Credentials
Examples
Specialized Parsers
Resources
Installation
npm install @olostep/inngest inngest
Or with yarn/pnpm:
yarn add @olostep/inngest inngest
pnpm add @olostep/inngest inngest
Quick Start
import { Inngest } from 'inngest';
import { createOlostepClient } from '@olostep/inngest';
// Initialize clients
const inngest = new Inngest({ id: 'my-app' });
const olostep = createOlostepClient({
apiKey: process.env.OLOSTEP_API_KEY!,
});
// Create a durable web scraping workflow
export const scrapeWorkflow = inngest.createFunction(
{ id: 'scrape-website' },
{ event: 'scrape/start' },
async ({ event, step }) => {
// Each step.run is automatically retried on failure
const result = await step.run('scrape-page', () =>
olostep.scrape({
url: event.data.url,
formats: ['markdown'],
})
);
return {
url: result.url,
content: result.markdownContent,
metadata: result.pageMetadata,
};
}
);
Operations
Scrape Website
Extract content from a single URL. Supports multiple formats and JavaScript rendering.
Use Cases:
- Monitor specific pages for changes
- Extract product information from e-commerce sites
- Gather data from news articles or blog posts
- Pull content into content aggregators
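For the "monitor specific pages for changes" use case, one approach is to store a fingerprint of each scrape and compare it on the next run. The sketch below assumes the page text is the `markdownContent` field shown in the Quick Start; `normalizeContent`, `contentFingerprint`, and `hasChanged` are hypothetical helpers, not part of `@olostep/inngest`, and persisting the previous fingerprint is left to your app.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: collapse runs of whitespace so trivial formatting
// differences between two scrapes are not reported as changes.
export function normalizeContent(content: string): string {
  return content.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: SHA-256 hex fingerprint of the normalized content.
export function contentFingerprint(content: string): string {
  return createHash('sha256').update(normalizeContent(content)).digest('hex');
}

// True on the first run (no stored fingerprint) or when the content differs.
export function hasChanged(previous: string | undefined, content: string): boolean {
  return previous === undefined || previous !== contentFingerprint(content);
}
```

Inside a scheduled function you might compute `contentFingerprint(result.markdownContent)` in a step and compare it with the value saved from the previous run; each Inngest run is independent, so the stored value has to live in your own database or key-value store.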
Parameters:
- url (required): The URL of the website you want to scrape
- formats: Output formats (html, markdown, json, text); default: ['markdown']
- country: Country code (e.g., US, GB, CA) for location-specific scraping
- waitBeforeScraping: Wait time in milliseconds before scraping (0-10000)
- parser: Parser ID for specialized extraction (e.g., @olostep/amazon-product)
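Validating options locally before calling the API can turn obvious input mistakes into clear errors instead of failed steps. This is a sketch only: `ScrapeOptions` is a hypothetical local type mirroring the parameters listed above (the package's real types may differ), and the two-letter uppercase country rule and non-empty `formats` rule are assumptions based on the examples, not documented constraints.

```typescript
// Hypothetical local type mirroring the documented scrape parameters.
type ScrapeFormat = 'html' | 'markdown' | 'json' | 'text';

interface ScrapeOptions {
  url: string;
  formats?: ScrapeFormat[];
  country?: string;
  waitBeforeScraping?: number;
  parser?: string;
}

// Returns a list of problems; an empty array means the options look valid.
function validateScrapeOptions(opts: ScrapeOptions): string[] {
  const problems: string[] = [];
  try {
    const parsed = new URL(opts.url);
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      problems.push('url must use http or https');
    }
  } catch {
    problems.push('url is not a valid absolute URL');
  }
  if (opts.formats !== undefined && opts.formats.length === 0) {
    problems.push('formats must not be empty');
  }
  const wait = opts.waitBeforeScraping;
  if (wait !== undefined && (!Number.isFinite(wait) || wait < 0 || wait > 10000)) {
    problems.push('waitBeforeScraping must be between 0 and 10000 ms');
  }
  // Assumption: codes look like the documented examples (US, GB, CA).
  if (opts.country !== undefined && !/^[A-Z]{2}$/.test(opts.country)) {
    problems.push('country should be a two-letter code such as US, GB, or CA');
  }
  return problems;
}
```

If the returned array is non-empty, the function could fail early (for example by throwing Inngest's `NonRetriableError`), since retrying a malformed request will not fix it.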
Example:
const result = await step.run('scrape', () =>
olostep.scrape({
url: 'https://example.com',
formats: ['markdown', 'html'],
country: 'US',
})
);
// Returns: { id, url, markdownContent, htmlContent, pageMetadata, ... }
Batch Scrape URLs
Scrape up to 100,000 URLs in parallel. Perfect for large-scale data extraction.
Use Cases:
- Scrape entire product catalogs
- Extract data from multiple search results
- Process lists of URLs from spreadsheets
- Bulk content extraction
Parameters:
- items (required): Array of { url, customId? } objects
- formats: Output formats; default: ['markdown']
- country: Country code for location-specific scraping
- parser: Parser ID for specialized extraction
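URL lists from spreadsheets or earlier steps often contain duplicates and may exceed what one batch should hold. A minimal sketch of preparing `items`, assuming the 100,000 figure above is a per-batch limit; `toBatchChunks` and `BatchItem` are hypothetical names, not exports of `@olostep/inngest`.

```typescript
// Local type matching the documented { url, customId? } item shape.
interface BatchItem {
  url: string;
  customId?: string;
}

// Assumption: the 100,000-URL figure applies to a single batch.
const MAX_BATCH_SIZE = 100_000;

// Deduplicates URLs (keeping the first occurrence), assigns stable custom IDs,
// and splits the items into chunks no larger than maxSize.
function toBatchChunks(urls: string[], idPrefix = 'page', maxSize = MAX_BATCH_SIZE): BatchItem[][] {
  if (!Number.isInteger(maxSize) || maxSize < 1 || maxSize > MAX_BATCH_SIZE) {
    throw new RangeError(`maxSize must be between 1 and ${MAX_BATCH_SIZE}`);
  }
  const unique = Array.from(new Set(urls));
  const items: BatchItem[] = unique.map((url, idx) => ({ url, customId: `${idPrefix}-${idx}` }));
  const chunks: BatchItem[][] = [];
  for (let i = 0; i < items.length; i += maxSize) {
    chunks.push(items.slice(i, i + maxSize));
  }
  return chunks;
}
```

Each chunk could then be passed as `items` to `olostep.batch.create` inside its own `step.run`, using a distinct step ID per chunk (for example `create-batch-0`, `create-batch-1`) so each batch creation is retried independently.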
Example:
// Create batch job
const batch = await step.run('create-batch', () =>
olostep.batch.create({
items: [
{ url: 'https://example1.com', customId: 'page-1' },
{ url: 'https://example2.com', customId: 'page-2' },
],
formats: ['markdown'],
})
);
// Wait for processing
await step.sleep('wait', '2m');
// Get results
const results = await step.run('get-results', () =>
olostep.batch.get(batch.id)
);
Create Crawl
Autonomously discover and scrape entire websites by following links.
Use Cases:
- Crawl and archive entire documentation sites
- Extract all blog posts from a website
- Build knowledge bases from web content
- Monitor website structure changes
Parameters:
- startUrl (required): Starting URL for the crawl
- maxPages: Maximum number of pages to crawl (default: 10)
- followLinks: Whether to follow links (default: true)
- formats: Output formats; default: ['markdown']
- includeUrls: Glob patterns to include (e.g., ['/docs/**'])
- excludeUrls: Glob patterns to exclude (e.g., ['/admin/**'])
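Olostep's exact glob semantics are not spelled out here, so the following is only a local approximation, useful for checking what a pattern such as `'/docs/**'` would cover or for pre-filtering URL paths yourself. It assumes patterns are matched against the URL path, that `**` may cross `/` while `*` may not, and that excludes win over includes; `globToRegExp` and `isPathSelected` are hypothetical helpers.

```typescript
// Converts a simple glob to an anchored RegExp: `**` matches any characters
// (including `/`), `*` matches any characters except `/`, and all other
// regex metacharacters are escaped literally.
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === '*') {
      if (glob.charAt(i + 1) === '*') {
        pattern += '.*';
        i++; // consume the second '*'
      } else {
        pattern += '[^/]*';
      }
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${pattern}$`);
}

// A path is selected when it matches some include pattern (or none are given)
// and matches no exclude pattern.
function isPathSelected(path: string, includeUrls: string[] = [], excludeUrls: string[] = []): boolean {
  const included = includeUrls.length === 0 || includeUrls.some((g) => globToRegExp(g).test(path));
  const excluded = excludeUrls.some((g) => globToRegExp(g).test(path));
  return included && !excluded;
}
```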
Example:
const crawl = await step.run('start-crawl', () =>
olostep.crawl.create({
startUrl: 'https://docs.example.com',
maxPages: 100,
includeUrls: ['/docs/**'],
})
);
Create Map
Extract all URLs from a website for content discovery and site structure analysis.
Use Cases:
- Build sitemaps and site structure diagrams
- Discover all pages before batch scraping
- Find broken or missing pages
- SEO audits and analysis
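To find missing pages or track structure changes, you can compare the `urls` array from two map runs taken at different times. A minimal sketch; `diffUrlSets` is a hypothetical helper. A URL that disappears from a map is only a candidate for investigation, since it may simply no longer be linked rather than broken.

```typescript
interface UrlDiff {
  added: string[];
  removed: string[];
}

// Compares two URL lists (e.g. `urls` from an earlier and a later map run)
// and reports which URLs appeared and which disappeared, deduplicated and sorted.
function diffUrlSets(previous: string[], current: string[]): UrlDiff {
  const before = new Set(previous);
  const after = new Set(current);
  const added = current.filter((u) => !before.has(u));
  const removed = previous.filter((u) => !after.has(u));
  return {
    added: Array.from(new Set(added)).sort(),
    removed: Array.from(new Set(removed)).sort(),
  };
}
```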
Parameters:
- url (required): Website URL to extract links from
- searchQuery: Optional search query to filter URLs
- topN: Limit the number of URLs returned
- includeUrls: Glob patterns to include
- excludeUrls: Glob patterns to exclude
Example:
const siteMap = await step.run('discover-urls', () =>
olostep.map({
url: 'https://example.com',
includeUrls: ['/blog/**'],
topN: 100,
})
);
// Returns: { id, url, totalUrls, urls: string[] }
AI-Powered Answers
Search the web and get AI-powered answers with sources and citations.
Use Cases:
- Enrich data with web-sourced facts
- Ground AI applications on real-world data
- Research tasks with verified outputs
- Competitive intelligence
Parameters:
- task (required): Question or task to answer
- jsonSchema: JSON schema for structured output
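Web-sourced answers can come back with some fields empty, so it may be worth checking the structured `answer` against the object you passed as `jsonSchema` before using it. This sketch assumes `answer` is a flat object keyed like that template, as in the example's return comment; `missingAnswerFields` is a hypothetical helper.

```typescript
// Returns the template keys whose values are missing, null, or empty
// in the structured answer (checked in the template's key order).
function missingAnswerFields(
  template: Record<string, unknown>,
  answer: Record<string, unknown> | null | undefined,
): string[] {
  return Object.keys(template).filter((key) => {
    const value = answer?.[key];
    return value === undefined || value === null || value === '';
  });
}
```

Throwing from inside `step.run` when required fields are missing would make Inngest retry that step, though a retry is not guaranteed to produce a more complete answer.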
Example:
const answer = await step.run('get-answer', () =>
olostep.answer({
task: 'Who is the CEO of Anthropic?',
jsonSchema: {
ceo_name: '',
founded_year: '',
headquarters: '',
},
})
);
// Returns: { id, task, answer: { ceo_name: 'Dario Amodei', ... }, sources }
Web Search
Search the web using Google Search and get structured results.
Use Cases:
- Automated research workflows
- Lead discovery and enrichment
- Competitive analysis
- Content research
Parameters:
- query (required): Search query
- country: Country code for geo-specific results
- numResults: Number of results to return
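For lead discovery, several results often come from the same site. A sketch of keeping only the best-ranked result per domain, using the result shape from the search example's return comment and assuming a lower `position` means a higher rank; `onePerDomain` and `hostnameOf` are hypothetical helpers.

```typescript
// Result shape taken from the search example's "Returns" comment.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  position: number;
}

// Hostname without a leading "www.", or null if the URL cannot be parsed.
function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname.replace(/^www\./, '');
  } catch {
    return null;
  }
}

// Keeps the best-ranked result per hostname (lowest position wins) and
// returns them in rank order; results with unparseable URLs are dropped.
function onePerDomain(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const host = hostnameOf(r.url);
    if (host === null) continue;
    const existing = best.get(host);
    if (!existing || r.position < existing.position) {
      best.set(host, r);
    }
  }
  return Array.from(best.values()).sort((a, b) => a.position - b.position);
}
```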
Example:
const results = await step.run('search', () =>
olostep.search({
query: 'best web scraping APIs 2024',
country: 'US',
numResults: 10,
})
);
// Returns: { query, results: [{ title, url, snippet, position }] }
Credentials
To use this package, you need an Olostep API key:
- Sign up for an account at olostep.com
- Get your API key from the Olostep Dashboard
- Set it as an environment variable:
export OLOSTEP_API_KEY=your_api_key_here
Or pass it directly to the client:
const olostep = createOlostepClient({
apiKey: 'your_api_key_here',
});
Examples
Multi-Step Research Workflow
export const researchWorkflow = inngest.createFunction(
{ id: 'web-research' },
{ event: 'research/start' },
async ({ event, step }) => {
// Step 1: Discover pages
const siteMap = await step.run('discover-pages', () =>
olostep.map({
url: event.data.websiteUrl,
includeUrls: ['/blog/**'],
topN: 50,
})
);
// Step 2: Batch scrape discovered URLs
const batch = await step.run('create-batch', () =>
olostep.batch.create({
items: siteMap.urls.map((url, idx) => ({
url,
customId: `page-${idx}`,
})),
formats: ['markdown'],
})
);
// Step 3: Wait for processing
await step.sleep('wait', '3m');
// Step 4: Get results
const results = await step.run('get-results', () =>
olostep.batch.get(batch.id)
);
return {
pagesFound: siteMap.totalUrls,
pagesScraped: results.completedItems,
};
}
);
Scheduled Monitoring
export const dailyMonitor = inngest.createFunction(
{ id: 'daily-monitor' },
// In Inngest v3 the cron schedule is the trigger (second argument)
{ cron: '0 9 * * *' }, // Every day at 09:00 UTC
async ({ step }) => {
const result = await step.run('check-competitor', () =>
olostep.scrape({
url: 'https://competitor.com/pricing',
formats: ['markdown'],
})
);
// Process and alert on changes...
return { scraped: true };
}
);
Using Middleware
Inject the Olostep client into all your Inngest functions:
import { Inngest } from 'inngest';
import { createOlostepMiddleware } from '@olostep/inngest';
const inngest = new Inngest({
id: 'my-app',
middleware: [
createOlostepMiddleware({
apiKey: process.env.OLOSTEP_API_KEY!,
}),
],
});
// Now ctx.olostep is available in all functions
export const myFunction = inngest.createFunction(
{ id: 'my-function' },
{ event: 'app/event' },
async ({ event, step, ctx }) => {
const result = await step.run('scrape', () =>
ctx.olostep.scrape({ url: event.data.url })
);
return result;
}
);
Specialized Parsers
Olostep provides pre-built parsers for popular websites. Use them with the parser parameter:
- @olostep/google-search - Extract search results, titles, snippets, URLs
- @olostep/google-maps - Extract business info, reviews, ratings, location
- @olostep/amazon-product - Extract product details, prices, reviews, images
- @olostep/linkedin-profile - Extract LinkedIn profile data
- @olostep/extract-emails - Extract emails from pages
- @olostep/extract-socials - Extract social profile links
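If parser IDs come from event data, a typed list can catch typos at compile time and at runtime. A sketch covering only the six parsers listed above; Olostep may offer others, so a closed union like this would reject IDs it does not know about. `OLOSTEP_PARSERS`, `OlostepParserId`, and `isOlostepParserId` are hypothetical names, not exports of the package.

```typescript
// The parser IDs listed above, as a readonly tuple.
const OLOSTEP_PARSERS = [
  '@olostep/google-search',
  '@olostep/google-maps',
  '@olostep/amazon-product',
  '@olostep/linkedin-profile',
  '@olostep/extract-emails',
  '@olostep/extract-socials',
] as const;

type OlostepParserId = (typeof OLOSTEP_PARSERS)[number];

// Narrows an arbitrary string (e.g. event.data.parser) to a known parser ID.
function isOlostepParserId(value: string): value is OlostepParserId {
  return (OLOSTEP_PARSERS as readonly string[]).includes(value);
}
```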
Example:
const product = await olostep.scrape({
url: 'https://amazon.com/dp/PRODUCT_ID',
parser: '@olostep/amazon-product',
formats: ['json'],
});
Error Handling
The client throws typed errors for different scenarios:
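import { NonRetriableError } from 'inngest';
import { OlostepError, OlostepRateLimitError, OlostepAuthError } from '@olostep/inngest';
try {
await olostep.scrape({ url: 'https://example.com' });
} catch (error) {
if (error instanceof OlostepRateLimitError) {
// Rethrow so Inngest retries the step with backoff
throw error;
} else if (error instanceof OlostepAuthError) {
// Invalid API key: retrying cannot succeed, so stop retries
throw new NonRetriableError('Invalid Olostep API key');
} else if (error instanceof OlostepError) {
console.log(error.code, error.statusCode);
}
// Rethrow everything else; swallowing the error would prevent Inngest's retries
throw error;
}
Compatibility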
- Node.js: >= 18.0.0
- Inngest: >= 3.0.0
- TypeScript: Full type support included
Resources
- Olostep API Documentation
- Inngest Integration Guide
- Olostep Dashboard - Get your API key
- Inngest Documentation
Support
Need help with the Inngest integration?
- Documentation: docs.olostep.com
- Support Email: [email protected]
- Website: olostep.com
License
MIT © Olostep
