@olostep/inngest

v1.0.0

Published

a month ago

Official Olostep integration for Inngest - Build reliable web scraping workflows with durable functions. Search, extract, and structure web data from any website.

olostep

inngest olostep web-scraping web-search web-crawling durable-functions background-jobs workflows ai-agents automation data-extraction scraper crawler

Official Olostep integration for Inngest - Build reliable, fault-tolerant web scraping workflows with durable functions.

Olostep is a web search, scraping, and crawling API that extracts structured web data from any website in real time. Perfect for automating research workflows, monitoring competitors, and collecting data at scale.

Inngest is a platform for building reliable background jobs and workflows with automatic retries, scheduling, and observability.

Installation
Quick Start
Operations
Credentials
Examples
Specialized Parsers
Resources

Installation

npm install @olostep/inngest inngest

Or with yarn/pnpm:

yarn add @olostep/inngest inngest
pnpm add @olostep/inngest inngest

Quick Start

import { Inngest } from 'inngest';
import { createOlostepClient } from '@olostep/inngest';

// Initialize clients
const inngest = new Inngest({ id: 'my-app' });
const olostep = createOlostepClient({
  apiKey: process.env.OLOSTEP_API_KEY!,
});

// Create a durable web scraping workflow
export const scrapeWorkflow = inngest.createFunction(
  { id: 'scrape-website' },
  { event: 'scrape/start' },
  async ({ event, step }) => {
    // Each step.run is automatically retried on failure
    const result = await step.run('scrape-page', () =>
      olostep.scrape({
        url: event.data.url,
        formats: ['markdown'],
      })
    );

    return {
      url: result.url,
      content: result.markdownContent,
      metadata: result.pageMetadata,
    };
  }
);
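
The workflow above reads event.data.url without checking it. A minimal sketch of a guard that could run at the top of the handler, assuming the 'scrape/start' payload has the shape { url: string }. The ScrapeEventData type and parseScrapeEventData helper are hypothetical and not part of @olostep/inngest:

```typescript
// Hypothetical payload shape for the 'scrape/start' event.
interface ScrapeEventData {
  url: string;
}

// Returns a validated payload or throws, so a malformed event fails fast
// instead of spending retries on a request that can never succeed.
function parseScrapeEventData(data: unknown): ScrapeEventData {
  if (typeof data !== 'object' || data === null) {
    throw new Error('Event data must be an object');
  }
  const url = (data as Record<string, unknown>)['url'];
  if (typeof url !== 'string' || !/^https?:\/\//i.test(url)) {
    throw new Error('Event data must include an http(s) url');
  }
  return { url };
}
```

Inside the handler this could be called as parseScrapeEventData(event.data) before the first step.run.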

Operations

Scrape Website

Extract content from a single URL. Supports multiple formats and JavaScript rendering.

Use Cases:

Monitor specific pages for changes
Extract product information from e-commerce sites
Gather data from news articles or blog posts
Pull content for content aggregation

Parameters:

url (required): The URL of the website you want to scrape
formats: Output formats (html, markdown, json, text) - default: ['markdown']
country: Country code (e.g., US, GB, CA) for location-specific scraping
waitBeforeScraping: Wait time in milliseconds before scraping (0-10000)
parser: Parser ID for specialized extraction (e.g., @olostep/amazon-product)

Example:

const result = await step.run('scrape', () =>
  olostep.scrape({
    url: 'https://example.com',
    formats: ['markdown', 'html'],
    country: 'US',
  })
);
// Returns: { id, url, markdownContent, htmlContent, pageMetadata, ... }
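
The parameter list above can be checked locally before a request is sent. The sketch below mirrors the documented parameters in an illustrative ScrapeParams type (not the package's exported type) and uses a hypothetical validateScrapeParams helper. Only the constraints stated above are checked:

```typescript
// Illustrative types that mirror the documented parameters.
type ScrapeFormat = 'html' | 'markdown' | 'json' | 'text';

interface ScrapeParams {
  url: string;
  formats?: ScrapeFormat[];
  country?: string;
  waitBeforeScraping?: number;
  parser?: string;
}

// Hypothetical helper: returns a list of problems, empty when params look valid.
function validateScrapeParams(params: ScrapeParams): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(params.url)) {
    problems.push('url must start with http:// or https://');
  }
  const wait = params.waitBeforeScraping;
  if (wait !== undefined && (!Number.isFinite(wait) || wait < 0 || wait > 10000)) {
    problems.push('waitBeforeScraping must be between 0 and 10000 ms');
  }
  if (params.formats !== undefined && params.formats.length === 0) {
    problems.push('formats must not be empty when provided');
  }
  return problems;
}
```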

Batch Scrape URLs

Scrape up to 100,000 URLs in parallel. Perfect for large-scale data extraction.

Use Cases:

Scrape entire product catalogs
Extract data from multiple search results
Process lists of URLs from spreadsheets
Bulk content extraction

Parameters:

items (required): Array of { url, customId? } objects
formats: Output formats - default: ['markdown']
country: Country code for location-specific scraping
parser: Parser ID for specialized extraction

Example:

// Create batch job
const batch = await step.run('create-batch', () =>
  olostep.batch.create({
    items: [
      { url: 'https://example1.com', customId: 'page-1' },
      { url: 'https://example2.com', customId: 'page-2' },
    ],
    formats: ['markdown'],
  })
);

// Wait for processing
await step.sleep('wait', '2m');

// Get results
const results = await step.run('get-results', () =>
  olostep.batch.get(batch.id)
);
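
The example above waits a fixed two minutes, which may be too short for large batches and too long for small ones. One alternative is to poll on a capped exponential schedule. The pollDelays helper below is a hypothetical sketch that only computes the sleep durations. How completion is detected depends on the shape of the batch result, which this README does not specify:

```typescript
// Builds a capped exponential schedule of sleep durations, e.g. '30s', '60s'.
function pollDelays(attempts: number, baseSeconds: number, maxSeconds: number): string[] {
  const delays: string[] = [];
  for (let i = 0; i < attempts; i++) {
    const seconds = Math.min(baseSeconds * 2 ** i, maxSeconds);
    delays.push(`${seconds}s`);
  }
  return delays;
}

// Possible use inside a handler (isFinished is a hypothetical check):
//
// for (const [i, delay] of pollDelays(5, 30, 300).entries()) {
//   await step.sleep(`wait-${i}`, delay);
//   const results = await step.run(`get-results-${i}`, () =>
//     olostep.batch.get(batch.id)
//   );
//   if (isFinished(results)) return results;
// }
```

Each iteration uses a distinct step ID so that Inngest can track the sleeps and fetches separately.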

Create Crawl

Autonomously discover and scrape entire websites by following links.

Use Cases:

Crawl and archive entire documentation sites
Extract all blog posts from a website
Build knowledge bases from web content
Monitor website structure changes

Parameters:

startUrl (required): Starting URL for the crawl
maxPages: Maximum number of pages to crawl (default: 10)
followLinks: Whether to follow links (default: true)
formats: Output formats - default: ['markdown']
includeUrls: Glob patterns to include (e.g., ['/docs/**'])
excludeUrls: Glob patterns to exclude (e.g., ['/admin/**'])

Example:

const crawl = await step.run('start-crawl', () =>
  olostep.crawl.create({
    startUrl: 'https://docs.example.com',
    maxPages: 100,
    includeUrls: ['/docs/**'],
  })
);
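
To reason about which pages includeUrls and excludeUrls would select, it can help to test patterns locally. The sketch below assumes common glob semantics ("**" crosses path segments, "*" stays within one segment) and that exclusions take precedence. Olostep's actual matching rules may differ, and both function names are hypothetical:

```typescript
// Converts a glob into a RegExp under the assumed semantics described above.
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === '*' && glob.charAt(i + 1) === '*') {
      source += '.*';
      i++; // skip the second asterisk
    } else if (ch === '*') {
      source += '[^/]*';
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${source}$`);
}

// A path is allowed when no exclude pattern matches and, if include patterns
// are given, at least one of them matches.
function isPathAllowed(path: string, include: string[], exclude: string[]): boolean {
  const matches = (globs: string[]) => globs.some((g) => globToRegExp(g).test(path));
  if (matches(exclude)) return false;
  return include.length === 0 || matches(include);
}
```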

Create Map

Extract all URLs from a website for content discovery and site structure analysis.

Use Cases:

Build sitemaps and site structure diagrams
Discover all pages before batch scraping
Find broken or missing pages
SEO audits and analysis

Parameters:

url (required): Website URL to extract links from
searchQuery: Optional search query to filter URLs
topN: Limit the number of URLs returned
includeUrls: Glob patterns to include
excludeUrls: Glob patterns to exclude

Example:

const siteMap = await step.run('discover-urls', () =>
  olostep.map({
    url: 'https://example.com',
    includeUrls: ['/blog/**'],
    topN: 100,
  })
);
// Returns: { id, url, totalUrls, urls: string[] }
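
Since one listed use case is discovering pages before batch scraping, the returned urls array can be turned into batch items. A minimal sketch, assuming the { url, customId } item shape from the Batch Scrape URLs section; toBatchItems is a hypothetical helper that also removes duplicate URLs:

```typescript
// Item shape taken from the batch parameters documented earlier.
interface BatchItem {
  url: string;
  customId: string;
}

// Deduplicates URLs (keeping first-seen order) and assigns sequential IDs.
function toBatchItems(urls: string[], prefix = 'page'): BatchItem[] {
  const unique = Array.from(new Set(urls));
  return unique.map((url, index) => ({ url, customId: `${prefix}-${index}` }));
}
```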

AI-Powered Answers

Search the web and get AI-powered answers with sources and citations.

Use Cases:

Enrich data with web-sourced facts
Ground AI applications on real-world data
Research tasks with verified outputs
Competitive intelligence

Parameters:

task (required): Question or task to answer
jsonSchema: JSON schema for structured output

Example:

const answer = await step.run('get-answer', () =>
  olostep.answer({
    task: 'Who is the CEO of Anthropic?',
    jsonSchema: {
      ceo_name: '',
      founded_year: '',
      headquarters: '',
    },
  })
);
// Returns: { id, task, answer: { ceo_name: 'Dario Amodei', ... }, sources }
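
A structured answer may come back with some fields unfilled, so it can be worth checking the result against the template passed as jsonSchema before using it. The helper below is a hypothetical sketch that assumes the answer is a flat object keyed like the template:

```typescript
// Returns the template keys that are absent or empty in the answer.
function missingAnswerKeys(template: Record<string, unknown>, answer: unknown): string[] {
  if (typeof answer !== 'object' || answer === null) {
    return Object.keys(template);
  }
  const record = answer as Record<string, unknown>;
  return Object.keys(template).filter((key) => {
    const value = record[key];
    return value === undefined || value === null || value === '';
  });
}
```

For example, a workflow could throw when missingAnswerKeys returns a non-empty list so that the step is retried.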

Web Search

Search the web using Google Search and get structured results.

Use Cases:

Automated research workflows
Lead discovery and enrichment
Competitive analysis
Content research

Parameters:

query (required): Search query
country: Country code for geo-specific results
numResults: Number of results to return

Example:

const results = await step.run('search', () =>
  olostep.search({
    query: 'best web scraping APIs 2024',
    country: 'US',
    numResults: 10,
  })
);
// Returns: { query, results: [{ title, url, snippet, position }] }

Credentials

To use this package, you need an Olostep API key:

Sign up for an account at olostep.com
Get your API key from the Olostep Dashboard
Set it as an environment variable:

export OLOSTEP_API_KEY=your_api_key_here

Or pass it directly to the client:

const olostep = createOlostepClient({
  apiKey: 'your_api_key_here',
});
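
When the key comes from the environment, a small guard can replace the non-null assertion (!) used in the Quick Start so that a missing key fails at startup with a clear message. requireEnv is a hypothetical helper, not part of the package:

```typescript
// Reads a required variable from an environment-like object or throws.
function requireEnv(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Possible use:
// const olostep = createOlostepClient({
//   apiKey: requireEnv(process.env, 'OLOSTEP_API_KEY'),
// });
```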

Examples

Multi-Step Research Workflow

export const researchWorkflow = inngest.createFunction(
  { id: 'web-research' },
  { event: 'research/start' },
  async ({ event, step }) => {
    // Step 1: Discover pages
    const siteMap = await step.run('discover-pages', () =>
      olostep.map({
        url: event.data.websiteUrl,
        includeUrls: ['/blog/**'],
        topN: 50,
      })
    );

    // Step 2: Batch scrape discovered URLs
    const batch = await step.run('create-batch', () =>
      olostep.batch.create({
        items: siteMap.urls.map((url, idx) => ({
          url,
          customId: `page-${idx}`,
        })),
        formats: ['markdown'],
      })
    );

    // Step 3: Wait for processing
    await step.sleep('wait', '3m');

    // Step 4: Get results
    const results = await step.run('get-results', () =>
      olostep.batch.get(batch.id)
    );

    return {
      pagesFound: siteMap.totalUrls,
      pagesScraped: results.completedItems,
    };
  }
);

Scheduled Monitoring

export const dailyMonitor = inngest.createFunction(
  { id: 'daily-monitor' },
  { cron: '0 9 * * *' }, // Every day at 09:00 (UTC by default)
  async ({ step }) => {
    const result = await step.run('check-competitor', () =>
      olostep.scrape({
        url: 'https://competitor.com/pricing',
        formats: ['markdown'],
      })
    );

    // Process and alert on changes...
    return { scraped: true };
  }
);
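
The "process and alert on changes" step is left open above. One simple approach is to compare a fingerprint of the scraped markdown with the one from the previous run. The sketch below uses a 32-bit FNV-1a hash over UTF-16 code units. All function names are hypothetical, and where the previous fingerprint is stored is left to the application:

```typescript
// 32-bit FNV-1a over UTF-16 code units (non-cryptographic).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Collapses whitespace so cosmetic reflows are not reported as changes.
function contentFingerprint(markdown: string): number {
  return fnv1a(markdown.replace(/\s+/g, ' ').trim());
}

// Returns true on the first run (no previous fingerprint) or when content differs.
function hasChanged(previousFingerprint: number | undefined, markdown: string): boolean {
  return previousFingerprint !== contentFingerprint(markdown);
}
```

A 32-bit hash can collide, so a cryptographic hash is a safer choice when a missed change would be costly.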

Using Middleware

Inject the Olostep client into all your Inngest functions:

import { Inngest } from 'inngest';
import { createOlostepMiddleware } from '@olostep/inngest';

const inngest = new Inngest({
  id: 'my-app',
  middleware: [
    createOlostepMiddleware({
      apiKey: process.env.OLOSTEP_API_KEY!,
    }),
  ],
});

// Now ctx.olostep is available in all functions
export const myFunction = inngest.createFunction(
  { id: 'my-function' },
  { event: 'app/event' },
  async ({ event, step, ctx }) => {
    const result = await step.run('scrape', () =>
      ctx.olostep.scrape({ url: event.data.url })
    );
    return result;
  }
);

Specialized Parsers

Olostep provides pre-built parsers for popular websites. Use them with the parser parameter:

@olostep/google-search - Extract search results, titles, snippets, URLs
@olostep/google-maps - Extract business info, reviews, ratings, location
@olostep/amazon-product - Extract product details, prices, reviews, images
@olostep/linkedin-profile - Extract LinkedIn profile data
@olostep/extract-emails - Extract emails from pages
@olostep/extract-socials - Extract social profile links

Example:

const product = await olostep.scrape({
  url: 'https://amazon.com/dp/PRODUCT_ID',
  parser: '@olostep/amazon-product',
  formats: ['json'],
});

Error Handling

The client throws typed errors for different scenarios:

import { OlostepError, OlostepRateLimitError, OlostepAuthError } from '@olostep/inngest';

try {
  await olostep.scrape({ url: 'https://example.com' });
} catch (error) {
  if (error instanceof OlostepRateLimitError) {
    // Rethrow so the enclosing step fails and Inngest can retry it with backoff
    throw error;
  } else if (error instanceof OlostepAuthError) {
    // Invalid API key
  } else if (error instanceof OlostepError) {
    console.log(error.code, error.statusCode);
  }
}
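
Because the example logs error.statusCode, one way to decide between retrying and failing is to classify that code. This sketch assumes statusCode is the HTTP status of the failed request, which the README does not state explicitly; classifyStatus is a hypothetical helper:

```typescript
type RetryDecision = 'retry' | 'fail';

// Treats rate limits (429) and server errors (5xx) as transient;
// everything else, such as 401 or 404, is treated as permanent.
function classifyStatus(statusCode: number): RetryDecision {
  if (statusCode === 429 || statusCode >= 500) {
    return 'retry';
  }
  return 'fail';
}
```

For the 'retry' case, rethrowing the error lets Inngest retry the step. For the 'fail' case, throwing Inngest's NonRetriableError is one way to stop further attempts.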

Compatibility

Node.js: >= 18.0.0
Inngest: >= 3.0.0
TypeScript: Full type support included

Resources

Support

Need help with the Inngest integration?

Documentation: docs.olostep.com
Support Email: [email protected]
Website: olostep.com
