@kadoa/node-sdk

v0.20.2

Published

8 days ago

Kadoa SDK for Node.js


Kadoa SDK for Node.js

Official Node.js/TypeScript SDK for the Kadoa API, providing easy integration with Kadoa's web data extraction platform.

Installation

npm install @kadoa/node-sdk
# or
yarn add @kadoa/node-sdk
# or
pnpm add @kadoa/node-sdk

Quick Start

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'your-api-key'
});

// AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ['https://example.com/products'],
  name: 'Product Extraction'
});

console.log(`Extracted ${result.data?.length ?? 0} items`);
// Output: Extracted 25 items

Extraction Methods

Auto-Detection

The simplest way to extract data. The AI detects structured content automatically:

const result = await client.extraction.run({
  urls: ['https://example.com'],
  name: 'My Extraction'
});

// Returns:
// {
//   workflowId: "abc123",
//   workflow: { id: "abc123", state: "FINISHED", ... },
//   data: [
//     { title: "Item 1", price: "$10" },
//     { title: "Item 2", price: "$20" }
//   ],
//   pagination: { page: 1, totalPages: 3, hasMore: true }
// }

When to use: Quick extractions, exploratory data gathering, or when you don't know the exact schema.
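
Auto-detected values arrive as display strings (the sample above shows price: "$10"), so numeric work usually needs a normalization step. A minimal sketch, assuming the row shape from the sample output and US-style number formatting; parsePrice and DetectedRow are hypothetical names, not part of the SDK:

```typescript
// Row shape copied from the sample output above; real rows depend on the page.
interface DetectedRow {
  title: string;
  price: string;
}

// Hypothetical helper: turn a display price such as "$1,299.50" into a number.
// Returns null when no numeric value can be recovered.
function parsePrice(raw: string): number | null {
  const cleaned = raw.replace(/[^0-9.]/g, "");
  if (cleaned === "") return null;
  const value = Number(cleaned);
  return Number.isFinite(value) ? value : null;
}

const rows: DetectedRow[] = [
  { title: "Item 1", price: "$10" },
  { title: "Item 2", price: "$20" },
];

// Sum the prices, counting unparseable values as 0.
const total = rows.reduce((sum, row) => sum + (parsePrice(row.price) ?? 0), 0);
console.log(total); // 30
```

Prices that use a comma as the decimal separator would need a different cleaning rule.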

Builder API (Custom Schemas)

Define exactly what data you want to extract using the fluent builder pattern:

const extraction = await client.extract({
  urls: ['https://example.com/products'],
  name: 'Product Extraction',
  extraction: builder => builder
    .schema('Product')
    .field('title', 'Product name', 'STRING', { example: 'Laptop' })
    .field('price', 'Product price', 'CURRENCY')
    .field('inStock', 'Stock status', 'BOOLEAN')
    .field('rating', 'Star rating', 'NUMBER')
}).create();

// Run extraction
const result = await extraction.run();
const data = await result.fetchData({});

// Returns:
// {
//   data: [
//     { title: "Dell XPS", price: "$999", inStock: true, rating: 4.5 },
//     { title: "MacBook", price: "$1299", inStock: false, rating: 4.8 }
//   ],
//   pagination: { ... }
// }

When to use: Production applications, consistent schema requirements, data validation needs.
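
When downstream code relies on the schema, it can help to check each row at runtime before using it. The sketch below mirrors the four fields declared in the builder example; the Product interface and isProduct guard are illustrative names, and the assumption that price arrives as a string comes from the sample output above:

```typescript
// Mirrors the four fields declared in the builder example above.
interface Product {
  title: string;
  price: string;
  inStock: boolean;
  rating: number;
}

// Hypothetical runtime check that narrows an unknown row to Product.
function isProduct(row: unknown): row is Product {
  if (typeof row !== "object" || row === null) return false;
  const r = row as Record<string, unknown>;
  return (
    typeof r["title"] === "string" &&
    typeof r["price"] === "string" &&
    typeof r["inStock"] === "boolean" &&
    typeof r["rating"] === "number"
  );
}

const sample: unknown[] = [
  { title: "Dell XPS", price: "$999", inStock: true, rating: 4.5 },
  { title: "MacBook", price: "$1299", inStock: "no", rating: 4.8 }, // wrong type for inStock
];

// Only rows that pass the guard are kept, and they are typed as Product[].
const products = sample.filter(isProduct);
console.log(products.length); // 1
```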

Builder Patterns

Raw Content Extraction

Extract page content without structure transformation:

// Single format
extraction: builder => builder.raw('markdown')

// Multiple formats
extraction: builder => builder.raw(['html', 'markdown', 'url'])

Classification Fields

Categorize content into predefined labels:

extraction: builder => builder
  .schema('Article')
  .classify('sentiment', 'Content sentiment', [
    { title: 'Positive', definition: 'Optimistic or favorable tone' },
    { title: 'Negative', definition: 'Critical or unfavorable tone' },
    { title: 'Neutral', definition: 'Balanced or objective tone' }
  ])

Hybrid Extraction

Combine structured fields with raw content:

extraction: builder => builder
  .schema('Product')
  .field('title', 'Product name', 'STRING', { example: 'Item' })
  .field('price', 'Product price', 'CURRENCY')
  .raw('html')  // Include raw HTML alongside structured fields

Reference Existing Schema

Reuse a previously defined schema:

extraction: builder => builder.useSchema('schema-id-123')
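
To illustrate why the patterns above chain, here is a small stand-in builder. It is not the SDK's implementation: SketchBuilder, FieldSpec, and build() are hypothetical names, and the only point is that each method records its arguments and returns the same object.

```typescript
type FieldType = "STRING" | "NUMBER" | "BOOLEAN" | "CURRENCY";

interface FieldSpec {
  name: string;
  description: string;
  type: FieldType;
  example?: string;
}

// Hypothetical stand-in, NOT the SDK's builder class.
class SketchBuilder {
  private entityName = "";
  private readonly fields: FieldSpec[] = [];
  private readonly rawFormats: string[] = [];

  schema(name: string): this {
    this.entityName = name;
    return this; // returning `this` is what makes chaining possible
  }

  field(name: string, description: string, type: FieldType, opts: { example?: string } = {}): this {
    this.fields.push({ name, description, type, ...opts });
    return this;
  }

  raw(format: string | string[]): this {
    this.rawFormats.push(...(Array.isArray(format) ? format : [format]));
    return this;
  }

  // Snapshot of everything recorded so far.
  build() {
    return { entity: this.entityName, fields: [...this.fields], raw: [...this.rawFormats] };
  }
}

const spec = new SketchBuilder()
  .schema("Product")
  .field("title", "Product name", "STRING", { example: "Item" })
  .field("price", "Product price", "CURRENCY")
  .raw("html")
  .build();

console.log(spec.entity, spec.fields.length);
```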

Real-time Monitoring

Monitor websites continuously and receive live updates when data changes.

Setup:

const client = new KadoaClient({ apiKey: 'your-api-key' });
const realtime = await client.connectRealtime();

// Verify connection
if (client.isRealtimeConnected()) {
  console.log('Connected to real-time updates');
}

Create a monitor:

const monitor = await client
  .extract({
    urls: ['https://example.com/products'],
    name: 'Price Monitor',
    extraction: schema =>
      schema
        .entity('Product')
        .field('name', 'Product name', 'STRING')
        .field('price', 'Current price', 'MONEY'),
  })
  .setInterval({ interval: 'REAL_TIME' })
  .create();

// Wait for monitor to start
await monitor.waitForReady();

// Handle updates
realtime.onEvent((event) => {
  if (event.workflowId === monitor.workflowId) {
    console.log('Update:', event.data);
  }
});

Requirements:

API key (personal or team)
Call await client.connectRealtime() before subscribing to events
Notifications enabled for at least one channel (Webhook, Email, or Slack)

When to use: Price tracking, inventory monitoring, live content updates.
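
For price tracking, the event handler typically needs to compare each update with the previous one. A minimal sketch, assuming event.data is an array of rows matching the monitor's schema (name, price); that shape is an assumption, since the example above only shows workflowId and data. PriceEvent and createChangeTracker are hypothetical names:

```typescript
// Assumed event shape: row fields follow the monitor's schema above.
interface PriceEvent {
  workflowId: string;
  data: { name: string; price: string }[];
}

// Hypothetical helper that remembers the last price per product and
// reports only the rows whose price differs from the previous event.
function createChangeTracker() {
  const lastSeen = new Map<string, string>();
  return (event: PriceEvent): string[] => {
    const changes: string[] = [];
    for (const row of event.data) {
      const previous = lastSeen.get(row.name);
      if (previous !== undefined && previous !== row.price) {
        changes.push(`${row.name}: ${previous} -> ${row.price}`);
      }
      lastSeen.set(row.name, row.price);
    }
    return changes;
  };
}

const track = createChangeTracker();
const first = track({ workflowId: "wf-1", data: [{ name: "Laptop", price: "$999" }] });
const second = track({ workflowId: "wf-1", data: [{ name: "Laptop", price: "$949" }] });
console.log(first.length, second.length); // 0 1
```

Inside realtime.onEvent, track(event) could be called after the workflowId check, once the actual event payload has been confirmed to match this shape.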

Working with Results

Fetch Specific Page

const page = await client.extraction.fetchData({
  workflowId: 'workflow-id',
  page: 2,
  limit: 50
});

Iterate Through All Pages

for await (const page of client.extraction.fetchDataPages({
  workflowId: 'workflow-id'
})) {
  console.log(`Processing ${page.data.length} items`);
  // Process page.data
}

Fetch All Data at Once

const allData = await client.extraction.fetchAllData({
  workflowId: 'workflow-id'
});

console.log(`Total items: ${allData.length}`);
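
The three access styles above relate to each other in a simple way: a page iterator repeats single-page fetches, and a fetch-all helper drains the iterator. The sketch below shows that relationship with an in-memory page source; makeFetchPage, pages, and collectAll are hypothetical names and do not reflect how the SDK implements these methods.

```typescript
interface Page<T> {
  data: T[];
  pagination: { page: number; totalPages: number };
}

// Hypothetical in-memory page source standing in for a network call.
function makeFetchPage<T>(items: T[], limit: number) {
  const totalPages = Math.max(1, Math.ceil(items.length / limit));
  return async (page: number): Promise<Page<T>> => ({
    data: items.slice((page - 1) * limit, page * limit),
    pagination: { page, totalPages },
  });
}

// Yield pages one at a time until the last page has been emitted.
async function* pages<T>(fetchPage: (page: number) => Promise<Page<T>>): AsyncGenerator<Page<T>> {
  let page = 1;
  while (true) {
    const current = await fetchPage(page);
    yield current;
    if (page >= current.pagination.totalPages) return;
    page += 1;
  }
}

// Drain the generator into a single array.
async function collectAll<T>(fetchPage: (page: number) => Promise<Page<T>>): Promise<T[]> {
  const all: T[] = [];
  for await (const page of pages(fetchPage)) {
    all.push(...page.data);
  }
  return all;
}

collectAll(makeFetchPage([1, 2, 3, 4, 5], 2)).then((all) => console.log(all.length)); // 5
```

Iterating page by page keeps memory bounded for large workflows; collecting everything is simpler when the result set is known to be small.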

Advanced Workflow Control

For scheduled extractions, monitoring, and notifications:

const extraction = await client.extract({
  urls: ['https://example.com'],
  name: 'Scheduled Extraction',
  extraction: builder => builder
    .schema('Product')
    .field('title', 'Product name', 'STRING', { example: 'Item' })
    .field('price', 'Price', 'CURRENCY')
})
.setInterval({ interval: 'DAILY' })  // Schedule: HOURLY, DAILY, WEEKLY, MONTHLY
.withNotifications({
  events: 'all',
  channels: { WEBSOCKET: true }
})
.bypassPreview()  // Skip approval step
.create();

const result = await extraction.run();

Data Validation

Kadoa can automatically suggest validation rules and detect anomalies:

import { KadoaClient, pollUntil } from '@kadoa/node-sdk';

const client = new KadoaClient({ apiKey: 'your-api-key' });

// 1. Run extraction
const result = await client.extraction.run({
  urls: ['https://example.com']
});

// 2. Wait for AI-suggested validation rules
const rules = await pollUntil(
  async () => await client.validation.listRules({
    workflowId: result.workflowId
  }),
  (response) => response.data.length > 0,
  { pollIntervalMs: 10000, timeoutMs: 30000 }
);

// 3. Approve and run validation
await client.validation.bulkApproveRules({
  workflowId: result.workflowId,
  ruleIds: rules.result.data.map(r => r.id)
});

const validation = await client.validation.scheduleValidation(
  result.workflowId,
  result.workflow?.jobId || ''
);

// 4. Check for anomalies
const completed = await client.validation.waitUntilCompleted(
  validation.validationId
);
const anomalies = await client.validation.getValidationAnomalies(
  validation.validationId
);

console.log(`Found ${anomalies.length} anomalies`);
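
Step 2 above depends on polling: call a function repeatedly until a condition holds or a timeout expires. The following is an illustrative polling loop, not the SDK's pollUntil. The name pollUntilSketch is hypothetical, and the return shape with a result property is inferred from the rules.result usage above:

```typescript
interface PollOptions {
  pollIntervalMs: number;
  timeoutMs: number;
}

// Illustrative polling loop; NOT the SDK's pollUntil implementation.
async function pollUntilSketch<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  { pollIntervalMs, timeoutMs }: PollOptions,
): Promise<{ result: T; attempts: number }> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    const result = await fetchOnce();
    if (isDone(result)) return { result, attempts };
    // Give up if the next wait would run past the deadline.
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}

// Simulated rule list that becomes non-empty on the third call.
let calls = 0;
const fakeListRules = async () => ({ data: ++calls >= 3 ? [{ id: "rule-1" }] : [] });

pollUntilSketch(fakeListRules, (r) => r.data.length > 0, { pollIntervalMs: 5, timeoutMs: 1000 })
  .then(({ result, attempts }) => console.log(result.data.length, attempts)); // 1 3
```

With the values used in the example above (10000 ms interval, 30000 ms timeout), a loop like this would make at most a handful of attempts before giving up.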

Configuration

Basic Setup

const client = new KadoaClient({
  apiKey: 'your-api-key',
  timeout: 30000  // optional, in ms
});

Environment Variables

import { KadoaClient } from '@kadoa/node-sdk';
import { config } from 'dotenv';

config();

const client = new KadoaClient({
  apiKey: process.env.KADOA_API_KEY!
});
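
The non-null assertion (!) above silences the compiler but passes undefined through at runtime if the variable is missing. A minimal sketch of a fail-fast alternative; requireEnv is a hypothetical helper, and it takes the environment as a parameter so the snippet runs without Node typings:

```typescript
// Hypothetical helper: read a required variable from an env-like object and
// fail with a clear message instead of passing `undefined` to the client.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In an application this would be requireEnv(process.env, "KADOA_API_KEY").
const apiKey = requireEnv({ KADOA_API_KEY: "your-api-key" }, "KADOA_API_KEY");
console.log(apiKey.length > 0); // true
```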

WebSocket & Realtime Events

Enable realtime notifications using an API key:

const client = new KadoaClient({ apiKey: 'your-api-key' });
const realtime = await client.connectRealtime();

// Listen to events
realtime.onEvent((event) => {
  console.log('Event:', event);
});

// Use with extractions
const extraction = await client.extract({
  urls: ['https://example.com'],
  name: 'Monitored Extraction',
  extraction: builder => builder.raw('markdown')
})
.withNotifications({
  events: 'all',
  channels: { WEBSOCKET: true }
})
.create();

Connection control:

const realtime = await client.connectRealtime(); // Connect manually
const connected = client.isRealtimeConnected(); // Check status
client.disconnectRealtime();                    // Disconnect

Error Handling

import { KadoaClient, KadoaSdkException, KadoaHttpException } from '@kadoa/node-sdk';

const client = new KadoaClient({ apiKey: 'your-api-key' });

try {
  const result = await client.extraction.run({
    urls: ['https://example.com']
  });
} catch (error) {
  if (error instanceof KadoaHttpException) {
    console.error('API Error:', error.message);
    console.error('Status:', error.httpStatus);
  } else if (error instanceof KadoaSdkException) {
    console.error('SDK Error:', error.message);
    console.error('Code:', error.code);
  }
}
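
Once errors expose an HTTP status, transient failures can be retried. The sketch below reads an httpStatus property by duck typing, so it runs without the SDK. withRetry, statusOf, and the choice of 429 and 5xx as retryable are assumptions for illustration, not SDK behavior:

```typescript
// Read a numeric `httpStatus` from any thrown value, if present.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "httpStatus" in error) {
    const status = (error as { httpStatus: unknown }).httpStatus;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Assumption: rate limiting (429) and server errors (5xx) are transient.
function isRetryable(error: unknown): boolean {
  const status = statusOf(error);
  return status !== undefined && (status === 429 || status >= 500);
}

// Retry `task` with exponential backoff; rethrow non-retryable errors at once.
async function withRetry<T>(task: () => Promise<T>, maxAttempts = 3, baseDelayMs = 200): Promise<T> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Simulated call that fails twice with 503, then succeeds.
let attempts = 0;
const flaky = async () => {
  attempts += 1;
  if (attempts < 3) throw Object.assign(new Error("Service unavailable"), { httpStatus: 503 });
  return "ok";
};

withRetry(flaky, 3, 10).then((value) => console.log(value, attempts)); // ok 3
```

Retrying a call that creates a workflow may create duplicates if the first request actually succeeded, so a wrapper like this is safest around read operations.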

Debugging

Enable debug logs using the DEBUG environment variable:

# All SDK logs
DEBUG=kadoa:* node app.js

# Specific modules
DEBUG=kadoa:extraction node app.js
DEBUG=kadoa:http node app.js
DEBUG=kadoa:client,kadoa:extraction node app.js

More Examples

See the examples directory for complete examples including:

Batch processing
Custom error handling
Integration patterns
Advanced validation workflows

Workflow Management

Use the workflow domain (client.workflow) to inspect or modify existing workflows without leaving your application.

Update Workflow Metadata

Wraps PUT /v4/workflows/{workflowId}/metadata so you can adjust limits, schedules, tags, schema, monitoring, etc.

const result = await client.workflow.update("workflow-id", {
  limit: 1000,
  monitoring: { enabled: true },
  tags: ["weekly-report"],
});

console.log(result);
// { success: true, message: "Workflow metadata updated successfully" }
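
For readers who want to see what such a wrapper roughly corresponds to at the HTTP level, the sketch below builds (but does not send) a request for the path named above. The base URL, the x-api-key header name, and the buildMetadataUpdate helper are all assumptions for illustration; consult the API reference for the actual values.

```typescript
interface RequestSketch {
  method: "PUT";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a request for PUT /v4/workflows/{workflowId}/metadata.
// ASSUMPTIONS: the base URL and the `x-api-key` header name are illustrative.
function buildMetadataUpdate(
  baseUrl: string,
  apiKey: string,
  workflowId: string,
  metadata: Record<string, unknown>,
): RequestSketch {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "PUT",
    url: `${root}/v4/workflows/${encodeURIComponent(workflowId)}/metadata`,
    headers: { "content-type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify(metadata),
  };
}

const request = buildMetadataUpdate("https://api.example.com/", "your-api-key", "workflow-id", {
  limit: 1000,
  tags: ["weekly-report"],
});
console.log(request.url); // https://api.example.com/v4/workflows/workflow-id/metadata
```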

Delete a Workflow

await client.workflow.delete("workflow-id");

Note: client.workflow.cancel(id) still calls the delete endpoint for backward compatibility, but it now logs a deprecation warning. Use client.workflow.delete(id) going forward.

Requirements

Node.js 22+

Support

Documentation: docs.kadoa.com
API Reference: docs.kadoa.com/api
Support: [email protected]
Issues: GitHub Issues

License

MIT
