@kadoa/node-sdk
v0.33.0
Kadoa SDK for Node.js
Official Node.js/TypeScript SDK for the Kadoa API, providing easy integration with Kadoa's web data extraction platform.
Installation
npm install @kadoa/node-sdk
# or
yarn add @kadoa/node-sdk
# or
pnpm add @kadoa/node-sdk
Quick Start
import { KadoaClient } from '@kadoa/node-sdk';
const client = new KadoaClient({
  apiKey: 'your-api-key'
});
// Default extraction targets the main entity on the page
const result = await client.extraction.run({
  urls: ['https://example.com/products'],
  name: 'Product Extraction'
});
console.log(`Extracted ${result.data?.length} items`);
// Output: Extracted 25 items
Extraction Methods
Default Extraction
The simplest way to extract data. By default, the SDK targets the main entity on the page:
const result = await client.extraction.run({
  urls: ['https://example.com'],
  name: 'My Extraction'
});
// Returns:
// {
//   workflowId: "abc123",
//   workflow: { id: "abc123", state: "FINISHED", ... },
//   data: [
//     { title: "Item 1", price: "$10" },
//     { title: "Item 2", price: "$20" }
//   ],
//   pagination: { page: 1, totalPages: 3, hasMore: true }
// }
When to use: Quick extractions, exploratory data gathering, or when you don't know the exact schema yet.
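The example result above carries an optional `data` array and a `pagination` object. As a minimal sketch of reading that shape defensively, the snippet below uses hypothetical local types (`PaginationLike`, `ExtractionResultLike`) that mirror only the fields shown in the comment. They are assumptions for illustration and are not imported from the SDK.

```typescript
// Hypothetical local types mirroring the example result; not the SDK's own types.
interface PaginationLike {
  page: number;
  totalPages: number;
  hasMore: boolean;
}

interface ExtractionResultLike<T> {
  workflowId: string;
  data?: T[];
  pagination?: PaginationLike;
}

// Builds a one-line summary without assuming that data or pagination exist.
function summarizeResult<T>(result: ExtractionResultLike<T>): string {
  const count = result.data?.length ?? 0; // avoids printing "undefined"
  const p = result.pagination;
  if (!p) {
    return `${count} items (no pagination info)`;
  }
  const remaining = p.hasMore ? p.totalPages - p.page : 0;
  return `${count} items on page ${p.page} of ${p.totalPages}, ${remaining} page(s) remaining`;
}

const example: ExtractionResultLike<{ title: string; price: string }> = {
  workflowId: 'abc123',
  data: [
    { title: 'Item 1', price: '$10' },
    { title: 'Item 2', price: '$20' },
  ],
  pagination: { page: 1, totalPages: 3, hasMore: true },
};

console.log(summarizeResult(example));
// 2 items on page 1 of 3, 2 page(s) remaining
```

Falling back to `0` when `data` is missing keeps log lines readable in cases where the optional chain in the Quick Start would print `undefined`.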
Builder API (Custom Schemas)
Define exactly what data you want to extract using the fluent builder pattern. The builder derives a schema-aware prompt from the entity and fields you specify:
const extraction = await client.extract({
  urls: ['https://example.com/products'],
  name: 'Product Extraction',
  extraction: builder => builder
    .schema('Product')
    .field('title', 'Product name', 'STRING', { example: 'Laptop' })
    .field('price', 'Product price', 'CURRENCY')
    .field('inStock', 'Stock status', 'BOOLEAN')
    .field('rating', 'Star rating', 'NUMBER')
}).create();
// Run extraction
const result = await extraction.run();
const data = await result.fetchData({});
// Returns:
// {
//   data: [
//     { title: "Dell XPS", price: "$999", inStock: true, rating: 4.5 },
//     { title: "MacBook", price: "$1299", inStock: false, rating: 4.8 }
//   ],
//   pagination: { ... }
// }
When to use: Production applications, consistent schema requirements, data validation needs.
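To make the fluent builder pattern concrete, here is a toy sketch of how chained `schema()` and `field()` calls can accumulate a schema description. `ToySchemaBuilder`, `FieldSpec`, `SchemaSpec`, and `build()` are hypothetical names invented for this illustration; the SDK's real builder is not shown here and may work differently.

```typescript
// Toy illustration of a fluent builder; NOT the SDK's implementation.
type FieldType = 'STRING' | 'NUMBER' | 'BOOLEAN' | 'CURRENCY';

interface FieldSpec {
  name: string;
  description: string;
  type: FieldType;
  example?: string;
}

interface SchemaSpec {
  entity: string;
  fields: FieldSpec[];
}

class ToySchemaBuilder {
  private entityName = '';
  private readonly fields: FieldSpec[] = [];

  // Each method returns `this`, which is what makes chaining possible.
  schema(entity: string): this {
    this.entityName = entity;
    return this;
  }

  field(
    name: string,
    description: string,
    type: FieldType,
    options: { example?: string } = {}
  ): this {
    if (this.fields.some((f) => f.name === name)) {
      throw new Error(`Duplicate field: ${name}`);
    }
    this.fields.push({ name, description, type, ...options });
    return this;
  }

  // Hypothetical terminal step that returns a plain-object copy of the schema.
  build(): SchemaSpec {
    if (!this.entityName) {
      throw new Error('schema() must be called before build()');
    }
    return { entity: this.entityName, fields: [...this.fields] };
  }
}

const spec = new ToySchemaBuilder()
  .schema('Product')
  .field('title', 'Product name', 'STRING', { example: 'Laptop' })
  .field('price', 'Product price', 'CURRENCY')
  .build();

console.log(JSON.stringify(spec, null, 2));
```

Returning `this` from every method keeps the chain type-safe, and validating in `field()` surfaces mistakes such as duplicate names at the point where they are introduced.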
Builder Patterns
Raw Content Extraction
Add convenience fields for raw page artifacts to the extraction schema:
// Single format
extraction: builder => builder.raw('markdown')
// Multiple formats
extraction: builder => builder.raw(['html', 'markdown', 'url'])
Classification Fields
Categorize content into predefined labels:
extraction: builder => builder
  .schema('Article')
  .classify('sentiment', 'Content sentiment', [
    { title: 'Positive', definition: 'Optimistic or favorable tone' },
    { title: 'Negative', definition: 'Critical or unfavorable tone' },
    { title: 'Neutral', definition: 'Balanced or objective tone' }
  ])
Hybrid Extraction
Combine structured fields with raw content:
extraction: builder => builder
  .schema('Product')
  .field('title', 'Product name', 'STRING', { example: 'Item' })
  .field('price', 'Product price', 'CURRENCY')
  .raw('html') // Include raw HTML alongside structured fields
Reference Existing Schema
Reuse a previously defined schema:
extraction: builder => builder.useSchema('schema-id-123')
Real-time Monitoring
Monitor websites continuously and receive live updates when data changes.
Setup:
const client = new KadoaClient({ apiKey: 'your-api-key' });
const realtime = await client.connectRealtime();
// Verify connection
if (client.isRealtimeConnected()) {
  console.log('Connected to real-time updates');
}
Create a monitor:
const monitor = await client
  .extract({
    urls: ['https://example.com/products'],
    name: 'Price Monitor',
    extraction: schema =>
      schema
        .entity('Product')
        .field('name', 'Product name', 'STRING')
        .field('price', 'Current price', 'MONEY'),
  })
  .setInterval({ interval: 'REAL_TIME' })
  .create();
// Wait for monitor to start
await monitor.waitForReady();
// Handle updates
realtime.onEvent((event) => {
  if (event.workflowId === monitor.workflowId) {
    console.log('Update:', event.data);
  }
});
Requirements:
- API key (personal or team)
- Call await client.connectRealtime() before subscribing to events
- Notifications enabled for at least one channel (Webhook, Email, or Slack)
When to use: Price tracking, inventory monitoring, live content updates.
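The monitor example above filters events by comparing `event.workflowId` inside a single handler. When several monitors share one connection, that check can be factored into a small router. The sketch below is self-contained and uses hypothetical names (`RealtimeEventLike`, `WorkflowEventRouter`); it assumes only that events carry `workflowId` and `data`, as in the example, and does not call the SDK.

```typescript
// Hypothetical event shape: only the two fields used in the monitor example.
interface RealtimeEventLike {
  workflowId: string;
  data: unknown;
}

type EventHandler = (event: RealtimeEventLike) => void;

// Routes each incoming event to the handlers registered for its workflowId.
class WorkflowEventRouter {
  private readonly handlers = new Map<string, EventHandler[]>();

  on(workflowId: string, handler: EventHandler): void {
    const list = this.handlers.get(workflowId) ?? [];
    list.push(handler);
    this.handlers.set(workflowId, list);
  }

  // Returns how many handlers were invoked (0 for unregistered workflows).
  dispatch(event: RealtimeEventLike): number {
    const list = this.handlers.get(event.workflowId) ?? [];
    for (const handler of list) {
      handler(event);
    }
    return list.length;
  }
}

const router = new WorkflowEventRouter();
const received: unknown[] = [];
router.on('price-monitor', (event) => {
  received.push(event.data);
});

// Simulated events; in practice these would come from the realtime connection.
router.dispatch({ workflowId: 'price-monitor', data: { price: '$9' } });
router.dispatch({ workflowId: 'other-workflow', data: { price: '$1' } });

console.log(received.length); // 1
```

With the SDK, a single `realtime.onEvent((event) => router.dispatch(event))` subscription could feed such a router, assuming the event objects match the shape used here.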
Working with Results
Fetch Specific Page
const page = await client.extraction.fetchData({
  workflowId: 'workflow-id',
  page: 2,
  limit: 50
});
Iterate Through All Pages
for await (const page of client.extraction.fetchDataPages({
  workflowId: 'workflow-id'
})) {
  console.log(`Processing ${page.data.length} items`);
  // Process page.data
}
Fetch All Data at Once
const allData = await client.extraction.fetchAllData({
  workflowId: 'workflow-id'
});
console.log(`Total items: ${allData.length}`);
Advanced Workflow Control
For scheduled extractions, monitoring, and notifications:
const extraction = await client.extract({
  urls: ['https://example.com'],
  name: 'Scheduled Extraction',
  extraction: builder => builder
    .schema('Product')
    .field('title', 'Product name', 'STRING', { example: 'Item' })
    .field('price', 'Price', 'CURRENCY')
})
  .setInterval({ interval: 'DAILY' }) // Schedule: HOURLY, DAILY, WEEKLY, MONTHLY
  .withNotifications({
    events: 'all',
    channels: { WEBSOCKET: true }
  })
  .bypassPreview() // Skip approval step
  .create();
const result = await extraction.run();
Data Validation
Kadoa can automatically suggest validation rules and detect anomalies:
import { KadoaClient, pollUntil } from '@kadoa/node-sdk';
const client = new KadoaClient({ apiKey: 'your-api-key' });
// 1. Run extraction
const result = await client.extraction.run({
  urls: ['https://example.com']
});
// 2. Wait for AI-suggested validation rules
const rules = await pollUntil(
  async () => await client.validation.listRules({
    workflowId: result.workflowId
  }),
  // Named `response` so it does not shadow the extraction `result` above
  (response) => response.data.length > 0,
  { pollIntervalMs: 10000, timeoutMs: 30000 }
);
// 3. Approve and run validation
await client.validation.bulkApproveRules({
  workflowId: result.workflowId,
  ruleIds: rules.result.data.map(r => r.id)
});
const validation = await client.validation.scheduleValidation(
  result.workflowId,
  result.workflow?.jobId || ''
);
// 4. Check for anomalies
const completed = await client.validation.waitUntilCompleted(
  validation.validationId
);
const anomalies = await client.validation.getValidationAnomalies(
  validation.validationId
);
console.log(`Found ${anomalies.length} anomalies`);
Configuration
Basic Setup
const client = new KadoaClient({
  apiKey: 'your-api-key',
  timeout: 30000 // optional, in ms
});
Environment Variables
import { KadoaClient } from '@kadoa/node-sdk';
import { config } from 'dotenv';
config();
const client = new KadoaClient({
  apiKey: process.env.KADOA_API_KEY!
});
WebSocket & Realtime Events
Enable realtime notifications using an API key:
const client = new KadoaClient({ apiKey: 'your-api-key' });
const realtime = await client.connectRealtime();
// Listen to events
realtime.onEvent((event) => {
  console.log('Event:', event);
});
// Use with extractions
const extraction = await client.extract({
  urls: ['https://example.com'],
  name: 'Monitored Extraction',
  extraction: builder => builder.raw('markdown')
})
  .withNotifications({
    events: 'all',
    channels: { WEBSOCKET: true }
  })
  .create();
The SDK automatically reconnects when the realtime service drains a socket during deploys. If the server provides _cursor values, the client resumes with lastCursor on the replacement subscription and suppresses overlapping duplicate deliveries by matching on event.id.
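The resume-and-deduplicate behaviour described above can be pictured with a small standalone sketch. This is not the SDK's internal code: `CursorEventLike`, `EventDeduplicator`, and the bounded memory size are assumptions made for illustration, based only on the `_cursor`, `lastCursor`, and `event.id` names mentioned in the paragraph.

```typescript
// Hypothetical event shape based on the fields named in the text above.
interface CursorEventLike {
  id: string;
  _cursor?: string;
}

// Remembers recently seen event ids and the most recent cursor.
class EventDeduplicator {
  private readonly seen = new Set<string>();
  private readonly order: string[] = [];
  private readonly maxRemembered: number;
  lastCursor: string | undefined = undefined;

  constructor(maxRemembered: number = 1000) {
    this.maxRemembered = maxRemembered;
  }

  // Returns true for a first delivery and false for a repeated event id.
  accept(event: CursorEventLike): boolean {
    if (this.seen.has(event.id)) {
      return false;
    }
    this.seen.add(event.id);
    this.order.push(event.id);
    // Forget the oldest id once the bound is exceeded, so memory stays flat.
    if (this.order.length > this.maxRemembered) {
      const oldest = this.order.shift();
      if (oldest !== undefined) {
        this.seen.delete(oldest);
      }
    }
    if (event._cursor !== undefined) {
      this.lastCursor = event._cursor;
    }
    return true;
  }
}

const dedupe = new EventDeduplicator();
// Simulated streams: the replacement socket replays e2 before delivering e3.
const beforeReconnect: CursorEventLike[] = [
  { id: 'e1', _cursor: 'c1' },
  { id: 'e2', _cursor: 'c2' },
];
const afterReconnect: CursorEventLike[] = [
  { id: 'e2', _cursor: 'c2' },
  { id: 'e3', _cursor: 'c3' },
];

const delivered = [...beforeReconnect, ...afterReconnect]
  .filter((e) => dedupe.accept(e))
  .map((e) => e.id);

console.log(delivered.join(','), dedupe.lastCursor); // e1,e2,e3 c3
```

Tracking ids in a bounded set is one simple way to absorb the overlap between the old and the replacement subscription; the value kept in `lastCursor` is what a client would send when resubscribing.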
Connection control:
const realtime = await client.connectRealtime(); // Connect manually
const connected = client.isRealtimeConnected(); // Check status
client.disconnectRealtime(); // Disconnect
Error Handling
import { KadoaClient, KadoaSdkException, KadoaHttpException } from '@kadoa/node-sdk';
try {
  const result = await client.extraction.run({
    urls: ['https://example.com']
  });
} catch (error) {
  if (error instanceof KadoaHttpException) {
    console.error('API Error:', error.message);
    console.error('Status:', error.httpStatus);
  } else if (error instanceof KadoaSdkException) {
    console.error('SDK Error:', error.message);
    console.error('Code:', error.code);
  }
}
Debugging
Enable debug logs using the DEBUG environment variable:
# All SDK logs
DEBUG=kadoa:* node app.js
# Specific modules
DEBUG=kadoa:extraction node app.js
DEBUG=kadoa:http node app.js
DEBUG=kadoa:client,kadoa:extraction node app.js
More Examples
See the examples directory for complete examples including:
- Batch processing
- Custom error handling
- Integration patterns
- Advanced validation workflows
Workflow Management
Use the workflows domain to inspect or modify existing workflows without leaving your application.
Update Workflow Metadata
Wraps PUT /v4/workflows/{workflowId}/metadata so you can adjust limits, schedules, tags, schema, monitoring, etc.
const result = await client.workflow.update("workflow-id", {
  limit: 1000,
  monitoring: { enabled: true },
  tags: ["weekly-report"],
});
console.log(result);
// { success: true, message: "Workflow metadata updated successfully" }
Delete a Workflow
await client.workflow.delete("workflow-id");
Note: client.workflow.cancel(id) still calls the delete endpoint for backward compatibility, but it now logs a deprecation warning. Use client.workflow.delete(id) going forward.
Requirements
- Node.js 22+
Support
- Documentation: docs.kadoa.com
- API Reference: docs.kadoa.com/api
- Support: [email protected]
- Issues: GitHub Issues
License
MIT
