weavebot-core
v0.1.1
@weavebot/core
Generic content processing framework for web scraping and AI extraction.
Overview
@weavebot/core is a lightweight, plugin-based framework for extracting structured data from web content. It provides infrastructure without implementation details, allowing you to build custom content processing pipelines.
Features
- 🔌 Plugin Architecture - Extend functionality without modifying core
- 🤖 Schema-Driven AI Extraction - Register custom schemas for any data type
- 🌐 Generic Web Scraper - Platform-agnostic with plugin support
- 💾 Flexible Storage Interface - Use any backend (Airtable, MongoDB, etc.)
- 📝 Dynamic Schema Registry - Register schemas at runtime
- 🔧 Zero Implementation Details - Pure infrastructure, no domain logic
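To picture what "register schemas at runtime" can mean, here is a minimal sketch of a name-to-validator registry. It is an illustration only: `Validator`, `SimpleSchemaRegistry`, and `validateArticle` are hypothetical names, not part of @weavebot/core, and a plain function stands in for a Zod schema so the snippet has no dependencies.

```typescript
// Hypothetical stand-in for a schema: a function that checks unknown input
// and either returns a typed value or throws.
type Validator<T> = (input: unknown) => T;

// Minimal runtime registry: schemas are added and looked up by name.
class SimpleSchemaRegistry {
  private readonly schemas = new Map<string, Validator<unknown>>();

  register<T>(name: string, validator: Validator<T>): void {
    if (this.schemas.has(name)) {
      throw new Error(`Schema "${name}" is already registered`);
    }
    this.schemas.set(name, validator);
  }

  validate(name: string, input: unknown): unknown {
    const validator = this.schemas.get(name);
    if (!validator) {
      throw new Error(`Unknown schema "${name}"`);
    }
    return validator(input);
  }
}

interface Article {
  title: string;
  author: string;
}

// Hand-written validator (assumption: a real setup would use a Zod schema).
const validateArticle: Validator<Article> = (input) => {
  if (typeof input !== 'object' || input === null) {
    throw new Error('Expected an object');
  }
  const record = input as Record<string, unknown>;
  if (typeof record.title !== 'string' || typeof record.author !== 'string') {
    throw new Error('title and author must be strings');
  }
  return { title: record.title, author: record.author };
};

const registry = new SimpleSchemaRegistry();
registry.register('article', validateArticle);
const article = registry.validate('article', { title: 'Hello', author: 'Ada' }) as Article;
```

Keeping lookups by name is what lets new data types be added without touching the code that runs the pipeline.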
Installation
```bash
npm install @weavebot/core
```
Quick Start
```typescript
import ContentProcessor, {
  createWebScraper,
  createAIExtractor,
  SchemaRegistry
} from '@weavebot/core';
import { z } from 'zod';

// Create processor instance
const processor = new ContentProcessor();

// Register your schema
const ArticleSchema = z.object({
  title: z.string(),
  author: z.string(),
  content: z.string(),
  publishedAt: z.date()
});
processor.registerSchema('article', ArticleSchema);

// Set up processors
const scraper = createWebScraper();
const extractor = createAIExtractor({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY
});

// Register extraction configuration
extractor.registerExtractor('article', {
  schema: ArticleSchema,
  systemPrompt: 'Extract article information from the content',
  userPromptTemplate: 'Extract article from: {{content}}'
});

processor.addProcessor('web-scraper', scraper);
processor.addProcessor('ai-extractor', extractor);

// Process a URL (top-level await requires an ES module context)
const result = await processor.process({
  type: 'url',
  data: 'https://example.com/article',
  schema: 'article'
});
```
Plugin System
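This README does not say how the scraper chooses among registered plugins. One plausible reading of the `canHandle` / `getConfig` contract used in this section is first-match dispatch, sketched below. The types `ScrapeConfigLike` and `ScraperPluginLike`, the `resolveConfig` helper, and the `'static'` fallback strategy are all assumptions made for illustration; the real `WebScraperPlugin` interface may differ.

```typescript
// Stand-in types (assumptions); the real WebScraperPlugin may differ.
interface ScrapeConfigLike {
  strategy: string;
  waitSelectors: string[];
  timeout: number;
}

interface ScraperPluginLike {
  name: string;
  canHandle(url: string): boolean;
  getConfig(url: string): ScrapeConfigLike;
}

// Hypothetical fallback for URLs that no plugin claims.
const DEFAULT_CONFIG: ScrapeConfigLike = {
  strategy: 'static',
  waitSelectors: [],
  timeout: 5000,
};

// First registered plugin whose canHandle() returns true wins.
function resolveConfig(plugins: ScraperPluginLike[], url: string): ScrapeConfigLike {
  const match = plugins.find((plugin) => plugin.canHandle(url));
  return match ? match.getConfig(url) : DEFAULT_CONFIG;
}

// Matching on the parsed hostname avoids false positives that a plain
// substring check would allow (for example a query string mentioning the domain).
const examplePlugin: ScraperPluginLike = {
  name: 'my-platform',
  canHandle(url) {
    try {
      const host = new URL(url).hostname;
      return host === 'myplatform.com' || host.endsWith('.myplatform.com');
    } catch {
      return false; // not a valid absolute URL
    }
  },
  getConfig() {
    return { strategy: 'spa', waitSelectors: ['.content-loaded'], timeout: 10000 };
  },
};

const spaConfig = resolveConfig([examplePlugin], 'https://www.myplatform.com/post/1');
const fallbackConfig = resolveConfig([examplePlugin], 'https://example.com/?ref=myplatform.com');
```

Under this reading, registration order matters: a broad plugin registered early would shadow a more specific one registered later.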
Create platform-specific plugins for the web scraper:
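```typescript
import { WebScraperPlugin } from '@weavebot/core';

class MyPlatformPlugin implements WebScraperPlugin {
  name = 'my-platform';

  // Decide whether this plugin should handle the given URL
  canHandle(url: string): boolean {
    return url.includes('myplatform.com');
  }

  // Scraping options to use for URLs this plugin handles
  getConfig(url: string) {
    return {
      strategy: 'spa',
      waitSelectors: ['.content-loaded'],
      timeout: 10000
    };
  }
}

// `scraper` is the instance created with createWebScraper() in Quick Start
scraper.registerPlugin(new MyPlatformPlugin());
```
Storage Adapters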
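The storage contract in this section comes down to six async methods: `initialize`, `create`, `read`, `update`, `delete`, and `query`. To make that concrete before wiring up a real backend, here is a minimal in-memory sketch. The `StorageAdapterLike` interface, its parameter and return types, and the exact-match filter semantics are assumptions, since this README only lists the method names.

```typescript
// Assumed record shape: arbitrary fields plus a string id.
type StoredRecord = Record<string, unknown> & { id: string };

// Stand-in for the StorageAdapter contract (signatures are assumptions).
interface StorageAdapterLike {
  initialize(config: Record<string, unknown>): Promise<void>;
  create(collection: string, data: Record<string, unknown>): Promise<StoredRecord>;
  read(collection: string, id: string): Promise<StoredRecord | undefined>;
  update(collection: string, id: string, data: Record<string, unknown>): Promise<StoredRecord>;
  delete(collection: string, id: string): Promise<boolean>;
  query(collection: string, filter: Record<string, unknown>): Promise<StoredRecord[]>;
}

class InMemoryStorageAdapter implements StorageAdapterLike {
  private collections = new Map<string, Map<string, StoredRecord>>();
  private nextId = 1;

  // Reset all state; a real adapter would open a connection here.
  async initialize(_config: Record<string, unknown>): Promise<void> {
    this.collections.clear();
    this.nextId = 1;
  }

  // Get (or lazily create) the map backing one collection.
  private table(collection: string): Map<string, StoredRecord> {
    let table = this.collections.get(collection);
    if (!table) {
      table = new Map();
      this.collections.set(collection, table);
    }
    return table;
  }

  async create(collection: string, data: Record<string, unknown>): Promise<StoredRecord> {
    const record: StoredRecord = { ...data, id: String(this.nextId++) };
    this.table(collection).set(record.id, record);
    return record;
  }

  async read(collection: string, id: string): Promise<StoredRecord | undefined> {
    return this.table(collection).get(id);
  }

  async update(collection: string, id: string, data: Record<string, unknown>): Promise<StoredRecord> {
    const existing = this.table(collection).get(id);
    if (!existing) throw new Error(`No record "${id}" in "${collection}"`);
    const updated: StoredRecord = { ...existing, ...data, id };
    this.table(collection).set(id, updated);
    return updated;
  }

  async delete(collection: string, id: string): Promise<boolean> {
    return this.table(collection).delete(id);
  }

  // Exact-match filter: every key in `filter` must equal the record's value.
  async query(collection: string, filter: Record<string, unknown>): Promise<StoredRecord[]> {
    return Array.from(this.table(collection).values()).filter((record) =>
      Object.entries(filter).every(([key, value]) => record[key] === value),
    );
  }
}

// Small usage example: store two records, then query by author.
async function demo(): Promise<StoredRecord[]> {
  const store = new InMemoryStorageAdapter();
  await store.initialize({});
  await store.create('articles', { title: 'Hello', author: 'Ada' });
  await store.create('articles', { title: 'World', author: 'Bob' });
  return store.query('articles', { author: 'Ada' });
}
```

An adapter like this is mainly useful in tests; a persistent backend would replace the maps with calls to its own client.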
Implement the generic storage interface for your backend:
```typescript
import { StorageAdapter } from '@weavebot/core';

class MyStorageAdapter implements StorageAdapter {
  async initialize(config) { /* ... */ }
  async create(collection, data) { /* ... */ }
  async read(collection, id) { /* ... */ }
  async update(collection, id, data) { /* ... */ }
  async delete(collection, id) { /* ... */ }
  async query(collection, filter) { /* ... */ }
}

// `processor` is the ContentProcessor instance created in Quick Start
processor.addStorage('my-storage', new MyStorageAdapter());
```
Documentation
For complete documentation, visit the GitHub repository.
License
MIT
