weavebot-core

v0.1.1

Published 10 months ago

Generic content processing framework for web scraping and AI extraction

Vulnerabilities: 0 High, 0 Medium, 0 Low

content-processing ai web-scraping schema-validation plugin-architecture generic-framework extraction

@weavebot/core

Overview

@weavebot/core is a lightweight, plugin-based framework for extracting structured data from web content. It provides infrastructure without implementation details, allowing you to build custom content processing pipelines.

Features

🔌 Plugin Architecture - Extend functionality without modifying core
🤖 Schema-Driven AI Extraction - Register custom schemas for any data type
🌐 Generic Web Scraper - Platform-agnostic with plugin support
💾 Flexible Storage Interface - Use any backend (Airtable, MongoDB, etc.)
📝 Dynamic Schema Registry - Register schemas at runtime
🔧 Zero Implementation Details - Pure infrastructure, no domain logic
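
The "Dynamic Schema Registry" feature can be pictured as a map from names to validators that is filled in at runtime. The sketch below is not the package's implementation: `TinySchemaRegistry`, `Validator`, and `isArticle` are hypothetical names, and a plain predicate function stands in for a zod schema so the example runs without dependencies.

```typescript
// Hypothetical sketch, not the @weavebot/core implementation.
// A validator is any function that reports whether a value fits a schema.
type Validator = (value: unknown) => boolean;

class TinySchemaRegistry {
  private readonly validators = new Map<string, Validator>();

  // Register (or overwrite) a validator under a name at runtime.
  register(name: string, validator: Validator): void {
    this.validators.set(name, validator);
  }

  has(name: string): boolean {
    return this.validators.has(name);
  }

  // Validate a value against a named schema; throws if the name is unknown.
  validate(name: string, value: unknown): boolean {
    const validator = this.validators.get(name);
    if (!validator) {
      throw new Error(`No schema registered under "${name}"`);
    }
    return validator(value);
  }
}

// Stand-in for a schema: checks that title and author are strings.
const isArticle: Validator = (value) => {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.title === 'string' && typeof record.author === 'string';
};

const registry = new TinySchemaRegistry();
registry.register('article', isArticle);
```

Keeping lookups keyed by name is what lets a request refer to a schema as a string (as in `schema: 'article'`) instead of passing the schema object around.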

Installation

npm install @weavebot/core

Quick Start

import ContentProcessor, { 
  createWebScraper, 
  createAIExtractor,
  SchemaRegistry 
} from '@weavebot/core';
import { z } from 'zod';

// Create processor instance
const processor = new ContentProcessor();

// Register your schema
const ArticleSchema = z.object({
  title: z.string(),
  author: z.string(),
  content: z.string(),
  publishedAt: z.date()
});

processor.registerSchema('article', ArticleSchema);

// Set up processors
const scraper = createWebScraper();
const extractor = createAIExtractor({
  provider: 'openai',
  apiKey: process.env.OPENAI_API_KEY
});

// Register extraction configuration
extractor.registerExtractor('article', {
  schema: ArticleSchema,
  systemPrompt: 'Extract article information from the content',
  userPromptTemplate: 'Extract article from: {{content}}'
});

processor.addProcessor('web-scraper', scraper);
processor.addProcessor('ai-extractor', extractor);

// Process a URL
const result = await processor.process({
  type: 'url',
  data: 'https://example.com/article',
  schema: 'article'
});
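
The `userPromptTemplate` above contains a `{{content}}` placeholder. How the package substitutes it is not described here, so the following is only one plausible approach. `renderTemplate` is a hypothetical helper and is not part of @weavebot/core.

```typescript
// Hypothetical helper, not part of @weavebot/core: fills {{name}} placeholders
// from a values object and leaves unknown placeholders untouched.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string): string => {
    // Only use own properties, so keys like "toString" are not resolved
    // through the prototype chain.
    const value = Object.prototype.hasOwnProperty.call(values, key) ? values[key] : undefined;
    return value ?? match;
  });
}

const prompt = renderTemplate('Extract article from: {{content}}', {
  content: '<h1>Hello</h1>',
});
```

Using a replacer function instead of a replacement string means scraped content containing sequences such as `$&` is inserted literally and not interpreted as a replacement pattern.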

Plugin System

Create platform-specific plugins for the web scraper:

import type { WebScraperPlugin } from '@weavebot/core';

class MyPlatformPlugin implements WebScraperPlugin {
  name = 'my-platform';
  
  canHandle(url: string): boolean {
    return url.includes('myplatform.com');
  }
  
  getConfig(url: string) {
    return {
      strategy: 'spa',
      waitSelectors: ['.content-loaded'],
      timeout: 10000
    };
  }
}

// Register the plugin on the scraper created in Quick Start
scraper.registerPlugin(new MyPlatformPlugin());
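
As a rough picture of how a scraper could choose a plugin for a URL, the sketch below asks each registered plugin `canHandle` in registration order and uses the first match, falling back to a default config. The interfaces are redeclared locally with only the members shown above, so the real `WebScraperPlugin` type may differ. `PluginLike`, `ScrapeConfig`, `HostPlugin`, `pickConfig`, and the default values are all assumptions made for illustration. The plugin here matches on the parsed hostname, which is stricter than `url.includes`.

```typescript
// Local stand-ins for illustration; the real types in @weavebot/core may differ.
interface ScrapeConfig {
  strategy: string;
  waitSelectors: string[];
  timeout: number;
}

interface PluginLike {
  name: string;
  canHandle(url: string): boolean;
  getConfig(url: string): ScrapeConfig;
}

// Matches on the parsed hostname instead of a substring, so a URL such as
// "https://evil.example/?q=myplatform.com" is not claimed by mistake.
class HostPlugin implements PluginLike {
  readonly name: string;
  private readonly host: string;
  private readonly config: ScrapeConfig;

  constructor(name: string, host: string, config: ScrapeConfig) {
    this.name = name;
    this.host = host;
    this.config = config;
  }

  canHandle(url: string): boolean {
    try {
      const { hostname } = new URL(url);
      return hostname === this.host || hostname.endsWith(`.${this.host}`);
    } catch {
      return false; // not a parseable absolute URL
    }
  }

  getConfig(_url: string): ScrapeConfig {
    return this.config;
  }
}

// Hypothetical default, used when no plugin claims the URL.
const DEFAULT_CONFIG: ScrapeConfig = { strategy: 'static', waitSelectors: [], timeout: 5000 };

// The first registered plugin whose canHandle() returns true wins.
function pickConfig(plugins: PluginLike[], url: string): ScrapeConfig {
  const plugin = plugins.find((p) => p.canHandle(url));
  return plugin ? plugin.getConfig(url) : DEFAULT_CONFIG;
}

const plugins: PluginLike[] = [
  new HostPlugin('my-platform', 'myplatform.com', {
    strategy: 'spa',
    waitSelectors: ['.content-loaded'],
    timeout: 10000,
  }),
];
```

With first-match dispatch, registration order matters: a broad plugin registered early can shadow a more specific one registered later.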

Storage Adapters

Implement the generic storage interface for your backend:

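import type { StorageAdapter } from '@weavebot/core';

class MyStorageAdapter implements StorageAdapter {
  // Set up the backend (connections, credentials)
  async initialize(config) { /* ... */ }
  // Insert a new record into a collection
  async create(collection, data) { /* ... */ }
  // Fetch a single record by id
  async read(collection, id) { /* ... */ }
  // Modify an existing record
  async update(collection, id, data) { /* ... */ }
  // Remove a record by id
  async delete(collection, id) { /* ... */ }
  // Return the records that match a filter
  async query(collection, filter) { /* ... */ }
}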

// Attach the adapter to the processor created in Quick Start
processor.addStorage('my-storage', new MyStorageAdapter());
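
For local testing, an in-memory adapter is often enough. The sketch below follows the method names listed above, but the parameter and return types are guesses, since this README does not show them. For that reason the interface is redeclared locally as `StorageLike` instead of importing `StorageAdapter`, and `InMemoryStorage`, `Rec`, and `Stored` are hypothetical names. Adjust the signatures to the real interface before use.

```typescript
// Assumed shapes: the README lists only method names, not their types.
type Rec = Record<string, unknown>;
type Stored = Rec & { id: string };

interface StorageLike {
  initialize(config: Rec): Promise<void>;
  create(collection: string, data: Rec): Promise<Stored>;
  read(collection: string, id: string): Promise<Stored | undefined>;
  update(collection: string, id: string, data: Rec): Promise<Stored | undefined>;
  delete(collection: string, id: string): Promise<boolean>;
  query(collection: string, filter: Rec): Promise<Stored[]>;
}

class InMemoryStorage implements StorageLike {
  private readonly collections = new Map<string, Map<string, Stored>>();
  private nextId = 1;

  async initialize(_config: Rec): Promise<void> {
    this.collections.clear();
  }

  // Returns the map for a collection, creating it on first use.
  private table(collection: string): Map<string, Stored> {
    let table = this.collections.get(collection);
    if (!table) {
      table = new Map<string, Stored>();
      this.collections.set(collection, table);
    }
    return table;
  }

  async create(collection: string, data: Rec): Promise<Stored> {
    const record: Stored = { ...data, id: String(this.nextId++) };
    this.table(collection).set(record.id, record);
    return record;
  }

  async read(collection: string, id: string): Promise<Stored | undefined> {
    return this.table(collection).get(id);
  }

  // Shallow-merges data into the existing record; the id cannot be changed.
  async update(collection: string, id: string, data: Rec): Promise<Stored | undefined> {
    const existing = this.table(collection).get(id);
    if (!existing) return undefined;
    const updated: Stored = { ...existing, ...data, id };
    this.table(collection).set(id, updated);
    return updated;
  }

  async delete(collection: string, id: string): Promise<boolean> {
    return this.table(collection).delete(id);
  }

  // Returns records whose fields strictly equal every key in the filter.
  async query(collection: string, filter: Rec): Promise<Stored[]> {
    return Array.from(this.table(collection).values()).filter((record) =>
      Object.keys(filter).every((key) => record[key] === filter[key])
    );
  }
}
```

Equality-only filtering keeps the sketch short; a real backend would typically translate the filter into its own query language.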

Documentation

For complete documentation, visit the GitHub repository.

License

MIT
