CDP Forge Plugin Pipeline SDK
SDK for easily implementing pipeline plugins for the CDP Forge platform.
This project serves as an SDK for building plugins that can be integrated into the data processing pipeline of the CDP Forge platform. It is designed to simplify the development of custom data transformation and processing logic within the platform ecosystem.
📦 Installation as NPM Library
You can install this library as a dependency in other projects:
```bash
npm install @cdp-forge/plugin-pipeline-sdk
```
Usage as Library
```typescript
import {
  PipelinePluginI,
  PipelineStage,
  ConfigListener,
  ConfigReader,
  Log,
  start
} from '@cdp-forge/plugin-pipeline-sdk';

// Create a custom plugin
class MyCustomPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Implement your processing logic
    console.log('Processing log:', log);
    return log;
  }

  async init(): Promise<void> {
    console.log('Plugin initialization');
  }
}

// Load configuration
const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

// Create plugin instance and start the server
const customPlugin = new MyCustomPlugin();
start(customPlugin, config).then(({ stage, configListener }) => {
  console.log('Server started successfully');
}).catch(error => {
  console.error('Error during startup:', error);
});
```
🚀 Features
- Pipeline Plugin: Provides a structure for creating plugins that fit into a sequential or parallel processing pipeline
- Kafka Integration: Uses Kafka for asynchronous communication and data streaming between pipeline stages
- TypeScript: Written in TypeScript to improve code maintainability, type safety, and developer productivity
- Docker Support: Includes Docker configuration for deployment
- Testing: Jest configuration for unit tests
- Configuration Management: Automatic merging of cluster and plugin configurations
📋 Prerequisites
- Node.js 20.11.1 or higher
- npm or yarn
- Docker (optional, for deployment)
- Access to a Kafka cluster
🛠️ Installation
Clone the repository:
```bash
git clone <repository-url>
cd plugin-pipeline-sdk
```
Install dependencies:
```bash
npm install
```
Configure the environment:
- Copy and modify configuration files in `config/`
- Ensure Kafka brokers are accessible
⚙️ Configuration
The SDK uses two separate configuration files to manage different aspects of the plugin system:
Configuration File Structure
config/config.yml - Cluster Configuration
This file contains the cluster-level configuration that is shared across all plugins in the CDP Forge platform.
```yaml
kafkaConfig:
  brokers:
    - 'localhost:36715'
manager:
  url: 'https://plugin_template_url'
  config_topic: 'config'
mysql:
  uri: 'mysql://user:password@my-server-ip:3306'
```
Important: If you're using the Helm installer provided by the CDP Forge platform, this file is generated automatically, and you should use that generated file in your plugin.
config/plugin.yml - Plugin-Specific Configuration
This file contains plugin-specific settings that define how your individual plugin behaves within the pipeline.
```yaml
plugin:
  name: 'myPlugin'
  priority: 1 # 1 to 100 (not required if parallel)
  type: 'blocking' # or 'parallel'
```
Field Descriptions
Cluster Configuration (config.yml)
`kafkaConfig.brokers`
List of Kafka broker addresses to which the plugin will connect. This is configured at the cluster level and shared by all plugins.

`manager.url`
URL used to register or communicate with the plugin manager service.

`manager.config_topic`
Kafka topic used for plugin configuration management across the cluster.

`mysql.uri`
MySQL connection string for database operations.
Plugin Configuration (plugin.yml)
`plugin.name`
Unique identifier for your plugin instance within the pipeline.

`plugin.priority`
(Required only for `blocking` plugins)
An integer from 1 to 100 that defines the execution order of the plugin within the pipeline. A lower number means higher priority, so the plugin with priority 1 is executed before plugins with priority 2, 3, 4, and so on.

`plugin.type`
Defines the plugin execution mode:
- `blocking`: The plugin processes data and returns a `Promise<Log>` for the next stage.
- `parallel`: The plugin runs independently and returns a `Promise<void>`.
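For reference, the merged configuration implied by the two files above has roughly the following shape. This is an illustrative sketch only; the SDK exports its own types, and the exact type names may differ:

```typescript
// Illustrative shape of the merged config object (not the SDK's exported type).
interface MergedConfig {
  kafkaConfig: {
    brokers: string[];           // e.g. ['localhost:36715']
  };
  manager: {
    url: string;                 // plugin manager endpoint
    config_topic: string;        // Kafka topic for config updates
  };
  mysql: {
    uri: string;                 // MySQL connection string
  };
  plugin: {
    name: string;                // unique plugin identifier
    priority?: number;           // 1-100, required only for 'blocking' plugins
    type: 'blocking' | 'parallel';
  };
}
```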
Configuration Management
- Cluster Config (`config.yml`): Managed by the platform, automatically generated by the Helm installer
- Plugin Config (`plugin.yml`): Managed by you, defines your plugin's behavior
- Environment Variables: Can override both configurations if needed
- Runtime Updates: Plugin configuration can be updated without restarting the cluster
Using ConfigReader for Convenience
The SDK provides a ConfigReader utility that automatically merges both configuration files into a single config object, making it easier to access all settings in your plugin code.
```typescript
import { ConfigReader } from '@cdp-forge/plugin-pipeline-sdk';

// The ConfigReader automatically loads and merges:
// - config/config.yml (cluster configuration)
// - config/plugin.yml (plugin configuration)
const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

// Access cluster configuration
console.log(config.kafkaConfig.brokers);
console.log(config.manager.url);

// Access plugin configuration
console.log(config.plugin.name);
console.log(config.plugin.priority);

// Access merged configuration
console.log(config.mysql.uri);
```
Starting the Server with Configuration
The start() function requires the merged configuration to initialize the server:
```typescript
import { start, PipelinePluginI, Log, ConfigReader } from '@cdp-forge/plugin-pipeline-sdk';

const config = ConfigReader.getInstance('./config/config.yml', './config/plugin.yml').config;

class MyPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Your plugin logic here
    return log;
  }

  async init(): Promise<void> {
    // Plugin initialization
  }
}

// Start the server with the merged configuration
start(new MyPlugin(), config).then(({ stage, configListener }) => {
  console.log('Server started with merged configuration');
}).catch(error => {
  console.error('Error starting server:', error);
});
```
The server will:
- Load both configuration files using the specified paths
- Merge them into a single config object
- Validate the configuration
- Start the plugin with the merged settings
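Conceptually, the merge is equivalent to loading both YAML files and combining their top-level keys. The sketch below is illustrative only and is not the SDK's internal implementation; it assumes the js-yaml package is available:

```typescript
import * as fs from 'fs';
import * as yaml from 'js-yaml';

// Illustrative only: ConfigReader does this work for you.
function loadMergedConfig(clusterPath: string, pluginPath: string): Record<string, any> {
  const cluster = yaml.load(fs.readFileSync(clusterPath, 'utf8')) as Record<string, any>;
  const plugin = yaml.load(fs.readFileSync(pluginPath, 'utf8')) as Record<string, any>;
  // Plugin keys sit alongside cluster keys in the merged object.
  return { ...cluster, ...plugin };
}

const merged = loadMergedConfig('./config/config.yml', './config/plugin.yml');
console.log(merged.plugin?.name);
```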
🔧 Plugin Development
To create a new plugin, follow these steps:
- Configure the `config.yml` and `plugin.yml` files correctly
- Implement the `elaborate` function in your plugin class
Plugin Implementation
The plugin must implement the PipelinePluginI interface:
```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

export default class MyPlugin implements PipelinePluginI {
  elaborate(log: Log): Promise<Log | null> {
    // Implement your processing logic here
    // For blocking plugins: return Promise<Log>
    // For parallel plugins: return Promise<void>
    return Promise.resolve(log);
  }

  init(): Promise<void> {
    // Plugin initialization
    return Promise.resolve();
  }
}
```
Plugin Types
Depending on the plugin type:
- `blocking` plugins: The `elaborate` function must return a `Promise<Log>`.
- `parallel` plugins: The `elaborate` function must return a `Promise<void>`.
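For example, a parallel plugin might forward each log to an external system and hand nothing back to the pipeline. The sketch below is illustrative: it returns `null` to satisfy the `Promise<Log | null>` signature shown above, since a parallel plugin produces no output for the next stage:

```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

// Sketch of a parallel plugin: performs a side effect and produces no
// output for downstream stages (null stands in for "no result").
export default class MetricsPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // Hypothetical side effect: record the event type somewhere external.
    console.log(`event seen: ${log.event}`);
    return null;
  }

  async init(): Promise<void> {
    console.log('MetricsPlugin ready');
  }
}
```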
📁 Project Structure
```
plugin-pipeline-template/
├── config/                    # Configuration files
│   ├── config.yml             # Cluster configuration
│   └── plugin.yml             # Plugin-specific configuration
├── src/                       # TypeScript source code
│   ├── plugin/                # Plugin implementation
│   │   ├── Plugin.ts          # Main plugin class
│   │   └── PipelinePluginI.ts # Plugin interface
│   ├── types.ts               # Type definitions
│   ├── config.ts              # Configuration management
│   ├── index.ts               # Library entry point
│   └── ...                    # Other utility files
├── __tests__/                 # Unit tests
├── Dockerfile                 # Docker configuration
├── package.json               # Dependencies and scripts
└── tsconfig.json              # TypeScript configuration
```
🚀 Available Scripts
- `npm run build`: Compiles TypeScript code
- `npm test`: Runs unit tests
- `npm run clean`: Cleans the dist folder
- `npm run prepublishOnly`: Builds before publishing
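As an illustration of `npm test`, a minimal Jest unit test for a plugin's `elaborate` method might look like the following. It assumes Jest is configured for TypeScript (e.g. via ts-jest) and that `src/plugin/Plugin.ts` default-exports a plugin like `MyPlugin` above; the `Log` fixture values are invented for the example:

```typescript
// __tests__/plugin.test.ts — illustrative test; fixture values are made up.
import MyPlugin from '../src/plugin/Plugin';
import { Log } from '@cdp-forge/plugin-pipeline-sdk';

describe('MyPlugin', () => {
  it('returns the log for the next stage', async () => {
    const plugin = new MyPlugin();
    await plugin.init();

    // Minimal Log fixture covering the required fields.
    const log: Log = {
      client: 1,
      date: new Date().toISOString(),
      device: { id: 'device-123' },
      event: 'page_view',
      instance: 1,
      page: { title: 'Home' },
      session: 'session-abc',
    };

    const result = await plugin.elaborate(log);
    expect(result).toEqual(log);
  });
});
```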
🐳 Docker Deployment
Build the image:
```bash
docker build -t plugin-pipeline-sdk .
```
Run the container:
```bash
docker run -p 3000:3000 plugin-pipeline-sdk
```
📊 Data Structure
The plugin processes Log objects that contain:
```typescript
interface Log {
  client: number;
  date: string;
  device: {
    browser?: string;
    id: string;
    ip?: string;
    os?: string;
    type?: string;
    userAgent?: string;
  };
  event: string;
  geo?: {
    city?: string;
    country?: string;
    point?: {
      type: string;
      coordinates: number[];
    };
    region?: string;
  };
  googleTopics?: GoogleTopic[];
  instance: number;
  page: {
    description?: string;
    href?: string;
    image?: string;
    title: string;
    type?: string;
  };
  product?: Product[];
  referrer?: string;
  session: string;
  target?: string;
  order?: string;
  [key: string]: any; // Allows additional properties
}
```
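For instance, an `elaborate` implementation can read and enrich these fields before passing the log on. The snippet below is an illustrative sketch; the `pageCategory` field and its categorization rule are invented for the example:

```typescript
import { PipelinePluginI, Log } from '@cdp-forge/plugin-pipeline-sdk';

// Sketch: enrich each log with a derived field before the next stage.
export default class EnrichPlugin implements PipelinePluginI {
  async elaborate(log: Log): Promise<Log | null> {
    // 'pageCategory' is a hypothetical extra property, permitted by the
    // [key: string]: any index signature on Log.
    log.pageCategory = log.page.type === 'product' ? 'commerce' : 'content';
    return log;
  }

  async init(): Promise<void> {}
}
```
📦 Publishing to NPM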
To publish this library to npm, see the Publishing Guide.
🤝 Contributing
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
📄 License
This project is distributed under the GPL-3.0 license. See the LICENSE file for more details.
📞 Support
For support and questions, please open an issue on the GitHub repository.
