@masteryhub-its/speakout-local-client-model
v0.0.4
Local text moderation library using an Arabic MiniBERT model with ONNX Runtime (Web/Browser)
@masteryhub-its/speakout-local-client-model
Professional-grade Arabic text moderation for browser environments
Powered by BERT with ONNX Runtime Web and WebAssembly for blazing-fast, client-side inference.
🎯 Overview
A production-ready TypeScript library for Arabic text content moderation that runs entirely in the browser. Built on a fine-tuned BERT model (asafaya/bert-mini-arabic) with INT8 quantization for optimal performance, this package provides real-time content filtering without server dependencies.
Key Features
- 🚀 High Performance - INT8 quantized ONNX model with WebAssembly acceleration
- 🌐 Client-Side - Zero backend dependencies, complete privacy
- 📦 Zero Configuration - Embedded models, works out of the box
- 🔒 Type-Safe - Full TypeScript support with comprehensive type definitions
- ⚡ Optimized - Max pooling aggregation for accurate multi-chunk analysis
- 🎯 Production-Ready - Battle-tested moderation logic with safety-first design
📦 Installation
npm install @masteryhub-its/speakout-local-client-model
Requirements
- Node.js: ≥ 18.0.0
- Browser: Modern browser with WebAssembly support
- TypeScript (optional): ≥ 5.3.3
🚀 Quick Start
Basic Usage
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
// Initialize the moderation client
const moderator = new ClientContentModeration();
await moderator.initialize();
// Moderate content
const result = await moderator.moderate('نص للمراجعة');
if (result.approved) {
console.log('✅ Content approved');
} else {
console.log('❌ Content rejected');
}
console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
React Integration
import { useEffect, useState } from 'react';
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
function useModerator() {
const [moderator, setModerator] = useState<ClientContentModeration | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
const init = async () => {
const mod = new ClientContentModeration();
await mod.initialize();
setModerator(mod);
setLoading(false);
};
init();
}, []);
return { moderator, loading };
}
function CommentForm() {
const { moderator, loading } = useModerator();
const handleSubmit = async (text: string) => {
if (!moderator) return;
const result = await moderator.moderate(text);
if (!result.approved) {
alert('Content violates community guidelines');
return;
}
// Submit approved content
};
// ... rest of component
}
🔧 Configuration
Vite Setup
Add WASM and ONNX support to your vite.config.ts:
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({
plugins: [react()],
// Required so Vite treats ONNX/WASM as static assets (served as bytes, not JS/HTML).
assetsInclude: ['**/*.onnx', '**/*.wasm'],
optimizeDeps: {
// Needed because dependency optimization uses esbuild, which doesn't know `.onnx` by default.
esbuildOptions: {
loader: {
'.onnx': 'file',
},
},
// NOTE: Keep `onnxruntime-web` out of dep optimization for best compatibility.
// If you still see dev-only issues, also exclude this package:
// exclude: ['onnxruntime-web', '@masteryhub-its/speakout-local-client-model'],
exclude: ['onnxruntime-web'],
},
server: {
fs: {
allow: ['..'], // Allow serving from node_modules
},
},
});
Custom Model / Tokenizer URLs (optional)
If your app needs to control where assets are loaded from (CDN, custom public path, etc.), you can pass explicit paths:
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
const moderator = new ClientContentModeration({
modelFilePath: '/assets/model.int8.onnx',
tokenizerFilePath: '/assets/tokenizer.json',
});
Webpack Configuration
module.exports = {
module: {
rules: [
{
test: /\.onnx$/,
type: 'asset/resource',
},
],
},
resolve: {
fallback: {
fs: false,
path: false,
},
},
};
📚 API Reference
ClientContentModeration
Main class for content moderation.
Constructor
new ClientContentModeration(options?: ModerationOptions)
Uses the embedded model and tokenizer by default; pass ModerationOptions (e.g. modelFilePath, tokenizerFilePath) to load assets from custom paths.
Methods
initialize(): Promise<void>
Initializes the ONNX model and tokenizer. Called automatically on first use, but can be called explicitly for better error handling.
const moderator = new ClientContentModeration();
await moderator.initialize(); // Explicit initialization
moderate(text: string, threshold?: number): Promise<ModerationResult>
Moderates a single text string.
Parameters:
- text (string): Text to moderate
- threshold (number, optional): Approval threshold (0-1), default: 0.5
Returns: ModerationResult
interface ModerationResult {
approved: boolean; // Whether content passes moderation
confidence: number; // Confidence score (0-1)
probabilities: {
reject: number; // Rejection probability (0-1)
approve: number; // Approval probability (0-1)
};
}
Example:
const result = await moderator.moderate('نص للمراجعة', 0.7);
console.log(result);
// {
// approved: true,
// confidence: 0.85,
// probabilities: { reject: 0.15, approve: 0.85 }
// }
moderateBatch(texts: string[], threshold?: number): Promise<ModerationResult[]>
Moderates multiple texts in parallel for better performance.
const texts = ['نص أول', 'نص ثاني', 'نص ثالث'];
const results = await moderator.moderateBatch(texts);
results.forEach((result, i) => {
console.log(`Text ${i + 1}: ${result.approved ? '✅' : '❌'}`);
});
dispose(): void
Releases resources and cleans up the ONNX session. Call when done using the moderator.
moderator.dispose();
💡 Advanced Usage
Custom Threshold
Adjust sensitivity based on your use case:
// Strict moderation (higher approval bar: more borderline content rejected)
const strict = await moderator.moderate(text, 0.8);
// Lenient moderation (lower approval bar: more borderline content approved)
const lenient = await moderator.moderate(text, 0.3);
// Balanced (default)
const balanced = await moderator.moderate(text, 0.5);
Error Handling
try {
const moderator = new ClientContentModeration();
await moderator.initialize();
const result = await moderator.moderate(userInput);
if (!result.approved) {
// Handle rejected content
console.warn('Content flagged:', result.probabilities);
}
} catch (error) {
console.error('Moderation failed:', error);
// Fallback: allow content or use server-side moderation
}
Performance Optimization
// Initialize once, reuse for all requests
const moderator = new ClientContentModeration();
await moderator.initialize(); // ~100-200ms initial load
// Subsequent calls are fast (~10-50ms per text)
const result1 = await moderator.moderate(text1);
const result2 = await moderator.moderate(text2);
// Batch processing for multiple texts
const results = await moderator.moderateBatch([text1, text2, text3]);
// Clean up when done
moderator.dispose();
🏗️ Architecture
Model Details
- Base Model: asafaya/bert-mini-arabic
- Task: Binary sequence classification (approve/reject)
- Quantization: INT8 for 4x smaller size and faster inference
- Max Sequence Length: 128 tokens
- Tokenizer: WordPiece with Unicode normalization
Processing Pipeline
- Tokenization - Text → BERT tokens with proper punctuation handling
- Chunking - Long texts split into 128-token chunks
- Inference - ONNX Runtime processes each chunk
- Aggregation - Max pooling on rejection probability (safety-first)
- Decision - Threshold-based approval/rejection
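The control flow of these steps can be sketched in TypeScript. This is a conceptual illustration only, not the library's internal code: the helper names (`chunkTokens`, `aggregate`, `decide`) are hypothetical, and the tokenization/inference steps are stubbed out.

```typescript
// Conceptual sketch of the chunking, aggregation, and decision steps.
type ChunkScores = { reject: number; approve: number };

// 2. Chunking: split token ids into windows of at most 128 tokens.
function chunkTokens(ids: number[], maxLen = 128): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < ids.length; i += maxLen) {
    chunks.push(ids.slice(i, i + maxLen));
  }
  return chunks;
}

// 4. Aggregation: max pooling on the rejection probability (safety-first).
function aggregate(perChunk: ChunkScores[]): ChunkScores {
  const reject = Math.max(...perChunk.map((s) => s.reject));
  return { reject, approve: 1 - reject };
}

// 5. Decision: approve only if the aggregated approve score clears the threshold.
function decide(scores: ChunkScores, threshold = 0.5): boolean {
  return scores.approve >= threshold;
}
```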
Safety-First Design
The library uses max pooling on rejection probabilities rather than averaging. This means:
- ✅ A single toxic chunk in long text → rejection
- ✅ Prevents dilution of toxic signals
- ✅ Better safety for user-generated content
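To see the difference concretely, compare max pooling with simple averaging on hypothetical per-chunk rejection probabilities:

```typescript
// Hypothetical per-chunk rejection probabilities: one toxic chunk among benign ones.
const rejectProbs = [0.05, 0.05, 0.95];

const mean = rejectProbs.reduce((a, b) => a + b, 0) / rejectProbs.length;
const max = Math.max(...rejectProbs);

// Averaging dilutes the toxic chunk to 0.35, below a 0.5 rejection bar,
// so the text would slip through. Max pooling keeps the 0.95 signal.
console.log(`mean: ${mean.toFixed(2)}, max: ${max.toFixed(2)}`);
```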
📊 Performance
| Metric | Value |
|--------|-------|
| Model Size | ~12 MB (INT8 quantized) |
| Initial Load | ~100-200 ms |
| Inference (per text) | ~10-50 ms |
| Memory Usage | ~50-100 MB |
| Browser Support | Chrome 91+, Firefox 89+, Safari 15+ |
🛠️ Development
Building from Source
# Clone repository
git clone <repository-url>
cd speakout-platform-local-model
# Install dependencies
npm install
# Build TypeScript
npm run build
# Format code
npm run format
# Format Python (if contributing to training scripts)
npm run format:py
Project Structure
├── src/ # TypeScript source
│ ├── index.ts # Main entry point
│ ├── model.ts # ONNX model wrapper
│ ├── tokenizer.ts # BERT tokenizer
│ ├── types.ts # Type definitions
│ └── utils/
│ └── constants.ts # Configuration constants
├── lib/ # Compiled JavaScript (generated)
├── models/ # ONNX model and tokenizer
│ └── bert-mini-moderation-output/
│ ├── model.int8.onnx
│ └── tokenizer.json
├── src/training/ # Python training scripts (not published)
├── src/data_processing/ # Data pipeline (not published)
└── tests/ # Test files
TypeScript Types
All types are exported for your convenience:
import type {
ModerationResult,
ModerationOptions,
TokenizerEncoding,
TokenizerVocab,
InferenceSession,
} from '@masteryhub-its/speakout-local-client-model';
🔒 Privacy & Security
- 100% Client-Side - No data sent to external servers
- No Telemetry - Zero tracking or analytics
- Offline Capable - Works without internet after initial load
- GDPR Compliant - No personal data collection
🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
Ways to Contribute
- 🐛 Report Bugs - Open an issue with detailed reproduction steps
- 💡 Suggest Features - Share your ideas for improvements
- 📝 Improve Documentation - Help make our docs better
- 🔧 Submit Code - Fix bugs or implement new features
- 🧪 Write Tests - Improve test coverage
- 🌍 Translate - Help with internationalization
Development Setup
Fork & Clone
git clone https://github.com/your-username/speakout-platform-local-model.git
cd speakout-platform-local-model
Install Dependencies
npm install
Make Changes
- Create a feature branch: git checkout -b feature/your-feature-name
- Write your code following our style guide
- Add tests if applicable
Test Your Changes
npm run build      # Ensure it builds
npm run format     # Format TypeScript/JavaScript
npm run format:py  # Format Python (if applicable)
Commit & Push
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
Open Pull Request
- Go to the repository on GitHub
- Click "New Pull Request"
- Describe your changes clearly
- Link any related issues
Code Style Guidelines
- TypeScript: Follow existing patterns, use proper types
- Python: Follow PEP 8, use Black formatter
- Commits: Use Conventional Commits
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- refactor: Code refactoring
- test: Adding tests
- chore: Maintenance tasks
Pull Request Guidelines
- ✅ Keep PRs focused on a single feature/fix
- ✅ Update documentation if needed
- ✅ Add tests for new functionality
- ✅ Ensure all checks pass
- ✅ Respond to review feedback promptly
Code of Conduct
We are committed to providing a welcoming and inclusive environment. Please:
- Be respectful and considerate
- Accept constructive criticism gracefully
- Focus on what's best for the community
- Show empathy towards others
📄 License
MIT License
Copyright (c) 2024-2026 MasteryHub ITS
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Third-Party Licenses
This project uses the following open-source libraries:
- ONNX Runtime Web - MIT License
- BERT Model (asafaya/bert-mini-arabic) - Apache 2.0 License
Copyright Notice
All original code and documentation:
- Copyright © 2024-2026 MasteryHub ITS
- Licensed under MIT License
Model files and training data:
- Based on asafaya/bert-mini-arabic (Apache 2.0)
- Fine-tuned by MasteryHub ITS
- Distributed under Apache 2.0 License
🙏 Acknowledgments
- BERT Model: asafaya/bert-mini-arabic
- ONNX Runtime: Microsoft ONNX Runtime Web
- Transformers: Hugging Face Transformers
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Made with ❤️ by MasteryHub ITS
Website • Documentation • npm
