@masteryhub-its/speakout-local-client-model
v0.0.4
Local text moderation library using an Arabic MiniBERT model with ONNX Runtime (Web/Browser)
@masteryhub-its/speakout-local-client-model
Professional-grade Arabic text moderation for browser environments
Powered by BERT with ONNX Runtime Web and WebAssembly for blazing-fast, client-side inference.
🎯 Overview
A production-ready TypeScript library for Arabic text content moderation that runs entirely in the browser. Built on a fine-tuned BERT model (asafaya/bert-mini-arabic) with INT8 quantization for optimal performance, this package provides real-time content filtering without server dependencies.
Key Features
- 🚀 High Performance - INT8 quantized ONNX model with WebAssembly acceleration
- 🌐 Client-Side - Zero backend dependencies, complete privacy
- 📦 Zero Configuration - Embedded models, works out of the box
- 🔒 Type-Safe - Full TypeScript support with comprehensive type definitions
- ⚡ Optimized - Max pooling aggregation for accurate multi-chunk analysis
- 🎯 Production-Ready - Battle-tested moderation logic with safety-first design
📦 Installation
npm install @masteryhub-its/speakout-local-client-model
Requirements
- Node.js: ≥ 18.0.0
- Browser: Modern browser with WebAssembly support
- TypeScript (optional): ≥ 5.3.3
🚀 Quick Start
Basic Usage
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
// Initialize the moderation client
const moderator = new ClientContentModeration();
await moderator.initialize();
// Moderate content
const result = await moderator.moderate('نص للمراجعة');
if (result.approved) {
console.log('✅ Content approved');
} else {
console.log('❌ Content rejected');
}
console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
React Integration
import { useEffect, useState } from 'react';
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
function useModerator() {
const [moderator, setModerator] = useState<ClientContentModeration | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
const init = async () => {
const mod = new ClientContentModeration();
await mod.initialize();
setModerator(mod);
setLoading(false);
};
init();
}, []);
return { moderator, loading };
}
function CommentForm() {
const { moderator, loading } = useModerator();
const handleSubmit = async (text: string) => {
if (!moderator) return;
const result = await moderator.moderate(text);
if (!result.approved) {
alert('Content violates community guidelines');
return;
}
// Submit approved content
};
// ... rest of component
}
🔧 Configuration
Vite Setup
Add WASM and ONNX support to your vite.config.ts:
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({
plugins: [react()],
// Required so Vite treats ONNX/WASM as static assets (served as bytes, not JS/HTML).
assetsInclude: ['**/*.onnx', '**/*.wasm'],
optimizeDeps: {
// Needed because dependency optimization uses esbuild, which doesn't know `.onnx` by default.
esbuildOptions: {
loader: {
'.onnx': 'file',
},
},
// NOTE: Keep `onnxruntime-web` out of dep optimization for best compatibility.
// If you still see dev-only issues, also exclude this package:
// exclude: ['onnxruntime-web', '@masteryhub-its/speakout-local-client-model'],
exclude: ['onnxruntime-web'],
},
server: {
fs: {
allow: ['..'], // Allow serving from node_modules
},
},
});
Custom Model / Tokenizer URLs (optional)
If your app needs to control where assets are loaded from (CDN, custom public path, etc.), you can pass explicit paths:
import { ClientContentModeration } from '@masteryhub-its/speakout-local-client-model';
const moderator = new ClientContentModeration({
modelFilePath: '/assets/model.int8.onnx',
tokenizerFilePath: '/assets/tokenizer.json',
});
Webpack Configuration
module.exports = {
module: {
rules: [
{
test: /\.onnx$/,
type: 'asset/resource',
},
],
},
resolve: {
fallback: {
fs: false,
path: false,
},
},
};
📚 API Reference
ClientContentModeration
Main class for content moderation.
Constructor
new ClientContentModeration(options?: ModerationOptions)
Uses the embedded model and tokenizer by default; pass ModerationOptions (e.g. modelFilePath, tokenizerFilePath) to load assets from custom paths.
Methods
initialize(): Promise<void>
Initializes the ONNX model and tokenizer. Called automatically on first use, but can be called explicitly for better error handling.
const moderator = new ClientContentModeration();
await moderator.initialize(); // Explicit initialization
moderate(text: string, threshold?: number): Promise<ModerationResult>
Moderates a single text string.
Parameters:
- text (string): Text to moderate
- threshold (number, optional): Approval threshold (0-1), default: 0.5
Returns: ModerationResult
interface ModerationResult {
approved: boolean; // Whether content passes moderation
confidence: number; // Confidence score (0-1)
probabilities: {
reject: number; // Rejection probability (0-1)
approve: number; // Approval probability (0-1)
};
}
Example:
const result = await moderator.moderate('نص للمراجعة', 0.7);
console.log(result);
// {
// approved: true,
// confidence: 0.85,
// probabilities: { reject: 0.15, approve: 0.85 }
// }
moderateBatch(texts: string[], threshold?: number): Promise<ModerationResult[]>
Moderates multiple texts in parallel for better performance.
const texts = ['نص أول', 'نص ثاني', 'نص ثالث'];
const results = await moderator.moderateBatch(texts);
results.forEach((result, i) => {
console.log(`Text ${i + 1}: ${result.approved ? '✅' : '❌'}`);
});
dispose(): void
Releases resources and cleans up the ONNX session. Call when done using the moderator.
moderator.dispose();
💡 Advanced Usage
Custom Threshold
Adjust sensitivity based on your use case:
// Strict moderation (higher approval bar: more borderline content rejected)
const strict = await moderator.moderate(text, 0.8);
// Lenient moderation (lower approval bar: more borderline content approved)
const lenient = await moderator.moderate(text, 0.3);
// Balanced (default)
const balanced = await moderator.moderate(text, 0.5);
Error Handling
try {
const moderator = new ClientContentModeration();
await moderator.initialize();
const result = await moderator.moderate(userInput);
if (!result.approved) {
// Handle rejected content
console.warn('Content flagged:', result.probabilities);
}
} catch (error) {
console.error('Moderation failed:', error);
// Fallback: allow content or use server-side moderation
}
Performance Optimization
// Initialize once, reuse for all requests
const moderator = new ClientContentModeration();
await moderator.initialize(); // ~100-200ms initial load
// Subsequent calls are fast (~10-50ms per text)
const result1 = await moderator.moderate(text1);
const result2 = await moderator.moderate(text2);
// Batch processing for multiple texts
const results = await moderator.moderateBatch([text1, text2, text3]);
// Clean up when done
moderator.dispose();
🏗️ Architecture
Model Details
- Base Model: asafaya/bert-mini-arabic
- Task: Binary sequence classification (approve/reject)
- Quantization: INT8 for 4x smaller size and faster inference
- Max Sequence Length: 128 tokens
- Tokenizer: WordPiece with Unicode normalization
Processing Pipeline
- Tokenization - Text → BERT tokens with proper punctuation handling
- Chunking - Long texts split into 128-token chunks
- Inference - ONNX Runtime processes each chunk
- Aggregation - Max pooling on rejection probability (safety-first)
- Decision - Threshold-based approval/rejection
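The control flow of these steps can be sketched in TypeScript. This is a conceptual illustration only, not the library's internal code: the helper names (`chunkTokens`, `aggregate`, `decide`) are hypothetical, and the tokenization/inference steps are stubbed out.

```typescript
// Conceptual sketch of the chunking, aggregation, and decision steps.
type ChunkScores = { reject: number; approve: number };

// 2. Chunking: split token ids into windows of at most 128 tokens.
function chunkTokens(ids: number[], maxLen = 128): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < ids.length; i += maxLen) {
    chunks.push(ids.slice(i, i + maxLen));
  }
  return chunks;
}

// 4. Aggregation: max pooling on the rejection probability (safety-first).
function aggregate(perChunk: ChunkScores[]): ChunkScores {
  const reject = Math.max(...perChunk.map((s) => s.reject));
  return { reject, approve: 1 - reject };
}

// 5. Decision: approve only if the aggregated approve score clears the threshold.
function decide(scores: ChunkScores, threshold = 0.5): boolean {
  return scores.approve >= threshold;
}
```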
Safety-First Design
The library uses max pooling on rejection probabilities rather than averaging. This means:
- ✅ A single toxic chunk in long text → rejection
- ✅ Prevents dilution of toxic signals
- ✅ Better safety for user-generated content
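To see the difference concretely, compare max pooling with simple averaging on hypothetical per-chunk rejection probabilities:

```typescript
// Hypothetical per-chunk rejection probabilities: one toxic chunk among benign ones.
const rejectProbs = [0.05, 0.05, 0.95];

const mean = rejectProbs.reduce((a, b) => a + b, 0) / rejectProbs.length;
const max = Math.max(...rejectProbs);

// Averaging dilutes the toxic chunk to 0.35, below a 0.5 rejection bar,
// so the text would slip through. Max pooling keeps the 0.95 signal.
console.log(`mean: ${mean.toFixed(2)}, max: ${max.toFixed(2)}`);
```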
📊 Performance
| Metric | Value |
|--------|-------|
| Model Size | ~12 MB (INT8 quantized) |
| Initial Load | ~100-200 ms |
| Inference (per text) | ~10-50 ms |
| Memory Usage | ~50-100 MB |
| Browser Support | Chrome 91+, Firefox 89+, Safari 15+ |
🛠️ Development
Building from Source
# Clone repository
git clone <repository-url>
cd speakout-platform-local-model
# Install dependencies
npm install
# Build TypeScript
npm run build
# Format code
npm run format
# Format Python (if contributing to training scripts)
npm run format:py
Project Structure
├── src/ # TypeScript source
│ ├── index.ts # Main entry point
│ ├── model.ts # ONNX model wrapper
│ ├── tokenizer.ts # BERT tokenizer
│ ├── types.ts # Type definitions
│ └── utils/
│ └── constants.ts # Configuration constants
├── lib/ # Compiled JavaScript (generated)
├── models/ # ONNX model and tokenizer
│ └── bert-mini-moderation-output/
│ ├── model.int8.onnx
│ └── tokenizer.json
├── src/training/ # Python training scripts (not published)
├── src/data_processing/ # Data pipeline (not published)
└── tests/ # Test files
TypeScript Types
All types are exported for your convenience:
import type {
ModerationResult,
ModerationOptions,
TokenizerEncoding,
TokenizerVocab,
InferenceSession,
} from '@masteryhub-its/speakout-local-client-model';
🔒 Privacy & Security
- 100% Client-Side - No data sent to external servers
- No Telemetry - Zero tracking or analytics
- Offline Capable - Works without internet after initial load
- GDPR Compliant - No personal data collection
🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, adding features, or improving documentation, your help is appreciated.
Ways to Contribute
- 🐛 Report Bugs - Open an issue with detailed reproduction steps
- 💡 Suggest Features - Share your ideas for improvements
- 📝 Improve Documentation - Help make our docs better
- 🔧 Submit Code - Fix bugs or implement new features
- 🧪 Write Tests - Improve test coverage
- 🌍 Translate - Help with internationalization
Development Setup
Fork & Clone
git clone https://github.com/your-username/speakout-platform-local-model.git
cd speakout-platform-local-model
Install Dependencies
npm install
Make Changes
- Create a feature branch: git checkout -b feature/your-feature-name
- Write your code following our style guide
- Add tests if applicable
Test Your Changes
npm run build      # Ensure it builds
npm run format     # Format TypeScript/JavaScript
npm run format:py  # Format Python (if applicable)
Commit & Push
git add .
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name
Open Pull Request
- Go to the repository on GitHub
- Click "New Pull Request"
- Describe your changes clearly
- Link any related issues
Code Style Guidelines
- TypeScript: Follow existing patterns, use proper types
- Python: Follow PEP 8, use Black formatter
- Commits: Use Conventional Commits
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- refactor: Code refactoring
- test: Adding tests
- chore: Maintenance tasks
Pull Request Guidelines
- ✅ Keep PRs focused on a single feature/fix
- ✅ Update documentation if needed
- ✅ Add tests for new functionality
- ✅ Ensure all checks pass
- ✅ Respond to review feedback promptly
Code of Conduct
We are committed to providing a welcoming and inclusive environment. Please:
- Be respectful and considerate
- Accept constructive criticism gracefully
- Focus on what's best for the community
- Show empathy towards others
📄 License
MIT License
Copyright (c) 2024-2026 MasteryHub ITS
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Third-Party Licenses
This project uses the following open-source libraries:
- ONNX Runtime Web - MIT License
- BERT Model (asafaya/bert-mini-arabic) - Apache 2.0 License
Copyright Notice
All original code and documentation:
- Copyright © 2024-2026 MasteryHub ITS
- Licensed under MIT License
Model files and training data:
- Based on asafaya/bert-mini-arabic (Apache 2.0)
- Fine-tuned by MasteryHub ITS
- Distributed under Apache 2.0 License
🙏 Acknowledgments
- BERT Model: asafaya/bert-mini-arabic
- ONNX Runtime: Microsoft ONNX Runtime Web
- Transformers: Hugging Face Transformers
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Made with ❤️ by MasteryHub ITS
Website • Documentation • npm
