word-sensor

v2.0.0

Published 5 months ago

A powerful and flexible word filtering library for JavaScript/TypeScript with advanced features like regex patterns, statistics, and batch processing

Vulnerabilities: 0 High, 0 Medium, 0 Low

asruldev

word-filter bad-words profanity content-moderation text-filtering spam-detection phishing-detection regex-patterns batch-processing statistics typescript javascript

WordSensor v2.0.0 🚀

WordSensor is a powerful and flexible word filtering library for JavaScript/TypeScript. It helps you detect, replace, or remove forbidden words from text with advanced features like regex patterns, statistics, batch processing, and more.

✨ Features

🔍 Advanced Detection: Detect prohibited words with precise positioning
🚫 Multiple Filtering Modes: Replace, remove, or highlight forbidden words
🎭 Smart Masking: Full, partial, or smart masking options
📊 Statistics & Analytics: Track detections and get detailed insights
🔧 Regex Support: Use custom regex patterns for complex filtering
📦 Batch Processing: Process multiple texts efficiently
🎯 Preset Filters: Ready-to-use profanity, spam, and phishing filters
🔄 Custom Replacers: Create custom replacement functions
📈 Real-time Monitoring: Log and track all detections
🌐 API Integration: Load forbidden words from external APIs
📁 File Support: Import word lists from files
⚡ High Performance: Optimized for speed and memory efficiency
🎨 Emoji Replacers: Replace words with emojis
🔒 Word Boundaries: Configurable word boundary detection
📝 TypeScript Support: Full TypeScript definitions included

📦 Installation

npm install word-sensor

yarn add word-sensor

🚀 Quick Start

Basic Usage

import { WordSensor } from 'word-sensor';

// Create a sensor with forbidden words
const sensor = new WordSensor({
  words: ['badword', 'offensive', 'rude'],
  maskChar: '*',
  caseInsensitive: true,
  logDetections: true
});

// Filter text
const result = sensor.filter('This is a badword test.');
console.log(result); // "This is a ******* test."

Using Preset Filters

import { createProfanityFilter, createSpamFilter, createPhishingFilter } from 'word-sensor';

// Create specialized filters
const profanityFilter = createProfanityFilter();
const spamFilter = createSpamFilter();
const phishingFilter = createPhishingFilter();

// Use them
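console.log(profanityFilter.filter('This is badword content.')); // e.g. "This is ******* content." if "badword" is in the preset list
console.log(spamFilter.filter('Buy now! Free money!')); // matched spam terms are masked; exact output depends on the preset list and mask character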

📚 API Reference

WordSensor Class

Constructor

new WordSensor(config?: WordSensorConfig)

Configuration Options:

words?: string[] - Initial list of forbidden words
maskChar?: string - Character used for masking (default: "*")
caseInsensitive?: boolean - Case-insensitive matching (default: true)
logDetections?: boolean - Enable detection logging (default: false)
enableRegex?: boolean - Enable regex pattern support (default: false)
wordBoundary?: boolean - Use word boundaries (default: true)
customReplacer?: (word: string, context: string) => string - Custom replacement function
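
The options and defaults listed above can be summarized as a type plus a defaults step. This is a minimal sketch reconstructed from that list: the exported `WordSensorConfig` type may differ, `resolveConfig` is a hypothetical helper that is not part of word-sensor, and the empty default for `words` is an assumption.

```typescript
// Shape reconstructed from the documented options; the real exported type may differ.
interface WordSensorConfig {
  words?: string[];
  maskChar?: string;
  caseInsensitive?: boolean;
  logDetections?: boolean;
  enableRegex?: boolean;
  wordBoundary?: boolean;
  customReplacer?: (word: string, context: string) => string;
}

// Hypothetical helper (not part of word-sensor): fills in the documented defaults.
function resolveConfig(config: WordSensorConfig = {}) {
  return {
    words: config.words ?? [], // assumption: no words unless provided
    maskChar: config.maskChar ?? '*',
    caseInsensitive: config.caseInsensitive ?? true,
    logDetections: config.logDetections ?? false,
    enableRegex: config.enableRegex ?? false,
    wordBoundary: config.wordBoundary ?? true,
    customReplacer: config.customReplacer,
  };
}

const resolved = resolveConfig({ words: ['badword'], maskChar: '#' });
console.log(resolved.maskChar, resolved.caseInsensitive, resolved.wordBoundary);
// prints: # true true
```

Only the fields passed in are overridden; everything else keeps the default shown in the option list.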

Core Methods

`filter(text: string, mode?: "replace" | "remove" | "highlight", maskType?: "full" | "partial" | "smart"): string`

Filter text with specified mode and masking type.

// Replace with full masking
sensor.filter('This is badword.'); // "This is *******."

// Remove forbidden words
sensor.filter('This is badword.', 'remove'); // "This is ."

// Highlight forbidden words
sensor.filter('This is badword.', 'highlight'); // "This is [FILTERED: badword]."

// Smart masking
sensor.filter('This is badword.', 'replace', 'smart'); // "This is b****d."
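
To make the masking types more concrete, here is a rough sketch of two length-preserving masks. `maskWord` is a hypothetical helper, and the "keep the first and last character" rule for smart masking is an assumption; the library's actual smart and partial rules are not specified here, so its output may differ.

```typescript
type MaskType = 'full' | 'smart';

// Hypothetical helper (not part of word-sensor). "partial" is omitted
// because this README does not define how it behaves.
function maskWord(word: string, maskChar = '*', maskType: MaskType = 'full'): string {
  if (maskType === 'full' || word.length <= 2) {
    // Full mask: one mask character per original character.
    return maskChar.repeat(word.length);
  }
  // Assumed "smart" rule: keep the first and last character visible.
  return word[0] + maskChar.repeat(word.length - 2) + word[word.length - 1];
}

console.log(maskWord('offensive'));               // "*********"
console.log(maskWord('offensive', '*', 'smart')); // "o*******e"
```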

`detect(text: string): string[]`

Detect all forbidden words in text.

const detected = sensor.detect('This contains badword and offensive content.');
console.log(detected); // ["badword", "offensive"]

`detectWithPositions(text: string): Array<{word: string, start: number, end: number}>`

Detect forbidden words with their positions.

const positions = sensor.detectWithPositions('This badword is offensive.');
console.log(positions);
// [
//   { word: "badword", start: 5, end: 12 },
//   { word: "offensive", start: 16, end: 25 }
// ]
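
The positions in the example are consistent with a zero-based `start` and an exclusive `end`, so `text.slice(start, end)` returns the matched word. The sketch below shows one way such positions can be computed with the built-in `RegExp`; `findPositions` is a hypothetical re-implementation for illustration, not the library's code.

```typescript
interface Match { word: string; start: number; end: number; }

// Hypothetical re-implementation for illustration; not the library's code.
function findPositions(text: string, words: string[]): Match[] {
  if (words.length === 0) return []; // avoid building an empty alternation
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp(`\\b(?:${escaped.join('|')})\\b`, 'gi');
  const matches: Match[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    // start: index of the first character; end: one past the last character.
    matches.push({ word: m[0].toLowerCase(), start: m.index, end: m.index + m[0].length });
  }
  return matches;
}

const sampleText = 'This badword is offensive.';
const found = findPositions(sampleText, ['badword', 'offensive']);
console.log(found);
// [ { word: 'badword', start: 5, end: 12 }, { word: 'offensive', start: 16, end: 25 } ]
console.log(sampleText.slice(found[0].start, found[0].end)); // "badword"
```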

Word Management

// Add words
sensor.addWord('newbadword', '###'); // With custom mask
sensor.addWords(['word1', 'word2']);

// Remove words
sensor.removeWord('badword');
sensor.removeWords(['word1', 'word2']);

// Check words
sensor.hasWord('badword'); // true/false
sensor.getWords(); // Get all forbidden words
sensor.clearWords(); // Clear all words

Regex Patterns

// Enable regex support
const regexSensor = new WordSensor({ enableRegex: true });

// Add regex patterns
regexSensor.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
regexSensor.addRegexPattern('\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b', '[CARD]');

// Filter with regex
const result = regexSensor.filter('Contact me at user@example.com');
console.log(result); // "Contact me at [EMAIL]"
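
It can help to try a pattern against the built-in `RegExp` before registering it. The sketch below compiles the two pattern strings from the example above and applies them with `String.prototype.replace`; it does not use word-sensor, and the sample inputs are placeholders.

```typescript
// The two pattern strings from the example above, compiled with the built-in RegExp.
const emailPattern = new RegExp('\\b\\w+@\\w+\\.\\w+\\b', 'g');
const cardPattern = new RegExp('\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b', 'g');

// user@example.com and the card number are placeholder values.
console.log('Contact me at user@example.com'.replace(emailPattern, '[EMAIL]'));
// "Contact me at [EMAIL]"

console.log('Card: 1234-5678-9012-3456'.replace(cardPattern, '[CARD]'));
// "Card: [CARD]"

// \w does not match "." or "-", so dotted addresses are only partly covered.
console.log('first.last@mail.example.com'.replace(emailPattern, '[EMAIL]'));
// "first.[EMAIL].com"
```

The last case suggests that a stricter email pattern may be needed if addresses with dots or hyphens must be fully masked.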

Statistics & Monitoring

// Get detection statistics
const stats = sensor.getStats();
console.log(stats);
// {
//   totalDetections: 5,
//   uniqueWords: ["badword", "offensive"],
//   detectionCounts: { "badword": 3, "offensive": 2 },
//   lastDetectionTime: Date
// }

// Get detection logs
const logs = sensor.getDetectionLogs();
console.log(logs); // ["badword", "offensive", "badword", ...]

// Reset statistics
sensor.resetStats();

Configuration Methods

// Update configuration
sensor.setMaskChar('#');
sensor.setCaseInsensitive(false);
sensor.setLogDetections(true);
sensor.setCustomReplacer((word) => `[${word.toUpperCase()}]`);

Utility Methods

// Check if text is clean
sensor.isClean('This is clean text.'); // true
sensor.isClean('This has badword.'); // false

// Get clean percentage
sensor.getCleanPercentage('This badword is offensive.'); // 50

// Sanitize text (quick filter)
sensor.sanitizeText('This is badword.'); // "This is *******."
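
The value 50 for `getCleanPercentage` above is consistent with counting the share of words that are not forbidden (2 clean words out of 4). The following sketch shows that calculation under this assumption; `cleanPercentage` is a hypothetical function, and the library may tokenize or round differently.

```typescript
// Hypothetical calculation: share of whitespace-separated tokens that are not forbidden.
function cleanPercentage(text: string, forbidden: string[]): number {
  const banned = new Set(forbidden.map(w => w.toLowerCase()));
  const tokens = text
    .split(/\s+/)
    .map(t => t.toLowerCase().replace(/^\W+|\W+$/g, '')) // strip surrounding punctuation
    .filter(t => t.length > 0);
  if (tokens.length === 0) return 100; // assumption: empty text counts as fully clean
  const clean = tokens.filter(t => !banned.has(t)).length;
  return (clean / tokens.length) * 100;
}

console.log(cleanPercentage('This badword is offensive.', ['badword', 'offensive'])); // 50
console.log(cleanPercentage('This is clean text.', ['badword'])); // 100
```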

Utility Functions

Preset Filters

import { 
  createProfanityFilter, 
  createSpamFilter, 
  createPhishingFilter,
  PRESET_WORDS 
} from 'word-sensor';

// Create specialized filters
const profanityFilter = createProfanityFilter('*');
const spamFilter = createSpamFilter('#');
const phishingFilter = createPhishingFilter('!');

// Access preset word lists
console.log(PRESET_WORDS.profanity);
console.log(PRESET_WORDS.spam);
console.log(PRESET_WORDS.phishing);

Batch Processing

import { batchFilter, batchDetect, getBatchStats } from 'word-sensor';

const texts = [
  'This is bad.',
  'This is offensive.',
  'This is clean.'
];

// Batch filter (assumes `sensor` was created with a word list that includes 'bad' and 'offensive')
const filtered = batchFilter(texts, sensor);
console.log(filtered);
// ["This is ***.", "This is *********.", "This is clean."]

// Batch detect
const detected = batchDetect(texts, sensor);
console.log(detected);
// [
//   { text: "This is bad.", detected: ["bad"] },
//   { text: "This is offensive.", detected: ["offensive"] },
//   { text: "This is clean.", detected: [] }
// ]

// Get batch statistics
const stats = getBatchStats(texts, sensor);
console.log(stats);
// {
//   totalTexts: 3,
//   cleanTexts: 1,
//   dirtyTexts: 2,
//   totalDetections: 2,
//   averageCleanPercentage: 66.67
// }

Custom Replacers

import { createCustomReplacer, createEmojiReplacer } from 'word-sensor';

// Create custom replacer
const customReplacer = createCustomReplacer({
  'bad': 'good',
  'offensive': 'appropriate',
  'rude': 'polite'
});

// Create emoji replacer
const emojiReplacer = createEmojiReplacer();

// Use with sensor (pick one; the config holds a single customReplacer)
sensor.setCustomReplacer(customReplacer);
// or
sensor.setCustomReplacer(emojiReplacer);
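
A replacer is a function with the `(word, context) => string` signature listed in the configuration options. As a rough illustration of how a map-based replacer could behave, here is a hypothetical stand-in named `makeMapReplacer`; the fallback to a full mask for unmapped words is an assumption, not documented behavior of `createCustomReplacer`.

```typescript
type Replacer = (word: string, context: string) => string;

// Hypothetical stand-in for a map-based replacer (not the library's implementation).
function makeMapReplacer(replacements: Record<string, string>, maskChar = '*'): Replacer {
  return (word) => {
    const key = word.toLowerCase();
    // Only use own properties, so keys like "constructor" are not picked up by accident.
    const hit: unknown = Object.prototype.hasOwnProperty.call(replacements, key)
      ? replacements[key]
      : undefined;
    // Assumed fallback: mask words that have no mapped replacement.
    return typeof hit === 'string' ? hit : maskChar.repeat(word.length);
  };
}

const swap = makeMapReplacer({ bad: 'good', rude: 'polite' });
console.log(swap('Bad', 'This is Bad.')); // "good"
console.log(swap('nasty', 'How nasty.')); // "*****"
```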

Regex Utilities

import { validateRegexPattern, escapeRegexSpecialChars } from 'word-sensor';

// Validate regex pattern
validateRegexPattern('\\b\\w+\\b'); // true
validateRegexPattern('invalid['); // false

// Escape special characters
escapeRegexSpecialChars('test.com'); // "test\\.com"
escapeRegexSpecialChars('test*test'); // "test\\*test"
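
For reference, both checks can be expressed with the standard `RegExp` constructor. The functions below are hypothetical equivalents written for illustration; they are not the library's implementations and use different names to make that clear.

```typescript
// Hypothetical equivalents built on the standard RegExp constructor.
function isValidPattern(pattern: string): boolean {
  try {
    new RegExp(pattern); // throws a SyntaxError for malformed patterns
    return true;
  } catch {
    return false;
  }
}

function escapeSpecialChars(text: string): string {
  // Prefix every regex metacharacter with a backslash.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(isValidPattern('\\b\\w+\\b')); // true
console.log(isValidPattern('invalid['));   // false
console.log(escapeSpecialChars('test.com')); // test\.com
// After escaping, "." only matches a literal dot.
console.log(new RegExp(escapeSpecialChars('test.com')).test('testXcom')); // false
```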

API Integration

import { loadForbiddenWordsFromAPI, loadWordsFromFile } from 'word-sensor';

// Load from API
await loadForbiddenWordsFromAPI(
  'https://api.example.com/forbidden-words',
  'data.words',
  sensor
);

// Load from file (browser)
const fileInput = document.getElementById('file') as HTMLInputElement;
const file = fileInput.files?.[0]; // `files` can be null, so use optional chaining
if (file) {
  const words = await loadWordsFromFile(file);
  sensor.addWords(words);
}
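
The second argument in the API example, `'data.words'`, reads like a dot-separated path into the JSON response. Assuming that interpretation, the sketch below shows how such a path can be resolved; `getByPath` and `extractWords` are hypothetical helpers, and the sample response is made up for illustration.

```typescript
// Hypothetical helper: walks a dot-separated path such as "data.words"
// through a parsed JSON value.
function getByPath(value: unknown, path: string): unknown {
  let current: unknown = value;
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Keeps only string entries, so a malformed response cannot add
// non-string values to a word list.
function extractWords(response: unknown, path: string): string[] {
  const value = getByPath(response, path);
  return Array.isArray(value)
    ? value.filter((w): w is string => typeof w === 'string')
    : [];
}

// Made-up response body for illustration.
const sampleResponse = { data: { words: ['badword', 'offensive', 42] } };
console.log(extractWords(sampleResponse, 'data.words'));   // [ 'badword', 'offensive' ]
console.log(extractWords(sampleResponse, 'data.missing')); // []
```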

🎯 Advanced Examples

Content Moderation System

import { WordSensor, createProfanityFilter, createSpamFilter } from 'word-sensor';

class ContentModerator {
  private profanityFilter: WordSensor;
  private spamFilter: WordSensor;
  private customFilter: WordSensor;

  constructor() {
    this.profanityFilter = createProfanityFilter();
    this.spamFilter = createSpamFilter();
    this.customFilter = new WordSensor({
      enableRegex: true,
      wordBoundary: false
    });

    // Add custom patterns
    this.customFilter.addRegexPattern('\\b\\w+@\\w+\\.\\w+\\b', '[EMAIL]');
    this.customFilter.addRegexPattern('\\b\\d{10,}\\b', '[PHONE]');
  }

  moderateContent(content: string): {
    isClean: boolean;
    filteredContent: string;
    violations: string[];
    stats: any;
  } {
    // Apply all filters
    let filteredContent = content;
    const violations: string[] = [];

    // Check profanity
    const profanityDetected = this.profanityFilter.detect(content);
    if (profanityDetected.length > 0) {
      violations.push('profanity');
      filteredContent = this.profanityFilter.filter(filteredContent);
    }

    // Check spam
    const spamDetected = this.spamFilter.detect(content);
    if (spamDetected.length > 0) {
      violations.push('spam');
      filteredContent = this.spamFilter.filter(filteredContent);
    }

    // Apply custom filters
    filteredContent = this.customFilter.filter(filteredContent);

    return {
      isClean: violations.length === 0,
      filteredContent,
      violations,
      stats: {
        profanity: this.profanityFilter.getStats(),
        spam: this.spamFilter.getStats(),
        custom: this.customFilter.getStats()
      }
    };
  }
}

// Usage
const moderator = new ContentModerator();
const result = moderator.moderateContent('This is badword spam content with user@example.com');
console.log(result);

Real-time Chat Filter

import { WordSensor, createEmojiReplacer } from 'word-sensor';

class ChatFilter {
  private sensor: WordSensor;
  private messageHistory: string[] = [];

  constructor() {
    this.sensor = new WordSensor({
      words: ['badword', 'offensive'],
      logDetections: true,
      customReplacer: createEmojiReplacer()
    });
  }

  processMessage(message: string, userId: string): {
    filteredMessage: string;
    isClean: boolean;
    warning: string | null;
  } {
    const filteredMessage = this.sensor.filter(message);
    const isClean = this.sensor.isClean(message);
    
    // Check user history
    const userViolations = this.messageHistory.filter(msg => 
      msg.includes(userId) && !this.sensor.isClean(msg)
    ).length;

    let warning: string | null = null;
    if (!isClean) {
      if (userViolations >= 3) {
        warning = 'You have been warned multiple times. Further violations may result in a ban.';
      } else {
        warning = 'Please keep the chat appropriate.';
      }
    }

    // Log message
    this.messageHistory.push(`${userId}: ${message}`);

    return { filteredMessage, isClean, warning };
  }

  getModerationStats() {
    return this.sensor.getStats();
  }
}
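
One caveat in the class above: `msg.includes(userId)` matches any history entry that contains the ID as a substring, so `user1` would also match messages from `user10` or messages that merely mention the ID. A possible alternative is to count violations per user ID directly. The sketch below uses a hypothetical `ViolationTracker` class that is not part of word-sensor.

```typescript
// Hypothetical helper: counts violations per user ID instead of scanning message strings.
class ViolationTracker {
  private counts = new Map<string, number>();

  // Records one violation and returns the user's updated total.
  record(userId: string): number {
    const next = (this.counts.get(userId) ?? 0) + 1;
    this.counts.set(userId, next);
    return next;
  }

  // Returns the number of recorded violations (0 for unknown users).
  get(userId: string): number {
    return this.counts.get(userId) ?? 0;
  }
}

const tracker = new ViolationTracker();
tracker.record('user1');
tracker.record('user1');
tracker.record('user10');
console.log(tracker.get('user1'), tracker.get('user10'), tracker.get('user2')); // 2 1 0
```

In `processMessage`, `record(userId)` could be called whenever a message is not clean, and the returned total compared against the warning threshold.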

Batch Content Analysis

import { WordSensor, batchDetect, getBatchStats } from 'word-sensor';

class ContentAnalyzer {
  private sensor: WordSensor;

  constructor() {
    this.sensor = new WordSensor({
      words: ['inappropriate', 'spam', 'offensive'],
      logDetections: true
    });
  }

  analyzeBatch(contentList: string[]): {
    summary: any;
    details: Array<{
      content: string;
      isClean: boolean;
      detectedWords: string[];
      cleanPercentage: number;
    }>;
  } {
    const batchResults = batchDetect(contentList, this.sensor);
    const batchStats = getBatchStats(contentList, this.sensor);

    const details = contentList.map((content, index) => ({
      content,
      isClean: batchResults[index].detected.length === 0,
      detectedWords: batchResults[index].detected,
      cleanPercentage: this.sensor.getCleanPercentage(content)
    }));

    return {
      summary: {
        ...batchStats,
        sensorStats: this.sensor.getStats()
      },
      details
    };
  }
}

🧪 Testing

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

📦 Build

# Build for production
npm run build

# Build in watch mode
npm run dev

# Clean build artifacts
npm run clean

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add some amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Developed by Asrul Harahap.

GitHub: @asruldev
Twitter: @asruldev

🙏 Acknowledgments

Thanks to all contributors who helped improve this library
Inspired by the need for better content moderation tools
Built with TypeScript for better developer experience

📈 Changelog

v2.0.0

✨ Major Release: Complete rewrite with advanced features
🔧 New Constructor: Config-based initialization
📊 Statistics: Comprehensive detection tracking
🔍 Regex Support: Custom regex pattern filtering
📦 Batch Processing: Efficient multi-text processing
🎯 Preset Filters: Ready-to-use specialized filters
🎨 Custom Replacers: Flexible replacement functions
📈 Position Detection: Get exact word positions
🔄 Smart Masking: Intelligent masking algorithms
🌐 API Integration: External word list loading
📁 File Support: Import word lists from files
🎨 Emoji Replacers: Fun emoji-based replacements
📝 Enhanced Types: Better TypeScript support
🧪 Comprehensive Tests: 36 test cases covering all features

v1.0.5

🐛 Bug fixes and improvements
📝 Better documentation

⭐ Star this repository if you find it useful!
