prompt-bouncer

v1.0.8

Published

3 months ago

A lightweight, customizable content moderation library for AI applications. Filters profanity, explicit content, and inappropriate prompts for text-to-image generation.

kaweendras

content-moderation profanity-filter ai-safety text-filtering content-safety moderation nsfw-filter ai-content-filter typescript

Prompt Bouncer

A lightweight, customizable content moderation library for AI applications. Perfect for filtering profanity, explicit content, and inappropriate prompts for text-to-image generation and other AI services.

🚀 Features

🛡️ Multi-Category Filtering: Profanity, explicit content, violence, drugs, hate speech, and self-harm detection
⚙️ Highly Configurable: Enable/disable specific categories, add custom words, whitelist exceptions
🎯 AI-Focused: Specifically designed for AI prompt filtering and content moderation
🪶 Lightweight: Zero external dependencies, pure TypeScript implementation
🔧 Flexible: Class-based API with convenient utility functions
📊 Detailed Results: Get flagged words, categories, confidence scores, and cleaned text
🔍 Smart Detection: Handles leetspeak, character substitutions, and word boundaries
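
The "Smart Detection" behavior can be pictured with a minimal sketch. This is not prompt-bouncer's internal code: the substitution table `LEET_MAP` and the helpers `normalizeLeet`, `escapeRegExp`, and `findMatches` are hypothetical names introduced only to illustrate leetspeak normalization combined with word-boundary matching.

```typescript
// Illustrative only: NOT the library's implementation.
// Hypothetical table mapping common leetspeak characters back to letters.
const LEET_MAP: Record<string, string> = {
  "0": "o",
  "1": "i",
  "3": "e",
  "4": "a",
  "5": "s",
  "7": "t",
  "@": "a",
  "$": "s",
};

// Lowercase the text and undo the character substitutions listed above.
function normalizeLeet(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => LEET_MAP[ch] ?? ch)
    .join("");
}

// Escape regex metacharacters so a banned word is matched literally.
function escapeRegExp(word: string): string {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the banned words found in the normalized text.
// With strict boundaries, a banned word is not matched inside a longer word.
function findMatches(
  text: string,
  banned: string[],
  strictWordBoundaries = true
): string[] {
  const normalized = normalizeLeet(text);
  return banned.filter((word) => {
    const body = escapeRegExp(word.toLowerCase());
    const pattern = strictWordBoundaries ? `\\b${body}\\b` : body;
    return new RegExp(pattern).test(normalized);
  });
}
```

A naive table like this also rewrites ordinary digits, so a real filter would likely need more care around numbers than this sketch takes.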

📦 Installation

npm install prompt-bouncer

🔧 Quick Start

Basic Usage

import { moderate, isSafe } from "prompt-bouncer";

// Quick safety check
if (isSafe("Create a beautiful sunset image")) {
  console.log("Content is safe!");
}

// Detailed moderation with recommended approach
const result = moderate("This is inappropriate content");
console.log(result);
// {
//   isSafe: false,
//   reason: "Content contains profanity violations",
//   flaggedWords: ['inappropriate'],
//   categories: ['profanity'],
//   confidence: 0.25,
//   severity: 'medium',
//   cleanedText: 'This is ************* content'
// }

// ✅ RECOMMENDED: Use severity-based logic for better UX
const shouldAllow = result.isSafe || result.severity === "low";
if (shouldAllow) {
  console.log("✅ Content allowed");
} else {
  console.log("❌ Content blocked:", result.categories);
}
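
The two-way allow/block check above could be extended to a three-way decision. The sketch below is an assumption about how an application might act on the result, not part of the library: `ModerationOutcome` is a local type that mirrors only the fields shown in the example output, and `Action` and `decideAction` are hypothetical names.

```typescript
// Local structural type mirroring fields from the example output above.
// The type exported by the library may carry more fields.
interface ModerationOutcome {
  isSafe: boolean;
  severity: string;
  categories: string[];
}

// Hypothetical application-level actions.
type Action = "allow" | "allow_with_warning" | "block";

// Safe content passes, low-severity content passes with a warning,
// everything else is blocked.
function decideAction(result: ModerationOutcome): Action {
  if (result.isSafe) return "allow";
  if (result.severity === "low") return "allow_with_warning";
  return "block";
}
```

Because the parameter is typed structurally, the object returned by `moderate(...)` should be usable here as long as it has these three fields.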

Advanced Usage

import { AIContentFilter } from "prompt-bouncer";

// Recommended configuration for most applications
const filter = new AIContentFilter({
  enableProfanityFilter: true, // Block offensive language
  enableExplicitFilter: true, // Block adult content
  enableViolenceFilter: true, // Block violence and threats
  enableSelfHarmFilter: true, // Block self-harm content
  enableDrugsFilter: false, // Optional - context dependent
  enableHateSpeechFilter: true, // Block hate speech
  enableMildFilter: false, // Keep disabled for contextual usage
  customBannedWords: ["mycustomword"],
  allowedWords: ["damn"], // Whitelist specific words
  strictWordBoundaries: true,
  caseSensitive: false,
});

// Moderate content with severity-based decision
const result = filter.moderate("Generate an image of...");
const shouldAllow = result.isSafe || result.severity === "low";

// Clean text (replace flagged words with asterisks)
const cleaned = filter.clean("Remove bad words from this text");

// Get only flagged words
const flaggedWords = filter.getFlaggedWords("Text to analyze");

// Dynamic configuration updates
filter.updateConfig({
  enableDrugsFilter: true, // Enable for stricter filtering
  customBannedWords: ["newword"],
});

🎯 Use Cases

Text-to-Image Generation

import { moderate } from "prompt-bouncer";

function generateImage(prompt: string) {
  const moderation = moderate(prompt);

  if (!moderation.isSafe) {
    throw new Error(`Content policy violation: ${moderation.reason}`);
  }

  // Proceed with image generation.
  // callImageGenerationAPI is a placeholder for your own image-generation client.
  return callImageGenerationAPI(prompt);
}

Chat Applications

import { AIContentFilter } from "prompt-bouncer";

const chatFilter = new AIContentFilter({
  enableProfanityFilter: true,
  enableExplicitFilter: true,
  strictWordBoundaries: true,
});

function moderateMessage(message: string) {
  const result = chatFilter.moderate(message);

  if (!result.isSafe) {
    return {
      allowed: false,
      cleanedMessage: result.cleanedText,
      warning: `Message contains ${result.categories.join(", ")} content`,
    };
  }

  return { allowed: true, message };
}

Content Management

import { moderate } from "prompt-bouncer";

function validateUserContent(content: string) {
  const result = moderate(content, {
    enableProfanityFilter: true,
    enableExplicitFilter: true,
    enableViolenceFilter: true,
    enableSelfHarmFilter: true,
    enableHateSpeechFilter: true,
    enableMildFilter: false, // Allow contextual usage
  });

  // Use severity-based logic for better user experience
  const canPublish = result.isSafe || result.severity === "low";

  return {
    canPublish,
    isSafe: result.isSafe,
    severity: result.severity,
    issues: result.flaggedWords,
    categories: result.categories,
    suggestion: canPublish ? null : "Please revise your content",
    cleanedVersion: result.cleanedText,
  };
}

⚙️ Configuration Options

| Option                 | Type     | Default | Description                                                |
| ---------------------- | -------- | ------- | ---------------------------------------------------------- |
| enableProfanityFilter  | boolean  | true    | Enable profanity detection                                 |
| enableExplicitFilter   | boolean  | true    | Enable explicit content detection                          |
| enableViolenceFilter   | boolean  | true    | Enable violence detection                                  |
| enableSelfHarmFilter   | boolean  | true    | Enable self-harm/suicide detection                         |
| enableDrugsFilter      | boolean  | false   | Enable drug-related content detection                      |
| enableHateSpeechFilter | boolean  | true    | Enable hate speech detection                               |
| enableMildFilter       | boolean  | false   | Enable mild content detection (recommended: keep disabled) |
| customBannedWords      | string[] | []      | Additional words to ban                                    |
| allowedWords           | string[] | []      | Words to allow (whitelist)                                 |
| caseSensitive          | boolean  | false   | Case-sensitive matching                                    |
| strictWordBoundaries   | boolean  | true    | Prevent partial word matches                               |

🔧 Recommended: Keep enableMildFilter: false to allow contextual words (gaming, technical terms, mild expressions) and check severity === "low" in your application logic instead.
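
As a rough illustration of the defaults in the table, the sketch below merges partial overrides onto them. `FilterOptions`, `DOCUMENTED_DEFAULTS`, and `withDefaults` are hypothetical local names; the library's own `FilterConfig` type and its internal merge behavior may differ.

```typescript
// Local mirror of the options table above (not the library's exported type).
interface FilterOptions {
  enableProfanityFilter: boolean;
  enableExplicitFilter: boolean;
  enableViolenceFilter: boolean;
  enableSelfHarmFilter: boolean;
  enableDrugsFilter: boolean;
  enableHateSpeechFilter: boolean;
  enableMildFilter: boolean;
  customBannedWords: string[];
  allowedWords: string[];
  caseSensitive: boolean;
  strictWordBoundaries: boolean;
}

// Default values copied from the "Default" column of the table.
const DOCUMENTED_DEFAULTS: FilterOptions = {
  enableProfanityFilter: true,
  enableExplicitFilter: true,
  enableViolenceFilter: true,
  enableSelfHarmFilter: true,
  enableDrugsFilter: false,
  enableHateSpeechFilter: true,
  enableMildFilter: false,
  customBannedWords: [],
  allowedWords: [],
  caseSensitive: false,
  strictWordBoundaries: true,
};

// Overlay caller overrides on the documented defaults.
function withDefaults(overrides: Partial<FilterOptions> = {}): FilterOptions {
  const merged: FilterOptions = { ...DOCUMENTED_DEFAULTS, ...overrides };
  // Copy the arrays so callers cannot mutate the shared defaults by accident.
  merged.customBannedWords = merged.customBannedWords.slice();
  merged.allowedWords = merged.allowedWords.slice();
  return merged;
}
```

Spelling out the full configuration this way makes the effective settings visible in one place, for example when logging which filters a deployment runs with.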

📊 Detection Categories

| Category  | Severity | Description                                    | Examples                         |
| --------- | -------- | ---------------------------------------------- | -------------------------------- |
| mild      | low      | Light offensive language, gaming/fantasy terms | damn, kill (gaming), fire, blood |
| profanity | medium   | General profanity and offensive language       | Common swear words               |
| explicit  | high     | Sexually explicit and adult content            | Adult themes, nudity             |
| violence  | high     | Violent and harmful content                    | Violence, weapons, harm          |
| drugs     | medium   | Drug-related content                           | Illegal substances               |
| hate      | high     | Hate speech and discrimination                 | Racist, sexist content           |
| self_harm | high     | Self-harm and suicide content                  | Self-injury, suicide             |
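
The category-to-severity mapping in the table can be written down as a lookup. The sketch below assumes that a result flagged in several categories takes the highest severity among them; that rule, and the names `CATEGORY_SEVERITY`, `isCategory`, and `highestSeverity`, are illustrative assumptions rather than documented library behavior.

```typescript
type Severity = "low" | "medium" | "high";

// Category names and severities copied from the table above.
const CATEGORY_SEVERITY = {
  mild: "low",
  profanity: "medium",
  explicit: "high",
  violence: "high",
  drugs: "medium",
  hate: "high",
  self_harm: "high",
} as const;

type Category = keyof typeof CATEGORY_SEVERITY;

const SEVERITY_RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3 };

// Type guard: true only for category names listed in the table.
function isCategory(name: string): name is Category {
  return Object.prototype.hasOwnProperty.call(CATEGORY_SEVERITY, name);
}

// Return the highest severity among the given categories,
// or null when none of them is a known category.
function highestSeverity(categories: string[]): Severity | null {
  let top: Severity | null = null;
  for (const name of categories) {
    if (!isCategory(name)) continue; // skip unknown category names
    const severity: Severity = CATEGORY_SEVERITY[name];
    if (top === null || SEVERITY_RANK[severity] > SEVERITY_RANK[top]) {
      top = severity;
    }
  }
  return top;
}
```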

Best Practices & Recommendations

✅ Allow the "Mild" Category

We strongly recommend allowing content flagged as "mild" in most applications. Here's why:

🎮 Context-Aware Design

The mild category includes words that have legitimate uses in many contexts:

Gaming: "kill", "attack", "blood", "weapon" (RPG/gaming terminology)
Creative Writing: "death", "dark", "evil" (fantasy/horror themes)
Technical: "kill process", "fire event" (programming/technical terms)
Everyday Language: "damn", "hell" (mild expressions)

🎯 Smart Phrase Detection

The filter uses phrase-level detection to distinguish context:

❌ "want to kill you" → Detected as violence (high severity)
❌ "I want to set fire on someone" → Detected as violence (high severity)
✅ "kill the bug" → Safe (with mild filter disabled)
✅ "this song is fire" → Safe (with mild filter disabled)
✅ "fire the employee" → Safe (business context)
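
One way to picture the examples above is a phrase-first check: threatening phrases are tested before single words, and single mild words only count when the mild filter is enabled. This is a sketch under those assumptions. The lists `VIOLENT_PHRASES` and `MILD_WORDS` and the function `classify` are hypothetical and far smaller than any real word list.

```typescript
// Hypothetical lists for illustration only.
const VIOLENT_PHRASES = ["kill you", "set fire on someone"];
const MILD_WORDS = ["kill", "fire"];

interface Verdict {
  category: "violence" | "mild" | null;
  match: string | null;
}

function classify(text: string, enableMildFilter = false): Verdict {
  const lower = text.toLowerCase();

  // Phrases are checked first so that a threatening phrase outranks
  // the mild word contained inside it.
  for (const phrase of VIOLENT_PHRASES) {
    if (lower.indexOf(phrase) !== -1) {
      return { category: "violence", match: phrase };
    }
  }

  // Single words are only considered when the mild filter is enabled.
  if (enableMildFilter) {
    for (const word of MILD_WORDS) {
      if (new RegExp(`\\b${word}\\b`).test(lower)) {
        return { category: "mild", match: word };
      }
    }
  }

  return { category: null, match: null };
}
```

With the mild filter left at its default of disabled, "kill the bug" comes back with no category, while "want to kill you" is still classified as violence.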

⚖️ Balanced Moderation

// Recommended: Allow mild, block medium/high severity
const result = moderate(text);
const shouldAllow = result.isSafe || result.severity === "low";

if (!shouldAllow) {
  // Block only medium/high severity content
  console.log("Content blocked:", result.categories);
} else {
  // Allow safe content and mild violations
  console.log("Content allowed");
}

🛠️ Flexible Configuration

// For creative/gaming applications
const creativeFilter = new AIContentFilter({
  enableProfanityFilter: true, // Medium severity
  enableExplicitFilter: true, // High severity
  enableViolenceFilter: true, // High severity
  enableSelfHarmFilter: true, // High severity
  enableDrugsFilter: false, // Optional - depends on your use case
  enableHateSpeechFilter: true, // High severity
  enableMildFilter: false, // Keep disabled - allow contextual usage
});

// Recommended: Check severity instead of just isSafe
function isContentAcceptable(text: string) {
  const result = creativeFilter.moderate(text);

  // Allow safe content and mild violations (severity: "low")
  return result.isSafe || result.severity === "low";
}

📊 Real-world Impact: Blocking mild content can lead to 40-60% false positives in creative applications, gaming platforms, and technical documentation.
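
To check a figure like this against your own traffic, a false-positive rate can be computed from a small hand-labeled sample. The sketch below is a hypothetical measurement helper, not part of the library: `LabeledSample` and `falsePositiveRate` are illustrative names, and the `isBlocked` callback stands in for whatever blocking rule your application applies.

```typescript
interface LabeledSample {
  text: string;
  acceptable: boolean; // human label: true if the text should be allowed
}

// Fraction of acceptable samples that the given rule blocks anyway.
// Returns 0 when the sample set contains no acceptable texts.
function falsePositiveRate(
  samples: LabeledSample[],
  isBlocked: (text: string) => boolean
): number {
  const acceptable = samples.filter((s) => s.acceptable);
  if (acceptable.length === 0) return 0;
  const wronglyBlocked = acceptable.filter((s) => isBlocked(s.text)).length;
  return wronglyBlocked / acceptable.length;
}
```

Passing a callback such as `(t) => !moderate(t).isSafe` and then one based on the severity check would let you compare the two policies on the same sample.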

🔄 API Reference

Class: AIContentFilter

Constructor

new AIContentFilter(config?: FilterConfig)

Methods

`moderate(text: string): ModerationResult`

Performs comprehensive content moderation.

`isSafe(text: string): boolean`

Quick boolean check if content is safe.

`clean(text: string): string`

Returns text with banned words replaced by asterisks.

`getFlaggedWords(text: string): string[]`

Returns array of flagged words found in text.

`addBannedWords(words: string[]): void`

Adds custom words to the banned list.

`removeBannedWords(words: string[]): void`

Removes words from the banned list.

`addAllowedWords(words: string[]): void`

Adds words to the allowed list (whitelist).

`updateConfig(config: Partial<FilterConfig>): void`

Updates the filter configuration.
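
To make the interplay between the banned list, the allowed list, and `clean` concrete, here is a minimal sketch. It assumes that allowed words override banned ones and that each flagged word is replaced by asterisks of the same length, as in the Quick Start output. `escapeRegExp` and `maskBannedWords` are hypothetical helpers, not the library's code.

```typescript
// Escape regex metacharacters so a word is matched literally.
function escapeRegExp(word: string): string {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each banned word with asterisks of the same length,
// skipping any word that also appears in the allowed list.
function maskBannedWords(
  text: string,
  banned: string[],
  allowed: string[] = []
): string {
  const allowedLower = allowed.map((w) => w.toLowerCase());
  const active = banned.filter(
    (w) => allowedLower.indexOf(w.toLowerCase()) === -1
  );

  let cleaned = text;
  for (const word of active) {
    // Case-insensitive, whole-word match.
    const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, "gi");
    // Turn every character of the matched word into "*".
    cleaned = cleaned.replace(pattern, (match) => match.replace(/./g, "*"));
  }
  return cleaned;
}
```

Keeping the mask the same length as the original word preserves the layout of the message, which is convenient when the cleaned text is shown back to the user.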

Utility Functions

// Quick moderation with default settings
moderate(text: string, config?: FilterConfig): ModerationResult

// Quick safety check
isSafe(text: string, config?: FilterConfig): boolean

// Quick text cleaning
clean(text: string, config?: FilterConfig): string

// Get flagged words
getFlaggedWords(text: string, config?: FilterConfig): string[]

🛠️ Integration Examples

💡 Complete integration examples are available in the samples/ folder

Express.js Middleware

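import express, { Request, Response, NextFunction } from "express";
import { moderate } from "prompt-bouncer";

const app = express();
app.use(express.json()); // Parse JSON bodies so req.body.prompt is available

const RECOMMENDED_CONFIG = {
  enableProfanityFilter: true,
  enableExplicitFilter: true,
  enableViolenceFilter: true,
  enableSelfHarmFilter: true,
  enableHateSpeechFilter: true,
  enableMildFilter: false, // Allow contextual usage
};

function contentModerationMiddleware(req: Request, res: Response, next: NextFunction) {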
  const { prompt } = req.body;

  if (prompt) {
    const result = moderate(prompt, RECOMMENDED_CONFIG);
    const shouldAllow = result.isSafe || result.severity === "low";

    if (!shouldAllow) {
      return res.status(400).json({
        error: "Content policy violation",
        reason: result.reason,
        categories: result.categories,
        severity: result.severity,
      });
    }
  }

  next();
}

app.post("/generate-content", contentModerationMiddleware, (req, res) => {
  // Safe to proceed with content generation
});

React Hook with Severity Logic

import { useState, useCallback } from "react";
import { moderate } from "prompt-bouncer";

function useContentModeration() {
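  // Type the state explicitly; useState(null) alone would infer the type as null only
  const [lastResult, setLastResult] = useState<ReturnType<typeof moderate> | null>(null);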

  const checkContent = useCallback((text: string) => {
    const result = moderate(text, {
      enableMildFilter: false, // Allow contextual usage
    });

    setLastResult(result);
    return {
      ...result,
      shouldAllow: result.isSafe || result.severity === "low",
    };
  }, []);

  return { checkContent, lastResult };
}

Next.js API Route

// app/api/moderate/route.ts
import { moderate } from "prompt-bouncer";

export async function POST(request: Request) {
  const { text } = await request.json();

  const result = moderate(text);
  const shouldAllow = result.isSafe || result.severity === "low";

  return Response.json({
    isSafe: result.isSafe,
    shouldAllow,
    severity: result.severity,
    categories: result.categories,
  });
}
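
The route above passes `text` straight to `moderate`. A small validation step before that call may be worth adding. The sketch below is one possible shape: `ParsedBody`, `parseModerationBody`, and the `MAX_TEXT_LENGTH` limit are hypothetical and not part of prompt-bouncer or Next.js.

```typescript
// Result of validating the request body: either the text or an error message.
type ParsedBody =
  | { ok: true; text: string }
  | { ok: false; error: string };

// Hypothetical upper bound, chosen only for illustration.
const MAX_TEXT_LENGTH = 10000;

// Check that the parsed JSON body is an object with a usable "text" field.
function parseModerationBody(body: unknown): ParsedBody {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const text = (body as { text?: unknown }).text;
  if (typeof text !== "string" || text.trim().length === 0) {
    return { ok: false, error: "Field 'text' must be a non-empty string" };
  }
  if (text.length > MAX_TEXT_LENGTH) {
    return { ok: false, error: "Field 'text' is too long" };
  }
  return { ok: true, text };
}
```

In the route handler, a failed parse could be answered with a 400 response before `moderate` is ever called.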

Configuration Presets

import { AIContentFilter } from "prompt-bouncer";

// For creative/gaming applications
const creativeFilter = new AIContentFilter({
  enableMildFilter: false, // Allow gaming terms
  enableDrugsFilter: false, // Context-dependent
});

// For professional environments
const strictFilter = new AIContentFilter({
  enableMildFilter: true, // Stricter control
  enableDrugsFilter: true, // Block all drug references
});

📁 Project Structure & Examples

prompt-bouncer/
├── src/                      # Source code
│   ├── filter.ts            # Main AIContentFilter class
│   ├── types.ts             # TypeScript interfaces
│   ├── wordLists.ts         # Detection categories and keywords
│   └── index.ts             # Main exports
├── samples/                  # Integration examples
│   └── integration-example.ts # Complete integration samples
└── dist/                     # Compiled JavaScript

📚 Available Examples

integration-example.ts - Complete integration patterns for:
- Express.js middleware
- React hooks
- Next.js API routes
- Configuration presets
- Different moderation strategies

🧪 Testing

npm test                 # Run tests
npm run test:watch      # Run tests in watch mode
npm run test:coverage   # Run tests with coverage

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

1. Fork the repository
2. Create your feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📧 Support

For support, please open an issue on GitHub.

🎯 Roadmap

- [ ] Multi-language support
- [ ] Machine learning-based detection
- [ ] Browser/Web Worker support
- [ ] Custom severity levels
- [ ] Regex pattern support
- [ ] Performance optimizations