kata-kasar

v1.0.2

Published

a day ago

Filter bad words in Bahasa Indonesia and English

Vulnerabilities: 0 High, 0 Medium, 0 Low

joshuamanly

badwords filter censor profanity indonesia english moderation clean-text kata-kasar

kata-kasar

A powerful, lightweight, and multilingual library to filter, censor, and detect profanity (bad words) in text. Primarily designed for Bahasa Indonesia and English, it supports fuzzy matching, leetspeak detection, and dynamic dictionary management.

Features

Multilingual Support: Built-in support for Indonesian and English dictionaries.
Leetspeak Detection: Automatically normalizes and detects leetspeak (e.g., 4njing, sh1t).
Fuzzy Matching: Uses Damerau-Levenshtein distance to detect typos or deliberate misspellings.
Detailed Analysis: Get comprehensive details about detected words, including confidence levels and match categories.
Dictionary Management: Add, remove, or override dictionary entries dynamically at runtime.
TypeScript Support: Fully typed for better developer experience.

Installation

npm install kata-kasar

Usage

Basic Usage

import { analyze, censor, flag } from "kata-kasar";

// 1. Analyze text (Recommended)
const result = analyze("Dasar lo anjing!");
console.log(result);
/*
{
  original: "Dasar lo anjing!",
  filtered: "Dasar lo ******!",
  isProfane: true,
  decision: "CENSOR",
  data: [ ... ]
}
*/

// 2. Simple Boolean Check
const isBad = flag("This is shit"); // true

// 3. Censor Text (a single-character replacement masks each character)
const clean = censor("Dasar lo anjing", "*"); // "Dasar lo ******"

API Reference

Core Functions

`analyze(text: string, options?: AnalyzeOptions): AnalyzeResult`

The most robust function in the library. It performs validation, normalization, tokenization, and fuzzy detection against the blacklist and whitelist.

Options:

type: 'username' | 'text' (default: 'text')
threshold: number (default: 3) - Maximum edit distance for fuzzy matching.

Returns an AnalyzeResult object containing:

original: Input text.
filtered: Text with profanity masked.
isProfane: Boolean indicating if profanity was found.
decision: 'CENSOR' | 'ALLOW'.
data: Array of detailed matches.
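The documented fields can be written down as a TypeScript shape. This is only a sketch inferred from the list above, not the package's actual type declarations: `AnalyzeResultSketch`, `Decision`, and `summarize` are hypothetical names, and the element type of `data` is left as `unknown` because its shape is not described here.

```typescript
// Hypothetical shape inferred from the documented fields; the real
// AnalyzeResult exported by the package may differ.
type Decision = "CENSOR" | "ALLOW";

interface AnalyzeResultSketch {
  original: string;   // input text
  filtered: string;   // text with profanity masked
  isProfane: boolean; // whether profanity was found
  decision: Decision;
  data: unknown[];    // detailed matches; element shape not documented here
}

// Example consumer: show the masked text only when the decision is CENSOR.
function summarize(result: AnalyzeResultSketch): string {
  return result.decision === "CENSOR" ? result.filtered : result.original;
}

const sample: AnalyzeResultSketch = {
  original: "Dasar lo anjing!",
  filtered: "Dasar lo ******!",
  isProfane: true,
  decision: "CENSOR",
  data: [],
};

console.log(summarize(sample)); // "Dasar lo ******!"
```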

`censor(text: string, replacement?: string): string`

Replaces bad words in the text.

replacement: The replacement string (default: "***"). Pass a single character such as "*" to mask each character of a matched word instead.
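The two masking modes described above can be illustrated with a standalone helper. This is a sketch, not the library's code: `maskWord` is a hypothetical name, and the rule that any replacement longer than one character substitutes the whole word is an assumption.

```typescript
// Standalone sketch of the two masking modes; not the library's code.
function maskWord(word: string, replacement: string = "***"): string {
  // Assumption: a single-character replacement is repeated once per
  // character of the word; anything longer replaces the whole word.
  // Array.from counts code points, so astral characters count once.
  return Array.from(replacement).length === 1
    ? replacement.repeat(Array.from(word).length)
    : replacement;
}

console.log(maskWord("anjing"));        // "***"
console.log(maskWord("anjing", "*"));   // "******"
console.log(maskWord("anjing", "[x]")); // "[x]"
```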

`flag(text: string): boolean`

Returns true if the text contains any bad words.

`filter(text: string): string`

Removes bad words from the text completely.

`extract(text: string): string[]`

Returns an array of all bad words found in the text.
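How `flag`, `filter`, and `extract` relate to each other can be sketched with a tiny stand-in dictionary. Everything below is hypothetical (the word list, the whitespace tokenizer, and the `*Sketch` function names); the real functions use the package's own dictionaries and tokenization.

```typescript
// Hypothetical stand-in for a dictionary; the real lists ship with the package.
const badWords = new Set<string>(["darn", "heck"]);

// Lowercase a token and strip anything that is not a letter a-z.
const normalize = (token: string): string =>
  token.toLowerCase().replace(/[^a-z]/g, "");

// Naive whitespace tokenizer, used only for this sketch.
const tokens = (text: string): string[] =>
  text.split(/\s+/).filter((t) => t.length > 0);

// Collect the normalized form of every token found in the dictionary.
function extractSketch(text: string): string[] {
  return tokens(text).map(normalize).filter((w) => badWords.has(w));
}

// A text is flagged when at least one bad word is extracted.
function flagSketch(text: string): boolean {
  return extractSketch(text).length > 0;
}

// Drop offending tokens entirely and rejoin the rest with single spaces.
function filterSketch(text: string): string {
  return tokens(text).filter((t) => !badWords.has(normalize(t))).join(" ");
}

const input = "Well darn it, heck!";
console.log(extractSketch(input)); // ["darn", "heck"]
console.log(flagSketch(input));    // true
console.log(filterSketch(input));  // "Well it,"
```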

Note: leetCensor, leetFlag, leetFilter, and leetExtract are also available. They normalize the input (converting leetspeak to plain letters) before processing, so their output refers to the normalized text rather than the original input.
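A rough idea of what leetspeak normalization involves is shown below. The substitution table and the `normalizeLeet` name are assumptions made for illustration; the package's actual mapping is not documented here and may cover more or different characters.

```typescript
// Hypothetical substitution table; the package's real mapping may differ.
const leetMap = new Map<string, string>([
  ["0", "o"], ["1", "i"], ["3", "e"], ["4", "a"],
  ["5", "s"], ["7", "t"], ["@", "a"], ["$", "s"],
]);

// Lowercase the text, then replace each mapped character with its letter.
function normalizeLeet(text: string): string {
  return Array.from(text.toLowerCase())
    .map((ch) => leetMap.get(ch) ?? ch)
    .join("");
}

console.log(normalizeLeet("4njing")); // "anjing"
console.log(normalizeLeet("sh1t"));   // "shit"
```

A character-level table like this also rewrites digits that are meant literally, which is one reason the normalized text can differ from the original in places that contain no profanity.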

Dictionary Management

You can modify the blacklist and whitelist at runtime. Functions support a lang parameter (e.g., 'id', 'en') to target specific dictionaries.

import { addBlacklist, removeBlacklist, addWhitelist } from "kata-kasar";

// Add specific words to the Indonesian blacklist
addBlacklist(["krupuk", "seblak"], "id");

// Remove words from the English blacklist
removeBlacklist("hell", "en");

// Add words to the whitelist (bypass checks)
addWhitelist(["hello", "analysis"], "en");
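One way to picture per-language dictionaries is a pair of maps keyed by language code. The class below is a hypothetical in-memory model, not the package's internals; `DictionarySketch` and `isBlocked` are invented names, and the rule that the whitelist takes precedence over the blacklist is an assumption based on the "bypass checks" comment above.

```typescript
type Lang = "id" | "en";

// Hypothetical in-memory model; not the package's internal structure.
class DictionarySketch {
  private blacklist = new Map<Lang, Set<string>>();
  private whitelist = new Map<Lang, Set<string>>();

  // Accept a single word or a list, as the calls above do.
  private static toList(words: string | string[]): string[] {
    return Array.isArray(words) ? words : [words];
  }

  private static add(store: Map<Lang, Set<string>>, words: string | string[], lang: Lang): void {
    const set = store.get(lang) ?? new Set<string>();
    for (const w of DictionarySketch.toList(words)) set.add(w.toLowerCase());
    store.set(lang, set);
  }

  addBlacklist(words: string | string[], lang: Lang): void {
    DictionarySketch.add(this.blacklist, words, lang);
  }

  addWhitelist(words: string | string[], lang: Lang): void {
    DictionarySketch.add(this.whitelist, words, lang);
  }

  removeBlacklist(words: string | string[], lang: Lang): void {
    const set = this.blacklist.get(lang);
    if (!set) return;
    for (const w of DictionarySketch.toList(words)) set.delete(w.toLowerCase());
  }

  // Assumption: a whitelisted word is never reported, even if blacklisted.
  isBlocked(word: string, lang: Lang): boolean {
    const w = word.toLowerCase();
    if (this.whitelist.get(lang)?.has(w)) return false;
    return this.blacklist.get(lang)?.has(w) ?? false;
  }
}

const dict = new DictionarySketch();
dict.addBlacklist(["krupuk", "seblak"], "id");
console.log(dict.isBlocked("Krupuk", "id")); // true
console.log(dict.isBlocked("krupuk", "en")); // false (different language)
dict.removeBlacklist("krupuk", "id");
console.log(dict.isBlocked("krupuk", "id")); // false
```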

Fuzzy Utilities

Exposed for cases where you need raw string comparison.

import { distance, similarity } from "kata-kasar";

// Damerau-Levenshtein distance
const dist = distance("kitten", "sitten"); // 1

// Cosine similarity
const sim = similarity("hello", "hello world"); // 0.7...
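For readers unfamiliar with the metric, here is a standalone sketch of the optimal string alignment variant of Damerau-Levenshtein distance, in which insertions, deletions, substitutions, and adjacent transpositions each cost 1. It is not the package's implementation, which may use a different variant; `osaDistance` and `withinThreshold` are hypothetical names, and the threshold gate simply mirrors the documented default of 3.

```typescript
// Optimal string alignment distance (a restricted Damerau-Levenshtein).
function osaDistance(a: string, b: string): number {
  const cols = b.length + 1;
  // Flat table: cell (i, j) holds the distance between a[0..i) and b[0..j).
  const d = new Array<number>((a.length + 1) * cols).fill(0);
  const get = (i: number, j: number): number => d[i * cols + j] ?? 0;
  const set = (i: number, j: number, v: number): void => {
    d[i * cols + j] = v;
  };

  // Base cases: transforming to or from the empty prefix.
  for (let i = 0; i <= a.length; i++) set(i, 0, i);
  for (let j = 0; j <= b.length; j++) set(0, j, j);

  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      let best = Math.min(
        get(i - 1, j) + 1,        // deletion
        get(i, j - 1) + 1,        // insertion
        get(i - 1, j - 1) + cost, // substitution or match
      );
      // Adjacent transposition, e.g. "in" <-> "ni".
      if (
        i > 1 && j > 1 &&
        a.charAt(i - 1) === b.charAt(j - 2) &&
        a.charAt(i - 2) === b.charAt(j - 1)
      ) {
        best = Math.min(best, get(i - 2, j - 2) + 1);
      }
      set(i, j, best);
    }
  }
  return get(a.length, b.length);
}

// Assumed gate: a token counts as a fuzzy match when within `threshold` edits.
const withinThreshold = (token: string, word: string, threshold = 3): boolean =>
  osaDistance(token, word) <= threshold;

console.log(osaDistance("kitten", "sitten")); // 1
console.log(osaDistance("anjing", "anjnig")); // 1 (adjacent transposition)
console.log(withinThreshold("anjnig", "anjing")); // true
```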

License

MIT License
