ban-word
v1.0.0
A modular sensitive word detector using the DFA (Deterministic Finite Automaton) algorithm, with built-in support for Chinese simplified/traditional conversion.
BanWord Detector
Features
- DFA Algorithm: Efficient sensitive word detection and masking.
- Chinese Conversion: Integrated with chinese-simple2traditional to normalize text (Simplified/Traditional) for consistent detection.
- Modular Design: Decoupled Engine, Provider, and Processor components.
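To illustrate the DFA idea named in the feature list, here is a minimal sketch of trie-based detection and masking. It is not the ban-word engine itself: the names `TrieNode`, `buildTrie`, and `maskText` are hypothetical, and the sketch assumes longest-match masking with no Simplified/Traditional normalization and no ignored characters.

```typescript
// Hypothetical sketch of DFA-style matching; not the library's actual implementation.
type TrieNode = { children: Map<string, TrieNode>; isEnd: boolean }

// Build the automaton: each character of a word is one state transition.
function buildTrie(words: string[]): TrieNode {
  const root: TrieNode = { children: new Map(), isEnd: false }
  for (const word of words) {
    if (word.length === 0) continue
    let node = root
    for (const ch of word) {
      let next = node.children.get(ch)
      if (!next) {
        next = { children: new Map(), isEnd: false }
        node.children.set(ch, next)
      }
      node = next
    }
    node.isEnd = true
  }
  return root
}

// Walk the text; from each position follow transitions and mask the longest match.
function maskText(text: string, root: TrieNode, maskChar = '*'): string {
  const chars = Array.from(text)
  let i = 0
  while (i < chars.length) {
    let node = root
    let lastEnd = -1
    for (let j = i; j < chars.length; j++) {
      const next = node.children.get(chars[j])
      if (!next) break
      node = next
      if (node.isEnd) lastEnd = j
    }
    if (lastEnd >= 0) {
      for (let k = i; k <= lastEnd; k++) chars[k] = maskChar
      i = lastEnd + 1
    } else {
      i++
    }
  }
  return chars.join('')
}

const trie = buildTrie(['敏感词', 'bad'])
const masked = maskText('这是敏感词 and bad words', trie)
console.log(masked) // "这是*** and *** words"
```

Because every word shares one automaton, the cost of scanning depends on the text length and the longest word, not on how many words are in the dictionary.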
Setup
Install the dependencies:
pnpm install
Usage
import { BanWordDetector } from 'ban-word'
const detector = new BanWordDetector({
  enhance: true,
  words: ['自定义敏感词'],
  // Use built-in dictionary categories: 'politics', 'porn', 'violence'
  dictionaries: ['politics', 'porn'],
  ignore: /[\s!@#$%^&*()_+\-=\[\]{};':"\\|,.<>\/?]/,
})
// Process text (handles simplified/traditional, ignored characters, and masking)
const { hit, matches, text } = detector.check('這是一个敏!感@词')
console.log(hit) // true
console.log(matches) // [{ word: '敏感词', start: 4, end: 8 }]
console.log(text) // "这是一个*****"
Options
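| Option         | Type         | Description                                                                                                                                                  |
| :------------- | :----------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enhance        | boolean      | Whether to enable phrase-enhanced conversion. When enabled, Simplified/Traditional conversion is more accurate, but an extra phrase library of about 374 KB is loaded. |
| words          | string[]     | Custom list of sensitive words.                                                                                                                              |
| dictionaries   | Category[]   | Built-in dictionary categories. Allowed values: 'politics', 'porn', 'violence'.                                                                              |
| ignore         | RegExp       | Regular expression for noise characters. Characters matching it are skipped during detection, to counter words split up by inserted characters (e.g. 敏!感@词). |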
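The effect of the ignore option can be sketched as follows. This is an assumption about how skipping might work, not the library's code: `State`, `Match`, `compile`, and `scan` are hypothetical names, the regex is assumed to have no `g` or `y` flag (so `test` is stateless), and indices are counted over the original text with an inclusive end, as in the usage example.

```typescript
// Hypothetical sketch; names and behavior are assumptions, not ban-word internals.
interface Match { word: string; start: number; end: number }
interface State { next: Map<string, State>; word?: string }

// Compile the word list into a character-by-character state machine.
function compile(words: string[]): State {
  const root: State = { next: new Map() }
  for (const word of words) {
    if (word.length === 0) continue
    let state = root
    for (const ch of word) {
      let child = state.next.get(ch)
      if (!child) {
        child = { next: new Map() }
        state.next.set(ch, child)
      }
      state = child
    }
    state.word = word
  }
  return root
}

// Scan the text, skipping noise characters that appear inside a candidate match.
function scan(text: string, root: State, ignore: RegExp): Match[] {
  const chars = Array.from(text)
  const matches: Match[] = []
  let i = 0
  while (i < chars.length) {
    let state = root
    let found: Match | undefined
    for (let j = i; j < chars.length; j++) {
      // A noise character does not change the current state; just move past it.
      if (j > i && ignore.test(chars[j])) continue
      const child = state.next.get(chars[j])
      if (!child) break
      state = child
      if (state.word !== undefined) found = { word: state.word, start: i, end: j }
    }
    if (found) {
      matches.push(found)
      i = found.end + 1
    } else {
      i++
    }
  }
  return matches
}

const hits = scan('这是一个敏!感@词', compile(['敏感词']), /[\s!@]/)
console.log(hits) // [{ word: '敏感词', start: 4, end: 8 }]
```

Reporting positions in the original text, rather than in a copy with noise stripped out, lets a masking step cover the inserted characters as well.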
Development
Build the library:
pnpm run build
Build the library in watch mode:
pnpm run dev