@redsocs/spam-warden

v1.2.3

Published

10 hours ago

Lightweight universal JavaScript library for real-time spam detection. Trained on Thai spam data using Bernoulli Naive Bayes.

rcortex

spam-detection thai-nlp client-side naive-bayes moderation

SpamWarden.js

Lightweight, universal JavaScript library for real-time spam detection and automated form protection. Optimized for Thai text and high-performance cross-platform environments.

What is this?

SpamWarden.js is a zero-dependency, universal engine that detects spam directly at the source. It uses a Present-Only Naive Bayes model (derived from Bernoulli Naive Bayes) trained specifically on Thai spam patterns (gambling, loans, "fast money" scams) and optimized with a dynamic, length-calibrated decision threshold to eliminate false positives on longer, clean text.

By running locally, in the browser or inside your own Node.js process, it lets you block spam before it reaches your database, saving server resources and keeping your data clean.

Live Demo & Scanner

You can test the spam engine interactively, analyze your forms, and generate auto-blocking script configurations directly on our GitLab Pages site:

👉 Live Demo & Generator

Quickstart

[!IMPORTANT] Are you a Thai government agency or public sector website administrator? Get your free token configuration and drop-in script to protect your online portals from annoying gambling/loan ads and spam campaigns at redsocs.com/spam-warden.

1. Zero-Config Local Protection (No Telemetry)

Add this script to your page with the data-auto-protect attribute. It automatically finds your most significant forms (heuristic: the top two forms with at least two inputs each) and blocks submission when spam is detected.

By default, this mode also enables PII masking (DLP). To disable PII masking, add data-sd="0".

<script
  src="https://cdn.redsocs.com/js/spamwarden.min.js"
  data-auto-protect
></script>
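
The form-selection heuristic mentioned above can be pictured with a small sketch. This is an illustration under assumptions, not SpamWarden's actual code: the function name `pickFormsToProtect`, the descriptor shape `{ id, inputCount }`, and the choice to rank forms by input count are all hypothetical. The README only states "top 2 forms with >= 2 inputs".

```javascript
// Hypothetical sketch of the "top 2 forms with >= 2 inputs" heuristic.
// Assumption: "top" means "most input fields"; the real ranking may differ.
function pickFormsToProtect(forms, { minInputs = 2, maxForms = 2 } = {}) {
  return forms
    .filter((form) => form.inputCount >= minInputs) // drop trivial forms such as a search box
    .sort((a, b) => b.inputCount - a.inputCount) // largest forms first
    .slice(0, maxForms); // keep only the top N
}

// Example with plain descriptors. In a browser these could be built from
// document.forms by counting each form's input, textarea, and select elements.
const candidates = [
  { id: "search", inputCount: 1 },
  { id: "contact", inputCount: 5 },
  { id: "newsletter", inputCount: 2 },
  { id: "signup", inputCount: 4 },
];
const selected = pickFormsToProtect(candidates).map((form) => form.id);
console.log(selected);
```

Working on plain descriptors rather than live DOM nodes keeps the selection logic easy to test outside a browser.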

2. Enterprise Telemetry (SIEM Integration)

If you need to report blocked spam payloads to a central SIEM/SOC, provide a Base64 configuration string via the endpoint parameter.

<script src="https://cdn.redsocs.com/js/spamwarden.min.js?endpoint=MHxzaWVtLnJlZHNvY3MuY29tL3Yx"></script>

Note: The endpoint parameter is the Base64 encoding of the string sdFlag|siemEndpoint. For example, 0|siem.redsocs.com/v1 encodes to the value used in the script tag above.
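
To build or inspect such a token yourself, something like the following may help. It is a minimal sketch that assumes only the `sdFlag|siemEndpoint` layout described in the note; the helper names `encodeEndpointConfig` and `decodeEndpointConfig` are hypothetical and not part of the library. It uses Node's `Buffer`; in a browser, `btoa` and `atob` would play the same role.

```javascript
// Hypothetical helpers for the Base64 "sdFlag|siemEndpoint" token format.
function encodeEndpointConfig(sdFlag, siemEndpoint) {
  return Buffer.from(`${sdFlag}|${siemEndpoint}`, "utf8").toString("base64");
}

function decodeEndpointConfig(token) {
  const decoded = Buffer.from(token, "base64").toString("utf8");
  const separator = decoded.indexOf("|"); // split on the first pipe only
  if (separator === -1) {
    throw new Error("Expected a token of the form sdFlag|siemEndpoint");
  }
  return {
    sdFlag: decoded.slice(0, separator),
    siemEndpoint: decoded.slice(separator + 1),
  };
}

// Round trip using the example value from the note.
const token = encodeEndpointConfig(0, "siem.redsocs.com/v1");
console.log(token);
console.log(decodeEndpointConfig(token));
```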

3. API Usage (Node Only)

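// Assumption: the package's CommonJS export exposes spamcheck() directly;
// adjust the import if the actual export shape differs.
const spamwarden = require("@redsocs/spam-warden");

const result = spamwarden.spamcheck(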
  "[Hello, this is a Thai casino & scam ads — and guess what? Your tax pays for my traffic.]",
);
if (result.isSpam) {
  console.log("Blocked:", result.reason || "AI match");
  console.log("Confidence:", result.prob);
}

Scope

SpamWarden is designed for interactive web elements:

- Contact Forms: Prevent bot and manual spam submissions.
- Comment Sections: Real-time feedback for users before they post.
- Chat Inputs: Instant filtering of malicious links and currency-heavy spam.
- Privacy-First Apps: Since detection happens locally, user data doesn't leave the browser unless explicitly reported.

What's inside?

- Hybrid Detection Engine:
  - Hard Rules: Instant blocking for currency symbols ($€£฿) and known spam link patterns (line[dot]me, bit[dot]ly).
  - Thai-Optimized Tokenizer: Extracts whitespace tokens, trigrams, and quadgrams to handle the space-less nature of the Thai language.
  - Present-Only NB Classifier: A modified Naive Bayes model trained on real-world spam samples. It evaluates only the vocabulary features present in the input and applies a length-dependent threshold offset of $5.5 + 0.49 \times N$, where $N$ is the number of matched features, to calibrate confidence and prevent false positives on longer clean texts.
- Telemetry System: Optional auto-reporting of spam hits to api.redsocs.com for global threat intelligence.
- Auto-Interceptor: Event listeners that hook into DOM forms to provide "Drop-in" protection.
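
The detection steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped engine: every function name, the `spam_link` label, the toy weights, per-token character n-grams, and the rule "sum of log-odds must exceed the threshold" are hypothetical. Only the currency and link rules, the token/trigram/quadgram features, and the $5.5 + 0.49 \times N$ offset come from the description.

```javascript
// Hard rules: currency symbols and known spam link hosts trigger an instant block.
function matchHardRule(text) {
  if (/[$€£฿]/.test(text)) return "currency_symbol";
  if (/(line\.me|bit\.ly)/i.test(text)) return "spam_link"; // label is made up
  return null;
}

// Features: whitespace tokens plus character trigrams and quadgrams of each token.
function extractFeatures(text) {
  const features = new Set();
  for (const token of text.trim().split(/\s+/).filter(Boolean)) {
    features.add(token);
    const chars = Array.from(token); // split by code point, not UTF-16 unit
    for (const n of [3, 4]) {
      for (let i = 0; i + n <= chars.length; i++) {
        features.add(chars.slice(i, i + n).join(""));
      }
    }
  }
  return features;
}

// Present-only scoring: unlike classic Bernoulli Naive Bayes, features that are
// absent from the input (or unknown to the vocabulary) contribute nothing.
function classify(text, logOdds) {
  const rule = matchHardRule(text);
  if (rule) return { isSpam: true, reason: rule };

  let score = 0;
  let matched = 0;
  for (const feature of extractFeatures(text)) {
    const weight = logOdds.get(feature);
    if (weight === undefined) continue;
    score += weight;
    matched += 1;
  }
  const threshold = 5.5 + 0.49 * matched; // grows with the number of matched features
  return { isSpam: score > threshold, score, matched, threshold };
}

// Toy vocabulary with invented log-odds values.
const toyLogOdds = new Map([
  ["สล็อต", 5.0],
  ["เครดิตฟรี", 6.5],
]);
console.log(classify("สล็อต เครดิตฟรี", toyLogOdds));
```

Because the threshold rises with each matched feature, a long clean message that happens to match many weak features needs proportionally more evidence before it is flagged.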

Why does this exist?

Traditional spam filters (like Akismet or reCAPTCHA) often:

- Require a round-trip to a server (latency).
- Are expensive for high-volume sites.
- Over-collect user data (privacy concerns).
- Struggle with specific Thai-language spam patterns.

SpamWarden exists to provide a local, fast, and Thai-centric alternative that stops spam at the source: the user's input field.

Security & Active Defense

[!WARNING] Honesty First: All client-side code is inherently bypassable by a sufficiently motivated human. However, we have engineered this library to be an absolute nightmare for automated bots and script kiddies.

We do not rely solely on "Security through Obscurity." SpamWarden employs a Hostile Active Defense architecture:

- The Ghost Tarpit (Honeypot): We intentionally deploy a "Poison Pill" decoy. If a bot or attacker attempts to bypass or tamper with the script, they are redirected into this trap, which is designed to actively retaliate by crashing headless browsers (Puppeteer/Playwright) and wasting attacker compute credits.
- Build-Time Randomization (The Moving Target): The real machine-learning engine is hidden inside an isolated closure and bound to the DOM using a randomized cryptographic key generated during compilation. The internal execution path changes on every release, defeating static bypass scripts.
- Brutal DOM Protection: By utilizing Document-Level Capturing Phase listeners, Prototype Monkey-Patching, and MutationObservers, SpamWarden intercepts submissions before they reach the form element. This defeats trivial bypasses like form cloning or direct document.forms[0].submit() calls.
- Aggressive Obfuscation: The final distribution is run through proprietary, high-entropy obfuscation routines to protect the model weights and heavily penalize reverse engineering attempts.
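
As a rough picture of the capturing-phase part only, a document-level guard might look like the sketch below. It is not the library's implementation: `installSubmitGuard` and `shouldBlock` are hypothetical names, and it deliberately leaves out prototype patching and MutationObservers. A programmatic `form.submit()` call does not fire a `submit` event, so a listener like this one cannot catch that case on its own.

```javascript
// Hypothetical sketch: a capture-phase "submit" guard.
// `root` is any EventTarget (in a browser, typically `document`);
// `shouldBlock` is a caller-supplied predicate, e.g. one wrapping a spam check.
function installSubmitGuard(root, shouldBlock) {
  const onSubmit = (event) => {
    if (!shouldBlock(event.target)) return;
    event.preventDefault(); // cancel the submission
    event.stopImmediatePropagation(); // keep later listeners from seeing it
  };
  // capture: true makes the listener run on the way down from the root,
  // before listeners registered on the form itself.
  root.addEventListener("submit", onSubmit, { capture: true });
  // Return a function that removes the guard again.
  return () => root.removeEventListener("submit", onSubmit, { capture: true });
}

// Browser usage (illustrative):
//   const uninstall = installSubmitGuard(document, (form) => looksLikeSpam(form));
// where looksLikeSpam is your own function that reads the form's fields.
```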

If you need protection that a determined attacker cannot bypass, client-side checks will never be enough. You must also validate payloads on your backend:

- For WordPress: Use our SpamWarden WP Plugin to protect your server at the PHP layer (Paid).
- For Node.js/Custom Stacks: Install this npm package, bundle it internally, and run the spamcheck() function on your backend server before writing to your database (Free).
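
For the Node.js route, the backend check could be wired in along these lines. This is a sketch under assumptions: `validateSubmission` is a hypothetical helper, and the spam checker is passed in as a parameter so the example does not depend on the package's export shape. It relies only on the result fields shown in the API usage example (`isSpam`, `reason`, `prob`).

```javascript
// Hypothetical helper: run a spam check over every non-empty text field of a
// submission and stop at the first hit.
function validateSubmission(fields, spamcheck) {
  for (const [name, value] of Object.entries(fields)) {
    if (typeof value !== "string" || value.trim() === "") continue; // skip non-text fields
    const result = spamcheck(value);
    if (result.isSpam) {
      return {
        ok: false,
        field: name,
        reason: result.reason || "AI match",
        prob: result.prob,
      };
    }
  }
  return { ok: true };
}

// Stand-in checker for demonstration only; in a real backend you would pass the
// library's spamcheck function instead.
const fakeSpamcheck = (text) =>
  text.includes("฿")
    ? { isSpam: true, reason: "currency_symbol", prob: 1 }
    : { isSpam: false, prob: 0.01 };

console.log(validateSubmission({ name: "Somchai", message: "Win 500฿ now" }, fakeSpamcheck));
```

Calling this before any database write means a request that skipped the client-side script still gets the same check on the server.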

Local Simulation & Testing

You can spin up a local simulation server to test the DOM auto-blocking behavior and inspect the SIEM telemetry payloads in real time:

1. Start the simulation server:
   ```
   npm run test-server
   ```
2. Open the test page in your browser: http://localhost:3000/
3. Submit a spam message (e.g., including currency signs like ฿ or links like line[dot]me).
4. Observe the result:
   - The form submission will be blocked on the page.
   - The terminal will display the defanged and sanitized telemetry payload sent to the SIEM receiver:

🚨 [SIEM RECEIVER] Blocked Payload Received!
================================================
Endpoint Token: MXxodHRwOi8vbG9jYWxob3N0OjMwMDAvdjEvdGVsZW1ldHJ5
URL:          h_tt_p://localhost:3000/
Rule Matched: currency_symbol
Confidence:   100%
PII Masked?   false
Pasted?       false
Actors:       []
Sanitized:    "Win [CARD_MASKED] now!"
================================================
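
The defanged URL and masked text in that output suggest transformations along the following lines. This is a guess that reproduces the format shown, not the library's actual sanitizer: the function names and both regular expressions are assumptions, and the real masking rules may cover more cases.

```javascript
// Hypothetical sketch: patterns chosen only to reproduce the log format above.

// Defang a URL so it is not clickable in SIEM dashboards: http -> h_tt_p.
function defangUrl(url) {
  return url.replace(/^http/i, "h_tt_p");
}

// Mask card-like digit runs: 13 to 19 digits, optionally separated by single
// spaces or hyphens.
function maskCardNumbers(text) {
  return text.replace(/\b(?:\d[ -]?){12,18}\d\b/g, "[CARD_MASKED]");
}

console.log(defangUrl("http://localhost:3000/"));
console.log(maskCardNumbers("Win 4111 1111 1111 1111 now!"));
```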

About

Version: 1.1.11 (Engine v11.06)
Author: RedSocs
License: MIT
Model Origin: Trained via RedSocs/spam-labeler
Inquiries & Enterprise Support: pichit[at]redsocs.com
Sponsor: Buy Me a Coffee

Technical Specs

| Property      | Value                     |
| ------------- | ------------------------- |
| Minified Size | ~2.0 MB (including model) |
| Gzipped Size  | ~341 KB                   |
| Dependencies  | 0 (Vanilla JS)            |
| Vocabulary    | 28106 features            |