@andrewlabs/openclaw-messageguard-ml
v1.2.1
OpenClaw plugin: ML-based secret detection and redaction for outgoing messages using transformers.js/ONNX.
ML-powered companion plugin to @andrewlabs/openclaw-messageguard.
This plugin uses a DistilBERT token classification model via @huggingface/transformers (ONNX Runtime under the hood) to detect sensitive content in outgoing messages and replace detected spans with [REDACTED].
Features
- OpenClaw before_tool_call hook: intercepts message tool sends before execution
- OpenClaw message_sending hook: intercepts agent replies (when wired up by the gateway)
- Automatic model download from Hugging Face on first use (then cached locally)
- Fails open: if model download/inference fails, message passes through unchanged
- Configurable model id, confidence threshold, and redaction token
Install
openclaw plugins install @andrewlabs/openclaw-messageguard-ml
openclaw gateway restart
Or via npm:
npm install @andrewlabs/openclaw-messageguard-ml
Ensure your OpenClaw plugin loader can discover package extensions via openclaw.extensions.
Manifest
The package includes openclaw.plugin.json, which declares the plugin id messageguard-ml and a configuration schema.
Configuration
openclaw.plugin.json supports:
- enabled (boolean, default true)
- modelId (string, default AndrewAndrewsen/distilbert-secret-masker)
- threshold (number, 0-1, default 0.5)
- mask (string, default [REDACTED])
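As an illustration of how these options and their defaults fit together, here is a minimal sketch. The GuardConfig interface and the resolveConfig helper are hypothetical names introduced for this example, not part of the plugin's API; only the field names and default values come from the list above.

```typescript
// Hypothetical shape; field names and defaults mirror the documented options.
interface GuardConfig {
  enabled: boolean;
  modelId: string;
  threshold: number;
  mask: string;
}

const DEFAULTS: GuardConfig = {
  enabled: true,
  modelId: "AndrewAndrewsen/distilbert-secret-masker",
  threshold: 0.5,
  mask: "[REDACTED]",
};

// Merge user-supplied values over the defaults and reject a threshold outside 0-1.
function resolveConfig(user: Partial<GuardConfig> = {}): GuardConfig {
  const cfg: GuardConfig = { ...DEFAULTS, ...user };
  if (!Number.isFinite(cfg.threshold) || cfg.threshold < 0 || cfg.threshold > 1) {
    throw new RangeError(`threshold must be between 0 and 1, got ${cfg.threshold}`);
  }
  return cfg;
}
```

Raising the threshold (for example resolveConfig({ threshold: 0.8 })) would make redaction more conservative, since only higher-confidence predictions are kept.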
How it Works
- On plugin startup, two hooks are registered:
  - before_tool_call: primary enforcement; intercepts message tool sends (action send/broadcast) and redacts sensitive content in the message parameter before the tool executes.
  - message_sending: secondary; intercepts agent replies in the delivery pipeline. Note: in OpenClaw 2026.2.x, this hook does not fire for all outbound paths (see openclaw#XXXX).
- For each outgoing message, the model runs token classification.
- Spans predicted as sensitive at/above threshold are grouped, reconstructed to character offsets, extended to word boundaries, and merged.
- Sanitized content is returned to OpenClaw.
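The thresholding, merging, and masking steps can be sketched roughly as follows. This assumes character offsets are already known for each predicted span; the Span type and the mergeSpans and redact functions are hypothetical names for illustration and are not taken from the plugin source.

```typescript
// Hypothetical span type: character offsets into the message plus a model score.
interface Span {
  start: number; // inclusive
  end: number; // exclusive
  score: number;
}

// Drop spans below the threshold, then merge any that overlap or touch.
function mergeSpans(spans: Span[], threshold: number): Span[] {
  const kept = spans
    .filter((s) => s.score >= threshold && s.end > s.start)
    .sort((a, b) => a.start - b.start);
  const merged: Span[] = [];
  for (const s of kept) {
    const last = merged[merged.length - 1];
    if (last !== undefined && s.start <= last.end) {
      last.end = Math.max(last.end, s.end);
      last.score = Math.max(last.score, s.score);
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}

// Replace each merged span with the mask, right to left so earlier offsets stay valid.
function redact(text: string, spans: Span[], threshold = 0.5, mask = "[REDACTED]"): string {
  let out = text;
  for (const s of mergeSpans(spans, threshold).slice().reverse()) {
    out = out.slice(0, s.start) + mask + out.slice(s.end);
  }
  return out;
}
```

Merging before masking means two adjacent detections inside one secret produce a single mask token rather than two.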
If model loading or inference fails (for example, model repo not yet available), the plugin logs a warning and returns without modifying the message.
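A minimal sketch of this fail-open behavior for the before_tool_call path is shown below. The ToolCallEvent shape, the Redactor type, and the guardToolCall function are assumptions made for this example; the real OpenClaw hook payload and registration API may differ.

```typescript
// Hypothetical event shape; the real OpenClaw hook payload may differ.
interface ToolCallEvent {
  toolName: string;
  params: { action?: string; message?: string };
}

// Stand-in for the model-backed redaction step.
type Redactor = (text: string) => Promise<string>;

// Redact the message parameter of send/broadcast calls.
// On any error, warn and return the event unchanged (fail open).
async function guardToolCall(
  event: ToolCallEvent,
  redactText: Redactor,
  warn: (msg: string) => void = () => {},
): Promise<ToolCallEvent> {
  const { action, message } = event.params;
  const isSend =
    event.toolName === "message" && (action === "send" || action === "broadcast");
  if (!isSend || typeof message !== "string") return event;
  try {
    const clean = await redactText(message);
    return { ...event, params: { ...event.params, message: clean } };
  } catch (err) {
    warn(`messageguard-ml: redaction failed, passing message through: ${String(err)}`);
    return event;
  }
}
```

Failing open favors message delivery over strict enforcement, which is one reason to pair this plugin with a deterministic layer.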
Changelog
1.2.0
- Fix: Hook now fires. Switched primary hook from message_sending (not fired for tool sends in 2026.2.x) to before_tool_call, which reliably intercepts message tool sends. message_sending is kept as a secondary hook for future compatibility.
- Fix: Span reconstruction. The DistilBERT tokenizer produces subword tokens (##ia, ##9, etc.) and transformers.js returns undefined for start/end offsets. The previous indexOf-per-subword approach matched fragments at wrong positions, causing partial/broken redaction. New approach: group consecutive sensitive subword tokens into word groups, reconstruct the full text fragment, find it case-insensitively in the original text, and extend to word boundaries to catch trailing characters the tokenizer dropped.
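The span-reconstruction approach described in this entry can be sketched as follows. This is an illustrative reading of the description, not the plugin's actual code: the Tok type and the groupFragments, locate, and redactFragments functions are hypothetical, "word boundary" is assumed to mean the nearest whitespace, and lowercasing is assumed not to change string length.

```typescript
// Hypothetical token shape: a subword string and a classifier flag, with no offsets.
interface Tok {
  word: string;
  sensitive: boolean;
}

// Join consecutive sensitive tokens into fragments; "##" marks a subword continuation.
function groupFragments(tokens: Tok[]): string[] {
  const fragments: string[] = [];
  let current = "";
  for (const t of tokens) {
    if (!t.sensitive) {
      if (current) fragments.push(current);
      current = "";
    } else if (t.word.indexOf("##") === 0) {
      current += t.word.slice(2);
    } else {
      if (current) fragments.push(current);
      current = t.word;
    }
  }
  if (current) fragments.push(current);
  return fragments;
}

// Find a fragment case-insensitively, then widen the match to the surrounding
// non-whitespace run. matchEnd is the end of the match before widening.
function locate(text: string, fragment: string, from = 0) {
  const idx = text.toLowerCase().indexOf(fragment.toLowerCase(), from);
  if (idx < 0) return null;
  const matchEnd = idx + fragment.length;
  let start = idx;
  let end = matchEnd;
  while (start > 0 && !/\s/.test(text.charAt(start - 1))) start--;
  while (end < text.length && !/\s/.test(text.charAt(end))) end++;
  return { start, end, matchEnd };
}

// Locate every fragment in order, merge overlapping ranges, and apply the mask.
function redactFragments(text: string, tokens: Tok[], mask = "[REDACTED]"): string {
  const ranges: Array<{ start: number; end: number }> = [];
  let cursor = 0;
  for (const frag of groupFragments(tokens)) {
    const hit = locate(text, frag, cursor);
    if (hit === null) continue;
    const last = ranges[ranges.length - 1];
    if (last !== undefined && hit.start <= last.end) {
      last.end = Math.max(last.end, hit.end);
    } else {
      ranges.push({ start: hit.start, end: hit.end });
    }
    cursor = hit.matchEnd;
  }
  let out = text;
  for (const r of ranges.slice().reverse()) {
    out = out.slice(0, r.start) + mask + out.slice(r.end);
  }
  return out;
}
```

Searching for the whole reconstructed fragment, rather than each subword separately, avoids matching a short piece such as ##9 at an unrelated position earlier in the text.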
1.1.0
- Initial release with message_sending hook and basic span reconstruction.
Exporting/Quantizing ONNX
Use the helper script:
python3 scripts/export_onnx.py
Optional push to Hugging Face (requires token):
HF_TOKEN=... python3 scripts/export_onnx.py --push
If AndrewAndrewsen/distilbert-secret-masker does not exist on Hugging Face, the script exits with a clear warning.
Comparison With Regex Plugin
- @andrewlabs/openclaw-messageguard: deterministic regex rules and policy actions
- @andrewlabs/openclaw-messageguard-ml: learned token classification for broader, context-sensitive detection
Running both can provide layered defense.
Development
npm install
npm run build