iword-rs
v0.1.11
High-speed keyword search — Rust implementation of iWord
Scans text in O(N) time (N = text length), independent of dictionary size, using a rolling hash table to find every matching keyword.
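To make the idea concrete, here is a minimal Rabin-Karp-style sketch of hash-based multi-keyword scanning, using only the standard library. It is not the crate's implementation: the function names, the hash base, and the one-pass-per-distinct-keyword-length strategy are assumptions chosen for illustration, so this version costs O(N × number of distinct keyword lengths) rather than a single pass.

```rust
use std::collections::HashMap;

// Hash base; an arbitrary choice for this sketch.
const BASE: u64 = 257;

/// Polynomial hash of a byte slice, using wrapping arithmetic (mod 2^64).
fn hash(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0u64, |h, &b| h.wrapping_mul(BASE).wrapping_add(b as u64))
}

/// Returns (position, length, key) for every dictionary word found in `text`.
fn scan(text: &str, words: &[(&str, u8)]) -> Vec<(usize, usize, u8)> {
    // Hash table: word hash -> candidate (word, key) pairs.
    let mut table: HashMap<u64, Vec<(&str, u8)>> = HashMap::new();
    let mut lengths: Vec<usize> = Vec::new();
    for &(w, k) in words {
        if w.is_empty() {
            continue;
        }
        table.entry(hash(w.as_bytes())).or_default().push((w, k));
        if !lengths.contains(&w.len()) {
            lengths.push(w.len());
        }
    }

    let t = text.as_bytes();
    let mut out = Vec::new();
    for &len in &lengths {
        if len > t.len() {
            continue;
        }
        // BASE^(len-1): the weight of the byte that leaves the window.
        let top = (1..len).fold(1u64, |p, _| p.wrapping_mul(BASE));
        let mut h = hash(&t[..len]);
        for start in 0..=t.len() - len {
            if start > 0 {
                // Roll the window: drop t[start-1], append t[start+len-1].
                h = h.wrapping_sub((t[start - 1] as u64).wrapping_mul(top));
                h = h.wrapping_mul(BASE).wrapping_add(t[start + len - 1] as u64);
            }
            if let Some(candidates) = table.get(&h) {
                for &(w, k) in candidates {
                    // Confirm byte-for-byte to rule out hash collisions.
                    if w.as_bytes() == &t[start..start + len] {
                        out.push((start, len, k));
                    }
                }
            }
        }
    }
    out.sort();
    out
}
```

Each window update is constant-time, and a table lookup replaces a comparison against every dictionary word, which is why the cost per text position does not grow with the number of entries.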
Features
- O(N) scan — single pass over text, independent of dictionary size
- Flexible dictionary — build programmatically, load from files, merge multiple sources
- Category keys — each word carries a u8 key (0–254); action-oriented or user-defined
- Classification — score and classify text by dominant category, with runtime weight tuning
- HTML-aware — optionally skips tag content during scan
- Pure Rust — no C dependency, no shared memory, no external process
- WASM-ready — compiles to WebAssembly for browser and edge runtimes (Cloudflare Workers, Fastly, Deno)
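As an illustration of the HTML-aware option, the sketch below blanks out everything between `<` and `>` before a scan, keeping byte offsets aligned with the original text. The function name `mask_tags` and the masking approach are assumptions for illustration only; the crate may skip tags differently.

```rust
/// Replaces every byte inside <...> (delimiters included) with a space,
/// so positions found in the masked text still index the original text.
fn mask_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => {
                in_tag = true;
                out.push(' ');
            }
            '>' if in_tag => {
                in_tag = false;
                out.push(' ');
            }
            _ if in_tag => {
                // One space per byte keeps multi-byte characters offset-stable.
                for _ in 0..c.len_utf8() {
                    out.push(' ');
                }
            }
            _ => out.push(c),
        }
    }
    out
}
```

With this approach a keyword that appears only inside an attribute value, such as a class name, is no longer visible to the scanner, while text between tags is untouched.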
Installation
Add to Cargo.toml:
[dependencies]
iword-rs = "0.1"
Or run:
cargo add iword-rs
Optional features:
# JSON serialization for Match and ClassifyResult
iword-rs = { version = "0.1", features = ["serde"] }
# WebAssembly / wasm-bindgen
iword-rs = { version = "0.1", features = ["wasm"] }
# Regex pattern support in dictionary files
iword-rs = { version = "0.1", features = ["regex"] }
# Dictionary save/load (binary cache via postcard)
iword-rs = { version = "0.1", features = ["save"] }
Quick start
use iword::{Dictionary, Mode, key};
let dict = Dictionary::builder()
.add_many(&["shutdown", "crash"], key::BLOCK)
.add_many(&["disk_full", "oom"], key::ALERT)
.add_many(&["slow_query"], key::THROTTLE)
.add_many(&["health_check"], key::PASS)
.build();
// Seek
assert_eq!(dict.seek("shutdown"), Some(key::BLOCK));
assert_eq!(dict.seek("unknown"), None);
// Scan — returns all matches with position/length/key
let matches = dict.scan("system shutdown after disk_full", Mode::FORBID);
for m in &matches {
println!("key={} word={:?}", m.key, m.extract("system shutdown after disk_full"));
}
// Filter — replace matched words with ***
let clean = dict.filter("system shutdown detected", Mode::FORBID);
// → "system ******** detected"
// Case-insensitive matching
let matches = dict.scan("SHUTDOWN detected", Mode::FORBID | Mode::IGNORE_CASE);
// → matches "SHUTDOWN" (dictionary entry "shutdown")
// Or load from a file
let dict = Dictionary::from_file("words.txt")?;
Dictionary format
Tab-separated word list. Optional third column sets per-word weight (default 1.0).
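A line in this format could be parsed roughly as follows. This is a hedged sketch, not the crate's loader: `Entry`, `parse_line`, and the `f64` weight type are hypothetical, the default key 9 is taken from the example listing, and only whole-line `#` comments are handled (trailing annotations are assumed to be README commentary).

```rust
/// One parsed dictionary entry (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Entry {
    word: String,
    key: u8,
    weight: f64,
    is_regex: bool,
}

const DEFAULT_KEY: u8 = 9;
const DEFAULT_WEIGHT: f64 = 1.0;

/// Parses `word[\tkey[\tweight]]`. Returns Ok(None) for blank and comment lines.
fn parse_line(line: &str) -> Result<Option<Entry>, String> {
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    if line.trim().is_empty() || line.starts_with('#') {
        return Ok(None);
    }
    let mut cols = line.split('\t');
    let word = cols.next().unwrap_or("").trim();
    if word.is_empty() {
        return Err("empty word column".to_string());
    }
    let key = match cols.next() {
        Some(k) => k
            .trim()
            .parse::<u8>()
            .map_err(|e| format!("bad key {k:?}: {e}"))?,
        None => DEFAULT_KEY,
    };
    if key == u8::MAX {
        return Err("key must be in 0..=254".to_string());
    }
    let weight = match cols.next() {
        Some(w) => w
            .trim()
            .parse::<f64>()
            .map_err(|e| format!("bad weight {w:?}: {e}"))?,
        None => DEFAULT_WEIGHT,
    };
    // A word wrapped in /.../ is treated as a regex pattern.
    let is_regex = word.len() >= 2 && word.starts_with('/') && word.ends_with('/');
    let word = if is_regex { &word[1..word.len() - 1] } else { word };
    Ok(Some(Entry { word: word.to_string(), key, weight, is_regex }))
}
```

Returning `Ok(None)` for skipped lines lets a caller distinguish "nothing to load" from a malformed entry without treating comments as errors.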
health_check # key 9 (default)
shutdown\t0 # key 0 — BLOCK
disk_full\t1 # key 1 — ALERT
deprecated_api\t2 # key 2 — FLAG
slow_query\t3 # key 3 — THROTTLE
user_login\t4 # key 4 — LOG
ping\t5 # key 5 — PASS
critical_crash\t0\t5.0 # key 0 — BLOCK, weight 5.0
# comment lines are ignored
/\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}/\t1\t10.0 # regex: credit card (requires regex feature)
Category keys
Keys 0–4 are "actionable" — only returned when Mode::FORBID is set.
| key | Constant | Action |
|-----|----------|--------|
| 0 | key::BLOCK | Immediate rejection — do not process further |
| 1 | key::ALERT | Notify + log — requires immediate attention |
| 2 | key::FLAG | Mark for review — suspicious but not critical |
| 3 | key::THROTTLE | Apply rate limiting |
| 4 | key::LOG | Log only — informational match |
| 5 | key::PASS | Explicit allow — whitelist match |
| 6–254 | key::USER_START+ | User-defined |
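The table can be read as a small decision rule. The sketch below mirrors the constants from the table in a local `key` module and shows the filtering rule implied by Mode::FORBID. The helper names `is_reported` and `most_severe` are hypothetical, and the `forbid` boolean stands in for the real mode flag.

```rust
/// Local mirror of the key constants listed in the table above.
#[allow(dead_code)]
mod key {
    pub const BLOCK: u8 = 0;
    pub const ALERT: u8 = 1;
    pub const FLAG: u8 = 2;
    pub const THROTTLE: u8 = 3;
    pub const LOG: u8 = 4;
    pub const PASS: u8 = 5;
    pub const USER_START: u8 = 6;
}

/// Actionable keys (0 to 4) are reported only when `forbid` is set;
/// keys from PASS upward are always reported.
fn is_reported(k: u8, forbid: bool) -> bool {
    forbid || k >= key::PASS
}

/// The most severe key among the matches is the numerically lowest one.
fn most_severe(keys: &[u8]) -> Option<u8> {
    keys.iter().copied().min()
}
```

Ordering severity by key value means a single `min` is enough to pick the action to take when several categories match the same text.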
Mode flags
| Flag | Effect |
|------|--------|
| Mode::HTML | Skip HTML tag content during scan |
| Mode::FORBID | Include forbidden-category words (key < 5) |
| Mode::ENGLISH | Match only at English word boundaries |
| Mode::IGNORE_CASE | Case-insensitive matching (dictionary must be lowercase) |
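The flags behave like a bit set. The following standard-library sketch shows one way such a type can support `|`, and how IGNORE_CASE and ENGLISH might change a single-word search. The bit values, the `contains` helper, and `find_word` are assumptions for illustration; the crate's own `Mode` type may be implemented differently, and word boundaries here are simply "not an ASCII letter or digit".

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for the crate's Mode flags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mode(u8);

#[allow(dead_code)]
impl Mode {
    const HTML: Mode = Mode(1 << 0);
    const FORBID: Mode = Mode(1 << 1);
    const ENGLISH: Mode = Mode(1 << 2);
    const IGNORE_CASE: Mode = Mode(1 << 3);

    /// True when every bit of `flag` is set in `self`.
    fn contains(self, flag: Mode) -> bool {
        self.0 & flag.0 == flag.0
    }
}

impl BitOr for Mode {
    type Output = Mode;
    fn bitor(self, rhs: Mode) -> Mode {
        Mode(self.0 | rhs.0)
    }
}

/// Byte offset of the first acceptable occurrence of `word`
/// (a lowercase dictionary entry) in `text`.
fn find_word(text: &str, word: &str, mode: Mode) -> Option<usize> {
    if word.is_empty() {
        return None;
    }
    // ASCII lowercasing keeps byte offsets identical to the original text.
    let haystack = if mode.contains(Mode::IGNORE_CASE) {
        text.to_ascii_lowercase()
    } else {
        text.to_owned()
    };
    let bytes = haystack.as_bytes();
    let mut from = 0;
    while let Some(rel) = haystack[from..].find(word) {
        let start = from + rel;
        let end = start + word.len();
        let on_boundary = (start == 0 || !bytes[start - 1].is_ascii_alphanumeric())
            && (end == bytes.len() || !bytes[end].is_ascii_alphanumeric());
        if !mode.contains(Mode::ENGLISH) || on_boundary {
            return Some(start);
        }
        // Step past the first character of this candidate and keep looking.
        from = start + haystack[start..].chars().next().map_or(1, char::len_utf8);
    }
    None
}
```

Lowercasing only the text, not the word, reflects the table's note that the dictionary must already be lowercase for case-insensitive matching.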
Combine flags with |, for example Mode::HTML | Mode::FORBID.
Dictionary save / load (optional feature)
Enable with features = ["save"]. Serializes a compiled Dictionary to a compact binary (postcard format). Works in both server and WASM environments.
iword-rs = { version = "0.1", features = ["save"] }
// Build once, save to disk
let dict = Dictionary::builder()
.load_file("dicts/prompt_injection.txt")?
.load_file("dicts/pii.txt")?
.build();
dict.save_to_file("dict.bin")?;
// Load at startup: no rebuild cost
let dict = Dictionary::load_from_file("dict.bin")?;
// Or work with raw bytes (useful in WASM)
let bytes: Vec<u8> = dict.save()?;
let dict2 = Dictionary::load(&bytes)?;
Convenience methods
use iword::{Dictionary, Mode, key};
let dict = Dictionary::builder()
.add_many(&["crash", "disk_full"], key::BLOCK)
.add("slow_query", key::THROTTLE)
.build();
// scan_first — stop at the first match
if let Some(m) = dict.scan_first("system crash detected", Mode::FORBID) {
println!("first: {:?} key={}", m.extract("system crash detected"), m.key);
}
// contains — bool check, no allocation
if dict.contains("disk_full on /var/log", Mode::FORBID) {
println!("alert!");
}
// severity — highest-severity (lowest key) match
let text = "slow_query caused disk_full";
if let Some(m) = dict.severity(text, Mode::FORBID) {
println!("worst: {:?} key={}", m.extract(text), m.key);
}
Classification and scoring
use iword::{Dictionary, Mode, key};
let dict = Dictionary::builder()
.add_many(&["crash", "shutdown", "panic"], key::BLOCK)
.add_many(&["disk_full", "oom"], key::ALERT)
.add_many(&["slow_query", "high_latency"], key::THROTTLE)
.build();
let text = "crash and shutdown caused disk_full";
// classify — dominant category (highest weighted score)
if let Some(r) = dict.classify(text, Mode::FORBID) {
println!("key={} score={}", r.key, r.score);
// → key=0 score=2.0 (BLOCK matched twice)
}
// score — per-key weighted totals
let scores = dict.score(text, Mode::FORBID);
// → { 0(BLOCK): 2.0, 1(ALERT): 1.0 }
// classify_with_weights — tune priorities at call time
// e.g. boost ALERT 3× for night-time monitoring
let r = dict.classify_with_weights(text, Mode::FORBID, &[(key::ALERT, 3.0)]);
// → key=1 (ALERT: 3.0) beats BLOCK (2.0)
// score_with_weights — same idea for raw scores
let scores = dict.score_with_weights(text, Mode::FORBID, &[(key::BLOCK, 10.0)]);
// → { 0(BLOCK): 20.0, 1(ALERT): 1.0 }
Dictionary-level weights
// Per-word weight via add_weighted()
let dict = Dictionary::builder()
.add_weighted("critical_crash", key::BLOCK, 5.0)
.add_weighted("minor_glitch", key::BLOCK, 0.5)
.build();
// Per-key weight via set_key_weight()
let dict = Dictionary::builder()
.add_many(&["crash", "panic"], key::BLOCK)
.set_key_weight(key::BLOCK, 10.0)
.build();
Final score per match = per-word weight × per-key weight × runtime weight (all default 1.0).
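The formula above can be checked with a small standalone model. This sketch is an assumption about how the three weights combine into per-key totals, not the crate's code: `Hit`, `weight_for`, `score`, and `classify` are hypothetical names, and ties are resolved in favor of the lower (more severe) key, which the source does not specify.

```rust
use std::collections::BTreeMap;

/// One match as seen by the scorer: category key plus per-word weight.
struct Hit {
    key: u8,
    word_weight: f64,
}

/// Looks up a weight for `key`, defaulting to 1.0 when none is configured.
fn weight_for(table: &[(u8, f64)], key: u8) -> f64 {
    table
        .iter()
        .find(|entry| entry.0 == key)
        .map_or(1.0, |entry| entry.1)
}

/// Per-key totals: sum over matches of word weight × key weight × runtime weight.
fn score(
    hits: &[Hit],
    key_weights: &[(u8, f64)],
    runtime_weights: &[(u8, f64)],
) -> BTreeMap<u8, f64> {
    let mut totals: BTreeMap<u8, f64> = BTreeMap::new();
    for h in hits {
        let s = h.word_weight
            * weight_for(key_weights, h.key)
            * weight_for(runtime_weights, h.key);
        *totals.entry(h.key).or_insert(0.0) += s;
    }
    totals
}

/// Dominant category: the key with the highest total.
/// Keys are visited in ascending order, so a tie keeps the lower key.
fn classify(totals: &BTreeMap<u8, f64>) -> Option<(u8, f64)> {
    let mut best: Option<(u8, f64)> = None;
    for (&k, &s) in totals {
        if best.map_or(true, |(_, b)| s > b) {
            best = Some((k, s));
        }
    }
    best
}
```

Feeding it two BLOCK hits and one ALERT hit, all with weight 1.0, reproduces the totals quoted in the classification example: BLOCK 2.0 and ALERT 1.0 by default, and ALERT 3.0 once a runtime weight of 3.0 is applied to ALERT.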
DictionaryBuilder
let dict = Dictionary::builder()
.add("word", 9) // single word
.add_weighted("important", 0, 5.0) // with per-word weight
.add_many(&["spam", "free", "prize"], 2) // multiple words, same key
.set_key_weight(key::BLOCK, 10.0) // per-key weight
.load_str("apple\t9\nbanana\t9\n") // from string
.load_file("extra.txt")? // from file
.merge(other_builder) // merge another builder
.build();
WebAssembly / Edge
iword-rs compiles to WebAssembly out of the box — no C, no system calls, no shared memory.
Browser
wasm-pack build --target web --features wasm
import init, { IwordDict } from './pkg/iword.js';
await init();
const dict = new IwordDict("shutdown\t0\ndisk_full\t1\nslow_query\t3\n");
dict.contains("slow_query on users table", true); // → true
dict.filter("system shutdown detected", true); // → "system ******** detected"
To test locally (WASM requires HTTP, not file://):
python3 -m http.server 8080
Via npm (browser, Node.js, Cloudflare Workers, Fastly, Deno):
npm install iword-rs
import init, { IwordDict } from 'iword-rs';
await init();
const dict = new IwordDict("shutdown\t0\ndisk_full\t1\n");
dict.filter("system shutdown detected", true); // → "system ******** detected"
Edge runtimes:
| Platform | Support |
|----------|---------|
| Cloudflare Workers | Native WASM |
| Fastly Compute | Native WASM |
| Deno Deploy | Native WASM |
| AWS Lambda@Edge | Via Node.js |
Cloudflare Workers example
import init, { IwordDict } from 'iword-rs';
import wasm from 'iword-rs/iword_bg.wasm';
await init(wasm);
const dict = new IwordDict("shutdown\t0\ndisk_full\t1\nslow_query\t3\n");
export default {
async fetch(request) {
const text = await request.text();
if (dict.contains(text, true)) {
const m = dict.severity(text, true);
return new Response(`blocked: key=${m.key}`, { status: 400 });
}
return new Response('ok');
},
};
Regex patterns (optional feature)
Enable with features = ["regex"]. Dictionary entries whose word field starts and ends with / are compiled as regular expressions using Rust's regex crate, which is automata-based, never backtracks, and searches in time linear in the text length.
iword-rs = { version = "0.1", features = ["regex"] }
let dict = Dictionary::builder()
.add("password", key::ALERT)
.load_str("/\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}/\t1\t10.0") // credit card
.load_str("/\\d{3}-\\d{2}-\\d{4}/\t0\t10.0") // SSN
.build();
let matches = dict.scan("card: 4111-1111-1111-1111", Mode::FORBID);
// → Match { position: 6, length: 19, key: 1 } (ALERT)
Dict file format with regex:
password\t1
/\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}/\t1\t10.0
/\d{3}-\d{2}-\d{4}/\t0\t10.0
/\S+@\S+\.\S+/\t1
Benchmarks
Measured on Apple M1 (MacBook Air), single core, cargo bench.
| Benchmark | Time |
|-----------|------|
| seek hit | ~10 ns |
| seek miss | ~8 ns |
| scan 60 chars, 20-word dict | ~8 µs |
| scan 6000 chars, 20-word dict | ~652 µs |
| scan no match, 27 chars | ~3.7 µs |
| filter 60 chars | ~13 µs |
| Build 20-word dict | ~3.5 µs |
Run yourself:
cargo bench
Pre-built dictionaries
Ready-to-use word lists in the dicts/ directory:
| File | Keys | Description |
|------|------|-------------|
| dicts/prompt_injection.txt | BLOCK (0) | Prompt injection / jailbreak patterns |
| dicts/pii.txt | ALERT (1) | PII keyword detection (SSN, credit card, API keys…) |
| dicts/pii-regex.txt | BLOCK/ALERT (0,1) | PII regex patterns — credit card, SSN, email, phone (regex feature required) |
| dicts/off_topic.txt | FLAG (2) | Off-topic signals (crypto, gambling…) |
Load and merge multiple dictionaries:
let dict = Dictionary::builder()
.load_file("dicts/prompt_injection.txt")?
.load_file("dicts/pii.txt")?
.load_file("dicts/off_topic.txt")?
.build();
CLI
cargo install iword-cli
iword-scan scan "system shutdown detected" --dict words.txt
iword-scan filter "system shutdown detected" --dict words.txt
iword-scan seek shutdown --dict words.txt
iword-scan classify "crash and disk_full" --dict words.txt
iword-scan score "crash and disk_full" --dict words.txt --json
# Compile dictionary to binary cache (faster startup)
iword-scan save --dict words.txt --out words.bin
iword-scan load "system shutdown detected" --cache words.bin
Example apps
Ready-to-run applications in the apps/ directory:
| App | Description |
|-----|-------------|
| apps/cli/ | iword-scan CLI — scan / filter / seek / classify / score |
| apps/axum-api/ | Axum REST API — POST /scan, /filter, /seek |
| apps/cf-worker/ | Cloudflare Workers edge filter — WASM, no external process |
| apps/wasm-react/ | React + WASM — real-time input filter demo |
| apps/llm-filter/ | rig + iword-rs — LLM input/output safety filter (batch demo) |
| apps/llm-chat/ | Interactive REPL with LLM filter — type prompts, see results live |
| apps/llm-api/ | Axum REST API with LLM filter — POST /chat with filter pipeline |
Credits
Original iWord algorithm by imos.
Multi-language extensions by 0xkaz.
This Rust port by 0xkaz.
