noisemake
v0.1.1
Controlled text perturbations that make polished text less obviously AI-generated.
noisemake
noisemake injects controlled, reproducible imperfections into text.
It is a TypeScript CLI and npm library for deterministic text perturbation, not an LLM rewriting tool. Same input, same seed, same options, same output.
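The determinism guarantee can be illustrated with a small seeded generator. This is a minimal sketch, not noisemake's internals: hashSeed, parkMiller, and toyPerturb are hypothetical names, and the Park-Miller generator is a stand-in for whatever the package actually uses. The point is only that every random decision flows from the seed, so repeated runs agree.

```typescript
// Hypothetical sketch: illustrates seed-driven determinism, not noisemake's real code.

const MODULUS = 2147483647; // 2^31 - 1, the Park-Miller modulus

// Fold a string seed into an integer in [1, MODULUS - 1].
function hashSeed(seed: string): number {
  let h = 7;
  for (let i = 0; i < seed.length; i++) {
    h = (h * 31 + seed.charCodeAt(i)) % MODULUS;
  }
  return h === 0 ? 1 : h;
}

// Park-Miller (minstd) generator: returns floats strictly between 0 and 1.
function parkMiller(initialState: number): () => number {
  let state = initialState;
  return () => {
    state = (state * 48271) % MODULUS;
    return state / MODULUS;
  };
}

// Toy perturbation: duplicate a character whenever the seeded generator says so.
function toyPerturb(text: string, seed: string, frequency: number): string {
  const rand = parkMiller(hashSeed(seed));
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    out += ch;
    if (rand() < 1 / frequency) out += ch;
  }
  return out;
}

const first = toyPerturb("deterministic output", "baseline", 5);
const second = toyPerturb("deterministic output", "baseline", 5);
console.log(first === second); // true: same input, same seed, same options
```

Changing the seed or the frequency changes the sequence of decisions, which is why all three inputs have to match for the output to match.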
The current package supports:
- Chinese IME-style wrong-word substitutions
- English keyboard typos, including insertion-like slips
- Spacing glitches in English words, after punctuation, and across Chinese-English boundaries
- Punctuation normalization from full-width Chinese marks to ASCII marks
- Adjacent word swaps
- Light repetition
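As one concrete illustration of the list above, punctuation normalization can be sketched as a lookup from full-width marks to ASCII marks. The table and the normalizePunctuation name are assumptions made for this example; the package's real mapping may differ, and the package presumably applies such substitutions only at the configured frequency rather than to every mark as this sketch does.

```typescript
// Hypothetical mapping from full-width Chinese marks to ASCII marks.
// The package's actual table is not documented here and may differ.
const FULLWIDTH_TO_ASCII: { [mark: string]: string } = {
  "，": ",",
  "。": ".",
  "！": "!",
  "？": "?",
  "：": ":",
  "；": ";",
  "（": "(",
  "）": ")",
};

// Replace every mapped full-width mark; leave all other characters untouched.
function normalizePunctuation(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const mapped = FULLWIDTH_TO_ASCII[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}

console.log(normalizePunctuation("你好，世界！")); // 你好,世界!
```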
The browser playground is live at https://noisemake.xyz.
Quick Start
CLI:
npx noisemake "这个 parser 很 stable" --frequency 200 --seed baseline
echo "这是一段测试文本" | npx noisemake --frequency 1000 --seed 42
npx noisemake --file ./input.txt --seed 42
npx noisemake --file ./input.txt --out ./output.txt --seed 42
Options:
--frequency <n> Average one perturbation per n eligible tokens (default: "200")
--seed <seed> Seed for deterministic output
--types <list> Enabled noise types: typo,repeat,spacing,punct,swap (default: "typo,repeat,spacing,punct,swap")
--languages <list> Enabled languages: zh,en (default: "zh,en")
--file <path> Read input text from a UTF-8 file
--out <path> Write output text to a UTF-8 file, creating parent directories if needed
-h, --help Show help
Input and output:
- Use exactly one input source: positional text, stdin, or --file <path>.
- Multiple positional text arguments are joined with a single space.
- --file reads UTF-8 text from a file.
- --out writes UTF-8 output to a file instead of stdout, and creates parent directories if needed.
- Input formatting is preserved. Existing trailing newlines stay unchanged.
- --frequency 100 is noisier than --frequency 1000.
- Short text can legitimately produce no changes.
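The last two points follow from how --frequency is defined. As a rough sketch, assume each eligible token is perturbed independently with probability 1 / frequency. That independence model is an assumption made here for illustration, and expectedPerturbations and chanceOfNoChange are hypothetical helpers, not part of the package.

```typescript
// Assumed model (hypothetical): each eligible token is perturbed
// independently with probability 1 / frequency.

// Average number of perturbations for a text with the given eligible token count.
function expectedPerturbations(eligibleTokens: number, frequency: number): number {
  return eligibleTokens / frequency;
}

// Probability that no token at all is perturbed under the same model.
function chanceOfNoChange(eligibleTokens: number, frequency: number): number {
  return Math.pow(1 - 1 / frequency, eligibleTokens);
}

console.log(expectedPerturbations(1000, 100)); // 10
console.log(expectedPerturbations(1000, 1000)); // 1
console.log(chanceOfNoChange(8, 200).toFixed(2)); // 0.96
```

Under this model, an 8-token input at the default frequency of 200 comes back unchanged about 96% of the time, which is why an untouched short text is not a bug.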
Library:
import { noisemake } from "noisemake";
const output = noisemake("这个 parser 很 stable", {
frequency: 200,
seed: "baseline",
types: ["typo", "repeat"],
languages: ["zh", "en"],
});
Documentation
If you just want to use the package:
- This README is enough for the public surface.
If you want to understand how the package is shaped:
- Documentation map
- Core docs index
- Core repo map
- Core package plan and decisions
- Core history and superseded assumptions
If you want to work on the web playground:
- Live web playground
- Web docs index
- Web package plan
- Web design system
- Web design brief
- Web implementation spec
- Web package commands
If you want research context:
If you need licensing and bundled data provenance:
- Legal and data-boundary docs
- License boundary summary
- NOTICE
- Rime Luna source note
- Rime Pinyin Simplified source note
If you are changing code in this repo:
Data License
The project code is MIT. Bundled Chinese IME confusion data is derived from LGPL-3.0-or-later Rime dictionary data and remains LGPL-covered data. See NOTICE, third_party/rime-luna-pinyin/SOURCE.md, and third_party/rime-pinyin-simp/SOURCE.md.
