noisemake

v0.1.1, published 2 months ago

Controlled text perturbations that make polished text less obviously AI-generated.


sorcererxw


noisemake injects controlled, reproducible imperfections into text.

It is a TypeScript CLI and npm library for deterministic text perturbation, not an LLM rewriting tool. Same input, same seed, same options, same output.

The current package supports:

Chinese IME-style wrong-word substitutions
English keyboard typos, including insertion-like slips
Spacing glitches in English words, after punctuation, and across Chinese-English boundaries
Punctuation normalization from full-width Chinese marks to ASCII marks
Adjacent word swaps
Light repetition
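To make the idea of seeded, reproducible perturbation concrete, here is a minimal sketch of one of the listed noise types, adjacent word swaps. It is not the package's implementation. The names hashSeed, mulberry32, and swapAdjacentWords are hypothetical, and the choice of FNV-1a hashing plus the mulberry32 generator is an assumption made for illustration.

```typescript
// Hypothetical helper: hash a string seed to a 32-bit unsigned integer (FNV-1a).
function hashSeed(seed: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < seed.length; i++) {
    h ^= seed.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hypothetical helper: mulberry32, a small PRNG returning floats in [0, 1).
function mulberry32(state: number): () => number {
  let a = state >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Swap neighbouring words with probability 1/frequency per candidate pair.
// Whitespace runs are kept exactly as they were in the input.
function swapAdjacentWords(text: string, frequency: number, seed: string): string {
  const rand = mulberry32(hashSeed(seed));
  const parts = text.split(/(\s+)/); // words at even indices, whitespace at odd
  for (let i = 0; i + 2 < parts.length; i += 2) {
    const left = parts[i];
    const right = parts[i + 2];
    if (left === undefined || right === undefined) continue;
    if (left === "" || right === "") continue; // leading/trailing whitespace
    if (rand() < 1 / frequency) {
      parts[i] = right;
      parts[i + 2] = left;
      i += 2; // skip ahead so the same word is not swapped twice
    }
  }
  return parts.join("");
}
```

Because every random decision comes from a generator initialised from the seed, calling swapAdjacentWords twice with the same text, frequency, and seed returns the same string, which is the property the package describes.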

The browser playground is live at https://noisemake.xyz.

Quick Start

CLI:

npx noisemake "这个 parser 很 stable" --frequency 200 --seed baseline
echo "这是一段测试文本" | npx noisemake --frequency 1000 --seed 42
npx noisemake --file ./input.txt --seed 42
npx noisemake --file ./input.txt --out ./output.txt --seed 42

Options:

--frequency <n>     Average one perturbation per n eligible tokens (default: "200")
--seed <seed>       Seed for deterministic output
--types <list>      Enabled noise types: typo,repeat,spacing,punct,swap (default: "typo,repeat,spacing,punct,swap")
--languages <list>  Enabled languages: zh,en (default: "zh,en")
--file <path>       Read input text from a UTF-8 file
--out <path>        Write output text to a UTF-8 file, creating parent directories if needed
-h, --help          Show help

Input and output:

Use exactly one input source: positional text, stdin, or --file <path>.
Multiple positional text arguments are joined with a single space.
--file reads UTF-8 text from a file.
--out writes UTF-8 output to a file instead of stdout, and creates parent directories if needed.
Input formatting is preserved. Existing trailing newlines stay unchanged.
Lower --frequency values are noisier: --frequency 100 averages one perturbation per 100 eligible tokens, --frequency 1000 one per 1000.
Short text can legitimately produce no changes.
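The last two points follow from simple arithmetic. The sketch below assumes each eligible token is perturbed independently with probability 1/n. That is a simplifying assumption for illustration, not a description of how the package samples. The function names are hypothetical.

```typescript
// Expected number of perturbations for a text with `tokens` eligible tokens,
// given "one perturbation per `frequency` eligible tokens" on average.
function expectedPerturbations(tokens: number, frequency: number): number {
  return tokens / frequency;
}

// Probability that a text comes back unchanged, ASSUMING each eligible token
// is perturbed independently with probability 1/frequency.
function probabilityOfNoChange(tokens: number, frequency: number): number {
  return Math.pow(1 - 1 / frequency, tokens);
}

// A 20-token sentence at the default frequency of 200:
const expected = expectedPerturbations(20, 200); // 0.1 perturbations on average
const unchanged = probabilityOfNoChange(20, 200); // roughly 0.90 under the assumption
console.log(expected, unchanged.toFixed(2));
```

Under that assumption, a 20-token input at the default frequency is left untouched about nine times out of ten, so an unchanged short output is expected behaviour rather than a failure.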

Library:

import { noisemake } from "noisemake";

const output = noisemake("这个 parser 很 stable", {
  frequency: 200,
  seed: "baseline",
  types: ["typo", "repeat"],
  languages: ["zh", "en"],
});
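Since the package promises identical output for identical input, seed, and options, a caller may want to check that property in their own tests. The helper below is a hypothetical sketch, not part of the package. It accepts any function with a (text, options) signature, and standIn is a placeholder so the block runs without installing anything.

```typescript
// Hypothetical helper: call `perturb` several times with the same arguments
// and report whether every result matches the first one.
function isReproducible<O>(
  perturb: (text: string, options: O) => string,
  text: string,
  options: O,
  runs: number = 3,
): boolean {
  const first = perturb(text, options);
  for (let i = 1; i < runs; i++) {
    if (perturb(text, options) !== first) return false;
  }
  return true;
}

// Placeholder function standing in for a real perturbation function.
const standIn = (text: string, options: { seed: string }): string =>
  `${text}#${options.seed}`;

console.log(isReproducible(standIn, "stable parser", { seed: "baseline" }));
```

In a project that has the package installed, the imported noisemake function could be passed in place of standIn, together with the same options object shown above.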

Documentation

If you just want to use the package: this README is enough for the public surface.

If you want to understand how the package is shaped:

If you want to work on the web playground:

If you want research context:

If you need licensing and bundled data provenance:

If you are changing code in this repo: AGENTS.md

Data License

The project code is MIT. Bundled Chinese IME confusion data is derived from LGPL-3.0-or-later Rime dictionary data and remains LGPL-covered data. See NOTICE, third_party/rime-luna-pinyin/SOURCE.md, and third_party/rime-pinyin-simp/SOURCE.md.
