unicode-escaper
v1.0.1
A robust Unicode escape/unescape library supporting multiple formats with streaming support
unicode-escaper
A robust, zero-dependency Unicode escape/unescape library for JavaScript and TypeScript. Supports multiple escape formats, bidirectional conversion, and streaming for large files.
Features
- Multiple escape formats: \uXXXX, \u{XXXXX}, \xNN, &#xNNNN;, &#NNNN;, U+XXXX
- Bidirectional: Both escape and unescape in one package
- Streaming support: Process large files efficiently with Node.js and Web Streams
- Full Unicode support: Handles BMP, supplementary planes, surrogate pairs, and emoji
- Zero dependencies: Lightweight and fast
- TypeScript-first: Written in TypeScript with strict types
- Dual ESM/CJS: Works with both module systems
- Customizable filters: Control exactly which characters to escape
Installation
npm install unicode-escaper
# or
pnpm add unicode-escaper
# or
yarn add unicode-escaper
Quick Start
import { escape, unescape } from "unicode-escaper";
// Escape non-ASCII characters
escape("Hello 世界");
// => 'Hello \u4E16\u754C'
// Unescape back to original
unescape("Hello \\u4E16\\u754C");
// => 'Hello 世界'
Escape Formats
| Format | Example | Description |
| -------------- | ---------- | -------------------------------------------- |
| unicode | \u4E16 | Standard JavaScript Unicode escape (default) |
| unicode-es6 | \u{4E16} | ES6 Unicode escape (supports full range) |
| hex | \xE9 | Hex escape (0x00-0xFF only) |
| html-hex | &#x4E16; | HTML hexadecimal entity |
| html-decimal | &#19990; | HTML decimal entity |
| codepoint | U+4E16 | Unicode code point notation |
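To make the table concrete, here is a minimal sketch of how a single code point could be rendered in each of the six formats. `renderCodePoint` and the `Format` type are hypothetical names written for this illustration, not part of the library's API, and the digit padding (four digits for `\uXXXX` and `U+XXXX`, none for the ES6 and HTML forms) is an assumption rather than documented behavior.

```typescript
// Hypothetical helper for illustration only; not unicode-escaper's implementation.
type Format =
  | "unicode"
  | "unicode-es6"
  | "hex"
  | "html-hex"
  | "html-decimal"
  | "codepoint";

function renderCodePoint(cp: number, format: Format): string {
  // Uppercase hex, left-padded with zeros to at least `width` digits.
  const hex = (n: number, width: number): string =>
    n.toString(16).toUpperCase().padStart(width, "0");

  switch (format) {
    case "unicode": {
      if (cp <= 0xffff) return `\\u${hex(cp, 4)}`;
      // Code points above U+FFFF do not fit in four hex digits,
      // so they are written as a UTF-16 surrogate pair.
      const offset = cp - 0x10000;
      const high = 0xd800 + (offset >> 10);
      const low = 0xdc00 + (offset & 0x3ff);
      return `\\u${hex(high, 4)}\\u${hex(low, 4)}`;
    }
    case "unicode-es6":
      return `\\u{${hex(cp, 1)}}`;
    case "hex":
      if (cp > 0xff) throw new RangeError("\\xNN covers 0x00-0xFF only");
      return `\\x${hex(cp, 2)}`;
    case "html-hex":
      return `&#x${hex(cp, 1)};`;
    case "html-decimal":
      return `&#${cp};`;
    case "codepoint":
      return `U+${hex(cp, 4)}`;
  }
}

console.log(renderCodePoint(0x4e16, "unicode")); // \u4E16
console.log(renderCodePoint(0x1f600, "unicode")); // \uD83D\uDE00
console.log(renderCodePoint(0x1f600, "unicode-es6")); // \u{1F600}
```

The surrogate-pair branch is why the default format produces two escapes for an emoji while the ES6 format produces one.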
API Reference
Core Functions
escape(input, options?)
Escapes Unicode characters in a string.
import { escape } from "unicode-escaper";
// Default: preserve ASCII, escape everything else
escape("Café 世界 😀");
// => 'Caf\u00E9 \u4E16\u754C \uD83D\uDE00'
// Use ES6 format for emoji (cleaner output)
escape("Hello 😀", { format: "unicode-es6" });
// => 'Hello \u{1F600}'
// HTML entities
escape("Café", { format: "html-hex" });
// => 'Caf&#xE9;'
escape("Café", { format: "html-decimal" });
// => 'Caf&#233;'
// Escape everything (including ASCII)
escape("Hi", { preserveAscii: false });
// => '\u0048\u0069'
// Preserve Latin-1 characters
escape("Café 世界", { preserveLatin1: true });
// => 'Café \u4E16\u754C'
// Lowercase hex digits
escape("世", { uppercase: false });
// => '\u4e16'
unescape(input, options?)
Unescapes Unicode sequences back to characters.
import { unescape } from "unicode-escaper";
// Automatically detects and unescapes all formats
unescape("\\u4E16"); // => '世'
unescape("\\u{1F600}"); // => '😀'
unescape("\\xE9"); // => 'é'
unescape("&#x4E16;"); // => '世'
unescape("&#19990;"); // => '世'
unescape("U+4E16"); // => '世'
// Handle surrogate pairs
unescape("\\uD83D\\uDE00"); // => '😀'
// Only unescape specific formats
unescape("\\u4E16 &#x4E16;", { formats: ["unicode"] });
// => '世 &#x4E16;'
// Strict mode (throws on invalid sequences)
unescape("\\uZZZZ", { lenient: false });
// => throws Error
Convenience Functions
import {
escapeToUnicode, // \uXXXX format
escapeToUnicodeES6, // \u{XXXXX} format
escapeToHex, // \xNN format
escapeToHtmlHex, // &#xNNNN; format
escapeToHtmlDecimal, // &#NNNN; format
escapeToCodePoint, // U+XXXX format
escapeAll, // Escape all characters
escapeNonPrintable, // Escape control chars and non-ASCII
} from "unicode-escaper";
escapeToUnicodeES6("😀"); // => '\u{1F600}'
escapeToHtmlHex("世"); // => '&#x4E16;'
escapeAll("Hi"); // => '\u0048\u0069'
import {
unescapeUnicode, // Only \uXXXX
unescapeUnicodeES6, // Only \u{XXXXX}
unescapeHex, // Only \xNN
unescapeHtmlHex, // Only &#xNNNN;
unescapeHtmlDecimal, // Only &#NNNN;
unescapeCodePoint, // Only U+XXXX
unescapeHtml, // Both HTML formats
unescapeJs, // All JavaScript formats
} from "unicode-escaper";
Custom Filters
Control which characters to escape using filter functions:
import { escape, isNotAscii, isNotBmp, and, or, oneOf } from "unicode-escaper";
// Escape only non-ASCII (default behavior)
escape("Hello 世界", { filter: isNotAscii });
// Escape only emoji (non-BMP characters)
escape("Hello 世界 😀", { filter: isNotBmp });
// => 'Hello 世界 \uD83D\uDE00'
// Escape vowels
escape("Hello", { filter: oneOf("aeiouAEIOU") });
// => 'H\u0065ll\u006F'
// Combine filters
escape("Test", { filter: and(isNotAscii, isNotBmp) });
Available filters:
- isAscii/isNotAscii - ASCII range (0x00-0x7F)
- isLatin1/isNotLatin1 - Latin-1 range (0x00-0xFF)
- isBmp/isNotBmp - Basic Multilingual Plane (0x0000-0xFFFF)
- isPrintableAscii/isNotPrintableAscii - Printable ASCII (0x20-0x7E)
- isControl - Control characters
- isWhitespace - Whitespace characters
- isSurrogate/isHighSurrogate/isLowSurrogate - Surrogate code points
- inRange(start, end)/notInRange(start, end) - Custom range
- oneOf(chars)/noneOf(chars) - Character set
- and(...filters)/or(...filters)/not(filter) - Combinators
- all/none - Always true/false
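Filters and combinators of this kind are plain predicates over a character and its code point. The sketch below re-implements a few of them locally to show how they compose. The names mirror the list above, but the bodies are assumptions written for illustration, and `escapeWith` is a hypothetical stand-in for `escape` that always emits the `\u{...}` form.

```typescript
// Local re-implementations for illustration; not imported from unicode-escaper.
type FilterFunction = (char: string, codePoint: number) => boolean;

const isNotAscii: FilterFunction = (_char, cp) => cp > 0x7f;
const isNotBmp: FilterFunction = (_char, cp) => cp > 0xffff;

// Matches code points in the inclusive range [start, end].
const inRange = (start: number, end: number): FilterFunction =>
  (_char, cp) => cp >= start && cp <= end;

// Matches any character contained in `chars` (split by code point).
const oneOf = (chars: string): FilterFunction => {
  const set = new Set(Array.from(chars));
  return (char) => set.has(char);
};

// Combinators: build a new predicate from existing ones.
const and = (...filters: FilterFunction[]): FilterFunction =>
  (char, cp) => filters.every((f) => f(char, cp));
const or = (...filters: FilterFunction[]): FilterFunction =>
  (char, cp) => filters.some((f) => f(char, cp));
const not = (filter: FilterFunction): FilterFunction =>
  (char, cp) => !filter(char, cp);

// Hypothetical driver: escape every character the filter accepts.
function escapeWith(input: string, filter: FilterFunction): string {
  let out = "";
  // Array.from splits by code point, so surrogate pairs stay together.
  for (const char of Array.from(input)) {
    const cp = char.codePointAt(0) ?? 0;
    out += filter(char, cp) ? `\\u{${cp.toString(16).toUpperCase()}}` : char;
  }
  return out;
}

// Escape non-ASCII characters that are still inside the BMP; leave emoji alone.
const bmpOnly = and(isNotAscii, not(isNotBmp));
console.log(escapeWith("é😀", bmpOnly)); // \u{E9}😀
```

Because every filter shares one signature, a custom predicate can be mixed freely with the built-in ones inside `and`, `or`, and `not`.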
Utility Functions
import {
getCodePoint, // Get code point of a character
fromCodePoint, // Create character from code point
getCharInfo, // Get detailed character information
toCodePoints, // Convert string to code point array
fromCodePoints, // Convert code point array to string
codePointLength, // Get length in code points (not UTF-16)
toHex, // Convert code point to hex string
parseHex, // Parse hex string to code point
isValidUnicode, // Check for unpaired surrogates
normalizeNFC, // Normalize to NFC
normalizeNFD, // Normalize to NFD
unicodeEquals, // Compare Unicode equivalence
} from "unicode-escaper";
// Get code point
getCodePoint("😀"); // => 128512 (0x1F600)
// Character info
getCharInfo("😀");
// => {
// char: '😀',
// codePoint: 128512,
// hex: '1F600',
// isAscii: false,
// isBmp: false,
// isLatin1: false,
// isHighSurrogate: false,
// isLowSurrogate: false,
// utf16Length: 2
// }
// Code point length (differs from string.length for emoji)
"😀".length; // => 2 (UTF-16 code units)
codePointLength("😀"); // => 1 (code points)
// Parse various formats
parseHex("U+1F600"); // => 128512
parseHex("0x4E16"); // => 19990
parseHex("\\u{4E16}"); // => 19990
Streaming Support
Process large files efficiently without loading everything into memory:
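One subtlety of chunked processing is that a chunk may end in the middle of a surrogate pair, so the two halves of one character arrive separately. The sketch below shows one common way to handle this: hold back a trailing high surrogate until the next chunk arrives. `ChunkedEscaper` and `escapeNonAscii` are hypothetical names for this illustration, and whether the library's streams work this way is an assumption, not something stated here.

```typescript
// Illustrative only; not the library's EscapeStream.
function escapeNonAscii(text: string): string {
  let out = "";
  // Array.from splits by code point, keeping complete surrogate pairs together.
  for (const char of Array.from(text)) {
    const cp = char.codePointAt(0) ?? 0;
    out += cp > 0x7f ? `\\u{${cp.toString(16).toUpperCase()}}` : char;
  }
  return out;
}

class ChunkedEscaper {
  // A high surrogate held back from the end of the previous chunk, if any.
  private pending = "";

  push(chunk: string): string {
    let text = this.pending + chunk;
    this.pending = "";
    if (text.length > 0) {
      const last = text.charCodeAt(text.length - 1);
      // 0xD800-0xDBFF is the high-surrogate range: its partner may be in the next chunk.
      if (last >= 0xd800 && last <= 0xdbff) {
        this.pending = text.slice(-1);
        text = text.slice(0, -1);
      }
    }
    return escapeNonAscii(text);
  }

  // Call once at end of input; an unpaired surrogate is escaped on its own.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    return escapeNonAscii(rest);
  }
}

// "a😀b" split in the middle of the emoji's surrogate pair.
const escaper = new ChunkedEscaper();
const parts = [escaper.push("a\uD83D"), escaper.push("\uDE00b"), escaper.flush()];
console.log(parts.join("")); // a\u{1F600}b
```

The same hold-back idea applies when unescaping, where an escape sequence such as `\u4E16` can itself be split across two chunks.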
Node.js Streams
import { createReadStream, createWriteStream } from "fs";
import { pipeline } from "stream/promises";
import { EscapeStream, UnescapeStream } from "unicode-escaper";
// Escape a file
await pipeline(
createReadStream("input.txt", "utf8"),
new EscapeStream({ escapeOptions: { format: "unicode-es6" } }),
createWriteStream("escaped.txt")
);
// Unescape a file
await pipeline(
createReadStream("escaped.txt", "utf8"),
new UnescapeStream(),
createWriteStream("output.txt")
);
Web Streams API
import {
createWebEscapeStream,
createWebUnescapeStream,
} from "unicode-escaper";
// Works in browsers and modern Node.js
const response = await fetch("data.txt");
const escaped = response.body
.pipeThrough(new TextDecoderStream())
.pipeThrough(createWebEscapeStream({ format: "html-hex" }))
.pipeThrough(new TextEncoderStream());
Detection Utilities
import { hasEscapeSequences, countEscapeSequences } from "unicode-escaper";
hasEscapeSequences("\\u4E16"); // => true
hasEscapeSequences("Hello"); // => false
countEscapeSequences("\\u4E16\\u754C"); // => 2
// Filter by format
hasEscapeSequences("\\u4E16", ["unicode"]); // => true
hasEscapeSequences("\\u4E16", ["html-hex"]); // => false
TypeScript Support
Full TypeScript support with strict types:
import type {
EscapeFormat,
EscapeOptions,
UnescapeOptions,
FilterFunction,
CharacterInfo,
EscapeResult,
} from "unicode-escaper";
// Type-safe options
const options: EscapeOptions = {
format: "unicode-es6",
preserveAscii: true,
uppercase: true,
};
// Custom filter with proper typing
const myFilter: FilterFunction = (char, codePoint) => {
return codePoint > 0x7f;
};
Comparison with escape-unicode
| Feature | escape-unicode | unicode-escaper |
| --------------- | ---------------- | --------------- |
| Escape formats | \uXXXX only | 6 formats |
| Unescape | Separate package | Built-in |
| Streaming | No | Yes |
| Web Streams | No | Yes |
| ESM + CJS | CJS only | Both |
| Browser support | Node only | Both |
| TypeScript | Yes | Yes (strict) |
| Zero deps | Yes | Yes |
International Language Support
Fully tested with diverse Unicode scripts:
| Language | Script | Example | Escaped |
| ---------- | ----------------------- | -------------- | -------------------------------------- |
| Korean | Hangul | 안녕하세요 | \uC548\uB155\uD558\uC138\uC694 |
| Japanese | Hiragana/Katakana/Kanji | こんにちは | \u3053\u3093\u306B\u3061\u306F |
| Arabic | Arabic | مرحبا | \u0645\u0631\u062D\u0628\u0627 |
| Thai | Thai | สวัสดี | \u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35 |
| Russian | Cyrillic | Привет | \u041F\u0440\u0438\u0432\u0435\u0442 |
| Hindi | Devanagari | नमस्ते | \u0928\u092E\u0938\u094D\u0924\u0947 |
| Chinese | Han | 你好 | \u4F60\u597D |
| Vietnamese | Latin Extended | Xin chào | Xin ch\u00E0o |
| French | Latin Extended | Café | Caf\u00E9 |
| Turkish | Latin Extended | Türkçe | T\u00FCrk\u00E7e |
| Spanish | Latin Extended | ¡Hola! | \u00A1Hola! |
| Portuguese | Latin Extended | São Paulo | S\u00E3o Paulo |
import { escape, unescape } from "unicode-escaper";
// Korean
escape("안녕하세요"); // => '\uC548\uB155\uD558\uC138\uC694'
// Japanese (mixed scripts)
escape("東京 とうきょう トウキョウ");
// Arabic (RTL)
escape("مرحبا"); // => '\u0645\u0631\u062D\u0628\u0627'
// Thai (with tone marks)
escape("สวัสดี");
// Russian
escape("Привет"); // => '\u041F\u0440\u0438\u0432\u0435\u0442'
// Hindi (with combining marks)
escape("नमस्ते"); // => '\u0928\u092E\u0938\u094D\u0924\u0947'
// Chinese
escape("你好世界"); // => '\u4F60\u597D\u4E16\u754C'
// Vietnamese (with diacritics)
escape("Xin chào"); // => 'Xin ch\u00E0o'
// Turkish (special i variants)
escape("İstanbul"); // => '\u0130stanbul'
// Spanish (inverted punctuation)
escape("¡Hola!"); // => '\u00A1Hola!'
// Portuguese (tildes and cedilla)
escape("São Paulo"); // => 'S\u00E3o Paulo'
// Mixed multi-language content
const mixed = "Hello 안녕 こんにちは 你好 مرحبا สวัสดี Привет नमस्ते";
unescape(escape(mixed)) === mixed; // => true
Supported Features
- Combining characters: Thai tone marks, Arabic diacritics, Hindi matras/virama, Vietnamese diacritics
- Bidirectional text: RTL markers, mixed LTR/RTL content
- Native numerals: Thai ๒๐๒๔, Arabic ٢٠٢٤, Devanagari २०२४
- Conjunct consonants: Hindi samyuktakshar (क्ष, त्र, ज्ञ)
- Supplementary planes: Emoji, ancient scripts, mathematical symbols
- Normalization: Handles NFC/NFD forms correctly
- Extended Latin: French accents, Turkish special i (ı İ), Spanish ñ, Portuguese ã/õ
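Normalization matters for escaping because the same visible text can be stored as different code point sequences, which then escape differently. The sketch below uses only the standard `String.prototype.normalize` to show the two forms of "café". The `codePoints` helper is a hypothetical name for this illustration and does not use the library.

```typescript
// Standard JavaScript only; `codePoints` is a local helper for illustration.
const nfc = "caf\u00E9"; // é as one precomposed code point (U+00E9)
const nfd = nfc.normalize("NFD"); // e followed by U+0301 COMBINING ACUTE ACCENT

// List a string's code points in U+XXXX notation.
const codePoints = (s: string): string[] =>
  Array.from(
    s,
    (ch) => "U+" + (ch.codePointAt(0) ?? 0).toString(16).toUpperCase().padStart(4, "0")
  );

console.log(codePoints(nfc).join(" ")); // U+0063 U+0061 U+0066 U+00E9
console.log(codePoints(nfd).join(" ")); // U+0063 U+0061 U+0066 U+0065 U+0301

// The strings render identically but are not equal until normalized to the same form.
console.log(nfc === nfd); // false
console.log(nfc === nfd.normalize("NFC")); // true
```

When comparing escaped output from different sources, normalizing both inputs to one form first avoids false mismatches like the one shown here.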
Browser Support
Works in all modern browsers that support ES2022. For older browsers, you may need polyfills for:
- String.prototype.codePointAt
- String.fromCodePoint
- Web Streams API (if using streaming)
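A quick way to decide whether polyfills are needed is to check for these features at startup. The following is a minimal sketch using plain feature detection. `missingFeatures` is a hypothetical helper, and using `TransformStream` as the marker for Web Streams support is an assumption made for this example.

```typescript
// Hypothetical startup check; not part of unicode-escaper.
function missingFeatures(): string[] {
  const missing: string[] = [];
  if (typeof String.prototype.codePointAt !== "function") {
    missing.push("String.prototype.codePointAt");
  }
  if (typeof String.fromCodePoint !== "function") {
    missing.push("String.fromCodePoint");
  }
  // Only relevant when the streaming helpers are used.
  const globals = globalThis as unknown as Record<string, unknown>;
  if (typeof globals["TransformStream"] !== "function") {
    missing.push("TransformStream");
  }
  return missing;
}

const missing = missingFeatures();
if (missing.length > 0) {
  console.warn("Polyfills needed for: " + missing.join(", "));
}
```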
License
Apache-2.0
