universal-encoding-toolkit
v1.0.1
Universal Encoding Toolkit
A comprehensive encoding detection and conversion toolkit for Node.js — supports 100+ character encodings including CJK, Cyrillic, Arabic, Hebrew, Thai, and more.
Combines the encoding conversion power of iconv-lite with a custom-built multi-stage auto-detection engine.
Features
- 🔍 Auto-detect 100+ encodings from raw buffers — CJK, Cyrillic, Arabic, Hebrew, Thai, Latin, etc.
- 🔄 Encode / Decode / Transcode between any supported encodings
- 🧠 Smart decode — detect + decode in one step
- 📦 Single dependency (only iconv-lite)
- 💡 TypeScript type declarations included
- 🌊 Stream support — encode/decode streams for piping
Supported Encodings (10 Groups)
| Group | Count | Examples |
|-------|-------|---------|
| Node.js Built-in | 8 | utf8, ucs2, ascii, base64, hex |
| Unicode Extended | 7 | utf16be, utf32, utf7, utf7-imap |
| Windows Code Pages | 10 | windows-874, windows-1250 ~ 1258 |
| ISO-8859 Series | 15 | iso-8859-1 ~ iso-8859-16 |
| IBM/DOS Code Pages | 28 | cp437, cp850, cp866, cp1125... |
| Macintosh Encodings | 11 | macintosh, macgreek, macukraine... |
| KOI8 Series | 4 | koi8-r, koi8-u, koi8-ru, koi8-t |
| Other Single-byte | 12 | armscii8, viscii, tis620, mik... |
| CJK Multi-byte (DBCS) | 11 | GBK, GB18030, Big5, Shift_JIS, EUC-JP, EUC-KR |
| Common Aliases | 12 | latin1, chinese, korean, sjis... |
Installation
npm install universal-encoding-toolkit
Quick Start
const toolkit = require('universal-encoding-toolkit');
// 1. Encode & Decode
const buf = toolkit.encode('你好世界', 'gbk');
const str = toolkit.decode(buf, 'gbk');
console.log(str); // 你好世界
// 2. Auto-detect encoding
const detected = toolkit.detect(buf);
console.log(detected);
// { encoding: 'gbk', confidence: 0.88, source: 'gbk-analysis' }
// 3. Smart decode (detect + decode in one step)
const result = toolkit.smartDecode(buf);
console.log(result.text); // 你好世界
console.log(result.encoding); // gbk
console.log(result.confidence); // 0.88
// 4. Transcode (encoding → encoding)
const big5Buf = toolkit.transcode(buf, 'gbk', 'big5');
// 5. Check encoding support
toolkit.encodingExists('gbk'); // true
toolkit.encodingExists('chinese'); // true (alias)
// 6. Normalize encoding names
toolkit.normalize('sjis'); // 'shiftjis'
toolkit.normalize('latin1'); // 'iso-8859-1'
toolkit.normalize('chinese'); // 'gbk'
API Reference
Encoding Conversion
| Method | Description |
|--------|-------------|
| encode(str, encoding) | String → Buffer |
| decode(buffer, encoding) | Buffer → String |
| transcode(buffer, from, to) | Re-encode buffer from one encoding to another |
| encodeStream(encoding) | Create an encode stream for use with pipe() (strings in, bytes out) |
| decodeStream(encoding) | Create a decode stream for use with pipe() (bytes in, strings out) |
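Stream decoding differs from one-shot decoding in one important way: a multi-byte character can be split across two chunks. The following is a minimal sketch of that idea using only Node built-ins (`TextDecoder` and `stream.Transform`) and UTF-8 input. It is not the toolkit's implementation, and `decodeChunks` and `createUtf8DecodeStream` are hypothetical names chosen for illustration.

```javascript
const { Transform } = require('stream');

// Decode a sequence of UTF-8 byte chunks without corrupting characters that
// straddle a chunk boundary. With { stream: true }, TextDecoder keeps the
// incomplete trailing bytes and prepends them to the next chunk.
function decodeChunks(chunks) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush any bytes still buffered
}

// The same idea wrapped as a Transform so it can sit in a pipe() chain.
// Hypothetical helper, not part of the toolkit's API.
function createUtf8DecodeStream() {
  const decoder = new TextDecoder('utf-8');
  return new Transform({
    transform(chunk, _encoding, callback) {
      const text = decoder.decode(chunk, { stream: true });
      if (text) this.push(text);
      callback();
    },
    flush(callback) {
      const tail = decoder.decode();
      if (tail) this.push(tail);
      callback();
    },
  });
}

// '世界' is 6 bytes in UTF-8; split it in the middle of the first character.
const bytes = Buffer.from('世界', 'utf8');
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];
console.log(decodeChunks(chunks) === '世界'); // true
```

Decoding each chunk independently with `chunk.toString('utf8')` would instead produce replacement characters at the split point, which is why a stateful decoder is needed for piping.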
Encoding Detection
| Method | Description |
|--------|-------------|
| detect(buffer) | Auto-detect, returns { encoding, confidence, source } |
| detectAll(buffer) | Returns all candidates sorted by confidence |
| smartDecode(buffer) | Auto-detect + decode, returns { text, encoding, confidence } |
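Because detection is heuristic, callers may want to accept a result only above some confidence and otherwise fall back to a default. The sketch below works on plain objects shaped like the documented `{ encoding, confidence, source }` results. `pickEncoding`, the `'fallback'` source label, and the sample candidates are assumptions for illustration; the 0.7 default mirrors the threshold used in the batch conversion example.

```javascript
// Choose the highest-confidence candidate, or fall back when nothing is
// confident enough. Hypothetical helper, not part of the toolkit's API.
function pickEncoding(candidates, options = {}) {
  const { minConfidence = 0.7, fallback = 'utf-8' } = options;

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const best = sorted[0];

  if (!best || best.confidence < minConfidence) {
    return { encoding: fallback, confidence: 0, source: 'fallback' };
  }
  return best;
}

// Made-up candidate list in the documented result shape.
const sample = [
  { encoding: 'big5', confidence: 0.41, source: 'example' },
  { encoding: 'gbk', confidence: 0.88, source: 'example' },
];

console.log(pickEncoding(sample).encoding); // gbk
console.log(pickEncoding(sample, { minConfidence: 0.95 }).encoding); // utf-8
```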
Utilities
| Method | Description |
|--------|-------------|
| encodingExists(name) | Check if an encoding is supported |
| normalize(name) | Normalize encoding name (e.g. 'sjis' → 'shiftjis') |
| getSupportedEncodings() | Get flat list of all supported encoding names |
| getEncodingGroups() | Get encoding groups object |
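Name normalization of this kind is commonly done by lowercasing the name, stripping separators, and looking the result up in an alias table. The sketch below is one way that could work, not the toolkit's actual code. The four mappings from `sjis`, `latin1`, `chinese`, and `cp936` are the ones this README documents; the `normalizeName` function, the extra `iso88591` and `shiftjis` entries, and the pass-through behavior for unknown names are assumptions.

```javascript
// Hypothetical alias table keyed by the separator-free lowercase name.
const ALIASES = new Map([
  ['sjis', 'shiftjis'],
  ['shiftjis', 'shiftjis'],
  ['latin1', 'iso-8859-1'],
  ['iso88591', 'iso-8859-1'], // assumed: lets 'ISO-8859-1' map to itself
  ['chinese', 'gbk'],
  ['cp936', 'gbk'],
]);

function normalizeName(name) {
  // 'Shift_JIS', 'shift-jis' and 'SHIFTJIS' all collapse to 'shiftjis'.
  const key = String(name).trim().toLowerCase().replace(/[^a-z0-9]/g, '');
  // Unknown names are returned in their collapsed form (an assumption).
  return ALIASES.has(key) ? ALIASES.get(key) : key;
}

console.log(normalizeName('Shift_JIS')); // shiftjis
console.log(normalizeName(' Latin1 ')); // iso-8859-1
console.log(normalizeName('CP-936')); // gbk
```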
Real-World Examples
Read a file with unknown encoding
const fs = require('fs');
const toolkit = require('universal-encoding-toolkit');
const buf = fs.readFileSync('unknown-file.txt');
const { text, encoding, confidence } = toolkit.smartDecode(buf);
console.log(`Detected: ${encoding} (confidence: ${(confidence * 100).toFixed(1)}%)`);
console.log(text);
HTTP response decoding
const http = require('http');
const toolkit = require('universal-encoding-toolkit');
http.get('http://example.com/data', (res) => {
  const chunks = [];
  res.on('data', chunk => chunks.push(chunk));
  res.on('end', () => {
    const buf = Buffer.concat(chunks);
    const { text, encoding } = toolkit.smartDecode(buf);
    console.log(`Response encoding: ${encoding}`);
    console.log(text);
  });
});
Batch convert files to UTF-8
const fs = require('fs');
const path = require('path');
const toolkit = require('universal-encoding-toolkit');
function convertToUTF8(filePath) {
  const buf = fs.readFileSync(filePath);
  const detected = toolkit.detect(buf);
  if (detected.encoding !== 'utf-8' && detected.confidence > 0.7) {
    const text = toolkit.decode(buf, detected.encoding);
    fs.writeFileSync(filePath, Buffer.from('\uFEFF' + text, 'utf-8'));
    console.log(`Converted ${filePath}: ${detected.encoding} → utf-8`);
  }
}
Stream piping
const fs = require('fs');
const toolkit = require('universal-encoding-toolkit');
// Decode a GBK file to UTF-8 via streams
fs.createReadStream('input-gbk.txt')
  .pipe(toolkit.decodeStream('gbk'))
  .pipe(fs.createWriteStream('output-utf8.txt'));
Using with ES Modules / TypeScript
import toolkit from 'universal-encoding-toolkit';
// or import specific exports:
import { UniversalEncodingToolkit, EncodingDetector, normalizeEncoding } from 'universal-encoding-toolkit';
const buf = toolkit.encode('Hello, 世界!', 'utf-8');
const result = toolkit.smartDecode(buf);
// Full IntelliSense support with included type declarations
Detection Engine
The auto-detection engine uses a 9-stage pipeline:
Input Buffer
│
├─ Stage 1: BOM signature detection → confidence 1.0
├─ Stage 2: Pure ASCII fast path → confidence 1.0
├─ Stage 3: UTF-8 validation → confidence 0.85~0.99
├─ Stage 4: UTF-16 null-byte heuristics → confidence 0.80
├─ Stage 5: High-byte pattern analysis
├─ Stage 6: CJK multi-byte evaluation
│ (GBK/GB18030/Big5/Shift_JIS/EUC-JP/EUC-KR)
├─ Stage 7: Single-byte statistical scoring
│ (windows-125x, ISO-8859-x, KOI8-x, etc.)
├─ Stage 8: Arbitration (with short-text protection)
└─ Stage 9: Post-process disambiguation (12 known patterns)
Advanced Usage
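For readers who want to see what the first three stages of the Detection Engine pipeline involve, here is a rough sketch built only on Node built-ins. It is an illustration of the general techniques (BOM matching, an ASCII scan, strict UTF-8 validation), not the toolkit's code. The function name `detectEarlyStages` and the `source` labels are assumptions. The README gives 0.85~0.99 for UTF-8 validation without saying how the value is derived, so the sketch simply reports the lower bound.

```javascript
// BOM signatures, longest first so UTF-32LE (FF FE 00 00) is checked
// before UTF-16LE (FF FE), which shares its first two bytes.
const BOMS = [
  { encoding: 'utf-32le', bytes: [0xff, 0xfe, 0x00, 0x00] },
  { encoding: 'utf-32be', bytes: [0x00, 0x00, 0xfe, 0xff] },
  { encoding: 'utf-8', bytes: [0xef, 0xbb, 0xbf] },
  { encoding: 'utf-16le', bytes: [0xff, 0xfe] },
  { encoding: 'utf-16be', bytes: [0xfe, 0xff] },
];

// Hypothetical helper covering stages 1 to 3 only; returns null when none
// of them matches, where later stages would take over.
function detectEarlyStages(buf) {
  // Stage 1: BOM signature
  for (const { encoding, bytes } of BOMS) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return { encoding, confidence: 1.0, source: 'bom' };
    }
  }

  // Stage 2: pure ASCII fast path (no byte has the high bit set)
  if (buf.every((b) => b < 0x80)) {
    return { encoding: 'ascii', confidence: 1.0, source: 'ascii' };
  }

  // Stage 3: strict UTF-8 validation; fatal mode throws on invalid bytes
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return { encoding: 'utf-8', confidence: 0.85, source: 'utf8-validation' };
  } catch {
    return null; // not valid UTF-8
  }
}

console.log(detectEarlyStages(Buffer.from('hello')));
console.log(detectEarlyStages(Buffer.from('你好', 'utf8')));
console.log(detectEarlyStages(Buffer.from([0xc4, 0xe3, 0xba, 0xc3]))); // GBK bytes for 你好
```

The GBK input falls through all three checks and yields `null`, which is the case the multi-byte and statistical stages exist to handle.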
Custom instance
const { UniversalEncodingToolkit } = require('universal-encoding-toolkit');
const myToolkit = new UniversalEncodingToolkit();
const result = myToolkit.detect(someBuffer);
Access sub-modules
const {
  EncodingDetector,
  normalizeEncoding,
  ENCODING_GROUPS,
  ENCODING_ALIASES
} = require('universal-encoding-toolkit');
// Use detector directly
const detector = new EncodingDetector();
const result = detector.detectAll(buffer);
// Normalize encoding names
normalizeEncoding('sjis'); // 'shiftjis'
normalizeEncoding('cp936'); // 'gbk'
License
MIT © Universal Encoding Team
