universal-encoding-toolkit

v1.0.1

Universal Encoding Toolkit

A comprehensive encoding detection and conversion toolkit for Node.js — supports 100+ character encodings including CJK, Cyrillic, Arabic, Hebrew, Thai, and more.

Combines the encoding conversion power of iconv-lite with a custom-built multi-stage auto-detection engine.


Features

  • 🔍 Auto-detect 100+ encodings from raw buffers — CJK, Cyrillic, Arabic, Hebrew, Thai, Latin, etc.
  • 🔄 Encode / Decode / Transcode between any supported encodings
  • 🧠 Smart decode — detect + decode in one step
  • 📦 Single dependency — only iconv-lite
  • 💡 TypeScript type declarations included
  • 🌊 Stream support — encode/decode streams for piping

Supported Encodings (10 Groups)

| Group | Count | Examples |
|-------|-------|----------|
| Node.js Built-in | 8 | utf8, ucs2, ascii, base64, hex |
| Unicode Extended | 7 | utf16be, utf32, utf7, utf7-imap |
| Windows Code Pages | 10 | windows-874, windows-1250 ~ 1258 |
| ISO-8859 Series | 15 | iso-8859-1 ~ iso-8859-16 |
| IBM/DOS Code Pages | 28 | cp437, cp850, cp866, cp1125... |
| Macintosh Encodings | 11 | macintosh, macgreek, macukraine... |
| KOI8 Series | 4 | koi8-r, koi8-u, koi8-ru, koi8-t |
| Other Single-byte | 12 | armscii8, viscii, tis620, mik... |
| CJK Multi-byte (DBCS) | 11 | GBK, GB18030, Big5, Shift_JIS, EUC-JP, EUC-KR |
| Common Aliases | 12 | latin1, chinese, korean, sjis... |


Installation

npm install universal-encoding-toolkit

Quick Start

const toolkit = require('universal-encoding-toolkit');

// 1. Encode & Decode
const buf = toolkit.encode('你好世界', 'gbk');
const str = toolkit.decode(buf, 'gbk');
console.log(str); // 你好世界

// 2. Auto-detect encoding
const detected = toolkit.detect(buf);
console.log(detected);
// { encoding: 'gbk', confidence: 0.88, source: 'gbk-analysis' }

// 3. Smart decode (detect + decode in one step)
const result = toolkit.smartDecode(buf);
console.log(result.text);       // 你好世界
console.log(result.encoding);   // gbk
console.log(result.confidence); // 0.88

// 4. Transcode (encoding → encoding)
const big5Buf = toolkit.transcode(buf, 'gbk', 'big5');

// 5. Check encoding support
toolkit.encodingExists('gbk');      // true
toolkit.encodingExists('chinese');  // true (alias)

// 6. Normalize encoding names
toolkit.normalize('sjis');    // 'shiftjis'
toolkit.normalize('latin1');  // 'iso-8859-1'
toolkit.normalize('chinese'); // 'gbk'

API Reference

Encoding Conversion

| Method | Description |
|--------|-------------|
| encode(str, encoding) | String → Buffer |
| decode(buffer, encoding) | Buffer → String |
| transcode(buffer, from, to) | Re-encode buffer from one encoding to another |
| encodeStream(encoding) | Create an encode stream for use in a pipe() chain |
| decodeStream(encoding) | Create a decode stream for use in a pipe() chain |
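
Conceptually, transcoding is a decode with the source encoding followed by an encode with the target encoding. The sketch below illustrates that idea using only encodings built into Node.js, so it runs without this package. `transcodeBuiltin` is a hypothetical name, and nothing here is meant to describe how the toolkit implements `transcode` internally.

```javascript
// Sketch: transcode = decode with the source encoding, then encode with the target.
// Only Node's built-in Buffer encodings (e.g. 'latin1', 'utf8', 'utf16le') work here.
function transcodeBuiltin(buf, from, to) {
  const text = buf.toString(from); // bytes → string
  return Buffer.from(text, to);    // string → bytes
}

const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]); // "café" in latin1
const utf8Bytes = transcodeBuiltin(latin1Bytes, 'latin1', 'utf8');
console.log(utf8Bytes); // <Buffer 63 61 66 c3 a9>
```

For encodings outside Node's built-in set (GBK, Big5, and so on), `toolkit.transcode(buffer, from, to)` is the call to use.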

Encoding Detection

| Method | Description |
|--------|-------------|
| detect(buffer) | Auto-detect, returns { encoding, confidence, source } |
| detectAll(buffer) | Returns all candidates sorted by confidence |
| smartDecode(buffer) | Auto-detect + decode, returns { text, encoding, confidence } |
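
When detection is ambiguous, it can help to accept the top candidate only if it is clearly ahead of the runner-up. The sketch below assumes that `detectAll(buffer)` returns an array of `{ encoding, confidence }` objects sorted by descending confidence. That shape is inferred from the table above and is not confirmed. `pickConfident` and its `minMargin` option are hypothetical, and the `0.7` default mirrors the threshold used in the batch conversion example.

```javascript
// Assumption: `candidates` has the shape detectAll(buffer) is described as returning:
// an array of { encoding, confidence }, highest confidence first.
function pickConfident(candidates, { minConfidence = 0.7, minMargin = 0.1 } = {}) {
  if (!Array.isArray(candidates) || candidates.length === 0) return null;
  const [best, runnerUp] = candidates;
  // Reject a weak winner.
  if (best.confidence < minConfidence) return null;
  // Reject a winner that is barely ahead of the second candidate.
  if (runnerUp && best.confidence - runnerUp.confidence < minMargin) return null;
  return best.encoding;
}

// Plain sample data standing in for a detectAll() result.
const sample = [
  { encoding: 'gbk', confidence: 0.88 },
  { encoding: 'big5', confidence: 0.6 },
];
console.log(pickConfident(sample)); // gbk
```

A `null` result signals that the caller should fall back to a default encoding or ask the user.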

Utilities

| Method | Description |
|--------|-------------|
| encodingExists(name) | Check if an encoding is supported |
| normalize(name) | Normalize encoding name (e.g. 'sjis' → 'shiftjis') |
| getSupportedEncodings() | Get flat list of all supported encoding names |
| getEncodingGroups() | Get encoding groups object |
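
To make the idea of name normalization concrete, here is a minimal sketch of an alias lookup. It is not the package's implementation. `normalizeName` and `ALIASES` are hypothetical, the alias table contains only the pairs shown in this README, and the case-insensitive matching is an assumption.

```javascript
// Hypothetical alias table, limited to the pairs documented in this README.
const ALIASES = {
  sjis: 'shiftjis',
  latin1: 'iso-8859-1',
  chinese: 'gbk',
  cp936: 'gbk',
};

// Lowercase and trim the name, then map known aliases to a canonical name.
// Unknown names are returned in their lowercased form.
function normalizeName(name) {
  const key = String(name).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(ALIASES, key) ? ALIASES[key] : key;
}

console.log(normalizeName('SJIS'));  // shiftjis
console.log(normalizeName('cp936')); // gbk
```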


Real-World Examples

Read a file with unknown encoding

const fs = require('fs');
const toolkit = require('universal-encoding-toolkit');

const buf = fs.readFileSync('unknown-file.txt');
const { text, encoding, confidence } = toolkit.smartDecode(buf);

console.log(`Detected: ${encoding} (confidence: ${(confidence * 100).toFixed(1)}%)`);
console.log(text);

HTTP response decoding

const http = require('http');
const toolkit = require('universal-encoding-toolkit');

http.get('http://example.com/data', (res) => {
  const chunks = [];
  res.on('data', chunk => chunks.push(chunk));
  res.on('end', () => {
    const buf = Buffer.concat(chunks);
    const { text, encoding } = toolkit.smartDecode(buf);
    console.log(`Response encoding: ${encoding}`);
    console.log(text);
  });
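}).on('error', (err) => {
  // Without this handler a failed request would throw an unhandled 'error' event.
  console.error(`Request failed: ${err.message}`);
});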
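
Servers often declare the encoding in the `Content-Type` header, and a declared charset is usually more reliable than guessing from bytes. The sketch below shows one way to read it before falling back to detection. `charsetFromContentType` is a hypothetical helper, not part of this package.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value,
// e.g. 'text/html; charset=GBK' → 'gbk'. Returns null when none is declared.
function charsetFromContentType(header) {
  if (typeof header !== 'string') return null;
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(header);
  return match ? match[2].toLowerCase() : null;
}

console.log(charsetFromContentType('text/html; charset=GBK')); // gbk
console.log(charsetFromContentType('application/json'));       // null

// Inside the 'end' handler of the example above, it could be combined like this:
//   const declared = charsetFromContentType(res.headers['content-type']);
//   const text = declared && toolkit.encodingExists(declared)
//     ? toolkit.decode(buf, declared)
//     : toolkit.smartDecode(buf).text;
```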

Batch convert files to UTF-8

const fs = require('fs');
const path = require('path');
const toolkit = require('universal-encoding-toolkit');

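// Rewrite a file as UTF-8 when detection is confident enough.
function convertToUTF8(filePath) {
  const buf = fs.readFileSync(filePath);
  const detected = toolkit.detect(buf);

  // Skip files that are already UTF-8 or whose detection is too uncertain.
  if (detected.encoding !== 'utf-8' && detected.confidence > 0.7) {
    const text = toolkit.decode(buf, detected.encoding);
    // The leading '\uFEFF' writes a UTF-8 BOM; drop it if a BOM is not wanted.
    fs.writeFileSync(filePath, Buffer.from('\uFEFF' + text, 'utf-8'));
    console.log(`Converted ${filePath}: ${detected.encoding} → utf-8`);
  }
}

// Convert every .txt file in a directory ('./docs' is a placeholder path).
const dir = './docs';
for (const name of fs.readdirSync(dir)) {
  if (path.extname(name) === '.txt') convertToUTF8(path.join(dir, name));
}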

Stream piping

const fs = require('fs');
const toolkit = require('universal-encoding-toolkit');

// Decode a GBK file to UTF-8 via streams
fs.createReadStream('input-gbk.txt')
  .pipe(toolkit.decodeStream('gbk'))
  .pipe(fs.createWriteStream('output-utf8.txt'));
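
Streams deliver data in arbitrary chunks, so a multi-byte character can be split across two chunks. That is why a stateful decode stream is preferable to decoding each chunk on its own. The sketch below demonstrates the problem with UTF-8 and Node's built-in `string_decoder` module. It does not use this package, and it is only meant to illustrate the kind of chunk-boundary handling a decode stream has to do.

```javascript
const { StringDecoder } = require('string_decoder');

const bytes = Buffer.from('你好', 'utf8'); // 6 bytes, 3 per character
// Split in the middle of the second character.
const chunks = [bytes.subarray(0, 4), bytes.subarray(4)];

// Naive approach: decode each chunk independently.
// The partial character becomes U+FFFD replacement characters.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// Stateful approach: the decoder buffers the incomplete bytes until the rest arrives.
const decoder = new StringDecoder('utf8');
const streamed = chunks.map((c) => decoder.write(c)).join('') + decoder.end();

console.log(naive === '你好');    // false
console.log(streamed === '你好'); // true
```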

Using with ES Modules / TypeScript

import toolkit from 'universal-encoding-toolkit';
// or import specific exports:
import { UniversalEncodingToolkit, EncodingDetector, normalizeEncoding } from 'universal-encoding-toolkit';

const buf = toolkit.encode('Hello, 世界!', 'utf-8');
const result = toolkit.smartDecode(buf);
// Full IntelliSense support with included type declarations

Detection Engine

The auto-detection engine uses a 9-stage pipeline:

Input Buffer
    │
    ├─ Stage 1: BOM signature detection          → confidence 1.0
    ├─ Stage 2: Pure ASCII fast path              → confidence 1.0
    ├─ Stage 3: UTF-8 validation                  → confidence 0.85~0.99
    ├─ Stage 4: UTF-16 null-byte heuristics       → confidence 0.80
    ├─ Stage 5: High-byte pattern analysis
    ├─ Stage 6: CJK multi-byte evaluation
    │           (GBK/GB18030/Big5/Shift_JIS/EUC-JP/EUC-KR)
    ├─ Stage 7: Single-byte statistical scoring
    │           (windows-125x, ISO-8859-x, KOI8-x, etc.)
    ├─ Stage 8: Arbitration (with short-text protection)
    └─ Stage 9: Post-process disambiguation (12 known patterns)
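
To give a feel for the first three stages, here is a minimal sketch of BOM detection, an ASCII fast path, and UTF-8 validation. It is an illustration under stated assumptions, not the package's actual code. All function names and `source` labels are hypothetical, the UTF-8 confidence value is a placeholder inside the range quoted above, and returning `null` for buffers that contain NUL bytes is a simplification that leaves them to later stages.

```javascript
// Stage 1: byte-order marks. UTF-32 LE is listed before UTF-16 LE because
// its BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
const BOMS = [
  { encoding: 'utf-32le', bytes: [0xff, 0xfe, 0x00, 0x00] },
  { encoding: 'utf-32be', bytes: [0x00, 0x00, 0xfe, 0xff] },
  { encoding: 'utf-8', bytes: [0xef, 0xbb, 0xbf] },
  { encoding: 'utf-16le', bytes: [0xff, 0xfe] },
  { encoding: 'utf-16be', bytes: [0xfe, 0xff] },
];

function detectBom(buf) {
  for (const { encoding, bytes } of BOMS) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return encoding;
    }
  }
  return null;
}

// Stage 2: every byte below 0x80 means plain ASCII.
function isAscii(buf) {
  return buf.every((b) => b < 0x80);
}

// Stage 3: the standard TextDecoder in fatal mode throws on malformed UTF-8.
function isValidUtf8(buf) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch (err) {
    return false;
  }
}

function detectEarlyStages(buf) {
  const bom = detectBom(buf);
  if (bom) return { encoding: bom, confidence: 1.0, source: 'bom' };
  // Simplification: NUL bytes hint at BOM-less UTF-16/32 or binary data.
  if (buf.includes(0x00)) return null;
  if (isAscii(buf)) return { encoding: 'ascii', confidence: 1.0, source: 'ascii' };
  if (isValidUtf8(buf)) return { encoding: 'utf-8', confidence: 0.99, source: 'utf8-validation' };
  return null; // stages 4 to 9 would take over here
}

console.log(detectEarlyStages(Buffer.from('hello')));
// { encoding: 'ascii', confidence: 1, source: 'ascii' }
console.log(detectEarlyStages(Buffer.from([0xc4, 0xe3, 0xba, 0xc3]))); // null ("你好" in GBK)
```

The ordering matters: cheap, unambiguous checks run first, and only buffers that fail all of them need the statistical stages.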

Advanced Usage

Custom instance

const { UniversalEncodingToolkit } = require('universal-encoding-toolkit');

const myToolkit = new UniversalEncodingToolkit();
const someBuffer = Buffer.from('Hello, 世界!', 'utf-8'); // sample input
const result = myToolkit.detect(someBuffer);

Access sub-modules

const {
  EncodingDetector,
  normalizeEncoding,
  ENCODING_GROUPS,
  ENCODING_ALIASES
} = require('universal-encoding-toolkit');

// Use detector directly
const detector = new EncodingDetector();
const buffer = Buffer.from('Hello, 世界!', 'utf-8'); // sample input
const result = detector.detectAll(buffer);

// Normalize encoding names
normalizeEncoding('sjis');   // 'shiftjis'
normalizeEncoding('cp936');  // 'gbk'

License

MIT © Universal Encoding Team