@chr33s/pdf-unicode-properties
v5.0.10
Provides fast access to unicode character properties
@chr33s/pdf-unicode-properties
Fast lookup of Unicode character metadata packaged as modern ES modules.
@chr33s/pdf-unicode-properties is part of the chr33s/pdf monorepo and continues the
Hopding/unicode-properties fork of the original foliojs project. This edition:
- ships native ES modules with NodeNext resolution (Node.js 18+ or a modern bundler required),
- is authored in TypeScript with generated declaration files, and
- keeps the compressed trie assets embedded for seamless usage across Node.js, browsers, and React Native.
unicode-properties
Provides fast access to Unicode character properties. Uses @chr33s/pdf-unicode-trie to compress the properties for all code points into just 12 KB.
Usage
import unicodeProperties, {
  getCategory,
  getNumericValue,
} from "@chr33s/pdf-unicode-properties";

getCategory("2".codePointAt(0) ?? 0); //=> 'Nd'
getNumericValue("2".codePointAt(0) ?? 0); //=> 2

// The default export bundles all helpers together when that is convenient.
unicodeProperties.isDigit("9".codePointAt(0) ?? 0); //=> true
Installation
npm install @chr33s/pdf-unicode-properties
The package is distributed as native ES modules. Use Node.js 18+ or configure your bundler to resolve NodeNext-style imports.
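Every helper takes a numeric code point rather than a string, which is why the usage example calls `codePointAt(0)`. For whole strings, a loop along the following lines can produce the code points to pass in. This is a minimal sketch: `codePointsOf` is a hypothetical helper name and is not part of the package.

```typescript
// Hypothetical helper (not part of the package): splits a string into
// Unicode code points, treating each surrogate pair as one code point.
function codePointsOf(text: string): number[] {
  const result: number[] = [];
  let i = 0;
  while (i < text.length) {
    const codePoint = text.codePointAt(i) ?? 0;
    result.push(codePoint);
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += codePoint > 0xffff ? 2 : 1;
  }
  return result;
}

console.log(codePointsOf("a2")); // [ 97, 50 ]
```

Each element of the returned array can then be passed to `getCategory`, `isDigit`, and the other helpers.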
API
getCategory(codePoint)
Returns the Unicode general category for the given code point.
getScript(codePoint)
Returns the script for the given code point.
getCombiningClass(codePoint)
Returns the canonical combining class for the given code point.
getEastAsianWidth(codePoint)
Returns the East Asian width for the given code point.
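East Asian width is commonly used to estimate how many terminal columns a string occupies. The sketch below assumes the lookup returns the short codes from Unicode Standard Annex #11 (such as "W" for wide and "F" for fullwidth); the return format is not stated here, so treat that as an assumption. The lookup is passed in as a parameter, and `displayWidth` is a hypothetical helper name.

```typescript
// Signature of the lookup, for example the package's getEastAsianWidth.
// Assumption: it returns UAX #11 short codes ("W", "F", "Na", ...).
type WidthLookup = (codePoint: number) => string;

// Hypothetical helper: counts wide ("W") and fullwidth ("F") code points
// as two columns and every other code point as one column.
function displayWidth(text: string, getEastAsianWidth: WidthLookup): number {
  let columns = 0;
  // Array.from splits the string by code point, not by UTF-16 code unit.
  for (const ch of Array.from(text)) {
    const width = getEastAsianWidth(ch.codePointAt(0) ?? 0);
    columns += width === "W" || width === "F" ? 2 : 1;
  }
  return columns;
}
```

This deliberately ignores combining marks and other zero-width characters, which a production implementation would need to handle separately.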
getNumericValue(codePoint)
Returns the numeric value for the given code point, or null if there is no numeric value for that code point.
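Because the result may be null, callers need to handle code points without a numeric value. The following sketch reads a run of decimal digits from any script as an integer and gives up on the first code point that is not a single digit. `parseDigits` is a hypothetical helper name, and the lookup is injected so that the package's `getNumericValue` can be supplied.

```typescript
// Signature of the lookup, for example the package's getNumericValue.
type NumericLookup = (codePoint: number) => number | null;

// Hypothetical helper: interprets text as a sequence of decimal digits.
// Returns null for empty input or when any code point has no numeric
// value or has a value that is not an integer between 0 and 9.
function parseDigits(text: string, getNumericValue: NumericLookup): number | null {
  if (text.length === 0) return null;
  let total = 0;
  for (const ch of Array.from(text)) {
    const value = getNumericValue(ch.codePointAt(0) ?? 0);
    if (value === null || value < 0 || value > 9 || value % 1 !== 0) {
      return null;
    }
    total = total * 10 + value;
  }
  return total;
}
```

The range check matters because some code points carry numeric values that are not single digits (fractions, for instance), and combining those positionally would give a meaningless result.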
isAlphabetic(codePoint)
Returns whether the code point is an alphabetic character.
isDigit(codePoint)
Returns whether the code point is a digit.
isPunctuation(codePoint)
Returns whether the code point is a punctuation character.
isLowerCase(codePoint)
Returns whether the code point is lower case.
isUpperCase(codePoint)
Returns whether the code point is upper case.
isTitleCase(codePoint)
Returns whether the code point is title case.
isWhiteSpace(codePoint)
Returns whether the code point is whitespace: specifically, whether the category is one of Zs, Zl, or Zp.
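The rule above can be written directly in terms of the general category. One consequence worth noting: control characters such as tab and line feed have category Cc, so they do not count as whitespace under this definition. The sketch below mirrors the documented rule; `isSeparatorCategory` is a hypothetical name and not part of the package.

```typescript
// The three separator categories named in the description above.
const SEPARATOR_CATEGORIES = ["Zs", "Zl", "Zp"];

// Hypothetical helper mirroring the documented rule: whitespace means
// the general category is Zs, Zl, or Zp.
function isSeparatorCategory(category: string): boolean {
  return SEPARATOR_CATEGORIES.indexOf(category) !== -1;
}

console.log(isSeparatorCategory("Zs")); // true  (category of U+0020 SPACE)
console.log(isSeparatorCategory("Cc")); // false (category of U+0009 TAB)
```

Code that must also treat tabs or newlines as whitespace would need an extra check on top of `isWhiteSpace`.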
isBaseForm(codePoint)
Returns whether the code point is a base form, meaning it does not graphically combine with preceding characters.
isMark(codePoint)
Returns whether the code point is a mark character (e.g. an accent).
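A typical use of a mark predicate is removing accents from text. The sketch below decomposes the input with the built-in `String.prototype.normalize("NFD")` and drops every code point that the supplied predicate reports as a mark. `stripMarks` is a hypothetical helper name; the predicate is injected so that the package's `isMark` can be passed in.

```typescript
// Signature of the predicate, for example the package's isMark.
type MarkPredicate = (codePoint: number) => boolean;

// Hypothetical helper: decomposes text to NFD so accents become separate
// code points, then removes every code point the predicate flags as a mark.
function stripMarks(text: string, isMark: MarkPredicate): string {
  return Array.from(text.normalize("NFD"))
    .filter((ch) => !isMark(ch.codePointAt(0) ?? 0))
    .join("");
}
```

With the package installed, `stripMarks(input, isMark)` would be the intended call. Decomposition is needed first because a precomposed letter such as U+00E9 is a single code point and is not itself a mark.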
