@leonsilicon/biau1
v0.0.2
Taiwan Ministry of Education character frequency table (字頻總表, BIAU1) as JSON.
biau1
Taiwan Ministry of Education character frequency table (字頻總表, file BIAU1.TXT) as JSON.
Source: the official BIAU1.TXT published by the ROC Ministry of Education — 5,731 distinct characters covering 1,982,882 total occurrences in the sample corpus.
Install
npm install @leonsilicon/biau1
Usage
import biau1 from "@leonsilicon/biau1";
biau1.metadata;
// { source: "BIAU1.TXT", title: "字頻總表", totalCharacters: 5731, totalFrequency: 1982882 }
biau1.headers;
// ["rank", "character", "radical", "strokes", "frequency", "cumulativeFrequency", "cumulativePercent"]
biau1.data[0];
// [1, "的", "白", 8, 32739, 32739, 1.651]
The raw JSON is also reachable directly:
import data from "@leonsilicon/biau1/biau1.json" with { type: "json" };
Data shape
Each row in data is an array matching headers:
| index | field | type | notes |
| ----- | --------------------- | -------- | ------------------------------------------------- |
| 0 | rank | number | 1-based frequency rank |
| 1 | character | string | |
| 2 | radical | string | Kangxi radical (部首) |
| 3 | strokes | number | stroke count (筆畫) |
| 4 | frequency | number | occurrences in the sample |
| 5 | cumulativeFrequency | number | running sum of frequency |
| 6 | cumulativePercent | number | running cumulative coverage, as a percent (0–100) |
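Because rows are positional arrays, it can be convenient to pair them with the header names or to read the cumulative column directly. The following is a minimal sketch, not part of the package: `Biau1Row`, `rowToObject`, and `charactersNeededFor` are hypothetical names, and the coverage helper assumes rows are sorted by rank. The sample row is the one shown in the usage section; in real code the headers and rows would come from `biau1.headers` and `biau1.data`.

```typescript
// Row layout as documented in the table above (hypothetical type name).
type Biau1Row = [
  rank: number,
  character: string,
  radical: string,
  strokes: number,
  frequency: number,
  cumulativeFrequency: number,
  cumulativePercent: number,
];

// Hypothetical helper: pair each header with the value at the same index.
function rowToObject(
  headers: readonly string[],
  row: readonly (string | number)[],
): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  headers.forEach((header, i) => {
    const value = row[i];
    if (value !== undefined) out[header] = value;
  });
  return out;
}

// Hypothetical helper: how many top-ranked characters are needed to reach
// a target cumulative coverage (in percent). Assumes rows are sorted by rank.
function charactersNeededFor(
  rows: readonly Biau1Row[],
  targetPercent: number,
): number | undefined {
  const hit = rows.find((row) => row[6] >= targetPercent);
  return hit?.[0];
}

// Sample values copied from the usage section above.
const sampleHeaders = [
  "rank", "character", "radical", "strokes",
  "frequency", "cumulativeFrequency", "cumulativePercent",
];
const firstRow: Biau1Row = [1, "的", "白", 8, 32739, 32739, 1.651];

const first = rowToObject(sampleHeaders, firstRow);
console.log(first.character, first.frequency);
console.log(charactersNeededFor([firstRow], 1.5));
```

Keeping the data as arrays keeps the JSON compact; converting to objects only where needed avoids repeating the seven key names for every row.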
Characters in the Big5 HKSCS extension range (0xFA–0xFE) are decoded via the WHATWG Big5 index table (data/index-big5.txt); some of them are CJK Extension B+ codepoints above U+FFFF (e.g. 𨯨, 𡭄).
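One practical consequence of codepoints above U+FFFF is that such a character occupies two UTF-16 code units in a JavaScript string, so `length` and index access count it twice. The sketch below illustrates this; `isAstral` and `codePointLength` are hypothetical helper names, not exports of this package.

```typescript
// Hypothetical helper: true if the string's first codepoint lies above U+FFFF
// (that is, outside the Basic Multilingual Plane).
function isAstral(character: string): boolean {
  const codePoint = character.codePointAt(0);
  return codePoint !== undefined && codePoint > 0xffff;
}

// Hypothetical helper: count codepoints rather than UTF-16 code units.
function codePointLength(text: string): number {
  return Array.from(text).length;
}

const bmpCharacter = "的";
const astralCharacter = "𨯨"; // one of the examples named above

console.log(bmpCharacter.length, codePointLength(bmpCharacter));
console.log(astralCharacter.length, codePointLength(astralCharacter));
console.log(isAstral(bmpCharacter), isAstral(astralCharacter));
```

When comparing or indexing the `character` field, iterating by codepoint (for example with `Array.from`) avoids splitting these characters into lone surrogates.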
Regenerating the JSON
bun scripts/parse.ts
Reads data/BIAU1.TXT (using data/index-big5.txt for Big5 decoding) and writes biau1.json at the repo root.
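For readers curious how an index table like data/index-big5.txt can drive Big5 decoding, here is a rough sketch based on the pointer scheme in the WHATWG Encoding Standard. It is not the contents of scripts/parse.ts: `big5Pointer`, `parseIndex`, and `decodePair` are hypothetical names, the index is assumed to use the standard's `pointer<TAB>0xCODEPOINT<TAB>comment` line format, and the standard's four special pointers that map to two-codepoint sequences are left out.

```typescript
// Compute the Big5 index pointer for a lead/trail byte pair, following the
// WHATWG Encoding Standard. Returns undefined for bytes outside the valid ranges.
function big5Pointer(lead: number, trail: number): number | undefined {
  if (lead < 0x81 || lead > 0xfe) return undefined;
  const isLowTrail = trail >= 0x40 && trail <= 0x7e;
  const isHighTrail = trail >= 0xa1 && trail <= 0xfe;
  if (!isLowTrail && !isHighTrail) return undefined;
  const offset = trail < 0x7f ? 0x40 : 0x62;
  return (lead - 0x81) * 157 + (trail - offset);
}

// Parse index text into a pointer -> codepoint map.
// Assumed line format: "pointer<TAB>0xCODEPOINT<TAB>comment"; "#" lines are comments.
function parseIndex(text: string): Map<number, number> {
  const index = new Map<number, number>();
  text.split("\n").forEach((line) => {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") return;
    const parts = trimmed.split("\t");
    const pointer = Number(parts[0]);
    const codePoint = Number(parts[1]); // Number("0x4E00") parses hexadecimal
    if (!isNaN(pointer) && !isNaN(codePoint)) index.set(pointer, codePoint);
  });
  return index;
}

// Decode one two-byte sequence, or return undefined if it is not in the index.
function decodePair(
  index: Map<number, number>,
  lead: number,
  trail: number,
): string | undefined {
  const pointer = big5Pointer(lead, trail);
  if (pointer === undefined) return undefined;
  const codePoint = index.get(pointer);
  return codePoint === undefined ? undefined : String.fromCodePoint(codePoint);
}

// A single illustrative line in the assumed index format (Big5 0xA440).
const sampleIndex = parseIndex("# sample\n 5495\t0x4E00\t一 (<CJK Ideograph>)\n");
console.log(decodePair(sampleIndex, 0xa4, 0x40)); // 一
```

Using `String.fromCodePoint` rather than `String.fromCharCode` matters here, since index entries above U+FFFF would otherwise be truncated.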
License
MIT
