subtlex-ch-chr
v0.0.0
Typed JSON package of the SUBTLEX-CH-CHR Chinese character frequency dataset.
Typed JSON distribution of the Chinese character frequency dataset SUBTLEX-CH-CHR.
This package publishes a parsed JSON version of data/SUBTLEX-CH-CHR (tab-separated source)
and exposes it as a default ESM export with TypeScript definitions.
What this package contains
SUBTLEX-CH-CHR.json: Parsed dataset payload
index.js: ESM entrypoint that default-exports the JSON
index.d.ts: Handwritten type definitions for the exported structure
The exported object shape is:
metadata.totalCharacterCount: total character tokens in the corpus
metadata.contextNumber: number of subtitle contexts
headers: original source headers (preserved exactly)
data: array of rows with the fields Character, CHRCount, CHR/million, logCHR, CHR-CD, CHR-CD%, logCHR-CD
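The shape described above can be sketched as TypeScript interfaces. This is an illustration only: the interface names are hypothetical, the numeric field types are an assumption (the package's own index.d.ts is authoritative), and the sample values are placeholders rather than real corpus figures. The sketch also shows one way to index rows by character for lookups.

```typescript
// Hypothetical type names; the package's index.d.ts may name or type these differently.
interface SubtlexChChrRow {
  Character: string;
  CHRCount: number; // assumption: numeric columns are parsed to numbers
  "CHR/million": number;
  logCHR: number;
  "CHR-CD": number;
  "CHR-CD%": number;
  "logCHR-CD": number;
}

interface SubtlexChChrDataset {
  metadata: { totalCharacterCount: number; contextNumber: number };
  headers: string[];
  data: SubtlexChChrRow[];
}

// Placeholder values for illustration only, NOT real corpus figures.
const sampleDataset: SubtlexChChrDataset = {
  metadata: { totalCharacterCount: 1000, contextNumber: 10 },
  headers: [
    "Character",
    "CHRCount",
    "CHR/million",
    "logCHR",
    "CHR-CD",
    "CHR-CD%",
    "logCHR-CD",
  ],
  data: [
    {
      Character: "的",
      CHRCount: 50,
      "CHR/million": 50000,
      logCHR: 1.699,
      "CHR-CD": 10,
      "CHR-CD%": 100,
      "logCHR-CD": 1,
    },
  ],
};

// Index rows by character so a lookup does not scan the whole array.
const rowsByCharacter = new Map<string, SubtlexChChrRow>(
  sampleDataset.data.map((row): [string, SubtlexChChrRow] => [row.Character, row]),
);

console.log(rowsByCharacter.get("的")?.CHRCount);
```

Field names containing `/`, `-`, or `%` are not valid identifiers, so they have to be read with bracket notation, for example `row["CHR/million"]`.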
Install
npm install subtlex-ch-chr
Usage
import subtlexChChr from "subtlex-ch-chr";
console.log(subtlexChChr.metadata.totalCharacterCount);
console.log(subtlexChChr.data[0]);
You can also import the raw JSON subpath export:
import dataset from "subtlex-ch-chr/SUBTLEX-CH-CHR.json";
Regenerating the JSON
The source TSV-like file lives at data/SUBTLEX-CH-CHR. To regenerate the published JSON:
bun run convert:data
This runs scripts/convert-subtlex-ch-chr.ts and writes SUBTLEX-CH-CHR.json to the repository root.
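To give a rough idea of what such a conversion involves, here is a minimal sketch of turning tab-separated text into a headers-plus-rows object. It is not the actual scripts/convert-subtlex-ch-chr.ts: the function name `parseTabSeparated` is hypothetical, the input is assumed to be a plain header row followed by data rows (any metadata lines in the real source file are not handled), and treating every column after the first as numeric is also an assumption.

```typescript
type ParsedRow = Record<string, string | number>;

// Hypothetical helper: parses a header line plus tab-separated rows.
function parseTabSeparated(text: string): { headers: string[]; data: ParsedRow[] } {
  // Drop blank lines, including the trailing one left by a final newline.
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headerLine = lines[0] ?? "";
  const headers = headerLine === "" ? [] : headerLine.split("\t");

  const data = lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row: ParsedRow = {};
    headers.forEach((header, index) => {
      const cell = cells[index] ?? "";
      // Assumption: the first column is the character, the rest are numeric.
      row[header] = index === 0 ? cell : Number(cell);
    });
    return row;
  });

  return { headers, data };
}

// Placeholder input for illustration only, not real corpus figures.
const sampleTsv = "Character\tCHRCount\tCHR/million\n的\t50\t50000\n";
const parsedSample = parseTabSeparated(sampleTsv);

console.log(JSON.stringify(parsedSample, null, 2));
```

Keeping the header strings as object keys, unmodified, is what lets the output preserve the original source headers exactly.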
