subtlex-ch-wf
v0.0.0
Typed JSON distribution of the Chinese word frequency dataset SUBTLEX-CH-WF.
This package publishes a parsed JSON version of data/SUBTLEX-CH-WF (tab-separated source)
and exposes it as a default ESM export with TypeScript definitions.
What this package contains
- SUBTLEX-CH-WF.json: Parsed dataset payload
- index.js: ESM entrypoint that default-exports the JSON
- index.d.ts: Handwritten type definitions for the exported structure
The exported object shape is:
- metadata.totalCharacterCount: total character tokens in the corpus
- metadata.contextNumber: number of subtitle contexts
- headers: original source headers (preserved exactly)
- data: array of rows with fields:
  - Word
  - WCount
  - W/million
  - logW
  - W-CD
  - W-CD%
  - logW-CD
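As a rough illustration of the shape listed above, the following sketch writes it out as TypeScript types and adds a small lookup helper. The names `SubtlexRow`, `SubtlexDataset`, and `buildIndex` are hypothetical, and the field types (numeric columns, `headers` as a string array) are assumptions; the package's own index.d.ts is authoritative and may differ. The sample values are placeholders, not corpus figures.

```typescript
// Hypothetical types mirroring the documented shape. The real index.d.ts
// may type these fields differently (e.g. strings instead of numbers).
interface SubtlexRow {
  Word: string;
  WCount: number;
  "W/million": number;
  logW: number;
  "W-CD": number;
  "W-CD%": number;
  "logW-CD": number;
}

interface SubtlexDataset {
  metadata: { totalCharacterCount: number; contextNumber: number };
  headers: string[]; // assumed to be a plain list of column names
  data: SubtlexRow[];
}

// Build a word -> row index so repeated lookups avoid scanning the array.
function buildIndex(dataset: SubtlexDataset): Map<string, SubtlexRow> {
  const wordToRow = new Map<string, SubtlexRow>();
  for (const row of dataset.data) {
    wordToRow.set(row.Word, row);
  }
  return wordToRow;
}

// Placeholder values for illustration only, not real corpus figures.
const sample: SubtlexDataset = {
  metadata: { totalCharacterCount: 1000, contextNumber: 10 },
  headers: ["Word", "WCount", "W/million", "logW", "W-CD", "W-CD%", "logW-CD"],
  data: [
    {
      Word: "甲",
      WCount: 5,
      "W/million": 5000,
      logW: 0.699,
      "W-CD": 3,
      "W-CD%": 30,
      "logW-CD": 0.4771,
    },
  ],
};

const wordIndex = buildIndex(sample);
const hit = wordIndex.get("甲");
console.log(hit ? hit["W/million"] : undefined);
```

Column names such as W/million and W-CD% are not valid identifiers, so they need bracket access (`row["W/million"]`) rather than dot access.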
Install
npm install subtlex-ch-wf

Usage
import subtlexChWf from "subtlex-ch-wf";
console.log(subtlexChWf.metadata.totalCharacterCount);
console.log(subtlexChWf.data[0]);

You can also import the raw JSON subpath export:

import dataset from "subtlex-ch-wf/SUBTLEX-CH-WF.json";

Regenerating the JSON
The source TSV-like file lives at data/SUBTLEX-CH-WF. To regenerate the published JSON:
bun run convert:data

This runs scripts/convert-subtlex-ch-wf.ts and writes SUBTLEX-CH-WF.json to the repository root.
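The conversion script itself is not reproduced here. As a minimal sketch of the general idea, the hypothetical `parseTsv` helper below turns tab-separated text into a `headers` array and a `data` array of row objects. It assumes the first non-empty line is the header row and that every column other than Word is numeric. It does not extract the metadata values, since the layout of those lines in the source file is not described here.

```typescript
// Hypothetical helper, not the package's scripts/convert-subtlex-ch-wf.ts.
// Assumes: first non-empty line is the tab-separated header row, and every
// column except "Word" holds a number.
type ParsedRow = Record<string, string | number>;

function parseTsv(text: string): { headers: string[]; data: ParsedRow[] } {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return { headers: [], data: [] };
  }
  const headers = lines[0].split("\t");
  const data = lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row: ParsedRow = {};
    headers.forEach((header, i) => {
      const cell = cells[i] ?? "";
      // Keep the word itself as text; convert all other columns to numbers.
      row[header] = header === "Word" ? cell : Number(cell);
    });
    return row;
  });
  return { headers, data };
}

// Placeholder input for illustration only, not real corpus figures.
const parsed = parseTsv("Word\tWCount\tlogW\n甲\t5\t0.699\n");
console.log(JSON.stringify(parsed, null, 2));
```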
