ja-pitch-accent

v1.2.0

Published

19 days ago

Pitch accent lookup and HTML formatting extracted from 10ten Japanese Reader


joliss

japanese pitch-accent intonation kana dictionary

ja-pitch-accent

Standalone pitch-accent lookup and HTML formatting extracted from 10ten Japanese Reader.

Completely vibe-coded. Use at your own discretion!

CLI

npx ja-pitch-accent <spelling> [reading] [--html]

For example, to get a JSON array (see below), run:

npx ja-pitch-accent 閉める

To print HTML instead:

npx ja-pitch-accent 閉める --html | head -n 1

When there are multiple dictionary entries, the CLI prints one per line. This example pipes the output through head -n 1 to print only the first.

JavaScript API

Installation

npm install ja-pitch-accent

Usage

import { formatJaPitchAccentHtml, getJaPitchAccent } from 'ja-pitch-accent';

const matches = getJaPitchAccent('閉める', 'しめる');
const html = formatJaPitchAccentHtml(matches[0]);

getJaPitchAccent(spelling, reading?) takes a Japanese word and, optionally, a reading that narrows the results to that reading. It returns an array of matches. The first match is usually the best.

type JaPitchAccentMatch = {
  accent: number;
  partOfSpeech: string[];
  reading: string;
  spellings: string[];
};
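Because the result is an array, callers usually want to guard against an empty result before indexing into it. The following is a minimal sketch under stated assumptions: firstMatch and describeMatch are hypothetical helpers (not part of the package), the sample entry is hand-written rather than real package output, and it assumes an unknown word yields an empty array, which this README does not state.

```typescript
// Local copy of the type shown above so this sketch compiles on its own.
type JaPitchAccentMatch = {
  accent: number;
  partOfSpeech: string[];
  reading: string;
  spellings: string[];
};

// Hypothetical helper: return the first match, or undefined if there is none.
function firstMatch(
  matches: JaPitchAccentMatch[],
): JaPitchAccentMatch | undefined {
  return matches.length > 0 ? matches[0] : undefined;
}

// Hypothetical helper: dictionary-style notation, reading plus downstep number.
function describeMatch(match: JaPitchAccentMatch): string {
  return `${match.reading} [${match.accent}]`;
}

// Hand-written sample entry for illustration only; not real package output.
const sample: JaPitchAccentMatch[] = [
  { accent: 2, partOfSpeech: [], reading: 'しめる', spellings: ['閉める'] },
];

const best = firstMatch(sample);
console.log(best ? describeMatch(best) : 'no match');
```

In real code, the sample array would be replaced by the return value of getJaPitchAccent.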

accent is the pitch-accent downstep position, counted in morae:

0 means heiban.
1 means atamadaka.
2 or greater means the pitch drops after that mora.
If accent === mora count, the pattern is odaka.
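The rules above can be sketched as a small classifier. This is an illustration, not the package's own code: countMora, classifyAccent, and pitchLevels are hypothetical names, the mora count uses a simplified rule (small ya/yu/yo and small vowels attach to the preceding kana), and the high/low levels follow the usual textbook model of Tokyo pitch accent. The term nakadaka, for a drop in the middle of the word, is standard but does not appear in the list above.

```typescript
type PitchPattern = 'heiban' | 'atamadaka' | 'nakadaka' | 'odaka';

// Small kana that combine with the preceding kana into a single mora.
// Note that small っ/ッ and the long-vowel mark ー are full morae, so they are not listed.
const SMALL_KANA = 'ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ';

// Simplified mora count for a kana-only reading (an assumption of this sketch).
function countMora(reading: string): number {
  let count = 0;
  for (let i = 0; i < reading.length; i++) {
    if (SMALL_KANA.indexOf(reading.charAt(i)) === -1) count += 1;
  }
  return count;
}

// Map a downstep position to a pattern name, checking the rules in the order listed.
function classifyAccent(accent: number, moraCount: number): PitchPattern {
  if (accent === 0) return 'heiban';
  if (accent === 1) return 'atamadaka';
  if (accent === moraCount) return 'odaka';
  return 'nakadaka';
}

// Binary high/low level for each mora of the word itself (no following particle).
function pitchLevels(accent: number, moraCount: number): Array<'H' | 'L'> {
  const levels: Array<'H' | 'L'> = [];
  for (let mora = 1; mora <= moraCount; mora++) {
    if (accent === 1) {
      levels.push(mora === 1 ? 'H' : 'L'); // high first mora, then low
    } else if (mora === 1) {
      levels.push('L'); // all other patterns start low
    } else if (accent === 0 || mora <= accent) {
      levels.push('H'); // stays high until the downstep, if any
    } else {
      levels.push('L'); // low after the downstep
    }
  }
  return levels;
}

const moraCount = countMora('しめる');
console.log(classifyAccent(2, moraCount), pitchLevels(2, moraCount).join(''));
```

Within the word itself, heiban and odaka produce the same levels; they differ only in the pitch of a following particle, which this sketch does not model.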

formatJaPitchAccentHtml(match, renderCharacter?) renders the same binary pitch-accent outline style used by 10ten. The optional renderCharacter(character, index) callback can return custom HTML for each kana character.
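A renderCharacter callback might look like the sketch below. It assumes the callback returns an HTML string, which the description above implies but does not spell out. The kana class name, the data-index attribute, and the escapeHtml helper are all made up for this example, and whether index counts characters or morae is not specified here.

```typescript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Candidate callback: wrap each kana in a span so it can be styled or scripted.
// The class name and data attribute are illustrative, not defined by the package.
function renderCharacter(character: string, index: number): string {
  return `<span class="kana" data-index="${index}">${escapeHtml(character)}</span>`;
}

console.log(renderCharacter('し', 0));
```

Such a function would be passed as the second argument, as in formatJaPitchAccentHtml(match, renderCharacter).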

CSS variables:

--ja-pitch-accent-border-color
--ja-pitch-accent-border-style
--ja-pitch-accent-border-width
--ja-pitch-accent-display
--ja-pitch-accent-margin-bottom

Browser use

This package can be used in the browser as-is. However, your bundle size will be several megabytes, as the entire dataset JSON is included.

Contributing

Data

To rebuild the dataset from 10ten, run:

git submodule update --init --recursive
pnpm run build-data

By default, the build reads from the vendored 10ten submodule at vendor/10ten-ja-reader/data/words.ljson, but you can also pass an explicit source path and output directory to scripts/build-dataset.ts.

Licensing

The package code is GPL-3.0-only.

The bundled generated dataset also carries upstream attribution/licence notices from the data sources used by 10ten, including JMdict/EDICT and pitch-accent data attributed by 10ten to Uros Ozvatic/Kanjium. See NOTICE.
