ve-japanese

v1.0.3

Published 8 months ago

A Japanese language parser based on Ve, using kuromoji.


metrovoc

ve-japanese

A Japanese language parser, ported from the original Ve project.

This package groups the individual morphemes of inflected forms (such as verbs and adjectives) into single words, while preserving each word's dictionary form (lemma).

It uses kuromoji.js as its underlying tokenizer, so it runs entirely in JavaScript and needs no native dependencies such as MeCab.
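To make the grouping idea concrete, here is a much-simplified sketch. It is not the package's actual algorithm: it walks kuromoji-style tokens and attaches auxiliary verbs (助動詞) to a preceding verb or adjective, taking the lemma from the first token's dictionary form. The token fields `surface_form`, `pos` and `basic_form` follow kuromoji.js naming; the `groupTokens` function and its single rule are hypothetical, and Ve's real rules cover many more cases.

```javascript
// Hypothetical, simplified grouping rule in the spirit of Ve.
// NOT ve-japanese's real implementation; it handles only one case.
// Tokens use kuromoji.js field names: surface_form, pos, basic_form.
function groupTokens(tokens) {
  const words = [];
  for (const token of tokens) {
    const prev = words[words.length - 1];
    // Attach an auxiliary verb (助動詞) to a preceding verb (動詞) or adjective (形容詞).
    if (
      token.pos === "助動詞" &&
      prev &&
      (prev.headPos === "動詞" || prev.headPos === "形容詞")
    ) {
      prev.word += token.surface_form;
      prev.tokens.push(token);
      continue;
    }
    // Otherwise start a new word; its lemma is the head token's dictionary form.
    words.push({
      word: token.surface_form,
      lemma: token.basic_form,
      headPos: token.pos,
      tokens: [token],
    });
  }
  return words;
}

// Hand-written tokens resembling kuromoji.js output for これ食べました
// (all other token fields omitted).
const tokens = [
  { surface_form: "これ", pos: "名詞", basic_form: "これ" },
  { surface_form: "食べ", pos: "動詞", basic_form: "食べる" },
  { surface_form: "まし", pos: "助動詞", basic_form: "ます" },
  { surface_form: "た", pos: "助動詞", basic_form: "た" },
];

for (const w of groupTokens(tokens)) {
  console.log(`${w.word} -> ${w.lemma}`);
}
// これ -> これ
// 食べました -> 食べる
```

Keeping the lemma of the head token (食べる) rather than of the last token (た) is what lets the grouped word map back to its dictionary entry.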

Usage

First, import the parse function from the package.

const { parse } = require('ve-japanese'); // or require('./dist/index.js') from a local checkout

async function main() {
    const text = 'これ食べました';
    const words = await parse(text);

    // The result is an array of Word objects
    console.log(words);
}

main().catch(console.error); // report parse errors instead of leaving the rejection unhandled

Getting the Dictionary Form (Lemma)

Each object in the returned array represents one word. The .word property holds the surface form as it appears in the text, while the .lemma property holds its dictionary form.

This is especially useful for conjugated verbs.

const { parse } = require('ve-japanese'); // or require('./dist/index.js') from a local checkout

async function findLemmas() {
    const text = 'これ食べました';
    const words = await parse(text);

    const verb = words[1]; // words[0] is これ, words[1] is 食べました

    console.log(`Surface form: ${verb.word}`); // Outputs: 食べました
    console.log(`Dictionary form: ${verb.lemma}`); // Outputs: 食べる
}

findLemmas().catch(console.error);
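A common use of .lemma is building a vocabulary list in which different conjugations of the same word collapse into one entry. The sketch below assumes only that each element of the array returned by parse has a lemma field, as in the example Word object; countLemmas is a hypothetical helper, and the sample array is hand-written rather than real parse output.

```javascript
// Hypothetical helper: count how often each dictionary form (lemma) occurs,
// so that 食べました and 食べる are counted as the same word.
function countLemmas(words) {
  const counts = new Map();
  for (const { lemma } of words) {
    counts.set(lemma, (counts.get(lemma) ?? 0) + 1);
  }
  return counts;
}

// Hand-written stand-in for parse() output (only the fields used here).
const sample = [
  { word: "これ", lemma: "これ" },
  { word: "食べました", lemma: "食べる" },
  { word: "食べる", lemma: "食べる" },
];

// A Map iterates in insertion order.
for (const [lemma, n] of countLemmas(sample)) {
  console.log(`${lemma}: ${n}`);
}
// これ: 1
// 食べる: 2
```

With the package installed, you would pass the result of `await parse(text)` in place of sample.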

Example Word Object

The Word object for 食べました looks like this (token fields abbreviated with ...):

{
  "word": "食べました",
  "lemma": "食べる",
  "part_of_speech": "verb",
  "tokens": [
    { "surface_form": "食べ", "pos": "動詞", ... },
    { "surface_form": "まし", "pos": "助動詞", ... },
    { "surface_form": "た", "pos": "助動詞", ... }
  ],
  "extra": {
    "reading": "タベマシタ",
    "transcription": "タベマシタ"
  }
}
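In the example above, the word field equals the concatenation of its tokens' surface_form values, and the two trailing tokens are auxiliary verbs (助動詞) folded into the verb. The sketch below checks both on a copy of that object; the elided token fields are left out, and treating the concatenation as a guaranteed invariant is an assumption drawn only from this one example.

```javascript
// Copy of the example Word object, with the elided ("...") token fields left out.
const example = {
  word: "食べました",
  lemma: "食べる",
  part_of_speech: "verb",
  tokens: [
    { surface_form: "食べ", pos: "動詞" },
    { surface_form: "まし", pos: "助動詞" },
    { surface_form: "た", pos: "助動詞" },
  ],
  extra: { reading: "タベマシタ", transcription: "タベマシタ" },
};

// Rebuild the surface form from the tokens (assumed to match `word`).
function surfaceFromTokens(word) {
  return word.tokens.map((t) => t.surface_form).join("");
}

console.log(surfaceFromTokens(example) === example.word); // true

// Count the auxiliary-verb tokens that were grouped into this word.
const auxCount = example.tokens.filter((t) => t.pos === "助動詞").length;
console.log(auxCount); // 2
```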
