@kuzulabz/expo-kagome

v0.2.1

Published

10 months ago

A Japanese morphological analyzer

0High
0Medium
0Low

smashinfries

react-native expo expo-kagome ExpoKagome japanese Japanese

expo-kagome

A react native port of Kagome using the IPA dictionary.

Sizes

aar: 53MB
xcframework: 56.9MB
wasm: 15MB

To-do

[ ] support user dict
[ ] lower wasm file size

Install

Bun

bun add @kuzulabz/expo-kagome

Yarn

yarn add @kuzulabz/expo-kagome

Npm

npm install @kuzulabz/expo-kagome

Installation in bare React Native projects

For bare React Native projects, you must ensure that you have installed and configured the expo package before continuing.

Configure for iOS

Run npx pod-install after installing the npm package.

Usage

Full example

import { getTokens } from '@kuzulabz/expo-kagome';

const text = '私は日本語を勉強しています。';
const tokens = await getTokens(text);

Web

Web requires these extra steps:

Metro Config

Add config.resolver.assetExts.push('wasm'); to your metro config file.

Load WASM

import { initializeKagome, initializeKagomeAsync } from '@kuzulabz/expo-kagome';

// for example, in App.tsx
initializeKagome();

// or asyncronously
await initializeKagomeAsync();

API

Functions

`getTokens(text: string): Promise<Token[]>`

Basic tokenization

text: The text to be analyzed.
returns: Token[]

`getAnalysis(text: string, mode: AnalyzeMode = 'normal'): Promise<Token[]>`

Same as getTokens, but with an additional mode parameter.

text: The text to be analyzed.
mode: AnalyzeMode (default: normal)
returns: Token[]

`getWakati(text: string): Promise<string[]>`

Segmentation of text into words

text: The text to be segmented.
returns: string[]

`getGraph(text: string, mode: AnalyzeMode = 'normal'): Promise<string>`

Generate a graph of the tokenization process in Graphvis dot format.

text: The text to be analyzed.
mode: AnalyzeMode (default: normal)
returns: A Graphvis in dot format (string)

Types

Token

An object with the following properties:

word_id<number>: The ID of the token.
word_position<number>: The position of the token in the text.
word_type<string>: The word class.
base_form<string>: The base form of the token.
surface_form<string>: The surface form of the token.
reading<string>: The reading of the token.
features<string[]>: An array of features of the token. Each feature is a string.
pos<string[]>: The part of speech of the token.
pronunciation<string>: The pronunciation of the token.

AnalyzeMode

The mode of analysis. Can be one of the following:

normal
search
extend

For more information about these modes, refer to the Kagome documentation.

Contributing

Contributions are very welcome! Please refer to guidelines described in the contributing guide.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme