
@edwinho/kotoba-core

v0.2.5


Framework-neutral language-learning data models, language profiles, and translation draft utilities.

Downloads

1,126

Readme


Framework-neutral data contracts and utilities for Kotoba learning entries, translation drafts, language profiles, cache keys, and sanitizers.

This package is intentionally pure. It does not import Expo, React Native, Gemini provider code, CLI code, app storage, service policy, or runtime transport helpers.

@edwinho/kotoba-core is the shared data-contract package used by Kotoba providers, CLI tools, and app integrations. It does not translate text by itself and it does not call Gemini or hosted services.

Install

Inside this monorepo, depend on the package by version. Bun workspaces link the local package when the workspace version satisfies the range:

{
  "dependencies": {
    "@edwinho/kotoba-core": "^0.2.0"
  }
}

For external consumers:

bun add @edwinho/kotoba-core

Concepts

Version 0.2.0 includes optional Japanese study-token metadata, the StudyTokenMetadata contract, and the generateJapaneseFormTable() helper for rendering deterministic verb and adjective form tables from trusted metadata.

TranslationDraft is the normalized shape returned by translation providers before a phrase is saved. It carries source and target text, optional reading support, enrichment data, study tokens, freshness metadata, and capability flags that describe which enrichments are present. The field shape is the same for every language; future languages are supported through a generic language parameter. App consumers can keep using the default Japanese/Chinese draft type, while core and provider tests can infer TranslationDraft<"ko"> from normalizeTranslationDraft({ targetLanguage: "ko", ... }).
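The generic parameter relies on ordinary TypeScript literal inference. The sketch below is self-contained and does not import the package; DraftSketch and normalizeDraftSketch are hypothetical stand-ins for TranslationDraft and normalizeTranslationDraft, with almost every field omitted.

```typescript
// Hypothetical stand-in for the default Japanese/Chinese language parameter.
type DefaultDraftLanguage = "ja" | "zh";

// Hypothetical, heavily reduced draft shape; the real TranslationDraft has many more fields.
interface DraftSketch<L extends string = DefaultDraftLanguage> {
  targetLanguage: L;
  sourceText: string;
  targetText: string;
}

// L is inferred from the targetLanguage argument, so passing "ko" yields
// DraftSketch<"ko"> instead of widening to DraftSketch<string>.
function normalizeDraftSketch<L extends string>(input: {
  targetLanguage: L;
  sourceText: string;
  targetText: string;
}): DraftSketch<L> {
  return {
    targetLanguage: input.targetLanguage,
    sourceText: input.sourceText.trim(),
    targetText: input.targetText.trim(),
  };
}

const koDraft = normalizeDraftSketch({
  targetLanguage: "ko",
  sourceText: " I'm going now ",
  targetText: "저 지금 가요.",
});

// Compiles only because koDraft.targetLanguage has the literal type "ko".
const koLanguage: "ko" = koDraft.targetLanguage;

// App code that omits the parameter gets the default Japanese/Chinese type.
const appDraft: DraftSketch = { targetLanguage: "ja", sourceText: "hello", targetText: "こんにちは" };

console.log(koLanguage, koDraft.sourceText, appDraft.targetLanguage);
```

The design point is that no field changes per language: only the type argument narrows, which is why new languages can be added without touching the draft shape.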

LearningEntry and LearningEntryDraft are the saved-entry contracts used by library and note surfaces.

Language helpers such as detectDirection, resolveLanguageProfile, resolveActiveLanguageContext, and Chinese variant helpers keep language, script, reading-system, and cache-scope decisions centralized.

Language support is represented through capability profiles. LearningLanguage tracks product surfaces that only expose Japanese and Chinese today, while SupportedLearningLanguage also includes package-level languages such as Korean. Each LanguageCapabilityProfile declares language, script, reading-system, locale, register, romanization, and variant support.
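The split between the two unions lets package code accept Korean while product surfaces stay limited to Japanese and Chinese. A minimal sketch, assuming the unions use the "ja", "zh", and "ko" codes seen in the examples; the definitions and the isProductLanguage guard are illustrative, not the package's exported code.

```typescript
// Assumed definitions mirroring the description above (not imported from the package).
type LearningLanguage = "ja" | "zh"; // exposed by product surfaces today
type SupportedLearningLanguage = LearningLanguage | "ko"; // package-level support

const PRODUCT_LANGUAGES: readonly LearningLanguage[] = ["ja", "zh"];

// Hypothetical type guard: narrows a package-level language to a product-visible one.
function isProductLanguage(lang: SupportedLearningLanguage): lang is LearningLanguage {
  return (PRODUCT_LANGUAGES as readonly string[]).includes(lang);
}

const candidates: SupportedLearningLanguage[] = ["ja", "ko", "zh"];

// filter() with a type guard returns LearningLanguage[], so settings UIs never see "ko".
const visibleInSettings = candidates.filter(isProductLanguage);
console.log(visibleInSettings);
```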

Japanese and Chinese profiles preserve the existing locale, reading, romanization, and variant behavior. Korean is included with Hangul/Revised Romanization, ko-KR speech locale metadata, and register support.

Validators and normalizers accept provider-like or imported data defensively: malformed sections are dropped and reported through droppedSections, and valid sections are normalized into the public contracts. For example:

  • sanitizeStudyTokens rejects out-of-range tokens and tokens whose surface text does not match the target text.
  • sanitizeEnrichmentData drops malformed nested sections such as invalid grammar breakdowns, examples, contrasts, and cached note details.
  • normalizeTranslationDraft derives completeness and capability metadata after reading support, Chinese metadata, and study tokens are normalized.
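The drop-and-report pattern can be sketched without the package. This is a simplified illustration, assuming a two-section enrichment shape; GrammarPoint, EnrichmentSketch, and sanitizeEnrichmentSketch are hypothetical, and the real sanitizeEnrichmentData validates more sections and may work at a finer granularity.

```typescript
// Hypothetical, simplified enrichment shapes; the package's real contracts differ.
interface GrammarPoint {
  pattern: string;
  explanation: string;
}

interface EnrichmentSketch {
  grammar?: GrammarPoint[];
  examples?: string[];
}

function isGrammarPoint(value: unknown): value is GrammarPoint {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.pattern === "string" && typeof v.explanation === "string";
}

// Copies valid sections into the result and records the name of every malformed one.
function sanitizeEnrichmentSketch(raw: unknown): {
  enrichment: EnrichmentSketch;
  droppedSections: string[];
} {
  const enrichment: EnrichmentSketch = {};
  const droppedSections: string[] = [];
  if (typeof raw !== "object" || raw === null) {
    return { enrichment, droppedSections: ["enrichment"] };
  }
  const input = raw as Record<string, unknown>;

  const grammar = input.grammar;
  if (grammar !== undefined) {
    if (Array.isArray(grammar) && grammar.every(isGrammarPoint)) {
      enrichment.grammar = grammar;
    } else {
      droppedSections.push("grammar");
    }
  }

  const examples = input.examples;
  if (examples !== undefined) {
    if (Array.isArray(examples) && examples.every((e) => typeof e === "string")) {
      enrichment.examples = examples;
    } else {
      droppedSections.push("examples");
    }
  }

  return { enrichment, droppedSections };
}

// A valid grammar section survives; the malformed examples section is dropped and reported.
const enrichmentResult = sanitizeEnrichmentSketch({
  grammar: [{ pattern: "〜が空く", explanation: "to become empty" }],
  examples: [42],
});
console.log(enrichmentResult.droppedSections);
```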

Japanese study tokens can optionally carry trusted morphology metadata under StudyToken.metadata. Phase 1 supports Japanese verb and adjective metadata only:

import type { StudyTokenMetadata } from "@edwinho/kotoba-core";

const metadata: StudyTokenMetadata = {
  language: "ja",
  category: "morphology",
  kind: "verb",
  surface: "飲んだ",
  lemma: "飲む",
  verbClass: "godan-mu",
  observedForm: "past",
  confidence: "high",
};

surface is the surface text of the study token that owns the metadata. Core may add observedSurface when deterministic repair can prove that a split sequence forms one observed morphology phrase; for example, the token 高く may carry observedSurface: "高くないです".

sanitizeStudyTokens preserves valid metadata and drops only malformed metadata while keeping the token. Dropped metadata is reported through a metadata-specific path such as studyTokens[0].metadata.
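A minimal sketch of keeping the token while dropping only its metadata, assuming a reduced metadata shape. MetadataSketch, RawToken, CleanToken, and sanitizeTokenMetadata are hypothetical; only the studyTokens[i].metadata path format is taken from the description above.

```typescript
// Hypothetical, simplified metadata shape; the real StudyTokenMetadata has more fields.
interface MetadataSketch {
  language: "ja";
  kind: "verb" | "adjective";
  lemma: string;
  confidence: "high" | "medium" | "low";
}

interface RawToken {
  surface: string;
  metadata?: unknown; // untrusted provider output
}

interface CleanToken {
  surface: string;
  metadata?: MetadataSketch;
}

function isValidMetadata(value: unknown): value is MetadataSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.language === "ja" &&
    (v.kind === "verb" || v.kind === "adjective") &&
    typeof v.lemma === "string" &&
    (v.confidence === "high" || v.confidence === "medium" || v.confidence === "low")
  );
}

// Keeps every token; strips only metadata that fails validation and reports its path.
function sanitizeTokenMetadata(tokens: RawToken[]): { tokens: CleanToken[]; droppedSections: string[] } {
  const droppedSections: string[] = [];
  const cleaned = tokens.map((token, index): CleanToken => {
    const metadata = token.metadata;
    if (metadata === undefined) return { surface: token.surface };
    if (isValidMetadata(metadata)) return { surface: token.surface, metadata };
    droppedSections.push(`studyTokens[${index}].metadata`);
    return { surface: token.surface };
  });
  return { tokens: cleaned, droppedSections };
}

const metadataResult = sanitizeTokenMetadata([
  { surface: "飲んだ", metadata: { language: "ja", kind: "verb", lemma: "飲む", confidence: "high" } },
  { surface: "お腹", metadata: { kind: "noun" } }, // unsupported kind: metadata dropped, token kept
]);
console.log(metadataResult.tokens.length, metadataResult.droppedSections);
```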

The sanitizer also performs bounded Japanese morphology repair for common provider tokenization failures. Split polite verb sequences such as 食べ + ました and split adjective sequences such as 静か + でした, 高く + ない + です, and 静か + では + なかった can be normalized into observed surfaces such as 食べました, 静かでした, 高くないです, and 静かではなかった. Form tables can then mark the observed cell while the original tappable token surface is preserved. These repairs require contiguous target-text spans plus either trusted verb/adjective metadata or clear adjective part-of-speech notes; core does not guess arbitrary morphology from unrelated tokens.
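The contiguity requirement is what keeps the repair bounded. The following is a rough sketch of the idea, not the package's algorithm: it assumes tokens carry character offsets, uses a boolean trusted flag in place of real metadata checks, and merges only a small hypothetical allowlist of trailing pieces. SpanToken, AUXILIARIES, and repairObservedSurface are illustrative names.

```typescript
// Hypothetical token shape with a character offset into the target text.
interface SpanToken {
  surface: string;
  start: number; // offset of surface within targetText
  trusted: boolean; // stands in for "has trusted verb/adjective metadata"
  observedSurface?: string;
}

// Hypothetical allowlist of trailing pieces that may be merged onto a trusted head.
const AUXILIARIES = new Set(["ました", "でした", "ない", "です", "では", "なかった"]);

// Extends a trusted head token across directly following auxiliaries, but only when
// each piece starts exactly where the previous one ended and the span matches the text.
function repairObservedSurface(tokens: SpanToken[], targetText: string): SpanToken[] {
  return tokens.map((head, i) => {
    if (!head.trusted) return head;
    let end = head.start + head.surface.length;
    let observed = head.surface;
    for (let j = i + 1; j < tokens.length; j++) {
      const next = tokens[j];
      if (!AUXILIARIES.has(next.surface) || next.start !== end) break;
      observed += next.surface;
      end += next.surface.length;
    }
    const spanMatches = targetText.slice(head.start, end) === observed;
    if (observed === head.surface || !spanMatches) return head;
    // The tappable surface stays unchanged; only observedSurface is added.
    return { ...head, observedSurface: observed };
  });
}

const repaired = repairObservedSurface(
  [
    { surface: "静か", start: 0, trusted: true },
    { surface: "では", start: 2, trusted: false },
    { surface: "なかった", start: 4, trusted: false },
  ],
  "静かではなかった。"
);
console.log(repaired[0].surface, repaired[0].observedSurface); // 静か 静かではなかった
```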

Package Boundary

The public package set is intentionally split:

  • @edwinho/kotoba-core owns framework-neutral language profiles, draft contracts, validators, normalizers, cache/version helpers, and learning-entry utilities.
  • @edwinho/kotoba-gemini owns Gemini prompt/schema/provider logic and requires a caller-provided Gemini API key.
  • @edwinho/kotoba-cli is a terminal consumer of the public packages. It sends the user's input text to Gemini through @edwinho/kotoba-gemini using the user's Gemini API key.
  • App integrations own runtime policy, persistence, and product-specific behavior around these contracts.

Examples

Normalize a cloud translation result:

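import { normalizeTranslationDraft, sanitizeEnrichmentData } from "@edwinho/kotoba-core";

// providerPayload is the raw, untrusted provider response (for example from
// @edwinho/kotoba-gemini); it is assumed to be defined elsewhere.
const { enrichment, droppedSections } = sanitizeEnrichmentData(providerPayload.enrichment);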

const draft = normalizeTranslationDraft(
  {
    targetLanguage: "ja",
    sourceLanguage: "en",
    sourceText: "I'm hungry",
    targetText: "お腹が空きました。",
    readingSegments: [{ text: "お腹", reading: "おなか" }],
    romanization: "onaka ga sukimashita",
    translationText: "I'm hungry.",
    register: "polite",
    enrichment,
    studyTokens: providerPayload.studyTokens,
  },
  {
    source: "cloud",
    canRegenerateWithCloud: true,
  }
);

console.log(draft.completeness, draft.capabilities, droppedSections);

Resolve language behavior:

import {
  detectDirection,
  resolveActiveLanguageContext,
  resolveLanguageProfile,
  resolveTTSLocale,
} from "@edwinho/kotoba-core";

const context = resolveActiveLanguageContext({
  learningLanguage: "zh",
  chineseVariant: "cantonese-traditional",
});

const profile = resolveLanguageProfile(context.learningLanguage);
const inputMode = detectDirection("我肚餓", context.learningLanguage);
const ttsLocale = resolveTTSLocale(
  context.learningLanguage,
  context.chineseDisplayScript ?? undefined,
  context.chineseVariant ?? undefined
);

console.log(profile.defaultScript, inputMode, ttsLocale);

Generate Japanese form tables from trusted metadata:

import { generateJapaneseFormTable } from "@edwinho/kotoba-core";

const table = generateJapaneseFormTable({
  language: "ja",
  category: "morphology",
  kind: "verb",
  surface: "飲んだ",
  lemma: "飲む",
  verbClass: "godan-mu",
  observedForm: "past",
  confidence: "high",
});

console.log(table?.coreRows);
// [
//   {
//     key: "non-past",
//     label: "Non-past",
//     plain: { value: "飲む" },
//     polite: { value: "飲みます" },
//   },
//   {
//     key: "past",
//     label: "Past",
//     plain: { value: "飲んだ", observed: true, note: "Seen here" },
//     polite: { value: "飲みました" },
//   },
//   ...
// ]

console.log(table?.otherRows);
// [
//   { label: "Te-form", value: "飲んで" },
//   { label: "Potential", value: "飲める" },
// ]
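Determinism follows from the verb class: once godan-mu is trusted, every cell is a fixed suffix on the stem. The sketch below is written independently of the package and handles only verbs ending in む; godanMuTable and its row shape are hypothetical, and the plain/polite negative row is an addition not shown in the output above.

```typescript
// Hypothetical cell and row shapes, loosely modeled on the coreRows output above.
interface FormCellSketch {
  value: string;
  observed?: boolean;
}

interface FormRowSketch {
  key: "non-past" | "past" | "negative";
  plain: FormCellSketch;
  polite: FormCellSketch;
}

type FormKey = FormRowSketch["key"];

// Deliberately narrow: only godan verbs ending in む (the "godan-mu" class) are handled.
function godanMuTable(lemma: string, observedForm: FormKey) {
  if (!lemma.endsWith("む")) return null; // unsupported input: no table rather than a guess
  const stem = lemma.slice(0, -1); // 飲む -> 飲
  const row = (key: FormKey, plain: string, polite: string): FormRowSketch => ({
    key,
    // Only the plain cell matching the observed form is marked.
    plain: key === observedForm ? { value: plain, observed: true } : { value: plain },
    polite: { value: polite },
  });
  return {
    coreRows: [
      row("non-past", lemma, `${stem}みます`),
      row("past", `${stem}んだ`, `${stem}みました`),
      row("negative", `${stem}まない`, `${stem}みません`),
    ],
    otherRows: [
      { label: "Te-form", value: `${stem}んで` },
      { label: "Potential", value: `${stem}める` },
    ],
  };
}

const mu = godanMuTable("飲む", "past");
console.log(mu?.coreRows[1].plain);
```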

By default, only high-confidence metadata generates a table. Medium-confidence metadata can be enabled explicitly:

generateJapaneseFormTable(metadata, { minConfidence: "medium" });

Form tables are deterministic once metadata is accepted. They should be treated as grammar support generated from validated metadata, not as a standalone Japanese dictionary. Consumers should prefer high-confidence metadata for learner-facing surfaces and should hide or soften tables when metadata is missing, low-confidence, or unsupported.
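The minConfidence option amounts to a threshold over an ordered scale. A minimal sketch, assuming the ordering low < medium < high and treating missing metadata as a reason to hide the table; Confidence, CONFIDENCE_RANK, and meetsMinConfidence are hypothetical names, not package exports.

```typescript
type Confidence = "low" | "medium" | "high";

// Assumed ordering of confidence levels.
const CONFIDENCE_RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

// Returns true when metadata is confident enough for a learner-facing table.
// The "high" default mirrors the behavior described above.
function meetsMinConfidence(confidence: Confidence | undefined, minConfidence: Confidence = "high"): boolean {
  if (confidence === undefined) return false; // missing metadata: hide the table
  return CONFIDENCE_RANK[confidence] >= CONFIDENCE_RANK[minConfidence];
}

console.log(meetsMinConfidence("medium")); // false
console.log(meetsMinConfidence("medium", "medium")); // true
console.log(meetsMinConfidence(undefined, "low")); // false
```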

Resolve Korean as a future-language fixture:

import { normalizeTranslationDraft, resolveLanguageProfile } from "@edwinho/kotoba-core";

const profile = resolveLanguageProfile("ko");

const draft = normalizeTranslationDraft(
  {
    targetLanguage: "ko",
    sourceLanguage: "en",
    sourceText: "I'm going now",
    targetText: "저 지금 가요.",
    readingSystem: "revised_romanization",
    readingSegments: [
      { text: "저", reading: "jeo" },
      { text: "지금", reading: "jigeum" },
      { text: "가요", reading: "gayo" },
    ],
    romanization: "jeo jigeum gayo",
    translationText: "I'm going now.",
    register: "polite",
  },
  {
    source: "cloud",
    canRegenerateWithCloud: false,
  }
);

console.log(profile.defaultTTSLocale, draft.readingSystem, draft.register);

New-Language Checklist

Add future languages through bounded metadata and tests rather than changing the TranslationDraft field shape:

  1. Add profile metadata in src/languages/languageProfiles.ts, including scripts, reading systems, locale metadata, register support, and variant support.
  2. Add script metadata to ScriptCode and reading metadata to ReadingSystem or a language-specific reading-system alias.
  3. Add detection fixtures for detectDirection when the target script can be detected locally.
  4. Add provider prompt guidance and response-schema additions in the provider package for language-specific reading, romanization, register, and enrichment expectations.
  5. Add normalization and sanitizer fixtures that prove language-specific metadata survives and unsupported fields are dropped.
  6. Add public API tests for profile resolution and any new runtime exports.
  7. Add product work separately if the language should appear in settings, navigation, add phrase, library, TTS/STT, or persistence flows.
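Step 3 is easiest to picture with a fixture table. The sketch below does not call the package's detectDirection; detectKoreanDirection and the InputModeSketch values are hypothetical, and the regular expression simply checks for Hangul syllables (U+AC00 to U+D7A3) or Hangul jamo (U+1100 to U+11FF, U+3130 to U+318F).

```typescript
// Hypothetical script check for a target language whose script is locally detectable.
const HANGUL = /[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3]/;

// Hypothetical direction labels; the package's actual detectDirection values may differ.
type InputModeSketch = "target-to-source" | "source-to-target";

function detectKoreanDirection(text: string): InputModeSketch {
  return HANGUL.test(text) ? "target-to-source" : "source-to-target";
}

// Fixture table: each entry pairs an input with the expected direction.
const fixtures: Array<[string, InputModeSketch]> = [
  ["저 지금 가요.", "target-to-source"],
  ["I'm going now", "source-to-target"],
  ["ㅋㅋ", "target-to-source"], // compatibility jamo only
];

for (const [input, expected] of fixtures) {
  const actual = detectKoreanDirection(input);
  if (actual !== expected) throw new Error(`${input}: expected ${expected}, got ${actual}`);
}
console.log(`${fixtures.length} detection fixtures passed`);
```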

Verification

Run package-level checks from the repository root:

bun run --cwd packages/core typecheck
bun run --cwd packages/core test
bun run --cwd packages/core build

From the repository root, bun run build, bun run typecheck, and bun run test run all package checks.