@nerdbond/text

v0.0.2

Downloads

4

Readme

Overview

This library aims to convert text in all kinds of writing systems to a consistent and stable ASCII encoding, which can then be processed further into a more readable form. It should be able to take text that mixes various scripts, isolate the parts, and romanize each as well as possible.

Many languages use the same script in slightly different ways. For example, Vietnamese uses the Latin alphabet with many specialized diacritics, as does Chinese Pinyin. The Arabic script is used, in various forms, for Standard Arabic, Persian, and Urdu, among others. So given an arbitrary chunk of text whose language we don't know, the library can only make a rough guess (for example, that the script is Latin or Arabic, without knowing whether the language is Vietnamese, Finnish, or Icelandic).

When we know the language of the text, such as Icelandic, we can write a custom handler that transliterates it as well as we can. So we have two entrypoints:

  1. Unknown text
  2. Known text

If we know the type of text and system it's written in, we can potentially add a parser for that. Otherwise it falls back to a more generic parser like the Latin parser.
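The fallback described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `parsers` registry, the `romanize` function, and the Icelandic letter mapping are hypothetical and are not part of this library's API.

```javascript
// Hypothetical sketch of "known text" vs. "unknown text" dispatch.
// None of these names come from @nerdbond/text.

// Generic Latin fallback: decompose characters and drop combining marks.
const stripMarks = input => input.normalize('NFD').replace(/\p{M}/gu, '')

// Illustrative (assumed) ASCII approximations for a few Icelandic letters.
const icelandicMap = { 'þ': 'th', 'ð': 'dh', 'æ': 'ae' }

const parsers = new Map([
  // Language-specific handler: map special letters first, then strip marks.
  ['icelandic', input => stripMarks(input.replace(/[þðæ]/g, ch => icelandicMap[ch]))],
  // Generic handler used when the language is unknown.
  ['latin', stripMarks],
])

function romanize(input, language) {
  // Use the specific parser when one is registered, else fall back to Latin.
  const parser = parsers.get(language) || parsers.get('latin')
  return parser(input)
}
```

With a known language the specific handler can do better than the generic one: in this sketch, `romanize('þú', 'icelandic')` maps the thorn to `th`, while `romanize('þú')` leaves it untouched because the generic parser only strips diacritics.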

Some languages can be transliterated very well, such as those written in the many Indic scripts that are each used for just one or a few languages (Tamil, Thai, or Sinhala, for example). But given Yoruba or Vietnamese text, without knowing it's one of those languages, we won't be able to get very close to the pronunciation automatically. You need to tell the library to use those specific parsers.

Installation

pnpm add @nerdbond/text
yarn add @nerdbond/text
npm i @nerdbond/text

Usage

You can use this library to process text in a few steps:

  1. Convert written text in various languages to ASCII chat text (seed chat text).
  2. Convert that ASCII chat text to diacritic-rich chat text (rose chat text).
  3. Or convert the ASCII chat text to simplified chat text (bird chat text), which loses pronunciation detail but is easier on the eyes.

import text from '@nerdbond/text'

text.tibetan.make('འཁངས') // => khaq

Make it more human-readable:

import text from '@nerdbond/text'
import chat from '@nerdbond/chat'

chat.read(text.tibetan.make('འཁངས')) // => khang

Find out what script some text is from:

import text from '@nerdbond/text'

text.find('कल्पना') // => { form: 'devanagari', rank: 1 }
text.rank('कल्पना') // gives back more than one language if apparent.
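One way script detection like this might work is to count code points per Unicode script using JavaScript's `\p{Script=...}` property escapes. The sketch below is an assumption about the general technique, not the library's implementation; `SCRIPTS` and `rankScripts` are hypothetical names, and the script list is deliberately short.

```javascript
// Hypothetical sketch: rank candidate scripts by their share of code points.
const SCRIPTS = {
  devanagari: /\p{Script=Devanagari}/u,
  tibetan: /\p{Script=Tibetan}/u,
  arabic: /\p{Script=Arabic}/u,
  latin: /\p{Script=Latin}/u,
}

function rankScripts(input) {
  const counts = {}
  let total = 0
  // Iterate by code point (not UTF-16 unit) and credit the first matching script.
  for (const ch of input) {
    for (const [form, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(ch)) {
        counts[form] = (counts[form] || 0) + 1
        total += 1
        break
      }
    }
  }
  // Highest share first; characters in no listed script are ignored.
  return Object.entries(counts)
    .map(([form, count]) => ({ form, rank: count / total }))
    .sort((a, b) => b.rank - a.rank)
}
```

Note that this only identifies the script. Telling apart languages that share a script (Vietnamese vs. Finnish, say) would need additional signals beyond code point ranges.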

TODO

Take mixed script writings and transliterate them as best as possible.

import text from '@nerdbond/text'

text.make('कल्पनाའཁངས')
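Since this part is still a TODO, here is a rough sketch of one possible first step: splitting mixed text into runs of a single script, so each run could later be handed to its own parser. The names `SCRIPT_PATTERNS`, `scriptOf`, and `splitByScript` are hypothetical, and only two scripts are listed for illustration.

```javascript
// Hypothetical sketch: segment mixed-script text into single-script runs.
const SCRIPT_PATTERNS = [
  ['devanagari', /\p{Script=Devanagari}/u],
  ['tibetan', /\p{Script=Tibetan}/u],
]

function scriptOf(ch) {
  // Return the first listed script matching this code point, else 'other'.
  const match = SCRIPT_PATTERNS.find(([, pattern]) => pattern.test(ch))
  return match ? match[0] : 'other'
}

function splitByScript(input) {
  const runs = []
  for (const ch of input) {
    const form = scriptOf(ch)
    const last = runs[runs.length - 1]
    if (last && last.form === form) {
      last.text += ch // same script: extend the current run
    } else {
      runs.push({ form, text: ch }) // script changed: start a new run
    }
  }
  return runs
}
```

For the example input above, this yields one Devanagari run followed by one Tibetan run. A fuller version would probably attach punctuation, spaces, and digits to a neighboring run rather than splitting them into their own `other` run.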

License

MIT

NerdBond

This is being developed by the folks at NerdBond, a California-based project for helping humanity master information and computation. NerdBond started as a spark of an idea in the winter of 2008, became a company 10 years later in the winter of 2018, and is now a seed of a project just beginning its development phases. It is entirely bootstrapped by working full time and running Etsy and Amazon shops. Also find us on Facebook, Twitter, and LinkedIn. Check out our other GitHub projects as well!