bn-translit

v0.1.1

Published

2 months ago

Bidirectional Bangla ↔ Banglish transliteration with slugify support. Use Banglish→Bangla for search, Bangla→Banglish for SEO and URL slugs.

shahjalal.bu

bangla bengali banglish transliteration translit romanization slug slugify seo search

bn-translit

Bidirectional Bangla ↔ Banglish transliteration with slug & search helpers. Type "kalam" → match "কালাম". Convert "জম জম পর্দা" → slug jom-jom-porda.

A small, dependency-free TypeScript library that handles two complementary problems Bangla apps run into:

Search — users type Banglish (Roman) but data is stored in Bangla. Use toBangla() or searchPattern() to make kalam match কালাম.
SEO & URL slugs — content is in Bangla but URLs need ASCII. Use toBanglish() or slugify() to turn জম জম পর্দা into jom-jom-porda.

Install

npm install bn-translit
# or
bun add bn-translit
# or
pnpm add bn-translit

Works in Node ≥ 18, Bun, Deno, and modern browsers. Zero runtime dependencies, ships with full TypeScript types.

React Native: works with Metro bundler (the package is CommonJS, which Metro handles via interop). Pure-JS code with no native modules — drop in and use. See the RN example below.

Quick start

import { toBangla, toBanglish, slugify, searchPattern } from "bn-translit";

toBangla("kalam");                 // "কালাম"
toBangla("abdul khan");            // "আবদুল খান"

toBanglish("কালাম");                // "kalam"
toBanglish("জম জম পর্দা");          // "jom jom porda"

slugify("জম জম পর্দা, ফেনী");        // "jom-jom-porda-phenee"
slugify("Md. Shahjalal Khan");     // "md-shahjalal-khan"

searchPattern("kalam");            // "kalam|কালাম"  (regex-ready)

Why this exists

If you've ever shipped an app for a Bangla-speaking audience, you've hit at least one of these problems. They all share the same root cause: Bangla is written in a non-Latin script, but most input methods, URL standards, and search infrastructure assume ASCII.

Problem 1 — Search across input methods

A pharmacy / POS / CRM stores customers in Bangla:

{ name: "কালাম বেডিং, ফেনী", phone: "01700000000" }
{ name: "রহমান এন্টারপ্রাইজ",  phone: "01711111111" }

Users in the field on Android phones rarely have a Bangla keyboard — they type kalam in English and expect to find the customer. A naive WHERE name LIKE '%kalam%' returns zero results because kalam and কালাম share no characters.

The fix is to normalize the search query so it matches both forms:

Server-side: pass the search term through searchPattern() and feed it into MongoDB's $regex (or any other regex matcher).
Or pre-compute a Banglish copy of every record at write time (toBanglish(name)) and search the indexed Banglish field directly.

This matters most on mobile, where Bangla typing is slow and many users type Banglish out of habit even when a Bangla keyboard is available.

Problem 2 — URL slugs and SEO

Bangla content needs ASCII URLs:

Many CDNs, analytics tools, and link-shorteners deal poorly with percent-encoded UTF-8: জম alone becomes %E0%A6%9C%E0%A6%AE, which is valid but unreadable.
Search engines extract keywords from URLs, and a Romanized slug gives crawlers readable Latin-script keywords for a Bangla page.
Users copy-paste links into apps that mangle non-ASCII.

slugify("জম জম পর্দা") gives you jom-jom-porda — readable, sharable, and stable.

Problem 3 — Cross-script duplicate detection

If you accept user-submitted data, the same store might be entered as both "কালাম বেডিং" and "Kalam Beding". Running both through toBanglish gives you a normalized form you can compare or use as a unique key.
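A minimal sketch of how such a comparison key could be built. The names dedupeKey and findDuplicates are hypothetical and not part of bn-translit; the transliterator is passed in as a parameter so the sketch stays self-contained (in an app you would pass the library's toBanglish). Lowercasing and punctuation collapsing are assumptions about what a useful key needs.

```typescript
// Hypothetical helpers, not part of bn-translit.
type Transliterate = (input: string) => string;

// Build a script-independent comparison key for a name.
function dedupeKey(name: string, toBanglish: Transliterate): string {
  return toBanglish(name)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ") // collapse punctuation and whitespace runs
    .trim();
}

// Group names whose keys collide; only groups with 2+ members are returned.
function findDuplicates(names: string[], toBanglish: Transliterate): string[][] {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = dedupeKey(name, toBanglish);
    const group = groups.get(key) ?? [];
    group.push(name);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((g) => g.length > 1);
}
```

Exact key equality only catches entries that romanize identically; spelling variants such as "Beding" and "Bedding" would still need fuzzy matching on top.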

Why not just use a Bangla collation?

Database collations like MongoDB's bn locale handle sorting and case folding, but they don't transliterate. A kalam query still won't match কালাম — the collation only kicks in once you're already operating on the same script. This library bridges that gap.

API

`toBangla(input)`

Convert Banglish (Roman phonetic) text to Bangla.

| Input | Output |
| -------------- | ------------ |
| kalam | কালাম |
| khan | খান |
| abdul | আবদুল |
| bangla | বাংলা |
| shapla | শাপলা |
| dhan | ধান |

Conventions:

A single a after a consonant becomes the visible vowel sign া. Type the consonant alone (no a) to keep its inherent vowel অ, which has no written sign.
Two consonants in a row do not get a halant inserted automatically; they sit side-by-side, each with its inherent vowel. To force a conjunct, type the same consonant twice (kk → ক্ক, tt → ত্ত).
Multi-letter sequences are matched greedily, longest first: kh → খ, gh → ঘ, ch → চ, chh → ছ, jh → ঝ, th → থ, dh → ধ, ph → ফ, bh → ভ, sh → শ, ng → ং.
Long vowels: aa → আ/া, ee → ঈ/ী, oo → ঊ/ূ, oi → ঐ/ৈ, ou → ঔ/ৌ.
Digits, spaces, punctuation pass through unchanged.
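The greedy matching described above can be sketched as a longest-match tokenizer. This is an illustration only, not the library's actual implementation: the tokenize function and the MULTI list are hypothetical, the list contains just the multi-letter sequences named in the conventions, and the input is assumed to be lowercase.

```typescript
// Multi-letter sequences taken from the conventions above (illustrative subset).
const MULTI: string[] = [
  "chh", "kh", "gh", "ch", "jh", "th", "dh", "ph", "bh", "sh", "ng",
  "aa", "ee", "oo", "oi", "ou",
];

// Try longer sequences first so "chh" wins over "ch".
const BY_LENGTH: string[] = MULTI.slice().sort((a, b) => b.length - a.length);

// Split Roman input into the units a transliterator would map one by one.
function tokenize(input: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < input.length) {
    const match = BY_LENGTH.find((seq) => input.startsWith(seq, i));
    const token = match ?? input.charAt(i); // fall back to a single character
    tokens.push(token);
    i += token.length;
  }
  return tokens;
}

console.log(tokenize("chhobi")); // [ 'chh', 'o', 'b', 'i' ]
console.log(tokenize("khan"));   // [ 'kh', 'a', 'n' ]
```

Without the longest-first ordering, "chhobi" would split as ch + h + …, which is why the order of the lookup matters more than the table itself.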

`toBanglish(input, options?)`

Convert Bangla text to Banglish (Roman).

toBanglish("কালাম");                            // "kalam"
toBanglish("রহমান");                           // "rohoman"
toBanglish("রহমান", { schwaDeletion: false });  // "rohomano"
toBanglish("কালাম", { inherentVowel: "a" });    // "kalam"
toBanglish("হুমায়ূন আহমেদ");                    // "humayoon ahomed"

Options:

| Option | Type | Default | Description |
| ---------------- | --------- | ------- | ---------------------------------------------------------------------------------------------------- |
| schwaDeletion | boolean | true | Drop the inherent vowel from word-final consonants (so রহমান ends as …man rather than …mano). |
| inherentVowel | string | "o" | Roman letter for a bare consonant's vowel. Common alternatives: "a" (more English-feeling output). |

The library auto-composes the Bengali nukta letters (য়, ড়, ঢ়). Unicode explicitly excludes these from NFC composition (UAX #15 §5.7), so NFC-normalized text carries them as a base letter followed by a separate nukta. The library handles them manually so that input from any source is treated consistently.
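To make the nukta handling concrete, here is a minimal sketch of manual composition. The composeNukta function is hypothetical and only illustrates the idea; the library's internals may differ. The code points themselves are standard Unicode: each precomposed letter decomposes to a base letter plus U+09BC (Bengali sign nukta).

```typescript
// Base letter -> precomposed nukta letter.
const COMPOSED: Record<string, string> = {
  "\u09AF": "\u09DF", // YA  + nukta -> YYA
  "\u09A1": "\u09DC", // DDA + nukta -> RRA
  "\u09A2": "\u09DD", // DDHA + nukta -> RHA
};

// Replace every "base + U+09BC" pair with its single precomposed code point.
function composeNukta(input: string): string {
  return input.replace(
    /([\u09AF\u09A1\u09A2])\u09BC/g,
    (_match: string, base: string) => COMPOSED[base] ?? base,
  );
}

// NFC leaves these letters decomposed because they are composition exclusions.
const nfc = "\u09DF".normalize("NFC");
console.log(nfc.length);               // 2 (base letter + nukta)
console.log(composeNukta(nfc).length); // 1 (recomposed)
```

Running this step before any character lookup means a mapping table only needs one entry per nukta letter, whichever form the input arrived in.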

`slugify(input, options?)`

Build a URL-safe slug. Internally calls toBanglish then strips non-ASCII.

slugify("জম জম পর্দা");                        // "jom-jom-porda"
slugify("Naïve café résumé");                  // "naive-cafe-resume"
slugify("hello world", { separator: "_" });    // "hello_world"
slugify("a b c d e f", { maxLength: 5 });      // "a-b-c"  (truncated at last separator)

Options (extends ToBanglishOptions):

| Option | Type | Default | Description |
| ----------------- | --------- | ------- | ----------------------------------------------------------------- |
| separator | string | "-" | Joiner between words. |
| lowercase | boolean | true | Lowercase the result. |
| stripDiacritics | boolean | true | Remove combining marks (NFD-normalize and strip). |
| maxLength | number | none | Truncate at the last separator before this length. |
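Two of these options are easy to picture in isolation. The sketch below shows one plausible way to implement the stripDiacritics and maxLength behaviors; both function names are hypothetical stand-ins, and edge cases (for example, a slug with no separator inside the limit) are assumptions rather than documented library behavior.

```typescript
// NFD splits "é" into "e" + a combining accent; the regex then drops the
// combining marks in the U+0300..U+036F block.
function stripDiacritics(input: string): string {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Cut a slug to at most maxLength characters without splitting a word.
function truncateAtSeparator(slug: string, maxLength: number, separator = "-"): string {
  if (slug.length <= maxLength) return slug;
  const head = slug.slice(0, maxLength);
  // The cut fell exactly on a word boundary, so the head is already whole.
  if (slug.startsWith(separator, maxLength)) return head;
  const cut = head.lastIndexOf(separator);
  // Assumption: with no separator inside the limit, fall back to a hard cut.
  return cut > 0 ? head.slice(0, cut) : head;
}

console.log(stripDiacritics("Naïve café résumé"));       // "Naive cafe resume"
console.log(truncateAtSeparator("a-b-c-d-e-f", 5));      // "a-b-c"
console.log(truncateAtSeparator("jom-jom-porda", 10));   // "jom-jom"
```

Truncating at a separator keeps slugs readable: a hard cut at 10 characters would have produced "jom-jom-po".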

`searchPattern(input)`

Returns a regex-pattern string that matches the input AND its transliteration. Suitable for new RegExp(...) or MongoDB $regex.

searchPattern("kalam");   // "kalam|কালাম"
searchPattern("কালাম");   // "কালাম|kalamo|kalam"   (deduped)
searchPattern("0170");    // "0170"                 (no transliteration for digits)
searchPattern("");        // ""

Special regex characters in the input are escaped, so user input like "test.com" matches the literal dot rather than any character.
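The escaping and deduplication steps can be sketched as follows. escapeRegex and buildPattern are hypothetical names used for illustration; the library's own implementation is not shown in this README and may differ.

```typescript
// Backslash-escape every character that has special meaning in a regex.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the distinct, non-empty variants into one alternation pattern.
function buildPattern(variants: string[]): string {
  const unique = Array.from(new Set(variants.filter((v) => v.length > 0)));
  return unique.map(escapeRegex).join("|");
}

const pattern = buildPattern(["test.com"]);
console.log(pattern);                               // test\.com
console.log(new RegExp(pattern).test("test.com"));  // true
console.log(new RegExp(pattern).test("testXcom"));  // false
```

Escaping each variant before joining matters: escaping the joined string instead would also escape the | separators and turn the alternation into one long literal.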

Use cases

1. Server-side search (MongoDB)

import { searchPattern } from "bn-translit";

router.get("/customers", async (req, res) => {
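  // req.query.q may be missing or an array; fall back to "" so searchPattern returns "".
  const q = typeof req.query.q === "string" ? req.query.q : "";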
  const pattern = searchPattern(q);
  const filter = pattern
    ? {
        $or: [
          { name: { $regex: pattern, $options: "i" } },
          { organizationName: { $regex: pattern, $options: "i" } },
        ],
      }
    : {};
  res.json(await Customer.find(filter).limit(50));
});

Now both "kalam" (typed by mobile users) and "কালাম" (typed by desktop users with a Bangla keyboard) hit the same record.

2. Storing a Banglish copy for full-text indexing

import { toBanglish } from "bn-translit";

CustomerSchema.pre("save", function () {
  this.banglishName = toBanglish(this.name);
});

// then index { banglishName: 1 } and search the user's English query
// against banglishName directly — no regex tricks needed.

3. SEO-friendly URLs

import { slugify } from "bn-translit";

const article = {
  title: "জম জম পর্দা — এক বছরের অভিজ্ঞতা",
  slug: slugify("জম জম পর্দা — এক বছরের অভিজ্ঞতা"),
  // → "jom-jom-porda-ek-bochhorer-obhijnota"
};

4. Client-side search (e.g. React Native)

When you already have data on the device and want instant filtering as the user types — no server roundtrip:

import { useMemo, useState } from "react";
import { TextInput, FlatList, Text } from "react-native";
import { searchPattern } from "bn-translit";

const customers = [
  { id: 1, name: "কালাম বেডিং" },
  { id: 2, name: "রহমান এন্টারপ্রাইজ" },
  { id: 3, name: "জম জম পর্দা" },
];

export function CustomerList() {
  const [q, setQ] = useState("");

  const filtered = useMemo(() => {
    const pattern = searchPattern(q);
    if (!pattern) return customers;
    const re = new RegExp(pattern, "i");
    return customers.filter((c) => re.test(c.name));
  }, [q]);

  return (
    <>
      <TextInput
        value={q}
        onChangeText={setQ}
        placeholder="নাম খুঁজুন / Search"
      />
      <FlatList
        data={filtered}
        keyExtractor={(c) => String(c.id)}
        renderItem={({ item }) => <Text>{item.name}</Text>}
      />
    </>
  );
}

Typing kalam filters down to "কালাম বেডিং" — even though no character of the query matches the data byte-for-byte.

Conventions & limitations

This is a phonetic, heuristic transliteration — not a linguistically exact one. Bangla orthography has historical irregularities that no rule-based system handles perfectly.

The mapping is lossy in both directions:

| Bangla letters that share one Banglish form | Why |
|---|---|
| স / শ / ষ ← all s or sh | All voiceless sibilants in modern speech |
| ত / ট ← both t | Dental vs retroflex distinction is hard for English speakers |
| দ / ড ← both d | Same |
| ন / ণ / ঞ ← all n | Same |
| র / ড় ← both r | Same |
| জ / য ← both j | Same |

The library is biased toward search and slug usefulness, not authentic spelling. If you need precise transliteration (e.g., academic papers, IAST/ITRANS schemes), use a linguistically-aware tool.

Schwa deletion is heuristic — applied only at word ends. Real Bangla applies it more broadly mid-word, which requires a pronunciation dictionary outside the scope of this library.
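As a rough illustration of what "only at word ends" means, the sketch below renders one word from pre-romanized units. The Syllable shape and renderWord function are hypothetical and exist only for this example; they are not the library's data model. A unit is marked bare when the Bangla consonant carries no vowel sign.

```typescript
// Hypothetical unit: one romanized consonant, with or without a written vowel.
interface Syllable {
  roman: string;
  bare: boolean; // true when the consonant relies on its inherent vowel
}

function renderWord(
  syllables: Syllable[],
  schwaDeletion = true,
  inherentVowel = "o",
): string {
  return syllables
    .map((s, i) => {
      if (!s.bare) return s.roman; // explicit vowel already included
      const isLast = i === syllables.length - 1;
      // Word-final bare consonant: drop the inherent vowel when enabled.
      return isLast && schwaDeletion ? s.roman : s.roman + inherentVowel;
    })
    .join("");
}

// রহমান as four units: r, h, ma, n
const rahman: Syllable[] = [
  { roman: "r", bare: true },
  { roman: "h", bare: true },
  { roman: "ma", bare: false },
  { roman: "n", bare: true },
];

console.log(renderWord(rahman));        // "rohoman"
console.log(renderWord(rahman, false)); // "rohomano"
```

The mid-word "ho" is exactly the kind of inherent vowel that spoken Bangla often drops; a position-only rule like this one cannot tell when that applies.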

For best search results, consider feeding both the original query and a transliterated form into your matcher (the searchPattern helper does this for you) — that way users typing imperfect Banglish still hit the right records.

Building & testing

npm install
npm test       # node:test runner with tsx loader (28 tests)
npm run build  # tsc → dist/

The package builds to CommonJS (dist/index.js) with .d.ts types.

Comparison with similar packages

bangla-to-banglish — covers only one direction. This library does both and adds slug + search-pattern helpers for the most common application use cases.
Avro Phonetic keyboard — a typing system, not a programmatic API. The conventions here are inspired by Avro but tuned for search/slug use rather than precise text input.

Contributing

Issues and PRs welcome. Please include test cases that demonstrate the expected behavior — character mappings can be opinionated, so context helps us judge whether a change is a fix or a regression for someone else's setup.

License

MIT — see LICENSE.
